While laying some groundwork for constexpr std::vector, I noticed some
bugs in the std::uninitialized_xxx algorithms. The conditions being
checked for optimizing trivial cases were not quite right, as shown in
the examples in the PR.
This consolidates the checks into a single macro. The macro has
appropriate definitions for C++98 or for later standards, to avoid a #if
everywhere the checks are used. For C++11 and later the check makes a
call to a new function doing a static_assert to ensure we don't use
assignment in cases where construction would have been invalid.
Extracting that check to a separate function will be useful for
constexpr std::vector, as that can't use std::uninitialized_copy
directly because it isn't constexpr).
The consolidated checks mean that some slight variations in static
assert message are gone, as there is only one place that does the assert
now. That required adjusting some tests. As part of that the redundant
89164_c++17.cc test was merged into 89164.cc which is compiled as C++17
by default now, but can also use other -std options if the
C++17-specific error is made conditional with a target selector.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/102064
* include/bits/stl_uninitialized.h (_GLIBCXX_USE_ASSIGN_FOR_INIT):
Define macro to check conditions for optimizing trivial cases.
(__check_constructible): New function to do static assert.
(uninitialized_copy, uninitialized_fill, uninitialized_fill_n):
Use new macro.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc:
Adjust dg-error pattern.
* testsuite/23_containers/vector/cons/89164.cc: Likewise. Add
C++17-specific checks from 89164_c++17.cc.
* testsuite/23_containers/vector/cons/89164_c++17.cc: Removed.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/102064.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_copy_n/102064.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/102064.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_fill_n/102064.cc:
New test.
This function claims to remove a single character at index p, but it
actually removes p+1 characters beginning at p. So r.erase(0) removes
the first character, but r.erase(1) removes the second and third, and
r.erase(2) removes the second, third and fourth. This is not a useful
API.
The overload is present in the SGI STL <stl_rope.h> header that we
imported, but it isn't documented in the API reference. The erase
overloads that are documented are:
erase(const iterator& p)
erase(const iterator& f, const iterator& l)
erase(size_type i, size_type n);
Having an erase(size_type p) overload that erases a single character (as
the comment says it does) might be useful, but would be inconsistent
with std::basic_string::erase(size_type p = 0, size_type n = npos),
which erases from p to the end of the string when called with a single
argument.
Since the function isn't part of the documented API, doesn't do what it
claims to do (or anything useful) and "fixing" it would leave it
inconsistent with basic_string, I'm just removing that overload.
libstdc++-v3/ChangeLog:
PR libstdc++/102048
* include/ext/rope (rope::erase(size_type)): Remove broken
function.
So the problem here is there is code in the C++ front-end not to add a
break statement (to the IR) if the previous block does not fall through.
The problem is the code which does the check to see if the block
may fallthrough does not check a CLEANUP_STMT; it assumes it is always
fall through. Anyways this adds the code for the case of a CLEANUP_STMT
that is only for !CLEANUP_EH_ONLY (the try/finally case).
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/cp/ChangeLog:
PR c++/66590
* cp-objcp-common.c (cxx_block_may_fallthru): Handle
CLEANUP_STMT for the case which will be try/finally.
gcc/testsuite/ChangeLog:
PR c++/66590
* g++.dg/warn/Wreturn-5.C: New test.
The removal of remove_zero_width_bit_fields, in addition to triggering
some ABI issues that need solving anyway (ABI incompatibility between
C and C++) also resulted in UB inside of gcc, we now call build_zero_init
which calls build_int_cst on an integral type with TYPE_PRECISION of 0.
Fixed by ignoring the zero width bitfields. I understand
build_value_init_noctor wants to initialize to 0 even unnamed bitfields
(of non-zero width), at least until we have some CONSTRUCTOR flag that says
that even all the padding bits should be cleared.
2021-08-25 Jakub Jelinek <jakub@redhat.com>
PR c++/102019
* init.c (build_value_init_noctor): Ignore unnamed zero-width
bitfields.
this patch adds logic needed to merge neighbouring accesses in ipa-modref
summaries. This helps analyzing array initializers and similar code. It is
bit of work, since it breaks the fact that modref tree makes a good lattice for
dataflow: the access ranges can be extended indefinitely. For this reason I
added counter tracking number of adjustments and a cap to limit them during the
dataflow.
gcc/ChangeLog:
* doc/invoke.texi: Document --param modref-max-adjustments.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref-tree.h (struct modref_access_node): Add adjustments;
(modref_access_node::operator==): Fix handling of access ranges.
(modref_access_node::contains): Constify parameter; handle also
mismatched parm offsets.
(modref_access_node::update): New function.
(modref_access_node::merge): New function.
(unspecified_modref_access_node): Update constructor.
(modref_ref_node::insert_access): Add record_adjustments parameter;
handle merging.
(modref_ref_node::try_merge_with): New private function.
(modref_tree::insert): New record_adjustments parameter.
(modref_tree::merge): New record_adjustments parameter.
(modref_tree::copy_from): Update.
* ipa-modref.c (dump_access): Dump adjustments field.
(get_access): Update constructor.
(record_access): Update call of insert.
(record_access_lto): Update call of insert.
(merge_call_side_effects): Add record_adjustments parameter.
(get_access_for_fnspec): Update.
(process_fnspec): Update.
(analyze_call): Update.
(analyze_function): Update.
(read_modref_records): Update.
(ipa_merge_modref_summary_after_inlining): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
* params.opt (param-max-modref-adjustments=): New.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/modref-1.c: Update testcase.
* gcc.dg/tree-ssa/modref-4.c: Update testcase.
* gcc.dg/tree-ssa/modref-8.c: New test.
I noticed that the built-functions for xxspltiw, xxspltidp, xxsplti32dx,
xxpermx, and xxeval all used the 'vecsimple' type. These instructions are
permute instructions (3 cycle latency) and should use 'vecperm' instead.
While I was at it, I changed the UNSPEC name for xxspltidp to be
UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
2021-08-25 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
UNSPEC_XXSPLTID.
(xxspltiw_v4si): Use vecperm type attribute.
(xxspltiw_v4si_inst): Use vecperm type attribute.
(xxspltiw_v4sf_inst): Likewise.
(xxspltidp_v2df): Use vecperm type attribute. Use
UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
(xxspltidp_v2df_inst): Likewise.
(xxsplti32dx_v4si): Use vecperm type attribute.
(xxsplti32dx_v4si_inst): Likewise.
(xxsplti32dx_v4sf_inst): Likewise.
(xxblend_<mode>): Likewise.
(xxpermx): Likewise.
(xxpermx_inst): Likewise.
(xxeval): Likewise.
Adds the logic to handle -finput-charset in layout_get_source_line(), so that
source lines are converted from their input encodings prior to being output by
diagnostics machinery. Also adds the ability to strip a UTF-8 BOM similarly.
gcc/c-family/ChangeLog:
PR other/93067
* c-opts.c (c_common_input_charset_cb): New function.
(c_common_post_options): Call new function
diagnostic_initialize_input_context().
gcc/d/ChangeLog:
PR other/93067
* d-lang.cc (d_input_charset_callback): New function.
(d_init): Call new function
diagnostic_initialize_input_context().
gcc/fortran/ChangeLog:
PR other/93067
* cpp.c (gfc_cpp_post_options): Call new function
diagnostic_initialize_input_context().
gcc/ChangeLog:
PR other/93067
* coretypes.h (typedef diagnostic_input_charset_callback): Declare.
* diagnostic.c (diagnostic_initialize_input_context): New function.
* diagnostic.h (diagnostic_initialize_input_context): Declare.
* input.c (default_charset_callback): New function.
(file_cache::initialize_input_context): New function.
(file_cache_slot::create): Added ability to convert the input
according to the input context.
(file_cache::file_cache): Initialize the new input context.
(class file_cache_slot): Added new m_alloc_offset member.
(file_cache_slot::file_cache_slot): Initialize the new member.
(file_cache_slot::~file_cache_slot): Handle potentially offset buffer.
(file_cache_slot::maybe_grow): Likewise.
(file_cache_slot::needs_read_p): Handle NULL fp, which is now possible.
(file_cache_slot::get_next_line): Likewise.
* input.h (class file_cache): Added input context member.
libcpp/ChangeLog:
PR other/93067
* charset.c (init_iconv_desc): Adapt to permit PFILE argument to
be NULL.
(_cpp_convert_input): Likewise. Also move UTF-8 BOM logic to...
(cpp_check_utf8_bom): ...here. New function.
(cpp_input_conversion_is_trivial): New function.
* files.c (read_file_guts): Allow PFILE argument to be NULL. Add
INPUT_CHARSET argument as an alternate source of this information.
(read_file): Pass the new argument to read_file_guts.
(cpp_get_converted_source): New function.
* include/cpplib.h (struct cpp_converted_source): Declare.
(cpp_get_converted_source): Declare.
(cpp_input_conversion_is_trivial): Declare.
(cpp_check_utf8_bom): Declare.
gcc/testsuite/ChangeLog:
PR other/93067
* gcc.dg/diagnostic-input-charset-1.c: New test.
* gcc.dg/diagnostic-input-utf8-bom.c: New test.
When we swap operands for SLP builds we lose track where exactly
pattern defs are - but we fail to update the any_pattern member
of the operands info. Do so conservatively.
2021-08-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/102046
* tree-vect-slp.c (vect_build_slp_tree_2): Conservatively
update ->any_pattern when swapping operands.
* gcc.dg/vect/pr102046.c: New testcase.
For ASHIFT + ZERO_EXTEND pattern, combine pass failed to
match it to lea since it will generate non-canonical
zero-extend. Adjust predicate and cost_model to allow combine
for lea.
gcc/ChangeLog:
PR target/101716
* config/i386/i386.c (ix86_live_on_entry): Adjust comment.
(ix86_decompose_address): Remove retval check for ASHIFT,
allow non-canonical zero extend if AND mask covers ASHIFT
count.
(ix86_legitimate_address_p): Adjust condition for decompose.
(ix86_rtx_costs): Adjust cost for lea with non-canonical
zero-extend.
Co-Authored by: Uros Bizjak <ubizjak@gmail.com>
gcc/testsuite/ChangeLog:
PR target/101716
* gcc.target/i386/pr101716.c: New test.
For code like:
unsigned foo(unsigned val, unsigned start)
{
unsigned cnt = 0;
for (unsigned i = start; i > val; ++i)
cnt++;
return cnt;
}
The number of iterations should be about UINT_MAX - start.
There is function adjust_cond_for_loop_until_wrap which
handles similar work for const bases.
Like adjust_cond_for_loop_until_wrap, this patch enhance
function number_of_iterations_cond/number_of_iterations_lt
to analyze number of iterations for this kind of loop.
gcc/ChangeLog:
2021-08-25 Jiufu Guo <guojiufu@linux.ibm.com>
PR tree-optimization/101145
* tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
New function.
(number_of_iterations_lt): Invoke above function.
(adjust_cond_for_loop_until_wrap):
Merge to number_of_iterations_until_wrap.
(number_of_iterations_cond): Update invokes for
adjust_cond_for_loop_until_wrap and number_of_iterations_lt.
gcc/testsuite/ChangeLog:
2021-08-25 Jiufu Guo <guojiufu@linux.ibm.com>
PR tree-optimization/101145
* gcc.dg/vect/pr101145.c: New test.
* gcc.dg/vect/pr101145.inc: New test.
* gcc.dg/vect/pr101145_1.c: New test.
* gcc.dg/vect/pr101145_2.c: New test.
* gcc.dg/vect/pr101145_3.c: New test.
* gcc.dg/vect/pr101145inf.c: New test.
* gcc.dg/vect/pr101145inf.inc: New test.
* gcc.dg/vect/pr101145inf_1.c: New test.
The existing vec_unpacku_{hi,lo} supports emulated unsigned
unpacking for short and char but misses the support for int.
This patch adds the support of vec_unpacku_{hi,lo}_v4si.
Meanwhile, the current implementation uses vector permutation
way, which requires one extra customized constant vector as
the permutation control vector. It's better to use vector
merge high/low with zero constant vector, to save the space
in constant area as well as the cost to initialize pcv in
prologue. This patch updates it with vector merging and
simplify it with iterators.
gcc/ChangeLog:
* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
(vec_unpacku_hi_v8hi): Likewise.
(vec_unpacku_lo_v16qi): Likewise.
(vec_unpacku_lo_v8hi): Likewise.
(vec_unpacku_hi_<VP_small_lc>): New define_expand.
(vec_unpacku_lo_<VP_small_lc>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/unpack-vectorize-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-1.h: New test.
* gcc.target/powerpc/unpack-vectorize-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-2.h: New test.
* gcc.target/powerpc/unpack-vectorize-3.c: New test.
* gcc.target/powerpc/unpack-vectorize-3.h: New test.
* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
* gcc.target/powerpc/unpack-vectorize.h: New test.
AIX 7.3 system headers are C++ safe and GCC no longer needs to define
SYSTEM_IMPLICIT_EXTERN_C for AIX 7.3. This patch moves the definition
from aix.h to the individual OS-level configuration files and does not
define the macro for AIX 7.3.
The patch also corrects the definition of TARGET_AIX_VERSION to 73.
gcc/ChangeLog:
* config/rs6000/aix.h (SYSTEM_IMPLICIT_EXTERN_C): Delete.
* config/rs6000/aix71.h (SYSTEM_IMPLICIT_EXTERN_C): Define.
* config/rs6000/aix72.h (SYSTEM_IMPLICIT_EXTERN_C): Define.
* config/rs6000/aix73.h (TARGET_AIX_VERSION): Increase to 73.
My apologies again. My patch to simplify truncations of SUBREGs in
simplify-rtx.c contained an error where I'd accidentally compared
against a mode instead of the precision of that mode. Grr! It even
survived regression testing on two platforms. Fixed below, and
committed as obvious, after a full "make bootstrap" and "make -k check"
on x86_64-pc-linux-gnu with no new regressions.
2021-08-24 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/102031
* simplify-rtx.c (simplify_truncation): When comparing precisions
use "subreg_prec" variable, not "subreg_mode".
gcc/fortran/ChangeLog:
PR fortran/98411
* trans-decl.c (gfc_finish_var_decl): Adjust check to handle
implicit SAVE as well as variables in the main program. Improve
warning message text.
gcc/testsuite/ChangeLog:
PR fortran/98411
* gfortran.dg/pr98411.f90: Adjust testcase options to restrict to
F2008, and verify case of implicit SAVE.
2021-08-24 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Initialize
various pointer type nodes.
* config/rs6000/rs6000.h (rs6000_builtin_type_index): Add enum
values for various pointer types.
(ptr_V16QI_type_node): New macro.
(ptr_V1TI_type_node): New macro.
(ptr_V2DI_type_node): New macro.
(ptr_V2DF_type_node): New macro.
(ptr_V4SI_type_node): New macro.
(ptr_V4SF_type_node): New macro.
(ptr_V8HI_type_node): New macro.
(ptr_unsigned_V16QI_type_node): New macro.
(ptr_unsigned_V1TI_type_node): New macro.
(ptr_unsigned_V8HI_type_node): New macro.
(ptr_unsigned_V4SI_type_node): New macro.
(ptr_unsigned_V2DI_type_node): New macro.
(ptr_bool_V16QI_type_node): New macro.
(ptr_bool_V8HI_type_node): New macro.
(ptr_bool_V4SI_type_node): New macro.
(ptr_bool_V2DI_type_node): New macro.
(ptr_bool_V1TI_type_node): New macro.
(ptr_pixel_type_node): New macro.
(ptr_intQI_type_node): New macro.
(ptr_uintQI_type_node): New macro.
(ptr_intHI_type_node): New macro.
(ptr_uintHI_type_node): New macro.
(ptr_intSI_type_node): New macro.
(ptr_uintSI_type_node): New macro.
(ptr_intDI_type_node): New macro.
(ptr_uintDI_type_node): New macro.
(ptr_intTI_type_node): New macro.
(ptr_uintTI_type_node): New macro.
(ptr_long_integer_type_node): New macro.
(ptr_long_unsigned_type_node): New macro.
(ptr_float_type_node): New macro.
(ptr_double_type_node): New macro.
(ptr_long_double_type_node): New macro.
(ptr_dfloat64_type_node): New macro.
(ptr_dfloat128_type_node): New macro.
(ptr_ieee128_type_node): New macro.
(ptr_ibm128_type_node): New macro.
(ptr_vector_pair_type_node): New macro.
(ptr_vector_quad_type_node): New macro.
(ptr_long_long_integer_type_node): New macro.
(ptr_long_long_unsigned_type_node): New macro.
This patch adds a __PTX_SM__ predefined macro to the nvptx backend that
allows code to check the compute model being targeted by the compiler.
This is equivalent to the __CUDA_ARCH__ macro defined by CUDA's nvcc
compiler, but to avoid causing problems for source code that checks
for that compiler, this macro uses GCC's nomenclature; it's easy
enough for users to "#define __CUDA_ARCH__ __PTX_SM__".
What might have been a four line patch is actually a little more
complicated, as this patch takes the opportunity to upgrade the
nvptx backend to use the now preferred nvptx-c.c idiom.
2021-08-24 Roger Sayle <roger@nextmovesoftware.com>
Tom de Vries <tdevries@suse.de>
gcc/ChangeLog
* config.gcc (nvptx-*-*): Define {c,c++}_target_objs.
* config/nvptx/nvptx-protos.h (nvptx_cpu_cpp_builtins): Prototype.
* config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Implement with
a call to the new nvptx_cpu_cpp_builtins function in nvptx-c.c.
* config/nvptx/t-nvptx (nvptx-c.o): New rule.
* config/nvptx/nvptx-c.c: New source file.
(nvptx_cpu_cpp_builtins): Move implementation here.
Resolves:
PR middle-end/101600 - Spurious -Warray-bounds downcasting a polymorphic pointer
PR middle-end/101977 - bogus -Warray-bounds on a negative index into a parameter in conditional with null
gcc/ChangeLog:
PR middle-end/101600
PR middle-end/101977
* gimple-ssa-warn-access.cc (maybe_warn_for_bound): Tighten up
the phrasing of a warning.
(check_access): Use the remaining size after subtracting any offset
rather than the whole object size.
* pointer-query.cc (access_ref::get_ref): Clear BASE0 flag if it's
clear for any nonnull PHI argument.
(compute_objsize): Clear argument.
gcc/testsuite/ChangeLog:
PR middle-end/101600
PR middle-end/101977
* g++.dg/pr100574.C: Prune out valid warning.
* gcc.dg/pr20126.c: Same.
* gcc.dg/Wstringop-overread.c: Adjust text of expected warnings.
Add new instances.
* gcc.dg/warn-strnlen-no-nul.c: Same.
* g++.dg/warn/Warray-bounds-26.C: New test.
* gcc.dg/Warray-bounds-88.c: New test.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/std/type_traits (is_layout_compatible): Define.
(is_corresponding_member): Define.
* include/std/version (__cpp_lib_is_layout_compatible): Define.
* testsuite/20_util/is_layout_compatible/is_corresponding_member.cc:
New test.
* testsuite/20_util/is_layout_compatible/value.cc: New test.
* testsuite/20_util/is_layout_compatible/version.cc: New test.
* testsuite/20_util/is_pointer_interconvertible/with_class.cc:
New test.
* testsuite/23_containers/span/layout_compat.cc: Do not use real
std::is_layout_compatible trait if available.
When registering relations in the oracle, search for other relations which
imply new transitive relations.
gcc/
* value-relation.cc (rr_transitive_table): New.
(relation_transitive): New.
(value_relation::swap): Remove.
(value_relation::apply_transitive): New.
(relation_oracle::relation_oracle): Allocate a new tmp bitmap.
(relation_oracle::register_relation): Call register_transitives.
(relation_oracle::register_transitives): New.
* value-relation.h (relation_oracle): Add new temporary bitmap and
methods.
gcc/testsuite/
* gcc.dg/predict-1.c: Disable evrp.
* gcc.dg/tree-ssa/evrp-trans.c: New.
Clang warns about this, but GCC doesn't (see PR c++/102036).
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* src/c++11/cxx11-shim_facets.cc: Fix mismatched class-key in
explicit instantiation definitions.
Broadcast from integer to a pseudo vector register instead of a hard
vector register to allow LRA to remove redundant move instruction after
broadcast.
gcc/
PR target/102021
* config/i386/i386-expand.c (ix86_expand_vector_move): Broadcast
from integer to a pseudo vector register.
gcc/testsuite/
PR target/102021
* gcc.target/i386/pr100865-10b.c: Expect vzeroupper.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Expect vmovdqu and vzeroupper.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr102021.c: New test.
This avoids leaving scalar if-converted code around for the case
of BB vectorizing an if-converted loop body when using the very-cheap
cost model. In this case we scan not vectorized scalar stmts in
the basic-block vectorized for COND_EXPRs and force the vectorization
to be marked as not profitable.
The patch also makes sure to always consider all BB vectorization
subgraphs together for costing purposes when vectorizing an
if-converted loop body.
2021-08-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/100089
* tree-vectorizer.h (vect_slp_bb): Rename to ...
(vect_slp_if_converted_bb): ... this and get the original
loop as new argument.
* tree-vectorizer.c (try_vectorize_loop_1): Revert previous fix,
pass original loop to vect_slp_if_converted_bb.
* tree-vect-slp.c (vect_bb_vectorization_profitable_p):
If orig_loop was passed scan the not vectorized stmts
for COND_EXPRs and force not profitable if found.
(vect_slp_region): Pass down all SLP instances to costing
if orig_loop was specified.
(vect_slp_bbs): Pass through orig_loop.
(vect_slp_bb): Rename to ...
(vect_slp_if_converted_bb): ... this and get the original
loop as new argument.
(vect_slp_function): Adjust.
For Armv8.1-m we generate code that emits VLLDM directly and do not
rely on support code in the library, so emit the mitigation directly
as well, when required. In this case, we can use the compiler options
to determine when to apply the fix and when it is safe to omit it.
gcc:
PR target/102035
* config/arm/arm.md (attribute arch): Add fix_vlldm.
(arch_enabled): Use it.
* config/arm/vfp.md (lazy_store_multiple_insn): Add alternative to
use when erratum mitigation is needed.
Add the recommended erratum mitigation sequence to
__gnu_cmse_nonsecure_call for use on Armv8-m.main devices. Since this
is in the library code we cannot know in advance whether the core we
are running on will be affected by this, so always enable it.
libgcc:
PR target/102035
* config/arm/cmse_nonsecure_call.S (__gnu_cmse_nonsecure_call):
Add vlldm erratum work-around.
Add a new option, -mfix-cmse-cve-2021-35465 and document it. Enable it
automatically for cortex-m33, cortex-m35p and cortex-m55.
gcc:
PR target/102035
* config/arm/arm.opt (mfix-cmse-cve-2021-35465): New option.
* doc/invoke.texi (Arm Options): Document it.
* config/arm/arm-cpus.in (quirk_vlldm): New feature bit.
(ALL_QUIRKS): Add quirk_vlldm.
(cortex-m33): Add quirk_vlldm.
(cortex-m35p, cortex-m55): Likewise.
* config/arm/arm.c (arm_option_override): Enable fix_vlldm if
targetting an affected CPU and not explicitly controlled on
the command line.
The test for CMSE support being available in hardware currently
relies on the compiler not optimizing away a secure gateway operation.
But even that is suspect, because the SG instruction is just a NOP
on armv8-m implementations that do not support the security extension.
Replace the existing test with a new one that reads and checks
the appropriate hardware feature register (memory mapped). This has
to be run from secure mode, but that shouldn't matter, because if we
can't do that we can't really test the CMSE extensions anyway. We
retain the SG instruction to ensure the test can't pass accidentally
if run on pre-armv8-m devices.
gcc/testsuite:
* lib/target-supports.exp (check_effective_target_arm_cmse_hw):
Check the CMSE feature register, rather than relying on the
SG operation causing an execution fault.
Both lazy_store_multiple_insn and lazy_load_multiple_insn contain
invalid RTL (eg they contain a post_inc statement outside of a mem).
What's more, the instructions concerned do not modify their input
address register. We probably got away with this because they are
generated so late in the compilation that no subsequent pass needed to
understand them. Nevertheless, this could cause problems someday, so
fixed to use a simple legal unspec.
gcc:
* config/arm/vfp.md (lazy_store_multiple_insn): Rewrite as valid RTL.
(lazy_load_multiple_insn): Likewise.
Also optimize below 3 forms to vpternlog, op1, op2, op3 are
register_operand or unary_p as (not reg)
A: (any_logic (any_logic op1 op2) op3)
B: (any_logic (any_logic op1 op2) (any_logic op3 op4)) op3/op4 should
be equal to op1/op2
C: (any_logic (any_logic (any_logic:op1 op2) op3) op4) op3/op4 should
be equal to op1/op2
gcc/ChangeLog:
PR target/101989
* config/i386/i386.c (ix86_rtx_costs): Define cost for
UNSPEC_VTERNLOG.
* config/i386/i386.h (STRIP_UNARY): New macro.
* config/i386/predicates.md (reg_or_notreg_operand): New
predicate.
* config/i386/sse.md (*<avx512>_vternlog<mode>_all): New define_insn.
(*<avx512>_vternlog<mode>_1): New pre_reload
define_insn_and_split.
(*<avx512>_vternlog<mode>_2): Ditto.
(*<avx512>_vternlog<mode>_3): Ditto.
(any_logic1,any_logic2): New code iterator.
(logic_op): New code attribute.
(ternlogsuffix): Extend to VNxDF and VNxSF.
gcc/testsuite/ChangeLog:
PR target/101989
* gcc.target/i386/pr101989-1.c: New test.
* gcc.target/i386/pr101989-2.c: New test.
* gcc.target/i386/avx512bw-shiftqihi-constant-1.c: Adjust testcase.
This makes use of the estimated number of iterations of the inner loop
to limit --param vect-inner-loop-cost-factor scaling. It also reduces
the maximum value of vect-inner-loop-cost-factor to 10000 making it
less likely to cause overflow of costs.
2021-08-23 Richard Biener <rguenther@suse.de>
* doc/invoke.texi (vect-inner-loop-cost-factor): Adjust.
* params.opt (--param vect-inner-loop-cost-factor): Adjust
maximum value.
* tree-vect-loop.c (vect_analyze_loop_form): Initialize
inner_loop_cost_factor to the minimum of the estimated number
of iterations of the inner loop and vect-inner-loop-cost-factor.
There are a few problems with download_prerequisites are
described in PR 82704. The first is on busy-box version of
shasum and md5sum the extended option --check don't exist
so just use -c. The second issue is the code for which
shasum program to use is included twice and is different.
So move which program to use for the checksum after argument
parsing. The last issue is --md5 option has been broken for
sometime now as the program is named md5sum and not just md5.
Nobody updated switch table to be correct.
contrib/ChangeLog:
PR other/82704
* download_prerequisites: Fix issues with --md5 and
--sha512 options.
Back in June I briefly mentioned in one of my gcc-patches posts that
a change that should have always reduced code size, would mysteriously
occasionally result in slightly larger code (according to CSiBE):
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
Investigating further, the cause turns out to be that x86_64's
scalar-to-vector (stv) pass is relying on poor estimates of the size
costs/benefits. This patch tweaks the backend's compute_convert_gain
method to provide slightly more accurate values when compiling with
-Os. Compilation without -Os is (should be) unaffected. And for
completeness, I'll mention that the stv pass is a net win for code
size so it's much better to improve its heuristics than simply gate
the pass on !optimize_for_size.
The net effect of this change is to save 1399 bytes on the CSiBE
code size benchmark when compiling with -Os.
2021-08-24 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* config/i386/i386-features.c (compute_convert_gain): Provide
more accurate values for CONST_INT, when optimizing for size.
* config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
* config/i386/i386.h (COSTS_N_BYTES): to here.
My sincere apologies to everyone (again). As diagnosed by
Jakub Jelinek, my recent patch to fold the signedness of LSHIFT_EXPR
needs to be careful not to attempt transforming a left shift in an
integer type into an invalid left shift of a pointer type.
2021-08-24 Roger Sayle <roger@nextmovesoftware.com>
Jakub Jelinek <jakub@redhat.com>
gcc/ChangeLog
PR middle-end/102029
* match.pd (shift transformations): Add an additional check for
!POINTER_TYPE_P in the recently added left shift transformation.
gcc/testsuite/ChangeLog
PR middle-end/102029
* gcc.dg/fold-convlshift-3.c: New test case.
When investigating false positives on the Linux kernel from
-Wanalyzer-use-of-uninitialized-value, I noticed that the existing
implementation of switch statements in the analyzer is broken.
Specifically, the existing implementation assumes a 1:1 association
between CFG out-edges from the basic block and case labels in the
gimple switch statement. This happened to be the case in the
examples I had tested, but there is no such association in general.
In particular, in the motivating example:
arch/x86/kernel/cpu/mtrr/if.c: mtrr_ioctl
the switch statement has 3 blocks, each covering multiple ranges of
ioctl command IDs for which different local variables are initialized,
which the existing implementation gets badly wrong. [1]
This patch reimplements switch handling in the analyzer to eliminate
this false assumption - instead, for each out-edge we gather the set
of case labels for that out-edge, and use that to determine the
set of value ranges for the edge. Avoiding false positives for the
above example requires that we accurately track value ranges for
symbolic values, so the patch extends constraint_manager with a new
bounded_ranges_constraint, adding just enough information to capture the
ranges for switch statements whilst retaining combatility with the
existing constraint-handling (ultimately I'd prefer to simply throw
all of this into a SAT solver and let it track things).
Doing so fixes the false positives seen on the Linux kernel and an
existing xfail in the test suite.
The patch also fixes a long-standing bug in
constraint_manager::add_unknown_constraint when updating constraints
due to combining equivalence classes, spotted when debugging the
same logic for the new kind of constraints.
[1] a reduced version of this code is captured in this patch, in
gcc.dg/analyzer/torture/switch-3.c
gcc/analyzer/ChangeLog:
* analyzer.h (struct rejected_constraint): Convert to...
(class rejected_constraint): ...this.
(class bounded_ranges): New forward decl.
(class bounded_ranges_manager): New forward decl.
* constraint-manager.cc: Include "analyzer/analyzer-logging.h" and
"tree-pretty-print.h".
(can_plus_one_p): New.
(plus_one): New.
(can_minus_one_p): New.
(minus_one): New.
(bounded_range::bounded_range): New.
(dump_cst): New.
(bounded_range::dump_to_pp): New.
(bounded_range::dump): New.
(bounded_range::to_json): New.
(bounded_range::set_json_attr): New.
(bounded_range::contains_p): New.
(bounded_range::intersects_p): New.
(bounded_range::operator==): New.
(bounded_range::cmp): New.
(bounded_ranges::bounded_ranges): New.
(bounded_ranges::bounded_ranges): New.
(bounded_ranges::bounded_ranges): New.
(bounded_ranges::canonicalize): New.
(bounded_ranges::validate): New.
(bounded_ranges::operator==): New.
(bounded_ranges::dump_to_pp): New.
(bounded_ranges::dump): New.
(bounded_ranges::to_json): New.
(bounded_ranges::eval_condition): New.
(bounded_ranges::contain_p): New.
(bounded_ranges::cmp): New.
(bounded_ranges_manager::~bounded_ranges_manager): New.
(bounded_ranges_manager::get_or_create_empty): New.
(bounded_ranges_manager::get_or_create_point): New.
(bounded_ranges_manager::get_or_create_range): New.
(bounded_ranges_manager::get_or_create_union): New.
(bounded_ranges_manager::get_or_create_intersection): New.
(bounded_ranges_manager::get_or_create_inverse): New.
(bounded_ranges_manager::consolidate): New.
(bounded_ranges_manager::get_or_create_ranges_for_switch): New.
(bounded_ranges_manager::create_ranges_for_switch): New.
(bounded_ranges_manager::make_case_label_ranges): New.
(bounded_ranges_manager::log_stats): New.
(bounded_ranges_constraint::print): New.
(bounded_ranges_constraint::to_json): New.
(bounded_ranges_constraint::operator==): New.
(bounded_ranges_constraint::add_to_hash): New.
(constraint_manager::constraint_manager): Update for new field
m_bounded_ranges_constraints.
(constraint_manager::operator=): Likewise.
(constraint_manager::hash): Likewise.
(constraint_manager::operator==): Likewise.
(constraint_manager::print): Likewise.
(constraint_manager::dump_to_pp): Likewise.
(constraint_manager::to_json): Likewise.
(constraint_manager::add_unknown_constraint): Update the lhs_ec_id
if necessary in existing constraints when combining equivalence
classes. Add similar code for handling
m_bounded_ranges_constraints.
(constraint_manager::add_constraint_internal): Add comment.
(constraint_manager::add_bounded_ranges): New.
(constraint_manager::eval_condition): Use new field
m_bounded_ranges_constraints.
(constraint_manager::purge): Update bounded_ranges_constraint
instances.
(constraint_manager::canonicalize): Update for new field.
(merger_fact_visitor::on_ranges): New.
(constraint_manager::for_each_fact): Use new field
m_bounded_ranges_constraints.
(constraint_manager::validate): Fix off-by-one error needed due
to bug fixed above in add_unknown_constraint. Validate the EC IDs
in m_bounded_ranges_constraints.
(constraint_manager::get_range_manager): New.
(selftest::assert_dump_bounded_range_eq): New.
(ASSERT_DUMP_BOUNDED_RANGE_EQ): New.
(selftest::test_bounded_range): New.
(selftest::assert_dump_bounded_ranges_eq): New.
(ASSERT_DUMP_BOUNDED_RANGES_EQ): New.
(selftest::test_bounded_ranges): New.
(selftest::run_constraint_manager_tests): Call the new selftests.
* constraint-manager.h (struct bounded_range): New.
(struct bounded_ranges): New.
(template <> struct default_hash_traits<bounded_ranges::key_t>): New.
(class bounded_ranges_manager): New.
(fact_visitor::on_ranges): New pure virtual function.
(class bounded_ranges_constraint): New.
(constraint_manager::add_bounded_ranges): New decl.
(constraint_manager::get_range_manager): New decl.
(constraint_manager::m_bounded_ranges_constraints): New field.
* diagnostic-manager.cc (epath_finder::process_worklist_item):
Transfer ownership of rc to add_feasibility_problem.
* engine.cc (feasibility_problem::dump_to_pp): Use get_model.
* feasible-graph.cc (infeasible_node::dump_dot): Update for
conversion of m_rc to a pointer.
(feasible_graph::add_feasibility_problem): Pass RC by pointer and
take ownership.
* feasible-graph.h (infeasible_node::infeasible_node): Pass RC by
pointer and take ownership.
(infeasible_node::~infeasible_node): New.
(infeasible_node::m_rc): Convert to a pointer.
(feasible_graph::add_feasibility_problem): Pass RC by pointer and
take ownership.
* region-model-manager.cc: Include
"analyzer/constraint-manager.h".
(region_model_manager::region_model_manager): Initializer new
field m_range_mgr.
(region_model_manager::~region_model_manager): Delete it.
(region_model_manager::log_stats): Call log_stats on it.
* region-model.cc (region_model::add_constraint): Use new subclass
rejected_op_constraint.
(region_model::apply_constraints_for_gswitch): Reimplement using
bounded_ranges_manager.
(rejected_constraint::dump_to_pp): Convert to...
(rejected_op_constraint::dump_to_pp): ...this.
(rejected_ranges_constraint::dump_to_pp): New.
* region-model.h (struct purge_stats): Add field
m_num_bounded_ranges_constraints.
(region_model_manager::get_range_manager): New.
(region_model_manager::m_range_mgr): New.
(region_model::get_range_manager): New.
(struct rejected_constraint): Split into...
(class rejected_constraint):...this new abstract base class,
and...
(class rejected_op_constraint): ...this new concrete subclass.
(class rejected_ranges_constraint): New.
* supergraph.cc: Include "tree-cfg.h".
(supergraph::supergraph): Drop idx param from add_cfg_edge.
(supergraph::add_cfg_edge): Drop idx param.
(switch_cfg_superedge::switch_cfg_superedge): Move here from
header. Populate m_case_labels with all cases which go to DST.
(switch_cfg_superedge::dump_label_to_pp): Reimplement to use
m_case_labels.
(switch_cfg_superedge::get_case_label): Delete.
* supergraph.h (supergraphadd_cfg_edge): Drop "idx" param.
(switch_cfg_superedge::switch_cfg_superedge): Drop idx param and
move implementation to supergraph.cc.
(switch_cfg_superedge::get_case_label): Delete.
(switch_cfg_superedge::get_case_labels): New.
(switch_cfg_superedge::m_idx): Delete.
(switch_cfg_superedge::m_case_labels): New field.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/switch.c: Remove xfail. Add various tests.
* gcc.dg/analyzer/torture/switch-2.c: New test.
* gcc.dg/analyzer/torture/switch-3.c: New test.
* gcc.dg/analyzer/torture/switch-4.c: New test.
* gcc.dg/analyzer/torture/switch-5.c: New test.
2021-08-23 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-gen-builtins.c (parse_bif_entry): Don't call
asprintf, which is not available on AIX.