As PR100794 shows, in the current implementation PRE bypasses
some optimizations in order to avoid introducing a loop-carried
dependence that would stop the loop vectorizer from vectorizing
the loop.  At -O2, there is no downstream pass to catch this kind
of opportunity again if the loop vectorizer fails to vectorize
that loop.
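For illustration, a hypothetical loop of the kind predictive commoning
targets (this is not the new testcase itself) might look like:

  /* The load of a[i + 1] in one iteration can be reused as a[i] in the
     next; doing that reuse in PRE would introduce a loop-carried
     dependence and block the vectorizer, so PRE skips it, and predictive
     commoning can pick the opportunity up if the loop ends up not
     vectorized.  */
  void f (double *__restrict a, double *__restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      b[i] = a[i] + a[i + 1];
  }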
This patch follows Richi's suggestion in the PR: if the predcom
flag isn't explicitly set and loop vectorization is enabled, enable
predcom implicitly, without any unrolling.  The Power9 SPEC2017
evaluation showed it can speed up 521.wrf_r by 3.30% and 554.roms_r
by 1.08% with the very-cheap cost model, with no noticeable impact
at the cheap cost model; the build time and size impact is fine
(see the PR for the details).
By the way, I tested another proposal that guards PRE so that it
does not skip the optimization for the cheap and very-cheap vect
cost models.  The evaluation results showed it's fine with the
very-cheap cost model, but it can degrade some benchmarks such as
521.wrf_r by -9.17% and 549.fotonik3d_r by -2.07%.
Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/100794
* tree-predcom.c (tree_predictive_commoning_loop): Add parameter
allow_unroll_p and only allow unrolling when it's true.
(tree_predictive_commoning): Add parameter allow_unroll_p and
adjust for it.
(run_tree_predictive_commoning): Likewise.
(pass_predcom::gate): Check flag_tree_loop_vectorize and
global_options_set.x_flag_predictive_commoning.
(pass_predcom::execute): Adjust for allow_unroll_p.
gcc/testsuite/ChangeLog:
PR tree-optimization/100794
* gcc.dg/tree-ssa/pr100794.c: New test.
As Richi suggested in PR100794, this patch removes some
unnecessary update_ssa calls with the flag
TODO_update_ssa_only_virtuals, and does some refactoring.
Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, built well
on Power9 ppc64le with --with-build-config=bootstrap-O3,
and passed both P8 and P9 SPEC2017 full build with
{-O3, -Ofast} + {,-funroll-loops}.
gcc/ChangeLog:
* tree-predcom.c (execute_pred_commoning): Remove update_ssa call.
(tree_predictive_commoning_loop): Factor some cleanup work into
lambda function cleanup, remove scev_reset call, and adjust return
value.
(tree_predictive_commoning): Adjust for different changed values,
only set flag TODO_update_ssa_only_virtuals if changed.
(pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals
from todo_flags_finish.
In the earlier patch for PR91706 I fixed the BASELINK built by
baselink_for_fns, but since we already had one from lookup, we should keep
that one around instead of stripping it. The removed hunk in
get_class_binding was a weirdly large amount of code to decide whether to
pull out BASELINK_FUNCTIONS.
gcc/cp/ChangeLog:
PR c++/91706
* name-lookup.c (get_class_binding): Keep a BASELINK.
(set_inherited_value_binding_p): Adjust.
* lambda.c (is_lambda_ignored_entity): Adjust.
* pt.c (lookup_template_function): Copy a BASELINK before
modifying it.
This is a bit complex. Looking up c<T> in the definition of D::c finds
C::c, OK. Looking up c in the definition of E finds D::c, OK. Since the
alias is not dependent, we strip it from the template argument, leaving
using E = A<decltype(c<T>())>;
where 'c' still refers to C::c. But instantiating E looks up 'c' again and
finds D::c, which isn't a function, and sadness ensues.
I think the bug here is looking up 'c' in D at instantiation time; the
declaration we found before is not dependent. This seems to happen because
baselink_for_fns gets BASELINK_BINFO wrong; it is supposed to be the base
where lookup found the functions, C in this case.
gcc/cp/ChangeLog:
PR c++/91706
* semantics.c (baselink_for_fns): Fix BASELINK_BINFO.
gcc/testsuite/ChangeLog:
PR c++/91706
* g++.dg/template/lookup17.C: New test.
My coming fix for PR91706 caused some regressions in the modules testsuite.
This turned out to be because the change to properly use the base subobject
BINFO as BASELINK_BINFO hit problems with the code for merging binfos. The
tree reader needed a typo fix. The duplicate_hash function was crashing on
the BINFO for a variadic base in <variant>. I started fixing the hash
function, but then noticed that there's no ::equal function defined;
duplicate_hash just uses pointer equality, so we might as well also
use the normal pointer hash for the moment.
gcc/cp/ChangeLog:
* module.cc (duplicate_hash::hash): Comment out.
(trees_in::tree_value): Adjust loop counter.
Patrick already fixed the primary cause of this bug. But while I was
looking at this testcase I noticed that with the qualified name k::o we
ended up with a plain FUNCTION_DECL, whereas without the k:: we got a
BASELINK. There seems to be no good reason not to return the BASELINK
in this case as well.
PR c++/100102
gcc/cp/ChangeLog:
* init.c (build_offset_ref): Return the BASELINK for a static
member function.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/alias-decl-73.C: New test.
Use a sparse representation for the on-entry cache, and utilize it when
the number of basic blocks in the function exceeds param_evrp_sparse_threshold.
PR tree-optimization/100299
* gimple-range-cache.cc (class sbr_sparse_bitmap): New.
(sbr_sparse_bitmap::sbr_sparse_bitmap): New.
(sbr_sparse_bitmap::bitmap_set_quad): New.
(sbr_sparse_bitmap::bitmap_get_quad): New.
(sbr_sparse_bitmap::set_bb_range): New.
(sbr_sparse_bitmap::get_bb_range): New.
(sbr_sparse_bitmap::bb_range_p): New.
(block_range_cache::block_range_cache): Initialize bitmap obstack.
(block_range_cache::~block_range_cache): Destruct obstack.
(block_range_cache::set_bb_range): Decide when to utilize the
sparse on-entry cache.
* gimple-range-cache.h (block_range_cache): Add bitmap obstack.
* params.opt (-param=evrp-sparse-threshold): New.
Provide set/get routines to allow sparse bitmaps to be treated as an array
of multiple bit values. Only chunk sizes that are powers of 2 are supported.
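As a rough usage sketch (assuming the argument order bitmap, chunk index,
chunk size in bits, value, and GCC's bitmap.h environment):

  /* Treat MAP as an array of 4-bit chunks: store the value 9 in chunk 5,
     then read it back.  Chunks that were never written read back as 0.  */
  bitmap_obstack obstack;
  bitmap_obstack_initialize (&obstack);
  bitmap map = BITMAP_ALLOC (&obstack);
  bitmap_set_aligned_chunk (map, 5, 4, 9);
  BITMAP_WORD val = bitmap_get_aligned_chunk (map, 5, 4);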
* bitmap.c (bitmap_set_aligned_chunk): New.
(bitmap_get_aligned_chunk): New.
(test_aligned_chunk): New.
(bitmap_c_tests): Call test_aligned_chunk.
* bitmap.h (bitmap_set_aligned_chunk, bitmap_get_aligned_chunk): New.
Here, when resolving the destructor named by Inner<int>::~Inner<int>
(which is valid until C++20) we end up in cp_parser_lookup_name called
indirectly from cp_parser_template_id to look up the name Inner from
the scope Inner<int>. The lookup naturally finds the injected-class-name,
and because the flag is_template is true, we adjust this lookup result
to the TEMPLATE_DECL Inner. We then check access of this adjusted
lookup result. But this access check fails because the lookup scope is
Inner<int> and the context_for_name_lookup for the TEMPLATE_DECL is
Outer (whereas for the injected-class-name it's also Inner<int>).
The simplest fix seems to be to check access of the original lookup
result (the injected-class-name) instead of the adjusted result (the
TEMPLATE_DECL). So this patch moves the access check in
cp_parser_lookup_name to before the injected-class-name adjustment.
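A hypothetical reduction of the construct involved (the new access38.C
test may well differ):

  class Outer
  {
    template<class T> struct Inner { };  // private member template

  public:
    static void destroy (Inner<int> *p)
    {
      // Naming the destructor via the template-id is valid until C++20.
      // Looking up "Inner" in the scope Inner<int> finds the
      // injected-class-name; checking access on the adjusted TEMPLATE_DECL
      // (whose context is Outer) instead is what gave the bogus error.
      p->Inner<int>::~Inner<int> ();
    }
  };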
PR c++/100918
gcc/cp/ChangeLog:
* parser.c (cp_parser_lookup_name): Check access of the lookup
result before we potentially adjust an injected-class-name to
its TEMPLATE_DECL.
gcc/testsuite/ChangeLog:
* g++.dg/template/access38.C: New test.
Clang complains about the missing typename. I believe it's not required
in a more complete implementation of C++, but it's nicer to support
less complete implementations.
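A generic illustration of the kind of code involved (not the ranges code
itself):

  template<typename T>
  struct wrapper
  {
    // GCC in C++20 mode accepts this alias without 'typename' (P0634),
    // but stricter or less complete implementations still require it,
    // so it is better to keep it.
    using type = typename T::value_type;
  };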
PR libstdc++/100900
libstdc++-v3/ChangeLog:
* include/std/ranges (elements_view::__iter_cat::_S_iter_cat):
Add missing typename.
Since long is 32 bits for x32, update g++.target/i386/pr100885.C to cast
__m64 to long long for x32.
PR target/100885
* g++.target/i386/pr100885.C (_mm_set_epi64): Cast __m64 to long
long.
The operator<=>(const optional<T>&, const U&) overload is supposed to be
constrained with three_way_comparable_with<U, T> so that it can only be
used when T and U are weakly-equality-comparable and also three-way
comparable.
Adding that constraint completely breaks std::optional comparisons,
because it causes constraint recursion.  To avoid that, an additional
check that U is not a specialization of std::optional is needed.  That
appears to be a defect in the standard and should be reported to LWG.
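A rough sketch of the constraint shape described, with an illustrative
is_optional_v helper and a free function standing in for the real
operator (this is not the libstdc++ code):

  #include <compare>
  #include <optional>

  template<typename U>
    inline constexpr bool is_optional_v = false;  // illustrative helper
  template<typename X>
    inline constexpr bool is_optional_v<std::optional<X>> = true;

  // Constrain on three_way_comparable_with, but bail out when U is itself
  // a std::optional; otherwise checking the constraint recurses.
  template<typename T, typename U>
    requires (!is_optional_v<U>) && std::three_way_comparable_with<U, T>
    constexpr std::compare_three_way_result_t<T, U>
    compare_opt (const std::optional<T>& lhs, const U& rhs)
    { return lhs.has_value () ? *lhs <=> rhs : std::strong_ordering::less; }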
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/98842
* include/std/optional (operator<=>(const optional<T>&, const U&)):
Add missing constraint and add workaround for template
recursion.
* testsuite/20_util/optional/relops/three_way.cc: Check that
type without equality comparison cannot be compared when wrapped
in std::optional.
gcc/
* config/h8300/movepush.md: Change most _clobber_flags
patterns to instead use <cczn> subst.
(movsi_cczn): New pattern with usable CC cases split out.
(movsi_h8sx_cczn): Likewise.
gcc/c-family/ChangeLog:
* c-target.def: Split long lines and replace them
with '\n\'.
gcc/ChangeLog:
* common/common-target.def: Split long lines and replace them
with '\n\'.
* target.def: Likewise.
* doc/tm.texi: Re-generated.
This silences the stage compare.
gcc/objc:
2021-06-07 Bernd Edlinger <bernd.edlinger@softing.com>
* Make-lang.in (cc1obj-checksum.c): For stage-final re-use
the checksum from the previous stage.
gcc/objcp:
2021-06-07 Bernd Edlinger <bernd.edlinger@softing.com>
* Make-lang.in (cc1objplus-checksum.c): For stage-final re-use
the checksum from the previous stage.
The callers of fold_read_from_vector expect that the index they pass is
an index of an element in the vector, and the function handles it that way
most of the time.  But we allow CONSTRUCTORs with VECTOR_TYPE to have
VECTOR_TYPE elements, and in that case every CONSTRUCTOR element represents
not just one index (with the exception of V1 vectors), but multiple.
So returning a zero vector if i >= CONSTRUCTOR_NELTS, or returning some
CONSTRUCTOR_ELT's value, might not be what the callers expect.
Fixed by punting if the first element has vector type.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
In theory we could instead recurse (and assert that for CONSTRUCTORs of
vector elements we always have all elements specified, as tree-cfg.c
verifies?) after adjusting the index appropriately.
2021-06-07 Jakub Jelinek <jakub@redhat.com>
PR target/100887
* fold-const.c (fold_read_from_vector): Return NULL if trying to
read from a CONSTRUCTOR with vector type elements.
* gcc.dg/pr100887.c: New test.
The following testcase ICEs because gimple_call_arg_ptr (..., 0)
asserts that there is at least one argument, while we were using
it even if we didn't copy anything, just to get a pointer from/to which
the zero arguments should be copied.
Fixed by guarding the memcpy calls.  Also, the code was calling
gimple_call_num_args too many times - 5 times instead of 2 - so the patch
adds two temporaries for those.
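A minimal sketch of the guarding pattern (names are illustrative; the
actual copy_bb code differs):

  /* Cache the argument count once instead of re-querying it, and only
     take the address of argument 0 when there is something to copy.  */
  unsigned int nargs = gimple_call_num_args (call_stmt);
  if (nargs)
    memcpy (gimple_call_arg_ptr (new_call, 0),
            gimple_call_arg_ptr (call_stmt, 0),
            nargs * sizeof (tree));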
2021-06-07 Jakub Jelinek <jakub@redhat.com>
PR middle-end/100898
* tree-inline.c (copy_bb): Only use gimple_call_arg_ptr if memcpy
should copy any arguments. Don't call gimple_call_num_args
on id->call_stmt or call_stmt more than once.
* g++.dg/ext/va-arg-pack-3.C: New test.
The EVEX-encoded vpmovzxbx needs both AVX512BW and AVX512VL, which means
constraint "Yw" should be used instead of constraint "v".
gcc/ChangeLog:
PR target/100885
* config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3): Refine
constraints.
(<insn>v4siv4di2): Delete constraints for define_expand.
gcc/testsuite/ChangeLog:
PR target/100885
* g++.target/i386/pr100885.C: New test.
When __builtin_ia32_vzeroupper is called explicitly, the corresponding
vzeroupper pattern does not carry any CLOBBERs or SETs before LRA,
which leads to incorrect optimization in pass_reload.  In order to
solve this problem, this patch refines those instructions as call_insns
in which the call has a special vzeroupper ABI.
gcc/ChangeLog:
PR target/82735
* config/i386/i386-expand.c (ix86_expand_builtin): Remove
assignment of cfun->machine->has_explicit_vzeroupper.
* config/i386/i386-features.c
(ix86_add_reg_usage_to_vzerouppers): Delete.
(ix86_add_reg_usage_to_vzeroupper): Ditto.
(rest_of_handle_insert_vzeroupper): Remove
ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end
of the function.
(gate): Remove cfun->machine->has_explicit_vzeroupper.
* config/i386/i386-protos.h (ix86_expand_avx_vzeroupper):
Declared.
* config/i386/i386.c (ix86_insn_callee_abi): New function.
(ix86_initialize_callee_abi): Ditto.
(ix86_expand_avx_vzeroupper): Ditto.
(ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper
ABI.
(TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi.
(ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper
directly.
* config/i386/i386.h (struct GTY(()) machine_function): Delete
has_explicit_vzeroupper.
* config/i386/i386.md (enum unspec): New member
UNSPEC_CALLEE_ABI.
(ABI_DEFAULT, ABI_VZEROUPPER, ABI_UNKNOWN): New
define_constants for insn callee abi index.
* config/i386/predicates.md (vzeroupper_pattern): Adjust.
* config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted.
(avx_vzeroupper): Call ix86_expand_avx_vzeroupper.
(*avx_vzeroupper): Rename to ..
(avx_vzeroupper_callee_abi): .. this, and adjust pattern as
call_insn which has a special vzeroupper ABI.
(*avx_vzeroupper_1): Deleted.
gcc/testsuite/ChangeLog:
PR target/82735
* gcc.target/i386/pr82735-1.c: New test.
* gcc.target/i386/pr82735-2.c: New test.
* gcc.target/i386/pr82735-3.c: New test.
* gcc.target/i386/pr82735-4.c: New test.
* gcc.target/i386/pr82735-5.c: New test.
Use "used" flag for CALL_INSN to indicate it's a fake call. If it's a
fake call, it won't have its own function stack.
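A sketch of how such a macro can be layered on the existing "used" RTL
flag (the actual rtl.h definition may differ slightly):

  /* A CALL_INSN whose "used" bit is set is only a fake call (e.g. the
     vzeroupper call_insns) and has no stack frame of its own.  */
  #define FAKE_CALL_P(RTX) \
    (RTL_FLAG_CHECK1 ("FAKE_CALL_P", (RTX), CALL_INSN)->used)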
gcc/ChangeLog
PR target/82735
* df-scan.c (df_get_call_refs): When call_insn is a fake call,
it won't use stack pointer reg.
* final.c (leaf_function_p): When call_insn is a fake call, it
won't affect caller as a leaf function.
* reg-stack.c (callee_clobbers_any_stack_reg): New.
(subst_stack_regs): When call_insn doesn't clobber any stack
reg, don't clear the arguments.
* rtl.c (shallow_copy_rtx): Don't clear the used flag when orig is
an insn.
* shrink-wrap.c (requires_stack_frame_p): No need for stack
frame for a fake call.
* rtl.h (FAKE_CALL_P): New macro.
The current implementation as an array of chars is indeed a bit awkward,
so this reimplements it as a function taking and returning an int.
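The shape of the change, roughly (illustrative; the exact sparc.h and
sparc.c declarations may differ):

  /* Before: a global lookup table of chars.  */
  extern char leaf_reg_remap[];
  #define LEAF_REG_REMAP(REGNO) (leaf_reg_remap[REGNO])

  /* After: a function taking and returning an int.  */
  extern int sparc_leaf_reg_remap (int);
  #define LEAF_REG_REMAP(REGNO) sparc_leaf_reg_remap (REGNO)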
gcc/
* config/sparc/sparc-protos.h (order_regs_for_local_alloc): Rename
to...
(sparc_order_regs_for_local_alloc): ...this.
(sparc_leaf_reg_remap): Declare.
* config/sparc/sparc.h (ADJUST_REG_ALLOC_ORDER): Adjust.
(LEAF_REG_REMAP): Reimplement as call to sparc_leaf_reg_remap.
* config/sparc/sparc.c (leaf_reg_remap): Delete.
(order_regs_for_local_alloc): Rename to...
(sparc_order_regs_for_local_alloc): ...this.
(sparc_leaf_reg_remap): New function.
(sparc_conditional_register_usage): Do not modify leaf_reg_remap.
The code to emit the BSS CSECT needs to support user assembler names.
* config/rs6000/rs6000.c (rs6000_xcoff_asm_output_aligned_decl_common):
Use assemble_name to output BSS section name.
> In convert_nonlocal_omp_clauses, the following clauses are
> missing: OMP_CLAUSE_AFFINITY, OMP_CLAUSE_DEVICE_TYPE,
> OMP_CLAUSE_EXCLUSIVE, OMP_CLAUSE_INCLUSIVE.
OMP_CLAUSE_{EXCLUSIVE,INCLUSIVE} aren't needed, because we don't
walk the clauses at all for GIMPLE_OMP_SCAN. It would be a bug
if we used the exclusive/inclusive operands after gimplification,
but we apparently don't do that, all we check is whether the
OMP_CLAUSE_KIND of the first clause (all should be the same) is
OMP_CLAUSE_EXCLUSIVE or OMP_CLAUSE_INCLUSIVE, nothing else.
That said, I think we should have a testcase.
2021-06-06 Jakub Jelinek <jakub@redhat.com>
* gcc.dg/gomp/scan-1.c: New test.
When looking at in_reduction support for target, I've noticed that
c_omp_adjust_map_clauses is not called for the combined target case.
The following patch fixes it.
Unfortunately, there are other issues.
One issue (also mentioned in the PR) is that currently the pointer
attachment stuff seems to be clause-ordering dependent (the standard says
that clause ordering on the same construct does not matter); the baz and
qux cases in the PR are rejected, while when swapped they are accepted.
Note, the order of clauses in GCC really is treated as insignificant
initially, and only later on can the compiler adjust the ordering (e.g. when
we sort map clauses based on what they refer to etc.); in particular,
clauses from parsing are in the reverse of the order in user code, while
c_omp_split_clauses performed for combined/composite constructs typically
reverses that ordering, i.e. makes it follow the user code ordering.
Another issue is that I'm slightly afraid c_omp_adjust_map_clauses might
misbehave in templates, though I haven't tried to verify it with testcases.
When processing_template_decl, the non-dependent clauses will usually be
handled the same as when not in a template, but dependent clauses aren't
processed, or only limited processing is done there, and the rest is
deferred till later.  From quickly skimming c_omp_adjust_map_clauses, it
seems it might not be very happy about unprocessed map clauses that might
still have the TREE_LIST representation of array sections, or might
not have finalized decls or base decls etc.
So, for this I wonder if cp_parser_omp_target (and other cp/parser.c
callers of c_omp_adjust_map_clauses) shouldn't call it only
if (!processing_template_decl) - perhaps you could add a
cp_omp_adjust_map_clauses wrapper that would be
  if (!processing_template_decl)
    c_omp_adjust_map_clauses (...);
- and call c_omp_adjust_map_clauses from within pt.c after the clauses
are tsubsted and finish_omp_clauses is called again.
2021-06-06 Jakub Jelinek <jakub@redhat.com>
PR c/100902
* c-parser.c (c_parser_omp_target): Call c_omp_adjust_map_clauses
even when target is combined with other constructs.
* parser.c (cp_parser_omp_target): Call c_omp_adjust_map_clauses
even when target is combined with other constructs.
* c-c++-common/gomp/pr100902-1.c: New test.
gcc/ChangeLog:
* genhooks.c (emit_findices): Remove unused function.
(emit_documentation): Do not call emit_findices
and do not search for @Fcode directives.
In C, unlike in Ada, the storage order of arrays is that of their component
type, so you need to look at it when deciding to warn. And the PR complains
about a bogus warning on the assignment of a pointer returned by alloca or
malloc, so this also fixes that.
gcc/c
PR c/100920
* c-decl.c (finish_struct): Fix thinko in previous change.
* c-typeck.c (convert_for_assignment): Do not warn on pointer
assignment and initialization for storage order purposes if the
RHS is a call to a DECL_IS_MALLOC function.
gcc/testsuite/
* gcc.dg/sso-14.c: New test.
gcc/fortran/ChangeLog:
PR fortran/100120
PR fortran/100816
PR fortran/100818
PR fortran/100819
PR fortran/100821
* trans-array.c (gfc_get_array_span): Rework the way the character
array "span" was calculated.
(gfc_conv_expr_descriptor): Improve handling of character sections
and unlimited polymorphic objects.
* trans-expr.c (gfc_get_character_len): New function to calculate
character string length.
(gfc_get_character_len_in_bytes): New function to calculate
character string length in bytes.
(gfc_conv_scalar_to_descriptor): Add call to set the "span".
(gfc_trans_pointer_assignment): Set "_len" and anticipate the
initialization of the deferred character length hidden argument.
* trans-intrinsic.c (gfc_conv_associated): Set "force_no_tmp" to
avoid the creation of a temporary.
* trans-types.c (gfc_get_dtype_rank_type): Rework type detection
so that unlimited polymorphic objects get proper type information,
which is also important for bind(c).
(gfc_get_dtype): Add argument to pass the rank if necessary.
(gfc_get_array_type_bounds): Cosmetic change to have character
arrays called character instead of unknown.
* trans-types.h (gfc_get_dtype): Modify prototype.
* trans.c (get_array_span): Rework the way the character array "span"
was calculated.
* trans.h (gfc_get_character_len): New prototype.
(gfc_get_character_len_in_bytes): New prototype.
Add "unlimited_polymorphic" flag to "gfc_se" type to signal when
expression carries an unlimited polymorphic object.
libgfortran/ChangeLog:
PR fortran/100120
* intrinsics/associated.c (associated): Have associated verify
whether the "span" matches instead of the "elem_len".
* libgfortran.h (GFC_DESCRIPTOR_SPAN): Add macro to retrieve the
descriptor "span".
gcc/testsuite/ChangeLog:
PR fortran/100120
* gfortran.dg/PR100120.f90: New test.
PR fortran/100816
PR fortran/100818
PR fortran/100819
PR fortran/100821
* gfortran.dg/character_workout_1.f90: New test.
* gfortran.dg/character_workout_4.f90: New test.
I already changed the constraints for ranges::ssize to use ranges::size;
this implements the rest of LWG 3403, so that the returned type is the
signed type corresponding to the result of ranges::size.
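For an ordinary range on a typical LP64/LLP64 target that simply yields
ptrdiff_t; the interesting case in the testsuite is an iota_view whose
difference_type is an integer-class type.  A small illustration:

  #include <concepts>
  #include <cstddef>
  #include <ranges>
  #include <vector>

  void demo ()
  {
    std::vector<int> v{1, 2, 3};
    auto n = std::ranges::ssize (v);  // signed, not size_t
    static_assert (std::same_as<decltype (n), std::ptrdiff_t>);
  }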
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
* include/bits/ranges_base.h (_SSize): Return the result of
ranges::size converted to the wider of make-signed-like-t<S> and
ptrdiff_t, rather than the range's difference type.
* testsuite/std/ranges/access/ssize.cc: Adjust expected result
for an iota_view that uses an integer class type for its
difference_type.
We need to decay the result of t.data() before checking if it's a
pointer.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/100824
* include/bits/ranges_base.h (__member_data): Use __decay_copy.
* testsuite/std/ranges/access/data.cc: Add testcase from PR.