Currently we diagnose vector lowering of V1mode operations that
are not natively supported into V_C_E, scalar op plus CTOR with
-Wvector-operation-performance but that's hardly useful behavior
even though the way we lower things can be improved.
The following disables the diagnostics for the cases the vect.exp
testsuite runs into, on x86 that are vect-cond-11.c and
vect-singleton_1.c.
2022-01-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/104114
* tree-vect-generic.cc (expand_vector_piecewise): Do not diagnose
single element vector decomposition.
The patch for PR41723 properly changed one place to look into the current
instantiation; now we need to fix this place as well.
PR c++/102300
gcc/cp/ChangeLog:
* parser.cc (cp_parser_template_name): Use dependent_scope_p.
gcc/testsuite/ChangeLog:
* g++.dg/parse/no-typename1.C: Remove expected error.
* g++.dg/template/nested7.C: New test.
The default is -gdwarf-5 now, so this is hurting rather than improving
things.
libstdc++-v3/ChangeLog:
* configure.ac (GLIBCXX_ENABLE_DEBUG_FLAGS): Remove -gdwarf-4
from default flags.
* configure: Regenerate.
If one of the to-be-converted SETs requires the original comparison
(i.e. in order to generate a min/max insn) but no other insn after it
does, we can omit creating temporaries, thus facilitating costing.
gcc/ChangeLog:
* ifcvt.cc (noce_convert_multiple_sets_1): New function.
(noce_convert_multiple_sets): Call function a second time if we can
improve the first try.
Add new s390-specific tests that check if we convert two SETs into two
loads on condition. Remove the s390-specific target-check in
gcc.dg/ifcvt-4.c.
gcc/testsuite/ChangeLog:
* gcc.dg/ifcvt-4.c: Remove s390-specific check.
* gcc.target/s390/ifcvt-two-insns-bool.c: New test.
* gcc.target/s390/ifcvt-two-insns-int.c: New test.
* gcc.target/s390/ifcvt-two-insns-long.c: New test.
Following up on the previous patch, this patch makes
noce_convert_multiple emit two cmov sequences: The same one as before
and a second one that tries to re-use the existing CC. Then their costs
are compared and the cheaper one is selected.
gcc/ChangeLog:
* ifcvt.cc (cond_exec_get_condition): New parameter to allow getting the
reversed comparison.
(try_emit_cmove_seq): New function to facilitate creating a cmov
sequence.
(noce_convert_multiple_sets): Create two sequences and use the less
expensive one.
Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump. Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries for each conditional move,
causes sky-high costs for conditional moves.
This patch allows to re-use a CC so the costing situation is improved a
bit.
gcc/ChangeLog:
* rtl.h (struct rtx_comparison): New struct that holds an rtx
comparison.
* config/rs6000/rs6000.cc (rs6000_emit_minmax): Use struct instead of
single parameters.
(rs6000_emit_swsqrt): Likewise.
* expmed.cc (expand_sdiv_pow2): Likewise.
(emit_store_flag): Likewise.
* expr.cc (expand_cond_expr_using_cmove): Likewise.
(expand_expr_real_2): Likewise.
* ifcvt.cc (noce_emit_cmove): Add compare and reversed compare
parameters.
* optabs.cc (emit_conditional_move_1): New function.
(expand_doubleword_shift_condmove): Use struct.
(emit_conditional_move): Use struct and allow to call directly
without going through preparation steps.
* optabs.h (emit_conditional_move): Use struct.
When noce_convert_multiple is called the original costs are not yet
initialized. Therefore, up to now, costs were only ever unfairly
compared against COSTS_N_INSNS (2). This would lead to
default_noce_conversion_profitable_p () rejecting all but the most
contrived of sequences.
This patch temporarily initializes the original costs by counting
adding costs for all sets inside the then_bb.
gcc/ChangeLog:
* ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Estimate insns costs.
(noce_process_if_block): Use potential costs.
This lifts the restriction of not allowing constants for
noce_convert_multiple. The code later checks if a valid sequence
is produced anyway.
gcc/ChangeLog:
* ifcvt.cc (noce_convert_multiple_sets): Allow constants.
(bb_ok_for_noce_convert_multiple_sets): Likewise.
When if-converting multiple SETs and we encounter a swap-style idiom
if (a > b)
{
tmp = c; // [1]
c = d;
d = tmp;
}
ifcvt should not generate a conditional move for the instruction at
[1].
In order to achieve that, this patch goes through all relevant SETs
and marks the relevant instructions. This helps to evaluate costs.
On top, only generate temporaries if the current cmov is going to
overwrite one of the comparands of the initial compare.
gcc/ChangeLog:
* ifcvt.cc (need_cmov_or_rewire): New function.
(noce_convert_multiple_sets): Call it.
This makes it possible to combine --enable-libstdcxx-debug with
--enable-libstdcxx-backtrace, by adding a rule to src/Makefile to copy
the backtrace-supported.h header into the src/debug/libbacktrace
directory.
Add libbacktrace path to testsuite flags so the tests can link without
having the library installed.
Also fix some warnings when running automake for the libbacktrace
makefile.
Use a per-library CPPFLAGS variable to fix:
src/libbacktrace/Makefile.am:38: warning: AM_CPPFLAGS multiply defined in condition TRUE ...
fragment.am:43: ... 'AM_CPPFLAGS' previously defined here
src/libbacktrace/Makefile.am:32: 'fragment.am' included from here
Create symlinks to the libbacktrace sources to fix:
src/libbacktrace/Makefile.am:55: warning: source file '../../../libbacktrace/atomic.c' is in a subdirectory,
src/libbacktrace/Makefile.am:55: but option 'subdir-objects' is disabled
libstdc++-v3/ChangeLog:
* scripts/testsuite_flags.in: Add src/libbacktrace/.libs to
linker search paths.
* src/Makefile.am: Fix src/debug/libbacktrace build.
* src/Makefile.in: Regenerate.
* src/libbacktrace/Makefile.am: Use per-library CPPFLAGS
variable. Use symlinks for the source files.
* src/libbacktrace/Makefile.in: Regenerate.
power10 has modv4si3 expander and so vectorizes the following testcase
where Fortran modulo is FLOOR_MOD_EXPR.
optabs_for_tree_code indicates that the optab for all the *_MOD_EXPR
variants is umod_optab or smod_optab, but that isn't true, that optab
actually expands just TRUNC_MOD_EXPR. For the other tree codes expmed.cc
has code how to adjust the TRUNC_MOD_EXPR into those by emitting some
extra comparisons and conditional updates. Similarly for *_DIV_EXPR,
except in that case it actually needs both division and modulo.
While it would be possible to handle it in expmed.cc for vectors as well,
we'd need to be sure all the vector operations we need for that are
available, and furthermore we wouldn't account for that in the costing.
So, IMHO it is better to stop pretending those non-truncating (and
non-exact) div/mod operations have an optab. For GCC 13, we should
IMHO pattern match these in tree-vect-patterns.cc and transform them
to truncating div/mod with follow-up adjustments and let the vectorizer
vectorize that. As written in the PR, for signed operands:
r = x %[fl] y;
is
r = x % y; if (r && (x ^ y) < 0) r += y;
and
d = x /[fl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) < 0) --d;
and
r = x %[cl] y;
is
r = x % y; if (r && (x ^ y) >= 0) r -= y;
and
d = /[cl] y;
is
r = x % y; d = x / y; if (r && (x ^ y) >= 0) ++d;
(too lazy to figure out rounding div/mod now). I'll create a PR
for that.
The patch also extends a match.pd optimization that floor_mod on
unsigned operands is actually trunc_mod.
2022-01-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/102860
* match.pd (x %[fl] y -> x % y): New simplification for
unsigned integral types.
* optabs-tree.cc (optab_for_tree_code): Return unknown_optab
for {CEIL,FLOOR,ROUND}_{DIV,MOD}_EXPR with VECTOR_TYPE.
* gfortran.dg/pr102860.f90: New test.
The testcase from the PR got fixed with r12-3119-g675a3e40567e1d
and looks quite similar to the evrp-trans.c test, except evrp-trans.c
is tested on signed integer types.
I think it would be useful to test it for unsigned comparisons too.
2022-01-19 Jakub Jelinek <jakub@redhat.com>
PR c/104115
* gcc.dg/tree-ssa/evrp-trans2.c: New test.
This adds a missing check for the availability of intermediate vector
types required to re-use the accumulator of a vectorized reduction
in the vectorized epilogue. For SVE and VNx2DF vs V2DF with
-msve-vector-bits=512 for example V4DF is not available.
In addition to that we have to verify the reduction operation is
supported, otherwise we for example on i?86 get vector code that's
later decomposed again by vector lowering when trying to use
a V2HI epilogue for a V8HI reduction with a target without
TARGET_MMX_WITH_SSE.
It might be we want -Wvector-operation-performance for all vect.exp
tests but that seems to have existing regressions.
2022-01-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/104112
* tree-vect-loop.cc (vect_find_reusable_accumulator): Check
for required intermediate vector types.
* gcc.dg/vect/pr104112-1.c: New testcase.
* gcc.dg/vect/pr104112-2.c: New testcase.
Currently omp_get_device_num does not work on gcn targets with more than one
offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus "__gomp_device_num" is not visible in the offload image.
This patch removes "static" such that "__gomp_device_num" is now part of the
offload image and can now be found in GOMP_OFFLOAD_load_image in the plugin.
This is not an issue for nvptx. There, "__gomp_device_num" is in the offload
image even with "static".
libgomp/ChangeLog:
* config/gcn/icv-device.c: Make GOMP_DEVICE_NUM_VAR public (remove
"static") to make the device num available in the offload image.
Use SFINAE magic to support: "It is unspecified whether math_errhandling
is a macro or an identifier with external linkage." [C Standard]
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h (__floating_point_flags): Do
not rely on math_errhandling to expand to a constant expression.
Since the x86_64-linux-gnux32 compiler is actually an x32 compiler, set
target_cpu to x32 for x86_64-linux-gnux32.
PR ada/103538
* gcc-interface/Makefile.in (target_cpu): Set to x32 for
x86_64-linux-gnux32.
The tests are C++ code, so use a proper file extension.
gcc/testsuite/ChangeLog:
* g++.dg/ext/boolcomplex-1.c: Moved to...
* g++.dg/ext/boolcomplex-1.C: ...here.
* g++.dg/opt/pr47639.c: Moved to...
* g++.dg/opt/pr47639.C: ...here.
* g++.dg/pr83979.c: Moved to...
* g++.dg/pr83979.C: ...here.
* g++.dg/tm/asm-1.c: Moved to...
* g++.dg/tm/asm-1.C: ...here.
* g++.dg/vect/pr71483.c: Moved to...
* g++.dg/vect/pr71483.cc: ...here.
> On 18/01/2022 22:42, Segher Boessenkool wrote:
> > > + default:
> > > + break;
> > Please don't do that. You can do
> >
> > default:
> > break;
> > break;
> > /* And just to make sure: */
> > break;
> > break;
> >
> > and it will do exactly the same as not having a default at all. Not
> > having such useless code is by far the most readable, so please don't
> > include a default case at all.
>
> I removed the default case. I hope this is what you wanted.
Unfortunately the removal of default: break; breaks bootstrap:
../../gcc/config/rs6000/rs6000.cc: In function ‘const char* rs6000_machine_from_flags()’:
../../gcc/config/rs6000/rs6000.cc:5940:10: error: enumeration value ‘PROCESSOR_PPC601’ not handled in switch [-Werror=switch]
5940 | switch (rs6000_cpu)
| ^
../../gcc/config/rs6000/rs6000.cc:5940:10: error: enumeration value ‘PROCESSOR_PPC603’ not handled in switch [-Werror=switch]
...
default: break; is needed to tell the -Wswitch warning that it is intentional
that not all enumerators are handled in the switch.
2022-01-19 Jakub Jelinek <jakub@redhat.com>
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Add default:.
As reported in the PR or as I've seen since the weekend, asan_test.C fails
because of many warnings like:
gcc/testsuite/g++.dg/asan/asan_test.cc:1157:10: error: using a dangling pointer to an unnamed temporary [-Werror=dangling-pointer=]
gcc/testsuite/g++.dg/asan/asan_test.cc:1157:10: error: using a dangling pointer to an unnamed temporary [-Werror=dangling-pointer=]
gcc/testsuite/g++.dg/asan/asan_test.cc:1162:27: error: using a dangling pointer to an unnamed temporary [-Werror=dangling-pointer=]
...
(lots of them).
There are no dangling pointers though, the warning pass sees:
some_automatic_var ={v} {CLOBBER};
.ASAN_MARK (POISON, &some_automatic_var, 8);
and warns on that (both on user vars and on e.g. TARGET_EXPR temporaries).
There is nothing wrong on that, .ASAN_MARK is compiler instrumentation,
which doesn't even touch the variable in any way nor make it escaped.
What it instead does is change bytes in the shadow memory corresponding
to the variable to reflect that the variable is out of scope and make
sure that access to it would be diagnosed at runtime.
So, for all purposes of the -Wdangling-pointer and -Wuse-after-free
warnings, we should ignore this internal call.
2022-01-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104103
* gimple-ssa-warn-access.cc (pass_waccess::check_call): Don't check
.ASAN_MARK calls.
This is a non-C++ related part from the PR89074 address_compare changes.
For "foo" == "foo" we already optimize this from the (cmp @0 @0)
simplification, because we use operand_equal_p in that case
and operand_equal_p also compares the STRING_CSTs bytes rather than
just addresses.
2022-01-19 Jakub Jelinek <jakub@redhat.com>
PR c++/89074
* fold-const.cc (address_compare): Consider different STRING_CSTs
with the same lengths that memcmp the same as equal, not different.
* gcc.dg/tree-ssa/pr89074.c: New test.
grep '{[^|}]*}"' *.md
found another spot, though dunno if we have sufficient effective targets
etc. to add an -masm=intel test for it (and my installed binutils doesn't
support it anyway).
Binutils trunk testsuite shows the argument isn't omitted even in the Intel
syntax:
grep aesencwide *.s
keylocker.s: aesencwide128kl 126(%edx)
keylocker.s: aesencwide256kl 126(%edx)
keylocker.s: aesencwide128kl [edx+126]
keylocker.s: aesencwide256kl [edx+126]
property-10.s: aesencwide128kl (%eax)
x86-64-keylocker.s: aesencwide128kl 126(%rdx)
x86-64-keylocker.s: aesencwide256kl 126(%rdx)
x86-64-keylocker.s: aesencwide128kl [rdx+126]
x86-64-keylocker.s: aesencwide256kl [rdx+126]
and doesn't use any WHATEVER PTR.
2022-01-19 Jakub Jelinek <jakub@redhat.com>
* config/i386/sse.md (*aes<aeswideklvariant>u*): Use %0 instead of
{%0}.
The desired transform requires V2SI vector add support, the closest
we have is vect64 so check that which at least fixes the i?86 fail.
2022-01-19 Richard Biener <rguenther@suse.de>
PR testsuite/102833
* gcc.dg/vect/bb-slp-17.c: Require vect64.
For some CPUs, the assembler machine directive cannot be determined by ISA
flags.
gcc/
PR target/104090
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Use also
rs6000_cpu.
Currently all tsvc tests fail to build on DragonFly BSD because they
assume <malloc.h> and memalign() are available.
gcc/testsuite/ChangeLog:
PR testsuite/104021
* gcc.dg/vect/tsvc/tsvc.h: Do not include malloc.h on dragonfly
and use posix_memalign ().
Signed-off-by: Rimvydas Jasinskas <rimvydas.jas@gmail.com>
On DragonFly BSD profiling ends before these DTORs are invoked on dso cleanup.
The -static compilation works as expected.
gcc/testsuite/ChangeLog:
PR testsuite/104022
* g++.dg/gcov/pr16855.C: xfail the count lines for DTORs on dragonfly.
* g++.dg/gcov/pr16855-priority.C: Ditto. Adjust source layout so that
dejagnu xfail expressions work.
Signed-off-by: Rimvydas Jasinskas <rimvydas.jas@gmail.com>
As Richard pointed out in PR104015, the test case slp-perm-9.c
can be fragile when vectorizer tries to use different
vectorisation strategies.
As suggested, this patch tries to make the check not sensitive
on the re-trying times by removing the times checking. To still
retain the test coverage on unnecessary re-trying, for example
it exposes this PR104015 on Power9, I added two test cases to
powerpc testsuite.
gcc/testsuite/ChangeLog:
PR tree-optimization/104015
* gcc.dg/vect/slp-perm-9.c: Adjust.
* gcc.target/powerpc/pr104015-1.c: New test.
* gcc.target/powerpc/pr104015-2.c: New test.
Somehow I pushed my earlier patch without it actually fixing the test; we
need input_location to be for the last consumed token, not the next one.
gcc/cp/ChangeLog:
* parser.cc (saved_token_sentinel::rollback): Use
cp_lexer_previous_token.
> > On Sat, Jan 15, 2022 at 5:39 PM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> > > Thanks for the suggestion, here is the updated patch that survived
> > > bootstrap/regtest.
Unfortunately the patch results in assembler failures with -masm=intel.
> > > > + if (TARGET_DEST_FALSE_DEPENDENCY
> > > > + && get_attr_dest_false_dep (insn) ==
> > > > + DEST_FALSE_DEP_TRUE)
> > > > + output_asm_insn ("vxorps\t{%x0, %x0, %x0}", operands);
All the vxorps insns were emitted like the above, which means for -masm=sysv
it looks like
vxorps %xmm3, %xmm3, %xmm3
but for -masm=intel like:
vxorps
We want obviously
vxorps xmm3, xmm3, xmm3
so the following patch just drops the errorneous {}s.
2022-01-19 Jakub Jelinek <jakub@redhat.com>
PR target/104104
* config/i386/sse.md
(<avx512>_<complexopname>_<mode><maskc_name><round_name>,
avx512fp16_<complexopname>sh_v8hf<mask_scalarc_name><round_scalarcz_name>,
avx512dq_mul<mode>3<mask_name>, <avx2_avx512>_permvar<mode><mask_name>,
avx2_perm<mode>_1<mask_name>, avx512f_perm<mode>_1<mask_name>,
avx512dq_rangep<mode><mask_name><round_saeonly_name>,
avx512dq_ranges<mode><mask_scalar_name><round_saeonly_scalar_name>,
<avx512>_getmant<mode><mask_name><round_saeonly_name>,
avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>):
Use vxorps\t%x0, %x0, %x0 instead of vxorps\t{%x0, %x0, %x0}.
* gcc.target/i386/pr104104.c: New test.
This was deprecated in C++17, not C++14.
libstdc++-v3/ChangeLog:
* include/bits/stl_tempbuf.h (get_temporary_buffer): Change
_GLIBCXX14_DEPRECATED to _GLIBCXX17_DEPRECATED.
This function is no longer used since r12-6691 and can be removed.
libstdc++-v3/ChangeLog:
* include/bits/stl_pair.h (_PCC::_DeprConsPair): Remove unused
function.
This fixes an on AIX.
The lock function currently just spins, which should be changed to use
back-off, and maybe then _M_val.wait(__current) when supported.
libstdc++-v3/ChangeLog:
PR libstdc++/104101
* include/bits/shared_ptr_atomic.h (_Sp_atomic::_Atomic_count::lock):
Only use __thread_relax if __cpp_lib_atomic_wait is defined.
It was pointed out to me by Jakub, that the comment in front of
the new code which handles warning/error attribute was not really
understandable. This fixes the comment to be understandable; I
don't know why I wrote the original comment that way even.
Committed as obvious after a quick build.
gcc/ChangeLog:
* ipa-split.cc (visit_bb): Fix comment before the
warning/error attribute checking code.