This improves diagnostic quality for unsatisfied atomic constraints
that consist of a fold expression, e.g. in concepts/diagnostic3.C
the "evaluated to false" diagnostic now points to the expression:
.../diagnostic3.C:10:22: note: the expression ‘(foo<Ts> && ...) [with Ts = {int, char}]’ evaluated to ‘false’
10 | requires (foo<Ts> && ...)
| ~~~~~~~~~~~~^~~~
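For readers unfamiliar with the construct, here is a minimal sketch of what
such a fold expression evaluates to. It uses C++17 variable templates rather
than a concept, and 'foo' and 'all_foo' are hypothetical stand-ins: the
definition used in diagnostic3.C is not shown in this log.

```cpp
#include <type_traits>

// Hypothetical stand-in for the testcase's 'foo' (assumption: the real
// definition in diagnostic3.C is not reproduced here).
template<typename T>
constexpr bool foo = std::is_pointer<T>::value;

// The same unary right fold as in the quoted requires-clause.
template<typename... Ts>
constexpr bool all_foo = (foo<Ts> && ...);

// With Ts = {int, char} the fold is foo<int> && foo<char>, i.e. false.
static_assert (!all_foo<int, char>, "neither int nor char is a pointer");
static_assert (all_foo<int *, char *>, "every element of the pack is a pointer");
```

Used as an atomic constraint, the first case is the one that produces the
"evaluated to 'false'" note.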
gcc/cp/ChangeLog:
* semantics.cc (finish_unary_fold_expr): Use input_location
instead of UNKNOWN_LOCATION.
(finish_binary_fold_expr): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/diagnostic3.C: Adjust expected location of
"evaluated to false" diagnostics.
Fixes: 77eccbf39e
rs6000.h has
#define PROCESSOR_POWERPC PROCESSOR_PPC604
#define PROCESSOR_POWERPC64 PROCESSOR_RS64A
which means that things like -mcpu=powerpc -mvsx no longer work after
my latest .machine patch. This causes GCC build errors in some cases,
which is not a good idea (even if the errors are actually
pre-existing: using -mvsx with a machine that does not have VSX cannot
work properly).
2022-03-11 Segher Boessenkool <segher@kernel.crashing.org>
PR target/104829
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Don't output
"ppc" and "ppc64" based on rs6000_cpu.
Currently in OpenACC 'kernels' decomposition, there is special handling of
'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler
errors in later passes". For performance reasons, the current repetitive
to/from device copying for every region is not ideal, compared to using
'present' clauses, as done for almost all other 'GOMP_MAP_*'. Also, the
current special handling (incomplete, evidently) is the reason for the PR104892
misbehavior. For PR100280 etc. we've resolved all such known ICEs -- removing
the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892.
PR middle-end/100280
PR middle-end/104892
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Remove special handling of 'GOMP_MAP_FORCE_TOFROM'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
Document a few examples of the status quo.
PR middle-end/104892
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point
to PR104892.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise,
enable '--param=openacc-kernels=decompose' and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
... like in recent commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in synthesized
data clauses as addressable [PR100280]". Otherwise, we may run into
'gcc/omp-low.cc:lower_omp_target':
13125 else if (is_gimple_reg (var))
13126 {
13127 gcc_assert (offloaded);
PR middle-end/100280
PR middle-end/104086
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Mark variables used in 'present' clauses as addressable.
* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully
handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust,
extend.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Merge this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
..., and this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into
this, and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Extend.
..., currently XFAILed with 'dg-ice', as it runs into
'gcc/omp-low.cc:lower_omp_target':
13125 else if (is_gimple_reg (var))
13126 {
13127 gcc_assert (offloaded);
This means, the recent PR100280 etc. changes are still not sufficient.
gcc/testsuite/
PR middle-end/104086
* c-c++-common/goacc/kernels-decompose-pr104086-1.c: New file.
This patch addresses PR middle-end/98420, which is inappropriate constant
folding of x - x to 0.0 (in match.pd) when -frounding-math is specified.
Specifically, x - x may be -0.0 with FE_DOWNWARD as the rounding mode.
To summarize, the desired IEEE behaviour, x - x for floating point x,
(1) can't be folded to 0.0 by default, due to the possibility of NaN or Inf
(2) can be folded to 0.0 with -ffinite-math-only
(3) can't be folded to 0.0 with -ffinite-math-only -frounding-math
(4) can be folded with -ffinite-math-only -frounding-math -fno-signed-zeros
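The sign of x - x under FE_DOWNWARD can be observed at run time. The
following is a minimal sketch, not the new testcase: the function names are
hypothetical, and volatile is used only so that the example does not depend
on being built with -frounding-math.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Compute x - x at run time under the given rounding mode. The volatile
// accesses keep the compiler from folding the subtraction or moving it
// across the fesetround calls.
double self_difference (double x, int rounding_mode)
{
  const int saved = std::fegetround ();
  std::fesetround (rounding_mode);
  volatile double v = x;
  volatile double result = v - v;
  std::fesetround (saved);
  return result;
}

// Hypothetical self-check, not part of the patch.
void check_self_difference ()
{
  assert (!std::signbit (self_difference (1.5, FE_TONEAREST)));  // +0.0
  assert (std::signbit (self_difference (1.5, FE_DOWNWARD)));    // -0.0
}
```

Folding x - x to +0.0 at compile time would lose the second result, which
is why case (3) must not fold unless signed zeros are ignored as in case (4).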
2022-03-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/98420
* match.pd (minus @0 @0): Additional checks for -fno-rounding-math
(the default) or -fno-signed-zeros.
gcc/testsuite/ChangeLog
PR middle-end/98420
* gcc.dg/pr98420.c: New test case.
PR target/104868 had an issue where my code that updated the DImode to
TImode sign extension for power10 failed. In looking at the failure
message, the reason is that when extendditi2 tries to split the insn, it
generates an insn that does not satisfy its constraints:
(set (reg:V2DI 65 1)
(vec_duplicate:V2DI (reg:DI 0)))
The reason is vsx_splat_v2di does not allow GPR register 0 when it will
be generating a mtvsrdd instruction. In the definition of the mtvsrdd
instruction, if the RA register is 0, it means clear the upper 64 bits of
the vector instead of moving register GPR 0 to those bits.
When I wrote the extendditi2 pattern, I forgot that mtvsrdd had that
behavior so I used a 'r' constraint instead of 'b'. In the rare case
where the value is in GPR register 0, this split will fail.
This patch uses the right constraint for extendditi2.
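The RA == 0 special case can be pictured with a toy model. This is a sketch
under assumptions: the type and function names are hypothetical, and placing
RB in the lower doubleword is my reading of the instruction, not something
stated above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct V2DI { std::uint64_t upper, lower; };

// Toy model of "mtvsrdd XT,RA,RB": RA == 0 selects the constant zero for
// the upper doubleword, not the contents of GPR 0. (Assumption: RB always
// supplies the lower doubleword.)
V2DI model_mtvsrdd (const std::array<std::uint64_t, 32> &gpr,
                    unsigned ra, unsigned rb)
{
  const std::uint64_t upper = (ra == 0) ? 0 : gpr[ra];
  return V2DI{upper, gpr[rb]};
}

// Hypothetical self-check contrasting a splat from GPR 5 with one from GPR 0.
void check_model_mtvsrdd ()
{
  std::array<std::uint64_t, 32> gpr{};
  gpr[0] = 42;
  gpr[5] = 42;
  V2DI ok = model_mtvsrdd (gpr, 5, 5);    // both halves receive the value
  assert (ok.upper == 42 && ok.lower == 42);
  V2DI bad = model_mtvsrdd (gpr, 0, 0);   // upper half is cleared instead
  assert (bad.upper == 0 && bad.lower == 42);
}
```

A 'b' constraint excludes GPR 0, so the second situation can no longer be
chosen by the register allocator.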
2022-03-11 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/104868
* config/rs6000/vsx.md (extendditi2): Use a 'b' constraint when
moving from a GPR register to an Altivec register.
Originally, these were cached in the front-end AST node field `isym'.
However, this field is due to be removed in the future.
gcc/d/ChangeLog:
* imports.cc (imported_decls): Define.
(class ImportVisitor): Add result_ field.
(ImportVisitor::result): New method.
(ImportVisitor::visit (Module *)): Store decl to result_.
(ImportVisitor::visit (Import *)): Likewise.
(ImportVisitor::visit (AliasDeclaration *)): Don't cache decl in
front-end AST node.
(ImportVisitor::visit (OverDeclaration *)): Likewise.
(ImportVisitor::visit (FuncDeclaration *)): Likewise.
(ImportVisitor::visit (Declaration *)): Likewise.
(build_import_decl): Use imported_decls to cache and lookup built
declarations.
Address comments made in PR104552 about documented D language options.
gcc/d/ChangeLog:
PR translation/104552
* lang.opt (fdump-cxx-spec=): Fix typo in argument handle.
(fpreview=fixaliasthis): Quote `alias this' as code.
This patch is the backend piece of my proposed fix to PR tree-opt/98335,
to allow C++ partial struct initialization to be as efficient/optimized
as full struct initialization.
With the middle-end patch just posted to gcc-patches, the test case
in the PR compiles on x86_64-pc-linux-gnu with -O2 to:
xorl %eax, %eax
movb c(%rip), %al
ret
with this additional peephole2 (actually four peephole2s):
movzbl c(%rip), %eax
ret
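The equivalence that the xorl/movb transformation relies on can be written
out in C++. This is only an illustration with hypothetical function names;
the peephole2 patterns themselves operate on RTL.

```cpp
#include <cassert>
#include <cstdint>

// Models "xorl %eax, %eax" followed by "movb c(%rip), %al".
std::uint32_t xor_then_insert_byte (std::uint8_t c)
{
  std::uint32_t eax = 0;           // xorl %eax, %eax
  eax = (eax & ~0xffu) | c;        // movb c(%rip), %al
  return eax;
}

// Models "movzbl c(%rip), %eax": zero-extend the byte into the register.
std::uint32_t zero_extend_byte (std::uint8_t c)
{
  return c;
}

// Hypothetical self-check: both sequences agree for every byte value.
void check_movzbl_equivalence ()
{
  for (unsigned c = 0; c < 256; ++c)
    assert (xor_then_insert_byte (static_cast<std::uint8_t> (c))
            == zero_extend_byte (static_cast<std::uint8_t> (c)));
}
```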
2022-03-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/98335
* config/i386/i386.md (peephole2): Eliminate redundant insv.
Combine movl followed by movb. Transform xorl followed by
a suitable movb or movw into the equivalent movz[bw]l.
gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.target/i386/pr98335.C: New test case.
* gcc.target/i386/pr98335.c: New test case.
This patch is the main middle-end piece of a fix for PR tree-opt/98335,
which is a code-quality regression affecting mainline. The issue occurs
in DSE's (dead store elimination's) compute_trims function that determines
where a store to memory can be trimmed. In the testcase given in the
PR, this function notices that the first byte of a DImode store is dead,
and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store
at (unaligned) offset one. Most architectures can store a power-of-two
number of bytes (up to a maximum) in a single instruction, so writing 7 bytes requires
more instructions than writing 8 bytes. This patch follows Jakub Jelinek's
suggestion in comment 5, that compute_trims needs improved heuristics.
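To illustrate why the trimmed store is more expensive, here is a toy cost
model. It is an assumption for illustration only and not the heuristic that
compute_trims implements: it counts naturally aligned power-of-two stores,
and 'count_stores' is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Count the stores needed to write SIZE bytes starting at byte OFFSET when
// each store writes a power-of-two number of bytes (at most MAX_BYTES,
// itself a power of two) and must be naturally aligned.
unsigned count_stores (std::size_t offset, std::size_t size,
                       std::size_t max_bytes = 8)
{
  unsigned stores = 0;
  while (size > 0)
    {
      std::size_t chunk = max_bytes;
      // Shrink until the chunk fits and the offset is aligned to it.
      while (chunk > size || offset % chunk != 0)
        chunk /= 2;
      offset += chunk;
      size -= chunk;
      ++stores;
    }
  return stores;
}

// Hypothetical self-check using the sizes from the description above.
void check_count_stores ()
{
  assert (count_stores (0, 8) == 1);   // untrimmed 8-byte store
  assert (count_stores (1, 7) == 3);   // trimmed: 1 + 2 + 4 bytes
}
```

Under this model, keeping the dead first byte and writing all 8 bytes is the
cheaper choice.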
On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:
movl $0, -24(%rsp)
movabsq $72057594037927935, %rdx
movl $0, -21(%rsp)
andq -24(%rsp), %rdx
movq %rdx, %rax
salq $8, %rax
movb c(%rip), %al
ret
to
xorl %eax, %eax
movb c(%rip), %al
ret
2022-03-11 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR tree-optimization/98335
* builtins.cc (get_object_alignment_2): Export.
* builtins.h (get_object_alignment_2): Likewise.
* tree-ssa-alias.cc (ao_ref_alignment): New.
* tree-ssa-alias.h (ao_ref_alignment): Declare.
* tree-ssa-dse.cc (compute_trims): Improve logic deciding whether
to align head/tail, writing more bytes but using fewer store insns.
gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.dg/pr98335.C: New test case.
* gcc.dg/pr86010.c: New test case.
* gcc.dg/pr86010-2.c: New test case.
The "sorry, unimplemented" message in the new g++.dg/other/pr84964.C is
apparently dependent upon whether the target passes multi-gigabyte
arguments on the stack. This tweaks the testcase to just confirm that
it no longer ICEs, not the specific set of warnings/errors triggered.
2022-03-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
PR c++/84964
* g++.dg/other/pr84964.C: Tweak test to check for the ICE, not for
the (target-dependent) sorry.
The following addresses optimistic non-addressable marking of
an argument of __atomic_compare_exchange_n which broke when
I added DECL_NOT_GIMPLE_REG_P since we cannot guarantee we can
rewrite it when TREE_ADDRESSABLE is unset. Instead we have to
restore TREE_ADDRESSABLE in that case.
2022-03-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/104880
* tree-ssa.cc (execute_update_address_taken): Remember if we
optimistically made something not addressable and
prepare to undo it.
* g++.dg/opt/pr104880.cc: New testcase.
After accounting for GPR -> XMM move cost for vec_construct the
base cost needs adjustments to not double-cost those. This also
lowers the cost when such move is not necessary.
2022-03-11 Richard Biener <rguenther@suse.de>
PR target/104762
* config/i386/i386.cc (ix86_builtin_vectorization_cost): Do not
cost the first lane of SSE pieces as inserts for vec_construct.
lto-plugin/ChangeLog:
* lto-plugin.c (all_symbols_read_handler): With -save-temps, use
link_output_name for -foffload-objects's file name, if available.
libphobos is currently only enabled on Solaris/x86 with gas. As
discovered when gdc was switched to the dmd frontend, this initially
broke bootstrap for the other Solaris configurations.
However, it's now well possible to enable it both for Solaris/x86 with
as and Solaris/SPARC (both as and gas) since the original problems (x86
as linelength limit, among others) are long gone.
The following patch does just that.
Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (both as and
gas) with gdc 9.3.0 (x86) resp. 9.4.0 (sparc, configured with
--enable-libphobos) as bootstrap compilers.
2021-12-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libphobos:
PR d/103528
* configure.ac <x86_64-*-solaris2.* | i?86-*-solaris2.*>: Remove
gas requirement.
* configure: Regenerate.
* configure.tgt (sparc*-*-solaris2.11*): Mark supported.
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses, gfc_omp_finish_clause):
Obtain size for mapping only if allocatable array is allocated.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/array-with-dt-1.f90: Update/add comments;
remove dg-warning for 'is used uninitialized'.
* gfortran.dg/goacc/pr93464.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-1a.f90: New; copied from
gfortran.dg/goacc/array-with-dt-1.f90 but run with -O0. Update
dg-warning for 'may be used uninitialized'.
* gfortran.dg/goacc/pr93464-2.f90: Likewise; copied from
gfortran.dg/goacc/pr93464.f90.
This patch resolves PR c++/84964 which is an ICE in the middle-end after
emitting a "sorry, unimplemented" message, and is a regression from
earlier releases of GCC. The issue is that after encountering a
function call requiring an unreasonable amount of stack space, the
code continues and falls foul of an assert checking that stack pointer
has been correctly updated. The fix is to (locally) consider aborted
function calls as "no return", which skips this downstream sanity check.
2022-03-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR c++/84964
* calls.cc (expand_call): Ignore stack adjustments after sorry.
gcc/testsuite/ChangeLog
PR c++/84964
* g++.dg/other/pr84964.C: New test case.
The preprocessor condition for using fast_float should match the one in
the header, and require at least 32-bit size_t.
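A sketch of the size_t part of such a guard follows. The macro name
USE_FAST_FLOAT_SKETCH is hypothetical, and the real condition in
floating_from_chars.cc may test more than this.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical macro name, for illustration only. __SIZE_WIDTH__ is
// predefined by GCC; the defined() test covers compilers that lack it.
#if defined(__SIZE_WIDTH__) && __SIZE_WIDTH__ >= 32
# define USE_FAST_FLOAT_SKETCH 1
#else
# define USE_FAST_FLOAT_SKETCH 0
#endif

// Whenever the guard is enabled, size_t really has at least 32 bits.
static_assert (!USE_FAST_FLOAT_SKETCH
               || sizeof (std::size_t) * CHAR_BIT >= 32,
               "guard enabled on a target with a narrow size_t");
```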
libstdc++-v3/ChangeLog:
PR libstdc++/104870
* src/c++17/floating_from_chars.cc: Check __SIZE_WIDTH__ >= 32
before using fast_float.
Splitting hard register live range can create (artificial)
conflict of the hard register with another pseudo because of simplified
conflict calculation in LRA. We should check such conflict on the next
assignment sub-pass and spill and reassign the pseudo if necessary.
The patch implements this.
gcc/ChangeLog:
PR target/103074
* lra-constraints.cc (split_reg): Set up
check_and_force_assignment_correctness_p when splitting hard
register live range.
gcc/testsuite/ChangeLog:
PR target/103074
* gcc.target/i386/pr103074.c: New.
1. Thomas reported in
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589039.html
that this testcase is randomly failing. The problem was the fixed pool
size, which was exhausted when there were a lot of threads. Fixed it
by removing the pool_size trait, which causes the default pool size to be
used, which should be big enough.
2. Array indices have been changed to check the last element in the
array.
3. Remove a redundant assignment and move some code to better match
C testcase.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-1.f90: Remove pool_size
trait. Test last index in w and v array. Remove redundant
assignment to V(1). Move alignment checks to the end of
parallel region.
Here we have a forward declaration of Parameter for which we create
an implicit typedef, which is a TYPE_DECL. Then, when looking it up
at template definition time, cp_parser_template_id gets (since r12-6754)
this TYPE_DECL which it can't handle.
This patch defers lookup for TYPE_DECLs that cp_parser_template_id can't
handle, a la r12-6879.
PR c++/104608
gcc/cp/ChangeLog:
* parser.cc (cp_parser_template_name): Repeat lookup of
TYPE_DECLs.
gcc/testsuite/ChangeLog:
* g++.dg/parse/template-keyword3.C: New test.
* g++.dg/parse/template-keyword4.C: New test.
104752 points out that
template<class T>
concept C = true;
auto y = C auto(1);
is ill-formed as per [dcl.type.auto.deduct]: "For an explicit type conversion,
T is the specified type, which shall be auto." which doesn't allow
type-constraint auto.
PR c++/104752
gcc/cp/ChangeLog:
* semantics.cc (finish_compound_literal): Disallow auto{x} for
is_constrained_auto.
* typeck2.cc (build_functional_cast_1): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast12.C: New test.
This is an ICE-on-invalid with "auto operator delete[] (void *)" whose
return type must be void. The return type is checked in coerce_delete_type
but we never got there in this test, because we took the wrong path in
grokdeclarator, set type to error_mark_node, ended up creating a FIELD_DECL
with build_decl, and confused grokmethod by giving it a FIELD_DECL.
Fixed by not taking the data member path for a FUNCTION_TYPE.
PR c++/104846
gcc/cp/ChangeLog:
* decl.cc (grokdeclarator): Check FUNC_OR_METHOD_TYPE_P before giving
data member errors.
gcc/testsuite/ChangeLog:
* g++.dg/init/delete5.C: New test.
wg21.link/p2128 removed "with exactly one parameter" from the [over.sub]
section. To account for that, grok_op_properties has the last 2 lines in:
case OVL_OP_FLAG_BINARY:
if (arity != 2)
{
if (operator_code == ARRAY_REF && cxx_dialect >= cxx23)
break;
but unfortunately it isn't enough, we reject variadic operator[]
earlier. The following patch accepts variadic operator[] for C++23
too.
2022-03-10 Jakub Jelinek <jakub@redhat.com>
PR c++/103460
* decl.cc (grok_op_properties): Allow variadic operator[] for
C++23.
* g++.dg/cpp23/subscript7.C: New test.
I mistakenly believed that VAX floats do not support NaN, but with GCC
__builtin_isnan(__builtin_nan("")) is true. That means my previous
change to <compare> is wrong, because it fails to handle NaN.
When std::numeric_limits<floating-point-type>::is_iec559 is false, as on
VAX, the standard only requires an ordering that is consistent with the
ordering observed by comparison operators. With this change the ordering
is -NaN < numbers < +NaN, and there is no support for different NaN bit
patterns (as I'm not even sure if GCC supports any for VAX).
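The ordering described above can be sketched as follows. This is not the
libstdc++ code: it returns an int instead of std::strong_ordering so that it
does not need <compare>, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Rank a value: -1 for a negative NaN, +1 for a positive NaN, 0 otherwise.
int nan_rank (double x)
{
  if (!std::isnan (x))
    return 0;
  return std::signbit (x) ? -1 : 1;
}

// Three-way comparison returning -1, 0 or 1, giving
// -NaN < numbers < +NaN. NaNs of the same sign compare equal, whatever
// their bit pattern.
int order_sketch (double a, double b)
{
  const int ra = nan_rank (a), rb = nan_rank (b);
  if (ra != rb)
    return ra < rb ? -1 : 1;
  if (ra != 0)
    return 0;                     // two NaNs with the same sign
  return (a > b) - (a < b);       // ordinary numbers
}

// Hypothetical self-check.
void check_order_sketch ()
{
  const double nan = std::numeric_limits<double>::quiet_NaN ();
  const double pos_nan = std::copysign (nan, 1.0);
  const double neg_nan = std::copysign (nan, -1.0);
  assert (order_sketch (neg_nan, -1e300) < 0);
  assert (order_sketch (1e300, pos_nan) < 0);
  assert (order_sketch (pos_nan, pos_nan) == 0);
  assert (order_sketch (1.0, 2.0) < 0);
}
```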
libstdc++-v3/ChangeLog:
* libsupc++/compare (_Strong_order::_S_fp_cmp) [__vax__]:
Handle NaN.
The previous patch extended
-Wanalyzer-write-to-const
-Wanalyzer-write-to-string-literal
to make use of __attribute__ ((access (...))), but the results could be
inscrutable.
This patch adds notes to such diagnostics to give the user a reason for
why the analyzer is complaining.
Example output:
test.c: In function 'main':
test.c:15:13: warning: write to string literal [-Wanalyzer-write-to-string-literal]
15 | if (getrandom((char *)test, sizeof(buf), GRND_RANDOM))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'main': event 1
|
| 15 | if (getrandom((char *)test, sizeof(buf), GRND_RANDOM))
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (1) write to string literal here
|
test.c:3:5: note: parameter 1 of 'getrandom' marked with attribute 'access (write_only, 1, 2)'
3 | int getrandom (void *__buffer, size_t __length,
| ^~~~~~~~~
Unfortunately we don't have location information for the attributes
themselves, just the function declaration, and there doesn't seem to be
a good way of getting at the location of the individual parameters from
the middle end (the C and C++ FEs both have get_fndecl_argument_location,
but the implementations are different).
gcc/analyzer/ChangeLog:
PR analyzer/104793
* analyzer.h (class pending_note): New forward decl.
* diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
Initialize m_notes.
(saved_diagnostic::operator==): Compare m_notes.
(saved_diagnostic::add_note): New.
(saved_diagnostic::emit_any_notes): New.
(diagnostic_manager::add_note): New.
(diagnostic_manager::emit_saved_diagnostic): Call emit_any_notes
after emitting the warning.
* diagnostic-manager.h (saved_diagnostic::add_note): New decl.
(saved_diagnostic::emit_any_notes): New decl.
(saved_diagnostic::m_notes): New field.
(diagnostic_manager::add_note): New decl.
* engine.cc (impl_region_model_context::add_note): New.
* exploded-graph.h (impl_region_model_context::add_note): New
decl.
* pending-diagnostic.h (class pending_note): New.
(class pending_note_subclass): New template.
* region-model.cc (class reason_attr_access): New.
(check_external_function_for_access_attr): Add class
annotating_ctxt and use it when checking region.
(noop_region_model_context::add_note): New.
* region-model.h (region_model_context::add_note): New vfunc.
(noop_region_model_context::add_note): New decl.
(class region_model_context_decorator): New.
(class note_adding_context): New.
gcc/testsuite/ChangeLog:
PR analyzer/104793
* gcc.dg/analyzer/write-to-const-2.c: Add dg-message directives
for expected notes.
* gcc.dg/analyzer/write-to-function-1.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-2.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-3.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-4.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-5.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
This patch extends:
-Wanalyzer-write-to-const
-Wanalyzer-write-to-string-literal
so that they will check for __attribute__ ((access (...))) on calls to
externally-defined functions, and complain about read-only regions
pointed to by arguments marked with a "write_only" or "read_write"
attribute.
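For reference, a minimal sketch of a declaration carrying such an attribute.
The function 'fill_with_x' is hypothetical; it is defined here only so that
the sketch links, whereas the check applies to calls whose definition the
analyzer cannot see.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical function: parameter 1 is written through, and parameter 2
// gives the size of the buffer it points to.
__attribute__ ((access (write_only, 1, 2)))
void fill_with_x (char *buffer, std::size_t length);

// Definition provided only to make the sketch self-contained.
void fill_with_x (char *buffer, std::size_t length)
{
  std::memset (buffer, 'x', length);
}

// Hypothetical self-check.
void check_fill_with_x ()
{
  char writable[4];
  fill_with_x (writable, sizeof writable);   // fine: writable storage
  assert (writable[0] == 'x' && writable[3] == 'x');

  // Passing a string literal or a const object here instead is the kind
  // of call the extended warnings are meant to catch:
  //   fill_with_x ((char *) "abc", 4);
}
```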
gcc/analyzer/ChangeLog:
PR analyzer/104793
* region-model.cc
(region_model::check_external_function_for_access_attr): New.
(region_model::handle_unrecognized_call): Call it.
* region-model.h
(region_model::check_external_function_for_access_attr): New decl.
(region_model::handle_unrecognized_call): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/104793
* gcc.dg/analyzer/write-to-const-2.c: New test.
* gcc.dg/analyzer/write-to-function-1.c: New test.
* gcc.dg/analyzer/write-to-string-literal-2.c: New test.
* gcc.dg/analyzer/write-to-string-literal-3.c: New test.
* gcc.dg/analyzer/write-to-string-literal-4.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
* sm-taint.cc (taint_state_machine::check_for_tainted_size_arg):
Avoid generating duplicate saved_diagnostics by only handling the
rdwr_map entry for the ptrarg, not the duplicate entry for the
sizarg.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/taint-size-access-attr-1.c: Add
-fanalyzer-show-duplicate-count to options; verify that a
duplicate was not created for the tainted size.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
When writing the patch that downgrades address-taken references to
load references when IPA-CP can prove that all uses of the taken
address end up in loads, I unfortunately did not take into account
that find_more_scalar_values_for_callers_subset now happily adds
self-recursive edges to the set of callers which should be immediately
redirected (originally recursion was meant to be handled as edge
redirection in a second pass over the SCC).
The code as it is can now decrement the reference counters too many
times. This can be remedied by removing self-recursive edges earlier, we
already do it because of thunk expansion issues, and so this patch
does exactly that.
gcc/ChangeLog:
2022-03-07 Martin Jambor <mjambor@suse.cz>
PR ipa/104813
* ipa-cp.cc (create_specialized_node): Move removal of
self-recursive calls from callers vector before reference
adjustments.
gcc/testsuite/ChangeLog:
2022-03-07 Martin Jambor <mjambor@suse.cz>
PR ipa/104813
* gcc.dg/ipa/pr104813.c: New test.
The following arranges to remove an indirection to the bitvector
in sbr_sparse_bitmap by embedding bitmap_head instead of bitmap
and using the tree form (since we only ever set/query individual
aligned bit chunks). That shaves off 6 seconds from 70 seconds
of the slowest 521.wrf_r LTRANS unit build.
2022-03-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/102943
* gimple-range-cache.cc (sbr_sparse_bitmap::bitvec):
Make a bitmap_head.
(sbr_sparse_bitmap::sbr_sparse_bitmap): Adjust and switch
to tree view.
(sbr_sparse_bitmap::set_bb_range): Adjust.
(sbr_sparse_bitmap::get_bb_range): Likewise.
The following arranges for s-oscons.ads to record target_noncanonical
for Target_Name, matching the install directory layout and what
gcc -dumpmachine says. This fixes build issues with gprbuild.
2022-03-10 Richard Biener <rguenther@suse.de>
PR ada/104861
gcc/ada/
* gcc-interface/Makefile.in (target_noncanonical): Substitute.
(OSCONS_CPP): Pass target_noncanonical as TARGET.
Currently back_propagate_equivalences tries to optimize dominance
queries in a smart way but it fails to notice that when fast indexes
are available the dominance query is fast (when called from DOM).
It also re-computes the dominance bitmap for each equivalence recorded
on an edge, which for FP are usually several. Finally it fails to
use the tree bitmap view for efficiency. Overall this cuts 7
seconds of compile-time from originally 77 in the slowest LTRANS
unit when building 521.wrf_r.
2022-03-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/102943
* tree-ssa-dom.cc (back_propagate_equivalences): Only
populate the dominance bitmap if fast queries are not
available. Use a tree view bitmap.
(record_temporary_equivalences): Cache the dominance bitmap
across all equivalences on the edge.
The VAX float and double format does not support NaN, so the
std::partial_ordering returned by <=> will never be 'unordered'. We can
just use the partial_ordering value as the strong_ordering.
libstdc++-v3/ChangeLog:
* libsupc++/compare (_Strong_order::_S_fp_cmp) [__vax__]: Use
<=> comparison.
Use printf instead of echo -n. Use Basic Regular Expressions instead of
sed -r. Check for error from ancient Git versions that don't support the
--git-path option for git-rev-parse. Remove -c flag from install
command, as it's ignored by GNU and BSD install, but means something
different for Solaris and AIX.
contrib/ChangeLog:
PR other/102664
* gcc-git-customization.sh: Fix non-portable commands.
The documentation states about the predicable instruction attribute:
...
This attribute must be a boolean (i.e. have exactly two elements in its
list-of-values), with the possible values being no and yes.
...
The nvptx port has instead:
...
(define_attr "predicable" "false,true"
(const_string "true"))
...
Fix this by updating to:
...
(define_attr "predicable" "no,yes"
(const_string "yes"))
...
Tested on nvptx.
gcc/ChangeLog:
2022-03-08 Tom de Vries <tdevries@suse.de>
PR target/104840
* config/nvptx/nvptx.md (define_attr "predicable"): Use no,yes instead
of false,true.
I ran into a hang for this code:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
This has to do with the nature of -muniform-simt. It has two modes of
operation: inside and outside an SIMT region.
Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent. This approach works unless the insn that is executed is a syscall
or an atomic insn. In that case, the insn is predicated, such that it
executes in only one thread. If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.
Inside an SIMT region, a warp executes in all threads. However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code. Care has been taken
though to ensure that the predication and propagation is a nop. That is,
inside an SIMT region:
- the predicate evaluates to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.
That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
@%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.
Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-08 Tom de Vries <tdevries@suse.de>
PR target/104783
* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
(nvptx_get_unisimt_outside_simt_predicate): New function.
(predicate_insn): New function, factored out of ...
(nvptx_reorg_uniform_simt): ... here. Predicate all emitted insns.
* config/nvptx/nvptx.h (struct machine_function): Add
unisimt_outside_simt_predicate field.
* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
(define_insn "nvptx_uniform_warp_check"): Make predicable.
libgomp/ChangeLog:
2022-03-10 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr104783.c: New test.
For an example:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
I noticed that the result of the atomic update (%r30) is propagated:
...
@%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
even though it is unused (which is why the bit bucket operand _ is used).
Fix this by not emitting the shuffle in this case, such that we have instead:
...
@%r33 atom.add.u32 _, [%r29], 1;
bar.warp.sync 0xffffffff;
...
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Handle unused
result.
gcc/testsuite/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/uniform-simt-4.c: New test.
For an atomic fetch operation that doesn't use the result:
...
__atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
...
we currently emit:
...
atom.add.u64 %r26, [%r25], %r27;
...
Detect the REG_UNUSED reg-note for %r26, and emit instead:
...
atom.add.u64 _, [%r25], %r27;
...
Likewise for all atom insns.
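At the source level, the two situations look as follows. This is a sketch
with hypothetical function names; it uses the user-visible __ATOMIC_RELAXED
in place of MEMMODEL_RELAXED.

```cpp
#include <cassert>
#include <cstdint>

// Result used: the old value has to be delivered in a register.
std::uint64_t add_and_return_old (std::uint64_t *p, std::uint64_t v)
{
  return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
}

// Result discarded: the case where the destination operand is unused.
void add_only (std::uint64_t *p, std::uint64_t v)
{
  __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
}

// Hypothetical self-check.
void check_atomic_add ()
{
  std::uint64_t counter = 5;
  assert (add_and_return_old (&counter, 2) == 5);
  add_only (&counter, 3);
  assert (counter == 10);
}
```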
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
PR target/104815
* config/nvptx/nvptx.cc (nvptx_print_operand): Handle 'x' operand
modifier.
* config/nvptx/nvptx.md: Use %x0 destination operand in atom insns.
gcc/testsuite/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
PR target/104815
* gcc.target/nvptx/atomic-bit-bucket-dest.c: New test.
The ptx manual prescribes the instruction format atom{.space}.op.type but the
compiler currently emits:
...
atom.b64.and %r31, [%r30], %r32;
...
which uses the instruction format atom{.space}.type.op.
Fix this by emitting instead:
...
atom.and.b64 %r31, [%r30], %r32;
...
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "atomic_fetch_<logic><mode>"):
Emit atom.and.b64 instead of atom.b64.and.
gcc/testsuite/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic_fetch-1.c: Update.
* gcc.target/nvptx/atomic_fetch-2.c: Update.
With commit 5b5e456f01 ("[nvptx] Build libraries with mptx=3.1") the
intention was that the ptx isa version for all libraries was switched back to
3.1 using MULTILIB_EXTRA_OPTS, without changing the default 6.0.
Further testing revealed that this is not the case, and some libs were still
built with 6.0.
Fix this by introducing an mptx=3.1 multilib.
Adding a multilib should be avoided if possible, because it adds build time.
But I think it's a reasonable trade-off. With --disable-multilib, the default
lib with misa=sm_30 and mptx=6.0 should be usable in most scenarios. With
--enable-multilib, we can enable older drivers, as well as generate code
similar to how that was done in previous gcc releases, which is very useful.
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Move mptx=3.1 ...
(MULTILIB_OPTIONS): ... here.
With commit 07667c911b ("[nvptx] Build libraries with misa=sm_30") the
intention was that the sm_xx for all libraries was switched back to sm_30
using MULTILIB_EXTRA_OPTS, without changing the default sm_35.
Testing on an sm_30 board revealed that some libs were still built with sm_35,
so fix this by switching back to default sm_30.
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
PR target/104758
* config/nvptx/nvptx.opt (misa): Set default to sm_30.
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Remove misa=sm_30.