When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
selected -misa (sm_53)
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
1 exit status
compilation terminated.
...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...
Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.
When running with target board nvptx-none-run/-mptx=3.1, I run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support selected \
-misa (sm_53)
compiler exited with status 1
FAIL: gcc.target/nvptx/atomic-store-1.c (test for excess errors)
...
Fix this and similar cases by adding an explicit -mptx=_ setting.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-1.c: Add -mptx=_.
* gcc.target/nvptx/atomic-store-2.c: Same.
* gcc.target/nvptx/float16-1.c: Same.
* gcc.target/nvptx/float16-2.c: Same.
* gcc.target/nvptx/float16-3.c: Same.
* gcc.target/nvptx/float16-4.c: Same.
* gcc.target/nvptx/float16-5.c: Same.
* gcc.target/nvptx/float16-6.c: Same.
* gcc.target/nvptx/tanh-1.c: Same.
* gcc.target/nvptx/uniform-simt-1.c: Same.
* gcc.target/nvptx/uniform-simt-3.c: Same.
Add an -mptx=_ value, that indicates the default ptx version.
It can be used to undo an explicit -mptx setting, so this:
...
$ gcc test.c -mptx=3.1 -mptx=_
...
has the same effect as:
...
$ gcc test.c
...
Tested on nvptx.
gcc/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
PTX_VERSION_default.
* config/nvptx/nvptx.cc (handle_ptx_version_option): Handle
PTX_VERSION_default.
* config/nvptx/nvptx.opt: Add EnumValue "_" / PTX_VERSION_default.
When running with target board nvptx-none-run/-misa=sm_70 I run into:
...
FAIL: gcc.target/nvptx/atomic-store-3.c scan-assembler-times st.global.u32 1
FAIL: gcc.target/nvptx/atomic-store-3.c scan-assembler-times st.global.u64 1
...
Fix this by adding an explicit -misa=sm_30 in the test-case.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic-store-3.c: Add -misa=sm_30.
When running with target board nvptx-none-run/-misa=sm_53 we run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support selected \
-misa (sm_53)
compiler exited with status 1
FAIL: gcc.target/nvptx/uniform-simt-2.c (test for excess errors)
...
Fix this by adding an explicit -misa=sm_30 in the test-case.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/uniform-simt-2.c: Add -misa=sm_30.
When running with target board nvptx-none-run/-misa=sm_30 we run into:
...
FAIL: gcc.target/nvptx/rotate.c scan-assembler-times shf.l.wrap.b32 1
FAIL: gcc.target/nvptx/rotate.c scan-assembler-times shf.r.wrap.b32 1
FAIL: gcc.target/nvptx/rotate.c scan-assembler-not and.b32
...
Fix this by adding an explicit -misa=sm_35 in the test-case.
Tested on nvptx.
gcc/testsuite/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/rotate.c: Add -misa=sm_35.
The following replaces
/* Skip bits that are zero. */
for (; (word & 1) == 0; word >>= 1)
bit_num++;
idioms in ira-int.h in an attempt to speed up
update_conflict_hard_regno_costs, which we're bound on in PR104686.
The trick is to use ctz_hwi here, which should pay off even with dense
bitmaps on architectures that have HW support for this.
For the PR in question this speeds up compile-time from 31s to 24s for
me.
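A minimal sketch of the idea, not the ira-int.h code itself: the function
names are hypothetical, and __builtin_ctzll stands in for GCC's internal
ctz_hwi helper. Both variants assume the word is nonzero.

```c
/* Original idiom: advance one bit at a time until a set bit is found.
   Requires word != 0, otherwise the loop never terminates.  */
static unsigned
skip_zero_bits_loop (unsigned long long word)
{
  unsigned bit_num = 0;

  for (; (word & 1) == 0; word >>= 1)
    bit_num++;
  return bit_num;
}

/* Replacement: count the trailing zero bits in one step.
   __builtin_ctzll is undefined for 0, so the same precondition applies.  */
static unsigned
skip_zero_bits_ctz (unsigned long long word)
{
  return (unsigned) __builtin_ctzll (word);
}
```

Both functions return the index of the lowest set bit; the second one does
so without a data-dependent loop.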
2022-02-25 Richard Biener <rguenther@suse.de>
PR rtl-optimization/104686
* ira-int.h (minmax_set_iter_cond): Use ctz_hwi to elide loop
skipping bits that are zero.
(ira_object_conflict_iter_cond): Likewise.
Sync with llvm change in https://reviews.llvm.org/D120307 to
add enumeration and truncate imm to unsigned char, so users could
use ~ on immediates.
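A rough illustration of why the truncation matters, using plain C rather
than the intrinsics. The enum below is a hypothetical stand-in for the new
enumeration (its names and values are assumptions), and the truth-table
indexing in ternlog_bit is a sketch of how the 8-bit immediate is commonly
described, not code from the headers.

```c
/* Hypothetical stand-in for the new enumeration; values are assumed.  */
enum tern_operand { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

/* Truncate the immediate to 8 bits.  Applying ~ to an enumerator yields
   a negative int, which would not be a valid 8-bit immediate.  */
static unsigned char
ternlog_imm (int imm)
{
  return (unsigned char) imm;
}

/* Evaluate one bit of the ternary logic function selected by IMM:
   the three input bits index into the 8-entry truth table.  */
static int
ternlog_bit (unsigned char imm, int a, int b, int c)
{
  int index = (a << 2) | (b << 1) | c;

  return (imm >> index) & 1;
}
```

With these definitions, ~TERN_A is negative as an int, but truncates to
0x0F, the truth table for "not A".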
gcc/ChangeLog:
* config/i386/avx512fintrin.h (_MM_TERNLOG_ENUM): New enum.
(_mm512_ternarylogic_epi64): Truncate imm to unsigned
char to avoid error when using ~enum as parameter.
(_mm512_mask_ternarylogic_epi64): Likewise.
(_mm512_maskz_ternarylogic_epi64): Likewise.
(_mm512_ternarylogic_epi32): Likewise.
(_mm512_mask_ternarylogic_epi32): Likewise.
(_mm512_maskz_ternarylogic_epi32): Likewise.
* config/i386/avx512vlintrin.h (_mm256_ternarylogic_epi64):
Adjust imm param type to unsigned char.
(_mm256_mask_ternarylogic_epi64): Likewise.
(_mm256_maskz_ternarylogic_epi64): Likewise.
(_mm256_ternarylogic_epi32): Likewise.
(_mm256_mask_ternarylogic_epi32): Likewise.
(_mm256_maskz_ternarylogic_epi32): Likewise.
(_mm_ternarylogic_epi64): Likewise.
(_mm_mask_ternarylogic_epi64): Likewise.
(_mm_maskz_ternarylogic_epi64): Likewise.
(_mm_ternarylogic_epi32): Likewise.
(_mm_mask_ternarylogic_epi32): Likewise.
(_mm_maskz_ternarylogic_epi32): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-vpternlogd-1.c: Use new enum.
* gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
* gcc.target/i386/testimm-10.c: Remove imm check for vpternlog
insns since the imm has been truncated in intrinsic.
The patch for PR90451 deferred marking to the point of actual use; we missed
this one because of the parens.
PR c++/104618
gcc/cp/ChangeLog:
* typeck.cc (cp_build_addr_expr_1): Also
maybe_undo_parenthesized_ref.
gcc/testsuite/ChangeLog:
* g++.dg/overload/paren1.C: New test.
The declarations of _DINFINITY, _SINFINITY and _SQNAN need to be constant
expressions.
2022-02-27 John David Anglin <danglin@gcc.gnu.org>
fixincludes/ChangeLog:
* inclhack.def (hpux_math_constexpr): New hack.
* fixincl.x: Regenerate.
* tests/base/math.h: Update.
Marc mentioned in the PR 2 further simplifications that also ICE
with complex types.
For these, eventually (but IMO GCC 13 material) we could support them
for vector types if they are uniform vector constants.
Currently integer_pow2p is true only for INTEGER_CSTs and COMPLEX_CSTs
and we can't use bit_and etc. for complex type.
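For integral types the two simplifications are plain identities. A small
sketch for unsigned int (function names are hypothetical):

```c
/* t * 2U / 2: the multiplication discards the top bit, the division
   shifts the rest back down.  */
static unsigned
mul_then_div (unsigned t)
{
  return t * 2U / 2;
}

/* Simplified form: t & (~0 / 2), i.e. clear the top bit.  */
static unsigned
mul_then_div_simplified (unsigned t)
{
  return t & (~0U / 2);
}

/* t / 2U * 2: the division discards the low bit.  */
static unsigned
div_then_mul (unsigned t)
{
  return t / 2U * 2;
}

/* Simplified form: t & ~1, i.e. clear the low bit.  */
static unsigned
div_then_mul_simplified (unsigned t)
{
  return t & ~1U;
}
```

Neither masked form has a meaning for complex operands, which is why the
patterns are restricted to INTEGRAL_TYPE_P.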
2022-02-25 Jakub Jelinek <jakub@redhat.com>
Marc Glisse <marc.glisse@inria.fr>
PR tree-optimization/104675
* match.pd (t * 2U / 2 -> t & (~0 / 2), t / 2U * 2 -> t & ~1):
Restrict simplifications to INTEGRAL_TYPE_P.
* gcc.dg/pr104675-3.c: New test.
The following testcase ICEs, because for some strange reason it decides to use
movmisaligntf during expansion where the destination is MEM and source is
CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses
rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG
and a few other things, but for movmisalign<mode> nothing enforced this.
The middle-end documents that movmisalign<mode> shouldn't fail, so we can't
force that through predicates or condition on the expander.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR target/104681
* config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move.
* g++.dg/opt/pr104681.C: New test.
Both -mforce-drap and -mstackrealign options are x86 specific.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
* g++.dg/pr104540.C: Move to ...
* g++.target/i386/pr104540.C: ... here.
If the movcc comparison is not valid it triggers an assert in the
current implementation. This behavior is not needed as we can FAIL
the movcc expand pattern.
gcc/
* config/arc/arc.cc (gen_compare_reg): Return NULL_RTX if the
comparison is not valid.
* config/arc/arc.md (movsicc): Fail if comparison is not valid.
(movdicc): Likewise.
(movsfcc): Likewise.
(movdfcc): Likewise.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
This fixes a long-standing issue in PRE where we track valueized
expressions in our expression sets that we use for PHI translation,
code insertion but also feed into match-and-simplify via
vn_nary_simplify. But that's not what is expected from vn_nary_simplify
or match-and-simplify which assume we are simplifying with operands
available at the point of the expression so they can use contextual
information on the SSA names like ranges. While the VN side was
updated to ensure this with the rewrite to RPO VN, thereby removing
all workarounds that nullified such contextual info on all SSA names,
the PRE side still suffers from this.
The following patch tries to apply minimal surgery at this point
and makes PRE track un-valueized expressions in the expression sets
but only for the NARY kind (both NAME and CONSTANT do not suffer
from this issue), leaving the REFERENCE kind alone. The REFERENCE
kind is important when trying to remove the workarounds still in
place in compute_avail for code hoisting, but that's a separate issue
and we have a working workaround in place.
Doing this comes at the cost of duplicating the VN IL on the PRE side
for NARY and eventually some extra overhead for translated expressions
that is difficult to assess.
2022-02-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/103037
* tree-ssa-sccvn.h (alloc_vn_nary_op_noinit): Declare.
(vn_nary_length_from_stmt): Likewise.
(init_vn_nary_op_from_stmt): Likewise.
(vn_nary_op_compute_hash): Likewise.
* tree-ssa-sccvn.cc (alloc_vn_nary_op_noinit): Export.
(vn_nary_length_from_stmt): Likewise.
(init_vn_nary_op_from_stmt): Likewise.
(vn_nary_op_compute_hash): Likewise.
* tree-ssa-pre.cc (pre_expr_obstack): New obstack.
(get_or_alloc_expr_for_nary): Pass in the value-id to use,
(re-)compute the hash value and if the expression is not
found allocate it from pre_expr_obstack.
(phi_translate_1): Do not insert the NARY found in the
VN tables but build a PRE expression from the valueized
NARY with the value-id we eventually found.
(find_or_generate_expression): Assert we have an entry
for constant values.
(compute_avail): Insert not valueized expressions into
EXP_GEN using the value-id from the VN tables.
(init_pre): Allocate pre_expr_obstack.
(fini_pre): Free pre_expr_obstack.
* gcc.dg/torture/pr103037.c: New testcase.
As mentioned in the PR, the following testcase is miscompiled for similar
reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various
places during expansion and during expansion we can guarantee that the
lifetimes of those temporary slots don't overlap. But the following
splitter uses SLOT_TEMP too and in between expansion and split1 there is
a possibility that something extends the lifetime of SLOT_TEMP created
slots across an instruction that will be split by this splitter.
The following patch fixes it by using a new temp slot kind to make sure
it doesn't reuse a SLOT_TEMP that could be live across the instruction.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR target/104674
* config/i386/i386.h (enum ix86_stack_slot): Add SLOT_FLOATxFDI_387.
* config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm): Use
SLOT_FLOATxFDI_387 rather than SLOT_TEMP.
* gcc.target/i386/pr104674.c: New test.
This fixes a spelling mistake I found while looking at warning-control
implementation.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
* warning-control.cc (get_nowarn_spec): Comment spelling fix.
The following testcase is miscompiled on ia32 at -O2, because
when expand_SPACESHIP is called, we have pending stack adjustment
from the foo call right before it.
Now, ix86_expand_fp_spaceship uses emit_jump_insn several times
but then emit_jump also several times. While emit_jump_insn doesn't
do do_pending_stack_adjust (), emit_jump does, so we end up with:
...
8: call [`_Z3foodl'] argc:0x10
REG_CALL_DECL `_Z3foodl'
9: r88:DF=[`a']
10: r89:HI=unspec[cmp(r88:DF,0.0)] 25
11: flags:CC=unspec[r89:HI] 26
12: pc={(unordered(flags:CCFP,0))?L27:pc}
REG_BR_PROB 536868
66: NOTE_INSN_BASIC_BLOCK 4
13: pc={(uneq(flags:CCFP,0))?L19:pc}
REG_BR_PROB 214748364
67: NOTE_INSN_BASIC_BLOCK 5
14: pc={(flags:CCFP>0)?L23:pc}
REG_BR_PROB 536870916
68: NOTE_INSN_BASIC_BLOCK 6
15: r86:SI=0xffffffffffffffff
16: {sp:SI=sp:SI+0x10;clobber flags:CC;}
REG_ARGS_SIZE 0
17: pc=L29
18: barrier
19: L19:
69: NOTE_INSN_BASIC_BLOCK 7
...
The sp += 16 pending stack adjust was emitted in the middle of the
sequence and is effective only for the single case of the 4 possibilities
where .SPACESHIP returns -1, in all other cases the stack isn't adjusted
and so we ICE during dwarf2cfi.
Now, we could either call do_pending_stack_adjust in
ix86_expand_fp_spaceship, or use there calls that actually don't call
do_pending_stack_adjust (but having the stack adjustment across branches is
generally undesirable), or we can call it in expand_SPACESHIP for all
targets (note, just i386 currently implements it).
I chose the generic code because e.g. expand_{addsub,neg,mul}_overflow
in the same file also call do_pending_stack_adjust in internal-fn.cc for the
same reasons, that it is expected that most if not all targets will expand
those through jumps and we don't want all of the targets to need to deal
with that.
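For reference, the four possible results mentioned above can be sketched in
plain C. This mirrors the branch order visible in the RTL dump (unordered,
equal, greater, else less); the function name is hypothetical and the value
2 for the unordered case is an assumption.

```c
#include <math.h>

/* Four-way floating-point comparison: -1 (less), 0 (equal),
   1 (greater), and 2 (unordered; assumed value).  */
static int
fp_spaceship (double a, double b)
{
  if (isunordered (a, b))
    return 2;
  if (a == b)
    return 0;
  if (a > b)
    return 1;
  return -1;
}
```

Only the final fall-through path corresponds to the -1 result, which is the
single path on which the stack adjustment ended up being emitted.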
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104679
* internal-fn.cc (expand_SPACESHIP): Call do_pending_stack_adjust.
* g++.dg/torture/pr104679.C: New test.
We don't support BIT_{AND,IOR,XOR,NOT}_EXPR on complex types,
&/|/^ are just rejected for them, and ~ is parsed as CONJ_EXPR.
So, we should avoid simplifications which turn valid complex type
expressions into something that will ICE during expansion.
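A small sketch of why the rewrite is wrong for complex operands, assuming
GNU C, where ~ applied to a complex value is the conjugate. The function
names are hypothetical.

```c
#include <complex.h>

/* -A - 1 on a complex value: negate both parts, then subtract 1 from
   the real part.  */
static double complex
negate_minus_one (double complex a)
{
  return -a - 1;
}

/* GNU C extension: ~ on a complex value yields its conjugate, which is
   unrelated to the integer identity -A - 1 == ~A.  */
static double complex
gnu_conjugate (double complex a)
{
  return ~a;
}
```

For 2 + 3i the first function gives -3 - 3i and the second gives 2 - 3i,
so the two expressions are not interchangeable.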
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104675
* match.pd (-A - 1 -> ~A, -1 - A -> ~A): Don't simplify for
COMPLEX_TYPE.
* gcc.dg/pr104675-1.c: New test.
* gcc.dg/pr104675-2.c: New test.
The patch for PR103302 caused PR104121, and extended the live ranges
of LRA reloads.
for gcc/ChangeLog
PR target/104121
PR target/103302
* expr.cc (emit_move_multi_word): Restore clobbers during LRA.
This problem was already fixed as part of PR104263: the abnormal edge
that remained from before inlining didn't make sense after inlining.
So this patch adds only the testcase.
for gcc/testsuite/ChangeLog
PR tree-optimization/103845
PR tree-optimization/104263
* gcc.dg/pr103845.c: New.
In def_cfa_0, we may set the 2nd operand's dw_cfi_cfa_loc to NULL, but
then cfi_oprnd_equal_p calls cfa_equal_p with a NULL dw_cfa_location*.
This patch arranges for us to tolerate NULL dw_cfi_cfa_loc.
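One way to express a NULL-tolerant comparison, as a sketch only: the struct
and function below are hypothetical simplifications and not the dwarf2cfi.cc
types.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a CFA location.  */
struct cfa_loc
{
  unsigned reg;
  long offset;
};

/* Compare two possibly-NULL locations.  If either pointer is NULL,
   neither is dereferenced and the result is plain pointer equality.  */
static int
cfa_loc_equal (const struct cfa_loc *a, const struct cfa_loc *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  return a->reg == b->reg && a->offset == b->offset;
}
```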
for gcc/ChangeLog
PR middle-end/104540
* dwarf2cfi.cc (cfi_oprnd_equal_p): Cope with NULL
dw_cfi_cfa_loc.
for gcc/testsuite/ChangeLog
PR middle-end/104540
* g++.dg/pr104540.C: New.
When we duplicate a throwing compare for hardening, the EH edge from
the original compare gets duplicated for the inverted compare, but we
failed to adjust any PHI nodes in the EH block. This patch adds the
needed adjustment, copying the PHI args from those of the preexisting
edge.
for gcc/ChangeLog
PR tree-optimization/103856
* gimple-harden-conditionals.cc (non_eh_succ_edge): Enable the
eh edge to be requested through an extra parameter.
(pass_harden_compares::execute): Copy PHI args in the EH dest
block for the new EH edge added for the inverted compare.
for gcc/testsuite/ChangeLog
PR tree-optimization/103856
* g++.dg/pr103856.C: New.
This fixes a problem for Clang, which is going to return a non-void
pointer from __builtin_source_location(). The current definition of
std::source_location::current() converts that to void* and then has to
cast it back again in the body (which makes it invalid in a constant
expression). By using the actual type of the returned pointer, we avoid
the problematic cast for Clang.
libstdc++-v3/ChangeLog:
PR libstdc++/104602
* include/std/source_location (source_location::current): Use
deduced type of __builtin_source_location().
Fortran 2018 allows for a QUIET specifier to the STOP and ERROR STOP
statements. Whilst the gfortran library code has provided support for this
specifier for quite some time, the frontend implementation was missing.
gcc/fortran/ChangeLog:
PR fortran/84519
* dump-parse-tree.cc (show_code_node): Dump QUIET specifier when
present.
* match.cc (gfc_match_stopcode): Implement parsing of F2018 QUIET
specifier. F2018 stopcodes may have non-default integer kind.
* resolve.cc (gfc_resolve_code): Add checks for QUIET argument.
* trans-stmt.cc (gfc_trans_stop): Pass QUIET specifier to call of
library function.
gcc/testsuite/ChangeLog:
PR fortran/84519
* gfortran.dg/stop_1.f90: New test.
* gfortran.dg/stop_2.f: New test.
* gfortran.dg/stop_3.f90: New test.
* gfortran.dg/stop_4.f90: New test.
The code generated by -mcmodel=medany is defined to be
position-independent, but is not guaranteed to function correctly when
linked into position-independent executables or libraries. See the
recent discussion at the psABI specification [1] for more details.
It would be better to reject these invalid sequences when linking, but
as pointed out in a recent LD bug [2] there may be some compatibility
issues related to the PCREL_HI20 relocations used to initialize GP.
Given the complexity here it's unlikely we'll be able to reject these
sequences any time soon, so instead just document that these may not
work.
[1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/245
[2]: https://sourceware.org/bugzilla/show_bug.cgi?id=28789
gcc/ChangeLog:
* doc/invoke.texi (RISC-V -mcmodel=medany): Document the degree
of position independence that -mcmodel=medany affords.
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The third parameter of find_fde_tail is an _Unwind_Ptr (which is an
integer type instead of a pointer), but we are passing NULL to it. This
causes a -Wint-conversion warning.
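A minimal sketch of the distinction, with hypothetical names; unwind_ptr_t
stands in for _Unwind_Ptr and is assumed to be an unsigned integer of
pointer width.

```c
#include <stdint.h>

/* Hypothetical stand-in for _Unwind_Ptr: an integer, not a pointer.  */
typedef uintptr_t unwind_ptr_t;

/* Hypothetical callee taking an integer-typed base address.  */
static int
has_data_base (unwind_ptr_t dbase)
{
  return dbase != 0;
}

/* Pass the integer 0 for "no base".  Passing NULL here would convert a
   pointer constant to an integer and can trigger -Wint-conversion.  */
static int
call_without_base (void)
{
  return has_data_base (0);
}
```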
libgcc/
* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Call find_fde_tail
with 0 instead of NULL.
Fixes:
gcc/cp/pt.cc:13755:23: warning: suggest braces around initialization of subobject [-Wmissing-braces]
tree_vec_map in = { fn, nullptr };
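The same warning and fix can be sketched in C. The struct layout below is a
hypothetical simplification (a base subobject followed by one more member),
not the actual tree_vec_map definition.

```c
#include <stddef.h>

/* Hypothetical simplified types: a map entry whose first member is
   itself a struct.  */
struct map_base
{
  const void *from;
};

struct vec_map
{
  struct map_base base;
  void *to;
};

static struct vec_map
make_map (const void *fn)
{
  /* The inner braces initialize the 'base' subobject explicitly;
     writing { fn, NULL } is valid but draws -Wmissing-braces.  */
  struct vec_map in = { { fn }, NULL };

  return in;
}
```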
gcc/cp/ChangeLog:
* pt.cc (defarg_insts_for): Use braces for subobject.
This patch changes the build machinery to disable the build of GCOV
(both compiler and libgcc) for bpf-*-* targets. The reason for this
change is that BPF is (currently) too restricted to support the
coverage instrumentation.
Tested in bpf-unknown-none and x86_64-linux-gnu targets.
2022-02-23 Jose E. Marchesi <jose.marchesi@oracle.com>
gcc/ChangeLog
PR target/104656
* configure.ac: --disable-gcov if targeting bpf-*.
* configure: Regenerate.
libgcc/ChangeLog
PR target/104656
* configure.ac: --disable-gcov if targeting bpf-*.
* configure: Regenerate.
Loop distribution can release SSA names used in nb_iterations, so make
sure to reset those as well.
2022-02-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/104676
* tree-loop-distribution.cc (loop_distribution::execute):
Do a full scev_reset.
* gcc.dg/torture/pr104676.c: New testcase.
The following testcase is miscompiled, because -fipa-pure-const discovers
that bar is const, but when sccvn during fre3 sees
# .MEM_140 = VDEF <.MEM_96>
*__pred$__d_43 = _50 (_49);
where _50 value numbers to &bar, it value numbers .MEM_140 to
vuse_ssa_val (gimple_vuse (stmt)). For const/pure calls that return
a SSA_NAME (or don't have lhs) that is fine, those calls don't store
anything, but if the lhs is present and not an SSA_NAME, value numbering
the vdef to anything but itself means that e.g. walk_non_aliased_vuses
won't consider the call, but the call acts as a store to its lhs.
When it is ignored, sccvn will return whatever has been stored to the
lhs earlier.
I've bootstrapped/regtested an earlier version of this patch, which did the
if (!lhs && gimple_call_lhs (stmt))
changed |= set_ssa_val_to (vdef, vdef);
part before else if (vnresult->result_vdef), and that regressed
+FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo \\\\(" 1
+FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo2 \\\\(" 1
so this updated patch uses result_vdef there as before and only otherwise
(which I think must be the const/pure case) decides based on whether the
lhs is non-SSA_NAME.
2022-02-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104601
* tree-ssa-sccvn.cc (visit_reference_op_call): For calls with
non-SSA_NAME lhs value number vdef to itself instead of e.g. the
vuse value number.
* g++.dg/torture/pr104601.C: New test.
Add openmp test-cases that test the omp declare variant construct:
...
#pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.
Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-24 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
* testsuite/libgomp.c/declare-variant-3.h: New header file.
In t-omp-device we list isas that can be used in omp declare variant like so:
...
#pragma omp declare variant (f30) match (device={isa("sm_30")})
...
and in nvptx_omp_device_kind_arch_isa we handle them.
Update both to reflect the current list of isas.
Tested on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2022-02-23 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Handle
sm_70, sm_75 and sm_80.
* config/nvptx/t-omp-device: Add sm_53, sm_70, sm_75 and sm_80.
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
PTX contains funnel shift operations shf.l.wrap and shf.r.wrap that can be
used to implement 32-bit left or right rotate.
Add define_insns rotlsi3 and rotrsi3.
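For illustration, the kind of source-level rotate idiom these patterns are
meant to cover might look as follows. This is a generic sketch with
hypothetical function names, not the contents of the new test-cases.

```c
#include <stdint.h>

/* Rotate X left by N bits; N is reduced modulo 32 so that no shift
   count reaches the operand width.  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate X right by N bits, with the same reduction of N.  */
static uint32_t
rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}
```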
Tested on nvptx.
gcc/ChangeLog:
2022-02-23 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "rotlsi3", define_insn
"rotrsi3"): New define_insn.
gcc/testsuite/ChangeLog:
2022-02-23 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/rotate-run.c: New test.
* gcc.target/nvptx/rotate.c: New test.
I committed "[nvptx] Add -mptx-comment", but tested it in combination with the
proposed "[final] Handle compiler-generated asm insn" (
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ), so
by itself the commit introduced some regressions:
...
FAIL: gcc.dg/20020426-2.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/analyzer/zlib-3.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/pr101223.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr80764.c -O2 (internal compiler error: Segmentation fault)
...
These are due to cfun->function_start_locus == 0.
Fix these by using DECL_SOURCE_LOCATION (cfun->decl) instead.
Tested on nvptx.
gcc/ChangeLog:
2022-02-23 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (gen_comment): Use
DECL_SOURCE_LOCATION (cfun->decl) instead of cfun->function_start_locus.
For the EVEX encoding of vp{xor,or,and}, a suffix is needed.
Otherwise there would be an error for
vpxor %xmm0, %xmm31, %xmm1
Error: unsupported instruction `vpxor'
gcc/ChangeLog:
* config/i386/sse.md (<code>v1ti3): Add suffix and replace
isa attr of alternative 2 from avx to avx512vl.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512vl-logicsuffix-1.c: New test.
When testing -fanalyzer on openblas-0.3, I noticed slightly over 2000
false positives from -Wanalyzer-malloc-leak on code like this:
if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) ) {
pt_t = (lapack_complex_float*)
LAPACKE_malloc( sizeof(lapack_complex_float) *
ldpt_t * MAX(1,n) );
[...snip...]
}
[...snip lots of code...]
if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'q' ) ) {
LAPACKE_free( pt_t );
}
where LAPACKE_lsame is a char-comparison function implemented in a
different TU.
The analyzer naively considers the execution path where:
LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' )
is true at the malloc guard, but then false at the free guard, which
is thus a memory leak.
This patch makes -fanalyzer respect __attribute__((const)), so that the
analyzer treats such functions as returning the same value when given
the same inputs.
I've filed https://github.com/xianyi/OpenBLAS/issues/3543 suggesting that
LAPACKE_lsame be annotated with __attribute__((const)); with that, and
with this patch, the false positives seem to be fixed.
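A reduced sketch of the pattern, with hypothetical names. Here same_letter
stands in for LAPACKE_lsame and is defined locally so the example is
self-contained; in the scenario above it lives in a different TU, which is
where the attribute carries the information the analyzer needs.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a char-comparison helper.  The const
   attribute promises the result depends only on the arguments.  */
__attribute__ ((const)) static int
same_letter (char a, char b)
{
  return (a | 0x20) == (b | 0x20);
}

/* Allocation and release are guarded by the same condition.  Treating
   same_letter as const lets both guards be seen as evaluating alike.  */
static int
run (char vect)
{
  float *buf = NULL;

  if (same_letter (vect, 'b') || same_letter (vect, 'p'))
    {
      buf = malloc (16 * sizeof (float));
      if (buf == NULL)
        return -1;
      buf[0] = 1.0f;
    }

  /* ... other work ... */

  if (same_letter (vect, 'b') || same_letter (vect, 'p'))
    free (buf);
  return 0;
}
```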
gcc/analyzer/ChangeLog:
PR analyzer/104434
* analyzer.h (class const_fn_result_svalue): New decl.
* region-model-impl-calls.cc (call_details::get_manager): New.
* region-model-manager.cc
(region_model_manager::get_or_create_const_fn_result_svalue): New.
(region_model_manager::log_stats): Log
m_const_fn_result_values_map.
* region-model.cc (const_fn_p): New.
(maybe_get_const_fn_result): New.
(region_model::on_call_pre): Handle fndecls with
__attribute__((const)) by calling the above rather than making
a conjured_svalue.
* region-model.h (visitor::visit_const_fn_result_svalue): New.
(region_model_manager::get_or_create_const_fn_result_svalue): New
decl.
(region_model_manager::const_fn_result_values_map_t): New typedef.
(region_model_manager::m_const_fn_result_values_map): New field.
(call_details::get_manager): New decl.
* svalue.cc (svalue::cmp_ptr): Handle SK_CONST_FN_RESULT.
(const_fn_result_svalue::dump_to_pp): New.
(const_fn_result_svalue::dump_input): New.
(const_fn_result_svalue::accept): New.
* svalue.h (enum svalue_kind): Add SK_CONST_FN_RESULT.
(svalue::dyn_cast_const_fn_result_svalue): New.
(class const_fn_result_svalue): New.
(is_a_helper <const const_fn_result_svalue *>::test): New.
(template <> struct default_hash_traits<const_fn_result_svalue::key_t>):
New.
gcc/testsuite/ChangeLog:
PR analyzer/104434
* gcc.dg/analyzer/attr-const-1.c: New test.
* gcc.dg/analyzer/attr-const-2.c: New test.
* gcc.dg/analyzer/attr-const-3.c: New test.
* gcc.dg/analyzer/pr104434-const.c: New test.
* gcc.dg/analyzer/pr104434-nonconst.c: New test.
* gcc.dg/analyzer/pr104434.h: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
A nice side effect of r12-1822 was improving the diagnostic
we emit for the following test.
PR c++/79493
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/undeclared1.C: New test.
The following patch avoids infinite recursion during generic folding.
The (cmp (bswap @0) INTEGER_CST@1) simplification relies on
(bswap @1) actually being simplified, if it is not simplified, we just
move the bswap from one operand to the other and if @0 is also INTEGER_CST,
we apply the same rule next.
The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST
has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in
such cases:
static inline bool
integer_cst_p (tree t)
{
return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t);
}
The patch uses ! modifier to ensure the bswap is simplified and
extends support to GENERIC by means of requiring !EXPR_P which
is not perfect but a conservative approximation.
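The intent of the simplification can be sketched with an equality
comparison in plain C, using the GCC builtin __builtin_bswap32. The function
names are hypothetical, and this only illustrates the equivalence the rule
relies on, not the match.pd pattern itself.

```c
#include <stdint.h>

/* Compare the byte-swapped operand against a constant.  */
static int
cmp_swapped_operand (uint32_t x, uint32_t c)
{
  return __builtin_bswap32 (x) == c;
}

/* Equivalent form: swap the constant instead.  This is only a win if
   the swap of the constant is actually folded away.  */
static int
cmp_swapped_constant (uint32_t x, uint32_t c)
{
  return x == __builtin_bswap32 (c);
}
```

If the swap of the constant is not folded, the rewritten form is no simpler
than the original, and applying the same rule again just moves the bswap
back, which is the recursion described above.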
2022-02-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/104644
* doc/match-and-simplify.texi: Amend ! documentation.
* genmatch.cc (expr::gen_transform): Code-generate ! support
for GENERIC.
(parser::parse_expr): Allow ! for GENERIC.
* match.pd (cmp (bswap @0) INTEGER_CST@1): Use ! modifier on
bswap.
* gcc.dg/pr104644.c: New test.
Co-Authored-by: Jakub Jelinek <jakub@redhat.com>
Currently we fail to parse
int * _3;
as SSA name and instead get a VAR_DECL because of the way the C
frontend's declarator specs work. That causes havoc if those
supposed SSA names are used in PHIs or in other places where
VAR_DECLs are not allowed. The following fixes the pointer case
in an ad-hoc way - for more complex type declarators we probably
have to find a way to re-use the C frontend grokdeclarator without
actually creating a VAR_DECL there (or maybe make it create an
SSA name).
Pointers appear too often to be neglected though, thus the following
ad-hoc fix for this. This also adds verification that we do not
end up with SSA names without definitions as can happen when
reducing a GIMPLE testcase. Instead of working through segfaults
one-by-one we emit errors for all of those at once now.
2022-02-23 Richard Biener <rguenther@suse.de>
gcc/c
* gimple-parser.cc (c_parser_parse_gimple_body): Diagnose
SSA names without definition.
(c_parser_gimple_declaration): Handle pointer typed SSA names.
gcc/testsuite/
* gcc.dg/gimplefe-49.c: New testcase.
* gcc.dg/gimplefe-error-13.c: Likewise.
The following fixes an ICE when vectorizing the defs of a CTOR
results in a different vector type than expected. That can happen
with AARCH64 SVE and a fixed vector length as noted in r10-5979
and on x86 with AVX512 mask CTORs and trying to re-vectorize
using SSE as shown in this bug.
The fix is simply to reject the vectorization when it didn't
produce the desired type.
2022-02-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/101636
* tree-vect-slp.cc (vect_print_slp_tree): Dump the
vector type of the node.
(vect_slp_analyze_operations): Make sure the CTOR
is vectorized with an expected type.
(vectorize_slp_instance_root_stmt): Revert r10-5979 fix.
* gcc.target/i386/pr101636.c: New testcase.
* c-c++-common/torture/pr101636.c: Likewise.
The first two testcases show different ways how e.g. the glibc
_FORTIFY_SOURCE wrappers are implemented, and on Winfinite-recursion-3.c
the new -Winfinite-recursion warning emits a false positive warning.
It is a false positive because when a builtin with 2 names is called
through the __builtin_ name (but not all builtins have a name prefixed
exactly like that) from extern inline function with gnu_inline semantics,
it doesn't mean the compiler will ever attempt to use the user inline
wrapper for the call, the __builtin_ just does what the builtin function
is expected to do and either expands into some compiler generated code,
or if the compiler decides to emit a call it will use an actual definition
of the function, but that is not the extern inline gnu_inline function
which is never emitted out of line.
Compared to that, in Winfinite-recursion-5.c the extern inline gnu_inline
wrapper calls the builtin by the same name as the function's name and in
that case it is infinite recursion, we actually try to inline the recursive
call and also error because the recursion is infinite during inlining;
without always_inline we wouldn't error but it is still infinite recursion,
the user has no control on how many recursive calls we actually inline.
2022-02-22 Jakub Jelinek <jakub@redhat.com>
PR c/104633
* gimple-warn-recursion.cc (pass_warn_recursion::find_function_exit):
Don't warn about calls to corresponding builtin from extern inline
gnu_inline wrappers.
* gcc.dg/Winfinite-recursion-3.c: New test.
* gcc.dg/Winfinite-recursion-4.c: New test.
* gcc.dg/Winfinite-recursion-5.c: New test.