Commit Graph

187830 Commits

Author SHA1 Message Date
Richard Biener
716a583692 c++/102228 - make lookup_anon_field O(1)
For the testcase in PR101555 lookup_anon_field takes the majority
of parsing time followed by get_class_binding_direct/fields_linear_search
which is PR83309.  The situation with anon aggregates is particularly
dire when we need to build accesses to their members and the anon
aggregates are nested.  There, for each such access, we recursively
build sub-accesses to the anon aggregate FIELD_DECLs bottom-up,
DFS searching for them.  That's inefficient since, as I believe,
there's a 1:1 relationship between anon aggregate types and the
FIELD_DECL used to place them.

The patch below does away with the search in lookup_anon_field and
instead records the single FIELD_DECL in the anon aggregate type's
lang-specific data, re-using the RTTI typeinfo_var field.  That
speeds up the compile of the testcase with -fsyntax-only from
about 4.5s to slightly less than 1s.
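
As a rough illustration of the idea (not the actual GCC code), the
following sketch contrasts a depth-first search with a cached lookup.
The struct layouts and the names record_anon_field,
lookup_anon_field_dfs and lookup_anon_field_cached are hypothetical
stand-ins for GCC's tree nodes and for ANON_AGGR_TYPE_FIELD; the sketch
assumes the 1:1 relationship discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC's tree nodes.  */
struct field;

struct type
{
  struct field *fields;      /* Linked list of members.  */
  struct field *anon_field;  /* Cached field that places this anon aggregate.  */
};

struct field
{
  struct type *type;         /* Type of this member.  */
  int is_anon_aggr;          /* Nonzero for an anonymous aggregate member.  */
  struct field *next;
};

/* Search-based lookup: walk nested anonymous aggregates depth-first.  */
struct field *
lookup_anon_field_dfs (struct type *t, struct type *anon)
{
  for (struct field *f = t->fields; f; f = f->next)
    {
      if (!f->is_anon_aggr)
        continue;
      if (f->type == anon)
        return f;
      struct field *sub = lookup_anon_field_dfs (f->type, anon);
      if (sub)
        return sub;
    }
  return NULL;
}

/* Record the placing field once, when the member is declared.  */
void
record_anon_field (struct type *enclosing, struct field *f)
{
  assert (f->is_anon_aggr && f->type->anon_field == NULL);
  f->next = enclosing->fields;
  enclosing->fields = f;
  f->type->anon_field = f;
}

/* Constant-time lookup that relies on the 1:1 relationship.  */
struct field *
lookup_anon_field_cached (struct type *anon)
{
  return anon->anon_field;
}
```

With nested anonymous aggregates the search-based variant revisits the
same members for every access, while the cached variant is a single
pointer load.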

I tried to poke holes into the 1:1 relationship idea with my C++
knowledge but failed (which might not say much).  It also leaves
a hole for the case when the C++ FE itself duplicates such a type
and places it at a semantically different position.  I've tried
to poke holes into it with the duplication mechanism I understand
(templates) but failed.

2021-09-08  Richard Biener  <rguenther@suse.de>

	PR c++/102228
gcc/cp/
	* cp-tree.h (ANON_AGGR_TYPE_FIELD): New define.
	* decl.c (fixup_anonymous_aggr): Wipe RTTI info put in
	place on invalid code.
	* decl2.c (reset_type_linkage): Guard CLASSTYPE_TYPEINFO_VAR
	access.
	* module.cc (trees_in::read_class_def): Likewise.  Reconstruct
	ANON_AGGR_TYPE_FIELD.
	* semantics.c (finish_member_declaration): Populate
	ANON_AGGR_TYPE_FIELD for anon aggregate typed members.
	* typeck.c (lookup_anon_field): Remove DFS search and return
	ANON_AGGR_TYPE_FIELD directly.
2021-09-08 17:43:40 +02:00
Joseph Myers
d27d694151 testsuite: Allow .sdata in more cases in gcc.dg/array-quals-1.c
When testing for Nios II (gcc-testresults shows this for MIPS as
well), failures of gcc.dg/array-quals-1.c appear where a symbol was
found in .sdata rather than one of the expected sections.

FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?a$ (found a) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?b$ (found b) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?c$ (found c) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?d$ (found d) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)

Jakub's commit 0b34dbc0a2 allowed .sdata
for many variables in that test where use of .sdata caused a failure
on powerpc-linux.  I'm presuming the choice of which variables had
.sdata allowed was based only on the code generated for powerpc-linux,
not on any reason it would be wrong to allow it for the other
variables; thus, this patch adjusts the test to allow .sdata for some
more variables where that is needed on Nios II (and in one case where
it's not needed on Nios II, but the test results on gcc-testresults
suggest that it is needed on MIPS).

Tested with no regressions with cross to nios2-elf.

	* gcc.dg/array-quals-1.c: Allow .sdata section in more cases.
2021-09-08 15:38:18 +00:00
Joseph Myers
d081516ae1 testsuite: Use explicit -ftree-cselim in tests using -fdump-tree-cselim-details
When testing for Nios II (gcc-testresults shows this for various other
targets as well), tests scanning cselim dumps produce an UNRESOLVED
result because those dumps do not exist.

cselim is enabled conditionally by code in toplev.c:

  if (flag_tree_cselim == AUTODETECT_VALUE)
    {
      if (HAVE_conditional_move)
	flag_tree_cselim = 1;
      else
	flag_tree_cselim = 0;
    }

Add explicit -ftree-cselim to dg-options in the affected tests (as
already used by some other tests of cselim dumps) so that this dump
exists on all architectures.
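
For illustration only, a test using the explicit option might look
roughly like the sketch below.  The function store_min is a made-up
example of a conditional store, not one of the affected tests, and the
dg-final scan directive is deliberately omitted because the exact dump
text is not quoted here.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */

/* A conditional store: the kind of code a cselim test would exercise.
   With -ftree-cselim given explicitly, the cselim dump file is
   requested regardless of HAVE_conditional_move.  */
void
store_min (int *a, unsigned k, int b)
{
  if (b < a[k])
    a[k] = b;
}
```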

Tested with no regressions with cross to nios2-elf, where this causes
the tests in question to PASS instead of being UNRESOLVED.

	* gcc.dg/tree-ssa/pr89430-1.c, gcc.dg/tree-ssa/pr89430-2.c,
	gcc.dg/tree-ssa/pr89430-3.c, gcc.dg/tree-ssa/pr89430-4.c,
	gcc.dg/tree-ssa/pr89430-5.c, gcc.dg/tree-ssa/pr89430-6.c,
	gcc.dg/tree-ssa/pr89430-7-comp-ref.c,
	gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c,
	gcc.dg/tree-ssa/pr99473-1.c: Use -ftree-cselim.
2021-09-08 14:57:20 +00:00
Segher Boessenkool
86e6268cff rs6000: Fix ELFv2 r12 use in epilogue
We cannot use r12 here, it is already in use as the GEP (for sibling
calls).

2021-09-08  Segher Boessenkool  <segher@kernel.crashing.org>
	PR target/102107
	* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): For ELFv2 use
	r11 instead of r12 for restoring CR.
2021-09-08 13:27:56 +00:00
Jakub Jelinek
7485a52551 i386: Fix up xorsign for AVX [PR89984]
Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.
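
The hazard described above can be modelled in plain C on the bit
patterns.  This is only a sketch of the semantics, not of the RTL: the
helper names (bits, from_bits, xorsign_safe, foo_clobbering,
foo_expected) are made up for the illustration, and float is assumed to
be 32-bit IEEE single precision.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_MASK UINT32_C (0x80000000)

static uint32_t
bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
from_bits (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* xorsign (x, y) is x * copysignf (1.0f, y): flip the sign of x when y
   is negative.  The masked value lives in a separate temporary.  */
float
xorsign_safe (float x, float y)
{
  uint32_t tmp = bits (y) & SIGN_MASK;
  return from_bits (bits (x) ^ tmp);
}

/* Models the problematic split: y is overwritten with y & mask, so the
   later addition sees a signed zero instead of y.  */
float
foo_clobbering (float x, float y)
{
  y = from_bits (bits (y) & SIGN_MASK);  /* vandps .LC0(%rip), %xmm1, %xmm1 */
  x = from_bits (bits (x) ^ bits (y));   /* vxorps %xmm0, %xmm1, %xmm0 */
  return x + y;                          /* vaddss %xmm1, %xmm0, %xmm0 */
}

/* What foo is supposed to compute.  */
float
foo_expected (float x, float y)
{
  return xorsign_safe (x, y) + y;
}
```

For x = 3.0f and y = -2.0f the expected result is -5.0f, while the
clobbering variant yields -3.0f because it adds -0.0f.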

On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> I'm curious why we need the  post_reload splitter @xorsign<mode>3_1
> for scalar mode, can't we just expand them into and/xor operations in
> the expander, just like vector modes did.

Following seems to work for all the testcases I've tried (and in some
generates better code than the post-reload splitter).

2021-09-08  Jakub Jelinek  <jakub@redhat.com>
	    liuhongt  <hongtao.liu@intel.com>

	PR target/89984
	* config/i386/i386.md (@xorsign<mode>3_1): Remove.
	* config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
	into AND with mask and XOR, using paradoxical subregs.
	(ix86_split_xorsign): Remove.
	* config/i386/i386-protos.h (ix86_split_xorsign): Remove.

	* gcc.target/i386/avx-pr102224.c: Fix up PR number.
	* gcc.dg/pr89984.c: New test.
	* gcc.target/i386/avx-pr89984.c: New test.
2021-09-08 14:06:10 +02:00
liuhongt
6576ad5add Compile __{mul,div}hc3 into libgcc_s.so.1.
libgcc/ChangeLog:

	* config/i386/t-softfp: Compile __{mul,div}hc3 into
	libgcc_s.so.1.
2021-09-08 19:18:15 +08:00
Di Zhao
7285f39455 tree-optimization/102183 - sccvn: fix result compare in vn_nary_op_insert_into
If the first predicate value is different and copied, the comparison will then
be between val->result and the copied one. That can cause inserting extra
vn_pvals.

gcc/ChangeLog:

	* tree-ssa-sccvn.c (vn_nary_op_insert_into): Fix result compare.
2021-09-08 18:47:18 +08:00
Jakub Jelinek
87d55da7d7 libgcc, i386: Export *hf* and *hc* from libgcc_s.so.1
The following patch exports it for Linux from config/i386/*.ver where it
IMNSHO belongs, aarch64 already exports some of those at GCC_11* and other
targets might add them at completely different gcc versions.

2021-09-08  Jakub Jelinek  <jakub@redhat.com>
	    Iain Sandoe  <iain@sandoe.co.uk>

	* config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
	and export *hf* and *hc* functions at GCC_12.0.0.
2021-09-08 11:34:45 +02:00
Jakub Jelinek
a7b626d98a i386: Fix up @xorsign<mode>3_1 [PR102224]
As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
with op1 & mask before using op0.

For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
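
A quick way to convince oneself of this identity is to check it on the
bit patterns.  The sketch below assumes 32-bit IEEE single precision;
the names float_bits, xorsign_same_bits and abs_bits are hypothetical
helpers for the illustration, not GCC functions.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_MASK UINT32_C (0x80000000)

/* Return the raw bit pattern of a float.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* dest = (op0 & mask) ^ op0, i.e. xorsign with both inputs equal.
   XORing the sign bit with itself always clears it.  */
uint32_t
xorsign_same_bits (float op0)
{
  uint32_t b = float_bits (op0);
  return (b & SIGN_MASK) ^ b;
}

/* dest = op0 & ~mask, i.e. abs computed on the bit pattern.  */
uint32_t
abs_bits (float op0)
{
  return float_bits (op0) & ~SIGN_MASK;
}
```

Both functions return the same value for every input, which is why the
same-operand case can be emitted as abs directly.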

The expander change is an optimization improvement: if we know at expansion
time that it is xorsign op0, op0, we can emit abs right away and get better
code through that.

The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
to have same operands during expansion, but during RTL optimizations they
would appear.  For non-AVX we need to use earlyclobber, we require
dest and op1 to be the same but op0 must be different because we overwrite
op1 first.  For AVX the constraints ensure that at most 2 of the 3 operands
may be the same register and if both inputs are the same, handles that case.
This case can be easily tested with the xorsign<mode>3 expander change
reverted.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally.  Because for AVX the new code
destructively modifies op1.  If that is different from dest, say on:
float
foo (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
        (unspec:SF [
                (reg:SF 20 xmm0 [88])
                (reg:SF 21 xmm1 [89])
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
            ] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
     (nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
        (plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
            (reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
     (nil))
but split the xorsign into:
        vandps  .LC0(%rip), %xmm1, %xmm1
        vxorps  %xmm0, %xmm1, %xmm0
and then the addition:
        vaddss  %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.

2021-09-08  Jakub Jelinek  <jakub@redhat.com>

	PR target/102224
	* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
	operands[2], emit abs<mode>2 instead.
	(@xorsign<mode>3_1): Add early-clobbers for output operand, enable
	first alternative even for avx, add another alternative with
	=&Yv <- 0, Yv, Yvm constraints.
	* config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
	to op1, emit vpandn instead.

	* gcc.dg/pr102224.c: New test.
	* gcc.target/i386/avx-pr102224.c: New test.
2021-09-08 11:25:31 +02:00
liuhongt
4a61bcaca0 AVX512FP16: Add abi test for zmm
gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/m512h/abi-avx512fp16-zmm.exp:
	New file.
	* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/asm-support.S: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/avx512fp16-zmm-check.h:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_m512_returning.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_m512.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_structs.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_passing_unions.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m512h/test_varargs-m512.c:
	Likewise.
2021-09-08 12:44:50 +08:00
liuhongt
07308cdb0c AVX512FP16: Add ABI test for ymm.
gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/m256h/abi-avx512fp16-ymm.exp:
	New exp file.
	* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: New header.
	* gcc.target/x86_64/abi/avx512fp16/m256h/avx512fp16-ymm-check.h:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/asm-support.S: New.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_m256_returning.c:
	New test.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_m256.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_structs.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_passing_unions.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/m256h/test_varargs-m256.c: Likewise.
2021-09-08 12:44:50 +08:00
H.J. Lu
22ce16ffa4 AVX512FP16: Add ABI tests for xmm.
Copied from regular XMM ABI tests. Only run AVX512FP16 ABI tests for ELF
targets.

gcc/testsuite/ChangeLog:

	* gcc.target/x86_64/abi/avx512fp16/abi-avx512fp16-xmm.exp: New exp
	file for abi test.
	* gcc.target/x86_64/abi/avx512fp16/args.h: New header file for abi test.
	* gcc.target/x86_64/abi/avx512fp16/avx512fp16-check.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/avx512fp16-xmm-check.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/defines.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/macros.h: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/asm-support.S: New asm for abi check.
	* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c:
	New test.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_alignment.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_array_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_sizes.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_struct_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_basic_union_size_and_align.c:
	Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_complex_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_floats.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_m64m128.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_structs.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_passing_unions.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c: Likewise.
	* gcc.target/x86_64/abi/avx512fp16/test_varargs-m128.c: Likewise.
2021-09-08 12:44:50 +08:00
H.J. Lu
5bbd88bb1e AVX512FP16: Add tests for vector passing in variable arguments.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-vararg-1.c: New test.
	* gcc.target/i386/avx512fp16-vararg-2.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-3.c: Ditto.
	* gcc.target/i386/avx512fp16-vararg-4.c: Ditto.
2021-09-08 12:44:50 +08:00
liuhongt
2f3318dbcf AVX512FP16: Add testcase for vector init and broadcast intrinsics.
gcc/testsuite/ChangeLog:

	* gcc.target/i386/m512-check.h: Add union128h, union256h, union512h.
	* gcc.target/i386/avx512fp16-10a.c: New test.
	* gcc.target/i386/avx512fp16-10b.c: Ditto.
	* gcc.target/i386/avx512fp16-1a.c: Ditto.
	* gcc.target/i386/avx512fp16-1b.c: Ditto.
	* gcc.target/i386/avx512fp16-1c.c: Ditto.
	* gcc.target/i386/avx512fp16-1d.c: Ditto.
	* gcc.target/i386/avx512fp16-1e.c: Ditto.
	* gcc.target/i386/avx512fp16-2a.c: Ditto.
	* gcc.target/i386/avx512fp16-2b.c: Ditto.
	* gcc.target/i386/avx512fp16-2c.c: Ditto.
	* gcc.target/i386/avx512fp16-3a.c: Ditto.
	* gcc.target/i386/avx512fp16-3b.c: Ditto.
	* gcc.target/i386/avx512fp16-3c.c: Ditto.
	* gcc.target/i386/avx512fp16-4.c: Ditto.
	* gcc.target/i386/avx512fp16-5.c: Ditto.
	* gcc.target/i386/avx512fp16-6.c: Ditto.
	* gcc.target/i386/avx512fp16-7.c: Ditto.
	* gcc.target/i386/avx512fp16-8.c: Ditto.
	* gcc.target/i386/avx512fp16-9a.c: Ditto.
	* gcc.target/i386/avx512fp16-9b.c: Ditto.
	* gcc.target/i386/pr54855-13.c: Ditto.
	* gcc.target/i386/avx512fp16-vec_set_var.c: Ditto.
2021-09-08 12:44:50 +08:00
liuhongt
9e2a82e1f9 AVX512FP16: Support vector init/broadcast/set/extract for FP16.
gcc/ChangeLog:

	* config/i386/avx512fp16intrin.h (_mm_set_ph): New intrinsic.
	(_mm256_set_ph): Likewise.
	(_mm512_set_ph): Likewise.
	(_mm_setr_ph): Likewise.
	(_mm256_setr_ph): Likewise.
	(_mm512_setr_ph): Likewise.
	(_mm_set1_ph): Likewise.
	(_mm256_set1_ph): Likewise.
	(_mm512_set1_ph): Likewise.
	(_mm_setzero_ph): Likewise.
	(_mm256_setzero_ph): Likewise.
	(_mm512_setzero_ph): Likewise.
	(_mm_set_sh): Likewise.
	(_mm_load_sh): Likewise.
	(_mm_store_sh): Likewise.
	* config/i386/i386-builtin-types.def (V8HF): New type.
	(DEF_FUNCTION_TYPE (V8HF, V8HI)): New builtin function type.
	* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
	Support vector HFmodes.
	(ix86_expand_vector_init_one_nonzero): Likewise.
	(ix86_expand_vector_init_one_var): Likewise.
	(ix86_expand_vector_init_interleave): Likewise.
	(ix86_expand_vector_init_general): Likewise.
	(ix86_expand_vector_set): Likewise.
	(ix86_expand_vector_extract): Likewise.
	(ix86_expand_vector_init_concat): Likewise.
	(ix86_expand_sse_movcc): Handle vector HFmodes.
	(ix86_expand_vector_set_var): Ditto.
	* config/i386/i386-modes.def: Add HF vector modes in comment.
	* config/i386/i386.c (classify_argument): Add HF vector modes.
	(ix86_hard_regno_mode_ok): Allow HF vector modes for AVX512FP16.
	(ix86_vector_mode_supported_p): Likewise.
	(ix86_set_reg_reg_cost): Handle vector HFmode.
	(ix86_get_ssemov): Handle vector HFmode.
	(function_arg_advance_64): Pass unnamed V16HFmode and V32HFmode
	by stack.
	(function_arg_advance_32): Pass V8HF/V16HF/V32HF by sse reg for 32bit
	mode.
	(function_arg_advance_32): Ditto.
	* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): New.
	(VALID_AVX256_REG_OR_OI_MODE): Rename to ..
	(VALID_AVX256_REG_OR_OI_VHF_MODE): .. this, and add V16HF.
	(VALID_SSE2_REG_VHF_MODE): New.
	(VALID_AVX512VL_128_REG_MODE): Add V8HF and TImode.
	(SSE_REG_MODE_P): Add vector HFmode.
	* config/i386/i386.md (mode): Add HF vector modes.
	(MODE_SIZE): Likewise.
	(ssemodesuffix): Add ph suffix for HF vector modes.
	* config/i386/sse.md (VFH_128): New mode iterator.
	(VMOVE): Adjust for HF vector modes.
	(V): Likewise.
	(V_256_512): Likewise.
	(avx512): Likewise.
	(avx512fmaskmode): Likewise.
	(shuffletype): Likewise.
	(sseinsnmode): Likewise.
	(ssedoublevecmode): Likewise.
	(ssehalfvecmode): Likewise.
	(ssehalfvecmodelower): Likewise.
	(ssePScmode): Likewise.
	(ssescalarmode): Likewise.
	(ssescalarmodelower): Likewise.
	(sseintprefix): Likewise.
	(i128): Likewise.
	(bcstscalarsuff): Likewise.
	(xtg_mode): Likewise.
	(VI12HF_AVX512VL): New mode_iterator.
	(VF_AVX512FP16): Likewise.
	(VIHF): Likewise.
	(VIHF_256): Likewise.
	(VIHF_AVX512BW): Likewise.
	(V16_256): Likewise.
	(V32_512): Likewise.
	(sseintmodesuffix): New mode_attr.
	(sse): Add scalar and vector HFmodes.
	(ssescalarmode): Add vector HFmode mapping.
	(ssescalarmodesuffix): Add sh suffix for HFmode.
	(*<sse>_vm<insn><mode>3): Use VFH_128.
	(*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
	(*ieee_<ieee_maxmin><mode>3): Likewise.
	(<avx512>_blendm<mode>): New define_insn.
	(vec_setv8hf): New define_expand.
	(vec_set<mode>_0): New define_insn for HF vector set.
	(*avx512fp16_movsh): Likewise.
	(avx512fp16_movsh): Likewise.
	(vec_extract_lo_v32hi): Rename to ...
	(vec_extract_lo_<mode>): ... this, and adjust to allow HF
	vector modes.
	(vec_extract_hi_v32hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(vec_extract_lo_v16hi): Likewise.
	(vec_extract_lo_<mode>): Likewise.
	(vec_extract_hi_v16hi): Likewise.
	(vec_extract_hi_<mode>): Likewise.
	(vec_set_hi_v16hi): Likewise.
	(vec_set_hi_<mode>): Likewise.
	(vec_set_lo_v16hi): Likewise.
	(vec_set_lo_<mode>): Likewise.
	(*vec_extract<mode>_0): New define_insn_and_split for HF
	vector extract.
	(*vec_extracthf): New define_insn.
	(VEC_EXTRACT_MODE): Add HF vector modes.
	(PINSR_MODE): Add V8HF.
	(sse2p4_1): Likewise.
	(pinsr_evex_isa): Likewise.
	(<sse2p4_1>_pinsr<ssemodesuffix>): Adjust to support
	insert for V8HFmode.
	(pbroadcast_evex_isa): Add HF vector modes.
	(AVX2_VEC_DUP_MODE): Likewise.
	(VEC_INIT_MODE): Likewise.
	(VEC_INIT_HALF_MODE): Likewise.
	(avx2_pbroadcast<mode>): Adjust to support HF vector mode
	broadcast.
	(avx2_pbroadcast<mode>_1): Likewise.
	(<avx512>_vec_dup<mode>_1): Likewise.
	(<avx512>_vec_dup<mode><mask_name>): Likewise.
	(<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>):
	Likewise.
2021-09-08 12:44:50 +08:00
Guo, Xuepeng
a68412117f AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.
gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_available_features):
	Detect FEATURE_AVX512FP16.
	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_AVX512FP16_SET,
	OPTION_MASK_ISA_AVX512FP16_UNSET,
	OPTION_MASK_ISA2_AVX512FP16_SET,
	OPTION_MASK_ISA2_AVX512FP16_UNSET): New.
	(OPTION_MASK_ISA2_AVX512BW_UNSET,
	OPTION_MASK_ISA2_AVX512BF16_UNSET): Add AVX512FP16.
	(ix86_handle_option): Handle -mavx512fp16.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_AVX512FP16.
	* common/config/i386/i386-isas.h: Add entry for AVX512FP16.
	* config.gcc: Add avx512fp16intrin.h.
	* config/i386/avx512fp16intrin.h: New intrinsic header.
	* config/i386/cpuid.h: Add bit_AVX512FP16.
	* config/i386/i386-builtin-types.def (FLOAT16): New primitive type.
	* config/i386/i386-builtins.c: Support _Float16 type for i386
	backend.
	(ix86_register_float16_builtin_type): New function.
	(ix86_float16_type_node): New.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__AVX512FP16__.
	* config/i386/i386-expand.c (ix86_expand_branch): Support
	HFmode.
	(ix86_prepare_fp_compare_args): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_expand_fp_movcc): Ditto.
	* config/i386/i386-isa.def: Add PTA define for AVX512FP16.
	* config/i386/i386-options.c (isa2_opts): Add -mavx512fp16.
	(ix86_valid_target_attribute_inner_p): Add avx512fp16 attribute.
	* config/i386/i386.c (ix86_get_ssemov): Use
	vmovdqu16/vmovw/vmovsh for HFmode/HImode scalar or vector.
	(ix86_get_excess_precision): Use
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_AVX512FP16
	is enabled.
	(sse_store_index): Use SFmode cost for HFmode cost.
	(inline_memory_move_cost): Add HFmode, and prefer SSE cost over
	GPR cost for HFmode.
	(ix86_hard_regno_mode_ok): Allow HImode in sse register.
	(ix86_mangle_type): Add mangling for _Float16 type.
	(inline_secondary_memory_needed): No memory is needed for
	16bit movement between gpr and sse reg under
	TARGET_AVX512FP16.
	(ix86_multiplication_cost): Adjust TARGET_SSE_MATH &&
	SSE_FLOAT_MODE_P to SSE_FLOAT_MODE_SSEMATH_OR_HF_P.
	(ix86_division_cost): Ditto.
	(ix86_rtx_costs): Ditto.
	(ix86_add_stmt_cost): Ditto.
	(ix86_optab_supported_p): Ditto.
	* config/i386/i386.h (VALID_AVX512F_SCALAR_MODE): Add HFmode.
	(SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Add HFmode.
	(PTA_SAPPHIRERAPIDS): Add PTA_AVX512FP16.
	* config/i386/i386.md (mode): Add HFmode.
	(MODE_SIZE): Add HFmode.
	(isa): Add avx512fp16.
	(enabled): Handle avx512fp16.
	(ssemodesuffix): Add sh suffix for HFmode.
	(comm): Add mult, div.
	(plusminusmultdiv): New code iterator.
	(insn): Add mult, div.
	(*movhf_internal): Adjust for avx512fp16 instruction.
	(*movhi_internal): Ditto.
	(*cmpi<unord>hf): New define_insn for HFmode.
	(*ieee_s<ieee_maxmin>hf3): Likewise.
	(extendhf<mode>2): Likewise.
	(trunc<mode>hf2): Likewise.
	(float<floatunssuffix><mode>hf2): Likewise.
	(*<insn>hf): Likewise.
	(cbranchhf4): New expander.
	(movhfcc): Likewise.
	(<insn>hf3): Likewise.
	(mulhf3): Likewise.
	(divhf3): Likewise.
	* config/i386/i386.opt: Add mavx512fp16.
	* config/i386/immintrin.h: Include avx512fp16intrin.h.
	* doc/invoke.texi: Add mavx512fp16.
	* doc/extend.texi: Add avx512fp16 Usage Notes.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx-1.c: Add -mavx512fp16 in dg-options.
	* gcc.target/i386/avx-2.c: Ditto.
	* gcc.target/i386/avx512-check.h: Check cpuid for AVX512FP16.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute check.
	* gcc.target/i386/sse-13.c: Add -mavx512fp16.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* lib/target-supports.exp (check_effective_target_avx512fp16): New.
	* g++.target/i386/float16-1.C: New test.
	* g++.target/i386/float16-2.C: Ditto.
	* g++.target/i386/float16-3.C: Ditto.
	* gcc.target/i386/avx512fp16-12a.c: Ditto.
	* gcc.target/i386/avx512fp16-12b.c: Ditto.
	* gcc.target/i386/float16-3a.c: Ditto.
	* gcc.target/i386/float16-3b.c: Ditto.
	* gcc.target/i386/float16-4a.c: Ditto.
	* gcc.target/i386/float16-4b.c: Ditto.
	* gcc.target/i386/pr54855-12.c: Ditto.
	* g++.dg/other/i386-2.C: Ditto.
	* g++.dg/other/i386-3.C: Ditto.

Co-Authored-By: H.J. Lu <hongjiu.lu@intel.com>
Co-Authored-By: Liu Hongtao <hongtao.liu@intel.com>
Co-Authored-By: Wang Hongyu <hongyu.wang@intel.com>
Co-Authored-By: Xu Dianhong <dianhong.xu@intel.com>
2021-09-08 12:44:50 +08:00
liuhongt
f19a327077 Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
gcc/ada/ChangeLog:

	* gcc-interface/misc.c (gnat_post_options): Issue an error for
	-fexcess-precision=16.

gcc/c-family/ChangeLog:

	* c-common.c (excess_precision_mode_join): Update below comments.
	(c_ts18661_flt_eval_method): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
	* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
	(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
	EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.

gcc/ChangeLog:

	* common.opt: Support -fexcess-precision=16.
	* config/aarch64/aarch64.c (aarch64_excess_precision): Return
	FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when
	EXCESS_PRECISION_TYPE_FLOAT16.
	* config/arm/arm.c (arm_excess_precision): Ditto.
	* config/i386/i386.c (ix86_get_excess_precision): Ditto.
	* config/m68k/m68k.c (m68k_excess_precision): Issue an error
	when EXCESS_PRECISION_TYPE_FLOAT16.
	* config/s390/s390.c (s390_excess_precision): Ditto.
	* coretypes.h (enum excess_precision_type): Add
	EXCESS_PRECISION_TYPE_FLOAT16.
	* doc/tm.texi (TARGET_C_EXCESS_PRECISION): Update documents.
	* doc/tm.texi.in (TARGET_C_EXCESS_PRECISION): Ditto.
	* doc/extend.texi (Half-Precision): Document
	-fexcess-precision=16.
	* flag-types.h (enum excess_precision): Add
	EXCESS_PRECISION_FLOAT16.
	* target.def (excess_precision): Update document.
	* tree.c (excess_precision_type): Set excess_precision_type to
	EXCESS_PRECISION_FLOAT16 when -fexcess-precision=16.

gcc/fortran/ChangeLog:

	* options.c (gfc_post_options): Issue an error for
	-fexcess-precision=16.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/float16-6.c: New test.
	* gcc.target/i386/float16-7.c: New test.
2021-09-08 12:44:49 +08:00
liuhongt
a549a9a39a Adjust the wording for x86 _Float16 type.
gcc/ChangeLog:

	* doc/extend.texi (@node Floating Types): Adjust the wording.
	(@node Half-Precision): Ditto.
2021-09-08 09:10:45 +08:00
GCC Administrator
b2748138c0 Daily bump.
2021-09-08 00:16:23 +00:00
Max Filippov
b552c4e601 gcc: xtensa: fix PR target/102115
2021-09-07  Takayuki 'January June' Suwa  <jjsuwa_sys3175@yahoo.co.jp>
gcc/
	PR target/102115
	* config/xtensa/xtensa.c (xtensa_emit_move_sequence): Add
	'CONST_INT_P (src)' to the condition of the block that tries to
	eliminate literal when loading integer constant.
2021-09-07 15:40:26 -07:00
Ian Lance Taylor
21b046bade runtime: use hash32, not hash64, for amd64p32, mips64p32, mips64p32le
Fixes PR go/102102

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/348015
2021-09-07 15:05:11 -07:00
David Faust
d9996ccb94 doc: BPF CO-RE documentation
Document the new command line options (-mco-re and -mno-co-re), the new
BPF target builtin (__builtin_preserve_access_index), and the new BPF
target attribute (preserve_access_index) introduced with BPF CO-RE.
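
As a hedged sketch of how the builtin and the attribute named above
might be used: the macros HAVE_PRESERVE_ACCESS_INDEX, CORE_ACCESS and
CORE_STRUCT, the structure task_info and the function read_tgid are all
hypothetical names made up for this illustration.  On compilers or
targets that do not provide the builtin, the macros fall back to a plain
access, so the snippet also builds outside a BPF toolchain; on a BPF
target it presumably needs -mco-re to have any effect.

```c
/* Detect the builtin; elsewhere this sketch degrades to plain accesses.  */
#if defined (__has_builtin)
# if __has_builtin (__builtin_preserve_access_index)
#  define HAVE_PRESERVE_ACCESS_INDEX 1
# endif
#endif

#ifdef HAVE_PRESERVE_ACCESS_INDEX
# define CORE_ACCESS(expr) __builtin_preserve_access_index (expr)
# define CORE_STRUCT __attribute__ ((preserve_access_index))
#else
# define CORE_ACCESS(expr) (expr)
# define CORE_STRUCT
#endif

/* Hypothetical kernel-like structure whose layout may differ between
   the compile-time and run-time kernels.  */
struct task_info
{
  int pid;
  int tgid;
};

/* Same layout, but marked with the attribute: the intent is that member
   accesses through this type are recorded without an explicit builtin
   call.  */
struct task_info_attr
{
  int pid;
  int tgid;
} CORE_STRUCT;

/* Access wrapped in the builtin, so that a CO-RE relocation can be
   recorded for the member access.  */
int
read_tgid (const struct task_info *t)
{
  return CORE_ACCESS (t->tgid);
}

/* Access through the attributed type.  */
int
read_tgid_attr (const struct task_info_attr *t)
{
  return t->tgid;
}
```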

gcc/ChangeLog:

	* doc/extend.texi (BPF Type Attributes): New node.
	Document new preserve_access_index attribute.
	Document new preserve_access_index builtin.
	* doc/invoke.texi: Document -mco-re and -mno-co-re options.
2021-09-07 13:48:58 -07:00
David Faust
f4cdfd4856 bpf testsuite: Add BPF CO-RE tests
This commit adds several tests for the new BPF CO-RE functionality to
the BPF target testsuite.

gcc/testsuite/ChangeLog:

	* gcc.target/bpf/core-attr-1.c: New test.
	* gcc.target/bpf/core-attr-2.c: Likewise.
	* gcc.target/bpf/core-attr-3.c: Likewise.
	* gcc.target/bpf/core-attr-4.c: Likewise.
	* gcc.target/bpf/core-builtin-1.c: Likewise.
	* gcc.target/bpf/core-builtin-2.c: Likewise.
	* gcc.target/bpf/core-builtin-3.c: Likewise.
	* gcc.target/bpf/core-section-1.c: Likewise.
2021-09-07 13:48:58 -07:00
David Faust
8bdabb3754 bpf: BPF CO-RE support
This commit introduces support for BPF Compile Once - Run
Everywhere (CO-RE) in GCC.

gcc/ChangeLog:

	* config/bpf/bpf.c: Adjust includes.
	(bpf_handle_preserve_access_index_attribute): New function.
	(bpf_attribute_table): Use it here.
	(bpf_builtins): Add BPF_BUILTIN_PRESERVE_ACCESS_INDEX.
	(bpf_option_override): Handle "-mco-re" option.
	(bpf_asm_init_sections): New.
	(TARGET_ASM_INIT_SECTIONS): Redefine.
	(bpf_file_end): New.
	(TARGET_ASM_FILE_END): Redefine.
	(bpf_init_builtins): Add "__builtin_preserve_access_index".
	(bpf_core_compute, bpf_core_get_index): New.
	(is_attr_preserve_access): New.
	(bpf_expand_builtin): Handle new builtins.
	(bpf_core_newdecl, bpf_core_is_maybe_aggregate_access): New.
	(bpf_core_walk): New.
	(bpf_resolve_overloaded_builtin): New.
	(TARGET_RESOLVE_OVERLOADED_BUILTIN): Redefine.
	(handle_attr): New.
	(pass_bpf_core_attr): New RTL pass.
	* config/bpf/bpf-passes.def: New file.
	* config/bpf/bpf-protos.h (make_pass_bpf_core_attr): New.
	* config/bpf/coreout.c: New file.
	* config/bpf/coreout.h: Likewise.
	* config/bpf/t-bpf (TM_H): Add $(srcdir)/config/bpf/coreout.h.
	(coreout.o): New rule.
	(PASSES_EXTRA): Add $(srcdir)/config/bpf/bpf-passes.def.
	* config.gcc (bpf): Add coreout.h to extra_headers.
	Add coreout.o to extra_objs.
	Add $(srcdir)/config/bpf/coreout.c to target_gtfiles.
2021-09-07 13:48:58 -07:00
David Faust
0a2bd52f1a btf: expose get_btf_id
Expose the function get_btf_id, so that it may be used by the BPF
backend. This enables the BPF CO-RE machinery in the BPF backend to
look up BTF type IDs, in order to create CO-RE relocation records.

A prototype is added in ctfc.h.

gcc/ChangeLog:

	* btfout.c (get_btf_id): Function is no longer static.
	* ctfc.h: Expose it here.
2021-09-07 13:48:58 -07:00
David Faust
5b723401b3 ctfc: add function to lookup CTF ID of a TREE type
Add a new function, ctf_lookup_tree_type, to return the CTF type ID
associated with a type via its TREE node. The function is exposed via
a prototype in ctfc.h.

gcc/ChangeLog:

	* ctfc.c (ctf_lookup_tree_type): New function.
	* ctfc.h: Likewise.
2021-09-07 13:48:58 -07:00
David Faust
44e4ed6a3c ctfc: externalize ctf_dtd_lookup
Expose the function ctf_dtd_lookup, so that it can be used by the BPF
CO-RE machinery. The function is no longer static, and an extern
prototype is added in ctfc.h.

gcc/ChangeLog:

	* ctfc.c (ctf_dtd_lookup): Function is no longer static.
	* ctfc.h: Analogous change.
2021-09-07 13:48:58 -07:00
David Faust
81eced213c dwarf: externalize lookup_type_die
Expose the function lookup_type_die in dwarf2out, so that it can be used
by CTF/BTF when adding BPF CO-RE information. The function is now
non-static, and an extern prototype is added in dwarf2out.h.

gcc/ChangeLog:

	* dwarf2out.c (lookup_type_die): Function is no longer static.
	* dwarf2out.h: Expose it here.
2021-09-07 13:48:57 -07:00
Hans-Peter Nilsson
578cd82af7 Fix fatal typo in gcc.dg/no_profile_instrument_function-attr-2.c
Dejagnu is unfortunately brittle: a syntax error in a
directive can abort the test-run for the current "tool"
(gcc, g++, gfortran), and if you don't check for this
condition or actually read the stdout log yourself, your
tools may make you believe the test was successful without
regressions.  At the very least, always grep for ^ERROR: in
the stdout log!

With r12-3379, the testsuite got such a fatal syntax error,
causing the gcc test-run to abort at (e.g.):

...
FAIL: gcc.dg/memchr.c (test for excess errors)
FAIL: gcc.dg/memcmp-3.c (test for excess errors)
ERROR: (DejaGnu) proc "scan-tree-dump-not\" = foo {\(\)"} optimized" does not exist.
The error code is TCL LOOKUP COMMAND scan-tree-dump-not\"
The info on the error is:
invalid command name "scan-tree-dump-not""
    while executing
"::tcl_unknown scan-tree-dump-not\" = foo {\(\)"} optimized"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 ::tcl_unknown $args"

		=== gcc Summary ===

# of expected passes		63740
# of unexpected failures	38
# of unexpected successes	2
# of expected failures		351
# of unresolved testcases	3
# of unsupported tests		662
x/cris-elf/gccobj/gcc/xgcc  version 12.0.0 20210907 (experimental)\
 [master r12-3391-g849d5f5929fc] (GCC)

testsuite:
	* gcc.dg/no_profile_instrument_function-attr-2.c: Fix
	typo in last change.
2021-09-07 22:36:59 +02:00
Harald Anlauf
2a1537a19c Fortran - improve error recovery determining array element from constructor
gcc/fortran/ChangeLog:

	PR fortran/101327
	* expr.c (find_array_element): When bounds cannot be determined as
	constant, return error instead of aborting.

gcc/testsuite/ChangeLog:

	PR fortran/101327
	* gfortran.dg/pr101327.f90: New test.
2021-09-07 20:51:49 +02:00
Indu Bhagat
849d5f5929 dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase
DWARF generation is split between early and late phases when LTO is in effect.
This poses challenges for CTF/BTF generation especially if late debug info
generation is desirable, as turns out to be the case for BPF CO-RE.

The approach taken here in this patch is:

1. LTO is disabled for BPF CO-RE
The reason to disable LTO for BPF CO-RE is that if LTO is in effect, BPF CO-RE
relocations need to be generated in the LTO link phase _after_ the optimizations
are done. This means we need to devise a way to combine early and late BTF. At
this time, in absence of linker support for BTF sections, it makes sense to
steer clear of LTO for BPF CO-RE and bypass the issue.

2. The BPF backend updates the write_symbols with BTF_WITH_CORE_DEBUG to convey
the case that BTF with CO-RE support needs to be generated.  This information
is used by the debug info emission routines to defer the emission of BTF/CO-RE
until dwarf2out_finish.

So, in other words,

dwarf2out_early_finish
  - Always emit CTF here.
  - if (BTF && !BTF_WITH_CORE), emit BTF now.

dwarf2out_finish
  - if (BTF_WITH_CORE) emit BTF now.
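
The split above can be modelled with a small standalone sketch.  This is
not the dwarf2out.c code: struct debug_config, the EMIT_* flags and both
function names are hypothetical, and the "ctf" flag is an assumption
added so the model can represent a compile without CTF.

```c
#include <stdbool.h>

/* Hypothetical bit flags naming what a phase emits.  */
enum { EMIT_NONE = 0, EMIT_CTF = 1, EMIT_BTF = 2 };

/* Hypothetical stand-in for the debug formats selected in write_symbols.  */
struct debug_config
{
  bool ctf;            /* CTF requested (assumption of this model).  */
  bool btf;            /* BTF requested.  */
  bool btf_with_core;  /* BTF with CO-RE relocations requested.  */
};

/* Models dwarf2out_early_finish: CTF is never deferred, and BTF is
   emitted here unless CO-RE needs it to wait for the late phase.  */
static unsigned
model_early_finish (const struct debug_config *cfg)
{
  unsigned emitted = EMIT_NONE;
  if (cfg->ctf)
    emitted |= EMIT_CTF;
  if (cfg->btf && !cfg->btf_with_core)
    emitted |= EMIT_BTF;
  return emitted;
}

/* Models dwarf2out_finish: only the deferred BTF/CO-RE case emits here.  */
static unsigned
model_finish (const struct debug_config *cfg)
{
  return cfg->btf_with_core ? EMIT_BTF : EMIT_NONE;
}
```

In this model a configuration emits BTF in exactly one of the two phases,
never both.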

gcc/ChangeLog:

	* dwarf2ctf.c (ctf_debug_finalize): Make it static.
	(ctf_debug_early_finish): New definition.
	(ctf_debug_finish): Likewise.
	* dwarf2ctf.h (ctf_debug_finalize): Remove declaration.
	(ctf_debug_early_finish): New declaration.
	(ctf_debug_finish): Likewise.
	* dwarf2out.c (dwarf2out_finish): Invoke ctf_debug_finish.
	(dwarf2out_early_finish): Invoke ctf_debug_early_finish.
2021-09-07 11:18:54 -07:00
Indu Bhagat
e29a9607fa bpf: Add new -mco-re option for BPF CO-RE
-mco-re in the BPF backend enables code generation for the CO-RE usecase. LTO is
disabled for CO-RE compilations.

gcc/ChangeLog:

	* config/bpf/bpf.c (bpf_option_override): For BPF backend, disable LTO
	support when compiling for CO-RE.
	* config/bpf/bpf.opt: Add new command line option -mco-re.

gcc/testsuite/ChangeLog:

	* gcc.target/bpf/core-lto-1.c: New test.
2021-09-07 11:17:55 -07:00
Indu Bhagat
053db9a49b debug: Add BTF_WITH_CORE_DEBUG debug format
To best handle BTF/CO-RE in GCC, a distinct BTF_WITH_CORE_DEBUG debug format is
being added.  This helps the compiler detect whether BTF with CO-RE relocations
needs to be emitted.

gcc/ChangeLog:

	* flag-types.h (enum debug_info_type): Add new enum
	DINFO_TYPE_BTF_WITH_CORE.
	(BTF_WITH_CORE_DEBUG): New bitmask.
	* flags.h (btf_with_core_debuginfo_p): New declaration.
	* opts.c (btf_with_core_debuginfo_p): New definition.
2021-09-07 11:16:53 -07:00
Jason Merrill
c03db573b9 tree: Change error_operand_p to an inline function
I've thought for a while that many of the macros in tree.h and such should
become inline functions.  This one in particular was confusing Coverity; the
null check in the macro made it think that all code guarded by
error_operand_p would also need null checks.
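
As an illustration only, the macro-to-inline conversion looks roughly
like the sketch below.  It uses a toy "struct node" and "error_mark"
object as hypothetical stand-ins for tree and error_mark_node, and is
not the tree.h definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a tree node; only the type link matters here.  */
struct node
{
  struct node *type;
};

/* Toy stand-in for error_mark_node.  */
static struct node error_mark;

/* Macro form: the null check is expanded at every use site, so a
   static analyzer sees an explicit "N may be null" test in the caller.  */
#define ERROR_OPERAND_P_MACRO(N) \
  ((N) == &error_mark || ((N) != NULL && (N)->type == &error_mark))

/* Inline-function form: same result, but the null check now lives
   inside the callee instead of in each caller.  */
static inline bool
error_operand_p_fn (const struct node *n)
{
  return n == &error_mark || (n != NULL && n->type == &error_mark);
}
```

Both forms give the same answer for every input; the difference is only
where the null test appears.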

gcc/ChangeLog:

	* tree.h (error_operand_p): Change to inline function.
2021-09-07 14:09:38 -04:00
Jakub Jelinek
81f9718139 c++: Fix up constexpr evaluation of deleting dtors [PR100495]
We do not save bodies of constexpr clones and instead evaluate the bodies
of the constexpr functions they were cloned from.
I believe that is just fine for constructors because complete vs. base
ctors differ only in classes that have virtual bases and such constructors
aren't constexpr, similarly complete/base destructors.
But as the testcase below shows, that is not fine for deleting destructors:
although marked as clones, deleting dtors are in fact just artificial
functions with a synthesized body which calls the user destructor and
deallocation.

So, either we'd need to evaluate the destructor and afterwards synthesize
and evaluate the deallocation, or we can just save and use the deleting
dtors bodies.  The latter seems much easier to me.

2021-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR c++/100495
	* constexpr.c (maybe_save_constexpr_fundef): Save body even for
	constexpr deleting dtors.
	(cxx_eval_call_expression): Don't use DECL_CLONED_FUNCTION for
	deleting dtors.

	* g++.dg/cpp2a/constexpr-new21.C: New test.
2021-09-07 19:33:28 +02:00
Tobias Burnus
ff7bc505b1 libgomp.texi: Extend OpenMP 5.0 Implementation Status
libgomp/
	* libgomp.texi (OpenMP Implementation Status): Extend
	OpenMP 5.0 section.
	(OpenACC Profiling Interface): Fix typo.
2021-09-07 18:30:25 +02:00
Aldy Hernandez
020e2db0a8 Rename forwarder_block_p in threading code to empty_block_with_phis_p.
gcc/ChangeLog:

	* tree-ssa-threadedge.c (forwarder_block_p): Rename to...
	(empty_block_with_phis_p): ...this.
	(potentially_threadable_block): Same.
	(jump_threader::thread_through_normal_block): Same.
2021-09-07 18:04:36 +02:00
Tobias Burnus
fc4f0631de libgfortran: Makefile fix for ISO_Fortran_binding.h
libgfortran/ChangeLog:

	* Makefile.am (gfor_built_src): Depend on
	include/ISO_Fortran_binding.h not on ISO_Fortran_binding.h.
	(ISO_Fortran_binding.h): Rename make target to ...
	(include/ISO_Fortran_binding.h): ... this.
	* Makefile.in: Regenerate.
2021-09-07 17:46:05 +02:00
Eric Botcazou
81e9178fe7 Fix PR debug/101947
This is the recent LTO bootstrap failure with Ada enabled.  The compiler now
generates DW_OP_deref_type for a unit of the Ada front-end, which means that
the offset of base types in the CU must be computed during early DWARF too.

gcc/
	PR debug/101947
	* dwarf2out.c (mark_base_types): New overloaded function.
	(dwarf2out_early_finish): Invoke it on the COMDAT type list as well
	as the compilation unit, and call move_marked_base_types afterward.
2021-09-07 15:42:51 +02:00
H.J. Lu
ad9fcb961c x86: Enable FMA in unsigned SI to SF expanders
Enable FMA in scalar/vector unsigned SI to SF expanders.  Don't check
TARGET_AVX512F which has vcvtusi2ss and vcvtudq2ps instructions.

gcc/

	PR target/85819
	* config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse):
	Enable FMA.
	(ix86_expand_vector_convert_uns_vsivsf): Likewise.

gcc/testsuite/

	PR target/85819
	* gcc.target/i386/pr85819-1a.c: New test.
	* gcc.target/i386/pr85819-1b.c: Likewise.
	* gcc.target/i386/pr85819-2a.c: Likewise.
	* gcc.target/i386/pr85819-2b.c: Likewise.
	* gcc.target/i386/pr85819-2c.c: Likewise.
	* gcc.target/i386/pr85819-3.c: Likewise.
2021-09-07 05:28:07 -07:00
Richard Biener
843068149e tree-optimization/102226 - fix epilogue vector re-use
This fixes re-use of the reduction value in epilogue vectorization
when a conversion from/to variable length vectors is required.

2021-09-07  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/102226
	* tree-vect-loop.c (vect_transform_cycle_phi): Record
	the converted value for the epilogue PHI use.

	* g++.dg/vect/pr102226.cc: New testcase.
2021-09-07 13:10:37 +02:00
Marcel Vollweiler
ba1cc6956b C, C++, Fortran, OpenMP: Add support for 'flush seq_cst' construct.
This patch adds support for the 'seq_cst' memory order clause on the 'flush'
directive which was introduced in OpenMP 5.1.

gcc/c-family/ChangeLog:

	* c-omp.c (c_finish_omp_flush): Handle MEMMODEL_SEQ_CST.

gcc/c/ChangeLog:

	* c-parser.c (c_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
	directive.

gcc/cp/ChangeLog:

	* parser.c (cp_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
	directive.
	* semantics.c (finish_omp_flush): Handle MEMMODEL_SEQ_CST.

gcc/fortran/ChangeLog:

	* openmp.c (gfc_match_omp_flush): Parse 'seq_cst' clause on 'flush'
	directive.
	* trans-openmp.c (gfc_trans_omp_flush): Handle OMP_MEMORDER_SEQ_CST.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/flush-1.c: Add test case for 'seq_cst'.
	* c-c++-common/gomp/flush-2.c: Add test case for 'seq_cst'.
	* g++.dg/gomp/attrs-1.C: Adapt test to handle all flush clauses.
	* g++.dg/gomp/attrs-2.C: Adapt test to handle all flush clauses.
	* gfortran.dg/gomp/flush-1.f90: Add test case for 'seq_cst'.
	* gfortran.dg/gomp/flush-2.f90: Add test case for 'seq_cst'.
2021-09-07 03:46:28 -07:00
Martin Liska
aad72d2ea8 inline: do not einline when no_profile_instrument_function is different
PR gcov-profile/80223

gcc/ChangeLog:

	* ipa-inline.c (can_inline_edge_p): Similarly to sanitizer
	options, do not inline when no_profile_instrument_function
	attributes are different in early inliner. It's fine to inline
	it after PGO instrumentation.

gcc/testsuite/ChangeLog:

	* gcc.dg/no_profile_instrument_function-attr-2.c: New test.
2021-09-07 11:47:57 +02:00
Richard Biener
f387ff788f tree-optimization/101555 - avoid redundant alias queries in PRE
This avoids doing redundant work during PHI translation to invalidate
mems when translating their corresponding VUSE through the blocks
virtual PHI node.  All the invalidation work is already done by
prune_clobbered_mems.

This speeds up the compile of the testcase from 275s with PRE
taking 91% of the compile-time down to 43s with PRE taking 16%
of the compile-time.

2021-09-07  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101555
	* tree-ssa-pre.c (translate_vuse_through_block): Do not
	perform an alias walk to determine the validity of the
	mem at the start of the block which is already guaranteed
	by means of prune_clobbered_mems.
	(phi_translate_1): Pass edge to translate_vuse_through_block.
2021-09-07 11:14:17 +02:00
Tobias Burnus
cff72ef4e2 libgomp.texi: Add OpenMP Implementation Status
libgomp/
	* libgomp.texi (Enabling OpenMP): Refer to OMP spec in general
	not to 4.5; link to new section.
	(OpenMP Implementation Status): New.
2021-09-07 11:01:38 +02:00
Sandra Loosemore
13beaf9e8d Fortran: Revert to non-multilib-specific ISO_Fortran_binding.h
Commit fef67987cf changed the
libgfortran build process to generate multilib-specific versions of
ISO_Fortran_binding.h from a template, by running gfortran to identify
the values of the Fortran kind constants C_LONG_DOUBLE, C_FLOAT128,
and C_INT128_T.  This caused multiple problems with search paths, both
for build-tree testing and installed-tree use, not all of which have
been fixed.

This patch reverts to a non-multilib-specific .h file that uses GCC's
predefined preprocessor symbols to detect the supported types and map
them to kind values in the same way as the Fortran front end.

2021-09-06  Sandra Loosemore  <sandra@codesourcery.com>

libgfortran/
	* ISO_Fortran_binding-1-tmpl.h: Deleted.
	* ISO_Fortran_binding-2-tmpl.h: Deleted.
	* ISO_Fortran_binding-3-tmpl.h: Deleted.
	* ISO_Fortran_binding.h: New file to replace the above.
	* Makefile.am (gfor_cdir): Remove MULTISUBDIR.
	(ISO_Fortran_binding.h): Simplify to just copy the file.
	* Makefile.in: Regenerated.
	* mk-kinds-h.sh: Revert pieces no longer needed for
	ISO_Fortran_binding.h.
2021-09-06 21:28:50 -07:00
Xionghu Luo
546ecb0054 rs6000: Expand fmod and remainder when built with fast-math [PR97142]
fmod/fmodf and remainder/remainderf can be expanded inline instead of
calling the library when building with fast-math, which is much faster.

fmodf:
     fdivs   f0,f1,f2
     friz    f0,f0
     fnmsubs f1,f2,f0,f1

remainderf:
     fdivs   f0,f1,f2
     frin    f0,f0
     fnmsubs f1,f2,f0,f1
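
In C terms the two sequences compute roughly the following.  This is a
sketch, not the rs6000.md expander: the function names are hypothetical,
frin is assumed to round to nearest with ties away from zero, and the
rounding is done with integer casts (valid only while the quotient fits
in a long) so the example builds without libm.  truncf, roundf and fmaf
would be the more direct mapping.

```c
/* Round toward zero, standing in for friz.
   Only valid while q fits in a long.  */
static float
round_toward_zero (float q)
{
  return (float) (long) q;
}

/* Round to nearest, ties away from zero, standing in for frin
   (assumed behaviour).  Only valid while q fits in a long.  */
static float
round_to_nearest_away (float q)
{
  return (float) (long) (q < 0.0f ? q - 0.5f : q + 0.5f);
}

/* fmodf sketch: fdivs, friz, then fnmsubs computing -(y*q - x).  */
static float
fast_fmodf (float x, float y)
{
  float q = round_toward_zero (x / y);
  return x - y * q;
}

/* remainderf sketch: same shape, with the quotient rounded to nearest.  */
static float
fast_remainderf (float x, float y)
{
  float q = round_to_nearest_away (x / y);
  return x - y * q;
}
```

Unlike the library routines, this form does no special handling of
infinities, NaNs or very large quotients, which is why it is only
suitable under fast-math.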

SPEC2017 Ofast P8LE: 511.povray_r +1.14%,  526.blender_r +1.72%

gcc/ChangeLog:

2021-09-07  Xionghu Luo  <luoxhu@linux.ibm.com>

	PR target/97142
	* config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
	(remainder<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2021-09-07  Xionghu Luo  <luoxhu@linux.ibm.com>

	PR target/97142
	* gcc.target/powerpc/pr97142.c: New test.
2021-09-06 20:22:50 -05:00
YunQiang Su
58572bbb62 MIPS: add .module arch and ase to all output asm
Currently, the asm output file for MIPS carries no revision info.
This can cause trouble, for example:

  assembler is mips1 by default,
  gcc is fpxx by default.

To assemble the output of gcc -S, we have to pass -mips2
to the assembler.

The same applies to CPUs that have extension instructions;
Octeon is an example.
So we can just add ".set arch=octeon".

If an ASE is enabled, .module ase will also be used.

gcc/ChangeLog:
	* config/mips/mips.c (mips_file_start): add .module for
	  arch and ase.
2021-09-07 08:45:37 +08:00
GCC Administrator
9f99555f29 Daily bump. 2021-09-07 00:16:34 +00:00
Roger Sayle
74cb45e67d Correct implementation of wi::clz
As diagnosed with Jakub and Richard in the analysis of PR 102134, the
current implementation of wi::clz has incorrect/inconsistent behaviour.
As mentioned by Richard in comment #7, clz should (always) return zero
for negative values, but the current implementation can only return 0
when precision is a multiple of HOST_BITS_PER_WIDE_INT.  The fix is
simply to reorder/shuffle the existing tests.
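
The intended behaviour can be shown with a simplified model.  This is
not the wide-int.cc code: it assumes the value fits in a single 64-bit
word with a precision between 1 and 64, and the function name is
hypothetical.

```c
#include <stdint.h>

/* Count leading zeros of VAL viewed as a PRECISION-bit value
   (1 <= PRECISION <= 64).  Hypothetical single-word model.  */
static int
clz_with_precision (uint64_t val, unsigned precision)
{
  /* Drop any bits above the precision.  */
  if (precision < 64)
    val &= (UINT64_C (1) << precision) - 1;

  /* Test the sign bit first: a negative value has no leading zeros,
     whatever the precision is.  */
  if ((val >> (precision - 1)) & 1)
    return 0;

  /* Zero has PRECISION leading zeros.  */
  if (val == 0)
    return (int) precision;

  /* Otherwise scan down from the bit just below the sign bit.  */
  int count = 1;
  for (uint64_t bit = UINT64_C (1) << (precision - 1); (val & (bit >> 1)) == 0;
       bit >>= 1)
    count++;
  return count;
}
```

With a 13-bit precision, which is not a multiple of the word size, the
value 0x1000 has its sign bit set and so yields 0.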

2021-09-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* wide-int.cc (wi::clz): Reorder tests to ensure the result
	is zero for all negative values.
2021-09-06 22:50:45 +01:00