Commit Graph

188486 Commits

Author SHA1 Message Date
Tobias Burnus
1f0a57bd54 libgomp: Only check for 2*sizeof(void*) int type with Fortran [PR96661]
The depend type is a struct with two pointer members for C/C++ - but for
Fortran OpenMP requires an integer type with kind = omp_depend_kind. Thus,
libgomp's configure checks that an integer type/kind with size 2*sizeof(void*)
is available. However, this integer type/kind is not needed when building without
Fortran support. Thus, only check this when Fortran is enabled.
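
The size relationship behind this check can be sketched in C. This is only an illustration: the struct name `depend_pair` and the helper `depend_size_matches` are hypothetical and are not libgomp's actual definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a depend object with two pointer members.  */
struct depend_pair {
  void *first;
  void *second;
};

/* Returns 1 when the struct occupies exactly 2*sizeof(void*) bytes, i.e. the
   size an integer kind would need in order to hold it on the Fortran side.  */
int depend_size_matches(void)
{
  return sizeof(struct depend_pair) == 2 * sizeof(void *);
}
```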

libgomp/
	PR libgomp/96661
	* configure.ac: Only check for int-type = 2*size_t support when
	building with Fortran support.
	* configure: Regenerate.
2021-09-28 15:15:47 +02:00
Ilya Leoshkevich
92cdd338fd reassoc: Test rank biasing
Add both positive and negative tests.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/reassoc-46.c: New test.
	* gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests.
	* gcc.dg/tree-ssa/reassoc-47.c: New test.
	* gcc.dg/tree-ssa/reassoc-48.c: New test.
	* gcc.dg/tree-ssa/reassoc-49.c: New test.
	* gcc.dg/tree-ssa/reassoc-50.c: New test.
	* gcc.dg/tree-ssa/reassoc-51.c: New test.
2021-09-28 14:58:23 +02:00
Aldy Hernandez
c32f7df917 Enable jump threading at -O1.
My previous patch gating all jump threading by -fthread-jumps had the
side effect of turning off DOM jump threading at -O1.  This causes
numerous -Wuninitialized false positives.  This patch turns on jump
threading at -O1 to minimize the disruption.
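
As a rough illustration of the kind of code involved, consider a variable that is assigned and read under the same condition. The function name `guarded_use` is hypothetical, and whether a particular GCC version warns on exactly this function is an assumption, not something stated above.

```c
/* 'value' is assigned only when 'have_value' is set and read only under the
   same condition, so it is never actually used uninitialized.  Proving that
   requires correlating the two tests, which is what threading the jump from
   the first 'if' to the second achieves.  */
int guarded_use(int have_value, int input)
{
  int value;
  if (have_value)
    value = input * 2;

  /* Unrelated work between the two tests of the same condition.  */
  int result = 1;

  if (have_value)
    result += value;
  return result;
}
```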

gcc/ChangeLog:

	* cfgcleanup.c (pass_jump::execute): Check
	flag_expensive_optimizations.
	(pass_jump_after_combine::gate): Same.
	* doc/invoke.texi (-fthread-jumps): Enable for -O1.
	* opts.c (default_options_table): Enable -fthread-jumps at -O1.
	* tree-ssa-threadupdate.c
	(fwd_jt_path_registry::remove_jump_threads_including): Bail unless
	flag_thread_jumps.

gcc/testsuite/ChangeLog:

	* gcc.dg/auto-init-uninit-1.c: Adjust.
	* gcc.dg/auto-init-uninit-15.c: Same.
	* gcc.dg/guality/example.c: Same.
	* gcc.dg/loop-8.c: Same.
	* gcc.dg/strlenopt-40.c: Same.
	* gcc.dg/tree-ssa/pr18133-2.c: Same.
	* gcc.dg/tree-ssa/pr18134.c: Same.
	* gcc.dg/uninit-1.c: Same.
	* gcc.dg/uninit-pr44547.c: Same.
	* gcc.dg/uninit-pr59970.c: Same.
2021-09-28 14:33:53 +02:00
Thomas Schwinge
95540a6d1d 'gfortran.dg/assumed_rank_22_aux.c' messages printed vs. DejaGnu
Print lower-case 'error: [...]' instead of upper-case 'ERROR: [...]', to not
confuse the DejaGnu log processing harness into thinking these are DejaGnu
harness ERRORs:

    Running /scratch/tschwing/build2-trusty-cs/gcc/build/submit-big/source-gcc/gcc/testsuite/gfortran.dg/dg.exp ...
    +ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
    +ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
    +ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
    +ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
    +ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
    +ERROR: c_assumed num=100: x->dim[2].extent = -1 != 0
    [...]

Fix-up for recent commit 00f6de9c69
"Fortran: Fix assumed-size to assumed-rank passing [PR94070]".

	gcc/testsuite/
	* gfortran.dg/assumed_rank_22_aux.c: Adjust messages printed.
2021-09-28 14:18:23 +02:00
Thomas Schwinge
a43ae03a05 Further test case adjustment re "Fortran: Fix assumed-size to assumed-rank passing"
Fix-up for recent commit 00f6de9c69
"Fortran: Fix assumed-size to assumed-rank passing [PR94070]",
and commit da1f6391b7
"libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note".

Due to use of '#if !ACC_MEM_SHARED' conditionals in
'libgomp.oacc-fortran/if-1.f90', 'target { !  openacc_host_selected }'
needs some special care (ignoring the pre-existing mismatch of
'ACC_MEM_SHARED' vs. 'openacc_host_selected').

As seen with GCN offloading, we need to revert to another bit of the
original code in 'libgomp.oacc-fortran/privatized-ref-2.f90'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/if-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
2021-09-28 14:18:21 +02:00
Ilya Leoshkevich
dbed1c8693 reassoc: Propagate PHI_LOOP_BIAS along single uses
PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.

The 456.hmmer benchmark on s390 could benefit from this; however, the code
that needs it modifies the loop accumulator before using it, and since only
so-called loop-carried phis are treated as loop accumulators, the
code in the present form doesn't really help.  According to Bill
Schmidt - the original author - such a conservative approach was chosen
so as to avoid unnecessarily swapping operands, which might cause
unpredictable effects.  However, giving special treatment to forms of
loop accumulators is acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is
used in the same innermost loop it's defined in, at least one argument
of which is defined in the same innermost loop as the phi itself.
Given this, it seems natural to treat single uses of such phis as phis
themselves.
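
A small C sketch of the situation described above, under the assumption that "modifying the accumulator before using it" looks like a single arithmetic use of the loop-carried value. Both function names are hypothetical and the exact forms GCC recognizes are not spelled out here; the two functions compute the same result and differ only in operand order.

```c
#include <stddef.h>

/* The accumulator-derived value (acc * 2) enters the addition chain first,
   so both additions depend on the previous iteration.  */
unsigned long sum_accumulator_first(const unsigned long *a,
                                    const unsigned long *b, size_t n)
{
  unsigned long acc = 0;
  for (size_t i = 0; i < n; i++)
    acc = (acc * 2 + a[i]) + b[i];
  return acc;
}

/* The accumulator-derived value is placed last: a[i] + b[i] does not depend
   on the previous iteration, so only one addition remains on the
   loop-carried dependency chain.  */
unsigned long sum_accumulator_last(const unsigned long *a,
                                   const unsigned long *b, size_t n)
{
  unsigned long acc = 0;
  for (size_t i = 0; i < n; i++)
    acc = (a[i] + b[i]) + acc * 2;
  return acc;
}
```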

gcc/ChangeLog:

	* tree-ssa-reassoc.c (biased_names): New global.
	(propagate_bias_p): New function.
	(loop_carried_phi): Remove.
	(propagate_rank): Propagate bias along single uses.
	(get_rank): Update biased_names when needed.
2021-09-28 14:10:59 +02:00
Ilya Leoshkevich
99c106e695 reassoc: Do not bias loop-carried PHIs early
Biasing loop-carried PHIs during the 1st reassociation pass interferes
with reduction chains and does not bring measurable benefits, so do it
only during the 2nd reassociation pass.

gcc/ChangeLog:

	* passes.def (pass_reassoc): Rename parameter to early_p.
	* tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
	New variable.
	(phi_rank): Don't bias loop-carried phi ranks
	before vectorization pass.
	(execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
	(pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
	initializer.
	(pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
	value.
	(pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
	execute_reassoc.
	(pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
2021-09-28 14:10:13 +02:00
Jakub Jelinek
3b7041e834 i386: Don't emit fldpi etc. if -frounding-math [PR102498]
i387 has instructions to store some transcendental numbers into the top of
stack.  The problem is that what exact bit in the last place one gets for
those depends on the current rounding mode, the CPU knows the number with
slightly higher precision.  The compiler assumes rounding to nearest when
comparing them against constants in the IL, but at runtime the rounding
can be different and so some of these depending on rounding mode and the
constant could be 1 ulp higher or smaller than expected.
We only support changing the rounding mode at runtime if the non-default
-frounding-math option is used, so the following patch just disables
using those constants if that flag is on.

2021-09-28  Jakub Jelinek  <jakub@redhat.com>

	PR target/102498
	* config/i386/i386.c (standard_80387_constant_p): Don't recognize
	special 80387 instruction XFmode constants if flag_rounding_math.

	* gcc.target/i386/pr102498.c: New test.
2021-09-28 13:02:51 +02:00
Richard Biener
34b1e44e16 tree-optimization/99793 - testcase for the PR
This adds a testcase for the PR which was fixed with the fix for
PR100112.

2021-09-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/99793
	* gcc.dg/tree-ssa/pr99793.c: New testcase.
2021-09-28 12:50:29 +02:00
Richard Biener
5b8b1522e0 tree-optimization/100112 - VN last_vuse and redundant store elimination
This avoids the last_vuse optimization hindering redundant store
elimination by always also recording the original VUSE that was
in effect on the load.

In stage3 gcc/*.o we have 3182752 times recorded a single
entry and 903409 times two entries (that's ~20% overhead).

With just recording a single entry the number of hashtable lookups
done when walking the vuse->vdef links to find an earlier access
is 28961618.  When recording the second entry this makes us find
that earlier for downstream redundant accesses, reducing the number
of hashtable lookups to 25401052 (that's a ~10% reduction).
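
For orientation, a store that writes back a value just loaded from the same location, with an unrelated store preceding the load, has roughly the following shape. This is a hypothetical illustration (the function name and the use of `restrict` are assumptions), not the test case added for the PR, and it makes no claim about which GCC versions remove the store.

```c
/* The store to *q cannot alias *p, so the value loaded from *p is the value
   *p held before that store as well.  The final store writes back what *p
   already contains and is therefore redundant.  */
int reload_and_store_back(int *restrict p, int *restrict q)
{
  *q = 1;            /* store to an unrelated location */
  int loaded = *p;   /* load from *p */
  *p = loaded;       /* redundant: *p already holds this value */
  return loaded;
}
```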

2021-09-27  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/100112
	* tree-ssa-sccvn.c (visit_reference_op_load): Record the
	reference into the hashtable twice in case last_vuse is
	different from the original vuse on the stmt.

	* gcc.dg/tree-ssa/ssa-fre-95.c: New testcase.
2021-09-28 12:31:46 +02:00
Jakub Jelinek
4f07769057 openmp: Don't call omp_finish_clause on implicitly added private clauses on simd [PR102492]
The gimplifier adds implicit private clauses on SIMD constructs for local
variables in the SIMD body if they are addressable to make sure they use
the magic arrays with "omp simd array" attribute (such that each SIMD lane
has its own copy), but we actually don't need to default privatize etc. those,
the construction for them is done in the SIMD body and so is destruction.
omp_finish_clause for C++ now requires default constructor (and dtor) for private,
so that OpenMP 5.1 default(private) works, but that will never be needed on
SIMD.  So, this patch just doesn't call omp_finish_clause for private on simd.
The C and Fortran langhooks don't do anything for private.

2021-09-28  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/102492
	* gimplify.c (gimplify_adjust_omp_clauses_1): Don't call the
	omp_finish_clause langhook on implicitly added OMP_CLAUSE_PRIVATE
	clauses on SIMD constructs.

	* g++.dg/gomp/simd-3.C: New test.
2021-09-28 11:38:03 +02:00
Aldy Hernandez
fb8b72ebb5 Return VARYING in range_on_path_entry if nothing found.
The problem here is that the solver's code solving unknown SSAs on entry
to a path was returning UNDEFINED if there were no incoming edges to the
start of the path that were not the function entry block.  This caused a
cascade of pain down stream.

Tested on x86-64 Linux.

	PR tree-optimization/102511

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::range_on_path_entry):
	Return VARYING when nothing found.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr102511.c: New test.
	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Adjust.
2021-09-28 11:11:20 +02:00
Andrew Burgess
dc614a838e top-level configure: setup target_configdirs based on repository
The top-level configure script is shared between the gcc repository
and the binutils-gdb repository.

The target_configdirs variable in the configure.ac script defines
sub-directories that contain components that should be built for the
target using the target tools.

Some components, e.g. zlib, are built as both host and target
libraries.

This causes problems for binutils-gdb.  If we run 'make all' in the
binutils-gdb repository we end up trying to build a target version of
the zlib library, which requires the target compiler be available.
Often the target compiler isn't immediately available, and so the
build fails.

The problem with zlib impacted a previous attempt to synchronise the
top-level configure scripts from gcc to binutils-gdb, see this thread:

  https://sourceware.org/pipermail/binutils/2019-May/107094.html

And I'm in the process of importing libbacktrace in to binutils-gdb,
which is also a host and target library, and triggers the same issues.

I believe that for binutils-gdb, at least at the moment, there are no
target libraries that we need to build.

In the configure script we build three lists of things we want to
build, $configdirs, $build_configdirs, and $target_configdirs, we also
build two lists of things we don't want to build, $skipdirs and
$noconfigdirs.  We then remove anything that is in the lists of things
not to build, from the list of things that should be built.

My proposal is to add everything in target_configdirs into skipdirs,
if the source tree doesn't contain a gcc/ sub-directory.  The result
is that for binutils-gdb no target tools or libraries will be built,
while for the gcc repository, nothing should change.

If a user builds a unified source tree, then the target tools and
libraries should still be built as the gcc/ directory will be present.

I've tested a build of gcc on x86-64, and the same set of target
libraries still seem to get built.  On binutils-gdb this change
resolves the issues with 'make all'.

ChangeLog:

	* configure: Regenerate.
	* configure.ac (skipdirs): Add the contents of target_configdirs if
	we are not building gcc.
2021-09-28 09:43:36 +01:00
Hongyu Wang
eea10afef7 AVX512FP16: Support basic 64/32bit vector type and operation.
For 32bit target, V4HF vector is parsed same as __m64 type, V2HF
is parsed by stack and returned from GPR since it is not specified
by ABI.

gcc/ChangeLog:

	PR target/102230
	* config/i386/i386.h (VALID_AVX512FP16_REG_MODE): Add
	V2HF mode check.
	(VALID_SSE2_REG_VHF_MODE): Add V4HFmode and V2HFmode.
	(VALID_MMX_REG_MODE): Add V4HFmode.
	(SSE_REG_MODE_P): Replace VALID_AVX512FP16_REG_MODE with
	vector mode condition.
	* config/i386/i386.c (classify_argument): Parse V4HF/V2HF
	via sse regs.
	(function_arg_32): Add V4HFmode.
	(function_arg_advance_32): Likewise.
	* config/i386/i386.md (mode): Add V4HF/V2HF.
	(MODE_SIZE): Likewise.
	* config/i386/mmx.md (MMXMODE): Add V4HF mode.
	(V_32): Add V2HF mode.
	(VHF_32_64): New mode iterator.
	(*mov<mode>_internal): Adjust sse alternatives to support
	V4HF mode move.
	(*mov<mode>_internal): Adjust sse alternatives to support
	V2HF mode move.
	(<insn><mode>3): New define_insn for add/sub/mul/div.

gcc/testsuite/ChangeLog:

	PR target/102230
	* gcc.target/i386/avx512fp16-floatvnhf.c: Remove xfail.
	* gcc.target/i386/avx512fp16-trunc-extendvnhf.c: Ditto.
	* gcc.target/i386/avx512fp16-truncvnhf.c: Ditto.
	* gcc.target/i386/avx512fp16-64-32-vecop-1.c: New test.
	* gcc.target/i386/avx512fp16-64-32-vecop-2.c: Ditto.
	* gcc.target/i386/pr102230.c: Ditto.
2021-09-28 16:39:32 +08:00
Richard Biener
1dadd5110f Fix gcc.target/i386/vect-pr97352.c for -m32 -march=cascadelake
The easiest is to disable AVX2 and AVX512F explicitly.

2021-09-28  Richard Biener  <rguenther@suse.de>

	* gcc.target/i386/vect-pr97352.c: Pass -mno-avx2 -mno-avx512f.
2021-09-28 10:05:22 +02:00
Tobias Burnus
ce450af508 gfortran.dg/include_15.f90: Add dg-prune-output [PR102500]
gcc/testsuite/
	PR fortran/102500
	* gfortran.dg/include_15.f90: Add 'dg-prune-output' to prune
	-Wmissing-include-dirs output printed or not depending on
	how the testsuite is run.
2021-09-28 09:49:12 +02:00
Richard Biener
6fabd9e25d Fix gcc.dg/vect/bb-slp-pr65935.c FAIL with AVX after recent change
This avoids bigger than V2DF vectorization which disturbs the ability
to consistently check for the vectorization result after us now
also vectorizing the V2DF tail of a V4DF vectorization variant.

2021-09-28  Richard Biener  <rguenther@suse.de>

	* gcc.dg/vect/bb-slp-pr65935.c: Prefer 128bit vectorization
	on x86.
2021-09-28 09:02:12 +02:00
Aldy Hernandez
e475ae9bbf Control all jump threading passes with -fthread-jumps.
Last year I mentioned that -fthread-jumps was being ignored by the
majority of our jump threading passes, and Jeff said he'd be in favor
of fixing this.

This patch remedies the situation, but it does change existing behavior.
Currently -fthread-jumps is only enabled for -O2, -O3, and -Os.  This
means that even if we restricted all jump threading passes with
-fthread-jumps, DOM jump threading would still seep through since it
runs at -O1.

I propose this patch, but it does mean that DOM jump threading would
have to be explicitly enabled with -O1 -fthread-jumps.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Check
	flag_thread_jumps.
	(pass_early_thread_jumps::gate): Same.
	* tree-ssa-threadedge.c (jump_threader::thread_outgoing_edges):
	Return if !flag_thread_jumps.
	* tree-ssa-threadupdate.c
	(jt_path_registry::register_jump_thread): Assert that
	flag_thread_jumps is true.

gcc/testsuite/ChangeLog:

	* gcc.dg/auto-init-uninit-1.c: Add -fthread-jumps.
	* gcc.dg/auto-init-uninit-15.c: Same.
	* gcc.dg/guality/example.c: Same.
	* gcc.dg/loop-8.c: Same.
	* gcc.dg/strlenopt-40.c: Same.
	* gcc.dg/tree-ssa/pr18133-2.c: Same.
	* gcc.dg/tree-ssa/pr18134.c: Same.
	* gcc.dg/uninit-1.c: Same.
	* gcc.dg/uninit-pr44547.c: Same.
	* gcc.dg/uninit-pr59970.c: Same.
2021-09-28 08:17:29 +02:00
liuhongt
9cfb95f9b9 Relax condition of (vec_concat:M(vec_select op0 idx0)(vec_select op0 idx1)) to allow different modes between op0 and M, but have same inner mode.
This will enable optimization for below pattern.

(set (reg:V2DF 87 [ xx ])
    (vec_concat:V2DF (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 2 [0x2])
                ]))
        (vec_select:DF (reg:V4DF 92)
            (parallel [
                    (const_int 3 [0x3])
                ]))))
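
Source code that could plausibly give rise to such a pattern can be written with GCC's generic vector extensions. This is a sketch under assumptions: the typedef names and `upper_half` are hypothetical, and whether this exact function produces the RTL above depends on the target and options.

```c
typedef double v2df __attribute__((vector_size(16)));
typedef double v4df __attribute__((vector_size(32)));

/* Build a 2-element vector from elements 2 and 3 of a 4-element vector:
   the result mode (V2DF) differs from the source mode (V4DF), but both
   have the same inner mode (DF).  */
v2df upper_half(const v4df *x)
{
  v2df result = { (*x)[2], (*x)[3] };
  return result;
}
```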

gcc/ChangeLog:

	* simplify-rtx.c
	(simplify_context::simplify_binary_operation_1): Relax
	condition of simplifying (vec_concat:M (vec_select op0
	index0)(vec_select op1 index1)) to allow different modes
	between op0 and M, but have same inner mode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-rebuild.c: Adjust testcases.
	* gcc.target/i386/avx512f-vect-rebuild.c: New test.
2021-09-28 11:00:29 +08:00
liuhongt
3540429be7 Support 128/256/512-bit vector plus/smin/smax reduction for _Float16.
gcc/ChangeLog:

	* config/i386/i386-expand.c (emit_reduc_half): Handle
	V8HF/V16HF/V32HFmode.
	* config/i386/sse.md (REDUC_SSE_PLUS_MODE): Add V8HF.
	(REDUC_SSE_SMINMAX_MODE): Ditto.
	(REDUC_PLUS_MODE): Add V16HF and V32HF.
	(REDUC_SMINMAX_MODE): Ditto.

gcc/testsuite

	* gcc.target/i386/avx512fp16-reduce-op-2.c: New test.
	* gcc.target/i386/avx512fp16-reduce-op-3.c: New test.
2021-09-28 09:40:30 +08:00
GCC Administrator
cf966403d9 Daily bump.
2021-09-28 00:16:21 +00:00
Patrick Palka
51018dd139 c++: deduction guides and ttp rewriting [PR102479]
The problem here is ultimately that rewrite_tparm_list when rewriting a
TEMPLATE_TEMPLATE_PARM introduces a tree cycle in the rewritten
ttp that structural_comptypes can't cope with.  In particular the
DECL_TEMPLATE_PARMS of a ttp's TEMPLATE_DECL normally captures an empty
parameter list at its own level (and so the TEMPLATE_DECL doesn't appear
in its own DECL_TEMPLATE_PARMS), but rewrite_tparm_list ends up giving
it a complete parameter list.  In the new testcase below, this causes
infinite recursion from structural_comptypes when comparing Tmpl<char>
with Tmpl<long> (where both 'Tmpl's are rewritten ttps).

This patch fixes this by making rewrite_template_parm give a rewritten
template template parm an empty parameter list at its own level, thereby
avoiding the tree cycle.  Testing the alias CTAD case revealed that
we're not setting current_template_parms in alias_ctad_tweaks, which
this patch also fixes.

	PR c++/102479

gcc/cp/ChangeLog:

	* pt.c (rewrite_template_parm): Handle single-level tsubst_args.
	Avoid a tree cycle when assigning the DECL_TEMPLATE_PARMS for a
	rewritten ttp.
	(alias_ctad_tweaks): Set current_template_parms accordingly.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/class-deduction12.C: Also test alias CTAD in the
	same way.
	* g++.dg/cpp1z/class-deduction99.C: New test.
2021-09-27 16:01:10 -04:00
Aldy Hernandez
8366836860 Minor cleanups to solver.
These are some minor cleanups and renames that surfaced after the
hybrid_threader work.

gcc/ChangeLog:

	* gimple-range-path.cc
	(path_range_query::precompute_ranges_in_block): Rename to...
	(path_range_query::compute_ranges_in_block): ...this.
	(path_range_query::precompute_ranges): Rename to...
	(path_range_query::compute_ranges): ...this.
	(path_range_query::precompute_relations): Rename to...
	(path_range_query::compute_relations): ...this.
	(path_range_query::precompute_phi_relations): Rename to...
	(path_range_query::compute_phi_relations): ...this.
	* gimple-range-path.h: Rename precompute* to compute*.
	* tree-ssa-threadbackward.c
	(back_threader::find_taken_edge_switch): Same.
	(back_threader::find_taken_edge_cond): Same.
	* tree-ssa-threadedge.c
	(hybrid_jt_simplifier::compute_ranges_from_state): Same.
	(hybrid_jt_state::register_equivs_stmt): Inline...
	* tree-ssa-threadedge.h: ...here.
2021-09-27 17:39:51 +02:00
Aldy Hernandez
4ef1e524fd Remove old VRP jump threader code.
There's a lot of code that melts away without the ASSERT_EXPR based jump
threader.  Also, I cleaned up the include files as part of the process.

gcc/ChangeLog:

	* tree-vrp.c (lhs_of_dominating_assert): Remove.
	(class vrp_jt_state): Remove.
	(class vrp_jt_simplifier): Remove.
	(vrp_jt_simplifier::simplify): Remove.
	(class vrp_jump_threader): Remove.
	(vrp_jump_threader::vrp_jump_threader): Remove.
	(vrp_jump_threader::~vrp_jump_threader): Remove.
	(vrp_jump_threader::before_dom_children): Remove.
	(vrp_jump_threader::after_dom_children): Remove.
2021-09-27 17:39:51 +02:00
Aldy Hernandez
0288527f47 Replace VRP threader with a hybrid forward threader.
This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.

With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into
SSA imports that the solver will understand, and let the path solver
precompute ranges and relations for the path.  After this setup is done,
we can use the range_query API to solve gimple statements in the threader.
The forward threader is now engine agnostic so there are no changes to
the threader per se.

I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.

Most of the patch is actually test changes.  I have gone through every
single one and verified that we're correct.  Most were trivial dump
file name changes, but others required going through the IL and
certifying that the different IL was expected.

For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops.  The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.

The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads.  The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass).  As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).

Note, that these numbers are slightly different than what I originally
posted.  A few correctness tweaks, plus restricting loop threads, made
the difference.  That being said, I was aiming for par.  A 12% gain is
just gravy ;-).  When we merge the threaders, we should see even better
numbers-- and we'll have the benefit of an entire release stress testing
the solver.

As I mentioned in my introductory note, paths ending in MEM_REF
conditional are missing.  In reality, this didn't make a difference, as
it was so rare.  However, as a follow-up, I will distill a test and add
a suitable PR to keep us honest.

There is a one-line change to libgomp/team.c silencing a new used
uninitialized warning.  As my previous work with the threaders has
shown, warnings flare up after each improvement to jump threading.  I
expect this to be no different.  I've promised Jakub to investigate
fully, so I will analyze and add the appropriate PR for the warning
experts.

Oh yeah, the new pass dump is called vrp-threader[12] to match each
VRP[12] pass.  However, there's no reason for it to either be named
vrp-threader, or for it to live in tree-vrp.c.

Tested on x86-64 Linux.

OK?

p.s. "Did I say 5 weeks?  My bad, I meant 5 months."

gcc/ChangeLog:

	* passes.def (pass_vrp_threader): New.
	* tree-pass.h (make_pass_vrp_threader): Add make_pass_vrp_threader.
	* tree-ssa-threadedge.c (hybrid_jt_state::register_equivs_stmt): New.
	(hybrid_jt_simplifier::hybrid_jt_simplifier): New.
	(hybrid_jt_simplifier::simplify): New.
	(hybrid_jt_simplifier::compute_ranges_from_state): New.
	* tree-ssa-threadedge.h (class hybrid_jt_state): New.
	(class hybrid_jt_simplifier): New.
	* tree-vrp.c (execute_vrp): Remove ASSERT_EXPR based jump
	threader.
	(class hybrid_threader): New.
	(hybrid_threader::hybrid_threader): New.
	(hybrid_threader::~hybrid_threader): New.
	(hybrid_threader::before_dom_children): New.
	(hybrid_threader::after_dom_children): New.
	(execute_vrp_threader): New.
	(class pass_vrp_threader): New.
	(make_pass_vrp_threader): New.

libgomp/ChangeLog:

	* team.c: Initialize start_data.
	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust.
	* testsuite/libgomp.graphite/force-parallel-8.c: Adjust.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr55107.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-1.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-3.c: Adjust.
	* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
	* gcc.dg/tree-ssa/pr21559.c: Adjust.
	* gcc.dg/tree-ssa/pr59597.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_1.c: Adjust.
	* gcc.dg/tree-ssa/pr61839_3.c: Adjust.
	* gcc.dg/tree-ssa/pr71437.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Adjust.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Adjust.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Adjust.
	* gcc.dg/tree-ssa/vrp106.c: Adjust.
	* gcc.dg/tree-ssa/vrp55.c: Adjust.
2021-09-27 17:39:51 +02:00
Martin Liska
dd11aab646 Come up with section_flag enum.
gcc/ChangeLog:

	* output.h (enum section_flag): New.
	(SECTION_FORGET): Remove.
	(SECTION_ENTSIZE): Make it (1UL << 8) - 1.
	(SECTION_STYLE_MASK): Define it based on other enum
	values.
	* varasm.c (switch_to_section): Remove unused handling of
	SECTION_FORGET.
2021-09-27 16:59:38 +02:00
Martin Liska
a64697d7a3 flag_complex_method: support optimize attribute
gcc/c-family/ChangeLog:

	* c-opts.c (c_common_init_options_struct): Set also
	  x_flag_default_complex_method.

gcc/ChangeLog:

	* common.opt: Add new variable flag_default_complex_method.
	* opts.c (finish_options): Handle flags related to
	  x_flag_complex_method.
	* toplev.c (process_options): Remove option handling related
	to flag_complex_method.

gcc/go/ChangeLog:

	* go-lang.c (go_langhook_init_options_struct): Set also
	  x_flag_default_complex_method.

gcc/lto/ChangeLog:

	* lto-lang.c (lto_init_options_struct): Set also
	  x_flag_default_complex_method.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/compile/attr-complex-method-2.c: New test.
	* gcc.c-torture/compile/attr-complex-method.c: New test.
2021-09-27 16:58:37 +02:00
Vincent Lefevre
3e6a511b94 Update pathname for IBM long double description.
include/
	* floatformat.h: Update pathname for IBM long double description.
2021-09-27 10:56:14 -04:00
Richard Biener
d06dc8a2c7 middle-end/102450 - avoid type_for_size for non-existing modes
This avoids asking type_for_size for types with sizes for which
no scalar integer mode exists.  Instead the following uses
int_mode_for_size to get the same result.

2021-09-27  Richard Biener  <rguenther@suse.de>

	PR middle-end/102450
	* gimple-fold.c (gimple_fold_builtin_memory_op): Avoid using
	type_for_size, instead use int_mode_for_size.
2021-09-27 15:04:32 +02:00
Tobias Burnus
da1f6391b7 libgomp.oacc-fortran/privatized-ref-2.f90: Fix dg-note
In my last commit, r12-3897-g00f6de9c69119594f7dad3bd525937c94c8200d0,
which inlined array-size code, I had to update the expected output.  However,
in doing so, I accidentally (copy'n'paste) changed dg-note into dg-message.

libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Change
	dg-message back to dg-note.
2021-09-27 14:33:39 +02:00
Tobias Burnus
00f6de9c69 Fortran: Fix assumed-size to assumed-rank passing [PR94070]
This code inlines the size0 and size1 libgfortran calls; the former is still
used by libgfortran itself (and by old code). Besides permitting more
optimizations, it also permits handling assumed-rank dummies better: If the
dummy argument is a nonpointer/nonallocatable, an assumed-size actual arg is
represented by having ubound == -1 for the last dimension. However, for
allocatable/pointers, this value can also exist. Hence, the dummy arg attr
has to be honored.

For that reason, when calling an assumed-rank procedure with nonpointer,
nonallocatable dummy arguments, the bounds have to be updated to avoid
the case ubound == -1 for the last dimension.
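
The ambiguity and the bounds update can be sketched in C. The struct `dim_bounds` and both helpers are hypothetical and do not reflect gfortran's actual descriptor layout; the sketch also ignores zero-sized arrays.

```c
/* Hypothetical per-dimension bounds; not gfortran's descriptor layout.  */
struct dim_bounds {
  long lbound;
  long ubound;
};

/* Shift every dimension so that its lower bound is 1, keeping the extent.
   E.g. an allocatable with bounds (-3:-1) becomes (1:3).  */
void rebase_to_one(struct dim_bounds *dims, int rank)
{
  for (int i = 0; i < rank; i++) {
    long extent = dims[i].ubound - dims[i].lbound + 1;
    dims[i].lbound = 1;
    dims[i].ubound = extent;
  }
}

/* Check for the assumed-size marker: ubound == -1 in the last dimension.
   This is only unambiguous once ordinary arrays have been rebased.  */
int last_dim_is_assumed_size(const struct dim_bounds *dims, int rank)
{
  return rank > 0 && dims[rank - 1].ubound == -1;
}
```

Before rebasing, an allocatable array with bounds (-3:-1) would be mistaken for assumed-size by the check; after rebasing it is not.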

	PR fortran/94070

gcc/fortran/ChangeLog:

	* trans-array.c (gfc_tree_array_size): New function to
	find size inline (whole array or one dimension).
	(array_parameter_size): Use it, take stmt_block as arg.
	(gfc_conv_array_parameter): Update call.
	* trans-array.h (gfc_tree_array_size): Add prototype.
	* trans-decl.c (gfor_fndecl_size0, gfor_fndecl_size1): Remove
	these global vars.
	(gfc_build_intrinsic_function_decls): Remove their initialization.
	* trans-expr.c (gfc_conv_procedure_call): Update
	bounds of pointer/allocatable actual args to nonallocatable/nonpointer
	dummies to be one based.
	* trans-intrinsic.c (gfc_conv_intrinsic_shape): Fix case for
	assumed rank with allocatable/pointer dummy.
	(gfc_conv_intrinsic_size): Update to use inline function.
	* trans.h (gfor_fndecl_size0, gfor_fndecl_size1): Remove var decl.

libgfortran/ChangeLog:

	* intrinsics/size.c (size0, size1): Comment that now not
	used by newer compiler code.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Update
	expected dg-note output.

gcc/testsuite/ChangeLog:

	* gfortran.dg/c-interop/cf-out-descriptor-6.f90: Remove xfail.
	* gfortran.dg/c-interop/size.f90: Remove xfail.
	* gfortran.dg/intrinsic_size_3.f90: Update scan-tree-dump-times.
	* gfortran.dg/transpose_optimization_2.f90: Likewise.
	* gfortran.dg/size_optional_dim_1.f90: Add scan-tree-dump-not.
	* gfortran.dg/assumed_rank_22.f90: New test.
	* gfortran.dg/assumed_rank_22_aux.c: New test.
2021-09-27 14:04:54 +02:00
Andrew Pinski
76773d3fea Fix PR c/94726: ICE with __builtin_shuffle and changing of types
The problem here is __builtin_shuffle when called with two arguments
instead of 1, uses a SAVE_EXPR to put in for the 1st and 2nd operand
of VEC_PERM_EXPR and when we go and gimplify the SAVE_EXPR, the type
is now error_mark_node and that fails hard.
This fixes the problem by adding a simple check for type of operand
of SAVE_EXPR not to be error_mark_node.
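
For reference, `__builtin_shuffle` is a GCC-specific built-in that accepts either one input vector plus a mask or two input vectors plus a mask. A minimal sketch of both valid forms follows; the typedef and function names are hypothetical, and this is not the test case for the PR, which involves erroneous input.

```c
typedef int v4si __attribute__((vector_size(16)));

/* One input vector plus a mask: indices 0..3 select elements of 'a'.  */
v4si reverse4(v4si a)
{
  v4si mask = { 3, 2, 1, 0 };
  return __builtin_shuffle(a, mask);
}

/* Two input vectors plus a mask: indices 0..3 select from 'a',
   indices 4..7 select from 'b'.  */
v4si blend_alternating(v4si a, v4si b)
{
  v4si mask = { 0, 5, 2, 7 };
  return __builtin_shuffle(a, b, mask);
}
```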

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR c/94726
	* gimplify.c (gimplify_save_expr): Return early
	if the type of val is error_mark_node.

gcc/testsuite/ChangeLog:

	PR c/94726
	* gcc.dg/pr94726.c: New test.
2021-09-27 10:37:28 +00:00
Aldy Hernandez
d5f8abe1d3 Use on-demand ranges in ssa_name_has_boolean_range before querying nonzero bits.
The function ssa_name_has_boolean_range looks at the nonzero bits stored
in SSA_NAME_RANGE_INFO.  These are global in nature and are the result
of a previous evrp/VRP run (technically other passes can also set them).

However, we can do better if we use get_range_query.  Doing so will use
a ranger if enabled in a pass, or global ranges otherwise.  The call to
get_nonzero_bits remains, as there are passes that will set them
independently of the global range info.

Tested on x86-64 Linux with a regstrap as well as in a DOM environment
using an on-demand ranger instead of evrp.

gcc/ChangeLog:

	* tree-ssanames.c (ssa_name_has_boolean_range): Use
	get_range_query.
2021-09-27 12:23:59 +02:00
Aldy Hernandez
e1d01f4973 Convert some evrp uses in DOM to the range_query API.
DOM is the last remaining user of the evrp engine.  This patch converts
a few uses of the engine and vr-values into the new API.

There is one subtle change.  The call to vr_value's
op_with_constant_singleton_value_range can theoretically return
non-constants, unlike the range_query API which only returns constants.
In this particular case it doesn't matter because the symbolic stuff will
have been handled by the const_and_copies/avail_exprs read in the
SSA_NAME_VALUE copy immediately before.  I have verified this is the case
by asserting that all calls to op_with_constant_singleton_value_range at
this point return either NULL or an INTEGER_CST.

Tested on x86-64 Linux with a regstrap, as well as the aforementioned
assert.

gcc/ChangeLog:

	* gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Remove
	vrp_visit_cond_stmt.
	* tree-ssa-dom.c (cprop_operand): Convert to range_query API.
	(cprop_into_stmt): Same.
	(dom_opt_dom_walker::optimize_stmt): Same.
2021-09-27 11:43:19 +02:00
Richard Biener
6390c5047a Allow different vector types for stmt groups
This allows vectorization (in practice non-loop vectorization) to
have a stmt participate in different vector type vectorizations.
It allows us to remove vect_update_shared_vectype and replace it
by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
vect_analyze_stmt and vect_transform_stmt.

For data-ref the situation is a bit more complicated since we
analyze alignment info with a specific vector type in mind which
doesn't play well when that changes.

So the bulk of the change is passing down the actual vector type
used for a vectorized access to the various accessors of alignment
info, first and foremost dr_misalignment but also aligned_access_p,
known_alignment_for_access_p, vect_known_alignment_in_bytes and
vect_supportable_dr_alignment.  I took the liberty to replace
ALL_CAPS macro accessors with the lower-case function invocations.

The actual changes to the behavior are in dr_misalignment which now
is the place factoring in the negative step adjustment as well as
handling alignment queries for a vector type with bigger alignment
requirements than what we can (or have) analyze(d).

vect_slp_analyze_node_alignment makes use of this and, upon receiving
a vector type with a bigger alignment requirement, re-analyzes the DR
with respect to it but keeps an older, more precise result if possible.
In this context it might be possible to do the analysis just once:
instead of analyzing with respect to a specific desired alignment,
look for the biggest alignment for which the computed alignment is
not unknown.

The ChangeLog includes the functional changes but not the bulk due
to the alignment accessor API changes - I hope that's something good.

2021-09-17  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97351
	PR tree-optimization/97352
	PR tree-optimization/82426
	* tree-vectorizer.h (dr_misalignment): Add vector type
	argument.
	(aligned_access_p): Likewise.
	(known_alignment_for_access_p): Likewise.
	(vect_supportable_dr_alignment): Likewise.
	(vect_known_alignment_in_bytes): Likewise.  Refactor.
	(DR_MISALIGNMENT): Remove.
	(vect_update_shared_vectype): Likewise.
	* tree-vect-data-refs.c (dr_misalignment): Refactor, handle
	a vector type with larger alignment requirement and apply
	the negative step adjustment here.
	(vect_calculate_target_alignment): Remove.
	(vect_compute_data_ref_alignment): Get explicit vector type
	argument, do not apply a negative step alignment adjustment
	here.
	(vect_slp_analyze_node_alignment): Re-analyze alignment
	when we re-visit the DR with a bigger desired alignment but
	keep more precise results from smaller alignments.
	* tree-vect-slp.c (vect_update_shared_vectype): Remove.
	(vect_slp_analyze_node_operations_1): Do not update the
	shared vector type on stmts.
	* tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
	vector type of an SLP node to the representative stmt-info.
	(vect_transform_stmt): Likewise.

	* gcc.target/i386/vect-pr82426.c: New testcase.
	* gcc.target/i386/vect-pr97352.c: Likewise.
2021-09-27 10:24:12 +02:00
liuhongt
e7b8d70200 Revert "Optimize v4sf reduction.".
This reverts commit 8f323c712e.

     PR target/102473
     PR target/101059
2021-09-27 15:51:24 +08:00
GCC Administrator
1932e1169a Daily bump. 2021-09-27 00:16:16 +00:00
Tobias Burnus
fe2771b291 Fortran: Fix associated intrinsic with assumed rank [PR101334]
ASSOCIATED (ptr, tgt) also accepts an assumed-rank array as its first
argument; however, using it together with a tgt (required to be non assumed
rank) had issues for both scalar and nonscalar tgt.

	PR fortran/101334
gcc/fortran/ChangeLog:

	* trans-intrinsic.c (gfc_conv_associated): Support assumed-rank
	'pointer' with scalar/array 'target' argument.

libgfortran/ChangeLog:

	* intrinsics/associated.c (associated): Also check for same rank.

gcc/testsuite/ChangeLog:

	* gfortran.dg/associated_assumed_rank.f90: New test.
2021-09-26 19:26:01 +02:00
liuhongt
e98e12c40b Remove storage only description for _Float16 w/o avx512fp16.
gcc/ChangeLog:

	* doc/extend.texi (Half-Precision): Remove storage only
	description for _Float16 w/o avx512fp16.
2021-09-26 09:04:41 +08:00
GCC Administrator
f5ef07a322 Daily bump. 2021-09-26 00:16:16 +00:00
Dimitar Dimitrov
8bafc9640f pru: Named address space for R30/R31 I/O access
The PRU architecture provides single-cycle access to GPIO pins via
specially designated CPU registers - R30 and R31. These two registers can
of course be accessed in C code using inline assembly, but that can be
intimidating to users.

The TI proprietary compiler [1] can expose these I/O registers as global
volatile registers:
  volatile register unsigned int __R31;

Consequently, accessing them in user programs is as straightforward as
using a regular global variable:
  __R31 |= (1 << 2);

Unfortunately, global volatile registers are not supported by GCC [2].
I decided to implement convenient access to __R30 and __R31 using a new
named address space:
  extern volatile __regio_symbol unsigned int __R30;
Unlike global registers, volatile global memory variables are well
supported in GCC.  Memory writes and reads to the __regio_symbol address
space are converted to writes and reads to R30 and R31 CPU registers.
The declared variable name determines which of the two registers it is
representing.

With an ifdef for the __R30/__R31 declarations, user programs can now
be source-compatible with both TI and GCC toolchains.
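
A minimal sketch of such an ifdef follows.  The two target declarations
are the ones quoted above; __TI_COMPILER_VERSION__ is the macro TI
compilers predefine, while USE_GCC_PRU_REGIO, gpio_set and gpio_clear are
hypothetical names chosen for this illustration.  The final branch is only
a host fallback so the code can be compiled and exercised off-target.

```c
#include <assert.h>

/* Declare __R30 for whichever toolchain is in use.  USE_GCC_PRU_REGIO
   is a hypothetical macro the build system would define for PRU GCC.  */
#if defined (__TI_COMPILER_VERSION__)
volatile register unsigned int __R30;
#elif defined (USE_GCC_PRU_REGIO)
extern volatile __regio_symbol unsigned int __R30;
#else
/* Host fallback: an ordinary volatile variable stands in for R30.  */
volatile unsigned int __R30;
#endif

/* Set bit PIN (0..31) of R30.  */
void
gpio_set (unsigned int pin)
{
  assert (pin < 32);
  __R30 |= 1u << pin;
}

/* Clear bit PIN (0..31) of R30.  */
void
gpio_clear (unsigned int pin)
{
  assert (pin < 32);
  __R30 &= ~(1u << pin);
}
```

The helper bodies are identical for every branch, which is the point of
the approach: only the declaration differs between toolchains.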

[1] https://www.ti.com/lit/ug/spruhv7c/spruhv7c.pdf , "Global Register Variables"
[2] https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02241.html

gcc/ChangeLog:

	* config/pru/constraints.md (Rrio): New constraint.
	* config/pru/predicates.md (regio_operand): New predicate.
	* config/pru/pru-pragma.c (pru_register_pragmas): Register
	the __regio_symbol address space.
	* config/pru/pru-protos.h (pru_symref2ioregno): Declaration.
	* config/pru/pru.c (pru_symref2ioregno): New helper function.
	(pru_legitimate_address_p): Remove.
	(pru_addr_space_legitimate_address_p): Use the address space
	aware hook variant.
	(pru_nongeneric_pointer_addrspace): New helper function.
	(pru_insert_attributes): New function to validate __regio_symbol
	usage.
	(TARGET_INSERT_ATTRIBUTES): New macro.
	(TARGET_LEGITIMATE_ADDRESS_P): Remove.
	(TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): New macro.
	* config/pru/pru.h (enum reg_class): Add REGIO_REGS class.
	* config/pru/pru.md (*regio_readsi): New pattern to read I/O
	registers.
	(*regio_nozext_writesi): New pattern to write to I/O registers.
	(*regio_zext_write_r30<EQS0:mode>): Ditto.
	* doc/extend.texi: Document the new PRU Named Address Space.

gcc/testsuite/ChangeLog:

	* gcc.target/pru/regio-as-pointer.c: New negative test.
	* gcc.target/pru/regio-as-pointer-2.c: New negative test.
	* gcc.target/pru/regio-decl-2.c: New negative test.
	* gcc.target/pru/regio-decl-3.c: New negative test.
	* gcc.target/pru/regio-decl-4.c: New negative test.
	* gcc.target/pru/regio-decl.c: New negative test.
	* gcc.target/pru/regio-di.c: New negative test.
	* gcc.target/pru/regio-hi.c: New negative test.
	* gcc.target/pru/regio-qi.c: New negative test.
	* gcc.target/pru/regio.c: New test.
	* gcc.target/pru/regio.h: New helper header.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2021-09-25 16:44:54 +03:00
GCC Administrator
9a4293ed9b Daily bump. 2021-09-25 00:16:20 +00:00
Andrew Burgess
71f9651108 top-level: merge Makefile.def patches from binutils-gdb repository
This commit back-ports two patches to Makefile.def from the
binutils-gdb repository.  These patches were committed over there
without first being merged into the gcc repository.

These commits all relate to dependencies for binutils-gdb modules, so
they should have no impact on gcc.  I tested a gcc build/install on x86-64
GNU/Linux, and everything looked OK.

The two patches being backported are binutils-gdb commits:

  commit ba4d88ad892fe29c6ca7938c8861f8edef5f7a3f (gdb-gnulib-issues)
  Date:   Mon Oct 12 16:04:32 2020 +0100

      gdb/gdbserver: add dependencies for distclean-gnulib

And

  commit 755ba58ebef02e1be9fc6770d00243ba6ed0223c
  Date:   Thu Mar 18 12:37:52 2021 +0000

      Add install dependencies for ld -> bfd and libctf -> bfd

2021-09-07  Andrew Burgess  <andrew.burgess@embecosm.com>

	* Makefile.def: Back-port commits ba4d88ad892f and
	755ba58ebef0 from binutils-gdb repository.
	* Makefile.in: Regenerated.
2021-09-24 18:16:55 +01:00
Harald Anlauf
84cccff60a Fortran - improve checking for intrinsics allowed in constant expressions
gcc/fortran/ChangeLog:

	PR fortran/102458
	* expr.c (is_non_constant_intrinsic): Check for intrinsics
	excluded in constant expressions (F2018:10.1.2).
	(gfc_is_constant_expr): Use that check.

gcc/testsuite/ChangeLog:

	PR fortran/102458
	* gfortran.dg/pr102458.f90: New test.
2021-09-24 19:11:02 +02:00
Sandra Loosemore
2364250ecc Fortran: Add missing diagnostic for F2018 C711 (TS29113 C407c)
2021-09-24  Sandra Loosemore  <sandra@codesourcery.com>

	PR fortran/101333

gcc/fortran/
	* interface.c (compare_parameter): Enforce F2018 C711.

gcc/testsuite/
	* gfortran.dg/c-interop/c407c-1.f90: Remove xfails.
2021-09-24 10:08:18 -07:00
Patrick Palka
34947d4e97 real: fix encoding of negative IEEE double/quad values [PR98216]
In encode_ieee_double/quad, the assignment

  unsigned long WORD = r->sign << 31;

is intended to set the 31st bit of WORD whenever the sign bit is set.
But on LP64 hosts it also unintentionally sets the upper 32 bits of WORD,
because r->sign gets promoted from unsigned:1 to int and then the result
of the shift (equal to INT_MIN) gets sign extended from int to long.

In the C++ frontend, this bug causes incorrect mangling of negative
floating point values because the output of real_to_target called from
write_real_cst unexpectedly has the upper 32 bits of this word set,
which the caller doesn't mask out.

This patch fixes this by avoiding the unwanted sign extension.  Note
that r0-53976 fixed the same bug in encode_ieee_single long ago.
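
The effect can be reproduced outside of GCC with the sketch below.  It is
an illustration under stated assumptions, not the code in real.c: struct
fake_real and both functions are hypothetical, fixed-width types stand in
for int and an LP64 long, and the shift result is written out as
INT32_MIN (the value named above) so that the example does not itself rely
on shifting into the sign bit.  The cast in sign_word_fixed is one way to
avoid the extension; the actual patch may do it differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sign field of GCC's real_value.  */
struct fake_real
{
  unsigned int sign : 1;
};

/* Buggy pattern, modelled with fixed-width types: the shift result is a
   32-bit signed value (INT32_MIN when the sign is set), and widening it
   to 64 bits sign-extends into the upper half.  */
uint64_t
sign_word_buggy (const struct fake_real *r)
{
  assert (r != 0);
  int32_t shifted = r->sign ? INT32_MIN : 0;
  return (uint64_t) (int64_t) shifted;
}

/* One way to avoid it: widen to an unsigned type before shifting.  */
uint64_t
sign_word_fixed (const struct fake_real *r)
{
  assert (r != 0);
  return (uint64_t) r->sign << 31;
}
```

For a value with the sign bit set, sign_word_buggy yields
0xFFFFFFFF80000000 while sign_word_fixed yields 0x80000000; for a
non-negative value both yield 0.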

	PR c++/98216
	PR c++/91292

gcc/ChangeLog:

	* real.c (encode_ieee_double): Avoid unwanted sign extension.
	(encode_ieee_quad): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/nontype-float2.C: New test.
2021-09-24 12:36:26 -04:00
Vladimir N. Makarov
51ca050319 Make profitability calculation of RA conflict representations independent of host compiler type sizes. [PR102147]
gcc/ChangeLog:

2021-09-24  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/102147
	* ira-build.c (ira_conflict_vector_profitable_p): Make
	profitability calculation independent of host compiler pointer and
	IRA_INT_BITS sizes.
2021-09-24 11:14:47 -04:00
Aldy Hernandez
55b3299dcd path solver: Avoid further lookups when range is defined in block.
If an SSA is defined in the current block, there is no need to query
range_on_path_entry for additional information.

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::path_range_query):
	Move debugging header...
	(path_range_query::precompute_ranges): ...here.
	(path_range_query::internal_range_of_expr): Do not call
	range_on_path_entry if NAME is defined in the current block.
2021-09-24 16:49:51 +02:00
Jonathan Wakely
9b11107ed7 libstdc++: Remove redundant 'inline' specifiers
These functions are constexpr, which means they are implicitly inline.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/range_access.h (cbegin, cend): Remove redundant
	'inline' specifier.
2021-09-24 15:38:44 +01:00
Richard Biener
710c6ab4ad Verify unallocated edge/BB flags are clear
This adds verification that unused auto_{edge,bb}_flag bits do not
remain set but are correctly cleared by consumers.  The intent
is that those flags can be cheaply used on a smaller IL region
and thus afterwards clearing can be restricted to the same
small region as well.
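
The idea can be sketched in plain C as follows.  This is a miniature
model, not the code in cfghooks.c: struct node, alloc_flag, free_flag and
verify_flags_clear are hypothetical names, and a single flags word per
node stands in for the BB and edge flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a CFG node carrying a flags word.  */
struct node
{
  unsigned int flags;
};

/* Bit mask of flags currently handed out by the allocator.  */
static unsigned int allocated_flags;

/* Hand out the lowest free flag bit, or 0 if none is left.  */
unsigned int
alloc_flag (void)
{
  for (unsigned int bit = 1; bit != 0; bit <<= 1)
    if ((allocated_flags & bit) == 0)
      {
        allocated_flags |= bit;
        return bit;
      }
  return 0;
}

/* Return a flag to the pool; the caller must already have cleared it
   on every node it touched.  */
void
free_flag (unsigned int bit)
{
  assert ((allocated_flags & bit) == bit);
  allocated_flags &= ~bit;
}

/* Verifier: no node may carry a flag that is not currently allocated.  */
bool
verify_flags_clear (const struct node *nodes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if ((nodes[i].flags & ~allocated_flags) != 0)
      return false;
  return true;
}
```

Because the verifier catches stale bits, a consumer that marks only a few
nodes is free to clear only those nodes instead of walking the whole
graph.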

2021-09-24  Richard Biener  <rguenther@suse.de>

	* cfghooks.c (verify_flow_info): Verify unallocated BB and
	edge flags are not set.
2021-09-24 10:16:19 +02:00