gcc/libgomp/testsuite/libgomp.graphite/force-parallel-8.c
Aldy Hernandez 4b3a325f07 Remove VRP threader passes in exchange for better threading pre-VRP.
This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.

This will leave DOM as the only forward threader client.  When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.

The numbers are as follows:

	prev: # threads in backward + vrp-threaders = 92624
	now:  # threads in backward threaders = 94275
	Gain: +1.78%

	prev: # total threads: 189495
	now:  # total threads: 193714
	Gain: +2.22%

	The numbers are not as great as my initial proposal, but I've
	recently pushed all the work that got us to this point ;-).

And... the compilation improves by 1.32%!

There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
want to make sure it's not a missing thread.  If it is, I'll create a PR
and own it.

Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solvable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization.  I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests).  However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.

gcc/ChangeLog:

	* passes.def: Replace the pass_thread_jumps before VRP* with
	pass_thread_jumps_full.  Remove all pass_vrp_threader instances.
	* tree-ssa-threadbackward.c (pass_data_thread_jumps_full):
	Remove hyphen from "thread-full" name.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
	* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

	* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
	* gcc.dg/old-style-asm-1.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
	* gcc.dg/tree-ssa/pr20701.c: Same.
	* gcc.dg/tree-ssa/pr21001.c: Same.
	* gcc.dg/tree-ssa/pr21294.c: Same.
	* gcc.dg/tree-ssa/pr21417.c: Same.
	* gcc.dg/tree-ssa/pr21559.c: Same.
	* gcc.dg/tree-ssa/pr21563.c: Same.
	* gcc.dg/tree-ssa/pr49039.c: Same.
	* gcc.dg/tree-ssa/pr59597.c: Same.
	* gcc.dg/tree-ssa/pr61839_1.c: Same.
	* gcc.dg/tree-ssa/pr61839_3.c: Same.
	* gcc.dg/tree-ssa/pr66752-3.c: Same.
	* gcc.dg/tree-ssa/pr68198.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/pr77445.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
	* gcc.dg/tree-ssa/vrp02.c: Same.
	* gcc.dg/tree-ssa/vrp03.c: Same.
	* gcc.dg/tree-ssa/vrp05.c: Same.
	* gcc.dg/tree-ssa/vrp06.c: Same.
	* gcc.dg/tree-ssa/vrp07.c: Same.
	* gcc.dg/tree-ssa/vrp08.c: Same.
	* gcc.dg/tree-ssa/vrp09.c: Same.
	* gcc.dg/tree-ssa/vrp33.c: Same.
	* gcc.dg/uninit-pred-9_b.c: Same.
	* gcc.dg/uninit-pred-7_a.c: xfail.
2021-10-29 17:57:27 +02:00

/* { dg-additional-options "-fno-thread-jumps --param max-stores-to-sink=0" } */
#define N 1500

int x[N][N], y[N];

void abort (void);

int foo(void)
{
  int i, j;

  for (i = 0; i < N; i++)
    y[i] = i;

  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      x[i][j] = i + j;

  for (i = 0; i < N; i++)
    {
      y[i] = i;

      for (j = 0; j < N; j++)
        {
          if (j > 500)
            {
              x[i][j] = i + j + 3;
              y[j] = i*j + 10;
            }
          else
            x[i][j] = x[i][j]*3;
        }
    }

  /* x[2][5] = (2 + 5) * 3 = 21 and y[8] = 8, so the result is 168.  */
  return x[2][5]*y[8];
}

int main(void)
{
  if (168 != foo())
    abort ();

  return 0;
}

/* Check that the parallel code generation part produces the right answer.  */
/* { dg-final { scan-tree-dump-times "5 loops carried no dependency" 1 "graphite" } } */
/* { dg-final { scan-tree-dump-times "loopfn.0" 4 "optimized" } } */
/* { dg-final { scan-tree-dump-times "loopfn.1" 4 "optimized" } } */