4b3a325f07
This patch upgrades the pre-VRP threading passes to fully resolving backward threaders, and removes the post-VRP threading passes altogether. With it, we reduce the number of threaders in our pipeline from 9 to 7. This will leave DOM as the only forward threader client. When the ranger can handle floats, we should be able to upgrade the pre-DOM threaders to fully resolving threaders and kill the embedded DOM threader. The numbers are as follows: prev: # threads in backward + vrp-threaders = 92624 now: # threads in backward threaders = 94275 Gain: +1.78% prev: # total threads: 189495 now: # total threads: 193714 Gain: +2.22% The numbers are not as great as my initial proposal, but I've recently pushed all the work that got us to this point ;-). And... the compilation improves by 1.32%! There's a regression on uninit-pred-7_a.c that I've yet to look at. I want to make sure it's not a missing thread. If it is, I'll create a PR and own it. Also, the tree-ssa/phi_on_compare-*.c tests have all regressed. This seems to be some special case the forward threader handles that the backward threader does not (edge_forwards_cmp_to_conditional_jump*). I haven't dug deep to see if this is solveable within our infrastructure, but a cursory look shows that even though the VRP threader threads this, the *.optimized dump ends with more conditional jumps than without the optimization. I'd like to punt on this for now, because DOM actually catches this through its lone use of the forward threader (I've adjusted the tests). However, we will need to address this sooner or later, if indeed it's still improving the final assembly. gcc/ChangeLog: * passes.def: Replace the pass_thread_jumps before VRP* with pass_thread_jumps_full. Remove all pass_vrp_threader instances. * tree-ssa-threadbackward.c (pass_data_thread_jumps_full): Remove hyphen from "thread-full" name. libgomp/ChangeLog: * testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes. * testsuite/libgomp.graphite/force-parallel-8.c: Same. gcc/testsuite/ChangeLog: * gcc.dg/loop-unswitch-2.c: Adjust for threading changes. * gcc.dg/old-style-asm-1.c: Same. * gcc.dg/tree-ssa/phi_on_compare-1.c: Same. * gcc.dg/tree-ssa/phi_on_compare-2.c: Same. * gcc.dg/tree-ssa/phi_on_compare-3.c: Same. * gcc.dg/tree-ssa/phi_on_compare-4.c: Same. * gcc.dg/tree-ssa/pr20701.c: Same. * gcc.dg/tree-ssa/pr21001.c: Same. * gcc.dg/tree-ssa/pr21294.c: Same. * gcc.dg/tree-ssa/pr21417.c: Same. * gcc.dg/tree-ssa/pr21559.c: Same. * gcc.dg/tree-ssa/pr21563.c: Same. * gcc.dg/tree-ssa/pr49039.c: Same. * gcc.dg/tree-ssa/pr59597.c: Same. * gcc.dg/tree-ssa/pr61839_1.c: Same. * gcc.dg/tree-ssa/pr61839_3.c: Same. * gcc.dg/tree-ssa/pr66752-3.c: Same. * gcc.dg/tree-ssa/pr68198.c: Same. * gcc.dg/tree-ssa/pr77445-2.c: Same. * gcc.dg/tree-ssa/pr77445.c: Same. * gcc.dg/tree-ssa/ranger-threader-1.c: Same. * gcc.dg/tree-ssa/ranger-threader-2.c: Same. * gcc.dg/tree-ssa/ranger-threader-4.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same. * gcc.dg/tree-ssa/ssa-thread-14.c: Same. * gcc.dg/tree-ssa/ssa-thread-backedge.c: Same. * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same. * gcc.dg/tree-ssa/vrp02.c: Same. * gcc.dg/tree-ssa/vrp03.c: Same. * gcc.dg/tree-ssa/vrp05.c: Same. * gcc.dg/tree-ssa/vrp06.c: Same. * gcc.dg/tree-ssa/vrp07.c: Same. * gcc.dg/tree-ssa/vrp08.c: Same. * gcc.dg/tree-ssa/vrp09.c: Same. * gcc.dg/tree-ssa/vrp33.c: Same. * gcc.dg/uninit-pred-9_b.c: Same. * gcc.dg/uninit-pred-7_a.c: xfail.
51 lines
885 B
C
51 lines
885 B
C
/* { dg-additional-options "-fno-thread-jumps --param max-stores-to-sink=0" } */
|
|
|
|
#define N 1500
|
|
|
|
int x[N][N], y[N];
|
|
|
|
void abort (void);
|
|
|
|
int foo(void)
|
|
{
|
|
int i, j;
|
|
|
|
for (i = 0; i < N; i++)
|
|
y[i] = i;
|
|
|
|
for (i = 0; i < N; i++)
|
|
for (j = 0; j < N; j++)
|
|
x[i][j] = i + j;
|
|
|
|
for (i = 0; i < N; i++)
|
|
{
|
|
y[i] = i;
|
|
|
|
for (j = 0; j < N; j++)
|
|
{
|
|
if (j > 500)
|
|
{
|
|
x[i][j] = i + j + 3;
|
|
y[j] = i*j + 10;
|
|
}
|
|
else
|
|
x[i][j] = x[i][j]*3;
|
|
}
|
|
}
|
|
|
|
return x[2][5]*y[8];
|
|
}
|
|
|
|
int main(void)
|
|
{
|
|
if (168 != foo())
|
|
abort ();
|
|
|
|
return 0;
|
|
}
|
|
|
|
/* Check that parallel code generation part make the right answer. */
|
|
/* { dg-final { scan-tree-dump-times "5 loops carried no dependency" 1 "graphite" } } */
|
|
/* { dg-final { scan-tree-dump-times "loopfn.0" 4 "optimized" } } */
|
|
/* { dg-final { scan-tree-dump-times "loopfn.1" 4 "optimized" } } */
|