Avoid invalid loop transformations in jump threading registry.

My upcoming improvements to the forward jump threader make it thread
more aggressively.  In investigating some "regressions", I noticed
that it has always allowed threading through empty latches and across
loop boundaries.  As we have discussed recently, this should be avoided
until after loop optimizations have run their course.
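Here is a minimal C++ sketch of the two restrictions over a toy CFG. The Block, Loop and Edge structs and path_is_invalid are hypothetical stand-ins, not GCC's basic_block, edge and loop_p. The sketch models only three things: the loop-father comparison, the empty-latch test, and the "until after loop optimizations" gate, which is a plain boolean standing in for PROP_loop_opts_done.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified CFG model; not GCC's basic_block/edge/loop.
struct Loop;

struct Block
{
  Loop *loop_father;  // innermost loop containing the block
  bool empty;         // true if the block holds no statements
};

struct Loop
{
  Block *latch;       // source of the back edge to the loop header
};

struct Edge
{
  Block *src;
  Block *dest;
};

// Return true if PATH must be cancelled.  LOOP_OPTS_DONE models
// PROP_loop_opts_done: once loop optimizations have run, neither
// restriction applies.
bool
path_is_invalid (const std::vector<Edge> &path, bool loop_opts_done)
{
  assert (!path.empty ());
  // The loop of interest contains the source of the final (taken) edge.
  Loop *loop = path.back ().src->loop_father;
  bool seen_latch = false;
  bool crosses_loops = false;

  for (std::size_t i = 0; i < path.size (); ++i)
    {
      const Edge &e = path[i];
      if (loop->latch == e.src || loop->latch == e.dest)
        seen_latch = true;
      // The first edge's source is the block whose outgoing edge gets
      // redirected, so its loop father is ignored.
      if ((i > 0 && e.src->loop_father != loop)
          || e.dest->loop_father != loop)
        crosses_loops = true;
    }

  if (loop_opts_done)
    return false;
  // Threading through an empty latch would make it non-empty, and a
  // path that leaves or enters LOOP changes the loop structure.
  return (seen_latch && loop->latch->empty) || crosses_loops;
}
```

Take a path a->b->h in which h belongs to an inner loop. The sketch rejects it while loop_opts_done is false and accepts it once the flag is true. In other words, the opportunity is deferred, not lost.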

Note that this wasn't much of a problem before because DOM/VRP
couldn't find these opportunities, but with a smarter solver, we trip
over them more easily.

Because the forward threader doesn't have an independent localized cost
model like the new threader (profitable_path_p), it is difficult to
catch these things at discovery.  However, we can catch them at
registration time, with the added benefit that all the threaders
(forward and backward) can share the handcuffs.
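Checking at registration works because every threader funnels its paths through one registry. The following C++ sketch shows that shape under stated assumptions. PathRegistry, register_path and InvalidCheck are illustrative names only; GCC's real entry point is jt_path_registry::register_jump_thread. A path is also reduced to a plain list of block numbers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a path is just a list of block numbers.
using Path = std::vector<int>;

// A single registry shared by every threader.  The check runs on each
// registration, so all clients get the same restrictions without
// re-implementing them.
class PathRegistry
{
public:
  // Returns true (and fills in a reason) if the path must be cancelled,
  // following the same convention as cancel_invalid_paths.
  using InvalidCheck = std::function<bool (const Path &, std::string &)>;

  explicit PathRegistry (InvalidCheck check) : m_check (std::move (check)) {}

  // Queue PATH for a later CFG update, or cancel it.  Returns false if
  // the path was cancelled.
  bool
  register_path (const Path &path)
  {
    assert (!path.empty ());
    std::string reason;
    if (m_check (path, reason))
      {
        m_cancelled.push_back (reason);
        return false;
      }
    m_queued.push_back (path);
    return true;
  }

  const std::vector<Path> &queued () const { return m_queued; }
  const std::vector<std::string> &cancelled () const { return m_cancelled; }

private:
  InvalidCheck m_check;
  std::vector<Path> m_queued;
  std::vector<std::string> m_cancelled;
};
```

Clients never see the rules themselves. Adding a restriction to the check therefore tightens every threader at once.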

This patch is an adaptation of what we do in the backward threader, but
it is not meant to catch everything we do there, as some of the
restrictions there are due to limitations of the different block
copiers (for example, the generic copier does not re-use existing
threading paths).

Ideally, we could remove the now redundant bits in profitable_path_p, but
I would prefer not to, for two reasons.  First, the backward threader
calls profitable_path_p while it discovers paths, so that it avoids
exploring paths in unprofitable directions.  Second, I would like to merge all the forward
cost restrictions into the profitability class in the backward threader,
not the other way around.  Alas, that reshuffling will have to wait for
the next release.
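Checking during discovery pays off differently from checking at registration: a rejected prefix prunes every path that would extend it. The following C++ sketch assumes a toy successor-list graph. discover_paths and prefix_ok are hypothetical names, and prefix_ok merely stands in for a profitability test such as profitable_path_p; it does not implement that function's cost model.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // successor lists by block
using Path = std::vector<int>;

// Enumerate simple paths from START with at most MAX_LEN blocks.
// PREFIX_OK is consulted each time the path grows; when it fails, no
// extension of that prefix is explored.  Accepted paths go to FOUND.
// Returns the number of candidate paths examined (a proxy for cost).
std::size_t
discover_paths (const Graph &g, int start, std::size_t max_len,
                const std::function<bool (const Path &)> &prefix_ok,
                std::vector<Path> &found)
{
  assert (max_len >= 1);
  std::size_t examined = 0;
  Path path{start};
  std::function<void ()> extend = [&] () {
    ++examined;
    if (!prefix_ok (path))
      return;                   // prune: skip the whole subtree
    found.push_back (path);
    if (path.size () == max_len)
      return;
    for (int succ : g[path.back ()])
      {
        // Keep paths simple: never revisit a block.
        bool seen = false;
        for (int b : path)
          if (b == succ)
            seen = true;
        if (seen)
          continue;
        path.push_back (succ);
        extend ();
        path.pop_back ();
      }
  };
  extend ();
  return examined;
}
```

On the graph 0->1, 0->2, 1->3, 2->3, 3->4, take a predicate that rejects any prefix containing block 2. The search then examines 5 candidate paths instead of 7, because nothing beyond [0, 2] is explored. A registration-time check only sees complete paths, after the search has already paid for them.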

As usual, there are quite a few tests that needed adjustments.  It seems
we were quite happily threading improper scenarios.  With most of them,
as can be seen in pr77445-2.c, we're merely shifting the threading to
after loop optimizations.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths):
	New.
	(jt_path_registry::register_jump_thread): Call
	cancel_invalid_paths.
	* tree-ssa-threadupdate.h (class jt_path_registry): Add
	cancel_invalid_paths.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/20030714-2.c: Adjust.
	* gcc.dg/tree-ssa/pr66752-3.c: Adjust.
	* gcc.dg/tree-ssa/pr77445-2.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
	* gcc.dg/vect/bb-slp-16.c: Adjust.
Aldy Hernandez 2021-09-23 10:59:24 +02:00
parent 29c9285703
commit 4a960d548b
8 changed files with 78 additions and 35 deletions

gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c

@@ -32,7 +32,8 @@ get_alias_set (t)
 }
 }
-/* There should be exactly three IF conditionals if we thread jumps
-   properly. */
-/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
+/* There should be exactly 4 IF conditionals if we thread jumps
+   properly. There used to be 3, but one thread was crossing
+   loops. */
+/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */

gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c

@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */
 extern int status, pt;
 extern int count;
@@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a)
 pt--;
 }
-/* There are 4 jump threading opportunities, all of which will be
-   realized, which will eliminate testing of FLAG, completely. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */
+/* There are 2 jump threading opportunities (which don't cross loops),
+   all of which will be realized, which will eliminate testing of
+   FLAG, completely. */
+/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */
-/* There should be no assignments or references to FLAG, verify they're
-   eliminated as early as possible. */
-/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */
+/* We used to remove references to FLAG by DCE2, but this was
+   depending on early threaders threading through loop boundaries
+   (which we shouldn't do). However, the late threading passes, which
+   run after loop optimizations, can successfully eliminate the
+   references to FLAG. Verify that there are no references by the late
+   threading passes. */
+/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */

gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c

@@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) {
    aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities. Skip the later tests on aarch64. */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */

gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c

@@ -21,5 +21,7 @@
    condition.
    All the cases are picked up by VRP1 as jump threads. */
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
+/* There used to be 6 jump threads found by thread1, but they all
+   depended on threading through distinct loops in ethread. */
+/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */

gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c

@@ -1,8 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough

gcc/testsuite/gcc.dg/vect/bb-slp-16.c

@@ -1,8 +1,5 @@
 /* { dg-require-effective-target vect_int } */
-/* See note below as to why we disable threading. */
-/* { dg-additional-options "-fdisable-tree-thread1" } */
 #include <stdarg.h>
 #include "tree-vect.h"
@@ -30,10 +27,6 @@ main1 (int dummy)
 *pout++ = *pin++ + a;
 *pout++ = *pin++ + a;
 *pout++ = *pin++ + a;
-/* In some architectures like ppc64, jump threading may thread
-   the iteration where i==0 such that we no longer optimize the
-   BB. Another alternative to disable jump threading would be
-   to wrap the read from `i' into a function returning i. */
 if (arr[i] = i)
   a = i;
 else

gcc/tree-ssa-threadupdate.c

@@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers)
   return retval;
 }
+bool
+jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
+{
+  gcc_checking_assert (!path.is_empty ());
+  edge taken_edge = path[path.length () - 1]->e;
+  loop_p loop = taken_edge->src->loop_father;
+  bool seen_latch = false;
+  bool path_crosses_loops = false;
+  for (unsigned int i = 0; i < path.length (); i++)
+    {
+      edge e = path[i]->e;
+      if (e == NULL)
+        {
+          // NULL outgoing edges on a path can happen for jumping to a
+          // constant address.
+          cancel_thread (&path, "Found NULL edge in jump threading path");
+          return true;
+        }
+      if (loop->latch == e->src || loop->latch == e->dest)
+        seen_latch = true;
+      // The first entry represents the block with an outgoing edge
+      // that we will redirect to the jump threading path. Thus we
+      // don't care about that block's loop father.
+      if ((i > 0 && e->src->loop_father != loop)
+          || e->dest->loop_father != loop)
+        path_crosses_loops = true;
+      if (flag_checking && !m_backedge_threads)
+        gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
+    }
+  if (cfun->curr_properties & PROP_loop_opts_done)
+    return false;
+  if (seen_latch && empty_block_p (loop->latch))
+    {
+      cancel_thread (&path, "Threading through latch before loop opts "
+                     "would create non-empty latch");
+      return true;
+    }
+  if (path_crosses_loops)
+    {
+      cancel_thread (&path, "Path crosses loops");
+      return true;
+    }
+  return false;
+}
 /* Register a jump threading opportunity. We queue up all the jump
    threading opportunities discovered by a pass and update the CFG
    and SSA form all at once.
@@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path)
       return false;
     }
-  /* First make sure there are no NULL outgoing edges on the jump threading
-     path. That can happen for jumping to a constant address. */
-  for (unsigned int i = 0; i < path->length (); i++)
-    {
-      if ((*path)[i]->e == NULL)
-        {
-          cancel_thread (path, "Found NULL edge in jump threading path");
-          return false;
-        }
-      if (flag_checking && !m_backedge_threads)
-        gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0);
-    }
+  if (cancel_invalid_paths (*path))
+    return false;
   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_jump_thread_path (dump_file, *path, true);

gcc/tree-ssa-threadupdate.h

@@ -75,6 +75,7 @@ protected:
   unsigned long m_num_threaded_edges;
 private:
   virtual bool update_cfg (bool peel_loop_headers) = 0;
+  bool cancel_invalid_paths (vec<jump_thread_edge *> &path);
   jump_thread_path_allocator m_allocator;
   // True if threading through back edges is allowed. This is only
   // allowed in the generic copier in the backward threader.