Richard Sandiford ea74a3f548 vect: Fix VLA SLP invariant optimisation [PR98535]
duplicate_and_interleave is the main fallback way of loading
a repeating sequence of elements into variable-length vectors.
The code handles cases in which the number of elements in the
sequence is potentially several times greater than the number
of elements in a vector.

Let:

- NE be the (compile-time) number of elements in the sequence
- NR be the (compile-time) number of vector results and
- VE be the (run-time) number of elements in each vector

The basic approach is to duplicate each element into a
separate vector, giving NE vectors in total, then use
log2(NE) rows of NE permutes to generate NE results.
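
The basic approach can be modelled with fixed-size arrays.  The
following is only a sketch, not the GCC implementation: VE is an
ordinary run-time argument, and the names vec, interleave,
duplicate_and_interleave_sketch, MAX_VECS and MAX_VE are invented
for illustration.  It assumes NE is a power of 2 and VE is even,
and performs every permute without removing any redundancy.

```c
#include <assert.h>
#include <string.h>

#define MAX_VECS 16   /* illustrative limits, not taken from GCC */
#define MAX_VE   16

typedef struct { int elt[MAX_VE]; } vec;

/* Interleave the low halves (HI == 0) or the high halves (HI != 0)
   of A and B, treating both as vectors of VE elements.  */
static vec
interleave (const vec *a, const vec *b, unsigned ve, int hi)
{
  vec out;
  unsigned base = hi ? ve / 2 : 0;
  memset (&out, 0, sizeof out);
  for (unsigned i = 0; i < ve; ++i)
    {
      const vec *src = (i & 1) ? b : a;
      out.elt[i] = src->elt[base + i / 2];
    }
  return out;
}

/* Fill RESULTS[0..NE-1] so that, read in order, the vectors repeat
   SEQ[0..NE-1].  NE must be a power of 2 and VE must be even.  */
static void
duplicate_and_interleave_sketch (const int *seq, unsigned ne, unsigned ve,
                                 vec *results)
{
  assert (ne >= 1 && ne <= MAX_VECS && (ne & (ne - 1)) == 0);
  assert (ve >= 2 && ve <= MAX_VE && ve % 2 == 0);

  vec in[MAX_VECS], out[MAX_VECS];
  memset (in, 0, sizeof in);
  memset (out, 0, sizeof out);

  /* Duplicate each element into a separate vector.  */
  for (unsigned i = 0; i < ne; ++i)
    for (unsigned j = 0; j < ve; ++j)
      in[i].elt[j] = seq[i];

  /* log2(NE) rows of NE permutes.  Output I pairs input I/2 with the
     input half a row further on; even outputs are low interleaves and
     odd outputs are high interleaves.  */
  for (unsigned repeat = 1; repeat < ne; repeat *= 2)
    {
      for (unsigned i = 0; i < ne; ++i)
        out[i] = interleave (&in[i / 2], &in[i / 2 + ne / 2], ve, i & 1);
      memcpy (in, out, sizeof in);
    }

  memcpy (results, in, ne * sizeof (vec));
}
```

Calling it with NE = 8 and VE = 2 leaves the eight results holding
the sequence twice over, two elements per vector.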

In the worst case — when VE has no known compile-time factor
and NR >= NE — all of these permutes are necessary.  However,
if VE is known to be a multiple of 2**F, then each of the
first F permute rows produces duplicate results; specifically,
the high permute for a given pair is the same as the low permute.
The code dealt with this by reusing the low result for the
high result.  This part was OK.

However, having duplicate results from one row meant that the
next row did duplicate work.  The redundancies would be optimised
away by later passes, but the code tried to avoid generating them
in the first place.  This is the part that went wrong.

Specifically, NR is typically less than NE when some permutes are
redundant, so the code tried to use NR to reduce the amount of work
performed.  The problem was that, although it correctly calculated
a conservative bound on how many results were needed in each row,
it chose the wrong results for anything other than the final row.

This doesn't usually matter for fully-packed SVE vectors.  We first
try to coalesce smaller elements into larger ones, so normally
VE ends up being 2*VQ (where VQ is the number of 128-bit blocks
in an SVE vector).  In that situation we'd only apply the faulty
optimisation to the final row, i.e. the case it handled correctly.
E.g. for things like:

  void
  f (long *x)
  {
    for (int i = 0; i < 100; i += 8)
      {
        x[i] += 1;
        x[i + 1] += 2;
        x[i + 2] += 3;
        x[i + 3] += 4;
        x[i + 4] += 5;
        x[i + 5] += 6;
        x[i + 6] += 7;
        x[i + 7] += 8;
      }
  }

(already tested by the testsuite), we'd have 3 rows of permutes
producing 4 vector results.  The scheme produced:

1st row: 8 results from 4 permutes, highs duplicates of lows
2nd row: 8 results from 8 permutes (half of which are actually redundant)
3rd row: 4 results from 4 permutes

However, coalescing elements is trickier for unpacked vectors,
and at the moment we don't try to do it (see the GET_MODE_SIZE
check in can_duplicate_and_interleave_p).  Unpacked vectors
therefore stress the code in ways that packed vectors didn't.

The patch fixes this by removing the redundancies from each row,
rather than trying to work around them later.  This also removes
the redundant work in the second row of the example above.
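
The idea of dropping the high permutes from the working set can be
sketched as below.  This is a simplified model rather than the code
in tree-vect-slp.c: vectors are fixed-size arrays, VE is a run-time
argument, and known_factor is a hypothetical parameter standing in
for the power of 2 that VE is known at compile time to be a multiple
of.  All names here are invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_VECS 16   /* illustrative limits, not taken from GCC */
#define MAX_VE   16

typedef struct { int elt[MAX_VE]; } vec;

/* Interleave the low halves (HI == 0) or the high halves (HI != 0)
   of A and B, treating both as vectors of VE elements.  */
static vec
interleave (const vec *a, const vec *b, unsigned ve, int hi)
{
  vec out;
  unsigned base = hi ? ve / 2 : 0;
  memset (&out, 0, sizeof out);
  for (unsigned i = 0; i < ve; ++i)
    {
      const vec *src = (i & 1) ? b : a;
      out.elt[i] = src->elt[base + i / 2];
    }
  return out;
}

/* Build the distinct vectors that repeat SEQ[0..NE-1] and store them in
   PIECES.  Return how many there are and set *NPERMUTES to the number
   of permutes performed.  KNOWN_FACTOR (hypothetical) is a power of 2
   that divides VE.  */
static unsigned
interleave_dropping_highs (const int *seq, unsigned ne, unsigned ve,
                           unsigned known_factor, vec *pieces,
                           unsigned *npermutes)
{
  assert (ne >= 2 && ne <= MAX_VECS && (ne & (ne - 1)) == 0);
  assert (ve >= 2 && ve <= MAX_VE && ve % 2 == 0);
  assert (known_factor >= 1 && ve % known_factor == 0);

  vec in[MAX_VECS], out[MAX_VECS];
  memset (in, 0, sizeof in);
  memset (out, 0, sizeof out);
  for (unsigned i = 0; i < ne; ++i)
    for (unsigned j = 0; j < ve; ++j)
      in[i].elt[j] = seq[i];

  unsigned nvecs = ne;
  *npermutes = 0;
  for (unsigned repeat = 1; repeat < ne; repeat *= 2)
    {
      unsigned hi_start = nvecs / 2;
      unsigned nout = 0;
      for (unsigned i = 0; i < nvecs; ++i)
        {
          /* When VE is known to be a multiple of 2 * REPEAT, the high
             permute would equal the low one, so leave it out of the
             working set instead of carrying a duplicate forward.  */
          if ((i & 1) != 0 && known_factor % (2 * repeat) == 0)
            continue;
          out[nout++] = interleave (&in[i / 2], &in[i / 2 + hi_start],
                                    ve, i & 1);
          ++*npermutes;
        }
      memcpy (in, out, sizeof in);
      nvecs = nout;
    }

  memcpy (pieces, in, nvecs * sizeof (vec));
  return nvecs;
}
```

For NE = 8 with a known factor of 2, this model performs 4 permutes
in each of the 3 rows and leaves 4 vectors, so the second row no
longer does redundant work.  A caller needing more results than that
would reuse the returned vectors cyclically.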

gcc/
	PR tree-optimization/98535
	* tree-vect-slp.c (duplicate_and_interleave): Use quick_grow_cleared.
	If the high and low permutes are the same, remove the high permutes
	from the working set and only continue with the low ones.
2021-01-20 13:16:30 +00:00
c++tools Daily bump. 2021-01-06 00:16:55 +00:00
config Daily bump. 2021-01-06 00:16:55 +00:00
contrib Daily bump. 2021-01-15 00:16:28 +00:00
fixincludes
gcc vect: Fix VLA SLP invariant optimisation [PR98535] 2021-01-20 13:16:30 +00:00
gnattools
gotools
include Daily bump. 2021-01-17 00:16:23 +00:00
INSTALL
intl
libada
libatomic Daily bump. 2021-01-16 00:16:29 +00:00
libbacktrace Daily bump. 2021-01-19 00:16:35 +00:00
libcc1 Daily bump. 2021-01-06 00:16:55 +00:00
libcody Daily bump. 2021-01-13 00:16:36 +00:00
libcpp Daily bump. 2021-01-16 00:16:29 +00:00
libdecnumber
libffi Daily bump. 2021-01-06 00:16:55 +00:00
libgcc Daily bump. 2021-01-14 00:16:24 +00:00
libgfortran Fix ChangeLog entries. 2021-01-17 18:27:02 -08:00
libgo libgo: update hurd support 2021-01-14 09:57:04 -08:00
libgomp Daily bump. 2021-01-20 00:16:46 +00:00
libhsail-rt Daily bump. 2021-01-06 00:16:55 +00:00
libiberty Daily bump. 2021-01-05 00:16:42 +00:00
libitm Daily bump. 2021-01-16 00:16:29 +00:00
libobjc Daily bump. 2021-01-06 00:16:55 +00:00
liboffloadmic Daily bump. 2021-01-06 00:16:55 +00:00
libphobos Daily bump. 2021-01-06 00:16:55 +00:00
libquadmath Daily bump. 2021-01-06 00:16:55 +00:00
libsanitizer Daily bump. 2021-01-06 00:16:55 +00:00
libssp Daily bump. 2021-01-06 00:16:55 +00:00
libstdc++-v3 Daily bump. 2021-01-19 00:16:35 +00:00
libvtv Daily bump. 2021-01-06 00:16:55 +00:00
lto-plugin Daily bump. 2021-01-06 00:16:55 +00:00
maintainer-scripts
zlib Daily bump. 2021-01-06 00:16:55 +00:00
.dir-locals.el
.gitattributes
.gitignore
ABOUT-NLS
ar-lib
ChangeLog Daily bump. 2021-01-13 00:16:36 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in
config.guess
config.rpath
config.sub
configure
configure.ac
COPYING
COPYING3
COPYING3.LIB
COPYING.LIB
COPYING.RUNTIME
depcomp
install-sh
libtool-ldflags
libtool.m4 Update GNU/Hurd configure support 2021-01-05 16:04:14 -07:00
lt~obsolete.m4
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
MAINTAINERS MAINTAINERS: Fix spacing 2021-01-12 18:41:43 +00:00
Makefile.def sync libctf toplevel from binutils-gdb 2021-01-07 09:28:58 +10:30
Makefile.in sync libctf toplevel from binutils-gdb 2021-01-07 09:28:58 +10:30
Makefile.tpl
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
symlink-tree
test-driver
ylwrap

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.