Go to file
Roger Sayle b41be002ed ivopts: Improve code generated for very simple loops.
This patch tidies up the code that GCC generates for simple loops,
by selecting/generating a simpler loop bound expression in ivopts.
The original motivation came from looking at the following loop (from
gcc.target/i386/pr90178.c)

int *find_ptr (int* mem, int sz, int val)
{
  for (int i = 0; i < sz; i++)
    if (mem[i] == val)
      return &mem[i];
  return 0;
}

which GCC currently compiles to:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        leal    -1(%rsi), %ecx
        leaq    4(%rdi,%rcx,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

Notice the relatively complex leal/leaq instructions, that result
from ivopts using the following expression for the loop bound:
inv_expr 2:     ((unsigned long) ((unsigned int) sz_8(D) + 4294967295)
		* 4 + (unsigned long) mem_9(D)) + 4

which results from NITERS being (unsigned int) sz_8(D) + 4294967295,
i.e. (sz - 1), and the logic in cand_value_at determining the bound
as BASE + NITERS*STEP at the start of the final iteration and as
BASE + NITERS*STEP + STEP at the end of the final iteration.

Ideally, we'd like the middle-end optimizers to simplify
BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially
when NITERS already has the form BOUND-1, but with type conversions
and possible overflow to worry about, the above "inv_expr 2" is the
best that can be done by fold (without additional context information).

This patch improves ivopts' cand_value_at by instead of using just
the tree expression for NITERS, passing the data structure that
explains how that expression was derived.  This allows us to peek
under the surface to check that NITERS+1 doesn't overflow, and in
this patch to use the SSA_NAME already holding the required value.

In the motivating loop above, inv_expr 2 now becomes:
(unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D)

And as a result, on x86_64 we now generate:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        movslq  %esi, %rsi
        leaq    (%rdi,%rsi,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

This improvement required one minor tweak to GCC's testsuite for
gcc.dg/wrapped-binop-simplify.c, where we again generate better
code, and therefore no longer find as many optimization opportunities
in later passes (vrp2).

Previously:

void v1 (unsigned long *in, unsigned long *out, unsigned int n)
{
  int i;
  for (i = 0; i < n; i++) {
    out[i] = in[i];
  }
}

on x86_64 generated:
v1:	testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
.L3:	movq    (%rdi,%rax,8), %rcx
        movq    %rcx, (%rsi,%rax,8)
        addq    $1, %rax
        cmpq    %rax, %rdx
        jne     .L3
.L1:	ret

and now instead generates:
v1:	testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
        leaq    0(,%rdx,8), %rcx
.L3:	movq    (%rdi,%rax), %rdx
        movq    %rdx, (%rsi,%rax)
        addq    $8, %rax
        cmpq    %rax, %rcx
        jne     .L3
.L1:	ret

2021-11-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
	tree_niter_desc* argument instead of just a tree for NITER.
	If we require the iv candidate value at the end of the final
	loop iteration, try using the original loop bound as the
	NITER for sufficiently simple loops.
	(may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite/ChangeLog
	* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
	* gcc.dg/tree-ssa/ivopts-5.c: New test case.
	* gcc.dg/tree-ssa/ivopts-6.c: New test case.
	* gcc.dg/tree-ssa/ivopts-7.c: New test case.
	* gcc.dg/tree-ssa/ivopts-8.c: New test case.
	* gcc.dg/tree-ssa/ivopts-9.c: New test case.
2021-11-26 17:22:10 +00:00
c++tools Daily bump. 2021-10-27 00:16:33 +00:00
config
contrib Daily bump. 2021-11-25 00:16:29 +00:00
fixincludes Daily bump. 2021-11-24 00:16:29 +00:00
gcc ivopts: Improve code generated for very simple loops. 2021-11-26 17:22:10 +00:00
gnattools Daily bump. 2021-10-23 00:16:26 +00:00
gotools
include Daily bump. 2021-11-13 00:16:39 +00:00
INSTALL
intl
libada Daily bump. 2021-10-23 00:16:26 +00:00
libatomic
libbacktrace Daily bump. 2021-11-13 00:16:39 +00:00
libcc1
libcody Daily bump. 2021-11-02 00:16:32 +00:00
libcpp Daily bump. 2021-11-24 00:16:29 +00:00
libdecnumber Daily bump. 2021-10-23 00:16:26 +00:00
libffi Daily bump. 2021-11-16 00:16:31 +00:00
libgcc Daily bump. 2021-11-26 00:16:26 +00:00
libgfortran Daily bump. 2021-10-19 00:16:23 +00:00
libgo
libgomp Daily bump. 2021-11-25 00:16:29 +00:00
libiberty Daily bump. 2021-10-23 00:16:26 +00:00
libitm
libobjc
liboffloadmic Daily bump. 2021-10-20 00:16:43 +00:00
libphobos Daily bump. 2021-11-20 00:16:35 +00:00
libquadmath
libsanitizer Daily bump. 2021-11-19 00:16:34 +00:00
libssp
libstdc++-v3 libstdc++: Ensure dg-add-options comes after dg-options 2021-11-26 15:11:58 +00:00
libvtv
lto-plugin
maintainer-scripts
zlib
.dir-locals.el
.gitattributes
.gitignore
ABOUT-NLS
ar-lib
ChangeLog Daily bump. 2021-11-17 00:16:29 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in
config.guess
config.rpath
config.sub
configure configure, Darwin: Set appropriate defaults for host-shared. 2021-11-16 19:44:51 +00:00
configure.ac configure, Darwin: Set appropriate defaults for host-shared. 2021-11-16 19:44:51 +00:00
COPYING
COPYING3
COPYING3.LIB
COPYING.LIB
COPYING.RUNTIME
depcomp
install-sh
libtool-ldflags
libtool.m4
lt~obsolete.m4
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
MAINTAINERS MAINTAINERS: Add myself to DCO section and update email address 2021-11-16 22:49:55 +01:00
Makefile.def Make opcodes configure depend on bfd configure 2021-11-12 18:34:12 +10:30
Makefile.in Make opcodes configure depend on bfd configure 2021-11-12 18:34:12 +10:30
Makefile.tpl [PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets 2021-10-28 10:42:49 -04:00
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
symlink-tree
test-driver
ylwrap

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.