[AArch64] Enable VECT_COMPARE_COSTS by default for SVE

This patch enables VECT_COMPARE_COSTS by default for SVE, both so
that we can compare SVE against Advanced SIMD and so that (with future
patches) we can compare multiple SVE vectorisation approaches against
each other.  It also adds a target-specific --param to control this.

2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.opt (--param=aarch64-sve-compare-costs):
	New option.
	* doc/invoke.texi: Document it.
	* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
	By default, return VECT_COMPARE_COSTS for SVE.

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_3.c: Split multi-vector cases out
	into...
	* gcc.target/aarch64/sve/reduc_3_costly.c: ...this new test,
	passing -fno-vect-cost-model for them.
	* gcc.target/aarch64/sve/slp_6.c: Add -fno-vect-cost-model.
	* gcc.target/aarch64/sve/slp_7.c,
	* gcc.target/aarch64/sve/slp_7_run.c: Split multi-vector cases out
	into...
	* gcc.target/aarch64/sve/slp_7_costly.c,
	* gcc.target/aarch64/sve/slp_7_costly_run.c: ...these new tests,
	passing -fno-vect-cost-model for them.
	* gcc.target/aarch64/sve/while_7.c: Add -fno-vect-cost-model.
	* gcc.target/aarch64/sve/while_9.c: Likewise.

From-SVN: r278337
This commit is contained in:
Richard Sandiford 2019-11-16 10:43:52 +00:00 committed by Richard Sandiford
parent bcc7e346bf
commit eb23241ba8
14 changed files with 154 additions and 36 deletions

View File

@ -1,3 +1,11 @@
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
* config/aarch64/aarch64.opt (--param=aarch64-sve-compare-costs):
New option.
* doc/invoke.texi: Document it.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
By default, return VECT_COMPARE_COSTS for SVE.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
* target.h (VECT_COMPARE_COSTS): New constant.

View File

@ -15962,7 +15962,15 @@ aarch64_autovectorize_vector_modes (vector_modes *modes, bool)
for this case. */
modes->safe_push (V2SImode);
return 0;
unsigned int flags = 0;
/* Consider enabling VECT_COMPARE_COSTS for SVE, both so that we
can compare SVE against Advanced SIMD and so that we can compare
multiple SVE vectorization approaches against each other. There's
not really any point doing this for Advanced SIMD only, since the
first mode that works should always be the best. */
if (TARGET_SVE && aarch64_sve_compare_costs)
flags |= VECT_COMPARE_COSTS;
return flags;
}
/* Implement TARGET_MANGLE_TYPE. */

View File

@ -258,3 +258,7 @@ long aarch64_stack_protector_guard_offset = 0
moutline-atomics
Target Report Mask(OUTLINE_ATOMICS) Save
Generate local calls to out-of-line atomic operations.
-param=aarch64-sve-compare-costs=
Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach. Also use the cost model to choose between SVE and Advanced SIMD vectorization.

View File

@ -11179,8 +11179,8 @@ without notice in future releases.
In order to get minimal, maximal and default value of a parameter,
one can use @option{--help=param -Q} options.
In each case, the @var{value} is an integer. The allowable choices for
@var{name} are:
In each case, the @var{value} is an integer. The following choices
of @var{name} are recognized for all targets:
@table @gcctabopt
@item predictable-branch-outcome
@ -12396,6 +12396,20 @@ statements or when determining their validity prior to issuing
diagnostics.
@end table
The following choices of @var{name} are available on AArch64 targets:
@table @gcctabopt
@item aarch64-sve-compare-costs
When vectorizing for SVE, consider using ``unpacked'' vectors for
smaller elements and use the cost model to pick the cheapest approach.
Also use the cost model to choose between SVE and Advanced SIMD vectorization.
Using unpacked vectors includes storing smaller elements in larger
containers and accessing elements with extending loads and truncating
stores.
@end table
@end table
@node Instrumentation Options

View File

@ -1,3 +1,19 @@
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
* gcc.target/aarch64/sve/reduc_3.c: Split multi-vector cases out
into...
* gcc.target/aarch64/sve/reduc_3_costly.c: ...this new test,
passing -fno-vect-cost-model for them.
* gcc.target/aarch64/sve/slp_6.c: Add -fno-vect-cost-model.
* gcc.target/aarch64/sve/slp_7.c,
* gcc.target/aarch64/sve/slp_7_run.c: Split multi-vector cases out
into...
* gcc.target/aarch64/sve/slp_7_costly.c,
* gcc.target/aarch64/sve/slp_7_costly_run.c: ...these new tests,
passing -fno-vect-cost-model for them.
* gcc.target/aarch64/sve/while_7.c: Add -fno-vect-cost-model.
* gcc.target/aarch64/sve/while_9.c: Likewise.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
* gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized

View File

@ -17,7 +17,6 @@ void reduc_ptr_##DSTTYPE##_##SRCTYPE (DSTTYPE *restrict sum, \
REDUC_PTR (int8_t, int8_t)
REDUC_PTR (int16_t, int16_t)
REDUC_PTR (int32_t, int32_t)
REDUC_PTR (int64_t, int64_t)
@ -25,17 +24,6 @@ REDUC_PTR (_Float16, _Float16)
REDUC_PTR (float, float)
REDUC_PTR (double, double)
/* Widening reductions. */
REDUC_PTR (int32_t, int8_t)
REDUC_PTR (int32_t, int16_t)
REDUC_PTR (int64_t, int8_t)
REDUC_PTR (int64_t, int16_t)
REDUC_PTR (int64_t, int32_t)
REDUC_PTR (float, _Float16)
REDUC_PTR (double, float)
/* Float<>Int conversions */
REDUC_PTR (_Float16, int16_t)
REDUC_PTR (float, int32_t)
@ -45,8 +33,14 @@ REDUC_PTR (int16_t, _Float16)
REDUC_PTR (int32_t, float)
REDUC_PTR (int64_t, double)
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 3 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 4 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 { xfail *-*-* } } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 { xfail *-*-* } } } */
/* We don't yet vectorize the int<-float cases. */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 3 } } */
/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */

View File

@ -0,0 +1,32 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
#include <stdint.h>
#define NUM_ELEMS(TYPE) (32 / sizeof (TYPE))
#define REDUC_PTR(DSTTYPE, SRCTYPE) \
void reduc_ptr_##DSTTYPE##_##SRCTYPE (DSTTYPE *restrict sum, \
SRCTYPE *restrict array, \
int count) \
{ \
*sum = 0; \
for (int i = 0; i < count; ++i) \
*sum += array[i]; \
}
/* Widening reductions. */
REDUC_PTR (int32_t, int8_t)
REDUC_PTR (int32_t, int16_t)
REDUC_PTR (int64_t, int8_t)
REDUC_PTR (int64_t, int16_t)
REDUC_PTR (int64_t, int32_t)
REDUC_PTR (float, _Float16)
REDUC_PTR (double, float)
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */

View File

@ -1,5 +1,5 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -ffast-math" } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -ffast-math -fno-vect-cost-model" } */
#include <stdint.h>

View File

@ -31,37 +31,27 @@ vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \
T (uint16_t) \
T (int32_t) \
T (uint32_t) \
T (int64_t) \
T (uint64_t) \
T (_Float16) \
T (float) \
T (double)
T (float)
TEST_ALL (VEC_PERM)
/* We can't use SLP for the 64-bit loops, since the number of reduction
results might be greater than the number of elements in the vector.
Otherwise we have two loads per loop, one for the initial vector
and one for the loop body. */
/* We have two loads per loop, one for the initial vector and one for
the loop body. */
/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */
/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
/* { dg-final { scan-assembler-times {\tld4d\t} 3 } } */
/* { dg-final { scan-assembler-not {\tld4b\t} } } */
/* { dg-final { scan-assembler-not {\tld4h\t} } } */
/* { dg-final { scan-assembler-not {\tld4w\t} } } */
/* { dg-final { scan-assembler-not {\tld1d\t} } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b} 8 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h} 8 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s} 8 } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 8 } } */
/* { dg-final { scan-assembler-times {\tfaddv\th[0-9]+, p[0-7], z[0-9]+\.h} 4 } } */
/* { dg-final { scan-assembler-times {\tfaddv\ts[0-9]+, p[0-7], z[0-9]+\.s} 4 } } */
/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 4 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b} 4 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h} 6 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
/* { dg-final { scan-assembler-not {\tuqdec} } } */

View File

@ -0,0 +1,43 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -ffast-math -fno-vect-cost-model" } */
#include <stdint.h>
#define VEC_PERM(TYPE) \
void __attribute__ ((noinline, noclone)) \
vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \
{ \
TYPE x0 = b[0]; \
TYPE x1 = b[1]; \
TYPE x2 = b[2]; \
TYPE x3 = b[3]; \
for (int i = 0; i < n; ++i) \
{ \
x0 += a[i * 4]; \
x1 += a[i * 4 + 1]; \
x2 += a[i * 4 + 2]; \
x3 += a[i * 4 + 3]; \
} \
b[0] = x0; \
b[1] = x1; \
b[2] = x2; \
b[3] = x3; \
}
#define TEST_ALL(T) \
T (int64_t) \
T (uint64_t) \
T (double)
TEST_ALL (VEC_PERM)
/* We can't use SLP for the 64-bit loops, since the number of reduction
results might be greater than the number of elements in the vector. */
/* { dg-final { scan-assembler-times {\tld4d\t} 3 } } */
/* { dg-final { scan-assembler-not {\tld1d\t} } } */
/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 8 } } */
/* { dg-final { scan-assembler-times {\tfaddv\td[0-9]+, p[0-7], z[0-9]+\.d} 4 } } */
/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
/* { dg-final { scan-assembler-not {\tuqdec} } } */

View File

@ -0,0 +1,5 @@
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
#define FILENAME "slp_7_costly.c"
#include "slp_7_run.c"

View File

@ -1,7 +1,11 @@
/* { dg-do run { target aarch64_sve_hw } } */
/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
#include "slp_7.c"
#ifndef FILENAME
#define FILENAME "slp_7.c"
#endif
#include FILENAME
#define N (54 * 4)

View File

@ -1,5 +1,5 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -fno-vect-cost-model" } */
#include <stdint.h>

View File

@ -1,5 +1,5 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable" } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=scalable -fno-vect-cost-model" } */
#include <stdint.h>