Support for aliasing with variable strides

This patch adds runtime alias checks for loops with variable strides,
so that we can vectorise them even without a restrict qualifier.
There are several parts to doing this:

1) For accesses like:

     x[i * n] += 1;

   we need to check whether n (and thus the DR_STEP) is nonzero.
   vect_analyze_data_ref_dependence records values that need to be
   checked in this way, then prune_runtime_alias_test_list records a
   bounds check on DR_STEP being outside the range [0, 0].
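   As an illustrative sketch (names here are made up, not the patch's
   internal ones): when n is zero, every iteration of the loop touches
   the same element, so the iterations are not independent and the
   vector code is only entered when the step is provably nonzero.

   ```c
   #include <assert.h>

   /* Hypothetical model of the problem case.  With n == 0, every
      iteration reads and writes x[0], so a straight vectorization
      would be wrong; the runtime check versions on DR_STEP lying
      outside [0, 0], i.e. on n != 0.  */
   static void
   bump (int *x, int n, int count)
   {
     for (int i = 0; i < count; ++i)
       x[i * n] += 1;               /* n == 0: all iterations alias x[0] */
   }

   /* The condition the versioned loop would test, in spirit.  */
   static int
   step_is_nonzero (long step)
   {
     return step != 0;
   }
   ```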

2) For accesses like:

     x[i * n] = x[i * n + 1] + 1;

   we simply need to test whether abs (n) >= 2.
   prune_runtime_alias_test_list looks for cases like this and tries
   to guess whether it is better to use this kind of check or a check
   for non-overlapping ranges.  (We could do an OR of the two conditions
   at runtime, but that isn't implemented yet.)
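   A sketch of that bound (illustrative helper, not patch code): with
   abs (n) >= 2, the element read in iteration i, at offset i*n + 1,
   can never equal a written offset j*n, since (j - i) * n = 1 has no
   integer solution for such n.

   ```c
   #include <assert.h>
   #include <stdlib.h>

   /* Illustrative form of the step-bound test for
      x[i * n] = x[i * n + 1] + 1: safe when abs (n) >= 2.  */
   static int
   step_bound_ok (long n)
   {
     return labs (n) >= 2;
   }
   ```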

3) Checks for overlapping ranges need to cope with variable strides.
   At present the "length" of each segment in a range check is
   represented as an offset from the base that lies outside the
   touched range, in the same direction as DR_STEP.  The length
   can therefore be negative and is sometimes conservative.

   With variable steps it's easier to reason about if we split

   this into two:

     seg_len:
       distance travelled from the first iteration of interest
       to the last, e.g. DR_STEP * (VF - 1)

     access_size:
       the number of bytes accessed in each iteration

   with access_size always being a positive constant and seg_len
   possibly being variable.  We can then combine alias checks
   for two accesses that are a constant number of bytes apart by
   adjusting the access size to account for the gap.  This leaves
   the segment length unchanged, which allows the check to be combined
   with further accesses.
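   The split and the gap-merging rule can be sketched as a toy
   structure (the struct and helper below are illustrative, not the
   patch's actual dr_with_seg_len class):

   ```c
   #include <assert.h>

   /* Miniature of the seg_len/access_size split described above.  */
   struct seg
   {
     long seg_len;       /* distance from first iteration of interest
                            to last, e.g. DR_STEP * (VF - 1); may be
                            variable at compile time */
     long access_size;   /* bytes accessed per iteration; a positive
                            compile-time constant */
   };

   /* Combine the check for a second access GAP bytes beyond the
      first: only access_size grows; seg_len is untouched, so the
      result can be combined again with further accesses.  */
   static struct seg
   merge_constant_gap (struct seg s, long gap)
   {
     s.access_size += gap;
     return s;
   }
   ```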

   When seg_len is positive, the runtime alias check has the form:

        base_a >= base_b + seg_len_b + access_size_b
     || base_b >= base_a + seg_len_a + access_size_a

   In many accesses the base will be aligned to the access size, which
   allows us to skip the addition:

        base_a > base_b + seg_len_b
     || base_b > base_a + seg_len_a

   A similar saving is possible with "negative" lengths.

   The patch therefore tracks the alignment in addition to seg_len
   and access_size.
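   Written out as plain integer arithmetic (the real check is built on
   tree expressions, so this is only a model of the condition):

   ```c
   #include <assert.h>

   /* The positive-seg_len range check from above.  Each access is
      modelled as touching [base, base + seg_len + access_size).
      When both bases are aligned to the access size, the patch can
      instead use the cheaper form without the access_size terms,
      with ">" in place of ">=".  */
   static int
   no_alias_p (long base_a, long seg_len_a, long access_size_a,
               long base_b, long seg_len_b, long access_size_b)
   {
     return base_a >= base_b + seg_len_b + access_size_b
            || base_b >= base_a + seg_len_a + access_size_a;
   }
   ```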

2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vec_lower_bound): New structure.
	(_loop_vec_info): Add check_nonzero and lower_bounds.
	(LOOP_VINFO_CHECK_NONZERO): New macro.
	(LOOP_VINFO_LOWER_BOUNDS): Likewise.
	(LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Check lower_bounds too.
	* tree-data-ref.h (dr_with_seg_len): Add access_size and align
	fields.  Make seg_len the distance travelled, not including the
	access size.
	(dr_direction_indicator): Declare.
	(dr_zero_step_indicator): Likewise.
	(dr_known_forward_stride_p): Likewise.
	* tree-data-ref.c: Include stringpool.h, tree-vrp.h and
	tree-ssanames.h.
	(runtime_alias_check_p): Allow runtime alias checks with
	variable strides.
	(operator ==): Compare access_size and align.
	(prune_runtime_alias_test_list): Rework for new distinction between
	the access_size and seg_len.
	(create_intersect_range_checks_index): Likewise.  Cope with polynomial
	segment lengths.
	(get_segment_min_max): New function.
	(create_intersect_range_checks): Use it.
	(dr_step_indicator): New function.
	(dr_direction_indicator): Likewise.
	(dr_zero_step_indicator): Likewise.
	(dr_known_forward_stride_p): Likewise.
	* tree-loop-distribution.c (data_ref_segment_size): Return
	DR_STEP * (niters - 1).
	(compute_alias_check_pairs): Update call to the dr_with_seg_len
	constructor.
	* tree-vect-data-refs.c (vect_check_nonzero_value): New function.
	(vect_preserves_scalar_order_p): New function, split out from...
	(vect_analyze_data_ref_dependence): ...here.  Check for zero steps.
	(vect_vfa_segment_size): Return DR_STEP * (length_factor - 1).
	(vect_vfa_access_size): New function.
	(vect_vfa_align): Likewise.
	(vect_compile_time_alias): Take access_size_a and access_size_b
	arguments.
	(dump_lower_bound): New function.
	(vect_check_lower_bound): Likewise.
	(vect_small_gap_p): Likewise.
	(vectorizable_with_step_bound_p): Likewise.
	(vect_prune_runtime_alias_test_list): Ignore cross-iteration
	dependencies if the vectorization factor is 1.  Convert the checks
	for nonzero steps into checks on the bounds of DR_STEP.  Try using
	a bounds check for variable steps if the minimum required step is
	relatively small.  Update calls to the dr_with_seg_len
	constructor and to vect_compile_time_alias.
	* tree-vect-loop-manip.c (vect_create_cond_for_lower_bounds): New
	function.
	(vect_loop_versioning): Call it.
	* tree-vect-loop.c (vect_analyze_loop_2): Clear LOOP_VINFO_LOWER_BOUNDS
	when retrying.
	(vect_estimate_min_profitable_iters): Account for any bounds checks.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-cond-1.c: Expect loop vectorization rather
	than SLP vectorization.
	* gcc.dg/vect/vect-alias-check-10.c: New test.
	* gcc.dg/vect/vect-alias-check-11.c: Likewise.
	* gcc.dg/vect/vect-alias-check-12.c: Likewise.
	* gcc.dg/vect/vect-alias-check-8.c: Likewise.
	* gcc.dg/vect/vect-alias-check-9.c: Likewise.
	* gcc.target/aarch64/sve/strided_load_8.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.h: Likewise.
	* gcc.target/aarch64/sve/var_stride_1_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_2_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_4.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_4_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_6.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_6_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_7.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_7_run.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_8.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_8_run.c: Likewise.
	* gfortran.dg/vect/vect-alias-check-1.F90: Likewise.

Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>

From-SVN: r256644
@@ -45,10 +45,6 @@ int main ()
return 0;
}
/* Basic blocks of if-converted loops are vectorized from within the loop
vectorizer pass. In this case it is really a deficiency in loop
vectorization data dependence analysis that causes us to require
basic block vectorization in the first place. */
/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "vect" { target vect_element_align } } } */
/* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { target vect_element_align } } } */
/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target vect_element_align } } } */

@@ -0,0 +1,69 @@
/* { dg-do run } */
#define N 87
#define M 6
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
M (sc) M (uc) \
M (ss) M (us) \
M (si) M (ui) \
M (sll) M (ull) \
M (float) M (double)
#define TEST_VALUE(I) ((I) * 5 / 2)
#define ADD_TEST(TYPE) \
void __attribute__((noinline, noclone)) \
test_##TYPE (TYPE *a, int step) \
{ \
for (int i = 0; i < N; ++i) \
{ \
a[i * step + 0] = a[i * step + 0] + 1; \
a[i * step + 1] = a[i * step + 1] + 2; \
a[i * step + 2] = a[i * step + 2] + 4; \
a[i * step + 3] = a[i * step + 3] + 8; \
} \
} \
void __attribute__((noinline, noclone)) \
ref_##TYPE (TYPE *a, int step) \
{ \
for (int i = 0; i < N; ++i) \
{ \
a[i * step + 0] = a[i * step + 0] + 1; \
a[i * step + 1] = a[i * step + 1] + 2; \
a[i * step + 2] = a[i * step + 2] + 4; \
a[i * step + 3] = a[i * step + 3] + 8; \
asm volatile (""); \
} \
}
#define DO_TEST(TYPE) \
for (int j = -M; j <= M; ++j) \
{ \
TYPE a[N * M], b[N * M]; \
for (int i = 0; i < N * M; ++i) \
a[i] = b[i] = TEST_VALUE (i); \
int offset = (j < 0 ? N * M - 4 : 0); \
test_##TYPE (a + offset, j); \
ref_##TYPE (b + offset, j); \
if (__builtin_memcmp (a, b, sizeof (a)) != 0) \
__builtin_abort (); \
}
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
FOR_EACH_TYPE (DO_TEST)
return 0;
}

@@ -0,0 +1,99 @@
/* { dg-do run } */
#define N 87
#define M 6
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
M (sc) M (uc) \
M (ss) M (us) \
M (si) M (ui) \
M (sll) M (ull) \
M (float) M (double)
#define TEST_VALUE1(I) ((I) * 5 / 2)
#define TEST_VALUE2(I) ((I) * 11 / 5)
#define ADD_TEST(TYPE) \
void __attribute__((noinline, noclone)) \
test_##TYPE (TYPE *restrict a, TYPE *restrict b, \
int step) \
{ \
for (int i = 0; i < N; ++i) \
{ \
TYPE r1 = a[i * step + 0] += 1; \
a[i * step + 1] += 2; \
a[i * step + 2] += 4; \
a[i * step + 3] += 8; \
b[i] += r1; \
} \
} \
\
void __attribute__((noinline, noclone)) \
ref_##TYPE (TYPE *restrict a, TYPE *restrict b, \
int step) \
{ \
for (int i = 0; i < N; ++i) \
{ \
TYPE r1 = a[i * step + 0] += 1; \
a[i * step + 1] += 2; \
a[i * step + 2] += 4; \
a[i * step + 3] += 8; \
b[i] += r1; \
asm volatile (""); \
} \
}
#define DO_TEST(TYPE) \
for (int j = -M; j <= M; ++j) \
{ \
TYPE a1[N * M], a2[N * M], b1[N], b2[N]; \
for (int i = 0; i < N * M; ++i) \
a1[i] = a2[i] = TEST_VALUE1 (i); \
for (int i = 0; i < N; ++i) \
b1[i] = b2[i] = TEST_VALUE2 (i); \
int offset = (j < 0 ? N * M - 4 : 0); \
test_##TYPE (a1 + offset, b1, j); \
ref_##TYPE (a2 + offset, b2, j); \
if (__builtin_memcmp (a1, a2, sizeof (a1)) != 0) \
__builtin_abort (); \
if (__builtin_memcmp (b1, b2, sizeof (b1)) != 0) \
__builtin_abort (); \
}
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
FOR_EACH_TYPE (DO_TEST)
return 0;
}
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-2, 2\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-3, 3\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* is outside \(-4, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]*\) >= 4} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-4, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-6, 6\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 2[)]* is outside \(-8, 8\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 2[)]* >= 8} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-8, 8\)} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-12, 12\)} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 4[)]* is outside \(-16, 16\)} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 4[)]* >= 16} "vect" { target { vect_int || vect_float } } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-16, 16\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-24, 24\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-32, 32\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */

@@ -0,0 +1,99 @@
/* { dg-do run } */
#define N 87
#define M 7
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
M (sc) M (uc) \
M (ss) M (us) \
M (si) M (ui) \
M (sll) M (ull) \
M (float) M (double)
#define TEST_VALUE1(I) ((I) * 5 / 2)
#define TEST_VALUE2(I) ((I) * 11 / 5)
#define ADD_TEST(TYPE) \
void __attribute__((noinline, noclone)) \
test_##TYPE (TYPE *restrict a, TYPE *restrict b, \
int step) \
{ \
step = step & M; \
for (int i = 0; i < N; ++i) \
{ \
TYPE r1 = a[i * step + 0] += 1; \
a[i * step + 1] += 2; \
a[i * step + 2] += 4; \
a[i * step + 3] += 8; \
b[i] += r1; \
} \
} \
\
void __attribute__((noinline, noclone)) \
ref_##TYPE (TYPE *restrict a, TYPE *restrict b, \
int step) \
{ \
for (unsigned short i = 0; i < N; ++i) \
{ \
TYPE r1 = a[i * step + 0] += 1; \
a[i * step + 1] += 2; \
a[i * step + 2] += 4; \
a[i * step + 3] += 8; \
b[i] += r1; \
asm volatile (""); \
} \
}
#define DO_TEST(TYPE) \
for (int j = 0; j <= M; ++j) \
{ \
TYPE a1[N * M], a2[N * M], b1[N], b2[N]; \
for (int i = 0; i < N * M; ++i) \
a1[i] = a2[i] = TEST_VALUE1 (i); \
for (int i = 0; i < N; ++i) \
b1[i] = b2[i] = TEST_VALUE2 (i); \
test_##TYPE (a1, b1, j); \
ref_##TYPE (a2, b2, j); \
if (__builtin_memcmp (a1, a2, sizeof (a1)) != 0) \
__builtin_abort (); \
if (__builtin_memcmp (b1, b2, sizeof (b1)) != 0) \
__builtin_abort (); \
}
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
FOR_EACH_TYPE (DO_TEST)
return 0;
}
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 2\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 3\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* is outside \[0, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]*\) >= 4} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 4\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 6\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 2[)]* is outside \[0, 8\)} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 2[)]* >= 8} "vect" { target vect_int } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 8\)} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 12\)} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 4[)]* is outside \[0, 16\)} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 4[)]* >= 16} "vect" { target { vect_int || vect_float } }} } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 16\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 24\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 32\)} "vect" { target vect_double } } } */
/* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */

@@ -0,0 +1,62 @@
/* { dg-do run } */
#define N 200
#define DIST 32
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
M (sc) M (uc) \
M (ss) M (us) \
M (si) M (ui) \
M (sll) M (ull) \
M (float) M (double)
#define TEST_VALUE(I) ((I) * 5 / 2)
#define ADD_TEST(TYPE) \
TYPE a_##TYPE[N * 2]; \
void __attribute__((noinline, noclone)) \
test_##TYPE (int x, int y) \
{ \
for (int i = 0; i < N; ++i) \
a_##TYPE[i + x] += a_##TYPE[i + y]; \
}
#define DO_TEST(TYPE) \
for (int i = 0; i < DIST * 2; ++i) \
{ \
for (int j = 0; j < N + DIST * 2; ++j) \
a_##TYPE[j] = TEST_VALUE (j); \
test_##TYPE (i, DIST); \
for (int j = 0; j < N + DIST * 2; ++j) \
{ \
TYPE expected; \
if (j < i || j >= i + N) \
expected = TEST_VALUE (j); \
else if (i <= DIST) \
expected = ((TYPE) TEST_VALUE (j) \
+ (TYPE) TEST_VALUE (j - i + DIST)); \
else \
expected = ((TYPE) TEST_VALUE (j) \
+ a_##TYPE[j - i + DIST]); \
if (expected != a_##TYPE[j]) \
__builtin_abort (); \
} \
}
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
FOR_EACH_TYPE (DO_TEST)
return 0;
}

@@ -0,0 +1,55 @@
/* { dg-do run } */
#define N 200
#define M 4
typedef signed char sc;
typedef unsigned char uc;
typedef signed short ss;
typedef unsigned short us;
typedef int si;
typedef unsigned int ui;
typedef signed long long sll;
typedef unsigned long long ull;
#define FOR_EACH_TYPE(M) \
M (sc) M (uc) \
M (ss) M (us) \
M (si) M (ui) \
M (sll) M (ull) \
M (float) M (double)
#define TEST_VALUE(I) ((I) * 5 / 2)
#define ADD_TEST(TYPE) \
void __attribute__((noinline, noclone)) \
test_##TYPE (TYPE *a, TYPE *b) \
{ \
for (int i = 0; i < N; i += 2) \
{ \
a[i + 0] = b[i + 0] + 2; \
a[i + 1] = b[i + 1] + 3; \
} \
}
#define DO_TEST(TYPE) \
for (int j = 1; j < M; ++j) \
{ \
TYPE a[N + M]; \
for (int i = 0; i < N + M; ++i) \
a[i] = TEST_VALUE (i); \
test_##TYPE (a + j, a); \
for (int i = 0; i < N; i += 2) \
if (a[i + j] != (TYPE) (a[i] + 2) \
|| a[i + j + 1] != (TYPE) (a[i + 1] + 3)) \
__builtin_abort (); \
}
FOR_EACH_TYPE (ADD_TEST)
int
main (void)
{
FOR_EACH_TYPE (DO_TEST)
return 0;
}

@@ -0,0 +1,15 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
void
foo (double *x, int m)
{
for (int i = 0; i < 256; ++i)
x[i * m] += x[i * m];
}
/* { dg-final { scan-assembler-times {\tcbz\tw1,} 1 } } */
/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, } 1 } } */
/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, } 1 } } */
/* { dg-final { scan-assembler-times {\tldr\t} 1 } } */
/* { dg-final { scan-assembler-times {\tstr\t} 1 } } */

@@ -0,0 +1,27 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, unsigned short n, long m __attribute__((unused)))
{
for (int i = 0; i < SIZE; ++i)
x[i * n] += y[i * n];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (VF-1)*4 rather than (257-1)*4. */
/* { dg-final { scan-assembler-not {, 1024} } } */
/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */

@@ -0,0 +1,61 @@
extern void abort (void) __attribute__ ((noreturn));
#define MARGIN 6
void __attribute__ ((weak, optimize ("no-tree-vectorize")))
test (int n, int m, int offset)
{
int abs_n = (n < 0 ? -n : n);
int abs_m = (m < 0 ? -m : m);
int max_i = (abs_n > abs_m ? abs_n : abs_m);
int abs_offset = (offset < 0 ? -offset : offset);
int size = MARGIN * 2 + max_i * SIZE + abs_offset;
TYPE *array = (TYPE *) __builtin_alloca (size * sizeof (TYPE));
for (int i = 0; i < size; ++i)
array[i] = i;
int base_x = offset < 0 ? MARGIN - offset : MARGIN;
int base_y = offset < 0 ? MARGIN : MARGIN + offset;
int start_x = n < 0 ? base_x - n * (SIZE - 1) : base_x;
int start_y = m < 0 ? base_y - m * (SIZE - 1) : base_y;
f (&array[start_x], &array[start_y], n, m);
int j = 0;
int start = (n < 0 ? size - 1 : 0);
int end = (n < 0 ? -1 : size);
int inc = (n < 0 ? -1 : 1);
for (int i = start; i != end; i += inc)
{
if (j == SIZE || i != start_x + j * n)
{
if (array[i] != i)
abort ();
}
else if (n == 0)
{
TYPE sum = i;
for (; j < SIZE; j++)
{
int next_y = start_y + j * m;
if (n >= 0 ? next_y < i : next_y > i)
sum += array[next_y];
else if (next_y == i)
sum += sum;
else
sum += next_y;
}
if (array[i] != sum)
abort ();
}
else
{
int next_y = start_y + j * m;
TYPE base = i;
if (n >= 0 ? next_y < i : next_y > i)
base += array[next_y];
else
base += next_y;
if (array[i] != base)
abort ();
j += 1;
}
}
}

@@ -0,0 +1,14 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_1.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = 0; n < 10; ++n)
for (int offset = -33; offset <= 33; ++offset)
test (n, n, offset);
return 0;
}

@@ -0,0 +1,25 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, unsigned short n, unsigned short m)
{
for (int i = 0; i < SIZE; ++i)
x[i * n] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (257-1)*4 rather than (VF-1)*4. */
/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\n} 2 } } */
/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
/* Two range checks and a check for n being zero. (m being zero is OK.) */
/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */

@@ -0,0 +1,18 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_2.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = 0; n < 10; ++n)
for (int m = 0; m < 10; ++m)
for (int offset = -17; offset <= 17; ++offset)
{
test (n, m, offset);
test (n, m, offset + n * (SIZE - 1));
}
return 0;
}

@@ -0,0 +1,27 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, int n, long m __attribute__((unused)))
{
for (int i = 0; i < SIZE; ++i)
x[i * n] += y[i * n];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (VF-1)*4 rather than (257-1)*4. */
/* { dg-final { scan-assembler-not {, 1024} } } */
/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */

@@ -0,0 +1,14 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_3.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = -10; n < 10; ++n)
for (int offset = -33; offset <= 33; ++offset)
test (n, n, offset);
return 0;
}

@@ -0,0 +1,25 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE int
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, int n, int m)
{
for (int i = 0; i < SIZE; ++i)
x[i * n] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
/* Should multiply by (257-1)*4 rather than (VF-1)*4. */
/* { dg-final { scan-assembler-times {\tlsl\tx[0-9]+, x[0-9]+, 10\n} 2 } } */
/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
/* { dg-final { scan-assembler {\tcmp\tw3, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 4 } } */
/* Two range checks and a check for n being zero. (m being zero is OK.) */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */

@@ -0,0 +1,18 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_4.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = -10; n < 10; ++n)
for (int m = -10; m < 10; ++m)
for (int offset = -17; offset <= 17; ++offset)
{
test (n, m, offset);
test (n, m, offset + n * (SIZE - 1));
}
return 0;
}

@@ -0,0 +1,27 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE double
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, long n, long m __attribute__((unused)))
{
for (int i = 0; i < SIZE; ++i)
x[i * n] += y[i * n];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
/* Should multiply by (VF-1)*8 rather than (257-1)*8. */
/* { dg-final { scan-assembler-not {, 2048} } } */
/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]11} } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */

@@ -0,0 +1,14 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_5.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = -10; n < 10; ++n)
for (int offset = -33; offset <= 33; ++offset)
test (n, n, offset);
return 0;
}

@@ -0,0 +1,25 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE long
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, long n, long m)
{
for (int i = 0; i < SIZE; ++i)
x[i * n] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tx[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tx[0-9]+} } } */
/* Should multiply by (257-1)*8 rather than (VF-1)*8. */
/* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 2 } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 4 } } */
/* Two range checks and a check for n being zero. (m being zero is OK.) */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */


@@ -0,0 +1,18 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_6.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = -10; n < 10; ++n)
for (int m = -10; m < 10; ++m)
for (int offset = -17; offset <= 17; ++offset)
{
test (n, m, offset);
test (n, m, offset + n * (SIZE - 1));
}
return 0;
}


@@ -0,0 +1,26 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE double
#define SIZE 257
void __attribute__ ((weak))
f (TYPE *x, TYPE *y, long n, long m __attribute__((unused)))
{
for (int i = 0; i < SIZE; ++i)
x[i * n] += y[i];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
/* Should multiply by (257-1)*8 rather than (VF-1)*8. */
/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x1, 2048} 1 } } */
/* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 1 } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks and a check for n being zero. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */


@@ -0,0 +1,14 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_7.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = -10; n < 10; ++n)
for (int offset = -33; offset <= 33; ++offset)
test (n, 1, offset);
return 0;
}


@@ -0,0 +1,26 @@
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize" } */
#define TYPE long
#define SIZE 257
void
f (TYPE *x, TYPE *y, long n __attribute__((unused)), long m)
{
for (int i = 0; i < SIZE; ++i)
x[i] += y[i * m];
}
/* { dg-final { scan-assembler {\tld1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
/* { dg-final { scan-assembler {\tldr\tx[0-9]+} } } */
/* { dg-final { scan-assembler {\tstr\tx[0-9]+} } } */
/* Should multiply by (257-1)*8 rather than (VF-1)*8. */
/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x0, 2048} 1 } } */
/* { dg-final { scan-assembler-times {lsl\tx[0-9]+, x[0-9]+, 11} 1 } } */
/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
/* Two range checks only; doesn't matter whether m is zero, since y is only read. */
/* { dg-final { scan-assembler {\tcmp\t} } } */
/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */


@@ -0,0 +1,14 @@
/* { dg-do run { target { aarch64_sve_hw } } } */
/* { dg-options "-O2 -ftree-vectorize" } */
#include "var_stride_8.c"
#include "var_stride_1.h"
int
main (void)
{
for (int n = -10; n < 10; ++n)
for (int offset = -33; offset <= 33; ++offset)
test (1, n, offset);
return 0;
}


@@ -0,0 +1,102 @@
! { dg-do run }
! { dg-additional-options "-fno-inline" }
#define N 200
#define TEST_VALUE(I) ((I) * 5 / 2)
subroutine setup(a)
real :: a(N)
do i = 1, N
a(i) = TEST_VALUE(i)
end do
end subroutine
subroutine check(a, x, gap)
real :: a(N), temp, x
integer :: gap
do i = 1, N - gap
temp = a(i + gap) + x
if (a(i) /= temp) call abort
end do
do i = N - gap + 1, N
temp = TEST_VALUE(i)
if (a(i) /= temp) call abort
end do
end subroutine
subroutine testa(a, x, base, n)
real :: a(n), x
integer :: base, n
do i = n, 2, -1
a(base + i - 1) = a(base + i) + x
end do
end subroutine testa
subroutine testb(a, x, base, n)
real :: a(n), x
integer :: base
do i = n, 4, -1
a(base + i - 3) = a(base + i) + x
end do
end subroutine testb
subroutine testc(a, x, base, n)
real :: a(n), x
integer :: base
do i = n, 8, -1
a(base + i - 7) = a(base + i) + x
end do
end subroutine testc
subroutine testd(a, x, base, n)
real :: a(n), x
integer :: base
do i = n, 16, -1
a(base + i - 15) = a(base + i) + x
end do
end subroutine testd
subroutine teste(a, x, base, n)
real :: a(n), x
integer :: base
do i = n, 32, -1
a(base + i - 31) = a(base + i) + x
end do
end subroutine teste
subroutine testf(a, x, base, n)
real :: a(n), x
integer :: base
do i = n, 64, -1
a(base + i - 63) = a(base + i) + x
end do
end subroutine testf
program main
real :: a(N)
call setup(a)
call testa(a, 91.0, 0, N)
call check(a, 91.0, 1)
call setup(a)
call testb(a, 55.0, 0, N)
call check(a, 55.0, 3)
call setup(a)
call testc(a, 72.0, 0, N)
call check(a, 72.0, 7)
call setup(a)
call testd(a, 69.0, 0, N)
call check(a, 69.0, 15)
call setup(a)
call teste(a, 44.0, 0, N)
call check(a, 44.0, 31)
call setup(a)
call testf(a, 39.0, 0, N)
call check(a, 39.0, 63)
end program


@@ -95,6 +95,9 @@ along with GCC; see the file COPYING3. If not see
#include "tree-affine.h"
#include "params.h"
#include "builtins.h"
#include "stringpool.h"
#include "tree-vrp.h"
#include "tree-ssanames.h"
static struct datadep_stats
{
@@ -1305,18 +1308,6 @@ runtime_alias_check_p (ddr_p ddr, struct loop *loop, bool speed_p)
return false;
}
/* FORNOW: We don't support creating runtime alias tests for non-constant
step. */
if (TREE_CODE (DR_STEP (DDR_A (ddr))) != INTEGER_CST
|| TREE_CODE (DR_STEP (DDR_B (ddr))) != INTEGER_CST)
{
if (dump_enabled_p ())
dump_printf (MSG_MISSED_OPTIMIZATION,
"runtime alias check not supported for non-constant "
"step\n");
return false;
}
return true;
}
@@ -1331,11 +1322,13 @@ static bool
operator == (const dr_with_seg_len& d1,
const dr_with_seg_len& d2)
{
return operand_equal_p (DR_BASE_ADDRESS (d1.dr),
DR_BASE_ADDRESS (d2.dr), 0)
&& data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
&& data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
&& data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0;
return (operand_equal_p (DR_BASE_ADDRESS (d1.dr),
DR_BASE_ADDRESS (d2.dr), 0)
&& data_ref_compare_tree (DR_OFFSET (d1.dr), DR_OFFSET (d2.dr)) == 0
&& data_ref_compare_tree (DR_INIT (d1.dr), DR_INIT (d2.dr)) == 0
&& data_ref_compare_tree (d1.seg_len, d2.seg_len) == 0
&& known_eq (d1.access_size, d2.access_size)
&& d1.align == d2.align);
}
/* Comparison function for sorting objects of dr_with_seg_len_pair_t
@@ -1415,7 +1408,7 @@ comp_dr_with_seg_len_pair (const void *pa_, const void *pb_)
void
prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
poly_uint64 factor)
poly_uint64)
{
/* Sort the collected data ref pairs so that we can scan them once to
combine all possible aliasing checks. */
@@ -1461,6 +1454,8 @@ prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
}
poly_int64 init_a1, init_a2;
/* Only consider cases in which the distance between the initial
DR_A1 and the initial DR_A2 is known at compile time. */
if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
DR_BASE_ADDRESS (dr_a2->dr), 0)
|| !operand_equal_p (DR_OFFSET (dr_a1->dr),
@@ -1480,141 +1475,79 @@ prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
std::swap (init_a1, init_a2);
}
/* Only merge const step data references. */
poly_int64 step_a1, step_a2;
if (!poly_int_tree_p (DR_STEP (dr_a1->dr), &step_a1)
|| !poly_int_tree_p (DR_STEP (dr_a2->dr), &step_a2))
continue;
/* Work out what the segment length would be if we did combine
DR_A1 and DR_A2:
bool neg_step = maybe_lt (step_a1, 0) || maybe_lt (step_a2, 0);
- If DR_A1 and DR_A2 have equal lengths, that length is
also the combined length.
/* DR_A1 and DR_A2 must go in the same direction. */
if (neg_step && (maybe_gt (step_a1, 0) || maybe_gt (step_a2, 0)))
continue;
- If DR_A1 and DR_A2 both have negative "lengths", the combined
length is the lower bound on those lengths.
poly_uint64 seg_len_a1 = 0, seg_len_a2 = 0;
bool const_seg_len_a1 = poly_int_tree_p (dr_a1->seg_len,
&seg_len_a1);
bool const_seg_len_a2 = poly_int_tree_p (dr_a2->seg_len,
&seg_len_a2);
- If DR_A1 and DR_A2 both have positive lengths, the combined
length is the upper bound on those lengths.
/* We need to compute merged segment length at compilation time for
dr_a1 and dr_a2, which is impossible if either one has non-const
segment length. */
if ((!const_seg_len_a1 || !const_seg_len_a2)
&& maybe_ne (step_a1, step_a2))
continue;
Other cases are unlikely to give a useful combination.
bool do_remove = false;
poly_uint64 diff = init_a2 - init_a1;
poly_uint64 min_seg_len_b;
tree new_seg_len;
if (!poly_int_tree_p (dr_b1->seg_len, &min_seg_len_b))
The lengths both have sizetype, so the sign is taken from
the step instead. */
if (!operand_equal_p (dr_a1->seg_len, dr_a2->seg_len, 0))
{
tree step_b = DR_STEP (dr_b1->dr);
if (!tree_fits_shwi_p (step_b))
poly_uint64 seg_len_a1, seg_len_a2;
if (!poly_int_tree_p (dr_a1->seg_len, &seg_len_a1)
|| !poly_int_tree_p (dr_a2->seg_len, &seg_len_a2))
continue;
min_seg_len_b = factor * abs_hwi (tree_to_shwi (step_b));
tree indicator_a = dr_direction_indicator (dr_a1->dr);
if (TREE_CODE (indicator_a) != INTEGER_CST)
continue;
tree indicator_b = dr_direction_indicator (dr_a2->dr);
if (TREE_CODE (indicator_b) != INTEGER_CST)
continue;
int sign_a = tree_int_cst_sgn (indicator_a);
int sign_b = tree_int_cst_sgn (indicator_b);
poly_uint64 new_seg_len;
if (sign_a <= 0 && sign_b <= 0)
new_seg_len = lower_bound (seg_len_a1, seg_len_a2);
else if (sign_a >= 0 && sign_b >= 0)
new_seg_len = upper_bound (seg_len_a1, seg_len_a2);
else
continue;
dr_a1->seg_len = build_int_cst (TREE_TYPE (dr_a1->seg_len),
new_seg_len);
dr_a1->align = MIN (dr_a1->align, known_alignment (new_seg_len));
}
/* Now we try to merge alias check dr_a1 & dr_b and dr_a2 & dr_b.
/* This is always positive due to the swap above. */
poly_uint64 diff = init_a2 - init_a1;
Case A:
check if the following condition is satisfied:
DIFF - SEGMENT_LENGTH_A < SEGMENT_LENGTH_B
where DIFF = DR_A2_INIT - DR_A1_INIT. However,
SEGMENT_LENGTH_A or SEGMENT_LENGTH_B may not be constant so we
have to make a best estimation. We can get the minimum value
of SEGMENT_LENGTH_B as a constant, represented by MIN_SEG_LEN_B,
then either of the following two conditions can guarantee the
one above:
1: DIFF <= MIN_SEG_LEN_B
2: DIFF - SEGMENT_LENGTH_A < MIN_SEG_LEN_B
Because DIFF - SEGMENT_LENGTH_A is done in sizetype, we need
to take care of wrapping behavior in it.
Case B:
If the left segment does not extend beyond the start of the
right segment the new segment length is that of the right
plus the segment distance. The condition is like:
DIFF >= SEGMENT_LENGTH_A ;SEGMENT_LENGTH_A is a constant.
Note 1: Case A.2 and B combined together effectively merges every
dr_a1 & dr_b and dr_a2 & dr_b when SEGMENT_LENGTH_A is const.
Note 2: Above description is based on positive DR_STEP, we need to
take care of negative DR_STEP for wrapping behavior. See PR80815
for more information. */
if (neg_step)
/* The new check will start at DR_A1. Make sure that its access
size encompasses the initial DR_A2. */
if (maybe_lt (dr_a1->access_size, diff + dr_a2->access_size))
{
/* Adjust diff according to access size of both references. */
diff += tree_to_poly_uint64
(TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a2->dr))));
diff -= tree_to_poly_uint64
(TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a1->dr))));
/* Case A.1. */
if (known_le (diff, min_seg_len_b)
/* Case A.2 and B combined. */
|| const_seg_len_a2)
{
if (const_seg_len_a1 || const_seg_len_a2)
new_seg_len
= build_int_cstu (sizetype,
lower_bound (seg_len_a1 - diff,
seg_len_a2));
else
new_seg_len
= size_binop (MINUS_EXPR, dr_a2->seg_len,
build_int_cstu (sizetype, diff));
dr_a2->seg_len = new_seg_len;
do_remove = true;
}
dr_a1->access_size = upper_bound (dr_a1->access_size,
diff + dr_a2->access_size);
unsigned int new_align = known_alignment (dr_a1->access_size);
dr_a1->align = MIN (dr_a1->align, new_align);
}
else
if (dump_enabled_p ())
{
/* Case A.1. */
if (known_le (diff, min_seg_len_b)
/* Case A.2 and B combined. */
|| const_seg_len_a1)
{
if (const_seg_len_a1 && const_seg_len_a2)
new_seg_len
= build_int_cstu (sizetype,
upper_bound (seg_len_a2 + diff,
seg_len_a1));
else
new_seg_len
= size_binop (PLUS_EXPR, dr_a2->seg_len,
build_int_cstu (sizetype, diff));
dr_a1->seg_len = new_seg_len;
do_remove = true;
}
}
if (do_remove)
{
if (dump_enabled_p ())
{
dump_printf (MSG_NOTE, "merging ranges for ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr));
dump_printf (MSG_NOTE, ", ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1->dr));
dump_printf (MSG_NOTE, " and ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr));
dump_printf (MSG_NOTE, ", ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2->dr));
dump_printf (MSG_NOTE, "\n");
}
alias_pairs->ordered_remove (neg_step ? i - 1 : i);
i--;
dump_printf (MSG_NOTE, "merging ranges for ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1->dr));
dump_printf (MSG_NOTE, ", ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1->dr));
dump_printf (MSG_NOTE, " and ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2->dr));
dump_printf (MSG_NOTE, ", ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2->dr));
dump_printf (MSG_NOTE, "\n");
}
alias_pairs->ordered_remove (i);
i--;
}
}
}
@@ -1654,7 +1587,9 @@ create_intersect_range_checks_index (struct loop *loop, tree *cond_expr,
|| DR_NUM_DIMENSIONS (dr_a.dr) != DR_NUM_DIMENSIONS (dr_b.dr))
return false;
if (!tree_fits_uhwi_p (dr_a.seg_len) || !tree_fits_uhwi_p (dr_b.seg_len))
poly_uint64 seg_len1, seg_len2;
if (!poly_int_tree_p (dr_a.seg_len, &seg_len1)
|| !poly_int_tree_p (dr_b.seg_len, &seg_len2))
return false;
if (!tree_fits_shwi_p (DR_STEP (dr_a.dr)))
@@ -1669,19 +1604,42 @@ create_intersect_range_checks_index (struct loop *loop, tree *cond_expr,
gcc_assert (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST);
bool neg_step = tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0;
unsigned HOST_WIDE_INT abs_step
= absu_hwi (tree_to_shwi (DR_STEP (dr_a.dr)));
unsigned HOST_WIDE_INT abs_step = tree_to_shwi (DR_STEP (dr_a.dr));
if (neg_step)
{
abs_step = -abs_step;
seg_len1 = -seg_len1;
seg_len2 = -seg_len2;
}
else
{
/* Include the access size in the length, so that we only have one
tree addition below. */
seg_len1 += dr_a.access_size;
seg_len2 += dr_b.access_size;
}
unsigned HOST_WIDE_INT seg_len1 = tree_to_uhwi (dr_a.seg_len);
unsigned HOST_WIDE_INT seg_len2 = tree_to_uhwi (dr_b.seg_len);
/* Infer the number of iterations with which the memory segment is accessed
by DR. In other words, aliasing is checked if the memory segment accessed by
DR_A in some iterations intersects with the memory segment accessed by DR_B
in the same number of iterations.
Note that segment length is a linear function of the number of iterations,
with DR_STEP as the coefficient. */
unsigned HOST_WIDE_INT niter_len1 = (seg_len1 + abs_step - 1) / abs_step;
unsigned HOST_WIDE_INT niter_len2 = (seg_len2 + abs_step - 1) / abs_step;
poly_uint64 niter_len1, niter_len2;
if (!can_div_trunc_p (seg_len1 + abs_step - 1, abs_step, &niter_len1)
|| !can_div_trunc_p (seg_len2 + abs_step - 1, abs_step, &niter_len2))
return false;
poly_uint64 niter_access1 = 0, niter_access2 = 0;
if (neg_step)
{
/* Divide each access size by the byte step, rounding up. */
if (!can_div_trunc_p (dr_a.access_size + abs_step - 1,
abs_step, &niter_access1)
|| !can_div_trunc_p (dr_b.access_size + abs_step - 1,
abs_step, &niter_access2))
return false;
}
unsigned int i;
for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
@@ -1732,12 +1690,22 @@ create_intersect_range_checks_index (struct loop *loop, tree *cond_expr,
/* Adjust ranges for negative step. */
if (neg_step)
{
min1 = fold_build2 (MINUS_EXPR, TREE_TYPE (min1), max1, idx_step);
max1 = fold_build2 (MINUS_EXPR, TREE_TYPE (min1),
CHREC_LEFT (access1), idx_step);
min2 = fold_build2 (MINUS_EXPR, TREE_TYPE (min2), max2, idx_step);
max2 = fold_build2 (MINUS_EXPR, TREE_TYPE (min2),
CHREC_LEFT (access2), idx_step);
/* IDX_LEN1 and IDX_LEN2 are negative in this case. */
std::swap (min1, max1);
std::swap (min2, max2);
/* As with the lengths just calculated, we've measured the access
sizes in iterations, so multiply them by the index step. */
tree idx_access1
= fold_build2 (MULT_EXPR, TREE_TYPE (min1), idx_step,
build_int_cst (TREE_TYPE (min1), niter_access1));
tree idx_access2
= fold_build2 (MULT_EXPR, TREE_TYPE (min2), idx_step,
build_int_cst (TREE_TYPE (min2), niter_access2));
/* MINUS_EXPR because the above values are negative. */
max1 = fold_build2 (MINUS_EXPR, TREE_TYPE (max1), max1, idx_access1);
max2 = fold_build2 (MINUS_EXPR, TREE_TYPE (max2), max2, idx_access2);
}
tree part_cond_expr
= fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
@@ -1752,6 +1720,89 @@ create_intersect_range_checks_index (struct loop *loop, tree *cond_expr,
return true;
}
/* If ALIGN is nonzero, set up *SEG_MIN_OUT and *SEG_MAX_OUT so that for
every address ADDR accessed by D:
*SEG_MIN_OUT <= ADDR (== ADDR & -ALIGN) <= *SEG_MAX_OUT
In this case, every element accessed by D is aligned to at least
ALIGN bytes.
If ALIGN is zero then instead set *SEG_MAX_OUT so that:
*SEG_MIN_OUT <= ADDR < *SEG_MAX_OUT. */
static void
get_segment_min_max (const dr_with_seg_len &d, tree *seg_min_out,
tree *seg_max_out, HOST_WIDE_INT align)
{
/* Each access has the following pattern:
<- |seg_len| ->
<--- A: -ve step --->
+-----+-------+-----+-------+-----+
| n-1 | ,.... | 0 | ..... | n-1 |
+-----+-------+-----+-------+-----+
<--- B: +ve step --->
<- |seg_len| ->
|
base address
where "n" is the number of scalar iterations covered by the segment.
(This should be VF for a particular pair if we know that both steps
are the same, otherwise it will be the full number of scalar loop
iterations.)
A is the range of bytes accessed when the step is negative,
B is the range when the step is positive.
If the access size is "access_size" bytes, the lowest addressed byte is:
base + (step < 0 ? seg_len : 0) [LB]
and the highest addressed byte is always below:
base + (step < 0 ? 0 : seg_len) + access_size [UB]
Thus:
LB <= ADDR < UB
If ALIGN is nonzero, all three values are aligned to at least ALIGN
bytes, so:
LB <= ADDR <= UB - ALIGN
where "- ALIGN" folds naturally with the "+ access_size" and often
cancels it out.
We don't try to simplify LB and UB beyond this (e.g. by using
MIN and MAX based on whether seg_len rather than the stride is
negative) because it is possible for the absolute size of the
segment to overflow the range of a ssize_t.
Keeping the pointer_plus outside of the cond_expr should allow
the cond_exprs to be shared with other alias checks. */
tree indicator = dr_direction_indicator (d.dr);
tree neg_step = fold_build2 (LT_EXPR, boolean_type_node,
fold_convert (ssizetype, indicator),
ssize_int (0));
tree addr_base = fold_build_pointer_plus (DR_BASE_ADDRESS (d.dr),
DR_OFFSET (d.dr));
addr_base = fold_build_pointer_plus (addr_base, DR_INIT (d.dr));
tree seg_len = fold_convert (sizetype, d.seg_len);
tree min_reach = fold_build3 (COND_EXPR, sizetype, neg_step,
seg_len, size_zero_node);
tree max_reach = fold_build3 (COND_EXPR, sizetype, neg_step,
size_zero_node, seg_len);
max_reach = fold_build2 (PLUS_EXPR, sizetype, max_reach,
size_int (d.access_size - align));
*seg_min_out = fold_build_pointer_plus (addr_base, min_reach);
*seg_max_out = fold_build_pointer_plus (addr_base, max_reach);
}
/* Given two data references and segment lengths described by DR_A and DR_B,
create expression checking if the two addresses ranges intersect with
each other:
@@ -1768,43 +1819,48 @@ create_intersect_range_checks (struct loop *loop, tree *cond_expr,
if (create_intersect_range_checks_index (loop, cond_expr, dr_a, dr_b))
return;
tree segment_length_a = dr_a.seg_len;
tree segment_length_b = dr_b.seg_len;
tree addr_base_a = DR_BASE_ADDRESS (dr_a.dr);
tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr);
tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET (dr_b.dr);
offset_a = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_a),
offset_a, DR_INIT (dr_a.dr));
offset_b = fold_build2 (PLUS_EXPR, TREE_TYPE (offset_b),
offset_b, DR_INIT (dr_b.dr));
addr_base_a = fold_build_pointer_plus (addr_base_a, offset_a);
addr_base_b = fold_build_pointer_plus (addr_base_b, offset_b);
tree seg_a_min = addr_base_a;
tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
/* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of
[a, a+12) */
if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
unsigned HOST_WIDE_INT min_align;
tree_code cmp_code;
if (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST
&& TREE_CODE (DR_STEP (dr_b.dr)) == INTEGER_CST)
{
tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a.dr)));
seg_a_min = fold_build_pointer_plus (seg_a_max, unit_size);
seg_a_max = fold_build_pointer_plus (addr_base_a, unit_size);
/* In this case adding access_size to seg_len is likely to give
a simple X * step, where X is either the number of scalar
iterations or the vectorization factor. We're better off
keeping that, rather than subtracting an alignment from it.
In this case the maximum values are exclusive and so there is
no alias if the maximum of one segment equals the minimum
of another. */
min_align = 0;
cmp_code = LE_EXPR;
}
else
{
/* Calculate the minimum alignment shared by all four pointers,
then arrange for this alignment to be subtracted from the
exclusive maximum values to get inclusive maximum values.
This "- min_align" is cumulative with a "+ access_size"
in the calculation of the maximum values. In the best
(and common) case, the two cancel each other out, leaving
us with an inclusive bound based only on seg_len. In the
worst case we're simply adding a smaller number than before.
Because the maximum values are inclusive, there is an alias
if the maximum value of one segment is equal to the minimum
value of the other. */
min_align = MIN (dr_a.align, dr_b.align);
cmp_code = LT_EXPR;
}
tree seg_b_min = addr_base_b;
tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
{
tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b.dr)));
seg_b_min = fold_build_pointer_plus (seg_b_max, unit_size);
seg_b_max = fold_build_pointer_plus (addr_base_b, unit_size);
}
tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
get_segment_min_max (dr_a, &seg_a_min, &seg_a_max, min_align);
get_segment_min_max (dr_b, &seg_b_min, &seg_b_max, min_align);
*cond_expr
= fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
fold_build2 (LE_EXPR, boolean_type_node, seg_a_max, seg_b_min),
fold_build2 (LE_EXPR, boolean_type_node, seg_b_max, seg_a_min));
fold_build2 (cmp_code, boolean_type_node, seg_a_max, seg_b_min),
fold_build2 (cmp_code, boolean_type_node, seg_b_max, seg_a_min));
}
/* Create a conditional expression that represents the run-time checks for
@@ -5271,3 +5327,90 @@ free_data_refs (vec<data_reference_p> datarefs)
free_data_ref (dr);
datarefs.release ();
}
/* Common routine implementing both dr_direction_indicator and
dr_zero_step_indicator. Return USEFUL_MIN if the indicator is known
to be >= USEFUL_MIN and -1 if the indicator is known to be negative.
Return the step as the indicator otherwise. */
static tree
dr_step_indicator (struct data_reference *dr, int useful_min)
{
tree step = DR_STEP (dr);
STRIP_NOPS (step);
/* Look for cases where the step is scaled by a positive constant
integer, which will often be the access size. If the multiplication
doesn't change the sign (due to overflow effects) then we can
test the unscaled value instead. */
if (TREE_CODE (step) == MULT_EXPR
&& TREE_CODE (TREE_OPERAND (step, 1)) == INTEGER_CST
&& tree_int_cst_sgn (TREE_OPERAND (step, 1)) > 0)
{
tree factor = TREE_OPERAND (step, 1);
step = TREE_OPERAND (step, 0);
/* Strip widening and truncating conversions as well as nops. */
if (CONVERT_EXPR_P (step)
&& INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (step, 0))))
step = TREE_OPERAND (step, 0);
tree type = TREE_TYPE (step);
/* Get the range of step values that would not cause overflow. */
widest_int minv = (wi::to_widest (TYPE_MIN_VALUE (ssizetype))
/ wi::to_widest (factor));
widest_int maxv = (wi::to_widest (TYPE_MAX_VALUE (ssizetype))
/ wi::to_widest (factor));
/* Get the range of values that the unconverted step actually has. */
wide_int step_min, step_max;
if (TREE_CODE (step) != SSA_NAME
|| get_range_info (step, &step_min, &step_max) != VR_RANGE)
{
step_min = wi::to_wide (TYPE_MIN_VALUE (type));
step_max = wi::to_wide (TYPE_MAX_VALUE (type));
}
/* Check whether the unconverted step has an acceptable range. */
signop sgn = TYPE_SIGN (type);
if (wi::les_p (minv, widest_int::from (step_min, sgn))
&& wi::ges_p (maxv, widest_int::from (step_max, sgn)))
{
if (wi::ge_p (step_min, useful_min, sgn))
return ssize_int (useful_min);
else if (wi::lt_p (step_max, 0, sgn))
return ssize_int (-1);
else
return fold_convert (ssizetype, step);
}
}
return DR_STEP (dr);
}
/* Return a value that is negative iff DR has a negative step. */
tree
dr_direction_indicator (struct data_reference *dr)
{
return dr_step_indicator (dr, 0);
}
/* Return a value that is zero iff DR has a zero step. */
tree
dr_zero_step_indicator (struct data_reference *dr)
{
return dr_step_indicator (dr, 1);
}
/* Return true if DR is known to have a nonnegative (but possibly zero)
step. */
bool
dr_known_forward_stride_p (struct data_reference *dr)
{
tree indicator = dr_direction_indicator (dr);
tree neg_step_val = fold_binary (LT_EXPR, boolean_type_node,
fold_convert (ssizetype, indicator),
ssize_int (0));
return neg_step_val && integer_zerop (neg_step_val);
}


@@ -203,11 +203,20 @@ typedef struct data_reference *data_reference_p;
struct dr_with_seg_len
{
dr_with_seg_len (data_reference_p d, tree len)
: dr (d), seg_len (len) {}
dr_with_seg_len (data_reference_p d, tree len, unsigned HOST_WIDE_INT size,
unsigned int a)
: dr (d), seg_len (len), access_size (size), align (a) {}
data_reference_p dr;
/* The offset of the last access that needs to be checked minus
the offset of the first. */
tree seg_len;
/* A value that, when added to abs (SEG_LEN), gives the total number of
bytes in the segment. */
poly_uint64 access_size;
/* The minimum common alignment of DR's start address, SEG_LEN and
ACCESS_SIZE. */
unsigned int align;
};
/* This struct contains two dr_with_seg_len objects with aliasing data
@@ -475,6 +484,10 @@ extern void prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *,
poly_uint64);
extern void create_runtime_alias_checks (struct loop *,
vec<dr_with_seg_len_pair_t> *, tree*);
extern tree dr_direction_indicator (struct data_reference *);
extern tree dr_zero_step_indicator (struct data_reference *);
extern bool dr_known_forward_stride_p (struct data_reference *);
/* Return true when the base objects of data references A and B are
the same memory object. */


@@ -2330,16 +2330,12 @@ break_alias_scc_partitions (struct graph *rdg,
static tree
data_ref_segment_size (struct data_reference *dr, tree niters)
{
tree segment_length;
if (integer_zerop (DR_STEP (dr)))
segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
else
segment_length = size_binop (MULT_EXPR,
fold_convert (sizetype, DR_STEP (dr)),
fold_convert (sizetype, niters));
return segment_length;
niters = size_binop (MINUS_EXPR,
fold_convert (sizetype, niters),
size_one_node);
return size_binop (MULT_EXPR,
fold_convert (sizetype, DR_STEP (dr)),
fold_convert (sizetype, niters));
}
/* Return true if LOOP's latch is dominated by statement for data reference
@@ -2394,9 +2390,16 @@ compute_alias_check_pairs (struct loop *loop, vec<ddr_p> *alias_ddrs,
else
seg_length_b = data_ref_segment_size (dr_b, niters);
unsigned HOST_WIDE_INT access_size_a
= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
unsigned HOST_WIDE_INT access_size_b
= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
dr_with_seg_len_pair_t dr_with_seg_len_pair
(dr_with_seg_len (dr_a, seg_length_a),
dr_with_seg_len (dr_b, seg_length_b));
(dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b));
/* Canonicalize pairs by sorting the two DR members. */
if (comp_res > 0)


@@ -169,6 +169,50 @@ vect_mark_for_runtime_alias_test (ddr_p ddr, loop_vec_info loop_vinfo)
return true;
}
/* Record that loop LOOP_VINFO needs to check that VALUE is nonzero. */
static void
vect_check_nonzero_value (loop_vec_info loop_vinfo, tree value)
{
vec<tree> checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
for (unsigned int i = 0; i < checks.length (); ++i)
if (checks[i] == value)
return;
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "need run-time check that ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, value);
dump_printf (MSG_NOTE, " is nonzero\n");
}
LOOP_VINFO_CHECK_NONZERO (loop_vinfo).safe_push (value);
}
/* Return true if we know that the order of vectorized STMT_A and
vectorized STMT_B will be the same as the order of STMT_A and STMT_B.
At least one of the statements is a write. */
static bool
vect_preserves_scalar_order_p (gimple *stmt_a, gimple *stmt_b)
{
stmt_vec_info stmtinfo_a = vinfo_for_stmt (stmt_a);
stmt_vec_info stmtinfo_b = vinfo_for_stmt (stmt_b);
/* Single statements are always kept in their original order. */
if (!STMT_VINFO_GROUPED_ACCESS (stmtinfo_a)
&& !STMT_VINFO_GROUPED_ACCESS (stmtinfo_b))
return true;
/* STMT_A and STMT_B belong to overlapping groups. All loads in a
group are emitted at the position of the first scalar load and all
stores in a group are emitted at the position of the last scalar store.
Thus writes will happen no earlier than their current position
(but could happen later) while reads will happen no later than their
current position (but could happen earlier). Reordering is therefore
only possible if the first access is a write. */
gimple *earlier_stmt = get_earlier_stmt (stmt_a, stmt_b);
return !DR_IS_WRITE (STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt)));
}
/* A subroutine of vect_analyze_data_ref_dependence. Handle
DDR_COULD_BE_INDEPENDENT_P ddr DDR that has a known set of dependence
@@ -414,22 +458,27 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
... = a[i];
a[i+1] = ...;
where loads from the group interleave with the store. */
if (STMT_VINFO_GROUPED_ACCESS (stmtinfo_a)
|| STMT_VINFO_GROUPED_ACCESS (stmtinfo_b))
if (!vect_preserves_scalar_order_p (DR_STMT (dra), DR_STMT (drb)))
{
gimple *earlier_stmt;
earlier_stmt = get_earlier_stmt (DR_STMT (dra), DR_STMT (drb));
if (DR_IS_WRITE
(STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt))))
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"READ_WRITE dependence in interleaving.\n");
return true;
}
if (!loop->force_vectorize)
{
tree indicator = dr_zero_step_indicator (dra);
if (TREE_CODE (indicator) != INTEGER_CST)
vect_check_nonzero_value (loop_vinfo, indicator);
else if (integer_zerop (indicator))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"READ_WRITE dependence in interleaving."
"\n");
"access also has a zero step\n");
return true;
}
}
continue;
}
@@ -3030,38 +3079,57 @@ vect_analyze_data_ref_accesses (vec_info *vinfo)
/* Function vect_vfa_segment_size.
Create an expression that computes the size of segment
that will be accessed for a data reference. The functions takes into
account that realignment loads may access one more vector.
Input:
DR: The data reference.
LENGTH_FACTOR: segment length to consider.
Return an expression whose value is the size of segment which will be
accessed by DR. */
Return a value suitable for the dr_with_seg_len::seg_len field.
This is the "distance travelled" by the pointer from the first
iteration in the segment to the last. Note that it does not include
the size of the access; in effect it only describes the first byte. */
static tree
vect_vfa_segment_size (struct data_reference *dr, tree length_factor)
{
tree segment_length;
length_factor = size_binop (MINUS_EXPR,
fold_convert (sizetype, length_factor),
size_one_node);
return size_binop (MULT_EXPR, fold_convert (sizetype, DR_STEP (dr)),
length_factor);
}
if (integer_zerop (DR_STEP (dr)))
segment_length = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
else
segment_length = size_binop (MULT_EXPR,
fold_convert (sizetype, DR_STEP (dr)),
fold_convert (sizetype, length_factor));
/* Return a value that, when added to abs (vect_vfa_segment_size (dr)),
gives the worst-case number of bytes covered by the segment. */
if (vect_supportable_dr_alignment (dr, false)
== dr_explicit_realign_optimized)
static unsigned HOST_WIDE_INT
vect_vfa_access_size (data_reference *dr)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (DR_STMT (dr));
tree ref_type = TREE_TYPE (DR_REF (dr));
unsigned HOST_WIDE_INT ref_size = tree_to_uhwi (TYPE_SIZE_UNIT (ref_type));
unsigned HOST_WIDE_INT access_size = ref_size;
if (GROUP_FIRST_ELEMENT (stmt_vinfo))
{
tree vector_size = TYPE_SIZE_UNIT
(STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr))));
segment_length = size_binop (PLUS_EXPR, segment_length, vector_size);
gcc_assert (GROUP_FIRST_ELEMENT (stmt_vinfo) == DR_STMT (dr));
access_size *= GROUP_SIZE (stmt_vinfo) - GROUP_GAP (stmt_vinfo);
}
return segment_length;
if (STMT_VINFO_VEC_STMT (stmt_vinfo)
&& (vect_supportable_dr_alignment (dr, false)
== dr_explicit_realign_optimized))
{
/* We might access a full vector's worth. */
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
access_size += tree_to_uhwi (TYPE_SIZE_UNIT (vectype)) - ref_size;
}
return access_size;
}
/* Get the minimum alignment for all the scalar accesses that DR describes. */
static unsigned int
vect_vfa_align (const data_reference *dr)
{
return TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr)));
}
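The seg_len/access_size split can be checked numerically. The sketch below (plain C with hypothetical names, not GCC internals) evaluates the positive-seg_len runtime check quoted in the covering note, with seg_len = DR_STEP * (VF - 1):

```c
#include <assert.h>
#include <stdbool.h>

/* Runtime disambiguation for two accesses with nonnegative seg_len:
   true if the byte ranges [base, base + seg_len + access_size)
   cannot overlap.  */
static bool no_alias_p (long base_a, long seg_len_a, long access_size_a,
                        long base_b, long seg_len_b, long access_size_b)
{
  return (base_a >= base_b + seg_len_b + access_size_b
          || base_b >= base_a + seg_len_a + access_size_a);
}

/* Distance travelled from the first vector iteration to the last;
   the size of the access itself lives in access_size.  */
static long seg_len (long step, long vf)
{
  return step * (vf - 1);
}
```

For 4-byte elements with step 4 and VF 4, each reference touches [base, base + 16), so bases 16 bytes apart pass the check. Folding a constant gap between two accesses into access_size rather than seg_len is what lets the combined check keep a seg_len that can be shared with further accesses.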
/* Function vect_no_alias_p.
@@ -3069,13 +3137,15 @@ vect_vfa_segment_size (struct data_reference *dr, tree length_factor)

   Given data references A and B with equal base and offset, see whether
   the alias relation can be decided at compilation time.  Return 1 if
   it can and the references alias, 0 if it can and the references do
   not alias, and -1 if we cannot decide at compile time.  SEGMENT_LENGTH_A,
   SEGMENT_LENGTH_B, ACCESS_SIZE_A and ACCESS_SIZE_B are the equivalent
   of dr_with_seg_len::{seg_len,access_size} for A and B.  */

static int
vect_compile_time_alias (struct data_reference *a, struct data_reference *b,
			 tree segment_length_a, tree segment_length_b,
			 unsigned HOST_WIDE_INT access_size_a,
			 unsigned HOST_WIDE_INT access_size_b)
{
  poly_offset_int offset_a = wi::to_poly_offset (DR_INIT (a));
  poly_offset_int offset_b = wi::to_poly_offset (DR_INIT (b));
@@ -3088,18 +3158,21 @@ vect_compile_time_alias (struct data_reference *a, struct data_reference *b,
  if (tree_int_cst_compare (DR_STEP (a), size_zero_node) < 0)
    {
      const_length_a = (-wi::to_poly_wide (segment_length_a)).force_uhwi ();
      offset_a = (offset_a + access_size_a) - const_length_a;
    }
  else
    const_length_a = tree_to_poly_uint64 (segment_length_a);
  if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
    {
      const_length_b = (-wi::to_poly_wide (segment_length_b)).force_uhwi ();
      offset_b = (offset_b + access_size_b) - const_length_b;
    }
  else
    const_length_b = tree_to_poly_uint64 (segment_length_b);

  const_length_a += access_size_a;
  const_length_b += access_size_b;

  if (ranges_known_overlap_p (offset_a, const_length_a,
			      offset_b, const_length_b))
    return 1;
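Once both segment lengths are compile-time constants, the decision reduces to an interval-overlap test. A standalone analogue of `ranges_known_overlap_p` (illustrative only; the GCC version works on poly_ints):

```c
#include <assert.h>
#include <stdbool.h>

/* Do the half-open byte ranges [off_a, off_a + len_a) and
   [off_b, off_b + len_b) overlap?  The lengths are assumed to
   already include the access size, mirroring the
   const_length_* += access_size_* adjustment above.  */
static bool ranges_overlap_p (long off_a, long len_a,
                              long off_b, long len_b)
{
  return off_a < off_b + len_b && off_b < off_a + len_a;
}
```

For a negative step, the offsets are first rebased (as in the lines above) so that the interval covers the bytes reached by the backwards walk before the overlap test runs.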
@@ -3149,6 +3222,108 @@ dependence_distance_ge_vf (data_dependence_relation *ddr,
return true;
}
/* Dump LOWER_BOUND using flags DUMP_KIND. Dumps are known to be enabled. */
static void
dump_lower_bound (int dump_kind, const vec_lower_bound &lower_bound)
{
dump_printf (dump_kind, "%s (", lower_bound.unsigned_p ? "unsigned" : "abs");
dump_generic_expr (dump_kind, TDF_SLIM, lower_bound.expr);
dump_printf (dump_kind, ") >= ");
dump_dec (dump_kind, lower_bound.min_value);
}
/* Record that the vectorized loop requires the vec_lower_bound described
by EXPR, UNSIGNED_P and MIN_VALUE. */
static void
vect_check_lower_bound (loop_vec_info loop_vinfo, tree expr, bool unsigned_p,
poly_uint64 min_value)
{
vec<vec_lower_bound> lower_bounds = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo);
for (unsigned int i = 0; i < lower_bounds.length (); ++i)
if (operand_equal_p (lower_bounds[i].expr, expr, 0))
{
unsigned_p &= lower_bounds[i].unsigned_p;
min_value = upper_bound (lower_bounds[i].min_value, min_value);
if (lower_bounds[i].unsigned_p != unsigned_p
|| maybe_lt (lower_bounds[i].min_value, min_value))
{
lower_bounds[i].unsigned_p = unsigned_p;
lower_bounds[i].min_value = min_value;
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location,
"updating run-time check to ");
dump_lower_bound (MSG_NOTE, lower_bounds[i]);
dump_printf (MSG_NOTE, "\n");
}
}
return;
}
vec_lower_bound lower_bound (expr, unsigned_p, min_value);
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "need a run-time check that ");
dump_lower_bound (MSG_NOTE, lower_bound);
dump_printf (MSG_NOTE, "\n");
}
LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).safe_push (lower_bound);
}
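The merging rule in `vect_check_lower_bound` keeps the weakest assumptions that still satisfy both requests: the unsigned flag survives only if both checks could assume it, and the required minimum is the larger of the two. A small standalone model (hypothetical struct, not the GCC type):

```c
#include <assert.h>
#include <stdbool.h>

struct lower_bound
{
  bool unsigned_p;          /* can we assume abs (expr) == expr?  */
  unsigned long min_value;  /* requirement: abs (expr) >= min_value  */
};

/* Fold a new requirement (UNSIGNED_P, MIN_VALUE) on the same
   expression into *LB.  */
static void merge_lower_bound (struct lower_bound *lb, bool unsigned_p,
                               unsigned long min_value)
{
  lb->unsigned_p &= unsigned_p;
  if (min_value > lb->min_value)
    lb->min_value = min_value;
}
```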
/* Return true if it's unlikely that the step of the vectorized form of DR
will span fewer than GAP bytes. */
static bool
vect_small_gap_p (loop_vec_info loop_vinfo, data_reference *dr, poly_int64 gap)
{
stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
HOST_WIDE_INT count
= estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
if (GROUP_FIRST_ELEMENT (stmt_info))
count *= GROUP_SIZE (vinfo_for_stmt (GROUP_FIRST_ELEMENT (stmt_info)));
return estimated_poly_value (gap) <= count * vect_get_scalar_dr_size (dr);
}
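The heuristic can be read as: a step bound check is only worth preferring when the gap is no larger than an estimate of the bytes one vectorized step spans, i.e. (estimated VF * group size) scalar accesses. A standalone model with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the vect_small_gap_p decision: is GAP within the bytes
   that ESTIMATED_VF * GROUP_SIZE scalar accesses of SCALAR_SIZE
   bytes are likely to cover in one vector iteration?  */
static bool small_gap_p (long estimated_vf, long group_size,
                         long scalar_size, long gap)
{
  return gap <= estimated_vf * group_size * scalar_size;
}
```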
/* Return true if we know that there is no alias between DR_A and DR_B
when abs (DR_STEP (DR_A)) >= N for some N. When returning true, set
*LOWER_BOUND_OUT to this N. */
static bool
vectorizable_with_step_bound_p (data_reference *dr_a, data_reference *dr_b,
poly_uint64 *lower_bound_out)
{
/* Check that there is a constant gap of known sign between DR_A
and DR_B. */
poly_int64 init_a, init_b;
if (!operand_equal_p (DR_BASE_ADDRESS (dr_a), DR_BASE_ADDRESS (dr_b), 0)
|| !operand_equal_p (DR_OFFSET (dr_a), DR_OFFSET (dr_b), 0)
|| !operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)
|| !poly_int_tree_p (DR_INIT (dr_a), &init_a)
|| !poly_int_tree_p (DR_INIT (dr_b), &init_b)
|| !ordered_p (init_a, init_b))
return false;
/* Sort DR_A and DR_B by the address they access. */
if (maybe_lt (init_b, init_a))
{
std::swap (init_a, init_b);
std::swap (dr_a, dr_b);
}
/* If the two accesses could be dependent within a scalar iteration,
make sure that we'd retain their order. */
if (maybe_gt (init_a + vect_get_scalar_dr_size (dr_a), init_b)
&& !vect_preserves_scalar_order_p (DR_STMT (dr_a), DR_STMT (dr_b)))
return false;
/* There is no alias if abs (DR_STEP) is greater than or equal to
the bytes spanned by the combination of the two accesses. */
*lower_bound_out = init_b + vect_get_scalar_dr_size (dr_b) - init_a;
return true;
}
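For the covering note's second example, `x[i * n] = x[i * n + 1] + 1;` with 4-byte ints, the two references share base, offset and step, with DR_INITs 0 and 4. The computed bound is then init_b + size_b - init_a = 8 bytes, i.e. no alias whenever abs (step) = abs (4 * n) >= 8, which matches abs (n) >= 2. As a sketch (hypothetical helper, not GCC code):

```c
#include <assert.h>

/* Bytes spanned within one scalar iteration by two same-step accesses,
   assuming INIT_A <= INIT_B (the pair is sorted first, as above).
   The pair cannot alias across iterations when abs (DR_STEP) is at
   least this value.  */
static long step_lower_bound (long init_a, long init_b, long size_b)
{
  return init_b + size_b - init_a;
}
```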
/* Function vect_prune_runtime_alias_test_list.
Prune a list of ddrs to be tested at run-time by versioning for alias.
@@ -3178,6 +3353,19 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
dump_printf_loc (MSG_NOTE, vect_location,
"=== vect_prune_runtime_alias_test_list ===\n");
/* Step values are irrelevant for aliasing if the number of vector
iterations is equal to the number of scalar iterations (which can
happen for fully-SLP loops). */
bool ignore_step_p = known_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo), 1U);
if (!ignore_step_p)
{
/* Convert the checks for nonzero steps into bound tests. */
tree value;
FOR_EACH_VEC_ELT (LOOP_VINFO_CHECK_NONZERO (loop_vinfo), i, value)
vect_check_lower_bound (loop_vinfo, value, true, 1);
}
if (may_alias_ddrs.is_empty ())
return true;
@@ -3191,9 +3379,12 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
{
int comp_res;
poly_uint64 lower_bound;
struct data_reference *dr_a, *dr_b;
gimple *dr_group_first_a, *dr_group_first_b;
tree segment_length_a, segment_length_b;
unsigned HOST_WIDE_INT access_size_a, access_size_b;
unsigned int align_a, align_b;
gimple *stmt_a, *stmt_b;
/* Ignore the alias if the VF we chose ended up being no greater
@@ -3221,6 +3412,64 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
dr_a = DDR_A (ddr);
stmt_a = DR_STMT (DDR_A (ddr));
dr_b = DDR_B (ddr);
stmt_b = DR_STMT (DDR_B (ddr));
/* Skip the pair if inter-iteration dependencies are irrelevant
and intra-iteration dependencies are guaranteed to be honored. */
if (ignore_step_p
&& (vect_preserves_scalar_order_p (stmt_a, stmt_b)
|| vectorizable_with_step_bound_p (dr_a, dr_b, &lower_bound)))
{
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location,
"no need for alias check between ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
dump_printf (MSG_NOTE, " and ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
dump_printf (MSG_NOTE, " when VF is 1\n");
}
continue;
}
/* See whether we can handle the alias using a bounds check on
the step, and whether that's likely to be the best approach.
(It might not be, for example, if the minimum step is much larger
than the number of bytes handled by one vector iteration.) */
if (!ignore_step_p
&& TREE_CODE (DR_STEP (dr_a)) != INTEGER_CST
&& vectorizable_with_step_bound_p (dr_a, dr_b, &lower_bound)
&& (vect_small_gap_p (loop_vinfo, dr_a, lower_bound)
|| vect_small_gap_p (loop_vinfo, dr_b, lower_bound)))
{
bool unsigned_p = dr_known_forward_stride_p (dr_a);
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location, "no alias between ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
dump_printf (MSG_NOTE, " and ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
dump_printf (MSG_NOTE, " when the step ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_STEP (dr_a));
dump_printf (MSG_NOTE, " is outside ");
if (unsigned_p)
dump_printf (MSG_NOTE, "[0");
else
{
dump_printf (MSG_NOTE, "(");
dump_dec (MSG_NOTE, poly_int64 (-lower_bound));
}
dump_printf (MSG_NOTE, ", ");
dump_dec (MSG_NOTE, lower_bound);
dump_printf (MSG_NOTE, ")\n");
}
vect_check_lower_bound (loop_vinfo, DR_STEP (dr_a), unsigned_p,
lower_bound);
continue;
}
dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
if (dr_group_first_a)
{
@@ -3228,8 +3477,6 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
	  dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
	}
dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
if (dr_group_first_b)
{
@@ -3237,12 +3484,24 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
	  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
	}

      if (ignore_step_p)
	{
	  segment_length_a = size_zero_node;
	  segment_length_b = size_zero_node;
	}
      else
	{
	  if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
	    length_factor = scalar_loop_iters;
	  else
	    length_factor = size_int (vect_factor);
	  segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
	  segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
	}
access_size_a = vect_vfa_access_size (dr_a);
access_size_b = vect_vfa_access_size (dr_b);
align_a = vect_vfa_align (dr_a);
align_b = vect_vfa_align (dr_b);
comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
DR_BASE_ADDRESS (dr_b));
@@ -3259,7 +3518,22 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
	{
	  int res = vect_compile_time_alias (dr_a, dr_b,
					     segment_length_a,
					     segment_length_b,
					     access_size_a,
					     access_size_b);
if (res >= 0 && dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location,
"can tell at compile time that ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
dump_printf (MSG_NOTE, " and ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
if (res == 0)
dump_printf (MSG_NOTE, " do not alias\n");
else
dump_printf (MSG_NOTE, " alias\n");
}
if (res == 0)
continue;
@@ -3273,8 +3547,8 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
	}

      dr_with_seg_len_pair_t dr_with_seg_len_pair
	(dr_with_seg_len (dr_a, segment_length_a, access_size_a, align_a),
	 dr_with_seg_len (dr_b, segment_length_b, access_size_b, align_b));
/* Canonicalize pairs by sorting the two DR members. */
if (comp_res > 0)
@@ -3287,6 +3561,7 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo)
unsigned int count = (comp_alias_ddrs.length ()
+ check_unequal_addrs.length ());
dump_printf_loc (MSG_NOTE, vect_location,
"improved number of alias checks from %d to %d\n",
may_alias_ddrs.length (), count);


@@ -2875,6 +2875,31 @@ vect_create_cond_for_unequal_addrs (loop_vec_info loop_vinfo, tree *cond_expr)
}
}
/* Create an expression that is true when all lower-bound conditions for
the vectorized loop are met. Chain this condition with *COND_EXPR. */
static void
vect_create_cond_for_lower_bounds (loop_vec_info loop_vinfo, tree *cond_expr)
{
vec<vec_lower_bound> lower_bounds = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo);
for (unsigned int i = 0; i < lower_bounds.length (); ++i)
{
tree expr = lower_bounds[i].expr;
tree type = unsigned_type_for (TREE_TYPE (expr));
expr = fold_convert (type, expr);
poly_uint64 bound = lower_bounds[i].min_value;
if (!lower_bounds[i].unsigned_p)
{
expr = fold_build2 (PLUS_EXPR, type, expr,
build_int_cstu (type, bound - 1));
bound += bound - 1;
}
tree part_cond_expr = fold_build2 (GE_EXPR, boolean_type_node, expr,
build_int_cstu (type, bound));
chain_cond_expr (cond_expr, part_cond_expr);
}
}
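The signed case folds abs (expr) >= bound into a single unsigned comparison by biasing: abs (x) >= b holds iff (unsigned) (x + b - 1) >= 2b - 1, which is what the PLUS_EXPR and the `bound += bound - 1` above construct. A standalone check of the identity:

```c
#include <assert.h>
#include <stdbool.h>

/* Unsigned-wraparound form of abs (expr) >= bound.  Negative values
   wrap to large unsigned values, so after adding the bias BOUND - 1,
   exactly the inputs with abs (expr) < bound land below 2*BOUND - 1.
   Valid while the magnitudes stay far from the type's limits.  */
static bool abs_ge_p (long expr, unsigned long bound)
{
  return (unsigned long) expr + bound - 1 >= 2 * bound - 1;
}
```

When the value is known to be "unsigned" in the sense above (abs (expr) == expr), the bias and doubling are skipped and a plain `expr >= bound` test is emitted instead.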
/* Function vect_create_cond_for_alias_checks.
Create a conditional expression that represents the run-time checks for
@@ -2986,6 +3011,7 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
if (version_alias)
{
vect_create_cond_for_unequal_addrs (loop_vinfo, &cond_expr);
vect_create_cond_for_lower_bounds (loop_vinfo, &cond_expr);
vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr);
}


@@ -2475,6 +2475,7 @@ again:
}
}
/* Free optimized alias test DDRS. */
LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).truncate (0);
LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).release ();
LOOP_VINFO_CHECK_UNEQUAL_ADDRS (loop_vinfo).release ();
/* Reset target cost data. */
@@ -3673,6 +3674,18 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
/* Count LEN - 1 ANDs and LEN comparisons. */
(void) add_stmt_cost (target_cost_data, len * 2 - 1, scalar_stmt,
NULL, 0, vect_prologue);
len = LOOP_VINFO_LOWER_BOUNDS (loop_vinfo).length ();
if (len)
{
/* Count LEN - 1 ANDs and LEN comparisons. */
unsigned int nstmts = len * 2 - 1;
/* +1 for each bias that needs adding. */
for (unsigned int i = 0; i < len; ++i)
if (!LOOP_VINFO_LOWER_BOUNDS (loop_vinfo)[i].unsigned_p)
nstmts += 1;
(void) add_stmt_cost (target_cost_data, nstmts, scalar_stmt,
NULL, 0, vect_prologue);
}
dump_printf (MSG_NOTE,
"cost model: Adding cost of checks for loop "
"versioning aliasing.\n");
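The statement count being costed above has a simple closed form: LEN comparisons, LEN - 1 ANDs to chain them, plus one add per signed bias. As a sketch (hypothetical helper, mirroring the loop in the hunk):

```c
#include <assert.h>

/* Number of scalar statements added to the versioning condition for
   LEN lower-bound checks, NUM_SIGNED of which need the bias add
   (LEN comparisons + LEN - 1 ANDs + NUM_SIGNED adds).  */
static unsigned int lower_bound_check_cost (unsigned int len,
                                            unsigned int num_signed)
{
  return len * 2 - 1 + num_signed;
}
```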


@@ -174,6 +174,18 @@ typedef struct _slp_instance {
loop to be valid. */
typedef std::pair<tree, tree> vec_object_pair;
/* Records that vectorization is only possible if abs (EXPR) >= MIN_VALUE.
UNSIGNED_P is true if we can assume that abs (EXPR) == EXPR. */
struct vec_lower_bound {
vec_lower_bound () {}
vec_lower_bound (tree e, bool u, poly_uint64 m)
: expr (e), unsigned_p (u), min_value (m) {}
tree expr;
bool unsigned_p;
poly_uint64 min_value;
};
/* Vectorizer state common between loop and basic-block vectorization. */
struct vec_info {
enum vec_kind { bb, loop };
@@ -406,6 +418,14 @@ typedef struct _loop_vec_info : public vec_info {
/* Check that the addresses of each pair of objects is unequal. */
auto_vec<vec_object_pair> check_unequal_addrs;
/* List of values that are required to be nonzero. This is used to check
whether things like "x[i * n] += 1;" are safe and eventually gets added
to the checks for lower bounds below. */
auto_vec<tree> check_nonzero;
/* List of values that need to be checked for a minimum value. */
auto_vec<vec_lower_bound> lower_bounds;
/* Statements in the loop that have data references that are candidates for a
runtime (loop versioning) misalignment check. */
auto_vec<gimple *> may_misalign_stmts;
@@ -514,6 +534,8 @@ typedef struct _loop_vec_info : public vec_info {
#define LOOP_VINFO_MAY_ALIAS_DDRS(L) (L)->may_alias_ddrs
#define LOOP_VINFO_COMP_ALIAS_DDRS(L) (L)->comp_alias_ddrs
#define LOOP_VINFO_CHECK_UNEQUAL_ADDRS(L) (L)->check_unequal_addrs
#define LOOP_VINFO_CHECK_NONZERO(L) (L)->check_nonzero
#define LOOP_VINFO_LOWER_BOUNDS(L) (L)->lower_bounds
#define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores
#define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances
#define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
@@ -534,7 +556,8 @@ typedef struct _loop_vec_info : public vec_info {
((L)->may_misalign_stmts.length () > 0)
#define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \
((L)->comp_alias_ddrs.length () > 0 \
|| (L)->check_unequal_addrs.length () > 0)
|| (L)->check_unequal_addrs.length () > 0 \
|| (L)->lower_bounds.length () > 0)
#define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \
(LOOP_VINFO_NITERS_ASSUMPTIONS (L))
#define LOOP_REQUIRES_VERSIONING(L) \