Fix target/101934: aarch64 memset code creates unaligned stores for -mstrict-align

The problem is that aarch64_expand_setmem did not check
STRICT_ALIGNMENT before creating an overlapping store.
This patch adds that check, and the testcase now works.

gcc/ChangeLog:

	PR target/101934
	* config/aarch64/aarch64.c (aarch64_expand_setmem):
	Check STRICT_ALIGNMENT before creating an overlapping
	store.

gcc/testsuite/ChangeLog:

	PR target/101934
	* gcc.target/aarch64/memset-strict-align-1.c: New test.
Andrew Pinski 2021-08-31 04:41:14 +00:00
parent c4d6dcacfc
commit a45786e9a3
2 changed files with 30 additions and 2 deletions


@@ -23566,8 +23566,8 @@ aarch64_expand_setmem (rtx *operands)
   /* Do certain trailing copies as overlapping if it's going to be
      cheaper. i.e. less instructions to do so. For instance doing a 15
      byte copy it's more efficient to do two overlapping 8 byte copies than
-     8 + 4 + 2 + 1. */
-  if (n > 0 && n < copy_limit / 2)
+     8 + 4 + 2 + 1. Only do this when -mstrict-align is not supplied. */
+  if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
     {
       next_mode = smallest_mode_for_size (n, MODE_INT);
       int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
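
To illustrate the 15-byte case mentioned in the comment, here is a minimal sketch in plain C. It is not GCC code: the function names zero15_overlapping and zero15_split are hypothetical, and memcpy of a zero constant stands in for the store instructions the expander would emit. It shows why the overlapping form needs fewer stores but places the second store at offset 7, which is not 8-byte aligned even when the destination is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: zero 15 bytes with two overlapping 8-byte stores.
   The second store starts at offset 7, so it is misaligned for an
   8-byte access even when `p` itself is 8-byte aligned.  */
void zero15_overlapping(unsigned char *p)
{
    const uint64_t zero = 0;
    assert(p != NULL);
    memcpy(p, &zero, 8);      /* bytes 0..7 */
    memcpy(p + 7, &zero, 8);  /* bytes 7..14 (byte 7 is written twice) */
}

/* Illustrative only: the same 15 bytes as 8 + 4 + 2 + 1.  If `p` is
   8-byte aligned, every store is naturally aligned.  */
void zero15_split(unsigned char *p)
{
    const uint64_t z8 = 0;
    const uint32_t z4 = 0;
    const uint16_t z2 = 0;
    assert(p != NULL);
    memcpy(p, &z8, 8);        /* offset 0, 8 bytes */
    memcpy(p + 8, &z4, 4);    /* offset 8, 4 bytes */
    memcpy(p + 12, &z2, 2);   /* offset 12, 2 bytes */
    p[14] = 0;                /* offset 14, 1 byte */
}
```

Both functions leave the same 15 bytes zeroed. The first uses two stores and the second uses four, which is the trade-off the !STRICT_ALIGNMENT check decides.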


@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -mstrict-align" } */
+struct s { char x[95]; };
+void foo (struct s *);
+void bar (void) { struct s s1 = {}; foo (&s1); }
+/* memset (s1 = {}, sizeof = 95) should be expanded out
+   such that there are no overlap stores when -mstrict-align
+   is in use.
+   so 2 pair 16 bytes stores (64 bytes).
+   1 16 byte stores
+   1 8 byte store
+   1 4 byte store
+   1 2 byte store
+   1 1 byte store
+   */
+/* { dg-final { scan-assembler-times "stp\tq" 2 } } */
+/* { dg-final { scan-assembler-times "str\tq" 1 } } */
+/* { dg-final { scan-assembler-times "str\txzr" 1 } } */
+/* { dg-final { scan-assembler-times "str\twzr" 1 } } */
+/* { dg-final { scan-assembler-times "strh\twzr" 1 } } */
+/* { dg-final { scan-assembler-times "strb\twzr" 1 } } */
+/* Also one store pair for the frame-pointer and the LR. */
+/* { dg-final { scan-assembler-times "stp\tx" 1 } } */
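
The store counts expected by the test can be checked with a small model. This is a rough sketch, not the algorithm in aarch64_expand_setmem: split_stores is a hypothetical helper that greedily splits a byte count into non-overlapping power-of-two chunks, with a 32-byte maximum assumed here to stand for one stp of two q registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: split a memset of `n` bytes into
   non-overlapping power-of-two chunks no larger than `max_chunk`, which
   must itself be a power of two.  Returns the number of chunks written
   to `out`, or -1 if `cap` entries are not enough.  */
int split_stores(size_t n, size_t max_chunk, size_t *out, int cap)
{
    int count = 0;
    size_t chunk = max_chunk;

    assert(max_chunk != 0 && (max_chunk & (max_chunk - 1)) == 0);

    while (n > 0) {
        /* Shrink the chunk until it fits in the remaining byte count.  */
        while (chunk > n)
            chunk /= 2;
        if (count == cap)
            return -1;
        out[count++] = chunk;
        n -= chunk;
    }
    return count;
}
```

Calling split_stores (95, 32, out, 8) gives seven chunks: 32, 32, 16, 8, 4, 2, 1. That matches the two stp q, one str q, and one each of the 8, 4, 2 and 1 byte stores that the scan-assembler-times directives look for.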