Updated stack-clash implementation supporting 64KB probes.

This patch implements stack clash mitigation for AArch64.
On AArch64 we expect both the probing interval and the guard size to be 64KB,
and we enforce that they are always equal.

We also probe up by 1024 bytes in the general case when a probe is required.

AArch64 has the following probing conditions:

 1a) Any initial adjustment less than 63KB requires no probing.  An ABI-defined
     safe buffer of 1KB is used and a page size of 64KB is assumed.

  b) Any final adjustment residual requires a probe at SP + 1KB.
     We know this to be safe since you would have done at least one page worth
     of allocations already to get to that point.

  c) Any final adjustment residual larger than 1KB - LR offset requires a
     probe at SP.


  The safe buffer mentioned in 1a is maintained by the storing of FP/LR.
  In the case of -fomit-frame-pointer we can still count on LR being stored
  if the function makes a call, even if it's a tail call.  The AArch64 frame
  layout code guarantees this and tests have been added to check against
  this particular case.

 2) Any allocation larger than 1 page size is done in increments of page size
    and probed up by 1KB, leaving the residual.

 3a) Any residual for initial adjustment that is less than guard-size - 1KB
     requires no probing.  Essentially this is a sliding window.  The probing
     range determines the ABI safe buffer, and the amount to be probed up.

Incrementally allocating less than the probing threshold (e.g. in recursive
functions) is not an issue, as the storing of LR counts as a probe.
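The probing conditions above can be sketched as a pair of predicates. This is an illustrative model only; GUARD_SIZE, CALLER_GUARD, and the function names are assumptions for this sketch, not GCC internals:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants mirroring the 64KB guard and the 1KB ABI
   safe buffer (STACK_CLASH_CALLER_GUARD in the patch).  */
#define GUARD_SIZE   (64 * 1024)
#define CALLER_GUARD 1024

/* 1a/3a: an initial adjustment needs a probe only once it reaches
   guard-size - 1KB (the sliding window).  */
static bool
initial_needs_probe (long adjustment)
{
  return adjustment >= GUARD_SIZE - CALLER_GUARD;
}

/* 1b/1c: a final (outgoing-args) adjustment needs a probe once it
   exceeds the safe buffer less the offset at which LR was saved.  */
static bool
final_needs_probe (long adjustment, long lr_offset)
{
  return adjustment >= CALLER_GUARD - lr_offset;
}
```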


                            +-------------------+                                    
                            |  ABI SAFE REGION  |                                    
                  +------------------------------                                    
                  |         |                   |                                    
                  |         |                   |                                    
                  |         |                   |                                    
                  |         |                   |                                    
                  |         |                   |                                    
                  |         |                   |                                    
 maximum amount   |         |                   |                                    
 not needing a    |         |                   |                                    
 probe            |         |                   |                                    
                  |         |                   |                                    
                  |         |                   |                                    
                  |         |                   |                                    
                  |         |                   |        Probe offset when           
                  |         ---------------------------- probe is required           
                  |         |                   |                                    
                  +-------- +-------------------+ --------  Point of first probe     
                            |  ABI SAFE REGION  |                                    
                            ---------------------                                    
                            |                   |                                    
                            |                   |                                    
                            |                   |                                         
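The new allocation routine splits the requested size into a part rounded down to a multiple of the guard size (dropped and probed page by page, unrolled or in a loop) and a residual. A minimal sketch of that arithmetic, assuming a 64KB guard; the helper names are illustrative:

```c
#include <assert.h>

/* Illustrative constants; MAX_UNROLL_PAGES mirrors the new
   STACK_CLASH_MAX_UNROLL_PAGES macro.  */
#define GUARD_SIZE       (64 * 1024)
#define MAX_UNROLL_PAGES 4

/* Round the allocation down to a multiple of the guard size (the
   guard size is a power of two, so masking suffices).  */
static long
rounded_size (long size)
{
  return size & -(long) GUARD_SIZE;
}

/* The residual is what remains after the rounded part has been
   allocated and probed.  */
static long
residual (long size)
{
  return size - rounded_size (size);
}

/* A small number of pages is unrolled inline; beyond that the
   allocation punts to a probing loop.  */
static int
use_loop (long size)
{
  return rounded_size (size) > (long) MAX_UNROLL_PAGES * GUARD_SIZE;
}
```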

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Target was tested with stack clash on and off by default.

The GLIBC testsuite was also run with stack clash on by default, with no new
regressions.


Co-Authored-By: Richard Sandiford <richard.sandiford@linaro.org>
Co-Authored-By: Tamar Christina <tamar.christina@arm.com>

From-SVN: r264747
Jeff Law 2018-10-01 06:49:35 -06:00, committed by Tamar Christina
commit cd1bef27d2 (parent 041bfa6f07)
26 changed files with 593 additions and 21 deletions

@@ -1,3 +1,19 @@
2018-10-01 Jeff Law <law@redhat.com>
Richard Sandiford <richard.sandiford@linaro.org>
Tamar Christina <tamar.christina@arm.com>
PR target/86486
* config/aarch64/aarch64.md
(probe_stack_range): Add k (SP) constraint.
* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
STACK_CLASH_MAX_UNROLL_PAGES): New.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
stack probes for stack clash.
(aarch64_allocate_and_probe_stack_space): New.
(aarch64_expand_prologue): Use it.
(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
(aarch64_sub_sp): Add emit_move_imm optional param.
2018-10-01 MCC CS <deswurstes@users.noreply.github.com>
PR tree-optimization/87261

@@ -2816,10 +2816,11 @@ aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm)
if nonnull. */
static inline void
aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p)
aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p,
bool emit_move_imm = true)
{
aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
temp1, temp2, frame_related_p);
temp1, temp2, frame_related_p, emit_move_imm);
}
/* Set DEST to (vec_series BASE STEP). */
@@ -3979,13 +3980,33 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
/* Loop. */
ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
HOST_WIDE_INT stack_clash_probe_interval
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
/* TEST_ADDR = TEST_ADDR + PROBE_INTERVAL. */
xops[0] = reg1;
xops[1] = GEN_INT (PROBE_INTERVAL);
HOST_WIDE_INT interval;
if (flag_stack_clash_protection)
interval = stack_clash_probe_interval;
else
interval = PROBE_INTERVAL;
gcc_assert (aarch64_uimm12_shift (interval));
xops[1] = GEN_INT (interval);
output_asm_insn ("sub\t%0, %0, %1", xops);
/* Probe at TEST_ADDR. */
output_asm_insn ("str\txzr, [%0]", xops);
/* If doing stack clash protection then we probe up by the ABI specified
amount. We do this because we're dropping full pages at a time in the
loop. But if we're doing non-stack clash probing, probe at SP 0. */
if (flag_stack_clash_protection)
xops[1] = GEN_INT (STACK_CLASH_CALLER_GUARD);
else
xops[1] = CONST0_RTX (GET_MODE (xops[1]));
/* Probe at TEST_ADDR. If we're inside the loop it is always safe to probe
by this amount for each iteration. */
output_asm_insn ("str\txzr, [%0, %1]", xops);
/* Test if TEST_ADDR == LAST_ADDR. */
xops[1] = reg2;
@@ -4794,6 +4815,188 @@ aarch64_set_handled_components (sbitmap components)
cfun->machine->reg_is_wrapped_separately[regno] = true;
}
/* Allocate POLY_SIZE bytes of stack space using TEMP1 and TEMP2 as scratch
registers. If POLY_SIZE is not large enough to require a probe this function
will only adjust the stack. When allocating the stack space
FRAME_RELATED_P is then used to indicate if the allocation is frame related.
FINAL_ADJUSTMENT_P indicates whether we are allocating the outgoing
arguments. If we are then we ensure that any allocation larger than the ABI
defined buffer needs a probe so that the invariant of having a 1KB buffer is
maintained.
We emit barriers after each stack adjustment to prevent optimizations from
breaking the invariant that we never drop the stack more than a page. This
invariant is needed to make it easier to correctly handle asynchronous
events, e.g. if we were to allow the stack to be dropped by more than a page
and then have multiple probes up and we take a signal somewhere in between
then the signal handler doesn't know the state of the stack and can make no
assumptions about which pages have been probed. */
static void
aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
poly_int64 poly_size,
bool frame_related_p,
bool final_adjustment_p)
{
HOST_WIDE_INT guard_size
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
/* When doing the final adjustment for the outgoing argument size we can't
assume that LR was saved at position 0. So subtract its offset from the
ABI safe buffer so that we don't accidentally allow an adjustment that
would result in an allocation larger than the ABI buffer without
probing. */
HOST_WIDE_INT min_probe_threshold
= final_adjustment_p
? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
: guard_size - guard_used_by_caller;
poly_int64 frame_size = cfun->machine->frame.frame_size;
/* We should always have a positive probe threshold. */
gcc_assert (min_probe_threshold > 0);
if (flag_stack_clash_protection && !final_adjustment_p)
{
poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
if (known_eq (frame_size, 0))
{
dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
}
else if (known_lt (initial_adjust, guard_size - guard_used_by_caller)
&& known_lt (final_adjust, guard_used_by_caller))
{
dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
}
}
HOST_WIDE_INT size;
/* If SIZE is not large enough to require probing, just adjust the stack and
exit. */
if (!poly_size.is_constant (&size)
|| known_lt (poly_size, min_probe_threshold)
|| !flag_stack_clash_protection)
{
aarch64_sub_sp (temp1, temp2, poly_size, frame_related_p);
return;
}
if (dump_file)
fprintf (dump_file,
"Stack clash AArch64 prologue: " HOST_WIDE_INT_PRINT_DEC " bytes"
", probing will be required.\n", size);
/* Round size to the nearest multiple of guard_size, and calculate the
residual as the difference between the original size and the rounded
size. */
HOST_WIDE_INT rounded_size = ROUND_DOWN (size, guard_size);
HOST_WIDE_INT residual = size - rounded_size;
/* We can handle a small number of allocations/probes inline. Otherwise
punt to a loop. */
if (rounded_size <= STACK_CLASH_MAX_UNROLL_PAGES * guard_size)
{
for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size)
{
aarch64_sub_sp (NULL, temp2, guard_size, true);
emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
guard_used_by_caller));
emit_insn (gen_blockage ());
}
dump_stack_clash_frame_info (PROBE_INLINE, size != rounded_size);
}
else
{
/* Compute the ending address. */
aarch64_add_offset (Pmode, temp1, stack_pointer_rtx, -rounded_size,
temp1, NULL, false, true);
rtx_insn *insn = get_last_insn ();
/* For the initial allocation, we don't have a frame pointer
set up, so we always need CFI notes. If we're doing the
final allocation, then we may have a frame pointer, in which
case it is the CFA, otherwise we need CFI notes.
We can determine which allocation we are doing by looking at
the value of FRAME_RELATED_P since the final allocations are not
frame related. */
if (frame_related_p)
{
/* We want the CFA independent of the stack pointer for the
duration of the loop. */
add_reg_note (insn, REG_CFA_DEF_CFA,
plus_constant (Pmode, temp1, rounded_size));
RTX_FRAME_RELATED_P (insn) = 1;
}
/* This allocates and probes the stack. Note that this re-uses some of
the existing Ada stack protection code. However we are guaranteed not
to enter the non loop or residual branches of that code.
The non-loop part won't be entered because if our allocation amount
doesn't require a loop, the case above would handle it.
The residual amount won't be entered because TEMP1 is a multiple of
the allocation size. The residual will always be 0. As such, the only
part we are actually using from that code is the loop setup. The
actual probing is done in aarch64_output_probe_stack_range. */
insn = emit_insn (gen_probe_stack_range (stack_pointer_rtx,
stack_pointer_rtx, temp1));
/* Now reset the CFA register if needed. */
if (frame_related_p)
{
add_reg_note (insn, REG_CFA_DEF_CFA,
plus_constant (Pmode, stack_pointer_rtx, rounded_size));
RTX_FRAME_RELATED_P (insn) = 1;
}
emit_insn (gen_blockage ());
dump_stack_clash_frame_info (PROBE_LOOP, size != rounded_size);
}
/* Handle any residuals. Residuals of at least MIN_PROBE_THRESHOLD have to
be probed. This maintains the requirement that each page is probed at
least once. For initial probing we probe only if the allocation is
more than GUARD_SIZE - buffer, and for the outgoing arguments we probe
if the amount is larger than buffer. GUARD_SIZE - buffer + buffer ==
GUARD_SIZE. This ensures that for any allocation large enough to
trigger a probe here, we'll have at least one, and if an allocation is not
large enough for this code to emit anything for it, the page will have been
probed by the saving of FP/LR, either by this function or any callees. If
we don't have any callees then we won't have more stack adjustments and so
are still safe. */
if (residual)
{
HOST_WIDE_INT residual_probe_offset = guard_used_by_caller;
/* If we're doing final adjustments, and we've done any full page
allocations then any residual needs to be probed. */
if (final_adjustment_p && rounded_size != 0)
min_probe_threshold = 0;
/* If doing a small final adjustment, we always probe at offset 0.
This is done to avoid issues when LR is not at position 0 or when
the final adjustment is smaller than the probing offset. */
else if (final_adjustment_p && rounded_size == 0)
residual_probe_offset = 0;
aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
if (residual >= min_probe_threshold)
{
if (dump_file)
fprintf (dump_file,
"Stack clash AArch64 prologue residuals: "
HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
"\n", residual);
emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
residual_probe_offset));
emit_insn (gen_blockage ());
}
}
}
/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
is saved at BASE + OFFSET. */
@@ -4821,7 +5024,7 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
| local variables | <-- frame_pointer_rtx
| |
+-------------------------------+
| padding0 | \
| padding | \
+-------------------------------+ |
| callee-saved registers | | frame.saved_regs_size
+-------------------------------+ |
@@ -4840,7 +5043,23 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
Dynamic stack allocations via alloca() decrease stack_pointer_rtx
but leave frame_pointer_rtx and hard_frame_pointer_rtx
unchanged. */
unchanged.
By default for stack-clash we assume the guard is at least 64KB, but this
value is configurable to either 4KB or 64KB. We also force the guard size to
be the same as the probing interval and both values are kept in sync.
With those assumptions the callee can allocate up to 63KB (or 3KB depending
on the guard size) of stack space without probing.
When probing is needed, we emit a probe at the start of the prologue
and every PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE bytes thereafter.
We have to track how much space has been allocated and the only stores
to the stack we track as implicit probes are the FP/LR stores.
For outgoing arguments we probe if the size is larger than 1KB, such that
the ABI specified buffer is maintained for the next callee. */
/* Generate the prologue instructions for entry into a function.
Establish the stack frame by decreasing the stack pointer with a
@@ -4889,7 +5108,16 @@ aarch64_expand_prologue (void)
rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
aarch64_sub_sp (ip0_rtx, ip1_rtx, initial_adjust, true);
/* In theory we should never have both an initial adjustment
and a callee save adjustment. Verify that is the case since the
code below does not handle it for -fstack-clash-protection. */
gcc_assert (known_eq (initial_adjust, 0) || callee_adjust == 0);
/* Will only probe if the initial adjustment is larger than the guard
less the amount of the guard reserved for use by the caller's
outgoing args. */
aarch64_allocate_and_probe_stack_space (ip0_rtx, ip1_rtx, initial_adjust,
true, false);
if (callee_adjust != 0)
aarch64_push_regs (reg1, reg2, callee_adjust);
@@ -4945,7 +5173,11 @@ aarch64_expand_prologue (void)
callee_adjust != 0 || emit_frame_chain);
aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
callee_adjust != 0 || emit_frame_chain);
aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed);
/* We may need to probe the final adjustment if it is larger than the guard
that is assumed by the callee. */
aarch64_allocate_and_probe_stack_space (ip1_rtx, ip0_rtx, final_adjust,
!frame_pointer_needed, true);
}
/* Return TRUE if we can use a simple_return insn.
@@ -4985,10 +5217,21 @@ aarch64_expand_epilogue (bool for_sibcall)
/* A stack clash protection prologue may not have left IP0_REGNUM or
IP1_REGNUM in a usable state. The same is true for allocations
with an SVE component, since we then need both temporary registers
for each allocation. */
for each allocation. For stack clash we are in a usable state if
the adjustment is less than GUARD_SIZE - GUARD_USED_BY_CALLER. */
HOST_WIDE_INT guard_size
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
/* We can re-use the registers when the allocation amount is smaller than
guard_size - guard_used_by_caller because we won't be doing any probes
then. In such situations the register should remain live with the correct
value. */
bool can_inherit_p = (initial_adjust.is_constant ()
&& final_adjust.is_constant ()
&& !flag_stack_clash_protection);
&& final_adjust.is_constant ())
&& (!flag_stack_clash_protection
|| known_lt (initial_adjust,
guard_size - guard_used_by_caller));
/* We need to add memory barrier to prevent read from deallocated stack. */
bool need_barrier_p
@@ -5016,8 +5259,10 @@ aarch64_expand_epilogue (bool for_sibcall)
hard_frame_pointer_rtx, -callee_offset,
ip1_rtx, ip0_rtx, callee_adjust == 0);
else
aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust,
!can_inherit_p || df_regs_ever_live_p (IP1_REGNUM));
/* The case where we need to re-use the register here is very rare, so
avoid the complicated condition and just always emit a move if the
immediate doesn't fit. */
aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust, true);
aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
callee_adjust != 0, &cfi_ops);

@@ -84,6 +84,14 @@
#define LONG_DOUBLE_TYPE_SIZE 128
/* This value is the amount of bytes a caller is allowed to drop the stack
before probing has to be done for stack clash protection. */
#define STACK_CLASH_CALLER_GUARD 1024
/* This value controls how many pages we manually unroll the loop for when
generating stack clash probes. */
#define STACK_CLASH_MAX_UNROLL_PAGES 4
/* The architecture reserves all bits of the address for hardware use,
so the vbit must go into the delta field of pointers to member
functions. This is the same config as that in the AArch32

@@ -6503,7 +6503,7 @@
)
(define_insn "probe_stack_range"
[(set (match_operand:DI 0 "register_operand" "=r")
[(set (match_operand:DI 0 "register_operand" "=rk")
(unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
(match_operand:DI 2 "register_operand" "r")]
UNSPECV_PROBE_STACK_RANGE))]

@@ -1,3 +1,31 @@
2018-10-01 Jeff Law <law@redhat.com>
Richard Sandiford <richard.sandiford@linaro.org>
Tamar Christina <tamar.christina@arm.com>
PR target/86486
* gcc.target/aarch64/stack-check-12.c: New.
* gcc.target/aarch64/stack-check-13.c: New.
* gcc.target/aarch64/stack-check-cfa-1.c: New.
* gcc.target/aarch64/stack-check-cfa-2.c: New.
* gcc.target/aarch64/stack-check-prologue-1.c: New.
* gcc.target/aarch64/stack-check-prologue-10.c: New.
* gcc.target/aarch64/stack-check-prologue-11.c: New.
* gcc.target/aarch64/stack-check-prologue-12.c: New.
* gcc.target/aarch64/stack-check-prologue-13.c: New.
* gcc.target/aarch64/stack-check-prologue-14.c: New.
* gcc.target/aarch64/stack-check-prologue-15.c: New.
* gcc.target/aarch64/stack-check-prologue-2.c: New.
* gcc.target/aarch64/stack-check-prologue-3.c: New.
* gcc.target/aarch64/stack-check-prologue-4.c: New.
* gcc.target/aarch64/stack-check-prologue-5.c: New.
* gcc.target/aarch64/stack-check-prologue-6.c: New.
* gcc.target/aarch64/stack-check-prologue-7.c: New.
* gcc.target/aarch64/stack-check-prologue-8.c: New.
* gcc.target/aarch64/stack-check-prologue-9.c: New.
* gcc.target/aarch64/stack-check-prologue.h: New.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Add AArch64.
2018-10-01 Tamar Christina <tamar.christina@arm.com>
* lib/target-supports.exp (check_cached_effective_target_indexed): New.

@@ -0,0 +1,22 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fno-asynchronous-unwind-tables -fno-unwind-tables" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
extern void arf (unsigned long int *, unsigned long int *);
void
frob ()
{
unsigned long int num[10000];
unsigned long int den[10000];
arf (den, num);
}
/* This verifies that the scheduler did not break the dependencies
by adjusting the offsets within the probe and that the scheduler
did not reorder around the stack probes. */
/* { dg-final { scan-assembler-times {sub\tsp, sp, #65536\n\tstr\txzr, \[sp, 1024\]} 2 } } */
/* There is some residual allocation, but we don't care about that. Only that it's not probed. */
/* { dg-final { scan-assembler-times {str\txzr, } 2 } } */

@@ -0,0 +1,28 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fno-asynchronous-unwind-tables -fno-unwind-tables" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define ARG32(X) X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X
#define ARG192(X) ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X)
void out1(ARG192(__int128));
int t1(int);
int t3(int x)
{
if (x < 1000)
return t1 (x) + 1;
out1 (ARG192(1));
return 0;
}
/* This test creates a large (> 1k) outgoing argument area that needs
to be probed. We don't test the exact size of the space or the
exact offset to make the test a little less sensitive to trivial
output changes. */
/* { dg-final { scan-assembler-times "sub\\tsp, sp, #....\\n\\tstr\\txzr, \\\[sp" 1 } } */

@@ -0,0 +1,12 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -funwind-tables" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 128*1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 65536} 1 } } */
/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 131072} 1 } } */
/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */
/* Checks that the CFA notes are correct for every sp adjustment. */

@@ -0,0 +1,13 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -funwind-tables" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 1280*1024 + 512
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {\.cfi_def_cfa [0-9]+, 1310720} 1 } } */
/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1311232} 1 } } */
/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1310720} 1 } } */
/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */
/* Checks that the CFA notes are correct for every sp adjustment. */

@@ -0,0 +1,10 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 128
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr,} 0 } } */
/* SIZE is smaller than guard-size - 1Kb so no probe expected. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE (6 * 64 * 1024) + (1 * 63 * 1024) + 512
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
/* SIZE is more than 4x guard-size and remainder larger than guard-size - 1Kb,
1 probe expected in a loop and 1 residual probe. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE (6 * 64 * 1024) + (1 * 32 * 1024)
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* SIZE is more than 4x guard-size and remainder smaller than guard-size - 1Kb,
1 probe expected in a loop and no residual probe. */

@@ -0,0 +1,15 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
void
f (void)
{
volatile int x[16384 + 1000];
x[0] = 0;
}
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* SIZE is more than 1 guard-size, but only one 64KB page is used, expect only 1
probe. Leaf function and omitting leaf pointers. */

@@ -0,0 +1,20 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
void h (void) __attribute__ ((noreturn));
void
f (void)
{
volatile int x[16384 + 1000];
x[30]=0;
h ();
}
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* { dg-final { scan-assembler-times {str\s+x30, \[sp\]} 1 } } */
/* SIZE is more than 1 guard-size, but only one 64KB page is used, expect only 1
probe. Leaf function and omitting leaf pointers, tail call to noreturn which
may only omit an epilogue and not a prologue. Checking for LR saving. */

@@ -0,0 +1,24 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
void h (void) __attribute__ ((noreturn));
void
f (void)
{
volatile int x[16384 + 1000];
if (x[0])
h ();
x[345] = 1;
h ();
}
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* { dg-final { scan-assembler-times {str\s+x30, \[sp\]} 1 } } */
/* SIZE is more than 1 guard-size, two 64k pages used, expect only 1 explicit
probe at 1024 and one implicit probe due to LR being saved. Leaf function
and omitting leaf pointers, tail call to noreturn which may only omit an
epilogue and not a prologue and control flow in between. Checking for
LR saving. */

@@ -0,0 +1,23 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
void g (volatile int *x) ;
void h (void) __attribute__ ((noreturn));
void
f (void)
{
volatile int x[16384 + 1000];
g (x);
h ();
}
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* { dg-final { scan-assembler-times {str\s+x30, \[sp\]} 1 } } */
/* SIZE is more than 1 guard-size, two 64k pages used, expect only 1 explicit
probe at 1024 and one implicit probe due to LR being saved. Leaf function
and omitting leaf pointers, normal function call followed by a tail call to
noreturn which may only omit an epilogue and not a prologue and control flow
in between. Checking for LR saving. */

@@ -0,0 +1,10 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 2 * 1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr,} 0 } } */
/* SIZE is smaller than guard-size - 1Kb so no probe expected. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 63 * 1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr,} 1 } } */
/* SIZE is exactly guard-size - 1Kb, a boundary condition, so 1 probe
expected. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 63 * 1024 + 512
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* SIZE is more than guard-size - 1Kb and remainder is less than 1kB,
1 probe expected. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 64 * 1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* SIZE is more than guard-size - 1Kb and remainder is zero,
1 probe expected, boundary condition. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 65 * 1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* SIZE is more than guard-size - 1Kb and remainder is equal to 1kB,
1 probe expected. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 127 * 1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
/* SIZE is more than 1x guard-size and remainder equal to guard-size - 1Kb,
2 probes expected, unrolled, no loop. */

@@ -0,0 +1,10 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 128 * 1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
/* SIZE is exactly 2x guard-size with no remainder, 2 probes expected,
unrolled, no loop. */

@@ -0,0 +1,11 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
/* { dg-require-effective-target supports_stack_clash_protection } */
#define SIZE 6 * 64 * 1024
#include "stack-check-prologue.h"
/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
/* SIZE is more than 4x guard-size and no remainder, 1 probe expected in a loop
and no residual probe. */

@@ -0,0 +1,5 @@
int f_test (int x)
{
char arr[SIZE];
return arr[x];
}

@@ -8385,14 +8385,9 @@ proc check_effective_target_autoincdec { } {
#
proc check_effective_target_supports_stack_clash_protection { } {
# Temporary until the target bits are fully ACK'd.
# if { [istarget aarch*-*-*] } {
# return 1
# }
if { [istarget x86_64-*-*] || [istarget i?86-*-*]
|| [istarget powerpc*-*-*] || [istarget rs6000*-*-*]
|| [istarget s390*-*-*] } {
|| [istarget aarch64*-*-*] || [istarget s390*-*-*] } {
return 1
}
return 0