This patch implements stack clash mitigation for AArch64.
On AArch64 we expect both the probing interval and the guard size to be 64KB,
and we enforce that they are always equal.
We also probe up by 1024 bytes in the general case when a probe is required.
AArch64 has the following probing conditions:
1a) Any initial adjustment less than 63KB requires no probing. An ABI-defined
safe buffer of 1KB is used and a page size of 64KB is assumed.
b) Any final adjustment residual requires a probe at SP + 1KB.
We know this to be safe since at least one page worth of allocations must
already have been done to get to that point.
c) Any final adjustment that is not a residual (i.e. the whole allocation
amount) and is larger than 1KB minus the LR offset requires a probe at SP.
The safe buffer mentioned in 1a is maintained by the storing of FP/LR.
In the case of -fomit-frame-pointer we can still count on LR being stored
if the function makes a call, even if it's a tail call. The AArch64 frame
layout code guarantees this and tests have been added to check against
this particular case.
2) Any allocation larger than one page is done in page-size increments,
each probed up by 1KB, leaving the residual.
3a) Any residual for the initial adjustment that is less than the guard size
minus 1KB requires no probing. Essentially this is a sliding window: the
probing range determines the ABI safe buffer and the amount to be probed up.
Incrementally allocating less than the probing thresholds (e.g. in recursive
functions) is not an issue, as the storing of LR counts as a probe.
                          +-------------------+
                          |  ABI SAFE REGION  |
                +---------+-------------------+
                |         |                   |
                |         |                   |
                |         |                   |
                |         |                   |
                |         |                   |
                |         |                   |
maximum amount  |         |                   |
not needing a   |         |                   |
probe           |         |                   |
                |         |                   |
                |         |                   |
                |         |                   |
                |         |                   |   Probe offset when
                |         +-------------------+--- probe is required
                |         |                   |
                +---------+-------------------+--- Point of first probe
                          |  ABI SAFE REGION  |
                          +-------------------+
                          |                   |
                          |                   |
                          |                   |
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Target was tested with stack clash on and off by default.
GLIBC testsuite also ran with stack clash on by default and no new
regressions.
Co-Authored-By: Richard Sandiford <richard.sandiford@linaro.org>
Co-Authored-By: Tamar Christina <tamar.christina@arm.com>
From-SVN: r264747
Currently some target supports checks, such as vect_int, cache their
results in a manner that would cause them not to be rechecked when
running the same tests against a different variant in a multi-variant
run. This causes tests to be skipped or run when they shouldn't be.
There is already an existing caching mechanism in place that does the
caching correctly, but presumably it wasn't used because some of these
tests originally only contained static data, e.g. they only checked whether
the target is aarch64*-*-*.
This patch changes every function that needs to do any caching at all to use
check_cached_effective_target which will cache per variant instead of globally.
For those tests that already parameterize over et_index, I have created
check_cached_effective_target_indexed to handle this common case by creating a list
containing the property name and the current value of et_index.
These changes result in a much simpler implementation for most tests and a large
reduction in lines for target-supports.exp.
Regtested on
aarch64-none-elf
x86_64-pc-linux-gnu
powerpc64-unknown-linux-gnu
arm-none-eabi
with no testsuite errors. The difference will depend on your site.exp.
On arm we get about 4500 new testcases and on aarch64 a few tens.
On PowerPC and x86_64 there are no changes, as expected, since the default
exp files for these only test the default configuration.
What this means for new target checks is that they should always use either
check_cached_effective_target or check_cached_effective_target_indexed if the
result of the check is to be cached.
As an example, the new vect_int looks like:
    proc check_effective_target_vect_int { } {
        return [check_cached_effective_target_indexed <name> {
          expr {
            <condition>
          }}]
    }
The debug information that was once there is now all hidden in
check_cached_effective_target (called from check_cached_effective_target_indexed),
so the only thing you are required to do is give it a unique cache name and a condition.
The condition doesn't need to be an if statement, so simple boolean expressions are enough here:
    [istarget i?86-*-*] || [istarget x86_64-*-*]
      || ([istarget powerpc*-*-*]
          && ![istarget powerpc-*-linux*paired*])
      || ...
From-SVN: r264745
Avoid placing constants in the limm field of particular
instructions when compiling for size.
gcc/
xxxx-xx-xx Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (*add_n): Clean up pattern, update instruction
constraints.
(ashlsi3_insn): Update instruction constraints.
(ashrsi3_insn): Likewise.
(rotrsi3): Likewise.
(add_shift): Likewise.
* config/arc/constraints.md (Csz): New 32-bit constraint. It
avoids placing in the limm field small constants which could
otherwise end up in a short instruction.
testsuite/
xxxx-xx-xx Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/tph_addx.c: New test.
From-SVN: r264737
gcc/
Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (maddsidi4_split): Don't use dmac if the
destination register is not odd-even.
(umaddsidi4_split): Likewise.
gcc/testsuite/
Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/tmac-3.c: New file.
From-SVN: r264736
2018-10-01 Richard Biener <rguenther@suse.de>
* tree-inline.c (expand_call_inline): Store origin of fn
in BLOCK_ABSTRACT_ORIGIN for the inline BLOCK.
* tree.c (block_ultimate_origin): Simplify and do some
checking.
From-SVN: r264734
for gcc/ada/ChangeLog
* gcc-interface/lang-specs.h (default_compilers): When given
fcompare-debug-second, adjust auxbase like cc1, and pass
gnatd_A.
* gcc-interface/misc.c (flag_compare_debug): Remove variable.
(gnat_post_options): Do not set it.
* lib-writ.adb (flag_compare_debug): Remove import.
(Write_ALI): Do not test it.
From-SVN: r264732
* config/i386/mmx.md (EMMS): New int iterator.
(emms): New int attribute.
(mmx_<emms>): Macroize insn from *mmx_emms and *mmx_femms using
EMMS int iterator. Explicitly declare clobbers.
(mmx_emms): Remove expander.
(mmx_femms): Ditto.
* config/i386/predicates.md (emms_operation): Remove predicate.
(vzeroall_pattern): New predicate.
(vzeroupper_pattern): Rename from vzeroupper_operation.
* config/i386/i386.c (ix86_avx_u128_mode_after): Use
vzeroupper_pattern and vzeroall_pattern predicates.
From-SVN: r264727
gcc/
PR rtl-optimization/86939
* ira-lives.c (make_hard_regno_born): Rename from this...
(make_hard_regno_live): ... to this. Remove update to conflict
information. Update function comment.
(make_hard_regno_dead): Add conflict information update. Update
function comment.
(make_object_born): Rename from this...
(make_object_live): ... to this. Remove update to conflict information.
Update function comment.
(make_object_dead): Add conflict information update. Update function
comment.
(mark_pseudo_regno_live): Call make_object_live.
(mark_pseudo_regno_subword_live): Likewise.
(mark_hard_reg_dead): Update function comment.
(mark_hard_reg_live): Call make_hard_regno_live.
(process_bb_node_lives): Likewise.
* lra-lives.c (make_hard_regno_born): Rename from this...
(make_hard_regno_live): ... to this. Remove update to conflict
information. Remove now unneeded check_pic_pseudo_p argument.
Update function comment.
(make_hard_regno_dead): Add check_pic_pseudo_p argument and add update
to conflict information. Update function comment.
(mark_pseudo_live): Remove update to conflict information. Update
function comment.
(mark_pseudo_dead): Add conflict information update.
(mark_regno_live): Call make_hard_regno_live.
(mark_regno_dead): Call make_hard_regno_dead with new argument.
(process_bb_lives): Call make_hard_regno_live and make_hard_regno_dead.
From-SVN: r264726
2018-09-30 Paul Thomas <pault@gcc.gnu.org>
PR fortran/87359
* trans-array.c (gfc_is_reallocatable_lhs): Correct the problem
introduced by r264358, which prevented components of associate
names from being reallocated on assignment.
2018-09-30 Paul Thomas <pault@gcc.gnu.org>
PR fortran/87359
* gfortran.dg/associate_40.f90 : New test.
From-SVN: r264725
2018-09-30 Paul Thomas <pault@gcc.gnu.org>
PR fortran/70752
PR fortran/72709
* trans-array.c (gfc_conv_scalarized_array_ref): If this is a
deferred type and the info->descriptor is present, use the
info->descriptor.
(gfc_conv_array_ref): If the se expr is a descriptor type, pass
it as 'decl' rather than the symbol backend_decl.
(gfc_array_allocate): If the se string_length is a component
reference, fix it and use it for the expression string length
if the latter is not a variable type. If it is a variable do
an assignment. Make use of component ref string lengths to set
the descriptor 'span'.
(gfc_conv_expr_descriptor): For pointer assignment, do not set
the span field if gfc_get_array_span returns zero.
* trans.c (get_array_span): If the upper bound of a character type
is zero, use the descriptor span if available.
2018-09-30 Paul Thomas <pault@gcc.gnu.org>
PR fortran/70752
PR fortran/72709
* gfortran.dg/deferred_character_25.f90 : New test.
* gfortran.dg/deferred_character_26.f90 : New test.
* gfortran.dg/deferred_character_27.f90 : New test to verify
that PR82617 remains fixed.
From-SVN: r264724
2018-09-30 Paul Thomas <pault@gcc.gnu.org>
PR fortran/70149
* trans-decl.c (gfc_get_symbol_decl): A deferred character
length pointer that is initialized needs the string length to
be initialized as well.
2018-09-30 Paul Thomas <pault@gcc.gnu.org>
PR fortran/70149
* gfortran.dg/deferred_character_24.f90 : New test.
From-SVN: r264721
When passing and returning BLKmode values in 2 integer registers, use
1 TImode register instead of 2 DImode registers. Otherwise, V1TImode
may be used to move and store such BLKmode values, which prevents RTL
optimizations.
gcc/
PR target/87370
* config/i386/i386.c (construct_container): Use TImode for
BLKmode values in 2 integer registers.
gcc/testsuite/
PR target/87370
* gcc.target/i386/pr87370.c: New test.
From-SVN: r264716
2018-09-29 Paul Thomas <pault@gcc.gnu.org>
PR fortran/65667
* trans-expr.c (gfc_trans_assignment_1): If there is a dependency,
fix the rse string length.
2018-09-29 Paul Thomas <pault@gcc.gnu.org>
PR fortran/65667
* gfortran.dg/dependency_52.f90 : New test.
From-SVN: r264715
* builtins.c (unterminated_array): Pass in c_strlen_data * to
c_strlen rather than just a tree *.
(c_strlen): Change NONSTR argument to a c_strlen_data pointer.
Update recursive calls appropriately. If caller did not provide a
suitable data pointer, create a local one. When a non-terminated
string is discovered, bubble up information about the string via the
c_strlen_data object.
* builtins.h (c_strlen): Update prototype.
(c_strlen_data): New structure.
* gimple-fold.c (get_range_strlen): Update calls to c_strlen.
For a type 2 call, if c_strlen indicates a non-terminated string
use the length of the non-terminated string.
(gimple_fold_builtin_stpcpy): Update calls to c_strlen.
From-SVN: r264712
PR target/87467
* config/i386/avx512fintrin.h (_mm512_abs_pd, _mm512_mask_abs_pd): Use
__m512d type for __A argument rather than __m512.
* gcc.target/i386/avx512f-abspd-1.c (SIZE): Divide by two.
(CALC): Use double instead of float.
(TEST): Adjust to test _mm512_abs_pd and _mm512_mask_abs_pd rather than
_mm512_abs_ps and _mm512_mask_abs_ps.
From-SVN: r264711
* doc/xml/gnu/fdl-1.3.xml: The Free Software Foundation web
site now uses https. Also omit the unnecessary trailing slash.
* doc/xml/gnu/gpl-3.0.xml: Ditto.
From-SVN: r264710
* match.pd (simple_comparison): Don't optimize if either operand is
a function pointer when the target needs function pointer canonicalization.
From-SVN: r264705
Now that e.g. ASM_CPU_POWER5_SPEC is always "-mpower5" it is clearer and
easier to just write that directly.
* config/rs6000/driver-rs6000.c (asm_names): Adjust the entries for
power5 .. power9 to remove indirection.
* config/rs6000/rs6000.h (ASM_CPU_POWER5_SPEC, ASM_CPU_POWER6_SPEC,
ASM_CPU_POWER7_SPEC, ASM_CPU_POWER8_SPEC, ASM_CPU_POWER9_SPEC,
ASM_CPU_476_SPEC): Delete.
(ASM_CPU_SPEC): Adjust.
(EXTRA_SPECS): Delete asm_cpu_power5, asm_cpu_power6, asm_cpu_power7,
asm_cpu_power8, asm_cpu_power9, asm_cpu_476.
From-SVN: r264704
All supported assemblers know lwsync, so we never need to generate this
instruction using the .long escape hatch.
* config.in (HAVE_AS_LWSYNC): Delete.
* config/powerpcspe/powerpcspe.h (TARGET_LWSYNC_INSTRUCTION): Delete.
* config/powerpcspe/sync.md (*lwsync): Always generate lwsync, never
do it as a .long .
* config/rs6000/rs6000.h (TARGET_LWSYNC_INSTRUCTION): Delete.
* config/rs6000/sync.md (*lwsync): Always generate lwsync, never do it
as a .long .
* configure.ac: Delete HAVE_AS_LWSYNC.
* configure: Regenerate.
From-SVN: r264702
* calls.c (expand_call): Try to do a tail call for thunks at -O0 too.
* cgraph.h (struct cgraph_thunk_info): Add indirect_offset.
(cgraph_node::create_thunk): Add indirect_offset parameter.
(thunk_adjust): Likewise.
* cgraph.c (cgraph_node::create_thunk): Add indirect_offset parameter
and initialize the corresponding field with it.
(cgraph_node::dump): Dump indirect_offset field.
* cgraphclones.c (duplicate_thunk_for_node): Deal with indirect_offset.
* cgraphunit.c (cgraph_node::analyze): Be prepared for external thunks.
(thunk_adjust): Add indirect_offset parameter and deal with it.
(cgraph_node::expand_thunk): Deal with the indirect_offset field and
pass it to thunk_adjust. Do not call the target hook if it's non-zero
or if the thunk is external or local. Fix formatting. Do not chain
the RESULT_DECL to BLOCK_VARS. Pass the static chain to the target,
if any, in the GIMPLE representation.
* ipa-icf.c (sem_function::equals_wpa): Deal with indirect_offset.
* lto-cgraph.c (lto_output_node): Write indirect_offset field.
(input_node): Read indirect_offset field.
* tree-inline.c (expand_call_inline): Pass indirect_offset field in the
call to thunk_adjust.
* tree-nested.c (struct nesting_info): Add thunk_p field.
(create_nesting_tree): Set it.
(convert_all_function_calls): Copy static chain from targets to thunks.
(finalize_nesting_tree_1): Return early for thunks.
(unnest_nesting_tree_1): Do not finalize thunks.
(gimplify_all_functions): Do not gimplify thunks.
cp/
* method.c (use_thunk): Adjust call to cgraph_node::create_thunk.
ada/
* gcc-interface/decl.c (is_cplusplus_method): Do not require C++
convention on Interfaces.
* gcc-interface/trans.c (Subprogram_Body_to_gnu): Try to create a
bona-fide thunk and hand it over to the middle-end.
(get_controlling_type): New function.
(use_alias_for_thunk_p): Likewise.
(thunk_labelno): New static variable.
(make_covariant_thunk): New function.
(maybe_make_gnu_thunk): Likewise.
* gcc-interface/utils.c (finish_subprog_decl): Set DECL_CONTEXT of the
result DECL here instead of...
(end_subprog_body): ...here.
Co-Authored-By: Pierre-Marie de Rodat <derodat@adacore.com>
From-SVN: r264701
As noted at Cauldron, dumpfile.c currently emits "note: " for all kinds
of dump messages, so that (after filtering) there's no distinction between
MSG_OPTIMIZED_LOCATIONS vs MSG_NOTE vs MSG_MISSED_OPTIMIZATION in the
textual output.
This patch changes dumpfile.c so that the "note: " varies to show
which MSG_* was used, with the string prefix matching that used for
filtering in -fopt-info, hence e.g.
directive_unroll_3.f90:24:0: optimized: loop unrolled 7 times
and:
pr19210-1.c:24:3: missed: missed loop optimization: niters analysis ends up with assumptions.
The patch adds "dg-optimized" and "dg-missed" directives for use
in the testsuite for matching these (with -fopt-info on stderr; they
don't help for dumpfile output).
The patch also converts the various problem-reporting dump messages
in coverage.c:get_coverage_counts to use MSG_MISSED_OPTIMIZATION
rather than MSG_OPTIMIZED_LOCATIONS, as the docs call out "optimized"
as
"information when an optimization is successfully applied",
whereas "missed" is for
"information about missed optimizations",
and problems with profile data seem to me to fall much more into the
latter category than the former. Doing so requires converting a few
tests from using "-fopt-info" (which is implicitly
"-fopt-info-optimized-optall") to getting the "missed" optimizations.
Changing them to "-fopt-info-missed" added lots of noise from the
vectorizer, so I changed these tests to use "-fopt-info-missed-ipa".
gcc/ChangeLog:
* coverage.c (get_coverage_counts): Convert problem-reporting dump
messages from MSG_OPTIMIZED_LOCATIONS to MSG_MISSED_OPTIMIZATION.
* dumpfile.c (kind_as_string): New function.
(dump_loc): Rather than a hardcoded prefix of "note: ", use
kind_as_string to vary the prefix based on dump_kind.
(selftest::test_capture_of_dump_calls): Update for above.
gcc/testsuite/ChangeLog:
* c-c++-common/unroll-1.c: Update expected output from "note" to
"optimized".
* c-c++-common/unroll-2.c: Likewise.
* c-c++-common/unroll-3.c: Likewise.
* g++.dg/tree-ssa/dom-invalid.C: Update expected output from
dg-message to dg-missed. Convert param from -fopt-info to
-fopt-info-missed-ipa.
* g++.dg/tree-ssa/pr81408.C: Update expected output from
dg-message to dg-missed.
* g++.dg/vect/slp-pr56812.cc: Update expected output from
dg-message to dg-optimized.
* gcc.dg/pr26570.c: Update expected output from dg-message to
dg-missed. Convert param from -fopt-info to
-fopt-info-missed-ipa.
* gcc.dg/pr32773.c: Likewise.
* gcc.dg/tree-ssa/pr19210-1.c: Update expected output from
dg-message to dg-missed.
* gcc.dg/unroll-2.c: Update expected output from dg-message to
dg-optimized.
* gcc.dg/vect/nodump-vect-opt-info-1.c: Likewise. Convert param
from -fopt-info to -fopt-info-vec.
* gfortran.dg/directive_unroll_1.f90: Update expected output from
"note" to "optimized".
* gfortran.dg/directive_unroll_2.f90: Likewise.
* gfortran.dg/directive_unroll_3.f90: Likewise.
* gnat.dg/unroll4.adb: Likewise.
* lib/gcc-dg.exp (dg-optimized): New procedure.
(dg-missed): New procedure.
From-SVN: r264697
As reported in
<https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01684.html>, some
fp-int-convert tests fail after my fix for PR c/87390, in Arm /
AArch64 configurations where _Float16 uses excess precision by
default. The issue is comparisons of the results of a conversion by
assignment (compile-time or run-time) from integer to floating-point
with the original integer value; previously this would compare against
an implicit compile-time conversion to the target type, but now, for
C11 and later, it compares against an implicit compile-time conversion
to a possibly wider evaluation format. This is fixed by adding casts
to the test so that the comparison is with a value converted
explicitly to the target type at compile time, without any use of a
wider evaluation format.
PR c/87390
* gcc.dg/torture/fp-int-convert.h (TEST_I_F_VAL): Convert integer
values explicitly to target type for comparison.
From-SVN: r264696
* config/i386/i386.h (SSE_REGNO): Fix check for FIRST_REX_SSE_REG.
(GET_SSE_REGNO): Rename from SSE_REGNO. Update all uses for rename.
From-SVN: r264695
* config/i386/i386.h (CC_REGNO): Remove FPSR_REGS.
* config/i386/i386.c (ix86_fixed_condition_code_regs): Use
INVALID_REGNUM instead of FPSR_REG.
(ix86_md_asm_adjust): Do not clobber FPSR_REG.
* config/i386/i386.md: Update comment of FP compares.
(fldenv): Do not clobber FPSR_REG.
From-SVN: r264694
Fix a bug in the parser code that decides whether a given name should
be considered exported or not. The function Lex::is_exported_name
(which assumes that its input is a mangled name) was being called on
non-mangled (raw utf-8) names in various places. For the bug in
question this caused an imported package to be registered under the
wrong name. To fix the issue, rename 'Lex::is_exported_name' to
'Lex::is_exported_mangled_name', and add a new 'Lex::is_exported_name'
that works on utf-8 strings.
Fixes golang/go#27836.
Reviewed-on: https://go-review.googlesource.com/137736
From-SVN: r264690
This patch was part of the original patch we acquired from Honza and Martin.
It simplifies nested vec_merge operations using the same mask.
Self-tests are included.
2018-09-28 Andrew Stubbs <ams@codesourcery.com>
Jan Hubicka <jh@suse.cz>
Martin Jambor <mjambor@suse.cz>
* simplify-rtx.c (simplify_merge_mask): New function.
(simplify_ternary_operation): Use it, also see if VEC_MERGEs with the
same masks are used in op1 or op2.
(test_vec_merge): New function.
(test_vector_ops): Call test_vec_merge.
Co-Authored-By: Jan Hubicka <jh@suse.cz>
Co-Authored-By: Martin Jambor <mjambor@suse.cz>
From-SVN: r264688
This fixes the one remaining case where the stricter vec_splat checking
complains in the testsuite.
* g++.dg/ext/altivec-6.C: Change the vec_splat second argument to a
valid value, in the "vector bool int" case.
From-SVN: r264681
This deletes most HAVE_AS_* that determine if the assembler supports
some ISA level (and also HAVE_AS_MFPGPR and HAVE_AS_DFP).
These are not useful: we will only generate an instruction that requires
some newer ISA if the user specifically asked for it (with -mcpu=, say).
If the assembler cannot handle that, it is fine if it gives an error.
They also hurt: they increase the number of possible situations that all
need handling and testing. We do not handle all cases, and
obviously do not test all of them either.
This patch removes:
HAVE_AS_POPCNTB (power5, 2.02)
HAVE_AS_FPRND (power5+, 2.04)
HAVE_AS_CMPB (power6, 2.05)
HAVE_AS_POPCNTD (power7, 2.06)
HAVE_AS_POWER8 (power8, 2.07)
HAVE_AS_POWER9 (power9, 3.0)
HAVE_AS_DFP (power6, 2.05, server)
HAVE_AS_MFPGPR (power6x but not later, not arch)
PR target/87149
* config.in (HAVE_AS_CMPB, HAVE_AS_DFP, HAVE_AS_FPRND, HAVE_AS_MFPGPR,
HAVE_AS_POPCNTB, HAVE_AS_POPCNTD, HAVE_AS_POWER8, HAVE_AS_POWER9):
Delete, always treat as true.
* config/powerpcspe/powerpcspe.c (rs6000_option_override_internal):
Ditto. Simplify remaining code.
* config/powerpcspe/powerpcspe.h: Ditto.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Ditto.
Simplify remaining code.
(rs6000_expand_builtin): Ditto.
* config/rs6000/rs6000.h: Ditto.
* configure.ac: Ditto.
* configure: Regenerate.
From-SVN: r264675
2018-09-27 Richard Biener <rguenther@suse.de>
PR testsuite/87451
* gcc.dg/debug/dwarf2/inline5.c: Deal with different comment characters.
From-SVN: r264668