crt1.o is used to create dynamic and non-PIE static executables. Static
PIE needs to link with rcrt1.o, instead of crt1.o, which is also used by
musl libc and OpenBSD:
https://gcc.gnu.org/ml/gcc/2015-06/msg00008.html
to relocate static PIE at run-time. When -pg is used with -static-pie,
grcrt1.o should be used.
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Use
rcrt1.o%s/grcrt1.o%s for -static-pie.
From-SVN: r254890
* ipa-cp.c (update_profiling_info): Handle conversion to local
profile.
* tree-cfg.c (execute_fixup_cfg): Do fixup same way as inliner does.
From-SVN: r254885
* gimple-ssa-evrp.c (class evrp_range_analyzer): New class extracted
from evrp_dom_walker class. Various methods moved into new class.
(evrp_range_analyzer::evrp_range_analyzer): Constructor for new class.
(evrp_range_analyzer::enter): New method.
(evrp_range_analyzer::leave): New method.
(evrp_dom_walker): Remove delegators no longer needed by this class.
Replace vr_values data member with evrp_range_analyzer
From-SVN: r254884
* gimple-ssa-evrp.c (evrp_dom_walker): Add cleanup method.
Add private copy constructor and move assignment operators.
Privatize methods and class data where trivially possible.
(evrp_dom_walker::cleanup): New function, extracted from
execute_early_vrp. Simplify access to class data.
From-SVN: r254882
* vr-values.h (get_output_for_vrp): Prototype.
* vr-values.c (get_output_for_vrp): New function extracted from
vrp_visit_assignment_or_call and extract_range_from_stmt.
(vrp_visit_assignment_or_call): Use get_output_for_vrp. Simplify.
From-SVN: r254880
Disabling software prefetching and switching the autoprefetcher to weak improves
CPU2017 rate and speed benchmarks for both int and fp sets on Falkor.
SPECrate 2017 fp is up 0.38%
SPECspeed 2017 fp is up 0.54%
SPECrate 2017 int is up 3.02%
SPECspeed 2017 int is up 3.16%
There are only a couple individual regressions. The biggest one being about 4%
in parest.
For SPEC2006, we've noticed the following:
SPECint is up 0.91%
SPECfp is stable
In the case of SPEC2006 we noticed both a big regression in mcf (about 20%)
and a big improvement for hmmer (about 40%).
Since the overall result is positive, we would like to make these new tuning
settings the default for Falkor.
We may revisit the software prefetcher setting in the future, in case we
can adjust it enough so it provides us a good balance between improvements and
regressions (mcf). But for now it is best if it stays off.
2017-11-17 Luis Machado <luis.machado@linaro.org>
gcc/
* config/aarch64/aarch64.c
(qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.
(qdf24xx_tunings) <autoprefetcher_model>: Set to
tune_params::AUTOPREFETCHER_WEAK.
From-SVN: r254879
Control-flow Enforcement Technology (CET), published by Intel,
introduces the Shadow Stack feature, which ensures a return from a
function is done to exactly the same location from where the function
was called. When EH is present the control-flow transfer may skip some
stack frames and the shadow stack has to be adjusted not to signal a
violation of a control-flow transfer. It's done by counting a number
of skiping frames and adjasting shadow stack pointer by this number.
Having new semantic of the 'ret' instruction if CET is supported in HW
the 'ret' instruction cannot be generated in ix86_expand_epilogue when
we are returning after EH is processed. Added a code in
ix86_expand_epilogue to adjust Shadow Stack pointer and to generate an
indirect jump instead of 'ret'. As sp register is used during this
adjustment thus the argument in pro_epilogue_adjust_stack is changed
to update cfa_reg based on whether control-flow instrumentation is set.
Without updating the cfa_reg field there is an assert later in dwarf2
pass related to mismatch the stack register and cfa_reg value.
gcc/
* config/i386/i386.c (ix86_expand_epilogue): Change simple
return to indirect jump for EH return if control-flow protection
is enabled. Change explicit 'false' argument in
pro_epilogue_adjust_stack with a value of flag_cf_protection.
* config/i386/i386.md (simple_return_indirect_internal): Remove
SImode restriction to support 64-bit.
libgcc/
* config/i386/linux-unwind.h: Include
config/i386/shadow-stack-unwind.h.
* config/i386/shadow-stack-unwind.h: New file.
* unwind-dw2.c: (uw_install_context): Add a frame parameter and
pass it to _Unwind_Frames_Extra.
* unwind-generic.h (_Unwind_Frames_Extra): New.
* unwind.inc (_Unwind_RaiseException_Phase2): Add frames_p
parameter. Add local variable frames to count number of frames.
(_Unwind_ForcedUnwind_Phase2): Likewise.
(_Unwind_RaiseException): Add local variable frames to count
number of frames, pass it to _Unwind_RaiseException_Phase2 and
uw_install_context.
(_Unwind_ForcedUnwind): Likewise.
(_Unwind_Resume): Likewise.
(_Unwind_Resume_or_Rethrow): Likewise.
From-SVN: r254876
This patch makes combine reconsider insns it added notes to. This
matters for example if the note is a REG_DEAD; without the note the
setter of the register has to be kept around in the result of
combinations, so it cannot be a 2->1 combination, and the cost of
the result is higher than without that extra set, so try_combine may
refuse the combination with the set, but allow it without the set.
This fixes a regression for powerpc: pr69946.c has started to fail
after the bitfield expansion changes. GCC used to generate
lwz 3,0(9)
rlwinm 3,3,12,20,23
ori 3,3,0x11
rotldi 3,3,52
bl bar
but now it does
lwz 3,0(9)
rldicr 3,3,32,3
srdi 3,3,48
ori 3,3,0x110
sldi 3,3,48
bl bar
(an instruction too many). After this patch it is
lwz 3,0(9)
rlwinm 3,3,16,16,19
ori 3,3,0x110
sldi 3,3,48
bl bar
(the testcase still does not pass, it looks for very specific insns).
* combine.c (added_notes_insn): New.
(try_combine): Handle added_notes_insn like added_links_insn.
Rewrite return value code.
(distribute_notes): Set added_notes_insn to the earliest insn we added
a note to.
From-SVN: r254875
If we have a PARALLEL of two SETs, and one half is unused, we currently
happily split that into two instructions (although the unused one is
useless). Worse, as PR82621 shows, combine will happily merge this
insn into I3 even if some intervening insn sets the same register
again, which is wrong.
This fixes it by not splitting PARALLELs with REG_UNUSED notes. It
all is handled fine by combine in that case: just the "single set
that is unused" case isn't handled properly.
This also results in better code: combine will now actually throw
away the unused SET. (It still won't do that in an I3).
PR rtl-optimization/82621
* combine.c (try_combine): Do not split PARALLELs of two SETs if the
dest of one of those SETs is unused.
From-SVN: r254874
This fixes the altivec-macros.c testcase; we now need to explicitly
say "no column number" for messages without one.
gcc/testsuite/
* gcc.target/powerpc/altivec-macros.c: Include "-:" in the messages
matched for.
From-SVN: r254873
Enable building libgcc with CET options by default on Linux/x86 if
binutils supports CET v2.0. It can be disabled with --disable-cet.
It is an error to configure GCC with --enable-cet if bintuiils
doesn't support CET v2.0.
ENDBR instruction is added to __morestack_large_model since it is
called indirectly.
2017-11-17 Igor Tsimbalist <igor.v.tsimbalist@intel.com>
config/
* cet.m4: New file.
gcc/
* config.gcc (extra_headers): Add cet.h for x86 targets.
* config/i386/cet.h: New file.
* doc/install.texi: Add --enable-cet/--disable-cet.
libgcc/
* Makefile.in (configure_deps): Add $(srcdir)/../config/cet.m4.
(CET_FLAGS): New.
* config/i386/morestack.S: Include <cet.h>.
(__morestack_large_model): Add _CET_ENDBR at function entrance.
* config/i386/resms64.h: Include <cet.h>.
* config/i386/resms64f.h: Likewise.
* config/i386/resms64fx.h: Likewise.
* config/i386/resms64x.h: Likewise.
* config/i386/savms64.h: Likewise.
* config/i386/savms64f.h: Likewise.
* config/i386/t-linux (HOST_LIBGCC2_CFLAGS): Add $(CET_FLAGS).
(CRTSTUFF_T_CFLAGS): Likewise.
* configure.ac: Include ../config/cet.m4.
Set and substitute CET_FLAGS.
* configure: Regenerated.
From-SVN: r254868
Testcase gcc.target/arm/cmse/cmse-14.c checks whether bar is called via
__gnu_cmse_nonsecure_call libcall and not via a direct call. However the
pattern is a bit surprising in that it needs to explicitely allow "by"
due to allowing anything before the 'b'.
This patch rewrites the logic to look for b as a first non-whitespace
letter followed iby anything (to match bl and conditional branches)
followed by some spaces and then bar.
2017-11-17 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/testsuite/
* gcc.target/arm/cmse/cmse-14.c: Change logic to match branch
instruction to bar.
From-SVN: r254861
Some of the tests in the gcc.target/arm/cmse directory (eg.
gcc.target/arm/cmse/mainline/bitfield-4.c) are failing when run without
an architecture specified in RUNTESTFLAGS due to them not adding the
option to select an Armv8-M architecture.
This patch fixes the issue by adding the right option from the exp file
so that no architecture fiddling is necessary in the individual tests.
2017-11-17 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/testsuite/
* gcc.target/arm/cmse/cmse.exp: Add option to select Armv8-M Baseline
or Armv8-M Mainline when running the respective tests.
* gcc.target/arm/cmse/baseline/cmse-11.c: Remove architecture check and
selection.
* gcc.target/arm/cmse/baseline/cmse-13.c: Likewise.
* gcc.target/arm/cmse/baseline/cmse-2.c: Likewise.
* gcc.target/arm/cmse/baseline/cmse-6.c: Likewise.
* gcc.target/arm/cmse/baseline/softfp.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: Likewise.
From-SVN: r254860
Commit r253825 which introduced some sanity checks for sbitmap revealed
a bug in the conversion of cmse_nonsecure_entry_clear_before_return ()
to using bitmap structure. bitmap_and expects that the two bitmaps have
the same length, yet the code in
cmse_nonsecure_entry_clear_before_return () have different size for
to_clear_bitmap and to_clear_arg_regs_bitmap, with the assumption that
bitmap_and would behave has if the bits not allocated were in fact zero.
This commit makes sure both bitmap are equally sized.
2017-11-17 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
* config/arm/arm.c (cmse_nonsecure_entry_clear_before_return): Allocate
to_clear_arg_regs_bitmap to the same size as to_clear_bitmap.
From-SVN: r254859
2017-11-17 Marc Glisse <marc.glisse@inria.fr>
* include/bits/vector.tcc (vector::_M_realloc_insert): Cache old
values before the allocation.
From-SVN: r254849
Had a small thinko in the implementation of mmintrin.h _mm_add_pi32 that only shows
when compiling for power9. A trivial and obvious 2 line patch to fix it.
From-SVN: r254848