PR tree-optimization/80632
* tree-switch-conversion.c (struct switch_conv_info): Add target_vop
field.
(build_arrays): Initialize it for virtual phis.
(fix_phi_nodes): Use it for virtual phis.
* gcc.dg/pr80632.c: New test.
From-SVN: r247642
PR tree-optimization/80558
* tree-vrp.c (extract_range_from_binary_expr_1): Optimize
[x, y] op z into [x op z, y op z] for op & or | if conditions
are met.
* gcc.dg/tree-ssa/vrp115.c: New test.
From-SVN: r247641
Code scheduling for Cortex-A53 isn't as good as it could be. It turns out
code runs faster overall if dependent loads and stores are placed closer
together. To achieve this effect, this patch adds a bypass between
cortex_a53_load1 and cortex_a53_load*/cortex_a53_store* if the result of an
earlier load is used in an address calculation. This significantly improved
benchmark scores in a proprietary benchmark suite.
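A hedged illustration (mine, not from the patch): pointer chasing is the
kind of code this affects, since the result of each load feeds the address
calculation of the next memory access.

  struct node { struct node *next; int val; };

  int
  sum_list (struct node *p)
  {
    int s = 0;
    while (p)
      {
        s += p->val;   /* Address depends on the earlier load of p.  */
        p = p->next;   /* Load whose result feeds the next addresses.  */
      }
    return s;
  }

The new bypass lowers the modelled latency between such dependent memory
operations, encouraging the scheduler to place them close together.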
gcc/
* config/arm/aarch-common.c (arm_early_load_addr_dep_ptr):
New function.
(arm_early_store_addr_dep_ptr): Likewise.
* config/arm/aarch-common-protos.h
(arm_early_load_addr_dep_ptr): Add prototype.
(arm_early_store_addr_dep_ptr): Likewise.
* config/arm/cortex-a53.md: Add new bypasses.
From-SVN: r247631
* tree.c (next_type_uid): Change type to unsigned.
(type_hash_canon): Decrement back next_type_uid if
freeing a type node with the highest TYPE_UID. For INTEGER_TYPEs
also ggc_free TYPE_MIN_VALUE, TYPE_MAX_VALUE and TYPE_CACHED_VALUES
if possible.
From-SVN: r247628
Many supported cores use the AUTOPREFETCHER_WEAK setting, which tries
to order loads and stores to improve streaming performance. Since significant
gains were reported in http://patchwork.ozlabs.org/patch/534469/, it seems
like a good idea to enable this setting for -mcpu=generic as well. Since the
weak model only keeps the order if it doesn't make the schedule worse, it
should not impact performance adversely on cores that don't show a gain.
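As a hedged illustration (mine, not from the patch), a simple streaming
kernel shows the kind of code the weak model targets: its loads and stores
step through memory in ascending order, and keeping them in that order in
the schedule helps the hardware prefetcher recognise the streams.

  void
  copy_stream (int *restrict dst, const int *restrict src, int n)
  {
    for (int i = 0; i < n; i++)
      dst[i] = src[i];   /* Sequential accesses; order is preserved.  */
  }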
gcc/
* config/aarch64/aarch64.c (generic_tunings): Update prefetch model.
From-SVN: r247610
Set jump alignment to 4 for Cortex cores as it reduces codesize by 0.4% on
average with no obvious performance difference. See the original discussion of
the overheads of various alignments:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02075.html.
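For reference, the new per-core default matches what the existing
-falign-jumps option requests; an illustrative invocation (mine, not from
the patch):

  gcc -O2 -mcpu=cortex-a53 -falign-jumps=4 file.c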
gcc/
* config/aarch64/aarch64.c (cortexa35_tunings): Set jump alignment to 4.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(cortexa73_tunings): Likewise.
From-SVN: r247609
With -mcpu=generic the loop alignment is currently 4. All but one of the
supported cores use 8 or higher. Since using 8 provides performance gains
on several cores, it is best to use that by default. As discussed in [1],
the jump alignment has no effect on performance, yet has a relatively high
codesize cost [2], so setting it to 4 is best. This gives a 0.2% overall
codesize improvement as well as performance gains in several benchmarks.
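An illustrative kernel (mine, not from the patch): under the new defaults,
equivalent to -falign-loops=8 -falign-jumps=4, the head of the loop below
is padded to an 8-byte boundary, while branch targets that are only ever
reached by jumping get the cheaper 4-byte alignment.

  void
  saxpy (float *restrict y, const float *restrict x, float a, int n)
  {
    for (int i = 0; i < n; i++)   /* Loop head aligned to 8 bytes.  */
      y[i] += a * x[i];
  }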
gcc/
* config/aarch64/aarch64.c (generic_tunings): Set jump alignment to 4.
Set loop alignment to 8.
[1] https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00574.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02075.html
From-SVN: r247608
All cores which add a cpu_addrcost_table use a non-zero value for
HI and TI mode shifts (a non-zero value for general indexing also
applies to all shifts). Given this, it makes no sense to use a
different setting in generic_addrcost_table. So change it so that
all supported cores, including -mcpu=generic, now generate the same code:
int f(short *p, short *q, long x) { return p[x] + q[x]; }

    lsl     x2, x2, 1
    ldrsh   w3, [x0, x2]
    ldrsh   w0, [x1, x2]
    add     w0, w3, w0
    ret
gcc/
* config/aarch64/aarch64.c (generic_addrcost_table):
Change HI/TI mode setting.
From-SVN: r247606
2017-05-04 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/80622
* tree-sra.c (comes_initialized_p): New function.
(build_accesses_from_assign): Only set write lazily when
comes_initialized_p is false.
(analyze_access_subtree): Use comes_initialized_p.
(propagate_subaccesses_across_link): Assert !comes_initialized_p
instead of testing for PARM_DECL.
testsuite/
* gcc.dg/tree-ssa/pr80622.c: New test.
From-SVN: r247604
2017-05-04 Richard Biener <rguenther@suse.de>
* tree-ssa-alias.c (get_continuation_for_phi): Improve the search
for the last VUSE whose def dominates the PHI. Directly call
maybe_skip_until.
(get_continuation_for_phi_1): Remove.
* gcc.dg/tree-ssa/ssa-fre-58.c: New testcase.
From-SVN: r247596
For the reasons explained in PR77536, niter_for_unrolled_loop assumes 5
iterations in the absence of profiling information, although it doesn't
increase beyond the estimate for the original loop. This left a hole in
which the new estimate could be less than the old one but still greater
than the limit imposed by CEIL (nb_iterations_upper_bound, unroll factor).
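A worked example with assumed numbers: if nb_iterations_upper_bound for
the original loop is 6 and the unroll factor is 4, the unrolled loop can
run at most CEIL (6, 4) = 2 times, yet the assumed estimate of 5 would
previously have been kept as long as it stayed below the original loop's
estimate; the patch now caps it at 2.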
2017-05-04 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-loop-manip.c (niter_for_unrolled_loop): Add commentary
to explain the use of truncating division. Cap the number of
iterations to the maximum given by nb_iterations_upper_bound,
if defined.
gcc/testsuite/
* gcc.dg/vect/vect-profile-1.c: New test.
From-SVN: r247591
This patch adds support for the -mpure-code option to ARMv8-M Baseline, in
addition to the existing support for ARMv7-M and ARMv8-M Mainline.
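An illustrative invocation (mine, not from the patch), using Cortex-M23
as an example of an ARMv8-M Baseline core:

  arm-none-eabi-gcc -mcpu=cortex-m23 -mpure-code -c file.c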
2017-05-04 Prakhar Bahuguna <prakhar.bahuguna@arm.com>
Andre Simoes Dias Vieira <andre.simoesdiasvieira@arm.com>
gcc/
* config/arm/arm.md (movsi): Change TARGET_32BIT to TARGET_HAVE_MOVT.
(movt splitter): Likewise.
* config/arm/arm.c (arm_option_check_internal): Change arm_arch_thumb2
to TARGET_HAVE_MOVT, and merge with -mslow-flash-data check.
(const_ok_for_arm): Change else to else if (TARGET_THUMB2) and add else
block for Thumb-1 with MOVT.
(thumb2_legitimate_address_p): Move code block ...
(can_avoid_literal_pool_for_label_p): ... into this new function.
(thumb1_legitimate_address_p): Add check for TARGET_HAVE_MOVT and
literal pool.
(thumb_legitimate_constant_p): Add conditional on TARGET_HAVE_MOVT.
* doc/invoke.texi (-mpure-code): Change "ARMv7-M targets" to
"M-profile targets with the MOVT instruction".
gcc/testsuite/
* gcc.target/arm/pure-code/pure-code.exp: Add conditional for
check_effective_target_arm_thumb1_movt_ok.
Co-Authored-By: Andre Vieira <andre.simoesdiasvieira@arm.com>
From-SVN: r247585
The GCC documentation in section 6.60.8 ARM Floating Point Status and
Control Intrinsics states that the FPSCR register can be read and
written to using the intrinsics __builtin_arm_get_fpscr and
__builtin_arm_set_fpscr. However, they are misnamed within GCC itself,
so these intrinsic names are not recognised.
This patch corrects the intrinsic names to match the documentation, and
adds tests to verify these intrinsics generate the correct
instructions.
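A minimal usage sketch of the corrected names, following the documented
signatures (unsigned int __builtin_arm_get_fpscr (void) and
void __builtin_arm_set_fpscr (unsigned int)):

  unsigned int
  save_fpscr (void)
  {
    return __builtin_arm_get_fpscr ();   /* Read the FPSCR.  */
  }

  void
  restore_fpscr (unsigned int state)
  {
    __builtin_arm_set_fpscr (state);     /* Write the FPSCR.  */
  }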
2017-05-04 Prakhar Bahuguna <prakhar.bahuguna@arm.com>
gcc/
* config/arm/arm-builtins.c (arm_init_builtins): Rename
__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
__builtin_arm_stfscr to __builtin_arm_set_fpscr.
gcc/testsuite/
* gcc.target/arm/fpscr.c: New file.
From-SVN: r247584
* brig-builtins.def: Add a builtin for class_f64.
* builtin-types.def: Add a builtin type needed by class_f64.
* brigfrontend/brig-code-entry-handler.cc
(brig_code_entry_handler::build_address_operand): Fix a bug
with reg+offset addressing on 32-bit segments. In large mode,
the offset is treated as 32 bits unless the address space is
global, readonly or kernarg.
* rt/workitems.c: Remove a leftover comment.
* rt/arithmetic.c (__hsail_class_f32, __hsail_class_f64): Fix the
check for signalling/non-signalling NaNs. Add a default
implementation of class_f64.
From-SVN: r247576