2016-11-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
PR tree-optimization/77848
* tree-if-conv.c (version_loop_for_if_conversion): When versioning
an outer loop, only save basic block aux information for the inner
loop.
(versionable_outer_loop_p): New function.
(tree_if_conversion): Version the outer loop instead of the inner
one if the pattern will be recognized for outer-loop
vectorization.
From-SVN: r242520
The `user_defined_section_attribute' is used as part of the condition to
determine if GCC should partition blocks within a function into hot and
cold blocks. This global is initially false, and is set to true from
within the file parse phase of GCC, as part of the attribute handling
hook.
The `user_defined_section_attribute' is reset to false as part of the
final pass of GCC. However, the final pass is part of the optimisation
phase of the compiler, and so if at any point during the file parse
phase any function, or data, has a section attribute the global
`user_defined_section_attribute' will be set to true.
When GCC performs the block partitioning pass on the first function, if
`user_defined_section_attribute' is true then the function will not be
partitioned. Notice though, that due to the above, whether we partition
this first function or not has nothing to do with whether the function
has a section attribute, instead, if any function or data in the parsed
file has a section attribute then we don't partition the first
function.
After performing (or not) the block partitioning pass on the first
function we perform the final pass on the first function, at which point
we reset `user_defined_section_attribute' to false. As parsing is
complete by this point, we will never set
`user_defined_section_attribute' to true after that, and so all of the
following functions will have the partition blocks pass performed on
them, even if the function has a section attribute, and will not be
partitioned.
Luckily we don't end up partitioning functions that should not be
partitioned though. Due to the way that functions are selected during
the assembler writing phase, if a function has a section attribute this
takes priority over any hot/cold block partitioning that has been done.
What we see from the above then is that the
`user_defined_section_attribute' mechanism is broken. It was originally
created when GCC parsed, optimised, and generated assembler function at
a time. Now that we deal with the whole file in one go, we need to
update the mechanism used to gate the block partitioning pass.
This patch does this by looking specifically for a section attribute on
the function DECL, which removes the need for a global variable, and
will work whether we parse the whole file in one go, or one function at
a time.
A few new tests have been added. These check for the case where a
function is not partitioned when it could be.
gcc/ChangeLog:
* gcc/bb-reorder.c: Remove 'toplev.h' include.
(pass_partition_blocks::gate): No longer check
user_defined_section_attribute, instead check the function decl
for a section attribute.
* gcc/c-family/c-attribs.c (handle_section_attribute): No longer
set user_defined_section_attribute.
* gcc/final.c (rest_of_handle_final): Likewise.
* gcc/toplev.c: Remove definition of user_defined_section_attribute.
* gcc/toplev.h: Remove declaration of
user_defined_section_attribute.
gcc/testsuiteChangeLog:
* gcc.dg/tree-prof/section-attr-1.c: New file.
* gcc.dg/tree-prof/section-attr-2.c: New file.
* gcc.dg/tree-prof/section-attr-3.c: New file.
From-SVN: r242519
gcc/
* config/mips/mips.md (casesi_internal_mips16_<mode>):
Explicitly switch between JR and JRC for the table jump. Adjust
instruction count.
From-SVN: r242517
A step toward eliminating goc2c.
Drop the exported parfor code; it was needed for tests in the past, but
no longer is. The Go 1.7 runtime no longer uses parfor.
Reviewed-on: https://go-review.googlesource.com/33324
From-SVN: r242509
2016-11-16 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
* config/arm/arm.md (arm_addsi3): Add alternative for addition of
general register with general register or ARM constant into SP
register.
gcc/testsuite/
* gcc.target/arm/empty_fiq_handler.c: New test.
From-SVN: r242508
Looking at PR77308, one of the issues is that the bswap optimization
phase doesn't work on ARM. This is due to an odd check that uses
SLOW_UNALIGNED_ACCESS (which is always true on ARM). Since the testcase
in PR77308 generates much better code with this patch (~13% fewer
instructions), it seems best to remove this check.
gcc/
* tree-ssa-math-opts.c (bswap_replace): Remove test
of SLOW_UNALIGNED_ACCESS.
testsuite/
* gcc.dg/optimize-bswapdi-3.c: Remove xfail.
* gcc.dg/optimize-bswaphi-1.c: Likewise.
* gcc.dg/optimize-bswapsi-2.c: Likewise.
From-SVN: r242506
ieee_support_halting only checked the availability of status
flags, not trapping support. On some targets the later can
only be checked at runtime: feenableexcept reports if
enabling traps failed.
So check trapping support by enabling/disabling it.
Updated the test that enabled trapping to check if it is
supported.
gcc/testsuite/
PR libgfortran/78314
* gfortran.dg/ieee/ieee_6.f90: Use ieee_support_halting.
libgfortran/
PR libgfortran/78314
* config/fpu-glibc.h (support_fpu_trap): Use feenableexcept.
From-SVN: r242505
gcc/
* config/mips/mips-protos.h (mips_set_text_contents_type): New
prototype.
* config/mips/mips.h (ASM_OUTPUT_BEFORE_CASE_LABEL): New macro.
(ASM_OUTPUT_CASE_END): Likewise.
* config/mips/mips.c (mips_set_text_contents_type): New
function.
(mips16_emit_constants): Record the pool's initial label number
with the `consttable' insn. Emit a `consttable_end' insn at the
end.
(mips_final_prescan_insn): Call `mips_set_text_contents_type'
for `consttable' insns.
(mips_final_postscan_insn): Call `mips_set_text_contents_type'
for `consttable_end' insns.
* config/mips/mips.md (unspec): Add UNSPEC_CONSTTABLE_END enum
value.
(consttable): Add operand.
(consttable_end): New insn.
gcc/testsuite/
* gcc.target/mips/data-sym-jump.c: New test case.
* gcc.target/mips/data-sym-pool.c: New test case.
* gcc.target/mips/insn-pseudo-4.c: Adjust for constant pool
annotation.
From-SVN: r242502
gcc/
2016-11-16 Yuri Rumyantsev <ysrumyan@gmail.com>
* params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
* tree-if-conv.c (tree_if_conversion): Make public.
* * tree-if-conv.h: New file.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
dynamic alias checks for epilogues.
* tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
* tree-vect-loop.c: include tree-if-conv.h.
(new_loop_vec_info): Add zeroing orig_loop_info field.
(vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
(vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
using passed argument.
(vect_transform_loop): Check if created epilogue should be returned
for further vectorization with less vf. If-convert epilogue if
required. Print vectorization success for epilogue.
* tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
if it is required, pass loop_vinfo produced during vectorization of
loop body to vect_analyze_loop.
* tree-vectorizer.h (struct _loop_vec_info): Add new field
orig_loop_info.
(LOOP_VINFO_ORIG_LOOP_INFO): New.
(LOOP_VINFO_EPILOGUE_P): New.
(LOOP_VINFO_ORIG_VECT_FACTOR): New.
(vect_do_peeling): Change prototype to return epilogue.
(vect_analyze_loop): Add argument of loop_vec_info type.
(vect_transform_loop): Return created loop.
gcc/testsuite/
2016-11-16 Yuri Rumyantsev <ysrumyan@gmail.com>
* lib/target-supports.exp (check_avx2_hw_available): New.
(check_effective_target_avx2_runtime): New.
* gcc.dg/vect/vect-tail-nomask-1.c: New test.
From-SVN: r242501
So far all target implementations of the separate shrink-wrapping hooks
use the DF LIVE info to figure out around which basic blocks the non-
volatile registers need to be saved. This is done by looking at the
IN+GEN+KILL sets of the basic blocks. However, that doesn't work for
registers that DF says are defined in the entry block, or used in the
exit block.
This patch introduces a local flag DF_SCAN_EMPTY_ENTRY_EXIT that says
no registers should be defined in the entry block, and none used in the
exit block. It also makes try_shrink_wrapping_separate use it. The
rs6000 port is changed to use IN+GEN+KILL for the LR component.
* config/rs6000/rs6000.c (rs6000_components_for_bb): Mark the LR
component as used also if LR_REGNO is a live input to the bb.
* df-scan.c (df_get_entry_block_def_set): Return immediately after
clearing the set if DF_SCAN_EMPTY_ENTRY_EXIT is set.
(df_get_exit_block_use_set): Ditto.
* df.h (df_scan_flags): New enum.
* shrink-wrap.c (try_shrink_wrapping_separate): Set
DF_SCAN_EMPTY_ENTRY_EXIT in df_scan->local_flags, and call
df_update_entry_block_defs and df_update_exit_block_uses
at the start; clear the flag and call those functions at the end.
From-SVN: r242497
We previously stored the number of loop iterations rather
than the number of latch iterations.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Set
nb_iterations to the number of latch iterations rather than the
number of loop iterations.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242493
The transformations made by make_compound_operation apply
only to scalar integer modes. The fix for PR70944 had enforced
that by returning early for vector modes at the top of the
function. However, the function is supposed to be recursive,
so we should continue to look at integer suboperands even if
the outer operation is a vector one.
This patch instead splits out the non-recursive parts
of make_compound_operation into a subroutine and checks
that the mode is a scalar integer before calling it.
The patch was originally written to help with the later
conversion to static type checking of mode classes, but it
also happened to reenable optimisation of things like
vec_duplicate operands.
Note that the gen_lowparts in the PLUS and MINUS cases
were redundant, since new_rtx already had mode "mode"
at those points.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* combine.c (maybe_swap_commutative_operands): New function.
(combine_simplify_rtx): Use it.
(change_zero_ext): Likewise.
(make_compound_operation_int): New function, split out of...
(make_compound_operation): ...here. Use
maybe_swap_commutative_operands for both.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242492
* arm/arm-fpus.def (vfpv2): New FPU, currently an alias for 'vfp'.
(neon-vfpv3): New FPU, currently an alias for 'neon'.
* arm/arm-tables.opt: Regenerated.
* arm/t-aprofile (MULTILIB_REUSE): Add reuse rules for vfpv2 and
neon-vfpv3.
* doc/invoke.texi (ARM: -mfpu): Document new options. Note that 'vfp'
and 'neon' are aliases for specific implementations.
From-SVN: r242491
gcc/fortran/ChangeLog:
2016-11-16 Andre Vehreschild <vehre@gcc.gnu.org>
PR fortran/78356
* class.c (gfc_is_class_scalar_expr): Prevent taking an array ref for
a component ref.
* trans-expr.c (gfc_trans_assignment_1): Ensure a reference to the
object to copy is generated, when assigning class objects.
gcc/testsuite/ChangeLog:
2016-11-16 Andre Vehreschild <vehre@gcc.gnu.org>
PR fortran/78356
* gfortran.dg/class_allocate_23.f08: New test.
From-SVN: r242490
vec_cmps assign the result of a vector comparison to a mask.
The optab was called with the destination having mode mask_mode
but with the source (the comparison) having mode VOIDmode,
which led to invalid rtl if the source operand was used directly.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* optabs.c (vector_compare_rtx): Add a cmp_mode parameter
and use it in the final call to gen_rtx_fmt_ee.
(expand_vec_cond_expr): Update accordingly.
(expand_vec_cmp_expr): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242489
local_cprop_find_used_regs punted on all multiword registers,
with the comment:
/* Setting a subreg of a register larger than word_mode leaves
the non-written words unchanged. */
But this only applies if the outer mode is smaller than the
inner mode. If they're the same size then writes to the subreg
are a normal full update.
This patch uses df_read_modify_subreg_p instead. A later patch
adds more uses of the same routine, but this part had a (positive)
effect on code generation for the testsuite whereas the others
seemed to be simple clean-ups.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* cprop.c (local_cprop_find_used_regs): Use df_read_modify_subreg_p.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242488
2016-11-16 Richard Biener <rguenther@suse.de>
PR middle-end/78333
* gimplify.c (gimplify_function_tree): Do not instrument
GNU extern inline functions.
* gcc.dg/pr78333.c: New testcase.
From-SVN: r242487
gcc/arc: New peephole2 and little endian arc test fixes
Resolve some test failures introduced for little endian arc as a result
of the recent arc/nps400 additions.
There's a new peephole2 optimisation to merge together two zero_extracts
in order that the movb instruction can be used.
One of the test cases is extended so that the test does something
meaningful in both big and little endian arc mode.
Other tests have their expected results updated to reflect improvements
in other areas of GCC.
gcc/ChangeLog:
Andrew Burgess <andrew.burgess@embecosm.com>
* config/arc/arc.md (movb peephole2): New peephole2 to merge two
zero_extract operations to allow a movb to occur.
* gcc.target/arc/movb-1.c: Update little endian arc results.
* gcc.target/arc/movb-2.c: Likewise.
* gcc.target/arc/movb-5.c: Likewise.
* gcc.target/arc/movh_cl-1.c: Extend test to cover little endian
arc.
From-SVN: r242484
When one uses ld.gold to build gcc, the thread sanitizer doesn't work,
because gold is more conservative when applying TLS relaxations than
ld.bfd. In this case a missing initial-exec attribute on a declaration
causes gcc to assume the general dynamic model. With ld.bfd this gets
relaxed to initial exec when linking the shared library, so the missing
attribute doesn't matter. But ld.gold doesn't perform this optimization
and this leads to crashes on tsan instrumented binaries.
See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78294
and: https://sourceware.org/bugzilla/show_bug.cgi?id=20805
The fix is easy, just add the missing attribute.
PR sanitizer/78294
* tsan/tsan_rtl.cc: Add missing attribute.
From-SVN: r242480
The CONCAT handling in emit_group_load chooses between doing
an extraction from a single component or forcing the whole
thing to memory and extracting from there. The condition for
the former (more efficient) option was:
if ((bytepos == 0 && bytelen == slen0)
|| (bytepos != 0 && bytepos + bytelen <= slen))
On the one hand this seems dangerous, since the second line
allows bit ranges that start in the first component and leak
into the second. On the other hand it seems strange to allow
references that start after the first byte of the second
component but not those that start after the first byte
of the first component. This led to a pessimisation of
things like gcc.dg/builtins-54.c for hppa64-hp-hpux11.23.
This patch simply checks whether the reference is contained
within a single component. It also makes sure that we do
an extraction on anything that doesn't span the whole
component (even if it's constant).
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* expr.c (emit_group_load_1): Tighten check for whether an
access involves only one operand of a CONCAT. Use extract_bit_field
for constants if the bit range does span the whole operand.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242477
If the size passed in to rtx_addr_can_trap_p was zero, the frame
handling would get the size from the mode instead. However, this
too can be zero if the mode is BLKmode, i.e. if we have a BLKmode
memory reference with no MEM_SIZE (which should be rare these days).
This meant that the conditions for a 4-byte access at offset X were
stricter than those for an access of unknown size at offset X.
This patch checks whether the size is still zero, as the
SYMBOL_REF handling does.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* rtlanal.c (rtx_addr_can_trap_p_1): Handle unknown sizes.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242476
vect_transform_loop has to reduce three iteration counts by
the vectorisation factor: nb_iterations_upper_bound,
nb_iterations_likely_upper_bound and nb_iterations_estimate.
All three are latch execution counts rather than loop body
execution counts. The calculations were taking that into
account for the first two, but not for nb_iterations_estimate.
This patch updates the way the calculations are done to fix
this and to add a bit more commentary about what is going on.
gcc/
2016-11-15 Richard Sandiford <richard.sandiford@arm.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
* tree-vect-loop.c (vect_transform_loop): Protect the updates of
all three iteration counts with an any_* test. Use a single update
for each count. Fix the calculation of nb_iterations_estimate.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r242475
The old code still built thanks to the brackets in the definition
of XVECEXP.
gcc/
* config/arc/arc.c (arc_loop_hazard): Add missing brackets.
From-SVN: r242473
The test assumes short is always smaller than int, and therefore does not
expect a warning when the logical operands are of type short and int.
This isn't true for the avr - shorts and ints are of the same size, and
therefore the warning triggers for the above case also.
Fix by explicitly typedef'ing __INT32_TYPE for int and __INT16_TYPE__ for short
if the target's int size is less than 4 bytes.
gcc/testsuite/
2016-11-16 Senthil Kumar Selvaraj <senthil_kumar.selvaraj@atmel.com>
* c-c++-common/Wlogical-op-1.c: Use __INT{16,32}_TYPE__ instead
of {short,int} if __SIZEOF_INT__ is less than 4 bytes.
From-SVN: r242472
PR target/78364
* config/arm/arm.md (*extv_reg): Restrict operands 2 and 3 to the
proper ranges for an SBFX instruction.
(extzv_t2): Likewise for UBFX.
From-SVN: r242471
2016-11-16 Richard Biener <rguenther@suse.de>
PR tree-optimization/78348
* tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMMOVE.
(generate_memcpy_builtin): Honor PKIND_MEMCPY on the partition.
(classify_partition): Set PKIND_MEMCPY if dependence analysis
revealed no dependency, PKIND_MEMMOVE otherwise.
* gcc.dg/tree-ssa/ldist-24.c: New testcase.
From-SVN: r242470
PR sanitizer/77823
* ubsan.c (ubsan_build_overflow_builtin): Add DATAP argument, if
it points to non-NULL tree, use it instead of ubsan_create_data.
(instrument_si_overflow): Handle vector signed integer overflow
checking.
* ubsan.h (ubsan_build_overflow_builtin): Add DATAP argument.
* tree-vrp.c (simplify_internal_call_using_ranges): Punt for
vector IFN_UBSAN_CHECK_*.
* internal-fn.c (expand_addsub_overflow): Add DATAP argument,
pass it through to ubsan_build_overflow_builtin.
(expand_neg_overflow, expand_mul_overflow): Likewise.
(expand_vector_ubsan_overflow): New function.
(expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB,
expand_UBSAN_CHECK_MUL): Use tit for vector arithmetics.
(expand_arith_overflow): Adjust expand_*_overflow callers.
* c-c++-common/ubsan/overflow-vec-1.c: New test.
* c-c++-common/ubsan/overflow-vec-2.c: New test.
From-SVN: r242469