We can use the mode iterators directly with an @pattern to avoid the
need for an expander that was only there to pass the mode through.
gcc/ChangeLog:
2019-09-26 Iain Sandoe <iain@sandoe.co.uk>
* config/rs6000/darwin.md: Replace the expanders for
load_macho_picbase and reload_macho_picbase with use of '@'
in their respective define_insns.
(nonlocal_goto_receiver): Pass Pmode to gen_reload_macho_picbase.
* config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Pass
Pmode to gen_load_macho_picbase.
* config/rs6000/rs6000.md: Likewise.
From-SVN: r276159
2019-09-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/91896
* tree-vect-loop.c (vectorizable_reduction): The single
def-use cycle optimization cannot apply when there's more
than one pattern stmt involved.
* gcc.dg/torture/pr91896.c: New testcase.
From-SVN: r276158
2019-09-26 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vect_analyze_loop_operations): Also call
vectorizable_reduction for vect_double_reduction_def.
(vect_transform_loop): Likewise.
(vect_create_epilog_for_reduction): Move double-reduction
PHI creation and preheader argument setting of PHIs ...
(vectorizable_reduction): ... here. Also process
vect_double_reduction_def PHIs, creating the vectorized
PHI nodes, remembering the scalar adjustment computed for
the epilogue in STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT.
Remember the original reduction code in STMT_VINFO_REDUC_CODE.
* tree-vectorizer.c (vec_info::new_stmt_vec_info):
Initialize STMT_VINFO_REDUC_CODE.
* tree-vectorizer.h (_stmt_vec_info::reduc_epilogue_adjustment): New.
(_stmt_vec_info::reduc_code): Likewise.
(STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT): Likewise.
(STMT_VINFO_REDUC_CODE): Likewise.
From-SVN: r276150
When -march=native is passed to host_detect_local_cpu to the backend,
it overrides all command lines after it. That means
$ gcc -march=native -march=armv8-a
is treated as
$ gcc -march=armv8-a -march=native
Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override previous -march=native on command-line.
This is the same fix as was applied for i386 in SVN revision 269164 but for
aarch64 and arm.
2019-09-26 Matt Turner <mattst88@gmail.com>
PR driver/69471
* config/aarch64/aarch64.opt (march=): Add Negative(march=).
(mtune=): Add Negative(mtune=).
(mcpu=): Add Negative(mcpu=).
* config/arm/arm.opt: Likewise.
From-SVN: r276148
This patch implements some more SIMD32, but these ones have a DImode result+addend.
Apart from that there's nothing too exciting about them.
Bootstrapped and tested on arm-none-linux-gnueabihf.
* config/arm/arm.md (arm_<simd32_op>): New define_insn.
* config/arm/arm_acle.h (__smlald, __smlaldx, __smlsld, __smlsldx):
Define.
* config/arm/arm_acle.h: Define builtins for the above.
* config/arm/iterators.md (SIMD32_DIMODE): New int_iterator.
(simd32_op): Handle the above.
* config/arm/unspecs.md: Define unspecs for the above.
* gcc.target/arm/acle/simd32.c: Update test.
From-SVN: r276147
This patch is part of a series to implement the SIMD32 ACLE intrinsics [1].
The interesting parts implementation-wise involve adding support for setting and reading
the Q bit for saturation and the GE-bits for the packed SIMD instructions.
That will come in a later patch.
For now, this patch implements the other intrinsics that don't need anything special ;
just a mapping from arm_acle.h function to builtin to RTL expander+unspec.
I've compressed as many as I could with iterators so that we end up needing only 3
new define_insns.
Bootstrapped and tested on arm-none-linux-gnueabihf.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
* config/arm/arm.md (arm_<simd32_op>): New define_insn.
(arm_<sup>xtb16): Likewise.
(arm_usada8): Likewise.
* config/arm/arm_acle.h (__qadd8, __qsub8, __shadd8, __shsub8,
__uhadd8, __uhsub8, __uqadd8, __uqsub8, __qadd16, __qasx, __qsax,
__qsub16, __shadd16, __shasx, __shsax, __shsub16, __uhadd16, __uhasx,
__uhsax, __uhsub16, __uqadd16, __uqasx, __uqsax, __uqsub16, __sxtab16,
__sxtb16, __uxtab16, __uxtb16): Define.
* config/arm/arm_acle_builtins.def: Define builtins for the above.
* config/arm/unspecs.md: Define unspecs for the above.
* config/arm/iterators.md (SIMD32_NOGE_BINOP): New int_iterator.
(USXTB16): Likewise.
(simd32_op): New int_attribute.
(sup): Handle UNSPEC_SXTB16, UNSPEC_UXTB16.
* doc/sourcebuild.exp (arm_simd32_ok): Document.
* lib/target-supports.exp
(check_effective_target_arm_simd32_ok_nocache): New procedure.
(check_effective_target_arm_simd32_ok): Likewise.
(add_options_for_arm_simd32): Likewise.
* gcc.target/arm/acle/simd32.c: New test.
From-SVN: r276146
My recent assemble_real patch (r275873) meant that we now output
negative FP16 constants in the same way as we'd output an integer
subreg of them. This patch updates gcc.target/arm/fp16-* accordingly.
2019-09-26 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
* gcc.target/arm/fp16-compile-alt-3.c: Expect (__fp16) -2.0
to be written as a negative short rather than a positive one.
* gcc.target/arm/fp16-compile-ieee-3.c: Likewise.
From-SVN: r276145
The PLUS handling in aarch64_rtx_costs only checked for nonnegative
constants, meaning that simple immediate subtractions like:
(set (reg R1) (plus (reg R2) (const_int -8)))
had a cost of two instructions.
2019-09-26 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_rtx_costs): Use
aarch64_plus_immediate rather than aarch64_uimm12_shift
to test for valid PLUS immediates.
From-SVN: r276140
gcc/fortran/ChangeLog:
PR fortran/91426
* error.c (curr_diagnostic): New static variable.
(gfc_report_diagnostic): New static function.
(gfc_warning): Replace call to diagnostic_report_diagnostic with
call to gfc_report_diagnostic.
(gfc_format_decoder): Colorize the text of %L and %C to match the
colorization used by diagnostic_show_locus.
(gfc_warning_now_at): Replace call to diagnostic_report_diagnostic with
call to gfc_report_diagnostic.
(gfc_warning_now): Likewise.
(gfc_warning_internal): Likewise.
(gfc_error_now): Likewise.
(gfc_fatal_error): Likewise.
(gfc_error_opt): Likewise.
(gfc_internal_error): Likewise.
From-SVN: r276132
Hi,
Martin and his clang warnings discovered that I forgot to remove a
static inline function and a variable when ripping out the old IPA-SRA
from tree-sra.c and both are now unused. Thus I am doing that now
with the patch below which I will commit as obvious (after including
it in a round of a bootstrap and testing on an x86_64-linux).
Thanks,
Martin
2019-09-25 Martin Jambor <mjambor@suse.cz>
* tree-sra.c (no_accesses_p): Remove.
(no_accesses_representant): Likewise.
From-SVN: r276128
We're somewhat inconsistent in arm_neon.h when it comes to using the implementation namespace for local
identifiers. This means things like:
#define hash_abcd 0
#define hash_e 1
#define wk 2
#include "arm_neon.h"
uint32x4_t
foo (uint32x4_t a, uint32_t b, uint32x4_t c)
{
return vsha1cq_u32 (a, b, c);
}
don't compile.
This patch fixes these issues throughout the whole of arm_neon.h
Bootstrapped and tested on aarch64-none-linux-gnu.
The advsimd-intrinsics.exp tests pass just fine.
From-SVN: r276125
2019-09-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/91896
* tree-vect-loop.c (vectorizable_reduction): The single
def-use cycle optimization cannot apply when there's more
than one pattern stmt involved.
* gcc.dg/torture/pr91896.c: New testcase.
From-SVN: r276123
The DCache clean & ICache invalidation requirements for instructions
to be data coherence are discoverable through new fields in CTR_EL0.
Let's support the two bits if they are enabled, the CPU core will
not execute the unnecessary DCache clean or Icache Invalidation
instructions.
2019-09-25 Shaokun Zhang <zhangshaokun@hisilicon.com>
* config/aarch64/sync-cache.c (__aarch64_sync_cache_range): Add support for
CTR_EL0.IDC and CTR_EL0.DIC.
From-SVN: r276122
The break here was skipping over the code that sets EXPR_LOCATION on the
call expressions, for no good reason.
* parser.c (cp_parser_postfix_expression): Do set location of
dependent member call.
From-SVN: r276112
This switches the picbase load and reload patterns to use the 'P' mode
iterator instead of writing an SI and DI pattern for each.
gcc/ChangeLog:
2019-09-24 Iain Sandoe <iain@sandoe.co.uk>
* config/rs6000/rs6000.md (load_macho_picbase_<mode>): New, using
the 'P' mode iterator, replacing the (removed) SI and DI variants.
(reload_macho_picbase_<mode>): Likewise.
From-SVN: r276107
As a clean-up, we want to be able to use mode iterators in darwin.md.
This patch moves the include point for the Darwin include until after
the definition of the mode iterators and attrs. No functional change
intended.
gcc/ChangeLog:
2019-09-24 Iain Sandoe <iain@sandoe.co.uk>
* config/rs6000/rs6000.md: Move darwin.md include until
after the definition of the mode iterators.
From-SVN: r276106
PR tree-optimization/91570 - ICE in get_range_strlen_dynamic on a conditional
of two strings
gcc/Changelog:
* tree-ssa-strlen.c (get_range_strlen_dynamic): Handle null and
non-constant minlen, maxlen and maxbound.
gcc/testsuite/Changelog:
* gcc.dg/pr91570.c: New test.
From-SVN: r276105
The __index_type is only ever unsigned char or unsigned short, so not
the same type as size_t.
* include/std/variant (variant::index()): Remove impossible case.
From-SVN: r276100
The optimisation to optimise:
typedef unsigned long long u64;
void bar(u64 *x)
{
*x = 0xabcdef10abcdef10;
}
from:
mov x1, 61200
movk x1, 0xabcd, lsl 16
movk x1, 0xef10, lsl 32
movk x1, 0xabcd, lsl 48
str x1, [x0]
into:
mov w1, 61200
movk w1, 0xabcd, lsl 16
stp w1, w1, [x0]
ends up producing two distinct stores if the destination is volatile:
void bar(u64 *x)
{
*(volatile u64 *)x = 0xabcdef10abcdef10;
}
mov w1, 61200
movk w1, 0xabcd, lsl 16
str w1, [x0]
str w1, [x0, 4]
because we end up not merging the strs into an stp. It's questionable whether the use of STP is valid for volatile in the first place.
To avoid unnecessary pain in a context where it's unlikely to be performance critical [1] (use of volatile), this patch avoids this
transformation for volatile destinations, so we produce the original single STR-X.
Bootstrapped and tested on aarch64-none-linux-gnu.
[1] https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
* config/aarch64/aarch64.md (mov<mode>): Don't call
aarch64_split_dimode_const_store on volatile MEM.
* gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.
From-SVN: r276098
This is a minor patch that fixes the entry for the fp16fml feature in
GCC's aarch64-option-extensions.def.
As can be seen in the Linux sources here
https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinfo.c#L69
the correct string is "asimdfhm", not "asimdfml".
Cross-compiled and tested on aarch64-none-linux-gnu.
2019-09-24 Stamatis Markianos-Wright <stam.markianos-wright@arm.com>
* config/aarch64/aarch64-option-extensions.def (fp16fml):
Update hwcap string for fp16fml.
From-SVN: r276097
Hi,
I am quite surprised I did not catch this before but the new
ipa-param-manipulation does not copy PARM_DECLs when creating
artificial thinks (I think it originally did but then I somehow
removed during one cleanups). Fixed by adding the capability at the
natural place. It is triggered whenever context of the PARM_DECL that
is just taken from the original function does not match the target
fndecl rather than by some constructor parameter because in such
situation it is always the correct thing to do.
Bootstrapped and tested on x86_64-linux. OK for trunk?
Thanks,
Martin
2019-09-24 Martin Jambor <mjambor@suse.cz>
PR ipa/91831
* ipa-param-manipulation.c (carry_over_param): Make a method of
ipa_param_body_adjustments, remove now unnecessary argument. Also copy
in case of a context mismatch.
(ipa_param_body_adjustments::common_initialization): Adjust call to
carry_over_param.
* ipa-param-manipulation.h (class ipa_param_body_adjustments): Add
private method carry_over_param.
testsuite/
* g++.dg/ipa/pr91831.C: New test.
From-SVN: r276094
Hi,
IPA-SRA asserts that an offset obtained from get_ref_base_and_extent
is non-negative (after it verifies it is based on a parameter). That
assumption is invalid as the testcase shows. One could probably also write a
testcase with defined behavior, but unless I see a reasonable one
where the transformation is really desirable, I'd like to just punt on
those cases.
Bootstrapped and tested on x86_64-linux. OK for trunk?
Thanks,
Martin
2019-09-24 Martin Jambor <mjambor@suse.cz>
PR ipa/91832
* ipa-sra.c (scan_expr_access): Check that offset is non-negative.
testsuite/
* gcc.dg/ipa/pr91832.c: New test.
From-SVN: r276093