gcc/
PR target/96551
* config/i386/sse.md (vec_unpacku_float_hi_v16si): For vector
compare to integer mask, don't use gen_rtx_LT, use
ix86_expand_mask_vec_cmp instead.
(vec_unpacku_float_lo_v16si): Ditto.
gcc/testsuite
* gcc.target/i386/avx512f-pr96551-1.c: New test.
* gcc.target/i386/avx512f-pr96551-2.c: New test.
When looking at the verification, I have noticed a bug in it.
The verification that CASE_HIGH (if present) has the same type as CASE_LOW
is only performed for the case label 2 and higher, case label 1 (the first
one after the default label) isn't checked.
The following patch fixes that, it will uselessly also compare
TREE_TYPE (CASE_LOW (elt)) != elt_type for the case label 1, but I think
that isn't that expensive and helps readability of the code.
2020-08-31 Jakub Jelinek <jakub@redhat.com>
* tree-cfg.c (verify_gimple_switch): If the first non-default case
label has CASE_HIGH, verify it has the same type as CASE_LOW.
I meant something like the following, which on e.g. a dumb:
typedef float V __attribute__((vector_size (4 * sizeof (float))));
void
foo (V *p, float *q)
{
p[0] += (V) { 1.0f, 2.0f, 3.0f, 4.0f };
q[0] += 4.0f;
q[1] -= 3.0f;
q[17] -= 2.0f;
q[31] += 1.0f;
}
testcase merges all the 4 scalar constant pool entries into the CONST_VECTOR
one.
I'm punting for section anchors and not doing it in the per-function (i.e.
non-shared) constant pools simply because I don't know them well enough,
don't know whether backends use the offsets for something etc.
For section anchors, I guess it would need to be done before (re)computing the
offsets and arrange for the desc->mark < 0 entries not to be considered as
objects in the object block, for non-shared pools, perhaps it would be
enough to call the new function from output_constant_pool before calling
recompute_pool_offsets and adjust recompute_pool_offsets to ignore
desc->mark < 0.
Here is an adjusted patch that ought to merge even the same sized different
mode vectors with the same byte representation, etc.
It won't really help with avoiding the multiple reads of the constant in the
same function, but as you found, your patch doesn't help with that either.
Your patch isn't really incompatible with what the patch below does, though
I wonder whether a) it wouldn't be better to always canonicalize to an
integral mode with as few elts as possible even e.g. for floats b) whether
asserting that simplify_rtx succeeds is safe, whether it shouldn't just
canonicalize if the canonicalization works and just do what it previously
did otherwise.
The following patch puts all pool entries which can be natively encoded
into a vector, sorts it by decreasing size, determines minimum size
of a pool entry and adds hash elts for each (aligned) min_size or wider
power of two-ish portion of the pool constant in addition to the whole pool
constant byte representation.
This is the version that passed bootstrap/regtest on both x86_64-linux and
i686-linux. In both bootstraps/regtests together, it saved (from the
statistics I've gathered) 63104 .rodata bytes (before constant merging),
in 6814 hits of the data->desc->mark = ~(*slot)->desc->labelno;.
2020-08-31 Jakub Jelinek <jakub@redhat.com>
PR middle-end/54201
* varasm.c: Include alloc-pool.h.
(output_constant_pool_contents): Emit desc->mark < 0 entries as
aliases.
(struct constant_descriptor_rtx_data): New type.
(constant_descriptor_rtx_data_cmp): New function.
(struct const_rtx_data_hasher): New type.
(const_rtx_data_hasher::hash, const_rtx_data_hasher::equal): New
methods.
(optimize_constant_pool): New function.
(output_shared_constant_pool): Call it if TARGET_SUPPORTS_ALIASES.
gcc/fortran/ChangeLog:
PR fortran/95352
* simplify.c (simplify_bound_dim): Add check for NULL pointer
before trying to access structure member.
José Rui Faustino de Sousa <jrfsousa@gmail.com>
gcc/testsuite/ChangeLog:
* gfortran.dg/PR95352.f90: New test.
gcc/fortran/ChangeLog:
PR fortran/94110
* interface.c (gfc_compare_actual_formal): Add code to also raise
the actual argument cannot be an assumed-size array error when the
dummy arguments are deferred-shape or assumed-rank pointer.
gcc/testsuite/ChangeLog:
PR fortran/94110
* gfortran.dg/PR94110.f90: New test.
The constant pool size optimization I was testing resulted in various ICEs
in the gcc.target/i386/ testsuite. The problem is that the ssse3_pshufbv8qi
splitter emits invalid RTL: in V4SImode, 0xf7f7f7f7 CONST_INTs shouldn't
appear; instead they should have been -0x8080809 (0xf7f7f7f7 sign extended
into 64 bits).
2020-08-30 Jakub Jelinek <jakub@redhat.com>
* config/i386/sse.md (ssse3_pshufbv8qi): Use gen_int_mode instead of
GEN_INT, and ix86_build_const_vector instead of gen_rtvec and
gen_rtx_CONST_VECTOR.
libstdc++-v3/ChangeLog:
* include/std/numeric (__detail::__absu(bool)): Make deleted
function a function template, so it will be chosen for calls
with an explicit template argument list.
* testsuite/26_numerics/gcd/gcd_neg.cc: Add dg-prune-output.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
It turns out that the target hook that this is supposed to satisfy
disappeared in 2004. Probably time to retire it.
2020-08-28 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-builtin.def (MASK_FOR_STORE): Remove.
* config/rs6000/rs6000-call.c (rs6000_expand_builtin): Remove
all logic for ALTIVEC_BUILTIN_MASK_FOR_STORE.
My recent change to implement P0548 ("common_type and duration") was not
correct. The result of common_type_t<duration<R,P>, duration<R,P>>
should be duration<common_type_t<R>, P::type>, not duration<R, P::type>.
The common_type specialization for two different duration types was
correct, but the specializations for a single duration type (which only
exist to optimize compilation time) were wrong.
This fixes the partial specializations of common_type for a single
duration type, and also the return types of duration::operator+ and
duration::operator- which are supposed to use common_type_t<duration>.
libstdc++-v3/ChangeLog:
* include/std/chrono (common_type): Fix partial specializations
for a single duration type to use the common_type of the rep.
(duration::operator+, duration::operator-): Fix return types
to also use the common_type of the rep.
* testsuite/20_util/duration/requirements/reduced_period.cc:
Check duration using a rep that has common_type specialized.
This fixes a bug with mixed signed and unsigned types, where converting
a negative value to the unsigned result type alters the value. The
solution is to obtain the absolute values of the arguments immediately
and to perform the actual GCD or LCM algorithm on two arguments of the
same type.
In order to operate on the most negative number without overflow when
taking its absolute value, use an unsigned type for the result of the abs
operation. For example, -INT_MIN will overflow, but -(unsigned)INT_MIN
is (unsigned)INT_MAX+1U which is the correct value.
libstdc++-v3/ChangeLog:
PR libstdc++/92978
* include/std/numeric (__abs_integral): Replace with ...
(__detail::__absu): New function template that returns an
unsigned type, guaranteeing it can represent the most
negative signed value.
(__detail::__gcd, __detail::__lcm): Require arguments to
be unsigned and therefore already non-negative.
(gcd, lcm): Convert arguments to absolute value as unsigned
type before calling __detail::__gcd or __detail::__lcm.
* include/experimental/numeric (gcd, lcm): Likewise.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust expected
errors.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
* testsuite/26_numerics/gcd/92978.cc: New test.
* testsuite/26_numerics/lcm/92978.cc: New test.
* testsuite/experimental/numeric/92978.cc: New test.
Remove unnecessary tests before copying function address to r12.
2020-08-28 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000.c (rs6000_call_aix): Remove test for r12.
(rs6000_sibcall_aix): Likewise.
An API change broke the amdgcn build.
gcc/ChangeLog:
* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Add "true"
parameter to vec_safe_grow_cleared.
gcc/ChangeLog:
* ggc-common.c (gt_pch_save): Add argument to a call.
gcc/jit/ChangeLog:
* jit-recording.c (recording::switch_::make_debug_string): Add argument
to a call.
gcc/fortran/ChangeLog:
PR fortran/94672
* trans-array.c (gfc_trans_g77_array): Check against the parm decl and
set the nonparm decl used for the is-present check to NULL if absent.
gcc/testsuite/ChangeLog:
PR fortran/94672
* gfortran.dg/optional_assumed_charlen_2.f90: New test.
The problem is that operand 4 (in the original pattern
cond_sub<mode>_any_const) is no longer the same as operand 1, and so
the pattern doesn't match the split condition.
Pattern cond_sub<mode>_any_const is being split by this patch into two
separate patterns:
* Pattern cond_sub<mode>_relaxed_const now matches const_int
SVE_RELAXED_GP operand.
* Pattern cond_sub<mode>_strict_const now matches const_int
SVE_STRICT_GP operand.
* Remove aarch64_sve_pred_dominates_p condition from both patterns.
gcc/ChangeLog:
PR target/96357
* config/aarch64/aarch64-sve.md
(cond_sub<mode>_relaxed_const): Updated and renamed from
cond_sub<mode>_any_const pattern.
(cond_sub<mode>_strict_const): New pattern.
gcc/testsuite/ChangeLog:
PR target/96357
* gcc.target/aarch64/sve/pr96357.c: New test.
This test fails on ILP32 since we're looking for a pattern that could
only be hit on LP64. Disabling the test on ILP32 since the problematic
mult pattern was never hit there, so there's nothing to test.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/mem-shift-canonical.c: Skip on ILP32.
2020-08-28 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/96624
* simplify.c (gfc_simplify_reshape): Detect zero shape and
clear index if found.
gcc/testsuite/
PR fortran/96624
* gfortran.dg/reshape_8.f90: New test.
gcc.dg/pr96579.c includes gcc.dg/pr96370.c which needs target dfp.
2020-08-28 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
* gcc.dg/pr96579.c: Compile only with target dfp.
2020-08-30 Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/96744
* config/i386/i386-expand.c (split_double_mode): Also handle
E_P2HImode and E_P2QImode.
* config/i386/sse.md (MASK_DWI): New define_mode_iterator.
(mov<mode>): New expander for P2HI,P2QI.
(*mov<mode>_internal): New define_insn_and_split to split
movement of P2QI/P2HI to 2 movqi/movhi patterns after reload.
gcc/testsuite/ChangeLog:
* gcc.target/i386/double_mask_reg-1.c: New test.
Replace the U+00B7 middle dot character, placed after "mips64p32le"
in the target lists, with a space. The U+00B7 character may not be
considered whitespace by Bourne shell and any non-ASCII character
may render incorrectly in some terminal devices.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/251177
This implements the changes from P0548 "common_type and duration". That
was a change for C++17, but as it corrects some issues introduced by DRs
I'm also treating it as a DR and changing it for all modes from C++11
up.
The main change is that duration<R,P>::period no longer denotes P, but
rather P::type, the reduced ratio. The unary operator+ and operator-
members of duration should now return a duration using that reduced
ratio.
The requirement that common_type<T>::type is the same type as
common_type<T, T>::type (rather than simply T) was already implemented
for PR 89102.
The standard says that duration::operator+() and duration::operator-()
should return common_type_t<duration>, but that seems unnecessarily
expensive to compute. This change just uses duration<rep, period> which
is the same type, so we don't need to instantiate common_type.
As an optimization, this also adds partial specializations of
common_type for two durations of the same type, a single duration, two
time_points of the same type, and a single time_point. These
specializations avoid instantiating other specializations of common_type
and one or both of __duration_common_type or __timepoint_common_type for
the cases where the answer is trivial to obtain.
libstdc++-v3/ChangeLog:
* include/std/chrono (__duration_common_type): Ensure the
reduced ratio is used. Remove unused partial specialization
using __failure_type.
(common_type): Pass reduced ratios to __duration_common_type.
Add partial specializations for simple cases involving a single
duration or time_point type.
(duration::period): Use reduced ratio.
(duration::operator+(), duration::operator-()): Return duration
type using the reduced ratio.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc:
Adjust expected errors.
* testsuite/20_util/duration/requirements/reduced_period.cc: New test.
This fixes the months-based addition for year_month when the
year_month's month component is 0.
libstdc++-v3/ChangeLog:
* include/std/chrono (year_month::operator+): Properly handle a
month value of 0 by casting the month value to int before
subtracting 1 from it so that the difference is sign-extended in
the subsequent addition.
* testsuite/std/time/year_month/1.cc: Test adding months to a
year_month whose month component is below or above the
normalized range of [1,12].
We currently don't enforce a constraint on some of the calendar types'
addition/subtraction operator overloads that take a 'months' argument:
Constraints: If the argument supplied by the caller for the months
parameter is convertible to years, its implicit conversion sequence to
years is worse than its implicit conversion sequence to months.
This constraint is relevant when adding/subtracting a duration to/from,
say, a year_month where the given duration is convertible to both
'months' and to 'years' (as in the new testcases below). The correct
behavior here in light of this constraint is to perform the operation
through the (more efficient) 'years'-based overload, but we currently
emit an ambiguous overload error.
This patch templatizes the 'months'-based addition/subtraction operator
overloads so that in the event of an implicit-conversion tie, we select
the non-template 'years'-based overload. This is the same approach
that the date library takes for enforcing this constraint.
libstdc++-v3/ChangeLog:
* include/std/chrono
(__detail::__months_years_conversion_disambiguator): Define.
(year_month::operator+=): Templatize the 'months'-based overload
so that the 'years'-based overload is selected in case of
equally-ranked implicit conversion sequences to both 'months'
and 'years' from the supplied argument.
(year_month::operator-=): Likewise.
(year_month::operator+): Likewise.
(year_month::operator-): Likewise.
(year_month_day::operator+=): Likewise.
(year_month_day::operator-=): Likewise.
(year_month_day::operator+): Likewise.
(year_month_day::operator-): Likewise.
(year_month_day_last::operator+=): Likewise.
(year_month_day_last::operator-=): Likewise.
(year_month_day_last::operator+): Likewise.
(year_month_day_last::operator-): Likewise.
(year_month_weekday::operator+=): Likewise.
(year_month_weekday::operator-=): Likewise.
(year_month_weekday::operator+): Likewise.
(year_month_weekday::operator-): Likewise.
(year_month_weekday_last::operator+=): Likewise.
(year_month_weekday_last::operator-=): Likewise.
(year_month_weekday_last::operator+): Likewise.
(year_month_weekday_last::operator-): Likewise.
* testsuite/std/time/year_month/2.cc: New test.
* testsuite/std/time/year_month_day/2.cc: New test.
* testsuite/std/time/year_month_day_last/2.cc: New test.
* testsuite/std/time/year_month_weekday/2.cc: New test.
* testsuite/std/time/year_month_weekday_last/2.cc: New test.
For _Atomic fields, lowering the alignment of long long or double etc.
fields on ia32 is undesirable, because then one really can't perform atomic
operations on those using cmpxchg8b.
The following patch stops lowering the alignment in fields for _Atomic
types (the x86_field_alignment change) and for -mpreferred-stack-boundary=2
also ensures we don't misalign _Atomic long long etc. automatic variables
(the ix86_{local,minimum}_alignment changes).
Not sure about iamcu_alignment change, I know next to nothing about IA MCU,
but unless it doesn't have cmpxchg8b instruction, it would surprise me if we
don't want to do it as well.
clang apparently doesn't lower the field alignment for _Atomic.
2020-08-27 Jakub Jelinek <jakub@redhat.com>
PR target/65146
* config/i386/i386.c (iamcu_alignment): Don't decrease alignment
for TYPE_ATOMIC types.
(ix86_local_alignment): Likewise.
(ix86_minimum_alignment): Likewise.
(x86_field_alignment): Likewise, and emit a -Wpsabi diagnostic
for it.
* gcc.target/i386/pr65146.c: New test.
Prior to P10, ELFv2 hadn't implemented nonlocal sibcalls. Now that we do,
we need to be sure that r12 is set up prior to such a call.
2020-08-27 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
PR target/96787
* config/rs6000/rs6000.c (rs6000_sibcall_aix): Support
indirect call for ELFv2.
gcc/testsuite/
PR target/96787
* gcc.target/powerpc/pr96787-1.c: New.
* gcc.target/powerpc/pr96787-2.c: New.
A length expression containing a divide by zero in a character
declaration will result in an ICE if the constant is any more
complicated than a constant divided by a constant.
The cause was that char_len_param_value can return MATCH_YES
even if a divide by zero was seen. Prior to returning check
whether a divide by zero was seen and if so set it to MATCH_ERROR.
2020-08-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/fortran
PR fortran/95882
* decl.c (char_len_param_value): Check gfc_seen_div0 and
if it is set return MATCH_ERROR.
2020-08-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95882
* gfortran.dg/pr95882_1.f90: New test.
* gfortran.dg/pr95882_2.f90: New test.
* gfortran.dg/pr95882_3.f90: New test.
* gfortran.dg/pr95882_4.f90: New test.
* gfortran.dg/pr95882_5.f90: New test.
This removes the bogus transfer of flow-sensitive info in copy_ref_info
plus fixes one oversight in FRE when flow-sensitive non-NULLness was added to
points-to info.
2020-08-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/96522
* tree-ssa-address.c (copy_ref_info): Reset flow-sensitive
info of the copied points-to. Transfer bigger alignment
via the access type.
* tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt):
Reset all flow-sensitive info.
* gcc.dg/torture/pr96522.c: New testcase.
The following streamlines TARGET_MEM_REF dumping building
on what we do for MEM_REF and thus dumping things like
access type, TBAA type and base/clique. I've changed it
to do semantic dumping aka base + offset + step * index
rather than the odd base: A, step: way.
2020-08-27 Richard Biener <rguenther@suse.de>
* tree-pretty-print.c (dump_mem_ref): Handle TARGET_MEM_REFs.
(dump_generic_node): Use dump_mem_ref also for TARGET_MEM_REF.
* gcc.dg/tree-ssa/loop-19.c: Adjust.
* gcc.dg/tree-ssa/loop-2.c: Likewise.
* gcc.dg/tree-ssa/loop-3.c: Likewise.
Inside a (mem) RTX, it is canonical to write multiplications by powers
of two using a (mult) [0]. Outside of a (mem), the canonical way to
write multiplications by powers of two is using (ashift).
Now I observed that LRA does not quite respect this RTL canonicalization
rule. When compiling gcc/testsuite/gcc.dg/torture/pr34330.c with -Os
-ftree-vectorize, the RTL in the dump "281r.ira" has the insn:
(set (reg:SI 111)
(mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
(const_int 4 [0x4]))
(reg/v/f:DI 105 [ b ]))))
but LRA then proceeds to generate a reload, and we get the following
non-canonical insn in "282r.reload":
(set (reg:DI 7 x7 [121])
(plus:DI (mult:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
(const_int 4 [0x4]))
(reg/v/f:DI 1 x1 [orig:105 b ] [105])))
This patch fixes LRA to ensure that we generate canonical RTL in this
case. After the patch, we get the following insn in "282r.reload":
(set (reg:DI 7 x7 [121])
(plus:DI (ashift:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
(const_int 2 [0x2]))
(reg/v/f:DI 1 x1 [orig:105 b ] [105])))
[0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
gcc/ChangeLog:
* lra-constraints.c (canonicalize_reload_addr): New.
(curr_insn_transform): Use canonicalize_reload_addr to ensure we
generate canonical RTL for an address reload.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/mem-shift-canonical.c: New test.
This makes sure to put special-ops expanded rhs left where
expression rewrite expects it.
2020-08-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/96579
* tree-ssa-reassoc.c (linearize_expr_tree): If we expand
rhs via special ops make sure to swap operands.
* gcc.dg/pr96579.c: New testcase.
This improves DSE's stmt walking by not considering a DEF without
uses for further processing (and thus giving up when there's two
paths to follow).
2020-08-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/96565
* tree-ssa-dse.c (dse_classify_store): Remove defs with
no uses from further processing.
* gcc.dg/tree-ssa/ssa-dse-40.c: New testcase.
* gcc.dg/builtin-object-size-4.c: Adjust.
Almost all of the proposed resolution for LWG 3448 is already
implemented; the only part left is to adjust the return type of
transform_view::sentinel::operator-.
libstdc++-v3/ChangeLog:
PR libstdc++/95322
* include/std/ranges (transform_view::sentinel::__distance_from):
Give this a deduced return type.
(transform_view::sentinel::operator-): Adjust the return type so
that it's based on the constness of the iterator rather than
that of the sentinel.
* testsuite/std/ranges/adaptors/95322.cc: Refer to LWG 3448.