This patch fixes a large lmbench performance regression with
128-bit SVE, compiled in length-agnostic mode.
vect_better_loop_vinfo_p (new in GCC 10) tries to estimate whether
a new loop_vinfo is cheaper than a previous one, with an in-built
preference for the old one. For variable VF it prefers the old
loop_vinfo if it is cheaper for at least one VF. However, we have
no idea how likely that VF is in practice.
Another extreme would be to do what most of the rest of the
vectoriser does, and rely solely on the constant estimated VF.
But as noted in the comment, this means that a one-unit cost
difference would be enough to pick the new loop_vinfo,
despite the target generally preferring the old loop_vinfo
where possible. The cost model just isn't accurate enough
for that to produce good results as things stand: there might
not be any practical benefit to the new loop_vinfo at the
estimated VF, and it would be significantly worse for higher VFs.
The patch instead goes for a hacky compromise: make sure that the new
loop_vinfo is also no worse than the old loop_vinfo at double the
estimated VF. For all but trivial loops, this ensures that the
new loop_vinfo is only chosen if it is better than the old one
by a non-trivial amount at the estimated VF. It also avoids
putting too much faith in the VF estimate.
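A minimal sketch of the comparison described above, not the code in tree-vect-loop.c. The struct "candidate", its fields and the helper names are hypothetical, and the model assumes the cost per scalar iteration is simply the vector iteration cost divided by the VF:

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a loop_vinfo.
struct candidate {
  std::uint64_t inside_cost;  // cost of one vector loop iteration
  std::uint64_t base_vf;      // VF at the minimum vector length
  bool variable_vf;           // true if the VF scales with the vector length
};

// VF of C when the vectors are SCALE times the minimum length.
constexpr std::uint64_t vf_at(const candidate &c, std::uint64_t scale) {
  return c.variable_vf ? c.base_vf * scale : c.base_vf;
}

// True if NEW_C is strictly cheaper per scalar iteration than OLD_C.
// Cross-multiplying avoids a division.
constexpr bool cheaper_at(const candidate &new_c, const candidate &old_c,
                          std::uint64_t scale) {
  return new_c.inside_cost * vf_at(old_c, scale)
         < old_c.inside_cost * vf_at(new_c, scale);
}

// True if NEW_C is no more expensive per scalar iteration than OLD_C.
constexpr bool no_worse_at(const candidate &new_c, const candidate &old_c,
                           std::uint64_t scale) {
  return new_c.inside_cost * vf_at(old_c, scale)
         <= old_c.inside_cost * vf_at(new_c, scale);
}

// The compromise: cheaper at the estimated scale and no worse at double it.
constexpr bool prefer_new(const candidate &new_c, const candidate &old_c,
                          std::uint64_t estimated_scale) {
  return cheaper_at(new_c, old_c, estimated_scale)
         && no_worse_at(new_c, old_c, estimated_scale * 2);
}
```

With an old variable-VF candidate of cost 10 and a new fixed-VF candidate of cost 9 (both with a base VF of 4), the new one wins at the estimated scale but loses at double the scale, so the old one is kept. That is the "one-unit cost difference" case the message wants to avoid.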
I realise this isn't great, but it's supposed to be a conservative fix
suitable for stage 4. The only affected testcases are the ones for
pr89007-*.c, where Advanced SIMD is indeed preferred for 128-bit SVE
and is no worse for 256-bit SVE.
Part of the problem here is that if the new loop_vinfo is better,
we discard the old one and never consider using it even as an
epilogue loop. This means that if we choose Advanced SIMD over SVE,
we're much more likely to have left-over scalar elements.
Another is that the estimate provided by estimated_poly_value might have
different probabilities attached. E.g. when tuning for a particular core,
the estimate is probably accurate, but when tuning for generic code,
the estimate is more of a guess. Relying solely on the estimate is
probably correct for the former but not for the latter.
Hopefully those are things that we could tackle in GCC 11.
2020-04-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_better_loop_vinfo_p): If old_loop_vinfo
has a variable VF, prefer new_loop_vinfo if it is cheaper for the
estimated VF and is no worse at double the estimated VF.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_8.c: New test.
* gcc.target/aarch64/sve/cost_model_9.c: Likewise.
* gcc.target/aarch64/sve/pr89007-1.c: Add -msve-vector-bits=512.
* gcc.target/aarch64/sve/pr89007-2.c: Likewise.
This testcase triggered an ICE in rtx_vector_builder::step because
we were trying to use a stepped representation for floating-point
constants. The underlying problem was that the arguments to
rtx_vector_builder were the wrong way around, meaning that some
variations were likely to be incorrectly encoded for integers
(but probably as a silent failure).
Also, aarch64_sve_expand_vector_init_handle_trailing_constants
tries to extend the trailing constant elements to a full vector
by following the "natural" pattern of the original vector, which
should generally lead to nicer constants. However, for the testcase,
we'd then end up picking a variable for some elements. Fixed by
stubbing out all variable elements with zeros.
That fix involved testing valid_for_const_vector_p. For consistency,
the patch uses the same test when finding trailing constants, instead
of the previous aarch64_legitimate_constant_p.
2020-04-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR target/94668
* config/aarch64/aarch64.c (aarch64_sve_expand_vector_init): Fix
order of arguments to rtx_vector_builder.
(aarch64_sve_expand_vector_init_handle_trailing_constants): Likewise.
When extending the trailing constants to a full vector, replace any
variables with zeros.
gcc/testsuite/
PR target/94668
* gcc.target/aarch64/sve/pr94668.c: New test.
If extra_tool_flags starts with a dash, an error like 'ERROR: verbose:
illegal argument: -march=native -O2 -std=c++17' is printed. This is
easily fixed by inserting a double dash before the variable.
2020-04-20 Matthias Kretz <kretz@kde.org>
* testsuite/lib/libstdc++.exp: Avoid illegal argument to verbose.
We treat tpl-tpl-parms as types. They're not; bound-tpl-tpl-parms
are. We can get away with them being type-like. Unfortunately we
give the original level==orig_level case a canonical type, but the
reduced cases of level<orig_level get structural equality. This patch
gives them structural type always.
* pt.c (canonical_type_parameter): Assert not a tpl-tpl-parm.
(process_template_parm): tpl-tpl-parms are structural.
(rewrite_template_parm): Propagate structuralness.
We were not comparing expression pack expansions correctly. We could
consider distinct expansions equal and create two, apparently equal,
specializations that would sometimes collide. cp_tree_operand_length
says a pack has 1 operand (for mangling), whereas it actually has 3,
of which only two are significant for equality. We must special-case
that in cp_tree_equal. The new code matches the hasher and the
type_pack_expansion case in structural_comp_types.
* tree.c (cp_tree_equal): [TEMPLATE_ID_EXPR, default] Refactor.
[EXPR_PACK_EXPANSION]: Add.
One of the problems hit by pr94454 was that the argument hasher was
not skipping nodes that template_args_equal would. Fixed by replacing
the STRIP_NOPS invocation by a bespoke loop. We also confuse the
canonical type machinery by treating tpl-tpl-parms as types. They're
not; bound-tpl-tpl-parms are. We can get away with them being
type-like. Unfortunately we give the original level==orig_level case
a canonical type, but the reduced cases of level<orig_level get
structural equality. That breaks the hasher because we'll use
TYPE_HASH (CANONICAL_TYPE ()) when we can. There's a note in
tsubst[TEMPLATE_TEMPLATE_PARM] about why the reduced ones cannot have
a canonical type. (I didn't feel like questioning that assertion at
this point.)
* pt.c (iterative_hash_template_arg): Strip nodes as
template_args_equal does.
[ARGUMENT_PACK_SELECT, TREE_VEC, CONSTRUCTOR]: Refactor.
[node_class:TEMPLATE_TEMPLATE_PARM]: Hash by level & index.
[node_class:default]: Refactor.
Add missing check in gfc_set_array_spec for sum of rank and corank to not
exceed GFC_MAX_DIMENSIONS.
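As a rough illustration of the missing check (a sketch only, not the code in array.c): the constant below is a hypothetical stand-in for GFC_MAX_DIMENSIONS and its value of 15 is an assumption, as is the helper name.

```cpp
// Hypothetical stand-in for GFC_MAX_DIMENSIONS; the value is assumed.
constexpr int max_dimensions = 15;

// True if an array spec with RANK dimensions and CORANK codimensions
// stays within the combined limit.
constexpr bool dimensions_ok(int rank, int corank) {
  return rank >= 0 && corank >= 0 && rank + corank <= max_dimensions;
}
```

The point is that rank and corank can each be within the limit while their sum is not, so both must be checked together.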
2020-04-20 Harald Anlauf <anlauf@gmx.de>
PR fortran/93364
* array.c (gfc_set_array_spec): Check for sum of rank and corank
not exceeding GFC_MAX_DIMENSIONS.
2020-04-20 Harald Anlauf <anlauf@gmx.de>
PR fortran/93364
* gfortran.dg/pr93364.f90: New test.
2020-04-20 Steve Kargl <kargl@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/91800
* decl.c (variable_decl): Reject Hollerith constants as type
initializer.
2020-04-20 Steve Kargl <kargl@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/91800
* gfortran.dg/hollerith_9.f90: New test.
Testing on the host does not make sense for 'declare copyout' for
a same-scope stack-allocated variable. Once the copyout is done,
the variable is gone. Hence, test the variable on the device. This
can be revisited after the OpenACC semantics have been fixed; but with
that fix, the test PASSes again with devices.
PR middle-end/94120
* testsuite/libgomp.oacc-c++/declare-pr94120.C: Fix 'declare copy(out)'
test case.
Some more C++20 changes from P1614R2, "The Mothership has Landed".
* include/bits/stl_queue.h (queue): Define operator<=> for C++20.
* include/bits/stl_stack.h (stack): Likewise.
* testsuite/23_containers/queue/cmp_c++20.cc: New test.
* testsuite/23_containers/stack/cmp_c++20.cc: New test.
This appears to be a copy&paste error, which cppcheck diagnoses.
PR other/94629
* include/debug/formatter.h (_Error_formatter::_Parameter): Fix
redundant assignment in constructor.
std.array.Appender and RefAppender: use .opSlice() instead of data()
Previously, Appender.data() was used to extract a slice of the Appender's array.
Now use the [] slice operator instead. The same goes for RefAppender.
Fixes: PR d/94455
Reviewed-on: https://github.com/dlang/phobos/pull/7450
According to "Intel 64 and IA32 Arch SDM, Vol. 3":
"Because SIMD floating-point exceptions are precise and occur immediately,
the situation does not arise where an x87 FPU instruction, a WAIT/FWAIT
instruction, or another SSE/SSE2/SSE3 instruction will catch a pending
unmasked SIMD floating-point exception."
Remove unneeded assignments to volatile memory.
libgcc/ChangeLog:
* config/i386/sfp-exceptions.c (__sfp_handle_exceptions) [__SSE_MATH__]:
Remove unneeded assignments to volatile memory.
libatomic/ChangeLog:
* config/x86/fenv.c (__atomic_feraiseexcept) [__SSE_MATH__]:
Remove unneeded assignments to volatile memory.
libgfortran/ChangeLog:
* config/fpu-387.h (local_feraiseexcept) [__SSE_MATH__]:
Remove unneeded assignments to volatile memory.
While the coroutines implementation, and most of the coroutines
tests, will operate with C++14 or newer, these tests require
facilities introduced in C++17. Add the target requirement.
gcc/testsuite/
2020-04-19 Iain Sandoe <iain@sandoe.co.uk>
* g++.dg/coroutines/torture/co-await-17-capture-comp-ref.C: Require
C++17.
* g++.dg/coroutines/torture/co-ret-15-default-return_void.C: Likewise.
Returning &gfc_bad_expr when simplifying bounds after a division by zero
has happened results in the division by zero error actually reaching the user.
2020-04-19 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/93500
* resolve.c (resolve_operator): If both operands are
NULL, return false.
* simplify.c (simplify_bound): If a division by zero
was seen during bound simplification, free the
corresponding expression and return &gfc_bad_expr.
2020-04-19 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/93500
* arith_divide_3.f90: New test.
Similarly to inline asm, :: (or any other number of consecutive colons) can
appear in an ObjC @selector argument, and with the introduction of CPP_SCOPE
into the C FE, we need to treat CPP_SCOPE as two CPP_COLON tokens.
The C++ FE already does it that way.
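To illustrate the idea (a sketch with a hypothetical token enum, not the c-parser.c code): when counting the colons of a selector, a scope token contributes two colons, so a selector lexed with a single "::" token is equivalent to one lexed with two ":" tokens.

```cpp
// Hypothetical token kinds standing in for CPP_NAME, CPP_COLON and CPP_SCOPE.
enum class tok { name, colon, scope };

// Number of ':' characters a single token contributes to a selector.
constexpr int colons_in(tok t) {
  return t == tok::scope ? 2 : t == tok::colon ? 1 : 0;
}

// Total number of colons in a sequence of N tokens.
constexpr int count_colons(const tok *toks, int n) {
  return n == 0 ? 0 : colons_in(toks[0]) + count_colons(toks + 1, n - 1);
}

// "foo::" lexed with a scope token, and the same text lexed as two colons.
constexpr tok selector_with_scope[] = {tok::name, tok::scope};
constexpr tok selector_with_colons[] = {tok::name, tok::colon, tok::colon};
```

Both token sequences describe a selector with two colons, which is what the parser has to recognise.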
2020-04-19 Jakub Jelinek <jakub@redhat.com>
PR objc/94637
* c-parser.c (c_parser_objc_selector_arg): Handle CPP_SCOPE like
two CPP_COLON tokens.
* objc.dg/pr94637.m: New test.
Patch fixes test failure seen on X32 where a nested struct was passed in
registers, rather than via invisible reference. Now, all non-POD
structs are passed by invisible reference, not just those with a
user-defined copy constructor/destructor.
gcc/d/ChangeLog:
PR d/94609
* d-codegen.cc (argument_reference_p): Don't check TREE_ADDRESSABLE.
(type_passed_as): Build reference type if TREE_ADDRESSABLE.
* d-convert.cc (convert_for_argument): Build explicit TARGET_EXPR if
needed for arguments passed by invisible reference.
* types.cc (TypeVisitor::visit (TypeStruct *)): Mark all structs that
are not POD as TREE_ADDRESSABLE.
The intended purpose of the option is both for targets that don't
support phobos yet, and for gdc itself to support bootstrapping itself
as a self-hosted D compiler.
The libphobos testsuite has been updated to only add libphobos to the
search paths if it's being built. A new D2 testsuite directive
RUNNABLE_PHOBOS_TEST has also been patched in to disable some runnable
tests that have phobos dependencies, which is a temporary measure
until upstream DMD fixes or removes these tests entirely.
gcc/testsuite/ChangeLog:
* lib/gdc-utils.exp (gdc-convert-test): Add dg-skip-if for tests that
depend on the phobos standard library.
libphobos/ChangeLog:
* configure: Regenerate.
* configure.ac: Add --with-libphobos-druntime-only option and the
conditional ENABLE_LIBDRUNTIME_ONLY.
* configure.tgt: Define LIBDRUNTIME_ONLY.
* src/Makefile.am: Add phobos sources if not ENABLE_LIBDRUNTIME_ONLY.
* src/Makefile.in: Regenerate.
* testsuite/testsuite_flags.in: Add phobos path if compiling phobos.
The current check_effective_target_d_runtime procedure returns false if
the target is built without any core runtime library for D being
available (--disable-libphobos). This additional procedure is for
targets where the core runtime library exists, but without the higher
level standard library.
gcc/ChangeLog:
* doc/sourcebuild.texi (Effective-Target Keywords, Environment
attributes): Document d_runtime_has_std_library.
gcc/testsuite/ChangeLog:
* gdc.dg/link.d: Use d_runtime_has_std_library effective target.
* gdc.dg/runnable.d: Move phobos tests to...
* gdc.dg/runnable2.d: ...here. New test.
* lib/target-supports.exp
(check_effective_target_d_runtime_has_std_library): New.
libphobos/ChangeLog:
* testsuite/libphobos.phobos/phobos.exp: Skip if effective target is
not d_runtime_has_std_library.
* testsuite/libphobos.phobos_shared/phobos_shared.exp: Likewise.
In the testcase below, during specialization of c<int>::d, we build two
identical specializations of the parameter type b<decltype(e)::k> -- one when
substituting into c<int>::d's TYPE_ARG_TYPES and another when substituting into
c<int>::d's DECL_ARGUMENTS.
We don't reuse the first specialization the second time around as a consequence
of the fix for PR c++/56247 which made PARM_DECLs always compare different from
one another during spec_hasher::equal. As a result, when looking up existing
specializations of 'b', spec_hasher::equal considers the template argument
decltype(e')::k to be different from decltype(e'')::k, where e' and e'' are the
result of two calls to tsubst_copy on the PARM_DECL e.
Since the two specializations are considered different due to the mentioned fix,
their TYPE_CANONICAL points to themselves even though they are otherwise
identical types, and this triggers an ICE in maybe_rebuild_function_decl_type
when comparing the TYPE_ARG_TYPES of c<int>::d to its DECL_ARGUMENTS.
This patch fixes this issue at the spec_hasher::equal level by ignoring the
'comparing_specializations' flag in cp_tree_equal whenever the DECL_CONTEXTs of
the two parameters are identical. This seems to be a sufficient condition to be
able to correctly compare PARM_DECLs structurally. (This also subsumes the
CONSTRAINT_VAR_P check since constraint variables all have empty, and therefore
identical, DECL_CONTEXTs.)
gcc/cp/ChangeLog:
PR c++/94632
* tree.c (cp_tree_equal) <case PARM_DECL>: Ignore
comparing_specializations if the parameters' contexts are identical.
gcc/testsuite/ChangeLog:
PR c++/94632
* g++.dg/template/canon-type-14.C: New test.
When updating an auto return type of an abbreviated function template in
splice_late_return_type, we should also propagate PLACEHOLDER_TYPE_CONSTRAINTS
(and cv-qualifiers) of the original auto node.
gcc/cp/ChangeLog:
PR c++/92187
* pt.c (splice_late_return_type): Propagate cv-qualifiers and
PLACEHOLDER_TYPE_CONSTRAINTS from the original auto node to the new one.
gcc/testsuite/ChangeLog:
PR c++/92187
* g++.dg/concepts/abbrev5.C: New test.
* g++.dg/concepts/abbrev6.C: New test.
Some more C++20 changes from P1614R2, "The Mothership has Landed".
* include/std/chrono (duration, time_point): Define operator<=> and
remove redundant operator!= for C++20.
* testsuite/20_util/duration/comparison_operators/three_way.cc: New
test.
* testsuite/20_util/time_point/comparison_operators/three_way.cc: New
test.
In C++20 the rebind and const_reference members of std::allocator are
gone, so this testsuite utility stopped working, causing
ext/pb_ds/regression/priority_queue_rand_debug.cc to FAIL.
* testsuite/util/native_type/native_priority_queue.hpp: Use
allocator_traits to rebind allocator.
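For reference, a minimal sketch of the allocator_traits approach. The alias names are hypothetical and this is not the code in native_priority_queue.hpp; it only shows how to get a rebound allocator and a const reference type without the std::allocator members removed in C++20.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical alias: an allocator for U obtained from allocator type Alloc,
// without using the Alloc::rebind member.
template <typename Alloc, typename U>
using rebound_alloc =
    typename std::allocator_traits<Alloc>::template rebind_alloc<U>;

// Hypothetical alias: replacement for the removed Alloc::const_reference.
template <typename Alloc>
using const_reference_of =
    const typename std::allocator_traits<Alloc>::value_type &;

static_assert(std::is_same<rebound_alloc<std::allocator<int>, char>,
                           std::allocator<char>>::value,
              "rebinding std::allocator<int> to char gives std::allocator<char>");
```

This form works from C++11 onwards, so the utility does not need separate code paths for older standards.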
This time instead of having a NOP copy insn that we can completely ignore and
ultimately remove, we have a NOP set within a multi-set PARALLEL. It triggers
the same failure when the source of such a set is a hard register for the same
reasons as we've already noted in the BZ and patches-to-date.
For prior cases we've been able to mark the insn as a nop set and ignore it for
the rest of cse_insn, ultimately removing it. That's not really an option here
as there are other sets that we have to preserve.
We might be able to fix this instance by splitting the multi-set insn, but I'm
not keen to introduce splitting into cse. Furthermore, the target may not be
able to split the insn. So I considered this a non-starter.
What I finally settled on was to use the existing do_not_record machinery to
ignore the nop set within the parallel (and only that set within the parallel).
One might argue that we should always ignore a REG_UNUSED set. But I rejected
that idea -- we could have cse-able divmod insns where the first had a
REG_UNUSED note for a destination, but the second did not.
One might also argue that we could have a nop set without a REG_UNUSED in a
multi-set parallel and thus we could trigger yet another insert_regs ICE at
some point. I tend to think this is a possibility. If we see this happen,
we'll have to revisit.
PR rtl-optimization/90275
* cse.c (cse_insn): Avoid recording nop sets in multi-set parallels
when the destination has a REG_UNUSED note.
In this PR, we're ICEing on a use of an 'int... a' template parameter pack as
part of the variadic lambda init-capture [...z=a].
The unexpected thing about this variadic init-capture is that it is not
type-dependent, and so the call to do_auto_deduction from
lambda_capture_field_type actually resolves its type to 'int' instead of exiting
early like it does for a type-dependent variadic initializer. This later
confuses add_capture which, according to one of its comments, assumes that
'type' is always 'auto' for a variadic init-capture.
The simplest fix (and the approach that this patch takes) seems to be to avoid
doing auto deduction in lambda_capture_field_type when the initializer uses
parameter packs, so that we always return 'auto' even in the non-type-dependent
case.
gcc/cp/ChangeLog:
PR c++/94483
* lambda.c (lambda_capture_field_type): Avoid doing auto deduction if
the explicit initializer has parameter packs.
gcc/testsuite/ChangeLog:
PR c++/94483
* g++.dg/cpp2a/lambda-pack-init5.C: New test.
In the testcase for this PR, we try to parse the statement
A(value<0>());
first tentatively as a declaration (with a parenthesized declarator), and during
this tentative parse we end up issuing a hard error from
cp_parser_check_template_parameters about its invalidness as a declaration.
Rather than issuing a hard error, it seems we should instead simulate an error
since we're parsing tentatively. This would then allow cp_parser_statement to
recover and successfully parse the statement as an expression-statement instead.
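A simplified model of that behaviour, with a hypothetical parse_state struct rather than the real cp_parser: while parsing tentatively the failure is only recorded, so the caller can back out and retry another interpretation; outside a tentative parse a diagnostic is issued.

```cpp
// Hypothetical parser state for illustration only.
struct parse_state {
  bool tentative;   // true while a tentative parse is in progress
  bool failed;      // true once the current parse is known to be invalid
  int diagnostics;  // number of hard errors issued to the user
};

// Report an invalid declaration.  In a tentative parse only mark the
// parse as failed; otherwise also issue a hard error.
constexpr parse_state report_invalid_declaration(parse_state s) {
  return s.tentative
             ? parse_state{true, true, s.diagnostics}
             : parse_state{false, true, s.diagnostics + 1};
}
```

In the tentative case the diagnostic count is unchanged, which is what lets the statement be reparsed as an expression-statement without a spurious error.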
gcc/cp/ChangeLog:
PR c++/88754
* parser.c (cp_parser_check_template_parameters): Before issuing a hard
error, first try simulating an error instead.
gcc/testsuite/ChangeLog:
PR c++/88754
* g++.dg/parse/ambig10.C: New test.
The attached patch fixes an ICE on invalid: When the return type of
a function was misdeclared with a wrong rank, we issued a warning,
but not an error (unless with -pedantic); later on, an ICE ensued.
Nothing good can come from wrongly declaring a function type
(considering the ABI), so I changed that into a hard error.
2020-04-17 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/94090
* gfortran.h (gfc_compare_interfaces): Add
optional argument bad_result_characteristics.
* interface.c (gfc_check_result_characteristics): Fix
whitespace.
(gfc_compare_interfaces): Handle new argument; return
true if function return values are wrong.
* resolve.c (resolve_global_procedure): Hard error if
the return value of a function is wrong.
2020-04-17 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/94090
* gfortran.dg/interface_46.f90: New test.
Some more C++20 changes from P1614R2, "The Mothership has Landed".
This adds three-way comparison support to std::char_traits,
std::basic_string, std::basic_string_view, and std::sub_match.
* include/bits/basic_string.h (basic_string): Define operator<=> and
remove redundant comparison operators for C++20.
* include/bits/char_traits.h (__gnu_cxx::char_traits, char_traits):
Add comparison_category members.
(__detail::__char_traits_cmp_cat): New helper to get comparison
category from char traits class.
* include/bits/regex.h (regex_traits::_RegexMask::operator!=): Do not
define for C++20.
(sub_match): Define operator<=> and remove redundant comparison
operators for C++20.
(match_results): Remove redundant operator!= for C++20.
* include/std/string_view (basic_string_view): Define operator<=> and
remove redundant comparison operators for C++20.
* testsuite/21_strings/basic_string/operators/char/cmp_c++20.cc: New
test.
* testsuite/21_strings/basic_string/operators/wchar_t/cmp_c++20.cc:
New test.
* testsuite/21_strings/basic_string_view/operations/copy/char/
constexpr.cc: Initialize variable.
* testsuite/21_strings/basic_string_view/operations/copy/wchar_t/
constexpr.cc: Likewise.
* testsuite/21_strings/basic_string_view/operators/char/2.cc: Add
dg-do directive and remove comments showing incorrect signatures.
* testsuite/21_strings/basic_string_view/operators/wchar_t/2.cc:
Likewise.
* testsuite/21_strings/basic_string_view/operators/char/cmp_c++20.cc:
New test.
* testsuite/21_strings/basic_string_view/operators/wchar_t/cmp_c++20.cc:
New test.
* testsuite/28_regex/sub_match/compare_c++20.cc: New test.
We were seeing performance regressions on 256-bit SVE with code like:
  for (int i = 0; i < count; ++i)
  #pragma GCC unroll 128
    for (int j = 0; j < 128; ++j)
      *dst++ = 1;
(derived from lmbench).
For 128-bit SVE, it's clearly better to use Advanced SIMD STPs here,
since they can store 256 bits at a time. We already do this for
-msve-vector-bits=128 because in that case Advanced SIMD comes first
in autovectorize_vector_modes.
If we handled full-loop predication well for this kind of loop,
the choice between Advanced SIMD and 256-bit SVE would be mostly
a wash, since both of them could store 256 bits at a time. However,
SVE would still have the extra prologue overhead of setting up the
predicate, so Advanced SIMD would still be the natural choice.
As things stand though, we don't handle full-loop predication well
for this kind of loop, so the 256-bit SVE code is significantly worse.
Something to fix for GCC 11 (hopefully). However, even though we
account for the overhead of predication in the cost model, the SVE
version (wrongly) appeared to need half the number of stores.
That was enough to drown out the predication overhead and meant
that we'd pick the SVE code over the Advanced SIMD code.
512-bit SVE has a clear advantage over Advanced SIMD, so we should
continue using SVE there.
This patch tries to account for this in the cost model. It's a bit
of a compromise; see the comment in the patch for more details.
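A rough sketch of the adjustment, based only on the ChangeLog wording below and not on the aarch64.c code. The enum, the helper names and the 256-bit threshold (two 128-bit Advanced SIMD registers) are assumptions made for illustration:

```cpp
// Hypothetical statement kinds; the real code uses vect_cost_for_stmt.
enum class stmt_kind { load, store, other };

// Assumed width of one Advanced SIMD LDP/STP of two Q registers.
constexpr unsigned advsimd_pair_bits = 256;

// True if one loop iteration accesses enough scalar elements for
// Advanced SIMD to use a full LDP or STP.
constexpr bool could_use_advsimd_pair(stmt_kind kind, unsigned elems_per_iter,
                                      unsigned elem_bits) {
  return kind != stmt_kind::other
         && elems_per_iter * elem_bits >= advsimd_pair_bits;
}

// Double the SVE cost of loads and stores in that case; leave other
// statements alone.
constexpr unsigned adjust_sve_stmt_cost(stmt_kind kind, unsigned elems_per_iter,
                                        unsigned elem_bits, unsigned cost) {
  return could_use_advsimd_pair(kind, elems_per_iter, elem_bits) ? cost * 2
                                                                 : cost;
}
```

For the lmbench-derived loop above, each outer iteration stores 128 32-bit elements, so in this model the SVE store cost is doubled and no longer appears to need half the number of stores.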
2020-04-17 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_advsimd_ldp_stp_p): New function.
(aarch64_sve_adjust_stmt_cost): Add a vectype parameter. Double the
cost of load and store insns if one loop iteration has enough scalar
elements to use an Advanced SIMD LDP or STP.
(aarch64_add_stmt_cost): Update call accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_2.c: New test.
* gcc.target/aarch64/sve/cost_model_3.c: Likewise.
* gcc.target/aarch64/sve/cost_model_4.c: Likewise.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/cost_model_6.c: Likewise.
* gcc.target/aarch64/sve/cost_model_7.c: Likewise.
This change fixes two obvious redundant assignments reported by cppcheck:
trunk.git/gcc/c/c-parser.c:16969:2: style: Variable 'data.clauses' is reassigned a value before the old one has been used. [redundantAssignment]
trunk.git/gcc/cp/call.c:5116:9: style: Variable 'arg2' is reassigned a value before the old one has been used. [redundantAssignment]
2020-04-17 Jakub Jelinek <jakub@redhat.com>
PR other/94629
* c-parser.c (c_parser_oacc_routine): Remove redundant assignment
to data.clauses.
* call.c (build_conditional_expr_1): Remove redundant assignment to
arg2.
As the testcase shows, there are unfortunately more problematic cases
in *testqi_ext_3 if the mode is not CCZmode, because the sign flag might
not behave the same between the insn with zero_extract and what we split it
into.
The previous fix to the insn condition was because *testdi_1 for a mask with
the upper 32 bits clear and bit 31 set is implemented using an SImode test, and
thus SF is set depending on that bit 31 rather than being always cleared.
But we can have other cases. On the zero_extract (which has <MODE>mode),
we can have either the pos + len == precision of <MODE>mode, or
pos + len < precision of <MODE>mode cases. The former one copies the most
significant bit into SF, the latter will have SF always cleared.
For the former case, either it is a zero_extract from a larger mode, but
then when we perform test in that larger mode, SF will be always clear and
thus mismatch from the zero_extract case (so we need to enforce CCZmode),
or it will be a zero_extract from same mode with pos 0 and len equal to
mode precision, such zero_extracts should have been really simplified
into their first operand.
For the latter case, when SF is always clear on the define_insn with
zero_extract, we need to split into something that doesn't sometimes set
SF, i.e. it has to be a test with mask that doesn't have the most
significant bit set. In some cases it can be achieved through using test
in a wider mode (e.g. in the testcase, there is
  (zero_extract:SI (reg:HI) (const_int 13) (const_int 3))
which will always set SF to 0, but we split it into
  (and:HI (reg:HI) (const_int -8))
which will copy the MSB of (reg:HI) into SF, but we can do:
  (and:SI (subreg:SI (reg:HI) 0) (const_int 0xfff8))
which will keep SF always cleared), but there are various cases where we
can't (when already using DImode, or when using SImode and we'd turn it into
the problematic *testdi_1 implemented using an SImode test, or when
the val operand is a MEM (we don't want to read from memory more than
the user originally wanted); a paradoxical subreg of a MEM could be problematic
too if the narrowing leaves us with a MEM).
So, the patch attempts to require CCZmode (and not CCNOmode) if it can't
really ensure the SF will have the same meaning between the define_insn and what
we split it into, and if we decide to allow CCNOmode, it needs to avoid
narrowing, and/or to widen, if pos + len would indicate we'd have the MSB
set in the mask.
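The SF mismatch in the example above can be checked with a small model. This is a sketch with hypothetical helper names; it models only ZF and SF of the three forms and is not taken from i386.md:

```cpp
#include <cstdint>

// Only the two flags that matter here.
struct flags {
  bool zf;  // result is zero
  bool sf;  // most significant bit of the result
};

constexpr flags flags32(std::uint32_t v) { return flags{v == 0, (v >> 31) != 0}; }
constexpr flags flags16(std::uint16_t v) { return flags{v == 0, (v >> 15) != 0}; }

// (zero_extract:SI (reg:HI) (const_int 13) (const_int 3)): SF is always 0.
constexpr flags via_zero_extract(std::uint16_t x) {
  return flags32((std::uint32_t(x) >> 3) & 0x1fffu);
}

// (and:HI (reg:HI) (const_int -8)): SF is the MSB of the HImode register.
constexpr flags via_and_hi(std::uint16_t x) {
  return flags16(std::uint16_t(x & 0xfff8u));
}

// (and:SI (subreg:SI (reg:HI) 0) (const_int 0xfff8)): WIDE is the full
// 32-bit register, whose upper half may hold anything; SF is always 0.
constexpr flags via_and_si(std::uint32_t wide) {
  return flags32(wide & 0xfff8u);
}
```

For x = 0x8000 all three forms agree on ZF, but only the HImode AND sets SF, which is why that split is only valid when the flags are used in CCZmode.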
2020-04-17 Jakub Jelinek <jakub@redhat.com>
Jeff Law <law@redhat.com>
PR target/94567
* config/i386/i386.md (*testqi_ext_3): Use CCZmode rather than
CCNOmode in ix86_match_ccmode if len is equal to <MODE>mode precision,
or pos + len >= 32, or pos + len is equal to operands[2] precision
and operands[2] is not a register operand. During splitting perform
SImode AND if operands[0] doesn't have CCZmode and pos + len is
equal to mode precision.
* gcc.c-torture/execute/pr94567.c: New test.
Co-Authored-By: Jeff Law <law@redhat.com>