2019-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92581
* tree-vect-loop.c (vect_create_epilog_for_reduction): For
condition reduction chains gather all conditions involved
for computing the index reduction vector.
* gcc.dg/vect/vect-cond-reduc-5.c: New testcase.
From-SVN: r278445
Thumb1 cannot support asm-flags currently, because we don't expose the
flags register to the compiler. Disable the support for that case.
Adjust the asm-flag-6 test for aarch64 ilp32 correctness.
gcc/
* config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
to define __GCC_ASM_FLAG_OUTPUTS__.
* config/arm/arm.c (thumb1_md_asm_adjust): New function.
(arm_option_params_internal): Swap out targetm.md_asm_adjust
depending on TARGET_THUMB1.
* doc/extend.texi (FlagOutputOperands): Document thumb1 restriction.
gcc/testsuite/
* testsuite/gcc.target/arm/asm-flag-3.c: Skip for thumb1.
* testsuite/gcc.target/arm/asm-flag-5.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-6.c: Likewise.
* testsuite/gcc.target/arm/asm-flag-4.c: New test.
* testsuite/gcc.target/aarch64/asm-flag-6.c: Use %w for
asm inputs to cmp instruction for ILP32.
From-SVN: r278443
PR middle-end/91450
* internal-fn.c (expand_mul_overflow): For s1 * s2 -> ur, if one
operand is negative and one non-negative, compare the non-negative
one against 0 rather than comparing s1 & s2 against 0. Otherwise,
don't compare (s1 & s2) == 0, but compare separately both s1 == 0
and s2 == 0, unless one of them is known to be negative. Remove
tem2 variable, use tem where tem2 has been used before.
* gcc.c-torture/execute/pr91450-1.c: New test.
* gcc.c-torture/execute/pr91450-2.c: New test.
From-SVN: r278437
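As a hedged illustration of the s1 * s2 -> ur case described above, the sketch below multiplies two signed operands into an unsigned result with __builtin_mul_overflow. The wrapper name mul_signed_to_unsigned is hypothetical and is not taken from the new tests; the expected behaviour follows from the builtin's documented infinite-precision semantics.

```c
#include <stdbool.h>

/* Hypothetical wrapper (not part of the patch): multiply two signed
   ints into an unsigned result.  Returns true if the mathematically
   exact product cannot be represented in *res.  */
bool
mul_signed_to_unsigned (int s1, int s2, unsigned int *res)
{
  return __builtin_mul_overflow (s1, s2, res);
}
```

With this wrapper, -1 * 0 and -3 * -4 should report no overflow, while -3 * 4 should report overflow because a negative product does not fit in an unsigned result.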
PR tree-optimization/92557
* omp-low.c (omp_clause_aligned_alignment): Punt if TYPE_MODE is not
vmode rather than asserting it always is.
* gcc.dg/gomp/pr92557.c: New test.
From-SVN: r278432
2019-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92554
* tree-vect-loop.c (vect_create_epilog_for_reduction): Look
for the actual condition stmt and deal with sign-changes.
* gcc.dg/vect/pr92554.c: New testcase.
From-SVN: r278431
2019-11-19 Martin Liska <mliska@suse.cz>
PR bootstrap/92540
* config/riscv/riscv.c (riscv_address_insns): Initialize
addr in order to remove bootstrap -Wmaybe-uninitialized
error.
From-SVN: r278429
Certain bad uses of C2x standard attributes (that is, attributes
inside [[]] with only a name but no namespace specified) are
constraint violations, and so should be diagnosed with a pedwarn (or
error) where GCC currently uses a warning. This patch implements this
in some cases (not yet for attributes used on types, nor for some bad
uses of fallthrough attributes). Specifically, this applies to
unknown standard attributes (taking care not to pedwarn for nodiscard,
which is known but not implemented for C), and to all currently
implemented standard attributes in attribute declarations (including
when mixed with fallthrough) and on statements.
Bootstrapped with no regressions on x86_64-pc-linux-gnu.
gcc/c:
* c-decl.c (c_warn_unused_attributes): Use pedwarn not warning for
standard attributes.
* c-parser.c (c_parser_std_attribute): Take argument for_tm. Use
pedwarn for unknown standard attributes and return error_mark_node
for them.
gcc/c-family:
* c-common.c (attribute_fallthrough_p): In C, use pedwarn not
warning for standard attributes mixed with fallthrough attributes.
gcc/testsuite:
* gcc.dg/c2x-attr-fallthrough-5.c, gcc.dg/c2x-attr-syntax-5.c: New
tests.
* gcc.dg/c2x-attr-deprecated-2.c, gcc.dg/c2x-attr-deprecated-4.c,
gcc.dg/c2x-attr-fallthrough-2.c, gcc.dg/c2x-attr-maybe_unused-2.c,
gcc.dg/c2x-attr-maybe_unused-4.c: Expect errors in place of some
warnings.
From-SVN: r278428
This patch refactors tree-loop-distribution.c for thread safety without
use of C11 __thread feature. All global variables were moved to
`class loop_distribution` which is initialized at ::execute time.
From-SVN: r278421
This patch adds more tests of C2x attributes, where I found cases that
were handled correctly by my patches but missing from the original
tests. Tests are added for -std=c11 -pedantic handling of C2x
attribute syntax and corresponding -Wc11-c2x-compat handling; for
struct [[deprecated]]; and for the [[__fallthrough__]] spelling of
[[fallthrough]] in the case of valid fallthrough attributes.
Tested for x86_64-pc-linux-gnu.
* gcc.dg/c11-attr-syntax-1.c, gcc.dg/c11-attr-syntax-2.c,
gcc.dg/c11-attr-syntax-3.c, gcc.dg/c2x-attr-syntax-4.c: New tests.
* gcc.dg/c2x-attr-deprecated-1.c: Also test struct [[deprecated]].
* gcc.dg/c2x-attr-fallthrough-1.c: Also test [[__fallthrough__]].
From-SVN: r278418
When fixing c++/91889 (r276251) I assumed that we couldn't have a ck_qual
under a ck_ref_bind, because that patch was what introduced it, and so this
+ if (next_conversion (convs)->kind == ck_qual)
+ {
+ gcc_assert (same_type_p (TREE_TYPE (expr),
+ next_conversion (convs)->type));
+ /* Strip the cast created by the ck_qual; cp_build_addr_expr
+ below expects an lvalue. */
+ STRIP_NOPS (expr);
+ }
in convert_like_real was supposed to handle it. But that assumption was wrong
as this test shows; here we have "(int *)f" where f is of type long int, and
we're converting it to "const int *const &", so we have both ck_ref_bind and
ck_qual. That means that the new STRIP_NOPS strips an expression it shouldn't
have, and that then breaks when creating a TARGET_EXPR. So we want to limit
the stripping to the new case only. This I do by checking need_temporary_p,
which will be 0 in the new case. Yes, we can set need_temporary_p when
binding a reference directly, but then we won't have a qualification
conversion. It is possible to have a bit-field, convert it to a pointer,
and then convert that pointer to a more-qualified pointer, but in that case
we're not dealing with an lvalue, so gl_kind is 0, so we won't enter this
block in reference_binding:
if ((related_p || compatible_p) && gl_kind)
* call.c (convert_like_real) <case ck_ref_bind>: Check need_temporary_p.
* g++.dg/cpp0x/ref-bind7.C: New test.
From-SVN: r278416
This patch adds optabs that check whether a read followed by a write
or a write followed by a read can be divided into interleaved byte
accesses without changing the dependencies between the bytes.
This is one of the uses of the SVE2 WHILERW and WHILEWR instructions.
(The instructions can also be used to limit the VF at runtime,
but that's future work.)
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* doc/sourcebuild.texi (vect_check_ptrs): Document.
* optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
* doc/md.texi: Document them.
* internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
internal functions.
* internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
* internal-fn.c (check_ptrs_direct): New macro.
(expand_check_ptrs_optab_fn): Likewise.
(direct_check_ptrs_optab_supported_p): Likewise.
(internal_check_ptrs_fn_supported_p): New function.
* tree-data-ref.c: Include internal-fn.h.
(create_ifn_alias_checks): New function.
(create_intersect_range_checks): Use it.
* config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
(optab, cmp_op): Handle it.
(raw_war, unspec): New int attributes.
* config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
constants.
* config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
New predicate.
* config/aarch64/aarch64-sve2.md (check_<raw_war>_ptrs<mode>): New
expander.
(@aarch64_sve2_while<cmp_op><GPI:mode><PRED_ALL:mode>_ptest): New
pattern.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_check_ptrs):
New procedure.
* gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
used, if available.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW.
* gcc.target/aarch64/sve2/whilerw_1.c: New test.
* gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
* gcc.target/aarch64/sve2/whilewr_2.c: Likewise.
From-SVN: r278414
Empty vector constructors are equivalent to zero vectors. If we handle
that case directly, we can support it for variable-length vectors and
can hopefully make things more efficient for fixed-length vectors.
This is needed by a later C++ patch.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree.c (build_vector_from_ctor): Directly return a zero vector for
empty constructors.
From-SVN: r278413
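A minimal sketch of the equivalence stated above, using the GNU vector extension at source level. It assumes a 4-byte int and that an empty initializer list is accepted for vector types (a GNU extension before C2x); the type v4si and the helper empty_ctor_elem are hypothetical names, and the commit itself concerns the tree-level constructor rather than this particular source form.

```c
/* GNU vector extension: four ints in one 16-byte vector
   (assumes a 4-byte int).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: build a vector from an empty initializer list
   and return element I, which should be zero for every valid I.  */
int
empty_ctor_elem (int i)
{
  v4si v = {};
  return v[i];
}
```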
SVE has two composite conditions:
pmore == at least one bit set && last bit clear
plast == no bits set || last bit set
So in general we generate them from:
A: CC = test bits
B: reg1 = first condition
C: CC = test bits
D: reg2 = second condition
E: result = (reg1 op reg2) where op is || or &&
To fold all this into a single test, we need to be able to remove
the redundant C (the cse.c patch) and then fold B, D and E down to
a single condition (the simplify-rtx.c patch).
The underlying conditions are unsigned, so the simplify-rtx.c part needs
to support both unsigned comparisons and AND. However, to avoid opening
the can of worms that is ANDing FP comparisons for unordered inputs,
I've restricted the new AND handling to cases in which NaNs can be
ignored. I think this is still a strict extension of what we have now;
it just doesn't go as far as it could. Going further would need an
entirely different set of testcases, so I think it would make more sense
as separate work.
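The folding of B, D and E into a single condition can be pictured with the sketch below. It is not the simplify-rtx.c code: the enum names and helpers are hypothetical, it ignores the signed/unsigned distinction, and it assumes NaNs can be ignored. Each comparison is mapped to the set of outcomes (less, equal, greater) for which it holds, so that || becomes a bitwise OR of the sets and && becomes a bitwise AND.

```c
/* Hypothetical outcome bits for comparing two values with no NaNs.  */
enum { OUT_LT = 1, OUT_EQ = 2, OUT_GT = 4 };

/* Hypothetical comparison codes.  */
enum cmp_code
{
  CMP_FALSE, CMP_LT, CMP_EQ, CMP_LE, CMP_GT, CMP_NE, CMP_GE, CMP_TRUE
};

/* Map a comparison to the set of outcomes for which it is true.  */
int
cmp_to_mask (enum cmp_code c)
{
  switch (c)
    {
    case CMP_LT: return OUT_LT;
    case CMP_EQ: return OUT_EQ;
    case CMP_LE: return OUT_LT | OUT_EQ;
    case CMP_GT: return OUT_GT;
    case CMP_NE: return OUT_LT | OUT_GT;
    case CMP_GE: return OUT_EQ | OUT_GT;
    case CMP_TRUE: return OUT_LT | OUT_EQ | OUT_GT;
    default: return 0;
    }
}

/* Map a set of outcomes back to a single comparison.  */
enum cmp_code
mask_to_cmp (int mask)
{
  switch (mask)
    {
    case OUT_LT: return CMP_LT;
    case OUT_EQ: return CMP_EQ;
    case OUT_LT | OUT_EQ: return CMP_LE;
    case OUT_GT: return CMP_GT;
    case OUT_LT | OUT_GT: return CMP_NE;
    case OUT_EQ | OUT_GT: return CMP_GE;
    case OUT_LT | OUT_EQ | OUT_GT: return CMP_TRUE;
    default: return CMP_FALSE;
    }
}

/* (a || b) on the same operands: union of the outcome sets.  */
enum cmp_code
fold_or (enum cmp_code a, enum cmp_code b)
{
  return mask_to_cmp (cmp_to_mask (a) | cmp_to_mask (b));
}

/* (a && b) on the same operands: intersection of the outcome sets.  */
enum cmp_code
fold_and (enum cmp_code a, enum cmp_code b)
{
  return mask_to_cmp (cmp_to_mask (a) & cmp_to_mask (b));
}
```

For example, fold_or (CMP_LT, CMP_EQ) gives CMP_LE and fold_and (CMP_LE, CMP_GE) gives CMP_EQ.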
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* cse.c (cse_insn): Delete no-op register moves too.
* simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
Take a second comparison to control the value for NE.
(mask_to_comparison): Handle unsigned comparisons.
(simplify_logical_relational_operation): Likewise. Update call
to comparison_to_mask. Handle AND if !HONOR_NANs.
(simplify_binary_operation_1): Call the above for AND too.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.
From-SVN: r278411
This patch handles VIEW_CONVERT_EXPRs of variable-length VECTOR_CSTs
by adding tree-level versions of native_decode_vector_rtx and
simplify_const_vector_subreg. It uses the same code for fixed-length
vectors, both to get more coverage and because operating directly on
the compressed encoding should be more efficient for longer vectors
with a regular pattern.
The structure and comments are very similar between the tree and
rtx routines.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* fold-const.c (native_encode_vector): Turn into a wrapper function,
splitting the main code out into...
(native_encode_vector_part): ...this new function.
(native_decode_vector_tree): New function.
(fold_view_convert_vector_encoding): Likewise.
(fold_view_convert_expr): Use it for converting VECTOR_CSTs
to VECTOR_TYPEs.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/temporaries_1.c: New test.
From-SVN: r278410
For:
void
f1 (int *x, int *y)
{
for (int i = 0; i < 32; ++i)
x[i] += y[i];
}
we checked at runtime whether one vector at x would overlap one vector
at y. But in cases like this, the vector code would handle x <= y just
fine, since any write to address A still happens after any read from
address A. The only problem is if x is ahead of y by less than a
vector.
The same is true for two writes:
void
f2 (int *x, int *y)
{
for (int i = 0; i < 32; ++i)
{
x[i] = i;
y[i] = 2;
}
}
if y <= x then a vector write at y after a vector write at x would
have the same net effect as the original scalar writes.
This patch optimises the alias checks for these two cases. E.g.,
before the patch, f1 used:
add x2, x0, 15
sub x2, x2, x1
cmp x2, 30
bls .L2
whereas after the patch it uses:
add x2, x1, 4
sub x2, x0, x2
cmp x2, 8
bls .L2
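The two sequences above can be read as the following C transliteration. This is only a sketch: it assumes that x0 and x1 hold x and y, that .L2 is the fallback path taken when the check fails, and that the vectors are 16 bytes wide (inferred from the constants); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before the patch: add x2, x0, 15; sub x2, x2, x1; cmp x2, 30; bls.
   True means the addresses are treated as conflicting, i.e. x and y
   are within 15 bytes of each other in either direction.  The
   unsigned arithmetic wraps, like the register arithmetic.  */
bool
f1_conflict_before (uintptr_t x, uintptr_t y)
{
  return x + 15 - y <= 30;
}

/* After the patch: add x2, x1, 4; sub x2, x0, x2; cmp x2, 8; bls.
   True only when x is ahead of y by 4 to 12 bytes.  */
bool
f1_conflict_after (uintptr_t x, uintptr_t y)
{
  return x - (y + 4) <= 8;
}
```

Under these assumptions, x == y and x behind y both pass the new check, while x ahead of y by between one and three ints (less than a vector) still fails it.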
Read-after-write cases like:
int
f3 (int *x, int *y)
{
int res = 0;
for (int i = 0; i < 32; ++i)
{
x[i] = i;
res += y[i];
}
return res;
}
can cope with x == y, but otherwise don't allow overlap in either
direction. Since checking for x == y at runtime would require extra
code, we're probably better off sticking with the current overlap test.
An overlap test is also needed if the scalar or vector accesses covered
by the alias check are mixed together, rather than all statements for
the second access following all statements for the first access.
The new code for gcc.target/aarch64/sve/var_stride_[135].c is slightly
better than before.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-data-ref.c (create_intersect_range_checks_index): If the
alias pair describes simple WAW and WAR dependencies, just check
whether the first B access overlaps later A accesses.
(create_waw_or_war_checks): New function that performs the same
optimization on addresses.
(create_intersect_range_checks): Call it.
gcc/testsuite/
* gcc.dg/vect/vect-alias-check-8.c: Expect WAR/WAW checks to be used.
* gcc.dg/vect/vect-alias-check-14.c: Likewise.
* gcc.dg/vect/vect-alias-check-15.c: Likewise.
* gcc.dg/vect/vect-alias-check-18.c: Likewise.
* gcc.dg/vect/vect-alias-check-19.c: Likewise.
* gcc.target/aarch64/sve/var_stride_1.c: Update expected sequence.
* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
* gcc.target/aarch64/sve/var_stride_5.c: Likewise.
From-SVN: r278409
LRA allows address constraints that are more relaxed than "p":
/* Target hooks sometimes don't treat extra-constraint addresses as
legitimate address_operands, so handle them specially. */
if (insn_extra_address_constraint (cn)
&& satisfies_address_constraint_p (&ad, cn))
return change_p;
For SVE it's useful to allow the same thing for memory constraints.
The particular use case is LD1RQ, which is an SVE instruction that
addresses Advanced SIMD vector modes and that accepts some addresses
that normal Advanced SIMD moves don't.
Normally we require every memory to satisfy at least "m", which is
defined to be a memory "with any kind of address that the machine
supports in general". However, LD1RQ is very much special-purpose:
it doesn't really have any relation to normal operations on these
modes. Adding its addressing modes to "m" would lead to bad Advanced
SIMD optimisation decisions in passes like ivopts. LD1RQ therefore
has a memory constraint that accepts things "m" doesn't.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* lra-constraints.c (valid_address_p): Take the operand and a
constraint as argument. If the operand is a MEM and the constraint
is a memory constraint, check whether the eliminated form of the
MEM already satisfies the constraint.
(process_address_1): Update calls accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/ld1rq_f16.c: Remove XFAIL.
* gcc.target/aarch64/sve/acle/asm/ld1rq_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/ld1rq_u64.c: Likewise.
From-SVN: r278408
I happened to notice that MODIFY_JNI_METHOD_CALL was defined in
cygming.h and documented in tm.texi. However, because it was only
needed for gcj, it is obsolete. This patch removes the vestiges.
Tested by grep, and rebuilding the documentation.
gcc/ChangeLog
2019-11-18 Tom Tromey <tromey@adacore.com>
* doc/tm.texi: Rebuild.
* doc/tm.texi.in (Misc): Don't document MODIFY_JNI_METHOD_CALL.
* config/i386/cygming.h (MODIFY_JNI_METHOD_CALL): Don't define.
From-SVN: r278407
2019-11-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/92516
* tree-vect-slp.c (vect_analyze_slp_instance): Add bst_map
argument, hoist bst_map creation/destruction to ...
(vect_analyze_slp): ... here, forming a true graph with
SLP instances being the entries.
(vect_detect_hybrid_slp_stmts): Remove wrapper.
(vect_detect_hybrid_slp): Use one visited set for all
graph entries.
(vect_slp_analyze_node_operations): Simplify visited/lvisited
to hash-sets of slp_tree.
(vect_slp_analyze_operations): Likewise.
(vect_bb_slp_scalar_cost): Remove wrapper.
(vect_bb_vectorization_profitable_p): Use one visited set for
all graph entries.
(vect_schedule_slp_instance): Elide bst_map use.
(vect_schedule_slp): Likewise.
* g++.dg/vect/slp-pr92516.cc: New testcase.
2019-11-18 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): When a CTOR
was vectorized with just external refs fail.
* gcc.dg/vect/vect-ctor-1.c: New testcase.
From-SVN: r278406
The std::jthread::get_id() function was missing a return statement.
The is_invocable check needs to be done using decayed types, as they'll
be forwarded to std::invoke as rvalues.
Also reduce header dependencies for the <thread> header. We don't need
to include <functional> for std::jthread because <bits/invoke.h> is
already included, which defines std::__invoke. We can also remove
<bits/functexcept.h> which isn't used at all. Finally, when
_GLIBCXX_HAS_GTHREADS is not defined there's no point including any
other headers, since we're not going to define anything in <thread>
anyway.
* include/std/thread: Reduce header dependencies.
(jthread::get_id()): Add missing return.
(jthread::get_stop_token()): Avoid unnecessary stop_source temporary.
(jthread::_S_create): Check is_invocable using decayed types. Add
static assertion.
* testsuite/30_threads/jthread/1.cc: Add dg-require-gthreads.
* testsuite/30_threads/jthread/2.cc: Likewise.
* testsuite/30_threads/jthread/3.cc: New test.
* testsuite/30_threads/jthread/jthread.cc: Add missing directives for
pthread and gthread support. Use VERIFY instead of assert.
From-SVN: r278402
* include/bits/alloc_traits.h (allocator_traits::construct)
(allocator_traits::destroy, allocator_traits::max_size): Add unused
attributes to parameters that are not used in C++20.
* include/std/bit (__ceil2): Add braces around assertion to avoid
-Wmissing-braces warning.
From-SVN: r278401
2019-11-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/92558
* tree-vect-loop.c (vect_create_epilog_for_reduction): When
reducing the width of a reduction vector def update new_phis.
* gcc.dg/vect/pr92558.c: New testcase.
From-SVN: r278400
The gthr weak reference based single thread detection is unsafe with
static linking and in case of dynamic linking it's ineffective on musl
since pthread symbols are defined in libc.so.
(Ideally this should be fixed for all targets, since glibc plans to move
libpthread.so into libc.so too and users want to static link to pthread
without --whole-archive: PR87189.)
For now we have to explicitly opt out from the broken behaviour in the
config machinery of each target lib and libgcc was previously missed.
libgcc/ChangeLog:
2019-11-18 Szabolcs Nagy <szabolcs.nagy@arm.com>
* config.host: Add t-gthr-noweak on *-*-musl*.
* config/t-gthr-noweak: New file.
From-SVN: r278399
On powerpc and s390x the musl ABI requires 64 bit and 128 bit long
double respectively, so adjust the default.
gcc/ChangeLog:
2019-11-18 Szabolcs Nagy <szabolcs.nagy@arm.com>
* configure.ac (gcc_cv_target_ldbl128): Set for powerpc*-*-linux-musl*
and s390*-*-linux-musl* targets.
* configure: Regenerate.
From-SVN: r278398
2019-11-18 Martin Liska <mliska@suse.cz>
* dbgcnt.c (dbg_cnt_set_limit_by_name): Provide error
message for an unknown counter.
(dbg_cnt_process_single_pair): Support 0 as minimum value.
(dbg_cnt_process_opt): Remove unreachable code.
From-SVN: r278396
Hi there,
When compiling an __RTL function that has an unspecified "startwith"
pass we currently don't run the cleanup pass, which means that we ICE on
the next function (if it's a basic function).
This change ensures that the clean_state pass is run even if the
startwith pass is unspecified.
We also ensure the name of the startwith pass is always freed correctly.
As an example, before this change the following code would ICE while compiling
the function `foo_a` when built with:
./aarch64-none-linux-gnu-gcc -O0 -S unspecified-pass-error.c -o test.s
```
int __RTL () badfoo ()
{
(function "badfoo"
(insn-chain
(block 2
(edge-from entry (flags "FALLTHRU"))
(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
(cinsn 101 (set (reg:DI x19) (reg:DI x0)))
(cinsn 10 (use (reg/i:SI x19)))
(edge-to exit (flags "FALLTHRU"))
) ;; block 2
) ;; insn-chain
) ;; function "badfoo"
}
int
foo_a ()
{
return 200;
}
```
Now it silently ignores the __RTL function and successfully compiles foo_a.
regtest done on aarch64
regtest done on x86_64
OK for trunk?
gcc/ChangeLog:
2019-11-18 Matthew Malcomson <matthew.malcomson@arm.com>
* run-rtl-passes.c (run_rtl_passes): Accept and handle empty
"initial_pass_name" argument -- by running "*clean_state" pass.
Also free the "initial_pass_name" when done.
gcc/c/ChangeLog:
2019-11-18 Matthew Malcomson <matthew.malcomson@arm.com>
* c-parser.c (c_parser_parse_rtl_body): Always call
run_rtl_passes, even if startwith pass is not provided.
gcc/testsuite/ChangeLog:
2019-11-18 Matthew Malcomson <matthew.malcomson@arm.com>
* gcc.dg/rtl/aarch64/unspecified-pass-error.c: New test.
From-SVN: r278393
A change made with r271340 ("libfortran/90038: Use posix_spawn instead
of fork") accidentally brought the obsolete `runstatedir' setting back
in. Fix it.
libgfortran/
* Makefile.in: Regenerate.
From-SVN: r278383