If the user's coroutine return type omits the mandatory promise
type then we will currently restate that error each time we see
a coroutine keyword, which doesn't provide any new information.
This suppresses all but the first instance in each coroutine.
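The suppression described above amounts to a per-coroutine "already reported" flag. The following is a minimal sketch under assumptions: `struct coro_info_sketch`, `diagnose_missing_promise` and the message text are hypothetical illustrations, not the actual fields or code of coroutines.cc.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-coroutine record: only the new flag is modelled; the
   real struct coroutine_info carries much more.  */
struct coro_info_sketch
{
  const char *name;
  bool promise_error_emitted;  /* set once the error has been issued */
};

/* Called for each coroutine keyword seen in a coroutine body.  Returns
   true only when a diagnostic is actually emitted.  */
static bool
diagnose_missing_promise (struct coro_info_sketch *info, bool promise_found)
{
  if (promise_found || info->promise_error_emitted)
    return false;  /* nothing wrong, or already reported for this coroutine */
  /* Hypothetical message text.  */
  fprintf (stderr, "error: unable to find the promise type for '%s'\n",
           info->name);
  info->promise_error_emitted = true;
  return true;
}
```

Because the flag lives in the per-coroutine record, a second coroutine with the same problem still receives its own single diagnostic.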
gcc/cp/ChangeLog:
2020-02-04 Iain Sandoe <iain@sandoe.co.uk>
* coroutines.cc (find_promise_type): Delete unused forward
declaration.
(struct coroutine_info): Add a bool for no promise type error.
(coro_promise_type_found_p): Only emit the error for a missing
promise once in each affected coroutine.
gcc/testsuite/ChangeLog:
2020-02-04 Iain Sandoe <iain@sandoe.co.uk>
* g++.dg/coroutines/coro-missing-promise.C: New test.
Redundant store removal in FRE was restricted for correctness reasons.
The following extends the required correctness fixes to memcpy/aggregate
copy translation. The main change is that we no longer insert the
references rewritten to cover such aggregate copies into the hashtable,
but the original reference instead.
2020-02-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/91123
* tree-ssa-sccvn.c (vn_walk_cb_data::finish): New method.
(vn_walk_cb_data::last_vuse): New member.
(vn_walk_cb_data::saved_operands): Likewise.
(vn_walk_cb_data::~vn_walk_cb_data): Release saved_operands.
(vn_walk_cb_data::push_partial_def): Use finish.
(vn_reference_lookup_2): Update last_vuse and use finish if
we've saved operands.
(vn_reference_lookup_3): Use finish and update calls to
push_partial_defs everywhere. When translating through
memcpy or aggregate copies save off operands and alias-set.
(eliminate_dom_walker::eliminate_stmt): Restore VN_WALKREWRITE
operation for redundant store removal.
* gcc.dg/tree-ssa/ssa-fre-85.c: New testcase.
The PR shows that code generation is pessimized by the new
canonicalization rules, which nail don't-care elements to specific
values, making it hard to generate good code later.
The temporary solution is to avoid this in the cases where we also
know the canonicalization will obviously create more GIMPLE stmts than
before.
2020-02-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/92819
* tree-ssa-forwprop.c (simplify_vector_constructor): Avoid
generating more stmts than before.
* gcc.target/i386/pr92819.c: New testcase.
* gcc.target/i386/pr92803.c: Adjust.
2020-02-03 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/rs6000.c (adjust_vec_address_pcrel): New helper
function to adjust PC-relative vector addresses.
(rs6000_adjust_vec_address): Call adjust_vec_address_pcrel to
handle vectors with PC-relative addresses.
2020-02-03 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/rs6000.c (reg_to_non_prefixed): Add forward
reference.
(hard_reg_and_mode_to_addr_mask): Delete.
(rs6000_adjust_vec_address): If the original vector address
was REG+REG or REG+OFFSET and the element is not zero, do the add
of the elements in the original address before adding the offset
for the vector element. Use address_to_insn_form to validate the
address using the register being loaded, rather than guessing
whether the address is a DS-FORM or DQ-FORM address.
2020-02-03 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/rs6000.c (get_vector_offset): New helper function
to calculate the offset in memory from the start of a vector of a
particular element. Add code to keep the element number in
bounds if the element number is variable.
(rs6000_adjust_vec_address): Move calculation of offset of the
vector element to get_vector_offset.
(rs6000_split_vec_extract_var): Do not do the initial AND of
element here, move the code to get_vector_offset.
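The offset calculation that moves into get_vector_offset can be sketched as follows. This is a hypothetical helper, not the rs6000 code: it assumes element N lives at byte N * element size and that the element count is a power of two, so masking a variable element number with `nunits - 1` keeps the address inside the vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: byte offset of element ELT within a vector of
   NUNITS elements of ELT_SIZE bytes each.  Assumes element N is stored
   at byte N * ELT_SIZE.  */
static size_t
vector_elt_offset_sketch (unsigned elt, unsigned nunits, size_t elt_size,
                          int elt_is_variable)
{
  /* NUNITS must be a power of two for the mask below to work.  */
  assert (nunits != 0 && (nunits & (nunits - 1)) == 0);

  /* A variable element number may be out of range at run time; masking
     keeps the resulting address inside the vector.  A constant element
     is assumed to have been validated already.  */
  if (elt_is_variable)
    elt &= nunits - 1;

  return (size_t) elt * elt_size;
}
```

For example, element 5 of a four-element vector of 4-byte values maps to offset 4 when the element number is variable (5 & 3 == 1).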
[expr.const] specifically rules out mentioning a reference even if its
address is never used, because it implies indirection that is similarly
non-constant for a pointer variable.
PR c++/66477
* constexpr.c (cxx_eval_constant_expression) [PARM_DECL]: Don't
defer loading the value of a reference.
Since copying a class object is defined in terms of the copy constructor,
copying an empty class is OK even if it would otherwise not be usable in a
constant expression. Relatedly, using a parameter as an lvalue is no more
problematic than a local variable, and calling a member function uses the
object as an lvalue.
PR c++/91953
* constexpr.c (potential_constant_expression_1) [PARM_DECL]: Allow
empty class type.
[COMPONENT_REF]: A member function reference doesn't use the object
as an rvalue.
Since coroutine-ness is discovered lazily, we encounter the diagnostics
during each keyword parse. We were not gracefully handling the case where
user code failed to include fundamental information (e.g. the traits).
Once we've emitted an error for this level of failure, we suppress
additional copies (otherwise the same thing would be reported for every
coroutine keyword seen).
gcc/cp/ChangeLog:
2020-02-03 Iain Sandoe <iain@sandoe.co.uk>
* coroutines.cc (struct coroutine_info): Add a bool flag to note
that we emitted an error for a bad function return type.
(get_coroutine_info): Tolerate an unset info table in case of
missing traits.
(find_coro_traits_template_decl): In case of error or if we didn't
find a type template, note we emitted the error and suppress
duplicates.
(find_coro_handle_template_decl): Likewise.
(instantiate_coro_traits): Only check for error_mark_node in the
return from lookup_qualified_name.
(coro_promise_type_found_p): Reorder initialization so that we check
for the traits and their usability before allocation of the info
table. Check for a suitable return type and emit a diagnostic here
instead of relying on the lookup machinery. This allows the
error to have a better location, and means we can suppress multiple
copies.
(coro_function_valid_p): Re-check for a valid promise (and thus the
traits) before proceeding. Tolerate missing info as a fatal error.
gcc/testsuite/ChangeLog:
2020-02-03 Iain Sandoe <iain@sandoe.co.uk>
* g++.dg/coroutines/pr93458-1-missing-traits.C: New test.
* g++.dg/coroutines/pr93458-2-bad-traits.C: New test.
* g++.dg/coroutines/pr93458-3-missing-handle.C: New test.
* g++.dg/coroutines/pr93458-4-bad-coro-handle.C: New test.
* g++.dg/coroutines/pr93458-5-bad-coro-type.C: New test.
Various places in the analyzer use fold_build2, test the result, then
discard it. It's more efficient to use fold_binary, which avoids
building and GC-ing a redundant tree for the cases where folding fails.
gcc/analyzer/ChangeLog:
* constraint-manager.cc (range::constrained_to_single_element):
Replace fold_build2 with fold_binary. Remove unnecessary newline.
(constraint_manager::get_or_add_equiv_class): Replace fold_build2
with fold_binary in two places, and remove out-of-date comment.
(constraint_manager::eval_condition): Replace fold_build2 with
fold_binary.
* region-model.cc (constant_svalue::eval_condition): Likewise.
(region_model::on_assignment): Likewise.
PR analyzer/93544 reports an ICE when attempting to report a double-free
within diagnostic_manager::prune_for_sm_diagnostic, in which the
variable of interest has become an INTEGER_CST. Additionally, it picks
a nonsensical path through the function in which the pointer being
double-freed is known to be NULL, which we shouldn't complain about.
The dump shows that it picks the INTEGER_CST when updating var at a phi
node:
considering event 4, with var: ‘iftmp.0_2’, state: ‘start’
updating from ‘iftmp.0_2’ to ‘0B’ based on phi node
phi: iftmp.0_2 = PHI <iftmp.0_6(3), 0B(2)>
considering event 3, with var: ‘0B’, state: ‘start’
and that it has picked the shortest path through the exploded graph,
and on this path the pointer has been assigned NULL.
The root cause is that the state machine's on_stmt isn't called for phi
nodes (and wouldn't make much sense, as we wouldn't know which arg to
choose). malloc_state_machine::on_stmt "sees" a GIMPLE_ASSIGN to NULL
and handles it by transitioning the lhs to the "null" state, but never
"sees" GIMPLE_PHI nodes.
This patch fixes the ICE by wiring up phi-handling with state machines,
so that state machines have an on_phi vfunc. It updates the only current
user of "is_zero_assignment" (the malloc sm) to implement equivalent
logic for phi nodes. Doing so ensures that the pointer is in a separate
sm-state for the NULL vs non-NULL cases, and so gets separate exploded
nodes, and hence the path-finding logic chooses the correct path, and
the correct non-NULL phi argument.
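The idea of handling a phi one incoming edge at a time can be sketched with a toy model. Everything here is hypothetical and simplified (the enum, `struct operand_sketch`, and the rule that a non-NULL operand's state is simply copied); it is not the analyzer's API. It only shows why evaluating the argument for the specific edge, with NULL sharing the zero-assignment logic, separates the NULL and non-NULL paths.

```c
#include <stdbool.h>

/* Hypothetical, simplified states for a pointer tracked by a malloc-like
   state machine.  */
enum ptr_state { PTR_START, PTR_NULL, PTR_UNCHECKED };

/* Hypothetical operand: either the constant 0 (NULL) or a value that
   already has a tracked state.  */
struct operand_sketch
{
  bool is_zero;
  enum ptr_state state;
};

/* Shared helper in the spirit of on_zero_assignment: copying NULL into
   the lhs puts it in the "null" state; in this toy model any other
   operand's state is simply copied.  */
static enum ptr_state
state_for_copy (struct operand_sketch rhs)
{
  return rhs.is_zero ? PTR_NULL : rhs.state;
}

/* on_phi sketch: the phi is processed for one incoming edge at a time,
   so only that edge's argument matters, and the NULL and non-NULL
   paths end up in different states.  */
static enum ptr_state
on_phi_sketch (const struct operand_sketch *args, int n_args, int edge)
{
  if (edge < 0 || edge >= n_args)
    return PTR_START;  /* defensive: unknown edge */
  return state_for_copy (args[edge]);
}
```

With arguments modelled on `PHI <iftmp.0_6(3), 0B(2)>`, the edge carrying `0B` yields the "null" state while the other edge keeps the pointer's own state.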
The patch also adds some bulletproofing to prune_for_sm_diagnostic to
avoid crashing in the event of a bad path.
gcc/analyzer/ChangeLog:
PR analyzer/93544
* diagnostic-manager.cc
(diagnostic_manager::prune_for_sm_diagnostic): Bulletproof
against bad choices due to bad paths.
* engine.cc (impl_region_model_context::on_phi): New.
* exploded-graph.h (impl_region_model_context::on_phi): New decl.
* region-model.cc (region_model::on_longjmp): Likewise.
(region_model::handle_phi): Add phi param. Call the ctxt's on_phi
vfunc.
(region_model::update_for_phis): Pass phi to handle_phi.
* region-model.h (region_model::handle_phi): Add phi param.
(region_model_context::on_phi): New vfunc.
(test_region_model_context::on_phi): New.
* sm-malloc.cc (malloc_state_machine::on_phi): New.
(malloc_state_machine::on_zero_assignment): New.
* sm.h (state_machine::on_phi): New vfunc.
gcc/testsuite/ChangeLog:
PR analyzer/93544
* gcc.dg/analyzer/torture/pr93544.c: New test.
PR analyzer/93546 reports an ICE within region_model::add_region_for_type
when merging two region_models each containing a label pointer. The
two labels are stored as pointers to symbolic_regions, but these regions
were created with NULL type, leading to an assertion failure when a
merged copy is created.
The labels themselves have void (but not NULL) type.
This patch updates make_region_for_type to use the type of the decl when
creating such regions, rather than implicitly setting the region's type
to NULL, fixing the ICE.
gcc/analyzer/ChangeLog:
PR analyzer/93546
* region-model.cc (region_model::on_call_pre): Update for new
param of symbolic_region ctor.
(region_model::deref_rvalue): Likewise.
(region_model::add_new_malloc_region): Likewise.
(make_region_for_type): Likewise, preserving type.
* region-model.h (symbolic_region::symbolic_region): Add "type"
param and pass it to base class ctor.
gcc/testsuite/ChangeLog:
PR analyzer/93546
* gcc.dg/analyzer/pr93546.c: New test.
This un-documents constraints that cannot (or should not) be used in
inline assembler. It also improves markup, and presentation in general.
More work is needed, but gradual improvement is easier to do.
* config/rs6000/constraints.md: Improve documentation.
* doc/md.texi (PowerPC and IBM RS6000): Improve documentation.
The t-arm make fragment currently uses 'mv' to update some files that
are automatically regenerated, but this causes problems on read-only
filesystems if the date stamps are incorrect and the files have not
really changed. So use move-if-change instead.
PR target/93548
* config/arm/t-arm: ($(srcdir)/config/arm/arm-tune.md,
$(srcdir)/config/arm/arm-tables.opt): Use move-if-change.
The C front-end fixed this issue in r257620 by adding a DECL_EXPR from
grokdeclarator. We don't have an easy way to do that in the C++ front-end,
but it works fine to create and prepend a DECL_EXPR when we are genericizing
the NOP_EXPR for the cast.
The C patch wraps the DECL_EXPR in a BIND_EXPR, but that seems unnecessary
in C++; this is just a hook to run gimplify_type_sizes, we aren't actually
declaring anything that we need to worry about scoping for.
PR c++/88256
* cp-gimplify.c (predeclare_vla): New.
(cp_genericize_r) [NOP_EXPR]: Call it.
This is a patch for an issue where the compiler was generating a conditional
branch in Thumb2, which was too far for b{cond} to handle.
This was originally reported at binutils:
https://sourceware.org/bugzilla/show_bug.cgi?id=24991
And then raised for GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91816
As can be seen here:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihfddaf.html
the range of a 32-bit Thumb B{cond} is +/-1MB.
This is now checked for in arm.md and an unconditional branch is generated if
the jump would be greater than 1MB.
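The range check can be sketched as a simple predicate. The bounds are an assumption based on the 32-bit Thumb-2 conditional branch encoding (T3), whose signed, halfword-aligned offset spans -1048576 to +1048574 bytes, i.e. about +/-1MB. The helper name is hypothetical and does not reflect how arm.md expresses the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed range of the 32-bit Thumb-2 B<cond> (encoding T3): a signed,
   halfword-aligned offset from -1048576 to +1048574 bytes.  */
#define T2_BCOND_MIN (-1048576L)
#define T2_BCOND_MAX 1048574L

/* Hypothetical helper: true if a conditional branch can reach a target
   OFFSET bytes away directly; otherwise a far branch is needed.  */
static bool
bcond_in_range (int32_t offset)
{
  return offset >= T2_BCOND_MIN && offset <= T2_BCOND_MAX;
}
```

When the check fails, the usual technique is to invert the condition so that it skips over an unconditional 32-bit `B`, whose range is about +/-16MB. That layout is assumed here as the common approach rather than taken from arm_gen_far_branch itself.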
gcc/ChangeLog
2020-02-03 Stam Markianos-Wright <stam.markianos-wright@arm.com>
PR target/91816
* config/arm/arm-protos.h: New function arm_gen_far_branch prototype.
* config/arm/arm.c (arm_gen_far_branch): New function
arm_gen_far_branch.
* config/arm/arm.md: Update b<cond> for Thumb2 range checks.
gcc/testsuite/ChangeLog
2020-02-03 Stam Markianos-Wright <stam.markianos-wright@arm.com>
PR target/91816
* gcc.target/arm/pr91816.c: New test.
The following testcase started to ICE when .POPCOUNT matching was added
to match.pd; we had __builtin_popcount*, but nothing used the
popcounthi2 expander before.
The problem is that the popcounthi2_z196 expander doesn't emit valid RTL:
error: unrecognizable insn:
(insn 138 137 139 27 (set (reg:SI 190)
(ashift:SI (reg:HI 95 [ _105 ])
(const_int 8 [0x8]))) -1
(nil))
during RTL pass: vregs
The following patch is an attempt to fix that. I've also tried to simplify
it slightly: since we need to either zero extend or mask the result at the
end anyway, to keep bits from above HImode from affecting it, it makes no
sense to me to perform
(x + (x << 8)) >> 8
when we can do
(x + (x >> 8)) & 0xff (or zero extension).
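The arithmetic can be checked in plain C. The sketch assumes, as the expander does, that the z196 POPCNT result holds the population count of each byte in that byte. `popcnt_per_byte` is a hypothetical software emulation of that behaviour, and `popcount16` applies the `(x + (x >> 8)) & 0xff` combination. Each byte count is at most 8, so their sum (at most 16) cannot carry out of the low byte.

```c
#include <stdint.h>

/* Hypothetical emulation of a per-byte population count: each byte of
   the result holds the number of set bits in the corresponding byte
   of the input.  */
static uint16_t
popcnt_per_byte (uint16_t x)
{
  uint16_t result = 0;
  for (int byte = 0; byte < 2; byte++)
    {
      unsigned b = (x >> (8 * byte)) & 0xff;
      unsigned count = 0;
      while (b)
        {
          count += b & 1;
          b >>= 1;
        }
      result |= (uint16_t) (count << (8 * byte));
    }
  return result;
}

/* HImode popcount: add the high byte's count into the low byte, then
   keep only the low byte.  */
static unsigned
popcount16 (uint16_t x)
{
  uint16_t p = popcnt_per_byte (x);
  return (p + (p >> 8)) & 0xff;
}
```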
2020-02-03 Jakub Jelinek <jakub@redhat.com>
PR target/93533
* config/s390/s390.md (popcounthi2_z196): Fix up expander to emit
valid RTL to sum up the lowest and second lowest bytes of the popcnt
result.
* gcc.c-torture/compile/pr93533.c: New test.
* gcc.target/s390/pr93533.c: New test.
gcc/cp
* coroutines.cc (transform_await_wrapper): Set actor function as
new context of label_decl.
(build_actor_fn): Fill new field of await_xform_data.
gcc/testsuite
* g++.dg/coroutines/co-await-04-control-flow.C: Add label.
This fixes an ICE taking place in cp_default_conversion because we got
a SCOPE_REF that doesn't have a type and so checking
INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P (TREE_TYPE (exp)) will crash.
This happens since Joseph's recent change in decl_attributes whereby
we no longer skip C++11 attributes on types.
[dcl.align] is clear that alignas applied to a function is ill-formed.
That should be fixed, and we have PR90847 for that. But I think a more
appropriate fix at this stage would be the following: in a template we
want to splice dependent attributes and save them for later, and by
doing so avoid this crash.
PR c++/93530 - ICE on invalid alignas in a template.
* decl.c (grokdeclarator): Call cplus_decl_attributes instead of
decl_attributes.
* g++.dg/cpp0x/alignas18.C: New test.
This test explicitly tests for code generation that expects a
common section.
gcc/testsuite/ChangeLog:
2020-02-02 Iain Sandoe <iain@sandoe.co.uk>
* gcc.target/powerpc/darwin-abi-12.c: Add '-fcommon' to the
options.
2020-02-02 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/91333
* ira-color.c (struct allocno_color_data): Add member
hard_reg_prefs.
(init_allocno_threads): Set the member up.
(bucket_allocno_compare_func): Add compare hard reg
prefs.
2020-02-02 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/91333
* gcc.target/i386/pr91333.c: Add vmovsd to regexp. Set up count
to 3.
The following patch fixes
-FAIL: libgomp.fortran/use_device_addr-1.f90 -O0 execution test
-FAIL: libgomp.fortran/use_device_addr-2.f90 -O0 execution test
which have been FAILing for several months on powerpc64le-linux.
The problem is in the Fortran FE, which adds the artificial arguments
for scalar VALUE OPTIONAL dummy args only to DECL_ARGUMENTS where the
current function can see them, but not to TYPE_ARG_TYPES; if those functions
aren't varargs, this confuses calls.c into passing the remaining arguments
(which aren't named (== not covered by TYPE_ARG_TYPES) and aren't varargs
either) in a different spot from what the callee (which has proper
DECL_ARGUMENTS for all args) expects. For the artificial length arguments
for character dummy args we already put them in both DECL_ARGUMENTS and
TYPE_ARG_TYPES.
2020-02-01 Jakub Jelinek <jakub@redhat.com>
PR fortran/92305
* trans-types.c (gfc_get_function_type): Also push boolean_type_node
types for non-character scalar VALUE optional dummy arguments.
* trans-decl.c (create_function_arglist): Skip those in
hidden_typelist. Formatting fix.
On nios2-linux-gnu, there has been a long-standing bug in C++ exception
handling that sometimes resulted in link errors like
../nios2-linux-gnu/bin/ld: FDE encoding in /tmp/cccfpQ2l.o(.eh_frame) prevents .eh_frame_hdr table being created
when building some shared libraries or PIE executables. The root of
the problem is that GCC was incorrectly emitting an absolute encoding
in EH tables for PIC. This patch changes it to use either
DW_EH_PE_indirect (for global) or DW_EH_PE_datarel (for local), and
fixes libgcc so it can find the address of the GOT as the base address
for DW_EH_PE_datarel.
Complicating matters somewhat, GAS was missing support for
%gotoff(symbol) relocation syntax. I have just pushed a fix for that,
but I've added a configure check to test for presence of the binutils
support and fall back to the current absolute encoding (which works
most of the time) if it is not available. Once the fix makes it into
an official binutils release it might be appropriate to make this
error out instead.
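The encoding choice described above can be modelled as a small function. The DW_EH_PE_* values are the standard DWARF exception-handling pointer-encoding bits. The function itself is hypothetical: combining `DW_EH_PE_sdata4` with the other bits, and the exact shape of the fallback, are assumptions rather than a copy of ASM_PREFERRED_EH_DATA_FORMAT in nios2.h.

```c
/* Standard DWARF EH pointer-encoding bits (subset).  */
enum
{
  DW_EH_PE_absptr = 0x00,
  DW_EH_PE_sdata4 = 0x0b,
  DW_EH_PE_datarel = 0x30,
  DW_EH_PE_indirect = 0x80
};

/* Hypothetical model of the choice: GLOBAL_P mirrors the GLOBAL argument
   of ASM_PREFERRED_EH_DATA_FORMAT, and HAVE_GOTOFF stands in for
   HAVE_AS_NIOS2_GOTOFF_RELOCATION.  */
static int
eh_data_format_sketch (int flag_pic, int global_p, int have_gotoff)
{
  /* Non-PIC code, or an assembler without %gotoff support: keep the
     old absolute encoding as the fallback.  */
  if (!flag_pic || !have_gotoff)
    return DW_EH_PE_absptr;
  /* Global symbols: load the address indirectly through a GOT-relative
     slot.  */
  if (global_p)
    return DW_EH_PE_indirect | DW_EH_PE_datarel | DW_EH_PE_sdata4;
  /* Local symbols: an offset from the GOT base (datarel).  */
  return DW_EH_PE_datarel | DW_EH_PE_sdata4;
}
```

The datarel forms are only usable if the unwinder can find the data base address, which is why libgcc also has to locate the GOT for nios2.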
Since this is a wrong-code bug and affects only nios2 target, I think
this is appropriate for Stage 4. I regression-tested on both
nios2-linux-gnu and nios2-elf, with and without the binutils support
present, before committing this.
2020-01-31 Sandra Loosemore <sandra@codesourcery.com>
gcc/
* configure.ac [nios2-*-*]: Check HAVE_AS_NIOS2_GOTOFF_RELOCATION.
* config.in: Regenerated.
* configure: Regenerated.
* config/nios2/nios2.h (ASM_PREFERRED_EH_DATA_FORMAT): Fix handling
for PIC when HAVE_AS_NIOS2_GOTOFF_RELOCATION.
(ASM_MAYBE_OUTPUT_ENCODED_ADDR_RTX): New.
gcc/testsuite/
* g++.target/nios2/hello-pie.C: New.
* g++.target/nios2/nios2.exp: New.
libgcc/
* config.host [nios2-*-linux*] (tmake_file, tm_file): Adjust.
* config/nios2-elf-lib.h: New.
* unwind-dw2-fde-dip.c (_Unwind_IteratePhdrCallback): Use existing
code for finding GOT base for nios2.
This commit:
commit e7c26e04b2 (tjteru/master)
Date: Wed Jan 22 14:54:26 2020 +0000
gcc: Add new configure options to allow static libraries to be selected
contains a couple of issues. First, I failed to regenerate all of the
configure files that it should have regenerated. Second, there was a
mistake in lib-link.m4: one of the conditions didn't use pure sh
syntax. I wrote this:
if x$lib_type = xauto || x$lib_type = xshared; then
When I should have written this:
if test "x$lib_type" = "xauto" || test "x$lib_type" = "xshared"; then
These issues were raised on the mailing list in these messages:
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01827.html
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01921.html
config/ChangeLog:
* lib-link.m4 (AC_LIB_LINKFLAGS_BODY): Update shell syntax.
gcc/ChangeLog:
* configure: Regenerate.
intl/ChangeLog:
* configure: Regenerate.
libcpp/ChangeLog:
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* configure: Regenerate.
sizeof a VLA type is not a constant in C or the GNU C++ extension, so we
need to capture the VLA even in unevaluated context. For PR60855 we stopped
looking through a previous capture, but we also need to capture the first
time the variable is mentioned.
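A short C illustration of why sizeof a VLA type is not a constant: the value depends on the bound at the moment the expression executes. The same holds for the GNU C++ VLA extension, which is why a lambda that only mentions the VLA inside sizeof still needs the capture. `vla_bytes` is a hypothetical helper for illustration.

```c
#include <stddef.h>

/* sizeof applied to a VLA type is evaluated at run time from the
   current value of N (N must be positive).  */
static size_t
vla_bytes (int n)
{
  typedef int row[n];  /* variably modified type */
  return sizeof (row); /* computed from n when this line executes */
}
```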
PR c++/86216
* semantics.c (process_outer_var_ref): Capture VLAs even in
unevaluated context.
The remaining low-hanging fruit for improvement on memory consumption in the
14179 testcase was the duplication of the CONSTRUCTOR for the array by
reshape_init. This patch changes reshape_init to reuse a single constructor
for an array of non-aggregate type such as the one in the testcase.
PR c++/14179
* decl.c (reshape_init_array_1): Reuse a single CONSTRUCTOR with
non-aggregate elements.
(reshape_init_array): Add first_initializer_p parm.
(reshape_init_r): Change first_initializer_p from bool to tree.
(reshape_init): Pass init to it.
PR14179 and the C counterpart PR12245 are about memory consumption of very
large file-scope arrays. Recently, location wrappers increased memory
consumption significantly: in an array of integer constants, each one will
have a location wrapper, which added up to over 500MB in the 14179
testcase. For this kind of testcase, tracking these locations isn't worth
the cost, so this patch turns the wrappers off after 256 elements; code
with an array that large is unlikely to care about the location of
individual integer constants.
PR c++/14179
* parser.c (cp_parser_initializer_list): Suppress location wrappers
after 256 elements.
gcc/analyzer/ChangeLog:
PR analyzer/93457
* region-model.cc (make_region_for_type): Use VOID_TYPE_P rather
than checking against void_type_node.
gcc/testsuite/ChangeLog:
PR analyzer/93457
* gcc.dg/analyzer/pr93457.c: New test.
gcc/analyzer/ChangeLog:
PR analyzer/93373
* region-model.cc (ASSERT_COMPAT_TYPES): Convert to...
(assert_compat_types): ...this, and bail when either type is NULL,
or when VOID_TYPE_P (dst_type).
(region_model::get_lvalue): Update for above conversion.
(region_model::get_rvalue): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/93373
* gcc.dg/analyzer/torture/pr93373.c: New test.
PR analyzer/93379 reports an ICE within
region_model::update_for_return_superedge when writing the
returned svalue_id to the lhs of the call_stmt.
The root cause is that this analyzer code assumed that for any call
with a non-NULL gimple_call_lhs, the called fndecl would have non-void
return type, and thus that a non-null svalue_id would be returned from
region_model::pop_frame. This isn't the case e.g. for a call with
conflicting types where the callee returns void but the caller assumes
int.
This patch fixes the ICE by moving the check for null result so that
it also guards setting the lhs.
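The shape of the fix can be sketched with stand-in types. `svalue_id_sketch` and `update_for_return_sketch` are hypothetical, not the analyzer's types: a negative id plays the role of a null svalue_id. The point is that the null-result check now sits in front of the lhs write, so a void callee reached through a call with an lhs is handled safely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: a negative id represents a null svalue_id
   (the callee returned no value).  */
typedef struct { int id; } svalue_id_sketch;

static bool
svalue_null_p (svalue_id_sketch sid)
{
  return sid.id < 0;
}

/* Write the callee's result into the caller's lhs only when there is a
   result.  A callee declared void (e.g. via conflicting declarations)
   yields none even if the call statement has an lhs.  */
static void
update_for_return_sketch (svalue_id_sketch result, int *lhs_slot,
                          bool has_lhs)
{
  if (svalue_null_p (result))
    return;  /* the check now guards the lhs write too */
  if (has_lhs && lhs_slot != NULL)
    *lhs_slot = result.id;
}
```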
gcc/analyzer/ChangeLog:
PR analyzer/93379
* region-model.cc (region_model::update_for_return_superedge):
Move check for null result so that it also guards setting the
lhs.
gcc/testsuite/ChangeLog:
PR analyzer/93379
* gcc.dg/analyzer/torture/pr93379-2.c: New test.
* gcc.dg/analyzer/torture/pr93379.c: New test.
PR analyzer/93438 reports an ICE when merging two region_models
in which an older stack frame has a local pointing to a local in
a more recent stack frame.
  stack
    older frame
      int *: "ow" --+
                    |
    newer frame     |
      int: "pk" <---+
The root cause is that the state-merging code assumes that all frame
regions in the merged model have already been created.
stack_region::can_merge_p iterates through the frames, creating
and populating each merged frame in turn, so when it attempts to
populate the older frame, it attempts to reference the newer frame in
the merged model, which doesn't exist yet.
This patch reworks stack_region::can_merge_p to use a two-pass approach
in which all frames in the merged model are created first, and then
are all populated, fixing the bug.
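The difference between the one-pass and two-pass merges can be reproduced with a toy model. This is a hypothetical simplification, not stack_region::can_merge_p. Each frame holds one local that may point into another frame (by index, or -1 for none), and frame 0 is the older one. Precondition: every entry of `points_to` is -1 or less than `n`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct frame_sketch
{
  bool created;
  int target;  /* index of the frame the local points into, or -1 */
};

/* One pass: frame I is created and populated before frame I+1 exists,
   so a pointer from an older frame into a newer one cannot resolve.  */
static bool
merge_one_pass (const int *points_to, size_t n, struct frame_sketch *out)
{
  memset (out, 0, n * sizeof *out);
  for (size_t i = 0; i < n; i++)
    {
      out[i].created = true;
      out[i].target = -1;
      int t = points_to[i];
      if (t >= 0)
        {
          if (!out[t].created)
            return false;  /* newer frame does not exist yet */
          out[i].target = t;
        }
    }
  return true;
}

/* Two passes: create every frame first, then populate them.  */
static bool
merge_two_pass (const int *points_to, size_t n, struct frame_sketch *out)
{
  for (size_t i = 0; i < n; i++)
    {
      out[i].created = true;
      out[i].target = -1;
    }
  for (size_t i = 0; i < n; i++)
    {
      int t = points_to[i];
      if (t < 0)
        continue;
      if (!out[t].created)
        return false;
      out[i].target = t;
    }
  return true;
}
```

With `points_to = {1, -1}` (the "ow" local in the older frame pointing at "pk" in the newer one), the one-pass version fails while the two-pass version resolves the pointer.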
gcc/analyzer/ChangeLog:
PR analyzer/93438
* region-model.cc (stack_region::can_merge_p): Split into a two
pass approach, creating all stack regions first, then populating
them.
(selftest::test_state_merging): Add test coverage for (a) the case
of self-merging a model in which a local in an older stack frame
points to a local in a more recent stack frame (which previously
would ICE), and (b) the case of self-merging a model in which a
local points to a global (which previously worked OK).
gcc/testsuite/ChangeLog:
PR analyzer/93438
* gcc.dg/analyzer/torture/pr93438.c: New test.
* gcc.dg/analyzer/torture/pr93438-2.c: New test.
The test FAILs on i686-linux with:
FAIL: g++.dg/pr91838.C (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:8: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
/home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:3: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
and on x86_64-linux with -m32 testing with failure to match the
expected pattern in there (or both with e.g. -m32/-mno-mmx/-mno-sse testing).
The test is also in the wrong directory and uses a non-standard way of
specifying that it requires C++11 or later.
2020-01-31 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/91838
* g++.dg/pr91838.C: Moved to ...
* g++.dg/opt/pr91838.C: ... here. Require c++11 target instead of
dg-skip-if for c++98. Pass -Wno-psabi -w to avoid psabi style
warnings on vector arg passing or return. Add -masm=att on i?86/x86_64.
Only check for pxor %xmm0, %xmm0 on lp64 i?86/x86_64.
This patch adds support for the SVE intrinsics that map to Armv8.6
bfloat16 instructions. This means that svcvtnt is now a base SVE
function for one type suffix combination; the others are still
SVE2-specific.
This relies on a binutils fix:
https://sourceware.org/ml/binutils/2020-01/msg00450.html
so anyone testing with older binutils 2.34 or binutils master sources will
need to upgrade to get clean test results. (At the time of writing,
no released version of binutils has this bug.)
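For readers new to bfloat16: it keeps the sign, the 8-bit exponent and the top 7 mantissa bits of an IEEE single, so converting f32 to bf16 (the new svcvt form) is a rounding of the upper 16 bits. The sketch below assumes round-to-nearest-even (the default rounding mode) and quietens NaNs. It is a portable illustration, not a statement of the exact BFCVT semantics in every FPCR configuration.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of an f32 -> bf16 conversion with round-to-nearest-even.
   Returns the raw 16-bit bfloat16 encoding.  */
static uint16_t
f32_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);              /* type-pun safely */
  if ((bits & 0x7fffffffu) > 0x7f800000u)       /* NaN: keep it a NaN */
    return (uint16_t) ((bits >> 16) | 0x0040u); /* set the quiet bit */
  uint32_t lsb = (bits >> 16) & 1u;             /* ties go to even */
  bits += 0x7fffu + lsb;
  return (uint16_t) (bits >> 16);
}
```

For example, 1.0f (0x3F800000) becomes 0x3F80. The halfway value 1.00390625f rounds down to the even 0x3F80, and 1.01171875f rounds up to 0x3F82.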
2020-01-31 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.h (TARGET_SVE_BF16): New macro.
* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtnt): Move to
aarch64-sve-builtins-base.h.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtnt): Move to
aarch64-sve-builtins-base.cc.
* config/aarch64/aarch64-sve-builtins-base.h (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins-base.cc (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): New functions.
* config/aarch64/aarch64-sve-builtins-base.def (svbfdot, svbfdot_lane)
(svbfmlalb, svbfmlalb_lane, svbfmlalt, svbfmlalt_lane, svbfmmla)
(svcvtnt): New functions.
(svcvt): Add a form that converts f32 to bf16.
* config/aarch64/aarch64-sve-builtins-shapes.h (ternary_bfloat)
(ternary_bfloat_lane, ternary_bfloat_lanex2, ternary_bfloat_opt_n):
Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type):
Treat B as bfloat16_t.
(ternary_bfloat_lane_base): New class.
(ternary_bfloat_def): Likewise.
(ternary_bfloat): New shape.
(ternary_bfloat_lane_def): New class.
(ternary_bfloat_lane): New shape.
(ternary_bfloat_lanex2_def): New class.
(ternary_bfloat_lanex2): New shape.
(ternary_bfloat_opt_n_def): New class.
(ternary_bfloat_opt_n): New shape.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_cvt_bfloat): New macro.
* config/aarch64/aarch64-sve.md (@aarch64_sve_<sve_fp_op>vnx4sf)
(@aarch64_sve_<sve_fp_op>_lanevnx4sf): New patterns.
(@aarch64_sve_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>)
(@cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
(*cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>): Likewise.
(@aarch64_sve_cvtnt<VNx8BF_ONLY:mode>): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_cvtnt<mode>): Key
the pattern off the narrow mode instead of the wider one.
* config/aarch64/iterators.md (VNx8BF_ONLY): New mode iterator.
(UNSPEC_BFMLALB, UNSPEC_BFMLALT, UNSPEC_BFMMLA): New unspecs.
(sve_fp_op): Handle them.
(SVE_BFLOAT_TERNARY_LONG): New int iterator.
(SVE_BFLOAT_TERNARY_LONG_LANE): Likewise.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_asm_bf16_ok):
New proc.
* gcc.target/aarch64/sve/acle/asm/bfdot_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/bfdot_lane_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bfmlalb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bfmlalb_lane_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bfmlalt_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bfmlalt_lane_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cvt_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cvtnt_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_lanex2_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c:
Likewise.