Commit Graph

187040 Commits

Author SHA1 Message Date
Jonathan Wright
3bc9db6a98 simplify-rtx: Push sign/zero-extension inside vec_duplicate
As a general principle, vec_duplicate should be as close to the root
of an expression as possible. Where unary operations have
vec_duplicate as an argument, these operations should be pushed
inside the vec_duplicate.

This patch modifies unary operation simplification to push
sign/zero-extension of a scalar inside vec_duplicate.

This patch also updates all RTL patterns in aarch64-simd.md to use
the new canonical form.

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd.md: Push sign/zero-extension
	inside vec_duplicate for all patterns.
	* simplify-rtx.c (simplify_context::simplify_unary_operation_1):
	Push sign/zero-extension inside vec_duplicate.
2021-07-27 10:42:33 +01:00
Thomas Schwinge
d88a695158 Don't use libgomp 'cbuf' buffering with OpenACC 'async'
The host data might not be computed yet (by an earlier asynchronous compute
region, for example.

	libgomp/
	* target.c (gomp_coalesce_buf_add): Update comment.
	(gomp_copy_host2dev, gomp_map_vars_internal): Don't expect to see
	'aq && cbuf'.
	(gomp_map_vars_internal): Only 'if (!aq)', do
	'gomp_coalesce_buf_add'.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c: Remove
	XFAIL.

Co-Authored-By: Julian Brown <julian@codesourcery.com>
2021-07-27 11:16:37 +02:00
Julian Brown
9c41f5b9cd Fix OpenACC "ephemeral" asynchronous host-to-device copies
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.

An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html

and previous versions of this patch were posted here (for mainline/og9):

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html

libgomp/
	* libgomp.h (gomp_copy_host2dev): Update prototype.
	* oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new
	argument to gomp_copy_host2dev (false).
	* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
	(copy_data): Don't free src.
	(queue_push_copy): Remove free_src handling.
	(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
	(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data
	snapshotting.
	(GOMP_OFFLOAD_openacc_async_dev2host): Update call to
	queue_push_copy.
	* target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter.
	(gomp_copy_host2dev): Add EPHEMERAL parameter.  Snapshot source
	data when true, and set up deferred freeing of temporary buffer.
	(gomp_copy_dev2host): Update call to goacc_device_copy_async.
	(gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer)
	(gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update
	calls to gomp_copy_host2dev with appropriate ephemeral argument.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove
	XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-07-27 11:16:27 +02:00
Thomas Schwinge
88c40c36db Add 'libgomp.oacc-c-c++-common/async-data-1-{1,2}.c'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: New file.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c: Likewise.

Co-Authored-By: Tom de Vries <tom@codesourcery.com>
2021-07-27 11:16:26 +02:00
Thomas Schwinge
29ddaf43f7 [OpenACC] Clarify sequencing of 'async' data copying vs. profiling events in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'
... as noticed with GCN offloading.

Fix-up for r271346 (commit 5fae049dc2)
"OpenACC Profiling Interface (incomplete)".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Clarify
	sequencing of 'async' data copying vs. profiling events.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
	Likewise.
2021-07-27 11:16:25 +02:00
Thomas Schwinge
599e275d7e Fix OpenACC 'async'/'wait' issues in 'libgomp.oacc-c-c++-common/lib-{94,95}.c', 'libgomp.oacc-fortran/lib-16{,-2}.f90'
Fix-up for r265842 (commit 58168bbf6f)
"[OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/lib-94.c: Fix OpenACC
	'async'/'wait' issue.
	* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-16-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-16.f90: Likewise.

Co-Authored-By: Julian Brown <julian@codesourcery.com>
2021-07-27 11:16:24 +02:00
Richard Biener
66030d68a7 tree-optimization/101573 - improve uninit warning at -O0
We can improve uninit warnings from the early pass by looking
at PHI arguments on fallthru edges that are uninitialized and
have uses that are before a possible loop exit.  This catches
some cases earlier that we'd only warn in a more confusing
way after early inlining as seen by testcase adjustments.

It introduces

FAIL: gcc.dg/uninit-23.c (test for excess errors)

where we additionally warn

gcc.dg/uninit-23.c:21:13: warning: 't4' is used uninitialized [-Wuninitialized]

which I think is OK even if it's not obvious that the new
warning is an improvement when you look at the obvious source.

Somehow for all cases I never get the `'foo' was declared here`
notes, I didn't dig why that happens but it's odd.

2021-07-22  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101573
	* tree-ssa-uninit.c (warn_uninit_phi_uses): New function
	looking at uninitialized PHI arg defs in some constrained cases.
	(warn_uninitialized_vars): Call it.
	(execute_early_warn_uninitialized): Calculate dominators.

	* gcc.dg/uninit-pr101573.c: New testcase.
	* gcc.dg/uninit-15-O0.c: Adjust.
	* gcc.dg/uninit-15.c: Likewise.
	* gcc.dg/uninit-23.c: Likewise.
	* c-c++-common/uninit-17.c: Likewise.
2021-07-27 10:46:22 +02:00
Richard Biener
c8ce54c6e6 tree-optimization/39821 - fix cost classification for widening arith
This adjusts the vectorizer to cost vector_stmt for widening
arithmetic instead of vec_promote_demote in the line of telling
the target that stmt_info->stmt is the meaningful piece we cost.

2021-07-27  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/39821
	* tree-vect-stmts.c (vect_model_promotion_demotion_cost): Use
	vector_stmt for widening arithmetic.
	(vectorizable_conversion): Adjust.
2021-07-27 10:41:43 +02:00
Martin Jambor
13586172d0
ipa: Adjust references to identify read-only globals
this patch has been motivated by SPEC 2017's 544.nab_r in which there is
a static variable which is never written to and so zero throughout the
run-time of the benchmark.  However, it is passed by reference to a
function in which it is read and (after some multiplications) passed
into __builtin_exp which in turn unnecessarily consumes almost 10% of
the total benchmark run-time.  The situation is illustrated by the added
testcase remref-3.c.

The patch adds a flag to ipa-prop descriptor of each parameter to mark
such parameters.  IPA-CP and inling then take the effort to remove
IPA_REF_ADDR references in the caller and only add IPA_REF_LOAD
reference to the clone/overall inlined function.  This is sufficient
for subsequent symbol table analysis code to identify the read-only
variable as such and optimize the code.

There are two changes from the RFC version posted to the list earlier.
First, three missing calls to get_base_address were added (there was
another one in an assert).  Second, references are not stripped off
the callers if the cloned function cannot change the signature.  The
second change reveals a real shortcoming stemming from the fact we
cannot adjust function prototypes with fnspecs.  But that is a more
general problem.

gcc/ChangeLog:

2021-07-20  Martin Jambor  <mjambor@suse.cz>

	* cgraph.h (ipa_replace_map): New field force_load_ref.
	* ipa-prop.h (ipa_param_descriptor): Reduce precision of move_cost,
	aded new flag load_dereferenced, adjusted comments.
	(ipa_get_param_dereferenced): New function.
	(ipa_set_param_dereferenced): Likewise.
	* cgraphclones.c (cgraph_node::create_virtual_clone): Follow it.
	* ipa-cp.c: Include gimple.h.
	(ipcp_discover_new_direct_edges): Take into account dereferenced flag.
	(get_replacement_map): New parameter force_load_ref, set the
	appropriate flag in ipa_replace_map if set.
	(struct symbol_and_index_together): New type.
	(adjust_refs_in_act_callers): New function.
	(adjust_references_in_caller): Likewise.
	(create_specialized_node): When appropriate, call
	adjust_references_in_caller and force only load references.
	* ipa-prop.c (load_from_dereferenced_name): New function.
	(ipa_analyze_controlled_uses): Also detect loads from a
	dereference, harden testing of call statements.
	(ipa_write_node_info): Stream the dereferenced flag.
	(ipa_read_node_info): Likewise.
	(ipa_set_jf_constant): Also create refdesc when jump function
	references a variable.
	(cgraph_node_for_jfunc): Rename to symtab_node_for_jfunc, work
	also on references of variables and return a symtab_node.  Adjust
	all callers.
	(propagate_controlled_uses): Also remove references to VAR_DECLs.

gcc/testsuite/ChangeLog:

2021-06-29  Martin Jambor  <mjambor@suse.cz>

	* gcc.dg/ipa/remref-3.c: New test.
	* gcc.dg/ipa/remref-4.c: Likewise.
	* gcc.dg/ipa/remref-5.c: Likewise.
	* gcc.dg/ipa/remref-6.c: Likewise.
2021-07-27 10:03:17 +02:00
Jakub Jelinek
a21bd3cebd gimple-fold: Fix up __builtin_clear_padding on classes with virtual inheritence [PR101586]
For the following testcase, B is 16-byte type, containing 8-byte
virtual pointer and 1-byte A member, and C contains two FIELD_DECLs,
one with B type and size of just 8-byte and then a field with type
A and 1-byte size.
The __builtin_clear_padding code was upset about the B typed FIELD_DECL
containing FIELD_DECLs beyond the field size and triggered
assertion failure.
This patch makes it ignore all FIELD_DECLs that are (fully) beyond the sz
passed from the caller (except for the flexible array member
diagnostics that is kept).

2021-07-27  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/101586
	* gimple-fold.c (clear_padding_type): Ignore FIELD_DECLs with byte
	positions above or equal to sz except for diagnostics of flexible
	array members.

	* g++.dg/torture/builtin-clear-padding-4.C: New test.
2021-07-27 09:59:37 +02:00
Michael Meissner
5485e820cd PR 100170: Fix eq/ne tests on power10.
This patch updates eq/ne tests in the testsuite to adjust the test if
power10 code generation is used.

2021-07-26  Michael Meissner  <meissner@linux.ibm.com>

gcc/testsuite/
	PR testsuite/100170
	* gcc.target/powerpc/ppc-eq0-1.c: Adjust insn counts if power10
	code is generated.
	* gcc.target/powerpc/ppc-ne0-1.c: (ne0): Adjust insn counts if
	power10 code is generated.
	(plus_ne0): Move to ppc-ne0-2.c.
	(cmp_plus_ne): Likewise.
	(plus_ne0_cmp): Likewise.
	* gcc.target/powerpc/ppc-ne0-2.c: New file.
2021-07-26 21:32:39 -04:00
GCC Administrator
1a7febe943 Daily bump. 2021-07-27 00:16:27 +00:00
Andrew MacLeod
d5a8c13827 Confirm and Handle only ASCII in toupper and tolower ranges.
PR tree-optimization/78888
	* gimple-range-fold.cc (get_letter_range): New.
	(fold_using_range::range_of_builtin_call): Call get_letter_range.
2021-07-26 17:30:25 -04:00
David Malcolm
3a1d168e9e analyzer: fix uninit false +ve when returning structs
This patch fixes some false positives from
 -Wanalyzer-use-of-uninitialized-value
when returning structs from functions (seen on the Linux kernel).

gcc/analyzer/ChangeLog:
	* region-model.cc (region_model::on_call_pre): Always set conjured
	LHS, not just for SSA names.

gcc/testsuite/ChangeLog:
	* gcc.dg/analyzer/sock-1.c: New test.
	* gcc.dg/analyzer/sock-2.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-07-26 15:25:00 -04:00
Andrew MacLeod
1ce0b26e6e Adjust ranges for to_upper and to_lower.
Exclude lower case chars from to_upper and upper case chars from to_lower.

	gcc/
	PR tree-optimization/78888
	* gimple-range-fold.cc (fold_using_range::range_of_builtin_call): Add cases
	for CFN_BUILT_IN_TOUPPER and CFN_BUILT_IN_TOLOWER.

	gcc/testsuite/
	* gcc.dg/pr78888.c: New.
2021-07-26 14:14:30 -04:00
Roger Sayle
cf5f544227 Fold bswap32(x) != 0 to x != 0 (and related transforms)
This patch to match.pd implements several closely related folding
simplifications at the tree-level, that make use of the property
that bit permutation functions, rotate and bswap have inverses.

[1]	bswap(X) eq/ne C, for constant C, simplifies to X eq/ne C'
	where C'=bswap(C), generalizing the transform in the subject.
[2]	bswap(X) eq/ne bswap(Y) simplifies to X eq/ne Y.
[3]	lrotate(X,C1) eq/ne C2 simplifies to X eq/ne C3, where
	C3 = rrotate(C2,C1), i.e. apply the inverse rotation to C2.
[4]	Likewise, rrotate(X,C1) eq/ne C2 simplifies to X eq/ne C3,
	where C3 = lrotate(C2,C1).
[5]	rotate(X,Z) eq/ne rotate(Y,Z) simplifies to X eq/ne Y, when
	the bit-count Z (the same on both sides) has no side-effects.
[6]	rotate(X,Y) eq/ne 0 simplifies to X eq/ne 0 if Y has no
	side-effects.
[7]	Likewise, rotate(X,Y) eq/ne -1 simplifies to X eq/ne -1,
	if Y has no side-effects.

2010-07-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Marc Glisse  <marc.glisse@inria.fr>

gcc/ChangeLog
	* match.pd (rotate): Simplify equality/inequality of rotations.
	(bswap): Simplify equality/inequality tests of byte swapping.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-eqrotate-1.c: New test case.
	* gcc.dg/fold-eqbswap-1.c: New test case.
2021-07-26 17:30:26 +01:00
Joseph Myers
44e322f432 Regenerate .pot files.
gcc/po/
	* gcc.pot: Regenerate.

libcpp/po/
	* cpplib.pot: Regenerate.
2021-07-26 15:27:23 +00:00
Aldy Hernandez
f384e2f551 Implement operator_bitwise_xor::op1_op2_relation_effect.
This patch adjusts XORing of ranges where the operands are known to be
equal or not equal.

We should probably do the same thing for the op[12]_range methods.

gcc/ChangeLog:

	* range-op.cc (operator_bitwise_xor::op1_op2_relation_effect):
	New.
2021-07-26 16:49:14 +02:00
Aldy Hernandez
3cb72ac171 Pass relationship to methods calling generic fold_range.
Fix a small oversight in methods calling the base class fold_range.

gcc/ChangeLog:

	* range-op.cc (operator_lshift::fold_range): Pass rel to
	base class fold_range.
	(operator_rshift::fold_range): Same.
2021-07-26 16:49:14 +02:00
Ashimida
bf6d414415 Remove legacy external declarations in toplev.h [PR101447]
gcc/
	PR driver/101447
	* toplev.h (min_align_loops_log): Remove declaration.
	(min_align_jumps_log, min_align_labels_log): Likewise.
	(min_align_functions_log): Likewise.
2021-07-26 10:40:18 -04:00
Tobias Burnus
0cbf03689e PR fortran/93308/93963/94327/94331/97046 problems raised by descriptor handling
Fortran: Fix attributes and bounds in ISO_Fortran_binding.

2021-07-26  José Rui Faustino de Sousa  <jrfsousa@gmail.com>
	    Tobias Burnus  <tobias@codesourcery.com>

	PR fortran/93308
	PR fortran/93963
	PR fortran/94327
	PR fortran/94331
	PR fortran/97046

gcc/fortran/ChangeLog:

	* trans-decl.c (convert_CFI_desc): Only copy out the descriptor
	if necessary.
	* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Updated attribute
	handling which reflect a previous intermediate version of the
	standard. Only copy out the descriptor if necessary.

libgfortran/ChangeLog:

	* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc): Add code
	to verify the descriptor. Correct bounds calculation.
	(gfc_desc_to_cfi_desc): Add code to verify the descriptor.

gcc/testsuite/ChangeLog:

	* gfortran.dg/ISO_Fortran_binding_1.f90: Add pointer attribute,
	this test is still erroneous but now it compiles.
	* gfortran.dg/bind_c_array_params_2.f90: Update regex to match
	code changes.
	* gfortran.dg/PR93308.f90: New test.
	* gfortran.dg/PR93963.f90: New test.
	* gfortran.dg/PR94327.c: New test.
	* gfortran.dg/PR94327.f90: New test.
	* gfortran.dg/PR94331.c: New test.
	* gfortran.dg/PR94331.f90: New test.
	* gfortran.dg/PR97046.f90: New test.
2021-07-26 14:32:53 +02:00
Aldy Hernandez
32f7506bdc Abstract out conditional simplification out of execute_vrp.
VRP simplifies conditionals involving casted values outside of the main
folding mechanism, because this optimization inhibits the VRP jump
threader from threading through the comparison.

As part of replacing VRP with an evrp instance, I am making sure we do
everything VRP does.  Hence, I am abstracting this functionality out so
we can call it from from elsewhere.

ISTM that when the proposed ranger-based jump threader can handle
everything the forward threader does, there will be no need for this
optimization to be done outside of the evrp folder.  Perhaps we can fold
this into the substitute_using_ranges class.  But that's further down
the line.

Also, there is no need to pass a vr_values around, when the base
range_query class will do.  I fixed this, at it makes it trivial to pass
down a ranger or evrp instance.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-vrp.c (vrp_simplify_cond_using_ranges): Rename vr_values
	with range_query.
	(execute_vrp): Abstract out simplification of conditionals...
	(simplify_casted_conds): ...here.
2021-07-26 13:40:15 +02:00
Aldy Hernandez
dd44445f09 Pass gimple context to array_bounds_checker.
I have changed the use of the array_bounds_checker in VRP to use a
ranger in my local tree to make sure there are no regressions when using
either VRP or the ranger.  In doing so I noticed that the checker
does not pass context to get_value_range, which causes the ranger to miss a
few cases.  This patch fixes the oversight.

Tested on x86-64 Linux using the array bounds checker both with VRP and
the ranger.

gcc/ChangeLog:

	* gimple-array-bounds.cc (array_bounds_checker::get_value_range):
	Add gimple argument.
	(array_bounds_checker::check_array_ref): Same.
	(array_bounds_checker::check_addr_expr): Same.
	(array_bounds_checker::check_array_bounds): Pass statement to
	check_array_bounds and check_addr_expr.
	* gimple-array-bounds.h (check_array_bounds): Add gimple argument.
	(check_addr_expr): Same.
	(get_value_range): Same.
2021-07-26 11:55:24 +02:00
Tamar Christina
1ab2270036 AArch64: correct dot-product RTL patterns for aarch64.
The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where intrinsics expects them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However operand[3] is not expected to be written to so the current pattern is
bogus.

Instead I rewrite the RTL to be in canonical ordering and merge them.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to..
	(sdot_prod, udot_prod): ... This.
	* config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>): Merged
	into...
	(<sur>dot_prod<vsi2qi>): ... this.
	(aarch64_<sur>dot_lane<vsi2qi>, aarch64_<sur>dot_laneq<vsi2qi>):
	Change operands order.
	(<sur>sadv16qi): Use new operands order.
	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
	vdotq_s32): Use new RTL ordering.
2021-07-26 10:23:21 +01:00
Tamar Christina
2050ac1a54 AArch64: correct usdot vectorizer and intrinsics optabs
There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON.  The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

This means we need different patterns here.  This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.

There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SUSS,
	aarch64_types_ternop_suss_qualifiers): New.
	* config/aarch64/aarch64-simd-builtins.def (usdot_prod): Use it.
	* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Re-organize RTL.
	* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use it.
2021-07-26 10:22:23 +01:00
Jakub Jelinek
acf9d1fd80 openmp: Add support for omp attributes section and scan directives
This patch adds support for expressing the section and scan directives
using the attribute syntax and additionally fixes some bugs in the attribute
syntax directive handling.
For now it requires that the scan and section directives appear as the only
attribute, not combined with other OpenMP or non-OpenMP attributes on the same
statement.

2021-07-26  Jakub Jelinek  <jakub@redhat.com>

	* parser.h (struct cp_lexer): Add orphan_p member.
	* parser.c (cp_parser_statement): Don't change in_omp_attribute_pragma
	upon restart from CPP_PRAGMA handling.  Fix up condition when a lexer
	should be destroyed and adjust saved_tokens if it records tokens from
	the to be destroyed lexer.
	(cp_parser_omp_section_scan): New function.
	(cp_parser_omp_scan_loop_body): Use it.  If
	parser->lexer->in_omp_attribute_pragma, allow optional comma
	after scan.
	(cp_parser_omp_sections_scope): Use cp_parser_omp_section_scan.

	* g++.dg/gomp/attrs-1.C: Use attribute syntax even for section
	and scan directives.
	* g++.dg/gomp/attrs-2.C: Likewise.
	* g++.dg/gomp/attrs-6.C: New test.
	* g++.dg/gomp/attrs-7.C: New test.
	* g++.dg/gomp/attrs-8.C: New test.
2021-07-26 09:13:47 +02:00
GCC Administrator
124bb55777 Daily bump. 2021-07-26 00:16:23 +00:00
Arnaud Charlet
b454c40956 [Ada] Declare time_t uniformly based on a system parameter #2
gcc/ada/

	* libgnat/s-osprim__x32.adb: Add missing with clause.
2021-07-25 09:23:44 -04:00
GCC Administrator
5a957cd388 Daily bump. 2021-07-25 00:16:22 +00:00
Marek Polacek
34dbb5f346 include: Fix -Wundef warnings in ansidecl.h
This quashes -Wundef warnings in ansidecl.h when compiled in C or C++.
In C, __cpp_constexpr and __cplusplus aren't defined so we evaluate
them to 0; conversely, __STDC_VERSION__ is not defined in C++.
This has caused grief when -Wundef is used with -Werror.

I've also tested -traditional-cpp.

include/ChangeLog:

	* ansidecl.h: Check if __cplusplus is defined before checking
	the value of __cpp_constexpr and __cplusplus.  Don't check
	__STDC_VERSION__ in C++.
2021-07-24 12:51:00 -04:00
GCC Administrator
ead235f601 Daily bump. 2021-07-24 00:16:44 +00:00
Harald Anlauf
e314cfc371 Fortran: extend check for array arguments and reject CLASS array elements.
gcc/fortran/ChangeLog:

	PR fortran/101536
	* check.c (array_check): Adjust check for the case of CLASS
	arrays.

gcc/testsuite/ChangeLog:

	PR fortran/101536
	* gfortran.dg/pr101536.f90: New test.
2021-07-23 21:00:10 +02:00
Jakub Jelinek
8408d34570 expmed: Fix store_integral_bit_field [PR101562]
Our documentation says that paradoxical subregs shouldn't appear
in strict_low_part:
'(strict_low_part (subreg:M (reg:N R) 0))'
     This expression code is used in only one context: as the
     destination operand of a 'set' expression.  In addition, the
     operand of this expression must be a non-paradoxical 'subreg'
     expression.
but on the testcase below that triggers UB at runtime
store_integral_bit_field emits exactly that.

The following patch fixes it by ensuring the requirement is satisfied.

2021-07-23  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/101562
	* expmed.c (store_integral_bit_field): Only use movstrict_optab
	if the operand isn't paradoxical.

	* gcc.c-torture/compile/pr101562.c: New test.
2021-07-23 19:55:16 +02:00
Aldy Hernandez
435f90187e Use range_query object in array bounds class.
Now that all dependencies of array_bounds_checker take a range_query, we
can sever the relationship with vr_values.  Changing this will allow us
to use the array_bounds_checker with VRP, evrp, or the ranger.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* gimple-array-bounds.h (class array_bounds_checker): Change
	ranges type to range_query.
2021-07-23 17:39:15 +02:00
Jonathan Wright
50752b751f aarch64: Use memcpy to copy vector tables in vst1[q]_x2 intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x2
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x2 intrinsics.

gcc/ChangeLog:

2021-07-23  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst1_s64_x2): Use
	__builtin_memcpy instead of constructing
	__builtin_aarch64_simd_oi one vector at a time.
	(vst1_u64_x2): Likewise.
	(vst1_f64_x2): Likewise.
	(vst1_s8_x2): Likewise.
	(vst1_p8_x2): Likewise.
	(vst1_s16_x2): Likewise.
	(vst1_p16_x2): Likewise.
	(vst1_s32_x2): Likewise.
	(vst1_u8_x2): Likewise.
	(vst1_u16_x2): Likewise.
	(vst1_u32_x2): Likewise.
	(vst1_f16_x2): Likewise.
	(vst1_f32_x2): Likewise.
	(vst1_p64_x2): Likewise.
	(vst1q_s8_x2): Likewise.
	(vst1q_p8_x2): Likewise.
	(vst1q_s16_x2): Likewise.
	(vst1q_p16_x2): Likewise.
	(vst1q_s32_x2): Likewise.
	(vst1q_s64_x2): Likewise.
	(vst1q_u8_x2): Likewise.
	(vst1q_u16_x2): Likewise.
	(vst1q_u32_x2): Likewise.
	(vst1q_u64_x2): Likewise.
	(vst1q_f16_x2): Likewise.
	(vst1q_f32_x2): Likewise.
	(vst1q_f64_x2): Likewise.
	(vst1q_p64_x2): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-07-23 15:38:00 +01:00
Jonathan Wright
ccf6e2c21b aarch64: Use memcpy to copy vector tables in vst1[q]_x3 intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst1[q]_x3
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x3 intrinsics.

gcc/ChangeLog:

2021-07-23  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst1_s64_x3): Use
	__builtin_memcpy instead of constructing
	__builtin_aarch64_simd_ci one vector at a time.
	(vst1_u64_x3): Likewise.
	(vst1_f64_x3): Likewise.
	(vst1_s8_x3): Likewise.
	(vst1_p8_x3): Likewise.
	(vst1_s16_x3): Likewise.
	(vst1_p16_x3): Likewise.
	(vst1_s32_x3): Likewise.
	(vst1_u8_x3): Likewise.
	(vst1_u16_x3): Likewise.
	(vst1_u32_x3): Likewise.
	(vst1_f16_x3): Likewise.
	(vst1_f32_x3): Likewise.
	(vst1_p64_x3): Likewise.
	(vst1q_s8_x3): Likewise.
	(vst1q_p8_x3): Likewise.
	(vst1q_s16_x3): Likewise.
	(vst1q_p16_x3): Likewise.
	(vst1q_s32_x3): Likewise.
	(vst1q_s64_x3): Likewise.
	(vst1q_u8_x3): Likewise.
	(vst1q_u16_x3): Likewise.
	(vst1q_u32_x3): Likewise.
	(vst1q_u64_x3): Likewise.
	(vst1q_f16_x3): Likewise.
	(vst1q_f32_x3): Likewise.
	(vst1q_f64_x3): Likewise.
	(vst1q_p64_x3): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-07-23 15:37:43 +01:00
H.J. Lu
085666673d x86: Don't return hard register when LRA is in progress
Don't return hard register in ix86_gen_scratch_sse_rtx when LRA is in
progress to avoid ICE when there are no available hard registers for
LRA.

gcc/

	PR target/101504
	* config/i386/i386.c (ix86_gen_scratch_sse_rtx): Don't return
	hard register when LRA is in progress.

gcc/testsuite/

	PR target/101504
	* gcc.target/i386/pr101504.c: New test.
2021-07-23 06:10:39 -07:00
Jonathan Wakely
3ea62a2b2e libstdc++: Reduce headers included by <future>
The <future> header only needs std::atomic_flag, so can include
<bits/atomic_base.h> instead of the whole of <atomic>.

libstdc++-v3/ChangeLog:

	* include/std/future: Include <bits/atomic_base.h> instead of
	<atomic>.
2021-07-23 13:27:45 +01:00
Jonathan Wright
1711b04582 aarch64: Use memcpy to copy vector tables in vst1[q]_x4 intrinsics
Use __builtin_memcpy to copy vector structures instead of using a
union in each of the vst1[q]_x4 Neon intrinsics in arm_neon.h.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst1q_x4 intrinsics.

gcc/ChangeLog:

2021-07-21  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst1_s8_x4): Use
	__builtin_memcpy instead of using a union.
	(vst1q_s8_x4): Likewise.
	(vst1_s16_x4): Likewise.
	(vst1q_s16_x4): Likewise.
	(vst1_s32_x4): Likewise.
	(vst1q_s32_x4): Likewise.
	(vst1_u8_x4): Likewise.
	(vst1q_u8_x4): Likewise.
	(vst1_u16_x4): Likewise.
	(vst1q_u16_x4): Likewise.
	(vst1_u32_x4): Likewise.
	(vst1q_u32_x4): Likewise.
	(vst1_f16_x4): Likewise.
	(vst1q_f16_x4): Likewise.
	(vst1_f32_x4): Likewise.
	(vst1q_f32_x4): Likewise.
	(vst1_p8_x4): Likewise.
	(vst1q_p8_x4): Likewise.
	(vst1_p16_x4): Likewise.
	(vst1q_p16_x4): Likewise.
	(vst1_s64_x4): Likewise.
	(vst1_u64_x4): Likewise.
	(vst1_p64_x4): Likewise.
	(vst1q_s64_x4): Likewise.
	(vst1q_u64_x4): Likewise.
	(vst1q_p64_x4): Likewise.
	(vst1_f64_x4): Likewise.
	(vst1q_f64_x4): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-07-23 12:16:13 +01:00
Jonathan Wright
03148b8e50 aarch64: Use memcpy to copy vector tables in vst2[q] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst2[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst2q intrinsics.

gcc/ChangeLog:

2021-07-21  Jonathan Wrightt  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst2_s64): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_oi one vector
	at a time.
	(vst2_u64): Likewise.
	(vst2_f64): Likewise.
	(vst2_s8): Likewise.
	(vst2_p8): Likewise.
	(vst2_s16): Likewise.
	(vst2_p16): Likewise.
	(vst2_s32): Likewise.
	(vst2_u8): Likewise.
	(vst2_u16): Likewise.
	(vst2_u32): Likewise.
	(vst2_f16): Likewise.
	(vst2_f32): Likewise.
	(vst2_p64): Likewise.
	(vst2q_s8): Likewise.
	(vst2q_p8): Likewise.
	(vst2q_s16): Likewise.
	(vst2q_p16): Likewise.
	(vst2q_s32): Likewise.
	(vst2q_s64): Likewise.
	(vst2q_u8): Likewise.
	(vst2q_u16): Likewise.
	(vst2q_u32): Likewise.
	(vst2q_u64): Likewise.
	(vst2q_f16): Likewise.
	(vst2q_f32): Likewise.
	(vst2q_f64): Likewise.
	(vst2q_p64): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-07-23 12:15:57 +01:00
Jonathan Wright
95509ee2c1 aarch64: Use memcpy to copy vector tables in vst3[q] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst3[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst3q intrinsics.

gcc/ChangeLog:

2021-07-21  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst3_s64): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_ci one vector
	at a time.
	(vst3_u64): Likewise.
	(vst3_f64): Likewise.
	(vst3_s8): Likewise.
	(vst3_p8): Likewise.
	(vst3_s16): Likewise.
	(vst3_p16): Likewise.
	(vst3_s32): Likewise.
	(vst3_u8): Likewise.
	(vst3_u16): Likewise.
	(vst3_u32): Likewise.
	(vst3_f16): Likewise.
	(vst3_f32): Likewise.
	(vst3_p64): Likewise.
	(vst3q_s8): Likewise.
	(vst3q_p8): Likewise.
	(vst3q_s16): Likewise.
	(vst3q_p16): Likewise.
	(vst3q_s32): Likewise.
	(vst3q_s64): Likewise.
	(vst3q_u8): Likewise.
	(vst3q_u16): Likewise.
	(vst3q_u32): Likewise.
	(vst3q_u64): Likewise.
	(vst3q_f16): Likewise.
	(vst3q_f32): Likewise.
	(vst3q_f64): Likewise.
	(vst3q_p64): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-07-23 12:15:35 +01:00
Jonathan Wright
e8de7edde6 aarch64: Use memcpy to copy vector tables in vst4[q] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vst4[q]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vst4q intrinsics.

gcc/ChangeLog:

2021-07-20  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst4_s64): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_xi one vector
	at a time.
	(vst4_u64): Likewise.
	(vst4_f64): Likewise.
	(vst4_s8): Likewise.
	(vst4_p8): Likewise.
	(vst4_s16): Likewise.
	(vst4_p16): Likewise.
	(vst4_s32): Likewise.
	(vst4_u8): Likewise.
	(vst4_u16): Likewise.
	(vst4_u32): Likewise.
	(vst4_f16): Likewise.
	(vst4_f32): Likewise.
	(vst4_p64): Likewise.
	(vst4q_s8): Likewise.
	(vst4q_p8): Likewise.
	(vst4q_s16): Likewise.
	(vst4q_p16): Likewise.
	(vst4q_s32): Likewise.
	(vst4q_s64): Likewise.
	(vst4q_u8): Likewise.
	(vst4q_u16): Likewise.
	(vst4q_u32): Likewise.
	(vst4q_u64): Likewise.
	(vst4q_f16): Likewise.
	(vst4q_f32): Likewise.
	(vst4q_f64): Likewise.
	(vst4q_p64): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-07-23 12:15:20 +01:00
Jonathan Wright
4848e283cc aarch64: Use memcpy to copy vector tables in vtbx4 intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbx4
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vtbx4_s8): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_oi one vector
	at a time.
	(vtbx4_u8): Likewise.
	(vtbx4_p8): Likewise.
2021-07-23 12:15:02 +01:00
Jonathan Wright
f2f04d8b9d aarch64: Use memcpy to copy vector tables in vtbl[34] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vtbl[34]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

gcc/ChangeLog:

2021-07-08  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vtbl3_s8): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_oi one vector
	at a time.
	(vtbl3_u8): Likewise.
	(vtbl3_p8): Likewise.
	(vtbl4_s8): Likewise.
	(vtbl4_u8): Likewise.
	(vtbl4_p8): Likewise.
2021-07-23 12:14:42 +01:00
Jonathan Wright
5f65676eba aarch64: Use memcpy to copy vector tables in vqtbx[234] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbx[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbx[234] intrinsics.

gcc/ChangeLog:

2021-07-08  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vqtbx2_s8): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_oi one vector
	at a time.
	(vqtbx2_u8): Likewise.
	(vqtbx2_p8): Likewise.
	(vqtbx2q_s8): Likewise.
	(vqtbx2q_u8): Likewise.
	(vqtbx2q_p8): Likewise.
	(vqtbx3_s8): Use __builtin_memcpy instead of constructing
	__builtin_aarch64_simd_ci one vector at a time.
	(vqtbx3_u8): Likewise.
	(vqtbx3_p8): Likewise.
	(vqtbx3q_s8): Likewise.
	(vqtbx3q_u8): Likewise.
	(vqtbx3q_p8): Likewise.
	(vqtbx4_s8): Use __builtin_memcpy instead of constructing
	__builtin_aarch64_simd_xi one vector at a time.
	(vqtbx4_u8): Likewise.
	(vqtbx4_p8): Likewise.
	(vqtbx4q_s8): Likewise.
	(vqtbx4q_u8): Likewise.
	(vqtbx4q_p8): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: New tests.
2021-07-23 12:14:18 +01:00
Jonathan Wright
484acfa4cf aarch64: Use memcpy to copy vector tables in vqtbl[234] intrinsics
Use __builtin_memcpy to copy vector structures instead of building
a new opaque structure one vector at a time in each of the vqtbl[234]
Neon intrinsics in arm_neon.h. This simplifies the header file and
also improves code generation - superfluous move instructions were
emitted for every register extraction/set in this additional
structure.

Add new code generation tests to verify that superfluous move
instructions are no longer generated for the vqtbl[234] intrinsics.

gcc/ChangeLog:

2021-07-08  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
	instead of constructing __builtin_aarch64_simd_oi one vector
	at a time.
	(vqtbl2_u8): Likewise.
	(vqtbl2_p8): Likewise.
	(vqtbl2q_s8): Likewise.
	(vqtbl2q_u8): Likewise.
	(vqtbl2q_p8): Likewise.
	(vqtbl3_s8): Use __builtin_memcpy instead of constructing
	__builtin_aarch64_simd_ci one vector at a time.
	(vqtbl3_u8): Likewise.
	(vqtbl3_p8): Likewise.
	(vqtbl3q_s8): Likewise.
	(vqtbl3q_u8): Likewise.
	(vqtbl3q_p8): Likewise.
	(vqtbl4_s8): Use __builtin_memcpy instead of constructing
	__builtin_aarch64_simd_xi one vector at a time.
	(vqtbl4_u8): Likewise.
	(vqtbl4_p8): Likewise.
	(vqtbl4q_s8): Likewise.
	(vqtbl4q_u8): Likewise.
	(vqtbl4q_p8): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: New test.
2021-07-23 12:13:55 +01:00
Jonathan Wakely
5b965dc49a libstdc++: Update documentation comments for namespace rel_ops
The comments in <bits/stl_relops.h> describe problems that were solved
years ago (for GCC 3.1). The comparison operators in <iterator> are no
longer ambiguous with the rel_ops ones, so the linked mailing list
thread and FAQ entry aren't relevant now. The reference to std_utility.h
is also outdated as it's just called utility now, both in the source
tree and when installed.

The use of rel_ops is still frowned upon though, so replace the
discussion of ambiguities within libstdc++ headers with adminition about
using rel_ops in user code.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* include/bits/stl_relops.h: Update documentation comments.
2021-07-23 11:22:10 +01:00
Jakub Jelinek
7f7364108f openmp: Add support for __has_attribute(omp::directive) and __has_attribute(omp::sequence)
Now that the C++ FE supports these attributes, but not through registering
them in the attributes tables (they work quite differently from other
attributes), this teaches c_common_has_attributes about those.

2021-07-23  Jakub Jelinek  <jakub@redhat.com>

	* c-lex.c (c_common_has_attribute): Call canonicalize_attr_name also
	on attr_id.  Return 1 for omp::directive or omp::sequence in C++11
	and later.

	* c-c++-common/gomp/attrs-1.c: New test.
	* c-c++-common/gomp/attrs-2.c: New test.
	* c-c++-common/gomp/attrs-3.c: New test.
2021-07-23 09:50:15 +02:00
Jakub Jelinek
2c5d803d03 openmp: Diagnose invalid mixing of the attribute and pragma syntax directives
The OpenMP 5.1 spec says that the attribute and pragma syntax directives
should not be mixed on the same statement.  The following patch adds diagnostic
for that,
  [[omp::directive (...)]]
  #pragma omp ...
is always an error and for the other order
  #pragma omp ...
  [[omp::directive (...)]]
it depends on whether the pragma directive is an OpenMP construct
(then it is an error because it needs a structured block or loop
or statement as body) or e.g. a standalone directive (then it is fine).

Only block scope is handled for now though, namespace scope and class scope
still needs implementing even the basic support.

2021-07-23  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
	* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP__START_ and
	PRAGMA_OMP__LAST_ enumerators.
gcc/cp/
	* parser.h (struct cp_parser): Add omp_attrs_forbidden_p member.
	* parser.c (cp_parser_handle_statement_omp_attributes): Diagnose
	mixing of attribute and pragma syntax directives when seeing
	omp::directive if parser->omp_attrs_forbidden_p or if attribute syntax
	directives are followed by OpenMP pragma.
	(cp_parser_statement): Clear parser->omp_attrs_forbidden_p after
	the cp_parser_handle_statement_omp_attributes call.
	(cp_parser_omp_structured_block): Add disallow_omp_attrs argument,
	if true, set parser->omp_attrs_forbidden_p.
	(cp_parser_omp_scan_loop_body, cp_parser_omp_sections_scope): Pass
	false as disallow_omp_attrs to cp_parser_omp_structured_block.
	(cp_parser_omp_parallel, cp_parser_omp_task): Set
	parser->omp_attrs_forbidden_p.
gcc/testsuite/
	* g++.dg/gomp/attrs-4.C: New test.
	* g++.dg/gomp/attrs-5.C: New test.
2021-07-23 09:37:36 +02:00
Xi Ruoyao
19e0505879
testsuite: mips: pass -finline/-fnoinline through
gcc/testsuite/

	* gcc.target/mips/mips.exp (mips_option_groups): add
	  -finline and -fno-inline.
2021-07-23 13:55:56 +08:00