Commit Graph

189627 Commits

Stafford Horne
1bac7d31a1 or1k: Fix clobbering of _mcount argument if fPIC is enabled
Recently we changed the PROFILE_HOOK _mcount call to pass in the link
register as an argument.  This actually does not work when the _mcount
call uses a PLT because the GOT register setup code ends up getting
inserted before the PROFILE_HOOK and clobbers the link register
argument.

These glibc tests are failing:
  gmon/tst-gmon-pie-gprof
  gmon/tst-gmon-static-gprof

This patch fixes this by saving the instruction that stores the Link
Register to the _mcount argument and then inserting the GOT register setup
instructions after it.

For example:

main.c:

    extern int e;

    int f2(int a) {
      return a + e;
    }

    int f1(int a) {
      return f2 (a + a);
    }

    int main(int argc, char ** argv) {
      return f1 (argc);
    }

Compiled:

    or1k-smh-linux-gnu-gcc -Wall -c -O2 -fPIC -pg -S main.c

Before Fix:

    main:
        l.addi  r1, r1, -16
        l.sw    8(r1), r2
        l.sw    0(r1), r16
        l.addi  r2, r1, 16   # Keeping FP, but not needed
        l.sw    4(r1), r18
        l.sw    12(r1), r9
        l.jal   8            # GOT Setup clobbers r9 (Link Register)
         l.movhi        r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
        l.ori   r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
        l.add   r16, r16, r9
        l.or    r18, r3, r3
        l.or    r3, r9, r9    # This is not the original LR
        l.jal   plt(_mcount)
         l.nop

        l.jal   plt(f1)
         l.or    r3, r18, r18
        l.lwz   r9, 12(r1)
        l.lwz   r16, 0(r1)
        l.lwz   r18, 4(r1)
        l.lwz   r2, 8(r1)
        l.jr    r9
         l.addi  r1, r1, 16

After the fix:

    main:
        l.addi  r1, r1, -12
        l.sw    0(r1), r16
        l.sw    4(r1), r18
        l.sw    8(r1), r9
        l.or    r18, r3, r3
        l.or    r3, r9, r9    # We now have r9 (LR) set early
        l.jal   8             # Clobbers r9 (Link Register)
         l.movhi        r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
        l.ori   r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
        l.add   r16, r16, r9
        l.jal   plt(_mcount)
         l.nop

        l.jal   plt(f1)
         l.or    r3, r18, r18
        l.lwz   r9, 8(r1)
        l.lwz   r16, 0(r1)
        l.lwz   r18, 4(r1)
        l.jr    r9
         l.addi  r1, r1, 12

Fixes: 308531d148 ("or1k: Add return address argument to _mcount call")

gcc/ChangeLog:
	* config/or1k/or1k-protos.h (or1k_profile_hook): New function.
	* config/or1k/or1k.h (PROFILE_HOOK): Change macro to reference
	new function or1k_profile_hook.
	* config/or1k/or1k.c (struct machine_function): Add new field
	set_mcount_arg_insn.
	(or1k_profile_hook): New function.
	(or1k_init_pic_reg): Update to inject pic rtx after _mcount arg
	when profiling.
	(or1k_frame_pointer_required): Frame pointer no longer needed
	when profiling.
2021-11-13 07:58:00 +09:00
Jan Hubicka
4d2d5565a0 Fix wrong code with pure functions
I introduced a bug into find_func_aliases_for_call in the handling of pure
functions.  Instead of being treated as reading global memory, pure functions
were believed to write global memory.  This results in misoptimization of the
testcase at -O1.

The change to pta-callused.c updates the template for the new behaviour of
the constraint generation.  We copy nonlocal memory to calluse, which is
correct but not strictly necessary, because later we take care to add the
nonlocal_p flag manually.
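
As a reminder of the distinction at issue (a minimal sketch; the names below
are illustrative, not the PR testcase), a pure function may read global
memory but must not write it, so its calls should be modelled as reads of
nonlocal memory:

    int g;

    __attribute__ ((pure)) int
    get_g (void)
    {
      /* Reads global memory; a pure function must not write it.  */
      return g;
    }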

gcc/ChangeLog:

	PR tree-optimization/103209
	* tree-ssa-structalias.c (find_func_aliases_for_call): Fix
	use of handle_rhs_call.

gcc/testsuite/ChangeLog:

	PR tree-optimization/103209
	* gcc.dg/tree-ssa/pta-callused.c: Update template.
	* gcc.c-torture/execute/pr103209.c: New test.
2021-11-12 23:55:50 +01:00
Aldy Hernandez
264f061997 path solver: Solve PHI imports first for ranges.
PHIs must be resolved first while solving ranges in a block,
regardless of where they appear in the import bitmap.  We went through
a similar exercise for the relational code, but missed these.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

	PR tree-optimization/103202
	* gimple-range-path.cc
	(path_range_query::compute_ranges_in_block): Solve PHI imports first.
2021-11-12 20:42:56 +01:00
Jan Hubicka
b301cb43a7 Fix ipa-pure-const
gcc/ChangeLog:

	* ipa-pure-const.c (propagate_pure_const): Remove redundant check;
	fix call of ipa_make_function_const and ipa_make_function_pure.
2021-11-12 20:15:48 +01:00
David Malcolm
72f1c1c452 analyzer: "__analyzer_dump_state" has no side-effects
gcc/analyzer/ChangeLog:
	* engine.cc (exploded_node::on_stmt_pre): Return when handling
	"__analyzer_dump_state".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-12 14:01:36 -05:00
Richard Sandiford
87fcff96db aarch64: Remove redundant costing code
Previous patches made some of the complex parts of the issue rate
code redundant.

gcc/
	* config/aarch64/aarch64.c (aarch64_vector_op::n_advsimd_ops): Delete.
	(aarch64_vector_op::m_seen_loads): Likewise.
	(aarch64_vector_costs::aarch64_vector_costs): Don't push to
	m_advsimd_ops.
	(aarch64_vector_op::count_ops): Remove vectype and factor parameters.
	Remove code that tries to predict different vec_flags from the
	current loop's.
	(aarch64_vector_costs::add_stmt_cost): Update accordingly.
	Remove m_advsimd_ops handling.
2021-11-12 17:33:03 +00:00
Richard Sandiford
c6c5c5ebae aarch64: Use new hooks for vector comparisons
Previously we tried to account for the different issue rates of
the various vector modes by guessing what the Advanced SIMD version
of an SVE loop would look like and what its issue rate was likely to be.
We'd then increase the cost of the SVE loop if the Advanced SIMD loop
might issue more quickly.

This patch moves that logic to better_main_loop_than_p, so that we
can compare loops side-by-side rather than having to guess.  This also
means we can apply the issue rate heuristics to *any* vector loop
comparison, rather than just weighting SVE vs. Advanced SIMD.

The actual heuristics are otherwise unchanged.  We're just
applying them in a different place.

gcc/
	* config/aarch64/aarch64.c (aarch64_vector_costs::m_saw_sve_only_op)
	(aarch64_sve_only_stmt_p): Delete.
	(aarch64_vector_costs::prefer_unrolled_loop): New function,
	extracted from adjust_body_cost.
	(aarch64_vector_costs::better_main_loop_than_p): New function,
	using heuristics extracted from adjust_body_cost and
	adjust_body_cost_sve.
	(aarch64_vector_costs::adjust_body_cost_sve): Remove
	advsimd_cycles_per_iter and could_use_advsimd parameters.
	Update after changes above.
	(aarch64_vector_costs::adjust_body_cost): Update after changes above.
2021-11-12 17:33:03 +00:00
Richard Sandiford
2e1886ea06 aarch64: Add vf_factor to aarch64_vec_op_count
-mtune=neoverse-512tvb sets the likely SVE vector length to 128 bits,
but it also takes into account Neoverse V1, which is a 256-bit target.
This patch adds this VF (VL) factor to aarch64_vec_op_count.

gcc/
	* config/aarch64/aarch64.c (aarch64_vec_op_count::m_vf_factor):
	New member variable.
	(aarch64_vec_op_count::aarch64_vec_op_count): Add a parameter for it.
	(aarch64_vec_op_count::vf_factor): New function.
	(aarch64_vector_costs::aarch64_vector_costs): When costing for
	neoverse-512tvb, pass a vf_factor of 2 for the Neoverse V1 version
	of an SVE loop.
	(aarch64_vector_costs::adjust_body_cost): Read the vf factor
	instead of hard-coding 2.
2021-11-12 17:33:02 +00:00
Richard Sandiford
a82ffd4361 aarch64: Move cycle estimation into aarch64_vec_op_count
This patch just moves the main cycle estimation routines
into aarch64_vec_op_count.

gcc/
	* config/aarch64/aarch64.c
	(aarch64_vec_op_count::rename_cycles_per_iter): New function.
	(aarch64_vec_op_count::min_nonpred_cycles_per_iter): Likewise.
	(aarch64_vec_op_count::min_pred_cycles_per_iter): Likewise.
	(aarch64_vec_op_count::min_cycles_per_iter): Likewise.
	(aarch64_vec_op_count::dump): Move earlier in file.  Dump the
	above properties too.
	(aarch64_estimate_min_cycles_per_iter): Delete.
	(adjust_body_cost): Use aarch64_vec_op_count::min_cycles_per_iter
	instead of aarch64_estimate_min_cycles_per_iter.  Rely on the dump
	routine to print CPI estimates.
	(adjust_body_cost_sve): Likewise.  Use the other functions above
	instead of doing the work inline.
2021-11-12 17:33:02 +00:00
Richard Sandiford
1a5288fe3d aarch64: Use an array of aarch64_vec_op_counts
-mtune=neoverse-512tvb uses two issue rates, one for Neoverse V1
and one with more generic parameters.  We use both rates when
making a choice between scalar, Advanced SIMD and SVE code.

Previously we calculated the Neoverse V1 issue rates from the
more generic issue rates, but by removing m_scalar_ops and
(later) m_advsimd_ops, it becomes easier to track multiple
issue rates directly.

This patch therefore converts m_ops and (temporarily) m_advsimd_ops
into arrays.

gcc/
	* config/aarch64/aarch64.c (aarch64_vec_op_count): Allow default
	initialization.
	(aarch64_vec_op_count::base_issue_info): Remove handling of null
	issue_infos.
	(aarch64_vec_op_count::simd_issue_info): Likewise.
	(aarch64_vec_op_count::sve_issue_info): Likewise.
	(aarch64_vector_costs::m_ops): Turn into a vector.
	(aarch64_vector_costs::m_advsimd_ops): Likewise.
	(aarch64_vector_costs::aarch64_vector_costs): Add entries to
	the vectors based on aarch64_tune_params.
	(aarch64_vector_costs::analyze_loop_vinfo): Update the pred_ops
	of all entries in m_ops.
	(aarch64_vector_costs::add_stmt_cost): Call count_ops for all
	entries in m_ops.
	(aarch64_estimate_min_cycles_per_iter): Remove issue_info
	parameter and get the information from the ops instead.
	(aarch64_vector_costs::adjust_body_cost_sve): Take an
	aarch64_vec_issue_info instead of an aarch64_vec_op_count.
	(aarch64_vector_costs::adjust_body_cost): Update call accordingly.
	Exit earlier if m_ops is empty for either cost structure.
2021-11-12 17:33:02 +00:00
Richard Sandiford
6756706ea6 aarch64: Use real scalar op counts
Now that vector finish_costs is passed the associated scalar costs,
we can record the scalar issue information while computing the scalar
costs, rather than trying to estimate it while computing the vector
costs.

This simplifies things a little, but the main motivation is to improve
accuracy.

gcc/
	* config/aarch64/aarch64.c (aarch64_vector_costs::m_scalar_ops)
	(aarch64_vector_costs::m_sve_ops): Replace with...
	(aarch64_vector_costs::m_ops): ...this.
	(aarch64_vector_costs::analyze_loop_vinfo): Update accordingly.
	(aarch64_vector_costs::adjust_body_cost_sve): Likewise.
	(aarch64_vector_costs::aarch64_vector_costs): Likewise.
	Initialize m_vec_flags here rather than in add_stmt_cost.
	(aarch64_vector_costs::count_ops): Test for scalar reductions too.
	Allow vectype to be null.
	(aarch64_vector_costs::add_stmt_cost): Call count_ops for scalar
	code too.  Don't require vectype to be nonnull.
	(aarch64_vector_costs::adjust_body_cost): Take the loop_vec_info
	and scalar costs as parameters.  Use the scalar costs to determine
	the cycles per iteration of the scalar loop, then multiply it
	by the estimated VF.
	(aarch64_vector_costs::finish_cost): Update call accordingly.
2021-11-12 17:33:01 +00:00
Richard Sandiford
902b7c9e18 aarch64: Get floatness from stmt_info
This patch gets the floatness of a memory access from the data
reference rather than the vectype.  This makes it more suitable
for use in scalar costing code.

gcc/
	* config/aarch64/aarch64.c (aarch64_dr_type): New function.
	(aarch64_vector_costs::count_ops): Use it rather than the
	vectype to determine floatness.
2021-11-12 17:33:01 +00:00
Richard Sandiford
26122469df aarch64: Remove vectype from latency tests
This patch gets the scalar mode of a reduction operation from the
gimple stmt rather than the vectype.  This makes it more suitable
for use in scalar costs.

gcc/
	* config/aarch64/aarch64.c (aarch64_sve_in_loop_reduction_latency):
	Remove vectype parameter and get floatness from the type of the
	stmt lhs instead.
	(aarch64_in_loop_reduction_latency): Likewise.
	(aarch64_detect_vector_stmt_subtype): Update caller.
	(aarch64_vector_costs::count_ops): Likewise.
2021-11-12 17:33:00 +00:00
Richard Sandiford
15aba5a67c aarch64: Fold aarch64_sve_op_count into aarch64_vec_op_count
Later patches make aarch64 use the new vector hooks.  We then
only need to track one set of ops for each aarch64_vector_costs
structure.  This in turn means that it's more convenient to merge
aarch64_sve_op_count and aarch64_vec_op_count.

The patch also adds issue info and vec flags to aarch64_vec_op_count,
so that the structure is more self-descriptive.  This simplifies some
things later.

gcc/
	* config/aarch64/aarch64.c (aarch64_sve_op_count): Fold into...
	(aarch64_vec_op_count): ...this.  Add a constructor.
	(aarch64_vec_op_count::vec_flags): New function.
	(aarch64_vec_op_count::base_issue_info): Likewise.
	(aarch64_vec_op_count::simd_issue_info): Likewise.
	(aarch64_vec_op_count::sve_issue_info): Likewise.
	(aarch64_vec_op_count::m_issue_info): New member variable.
	(aarch64_vec_op_count::m_vec_flags): Likewise.
	(aarch64_vector_costs): Add a constructor.
	(aarch64_vector_costs::m_sve_ops): Change type to aarch64_vec_op_count.
	(aarch64_vector_costs::aarch64_vector_costs): New function.
	Initialize m_scalar_ops, m_advsimd_ops and m_sve_ops.
	(aarch64_vector_costs::count_ops): Remove vec_flags and
	issue_info parameters, using the new aarch64_vec_op_count
	functions instead.
	(aarch64_vector_costs::add_stmt_cost): Update call accordingly.
	(aarch64_sve_op_count::dump): Fold into...
	(aarch64_vec_op_count::dump): ...here.
2021-11-12 17:33:00 +00:00
Richard Sandiford
526e1639aa aarch64: Detect more consecutive MEMs
For tests like:

    int res[2];
    void
    f1 (int x, int y)
    {
      res[0] = res[1] = x + y;
    }

we generated:

        add     w0, w0, w1
        adrp    x1, .LANCHOR0
        add     x2, x1, :lo12:.LANCHOR0
        str     w0, [x1, #:lo12:.LANCHOR0]
        str     w0, [x2, 4]
        ret

Using [x1, #:lo12:.LANCHOR0] for the first store prevented the
two stores being recognised as a pair.  However, the MEM_EXPR
and MEM_OFFSET information tell us that the MEMs really are
consecutive.  The peephole2 context then guarantees that the
first address is equivalent to [x2, 0].

While there: the reg_mentioned_p tests for loads were probably correct,
but seemed a bit indirect.  We're matching two consecutive loads,
so the thing we need to test is that the second MEM in the original
sequence doesn't depend on the result of the first load in the
original sequence.

gcc/
	* config/aarch64/aarch64.c: Include tree-dfa.h.
	(aarch64_check_consecutive_mems): New function that takes MEM_EXPR
	and MEM_OFFSET into account.
	(aarch64_swap_ldrstr_operands): Use it.
	(aarch64_operands_ok_for_ldpstp): Likewise.  Check that the
	address of the second memory doesn't depend on the result of
	the first load.

gcc/testsuite/
	* gcc.target/aarch64/stp_1.c: New test.
2021-11-12 17:33:00 +00:00
Tobias Burnus
48c6cac9ca Fortran/openmp: Fix '!$omp end'
gcc/fortran/ChangeLog:

	* parse.c (decode_omp_directive): Fix permitting 'nowait' for some
	combined directives, add missing 'omp end ... loop'.
	(gfc_ascii_statement): Fix ST_OMP_END_TEAMS_LOOP result.
	* openmp.c (resolve_omp_clauses): Add missing combined loop constructs
	case values to the 'if(directive-name: ...)' check.
	* trans-openmp.c (gfc_split_omp_clauses): Put nowait on target if
	first leaf construct accepting it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/unexpected-end.f90: Update dg-error.
	* gfortran.dg/gomp/clauses-1.f90: New test.
	* gfortran.dg/gomp/nowait-2.f90: New test.
	* gfortran.dg/gomp/nowait-3.f90: New test.
2021-11-12 17:58:21 +01:00
Jan Hubicka
82de09ab17 Fix exit condition in ipa_make_function_pure
gcc/ChangeLog:

	* ipa-pure-const.c (ipa_make_function_pure): Fix exit condition.
2021-11-12 16:54:29 +01:00
Jan Hubicka
4526ec20f1 Fix ICE in tree-ssa-structalias.c
PR tree-optimization/103175
	* ipa-modref.c (modref_lattice::merge): Add sanity check.
	(callee_to_caller_flags): Make flags adjustment sane.
	(modref_eaf_analysis::analyze_ssa_name): Likewise.
2021-11-12 16:35:01 +01:00
Jakub Jelinek
f49c7a4fb2 libgomp: Unbreak gcn offload build
My recent libgomp change apparently broke libgomp build for gcn offloading.
The problem is that gcn, unlike nvptx, doesn't override the teams.c source file
and the patch I've committed assumed all the non-LIBGOMP_USE_PTHREADS targets
do not use it.  My understanding is that gcn included omp_get_num_teams
and omp_get_team_num definitions in both icv-device.o and teams.o,
with the definitions only in the former working correctly.

This patch brings gcn into sync with how nvptx does it: teams.c is
overridden and provides a dummy GOMP_teams_reg and omp_get_{num_teams,team_num}
definitions, and icv-device.c doesn't provide those.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

	PR target/103201
	* config/gcn/icv-device.c (omp_get_num_teams, omp_get_team_num): Move
	to ...
	* config/gcn/teams.c: ... here.  New file.
2021-11-12 16:11:02 +01:00
Martin Jambor
847f587dc4 Fortran: Use build_debug_expr_decl to create DEBUG_EXPR_DECLs
This patch converts one more open-coded construction of a
DEBUG_EXPR_DECL to a call of build_debug_expr_decl that I missed in my
previous patch because it happens to be in the Fortran front-end.

gcc/fortran/ChangeLog:

2021-11-11  Martin Jambor  <mjambor@suse.cz>

	* trans-types.c (gfc_get_array_descr_info): Use build_debug_expr_decl
	instead of building DEBUG_EXPR_DECL manually.
2021-11-12 15:46:05 +01:00
Martin Liska
6849c71c06 testsuite: Filter out TSVC test on Power [PR103051]
PR testsuite/103051

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/tsvc/vect-tsvc-s112.c: Skip test for old Power
	CPUs.
2021-11-12 15:24:01 +01:00
Martin Liska
83310a08a2 libbacktrace: fix UBSAN issues
Fix issues mentioned in the PR.

	PR libbacktrace/103167

libbacktrace/ChangeLog:

	* elf.c (elf_uncompress_lzma_block): Cast to unsigned int.
	(elf_uncompress_lzma): Likewise.
	* xztest.c (test_samples): memcpy only if v > 0.
2021-11-12 15:06:12 +01:00
David Malcolm
aa1fd30df5 jit: fix -Werror=format-overflow= in testsuite [PR103199]
gcc/jit/ChangeLog:
	PR jit/103199
	* docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile):
	Increase size of buffer.
	* docs/examples/tut04-toyvm/toyvm.cc
	(compilation_state::create_function): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-12 08:20:45 -05:00
Jan Hubicka
1b62cddcf0 Fix ipa-modref pure/const discovery
PR ipa/103200
	* ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
	not mark pure/const function if there are side-effects.
2021-11-12 14:01:17 +01:00
Chung-Lin Tang
b7e2048063 openmp: Relax handling of implicit map vs. existing device mappings
This patch relaxes the requirements when a map with the implicit
attribute encounters an overlapping existing map.  As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an
 implicit data-mapping attribute has corresponding storage in the device data
 environment prior to a task encountering the construct that is associated with
 the map clause, only that part of the original storage will have corresponding
 storage in the device data environment as a result of the map clause."
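
A minimal sketch of the situation this covers (the names are illustrative
only): part of an aggregate already has device storage, and a later
construct maps the whole aggregate implicitly.

    struct S { int a[100]; int b[100]; };
    struct S s;

    void
    foo (void)
    {
      /* Map only part of s explicitly.  */
      #pragma omp target enter data map(to: s.a)

      /* Referencing s here produces an implicit map of the whole of s.
         Under the relaxed rule the existing mapping of s.a is accepted,
         and only that part has corresponding device storage.  */
      #pragma omp target
      s.a[0] += 1;
    }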

2021-11-12  Chung-Lin Tang  <cltang@codesourcery.com>

include/ChangeLog:

	* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
	(GOMP_MAP_IMPLICIT): New special map kind bits value.
	(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
	special map kind bits.
	(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.

gcc/ChangeLog:

	* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for
	'implicit' bit, using 'base.deprecated_flag' field of tree_node.
	* tree-pretty-print.c (dump_omp_clause): Add support for printing
	implicit attribute in tree dumping.
	* gimplify.c (gimplify_adjust_omp_clauses_1):
	Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly
	created.
	(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
	clauses, from simple append, to starting of list, after non-map clauses.
	* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
	values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-implicit-map-1.c: New test.
	* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
	* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.

libgomp/ChangeLog:

	* target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add
	implicit map handling to allow a "superset" existing map as valid case.
	(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value.
	(get_implicit): New function to extract implicit status.
	(gomp_map_fields_existing): Adjust arguments in calls to
	gomp_map_vars_existing, and add uses of get_implicit.
	(gomp_map_vars_internal): Likewise.
	* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
2021-11-12 20:29:48 +08:00
Jonathan Wakely
a54ce8865a libstdc++: Print assertion messages to stderr [PR59675]
This replaces the printf used by failed debug assertions with fprintf,
so we can write to stderr.

To avoid including <stdio.h>, the assert function is moved into the
library. To avoid programs using a vague linkage definition of the old
inline function, the function is renamed. Code compiled with old
versions of GCC might still call the old function, but code compiled
with the newer GCC will call the new function and write to stderr.
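
As a rough sketch of what such a failure handler does (the name and
signature below are illustrative, not the library's actual declaration):

    #include <cstdio>
    #include <cstdlib>

    [[noreturn]] void
    assert_fail_sketch (const char* file, int line,
                        const char* function, const char* condition)
    {
      /* Report on stderr rather than stdout, then terminate.  */
      std::fprintf (stderr, "%s:%d: %s: Assertion '%s' failed.\n",
                    file, line, function, condition);
      std::abort ();
    }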

libstdc++-v3/ChangeLog:

	PR libstdc++/59675
	* acinclude.m4 (libtool_VERSION): Bump version.
	* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Add version and
	export new symbol.
	* configure: Regenerate.
	* include/bits/c++config (__replacement_assert): Remove, declare
	__glibcxx_assert_fail instead.
	* src/c++11/debug.cc (__glibcxx_assert_fail): New function to
	replace __replacement_assert, writing to stderr instead of
	stdout.
	* testsuite/util/testsuite_abi.cc: Update latest version.
2021-11-12 12:23:10 +00:00
Mikael Morin
68d62cb206 fortran: Ignore unused args in scalarization [PR97896]
The KIND argument of the INDEX intrinsic is a compile time constant
that is used at compile time only to resolve to a kind-specific library
function.  That argument is otherwise completely ignored at runtime, and there is
no code generated for it as the library procedure has no kind argument.
This confuses the scalarizer which expects to see every argument
of elemental functions used when calling a procedure.
This change removes the argument from the scalarization lists
at the beginning of the scalarization process, so that the argument
is completely ignored.
This also reverts the existing workaround
(commit d09847357b except for its testcase).

	PR fortran/97896

gcc/fortran/ChangeLog:
	* intrinsic.c (add_sym_4ind): Remove.
	(add_functions): Use add_sym_4 instead of add_sym_4ind.
	Don’t special case the index intrinsic.
	* iresolve.c (gfc_resolve_index_func): Use the individual arguments
	directly instead of the full argument list.
	* intrinsic.h (gfc_resolve_index_func): Update the declaration
	accordingly.
	* trans-decl.c (gfc_get_extern_function_decl): Don’t modify the
	list of arguments in the case of the index intrinsic.
	* trans-array.h (gfc_get_intrinsic_for_expr,
	gfc_get_proc_ifc_for_expr): New.
	* trans-array.c (gfc_get_intrinsic_for_expr,
	arg_evaluated_for_scalarization): New.
	(gfc_walk_elemental_function_args): Add intrinsic procedure
	as argument.  Count arguments.  Check arg_evaluated_for_scalarization.
	* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
	* trans-stmt.c (get_intrinsic_for_code): New.
	(gfc_trans_call): Update call.

gcc/testsuite/ChangeLog:
	* gfortran.dg/index_5.f90: New.
2021-11-12 13:10:55 +01:00
Jakub Jelinek
7d6da11fce openmp: Honor OpenMP 5.1 num_teams lower bound
The following patch implements what I've been talking about earlier:
for an explicit num_teams clause we now create at least lower-bound
(or, if it is not specified, upper-bound) teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1, stop
using %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num (), and
instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first, increments by num_blocks
or num_workgroups each time for !first, and which only returns false when
we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures is highly
appreciated.
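
For reference, a small usage sketch of the OpenMP 5.1 two-argument form this
honors (names are illustrative):

    #include <omp.h>

    int counts[8];

    void
    foo (void)
    {
      /* Lower bound 4, upper bound 8: with this change at least 4 teams
         are created; under host fallback they run one after another on a
         single thread.  */
      #pragma omp target teams num_teams(4 : 8) map(tofrom: counts)
      counts[omp_get_team_num ()] = 1;
    }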

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
	(BUILT_IN_GOMP_TEAMS4): New.
	* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
	* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
	GOMP_teams, pass to it also num_teams lower-bound expression
	or a dup of upper-bound if it is missing and a flag whether
	it is the first call or not.
gcc/fortran/
	* types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
	* libgomp_g.h (GOMP_teams4): Declare.
	* libgomp.map (GOMP_5.1): Export GOMP_teams4.
	* target.c (GOMP_teams4): New function.
	* config/nvptx/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* config/gcn/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
	teams instead of <= 2.
	* testsuite/libgomp.c-c++-common/teams-2.c: New test.
2021-11-12 12:41:22 +01:00
Martin Liska
5f516a6a5d Remove unused function.
PR tree-optimization/102497

gcc/ChangeLog:

	* gimple-predicate-analysis.cc (add_pred): Remove unused
	function:
2021-11-12 12:40:02 +01:00
Richard Biener
140346fa24 tree-optimization/103204 - fix missed valueization in VN
The following fixes a missed valueization when simplifying
a MEM[&...] combination during valueization.

2021-11-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/103204
	* tree-ssa-sccvn.c (valueize_refs_1): Re-valueize the
	top operand after folding in an address.

	* gcc.dg/torture/pr103204.c: New testcase.
2021-11-12 09:11:49 +01:00
Alan Modra
c60ded6f5e Make opcodes configure depend on bfd configure
The idea is for opcodes to be able to see whether bfd is compiled
for 64-bit.  A lot of --enable-targets=all libopcodes is wasted space
if bfd can't load 64-bit target object files.

	* Makefile.def (configure-opcodes): Depend on configure-bfd.
	* Makefile.in: Regenerate.
2021-11-12 18:34:12 +10:30
Jonathan Wakely
1ae8edf5f7 libstdc++: Implement constexpr std::vector for C++20
This implements P1004R2 ("Making std::vector constexpr") for C++20.

For now, debug mode vectors are not supported in constant expressions.
To make that work we might need to disable all attaching/detaching of
safe iterators. That can be fixed later.
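
A minimal sketch of the kind of code this makes valid in C++20:

    #include <vector>

    constexpr int
    sum_first (int n)
    {
      std::vector<int> v;           // now usable during constant evaluation
      for (int i = 1; i <= n; ++i)
        v.push_back (i);
      int sum = 0;
      for (int x : v)
        sum += x;
      return sum;                   // all allocations freed before returning
    }

    static_assert (sum_first (4) == 10);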

Co-authored-by: Josh Marshall <joshua.r.marshall.1991@gmail.com>

libstdc++-v3/ChangeLog:

	* include/bits/alloc_traits.h (_Destroy): Make constexpr for
	C++20 mode.
	* include/bits/allocator.h (__shrink_to_fit::_S_do_it):
	Likewise.
	* include/bits/stl_algobase.h (__fill_a1): Declare _Bit_iterator
	overload constexpr for C++20.
	* include/bits/stl_bvector.h (_Bit_type, _S_word_bit): Move out
	of inline namespace.
	(_Bit_reference, _Bit_iterator_base, _Bit_iterator)
	(_Bit_const_iterator, _Bvector_impl_data, _Bvector_base)
	(vector<bool, A>): Add constexpr to every member function.
	(_Bvector_base::_M_allocate): Initialize storage during constant
	evaluation.
	(vector<bool, A>::_M_initialize_value): Use __fill_bvector_n
	instead of memset.
	(__fill_bvector_n): New helper function to replace memset during
	constant evaluation.
	* include/bits/stl_uninitialized.h (__uninitialized_copy<false>):
	Move logic to ...
	(__do_uninit_copy): New function.
	(__uninitialized_fill<false>): Move logic to ...
	(__do_uninit_fill): New function.
	(__uninitialized_fill_n<false>): Move logic to ...
	(__do_uninit_fill_n): New function.
	(__uninitialized_copy_a): Add constexpr. Use __do_uninit_copy.
	(__uninitialized_move_a, __uninitialized_move_if_noexcept_a):
	Add constexpr.
	(__uninitialized_fill_a): Add constexpr. Use __do_uninit_fill.
	(__uninitialized_fill_n_a): Add constexpr. Use
	__do_uninit_fill_n.
	(__uninitialized_default_n, __uninitialized_default_n_a)
	(__relocate_a_1, __relocate_a): Add constexpr.
	* include/bits/stl_vector.h (_Vector_impl_data, _Vector_impl)
	(_Vector_base, vector): Add constexpr to every member function.
	(_Vector_impl::_S_adjust): Disable ASan annotation during
	constant evaluation.
	(_Vector_base::_S_use_relocate): Disable bitwise-relocation
	during constant evaluation.
	(vector::_Temporary_value): Use a union for storage.
	* include/bits/vector.tcc (vector, vector<bool>): Add constexpr
	to every member function.
	* include/std/vector (erase_if, erase): Add constexpr.
	* testsuite/23_containers/headers/vector/synopsis.cc: Add
	constexpr for C++20 mode.
	* testsuite/23_containers/vector/bool/cmp_c++20.cc: Change to
	compile-only test using constant expressions.
	* testsuite/23_containers/vector/bool/capacity/29134.cc: Adjust
	namespace for _S_word_bit.
	* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
	Likewise.
	* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
	* testsuite/23_containers/vector/cons/89164.cc: Adjust errors
	for C++20 and move C++17 test to ...
	* testsuite/23_containers/vector/cons/89164_c++17.cc: ... here.
	* testsuite/23_containers/vector/bool/capacity/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/cons/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/element_access/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/modifiers/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc: New test.
	* testsuite/23_containers/vector/capacity/constexpr.cc: New test.
	* testsuite/23_containers/vector/cons/constexpr.cc: New test.
	* testsuite/23_containers/vector/data_access/constexpr.cc: New test.
	* testsuite/23_containers/vector/element_access/constexpr.cc: New test.
	* testsuite/23_containers/vector/modifiers/assign/constexpr.cc: New test.
	* testsuite/23_containers/vector/modifiers/constexpr.cc: New test.
	* testsuite/23_containers/vector/modifiers/swap/constexpr.cc: New test.
2021-11-12 00:42:39 +00:00
GCC Administrator
b39265d4fe Daily bump. 2021-11-12 00:16:32 +00:00
Jonathan Wakely
4a407d358e libstdc++: Fix debug containers for C++98 mode
Since r12-5072 made _Safe_container::operator=(const _Safe_container&)
protected, the debug containers no longer compile in C++98 mode. They
have user-provided copy assignment operators in C++98 mode, and they
assign each base class in turn. The 'this->_M_safe() = __x' expressions
fail, because calling a protected member function is only allowed via
'this'. They could be fixed by using this->_Safe::operator=(__x) but a
simpler solution is to just remove the user-provided assignment
operators and let the compiler define them (as we do for C++11 and
later, by defining them as defaulted).

The only change needed for that to work is to define the _Safe_vector
copy assignment operator in C++98 mode, so that the implicit
__gnu_debug::vector::operator= definition will call it, instead of
needing to call _M_update_guaranteed_capacity() manually.

libstdc++-v3/ChangeLog:

	* include/debug/deque (deque::operator=(const deque&)): Remove
	definition.
	* include/debug/list (list::operator=(const list&)): Likewise.
	* include/debug/map.h (map::operator=(const map&)): Likewise.
	* include/debug/multimap.h (multimap::operator=(const multimap&)):
	Likewise.
	* include/debug/multiset.h (multiset::operator=(const multiset&)):
	Likewise.
	* include/debug/set.h (set::operator=(const set&)): Likewise.
	* include/debug/string (basic_string::operator=(const basic_string&)):
	Likewise.
	* include/debug/vector (vector::operator=(const vector&)):
	Likewise.
	(_Safe_vector::operator=(const _Safe_vector&)): Define for
	C++98 as well.
2021-11-11 21:55:11 +00:00
Aldy Hernandez
53b3edceab Make ranger optional in path_range_query.
All users of path_range_query are currently allocating a gimple_ranger
only to pass it to the query object.  It's tidier to just do it from
path_range_query if no ranger was passed.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::path_range_query): New
	ctor without a ranger.
	(path_range_query::~path_range_query): Free ranger if necessary.
	(path_range_query::range_on_path_entry): Adjust m_ranger for pointer.
	(path_range_query::ssa_range_in_phi): Same.
	(path_range_query::compute_ranges_in_block): Same.
	(path_range_query::compute_imports): Same.
	(path_range_query::compute_ranges): Same.
	(path_range_query::range_of_stmt): Same.
	(path_range_query::compute_outgoing_relations): Same.
	* gimple-range-path.h (class path_range_query): New ctor.
	* tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger
	as path_range_query allocates one.
	* tree-ssa-threadbackward.c (class back_threader): Remove m_ranger.
	(back_threader::~back_threader): Same.
2021-11-11 22:13:17 +01:00
Aldy Hernandez
a7753db4a7 Remove loop crossing restriction from the backward threader.
We have much more thorough restrictions, shared between both threader
implementations, in the registry.  I've been meaning to remove the
backward threader one, since its only purpose was reducing the search
space.  Previously there was a small time penalty for its removal, but
with the various patches in the past month, it looks like the removal is
a wash performance-wise.

This catches 8 more jump threads in the backward threader in my suite.
Presumably because we disallowed all loop crossing, whereas the
registry restrictions allow some crossing (if we exit the loop, etc.).

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Remove loop
	crossing restriction.
2021-11-11 22:13:17 +01:00
Bill Schmidt
8a8458ac6b rs6000: Fix test_mffsl.c to require Power9 support
2021-11-11  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/testsuite/
	* gcc.target/powerpc/test_mffsl.c: Require Power9.
2021-11-11 14:36:04 -06:00
Ian Lance Taylor
7846156274 compiler: traverse func subexprs when creating func descriptors
Fix the Create_func_descriptors pass to traverse the subexpressions of
the function in a Call_expression.  There are no subexpressions in the
normal case of calling a function a method directly, but there are
subexpressions when in code like F().M() when F returns an interface type.

Forgetting to traverse the function subexpressions was almost entirely
hidden by the fact that we also created the necessary thunks in
Bound_method_expression::do_flatten and
Interface_field_reference_expression::do_get_backend.  However, when
the thunks were created there, they did not go through the
order_evaluations pass.  This almost always worked, but failed in the
case in which the function being thunked returned multiple results, as
order_evaluations takes the necessary step of moving the
Call_expression into its own statement, and that would not happen when
order_evaluations was not called.  Avoid hiding errors like this by
changing those methods to only look up the previously created thunk,
rather than creating it if it was not already created.

The test case for this is https://golang.org/cl/363156.

Fixes https://golang.org/issue/49512

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274
2021-11-11 12:21:56 -08:00
Jonathan Wakely
083fd73202 libstdc++: Make pmr::memory_resource::allocate implicitly create objects
Calling the placement version of ::operator new "implicitly creates
objects in the returned region of storage" as per [intro.object]. This
allows the returned memory to be used as storage for implicit-lifetime
types (including arrays) without additional action by the caller. This
is required by the proposed resolution of LWG 3147.
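
A minimal sketch of what the change is meant to permit (illustrative only):
using the returned storage directly as an implicit-lifetime type, here an
array of int, without an explicit placement-new in the caller.

    #include <memory_resource>

    int
    use_as_int_array (std::pmr::memory_resource& mr)
    {
      void* p = mr.allocate (4 * sizeof (int), alignof (int));
      /* The allocation implicitly creates objects in the storage, so
         treating it as an int array is well-defined for the caller.  */
      int* a = static_cast<int*> (p);
      a[0] = 42;
      int r = a[0];
      mr.deallocate (p, 4 * sizeof (int), alignof (int));
      return r;
    }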

libstdc++-v3/ChangeLog:

	* include/std/memory_resource (memory_resource::allocate):
	Implicitly create objects in the returned storage.
2021-11-11 18:16:17 +00:00
Jonathan Wakely
ef0e100f58 libstdc++: Remove public std::vector<bool>::data() member
This function only exists to avoid an error in the debug mode vector, so
it doesn't need to be public.

libstdc++-v3/ChangeLog:

	* include/bits/stl_bvector.h (vector<bool>::data()): Give
	protected access, and delete for C++11 and later.
2021-11-11 18:16:17 +00:00
Jan Hubicka
dc002e31fb Fix gfortran.dg/inline_matmul_17.f90 template.
As discussed on the mailing list, the template actually tests for a missed
optimization where we fail to propagate the size of an array.  We no longer
miss this after the modref improvements.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* gfortran.dg/inline_matmul_17.f90: Fix template
2021-11-11 18:51:35 +01:00
Jan Hubicka
494bdadf28 Enable pure-const discovery in modref.
We can now handle some extra cases, for example:

struct a {int a,b,c;};
__attribute__ ((noinline))
void init (struct a *a)
{
  a->a=1;
  a->b=2;
  a->c=3;
}
int const_fn ()
{
  struct a a;
  init (&a);
  return a.a + a.b + a.c;
}

Here pure/const stops on the fact that const_fn calls non-const init, while
modref knows that the memory it initializes is local to const_fn.

I ended up reordering passes so early modref is done after early pure-const,
mostly to avoid the need to change the testsuite, which greps for const
functions being detected in pure-const.  Still some testsuite compensation is
needed.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref.c (analyze_function): Do pure/const discovery, return
	true on success.
	(pass_modref::execute): If pure/const is discovered fixup cfg.
	(ignore_edge): Do not ignore pure/const edges.
	(modref_propagate_in_scc): Do pure/const discovery, return true if
	cdtor was promoted pure/const.
	(pass_ipa_modref::execute): If needed remove unreachable functions.
	* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
	(warn_function_cold): Likewise.
	(skip_function_for_local_pure_const): Move earlier.
	(ipa_make_function_const): Break out from ...
	(ipa_make_function_pure): Break out from ...
	(propagate_pure_const): ... here.
	(pass_local_pure_const::execute): Use it.
	* ipa-utils.h (ipa_make_function_const): Declare.
	(ipa_make_function_pure): Declare.
	* passes.def: Move early modref after pure-const.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* c-c++-common/tm/inline-asm.c: Disable pure-const.
	* g++.dg/ipa/modref-1.C: Update template.
	* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
	* gcc.dg/tree-ssa/modref-14.c: New test.
	* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
	* gfortran.dg/do_subscript_3.f90: Add -O0.
2021-11-11 18:14:45 +01:00
David Malcolm
abdff441a0 diagnostic: fix unused variable 'def_tabstop' [PR103129]
gcc/ChangeLog:
	PR other/103129
	* diagnostic-show-locus.c (def_policy): Use def_tabstop.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-11 12:12:53 -05:00
Tobias Burnus
407eaad25f Fortran/openmp: Add support for 2 argument num_teams clause
Fortran part to commit r12-5146-g48d7327f2aaf65

gcc/fortran/ChangeLog:

	* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
	num_teams_upper, add num_teams_lower.
	* dump-parse-tree.c (show_omp_clauses): Update to handle
	lower-bound num_teams clause.
	* frontend-passes.c (gfc_code_walker): Likewise.
	* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
	resolve_omp_clauses): Likewise.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
	gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/teams-1.f90: New test.
2021-11-11 17:27:00 +01:00
Jonathan Wright
e1b218d174 aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics
Declare unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.
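
For illustration (a sketch of the pattern rather than the exact header
text), this is the kind of cast-heavy wrapper that a type-qualified builtin
makes unnecessary:

    #include <arm_neon.h>

    /* Sketch of the pre-patch pattern: the unsigned intrinsic wraps the
       signed builtin and casts both the arguments and the result.  With a
       type-qualified (unsigned) builtin the casts all disappear and the
       body becomes a single builtin call.  */
    uint8x16_t
    combine_u8_with_casts (uint8x8_t a, uint8x8_t b)
    {
      return (uint8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) a,
                                                         (int8x8_t) b);
    }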

gcc/ChangeLog:

2021-11-10  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
	(TYPES_COMBINEP): Delete.
	* config/aarch64/aarch64-simd-builtins.def: Declare type-
	qualified builtins for vcombine_* intrinsics.
	* config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
	cast.
	(vcombine_s16): Likewise.
	(vcombine_s32): Likewise.
	(vcombine_f32): Likewise.
	(vcombine_u8): Use type-qualified builtin and remove casts.
	(vcombine_u16): Likewise.
	(vcombine_u32): Likewise.
	(vcombine_u64): Likewise.
	(vcombine_p8): Likewise.
	(vcombine_p16): Likewise.
	(vcombine_p64): Likewise.
	(vcombine_bf16): Remove unnecessary cast.
	* config/aarch64/iterators.md (VD_I): New mode iterator.
	(VDC_P): New mode iterator.
2021-11-11 15:34:52 +00:00
Jonathan Wright
1716ddd1e9 aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics
Declare unsigned and polynomial type-qualified builtins for LD1/ST1
Neon intrinsics. Using these builtins removes the need for many casts
in arm_neon.h.

The new type-qualified builtins are also lowered to gimple, as the
unqualified builtins already are.

gcc/ChangeLog:

2021-11-10  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define.
	(TYPES_LOAD1_P): Define.
	(TYPES_STORE1_U): Define.
	(TYPES_STORE1P): Rename to...
	(TYPES_STORE1_P): This.
	(get_mem_type_for_load_store): Add unsigned and poly types.
	(aarch64_general_gimple_fold_builtin): Add unsigned and poly
	type-qualified builtin declarations.
	* config/aarch64/aarch64-simd-builtins.def: Declare type-
	qualified builtins for LD1/ST1.
	* config/aarch64/arm_neon.h (vld1_p8): Use type-qualified
	builtin and remove cast.
	(vld1_p16): Likewise.
	(vld1_u8): Likewise.
	(vld1_u16): Likewise.
	(vld1_u32): Likewise.
	(vld1q_p8): Likewise.
	(vld1q_p16): Likewise.
	(vld1q_p64): Likewise.
	(vld1q_u8): Likewise.
	(vld1q_u16): Likewise.
	(vld1q_u32): Likewise.
	(vld1q_u64): Likewise.
	(vst1_p8): Likewise.
	(vst1_p16): Likewise.
	(vst1_u8): Likewise.
	(vst1_u16): Likewise.
	(vst1_u32): Likewise.
	(vst1q_p8): Likewise.
	(vst1q_p16): Likewise.
	(vst1q_p64): Likewise.
	(vst1q_u8): Likewise.
	(vst1q_u16): Likewise.
	(vst1q_u32): Likewise.
	(vst1q_u64): Likewise.
	* config/aarch64/iterators.md (VALLP_NO_DI): New iterator.
2021-11-11 15:34:51 +00:00
Jonathan Wright
6eca10aa76 aarch64: Use type-qualified builtins for ADDV Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
the vector reduction Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	builtins for vector reduction.
	* config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
	builtin and remove casts.
	(vaddv_u16): Likewise.
	(vaddv_u32): Likewise.
	(vaddvq_u8): Likewise.
	(vaddvq_u16): Likewise.
	(vaddvq_u32): Likewise.
	(vaddvq_u64): Likewise.
2021-11-11 15:34:51 +00:00
Jonathan Wright
f341c03203 aarch64: Use type-qualified builtins for ADDP Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
the pairwise addition Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def:
	* config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
	builtin and remove casts.
	(vpaddq_u16): Likewise.
	(vpaddq_u32): Likewise.
	(vpaddq_u64): Likewise.
	(vpadd_u8): Likewise.
	(vpadd_u16): Likewise.
	(vpadd_u32): Likewise.
	(vpaddd_u64): Likewise.
2021-11-11 15:34:51 +00:00
Jonathan Wright
80ee260d5b aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-subtract Neon intrinsics. This removes
the need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	builtins for [r]subhn[2].
	* config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary
	cast.
	(vsubhn_s32): Likewise.
	(vsubhn_s64): Likewise.
	(vsubhn_u16): Use type-qualified builtin and remove casts.
	(vsubhn_u32): Likewise.
	(vsubhn_u64): Likewise.
	(vrsubhn_s16): Remove unnecessary cast.
	(vrsubhn_s32): Likewise.
	(vrsubhn_s64): Likewise.
	(vrsubhn_u16): Use type-qualified builtin and remove casts.
	(vrsubhn_u32): Likewise.
	(vrsubhn_u64): Likewise.
	(vrsubhn_high_s16): Remove unnecessary cast.
	(vrsubhn_high_s32): Likewise.
	(vrsubhn_high_s64): Likewise.
	(vrsubhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vrsubhn_high_u32): Likewise.
	(vrsubhn_high_u64): Likewise.
	(vsubhn_high_s16): Remove unnecessary cast.
	(vsubhn_high_s32): Likewise.
	(vsubhn_high_s64): Likewise.
	(vsubhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vsubhn_high_u32): Likewise.
	(vsubhn_high_u64): Likewise.
2021-11-11 15:34:51 +00:00
Jonathan Wright
7bde2a6ecd aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	builtins for [r]addhn[2].
	* config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary
	cast.
	(vaddhn_s32): Likewise.
	(vaddhn_s64): Likewise.
	(vaddhn_u16): Use type-qualified builtin and remove casts.
	(vaddhn_u32): Likewise.
	(vaddhn_u64): Likewise.
	(vraddhn_s16): Remove unnecessary cast.
	(vraddhn_s32): Likewise.
	(vraddhn_s64): Likewise.
	(vraddhn_u16): Use type-qualified builtin and remove casts.
	(vraddhn_u32): Likewise.
	(vraddhn_u64): Likewise.
	(vaddhn_high_s16): Remove unnecessary cast.
	(vaddhn_high_s32): Likewise.
	(vaddhn_high_s64): Likewise.
	(vaddhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vaddhn_high_u32): Likewise.
	(vaddhn_high_u64): Likewise.
	(vraddhn_high_s16): Remove unnecessary cast.
	(vraddhn_high_s32): Likewise.
	(vraddhn_high_s64): Likewise.
	(vraddhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vraddhn_high_u32): Likewise.
	(vraddhn_high_u64): Likewise.
2021-11-11 15:34:51 +00:00