Commit Graph

189602 Commits

Jonathan Wakely
a54ce8865a libstdc++: Print assertion messages to stderr [PR59675]
This replaces the printf used by failed debug assertions with fprintf,
so we can write to stderr.

To avoid including <stdio.h> the assert function is moved into the
library. To avoid programs using a vague linkage definition of the old
inline function, the function is renamed. Code compiled with old
versions of GCC might still call the old function, but code compiled
with the newer GCC will call the new function and write to stderr.
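
As a rough illustration of the new arrangement - a plausible shape for the
out-of-line handler, with the exact signature being an assumption - consider:

#include <cstdio>
#include <cstdlib>

namespace std
{
  // Lives in the library (src/c++11/debug.cc), so the headers no longer
  // need <stdio.h>; writes to stderr instead of stdout and never returns.
  [[noreturn]] void
  __glibcxx_assert_fail (const char* file, int line,
			 const char* function, const char* condition) noexcept
  {
    if (file && function && condition)
      std::fprintf (stderr, "%s:%d: %s: Assertion '%s' failed.\n",
		    file, line, function, condition);
    std::abort ();
  }
}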

libstdc++-v3/ChangeLog:

	PR libstdc++/59675
	* acinclude.m4 (libtool_VERSION): Bump version.
	* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Add version and
	export new symbol.
	* configure: Regenerate.
	* include/bits/c++config (__replacement_assert): Remove, declare
	__glibcxx_assert_fail instead.
	* src/c++11/debug.cc (__glibcxx_assert_fail): New function to
	replace __replacement_assert, writing to stderr instead of
	stdout.
	* testsuite/util/testsuite_abi.cc: Update latest version.
2021-11-12 12:23:10 +00:00
Mikael Morin
68d62cb206 fortran: Ignore unused args in scalarization [PR97896]
The KIND argument of the INDEX intrinsic is a compile-time constant
that is used at compile time only to resolve the call to a kind-specific
library function.  The argument is otherwise completely ignored at runtime,
and no code is generated for it, as the library procedure has no kind
argument.  This confuses the scalarizer, which expects every argument
of an elemental function to be used when calling a procedure.
This change removes the argument from the scalarization lists
at the beginning of the scalarization process, so that the argument
is completely ignored.
This also reverts the existing workaround
(commit d09847357b except for its testcase).

	PR fortran/97896

gcc/fortran/ChangeLog:
	* intrinsic.c (add_sym_4ind): Remove.
	(add_functions): Use add_sym_4 instead of add_sym_4ind.
	Don't special case the index intrinsic.
	* iresolve.c (gfc_resolve_index_func): Use the individual arguments
	directly instead of the full argument list.
	* intrinsic.h (gfc_resolve_index_func): Update the declaration
	accordingly.
	* trans-decl.c (gfc_get_extern_function_decl): Don't modify the
	list of arguments in the case of the index intrinsic.
	* trans-array.h (gfc_get_intrinsic_for_expr,
	gfc_get_proc_ifc_for_expr): New.
	* trans-array.c (gfc_get_intrinsic_for_expr,
	arg_evaluated_for_scalarization): New.
	(gfc_walk_elemental_function_args): Add intrinsic procedure
	as argument.  Count arguments.  Check arg_evaluated_for_scalarization.
	* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
	* trans-stmt.c (get_intrinsic_for_code): New.
	(gfc_trans_call): Update call.

gcc/testsuite/ChangeLog:
	* gfortran.dg/index_5.f90: New.
2021-11-12 13:10:55 +01:00
Jakub Jelinek
7d6da11fce openmp: Honor OpenMP 5.1 num_teams lower bound
The following patch implements what I've been talking about earlier:
honor that, for an explicit num_teams clause, we create at least the
lower-bound (if not specified, the upper-bound) number of teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures is highly
appreciated.
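
For reference, a minimal (hypothetical, not from the patch) example of the
clause this honors; with a two-argument num_teams clause the runtime must
now create at least the lower bound of teams:

#include <omp.h>
#include <cstdio>

int
main ()
{
  // OpenMP 5.1: create at least 4 and at most 8 teams in the league.
  #pragma omp teams num_teams (4 : 8)
  if (omp_get_team_num () == 0)
    std::printf ("league of %d teams\n", omp_get_num_teams ());
}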

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
	(BUILT_IN_GOMP_TEAMS4): New.
	* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
	* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
	GOMP_teams, pass to it also num_teams lower-bound expression
	or a dup of upper-bound if it is missing and a flag whether
	it is the first call or not.
gcc/fortran/
	* types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
	* libgomp_g.h (GOMP_teams4): Declare.
	* libgomp.map (GOMP_5.1): Export GOMP_teams4.
	* target.c (GOMP_teams4): New function.
	* config/nvptx/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* config/gcn/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
	teams instead of <= 2.
	* testsuite/libgomp.c-c++-common/teams-2.c: New test.
2021-11-12 12:41:22 +01:00
Martin Liska
5f516a6a5d Remove unused function.
PR tree-optimization/102497

gcc/ChangeLog:

	* gimple-predicate-analysis.cc (add_pred): Remove unused
	function.
2021-11-12 12:40:02 +01:00
Richard Biener
140346fa24 tree-optimization/103204 - fix missed valueization in VN
The following fixes a missed valueization when simplifying
a MEM[&...] combination during valueization.

2021-11-12  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/103204
	* tree-ssa-sccvn.c (valueize_refs_1): Re-valueize the
	top operand after folding in an address.

	* gcc.dg/torture/pr103204.c: New testcase.
2021-11-12 09:11:49 +01:00
Alan Modra
c60ded6f5e Make opcodes configure depend on bfd configure
The idea is for opcodes to be able to see whether bfd is compiled
for 64-bit.  Much of an --enable-targets=all libopcodes build is wasted
space if bfd can't load 64-bit target object files.

	* Makefile.def (configure-opcodes): Depend on configure-bfd.
	* Makefile.in: Regenerate.
2021-11-12 18:34:12 +10:30
Jonathan Wakely
1ae8edf5f7 libstdc++: Implement constexpr std::vector for C++20
This implements P1004R2 ("Making std::vector constexpr") for C++20.

For now, debug mode vectors are not supported in constant expressions.
To make that work we might need to disable all attaching/detaching of
safe iterators. That can be fixed later.
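
A quick sketch of what this enables (assuming -std=c++20; illustrative,
not part of the patch):

#include <vector>

constexpr int
sum_up_to (int n)
{
  std::vector<int> v;		// allocation is now OK in constant evaluation
  for (int i = 1; i <= n; ++i)
    v.push_back (i);
  int s = 0;
  for (int i : v)
    s += i;
  return s;			// v is destroyed before evaluation ends
}

static_assert (sum_up_to (4) == 10);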

Co-authored-by: Josh Marshall <joshua.r.marshall.1991@gmail.com>

libstdc++-v3/ChangeLog:

	* include/bits/alloc_traits.h (_Destroy): Make constexpr for
	C++20 mode.
	* include/bits/allocator.h (__shrink_to_fit::_S_do_it):
	Likewise.
	* include/bits/stl_algobase.h (__fill_a1): Declare _Bit_iterator
	overload constexpr for C++20.
	* include/bits/stl_bvector.h (_Bit_type, _S_word_bit): Move out
	of inline namespace.
	(_Bit_reference, _Bit_iterator_base, _Bit_iterator)
	(_Bit_const_iterator, _Bvector_impl_data, _Bvector_base)
	(vector<bool, A>): Add constexpr to every member function.
	(_Bvector_base::_M_allocate): Initialize storage during constant
	evaluation.
	(vector<bool, A>::_M_initialize_value): Use __fill_bvector_n
	instead of memset.
	(__fill_bvector_n): New helper function to replace memset during
	constant evaluation.
	* include/bits/stl_uninitialized.h (__uninitialized_copy<false>):
	Move logic to ...
	(__do_uninit_copy): New function.
	(__uninitialized_fill<false>): Move logic to ...
	(__do_uninit_fill): New function.
	(__uninitialized_fill_n<false>): Move logic to ...
	(__do_uninit_fill_n): New function.
	(__uninitialized_copy_a): Add constexpr. Use __do_uninit_copy.
	(__uninitialized_move_a, __uninitialized_move_if_noexcept_a):
	Add constexpr.
	(__uninitialized_fill_a): Add constexpr. Use __do_uninit_fill.
	(__uninitialized_fill_n_a): Add constexpr. Use
	__do_uninit_fill_n.
	(__uninitialized_default_n, __uninitialized_default_n_a)
	(__relocate_a_1, __relocate_a): Add constexpr.
	* include/bits/stl_vector.h (_Vector_impl_data, _Vector_impl)
	(_Vector_base, vector): Add constexpr to every member function.
	(_Vector_impl::_S_adjust): Disable ASan annotation during
	constant evaluation.
	(_Vector_base::_S_use_relocate): Disable bitwise-relocation
	during constant evaluation.
	(vector::_Temporary_value): Use a union for storage.
	* include/bits/vector.tcc (vector, vector<bool>): Add constexpr
	to every member function.
	* include/std/vector (erase_if, erase): Add constexpr.
	* testsuite/23_containers/headers/vector/synopsis.cc: Add
	constexpr for C++20 mode.
	* testsuite/23_containers/vector/bool/cmp_c++20.cc: Change to
	compile-only test using constant expressions.
	* testsuite/23_containers/vector/bool/capacity/29134.cc: Adjust
	namespace for _S_word_bit.
	* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
	Likewise.
	* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
	* testsuite/23_containers/vector/cons/89164.cc: Adjust errors
	for C++20 and move C++17 test to ...
	* testsuite/23_containers/vector/cons/89164_c++17.cc: ... here.
	* testsuite/23_containers/vector/bool/capacity/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/cons/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/element_access/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/modifiers/constexpr.cc: New test.
	* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc: New test.
	* testsuite/23_containers/vector/capacity/constexpr.cc: New test.
	* testsuite/23_containers/vector/cons/constexpr.cc: New test.
	* testsuite/23_containers/vector/data_access/constexpr.cc: New test.
	* testsuite/23_containers/vector/element_access/constexpr.cc: New test.
	* testsuite/23_containers/vector/modifiers/assign/constexpr.cc: New test.
	* testsuite/23_containers/vector/modifiers/constexpr.cc: New test.
	* testsuite/23_containers/vector/modifiers/swap/constexpr.cc: New test.
2021-11-12 00:42:39 +00:00
GCC Administrator
b39265d4fe Daily bump. 2021-11-12 00:16:32 +00:00
Jonathan Wakely
4a407d358e libstdc++: Fix debug containers for C++98 mode
Since r12-5072 made _Safe_container::operator=(const _Safe_container&)
protected, the debug containers no longer compile in C++98 mode. They
have user-provided copy assignment operators in C++98 mode, and they
assign each base class in turn. The 'this->_M_safe() = __x' expressions
fail, because calling a protected member function is only allowed via
'this'. They could be fixed by using this->_Safe::operator=(__x) but a
simpler solution is to just remove the user-provided assignment
operators and let the compiler define them (as we do for C++11 and
later, by defining them as defaulted).

The only change needed for that to work is to define the _Safe_vector
copy assignment operator in C++98 mode, so that the implicit
__gnu_debug::vector::operator= definition will call it, instead of
needing to call _M_update_guaranteed_capacity() manually.
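
The access rule at play, reduced to a minimal example (illustrative only,
not libstdc++ code):

struct Base
{
protected:
  Base& operator= (const Base&) = default;
};

struct Derived : Base
{
  Derived&
  operator= (const Derived& x)
  {
    // OK: names the protected member through the base subobject of *this.
    this->Base::operator= (x);
    // Ill-formed: a protected member cannot be used on a plain Base lvalue:
    //   static_cast<Base&> (*this) = x;
    return *this;
  }
};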

libstdc++-v3/ChangeLog:

	* include/debug/deque (deque::operator=(const deque&)): Remove
	definition.
	* include/debug/list (list::operator=(const list&)): Likewise.
	* include/debug/map.h (map::operator=(const map&)): Likewise.
	* include/debug/multimap.h (multimap::operator=(const multimap&)):
	Likewise.
	* include/debug/multiset.h (multiset::operator=(const multiset&)):
	Likewise.
	* include/debug/set.h (set::operator=(const set&)): Likewise.
	* include/debug/string (basic_string::operator=(const basic_string&)):
	Likewise.
	* include/debug/vector (vector::operator=(const vector&)):
	Likewise.
	(_Safe_vector::operator=(const _Safe_vector&)): Define for
	C++98 as well.
2021-11-11 21:55:11 +00:00
Aldy Hernandez
53b3edceab Make ranger optional in path_range_query.
All users of path_range_query are currently allocating a gimple_ranger
only to pass it to the query object.  It's tidier to just do it from
path_range_query if no ranger was passed.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::path_range_query): New
	ctor without a ranger.
	(path_range_query::~path_range_query): Free ranger if necessary.
	(path_range_query::range_on_path_entry): Adjust m_ranger for pointer.
	(path_range_query::ssa_range_in_phi): Same.
	(path_range_query::compute_ranges_in_block): Same.
	(path_range_query::compute_imports): Same.
	(path_range_query::compute_ranges): Same.
	(path_range_query::range_of_stmt): Same.
	(path_range_query::compute_outgoing_relations): Same.
	* gimple-range-path.h (class path_range_query): New ctor.
	* tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger
	as path_range_query allocates one.
	* tree-ssa-threadbackward.c (class back_threader): Remove m_ranger.
	(back_threader::~back_threader): Same.
2021-11-11 22:13:17 +01:00
Aldy Hernandez
a7753db4a7 Remove loop crossing restriction from the backward threader.
We have much more thorough restrictions, shared between both threader
implementations, in the registry.  I've been meaning to remove the
backward threader one, since its only purpose was reducing the search
space.  Previously there was a small time penalty for its removal, but
with the various patches in the past month, it looks like the removal
is a wash performance-wise.

This catches 8 more jump threads in the backward threader in my suite.
Presumably, because we disallowed all loop crossing, whereas the
registry restrictions allow some crossing (if we exit the loop, etc).

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Remove loop
	crossing restriction.
2021-11-11 22:13:17 +01:00
Bill Schmidt
8a8458ac6b rs6000: Fix test_mffsl.c to require Power9 support
2021-11-11  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/testsuite/
	* gcc.target/powerpc/test_mffsl.c: Require Power9.
2021-11-11 14:36:04 -06:00
Ian Lance Taylor
7846156274 compiler: traverse func subexprs when creating func descriptors
Fix the Create_func_descriptors pass to traverse the subexpressions of
the function in a Call_expression.  There are no subexpressions in the
normal case of calling a function or a method directly, but there are
subexpressions in code like F().M(), where F returns an interface type.

Forgetting to traverse the function subexpressions was almost entirely
hidden by the fact that we also created the necessary thunks in
Bound_method_expression::do_flatten and
Interface_field_reference_expression::do_get_backend.  However, when
the thunks were created there, they did not go through the
order_evaluations pass.  This almost always worked, but failed in the
case in which the function being thunked returned multiple results, as
order_evaluations takes the necessary step of moving the
Call_expression into its own statement, and that would not happen when
order_evaluations was not called.  Avoid hiding errors like this by
changing those methods to only lookup the previously created thunk,
rather than creating it if it was not already created.

The test case for this is https://golang.org/cl/363156.

Fixes https://golang.org/issue/49512

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274
2021-11-11 12:21:56 -08:00
Jonathan Wakely
083fd73202 libstdc++: Make pmr::memory_resource::allocate implicitly create objects
Calling the placement version of ::operator new "implicitly creates
objects in the returned region of storage" as per [intro.object]. This
allows the returned memory to be used as storage for implicit-lifetime
types (including arrays) without additional action by the caller. This
is required by the proposed resolution of LWG 3147.
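
Roughly, the change amounts to the following (a sketch using a stand-in
class, not the real libstdc++ code):

#include <cstddef>
#include <new>

struct my_resource		// stand-in for std::pmr::memory_resource
{
  void* do_allocate (std::size_t bytes, std::size_t alignment);

  void*
  allocate (std::size_t bytes,
	    std::size_t alignment = alignof (std::max_align_t))
  {
    // Passing the storage through non-allocating placement ::operator new
    // implicitly creates objects in it ([intro.object]); previously the
    // result of do_allocate was returned directly.
    return ::operator new (bytes, do_allocate (bytes, alignment));
  }
};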

libstdc++-v3/ChangeLog:

	* include/std/memory_resource (memory_resource::allocate):
	Implicitly create objects in the returned storage.
2021-11-11 18:16:17 +00:00
Jonathan Wakely
ef0e100f58 libstdc++: Remove public std::vector<bool>::data() member
This function only exists to avoid an error in the debug mode vector, so
it doesn't need to be public.

libstdc++-v3/ChangeLog:

	* include/bits/stl_bvector.h (vector<bool>::data()): Give
	protected access, and delete for C++11 and later.
2021-11-11 18:16:17 +00:00
Jan Hubicka
dc002e31fb Fix gfortran.dg/inline_matmul_17.f90 template.
As discussed on the mailing list the template actually tests for missed
optimization where we fail to pragate size of an array.  We no longer miss this
after modref improvements.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* gfortran.dg/inline_matmul_17.f90: Fix template.
2021-11-11 18:51:35 +01:00
Jan Hubicka
494bdadf28 Enable pure-const discovery in modref.
We newly can handle some extra cases, for example:

struct a {int a,b,c;};
__attribute__ ((noinline))
void init (struct a *a)
{
  a->a=1;
  a->b=2;
  a->c=3;
}
int const_fn ()
{
  struct a a;
  init (&a);
  return a.a + a.b + a.c;
}

Here pure/const stops on the fact that const_fn calls non-const init, while
modref knows that the memory it initializes is local to const_fn.

I ended up reordering passes so early modref is done after early pure-const,
mostly to avoid the need to change the testsuite, which greps for const
functions being detected in pure-const.  Still, some testsuite compensation
is needed.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref.c (analyze_function): Do pure/const discovery, return
	true on success.
	(pass_modref::execute): If pure/const is discovered fixup cfg.
	(ignore_edge): Do not ignore pure/const edges.
	(modref_propagate_in_scc): Do pure/const discovery, return true if
	cdtor was promoted pure/const.
	(pass_ipa_modref::execute): If needed remove unreachable functions.
	* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
	(warn_function_cold): Likewise.
	(skip_function_for_local_pure_const): Move earlier.
	(ipa_make_function_const): Break out from ...
	(ipa_make_function_pure): Break out from ...
	(propagate_pure_const): ... here.
	(pass_local_pure_const::execute): Use it.
	* ipa-utils.h (ipa_make_function_const): Declare.
	(ipa_make_function_pure): Declare.
	* passes.def: Move early modref after pure-const.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* c-c++-common/tm/inline-asm.c: Disable pure-const.
	* g++.dg/ipa/modref-1.C: Update template.
	* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
	* gcc.dg/tree-ssa/modref-14.c: New test.
	* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
	* gfortran.dg/do_subscript_3.f90: Add -O0.
2021-11-11 18:14:45 +01:00
David Malcolm
abdff441a0 diagnostic: fix unused variable 'def_tabstop' [PR103129]
gcc/ChangeLog:
	PR other/103129
	* diagnostic-show-locus.c (def_policy): Use def_tabstop.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2021-11-11 12:12:53 -05:00
Tobias Burnus
407eaad25f Fortran/openmp: Add support for 2 argument num_teams clause
Fortran part to commit r12-5146-g48d7327f2aaf65

gcc/fortran/ChangeLog:

	* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
	num_teams_upper, add num_teams_lower.
	* dump-parse-tree.c (show_omp_clauses): Update to handle
	lower-bound num_teams clause.
	* frontend-passes.c (gfc_code_walker): Likewise.
	* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
	resolve_omp_clauses): Likewise.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
	gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/teams-1.f90: New test.
2021-11-11 17:27:00 +01:00
Jonathan Wright
e1b218d174 aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics
Declare unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.
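
The user-visible pattern, sketched for one intrinsic (the type-qualified
builtin's name below is an assumption, shown only to convey the idea):

#include <arm_neon.h>

uint8x16_t
combine_sketch (uint8x8_t a, uint8x8_t b)
{
  // Before: arm_neon.h had to cast operands and result:
  //   (uint8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) a, (int8x8_t) b)
  // After: an unsigned-qualified builtin keeps the types throughout,
  // e.g. something like:
  //   __builtin_aarch64_combinev8qi_uuu (a, b)
  return vcombine_u8 (a, b);
}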

gcc/ChangeLog:

2021-11-10  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
	(TYPES_COMBINEP): Delete.
	* config/aarch64/aarch64-simd-builtins.def: Declare type-
	qualified builtins for vcombine_* intrinsics.
	* config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
	cast.
	(vcombine_s16): Likewise.
	(vcombine_s32): Likewise.
	(vcombine_f32): Likewise.
	(vcombine_u8): Use type-qualified builtin and remove casts.
	(vcombine_u16): Likewise.
	(vcombine_u32): Likewise.
	(vcombine_u64): Likewise.
	(vcombine_p8): Likewise.
	(vcombine_p16): Likewise.
	(vcombine_p64): Likewise.
	(vcombine_bf16): Remove unnecessary cast.
	* config/aarch64/iterators.md (VD_I): New mode iterator.
	(VDC_P): New mode iterator.
2021-11-11 15:34:52 +00:00
Jonathan Wright
1716ddd1e9 aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics
Declare unsigned and polynomial type-qualified builtins for LD1/ST1
Neon intrinsics. Using these builtins removes the need for many casts
in arm_neon.h.

The new type-qualified builtins are also lowered to gimple - as the
unqualified builtins are already.

gcc/ChangeLog:

2021-11-10  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define.
	(TYPES_LOAD1_P): Define.
	(TYPES_STORE1_U): Define.
	(TYPES_STORE1P): Rename to...
	(TYPES_STORE1_P): This.
	(get_mem_type_for_load_store): Add unsigned and poly types.
	(aarch64_general_gimple_fold_builtin): Add unsigned and poly
	type-qualified builtin declarations.
	* config/aarch64/aarch64-simd-builtins.def: Declare type-
	qualified builtins for LD1/ST1.
	* config/aarch64/arm_neon.h (vld1_p8): Use type-qualified
	builtin and remove cast.
	(vld1_p16): Likewise.
	(vld1_u8): Likewise.
	(vld1_u16): Likewise.
	(vld1_u32): Likewise.
	(vld1q_p8): Likewise.
	(vld1q_p16): Likewise.
	(vld1q_p64): Likewise.
	(vld1q_u8): Likewise.
	(vld1q_u16): Likewise.
	(vld1q_u32): Likewise.
	(vld1q_u64): Likewise.
	(vst1_p8): Likewise.
	(vst1_p16): Likewise.
	(vst1_u8): Likewise.
	(vst1_u16): Likewise.
	(vst1_u32): Likewise.
	(vst1q_p8): Likewise.
	(vst1q_p16): Likewise.
	(vst1q_p64): Likewise.
	(vst1q_u8): Likewise.
	(vst1q_u16): Likewise.
	(vst1q_u32): Likewise.
	(vst1q_u64): Likewise.
	* config/aarch64/iterators.md (VALLP_NO_DI): New iterator.
2021-11-11 15:34:51 +00:00
Jonathan Wright
6eca10aa76 aarch64: Use type-qualified builtins for ADDV Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
the vector reduction Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	builtins for vector reduction.
	* config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
	builtin and remove casts.
	(vaddv_u16): Likewise.
	(vaddv_u32): Likewise.
	(vaddvq_u8): Likewise.
	(vaddvq_u16): Likewise.
	(vaddvq_u32): Likewise.
	(vaddvq_u64): Likewise.
2021-11-11 15:34:51 +00:00
Jonathan Wright
f341c03203 aarch64: Use type-qualified builtins for ADDP Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
the pairwise addition Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	builtins for pairwise addition.
	* config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
	builtin and remove casts.
	(vpaddq_u16): Likewise.
	(vpaddq_u32): Likewise.
	(vpaddq_u64): Likewise.
	(vpadd_u8): Likewise.
	(vpadd_u16): Likewise.
	(vpadd_u32): Likewise.
	(vpaddd_u64): Likewise.
2021-11-11 15:34:51 +00:00
Jonathan Wright
80ee260d5b aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-subtract Neon intrinsics. This removes
the need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	builtins for [r]subhn[2].
	* config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary
	cast.
	(vsubhn_s32): Likewise.
	(vsubhn_s64): Likewise.
	(vsubhn_u16): Use type-qualified builtin and remove casts.
	(vsubhn_u32): Likewise.
	(vsubhn_u64): Likewise.
	(vrsubhn_s16): Remove unnecessary cast.
	(vrsubhn_s32): Likewise.
	(vrsubhn_s64): Likewise.
	(vrsubhn_u16): Use type-qualified builtin and remove casts.
	(vrsubhn_u32): Likewise.
	(vrsubhn_u64): Likewise.
	(vrsubhn_high_s16): Remove unnecessary cast.
	(vrsubhn_high_s32): Likewise.
	(vrsubhn_high_s64): Likewise.
	(vrsubhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vrsubhn_high_u32): Likewise.
	(vrsubhn_high_u64): Likewise.
	(vsubhn_high_s16): Remove unnecessary cast.
	(vsubhn_high_s32): Likewise.
	(vsubhn_high_s64): Likewise.
	(vsubhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vsubhn_high_u32): Likewise.
	(vsubhn_high_u64): Likewise.
2021-11-11 15:34:51 +00:00
Jonathan Wright
7bde2a6ecd aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	builtins for [r]addhn[2].
	* config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary
	cast.
	(vaddhn_s32): Likewise.
	(vaddhn_s64): Likewise.
	(vaddhn_u16): Use type-qualified builtin and remove casts.
	(vaddhn_u32): Likewise.
	(vaddhn_u64): Likewise.
	(vraddhn_s16): Remove unnecessary cast.
	(vraddhn_s32): Likewise.
	(vraddhn_s64): Likewise.
	(vraddhn_u16): Use type-qualified builtin and remove casts.
	(vraddhn_u32): Likewise.
	(vraddhn_u64): Likewise.
	(vaddhn_high_s16): Remove unnecessary cast.
	(vaddhn_high_s32): Likewise.
	(vaddhn_high_s64): Likewise.
	(vaddhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vaddhn_high_u32): Likewise.
	(vaddhn_high_u64): Likewise.
	(vraddhn_high_s16): Remove unnecessary cast.
	(vraddhn_high_s32): Likewise.
	(vraddhn_high_s64): Likewise.
	(vraddhn_high_u16): Use type-qualified builtin and remove
	casts.
	(vraddhn_high_u32): Likewise.
	(vraddhn_high_u64): Likewise.
2021-11-11 15:34:51 +00:00
Jonathan Wright
aa11d95bea aarch64: Use type-qualified builtins for UHSUB Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
halving-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
	qualifiers in generator macros for uhsub builtins.
	* config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary
	cast.
	(vhsub_s16): Likewise.
	(vhsub_s32): Likewise.
	(vhsub_u8): Use type-qualified builtin and remove casts.
	(vhsub_u16): Likewise.
	(vhsub_u32): Likewise.
	(vhsubq_s8): Remove unnecessary cast.
	(vhsubq_s16): Likewise.
	(vhsubq_s32): Likewise.
	(vhsubq_u8): Use type-qualified builtin and remove casts.
	(vhsubq_u16): Likewise.
	(vhsubq_u32): Likewise.
2021-11-11 15:34:50 +00:00
Jonathan Wright
3e35924cf1 aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-add Neon intrinsics. This removes the need for
many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
	qualifiers in generator macros for u[r]hadd builtins.
	* config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary
	cast.
	(vhadd_s16): Likewise.
	(vhadd_s32): Likewise.
	(vhadd_u8): Use type-qualified builtin and remove casts.
	(vhadd_u16): Likewise.
	(vhadd_u32): Likewise.
	(vhaddq_s8): Remove unnecessary cast.
	(vhaddq_s16): Likewise.
	(vhaddq_s32): Likewise.
	(vhaddq_u8): Use type-qualified builtin and remove casts.
	(vhaddq_u16): Likewise.
	(vhaddq_u32): Likewise.
	(vrhadd_s8): Remove unnecessary cast.
	(vrhadd_s16): Likewise.
	(vrhadd_s32): Likewise.
	(vrhadd_u8): Use type-qualified builtin and remove casts.
	(vrhadd_u16): Likewise.
	(vrhadd_u32): Likewise.
	(vrhaddq_s8): Remove unnecessary cast.
	(vrhaddq_s16): Likewise.
	(vrhaddq_s32): Likewise.
	(vrhaddq_u8): Use type-qualified builtin and remove casts.
	(vrhaddq_u16): Likewise.
	(vrhaddq_u32): Likewise.
2021-11-11 15:34:50 +00:00
Jonathan Wright
ee03bed0b0 aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
widening-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
	qualifiers in generator macros for usub[lw][2] builtins.
	* config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary
	cast.
	(vsubl_s16): Likewise.
	(vsubl_s32): Likewise.
	(vsubl_u8): Use type-qualified builtin and remove casts.
	(vsubl_u16): Likewise.
	(vsubl_u32): Likewise.
	(vsubl_high_s8): Remove unnecessary cast.
	(vsubl_high_s16): Likewise.
	(vsubl_high_s32): Likewise.
	(vsubl_high_u8): Use type-qualified builtin and remove casts.
	(vsubl_high_u16): Likewise.
	(vsubl_high_u32): Likewise.
	(vsubw_s8): Remove unnecessary casts.
	(vsubw_s16): Likewise.
	(vsubw_s32): Likewise.
	(vsubw_u8): Use type-qualified builtin and remove casts.
	(vsubw_u16): Likewise.
	(vsubw_u32): Likewise.
	(vsubw_high_s8): Remove unnecessary cast.
	(vsubw_high_s16): Likewise.
	(vsubw_high_s32): Likewise.
	(vsubw_high_u8): Use type-qualified builtin and remove casts.
	(vsubw_high_u16): Likewise.
	(vsubw_high_u32): Likewise.
2021-11-11 15:34:50 +00:00
Jonathan Wright
10e98c3c63 aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics
Declare unsigned type-qualified builtins and use them to implement
widening-add Neon intrinsics. This removes the need for many casts in
arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
	qualifiers in generator macros for uadd[lw][2] builtins.
	* config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary
	cast.
	(vaddl_s16): Likewise.
	(vaddl_s32): Likewise.
	(vaddl_u8): Use type-qualified builtin and remove casts.
	(vaddl_u16): Likewise.
	(vaddl_u32): Likewise.
	(vaddl_high_s8): Remove unnecessary cast.
	(vaddl_high_s16): Likewise.
	(vaddl_high_s32): Likewise.
	(vaddl_high_u8): Use type-qualified builtin and remove casts.
	(vaddl_high_u16): Likewise.
	(vaddl_high_u32): Likewise.
	(vaddw_s8): Remove unnecessary cast.
	(vaddw_s16): Likewise.
	(vaddw_s32): Likewise.
	(vaddw_u8): Use type-qualified builtin and remove casts.
	(vaddw_u16): Likewise.
	(vaddw_u32): Likewise.
	(vaddw_high_s8): Remove unnecessary cast.
	(vaddw_high_s16): Likewise.
	(vaddw_high_s32): Likewise.
	(vaddw_high_u8): Use type-qualified builtin and remove casts.
	(vaddw_high_u16): Likewise.
	(vaddw_high_u32): Likewise.
2021-11-11 15:34:50 +00:00
Jonathan Wright
a22c03d439 aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics
Declare unsigned type-qualified builtins and use them for [R]SHRN[2]
Neon intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare type-
	qualified builtins for [R]SHRN[2].
	* config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified
	builtin and remove casts.
	(vshrn_n_u32): Likewise.
	(vshrn_n_u64): Likewise.
	(vrshrn_high_n_u16): Likewise.
	(vrshrn_high_n_u32): Likewise.
	(vrshrn_high_n_u64): Likewise.
	(vrshrn_n_u16): Likewise.
	(vrshrn_n_u32): Likewise.
	(vrshrn_n_u64): Likewise.
	(vshrn_high_n_u16): Likewise.
	(vshrn_high_n_u32): Likewise.
	(vshrn_high_n_u64): Likewise.
2021-11-11 15:34:50 +00:00
Jonathan Wright
439906c61d aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics
Declare unsigned type-qualified builtins and use them for XTN[2] Neon
intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
	type-qualified builtins for XTN[2].
	* config/aarch64/arm_neon.h (vmovn_high_u16): Use type-
	qualified builtin and remove casts.
	(vmovn_high_u32): Likewise.
	(vmovn_high_u64): Likewise.
	(vmovn_u16): Likewise.
	(vmovn_u32): Likewise.
	(vmovn_u64): Likewise.
2021-11-11 15:34:49 +00:00
Jonathan Wright
a2590b545e aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics
Declare poly type-qualified builtins and use them for PMUL[L] Neon
intrinsics. This removes the need for casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Use poly type
	qualifier in builtin generator macros.
	* config/aarch64/arm_neon.h (vmul_p8): Use type-qualified
	builtin and remove casts.
	(vmulq_p8): Likewise.
	(vmull_high_p8): Likewise.
	(vmull_p8): Likewise.
2021-11-11 15:34:49 +00:00
Jonathan Wright
515ef83098 aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics
Declare type-qualified builtins and use them for MLA/MLS Neon
intrinsics that operate on unsigned types. This eliminates lots of
casts in arm_neon.h.

gcc/ChangeLog:

2021-11-08  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/aarch64-simd-builtins.def: Declare type-
	qualified builtin generators for unsigned MLA/MLS intrinsics.
	* config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified
	builtin.
	(vmla_n_u32): Likewise.
	(vmla_u8): Likewise.
	(vmla_u16): Likewise.
	(vmla_u32): Likewise.
	(vmlaq_n_u16): Likewise.
	(vmlaq_n_u32): Likewise.
	(vmlaq_u8): Likewise.
	(vmlaq_u16): Likewise.
	(vmlaq_u32): Likewise.
	(vmls_n_u16): Likewise.
	(vmls_n_u32): Likewise.
	(vmls_u8): Likewise.
	(vmls_u16): Likewise.
	(vmls_u32): Likewise.
	(vmlsq_n_u16): Likewise.
	(vmlsq_n_u32): Likewise.
	(vmlsq_u8): Likewise.
	(vmlsq_u16): Likewise.
	(vmlsq_u32): Likewise.
2021-11-11 15:34:49 +00:00
Raphael Moreira Zinsly
8d71d3a317 libgcc: Fix backtrace fallback on PowerPC Big-endian
At the end of the backtrace stream _Unwind_Find_FDE() may not be able
to find the frame unwind info and will later call the backtrace fallback
instead of finishing. This occurs when using an old libc on ppc64 due to
dl_iterate_phdr() not being able to set the fde in the last trace.
When this occurs the cfa of the trace will be behind the context's cfa.
Also, libgo's probestackmaps() calls the backtrace with a null pointer
and can get to the backchain fallback with the same problem; in this case
we are only interested in finding a stack map, so we neither need nor can
do a backchain.
_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
uw_frame_state_for(), so we need to handle _URC_NORMAL_STOP.

libgcc/ChangeLog:
	PR libgcc/103044
	* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's
	called with a null argument or at the end of the backtrace and return.
	* unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.
2021-11-11 15:29:25 +00:00
Jan Hubicka
8d3abf42d5 Fix some side cases of side effects discovery
I wrote a script comparing modref pure/const discovery with ipa-pure-const
and found mistakes on both ends.  This plugs the modref differences in
handling looping pure/consts, which were previously missed due to early
exits on ECF_CONST | ECF_PURE.  Those early exits are a bit annoying, and I
think as a cleanup I may just drop some of them as premature optimizations
dating from the time modref was very simplistic in what it propagates.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-modref.c (modref_summary::useful_p): Check also for side-effects
	with looping const/pure.
	(modref_summary_lto::useful_p): Likewise.
	(merge_call_side_effects): Merge side effects before early exit
	for pure/const.
	(process_fnspec): Also handle pure functions.
	(analyze_call): Do not early exit on looping pure const.
	(propagate_unknown_call): Also handle nontrivial SCC as side-effect.
	(modref_propagate_in_scc): Update.
2021-11-11 16:07:47 +01:00
Richard Biener
fac4c4bdab tree-optimization/103190 - fix assert in reassoc stmt placement with asm
This makes sure to only assert we don't run into an asm goto when
inserting a stmt in reassoc, matching the condition in
can_reassociate_p.  We can handle EH edges from an asm just like
EH edges from any other stmt.

2021-11-11  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/103190
	* tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.
2021-11-11 16:06:24 +01:00
Aldy Hernandez
bfa04d0ec9 Move import population from threader to path solver.
Imports are our nomenclature for external SSA names to a block that
are used to calculate the outgoing edges for said block.  For example,
in the following snippet:

    <bb 2> :
    _1 = b_10 == block_11;
    _2 = b_10 != -1;
    _3 = _1 & _2;
    if (_3 != 0)
      goto <bb 3>; [INV]
    else
      goto <bb 5>; [INV]

...the imports to the block are b_10 and block_11 since they are both
needed to calculate _3.

The path solver takes a bitmap of imports in addition to the path
itself.  This sets up the number of SSA names to be on the lookout
for, while resolving the final conditional.

Calculating these imports was initially done in the threader, since it
was the only user of the path solver.  With new clients, it has become
obvious that populating the imports should be a task for the path
solver, so it can be shared among the clients.

This patch moves the import code to the solver, making both the solver
and the threader simpler in the process, because the intent is clearer
and some duplicate code was removed.

This reshuffling had the net effect of giving us a handful of new
threads through my suite of .ii files (125).  This was unexpected, but
welcome nevertheless.  There is no performance difference in callgrind
over the same suite.

Regstrapped on x86-64 Linux.

gcc/ChangeLog:

	* gimple-range-path.cc (path_range_query::add_copies_to_imports):
	Rename to...
	(path_range_query::compute_imports): ...this.  Adapt it so it can
	be passed the imports bitmap instead of working on m_imports.
	(path_range_query::compute_ranges): Call compute_imports in all
	cases unless an imports bitmap is passed.
	* gimple-range-path.h (path_range_query::compute_imports): New.
	(path_range_query::add_copies_to_imports): Remove.
	* tree-ssa-threadbackward.c (back_threader::resolve_def): Remove.
	(back_threader::find_paths_to_names): Inline resolve_def.
	(back_threader::find_paths): Call compute_imports.
	(back_threader::resolve_phi): Adjust comment.
2021-11-11 15:42:00 +01:00
Sandra Loosemore
1ea781a865 Testsuite: Various fixes for nios2.
2021-11-11  Sandra Loosemore  <sandra@codesourcery.com>

	gcc/testsuite/
	* g++.dg/warn/Wmismatched-new-delete-5.C: Add
	-fdelete-null-pointer-checks.
	* gcc.dg/attr-returns-nonnull.c: Likewise.
	* gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2.
	* gcc.dg/ifcvt-4.c: Skip on nios2.
	* gcc.dg/struct-by-value-1.c: Add -G0 option for nios2.
2021-11-11 06:38:58 -08:00
Richard Biener
8865133614 tree-optimization/103188 - avoid running ranger on not-up-to-date SSA
The following splits loop header copying into an analysis phase
that uses ranger and a transform phase that can do without it, to avoid
running ranger on IL whose SSA form is not up to date.

2021-11-11  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/103188
	* tree-ssa-loop-ch.c (should_duplicate_loop_header_p):
	Remove query parameter, split out check for size
	optimization.
	(ch_base::m_ranger, ch_base::m_query): Remove.
	(ch_base::copy_headers): Split processing loop into
	analysis around which we allocate and use ranger and
	transform where we do not.
	(pass_ch::execute): Do not allocate/free ranger here.
	(pass_ch_vect::execute): Likewise.

	* gcc.dg/torture/pr103188.c: New testcase.
2021-11-11 15:01:26 +01:00
Jan Hubicka
6e30c48120 Fix recursion discovery in ipa-pure-const
We mark self-recursive functions as looping for fear of endless recursion.
This is done correctly for local pure/const and for non-trivial SCCs in
the callgraph, but for trivial SCCs we miss the flag.

I think it is a bad decision, since infinite recursion will run out of stack,
but changing it upsets some testcases and should be done independently.
So this patch fixes the current behaviour to be consistent.
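
A tiny (hypothetical) example of the case in question:

// f has no side effects and reads no global memory, but it may recurse
// forever, so it must be "looping" const rather than plain const - calls
// to it cannot be moved or removed across a possible hang.
int
f (int x)
{
  return x > 0 ? f (x - 1) : 0;
}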

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* ipa-pure-const.c (propagate_pure_const): Self recursion is
	a side effect.
2021-11-11 14:39:19 +01:00
Jan Hubicka
61396dfb2a Fix noreturn discovery.
Fix ipa-pure-const handling of the noreturn flag.  It is not safe to set it
for interposable symbols, and we should also set it for aliases (just like
we do for other flags).  This patch merely copies the other flag handling
and implements it here.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

	* cgraph.c (set_noreturn_flag_1): New function.
	(cgraph_node::set_noreturn_flag): New member function.
	* cgraph.h (cgraph_node::set_noreturn_flag): Declare.
	* ipa-pure-const.c (pass_local_pure_const::execute): Use it.
2021-11-11 14:35:10 +01:00
Patrick Palka
e106221db2 c++: use auto_vec in cp_parser_template_argument_list
gcc/cp/ChangeLog:

	* parser.c (cp_parser_template_argument_list): Use auto_vec
	instead of manual memory management.
2021-11-11 08:10:20 -05:00
Jakub Jelinek
fa4fcb111a libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values
When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our currently dumb way of implementing host teams.
There are two problems.  One is host teams used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard obviously says nothing about it.  Another, more important, thing
is host fallback: right now we don't do anything for omp_get_num_teams()
or omp_get_team_num(), which was fine before host teams was introduced and
the 5.1 requirement that the num_teams clause specifies a minimum number of
teams, but with the global vars it means that inside of target teams
num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams
is inside of host teams with num_teams(4).  With target fallback being
invoked from parallel regions, global vars simply can't work right on the
host.

So, this patch moves them to struct gomp_thread and propagates them for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0, which is fine for target teams without a num_teams
clause; for target teams with a num_teams clause it is something to work on,
and for target without teams nested in it I've asked on omp-lang what should
be done.
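
A sketch of the host-fallback scenario described above (illustrative only;
the spec constrains such nesting, but this is the bookkeeping problem in
question):

#include <omp.h>
#include <cstdio>

void
inner ()
{
  // May execute as host fallback; with the old global variables this
  // reported the enclosing host league's 4 teams instead of its own 2.
  #pragma omp target teams num_teams (2)
  if (omp_get_team_num () == 0)
    std::printf ("inner league: %d teams\n", omp_get_num_teams ());
}

int
main ()
{
  #pragma omp teams num_teams (4)	// host teams
  inner ();
}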

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
	* team.c (struct gomp_thread_start_data): Likewise.
	(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
	(gomp_team_start): Initialize start_data->num_teams and
	start_data->team_num.  Update nthr->num_teams and nthr->team_num.
	* teams.c (gomp_num_teams, gomp_team_num): Remove.
	(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
	instead of gomp_num_teams and gomp_team_num.
	(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
	(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
	* testsuite/libgomp.c/teams-4.c: New test.
2021-11-11 13:57:31 +01:00
Aldy Hernandez
3e5a190533 Resolve entry loop condition for the edge remaining in the loop.
There is a known failure for gfortran.dg/vector_subscript_1.f90.  It
was previously failing for all optimization levels except -Os.
Now that we get the loop header copying right, it fails for all
levels :-).

Tested on x86-64 Linux.

Co-authored-by: Richard Biener <rguenther@suse.de>

gcc/ChangeLog:

	* tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve
	statically to the edge remaining in the loop.
2021-11-11 13:17:32 +01:00
Richard Biener
a5fed4063f middle-end/103181 - fix operation_could_trap_p for vector division
For integer vector division we only checked for all zero vector
constants rather than checking whether any element in the constant
vector is zero.
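
For example (a sketch using the GNU vector extension): the divisor below is
not the all-zeros constant the old check looked for, yet its second element
still makes the division able to trap:

typedef int v4si __attribute__ ((vector_size (16)));

v4si
f (v4si a)
{
  // Element 1 of the divisor is zero, so this division may trap even
  // though the divisor as a whole is not the zero vector.
  return a / (v4si) { 1, 0, 3, 4 };
}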

2021-11-11  Richard Biener  <rguenther@suse.de>

	PR middle-end/103181
	* tree-eh.c (operation_could_trap_helper_p): Properly
	check vector constants for a zero element for integer
	division.  Separate floating point and integer division code.

	* gcc.dg/torture/pr103181.c: New testcase.
2021-11-11 10:32:51 +01:00
Jakub Jelinek
10db757301 dwarf2out: Fix up field_byte_offset [PR101378]
For PCC_BITFIELD_TYPE_MATTERS, field_byte_offset has had quite large code
to deal with it for many years (see it e.g. in GCC 3.2; it used to be on
HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently isn't able to cope with members of empty class
types with the [[no_unique_address]] attribute, because the empty classes
have non-zero type size but zero decl size, and so the computation can end
up with a negative offset or an offset 1 byte smaller than it should be.
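
A shape that triggers the problem might look like this (a hypothetical
reduction, not the PR's testcase):

struct E { };			// empty class: non-zero type size, zero decl size

struct S
{
  [[no_unique_address]] E e;	// may share its address with another member
  int b : 8;			// bitfield: the PCC_BITFIELD_TYPE_MATTERS path
};
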
For !PCC_BITFIELD_TYPE_MATTERS, we just use
    tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.

stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.

As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes
multiple of BITS_PER_UNIT and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;

int
main ()
{
  s.c = 0x55;
  s.d = 0xaaaa;
  t.c = 0x55;
  t.d = 0xaaaa;
  s.e++;
}
has different debug info with DECL_BIT_FIELD check.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

	PR debug/101378
	* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
	handling only for DECL_BIT_FIELD_TYPE decls.

	* g++.dg/debug/dwarf2/pr101378.C: New test.
2021-11-11 10:16:45 +01:00
Prathamesh Kulkarni
145be5efaf [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr.
gcc/ChangeLog:
	PR target/102376
	* config/aarch64/aarch64.c (aarch64_process_target_attr): Check if
	token is arch extension without leading '+' and emit appropriate
	diagnostic for the same.

gcc/testsuite/ChangeLog:
	PR target/102376
	* gcc.target/aarch64/pr102376.c: New test.
2021-11-11 14:40:21 +05:30
Jakub Jelinek
48d7327f2a openmp: Add support for 2 argument num_teams clause
In OpenMP 5.1, the num_teams clause can accept either one expression as
before, but in that case the meaning has changed: rather than creating
<= expression teams, it now creates == expression teams.  Or it accepts two
expressions separated by :, meaning the first is a lower bound and the
second an upper bound on how many teams should be created.  The other ways
to set the number of teams are upper bounds with a lower bound of 1.

The following patch does parsing of this for C/C++.  For host teams, we
actually don't need to do anything further right now, we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that though and further
work will be needed.
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");

void
foo (void)
{
  S a, b;
  #pragma omp target private (a) map (b)
  {
    #pragma omp teams firstprivate (b) num_teams (baz ())
    {
      bar (a, b);
    }
  }
}
into:
  retval.0 = baz ();
  retval.1 = retval.0;
  {
    unsigned int retval.3;
    struct S * D.2549;
    struct S b;

    retval.3 = (unsigned int) retval.1;
    D.2549 = .omp_data_i->b;
    S::S (&b, D.2549);
    #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
    __builtin_GOMP_teams (retval.3, 0);
    {
      bar (&a, &b);
    }
    S::~S (&b);
    #pragma omp return(nowait)
  }
IMHO we want a new API, say GOMP_teams3, which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams, and thread_limit)
and will return a bool saying whether it should do the teams body or not.
And we should add, right before the outermost {} above,
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call.  The current function performs the
equivalent of exit (at least on NVPTX), which seems bad because it means the
destructors of e.g. private variables on target aren't invoked, and at the
current placement neither are the destructors of the already constructed
privatized variables in teams.
I'll do this next on the compiler side, but I'm afraid I'll need help
with the nvptx and amdgcn implementations.  E.g. for nvptx, we won't be
able to use %ctaid.x.  I think it would be ideal to use a .shared
integer variable for the omp_get_team_num value, but I don't have any
experience with that: are .shared variables zero-initialized by default,
or do they have a random value at start?  PTX docs say they aren't
initializable.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
	(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
	(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
	* tree.c (omp_clause_num_ops): Increase num ops for
	OMP_CLAUSE_NUM_TEAMS to 2.
	* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
	for OMP_CLAUSE_NUM_TEAMS.
	* gimplify.c (gimplify_scan_omp_clauses): Gimplify
	OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
	(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
	of OMP_CLAUSE_NUM_TEAMS_EXPR.  Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
	instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
	* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
	* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
	lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
	OMP_CLAUSE_NUM_TEAMS_EXPR.
	(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
gcc/cp/
	* parser.c (cp_parser_omp_clause_num_teams): Parse optional
	lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
	OMP_CLAUSE_NUM_TEAMS_EXPR.
	(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
	* semantics.c (finish_omp_clauses): Handle
	OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
	* pt.c (tsubst_omp_clauses): Likewise.
	(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_clauses): Use
	OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
	* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
	to half of the num_teams clauses.
	* c-c++-common/gomp/num-teams-1.c: New test.
	* c-c++-common/gomp/num-teams-2.c: New test.
	* g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression
	to half of the num_teams clauses.
	* g++.dg/gomp/attrs-2.C (bar): Likewise.
	* g++.dg/gomp/num-teams-1.C: New test.
	* g++.dg/gomp/num-teams-2.C: New test.
libgomp/
	* testsuite/libgomp.c-c++-common/teams-1.c: New test.
2021-11-11 09:42:47 +01:00
Richard Biener
0136f25ac0 Remove find_pdom and find_dom
This removes now useless wrappers around get_immediate_dominator.

2021-11-11  Richard Biener  <rguenther@suse.de>

	* cfganal.c (find_pdom): Remove.
	(control_dependences::find_control_dependence): Remove
	special-casing of entry block, call get_immediate_dominator
	directly.
	* gimple-predicate-analysis.cc (find_pdom): Remove.
	(find_dom): Likewise.
	(find_control_equiv_block): Call get_immediate_dominator
	directly.
	(compute_control_dep_chain): Likewise.
	(predicate::init_from_phi_def): Likewise.
2021-11-11 09:20:15 +01:00
Richard Biener
a11afa7af8 Apply TLC to control dependence compute
This makes the control dependence compute avoid a find_edge call and
optimizes allocation by embedding the bitmap heads into the vector of
control dependences instead of allocating each of them separately.
It also uses a local bitmap obstack.

The bitmap changes make it necessary to shuffle some includes.

2021-11-10  Richard Biener  <rguenther@suse.de>

	* cfganal.h (control_dependences::control_dependence_map):
	Embed bitmap_head.
	(control_dependences::m_bitmaps): New.
	* cfganal.c (control_dependences::set_control_dependence_map_bit):
	Adjust.
	(control_dependences::clear_control_dependence_bitmap):
	Likewise.
	(control_dependences::find_control_dependence): Do not
	find_edge for the abnormal edge test.
	(control_dependences::control_dependences): Instead do not
	add abnormal edges to the edge list.  Adjust.
	(control_dependences::~control_dependences): Likewise.
	(control_dependences::get_edges_dependent_on): Likewise.
	* function-tests.c: Include bitmap.h.

gcc/analyzer/
	* supergraph.cc: Include bitmap.h.

gcc/c/
	* gimple-parser.c: Shuffle bitmap.h include.
2021-11-11 09:19:49 +01:00