OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Jonathan Wakely	a54ce8865a	libstdc++: Print assertion messages to stderr [PR59675] This replaces the printf used by failed debug assertions with fprintf, so we can write to stderr. To avoid including <stdio.h> the assert function is moved into the library. To avoid programs using a vague linkage definition of the old inline function, the function is renamed. Code compiled with old versions of GCC might still call the old function, but code compiled with the newer GCC will call the new function and write to stderr. libstdc++-v3/ChangeLog: PR libstdc++/59675 * acinclude.m4 (libtool_VERSION): Bump version. * config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Add version and export new symbol. * configure: Regenerate. * include/bits/c++config (__replacement_assert): Remove, declare __glibcxx_assert_fail instead. * src/c++11/debug.cc (__glibcxx_assert_fail): New function to replace __replacement_assert, writing to stderr instead of stdout. * testsuite/util/testsuite_abi.cc: Update latest version.	2021-11-12 12:23:10 +00:00
Mikael Morin	68d62cb206	fortran: Ignore unused args in scalarization [PR97896] The KIND argument of the INDEX intrinsic is a compile time constant that is used at compile time only to resolve to a kind-specific library function. That argument is otherwise completely ignored at runtime, and there is no code generated for it as the library procedure has no kind argument. This confuses the scalarizer which expects to see every argument of elemental functions used when calling a procedure. This change removes the argument from the scalarization lists at the beginning of the scalarization process, so that the argument is completely ignored. This also reverts the existing workaround (commit `d09847357b` except for its testcase). PR fortran/97896 gcc/fortran/ChangeLog: * intrinsic.c (add_sym_4ind): Remove. (add_functions): Use add_sym4 instead of add_sym4ind. Don’t special case the index intrinsic. * iresolve.c (gfc_resolve_index_func): Use the individual arguments directly instead of the full argument list. * intrinsic.h (gfc_resolve_index_func): Update the declaration accordingly. * trans-decl.c (gfc_get_extern_function_decl): Don’t modify the list of arguments in the case of the index intrinsic. * trans-array.h (gfc_get_intrinsic_for_expr, gfc_get_proc_ifc_for_expr): New. * trans-array.c (gfc_get_intrinsic_for_expr, arg_evaluated_for_scalarization): New. (gfc_walk_elemental_function_args): Add intrinsic procedure as argument. Count arguments. Check arg_evaluated_for_scalarization. * trans-intrinsic.c (gfc_walk_intrinsic_function): Update call. * trans-stmt.c (get_intrinsic_for_code): New. (gfc_trans_call): Update call. gcc/testsuite/ChangeLog: * gfortran.dg/index_5.f90: New.	2021-11-12 13:10:55 +01:00
Jakub Jelinek	7d6da11fce	openmp: Honor OpenMP 5.1 num_teams lower bound The following patch implements what I've been talking about earlier, honor that for explicit num_teams clause we create at least the lower-bound (if not specified, upper-bound) teams in the league. For host fallback, it still means we only have one thread doing all the teams, sequentially one after another. For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too will or might fail. For these offloads, I think it is ok to remove symbols no longer used from libgomp.a. If num_teams_lower is bigger than the provided num_blocks or num_workgroups, we should arrange for gomp_num_teams_var to be num_teams_lower - 1, stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num () and instead use for it some .shared var that GOMP_teams4 initializes to %ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first increment that by num_blocks or num_workgroups each time and only return false when we are above num_teams_lower. Any help with actually implementing this for the 2 architectures highly appreciated. 2021-11-12 Jakub Jelinek <jakub@redhat.com> gcc/ * omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove. (BUILT_IN_GOMP_TEAMS4): New. * builtin-types.def (BT_FN_VOID_UINT_UINT): Remove. (BT_FN_BOOL_UINT_UINT_UINT_BOOL): New. * omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of GOMP_teams, pass to it also num_teams lower-bound expression or a dup of upper-bound if it is missing and a flag whether it is the first call or not. gcc/fortran/ * types.def (BT_FN_VOID_UINT_UINT): Remove. (BT_FN_BOOL_UINT_UINT_UINT_BOOL): New. libgomp/ * libgomp_g.h (GOMP_teams4): Declare. * libgomp.map (GOMP_5.1): Export GOMP_teams4. * target.c (GOMP_teams4): New function. * config/nvptx/target.c (GOMP_teams): Remove. (GOMP_teams4): New function. * config/gcn/target.c (GOMP_teams): Remove. (GOMP_teams4): New function. * testsuite/libgomp.c/teams-4.c (main): Expect exactly 2 teams instead of <= 2. * testsuite/libgomp.c-c++-common/teams-2.c: New test.	2021-11-12 12:41:22 +01:00
Martin Liska	5f516a6a5d	Remove unused function. PR tree-optimization/102497 gcc/ChangeLog: * gimple-predicate-analysis.cc (add_pred): Remove unused function.	2021-11-12 12:40:02 +01:00
Richard Biener	140346fa24	tree-optimization/103204 - fix missed valueization in VN The following fixes a missed valueization when simplifying a MEM[&...] combination during valueization. 2021-11-12 Richard Biener <rguenther@suse.de> PR tree-optimization/103204 * tree-ssa-sccvn.c (valueize_refs_1): Re-valueize the top operand after folding in an address. * gcc.dg/torture/pr103204.c: New testcase.	2021-11-12 09:11:49 +01:00
Alan Modra	c60ded6f5e	Make opcodes configure depend on bfd configure The idea is for opcodes to be able to see whether bfd is compiled for 64-bit. A lot of --enable-targets=all libopcodes is wasted space if bfd can't load 64-bit target object files. * Makefile.def (configure-opcodes): Depend on configure-bfd. * Makefile.in: Regenerate.	2021-11-12 18:34:12 +10:30
Jonathan Wakely	1ae8edf5f7	libstdc++: Implement constexpr std::vector for C++20 This implements P1004R2 ("Making std::vector constexpr") for C++20. For now, debug mode vectors are not supported in constant expressions. To make that work we might need to disable all attaching/detaching of safe iterators. That can be fixed later. Co-authored-by: Josh Marshall <joshua.r.marshall.1991@gmail.com> libstdc++-v3/ChangeLog: * include/bits/alloc_traits.h (_Destroy): Make constexpr for C++20 mode. * include/bits/allocator.h (__shrink_to_fit::_S_do_it): Likewise. * include/bits/stl_algobase.h (__fill_a1): Declare _Bit_iterator overload constexpr for C++20. * include/bits/stl_bvector.h (_Bit_type, _S_word_bit): Move out of inline namespace. (_Bit_reference, _Bit_iterator_base, _Bit_iterator) (_Bit_const_iterator, _Bvector_impl_data, _Bvector_base) (vector<bool, A>>): Add constexpr to every member function. (_Bvector_base::_M_allocate): Initialize storage during constant evaluation. (vector<bool, A>::_M_initialize_value): Use __fill_bvector_n instead of memset. (__fill_bvector_n): New helper function to replace memset during constant evaluation. * include/bits/stl_uninitialized.h (__uninitialized_copy<false>): Move logic to ... (__do_uninit_copy): New function. (__uninitialized_fill<false>): Move logic to ... (__do_uninit_fill): New function. (__uninitialized_fill_n<false>): Move logic to ... (__do_uninit_fill_n): New function. (__uninitialized_copy_a): Add constexpr. Use __do_uninit_copy. (__uninitialized_move_a, __uninitialized_move_if_noexcept_a): Add constexpr. (__uninitialized_fill_a): Add constexpr. Use __do_uninit_fill. (__uninitialized_fill_n_a): Add constexpr. Use __do_uninit_fill_n. (__uninitialized_default_n, __uninitialized_default_n_a) (__relocate_a_1, __relocate_a): Add constexpr. * include/bits/stl_vector.h (_Vector_impl_data, _Vector_impl) (_Vector_base, vector): Add constexpr to every member function. 
(_Vector_impl::_S_adjust): Disable ASan annotation during constant evaluation. (_Vector_base::_S_use_relocate): Disable bitwise-relocation during constant evaluation. (vector::_Temporary_value): Use a union for storage. * include/bits/vector.tcc (vector, vector<bool>): Add constexpr to every member function. * include/std/vector (erase_if, erase): Add constexpr. * testsuite/23_containers/headers/vector/synopsis.cc: Add constexpr for C++20 mode. * testsuite/23_containers/vector/bool/cmp_c++20.cc: Change to compile-only test using constant expressions. * testsuite/23_containers/vector/bool/capacity/29134.cc: Adjust namespace for _S_word_bit. * testsuite/23_containers/vector/bool/modifiers/insert/31370.cc: Likewise. * testsuite/23_containers/vector/cmp_c++20.cc: Likewise. * testsuite/23_containers/vector/cons/89164.cc: Adjust errors for C++20 and move C++17 test to ... * testsuite/23_containers/vector/cons/89164_c++17.cc: ... here. * testsuite/23_containers/vector/bool/capacity/constexpr.cc: New test. * testsuite/23_containers/vector/bool/cons/constexpr.cc: New test. * testsuite/23_containers/vector/bool/element_access/constexpr.cc: New test. * testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc: New test. * testsuite/23_containers/vector/bool/modifiers/constexpr.cc: New test. * testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc: New test. * testsuite/23_containers/vector/capacity/constexpr.cc: New test. * testsuite/23_containers/vector/cons/constexpr.cc: New test. * testsuite/23_containers/vector/data_access/constexpr.cc: New test. * testsuite/23_containers/vector/element_access/constexpr.cc: New test. * testsuite/23_containers/vector/modifiers/assign/constexpr.cc: New test. * testsuite/23_containers/vector/modifiers/constexpr.cc: New test. * testsuite/23_containers/vector/modifiers/swap/constexpr.cc: New test.	2021-11-12 00:42:39 +00:00
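As a rough illustration of what the constexpr std::vector commit above enables, the sketch below builds and sums a vector entirely at compile time. The names `sum_of_squares`, `check_sum_of_squares` and `SKETCH_CONSTEXPR` are hypothetical and not part of libstdc++. The feature-test guard is an assumption made so the snippet still compiles as ordinary run-time code where P1004R2 is unavailable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper macro: constexpr only where the library
// advertises constexpr std::vector (C++20, P1004R2).
#if defined(__cpp_lib_constexpr_vector) && __cpp_lib_constexpr_vector >= 201907L
#  define SKETCH_CONSTEXPR constexpr
#else
#  define SKETCH_CONSTEXPR inline
#endif

// Sum the first n squares using a std::vector as scratch storage.
SKETCH_CONSTEXPR int sum_of_squares(int n)
{
    std::vector<int> v;
    for (int i = 1; i <= n; ++i)
        v.push_back(i * i);
    int total = 0;
    for (int x : v)
        total += x;
    return total;  // v is destroyed before the evaluation ends
}

#if defined(__cpp_lib_constexpr_vector) && __cpp_lib_constexpr_vector >= 201907L
// Evaluated by the compiler; the vector never exists at run time.
static_assert(sum_of_squares(4) == 30);
#endif

// Run-time check that works in either mode.
inline void check_sum_of_squares()
{
    assert(sum_of_squares(4) == 30);
}
```

The vector must be destroyed within the same constant evaluation, since memory allocated during constant evaluation cannot escape to run time.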
GCC Administrator	b39265d4fe	Daily bump.	2021-11-12 00:16:32 +00:00
Jonathan Wakely	4a407d358e	libstdc++: Fix debug containers for C++98 mode Since r12-5072 made _Safe_container::operator=(const _Safe_container&) protected, the debug containers no longer compile in C++98 mode. They have user-provided copy assignment operators in C++98 mode, and they assign each base class in turn. The 'this->_M_safe() = __x' expressions fail, because calling a protected member function is only allowed via 'this'. They could be fixed by using this->_Safe::operator=(__x) but a simpler solution is to just remove the user-provided assignment operators and let the compiler define them (as we do for C++11 and later, by defining them as defaulted). The only change needed for that to work is to define the _Safe_vector copy assignment operator in C++98 mode, so that the implicit __gnu_debug::vector::operator= definition will call it, instead of needing to call _M_update_guaranteed_capacity() manually. libstdc++-v3/ChangeLog: * include/debug/deque (deque::operator=(const deque&)): Remove definition. * include/debug/list (list::operator=(const list&)): Likewise. * include/debug/map.h (map::operator=(const map&)): Likewise. * include/debug/multimap.h (multimap::operator=(const multimap&)): Likewise. * include/debug/multiset.h (multiset::operator=(const multiset&)): Likewise. * include/debug/set.h (set::operator=(const set&)): Likewise. * include/debug/string (basic_string::operator=(const basic_string&)): Likewise. * include/debug/vector (vector::operator=(const vector&)): Likewise. (_Safe_vector::operator=(const _Safe_vector&)): Define for C++98 as well.	2021-11-11 21:55:11 +00:00
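The access rule behind the C++98 debug-container fix above can be sketched with stand-in classes. `SafeBase`, `Container`, `Defaulted` and `check_assignment` are hypothetical names for illustration only, not the real `_Safe_container` hierarchy.

```cpp
#include <cassert>

// Hypothetical stand-in for a base whose copy assignment is protected.
class SafeBase {
public:
    int assigned = 0;  // counts how often operator= ran on this object
protected:
    SafeBase& operator=(const SafeBase&) { ++assigned; return *this; }
};

// Option 1: a user-provided operator-like function that assigns the base.
class Container : public SafeBase {
public:
    Container& assign_explicit(const Container& x)
    {
        // SafeBase& self = *this;
        // self = x;   // error: protected member reached via a SafeBase&
        this->SafeBase::operator=(x);  // OK: reached through 'this'
        return *this;
    }
};

// Option 2 (the simpler route the commit describes): declare nothing and
// let the compiler generate operator=, which assigns each base in turn.
class Defaulted : public SafeBase { };

inline void check_assignment()
{
    Container a, b;
    a.assign_explicit(b);
    assert(a.assigned == 1);

    Defaulted c, d;
    c = d;  // implicit operator= calls SafeBase::operator=
    assert(c.assigned == 1);
}
```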
Aldy Hernandez	53b3edceab	Make ranger optional in path_range_query. All users of path_range_query are currently allocating a gimple_ranger only to pass it to the query object. It's tidier to just do it from path_range_query if no ranger was passed. Tested on x86-64 Linux. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::path_range_query): New ctor without a ranger. (path_range_query::~path_range_query): Free ranger if necessary. (path_range_query::range_on_path_entry): Adjust m_ranger for pointer. (path_range_query::ssa_range_in_phi): Same. (path_range_query::compute_ranges_in_block): Same. (path_range_query::compute_imports): Same. (path_range_query::compute_ranges): Same. (path_range_query::range_of_stmt): Same. (path_range_query::compute_outgoing_relations): Same. * gimple-range-path.h (class path_range_query): New ctor. * tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger as path_range_query allocates one. * tree-ssa-threadbackward.c (class back_threader): Remove m_ranger. (back_threader::~back_threader): Same.	2021-11-11 22:13:17 +01:00
Aldy Hernandez	a7753db4a7	Remove loop crossing restriction from the backward threader. We have much more thorough restrictions in the registry, which are shared between both threader implementations. I've been meaning to remove the backward threader one, since its only purpose was reducing the search space. Previously there was a small time penalty for its removal, but with the various patches in the past month, it looks like the removal is a wash performance-wise. This catches 8 more jump threads in the backward threader in my suite, presumably because we disallowed all loop crossing, whereas the registry restrictions allow some crossing (if we exit the loop, etc). Tested on x86-64 Linux. gcc/ChangeLog: * tree-ssa-threadbackward.c (back_threader_profitability::profitable_path_p): Remove loop crossing restriction.	2021-11-11 22:13:17 +01:00
Bill Schmidt	8a8458ac6b	rs6000: Fix test_mffsl.c to require Power9 support 2021-11-11 Bill Schmidt <wschmidt@linux.ibm.com> gcc/testsuite/ * gcc.target/powerpc/test_mffsl.c: Require Power9.	2021-11-11 14:36:04 -06:00
Ian Lance Taylor	7846156274	compiler: traverse func subexprs when creating func descriptors Fix the Create_func_descriptors pass to traverse the subexpressions of the function in a Call_expression. There are no subexpressions in the normal case of calling a function or a method directly, but there are subexpressions in code like F().M() when F returns an interface type. Forgetting to traverse the function subexpressions was almost entirely hidden by the fact that we also created the necessary thunks in Bound_method_expression::do_flatten and Interface_field_reference_expression::do_get_backend. However, when the thunks were created there, they did not go through the order_evaluations pass. This almost always worked, but failed in the case in which the function being thunked returned multiple results, as order_evaluations takes the necessary step of moving the Call_expression into its own statement, and that would not happen when order_evaluations was not called. Avoid hiding errors like this by changing those methods to only look up the previously created thunk, rather than creating it if it was not already created. The test case for this is https://golang.org/cl/363156. Fixes https://golang.org/issue/49512 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274	2021-11-11 12:21:56 -08:00
Jonathan Wakely	083fd73202	libstdc++: Make pmr::memory_resource::allocate implicitly create objects Calling the placement version of ::operator new "implicitly creates objects in the returned region of storage" as per [intro.object]. This allows the returned memory to be used as storage for implicit-lifetime types (including arrays) without additional action by the caller. This is required by the proposed resolution of LWG 3147. libstdc++-v3/ChangeLog: * include/std/memory_resource (memory_resource::allocate): Implicitly create objects in the returned storage.	2021-11-11 18:16:17 +00:00
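A minimal sketch of what the implicit-object-creation guarantee above means for callers: storage returned by `allocate` can be used as an array of an implicit-lifetime type such as `int` without a placement-new step. The function name `sum_in_pmr_storage` is hypothetical; only standard `std::pmr` calls are used.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Allocate room for n ints from the given resource, fill it, and sum it.
int sum_in_pmr_storage(std::pmr::memory_resource* mr, std::size_t n)
{
    assert(mr != nullptr);
    void* p = mr->allocate(n * sizeof(int), alignof(int));

    // int[n] is an implicit-lifetime type, so the returned storage can
    // be treated as an array of int without constructing each element.
    int* a = static_cast<int*>(p);

    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i) + 1;
        total += a[i];
    }

    // Size and alignment must match the values passed to allocate.
    mr->deallocate(p, n * sizeof(int), alignof(int));
    return total;
}
```

For example, calling it with `std::pmr::new_delete_resource()` and `n = 4` sums 1 through 4.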
Jonathan Wakely	ef0e100f58	libstdc++: Remove public std::vector<bool>::data() member This function only exists to avoid an error in the debug mode vector, so doesn't need to be public. libstdc++-v3/ChangeLog: * include/bits/stl_bvector.h (vector<bool>::data()): Give protected access, and delete for C++11 and later.	2021-11-11 18:16:17 +00:00
Jan Hubicka	dc002e31fb	Fix gfortran.dg/inline_matmul_17.f90 template. As discussed on the mailing list, the template actually tests for a missed optimization where we fail to propagate the size of an array. We no longer miss this after the modref improvements. gcc/testsuite/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * gfortran.dg/inline_matmul_17.f90: Fix template.	2021-11-11 18:51:35 +01:00
Jan Hubicka	494bdadf28	Enable pure-const discovery in modref. We can now handle some extra cases, for example: struct a {int a,b,c;}; __attribute__ ((noinline)) int init (struct a *a) { a->a=1; a->b=2; a->c=3; } int const_fn () { struct a a; init (&a); return a.a + a.b + a.c; } Here pure/const stops on the fact that const_fn calls non-const init, while modref knows that the memory it initializes is local to const_fn. I ended up reordering passes so early modref is done after early pure-const, mostly to avoid the need to change the testsuite, which greps for const functions being detected in pure-const. Still, some testsuite compensation is needed. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * ipa-modref.c (analyze_function): Do pure/const discovery, return true on success. (pass_modref::execute): If pure/const is discovered fixup cfg. (ignore_edge): Do not ignore pure/const edges. (modref_propagate_in_scc): Do pure/const discovery, return true if cdtor was promoted pure/const. (pass_ipa_modref::execute): If needed remove unreachable functions. * ipa-pure-const.c (warn_function_noreturn): Fix whitespace. (warn_function_cold): Likewise. (skip_function_for_local_pure_const): Move earlier. (ipa_make_function_const): Break out from ... (ipa_make_function_pure): Break out from ... (propagate_pure_const): ... here. (pass_local_pure_const::execute): Use it. * ipa-utils.h (ipa_make_function_const): Declare. (ipa_make_function_pure): Declare. * passes.def: Move early modref after pure-const. gcc/testsuite/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * c-c++-common/tm/inline-asm.c: Disable pure-const. * g++.dg/ipa/modref-1.C: Update template. * gcc.dg/tree-ssa/modref-11.c: Disable pure-const. * gcc.dg/tree-ssa/modref-14.c: New test. * gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls. * gfortran.dg/do_subscript_3.f90: Add -O0.	2021-11-11 18:14:45 +01:00
David Malcolm	abdff441a0	diagnostic: fix unused variable 'def_tabstop' [PR103129] gcc/ChangeLog: PR other/103129 * diagnostic-show-locus.c (def_policy): Use def_tabstop. Signed-off-by: David Malcolm <dmalcolm@redhat.com>	2021-11-11 12:12:53 -05:00
Tobias Burnus	407eaad25f	Fortran/openmp: Add support for 2 argument num_teams clause Fortran part to commit r12-5146-g48d7327f2aaf65 gcc/fortran/ChangeLog: * gfortran.h (struct gfc_omp_clauses): Rename num_teams to num_teams_upper, add num_teams_lower. * dump-parse-tree.c (show_omp_clauses): Update to handle lower-bound num_teams clause. * frontend-passes.c (gfc_code_walker): Likewise. * openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses, resolve_omp_clauses): Likewise. * trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses, gfc_trans_omp_target): Likewise. libgomp/ChangeLog: * testsuite/libgomp.fortran/teams-1.f90: New test.	2021-11-11 17:27:00 +01:00
Jonathan Wright	e1b218d174	aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics Declare unsigned and polynomial type-qualified builtins for vcombine_* Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-10 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete. (TYPES_COMBINEP): Delete. * config/aarch64/aarch64-simd-builtins.def: Declare type-qualified builtins for vcombine_* intrinsics. * config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary cast. (vcombine_s16): Likewise. (vcombine_s32): Likewise. (vcombine_f32): Likewise. (vcombine_u8): Use type-qualified builtin and remove casts. (vcombine_u16): Likewise. (vcombine_u32): Likewise. (vcombine_u64): Likewise. (vcombine_p8): Likewise. (vcombine_p16): Likewise. (vcombine_p64): Likewise. (vcombine_bf16): Remove unnecessary cast. * config/aarch64/iterators.md (VD_I): New mode iterator. (VDC_P): New mode iterator.	2021-11-11 15:34:52 +00:00
Jonathan Wright	1716ddd1e9	aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics Declare unsigned and polynomial type-qualified builtins for LD1/ST1 Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. The new type-qualified builtins are also lowered to gimple - as the unqualified builtins are already. gcc/ChangeLog: 2021-11-10 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define. (TYPES_LOAD1_P): Define. (TYPES_STORE1_U): Define. (TYPES_STORE1P): Rename to... (TYPES_STORE1_P): This. (get_mem_type_for_load_store): Add unsigned and poly types. (aarch64_general_gimple_fold_builtin): Add unsigned and poly type-qualified builtin declarations. * config/aarch64/aarch64-simd-builtins.def: Declare type-qualified builtins for LD1/ST1. * config/aarch64/arm_neon.h (vld1_p8): Use type-qualified builtin and remove cast. (vld1_p16): Likewise. (vld1_u8): Likewise. (vld1_u16): Likewise. (vld1_u32): Likewise. (vld1q_p8): Likewise. (vld1q_p16): Likewise. (vld1q_p64): Likewise. (vld1q_u8): Likewise. (vld1q_u16): Likewise. (vld1q_u32): Likewise. (vld1q_u64): Likewise. (vst1_p8): Likewise. (vst1_p16): Likewise. (vst1_u8): Likewise. (vst1_u16): Likewise. (vst1_u32): Likewise. (vst1q_p8): Likewise. (vst1q_p16): Likewise. (vst1q_p64): Likewise. (vst1q_u8): Likewise. (vst1q_u16): Likewise. (vst1q_u32): Likewise. (vst1q_u64): Likewise. * config/aarch64/iterators.md (VALLP_NO_DI): New iterator.	2021-11-11 15:34:51 +00:00
Jonathan Wright	6eca10aa76	aarch64: Use type-qualified builtins for ADDV Neon intrinsics Declare unsigned type-qualified builtins and use them to implement the vector reduction Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for vector reduction. * config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified builtin and remove casts. (vaddv_u16): Likewise. (vaddv_u32): Likewise. (vaddvq_u8): Likewise. (vaddvq_u16): Likewise. (vaddvq_u32): Likewise. (vaddvq_u64): Likewise.	2021-11-11 15:34:51 +00:00
Jonathan Wright	f341c03203	aarch64: Use type-qualified builtins for ADDP Neon intrinsics Declare unsigned type-qualified builtins and use them to implement the pairwise addition Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: * config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified builtin and remove casts. (vpaddq_u16): Likewise. (vpaddq_u32): Likewise. (vpaddq_u64): Likewise. (vpadd_u8): Likewise. (vpadd_u16): Likewise. (vpadd_u32): Likewise. (vpaddd_u64): Likewise.	2021-11-11 15:34:51 +00:00
Jonathan Wright	80ee260d5b	aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics Declare unsigned type-qualified builtins and use them to implement (rounding) halving-narrowing-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for [r]subhn[2]. * config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary cast. (vsubhn_s32): Likewise. (vsubhn_s64): Likewise. (vsubhn_u16): Use type-qualified builtin and remove casts. (vsubhn_u32): Likewise. (vsubhn_u64): Likewise. (vrsubhn_s16): Remove unnecessary cast. (vrsubhn_s32): Likewise. (vrsubhn_s64): Likewise. (vrsubhn_u16): Use type-qualified builtin and remove casts. (vrsubhn_u32): Likewise. (vrsubhn_u64): Likewise. (vrsubhn_high_s16): Remove unnecessary cast. (vrsubhn_high_s32): Likewise. (vrsubhn_high_s64): Likewise. (vrsubhn_high_u16): Use type-qualified builtin and remove casts. (vrsubhn_high_u32): Likewise. (vrsubhn_high_u64): Likewise. (vsubhn_high_s16): Remove unnecessary cast. (vsubhn_high_s32): Likewise. (vsubhn_high_s64): Likewise. (vsubhn_high_u16): Use type-qualified builtin and remove casts. (vsubhn_high_u32): Likewise. (vsubhn_high_u64): Likewise.	2021-11-11 15:34:51 +00:00
Jonathan Wright	7bde2a6ecd	aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics Declare unsigned type-qualified builtins and use them to implement (rounding) halving-narrowing-add Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for [r]addhn[2]. * config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary cast. (vaddhn_s32): Likewise. (vaddhn_s64): Likewise. (vaddhn_u16): Use type-qualified builtin and remove casts. (vaddhn_u32): Likewise. (vaddhn_u64): Likewise. (vraddhn_s16): Remove unnecessary cast. (vraddhn_s32): Likewise. (vraddhn_s64): Likewise. (vraddhn_u16): Use type-qualified builtin and remove casts. (vraddhn_u32): Likewise. (vraddhn_u64): Likewise. (vaddhn_high_s16): Remove unnecessary cast. (vaddhn_high_s32): Likewise. (vaddhn_high_s64): Likewise. (vaddhn_high_u16): Use type-qualified builtin and remove casts. (vaddhn_high_u32): Likewise. (vaddhn_high_u64): Likewise. (vraddhn_high_s16): Remove unnecessary cast. (vraddhn_high_s32): Likewise. (vraddhn_high_s64): Likewise. (vraddhn_high_u16): Use type-qualified builtin and remove casts. (vraddhn_high_u32): Likewise. (vraddhn_high_u64): Likewise.	2021-11-11 15:34:51 +00:00
Jonathan Wright	aa11d95bea	aarch64: Use type-qualified builtins for UHSUB Neon intrinsics Declare unsigned type-qualified builtins and use them to implement halving-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for uhsub builtins. * config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary cast. (vhsub_s16): Likewise. (vhsub_s32): Likewise. (vhsub_u8): Use type-qualified builtin and remove casts. (vhsub_u16): Likewise. (vhsub_u32): Likewise. (vhsubq_s8): Remove unnecessary cast. (vhsubq_s16): Likewise. (vhsubq_s32): Likewise. (vhsubq_u8): Use type-qualified builtin and remove casts. (vhsubq_u16): Likewise. (vhsubq_u32): Likewise.	2021-11-11 15:34:50 +00:00
Jonathan Wright	3e35924cf1	aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics Declare unsigned type-qualified builtins and use them to implement (rounding) halving-add Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for u[r]hadd builtins. * config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary cast. (vhadd_s16): Likewise. (vhadd_s32): Likewise. (vhadd_u8): Use type-qualified builtin and remove casts. (vhadd_u16): Likewise. (vhadd_u32): Likewise. (vhaddq_s8): Remove unnecessary cast. (vhaddq_s16): Likewise. (vhaddq_s32): Likewise. (vhaddq_u8): Use type-qualified builtin and remove casts. (vhaddq_u16): Likewise. (vhaddq_u32): Likewise. (vrhadd_s8): Remove unnecessary cast. (vrhadd_s16): Likewise. (vrhadd_s32): Likewise. (vrhadd_u8): Use type-qualified builtin and remove casts. (vrhadd_u16): Likewise. (vrhadd_u32): Likewise. (vrhaddq_s8): Remove unnecessary cast. (vrhaddq_s16): Likewise. (vrhaddq_s32): Likewise. (vrhaddq_u8): Use type-qualified builtin and remove casts. (vrhaddq_u16): Likewise. (vrhaddq_u32): Likewise.	2021-11-11 15:34:50 +00:00
Jonathan Wright	ee03bed0b0	aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics Declare unsigned type-qualified builtins and use them to implement widening-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for usub[lw][2] builtins. * config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary cast. (vsubl_s16): Likewise. (vsubl_s32): Likewise. (vsubl_u8): Use type-qualified builtin and remove casts. (vsubl_u16): Likewise. (vsubl_u32): Likewise. (vsubl_high_s8): Remove unnecessary cast. (vsubl_high_s16): Likewise. (vsubl_high_s32): Likewise. (vsubl_high_u8): Use type-qualified builtin and remove casts. (vsubl_high_u16): Likewise. (vsubl_high_u32): Likewise. (vsubw_s8): Remove unnecessary casts. (vsubw_s16): Likewise. (vsubw_s32): Likewise. (vsubw_u8): Use type-qualified builtin and remove casts. (vsubw_u16): Likewise. (vsubw_u32): Likewise. (vsubw_high_s8): Remove unnecessary cast. (vsubw_high_s16): Likewise. (vsubw_high_s32): Likewise. (vsubw_high_u8): Use type-qualified builtin and remove casts. (vsubw_high_u16): Likewise. (vsubw_high_u32): Likewise.	2021-11-11 15:34:50 +00:00
Jonathan Wright	10e98c3c63	aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics Declare unsigned type-qualified builtins and use them to implement widening-add Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for uadd[lw][2] builtins. * config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary cast. (vaddl_s16): Likewise. (vaddl_s32): Likewise. (vaddl_u8): Use type-qualified builtin and remove casts. (vaddl_u16): Likewise. (vaddl_u32): Likewise. (vaddl_high_s8): Remove unnecessary cast. (vaddl_high_s16): Likewise. (vaddl_high_s32): Likewise. (vaddl_high_u8): Use type-qualified builtin and remove casts. (vaddl_high_u16): Likewise. (vaddl_high_u32): Likewise. (vaddw_s8): Remove unnecessary cast. (vaddw_s16): Likewise. (vaddw_s32): Likewise. (vaddw_u8): Use type-qualified builtin and remove casts. (vaddw_u16): Likewise. (vaddw_u32): Likewise. (vaddw_high_s8): Remove unnecessary cast. (vaddw_high_s16): Likewise. (vaddw_high_s32): Likewise. (vaddw_high_u8): Use type-qualified builtin and remove casts. (vaddw_high_u16): Likewise. (vaddw_high_u32): Likewise.	2021-11-11 15:34:50 +00:00
Jonathan Wright	a22c03d439	aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics Declare unsigned type-qualified builtins and use them for [R]SHRN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare type-qualified builtins for [R]SHRN[2]. * config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified builtin and remove casts. (vshrn_n_u32): Likewise. (vshrn_n_u64): Likewise. (vrshrn_high_n_u16): Likewise. (vrshrn_high_n_u32): Likewise. (vrshrn_high_n_u64): Likewise. (vrshrn_n_u16): Likewise. (vrshrn_n_u32): Likewise. (vrshrn_n_u64): Likewise. (vshrn_high_n_u16): Likewise. (vshrn_high_n_u32): Likewise. (vshrn_high_n_u64): Likewise.	2021-11-11 15:34:50 +00:00
Jonathan Wright	439906c61d	aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics Declare unsigned type-qualified builtins and use them for XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned type-qualified builtins for XTN[2]. * config/aarch64/arm_neon.h (vmovn_high_u16): Use type-qualified builtin and remove casts. (vmovn_high_u32): Likewise. (vmovn_high_u64): Likewise. (vmovn_u16): Likewise. (vmovn_u32): Likewise. (vmovn_u64): Likewise.	2021-11-11 15:34:49 +00:00
Jonathan Wright	a2590b545e	aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics Declare poly type-qualified builtins and use them for PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use poly type qualifier in builtin generator macros. * config/aarch64/arm_neon.h (vmul_p8): Use type-qualified builtin and remove casts. (vmulq_p8): Likewise. (vmull_high_p8): Likewise. (vmull_p8): Likewise.	2021-11-11 15:34:49 +00:00
Jonathan Wright	515ef83098	aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics Declare type-qualified builtins and use them for MLA/MLS Neon intrinsics that operate on unsigned types. This eliminates lots of casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtin generators for unsigned MLA/MLS intrinsics. * config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified builtin. (vmla_n_u32): Likewise. (vmla_u8): Likewise. (vmla_u16): Likewise. (vmla_u32): Likewise. (vmlaq_n_u16): Likewise. (vmlaq_n_u32): Likewise. (vmlaq_u8): Likewise. (vmlaq_u16): Likewise. (vmlaq_u32): Likewise. (vmls_n_u16): Likewise. (vmls_n_u32): Likewise. (vmls_u8): Likewise. (vmls_u16): Likewise. (vmls_u32): Likewise. (vmlsq_n_u16): Likewise. (vmlsq_n_u32): Likewise. (vmlsq_u8): Likewise. (vmlsq_u16): Likewise. (vmlsq_u32): Likewise.	2021-11-11 15:34:49 +00:00
Raphael Moreira Zinsly	8d71d3a317	libgcc: Fix backtrace fallback on PowerPC Big-endian At the end of the backtrace stream _Unwind_Find_FDE() may not be able to find the frame unwind info and will later call the backtrace fallback instead of finishing. This occurs when using an old libc on ppc64 due to dl_iterate_phdr() not being able to set the fde in the last trace. When this occurs the cfa of the trace will be behind the context's cfa. Also, libgo’s probestackmaps() calls the backtrace with a null pointer and can get to the backchain fallback with the same problem; in this case we are only interested in finding a stack map, and we neither need nor can do a backchain. _Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP. libgcc/ChangeLog: PR libgcc/103044 * config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's called with a null argument or at the end of the backtrace and return. * unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.	2021-11-11 15:29:25 +00:00
Jan Hubicka	8d3abf42d5	Fix some side cases of side effects discovery I wrote a script comparing modref pure/const discovery with ipa-pure-const and found mistakes on both ends. This plugs the modref differences in handling looping pure consts which were previously missed due to early exits on ECF_CONST | ECF_PURE. Those early exits are a bit annoying and I think as a cleanup I may just drop some of them as premature optimizations dating from the time modref was very simplistic in what it propagated. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * ipa-modref.c (modref_summary::useful_p): Check also for side-effects with looping const/pure. (modref_summary_lto::useful_p): Likewise. (merge_call_side_effects): Merge side effects before early exit for pure/const. (process_fnspec): Also handle pure functions. (analyze_call): Do not early exit on looping pure const. (propagate_unknown_call): Also handle nontrivial SCC as side-effect. (modref_propagate_in_scc): Update.	2021-11-11 16:07:47 +01:00
Richard Biener	fac4c4bdab	tree-optimization/103190 - fix assert in reassoc stmt placement with asm This makes sure to only assert we don't run into a asm goto when inserting a stmt in reassoc, matching the condition in can_reassociate_p. We can handle EH edges from an asm just like EH edges from any other stmt. 2021-11-11 Richard Biener <rguenther@suse.de> PR tree-optimization/103190 * tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.	2021-11-11 16:06:24 +01:00
Aldy Hernandez	bfa04d0ec9	Move import population from threader to path solver. Imports are our nomenclature for external SSA names to a block that are used to calculate the outgoing edges for said block. For example, in the following snippet: <bb 2> : _1 = b_10 == block_11; _2 = b_10 != -1; _3 = _1 & _2; if (_3 != 0) goto <bb 3>; [INV] else goto <bb 5>; [INV] ...the imports to the block are b_10 and block_11 since they are both needed to calculate _3. The path solver takes a bitmap of imports in addition to the path itself. This sets up the number of SSA names to be on the lookout for, while resolving the final conditional. Calculating these imports was initially done in the threader, since it was the only user of the path solver. With new clients, it has become obvious that populating the imports should be a task for the path solver, so it can be shared among the clients. This patch moves the import code to the solver, making both the solver and the threader simpler in the process. This is because intent is clearer and some duplicate code was removed. This reshuffling had the net effect of giving us a handful of new threads through my suite of .ii files (125). This was unexpected, but welcome nevertheless. There is no performance difference in callgrind over the same suite. Regstrapped on x86-64 Linux. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::add_copies_to_imports): Rename to... (path_range_query::compute_imports): ...this. Adapt it so it can be passed the imports bitmap instead of working on m_imports. (path_range_query::compute_ranges): Call compute_imports in all cases unless an imports bitmap is passed. * gimple-range-path.h (path_range_query::compute_imports): New. (path_range_query::add_copies_to_imports): Remove. * tree-ssa-threadbackward.c (back_threader::resolve_def): Remove. (back_threader::find_paths_to_names): Inline resolve_def. (back_threader::find_paths): Call compute_imports. (back_threader::resolve_phi): Adjust comment.	2021-11-11 15:42:00 +01:00
Sandra Loosemore	1ea781a865	Testsuite: Various fixes for nios2. 2021-11-11 Sandra Loosemore <sandra@codesourcery.com> gcc/testsuite/ * g++.dg/warn/Wmismatched-new-delete-5.C: Add -fdelete-null-pointer-checks. * gcc.dg/attr-returns-nonnull.c: Likewise. * gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2. * gcc.dg/ifcvt-4.c: Skip on nios2. * gcc.dg/struct-by-value-1.c: Add -G0 option for nios2.	2021-11-11 06:38:58 -08:00
Richard Biener	8865133614	tree-optimization/103188 - avoid running ranger on not-up-to-date SSA The following splits loop header copying into an analysis phase that uses ranger and a transform phase that can do without to avoid running ranger on IL that has SSA form not updated. 2021-11-11 Richard Biener <rguenther@suse.de> PR tree-optimization/103188 * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Remove query parameter, split out check for size optimization. (ch_base::m_ranger, cb_base::m_query): Remove. (ch_base::copy_headers): Split processing loop into analysis around which we allocate and use ranger and transform where we do not. (pass_ch::execute): Do not allocate/free ranger here. (pass_ch_vect::execute): Likewise. * gcc.dg/torture/pr103188.c: New testcase.	2021-11-11 15:01:26 +01:00
Jan Hubicka	6e30c48120	Fix recursion discovery in ipa-pure-const We mark self-recursive functions as looping for fear of endless recursion. This is done correctly for local pure/const and for non-trivial SCCs in the callgraph, but for trivial SCCs we miss the flag. I think it is a bad decision, since infinite recursion will run out of stack, but changing it upsets some testcases and should be done independently. So this patch fixes the current behaviour to be consistent. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * ipa-pure-const.c (propagate_pure_const): Self recursion is a side effects.	2021-11-11 14:39:19 +01:00
Jan Hubicka	61396dfb2a	Fix noreturn discovery. Fix ipa-pure-const handling of noreturn flags. It is not safe to set it for interposable symbols and we should also set it for aliases (just like we do for other flags). This patch merely copies other flag handling and implements it here. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * cgraph.c (set_noreturn_flag_1): New function. (cgraph_node::set_noreturn_flag): New member function * cgraph.h (cgraph_node::set_noreturn_flags): Declare. * ipa-pure-const.c (pass_local_pure_const::execute): Use it.	2021-11-11 14:35:10 +01:00
Patrick Palka	e106221db2	c++: use auto_vec in cp_parser_template_argument_list gcc/cp/ChangeLog: * parser.c (cp_parser_template_argument_list): Use auto_vec instead of manual memory management.	2021-11-11 08:10:20 -05:00
Jakub Jelinek	fa4fcb111a	libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values When thinking about GOMP_teams3, I've realized that using global variables for the values returned by omp_get_num_teams()/omp_get_team_num() calls is incorrect even with our right now dumb way of implementing host teams. The problems are two, one is if host teams is used from multiple pthread_create created threads - the spec says that host teams can't be nested inside of explicit parallel or other teams constructs, but with pthread_create the standard says obviously nothing about it. Another more important thing is host fallback, right now we don't do anything for omp_get_num_teams() or omp_get_team_num() which was fine before host teams was introduced and the 5.1 requirement that num_teams clause specifies minimum of teams, but with the global vars it means inside of target teams num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams is inside of host teams with num_teams(4). With target fallback being invoked from parallel regions global vars simply can't work right on the host. So, this patch moves them to struct gomp_thread and propagates those for parallel to child threads. For host fallback, the implicit zeroing of thr results in us returning omp_get_num_teams () == 1 and omp_get_team_num () == 0 which is fine for target teams without num_teams clause, for target teams with num_teams clause something to work on and for target without teams nested in it I've asked on omp-lang what should be done. 2021-11-11 Jakub Jelinek <jakub@redhat.com> libgomp.h (struct gomp_thread): Add num_teams and team_num members. * team.c (struct gomp_thread_start_data): Likewise. (gomp_thread_start): Initialize thr->num_teams and thr->team_num. (gomp_team_start): Initialize start_data->num_teams and start_data->team_num. Update nthr->num_teams and nthr->team_num. * teams.c (gomp_num_teams, gomp_team_num): Remove. 
(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num instead of gomp_num_teams and gomp_team_num. (omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams. (omp_get_team_num): Use thr->team_num instead of gomp_team_num. * testsuite/libgomp.c/teams-4.c: New test.	2021-11-11 13:57:31 +01:00
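The per-thread storage described in commit fa4fcb111a can be sketched roughly as follows. This is a minimal illustration, not libgomp's code: the `sketch_` names are hypothetical, and storing the team count as "number of teams minus one" is an assumption inferred from the ChangeLog's `thr->num_teams + 1` and from the remark that implicit zeroing yields `omp_get_num_teams () == 1`.

```c
#include <assert.h>

/* Hypothetical stand-in for the two members the commit adds to
   libgomp's struct gomp_thread.  Everything else here is illustrative.  */
struct sketch_thread
{
  unsigned int num_teams;  /* Assumed to hold (number of teams - 1).  */
  unsigned int team_num;   /* Index of the current team.  */
};

/* One instance per thread.  Thread-local objects with static storage
   are zero-initialized, so a thread that never entered a teams region
   sees num_teams == 0 and team_num == 0.  */
static _Thread_local struct sketch_thread sketch_thr;

/* Hypothetical analogue of omp_get_num_teams ().  */
static int
sketch_get_num_teams (void)
{
  return (int) sketch_thr.num_teams + 1;
}

/* Hypothetical analogue of omp_get_team_num ().  */
static int
sketch_get_team_num (void)
{
  return (int) sketch_thr.team_num;
}

/* Hypothetical analogue of entering a host teams region made of N
   teams while executing team number TEAM.  */
static void
sketch_enter_team (unsigned int n, unsigned int team)
{
  assert (n > 0 && team < n);
  sketch_thr.num_teams = n - 1;
  sketch_thr.team_num = team;
}
```

Because each thread owns its copy of `sketch_thr`, a value set inside one host teams region cannot leak into an unrelated thread, which is the problem the commit message attributes to the old global variables.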
Aldy Hernandez	3e5a190533	Resolve entry loop condition for the edge remaining in the loop. There is a known failure for gfortran.dg/vector_subscript_1.f90. It was previously failing for all optimization levels except -Os. Getting the loop header copying right, now makes it fail for all levels :-). Tested on x86-64 Linux. Co-authored-by: Richard Biener <rguenther@suse.de> gcc/ChangeLog: * tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve statically to the edge remaining in the loop.	2021-11-11 13:17:32 +01:00
Richard Biener	a5fed4063f	middle-end/103181 - fix operation_could_trap_p for vector division For integer vector division we only checked for all zero vector constants rather than checking whether any element in the constant vector is zero. 2021-11-11 Richard Biener <rguenther@suse.de> PR middle-end/103181 * tree-eh.c (operation_could_trap_helper_p): Properly check vector constants for a zero element for integer division. Separate floating point and integer division code. * gcc.dg/torture/pr103181.c: New testcase.	2021-11-11 10:32:51 +01:00
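The difference between the old and new checks in commit a5fed4063f can be illustrated on a plain array of integers. This is only a sketch of the idea, not the code in tree-eh.c; `divisor_all_zero` and `divisor_any_zero` are hypothetical names, and a constant vector is modelled as an `int` array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true only if every element of the divisor is
   zero.  This models the weaker check the commit message describes.  */
static bool
divisor_all_zero (const int *elts, size_t n)
{
  assert (elts != NULL && n > 0);
  for (size_t i = 0; i < n; i++)
    if (elts[i] != 0)
      return false;
  return true;
}

/* Hypothetical helper: true if at least one element of the divisor is
   zero, which is when an element-wise integer division could trap.  */
static bool
divisor_any_zero (const int *elts, size_t n)
{
  assert (elts != NULL && n > 0);
  for (size_t i = 0; i < n; i++)
    if (elts[i] == 0)
      return true;
  return false;
}
```

For a divisor such as `{1, 0, 3, 4}` the first helper reports false while the second reports true, so only the second flags the division as potentially trapping.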
Jakub Jelinek	10db757301	dwarf2out: Fix up field_byte_offset [PR101378] For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code to deal with it since many years ago (see it e.g. in GCC 3.2, although it used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints). But that code apparently isn't able to cope with members with empty class types with [[no_unique_address]] attribute, because the empty classes have non-zero type size but zero decl size and so one can end up from the computation with negative offset or offset 1 byte smaller than it should be. For !PCC_BITFIELD_TYPE_MATTERS, we just use tree_result = byte_position (decl); which seems exactly right even for the empty classes or anything which is not a bitfield (and for which we don't add DW_AT_bit_offset attribute). So, instead of trying to handle those no_unique_address members in the current already very complicated code, this limits it to bitfields. stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE. As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because DECL_BIT_FIELD might be cleared for some bitfields with bitsizes multiple of BITS_PER_UNIT and e.g. struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s; struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t; int main () { s.c = 0x55; s.d = 0xaaaa; t.c = 0x55; t.d = 0xaaaa; s.e++; } has different debug info with DECL_BIT_FIELD check. 2021-11-11 Jakub Jelinek <jakub@redhat.com> PR debug/101378 * dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS handling only for DECL_BIT_FIELD_TYPE decls. * g++.dg/debug/dwarf2/pr101378.C: New test.	2021-11-11 10:16:45 +01:00
Prathamesh Kulkarni	145be5efaf	[aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr. gcc/ChangeLog: PR target/102376 * config/aarch64/aarch64.c (aarch64_process_target_attr): Check if token is arch extension without leading '+' and emit appropriate diagnostic for the same. gcc/testsuite/ChangeLog: PR target/102376 * gcc.target/aarch64/pr102376.c: New test.	2021-11-11 14:40:21 +05:30
Jakub Jelinek	48d7327f2a	openmp: Add support for 2 argument num_teams clause In OpenMP 5.1, num_teams clause can accept either one expression as before, but it in that case changed meaning, rather than create <= expression teams it is now create == expression teams. Or it accepts two expressions separated by :, with the meaning that the first is low bound and second upper bound on how many teams should be created. The other ways to set number of teams are upper bounds with lower bound of 1. The following patch does parsing of this for C/C++. For host teams, we actually don't need to do anything further right now, we always create (pretend to create) exactly the requested number of teams, so we can just evaluate and throw away the lower bound for now. For teams nested in target, we don't guarantee that though and further work will be needed. In particular, omplower now turns the teams part of: struct S { S (); S (const S &); ~S (); int s; }; void bar (S &, S &); int baz (); _Pragma ("omp declare target to (baz)"); void foo (void) { S a, b; #pragma omp target private (a) map (b) { #pragma omp teams firstprivate (b) num_teams (baz ()) { bar (a, b); } } } into: retval.0 = baz (); retval.1 = retval.0; { unsigned int retval.3; struct S * D.2549; struct S b; retval.3 = (unsigned int) retval.1; D.2549 = .omp_data_i->b; S::S (&b, D.2549); #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a) __builtin_GOMP_teams (retval.3, 0); { bar (&a, &b); } S::~S (&b); #pragma omp return(nowait) } IMHO we want a new API, say GOMP_teams3 which will take 3 arguments instead of 2 (the lower and upper bounds from num_teams and thread_limit) and will return a bool whether it should do the teams body or not. And, we should add right before outermost {} above while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0)) and remove the __builtin_GOMP_teams call. 
The current function performs exit equivalent (at least on NVPTX) which seems bad because that means the destructors of e.g. private variables on target aren't invoked, and at the current placement neither destructors of the already constructed privatized variables in teams. I'll do this next on the compiler side, but I'm afraid I'll need help with the nvptx and amdgcn implementations. E.g. for nvptx, we won't be able to use %ctaid.x . I think ideal would be to use a .shared integer variable for the omp_get_team_num value, but I don't have any experience with that, are .shared variables zero initialized by default, or do they have random value at start? PTX docs say they aren't initializable. 2021-11-11 Jakub Jelinek <jakub@redhat.com> gcc/ * tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ... (OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this. (OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define. * tree.c (omp_clause_num_ops): Increase num ops for OMP_CLAUSE_NUM_TEAMS to 2. * tree-pretty-print.c (dump_omp_clause): Print optional lower bound for OMP_CLAUSE_NUM_TEAMS. * gimplify.c (gimplify_scan_omp_clauses): Gimplify OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL. (optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. * omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. * omp-expand.c (expand_teams_call, get_target_arguments): Likewise. gcc/c/ * c-parser.c (c_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/cp/ * parser.c (cp_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. 
(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. * semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause. * pt.c (tsubst_omp_clauses): Likewise. (tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. gcc/testsuite/ * c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression to half of the num_teams clauses. * c-c++-common/gomp/num-teams-1.c: New test. * c-c++-common/gomp/num-teams-2.c: New test. * g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression to half of the num_teams clauses. * g++.dg/gomp/attrs-2.C (bar): Likewise. * g++.dg/gomp/num-teams-1.C: New test. * g++.dg/gomp/num-teams-2.C: New test. libgomp/ * testsuite/libgomp.c-c++-common/teams-1.c: New test.	2021-11-11 09:42:47 +01:00
Richard Biener	0136f25ac0	Remove find_pdom and find_dom This removes now useless wrappers around get_immediate_dominator. 2021-11-11 Richard Biener <rguenther@suse.de> * cfganal.c (find_pdom): Remove. (control_dependences::find_control_dependence): Remove special-casing of entry block, call get_immediate_dominator directly. * gimple-predicate-analysis.cc (find_pdom): Remove. (find_dom): Likewise. (find_control_equiv_block): Call get_immediate_dominator directly. (compute_control_dep_chain): Likewise. (predicate::init_from_phi_def): Likewise.	2021-11-11 09:20:15 +01:00
Richard Biener	a11afa7af8	Apply TLC to control dependence compute This makes the control dependence compute avoid a find_edge and optimizes allocation by embedding the bitmap head into the vector of control dependences instead of allocating all of them. It also uses a local bitmap obstack. The bitmap changes make it necessary to shuffle some includes. 2021-11-10 Richard Biener <rguenther@suse.de> * cfganal.h (control_dependences::control_dependence_map): Embed bitmap_head. (control_dependences::m_bitmaps): New. * cfganal.c (control_dependences::set_control_dependence_map_bit): Adjust. (control_dependences::clear_control_dependence_bitmap): Likewise. (control_dependences::find_control_dependence): Do not find_edge for the abnormal edge test. (control_dependences::control_dependences): Instead do not add abnormal edges to the edge list. Adjust. (control_dependences::~control_dependences): Likewise. (control_dependences::get_edges_dependent_on): Likewise. * function-tests.c: Include bitmap.h. gcc/analyzer/ * supergraph.cc: Include bitmap.h. gcc/c/ * gimple-parser.c: Shuffle bitmap.h include.	2021-11-11 09:19:49 +01:00

189602 Commits