This replaces the printf used by failed debug assertions with fprintf,
so we can write to stderr.
To avoid including <stdio.h>, the assert function is moved into the
library. To avoid programs using a vague linkage definition of the old
inline function, the function is renamed. Code compiled with old
versions of GCC might still call the old function, but code compiled
with newer GCC will call the new function and write to stderr.
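A minimal C sketch of the out-of-line approach follows. Only a declaration of the failure handler needs to be visible to callers, so the header no longer needs <stdio.h>. The names `example_assert_fail` and `format_assert_message` are hypothetical stand-ins for `__glibcxx_assert_fail` in src/c++11/debug.cc, and the exact message text is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format the diagnostic into BUF and return the snprintf result.
   The message layout here is an assumption, not libstdc++'s exact text.  */
static int format_assert_message(char *buf, size_t size, const char *file,
                                 int line, const char *function,
                                 const char *condition)
{
  return snprintf(buf, size, "%s:%d: %s: Assertion '%s' failed.\n",
                  file, line, function, condition);
}

/* Hypothetical out-of-line failure handler: it lives in the library, so
   only its declaration needs to appear in the header.  */
void example_assert_fail(const char *file, int line, const char *function,
                         const char *condition)
{
  char msg[512];
  format_assert_message(msg, sizeof msg, file, line, function, condition);
  fputs(msg, stderr);   /* write to stderr instead of stdout */
  abort();
}
```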
libstdc++-v3/ChangeLog:
PR libstdc++/59675
* acinclude.m4 (libtool_VERSION): Bump version.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Add version and
export new symbol.
* configure: Regenerate.
* include/bits/c++config (__replacement_assert): Remove, declare
__glibcxx_assert_fail instead.
* src/c++11/debug.cc (__glibcxx_assert_fail): New function to
replace __replacement_assert, writing to stderr instead of
stdout.
* testsuite/util/testsuite_abi.cc: Update latest version.
The KIND argument of the INDEX intrinsic is a compile-time constant
that is used at compile time only to resolve to a kind-specific library
function. That argument is otherwise completely ignored at runtime, and there is
no code generated for it as the library procedure has no kind argument.
This confuses the scalarizer, which expects every argument of an
elemental function to be used when calling the procedure.
This change removes the argument from the scalarization lists
at the beginning of the scalarization process, so that the argument
is completely ignored.
This also reverts the existing workaround
(commit d09847357b except for its testcase).
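The filtering can be sketched in C as a predicate consulted while walking the arguments. The enum, the array-of-positions model of the scalarization chain and `walk_elemental_args` are hypothetical simplifications. Only the idea of an `arg_evaluated_for_scalarization` check comes from the ChangeLog below. The sketch relies on KIND being the fourth argument of INDEX (STRING, SUBSTRING [, BACK [, KIND]]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intrinsic identifiers; only INDEX matters here.  */
enum intrinsic_id { ISYM_NONE, ISYM_INDEX };

/* Return false for arguments that only matter at compile time.  For
   INDEX the KIND argument is at zero-based position 3.  */
static bool arg_evaluated_for_scalarization(enum intrinsic_id id,
                                            size_t arg_num)
{
  if (id == ISYM_INDEX && arg_num == 3)
    return false;
  return true;
}

/* Walk the arguments, recording only those the scalarizer should see.
   Returns how many positions were written to WALKED.  */
static size_t walk_elemental_args(enum intrinsic_id id, size_t nargs,
                                  size_t *walked)
{
  size_t count = 0;
  for (size_t i = 0; i < nargs; i++)
    if (arg_evaluated_for_scalarization(id, i))
      walked[count++] = i;
  return count;
}
```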
PR fortran/97896
gcc/fortran/ChangeLog:
* intrinsic.c (add_sym_4ind): Remove.
(add_functions): Use add_sym4 instead of add_sym4ind.
Don’t special case the index intrinsic.
* iresolve.c (gfc_resolve_index_func): Use the individual arguments
directly instead of the full argument list.
* intrinsic.h (gfc_resolve_index_func): Update the declaration
accordingly.
* trans-decl.c (gfc_get_extern_function_decl): Don’t modify the
list of arguments in the case of the index intrinsic.
* trans-array.h (gfc_get_intrinsic_for_expr,
gfc_get_proc_ifc_for_expr): New.
* trans-array.c (gfc_get_intrinsic_for_expr,
arg_evaluated_for_scalarization): New.
(gfc_walk_elemental_function_args): Add intrinsic procedure
as argument. Count arguments. Check arg_evaluated_for_scalarization.
* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
* trans-stmt.c (get_intrinsic_for_code): New.
(gfc_trans_call): Update call.
gcc/testsuite/ChangeLog:
* gfortran.dg/index_5.f90: New.
The following patch implements what I've been talking about earlier:
honoring that, for an explicit num_teams clause, we create at least the
lower-bound (or, if that is not specified, the upper-bound) number of
teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test, and maybe teams-4.c too,
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1 and
stop using %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ().
Instead, omp_get_team_num () should use some .shared var that GOMP_teams4
initializes to %ctaid.x or __builtin_gcn_dim_pos (0) on the first call and,
for !first, increments by num_blocks or num_workgroups each time, only
returning false once we are above num_teams_lower.
Any help with actually implementing this for the two architectures would
be highly appreciated.
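The proposed per-block protocol can be simulated on the host in C. In this sketch, `struct block_state` stands in for the .shared variable and `teams_next` is a hypothetical model of the GOMP_teams4 return value. Both names are assumptions, not the libgomp implementation. The sketch also assumes zero-based team numbers, so a block stops once its number reaches num_teams_lower. With that rule, every team number below num_teams_lower runs the teams body exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-block .shared variable holding omp_get_team_num ().  */
struct block_state
{
  unsigned team_num;
};

/* Hypothetical model of the protocol for one block (CTA or workgroup).
   On the first call the team number is the block index; on later calls
   it advances by the number of blocks.  Returns false once the block has
   no team numbers left below NUM_TEAMS_LOWER.  */
static bool teams_next(struct block_state *st, unsigned block_id,
                       unsigned num_blocks, unsigned num_teams_lower,
                       bool first)
{
  if (first)
    st->team_num = block_id;
  else
    st->team_num += num_blocks;
  return st->team_num < num_teams_lower;
}

/* Run NUM_BLOCKS blocks one after another and count how many times each
   team number executes the teams body.  */
static void simulate(unsigned num_blocks, unsigned num_teams_lower,
                     unsigned *runs_per_team)
{
  for (unsigned t = 0; t < num_teams_lower; t++)
    runs_per_team[t] = 0;
  for (unsigned b = 0; b < num_blocks; b++)
    {
      struct block_state st;
      for (bool first = true;
           teams_next(&st, b, num_blocks, num_teams_lower, first);
           first = false)
        runs_per_team[st.team_num]++;   /* the teams body */
    }
}
```

For example, 3 blocks and num_teams_lower == 7 give block 0 teams 0, 3 and 6, block 1 teams 1 and 4, and block 2 teams 2 and 5.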
2021-11-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
(BUILT_IN_GOMP_TEAMS4): New.
* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
GOMP_teams, pass to it also num_teams lower-bound expression
or a dup of upper-bound if it is missing and a flag whether
it is the first call or not.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_UINT): Remove.
(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
* libgomp_g.h (GOMP_teams4): Declare.
* libgomp.map (GOMP_5.1): Export GOMP_teams4.
* target.c (GOMP_teams4): New function.
* config/nvptx/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* config/gcn/target.c (GOMP_teams): Remove.
(GOMP_teams4): New function.
* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
teams instead of <= 2.
* testsuite/libgomp.c-c++-common/teams-2.c: New test.
The following fixes a missed valueization when simplifying
a MEM[&...] combination during valueization.
2021-11-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/103204
* tree-ssa-sccvn.c (valueize_refs_1): Re-valueize the
top operand after folding in an address.
* gcc.dg/torture/pr103204.c: New testcase.
The idea is for opcodes to be able to see whether bfd is compiled
for 64-bit. A lot of --enable-targets=all libopcodes is wasted space
if bfd can't load 64-bit target object files.
* Makefile.def (configure-opcodes): Depend on configure-bfd.
* Makefile.in: Regenerate.
This implements P1004R2 ("Making std::vector constexpr") for C++20.
For now, debug mode vectors are not supported in constant expressions.
To make that work we might need to disable all attaching/detaching of
safe iterators. That can be fixed later.
Co-authored-by: Josh Marshall <joshua.r.marshall.1991@gmail.com>
libstdc++-v3/ChangeLog:
* include/bits/alloc_traits.h (_Destroy): Make constexpr for
C++20 mode.
* include/bits/allocator.h (__shrink_to_fit::_S_do_it):
Likewise.
* include/bits/stl_algobase.h (__fill_a1): Declare _Bit_iterator
overload constexpr for C++20.
* include/bits/stl_bvector.h (_Bit_type, _S_word_bit): Move out
of inline namespace.
(_Bit_reference, _Bit_iterator_base, _Bit_iterator)
(_Bit_const_iterator, _Bvector_impl_data, _Bvector_base)
(vector<bool, A>): Add constexpr to every member function.
(_Bvector_base::_M_allocate): Initialize storage during constant
evaluation.
(vector<bool, A>::_M_initialize_value): Use __fill_bvector_n
instead of memset.
(__fill_bvector_n): New helper function to replace memset during
constant evaluation.
* include/bits/stl_uninitialized.h (__uninitialized_copy<false>):
Move logic to ...
(__do_uninit_copy): New function.
(__uninitialized_fill<false>): Move logic to ...
(__do_uninit_fill): New function.
(__uninitialized_fill_n<false>): Move logic to ...
(__do_uninit_fill_n): New function.
(__uninitialized_copy_a): Add constexpr. Use __do_uninit_copy.
(__uninitialized_move_a, __uninitialized_move_if_noexcept_a):
Add constexpr.
(__uninitialized_fill_a): Add constexpr. Use __do_uninit_fill.
(__uninitialized_fill_n_a): Add constexpr. Use
__do_uninit_fill_n.
(__uninitialized_default_n, __uninitialized_default_n_a)
(__relocate_a_1, __relocate_a): Add constexpr.
* include/bits/stl_vector.h (_Vector_impl_data, _Vector_impl)
(_Vector_base, vector): Add constexpr to every member function.
(_Vector_impl::_S_adjust): Disable ASan annotation during
constant evaluation.
(_Vector_base::_S_use_relocate): Disable bitwise-relocation
during constant evaluation.
(vector::_Temporary_value): Use a union for storage.
* include/bits/vector.tcc (vector, vector<bool>): Add constexpr
to every member function.
* include/std/vector (erase_if, erase): Add constexpr.
* testsuite/23_containers/headers/vector/synopsis.cc: Add
constexpr for C++20 mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Change to
compile-only test using constant expressions.
* testsuite/23_containers/vector/bool/capacity/29134.cc: Adjust
namespace for _S_word_bit.
* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/89164.cc: Adjust errors
for C++20 and move C++17 test to ...
* testsuite/23_containers/vector/cons/89164_c++17.cc: ... here.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/cons/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc: New test.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc: New test.
* testsuite/23_containers/vector/capacity/constexpr.cc: New test.
* testsuite/23_containers/vector/cons/constexpr.cc: New test.
* testsuite/23_containers/vector/data_access/constexpr.cc: New test.
* testsuite/23_containers/vector/element_access/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/constexpr.cc: New test.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc: New test.
Since r12-5072 made _Safe_container::operator=(const _Safe_container&)
protected, the debug containers no longer compile in C++98 mode. They
have user-provided copy assignment operators in C++98 mode, and they
assign each base class in turn. The 'this->_M_safe() = __x' expressions
fail, because calling a protected member function is only allowed via
'this'. They could be fixed by using this->_Safe::operator=(__x) but a
simpler solution is to just remove the user-provided assignment
operators and let the compiler define them (as we do for C++11 and
later, by defining them as defaulted).
The only change needed for that to work is to define the _Safe_vector
copy assignment operator in C++98 mode, so that the implicit
__gnu_debug::vector::operator= definition will call it, instead of
needing to call _M_update_guaranteed_capacity() manually.
libstdc++-v3/ChangeLog:
* include/debug/deque (deque::operator=(const deque&)): Remove
definition.
* include/debug/list (list::operator=(const list&)): Likewise.
* include/debug/map.h (map::operator=(const map&)): Likewise.
* include/debug/multimap.h (multimap::operator=(const multimap&)):
Likewise.
* include/debug/multiset.h (multiset::operator=(const multiset&)):
Likewise.
* include/debug/set.h (set::operator=(const set&)): Likewise.
* include/debug/string (basic_string::operator=(const basic_string&)):
Likewise.
* include/debug/vector (vector::operator=(const vector&)):
Likewise.
(_Safe_vector::operator=(const _Safe_vector&)): Define for
C++98 as well.
All users of path_range_query are currently allocating a gimple_ranger
only to pass it to the query object. It's tidier to just do it from
path_range_query if no ranger was passed.
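The "borrow if given, otherwise own" pattern can be approximated in C, with init/fini functions standing in for the C++ constructor and destructor. `struct ranger`, `struct path_query` and the `owns_ranger` flag are hypothetical names chosen for the sketch. The ChangeLog only says the destructor frees the ranger "if necessary".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for gimple_ranger.  */
struct ranger { int dummy; };

struct path_query
{
  struct ranger *ranger;  /* a pointer, so it can be borrowed or owned */
  bool owns_ranger;       /* true if we allocated it and must free it */
};

/* Use the caller's ranger if one is given, otherwise allocate our own.
   Returns false only if allocation fails.  */
static bool path_query_init(struct path_query *q, struct ranger *r)
{
  if (r)
    {
      q->ranger = r;
      q->owns_ranger = false;
      return true;
    }
  q->ranger = malloc(sizeof *q->ranger);
  q->owns_ranger = q->ranger != NULL;
  return q->ranger != NULL;
}

/* Free the ranger only if this object allocated it.  */
static void path_query_fini(struct path_query *q)
{
  if (q->owns_ranger)
    free(q->ranger);
  q->ranger = NULL;
  q->owns_ranger = false;
}
```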
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::path_range_query): New
ctor without a ranger.
(path_range_query::~path_range_query): Free ranger if necessary.
(path_range_query::range_on_path_entry): Adjust m_ranger for pointer.
(path_range_query::ssa_range_in_phi): Same.
(path_range_query::compute_ranges_in_block): Same.
(path_range_query::compute_imports): Same.
(path_range_query::compute_ranges): Same.
(path_range_query::range_of_stmt): Same.
(path_range_query::compute_outgoing_relations): Same.
* gimple-range-path.h (class path_range_query): New ctor.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger
as path_range_query allocates one.
* tree-ssa-threadbackward.c (class back_threader): Remove m_ranger.
(back_threader::~back_threader): Same.
We have much more thorough restrictions, shared between both
threader implementations, in the registry. I've been meaning to
remove the backward threader one, since its only purpose was reducing
the search space. Previously there was a small time penalty for its
removal, but with the various patches in the past month, it looks like
the removal is a wash performance-wise.
This catches 8 more jump threads in the backward threader in my suite,
presumably because we disallowed all loop crossing, whereas the
registry restrictions allow some crossing (if we exit the loop, etc).
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove loop
crossing restriction.
Fix the Create_func_descriptors pass to traverse the subexpressions of
the function in a Call_expression. There are no subexpressions in the
normal case of calling a function or a method directly, but there are
subexpressions in code like F().M() when F returns an interface type.
Forgetting to traverse the function subexpressions was almost entirely
hidden by the fact that we also created the necessary thunks in
Bound_method_expression::do_flatten and
Interface_field_reference_expression::do_get_backend. However, when
the thunks were created there, they did not go through the
order_evaluations pass. This almost always worked, but failed in the
case in which the function being thunked returned multiple results, as
order_evaluations takes the necessary step of moving the
Call_expression into its own statement, and that would not happen when
order_evaluations was not called. Avoid hiding errors like this by
changing those methods to only look up the previously created thunk,
rather than creating it if it was not already created.
The test case for this is https://golang.org/cl/363156.
Fixes https://golang.org/issue/49512
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274
Calling the placement version of ::operator new "implicitly creates
objects in the returned region of storage" as per [intro.object]. This
allows the returned memory to be used as storage for implicit-lifetime
types (including arrays) without additional action by the caller. This
is required by the proposed resolution of LWG 3147.
libstdc++-v3/ChangeLog:
* include/std/memory_resource (memory_resource::allocate):
Implicitly create objects in the returned storage.
This function only exists to avoid an error in the debug mode vector, so
it doesn't need to be public.
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h (vector<bool>::data()): Give
protected access, and delete for C++11 and later.
As discussed on the mailing list, the template actually tests for a missed
optimization where we fail to propagate the size of an array. We no longer
miss this after the modref improvements.
gcc/testsuite/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* gfortran.dg/inline_matmul_17.f90: Fix template
We can now handle some extra cases, for example:
struct a {int a,b,c;};
__attribute__ ((noinline))
void init (struct a *a)
{
  a->a=1;
  a->b=2;
  a->c=3;
}
int const_fn ()
{
  struct a a;
  init (&a);
  return a.a + a.b + a.c;
}
Here ipa-pure-const gives up because const_fn calls the non-const function
init, while modref knows that the memory init writes to is local to const_fn.
I ended up reordering passes so that early modref runs after early pure-const,
mostly to avoid the need to change testcases which grep for const functions
being detected by pure-const. Still, some testsuite compensation is needed.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (analyze_function): Do pure/const discovery, return
true on success.
(pass_modref::execute): If pure/const is discovered fixup cfg.
(ignore_edge): Do not ignore pure/const edges.
(modref_propagate_in_scc): Do pure/const discovery, return true if
cdtor was promoted pure/const.
(pass_ipa_modref::execute): If needed remove unreachable functions.
* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
(warn_function_cold): Likewise.
(skip_function_for_local_pure_const): Move earlier.
(ipa_make_function_const): Break out from ...
(ipa_make_function_pure): Break out from ...
(propagate_pure_const): ... here.
(pass_local_pure_const::execute): Use it.
* ipa-utils.h (ipa_make_function_const): Declare.
(ipa_make_function_pure): Declare.
* passes.def: Move early modref after pure-const.
gcc/testsuite/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* c-c++-common/tm/inline-asm.c: Disable pure-const.
* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
* gcc.dg/tree-ssa/modref-14.c: New test.
* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
* gfortran.dg/do_subscript_3.f90: Add -O0.
Declare unsigned type-qualified builtins and use them to implement
the vector reduction Neon intrinsics. This removes the need for many
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for vector reduction.
* config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
builtin and remove casts.
(vaddv_u16): Likewise.
(vaddv_u32): Likewise.
(vaddvq_u8): Likewise.
(vaddvq_u16): Likewise.
(vaddvq_u32): Likewise.
(vaddvq_u64): Likewise.
Declare unsigned type-qualified builtins and use them to implement
the pairwise addition Neon intrinsics. This removes the need for many
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for pairwise addition.
* config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
builtin and remove casts.
(vpaddq_u16): Likewise.
(vpaddq_u32): Likewise.
(vpaddq_u64): Likewise.
(vpadd_u8): Likewise.
(vpadd_u16): Likewise.
(vpadd_u32): Likewise.
(vpaddd_u64): Likewise.
Declare unsigned type-qualified builtins and use them to implement
halving-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uhsub builtins.
* config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary
cast.
(vhsub_s16): Likewise.
(vhsub_s32): Likewise.
(vhsub_u8): Use type-qualified builtin and remove casts.
(vhsub_u16): Likewise.
(vhsub_u32): Likewise.
(vhsubq_s8): Remove unnecessary cast.
(vhsubq_s16): Likewise.
(vhsubq_s32): Likewise.
(vhsubq_u8): Use type-qualified builtin and remove casts.
(vhsubq_u16): Likewise.
(vhsubq_u32): Likewise.
Declare unsigned type-qualified builtins and use them for [R]SHRN[2]
Neon intrinsics. This removes the need for casts in arm_neon.h.
gcc/ChangeLog:
2021-11-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for [R]SHRN[2].
* config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified
builtin and remove casts.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vrshrn_n_u16): Likewise.
(vrshrn_n_u32): Likewise.
(vrshrn_n_u64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.
Declare unsigned type-qualified builtins and use them for XTN[2] Neon
intrinsics. This removes the need for casts in arm_neon.h.
gcc/ChangeLog:
2021-11-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
type-qualified builtins for XTN[2].
* config/aarch64/arm_neon.h (vmovn_high_u16): Use type-
qualified builtin and remove casts.
(vmovn_high_u32): Likewise.
(vmovn_high_u64): Likewise.
(vmovn_u16): Likewise.
(vmovn_u32): Likewise.
(vmovn_u64): Likewise.
Declare poly type-qualified builtins and use them for PMUL[L] Neon
intrinsics. This removes the need for casts in arm_neon.h.
gcc/ChangeLog:
2021-11-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Use poly type
qualifier in builtin generator macros.
* config/aarch64/arm_neon.h (vmul_p8): Use type-qualified
builtin and remove casts.
(vmulq_p8): Likewise.
(vmull_high_p8): Likewise.
(vmull_p8): Likewise.
At the end of the backtrace stream, _Unwind_Find_FDE() may not be able
to find the frame unwind info and will later call the backtrace fallback
instead of finishing. This occurs when using an old libc on ppc64 due to
dl_iterate_phdr() not being able to set the fde in the last trace.
When this occurs, the CFA of the trace will be behind the context's CFA.
Also, libgo’s probestackmaps() calls the backtrace with a null pointer
and can get to the backchain fallback with the same problem; in this case
we are only interested in finding a stack map, and we neither need nor can
do a backchain.
_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
uw_frame_state_for(), so we need to handle _URC_NORMAL_STOP.
libgcc/ChangeLog:
PR libgcc/103044
* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's
called with a null argument or at the end of the backtrace and return.
* unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.
I wrote a script comparing modref pure/const discovery with ipa-pure-const
and found mistakes on both ends. This plugs the modref differences in handling
looping pure consts, which were previously missed due to early exits on
ECF_CONST | ECF_PURE. Those early exits are a bit annoying, and I think as
a cleanup I may just drop some of them as premature optimizations dating from
the time when modref was very simplistic in what it propagated.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (modref_summary::useful_p): Check also for side-effects
with looping const/pure.
(modref_summary_lto::useful_p): Likewise.
(merge_call_side_effects): Merge side effects before early exit
for pure/const.
(process_fnspec): Also handle pure functions.
(analyze_call): Do not early exit on looping pure const.
(propagate_unknown_call): Also handle nontrivial SCC as side-effect.
(modref_propagate_in_scc): Update.
This makes sure we only assert that we don't run into an asm goto when
inserting a stmt in reassoc, matching the condition in
can_reassociate_p. We can handle EH edges from an asm just like
EH edges from any other stmt.
2021-11-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/103190
* tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.
Imports are our nomenclature for external SSA names to a block that
are used to calculate the outgoing edges for said block. For example,
in the following snippet:
<bb 2> :
  _1 = b_10 == block_11;
  _2 = b_10 != -1;
  _3 = _1 & _2;
  if (_3 != 0)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]
...the imports to the block are b_10 and block_11, since they are both
needed to calculate _3.
The path solver takes a bitmap of imports in addition to the path
itself. This tells the solver which SSA names to be on the lookout
for while resolving the final conditional.
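The definition can be illustrated with a toy model in C that starts from the name tested by the final conditional and follows definitions inside the block. Names defined outside the block become imports. The string-based statement representation and the `collect_imports` helper are hypothetical. The sketch ignores PHI nodes and any other refinements the real solver applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified statement LHS = OP0 <code> OP1; a NULL operand is a constant.  */
struct stmt
{
  const char *lhs, *op0, *op1;
};

/* Return the statement in the block defining NAME, or NULL if NAME is
   defined outside the block.  */
static const struct stmt *find_def(const struct stmt *stmts, size_t n,
                                   const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(stmts[i].lhs, name) == 0)
      return &stmts[i];
  return NULL;
}

static bool in_set(const char **set, size_t count, const char *name)
{
  for (size_t i = 0; i < count; i++)
    if (strcmp(set[i], name) == 0)
      return true;
  return false;
}

/* Follow NAME through definitions local to the block; names defined
   outside it are recorded (once each) as imports.  */
static void collect_imports(const struct stmt *stmts, size_t n,
                            const char *name, const char **imports,
                            size_t *count, size_t max)
{
  if (name == NULL)
    return;
  const struct stmt *def = find_def(stmts, n, name);
  if (def)
    {
      collect_imports(stmts, n, def->op0, imports, count, max);
      collect_imports(stmts, n, def->op1, imports, count, max);
    }
  else if (*count < max && !in_set(imports, *count, name))
    imports[(*count)++] = name;
}
```

Encoding the three assignments of bb 2 and starting from _3 yields exactly b_10 and block_11. The constant -1 in _2 = b_10 != -1 is modelled as a NULL operand.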
Calculating these imports was initially done in the threader, since it
was the only user of the path solver. With new clients, it has become
obvious that populating the imports should be a task for the path
solver, so it can be shared among the clients.
This patch moves the import code to the solver, making both the solver
and the threader simpler in the process, because the intent is
clearer and some duplicate code was removed.
This reshuffling had the net effect of giving us a handful of new
threads through my suite of .ii files (125). This was unexpected, but
welcome nevertheless. There is no performance difference in callgrind
over the same suite.
Regstrapped on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::add_copies_to_imports):
Rename to...
(path_range_query::compute_imports): ...this. Adapt it so it can
be passed the imports bitmap instead of working on m_imports.
(path_range_query::compute_ranges): Call compute_imports in all
cases unless an imports bitmap is passed.
* gimple-range-path.h (path_range_query::compute_imports): New.
(path_range_query::add_copies_to_imports): Remove.
* tree-ssa-threadbackward.c (back_threader::resolve_def): Remove.
(back_threader::find_paths_to_names): Inline resolve_def.
(back_threader::find_paths): Call compute_imports.
(back_threader::resolve_phi): Adjust comment.
The following splits loop header copying into an analysis phase
that uses ranger and a transform phase that can do without it, to avoid
running ranger on IL whose SSA form has not been updated.
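The shape of the split can be sketched in C. The analysis phase records its decisions while the analysis resource is live. That resource is released before the transform phase, which acts only on the recorded decisions. `struct analyzer`, `struct loop` and the `copied` flag are hypothetical stand-ins for ranger, GCC's loop structure and header duplication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ranger: only valid while the IL is unchanged.  */
struct analyzer { bool valid; };

struct loop { int id; bool header_is_copyable; bool copied; };

/* Phase 1: decide which loops to process while the analyzer is live.
   Returns the number of candidates written to CANDIDATES.  */
static size_t analyze(struct analyzer *a, struct loop *loops, size_t n,
                      struct loop **candidates)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert(a->valid);   /* analysis never sees modified IL */
      if (loops[i].header_is_copyable)
        candidates[count++] = &loops[i];
    }
  return count;
}

/* Phase 2: transform using only the recorded decisions.  */
static void transform(struct loop **candidates, size_t count)
{
  for (size_t i = 0; i < count; i++)
    candidates[i]->copied = true;   /* stands in for header duplication */
}

static size_t copy_headers(struct loop *loops, size_t n,
                           struct loop **candidates)
{
  struct analyzer a = { true };
  size_t count = analyze(&a, loops, n, candidates);
  a.valid = false;                  /* released before the IL changes */
  transform(candidates, count);
  return count;
}
```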
2021-11-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/103188
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p):
Remove query parameter, split out check for size
optimization.
(ch_base::m_ranger, ch_base::m_query): Remove.
(ch_base::copy_headers): Split processing loop into
analysis around which we allocate and use ranger and
transform where we do not.
(pass_ch::execute): Do not allocate/free ranger here.
(pass_ch_vect::execute): Likewise.
* gcc.dg/torture/pr103188.c: New testcase.
We mark self-recursive functions as looping for fear of endless recursion.
This is done correctly for local pure/const and for non-trivial SCCs in
the callgraph, but for trivial SCCs we miss the flag.
I think it is a bad decision, since infinite recursion will run out of stack,
but changing it upsets some testcases and should be done independently.
So this patch just makes the current behaviour consistent.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-pure-const.c (propagate_pure_const): Self recursion is
a side effect.
Fix ipa-pure-const handling of noreturn flags. It is not safe to set the flag
for interposable symbols, and we should also set it for aliases (just like we
do for other flags). This patch simply mirrors the existing handling of the
other flags.
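One plausible shape for this, sketched in C, applies the flag to a node and recursively to its aliases, and only sets it on symbols that cannot be interposed. The node layout is hypothetical. The sketch also assumes that clearing the flag is always safe, so clearing ignores interposability. That is an assumption, not necessarily what set_noreturn_flag_1 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified symbol node.  */
struct node
{
  bool interposable;       /* may be replaced at link or load time */
  bool noreturn;
  struct node **aliases;   /* NULL-terminated list of aliases, or NULL */
};

/* Apply VALUE to NODE and, recursively, to its aliases.  Setting the flag
   is skipped for interposable symbols; clearing it is always allowed.  */
static void set_noreturn_flag(struct node *node, bool value)
{
  if (!value || !node->interposable)
    node->noreturn = value;
  if (node->aliases)
    for (struct node **a = node->aliases; *a; a++)
      set_noreturn_flag(*a, value);
}
```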
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* cgraph.c (set_noreturn_flag_1): New function.
(cgraph_node::set_noreturn_flag): New member function.
* cgraph.h (cgraph_node::set_noreturn_flag): Declare.
* ipa-pure-const.c (pass_local_pure_const::execute): Use it.
When thinking about GOMP_teams3, I realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our current simplistic implementation of host teams.
There are two problems. One is when host teams is used from multiple
threads created with pthread_create: the spec says that host teams can't be
nested inside an explicit parallel or other teams construct, but it
obviously says nothing about pthread_create. The other, more important,
problem is host fallback. Right now we don't do anything for
omp_get_num_teams() or omp_get_team_num(), which was fine before host teams
was introduced and before the 5.1 requirement that the num_teams clause
specifies the minimum number of teams. With the global vars, however,
inside target teams num_teams (2) we happily return
omp_get_num_teams() == 4 if the target teams is inside host teams
with num_teams(4). With target fallback being invoked from parallel
regions, global vars simply can't work right on the host.
So, this patch moves them to struct gomp_thread and propagates them from a
parallel to its child threads. For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0. That is fine for target teams without a num_teams
clause; target teams with a num_teams clause still needs work; and for
target without teams nested in it, I've asked on omp-lang what should
be done.
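The per-thread scheme can be modelled in C. The structure keeps the two members in the same encoding the ChangeLog describes (omp_get_num_teams returns num_teams + 1), so a zeroed thread reports one team with team number 0. `struct thread_state`, `get_num_teams`, `get_team_num` and `start_child` are hypothetical stand-ins for struct gomp_thread, the omp_* queries and gomp_team_start.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-thread state modelled on the members added to
   struct gomp_thread.  NUM_TEAMS is stored minus one, so a zeroed
   structure describes a single team.  */
struct thread_state
{
  unsigned num_teams;   /* number of teams - 1 */
  unsigned team_num;    /* index of the current team */
};

static unsigned get_num_teams(const struct thread_state *thr)
{
  return thr->num_teams + 1;
}

static unsigned get_team_num(const struct thread_state *thr)
{
  return thr->team_num;
}

/* When a parallel region starts, each child thread inherits the teams
   information of the thread that created it.  */
static void start_child(const struct thread_state *parent,
                        struct thread_state *child)
{
  child->num_teams = parent->num_teams;
  child->team_num = parent->team_num;
}
```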
2021-11-11 Jakub Jelinek <jakub@redhat.com>
* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
* team.c (struct gomp_thread_start_data): Likewise.
(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
(gomp_team_start): Initialize start_data->num_teams and
start_data->team_num. Update nthr->num_teams and nthr->team_num.
* teams.c (gomp_num_teams, gomp_team_num): Remove.
(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
instead of gomp_num_teams and gomp_team_num.
(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
* testsuite/libgomp.c/teams-4.c: New test.
There is a known failure for gfortran.dg/vector_subscript_1.f90. It
was previously failing for all optimization levels except -Os.
Now that loop header copying is done right, it fails for all
levels :-).
Tested on x86-64 Linux.
Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
* tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve
statically to the edge remaining in the loop.
For integer vector division we only checked for all zero vector
constants rather than checking whether any element in the constant
vector is zero.
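The difference between the two checks is easy to see in C. The plain int arrays stand in for VECTOR_CST divisors, and both function names are hypothetical. A divisor such as {1, 0, 3, 4} is not all zero, yet dividing by it can still trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old, insufficient check: only an all-zero divisor counted as trapping.  */
static bool all_elements_zero(const int *elts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (elts[i] != 0)
      return false;
  return true;
}

/* Corrected check: integer division may trap if any element is zero.  */
static bool any_element_zero(const int *elts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (elts[i] == 0)
      return true;
  return false;
}
```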
2021-11-11 Richard Biener <rguenther@suse.de>
PR middle-end/103181
* tree-eh.c (operation_could_trap_helper_p): Properly
check vector constants for a zero element for integer
division. Separate floating point and integer division code.
* gcc.dg/torture/pr103181.c: New testcase.
For PCC_BITFIELD_TYPE_MATTERS, field_byte_offset has had fairly large code
to deal with it for many years (see it e.g. in GCC 3.2, although it
used to operate on HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently can't cope with members of empty class
type with the [[no_unique_address]] attribute, because the empty classes have
non-zero type size but zero decl size, so the computation can end up
with a negative offset or an offset 1 byte smaller than it should be.
For !PCC_BITFIELD_TYPE_MATTERS, we just use
tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.
stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.
As discussed, this patch uses a DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes that are
multiples of BITS_PER_UNIT, and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;
int
main ()
{
  s.c = 0x55;
  s.d = 0xaaaa;
  t.c = 0x55;
  t.d = 0xaaaa;
  s.e++;
}
has different debug info with a DECL_BIT_FIELD check.
2021-11-11 Jakub Jelinek <jakub@redhat.com>
PR debug/101378
* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
handling only for DECL_BIT_FIELD_TYPE decls.
* g++.dg/debug/dwarf2/pr101378.C: New test.
gcc/ChangeLog:
PR target/102376
* config/aarch64/aarch64.c (aarch64_process_target_attr): Check if
token is arch extension without leading '+' and emit appropriate
diagnostic for the same.
gcc/testsuite/ChangeLog:
PR target/102376
* gcc.target/aarch64/pr102376.c: New test.
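The check described in the aarch64.c entry above can be sketched in C as a token classifier. A token starting with '+' is an extension, while a bare token that names a known extension is missing its '+' and would be diagnosed. The extension list, the enum and `classify_token` are illustrative assumptions. They are not GCC's real tables or diagnostic code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of extension names; not GCC's real table.  */
static const char *const known_extensions[] = { "crc", "simd", "sve", NULL };

enum token_kind { TOKEN_EXTENSION, TOKEN_MISSING_PLUS, TOKEN_OTHER };

/* Classify a target attribute token: "+sve" is an extension, while a bare
   "sve" names an extension but lacks the leading '+'.  */
static enum token_kind classify_token(const char *token)
{
  if (token[0] == '+')
    return TOKEN_EXTENSION;
  for (size_t i = 0; known_extensions[i]; i++)
    if (strcmp(token, known_extensions[i]) == 0)
      return TOKEN_MISSING_PLUS;   /* caller would emit a diagnostic */
  return TOKEN_OTHER;
}
```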
In OpenMP 5.1, the num_teams clause can accept either one expression, as
before, but with a changed meaning: rather than creating <= expression teams,
it now creates == expression teams. Or it accepts two expressions
separated by :, with the meaning that the first is the lower bound and the
second the upper bound on how many teams should be created. The other ways
to set the number of teams are upper bounds with a lower bound of 1.
The following patch implements parsing of this for C/C++. For host teams, we
actually don't need to do anything further right now: we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that though, and further
work will be needed.
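The resulting bounds can be summarized in a small C sketch. A single num_teams expression now acts as both bounds, the two-expression form supplies both explicitly, and any other way of setting the number of teams yields a lower bound of 1. `struct teams_bounds` and the two helper functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical normalized bounds on the number of teams.  */
struct teams_bounds { unsigned lower, upper; };

/* num_teams (N) means exactly N teams; num_teams (L : U) gives both.  */
static struct teams_bounds num_teams_clause(bool has_lower, unsigned lower,
                                            unsigned upper)
{
  struct teams_bounds b = { has_lower ? lower : upper, upper };
  return b;
}

/* Other ways of setting the number of teams only give an upper bound.  */
static struct teams_bounds upper_bound_only(unsigned upper)
{
  struct teams_bounds b = { 1, upper };
  return b;
}
```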
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");
void
foo (void)
{
  S a, b;
  #pragma omp target private (a) map (b)
  {
    #pragma omp teams firstprivate (b) num_teams (baz ())
    {
      bar (a, b);
    }
  }
}
into:
retval.0 = baz ();
retval.1 = retval.0;
{
  unsigned int retval.3;
  struct S * D.2549;
  struct S b;
  retval.3 = (unsigned int) retval.1;
  D.2549 = .omp_data_i->b;
  S::S (&b, D.2549);
  #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
  __builtin_GOMP_teams (retval.3, 0);
  {
    bar (&a, &b);
  }
  S::~S (&b);
  #pragma omp return(nowait)
}
IMHO we want a new API, say GOMP_teams3, which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams, and thread_limit)
and will return a bool saying whether it should run the teams body or not.
And we should add, right before the outermost {} above,
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call. The current function performs
the equivalent of exit (at least on NVPTX), which seems bad because that
means the destructors of e.g. private variables on target aren't invoked,
and, with the current placement, neither are the destructors of the already
constructed privatized variables in teams.
I'll do the GOMP_teams3 change next on the compiler side, but I'm afraid
I'll need help with the nvptx and amdgcn implementations. E.g. for nvptx,
we won't be able to use %ctaid.x. I think the ideal would be to use a
.shared integer variable for the omp_get_team_num value, but I don't have
any experience with that: are .shared variables zero-initialized by default,
or do they have a random value at start? The PTX docs say they can't be
initialized.
2021-11-11 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
* tree.c (omp_clause_num_ops): Increase num ops for
OMP_CLAUSE_NUM_TEAMS to 2.
* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
for OMP_CLAUSE_NUM_TEAMS.
* gimplify.c (gimplify_scan_omp_clauses): Gimplify
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
of OMP_CLAUSE_NUM_TEAMS_EXPR. Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS, also evaluate the
lower-bound expression before combined target teams.
gcc/cp/
* parser.c (cp_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS, also evaluate the
lower-bound expression before combined target teams.
* semantics.c (finish_omp_clauses): Handle
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
* pt.c (tsubst_omp_clauses): Likewise.
(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS, also evaluate the
lower-bound expression before combined target teams.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use
OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
to half of the num_teams clauses.
* c-c++-common/gomp/num-teams-1.c: New test.
* c-c++-common/gomp/num-teams-2.c: New test.
* g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression
to half of the num_teams clauses.
* g++.dg/gomp/attrs-2.C (bar): Likewise.
* g++.dg/gomp/num-teams-1.C: New test.
* g++.dg/gomp/num-teams-2.C: New test.
libgomp/
* testsuite/libgomp.c-c++-common/teams-1.c: New test.
This makes the control dependence computation avoid a find_edge call
and optimizes allocation by embedding the bitmap heads into the
vector of control dependences instead of allocating each of them separately.
It also uses a local bitmap obstack.
The bitmap changes make it necessary to shuffle some includes.
2021-11-10 Richard Biener <rguenther@suse.de>
* cfganal.h (control_dependences::control_dependence_map):
Embed bitmap_head.
(control_dependences::m_bitmaps): New.
* cfganal.c (control_dependences::set_control_dependence_map_bit):
Adjust.
(control_dependences::clear_control_dependence_bitmap):
Likewise.
(control_dependences::find_control_dependence): Do not
find_edge for the abnormal edge test.
(control_dependences::control_dependences): Instead do not
add abnormal edges to the edge list. Adjust.
(control_dependences::~control_dependences): Likewise.
(control_dependences::get_edges_dependent_on): Likewise.
* function-tests.c: Include bitmap.h.
gcc/analyzer/
* supergraph.cc: Include bitmap.h.
gcc/c/
* gimple-parser.c: Shuffle bitmap.h include.