Commit Graph

192209 Commits

Author SHA1 Message Date
Alexandre Oliva
b8c4171ebd hardcmp: split before dispatch edge
If we harden a compare at the end of a block with an edge to the
abnormal dispatch block, it won't have a single successor.  Arrange to
split the block at its final stmt so as to have a single succ.


for  gcc/ChangeLog

	PR middle-end/104975
	* gimple-harden-conditionals.cc
	(pass_harden_compares::execute): Force split in case of
	multiple edges.

for  gcc/testsuite/ChangeLog

	PR middle-end/104975
	* gcc.dg/pr104975.c: New.
2022-03-24 10:02:27 -03:00
Tom de Vries
11fb784ac5 [libatomic] Fix return value in libat_test_and_set
On nvptx (using a Quadro K2000 with driver 470.103.01) I ran into this:
...
FAIL: gcc.dg/atomic/stdatomic-flag-2.c -O1 execution test
...
which minimized to:
...
  #include <stdatomic.h>
  atomic_flag a = ATOMIC_FLAG_INIT;
  int main () {
    if ((atomic_flag_test_and_set) (&a))
      __builtin_abort ();
    return 0;
  }
...

The atomic_flag_test_and_set is implemented using __atomic_test_and_set_1,
which corresponds to the "word-sized compare-and-swap loop" version of
libat_test_and_set in libatomic/tas_n.c.

The semantics of a test-and-set is that the return value is "true if and only
if the previous contents were 'set'".

But the code uses:
...
  return woldval != 0;
...
which means it doesn't look only at the byte that was either set or not set,
but at the entire word.

Fix this by using instead:
...
  return (woldval & ((UTYPE) ~(UTYPE) 0 << shift)) != 0;
...
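
As a minimal sketch of the difference between the two return expressions (not the libatomic source): the helper names are hypothetical, and UTYPE and the word type are assumed to be uint8_t and uint32_t here, with the mask widened to the word type before shifting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the types used in tas_n.c; the real code
   uses macros, here they are fixed to 8 and 32 bits for illustration.  */
typedef uint8_t flag_t;
typedef uint32_t word_t;

/* Old expression: any set bit anywhere in the word reports "was set".  */
bool
was_set_whole_word (word_t woldval)
{
  return woldval != 0;
}

/* Fixed expression: only the byte at bit offset SHIFT is examined.  */
bool
was_set_flag_byte (word_t woldval, unsigned shift)
{
  assert (shift <= 24 && shift % 8 == 0);
  word_t mask = (word_t) (flag_t) ~(flag_t) 0 << shift;
  return (woldval & mask) != 0;
}
```

With the flag in the lowest byte and a neighbouring byte nonzero (for example a word value of 0x100), the whole-word check reports "set" while the masked check reports "not set".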

Tested on nvptx.

libatomic/ChangeLog:

2022-03-24  Tom de Vries  <tdevries@suse.de>

	PR target/105011
	* tas_n.c (libat_test_and_set): Fix return value.
2022-03-24 13:30:57 +01:00
Jakub Jelinek
568377743e testsuite: Add compat.exp testcase for most common zero width bitfld ABI passing [PR102024]
On Tue, Mar 22, 2022 at 05:51:58PM +0100, Jakub Jelinek via Gcc wrote:
> I guess it would be nice to include the testcases we are talking about,
> like { float x; int : 0; float y; } and { float x; int : 0; } and
> { int : 0; float x; } into compat.exp testsuite so that we see ABI
> differences in compat testing.

Here is a patch that does that.  It uses the struct-layout-1* framework,
but isn't generated because we don't want in this case pseudo-random
structure layouts, but particular ones we know cause or could cause problems
on some targets.  If other problematic cases are discovered, we can add
further ones.
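
For illustration only, the three layouts quoted above can be written out as follows.  The struct and function names are hypothetical and are not taken from the new testcases.  Here both sides are compiled by one compiler, so the sketch only shows the shapes involved, not the ABI mismatch itself.

```c
#include <assert.h>

/* The three layouts named in the message; all names are hypothetical.  */
struct zw_mid  { float x; int : 0; float y; };
struct zw_tail { float x; int : 0; };
struct zw_head { int : 0; float x; };

/* The zero-width bit-field occupies no storage of its own, so both
   floats must still fit in the struct.  */
static_assert (sizeof (struct zw_mid) >= 2 * sizeof (float),
               "both float members are present");

/* Pass each struct by value, which is where the ABI question arises.  */
float sum_mid (struct zw_mid s)   { return s.x + s.y; }
float get_tail (struct zw_tail s) { return s.x; }
float get_head (struct zw_head s) { return s.x; }
```

In compat testing the caller and the callee of such functions are built by different compilers, which is what exposes a change in how these structs are passed.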

Tested on x86_64-linux with:
make check-gcc check-g++ RUNTESTFLAGS='ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++ compat.exp=pr102*'
and with
make check-gcc check-g++ RUNTESTFLAGS='compat.exp=pr102*'
The former as expected has:
FAIL: gcc.dg/compat/pr102024 c_compat_x_tst.o-c_compat_y_alt.o execute
FAIL: gcc.dg/compat/pr102024 c_compat_x_alt.o-c_compat_y_tst.o execute
fails because on x86_64 we've changed the C ABI but kept the C++ ABI here.
E.g. on rs6000 it should be the g++.dg tests that fail (all assuming
the alt gcc/g++ is GCC 4.5 through 11).

2022-03-24  Jakub Jelinek  <jakub@redhat.com>

	PR target/102024
	* gcc.dg/compat/pr102024_main.c: New test.
	* gcc.dg/compat/pr102024_test.h: New test.
	* gcc.dg/compat/pr102024_x.c: New test.
	* gcc.dg/compat/pr102024_y.c: New test.
	* g++.dg/compat/pr102024_main.C: New test.
	* g++.dg/compat/pr102024_test.h: New test.
	* g++.dg/compat/pr102024_x.C: New test.
	* g++.dg/compat/pr102024_y.C: New test.
2022-03-24 12:25:15 +01:00
Jakub Jelinek
8698ff67cd fold-const: Handle C++ dependent COMPONENT_REFs in operand_equal_p [PR105035]
As mentioned in the PR, operand_equal_p already contains some hacks so that
it can be called already on pre-instantiation C++ trees from templates,
but the recent change to compare DECL_FIELD_OFFSET in the COMPONENT_REF
case broke this.  Many such COMPONENT_REFs are already punted on earlier
because they have NULL TREE_TYPE, but in this case the code knows what
type they have but still uses an IDENTIFIER_NODE as second operand
of COMPONENT_REF (I think SCOPE_REF is something that could be used too).

The following patch looks at those DECL_FIELD_*OFFSET fields only if
both field[01] args are FIELD_DECLs and otherwise keeps it to the
earlier OP_SAME (1) check that guards this whole block.

2022-03-24  Jakub Jelinek  <jakub@redhat.com>

	PR c++/105035
	* fold-const.cc (operand_equal_p) <case COMPONENT_REF>: If either
	field0 or field1 is not a FIELD_DECL, return false.

	* g++.dg/warn/Wduplicated-cond2.C: New test.
2022-03-24 12:23:51 +01:00
Pascal Obry
d937c6e44b Properly reset the port handle when closing
When the serial port is closed, we need to ensure that the port handle is
properly reset for it to be detected as closed.

gcc/ada/
	PR ada/104767
	* libgnat/g-sercom__mingw.adb (Close): Reset port handle to -1.
	* libgnat/g-sercom__linux.adb (Close): Likewise.
2022-03-24 11:31:50 +01:00
Richard Biener
85b4d88132 Fix memory leaks
When changing the predcom pass to use auto_vec leaks were introduced by
failing to replace deallocation with C++ delete.  The following does
this.  It also fixes leaks in vectorization and range folding.

2022-03-24  Richard Biener  <rguenther@suse.de>

	* tree-predcom.cc (chain::chain): Add CTOR.
	(component::component): Likewise.
	(pcom_worker::release_chain): Use delete.
	(release_components): Likewise.
	(pcom_worker::filter_suitable_components): Likewise.
	(pcom_worker::split_data_refs_to_components): Use new.
	(make_invariant_chain): Likewise.
	(make_rooted_chain): Likewise.
	(pcom_worker::combine_chains): Likewise.
	* tree-vect-loop.cc (vect_create_epilog_for_reduction):
	Make sure to release previously constructed scalar_results.
	* tree-vect-stmts.cc (vectorizable_load): Use auto_vec
	for vec_offsets.
	* vr-values.cc (simplify_using_ranges::~simplify_using_ranges):
	Release m_flag_set_edges.
2022-03-24 11:16:10 +01:00
Siddhesh Poyarekar
c1d233e333 tree-optimization/104970: Limit size computation for access attribute
Limit object size computation only to the simple case where access
attribute has been explicitly specified.  The object passed to
__builtin_dynamic_object_size could either be a pointer or a VLA whose
size has been described using access attribute.

Further, return a valid size only if the object is a void * pointer or
points to (or is a VLA of) a type that has a constant size.

gcc/ChangeLog:

	PR tree-optimization/104970
	* tree-object-size.cc (parm_object_size): Restrict size
	computation scenarios to explicit access attributes.

gcc/testsuite/ChangeLog:

	PR tree-optimization/104970
	* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple2,
	test_parmsz_simple3, test_parmsz_extern, test_parmsz_internal,
	test_parmsz_internal2, test_parmsz_internal3): New tests.
	(main): Use them.

Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2022-03-24 15:04:31 +05:30
Jakub Jelinek
72124f487c c++: extern thread_local declarations in constexpr [PR104994]
C++14 to C++20 apparently should allow extern thread_local declarations in
constexpr functions, however useless they are there (because accessing
such vars is not valid in a constant expression, except perhaps in sizeof/decltype).
P2242 changed that for C++23 to passing through declaration but
https://cplusplus.github.io/CWG/issues/2552.html
was filed for it yesterday.

The following patch implements the proposed wording of CWG 2552 in addition
to fixing the C++14 - C++20 handling bug.
If you'd like instead to keep the current pedantic C++23 wording for now,
that would mean taking out the first hunk (cxx_eval_constant_expression) and
g++.dg/cpp23/constexpr-nonlit2.C hunk.

2022-03-24  Jakub Jelinek  <jakub@redhat.com>

	PR c++/104994
	* constexpr.cc (cxx_eval_constant_expression): Don't diagnose passing
	through extern thread_local declarations.  Change wording from
	declaration to definition.
	(potential_constant_expression_1): Don't diagnose extern thread_local
	declarations.  Change wording from declared to defined.
	* decl.cc (start_decl): Likewise.

	* g++.dg/diagnostic/constexpr1.C: Change expected diagnostic wording
	from declared to defined.
	* g++.dg/cpp23/constexpr-nonlit1.C: Likewise.
	(garply): Change dg-error into dg-bogus.
	* g++.dg/cpp23/constexpr-nonlit2.C: Change expected diagnostic wording
	from declaration to definition.
	* g++.dg/cpp23/constexpr-nonlit6.C: Change expected diagnostic wording
	from declared to defined.
	* g++.dg/cpp23/constexpr-nonlit7.C: New test.
	* g++.dg/cpp2a/constexpr-try5.C: Change expected diagnostic wording
	from declared to defined.
	* g++.dg/cpp2a/consteval3.C: Likewise.
2022-03-24 10:12:25 +01:00
Kewen Lin
497bde3ab9 rs6000: Skip overload instances with NULL fntype [PR104967]
For an overloaded built-in function instance that requires
a data type which isn't defined on the target, the fntype
is initialized as NULL.  This patch handles
this possibility in function find_instance, as shown in
PR104967.
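
A minimal sketch of the skip, assuming a simplified table of instances: the struct layout, field names and function name below are hypothetical and do not match the rs6000 data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real rs6000 types differ.  */
struct fn_type
{
  int arg_kind;
};

struct instance
{
  const char *name;
  const struct fn_type *fntype;   /* NULL if a needed type is missing.  */
};

/* Return the first instance whose fntype has ARG_KIND == WANTED,
   skipping instances whose fntype is NULL instead of dereferencing it.  */
const struct instance *
find_usable_instance (const struct instance *tab, size_t n, int wanted)
{
  assert (tab != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (tab[i].fntype == NULL)
        continue;               /* Type unavailable on this target.  */
      if (tab[i].fntype->arg_kind == wanted)
        return &tab[i];
    }
  return NULL;
}
```

Without the NULL check, the comparison would dereference a null pointer for the first entry that lacks a function type.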

	PR target/104967

gcc/ChangeLog:

	* config/rs6000/rs6000-c.cc (find_instance): Skip instances with null
	function types.
2022-03-23 20:47:14 -05:00
GCC Administrator
d1ca63a1b7 Daily bump.
2022-03-24 00:16:44 +00:00
David Malcolm
4cebae0924 analyzer: fix accessing wrong stack frame on interprocedural return [PR104979]
PR analyzer/104979 reports a leak false positive when handling an
interprocedural return to a caller:

  LHS = CALL(ARGS);

where the LHS is a certain non-trivial compound expression.

The root cause is that parts of the LHS were being erroneously
evaluated with respect to the stack frame of the called function,
rather than that of the caller.  When LHS contained a local variable
within the caller as part of certain nested expressions, this local
variable was looked for within the called frame, rather than that of the
caller.  This lookup in the wrong stack frame led to the local variable
being treated as uninitialized, and thus the write to LHS was considered
as writing to a garbage location, leading to the return value being
lost, and thus being considered as a leak.

The region_model code uses the analyzer's path_var class to try to
extend the tree type with stack depth information.  Based on the above,
I think that the path_var class is fundamentally broken, but it's used
in a few other places in the analyzer, so I don't want to rip it out
until the next stage 1.

In the meantime, this patch reworks how region_model::pop_frame works so
that the destination region for an interprocedural return value is
computed after the frame is popped, so that the region_model has the
stack frame for the *caller* at that point.  Doing so fixes the issue.

I attempted a more ambitious fix which moved the storing of the return
svalue into the destination region from region_model::pop_region into
region_model::update_for_return_gcall, with pop_frame returning the
return svalue.  Unfortunately, this regressed g++.dg/analyzer/pr93212.C,
which returns a pointer into a stale frame.
unbind_region_and_descendents and poison_any_pointers_to_descendents are
only set up to poison regions with bindings into the stale frame, not
individual svalues, and updating that became more invasive than I'm
comfortable with in stage 4.

The patch also adds assertions to verify that we have the correct
function when looking up locals/SSA names in a stack frame.  There
doesn't seem to be a general-purpose way to get at the function of an
SSA name, so the assertions go from SSA name to def-stmt to basic_block,
and from there use the analyzer's supergraph to get the function from
the basic_block.  If there's a simpler way to do this, please let me know.

gcc/analyzer/ChangeLog:
	PR analyzer/104979
	* engine.cc (impl_run_checkers): Create the engine after the
	supergraph, and pass the supergraph to the engine.
	* region-model.cc (region_model::get_lvalue_1): Pass ctxt to
	frame_region::get_region_for_local.
	(region_model::update_for_return_gcall): Pass the lvalue for the
	result to pop_frame as a tree, rather than as a region.
	(region_model::pop_frame): Update for above change, determining
	the destination region after the frame is popped and thus with
	respect to the caller frame rather than the called frame.
	Likewise, set the value of the region to the return value after
	the frame is popped.
	(engine::engine): Add supergraph pointer.
	(selftest::test_stack_frames): Set the DECL_CONTEXT of PARM_DECLs.
	(selftest::test_get_representative_path_var): Likewise.
	(selftest::test_state_merging): Likewise.
	* region-model.h (region_model::pop_frame): Convert first param
	from a const region * to a tree.
	(engine::engine): Add param "sg".
	(engine::m_sg): New field.
	* region.cc: Include "analyzer/sm.h" and
	"analyzer/program-state.h".
	(frame_region::get_region_for_local): Add "ctxt" param.
	Add assertions that VAR_DECLs are locals, and that expr is for the
	correct function.
	* region.h (frame_region::get_region_for_local): Add "ctxt" param.

gcc/testsuite/ChangeLog:
	PR analyzer/104979
	* gcc.dg/analyzer/boxed-malloc-1-29.c: Deleted test, moving the
	now fixed test_29 to...
	* gcc.dg/analyzer/boxed-malloc-1.c: ...here.
	* gcc.dg/analyzer/stale-frame-1.c: Add test coverage.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-03-23 17:40:29 -04:00
Jason Merrill
2cd0c9a531 c++: tweak PR103337 fix
Patrick suggested a way to implement the designated-init handling without
(temporarily) modifying the CONSTRUCTOR being reshaped.

	PR c++/103337

gcc/cp/ChangeLog:

	* decl.cc (reshape_single_init): New.
	(reshape_init_class): Use it.
2022-03-23 15:35:50 -04:00
Jason Merrill
e8cd3edc0f c++: tweak PR105006 fix
Checking dependent_type_p avoids needing to walk the overloads in cases
where it would not be possible to find a dependent using.

	PR c++/105006

gcc/cp/ChangeLog:

	* name-lookup.cc (lookup_using_decl): Check that scope is
	a dependent type before looking for dependent using.
2022-03-23 13:17:35 -04:00
Tobias Burnus
5e33fea219 Fortran: Fix directory stat check for '.' [PR103560]
MinGW does not like a call to 'stat' for './' via gfc_do_check_include_dir.
Solution: Only append '/' when concatenating the path with the filename.
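
As a rough sketch of the idea (the helper name and signature are hypothetical, not the scanner.cc code): the directory is stored without a trailing '/', and the separator is added only when the file name is attached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write DIR + '/' + FILE into BUF of SIZE bytes.
   Returns 0 on success, -1 if the result does not fit.  */
int
join_include_path (char *buf, size_t size, const char *dir, const char *file)
{
  assert (buf != NULL && dir != NULL && file != NULL);
  size_t dl = strlen (dir);
  size_t fl = strlen (file);
  if (dl + 1 + fl + 1 > size)
    return -1;
  memcpy (buf, dir, dl);
  buf[dl] = '/';                    /* Separator added only here.  */
  memcpy (buf + dl + 1, file, fl + 1);  /* Copies the terminating NUL.  */
  return 0;
}
```

The stored directory stays as plain "." so that a directory check sees "." rather than "./", while opening a file still uses "./name".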

gcc/fortran/ChangeLog:

	PR fortran/103560
	* scanner.cc (add_path_to_list): Don't append '/' to the
	save include path.
	(open_included_file): Use '/' in concatenating path + file name.
	* module.cc (gzopen_included_file_1): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/103560
	* gfortran.dg/include_14.f90: Update dg-warning.
	* gfortran.dg/include_17.f90: Likewise.
	* gfortran.dg/include_18.f90: Likewise.
	* gfortran.dg/include_6.f90: Update dg-*.
2022-03-23 16:54:12 +01:00
Richard Biener
d9792f8d22 target/102125 - alternative memcpy folding improvement
The following extends the heuristic memcpy folding path with the
ability to use misaligned accesses on strict-alignment targets just
like the size-based path does.  That avoids regressing the following
testcase on arm

    uint64_t bar64(const uint8_t *rData1)
    {
        uint64_t buffer;
        memcpy(&buffer, rData1, sizeof(buffer));
        return buffer;
    }

when r12-3482-g5f6a6c91d7c592 is reverted.

2022-03-23  Richard Biener  <rguenther@suse.de>

	PR target/102125
	* gimple-fold.cc (gimple_fold_builtin_memory_op): Allow the
	use of movmisalign when either the source or destination
	decl is properly aligned.
2022-03-23 15:57:33 +01:00
Richard Biener
1daa198aaf rtl-optimization/105028 - fix compile-time hog in form_threads_from_copies
form_threads_from_copies processes a sorted array of copies, skipping
those with the same thread and conflicting threads and merging the
first non-conflicting ones.  After that it terminates the loop and
gathers the remaining elements of the array, skipping same thread
copies, re-starting the process.  For a large number of copies this
gathering of the rest takes considerable time and it also appears
pointless.  The following simply continues processing the array
which should be equivalent as far as I can see.

This takes form_threads_from_copies off the profile radar from
previously taking ~50% of the compile-time.

2022-03-23  Richard Biener  <rguenther@suse.de>

	PR rtl-optimization/105028
	* ira-color.cc (form_threads_from_copies): Remove unnecessary
	copying of the sorted_copies tail.
2022-03-23 15:57:33 +01:00
Jason Merrill
a3f78748fa c++: using from enclosing class template [PR105006]
Here, DECL_DEPENDENT_P was false for the second using because Row<eT> is
"the current instantiation", so lookup succeeds.  But since Row itself has a
dependent using-decl for operator(), the set of functions imported by the
second using is dependent, so we should set the flag.

	PR c++/105006

gcc/cp/ChangeLog:

	* name-lookup.cc (lookup_using_decl): Set DECL_DEPENDENT_P if lookup
	finds a dependent using.

gcc/testsuite/ChangeLog:

	* g++.dg/template/using30.C: New test.
2022-03-23 08:56:17 -04:00
David Malcolm
e6a3991ea1 analyzer: use tainted_allocation_size::m_mem_space [PR105017]
gcc/analyzer/ChangeLog:
	PR analyzer/105017
	* sm-taint.cc (taint_diagnostic::subclass_equal_p): Check
	m_has_bounds as well as m_arg.
	(tainted_allocation_size::subclass_equal_p): Chain up to base
	class implementation.  Also check m_mem_space.
	(tainted_allocation_size::emit): Add note showing stack-based vs
	heap-based allocations.

gcc/testsuite/ChangeLog:
	PR analyzer/105017
	* gcc.dg/analyzer/taint-alloc-1.c: Add expected messages relating
	to heap vs stack.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-03-23 08:37:06 -04:00
David Malcolm
160b095fc9 analyzer: fix ICE adding note to disabled diagnostic [PR104997]
gcc/analyzer/ChangeLog:
	PR analyzer/104997
	* diagnostic-manager.cc (diagnostic_manager::add_diagnostic):
	Convert return type from "void" to "bool", reporting success vs
	failure to caller, for both overloads.
	* diagnostic-manager.h (diagnostic_manager::add_diagnostic):
	Likewise.
	* engine.cc (impl_region_model_context::warn): Propagate return
	value from diagnostic_manager::add_diagnostic.

gcc/testsuite/ChangeLog:
	PR analyzer/104997
	* gcc.dg/analyzer/write-to-string-literal-4-disabled.c: New test,
	adapted from write-to-string-literal-4.c.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-03-23 08:34:49 -04:00
Jonathan Wakely
4894d69a1f libstdc++: Add missing constraints to std::bit_cast [PR105027]
Our std::bit_cast was relying on the compiler to check for errors inside
__builtin_bit_cast, instead of checking them as constraints. That means
std::bit_cast was not SFINAE-friendly.

This fix uses a requires-clause, so for old versions of Clang without
concepts support the function will still be unconstrained. At some point
in future we can remove the #ifdef __cpp_concepts check and rely on all
compilers having full concepts support in C++20 mode.

libstdc++-v3/ChangeLog:

	PR libstdc++/105027
	* include/std/bit (bit_cast): Add constraints.
	* testsuite/26_numerics/bit/bit.cast/105027.cc: New test.
2022-03-23 12:17:16 +00:00
Martin Liska
018805e279 rs6000: Adjust error messages.
gcc/ChangeLog:

	* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
	Use %qs in format.
	* config/rs6000/rs6000.cc (rs6000_option_override_internal):
	Reword the error message.
2022-03-23 12:50:27 +01:00
Jonathan Wakely
bd81327b03 libstdc++: Fix feature test macros in <version> for freestanding
Some C++17 and C++20 feature test macros are only defined in <version>
for hosted builds, even though the features are supported for
freestanding.

All C++23 feature test macros are defined in <version> for freestanding,
but most of the features are only supported for hosted.

libstdc++-v3/ChangeLog:

	* include/std/version [!_GLIBCXX_HOSTED]
	(__cpp_lib_hardware_interference_size): Define for freestanding.
	(__cpp_lib_bit_cast): Likewise.
	(__cpp_lib_is_layout_compatible): Likewise.
	(__cpp_lib_is_pointer_interconvertible): Likewise.
	(__cpp_lib_adaptor_iterator_pair_constructor): Do not define for
	freestanding.
	(__cpp_lib_invoke_r): Likewise.
	(__cpp_lib_ios_noreplace): Likewise.
	(__cpp_lib_monadic_optional): Likewise.
	(__cpp_lib_move_only_function): Likewise.
	(__cpp_lib_spanstream): Likewise.
	(__cpp_lib_stacktrace): Likewise.
	(__cpp_lib_string_contains): Likewise.
	(__cpp_lib_string_resize_and_overwrite): Likewise.
	(__cpp_lib_to_underlying): Likewise.
2022-03-23 10:03:03 +00:00
Jonathan Wakely
5bf59b0048 libstdc++: Disable atomic wait for freestanding [PR105021]
We use either condition variables or futexes to implement atomic waits,
so we can't do it in freestanding. This is non-conforming, so should be
revisited later, probably by making freestanding atomic waiting
operations spin without ever blocking.
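
A spinning wait of that kind might look roughly like the following C11 sketch.  The function name is hypothetical and this is not the libstdc++ implementation, only an illustration of waiting without blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: spin until *OBJ no longer holds OLD, then return
   the value that was observed.  It never blocks in the OS.  */
int
spin_wait_while_equal (atomic_int *obj, int old)
{
  assert (obj != NULL);
  int cur;
  while ((cur = atomic_load_explicit (obj, memory_order_acquire)) == old)
    ;   /* Busy-wait until another thread stores a different value.  */
  return cur;
}
```

If the object already holds a value other than OLD, the call returns at once; otherwise it keeps polling until some other thread changes the value.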

Reviewed-by: Thomas Rodgers <trodgers@redhat.com>

libstdc++-v3/ChangeLog:

	PR libstdc++/105021
	* include/bits/atomic_base.h [!_GLIBCXX_HOSTED]: Do not include
	<bits/atomic_wait.h> for freestanding.
2022-03-23 10:03:03 +00:00
Jakub Jelinek
4a9e92164a testsuite: Fix up sse2-v1ti-shift-3.c test [PR102986]
This test is dg-do run and invokes UB when these rotate functions
are called with 0 as second argument.  There are some other tests
that do this but they are dg-do compile only and do not call those
functions at all, so IMHO it doesn't matter that they are only well
defined for [1,127] and not [0,127].

The following patch fixes it; we pattern recognize both forms as rotates
and emit identical assembly.
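
The same idiom can be sketched on 64-bit values, as an analogue of the 128-bit helpers in the test (the function names are hypothetical).  With "64 - i" the second shift count would be 64 for i == 0, which is undefined; "-i & 63" gives 0 in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by I bits, well defined for every I in [0,63].  */
uint64_t
rotr64 (uint64_t x, unsigned i)
{
  assert (i < 64);
  return (x >> i) | (x << (-i & 63));
}

/* Rotate left by I bits, well defined for every I in [0,63].  */
uint64_t
rotl64 (uint64_t x, unsigned i)
{
  assert (i < 64);
  return (x << i) | (x >> (-i & 63));
}
```

For i in [1,63] the expression -i & 63 equals 64 - i, so the two forms agree everywhere the older form was defined.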

2022-03-23  Jakub Jelinek  <jakub@redhat.com>

	PR target/102986
	* gcc.target/i386/sse2-v1ti-shift-3.c (rotr_v1ti, rotl_v1ti, rotr_ti,
	rotl_ti): Use -i&127 instead of 128-i to avoid UB on i == 0.
2022-03-23 10:29:37 +01:00
Tobias Burnus
1002a7ace1 LTO: Fixes for renaming issues with offload/OpenMP [PR104285]
gcc/lto/ChangeLog:

	PR middle-end/104285
	* lto-partition.cc (maybe_rewrite_identifier): Use get_identifier
	for the returned string to be usable as hash key.
	(validize_symbol_for_target): Hence, use return value directly.
	(privatize_symbol_name_1): Track maybe_rewrite_identifier renames.
	* lto.cc (offload_handle_link_vars): Move function up before ...
	(do_whole_program_analysis): Call it after static renamings.
	(lto_main): Move call after static renamings.

libgomp/ChangeLog:

	PR middle-end/104285
	* testsuite/libgomp.c++/target-same-name-2-a.C: New test.
	* testsuite/libgomp.c++/target-same-name-2-b.C: New test.
	* testsuite/libgomp.c++/target-same-name-2.C: New test.
	* testsuite/libgomp.c-c++-common/target-same-name-1-a.c: New test.
	* testsuite/libgomp.c-c++-common/target-same-name-1-b.c: New test.
	* testsuite/libgomp.c-c++-common/target-same-name-1.c: New test.
2022-03-23 09:44:39 +01:00
liuhongt
8fa7216ae0 Fix ICE caused by NULL_RTX returned by lowpart_subreg.
In validate_subreg, both (subreg:V2HF (reg:SI) 0)
and (subreg:V8HF (reg:V2HF) 0) are valid, but not
for (subreg:V8HF (reg:SI) 0) which causes ICE.

Ideally it should be handled in validate_subreg to support
subreg for all modes available in TARGET_CAN_CHANGE_MODE_CLASS, but
that would be too risky in stage4, so the patch is a workaround in the
backend to force_reg operands before lowpart_subreg for expanders or
pre_reload splitters.

gcc/ChangeLog:

	PR target/104976
	* config/i386/sse.md (ssePSmodelower): New.
	(*avx_cmp<mode>3_ltint_not): Force_reg operand before
	lowpart_subreg to avoid NULL_RTX.
	(<avx512>_fmaddc_<mode>_mask1<round_expand_name>,
	<avx512>_fcmaddc_<mode>_mask1<round_expand_name>,
	fma_<mode>_fmaddc_bcst, fma_<mode>_fcmaddc_bcst,
	<avx512>_<complexopname>_<mode>_mask<round_name>,
	avx512fp16_fcmaddcsh_v8hf_mask1<round_expand_name>,
	avx512fp16_fcmaddcsh_v8hf_mask3<round_expand_name>,
	avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>,
	avx512fp16_fmaddcsh_v8hf_mask3<round_expand_name>,
	float<floatunssuffix><mode>v4hf2,
	float<floatunssuffix>v2div2hf2,
	fix<fixunssuffix>_truncv4hf<mode>2,
	fix<fixunssuffix>_truncv2hfv2di2, extendv4hf<mode>2,
	extendv2hfv2df2,
	trunc<mode>v4hf2,truncv2dfv2hf2,
	*avx512bw_permvar_truncv16siv16hi_1,
	*avx512bw_permvar_truncv16siv16hi_1_hf,
	*avx512f_permvar_truncv8siv8hi_1,
	*avx512f_permvar_truncv8siv8hi_1_hf,
	*avx512f_vpermvar_truncv8div8si_1,
	*avx512f_permvar_truncv32hiv32qi_1,
	*avx512f_permvar_truncv16hiv16qi_1,
	*avx512f_permvar_truncv4div4si_1,
	*avx512f_pshufb_truncv8hiv8qi_1,
	*avx512f_pshufb_truncv4siv4hi_1,
	*avx512f_pshufd_truncv2div2si_1,
	sdot_prod<mode>, avx2_pblend<ssemodesuffix>_1,
	ashrv2di3,ashrv2di3,usdot_prod<mode>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr104976.c: New test.
	* gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: Scan either
	vblendps or masked vmovaps.
	* gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Ditto.
	* gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Ditto.
2022-03-23 15:57:39 +08:00
GCC Administrator
a2287813b1 Daily bump.
2022-03-23 00:16:45 +00:00
Hans-Peter Nilsson
5d2233f403 libstdc++-v3 testsuite: Call fesetround(FE_DOWNWARD) only if defined
Without this, for a typical soft-float target such as cris-elf, after
commit r12-7676-g5a4e208022e704 you'll see, in libstdc++.log:
...
FAIL: 20_util/from_chars/6.cc (test for excess errors)
Excess errors:
/home/hp/tmp/auto0321/gcc/libstdc++-v3/testsuite/20_util/from_chars/6.cc:33: error: 'FE_DOWNWARD' was not declared in this scope

UNRESOLVED: 20_util/from_chars/6.cc compilation failed to produce executable
...

It appears to be a side-effect of that commit changing the
way __cpp_lib_to_chars is defined.  (On the bright side,
./7.cc now passes since that commit.)

TFM, specifically fenv(3), says that "Each of the macros
FE_DIVBYZERO, FE_INEXACT, FE_INVALID, FE_OVERFLOW,
FE_UNDERFLOW is defined when the implementation supports
handling of the corresponding exception".

A git-grep shows that this was the only place using a FE_ macro
unconditionally.

libstdc++-v3:
	* testsuite/20_util/from_chars/6.cc (test01) [FE_DOWNWARD]:
	Conditionalize call to fesetround.
2022-03-22 22:53:02 +01:00
Marek Polacek
4b7d9f8f51 c: -Wmissing-field-initializers and designated inits [PR82283, PR84685]
This patch fixes two kinds of wrong -Wmissing-field-initializers
warnings.  Our docs say that this warning "does not warn about designated
initializers", but we give a warning for

1) the array case:

  struct S {
    struct N {
      int a;
      int b;
    } c[1];
  } d = {
    .c[0].a = 1,
    .c[0].b = 1, // missing initializer for field 'b' of 'struct N'
  };

we warn because push_init_level, when constructing an array, clears
constructor_designated (which the warning relies on), and we forget
that we were in a designated initializer context.  Fixed by the
push_init_level hunk; and

2) the compound literal case:

  struct T {
    int a;
    int *b;
    int c;
  };

  struct T t = { .b = (int[]){1} }; // missing initializer for field 'c' of 'struct T'

where set_designator properly sets constructor_designated to 1, but the
compound literal causes us to create a whole new initializer_stack in
start_init, which clears constructor_designated.  Then, after we've parsed
the compound literal, finish_init flushes the initializer_stack entry,
but doesn't restore constructor_designated, so we forget we were in
a designated initializer context, which causes the bogus warning.  (The
designated flag is also tracked in constructor_stack, but in this case,
we didn't perform push_init_level between set_designator and start_init
so it wasn't saved anywhere.)

	PR c/82283
	PR c/84685

gcc/c/ChangeLog:

	* c-typeck.cc (struct initializer_stack): Add 'designated' member.
	(start_init): Set it.
	(finish_init): Restore constructor_designated.
	(push_init_level): Set constructor_designated to the value of
	constructor_designated in the upper constructor_stack.

gcc/testsuite/ChangeLog:

	* gcc.dg/Wmissing-field-initializers-1.c: New test.
	* gcc.dg/Wmissing-field-initializers-2.c: New test.
	* gcc.dg/Wmissing-field-initializers-3.c: New test.
	* gcc.dg/Wmissing-field-initializers-4.c: New test.
	* gcc.dg/Wmissing-field-initializers-5.c: New test.
2022-03-22 16:40:40 -04:00
Harald Anlauf
774ab2edcb Fortran: ensure initialization of stride array
gcc/fortran/ChangeLog:

	PR fortran/104999
	* simplify.cc (gfc_simplify_cshift): Ensure temporary holding
	source array stride is initialized.
2022-03-22 20:54:18 +01:00
Jakub Jelinek
b6e33907eb testsuite: Add testcase for already fixed PR [PR102489]
This got broken with r12-3529 and fixed with r12-5255.

2022-03-22  Jakub Jelinek  <jakub@redhat.com>

	PR c++/102489
	* g++.dg/coroutines/pr102489.C: New test.
2022-03-22 15:37:20 +01:00
Tom de Vries
24ee43194a [nvptx] Use '%' as register prefix
The percentage sign as first character of a ptx identifier can be used to
avoid name conflicts, e.g., between user-defined variable names and
compiler-generated names.

The insn nvptx_uniform_warp_check contains register names without '%' prefix,
which potentially could lead to name conflicts with user-defined variable
names.

Fix this by adding a '%' prefix, more specifically a '%r_' prefix to avoid a
name conflict with ptx special registers.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-20  Tom de Vries  <tdevries@suse.de>

	PR target/104925
	* config/nvptx/nvptx.md (define_insn "nvptx_uniform_warp_check"):
	Use % as register prefix.
2022-03-22 14:40:04 +01:00
Tom de Vries
b57358cc71 [nvptx] Limit HFmode support to mexperimental
With PR104489 still open and end-of-stage-4 approaching, classify HFmode
support as experimental, which is not enabled by default but can be enabled
using -mexperimental.

This fixes the nvptx build when the default sm_xx is set to sm_53 or higher.

Note that we're not using -mfp16 or some such, because that might create
expectations about being able to switch support on or off in the future, and
at this point it's not clear why, once reaching non-experimental status, it
shouldn't always be enabled.

gcc/ChangeLog:

2022-03-19  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.cc (nvptx_scalar_mode_supported_p)
	(nvptx_libgcc_floating_mode_supported_p): Only enable HFmode for
	mexperimental.

gcc/testsuite/ChangeLog:

2022-03-19  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/float16-1.c: Add additional-options -mexperimental.
	* gcc.target/nvptx/float16-2.c: Same.
	* gcc.target/nvptx/float16-3.c: Same.
	* gcc.target/nvptx/float16-4.c: Same.
	* gcc.target/nvptx/float16-5.c: Same.
	* gcc.target/nvptx/float16-6.c: Same.
2022-03-22 14:35:35 +01:00
Tom de Vries
a4baa0d3c5 [nvptx] Add mexperimental
Add new option -mexperimental.

This allows developing a new feature on trunk without disturbing trunk,
rather than developing it to completion in a development branch.

The equivalent of the feature branch merge then becomes making the
functionality available for -mno-experimental.

If more features are developed at the same time, we can do something like
-mexperimental=feature1,feature2 but for now that's not necessary.

For now, the option has no effect.

gcc/ChangeLog:

2022-03-19  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.opt (mexperimental): New option.
2022-03-22 14:35:35 +01:00
Tom de Vries
f8b15e1771 [nvptx] Use .alias directive for mptx >= 6.3
Starting with ptx isa version 6.3, a ptx directive .alias is available.
Use this directive to support symbol aliases, as far as possible.

The alias support is off by default.  It can be turned on using a switch
-malias.

Furthermore, for pre-sm_75, it's not effective unless the ptx version is
bumped to 6.3 or higher using -mptx (given that the default for pre-sm_75 is
6.0).

The alias support has the following limitations.

Only function aliases are supported.

Weak aliases are not supported.  That is, if I disable the check in
nvptx_asm_output_def_from_decls that disallows this, a weak alias is emitted
and parsed by the driver.  But the test gcc.dg/globalalias.c starts failing,
with the behaviour matching the comment about "weird behavior of AIX's .set
pseudo-op": a weak alias may resolve to different functions in different
files.

Aliases to weak symbols are not supported (see gcc.dg/localalias.c).  This is
currently not prohibited by the compiler, but with the driver link we run
into: "error: Function test with .weak scope cannot be aliased".

Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
This is currently not prohibited by the compiler, but with the driver link we
run into: "Internal error: alias to unknown symbol".

Unreferenced aliases are not emitted (these can occur f.i. when inlining a
call to an alias).  This avoids driver link error "Internal error: reference
to deleted section".

When enabling malias by default, libgomp detects alias support and
consequently libgomp.a will contain a few uses of .alias.  This however
results in aforementioned "Internal error: reference to deleted section" in
many test-cases.  Either there's some error with how .alias is used, or
there's a driver bug.  While this issue is not resolved, we keep malias
off-by-default.

At some point we may add support in the nvptx-tools linker for symbol
aliases, and define f.i. malias=ptx and malias=ld to choose between the two in
the compiler.

An example of where this support is useful, is the OvO (OpenMP vs Offload)
testsuite.  The testsuite passes already at -O2.  But at -O0, there are errors
in some c++ test-cases due to missing symbol alias support.  By compiling with
-malias, the whole testsuite passes also at -O0.

This patch causes a regression:
...
-PASS: gcc.dg/pr60797.c  (test for errors, line 4)
+FAIL: gcc.dg/pr60797.c  (test for errors, line 4)
...
The test-case is skipped for effective target alias, and both without and with
this patch the nvptx target is considered to not support it, so the test-case is
executed.  The test-case expects an error message along the lines of "alias
definitions not supported in this configuration", but instead we run into:
...
gcc.dg/pr60797.c:4:12: error: foo aliased to undefined symbol
...
This is probably due to the fact that the nvptx backend now defines macros
ASM_OUTPUT_DEF and ASM_OUTPUT_DEF_FROM_DECLS, so from the point of view of the
common part of the compiler, aliases are supported.

gcc/ChangeLog:

2022-03-18  Tom de Vries  <tdevries@suse.de>

	PR target/104957
	* config/nvptx/nvptx-protos.h (nvptx_asm_output_def_from_decls): Declare.
	* config/nvptx/nvptx.cc (write_fn_proto_1): Don't add function marker
	for alias.
	(SET_ASM_OP, NVPTX_ASM_OUTPUT_DEF): New macro def.
	(nvptx_asm_output_def_from_decls): New function.
	* config/nvptx/nvptx.h (ASM_OUTPUT_DEF): New macro def, define to
	gcc_unreachable ().
	(ASM_OUTPUT_DEF_FROM_DECLS): New macro def, define to
	nvptx_asm_output_def_from_decls.
	* config/nvptx/nvptx.opt (malias): New opt.

gcc/testsuite/ChangeLog:

2022-03-18  Tom de Vries  <tdevries@suse.de>

	PR target/104957
	* gcc.target/nvptx/alias-1.c: New test.
	* gcc.target/nvptx/alias-2.c: New test.
	* gcc.target/nvptx/alias-3.c: New test.
	* gcc.target/nvptx/alias-4.c: New test.
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_runtime_ptx_isa_version_6_3): New proc.
2022-03-22 14:35:34 +01:00
Tom de Vries
a624388b95 [nvptx] Add warp sync at simt exit
Consider this code (with N defined to 1024):
...
  float v = 0.0;
  #pragma omp target map(tofrom: v)
  #pragma omp parallel for simd
  for (int i = 0 ; i < N; i++)
    {
      #pragma omp atomic update
      v = v + 1.0;
    }
...

It hangs when executing on target board unix/-foffload=-misa=sm_75, using
drivers 470.103.01 and 510.54 on a T400 board (sm_75).

I'm tentatively identifying the problem as a bug in -muniform-simt for
architectures that support Independent Thread Scheduling (sm_70 and later).

The problem -muniform-simt is trying to address is to make sure that a
register produced outside an openmp simd region is available when used in any
lane inside an simd region.

The solution is to, outside an simd region, execute in all warp lanes, thus
producing consistent values in result registers in each warp thread.

This approach doesn't work when executing in all warp lanes multiplies the
side effects from 1 to 32 separate side effects, which is the case for atomic
insns.  So atomic insns are rewritten to execute only in lane 0, and if
there are any results, those are propagated to the other threads in the warp.
[ And likewise for system calls malloc, free, vprintf. ]
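
The difference between the two execution strategies can be sketched with a
host-side simulation in plain C.  This is only a model under stated
assumptions: the warp is simulated by a loop over 32 lanes, and the names
atomic_inc_all_lanes and atomic_inc_lane0 are hypothetical, not functions of
the nvptx backend.

```c
#define WARP_SIZE 32

/* Every lane performs the update: one source-level atomic turns into
   WARP_SIZE separate side effects on *mem.  */
void
atomic_inc_all_lanes (int *mem, int old[WARP_SIZE])
{
  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      old[lane] = *mem;
      *mem += 1;
    }
}

/* Only lane 0 performs the update; its result is then copied to the
   other lanes so that all lanes hold the same value.  */
void
atomic_inc_lane0 (int *mem, int old[WARP_SIZE])
{
  int result = *mem;		/* Done by lane 0 only.  */
  *mem += 1;
  for (int lane = 0; lane < WARP_SIZE; lane++)
    old[lane] = result;		/* Propagate to the whole warp.  */
}
```

Starting from a zeroed location, the first variant leaves 32 in memory and a
different old value in each lane, while the second leaves 1 in memory and the
same old value in all lanes.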

Now, consider a non-atomic update: ld, add, store.  The store has side
effects; are those multiplied or not?

Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
control flow has reconverged, and we have a uniform warp, executing in lock
step.  So:
- the load will load the same value into the result register across the warp,
- the add will write the same value into the result register across the warp,
- the store will write the same value to the same memory location, 32 times,
  at once, having the result of a single store.
So, no side-effect multiplication (well, at least that's the observation).

Starting sm_70, the threads in a warp are no longer guaranteed to reconverge
after divergence.  There's a "Convergence Optimizer" that can identify
that it is safe for a warp to reconverge, but that works only as long as the
code does not contain "synchronizing operations".

Consequently, the ld, add, store sequence can be executed by a non-uniform
warp, which means the side effects can have multiplied, and the registers are
no longer guaranteed to be in sync.
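
The two cases for the ld, add, store sequence can be modelled on the host in
plain C.  This is a sketch under assumptions: the diverged case is modelled as
the worst case where each lane runs the whole sequence on its own, one lane
after the other, whereas the actual interleaving on hardware is not specified
here.  The function names are hypothetical.

```c
#define WARP_SIZE 32

/* Uniform warp in lock step: all lanes load, then all lanes add, then
   all lanes store the same value to the same location.  */
int
update_lock_step (int v)
{
  int reg[WARP_SIZE];
  for (int lane = 0; lane < WARP_SIZE; lane++)
    reg[lane] = v;		/* ld */
  for (int lane = 0; lane < WARP_SIZE; lane++)
    reg[lane] += 1;		/* add */
  for (int lane = 0; lane < WARP_SIZE; lane++)
    v = reg[lane];		/* st */
  return v;
}

/* Non-uniform warp, modelled as fully serialized lanes: each lane sees
   the store of the previous lane, so the side effect is multiplied.  */
int
update_diverged (int v)
{
  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      int reg = v;		/* ld */
      reg += 1;			/* add */
      v = reg;			/* st */
    }
  return v;
}
```

In this model the lock-step variant increments the value once, while the
serialized variant increments it once per lane.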

The atomic update in the example above is translated using an atom.cas loop,
which means that we have divergence (because only one thread is allowed to
succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably
because the atom.cas counts as a "synchronizing operation".  So, it seems
plausible that the root cause for the mentioned hang is the problem described
above.

Fix this by adding an explicit warp sync at simt exit.

Note that we're assuming here that the warp will stay uniform until the next
SIMT region entry.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-09  Tom de Vries  <tdevries@suse.de>

	PR target/104916
	PR target/104783
	* config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
	sync (or uniform warp check for mptx < 6.0).

libgomp/ChangeLog:

2022-03-15  Tom de Vries  <tdevries@suse.de>

	PR target/104916
	PR target/104783
	* testsuite/libgomp.c/pr104783-2.c: New test.
2022-03-22 14:35:34 +01:00
Richard Biener
08f263e703 tree-optimization/105012 - fix ICE from local DSE of if-conversion
The following guards dse_classify_store with the same condition as
the DSE pass does - availability of a virtual definition.  For
the PR we run into the fortran frontend generating a clobber for
a FUNCTION_DECL lhs which is ignored by the operand scanner and has
no virtual operands assigned.  Apart from fixing the frontend the
following fixes the ICE by adjusting if-conversion.

2022-03-22  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/105012
	* tree-if-conv.cc (ifcvt_local_dce): Only call
	dse_classify_store when we have a VDEF.
2022-03-22 14:05:18 +01:00
Martin Liska
ef0e11ac88 nvptx: fix wrapping in an error message.
PR target/104902

gcc/ChangeLog:

	* config/nvptx/nvptx.cc (handle_ptx_version_option):
	Fix option wrapping in an error message.
2022-03-22 13:53:05 +01:00
Martin Liska
a47b1599f2 rs6000: wrap const in an error message.
PR target/104903

gcc/ChangeLog:

	* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
	Wrap const keyword.
2022-03-22 13:50:20 +01:00
Martin Liska
c1ba4e5b88 v850: fix typo in pragma name
PR target/104904

gcc/ChangeLog:

	* config/v850/v850-c.cc (pop_data_area): Fix typo in pragma
	name.
2022-03-22 13:42:37 +01:00
Martin Liska
d85a84c76b rs6000: update error message format.
PR target/104898

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_option_override_internal):
	Use %qs instead of (%qs).
2022-03-22 13:39:42 +01:00
Martin Liska
1c2a3aeee7 i386: update error message format.
Use '%qs' instead of '(%qs)'.

	PR target/104898

gcc/ChangeLog:

	* config/i386/i386-options.cc (ix86_option_override_internal):
	  Use '%qs' instead of '(%qs)'.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr99753.c: Update test.
	* gcc.target/i386/spellcheck-options-1.c: Likewise.
	* gcc.target/i386/spellcheck-options-2.c: Likewise.
	* gcc.target/i386/spellcheck-options-4.c: Likewise.
2022-03-22 13:39:31 +01:00
Martin Liska
c6e75a4a35 aarch64: update error message format.
Use '%qs' and remove usage '(%qs)'.

	PR target/104898

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_handle_attr_arch):
	Use '%qs' and remove usage '(%qs)'.
	(aarch64_handle_attr_cpu): Likewise.
	(aarch64_handle_attr_tune): Likewise.
	(aarch64_handle_attr_isa_flags): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/branch-protection-attr.c:
	Use '%qs' and remove usage '(%qs)'.
	* gcc.target/aarch64/spellcheck_1.c: Likewise.
	* gcc.target/aarch64/spellcheck_2.c: Likewise.
	* gcc.target/aarch64/spellcheck_3.c: Likewise.
2022-03-22 13:27:16 +01:00
Andre Vieira
930eb8b6c2 aarch64: Update regmove costs for neoverse-v1 and neoverse-512tvb tunings
This patch updates the register move tunings for
-mcpu/-mtune={neoverse-v1,neoverse-512tvb}.

gcc/ChangeLog:
2022-03-22  Tamar Christina  <tamar.christina@arm.com>
	    Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* config/aarch64/aarch64.cc (neoversev1_regmove_cost): New tuning
	struct.
	(neoversev1_tunings): Use neoversev1_regmove_cost and update store_int
	cost.
	(neoverse512tvb_tunings): Likewise.
2022-03-22 12:01:59 +00:00
Andre Vieira
27d8748df5 aarch64: Add Demeter tuning structs
This patch adds tuning structs for -mcpu/-mtune=demeter.

gcc/ChangeLog:

2022-03-22  Tamar Christina  <tamar.christina@arm.com>
	    Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* config/aarch64/aarch64.cc (demeter_addrcost_table,
	demeter_regmove_cost, demeter_advsimd_vector_cost,
	demeter_sve_vector_cost, demeter_scalar_issue_info,
	demeter_advsimd_issue_info, demeter_sve_issue_info,
	demeter_vec_issue_info, demeter_vector_cost,
	demeter_tunings): New tuning structs.
	(aarch64_vec_op_count::rename_cycles_per_iter): Enable for demeter
	tuning.
	* config/aarch64/aarch64-cores.def: Add entry for demeter.
	* config/aarch64/aarch64-tune.md (tune): Add demeter to list.
2022-03-22 12:00:52 +00:00
Andre Vieira
b074fa6970 aarch64: Update reg-costs to differentiate between memmove costs
This patch introduces a struct to differentiate between different memmove costs
to enable a better modeling of memory operations. These have been modelled for
-mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb; for all other
tunings all entries are equal to the old single memmove cost to ensure the
behaviour remains the same.
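
The idea can be sketched as follows.  This is a simplified illustration, not
the code of the patch: the struct name, the field names other than store_int,
the selection function and all numbers are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-kind memory move cost struct.  */
struct memmov_cost_sketch
{
  int load_int;
  int store_int;
  int load_fp;
  int store_fp;
};

/* A tuning that keeps the old behaviour: every entry equals the former
   single memmove cost (4 is a made-up value).  */
const struct memmov_cost_sketch generic_costs_sketch = { 4, 4, 4, 4 };

/* A tuning that models loads and stores separately (made-up values).  */
const struct memmov_cost_sketch tuned_costs_sketch = { 4, 2, 6, 3 };

/* Pick the cost for one memory move from the kind of register and the
   direction of the move.  */
int
memory_move_cost_sketch (const struct memmov_cost_sketch *costs,
			 bool is_fp, bool is_load)
{
  if (is_fp)
    return is_load ? costs->load_fp : costs->store_fp;
  return is_load ? costs->load_int : costs->store_int;
}
```

With all entries equal, the result does not depend on the kind or direction
of the move, which is how the old behaviour is preserved.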

2022-03-16  Tamar Christina  <tamar.christina@arm.com>
	    Andre Vieira  <andre.simoesdiasvieira@arm.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New struct.
	(struct tune_params): Change type of memmov_cost to use cpu_memmov_cost.
	* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update all
	tunings to use cpu_memmov_cost struct.
2022-03-22 11:59:21 +00:00
Andre Vieira
a850930164 aarch64: Add Neoverse-N2 tuning structs
This patch adds tuning structures for Neoverse N2.

2022-03-22  Tamar Christina  <tamar.christina@arm.com>
	    Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* config/aarch64/aarch64.cc (neoversen2_addrcost_table,
	neoversen2_regmove_cost, neoversen2_advsimd_vector_cost,
	neoversen2_sve_vector_cost, neoversen2_scalar_issue_info,
	neoversen2_advsimd_issue_info, neoversen2_sve_issue_info,
	neoversen2_vec_issue_info, neoversen2_tunings): New structs.
	(neoversen2_tunings): Use new structs and update tuning flags.
	(aarch64_vec_op_count::rename_cycles_per_iter): Enable for neoversen2
	tuning.
2022-03-22 11:56:43 +00:00
Andre Vieira
0bae246acc aarch64: Enable FP16 feature by default for Armv9
This patch adds the feature bit for FP16 to the feature set for Armv9 since
Armv9 requires SVE to be implemented and SVE requires FP16 to be implemented.

2022-03-22  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH9): Add FP16 feature
	bit.
2022-03-22 11:47:01 +00:00
Jakub Jelinek
6ee5892638 lto-plugin: Use GNU ld or Solaris ld version script in preference to -export-symbols-regex [PR102426]
As reported, libtool -export-symbols-regex doesn't work on Solaris
when using GNU ld instead of Sun ld; libtool just always assumes Sun ld.
As I'm unsure what the maintenance status of libtool is right now,
this patch solves it on the lto-plugin side instead: it tests at configure
time, in a similar way to how libssp and other target libraries test for
symbol versioning (except omitting the symbol version because we just want
one GLOBAL symbol and the rest of them LOCAL), and uses the current way of
-export-symbols-regex onload as a fallback when this doesn't work.

2022-03-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/102426
lto-plugin/
	* configure.ac (LTO_PLUGIN_USE_SYMVER, LTO_PLUGIN_USE_SYMVER_GNU,
	LTO_PLUGIN_USE_SYMVER_SUN): New test for symbol versioning support.
	* Makefile.am (version_arg, version_dep): Set conditionally based
	on LTO_PLUGIN_USE_SYMVER*.
	(liblto_plugin_la_LDFLAGS): Use $(version_arg) instead of
	-export-symbols-regex onload.
	(liblto_plugin_la_DEPENDENCIES): Depend on $(version_dep).
	* lto-plugin.map: New file.
	* configure: Regenerated.
	* Makefile.in: Regenerated.
2022-03-22 11:03:54 +01:00
liuhongt
919fbffef0 Extend splitter pattern to reversed condition by swapping then and else rtx. [PR target/104982]
Failed to match this instruction:
(set (reg/v:SI 88 [ z ])
    (if_then_else:SI (eq (zero_extract:SI (reg:SI 92)
                (const_int 1 [0x1])
                (zero_extend:SI (subreg:QI (reg:SI 93) 0)))
            (const_int 0 [0]))
        (reg:SI 95)
        (reg:SI 94)))

but it's equal to

(set (reg/v:SI 88 [ z ])
    (if_then_else:SI (ne (zero_extract:SI (reg:SI 92)
                (const_int 1 [0x1])
                (zero_extend:SI (subreg:QI (reg:SI 93) 0)))
            (const_int 0 [0]))
        (reg:SI 94)
        (reg:SI 95)))

which is the exact existing splitter.
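
The equivalence of the two forms can be illustrated in plain C.  This is only
a sketch: the function names are hypothetical, a stands for reg 95 and b for
reg 94, and the bit position is masked with 31 so that the shift is well
defined in C, which is an assumption of this example rather than something
stated by the RTL.

```c
/* eq form: select a when the extracted bit is zero, else b.  */
int
select_eq (unsigned int x, unsigned int pos, int a, int b)
{
  return ((x >> (pos & 31)) & 1) == 0 ? a : b;
}

/* ne form: the condition is reversed and the two arms are swapped.  */
int
select_ne (unsigned int x, unsigned int pos, int a, int b)
{
  return ((x >> (pos & 31)) & 1) != 0 ? b : a;
}
```

Both functions return the same value for all inputs, since reversing the
condition and swapping the then and else operands cancel out.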

The patch fixes the regressions below:

On x86-64, r12-7687 caused:

FAIL: gcc.target/i386/bt-5.c scan-assembler-not sar[lq][ \t]
FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7

gcc/ChangeLog:

	PR target/104982
	* config/i386/i386.md (*jcc_bt<mode>_mask): Extend the
	following splitter to reversed condition.
2022-03-22 16:31:34 +08:00