Commit Graph

193966 Commits

Author SHA1 Message Date
Nathan Sidwell
e6d369bbdb c++: Add a late-writing step for modules
To add a module initializer optimization, we need to defer finishing writing
out the module file until the end of determining the dynamic initializers.
This is achieved by passing some saved-state from the main module writing
to a new function that completes it.

This patch merely adds the skeleton of that state and moves things around,
allowing the finalization of the ELF file to be postponed.  None of the
contents writing is moved, or the init optimization added.

	gcc/cp/
	* cp-tree.h (fini_modules): Add some parameters.
	(finish_module_processing): Return an opaque pointer.
	* decl2.cc (c_parse_final_cleanups): Propagate a cookie from
	finish_module_processing to fini_modules.
	* module.cc (struct module_processing_cookie): New.
	(finish_module_processing): Return a heap-allocated cookie.
	(late_finish_module): New.  Finish out the module writing.
	(fini_modules): Adjust.
2022-06-10 12:32:22 -07:00
Jakub Jelinek
1eff4872d2 openmp: Call dlopen with "libmemkind.so.0" rather than "libmemkind.so"
On Thu, Jun 09, 2022 at 12:11:28PM +0200, Thomas Schwinge wrote:
> > This patch adds support for dlopening libmemkind.so
>
> Instead of 'dlopen'ing literally 'libmemkind.so':
> ..., shouldn't this instead 'dlopen' 'libmemkind.so.0'?  At least for
> Debian/Ubuntu, the latter ('libmemkind.so.0') is shipped in the "library"
> package:

I agree and I've actually noticed it too right before committing, but I thought
I'd investigate and tweak incrementally because "libmemkind.so"
is what I've actually tested (it is what llvm libomp uses).

Here is the now tested incremental fix.
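
For illustration, a minimal sketch of the versioned-SONAME lookup (not the libgomp code itself; names are hypothetical):

#include <dlfcn.h>

static void *
try_load_memkind (void)
{
  /* The versioned name is what distro runtime packages ship, so prefer it.  */
  void *handle = dlopen ("libmemkind.so.0", RTLD_LAZY);
  /* On success, dlsym the required memkind_* entry points from HANDLE.  */
  return handle;
}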

2022-06-10  Jakub Jelinek  <jakub@redhat.com>

	* allocator.c (gomp_init_memkind): Call dlopen with "libmemkind.so.0"
	rather than "libmemkind.so".
2022-06-10 21:19:51 +02:00
Nathan Sidwell
c08ba00487 c++: Adjust module initializer calling emission
We special-case emitting the calls of module initializer functions.  It's
simpler to just emit a static fn to do that, and add it onto the front of
the global init fn chain.  We can also move the calculation of the set of
initializers to call to the point of use.

	gcc/cp/
	* cp-tree.h (module_has_import_init): Rename to ...
	(module_determined_import_inits): ... here.
	* decl2.cc (start_objects): Do not handle module initializers
	here.
	(c_parse_final_cleanups): Generate a separate module
	initializer calling function and add it to the list.  Shrink
	the c-lang region.
	* module.cc (num_init_calls_needed): Delete.
	 (module_has_import_init): Rename to ...
	(module_determined_import_inits): ... here. Do the
	calculation here ...
	(finish_module_processing): ... rather than here.
	(module_add_import_initializers): Reformat.

	gcc/testsuite/
	* g++.dg/modules/init-3_a.C: New.
	* g++.dg/modules/init-3_b.C: New.
	* g++.dg/modules/init-3_c.C: New.
2022-06-10 09:27:40 -07:00
Thomas Schwinge
1459b55d24 libgomp nvptx plugin: Remove '--with-cuda-driver=[...]' etc. configuration option
That means exposing to the user only the '--without-cuda-driver' behavior:
including the GCC-shipped 'include/cuda/cuda.h' (not system <cuda.h>), and
'dlopen'ing the CUDA Driver library (not linking it).

For development purposes, the libgomp nvptx plugin developer may still manually
override that, to get the previous '--with-cuda-driver' behavior.

	libgomp/
	* plugin/Makefrag.am: Evaluate 'if PLUGIN_NVPTX_DYNAMIC' to true.
	* plugin/configfrag.ac (--with-cuda-driver)
	(--with-cuda-driver-include, --with-cuda-driver-lib)
	(CUDA_DRIVER_INCLUDE, CUDA_DRIVER_LIB, PLUGIN_NVPTX_CPPFLAGS)
	(PLUGIN_NVPTX_LDFLAGS, PLUGIN_NVPTX_LIBS, PLUGIN_NVPTX_DYNAMIC):
	Remove.
	* testsuite/libgomp-test-support.exp.in (cuda_driver_include)
	(cuda_driver_lib): Remove.
	* testsuite/lib/libgomp.exp (libgomp_init): Don't consider these.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-06-10 17:08:57 +02:00
Jonathan Wakely
671970a562 libstdc++: Make std::lcm and std::gcd detect overflow [PR105844]
When I fixed PR libstdc++/92978 I introduced a regression whereby
std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce
errors during constant evaluation. Those calls are undefined, because
they violate the preconditions that |m| and the result can be
represented in the return type (which is int in both those cases). The
regression occurred because __absu<unsigned>(INT_MIN) is well-formed,
due to the explicit casts to unsigned in that new helper function, and
the out-of-range multiplication is well-formed, because unsigned
arithmetic wraps instead of overflowing.

To fix 92978 I made std::gcd and std::lcm calculate |m| and |n|
immediately, yielding a common unsigned type that was used to calculate
the result. That was partly correct, but there's no need to use an
unsigned type. Doing so only suppresses the overflow errors so the
compiler can't detect them. This change replaces __absu with __abs_r
that returns the common type (not its corresponding unsigned type). This
way we can detect overflow in __abs_r when required, while still
supporting the most-negative value when it can be represented in the
result type. To detect LCM results that are out of range of the result
type we still need explicit checks, because neither constant evaluation
nor UBsan will complain about unsigned wrapping for cases such as
std::lcm(500000u, 499999u). We can detect those overflows efficiently by
using __builtin_mul_overflow and asserting.
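
For illustration, a minimal sketch of the overflow check described above (not the library code):

#include <cassert>
#include <numeric>

int
lcm_sketch (int m, int n)
{
  /* Sketch only: assumes m > 0 && n > 0.  Detect results that do not fit
     in int, the same way the library now does.  */
  int r;
  bool overflowed = __builtin_mul_overflow (m / std::gcd (m, n), n, &r);
  assert (!overflowed);   /* e.g. fires for lcm_sketch (500000, 499999) */
  return r;
}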

libstdc++-v3/ChangeLog:

	PR libstdc++/105844
	* include/experimental/numeric (experimental::gcd): Simplify
	assertions. Use __abs_r instead of __absu.
	(experimental::lcm): Likewise. Remove use of __detail::__lcm so
	overflow can be detected.
	* include/std/numeric (__detail::__absu): Rename to __abs_r and
	change to allow signed result type, so overflow can be detected.
	(__detail::__lcm): Remove.
	(gcd): Simplify assertions. Use __abs_r instead of __absu.
	(lcm): Likewise. Remove use of __detail::__lcm so overflow can
	be detected.
	* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines.
	* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.
	* testsuite/26_numerics/gcd/105844.cc: New test.
	* testsuite/26_numerics/lcm/105844.cc: New test.
2022-06-10 15:24:29 +01:00
Jonathan Wakely
1e65f2ed99 libstdc++: Fix lifetime bugs for non-TLS eh_globals [PR105880]
This ensures that the single-threaded fallback buffer eh_globals is not
destroyed during program termination, using the same immortalization
technique used for error category objects.

Also ensure that init._M_init can still be read after init has been
destroyed, by making it a static data member.
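
A rough sketch of the immortalization idiom referred to above (hypothetical type and names, not the eh_globals.cc code):

#include <new>

struct fallback_globals { void *caught_exceptions = nullptr; };

// Construct the object in raw static storage and never destroy it, so it
// remains usable from termination-time code such as atexit handlers.
alignas (fallback_globals) static unsigned char storage[sizeof (fallback_globals)];

static fallback_globals *
get_fallback_globals ()
{
  static fallback_globals *p = new (storage) fallback_globals;
  return p;
}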

libstdc++-v3/ChangeLog:

	PR libstdc++/105880
	* libsupc++/eh_globals.cc (eh_globals): Ensure constant init and
	prevent destruction during termination.
	(__eh_globals_init::_M_init): Replace with static member _S_init.
	(__cxxabiv1::__cxa_get_globals_fast): Update.
	(__cxxabiv1::__cxa_get_globals): Likewise.
2022-06-10 15:24:29 +01:00
Roger Sayle
1753a71201 PR rtl-optimization/7061: Complex number arguments on x86_64-like ABIs.
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.

For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }

GCC with -O2 currently generates:

re:	movq    %xmm0, -8(%rsp)
        movss   -8(%rsp), %xmm0
        ret
im:	movq    %xmm0, -8(%rsp)
        movss   -4(%rsp), %xmm0
        ret

with this patch we now generate:

re:	ret
im:	movq    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        ret

[Technically, this shift can be performed on %xmm0 in a single
instruction, but the backend needs to be taught to do that; the
important bit is that the SCmode argument isn't written to the
stack].

The patch itself is to emit_group_store where just before RTL
expansion commits to writing to the stack, we check if the store
group consists of a single scalar integer register that holds
a complex mode value; on x86_64 SCmode arguments are passed in
DImode registers.  If this is the case, we can use a SUBREG to
"view_convert" the integer to the equivalent complex mode.

An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes.  This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this.  This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

2022-06-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/7061
	* expr.cc (emit_group_store): For groups that consist of a single
	scalar integer register that holds a complex mode value, use
	gen_lowpart to generate a SUBREG to "view_convert" to the complex
	mode.  For modes of different sizes, first convert to an integer
	mode of the appropriate size.

gcc/testsuite/ChangeLog
	PR rtl-optimization/7061
	* gcc.target/i386/pr7061-1.c: New test case.
	* gcc.target/i386/pr7061-2.c: New test case.
2022-06-10 15:16:55 +01:00
Jonathan Wakely
b370ed0bf9 libstdc++: Make std::hash<basic_string<>> allocator-agnostic (LWG 3705)
This new library issue was recently moved to Tentatively Ready by an LWG
poll, so I'm making the change on trunk.

As noted in PR libstdc++/105907 the std::hash specializations for PMR
strings were not treated as slow hashes by the unordered containers, so
this change preserves that. The new specializations for custom
allocators are also not treated as slow, for the same reason. For the
versioned namespace (i.e. unstable ABI) we don't have to worry about
that, so can enable hash code caching for all basic_string
specializations.
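
For illustration, a sketch of what LWG 3705 permits (the allocator type here is hypothetical):

#include <cstddef>
#include <functional>
#include <memory>
#include <string>

template<class T>
struct my_alloc : std::allocator<T> { };

using my_string
  = std::basic_string<char, std::char_traits<char>, my_alloc<char>>;

// Well-formed with this change: the hash partial specialization now covers
// basic_string with any allocator, not just std::allocator and PMR.
std::size_t h = std::hash<my_string>{} (my_string ("hello"));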

libstdc++-v3/ChangeLog:

	* include/bits/basic_string.h (__hash_str_base): New class
	template.
	(hash<basic_string<C, char_traits<C>, A>>): Define partial
	specialization for each of the standard character types.
	(hash<string>, hash<wstring>, hash<u8string>, hash<u16string>)
	(hash<u32string>): Remove explicit specializations.
	* include/std/string (__hash_string_base): Remove class
	template.
	(hash<pmr::string>, hash<pmr::wstring>, hash<pmr::u8string>)
	(hash<pmr::u16string>, hash<pmr::u32string>): Remove explicit
	specializations.
	* testsuite/21_strings/basic_string/hash/hash.cc: Test with
	custom allocators.
	* testsuite/21_strings/basic_string/hash/hash_char8_t.cc:
	Likewise.
2022-06-10 14:39:25 +01:00
Antoni Boucher
5940b4e59f libgccjit: Support getting the size of a float [PR105829]
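
For illustration, a sketch of the API use this enables (error handling omitted):

#include <libgccjit.h>

void
query_float_size (void)
{
  gcc_jit_context *ctxt = gcc_jit_context_acquire ();
  gcc_jit_type *float_type
    = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_FLOAT);
  /* Previously only integer types were supported here.  */
  ssize_t size = gcc_jit_type_get_size (float_type);
  (void) size;  /* typically 4 */
  gcc_jit_context_release (ctxt);
}
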
2022-06-09  Antoni Boucher  <bouanto@zoho.com>

gcc/jit/
	PR jit/105829
	* libgccjit.cc: Add support for floating-point types in
	gcc_jit_type_get_size.

gcc/testsuite/
	PR jit/105829
	* jit.dg/test-types.c: Add tests for gcc_jit_type_get_size.
2022-06-09 21:50:25 -04:00
GCC Administrator
e3bba42fb5 Daily bump. 2022-06-10 00:16:43 +00:00
Takayuki 'January June' Suwa
29dc90a580 xtensa: Add clrsbsi2 insn pattern
> (clrsb:m x)
> Represents the number of redundant leading sign bits in x, represented
> as an integer of mode m, starting at the most significant bit position.

This explanation matches exactly what the NSA instruction (never emitted
before) calculates in the Xtensa ISA.
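
For illustration, the operation this maps onto at the source level (values shown for 32-bit int):

/* __builtin_clrsb (0) == 31, __builtin_clrsb (-1) == 31, __builtin_clrsb (1) == 30:
   each counts the redundant copies of the sign bit below the sign bit.  */
int
clrsb32 (int x)
{
  return __builtin_clrsb (x);  /* can now expand to a single NSA on Xtensa */
}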

gcc/ChangeLog:

	* config/xtensa/xtensa.md (clrsbsi2): New insn pattern.

libgcc/ChangeLog:

	* config/xtensa/lib1funcs.S (__clrsbsi2): New function.
	* config/xtensa/t-xtensa (LIB1ASMFUNCS): Add _clrsbsi2.
2022-06-09 15:07:59 -07:00
Takayuki 'January June' Suwa
e44e7face1 xtensa: Optimize '(~x & y)' to '((x & y) ^ y)'
In Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation.
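
The rewrite relies on a simple bitwise identity; an illustrative function:

/* For each bit: if y is 0 both sides are 0; if y is 1 both sides equal ~x.
   So (~x & y) == ((x & y) ^ y), which needs no separate negation insn.  */
unsigned
andnot (unsigned x, unsigned y)
{
  return ~x & y;
}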

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*andsi3_bitcmpl):
	New insn_and_split pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/check_zero_byte.c: New.
2022-06-09 15:07:47 -07:00
Takayuki 'January June' Suwa
9777d446e2 xtensa: Make one_cmplsi2 optimizer-friendly
In Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation.  But a few optimizers assume that bitwise negation can be
done by a single insn.

As a result, '((x < 0) ? ~x : x)' could never be optimized to '(x ^ (x >> 31))'
before, for example.

This patch relaxes that limitation by putting the insn expansion off until
the split pass.
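
For illustration, the kind of source this now optimizes (32-bit int, arithmetic right shift):

/* x >> 31 is all-ones when x is negative and zero otherwise, so
   (x < 0 ? ~x : x) == (x ^ (x >> 31)).  */
int
ones_complement_abs (int x)
{
  return x < 0 ? ~x : x;
}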

gcc/ChangeLog:

	* config/xtensa/xtensa.md (one_cmplsi2):
	Rearrange as an insn_and_split pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/xtensa/one_cmpl_abs.c: New.
2022-06-09 15:07:22 -07:00
Takayuki 'January June' Suwa
2fcc69d8ce xtensa: Implement bswaphi2 insn pattern
This patch adds a bswaphi2 insn pattern that is one instruction shorter than
the default expansion.

gcc/ChangeLog:

	* config/xtensa/xtensa.md (bswaphi2): New insn pattern.
2022-06-09 15:07:08 -07:00
Joseph Myers
6458486345 Update gcc sv.po
* sv.po: Update.
2022-06-09 22:04:25 +00:00
Segher Boessenkool
a05aac0a13 rs6000: Delete FP_ISA3
FP_ISA3 is exactly the same as SFDF, just a less obvious name.  So,
let's delete it.

2022-06-09  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/rs6000.md (FP_ISA3): Delete.
	(float<QHI:mode><FP_ISA3:mode>2): Rename to...
	(float<QHI:mode><SFDF:mode>2): ... this.  Adjust.
	(*float<QHI:mode><FP_ISA3:mode>2_internal): Rename to...
	(*float<QHI:mode><SFDF:mode>2_internal): ... this.  Adjust.
	(floatuns<QHI:mode><FP_ISA3:mode>2): Rename to...
	(floatuns<QHI:mode><SFDF:mode>2): ... this.  Adjust.
	(*floatuns<QHI:mode><FP_ISA3:mode>2_internal): Rename to...
	(*floatuns<QHI:mode><SFDF:mode>2_internal): ... this.  Adjust.
2022-06-09 19:35:53 +00:00
Jakub Jelinek
699e9a0f67 openmp: Fix up include of the generic allocator.c
As reported by Richard Sandiford, #include "../../../allocator.c"
has one too many ../s, dunno why it worked for me when using
../configure (VPATH = ../../../libgomp)

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/allocator.c: Fix up #include directive.
2022-06-09 19:44:50 +02:00
Jakub Jelinek
4c334e0e4f c++: Fix up ICE on __builtin_shufflevector constexpr evaluation [PR105871]
As the following testcase shows, a BIT_FIELD_REF result doesn't have to have
just integral type; it can also have vector type.  And in that case
cxx_eval_bit_field_ref just ICEs on it because it is unprepared for that
case: it creates the initial value with build_int_cst (sure, that one could be
easily replaced with build_zero_cst) and then expects that it can, through
shifts, ands and ors, come up with the final value, but that doesn't work for
vectors.

We already call fold_ternary if whole is a VECTOR_CST; this patch does the
same if the result doesn't have integral type.  And, since there is no
guarantee fold_ternary will succeed and the callers certainly don't expect
NULL to be returned, it also diagnoses those as non-constant and returns the
original t in that case.
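
A sketch of the kind of constant evaluation involved (hypothetical code, not the committed g++.dg/pr105871.C testcase):

typedef short v8hi __attribute__ ((vector_size (16)));
typedef short v4hi __attribute__ ((vector_size (8)));

constexpr v8hi a = { 1, 2, 3, 4, 5, 6, 7, 8 };
/* Extracting a contiguous half can be represented as a BIT_FIELD_REF whose
   result type is itself a vector -- the case handled above.  */
constexpr v4hi lo = __builtin_shufflevector (a, a, 0, 1, 2, 3);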

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	PR c++/105871
	* constexpr.cc (cxx_eval_bit_field_ref): For BIT_FIELD_REF with
	non-integral result type use fold_ternary too like for BIT_FIELD_REFs
	from VECTOR_CST.  If fold_ternary returns NULL, diagnose non-constant
	expression, set *non_constant_p and return t, instead of returning
	NULL.

	* g++.dg/pr105871.C: New test.
2022-06-09 17:42:31 +02:00
Maciej W. Rozycki
702a11ade2 RISC-V: Use a tab rather than space with FSFLAGS
Consistently use a tab rather than a space as the separator between the
assembly instruction mnemonic and its operand with FSFLAGS instructions
produced with the unordered FP comparison RTL insns.

	gcc/
	* config/riscv/riscv.md
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_default)
	(*f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_snan): Emit a tab
	rather than space with FSFLAGS.
2022-06-09 14:34:34 +01:00
Nathan Sidwell
97b81fb036 c++: Better module initializer code
Every module interface needs to emit a global initializer, but it
might have nothing to init.  In those cases, there's no need for any
idempotency boolean to be emitted.

	gcc/cp
	* cp-tree.h (module_initializer_kind): Replace with ...
	(module_global_init_needed, module_has_import_inits): ...
	these.
	* decl2.cc (start_objects): Add has_body parm.  Reorganize
	module initializer creation.
	(generate_ctor_or_dtor_function): Adjust.
	(c_parse_final_cleanups): Adjust.
	(vtv_start_verification_constructor_init_function): Adjust.
	* module.cc (module_initializer_kind): Replace with ...
	(module_global_init_needed, module_has_import_inits): ...
	these.

	gcc/testsuite/
	* g++.dg/modules/init-2_a.C: Check no idempotency.
	* g++.dg/modules/init-2_b.C: Check idempotency.
2022-06-09 06:22:15 -07:00
Tobias Burnus
209de00fdb OpenMP: Handle ancestor:1 with discover_declare_target
gcc/
	* omp-offload.cc (omp_discover_declare_target_tgt_fn_r,
	omp_discover_declare_target_fn_r): Don't walk reverse-offload
	target regions.

gcc/testsuite/
	* c-c++-common/gomp/reverse-offload-1.c: New.
2022-06-09 14:48:24 +02:00
Jakub Jelinek
2dc19a1b59 doc: Fix up -Waddress documentation
When looking up the -Waddress documentation due to some PR that mentioned it,
I noticed some typos and thus I'm fixing them.

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* doc/invoke.texi (-Waddress): Fix a typo in small example.
	Fix typos inptr_t -> intptr_t and uinptr_t -> uintptr_t.
2022-06-09 10:19:53 +02:00
Jakub Jelinek
17f52a1c72 openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library
This patch adds support for dlopening libmemkind.so on Linux and uses it
for some kinds of allocations (but not yet e.g. pinned memory).
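
For illustration, a sketch of the user-visible effect (using the standard OpenMP allocator API):

#include <omp.h>

void
use_hbw_memory (void)
{
  /* With libmemkind available, this no longer returns the null allocator.  */
  omp_allocator_handle_t a
    = omp_init_allocator (omp_high_bw_mem_space, 0, nullptr);
  if (a != omp_null_allocator)
    {
      void *p = omp_alloc (1024, a);
      omp_free (p, a);
      omp_destroy_allocator (a);
    }
}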

2022-06-09  Jakub Jelinek  <jakub@redhat.com>

	* allocator.c: Include dlfcn.h if LIBGOMP_USE_MEMKIND is defined.
	(enum gomp_memkind_kind): New type.
	(struct omp_allocator_data): Add memkind field if LIBGOMP_USE_MEMKIND
	is defined.
	(struct gomp_memkind_data): New type.
	(memkind_data, memkind_data_once): New variables.
	(gomp_init_memkind, gomp_get_memkind): New functions.
	(omp_init_allocator): Initialize data.memkind, don't fail for
	omp_high_bw_mem_space if libmemkind supports it.
	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
	memkind support if LIBGOMP_USE_MEMKIND is defined.
	* config/linux/allocator.c: New file.
2022-06-09 10:14:42 +02:00
Cui,Lili
269edf4e5e Update {skylake,icelake,alderlake}_cost to add a bit of preference to vector stores.
Since the integer vector construction cost has changed, we need to adjust the
load and store costs for Intel processors.

With the patch applied,
538.imagick_r gets ~6% improvement on ADL for multicopy, and
525.x264_r gets ~2% improvement on ADL and ICX for multicopy,
with no measurable changes for other benchmarks.

gcc/ChangeLog

	PR target/105493
	* config/i386/x86-tune-costs.h (skylake_cost): Raise the gpr load cost
	from 4 to 6 and gpr store cost from 6 to 8. Change SSE loads and
	unaligned loads cost from {6, 6, 6, 10, 20} to {8, 8, 8, 8, 16}.
	(icelake_cost): Ditto.
	(alderlake_cost): Raise the gpr store cost from 6 to 8 and SSE loads,
	stores and unaligned stores cost from {6, 6, 6, 10, 15} to
	{8, 8, 8, 10, 15}.

gcc/testsuite/

	PR target/105493
	* gcc.target/i386/pr91446.c: Adjust to expect vectorization
	* gcc.target/i386/pr99881.c: XFAIL.
	* gcc.target/i386/pr105493.c: New.
	* g++.target/i386/pr105638.C: Use other sequence checks
	instead of vpxor, because code generation changed.
2022-06-09 14:59:44 +08:00
Haochen Gui
2fc6e3d55f This patch replaces shift and ior insns with one rotate and mask insn for the split patterns which are for DI byte swap on Power6.
gcc/
	* config/rs6000/rs6000.md (define_split for bswapdi load): Merge shift
	and ior insns to one rotate and mask insn.
	(define_split for bswapdi register): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/pr93453-1.c: New.
2022-06-09 13:31:09 +08:00
GCC Administrator
02b4e2de32 Daily bump. 2022-06-09 00:16:26 +00:00
Jason Merrill
e8ed26c2ac c++: non-templated friends [PR105852]
The previous patch for 105852 avoids copying DECL_TEMPLATE_INFO from a
non-templated friend, but it really shouldn't have it in the first place.

	PR c++/105852

gcc/cp/ChangeLog:

	* decl.cc (duplicate_decls): Change non-templated friend
	check to an assert.
	* pt.cc	(tsubst_function_decl): Don't set DECL_TEMPLATE_INFO
	on non-templated friends.
	(tsubst_friend_function): Adjust.
2022-06-08 16:38:25 -04:00
Jason Merrill
7d87790a87 c++: redeclared hidden friend take 2 [PR105852]
My previous patch for 105761 avoided copying DECL_TEMPLATE_INFO from a
friend to a later definition, but in this testcase we have first a
non-friend declaration and then a definition, and we need to avoid copying
in that case as well.  But we do still want to set new_template_info to
avoid GC trouble.

With this change, the modules dump correctly identifies ::foo as a
non-template function in tpl-friend-2_a.C.

Along the way I noticed that the duplicate_decls handling of
DECL_UNIQUE_FRIEND_P was backwards for templates, where we don't clobber
DECL_LANG_SPECIFIC (olddecl) with DECL_LANG_SPECIFIC (newdecl) like we do
for non-templates.

	PR c++/105852
	PR c++/105761

gcc/cp/ChangeLog:

	* decl.cc (duplicate_decls): Avoid copying template info
	from non-templated friend even if newdecl isn't a definition.
	Correct handling of DECL_UNIQUE_FRIEND_P on templates.
	* pt.cc (non_templated_friend_p): New.
	* cp-tree.h (non_templated_friend_p): Declare it.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/tpl-friend-2_a.C: Adjust expected dump.
	* g++.dg/template/friend74.C: New test.
2022-06-08 16:37:50 -04:00
Roger Sayle
b6e1373bd3 PR middle-end/105874: Use EXPAND_MEMORY to fix ada bootstrap.
Many thanks to Tamar Christina for filing PR middle-end/105874 indicating
that SPECcpu 2017's Leela is failing on x86_64 due to a miscompilation
of FastBoard::is_eye.  This function is much smaller and easier to work
with than my previous hunt for the cause of the Ada bootstrap failures
due to miscompilation somewhere in GCC (or one of the 131 places that
the problematic form of optimization triggers during an ada bootstrap).

It turns out the source of the miscompilation introduced by my recent
patch is the distinction (during RTL expansion) of l-values and r-values.
According to the documentation above expand_modifier, EXPAND_MEMORY
should be used for lvalues (when a memory is required), and EXPAND_NORMAL
for rvalues when a constant is permissible.  In what I'd like to consider
a latent bug, the recursive call to expand_expr_real on line 11188 of
expr.cc, in the case handling ARRAY_REF, COMPONENT_REF, BIT_FIELD_REF
and ARRAY_RANGE_REF was passing EXPAND_NORMAL when it really required
(the semantics of) EXPAND_MEMORY.  All the time that VAR_DECLs were
being returned as memory this was fine, but as soon as we're able to
optimize such arrays into immediate constants, bad things happen.

In the test case from Leela, we notice that the array s_eyemask
always has DImode constant value { 4, 64 }, which is useful as
an rvalue, but not when we need to index it as an lvalue, as in
s_eyemask[color].  This also explains why everything being accepted
by immediate_const_ctor_p (during an ada bootstrap) looks reasonable;
what's incorrect is that we don't know how these structs/arrays are
to be used.
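
A hypothetical reduction of the pattern described above (not the actual Leela source):

/* The initializer is a small constant that fits in a DImode register, but
   indexing with a runtime value still needs the array as memory (an lvalue).  */
static const int s_eyemask[2] = { 4, 64 };

int
eyemask_for (int color)
{
  return s_eyemask[color];
}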

The fix is to ensure that we call expand_expr with EXPAND_MEMORY
when processing the VAR_DECL's returned by get_inner_reference.

2022-06-08  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR middle-end/105874
	* expr.cc (expand_expr_real_1) <normal_inner_ref>:  New local
	variable tem_modifier for calculating the expand_modifier enum to
	use for expanding tem.  If tem is a VAR_DECL, use EXPAND_MEMORY.

gcc/testsuite/ChangeLog
	PR middle-end/105874
	* g++.dg/opt/pr105874.C: New test case.
2022-06-08 20:43:03 +01:00
Max Filippov
e94c6dbfb5 gcc: xtensa: fix PR target/105879
split_double operates with the 'word that comes first in memory in the
target' terminology, while gen_lowpart operates with the 'value
representing some low-order bits of X' terminology. They are not
equivalent and must be dealt with differently on little- and big-endian
targets.

gcc/
	PR target/105879
	* config/xtensa/xtensa.md (movdi): Rename 'first' and 'second'
	to 'lowpart' and 'highpart' so that they match 'gen_lowpart' and
	'gen_highpart' bitwise semantics and fix order of highpart and
	lowpart depending on target endianness.
2022-06-08 08:47:40 -07:00
Nathan Sidwell
90a6c3b6d6 c++: Reimplement static init/fini generation
Currently we generate static init/fini code by generating a set of
functions taking an 'initp' bool and an unsigned priority.  (There can
be more than one, as we repeat the end-of-compile loop.)  We then
generate a set of real init or fini functions for each needed
priority, calling the previous set of functions.  This is of course
very tangled, but excitingly the value-range-propagator is clever
enough to untangle it.  However, the current arrangement makes
generation awkward, particularly as to how to optimize the
module-global-init generation.

This reimplements the generation to generate a set of separate
init/fini functions for each needed priority, and then call them from
the real inits previously mentioned.  This replaces a splay tree,
recording which priority/init combos we needed, with a pair of hash
tables, mapping priority to init functions.  Much simpler.

While there, rename several of the functions as they are only dealing
with part of the init/fini generation, not the whole set.

	gcc/cp/
	* decl2.cc (struct priority_info_s, priority_info): Delete.
	(priority_map_traits, priority_map_t): New.
	(static_init_fini_fns): New.
	(INITIALIZE_P_IDENTIFIER, PRIORITY_IDENTIFIER): Delete.
	(initialize_p_decl, priority_decl): Delete.
	(ssdf_decls, priority_info_map): Delete.
	(start_static_storage_duration_function): Rename to ...
	(start_partial_init_fini_fn): ... here. Create a void arg fn.
	Add it to the slot in the appropriate static_init_fini_fns
	hash table.
	(finish_static_storage_duration_function): Rename to ...
	(finish_partial_init_fini_fn): ... here.
	(get_priority_info): Delete.
	(one_static_initialization_or_destruction): Assert not
	trivial dtor.
	(do_static_initialization_or_destruction): Rename to ...
	(emit_partial_init_fini_fn) ... here.  Start & finish the fn.
	Simply init/fini each var.
	(partition_vars_for_init_fini): Partition vars according to
	priority and add to init and/or fini list.
	(generate_ctor_or_dtor_function): Start and finish the function.
	Do sanitizer calls here.
	(generate_ctor_and_dtor_functions_for_priority): Delete.
	(c_parse_final_cleanups): Reimplement global init/fini
	processing.

	gcc/testsuite/
	* g++.dg/init/static-cdtor1.C: New.
2022-06-08 07:44:20 -07:00
Roger Sayle
d8c2580941 [Committed] Add -mno-avx512vl to recent gcc.target/i386/xop-pcmov3.c
Adding -march=cascadelake to the command line options of the recently
added xop-pcmov3.c test case causes problems as GCC then prefers to
use AVX512's vpternlogd instruction, instead of the XOP vpcmov that
the test is checking for.  This is easily solved by adding an explicit
-mno-avx512vl to the command line options.

Committed to mainline as obvious (in hindsight).

2022-06-08  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
	* gcc.target/i386/xop-pcmov3.c: Add -mno-avx512vl to dg-options.
2022-06-08 10:06:23 +01:00
Tobias Burnus
5e5deac508 OpenMP: Fortran - fix ancestor's requires reverse_offload check
gcc/fortran/

	* openmp.cc (gfc_match_omp_clauses): Check also parent namespace
	for 'requires reverse_offload'.

gcc/testsuite/

	* gfortran.dg/gomp/target-device-ancestor-5.f90: New test.
2022-06-08 10:06:57 +02:00
Chung-Ju Wu
ef5cc6bbb6 arm: Add star-mc1 cpu
The star-mc1 is an embedded processor with armv8m architecture.  It is
mainly designed to meet the requirements of AIoT applications for
performance, power consumption and security.  This patch adds support
for the star-mc1 cpu.

Signed-off-by: Chung-Ju Wu <jasonwucj@gmail.com>

gcc/ChangeLog:

	* config/arm/arm-cpus.in (star-mc1): New cpu.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm-tune.md: Regenerate.
	* doc/invoke.texi: Update docs.
2022-06-08 07:17:19 +00:00
Yang Yujie
75df1594ae libgccjit: allow common objects in $(EXTRA_GCC_OBJS) and $(EXTRA_OBJS)
This patch fixes libgccjit build failure on loongarch* targets,
and could probably be useful for future ports.

For now, libgccjit is linked with objects from $(EXTRA_GCC_OBJS) and
libbackend.a, which contains object files from $(EXTRA_OBJS).

This effectively forbids any overlap between those two lists, i.e. all
target-specific shared code between the gcc driver and compiler
executables must go into gcc/common/config/<arch>/<arch>-common.cc,
which feels a bit inconvenient when there is a lot of "common" stuff
that we want to put into separate source files.

By linking libgccjit with $(EXTRA_GCC_OBJS_EXCLUSIVE), which contains
all elements from $(EXTRA_GCC_OBJS) but not $(EXTRA_OBJS), this problem
can be alleviated.

This patch does not affect any other target architecture than loongarch,
and has been bootstrapped and regression-tested on loongarch64-linux-gnuf64
and x86_64-pc-linux-gnu.

* gcc/jit/ChangeLog:

	* Make-lang.in: Only link objects from $(EXTRA_GCC_OBJS)
	that are not in $(EXTRA_OBJS) into libgccjit.
2022-06-08 14:45:02 +08:00
liuhongt
5e005393d4 Disparage SSE_REGS alternatives slightly with ?v instead of *v in *mov{si,di}_internal.
So alternative v won't be ignored in record_reg_classes.

Similar for *r alternatives in some vector patterns.

It helps testcase in the PR, also RA now makes better decisions for
gcc.target/i386/extract-insert-combining.c

        movd    %esi, %xmm0
        movd    %edi, %xmm1
-       movl    %esi, -12(%rsp)
        paddd   %xmm0, %xmm1
        pinsrd  $0, %esi, %xmm0
        paddd   %xmm1, %xmm0

The patch has no big impact on SPEC2017 for either -O2 or -Ofast
-march=native runs.

And I noticed there are some changes in SPEC2017 from code like

mov mem, %eax
vmovd %eax, %xmm0
..
mov %eax, 64(%rsp)

to

vmovd mem, %xmm0
..
vmovd %xmm0, 64(%rsp)

Which should be exactly what we want?

gcc/ChangeLog:

	PR target/105513
	PR target/105504
	* config/i386/i386.md (*movsi_internal): Change alternative
	from *v to ?v.
	(*movdi_internal): Ditto.
	* config/i386/sse.md (vec_set<mode>_0): Change alternative *r
	to ?r.
	(*vec_extractv4sf_mem): Ditto.
	(*vec_extracthf): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr105513-1.c: New test.
	* gcc.target/i386/extract-insert-combining.c: Add new
	scan-assembler-not for spill.
2022-06-08 11:23:49 +08:00
liuhongt
e4bdeaba6e Adjust testcase to avoid compile failure under -m32.
gcc/testsuite/ChangeLog:

	PR target/105854
	* gcc.target/i386/pr105854.c: Add target int128 and dfp.
2022-06-08 10:59:18 +08:00
GCC Administrator
445ba599cb Daily bump. 2022-06-08 00:16:28 +00:00
Richard Earnshaw
2005b9b888 arm: Improve code generation for BFI and BFC [PR105090]
This patch, in response to PR105090, makes some general improvements
to the code generation when BFI and BFC instructions are available.
Firstly we handle more cases where the RTL does not generate an INSV
operation due to a lack of a tie between the input and output, but we
nevertheless need to emit BFI later on; we handle this by requiring
the register allocator to tie the operands.  Secondly we handle some
cases where we were previously emitting BFC, but AND with an immediate
would be better; we do this by converting all BFC patterns into AND
using a split pattern.  And finally, we handle some cases where
previously we would emit multiple BIC operations to clear a value, but
could instead use a single BFC instruction.

BFC and BFI express the mask as a pair of values, one for the number
of bits to clear and another for the location of the least significant
bit.  We handle these with a single new output modifier letter that
causes both values to be printed; we use an 'inverted' value so that
it can be used directly with the constant used in an AND rtl
construct.  We've run out of 'new' letters, so to do this we re-use
one of the long-obsoleted Maverick output modifiers.
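
For illustration, C-level forms of the idioms involved (the bit positions are arbitrary examples):

unsigned
insert_field (unsigned x, unsigned y)
{
  /* Insert the low 4 bits of y into bits 8..11 of x: a BFI candidate.  */
  return (x & ~0xf00u) | ((y & 0xfu) << 8);
}

unsigned
clear_field (unsigned x)
{
  /* Clear bits 8..11 of x: previously BFC, now AND with an immediate
     where that is cheaper.  */
  return x & ~0xf00u;
}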

gcc/ChangeLog:

	PR target/105090
	* config/arm/arm.cc (arm_bfi_1_p): New function.
	(arm_bfi_p): New function.
	(arm_rtx_costs_internal): Add costs for BFI idioms.
	(arm_print_operand [case 'V']): Format output for BFI/BFC masks.
	* config/arm/constraints.md (Dj): New constraint.
	* config/arm/arm.md (arm_andsi3_insn): Add alternative to use BFC.
	(insv_zero): Convert to an insn with a split.
	(*bfi, *bfi_alt1, *bfi_alt2, *bfi_alt3): New patterns.
2022-06-07 12:12:20 +01:00
liuhongt
cd22395457 Fix insn does not satisfy its constraints: sse2_lshrv1ti3
(define_insn_and_split "ssse3_palignrdi"
  [(set (match_operand:DI 0 "register_operand" "=y,x,Yv")
        (unspec:DI [(match_operand:DI 1 "register_operand" "0,0,Yv")
                    (match_operand:DI 2 "register_mmxmem_operand" "ym,x,Yv")
                    (match_operand:SI 3 "const_0_to_255_mul_8_operand")]
                   UNSPEC_PALIGNR))]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"

Alternative 2 requires Yw instead of Yv since it is split to vpsrldq,
which requires AVX512VL & AVX512BW for the EVEX version.

gcc/ChangeLog:

	PR target/105854
	* config/i386/sse.md (ssse3_palignrdi): Change alternative 2
	from Yv to Yw.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr105854.c: New test.
2022-06-07 17:32:21 +08:00
Roger Sayle
c00e1e3aa5 PR middle-end/105853: Call store_constructor directly from calls.cc.
This patch fixes both ICE regressions PR middle-end/105853 and
PR target/105856 caused by my recent patch to expand small const structs
as immediate constants.  That patch updated code generation in three
places: two in expr.cc that call store_constructor directly, and the
third in calls.cc's load_register_parameters that expands its CONSTRUCTOR
via expand_expr, as store_constructor is local/static to expr.cc, and
the "public" API, should usually simply forward the constructor to the
appropriate store_constructor function.

Alas, despite the clean regression testing on multiple targets, the above
ICEs show that expand_expr isn't a suitable proxy for store_constructor,
and things that (I'd assumed) shouldn't affect how/whether a struct is
placed in a register [such as whether the struct is considered packed/
aligned or not] actually interfere with the optimization that is being
attempted.

The (proposed) solution is to export store_constructor (and it's helper
function int_expr_size) from expr.cc, by removing their static qualifier
and prototyping both functions in expr.h, so they can be called directly
from load_register_parameters in calls.cc.  This cures both ICEs, but
almost as importantly improves code generation over GCC 12.

For PR 105853, GCC 12 generates:

compose_nd_na_ipv6_src:
	movzx eax, WORD PTR eth_addr_zero[rip+2]
	movzx edx, WORD PTR eth_addr_zero[rip]
	movzx edi, WORD PTR eth_addr_zero[rip+4]
	sal rax, 16
	or rax, rdx
	sal rdi, 32
	or rdi, rax
	xor eax, eax
	jmp packet_set_nd
eth_addr_zero:	.zero 6

where now (with this fix) GCC 13 generates:
compose_nd_na_ipv6_src:
        xorl    %edi, %edi
        xorl    %eax, %eax
        jmp     packet_set_nd

Likewise, for PR 105856 on ARM, we'd previously generate:
g_329_3:
	movw r3, #:lower16:.LANCHOR0
	movt r3, #:upper16:.LANCHOR0
	ldr r0, [r3]
	b func_19

but with this optimization we now generate:
g_329_3:
        mov     r0, #6
        b       func_19

2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR middle-end/105853
	PR target/105856
	* calls.cc (load_register_parameters): Call store_constructor
	and int_expr_size directly instead of expanding via expand_expr.
	* expr.cc (static void store_constructor): Don't prototype here.
	(static HOST_WIDE_INT int_expr_size): Likewise.
	(store_constructor): No longer static.
	(int_expr_size): Likewise, no longer static.
	* expr.h (store_constructor): Prototype here.
	(int_expr_size): Prototype here.

gcc/testsuite/ChangeLog
	PR middle-end/105853
	PR target/105856
	* gcc.dg/pr105853.c: New test case.
	* gcc.dg/pr105856.c: New test case.
2022-06-07 10:09:49 +01:00
Jan Beulich
cef3f69c2f Revert "configure: arrange to use appropriate objcopy"
This reverts commit 6124f42488.
It lacks pieces to work with system binutils.
2022-06-07 10:24:53 +02:00
Jakub Jelinek
03b7140632 openmp: Add support for OpenMP 5.2 linear clause syntax for C/C++
The syntax for linear clause changed in 5.2, the original syntax
which is still valid is:
linear (var1, var2)
linear (var3, var4 : step1)
The 4.5 syntax with modifiers like:
linear (val (var5, var6))
linear (val (var7, var8) : step2)
is still supported in 5.2, but is deprecated there.
Instead, one can use a new syntax:
linear (var9, var10 : val)
linear (var11, var12 : step (step3), val)
As val, ref, uval or step (someexpr) can be valid expressions (and especially
in C++ can be const / constexpr / consteval), the spec says that
when the whole step expression is val (or ref or uval) or step ( ... )
then it is the new modifier syntax; one can use + 0 or 0 + or 1 * or * 1
or ()s to say it is the old step expression.
Also, 5.2 now allows val modifier to be specified even outside of declare simd
(but not the other modifiers).  I've implemented this for the new modifier
syntax only, the old one keeps the old restriction (which is why
OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER flag has been introduced).
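
For illustration, a compilable sketch contrasting the two spellings (hypothetical declarations):

#pragma omp declare simd linear(val(x) : 1)         /* 4.5 syntax, deprecated in 5.2 */
int f (int x);

#pragma omp declare simd linear(x : step(1), val)   /* new 5.2 syntax */
int g (int x);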

2022-06-07  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree.h (OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER): Define.
	* tree-pretty-print.cc (dump_omp_clause) <case OMP_CLAUSE_LINEAR>:
	Adjust clause printing style depending on
	OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER.
gcc/c/
	* c-parser.cc (c_parser_omp_clause_linear): Parse OpenMP 5.2
	style linear clause modifiers.  Set
	OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER flag on the clauses when
	old style modifiers are used.
	* c-typeck.cc (c_finish_omp_clauses): Only reject linear clause
	with val modifier on simd or for if the old style modifiers are
	used.
gcc/cp/
	* parser.cc (cp_parser_omp_clause_linear): Parse OpenMP 5.2
	style linear clause modifiers.  Set
	OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER flag on the clauses when
	old style modifiers are used.
	* semantics.cc (finish_omp_clauses): Only reject linear clause
	with val modifier on simd or for if the old style modifiers are
	used.
gcc/fortran/
	* trans-openmp.cc (gfc_trans_omp_clauses): Set
	OMP_CLAUSE_LINEAR_OLD_LINEAR_MODIFIER on OMP_CLAUSE_LINEAR
	clauses unconditionally for now.
gcc/testsuite/
	* c-c++-common/gomp/linear-2.c: New test.
	* c-c++-common/gomp/linear-3.c: New test.
	* g++.dg/gomp/linear-3.C: New test.
	* g++.dg/gomp/linear-4.C: New test.
	* g++.dg/gomp/linear-5.C: New test.
2022-06-07 10:05:08 +02:00
Jan Beulich
6bb0776e10 x86: harmonize __builtin_ia32_psadbw*() types
The 64-bit, 128-bit, and 512-bit variants have V<n>DI return type, in
line with instruction behavior. Make the 256-bit builtin match, thus
also making it match the insn it expands to (using VI8_AVX2_AVX512BW).

gcc/

	* config/i386/i386-builtin.def (__builtin_ia32_psadbw256):
	Change type.
	* config/i386/i386-builtin-types.def: New function type
	(V4DI, V32QI, V32QI).
	* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
	V4DI_FTYPE_V32QI_V32QI.
2022-06-07 09:18:28 +02:00
Jan Beulich
76e3d60c16 x86-64: make "length_vex" also account for VEX.B use by register operand
The length attribute ought to be "the (bounding maximum) length of an
instruction" according to the comment next to its definition. A register
operand encoded using the ModR/M.rm field will additionally use VEX.B
for encoding the highest bit of the register number. Hence for the high
8 GPR registers as well as the [xy]mm{8..15} ones 3-byte VEX encoding
may be needed. Since it isn't known to the function calculating the
length which register goes where in the insn encoding, be conservative
and assume a 3-byte VEX prefix whenever any such register operand is
present and there's no memory operand.

gcc/

	* config/i386/i386.cc (ix86_attr_length_vex_default): Take REX.B
	into account for reg-only insns.
2022-06-07 09:17:25 +02:00
Roger Sayle
6dd194e2ce PR c++/96442: Improved error recovery in enumerations.
This patch is a revised fix for PR c++/96442 providing a cleaner
solution, setting ENUM_UNDERLYING_TYPE to integer_type_node when
issuing an error, so that this invariant holds during the parser's
error recovery.
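
A sketch of the kind of invalid code involved (hypothetical, not the committed pr96442.C testcase):

enum class E : double { A };   /* error: underlying type must be an integral type;
                                  recovery now continues with int instead of ICEing */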

2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/cp/ChangeLog
	PR c++/96442
	* decl.cc (start_enum): When emitting a "must be integral" error,
	set ENUM_UNDERLYING_TYPE to integer_type_node, to avoid an ICE
	downstream in build_enumeration.

gcc/testsuite/ChangeLog
	PR c++/96442
	* g++.dg/parse/pr96442.C: New test case.
2022-06-07 07:54:13 +01:00
Roger Sayle
c4320bde42 Recognize vpcmov in combine with -mxop on x86.
By way of an apology for causing PR target/105791, where I'd overlooked
the need to support V1TImode in TARGET_XOP's vpcmov instruction, this
patch further improves support for TARGET_XOP's vpcmov instruction, by
recognizing it in combine.

Currently, the test case:

typedef int v4si __attribute__ ((vector_size (16)));
v4si foo(v4si c, v4si t, v4si f)
{
    return (c&t)|(~c&f);
}

on x86_64 with -O2 -mxop generates:
        vpxor   %xmm2, %xmm1, %xmm1
        vpand   %xmm0, %xmm1, %xmm1
        vpxor   %xmm2, %xmm1, %xmm0
        ret

but with this patch now generates:
        vpcmov  %xmm0, %xmm2, %xmm1, %xmm0
        ret

On its own, the new combine splitter works fine on TARGET_64BIT, but
alas with -m32 combine incorrectly thinks the replacement instruction
is more expensive, as IF_THEN_ELSE isn't currently/correctly handled
in ix86_rtx_costs.  So to avoid the need for a target selector in the
new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov
has a latency of two cycles [it's now an obsolete instruction set
extension and there's unlikely to ever be a processor where this
instruction has a different timing], and while there I also added
rtx_costs for x86_64's integer conditional move instructions (which
have single cycle latency).

2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386.cc (ix86_rtx_costs): Add a new case for
	IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and
	TARGET_CMOVE's (scalar integer) conditional moves.
	* config/i386/sse.md (define_split): Recognize XOP's vpcmov
	from its equivalent (canonical) pxor;pand;pxor sequence.

gcc/testsuite/ChangeLog
	* gcc.target/i386/xop-pcmov3.c: New test case.
2022-06-07 07:49:40 +01:00
Kewen Lin
63eab5d577 Update document for VECTOR_MODES_WITH_PREFIX
r10-3912 updated the format of VECTOR_MODES_WITH_PREFIX by
adding one more parameter, ORDER, but the related documentation is out
of date.  So update the documentation for ORDER.

gcc/ChangeLog:

	* machmode.def (VECTOR_MODES_WITH_PREFIX): Update document for
	parameter ORDER.
2022-06-06 22:08:23 -05:00
GCC Administrator
70e2ffbcb4 Daily bump. 2022-06-07 00:16:20 +00:00
Patrick Palka
733a792a2b c++: function NTTP argument considered unused [PR53164, PR105848]
Here at parse time the template argument f (an OVERLOAD) in A<f> gets
resolved ahead of time to the FUNCTION_DECL f<int>, and we defer marking
f<int> as used until instantiation (of g) as usual.

Later when instantiating g the type A<f> (where f has already been
resolved) is non-dependent, so tsubst_aggr_type avoids re-processing its
template arguments, and we end up never actually marking f<int> as used
(which means we never instantiate it) even though A<f>::h() later calls
it, leading to a link error.

This patch works around this issue by looking through ADDR_EXPR when
calling mark_used on the substituted callee of a CALL_EXPR.
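
A hypothetical sketch of the pattern (not the committed fn-ptr3.C testcase):

template<class T> void f (T) { }

template<void (*P)(int)> struct A
{
  static void h () { P (0); }
};

template<class T> void g () { A<f<int>>::h (); }   // A<f<int>> is non-dependent here

int
main ()
{
  g<char> ();   // without the fix, f<int> could end up never instantiated
}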

	PR c++/53164
	PR c++/105848

gcc/cp/ChangeLog:

	* pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Look through an
	ADDR_EXPR callee when calling mark_used.

gcc/testsuite/ChangeLog:

	* g++.dg/template/fn-ptr3.C: New test.
2022-06-06 14:29:12 -04:00