This was fixed by r258755:
PR c++/81311 - wrong C++17 overload resolution.
PR c++/81952
gcc/testsuite/ChangeLog:
* g++.dg/overload/conv-op4.C: New test.
The exporter relies on sorting interface parse methods. It would sort
them as it encountered interface types. However, when an interface
type is an element of a struct or array type, the exporter might
encounter that interface type before sorting the parse methods. If it
then encountered an identical interface type again, it could get
confused about whether the two types are identical or not.
Fix the problem by always sorting the parse methods in the
finalize_methods pass.
Also firm up the export type sorting to make sure we never have this
kind of confusion again. Doing this revealed that we need to be more
careful about sorting in order to handle aliases correctly.
Also fix the interface type hash computation to use the right hash
value when looking at parse methods rather than all methods.
The test case for this is https://go.dev/cl/405759.
Fixes golang/go#52841
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/405556
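The sketch below shows why unsorted method lists confuse an identity check: two interface types with the same methods compare as different until both lists are sorted. It is a minimal C++ model. gofrontend is itself written in C++, but `InterfaceType`, its `finalize_methods` member and `identical` are hypothetical simplifications, not the frontend's real classes.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, much-simplified interface type: only its method names.
struct InterfaceType {
  std::vector<std::string> methods;

  // Analogous to sorting the parse methods once, in a finalize pass,
  // instead of lazily whenever the exporter happens to reach the type.
  void finalize_methods() { std::sort(methods.begin(), methods.end()); }
};

// Element-wise comparison: only meaningful once both lists are sorted.
bool identical(const InterfaceType& a, const InterfaceType& b) {
  return a.methods == b.methods;
}
```

Sorting once, up front, means every later comparison (and any hash derived from the method list) sees the same canonical order.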
This patch improves support for vector equality and inequality of
V1TImode vectors, and V2DImode vectors with sse2 but not sse4.
Consider the three functions below:
typedef unsigned int uv4si __attribute__ ((__vector_size__ (16)));
typedef unsigned long long uv2di __attribute__ ((__vector_size__ (16)));
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv4si eq_v4si(uv4si x, uv4si y) { return x == y; }
uv2di eq_v2di(uv2di x, uv2di y) { return x == y; }
uv1ti eq_v1ti(uv1ti x, uv1ti y) { return x == y; }
These all perform vector comparisons of 128-bit SSE2 registers, generating
the result as a vector, where ~0 (all 1 bits) represents true and a zero
represents false. eq_v4si is trivially implemented by x86_64's pcmpeqd
instruction. This patch improves the other two cases:
For v2di, gcc -O2 currently generates:
movq %xmm0, %rdx
movq %xmm1, %rax
movdqa %xmm0, %xmm2
cmpq %rax, %rdx
movhlps %xmm2, %xmm3
movhlps %xmm1, %xmm4
sete %al
movq %xmm3, %rdx
movzbl %al, %eax
negq %rax
movq %rax, %xmm0
movq %xmm4, %rax
cmpq %rax, %rdx
sete %al
movzbl %al, %eax
negq %rax
movq %rax, %xmm5
punpcklqdq %xmm5, %xmm0
ret
but with this patch we now generate:
pcmpeqd %xmm0, %xmm1
pshufd $177, %xmm1, %xmm0
pand %xmm1, %xmm0
ret
where the results of a V4SI comparison are shuffled and bit-wise ANDed
to produce the desired result. There's no change in the code generated
for "-O2 -msse4" where the compiler generates a single "pcmpeqq" insn.
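The SSE2 sequence can be checked with a small portable model of the register lanes. This is a sketch, not GCC code. `V4SI`, `pcmpeqd`, `pshufd` and `pand` are hypothetical helpers that emulate those instructions on four 32-bit lanes, with lane 0 lowest. `eq_v2di_sse2` mirrors the three-instruction sequence shown above.

```cpp
#include <array>
#include <cstdint>

// Portable model of an SSE2 register as four 32-bit lanes (lane 0 = lowest).
using V4SI = std::array<uint32_t, 4>;

// Model of pcmpeqd: per-lane equality, all ones for true, zero for false.
V4SI pcmpeqd(const V4SI& a, const V4SI& b) {
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
  return r;
}

// Model of pshufd: result lane i takes source lane ((imm >> 2*i) & 3).
V4SI pshufd(const V4SI& a, unsigned imm) {
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = a[(imm >> (2 * i)) & 3];
  return r;
}

// Model of pand: bitwise AND of each lane.
V4SI pand(const V4SI& a, const V4SI& b) {
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = a[i] & b[i];
  return r;
}

// V2DI equality without pcmpeqq: compare 32-bit halves, then swap adjacent
// lanes ($177 = 0b10'11'00'01 selects lanes 1,0,3,2) and AND, so each 64-bit
// lane is all ones only if both of its 32-bit halves compared equal.
V4SI eq_v2di_sse2(const V4SI& x, const V4SI& y) {
  V4SI cmp = pcmpeqd(x, y);
  return pand(pshufd(cmp, 177), cmp);
}
```

For example, with x = {1, 2, 3, 4} and y = {1, 2, 3, 5}, the low 64-bit lane comes out all ones and the high one zero, because only the upper 32-bit half of the high lane differs.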
For V1TI mode, the results are equally dramatic; the current -O2
output looks like:
movaps %xmm0, -40(%rsp)
movq -40(%rsp), %rax
movq -32(%rsp), %rdx
movaps %xmm1, -24(%rsp)
movq -24(%rsp), %rcx
movq -16(%rsp), %rsi
xorq %rcx, %rax
xorq %rsi, %rdx
orq %rdx, %rax
sete %al
xorl %edx, %edx
movzbl %al, %eax
negq %rax
adcq $0, %rdx
movq %rax, %xmm2
negq %rdx
movq %rdx, -40(%rsp)
movhps -40(%rsp), %xmm2
movdqa %xmm2, %xmm0
ret
with this patch we now generate:
pcmpeqd %xmm0, %xmm1
pshufd $177, %xmm1, %xmm0
pand %xmm1, %xmm0
pshufd $78, %xmm0, %xmm1
pand %xmm1, %xmm0
ret
performing a V2DI comparison, followed by a shuffle and pand. With
-O2 -msse4 we take advantage of SSE4.1's pcmpeqq:
pcmpeqq %xmm0, %xmm1
pshufd $78, %xmm1, %xmm0
pand %xmm1, %xmm0
ret
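The V1TI sequence extends the same idea. After the V2DI step, `pshufd $78` (0b01'00'11'10, selecting lanes 2,3,0,1) swaps the two 64-bit halves, so the final pand leaves all ones only if all four 32-bit lanes matched. This hypothetical lane model repeats its helpers so that it stands alone. It is a sketch, not GCC's expander.

```cpp
#include <array>
#include <cstdint>

// Portable model of an SSE2 register as four 32-bit lanes (lane 0 = lowest).
using V4SI = std::array<uint32_t, 4>;

V4SI pcmpeqd(const V4SI& a, const V4SI& b) {  // per-lane equality mask
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
  return r;
}

V4SI pshufd(const V4SI& a, unsigned imm) {    // lane i <- lane ((imm >> 2*i) & 3)
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = a[(imm >> (2 * i)) & 3];
  return r;
}

V4SI pand(const V4SI& a, const V4SI& b) {     // lane-wise AND
  V4SI r{};
  for (int i = 0; i < 4; ++i) r[i] = a[i] & b[i];
  return r;
}

// V1TI equality: V2DI equality first, then combine the two 64-bit halves.
V4SI eq_v1ti_sse2(const V4SI& x, const V4SI& y) {
  V4SI cmp = pcmpeqd(x, y);
  V4SI v2di = pand(pshufd(cmp, 177), cmp);  // per-64-bit-lane equality
  return pand(pshufd(v2di, 78), v2di);      // AND with the swapped halves
}
```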
2022-05-13 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/sse.md (vec_cmpeqv2div2di): Enable for TARGET_SSE2.
For !TARGET_SSE4_1, expand as a V4SI vector comparison, followed
by a pshufd and pand.
(vec_cmpeqv1tiv1ti): New define_expand implementing V1TImode
vector equality as a V2DImode vector comparison (see above),
followed by a pshufd and pand.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-veq.c: New test case.
* gcc.target/i386/sse2-v1ti-vne.c: New test case.
A few tests need not be restricted to 'lp64', so remove the restriction.
A few of those need a simple change to the DejaGnu directives to suppress
'-mcmodel' flags for '-m32'.
2022-05-13 Paul A. Clarke <pc@us.ibm.com>
gcc/testsuite
* g++.target/powerpc/pr65240-1.C: Adjust DejaGnu directives.
* g++.target/powerpc/pr65240-2.C: Likewise.
* g++.target/powerpc/pr65240-3.C: Likewise.
* g++.target/powerpc/pr65240-4.C: Likewise.
* g++.target/powerpc/pr65242.C: Likewise.
* g++.target/powerpc/pr67211.C: Likewise.
* g++.target/powerpc/pr69667.C: Likewise.
* g++.target/powerpc/pr71294.C: Likewise.
This patch implements the missed optimization enhancement PR 83907,
by handling memset with a constant byte value in tree-ssa's strlen
optimization pass. Effectively, this treats memset(dst,'x',3) as
it would memcpy(dst,"xxx",3).
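Here is an illustrative function of the kind this targets. It is a sketch, not the pr83907-1.c test. Whether a particular GCC version actually folds the strlen call depends on the options used.

```cpp
#include <cstring>

// After the memset, the strlen pass can treat buf as if it had been written
// by memcpy(buf, "xxx", 3); together with the terminating store, the strlen
// call becomes a candidate for folding to the constant 3.
std::size_t memset_then_strlen() {
  char buf[8];
  std::memset(buf, 'x', 3);
  buf[3] = '\0';
  return std::strlen(buf);
}
```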
This patch also includes a tweak to handle_store to address another
missed optimization observed in the related test case pr83907-2.c.
The consecutive byte stores to memory get coalesced into a vector
write of a vector const, but unfortunately tree-ssa-strlen's
handle_store didn't previously handle the (unusual) case where the
stored "string" starts with a zero byte but also contains non-zero
bytes.
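The handle_store case concerns source like the following hypothetical function. The first stored byte is the terminator, so the length is 0 even though later bytes are non-zero. GCC may coalesce such byte stores into one wider constant store, but that is not guaranteed for this exact snippet.

```cpp
#include <cstring>

// A run of byte stores that starts with a zero byte but also stores
// non-zero bytes; strlen only needs the first one.
std::size_t zero_first_byte_strlen() {
  char buf[16];
  buf[0] = '\0';
  buf[1] = 'a';
  buf[2] = 'b';
  buf[3] = '\0';
  return std::strlen(buf);
}
```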
2022-05-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/83907
* tree-ssa-strlen.cc (handle_builtin_memset): Record a strinfo
for memset with a constant char value.
(handle_store): Improved handling of stores with a first byte
of zero, but not storing_all_zeros_p.
gcc/testsuite/ChangeLog
PR tree-optimization/83907
* gcc.dg/tree-ssa/pr83907-1.c: New test case.
* gcc.dg/tree-ssa/pr83907-2.c: New test case.
The Zbb support has introduced ctz and clz to the backend, but some
transformations in GCC need to know what the value of c[lt]z at zero
is. This affects how the optab is generated and may suppress use of
CLZ/CTZ in tree passes.
Among other things, this is needed for the transformation of
table-based ctz-implementations, such as in deepsjeng, to work
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838).
Prior to this change, the test case from PR90838 would compile to the
following on RISC-V targets with Zbb:
myctz:
lui a4,%hi(.LC0)
ld a4,%lo(.LC0)(a4)
neg a5,a0
and a5,a5,a0
mul a5,a5,a4
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
srli a5,a5,58
sh2add a5,a5,a4
lw a0,0(a5)
ret
After this change, we get:
myctz:
ctz a0,a0
andi a0,a0,63
ret
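The sketch below is a table-based ctz in the style of the PR90838 test. It assumes the commonly used 64-bit de Bruijn multiplier 0x03f79d71b4cb0a89; the actual test's constant and table are not reproduced here. The table is built at startup rather than written out by hand. `myctz_zbb_model` is a hypothetical model of the new RISC-V sequence. Zbb's ctz yields XLEN (64) for a zero input, and masking with 63 presumably maps that to 0, the value the table returns for x == 0. This is why GCC needs to know the value of ctz at zero.

```cpp
#include <array>
#include <cstdint>

// De Bruijn constant widely used for 64-bit bit scans (an assumption here).
constexpr uint64_t kDeBruijn64 = 0x03f79d71b4cb0a89ULL;

// Build the table so that table[((1 << i) * kDeBruijn64) >> 58] == i.
std::array<int, 64> make_ctz_table() {
  std::array<int, 64> t{};
  for (int i = 0; i < 64; ++i)
    t[((uint64_t{1} << i) * kDeBruijn64) >> 58] = i;
  return t;
}

// Table-based ctz: isolate the lowest set bit with x & -x, multiply,
// and use the top 6 bits of the product as the table index.
int myctz_table(uint64_t x) {
  static const std::array<int, 64> table = make_ctz_table();
  return table[((x & -x) * kDeBruijn64) >> 58];
}

// Model of "ctz a0,a0; andi a0,a0,63": ctz of zero is 64 (XLEN),
// and the mask turns that into 0 to match the table's result.
int myctz_zbb_model(uint64_t x) {
  int c = (x == 0) ? 64 : __builtin_ctzll(x);
  return c & 63;
}
```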
Testing this with deepsjeng_r (from SPEC 2017) against QEMU, this
shows a clear reduction in dynamic instruction count:
- before 1961888067076
- after 1907928279874 (2.75% reduction)
This also merges the various target-specific test-cases (for x86-64,
aarch64 and riscv) within gcc.dg/pr90838.c.
This extends the macros (i.e., effective-target keywords) used in
testing (lib/target-supports.exp) to reliably distinguish between RV32
and RV64 via __riscv_xlen (i.e., the integer register bitwidth):
testing for ILP32 could be misleading (as ILP32 is a valid memory
model for 64-bit systems).
gcc/ChangeLog:
* config/riscv/riscv.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
(CTZ_DEFINED_VALUE_AT_ZERO): Same.
* doc/sourcebuild.texi: Add documentation for RISC-V specific
test target keywords.
gcc/testsuite/ChangeLog:
* gcc.dg/pr90838.c: Add additional flags (dg-additional-options)
when compiling for riscv64 and subsume gcc.target/aarch64/pr90838.c
and gcc.target/i386/pr95863-2.c.
* gcc.target/aarch64/pr90838.c: Removed.
* gcc.target/i386/pr95863-2.c: Removed.
* lib/target-supports.exp: Recognize RV32 or RV64 via XLEN.
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Co-authored-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
The non-member swap for std::exception_ptr is in a nested namespace and
so can only be found by ADL currently. Add a using-declaration so that
qualified std::swap calls will use the std::exception_ptr::swap member,
instead of the generic std::swap.
There's no new test for this, because the generic std::swap works; it
just does more work than is necessary.
Also tell Doxygen to replace the __exception_ptr namespace with
"__unspecified__" in the generated docs, so the real name is not
documented.
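The lookup effect of the using-declaration can be shown outside namespace std, since user code may not add overloads to std. In this sketch, `lib`, `lib::detail` and `handle` are hypothetical stand-ins for std, std::__exception_ptr and std::exception_ptr. A qualified `lib::swap(a, b)` only sees the cheap overload because of the using-declaration, and as a non-template it wins over the generic template.

```cpp
#include <utility>

namespace lib {
  // Generic fallback, playing the role of the generic std::swap.
  template<typename T>
  void swap(T& a, T& b) {
    T tmp(std::move(a));
    a = std::move(b);
    b = std::move(tmp);
  }

  // Plays the role of the nested std::__exception_ptr namespace.
  namespace detail {
    struct handle {
      void* p = nullptr;
      static inline int member_swaps = 0;  // counts uses of the cheap swap
      void swap(handle& other) noexcept {
        ++member_swaps;
        std::swap(p, other.p);
      }
    };
    // Non-member overload: on its own, only reachable through ADL.
    inline void swap(handle& a, handle& b) noexcept { a.swap(b); }
  }

  using detail::handle;
  // The fix: also declare the overload in the enclosing namespace, so a
  // qualified lib::swap(a, b) finds it and prefers it over the template.
  using detail::swap;
}
```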
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (PREDEFINED): Replace __exception_ptr
with "__unspecified__".
* libsupc++/exception_ptr.h: Improve doxygen docs.
(__exception_ptr::swap): Also declare in namespace std.
This allows std::rethrow_if_nested to work with -fno-rtti by not
attempting the dynamic_cast if it requires RTTI, since that's ill-formed
with -fno-rtti. The cast will still work if a static upcast to
std::nested_exception is allowed.
Also use if-constexpr to avoid the compile-time overload resolution (and
SFINAE) and run-time dispatching for std::rethrow_if_nested and
std::throw_with_nested.
Also add better doxygen comments throughout the file.
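Here is a simplified C++17 sketch of the approach; it is not the libstdc++ implementation. `rethrow_if_nested_sketch` uses if-constexpr to take a static upcast when `std::nested_exception` is an accessible, unambiguous base. It falls back to `dynamic_cast`, which needs RTTI, only for other polymorphic types and only when RTTI is enabled. Testing both `__cpp_rtti` and `__GXX_RTTI` is an assumption about which macro a given compiler defines.

```cpp
#include <exception>
#include <memory>
#include <type_traits>

template<typename Ex>
void rethrow_if_nested_sketch(const Ex& ex) {
  if constexpr (std::is_convertible_v<const Ex*, const std::nested_exception*>) {
    // Accessible, unambiguous base: a static upcast, no RTTI required.
    static_cast<const std::nested_exception&>(ex).rethrow_nested();
  }
#if defined(__cpp_rtti) || defined(__GXX_RTTI)
  else if constexpr (std::is_polymorphic_v<Ex>
                     && !std::is_base_of_v<std::nested_exception, Ex>) {
    // Only this branch needs dynamic_cast, so it is compiled out with -fno-rtti.
    if (auto* nested = dynamic_cast<const std::nested_exception*>(std::addressof(ex)))
      nested->rethrow_nested();
  }
#endif
  // Otherwise (non-polymorphic type, or an inaccessible/ambiguous base): no effect.
}
```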
libstdc++-v3/ChangeLog:
* libsupc++/nested_exception.h (throw_with_nested) [C++17]: Use
if-constexpr instead of tag dispatching.
(rethrow_if_nested) [C++17]: Likewise.
(rethrow_if_nested) [!__cpp_rtti]: Do not use dynamic_cast if it
would require RTTI.
* testsuite/18_support/nested_exception/rethrow_if_nested-term.cc:
New test.
When folding, the LHS has not been set, so we should be checking the type of
op1. We should also make sure op1 is not undefined.
PR tree-optimization/105597
gcc/
* range-op.cc (operator_minus::lhs_op1_relation): Use op1 instead
of the lhs and make sure it is not undefined.
gcc/testsuite/
* gcc.dg/pr105597.c: New.
For a non-descriptor array, map(A(n:m)) was mapped as
map(tofrom:A[n-1] [len: ...]) map(alloc:A [pointer assign, bias: ...])
with this patch, it is changed to
map(tofrom:A[n-1] [len: ...]) map(firstprivate:A [pointer assign, bias: ...])
The latter avoids an alloc, and also avoids the race condition with
nowait in the enclosed testcase. (Note: pedantically, the testcase is
invalid since OpenMP 5.1, violating the map clause restriction at [354:10-13].)
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
GOMP_MAP_POINTER for the pointer attachment.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
Similar to 37e65643d3 ("testsuite/101269: fix testcase when used with
-m32"), RISC-V needs to be told not to put symbols in the
sdata/srodata/sbss sections.
gcc/testsuite/ChangeLog
* gcc.dg/debug/btf/btf-datasec-1.c: Don't use small data on RISC-V.
Re-using some common things like EQ_EXPR and other relationals made
certain things easier, but complicated debugging and added extra overhead
when accessing lookup tables. With forthcoming additional relation types,
it makes more sense to simply have a distinct relation kind.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Use new VREL_*
enumerated values.
* gimple-range-path.cc (maybe_register_phi_relation): Ditto.
* range-op.cc (*::lhs_op1_relation): Return relation_kind, and use
new VREL enumerated values.
(*::lhs_op2_relation): Ditto.
(*::op1_op2_relation): Ditto.
(*::fold_range): Use new VREL enumerated values.
(minus_op1_op2_relation_effect): Ditto.
(range_relational_tests): Ditto.
* range-op.h (fold_range, op1_range, op2_range): Use VREL_VARYING.
(lhs_op1_relation, lhs_op2_relation, op1_op2_relation): Return
relation_kind.
(*_op1_op2_relation): Return relation_kind.
(relop_early_resolve): Use VREL_UNDEFINED.
* value-query.cc (range_query::query_relation): Use VREL_VARYING.
* value-relation.cc (VREL_LAST): Change enumerated value.
(vrel_range_assert): Delete.
(print_relation): Remove range assert.
(rr_negate_table): Adjust table to use new enumerated values.
(relation_negate): Remove range assert.
(rr_swap_table): Adjust.
(relation_swap): Remove range assert.
(rr_intersect_table): Adjust.
(relation_intersect): Remove range assert.
(rr_union_table): Adjust.
(relation_union): Remove range assert.
(rr_transitive_table): Adjust.
(relation_transitive): Remove range assert.
(equiv_oracle::query_relation): Use new VREL enumerated values.
(equiv_oracle::register_relation): Ditto.
(relation_oracle::register_stmt): Ditto.
(dom_oracle::set_one_relation): Ditto.
(dom_oracle::register_transitives): Ditto.
(dom_oracle::query_relation): Ditto.
(path_oracle::register_relation): Ditto.
(path_oracle::query_relation): Ditto.
* value-relation.h (enum relation_kind_t): New relation_kind.
(*_op1_op2_relation): Adjust prototypes.
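With a dedicated, dense enum, negation and operand swapping become plain table lookups indexed by the relation itself. The following is a hypothetical model only. The enumerator order, the `VREL_LAST` value, and `model_negate`/`model_swap` are assumptions for illustration and do not reproduce value-relation.h.

```cpp
#include <array>

// Hypothetical dense relation enum in the VREL_* style.
enum relation_kind {
  VREL_VARYING,    // no known relation
  VREL_UNDEFINED,  // impossible relation
  VREL_LT, VREL_LE, VREL_GT, VREL_GE, VREL_EQ, VREL_NE,
  VREL_LAST
};

// !(a < b) is (a >= b), and so on.
constexpr std::array<relation_kind, VREL_LAST> negate_table = {
  VREL_VARYING, VREL_UNDEFINED,
  VREL_GE /* LT */, VREL_GT /* LE */, VREL_LE /* GT */, VREL_LT /* GE */,
  VREL_NE /* EQ */, VREL_EQ /* NE */
};

// (a < b) is (b > a), and so on.
constexpr std::array<relation_kind, VREL_LAST> swap_table = {
  VREL_VARYING, VREL_UNDEFINED,
  VREL_GT, VREL_GE, VREL_LT, VREL_LE, VREL_EQ, VREL_NE
};

constexpr relation_kind model_negate(relation_kind r) { return negate_table[r]; }
constexpr relation_kind model_swap(relation_kind r) { return swap_table[r]; }
```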
Union_ returns a boolean indicating if the operation changes the range.
Also optimize the common single-pair UNION single-pair case.
* gimple-range-edge.cc (calc_switch_ranges): Check union return value.
* value-range.cc (irange::legacy_verbose_union_): Add return value.
(irange::irange_single_pair_union): New.
(irange::irange_union): Add return value.
* value-range.h (class irange): Adjust prototypes.
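The sketch below is a hypothetical model of the single-pair fast path; it is not GCC's irange. A range is a sorted list of disjoint, non-adjacent [lo, hi] pairs, and `union_single_pair` reports whether the union changed the range.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Pair { int lo, hi; };

// Hypothetical multi-pair range: sorted, disjoint, non-adjacent pairs.
struct IntRange {
  std::vector<Pair> pairs;

  // Precondition: *this holds exactly one pair. Returns true if *this changed.
  bool union_single_pair(const Pair& r) {
    Pair& l = pairs[0];
    if (r.lo >= l.lo && r.hi <= l.hi)
      return false;                        // r already contained: no change
    // Widen to 64 bits so the adjacency test cannot overflow.
    if (int64_t{r.lo} <= int64_t{l.hi} + 1 && int64_t{l.lo} <= int64_t{r.hi} + 1) {
      l.lo = std::min(l.lo, r.lo);         // overlapping or adjacent: merge
      l.hi = std::max(l.hi, r.hi);
      return true;
    }
    // Disjoint: keep both pairs in ascending order (l is not used after this).
    if (r.lo < l.lo) pairs.insert(pairs.begin(), r);
    else pairs.push_back(r);
    return true;
  }
};
```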
Return true if the intersection of ranges changed the original value.
Speed up the case when there is no change by calling an efficient
contains routine.
* value-range.cc (irange::legacy_verbose_intersect): Add return value.
(irange::irange_contains_p): New.
(irange::irange_intersect): Add return value.
* value-range.h (class irange): Adjust prototypes.
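In a similar hypothetical multi-pair model (not GCC's irange), the no-change case for intersecting with a single pair can be detected cheaply. Because the pairs are sorted, only the outermost bounds need checking before any clipping is done.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Pair { int lo, hi; };

// Hypothetical multi-pair range: sorted, disjoint pairs.
struct IntRange {
  std::vector<Pair> pairs;

  // True if every value of *this lies within r; sorted pairs mean only
  // the first lower bound and last upper bound matter.
  bool contained_in(const Pair& r) const {
    return !pairs.empty() && r.lo <= pairs.front().lo && pairs.back().hi <= r.hi;
  }

  // Intersect with a single pair; returns true if *this changed.
  bool intersect(const Pair& r) {
    if (pairs.empty() || contained_in(r))
      return false;                        // fast path: nothing to do
    std::vector<Pair> out;
    for (const Pair& p : pairs) {
      int lo = std::max(p.lo, r.lo), hi = std::min(p.hi, r.hi);
      if (lo <= hi) out.push_back({lo, hi});
    }
    pairs = std::move(out);                // may become empty (undefined)
    return true;
  }
};
```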
The "is_current" status is returned by parameter, but it was also being
returned as the function's value, instead of TRUE if NAME had a global range
and FALSE if it did not.
* gimple-range-cache.cc (ranger_cache::get_global_range): Return the
had_global value instead.
We use the relation between op1 and op2 to help fold a statement, but
it was not provided to the lhs_op1_relation and lhs_op2_relation routines
to determine if it also creates a relation between the LHS and either operand.
gcc/
PR tree-optimization/104547
* gimple-range-fold.cc (fold_using_range::range_of_range_op): Add
the op1/op2 relation to the relation call.
* range-op.cc (*::lhs_op1_relation): Add param.
(*::lhs_op2_relation): Ditto.
(operator_minus::lhs_op1_relation): New.
(range_relational_tests): Add relation param.
* range-op.h (lhs_op1_relation, lhs_op2_relation): Adjust prototype.
gcc/testsuite/
* g++.dg/pr104547.C: New.
Internal-linkage entity mangling is entirely implementation defined --
there's no ABI issue. Let's not mangle in any module attachment to
them; it makes the symbols unnecessarily longer.
gcc/cp/
* mangle.cc (maybe_write_module): Check external linkage.
gcc/testsuite/
* g++.dg/modules/mod-sym-4.C: New.
VRP currently searches the ssa_name list for globals to export after it
finishes running. Recent changes have VRP calling a side-effect routine for
each stmt during the walk. This change simply exports globals as they are
calculated for the final time during the walk.
* gimple-range.cc (gimple_ranger::register_side_effects): First check
if the DEF should be exported as a global.
* tree-vrp.cc (rvrp_folder::pre_fold_bb): Process PHI side effects,
which will export globals.
(execute_ranger_vrp): Remove call to export_global_ranges.
Add modes to range_from_dom such that we can simply query, or adjust the
cache and deal with multiple predecessor blocks.
* gimple-range-cache.cc (ranger_cache::ranger_cache): Start with
worklist truncated.
(ranger_cache::entry_range): Add rfd_mode parameter.
(ranger_cache::exit_range): Ditto.
(ranger_cache::edge_range): New. Incorporate from range_on_edge.
(ranger_cache::range_of_expr): Adjust call to entry_range.
(ranger_cache::range_on_edge): Split to edge_range and call.
(ranger_cache::fill_block_cache): Always invoke range_from_dom.
(ranger_cache::range_from_dom): Make reentrant, add search mode, handle
multiple predecessors.
(ranger_cache::update_to_nonnull): Adjust call to exit_range.
* gimple-range-cache.h (ranger_cache): Add enum rfd_mode. Adjust
prototypes.
libstdc++-v3/ChangeLog:
* include/bits/ostream_insert.h: Mark helper functions as
undocumented by Doxygen.
* include/bits/stl_algo.h: Use markdown for formatting and mark
helper functions as undocumented.
* include/bits/stl_numeric.h: Likewise.
* include/bits/stl_pair.h (pair): Add @headerfile.
Add @headerfile and @since tags. Improve grouping of non-member
functions via @relates tags.
Mark the std::pair base class of std::sub_match as undocumented, so that
the docs don't show all the related non-member functions as part of the
sub_match API. Use a new macro to re-add the data members for Doxygen
only.
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (PREDEFINED): Define macro
_GLIBCXX_DOXYGEN_ONLY to expand its argument.
* include/bits/c++config (_GLIBCXX_DOXYGEN_ONLY): Define.
* include/bits/regex.h: Improve doxygen docs.
* include/bits/regex_constants.h: Likewise.
* include/bits/regex_error.h: Likewise.
libstdc++-v3/ChangeLog:
* include/std/atomic: Suppress doxygen docs for
implementation details.
* include/bits/atomic_base.h: Likewise.
* include/bits/shared_ptr_atomic.h: Use markdown. Fix grouping
so that std::atomic is not added to the pointer abstractions
group.
Use macros to open and close the inline namespace _V2 that is used for
ABI versioning of individual components such as chrono::system_clock.
This allows the namespace to be hidden in the docs generated by Doxygen,
so that we document std::foo instead of std::_V2::foo.
This also makes it easy to remove that namespace entirely for the
gnu-versioned-namespace build, where everything is already versioned as
std::__8 and there are no backwards compatibility guarantees.
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (PREDEFINED): Expand new macros to
nothing.
* include/bits/c++config (_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE)
(_GLIBCXX_END_INLINE_ABI_NAMESPACE): Define new macros.
* include/bits/algorithmfwd.h (_V2::__rotate): Use new macros
for the namespace.
* include/bits/chrono.h (chrono::_V2::system_clock): Likewise.
* include/bits/stl_algo.h (_V2::__rotate): Likewise.
* include/std/condition_variable (_V2::condition_variable_any):
Likewise.
* include/std/system_error (_V2::error_category): Likewise.
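The following sketch uses hypothetical `DEMO_*` macros and a `V2` namespace. The real libstdc++ macros and their exact definitions are not reproduced here, and `_V2`-style names are reserved for the implementation. With the default configuration, the macros open and close an inline namespace. Defining the hypothetical `DEMO_VERSIONED_NAMESPACE` makes them expand to nothing, just as a Doxygen PREDEFINED entry could.

```cpp
#ifndef DEMO_VERSIONED_NAMESPACE
// Default build: keep the ABI-versioning inline namespace.
# define DEMO_BEGIN_INLINE_ABI_NAMESPACE(X) inline namespace X {
# define DEMO_END_INLINE_ABI_NAMESPACE(X) }
#else
// Versioned-namespace build (or documentation): drop the extra namespace.
# define DEMO_BEGIN_INLINE_ABI_NAMESPACE(X)
# define DEMO_END_INLINE_ABI_NAMESPACE(X)
#endif

namespace demo {
namespace chrono {
DEMO_BEGIN_INLINE_ABI_NAMESPACE(V2)
  struct system_clock {
    static constexpr int abi_version = 2;
  };
DEMO_END_INLINE_ABI_NAMESPACE(V2)
} // namespace chrono
} // namespace demo
```

Because the namespace is inline, users can write `demo::chrono::system_clock`, while the mangled symbol names still carry `V2`.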
Before Doxygen version 1.9.2, the GROUP_NESTED_COMPOUNDS option is broken
(see https://github.com/doxygen/doxygen/issues/8638 for more details) and
classes are not added to the correct groups by @ingroup and @addtogroup.
Also remove the obsolete CLASS_DIAGRAMS option that causes a warning.
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (GROUP_NESTED_COMPOUNDS): Set to NO.
(CLASS_DIAGRAMS): Remove obsolete option.
Reverse iteration over blocks, in gimple-harden-conditionals.cc, was
supposed to avoid visiting blocks introduced by hardening and
introducing further reversed conditionals and traps for them, but
newly-created blocks may be inserted before the current block, as
shown by the PR105455 testcase.
Use an auto_sbitmap to gather preexisting blocks that need visiting.
for gcc/ChangeLog
* gimple-harden-conditionals.cc: Include sbitmap.h.
(pass_harden_conditional_branches::execute): Skip new blocks.
(pass_harden_compares::execute): Likewise.
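The fix can be modelled as snapshotting the set of blocks that exist before the walk. In this hypothetical sketch, not GCC's basic_block API, a `std::list<int>` of block ids stands in for the block chain and a `std::vector<bool>` for the `auto_sbitmap`. Hardening a block inserts a new block immediately before it, which is exactly the block a reverse walk would reach next.

```cpp
#include <list>
#include <vector>

// Precondition: every id in chain is < next_id.
std::vector<int> harden_blocks(std::list<int>& chain, int& next_id) {
  // Snapshot of the blocks that existed before the pass started.
  std::vector<bool> preexisting(next_id, false);
  for (int id : chain) preexisting[id] = true;

  std::vector<int> visited;
  // Reverse walk from the last block towards the first.
  for (auto it = chain.end(); it != chain.begin();) {
    --it;
    int id = *it;
    if (id >= static_cast<int>(preexisting.size()) || !preexisting[id])
      continue;                   // created by this pass: skip it
    visited.push_back(id);
    chain.insert(it, next_id++);  // new block lands *before* the current one
  }
  return visited;
}
```

Without the snapshot check, each newly inserted block would be visited next and would get its own new block, so the walk would never finish.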
The compiler is allowed to assume it can access String bounds, such as
the prefix passed to Get_External_Name, even in circumstances in which
the prefix is not going to be used and has_prefix is false, so, from
the C side, we have to build a proper String_Template for the
String_Pointer.
for gcc/ada/ChangeLog
* gcc-interface/decl.cc (is_cplusplus_method): Build proper
String for Get_External_Name.
Test for the validity checking performed on nonstandard booleans
annotated with the "hardbool" Machine_Attribute pragma.
for gcc/testsuite/ChangeLog
* gnat.dg/hardbool.ads: New.
* gnat.dg/hardbool.adb: New.
Vector operations in MVE must be aligned to the element size, so if we
are asked for a misaligned move in a wider mode we must recast it to a
form suitable for the known alignment (larger elements have better
address offset ranges, so there is some advantage to using wider
element sizes if possible). Whilst fixing this, also rework the
predicates used for validating operands - the Neon predicates are
not right for MVE.
gcc/ChangeLog:
PR target/105463
* config/arm/mve.md (*movmisalign<mode>_mve_store): Use
mve_memory_operand.
(*movmisalign<mode>_mve_load): Likewise.
* config/arm/vec-common.md (movmisalign<mode>): Convert to generator
form...
(@movmisalign<mode>): ... thus. Use generic predicates and then
rework operands if they are not valid. For MVE rework to a
narrower element size if the alignment is not high enough.
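The recasting rule can be sketched as a small helper. `mve_element_size_for` is hypothetical and not part of GCC. It assumes element sizes of 4, 2 and 1 bytes and picks the widest one that the known alignment allows, since wider elements have better address offset ranges.

```cpp
// Returns the widest element size (in bytes) not exceeding the wanted one
// that the known alignment permits, assuming accesses must be aligned to
// their element size.
unsigned mve_element_size_for(unsigned wanted_elem_bytes, unsigned known_align_bytes) {
  unsigned size = wanted_elem_bytes;  // e.g. 4 for a V4SI move
  while (size > 1 && known_align_bytes % size != 0)
    size /= 2;                        // fall back to 2, then 1 byte elements
  return size;
}
```

For example, a V4SI move known only to be 2-byte aligned would be recast as a move with 2-byte (16-bit) elements.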
There are a couple of issues with the mve_vector_mem_operand function.
Firstly, SP is permitted as a register provided there is no write-back
operation. Secondly, there were some cases where 'strict' was not
being applied when checking which registers had been used.
gcc/ChangeLog:
* config/arm/arm.cc (mve_vector_mem_operand): Allow SP_REGNUM
when there is no write-back. Fix use when strict is true.
These patterns have been deprecated since GCC 4.8.
gcc/ChangeLog:
* config/xtensa/xtensa.md (extvsi, extvsi_internal, extzvsi,
extzvsi_internal): Rename from extv, extv_internal, extzv and
extzv_internal, respectively.
Most cases of VIEW_CONVERT_EXPRs involving reverse scalar storage order are
disqualified for SRA because they are storage_order_barrier_p, but you can
still have a VIEW_CONVERT_EXPR to a regular composite type being applied to
a component of a record type with reverse scalar storage order.
In this case the bypass for !useless_type_conversion_p in sra_modify_assign,
albeit already heavily guarded, triggers and may generate wrong code, so the
patch makes sure that it does so only when the SSO is the same on both sides.
gcc/
* tree-sra.cc (sra_modify_assign): Check that scalar storage order
is the same on the LHS and RHS before rewriting one with the model
of the other.
gcc/testsuite/
* gnat.dg/sso17.adb: New test.
This is a return convention mismatch coming from a discrepancy of the
Returns_By_Ref flag for the inherited function.
gcc/ada/
* sem_ch3.adb (Derive_Subprogram): For a function, also copy the
Returns_By_Ref flag from the parent.