Some of these tests take several minutes on a simulator like cris-elf,
so we can conditionally run fewer iterations. The testDiscreteDist
helper already supports custom sizes so we just need to make use of that
when { target simulator } matches.
The relevant code is sufficiently tested on other targets, so we're not
losing anything by only running a small number of iterations for sims.
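As a rough illustration of the idea (not the actual testsuite code), a test
can pick its iteration count from a macro that is defined only for simulator
targets. The macro name SIMULATOR_TEST, both counts, and the count_even
helper are assumptions made for this sketch.

```cpp
#include <cstddef>

// Hypothetical macro: assumed to be defined by the test harness only when
// { target simulator } matches.
#ifdef SIMULATOR_TEST
constexpr std::size_t iterations = 1000;      // short run for slow simulators
#else
constexpr std::size_t iterations = 100000;    // full run elsewhere
#endif

// Stand-in for a statistical check: counts how many of the first n values
// of a simple deterministic sequence are even.
std::size_t count_even(std::size_t n)
{
  std::size_t hits = 0;
  for (std::size_t i = 0; i < n; ++i)
    if ((i * 7 + 3) % 2 == 0)
      ++hits;
  return hits;
}
```

A test body would then call its helper with `iterations` rather than a
hard-coded size.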
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/random/bernoulli_distribution/operators/values.cc:
Run fewer iterations for simulator targets.
* testsuite/26_numerics/random/binomial_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/discrete_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/geometric_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/negative_binomial_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/poisson_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/uniform_int_distribution/operators/values.cc:
Likewise.
Improve and generalize rotate patterns. Rotates by more than half the
bitwidth of a register are canonicalized to rotate left. Many existing
shift patterns don't handle this case correctly, so add rotate left to
the shift iterator and convert rotate left into ror during assembly
output. Add missing zero_extend patterns for shifted BIC, ORN and EON.
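The equivalence that makes printing a rotate left as ror possible can be
sketched in plain C++: rotating a 32-bit value left by n gives the same
result as rotating it right by 32 - n. The function names below are
illustrative only and do not appear in the patch.

```cpp
#include <cstdint>

// Rotate right by n bits, 0 < n < 32.
constexpr std::uint32_t rotr32(std::uint32_t x, unsigned n)
{
  return (x >> n) | (x << (32 - n));
}

// Rotate left by n bits, 0 < n < 32, written as a rotate right by the
// complementary amount.
constexpr std::uint32_t rotl32_as_ror(std::uint32_t x, unsigned n)
{
  return rotr32(x, 32 - n);
}
```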
gcc/
* config/aarch64/aarch64.md
(and_<SHIFT:optab><mode>3_compare0): Support rotate left.
(and_<SHIFT:optab>si3_compare0_uxtw): Likewise.
(<LOGICAL:optab>_<SHIFT:optab><mode>3): Likewise.
(<LOGICAL:optab>_<SHIFT:optab>si3_uxtw): Likewise.
(one_cmpl_<optab><mode>2): Likewise.
(<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3): Likewise.
(<LOGICAL:optab>_one_cmpl_<SHIFT:optab>sidi_uxtw): New pattern.
(eor_one_cmpl_<SHIFT:optab><mode>3_alt): Support rotate left.
(eor_one_cmpl_<SHIFT:optab>sidi3_alt_ze): Likewise.
(and_one_cmpl_<SHIFT:optab><mode>3_compare0): Likewise.
(and_one_cmpl_<SHIFT:optab>si3_compare0_uxtw): Likewise.
(and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse): Likewise.
(and_<SHIFT:optab><mode>3nr_compare0): Likewise.
(*<optab>si3_insn_uxtw): Use SHIFT_no_rotate.
(rolsi3_insn_uxtw): New pattern.
* config/aarch64/iterators.md (SHIFT): Add rotate left.
(SHIFT_no_rotate): Add new iterator.
(SHIFT:shift): Print rotate left as ror.
(is_rotl): Add test for left rotate.
gcc/testsuite/
* gcc.target/aarch64/ror_2.c: New test.
* gcc.target/aarch64/ror_3.c: New test.
The --with-cpu/--with-arch configure option processing not only checks valid
arguments but also sets TARGET_CPU_DEFAULT with a CPU and extension bitmask.
This isn't used however since a --with-cpu is translated into a -mcpu option
which is processed as if written on the command-line (so TARGET_CPU_DEFAULT
is never accessed).
So remove all the complex processing and bitmask, and just validate the
option. Fix a bug that always reports valid architecture extensions as invalid.
As a result the CPU processing in aarch64.cc can be simplified.
gcc/
* config.gcc (aarch64*-*-*): Simplify --with-cpu and --with-arch
processing. Add support for architectural extensions.
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Remove
AARCH64_CPU_DEFAULT_FLAGS.
(TARGET_CPU_NBITS): Remove.
(TARGET_CPU_MASK): Remove.
* config/aarch64/aarch64.cc (AARCH64_CPU_DEFAULT_FLAGS): Remove define.
(get_tune_cpu): Assert CPU is always valid.
(get_arch): Assert architecture is always valid.
(aarch64_override_options): Cleanup CPU selection code and simplify logic.
(aarch64_option_restore): Remove unnecessary checks on tune.
This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and
omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as
asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect.
In contrast to the synchronous variants, the asynchronous functions have two
additional function parameters to allow the specification of task dependences:
int depobj_count
omp_depend_t *depobj_list
integer(c_int), value :: depobj_count
integer(omp_depend_kind), optional :: depobj_list(*)
The implementation splits the synchronous functions into two parts: (a) check
and (b) copy. Then (a) is used in the asynchronous functions for the sequential
part, and the actual copy process (b) is executed in a newly created task. The
sequential part (a) takes into account the requirements for the return values:
"The routine returns zero if successful. Otherwise, it returns a non-zero
value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7)
"An application can determine the number of inclusive dimensions supported by an
implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both
dst and src. The routine returns the number of dimensions supported by the
implementation for the specified device numbers. No copy operation is
performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8)
Due to asynchronicity an error is thrown if the asynchronous memcpy is not
successful (in contrast to the synchronous functions which use a return
value unequal to zero).
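The check/copy split can be sketched as follows. This is a simplified
host-only model, not the libgomp code: it ignores devices and task
dependences, the names memcpy_check, memcpy_copy and memcpy_async_sketch are
hypothetical, and the deferred copy is modelled as a std::function rather
than an OpenMP task.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>

// Part (a): validate the arguments up front and produce the return value.
// Returns zero on success and a non-zero value otherwise.
int memcpy_check(const void *dst, const void *src)
{
  return (dst != nullptr && src != nullptr) ? 0 : 1;
}

// Part (b): perform the copy itself.
void memcpy_copy(void *dst, const void *src, std::size_t length,
                 std::size_t dst_offset, std::size_t src_offset)
{
  std::memcpy(static_cast<char *>(dst) + dst_offset,
              static_cast<const char *>(src) + src_offset, length);
}

// Asynchronous flavour: run the check now and hand the copy back as a task
// body to be executed later. `task` is left untouched if the check fails.
int memcpy_async_sketch(void *dst, const void *src, std::size_t length,
                        std::size_t dst_offset, std::size_t src_offset,
                        std::function<void()> &task)
{
  int ret = memcpy_check(dst, src);
  if (ret == 0)
    task = [=] { memcpy_copy(dst, src, length, dst_offset, src_offset); };
  return ret;
}
```

The caller gets the return value from the sequential part immediately, while
the copy only happens once the task body is invoked.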
gcc/ChangeLog:
* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
target_memcpy_rect_async to omp_runtime_apis array.
libgomp/ChangeLog:
* libgomp.map: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* libgomp.texi: Both functions are now supported.
* omp.h.in: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* omp_lib.f90.in: Added interfaces for both new functions.
* omp_lib.h.in: Likewise.
* target.c (ialias_redirect): Added for GOMP_task.
(omp_target_memcpy): Restructured into check and copy part.
(omp_target_memcpy_check): New helper function for omp_target_memcpy and
omp_target_memcpy_async that checks requirements.
(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
omp_target_memcpy_async that performs the memcpy.
(omp_target_memcpy_async_helper): New helper function that is used in
omp_target_memcpy_async for the asynchronous task.
(omp_target_memcpy_async): Added.
(omp_target_memcpy_rect): Restructured into check and copy part.
(omp_target_memcpy_rect_check): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
requirements.
(omp_target_memcpy_rect_copy): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
the memcpy.
(omp_target_memcpy_rect_async_helper): New helper function that is used
in omp_target_memcpy_rect_async for the asynchronous task.
(omp_target_memcpy_rect_async): Added.
* task.c (ialias): Added for GOMP_task.
* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
This patch replaces libbid's implementations of clz and ctz for 32-bit and
64-bit inputs, which used several masks, with the corresponding builtins.
This will provide a better implementation,
especially on targets with clz/ctz instructions.
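For comparison, here is a minimal sketch of a mask-based count of leading
zeros next to the builtin. The mask-based function is written from scratch
for illustration and is not the removed libbid code; it assumes a non-zero
input and a 32-bit unsigned int.

```cpp
#include <cstdint>

// Binary-search count of leading zeros for a non-zero 32-bit value,
// using one mask per step.
int clz32_masks(std::uint32_t x)
{
  int n = 0;
  if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
  if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8; }
  if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4; }
  if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2; }
  if ((x & 0x80000000u) == 0) { n += 1; }
  return n;
}

// Same result via the GCC/Clang builtin; the result is undefined for x == 0.
int clz32_builtin(std::uint32_t x)
{
  return __builtin_clz(x);
}
```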
2022-05-06 Christophe Lyon <christophe.lyon@arm.com>
libgcc/config/libbid/ChangeLog:
* bid_binarydecimal.c (CLZ32_MASK16): Delete.
(CLZ32_MASK8): Delete.
(CLZ32_MASK4): Delete.
(CLZ32_MASK2): Delete.
(CLZ32_MASK1): Delete.
(clz32_nz): Use __builtin_clz.
(ctz32_1bit): Delete.
(ctz32): Use __builtin_ctz.
(CLZ64_MASK32): Delete.
(CLZ64_MASK16): Delete.
(CLZ64_MASK8): Delete.
(CLZ64_MASK4): Delete.
(CLZ64_MASK2): Delete.
(CLZ64_MASK1): Delete.
(clz64_nz): Use __builtin_clzl.
(ctz64_1bit): Delete.
(ctz64): Use __builtin_ctzl.
This patch adds support for trunc and extend operations between HF
mode (_Float16) and Decimal Floating Point formats (_Decimal32,
_Decimal64 and _Decimal128).
For simplicity we rely on the implicit conversions inserted by the
compiler between HF and SD/DF/TF modes. The existing bid*_to_binary*
and binary*_to_bid* functions are non-trivial and at this stage it is
not clear if there is a performance-critical use case involving _Float16
and _Decimal* formats.
The patch also adds two executable tests, to make sure the right
functions are called, available (link phase) and functional.
Tested on aarch64 and x86_64. The number of symbol matches in the
testcases includes the .global XXX to avoid having to match different
call instructions for different targets.
2022-05-04 Christophe Lyon <christophe.lyon@arm.com>
libgcc/ChangeLog:
* Makefile.in (D32PBIT_FUNCS): Add _hf_to_sd and _sd_to_hf.
(D64PBIT_FUNCS): Add _hf_to_dd and _dd_to_hf.
(D128PBIT_FUNCS): Add _hf_to_td and _td_to_hf.
libgcc/config/libbid/ChangeLog:
* bid_gcc_intrinsics.h (LIBGCC2_HAS_HF_MODE): Define according to
__LIBGCC_HAS_HF_MODE__.
(BID_HAS_HF_MODE): Define.
(HFtype): Define.
(__bid_extendhfsd): New prototype.
(__bid_extendhfdd): Likewise.
(__bid_extendhftd): Likewise.
(__bid_truncsdhf): Likewise.
(__bid_truncddhf): Likewise.
(__bid_trunctdhf): Likewise.
* _dd_to_hf.c: New file.
* _hf_to_dd.c: New file.
* _hf_to_sd.c: New file.
* _hf_to_td.c: New file.
* _sd_to_hf.c: New file.
* _td_to_hf.c: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/convert-dfp-2.c: New test.
* gcc.dg/torture/convert-dfp.c: New test.
These tests exercise exception handling with Decimal Floating-Point
types.
dfp-1.C and dfp-2.C check that thrown objects of such types are
properly caught, whether using C++ classes (decimalXX) or GCC
mode attributes.
dfp-saves-aarch64.C checks that such objects are properly restored,
and has to use the mode attribute trick because objects of decimalXX
class type cannot be assigned to a register variable.
2022-05-03 Christophe Lyon <christophe.lyon@arm.com>
gcc/testsuite/
* g++.dg/eh/dfp-1.C: New test.
* g++.dg/eh/dfp-2.C: New test.
* g++.dg/eh/dfp-saves-aarch64.C: New test.
Some tests for the BID format are currently restricted to i?86 and
x86_64, but they also pass on AArch64, so this patch enables them.
Since all these tests are related to the BID format, it seems useful
to introduce a new effective-target (dfp_bid) instead of adding
aarch64 to the current target list.
2022-04-28 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* doc/sourcebuild.texi (Decimal floating point attributes): Document
dfp_bid effective-target.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_dfp_bid): New.
* gcc.dg/dfp/bid-non-canonical-d128-1.c: Use dfp_bid
effective-target.
* gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise.
This patch copies all existing tests involving float/double/long
double types and replaces them with _Decimal32/_Decimal64/_Decimal128.
I thought it would be clearer/easier to maintain to do it this way
rather than adding tests for DFP types in the existing testcases,
except for func-ret-1.c and func-ret-3.c.
This makes sure all cases tested for traditional floating-point are
equally tested for decimal floating-point.
The patch also adds a test involving loading DFP values from memory.
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
gcc/testsuite/
* gcc.target/aarch64/aapcs64/aapcs64.exp: Support new dfp*.c tests.
* gcc.target/aarch64/aapcs64/func-ret-1.c: Add DFP tests.
* gcc.target/aarch64/aapcs64/func-ret-3.c: Add DFP tests.
* gcc.target/aarch64/aapcs64/type-def.h: Add DFP types.
* gcc.target/aarch64/aapcs64/dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/ice_dfp_5.c: New test.
* gcc.target/aarch64/aapcs64/test_align_dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/test_align_dfp-4.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_1.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_10.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_11.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_12.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_13.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_14.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_15.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_16.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_17.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_18.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_19.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_2.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_20.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_21.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_22.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_23.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_24.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_25.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_26.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_27.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_3.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_5.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_6.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_7.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_8.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_9.c: New test.
* gcc.target/aarch64/aapcs64/test_quad_double_dfp.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-10.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-11.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-12.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-13.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-14.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-16.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-2.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-3.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-4.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-5.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-6.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-8.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-9.c: New test.
The testcase in c-c++-common/dfp/pr39986.c detects if DFP constants
are correctly emitted in the assembly. However, AArch64 uses .word
instead of the expected .long directive. With this patch, we now
accept both.
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
gcc/testsuite/
* c-c++-common/dfp/pr39986.c: Accept .word directive.
DFP support on AArch64 relies on libgcc, so enable its DFP routines
for all AArch64 targets.
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
libgcc/
* config.host: Add t-dfprules to AArch64 targets.
This patch updates the aarch64 backend as needed to support DFP modes
(SD, DD and TD).
Changes v1->v2:
* Drop support for DFP modes in
aarch64_gen_{load||store}[wb]_pair as these are only used in
prologue/epilogue where DFP modes are not used. Drop the
changes to the corresponding patterns in aarch64.md, and
useless GPF_PAIR iterator.
* In aarch64_reinterpret_float_as_int, handle DDmode the same way
as DFmode (needed in case the representation of the
floating-point value can be loaded using mov/movk).
* In aarch64_float_const_zero_rtx_p, reject constants with DFP
mode: when X is zero, the callers want to emit either '0' or
'zr' depending on the context, which is not the way 0.0 is
represented in DFP mode (in particular fmov d0, #0 is not right
for DFP).
* In aarch64_legitimate_constant_p, accept DFP
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/aarch64/aarch64.cc
(aarch64_split_128bit_move): Handle DFP modes.
(aarch64_mode_valid_for_sched_fusion_p): Likewise.
(aarch64_classify_address): Likewise.
(aarch64_legitimize_address_displacement): Likewise.
(aarch64_reinterpret_float_as_int): Likewise.
(aarch64_float_const_zero_rtx_p): Likewise.
(aarch64_can_const_movi_rtx_p): Likewise.
(aarch64_anchor_offset): Likewise.
(aarch64_secondary_reload): Likewise.
(aarch64_rtx_costs): Likewise.
(aarch64_legitimate_constant_p): Likewise.
(aarch64_gimplify_va_arg_expr): Likewise.
(aapcs_vfp_sub_candidate): Likewise.
(aarch64_vfp_is_call_or_return_candidate): Likewise.
(aarch64_output_scalar_simd_mov_immediate): Likewise.
(aarch64_gen_adjusted_ldpstp): Likewise.
(aarch64_scalar_mode_supported_p): Accept DFP modes if enabled.
* config/aarch64/aarch64.md
(movsf_aarch64): Use SFD iterator and rename into
mov<mode>_aarch64.
(movdf_aarch64): Use DFD iterator and rename into
mov<mode>_aarch64.
(movtf_aarch64): Use TFD iterator and rename into
mov<mode>_aarch64.
(split pattern for move TF mode): Use TFD iterator.
* config/aarch64/iterators.md
(GPF_TF_F16_MOV): Add DFP modes.
(SFD, DFD, TFD): New iterators.
(GPF_TF): Add DFP modes.
(TX, DX, DX2): Likewise.
We should prefer the __UINT_LEAST16_TYPE__ and __UINT_LEAST32_TYPE__
macros, if available, so that we don't need all of <cstdint> in every
header that uses std::char_traits.
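The pattern can be sketched like this: use the predefined macro when the
compiler provides it and fall back to <cstdint> otherwise. The typedef name
least16_t is made up for this example and is not the name used in the
library.

```cpp
// Prefer the compiler-provided macro when it exists, so that <cstdint>
// is only pulled in as a fallback.
#ifdef __UINT_LEAST16_TYPE__
typedef __UINT_LEAST16_TYPE__ least16_t;
#else
# include <cstdint>
typedef std::uint_least16_t least16_t;
#endif

// The type must be able to hold every 16-bit value.
static_assert(least16_t(-1) >= 0xFFFF, "need at least 16 bits");
```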
libstdc++-v3/ChangeLog:
* include/bits/char_traits.h: Only include <cstdint> when
necessary.
* include/std/stacktrace: Use __UINTPTR_TYPE__ instead of
uintptr_t.
* src/c++11/cow-stdexcept.cc: Include <stdint.h>.
* src/c++17/floating_to_chars.cc: Likewise.
* testsuite/20_util/assume_aligned/1.cc: Include <cstdint>.
* testsuite/20_util/assume_aligned/3.cc: Likewise.
* testsuite/20_util/shared_ptr/creation/array.cc: Likewise.
Since the COW std::string was moved to its own header, we don't need the
atomic dispatch helpers in the definition of std::__cxx11::string. Move
the inclusion of the <ext/atomicity.h> header to <bits/cow_string.h>
where it's needed.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h: Do not include <ext/atomicity.h>
here.
* include/bits/cow_string.h: Include it here.
Currently the alias templates for std::pmr::vector, std::pmr::string
etc. are defined using a forward declaration for polymorphic_allocator.
This means you can't actually use the alias templates unless you also
include <memory_resource>. The rationale for that is that it's a fairly
large header, and most users don't need it. This isn't uncontroversial
though, and LWG 3681 questions whether it's even conforming.
This change adds a new <bits/memory_resource.h> header with the minimum
needed to use polymorphic_allocator and the std::pmr container aliases.
Including <memory_resource> is still necessary to use the program-wide
resource objects, or the pool resources or monotonic buffer resource.
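A small usage sketch of a std::pmr container follows. It uses a
monotonic_buffer_resource, so it includes <memory_resource> explicitly,
which remains necessary for that class. The function name sum_with_buffer
and the buffer size are arbitrary choices for this example.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Fill a std::pmr::vector that draws its storage from a stack buffer
// through a monotonic_buffer_resource, and return the sum of its elements.
int sum_with_buffer(int n)
{
  std::byte buffer[1024];
  std::pmr::monotonic_buffer_resource pool(buffer, sizeof buffer);
  std::pmr::vector<int> values(&pool);
  for (int i = 1; i <= n; ++i)
    values.push_back(i);
  int total = 0;
  for (int v : values)
    total += v;
  return total;
}
```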
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/memory_resource.h: New file.
* include/std/deque: Include <bits/memory_resource.h>.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/map: Likewise.
* include/std/memory_resource (pmr::memory_resource): Move to
new <bits/memory_resource.h> header.
(pmr::polymorphic_allocator): Likewise.
* include/std/regex: Likewise.
* include/std/set: Likewise.
* include/std/stacktrace: Likewise.
* include/std/string: Likewise.
* include/std/unordered_map: Likewise.
* include/std/unordered_set: Likewise.
* include/std/vector: Likewise.
* testsuite/21_strings/basic_string/types/pmr_typedefs.cc:
Remove <memory_resource> header and check construction.
* testsuite/23_containers/deque/types/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/forward_list/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/list/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/map/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multimap/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multiset/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/set/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/unordered_map/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_set/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/vector/pmr_typedefs.cc: Likewise.
* testsuite/28_regex/match_results/pmr_typedefs.cc: Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/variadic-tuple.C: Qualify function to avoid ADL
finding std::make_tuple.
The patch is a revised solution for PR middle-end/98865 incorporating
the feedback/suggestions from Richard Biener's review here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593928.html
Most significantly, this patch now performs the transformation/optimization
during RTL expansion, where the target's rtx_costs can be used to determine
whether the original multiplication (that may potentially be implemented by
a shift or lea) is cheaper than a negation and a bit-wise and.
Previously the expression (x>>63)*y would be compiled with -O2 as
shrq $63, %rdi
movq %rdi, %rax
imulq %rsi, %rax
but with this patch now produces:
sarq $63, %rdi
movq %rdi, %rax
andq %rsi, %rax
Likewise the expression (x>>63)*135 [that appears in a hot-spot of the
Botan AES-128 benchmark] was previously:
shrq $63, %rdi
leaq (%rdi,%rdi,8), %rdx
movq %rdx, %rax
salq $4, %rax
subq %rdx, %rax
now becomes:
movq %rdi, %rax
sarq $63, %rax
andl $135, %eax
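The identity behind the transformation can be checked in source form: when
one operand is known to be 0 or 1, multiplying by it is the same as masking
the other operand with its negation. The two helper names below are
illustrative only.

```cpp
#include <cstdint>

// Original form: a value known to be 0 or 1, multiplied by y.
std::uint64_t mul_form(std::uint64_t x, std::uint64_t y)
{
  return (x >> 63) * y;
}

// Rewritten form: negating the 0/1 value gives an all-zeros or all-ones
// mask, which is then combined with y using a bit-wise and.
std::uint64_t and_form(std::uint64_t x, std::uint64_t y)
{
  return (0 - (x >> 63)) & y;
}
```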
2022-05-19 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/98865
* expr.cc (expand_expr_real_2) [MULT_EXPR]: Expand X*Y as X&Y
when both X and Y are [0, 1], X*Y as X&-Y when Y is [0,1] and
likewise X*Y as -X&Y when X is [0,1] using tree_nonzero_bits.
gcc/testsuite/ChangeLog
PR middle-end/98865
* gcc.target/i386/pr98865.c: New test case.
These defines are no longer used once the rs6000 built-in
reworks were completed. Time to remove them.
There was a reference to RS6000_BTC_SPECIAL in a TODO comment
in rs6000-builtins.def. That comment remains, but I have updated
the comment to refer to "SPECIAL" processing, instead of having it
refer directly to the _BTC_SPECIAL macro.
2022-05-18 Will Schmidt <will_schmidt@vnet.ibm.com>
gcc/
* config/rs6000/rs6000-builtins.def: Rephrase
to remove RS6000_BTC_SPECIAL from comment.
* config/rs6000/rs6000.h (RS6000_BTC_UNARY, RS6000_BTC_BINARY,
RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY,
RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK,
RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS,
RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC,
RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP,
RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR,
RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID,
RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE,
RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT,
RS6000_BTM_ALWAYS): Delete.
This issue has recently been moved to Tentatively Ready, and seems
uncontroversial. This allows equality comparison with types that are
convertible to pmr::polymorphic_allocator, which fail deduction for the
existing equality operator.
libstdc++-v3/ChangeLog:
* include/std/memory_resource (polymorphic_allocator): Add
non-template equality operator, as proposed for LWG 3683.
* testsuite/20_util/polymorphic_allocator/lwg3683.cc: New test.
When forcing the condition to be split out from COND_EXPRs I see
a runtime failure of libgomp.fortran/atomic-19.f90 which can be
reduced to
!$omp atomic update, compare, capture
if (x == 69_2 - r) x = 6_8
v = x
being miscompiled, the difference being
- _13 = .ATOMIC_COMPARE_EXCHANGE (_9, _10, _11, 4, 0, 0);
- _14 = IMAGPART_EXPR <_13>;
- _15 = REALPART_EXPR <_13>;
- _16 = _14 != 0 ? _11 : _15;
- _2 = (integer(kind=4)) _16;
- v_17 = _2;
+ _14 = .ATOMIC_COMPARE_EXCHANGE (_10, _11, _12, 4, 0, 0);
+ _15 = IMAGPART_EXPR <_14>;
+ _16 = REALPART_EXPR <_14>;
+ _2 = (logical(kind=1)) _15;
+ _3 = (integer(kind=4)) _16;
+ v_17 = _3;
where one can see a missing COND_EXPR. It seems to be a latent
issue to me given the code can be exercised; it may just be missing
a 'need_new' testcase combined with 'cond_stmt'. Apparently
the if (cond_stmt) code is just to avoid creating a temporary
(and possibly to preserve the condition compute if used elsewhere
since the original stmt is going to be deleted). The following
makes the failure go away for me in my patched tree and it
also survives libgomp and gomp testing in an unpatched tree.
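The intended semantics of the compare-and-capture sequence can be modelled
with std::atomic. This is only an analogy for the GIMPLE above, with a
hypothetical function name: the captured value is the new value when the
exchange succeeded and the old value otherwise, which is the selection the
missing COND_EXPR performs.

```cpp
#include <atomic>

// If x == expected, store desired. In both cases return the value that x
// holds as a result of this atomic operation.
int compare_update_capture(std::atomic<int> &x, int expected, int desired)
{
  int old = expected;
  bool swapped = x.compare_exchange_strong(old, desired);
  // On failure `old` has been overwritten with the value actually found.
  return swapped ? desired : old;
}
```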
2022-05-13 Richard Biener <rguenther@suse.de>
* omp-expand.cc (expand_omp_atomic_cas): Do not short-cut
computation of the new value.
This function is no longer needed.
2022-05-19 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.cc (get_or_alloc_expression_id): Remove.
(add_to_value): Use get_expression_id.
(bitmap_insert_into_set): Likewise.
(bitmap_value_insert_into_set): Likewise.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Constant>: Deal with
a constant related to a return in a function specially.
* gcc-interface/trans.cc (Call_to_gnu): Use return slot optimization
if the target is a return object.
(gnat_to_gnu) <N_Object_Declaration>: Deal with a constant related
to a return in a function specially.
The rationale is that these entities are almost always the result of
expansion activities in the front-end, over which the user has very
limited control. These warnings can be restored by means of -gnatD.
gcc/ada/
* gcc-interface/utils.cc (gnat_pushdecl): Also set TREE_NO_WARNING
on the decl if Comes_From_Source is false for the associated node.
This alphabetizes the large switch statement, removes a useless nested
switch statement, an artificial fall through and adds a default return.
No functional changes.
gcc/ada/
* gcc-interface/trans.cc (gnat_gimplify_expr): Tidy up.
The issue arises when the unchecked union contains both a fixed part and
a variant part, and is subject to a full representation clause covering
all the components in all the variants, when the component clauses do not
align the variant boundaries with byte boundaries consistently.
gcc/ada/
* gcc-interface/decl.cc (components_to_record): Use NULL recursively
as P_GNU_REP_LIST for the innermost variant level in the unchecked
union case with a fixed part.
The message "No source file position information available" is displayed
in the bugbox when Current_Error_Node has no location, which is useless.
gcc/ada/
* gcc-interface/trans.cc (gnat_to_gnu): Do not set Current_Error_Node
to a node without location.
The front-end properly computes a linear elaboration order for them, but
there was a loophole in the handling of the delayed case.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Access_Subtype>: Also
skip the elaboration of the designated subtype when that of its base
type has been delayed.
This creates a couple of record subtypes pointing to each other through
access subtypes, and we break the circularity at the latter subtypes.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Record_Subtype>: If
it is a special subtype designated by an access subtype, then defer
the completion of incomplete types.
This makes it possible to pass the result to a C function directly.
gcc/ada/
* gcc-interface/utils.cc (unchecked_convert): Do not fold a string
constant if the target type is pointer to character.
We flag illegal pragma Elaborate with a call to Error_Msg on the pragma
argument, which in turn calls Set_Error_Posted on the enclosing
statement, i.e. on the pragma itself. The explicit call to
Set_Error_Posted on the pragma itself was redundant.
Cleanup related to handling of illegal code when detecting uninitialized
scalar objects.
gcc/ada/
* sem_prag.adb (Analyze_Pragma): Remove redundant call to
Set_Error_Posted.
When resolution of an expanded name fails, we call routine
Error_Missing_With_Of_Known_Unit which emits an error continuation
message (i.e. an error string starting with \\). However, for error
continuations to work properly there must be some prior error, because
continuation itself doesn't set flags like Serious_Errors_Detected.
Without these flags the problematic statement is not marked with
Error_Posted, which in turn is needed to prevent cascaded errors.
In particular, when unresolved procedure call uses a direct name or an
extended name with an unknown prefix, e.g.:
Unknown (1, 2, 3);
Unknown.Call (1, 2, 3);
then the N_Procedure_Call statements are marked with Error_Posted. But
when a call uses an extended name with a known prefix we failed to flag
the N_Procedure_Call with Error_Posted.
Found while improving the robustness of a feature that detects
uninitialized scalar objects.
gcc/ada/
* sem_ch8.adb (Find_Expanded_Name): Emit a main error message
before adding a continuation with the call to
Error_Missing_With_Of_Known_Unit.
gcc/ada/
* sem_ch13.adb (Build_Predicate_Functions): If a semantic error
has been detected then ignore Predicate_Failure aspect
specifications in the same way as is done for CodePeer and
SPARK. This avoids an internal compiler error if
Ancestor_Predicate_Function_Called is True but Result_Expr is
not an N_And_Then node (and is therefore unsuitable as an
argument in a call to Left_Opnd).
Now that finalization and return on the secondary stack are decoupled, the
transient scopes created because of the former need not necessarily manage
the secondary stack and trigger a violation of the associated restriction.
gcc/ada/
* exp_ch7.adb (Wrap_Transient_Declaration): Propagate Uses_Sec_Stack
to enclosing function if it does not return on the secondary stack.
* exp_ch6.adb (Expand_Call_Helper): Call Establish_Transient_Scope
with Manage_Sec_Stack set to True only when necessary.
* sem_res.adb (Resolve_Call): Likewise.
(Resolve_Entry_Call): Likewise.
Instead of using that of Original_Node (N) after rewriting, which does not
work if N had previously been rewritten.
gcc/ada/
* exp_ch4.adb (Narrow_Large_Operation): Preserve and reuse Etype.
When the prefix of an Access attribute is an explicit dereference of an
access parameter (or a renaming of such a dereference, or a subcomponent
of such a dereference), the context is a general access type to a
class-wide interface type, and an accessibility check must be generated,
the frontend silently skips generating an implicit type conversion to
force the displacement of the pointer to reference the secondary
dispatch table.
gcc/ada/
* exp_attr.adb (Add_Implicit_Interface_Type_Conversion): New
subprogram which factorizes code.
(Expand_N_Attribute_Reference): Call the new subprogram to add
the missing implicit interface type conversion.
In GNATprove mode we don't want predicate failure to pollute the
predicate expression extracted from the predicate function.
gcc/ada/
* sem_ch13.adb (Build_Predicate_Function): Ignore predicate
failure in GNATprove mode.
The run-time behavior of the Ada 2022 Predicate_Failure aspect was
incorrectly implemented. This could cause incorrect exception messages
at execution time in the case of a predicate check failure, as
demonstrated by ACATS test C324006. In addition, a new attribute
(Predicate_Expression) is defined in order to improve the FE/SPARK
interface.
gcc/ada/
* einfo-utils.ads, einfo-utils.adb: Delete Predicate_Function_M
function and Set_Predicate_Function_M procedure.
* einfo.ads: Delete comments for Is_Predicate_Function_M and
Predicate_Function_M functions. Add comment for new
Predicate_Expression function. Update comment describing
predicate functions.
* exp_util.ads, exp_util.adb (Make_Predicate_Call): Replace Mem
formal parameter with Static_Mem and Dynamic_Mem formals.
(Make_Predicate_Check): Delete Add_Failure_Expression and call
to it.
* exp_ch4.adb (Expand_N_In.Predicate_Check): Update
Make_Predicate_Call call to match profile change.
* gen_il-fields.ads: Delete Is_Predicate_Function_M field, add
Predicate_Expression field.
* gen_il-gen-gen_entities.adb: Delete Is_Predicate_Function_M
use, add Predicate_Expression use.
* sem_ch13.adb (Build_Predicate_Functions): Rename as singular,
not plural; we no longer build a Predicate_M function. Delete
Predicate_M references. Add new Boolean parameter for predicate
functions when needed. Restructure body of generated predicate
functions to implement required Predicate_Failure behavior and
to set new Predicate_Expression attribute. Remove special
treatment of raise expressions within predicate expressions.
* sem_util.ads (Predicate_Failure_Expression,
Predicate_Function_Needs_Membership_Parameter): New functions.
* sem_util.adb (Is_Current_Instance): Fix bugs which caused
wrong result.
(Is_Current_Instance_Reference_In_Type_Aspect): Delete
Is_Predicate_Function_M reference.
(Predicate_Failure_Expression): New function.
(Propagate_Predicate_Attributes): Delete Is_Predicate_Function_M
references.
The underlying issue is that the front-end does not create transient scopes
for return statements, so objects copied for these statements can never be
finalized properly.
gcc/ada/
* exp_ch6.adb (Expand_Call_Helper): Adjust comment.
(Expand_Simple_Function_Return): For the case of a type which needs
finalization and is returned on the primary stack, do not create a
copy if the expression originates from a function call.
* exp_ch7.adb (Transient Scope Management): Adjust comment.
* exp_util.ads (Is_Related_To_Func_Return): Add WARNING line.
* fe.h (Is_Related_To_Func_Return): Declare.
Expansion of entry families created a slightly illegal AST with
Elsif_Parts being an empty list. Cleanup uncovered by the work on
detection of uninitialized scalars.
gcc/ada/
* exp_ch9.adb (Build_Find_Body_Index): Remove empty Elsif_Parts
from the constructed IF statement.
Expansion of entry families contained a condition that was always true.
Cleanup related to detection of uninitialized scalar objects (which
uncovered that expansion of entry families creates a slightly illegal
AST with Elsif_Parts being an empty list).
gcc/ada/
* exp_ch9.adb (Build_Find_Body_Index): Remove IF statement whose
condition was true-by-construction; remove excessive assertion
(since the call to Elsif_Parts will check that Nod is present
and it is an if-statement).