This patch fixes a spurious warning for the pru-unknown-elf target:
gcc/testsuite/gcc.dg/mallign.c:12:27: warning: ignoring return value of 'malloc' declared with attribute 'warn_unused_result' [-Wunused-result]
For 8-bit targets the resulting mask ignores all bits in the value
returned by malloc. Fix by first checking the target word size.
gcc/testsuite/ChangeLog:
* gcc.dg/mallign.c: Skip check if sizeof(word)==1.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
This adds demangling support for C++ modules. A new 'W' component
along with augmented behaviour of 'S' components.
include/
* demangle.h (enum demangle_component_type): Add module components.
libiberty/
* cp-demangle.c (d_make_comp): Adjust.
(d_name, d_prefix): Adjust subst handling. Add module handling.
(d_maybe_module_name): New.
(d_unqualified_name): Add incoming module parm. Handle it. Adjust all callers.
(d_special_name): Add 'GI' support.
(d_count_template_scopes): Adjust.
(d_print_comp_inner): Print module.
* testsuite/demangle-expected: New test cases
This is a first cleanup opportunity from the COND_EXPR gimplification
which allows us to remove now redundant forward_propagate_into_cond.
2022-05-23 Richard Biener <rguenther@suse.de>
* tree-ssa-forwprop.cc (forward_propagate_into_cond): Remove.
(pass_forwprop::execute): Do not propagate into COND_EXPR conditions.
This removes is_gimple_condexpr, note the vectorizer via patterns
still creates COND_EXPRs with embedded GENERIC conditions and has
a reference to the function in comments. Otherwise is_gimple_condexpr
is now equal to is_gimple_val.
2022-05-16 Richard Biener <rguenther@suse.de>
* gimple-expr.cc (is_gimple_condexpr): Remove.
* gimple-expr.h (is_gimple_condexpr): Likewise.
* gimplify.cc (gimplify_expr): Remove is_gimple_condexpr usage.
* tree-if-conv.cc (set_bb_predicate): Likewie.
(add_to_predicate_list): Likewise.
(gen_phi_arg_condition): Likewise.
(predicate_scalar_phi): Likewise.
(predicate_statements): Likewise.
This goes away with the selection operand allowed to be a GENERIC
tcc_comparison tree. It keeps those for vectorizer pattern recog,
those are short lived and removing this instance is a bigger task.
The patch doesn't yet remove dead code and functionality, that's
left for a followup. Instead the patch makes sure to produce
valid GIMPLE IL and continue to optimize COND_EXPRs where the
previous IL allowed and the new IL showed regressions in the testsuite.
2022-05-16 Richard Biener <rguenther@suse.de>
* gimple-expr.cc (is_gimple_condexpr): Equate to is_gimple_val.
* gimplify.cc (gimplify_pure_cond_expr): Gimplify the condition
as is_gimple_val.
* gimple-fold.cc (valid_gimple_rhs_p): Simplify.
* tree-cfg.cc (verify_gimple_assign_ternary): Likewise.
* gimple-loop-interchange.cc (loop_cand::undo_simple_reduction):
Build the condition of the COND_EXPR separately.
* tree-ssa-loop-im.cc (move_computations_worker): Likewise.
* tree-vect-generic.cc (expand_vector_condition): Likewise.
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Likewise.
* vr-values.cc (simplify_using_ranges::simplify): Likewise.
* tree-vect-patterns.cc: Add comment indicating we are
building invalid COND_EXPRs and why.
* omp-expand.cc (expand_omp_simd): Gimplify the condition
to the COND_EXPR separately.
(expand_omp_atomic_cas): Note part that should be unreachable
now.
* tree-ssa-forwprop.cc (forward_propagate_into_cond): Adjust
condition for valid replacements.
* tree-if-conv.cc (predicate_bbs): Simulate previous
re-folding of the condition in folded COND_EXPRs which
is necessary because of unfolded GIMPLE_CONDs in the IL
as in for example gcc.dg/fold-bopcond-1.c.
* gimple-range-gori.cc (gori_compute::condexpr_adjust):
Handle that the comparison is now in the def stmt of
the select operand. Required by gcc.dg/pr104526.c.
* gcc.dg/gimplefe-27.c: Adjust.
* gcc.dg/gimplefe-45.c: Likewise.
* gcc.dg/pr101145-2.c: Likewise.
* gcc.dg/pr98211.c: Likewise.
* gcc.dg/torture/pr89595.c: Likewise.
* gcc.dg/tree-ssa/divide-7.c: Likewise.
* gcc.dg/tree-ssa/ssa-lim-12.c: Likewise.
For allocatable/pointer arrays, a firstprivate to a device
not only needs to privatize the descriptor but also the actual
data. This is implemented as:
firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x)
where the address of x in device memory is saved in hostaddrs[i]
by libgomp and the middle end actually passes hostaddrs[i]' to
attach.
As side effect, has_device_addr(array_desc) had to be changed:
before, it was converted to firstprivate in the front end; now
it is handled in omp-low.cc as has_device_addr requires a shallow
firstprivate (not touching the data pointer) while the normal
firstprivate requires (now) a deep firstprivate.
gcc/fortran/ChangeLog:
PR fortran/104949
* f95-lang.cc (LANG_HOOKS_OMP_ARRAY_SIZE): Redefine.
* trans-openmp.cc (gfc_omp_array_size): New.
(gfc_trans_omp_variable_list): Never turn has_device_addr
to firstprivate.
* trans.h (gfc_omp_array_size): New.
gcc/ChangeLog:
PR fortran/104949
* langhooks-def.h (lhd_omp_array_size): New.
(LANG_HOOKS_OMP_ARRAY_SIZE): Define.
(LANG_HOOKS_DECLS): Add it.
* langhooks.cc (lhd_omp_array_size): New.
* langhooks.h (struct lang_hooks_for_decls): Add hook.
* omp-low.cc (scan_sharing_clauses, lower_omp_target):
Handle GOMP_MAP_FIRSTPRIVATE for array descriptors.
libgomp/ChangeLog:
PR fortran/104949
* target.c (gomp_map_vars_internal, copy_firstprivate_data):
Support attach for GOMP_MAP_FIRSTPRIVATE.
* testsuite/libgomp.fortran/target-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-3.f90: New test.
Double-word NOT requires two operations, but double-word NEG requires
three operations. Using SSE, vector NOT requires a pxor with -1, but
AND of NOT is cheap thanks to the existence of pandn. There's also some
legacy (aka incorrect) logic explicitly testing for DImode [independently
of TARGET_64BIT] in determining the cost of logic operations that's not
required.
2022-05-23 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) <case AND>: Split from
XOR/IOR case. Account for two instructions for double-word
operations. In case of vector pandn, account for single
instruction. Likewise for integer andn with TARGET_BMI.
<case NOT>: Vector NOT requires more than 1 instruction (pxor).
<case NEG>: Double-word negation requires 3 instructions.
This commit fixes canonical extension order to follow the RISC-V ISA
Manual draft-20210402-1271737 or later.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_supported_std_ext):
Fix "K" extension prefix to be placed before "J".
* config/riscv/arch-canonicalize: Likewise.
Signed-off-by: Tsukasa OI <research_trasio@irq.a4lg.com>
kmovd only uses port5 which is often the bottleneck of
performance. Also from latency perspective, spill and reload mostly
could be STLF or even MRN which only take 1 cycle.
So the patch increase move cost between gpr and mask to be the same as
gpr <-> sse register.
gcc/ChangeLog:
* config/i386/x86-tune-costs.h (skylake_cost): Increase gpr
<-> mask cost from 5 to 6.
(icelake_cost): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/spill_to_mask-1.c: New test.
Like AVR and Cris, PRU has no alignment requirements. Thus it is
also affected by PR53535.
PR middle-end/53535
gcc/testsuite/ChangeLog:
* gcc.dg/pr46647.c: Skip for pru target.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
PRU has no condition code and conditional moves.
gcc/testsuite/ChangeLog:
* gcc.dg/ifcvt-4.c: Skip for PRU.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
If the target uses packed structs by default, there are no trailing
padding bytes allocated. Hence extra warnings are emitted.
gcc/testsuite/ChangeLog:
* gcc.dg/Warray-bounds-48-novec.c: Add expected warnings
if target packs the structs by default.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
Some of these tests take several minutes on a simulator like cris-elf,
so we can conditionally run fewer iterations. The testDiscreteDist
helper already supports custom sizes so we just need to make use of that
when { target simulator } matches.
The relevant code is sufficiently tested on other targets, so we're not
losing anything by only running a small number of iterators for sims.
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/random/bernoulli_distribution/operators/values.cc:
Run fewer iterations for simulator targets.
* testsuite/26_numerics/random/binomial_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/discrete_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/geometric_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/negative_binomial_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/poisson_distribution/operators/values.cc:
Likewise.
* testsuite/26_numerics/random/uniform_int_distribution/operators/values.cc:
Likewise.
Improve and generalize rotate patterns. Rotates by more than half the
bitwidth of a register are canonicalized to rotate left. Many existing
shift patterns don't handle this case correctly, so add rotate left to
the shift iterator and convert rotate left into ror during assembly
output. Add missing zero_extend patterns for shifted BIC, ORN and EON.
gcc/
* config/aarch64/aarch64.md
(and_<SHIFT:optab><mode>3_compare0): Support rotate left.
(and_<SHIFT:optab>si3_compare0_uxtw): Likewise.
(<LOGICAL:optab>_<SHIFT:optab><mode>3): Likewise.
(<LOGICAL:optab>_<SHIFT:optab>si3_uxtw): Likewise.
(one_cmpl_<optab><mode>2): Likewise.
(<LOGICAL:optab>_one_cmpl_<SHIFT:optab><mode>3): Likewise.
(<LOGICAL:optab>_one_cmpl_<SHIFT:optab>sidi_uxtw): New pattern.
(eor_one_cmpl_<SHIFT:optab><mode>3_alt): Support rotate left.
(eor_one_cmpl_<SHIFT:optab>sidi3_alt_ze): Likewise.
(and_one_cmpl_<SHIFT:optab><mode>3_compare0): Likewise.
(and_one_cmpl_<SHIFT:optab>si3_compare0_uxtw): Likewise.
(and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse): Likewise.
(and_<SHIFT:optab><mode>3nr_compare0): Likewise.
(*<optab>si3_insn_uxtw): Use SHIFT_no_rotate.
(rolsi3_insn_uxtw): New pattern.
* config/aarch64/iterators.md (SHIFT): Add rotate left.
(SHIFT_no_rotate): Add new iterator.
(SHIFT:shift): Print rotate left as ror.
(is_rotl): Add test for left rotate.
gcc/testsuite/
* gcc.target/aarch64/ror_2.c: New test.
* gcc.target/aarch64/ror_3.c: New test.
The --with-cpu/--with-arch configure option processing not only checks valid
arguments but also sets TARGET_CPU_DEFAULT with a CPU and extension bitmask.
This isn't used however since a --with-cpu is translated into a -mcpu option
which is processed as if written on the command-line (so TARGET_CPU_DEFAULT
is never accessed).
So remove all the complex processing and bitmask, and just validate the
option. Fix a bug that always reports valid architecture extensions as invalid.
As a result the CPU processing in aarch64.c can be simplified.
gcc/
* config.gcc (aarch64*-*-*): Simplify --with-cpu and --with-arch
processing. Add support for architectural extensions.
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Remove
AARCH64_CPU_DEFAULT_FLAGS.
(TARGET_CPU_NBITS): Remove.
(TARGET_CPU_MASK): Remove.
* config/aarch64/aarch64.cc (AARCH64_CPU_DEFAULT_FLAGS): Remove define.
(get_tune_cpu): Assert CPU is always valid.
(get_arch): Assert architecture is always valid.
(aarch64_override_options): Cleanup CPU selection code and simplify logic.
(aarch64_option_restore): Remove unnecessary checks on tune.
This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and
omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as
asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect.
In contrast to the synchronous variants, the asynchronous functions have two
additional function parameters to allow the specification of task dependences:
int depobj_count
omp_depend_t *depobj_list
integer(c_int), value :: depobj_count
integer(omp_depend_kind), optional :: depobj_list(*)
The implementation splits the synchronous functions into two parts: (a) check
and (b) copy. Then (a) is used in the asynchronous functions for the sequential
part, and the actual copy process (b) is executed in a new created task. The
sequential part (a) takes into account the requirements for the return values:
"The routine returns zero if successful. Otherwise, it returns a non-zero
value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7)
"An application can determine the number of inclusive dimensions supported by an
implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both
dst and src. The routine returns the number of dimensions supported by the
implementation for the specified device numbers. No copy operation is
performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8)
Due to asynchronicity an error is thrown if the asynchronous memcpy is not
successful (in contrast to the synchronous functions which use a return
value unequal to zero).
gcc/ChangeLog:
* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
target_memcpy_rect_async to omp_runtime_apis array.
libgomp/ChangeLog:
* libgomp.map: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* libgomp.texi: Both functions are now supported.
* omp.h.in: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* omp_lib.f90.in: Added interfaces for both new functions.
* omp_lib.h.in: Likewise.
* target.c (ialias_redirect): Added for GOMP_task.
(omp_target_memcpy): Restructured into check and copy part.
(omp_target_memcpy_check): New helper function for omp_target_memcpy and
omp_target_memcpy_async that checks requirements.
(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
omp_target_memcpy_async that performs the memcpy.
(omp_target_memcpy_async_helper): New helper function that is used in
omp_target_memcpy_async for the asynchronous task.
(omp_target_memcpy_async): Added.
(omp_target_memcpy_rect): Restructured into check and copy part.
(omp_target_memcpy_rect_check): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
requirements.
(omp_target_memcpy_rect_copy): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
the memcpy.
(omp_target_memcpy_rect_async_helper): New helper function that is used
in omp_target_memcpy_rect_async for the asynchronous task.
(omp_target_memcpy_rect_async): Added.
* task.c (ialias): Added for GOMP_task.
* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
This patch replaces libbid's implementations of clz and ctz for 32 and
64 bits inputs which used several masks, and switches to the
corresponding builtins. This will provide a better implementation,
especially on targets with clz/ctz instructions.
2022-05-06 Christophe Lyon <christophe.lyon@arm.com>
libgcc/config/libbid/ChangeLog:
* bid_binarydecimal.c (CLZ32_MASK16): Delete.
(CLZ32_MASK8): Delete.
(CLZ32_MASK4): Delete.
(CLZ32_MASK2): Delete.
(CLZ32_MASK1): Delete.
(clz32_nz): Use __builtin_clz.
(ctz32_1bit): Delete.
(ctz32): Use __builtin_ctz.
(CLZ64_MASK32): Delete.
(CLZ64_MASK16): Delete.
(CLZ64_MASK8): Delete.
(CLZ64_MASK4): Delete.
(CLZ64_MASK2): Delete.
(CLZ64_MASK1): Delete.
(clz64_nz): Use __builtin_clzl.
(ctz64_1bit): Delete.
(ctz64): Use __builtin_ctzl.
This patch adds support for trunc and extend operations between HF
mode (_Float16) and Decimal Floating Point formats (_Decimal32,
_Decimal64 and _Decimal128).
For simplicity we rely on the implicit conversions inserted by the
compiler between HF and SD/DF/TF modes. The existing bid*_to_binary*
and binary*_to_bid* functions are non-trivial and at this stage it is
not clear if there is a performance-critical use case involving _Float16
and _Decimal* formats.
The patch also adds two executable tests, to make sure the right
functions are called, available (link phase) and functional.
Tested on aarch64 and x86_64. The number of symbol matches in the
testcases includes the .global XXX to avoid having to match different
call instructions for different targets.
2022-05-04 Christophe Lyon <christophe.lyon@arm.com>
libgcc/ChangeLog:
* Makefile.in (D32PBIT_FUNCS): Add _hf_to_sd and _sd_to_hf.
(D64PBIT_FUNCS): Add _hf_to_dd and _dd_to_hf.
(D128PBIT_FUNCS): Add _hf_to_td _td_to_hf.
libgcc/config/libbid/ChangeLog:
* bid_gcc_intrinsics.h (LIBGCC2_HAS_HF_MODE): Define according to
__LIBGCC_HAS_HF_MODE__.
(BID_HAS_HF_MODE): Define.
(HFtype): Define.
(__bid_extendhfsd): New prototype.
(__bid_extendhfdd): Likewise.
(__bid_extendhftd): Likewise.
(__bid_truncsdhf): Likewise.
(__bid_truncddhf): Likewise.
(__bid_trunctdhf): Likewise.
* _dd_to_hf.c: New file.
* _hf_to_dd.c: New file.
* _hf_to_sd.c: New file.
* _hf_to_td.c: New file.
* _sd_to_hf.c: New file.
* _td_to_hf.c: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/convert-dfp-2.c: New test.
* gcc.dg/torture/convert-dfp.c: New test.
These tests exercise exception handling with Decimal Floating-Point
type.
dfp-1.C and dfp-2.C check that thrown objects of such types are
properly caught, whether when using C++ classes (decimalXX) or via GCC
mode attributes.
dfp-saves-aarch64.C checks that such objects are properly restored,
and has to use the mode attribute trick because objects of decimalXX
class type cannot be assigned to a register variable.
2022-05-03 Christophe Lyon <christophe.lyon@arm.com>
gcc/testsuite/
* g++.dg/eh/dfp-1.C: New test.
* g++.dg/eh/dfp-2.C: New test.
* g++.dg/eh/dfp-saves-aarch64.C: New test.
Some tests for the BID format are currently restricted to i?86 and
x86_64, but they also pass on AArch64, so this patch enables them.
Since all these tests are related to the BID format, it seems useful
to introduce a new effective-target (dfp_bid) instead of adding
aarch64 to the current target list.
2022-04-28 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* doc/sourcebuild.texi (Decimal floating point attributes): Document
dfp_bid effective-target.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_dfp_bid): New.
* gcc.dg/dfp/bid-non-canonical-d128-1.c: Use dfp_bid
effective-target.
* gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise.
This patch copies all existing tests involving float/double/long
double types and replaces them with _Decimal32/_Decimal64/_Decimal128.
I thought it would be clearer/easier to maintain to do it this way
rather than adding tests for DFP types in the existing testcases,
except for func-ret-1.c and func-ret-3.c.
This makes sure all cases tested for traditional floating-point are
equally tested for decimal floating-point.
The patch also adds a test involving loading DFP values from memory.
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
gcc/testsuite/
* gcc.target/aarch64/aapcs64/aapcs64.exp: Support new dfp*.c tests.
* gcc.target/aarch64/aapcs64/func-ret-1.c: Add DFP tests.
* gcc.target/aarch64/aapcs64/func-ret-3.c: Add DFP tests.
* gcc.target/aarch64/aapcs64/type-def.h: Add DFP types.
* gcc.target/aarch64/aapcs64/dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/ice_dfp_5.c: New test.
* gcc.target/aarch64/aapcs64/test_align_dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/test_align_dfp-4.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_1.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_10.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_11.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_12.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_13.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_14.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_15.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_16.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_17.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_18.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_19.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_2.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_20.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_21.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_22.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_23.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_24.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_25.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_26.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_27.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_3.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_5.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_6.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_7.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_8.c: New test.
* gcc.target/aarch64/aapcs64/test_dfp_9.c: New test.
* gcc.target/aarch64/aapcs64/test_quad_double_dfp.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-1.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-10.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-11.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-12.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-13.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-14.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-16.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-2.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-3.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-4.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-5.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-6.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-8.c: New test.
* gcc.target/aarch64/aapcs64/va_arg_dfp-9.c: New test.
The testcase in c-c++-common/dfp/pr39986.c detects if DFP constants
are correctly emitted in the assembly. However, AArch64 uses .word
instead of the expected .long directive. With this patch, we now
accept both.
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
gcc/testsuite/
* c-c++-common/dfp/pr39986.c: Accept .word directive.
DFP support on AArch64 relies on libgcc, so enable its DFP routines
for all AArch64 targets.
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
libgcc/
* config.host: Add t-dfprules to AArch64 targets.
This patch updates the aarch64 backend as needed to support DFP modes
(SD, DD and TD).
Changes v1->v2:
* Drop support for DFP modes in
aarch64_gen_{load||store}[wb]_pair as these are only used in
prologue/epilogue where DFP modes are not used. Drop the
changes to the corresponding patterns in aarch64.md, and
useless GPF_PAIR iterator.
* In aarch64_reinterpret_float_as_int, handle DDmode the same way
as DFmode (needed in case the representation of the
floating-point value can be loaded using mov/movk.
* In aarch64_float_const_zero_rtx_p, reject constants with DFP
mode: when X is zero, the callers want to emit either '0' or
'zr' depending on the context, which is not the way 0.0 is
represented in DFP mode (in particular fmov d0, #0 is not right
for DFP).
* In aarch64_legitimate_constant_p, accept DFP
2022-03-31 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/aarch64/aarch64.cc
(aarch64_split_128bit_move): Handle DFP modes.
(aarch64_mode_valid_for_sched_fusion_p): Likewise.
(aarch64_classify_address): Likewise.
(aarch64_legitimize_address_displacement): Likewise.
(aarch64_reinterpret_float_as_int): Likewise.
(aarch64_float_const_zero_rtx_p): Likewise.
(aarch64_can_const_movi_rtx_p): Likewise.
(aarch64_anchor_offset): Likewise.
(aarch64_secondary_reload): Likewise.
(aarch64_rtx_costs): Likewise.
(aarch64_legitimate_constant_p): Likewise.
(aarch64_gimplify_va_arg_expr): Likewise.
(aapcs_vfp_sub_candidate): Likewise.
(aarch64_vfp_is_call_or_return_candidate): Likewise.
(aarch64_output_scalar_simd_mov_immediate): Likewise.
(aarch64_gen_adjusted_ldpstp): Likewise.
(aarch64_scalar_mode_supported_p): Accept DFP modes if enabled.
* config/aarch64/aarch64.md
(movsf_aarch64): Use SFD iterator and rename into
mov<mode>_aarch64.
(movdf_aarch64): Use DFD iterator and rename into
mov<mode>_aarch64.
(movtf_aarch64): Use TFD iterator and rename into
mov<mode>_aarch64.
(split pattern for move TF mode): Use TFD iterator.
* config/aarch64/iterators.md
(GPF_TF_F16_MOV): Add DFP modes.
(SFD, DFD, TFD): New iterators.
(GPF_TF): Add DFP modes.
(TX, DX, DX2): Likewise.
We should prefer the __UINT_LEAST16_TYPE__ and __UINT_LEAST32_TYPE__
macros, if available, so that we don't need all of <cstdint> in every
header that uses std::char_traits.
libstdc++-v3/ChangeLog:
* include/bits/char_traits.h: Only include <cstdint> when
necessary.
* include/std/stacktrace: Use __UINTPTR_TYPE__ instead of
uintptr_t.
* src/c++11/cow-stdexcept.cc: Include <stdint.h>.
* src/c++17/floating_to_chars.cc: Likewise.
* testsuite/20_util/assume_aligned/1.cc: Include <cstdint>.
* testsuite/20_util/assume_aligned/3.cc: Likewise.
* testsuite/20_util/shared_ptr/creation/array.cc: Likewise.
Since the COW std::string was moved to its own header, we don't need the
atomic dispatch helpers in the definition of std::__cxx11::string. Move
the inclusion of the <ext/atomicity.h> header to <bits/cow_string.h>
where it's needed.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h: Do not include <ext/atomicity.h>
here.
* include/bits/cow_string.h: Include it here.
Currently the alias templates for std::pmr::vector, std::pmr::string
etc. are defined using a forward declaration for polymorphic_allocator.
This means you can't actually use the alias templates unless you also
include <memory_resource>. The rationale for that is that it's a fairly
large header, and most users don't need it. This isn't uncontroversial
though, and LWG 3681 questions whether it's even conforming.
This change adds a new <bits/memory_resource.h> header with the minimum
needed to use polymorphic_allocator and the std::pmr container aliases.
Including <memory_resource> is still necessary to use the program-wide
resource objects, or the pool resources or monotonic buffer resource.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/memory_resource.h: New file.
* include/std/deque: Include <bits/memory_resource.h>.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/map: Likewise.
* include/std/memory_resource (pmr::memory_resource): Move to
new <bits/memory_resource.h> header.
(pmr::polymorphic_allocator): Likewise.
* include/std/regex: Likewise.
* include/std/set: Likewise.
* include/std/stacktrace: Likewise.
* include/std/string: Likewise.
* include/std/unordered_map: Likewise.
* include/std/unordered_set: Likewise.
* include/std/vector: Likewise.
* testsuite/21_strings/basic_string/types/pmr_typedefs.cc:
Remove <memory_resource> header and check construction.
* testsuite/23_containers/deque/types/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/forward_list/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/list/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/map/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multimap/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/multiset/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/set/pmr_typedefs.cc: Likewise.
* testsuite/23_containers/unordered_map/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multimap/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/unordered_set/pmr_typedefs.cc:
Likewise.
* testsuite/23_containers/vector/pmr_typedefs.cc: Likewise.
* testsuite/28_regex/match_results/pmr_typedefs.cc: Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/variadic-tuple.C: Qualify function to avoid ADL
finding std::make_tuple.
The patch is a revised solution for PR middle-end/98865 incorporating
the feedback/suggestions from Richard Biener's review here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593928.html
Most significantly, this patch now performs the transformation/optimization
during RTL expansion, where the target's rtx_costs can be used to determine
whether the original multiplication (that may potentially be implemented by
a shift or lea) is cheaper than a negation and a bit-wise and.
Previously the expression (x>>63)*y would be compiled with -O2 as
shrq $63, %rdi
movq %rdi, %rax
imulq %rsi, %rax
but with this patch now produces:
sarq $63, %rdi
movq %rdi, %rax
andq %rsi, %rax
Likewise the expression (x>>63)*135 [that appears in a hot-spot of the
Botan AES-128 benchmark] was previously:
shrq $63, %rdi
leaq (%rdi,%rdi,8), %rdx
movq %rdx, %rax
salq $4, %rax
subq %rdx, %rax
now becomes:
movq %rdi, %rax
sarq $63, %rax
andl $135, %eax
2022-05-19 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/98865
* expr.cc (expand_expr_real_2) [MULT_EXPR]: Expand X*Y as X&Y
when both X and Y are [0, 1], X*Y as X&-Y when Y is [0,1] and
likewise X*Y as -X&Y when X is [0,1] using tree_nonzero_bits.
gcc/testsuite/ChangeLog
PR middle-end/98865
* gcc.target/i386/pr98865.c: New test case.
These defines are no longer used once the rs6000 built-in
reworks were completed. Time to remove them.
There was a reference to RS6000_BTC_SPECIAL in a TODO comment
in rs6000-builtins.def. That comment remains, but I have updated
the comment to refer to "SPECIAL" processing, instead of having it
refer directly to the _BTC_SPECIAL macro.
2022-05-18 Will Schmidt <will_schmidt@vnet.ibm.com>
gcc/
* config/rs6000/rs6000-builtins.def: Rephrase
to remove RS6000_BTC_SPECIAL from comment.
* config/rs6000/rs6000.h (RS6000_BTC_UNARY, RS6000_BTC_BINARY,
RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY,
RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK,
RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS,
RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC,
RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP,
RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR,
RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID,
RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE,
RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT,
RS6000_BTM_ALWAYS): Delete.
This issue has recently been moved to Tentatively Ready, and seems
uncontroversial. This allows equality comparison with types that are
convertible to pmr::polymorphic_allocator, which fail deduction for the
existing equality operator.
libstdc++-v3/ChangeLog:
* include/std/memory_resource (polymorphic_allocator): Add
non-template equality operator, as proposed for LWG 3683.
* testsuite/20_util/polymorphic_allocator/lwg3683.cc: New test.
When forcing the condition to be split out from COND_EXPRs I see
a runtime failure of libgomp.fortran/atomic-19.f90 which can be
reduced to
!$omp atomic update, compare, capture
if (x == 69_2 - r) x = 6_8
v = x
being miscompiled, the difference being
- _13 = .ATOMIC_COMPARE_EXCHANGE (_9, _10, _11, 4, 0, 0);
- _14 = IMAGPART_EXPR <_13>;
- _15 = REALPART_EXPR <_13>;
- _16 = _14 != 0 ? _11 : _15;
- _2 = (integer(kind=4)) _16;
- v_17 = _2;
+ _14 = .ATOMIC_COMPARE_EXCHANGE (_10, _11, _12, 4, 0, 0);
+ _15 = IMAGPART_EXPR <_14>;
+ _16 = REALPART_EXPR <_14>;
+ _2 = (logical(kind=1)) _15;
+ _3 = (integer(kind=4)) _16;
+ v_17 = _3;
where one can see a missing COND_EXPR. It seems to be a latent
issue to me given the code can be exercised, it just maybe misses
a 'need_new' testcase combined with 'cond_stmt'. Appearantly
the if (cond_stmt) code is just to avoid creating a temporary
(and possibly to preserve the condition compute if used elsewhere
since the original stmt is going to be deleted). The following
makes the failure go away for me in my patched tree and it
also survives libgomp and gomp testing in an unpatched tree.
2022-05-13 Richard Biener <rguenther@suse.de>
* omp-expand.cc (expand_omp_atomic_cas): Do not short-cut
computation of the new value.
This function is no longer needed.
2022-05-19 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.cc (get_or_alloc_expression_id): Remove.
(add_to_value): Use get_expression_id.
(bitmap_insert_into_set): Likewise.
(bitmap_value_insert_into_set): Likewise.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Constant>: Deal with
a constant related to a return in a function specially.
* gcc-interface/trans.cc (Call_to_gnu): Use return slot optimization
if the target is a return object.
(gnat_to_gnu) <N_Object_Declaration>: Deal with a constant related
to a return in a function specially.
The rationale is that these entities are almost always the result of
expansion activities in the front-end, over which the user has very
limited control. These warnings can be restored by means of -gnatD.
gcc/ada/
* gcc-interface/utils.cc (gnat_pushdecl): Also set TREE_NO_WARNING
on the decl if Comes_From_Source is false for the associated node.
This alphabetizes the large switch statement, removes a useless nested
switch statement, an artificial fall through and adds a default return.
No functional changes.
gcc/ada/
* gcc-interface/trans.cc (gnat_gimplify_expr): Tidy up.
The issue arises when the unchecked union contains both a fixed part and
a variant part, and is subject to a full representation clause covering
all the components in all the variants, when the component clauses do not
align the variant boundaries with byte boundaries consistently.
gcc/ada/
* gcc-interface/decl.cc (components_to_record): Use NULL recursively
as P_GNU_REP_LIST for the innermost variant level in the unchecked
union case with a fixed part.
The message "No source file position information available" is displayed
in the bugbox when Current_Error_Node has no location, which is useless.
gcc/ada/
* gcc-interface/trans.cc (gnat_to_gnu): Do not set Current_Error_Node
to a node without location.