Remove RTEMS support from crossconfig.m4 since this code is not used due to
"with_newlib" being "yes".
libstdc++-v3/ChangeLog:
* configure: Regenerate.
* configure.ac (newlib, *-rtems*): Enable TLS support for all RTEMS
targets except bfin, lm32, mips, moxie, or1k, and v850.
For all RTEMS targets, define HAVE_ALIGNED_ALLOC, HAVE_AT_QUICK_EXIT,
HAVE_LINK, HAVE_POLL, HAVE_QUICK_EXIT, HAVE_READLINK, HAVE_SETENV,
HAVE_SLEEP, HAVE_SOCKATMARK, HAVE_STRERROR_L, HAVE_SYMLINK,
HAVE_TRUNCATE, and HAVE_USLEEP.
* crossconfig.m4 (*-rtems*): Remove.
As the following self-test testcase shows, wi::shifted_mask sometimes
doesn't create canonicalized wide_ints, which then fail to compare equal
to canonicalized wide_ints with the same value.
In particular, wi::mask (128, false, 128) gives { -1 } with len 1 and prec 128,
while wi::shifted_mask (0, 128, false, 128) gives { -1, -1 } with len 2
and prec 128.
The problem is that the code is written with the assumption that there are
3 bit blocks (or 2 if start is 0), but doesn't consider the possibility
where there are 2 bit blocks (or 1 if start is 0) where the highest block
isn't present. In that case, there is the optional block of negate ? 0 : -1
elts, followed by just one elt (either one from the if (shift) or just
negate ? -1 : 0) and the rest is implicit sign-extension.
Only if end < prec are there one or more bits above it that have a different
bit value, so we need to emit all the elts up to end and then one more elt.
if (end == prec) would work too, because we have:
if (width > prec - start)
width = prec - start;
unsigned int end = start + width;
so end is guaranteed to be end <= prec, dunno what is preferred.
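To illustrate why the extra element matters, here is a toy model of a
sign-extended block representation. It is not GCC's wide_int: the names
blocks, canonicalize and naive_low_mask are hypothetical, and the sketch
assumes precisions that are multiples of 64.

```cpp
#include <cstdint>
#include <vector>

// Toy model (not GCC's wide_int): 64-bit blocks, least significant first.
// Blocks beyond the end of the list are an implicit sign extension of the
// top explicit block.
using blocks = std::vector<std::int64_t>;

// Drop top blocks that merely repeat the sign of the block below them.
blocks canonicalize(blocks v)
{
  while (v.size() > 1)
    {
      std::int64_t sign_fill = v[v.size() - 2] < 0 ? -1 : 0;
      if (v.back() != sign_fill)
        break;
      v.pop_back();
    }
  return v;
}

// Mask of the low WIDTH bits of a PREC-bit value, emitting every block up
// to the end of the mask without considering implicit sign extension.
blocks naive_low_mask(unsigned width, unsigned prec)
{
  if (width > prec)
    width = prec;
  blocks v;
  for (unsigned i = 0; i < width / 64; ++i)
    v.push_back(-1);
  if (width % 64)
    v.push_back(static_cast<std::int64_t>((std::uint64_t{1} << (width % 64)) - 1));
  else if (width < prec || v.empty())
    v.push_back(0);   // a zero block is needed only when bits above differ
  return v;
}
```

In this model a 128-bit mask of all ones comes out as { -1, -1 }, which
denotes the same value as { -1 } but does not compare equal to it block by
block until it is canonicalized.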
2022-07-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106144
* wide-int.cc (wi::shifted_mask): If end >= prec, return right after
emitting element for shift or if shift is 0 first element after start.
(wide_int_cc_tests): Add tests for equivalency of wi::mask and
wi::shifted_mask with 0 start.
When optimizing for size with -Oz, setting a register can be minimized by
pushing an immediate value to the stack and popping it to the destination.
Alas the one general register that shouldn't be updated via the stack is
the stack pointer itself, where "pop %esp" can't be represented in GCC's
RTL ("use of a register mentioned in pre_inc, pre_dec, post_inc or
post_dec is not permitted within the same instruction"). This patch
fixes PR target/106122 by explicitly checking for SP_REG in the
problematic peephole2.
2022-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/106122
* config/i386/i386.md (peephole2): Avoid generating pop %esp
when optimizing for size.
gcc/testsuite/ChangeLog
PR target/106122
* gcc.target/i386/pr106122.c: New test case.
This patch tidies up and unifies doubleword handling in i386.md;
converting all doubleword splitters for logic operations to post-reload
form, generalizing their define_insn_and_split templates to <dwi> form
(supporting TARGET_64BIT ? TImode : DImode), and where required tweaking
the corresponding expanders to use SDWIM to support TImode doubleword
operations.
2022-07-01 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (general_szext_operand): Add TImode
support using x86_64_hilo_general_operand predicate.
(*cmp<dwi>_doubleword): Use x86_64_hilo_general_operand predicate.
(*add<dwi>3_doubleword): Improved optimization of zero addition.
(and<mode>3): Use SDWIM mode iterator to add support for double
word bit-wise AND in TImode. Use force_reg when double word
immediate operand isn't x86_64_hilo_general_operand.
(and<dwi>3_doubleword): Generalized from anddi3_doubleword and
converted into a post-reload splitter.
(*andndi3_doubleword): Previous define_insn deleted.
(*andn<mode>3_doubleword_bmi): New define_insn_and_split for
TARGET_BMI that splits post-reload.
(*andn<mode>3_doubleword): New define_insn_and_split for
!TARGET_BMI, that lowers/splits before reload.
(<any_or><mode>3): Use SDWIM mode iterator to add support for
double word bit-wise XOR and bit-wise IOR in TImode. Use
force_reg when double word immediate operand isn't
x86_64_hilo_general_operand.
(*<any_or>di3_doubleword): Generalized from <any_or>di3_doubleword.
(one_cmpl<mode>2): Use SDWIM mode iterator to add support for
double word bit-wise NOT in TImode.
(one_cmpl<dwi>2_doubleword): Generalize from one_cmpldi2_doubleword
and converted into a post-reload splitter.
The original fix is very likely too big a hammer.
gcc/
PR middle-end/105874
* expr.cc (expand_expr_real_1) <normal_inner_ref>: Force
EXPAND_MEMORY for the expansion of the inner reference only
in the usual cases where a memory reference is required.
Move -pthread from configure.ac to Makefile.in so that it is passed to AM_LDFLAGS.
PR lto/106118
lto-plugin/ChangeLog:
* configure.ac: Move -pthread from here...
* Makefile.am: ...to here.
* configure: Regenerate.
* Makefile.in: Likewise.
The following makes sure to not use the original TBAA type for
looking up a value across an aggregate copy when we had to offset
the read.
2022-06-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/106131
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Force alias-set
zero when offsetting the read looking through an aggregate
copy.
* g++.dg/torture/pr106131.C: New testcase.
Properly allow side effects only for a first BB in a condition chain.
PR tree-optimization/106126
gcc/ChangeLog:
* gimple-if-to-switch.cc (struct condition_info): Save
has_side_effect.
(find_conditions): Parse all BBs.
(pass_if_to_switch::execute): Allow only side effects for first
BB.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr106126.c: New test.
gcc/fortran/ChangeLog:
PR fortran/103137
PR fortran/103138
PR fortran/103693
PR fortran/105243
* decl.cc (gfc_match_data_decl): Reject CLASS entity declaration
when it is given the PARAMETER attribute.
gcc/testsuite/ChangeLog:
PR fortran/103137
PR fortran/103138
PR fortran/103693
PR fortran/105243
* gfortran.dg/class_58.f90: Fix test.
* gfortran.dg/class_73.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
The LTO merging of options from different input files was broken by:
commit 227a2ecf66
Author: Martin Liska <mliska@suse.cz>
Date: Fri Mar 12 11:53:47 2021 +0100
lto-wrapper: Use vec<cl_decoded_option> data type.
Previously, find_and_merge_options would merge options it read into
those in *opts. After this commit, options in *opts on entry to
find_and_merge_options are ignored; the only merging that takes place
is between multiple sets of options in the same input file that are
read in the same call to this function (not sure how that case can
occur at all). The effects include, for example, that if some objects
are built with PIC enabled and others with it disabled, and the last
LTO object processed has PIC enabled, the choice of PIC for the last
object will result in the whole program being built as PIC, when the
merging logic is intended to ensure that a mixture of PIC and non-PIC
objects results in the whole program being built as non-PIC.
Fix this with an extra argument to find_and_merge_options to determine
whether merging should take place. This shows up a second issue with
that commit (which I think wasn't actually intended to change code
semantics at all): once merging is enabled again, the check for
-Xassembler options became an infinite loop in the case where both
inputs had -Xassembler options, with the same first option, so fix
that loop to restore the previous semantics.
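The intended merging rule for PIC can be sketched as follows. This is a
simplified illustration, not the code in lto-wrapper.cc: pic_mode,
merge_pic and last_wins are hypothetical names, and real option merging
covers many more options than this one.

```cpp
#include <vector>

// Hypothetical per-object setting; not lto-wrapper's real data structure.
enum class pic_mode { none, pic };

// Intended behaviour: the first object seeds the result, and every later
// object is merged into it. Any disagreement falls back to non-PIC.
pic_mode merge_pic(const std::vector<pic_mode> &objects)
{
  pic_mode merged = pic_mode::none;
  bool first = true;
  for (pic_mode m : objects)
    {
      if (first)
        {
          merged = m;
          first = false;
        }
      else if (m != merged)
        merged = pic_mode::none;
    }
  return merged;
}

// Broken behaviour described above: earlier objects are ignored, so the
// last object processed decides for the whole program.
pic_mode last_wins(const std::vector<pic_mode> &objects)
{
  return objects.empty() ? pic_mode::none : objects.back();
}
```

With one non-PIC and one PIC object, merge_pic yields non-PIC while
last_wins yields whatever the final object used.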
Note that I'm not sure how LTO option merging might be tested in the
testsuite (clearly there wasn't sufficient, if any, coverage to detect
these bugs).
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR lto/106129
* lto-wrapper.cc (find_option): Add argument start.
(merge_and_complain): Loop over existing_opt_index and
existing_opt2_index for Xassembler check. Update calls to
find_option.
(find_and_merge_options): Add argument first to determine whether
to merge options with those passed in *opts.
(run_gcc): Update calls to find_and_merge_options.
Currently the throwing overload of fs::temp_directory_path() will
discard the path that was obtained from the environment. When it fails
because the path doesn't resolve to a directory you get an unhelpful
error like:
filesystem error: temp_directory_path: Not a directory
It would be better to also print the path in that case, e.g.
filesystem error: temp_directory_path: Not a directory [/home/bob/tmp]
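From the caller's side, the path becomes available through the standard
filesystem_error::path1() accessor once the library stores it in the
exception. A minimal sketch, where temp_dir_or_diagnostic is a
hypothetical helper name:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Return the temporary directory, or a diagnostic that names the offending
// path when the library supplied one in the exception.
std::string temp_dir_or_diagnostic()
{
  try
    {
      return fs::temp_directory_path().string();
    }
  catch (const fs::filesystem_error &e)
    {
      std::string msg = e.code().message();
      if (!e.path1().empty())
        msg += " [" + e.path1().string() + "]";
      return msg;
    }
}
```

Whether path1() is non-empty depends on the library version; before this
change the throwing overload left it empty.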
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (fs::temp_directory_path()): Include path
in exception.
(fs::temp_directory_path(error_code&)): Rearrange to more
closely match the structure of the first overload.
* src/filesystem/ops.cc (fs::temp_directory_path): Likewise.
* testsuite/27_io/filesystem/operations/temp_directory_path.cc:
Check that exception contains the path.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Likewise.
Although the Filesystem TS isn't properly supported on Windows (unlike
the C++17 Filesystem lib), most tests do pass. Two of the failures are
due to PR 88881 which was only fixed for std::filesystem not the TS.
This applies the fix to the TS implementation too.
libstdc++-v3/ChangeLog:
PR libstdc++/88881
* src/filesystem/ops.cc (has_trailing_slash): New helper
function.
(fs::status): Strip trailing slashes.
(fs::symlink_status): Likewise.
* testsuite/experimental/filesystem/operations/temp_directory_path.cc:
Clean the environment before each test and use TMP instead of
TMPDIR so the test passes on Windows.
This patch makes the vrange_allocator an abstract class, and uses it
to implement the obstack allocator as well as a new GC allocator.
The GC bits will be used to implement the vrange storage class for
global ranges, which will be contributed in the next week or so.
Tested and benchmarked on x86-64 Linux.
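The shape of such a design can be sketched as below. This is only an
illustration of an abstract allocator with two backends: the class names,
member functions and the use of malloc in place of a GC backend are
assumptions, not the declarations in value-range.h.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical interface; clients depend only on this class.
class range_allocator
{
public:
  virtual ~range_allocator() = default;
  virtual void *alloc(std::size_t size) = 0;
  virtual void free(void *p) = 0;
};

// Arena-style backend, similar in spirit to an obstack: memory is released
// all at once when the allocator is destroyed, so free() is a no-op.
class arena_range_allocator : public range_allocator
{
  std::vector<std::unique_ptr<unsigned char[]>> m_chunks;
public:
  void *alloc(std::size_t size) override
  {
    m_chunks.push_back(std::make_unique<unsigned char[]>(size));
    return m_chunks.back().get();
  }
  void free(void *) override {}
};

// Heap backend standing in for a GC-managed one; each object is released
// individually.
class heap_range_allocator : public range_allocator
{
public:
  void *alloc(std::size_t size) override { return std::malloc(size); }
  void free(void *p) override { std::free(p); }
};

// Client code written against the abstract interface only.
int round_trip(range_allocator &a, int value)
{
  int *p = static_cast<int *>(a.alloc(sizeof(int)));
  *p = value;
  int result = *p;
  a.free(p);
  return result;
}
```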
gcc/ChangeLog:
* gimple-range-cache.cc (block_range_cache::block_range_cache):
Rename vrange_allocator to obstack_vrange_allocator.
(ssa_global_cache::ssa_global_cache): Same.
* gimple-range-edge.h (class gimple_outgoing_range): Same.
* gimple-range-infer.h (class infer_range_manager): Same.
* value-range.h (class vrange_allocator): Make abstract.
(class obstack_vrange_allocator): Inherit from vrange_allocator.
(class ggc_vrange_allocator): New.
In order to prune ordinary locations, we need to note the locations of
macros we'll be writing out. This rearranges the macro processing to achieve
that. Also drop an unneeded parameter from macro reading & writing.
Fix some it's/its errors.
gcc/cp/
* module.cc (module_state::write_define): Drop located param.
(module_state::read_define): Likewise.
(module_state::prepare_macros): New, broken out of ...
(module_state::write_macros): ... here. Adjust.
(module_state::write_begin): Adjust.
gcc/testsuite/
* g++.dg/modules/inext-1.H: Check include-next happened.
This patch was motivated by the investigation of Linus Torvalds' spill
heavy cryptography kernels in PR 105930. The <any_rotate>di3 expander
handles all rotations by an immediate constant for 1..63 bits with the
exception of 32 bits, which FAILs and is then split by the middle-end.
This patch makes these 32-bit doubleword rotations consistent with the
other DImode rotations during reload, which results in reduced register
pressure, fewer instructions and the use of x86's xchg instruction
when appropriate. In theory, xchg can be handled by register renaming,
but even on micro-architectures where it's implemented by 3 uops (no
worse than a three instruction shuffle), avoiding nominating a
"temporary" register, reduces user-visible register pressure (and
has obvious code size benefits).
The effects are best shown with the new testcase:
unsigned long long bar();
unsigned long long foo()
{
unsigned long long x = bar();
return (x>>32) | (x<<32);
}
for which GCC with -m32 -O2 currently generates:
subl $12, %esp
call bar
addl $12, %esp
movl %eax, %ecx
movl %edx, %eax
movl %ecx, %edx
ret
but with this patch now generates:
subl $12, %esp
call bar
addl $12, %esp
xchgl %edx, %eax
ret
With this patch, the number of lines of assembly language generated
for the blake2b kernel (from the attachment to PR105930) decreases
from 5626 to 5404. Although there's an impressive reduction in
instruction count, there's no change/reduction in stack frame size.
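The reason an exchange suffices is that a 64-bit rotation by 32 bits is
exactly a swap of the two 32-bit halves. A small sketch of that
equivalence, with rotate32 and swap_halves as illustrative names:

```cpp
#include <cstdint>

// Rotation written the way the testcase above writes it.
std::uint64_t rotate32(std::uint64_t x)
{
  return (x >> 32) | (x << 32);
}

// The same result expressed as an exchange of the two 32-bit halves,
// which is what swapping a register pair achieves on a 32-bit target.
std::uint64_t swap_halves(std::uint64_t x)
{
  std::uint32_t lo = static_cast<std::uint32_t>(x);
  std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
  return (static_cast<std::uint64_t>(lo) << 32) | hi;
}
```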
2022-06-30 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (swap_mode): Rename from *swap<mode> to
provide gen_swapsi.
(<any_rotate>di3): Handle !TARGET_64BIT rotations by 32 bits
via new gen_<insn>32di2_doubleword below.
(<anyrotate>32di2_doubleword): New define_insn_and_split
that splits after reload as either a pair of move instructions
or an xchgl (using gen_swapsi).
gcc/testsuite/ChangeLog
* gcc.target/i386/xchg-3.c: New test case.
At some point when domwalk got the ability to use RPO for ordering
dominator children we carefully avoided update_ssa eating the cost
of RPO compute. Unfortunately some later consolidation of CTORs
lost this again so the following makes this explicit via a special
value to the bb_index_to_rpo argument of domwalk, speeding up
update_ssa again.
* domwalk.h (dom_walker::dom_walker): Update comment to
reflect reality and new special argument value for
bb_index_to_rpo.
* domwalk.cc (dom_walker::dom_walker): Recognize -1
bb_index_to_rpo.
* tree-into-ssa.cc
(rewrite_update_dom_walker::rewrite_update_dom_walker): Tell
dom_walker to not use RPO.
That warning won't happen on ilp32 targets; it seems Andrew Pinski
already mentioned that [1] before.
Verified on riscv32-unknown-elf and riscv64-unknown-elf.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92879#c1
gcc/testsuite/ChangeLog:
PR testsuite/102690
* g++.dg/warn/Warray-bounds-16.C: XFAIL only on lp64 for the
warning.
The routine fold_using_range::relation_fold_and_or needs to verify that both
operands of 2 stmts are the same, and uses GORI's dependency cache for this.
This cache cannot be counted on to reflect the current contents of a
stmt, especially in the presence of an IL changing pass. Instead, look at the
statement operands.
PR tree-optimization/106114
gcc/
* gimple-range-fold.cc (fold_using_range::relation_fold_and_or): Check
statement operands instead of GORI cache.
gcc/testsuite/
* gcc.dg/pr106114.c: New.
2022-06-29 Antoni Boucher <bouanto@zoho.com>
gcc/jit/
PR jit/105812
* jit-playback.cc: Use the correct return type when folding in
as_truth_value.
gcc/testsuite/
PR jit/105812
* jit.dg/test-asm.cc: Add missing include to make the test pass.
* jit.dg/test-pr105812-bool-operations.c: New test.
Casting from vector to static array is permitted, and the frontend
generates a reinterpret cast, but casting back the other way resulted in
an error. This has been fixed to be properly handled in the code
generation pass of VectorExp, and the conversion for lvalue and rvalue
handling done in convert_expr and convert_for_rvalue respectively.
PR d/106139
gcc/d/ChangeLog:
* d-convert.cc (convert_expr): Handle casting from array to vector.
(convert_for_rvalue): Rewrite vector to array casts of the same
element type into a constructor.
(convert_for_assignment): Return calling convert_for_rvalue.
* expr.cc (ExprVisitor::visit (VectorExp *)): Handle generating a
vector expression from a static array.
* toir.cc (IRVisitor::visit (ReturnStatement *)): Call
convert_for_rvalue on return value.
gcc/testsuite/ChangeLog:
* gdc.dg/pr106139a.d: New test.
* gdc.dg/pr106139b.d: New test.
* gdc.dg/pr106139c.d: New test.
* gdc.dg/pr106139d.d: New test.
On musl <pthread.h> uses calloc() (via <sched.h>). jit/ includes
it directly and exposes use of poisoned calloc():
/build/build/./prev-gcc/xg++ ... ../../gcc-13-20220626/gcc/jit/jit-playback.cc
make[3]: *** [Makefile:1143: jit/libgccjit.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from /<<NIX>>/musl-1.2.3-dev/include/pthread.h:30,
from ../../gcc-13-20220626/gcc/jit/jit-playback.cc:44:
/<<NIX>>/musl-1.2.3-dev/include/sched.h:84:7: error: attempt to use poisoned "calloc"
84 | void *calloc(size_t, size_t);
| ^
/<<NIX>>/musl-1.2.3-dev/include/sched.h:124:36: error: attempt to use poisoned "calloc"
124 | #define CPU_ALLOC(n) ((cpu_set_t *)calloc(1,CPU_ALLOC_SIZE(n)))
| ^
The change moves <pthread.h> inclusion to "system.h" under new
INCLUDE_PTHREAD_H guard and uses this mechanism in libgccjit.
gcc/
PR c++/106102
* system.h: Introduce INCLUDE_PTHREAD_H macros to include <pthread.h>.
gcc/jit/
PR c++/106102
* jit-playback.cc: Include <pthread.h> via "system.h" to avoid calloc()
poisoning.
* jit-recording.cc: Ditto.
* libgccjit.cc: Ditto.
gcc/fortran/ChangeLog:
PR fortran/106121
* simplify.cc (gfc_simplify_extends_type_of): Do not attempt to
simplify when one of the arguments is a CLASS variable that was
not properly declared.
gcc/testsuite/ChangeLog:
PR fortran/106121
* gfortran.dg/extends_type_of_4.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
The macro location tables should really mention they are about
locations. So rename them. Also, add a missing free of the remapping
table, and remove some now-unneeded macro checking.
gcc/cp/
* module.cc (macro_info, macro_traits, macro_table,
macro_remap): Rename to ...
(macro_loc_info, macro_loc_traits, macro_loc_table,
macro_loc_remap): ... these. Update all uses.
(module_state::write_prepare_maps): Remove unneeded macro checking.
(module_state::write_begin): Free macro_loc_remap.
The variables being used to get the result out of TYPE_VECTOR_SUBPARTS
were being flagged by -Werror=maybe-uninitialized. As they have already
been checked for being constant earlier, use `to_constant' instead.
gcc/d/ChangeLog:
* intrinsics.cc (build_shuffle_mask_type): Use to_constant when
getting the number of subparts from a vector type.
(expand_intrinsic_vec_shufflevector): Likewise.
On Nios II, PIC function calls use R_NIOS2_CALL* relocations, which
may refer to a GOT entry that initially points to a PLT entry to
resolve the function on first call and that is then changed by the
dynamic linker to point directly to the function to be called so
subsequent calls do not go through the dynamic linker. To quote the
ABI, "A global offset table (GOT) entry referenced using
R_NIOS2_GOT16, R_NIOS2_GOT_LO as well as R_NIOS2_GOT_HA must be
resolved at load time. A GOT entry referenced only using
R_NIOS2_CALL16, R_NIOS2_CALL_LO as well as R_NIOS2_CALL_HA can
initially refer to a procedure linkage table (PLT) entry and then be
resolved lazily.".
However, GCC wrongly treats function addresses loaded from the GOT
with such relocations as constant. If the address load is pulled out
of a loop, then every call in the loop looks up the function by name.
This shows up as very slow execution of many glibc testcases in glibc
2.35 and later (tests that call functions from shared libc many times
in a loop), where tests are now built as PIE by default. Fix this
problem by using gen_rtx_MEM instead of gen_const_mem when loading
addresses for PIC function calls.
Tested with no regressions for cross to nios2-linux-gnu, where many
glibc tests pass that previously timed out.
* config/nios2/nios2.cc (nios2_load_pic_address): Use gen_rtx_MEM
not gen_const_mem for UNSPEC_PIC_CALL_SYM.
My patch apparently left some __float128 uses in libgfortran
that could use _Float128 instead, the following patch changes that.
2022-06-29 Jakub Jelinek <jakub@redhat.com>
* mk-kinds-h.sh: Change __float128 to _Float128 in a comment.
* acinclude.m4 (LIBGFOR_CHECK_MATH_IEEE128): Use _Float128 instead of
__float128.
* libgfortran.h (isnan): Change __float128 to _Float128 in a comment.
(__acoshieee128, __acosieee128, __asinhieee128, __asinieee128,
__atan2ieee128, __atanhieee128, __atanieee128, __copysignieee128,
__coshieee128, __cosieee128, __erfcieee128, __erfieee128,
__expieee128, __fabsieee128, __fmaieee128, __fmodieee128, __jnieee128,
__log10ieee128, __logieee128, __powieee128, __sinhieee128,
__sinieee128, __sqrtieee128, __tanhieee128, __tanieee128,
__ynieee128, __strtoieee128): Use _Float128 instead of __float128.
* configure: Regenerated.
My recent gfortran + libgfortran patch apparently broke (some?) aarch64
builds. While it is desirable to use just _Float128 rather than __float128,
we only want to use it (and e.g. define HAVE_FLOAT128) on targets where
_Float128 is supported and long double isn't IEEE quad precision.
Those are the targets that support the __float128 type, which is what we
have been testing for before; _Float128 is supported on those targets and
also on targets where long double is IEEE quad precision.
So, the following patch restores check for whether __float128 is supported
into the LIBGFOR_CHECK_FLOAT128 check which determines whether
HAVE_FLOAT128 is defined or whether to use libquadmath, so that e.g. on
aarch64 where long double is IEEE quad we don't do that.
2022-06-29 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/106137
* acinclude.m4 (LIBGFOR_CHECK_FLOAT128): Adjust comment.
Also test for __float128.
(HAVE_FLOAT128): Adjust description.
* config.h.in: Regenerated.
* configure: Regenerated.
The following makes sure we preserve EH notes on call insns that
indicate the call doesn't perform a non-local goto when distributing
notes after combining insns.
2022-06-28 Richard Biener <rguenther@suse.de>
PR rtl-optimization/106082
* combine.cc (distribute_notes): Preserve notes when
they indicate a call doesn't perform a non-local goto.
The following fixes a mistake in looking up an extended operand
in the CSE of a truncated operation.
2022-06-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/106112
* tree-ssa-sccvn.cc (valueized_wider_op): Properly extend
a constant operand according to its type.
* gcc.dg/torture/pr106112.c: New testcase.
When enabling AVX512FP16 via attribute or pragma, the _Float16 type would
remain unavailable when at initialization time SSE2 wouldn't be seen as
available for use. While this may hint at a wider underlying issue (like
the feature, the type may want providing dynamically, albeit this may be
challenging in particular for functions returning _Float16 yet having
the attribute specified after their return type), for now simply make
SSE2 available when targeting ix86.
gcc/testsuite/
* gcc.target/i386/avx512fp16-reduce-op-2.c: Force SSE2 for i?86.
* gcc.target/i386/pr99464.c: Likewise.
So far on 32-bit hosts this test failed (for both C and C++) because of
the ABI change warning occurring without (explicitly) enabling MMX.
gcc/testsuite/
* c-c++-common/torture/builtin-shufflevector-2.c: Prune ix86 MMX
ABI warning.
C++2017 and previous standard description:
The value of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are zero-filled. If E1 has an unsigned type,
the value of the result is E1 × 2^E2, reduced modulo one more
than the maximum value representable in the result type.
Otherwise, if E1 has a signed type and non-negative value,
and E1 × 2^E2 is representable in the corresponding unsigned
type of the result type, then that value, converted to the
result type, is the resulting value; otherwise, the behavior
is undefined.
The value of E1 >> E2 is E1 right-shifted E2 bit positions.
If E1 has an unsigned type or if E1 has a signed type and
a non-negative value, the value of the result is the integral
part of the quotient of E1/2^E2. If E1 has a signed type and
a negative value, the resulting value is implementation-defined.
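One common way to stay within these rules is to perform the shift on the
unsigned representation and convert back afterwards. The sketch below
shows that idiom only; it is not a copy of the change made to
loongarch_build_integer, and shift_left_wrapping is a hypothetical name.

```cpp
#include <cstdint>

// Left shift of a possibly negative value without undefined behaviour:
// shift the unsigned representation, then convert back. COUNT must be
// less than 64. The final conversion is modular in C++20 and
// implementation-defined (modular in GCC) in earlier standards.
std::int64_t shift_left_wrapping(std::int64_t value, unsigned count)
{
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(value) << count);
}
```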
gcc/ChangeLog:
PR target/106097
* config/loongarch/loongarch.cc (loongarch_build_integer):
Remove undefined behavior from code.
Vectors in D are exposed by the use of the `__vector(T[N])' type, and
whilst most unary and binary operations work as you'd expect, there are
some operations that are not possible without doing the operation
unrolled, or calling some target-specific built-in, or with inline asm.
This introduces a new `gcc.simd' module that introduces the following.
- Prefetching has been exposed by a convenient `prefetch' function in
the library.
- Loading and storing from an unaligned address have been exposed by
`loadUnaligned' and `storeUnaligned' intrinsics.
- Vector permutations have been exposed by `shuffle', and
`shufflevector' intrinsics.
- Converting between two vectors with a different element type has been
exposed by a `convertvector' intrinsic.
- The ternary operator has been exposed with a `blendvector' intrinsic.
- Comparison operators have been exposed by `equalMask',
`notEqualMask', `greaterMask', and `greaterEqualMask' intrinsics.
- Logic operators have been exposed by convenient `notMask',
`andAndMask', and `orOrMask' functions in the library.
To be compatible with the LLVM D compiler's own SIMD intrinsic module,
there is also the addition of `extractelement' and `insertelement'
convenience functions, and an alternative interface for calling the
`shufflevector' function.
The addition of these intrinsics lowers the boundary for users working
in SIMD to get the desired codegen they want out of the compiler.
Most of what is present here - apart from tests - is the adding of
machinery in the intrinsics suite of functions to do validation on
templated intrinsics. Whilst these are still matched from the library
by their generic (untyped) signature, there is still an assumption
that what has been instantiated and handed down to the code generator is
valid, because why would these definitions be found outside of the
in-tree D runtime library? The majority of intrinsics are not
templates, so the test on the mangled signature string still guarantees
all types are as we expect them to be. However there are still a small
handful of other templated intrinsics (core.bitop.{rol,ror},
core.math.toPrec, std.math.traits.isNaN, ...) that are currently
unchecked, so would benefit from being included into this built-in
checking function at some point in the future.
gcc/d/ChangeLog:
* intrinsics.cc: Include diagnostic.h, langhooks.h,
vec-perm-indices.h.
(maybe_set_intrinsic): Add cases for new simd intrinsics.
(warn_mismatched_return_type): New function.
(warn_mismatched_argument): New function.
(build_shuffle_mask_type): New function.
(maybe_warn_intrinsic_mismatch): New function.
(expand_intrinsic_vec_cond): New function.
(expand_intrinsic_vec_convert): New function.
(expand_intrinsic_vec_blend): New function.
(expand_intrinsic_vec_shuffle): New function.
(expand_intrinsic_vec_shufflevector): New function.
(expand_intrinsic_vec_load_unaligned): New function.
(expand_intrinsic_vec_store_unaligned): New function.
(maybe_expand_intrinsic): Check signature of intrinsic before handing
off to front-end lowering. Add cases for new simd intrinsics.
* intrinsics.def (INTRINSIC_LOADUNALIGNED): Define intrinsic.
(INTRINSIC_STOREUNALIGNED): Define intrinsic.
(INTRINSIC_SHUFFLE): Define intrinsic.
(INTRINSIC_SHUFFLEVECTOR): Define intrinsic.
(INTRINSIC_CONVERTVECTOR): Define intrinsic.
(INTRINSIC_BLENDVECTOR): Define intrinsic.
(INTRINSIC_EQUALMASK): Define intrinsic.
(INTRINSIC_NOTEQUALMASK): Define intrinsic.
(INTRINSIC_GREATERMASK): Define intrinsic.
(INTRINSIC_GREATEREQUALMASK): Define intrinsic.
libphobos/ChangeLog:
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add gcc/simd.d.
* libdruntime/Makefile.in: Regenerate.
* libdruntime/gcc/simd.d: New file.
gcc/testsuite/ChangeLog:
* gdc.dg/Wbuiltin_declaration_mismatch.d: Rename to...
* gdc.dg/Wbuiltin_declaration_mismatch1.d: ...this.
* gdc.dg/Wbuiltin_declaration_mismatch2.d: New test.
* gdc.dg/torture/simd_blendvector.d: New test.
* gdc.dg/torture/simd_cond.d: New test.
* gdc.dg/torture/simd_convertvector.d: New test.
* gdc.dg/torture/simd_load.d: New test.
* gdc.dg/torture/simd_logical.d: New test.
* gdc.dg/torture/simd_shuffle.d: New test.
* gdc.dg/torture/simd_shufflevector.d: New test.
* gdc.dg/torture/simd_store.d: New test.
This patch updates ucnid.h from Unicode 13 to Unicode 14. Additionally, the
procedure detailed in contrib/unicode/README, which updates
generated_wcwidth.h, has been expanded with instructions for updating this
file as well, so that both may be done at the same time conveniently. Two
additional Unicode data files which are needed to create ucnid.h are also
added to source control in contrib/unicode.
contrib/ChangeLog:
* unicode/README: Added instructions for updating ucnid.h.
* unicode/DerivedCoreProperties.txt: New file added to source
control from Unicode 14.0 release.
* unicode/DerivedNormalizationProps.txt: Likewise.
libcpp/ChangeLog:
* ucnid.h: Regenerated for Unicode 14.0.