The FIXME comments saying "Leave out until this is tested a bit more"
are from 1997. I think they've been sufficiently tested.
ChangeLog:
* config-ml.in (multi-do, multi-clean): Add @ to silence recipes.
Remove FIXME comments.
This fixes updating of the step vectors when filling up to group_size.
2020-11-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97753
* tree-vect-loop.c (vectorizable_induction): Fill vec_steps
when CSEing inside the group.
* gcc.dg/vect/pr97753.c: New testcase.
This fixes the order of walking PHIs and stmts for BB mask
precision compute.
2020-11-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97746
* tree-vect-patterns.c (vect_determine_precisions): First walk PHIs.
* gcc.dg/vect/bb-slp-pr97746.c: New testcase.
This refactors the ADL lookup. It just so happens the refactoring
makes dropping modules in simpler :) We break apart the namespace and
class fn processing, and move scope iteration to an outer function.
It'll also become possible to find the same enum in multiple place, so
we need to handle that idempotently.
gcc/cp/
* cp-tree.h (LOOKUP_FOUND_P): Add ENUMERAL_TYPE.
* name-lookup.c (class name_lookup): Add comments.
(name_lookup::adl_namespace_only): Replace with ...
(name_lookup::adl_class_fns): ... this and ...
(name_lookup::adl_namespace_fns): ... this.
(name_lookup::adl_namespace): Deal with inline nests here.
(name_lookup::adl_class): Complete the type here.
(name_lookup::adl_type): Call broken-out enum ..
(name_lookup::adl_enum): New. No need to call the namespace adl
if it is class-scope.
(name_lookup::search_adl): Iterate over collected scopes here.
This is a patch from my name-lookup overhaul. I noticed the parser
and one path in name-lookup looked through an overload of a single
known decl. It seems more consistent to do that in both paths through
name-lookup, and not in the parser itself.
gcc/cp/
* name-lookup.c (lookup_qualified_name): Expose an overload of a
singleton with known type.
(lookup_name_1): Just check the overload's type to expose it.
* parser.c (cp_parser_lookup_name): Do not do that check here.
This changes the phi translation cache to be per edge which
pushes it off the profiling radar. For larger testcases the
combined hashtable causes a load of cache misses and making it
per edge allows to shrink the entry further.
2020-11-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/97765
* tree-ssa-pre.c (bb_bitmap_sets::phi_translate_table): Add.
(PHI_TRANS_TABLE): New macro.
(phi_translate_table): Remove.
(expr_pred_trans_d::pred): Remove.
(expr_pred_trans_d::hash): Simplify.
(expr_pred_trans_d::equal): Likewise.
(phi_trans_add): Adjust.
(phi_translate): Likewise. Remove hash-table expansion
detection and optimization.
(phi_translate_set): Allocate PHI_TRANS_TABLE here.
(init_pre): Adjsust.
(fini_pre): Free PHI_TRANS_TABLE.
As PR97705 shows, the commit r11-4637 caused some dumping
comparison difference error on pass ira. It exposed one
issue about the newly introduced function remove_scratches,
which can increase the largest pseudo reg number if it
succeeds, later some function will use the max_reg_num()
to get the latest max_regno, when iterating the numbers
we can access some data structures which are allocated as
the previous max_regno, some out of array bound accesses
can occur, the failure can be random since the values
beyond the array could be random.
This patch is to free/reinit/recompute the relevant data
structures that is regstat_n_sets_and_refs and reg_info_p
to ensure we won't access beyond some array bounds.
Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8.
gcc/ChangeLog:
PR rtl-optimization/97705
* ira.c (ira): Refactor some regstat free/init/compute invocation
into lambda function regstat_recompute_for_max_regno, and call it
when max_regno increases as remove_scratches succeeds.
This attribute states that a property is one manipulated by class
methods (it requires a static variable and the setter and getter
must be provided explicitly, they cannot be @synthesized).
gcc/c-family/ChangeLog:
* c-common.h (OBJC_IS_PATTR_KEYWORD): Add class to the list
of keywords accepted in @property attribute contexts.
* c-objc.h (enum objc_property_attribute_group): Add
OBJC_PROPATTR_GROUP_CLASS.
(enum objc_property_attribute_kind): Add
OBJC_PROPERTY_ATTR_CLASS.
gcc/cp/ChangeLog:
* parser.c (cp_parser_objc_at_property_declaration): Handle
class keywords in @property attribute context.
gcc/objc/ChangeLog:
* objc-act.c (objc_prop_attr_kind_for_rid): Handle class
attribute.
(objc_add_property_declaration): Likewise.
* objc-act.h (PROPERTY_CLASS): Record class attribute state.
gcc/testsuite/ChangeLog:
* obj-c++.dg/property/at-property-4.mm: Test handling class
attributes.
* objc.dg/property/at-property-4.m: Likewise.
XFAIL-ing these is not sufficient, unfortunately, we need to
skip them completely.
gcc/testsuite/ChangeLog:
* c-c++-common/zero-scratch-regs-10.c: Skip for powerpc
Darwin.
* c-c++-common/zero-scratch-regs-11.c: Likewise.
* c-c++-common/zero-scratch-regs-8.c: Likewise.
* c-c++-common/zero-scratch-regs-9.c: Likewise.
The builtin_thread_pointer test does not work for emulated TLS.
Add a target requires to cover this.
gcc/testsuite/ChangeLog:
* gcc.target/i386/builtin_thread_pointer.c: Require native TLS.
The r11-4813 patch removed "ignored" from the dg-warnings in this test,
causing this test to fail when compiled as C++.
gcc/testsuite/ChangeLog:
* c-c++-common/Wimplicit-fallthrough-20.c: Adjust dg-warning.
generated_cpp_wcwidth.h was regenerated using Unicode 13.0.0 data files. No
material changes to the parsing scripts (either GCC- or glibc-sourced) were
necessary; glibc's utf8_gen.py was tweaked slightly by glibc and matched here.
contrib/ChangeLog:
* unicode/EastAsianWidth.txt: Update to Unicode 13.0.0.
* unicode/PropList.txt: Likewise.
* unicode/README: Likewise.
* unicode/UnicodeData.txt: Likewise.
* unicode/from_glibc/unicode_utils.py: Update to latest glibc version.
* unicode/from_glibc/utf8_gen.py: Likewise.
libcpp/ChangeLog:
* generated_cpp_wcwidth.h: Regenerated from Unicode 13.0.0 data.
This is the default, but it is still legal in user code and therefore
we should handle it in parsing. Fix whitespace issues in the lines
affected.
gcc/c-family/ChangeLog:
* c-common.c (c_common_reswords): Add 'atomic' property
attribute.
* c-common.h (enum rid): Add RID_PROPATOMIC for atomic
property attributes.
gcc/objc/ChangeLog:
* objc-act.c (objc_prop_attr_kind_for_rid): Handle
RID_PROPATOMIC.
gcc/testsuite/ChangeLog:
* obj-c++.dg/property/at-property-4.mm: Test atomic property
attribute.
* objc.dg/property/at-property-4.m: Likewise.
This attribute allows pointers to be marked as pointers to
an NSObject-compatible object. This allows for additional
checking of assignment etc. when refering to pointers to
opaque types.
gcc/c-family/ChangeLog:
* c-attribs.c (handle_nsobject_attribute): New.
* c.opt: Add WNSObject-attribute.
gcc/objc/ChangeLog:
* objc-act.c (objc_compare_types): Handle NSObject type
attributes.
(objc_type_valid_for_messaging): Likewise.
gcc/testsuite/ChangeLog:
* obj-c++.dg/attributes/nsobject-01.mm: New test.
* objc.dg/attributes/nsobject-01.m: New test.
These tests fail because of an unimplemented 'sorry'; there
is no plan to implement this in the short term, so XFAILing
the tests to reduce noise.
gcc/testsuite/ChangeLog:
* c-c++-common/zero-scratch-regs-10.c: XFAIL for
powerpc-darwin.
* c-c++-common/zero-scratch-regs-11.c: Likewise.
* c-c++-common/zero-scratch-regs-8.c: Likewise.
* c-c++-common/zero-scratch-regs-9.c: Likewise.
The mingw-w64 implementations of `__cxa_thread_atexit()` and `__cxa_atexit()` have been
using `__thiscall` since two years ago. Using the default calling convention (which is
`__cdecl`) causes crashes as explained in PR83562.
Calling conventions have no effect on x86_64-w64-mingw32.
Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83562
Reference: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-crt/crt/cxa_thread_atexit.c
Reference: f3e0fbb40c/
Reference: https://github.com/msys2/MINGW-packages/issues/7071
Signed-off-by: Liu Hao <lh_mouse@126.com>
2020-10-08 Liu Hao <lh_mouse@126.com>
libstdc++-v3:
* libsupc++/cxxabi.h: (__cxa_atexit): mark with _GLIBCXX_CDTOR_CALLABI
(__cxa_thread_atexit): ditto
* libsupc++/atexit_thread.cc: (__cxa_atexit): mark with
_GLIBCXX_CDTOR_CALLABI
(__cxa_thread_atexit): ditto
(elt): ditto
In ac001f5ce604 Alan fixed my thinko using operands that do not refer
to anything mentioned in the RTL pattern. Instead, it just uses fresh
new local rtxes for those.
This patch takes that a tiny bit further: it uses local rtx for all
temporaries used in the expanders. As a bonus that simplifies the code
a tiny bit as well.
2020-11-06 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.md (@tablejump<mode>_normal): Don't abuse
operands[].
(@tablejump<mode>_nospec): Ditto.
Use the correct minimized test case source rather than the large test
source.
gcc/testsuite/
* gcc.target/powerpc/pr64505.c: Run everywhere. Use correct minimized
test case.
As part of the MMA base support, we incremented BIGGEST_ALIGNMENT in
order to align the __vector_pair and __vector_quad types to 256 and 512
bytes respectively. This had the unintended effect of changing the
default alignment used by __attribute__ ((__aligned__)) which causes
an ABI break because of some dodgy code in GLIBC's struct pthread.
The fix is to revert the BIGGEST_ALIGNMENT change and to force the
alignment on the type itself rather than the mode used by the type.
2020-11-06 Peter Bergner <bergner@linux.ibm.com>
gcc/
* config/rs6000/rs6000.h (BIGGEST_ALIGNMENT): Revert previous commit
so as not to break the ABI.
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Set the ABI
mandated alignment for __vector_pair and __vector_quad types.
gcc/testsuite/
* gcc.target/powerpc/mma-alignment.c: New test.
I gave Ke Wen bad advice, luckily David corrected me: it is true that we
cannot use TARGET_POWERPC64 on many 32-bit OSes, since either the kernel
or userland does not save the top half of the 64-bit integer registers,
but we do not have to care about that in separate patterns or related
code. The flag is automatically not enabled by default on targets that
do not handle this correctly.
This patch fixes it.
Segher
2020-11-06 Segher Boessenkool <segher@kernel.crashing.org>
PR target/96933
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Use
TARGET_POWERPC64 instead of TARGET_64BIT.
Add built-in functions __builtin_nansd32, __builtin_nansd64 and
__builtin_nansd128 to return signaling NaNs of decimal floating-point
types, analogous to the functions already present for binary
floating-point types.
This patch, independent of
<https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557136.html>
(pending review), is in preparation for adding the <float.h> macros
for such signaling NaNs that are in C2x, analogous to the macros for
other types that are in that patch.
Bootstrapped with no regressions for x86_64-pc-linux-gnu. Also ran
the new tests for powerpc64le-linux-gnu to confirm they do work in the
case (hardware DFP) where floating-point exceptions are supported for
DFP.
gcc/
2020-11-06 Joseph Myers <joseph@codesourcery.com>
* builtins.def (BUILT_IN_NANSD32, BUILT_IN_NANSD64)
(BUILT_IN_NANSD128): New built-in functions.
* fold-const-call.c (fold_const_call): Handle the new built-in
functions.
* doc/extend.texi (__builtin_nansd32, __builtin_nansd64)
(__builtin_nansd128): Document.
* doc/sourcebuild.texi (Effective-Target Keywords): Document
fenv_exceptions_dfp.
gcc/testsuite/
2020-11-06 Joseph Myers <joseph@codesourcery.com>
* lib/target-supports.exp
(check_effective_target_fenv_exceptions_dfp): New.
* gcc.dg/dfp/builtin-snan-1.c, gcc.dg/dfp/builtin-snan-2.c: New
tests.
While messing with check_handlers_1, I spotted this bug report which
complains that we don't warn about the case when we have two duplicated
handlers of type int. can_convert_eh implements [except.handle] and
that says: A handler is a match for an exception object of type E if
- The handler is of type cv T or cv T& and E and T are the same type
(ignoring the top-level cv-qualifiers), or [...]
but we don't implement this bullet properly for non-class types. The
fix therefore seems pretty obvious. Also change the return type to
bool when we're only returning yes/no.
gcc/cp/ChangeLog:
PR c++/81660
* except.c (can_convert_eh): Change the return type to bool. If
the type TO and FROM are the same, return true.
gcc/testsuite/ChangeLog:
PR c++/81660
* g++.dg/warn/Wexceptions3.C: New test.
* g++.dg/eh/pr42859.C: Add dg-warning.
* g++.dg/torture/pr81659.C: Likewise.
Function use_pred_not_overlap_with_undef_path_pred of
pass_late_warn_uninitialized
checks if predicate of variable use overlaps with predicate of undefined
control flow path.
For now, it only checks ssa_var comparing against constant, this can be
improved where
ssa_var compares against another ssa_var with value range info, as described in
comment:
+ /* Check value range info of rhs, do following transforms:
+ flag_var < [min, max] -> flag_var < max
+ flag_var > [min, max] -> flag_var > min
+
+ We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR:
+ flag_var <= [min, max] -> flag_var < [min, max+1]
+ flag_var >= [min, max] -> flag_var > [min-1, max]
+ if no overflow/wrap. */
gcc/
* tree-ssa-uninit.c (find_var_cmp_const): New function.
(use_pred_not_overlap_with_undef_path_pred): Call above.
The change in r11-4748-50b840ac5e1d6534e345c3fee9a97ae45ced6bc7 causes
a build error on Solaris, due to the new explicit instantiation matching
patterns for two different symbol versions.
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Tighten up patterns
for basic_stringbuf that refer to __xfer_bufptrs.
This passes visibiliy through without warning (so that, for example,
__attribute__((__visibility("default"))) does not result in any
diagnostic).
gcc/objc/ChangeLog:
* objc-act.c (start_class): Accept visibility attributes
without warning.
At present, we are missing parsing and checking for around
half of the property attributes in use. The existing ad hoc scheme
for the parser's communication with the Objective C validation
is not suitable for extending to cover all the missing cases.
Additionally:
1/ We were declaring errors in two cases that the reference
implementation warns (or is silent).
I've elected to warn for both those cases, (Wattributes) it
could be that we should implement Wobjc-xxx-property warning
masks (TODO).
2/ We were emitting spurious complaints about missing property
attributes when these were not being parsed because we gave
up on the first syntax error.
3/ The quality of the diagnostic locations was poor (that's
true for much of Objective-C, we will have to improve it as
we modernise areas).
We continue to attempt to keep the code, warning and error output
similar (preferably identical output) between the C and C++ front
ends.
The interface to the Objective-C-specific parts of the parsing is
simplified to a vector of parsed (but not fully-checked) property
attributes, this will simplify the addition of new attributes.
gcc/c-family/ChangeLog:
* c-objc.h (enum objc_property_attribute_group): New
(enum objc_property_attribute_kind): New.
(OBJC_PROPATTR_GROUP_MASK): New.
(struct property_attribute_info): Small class encapsulating
parser output from property attributes.
(objc_prop_attr_kind_for_rid): New
(objc_add_property_declaration): Simplify interface.
* stub-objc.c (enum rid): Dummy type.
(objc_add_property_declaration): Simplify interface.
(objc_prop_attr_kind_for_rid): New.
gcc/c/ChangeLog:
* c-parser.c (c_parser_objc_at_property_declaration):
Improve parsing fidelity. Associate better location info
with @property attributes. Clean up the interface to
objc_add_property_declaration ().
gcc/cp/ChangeLog:
* parser.c (cp_parser_objc_at_property_declaration):
Improve parsing fidelity. Associate better location info
with @property attributes. Clean up the interface to
objc_add_property_declaration ().
gcc/objc/ChangeLog:
* objc-act.c (objc_prop_attr_kind_for_rid): New.
(objc_add_property_declaration): Adjust to consume the
parser output using a vector of parsed attributes.
gcc/testsuite/ChangeLog:
* obj-c++.dg/property/at-property-1.mm: Adjust expected
diagnostics.
* obj-c++.dg/property/at-property-29.mm: Likewise.
* obj-c++.dg/property/at-property-4.mm: Likewise.
* obj-c++.dg/property/property-neg-2.mm: Likewise.
* objc.dg/property/at-property-1.m: Likewise.
* objc.dg/property/at-property-29.m: Likewise.
* objc.dg/property/at-property-4.m: Likewise.
* objc.dg/property/at-property-5.m: Likewise.
* objc.dg/property/property-neg-2.m: Likewise.
On the following testcase where the cdtor attributes aren't on the
in-class declaration but on an out-of-class definition, the cdtors
have their clones created from the in-class declaration, and later on
duplicate_decls updates attributes on the abstract cdtors, but nothing
propagates them to the clones.
2020-11-06 Jakub Jelinek <jakub@redhat.com>
PR c++/67453
* decl.c (duplicate_decls): Propagate DECL_ATTRIBUTES and
DECL_PRESERVE_P from olddecl to its clones if any.
* g++.dg/ext/attr-used-2.C: New test.
As per Nigel Tufnel's assertion "... this one goes to 11".
The various parts of the code that deal with mapping Darwin versions
to macOS (X) versions need updating to deal with a major version of
11.
So now we have, for example:
Darwin 4 => macOS (X) 10.0
…
Darwin 14 => macOS (X) 10.10
...
Darwin 19 => macOS (X) 10.15
Darwin 20 => macOS 11.0
Because of the historical duplication of the "10" in macOSX 10.xx and
the number of tools that expect this, it is likely that system tools will
allow macos11.0 and/or macosx11.0 (despite that the latter makes little
sense).
Update the link test to cover Catalina (Darwin19/10.15) and
Big Sur (Darwin20/11.0).
gcc/ChangeLog:
* config/darwin-c.c: Allow for Darwin20 to correspond to macOS 11.
* config/darwin-driver.c: Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/darwin-minversion-link.c: Allow for Darwin19 (macOS 10.15)
and Darwin20 (macOS 11.0).
Turns out its size and time requirements can be stripped down
dramatically.
2020-11-06 Richard Biener <rguenther@suse.de>
* tree-ssa-pre.c (expr_pred_trans_d): Modify so elements
are embedded rather than allocated. Remove hashval member,
make all members integers.
(phi_trans_add): Adjust accordingly.
(phi_translate): Likewise. Deal with re-allocation
of the table.
When a range is recalculated, retain what was previously known as IL changes
can produce different results from un-executed code. This also paves
the way for external injection of ranges.
gcc/
PR tree-optimization/97737
PR tree-optimization/97741
* gimple-range.cc: (gimple_ranger::range_of_stmt): Intersect newly
calculated ranges with the existing known global range.
gcc/testsuite/
* gcc.dg/pr97737.c: New.
* gcc.dg/pr97741.c: New.
gcc/
* config/rx/rx.md (CTRLREG_PC): Add.
* config/rx/rx.c (CTRLREG_PC): Add
(rx_expand_builtin_mvtc): Add warning: PC register cannot
be used as dest.
In cleaning up C++'s handling of hidden decls, I renamed its
DECL_BUILTIN_P, which checks for loc == BUILTINS_LOCATION to
DECL_UNDECLARED_BUILTIN_P, because the location gets updated, if user
source declares the builtin, and the predicate no longer holds. The
original name was confusing me. (The builtin may still retain builtin
properties in the redeclaration, and other predicates can still detect
that.)
I discovered that tree.h had its own variant 'DECL_IS_BUILTIN', which
behaves in (almost) the same manner. And therefore has the same
mutating behaviour.
This patch deletes the C++ one, and renames tree.h's to
DECL_IS_UNDECLARED_BUILTIN, to emphasize its non-constantness. I
guess _IS_ wins over _P
gcc/
* tree.h (DECL_IS_BUILTIN): Rename to ...
(DECL_IS_UNDECLARED_BUILTIN): ... here. No need to use SOURCE_LOCUS.
* calls.c (maybe_warn_alloc_args_overflow): Adjust for rename.
* cfgexpand.c (pass_expand::execute): Likewise.
* dwarf2out.c (base_type_die, is_naming_typedef_decl): Likewise.
* godump.c (go_decl, go_type_decl): Likewise.
* print-tree.c (print_decl_identifier): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-ssa-ccp.c (pass_post_ipa_warn::execute): Likewise.
* xcoffout.c (xcoff_assign_fundamental_type_number): Likewise.
gcc/c-family/
* c-ada-spec.c (collect_ada_nodes): Rename
DECL_IS_BUILTIN->DECL_IS_UNDECLARED_BUILTIN.
(collect_ada_node): Likewise.
(dump_forward_type): Likewise.
* c-common.c (set_underlying_type): Rename
DECL_IS_BUILTIN->DECL_IS_UNDECLARED_BUILTIN.
(user_facing_original_type, c_common_finalize_early_debug): Likewise.
gcc/c/
* c-decl.c (diagnose_mismatched_decls): Rename
DECL_IS_BUILTIN->DECL_IS_UNDECLARED_BUILTIN.
(warn_if_shadowing, implicitly_declare, names_builtin_p)
(collect_source_refs): Likewise.
* c-typeck.c (inform_declaration, inform_for_arg)
(convert_for_assignment): Likewise.
gcc/cp/
* cp-tree.h (DECL_UNDECLARED_BUILTIN_P): Delete.
* cp-objcp-common.c (names_bultin_p): Rename
DECL_IS_BUILTIN->DECL_IS_UNDECLARED_BUILTIN.
* decl.c (decls_match): Likewise. Replace
DECL_UNDECLARED_BUILTIN_P with DECL_IS_UNDECLARED_BUILTIN.
(duplicate_decls): Likewise.
* decl2.c (collect_source_refs): Likewise.
* name-lookup.c (anticipated_builtin_p, print_binding_level)
(do_nonmember_using_decl): Likewise.
* pt.c (builtin_pack_fn_p): Likewise.
* typeck.c (error_args_num): Likewise.
gcc/lto/
* lto-symtab.c (lto_symtab_merge_decls_1): Rename
DECL_IS_BUILTIN->DECL_IS_UNDECLARED_BUILTIN.
gcc/go/
* go-gcc.cc (Gcc_backend::call_expression): Rename
DECL_IS_BUILTIN->DECL_IS_UNDECLARED_BUILTIN.
libcc1/
* libcc1plugin.cc (address_rewriter): Rename
DECL_IS_BUILTIN->DECL_IS_UNDECLARED_BUILTIN.
* libcp1plugin.cc (supplement_binding): Likewise.
The use of vqshrn_high_n_s32 was triggering an unneeded register move, because
sqshrn2 is destructive but was declared as inline assembly in arm_neon.h. This
patch implements sqshrn2 and uqshrn2 as actual intrinsics which do not trigger
the unnecessary move, along with new tests to cover them.
gcc/ChangeLog
2020-11-06 David Candler <david.candler@arm.com>
* config/aarch64/aarch64-builtins.c
(TYPES_SHIFT2IMM): Add define.
(TYPES_SHIFT2IMM_UUSS): Add define.
(TYPES_USHIFT2IMM): Add define.
* config/aarch64/aarch64-simd.md
(aarch64_<sur>q<r>shr<u>n2_n<mode>): Add new insn for upper saturating shift right.
* config/aarch64/aarch64-simd-builtins.def: Add intrinsics.
* config/aarch64/arm_neon.h:
(vqrshrn_high_n_s16): Expand using intrinsic rather than inline asm.
(vqrshrn_high_n_s32): Likewise.
(vqrshrn_high_n_s64): Likewise.
(vqrshrn_high_n_u16): Likewise.
(vqrshrn_high_n_u32): Likewise.
(vqrshrn_high_n_u64): Likewise.
(vqrshrun_high_n_s16): Likewise.
(vqrshrun_high_n_s32): Likewise.
(vqrshrun_high_n_s64): Likewise.
(vqshrn_high_n_s16): Likewise.
(vqshrn_high_n_s32): Likewise.
(vqshrn_high_n_s64): Likewise.
(vqshrn_high_n_u16): Likewise.
(vqshrn_high_n_u32): Likewise.
(vqshrn_high_n_u64): Likewise.
(vqshrun_high_n_s16): Likewise.
(vqshrun_high_n_s32): Likewise.
(vqshrun_high_n_s64): Likewise.
gcc/testsuite/ChangeLog
2020-11-06 David Candler <david.candler@arm.com>
* gcc.target/aarch64/advsimd-intrinsics/vqrshrn_high_n.c: New testcase.
* gcc.target/aarch64/advsimd-intrinsics/vqrshrun_high_n.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vqshrn_high_n.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vqshrun_high_n.c: Likewise.
* gcc.target/aarch64/narrow_high-intrinsics.c: Update expected assembler
for sqshrun2, sqrshrun2, sqshrn2, uqshrn2, sqrshrn2 and uqrshrn2.
Joseph pointed me at cb_get_source_date_epoch, which allows repeatable
builds and solves a FIXME I had on the modules branch. Unfortunately
it's used exclusively to generate __DATE__ and __TIME__ values, which
fallback to using a time(2) call. It'd be nicer if the preprocessor
made whatever time value it determined available to the rest of the
compiler. So this patch adds a new cpp_get_date function, which
abstracts the call to the get_source_date_epoch hook, or uses time
directly. The value is cached. Thus the timestamp I end up putting
on CMI files matches __DATE__ and __TIME__ expansions. That seems
worthwhile.
libcpp/
* include/cpplib.h (enum class CPP_time_kind): New.
(cpp_get_date): Declare.
* internal.h (struct cpp_reader): Replace source_date_epoch with
time_stamp and time_stamp_kind.
* init.c (cpp_create_reader): Initialize them.
* macro.c (_cpp_builtin_macro_text): Use cpp_get_date.
(cpp_get_date): Broken out from _cpp_builtin_macro_text and
genericized.
This patch adds support for permuting unpacked SVE vectors using:
- DUP
- EXT
- REV[BHW]
- REV
- TRN[12]
- UZP[12]
- ZIP[12]
This involves rewriting the REV[BHW] permute code so that the inputs
and outputs of the insn pattern have the same mode as the vectors
being permuted. This is different from the ACLE form, where the
reversal happens within individual elements rather than within
groups of multiple elements.
The patch does not add a conditional version of REV[BHW]. I'll come
back to that once we have partial-vector comparisons and selects.
The patch is really just enablement, adding an extra tool to the
toolbox. It doesn't bring any significant vectorisation opportunities
on its own. However, the patch does have one artificial example that
is now vectorised in a better way than before.
gcc/
* config/aarch64/aarch64-modes.def (VNx2BF, VNx4BF): Adjust nunits
and alignment based on the current VG.
* config/aarch64/iterators.md (SVE_ALL, SVE_24, SVE_2, SVE_4): Add
partial SVE BF modes.
(UNSPEC_REVBHW): New unspec.
(Vetype, Vesize, Vctype, VEL, Vel, vwcore, V_INT_CONTAINER)
(v_int_container, VPRED, vpred): Handle partial SVE BF modes.
(container_bits, Vcwtype): New mode attributes.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_revbhw_<SVE_ALL:mode><PRED_HSD:mode>): New pattern.
(@aarch64_sve_dup_lane<mode>): Extended from SVE_FULL to SVE_ALL.
(@aarch64_sve_rev<mode>, @aarch64_sve_<perm_insn><mode>): Likewise.
(@aarch64_sve_ext<mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Handle
E_VNx2BFmode and E_VNx4BFmode.
(aarch64_evpc_rev_local): Base the analysis on the container size
instead of the element size. Use the new aarch64_sve_revbhw
patterns for SVE.
(aarch64_evpc_dup): Handle partial SVE data modes. Use the
container size instead of the element size when applying the
SVE immediate limit. Fix a previously incorrect bounds check.
(aarch64_expand_vec_perm_const_1): Handle partial SVE data modes.
gcc/testsuite/
* gcc.target/aarch64/sve/dup_lane_2.c: New test.
* gcc.target/aarch64/sve/dup_lane_3.c: Likewise.
* gcc.target/aarch64/sve/ext_4.c: Likewise.
* gcc.target/aarch64/sve/rev_2.c: Likewise.
* gcc.target/aarch64/sve/revhw_1.c: Likewise.
* gcc.target/aarch64/sve/revhw_2.c: Likewise.
* gcc.target/aarch64/sve/slp_perm_8.c: Likewise.
* gcc.target/aarch64/sve/trn1_2.c: Likewise.
* gcc.target/aarch64/sve/trn2_2.c: Likewise.
* gcc.target/aarch64/sve/uzp1_2.c: Likewise.
* gcc.target/aarch64/sve/uzp2_2.c: Likewise.
* gcc.target/aarch64/sve/zip1_2.c: Likewise.
* gcc.target/aarch64/sve/zip2_2.c: Likewise.
gcc/ChangeLog:
* common.opt: Add new -fbit-tests option.
* doc/invoke.texi: Document the option.
* tree-switch-conversion.c (bit_test_cluster::find_bit_tests):
Use the option.
* tree-switch-conversion.h (is_enabled): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/switch-4.c: New test.
This separates constant and non-constant value-ids to allow for
a more efficient constant_value_id_p and for more efficient bit-packing
inside the bitmap sets which never contain any constant values.
There's further optimization opportunities but at this stage
I'll do small refactorings.
2020-11-06 Richard Biener <rguenther@suse.de>
* tree-ssa-sccvn.h (get_max_constant_value_id): Declare.
(get_next_constant_value_id): Likewise.
(value_id_constant_p): Inline and simplify.
* tree-ssa-sccvn.c (constant_value_ids): Remove.
(next_constant_value_id): Add.
(get_or_alloc_constant_value_id): Adjust.
(value_id_constant_p): Remove definition.
(get_max_constant_value_id): Define.
(get_next_value_id): Add assert for overflow.
(get_next_constant_value_id): Define.
(run_rpo_vn): Adjust.
(free_rpo_vn): Likewise.
(do_rpo_vn): Initialize next_constant_value_id.
* tree-ssa-pre.c (constant_value_expressions): New.
(add_to_value): Split into constant/non-constant value
handling. Avoid exact re-allocation.
(vn_valnum_from_value_id): Adjust.
(phi_translate_1): Remove spurious exact re-allocation.
(bitmap_find_leader): Adjust. Make sure we return
a CONSTANT value for a constant value id.
(do_pre_regular_insertion): Use 2 auto-elements for avail.
(do_pre_partial_partial_insertion): Likewise.
(init_pre): Allocate constant_value_expressions.
(fini_pre): Release constant_value_expressions.