Now that nonzero bits are first-class citizens in the range, we can
keep better track of them in range-ops, especially the bitwise AND
operator.
This patch sets the nonzero mask for the trivial case. In doing so,
I've removed some old dead code that was an attempt to keep better
track of masks.
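As a rough illustration of the trivial case (a sketch only, not the code
in range-op.cc): a bit that is known to be zero in either operand is zero
in the result of a bitwise AND, so the result's mask of possibly-nonzero
bits is the AND of the operand masks. The names toy_range,
fold_bitwise_and_mask and may_be_nonzero are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a range that tracks which bits may be nonzero.
// A mask of all ones (-1) means "nothing is known".
struct toy_range
{
  std::uint64_t nonzero_bits = ~std::uint64_t{0};
};

// Trivial case for x & y: a bit can only be set in the result if it may be
// set in both operands, so the result mask is the AND of the operand masks.
inline toy_range
fold_bitwise_and_mask (const toy_range &op1, const toy_range &op2)
{
  toy_range result;
  result.nonzero_bits = op1.nonzero_bits & op2.nonzero_bits;
  return result;
}

// Query whether a given bit position may be set in values of the range.
inline bool
may_be_nonzero (const toy_range &r, unsigned bit)
{
  assert (bit < 64);
  return ((r.nonzero_bits >> bit) & 1) != 0;
}
```

An operand with an all-ones mask contributes no information, so the
result simply inherits the other operand's mask.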
I'm sure there are tons of optimizations throughout range-ops that
could be implemented, especially the op1_range methods, but those
always make my head hurt. I'll leave them to the smarter hackers
out there.
I've removed the restriction that nonzero bits can't be queried from
legacy. This was causing special casing all over the place, and
it's not like we can generate incorrect code. We just silently
drop nonzero bits to -1 in some of the legacy code. The end result
is that VRP1, and other users of legacy, may not benefit from these
improvements.
Tested and benchmarked on x86-64 Linux.
gcc/ChangeLog:
* range-op.cc (unsigned_singleton_p): Remove.
(operator_bitwise_and::remove_impossible_ranges): Remove.
(operator_bitwise_and::fold_range): Set nonzero bits.
* value-range.cc (irange::get_nonzero_bits): Remove
legacy_mode_p assert.
(irange::dump_bitmasks): Remove legacy_mode_p check.
The PR is about the aarch64 port using an ACLE built-in function
to vectorise a scalar function call, even though the ECF_* flags for
the ACLE function didn't match the ECF_* flags for the scalar call.
To some extent that kind of difference is inevitable, since the
ACLE intrinsics are supposed to follow the behaviour of the
underlying instruction as closely as possible. Also, using
target-specific builtins has the drawback of limiting further
gimple optimisation, since the gimple optimisers won't know what
the function does.
We handle several other maths functions, including round, floor
and ceil, by defining directly-mapped internal functions that
are linked to the associated built-in functions. This has two
main advantages:
- it means that, internally, we are not restricted to the set of
scalar types that happen to have associated C/C++ functions
- the functions (and thus the underlying optabs) extend naturally
to vectors
This patch takes the same approach for the remaining functions
handled by aarch64_builtin_vectorized_function.
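For illustration, the kind of scalar loop affected looks roughly like the
sketch below. The function name round_all is hypothetical and is not taken
from the testsuite; whether such a loop is actually vectorised depends on
the target and the options in use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical example loop: each iteration converts a double to long using
// round-to-nearest with ties away from zero (the semantics of lround).
inline void
round_all (long *dst, const double *src, std::size_t n)
{
  assert (dst != nullptr && src != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = std::lround (src[i]);
}
```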
gcc/
PR target/106253
* predict.h (insn_optimization_type): Declare.
* predict.cc (insn_optimization_type): New function.
* internal-fn.def (IFN_ICEIL, IFN_IFLOOR, IFN_IRINT, IFN_IROUND)
(IFN_LCEIL, IFN_LFLOOR, IFN_LRINT, IFN_LROUND, IFN_LLCEIL)
(IFN_LLFLOOR, IFN_LLRINT, IFN_LLROUND): New internal functions.
* internal-fn.cc (unary_convert_direct): New macro.
(expand_convert_optab_fn): New function.
(expand_unary_convert_optab_fn): New macro.
(direct_unary_convert_optab_supported_p): Likewise.
* optabs.cc (expand_sfix_optab): Pass insn_optimization_type to
convert_optab_handler.
* config/aarch64/aarch64-protos.h
(aarch64_builtin_vectorized_function): Delete.
* config/aarch64/aarch64-builtins.cc
(aarch64_builtin_vectorized_function): Delete.
* config/aarch64/aarch64.cc
(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Delete.
* config/i386/i386.cc (ix86_optab_supported_p): Handle lround_optab.
* config/i386/i386.md (lround<X87MODEF:mode><SWI248x:mode>2): Remove
optimize_insn_for_size_p test.
gcc/testsuite/
PR target/106253
* gcc.target/aarch64/vect_unary_1.c: Add tests for iroundf,
llround, iceilf, llceil, ifloorf, llfloor, irintf and llrint.
* gfortran.dg/vect/pr106253.f: New test.
The following expands the comment in vect_do_peeling as to why we
do not need create_lcssa_for_virtual_phi and removes that function.
That's the last bit I have queued for the vectorizer virtual LCSSA
cleanup.
* tree-vect-loop-manip.cc (create_lcssa_for_virtual_phi):
Remove.
(vect_do_peeling): Do not call it, adjust comment.
Has_Enough_Free_Memory did not correctly report a completely full
chunk in the case of a 0-sized allocation.
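The ordering of the two checks can be modelled as follows. This is a
simplified sketch under assumed names (toy_chunk, first_free), not the
code in s-secsta.adb: the chunk is taken to hold `size` bytes with a
1-based index of the next free byte, so a full chunk has
first_free = size + 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a secondary-stack chunk: `size` bytes of storage and
// a 1-based index `first_free` of the next free byte.
struct toy_chunk
{
  std::size_t size;
  std::size_t first_free;
};

// Report whether `request` bytes fit in the chunk.  The full-chunk test comes
// first, so a full chunk is rejected even when request == 0.
inline bool
has_enough_free_memory (const toy_chunk &chunk, std::size_t request)
{
  assert (chunk.first_free >= 1 && chunk.first_free <= chunk.size + 1);
  if (chunk.first_free > chunk.size)
    return false;  // completely full
  const std::size_t available = chunk.size - (chunk.first_free - 1);
  return available >= request;
}
```

Without the early return, the available size of a full chunk would be 0,
and a request of 0 bytes would compare as satisfiable.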
gcc/ada/
* libgnat/s-secsta.adb (Has_Enough_Free_Memory): Check for full
chunk before computing the available size.
If the flag Opt.Expand_Nonbinary_Modular_Ops is set (which occurs if
-gnateg is specified) then we implement predefined operations for a
modular type whose modulus is not a power of two by converting the
operands to some other type (either a signed integer type or a modular
type with a power-of-two modulus), doing the operation in that
representation, and converting back. If the bounds of the chosen type
are too narrow, then problems with intermediate overflow can result. But
there are performance advantages to choosing narrower bounds (and to
preferring an unsigned choice over a signed choice of the same size) when
multiple safe choices are available.
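The intermediate-overflow problem can be seen in a small sketch. The
helper names mul_mod and mul_mod_narrow are hypothetical and only model
the idea of doing the operation in a wider representation; they are not
what the expander generates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: multiply two values of a modular type whose modulus
// need not be a power of two.  Both operands are below 2**32, so the exact
// product is below 2**64 and fits in an unsigned 64-bit intermediate.
inline std::uint32_t
mul_mod (std::uint32_t a, std::uint32_t b, std::uint32_t modulus)
{
  assert (modulus != 0 && a < modulus && b < modulus);
  const std::uint64_t wide = std::uint64_t{a} * b;
  return static_cast<std::uint32_t> (wide % modulus);
}

// The same operation with the product truncated to 32 bits first.  This
// models choosing a type whose bounds are too narrow: the intermediate
// wraps and the final result can be wrong.
inline std::uint32_t
mul_mod_narrow (std::uint32_t a, std::uint32_t b, std::uint32_t modulus)
{
  assert (modulus != 0 && a < modulus && b < modulus);
  const std::uint32_t wrapped = static_cast<std::uint32_t> (std::uint64_t{a} * b);
  return wrapped % modulus;
}
```

For small moduli the two agree, which is why a narrower type is safe (and
cheaper) whenever the largest possible intermediate still fits.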
gcc/ada/
* exp_ch4.adb (Expand_Nonbinary_Modular_Op.Expand_Modular_Op):
Reimplement choice of which predefined type to use for the
implementation of a predefined operation of a modular type with
a non-power-of-two modulus.
This patch corrects an error in the compiler whereby a spurious
redundant use_type_clause warning gets issued when the clause appears in
the context_clause of a package preceding a with_clause for a package
with an identical use_clause in its specification.
gcc/ada/
* einfo.ads: Modify documentation for In_Use flag to include
scope stack manipulation.
* sem_ch8.adb (Use_One_Type): Add condition to return when
attempting to detect redundant use_type_clauses in child units
in certain cases.
This makes it possible to report violations of the No_Dependence restriction
during code generation, in other words outside of the Ada front-end proper.
These violations are supposed to be only for child units of System, so the
implementation is restricted to these cases.
gcc/ada/
* restrict.ads (type ND_Entry): Add System_Child component.
(Check_Restriction_No_Dependence_On_System): Declare.
* restrict.adb (Global_Restriction_No_Tasking): Move around.
(Violation_Of_No_Dependence): New procedure.
(Check_Restriction_No_Dependence): Call Violation_Of_No_Dependence
to report a violation.
(Check_Restriction_No_Dependence_On_System): New procedure.
(Set_Restriction_No_Dependence): Set System_Child component if the
unit is a child of System.
* snames.ads-tmpl (Name_Arith_64): New package name.
(Name_Arith_128): Likewise.
(Name_Memory): Likewise.
(Name_Stack_Checking): Likewise.
* fe.h (Check_Restriction_No_Dependence_On_System): Declare.
This patch implements a syntactic language extension that allows
declarative items to appear in a sequence of statements. For example:
    for X in S'Range loop
       Item : Character renames S (X);
       Item := Transform (Item);
    end loop;
Previously, declare/begin/end was required, which is just noise.
gcc/ada/
* par.adb (P_Declarative_Items): New function to parse a
sequence of declarative items.
(P_Sequence_Of_Statements): Add Handled flag, to indicate
whether to wrap the result in a block statement.
* par-ch3.adb (P_Declarative_Item): Rename P_Declarative_Items
to be P_Declarative_Item, because it really only parses a single
declarative item, and to avoid conflict with the new
P_Declarative_Items. Add In_Statements. We keep the old
error-recovery mechanisms in place when In_Statements is False.
When True, we don't want to complain about statements, because
we are parsing a sequence of statements.
(P_Identifier_Declarations): If In_Statements, and we see what
looks like a statement, we no longer give an error. We return to
P_Sequence_Of_Statements with Done = True, so it can parse the
statement.
* par-ch5.adb (P_Sequence_Of_Statements): Call
P_Declarative_Items to parse declarative items that appear in
the statement list. Remove error handling code that complained
about such items. Check some errors conservatively. Wrap the
result in a block statement when necessary.
* par-ch11.adb (P_Handled_Sequence_Of_Statements): Pass
Handled => True to P_Sequence_Of_Statements.
* types.ads (No, Present): New functions for querying
Source_Ptrs (equal, not equal No_Location).
When looking for a misspelling of a restriction identifier we should
ignore the Not_A_Restriction_Id literal, because it doesn't represent
any restriction.
gcc/ada/
* sem_prag.adb (Process_Restrictions_Or_Restriction_Warnings):
Fix range of iteration.
When pragma Restriction is used with an unknown restriction identifier,
it is better not to process the restriction expression, as it will
likely produce a confusing error message.
In particular, an odd message appeared when there was a typo in the
restriction identifier whose expression requires special processing
(e.g. No_Dependence_On instead of No_Dependence).
gcc/ada/
* sem_prag.adb (Process_Restrictions_Or_Restriction_Warnings):
Do not process expression of unknown restrictions.
Also move -P switch description to the top of the switches list.
gcc/ada/
* makeusg.adb,
doc/gnat_ugn/building_executable_programs_with_gnat.rst: Move -P
to the top of switches list and make it clear that gnatmake
passes the ball to gprbuild if -P is set.
* gnat_ugn.texi: Regenerate.
Follow-on to previous change, which missed the vxworks version of this
package.
gcc/ada/
* libgnat/g-socthi__vxworks.adb (C_Connect): Suppress new warning.
In the special mode for GNATprove, ignore switches controlling frontend
warnings, as is already done for the control of style check warnings.
Also remove special handling of warning mode in Errout to make up for
the previous division of control between -gnatw (GNAT) and --warnings
(GNATprove).
gcc/ada/
* errout.adb (Record_Compilation_Errors): Remove global
variable.
(Compilation_Errors): Simplify.
(Initialize): Inline Reset_Warnings.
(Reset_Warnings): Remove.
* errout.ads (Reset_Warnings): Remove.
(Compilation_Errors): Update comment.
* gnat1drv.adb (Adjust_Global_Switches): Ignore all frontend
warnings in GNATprove mode, except regarding elaboration and
suspicious contracts.
This plugs a small loophole in the Needs_Secondary_Stack predicate for
some protected types and record types containing protected components.
gcc/ada/
* sem_util.adb (Caller_Known_Size_Record): Make entry assertion
more robust and add guard for null argument. For protected
types, invoke Caller_Known_Size_Record on
Corresponding_Record_Type.
(Needs_Secondary_Stack): Likewise.
Only smp runtimes are built for vxworks7*, even though the -smp suffix
is removed during install. This change removes unused system packages
for rtp runtimes.
gcc/ada/
* libgnat/system-vxworks7-ppc-rtp.ads: Remove.
* libgnat/system-vxworks7-x86-rtp.ads: Likewise.
This patch removes a spurious warning, saying that an internal entity of
a generic formal package is unreferenced. The immediate cause of this
warning is that the internal entity is explicitly flagged as coming from
source.
The explicit flagging was added decades ago to fix a missing
cross-reference in the ALI file. Apparently these days the
cross-references work fine without this flag.
gcc/ada/
* sem_ch12.adb (Analyze_Package_Instantiation): Remove dubious
call to Set_Comes_From_Source.
This patch refines the heuristics for when we warn about unreachable
code, to avoid common false alarms.
gcc/ada/
* sem_ch5.adb (Check_Unreachable_Code): Refine heuristics.
* sem_util.ads, sem_util.adb (Is_Static_Constant_Name): Remove
this; instead we have a new function Is_Simple_Case in
Sem_Ch5.Check_Unreachable_Code.
This patch removes a warning in examples like this:
    if cond then
       return; -- or other jump
    end if;
    X := ...; -- where the value is out of range
where cond is known at compile time. It could, for example, be a generic
formal parameter that is known to be True in some instances.
As a side effect, this patch adds new warnings about unreachable code.
gcc/ada/
* gnatls.adb (Output_License_Information): Remove pragma
No_Return; call sites deal with Exit_Program.
* libgnat/g-socthi.adb (C_Connect): Suppress warning about
unreachable code.
* sem_ch5.adb (Check_Unreachable_Code): Special-case if
statements with static conditions. If we remove unreachable
code (including the return statement) from a function, add
"raise Program_Error", so we won't warn about missing returns.
Remove Original_Node in test for N_Raise_Statement; it's not
needed. Remove test for CodePeer_Mode; if Operating_Mode =
Generate_Code, then CodePeer_Mode can't be True. Misc cleanup.
Do not reuse Nxt variable for unrelated purpose (the usage in
the Kill_Dead_Code loop is entirely local to the loop).
* sem_ch6.adb: Add check for Is_Transfer. Misc cleanup.
* sem_prag.adb: Minor.
* sem_res.adb: Minor.
* sem_util.adb: Minor cleanup.
(Is_Trivial_Boolean): Move to nonnested place, so it can be
called from elsewhere.
(Is_Static_Constant_Boolean): New function.
* sem_util.ads (Is_Trivial_Boolean): Export.
(Is_Static_Constant_Boolean): New function.
In the case of an expression function that is a primitive function of a
tagged type, freezing the tagged type needs to freeze the function (and
its return expression). A bug in this area could result in incorrect
behavior both at compile time and at run time. At compile time, freezing
rule violations could go undetected so that an illegal program could be
incorrectly accepted. At run time, a dispatching call to the primitive
function could end up dispatching through a not-yet-initialized slot in
the dispatch table, typically (although not always) resulting in a
segmentation fault.
gcc/ada/
* freeze.adb (Check_Expression_Function.Find_Constant): Add a
check that a type that is referenced as the prefix of an
attribute is fully declared.
(Freeze_And_Append): Do not freeze the profile when freezing an
expression function.
(Freeze_Entity): When a tagged type is frozen, also freeze any
primitive operations of the type that are expression functions.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Do not prevent
freezing associated with an expression function body if the
function is a dispatching op.
Fix an inconsistency, where GNAT was warning about references to unset
objects inside generic packages with no bodies but not inside ordinary
packages with no bodies.
gcc/ada/
* sem_ch7.adb (Analyze_Package_Declaration): Check references to
unset objects.
gcc/testsuite/
* gnat.dg/specs/discr5.ads: Expect new warnings.
* gnat.dg/specs/empty_variants.ads: Likewise.
* gnat.dg/specs/pack13.ads: Likewise.
A small fix for the aspect Yield defined in AI12-0279 for Ada 2022, to
accept the aspect when given for a subprogram body which acts as its own
spec.
For example:
    procedure Switch with Yield => True is
    begin
       ...
    end Switch;
gcc/ada/
* sem_ch13.adb (Analyze_Aspect_Yield): Look at the entity kind,
not at the declaration kind.
The concatenation routines may read too much data on the source side when
the destination buffer is larger than the final result. This change makes
sure that this does not happen any more and also removes obsolete stuff.
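The principle behind the fix can be sketched as follows. The function
below is a hypothetical two-operand analogue, not the run-time code: each
copy is sized by the length of its source operand rather than by the
space left in the destination.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical two-operand concatenation into a caller-provided buffer that
// may be larger than the result.  Each copy is sized by the length of its
// source operand, never by the space remaining in the destination, so no
// bytes are read past the end of either input.  Returns the result length.
inline std::size_t
str_concat_2 (char *dest, std::size_t dest_size,
              std::string_view s1, std::string_view s2)
{
  assert (dest != nullptr && dest_size >= s1.size () + s2.size ());
  char *next = std::copy (s1.begin (), s1.end (), dest);
  std::copy (s2.begin (), s2.end (), next);
  return s1.size () + s2.size ();
}
```

Bytes of the destination beyond the returned length are left untouched.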
gcc/ada/
* rtsfind.ads (RE_Id): Remove RE_Str_Concat_Bounds_N values.
(RE_Unit_Table): Remove RE_Str_Concat_Bounds_N entries.
* libgnat/s-conca2.ads (Str_Concat_2): Adjust head comment.
(Str_Concat_Bounds_2): Delete.
* libgnat/s-conca2.adb (Str_Concat_2): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_2): Delete.
* libgnat/s-conca3.ads (Str_Concat_3): Adjust head comment.
(Str_Concat_Bounds_3): Delete.
* libgnat/s-conca3.adb (Str_Concat_3): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_3): Delete.
* libgnat/s-conca4.ads (Str_Concat_4): Adjust head comment.
(Str_Concat_Bounds_4): Delete.
* libgnat/s-conca4.adb (Str_Concat_4): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_4): Delete.
* libgnat/s-conca5.ads (Str_Concat_5): Adjust head comment.
(Str_Concat_Bounds_5): Delete.
* libgnat/s-conca5.adb (Str_Concat_5): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_5): Delete.
* libgnat/s-conca6.ads (Str_Concat_6): Adjust head comment.
(Str_Concat_Bounds_6): Delete.
* libgnat/s-conca6.adb (Str_Concat_6): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_6): Delete.
* libgnat/s-conca7.ads (Str_Concat_7): Adjust head comment.
(Str_Concat_Bounds_7): Delete.
* libgnat/s-conca7.adb (Str_Concat_7): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_7): Delete.
* libgnat/s-conca8.ads (Str_Concat_8): Adjust head comment.
(Str_Concat_Bounds_8): Delete.
* libgnat/s-conca8.adb (Str_Concat_8): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_8): Delete.
* libgnat/s-conca9.ads (Str_Concat_9): Adjust head comment.
(Str_Concat_Bounds_9): Delete.
* libgnat/s-conca9.adb (Str_Concat_9): Use the length of the last
input to size the last assignment.
(Str_Concat_Bounds_9): Delete.
This patch renames Next and Previous in a-convec.ads and other
containers to be _Next and _Previous, to avoid namespace pollution. The
compiler now uses the leading-underscore names to look them up.
The scanner is changed to allow this.
gcc/ada/
* exp_ch5.adb (Expand_Iterator_Loop_Over_Array): Use _Next and
_Previous in the optimized expansion of "for ... of". No longer
need to check parameter profiles for these, because the
leading-underscore names are unique.
* libgnat/a-convec.ads (_Next, _Previous): Renamings of Next and
Previous, to avoid namespace pollution.
* libgnat/a-cbdlli.ads, libgnat/a-cbhama.ads,
libgnat/a-cbhase.ads, libgnat/a-cbmutr.ads,
libgnat/a-cborma.ads, libgnat/a-cborse.ads,
libgnat/a-cdlili.ads, libgnat/a-cidlli.ads,
libgnat/a-cihama.ads, libgnat/a-cihase.ads,
libgnat/a-cimutr.ads, libgnat/a-ciorma.ads,
libgnat/a-ciorse.ads, libgnat/a-cobove.ads,
libgnat/a-cohama.ads, libgnat/a-cohase.ads,
libgnat/a-coinve.ads, libgnat/a-comutr.ads,
libgnat/a-coorma.ads, libgnat/a-coorse.ads: Likewise. Also,
remove duplicated comments -- refer to one comment about _Next,
_Previous, Pseudo_Reference in libgnat/a-convec.ads. DRY.
* scng.adb (Scan): Allow leading underscores in identifiers in
the run-time library.
* snames.ads-tmpl (Name_uNext, Name_uPrevious): New names with
leading underscores.
GNAT was already warning about unreachable code after raise/goto/exit
statements, but not after calls to procedures with No_Return. Now this
warning is extended.
Also, previously the warning was suppressed for unreachable RETURN after
RAISE statements. Now this suppression is narrowed to functions, because
only in a function might such a RETURN statement actually be needed
(where it is the only RETURN statement of the function).
gcc/ada/
* sem_ch5.adb (Check_Unreachable_Code): Extend suppression to
calls with No_Return aspect, but narrow it to functions.
* sem_res.adb (Resolve_Call): Warn about unreachable code after
calls with No_Return.
This patch removes some obsolete code in the scanner and related files,
and corrects some comments. Tok_Special is used only by the
preprocessor, and uses only the two characters '#' and '$'.
It might be simpler to have a single flag indicating we're scanning for
preprocessing, instead of the Special_Characters array and the
End_Of_Line_Is_Token flag, but that's for another day.
gcc/ada/
* scans.ads: Fix obsolete comments about Tok_Special, and give
Special_Character a predicate assuring it is one of the two
characters used in preprocessing.
* scng.ads: Clean up comments.
* scng.adb: Clean up handling of Tok_Special. Remove comment
about '@' (target_name), which doesn't seem very helpful.
Set_Special_Character will now blow up if given anything other
than '#' and '$', because of the predicate on Special_Character;
it's not clear why it used to say "when others => null;".
Remove Comment_Is_Token, which is not used.
* scn.ads: Remove commented-out use clause. Remove redundant
comment.
* ali-util.adb: Use "is null" for do-nothing procedures.
* gprep.adb (Post_Scan): Use "is null".
This patch fixes a bug in which if the environment task has a specific
termination handler, and that handler raises an exception, the handler
is called recursively, causing infinite recursion. The RM requires such
exceptions to be ignored.
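The intended behaviour can be modelled in a few lines. This is a sketch
with a hypothetical wrapper name, not the s-solita.adb code: the handler
runs once, and anything it propagates is discarded rather than being
allowed to trigger the handler again.

```cpp
#include <cassert>
#include <functional>

// Hypothetical wrapper: run a user-supplied termination handler exactly once
// and discard anything it throws, instead of letting the exception reach code
// that would invoke the handler again.  Returns true if the handler completed
// normally.
inline bool
run_termination_handler (const std::function<void ()> &handler)
{
  assert (static_cast<bool> (handler));
  try
    {
      handler ();
      return true;
    }
  catch (...)
    {
      return false;  // exception propagated by the handler is ignored
    }
}
```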
gcc/ada/
* libgnarl/s-solita.adb (Task_Termination_Handler_T): Ignore all
exceptions propagated by Specific_Handler.
* libgnarl/s-tassta.adb, libgnarl/s-taskin.ads: Minor.
While doing Preanalysis (as is the case during ghost code handling),
some range and/or overflow checks can be saved (see Saved_Checks in
checks.adb) and later on omitted as they would be redundant (see
Find_Check in checks.adb). In the case of ghost code, the node being
Preanalyzed is a temporary copy that is discarded, so its corresponding
check is not expanded later. The node that does get expanded later then
has no checks expanded, because it is wrongly assumed that this has
already been done.
As is already the case in Preanalyze_And_Resolve, this change suppresses
all checks during Preanalyze except for GNATprove mode.
gcc/ada/
* sem.adb (Preanalyze): Suppress checks when not in GNATprove
mode.
* sem_res.adb (Preanalyze_And_Resolve): Add cross reference in
comment to above procedure.
* sinfo.ads: Typo fix in comment.
Before this patch, the only formal doubly linked lists were bounded and
definite. This means that it is necessary to provide their maximum
length or capacity at instantiation and that they can only be used with
definite element types.
The formal lists added by this patch are unbounded and indefinite.
Their length grows dynamically until Count_Type'Last. This makes them
easier to use but requires the use of dynamic allocation and controlled
types.
gcc/ada/
* libgnat/a-cfidll.adb, libgnat/a-cfidll.ads: Implementation
files of the formal unbounded indefinite list.
* Makefile.rtl, impunit.adb: Take into account the addition of the
new files.
It is safe to call Is_Access_Variable without calling
Is_Access_Object_Type first. Compiler cleanup only; semantics are
unaffected.
gcc/ada/
* sem_util.adb (Is_Variable): Remove excessive guard.
aarch64_builtin_vectorized_function handles some built-in functions
that already have equivalent internal functions. This seems to be
redundant now, since the target builtins that it chooses are mapped
to the same optab patterns as the internal functions.
gcc/
* config/aarch64/aarch64-builtins.cc
(aarch64_builtin_vectorized_function): Remove handling of
floor, ceil, trunc, round, nearbyint, sqrt, clz and ctz.
gcc/testsuite/
* gcc.target/aarch64/vect_unary_1.c: New test.
Running the testsuite on a toolchain built with --enable-default-pie
had some unexpected failures. Adjust the tests to tolerate the effects
of this configuration option on x86_64-linux-gnu and i686-linux-gnu.
The cet-sjlj* tests get offsets before the base symbol name with PIC
or PIE. A single pattern covering both alternatives somehow triggered
two matches rather than the single expected match, thus my narrowing
the '.*' to not skip line breaks, but that was not enough. Still
puzzled, I separated the patterns into nonpic and !nonpic, and we get
the expected matchcounts this way.
Tests for -mfentry require an mfentry effective target, which excludes
32-bit x86 with PIC or PIE enabled; that's why the patterns that
accept the PIC sym@RELOC annotations only cover x86_64. mvc7 is
getting regexps extended to cover PIC reloc annotations and all of the
named variants, and tightened to avoid unexpected '.' matches.
The pr24414 test stores in an unadorned named variable in an old-style
asm statement, to check that such asm statements get an implicit
memory clobber. Rewriting the asm into a GCC extended asm with the
variable as an output would remove the regression it checks against.
Problem is, the literal reference to the variable is not PIC, so it's
rejected by the elf64 linker with an error, and flagged with a warning
by the elf32 one. We could presumably make the variable references
PIC-friendly with #ifdefs, but I doubt that's worth the trouble. I'm
just arranging for the test to be skipped if PIC or PIE are enabled by
default.
for gcc/testsuite/ChangeLog
* gcc.target/i386/cet-sjlj-6a.c: Cope with --enable-default-pie.
* gcc.target/i386/cet-sjlj-6b.c: Likewise.
* gcc.target/i386/fentryname3.c: Likewise.
* gcc.target/i386/mvc7.c: Likewise.
* gcc.target/i386/pr24414.c: Likewise.
* gcc.target/i386/pr93492-3.c: Likewise.
* gcc.target/i386/pr93492-5.c: Likewise.
* gcc.target/i386/pr98482-1.c: Likewise.
Unlike gomp_{error,warning,fatal}, gomp_debug does not add a trailing
'\n'; the only affected output was 'requires'-related.
libgomp/ChangeLog:
* target.c (gomp_target_init): Add trailing '\n' to gomp_debug.
For any typedef-name or template parameter, T, add_const_t<T> is
equivalent to T const, so we can avoid instantiating the std::add_const
class template and just say T const (or const T).
This isn't true for a non-typedef like int&, where int& const would be
ill-formed, but we shouldn't be using add_const_t<int&> anyway, because
we know what that type is.
The only place we need to continue using std::add_const is in the
std::bind implementation where it's used as a template template
parameter to be applied as a metafunction elsewhere.
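The equivalence can be checked directly. The variable template name below
is hypothetical and exists only for this illustration:

```cpp
#include <type_traits>

// For a type named through a template parameter, std::add_const_t<T> and
// `const T` denote the same type, so the alias template is not needed.
template <typename T>
constexpr bool add_const_matches_qualifier
  = std::is_same_v<std::add_const_t<T>, const T>;

static_assert (add_const_matches_qualifier<int>);
static_assert (add_const_matches_qualifier<int *>);
static_assert (add_const_matches_qualifier<volatile int>);

// A const qualifier applied to a reference type through a template parameter
// is ignored, which matches what std::add_const does for references.
static_assert (add_const_matches_qualifier<int &>);
static_assert (std::is_same_v<std::add_const_t<int &>, int &>);
```

The reference case only works here because T is a template parameter;
writing int& const directly would still be ill-formed.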
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (__iter_to_alloc_t): Replace
add_const_t with const-qualifier.
* include/bits/utility.h (tuple_element<N, cv T>): Likewise for
all cv-qualifiers.
* include/std/type_traits (add_const, add_volatile): Replace
typedef-declaration with using-declaration.
(add_cv): Replace add_const and add_volatile with cv-qualifiers.
* include/std/variant (variant_alternative<N, cv T>): Replace
add_const_t, add_volatile_t and add_cv_t etc. with cv-qualifiers.
Fix-up for recent commit 06b2a2abe2
"Enhance '_Pragma' diagnostics verification in OMP C/C++ test cases".
Supposedly it's the same issue as in
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101551#c2>, where I'd
noted that:
| [...] with an offloading-enabled build of GCC we're losing
| "note: in expansion of macro '[...]'" diagnostics.
| (Effectively '-ftrack-macro-expansion=0'?)
PR middle-end/101551
libgomp/
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: XFAIL
'offloading_enabled' diagnostics issue.
range_from_dom makes a recursive call to resolve the immediate dominator
when there are multiple incoming edges to a block. This is not necessary
when the dominator already has an on-entry cache value.
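A much simplified model of the idea, with hypothetical names and none of
the edge handling the real ranger cache does: each block has an immediate
dominator and an optional cached on-entry value, and recursion into the
dominator happens only when its cache entry is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: idom[b] is the immediate dominator of block b, and
// on_entry[b] is an optional cached on-entry value.  The entry block is
// assumed to always have a cached value.
struct toy_cache
{
  std::vector<int> idom;
  std::vector<std::optional<int>> on_entry;
  int recursive_calls = 0;

  // Resolve the value for `bb` from its dominator.  If the dominator already
  // has a cached value it is used directly; the recursive call is made only
  // when it does not.
  int resolve (int bb)
  {
    assert (bb >= 0 && static_cast<std::size_t> (bb) < idom.size ());
    if (on_entry[bb])
      return *on_entry[bb];
    const int dom = idom[bb];
    assert (dom != bb);  // only the entry block is its own dominator
    if (!on_entry[dom])
      {
        ++recursive_calls;
        resolve (dom);
      }
    on_entry[bb] = *on_entry[dom];
    return *on_entry[bb];
  }
};
```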
PR tree-optimization/106234
* gimple-range-cache.cc (ranger_cache::range_from_dom): Check dominator
cache value before recursively resolving it.
This patch upgrades x86_64's scalar-to-vector (STV) pass to more
aggressively transform 128-bit scalar TImode operations into vector
V1TImode operations performed on SSE registers. TImode functionality
already exists in STV, but only for move operations. This change
brings support for logical operations (AND, IOR, XOR, NOT and ANDN)
and comparisons.
The effect of these changes is conveniently demonstrated by the new
sse4_1-stv-5.c test case:
    __int128 a[16];
    __int128 b[16];
    __int128 c[16];
    void foo()
    {
      for (unsigned int i=0; i<16; i++)
        a[i] = b[i] & ~c[i];
    }
which when currently compiled on mainline with -O2 -msse4 produces:
    foo:    xorl    %eax, %eax
    .L2:    movq    c(%rax), %rsi
            movq    c+8(%rax), %rdi
            addq    $16, %rax
            notq    %rsi
            notq    %rdi
            andq    b-16(%rax), %rsi
            andq    b-8(%rax), %rdi
            movq    %rsi, a-16(%rax)
            movq    %rdi, a-8(%rax)
            cmpq    $256, %rax
            jne     .L2
            ret
but with this patch now produces:
    foo:    xorl    %eax, %eax
    .L2:    movdqa  c(%rax), %xmm0
            pandn   b(%rax), %xmm0
            addq    $16, %rax
            movaps  %xmm0, a-16(%rax)
            cmpq    $256, %rax
            jne     .L2
            ret
Technically, the STV pass is implemented by three C++ classes, a common
abstract base class "scalar_chain" that contains common functionality,
and two derived classes: general_scalar_chain (which handles SI and
DI modes) and timode_scalar_chain (which handles TI modes). As
mentioned previously, because only TI mode moves were handled the
two worker classes behaved significantly differently. These changes
bring the functionality of these two classes closer together, which
is reflected by refactoring more shared code from general_scalar_chain
to the parent scalar_chain and reusing it from timode_scalar_chain.
There still remain significant differences (and simplifications) so
the existing division of classes (as specializations) continues to
make sense.
2022-07-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.h (scalar_chain): Add fields
insns_conv, n_sse_to_integer and n_integer_to_sse to this
parent class, moved from general_scalar_chain.
(scalar_chain::convert_compare): Protected method moved
from general_scalar_chain.
(mark_dual_mode_def): Make protected, not private virtual.
(scalar_chain::convert_op): New private virtual method.
(general_scalar_chain::general_scalar_chain): Simplify constructor.
(general_scalar_chain::~general_scalar_chain): Delete destructor.
(general_scalar_chain): Move insns_conv, n_sse_to_integer and
n_integer_to_sse fields to parent class, scalar_chain.
(general_scalar_chain::mark_dual_mode_def): Delete prototype.
(general_scalar_chain::convert_compare): Delete prototype.
(timode_scalar_chain::compute_convert_gain): Remove simplistic
implementation, convert to a method prototype.
(timode_scalar_chain::mark_dual_mode_def): Delete prototype.
(timode_scalar_chain::convert_op): Prototype new virtual method.
* config/i386/i386-features.cc (scalar_chain::scalar_chain):
Allocate insns_conv and initialize n_sse_to_integer and
n_integer_to_sse fields in constructor.
(scalar_chain::~scalar_chain): Free insns_conv in destructor.
(general_scalar_chain::general_scalar_chain): Delete
constructor, now defined in the class declaration.
(general_scalar_chain::~general_scalar_chain): Delete destructor.
(scalar_chain::mark_dual_mode_def): Renamed from
general_scalar_chain::mark_dual_mode_def.
(timode_scalar_chain::mark_dual_mode_def): Delete.
(scalar_chain::convert_compare): Renamed from
general_scalar_chain::convert_compare.
(timode_scalar_chain::compute_convert_gain): New method to
determine the gain from converting a TImode chain to V1TImode.
(timode_scalar_chain::convert_op): New method to convert an
operand from TImode to V1TImode.
(timode_scalar_chain::convert_insn) <case REG>: Only PUT_MODE
on REG_EQUAL notes that were originally TImode (not CONST_INT).
Handle AND, ANDN, XOR, IOR, NOT and COMPARE.
(timode_mem_p): Helper predicate to check whether operand is a
memory reference with sufficient alignment for TImode STV.
(timode_scalar_to_vector_candidate_p): Use convertible_comparison_p
to check whether COMPARE is convertible. Handle SET_DESTs that
are REG_P or MEM_P and SET_SRCs that are REG, CONST_INT,
CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-2.c: New test case, pand.
* gcc.target/i386/sse4_1-stv-3.c: New test case, por.
* gcc.target/i386/sse4_1-stv-4.c: New test case, pxor.
* gcc.target/i386/sse4_1-stv-5.c: New test case, pandn.
* gcc.target/i386/sse4_1-stv-6.c: New test case, ptest.
In g:76c3041b856cb0 I'd removed a "C ? optab_vector : optab_mixed_sign"
argument from a call to directly_supported_p, thinking that the argument
only existed because of the condition (which I was removing). But the
difference between the scalar and vector forms matters for shifts,
so we do still need the argument.
gcc/
PR tree-optimization/106250
* tree-vect-loop.cc (vectorizable_reduction): Reinstate final
argument to directly_supported_p.
In r13-1544, handle_pragma_diagnostic was refactored to support processing
early pragmas. During that process the part looking up option arguments was
inadvertently moved too early, prior to checking the option was valid, causing
PR106252. Fixed by moving the check back where it goes.
gcc/c-family/ChangeLog:
PR preprocessor/106252
* c-pragma.cc (handle_pragma_diagnostic_impl): Don't look up the
option argument prior to verifying the option was found.
When working on a smaller region like a loop version copy, most of the
time is now spent recomputing dominance fast queries, which does a full
function DFS walk. The dominance queries within the region of
interest should be O(log n) without fast queries, and we should do
on the order of O(n) of them, which overall means reasonable
complexity.
For the artificial testcase I'm looking at this shaves off
considerable time again.
* tree-into-ssa.cc (update_ssa): Do not forcefully
re-compute dominance fast queries for TODO_update_ssa_no_phi.
The following fixes the last commit to honor the case where we are not
vectorizing a loop.
PR tree-optimization/106228
* tree-vect-data-refs.cc (vect_setup_realignment): Adjust
VUSE compute for the non-loop case.
When we do TODO_update_ssa_no_phi we already avoid computing
dominance frontiers for all blocks - it is also worth avoiding a
walk of all dominated blocks in the update domwalk and restricting
the walk to the SEME region with the affected blocks. We can
do that by walking the CFG in reverse from blocks_to_update to
the common immediate dominator, marking blocks in the region
and telling the domwalk to STOP when leaving it.
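The marking step can be sketched on a toy CFG. The representation and the
function name are hypothetical, and in this sketch the region entry block
itself is marked as part of the region, which is an assumption rather
than something taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG model: preds[b] lists the predecessor blocks of b.
// Starting from the blocks that need updating, walk predecessor edges
// backwards and mark every block reached, stopping at `region_entry`
// (assumed to dominate all the start blocks, for example their common
// immediate dominator).  The marked blocks form the region the dominator
// walk is restricted to.
inline std::vector<bool>
mark_update_region (const std::vector<std::vector<int>> &preds,
                    const std::vector<int> &blocks_to_update,
                    int region_entry)
{
  assert (region_entry >= 0
          && static_cast<std::size_t> (region_entry) < preds.size ());
  std::vector<bool> in_region (preds.size (), false);
  std::vector<int> worklist;

  // Marking the entry first means the backward walk never goes past it.
  in_region[region_entry] = true;
  for (int bb : blocks_to_update)
    if (!in_region[bb])
      {
        in_region[bb] = true;
        worklist.push_back (bb);
      }

  while (!worklist.empty ())
    {
      const int bb = worklist.back ();
      worklist.pop_back ();
      for (int pred : preds[bb])
        if (!in_region[pred])
          {
            in_region[pred] = true;
            worklist.push_back (pred);
          }
    }
  return in_region;
}
```

A dominator walk can then stop as soon as it reaches a block that is not
marked.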
For an artificial testcase with N adjacent loops with one
unswitching opportunity that takes the incremental SSA updating
off the -ftime-report radar:
tree loop unswitching : 11.25 ( 3%) 0.09 ( 5%) 11.53 ( 3%) 36M ( 9%)
`- tree SSA incremental : 35.74 ( 9%) 0.07 ( 4%) 36.65 ( 9%) 2734k ( 1%)
improves to
tree loop unswitching : 10.21 ( 3%) 0.05 ( 3%) 11.50 ( 3%) 36M ( 9%)
`- tree SSA incremental : 0.66 ( 0%) 0.02 ( 1%) 0.49 ( 0%) 2734k ( 1%)
For less localized updates the SEME region isn't likely constrained
enough so I've restricted the extra work to TODO_update_ssa_no_phi
callers.
* tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New.
(rewrite_update_dom_walker::rewrite_update_dom_walker): Update.
(rewrite_update_dom_walker::m_in_region_flag): New.
(rewrite_update_dom_walker::before_dom_children): If the region
to update is marked, STOP at exits.
(rewrite_blocks): For REWRITE_UPDATE_REGION mark the region
to be updated.
(dump_update_ssa): Use bitmap_empty_p.
(update_ssa): Likewise. Use REWRITE_UPDATE_REGION when
TODO_update_ssa_no_phi.
* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account
pending update_ssa to the caller.
The following avoids the need to massage the target optimization
node at WPA time when we fixup the optimization node, copying
FP related flags from callee to caller. The target is already
set up to fixup, but that only works when not switching between
functions. After fixing that the fixup is then done at LTRANS
time when materializing the function.
2022-07-01 Richard Biener <rguenthert@suse.de>
PR target/105459
* config/i386/i386-options.cc (ix86_set_current_function):
Rebuild the target optimization node whenever necessary,
not only when the optimization node didn't change.
* gcc.dg/lto/pr105459_0.c: New testcase.