r12-136 made us canonicalize an object/offset pair with negative offset
into one with a nonnegative offset, by iteratively absorbing the
innermost component into the offset and stopping as soon as the offset
becomes nonnegative.
This patch strengthens this transformation by making it keep on absorbing
even if the offset is already 0 as long as the innermost component is at
position 0 (and thus absorbing doesn't change the offset). This lets us
accept the two constexpr testcases below, which we'd previously reject
essentially because cxx_fold_indirect_ref would be unable to resolve
*(B*)&b.D123 (where D123 is the base A subobject at position 0) to just b.
PR c++/103879
gcc/cp/ChangeLog:
* constexpr.c (cxx_fold_indirect_ref): Split out object/offset
canonicalization step into a local lambda. Strengthen it to
absorb more components at position 0. Use it before both calls
to cxx_fold_indirect_ref_1.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-base2.C: New test.
* g++.dg/cpp1y/constexpr-base2a.C: New test.
Here we're rejecting the calls to g1 and g2 as ambiguous even though one
overload is more constrained than the other (and they're otherwise tied),
because the implicit 'this' parameter of the non-static overload causes
cand_parms_match to think the function parameter lists aren't equivalent.
This patch fixes this by making cand_parms_match skip over 'this'
appropriately. Note that this bug only affects partial ordering of
non-template member functions because for member function templates
more_specialized_fn seems to already skip over 'this' appropriately.
PR c++/103783
gcc/cp/ChangeLog:
* call.c (cand_parms_match): Skip over 'this' when given one
static and one non-static member function.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memfun2.C: New test.
Immediate functions should never be emitted into assembly, the FE doesn't
genericize them and does various things to ensure they aren't gimplified.
But the following testcase ICEs anyway due to that, because the consteval
function returns a lambda, and operator() of the lambda has
decl_function_context of the consteval function. cgraphunit.c then
does:
/* Preserve a functions function context node. It will
later be needed to output debug info. */
if (tree fn = decl_function_context (decl))
{
cgraph_node *origin_node = cgraph_node::get_create (fn);
enqueue_node (origin_node);
}
which enqueues the immediate function and then tries to gimplify it,
which results in ICE because it hasn't been genericized.
When I try similar testcase with constexpr instead of consteval and
static constinit auto instead of auto in main, what happens is that
the functions are gimplified, later ipa.c discovers they aren't reachable
and sets body_removed to true for them (and clears other flags) and we end
up with a debug info which has the foo and bar functions without
DW_AT_low_pc and other code specific attributes, just stuff from its BLOCK
structure and in there the lambda with DW_AT_low_pc etc.
The following patch attempts to emulate that behavior early, so that cgraph
doesn't try to gimplify those and pretends they were already gimplified
and found unused and optimized away.
2022-01-10 Jakub Jelinek <jakub@redhat.com>
PR c++/103912
* semantics.c (expand_or_defer_fn): For immediate functions, set
node->body_removed to true and clear analyzed, definition and
force_output.
* decl2.c (c_parse_final_cleanups): Ignore immediate functions for
expand_or_defer_fn.
* g++.dg/cpp2a/consteval26.C: New test.
Currently, expand_vector_condition detects only vcondMN and vconduMN
named RTX patterns. Teach it to also consider vec_cmpMN and vec_cmpuMN
RTX patterns when all ones vector is returned for true and all zeros vector
is returned for false.
2022-01-10 Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
PR tree-optimization/103948
* tree-vect-generic.c (expand_vector_condition): Return true if
all ones vector is returned for true, all zeros vector for false
and the target defines corresponding vec_cmp{,u}MN named RTX pattern.
Power10 ISA added `xxblendv*` instructions which are realized in the
`vec_blendv` instrinsic.
Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
`_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
Update original implementation of _mm_blendv_epi8 to use signed types,
to better match the function parameters. Realization is unchanged.
Also, copy a test from i386 for testing `_mm_blendv_ps`.
This should have come with commit ed04cf6d73,
but was inadvertently omitted.
2022-01-10 Paul A. Clarke <pc@us.ibm.com>
gcc
* config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
when _ARCH_PWR10. Use signed types.
(_mm_blendv_ps): Use vec_blendv when _ARCH_PWR10.
(_mm_blendv_pd): Likewise.
gcc/testsuite
* gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
adjust dg directives to suit.
gcc/ChangeLog:
* tree-vectorizer.c (better_epilogue_loop_than_p): Round factors up for
epilogue costing.
* tree-vect-loop.c (vect_analyze_loop): Re-analyze all modes for
epilogues, unless we are guaranteed that we can't have partial vectors.
* genopinit.c: (partial_vectors_supported): Generate new function.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/masked_epilogue.c: New test.
2022-01-10 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/103366
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Allow unlimited
polymorphic actual argument passed to assumed type formal.
gcc/testsuite/
PR fortran/103366
* gfortran.dg/pr103366.f90: New test.
For zero-width bitfields current GCC classify_argument does:
if (DECL_BIT_FIELD (field))
{
for (i = (int_bit_position (field)
+ (bit_offset % 64)) / 8 / 8;
i < ((int_bit_position (field) + (bit_offset % 64))
+ tree_to_shwi (DECL_SIZE (field))
+ 63) / 8 / 8; i++)
classes[i]
= merge_classes (X86_64_INTEGER_CLASS, classes[i]);
}
which I think means that if the zero-width bitfields are at bit-positions
(in the toplevel aggregate) which are multiples of 64 bits doesn't do
anything, (int_bit_position (field) + (bit_offset % 64)) / 64 and
(int_bit_position (field) + (bit_offset % 64) + 63) / 64 should be equal.
But for zero-width bitfields at other bit positions it will call
merge_classes once. Now, the typical case is that the zero width bitfield
is surrounded by some bitfields and in that case, it doesn't change
anything, but it can be sandwitched in between floats too as the testcases
show.
In C we had this behavior, in C++ previously the FE was removing the
zero-width bitfields and therefore they were ignored.
LLVM and ICC seems to ignore those bitfields both in C and C++ (== passing
struct S in SSE register rather than in GPR).
The x86-64 psABI has been recently clarified by
1aa4398d26
that zero width bitfield should be always ignored.
This patch implements that and emits a warning for C for cases where the ABI
changed from GCC 11.
2022-01-10 Jakub Jelinek <jakub@redhat.com>
PR target/102024
* config/i386/i386.c (classify_argument): Add zero_width_bitfields
argument, when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD bitfields,
always ignore them, when seeing other zero sized bitfields, either
set zero_width_bitfields to 1 and ignore it or if equal to 2 process
it. Pass it to recursive calls. Add wrapper
with old arguments and diagnose ABI differences for C structures
with zero width bitfields. Formatting fixes.
* gcc.target/i386/pr102024.c: New test.
* g++.target/i386/pr102024.C: New test.
This patch looks for allocno conflicts of the following form:
- One allocno (X) is a cap allocno for some non-cap allocno X2.
- X2 belongs to some loop L2.
- The other allocno (Y) is a non-cap allocno.
- Y is an ancestor of some allocno Y2 in L2.
- Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0).
- Y can use a different allocation from Y2.
In this case, Y's register is live across L2 but is not used within it,
whereas X's register is used only within L2. The conflict is therefore
only "soft", in that it can easily be avoided by spilling Y2 inside L2
without affecting any insn references.
In principle we could do this for ALLOCNO_NREFS (Y2) != 0 too, with the
callers then taking Y2's ALLOCNO_MEMORY_COST into account. There would
then be no "cliff edge" between a Y2 that has no references and a Y2 that
has (say) a single cold reference.
However, doing that isn't necessary for the PR and seems to give
variable results in practice. (fotonik3d_r improves slightly but
namd_r regresses slightly.) It therefore seemed better to start
with the higher-value zero-reference case and see how things go.
On top of the previous patches in the series, this fixes the exchange2
regression seen in GCC 11.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_soft_conflict): Declare.
* ira-color.c (max_soft_conflict_loop_depth): New constant.
(ira_soft_conflict): New function.
(spill_soft_conflicts): Likewise.
(assign_hard_reg): Use them to handle the case described by
the comment above ira_soft_conflict.
(improve_allocation): Likewise.
* ira.c (check_allocation): Allow allocnos with "soft" conflicts
to share the same register.
gcc/testsuite/
* gcc.target/aarch64/reg-alloc-4.c: New test.
If an allocno A in an inner loop L spans a call, a parent allocno AP
can choose to handle a call-clobbered/caller-saved hard register R
in one of two ways:
(1) save R before each call in L and restore R after each call
(2) spill R to memory throughout L
(2) can be cheaper than (1) in some cases, particularly if L does
not reference A.
Before the patch we always did (1). The patch adds support for
picking (2) instead, when it seems cheaper. It builds on the
earlier support for not propagating conflicts to parent allocnos.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_caller_save_cost): New function.
(ira_caller_save_loop_spill_p): Likewise.
* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
cheaper to spill a call-clobbered register throughout a loop rather
than spill it around each individual call. If so, treat all
call-clobbered registers as conflicts and...
(propagate_allocno_info): ...do not propagate call information
from the child to the parent.
* ira-color.c (move_spill_restore): Update accordingly.
* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.
gcc/testsuite/
* gcc.target/aarch64/reg-alloc-3.c: New test.
Suppose that:
- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP
Previously, propagate_allocno_info would propagate conflict sets up the
loop tree, so that the conflict between A and R would become a conflict
between AP and R (and so on for ancestors of AP).
However, when IRA treats loops as separate allocation regions, it can
decide on a loop-by-loop basis whether to allocate a register or spill
to memory. Conflicts in inner loops therefore don't need to become
hard conflicts in parent loops. Instead we can record that using the
“conflicting” registers for the parent allocnos has a higher cost.
In the example above, this higher cost is the sum of:
- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L
This value is also a cap on the hard register cost that A can contribute
to AP in general (not just for conflicts). Whatever allocation we pick
for AP, there is always the option of spilling that register to memory
throughout L, so the cost to A of allocating a register to AP can't be
more than the cost of spilling A.
To take an extreme example: if allocating a register R2 to A is more
expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2]
could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100
times greater than ALLOCNO_MEMORY_COST (A). But this scale factor
doesn't matter to AP. All that matters is that R2 is more expensive
than memory for A, so that allocating R2 to AP should be costed as
spilling A to memory (again assuming that A and AP are in different
allocation regions). Propagating a factor of 100 would distort the
register costs for AP.
move_spill_restore tries to undo the propagation done by
propagate_allocno_info, so we need some extra processing there.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
(ira_single_region_allocno_p): New function.
(ira_total_conflict_hard_regs): Likewise.
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
(ira_propagate_hard_reg_costs): New function.
(propagate_allocno_info): Use it. Try to avoid propagating
hard register conflicts to parent allocnos if we can handle
the conflicts by spilling instead. Limit the propagated
register costs to the cost of spilling throughout the child loop.
* ira-color.c (color_pass): Use ira_single_region_allocno_p to
test whether a child and parent allocno can share the same
register.
(move_spill_restore): Adjust for the new behavior of
propagate_allocno_info.
gcc/testsuite/
* gcc.target/aarch64/reg-alloc-2.c: New test.
color_pass has two instances of the same code for propagating non-cap
assignments from parent loops to subloops. This patch adds a helper
function for testing when such propagations are required for correctness
and uses it to remove the duplicated code.
A later patch will use this in ira-build.c too, which is why the
function is exported to ira-int.h.
No functional change intended.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_subloop_allocnos_can_differ_p): New function,
extracted from...
* ira-color.c (color_pass): ...here.
This patch adds comments to describe each use of ira_loop_border_costs.
I think this highlights that move_spill_restore was using the wrong cost
in one case, which came from tranposing [0] and [1] in the original
(pre-ira_loop_border_costs) ira_memory_move_cost expressions. The
difference would only be noticeable on targets that distinguish between
load and store costs.
gcc/
PR rtl-optimization/98782
* ira-color.c (color_pass): Add comments to describe the spill costs.
(move_spill_restore): Likewise. Fix reversed calculation.
The final index into (ira_)memory_move_cost is 1 for loads and
0 for stores. Thus the combination:
entry_freq * memory_cost[1] + exit_freq * memory_cost[0]
is the cost of loading a register on entry to a loop and
storing it back on exit from the loop. This is the cost to
use if the register is successfully allocated within the
loop but is spilled in the parent loop. Similarly:
entry_freq * memory_cost[0] + exit_freq * memory_cost[1]
is the cost of storing a register on entry to the loop and
restoring it on exit from the loop. This is the cost to
use if the register is spilled within the loop but is
successfully allocated in the parent loop.
The patch adds a helper class for calculating these values and
mechanically replaces the existing instances. There is no attempt to
editorialise the choice between using “spill inside” and “spill outside”
costs. (I think one of them is the wrong way round, but a later patch
deals with that.)
No functional change intended.
gcc/
PR rtl-optimization/98782
* ira-int.h (ira_loop_border_costs): New class.
* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
New constructor.
(calculate_allocno_spill_cost): Use ira_loop_border_costs.
(color_pass): Likewise.
(move_spill_restore): Likewise.
glibc strptime passes around some state, what fields in struct tm have been
set and what needs to be finalized through possibly recursive calls, and
at the end performs various finalizations, like applying %p so that it
works for both %I %p and %p %I orders, or applying century so that both
%C %y and %y %C works, or computation of missing fields from others
(e.g. from %Y and %j one can compute tm_mon, tm_mday and tm_wday,
from %Y %U %w, %Y %W %w, %Y %U %a, or %Y %W %w one can compute
tm_mon, tm_mday, tm_yday or e.g. from %Y %m %d one can compute tm_wday
and tm_yday.
As the finalization is quite large and doesn't need to be a template
(doesn't depend on any iterators or char types), I've put it into libstdc++,
and left some padding in the state struct, so that perhaps in the future we
can track some more state without changing ABI.
Unfortunately, there is an ugly problem that the standard mandates that
get method calls the do_get virtual method and I don't see how we can
cary on any state in between those calls (even if we did an ABI change
for the facets, the methods are const, so that I think multiple threads
could use the same time_get objects and we couldn't store state in there).
There is a hack for that for GCC (seems to work with ICC too, doesn't work
with clang++) if the do_get method isn't overriden we can pass the state
around.
For both do_get_year and per IRC discussions also for %y, the behavior is
if 1-2 digits are parsed, the year is treated according to POSIX 2008 %y
rules (0-68 is 2000-2068, 69-99 is 1969-1999), if 3-4 digits are parsed,
it is treated as %Y.
2022-01-10 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/77760
* include/bits/locale_facets_nonio.h (__time_get_state): New struct.
(time_get::_M_extract_via_format): Declare new method with
__time_get_state& as an extra argument.
* include/bits/locale_facets_nonio.tcc (_M_extract_via_format): Add
__state argument, set various fields in it while parsing. Handle %j,
%U, %w and %W, fix up handling of %y, %Y and %C, don't adjust tm_hour
for %p immediately. Add a wrapper around the method without the
__state argument for backwards compatibility.
(_M_extract_num): Remove all __len == 4 special cases.
(time_get::do_get_time, time_get::do_get_date, time_get::do_get): Zero
initialize __state, pass it to _M_extract_via_format and finalize it
at the end.
(do_get_year): For 1-2 digit parsed years, map 0-68 to 2000-2068,
69-99 to 1969-1999. For 3-4 digit parsed years use that as year.
(get): If do_get isn't overloaded from the locale_facets_nonio.tcc
version, don't call do_get but call _M_extract_via_format instead to
pass around state.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Export _M_extract_via_format
with extra __time_get_state and __time_get_state::_M_finalize_state.
* src/c++98/locale_facets.cc (is_leap, day_of_the_week,
day_of_the_year): New functions in anon namespace.
(mon_yday): New var in anon namespace.
(__time_get_state::_M_finalize_state): Define.
* testsuite/22_locale/time_get/get/char/4.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/4.cc: New test.
* testsuite/22_locale/time_get/get_year/char/1.cc (test01): Parse 197
as year 197AD instead of error.
* testsuite/22_locale/time_get/get_year/char/5.cc (test01): Parse 1 as
year 2001 instead of error.
* testsuite/22_locale/time_get/get_year/char/6.cc: New test.
* testsuite/22_locale/time_get/get_year/wchar_t/1.cc (test01): Parse
197 as year 197AD instead of error.
* testsuite/22_locale/time_get/get_year/wchar_t/5.cc (test01): Parse
1 as year 2001 instead of error.
* testsuite/22_locale/time_get/get_year/wchar_t/6.cc: New test.
This fixes the --disable-hosted-libstdcxx build so that it works with
--without-headers. Currently you need to also use --with-newlib, which
is confusing for users who aren't actually using newlib.
The AM_PROG_LIBTOOL checks are currently skipped for --with-newlib and
--with-avrlibc builds, with this change they are also skipped when using
--without-headers. It would be nice if using --disable-hosted-libstdcxx
automatically skipped those checks, but GLIBCXX_ENABLE_HOSTED comes too
late to make the AM_PROG_LIBTOOL checks depend on $is_hosted.
The checks for EOF, SEEK_CUR etc. cause the build to fail if there is no
<stdio.h> available. Unlike most headers, which get a HAVE_FOO_H macro,
<stdio.h> is in autoconf's default includes, so every check tries to
include it unconditionally. This change skips those checks for
freestanding builds.
Similarly, the checks for <stdint.h> types done by GCC_HEADER_STDINT try
to include <stdio.h> and fail for --without-headers builds. This change
skips the use of GCC_HEADER_STDINT for freestanding. We can probably
stop using GCC_HEADER_STDINT entirely, since only one file uses the
gstdint.h header that is generated, and that could easily be changed to
use <stdint.h> instead. That can wait for stage 1.
We also need to skip the GLIBCXX_CROSSCONFIG stage if --without-headers
was used, since we don't have any of the functions it deals with.
The end result of the changes above is that it should not be necessary
for a --disable-hosted-libstdcxx --without-headers build to also use
--with-newlib.
Finally, compile libsupc++ with -ffreestanding when --without-headers is
used, so that <stdint.h> will use <gcc-stdint.h> instead of expecting it
to come from libc.
libstdc++-v3/ChangeLog:
PR libstdc++/103866
* acinclude.m4 (GLIBCXX_COMPUTE_STDIO_INTEGER_CONSTANTS): Do
nothing for freestanding builds.
(GLIBCXX_ENABLE_HOSTED): Define FREESTANDING_FLAGS.
* configure.ac: Do not use AC_LIBTOOL_DLOPEN when configured
with --without-headers. Do not use GCC_HEADER_STDINT for
freestanding builds.
* libsupc++/Makefile.am (HOSTED_CXXFLAGS): Use -ffreestanding
for freestanding builds.
* configure: Regenerate.
* Makefile.in: Regenerate.
* doc/Makefile.in: Regenerate.
* include/Makefile.in: Regenerate.
* libsupc++/Makefile.in: Regenerate.
* po/Makefile.in: Regenerate.
* python/Makefile.in: Regenerate.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.in: Regenerate.
* src/c++17/Makefile.in: Regenerate.
* src/c++20/Makefile.in: Regenerate.
* src/c++98/Makefile.in: Regenerate.
* src/filesystem/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
When building a build!=host compiler, the just-built gcc can't be used
to build the target libstdc++ (because it is built for the host triplet,
not the build triplet). The top-level configure.ac sets up the build
flags for libstdc++ (and other "raw_cxx" libs) like this:
GCC_TARGET_TOOL(c++ for libstdc++, RAW_CXX_FOR_TARGET, CXX,
[gcc/xgcc -shared-libgcc -B$$r/$(HOST_SUBDIR)/gcc -nostdinc++ -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs -L$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs],
c++)
The -nostdinc++ flag is only used for the IN-TREE-TOOL, i.e. when using
the just-built gcc/xgcc compiler. This means that the cross-compiler
used to build libstdc++ will add its own libstdc++ headers to the
include path. That results in the #include <cfenv> in
src/c++17/floating_to_chars.cc and src/c++17/floating_from_chars.cc
doing #include_next <fenv.h> and finding the libstdc++ fenv.h wrapper
from the host compiler. Because that has the same include guard as the
<fenv.h> in the libstdc++ we're trying to build, we never reach the
underlying <fenv.h> from libc. That results in several errors of the
form:
error: 'fenv_t' has not been declared in '::'
The most correct fix would be to add -nostdinc++ to the
RAW_CXX_FOR_TARGET variable in configure.ac, or the
RAW_CXX_TARGET_EXPORTS variable in Makefile.tpl.
Another solution would be to make the libstdc++ <fenv.h> wrapper use
_GLIBCXX_INCLUDE_NEXT_C_HEADERS like our <stdlib.h> and other C header
wrappers.
For now though, the simplest and safest solution is to just add
-nostdinc++ to the CXXFLAGS used for src/c++17/*.cc, which is what this
does.
libstdc++-v3/ChangeLog:
PR libstdc++/100017
* src/c++17/Makefile.am (AM_CXXFLAGS): Add -nostdinc++.
* src/c++17/Makefile.in: Regenerate.
The PR uncovered that -freorder-blocks-and-partition was working by accident
on 64-bit Windows, i.e. the middle-end was supposed to disable it with SEH.
After the change installed on mainline, the middle-end properly disables it,
which is too bad since a significant amount of work went into it for SEH.
gcc/
PR target/103465
* coretypes.h (unwind_info_type): Swap UI_SEH and UI_TARGET.
We use the issignaling macro, present in some libc's (notably glibc),
when it is available. Compile all IEEE-related files in the library
(both C and Fortran sources) with -fsignaling-nans to ensure maximum
compatibility.
libgfortran/ChangeLog:
PR fortran/82207
* Makefile.am: Pass -fsignaling-nans for IEEE files.
* Makefile.in: Regenerate.
* ieee/ieee_helper.c: Use issignaling macro to recognized
signaling NaNs.
gcc/testsuite/ChangeLog:
PR fortran/82207
* gfortran.dg/ieee/signaling_1.f90: New test.
* gfortran.dg/ieee/signaling_1_c.c: New file.
This makes __builtin_shufflevector lowering force the result
of the BIT_FIELD_REF lowpart operation to a temporary as to
fulfil the IL verifier constraint that BIT_FIELD_REFs should
be always in outermost handled component position. Trying to
enforce this during gimplification isn't as straight-forward
as here where we know we're dealing with an rvalue.
FAIL: c-c++-common/torture/builtin-shufflevector-1.c -O0 execution test
2022-01-05 Richard Biener <rguenther@suse.de>
PR middle-end/101530
gcc/c-family/
* c-common.c (c_build_shufflevector): Wrap the BIT_FIELD_REF
in a TARGET_EXPR to force a temporary.
gcc/testsuite/
* c-c++-common/builtin-shufflevector-3.c: New testcase.
This fixes a mistake done with r8-5008 when introducing
allow_peel to the unroll code. The intent was to allow
peeling that doesn't grow code but the result was that
with -O3 and UL_ALL this wasn't done. The following
instantiates the desired effect by adjusting ul to UL_NO_GROWTH
if peeling is not allowed.
2022-01-05 Richard Biener <rguenther@suse.de>
PR tree-optimization/100359
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely):
Allow non-growing peeling with !allow_peel and UL_ALL.
* gcc.dg/tree-ssa/pr100359.c: New testcase.
gcc/ada/
* gcc-interface/trans.c (Identifier_to_gnu): Use correct subtype.
(elaborate_profile): New function.
(Call_to_gnu): Call it on the formals and the result type before
retrieving the translated result type from the subprogram type.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Record_Type>: Fix
computation of boolean result in the unchecked union case.
(components_to_record): Rename MAYBE_UNUSED parameter to IN_VARIANT
and remove local variable of the same name. Pass NULL recursively
as P_GNU_REP_LIST for nested variants in the unchecked union case.
gcc/ada/
* gcc-interface/trans.c (lvalue_required_p) <N_Pragma>: New case.
<N_Pragma_Argument_Association>: Likewise.
(Pragma_to_gnu) <Pragma_Inspection_Point>: Fetch the corresponding
variable of a constant before marking it as addressable.
gcc/ada/
* libgnat/s-atocou__builtin.adb (Decrement, Increment): Switch
from __sync to __atomic builtins; use 'Address to be consistent
with System.Atomic_Primitives.
gcc/ada/
* exp_pakd.adb (Install_PAT): Do not reset the alignment here.
* layout.adb (Layout_Type): Call Adjust_Esize_Alignment after having
copied the RM_Size onto the Esize when the latter is too small.
gcc/fortran/ChangeLog:
PR fortran/101762
* expr.c (gfc_check_pointer_assign): For pointer initialization
targets, check that subscripts and substring indices in
specifications are constant expressions.
gcc/testsuite/ChangeLog:
PR fortran/101762
* gfortran.dg/pr101762.f90: New test.
After PR97896 for which some code was added to ignore the KIND argument
of the INDEX intrinsics, and PR87711 for which that was extended to LEN_TRIM
as well, this propagates it further to MASKL, MASKR, SCAN and VERIFY.
PR fortran/103789
gcc/fortran/ChangeLog:
* trans-array.c (arg_evaluated_for_scalarization): Add MASKL, MASKR,
SCAN and VERIFY to the list of intrinsics whose KIND argument is to be
ignored.
gcc/testsuite/ChangeLog:
* gfortran.dg/maskl_1.f90: New test.
* gfortran.dg/maskr_1.f90: New test.
* gfortran.dg/scan_3.f90: New test.
* gfortran.dg/verify_3.f90: New test.
This patch improves the code generated when moving a 128-bit value
in TImode, represented by two 64-bit registers, to V1TImode, which
is a single SSE register.
Currently, the simple move:
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
uv1ti foo(__int128 x) { return (uv1ti)x; }
is always transferred via memory, as:
foo: movq %rdi, -24(%rsp)
movq %rsi, -16(%rsp)
movdqa -24(%rsp), %xmm0
ret
with this patch, we now generate (with -msse2):
foo: movq %rdi, %xmm1
movq %rsi, %xmm2
punpcklqdq %xmm2, %xmm1
movdqa %xmm1, %xmm0
ret
and with -mavx2:
foo: vmovq %rdi, %xmm1
vpinsrq $1, %rsi, %xmm1, %xmm0
ret
Even more dramatic is the improvement of zero extended transfers.
uv1ti bar(unsigned char c) { return (uv1ti)(__int128)c; }
Previously generated:
bar: movq $0, -16(%rsp)
movzbl %dil, %eax
movq %rax, -24(%rsp)
vmovdqa -24(%rsp), %xmm0
ret
Now generates:
bar: movzbl %dil, %edi
movq %rdi, %xmm0
ret
My first attempt at this functionality attempted to use a simple
define_split, but unfortunately, this triggers very late during the
compilation preventing some of the simplifications we'd like (in
combine). For example the foo case above becomes:
foo: movq %rsi, -16(%rsp)
movq %rdi, %xmm0
movhps -16(%rsp), %xmm0
transferring half directly, and the other half via memory.
And for the bar case above, GCC fails to appreciate that
movq/vmovq clears the high bits, resulting in:
bar: movzbl %dil, %eax
xorl %edx, %edx
vmovq %rax, %xmm1
vpinsrq $1, %rdx, %xmm1, %xmm0
ret
Hence the solution (i.e. this patch) is to add a special case
to ix86_expand_vector_move for TImode to V1TImode transfers.
2022-01-08 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.c (ix86_expand_vector_move): Add
special case for TImode to V1TImode moves, going via V2DImode.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-mov-1.c: New test case.
* gcc.target/i386/sse2-v1ti-zext.c: New test case.
The match.pd address_comparison simplification can only handle
ADDR_EXPR comparisons possibly converted to some other type (I wonder
if we shouldn't restrict it in address_compare to casts to pointer
types or pointer-sized integer types, I think we shouldn't optimize
(short) (&var) == (short) (&var2) because we really don't know whether
it will be true or false). On GIMPLE, most of pointer to pointer
casts are useless and optimized away and further we have in
gimple_fold_stmt_to_constant_1 an optimization that folds
&something p+ const_int
into
&MEM_REF[..., off]
On GENERIC, we don't do that and e.g. for constant evaluation it
could be pretty harmful if e.g. such pointers are dereferenced, because
it can lose what exact field it was starting with etc., all it knows
is the base and offset, type and alias set.
Instead of teaching the match.pd address_compare about 3 extra variants
where one or both compared operands are pointer_plus, this patch attempts
to fold operands of comparisons similarly to gimple_fold_stmt_to_constant_1
before calling fold_binary on it.
There is another thing though, while we do have (x p+ y) p+ z to
x p+ (y + z) simplification which works on GIMPLE well because of the
useless pointer conversions, on GENERIC we can have pointer casts in between
and at that point we can end up with large expressions like
((type3) (((type2) ((type1) (&var + 2) + 2) + 2) + 2))
etc. Pointer-plus doesn't really care what exact pointer type it has as
long as it is a pointer, so the following match.pd simplification for
GENERIC only (it is useless for GIMPLE) also moves the cast so that nested
p+ can be simplified.
Note, I've noticed we don't really diagnose going out of bounds with
pointer_plus (unlike e.g. with ARRAY_REF) during constant evaluation, I
think another patch for cxx_eval_binary_expression with POINTER_PLUS will be
needed. But it isn't clear to me what exactly it should do in case of
subobjects. If we start with address of a whole var, (&var), I guess we
should diagnose if the pointer_plus gets before start of the var (i.e.
"negative") or 1 byte past the end of the var, but what if we start with
&var.field or &var.field[3] ? For &var.field, shall we diagnose out of
bounds of field (except perhaps flexible members?) or the whole var?
For ARRAY_REFs, I assume we must at least strip all the outer ARRAY_REFs
and so start with &var.field too, right?
2022-01-08 Jakub Jelinek <jakub@redhat.com>
PR c++/89074
gcc/
* match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): New GENERIC
simplification.
gcc/cp/
* constexpr.c (cxx_maybe_fold_addr_pointer_plus): New function.
(cxx_eval_binary_expression): Use it.
gcc/testsuite/
* g++.dg/cpp1y/constexpr-89074-2.C: New test.
* g++.dg/cpp1z/constexpr-89074-1.C: New test.
In the patch for PR92385 I added asserts to see if we tried to make a
vec_init of a vec_init, but didn't see any in regression testing. This
testcase is one case, which seems reasonable: we create a VEC_INIT_EXPR for
the aggregate initializer, and then again to express the actual
initialization of the member. We already do similar collapsing of
TARGET_EXPR. So let's just remove the asserts.
PR c++/103946
gcc/cp/ChangeLog:
* init.c (build_vec_init): Remove assert.
* tree.c (build_vec_init_expr): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nsdmi-array1.C: New test.
The standard says that a destroying operator delete is preferred, but that
only applies to the delete-expression, not the cleanup if a new-expression
initialization throws. As a result of this patch, several of the destroying
delete tests don't get EH cleanups, but I'm turning off the warning in cases
where the initialization can't throw anyway.
It's unclear what should happen if the class does not declare a non-deleting
operator delete; a proposal in CWG was to call the global delete, which
makes sense to me if the class doesn't declare its own operator new. If it
does, we warn and don't call any deallocation function if initialization
throws.
PR c++/100588
gcc/cp/ChangeLog:
* call.c (build_op_delete_call): Ignore destroying delete
if alloc_fn.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/destroying-delete5.C: Expect warning.
* g++.dg/cpp2a/destroying-delete6.C: New test.
gcc/analyzer/ChangeLog:
* engine.cc (impl_run_checkers): Pass logger to engine ctor.
* region-model-manager.cc
(region_model_manager::region_model_manager): Add logger param and
use it to initialize m_logger.
* region-model.cc (engine::engine): New.
* region-model.h (region_model_manager::region_model_manager):
Add logger param.
(region_model_manager::get_logger): New.
(region_model_manager::m_logger): New field.
(engine::engine): New.
* store.cc (store_manager::get_logger): New.
(store::set_value): Log scope. Log when marking a cluster as
unknown due to possible aliasing.
* store.h (store_manager::get_logger): New decl.