* ira-conflicts.c (print_hard_reg_set): Correct output for sets
including FIRST_PSEUDO_REGISTER - 1.
* ira-color.c (print_hard_reg_set): Ditto.
Before, for a target with FIRST_PSEUDO_REGISTER 20, you'd get "19-18"
for (1<<19). For (1<<18)|(1<<19), you'd get "18".
I was using ira-conflicts.c:print_hard_reg_set with a local
patch to gdbinit.in in a debug-session, and noticed the
erroneous output. I see there's an almost identical function in
ira-color.c and on top of that, there's another function by the
same name and with similar semantics in sel-sched-dump.c, but
the last one doesn't try to print ranges.
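A minimal sketch of the intended output, assuming the register set is modeled as a plain 32-bit mask with 20 hard registers instead of GCC's HARD_REG_SET, and using hypothetical helper names (`print_reg_ranges`, `append_range`). The point it shows is that a run still open when the scan reaches the last register must be flushed after the loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for FIRST_PSEUDO_REGISTER.  */
#define NUM_HARD_REGS 20

/* Append " N" or " N-M" for the run [START, END] to BUF.  */
static void
append_range (char *buf, size_t size, int start, int end)
{
  size_t len = strlen (buf);
  if (start == end)
    snprintf (buf + len, size - len, " %d", start);
  else
    snprintf (buf + len, size - len, " %d-%d", start, end);
}

/* Print the registers set in MASK as ascending ranges.  A run that is
   still open when the loop ends (one that includes register
   NUM_HARD_REGS - 1) is flushed after the loop.  */
static void
print_reg_ranges (uint32_t mask, char *buf, size_t size)
{
  int start = -1;
  assert (size > 0);
  buf[0] = '\0';
  for (int i = 0; i < NUM_HARD_REGS; i++)
    {
      int set = (mask >> i) & 1;
      if (set && start < 0)
        start = i;                          /* open a new run */
      else if (!set && start >= 0)
        {
          append_range (buf, size, start, i - 1);   /* close it */
          start = -1;
        }
    }
  if (start >= 0)
    append_range (buf, size, start, NUM_HARD_REGS - 1);
}
```

With this sketch, (1<<19) prints " 19" and (1<<18)|(1<<19) prints " 18-19"; the leading space per entry is an arbitrary choice here.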
This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
operations (vector/by element) to the ARM back-end.
These are:
usdot (vector), <us/su>dot (by element).
The functions are optional from ARMv8.2-a onwards, enabled with
-march=armv8.2-a+i8mm, and for ARM they remain optional as of ARMv8.6-a.
The functions are declared in arm_neon.h, RTL patterns are defined to
generate assembler and tests are added to verify and perform adequate checks.
Regression testing on arm-none-eabi passed successfully.
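For reference, a portable scalar model of the 64-bit vector and by-element forms, written from the ACLE description of USDOT rather than from the patch; the function names are hypothetical and it does not use arm_neon.h. Each 32-bit accumulator lane adds four products of an unsigned byte and a signed byte; the SUDOT forms are the same with the signedness of the two byte operands swapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vusdot_s32: each of the two 32-bit
   accumulator lanes adds the dot product of four unsigned bytes from A
   with the corresponding four signed bytes from B.  */
static void
usdot_model (int32_t acc[2], const uint8_t a[8], const int8_t b[8])
{
  for (int lane = 0; lane < 2; lane++)
    for (int k = 0; k < 4; k++)
      acc[lane] += (int32_t) a[4 * lane + k] * (int32_t) b[4 * lane + k];
}

/* Hypothetical scalar model of vusdot_lane_s32: as above, but every
   accumulator lane uses the 4-byte group of B selected by INDEX.  */
static void
usdot_lane_model (int32_t acc[2], const uint8_t a[8], const int8_t b[8],
                  int index)
{
  assert (index == 0 || index == 1);
  for (int lane = 0; lane < 2; lane++)
    for (int k = 0; k < 4; k++)
      acc[lane] += (int32_t) a[4 * lane + k] * (int32_t) b[4 * index + k];
}
```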
gcc/ChangeLog:
2020-02-11 Stam Markianos-Wright <stam.markianos-wright@arm.com>
* config/arm/arm-builtins.c (enum arm_type_qualifiers):
(USTERNOP_QUALIFIERS): New define.
(USMAC_LANE_QUADTUP_QUALIFIERS): New define.
(SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
(arm_expand_builtin_args): Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
(arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
* config/arm/arm_neon.h (vusdot_s32): New.
(vusdot_lane_s32): New.
(vusdotq_lane_s32): New.
(vsudot_lane_s32): New.
(vsudotq_lane_s32): New.
* config/arm/arm_neon_builtins.def (usdot, usdot_lane, sudot_lane): New.
* config/arm/iterators.md (DOTPROD_I8MM): New.
(sup, opsuffix): Add <us/su>.
* config/arm/neon.md (neon_usdot, <us/su>dot_lane): New.
* config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
gcc/testsuite/ChangeLog:
2020-02-11 Stam Markianos-Wright <stam.markianos-wright@arm.com>
* gcc.target/arm/simd/vdot-2-1.c: New test.
* gcc.target/arm/simd/vdot-2-2.c: New test.
* gcc.target/arm/simd/vdot-2-3.c: New test.
* gcc.target/arm/simd/vdot-2-4.c: New test.
Constant evaluation of genericize_spaceship produced a CONSTRUCTOR, which we
then wanted to bind to a reference, which we can't do. So wrap the result
in a TARGET_EXPR so we get something with an address.
We also need to handle treating the result of cxx_eval_binary_expression as
a glvalue for SPACESHIP_EXPR.
My earlier change to add uid_sensitive to maybe_constant_value was wrong; we
don't even look at the cache when manifestly_const_eval, and I failed to
adjust the later call to cxx_eval_outermost_constant_expr.
gcc/cp/ChangeLog
2020-02-11 Jason Merrill <jason@redhat.com>
PR c++/93650
PR c++/90691
* constexpr.c (maybe_constant_value): Correct earlier change.
(cxx_eval_binary_expression) [SPACESHIP_EXPR]: Pass lval through.
* method.c (genericize_spaceship): Wrap result in TARGET_EXPR.
This patch fixes two issues with return type deduction in the presence of an
abbreviated function template.
The first issue (PR 69448) is that if a placeholder auto return type contains
any modifiers such as & or *, then the abbreviated function template
compensation in splice_late_return_type does not get performed for the
underlying auto node, leading to incorrect return type deduction. This happens
because splice_late_return_type does not consider that a placeholder auto return
type might have modifiers. To fix this it seems we need to look through
modifiers in the return type to obtain the location of the underlying auto node
in order to replace it with the adjusted auto node. To that end this patch
refactors the utility function find_type_usage to return a pointer to the
matched tree, and uses it to find and replace the underlying auto node.
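The idea of returning a pointer to the matched tree can be sketched on a toy type representation. Everything here (`struct type`, the `K_*` kinds, `find_auto_slot`) is hypothetical and much simpler than GCC's trees; it only shows why returning the slot, rather than the node, lets the caller replace a nested auto node in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified type node: either a modifier
   (pointer/reference) wrapping an inner type, or a leaf.  */
enum kind { K_POINTER, K_REFERENCE, K_AUTO, K_INT };

struct type
{
  enum kind kind;
  struct type *inner;           /* only used by K_POINTER / K_REFERENCE */
};

/* Look through modifiers starting at *TP and return the address of the
   slot holding the underlying K_AUTO node, or NULL if there is none.
   Returning the slot lets the caller overwrite it with a new node.  */
static struct type **
find_auto_slot (struct type **tp)
{
  assert (tp != NULL);
  while (*tp && (*tp)->kind != K_AUTO)
    {
      if ((*tp)->kind == K_POINTER || (*tp)->kind == K_REFERENCE)
        tp = &(*tp)->inner;     /* step through the modifier */
      else
        return NULL;            /* some other leaf: no auto here */
    }
  return *tp ? tp : NULL;
}
```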
The second issue (PR 80471) is that the AUTO_IS_DECLTYPE flag is not being
preserved in splice_late_return_type when compensating for an abbreviated
function template, leading to us treating a decltype(auto) return type as if it
was an auto return type. Fixed by making make_auto_1 set the AUTO_IS_DECLTYPE
flag whenever we're building a decltype(auto) node and adjusting callers
appropriately. The test for PR 80471 is adjusted to expect the correct
behavior.
gcc/cp/ChangeLog:
PR c++/69448
PR c++/80471
* type-utils.h (find_type_usage): Refactor to take a tree * and to
return a tree *, and update documentation accordingly.
* pt.c (make_auto_1): Set AUTO_IS_DECLTYPE when building a
decltype(auto) node.
(make_constrained_decltype_auto): No need to explicitly set
AUTO_IS_DECLTYPE anymore.
(splice_late_return_type): Use find_type_usage to find and
replace a possibly nested auto node instead of using is_auto.
Turn the is_auto test into an assert when deciding whether
to use late_return_type.
(type_uses_auto): Adjust the call to find_type_usage.
* parser.c (cp_parser_decltype): No need to explicitly set
AUTO_IS_DECLTYPE anymore.
libcc1/ChangeLog:
PR c++/69448
PR c++/80471
* libcp1plugin.cc (plugin_get_expr_type): No need to explicitly set
AUTO_IS_DECLTYPE anymore.
gcc/testsuite/ChangeLog:
PR c++/69448
PR c++/80471
* g++.dg/concepts/abbrev3.C: New test.
* g++.dg/cpp2a/concepts-pr80471.C: Adjust a static_assert to expect the
correct behavior.
* g++.dg/cpp0x/auto9.C: Adjust a dg-error directive.
This patch improves the pretty printing of standard concept definitions in error
messages. In particular, standard concepts are now printed qualified whenever
appropriate, and the "concept" specifier is printed only when the
TFF_DECL_SPECIFIERS flag is specified.
In the below test, the first error message changes from
9:15: error: ‘b’ was not declared in this scope; did you mean ‘concept b’?
to
9:15: error: ‘b’ was not declared in this scope; did you mean ‘a::b’?
gcc/cp/ChangeLog:
* error.c (dump_decl) [CONCEPT_DECL]: Use dump_simple_decl.
(dump_simple_decl): Handle standard concept definitions as well as
variable concept definitions.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts6.C: New test.
gcc/analyzer/ChangeLog:
PR analyzer/93647
* diagnostic-manager.cc
(diagnostic_manager::prune_for_sm_diagnostic): Bulletproof against
VAR being constant.
* region-model.cc (region_model::get_lvalue_1): Provide a better
error message when encountering an unhandled tree code.
gcc/testsuite/ChangeLog:
PR analyzer/93647
* gcc.dg/analyzer/torture/pr93647.c: New test.
As mentioned in the PR, for -mavx -mno-avx2 the backend does support
vcondv4div4df and vcondv8siv8sf optabs (while generally 32-byte vectors
aren't much supported in that case, it is performed using
vandps/vandnps/vorps). The problem is that after the last generic vector
lowering (where the VEC_COND_EXPR still compares two V4DF vectors and
has two V4DI last operands and a V4DI result and so is considered ok), fre4
folds the condition into a constant, at which point the middle-end during
expansion will try vcond_mask_optab and fall back to trying to expand it
as the constant vector < 0 vcondv4div4di, but neither of them is supported
for -mavx -mno-avx2 and thus we ICE.
So, the options I see are either what the following patch does, i.e. also
support vcond_mask_v4div4di and vcond_mask_v4siv4si already for TARGET_AVX,
or requiring TARGET_AVX2 rather than the current TARGET_AVX for
vcondv4div4df and vcondv8siv8sf.
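The vandps/vandnps/vorps expansion is a bitwise select. A sketch using the GCC vector extensions (not the i386 expanders themselves; the helper names are hypothetical): a V4DF comparison produces a mask whose lanes are all-ones or all-zeros, and the result is (mask & a) | (~mask & b).

```c
#include <assert.h>

/* GCC vector extension types: four 64-bit lanes (32 bytes).  */
typedef long long v4di __attribute__ ((vector_size (32)));
typedef double v4df __attribute__ ((vector_size (32)));

/* Select lane-wise between A and B using MASK, where each mask lane is
   all-ones (take A) or all-zeros (take B).  This is the and/andnot/or
   formulation that vandps/vandnps/vorps implement.  */
static v4di
blend_mask (v4di mask, v4di a, v4di b)
{
  return (mask & a) | (~mask & b);
}

/* A V4DF comparison yields a -1/0 mask per lane; reinterpret it as
   V4DI and feed it to the blend, as a vcond on V4DF operands with V4DI
   value operands would.  */
static v4di
select_less (v4df x, v4df y, v4di a, v4di b)
{
  v4di mask = (v4di) (x < y);
  return blend_mask (mask, a, b);
}
```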
2020-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/93637
* config/i386/sse.md (VI_256_AVX2): New mode iterator.
(vcond_mask_<mode><sseintvecmodelower>): Use it instead of VI_256.
Change condition from TARGET_AVX2 to TARGET_AVX.
* gcc.target/i386/avx-pr93637.c: New test.
PR analyzer/93405 reports an ICE with -fanalyzer when passing
a constant "by reference" in gfortran.
The issue is that the constant is passed as an ADDR_EXPR
of a CONST_DECL, and region_model::get_lvalue_1 doesn't
know how to handle CONST_DECL.
This patch implements it for CONST_DECL by providing
a placeholder region, holding the CONST_DECL's value,
fixing the ICE.
gcc/analyzer/ChangeLog:
PR analyzer/93405
* region-model.cc (region_model::get_lvalue_1): Implement
CONST_DECL.
gcc/testsuite/ChangeLog:
PR analyzer/93405
* gfortran.dg/analyzer/pr93405.f90: New test.
This patch adds a gfortran.dg/analyzer subdirectory with an analyzer.exp,
setting DEFAULT_FFLAGS on the tests run within it.
It also adds a couple of simple proof-of-concept tests of e.g. detecting
double-frees from gfortran.
gcc/testsuite/ChangeLog:
* gfortran.dg/analyzer/analyzer.exp: New subdirectory and .exp
suite.
* gfortran.dg/analyzer/malloc-example.f90: New test.
* gfortran.dg/analyzer/malloc.f90: New test.
The length used for the comparison for 'CFStringRef' was only comparing
for 'CFString', potentially allowing mismatched identifiers.
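A hedged illustration of this class of bug, with hypothetical function names: a hand-counted length of 8 only covers "CFString", so any identifier with that prefix matches. Deriving the count from the literal with sizeof avoids the miscount.

```c
#include <assert.h>
#include <string.h>

/* Length of a string literal, computed by the compiler.  Only valid for
   string literals and char arrays, not for char pointers.  */
#define LITERAL_LEN(s) (sizeof (s) - 1)

/* Buggy pattern: the count covers only "CFString".  */
static int
is_cfstringref_buggy (const char *name)
{
  return strncmp (name, "CFStringRef", 8) == 0;
}

/* Fixed pattern: the count always matches the literal.  */
static int
is_cfstringref (const char *name)
{
  return strncmp (name, "CFStringRef", LITERAL_LEN ("CFStringRef")) == 0;
}
```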
2020-02-10 Iain Sandoe <iain@sandoe.co.uk>
PR other/93641
* config/darwin-c.c (darwin_cfstring_ref_p): Fix up last
argument of strncmp.
PR fortran/83113
* array.c: Do not attempt to set the array spec for a submodule
function symbol (as it has already been set in the corresponding
module procedure interface).
* symbol.c: Do not reject duplicate POINTER, ALLOCATABLE, or
DIMENSION attributes in declarations of a submodule function.
* gfortran.h: Add a macro that tests for a module procedure in a
submodule.
* gfortran.dg/pr83113.f90: New test.
Random spotting. Exposes the missed benefit for delay-slot
filling of a splitter for indexed addressing mode (the [rN+M]
one). To be considered for common instructions and perhaps only
for suitable M; at least +-63 is obvious (when there's a register
available) as both the original and the add fit in delay-slots.
* gcc.target/cris/pr93372-2.c, gcc.target/cris/pr93372-5.c,
gcc.target/cris/pr93372-8.c: New tests.
These tests fail miserably, both at being examples of cc0
eliminating compare instructions and, post-cc0-CRIS, at showing a
significant improvement. They're here to track suboptimal
comparison code for CRIS.
This test was separated from the posted and approved patch named
"dbr: Filter-out TARGET_FLAGS_REGNUM from end_of_function_needs"
and applied: it doesn't fail yet. It differs from the posted
version in that function "g" is commented-out; see the added
comment.
* config/cris/cris.c (cris_reduce_compare): New function.
* config/cris/cris-protos.h (cris_reduce_compare): Add prototype.
* config/cris/cris.md ("cbranch<mode>4", "cbranchdi4", "cstoredi4")
("cstore<mode>4"): Apply cris_reduce_compare in expanders.
The decc0ration work of the CRIS port made me look closer at the
code for trivial comparisons, as in the condition for branches
and conditional-stores, like in:
  extern void foo (void);
  void g (short int a, short int b)
  {
    short int c = a + b;
    if (c >= 0)
      foo ();
  }
At -O2, the cc0 version of the CRIS port has an explicit
*uneliminated* compare instruction ("cmp.w -1,$r10") instead of
an (eliminated) compare against 0 (which below I'll call a
zero-compare). This is for the CRIS-cc0 version, but I also see
this for a much older gcc, 4.7. For the decc0rated port, the
compare *is* a test against 0, eventually eliminated. To wit,
for cc0 (mind the delay-slot):
  _g:
          subq 4,$sp
          add.w $r11,$r10
          cmp.w -1,$r10
          ble .L9
          move $srp,[$sp]
          jsr _foo
  .L9:
          jump [$sp+]
The compare instruction is expected to be eliminated, i.e. the
following diff to the above is desired, modulo the missing
sibling call, which corresponds to what I get from 4.7 and for
the decc0rated port:
!--- a Wed Feb 5 15:22:27 2020
!+++ b Wed Feb 5 15:22:51 2020
!@@ -1,8 +1,7 @@
! _g:
! subq 4,$sp
! add.w $r11,$r10
!- cmp.w -1,$r10
!- ble .L9
!+ bmi .L9
! move $srp,[$sp]
!
! jsr _foo
Tracking this difference, I see that for both cc0-CRIS and the
decc0rated CRIS, the comparison actually starts out as a compare
against -1 at "expand" time, but is transformed for decc0rated
CRIS to a zero-compare in "cse1".
For CRIS-cc0 "cse1" does try to replace the compare with a
zero-compare, but fails because at the same time it tries to
replace the c operand with (a + b). Or some such; it fails and
no other pass succeeds. I was not into fixing cc0-handling in
core gcc, so I didn't look closer.
BTW, at first, I was a bit surprised to see that for compares
against a constant, a zero-compare is not canonical RTX for
*all* conditions, and that instead only a subset of all RTX
conditions against a constant are canonical, transforming one
condition to the canonical one by adding 1 or -1 to the
constant. It does make sense at a closer look, but still not
so much when emitting RTL.
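To make the "adding 1 or -1 to the constant" transformation concrete, here is a hypothetical sketch of reducing a comparison against -1 or 1 to a zero-compare. It is not cris_reduce_compare and uses its own toy `enum cond` rather than GCC's rtx_code. In the `g` example above, the compare against -1 presumably corresponds to `c > -1`, which this maps back to `c >= 0`.

```c
#include <assert.h>

/* Hypothetical condition codes, loosely modeled on RTL's.  */
enum cond { EQ, NE, LT, LE, GT, GE, LTU, LEU, GTU, GEU };

/* Rewrite "x CODE k" into an equivalent compare against zero when K is
   -1 or 1 and the off-by-one adjustment is exact.  Returns 1 and
   updates *CODE and *K on success, 0 if no rewrite applies.  */
static int
reduce_to_zero_compare (enum cond *code, long *k)
{
  assert (code && k);
  enum cond c = *code;
  if (*k == -1 && c == GT)
    c = GE;                     /* x > -1   <=>  x >= 0 */
  else if (*k == -1 && c == LE)
    c = LT;                     /* x <= -1  <=>  x < 0 */
  else if (*k == 1 && c == GE)
    c = GT;                     /* x >= 1   <=>  x > 0 */
  else if (*k == 1 && c == LT)
    c = LE;                     /* x < 1    <=>  x <= 0 */
  else if (*k == 1 && c == GEU)
    c = NE;                     /* x >=u 1  <=>  x != 0 */
  else if (*k == 1 && c == LTU)
    c = EQ;                     /* x <u 1   <=>  x == 0 */
  else
    return 0;
  *code = c;
  *k = 0;
  return 1;
}

/* Evaluate "x CODE k"; unsigned codes compare as unsigned long.  */
static int
eval_cond (enum cond c, long x, long k)
{
  unsigned long ux = (unsigned long) x, uk = (unsigned long) k;
  switch (c)
    {
    case EQ:  return x == k;
    case NE:  return x != k;
    case LT:  return x < k;
    case LE:  return x <= k;
    case GT:  return x > k;
    case GE:  return x >= k;
    case LTU: return ux < uk;
    case LEU: return ux <= uk;
    case GTU: return ux > uk;
    case GEU: return ux >= uk;
    }
  return 0;
}
```

The `eval_cond` helper exists only so the rewrite can be checked exhaustively on small values.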
There are several places that mention in comments that emitting
RTX as zero-compare is preferable, but nothing is done about it.
Some generic code instead seems to assume that the *target* is
helped by seeing canonical RTX, or perhaps its authors were, like
me, confused about what a canonical comparison is. For example,
prepare_cmp_insn calls canonicalize_comparison last before
emitting the actual instructions. I see that most ports, for various
port-specific reasons, do their own massaging in their cbranch
and cstore expanders. Still, the suboptimal compares *should*
be fixed at expand time; better start out right than just
relying on later optimizations.
This kind of change is not acceptable in the current gcc
development stage, at least as a change in generic code.
However, it's problematic enough that I chose to fix this right
now in the CRIS port. For that, I claim a possibly
long-standing regression. After this, code before and after
decc0ration is similar enough that I can spot
compare-elimination-efforts and apply regression test-cases
without them drowning in cc0-specific xfailing.
I hope to eventually lift out cris_reduce_compare (renamed) into
say expmed.c, called in e.g. emit_store_flag_1 (replacing the
in-line code) and prepare_cmp_insn. Later.
The Linux CET kernel places a restore token on the shadow stack for a
signal handler to enhance security. The restore token is 8 bytes and
aligned to 8 bytes. It is usually transparent to user programs, since the
kernel pops the restore token when the signal handler returns. But when an
exception is thrown from a signal handler, we now need to pop the
restore token from the shadow stack. For x86-64, we just need to treat
the signal frame as a normal frame. For i386, we need to search for
the restore token to check whether the original shadow stack was 8-byte
aligned. If it was, we just need to pop 2 slots (one restore token) from
the shadow stack. Otherwise, we need to pop 3 slots (one restore token
plus 4 bytes of padding) from the shadow stack.
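The i386 arithmetic can be summarized in a small hypothetical helper, assuming 4-byte shadow-stack slots and that the alignment of the original shadow stack pointer has already been determined; the name and signature are illustrative only, not the libgcc code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of 4-byte i386 shadow-stack slots to pop
   for a signal frame's restore token.  The token is 8 bytes (2 slots);
   if the original shadow stack pointer was not 8-byte aligned, 4 bytes
   of padding add one more slot.  */
static unsigned
restore_token_slots (uintptr_t original_ssp)
{
  assert (original_ssp % 4 == 0);   /* slots are 4 bytes on i386 */
  return (original_ssp % 8 == 0) ? 2 : 3;
}
```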
This patch also includes 2 tests: one with a restore token with 4-byte
padding and one without.
Tested on Linux/x86-64 CET machine with and without -m32.
libgcc/
PR libgcc/85334
* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
New.
gcc/testsuite/
PR libgcc/85334
* g++.target/i386/pr85334-1.C: New test.
* g++.target/i386/pr85334-2.C: Likewise.
The peephole that detects a mov of one register to another followed by
a comparison of the original register against zero is only used in Arm
state; but the instruction that matches this is generic to all 32-bit
compilation states. That instruction lacks support for SP which is
permitted in Arm state, but has restrictions in Thumb2 code.
This patch fixes the problem by allowing SP for all registers when in Arm
state; in Thumb state it allows SP only as a source, and only when the
register really is copied to another target.
* config/arm/arm.md (movsi_compare0): Allow SP as a source register
in Thumb state and also as a destination in Arm state. Add T16
variants.
The last argument to strncasecmp is incorrect, so it matched even when
can%' wasn't followed by t. Also, the !ISALPHA (format_chars[1]) test
looks pointless: format_chars[1] must be ' if strncasecmp succeeded, and
so it will never be ISALPHA.
2020-02-10 Jakub Jelinek <jakub@redhat.com>
PR other/93641
* c-format.c (check_plain): Fix up last argument of strncasecmp.
Remove useless extra test.
* gcc.dg/format/gcc_diag-11.c (test_cdiag_bad_words): Add two further
tests.
Commit r10-6500-g811a475ea3fcc55ee4aea7c81171891ef19dfc25 broke the
GCC build for arm-none-uclinuxfdpiceabi, as it forgot to update some
uses of gnu_Unwind_Find_got.
2020-02-10 Christophe Lyon <christophe.lyon@linaro.org>
libgcc/
PR target/93615
* unwind-arm-common.inc: Replace uses of gnu_Unwind_Find_got with
_Unwind_gnu_Find_got.
* unwind-pe.h: Likewise.
I'm not aware of symbols starting with _ZG that don't start with the _ZGR
prefix, but perhaps in the future there might be some.
2020-02-10 Jakub Jelinek <jakub@redhat.com>
PR other/93641
* error.c (dump_decl_name): Fix up last argument to strncmp.
Clearly I can't count: we would consider even sections like .lbssfoo or
.gnu.linkonce.lbbar as SECTION_BSS, even though the linker only treats
.lbss, .lbss.baz or .gnu.linkonce.lb.qux as special.
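A hypothetical matcher for just the names listed above, to show the boundary that was missing: after a prefix such as ".lbss", the next character must end the name or be a dot. The helper and macro names are illustrative, not the i386.c code.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if NAME starts with the string literal PREFIX; the length is
   taken from sizeof so it cannot be miscounted.  */
#define STARTS_WITH(name, prefix) \
  (strncmp ((name), (prefix), sizeof (prefix) - 1) == 0)

/* Accept ".lbss" itself, ".lbss.<suffix>" and
   ".gnu.linkonce.lb.<suffix>", but not ".lbssfoo" or
   ".gnu.linkonce.lbbar".  */
static int
is_large_bss_name (const char *name)
{
  return strcmp (name, ".lbss") == 0
         || STARTS_WITH (name, ".lbss.")
         || STARTS_WITH (name, ".gnu.linkonce.lb.");
}
```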
2020-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/58218
PR other/93641
* config/i386/i386.c (x86_64_elf_section_type_flags): Fix up last
arguments of strncmp.
We were already rejecting initialization of a flexible array member in a
constructor; we similarly shouldn't try to clean it up.
PR c++/93618
* tree.c (array_of_unknown_bound_p): New.
* init.c (perform_member_init): Do nothing for flexible arrays.
Add xfails for nvptx offloading because
"no GOMP_OFFLOAD_async_run implemented in plugin-nvptx.c"
(https://gcc.gnu.org/PR81688) and because
"omp target link not implemented for nvptx"
(https://gcc.gnu.org/PR81689).
libgomp/
* testsuite/libgomp.c/target-33.c: Add xfail for execution on
offload_target_nvptx, cf. https://gcc.gnu.org/PR81688.
* testsuite/libgomp.c/target-34.c: Likewise.
* testsuite/libgomp.c/target-link-1.c: Add xfail for
offload_target_nvptx, cf. https://gcc.gnu.org/PR81689.
Besides a simple pass-through (aggregate) jump function, an arithmetic
(aggregate) jump function can also carry the same (aggregate) value as the
parameter passed in to a self-feeding recursive call. For example,
void f1 (int i) /* normal jump function */
{
  f1 (i & 1);
}
Suppose i is 0: recursive propagation via (i & 1) also yields 0, which
can be seen as a simple pass-through of i.
void f2 (int *p) /* aggregate jump function */
{
  int t = *p & 1;
  f2 (&t);
}
Likewise, if *p is 0, (*p & 1) is also 0, and &t is an aggregate simple
pass-through of p.
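One way to read the examples: a value v can be propagated into the self-recursive node through an arithmetic jump function when that function maps v back to itself. The sketch models the jump function as a plain C function; the names are hypothetical, and this is a simplification, not ipa-cp's lattice code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the arithmetic jump function in f1 (i & 1).  */
static int
jump_and_1 (int i)
{
  return i & 1;
}

/* V is self-consistent for a self-recursive edge when the value that
   edge passes on is V again, i.e. V is a fixed point of JUMP; for such
   a V the edge behaves like a simple pass-through.  */
static bool
self_consistent_value_p (int (*jump) (int), int v)
{
  assert (jump);
  return jump (v) == v;
}
```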
2020-02-10 Feng Xue <fxue@os.amperecomputing.com>
PR ipa/93203
* ipa-cp.c (ipcp_lattice::add_value): Add source with same call edge
but different source value.
(adjust_callers_for_value_intersection): New function.
(gather_edges_for_value): Adjust order of callers to let a
non-self-recursive caller be the first element.
(self_recursive_pass_through_p): Add a new parameter "simple", and
check generalized self-recursive pass-through jump function.
(self_recursive_agg_pass_through_p): Likewise.
(find_more_scalar_values_for_callers_subset): Compute value from
pass-through jump function for self-recursive.
(intersect_with_plats): Cleanup previous implementation code for value
intersection with self-recursive call edge.
(intersect_with_agg_replacements): Likewise.
(intersect_aggregates_with_edge): Deduce value from pass-through jump
function for self-recursive call edge. Cleanup previous implementation
code for value intersection with self-recursive call edge.
(decide_whether_version_node): Remove dead callers and adjust order
to let a non-self-recursive caller be the first element.
PR ipa/93203
* g++.dg/ipa/pr93203.C: New test.
The names of split_before_sched2 ("split4") and split_before_regstack
("split3") do not reflect their insertion point in the sequence of passes,
where split_before_regstack follows split_before_sched2. Reorder the code
and rename the passes to reflect the reality.
split_before_regstack pass does not need to run if split_before_sched2 pass
was already performed. Introduce enable_split_before_sched2 function to
simplify gating functions of these two passes.
There is no need for a separate rest_of_handle_split_before_sched2.
split_all_insns can be called unconditionally from
pass_split_before_sched2::execute, since the corresponding gating function
determines if the pass is executed or not.
* recog.c: Move pass_split_before_sched2 code in front of
pass_split_before_regstack.
(pass_data_split_before_sched2): Rename pass to split3 from split4.
(pass_data_split_before_regstack): Rename pass to split4 from split3.
(rest_of_handle_split_before_sched2): Remove.
(pass_split_before_sched2::execute): Unconditionally call
split_all_insns.
(enable_split_before_sched2): New function.
(pass_split_before_sched2::gate): Use enable_split_before_sched2.
(pass_split_before_regstack::gate): Ditto.
* config/nds32/nds32.c (nds32_split_double_word_load_store_p):
Update name check for renamed split4 pass.
* config/sh/sh.c (register_sh_passes): Update pass insertion
point for renamed split4 pass.
The helpers that implement BUILTIN-PTR-CMP do not currently check if the
arguments are actually comparable, so the concept is true when it
shouldn't be.
Since we're trying to test for an unambiguous conversion to pointers, we
can also require that it returns bool, because the built-in comparisons
for pointers do return bool.
* include/bits/range_cmp.h (__detail::__eq_builtin_ptr_cmp): Require
equality comparison to be valid and return bool.
(__detail::__less_builtin_ptr_cmp): Likewise for less-than comparison.
* testsuite/20_util/function_objects/range.cmp/equal_to.cc: Check
type with ambiguous conversion to fundamental types.
* testsuite/20_util/function_objects/range.cmp/less.cc: Likewise.
The first (valid) testcase ICEs because for
A *a = new B ();
a->foo (); // virtual method call
we actually see &heap, and the "heap " objects don't have the class (or
whatever other type was used in the new expression) as their type, but an
array type containing one (or more of those for array new), and so when
using TYPE_BINFO (objtype) on it we ICE.
This patch handles this special case, and otherwise punts (as shown e.g. in
the second testcase, where the heap object has already been deleted, so
we don't really want to allow it to be used).
2020-02-09 Jakub Jelinek <jakub@redhat.com>
PR c++/93633
* constexpr.c (cxx_eval_constant_expression): If obj is heap var with
ARRAY_TYPE, use the element type. Punt if objtype after that is not
a class type.
* g++.dg/cpp2a/constexpr-new11.C: New test.
* g++.dg/cpp2a/constexpr-new12.C: New test.
* g++.dg/cpp2a/constexpr-new13.C: New test.
DECL_IN_CONSTANT_POOL variables are shared and thus don't really get emitted in the
BLOCK where they are used, so for OpenMP target regions that have initializers
gimplified into copying from them we actually map them at runtime from host to
offload devices. This patch instead marks them as "omp declare target", so
that they are on the target device from the beginning and don't need to be
copied there.
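A hedged example of the kind of code affected: a target region whose local array has a constant initializer. Depending on size and target, GCC may gimplify such an initializer into a copy from a DECL_IN_CONSTANT_POOL object; the function name is hypothetical. Built without -fopenmp, or with no offload device available, the region simply runs on the host.

```c
#include <assert.h>

/* Sum a locally initialized table inside an OpenMP target region.  The
   initializer of TABLE is the kind of construct that may be lowered to
   a copy from a constant-pool object.  */
static int
sum_in_target (void)
{
  int sum = 0;
#pragma omp target map(tofrom: sum)
  {
    const int table[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    for (int i = 0; i < 8; i++)
      sum += table[i];
  }
  return sum;
}
```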
2020-02-09 Jakub Jelinek <jakub@redhat.com>
* gimplify.c (gimplify_adjust_omp_clauses_1): Promote
DECL_IN_CONSTANT_POOL variables into "omp declare target" to avoid
copying them around between host and target.
* testsuite/libgomp.c/target-38.c: New test.
The problem here is that the vector mode version of movmisalign<mode>
was conditionalized only on SIMD being enabled, instead of also being
conditionalized on STRICT_ALIGNMENT.
Applied as pre-approved in the bug report by Richard Sandiford
after a bootstrap/test on aarch64-linux-gnu.
Thanks,
Andrew Pinski
ChangeLog:
PR target/91927
* config/aarch64/aarch64-simd.md (movmisalign<mode>): Check
STRICT_ALIGNMENT also.
testsuite/ChangeLog:
PR target/91927
* gcc.target/aarch64/pr91927.c: New testcase.
The fix for PR target/92923 exposed some test cases with fragile
scan-assembler-times counting. Split the test cases into smaller
functions, which reduces the chance of optimizations causing slight
changes in instruction counts.
gcc/testsuite/
PR target/93136
* gcc.dg/vmx/ops.c: Add -flax-vector-conversions to dg-options.
* gcc.target/powerpc/vsx-vector-6.h: Split tests into smaller functions.
* gcc.target/powerpc/vsx-vector-6.p7.c: Adjust scan-assembler-times
regex directives. Adjust expected instruction counts.
* gcc.target/powerpc/vsx-vector-6.p8.c: Likewise.
* gcc.target/powerpc/vsx-vector-6.p9.c: Likewise.
Avoid paradoxical subregs when saving caller-saved registers. This reduces stack frame size
due to smaller loads and stores, and more frequent rematerialization.
PR target/93532
* config/riscv/riscv.h (HARD_REGNO_CALLER_SAVE_MODE): Define.