This makes sure to put special-ops expanded rhs left where
expression rewrite expects it.
2020-08-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/96579
* tree-ssa-reassoc.c (linearize_expr_tree): If we expand
rhs via special ops make sure to swap operands.
* gcc.dg/pr96579.c: New testcase.
This improves DSEs stmt walking by not considering a DEF without
uses for further processing (and thus giving up when there's two
paths to follow).
2020-08-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/96565
* tree-ssa-dse.c (dse_classify_store): Remove defs with
no uses from further processing.
* gcc.dg/tree-ssa/ssa-dse-40.c: New testcase.
* gcc.dg/builtin-object-size-4.c: Adjust.
Almost all of the proposed resolution for LWG 3448 is already
implemented; the only part left is to adjust the return type of
transform_view::sentinel::operator-.
libstdc++-v3/ChangeLog:
PR libstdc++/95322
* include/std/ranges (transform_view::sentinel::__distance_from):
Give this a deduced return type.
(transform_view::sentinel::operator-): Adjust the return type so
that it's based on the constness of the iterator rather than
that of the sentinel.
* testsuite/std/ranges/adaptors/95322.cc: Refer to LWG 3488.
This implements the proposed resolution for LWG 3406, and adds a
testcase for the example from P1994R1.
libstdc++-v3/ChangeLog:
* include/std/ranges (elements_view::begin): Adjust constraints.
(elements_view::end): Likewise.
(elements_view::_Sentinel::operator==): Templatize to take both
_Iterator<true> and _Iterator<false>.
(elements_view::_Sentinel::operator-): Likewise.
* testsuite/std/ranges/adaptors/elements.cc: Add testcase for
the example from P1994R1.
* testsuite/std/ranges/adaptors/lwg3406.cc: New test.
The example from the paper doesn't compile without the proposed
resolution for LWG 3406, so we'll add a testcase for this once the
proposed resolution is implemented.
libstdc++-v3/ChangeLog:
* include/std/ranges (elements_view::end): Replace these two
overloads with four new overloads.
(elements_view::_Iterator::operator==): Remove.
(elements_view::_Iterator::operator-): Likewise.
(elements_view::_Sentinel): Define.
A number of i386 math optimisation tests are looking assembly instructions
that are only emitted when the compiler knows the target has a C99 libm
available. Since targets like *-elf may not have such a libm, a C99 runtime
requirement is added to these tests.
gcc/testsuite/ChangeLog
* gcc.target/i386/387-7.c: Add dg-require-effective-target c99_runtime.
* gcc.target/i386/387-9.c: Likewise.
* g++.target/i386/avx512bw-pr96246-1.C: Likewise.
* gcc.target/i386/avx512f-rint-sfix-vec-2.c: Likewise.
* gcc.target/i386/avx512f-rintf-sfix-vec-2.c: Likewise.
* g++.target/i386/avx512vl-pr96246-1.C: Likewise.
* gcc.target/i386/pr61403.c: Likewise.
* gcc.target/i386/sse4_1-ceil-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floorf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-sfix-vec.c: Likewise.
The wording of the description of -fprofile-exclude-files is easy to
misunderstand. One can be led to believe a file is excluded only if
it matches all of the patterns, not just one. This patch tries to
clarify the function. It also adjusts the wording of
-fprofile-filter-files accordingly.
gcc/
PR gcov-profile/96285
* common.opt, doc/invoke.texi: Clarify wording of
-fprofile-exclude-files and adjust -fprofile-filter-files to
match.
The implementation of define_expand and define_insn patterns to handle
shifts in the MSP430 backend is inconsistent, resulting in missed
opportunities to make best use of the architecture's features.
There's now a single define_expand used as the entry point for all valid
shifts, and the decision to either use a helper function to perform the
shift (often required for the 430 ISA), or fall through to the
define_insn patterns can be made from that expander function.
Shifts by a constant amount have been grouped into one define_insn for
each type of shift, instead of having different define_insn patterns for
shifts by different amounts.
A new target option "-mmax-inline-shift=" has been added to allow tuning
of the number of shift instructions to emit inline, instead of using
a library helper function.
gcc/ChangeLog:
* config/msp430/constraints.md (K): Change unused constraint to
constraint to a const_int between 1 and 19.
(P): New constraint.
* config/msp430/msp430-protos.h (msp430x_logical_shift_right): Remove.
(msp430_expand_shift): New.
(msp430_output_asm_shift_insns): New.
* config/msp430/msp430.c (msp430_rtx_costs): Remove shift costs.
(CSH): Remove.
(msp430_expand_helper): Remove hard-coded generation of some inline
shift insns.
(use_helper_for_const_shift): New.
(msp430_expand_shift): New.
(msp430_output_asm_shift_insns): New.
(msp430_print_operand): Add new 'W' operand selector.
(msp430x_logical_shift_right): Remove.
* config/msp430/msp430.md (HPSI): New define_mode_iterator.
(HDI): Likewise.
(any_shift): New define_code_iterator.
(shift_insn): New define_code_attr.
Adjust unnamed insn patterns searched for by combine.
(ashlhi3): Remove.
(slli_1): Remove.
(430x_shift_left): Remove.
(slll_1): Remove.
(slll_2): Remove.
(ashlsi3): Remove.
(ashldi3): Remove.
(ashrhi3): Remove.
(srai_1): Remove.
(430x_arithmetic_shift_right): Remove.
(srap_1): Remove.
(srap_2): Remove.
(sral_1): Remove.
(sral_2): Remove.
(ashrsi3): Remove.
(ashrdi3): Remove.
(lshrhi3): Remove.
(srli_1): Remove.
(430x_logical_shift_right): Remove.
(srlp_1): Remove.
(srll_1): Remove.
(srll_2x): Remove.
(lshrsi3): Remove.
(lshrdi3): Remove.
(<shift_insn><mode>3): New define_expand.
(<shift_insn>hi3_430): New define_insn.
(<shift_insn>si3_const): Likewise.
(ashl<mode>3_430x): Likewise.
(ashr<mode>3_430x): Likewise.
(lshr<mode>3_430x): Likewise.
(*bitbranch<mode>4_z): Replace renamed predicate msp430_bitpos with
const_0_to_15_operand.
* config/msp430/msp430.opt: New option -mmax-inline-shift=.
* config/msp430/predicates.md (const_1_to_8_operand): New predicate.
(const_0_to_15_operand): Rename msp430_bitpos predicate.
(const_1_to_19_operand): New predicate.
* doc/invoke.texi: Document -mmax-inline-shift=.
libgcc/ChangeLog:
* config/msp430/slli.S (__gnu_mspabi_sllp): New.
* config/msp430/srai.S (__gnu_mspabi_srap): New.
* config/msp430/srli.S (__gnu_mspabi_srlp): New.
gcc/testsuite/ChangeLog:
* gcc.target/msp430/emulate-srli.c: Fix expected assembler text.
* gcc.target/msp430/max-inline-shift-430-no-opt.c: New test.
* gcc.target/msp430/max-inline-shift-430.c: New test.
* gcc.target/msp430/max-inline-shift-430x.c: New test.
The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.
libstdc++-v3/ChangeLog:
PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl<U...>&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.
A recent change altered the layout of EBO-helper base classes, resulting
in an ambiguity when the hash function and equality predicate are the
same type.
This modifies the type of one of the base classes, so that we don't get
two base classes of the same type.
libstdc++-v3/ChangeLog:
* include/bits/hashtable_policy.h (_Hash_code_base): Change
index of _Hashtable_ebo_helper base class.
* testsuite/23_containers/unordered_map/dup_types.cc: New test.
This removes all uses of VR_ANTI_RANGE.
gcc/ChangeLog:
* tree-ssa-dom.c (simplify_stmt_for_jump_threading): Abstract code out to...
* tree-vrp.c (find_case_label_range): ...here. Rewrite for to use irange
API.
(simplify_stmt_for_jump_threading): Call find_case_label_range instead of
duplicating the code in simplify_stmt_for_jump_threading.
* tree-vrp.h (find_case_label_range): New prototype.
This fixes vectorized PHI latch edge updating and delay it until
all of the loop is code generated to deal with the case that the
latch def is a PHI in the same block.
2020-08-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): New.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Only record
stmts to update PHI latches from, perform the update ...
* tree-vect-loop.c (vect_transform_loop): ... here after
vectorizing those PHIs.
(info_for_reduction): Properly handle non-reduction PHIs.
* gcc.dg/vect/pr96698.c: New testcase.
Since GCC 6.1 there is no reason we can't just use __glibcxx_assert in
constexpr functions in string_view. As long as the condition is true,
there will be no call to std::__replacement_assert that would make the
function ineligible for constant evaluation.
PR libstdc++/71960
* include/experimental/string_view (basic_string_view):
Enable debug assertions.
* include/std/string_view (basic_string_view):
Likewise.
The corresponding commit had the Co-authored-by: lines in the middle of
the commit message instead of at the end, so the ChangeLog script didn't
consider them.
This appropriately uses VMAT_ELEMENTWISE when the vectors cannot be
filled from a single SLP group until we get more explicit support
for negative stride SLP.
2020-08-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/96783
* tree-vect-stmts.c (get_group_load_store_type): Use
VMAT_ELEMENTWISE for negative strides when we cannot
use VMAT_STRIDED_SLP.
* gcc.dg/vect/pr96783-1.c: New testcase.
* gcc.dg/vect/pr96783-2.c: Likewise.
Jason's fix to retain operator lookups inside dependent lambda
functions turns out to be needed on the modules branch for all
template functions. Because the context for that lookup no longer
exists in imports. There were also a couple of shortcomings, which
this patch fixes.
(a) we conflate 'we found nothing' and 'we can redo this at
instantiation time'. Fixed by making the former produce
error_mark_node. That needs a fix in name-lookup to know that finding
a binding containing error_mark_node, means 'stop looking' you found
nothing.
(b) we'd continually do lookups for every operator, if nothing needed
to be retained. Fixed by always caching that information, and then
dealing with it when pushing the bindings.
(c) if what we found was that find by a global namespace lookup, we'd
not cache that. But that'd cause us to either find decls declared
after the template, potentially hiding those we expected to find. So
don't do that check.
This still retains only recording on lambdas. As the comment says, we
could enable for all templates.
gcc/cp/
* decl.c (poplevel): A local-binding tree list holds the name in
TREE_PURPOSE.
* name-lookup.c (update_local_overload): Add id to TREE_PURPOSE.
(lookup_name_1): Deal with local-binding error_mark_node marker.
(op_unqualified_lookup): Return error_mark_node for 'nothing
found'. Retain global binding, check class binding here.
(maybe_save_operator_binding): Reimplement to always cache a
result.
(push_operator_bindings): Deal with 'ignore' marker.
gcc/testsuite/
* g++.dg/lookup/operator-1.C: New.
* g++.dg/lookup/operator-2.C: New.
There are three failures in gcc.target/aarch64/insv_1.c.
FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, x[0-9]+, 0, 8
FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, x[0-9]+, 16, 5
FAIL: gcc.target/aarch64/insv_1.c scan-assembler movk\tx[0-9]+, 0x1d6b, lsl 32
This patch fix the third failure which was missed "#" before immediate value
in scan-assembler.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/insv_1.c: Add '#' in scan-assembler
The addition of NOTE_INSN_BEGIN_STMT and NOTE_INSN_INLINE_ENTRY notes
reintroduced quadratic behavior into dwarf2out_var_location.
This function needs to know the next real instruction to which the var
location note applies, but the way final_scan_insn is called outside of
final.c main loop doesn't make it easy to look up the next real insn in
there (and for non-dwarf it is even useless). Usually next real insn is
only a few notes away, but we can have hundreds of thousands of consecutive
notes only followed by a real insn. dwarf2out_var_location to avoid the
quadratic behavior contains a cache, it remembers the next note and when it
is called again on that loc_note, it can use the previously computed
dwarf2out_next_real_insn result, rather than walking the insn chain once
again. But, for NOTE_INSN_{BEGIN_STMT,INLINE_ENTRY} dwarf2out_var_location
is not called while the code puts into the cache those notes, which means if
we have e.g. in the worst case NOTE_INSN_VAR_LOCATION and
NOTE_INSN_BEGIN_STMT notes alternating, the cache is not really used.
The following patch fixes it by looking up the next NOTE_INSN_VAR_LOCATION
if any. While the lookup could be perhaps done together with looking for
the next real insn once (e.g. in dwarf2out_next_real_insn or its copy),
there are other dwarf2out_next_real_insn callers which don't need/want that
behavior and if there are more than two NOTE_INSN_VAR_LOCATION notes
followed by the same real insn, we need to do that "find next
NOTE_INSN_VAR_LOCATION" walk anyway.
On the testcase from the PR this patch speeds it 2.8times, from 0m0.674s
to 0m0.236s (why it takes for the reporter more than 60s is unknown).
2020-08-26 Jakub Jelinek <jakub@redhat.com>
PR debug/96729
* dwarf2out.c (dwarf2out_next_real_insn): Adjust function comment.
(dwarf2out_var_location): Look for next_note only if next_real is
non-NULL, in that case look for the first non-deleted
NOTE_INSN_VAR_LOCATION between loc_note and next_real, if any.
The storage class `in' is now a first-class citizen with its own mangle
symbol, of which also permits `in ref'. Previously, `in' was an alias
to `const [scope]', which is a type constructor.
The mangle symbol repurposed for this is `I', which was originally used
by identifier types. However, while TypeIdentifier is part of the
grammar, it must be resolved to some other entity during the semantic
passes, and so shouldn't appear anywhere in the mangled name.
Old tests that are now no longer valid have been removed.
libiberty/ChangeLog:
* d-demangle.c (dlang_function_args): Handle 'in' and 'in ref'
parameter storage classes.
(dlang_type): Remove identifier type.
* testsuite/d-demangle-expected: Update tests.
1. Removes prelude assert for constructors and destructors. To trigger
these asserts one needed to construct or destruct an aggregate at the
null memory location. This would crash upon any data member access,
which is required for a constructor or destructor to do anything useful.
2. Disables bounds checking in foreach statements, when the array is
either a static or dynamic array. If we trust the array `.length' to
be correct, then all elements are between `[0 .. length]', and can't
can't be out of bounds.
Reviewed-on: https://github.com/dlang/dmd/pull/11623
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd e49192807
Backports a change from upstream dmd that moves front-end NRVO checking
from ReturnStatement semantic to the end of FuncDeclaration semantic.
In the codegen, retStyle has been partially implemented so that only
structs and static arrays return RETstack. This isn't accurate, but
don't need to be for the purposes of semantic analysis.
If a function either has TREE_ADDRESSABLE or must return in memory, then
DECL_RESULT is set as the shidden field for the function. This is used
in the codegen pass for ReturnStatement where it is now detected whether
a function is returning a struct literal or a constructor function, then
the DECL_RESULT is used to directly construct the return value, instead
of doing so via temporaries.
Reviewed-on: https://github.com/dlang/dmd/pull/11622
gcc/d/ChangeLog:
PR d/96156
* d-frontend.cc (retStyle): Only return RETstack for struct and static
array types.
* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Use NRVO return
for all TREE_ADDRESSABLE types. Set shidden to the RESULT_DECL.
* expr.cc (ExprVisitor::visit (CallExp *)): Force TARGET_EXPR if the
'this' pointer reference is a CONSTRUCTOR.
(ExprVisitor::visit (StructLiteralExp *)): Generate assignment to the
symbol to initialize with literal.
* toir.cc (IRVisitor::visit (ReturnStatement *)): Detect returning
struct literals and write directly into the RESULT_DECL.
* dmd/MERGE: Merge upstream dmd fe5f388d8.
gcc/testsuite/ChangeLog:
PR d/96156
* gdc.dg/pr96156.d: New test.
The target files tilegx/mul-tables.c and tilepri/mul-tables.c were
updated in SVN r255743, but the generator file that produces them
wasn't, so it was reverting this change during builds.
gcc/ChangeLog:
* config/tilepro/gen-mul-tables.cc (main): Define IN_TARGET_CODE to 1
in the target file.
The tile*-*-* targets were marked as obsolete in SVN r259724.
contrib/ChangeLog:
* config-list.mk (LIST): Add OPT-enable-obsolete to tilegx-linux-gnu,
tilegxbe-linux-gnu, and tilepro-linux-gnu.
Fixes both a bug where compilation would hang, and an issue where recursive
template limits are hit too early.
Reviewed-on: https://github.com/dlang/dmd/pull/11621
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd cb4a96fae
This would be an improvement over reading one character at a time.
An ICE was discovered when mixing reading from stdin with `-v', this has been
fixed in upstream DMD and backported as well.
Reviewed-on: https://github.com/dlang/dmd/pull/11620
gcc/d/ChangeLog:
* d-lang.cc (d_parse_file): Use read() to load contents from stdin,
allow the front-end to free the memory after parsing.
* dmd/MERGE: Merge upstream dmd 2cc25c219.
Same issue as the initial commit that addressed PR96153, only this time to fix
it also for structs that are not returned in memory. Tests have been added
that triggered an assertion on x86_64, however the original test was failing
on SPARC 64-bit targets.
gcc/d/ChangeLog:
PR d/96153
* d-codegen.cc (build_address): Create a temporary for CALL_EXPRs
returning trivial aggregates, pre-filling it with zeroes.
(build_memset_call): Use build_zero_cst if setting the entire object.
gcc/testsuite/ChangeLog:
PR d/96153
* gdc.dg/pr96153.d: Add new tests.
TREE_ADDRESSABLE was not propagated from the RECORD_TYPE to the ARRAY_TYPE, so
NRVO code generation was not being triggered.
gcc/d/ChangeLog:
PR d/96157
* d-codegen.cc (d_build_call): Handle TREE_ADDRESSABLE static arrays.
* types.cc (make_array_type): Propagate TREE_ADDRESSABLE from base
type to static array.
gcc/testsuite/ChangeLog:
PR d/96157
* gdc.dg/pr96157a.d: New test.
* gdc.dg/pr96157b.d: New test.
Fail compilation tests only check for language errors from the front-end, all
default option switches do nothing to alter the error.
gcc/testsuite/ChangeLog:
* lib/gdc-utils.exp (gdc-convert-test): Clear PERMUTE_ARGS for
fail_compilation tests if not set by test file.
gcc/d/ChangeLog:
* d-gimplify.cc (d_gimplify_expr): Move lowering of each tree node to
separate functions.
(d_gimplify_modify_expr): New function.
(d_gimplify_addr_expr): New function.
(d_gimplify_call_expr): New function.
(d_gimplify_unsigned_rshift_expr): New function.
Rewriting testcase with cpp source file, then compare operator could
be used directly for vector, this would avoid impact of vectorizer.
gcc/testsuite/ChangeLog:
PR target/96667
* gcc.target/i386/avx512bw-pr96246-1.c: Moved to...
* g++.target/i386/avx512bw-pr96246-1.C: ...here.
* gcc.target/i386/avx512bw-pr96246-2.c: Moved to...
* g++.target/i386/avx512bw-pr96246-2.C: ...here.
* gcc.target/i386/avx512vl-pr96246-1.c: Moved to...
* g++.target/i386/avx512vl-pr96246-1.C: ...here.
* gcc.target/i386/avx512vl-pr96246-2.c: Moved to...
* g++.target/i386/avx512vl-pr96246-2.C: ...here.
PR analyzer/94858 reports a false diagnostic from
-Wanalyzer-malloc-leak, where the allocated pointer is pointed to by a
field of a struct, and a loop writes to a buffer, writing through an
iterating pointer value.
There were several underlying problems, relating to clobbering of the
struct holding the malloc-ed pointer; in each case the analyzer was
conservatively assuming that a write could affect this region,
clobbering it to "unknown", and this was detected as a leak.
The initial write within the loop dereferences the initial value of
a field, and the analyzer was assuming that that pointer could
point to the result of the malloc call. The patch extends
store::eval_alias_1 so that it "knows" that the initial value of a
pointer at the beginning of a path can't point to a region that was
allocated on the heap after the beginning of the path.
On fixing that, the next issue is that within the loop the iterated
pointer value becomes "unknown", and hence *ptr becomes a write to a
symbolic region, and thus might clobber the struct (which it can't).
This patch adds enough logic to svalue::can_merge_p to merge the
iterating pointer value so that at the 2nd iteration analyzing
the loop it becomes a widening_svalue from the initial svalue, so
that this becomes a fixed point of the analysis, and is not an
unknown_svalue. The patch further extends store::eval_alias_1 so that
it "knows" that this widening_svalue can only point to the same base
region as the initial value did; in particular, symbolic writes through
this pointer can only clobber that base region, not the struct holding
the malloc-ed pointer.
gcc/analyzer/ChangeLog:
PR analyzer/94858
* region-model-manager.cc
(region_model_manager::get_or_create_widening_svalue): Assert that
neither of the inputs are themselves widenings.
* store.cc (store::eval_alias_1): The initial value of a pointer
can't point to a region that was allocated on the heap after the
beginning of the path. A widened pointer value can't alias anything
that the initial pointer value can't alias.
* svalue.cc (svalue::can_merge_p): Merge BINOP (X, OP, CST) with X
to a widening svalue. Merge
BINOP(WIDENING(BASE, BINOP(BASE, X)), X) and BINOP(BASE, X) to
to the LHS of the first BINOP.
gcc/testsuite/ChangeLog:
PR analyzer/94858
* gcc.dg/analyzer/loop-start-up-to-end-by-1.c: Remove xfail.
* gcc.dg/analyzer/pr94858-1.c: New test.
* gcc.dg/analyzer/pr94858-2.c: New test.
* gcc.dg/analyzer/torture/loop-inc-ptr-2.c: Update expected number
of enodes.
* gcc.dg/analyzer/torture/loop-inc-ptr-3.c: Likewise.
gcc/analyzer/ChangeLog:
PR analyzer/96777
* region-model.h (class compound_svalue): Document that all keys
must be concrete.
(compound_svalue::compound_svalue): Move definition to svalue.cc.
* store.cc (binding_map::apply_ctor_to_region): Handle
initializers for trailing arrays with incomplete size.
* svalue.cc (compound_svalue::compound_svalue): Move definition
here from region-model.h. Add assertion that all keys are
concrete.
gcc/testsuite/ChangeLog:
PR analyzer/96777
* gcc.dg/analyzer/pr96777.c: New test.
Change CTZ_DEFINED_VALUE_AT_ZERO/CTZ_DEFINED_VALUE_AT_ZERO to return 0/2
to enable table-based clz/ctz optimization:
-- Macro: CLZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
-- Macro: CTZ_DEFINED_VALUE_AT_ZERO (MODE, VALUE)
A C expression that indicates whether the architecture defines a
value for 'clz' or 'ctz' with a zero operand. A result of '0'
indicates the value is undefined. If the value is defined for only
the RTL expression, the macro should evaluate to '1'; if the value
applies also to the corresponding optab entry (which is normally
the case if it expands directly into the corresponding RTL), then
the macro should evaluate to '2'. In the cases where the value is
defined, VALUE should be set to this value.
gcc/
PR target/95863
* config/i386/i386.h (CTZ_DEFINED_VALUE_AT_ZERO): Return 0/2.
(CLZ_DEFINED_VALUE_AT_ZERO): Likewise.
gcc/testsuite/
PR target/95863
* gcc.target/i386/pr95863-1.c: New test.
* gcc.target/i386/pr95863-2.c: Likewise.
This is my proposed fix to PR middle-end/87256 where synth_mult takes an
unreasonable amount of CPU time determining an optimal sequence of
instructions to perform multiplication by (large) integer constants on hppa.
One workaround proposed in bugzilla, is to increase the hash table used
to cache/reuse intermediate results. This helps but is a workaround for
the (hidden) underlying problem.
The real issue is that the hppa_rtx_costs function is providing wildly
inaccurate values (estimates) to the middle-end. For example, (p*q)+(r*s)
would appear to be cheaper than a single multiplication. Another
example is that "(ashiftrt:di regA regB)" is claimed to be only be
COST_N_INSNS(1) when in fact the hppa backend actually generates
slightly more than a single instruction.
It turns out that simply tightening up the logic in hppa_rtx_costs to
return more reasonable values, dramatically reduces the number of recursive
invocations in synth_mult for the test case in PR87256, and presumably
also produces faster code (that should be observable in benchmarks).
2020-08-25 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/87256
* config/pa/pa.c (hppa_rtx_costs_shadd_p): New helper function
to check for coefficients supported by shNadd and shladd,l.
(hppa_rtx_costs): Rewrite to avoid using estimates based upon
FACTOR and enable recursing deeper into RTL expressions.