This patch adds opt_for_fn for all cross-module params used by the inliner
so they can be modified at function granularity. With inlining there are
almost always three functions to consider (the callee and caller of the
inlined edge, and the outer function the caller is inlined into).
I always use the outer function's params since that is how local parameters
behave. I hope this is also what is expected in most cases: it is better
to inline aggressively into -O3 compiled code than to inline -O3 functions
aggressively into their callers.
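The choice of "which function's params apply" can be sketched with a toy
model. Everything below is hypothetical (ToyParams, ToyFn, outer_function,
limit_for_edge and the numeric limits are made up for illustration); it is
not GCC's call-graph or opt_for_fn API, only a minimal sketch of taking the
limit from the function that finally receives the inlined code.

```cpp
#include <cassert>

// Hypothetical per-function parameter set, standing in for whatever
// opt_for_fn would yield for one function.
struct ToyParams {
  int max_inline_insns_single;
};

// Hypothetical call-graph node.  inlined_to is null unless this function's
// body has already been inlined into another function.
struct ToyFn {
  ToyParams params;
  const ToyFn *inlined_to;
};

// The function whose body finally receives the code: follow the
// inlined_to links starting from the caller of the edge.
inline const ToyFn &outer_function(const ToyFn &caller) {
  const ToyFn *fn = &caller;
  while (fn->inlined_to)
    fn = fn->inlined_to;
  return *fn;
}

// Limit used when considering an edge whose caller is `caller`: it comes
// from the outer function, not from the callee or the immediate caller.
inline int limit_for_edge(const ToyFn &caller) {
  return outer_function(caller).params.max_inline_insns_single;
}

inline void toy_inline_demo() {
  const ToyFn outer{{200}, nullptr};  // illustrative "larger limit" function
  const ToyFn mid{{70}, &outer};      // already inlined into outer
  assert(limit_for_edge(mid) == 200);
  assert(limit_for_edge(outer) == 200);
}
```

In this sketch the limit of `mid` itself (70) is never consulted once `mid`
has been inlined into `outer`, which is the behaviour the paragraph above
describes.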
The new params infrastructure is nice. One drawback is that it is very hard to
search for individual param uses since they all occupy the global namespace.
In the C++ world we had the chance to do something like params.param_flag_name
or params::param_flag_name instead...
Bootstrapped/regtested x86_64-linux, committed.
* cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT): Remove.
* doc/invoke.texi (max-inline-insns-single-O2,
inline-heuristics-hint-percent-O2, inline-min-speedup-O2,
early-inlining-insns-O2): Remove documentation.
* ipa-fnsummary.c (analyze_function_body,
compute_fn_summary): Use opt_for_fn when accessing parameters.
* ipa-inline.c (caller_growth_limits, can_inline_edge_p,
inline_insns_auto, can_inline_edge_by_limits_p,
want_early_inline_function_p, big_speedup_p,
want_inline_small_function_p, want_inline_self_recursive_call_p,
recursive_inlining, compute_max_insns, inline_small_functions):
Likewise.
* opts.c (default_options): Add -O3 defaults for
OPT__param_early_inlining_insns_,
OPT__param_inline_heuristics_hint_percent_,
OPT__param_inline_min_speedup_, OPT__param_max_inline_insns_single_.
* params.opt (-param=early-inlining-insns-O2=,
-param=inline-heuristics-hint-percent-O2=,
-param=inline-min-speedup-O2=, -param=max-inline-insns-single-O2=
-param=early-inlining-insns=, -param=inline-heuristics-hint-percent=,
-param=inline-min-speedup=, -param=inline-unit-growth=,
-param=large-function-growth=, -param=large-stack-frame=,
-param=large-stack-frame-growth=, -param=large-unit-insns=,
-param=max-inline-insns-recursive=,
-param=max-inline-insns-recursive-auto=,
-param=max-inline-insns-single=,
-param=max-inline-insns-size=, -param=max-inline-insns-small=,
-param=max-inline-recursive-depth=,
-param=max-inline-recursive-depth-auto=,
-param=min-inline-recursive-probability=,
-param=partial-inlining-entry-probability=,
-param=uninlined-function-insns=, -param=uninlined-function-time=,
-param=uninlined-thunk-insns=, -param=uninlined-thunk-time=): Add
Optimization.
* g++.dg/tree-ssa/pr53844.C: Drop -O2 from param name.
* g++.dg/tree-ssa/pr61034.C: Likewise.
* g++.dg/tree-ssa/pr8781.C: Likewise.
* g++.dg/warn/Wstringop-truncation-1.C: Likewise.
* gcc.dg/ipa/pr63416.c: Likewise.
* gcc.dg/tree-ssa/ssa-thread-12.c: Likewise.
* gcc.dg/vect/pr66142.c: Likewise.
* gcc.dg/winline-3.c: Likewise.
* gcc.target/powerpc/pr72804.c: Likewise.
From-SVN: r278644
PR target/92615
* config/i386/i386.c (ix86_md_asm_adjust): If dest_mode is
GET_MODE (dest), is not QImode, using ZERO_EXTEND and dest is not
register_operand, force x into register before storing it into dest.
Formatting fix.
* gcc.target/i386/pr92615.c: New test.
From-SVN: r278642
PR rtl-optimization/92610
* cse.c (rest_of_handle_cse2): Call cleanup_cfg (0) also if
cse_cfg_altered is set, even when tem is 0.
(rest_of_handle_cse_after_global_opts): Likewise.
* g++.dg/opt/pr92610.C: New test.
From-SVN: r278640
Part of P1327R1 is to allow typeid with an operand of polymorphic type in
constexpr. I found that we pretty much support it already, the only tweak
was to allow TYPEID_EXPR (only created in a template) in constexpr in C++20.
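A minimal sketch of what this enables, assuming a compiler in C++20 mode
that implements P1327R1. The names Base, Derived, refers_to_derived and
check_dynamic_types are hypothetical and are not taken from the new tests.
The sketch compares type_info addresses rather than using operator==,
because that operator is not constexpr in C++20; it assumes the
implementation yields the same type_info object for the same type within
one translation unit.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
  virtual int tag() const { return 0; }
};
struct Derived : Base {
  int tag() const override { return 1; }
};

#if __cplusplus > 201703L
// C++20: typeid on a polymorphic operand may be constant-evaluated.
#define TYPEID_CONSTEXPR constexpr
#else
// Earlier standards: the same code is only usable at run time.
#define TYPEID_CONSTEXPR inline
#endif

// True when the dynamic type of the object behind `b` is Derived.
TYPEID_CONSTEXPR bool refers_to_derived(const Base &b) {
  return &typeid(b) == &typeid(Derived);
}

TYPEID_CONSTEXPR bool check_dynamic_types() {
  Derived d;
  Base b;
  return refers_to_derived(d) && !refers_to_derived(b);
}

#if __cplusplus > 201703L
static_assert(check_dynamic_types(),
              "typeid sees the dynamic type during constant evaluation");
#endif

inline void typeid_demo() {
  assert(check_dynamic_types());  // run-time check, valid in any mode
}
```

The preprocessor guard only keeps the sketch compilable in pre-C++20 modes;
the static_assert is the part that exercises the constexpr behaviour.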
* constexpr.c (potential_constant_expression_1): Allow a typeid
expression whose operand is of polymorphic type in constexpr in
C++20.
* rtti.c (build_typeid): Remove obsolete FIXME comment.
* g++.dg/cpp2a/constexpr-typeid1.C: New test.
* g++.dg/cpp2a/constexpr-typeid2.C: New test.
* g++.dg/cpp2a/constexpr-typeid3.C: New test.
* g++.dg/cpp2a/constexpr-typeid4.C: New test.
From-SVN: r278635
The tests amended here now have different code-gen with default
options because, previously, the accesses were indirected per the Darwin
ABI for common accesses. The revised code-gen does not match the
expected scan-asms because Darwin defaults to -fPIC. For these tests,
it seems that the best solution is to use '-mdynamic-no-pic' in the
m32 case, which makes the output similar to the ELF platform default.
gcc/testsuite/ChangeLog:
2019-11-22 Iain Sandoe <iain@sandoe.co.uk>
* gcc.target/i386/pr27971.c: Use mdynamic-no-pic for m32 on
Darwin.
* gcc.target/i386/sse2-load-multi.c: Likewise.
* gcc.target/i386/sse2-store-multi.c: Likewise.
From-SVN: r278631
* c-cppbuiltin.c (c_cpp_builtins): Bump __cpp_init_captures
and __cpp_generic_lambdas for -std=c++2a. Define
__cpp_designated_initializers, __cpp_constexpr_in_decltype and
__cpp_consteval for -std=c++2a. Remove a FIXME comment about
__cpp_concepts for -std=c++2a.
* g++.dg/cpp1z/feat-cxx1z.C: Only compile with -std=c++17.
* g++.dg/cpp2a/feat-cxx2a.C: Adjust for P1902R1 changes.
* g++.dg/cpp2a/desig15.C: New test.
* g++.dg/cpp2a/lambda-pack-init3.C: New test.
* g++.dg/cpp2a/lambda-generic6.C: New test.
* g++.dg/cpp2a/consteval15.C: New test.
From-SVN: r278628
PR tree-optimization/92618
* tree-ssa-reassoc.c (v_info): Change from auto_vec to a struct
containing the auto_vec and a tree.
(undistribute_bitref_for_vector): Handle the case when element type
of vec is not the same as type of the BIT_FIELD_REF. Formatting
fixes.
* gcc.c-torture/compile/pr92618.c: New test.
* gcc.c-torture/execute/pr92618.c: New test.
From-SVN: r278626
2019-11-22 Andrew Stubbs <ams@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_hsa_declare_function_name): Calculate
granulated_sgprs according to architecture.
From-SVN: r278617
vect-widen-mult-u8.c and vect-widen-mult-u8-u32.c were failing
on arm-linux-gnueabihf with epilogue vectorisation because we
print the expected messages twice rather than once. We could
fix that either by removing the counts or by disabling epilogue
loop vectorisation. The other vect-widen-mult-* tests do the
latter, so I did the same here.
2019-11-22 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
* gcc.dg/vect/vect-widen-mult-u8.c: Disable epilogue loop
vectorization.
* gcc.dg/vect/vect-widen-mult-u8-u32.c: Likewise.
From-SVN: r278613
gcc.dg/vect/vect-cond-reduc-3.c had been failing on
arm-linux-gnueabihf since the test was added, because the test needs
support for VEC_COND_EXPR <float cmp float, int, int> whereas the target
only supports VEC_COND_EXPRs in which all modes are the same. (I have
a fix for that, but it's not really stage 3 material.)
2019-11-22 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
* gcc.dg/vect/vect-cond-reduc-3.c: Require vect_cond_mixed
rather than vect_condition.
From-SVN: r278612
gcc.target/aarch64/sve/clastb_[57].c started failing after the increase
in the cost of vec_to_scalar (r278452). The problem is that we were
double-counting the cost of the CLASTB: once in vect_model_reduction_cost
as a vec_to_scalar and once in vectorizable_condition as a plain
vector_stmt.
Based on the TODO above vect_model_reduction_cost, I think the
preferred long-term direction is for vectorizable_* to cost these
things itself, so that's what the patch does (for this one case only).
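The double-counting can be illustrated with a toy cost model. This is only
a sketch: ToyCosts, ToyCostKind and the toy_* functions are hypothetical,
the numbers are arbitrary, and none of this is the vectorizer's real
costing code.

```cpp
#include <cassert>

// Hypothetical cost table; real target costs are not modelled.
struct ToyCosts {
  int vector_stmt;
  int vec_to_scalar;
};

enum class ToyCostKind { vector_stmt, vec_to_scalar };

// Cost of one statement.  The kind is optional and defaults to a plain
// vector statement, echoing "Take an optional vect_cost_for_stmt".
inline int toy_simple_cost(const ToyCosts &c,
                           ToyCostKind kind = ToyCostKind::vector_stmt) {
  return kind == ToyCostKind::vec_to_scalar ? c.vec_to_scalar : c.vector_stmt;
}

// Old scheme: the reduction model charges vec_to_scalar and the condition
// routine also charges vector_stmt for the same operation.
inline int toy_cost_double_counted(const ToyCosts &c) {
  int reduction_part = c.vec_to_scalar;
  int condition_part = toy_simple_cost(c);
  return reduction_part + condition_part;
}

// New scheme: the condition routine charges the operation once, as
// vec_to_scalar, and the reduction model adds nothing for it.
inline int toy_cost_counted_once(const ToyCosts &c) {
  return toy_simple_cost(c, ToyCostKind::vec_to_scalar);
}

inline void toy_cost_demo() {
  const ToyCosts c{1, 4};  // illustrative values only
  assert(toy_cost_double_counted(c) == 5);
  assert(toy_cost_counted_once(c) == 4);
}
```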
2019-11-22 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-stmts.c (vect_model_simple_cost): Take an optional
vect_cost_for_stmt.
(vectorizable_condition): Calculate the cost of EXTRACT_LAST_REDUCTION
here rather than...
* tree-vect-loop.c (vect_model_reduction_cost): ...here.
From-SVN: r278611
The patterns neg_scc_insn and not_scc_insn are not correct, causing the
pr77309 test to fail for ARC700. Remove them, and add two new bic
compare-with-zero patterns to improve the output code.
gcc/
xxxx-xx-xx Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (bic_f): Use cc_set_register predicate.
(bic_cmp0_noout): New pattern.
(bic_cmp0): Likewise.
(neg_scc_insn): Remove pattern.
(not_scc_insn): Likewise.
From-SVN: r278610
Fix ARC-specific tests by improving the matching patterns and adding
the missing functionality to arc.exp.
gcc/tests
xxxx-xx-xx Claudiu Zissulescu <claziss@synopsys.com>
* gcc.target/arc/add_n-combine.c: Match add1/2/3 instruction in
output assembly.
* gcc.target/arc/arc.exp (check_effective_target_codedensity):
Add.
* gcc.target/arc/cmem-7.c: Fix matching patterns.
* gcc.target/arc/cmem-bit-1.c: Likewise.
* gcc.target/arc/cmem-bit-2.c: Likewise.
* gcc.target/arc/cmem-bit-3.c: Likewise.
* gcc.target/arc/cmem-bit-4.c: Likewise.
* gcc.target/arc/interrupt-2.c: Match rtie insn for A7.
* gcc.target/arc/store-merge-1.c: This test is only meaningful for
architectures with double load/store operations.
From-SVN: r278609
The patch to make -fcommon the default introduces a bogus claim into
the GCC documentation.
-fcommon was claimed to be incompatible with ISO C for preventing
duplicate definitions from being diagnosed. It does, but as that
elicits undefined behaviour (the requirement that there shall be no
more than one external definition is not a constraint), ISO C does not
require any diagnostic for it. In the absence of any other rule this
would violate, both -fcommon and -fno-common are fully compatible with
all versions of ISO C.
2019-11-21 Harald van Dijk <harald@gigawatt.nl>
* doc/invoke.texi (-fcommon): Remove claim about ISO C.
From-SVN: r278604
Various bad uses of the [[fallthrough]] attribute are constraint
violations in C2x, so need pedwarns rather than warnings.
This patch duly turns the relevant warnings into pedwarns. The
relevant code is not specific to C, and does not know which form the
attribute was given in ([[fallthrough]] or [[gnu::fallthrough]] or
__attribute__((fallthrough))), but as I understand it these usages are
also erroneous for C++ and it seems reasonable to give a pedwarn here
even when a form other than [[fallthrough]] is being used.
The precise meaning of the standard wording about "The next statement
that would be executed" seems a bit unclear in some corner cases; the
tests added keep to cases where it is clear whether or not the next
statement executed is of the required form.
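For reference, a small sketch of a clearly valid use, written as C++17 (the
function levels_enabled is hypothetical and not part of the new tests).
Each [[fallthrough]] is immediately followed by a case label, so the next
statement executed is of the required form. A [[fallthrough]] placed as the
last statement of the switch body would not be followed by a case or default
label and is the kind of use that is now diagnosed with a pedwarn.

```cpp
#include <cassert>

// Counts how many levels are enabled at or below `level`; cases 3 and 2
// deliberately fall into the next case.
inline int levels_enabled(int level) {
  int n = 0;
  switch (level) {
  case 3:
    ++n;
    [[fallthrough]];  // next statement executed is the "case 2" label
  case 2:
    ++n;
    [[fallthrough]];  // next statement executed is the "case 1" label
  case 1:
    ++n;
    break;
  default:
    break;
  }
  return n;
}

inline void fallthrough_demo() {
  assert(levels_enabled(3) == 3);
  assert(levels_enabled(1) == 1);
}
```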
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc:
* gimplify.c (expand_FALLTHROUGH_r, expand_FALLTHROUGH): Use
pedwarn instead of warning_at for fallthrough not preceding a case
or default label.
gcc/c-family:
* c-attribs.c (handle_fallthrough_attribute): Use pedwarn instead
of warning.
gcc/testsuite:
* gcc.dg/c2x-attr-fallthrough-6.c: New test. Split out from
c2x-attr-fallthrough-3.c.
* gcc.dg/c2x-attr-fallthrough-1.c: Add more tests.
* gcc.dg/c2x-attr-fallthrough-2.c: Update expected diagnostics.
* gcc.dg/c2x-attr-fallthrough-3.c: Split inside-switch part of
test out to c2x-attr-fallthrough-6.c.
From-SVN: r278599
These two tests are explicitly testing the use of specific
sections or assembler directives for data that is placed in
common. Append -fcommon to the flags to restore them.
gcc/testsuite/ChangeLog:
2019-11-21 Iain Sandoe <iain@sandoe.co.uk>
* gcc.dg/darwin-comm.c: Add -fcommon to compile flags.
* gcc.dg/darwin-sections.c: Likewise.
From-SVN: r278596
We currently expand various floating point comparisons early, to some
sequences with cror insns and the like. This doesn't optimize well.
Change that to allow any of the 14 floating point comparisons in the
instruction stream, and split them after combine (at split1).
* config/rs6000/predicates.md (extra_insn_branch_comparison_operator):
New predicate.
* config/rs6000/rs6000-protos.h (rs6000_emit_fp_cror): New declaration.
* config/rs6000/rs6000.c (rs6000_generate_compare): Don't do anything
special for FP comparisons that need a cror instruction eventually.
(rs6000_emit_fp_cror): New function.
(rs6000_emit_sCOND): Expand all floating point comparisons to one
instruction, for normal FP modes, with HONOR_NANS.
(rs6000_emit_cbranch): Reformat.
* config/rs6000/rs6000.md (fp_rev): New iterator.
(fp_two): New iterator.
*<code><mode>_cc for fp_rev and GPR: New define_insn_and_split.
*<code><mode>_cc for fp_two and GPR: New define_insn_and_split.
*cbranch_2insn: New define_insn_and_split.
From-SVN: r278593
Allowing mixed vector sizes broke the assumption in the following assert,
since it's now possible for different accesses to require different
levels of alignment:
/* FORNOW: use the same mask to test all potentially unaligned
references in the loop. The vectorizer currently supports
a single vector size, see the reference to
GET_MODE_NUNITS (TYPE_MODE (vectype)) where the
vectorization factor is computed. */
gcc_assert (!LOOP_VINFO_PTR_MASK (loop_vinfo)
|| LOOP_VINFO_PTR_MASK (loop_vinfo) == mask);
I guess we could try to over-align smaller accesses so that all
of them are consistent, or try to support multiple alignment masks,
but for now the easiest fix seems to be to turn the assert into a
bail-out check.
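The shape of that change, an assertion turned into an early "give up"
return, can be sketched as follows. The helper toy_choose_ptr_mask and its
interface are hypothetical and are not the code in
vect_enhance_data_refs_alignment. A mask of 0 means "not chosen yet",
mirroring the !LOOP_VINFO_PTR_MASK test in the quoted assert.

```cpp
#include <cassert>
#include <vector>

// masks[i] is the alignment mask one access needs (address & mask == 0).
// Returns true and stores the shared mask if all accesses agree; returns
// false (do not version for alignment) as soon as two accesses disagree.
inline bool toy_choose_ptr_mask(const std::vector<unsigned> &masks,
                                unsigned &common_mask) {
  common_mask = 0;
  for (unsigned mask : masks) {
    if (common_mask != 0 && common_mask != mask)
      return false;  // formerly asserted never to happen
    common_mask = mask;
  }
  return true;
}

inline void toy_mask_demo() {
  const std::vector<unsigned> same{15, 15};
  const std::vector<unsigned> mixed{15, 7};
  unsigned m = 0;
  assert(toy_choose_ptr_mask(same, m) && m == 15);
  assert(!toy_choose_ptr_mask(mixed, m));
}
```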
2019-11-21 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/92526
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Reject
versioning for alignment if the accesses do not have a consistent
mask, rather than asserting that the masks are consistent.
gcc/testsuite/
PR tree-optimization/92526
* gcc.target/aarch64/pr92526.c: New test.
From-SVN: r278592
In vect-alias-check-1.c we unroll the inner loop and then vectorise
the stores at a[c + 1][b]. Since the access has no guaranteed
alignment, we need a realignment mechanism or support for unaligned
accesses in order to vectorise.
In vect-alias-check-18.c we use a reverse access and so need
permute support in order to vectorise.
I'm not really sure when this part of the testsuite prefers
{ xfail { ! foo } } and when it prefers { target foo }. xfail
seems like the most common choice for the alignment restriction,
whereas vect_int and vect_perm are mostly dg-require-effective-target
style features, so I went with that combination.
2019-11-21 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
PR testsuite/92543
* gcc.dg/vect/vect-alias-check-1.c: XFAIL the alias check message
if there is no realignment support and no support for unaligned
accesses.
* gcc.dg/vect/vect-alias-check-18.c: Restrict the test for the
alias message to targets that have permute support.
From-SVN: r278591
This patch fixes some cases in which we weren't checking whether we had
a vector mode before calling related_vector_mode or before making vector
optab queries.
2019-11-21 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/92595
* tree-vect-stmts.c (get_group_load_store_type): Add a VECTOR_MODE_P
check.
(vectorizable_store, vectorizable_load): Likewise.
gcc/testsuite/
PR tree-optimization/92595
* g++.dg/vect/pr92595.cc: New test.
From-SVN: r278590
Hello,
This patch fixes arm acle testcase crc_hf_1.c by modifying the compiler
options directive.
Regression tested on arm-none-eabi and found no regressions.
Ok for trunk? If ok, please commit on my behalf, I don't have the commit
rights.
Thanks,
Srinath.
Applied on behalf of Srinath.
gcc/testsuite/ChangeLog:
2019-11-21 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* gcc.target/arm/acle/crc_hf_1.c: Modify the compiler options directive
from dg-options to dg-additional-options.
From-SVN: r278588
Add a missing extern to ensure the test passes with -fno-common change.
Committed as obvious.
testsuite/
* gfortran.dg/global_vars_f90_init_driver.c: Add missing extern.
From-SVN: r278557
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Be
ready for some vectors to not be allocated.
(evaluate_properties_for_edge): Document better; make
known_vals and known_aggs caller allocated; avoid determining
values of parameters which are not used.
(ipa_merge_fn_summary_after_inlining): Pre-allocate known_vals and
known_aggs.
* ipa-inline-analysis.c (do_estimate_edge_time): Likewise.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-cp.c (ipa_get_indirect_edge_target_1): Do not early exit when
values are not known.
(ipa_release_agg_values): Add option to not release vector itself.
From-SVN: r278553
The test fp-int-convert-timode-1.c uses FE_TONEAREST without
actually checking whether the target defines it.
As in the rest of the tests, I have now added a check to see whether the
target actually implements it.
This fixes Arm newlib target failures.
Regtested on aarch64-none-elf and aarch64_be-none-elf with no issues.
Committed under the GCC obvious rules.
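The kind of guard involved can be sketched like this. The helper names are
hypothetical and this is not the text added to the test; the only facts
relied on are that FE_TONEAREST is a macro that a target may leave
undefined, and that fesetround returns 0 on success.

```cpp
#include <cassert>
#include <cfenv>

// True if this target's <cfenv> defines FE_TONEAREST at all.
constexpr bool round_to_nearest_supported() {
#ifdef FE_TONEAREST
  return true;
#else
  return false;
#endif
}

// Selects round-to-nearest when the macro exists; otherwise leaves the
// rounding mode alone and reports that nothing was done.
inline bool try_set_round_to_nearest() {
#ifdef FE_TONEAREST
  return std::fesetround(FE_TONEAREST) == 0;
#else
  return false;
#endif
}

inline void rounding_demo() {
  assert(try_set_round_to_nearest() == round_to_nearest_supported());
}
```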
gcc/testsuite/ChangeLog:
* gcc.dg/torture/fp-int-convert-timode-1.c: Add check for FE_TONEAREST.
From-SVN: r278552
2019-11-21 Richard Biener <rguenther@suse.de>
* cfgloop.h (loop_iterator::~loop_iterator): Remove.
(loop_iterator::to_visit): Use an auto_vec with internal storage.
(loop_iterator::loop_iterator): Adjust.
* cfganal.c (compute_dominance_frontiers_1): Fold into...
(compute_dominance_frontiers): ... this. Hoist invariant
get_immediate_dominator call.
(compute_idf): Use a work-set instead of a work-list for more
optimal iteration order and duplicate avoidance.
* tree-into-ssa.c (mark_phi_for_rewrite): Avoid re-allocating
the vector all the time, instead pre-allocate the vector only
once.
(delete_update_ssa): Simplify.
* vec.h (va_heap::release): Disable -Wfree-nonheap-object around it.
From-SVN: r278550
Bumping the cost of vec_to_scalar made the .s loop in
gcc.target/aarch64/sve2/whilerw_1.c use a runtime profitability check,
like the .d version already did. Since the cost model isn't really
being tested here, the most robust fix seemed to be to disable it,
which I should really have done from the outset.
2019-11-21 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
* gcc.target/aarch64/sve2/whilerw_1.c: Add -fno-vect-cost-model.
Require x0 in the .d test too.
From-SVN: r278549
PR tree-optimization/91355
* tree-ssa-sink.c (select_best_block): Use >= rather than >
for early_bb scaled count with best_bb count comparison.
* g++.dg/torture/pr91355.C: New test.
From-SVN: r278548
This test fails on targets without symbol alias support, but we don't
want to skip it entirely with the usual dg-requires, thus expect the
error on the alias line.
gcc/testsuite/ChangeLog:
2019-11-21 Iain Sandoe <iain@sandoe.co.uk>
* gcc.dg/gnu2x-attrs-1.c: Expect an error for the alias case
on Darwin.
From-SVN: r278547