We didn't take the cost of generating loop masks into account, and so
tended to underestimate the cost of loops that need multiple masks.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Include
the cost of generating loop masks.
gcc/testsuite/
* gcc.target/aarch64/sve/mask_struct_store_3.c: Add
-fno-vect-cost-model.
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
From-SVN: r278125
vect_analyze_loop_costing uses two profitability thresholds: a runtime
one and a static compile-time one. The runtime one is simply the point
at which the vector loop is cheaper than the scalar loop, while the
static one also takes into account the cost of choosing between the
scalar and vector loops at runtime. We compare this static cost against
the expected execution frequency to decide whether it's worth generating
any vector code at all.
However, we never reclaimed the cost of applying the runtime threshold
if it turned out that the vector code can always be used. And we only
know whether that's true once we've calculated what the runtime
threshold would be.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_apply_runtime_profitability_check_p):
New function.
* tree-vect-loop-manip.c (vect_loop_versioning): Use it.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vect_transform_loop): Likewise.
(vect_analyze_loop_costing): Don't take the cost of versioning
into account for the static profitability threshold if it turns
out that no versioning is needed.
From-SVN: r278124
* ipa.c (cgraph_build_static_cdtor): Pass optimization_default_node
and target_option_default_node to get -fprofile-generate ctors working
right with LTO.
From-SVN: r278123
vectorizable_assignment handles true SSA-to-SSA copies (which hopefully
we don't see in practice) and no-op conversions that are required
to maintain correct gimple, such as changes between signed and
unsigned types. These cases shouldn't generate any code and so
shouldn't count against either the scalar or vector costs.
Later patches test this, but it seemed worth splitting out.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_nop_conversion_p): Declare.
* tree-vect-stmts.c (vect_nop_conversion_p): New function.
(vectorizable_assignment): Don't add a cost for nop conversions.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise.
From-SVN: r278122
This patch makes two tweaks to vectorizable_conversion. The first
is to use "modifier" to distinguish between promotion, demotion,
and neither promotion nor demotion, rather than using a code for
some cases and "modifier" for others. The second is to take ncopies
into account for the promotion and demotion costs; previously we gave
multiple copies the same cost as a single copy.
Later patches test this, but it seemed worth splitting out.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-stmts.c (vect_model_promotion_demotion_cost): Take the
number of ncopies as an additional argument.
(vectorizable_conversion): Update call accordingly. Use "modifier"
to check whether a conversion is between vectors with the same
numbers of units.
From-SVN: r278121
This is a like-for-like change at the moment, but is a prerequisite
for removing mode_for_int_vector.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64-sve-builtins-functions.h
(unary_count::expand): Use aarch64_sve_int_mode instead of
mode_for_int_vector.
From-SVN: r278120
One of the changes in r277281 was to make the typedef variant
handling in strip_typedefs pass the raw DECL_ORIGINAL_TYPE to the
recursive call, instead of applying TYPE_MAIN_VARIANT first.
This PR shows that that interacts badly with the implementation
of DR1558, because we then refuse to strip aliases with dependent
template parameters and trip:
gcc_assert (!typedef_variant_p (result)
|| ((flags & STF_USER_VISIBLE)
&& !user_facing_original_type_p (result)));
Keeping the current behaviour but suppressing the ICE leads to a
duplicate error (the dg-bogus in the first test), so that didn't
seem like a good fix.
I assume keeping the alias should never actually be necessary for
DECL_ORIGINAL_TYPEs, because it will already have been checked
somewhere, even for implicit TYPE_DECLs. This patch therefore
passes a flag to say that we can safely strip aliases with
dependent template parameters.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/cp/
PR c++/92206
* cp-tree.h (STF_STRIP_DEPENDENT): New constant.
* tree.c (strip_typedefs): Add STF_STRIP_DEPENDENT to the flags
when calling strip_typedefs recursively on a DECL_ORIGINAL_TYPE.
Don't apply the fix for DR1558 in that case; allow aliases with
dependent template parameters to be stripped instead.
gcc/testsuite/
PR c++/92206
* g++.dg/cpp0x/alias-decl-pr92206-1.C: New test.
* g++.dg/cpp0x/alias-decl-pr92206-2.C: Likewise.
* g++.dg/cpp0x/alias-decl-pr92206-3.C: Likewise.
From-SVN: r278119
2019-11-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/92473
* tree-vect-loop.c (vect_create_epilog_for_reduction): Perform
direct optab reduction in the correct type.
From-SVN: r278113
This case is testing 'web' on ignore naked clobber.
-funroll-loops no longer implies -fweb for powerpc.
So, add -fweb to enable 'web' for this case.
gcc.testsuite/
2019-11-13 Jiufu Guo <guojiufu@linux.ibm.com>
PR target/92465
* gcc.dg/pr47763.c: Add option -fweb.
From-SVN: r278112
C++98 does not have long long int, and does not use (unsigned) long
long int for hexadecimal literals. So let's use an ULL suffix here,
which is still not strict C++98, but which works with more compilers.
* config/rs6000/rs6000.md (rs6000_set_fpscr_drn): Use ULL on big
hexadecimal literal.
From-SVN: r278107
* config/rs6000/vsx.md (xscmpexpdp_<code> for CMP_TEST): Handle
UNORDERED if !HONOR_NANS (DFmode).
(xscmpexpqp_<code>_<mode> for CMP_TEST and IEEE128): Handle UNORDERED
if !HONOR_NANS (<MODE>mode).
From-SVN: r278103
gcc/ChangeLog:
PR middle-end/83688
* gimple-ssa-sprintf.c (format_result::alias_info): New struct.
(directive::argno): New member.
(format_result::aliases, format_result::alias_count): New data members.
(format_result::append_alias): New member function.
(fmtresult::dst_offset): New data member.
(pass_sprintf_length::call_info::dst_origin): New data member.
(pass_sprintf_length::call_info::dst_field, dst_offset): Same.
(char_type_p, array_elt_at_offset, field_at_offset): New functions.
(get_origin_and_offset): Same.
(format_string): Call it.
(format_directive): Call append_alias and set directive argument
number.
(maybe_warn_overlap): New function.
(pass_sprintf_length::compute_format_length): Call it.
(pass_sprintf_length::handle_gimple_call): Initialize new members.
* gcc/tree-ssa-strlen.c (): Also enable when -Wrestrict is on.
gcc/testsuite/ChangeLog:
PR tree-optimization/35503
* gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: New test.
From-SVN: r278098
try_forward_edges does not update dominance info, and merge_blocks
relies on it being up-to-date. In PR92430 stale dominance info makes
merge_blocks produce a loop in the dominator tree, which in turn makes
delete_basic_block loop forever.
Fix by freeing dominance info at the beginning of cleanup_cfg.
gcc/ChangeLog:
2019-11-12 Ilya Leoshkevich <iii@linux.ibm.com>
PR rtl-optimization/92430
* cfgcleanup.c (pass_jump_after_combine::execute): Free
dominance info at the beginning.
gcc/testsuite/ChangeLog:
2019-11-12 Ilya Leoshkevich <iii@linux.ibm.com>
PR rtl-optimization/92430
* gcc.dg/pr92430.c: New test (from Arseny Solokha).
From-SVN: r278095
2019-11-12 Martin Liska <mliska@suse.cz>
* common/common-target.def:
Do not mention set_default_param_value
and set_param_value.
* doc/tm.texi: Likewise.
From-SVN: r278088
2019-11-12 Martin Liska <mliska@suse.cz>
* common.opt: Remove param_values.
* config/i386/i386-options.c (ix86_valid_target_attribute_p):
Remove finalize_options_struct.
* gcc.c (driver::decode_argv): Do not call global_init_params
and finish_params.
(driver::finalize): Do not call params_c_finalize
and finalize_options_struct.
* opt-suggestions.c (option_proposer::get_completions): Remove
special casing of params.
(option_proposer::find_param_completions): Remove.
(test_completion_partial_match): Update expected output.
* opt-suggestions.h: Remove find_param_completions.
* opts-common.c (add_misspelling_candidates): Add
--param with a space.
* opts.c (handle_param): Remove.
(init_options_struct):. Remove init_options_struct and
similar calls.
(finalize_options_struct): Remove.
(common_handle_option): Use SET_OPTION_IF_UNSET.
* opts.h (finalize_options_struct): Remove.
* toplev.c (general_init): Do not call global_init_params.
(toplev::finalize): Do not call params_c_finalize and
finalize_options_struct.
From-SVN: r278087
2019-11-12 Martin Liska <mliska@suse.cz>
* Makefile.in: Include params.opt.
* flag-types.h (enum parloops_schedule_type): Add
parloops_schedule_type used in params.opt.
* params.opt: New file.
From-SVN: r278084
2019-11-12 Martin Liska <mliska@suse.cz>
* common.opt: Remove --param and --param= options.
* opt-functions.awk: Mark CL_PARAMS for options
that have Param keyword.
* opts-common.c (decode_cmdline_options_to_array):
Replace --param key=value with --param=key=value.
* opts.c (print_filtered_help): Remove special
printing of params.
(print_specific_help): Update title for params.
(common_handle_option): Do not handle OPT__param.
opts.h (SET_OPTION_IF_UNSET): New macro.
* doc/options.texi: Document Param keyword.
From-SVN: r278083
The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.
The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.
Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends. In particular, this allows to distinguish between compute
constructs in warnings, error messages, dumps etc.
2019-11-12 Maciej W. Rozycki <macro@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
Frederik Harwath <frederik@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant. Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.
* doc/generic.texi (OpenACC): Document OACC_SERIAL.
gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.
gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.
gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL. Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.
gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
(gfc_ascii_statement): Likewise. Handle ST_OACC_END_SERIAL_LOOP
and ST_OACC_END_SERIAL.
(parse_oacc_structured_block): Handle ST_OACC_SERIAL.
(parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_END_SERIAL_LOOP.
(parse_executable): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_SERIAL.
(is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Likewise.
* st.c (gfc_free_statement): Likewise.
* trans-openmp.c (gfc_trans_oacc_construct): Handle
EXEC_OACC_SERIAL.
(gfc_trans_oacc_combined_directive): Handle
EXEC_OACC_SERIAL_LOOP.
(gfc_trans_oacc_directive): Handle EXEC_OACC_SERIAL_LOOP and
EXEC_OACC_SERIAL.
* trans.c (trans_code): Likewise.
gcc/testsuite/
* c-c++-common/goacc/parallel-dims.c: New test.
* gfortran.dg/goacc/parallel-dims.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-aux.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims.f89: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-2.f90: New test.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Frederik Harwath <frederik@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
From-SVN: r278082
PR tree-optimization/92452
* tree-vrp.c (vrp_prop::check_array_ref): If TRUNC_DIV_EXPR folds
into NULL_TREE, set up_bound to NULL_TREE instead of computing
MINUS_EXPR on it.
* c-c++-common/pr92452.c: New test.
From-SVN: r278080
Supporting TLS for -mpcrel turns out to be relatively simple. The
existing TLSGD and TLSLD unspecs happily can have their GOT pointer
reg element replaced with zero, refelecting the fact that optimisation
of calls to __tls_get_addr when pc-rel won't use the GOT pointer.
Some other insns also can be reused, and just a few added.
* config/rs6000/predicates.md (unspec_tls): Allow const0_rtx for got
element of unspec vec.
* config/rs6000/rs6000.c (rs6000_legitimize_tls_address): Support
PC-relative TLS.
* config/rs6000/rs6000.md (UNSPEC_TLSTLS_PCREL): New unspec.
(tls_gd_pcrel, tls_ld_pcrel): New insns.
(tls_dtprel, tls_tprel): Set attr prefixed when tls_size is not 16.
(tls_got_tprel_pcrel, tls_tls_pcrel): New insns.
From-SVN: r278076
2019-11-11 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/predicates.md (prefixed_memory): New predicate.
* config/rs6000/rs6000.md (stack_protect_setdi): Deal with either
address being a prefixed load/store.
(stack_protect_testdi): Deal with either address being a prefixed
load.
From-SVN: r278069