gcc/
2016-11-16 Yuri Rumyantsev <ysrumyan@gmail.com>
* params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
* tree-if-conv.c (tree_if_conversion): Make public.
* * tree-if-conv.h: New file.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
dynamic alias checks for epilogues.
* tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
* tree-vect-loop.c: include tree-if-conv.h.
(new_loop_vec_info): Add zeroing orig_loop_info field.
(vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
(vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
using passed argument.
(vect_transform_loop): Check if created epilogue should be returned
for further vectorization with less vf. If-convert epilogue if
required. Print vectorization success for epilogue.
* tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
if it is required, pass loop_vinfo produced during vectorization of
loop body to vect_analyze_loop.
* tree-vectorizer.h (struct _loop_vec_info): Add new field
orig_loop_info.
(LOOP_VINFO_ORIG_LOOP_INFO): New.
(LOOP_VINFO_EPILOGUE_P): New.
(LOOP_VINFO_ORIG_VECT_FACTOR): New.
(vect_do_peeling): Change prototype to return epilogue.
(vect_analyze_loop): Add argument of loop_vec_info type.
(vect_transform_loop): Return created loop.
gcc/testsuite/
2016-11-16 Yuri Rumyantsev <ysrumyan@gmail.com>
* lib/target-supports.exp (check_avx2_hw_available): New.
(check_effective_target_avx2_runtime): New.
* gcc.dg/vect/vect-tail-nomask-1.c: New test.
From-SVN: r242501
* c-warn.c (warn_for_unused_label): Save all labels used
in goto or in &label.
* asan.c (enum asan_check_flags): Move the enum to header file.
(asan_init_shadow_ptr_types): Make type creation more generic.
(shadow_mem_size): New function.
(asan_emit_stack_protection): Use newly added ASAN_SHADOW_GRANULARITY.
Rewritten stack unpoisoning code.
(build_shadow_mem_access): Add new argument return_address.
(instrument_derefs): Instrument local variables if use after scope
sanitization is enabled.
(asan_store_shadow_bytes): New function.
(asan_expand_mark_ifn): Likewise.
(asan_sanitize_stack_p): Moved from asan_sanitize_stack_p.
* asan.h (enum asan_mark_flags): Moved here from asan.c
(asan_protect_stack_decl): Protect all declaration that need
to live in memory.
(asan_sanitize_use_after_scope): New function.
(asan_no_sanitize_address_p): Likewise.
* cfgexpand.c (partition_stack_vars): Consider
asan_sanitize_use_after_scope in condition.
(expand_stack_vars): Likewise.
* common.opt (-fsanitize-address-use-after-scope): New option.
* doc/invoke.texi (use-after-scope-direct-emission-threshold):
Explain the parameter.
* flag-types.h (enum sanitize_code): Define SANITIZE_USE_AFTER_SCOPE.
* gimplify.c (build_asan_poison_call_expr): New function.
(asan_poison_variable): Likewise.
(gimplify_bind_expr): Generate poisoning/unpoisoning for local
variables that have address taken.
(gimplify_decl_expr): Likewise.
(gimplify_target_expr): Likewise for C++ temporaries.
(sort_by_decl_uid): New function.
(gimplify_expr): Unpoison all variables for a label we can jump
from outside of a scope.
(gimplify_switch_expr): Unpoison variables defined in the switch
context.
(gimplify_function_tree): Clear asan_poisoned_variables.
(asan_poison_variables): New function.
(warn_switch_unreachable_r): Handle IFN_ASAN_MARK.
* internal-fn.c (expand_ASAN_MARK): New function.
* internal-fn.def (ASAN_MARK): Declare.
* opts.c (finish_options): Handle -fstack-reuse if
-fsanitize-address-use-after-scope is enabled.
(common_handle_option): Enable address sanitization if
-fsanitize-address-use-after-scope is enabled.
* params.def (PARAM_USE_AFTER_SCOPE_DIRECT_EMISSION_THRESHOLD):
New parameter.
* params.h: Likewise.
* sancov.c (pass_sanopt::execute): Handle IFN_ASAN_MARK.
* sanitizer.def: Define __asan_poison_stack_memory and
__asan_unpoison_stack_memory functions.
* asan.c (asan_mark_poison_p): New function.
(transform_statements): Handle asan_mark_poison_p calls.
* gimple.c (nonfreeing_call_p): Handle IFN_ASAN_MARK.
From-SVN: r241896
gcc/ChangeLog:
PR tree-optimization/18046
* genmodes.c (emit_mode_size_inline): Emit an assert that
verifies that mode is a valid array index.
(emit_mode_nuinits_inline): Likewise.
(emit_mode_inner_inline): Likewise.
(emit_mode_unit_size_inline): Likewise.
(emit_mode_unit_precision_inline): Likewise.
* tree-vrp.c: Include params.h.
(find_switch_asserts): Register edge assertions for the default
label which correspond to the anti-ranges of each case label.
* params.def (PARAM_MAX_VRP_SWITCH_ASSERTIONS): New.
* doc/invoke.texi: Document it.
gcc/testsuite/ChangeLog:
PR tree-optimization/18046
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Bump FSM count to 5.
* gcc.dg/tree-ssa/vrp103.c: New test.
* gcc.dg/tree-ssa/vrp104.c: New test.
From-SVN: r238761
PR tree-optimization/56541
* doc/invoke.texi (@item max-tree-if-conversion-phi-args): New item.
* params.def (PARAM_MAX_TREE_IF_CONVERSION_PHI_ARGS): new param.
* tree-if-conv.c (MAX_PHI_ARG_NUM): new macro.
(any_complicated_phi): new static variable.
(aggressive_if_conv): delete.
(if_convertible_phi_p): support phis with more than two arguments.
(if_convertible_bb_p): remvoe check on aggressive_if_conv and
critical pred edges.
(ifcvt_split_critical_edges): support phis with more than two
arguments by checking new parameter. only split critical edges
if needed.
(tree_if_conversion): handle simd pragma marked loop using new
local variable aggressive_if_conv. check any_complicated_phi.
gcc/testsuite
PR tree-optimization/56541
* gcc.dg/tree-ssa/ifc-pr56541.c: new test.
* gcc.dg/vect/pr56541.c: new test.
From-SVN: r235808
2016-03-30 Michael Matz <matz@suse.de>
Richard Biener <rguenther@suse.de>
PR ipa/12392
* ipa-polymorphic-call.c (struct type_change_info): Change
speculative to an unsigned allowing to limit the work we do.
(csftc_abort_walking_p): New inline function..
(check_stmt_for_type_change): Limit the number of may-defs
skipped for speculative devirtualization to
max-speculative-devirt-maydefs.
* params.def (max-speculative-devirt-maydefs): New param.
* doc/invoke.texi (--param max-speculative-devirt-maydefs): Document.
Co-Authored-By: Richard Biener <rguenther@suse.de>
From-SVN: r234546
PR tree-optimization/68580
* params.def (FSM_MAXIMUM_PHI_ARGUMENTS): New param.
* tree-ssa-threadbackward.c
(fsm_find_control_statement_thread_paths): Do not try to walk
through large PHI nodes.
From-SVN: r233053
PR tree-optimization/68398
* params.def (PARAM_FSM_SCALE_PATH_STMTS): New parameter.
(PARAM_FSM_SCALE_PATH_BLOCKS): Likewise.
* tree-ssa-threadbackward.c (fsm_find_control_statement_thread_paths):
Only count PHIs in the last block in the path. The others will
const/copy propagate away. Add heuristic to allow more irreducible
subloops to be created when it is likely profitable to do so.
* tree-ssa-threadbackward.c (fsm_find_control_statement_thread_paths):
Fix typo in comment. Use gsi_after_labels and remove the GIMPLE_LABEL
check from within the loop. Use gsi_next_nondebug rather than gsi_next.
PR tree-optimization/68398
* gcc.dg/tree-ssa/pr66752-3.c: Update expected output.
* gcc.dg/tree-ssa/ssa-dom-thread-2c.c: Add extra statements on thread
path to avoid new heuristic allowing more irreducible regions
* gcc.dg/tree-ssa/ssa-dom-thread-2d.c: Likewise.
* gcc.dg/tree-ssa/vrp46.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update expected output.
* gcc.dg/tree-ssa/ssa-dom-thread-2g.c: New test.
* gcc.dg/tree-ssa/ssa-dom-thread-2h.c: Likewise.
From-SVN: r232897
2016-01-19 Martin Jambor <mjambor@suse.cz>
Martin Liska <mliska@suse.cz>
Michael Matz <matz@suse.de>
libgomp/
* plugin/Makefrag.am: Add HSA plugin requirements.
* plugin/configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_CPPFLAGS): Likewise.
(HSA_RUNTIME_INCLUDE): New substitution.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_LDFLAGS): Likewise.
(hsa-runtime): New configure option.
(hsa-runtime-include): Likewise.
(hsa-runtime-lib): Likewise.
(PLUGIN_HSA): New substitution variable.
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
configure options.
(PLUGIN_HSA_CPPFLAGS): Likewise.
(PLUGIN_HSA_LDFLAGS): Likewise.
(PLUGIN_HSA_LIBS): Likewise.
Check that we have access to HSA run-time.
* libgomp-plugin.h (offload_target_type): New element
OFFLOAD_TARGET_TYPE_HSA.
* libgomp.h (gomp_target_task): New fields firstprivate_copies and
args.
(bool gomp_create_target_task): Updated.
(gomp_device_descr): Extra parameter of run_func and async_run_func,
new field can_run_func.
* libgomp_g.h (GOMP_target_ext): Update prototype.
* oacc-host.c (host_run): Added a new parameter args.
* target.c (calculate_firstprivate_requirements): New function.
(copy_firstprivate_data): Likewise.
(gomp_target_fallback_firstprivate): Use them.
(gomp_target_unshare_firstprivate): New function.
(gomp_get_target_fn_addr): Allow returning NULL for shared memory
devices.
(GOMP_target): Do host fallback for all shared memory devices. Do not
pass any args to plugins.
(GOMP_target_ext): Introduce device-specific argument parameter args.
Allow host fallback if device shares memory. Do not remap data if
device has shared memory.
(gomp_target_task_fn): Likewise. Also treat shared memory devices
like host fallback for mappings.
(GOMP_target_data): Treat shared memory devices like host fallback.
(GOMP_target_data_ext): Likewise.
(GOMP_target_update): Likewise.
(GOMP_target_update_ext): Likewise. Also pass NULL as args to
gomp_create_target_task.
(GOMP_target_enter_exit_data): Likewise.
(omp_target_alloc): Treat shared memory devices like host fallback.
(omp_target_free): Likewise.
(omp_target_is_present): Likewise.
(omp_target_memcpy): Likewise.
(omp_target_memcpy_rect): Likewise.
(omp_target_associate_ptr): Likewise.
(gomp_load_plugin_for_device): Also load can_run.
* task.c (GOMP_PLUGIN_target_task_completion): Free
firstprivate_copies.
(gomp_create_target_task): Accept new argument args and store it to
ttask.
* plugin/plugin-hsa.c: New file.
gcc/
* Makefile.in (OBJS): Add new source files.
(GTFILES): Add hsa.c.
* common.opt (disable_hsa): New variable.
(-Whsa): New warning.
* config.in (ENABLE_HSA): New.
* configure.ac: Treat hsa differently from other accelerators.
(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
$enable_offloading.
(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
* doc/install.texi (Configuration): Document --with-hsa-runtime,
--with-hsa-runtime-include, --with-hsa-runtime-lib and
--with-hsa-kmt-lib.
* doc/invoke.texi (-Whsa): Document.
(hsa-gen-debug-stores): Likewise.
* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
to invoke offload compiler for hsa acclerator.
* opts.c (common_handle_option): Determine whether HSA offloading
should be performed.
* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.
* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
GF_OMP_FOR_KIND_GRID_LOOP.
(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_grid_body): New function.
(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
* gimple.def (GIMPLE_OMP_GRID_BODY): New.
* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY,
GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and
GF_OMP_TEAMS_GRID_PHONY.
(gimple_statement_omp_single_layout): Updated comments.
(gimple_build_omp_grid_body): New function.
(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
(gimple_omp_for_grid_phony): New function.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_parallel_grid_phony): Likewise.
(gimple_omp_parallel_set_grid_phony): Likewise.
(gimple_omp_teams_grid_phony): Likewise.
(gimple_omp_teams_set_grid_phony): Likewise.
(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
(BUILT_IN_GOMP_TARGET): Updated type.
* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
(adjust_for_condition): New function.
(get_omp_for_step_from_incr): Likewise.
(extract_omp_for_data): Moved parts to adjust_for_condition and
get_omp_for_step_from_incr.
(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
(fixup_child_record_type): Bail out if receiver_decl is NULL.
(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
(scan_omp_parallel): Do not create child functions for phony
constructs.
(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
(scan_omp_1_op): Checking assert we are not remapping to
ERROR_MARK. Also also handle GIMPLE_OMP_GRID_BODY.
(parallel_needs_hsa_kernel_p): New function.
(expand_parallel_call): Register apprpriate parallel child
functions as HSA kernels.
(grid_launch_attributes_trees): New type.
(grid_attr_trees): New variable.
(grid_create_kernel_launch_attr_types): New function.
(grid_insert_store_range_dim): Likewise.
(grid_get_kernel_launch_attributes): Likewise.
(get_target_argument_identifier_1): Likewise.
(get_target_argument_identifier): Likewise.
(get_target_argument_value): Likewise.
(push_target_argument_according_to_value): Likewise.
(get_target_arguments): Likewise.
(expand_omp_target): Call get_target_arguments instead of looking
up for teams and thread limit.
(grid_expand_omp_for_loop): New function.
(grid_arg_decl_map): New type.
(grid_remap_kernel_arg_accesses): New function.
(grid_expand_target_kernel_body): New function.
(expand_omp): Call it.
(lower_omp_for): Do not emit phony constructs.
(lower_omp_taskreg): Do not emit phony constructs but create for them
a temporary variable receiver_decl.
(lower_omp_taskreg): Do not emit phony constructs.
(lower_omp_teams): Likewise.
(lower_omp_grid_body): New function.
(lower_omp_1): Call it.
(grid_reg_assignment_to_local_var_p): New function.
(grid_seq_only_contains_local_assignments): Likewise.
(grid_find_single_omp_among_assignments_1): Likewise.
(grid_find_single_omp_among_assignments): Likewise.
(grid_find_ungridifiable_statement): Likewise.
(grid_target_follows_gridifiable_pattern): Likewise.
(grid_remap_prebody_decls): Likewise.
(grid_copy_leading_local_assignments): Likewise.
(grid_process_kernel_body_copy): Likewise.
(grid_attempt_target_gridification): Likewise.
(grid_gridify_all_targets_stmt): Likewise.
(grid_gridify_all_targets): Likewise.
(execute_lower_omp): Call grid_gridify_all_targets.
(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
* tree-core.h (omp_clause_code): Added OMP_CLAUSE__GRIDDIM_.
(tree_omp_clause): Added union field dimension.
* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
* tree.c (omp_clause_num_ops): Added number of arguments of
OMP_CLAUSE__GRIDDIM_.
(omp_clause_code_name): Added name of OMP_CLAUSE__GRIDDIM_.
(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.
* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
* tree-pass.h (make_pass_gen_hsail): Declare.
(make_pass_ipa_hsa): Likewise.
* ipa-hsa.c: New file.
* lto-section-in.c (lto_section_name): Add hsa section name.
* lto-streamer.h (lto_section_type): Add hsa section.
* timevar.def (TV_IPA_HSA): New.
* hsa-brig-format.h: New file.
* hsa-brig.c: New file.
* hsa-dump.c: Likewise.
* hsa-gen.c: Likewise.
* hsa.c: Likewise.
* hsa.h: Likewise.
* toplev.c (compile_file): Call hsa_output_brig.
* hsa-regalloc.c: New file.
gcc/fortran/
* types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
gcc/lto/
* lto-partition.c: Include "hsa.h"
(add_symbol_to_partition_1): Put hsa implementations into the
same partition as host implementations.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New
unused parameter.
(GOMP_OFFLOAD_run): Likewise.
include/
* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
(GOMP_VERSION_HSA): Likewise.
(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
(GOMP_TARGET_ARG_ID_MASK): Likewise.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.
From-SVN: r232549
gcc/
2016-01-11 Yuri Rumyantsev <ysrumyan@gmail.com>
PR rtl-optimization/68920
* config/i386/i386.c (ix86_option_override_internal): Restrict number
of conditional moves for RTL if-conversion to 1 for
TARGET_ONE_IF_CONV_INSN.
* config/i386/i386.h (TARGET_ONE_IF_CONV_INSN): New macros.
* config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): New macros.
* params.def (PARAM_MAX_RTL_IF_CONVERSION_INSNS) : Introduce new
parameter to restirct number of conditional moves for
RTL if-conversion.
* doc/invoke.texi (max-rtl-if-conversion-insns): Document it.
* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Limit number of
conditionl moves.
gcc/testsuite/
2016-01-11 Yuri Rumyantsev <ysrumyan@gmail.com>
PR rtl-optimization/68920
* gcc.dg/ifcvt-4.c: Add "--param max-rtl-if-conversion-insns=3" option
for ix86 targets.
* gcc.dg/ifcvt-5.c: New test.
From-SVN: r232220
* gcc/cprop.c (is_too_expensive): Remove.
(gcse.h): Include.
(one_cprop_pass): Call gcse_or_cprop_is_too_expensive, not
is_too_expensive.
* gcc/gcse.h (gcse_or_cprop_is_too_expensive): Declare.
* gcc/gcse.c (is_too_expensive): Rename to ...
(gcse_or_cprop_is_too_expensive): ... this.
Expand warning to add required size of max-gcse-memory.
(one_pre_gcse_pass): Use it.
(one_code_hoisting_pass): Use it.
* gcc/params.def (max-gcse-memory): Increase from 50MB to 128MB.
From-SVN: r230276
* graphite-scop-detection.c (build_scops): Do not handle scops with more
than PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP arrays.
* params.def (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP): New.
Co-Authored-By: Sebastian Pop <s.pop@samsung.com>
From-SVN: r229150
2015-10-13 Tom de Vries <tom@codesourcery.com>
PR tree-optimization/67476
* doc/invoke.texi (@item parloops-schedule): New item.
* params.def (PARAM_PARLOOPS_SCHEDULE): New DEFPARAMENUM5.
* tree-parloops.c: Include params-enum.h.
(create_parallel_loop): Handle PARAM_PARLOOPS_SCHEDULE.
* testsuite/libgomp.c/autopar-3.c: New test.
* testsuite/libgomp.c/autopar-4.c: New test.
* testsuite/libgomp.c/autopar-5.c: New test.
* testsuite/libgomp.c/autopar-6.c: New test.
* testsuite/libgomp.c/autopar-7.c: New test.
* testsuite/libgomp.c/autopar-8.c: New test.
From-SVN: r228756
The upcoming patch to move sqrt and cbrt simplifications to match.pd
caused a regression because the (abs @0)->@0 simplification didn't
trigger for:
(abs (convert (abs X)))
The simplification is based on tree_expr_nonnegative_p, which at
the moment just gives up if it sees an SSA_NAME.
This patch makes tree_expr_nonnegative_p recurse into SSA name
definitions, but limits the depth of recursion to a small number
for the reason mentioned in the comment (adapted from an existing
comment in gimple_val_nonnegative_real_p). The patch reuses code
in tree-vrp.c, moving it to gimple-fold.c. It also replaces calls
to gimple_val_nonnegative_real_p with calls to tree_expr_nonnegative_p.
A knock-on effect is that we can now prove _i_589 < 0 is false in
sequences like:
i_1917 = ASSERT_EXPR <i_1075, i_1075 == 0>;
_i_589 = (const int) i_1917;
_i_1507 = ASSERT_EXPR <_i_589, _i_589 < 0>;
This defeats an assert in tree-vrp.c that ASSERT_EXPR conditions
are never known to be false. Previously the assert only ever used
local knowledge and so would be limited to cases like x != x for
integer x. Now that we use global knowledge it's possible to prove
the assertion condition is false in blocks that are in practice
unreachable. The patch therefore removes the assert.
Bootstrapped & regression-tested on x86_64-linux-gnu. I didn't write
a specific test because this is already covered by the testsuite if
the follow-on patch is also applied.
gcc/
* params.def (PARAM_MAX_SSA_NAME_QUERY_DEPTH): New param.
* doc/invoke.texi (--param max-ssa-name-query-depth): Document.
* fold-const.h (tree_unary_nonnegative_warnv_p)
(tree_single_nonnegative_warnv_p, tree_call_nonnegative_warnv_p)
(tree_expr_nonnegative_warnv_p): Add depth parameters.
* fold-const.c: Include gimple-fold.h and params.h.
(tree_ssa_name_nonnegative_warnv_p): New function.
(tree_unary_nonnegative_warnv_p, tree_binary_nonnegative_warnv_p)
(tree_single_nonnegative_warnv_p, tree_call_nonnegative_warnv_p)
(tree_invalid_nonnegative_warnv_p, tree_expr_nonnegative_warnv_p):
Add a depth parameter and increment it for recursive calls to
tree_expr_nonnegative_warnv_p. Use tree_ssa_name_nonnegative_warnv_p
to handle SSA names.
* gimple-fold.h (gimple_val_nonnegative_real_p): Delete.
(gimple_stmt_nonnegative_warnv_p): Declare.
* tree-vrp.c (remove_range_assertions): Remove assert that condition
cannot be proven false.
(gimple_assign_nonnegative_warnv_p, gimple_call_nonnegative_warnv_p)
(gimple_stmt_nonnegative_warnv_p): Move to...
* gimple-fold.c: ...here. Add depth parameters and pass them
down to the tree routines. Accept statements that aren't
assignments or calls but just return false for them.
(gimple_val_nonnegative_real_p): Delete.
* tree-ssa-math-opts.c (gimple_expand_builtin_pow): Use
tree_expr_nonnegative_p instead of gimple_val_nonnegative_real_p.
Check HONOR_NANs first.
From-SVN: r228614
Redesign Graphite scop detection for faster compiler time and detecting more SCoPs.
Existing algorithm for SCoP detection in graphite was based on dominator tree
where a tree (CFG) traversal was required for analyzing an SESE. The tree
traversal is linear in the number of basic blocks and SCoP detection is
(probably) linear in number of instructions. That algorithm utilized a generic
infrastructure of SESE which does not directly represent loops. With regards to
graphite framework, we are only interested in subtrees with loops. The new
algorithm is geared towards tree traversal on loop structure. The algorithm is
linear in number of loops which is faster than the previous algorithm.
Briefly, we start the traversal at a loop-nest and analyze it recursively for
validity. Once a valid loop is found we find a valid adjacent loop. If an
adjacent loop is found and is valid, we merge both loop nests otherwise we form
a SCoP from the previous loop nest, and resume the algorithm from the adjacent
loop nest. The data structure to represent an SESE is an ordered pair of edges
(entry, exit). The new algoritm can extend a SCoP in both the directions. With
this approach, the number of instructions to be analyzed for validity reduces to
a minimal set. We start by analyzing those statements which are inside a loop,
because validity of those statements is necessary for the validity of loop. The
statements outside the loop nest can be just excluded from the SESE if they are
not valid.
This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02024.html
Passes (c,c++,fortran) regtest and bootstrap.
gcc/ChangeLog:
2015-09-27 Aditya Kumar <hiraditya@msn.com>
Sebastian Pop <s.pop@samsung.com>
* graphite-optimize-isl.c (optimize_isl):
* graphite-scop-detection.c (struct sese_l): New type.
(get_entry_bb): API for getting entry bb of SESE.
(get_exit_bb): API for getting exit bb of SESE.
(class debug_printer): New type. Simple printer in debug mode.
(trivially_empty_bb_p): New. Return true when BB is empty or
contains only debug instructions.
(graphite_can_represent_expr): Call scalar_evoution_in_region
instead of analyze_scalar_evolution. Pass in scop instead of only
the scop entry.
(stmt_has_simple_data_refs_p): Pass in scop instead of only the
scop entry.
(stmt_simple_for_scop_p): Same.
(harmful_stmt_in_bb): Same.
(graphite_can_represent_loop): Deleted.
(struct scopdet_info): Deleted.
(scopdet_basic_block_info): Deleted.
(build_scops_1): Deleted.
(bb_in_sd_region): Deleted.
(find_single_entry_edge): Deleted.
(find_single_exit_edge): Deleted.
(create_single_entry_edge): Deleted.
(sd_region_without_exit): Deleted.
(create_single_exit_edge): Deleted.
(unmark_exit_edges): Deleted.
(mark_exit_edges): Deleted.
(create_sese_edges): Deleted.
(build_graphite_scops): Deleted.
(canonicalize_loop_closed_ssa): Recompute all dominators at the
end.
(build_scops): Use the new scop_builder to build scops.
(dot_all_scops_1): Use the new pretty printer. Print loop father
as well.
(loop_body_is_valid_scop): New. Return true if loop body is a
valid scop.
(class scop_builder): New. Builds SCoPs for polyhedral
optimizatios.
(scop_builder): New. Constructor.
(static sese_l invalid_sese): sese_l with invalid edges.
(get_sese): Get an sese (from a loop) if possible, invalid_sese
otherwise.
(get_nearest_dom_with_single_entry): Get nearest dominator of a
basic_block with single entry. Return NULL if we get to the
beginning of a function.
(get_nearest_pdom_with_single_exit): Get nearest post-dominator of
a basic_block with single exit. Return NULL if we get to the
beginning of a function.
(print_sese): Pretty-print SESE.
(merge_sese): Merge two SESEs if possible and return the new SESE.
(build_scop_depth): Start building the SCoP within a loop nest.
(build_scop_breadth): Start building the SCoP at a single loop
depth. Merge adjacent SESEs if valid.
(can_represent_loop_1): Returns true if Graphite can represent
loop inside SCoP. Helper for can_represent_loop.
(can_represent_loop): Returns true if Graphite can represent LOOP
and all its nested loops in SCoP.
(loop_is_valid_scop): Returns true if LOOP and all its nests
constitute a valid SCoP.
(region_has_one_loop): Returns true of a region has only one loop.
(add_scop): Add SCoP to the list of valid scops. Removes an
already existing scop if it intersects with or subsumed by this
one.
(harmful_stmt_in_region): Returns true if SCoP has any statment
which cannot be represented by Graphite.
(subsumes): Returns true of SCoP S1 subsumes SCoP S2.
(remove_subscops): Remove any SCoP from the list of already found
SCoPs, if subsumed by S1.
(intersects): Return true if region bounded by SCoPs S1 and S2
intersect.
(remove_intersecting_scops): Remove any SCoP which intersects with
S1.
* graphite.c (print_graphite_scop_statistics):
(print_graphite_statistics): Print SCoP info while debugging.
(graphite_initialize): Early exit in case number of loops in a
function is less than PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION or
basic blocks are more than PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION.
(graphite_finalize):
* params.def: Add PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION.
* sese.h (sese_loop_depth): Remove unnecessary gcc_assert.
(recompute_all_dominators): Recalculate POST_DOMINATORS.
* tree-cfg.c (print_loops): Print the function name while printing
loops.
gcc/testsuite/ChangeLog:
2015-09-27 Aditya Kumar <hiraditya@msn.com>
Sebastian Pop <s.pop@samsung.com>
* gcc.dg/graphite/block-1.c: Modified to match the pattern.
* gcc.dg/graphite/block-3.c: Same.
* gcc.dg/graphite/block-4.c: Same.
* gcc.dg/graphite/block-5.c: Same.
* gcc.dg/graphite/block-6.c: Same.
* gcc.dg/graphite/block-7.c: Same.
* gcc.dg/graphite/block-8.c: Same.
* gcc.dg/graphite/block-pr47654.c: Same.
* gcc.dg/graphite/interchange-0.c: Same.
* gcc.dg/graphite/interchange-1.c: Same.
* gcc.dg/graphite/interchange-10.c: Same.
* gcc.dg/graphite/interchange-11.c: Same.
* gcc.dg/graphite/interchange-12.c: Same.
* gcc.dg/graphite/interchange-13.c: Same.
* gcc.dg/graphite/interchange-14.c: Same.
* gcc.dg/graphite/interchange-15.c: Same.
* gcc.dg/graphite/interchange-3.c: Same.
* gcc.dg/graphite/interchange-4.c: Same.
* gcc.dg/graphite/interchange-5.c: Same.
* gcc.dg/graphite/interchange-6.c: Same.
* gcc.dg/graphite/interchange-7.c: Same.
* gcc.dg/graphite/interchange-8.c: Same.
* gcc.dg/graphite/interchange-9.c: Same.
* gcc.dg/graphite/interchange-mvt.c: Same.
* gcc.dg/graphite/pr35356-1.c (foo): Same.
* gcc.dg/graphite/pr35356-3.c: Same.
* gcc.dg/graphite/pr37485.c: Same.
* gcc/testsuite/gcc.dg/graphite/run-id-pr67700-1.c: New test case.
* gcc.dg/graphite/scop-1.c (int toto): Modified to match the pattern.
* gcc.dg/graphite/scop-11.c: Same.
* gcc.dg/graphite/scop-5.c: Same.
* gcc.dg/graphite/uns-block-1.c: Same.
* gcc.dg/graphite/uns-interchange-9.c: Same.
* gfortran.dg/graphite/block-1.f90: Same.
* gfortran.dg/graphite/interchange-3.f90: Same.
* gfortran.dg/graphite/pr14741.f90: Same.
From-SVN: r228215
2015-09-02 Sebastian Pop <s.pop@samsung.com>
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac (HAVE_ISL_CTX_MAX_OPERATIONS): Detect.
* graphite-optimize-isl.c (optimize_isl): Stop computation when
PARAM_MAX_ISL_OPERATIONS is reached.
* params.def (PARAM_MAX_ISL_OPERATIONS): Add.
* graphite-dependences.c (extend_schedule): Remove gcc_asserts on
result equal to isl_stat_ok as the status now can be isl_error_quota.
(subtract_commutative_associative_deps): Same.
(compute_deps): Same.
testsuite/
* gcc.dg/graphite/uns-interchange-12.c: Adjust pattern to pass with
both isl-0.12 and isl-0.15.
* gcc.dg/graphite/uns-interchange-14.c: Same.
* gcc.dg/graphite/uns-interchange-15.c: Same.
* gcc.dg/graphite/uns-interchange-mvt.c: Same.
From-SVN: r227572
2015-09-03 Tom de Vries <tom@codesourcery.com>
* doc/invoke.texi (parloops-chunk-size): Add item.
* params.def (PARAM_PARLOOPS_CHUNK_SIZE): Add DEFPARAM.
* tree-parloops.c: Include params.h.
(create_parallel_loop): Set chunk-size of schedule of omp-for loop, if
param parloops-chunk-size is used.
From-SVN: r227434
* params.def (PARAM_MAX_POW_SQRT_DEPTH): New param.
* tree-ssa-math-opts.c: Include params.h
(pow_synth_sqrt_info): New struct.
(representable_as_half_series_p): New function.
(get_fn_chain): Likewise.
(print_nested_fn): Likewise.
(dump_fractional_sqrt_sequence): Likewise.
(dump_integer_part): Likewise.
(expand_pow_as_sqrts): Likewise.
(gimple_expand_builtin_pow): Use above to attempt to expand
pow as series of square roots. Removed now unused variables.
* gcc.target/aarch64/pow-sqrt-synth-1.c: New test.
* gcc.dg/pow-sqrt.x: New file.
* gcc.dg/pow-sqrt-1.c: New test.
* gcc.dg/pow-sqrt-2.c: Likewise.
* gcc.dg/pow-sqrt-3.c: Likewise.
From-SVN: r223167
PR ipa/65478
* params.def (PARAM_IPA_CP_RECURSION_PENALTY) : New.
(PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise.
* ipa-prop.h (ipa_node_params): New flags node_within_scc and
node_calling_single_call.
* ipa-cp.c (count_callers): New function.
(set_single_call_flag): Likewise.
(initialize_node_lattices): Count callers and set single_flag_call if
necessary.
(incorporate_penalties): New function.
(good_cloning_opportunity_p): Use it, dump new flags.
(propagate_constants_topo): Set node_within_scc flag if appropriate.
* doc/invoke.texi (ipa-cp-recursion-penalty,
ipa-cp-single-call-pentalty): Document.
From-SVN: r221763
2015-02-27 Vladimir Makarov <vmakarov@redhat.com>
* params.def (PARAM_LRA_INHERITANCE_EBB_PROBABILITY_CUTOFF): Fix
a typo in the description.
From-SVN: r221071
PR tree-optimization/63989
* params.def (PARAM_MAX_TRACKED_STRLENS): Increment default
from 1000 to 10000.
* tree-ssa-strlen.c (get_strinfo): Moved earlier.
(get_stridx): If we don't have a record for certain SSA_NAME,
but it is POINTER_PLUS_EXPR of some SSA_NAME we do with
constant offset, call get_stridx_plus_constant.
(get_stridx_plus_constant): New function.
(zero_length_string): Don't use get_stridx here.
* gcc.dg/strlenopt-27.c: New test.
From-SVN: r219362
* params.def (PARAM_MAX_COMPLETELY_PEELED_INSNS): Increase to 200.
* config/i386/i386.c (ix86_option_override_internal): Do not increase
PARAM_MAX_COMPLETELY_PEELED_INSNS.
From-SVN: r217971
2014-10-06 Rong Xu <xur@google.com>
* gcc/params.def (PARAM_INDIR_CALL_TOPN_PROFILE): New param.
* gcc/tree-profile.c: (params.h): New include.
(init_ic_make_global_vars): Make __gcov_indirect_call_topn_callee
and __gcov_indirect_call_topn_counters for
indirect_call_topn_profile.
(gimple_init_edge_profiler): New decls for
__gcov_indirect_call_topn_profiler.
(gimple_gen_ic_profiler): Generate the correct profiler call.
(gimple_gen_ic_func_profiler): Fix format.
* gcc/value-prof.c (params.h): New include.
(dump_histogram_value): Hanlde indirect_call_topn counters.
(stream_in_histogram_value): Ditto.
(gimple_indirect_call_to_profile): Use indirect_call_topn
profile when PARAM_INDIR_CALL_TOPN_PROFILE is set.
(gimple_find_values_to_profile): Hanlde indirect_call_topn
counters.
* gcc/value-prof.h (enum hist_type): Histrogram type for
indirect_call_topn counters.
* gcc/profile.c (instrument_values): Instrument
indirect_call_topn counters.
From-SVN: r215963