Merge of HSA

2016-01-19  Martin Jambor  <mjambor@suse.cz>
	    Martin Liska  <mliska@suse.cz>
	    Michael Matz <matz@suse.de>

libgomp/
	* plugin/Makefrag.am: Add HSA plugin requirements.
	* plugin/configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
	(HSA_RUNTIME_LIB): Likewise.
	(HSA_RUNTIME_CPPFLAGS): Likewise.
	(HSA_RUNTIME_INCLUDE): New substitution.
	(HSA_RUNTIME_LIB): Likewise.
	(HSA_RUNTIME_LDFLAGS): Likewise.
	(hsa-runtime): New configure option.
	(hsa-runtime-include): Likewise.
	(hsa-runtime-lib): Likewise.
	(PLUGIN_HSA): New substitution variable.
	Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
	configure options.
	(PLUGIN_HSA_CPPFLAGS): Likewise.
	(PLUGIN_HSA_LDFLAGS): Likewise.
	(PLUGIN_HSA_LIBS): Likewise.
	Check that we have access to HSA run-time.
	* libgomp-plugin.h (offload_target_type): New element
	OFFLOAD_TARGET_TYPE_HSA.
	* libgomp.h (gomp_target_task): New fields firstprivate_copies and
	args.
	(bool gomp_create_target_task): Updated.
	(gomp_device_descr): Extra parameter of run_func and async_run_func,
	new field can_run_func.
	* libgomp_g.h (GOMP_target_ext): Update prototype.
	* oacc-host.c (host_run): Added a new parameter args.
	* target.c (calculate_firstprivate_requirements): New function.
	(copy_firstprivate_data): Likewise.
	(gomp_target_fallback_firstprivate): Use them.
	(gomp_target_unshare_firstprivate): New function.
	(gomp_get_target_fn_addr): Allow returning NULL for shared memory
	devices.
	(GOMP_target): Do host fallback for all shared memory devices.  Do not
	pass any args to plugins.
	(GOMP_target_ext): Introduce device-specific argument parameter args.
	Allow host fallback if device shares memory.  Do not remap data if
	device has shared memory.
	(gomp_target_task_fn): Likewise.  Also treat shared memory devices
	like host fallback for mappings.
	(GOMP_target_data): Treat shared memory devices like host fallback.
	(GOMP_target_data_ext): Likewise.
	(GOMP_target_update): Likewise.
	(GOMP_target_update_ext): Likewise.  Also pass NULL as args to
	gomp_create_target_task.
	(GOMP_target_enter_exit_data): Likewise.
	(omp_target_alloc): Treat shared memory devices like host fallback.
	(omp_target_free): Likewise.
	(omp_target_is_present): Likewise.
	(omp_target_memcpy): Likewise.
	(omp_target_memcpy_rect): Likewise.
	(omp_target_associate_ptr): Likewise.
	(gomp_load_plugin_for_device): Also load can_run.
	* task.c (GOMP_PLUGIN_target_task_completion): Free
	firstprivate_copies.
	(gomp_create_target_task): Accept new argument args and store it to
	ttask.
	* plugin/plugin-hsa.c: New file.

gcc/
	* Makefile.in (OBJS): Add new source files.
	(GTFILES): Add hsa.c.
	* common.opt (disable_hsa): New variable.
	(-Whsa): New warning.
	* config.in (ENABLE_HSA): New.
	* configure.ac: Treat hsa differently from other accelerators.
	(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
	$enable_offloading.
	(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
	* doc/install.texi (Configuration): Document --with-hsa-runtime,
	--with-hsa-runtime-include, --with-hsa-runtime-lib and
	--with-hsa-kmt-lib.
	* doc/invoke.texi (-Whsa): Document.
	(hsa-gen-debug-stores): Likewise.
	* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
	to invoke offload compiler for hsa accelerator.
	* opts.c (common_handle_option): Determine whether HSA offloading
	should be performed.
	* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.
	* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
	GF_OMP_FOR_KIND_GRID_LOOP.
	(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
	(pp_gimple_stmt_1): Likewise.
	* gimple-walk.c (walk_gimple_stmt): Likewise.
	* gimple.c (gimple_build_omp_grid_body): New function.
	(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple.def (GIMPLE_OMP_GRID_BODY): New.
	* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY,
	GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and
	GF_OMP_TEAMS_GRID_PHONY.
	(gimple_statement_omp_single_layout): Updated comments.
	(gimple_build_omp_grid_body): New function.
	(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
	(gimple_omp_for_grid_phony): New function.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_parallel_grid_phony): Likewise.
	(gimple_omp_parallel_set_grid_phony): Likewise.
	(gimple_omp_teams_grid_phony): Likewise.
	(gimple_omp_teams_set_grid_phony): Likewise.
	(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
	* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
	(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
	(BUILT_IN_GOMP_TARGET): Updated type.
	* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
	(adjust_for_condition): New function.
	(get_omp_for_step_from_incr): Likewise.
	(extract_omp_for_data): Moved parts to adjust_for_condition and
	get_omp_for_step_from_incr.
	(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
	(fixup_child_record_type): Bail out if receiver_decl is NULL.
	(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
	(scan_omp_parallel): Do not create child functions for phony
	constructs.
	(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
	(scan_omp_1_op): Checking assert we are not remapping to
	ERROR_MARK.  Also handle GIMPLE_OMP_GRID_BODY.
	(parallel_needs_hsa_kernel_p): New function.
	(expand_parallel_call): Register appropriate parallel child
	functions as HSA kernels.
	(grid_launch_attributes_trees): New type.
	(grid_attr_trees): New variable.
	(grid_create_kernel_launch_attr_types): New function.
	(grid_insert_store_range_dim): Likewise.
	(grid_get_kernel_launch_attributes): Likewise.
	(get_target_argument_identifier_1): Likewise.
	(get_target_argument_identifier): Likewise.
	(get_target_argument_value): Likewise.
	(push_target_argument_according_to_value): Likewise.
	(get_target_arguments): Likewise.
	(expand_omp_target): Call get_target_arguments instead of looking
	up for teams and thread limit.
	(grid_expand_omp_for_loop): New function.
	(grid_arg_decl_map): New type.
	(grid_remap_kernel_arg_accesses): New function.
	(grid_expand_target_kernel_body): New function.
	(expand_omp): Call it.
	(lower_omp_for): Do not emit phony constructs.
	(lower_omp_taskreg): Do not emit phony constructs but create for them
	a temporary variable receiver_decl.
	(lower_omp_taskreg): Do not emit phony constructs.
	(lower_omp_teams): Likewise.
	(lower_omp_grid_body): New function.
	(lower_omp_1): Call it.
	(grid_reg_assignment_to_local_var_p): New function.
	(grid_seq_only_contains_local_assignments): Likewise.
	(grid_find_single_omp_among_assignments_1): Likewise.
	(grid_find_single_omp_among_assignments): Likewise.
	(grid_find_ungridifiable_statement): Likewise.
	(grid_target_follows_gridifiable_pattern): Likewise.
	(grid_remap_prebody_decls): Likewise.
	(grid_copy_leading_local_assignments): Likewise.
	(grid_process_kernel_body_copy): Likewise.
	(grid_attempt_target_gridification): Likewise.
	(grid_gridify_all_targets_stmt): Likewise.
	(grid_gridify_all_targets): Likewise.
	(execute_lower_omp): Call grid_gridify_all_targets.
	(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
	* tree-core.h (omp_clause_code): Added OMP_CLAUSE__GRIDDIM_.
	(tree_omp_clause): Added union field dimension.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.c (omp_clause_num_ops): Added number of arguments of
	OMP_CLAUSE__GRIDDIM_.
	(omp_clause_code_name): Added name of OMP_CLAUSE__GRIDDIM_.
	(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
	(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
	(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
	(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.
	* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
	* tree-pass.h (make_pass_gen_hsail): Declare.
	(make_pass_ipa_hsa): Likewise.
	* ipa-hsa.c: New file.
	* lto-section-in.c (lto_section_name): Add hsa section name.
	* lto-streamer.h (lto_section_type): Add hsa section.
	* timevar.def (TV_IPA_HSA): New.
	* hsa-brig-format.h: New file.
	* hsa-brig.c: New file.
	* hsa-dump.c: Likewise.
	* hsa-gen.c: Likewise.
	* hsa.c: Likewise.
	* hsa.h: Likewise.
	* toplev.c (compile_file): Call hsa_output_brig.
	* hsa-regalloc.c: New file.

gcc/fortran/
	* types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.

gcc/lto/
	* lto-partition.c: Include "hsa.h".
	(add_symbol_to_partition_1): Put hsa implementations into the
	same partition as host implementations.

liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New
	unused parameter.
	(GOMP_OFFLOAD_run): Likewise.

include/
	* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
	(GOMP_VERSION_HSA): Likewise.
	(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
	(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
	(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
	(GOMP_TARGET_ARG_ID_MASK): Likewise.
	(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
	(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
	(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
	(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.

From-SVN: r232549
Martin Jambor 2016-01-19 11:35:10 +01:00
parent 2bedb645f2
commit b2b4005150
60 changed files with 18449 additions and 198 deletions

gcc/ChangeLog

@ -1,3 +1,135 @@
2016-01-19 Martin Jambor <mjambor@suse.cz>
Martin Liska <mliska@suse.cz>
Michael Matz <matz@suse.de>
* Makefile.in (OBJS): Add new source files.
(GTFILES): Add hsa.c.
* common.opt (disable_hsa): New variable.
(-Whsa): New warning.
* config.in (ENABLE_HSA): New.
* configure.ac: Treat hsa differently from other accelerators.
(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
$enable_offloading.
(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
* doc/install.texi (Configuration): Document --with-hsa-runtime,
--with-hsa-runtime-include, --with-hsa-runtime-lib and
--with-hsa-kmt-lib.
* doc/invoke.texi (-Whsa): Document.
(hsa-gen-debug-stores): Likewise.
* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
to invoke offload compiler for hsa accelerator.
* opts.c (common_handle_option): Determine whether HSA offloading
should be performed.
* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.
* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
GF_OMP_FOR_KIND_GRID_LOOP.
(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_grid_body): New function.
(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
* gimple.def (GIMPLE_OMP_GRID_BODY): New.
* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY,
GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and
GF_OMP_TEAMS_GRID_PHONY.
(gimple_statement_omp_single_layout): Updated comments.
(gimple_build_omp_grid_body): New function.
(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
(gimple_omp_for_grid_phony): New function.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_parallel_grid_phony): Likewise.
(gimple_omp_parallel_set_grid_phony): Likewise.
(gimple_omp_teams_grid_phony): Likewise.
(gimple_omp_teams_set_grid_phony): Likewise.
(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
(BUILT_IN_GOMP_TARGET): Updated type.
* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
(adjust_for_condition): New function.
(get_omp_for_step_from_incr): Likewise.
(extract_omp_for_data): Moved parts to adjust_for_condition and
get_omp_for_step_from_incr.
(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
(fixup_child_record_type): Bail out if receiver_decl is NULL.
(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
(scan_omp_parallel): Do not create child functions for phony
constructs.
(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
(scan_omp_1_op): Checking assert we are not remapping to
ERROR_MARK. Also handle GIMPLE_OMP_GRID_BODY.
(parallel_needs_hsa_kernel_p): New function.
(expand_parallel_call): Register appropriate parallel child
functions as HSA kernels.
(grid_launch_attributes_trees): New type.
(grid_attr_trees): New variable.
(grid_create_kernel_launch_attr_types): New function.
(grid_insert_store_range_dim): Likewise.
(grid_get_kernel_launch_attributes): Likewise.
(get_target_argument_identifier_1): Likewise.
(get_target_argument_identifier): Likewise.
(get_target_argument_value): Likewise.
(push_target_argument_according_to_value): Likewise.
(get_target_arguments): Likewise.
(expand_omp_target): Call get_target_arguments instead of looking
up for teams and thread limit.
(grid_expand_omp_for_loop): New function.
(grid_arg_decl_map): New type.
(grid_remap_kernel_arg_accesses): New function.
(grid_expand_target_kernel_body): New function.
(expand_omp): Call it.
(lower_omp_for): Do not emit phony constructs.
(lower_omp_taskreg): Do not emit phony constructs but create for them
a temporary variable receiver_decl.
(lower_omp_taskreg): Do not emit phony constructs.
(lower_omp_teams): Likewise.
(lower_omp_grid_body): New function.
(lower_omp_1): Call it.
(grid_reg_assignment_to_local_var_p): New function.
(grid_seq_only_contains_local_assignments): Likewise.
(grid_find_single_omp_among_assignments_1): Likewise.
(grid_find_single_omp_among_assignments): Likewise.
(grid_find_ungridifiable_statement): Likewise.
(grid_target_follows_gridifiable_pattern): Likewise.
(grid_remap_prebody_decls): Likewise.
(grid_copy_leading_local_assignments): Likewise.
(grid_process_kernel_body_copy): Likewise.
(grid_attempt_target_gridification): Likewise.
(grid_gridify_all_targets_stmt): Likewise.
(grid_gridify_all_targets): Likewise.
(execute_lower_omp): Call grid_gridify_all_targets.
(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
* tree-core.h (omp_clause_code): Added OMP_CLAUSE__GRIDDIM_.
(tree_omp_clause): Added union field dimension.
* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
* tree.c (omp_clause_num_ops): Added number of arguments of
OMP_CLAUSE__GRIDDIM_.
(omp_clause_code_name): Added name of OMP_CLAUSE__GRIDDIM_.
(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.
* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
* tree-pass.h (make_pass_gen_hsail): Declare.
(make_pass_ipa_hsa): Likewise.
* ipa-hsa.c: New file.
* lto-section-in.c (lto_section_name): Add hsa section name.
* lto-streamer.h (lto_section_type): Add hsa section.
* timevar.def (TV_IPA_HSA): New.
* hsa-brig-format.h: New file.
* hsa-brig.c: New file.
* hsa-dump.c: Likewise.
* hsa-gen.c: Likewise.
* hsa.c: Likewise.
* hsa.h: Likewise.
* toplev.c (compile_file): Call hsa_output_brig.
* hsa-regalloc.c: New file.
2016-01-18 Jeff Law <law@redhat.com>
PR tree-optimization/69320

gcc/Makefile.in

@ -1297,6 +1297,11 @@ OBJS = \
graphite-sese-to-poly.o \
gtype-desc.o \
haifa-sched.o \
hsa.o \
hsa-gen.o \
hsa-regalloc.o \
hsa-brig.o \
hsa-dump.o \
hw-doloop.o \
hwint.o \
ifcvt.o \
@ -1321,6 +1326,7 @@ OBJS = \
ipa-icf.o \
ipa-icf-gimple.o \
ipa-reference.o \
ipa-hsa.o \
ipa-ref.o \
ipa-utils.o \
ipa.o \
@ -2404,6 +2410,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
$(srcdir)/sancov.c \
$(srcdir)/ipa-devirt.c \
$(srcdir)/internal-fn.h \
$(srcdir)/hsa.c \
@all_gtfiles@
# Compute the list of GT header files from the corresponding C sources,

gcc/builtin-types.def

@ -478,6 +478,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_ULLPTR_ULLPTR_ULLPTR,
BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_PTR_ULONGLONG,
BT_PTR_ULONGLONG)
DEF_FUNCTION_TYPE_4 (BT_FN_VOID_UINT_PTR_INT_PTR, BT_VOID, BT_INT, BT_PTR,
BT_INT, BT_PTR)
DEF_FUNCTION_TYPE_5 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VALIST_ARG,
BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING,
@ -555,10 +557,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
BT_BOOL, BT_UINT, BT_PTR, BT_INT)
DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,

gcc/common.opt

@ -239,6 +239,10 @@ Inserts call to __sanitizer_cov_trace_pc into every basic block.
Variable
bool dump_base_name_prefixed = false
; Flag whether HSA generation has been explicitly disabled
Variable
bool flag_disable_hsa = false
###
Driver
@ -593,6 +597,10 @@ Wfree-nonheap-object
Common Var(warn_free_nonheap_object) Init(1) Warning
Warn when attempting to free a non-heap object.
Whsa
Common Var(warn_hsa) Init(1) Warning
Warn when a function cannot be expanded to HSAIL.
Winline
Common Var(warn_inline) Warning
Warn when an inlined function cannot be inlined.

gcc/config.in

@ -144,6 +144,12 @@
#endif
/* Define this to enable support for generating HSAIL. */
#ifndef USED_FOR_TARGET
#undef ENABLE_HSA
#endif
/* Define if gcc should always pass --build-id to linker. */
#ifndef USED_FOR_TARGET
#undef ENABLE_LD_BUILDID

gcc/configure (vendored, 19 lines changed)

@ -7700,6 +7700,13 @@ fi
for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
tgt=`echo $tgt | sed 's/=.*//'`
if echo "$tgt" | grep "^hsa" > /dev/null ; then
enable_hsa=1
else
enable_offloading=1
fi
if test x"$offload_targets" = x; then
offload_targets=$tgt
else
@ -7711,7 +7718,7 @@ cat >>confdefs.h <<_ACEOF
#define OFFLOAD_TARGETS "$offload_targets"
_ACEOF
if test x"$offload_targets" != x; then
if test x"$enable_offloading" != x; then
$as_echo "#define ENABLE_OFFLOADING 1" >>confdefs.h
@ -7721,6 +7728,12 @@ $as_echo "#define ENABLE_OFFLOADING 0" >>confdefs.h
fi
if test x"$enable_hsa" = x1 ; then
$as_echo "#define ENABLE_HSA 1" >>confdefs.h
fi
# Check whether --with-multilib-list was given.
if test "${with_multilib_list+set}" = set; then :
@ -18406,7 +18419,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
#line 18409 "configure"
#line 18422 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H
@ -18512,7 +18525,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
#line 18515 "configure"
#line 18528 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H

gcc/configure.ac

@ -940,6 +940,13 @@ AC_SUBST(accel_dir_suffix)
for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
tgt=`echo $tgt | sed 's/=.*//'`
if echo "$tgt" | grep "^hsa" > /dev/null ; then
enable_hsa=1
else
enable_offloading=1
fi
if test x"$offload_targets" = x; then
offload_targets=$tgt
else
@ -948,7 +955,7 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
done
AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
[Define to offload targets, separated by commas.])
if test x"$offload_targets" != x; then
if test x"$enable_offloading" != x; then
AC_DEFINE(ENABLE_OFFLOADING, 1,
[Define this to enable support for offloading.])
else
@ -956,6 +963,11 @@ else
[Define this to enable support for offloading.])
fi
if test x"$enable_hsa" = x1 ; then
AC_DEFINE(ENABLE_HSA, 1,
[Define this to enable support for generating HSAIL.])
fi
AC_ARG_WITH(multilib-list,
[AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and x86-64 only)])],
:,

gcc/doc/install.texi

@ -1992,6 +1992,28 @@ specifying paths @var{path1}, @dots{}, @var{pathN}.
% @var{srcdir}/configure \
--enable-offload-target=i686-unknown-linux-gnu=/path/to/i686/compiler,x86_64-pc-linux-gnu
@end smallexample
If @samp{hsa} is specified as one of the targets, the compiler will be
built with support for HSA GPU accelerators. Because the same
compiler will emit the accelerator code, no path should be specified.
@item --with-hsa-runtime=@var{pathname}
@itemx --with-hsa-runtime-include=@var{pathname}
@itemx --with-hsa-runtime-lib=@var{pathname}
If you configure GCC with HSA offloading but do not have the HSA
run-time library installed in a standard location then you can
explicitly specify the directory where they are installed. The
@option{--with-hsa-runtime=@/@var{hsainstalldir}} option is a
shorthand for
@option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
@option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
@item --with-hsa-kmt-lib=@var{pathname}
If you configure GCC with HSA offloading but do not have the HSA
KMT library installed in a standard location then you can
explicitly specify the directory where it resides.
@end table
@subheading Cross-Compiler-Specific Options

gcc/doc/invoke.texi

@ -305,7 +305,7 @@ Objective-C and Objective-C++ Dialects}.
-Wunused-but-set-parameter -Wunused-but-set-variable @gol
-Wuseless-cast -Wvariadic-macros -Wvector-operation-performance @gol
-Wvla -Wvolatile-register-var -Wwrite-strings @gol
-Wzero-as-null-pointer-constant}
-Wzero-as-null-pointer-constant -Whsa}
@item C and Objective-C-only Warning Options
@gccoptlist{-Wbad-function-cast -Wmissing-declarations @gol
@ -5693,6 +5693,10 @@ Suppress warnings when a positional initializer is used to initialize
a structure that has been marked with the @code{designated_init}
attribute.
@item -Whsa
Issue a warning when HSAIL cannot be emitted for the compiled function or
OpenMP construct.
@end table
@node Debugging Options
@ -9508,6 +9512,12 @@ dynamic, guided, auto, runtime). The default is static.
Maximum depth of recursion when querying properties of SSA names in things
like fold routines. One level of recursion corresponds to following a
use-def chain.
@item hsa-gen-debug-stores
Enable emission of special debug stores within HSA kernels which are
then read and reported by libgomp plugin. Generation of these stores
is disabled by default, use @option{--param hsa-gen-debug-stores=1} to
enable it.
@end table
@end table

gcc/fortran/ChangeLog

@ -1,3 +1,9 @@
2016-01-19 Martin Jambor <mjambor@suse.cz>
* types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
2016-01-15 Paul Thomas <pault@gcc.gnu.org>
PR fortran/64324

gcc/fortran/types.def

@ -159,6 +159,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_ULLPTR_ULLPTR_ULLPTR,
BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_PTR_ULONGLONG,
BT_PTR_ULONGLONG)
DEF_FUNCTION_TYPE_4 (BT_FN_VOID_UINT_PTR_INT_PTR, BT_VOID, BT_INT, BT_PTR,
BT_INT, BT_PTR)
DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT,
@ -220,10 +222,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
BT_BOOL, BT_UINT, BT_PTR, BT_INT)
DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,

gcc/gimple-low.c

@ -358,6 +358,7 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data *data)
case GIMPLE_OMP_TASK:
case GIMPLE_OMP_TARGET:
case GIMPLE_OMP_TEAMS:
case GIMPLE_OMP_GRID_BODY:
data->cannot_fallthru = false;
lower_omp_directive (gsi, data);
data->cannot_fallthru = false;

gcc/gimple-pretty-print.c

@ -1187,6 +1187,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gomp_for *gs, int spc, int flags)
case GF_OMP_FOR_KIND_CILKSIMD:
pp_string (buffer, "#pragma simd");
break;
case GF_OMP_FOR_KIND_GRID_LOOP:
pp_string (buffer, "#pragma omp for grid_loop");
break;
default:
gcc_unreachable ();
}
@ -1494,6 +1497,9 @@ dump_gimple_omp_block (pretty_printer *buffer, gimple *gs, int spc, int flags)
case GIMPLE_OMP_SECTION:
pp_string (buffer, "#pragma omp section");
break;
case GIMPLE_OMP_GRID_BODY:
pp_string (buffer, "#pragma omp gridified body");
break;
default:
gcc_unreachable ();
}
@ -2301,6 +2307,7 @@ pp_gimple_stmt_1 (pretty_printer *buffer, gimple *gs, int spc, int flags)
case GIMPLE_OMP_MASTER:
case GIMPLE_OMP_TASKGROUP:
case GIMPLE_OMP_SECTION:
case GIMPLE_OMP_GRID_BODY:
dump_gimple_omp_block (buffer, gs, spc, flags);
break;

gcc/gimple-walk.c

@ -655,6 +655,7 @@ walk_gimple_stmt (gimple_stmt_iterator *gsi, walk_stmt_fn callback_stmt,
case GIMPLE_OMP_SINGLE:
case GIMPLE_OMP_TARGET:
case GIMPLE_OMP_TEAMS:
case GIMPLE_OMP_GRID_BODY:
ret = walk_gimple_seq_mod (gimple_omp_body_ptr (stmt), callback_stmt,
callback_op, wi);
if (ret)

gcc/gimple.c

@ -954,6 +954,19 @@ gimple_build_omp_master (gimple_seq body)
return p;
}
/* Build a GIMPLE_OMP_GRID_BODY statement.
BODY is the sequence of statements to be executed by the kernel. */
gimple *
gimple_build_omp_grid_body (gimple_seq body)
{
gimple *p = gimple_alloc (GIMPLE_OMP_GRID_BODY, 0);
if (body)
gimple_omp_set_body (p, body);
return p;
}
/* Build a GIMPLE_OMP_TASKGROUP statement.
@ -1807,6 +1820,7 @@ gimple_copy (gimple *stmt)
case GIMPLE_OMP_SECTION:
case GIMPLE_OMP_MASTER:
case GIMPLE_OMP_TASKGROUP:
case GIMPLE_OMP_GRID_BODY:
copy_omp_body:
new_seq = gimple_seq_copy (gimple_omp_body (stmt));
gimple_omp_set_body (copy, new_seq);

gcc/gimple.def

@ -376,6 +376,10 @@ DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT)
CLAUSES is an OMP_CLAUSE chain holding the associated clauses. */
DEFGSCODE(GIMPLE_OMP_ORDERED, "gimple_omp_ordered", GSS_OMP_SINGLE_LAYOUT)
/* GIMPLE_OMP_GRID_BODY <BODY> represents a parallel loop lowered for execution
on a GPU. It is an artificial statement created by omp lowering. */
DEFGSCODE(GIMPLE_OMP_GRID_BODY, "gimple_omp_gpukernel", GSS_OMP)
/* GIMPLE_PREDICT <PREDICT, OUTCOME> specifies a hint for branch prediction.
PREDICT is one of the predictors from predict.def.

gcc/gimple.h

@ -146,6 +146,7 @@ enum gf_mask {
GF_CALL_CTRL_ALTERING = 1 << 7,
GF_CALL_WITH_BOUNDS = 1 << 8,
GF_OMP_PARALLEL_COMBINED = 1 << 0,
GF_OMP_PARALLEL_GRID_PHONY = 1 << 1,
GF_OMP_TASK_TASKLOOP = 1 << 0,
GF_OMP_FOR_KIND_MASK = (1 << 4) - 1,
GF_OMP_FOR_KIND_FOR = 0,
@ -153,12 +154,14 @@ enum gf_mask {
GF_OMP_FOR_KIND_TASKLOOP = 2,
GF_OMP_FOR_KIND_CILKFOR = 3,
GF_OMP_FOR_KIND_OACC_LOOP = 4,
GF_OMP_FOR_KIND_GRID_LOOP = 5,
/* Flag for SIMD variants of OMP_FOR kinds. */
GF_OMP_FOR_SIMD = 1 << 3,
GF_OMP_FOR_KIND_SIMD = GF_OMP_FOR_SIMD | 0,
GF_OMP_FOR_KIND_CILKSIMD = GF_OMP_FOR_SIMD | 1,
GF_OMP_FOR_COMBINED = 1 << 4,
GF_OMP_FOR_COMBINED_INTO = 1 << 5,
GF_OMP_FOR_GRID_PHONY = 1 << 6,
GF_OMP_TARGET_KIND_MASK = (1 << 4) - 1,
GF_OMP_TARGET_KIND_REGION = 0,
GF_OMP_TARGET_KIND_DATA = 1,
@ -172,6 +175,7 @@ enum gf_mask {
GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
GF_OMP_TEAMS_GRID_PHONY = 1 << 0,
/* True on an GIMPLE_OMP_RETURN statement if the return does not require
a thread synchronization via some sort of barrier. The exact barrier
@ -733,7 +737,7 @@ struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
{
/* [ WORD 1-7 ] : base class */
/* [ WORD 7 ] */
/* [ WORD 8 ] */
tree clauses;
};
@ -1454,6 +1458,7 @@ gomp_task *gimple_build_omp_task (gimple_seq, tree, tree, tree, tree,
tree, tree);
gimple *gimple_build_omp_section (gimple_seq);
gimple *gimple_build_omp_master (gimple_seq);
gimple *gimple_build_omp_grid_body (gimple_seq);
gimple *gimple_build_omp_taskgroup (gimple_seq);
gomp_continue *gimple_build_omp_continue (tree, tree);
gomp_ordered *gimple_build_omp_ordered (gimple_seq, tree);
@ -1714,6 +1719,7 @@ gimple_has_substatements (gimple *g)
case GIMPLE_OMP_CRITICAL:
case GIMPLE_WITH_CLEANUP_EXPR:
case GIMPLE_TRANSACTION:
case GIMPLE_OMP_GRID_BODY:
return true;
default:
@ -5079,6 +5085,24 @@ gimple_omp_for_set_pre_body (gimple *gs, gimple_seq pre_body)
omp_for_stmt->pre_body = pre_body;
}
/* Return the kernel_phony of OMP_FOR statement. */
static inline bool
gimple_omp_for_grid_phony (const gomp_for *omp_for)
{
return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_PHONY) != 0;
}
/* Set kernel_phony flag of OMP_FOR to VALUE. */
static inline void
gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value)
{
if (value)
omp_for->subcode |= GF_OMP_FOR_GRID_PHONY;
else
omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY;
}
/* Return the clauses associated with OMP_PARALLEL GS. */
@ -5165,6 +5189,24 @@ gimple_omp_parallel_set_data_arg (gomp_parallel *omp_parallel_stmt,
omp_parallel_stmt->data_arg = data_arg;
}
/* Return the kernel_phony flag of OMP_PARALLEL_STMT. */
static inline bool
gimple_omp_parallel_grid_phony (const gomp_parallel *stmt)
{
return (gimple_omp_subcode (stmt) & GF_OMP_PARALLEL_GRID_PHONY) != 0;
}
/* Set kernel_phony flag of OMP_PARALLEL_STMT to VALUE. */
static inline void
gimple_omp_parallel_set_grid_phony (gomp_parallel *stmt, bool value)
{
if (value)
stmt->subcode |= GF_OMP_PARALLEL_GRID_PHONY;
else
stmt->subcode &= ~GF_OMP_PARALLEL_GRID_PHONY;
}
/* Return the clauses associated with OMP_TASK GS. */
@ -5638,6 +5680,24 @@ gimple_omp_teams_set_clauses (gomp_teams *omp_teams_stmt, tree clauses)
omp_teams_stmt->clauses = clauses;
}
/* Return the kernel_phony flag of an OMP_TEAMS_STMT. */
static inline bool
gimple_omp_teams_grid_phony (const gomp_teams *omp_teams_stmt)
{
return (gimple_omp_subcode (omp_teams_stmt) & GF_OMP_TEAMS_GRID_PHONY) != 0;
}
/* Set kernel_phony flag of an OMP_TEAMS_STMT to VALUE. */
static inline void
gimple_omp_teams_set_grid_phony (gomp_teams *omp_teams_stmt, bool value)
{
if (value)
omp_teams_stmt->subcode |= GF_OMP_TEAMS_GRID_PHONY;
else
omp_teams_stmt->subcode &= ~GF_OMP_TEAMS_GRID_PHONY;
}
/* Return the clauses associated with OMP_SECTIONS GS. */
@ -6002,7 +6062,8 @@ gimple_return_set_retbnd (gimple *gs, tree retval)
case GIMPLE_OMP_RETURN: \
case GIMPLE_OMP_ATOMIC_LOAD: \
case GIMPLE_OMP_ATOMIC_STORE: \
case GIMPLE_OMP_CONTINUE
case GIMPLE_OMP_CONTINUE: \
case GIMPLE_OMP_GRID_BODY
static inline bool
is_gimple_omp (const gimple *stmt)
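
The three pairs of `*_grid_phony` accessors added in the gimple.h hunks above all follow one pattern: test, set or clear a single bit in the statement's subcode. The following self-contained C sketch shows that pattern in isolation. Only the bit value `1 << 6` for `GF_OMP_FOR_GRID_PHONY` is taken from the diff; `toy_stmt` and the `toy_*` functions are hypothetical stand-ins, not GCC types or APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit value copied from the gf_mask hunk in the diff.  */
enum { GF_OMP_FOR_GRID_PHONY = 1 << 6 };

/* Hypothetical stand-in for a GIMPLE statement: only the subcode
   bit-field matters for this illustration.  */
struct toy_stmt
{
  unsigned subcode;
};

/* Return true if the phony flag is set on S.  */
bool
toy_grid_phony (const struct toy_stmt *s)
{
  return (s->subcode & GF_OMP_FOR_GRID_PHONY) != 0;
}

/* Set or clear the phony flag on S without touching other bits.  */
void
toy_set_grid_phony (struct toy_stmt *s, bool value)
{
  if (value)
    s->subcode |= GF_OMP_FOR_GRID_PHONY;
  else
    s->subcode &= ~GF_OMP_FOR_GRID_PHONY;
}

/* Small usage example: an unrelated bit (1 << 4) survives both the
   set and the clear.  */
void
toy_self_check (void)
{
  struct toy_stmt s = { 1u << 4 };

  toy_set_grid_phony (&s, true);
  assert (toy_grid_phony (&s));
  assert (s.subcode == ((1u << 4) | (1u << 6)));

  toy_set_grid_phony (&s, false);
  assert (!toy_grid_phony (&s));
  assert (s.subcode == (1u << 4));
}
```

Keeping the flag in the subcode, as the diff does, means no extra field is needed in the statement; the cost is that each statement kind must reserve a bit that does not collide with its existing subcode flags.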

gcc/hsa-brig-format.h (new file, 1234 lines; diff suppressed because it is too large)

gcc/hsa-brig.c (new file, 2560 lines; diff suppressed because it is too large)

gcc/hsa-dump.c (new file, 1189 lines; diff suppressed because it is too large)

gcc/hsa-gen.c (new file, 6151 lines; diff suppressed because it is too large)

gcc/hsa-regalloc.c (new file, 719 lines)

@ -0,0 +1,719 @@
/* HSAIL IL Register allocation and out-of-SSA.
Copyright (C) 2013-2016 Free Software Foundation, Inc.
Contributed by Michael Matz <matz@suse.de>
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "is-a.h"
#include "vec.h"
#include "tree.h"
#include "dominance.h"
#include "cfg.h"
#include "cfganal.h"
#include "function.h"
#include "bitmap.h"
#include "dumpfile.h"
#include "cgraph.h"
#include "print-tree.h"
#include "cfghooks.h"
#include "symbol-summary.h"
#include "hsa.h"
/* Process a PHI node PHI as a part of naive out-of-SSA. */
static void
naive_process_phi (hsa_insn_phi *phi)
{
unsigned count = phi->operand_count ();
for (unsigned i = 0; i < count; i++)
{
gcc_checking_assert (phi->get_op (i));
hsa_op_base *op = phi->get_op (i);
hsa_bb *hbb;
edge e;
if (!op)
break;
e = EDGE_PRED (phi->m_bb, i);
if (single_succ_p (e->src))
hbb = hsa_bb_for_bb (e->src);
else
{
basic_block old_dest = e->dest;
hbb = hsa_init_new_bb (split_edge (e));
/* If switch insn used this edge, fix jump table. */
hsa_bb *source = hsa_bb_for_bb (e->src);
hsa_insn_sbr *sbr;
if (source->m_last_insn
&& (sbr = dyn_cast <hsa_insn_sbr *> (source->m_last_insn)))
sbr->replace_all_labels (old_dest, hbb->m_bb);
}
hsa_build_append_simple_mov (phi->m_dest, op, hbb);
}
}
/* Naive out-of-SSA. */
static void
naive_outof_ssa (void)
{
basic_block bb;
hsa_cfun->m_in_ssa = false;
FOR_ALL_BB_FN (bb, cfun)
{
hsa_bb *hbb = hsa_bb_for_bb (bb);
hsa_insn_phi *phi;
for (phi = hbb->m_first_phi;
phi;
phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next) : NULL)
naive_process_phi (phi);
/* Zap PHI nodes, they will be deallocated when everything else will. */
hbb->m_first_phi = NULL;
hbb->m_last_phi = NULL;
}
}
/* Return register class number for the given HSA TYPE. 0 means the 'c' one
bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class
and 3 for 'q' 128 bit class. */
static int
m_reg_class_for_type (BrigType16_t type)
{
switch (type)
{
case BRIG_TYPE_B1:
return 0;
case BRIG_TYPE_U8:
case BRIG_TYPE_U16:
case BRIG_TYPE_U32:
case BRIG_TYPE_S8:
case BRIG_TYPE_S16:
case BRIG_TYPE_S32:
case BRIG_TYPE_F16:
case BRIG_TYPE_F32:
case BRIG_TYPE_B8:
case BRIG_TYPE_B16:
case BRIG_TYPE_B32:
case BRIG_TYPE_U8X4:
case BRIG_TYPE_S8X4:
case BRIG_TYPE_U16X2:
case BRIG_TYPE_S16X2:
case BRIG_TYPE_F16X2:
return 1;
case BRIG_TYPE_U64:
case BRIG_TYPE_S64:
case BRIG_TYPE_F64:
case BRIG_TYPE_B64:
case BRIG_TYPE_U8X8:
case BRIG_TYPE_S8X8:
case BRIG_TYPE_U16X4:
case BRIG_TYPE_S16X4:
case BRIG_TYPE_F16X4:
case BRIG_TYPE_U32X2:
case BRIG_TYPE_S32X2:
case BRIG_TYPE_F32X2:
return 2;
case BRIG_TYPE_B128:
case BRIG_TYPE_U8X16:
case BRIG_TYPE_S8X16:
case BRIG_TYPE_U16X8:
case BRIG_TYPE_S16X8:
case BRIG_TYPE_F16X8:
case BRIG_TYPE_U32X4:
case BRIG_TYPE_U64X2:
case BRIG_TYPE_S32X4:
case BRIG_TYPE_S64X2:
case BRIG_TYPE_F32X4:
case BRIG_TYPE_F64X2:
return 3;
default:
gcc_unreachable ();
}
}
/* If the Ith operand of INSN is or contains a register (in an address),
return the address of that register operand. If not, return NULL. */
static hsa_op_reg **
insn_reg_addr (hsa_insn_basic *insn, int i)
{
hsa_op_base *op = insn->get_op (i);
if (!op)
return NULL;
hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op);
if (reg)
return (hsa_op_reg **) insn->get_op_addr (i);
hsa_op_address *addr = dyn_cast <hsa_op_address *> (op);
if (addr && addr->m_reg)
return &addr->m_reg;
return NULL;
}
struct m_reg_class_desc
{
unsigned next_avail, max_num;
unsigned used_num, max_used;
uint64_t used[2];
char cl_char;
};
/* Rewrite the instructions in BB to observe spilled live ranges.
CLASSES is the global register class state. */
static void
rewrite_code_bb (basic_block bb, struct m_reg_class_desc *classes)
{
hsa_bb *hbb = hsa_bb_for_bb (bb);
hsa_insn_basic *insn, *next_insn;
for (insn = hbb->m_first_insn; insn; insn = next_insn)
{
next_insn = insn->m_next;
unsigned count = insn->operand_count ();
for (unsigned i = 0; i < count; i++)
{
gcc_checking_assert (insn->get_op (i));
hsa_op_reg **regaddr = insn_reg_addr (insn, i);
if (regaddr)
{
hsa_op_reg *reg = *regaddr;
if (reg->m_reg_class)
continue;
gcc_assert (reg->m_spill_sym);
int cl = m_reg_class_for_type (reg->m_type);
hsa_op_reg *tmp, *tmp2;
if (insn->op_output_p (i))
tmp = hsa_spill_out (insn, reg, &tmp2);
else
tmp = hsa_spill_in (insn, reg, &tmp2);
*regaddr = tmp;
tmp->m_reg_class = classes[cl].cl_char;
tmp->m_hard_num = (char) (classes[cl].max_num + i);
if (tmp2)
{
gcc_assert (cl == 0);
tmp2->m_reg_class = classes[1].cl_char;
tmp2->m_hard_num = (char) (classes[1].max_num + i);
}
}
}
}
}
/* Dump current function to dump file F, with info specific
to register allocation. */
void
dump_hsa_cfun_regalloc (FILE *f)
{
basic_block bb;
fprintf (f, "\nHSAIL IL for %s\n", hsa_cfun->m_name);
FOR_ALL_BB_FN (bb, cfun)
{
hsa_bb *hbb = (struct hsa_bb *) bb->aux;
bitmap_print (f, hbb->m_livein, "m_livein ", "\n");
dump_hsa_bb (f, hbb);
bitmap_print (f, hbb->m_liveout, "m_liveout ", "\n");
}
}
/* Given the global register allocation state CLASSES and a
register REG, try to give it a hardware register. If successful,
store that hardreg in REG and return it, otherwise return -1.
Also update CLASSES to account for the allocated register. */
static int
try_alloc_reg (struct m_reg_class_desc *classes, hsa_op_reg *reg)
{
int cl = m_reg_class_for_type (reg->m_type);
int ret = -1;
if (classes[1].used_num + classes[2].used_num * 2 + classes[3].used_num * 4
>= 128 - 5)
return -1;
if (classes[cl].used_num < classes[cl].max_num)
{
unsigned int i;
classes[cl].used_num++;
if (classes[cl].used_num > classes[cl].max_used)
classes[cl].max_used = classes[cl].used_num;
for (i = 0; i < classes[cl].used_num; i++)
if (! (classes[cl].used[i / 64] & (((uint64_t)1) << (i & 63))))
break;
ret = i;
classes[cl].used[i / 64] |= (((uint64_t)1) << (i & 63));
reg->m_reg_class = classes[cl].cl_char;
reg->m_hard_num = i;
}
return ret;
}
/* Free up hardregs used by REG, into allocation state CLASSES. */
static void
free_reg (struct m_reg_class_desc *classes, hsa_op_reg *reg)
{
int cl = m_reg_class_for_type (reg->m_type);
int ret = reg->m_hard_num;
gcc_assert (reg->m_reg_class == classes[cl].cl_char);
classes[cl].used_num--;
classes[cl].used[ret / 64] &= ~(((uint64_t)1) << (ret & 63));
}
/* Note that the live range for REG ends at least at END. */
static void
note_lr_end (hsa_op_reg *reg, int end)
{
if (reg->m_lr_end < end)
reg->m_lr_end = end;
}
/* Note that the live range for REG starts at least at BEGIN. */
static void
note_lr_begin (hsa_op_reg *reg, int begin)
{
if (reg->m_lr_begin > begin)
reg->m_lr_begin = begin;
}
/* Given pointers A and B to two registers, return a negative number, zero or
a positive number if A's live range starts before, at or after B's live
range. Ties are broken by register order. */
static int
cmp_begin (const void *a, const void *b)
{
const hsa_op_reg * const *rega = (const hsa_op_reg * const *)a;
const hsa_op_reg * const *regb = (const hsa_op_reg * const *)b;
int ret;
if (rega == regb)
return 0;
ret = (*rega)->m_lr_begin - (*regb)->m_lr_begin;
if (ret)
return ret;
return ((*rega)->m_order - (*regb)->m_order);
}
/* Given two registers REGA and REGB, return true if REGA's
live range ends after REGB's. This results in a sorting order
with earlier end points at the end. */
static bool
cmp_end (hsa_op_reg * const &rega, hsa_op_reg * const &regb)
{
int ret;
if (rega == regb)
return false;
ret = (regb)->m_lr_end - (rega)->m_lr_end;
if (ret)
return ret < 0;
return (((regb)->m_order - (rega)->m_order)) < 0;
}
/* Expire all old intervals in ACTIVE (a per-regclass vector),
that is, those that end before the interval REG starts. Return the
resources freed this way to the state CLASSES. */
static void
expire_old_intervals (hsa_op_reg *reg, vec<hsa_op_reg*> *active,
struct m_reg_class_desc *classes)
{
for (int i = 0; i < 4; i++)
while (!active[i].is_empty ())
{
hsa_op_reg *a = active[i].pop ();
if (a->m_lr_end > reg->m_lr_begin)
{
active[i].quick_push (a);
break;
}
free_reg (classes, a);
}
}
/* The interval REG didn't get a hardreg. Spill it or one of those
from ACTIVE (if the latter, then REG will become allocated to the
hardreg that formerly was used by it). */
static void
spill_at_interval (hsa_op_reg *reg, vec<hsa_op_reg*> *active)
{
int cl = m_reg_class_for_type (reg->m_type);
gcc_assert (!active[cl].is_empty ());
hsa_op_reg *cand = active[cl][0];
if (cand->m_lr_end > reg->m_lr_end)
{
reg->m_reg_class = cand->m_reg_class;
reg->m_hard_num = cand->m_hard_num;
active[cl].ordered_remove (0);
unsigned place = active[cl].lower_bound (reg, cmp_end);
active[cl].quick_insert (place, reg);
}
else
cand = reg;
gcc_assert (!cand->m_spill_sym);
BrigType16_t type = cand->m_type;
if (type == BRIG_TYPE_B1)
type = BRIG_TYPE_U8;
cand->m_reg_class = 0;
cand->m_spill_sym = hsa_get_spill_symbol (type);
cand->m_spill_sym->m_name_number = cand->m_order;
}
/* Given the global register state CLASSES, allocate all HSA virtual
registers either to hardregs or to a spill symbol. */
static void
linear_scan_regalloc (struct m_reg_class_desc *classes)
{
/* Compute liveness. */
bool changed;
int i, n;
int insn_order;
int *bbs = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
bitmap work = BITMAP_ALLOC (NULL);
vec<hsa_op_reg*> ind2reg = vNULL;
vec<hsa_op_reg*> active[4] = {vNULL, vNULL, vNULL, vNULL};
hsa_insn_basic *m_last_insn;
/* We will need the reverse post order for linearization,
and the post order for liveness analysis, which is the same
backward. */
n = pre_and_rev_post_order_compute (NULL, bbs, true);
ind2reg.safe_grow_cleared (hsa_cfun->m_reg_count);
/* Give all instructions a linearized number, at the same time
build a mapping from register index to register. */
insn_order = 1;
for (i = 0; i < n; i++)
{
basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
hsa_bb *hbb = hsa_bb_for_bb (bb);
hsa_insn_basic *insn;
for (insn = hbb->m_first_insn; insn; insn = insn->m_next)
{
unsigned opi;
insn->m_number = insn_order++;
for (opi = 0; opi < insn->operand_count (); opi++)
{
gcc_checking_assert (insn->get_op (opi));
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
if (regaddr)
ind2reg[(*regaddr)->m_order] = *regaddr;
}
}
}
/* Initialize all live ranges to [after-end, 0). */
for (i = 0; i < hsa_cfun->m_reg_count; i++)
if (ind2reg[i])
ind2reg[i]->m_lr_begin = insn_order, ind2reg[i]->m_lr_end = 0;
/* Classic liveness analysis, as long as something changes:
m_liveout is union (m_livein of successors)
m_livein is m_liveout minus defs plus uses. */
do
{
changed = false;
for (i = n - 1; i >= 0; i--)
{
edge e;
edge_iterator ei;
basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
hsa_bb *hbb = hsa_bb_for_bb (bb);
/* Union of successors m_livein (or empty if none). */
bool first = true;
FOR_EACH_EDGE (e, ei, bb->succs)
if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
{
hsa_bb *succ = hsa_bb_for_bb (e->dest);
if (first)
{
bitmap_copy (work, succ->m_livein);
first = false;
}
else
bitmap_ior_into (work, succ->m_livein);
}
if (first)
bitmap_clear (work);
bitmap_copy (hbb->m_liveout, work);
/* Remove defs, include uses in a backward insn walk. */
hsa_insn_basic *insn;
for (insn = hbb->m_last_insn; insn; insn = insn->m_prev)
{
unsigned opi;
unsigned ndefs = insn->input_count ();
for (opi = 0; opi < ndefs && insn->get_op (opi); opi++)
{
gcc_checking_assert (insn->get_op (opi));
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
if (regaddr)
bitmap_clear_bit (work, (*regaddr)->m_order);
}
for (; opi < insn->operand_count (); opi++)
{
gcc_checking_assert (insn->get_op (opi));
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
if (regaddr)
bitmap_set_bit (work, (*regaddr)->m_order);
}
}
/* Note if that changed something. */
if (bitmap_ior_into (hbb->m_livein, work))
changed = true;
}
}
while (changed);
/* Make one pass through all instructions in linear order,
noting and merging possible live range start and end points. */
m_last_insn = NULL;
for (i = n - 1; i >= 0; i--)
{
basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
hsa_bb *hbb = hsa_bb_for_bb (bb);
hsa_insn_basic *insn;
int after_end_number;
unsigned bit;
bitmap_iterator bi;
if (m_last_insn)
after_end_number = m_last_insn->m_number;
else
after_end_number = insn_order;
/* Everything live-out in this BB has at least an end point
after us. */
EXECUTE_IF_SET_IN_BITMAP (hbb->m_liveout, 0, bit, bi)
note_lr_end (ind2reg[bit], after_end_number);
for (insn = hbb->m_last_insn; insn; insn = insn->m_prev)
{
unsigned opi;
unsigned ndefs = insn->input_count ();
for (opi = 0; opi < insn->operand_count (); opi++)
{
gcc_checking_assert (insn->get_op (opi));
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
if (regaddr)
{
hsa_op_reg *reg = *regaddr;
if (opi < ndefs)
note_lr_begin (reg, insn->m_number);
else
note_lr_end (reg, insn->m_number);
}
}
}
/* Everything live-in in this BB has a start point before
our first insn. */
int before_start_number;
if (hbb->m_first_insn)
before_start_number = hbb->m_first_insn->m_number;
else
before_start_number = after_end_number;
before_start_number--;
EXECUTE_IF_SET_IN_BITMAP (hbb->m_livein, 0, bit, bi)
note_lr_begin (ind2reg[bit], before_start_number);
if (hbb->m_first_insn)
m_last_insn = hbb->m_first_insn;
}
for (i = 0; i < hsa_cfun->m_reg_count; i++)
if (ind2reg[i])
{
/* All regs whose start is still after all code are actually
defined at the start of the routine (prologue). */
if (ind2reg[i]->m_lr_begin == insn_order)
ind2reg[i]->m_lr_begin = 0;
/* All regs that have no use but a def will have lr_end == 0,
they are actually live from def until after the insn they are
defined in. */
if (ind2reg[i]->m_lr_end == 0)
ind2reg[i]->m_lr_end = ind2reg[i]->m_lr_begin + 1;
}
/* Sort all intervals by increasing start point. */
gcc_assert (ind2reg.length () == (size_t) hsa_cfun->m_reg_count);
#ifdef ENABLE_CHECKING
for (unsigned i = 0; i < ind2reg.length (); i++)
gcc_assert (ind2reg[i]);
#endif
ind2reg.qsort (cmp_begin);
for (i = 0; i < 4; i++)
active[i].reserve_exact (hsa_cfun->m_reg_count);
/* Now comes the linear scan allocation. */
for (i = 0; i < hsa_cfun->m_reg_count; i++)
{
hsa_op_reg *reg = ind2reg[i];
if (!reg)
continue;
expire_old_intervals (reg, active, classes);
int cl = m_reg_class_for_type (reg->m_type);
if (try_alloc_reg (classes, reg) >= 0)
{
unsigned place = active[cl].lower_bound (reg, cmp_end);
active[cl].quick_insert (place, reg);
}
else
spill_at_interval (reg, active);
/* Some interesting dumping as we go. */
if (dump_file)
{
fprintf (dump_file, " reg%d: [%5d, %5d)->",
reg->m_order, reg->m_lr_begin, reg->m_lr_end);
if (reg->m_reg_class)
fprintf (dump_file, "$%c%i", reg->m_reg_class, reg->m_hard_num);
else
fprintf (dump_file, "[%%__%s_%i]",
hsa_seg_name (reg->m_spill_sym->m_segment),
reg->m_spill_sym->m_name_number);
for (int cl = 0; cl < 4; cl++)
{
bool first = true;
hsa_op_reg *r;
fprintf (dump_file, " {");
for (int j = 0; active[cl].iterate (j, &r); j++)
if (first)
{
fprintf (dump_file, "%d", r->m_order);
first = false;
}
else
fprintf (dump_file, ", %d", r->m_order);
fprintf (dump_file, "}");
}
fprintf (dump_file, "\n");
}
}
BITMAP_FREE (work);
free (bbs);
if (dump_file)
{
fprintf (dump_file, "------- After liveness: -------\n");
dump_hsa_cfun_regalloc (dump_file);
fprintf (dump_file, " ----- Intervals:\n");
for (i = 0; i < hsa_cfun->m_reg_count; i++)
{
hsa_op_reg *reg = ind2reg[i];
if (!reg)
continue;
fprintf (dump_file, " reg%d: [%5d, %5d)->", reg->m_order,
reg->m_lr_begin, reg->m_lr_end);
if (reg->m_reg_class)
fprintf (dump_file, "$%c%i\n", reg->m_reg_class, reg->m_hard_num);
else
fprintf (dump_file, "[%%__%s_%i]\n",
hsa_seg_name (reg->m_spill_sym->m_segment),
reg->m_spill_sym->m_name_number);
}
}
for (i = 0; i < 4; i++)
active[i].release ();
ind2reg.release ();
}
/* Entry point for register allocation. */
static void
regalloc (void)
{
basic_block bb;
m_reg_class_desc classes[4];
/* If there are no registers used in the function, exit right away. */
if (hsa_cfun->m_reg_count == 0)
return;
memset (classes, 0, sizeof (classes));
classes[0].next_avail = 0;
classes[0].max_num = 7;
classes[0].cl_char = 'c';
classes[1].cl_char = 's';
classes[2].cl_char = 'd';
classes[3].cl_char = 'q';
for (int i = 1; i < 4; i++)
{
classes[i].next_avail = 0;
classes[i].max_num = 20;
}
linear_scan_regalloc (classes);
FOR_ALL_BB_FN (bb, cfun)
rewrite_code_bb (bb, classes);
}
/* Out of SSA and register allocation on HSAIL IL. */
void
hsa_regalloc (void)
{
naive_outof_ssa ();
if (dump_file)
{
fprintf (dump_file, "------- After out-of-SSA: -------\n");
dump_hsa_cfun (dump_file);
}
regalloc ();
if (dump_file)
{
fprintf (dump_file, "------- After register allocation: -------\n");
dump_hsa_cfun (dump_file);
}
}
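
The allocator in gcc/hsa-regalloc.c above follows the usual linear-scan shape: sort intervals by start, expire finished ones, take the lowest free register from a bitmask, and otherwise spill whichever interval ends last. The following is a minimal stand-alone sketch of that idea only, not the patch's code. The names `interval`, `alloc_lowest_free` and `linear_scan` are hypothetical, and it assumes a single register class of at most 64 registers, with no spill symbols and no combined register budget.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified interval; not the hsa_op_reg class from the patch.
struct interval
{
  int begin, end;   // half-open live range [begin, end)
  int hard_num;     // assigned register, or -1 when spilled
};

// Set the lowest clear bit among the first MAX_NUM bits of USED and return
// its index; return -1 if all MAX_NUM registers are taken.
int
alloc_lowest_free (uint64_t &used, unsigned max_num)
{
  for (unsigned i = 0; i < max_num; i++)
    if (!(used & (uint64_t (1) << i)))
      {
        used |= uint64_t (1) << i;
        return (int) i;
      }
  return -1;
}

// Assign registers to INTERVALS using at most MAX_NUM (<= 64) registers.
void
linear_scan (std::vector<interval> &intervals, unsigned max_num)
{
  assert (max_num <= 64);

  // Visit intervals by increasing start point.
  std::vector<interval *> order;
  for (interval &iv : intervals)
    order.push_back (&iv);
  std::stable_sort (order.begin (), order.end (),
                    [] (const interval *a, const interval *b)
                    { return a->begin < b->begin; });

  uint64_t used = 0;
  std::vector<interval *> active;   // sorted by decreasing end point
  auto ends_later = [] (const interval *a, const interval *b)
                    { return a->end > b->end; };

  for (interval *cur : order)
    {
      // Expire intervals that end at or before the start of CUR.
      while (!active.empty () && active.back ()->end <= cur->begin)
        {
          used &= ~(uint64_t (1) << active.back ()->hard_num);
          active.pop_back ();
        }

      cur->hard_num = alloc_lowest_free (used, max_num);
      if (cur->hard_num < 0)
        {
          // No free register: spill CUR or the active interval ending last.
          if (active.empty () || active.front ()->end <= cur->end)
            continue;               // CUR stays spilled
          interval *cand = active.front ();
          cur->hard_num = cand->hard_num;
          cand->hard_num = -1;
          active.erase (active.begin ());
        }

      // Insert CUR so that ACTIVE stays sorted by decreasing end point.
      active.insert (std::lower_bound (active.begin (), active.end (),
                                       cur, ends_later),
                     cur);
    }
}
```

With two registers and the intervals [0,10), [1,3), [2,8) and [4,6), this sketch spills the first interval when the third arrives, because the first one ends last, and hands its register to the third.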

947
gcc/hsa.c Normal file

@@ -0,0 +1,947 @@
/* Implementation of commonly needed HSAIL related functions and methods.
Copyright (C) 2013-2016 Free Software Foundation, Inc.
Contributed by Martin Jambor <mjambor@suse.cz> and
Martin Liska <mliska@suse.cz>.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "is-a.h"
#include "hash-set.h"
#include "hash-map.h"
#include "vec.h"
#include "tree.h"
#include "dumpfile.h"
#include "gimple-pretty-print.h"
#include "diagnostic-core.h"
#include "alloc-pool.h"
#include "cgraph.h"
#include "print-tree.h"
#include "stringpool.h"
#include "symbol-summary.h"
#include "hsa.h"
#include "internal-fn.h"
#include "ctype.h"
/* Structure containing intermediate HSA representation of the generated
function. */
class hsa_function_representation *hsa_cfun;
/* Element of the mapping vector between a host decl and an HSA kernel. */
struct GTY(()) hsa_decl_kernel_map_element
{
/* The decl of the host function. */
tree decl;
/* Name of the HSA kernel in BRIG. */
char * GTY((skip)) name;
/* Size of OMP data, if the kernel contains a kernel dispatch. */
unsigned omp_data_size;
/* True if the function is a gridified kernel. */
bool gridified_kernel_p;
};
/* Mapping between decls and corresponding HSA kernels in this compilation
unit. */
static GTY (()) vec<hsa_decl_kernel_map_element, va_gc>
*hsa_decl_kernel_mapping;
/* Mapping between decls and corresponding HSA kernels
called by the function. */
hash_map <tree, vec <const char *> *> *hsa_decl_kernel_dependencies;
/* Hash table to look up a symbol for a decl. */
hash_table <hsa_noop_symbol_hasher> *hsa_global_variable_symbols;
/* HSA summaries. */
hsa_summary_t *hsa_summaries = NULL;
/* HSA number of threads. */
hsa_symbol *hsa_num_threads = NULL;
/* HSA functions that cannot be expanded to HSAIL. */
hash_set <tree> *hsa_failed_functions = NULL;
/* True if compilation unit-wide data are already allocated and initialized. */
static bool compilation_unit_data_initialized;
/* Return true if FNDECL represents an HSA-callable function. */
bool
hsa_callable_function_p (tree fndecl)
{
return (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (fndecl))
&& !lookup_attribute ("oacc function", DECL_ATTRIBUTES (fndecl)));
}
/* Allocate HSA structures that are used when dealing with different
functions. */
void
hsa_init_compilation_unit_data (void)
{
if (compilation_unit_data_initialized)
return;
compilation_unit_data_initialized = true;
hsa_global_variable_symbols = new hash_table <hsa_noop_symbol_hasher> (8);
hsa_failed_functions = new hash_set <tree> ();
hsa_emitted_internal_decls = new hash_table <hsa_internal_fn_hasher> (2);
}
/* Free data structures that are used when dealing with different
functions. */
void
hsa_deinit_compilation_unit_data (void)
{
gcc_assert (compilation_unit_data_initialized);
delete hsa_failed_functions;
delete hsa_emitted_internal_decls;
for (hash_table <hsa_noop_symbol_hasher>::iterator it
= hsa_global_variable_symbols->begin ();
it != hsa_global_variable_symbols->end ();
++it)
{
hsa_symbol *sym = *it;
delete sym;
}
delete hsa_global_variable_symbols;
if (hsa_num_threads)
{
delete hsa_num_threads;
hsa_num_threads = NULL;
}
compilation_unit_data_initialized = false;
}
/* Return true if we are generating the large HSA machine model. */
bool
hsa_machine_large_p (void)
{
/* FIXME: I suppose this is technically wrong but should work for me now. */
return (GET_MODE_BITSIZE (Pmode) == 64);
}
/* Return true if we are using the full HSA profile. */
bool
hsa_full_profile_p (void)
{
return true;
}
/* Return true if a register in operand number OPNUM of instruction
is an output. False if it is an input. */
bool
hsa_insn_basic::op_output_p (unsigned opnum)
{
switch (m_opcode)
{
case HSA_OPCODE_PHI:
case BRIG_OPCODE_CBR:
case BRIG_OPCODE_SBR:
case BRIG_OPCODE_ST:
case BRIG_OPCODE_SIGNALNORET:
/* FIXME: There are probably missing cases here, double check. */
return false;
case BRIG_OPCODE_EXPAND:
/* Example: expand_v4_b32_b128 (dest0, dest1, dest2, dest3), src0. */
return opnum < operand_count () - 1;
default:
return opnum == 0;
}
}
/* Return true if OPCODE is a floating-point bit instruction opcode. */
bool
hsa_opcode_floating_bit_insn_p (BrigOpcode16_t opcode)
{
switch (opcode)
{
case BRIG_OPCODE_NEG:
case BRIG_OPCODE_ABS:
case BRIG_OPCODE_CLASS:
case BRIG_OPCODE_COPYSIGN:
return true;
default:
return false;
}
}
/* Return the number of destination operands for this INSN. */
unsigned
hsa_insn_basic::input_count ()
{
switch (m_opcode)
{
default:
return 1;
case BRIG_OPCODE_NOP:
return 0;
case BRIG_OPCODE_EXPAND:
return 2;
case BRIG_OPCODE_LD:
/* ld_v[234] not yet handled. */
return 1;
case BRIG_OPCODE_ST:
return 0;
case BRIG_OPCODE_ATOMICNORET:
return 0;
case BRIG_OPCODE_SIGNAL:
return 1;
case BRIG_OPCODE_SIGNALNORET:
return 0;
case BRIG_OPCODE_MEMFENCE:
return 0;
case BRIG_OPCODE_RDIMAGE:
case BRIG_OPCODE_LDIMAGE:
case BRIG_OPCODE_STIMAGE:
case BRIG_OPCODE_QUERYIMAGE:
case BRIG_OPCODE_QUERYSAMPLER:
sorry ("HSA image ops not handled");
return 0;
case BRIG_OPCODE_CBR:
case BRIG_OPCODE_BR:
return 0;
case BRIG_OPCODE_SBR:
return 0; /* ??? */
case BRIG_OPCODE_WAVEBARRIER:
return 0; /* ??? */
case BRIG_OPCODE_BARRIER:
case BRIG_OPCODE_ARRIVEFBAR:
case BRIG_OPCODE_INITFBAR:
case BRIG_OPCODE_JOINFBAR:
case BRIG_OPCODE_LEAVEFBAR:
case BRIG_OPCODE_RELEASEFBAR:
case BRIG_OPCODE_WAITFBAR:
return 0;
case BRIG_OPCODE_LDF:
return 1;
case BRIG_OPCODE_ACTIVELANECOUNT:
case BRIG_OPCODE_ACTIVELANEID:
case BRIG_OPCODE_ACTIVELANEMASK:
case BRIG_OPCODE_ACTIVELANEPERMUTE:
return 1; /* ??? */
case BRIG_OPCODE_CALL:
case BRIG_OPCODE_SCALL:
case BRIG_OPCODE_ICALL:
return 0;
case BRIG_OPCODE_RET:
return 0;
case BRIG_OPCODE_ALLOCA:
return 1;
case BRIG_OPCODE_CLEARDETECTEXCEPT:
return 0;
case BRIG_OPCODE_SETDETECTEXCEPT:
return 0;
case BRIG_OPCODE_PACKETCOMPLETIONSIG:
case BRIG_OPCODE_PACKETID:
case BRIG_OPCODE_CASQUEUEWRITEINDEX:
case BRIG_OPCODE_LDQUEUEREADINDEX:
case BRIG_OPCODE_LDQUEUEWRITEINDEX:
case BRIG_OPCODE_STQUEUEREADINDEX:
case BRIG_OPCODE_STQUEUEWRITEINDEX:
return 1; /* ??? */
case BRIG_OPCODE_ADDQUEUEWRITEINDEX:
return 1;
case BRIG_OPCODE_DEBUGTRAP:
return 0;
case BRIG_OPCODE_GROUPBASEPTR:
case BRIG_OPCODE_KERNARGBASEPTR:
return 1; /* ??? */
case HSA_OPCODE_ARG_BLOCK:
return 0;
case BRIG_KIND_DIRECTIVE_COMMENT:
return 0;
}
}
/* Return the number of source operands for this INSN. */
unsigned
hsa_insn_basic::num_used_ops ()
{
gcc_checking_assert (input_count () <= operand_count ());
return operand_count () - input_count ();
}
/* Set alignment to VALUE. */
void
hsa_insn_mem::set_align (BrigAlignment8_t value)
{
/* TODO: Perhaps remove this dump later on: */
if (dump_file && (dump_flags & TDF_DETAILS) && value < m_align)
{
fprintf (dump_file, "Decreasing alignment to %u in instruction ", value);
dump_hsa_insn (dump_file, this);
}
m_align = value;
}
/* Return size of HSA type T in bits. */
unsigned
hsa_type_bit_size (BrigType16_t t)
{
switch (t)
{
case BRIG_TYPE_B1:
return 1;
case BRIG_TYPE_U8:
case BRIG_TYPE_S8:
case BRIG_TYPE_B8:
return 8;
case BRIG_TYPE_U16:
case BRIG_TYPE_S16:
case BRIG_TYPE_B16:
case BRIG_TYPE_F16:
return 16;
case BRIG_TYPE_U32:
case BRIG_TYPE_S32:
case BRIG_TYPE_B32:
case BRIG_TYPE_F32:
case BRIG_TYPE_U8X4:
case BRIG_TYPE_U16X2:
case BRIG_TYPE_S8X4:
case BRIG_TYPE_S16X2:
case BRIG_TYPE_F16X2:
return 32;
case BRIG_TYPE_U64:
case BRIG_TYPE_S64:
case BRIG_TYPE_F64:
case BRIG_TYPE_B64:
case BRIG_TYPE_U8X8:
case BRIG_TYPE_U16X4:
case BRIG_TYPE_U32X2:
case BRIG_TYPE_S8X8:
case BRIG_TYPE_S16X4:
case BRIG_TYPE_S32X2:
case BRIG_TYPE_F16X4:
case BRIG_TYPE_F32X2:
return 64;
case BRIG_TYPE_B128:
case BRIG_TYPE_U8X16:
case BRIG_TYPE_U16X8:
case BRIG_TYPE_U32X4:
case BRIG_TYPE_U64X2:
case BRIG_TYPE_S8X16:
case BRIG_TYPE_S16X8:
case BRIG_TYPE_S32X4:
case BRIG_TYPE_S64X2:
case BRIG_TYPE_F16X8:
case BRIG_TYPE_F32X4:
case BRIG_TYPE_F64X2:
return 128;
default:
gcc_assert (hsa_seen_error ());
return t;
}
}
/* Return BRIG bit-type with BITSIZE length. */
BrigType16_t
hsa_bittype_for_bitsize (unsigned bitsize)
{
switch (bitsize)
{
case 1:
return BRIG_TYPE_B1;
case 8:
return BRIG_TYPE_B8;
case 16:
return BRIG_TYPE_B16;
case 32:
return BRIG_TYPE_B32;
case 64:
return BRIG_TYPE_B64;
case 128:
return BRIG_TYPE_B128;
default:
gcc_unreachable ();
}
}
/* Return BRIG unsigned int type with BITSIZE length. */
BrigType16_t
hsa_uint_for_bitsize (unsigned bitsize)
{
switch (bitsize)
{
case 8:
return BRIG_TYPE_U8;
case 16:
return BRIG_TYPE_U16;
case 32:
return BRIG_TYPE_U32;
case 64:
return BRIG_TYPE_U64;
default:
gcc_unreachable ();
}
}
/* Return BRIG float type with BITSIZE length. */
BrigType16_t
hsa_float_for_bitsize (unsigned bitsize)
{
switch (bitsize)
{
case 16:
return BRIG_TYPE_F16;
case 32:
return BRIG_TYPE_F32;
case 64:
return BRIG_TYPE_F64;
default:
gcc_unreachable ();
}
}
/* Return HSA bit-type with the same size as the type T. */
BrigType16_t
hsa_bittype_for_type (BrigType16_t t)
{
return hsa_bittype_for_bitsize (hsa_type_bit_size (t));
}
/* Return true if and only if TYPE is a floating point number type. */
bool
hsa_type_float_p (BrigType16_t type)
{
switch (type & BRIG_TYPE_BASE_MASK)
{
case BRIG_TYPE_F16:
case BRIG_TYPE_F32:
case BRIG_TYPE_F64:
return true;
default:
return false;
}
}
/* Return true if and only if TYPE is an integer number type. */
bool
hsa_type_integer_p (BrigType16_t type)
{
switch (type & BRIG_TYPE_BASE_MASK)
{
case BRIG_TYPE_U8:
case BRIG_TYPE_U16:
case BRIG_TYPE_U32:
case BRIG_TYPE_U64:
case BRIG_TYPE_S8:
case BRIG_TYPE_S16:
case BRIG_TYPE_S32:
case BRIG_TYPE_S64:
return true;
default:
return false;
}
}
/* Return true if and only if TYPE is a bit-type. */
bool
hsa_btype_p (BrigType16_t type)
{
switch (type & BRIG_TYPE_BASE_MASK)
{
case BRIG_TYPE_B8:
case BRIG_TYPE_B16:
case BRIG_TYPE_B32:
case BRIG_TYPE_B64:
case BRIG_TYPE_B128:
return true;
default:
return false;
}
}
/* Return the HSA encoding of an alignment of N bits. */
BrigAlignment8_t
hsa_alignment_encoding (unsigned n)
{
gcc_assert (n >= 8 && !(n & (n - 1)));
if (n >= 256)
return BRIG_ALIGNMENT_32;
switch (n)
{
case 8:
return BRIG_ALIGNMENT_1;
case 16:
return BRIG_ALIGNMENT_2;
case 32:
return BRIG_ALIGNMENT_4;
case 64:
return BRIG_ALIGNMENT_8;
case 128:
return BRIG_ALIGNMENT_16;
default:
gcc_unreachable ();
}
}
/* Return natural alignment of HSA TYPE. */
BrigAlignment8_t
hsa_natural_alignment (BrigType16_t type)
{
return hsa_alignment_encoding (hsa_type_bit_size (type & ~BRIG_TYPE_ARRAY));
}
/* Call the correct destructor of an HSA instruction. */
void
hsa_destroy_insn (hsa_insn_basic *insn)
{
if (hsa_insn_phi *phi = dyn_cast <hsa_insn_phi *> (insn))
phi->~hsa_insn_phi ();
else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
br->~hsa_insn_br ();
else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
cmp->~hsa_insn_cmp ();
else if (hsa_insn_mem *mem = dyn_cast <hsa_insn_mem *> (insn))
mem->~hsa_insn_mem ();
else if (hsa_insn_atomic *atomic = dyn_cast <hsa_insn_atomic *> (insn))
atomic->~hsa_insn_atomic ();
else if (hsa_insn_seg *seg = dyn_cast <hsa_insn_seg *> (insn))
seg->~hsa_insn_seg ();
else if (hsa_insn_call *call = dyn_cast <hsa_insn_call *> (insn))
call->~hsa_insn_call ();
else if (hsa_insn_arg_block *block = dyn_cast <hsa_insn_arg_block *> (insn))
block->~hsa_insn_arg_block ();
else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
sbr->~hsa_insn_sbr ();
else if (hsa_insn_comment *comment = dyn_cast <hsa_insn_comment *> (insn))
comment->~hsa_insn_comment ();
else
insn->~hsa_insn_basic ();
}
/* Call the correct destructor of an HSA operand. */
void
hsa_destroy_operand (hsa_op_base *op)
{
if (hsa_op_code_list *list = dyn_cast <hsa_op_code_list *> (op))
list->~hsa_op_code_list ();
else if (hsa_op_operand_list *list = dyn_cast <hsa_op_operand_list *> (op))
list->~hsa_op_operand_list ();
else if (hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op))
reg->~hsa_op_reg ();
else if (hsa_op_immed *immed = dyn_cast <hsa_op_immed *> (op))
immed->~hsa_op_immed ();
else
op->~hsa_op_base ();
}
/* Create a mapping between the original function DECL and kernel name NAME. */
void
hsa_add_kern_decl_mapping (tree decl, char *name, unsigned omp_data_size,
bool gridified_kernel_p)
{
hsa_decl_kernel_map_element dkm;
dkm.decl = decl;
dkm.name = name;
dkm.omp_data_size = omp_data_size;
dkm.gridified_kernel_p = gridified_kernel_p;
vec_safe_push (hsa_decl_kernel_mapping, dkm);
}
/* Return the number of kernel decl name mappings. */
unsigned
hsa_get_number_decl_kernel_mappings (void)
{
return vec_safe_length (hsa_decl_kernel_mapping);
}
/* Return the decl in the Ith kernel decl name mapping. */
tree
hsa_get_decl_kernel_mapping_decl (unsigned i)
{
return (*hsa_decl_kernel_mapping)[i].decl;
}
/* Return the name in the Ith kernel decl name mapping. */
char *
hsa_get_decl_kernel_mapping_name (unsigned i)
{
return (*hsa_decl_kernel_mapping)[i].name;
}
/* Return maximum OMP size for kernel decl name mapping. */
unsigned
hsa_get_decl_kernel_mapping_omp_size (unsigned i)
{
return (*hsa_decl_kernel_mapping)[i].omp_data_size;
}
/* Return true if the function in the Ith kernel decl name mapping is a
gridified kernel. */
bool
hsa_get_decl_kernel_mapping_gridified (unsigned i)
{
return (*hsa_decl_kernel_mapping)[i].gridified_kernel_p;
}
/* Free the mapping between original decls and kernel names. */
void
hsa_free_decl_kernel_mapping (void)
{
if (hsa_decl_kernel_mapping == NULL)
return;
for (unsigned i = 0; i < hsa_decl_kernel_mapping->length (); ++i)
free ((*hsa_decl_kernel_mapping)[i].name);
ggc_free (hsa_decl_kernel_mapping);
}
/* Add new kernel dependency. */
void
hsa_add_kernel_dependency (tree caller, const char *called_function)
{
if (hsa_decl_kernel_dependencies == NULL)
hsa_decl_kernel_dependencies = new hash_map<tree, vec<const char *> *> ();
vec <const char *> *s = NULL;
vec <const char *> **slot = hsa_decl_kernel_dependencies->get (caller);
if (slot == NULL)
{
s = new vec <const char *> ();
hsa_decl_kernel_dependencies->put (caller, s);
}
else
s = *slot;
s->safe_push (called_function);
}
/* Modify the name P in-place so that it is a valid HSA identifier. */
void
hsa_sanitize_name (char *p)
{
for (; *p; p++)
if (*p == '.' || *p == '-')
*p = '_';
}
/* Clone the name P, prepend a leading ampersand and sanitize the name. */
char *
hsa_brig_function_name (const char *p)
{
unsigned len = strlen (p);
char *buf = XNEWVEC (char, len + 2);
buf[0] = '&';
buf[len + 1] = '\0';
memcpy (buf + 1, p, len);
hsa_sanitize_name (buf);
return buf;
}
/* Return the name of DECL. If DECL has no name, return a generated
anonymous name based on its UID. */
const char *
hsa_get_declaration_name (tree decl)
{
if (!DECL_NAME (decl))
{
char buf[64];
snprintf (buf, 64, "__hsa_anonymous_%i", DECL_UID (decl));
const char *ggc_str = ggc_strdup (buf);
return ggc_str;
}
tree name_tree;
if (TREE_CODE (decl) == FUNCTION_DECL
|| (TREE_CODE (decl) == VAR_DECL && is_global_var (decl)))
name_tree = DECL_ASSEMBLER_NAME (decl);
else
name_tree = DECL_NAME (decl);
const char *name = IDENTIFIER_POINTER (name_tree);
/* User-defined assembly names have an asterisk prepended. */
if (name[0] == '*')
name++;
return name;
}
void
hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
hsa_function_kind kind, bool gridified_kernel_p)
{
hsa_function_summary *gpu_summary = get (gpu);
hsa_function_summary *host_summary = get (host);
gpu_summary->m_kind = kind;
host_summary->m_kind = kind;
gpu_summary->m_gpu_implementation_p = true;
host_summary->m_gpu_implementation_p = false;
gpu_summary->m_gridified_kernel_p = gridified_kernel_p;
host_summary->m_gridified_kernel_p = gridified_kernel_p;
gpu_summary->m_binded_function = host;
host_summary->m_binded_function = gpu;
tree gdecl = gpu->decl;
DECL_ATTRIBUTES (gdecl)
= tree_cons (get_identifier ("flatten"), NULL_TREE,
DECL_ATTRIBUTES (gdecl));
tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
if (fn_opts == NULL_TREE)
fn_opts = optimization_default_node;
fn_opts = copy_node (fn_opts);
TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
}
/* Add a HOST function to HSA summaries. */
void
hsa_register_kernel (cgraph_node *host)
{
if (hsa_summaries == NULL)
hsa_summaries = new hsa_summary_t (symtab);
hsa_function_summary *s = hsa_summaries->get (host);
s->m_kind = HSA_KERNEL;
}
/* Add a pair of functions to HSA summaries. GPU is an HSA implementation of
a HOST function. */
void
hsa_register_kernel (cgraph_node *gpu, cgraph_node *host)
{
if (hsa_summaries == NULL)
hsa_summaries = new hsa_summary_t (symtab);
hsa_summaries->link_functions (gpu, host, HSA_KERNEL, true);
}
/* Return true if expansion of the current HSA function has already failed. */
bool
hsa_seen_error (void)
{
return hsa_cfun->m_seen_error;
}
/* Mark current HSA function as failed. */
void
hsa_fail_cfun (void)
{
hsa_failed_functions->add (hsa_cfun->m_decl);
hsa_cfun->m_seen_error = true;
}
char *
hsa_internal_fn::name ()
{
char *name = xstrdup (internal_fn_name (m_fn));
for (char *ptr = name; *ptr; ptr++)
*ptr = TOLOWER (*ptr);
const char *suffix = NULL;
if (m_type_bit_size == 32)
suffix = "f";
if (suffix)
{
char *name2 = concat (name, suffix, NULL);
free (name);
name = name2;
}
hsa_sanitize_name (name);
return name;
}
unsigned
hsa_internal_fn::get_arity ()
{
switch (m_fn)
{
case IFN_ACOS:
case IFN_ASIN:
case IFN_ATAN:
case IFN_COS:
case IFN_EXP:
case IFN_EXP10:
case IFN_EXP2:
case IFN_EXPM1:
case IFN_LOG:
case IFN_LOG10:
case IFN_LOG1P:
case IFN_LOG2:
case IFN_LOGB:
case IFN_SIGNIFICAND:
case IFN_SIN:
case IFN_SQRT:
case IFN_TAN:
case IFN_CEIL:
case IFN_FLOOR:
case IFN_NEARBYINT:
case IFN_RINT:
case IFN_ROUND:
case IFN_TRUNC:
return 1;
case IFN_ATAN2:
case IFN_COPYSIGN:
case IFN_FMOD:
case IFN_POW:
case IFN_REMAINDER:
case IFN_SCALB:
case IFN_LDEXP:
return 2;
case IFN_CLRSB:
case IFN_CLZ:
case IFN_CTZ:
case IFN_FFS:
case IFN_PARITY:
case IFN_POPCOUNT:
default:
/* As we produce a sorry message for unknown internal functions,
reaching this label is definitely a bug. */
gcc_unreachable ();
}
}
BrigType16_t
hsa_internal_fn::get_argument_type (int n)
{
switch (m_fn)
{
case IFN_ACOS:
case IFN_ASIN:
case IFN_ATAN:
case IFN_COS:
case IFN_EXP:
case IFN_EXP10:
case IFN_EXP2:
case IFN_EXPM1:
case IFN_LOG:
case IFN_LOG10:
case IFN_LOG1P:
case IFN_LOG2:
case IFN_LOGB:
case IFN_SIGNIFICAND:
case IFN_SIN:
case IFN_SQRT:
case IFN_TAN:
case IFN_CEIL:
case IFN_FLOOR:
case IFN_NEARBYINT:
case IFN_RINT:
case IFN_ROUND:
case IFN_TRUNC:
case IFN_ATAN2:
case IFN_COPYSIGN:
case IFN_FMOD:
case IFN_POW:
case IFN_REMAINDER:
case IFN_SCALB:
return hsa_float_for_bitsize (m_type_bit_size);
case IFN_LDEXP:
{
if (n == -1 || n == 0)
return hsa_float_for_bitsize (m_type_bit_size);
else
return BRIG_TYPE_S32;
}
default:
/* As we produce a sorry message for unknown internal functions,
reaching this label is definitely a bug. */
gcc_unreachable ();
}
}
#include "gt-hsa.h"
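
The name handling in hsa_sanitize_name and hsa_brig_function_name above can be tried outside of GCC. The following is a minimal standalone sketch, not the GCC code itself: the function names sanitize_name and brig_function_name are hypothetical, and plain malloc stands in for GCC's XNEWVEC.

```c
#include <stdlib.h>
#include <string.h>

/* Replace the characters '.' and '-' with '_' in place, mirroring the
   loop in hsa_sanitize_name.  */
static void
sanitize_name (char *p)
{
  for (; *p; p++)
    if (*p == '.' || *p == '-')
      *p = '_';
}

/* Return a newly allocated copy of P with '&' prepended and the result
   sanitized, or NULL if allocation fails.  The caller frees the buffer.
   Assumption: malloc replaces XNEWVEC, which aborts instead of
   returning NULL.  */
static char *
brig_function_name (const char *p)
{
  size_t len = strlen (p);
  char *buf = malloc (len + 2);   /* '&' + name + terminating NUL.  */
  if (buf == NULL)
    return NULL;
  buf[0] = '&';
  memcpy (buf + 1, p, len);
  buf[len + 1] = '\0';
  sanitize_name (buf);
  return buf;
}
```

With this sketch, a name such as "foo.omp_fn-0" becomes "&foo_omp_fn_0". The leading ampersand is not affected by sanitization because only '.' and '-' are rewritten.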

gcc/hsa.h

File diff suppressed because it is too large.

gcc/ipa-hsa.c

@ -0,0 +1,331 @@
/* Callgraph based creation of HSA clones.
Copyright (C) 2015-2016 Free Software Foundation, Inc.
Contributed by Martin Liska <mliska@suse.cz>
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
/* Interprocedural HSA pass is responsible for creation of HSA clones.
For all these HSA clones, we emit HSAIL instructions and pass processing
is terminated. */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "is-a.h"
#include "hash-set.h"
#include "vec.h"
#include "tree.h"
#include "tree-pass.h"
#include "function.h"
#include "basic-block.h"
#include "gimple.h"
#include "dumpfile.h"
#include "gimple-pretty-print.h"
#include "tree-streamer.h"
#include "stringpool.h"
#include "cgraph.h"
#include "print-tree.h"
#include "symbol-summary.h"
#include "hsa.h"
namespace {
/* If NODE is not versionable, warn about not emitting HSAIL and return false.
Otherwise return true. */
static bool
check_warn_node_versionable (cgraph_node *node)
{
if (!node->local.versionable)
{
warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
"could not emit HSAIL for function %s: function cannot be "
"cloned", node->name ());
return false;
}
return true;
}
/* The function creates HSA clones for all functions that were either
marked as HSA kernels or are callable HSA functions. Apart from that,
we redirect all edges that come from an HSA clone and end in another
HSA clone to connect these two functions. */
static unsigned int
process_hsa_functions (void)
{
struct cgraph_node *node;
if (hsa_summaries == NULL)
hsa_summaries = new hsa_summary_t (symtab);
FOR_EACH_DEFINED_FUNCTION (node)
{
hsa_function_summary *s = hsa_summaries->get (node);
/* A linked function is skipped. */
if (s->m_binded_function != NULL)
continue;
if (s->m_kind != HSA_NONE)
{
if (!check_warn_node_versionable (node))
continue;
cgraph_node *clone
= node->create_virtual_clone (vec <cgraph_edge *> (),
NULL, NULL, "hsa");
TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
clone->force_output = true;
hsa_summaries->link_functions (clone, node, s->m_kind, false);
if (dump_file)
fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
clone->name (),
s->m_kind == HSA_KERNEL ? "kernel" : "function");
}
else if (hsa_callable_function_p (node->decl))
{
if (!check_warn_node_versionable (node))
continue;
cgraph_node *clone
= node->create_virtual_clone (vec <cgraph_edge *> (),
NULL, NULL, "hsa");
TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
if (!cgraph_local_p (node))
clone->force_output = true;
hsa_summaries->link_functions (clone, node, HSA_FUNCTION, false);
if (dump_file)
fprintf (dump_file, "Created a new HSA function clone: %s\n",
clone->name ());
}
}
/* Redirect all edges that are between HSA clones. */
FOR_EACH_DEFINED_FUNCTION (node)
{
cgraph_edge *e = node->callees;
while (e)
{
hsa_function_summary *src = hsa_summaries->get (node);
if (src->m_kind != HSA_NONE && src->m_gpu_implementation_p)
{
hsa_function_summary *dst = hsa_summaries->get (e->callee);
if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
{
e->redirect_callee (dst->m_binded_function);
if (dump_file)
fprintf (dump_file,
"Redirecting edge to HSA function: %s->%s\n",
xstrdup_for_dump (e->caller->name ()),
xstrdup_for_dump (e->callee->name ()));
}
}
e = e->next_callee;
}
}
return 0;
}
/* Iterate over all HSA functions and stream out the HSA function summary. */
static void
ipa_hsa_write_summary (void)
{
struct bitpack_d bp;
struct cgraph_node *node;
struct output_block *ob;
unsigned int count = 0;
lto_symtab_encoder_iterator lsei;
lto_symtab_encoder_t encoder;
if (!hsa_summaries)
return;
ob = create_output_block (LTO_section_ipa_hsa);
encoder = ob->decl_state->symtab_node_encoder;
ob->symbol = NULL;
for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
lsei_next_function_in_partition (&lsei))
{
node = lsei_cgraph_node (lsei);
hsa_function_summary *s = hsa_summaries->get (node);
if (s->m_kind != HSA_NONE)
count++;
}
streamer_write_uhwi (ob, count);
/* Process all of the functions. */
for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
lsei_next_function_in_partition (&lsei))
{
node = lsei_cgraph_node (lsei);
hsa_function_summary *s = hsa_summaries->get (node);
if (s->m_kind != HSA_NONE)
{
encoder = ob->decl_state->symtab_node_encoder;
int node_ref = lto_symtab_encoder_encode (encoder, node);
streamer_write_uhwi (ob, node_ref);
bp = bitpack_create (ob->main_stream);
bp_pack_value (&bp, s->m_kind, 2);
bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
bp_pack_value (&bp, s->m_binded_function != NULL, 1);
streamer_write_bitpack (&bp);
if (s->m_binded_function)
stream_write_tree (ob, s->m_binded_function->decl, true);
}
}
streamer_write_char_stream (ob->main_stream, 0);
produce_asm (ob, NULL);
destroy_output_block (ob);
}
/* Read section in file FILE_DATA of length LEN with data DATA. */
static void
ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
size_t len)
{
const struct lto_function_header *header
= (const struct lto_function_header *) data;
const int cfg_offset = sizeof (struct lto_function_header);
const int main_offset = cfg_offset + header->cfg_size;
const int string_offset = main_offset + header->main_size;
struct data_in *data_in;
unsigned int i;
unsigned int count;
lto_input_block ib_main ((const char *) data + main_offset,
header->main_size, file_data->mode_table);
data_in
= lto_data_in_create (file_data, (const char *) data + string_offset,
header->string_size, vNULL);
count = streamer_read_uhwi (&ib_main);
for (i = 0; i < count; i++)
{
unsigned int index;
struct cgraph_node *node;
lto_symtab_encoder_t encoder;
index = streamer_read_uhwi (&ib_main);
encoder = file_data->symtab_node_encoder;
node = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder,
index));
gcc_assert (node->definition);
hsa_function_summary *s = hsa_summaries->get (node);
struct bitpack_d bp = streamer_read_bitpack (&ib_main);
s->m_kind = (hsa_function_kind) bp_unpack_value (&bp, 2);
s->m_gpu_implementation_p = bp_unpack_value (&bp, 1);
bool has_tree = bp_unpack_value (&bp, 1);
if (has_tree)
{
tree decl = stream_read_tree (&ib_main, data_in);
s->m_binded_function = cgraph_node::get_create (decl);
}
}
lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
len);
lto_data_in_delete (data_in);
}
/* Load streamed HSA functions summary and assign the summary to a function. */
static void
ipa_hsa_read_summary (void)
{
struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
struct lto_file_decl_data *file_data;
unsigned int j = 0;
if (hsa_summaries == NULL)
hsa_summaries = new hsa_summary_t (symtab);
while ((file_data = file_data_vec[j++]))
{
size_t len;
const char *data = lto_get_section_data (file_data, LTO_section_ipa_hsa,
NULL, &len);
if (data)
ipa_hsa_read_section (file_data, data, len);
}
}
const pass_data pass_data_ipa_hsa =
{
IPA_PASS, /* type */
"hsa", /* name */
OPTGROUP_NONE, /* optinfo_flags */
TV_IPA_HSA, /* tv_id */
0, /* properties_required */
0, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
TODO_dump_symtab, /* todo_flags_finish */
};
class pass_ipa_hsa : public ipa_opt_pass_d
{
public:
pass_ipa_hsa (gcc::context *ctxt)
: ipa_opt_pass_d (pass_data_ipa_hsa, ctxt,
NULL, /* generate_summary */
ipa_hsa_write_summary, /* write_summary */
ipa_hsa_read_summary, /* read_summary */
ipa_hsa_write_summary, /* write_optimization_summary */
ipa_hsa_read_summary, /* read_optimization_summary */
NULL, /* stmt_fixup */
0, /* function_transform_todo_flags_start */
NULL, /* function_transform */
NULL) /* variable_transform */
{}
/* opt_pass methods: */
virtual bool gate (function *);
virtual unsigned int execute (function *) { return process_hsa_functions (); }
}; // class pass_ipa_hsa
bool
pass_ipa_hsa::gate (function *)
{
return hsa_gen_requested_p ();
}
} // anon namespace
ipa_opt_pass_d *
make_pass_ipa_hsa (gcc::context *ctxt)
{
return new pass_ipa_hsa (ctxt);
}
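
ipa_hsa_write_summary and ipa_hsa_read_section above rely on the reader unpacking fields in exactly the order and widths the writer packed them: two bits for the kind, one bit for the GPU-implementation flag, and one bit saying whether a linked function decl follows. The following is a minimal sketch of that symmetry using a single byte instead of GCC's bitpack_d API. The enum values, the struct, and the least-significant-bit-first layout are assumptions made for illustration, since gcc/hsa.h is not shown in this diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for hsa_function_kind; the real values live in
   gcc/hsa.h.  */
enum fn_kind { KIND_NONE = 0, KIND_KERNEL = 1, KIND_FUNCTION = 2 };

/* Hypothetical reduced summary holding only the streamed fields.  */
struct fn_summary
{
  enum fn_kind kind;
  bool gpu_implementation_p;
  bool has_bound_function;
};

/* Pack the summary: bits 0-1 kind, bit 2 GPU flag, bit 3 "tree follows".  */
static uint8_t
pack_summary (const struct fn_summary *s)
{
  uint8_t word = 0;
  word |= (uint8_t) (s->kind & 0x3);
  word |= (uint8_t) ((s->gpu_implementation_p ? 1 : 0) << 2);
  word |= (uint8_t) ((s->has_bound_function ? 1 : 0) << 3);
  return word;
}

/* Unpack in the same order and with the same widths as pack_summary.  */
static struct fn_summary
unpack_summary (uint8_t word)
{
  struct fn_summary s;
  s.kind = (enum fn_kind) (word & 0x3);
  s.gpu_implementation_p = ((word >> 2) & 1) != 0;
  s.has_bound_function = ((word >> 3) & 1) != 0;
  return s;
}
```

If the two sides disagreed on a width or on the order of the fields, every later field in the stream would be misread, which is why the writer and the reader in the pass mirror each other line for line.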


@ -51,7 +51,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
"ipcp_trans",
"icf",
"offload_table",
"mode_table"
"mode_table",
"hsa"
};


@ -244,6 +244,7 @@ enum lto_section_type
LTO_section_ipa_icf,
LTO_section_offload_table,
LTO_section_mode_table,
LTO_section_ipa_hsa,
LTO_N_SECTION_TYPES /* Must be last. */
};


@ -736,6 +736,7 @@ compile_images_for_offload_targets (unsigned in_argc, char *in_argv[],
return;
unsigned num_targets = parse_env_var (target_names, &names, NULL);
int next_name_entry = 0;
const char *compiler_path = getenv ("COMPILER_PATH");
if (!compiler_path)
goto out;
@ -745,13 +746,19 @@ compile_images_for_offload_targets (unsigned in_argc, char *in_argv[],
offload_names = XCNEWVEC (char *, num_targets + 1);
for (unsigned i = 0; i < num_targets; i++)
{
offload_names[i]
/* HSA uses neither LTO-like streaming nor a different compiler, so skip
it. */
if (strcmp (names[i], "hsa") == 0)
continue;
offload_names[next_name_entry]
= compile_offload_image (names[i], compiler_path, in_argc, in_argv,
compiler_opts, compiler_opt_count,
linker_opts, linker_opt_count);
if (!offload_names[i])
if (!offload_names[next_name_entry])
fatal_error (input_location,
"problem with building target image for %s\n", names[i]);
next_name_entry++;
}
out:
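
The hunk above replaces the loop index with a separate next_name_entry counter when storing results. Presumably this matters because offload_names is zero-initialized and later treated as a NULL-terminated list, so a skipped "hsa" slot in the middle would cut the list short. The following is a minimal sketch of the same compaction pattern. The function collect_offload_targets is hypothetical, and it copies names rather than compiling images.

```c
#include <stddef.h>
#include <string.h>

/* Copy every entry of NAMES except "hsa" into OUT, packing the results
   densely and terminating them with NULL.  OUT must have room for
   NUM + 1 entries.  Return the number of entries stored.  */
static size_t
collect_offload_targets (const char *const *names, size_t num,
			 const char **out)
{
  size_t next = 0;   /* Plays the role of next_name_entry.  */
  for (size_t i = 0; i < num; i++)
    {
      if (strcmp (names[i], "hsa") == 0)
	continue;   /* No separate offload image is built for HSA.  */
      out[next++] = names[i];
    }
  out[next] = NULL;
  return next;
}
```

Indexing the output with the input index i instead would leave a NULL hole wherever "hsa" appeared, and any consumer that stops at the first NULL would miss the targets after it.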


@ -1,3 +1,10 @@
2016-01-19 Martin Liska <mliska@suse.cz>
Martin Jambor <mjambor@suse.cz>
* lto-partition.c: Include "hsa.h"
(add_symbol_to_partition_1): Put hsa implementations into the
same partition as host implementations.
2016-01-12 Jan Hubicka <hubicka@ucw.cz>
PR lto/69003


@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see
#include "ipa-prop.h"
#include "ipa-inline.h"
#include "lto-partition.h"
#include "hsa.h"
vec<ltrans_partition> ltrans_partitions;
@ -170,6 +171,24 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
Therefore put it into the same partition. */
if (cnode->instrumented_version)
add_symbol_to_partition_1 (part, cnode->instrumented_version);
/* Add an HSA function associated with the symbol. */
if (hsa_summaries != NULL)
{
hsa_function_summary *s = hsa_summaries->get (cnode);
if (s->m_kind == HSA_KERNEL)
{
/* Add the bound function. */
bool added = add_symbol_to_partition_1 (part,
s->m_binded_function);
gcc_assert (added);
if (symtab->dump_file)
fprintf (symtab->dump_file,
"adding an HSA function (host/gpu) to the "
"partition: %s\n",
s->m_binded_function->name ());
}
}
}
add_references_to_partition (part, node);


@ -340,8 +340,13 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, "GOMP_single_copy_start",
BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end",
BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_REGISTER, "GOMP_offload_register_ver",
BT_FN_VOID_UINT_PTR_INT_PTR, ATTR_NOTHROW_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_UNREGISTER,
"GOMP_offload_unregister_ver",
BT_FN_VOID_UINT_PTR_INT_PTR, ATTR_NOTHROW_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target_ext",
BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
ATTR_NOTHROW_LIST)
DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data_ext",
BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)

File diff suppressed because it is too large Load Diff


@ -1916,8 +1916,35 @@ common_handle_option (struct gcc_options *opts,
break;
case OPT_foffload_:
/* Deferred. */
break;
{
const char *p = arg;
opts->x_flag_disable_hsa = true;
while (*p != 0)
{
const char *comma = strchr (p, ',');
if ((strncmp (p, "disable", 7) == 0)
&& (p[7] == ',' || p[7] == '\0'))
{
opts->x_flag_disable_hsa = true;
break;
}
if ((strncmp (p, "hsa", 3) == 0)
&& (p[3] == ',' || p[3] == '\0'))
{
#ifdef ENABLE_HSA
opts->x_flag_disable_hsa = false;
#else
sorry ("HSA has not been enabled during configuration");
#endif
}
if (!comma)
break;
p = comma + 1;
}
break;
}
#ifndef ACCEL_COMPILER
case OPT_foffload_abi_:
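
The -foffload= handling in the hunk above walks a comma-separated list and matches whole tokens only, so "hsa" matches but "hsax" does not. The following is a minimal standalone sketch of that logic. The names token_is and hsa_disabled_p are hypothetical, and the hsa_configured parameter stands in for the ENABLE_HSA preprocessor check. Where the real code calls sorry (), the sketch simply leaves HSA disabled.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the token starting at P equals WORD and is terminated
   by ',' or by the end of the string.  */
static bool
token_is (const char *p, const char *word)
{
  size_t n = strlen (word);
  return strncmp (p, word, n) == 0 && (p[n] == ',' || p[n] == '\0');
}

/* Return true if HSA code generation ends up disabled for the
   -foffload= argument ARG.  */
static bool
hsa_disabled_p (const char *arg, bool hsa_configured)
{
  bool disabled = true;   /* Any -foffload= disables HSA by default.  */
  const char *p = arg;
  while (*p != '\0')
    {
      const char *comma = strchr (p, ',');
      if (token_is (p, "disable"))
	{
	  disabled = true;
	  break;   /* "disable" wins and ends the scan.  */
	}
      if (token_is (p, "hsa") && hsa_configured)
	disabled = false;
      if (comma == NULL)
	break;
      p = comma + 1;
    }
  return disabled;
}
```

Under these assumptions, "nvptx-none,hsa" enables HSA, "hsa,disable" disables it again, and a list that never mentions "hsa" leaves it disabled.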


@ -1183,6 +1183,11 @@ DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_INSNS,
"Maximum number of insns in a basic block to consider for RTL "
"if-conversion.",
10, 0, 99)
DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
"hsa-gen-debug-stores",
"Level of hsa debug stores verbosity",
0, 0, 1)
/*
Local variables:


@ -151,6 +151,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_ipa_cp);
NEXT_PASS (pass_ipa_cdtor_merge);
NEXT_PASS (pass_target_clone);
NEXT_PASS (pass_ipa_hsa);
NEXT_PASS (pass_ipa_inline);
NEXT_PASS (pass_ipa_pure_const);
NEXT_PASS (pass_ipa_reference);
@ -386,6 +387,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_nrv);
NEXT_PASS (pass_cleanup_cfg_post_optimizing);
NEXT_PASS (pass_warn_function_noreturn);
NEXT_PASS (pass_gen_hsail);
NEXT_PASS (pass_expand);


@ -97,6 +97,7 @@ DEFTIMEVAR (TV_WHOPR_WPA_IO , "whopr wpa I/O")
DEFTIMEVAR (TV_WHOPR_PARTITIONING , "whopr partitioning")
DEFTIMEVAR (TV_WHOPR_LTRANS , "whopr ltrans")
DEFTIMEVAR (TV_IPA_REFERENCE , "ipa reference")
DEFTIMEVAR (TV_IPA_HSA , "ipa HSA")
DEFTIMEVAR (TV_IPA_PROFILE , "ipa profile")
DEFTIMEVAR (TV_IPA_AUTOFDO , "auto profile")
DEFTIMEVAR (TV_IPA_PURE_CONST , "ipa pure const")


@ -75,6 +75,7 @@ along with GCC; see the file COPYING3. If not see
#include "gcse.h"
#include "tree-chkp.h"
#include "omp-low.h"
#include "hsa.h"
#if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
#include "dbxout.h"
@ -518,6 +519,8 @@ compile_file (void)
omp_finish_file ();
hsa_output_brig ();
output_shared_constant_pool ();
output_object_blocks ();
finish_tm_clone_pairs ();


@ -458,7 +458,11 @@ enum omp_clause_code {
OMP_CLAUSE_VECTOR_LENGTH,
/* OpenACC clause: tile ( size-expr-list ). */
OMP_CLAUSE_TILE
OMP_CLAUSE_TILE,
/* OpenMP internal-only clause to specify grid dimensions of a gridified
kernel. */
OMP_CLAUSE__GRIDDIM_
};
#undef DEFTREESTRUCT
@ -1375,6 +1379,9 @@ struct GTY(()) tree_omp_clause {
enum tree_code reduction_code;
enum omp_clause_linear_kind linear_kind;
enum tree_code if_modifier;
/* The dimension an OMP_CLAUSE__GRIDDIM_ clause of a gridified target
construct describes. */
unsigned int dimension;
} GTY ((skip)) subcode;
/* The gimplification of OMP_CLAUSE_REDUCTION_{INIT,MERGE} for omp-low's


@ -471,6 +471,7 @@ extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt);
extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt);
/* IPA Passes */
extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
@ -495,6 +496,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context *ctxt);
extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt);
extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);


@ -942,6 +942,18 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
pp_right_paren (pp);
break;
case OMP_CLAUSE__GRIDDIM_:
pp_string (pp, "_griddim_(");
pp_unsigned_wide_integer (pp, OMP_CLAUSE__GRIDDIM__DIMENSION (clause));
pp_colon (pp);
dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__SIZE (clause), spc, flags,
false);
pp_comma (pp);
dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__GROUP (clause), spc, flags,
false);
pp_right_paren (pp);
break;
default:
/* Should never happen. */
dump_generic_node (pp, clause, spc, flags, false);


@ -328,6 +328,7 @@ unsigned const char omp_clause_num_ops[] =
1, /* OMP_CLAUSE_NUM_WORKERS */
1, /* OMP_CLAUSE_VECTOR_LENGTH */
1, /* OMP_CLAUSE_TILE */
2, /* OMP_CLAUSE__GRIDDIM_ */
};
const char * const omp_clause_code_name[] =
@ -398,7 +399,8 @@ const char * const omp_clause_code_name[] =
"num_gangs",
"num_workers",
"vector_length",
"tile"
"tile",
"_griddim_"
};
@ -11744,6 +11746,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
switch (OMP_CLAUSE_CODE (*tp))
{
case OMP_CLAUSE_GANG:
case OMP_CLAUSE__GRIDDIM_:
WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
/* FALLTHRU */


@ -1636,6 +1636,14 @@ extern void protected_set_expr_location (tree, location_t);
#define OMP_CLAUSE_TILE_LIST(NODE) \
OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
#define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \
(OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
->omp_clause.subcode.dimension)
#define OMP_CLAUSE__GRIDDIM__SIZE(NODE) \
OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0)
#define OMP_CLAUSE__GRIDDIM__GROUP(NODE) \
OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 1)
/* SSA_NAME accessors. */
/* Returns the IDENTIFIER_NODE giving the SSA name a name or NULL_TREE


@ -1,3 +1,16 @@
2016-01-19 Martin Jambor <mjambor@suse.cz>
* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
(GOMP_VERSION_HSA): Likewise.
(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
(GOMP_TARGET_ARG_ID_MASK): Likewise.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.
2016-01-07 Mike Frysinger <vapier@gentoo.org>
* longlong.h: Change !__SHMEDIA__ to


@ -176,6 +176,7 @@ enum gomp_map_kind
#define GOMP_DEVICE_NOT_HOST 4
#define GOMP_DEVICE_NVIDIA_PTX 5
#define GOMP_DEVICE_INTEL_MIC 6
#define GOMP_DEVICE_HSA 7
#define GOMP_DEVICE_ICV -1
#define GOMP_DEVICE_HOST_FALLBACK -2
@ -201,6 +202,7 @@ enum gomp_map_kind
#define GOMP_VERSION 0
#define GOMP_VERSION_NVIDIA_PTX 1
#define GOMP_VERSION_INTEL_MIC 0
#define GOMP_VERSION_HSA 0
#define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
#define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
@ -228,4 +230,30 @@ enum gomp_map_kind
#define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
#define GOMP_LAUNCH_OP_MAX 0xffff
/* Bitmask to apply in order to find out the intended device of a target
argument. */
#define GOMP_TARGET_ARG_DEVICE_MASK ((1 << 7) - 1)
/* The target argument is significant for all devices. */
#define GOMP_TARGET_ARG_DEVICE_ALL 0
/* Flag set when the value of the argument is stored in the subsequent
element of the device-specific argument array. */
#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7)
/* Bitmask to apply to a target argument to find out the value identifier. */
#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8)
/* Target argument index of NUM_TEAMS. */
#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8)
/* Target argument index of THREAD_LIMIT. */
#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8)
/* If the value is directly embedded in the target argument, it should be
at most 16 bits wide and shifted by this many bits. */
#define GOMP_TARGET_ARG_VALUE_SHIFT 16
/* HSA specific data structures. */
/* Identifiers of device-specific target arguments. */
#define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES (1 << 8)
#endif
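
The GOMP_TARGET_ARG_* macros above describe a packed argument word: the low seven bits select the device, bit 7 says the value lives in the next array element, bits 8 to 15 identify the argument, and an embedded value sits from bit 16 upwards. The following is a minimal sketch of encoding and decoding such a word. The macro definitions are copied from the hunk, while the helper functions are hypothetical and treat the embedded value as unsigned 16-bit for simplicity, which may differ from how libgomp interprets it.

```c
#include <stdint.h>

#define GOMP_TARGET_ARG_DEVICE_MASK		((1 << 7) - 1)
#define GOMP_TARGET_ARG_DEVICE_ALL		0
#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM	(1 << 7)
#define GOMP_TARGET_ARG_ID_MASK			(((1 << 8) - 1) << 8)
#define GOMP_TARGET_ARG_NUM_TEAMS		(1 << 8)
#define GOMP_TARGET_ARG_THREAD_LIMIT		(2 << 8)
#define GOMP_TARGET_ARG_VALUE_SHIFT		16

/* Build an argument word with VALUE embedded directly (hypothetical
   helper, not part of libgomp).  */
static uintptr_t
encode_embedded_arg (unsigned device, unsigned id, uint16_t value)
{
  return ((uintptr_t) value << GOMP_TARGET_ARG_VALUE_SHIFT)
	 | (id & GOMP_TARGET_ARG_ID_MASK)
	 | (device & GOMP_TARGET_ARG_DEVICE_MASK);
}

/* Device the argument is intended for; 0 means all devices.  */
static unsigned
arg_device (uintptr_t word)
{
  return (unsigned) (word & GOMP_TARGET_ARG_DEVICE_MASK);
}

/* Identifier of the argument, still shifted into bits 8 to 15.  */
static unsigned
arg_id (uintptr_t word)
{
  return (unsigned) (word & GOMP_TARGET_ARG_ID_MASK);
}

/* Nonzero if the value is stored in the next array element instead.  */
static int
arg_in_next_element (uintptr_t word)
{
  return (word & GOMP_TARGET_ARG_SUBSEQUENT_PARAM) != 0;
}

/* Value embedded in the word; only meaningful when
   arg_in_next_element returns zero.  */
static uint16_t
arg_embedded_value (uintptr_t word)
{
  return (uint16_t) (word >> GOMP_TARGET_ARG_VALUE_SHIFT);
}
```

For instance, a num_teams value of 4 meant for all devices encodes to (4 << 16) | (1 << 8). A value that does not fit in 16 bits would instead set GOMP_TARGET_ARG_SUBSEQUENT_PARAM and occupy the following element.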


@ -1,3 +1,64 @@
2016-01-19 Martin Jambor <mjambor@suse.cz>
Martin Liska <mliska@suse.cz>
* plugin/Makefrag.am: Add HSA plugin requirements.
* plugin/configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_CPPFLAGS): Likewise.
(HSA_RUNTIME_INCLUDE): New substitution.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_LDFLAGS): Likewise.
(hsa-runtime): New configure option.
(hsa-runtime-include): Likewise.
(hsa-runtime-lib): Likewise.
(PLUGIN_HSA): New substitution variable.
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
configure options.
(PLUGIN_HSA_CPPFLAGS): Likewise.
(PLUGIN_HSA_LDFLAGS): Likewise.
(PLUGIN_HSA_LIBS): Likewise.
Check that we have access to HSA run-time.
* libgomp-plugin.h (offload_target_type): New element
OFFLOAD_TARGET_TYPE_HSA.
* libgomp.h (gomp_target_task): New fields firstprivate_copies and
args.
(bool gomp_create_target_task): Updated.
(gomp_device_descr): Extra parameter of run_func and async_run_func,
new field can_run_func.
* libgomp_g.h (GOMP_target_ext): Update prototype.
* oacc-host.c (host_run): Added a new parameter args.
* target.c (calculate_firstprivate_requirements): New function.
(copy_firstprivate_data): Likewise.
(gomp_target_fallback_firstprivate): Use them.
(gomp_target_unshare_firstprivate): New function.
(gomp_get_target_fn_addr): Allow returning NULL for shared memory
devices.
(GOMP_target): Do host fallback for all shared memory devices. Do not
pass any args to plugins.
(GOMP_target_ext): Introduce device-specific argument parameter args.
Allow host fallback if device shares memory. Do not remap data if
device has shared memory.
(gomp_target_task_fn): Likewise. Also treat shared memory devices
like host fallback for mappings.
(GOMP_target_data): Treat shared memory devices like host fallback.
(GOMP_target_data_ext): Likewise.
(GOMP_target_update): Likewise.
(GOMP_target_update_ext): Likewise. Also pass NULL as args to
gomp_create_target_task.
(GOMP_target_enter_exit_data): Likewise.
(omp_target_alloc): Treat shared memory devices like host fallback.
(omp_target_free): Likewise.
(omp_target_is_present): Likewise.
(omp_target_memcpy): Likewise.
(omp_target_memcpy_rect): Likewise.
(omp_target_associate_ptr): Likewise.
(gomp_load_plugin_for_device): Also load can_run.
* task.c (GOMP_PLUGIN_target_task_completion): Free
firstprivate_copies.
(gomp_create_target_task): Accept new argument args and store it to
ttask.
* plugin/plugin-hsa.c: New file.
2016-01-18 Tom de Vries <tom@codesourcery.com>
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test.


@ -17,7 +17,7 @@
# Plugins for offload execution, Makefile.am fragment.
#
# Copyright (C) 2014-2015 Free Software Foundation, Inc.
# Copyright (C) 2014-2016 Free Software Foundation, Inc.
#
# Contributed by Mentor Embedded.
#
@ -89,7 +89,8 @@ DIST_COMMON = $(top_srcdir)/plugin/Makefrag.am ChangeLog \
$(srcdir)/omp_lib.f90.in $(srcdir)/libgomp_f.h.in \
$(srcdir)/libgomp.spec.in $(srcdir)/../depcomp
@PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
@USE_FORTRAN_TRUE@am__append_2 = openacc.f90
@PLUGIN_HSA_TRUE@am__append_2 = libgomp-plugin-hsa.la
@USE_FORTRAN_TRUE@am__append_3 = openacc.f90
subdir = .
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@ -147,6 +148,17 @@ am__installdirs = "$(DESTDIR)$(toolexeclibdir)" "$(DESTDIR)$(infodir)" \
"$(DESTDIR)$(toolexeclibdir)"
LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
am__DEPENDENCIES_1 =
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_DEPENDENCIES = libgomp.la \
@PLUGIN_HSA_TRUE@ $(am__DEPENDENCIES_1)
@PLUGIN_HSA_TRUE@am_libgomp_plugin_hsa_la_OBJECTS = \
@PLUGIN_HSA_TRUE@ libgomp_plugin_hsa_la-plugin-hsa.lo
libgomp_plugin_hsa_la_OBJECTS = $(am_libgomp_plugin_hsa_la_OBJECTS)
libgomp_plugin_hsa_la_LINK = $(LIBTOOL) --tag=CC \
$(libgomp_plugin_hsa_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
--mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
$(libgomp_plugin_hsa_la_LDFLAGS) $(LDFLAGS) -o $@
@PLUGIN_HSA_TRUE@am_libgomp_plugin_hsa_la_rpath = -rpath \
@PLUGIN_HSA_TRUE@ $(toolexeclibdir)
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES = libgomp.la \
@PLUGIN_NVPTX_TRUE@ $(am__DEPENDENCIES_1)
@PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS = \
@ -187,7 +199,8 @@ FCLD = $(FC)
FCLINK = $(LIBTOOL) --tag=FC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
--mode=link $(FCLD) $(AM_FCFLAGS) $(FCFLAGS) $(AM_LDFLAGS) \
$(LDFLAGS) -o $@
SOURCES = $(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
SOURCES = $(libgomp_plugin_hsa_la_SOURCES) \
$(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
MULTISRCTOP =
MULTIBUILDTOP =
MULTIDIRS =
@ -255,6 +268,8 @@ FC = @FC@
FCFLAGS = @FCFLAGS@
FGREP = @FGREP@
GREP = @GREP@
HSA_RUNTIME_INCLUDE = @HSA_RUNTIME_INCLUDE@
HSA_RUNTIME_LIB = @HSA_RUNTIME_LIB@
INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
@ -299,6 +314,10 @@ PACKAGE_URL = @PACKAGE_URL@
PACKAGE_VERSION = @PACKAGE_VERSION@
PATH_SEPARATOR = @PATH_SEPARATOR@
PERL = @PERL@
PLUGIN_HSA = @PLUGIN_HSA@
PLUGIN_HSA_CPPFLAGS = @PLUGIN_HSA_CPPFLAGS@
PLUGIN_HSA_LDFLAGS = @PLUGIN_HSA_LDFLAGS@
PLUGIN_HSA_LIBS = @PLUGIN_HSA_LIBS@
PLUGIN_NVPTX = @PLUGIN_NVPTX@
PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
@ -391,7 +410,7 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
AM_CPPFLAGS = $(addprefix -I, $(search_path))
AM_CFLAGS = $(XCFLAGS)
AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1)
toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_2)
nodist_toolexeclib_HEADERS = libgomp.spec
# -Wc is only a libtool option.
@ -415,7 +434,7 @@ libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
bar.c ptrlock.c time.c fortran.c affinity.c target.c \
splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
priority_queue.c $(am__append_2)
priority_queue.c $(am__append_3)
# Nvidia PTX OpenACC plugin.
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@ -426,6 +445,16 @@ libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
@PLUGIN_NVPTX_TRUE@ $(lt_host_flags) $(PLUGIN_NVPTX_LDFLAGS)
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
# Heterogeneous System Architecture plugin
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_version_info = -version-info $(libtool_VERSION)
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_SOURCES = plugin/plugin-hsa.c
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_HSA_CPPFLAGS)
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_LDFLAGS = \
@PLUGIN_HSA_TRUE@ $(libgomp_plugin_hsa_version_info) \
@PLUGIN_HSA_TRUE@ $(lt_host_flags) $(PLUGIN_HSA_LDFLAGS)
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_LIBADD = libgomp.la $(PLUGIN_HSA_LIBS)
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_LIBTOOLFLAGS = --tag=disable-static
nodist_noinst_HEADERS = libgomp_f.h
nodist_libsubinclude_HEADERS = omp.h openacc.h
@USE_FORTRAN_TRUE@nodist_finclude_HEADERS = omp_lib.h omp_lib.f90 omp_lib.mod omp_lib_kinds.mod \
@ -553,6 +582,8 @@ clean-toolexeclibLTLIBRARIES:
echo "rm -f \"$${dir}/so_locations\""; \
rm -f "$${dir}/so_locations"; \
done
libgomp-plugin-hsa.la: $(libgomp_plugin_hsa_la_OBJECTS) $(libgomp_plugin_hsa_la_DEPENDENCIES) $(EXTRA_libgomp_plugin_hsa_la_DEPENDENCIES)
$(libgomp_plugin_hsa_la_LINK) $(am_libgomp_plugin_hsa_la_rpath) $(libgomp_plugin_hsa_la_OBJECTS) $(libgomp_plugin_hsa_la_LIBADD) $(LIBS)
libgomp-plugin-nvptx.la: $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_DEPENDENCIES) $(EXTRA_libgomp_plugin_nvptx_la_DEPENDENCIES)
$(libgomp_plugin_nvptx_la_LINK) $(am_libgomp_plugin_nvptx_la_rpath) $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_LIBADD) $(LIBS)
libgomp.la: $(libgomp_la_OBJECTS) $(libgomp_la_DEPENDENCIES) $(EXTRA_libgomp_la_DEPENDENCIES)
@ -575,6 +606,7 @@ distclean-compile:
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter_ull.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp-plugin.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/lock.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop.Plo@am__quote@
@ -623,6 +655,13 @@ distclean-compile:
@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@ $(LTCOMPILE) -c -o $@ $<
libgomp_plugin_hsa_la-plugin-hsa.lo: plugin/plugin-hsa.c
@am__fastdepCC_TRUE@ $(LIBTOOL) --tag=CC $(libgomp_plugin_hsa_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_hsa_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_hsa_la-plugin-hsa.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Tpo -c -o libgomp_plugin_hsa_la-plugin-hsa.lo `test -f 'plugin/plugin-hsa.c' || echo '$(srcdir)/'`plugin/plugin-hsa.c
@am__fastdepCC_TRUE@ $(am__mv) $(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Tpo $(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Plo
@AMDEP_TRUE@@am__fastdepCC_FALSE@ source='plugin/plugin-hsa.c' object='libgomp_plugin_hsa_la-plugin-hsa.lo' libtool=yes @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCC_FALSE@ $(LIBTOOL) --tag=CC $(libgomp_plugin_hsa_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_hsa_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_hsa_la-plugin-hsa.lo `test -f 'plugin/plugin-hsa.c' || echo '$(srcdir)/'`plugin/plugin-hsa.c
libgomp_plugin_nvptx_la-plugin-nvptx.lo: plugin/plugin-nvptx.c
@am__fastdepCC_TRUE@ $(LIBTOOL) --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_nvptx_la-plugin-nvptx.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin/plugin-nvptx.c' || echo '$(srcdir)/'`plugin/plugin-nvptx.c
@am__fastdepCC_TRUE@ $(am__mv) $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo


@ -60,6 +60,9 @@
/* Define to 1 if you have the `strtoull' function. */
#undef HAVE_STRTOULL
/* Define to 1 if the system has the type `struct _Mutex_Control'. */
#undef HAVE_STRUCT__MUTEX_CONTROL
/* Define to 1 if the target runtime linker supports binding the same symbol
to different versions. */
#undef HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT
@ -119,6 +122,9 @@
/* Define to the version of this package. */
#undef PACKAGE_VERSION
/* Define to 1 if the HSA plugin is built, 0 if not. */
#undef PLUGIN_HSA
/* Define to 1 if the NVIDIA plugin is built, 0 if not. */
#undef PLUGIN_NVPTX

libgomp/configure

@ -627,10 +627,18 @@ LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE
LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE
OPT_LDFLAGS
SECTION_LDFLAGS
PLUGIN_HSA_FALSE
PLUGIN_HSA_TRUE
PLUGIN_NVPTX_FALSE
PLUGIN_NVPTX_TRUE
offload_additional_lib_paths
offload_additional_options
PLUGIN_HSA_LIBS
PLUGIN_HSA_LDFLAGS
PLUGIN_HSA_CPPFLAGS
PLUGIN_HSA
HSA_RUNTIME_LIB
HSA_RUNTIME_INCLUDE
PLUGIN_NVPTX_LIBS
PLUGIN_NVPTX_LDFLAGS
PLUGIN_NVPTX_CPPFLAGS
@ -782,6 +790,10 @@ enable_maintainer_mode
with_cuda_driver
with_cuda_driver_include
with_cuda_driver_lib
with_hsa_runtime
with_hsa_runtime_include
with_hsa_runtime_lib
with_hsa_kmt_lib
enable_linux_futex
enable_tls
enable_symvers
@ -1453,6 +1465,17 @@ Optional Packages:
--with-cuda-driver-lib=PATH
specify directory for the installed CUDA driver
library
--with-hsa-runtime=PATH specify prefix directory for installed HSA run-time
package. Equivalent to
--with-hsa-runtime-include=PATH/include plus
--with-hsa-runtime-lib=PATH/lib
--with-hsa-runtime-include=PATH
specify directory for installed HSA run-time include
files
--with-hsa-runtime-lib=PATH
specify directory for the installed HSA run-time
library
--with-hsa-kmt-lib=PATH specify directory for installed HSA KMT library.
Some influential environment variables:
CC C compiler command
@ -11121,7 +11144,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
#line 11124 "configure"
#line 11147 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H
@ -11227,7 +11250,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
#line 11230 "configure"
#line 11253 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H
@ -15090,7 +15113,7 @@ esac
# Plugins for offload execution, configure.ac fragment. -*- mode: autoconf -*-
#
# Copyright (C) 2014-2015 Free Software Foundation, Inc.
# Copyright (C) 2014-2016 Free Software Foundation, Inc.
#
# Contributed by Mentor Embedded.
#
@ -15225,6 +15248,72 @@ PLUGIN_NVPTX_LIBS=
# Look for HSA run-time, its includes and libraries
HSA_RUNTIME_INCLUDE=
HSA_RUNTIME_LIB=
HSA_RUNTIME_CPPFLAGS=
HSA_RUNTIME_LDFLAGS=
# Check whether --with-hsa-runtime was given.
if test "${with_hsa_runtime+set}" = set; then :
withval=$with_hsa_runtime;
fi
# Check whether --with-hsa-runtime-include was given.
if test "${with_hsa_runtime_include+set}" = set; then :
withval=$with_hsa_runtime_include;
fi
# Check whether --with-hsa-runtime-lib was given.
if test "${with_hsa_runtime_lib+set}" = set; then :
withval=$with_hsa_runtime_lib;
fi
if test "x$with_hsa_runtime" != x; then
HSA_RUNTIME_INCLUDE=$with_hsa_runtime/include
HSA_RUNTIME_LIB=$with_hsa_runtime/lib
fi
if test "x$with_hsa_runtime_include" != x; then
HSA_RUNTIME_INCLUDE=$with_hsa_runtime_include
fi
if test "x$with_hsa_runtime_lib" != x; then
HSA_RUNTIME_LIB=$with_hsa_runtime_lib
fi
if test "x$HSA_RUNTIME_INCLUDE" != x; then
HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
fi
if test "x$HSA_RUNTIME_LIB" != x; then
HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
fi
# Check whether --with-hsa-kmt-lib was given.
if test "${with_hsa_kmt_lib+set}" = set; then :
withval=$with_hsa_kmt_lib;
fi
if test "x$with_hsa_kmt_lib" != x; then
HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
HSA_RUNTIME_LIB=
fi
PLUGIN_HSA=0
PLUGIN_HSA_CPPFLAGS=
PLUGIN_HSA_LDFLAGS=
PLUGIN_HSA_LIBS=
# Get offload targets and path to install tree of offloading compiler.
offload_additional_options=
offload_additional_lib_paths=
@ -15277,6 +15366,60 @@ rm -f core conftest.err conftest.$ac_objext \
;;
esac
;;
hsa*)
case "${target}" in
x86_64-*-*)
case " ${CC} ${CFLAGS} " in
*" -m32 "*)
PLUGIN_HSA=0
;;
*)
tgt_name=hsa
PLUGIN_HSA=$tgt
PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
PLUGIN_HSA_LDFLAGS=$HSA_RUNTIME_LDFLAGS
PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
PLUGIN_HSA_save_LDFLAGS=$LDFLAGS
LDFLAGS="$PLUGIN_HSA_LDFLAGS $LDFLAGS"
PLUGIN_HSA_save_LIBS=$LIBS
LIBS="$PLUGIN_HSA_LIBS $LIBS"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
#include "hsa.h"
int
main ()
{
hsa_status_t status = hsa_init ()
;
return 0;
}
_ACEOF
if ac_fn_c_try_link "$LINENO"; then :
PLUGIN_HSA=1
fi
rm -f core conftest.err conftest.$ac_objext \
conftest$ac_exeext conftest.$ac_ext
CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
LIBS=$PLUGIN_HSA_save_LIBS
case $PLUGIN_HSA in
hsa*)
PLUGIN_HSA=0
as_fn_error "HSA run-time package required for HSA support" "$LINENO" 5
;;
esac
;;
esac
;;
*-*-*)
PLUGIN_HSA=0
;;
esac
;;
*)
as_fn_error "unknown offload target specified" "$LINENO" 5
;;
@ -15313,6 +15456,19 @@ cat >>confdefs.h <<_ACEOF
#define PLUGIN_NVPTX $PLUGIN_NVPTX
_ACEOF
if test $PLUGIN_HSA = 1; then
PLUGIN_HSA_TRUE=
PLUGIN_HSA_FALSE='#'
else
PLUGIN_HSA_TRUE='#'
PLUGIN_HSA_FALSE=
fi
cat >>confdefs.h <<_ACEOF
#define PLUGIN_HSA $PLUGIN_HSA
_ACEOF
# Check for functions needed.
@ -16712,6 +16868,10 @@ if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
as_fn_error "conditional \"PLUGIN_NVPTX\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5
fi
if test -z "${PLUGIN_HSA_TRUE}" && test -z "${PLUGIN_HSA_FALSE}"; then
as_fn_error "conditional \"PLUGIN_HSA\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5
fi
if test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_TRUE}" && test -z "${LIBGOMP_BUILD_VERSIONED_SHLIB_FALSE}"; then
as_fn_error "conditional \"LIBGOMP_BUILD_VERSIONED_SHLIB\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5


@ -48,7 +48,8 @@ enum offload_target_type
OFFLOAD_TARGET_TYPE_HOST = 2,
/* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed. */
OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
OFFLOAD_TARGET_TYPE_HSA = 7
};
/* Auxiliary struct, used for transferring pairs of addresses from plugin


@ -496,6 +496,10 @@ struct gomp_target_task
struct target_mem_desc *tgt;
struct gomp_task *task;
struct gomp_team *team;
/* Copies of firstprivate mapped data for shared memory accelerators. */
void *firstprivate_copies;
/* Device-specific target arguments. */
void **args;
void *hostaddrs[];
};
@ -750,7 +754,8 @@ extern void gomp_task_maybe_wait_for_dependencies (void **);
extern bool gomp_create_target_task (struct gomp_device_descr *,
void (*) (void *), size_t, void **,
size_t *, unsigned short *, unsigned int,
void **, enum gomp_target_task_state);
void **, void **,
enum gomp_target_task_state);
static void inline
gomp_finish_task (struct gomp_task *task)
@ -937,8 +942,9 @@ struct gomp_device_descr
void *(*dev2host_func) (int, void *, const void *, size_t);
void *(*host2dev_func) (int, void *, const void *, size_t);
void *(*dev2dev_func) (int, void *, const void *, size_t);
void (*run_func) (int, void *, void *);
void (*async_run_func) (int, void *, void *, void *);
bool (*can_run_func) (void *);
void (*run_func) (int, void *, void *, void **);
void (*async_run_func) (int, void *, void *, void **, void *);
/* Splay tree containing information about mapped memory regions. */
struct splay_tree_s mem_map;
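
The new `can_run_func` field is an optional hook: later in this diff, `GOMP_target_ext` and `gomp_target_task_fn` fall back to the host when the hook exists and rejects the function. A minimal sketch of that gating pattern follows. The `struct device`, `launch`, and `demo_*` names are hypothetical stand-ins invented for illustration and are not part of libgomp.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down device descriptor (not the libgomp struct).  */
struct device
{
  bool (*can_run) (void *fn);                       /* Optional; may be NULL.  */
  void (*run) (void *fn, void *vars, void **args);  /* Mandatory.  */
};

enum launch_result { LAUNCHED_ON_DEVICE, RAN_ON_HOST };

/* Run DEV_FN on DEV if possible, otherwise call HOST_FN on the host.
   The optional hook is consulted only when the plugin provides it.  */
static enum launch_result
launch (struct device *dev, void *dev_fn, void (*host_fn) (void *),
        void *vars, void **args)
{
  if (dev == NULL
      || dev_fn == NULL
      || (dev->can_run && !dev->can_run (dev_fn)))
    {
      host_fn (vars);
      return RAN_ON_HOST;
    }
  dev->run (dev_fn, vars, args);
  return LAUNCHED_ON_DEVICE;
}

/* Demo stubs used only to exercise launch ().  */
static int host_calls, device_calls;

static void
demo_host (void *vars)
{
  (void) vars;
  host_calls++;
}

static void
demo_run (void *fn, void *vars, void **args)
{
  (void) fn;
  (void) vars;
  (void) args;
  device_calls++;
}

static bool
demo_refuse (void *fn)
{
  (void) fn;
  return false;
}
```

Leaving the hook optional means plugins that can always run a mapped function do not need to implement it, which matches the `DLSYM_OPT (can_run, can_run)` lookup in `gomp_load_plugin_for_device`.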


@ -278,8 +278,7 @@ extern void GOMP_single_copy_end (void *);
extern void GOMP_target (int, void (*) (void *), const void *,
size_t, void **, size_t *, unsigned char *);
extern void GOMP_target_ext (int, void (*) (void *), size_t, void **, size_t *,
unsigned short *, unsigned int, void **,
int, int);
unsigned short *, unsigned int, void **, void **);
extern void GOMP_target_data (int, const void *,
size_t, void **, size_t *, unsigned char *);
extern void GOMP_target_data_ext (int, size_t, void **, size_t *,


@ -123,7 +123,8 @@ host_host2dev (int n __attribute__ ((unused)),
}
static void
host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars)
host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars,
void **args __attribute__((unused)))
{
void (*fn)(void *) = (void (*)(void *)) fn_ptr;


@ -38,3 +38,16 @@ libgomp_plugin_nvptx_la_LDFLAGS += $(PLUGIN_NVPTX_LDFLAGS)
libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
endif
if PLUGIN_HSA
# Heterogeneous System Architecture plugin
libgomp_plugin_hsa_version_info = -version-info $(libtool_VERSION)
toolexeclib_LTLIBRARIES += libgomp-plugin-hsa.la
libgomp_plugin_hsa_la_SOURCES = plugin/plugin-hsa.c
libgomp_plugin_hsa_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_HSA_CPPFLAGS)
libgomp_plugin_hsa_la_LDFLAGS = $(libgomp_plugin_hsa_version_info) \
$(lt_host_flags)
libgomp_plugin_hsa_la_LDFLAGS += $(PLUGIN_HSA_LDFLAGS)
libgomp_plugin_hsa_la_LIBADD = libgomp.la $(PLUGIN_HSA_LIBS)
libgomp_plugin_hsa_la_LIBTOOLFLAGS = --tag=disable-static
endif


@ -81,6 +81,62 @@ AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
AC_SUBST(PLUGIN_NVPTX_LIBS)
# Look for HSA run-time, its includes and libraries
HSA_RUNTIME_INCLUDE=
HSA_RUNTIME_LIB=
AC_SUBST(HSA_RUNTIME_INCLUDE)
AC_SUBST(HSA_RUNTIME_LIB)
HSA_RUNTIME_CPPFLAGS=
HSA_RUNTIME_LDFLAGS=
AC_ARG_WITH(hsa-runtime,
[AS_HELP_STRING([--with-hsa-runtime=PATH],
[specify prefix directory for installed HSA run-time package.
Equivalent to --with-hsa-runtime-include=PATH/include
plus --with-hsa-runtime-lib=PATH/lib])])
AC_ARG_WITH(hsa-runtime-include,
[AS_HELP_STRING([--with-hsa-runtime-include=PATH],
[specify directory for installed HSA run-time include files])])
AC_ARG_WITH(hsa-runtime-lib,
[AS_HELP_STRING([--with-hsa-runtime-lib=PATH],
[specify directory for the installed HSA run-time library])])
if test "x$with_hsa_runtime" != x; then
HSA_RUNTIME_INCLUDE=$with_hsa_runtime/include
HSA_RUNTIME_LIB=$with_hsa_runtime/lib
fi
if test "x$with_hsa_runtime_include" != x; then
HSA_RUNTIME_INCLUDE=$with_hsa_runtime_include
fi
if test "x$with_hsa_runtime_lib" != x; then
HSA_RUNTIME_LIB=$with_hsa_runtime_lib
fi
if test "x$HSA_RUNTIME_INCLUDE" != x; then
HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
fi
if test "x$HSA_RUNTIME_LIB" != x; then
HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
fi
AC_ARG_WITH(hsa-kmt-lib,
[AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
[specify directory for installed HSA KMT library.])])
if test "x$with_hsa_kmt_lib" != x; then
HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
HSA_RUNTIME_LIB=
fi
PLUGIN_HSA=0
PLUGIN_HSA_CPPFLAGS=
PLUGIN_HSA_LDFLAGS=
PLUGIN_HSA_LIBS=
AC_SUBST(PLUGIN_HSA)
AC_SUBST(PLUGIN_HSA_CPPFLAGS)
AC_SUBST(PLUGIN_HSA_LDFLAGS)
AC_SUBST(PLUGIN_HSA_LIBS)
# Get offload targets and path to install tree of offloading compiler.
offload_additional_options=
offload_additional_lib_paths=
@ -122,6 +178,49 @@ if test x"$enable_offload_targets" != x; then
;;
esac
;;
hsa*)
case "${target}" in
x86_64-*-*)
case " ${CC} ${CFLAGS} " in
*" -m32 "*)
PLUGIN_HSA=0
;;
*)
tgt_name=hsa
PLUGIN_HSA=$tgt
PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
PLUGIN_HSA_LDFLAGS=$HSA_RUNTIME_LDFLAGS
PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
PLUGIN_HSA_save_LDFLAGS=$LDFLAGS
LDFLAGS="$PLUGIN_HSA_LDFLAGS $LDFLAGS"
PLUGIN_HSA_save_LIBS=$LIBS
LIBS="$PLUGIN_HSA_LIBS $LIBS"
AC_LINK_IFELSE(
[AC_LANG_PROGRAM(
[#include "hsa.h"],
[hsa_status_t status = hsa_init ()])],
[PLUGIN_HSA=1])
CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
LIBS=$PLUGIN_HSA_save_LIBS
case $PLUGIN_HSA in
hsa*)
PLUGIN_HSA=0
AC_MSG_ERROR([HSA run-time package required for HSA support])
;;
esac
;;
esac
;;
*-*-*)
PLUGIN_HSA=0
;;
esac
;;
*)
AC_MSG_ERROR([unknown offload target specified])
;;
@ -145,3 +244,6 @@ AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
[Define to 1 if the NVIDIA plugin is built, 0 if not.])
AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
[Define to 1 if the HSA plugin is built, 0 if not.])

libgomp/plugin/plugin-hsa.c

File diff suppressed because it is too large


@ -1329,6 +1329,49 @@ gomp_target_fallback (void (*fn) (void *), void **hostaddrs)
*thr = old_thr;
}
/* Calculate alignment and size requirements of a private copy of data shared
as GOMP_MAP_FIRSTPRIVATE and store them to TGT_ALIGN and TGT_SIZE. */
static inline void
calculate_firstprivate_requirements (size_t mapnum, size_t *sizes,
unsigned short *kinds, size_t *tgt_align,
size_t *tgt_size)
{
size_t i;
for (i = 0; i < mapnum; i++)
if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
{
size_t align = (size_t) 1 << (kinds[i] >> 8);
if (*tgt_align < align)
*tgt_align = align;
*tgt_size = (*tgt_size + align - 1) & ~(align - 1);
*tgt_size += sizes[i];
}
}
/* Copy data shared as GOMP_MAP_FIRSTPRIVATE to TGT. */
static inline void
copy_firstprivate_data (char *tgt, size_t mapnum, void **hostaddrs,
size_t *sizes, unsigned short *kinds, size_t tgt_align,
size_t tgt_size)
{
uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
if (al)
tgt += tgt_align - al;
tgt_size = 0;
size_t i;
for (i = 0; i < mapnum; i++)
if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
{
size_t align = (size_t) 1 << (kinds[i] >> 8);
tgt_size = (tgt_size + align - 1) & ~(align - 1);
memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
hostaddrs[i] = tgt + tgt_size;
tgt_size = tgt_size + sizes[i];
}
}
/* Host fallback with firstprivate map-type handling. */
static void
@ -1336,37 +1379,40 @@ gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
void **hostaddrs, size_t *sizes,
unsigned short *kinds)
{
size_t i, tgt_align = 0, tgt_size = 0;
char *tgt = NULL;
for (i = 0; i < mapnum; i++)
if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
{
size_t align = (size_t) 1 << (kinds[i] >> 8);
if (tgt_align < align)
tgt_align = align;
tgt_size = (tgt_size + align - 1) & ~(align - 1);
tgt_size += sizes[i];
}
size_t tgt_align = 0, tgt_size = 0;
calculate_firstprivate_requirements (mapnum, sizes, kinds, &tgt_align,
&tgt_size);
if (tgt_align)
{
tgt = gomp_alloca (tgt_size + tgt_align - 1);
uintptr_t al = (uintptr_t) tgt & (tgt_align - 1);
if (al)
tgt += tgt_align - al;
tgt_size = 0;
for (i = 0; i < mapnum; i++)
if ((kinds[i] & 0xff) == GOMP_MAP_FIRSTPRIVATE)
{
size_t align = (size_t) 1 << (kinds[i] >> 8);
tgt_size = (tgt_size + align - 1) & ~(align - 1);
memcpy (tgt + tgt_size, hostaddrs[i], sizes[i]);
hostaddrs[i] = tgt + tgt_size;
tgt_size = tgt_size + sizes[i];
}
char *tgt = gomp_alloca (tgt_size + tgt_align - 1);
copy_firstprivate_data (tgt, mapnum, hostaddrs, sizes, kinds, tgt_align,
tgt_size);
}
gomp_target_fallback (fn, hostaddrs);
}
/* Handle firstprivate map-type for shared memory devices and the host
fallback. Return a pointer to the firstprivate copies, which has to be
freed after use. */
static void *
gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs,
size_t *sizes, unsigned short *kinds)
{
size_t tgt_align = 0, tgt_size = 0;
char *tgt = NULL;
calculate_firstprivate_requirements (mapnum, sizes, kinds, &tgt_align,
&tgt_size);
if (tgt_align)
{
tgt = gomp_malloc (tgt_size + tgt_align - 1);
copy_firstprivate_data (tgt, mapnum, hostaddrs, sizes, kinds, tgt_align,
tgt_size);
}
return tgt;
}
/* Helper function of GOMP_target{,_ext} routines. */
static void *
@ -1390,7 +1436,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
gomp_mutex_unlock (&devicep->lock);
if (tgt_fn == NULL)
gomp_fatal ("Target function wasn't mapped");
{
if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return NULL;
else
gomp_fatal ("Target function wasn't mapped");
}
return (void *) tgt_fn->tgt_offset;
}
@ -1416,13 +1467,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
void *fn_addr;
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
/* All shared memory devices should use the GOMP_target_ext function. */
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM
|| !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
return gomp_target_fallback (fn, hostaddrs);
struct target_mem_desc *tgt_vars
= gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
GOMP_MAP_VARS_TARGET);
devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start,
NULL);
gomp_unmap_vars (tgt_vars, true);
}
@ -1430,6 +1484,15 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
and several arguments have been added:
FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
DEPEND is array of dependencies, see GOMP_task for details.
ARGS is a pointer to an array consisting of a variable number of both
device-independent and device-specific arguments. Each argument takes one
or two elements: the first specifies for which device it is intended, the
type and optionally also the value. If the value is not present in the
first element, the whole second element is the actual value. The last
element of the array is a single NULL. Device-independent arguments
include, for example, NUM_TEAMS and THREAD_LIMIT.
NUM_TEAMS is positive if GOMP_teams will be called in the body with
that value, or 1 if teams construct is not present, or 0, if
teams construct does not have num_teams clause and so the choice is
@ -1443,14 +1506,10 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
void
GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
void **hostaddrs, size_t *sizes, unsigned short *kinds,
unsigned int flags, void **depend, int num_teams,
int thread_limit)
unsigned int flags, void **depend, void **args)
{
struct gomp_device_descr *devicep = resolve_device (device);
(void) num_teams;
(void) thread_limit;
if (flags & GOMP_TARGET_FLAG_NOWAIT)
{
struct gomp_thread *thr = gomp_thread ();
@ -1487,7 +1546,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
&& !thr->task->final_task)
{
gomp_create_target_task (devicep, fn, mapnum, hostaddrs,
sizes, kinds, flags, depend,
sizes, kinds, flags, depend, args,
GOMP_TARGET_TASK_BEFORE_MAP);
return;
}
@ -1507,17 +1566,30 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
void *fn_addr;
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
|| !(fn_addr = gomp_get_target_fn_addr (devicep, fn))
|| (devicep->can_run_func && !devicep->can_run_func (fn_addr)))
{
gomp_target_fallback_firstprivate (fn, mapnum, hostaddrs, sizes, kinds);
return;
}
struct target_mem_desc *tgt_vars
= gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
GOMP_MAP_VARS_TARGET);
devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
gomp_unmap_vars (tgt_vars, true);
struct target_mem_desc *tgt_vars;
void *fpc = NULL;
if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
{
fpc = gomp_target_unshare_firstprivate (mapnum, hostaddrs, sizes, kinds);
tgt_vars = NULL;
}
else
tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds,
true, GOMP_MAP_VARS_TARGET);
devicep->run_func (devicep->target_id, fn_addr,
tgt_vars ? (void *) tgt_vars->tgt_start : hostaddrs,
args);
if (tgt_vars)
gomp_unmap_vars (tgt_vars, true);
else
free (fpc);
}
/* Host fallback for GOMP_target_data{,_ext} routines. */
@ -1547,7 +1619,8 @@ GOMP_target_data (int device, const void *unused, size_t mapnum,
struct gomp_device_descr *devicep = resolve_device (device);
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM))
return gomp_target_data_fallback ();
struct target_mem_desc *tgt
@ -1565,7 +1638,8 @@ GOMP_target_data_ext (int device, size_t mapnum, void **hostaddrs,
struct gomp_device_descr *devicep = resolve_device (device);
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return gomp_target_data_fallback ();
struct target_mem_desc *tgt
@ -1595,7 +1669,8 @@ GOMP_target_update (int device, const void *unused, size_t mapnum,
struct gomp_device_descr *devicep = resolve_device (device);
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return;
gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
@ -1626,7 +1701,7 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
if (gomp_create_target_task (devicep, (void (*) (void *)) NULL,
mapnum, hostaddrs, sizes, kinds,
flags | GOMP_TARGET_FLAG_UPDATE,
depend, GOMP_TARGET_TASK_DATA))
depend, NULL, GOMP_TARGET_TASK_DATA))
return;
}
else
@ -1646,7 +1721,8 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
}
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return;
struct gomp_thread *thr = gomp_thread ();
@ -1756,7 +1832,7 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
{
if (gomp_create_target_task (devicep, (void (*) (void *)) NULL,
mapnum, hostaddrs, sizes, kinds,
flags, depend,
flags, depend, NULL,
GOMP_TARGET_TASK_DATA))
return;
}
@ -1777,7 +1853,8 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
}
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return;
struct gomp_thread *thr = gomp_thread ();
@ -1815,7 +1892,8 @@ gomp_target_task_fn (void *data)
void *fn_addr;
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| !(fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn)))
|| !(fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn))
|| (devicep->can_run_func && !devicep->can_run_func (fn_addr)))
{
ttask->state = GOMP_TARGET_TASK_FALLBACK;
gomp_target_fallback_firstprivate (ttask->fn, ttask->mapnum,
@ -1826,22 +1904,36 @@ gomp_target_task_fn (void *data)
if (ttask->state == GOMP_TARGET_TASK_FINISHED)
{
gomp_unmap_vars (ttask->tgt, true);
if (ttask->tgt)
gomp_unmap_vars (ttask->tgt, true);
return false;
}
ttask->tgt
= gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
ttask->sizes, ttask->kinds, true,
GOMP_MAP_VARS_TARGET);
void *actual_arguments;
if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
{
ttask->tgt = NULL;
ttask->firstprivate_copies
= gomp_target_unshare_firstprivate (ttask->mapnum, ttask->hostaddrs,
ttask->sizes, ttask->kinds);
actual_arguments = ttask->hostaddrs;
}
else
{
ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs,
NULL, ttask->sizes, ttask->kinds, true,
GOMP_MAP_VARS_TARGET);
actual_arguments = (void *) ttask->tgt->tgt_start;
}
ttask->state = GOMP_TARGET_TASK_READY_TO_RUN;
devicep->async_run_func (devicep->target_id, fn_addr,
(void *) ttask->tgt->tgt_start, (void *) ttask);
devicep->async_run_func (devicep->target_id, fn_addr, actual_arguments,
ttask->args, (void *) ttask);
return true;
}
else if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return false;
size_t i;
@ -1891,7 +1983,8 @@ omp_target_alloc (size_t size, int device_num)
if (devicep == NULL)
return NULL;
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return malloc (size);
gomp_mutex_lock (&devicep->lock);
@ -1919,7 +2012,8 @@ omp_target_free (void *device_ptr, int device_num)
if (devicep == NULL)
return;
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
{
free (device_ptr);
return;
@ -1946,7 +2040,8 @@ omp_target_is_present (void *ptr, int device_num)
if (devicep == NULL)
return 0;
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return 1;
gomp_mutex_lock (&devicep->lock);
@ -1976,7 +2071,8 @@ omp_target_memcpy (void *dst, void *src, size_t length, size_t dst_offset,
if (dst_devicep == NULL)
return EINVAL;
if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| dst_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
dst_devicep = NULL;
}
if (src_device_num != GOMP_DEVICE_HOST_FALLBACK)
@ -1988,7 +2084,8 @@ omp_target_memcpy (void *dst, void *src, size_t length, size_t dst_offset,
if (src_devicep == NULL)
return EINVAL;
if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| src_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
src_devicep = NULL;
}
if (src_devicep == NULL && dst_devicep == NULL)
@ -2118,7 +2215,8 @@ omp_target_memcpy_rect (void *dst, void *src, size_t element_size,
if (dst_devicep == NULL)
return EINVAL;
if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(dst_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| dst_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
dst_devicep = NULL;
}
if (src_device_num != GOMP_DEVICE_HOST_FALLBACK)
@ -2130,7 +2228,8 @@ omp_target_memcpy_rect (void *dst, void *src, size_t element_size,
if (src_devicep == NULL)
return EINVAL;
if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(src_devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| src_devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
src_devicep = NULL;
}
@ -2166,7 +2265,8 @@ omp_target_associate_ptr (void *host_ptr, void *device_ptr, size_t size,
if (devicep == NULL)
return EINVAL;
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
|| devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
return EINVAL;
gomp_mutex_lock (&devicep->lock);
@ -2309,6 +2409,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
{
DLSYM (run);
DLSYM (async_run);
DLSYM_OPT (can_run, can_run);
DLSYM (dev2dev);
}
if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
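
The two helpers factored out in target.c, `calculate_firstprivate_requirements` and `copy_firstprivate_data`, amount to a two-pass packing scheme: first size the buffer, then copy each item at its aligned offset and redirect the host address to the copy. The sketch below restates that idea in self-contained form. The `struct fp_item` type and the `fp_*` and `round_up` names are hypothetical simplifications; the real code reads sizes from `sizes[]` and the alignment exponent from the high byte of `kinds[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one firstprivate item.  */
struct fp_item
{
  void *addr;            /* Address of the original data.  */
  size_t size;           /* Size in bytes.  */
  unsigned align_log2;   /* log2 of the required alignment.  */
};

/* Round OFFSET up to the next multiple of ALIGN (a power of two).  */
static size_t
round_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* Pass 1: compute the packed size and the strictest alignment.  */
static void
fp_requirements (const struct fp_item *items, size_t n,
                 size_t *max_align, size_t *total)
{
  *max_align = 0;
  *total = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t align = (size_t) 1 << items[i].align_log2;
      if (*max_align < align)
        *max_align = align;
      *total = round_up (*total, align) + items[i].size;
    }
}

/* Pass 2: copy every item into BUF and point its addr at the copy.
   BUF must hold at least total + max_align - 1 bytes.  Returns the
   aligned base actually used inside BUF.  */
static char *
fp_copy (char *buf, struct fp_item *items, size_t n, size_t max_align)
{
  assert (max_align != 0);   /* Only call when there is something to copy.  */
  uintptr_t misalign = (uintptr_t) buf & (max_align - 1);
  if (misalign)
    buf += max_align - misalign;
  size_t offset = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t align = (size_t) 1 << items[i].align_log2;
      offset = round_up (offset, align);
      memcpy (buf + offset, items[i].addr, items[i].size);
      items[i].addr = buf + offset;
      offset += items[i].size;
    }
  return buf;
}
```

Over-allocating by `max_align - 1` bytes is what lets the same copy routine serve both `gomp_alloca` (host fallback) and `gomp_malloc` (shared-memory devices), since neither is assumed to return a sufficiently aligned block.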


@ -582,6 +582,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
return;
}
ttask->state = GOMP_TARGET_TASK_FINISHED;
free (ttask->firstprivate_copies);
gomp_target_task_completion (team, task);
gomp_mutex_unlock (&team->task_lock);
}
@ -594,7 +595,7 @@ bool
gomp_create_target_task (struct gomp_device_descr *devicep,
void (*fn) (void *), size_t mapnum, void **hostaddrs,
size_t *sizes, unsigned short *kinds,
unsigned int flags, void **depend,
unsigned int flags, void **depend, void **args,
enum gomp_target_task_state state)
{
struct gomp_thread *thr = gomp_thread ();
@ -654,6 +655,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep,
ttask->devicep = devicep;
ttask->fn = fn;
ttask->mapnum = mapnum;
ttask->args = args;
memcpy (ttask->hostaddrs, hostaddrs, mapnum * sizeof (void *));
ttask->sizes = (size_t *) &ttask->hostaddrs[mapnum];
memcpy (ttask->sizes, sizes, mapnum * sizeof (size_t));
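
The `args` pointer stored in `ttask->args` is, per the `GOMP_target_ext` comment in target.c, a NULL-terminated array whose entries take one or two elements. A plugin consuming it might walk the array roughly as sketched below. The bit layout here is invented purely for illustration: the real encoding lives in gomp-constants.h, which this diff does not show, and every `ARG_*` name, `struct launch_params`, and `parse_args` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the first element, for illustration only:
   bit 0       set if the value is stored in the following element
   bits 8-15   argument id
   bits 16+    inline value, used when bit 0 is clear.  */
#define ARG_SUBSEQUENT ((uintptr_t) 1)
#define ARG_ID(x)      (((x) >> 8) & 0xff)
#define ARG_VALUE(x)   ((x) >> 16)

enum { ARG_NUM_TEAMS = 1, ARG_THREAD_LIMIT = 2 };

struct launch_params
{
  uintptr_t num_teams;
  uintptr_t thread_limit;
};

/* Walk a well-formed, NULL-terminated ARGS array and collect the
   device-independent values.  A NULL ARGS yields all-zero defaults.  */
static void
parse_args (void **args, struct launch_params *out)
{
  out->num_teams = 0;
  out->thread_limit = 0;
  if (args == NULL)
    return;
  while (*args != NULL)
    {
      uintptr_t head = (uintptr_t) *args++;
      uintptr_t value;
      if (head & ARG_SUBSEQUENT)
        value = (uintptr_t) *args++;   /* Second element is the value.  */
      else
        value = ARG_VALUE (head);      /* Value packed into the first.  */
      switch (ARG_ID (head))
        {
        case ARG_NUM_TEAMS:
          out->num_teams = value;
          break;
        case ARG_THREAD_LIMIT:
          out->thread_limit = value;
          break;
        default:
          /* Unknown or device-specific argument: ignored in this sketch.  */
          break;
        }
    }
}
```

Accepting a NULL array matters because this diff passes NULL as `args` from `GOMP_target`, `GOMP_target_update_ext`, and `GOMP_target_enter_exit_data`.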


@ -111,6 +111,8 @@ FC = @FC@
FCFLAGS = @FCFLAGS@
FGREP = @FGREP@
GREP = @GREP@
HSA_RUNTIME_INCLUDE = @HSA_RUNTIME_INCLUDE@
HSA_RUNTIME_LIB = @HSA_RUNTIME_LIB@
INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
@ -155,6 +157,10 @@ PACKAGE_URL = @PACKAGE_URL@
PACKAGE_VERSION = @PACKAGE_VERSION@
PATH_SEPARATOR = @PATH_SEPARATOR@
PERL = @PERL@
PLUGIN_HSA = @PLUGIN_HSA@
PLUGIN_HSA_CPPFLAGS = @PLUGIN_HSA_CPPFLAGS@
PLUGIN_HSA_LDFLAGS = @PLUGIN_HSA_LDFLAGS@
PLUGIN_HSA_LIBS = @PLUGIN_HSA_LIBS@
PLUGIN_NVPTX = @PLUGIN_NVPTX@
PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@


@ -1,3 +1,8 @@
2016-01-19 Martin Jambor <mjambor@suse.cz>
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New
unused parameter.
(GOMP_OFFLOAD_run): Likewise.
2015-12-14 Ilya Verbin <ilya.verbin@intel.com>
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.


@ -528,7 +528,7 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst_ptr, const void *src_ptr,
extern "C" void
GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
void *async_data)
void **, void *async_data)
{
TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p, async_data = %p)", device,
tgt_fn, tgt_vars, async_data);
@ -544,7 +544,7 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
}
extern "C" void
GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars, void **)
{
TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, tgt_vars);