omp-low.c (lower_omp_ordered): Add argument to GOMP_SIMD_ORDERED_* internal calls - 0 if...

gcc/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>

	* omp-low.c (lower_omp_ordered): Add argument to GOMP_SIMD_ORDERED_*
	internal calls - 0 if ordered simd and 1 for ordered threads simd.
	* tree-vectorizer.c (adjust_simduid_builtins): If GOMP_SIMD_ORDERED_*
	argument is 1, replace it with GOMP_ordered_* call instead of removing
	it.
gcc/c/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>

	* c-typeck.c (c_finish_omp_clauses): Don't mark
	GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.
gcc/cp/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>

	* semantics.c (finish_omp_clauses): Don't mark
	GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.
libgomp/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>
	    Aldy Hernandez  <aldyh@redhat.com>
	    Ilya Verbin  <ilya.verbin@intel.com>

	* ordered.c (gomp_doacross_init, GOMP_doacross_post,
	GOMP_doacross_wait, gomp_doacross_ull_init, GOMP_doacross_ull_post,
	GOMP_doacross_ull_wait): For GFS_GUIDED don't divide number of
	iterators or IV by chunk size.
	* parallel.c (gomp_resolve_num_threads): Don't assume that
	if thr->ts.team is non-NULL, then pool must be non-NULL.
	* libgomp-plugin.h (GOMP_PLUGIN_target_task_completion): Declare.
	* libgomp.map (GOMP_PLUGIN_1.1): New symbol version, export
	GOMP_PLUGIN_target_task_completion.
	* Makefile.am (libgomp_la_SOURCES): Add priority_queue.c.
	* Makefile.in: Regenerate.
	* libgomp.h: Shuffle prototypes and forward definitions around so
	priority queues can be defined.
	(enum gomp_task_kind): Add GOMP_TASK_ASYNC_RUNNING.
	(enum gomp_target_task_state): New enum.
	(struct gomp_target_task): Add state, tgt, task and team fields.
	(gomp_create_target_task): Change return type to bool, add
	state argument.
	(gomp_target_task_fn): Change return type to bool.
	(struct gomp_device_descr): Add async_run_func.
	(struct gomp_task): Remove children, next_child, prev_child,
	next_queue, prev_queue, next_taskgroup, prev_taskgroup.
	Add pnode field.
	(struct gomp_taskgroup): Remove children.
	Add taskgroup_queue.
	(struct gomp_team): Change task_queue type to a priority queue.
	(splay_compare): Define inline.
	(priority_queue_offset): New.
	(priority_node_to_task): New.
	(task_to_priority_node): New.
	* oacc-mem.c: Do not include splay-tree.h.
	* priority_queue.c: New file.
	* priority_queue.h: New file.
	* splay-tree.c: Do not include splay-tree.h.
	(splay_tree_foreach_internal): New.
	(splay_tree_foreach): New.
	* splay-tree.h: Become re-entrant if splay_tree_prefix is defined.
	(splay_tree_callback): Define typedef.
	* target.c (splay_compare): Move to libgomp.h.
	(GOMP_target): Don't adjust *thr in any way around running offloaded
	task.
	(GOMP_target_ext): Likewise.  Handle target nowait.
	(GOMP_target_update_ext, GOMP_target_enter_exit_data): Check
	return value from gomp_create_target_task, if false, fallthrough
	as if no dependencies exist.
	(gomp_target_task_fn): Change return type to bool, return true
	if the task should have another part scheduled later.  Handle
	target nowait.
	(gomp_load_plugin_for_device): Initialize async_run.
	* task.c (gomp_init_task): Initialize children_queue.
	(gomp_clear_parent_in_list): New.
	(gomp_clear_parent_in_tree): New.
	(gomp_clear_parent): Handle priorities.
	(GOMP_task): Likewise.
	(priority_queue_move_task_first,
	gomp_target_task_completion, GOMP_PLUGIN_target_task_completion):
	New functions.
	(gomp_create_target_task): Use priority queues.  Change return type
	to bool, add state argument, return false if for async
	{{enter,exit} data,update} constructs no dependencies need to be
	waited for, handle target nowait.  Set task->fn to NULL instead of
	gomp_target_task_fn.
	(verify_children_queue): Remove.
	(priority_list_upgrade_task): New.
	(priority_queue_upgrade_task): New.
	(verify_task_queue): Remove.
	(priority_list_downgrade_task): New.
	(priority_queue_downgrade_task): New.
	(gomp_task_run_pre): Use priority queues.
	Abstract code out to priority_queue_downgrade_task.
	(gomp_task_run_post_handle_dependers): Use priority queues.
	(gomp_task_run_post_remove_parent): Likewise.
	(gomp_task_run_post_remove_taskgroup): Likewise.
	(gomp_barrier_handle_tasks): Likewise.  Handle target nowait target
	tasks specially.
	(GOMP_taskwait): Likewise.
	(gomp_task_maybe_wait_for_dependencies): Likewise.  Abstract code to
	priority_queue_upgrade_task.
	(GOMP_taskgroup_start): Use priority queues.
	(GOMP_taskgroup_end): Likewise.  Handle target nowait target tasks
	specially.  If taskgroup is NULL, and thr->ts.level is 0, act as a
	barrier.
	* taskloop.c (GOMP_taskloop): Handle priorities.
	* team.c (gomp_new_team): Call priority_queue_init.
	(free_team): Call priority_queue_free.
	(gomp_free_thread): Call gomp_team_end if thr->ts.team is artificial
	team created for target nowait in implicit parallel region.
	(gomp_team_start): For nested check, test thr->ts.level instead of
	thr->ts.team != NULL.
	* testsuite/libgomp.c/doacross-3.c: New test.
	* testsuite/libgomp.c/ordered-5.c: New test.
	* testsuite/libgomp.c/priority.c: New test.
	* testsuite/libgomp.c/target-31.c: New test.
	* testsuite/libgomp.c/target-32.c: New test.
	* testsuite/libgomp.c/target-33.c: New test.
	* testsuite/libgomp.c/target-34.c: New test.
liboffloadmic/
2015-11-14  Ilya Verbin  <ilya.verbin@intel.com>

	* runtime/offload_host.cpp (task_completion_callback): New
	variable.
	(offload_proxy_task_completed_ooo): Call task_completion_callback.
	(__offload_register_task_callback): New function.
	* runtime/offload_host.h (__offload_register_task_callback): New
	declaration.
	* plugin/libgomp-plugin-intelmic.cpp (offload): Add async_data
	argument, handle async offloading.
	(register_main_image): Call __offload_register_task_callback.
	(GOMP_OFFLOAD_init_device, get_target_table, GOMP_OFFLOAD_alloc,
	GOMP_OFFLOAD_free, GOMP_OFFLOAD_host2dev, GOMP_OFFLOAD_dev2host,
	GOMP_OFFLOAD_dev2dev): Adjust offload callers.
	(GOMP_OFFLOAD_async_run): New function.
	(GOMP_OFFLOAD_run): Implement using GOMP_OFFLOAD_async_run.

From-SVN: r230381
commit e460634820 (parent 67f0527a22)
Jakub Jelinek, 2015-11-14 19:42:13 +01:00
35 changed files with 2893 additions and 727 deletions

gcc/c/c-typeck.c

@ -12918,7 +12918,10 @@ c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd)
omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
remove = true;
}
else if (!c_mark_addressable (t))
else if ((OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
|| (OMP_CLAUSE_MAP_KIND (c)
!= GOMP_MAP_FIRSTPRIVATE_POINTER))
&& !c_mark_addressable (t))
remove = true;
else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
&& (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER

gcc/cp/semantics.c

@ -6581,6 +6581,9 @@ finish_omp_clauses (tree clauses, bool allow_fields, bool declare_simd)
}
else if (!processing_template_decl
&& TREE_CODE (TREE_TYPE (t)) != REFERENCE_TYPE
&& (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
|| (OMP_CLAUSE_MAP_KIND (c)
!= GOMP_MAP_FIRSTPRIVATE_POINTER))
&& !cxx_mark_addressable (t))
remove = true;
else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP

gcc/omp-low.c

@ -13951,8 +13951,10 @@ lower_omp_ordered (gimple_stmt_iterator *gsi_p, omp_context *ctx)
gomp_ordered *ord_stmt = as_a <gomp_ordered *> (stmt);
gcall *x;
gbind *bind;
bool simd
= find_omp_clause (gimple_omp_ordered_clauses (ord_stmt), OMP_CLAUSE_SIMD);
bool simd = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
OMP_CLAUSE_SIMD);
bool threads = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
OMP_CLAUSE_THREADS);
if (find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
OMP_CLAUSE_DEPEND))
@ -13975,7 +13977,8 @@ lower_omp_ordered (gimple_stmt_iterator *gsi_p, omp_context *ctx)
if (simd)
{
x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_START, 0);
x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_START, 1,
build_int_cst (NULL_TREE, threads));
cfun->has_simduid_loops = true;
}
else
@ -13989,7 +13992,8 @@ lower_omp_ordered (gimple_stmt_iterator *gsi_p, omp_context *ctx)
gimple_omp_set_body (stmt, NULL);
if (simd)
x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_END, 0);
x = gimple_build_call_internal (IFN_GOMP_SIMD_ORDERED_END, 1,
build_int_cst (NULL_TREE, threads));
else
x = gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_ORDERED_END),
0);

gcc/tree-vectorizer.c

@ -177,6 +177,21 @@ adjust_simduid_builtins (hash_table<simduid_to_vf> *htab)
break;
case IFN_GOMP_SIMD_ORDERED_START:
case IFN_GOMP_SIMD_ORDERED_END:
if (integer_onep (gimple_call_arg (stmt, 0)))
{
enum built_in_function bcode
= (ifn == IFN_GOMP_SIMD_ORDERED_START
? BUILT_IN_GOMP_ORDERED_START
: BUILT_IN_GOMP_ORDERED_END);
gimple *g
= gimple_build_call (builtin_decl_explicit (bcode), 0);
tree vdef = gimple_vdef (stmt);
gimple_set_vdef (g, vdef);
SSA_NAME_DEF_STMT (vdef) = g;
gimple_set_vuse (g, gimple_vuse (stmt));
gsi_replace (&i, g, true);
continue;
}
gsi_remove (&i, true);
unlink_stmt_vdef (stmt);
continue;

libgomp/Makefile.am

@ -63,7 +63,7 @@ libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
time.c fortran.c affinity.c target.c splay-tree.c libgomp-plugin.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
oacc-plugin.c oacc-cuda.c
oacc-plugin.c oacc-cuda.c priority_queue.c
include $(top_srcdir)/plugin/Makefrag.am

libgomp/Makefile.in

@ -168,7 +168,7 @@ am_libgomp_la_OBJECTS = alloc.lo barrier.lo critical.lo env.lo \
fortran.lo affinity.lo target.lo splay-tree.lo \
libgomp-plugin.lo oacc-parallel.lo oacc-host.lo oacc-init.lo \
oacc-mem.lo oacc-async.lo oacc-plugin.lo oacc-cuda.lo \
$(am__objects_1)
priority_queue.lo $(am__objects_1)
libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
DEFAULT_INCLUDES = -I.@am__isrc@
depcomp = $(SHELL) $(top_srcdir)/../depcomp
@ -415,7 +415,7 @@ libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c error.c iter.c \
bar.c ptrlock.c time.c fortran.c affinity.c target.c \
splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
$(am__append_2)
priority_queue.c $(am__append_2)
# Nvidia PTX OpenACC plugin.
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@ -589,6 +589,7 @@ distclean-compile:
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/proc.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ptrlock.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sections.Plo@am__quote@

libgomp/libgomp-plugin.h

@ -63,6 +63,7 @@ struct addr_pair
extern void *GOMP_PLUGIN_malloc (size_t) __attribute__ ((malloc));
extern void *GOMP_PLUGIN_malloc_cleared (size_t) __attribute__ ((malloc));
extern void *GOMP_PLUGIN_realloc (void *, size_t);
void GOMP_PLUGIN_target_task_completion (void *);
extern void GOMP_PLUGIN_debug (int, const char *, ...)
__attribute__ ((format (printf, 2, 3)));

libgomp/libgomp.h

@ -50,6 +50,22 @@
#include <stdlib.h>
#include <stdarg.h>
/* Needed for memset in priority_queue.c. */
#if _LIBGOMP_CHECKING_
# ifdef STRING_WITH_STRINGS
# include <string.h>
# include <strings.h>
# else
# ifdef HAVE_STRING_H
# include <string.h>
# else
# ifdef HAVE_STRINGS_H
# include <strings.h>
# endif
# endif
# endif
#endif
#ifdef HAVE_ATTRIBUTE_VISIBILITY
# pragma GCC visibility push(hidden)
#endif
@ -65,6 +81,44 @@ enum memmodel
MEMMODEL_SEQ_CST = 5
};
/* alloc.c */
extern void *gomp_malloc (size_t) __attribute__((malloc));
extern void *gomp_malloc_cleared (size_t) __attribute__((malloc));
extern void *gomp_realloc (void *, size_t);
/* Avoid conflicting prototypes of alloca() in system headers by using
GCC's builtin alloca(). */
#define gomp_alloca(x) __builtin_alloca(x)
/* error.c */
extern void gomp_vdebug (int, const char *, va_list);
extern void gomp_debug (int, const char *, ...)
__attribute__ ((format (printf, 2, 3)));
#define gomp_vdebug(KIND, FMT, VALIST) \
do { \
if (__builtin_expect (gomp_debug_var, 0)) \
(gomp_vdebug) ((KIND), (FMT), (VALIST)); \
} while (0)
#define gomp_debug(KIND, ...) \
do { \
if (__builtin_expect (gomp_debug_var, 0)) \
(gomp_debug) ((KIND), __VA_ARGS__); \
} while (0)
extern void gomp_verror (const char *, va_list);
extern void gomp_error (const char *, ...)
__attribute__ ((format (printf, 1, 2)));
extern void gomp_vfatal (const char *, va_list)
__attribute__ ((noreturn));
extern void gomp_fatal (const char *, ...)
__attribute__ ((noreturn, format (printf, 1, 2)));
struct gomp_task;
struct gomp_taskgroup;
struct htab;
#include "priority_queue.h"
#include "sem.h"
#include "mutex.h"
#include "bar.h"
@ -298,6 +352,7 @@ extern gomp_mutex_t gomp_managed_threads_lock;
#endif
extern unsigned long gomp_max_active_levels_var;
extern bool gomp_cancel_var;
extern int gomp_max_task_priority_var;
extern unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
extern unsigned long gomp_available_cpus, gomp_managed_threads;
extern unsigned long *gomp_nthreads_var_list, gomp_nthreads_var_list_len;
@ -318,13 +373,14 @@ enum gomp_task_kind
/* Task created by GOMP_task and waiting to be run. */
GOMP_TASK_WAITING,
/* Task currently executing or scheduled and about to execute. */
GOMP_TASK_TIED
GOMP_TASK_TIED,
/* Used for target tasks that have vars mapped and async run started,
but not yet completed. Once that completes, they will be readded
into the queues as GOMP_TASK_WAITING in order to perform the var
unmapping. */
GOMP_TASK_ASYNC_RUNNING
};
struct gomp_task;
struct gomp_taskgroup;
struct htab;
struct gomp_task_depend_entry
{
/* Address of dependency. */
@ -352,8 +408,8 @@ struct gomp_taskwait
{
bool in_taskwait;
bool in_depend_wait;
/* Number of tasks we are waiting for. */
size_t n_depend;
struct gomp_task *last_parent_depends_on;
gomp_sem_t taskwait_sem;
};
@ -361,26 +417,10 @@ struct gomp_taskwait
struct gomp_task
{
/* Parent circular list. See children description below. */
/* Parent of this task. */
struct gomp_task *parent;
/* Circular list representing the children of this task.
In this list we first have parent_depends_on ready to run tasks,
then !parent_depends_on ready to run tasks, and finally already
running tasks. */
struct gomp_task *children;
struct gomp_task *next_child;
struct gomp_task *prev_child;
/* Circular task_queue in `struct gomp_team'.
GOMP_TASK_WAITING tasks come before GOMP_TASK_TIED tasks. */
struct gomp_task *next_queue;
struct gomp_task *prev_queue;
/* Circular queue in gomp_taskgroup->children.
GOMP_TASK_WAITING tasks come before GOMP_TASK_TIED tasks. */
struct gomp_task *next_taskgroup;
struct gomp_task *prev_taskgroup;
/* Children of this task. */
struct priority_queue children_queue;
/* Taskgroup this task belongs in. */
struct gomp_taskgroup *taskgroup;
/* Tasks that depend on this task. */
@ -389,8 +429,19 @@ struct gomp_task
struct gomp_taskwait *taskwait;
/* Number of items in DEPEND. */
size_t depend_count;
/* Number of tasks in the DEPENDERS field above. */
/* Number of tasks this task depends on. Once this counter reaches
0, we have no unsatisfied dependencies, and this task can be put
into the various queues to be scheduled. */
size_t num_dependees;
/* Priority of this task. */
int priority;
/* The priority node for this task in each of the different queues.
We put this here to avoid allocating space for each priority
node. Then we play offsetof() games to convert between pnode[]
entries and the gomp_task in which they reside. */
struct priority_node pnode[3];
struct gomp_task_icv icv;
void (*fn) (void *);
void *fn_data;
@ -407,21 +458,32 @@ struct gomp_task
struct gomp_task_depend_entry depend[];
};
/* This structure describes a single #pragma omp taskgroup. */
struct gomp_taskgroup
{
struct gomp_taskgroup *prev;
/* Circular list of tasks that belong in this taskgroup.
Tasks are chained by next/prev_taskgroup within gomp_task, and
are sorted by GOMP_TASK_WAITING tasks, and then GOMP_TASK_TIED
tasks. */
struct gomp_task *children;
/* Queue of tasks that belong in this taskgroup. */
struct priority_queue taskgroup_queue;
bool in_taskgroup_wait;
bool cancelled;
gomp_sem_t taskgroup_sem;
size_t num_children;
};
/* Various state of OpenMP async offloading tasks. */
enum gomp_target_task_state
{
GOMP_TARGET_TASK_DATA,
GOMP_TARGET_TASK_BEFORE_MAP,
GOMP_TARGET_TASK_FALLBACK,
GOMP_TARGET_TASK_READY_TO_RUN,
GOMP_TARGET_TASK_RUNNING,
GOMP_TARGET_TASK_FINISHED
};
/* This structure describes a target task. */
struct gomp_target_task
{
struct gomp_device_descr *devicep;
@ -430,6 +492,10 @@ struct gomp_target_task
size_t *sizes;
unsigned short *kinds;
unsigned int flags;
enum gomp_target_task_state state;
struct target_mem_desc *tgt;
struct gomp_task *task;
struct gomp_team *team;
void *hostaddrs[];
};
@ -495,9 +561,8 @@ struct gomp_team
struct gomp_work_share work_shares[8];
gomp_mutex_t task_lock;
/* Scheduled tasks. Chain fields are next/prev_queue within a
gomp_task. */
struct gomp_task *task_queue;
/* Scheduled tasks. */
struct priority_queue task_queue;
/* Number of all GOMP_TASK_{WAITING,TIED} tasks in the team. */
unsigned int task_count;
/* Number of GOMP_TASK_WAITING tasks currently waiting to be scheduled. */
@ -627,39 +692,6 @@ extern bool gomp_affinity_init_level (int, unsigned long, bool);
extern void gomp_affinity_print_place (void *);
extern void gomp_get_place_proc_ids_8 (int, int64_t *);
/* alloc.c */
extern void *gomp_malloc (size_t) __attribute__((malloc));
extern void *gomp_malloc_cleared (size_t) __attribute__((malloc));
extern void *gomp_realloc (void *, size_t);
/* Avoid conflicting prototypes of alloca() in system headers by using
GCC's builtin alloca(). */
#define gomp_alloca(x) __builtin_alloca(x)
/* error.c */
extern void gomp_vdebug (int, const char *, va_list);
extern void gomp_debug (int, const char *, ...)
__attribute__ ((format (printf, 2, 3)));
#define gomp_vdebug(KIND, FMT, VALIST) \
do { \
if (__builtin_expect (gomp_debug_var, 0)) \
(gomp_vdebug) ((KIND), (FMT), (VALIST)); \
} while (0)
#define gomp_debug(KIND, ...) \
do { \
if (__builtin_expect (gomp_debug_var, 0)) \
(gomp_debug) ((KIND), __VA_ARGS__); \
} while (0)
extern void gomp_verror (const char *, va_list);
extern void gomp_error (const char *, ...)
__attribute__ ((format (printf, 1, 2)));
extern void gomp_vfatal (const char *, va_list)
__attribute__ ((noreturn));
extern void gomp_fatal (const char *, ...)
__attribute__ ((noreturn, format (printf, 1, 2)));
/* iter.c */
extern int gomp_iter_static_next (long *, long *);
@ -715,10 +747,10 @@ extern void gomp_init_task (struct gomp_task *, struct gomp_task *,
extern void gomp_end_task (void);
extern void gomp_barrier_handle_tasks (gomp_barrier_state_t);
extern void gomp_task_maybe_wait_for_dependencies (void **);
extern void gomp_create_target_task (struct gomp_device_descr *,
extern bool gomp_create_target_task (struct gomp_device_descr *,
void (*) (void *), size_t, void **,
size_t *, unsigned short *, unsigned int,
void **);
void **, enum gomp_target_task_state);
static void inline
gomp_finish_task (struct gomp_task *task)
@ -739,8 +771,9 @@ extern void gomp_free_thread (void *);
extern void gomp_init_targets_once (void);
extern int gomp_get_num_devices (void);
extern void gomp_target_task_fn (void *);
extern bool gomp_target_task_fn (void *);
/* Splay tree definitions. */
typedef struct splay_tree_node_s *splay_tree_node;
typedef struct splay_tree_s *splay_tree;
typedef struct splay_tree_key_s *splay_tree_key;
@ -800,6 +833,21 @@ struct splay_tree_key_s {
uintptr_t async_refcount;
};
/* The comparison function. */
static inline int
splay_compare (splay_tree_key x, splay_tree_key y)
{
if (x->host_start == x->host_end
&& y->host_start == y->host_end)
return 0;
if (x->host_end <= y->host_start)
return -1;
if (x->host_start >= y->host_end)
return 1;
return 0;
}
#include "splay-tree.h"
typedef struct acc_dispatch_t
@ -877,6 +925,7 @@ struct gomp_device_descr
void *(*host2dev_func) (int, void *, const void *, size_t);
void *(*dev2dev_func) (int, void *, const void *, size_t);
void (*run_func) (int, void *, void *);
void (*async_run_func) (int, void *, void *, void *);
/* Splay tree containing information about mapped memory regions. */
struct splay_tree_s mem_map;
@ -1016,4 +1065,34 @@ extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t *) __GOMP_NOTHROW;
# define ialias_call(fn) fn
#endif
/* Helper function for priority_node_to_task() and
task_to_priority_node().
Return the offset from a task to its priority_node entry. The
priority_node entry has a type of TYPE. */
static inline size_t
priority_queue_offset (enum priority_queue_type type)
{
return offsetof (struct gomp_task, pnode[(int) type]);
}
/* Return the task associated with a priority NODE of type TYPE. */
static inline struct gomp_task *
priority_node_to_task (enum priority_queue_type type,
struct priority_node *node)
{
return (struct gomp_task *) ((char *) node - priority_queue_offset (type));
}
/* Return the priority node of type TYPE for a given TASK. */
static inline struct priority_node *
task_to_priority_node (enum priority_queue_type type,
struct gomp_task *task)
{
return (struct priority_node *) ((char *) task
+ priority_queue_offset (type));
}
#endif /* LIBGOMP_H */

libgomp/libgomp.map

@ -407,3 +407,8 @@ GOMP_PLUGIN_1.0 {
GOMP_PLUGIN_async_unmap_vars;
GOMP_PLUGIN_acc_thread;
};
GOMP_PLUGIN_1.1 {
global:
GOMP_PLUGIN_target_task_completion;
} GOMP_PLUGIN_1.0;

libgomp/oacc-mem.c

@ -31,7 +31,6 @@
#include "libgomp.h"
#include "gomp-constants.h"
#include "oacc-int.h"
#include "splay-tree.h"
#include <stdint.h>
#include <assert.h>

libgomp/ordered.c

@ -297,6 +297,8 @@ gomp_doacross_init (unsigned ncounts, long *counts, long chunk_size)
if (ws->sched == GFS_STATIC)
num_ents = team->nthreads;
else if (ws->sched == GFS_GUIDED)
num_ents = counts[0];
else
num_ents = (counts[0] - 1) / chunk_size + 1;
if (num_bits <= MAX_COLLAPSED_BITS)
@ -366,6 +368,8 @@ GOMP_doacross_post (long *counts)
if (__builtin_expect (ws->sched == GFS_STATIC, 1))
ent = thr->ts.team_id;
else if (ws->sched == GFS_GUIDED)
ent = counts[0];
else
ent = counts[0] / doacross->chunk_size;
unsigned long *array = (unsigned long *) (doacross->array
@ -426,6 +430,8 @@ GOMP_doacross_wait (long first, ...)
else
ent = first / ws->chunk_size % thr->ts.team->nthreads;
}
else if (ws->sched == GFS_GUIDED)
ent = first;
else
ent = first / doacross->chunk_size;
unsigned long *array = (unsigned long *) (doacross->array
@ -520,6 +526,8 @@ gomp_doacross_ull_init (unsigned ncounts, gomp_ull *counts, gomp_ull chunk_size)
if (ws->sched == GFS_STATIC)
num_ents = team->nthreads;
else if (ws->sched == GFS_GUIDED)
num_ents = counts[0];
else
num_ents = (counts[0] - 1) / chunk_size + 1;
if (num_bits <= MAX_COLLAPSED_BITS)
@ -595,6 +603,8 @@ GOMP_doacross_ull_post (gomp_ull *counts)
if (__builtin_expect (ws->sched == GFS_STATIC, 1))
ent = thr->ts.team_id;
else if (ws->sched == GFS_GUIDED)
ent = counts[0];
else
ent = counts[0] / doacross->chunk_size_ull;
@ -676,6 +686,8 @@ GOMP_doacross_ull_wait (gomp_ull first, ...)
else
ent = first / ws->chunk_size_ull % thr->ts.team->nthreads;
}
else if (ws->sched == GFS_GUIDED)
ent = first;
else
ent = first / doacross->chunk_size_ull;

libgomp/parallel.c

@ -85,7 +85,7 @@ gomp_resolve_num_threads (unsigned specified, unsigned count)
nested parallel, so there is just one thread in the
contention group as well, no need to handle it atomically. */
pool = thr->thread_pool;
if (thr->ts.team == NULL)
if (thr->ts.team == NULL || pool == NULL)
{
num_threads = max_num_threads;
if (num_threads > icv->thread_limit_var)

libgomp/priority_queue.c (new file, 300 lines)

@ -0,0 +1,300 @@
/* Copyright (C) 2015 Free Software Foundation, Inc.
Contributed by Aldy Hernandez <aldyh@redhat.com>.
This file is part of the GNU Offloading and Multi Processing Library
(libgomp).
Libgomp is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
Under Section 7 of GPL version 3, you are granted additional
permissions described in the GCC Runtime Library Exception, version
3.1, as published by the Free Software Foundation.
You should have received a copy of the GNU General Public License and
a copy of the GCC Runtime Library Exception along with this program;
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
/* Priority queue implementation of GOMP tasks. */
#include "libgomp.h"
#if _LIBGOMP_CHECKING_
#include <stdio.h>
/* Sanity check to verify whether a TASK is in LIST. Return TRUE if
found, FALSE otherwise.
TYPE is the type of priority queue this task resides in. */
static inline bool
priority_queue_task_in_list_p (enum priority_queue_type type,
struct priority_list *list,
struct gomp_task *task)
{
struct priority_node *p = list->tasks;
do
{
if (priority_node_to_task (type, p) == task)
return true;
p = p->next;
}
while (p != list->tasks);
return false;
}
/* Tree version of priority_queue_task_in_list_p. */
static inline bool
priority_queue_task_in_tree_p (enum priority_queue_type type,
struct priority_queue *head,
struct gomp_task *task)
{
struct priority_list *list
= priority_queue_lookup_priority (head, task->priority);
if (!list)
return false;
return priority_queue_task_in_list_p (type, list, task);
}
/* Generic version of priority_queue_task_in_list_p that works for
trees or lists. */
bool
priority_queue_task_in_queue_p (enum priority_queue_type type,
struct priority_queue *head,
struct gomp_task *task)
{
if (priority_queue_empty_p (head, MEMMODEL_RELAXED))
return false;
if (priority_queue_multi_p (head))
return priority_queue_task_in_tree_p (type, head, task);
else
return priority_queue_task_in_list_p (type, &head->l, task);
}
/* Sanity check LIST to make sure the tasks therein are in the right
order. LIST is a priority list of type TYPE.
The expected order is that GOMP_TASK_WAITING tasks come before
GOMP_TASK_TIED/GOMP_TASK_ASYNC_RUNNING ones.
If CHECK_DEPS is TRUE, we also check that parent_depends_on WAITING
tasks come before !parent_depends_on WAITING tasks. This is only
applicable to the children queue, and the caller is expected to
ensure that we are verifying the children queue. */
static void
priority_list_verify (enum priority_queue_type type,
struct priority_list *list, bool check_deps)
{
bool seen_tied = false;
bool seen_plain_waiting = false;
struct priority_node *p = list->tasks;
while (1)
{
struct gomp_task *t = priority_node_to_task (type, p);
if (seen_tied && t->kind == GOMP_TASK_WAITING)
gomp_fatal ("priority_queue_verify: WAITING task after TIED");
if (t->kind >= GOMP_TASK_TIED)
seen_tied = true;
else if (check_deps && t->kind == GOMP_TASK_WAITING)
{
if (t->parent_depends_on)
{
if (seen_plain_waiting)
gomp_fatal ("priority_queue_verify: "
"parent_depends_on after !parent_depends_on");
}
else
seen_plain_waiting = true;
}
p = p->next;
if (p == list->tasks)
break;
}
}
/* Callback type for priority_tree_verify_callback. */
struct cbtype
{
enum priority_queue_type type;
bool check_deps;
};
/* Verify every task in NODE.
Callback for splay_tree_foreach. */
static void
priority_tree_verify_callback (prio_splay_tree_key key, void *data)
{
struct cbtype *cb = (struct cbtype *) data;
priority_list_verify (cb->type, &key->l, cb->check_deps);
}
/* Generic version of priority_list_verify.
Sanity check HEAD to make sure the tasks therein are in the right
order. The priority_queue holds tasks of type TYPE.
If CHECK_DEPS is TRUE, we also check that parent_depends_on WAITING
tasks come before !parent_depends_on WAITING tasks. This is only
applicable to the children queue, and the caller is expected to
ensure that we are verifying the children queue. */
void
priority_queue_verify (enum priority_queue_type type,
struct priority_queue *head, bool check_deps)
{
if (priority_queue_empty_p (head, MEMMODEL_RELAXED))
return;
if (priority_queue_multi_p (head))
{
struct cbtype cb = { type, check_deps };
prio_splay_tree_foreach (&head->t,
priority_tree_verify_callback, &cb);
}
else
priority_list_verify (type, &head->l, check_deps);
}
#endif /* _LIBGOMP_CHECKING_ */
/* Remove NODE from priority queue HEAD, wherever it may be inside the
tree. HEAD contains tasks of type TYPE. */
void
priority_tree_remove (enum priority_queue_type type,
struct priority_queue *head,
struct priority_node *node)
{
/* ?? The only reason this function is not inlined is because we
need to find the priority within gomp_task (which has not been
completely defined in the header file). If the lack of inlining
is a concern, we could pass the priority number as a
parameter, or we could move this to libgomp.h. */
int priority = priority_node_to_task (type, node)->priority;
/* ?? We could avoid this lookup by keeping a pointer to the key in
the priority_node. */
struct priority_list *list
= priority_queue_lookup_priority (head, priority);
#if _LIBGOMP_CHECKING_
if (!list)
gomp_fatal ("Unable to find priority %d", priority);
#endif
/* If NODE was the last in its priority, clean up the priority. */
if (priority_list_remove (list, node, MEMMODEL_RELAXED))
{
prio_splay_tree_remove (&head->t, (prio_splay_tree_key) list);
list->tasks = NULL;
#if _LIBGOMP_CHECKING_
memset (list, 0xaf, sizeof (*list));
#endif
free (list);
}
}
/* Return the highest priority WAITING task in a splay tree NODE. If
there are no WAITING tasks available, return NULL.
NODE is a priority list containing tasks of type TYPE.
The rightmost node in the tree contains the highest priority.
Recurse down to find such a node. If the task at that max node is
not WAITING, bubble back up and look at the remaining tasks
in-order. */
static struct gomp_task *
priority_tree_next_task_1 (enum priority_queue_type type,
prio_splay_tree_node node)
{
again:
if (!node)
return NULL;
struct gomp_task *ret = priority_tree_next_task_1 (type, node->right);
if (ret)
return ret;
ret = priority_node_to_task (type, node->key.l.tasks);
if (ret->kind == GOMP_TASK_WAITING)
return ret;
node = node->left;
goto again;
}
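The reverse in-order walk above can be isolated into a self-contained sketch. The `toy_node` and `toy_next_waiting` names below are illustrative only, not libgomp's; the traversal logic mirrors priority_tree_next_task_1 (descend right first, test the current node, then fall back to the left subtree):

```c
#include <stddef.h>

/* Hypothetical node type for illustration only.  `waiting' stands in
   for kind == GOMP_TASK_WAITING.  */
struct toy_node
{
  int priority;
  int waiting;
  struct toy_node *left, *right;
};

/* Walk the tree right-to-left (highest priority first) and return the
   first node whose `waiting' flag is set, or NULL if none is.  */
static struct toy_node *
toy_next_waiting (struct toy_node *node)
{
 again:
  if (!node)
    return NULL;
  struct toy_node *ret = toy_next_waiting (node->right);
  if (ret)
    return ret;
  if (node->waiting)
    return node;
  node = node->left;
  goto again;
}
```

Note the same shape as the real function: full recursion on the right child, but a `goto` for the left child, which hand-converts one of the two recursive calls into a loop.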
/* Return the highest priority WAITING task from within Q1 and Q2,
while giving preference to tasks from Q1. Q1 is a queue containing
items of type TYPE1. Q2 is a queue containing items of type TYPE2.
Since we are mostly interested in Q1, if there are no WAITING tasks
in Q1, we don't bother checking Q2, and just return NULL.
As a special case, Q2 can be NULL, in which case, we just choose
the highest priority WAITING task in Q1. This is an optimization
to speed up looking through only one queue.
If the returned task is chosen from Q1, *Q1_CHOSEN_P is set to
TRUE, otherwise it is set to FALSE. */
struct gomp_task *
priority_tree_next_task (enum priority_queue_type type1,
struct priority_queue *q1,
enum priority_queue_type type2,
struct priority_queue *q2,
bool *q1_chosen_p)
{
struct gomp_task *t1 = priority_tree_next_task_1 (type1, q1->t.root);
if (!t1
/* Special optimization when only searching through one queue. */
|| !q2)
{
*q1_chosen_p = true;
return t1;
}
struct gomp_task *t2 = priority_tree_next_task_1 (type2, q2->t.root);
if (!t2 || t1->priority > t2->priority)
{
*q1_chosen_p = true;
return t1;
}
if (t2->priority > t1->priority)
{
*q1_chosen_p = false;
return t2;
}
/* If we get here, the priorities are the same, so we must look at
parent_depends_on to make our decision. */
#if _LIBGOMP_CHECKING_
if (t1 != t2)
gomp_fatal ("priority_tree_next_task: t1 != t2");
#endif
if (t2->parent_depends_on && !t1->parent_depends_on)
{
*q1_chosen_p = false;
return t2;
}
*q1_chosen_p = true;
return t1;
}
/* Priority splay trees comparison function. */
static inline int
prio_splay_compare (prio_splay_tree_key x, prio_splay_tree_key y)
{
if (x->l.priority == y->l.priority)
return 0;
return x->l.priority < y->l.priority ? -1 : 1;
}
/* Define another splay tree instantiation, for priority_list's. */
#define splay_tree_prefix prio
#define splay_tree_c
#include "splay-tree.h"

libgomp/priority_queue.h Normal file
@@ -0,0 +1,485 @@
/* Copyright (C) 2015 Free Software Foundation, Inc.
Contributed by Aldy Hernandez <aldyh@redhat.com>.
This file is part of the GNU Offloading and Multi Processing Library
(libgomp).
Libgomp is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
Under Section 7 of GPL version 3, you are granted additional
permissions described in the GCC Runtime Library Exception, version
3.1, as published by the Free Software Foundation.
You should have received a copy of the GNU General Public License and
a copy of the GCC Runtime Library Exception along with this program;
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
/* Header file for a priority queue of GOMP tasks. */
/* ?? Perhaps all the priority_tree_* functions are complex and rare
enough to go out-of-line and be moved to priority_queue.c. ?? */
#ifndef _PRIORITY_QUEUE_H_
#define _PRIORITY_QUEUE_H_
/* One task. */
struct priority_node
{
/* Next and previous chains in a circular doubly linked list for
tasks within this task's priority. */
struct priority_node *next, *prev;
};
/* All tasks within the same priority. */
struct priority_list
{
/* Priority of the tasks in this set. */
int priority;
/* Tasks. */
struct priority_node *tasks;
/* This points to the last of the higher priority WAITING tasks.
Remember that for the children queue, we have:
parent_depends_on WAITING tasks.
!parent_depends_on WAITING tasks.
TIED tasks.
This is a pointer to the last of the parent_depends_on WAITING
tasks, which are, essentially, higher priority items within their
priority. */
struct priority_node *last_parent_depends_on;
};
/* Another splay tree instantiation, for priority_list's. */
typedef struct prio_splay_tree_node_s *prio_splay_tree_node;
typedef struct prio_splay_tree_s *prio_splay_tree;
typedef struct prio_splay_tree_key_s *prio_splay_tree_key;
struct prio_splay_tree_key_s {
/* This structure must only contain a priority_list, as we cast
prio_splay_tree_key to priority_list throughout. */
struct priority_list l;
};
#define splay_tree_prefix prio
#include "splay-tree.h"
/* The entry point into a priority queue of tasks.
There are two alternate implementations with which to store tasks:
as a balanced tree of sorts, or as a simple list of tasks. If
there are only priority-0 items (ROOT is NULL), we use the simple
list, otherwise (ROOT is non-NULL) we use the tree. */
struct priority_queue
{
/* If t.root != NULL, this is a splay tree of priority_lists to hold
all tasks. This is only used if multiple priorities are in play,
otherwise we use the priority_list `l' below to hold all
(priority-0) tasks. */
struct prio_splay_tree_s t;
/* If T.ROOT above is NULL, only priority-0 items exist, so keep them
in a simple list. */
struct priority_list l;
};
enum priority_insert_type {
/* Insert at the beginning of a priority list. */
PRIORITY_INSERT_BEGIN,
/* Insert at the end of a priority list. */
PRIORITY_INSERT_END
};
/* Used to determine which queue a given priority node belongs in.
See pnode field of gomp_task. */
enum priority_queue_type
{
PQ_TEAM, /* Node belongs in gomp_team's task_queue. */
PQ_CHILDREN, /* Node belongs in parent's children_queue. */
PQ_TASKGROUP, /* Node belongs in taskgroup->taskgroup_queue. */
PQ_IGNORED = 999
};
/* Priority queue implementation prototypes. */
extern bool priority_queue_task_in_queue_p (enum priority_queue_type,
struct priority_queue *,
struct gomp_task *);
extern void priority_queue_dump (enum priority_queue_type,
struct priority_queue *);
extern void priority_queue_verify (enum priority_queue_type,
struct priority_queue *, bool);
extern void priority_tree_remove (enum priority_queue_type,
struct priority_queue *,
struct priority_node *);
extern struct gomp_task *priority_tree_next_task (enum priority_queue_type,
struct priority_queue *,
enum priority_queue_type,
struct priority_queue *,
bool *);
/* Return TRUE if there is more than one priority in HEAD. This is
used throughout to choose between the fast path (priority-0-only
items) and a world with multiple priorities. */
static inline bool
priority_queue_multi_p (struct priority_queue *head)
{
return __builtin_expect (head->t.root != NULL, 0);
}
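`__builtin_expect` here only hints at branch layout: GCC lays out the likely outcome (a single priority-0 list) as the fall-through path. A minimal sketch of the same test, with `toy_multi_p` as an illustrative name:

```c
#include <stddef.h>

/* Hint that the multi-priority (tree) case is the cold path, exactly
   as priority_queue_multi_p does.  The hint changes code layout, not
   the value returned.  */
static int
toy_multi_p (void *root)
{
  return __builtin_expect (root != NULL, 0);
}
```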
/* Initialize a priority queue. */
static inline void
priority_queue_init (struct priority_queue *head)
{
head->t.root = NULL;
/* To save a few microseconds, we don't initialize head->l.priority
to 0 here. It is implied that priority will be 0 if head->t.root
== NULL.
priority_tree_insert() will fix this when we encounter multiple
priorities. */
head->l.tasks = NULL;
head->l.last_parent_depends_on = NULL;
}
static inline void
priority_queue_free (struct priority_queue *head)
{
/* There's nothing to do, as tasks were freed as they were removed
in priority_queue_remove. */
}
/* Forward declarations. */
static inline size_t priority_queue_offset (enum priority_queue_type);
static inline struct gomp_task *priority_node_to_task
(enum priority_queue_type,
struct priority_node *);
static inline struct priority_node *task_to_priority_node
(enum priority_queue_type,
struct gomp_task *);
/* Return TRUE if priority queue HEAD is empty.
MODEL is MEMMODEL_ACQUIRE if we should use an acquire atomic to
read from the root of the queue, otherwise MEMMODEL_RELAXED if we
should use a plain load. */
static inline _Bool
priority_queue_empty_p (struct priority_queue *head, enum memmodel model)
{
/* Note: The acquire barriers on the loads here synchronize with
the write of a NULL in gomp_task_run_post_remove_parent. It is
not necessary that we synchronize with other non-NULL writes at
this point, but we must ensure that all writes to memory by a
child thread task work function are seen before we exit from
GOMP_taskwait. */
if (priority_queue_multi_p (head))
{
if (model == MEMMODEL_ACQUIRE)
return __atomic_load_n (&head->t.root, MEMMODEL_ACQUIRE) == NULL;
return head->t.root == NULL;
}
if (model == MEMMODEL_ACQUIRE)
return __atomic_load_n (&head->l.tasks, MEMMODEL_ACQUIRE) == NULL;
return head->l.tasks == NULL;
}
/* Look for a given PRIORITY in HEAD. Return it if found, otherwise
return NULL. This only applies to the tree variant in HEAD. There
is no point in searching for priorities in HEAD->L. */
static inline struct priority_list *
priority_queue_lookup_priority (struct priority_queue *head, int priority)
{
if (head->t.root == NULL)
return NULL;
struct prio_splay_tree_key_s k;
k.l.priority = priority;
return (struct priority_list *)
prio_splay_tree_lookup (&head->t, &k);
}
/* Insert TASK, with priority PRIORITY, into the priority list LIST.
LIST contains items of type TYPE.
If POS is PRIORITY_INSERT_BEGIN, the new task is inserted at the
top of its respective priority. If POS is PRIORITY_INSERT_END, the
task is inserted at the end of its priority.
If ADJUST_PARENT_DEPENDS_ON is TRUE, LIST is a children queue, and
we must keep track of higher and lower priority WAITING tasks by
keeping the queue's last_parent_depends_on field accurate. This
only applies to the children queue, and the caller must ensure LIST
is a children queue in this case.
If ADJUST_PARENT_DEPENDS_ON is TRUE, TASK_IS_PARENT_DEPENDS_ON is
set to the task's parent_depends_on field. If
ADJUST_PARENT_DEPENDS_ON is FALSE, this field is irrelevant. */
static inline void
priority_list_insert (enum priority_queue_type type,
struct priority_list *list,
struct gomp_task *task,
int priority,
enum priority_insert_type pos,
bool adjust_parent_depends_on,
bool task_is_parent_depends_on)
{
struct priority_node *node = task_to_priority_node (type, task);
if (list->tasks)
{
/* If we are keeping track of higher/lower priority items,
but this is a lower priority WAITING task
(parent_depends_on != NULL), put it after all ready to
run tasks. See the comment in
priority_queue_upgrade_task for a visual on how tasks
should be organized. */
if (adjust_parent_depends_on
&& pos == PRIORITY_INSERT_BEGIN
&& list->last_parent_depends_on
&& !task_is_parent_depends_on)
{
struct priority_node *last_parent_depends_on
= list->last_parent_depends_on;
node->next = last_parent_depends_on->next;
node->prev = last_parent_depends_on;
}
/* Otherwise, put it at the top/bottom of the queue. */
else
{
node->next = list->tasks;
node->prev = list->tasks->prev;
if (pos == PRIORITY_INSERT_BEGIN)
list->tasks = node;
}
node->next->prev = node;
node->prev->next = node;
}
else
{
node->next = node;
node->prev = node;
list->tasks = node;
}
if (adjust_parent_depends_on
&& list->last_parent_depends_on == NULL
&& task_is_parent_depends_on)
list->last_parent_depends_on = node;
}
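The linked-list surgery above can be reduced to a stand-alone sketch of circular doubly linked insertion. `cnode` and `clist_push` are illustrative names, not libgomp's; the invariant is the same: in a non-empty list, `head->prev` is the tail, so linking a node before the head appends it, and moving the head pointer afterwards prepends it (the PRIORITY_INSERT_BEGIN/END distinction):

```c
#include <stddef.h>

struct cnode
{
  struct cnode *next, *prev;
  int val;
};

/* Insert NODE into the circular list at *HEADP.  Linking before the
   head appends (head->prev is the tail); AT_HEAD additionally moves
   the head pointer so the node becomes first.  */
static void
clist_push (struct cnode **headp, struct cnode *node, int at_head)
{
  if (*headp)
    {
      node->next = *headp;
      node->prev = (*headp)->prev;
      node->next->prev = node;
      node->prev->next = node;
      if (at_head)
	*headp = node;
    }
  else
    {
      /* First node: it is its own neighbor in both directions.  */
      node->next = node->prev = node;
      *headp = node;
    }
}
```

Because the list is circular, both insert positions are O(1) and no NULL checks are needed on `next`/`prev` once the list is non-empty.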
/* Tree version of priority_list_insert. */
static inline void
priority_tree_insert (enum priority_queue_type type,
struct priority_queue *head,
struct gomp_task *task,
int priority,
enum priority_insert_type pos,
bool adjust_parent_depends_on,
bool task_is_parent_depends_on)
{
if (__builtin_expect (head->t.root == NULL, 0))
{
/* The first time around, transfer any priority 0 items to the
tree. */
if (head->l.tasks != NULL)
{
prio_splay_tree_node k = gomp_malloc (sizeof (*k));
k->left = NULL;
k->right = NULL;
k->key.l.priority = 0;
k->key.l.tasks = head->l.tasks;
k->key.l.last_parent_depends_on = head->l.last_parent_depends_on;
prio_splay_tree_insert (&head->t, k);
head->l.tasks = NULL;
}
}
struct priority_list *list
= priority_queue_lookup_priority (head, priority);
if (!list)
{
prio_splay_tree_node k = gomp_malloc (sizeof (*k));
k->left = NULL;
k->right = NULL;
k->key.l.priority = priority;
k->key.l.tasks = NULL;
k->key.l.last_parent_depends_on = NULL;
prio_splay_tree_insert (&head->t, k);
list = &k->key.l;
}
priority_list_insert (type, list, task, priority, pos,
adjust_parent_depends_on,
task_is_parent_depends_on);
}
/* Generic version of priority_*_insert. */
static inline void
priority_queue_insert (enum priority_queue_type type,
struct priority_queue *head,
struct gomp_task *task,
int priority,
enum priority_insert_type pos,
bool adjust_parent_depends_on,
bool task_is_parent_depends_on)
{
#if _LIBGOMP_CHECKING_
if (priority_queue_task_in_queue_p (type, head, task))
gomp_fatal ("Attempt to insert existing task %p", task);
#endif
if (priority_queue_multi_p (head) || __builtin_expect (priority > 0, 0))
priority_tree_insert (type, head, task, priority, pos,
adjust_parent_depends_on,
task_is_parent_depends_on);
else
priority_list_insert (type, &head->l, task, priority, pos,
adjust_parent_depends_on,
task_is_parent_depends_on);
}
/* If multiple priorities are in play, return the highest priority
task from within Q1 and Q2, while giving preference to tasks from
Q1. If the returned task is chosen from Q1, *Q1_CHOSEN_P is set to
TRUE, otherwise it is set to FALSE.
If multiple priorities are not in play (only 0 priorities are
available), the next task is chosen exclusively from Q1.
As a special case, Q2 can be NULL, in which case, we just choose
the highest priority WAITING task in Q1. This is an optimization
to speed up looking through only one queue.
We assume Q1 has at least one item. */
static inline struct gomp_task *
priority_queue_next_task (enum priority_queue_type t1,
struct priority_queue *q1,
enum priority_queue_type t2,
struct priority_queue *q2,
bool *q1_chosen_p)
{
#if _LIBGOMP_CHECKING_
if (priority_queue_empty_p (q1, MEMMODEL_RELAXED))
gomp_fatal ("priority_queue_next_task: Q1 is empty");
#endif
if (priority_queue_multi_p (q1))
{
struct gomp_task *t
= priority_tree_next_task (t1, q1, t2, q2, q1_chosen_p);
/* If T is NULL, there are no WAITING tasks in Q1. In that
case, return any old (non-waiting) task, which will cause the
caller to do the right thing when checking T->KIND ==
GOMP_TASK_WAITING. */
if (!t)
{
#if _LIBGOMP_CHECKING_
if (*q1_chosen_p == false)
gomp_fatal ("priority_queue_next_task inconsistency");
#endif
return priority_node_to_task (t1, q1->t.root->key.l.tasks);
}
return t;
}
else
{
*q1_chosen_p = true;
return priority_node_to_task (t1, q1->l.tasks);
}
}
/* Remove NODE from LIST.
If we are removing the one and only item in the list, and MODEL is
MEMMODEL_RELEASE, use an atomic release to clear the list.
If the list becomes empty after the remove, return TRUE. */
static inline bool
priority_list_remove (struct priority_list *list,
struct priority_node *node,
enum memmodel model)
{
bool empty = false;
node->prev->next = node->next;
node->next->prev = node->prev;
if (list->tasks == node)
{
if (node->next != node)
list->tasks = node->next;
else
{
/* We access task->children in GOMP_taskwait outside of
the task lock mutex region, so need a release barrier
here to ensure memory written by child_task->fn above
is flushed before the NULL is written. */
if (model == MEMMODEL_RELEASE)
__atomic_store_n (&list->tasks, NULL, MEMMODEL_RELEASE);
else
list->tasks = NULL;
empty = true;
goto remove_out;
}
}
remove_out:
#if _LIBGOMP_CHECKING_
memset (node, 0xaf, sizeof (*node));
#endif
return empty;
}
/* This is the generic version of priority_list_remove.
Remove NODE from priority queue HEAD. HEAD contains tasks of type TYPE.
If we are removing the one and only item in the priority queue and
MODEL is MEMMODEL_RELEASE, use an atomic release to clear the queue.
If the queue becomes empty after the remove, return TRUE. */
static inline bool
priority_queue_remove (enum priority_queue_type type,
struct priority_queue *head,
struct gomp_task *task,
enum memmodel model)
{
#if _LIBGOMP_CHECKING_
if (!priority_queue_task_in_queue_p (type, head, task))
gomp_fatal ("Attempt to remove missing task %p", task);
#endif
if (priority_queue_multi_p (head))
{
priority_tree_remove (type, head, task_to_priority_node (type, task));
if (head->t.root == NULL)
{
if (model == MEMMODEL_RELEASE)
/* Errr, we store NULL twice, the alternative would be to
use an atomic release directly in the splay tree
routines. Worth it? */
__atomic_store_n (&head->t.root, NULL, MEMMODEL_RELEASE);
return true;
}
return false;
}
else
return priority_list_remove (&head->l,
task_to_priority_node (type, task), model);
}
#endif /* _PRIORITY_QUEUE_H_ */


@@ -37,9 +37,6 @@
are amortized O(log n) time for a tree with n nodes. */
#include "libgomp.h"
#include "splay-tree.h"
extern int splay_compare (splay_tree_key, splay_tree_key);
/* Rotate the edge joining the left child N with its parent P. PP is the
grandparent's pointer to P. */
@@ -215,3 +212,27 @@ splay_tree_lookup (splay_tree sp, splay_tree_key key)
else
return NULL;
}
/* Helper function for splay_tree_foreach.
Run FUNC on every node at and below NODE. */
static void
splay_tree_foreach_internal (splay_tree_node node, splay_tree_callback func,
void *data)
{
if (!node)
return;
func (&node->key, data);
splay_tree_foreach_internal (node->left, func, data);
/* Yeah, whatever. GCC can fix my tail recursion. */
splay_tree_foreach_internal (node->right, func, data);
}
/* Run FUNC on each of the nodes in SP. */
attribute_hidden void
splay_tree_foreach (splay_tree sp, splay_tree_callback func, void *data)
{
splay_tree_foreach_internal (sp->root, func, data);
}


@@ -33,7 +33,17 @@ typedef struct splay_tree_node_s *splay_tree_node;
typedef struct splay_tree_s *splay_tree;
typedef struct splay_tree_key_s *splay_tree_key;
define splay_tree_key_s structure, and define
splay_compare inline function. */
splay_compare inline function.
Alternatively, they can define splay_tree_prefix macro before
including this header and then all the above types, the
splay_compare function and the splay_tree_{lookup,insert,remove}
functions will be prefixed by that prefix. If splay_tree_prefix
macro is defined, this header must be included twice: once where
you need the header file definitions, and once where you need the
.c implementation routines. In the latter case, you must also
define the macro splay_tree_c. See the include of splay-tree.h in
priority_queue.[hc] for an example. */
/* For an easily readable description of splay-trees, see:
@@ -43,8 +53,37 @@ typedef struct splay_tree_key_s *splay_tree_key;
The major feature of splay trees is that all basic tree operations
are amortized O(log n) time for a tree with n nodes. */
#ifndef _SPLAY_TREE_H
#define _SPLAY_TREE_H 1
#ifdef splay_tree_prefix
# define splay_tree_name_1(prefix, name) prefix ## _ ## name
# define splay_tree_name(prefix, name) splay_tree_name_1 (prefix, name)
# define splay_tree_node_s \
splay_tree_name (splay_tree_prefix, splay_tree_node_s)
# define splay_tree_s \
splay_tree_name (splay_tree_prefix, splay_tree_s)
# define splay_tree_key_s \
splay_tree_name (splay_tree_prefix, splay_tree_key_s)
# define splay_tree_node \
splay_tree_name (splay_tree_prefix, splay_tree_node)
# define splay_tree \
splay_tree_name (splay_tree_prefix, splay_tree)
# define splay_tree_key \
splay_tree_name (splay_tree_prefix, splay_tree_key)
# define splay_compare \
splay_tree_name (splay_tree_prefix, splay_compare)
# define splay_tree_lookup \
splay_tree_name (splay_tree_prefix, splay_tree_lookup)
# define splay_tree_insert \
splay_tree_name (splay_tree_prefix, splay_tree_insert)
# define splay_tree_remove \
splay_tree_name (splay_tree_prefix, splay_tree_remove)
# define splay_tree_foreach \
splay_tree_name (splay_tree_prefix, splay_tree_foreach)
# define splay_tree_callback \
splay_tree_name (splay_tree_prefix, splay_tree_callback)
#endif
#ifndef splay_tree_c
/* Header file definitions and prototypes. */
/* The nodes in the splay tree. */
struct splay_tree_node_s {
@@ -59,8 +98,33 @@ struct splay_tree_s {
splay_tree_node root;
};
typedef void (*splay_tree_callback) (splay_tree_key, void *);
extern splay_tree_key splay_tree_lookup (splay_tree, splay_tree_key);
extern void splay_tree_insert (splay_tree, splay_tree_node);
extern void splay_tree_remove (splay_tree, splay_tree_key);
extern void splay_tree_foreach (splay_tree, splay_tree_callback, void *);
#else /* splay_tree_c */
# ifdef splay_tree_prefix
# include "splay-tree.c"
# undef splay_tree_name_1
# undef splay_tree_name
# undef splay_tree_node_s
# undef splay_tree_s
# undef splay_tree_key_s
# undef splay_tree_node
# undef splay_tree
# undef splay_tree_key
# undef splay_compare
# undef splay_tree_lookup
# undef splay_tree_insert
# undef splay_tree_remove
# undef splay_tree_foreach
# undef splay_tree_callback
# undef splay_tree_c
# endif
#endif /* #ifndef splay_tree_c */
#endif /* _SPLAY_TREE_H */
#ifdef splay_tree_prefix
# undef splay_tree_prefix
#endif
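The prefixing machinery above boils down to one preprocessor trick: a two-level paste so the prefix macro is expanded before `##` glues tokens together. The `NAME`/`PREFIX`/`counter` names below are illustrative, not from the header; they mirror splay_tree_name_1/splay_tree_name exactly:

```c
/* Two-level paste: the outer macro expands its arguments (so PREFIX
   becomes its value) before the inner macro concatenates them.  */
#define NAME_1(prefix, name) prefix ## _ ## name
#define NAME(prefix, name) NAME_1 (prefix, name)

#define PREFIX prio
#define counter NAME (PREFIX, counter)	/* every use of `counter'... */

static int counter = 41;		/* ...really declares prio_counter */

#undef counter
#undef PREFIX

/* After the #undefs, the prefixed name is a normal identifier.  */
static int
get_counter (void)
{
  return prio_counter + 1;
}
```

Pasting directly with `PREFIX ## _ ## counter` would glue the literal token `PREFIX` instead of `prio`; that is why the header needs both splay_tree_name_1 and splay_tree_name.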


@@ -92,23 +92,6 @@ gomp_realloc_unlock (void *old, size_t size)
return ret;
}
/* The comparison function. */
attribute_hidden int
splay_compare (splay_tree_key x, splay_tree_key y)
{
if (x->host_start == x->host_end
&& y->host_start == y->host_end)
return 0;
if (x->host_end <= y->host_start)
return -1;
if (x->host_start >= y->host_end)
return 1;
return 0;
}
#include "splay-tree.h"
attribute_hidden void
gomp_init_targets_once (void)
{
@@ -1365,17 +1348,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
struct target_mem_desc *tgt_vars
= gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
GOMP_MAP_VARS_TARGET);
struct gomp_thread old_thr, *thr = gomp_thread ();
old_thr = *thr;
memset (thr, '\0', sizeof (*thr));
if (gomp_places_list)
{
thr->place = old_thr.place;
thr->ts.place_partition_len = gomp_places_list_len;
}
devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
gomp_free_thread (thr);
*thr = old_thr;
gomp_unmap_vars (tgt_vars, true);
}
@@ -1404,10 +1377,52 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
(void) num_teams;
(void) thread_limit;
/* If there are depend clauses, but nowait is not present,
block the parent task until the dependencies are resolved
and then just continue with the rest of the function as if it
is a merged task. */
if (flags & GOMP_TARGET_FLAG_NOWAIT)
{
struct gomp_thread *thr = gomp_thread ();
/* Create a team if we don't have any around, as nowait
target tasks make sense to run asynchronously even when
outside of any parallel. */
if (__builtin_expect (thr->ts.team == NULL, 0))
{
struct gomp_team *team = gomp_new_team (1);
struct gomp_task *task = thr->task;
struct gomp_task_icv *icv = task ? &task->icv : &gomp_global_icv;
team->prev_ts = thr->ts;
thr->ts.team = team;
thr->ts.team_id = 0;
thr->ts.work_share = &team->work_shares[0];
thr->ts.last_work_share = NULL;
#ifdef HAVE_SYNC_BUILTINS
thr->ts.single_count = 0;
#endif
thr->ts.static_trip = 0;
thr->task = &team->implicit_task[0];
gomp_init_task (thr->task, NULL, icv);
if (task)
{
thr->task = task;
gomp_end_task ();
free (task);
thr->task = &team->implicit_task[0];
}
else
pthread_setspecific (gomp_thread_destructor, thr);
}
if (thr->ts.team
&& !thr->task->final_task)
{
gomp_create_target_task (devicep, fn, mapnum, hostaddrs,
sizes, kinds, flags, depend,
GOMP_TARGET_TASK_BEFORE_MAP);
return;
}
}
/* If there are depend clauses, but nowait is not present
(or we are in a final task), block the parent task until the
dependencies are resolved and then just continue with the rest
of the function as if it is a merged task. */
if (depend != NULL)
{
struct gomp_thread *thr = gomp_thread ();
@@ -1427,17 +1442,7 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
struct target_mem_desc *tgt_vars
= gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
GOMP_MAP_VARS_TARGET);
struct gomp_thread old_thr, *thr = gomp_thread ();
old_thr = *thr;
memset (thr, '\0', sizeof (*thr));
if (gomp_places_list)
{
thr->place = old_thr.place;
thr->ts.place_partition_len = gomp_places_list_len;
}
devicep->run_func (devicep->target_id, fn_addr, (void *) tgt_vars->tgt_start);
gomp_free_thread (thr);
*thr = old_thr;
gomp_unmap_vars (tgt_vars, true);
}
@@ -1544,23 +1549,25 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
&& thr->ts.team
&& !thr->task->final_task)
{
gomp_create_target_task (devicep, (void (*) (void *)) NULL,
mapnum, hostaddrs, sizes, kinds,
flags | GOMP_TARGET_FLAG_UPDATE,
depend);
return;
if (gomp_create_target_task (devicep, (void (*) (void *)) NULL,
mapnum, hostaddrs, sizes, kinds,
flags | GOMP_TARGET_FLAG_UPDATE,
depend, GOMP_TARGET_TASK_DATA))
return;
}
else
{
struct gomp_team *team = thr->ts.team;
/* If parallel or taskgroup has been cancelled, don't start new
tasks. */
if (team
&& (gomp_team_barrier_cancelled (&team->barrier)
|| (thr->task->taskgroup
&& thr->task->taskgroup->cancelled)))
return;
gomp_task_maybe_wait_for_dependencies (depend);
}
}
}
@@ -1664,22 +1671,25 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
&& thr->ts.team
&& !thr->task->final_task)
{
gomp_create_target_task (devicep, (void (*) (void *)) NULL,
mapnum, hostaddrs, sizes, kinds,
flags, depend);
return;
if (gomp_create_target_task (devicep, (void (*) (void *)) NULL,
mapnum, hostaddrs, sizes, kinds,
flags, depend,
GOMP_TARGET_TASK_DATA))
return;
}
else
{
struct gomp_team *team = thr->ts.team;
/* If parallel or taskgroup has been cancelled, don't start new
tasks. */
if (team
&& (gomp_team_barrier_cancelled (&team->barrier)
|| (thr->task->taskgroup
&& thr->task->taskgroup->cancelled)))
return;
gomp_task_maybe_wait_for_dependencies (depend);
}
}
}
@@ -1711,38 +1721,65 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
gomp_exit_data (devicep, mapnum, hostaddrs, sizes, kinds);
}
void
bool
gomp_target_task_fn (void *data)
{
struct gomp_target_task *ttask = (struct gomp_target_task *) data;
struct gomp_device_descr *devicep = ttask->devicep;
if (ttask->fn != NULL)
{
/* GOMP_target_ext */
if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
{
ttask->state = GOMP_TARGET_TASK_FALLBACK;
gomp_target_fallback_firstprivate (ttask->fn, ttask->mapnum,
ttask->hostaddrs, ttask->sizes,
ttask->kinds);
return false;
}
if (ttask->state == GOMP_TARGET_TASK_FINISHED)
{
gomp_unmap_vars (ttask->tgt, true);
return false;
}
void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
ttask->tgt
= gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
ttask->sizes, ttask->kinds, true,
GOMP_MAP_VARS_TARGET);
ttask->state = GOMP_TARGET_TASK_READY_TO_RUN;
devicep->async_run_func (devicep->target_id, fn_addr,
(void *) ttask->tgt->tgt_start, (void *) ttask);
return true;
}
else if (ttask->devicep == NULL
|| !(ttask->devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
return;
else if (devicep == NULL
|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
return false;
size_t i;
if (ttask->flags & GOMP_TARGET_FLAG_UPDATE)
gomp_update (ttask->devicep, ttask->mapnum, ttask->hostaddrs, ttask->sizes,
gomp_update (devicep, ttask->mapnum, ttask->hostaddrs, ttask->sizes,
ttask->kinds, true);
else if ((ttask->flags & GOMP_TARGET_FLAG_EXIT_DATA) == 0)
for (i = 0; i < ttask->mapnum; i++)
if ((ttask->kinds[i] & 0xff) == GOMP_MAP_STRUCT)
{
gomp_map_vars (ttask->devicep, ttask->sizes[i] + 1,
&ttask->hostaddrs[i], NULL, &ttask->sizes[i],
&ttask->kinds[i], true, GOMP_MAP_VARS_ENTER_DATA);
gomp_map_vars (devicep, ttask->sizes[i] + 1, &ttask->hostaddrs[i],
NULL, &ttask->sizes[i], &ttask->kinds[i], true,
GOMP_MAP_VARS_ENTER_DATA);
i += ttask->sizes[i];
}
else
gomp_map_vars (ttask->devicep, 1, &ttask->hostaddrs[i], NULL,
&ttask->sizes[i], &ttask->kinds[i],
true, GOMP_MAP_VARS_ENTER_DATA);
gomp_map_vars (devicep, 1, &ttask->hostaddrs[i], NULL, &ttask->sizes[i],
&ttask->kinds[i], true, GOMP_MAP_VARS_ENTER_DATA);
else
gomp_exit_data (ttask->devicep, ttask->mapnum, ttask->hostaddrs,
ttask->sizes, ttask->kinds);
gomp_exit_data (devicep, ttask->mapnum, ttask->hostaddrs, ttask->sizes,
ttask->kinds);
return false;
}
void
@@ -2187,6 +2224,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
if (device->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
{
DLSYM (run);
DLSYM (async_run);
DLSYM (dev2dev);
}
if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)

File diff suppressed because it is too large.

@@ -155,8 +155,8 @@ GOMP_taskloop (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
else
ialias_call (GOMP_taskgroup_start) ();
/* FIXME, use priority. */
(void) priority;
if (priority > gomp_max_task_priority_var)
priority = gomp_max_task_priority_var;
if ((flags & GOMP_TASK_FLAG_IF) == 0 || team == NULL
|| (thr->task && thr->task->final_task)
@@ -175,6 +175,7 @@ GOMP_taskloop (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
for (i = 0; i < num_tasks; i++)
{
gomp_init_task (&task[i], parent, gomp_icv (false));
task[i].priority = priority;
task[i].kind = GOMP_TASK_UNDEFERRED;
task[i].final_task = (thr->task && thr->task->final_task)
|| (flags & GOMP_TASK_FLAG_FINAL);
@@ -198,10 +199,11 @@ GOMP_taskloop (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
task_step -= step;
fn (arg);
arg += arg_size;
if (task[i].children != NULL)
if (!priority_queue_empty_p (&task[i].children_queue,
MEMMODEL_RELAXED))
{
gomp_mutex_lock (&team->task_lock);
gomp_clear_parent (task[i].children);
gomp_clear_parent (&task[i].children_queue);
gomp_mutex_unlock (&team->task_lock);
}
gomp_end_task ();
@@ -213,6 +215,7 @@ GOMP_taskloop (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
struct gomp_task task;
gomp_init_task (&task, thr->task, gomp_icv (false));
task.priority = priority;
task.kind = GOMP_TASK_UNDEFERRED;
task.final_task = (thr->task && thr->task->final_task)
|| (flags & GOMP_TASK_FLAG_FINAL);
@@ -228,10 +231,11 @@ GOMP_taskloop (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
if (i == nfirst)
task_step -= step;
fn (data);
if (task.children != NULL)
if (!priority_queue_empty_p (&task.children_queue,
MEMMODEL_RELAXED))
{
gomp_mutex_lock (&team->task_lock);
gomp_clear_parent (task.children);
gomp_clear_parent (&task.children_queue);
gomp_mutex_unlock (&team->task_lock);
}
gomp_end_task ();
@@ -254,6 +258,7 @@ GOMP_taskloop (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
arg = (char *) (((uintptr_t) (task + 1) + arg_align - 1)
& ~(uintptr_t) (arg_align - 1));
gomp_init_task (task, parent, gomp_icv (false));
task->priority = priority;
task->kind = GOMP_TASK_UNDEFERRED;
task->in_tied_task = parent->in_tied_task;
task->taskgroup = taskgroup;
@@ -298,48 +303,20 @@ GOMP_taskloop (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
for (i = 0; i < num_tasks; i++)
{
struct gomp_task *task = tasks[i];
if (parent->children)
{
task->next_child = parent->children;
task->prev_child = parent->children->prev_child;
task->next_child->prev_child = task;
task->prev_child->next_child = task;
}
else
{
task->next_child = task;
task->prev_child = task;
}
parent->children = task;
priority_queue_insert (PQ_CHILDREN, &parent->children_queue,
task, priority,
PRIORITY_INSERT_BEGIN,
/*last_parent_depends_on=*/false,
task->parent_depends_on);
if (taskgroup)
{
if (taskgroup->children)
{
task->next_taskgroup = taskgroup->children;
task->prev_taskgroup = taskgroup->children->prev_taskgroup;
task->next_taskgroup->prev_taskgroup = task;
task->prev_taskgroup->next_taskgroup = task;
}
else
{
task->next_taskgroup = task;
task->prev_taskgroup = task;
}
taskgroup->children = task;
}
if (team->task_queue)
{
task->next_queue = team->task_queue;
task->prev_queue = team->task_queue->prev_queue;
task->next_queue->prev_queue = task;
task->prev_queue->next_queue = task;
}
else
{
task->next_queue = task;
task->prev_queue = task;
team->task_queue = task;
}
priority_queue_insert (PQ_TASKGROUP, &taskgroup->taskgroup_queue,
task, priority, PRIORITY_INSERT_BEGIN,
/*last_parent_depends_on=*/false,
task->parent_depends_on);
priority_queue_insert (PQ_TEAM, &team->task_queue, task, priority,
PRIORITY_INSERT_END,
/*last_parent_depends_on=*/false,
task->parent_depends_on);
++team->task_count;
++team->task_queued_count;
}


@@ -193,7 +193,7 @@ gomp_new_team (unsigned nthreads)
team->ordered_release = (void *) &team->implicit_task[nthreads];
team->ordered_release[0] = &team->master_release;
team->task_queue = NULL;
priority_queue_init (&team->task_queue);
team->task_count = 0;
team->task_queued_count = 0;
team->task_running_count = 0;
@@ -214,6 +214,7 @@ free_team (struct gomp_team *team)
#endif
gomp_barrier_destroy (&team->barrier);
gomp_mutex_destroy (&team->task_lock);
priority_queue_free (&team->task_queue);
free (team);
}
@@ -271,6 +272,8 @@ gomp_free_thread (void *arg __attribute__((unused)))
free (pool);
thr->thread_pool = NULL;
}
if (thr->ts.level == 0 && __builtin_expect (thr->ts.team != NULL, 0))
gomp_team_end ();
if (thr->task != NULL)
{
struct gomp_task *task = thr->task;
@@ -300,7 +303,7 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,
struct gomp_thread **affinity_thr = NULL;
thr = gomp_thread ();
nested = thr->ts.team != NULL;
nested = thr->ts.level;
pool = thr->thread_pool;
task = thr->task;
icv = task ? &task->icv : &gomp_global_icv;


@@ -0,0 +1,225 @@
extern void abort (void);
#define N 256
int a[N], b[N / 16][8][4], c[N / 32][8][8], g[N / 16][8][6];
volatile int d, e;
volatile unsigned long long f;
int
main ()
{
unsigned long long i;
int j, k, l, m;
#pragma omp parallel private (l)
{
#pragma omp for schedule(guided, 3) ordered (1) nowait
for (i = 1; i < N + f; i++)
{
#pragma omp atomic write
a[i] = 1;
#pragma omp ordered depend(sink: i - 1)
if (i > 1)
{
#pragma omp atomic read
l = a[i - 1];
if (l < 2)
abort ();
}
#pragma omp atomic write
a[i] = 2;
if (i < N - 1)
{
#pragma omp atomic read
l = a[i + 1];
if (l == 3)
abort ();
}
#pragma omp ordered depend(source)
#pragma omp atomic write
a[i] = 3;
}
#pragma omp for schedule(guided) ordered (3) nowait
for (i = 3; i < N / 16 - 1 + f; i++)
for (j = 0; j < 8; j += 2)
for (k = 1; k <= 3; k++)
{
#pragma omp atomic write
b[i][j][k] = 1;
#pragma omp ordered depend(sink: i, j - 2, k - 1) \
depend(sink: i - 2, j - 2, k + 1)
#pragma omp ordered depend(sink: i - 3, j + 2, k - 2)
if (j >= 2 && k > 1)
{
#pragma omp atomic read
l = b[i][j - 2][k - 1];
if (l < 2)
abort ();
}
#pragma omp atomic write
b[i][j][k] = 2;
if (i >= 5 && j >= 2 && k < 3)
{
#pragma omp atomic read
l = b[i - 2][j - 2][k + 1];
if (l < 2)
abort ();
}
if (i >= 6 && j < N / 16 - 3 && k == 3)
{
#pragma omp atomic read
l = b[i - 3][j + 2][k - 2];
if (l < 2)
abort ();
}
#pragma omp ordered depend(source)
#pragma omp atomic write
b[i][j][k] = 3;
}
#define A(n) int n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3)
D(m)
#undef A
#pragma omp for collapse (2) ordered(61) schedule(guided, 15)
for (i = 2; i < N / 32 + f; i++)
for (j = 7; j > 1; j--)
for (k = 6; k >= 0; k -= 2)
#define A(n) for (n = 4; n < 5; n++)
D(m)
#undef A
{
#pragma omp atomic write
c[i][j][k] = 1;
#define A(n) ,n
#define E(n) C(n##0) C(n##1) C(n##2) B(n##30) B(n##31) A(n##320) A(n##321)
#pragma omp ordered depend (sink: i, j, k + 2 E(m)) \
depend (sink:i - 2, j + 1, k - 4 E(m)) \
depend(sink: i - 1, j - 2, k - 2 E(m))
if (k <= 4)
{
l = c[i][j][k + 2];
if (l < 2)
abort ();
}
#pragma omp atomic write
c[i][j][k] = 2;
if (i >= 4 && j < 7 && k >= 4)
{
l = c[i - 2][j + 1][k - 4];
if (l < 2)
abort ();
}
if (i >= 3 && j >= 4 && k >= 2)
{
l = c[i - 1][j - 2][k - 2];
if (l < 2)
abort ();
}
#pragma omp ordered depend (source)
#pragma omp atomic write
c[i][j][k] = 3;
}
#pragma omp for schedule(guided, 5) ordered (3) nowait
for (j = 0; j < N / 16 - 1; j++)
for (k = 0; k < 8; k += 2)
for (i = 3; i <= 5 + f; i++)
{
#pragma omp atomic write
g[j][k][i] = 1;
#pragma omp ordered depend(sink: j, k - 2, i - 1) \
depend(sink: j - 2, k - 2, i + 1)
#pragma omp ordered depend(sink: j - 3, k + 2, i - 2)
if (k >= 2 && i > 3)
{
#pragma omp atomic read
l = g[j][k - 2][i - 1];
if (l < 2)
abort ();
}
#pragma omp atomic write
g[j][k][i] = 2;
if (j >= 2 && k >= 2 && i < 5)
{
#pragma omp atomic read
l = g[j - 2][k - 2][i + 1];
if (l < 2)
abort ();
}
if (j >= 3 && k < N / 16 - 3 && i == 5)
{
#pragma omp atomic read
l = g[j - 3][k + 2][i - 2];
if (l < 2)
abort ();
}
#pragma omp ordered depend(source)
#pragma omp atomic write
g[j][k][i] = 3;
}
#pragma omp for collapse(2) ordered(4) lastprivate (i, j, k)
for (i = 2; i < f + 3; i++)
for (j = d + 1; j >= 0; j--)
for (k = 0; k < d; k++)
for (l = 0; l < d + 2; l++)
{
#pragma omp ordered depend (source)
#pragma omp ordered depend (sink:i - 2, j + 2, k - 2, l)
if (!e)
abort ();
}
#pragma omp single
{
if (i != 3 || j != -1 || k != 0)
abort ();
i = 8; j = 9; k = 10;
}
#pragma omp for collapse(2) ordered(4) lastprivate (i, j, k, m)
for (i = 2; i < f + 3; i++)
for (j = d + 1; j >= 0; j--)
for (k = 0; k < d + 2; k++)
for (m = 0; m < d; m++)
{
#pragma omp ordered depend (source)
#pragma omp ordered depend (sink:i - 2, j + 2, k - 2, m)
abort ();
}
#pragma omp single
if (i != 3 || j != -1 || k != 2 || m != 0)
abort ();
#pragma omp for collapse(2) ordered(4) nowait
for (i = 2; i < f + 3; i++)
for (j = d; j > 0; j--)
for (k = 0; k < d + 2; k++)
for (l = 0; l < d + 4; l++)
{
#pragma omp ordered depend (source)
#pragma omp ordered depend (sink:i - 2, j + 2, k - 2, l)
if (!e)
abort ();
}
#pragma omp for nowait
for (i = 0; i < N; i++)
if (a[i] != 3 * (i >= 1))
abort ();
#pragma omp for collapse(2) private(k) nowait
for (i = 0; i < N / 16; i++)
for (j = 0; j < 8; j++)
for (k = 0; k < 4; k++)
if (b[i][j][k] != 3 * (i >= 3 && i < N / 16 - 1 && (j & 1) == 0 && k >= 1))
abort ();
#pragma omp for collapse(3) nowait
for (i = 0; i < N / 32; i++)
for (j = 0; j < 8; j++)
for (k = 0; k < 8; k++)
if (c[i][j][k] != 3 * (i >= 2 && j >= 2 && (k & 1) == 0))
abort ();
#pragma omp for collapse(2) private(k) nowait
for (i = 0; i < N / 16; i++)
for (j = 0; j < 8; j++)
for (k = 0; k < 6; k++)
if (g[i][j][k] != 3 * (i < N / 16 - 1 && (j & 1) == 0 && k >= 3))
abort ();
}
return 0;
}


@@ -0,0 +1,30 @@
/* { dg-do run } */
/* { dg-additional-options "-msse2" { target sse2_runtime } } */
/* { dg-additional-options "-mavx" { target avx_runtime } } */
extern void abort (void);
int a[1024], b = -1;
int
main ()
{
int i;
#pragma omp parallel for simd ordered
for (i = 0; i < 1024; i++)
{
a[i] = i;
#pragma omp ordered threads simd
{
if (b + 1 != i)
abort ();
b = i;
}
a[i] += 3;
}
if (b != 1023)
abort ();
for (i = 0; i < 1024; i++)
if (a[i] != i + 3)
abort ();
return 0;
}


@@ -0,0 +1,62 @@
/* { dg-do run } */
/* { dg-set-target-env-var OMP_MAX_TASK_PRIORITY "10" } */
/* This test verifies that the "priority" clause of omp task works as
advertised.
Testing the OpenMP task scheduler is a bit tricky, especially when
trying to determine what ran first (without explicitly calling
time() and/or synchronizing between threads). What we do here is
run in single threaded mode which guarantees that we won't run into
data races while accessing the "prio" array.
We give each task a priority from 0..63, while setting
OMP_MAX_TASK_PRIORITY to 10, which basically gives us 10 lower
priority tasks, and the rest scheduled to run earlier. We verify
that the priority < 10 tasks run last. */
#include <omp.h>
#include <stdlib.h>
#define N 64
int main()
{
int tsknum=0, prio[N];
int max_priority = omp_get_max_task_priority ();
int saved_tsknum = -1;
int i;
#pragma omp parallel num_threads(1)
#pragma omp single private (i)
{
for (i = 0; i < N; i++)
#pragma omp task priority(i ^ 1)
{
int t;
#pragma omp atomic capture seq_cst
t = tsknum++;
prio[t] = i ^ 1;
}
#pragma omp atomic read seq_cst
saved_tsknum = tsknum;
}
/* If any of the tasks have run before all tasks were created, don't
make any assumption on the task order. Otherwise, we should have
tasks with >= max_priority scheduled first in arbitrary order,
followed by the rest of tasks in decreasing priority order, as
there is only one thread that can schedule them. */
if (saved_tsknum == 0)
{
for (i = 0; i < N; i++)
if (i < N - max_priority)
{
if (prio[i] < max_priority)
abort ();
}
else if (i != N - prio[i] - 1)
abort ();
}
return 0;
}


@@ -0,0 +1,163 @@
#include <omp.h>
#include <stdlib.h>
int a = 1, b = 2, c = 3, d = 4;
int e[2] = { 5, 6 }, f[2] = { 7, 8 }, g[2] = { 9, 10 }, h[2] = { 11, 12 };
__attribute__((noinline, noclone)) void
use (int *k, int *l, int *m, int *n, int *o, int *p, int *q, int *r)
{
asm volatile ("" : : "r" (k) : "memory");
asm volatile ("" : : "r" (l) : "memory");
asm volatile ("" : : "r" (m) : "memory");
asm volatile ("" : : "r" (n) : "memory");
asm volatile ("" : : "r" (o) : "memory");
asm volatile ("" : : "r" (p) : "memory");
asm volatile ("" : : "r" (q) : "memory");
asm volatile ("" : : "r" (r) : "memory");
}
#pragma omp declare target to (use)
int
main ()
{
int err = 0, r = -1, t[4];
long s[4] = { -1, -2, -3, -4 };
int j = 13, k = 14, l[2] = { 15, 16 }, m[2] = { 17, 18 };
#pragma omp target private (a, b, e, f) firstprivate (c, d, g, h) map(from: r, s, t) \
map(tofrom: err, j, l) map(to: k, m)
#pragma omp teams num_teams (4) thread_limit (8) private (b, f) firstprivate (d, h, k, m)
{
int u1 = k, u2[2] = { m[0], m[1] };
int u3[64];
int i;
for (i = 0; i < 64; i++)
u3[i] = k + i;
#pragma omp parallel num_threads (1)
{
if (c != 3 || d != 4 || g[0] != 9 || g[1] != 10 || h[0] != 11 || h[1] != 12 || k != 14 || m[0] != 17 || m[1] != 18)
#pragma omp atomic write
err = 1;
b = omp_get_team_num ();
if (b >= 4)
#pragma omp atomic write
err = 1;
if (b == 0)
{
a = omp_get_num_teams ();
e[0] = 2 * a;
e[1] = 3 * a;
}
f[0] = 2 * b;
f[1] = 3 * b;
#pragma omp atomic update
c++;
#pragma omp atomic update
g[0] += 2;
#pragma omp atomic update
g[1] += 3;
d++;
h[0] += 2;
h[1] += 3;
k += b;
m[0] += 2 * b;
m[1] += 3 * b;
}
use (&a, &b, &c, &d, e, f, g, h);
#pragma omp parallel firstprivate (u1, u2)
{
int w = omp_get_thread_num ();
int x = 19;
int y[2] = { 20, 21 };
int v = 24;
int ll[64];
if (u1 != 14 || u2[0] != 17 || u2[1] != 18)
#pragma omp atomic write
err = 1;
u1 += w;
u2[0] += 2 * w;
u2[1] += 3 * w;
use (&u1, u2, &t[b], l, &k, m, &j, h);
#pragma omp master
t[b] = omp_get_num_threads ();
#pragma omp atomic update
j++;
#pragma omp atomic update
l[0] += 2;
#pragma omp atomic update
l[1] += 3;
#pragma omp atomic update
k += 4;
#pragma omp atomic update
m[0] += 5;
#pragma omp atomic update
m[1] += 6;
x += w;
y[0] += 2 * w;
y[1] += 3 * w;
#pragma omp simd safelen(32) private (v)
for (i = 0; i < 64; i++)
{
v = 3 * i;
ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
}
#pragma omp barrier
use (&u1, u2, &t[b], l, &k, m, &x, y);
if (w < 0 || w > 8 || w != omp_get_thread_num () || u1 != 14 + w
|| u2[0] != 17 + 2 * w || u2[1] != 18 + 3 * w
|| x != 19 + w || y[0] != 20 + 2 * w || y[1] != 21 + 3 * w
|| v != 24)
#pragma omp atomic write
err = 1;
for (i = 0; i < 64; i++)
if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
#pragma omp atomic write
err = 1;
}
#pragma omp parallel num_threads (1)
{
if (b == 0)
{
r = a;
if (a != omp_get_num_teams ()
|| e[0] != 2 * a
|| e[1] != 3 * a)
#pragma omp atomic write
err = 1;
}
int v1, v2, v3;
#pragma omp atomic read
v1 = c;
#pragma omp atomic read
v2 = g[0];
#pragma omp atomic read
v3 = g[1];
s[b] = v1 * 65536L + v2 * 256L + v3;
if (d != 5 || h[0] != 13 || h[1] != 15
|| k != 14 + b + 4 * t[b]
|| m[0] != 17 + 2 * b + 5 * t[b]
|| m[1] != 18 + 3 * b + 6 * t[b]
|| b != omp_get_team_num ()
|| f[0] != 2 * b || f[1] != 3 * b)
#pragma omp atomic write
err = 1;
}
}
if (err != 0) abort ();
if (r < 1 || r > 4) abort ();
if (a != 1 || b != 2 || c != 3 || d != 4) abort ();
if (e[0] != 5 || e[1] != 6 || f[0] != 7 || f[1] != 8) abort ();
if (g[0] != 9 || g[1] != 10 || h[0] != 11 || h[1] != 12) abort ();
int i, cnt = 0;
for (i = 0; i < r; i++)
if ((s[i] >> 16) < 3 + 1 || (s[i] >> 16) > 3 + 4
|| ((s[i] >> 8) & 0xff) < 9 + 2 * 1 || ((s[i] >> 8) & 0xff) > 9 + 2 * 4
|| (s[i] & 0xff) < 10 + 3 * 1 || (s[i] & 0xff) > 10 + 3 * 4
|| t[i] < 1 || t[i] > 8)
abort ();
else
cnt += t[i];
if (j != 13 + cnt || l[0] != 15 + 2 * cnt || l[1] != 16 + 3 * cnt) abort ();
return 0;
}


@@ -0,0 +1,54 @@
#include <stdlib.h>
#include <unistd.h>
int main ()
{
int a = 0, b = 0, c = 0, d[7];
#pragma omp parallel
#pragma omp single
{
#pragma omp task depend(out: d[0])
a = 2;
#pragma omp target enter data nowait map(to: a,b,c) depend(in: d[0]) depend(out: d[1])
#pragma omp target nowait map(alloc: a) depend(in: d[1]) depend(out: d[2])
a++;
#pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
{
usleep (1000);
#pragma omp atomic update
b |= 4;
}
#pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
{
usleep (5000);
#pragma omp atomic update
b |= 1;
}
#pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[5])
{
usleep (5000);
#pragma omp atomic update
c |= 8;
}
#pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[6])
{
usleep (1000);
#pragma omp atomic update
c |= 2;
}
#pragma omp target exit data map(always,from: a,b,c) depend(in: d[5], d[6])
}
if (a != 3 || b != 5 || c != 10)
abort ();
return 0;
}


@@ -0,0 +1,93 @@
extern void abort (void);
int
main ()
{
int a = 1, b = 2, c = 4, d[7];
#pragma omp taskgroup
{
#pragma omp target enter data nowait map (to: a, b, c) depend(out: d[0])
#pragma omp target nowait map (alloc: a, b) depend(in: d[0]) depend(out: d[1])
{
#pragma omp atomic update
a |= 4;
#pragma omp atomic update
b |= 8;
}
#pragma omp target nowait map (alloc: a, c) depend(in: d[0]) depend(out: d[2])
{
#pragma omp atomic update
a |= 16;
#pragma omp atomic update
c |= 32;
}
#pragma omp target exit data nowait map (from: a, b, c) depend(in: d[1], d[2])
}
if (a != 21 || b != 10 || c != 36)
abort ();
#pragma omp target map (tofrom: a, b) nowait
{
a &= ~16;
b &= ~2;
}
#pragma omp target map (tofrom: c) nowait
{
c |= 8;
}
#pragma omp barrier
if (a != 5 || b != 8 || c != 44)
abort ();
#pragma omp target map (tofrom: a, b) nowait
{
a |= 32;
b |= 4;
}
#pragma omp target map (tofrom: c) nowait
{
c &= ~4;
}
#pragma omp taskwait
if (a != 37 || b != 12 || c != 40)
abort ();
#pragma omp target nowait map (tofrom: a, b) depend(out: d[3])
{
#pragma omp atomic update
a = a + 9;
b -= 8;
}
#pragma omp target nowait map (tofrom: a, c) depend(out: d[4])
{
#pragma omp atomic update
a = a + 4;
c >>= 1;
}
#pragma omp task if (0) depend (in: d[3], d[4]) shared (a, b, c)
if (a != 50 || b != 4 || c != 20)
abort ();
#pragma omp task shared (a)
a += 50;
#pragma omp target nowait map (tofrom: b)
b++;
#pragma omp target map (tofrom: c) nowait
c--;
#pragma omp taskwait
if (a != 100 || b != 5 || c != 19)
abort ();
#pragma omp target map (tofrom: a) nowait depend(out: d[5])
a++;
#pragma omp target map (tofrom: b) nowait depend(out: d[6])
b++;
#pragma omp target map (tofrom: a, b) depend(in: d[5], d[6])
{
if (a != 101 || b != 6)
a = -9;
else
{
a = 24;
b = 38;
}
}
if (a != 24 || b != 38)
abort ();
return 0;
}


@@ -0,0 +1,112 @@
extern void abort (void);
int
main ()
{
int a = 1, b = 2, c = 4, d[7];
#pragma omp parallel
{
#pragma omp single
{
#pragma omp taskgroup
{
#pragma omp target enter data nowait map (to: a, b, c) depend(out: d[0])
#pragma omp target nowait map (alloc: a, b) depend(in: d[0]) depend(out: d[1])
{
#pragma omp atomic update
a |= 4;
#pragma omp atomic update
b |= 8;
}
#pragma omp target nowait map (alloc: a, c) depend(in: d[0]) depend(out: d[2])
{
#pragma omp atomic update
a |= 16;
#pragma omp atomic update
c |= 32;
}
#pragma omp target exit data nowait map (from: a, b, c) depend(in: d[1], d[2])
}
if (a != 21 || b != 10 || c != 36)
abort ();
#pragma omp target map (tofrom: a, b) nowait
{
a &= ~16;
b &= ~2;
}
#pragma omp target map (tofrom: c) nowait
{
c |= 8;
}
} /* Implicit barrier here. */
#pragma omp single
{
if (a != 5 || b != 8 || c != 44)
abort ();
#pragma omp target map (tofrom: a, b) nowait
{
a |= 32;
b |= 4;
}
#pragma omp target map (tofrom: c) nowait
c &= ~4;
#pragma omp taskwait
if (a != 37 || b != 12 || c != 40)
abort ();
#pragma omp target nowait map (tofrom: a, b) depend(out: d[3])
{
#pragma omp atomic update
a = a + 9;
b -= 8;
}
#pragma omp target nowait map (tofrom: a, c) depend(out: d[4])
{
#pragma omp atomic update
a = a + 4;
c >>= 1;
}
#pragma omp task if (0) depend (in: d[3], d[4]) shared (a, b, c)
if (a != 50 || b != 4 || c != 20)
abort ();
#pragma omp task shared (a)
a += 50;
#pragma omp target nowait map (tofrom: b)
b++;
#pragma omp target map (tofrom: c) nowait
c--;
#pragma omp taskwait
if (a != 100 || b != 5 || c != 19)
abort ();
#pragma omp target map (tofrom: a) nowait depend(out: d[5])
a++;
#pragma omp target map (tofrom: b) nowait depend(out: d[6])
b++;
#pragma omp target map (tofrom: a, b) depend(in: d[5], d[6])
{
if (a != 101 || b != 6)
a = -9;
else
{
a = 24;
b = 38;
}
}
if (a != 24 || b != 38)
abort ();
} /* Implicit barrier here. */
#pragma omp master
{
#pragma omp target nowait map (tofrom: a, b)
{
a *= 2;
b++;
}
#pragma omp target map (tofrom: c) nowait
c--;
}
#pragma omp barrier
if (a != 48 || b != 39 || c != 18)
abort ();
}
return 0;
}


@@ -1,3 +1,20 @@
2015-11-14 Ilya Verbin <ilya.verbin@intel.com>
* runtime/offload_host.cpp (task_completion_callback): New
variable.
(offload_proxy_task_completed_ooo): Call task_completion_callback.
(__offload_register_task_callback): New function.
* runtime/offload_host.h (__offload_register_task_callback): New
declaration.
* plugin/libgomp-plugin-intelmic.cpp (offload): Add async_data
argument, handle async offloading.
(register_main_image): Call __offload_register_task_callback.
(GOMP_OFFLOAD_init_device, get_target_table, GOMP_OFFLOAD_alloc,
GOMP_OFFLOAD_free, GOMP_OFFLOAD_host2dev, GOMP_OFFLOAD_dev2host,
GOMP_OFFLOAD_dev2dev): Adjust offload callers.
(GOMP_OFFLOAD_async_run): New function.
(GOMP_OFFLOAD_run): Implement using GOMP_OFFLOAD_async_run.
2015-10-26 Ilya Verbin <ilya.verbin@intel.com>
Aleksander Ivanushenko <aleksander.ivanushenko@intel.com>


@@ -192,11 +192,23 @@ GOMP_OFFLOAD_get_num_devices (void)
static void
offload (const char *file, uint64_t line, int device, const char *name,
int num_vars, VarDesc *vars, VarDesc2 *vars2)
int num_vars, VarDesc *vars, VarDesc2 *vars2, const void **async_data)
{
OFFLOAD ofld = __offload_target_acquire1 (&device, file, line);
if (ofld)
__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL, NULL);
{
if (async_data == NULL)
__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL,
NULL);
else
{
OffloadFlags flags;
flags.flags = 0;
flags.bits.omp_async = 1;
__offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0, NULL,
async_data, 0, NULL, flags, NULL);
}
}
else
{
fprintf (stderr, "%s:%d: Offload target acquire failed\n", file, line);
@@ -208,6 +220,10 @@ static void
register_main_image ()
{
__offload_register_image (&main_target_image);
/* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
asynchronous task on target is completed. */
__offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
}
/* liboffloadmic loads and runs offload_target_main on all available devices
@@ -218,7 +234,7 @@ GOMP_OFFLOAD_init_device (int device)
TRACE ("");
pthread_once (&main_image_is_registered, register_main_image);
offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0,
NULL, NULL);
NULL, NULL, NULL);
}
extern "C" void
@@ -240,7 +256,7 @@ get_target_table (int device, int &num_funcs, int &num_vars, void **&table)
VarDesc2 vd1g[2] = { { "num_funcs", 0 }, { "num_vars", 0 } };
offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2,
vd1, vd1g);
vd1, vd1g, NULL);
int table_size = num_funcs + 2 * num_vars;
if (table_size > 0)
@@ -254,7 +270,7 @@ get_target_table (int device, int &num_funcs, int &num_vars, void **&table)
VarDesc2 vd2g = { "table", 0 };
offload (__FILE__, __LINE__, device, "__offload_target_table_p2", 1,
&vd2, &vd2g);
&vd2, &vd2g, NULL);
}
}
@@ -401,8 +417,8 @@ GOMP_OFFLOAD_alloc (int device, size_t size)
vd1[1].size = sizeof (void *);
VarDesc2 vd1g[2] = { { "size", 0 }, { "tgt_ptr", 0 } };
offload (__FILE__, __LINE__, device, "__offload_target_alloc", 2, vd1, vd1g);
offload (__FILE__, __LINE__, device, "__offload_target_alloc", 2, vd1, vd1g,
NULL);
return tgt_ptr;
}
@@ -416,7 +432,8 @@ GOMP_OFFLOAD_free (int device, void *tgt_ptr)
vd1.size = sizeof (void *);
VarDesc2 vd1g = { "tgt_ptr", 0 };
offload (__FILE__, __LINE__, device, "__offload_target_free", 1, &vd1, &vd1g);
offload (__FILE__, __LINE__, device, "__offload_target_free", 1, &vd1, &vd1g,
NULL);
}
extern "C" void *
@@ -435,7 +452,7 @@ GOMP_OFFLOAD_host2dev (int device, void *tgt_ptr, const void *host_ptr,
VarDesc2 vd1g[2] = { { "tgt_ptr", 0 }, { "size", 0 } };
offload (__FILE__, __LINE__, device, "__offload_target_host2tgt_p1", 2,
vd1, vd1g);
vd1, vd1g, NULL);
VarDesc vd2 = vd_host2tgt;
vd2.ptr = (void *) host_ptr;
@@ -443,7 +460,7 @@ GOMP_OFFLOAD_host2dev (int device, void *tgt_ptr, const void *host_ptr,
VarDesc2 vd2g = { "var", 0 };
offload (__FILE__, __LINE__, device, "__offload_target_host2tgt_p2", 1,
&vd2, &vd2g);
&vd2, &vd2g, NULL);
return tgt_ptr;
}
@@ -464,7 +481,7 @@ GOMP_OFFLOAD_dev2host (int device, void *host_ptr, const void *tgt_ptr,
VarDesc2 vd1g[2] = { { "tgt_ptr", 0 }, { "size", 0 } };
offload (__FILE__, __LINE__, device, "__offload_target_tgt2host_p1", 2,
vd1, vd1g);
vd1, vd1g, NULL);
VarDesc vd2 = vd_tgt2host;
vd2.ptr = (void *) host_ptr;
@@ -472,7 +489,7 @@ GOMP_OFFLOAD_dev2host (int device, void *host_ptr, const void *tgt_ptr,
VarDesc2 vd2g = { "var", 0 };
offload (__FILE__, __LINE__, device, "__offload_target_tgt2host_p2", 1,
&vd2, &vd2g);
&vd2, &vd2g, NULL);
return host_ptr;
}
@@ -495,22 +512,32 @@ GOMP_OFFLOAD_dev2dev (int device, void *dst_ptr, const void *src_ptr,
VarDesc2 vd1g[3] = { { "dst_ptr", 0 }, { "src_ptr", 0 }, { "size", 0 } };
offload (__FILE__, __LINE__, device, "__offload_target_tgt2tgt", 3, vd1,
vd1g);
vd1g, NULL);
return dst_ptr;
}
extern "C" void
GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
void *async_data)
{
TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p, async_data = %p)", device,
tgt_fn, tgt_vars, async_data);
VarDesc vd[2] = { vd_host2tgt, vd_host2tgt };
vd[0].ptr = &tgt_fn;
vd[0].size = sizeof (void *);
vd[1].ptr = &tgt_vars;
vd[1].size = sizeof (void *);
offload (__FILE__, __LINE__, device, "__offload_target_run", 2, vd, NULL,
(const void **) async_data);
}
extern "C" void
GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
{
TRACE ("(tgt_fn = %p, tgt_vars = %p)", tgt_fn, tgt_vars);
TRACE ("(device = %d, tgt_fn = %p, tgt_vars = %p)", device, tgt_fn, tgt_vars);
VarDesc vd1[2] = { vd_host2tgt, vd_host2tgt };
vd1[0].ptr = &tgt_fn;
vd1[0].size = sizeof (void *);
vd1[1].ptr = &tgt_vars;
vd1[1].size = sizeof (void *);
VarDesc2 vd1g[2] = { { "tgt_fn", 0 }, { "tgt_vars", 0 } };
offload (__FILE__, __LINE__, device, "__offload_target_run", 2, vd1, vd1g);
GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL);
}


@@ -64,6 +64,8 @@ static void __offload_fini_library(void);
#define GET_OFFLOAD_NUMBER(timer_data) \
timer_data? timer_data->offload_number : 0
static void (*task_completion_callback)(void *);
extern "C" {
#ifdef TARGET_WINNT
// Windows does not support imports from libraries without actually
@@ -2507,7 +2509,7 @@ extern "C" {
const void *info
)
{
/* TODO: Call callback function, pass info. */
task_completion_callback ((void *) info);
}
}
@@ -5669,6 +5671,11 @@ extern "C" void __offload_unregister_image(const void *target_image)
}
}
extern "C" void __offload_register_task_callback(void (*cb)(void *))
{
task_completion_callback = cb;
}
// Runtime trace interface for user programs
void __offload_console_trace(int level)


@@ -376,6 +376,9 @@ extern "C" bool __offload_target_image_is_executable(const void *target_image);
extern "C" bool __offload_register_image(const void* image);
extern "C" void __offload_unregister_image(const void* image);
// Registers asynchronous task completion callback
extern "C" void __offload_register_task_callback(void (*cb)(void *));
// Initializes offload runtime library.
DLL_LOCAL extern int __offload_init_library(void);