gcc/liboffloadmic/runtime/offload_host.h
Jakub Jelinek e460634820 omp-low.c (lower_omp_ordered): Add argument to GOMP_SMD_ORDERED_* internal calls - 0 if...
gcc/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>

	* omp-low.c (lower_omp_ordered): Add argument to GOMP_SMD_ORDERED_*
	internal calls - 0 if ordered simd and 1 for ordered threads simd.
	* tree-vectorizer.c (adjust_simduid_builtins): If GOMP_SIMD_ORDERED_*
	argument is 1, replace it with GOMP_ordered_* call instead of removing
	it.
gcc/c/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>

	* c-typeck.c (c_finish_omp_clauses): Don't mark
	GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.
gcc/cp/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>

	* semantics.c (finish_omp_clauses): Don't mark
	GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.
libgomp/
2015-11-14  Jakub Jelinek  <jakub@redhat.com>
	    Aldy Hernandez  <aldyh@redhat.com>
	    Ilya Verbin  <ilya.verbin@intel.com>

	* ordered.c (gomp_doacross_init, GOMP_doacross_post,
	GOMP_doacross_wait, gomp_doacross_ull_init, GOMP_doacross_ull_post,
	GOMP_doacross_ull_wait): For GFS_GUIDED don't divide number of
	iterators or IV by chunk size.
	* parallel.c (gomp_resolve_num_threads): Don't assume that
	if thr->ts.team is non-NULL, then pool must be non-NULL.
	* libgomp-plugin.h (GOMP_PLUGIN_target_task_completion): Declare.
	* libgomp.map (GOMP_PLUGIN_1.1): New symbol version, export
	GOMP_PLUGIN_target_task_completion.
	* Makefile.am (libgomp_la_SOURCES): Add priority_queue.c.
	* Makefile.in: Regenerate.
	* libgomp.h: Shuffle prototypes and forward definitions around so
	priority queues can be defined.
	(enum gomp_task_kind): Add GOMP_TASK_ASYNC_RUNNING.
	(enum gomp_target_task_state): New enum.
	(struct gomp_target_task): Add state, tgt, task and team fields.
	(gomp_create_target_task): Change return type to bool, add
	state argument.
	(gomp_target_task_fn): Change return type to bool.
	(struct gomp_device_descr): Add async_run_func.
	(struct gomp_task): Remove children, next_child, prev_child,
	next_queue, prev_queue, next_taskgroup, prev_taskgroup.
	Add pnode field.
	(struct gomp_taskgroup): Remove children.
	Add taskgroup_queue.
	(struct gomp_team): Change task_queue type to a priority queue.
	(splay_compare): Define inline.
	(priority_queue_offset): New.
	(priority_node_to_task): New.
	(task_to_priority_node): New.
	* oacc-mem.c: Do not include splay-tree.h.
	* priority_queue.c: New file.
	* priority_queue.h: New file.
	* splay-tree.c: Do not include splay-tree.h.
	(splay_tree_foreach_internal): New.
	(splay_tree_foreach): New.
	* splay-tree.h: Become re-entrant if splay_tree_prefix is defined.
	(splay_tree_callback): Define typedef.
	* target.c (splay_compare): Move to libgomp.h.
	(GOMP_target): Don't adjust *thr in any way around running offloaded
	task.
	(GOMP_target_ext): Likewise.  Handle target nowait.
	(GOMP_target_update_ext, GOMP_target_enter_exit_data): Check
	return value from gomp_create_target_task, if false, fallthrough
	as if no dependencies exist.
	(gomp_target_task_fn): Change return type to bool, return true
	if the task should have another part scheduled later.  Handle
	target nowait.
	(gomp_load_plugin_for_device): Initialize async_run.
	* task.c (gomp_init_task): Initialize children_queue.
	(gomp_clear_parent_in_list): New.
	(gomp_clear_parent_in_tree): New.
	(gomp_clear_parent): Handle priorities.
	(GOMP_task): Likewise.
	(priority_queue_move_task_first,
	gomp_target_task_completion, GOMP_PLUGIN_target_task_completion):
	New functions.
	(gomp_create_target_task): Use priority queues.  Change return type
	to bool, add state argument, return false if for async
	{{enter,exit} data,update} constructs no dependencies need to be
	waited for, handle target nowait.  Set task->fn to NULL instead of
	gomp_target_task_fn.
	(verify_children_queue): Remove.
	(priority_list_upgrade_task): New.
	(priority_queue_upgrade_task): New.
	(verify_task_queue): Remove.
	(priority_list_downgrade_task): New.
	(priority_queue_downgrade_task): New.
	(gomp_task_run_pre): Use priority queues.
	Abstract code out to priority_queue_downgrade_task.
	(gomp_task_run_post_handle_dependers): Use priority queues.
	(gomp_task_run_post_remove_parent): Likewise.
	(gomp_task_run_post_remove_taskgroup): Likewise.
	(gomp_barrier_handle_tasks): Likewise.  Handle target nowait target
	tasks specially.
	(GOMP_taskwait): Likewise.
	(gomp_task_maybe_wait_for_dependencies): Likewise.  Abstract code to
	priority-queue_upgrade_task.
	(GOMP_taskgroup_start): Use priority queues.
	(GOMP_taskgroup_end): Likewise.  Handle target nowait target tasks
	specially.  If taskgroup is NULL, and thr->ts.level is 0, act as a
	barrier.
	* taskloop.c (GOMP_taskloop): Handle priorities.
	* team.c (gomp_new_team): Call priority_queue_init.
	(free_team): Call priority_queue_free.
	(gomp_free_thread): Call gomp_team_end if thr->ts.team is artificial
	team created for target nowait in implicit parallel region.
	(gomp_team_start): For nested check, test thr->ts.level instead of
	thr->ts.team != NULL.
	* testsuite/libgomp.c/doacross-3.c: New test.
	* testsuite/libgomp.c/ordered-5.c: New test.
	* testsuite/libgomp.c/priority.c: New test.
	* testsuite/libgomp.c/target-31.c: New test.
	* testsuite/libgomp.c/target-32.c: New test.
	* testsuite/libgomp.c/target-33.c: New test.
	* testsuite/libgomp.c/target-34.c: New test.
liboffloadmic/
2015-11-14  Ilya Verbin  <ilya.verbin@intel.com>

	* runtime/offload_host.cpp (task_completion_callback): New
	variable.
	(offload_proxy_task_completed_ooo): Call task_completion_callback.
	(__offload_register_task_callback): New function.
	* runtime/offload_host.h (__offload_register_task_callback): New
	declaration.
	* plugin/libgomp-plugin-intelmic.cpp (offload): Add async_data
	argument, handle async offloading.
	(register_main_image): Call register_main_image.
	(GOMP_OFFLOAD_init_device, get_target_table, GOMP_OFFLOAD_alloc,
	GOMP_OFFLOAD_free, GOMP_OFFLOAD_host2dev, GOMP_OFFLOAD_dev2host,
	GOMP_OFFLOAD_dev2dev) Adjust offload callers.
	(GOMP_OFFLOAD_async_run): New function.
	(GOMP_OFFLOAD_run): Implement using GOMP_OFFLOAD_async_run.

From-SVN: r230381
2015-11-14 19:42:13 +01:00

462 lines
15 KiB
C++

/*
Copyright (c) 2014-2015 Intel Corporation. All Rights Reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*! \file
\brief The parts of the runtime library used only on the host
*/
#ifndef OFFLOAD_HOST_H_INCLUDED
#define OFFLOAD_HOST_H_INCLUDED
#ifndef TARGET_WINNT
#include <unistd.h>
#endif // TARGET_WINNT
#include "offload_common.h"
#include "offload_util.h"
#include "offload_engine.h"
#include "offload_env.h"
#include "offload_orsl.h"
#include "coi/coi_client.h"
// MIC engines.
DLL_LOCAL extern Engine* mic_engines;
DLL_LOCAL extern uint32_t mic_engines_total;
// DMA channel count used by COI and set via
// OFFLOAD_DMA_CHANNEL_COUNT environment variable
DLL_LOCAL extern uint32_t mic_dma_channel_count;
//! The target image is packed as follows.
/*! 1. 8 bytes containing the size of the target binary */
/*! 2. a null-terminated string which is the binary name */
/*! 3. <size> number of bytes that are the contents of the image */
/*! The address of symbol __offload_target_image
is the address of this structure. */
struct Image {
int64_t size; //!< Size in bytes of the target binary name and contents
char data[]; //!< The name and contents of the target image
};
// The offload descriptor.
class OffloadDescriptor
{
public:
enum OmpAsyncLastEventType {
c_last_not, // not last event
c_last_write, // the last event that is write
c_last_read, // the last event that is read
c_last_runfunc // the last event that is runfunction
};
OffloadDescriptor(
int index,
_Offload_status *status,
bool is_mandatory,
bool is_openmp,
OffloadHostTimerData * timer_data
) :
m_device(mic_engines[index == -1 ? 0 : index % mic_engines_total]),
m_is_mandatory(is_mandatory),
m_is_openmp(is_openmp),
m_inout_buf(0),
m_func_desc(0),
m_func_desc_size(0),
m_in_deps(0),
m_in_deps_total(0),
m_in_deps_allocated(0),
m_out_deps(0),
m_out_deps_total(0),
m_out_deps_allocated(0),
m_vars(0),
m_vars_extra(0),
m_status(status),
m_timer_data(timer_data),
m_out_with_preallocated(false),
m_preallocated_alloc(false),
m_traceback_called(false),
m_stream(-1),
m_omp_async_last_event_type(c_last_not)
{
m_wait_all_devices = index == -1;
}
~OffloadDescriptor()
{
if (m_in_deps != 0) {
free(m_in_deps);
}
if (m_out_deps != 0) {
free(m_out_deps);
}
if (m_func_desc != 0) {
free(m_func_desc);
}
if (m_vars != 0) {
free(m_vars);
free(m_vars_extra);
}
}
bool offload(const char *name, bool is_empty,
VarDesc *vars, VarDesc2 *vars2, int vars_total,
const void **waits, int num_waits, const void **signal,
int entry_id, const void *stack_addr,
OffloadFlags offload_flags);
bool offload_finish(bool is_traceback);
bool is_signaled();
OffloadHostTimerData* get_timer_data() const {
return m_timer_data;
}
void set_stream(_Offload_stream stream) {
m_stream = stream;
}
_Offload_stream get_stream() {
return(m_stream);
}
private:
bool offload_wrap(const char *name, bool is_empty,
VarDesc *vars, VarDesc2 *vars2, int vars_total,
const void **waits, int num_waits, const void **signal,
int entry_id, const void *stack_addr,
OffloadFlags offload_flags);
bool wait_dependencies(const void **waits, int num_waits,
_Offload_stream stream);
bool setup_descriptors(VarDesc *vars, VarDesc2 *vars2, int vars_total,
int entry_id, const void *stack_addr);
bool setup_misc_data(const char *name);
bool send_pointer_data(bool is_async, void* info);
bool send_noncontiguous_pointer_data(
int i,
PtrData* src_buf,
PtrData* dst_buf,
COIEVENT *event,
uint64_t &sent_data,
uint32_t in_deps_amount,
COIEVENT *in_deps
);
bool receive_noncontiguous_pointer_data(
int i,
COIBUFFER dst_buf,
COIEVENT *event,
uint64_t &received_data,
uint32_t in_deps_amount,
COIEVENT *in_deps
);
bool gather_copyin_data();
bool compute(void *);
bool receive_pointer_data(bool is_async, bool first_run, void * info);
bool scatter_copyout_data();
void cleanup();
bool find_ptr_data(PtrData* &ptr_data, void *base, int64_t disp,
int64_t length, bool is_targptr,
bool error_does_not_exist = true);
bool alloc_ptr_data(PtrData* &ptr_data, void *base, int64_t disp,
int64_t length, int64_t alloc_disp, int align,
bool is_targptr, bool is_prealloc, bool pin);
bool create_preallocated_buffer(PtrData* ptr_data, void *base);
bool init_static_ptr_data(PtrData *ptr_data);
bool init_mic_address(PtrData *ptr_data);
bool offload_stack_memory_manager(const void * stack_begin, int routine_id,
int buf_size, int align, bool *is_new);
bool nullify_target_stack(COIBUFFER targ_buf, uint64_t size);
bool gen_var_descs_for_pointer_array(int i);
void get_stream_in_dependencies(uint32_t &in_deps_amount,
COIEVENT* &in_deps);
void report_coi_error(error_types msg, COIRESULT res);
_Offload_result translate_coi_error(COIRESULT res) const;
void setup_omp_async_info();
void register_omp_event_call_back(const COIEVENT *event, const void *info);
private:
typedef std::list<COIBUFFER> BufferList;
// extra data associated with each variable descriptor
struct VarExtra {
PtrData* src_data;
PtrData* dst_data;
AutoData* auto_data;
int64_t cpu_disp;
int64_t cpu_offset;
void *alloc;
CeanReadRanges *read_rng_src;
CeanReadRanges *read_rng_dst;
int64_t ptr_arr_offset;
bool is_arr_ptr_el;
OmpAsyncLastEventType omp_last_event_type;
};
template<typename T> class ReadArrElements {
public:
ReadArrElements():
ranges(NULL),
el_size(sizeof(T)),
offset(0),
count(0),
is_empty(true),
base(NULL)
{}
bool read_next(bool flag)
{
if (flag != 0) {
if (is_empty) {
if (ranges) {
if (!get_next_range(ranges, &offset)) {
// ranges are over
return false;
}
}
// all contiguous elements are over
else if (count != 0) {
return false;
}
length_cur = size;
}
else {
offset += el_size;
}
val = (T)get_el_value(base, offset, el_size);
length_cur -= el_size;
count++;
is_empty = length_cur == 0;
}
return true;
}
public:
CeanReadRanges * ranges;
T val;
int el_size;
int64_t size,
offset,
length_cur;
bool is_empty;
int count;
char *base;
};
// ptr_data for persistent auto objects
PtrData* m_stack_ptr_data;
PtrDataList m_destroy_stack;
// Engine
Engine& m_device;
// true for offload_wait target(mic) stream(0)
bool m_wait_all_devices;
// if true offload is mandatory
bool m_is_mandatory;
// if true offload has openmp origin
const bool m_is_openmp;
// The Marshaller for the inputs of the offloaded region.
Marshaller m_in;
// The Marshaller for the outputs of the offloaded region.
Marshaller m_out;
// List of buffers that are passed to dispatch call
BufferList m_compute_buffers;
// List of buffers that need to be destroyed at the end of offload
BufferList m_destroy_buffers;
// Variable descriptors
VarDesc* m_vars;
VarExtra* m_vars_extra;
int m_vars_total;
// Pointer to a user-specified status variable
_Offload_status *m_status;
// Function descriptor
FunctionDescriptor* m_func_desc;
uint32_t m_func_desc_size;
// Buffer for transferring copyin/copyout data
COIBUFFER m_inout_buf;
// Dependencies
COIEVENT *m_in_deps;
uint32_t m_in_deps_total;
uint32_t m_in_deps_allocated;
COIEVENT *m_out_deps;
uint32_t m_out_deps_total;
uint32_t m_out_deps_allocated;
// Stream
_Offload_stream m_stream;
// Timer data
OffloadHostTimerData *m_timer_data;
// copyin/copyout data length
uint64_t m_in_datalen;
uint64_t m_out_datalen;
// a boolean value calculated in setup_descriptors. If true we need to do
// a run function on the target. Otherwise it may be optimized away.
bool m_need_runfunction;
// initialized value of m_need_runfunction;
// is used to recognize offload_transfer
bool m_initial_need_runfunction;
// a Boolean value set to true when OUT clauses with preallocated targetptr
// is encountered to indicate that call receive_pointer_data needs to be
// invoked again after call to scatter_copyout_data.
bool m_out_with_preallocated;
// a Boolean value set to true if an alloc_if(1) is used with preallocated
// targetptr to indicate the need to scatter_copyout_data even for
// async offload
bool m_preallocated_alloc;
// a Boolean value set to true if traceback routine is called
bool m_traceback_called;
OmpAsyncLastEventType m_omp_async_last_event_type;
};
// Initialization types for MIC
enum OffloadInitType {
c_init_on_start, // all devices before entering main
c_init_on_offload, // single device before starting the first offload
c_init_on_offload_all // all devices before starting the first offload
};
// Determines if MIC code is an executable or a shared library
extern "C" bool __offload_target_image_is_executable(const void *target_image);
// Initializes library and registers specified offload image.
extern "C" bool __offload_register_image(const void* image);
extern "C" void __offload_unregister_image(const void* image);
// Registers asynchronous task completion callback
extern "C" void __offload_register_task_callback(void (*cb)(void *));
// Initializes offload runtime library.
DLL_LOCAL extern int __offload_init_library(void);
// thread data for associating pipelines with threads
DLL_LOCAL extern pthread_key_t mic_thread_key;
// location of offload_main executable
// To be used if the main application has no offload and is not built
// with -offload but dynamic library linked in has offload pragma
DLL_LOCAL extern char* mic_device_main;
// Environment variables for devices
DLL_LOCAL extern MicEnvVar mic_env_vars;
// CPU frequency
DLL_LOCAL extern uint64_t cpu_frequency;
// LD_LIBRARY_PATH for MIC libraries
DLL_LOCAL extern char* mic_library_path;
// stack size for target
DLL_LOCAL extern uint32_t mic_stack_size;
// Preallocated memory size for buffers on MIC
DLL_LOCAL extern uint64_t mic_buffer_size;
// Preallocated 4K page memory size for buffers on MIC
DLL_LOCAL extern uint64_t mic_4k_buffer_size;
// Preallocated 2M page memory size for buffers on MIC
DLL_LOCAL extern uint64_t mic_2m_buffer_size;
// Setting controlling inout proxy
DLL_LOCAL extern bool mic_proxy_io;
DLL_LOCAL extern char* mic_proxy_fs_root;
// Threshold for creating buffers with large pages
DLL_LOCAL extern uint64_t __offload_use_2mb_buffers;
// offload initialization type
DLL_LOCAL extern OffloadInitType __offload_init_type;
// Device number to offload to when device is not explicitly specified.
DLL_LOCAL extern int __omp_device_num;
// target executable
DLL_LOCAL extern TargetImage* __target_exe;
// IDB support
// Called by the offload runtime after initialization of offload infrastructure
// has been completed.
extern "C" void __dbg_target_so_loaded();
// Called by the offload runtime when the offload infrastructure is about to be
// shut down, currently at application exit.
extern "C" void __dbg_target_so_unloaded();
// Null-terminated string containing path to the process image of the hosting
// application (offload_main)
#define MAX_TARGET_NAME 512
extern "C" char __dbg_target_exe_name[MAX_TARGET_NAME];
// Integer specifying the process id
extern "C" pid_t __dbg_target_so_pid;
// Integer specifying the 0-based device number
extern "C" int __dbg_target_id;
// Set to non-zero by the host-side debugger to enable offload debugging
// support
extern "C" int __dbg_is_attached;
// Major version of the debugger support API
extern "C" const int __dbg_api_major_version;
// Minor version of the debugger support API
extern "C" const int __dbg_api_minor_version;
#endif // OFFLOAD_HOST_H_INCLUDED