libgomp/
* plugin/plugin-gcn.c (parse_target_attributes): Automatically set
the number of teams and threads if necessary.
(gcn_exec): Automatically set the number of gangs and workers if
necessary.
Co-Authored-By: Andrew Stubbs <ams@codesourcery.com>
In the patch that implemented omp_get_device_num(), the libgomp offload image
symbol search stringified GOMP_DEVICE_NUM_VAR (the macro that expands to the
actual symbol name) with STRINGX().  STRINGX() stringifies its argument
without expanding it first, so the search looked for the macro's own name
rather than for the symbol name it expands to.
This patch fixes this by using XSTRING(), also from include/symcat.h, which
expands its argument before stringifying it.
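The difference between the two macros can be sketched as follows.  The
STRINGX and XSTRING definitions mirror include/symcat.h; the value given to
GOMP_DEVICE_NUM_VAR and the two helper functions are illustrative assumptions,
not code from the patch.

```c
/* Same two-level stringification scheme as include/symcat.h.  */
#define STRINGX(s) #s
#define XSTRING(s) STRINGX(s)

/* Assumption: stand-in for the definition in libgomp-plugin.h.  */
#define GOMP_DEVICE_NUM_VAR __gomp_device_num

/* Hypothetical helper: the name the buggy lookup searched for.  '#' is
   applied before GOMP_DEVICE_NUM_VAR is expanded, so the result is the
   macro's own name.  */
const char *
wrong_symbol_name (void)
{
  return STRINGX (GOMP_DEVICE_NUM_VAR);
}

/* Hypothetical helper: the name the fixed lookup searches for.  The extra
   macro layer expands GOMP_DEVICE_NUM_VAR before '#' is applied.  */
const char *
right_symbol_name (void)
{
  return XSTRING (GOMP_DEVICE_NUM_VAR);
}
```

Under these assumptions wrong_symbol_name() yields "GOMP_DEVICE_NUM_VAR",
which is never a symbol in the offload image, while right_symbol_name()
yields "__gomp_device_num".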
libgomp/ChangeLog:
* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
Up to now the libgomp GCN plugin has been finding the offload variables
by using a symbol lookup, but the AMD runtime requires that the symbols are
global for that to work. This was ensured by mkoffload as a post-processing
step, but the LLVM 13 assembler no longer accepts this in the case where the
variable was previously declared differently.
This patch switches to locating the symbols directly from the
offload_var_table, which means that only one symbol needs to be forced
global.
This change breaks the libgomp image compatibility, so GOMP_VERSION_GCN has
also been bumped.
gcc/ChangeLog:
* config/gcn/mkoffload.c (process_asm): Process the variable table
completely differently.
(process_obj): Encode the variable data differently.
include/ChangeLog:
* gomp-constants.h (GOMP_VERSION_GCN): Bump.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (struct gcn_image_desc): Remove global_variables.
(GOMP_OFFLOAD_load_image): Locate the offload variables via the
table, not individual symbols.
This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.
GOMP_DEVICE_NUM_VAR is a macro that defines the name of a "device number"
variable.  The variable is defined in the device-side libgomp, and its address
is returned to the host-side libgomp during device initialization; the host
libgomp then sets its value to the designated device number.
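The handshake described above can be sketched on the host alone, with an
ordinary variable standing in for device memory.  Every name below
(fake_device_num_var, fake_load_image, host_assign_device_num,
fake_device_omp_get_device_num) is hypothetical; only the idea of an extra
trailing table entry carrying the variable's address comes from this patch,
and the real code copies the value with a host-to-device transfer rather than
memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Start/end address pair, in the style of the plugin target table.  */
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

/* Hypothetical stand-in for the device-side "device number" variable.  */
static int fake_device_num_var = -1;

/* Hypothetical load-image hook: fills NUM regular entries, then appends one
   extra entry with the address range of the device-number variable.
   Returns the number of entries written.  */
static int
fake_load_image (struct addr_pair *table, int num)
{
  memset (table, 0, (size_t) num * sizeof *table);
  table[num].start = (uintptr_t) &fake_device_num_var;
  table[num].end = table[num].start + sizeof (int);
  return num + 1;
}

/* Host side: if the hook returned one entry more than expected, treat the
   last entry as the device-number variable and store DEVICE_NUM there.
   Returns 1 on success and 0 otherwise.  */
int
host_assign_device_num (int num, int device_num)
{
  struct addr_pair table[16];
  if (num < 0 || num >= 16)
    return 0;
  int returned = fake_load_image (table, num);
  if (returned != num + 1)
    return 0;
  struct addr_pair *last = &table[returned - 1];
  if (last->end - last->start != sizeof (int))
    return 0;
  memcpy ((void *) last->start, &device_num, sizeof device_num);
  return 1;
}

/* Device side: the routine only has to read the variable.  */
int
fake_device_omp_get_device_num (void)
{
  return fake_device_num_var;
}
```

Before the host assigns a number the simulated device routine returns the
initial value -1; after host_assign_device_num (3, 2) it returns 2.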
libgomp/ChangeLog:
* icv-device.c (omp_get_device_num): New API function, host side.
* fortran.c (omp_get_device_num_): New interface function.
* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
omp_get_device_num_.
* libgomp.texi (omp_get_device_num): Add documentation for new API
function.
* omp.h.in (omp_get_device_num): Add declaration.
* omp_lib.f90.in (omp_get_device_num): Likewise.
* omp_lib.h.in (omp_get_device_num): Likewise.
* target.c (gomp_load_image_to_device): If additional entry for device
number exists at end of returned entries from 'load_image_func' hook,
copy the assigned device number over to the device variable.
* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-gcn.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
(omp_get_device_num): New API function, device side.
* plugin/plugin-nvptx.c ("symcat.h"): Add include.
(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
at end of returned 'target_table' entries.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_target_intelmic): New function for
testing for intelmic offloading.
* testsuite/libgomp.c-c++-common/target-45.c: New test.
* testsuite/libgomp.fortran/target10.f90: New test.
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.
An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html
and previous versions of this patch were posted here (for mainline/og9):
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html
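The ephemeral-source problem and the snapshot approach can be sketched with a
small deferred-copy queue on the host.  The queue, struct pending_copy,
queue_copy and drain_queue are hypothetical stand-ins, not libgomp functions;
only the idea of an "ephemeral" flag that triggers a snapshot plus a deferred
free is taken from this patch.

```c
#include <stdlib.h>
#include <string.h>

/* One deferred copy.  TO_FREE is the snapshot buffer, or NULL.  */
struct pending_copy
{
  void *dst;
  const void *src;
  size_t len;
  void *to_free;
};

static struct pending_copy queue[8];
static int queue_len;

/* Queue a copy that runs later.  If EPHEMERAL is nonzero the source may
   disappear before the copy runs, so snapshot it now and arrange for the
   snapshot to be freed after the copy.  Returns 1 on success.  */
int
queue_copy (void *dst, const void *src, size_t len, int ephemeral)
{
  void *snapshot = NULL;
  if (queue_len == 8)
    return 0;
  if (ephemeral)
    {
      snapshot = malloc (len);
      if (snapshot == NULL)
        return 0;
      memcpy (snapshot, src, len);
      src = snapshot;
    }
  queue[queue_len].dst = dst;
  queue[queue_len].src = src;
  queue[queue_len].len = len;
  queue[queue_len].to_free = snapshot;
  queue_len++;
  return 1;
}

/* Run all deferred copies, then release any snapshot buffers.  */
void
drain_queue (void)
{
  for (int i = 0; i < queue_len; i++)
    {
      memcpy (queue[i].dst, queue[i].src, queue[i].len);
      free (queue[i].to_free);
    }
  queue_len = 0;
}
```

With the flag set, a later change to the source does not affect the value
that eventually arrives at the destination.  Without it, the copy reads
whatever the source holds when the queue is drained, which is only safe for
data that outlives the transfer.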
libgomp/
* libgomp.h (gomp_copy_host2dev): Update prototype.
* oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new
argument to gomp_copy_host2dev (false).
* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
(copy_data): Don't free src.
(queue_push_copy): Remove free_src handling.
(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data
snapshotting.
(GOMP_OFFLOAD_openacc_async_dev2host): Update call to
queue_push_copy.
* target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter.
(gomp_copy_host2dev): Add EPHEMERAL parameter. Snapshot source
data when true, and set up deferred freeing of temporary buffer.
(gomp_copy_dev2host): Update call to goacc_device_copy_async.
(gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer)
(gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update
calls to gomp_copy_host2dev with appropriate ephemeral argument.
* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove
XFAIL.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
... which currently has *not* been forced to 'num_workers (1)'.
In addition to the testcases modified here, this also fixes:
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/mode-transitions.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test
[Etc.]
mode-transitions.exe: [...]/libgomp.oacc-c-c++-common/mode-transitions.c:702: t17: Assertion `arr_b[i] == (i ^ 31) * 8' failed.
libgomp/
* plugin/plugin-gcn.c (gcn_exec): Force 'num_workers (1)'
unconditionally.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
Update.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.
libgomp/
* plugin/plugin-gcn.c (init_environment_variables): Don't prepend
the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
* config.h.in: Regenerate.
* configure: Likewise.
These are the same header files that exist in the Radeon Open Compute Runtime
project (as of October 2020), but they have been specially relicensed by AMD
for use in GCC.
The header files retain AMD copyright.
include/ChangeLog:
* hsa.h: Replace whole file.
* hsa_ext_amd.h: New file.
* hsa_ext_image.h: New file.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Include hsa_ext_amd.h.
(HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT): Delete redundant definition.
Ensure the code will continue to compile when elf.h gets these definitions.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Don't redefine relocations if elf.h has them.
(reserved): Delete unused define.
This upgrades the compiler to emit HSA Code Object v3 binaries. This means
changing the assembler directives, and linker command line options.
The gcn-run and libgomp loaders need corresponding alterations. The
relocations no longer need to be fixed up manually, and the kernel symbol
names have changed slightly.
This move makes the binaries compatible with the new rocgdb from ROCm 3.5.
2020-06-17 Andrew Stubbs <ams@codesourcery.com>
gcc/
* config/gcn/gcn-hsa.h (TEXT_SECTION_ASM_OP): Use ".text".
(BSS_SECTION_ASM_OP): Use ".bss".
(ASM_SPEC): Remove "-mattr=-code-object-v3".
(LINK_SPEC): Add "--export-dynamic".
* config/gcn/gcn-opts.h (processor_type): Replace PROCESSOR_VEGA with
PROCESSOR_VEGA10 and PROCESSOR_VEGA20.
* config/gcn/gcn-run.c (HSA_RUNTIME_LIB): Use ".so.1" variant.
(load_image): Remove obsolete relocation handling.
Add ".kd" suffix to the symbol names.
* config/gcn/gcn.c (MAX_NORMAL_SGPR_COUNT): Set to 62.
(gcn_option_override): Update gcn_isa test.
(gcn_kernel_arg_types): Update all the assembler directives.
Remove the obsolete options.
(gcn_conditional_register_usage): Update MAX_NORMAL_SGPR_COUNT usage.
(gcn_omp_device_kind_arch_isa): Handle PROCESSOR_VEGA10 and
PROCESSOR_VEGA20.
(output_file_start): Rework assembler file header.
(gcn_hsa_declare_function_name): Rework kernel metadata.
* config/gcn/gcn.h (GCN_KERNEL_ARG_TYPES): Set to 16.
* config/gcn/gcn.opt (PROCESSOR_VEGA): Remove enum.
(PROCESSOR_VEGA10): New enum value.
(PROCESSOR_VEGA20): New enum value.
libgomp/
* plugin/plugin-gcn.c (init_environment_variables): Use ".so.1"
variant for HSA_RUNTIME_LIB name.
(find_executable_symbol_1): Delete.
(find_executable_symbol): Delete.
(init_kernel_properties): Add ".kd" suffix to symbol names.
(find_load_offset): Delete.
(create_and_finalize_hsa_program): Remove relocation handling.
Ensure that the returned status values are not ignored. The old code was
not broken, but this is both safer and satisfies static analysis.
2020-04-23 Andrew Stubbs <ams@codesourcery.com>
PR other/94629
libgomp/
* plugin/plugin-gcn.c (init_hsa_context): Check return value from
hsa_iterate_agents.
(GOMP_OFFLOAD_init_device): Check return values from both calls to
hsa_agent_iterate_regions.
2020-01-31 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count
to definition of hsa_kernel_description. Parse assembly to find SGPR
and VGPR count of kernel and store in hsa_kernel_description.
libgomp/
* plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count
and vgpr_count fields.
(struct kernel_info): Add a field for a hsa_kernel_description.
(run_kernel): Reduce the number of threads/workers if the requested
number would require too many VGPRs.
(init_basic_kernel_info): Initialize description field with
the hsa_kernel_description entry for the kernel.
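The kind of limit that run_kernel applies might look like the sketch below.
The constants (a budget of 256 VGPRs, an allocation granule of 4 registers, a
factor of 4 for the number of SIMD units) and both function names are
assumptions made for illustration; the ChangeLog above does not state them,
and the plugin's actual arithmetic may differ.

```c
/* Assumed hardware parameters; not taken from the ChangeLog.  */
#define ASSUMED_VGPR_BUDGET 256
#define ASSUMED_VGPR_GRANULE 4
#define ASSUMED_SIMD_UNITS 4

/* Largest worker count that fits the register budget for a kernel using
   VGPR_COUNT registers (assumed to be at most the budget), or -1 if the
   count is unknown.  */
int
max_workers_for_vgprs (int vgpr_count)
{
  if (vgpr_count <= 0)
    return -1;
  /* Round the register count up to the allocation granule.  */
  int granulated = (vgpr_count + ASSUMED_VGPR_GRANULE - 1)
                   & ~(ASSUMED_VGPR_GRANULE - 1);
  return (ASSUMED_VGPR_BUDGET / granulated) * ASSUMED_SIMD_UNITS;
}

/* Reduce REQUESTED workers only when a limit is known and exceeded.  */
int
clamp_workers (int requested, int vgpr_count)
{
  int limit = max_workers_for_vgprs (vgpr_count);
  if (limit > 0 && requested > limit)
    return limit;
  return requested;
}
```

Under these assumptions a kernel using 65 VGPRs is rounded up to 68, which
gives a limit of 12 workers, so a request for 16 is reduced to 12; a kernel
using 24 VGPRs leaves a request for 16 unchanged.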
Add full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin.
libgomp/
* plugin-gcn.c (struct agent_info): Add fields "name" and
"vendor_name" ...
(GOMP_OFFLOAD_init_device): ... and init from here.
(struct hsa_context_info): Add field "driver_version_s" ...
(init_hsa_context): ... and init from here.
(GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper
implementation.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
Enable test execution for amdgcn and host offloading targets.
* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
(expect_device_properties): Split function into ...
(expect_device_string_properties): ... this new function ...
(expect_device_memory): ... and this new function.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c:
Add test.
The HSA/ROCm runtime rejects binaries not built for the exact GPU device
present. So far, the libgomp amdgcn plugin does not verify that the GPU ISA
and the ISA specified at compile time match before handing over the binary to
the runtime. In case of a mismatch, the user is confronted with an unhelpful
runtime error.
This commit implements a runtime ISA check. In case of an ISA mismatch, the
execution is aborted with a clear error message and a hint at the correct
compilation parameters for the GPU on which the execution has been attempted.
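A check of this kind can be sketched as below.  The mask and machine codes are
the EF_AMDGPU_MACH values of the AMDGPU ELF header flags as best recalled
here, so treat them as assumptions to verify against the ELF definitions; the
function names are hypothetical and are not the ones listed in the ChangeLog
that follows.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values of the AMDGPU machine field in the ELF e_flags word.  */
#define SKETCH_EF_AMDGPU_MACH_MASK 0x000000ffu

enum sketch_amdgpu_mach
{
  SKETCH_MACH_GFX801 = 0x028,
  SKETCH_MACH_GFX803 = 0x02a,
  SKETCH_MACH_GFX900 = 0x02c,
  SKETCH_MACH_GFX906 = 0x02f
};

/* Extract the ISA field from the e_flags of a code object.  */
uint32_t
sketch_elf_isa_field (uint32_t e_flags)
{
  return e_flags & SKETCH_EF_AMDGPU_MACH_MASK;
}

/* Map an ISA code to a printable name, or NULL if it is unknown.  */
const char *
sketch_isa_name (uint32_t isa)
{
  switch (isa)
    {
    case SKETCH_MACH_GFX801: return "gfx801";
    case SKETCH_MACH_GFX803: return "gfx803";
    case SKETCH_MACH_GFX900: return "gfx900";
    case SKETCH_MACH_GFX906: return "gfx906";
    default: return NULL;
    }
}

/* Nonzero if a code object with E_FLAGS was built for DEVICE_ISA.  */
int
sketch_isa_matches (uint32_t e_flags, uint32_t device_isa)
{
  return sketch_elf_isa_field (e_flags) == device_isa;
}
```

A caller would compare the image's field with the agent's ISA before loading
and, on a mismatch, report both names so that the user can see which ISA to
compile for.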
libgomp/
* plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum.
* (EF_AMDGPU_MACH_MASK): New constant.
* (gcn_isa): New typedef.
* (gcn_gfx801_s): New constant.
* (gcn_gfx803_s): New constant.
* (gcn_gfx900_s): New constant.
* (gcn_gfx906_s): New constant.
* (gcn_isa_name_len): New constant.
* (elf_gcn_isa_field): New function.
* (isa_hsa_name): New function.
* (isa_gcc_name): New function.
* (isa_code): New function.
* (struct agent_info): Add field "device_isa" and remove field
"gfx900_p".
* (GOMP_OFFLOAD_init_device): Adapt agent init to "agent_info"
field changes, fail if device has unknown ISA.
* (parse_target_attributes): Replace "gfx900_p" by "device_isa".
* (isa_matches_agent): New function ...
* (create_and_finalize_hsa_program): ... used from here to check
that the GPU ISA and the code-object ISA match.
Add generic support for the OpenACC 2.6 `acc_get_property' and
`acc_get_property_string' routines, as well as full handlers for the
host and the NVPTX offload targets and minimal handlers for the HSA,
Intel MIC, and AMD GCN offload targets.
Included are C/C++ and Fortran tests that, in particular, print
the property values for acc_property_vendor, acc_property_memory,
acc_property_free_memory, acc_property_name, and acc_property_driver.
The output looks as follows:
Vendor: GNU
Name: GOMP
Total memory: 0
Free memory: 0
Driver: 1.0
with the host driver (where the memory related properties are not
supported for the host device and yield 0, conforming to the standard)
and output like:
Vendor: Nvidia
Total memory: 12651462656
Free memory: 12202737664
Name: TITAN V
Driver: CUDA Driver 9.1
with the NVPTX driver.
2019-12-22 Maciej W. Rozycki <macro@codesourcery.com>
Frederik Harwath <frederik@codesourcery.com>
Thomas Schwinge <tschwinge@codesourcery.com>
include/
* gomp-constants.h (gomp_device_property): New enum.
libgomp/
* libgomp.h (gomp_device_descr): Add `get_property_func' member.
* libgomp-plugin.h (gomp_device_property_value): New union.
(gomp_device_property_value): New prototype.
* openacc.h (acc_device_t): Add `acc_device_current' enumeration
constant.
(acc_device_property_t): New enum.
(acc_get_property, acc_get_property_string): New prototypes.
* oacc-init.c (acc_get_device_type): Also assert that result
is not `acc_device_current'.
(get_property_any, acc_get_property, acc_get_property_string):
New functions.
* openacc.f90 (openacc_kinds): Add `acc_device_current' and
`acc_property_memory', `acc_property_free_memory',
`acc_property_name', `acc_property_vendor' and
`acc_property_driver' constants. Add `acc_device_property' data
type.
(openacc_internal): Add `acc_get_property' and
`acc_get_property_string' interfaces. Add `acc_get_property_h',
`acc_get_property_string_h', `acc_get_property_l' and
`acc_get_property_string_l'.
* oacc-host.c (host_get_property): New function.
(host_dispatch): Wire it.
* target.c (gomp_load_plugin_for_device): Handle `get_property'.
* libgomp.map (OACC_2.6): Add `acc_get_property', `acc_get_property_h_',
`acc_get_property_string' and `acc_get_property_string_h_' symbols.
* libgomp.texi (OpenACC Runtime Library Routines): Add
`acc_get_property'.
(acc_get_property): New node.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_property): New
function (stub).
* plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): New function.
* plugin/plugin-nvptx.c (CUDA_CALLS): Add `cuDeviceGetName',
`cuDeviceTotalMem', `cuDriverGetVersion' and `cuMemGetInfo'
calls.
(GOMP_OFFLOAD_get_property): New function.
(struct ptx_device): Add new field "name".
(cuda_driver_version_s): Add new static variable ...
(nvptx_init): ... and init from here.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: New test.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c: New file
with test helper functions.
* testsuite/libgomp.oacc-fortran/acc_get_property.f90: New test.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property):
New function.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Frederik Harwath <frederik@codesourcery.com>
Co-Authored-By: Thomas Schwinge <tschwinge@codesourcery.com>
From-SVN: r279710