2017-05-23 11:16:05 +02:00
|
|
|
/* OpenACC parallelism dimensions clauses: num_gangs, num_workers,
|
|
|
|
vector_length. */
|
|
|
|
|
Add OpenACC 2.6 `serial' construct support
The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.
The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.
Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends. In particular, this allows to distinguish between compute
constructs in warnings, error messages, dumps etc.
2019-11-12 Maciej W. Rozycki <macro@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
Frederik Harwath <frederik@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant. Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.
* doc/generic.texi (OpenACC): Document OACC_SERIAL.
gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.
gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.
gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL. Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.
gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
(gfc_ascii_statement): Likewise. Handle ST_OACC_END_SERIAL_LOOP
and ST_OACC_END_SERIAL.
(parse_oacc_structured_block): Handle ST_OACC_SERIAL.
(parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_END_SERIAL_LOOP.
(parse_executable): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_SERIAL.
(is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Likewise.
* st.c (gfc_free_statement): Likewise.
* trans-openmp.c (gfc_trans_oacc_construct): Handle
EXEC_OACC_SERIAL.
(gfc_trans_oacc_combined_directive): Handle
EXEC_OACC_SERIAL_LOOP.
(gfc_trans_oacc_directive): Handle EXEC_OACC_SERIAL_LOOP and
EXEC_OACC_SERIAL.
* trans.c (trans_code): Likewise.
gcc/testsuite/
* c-c++-common/goacc/parallel-dims.c: New test.
* gfortran.dg/goacc/parallel-dims.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-aux.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims.f89: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-2.f90: New test.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Frederik Harwath <frederik@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
From-SVN: r278082
2019-11-12 09:45:35 +01:00
|
|
|
/* See also '../libgomp.oacc-fortran/parallel-dims.f90'. */
|
|
|
|
|
2017-05-23 11:16:05 +02:00
|
|
|
#include <limits.h>
|
|
|
|
#include <openacc.h>
|
2018-05-02 19:53:29 +02:00
|
|
|
#include <gomp-constants.h>
|
2017-05-23 11:16:05 +02:00
|
|
|
|
|
|
|
/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper
|
|
|
|
not behaving as expected for -O0. */
|
|
|
|
#pragma acc routine seq
|
|
|
|
static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
|
|
|
|
{
|
|
|
|
if (acc_on_device ((int) acc_device_host))
|
|
|
|
return 0;
|
|
|
|
else if (acc_on_device ((int) acc_device_nvidia))
|
2018-05-02 19:53:29 +02:00
|
|
|
return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
|
2017-05-23 11:16:05 +02:00
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
#pragma acc routine seq
|
|
|
|
static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
|
|
|
|
{
|
|
|
|
if (acc_on_device ((int) acc_device_host))
|
|
|
|
return 0;
|
|
|
|
else if (acc_on_device ((int) acc_device_nvidia))
|
2018-05-02 19:53:29 +02:00
|
|
|
return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
|
2017-05-23 11:16:05 +02:00
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
#pragma acc routine seq
|
|
|
|
static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
|
|
|
|
{
|
|
|
|
if (acc_on_device ((int) acc_device_host))
|
|
|
|
return 0;
|
|
|
|
else if (acc_on_device ((int) acc_device_nvidia))
|
2018-05-02 19:53:29 +02:00
|
|
|
return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
|
2017-05-23 11:16:05 +02:00
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
2015-11-04 21:48:05 +01:00
|
|
|
|
|
|
|
|
|
|
|
int main ()
|
|
|
|
{
|
2017-05-23 11:16:05 +02:00
|
|
|
acc_init (acc_device_default);
|
|
|
|
|
Add OpenACC 2.6 `serial' construct support
The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.
The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.
Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends. In particular, this allows to distinguish between compute
constructs in warnings, error messages, dumps etc.
2019-11-12 Maciej W. Rozycki <macro@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
Frederik Harwath <frederik@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant. Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.
* doc/generic.texi (OpenACC): Document OACC_SERIAL.
gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.
gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.
gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL. Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.
gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
(gfc_ascii_statement): Likewise. Handle ST_OACC_END_SERIAL_LOOP
and ST_OACC_END_SERIAL.
(parse_oacc_structured_block): Handle ST_OACC_SERIAL.
(parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_END_SERIAL_LOOP.
(parse_executable): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_SERIAL.
(is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Likewise.
* st.c (gfc_free_statement): Likewise.
* trans-openmp.c (gfc_trans_oacc_construct): Handle
EXEC_OACC_SERIAL.
(gfc_trans_oacc_combined_directive): Handle
EXEC_OACC_SERIAL_LOOP.
(gfc_trans_oacc_directive): Handle EXEC_OACC_SERIAL_LOOP and
EXEC_OACC_SERIAL.
* trans.c (trans_code): Likewise.
gcc/testsuite/
* c-c++-common/goacc/parallel-dims.c: New test.
* gfortran.dg/goacc/parallel-dims.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-aux.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims.f89: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-2.f90: New test.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Frederik Harwath <frederik@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
From-SVN: r278082
2019-11-12 09:45:35 +01:00
|
|
|
/* OpenACC parallel construct. */
|
|
|
|
|
2017-05-23 11:16:05 +02:00
|
|
|
/* Non-positive value. */
|
|
|
|
|
|
|
|
/* GR, WS, VS. */
|
|
|
|
{
|
|
|
|
#define GANGS 0 /* { dg-warning "'num_gangs' value must be positive" "" { target c } } */
|
|
|
|
int gangs_actual = GANGS;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc parallel copy (gangs_actual) \
|
|
|
|
reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max) \
|
|
|
|
num_gangs (GANGS) /* { dg-warning "'num_gangs' value must be positive" "" { target c++ } } */
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_gangs (1). */
|
|
|
|
gangs_actual = 1;
|
|
|
|
for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
|
|
|
|
{
|
|
|
|
/* <https://gcc.gnu.org/PR80547>. */
|
|
|
|
#if 0
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
#else
|
|
|
|
int gangs = acc_gang ();
|
|
|
|
gangs_min = (gangs_min < gangs) ? gangs_min : gangs;
|
|
|
|
gangs_max = (gangs_max > gangs) ? gangs_max : gangs;
|
|
|
|
int workers = acc_worker ();
|
|
|
|
workers_min = (workers_min < workers) ? workers_min : workers;
|
|
|
|
workers_max = (workers_max > workers) ? workers_max : workers;
|
|
|
|
int vectors = acc_vector ();
|
|
|
|
vectors_min = (vectors_min < vectors) ? vectors_min : vectors;
|
|
|
|
vectors_max = (vectors_max > vectors) ? vectors_max : vectors;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_actual != 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != gangs_actual - 1
|
|
|
|
|| workers_min != 0 || workers_max != 0
|
|
|
|
|| vectors_min != 0 || vectors_max != 0)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef GANGS
|
|
|
|
}
|
|
|
|
|
|
|
|
/* GP, WS, VS. */
|
|
|
|
{
|
|
|
|
#define GANGS 0 /* { dg-warning "'num_gangs' value must be positive" "" { target c } } */
|
|
|
|
int gangs_actual = GANGS;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc parallel copy (gangs_actual) \
|
|
|
|
num_gangs (GANGS) /* { dg-warning "'num_gangs' value must be positive" "" { target c++ } } */
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_gangs (1). */
|
|
|
|
gangs_actual = 1;
|
|
|
|
#pragma acc loop gang reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_actual != 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != gangs_actual - 1
|
|
|
|
|| workers_min != 0 || workers_max != 0
|
|
|
|
|| vectors_min != 0 || vectors_max != 0)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef GANGS
|
|
|
|
}
|
|
|
|
|
|
|
|
/* GR, WP, VS. */
|
|
|
|
{
|
|
|
|
#define WORKERS 0 /* { dg-warning "'num_workers' value must be positive" "" { target c } } */
|
|
|
|
int workers_actual = WORKERS;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc parallel copy (workers_actual) \
|
|
|
|
num_workers (WORKERS) /* { dg-warning "'num_workers' value must be positive" "" { target c++ } } */
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_workers (1). */
|
|
|
|
workers_actual = 1;
|
|
|
|
#pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * workers_actual; i > -100 * workers_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (workers_actual != 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != 0
|
|
|
|
|| workers_min != 0 || workers_max != workers_actual - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != 0)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef WORKERS
|
|
|
|
}
|
|
|
|
|
|
|
|
/* GR, WS, VP. */
|
|
|
|
{
|
|
|
|
#define VECTORS 0 /* { dg-warning "'vector_length' value must be positive" "" { target c } } */
|
|
|
|
int vectors_actual = VECTORS;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
2019-02-22 11:51:35 +01:00
|
|
|
#pragma acc parallel copy (vectors_actual) /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */ \
|
2017-05-23 11:16:05 +02:00
|
|
|
vector_length (VECTORS) /* { dg-warning "'vector_length' value must be positive" "" { target c++ } } */
|
|
|
|
{
|
|
|
|
/* We're actually executing with vector_length (1), just the GCC nvptx
|
|
|
|
back end enforces vector_length (32). */
|
|
|
|
if (acc_on_device (acc_device_nvidia))
|
|
|
|
vectors_actual = 32;
|
|
|
|
else
|
|
|
|
vectors_actual = 1;
|
|
|
|
#pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * vectors_actual; i > -100 * vectors_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (acc_get_device_type () == acc_device_nvidia)
|
|
|
|
{
|
|
|
|
if (vectors_actual != 32)
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
else
|
|
|
|
if (vectors_actual != 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != 0
|
|
|
|
|| workers_min != 0 || workers_max != 0
|
|
|
|
|| vectors_min != 0 || vectors_max != vectors_actual - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef VECTORS
|
|
|
|
}
|
|
|
|
|
2016-04-08 23:09:47 +02:00
|
|
|
|
2017-05-23 11:16:05 +02:00
|
|
|
/* High value. */
|
|
|
|
|
|
|
|
/* GR, WS, VS. */
|
2015-11-04 21:48:05 +01:00
|
|
|
{
|
2017-05-23 11:16:05 +02:00
|
|
|
/* There is no actual limit for the number of gangs, so we try with a
|
|
|
|
rather high value. */
|
|
|
|
int gangs = 12345;
|
|
|
|
int gangs_actual = gangs;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc parallel copy (gangs_actual) \
|
|
|
|
reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max) \
|
|
|
|
num_gangs (gangs)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_host))
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_gangs (1). */
|
|
|
|
gangs_actual = 1;
|
|
|
|
}
|
|
|
|
/* As we're executing GR not GP, don't multiply with a "gangs_actual"
|
|
|
|
factor. */
|
|
|
|
for (int i = 100 /* * gangs_actual */; i > -100 /* * gangs_actual */; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_actual < 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != gangs_actual - 1
|
|
|
|
|| workers_min != 0 || workers_max != 0
|
|
|
|
|| vectors_min != 0 || vectors_max != 0)
|
|
|
|
__builtin_abort ();
|
2015-11-04 21:48:05 +01:00
|
|
|
}
|
|
|
|
|
2017-05-23 11:16:05 +02:00
|
|
|
/* GP, WS, VS. */
|
2015-11-04 21:48:05 +01:00
|
|
|
{
|
2017-05-23 11:16:05 +02:00
|
|
|
/* There is no actual limit for the number of gangs, so we try with a
|
|
|
|
rather high value. */
|
|
|
|
int gangs = 12345;
|
|
|
|
int gangs_actual = gangs;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc parallel copy (gangs_actual) \
|
|
|
|
num_gangs (gangs)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_host))
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_gangs (1). */
|
|
|
|
gangs_actual = 1;
|
|
|
|
}
|
|
|
|
#pragma acc loop gang reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_actual < 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != gangs_actual - 1
|
|
|
|
|| workers_min != 0 || workers_max != 0
|
|
|
|
|| vectors_min != 0 || vectors_max != 0)
|
|
|
|
__builtin_abort ();
|
2015-11-04 21:48:05 +01:00
|
|
|
}
|
|
|
|
|
2017-05-23 11:16:05 +02:00
|
|
|
/* GR, WP, VS. */
|
|
|
|
{
|
|
|
|
/* We try with an outrageously large value. */
|
|
|
|
#define WORKERS 2 << 20
|
|
|
|
int workers_actual = WORKERS;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
2019-02-22 11:51:35 +01:00
|
|
|
#pragma acc parallel copy (workers_actual) /* { dg-warning "using num_workers \\(32\\), ignoring 2097152" "" { target openacc_nvidia_accel_selected } } */ \
|
2017-05-23 11:16:05 +02:00
|
|
|
num_workers (WORKERS)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_host))
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_workers (1). */
|
|
|
|
workers_actual = 1;
|
|
|
|
}
|
|
|
|
else if (acc_on_device (acc_device_nvidia))
|
|
|
|
{
|
|
|
|
/* The GCC nvptx back end enforces num_workers (32). */
|
|
|
|
workers_actual = 32;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
#pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * workers_actual; i > -100 * workers_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (workers_actual < 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != 0
|
|
|
|
|| workers_min != 0 || workers_max != workers_actual - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != 0)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef WORKERS
|
|
|
|
}
|
|
|
|
|
|
|
|
/* GR, WP, VS. */
|
|
|
|
{
|
|
|
|
/* We try with an outrageously large value. */
|
|
|
|
int workers = 2 << 20;
|
|
|
|
/* For nvptx offloading, this one will not result in "using num_workers
|
|
|
|
(32), ignoring runtime setting", and will in fact try to launch with
|
|
|
|
"num_workers (workers)", which will run into "libgomp: cuLaunchKernel
|
|
|
|
error: invalid argument". So, limit ourselves here. */
|
|
|
|
if (acc_get_device_type () == acc_device_nvidia)
|
|
|
|
workers = 32;
|
|
|
|
int workers_actual = workers;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc parallel copy (workers_actual) \
|
|
|
|
num_workers (workers)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_host))
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_workers (1). */
|
|
|
|
workers_actual = 1;
|
|
|
|
}
|
|
|
|
else if (acc_on_device (acc_device_nvidia))
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_workers (32). */
|
|
|
|
/* workers_actual = 32; */
|
|
|
|
}
|
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
#pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * workers_actual; i > -100 * workers_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (workers_actual < 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != 0
|
|
|
|
|| workers_min != 0 || workers_max != workers_actual - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != 0)
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
/* GR, WS, VP. */
|
|
|
|
{
|
|
|
|
/* We try with an outrageously large value. */
|
|
|
|
#define VECTORS 2 << 20
|
|
|
|
int vectors_actual = VECTORS;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
2019-02-22 11:51:35 +01:00
|
|
|
#pragma acc parallel copy (vectors_actual) /* { dg-warning "using vector_length \\(1024\\), ignoring 2097152" "" { target openacc_nvidia_accel_selected } } */ \
|
2017-05-23 11:16:05 +02:00
|
|
|
vector_length (VECTORS)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_host))
|
|
|
|
{
|
|
|
|
/* We're actually executing with vector_length (1). */
|
|
|
|
vectors_actual = 1;
|
|
|
|
}
|
|
|
|
else if (acc_on_device (acc_device_nvidia))
|
|
|
|
{
|
|
|
|
/* The GCC nvptx back end enforces vector_length (32). */
|
2019-01-12 23:17:42 +01:00
|
|
|
vectors_actual = 1024;
|
2017-05-23 11:16:05 +02:00
|
|
|
}
|
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
#pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * vectors_actual; i > -100 * vectors_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (vectors_actual < 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != 0
|
|
|
|
|| workers_min != 0 || workers_max != 0
|
|
|
|
|| vectors_min != 0 || vectors_max != vectors_actual - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef VECTORS
|
|
|
|
}
|
|
|
|
|
|
|
|
/* GR, WS, VP. */
|
|
|
|
{
|
|
|
|
/* We try with an outrageously large value. */
|
|
|
|
int vectors = 2 << 20;
|
|
|
|
int vectors_actual = vectors;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
2019-02-22 11:51:35 +01:00
|
|
|
#pragma acc parallel copy (vectors_actual) /* { dg-warning "using vector_length \\(32\\), ignoring runtime setting" "" { target openacc_nvidia_accel_selected } } */ \
|
2017-05-23 11:16:05 +02:00
|
|
|
vector_length (vectors)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_host))
|
|
|
|
{
|
|
|
|
/* We're actually executing with vector_length (1). */
|
|
|
|
vectors_actual = 1;
|
|
|
|
}
|
|
|
|
else if (acc_on_device (acc_device_nvidia))
|
|
|
|
{
|
|
|
|
/* The GCC nvptx back end enforces vector_length (32). */
|
|
|
|
vectors_actual = 32;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
#pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * vectors_actual; i > -100 * vectors_actual; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (vectors_actual < 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != 0
|
|
|
|
|| workers_min != 0 || workers_max != 0
|
|
|
|
|| vectors_min != 0 || vectors_max != vectors_actual - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/* Composition of GP, WP, VP. */
|
|
|
|
{
|
|
|
|
int gangs = 12345;
|
|
|
|
/* With nvptx offloading, multi-level reductions apparently are very slow
|
|
|
|
in the following case. So, limit ourselves here. */
|
|
|
|
if (acc_get_device_type () == acc_device_nvidia)
|
|
|
|
gangs = 3;
|
|
|
|
int gangs_actual = gangs;
|
|
|
|
#define WORKERS 3
|
|
|
|
int workers_actual = WORKERS;
|
|
|
|
#define VECTORS 11
|
|
|
|
int vectors_actual = VECTORS;
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
2019-02-22 11:51:35 +01:00
|
|
|
#pragma acc parallel copy (gangs_actual, workers_actual, vectors_actual) /* { dg-warning "using vector_length \\(32\\), ignoring 11" "" { target openacc_nvidia_accel_selected } } */ \
|
2017-05-23 11:16:05 +02:00
|
|
|
num_gangs (gangs) \
|
|
|
|
num_workers (WORKERS) \
|
|
|
|
vector_length (VECTORS)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_host))
|
|
|
|
{
|
|
|
|
/* We're actually executing with num_gangs (1), num_workers (1),
|
|
|
|
vector_length (1). */
|
|
|
|
gangs_actual = 1;
|
|
|
|
workers_actual = 1;
|
|
|
|
vectors_actual = 1;
|
|
|
|
}
|
|
|
|
else if (acc_on_device (acc_device_nvidia))
|
|
|
|
{
|
|
|
|
/* The GCC nvptx back end enforces vector_length (32). */
|
|
|
|
vectors_actual = 32;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
__builtin_abort ();
|
|
|
|
#pragma acc loop gang reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
|
|
|
|
#pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int j = 100 * workers_actual; j > -100 * workers_actual; --j)
|
|
|
|
#pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int k = 100 * vectors_actual; k > -100 * vectors_actual; --k)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_min != 0 || gangs_max != gangs_actual - 1
|
|
|
|
|| workers_min != 0 || workers_max != workers_actual - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != vectors_actual - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef VECTORS
|
|
|
|
#undef WORKERS
|
|
|
|
}
|
|
|
|
|
|
|
|
|
Add OpenACC 2.6 `serial' construct support
The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.
The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.
Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends. In particular, this allows to distinguish between compute
constructs in warnings, error messages, dumps etc.
2019-11-12 Maciej W. Rozycki <macro@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
Frederik Harwath <frederik@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant. Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.
* doc/generic.texi (OpenACC): Document OACC_SERIAL.
gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.
gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.
gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL. Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.
gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
(gfc_ascii_statement): Likewise. Handle ST_OACC_END_SERIAL_LOOP
and ST_OACC_END_SERIAL.
(parse_oacc_structured_block): Handle ST_OACC_SERIAL.
(parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_END_SERIAL_LOOP.
(parse_executable): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_SERIAL.
(is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Likewise.
* st.c (gfc_free_statement): Likewise.
* trans-openmp.c (gfc_trans_oacc_construct): Handle
EXEC_OACC_SERIAL.
(gfc_trans_oacc_combined_directive): Handle
EXEC_OACC_SERIAL_LOOP.
(gfc_trans_oacc_directive): Handle EXEC_OACC_SERIAL_LOOP and
EXEC_OACC_SERIAL.
* trans.c (trans_code): Likewise.
gcc/testsuite/
* c-c++-common/goacc/parallel-dims.c: New test.
* gfortran.dg/goacc/parallel-dims.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-aux.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims.f89: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-2.f90: New test.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Frederik Harwath <frederik@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
From-SVN: r278082
2019-11-12 09:45:35 +01:00
|
|
|
/* OpenACC kernels construct. */
|
|
|
|
|
2017-05-23 11:16:05 +02:00
|
|
|
/* We can't test parallelized OpenACC kernels constructs in this way: use of
|
|
|
|
the acc_gang, acc_worker, acc_vector functions will make the construct
|
|
|
|
unparallelizable. */
|
|
|
|
|
|
|
|
|
|
|
|
/* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1
|
|
|
|
kernels. */
|
|
|
|
{
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc kernels
|
|
|
|
{
|
|
|
|
/* This is to make the OpenACC kernels construct unparallelizable. */
|
|
|
|
asm volatile ("" : : : "memory");
|
|
|
|
|
|
|
|
#pragma acc loop reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100; i > -100; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_min != 0 || gangs_max != 1 - 1
|
|
|
|
|| workers_min != 0 || workers_max != 1 - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != 1 - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2017-05-23 17:47:32 +02:00
|
|
|
/* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1
|
|
|
|
kernels even when there are explicit num_gangs, num_workers, or
|
|
|
|
vector_length clauses. */
|
|
|
|
{
|
|
|
|
int gangs = 5;
|
|
|
|
#define WORKERS 5
|
|
|
|
#define VECTORS 13
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc kernels \
|
|
|
|
num_gangs (gangs) \
|
|
|
|
num_workers (WORKERS) \
|
|
|
|
vector_length (VECTORS)
|
|
|
|
{
|
|
|
|
/* This is to make the OpenACC kernels construct unparallelizable. */
|
|
|
|
asm volatile ("" : : : "memory");
|
|
|
|
|
|
|
|
#pragma acc loop reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100; i > -100; --i)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_min != 0 || gangs_max != 1 - 1
|
|
|
|
|| workers_min != 0 || workers_max != 1 - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != 1 - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
#undef VECTORS
|
|
|
|
#undef WORKERS
|
|
|
|
}
|
|
|
|
|
|
|
|
|
Add OpenACC 2.6 `serial' construct support
The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.
The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.
Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends. In particular, this allows to distinguish between compute
constructs in warnings, error messages, dumps etc.
2019-11-12 Maciej W. Rozycki <macro@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
Frederik Harwath <frederik@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant. Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.
* doc/generic.texi (OpenACC): Document OACC_SERIAL.
gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.
gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.
gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL. Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.
gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
(gfc_ascii_statement): Likewise. Handle ST_OACC_END_SERIAL_LOOP
and ST_OACC_END_SERIAL.
(parse_oacc_structured_block): Handle ST_OACC_SERIAL.
(parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_END_SERIAL_LOOP.
(parse_executable): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_SERIAL.
(is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Likewise.
* st.c (gfc_free_statement): Likewise.
* trans-openmp.c (gfc_trans_oacc_construct): Handle
EXEC_OACC_SERIAL.
(gfc_trans_oacc_combined_directive): Handle
EXEC_OACC_SERIAL_LOOP.
(gfc_trans_oacc_directive): Handle EXEC_OACC_SERIAL_LOOP and
EXEC_OACC_SERIAL.
* trans.c (trans_code): Likewise.
gcc/testsuite/
* c-c++-common/goacc/parallel-dims.c: New test.
* gfortran.dg/goacc/parallel-dims.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-aux.c: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims.f89: New test.
* testsuite/libgomp.oacc-fortran/parallel-dims-2.f90: New test.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Frederik Harwath <frederik@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
From-SVN: r278082
2019-11-12 09:45:35 +01:00
|
|
|
/* OpenACC serial construct. */
|
|
|
|
|
|
|
|
/* GR, WS, VS. */
|
|
|
|
{
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */ \
|
|
|
|
reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
{
|
|
|
|
for (int i = 100; i > -100; i--)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (gangs_min != 0 || gangs_max != 1 - 1
|
|
|
|
|| workers_min != 0 || workers_max != 1 - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != 1 - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Composition of GP, WP, VP. */
|
|
|
|
{
|
|
|
|
int vectors_actual = 1; /* Implicit 'vector_length (1)' clause. */
|
|
|
|
int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max;
|
|
|
|
gangs_min = workers_min = vectors_min = INT_MAX;
|
|
|
|
gangs_max = workers_max = vectors_max = INT_MIN;
|
|
|
|
#pragma acc serial copy (vectors_actual) /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */ \
|
|
|
|
copy (gangs_min, gangs_max, workers_min, workers_max, vectors_min, vectors_max)
|
|
|
|
{
|
|
|
|
if (acc_on_device (acc_device_nvidia))
|
|
|
|
{
|
|
|
|
/* The GCC nvptx back end enforces vector_length (32). */
|
|
|
|
/* It's unclear if that's actually permissible here;
|
|
|
|
<https://github.com/OpenACC/openacc-spec/issues/238> "OpenACC
|
|
|
|
'serial' construct might not actually be serial". */
|
|
|
|
vectors_actual = 32;
|
|
|
|
}
|
|
|
|
#pragma acc loop gang reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int i = 100; i > -100; i--)
|
|
|
|
#pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int j = 100; j > -100; j--)
|
|
|
|
#pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
|
|
|
|
for (int k = 100 * vectors_actual; k > -100 * vectors_actual; k--)
|
|
|
|
{
|
|
|
|
gangs_min = gangs_max = acc_gang ();
|
|
|
|
workers_min = workers_max = acc_worker ();
|
|
|
|
vectors_min = vectors_max = acc_vector ();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (acc_get_device_type () == acc_device_nvidia)
|
|
|
|
{
|
|
|
|
if (vectors_actual != 32)
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
else
|
|
|
|
if (vectors_actual != 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
if (gangs_min != 0 || gangs_max != 1 - 1
|
|
|
|
|| workers_min != 0 || workers_max != 1 - 1
|
|
|
|
|| vectors_min != 0 || vectors_max != vectors_actual - 1)
|
|
|
|
__builtin_abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2015-11-04 21:48:05 +01:00
|
|
|
return 0;
|
|
|
|
}
|