PR middle-end/91301
* gimplify.c (gimplify_omp_for): If for class iterator on
distribute parallel for there is no data sharing clause
on inner_for_stmt, look for private clause on combined
parallel too and if found, move it to inner_for_stmt.
* testsuite/libgomp.c++/for-27.C: New test.
From-SVN: r273922
2019-07-23 Steven G. Kargl <kargl@gcc.gnu.org>
* arith.c (gfc_convert_integer, gfc_convert_real, gfc_convert_complex):
Move to ...
* primary.c (convert_integer, convert_real, convert_complex): ... here.
Rename and make static functions.
(match_integer_constant): Use convert_integer.
(match_real_constant): Use convert_real.
(match_complex_constant): Use convert_complex.
* arith.h (gfc_convert_integer, gfc_convert_real, gfc_convert_complex):
Remove prototypes.
* array.c (match_array_cons_element): A BOZ cannot be a data
statement value. Jump to a common exit point.
* check.c (gfc_invalid_boz): New function. Emit error or warning
for a BOZ in an invalid context.
(boz_args_check): Move to top of file to prevent need of forward
declaration.
(is_boz_constant): New function. Check that BOZ expr is constant.
(gfc_boz2real): New function. In-place conversion of BOZ literal
constant to REAL in accordance with F2018.
(gfc_boz2int): New function. In-place conversion of BOZ literal
constant to INTEGER in accordance with F2018.
(gfc_check_achar, gfc_check_char, gfc_check_float): Use gfc_invalid_boz.
Convert BOZ as needed.
(gfc_check_bge_bgt_ble_blt): Enforce F2018 requirements on BGE,
BGT, BLE, and BLT intrinsic functions.
(gfc_check_cmplx): Re-organize to check kind, if present, first.
Convert BOZ real and/or imaginary parts as needed in accordance with
F2018.
(gfc_check_complex): Use gfc_invalid_boz. Convert BOZ as needed.
(gfc_check_dcmplx, gfc_check_dble): Convert BOZ as needed.
(gfc_check_dshift): Make dshift[lr] conform to F2018 standard.
(gfc_check_iand_ieor_ior): Make IAND, IEOR, and IOR conform to
F2018 standard.
(gfc_check_int): Conform to F2018 standard.
(gfc_check_intconv): Deprecate SHORT and LONG aliases for INT2 and
INT. Simply return for a BOZ argument. See gfc_simplify_intconv.
(gfc_check_merge_bits): Make MERGE_BITS conform to Fortran 2018
standard.
(gfc_check_real): Remove incorrect comment. Check kind, if present,
first. Simply return for a BOZ argument. See gfc_simplify_real.
(gfc_check_and): Re-do error handling for BOZ arguments. Remove
special casing ts.type != BT_INTEGER or BT_LOGICAL.
* decl.c (match_old_style_init): Check for BOZ in old-style
initialization. Issue error or warning depending on
-fallow-invalid-boz option. Issue error if variable is not an
INTEGER or REAL and the value is BOZ.
* expr.c (gfc_copy_expr): Copy a BT_BOZ gfc_expr.
(gfc_check_assign): Re-do error handling for a BOZ in an assignment
statement. Do in-place conversion of RHS based on LHS type of
INTEGER or REAL.
* gfortran.h (gfc_expr): Add a boz component. Remove is_boz component.
(gfc_boz2int, gfc_boz2real, gfc_invalid_boz): New prototypes.
* interface.c (gfc_extend_assign): Guard against replacing an
intrinsic involving a BOZ literal constant on RHS.
* invoke.texi: Document -fallow-invalid-boz.
* lang.opt: New option. -fallow-invalid-boz.
* libgfortran.h (bt): Elevate BOZ to a basic type.
* misc.c (gfc_basic_typename, gfc_typename): Translate BT_BOZ to BOZ.
* primary.c (convert_integer, convert_real, convert_complex): to here.
Rename and make static functions.
* primary.c (match_boz_constant): Rewrite parsing of a BOZ. Re-do
error handling. Deprecate 'X' for hexadecimal and postfix notation.
Use -fallow-invalid-boz and gfc_invalid_boz to accept deprecated code.
* resolve.c (resolve_ordinary_assign): Rework a RHS that is a
BOZ literal constant. Use gfc_invalid_boz to allow previous
nonstandard behavior. Remove range checking of BOZ conversion.
* simplify.c (convert_boz): Remove function.
(simplify_cmplx): Remove conversion of BOZ constants, because
conversion is done in gfc_check_cmplx.
(gfc_simplify_float): Remove conversion of BOZ constant, because
conversion is done in gfc_check_float.
(simplify_intconv): Use gfc_boz2int to convert BOZ to INTEGER.
Remove range checking for BOZ conversion.
(gfc_simplify_real): Use k, if present, to determine kind. Convert
BOZ to REAL. Remove range checking for BOZ conversion.
* target-memory.c (gfc_convert_boz): Rewrite to deal with conversion of
a BOZ to a REAL value.
2019-07-23 Steven G. Kargl <kargl@gcc.gnu.org>
* gfortran.dg/achar_5.f90: Fix for new BOZ handling.
* gfortran.dg/arithmetic_overflow_1.f90: Ditto.
* gfortran.dg/boz_11.f90: Ditto.
* gfortran.dg/boz_12.f90: Ditto.
* gfortran.dg/boz_4.f90: Ditto.
* gfortran.dg/boz_5.f90: Ditto.
* gfortran.dg/boz_6.f90: Ditto.
* gfortran.dg/boz_7.f90: Ditto.
* gfortran.dg/boz_8.f90: Ditto.
* gfortran.dg/dec_structure_6.f90: Ditto.
* gfortran.dg/dec_union_1.f90: Ditto.
* gfortran.dg/dec_union_2.f90: Ditto.
* gfortran.dg/dec_union_5.f90: Ditto.
* gfortran.dg/dshift_3.f90: Ditto.
* gfortran.dg/gnu_logical_2.f90: Ditto.
* gfortran.dg/int_conv_1.f90: Ditto.
* gfortran.dg/ishft_1.f90: Ditto.
* gfortran.dg/nan_4.f90: Ditto.
* gfortran.dg/no_range_check_3.f90: Ditto.
* gfortran.dg/pr16433.f: Ditto.
* gfortran.dg/pr44491.f90: Ditto.
* gfortran.dg/pr58027.f90: Ditto.
* gfortran.dg/pr81509_2.f90: Ditto.
* gfortran.dg/unf_io_convert_1.f90: Ditto.
* gfortran.dg/unf_io_convert_2.f90: Ditto.
* gfortran.fortran-torture/execute/intrinsic_fraction_exponent.f90:
Ditto.
* gfortran.fortran-torture/execute/intrinsic_mvbits.f90: Ditto.
* gfortran.fortran-torture/execute/intrinsic_nearest.f90: Ditto.
* gfortran.fortran-torture/execute/seq_io.f90: Ditto.
* gfortran.dg/gnu_logical_1.F: Delete test.
* gfortran.dg/merge_bits_3.f90: New test.
* gfortran.dg/merge_bits_3.f90: Ditto.
* gfortran.dg/boz_int.f90: Ditto.
* gfortran.dg/boz_bge.f90: Ditto.
* gfortran.dg/boz_complex_1.f90: Ditto.
* gfortran.dg/boz_complex_2.f90: Ditto.
* gfortran.dg/boz_complex_3.f90: Ditto.
* gfortran.dg/boz_dble.f90: Ditto.
* gfortran.dg/boz_dshift_1.f90: Ditto.
* gfortran.dg/boz_dshift_2.f90: Ditto.
* gfortran.dg/boz_float_1.f90: Ditto.
* gfortran.dg/boz_float_2.f90: Ditto.
* gfortran.dg/boz_float_3.f90: Ditto.
* gfortran.dg/boz_iand_1.f90: Ditto.
* gfortran.dg/boz_iand_2.f90: Ditto.
2019-07-23 Steven G. Kargl <kargl@gcc.gnu.org>
* testsuite/libgomp.fortran/reduction4.f90: Update BOZ usage.
* testsuite/libgomp.fortran/reduction5.f90: Ditto.
From-SVN: r273747
* tree-vect-stmts.c (scan_operand_equal_p): Look through MEM_REF
with SSA_NAME address of POINTER_PLUS_EXPR. Handle MULT_EXPR
and casts in offset when different, both through gimple stmts
and through trees. Rewritten using loops to minimize code duplication
for each operand.
* g++.dg/vect/simd-6.cc: Replace xfail with target x86.
* g++.dg/vect/simd-9.cc: Likewise.
* testsuite/libgomp.c++/scan-13.C: Replace xfail with target x86.
* testsuite/libgomp.c++/scan-16.C: Likewise.
From-SVN: r273249
* omp-low.c (lower_rec_input_clauses): For lastprivate clauses in
ctx->for_simd_scan_phase simd copy the outer var to the privatized
variable(s). For conditional lastprivate look through outer
GIMPLE_OMP_SCAN context.
(lower_omp_1): For conditional lastprivate look through outer
GIMPLE_OMP_SCAN context.
* testsuite/libgomp.c/scan-19.c: New test.
* testsuite/libgomp.c/scan-20.c: New test.
From-SVN: r273169
* omp-low.c (struct omp_context): Add for_simd_scan_phase member.
(maybe_lookup_ctx): Add forward declaration.
(omp_find_scan): Likewise. Walk into body of simd if composited
with worksharing loop.
(scan_omp_simd_scan): New function.
(scan_omp_1_stmt): Call it.
(lower_rec_simd_input_clauses): Don't create rvar nor rvar2 if
ctx->for_simd_scan_phase.
(lower_rec_input_clauses): Do much less work for inscan reductions
in ctx->for_simd_scan_phase is_simd regions.
(lower_omp_scan): Set is_simd also on simd constructs composited
with worksharing loop, unless ctx->for_simd_scan_phase. Never emit
a sorry message. Don't change GIMPLE_OMP_SCAN stmts into nops and
emit their body after in simd constructs composited with worksharing
loop.
(lower_omp_for_scan): Handle worksharing loop composited with simd.
* c-c++-common/gomp/scan-4.c: Don't expect sorry message.
* testsuite/libgomp.c/scan-11.c: New test.
* testsuite/libgomp.c/scan-12.c: New test.
* testsuite/libgomp.c/scan-13.c: New test.
* testsuite/libgomp.c/scan-14.c: New test.
* testsuite/libgomp.c/scan-15.c: New test.
* testsuite/libgomp.c/scan-16.c: New test.
* testsuite/libgomp.c/scan-17.c: New test.
* testsuite/libgomp.c/scan-18.c: New test.
* testsuite/libgomp.c++/scan-9.C: New test.
* testsuite/libgomp.c++/scan-10.C: New test.
* testsuite/libgomp.c++/scan-11.C: New test.
* testsuite/libgomp.c++/scan-12.C: New test.
* testsuite/libgomp.c++/scan-13.C: New test.
* testsuite/libgomp.c++/scan-14.C: New test.
* testsuite/libgomp.c++/scan-15.C: New test.
* testsuite/libgomp.c++/scan-16.C: New test.
From-SVN: r273157
* omp-expand.c (expand_omp_for_static_nochunk): Don't emit
GOMP_loop_start at the start of second worksharing loop in a scan.
For nowait, don't emit GOMP_loop_end_nowait at the end of first
worksharing loop in a scan even if there are conditional lastprivates,
and do emit GOMP_loop_end_nowait at the end of second worksharing loop.
* testsuite/libgomp.c/scan-9.c: New test.
* testsuite/libgomp.c/scan-10.c: New test.
From-SVN: r273095
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SCANTEMP_
clause.
* tree.h (OMP_CLAUSE_DECL): Use OMP_CLAUSE__SCANTEMP_ instead of
OMP_CLAUSE__CONDTEMP_ as range's upper bound.
(OMP_CLAUSE__SCANTEMP__ALLOC, OMP_CLAUSE__SCANTEMP__CONTROL): Define.
* tree.c (omp_clause_num_ops, omp_clause_code_name): Add
OMP_CLAUSE__SCANTEMP_ entry.
(walk_tree_1): Handle OMP_CLAUSE__SCANTEMP_.
* tree-pretty-print.c (dump_omp_clause): Likewise.
* tree-nested.c (convert_nonlocal_omp_clauses,
convert_local_omp_clauses): Likewise.
* omp-general.h (struct omp_for_data): Add have_scantemp and
have_nonctrl_scantemp members.
* omp-general.c (omp_extract_for_data): Initialize them.
* omp-low.c (struct omp_context): Add scan_exclusive member.
(scan_omp_1_stmt): Don't unnecessarily mask gimple_omp_for_kind
result again with GF_OMP_FOR_KIND_MASK. Initialize also
ctx->scan_exclusive.
(lower_rec_simd_input_clauses): Use ctx->scan_exclusive instead
of !ctx->scan_inclusive.
(lower_rec_input_clauses): Simplify gimplification of dtors using
gimplify_and_add. For non-is_simd test OMP_CLAUSE_REDUCTION_INSCAN
rather than rvarp. Handle OMP_CLAUSE_REDUCTION_INSCAN in worksharing
loops. Don't add barrier for reduction_omp_orig_ref if
ctx->scan_??xclusive.
(lower_reduction_clauses): Don't do anything for ctx->scan_??xclusive.
(lower_omp_scan): Use ctx->scan_exclusive instead
of !ctx->scan_inclusive. Handle worksharing loops with inscan
reductions. Use new_vard != new_var instead of repeated
omp_is_reference calls.
(omp_find_scan, lower_omp_for_scan): New functions.
(lower_omp_for): Call lower_omp_for_scan for worksharing loops with
inscan reductions.
* omp-expand.c (expand_omp_scantemp_alloc): New function.
(expand_omp_for_static_nochunk): Handle fd->have_nonctrl_scantemp
and fd->have_scantemp.
* c-c++-common/gomp/scan-3.c (f1): Don't expect a sorry message.
* c-c++-common/gomp/scan-5.c (foo): Likewise.
* testsuite/libgomp.c++/scan-1.C: New test.
* testsuite/libgomp.c++/scan-2.C: New test.
* testsuite/libgomp.c++/scan-3.C: New test.
* testsuite/libgomp.c++/scan-4.C: New test.
* testsuite/libgomp.c++/scan-5.C: New test.
* testsuite/libgomp.c++/scan-6.C: New test.
* testsuite/libgomp.c++/scan-7.C: New test.
* testsuite/libgomp.c++/scan-8.C: New test.
* testsuite/libgomp.c/scan-1.c: New test.
* testsuite/libgomp.c/scan-2.c: New test.
* testsuite/libgomp.c/scan-3.c: New test.
* testsuite/libgomp.c/scan-4.c: New test.
* testsuite/libgomp.c/scan-5.c: New test.
* testsuite/libgomp.c/scan-6.c: New test.
* testsuite/libgomp.c/scan-7.c: New test.
* testsuite/libgomp.c/scan-8.c: New test.
From-SVN: r272958
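As a rough illustration of the kind of code this entry makes compilable, here is a minimal sketch of an inclusive prefix sum written with an inscan reduction on a worksharing loop. The function name prefix_sum is hypothetical and not taken from the new tests; the sketch assumes a compiler with OpenMP 5.0 scan support and -fopenmp. Without OpenMP the pragmas are ignored and the loop computes the same result sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: store in B[i] the sum of A[0..i].  */
static void
prefix_sum (const int *a, int *b, int n)
{
  int r = 0;

  assert (a != NULL && b != NULL);
#pragma omp parallel for reduction (inscan, +:r)
  for (int i = 0; i < n; i++)
    {
      /* Input phase: accumulate into the inscan reduction variable.  */
      r += a[i];
#pragma omp scan inclusive (r)
      /* Scan phase: R holds the sum of iterations 0..i here.  */
      b[i] = r;
    }
}
```

For an exclusive scan the two phases swap places around an exclusive (r) clause, so that b[i] is read before a[i] is added.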
Test what OpenMP 5.0 has to say on this topic. And, do the same for OpenACC.
libgomp/
PR fortran/90743
* oacc-parallel.c (GOACC_parallel_keyed): Handle NULL mapping
case.
* testsuite/libgomp.fortran/target-allocatable-1-1.f90: New file.
* testsuite/libgomp.fortran/target-allocatable-1-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/allocatable-1-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/allocatable-1-2.f90: Likewise.
From-SVN: r272447
When compiling the test-case with r268755, we run into a SIGSEGV in
oacc_entry_exit_ok_1 when trying to dereference a NULL red:
...
  struct reduction_info *red;
  red = reduction_phi (reduction_list, use_stmt);
  tree val = PHI_RESULT (red->keep_res);
...
Fix this by handling red == NULL.
Bootstrapped and reg-tested on x86_64.
Built and reg-tested on x86_64 with nvptx accelerator.
2019-06-16 Tom de Vries <tdevries@suse.de>
PR tree-optimization/89376
* tree-parloops.c (oacc_entry_exit_ok_1): Handle red == NULL.
* testsuite/libgomp.oacc-c-c++-common/pr89376.c: New test.
From-SVN: r272338
After the fix for "PR tree-optimization/89713 - Assume loop with an exit is
finite" (r272234), empty oacc loops are removed before expand.
Update pr85381-{2,4}.c accordingly.
2019-06-15 Tom de Vries <tdevries@suse.de>
PR tree-optimization/89713
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Expect no bar.sync.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Same.
From-SVN: r272324
If we compile the openacc testcase with -fopenacc -O2, we run into a SIGSEGV
or assert. The root cause for this is that pass_thread_jumps breaks the
invariant that OACC_FORK and OACC_JOIN mark the start and end of a
single-entry-single-exit region.
Fix this by bailing out when encountering an IFN_UNIQUE in
thread_jumps::profitable_jump_thread_path.
Bootstrapped and reg-tested on x86_64.
Built and reg-tested libgomp on x86_64 with nvptx accelerator.
2019-06-15 Tom de Vries <tdevries@suse.de>
PR tree-optimization/90009
* tree-ssa-threadbackward.c (thread_jumps::profitable_jump_thread_path):
Return NULL if bb contains IFN_UNIQUE.
* testsuite/libgomp.oacc-c-c++-common/pr90009.c: New test.
From-SVN: r272321
* omp-low.c (lower_rec_input_clauses): For lastprivate conditional
references, look up the MEM_REF operand in the hash map instead of the
MEM_REF itself.
(lower_omp_1): When looking for lastprivate conditional assignments,
handle MEM_REFs with REFERENCE_TYPE operands.
* testsuite/libgomp.c++/lastprivate-conditional-1.C: New test.
* testsuite/libgomp.c++/lastprivate-conditional-2.C: New test.
From-SVN: r271948
* gimplify.c (gimplify_scan_omp_clauses): Don't sorry_at on lastprivate
conditional on combined for simd.
* omp-low.c (struct omp_context): Add combined_into_simd_safelen0
member.
(lower_rec_input_clauses): For gimple_omp_for_combined_into_p max_vf 1
constructs, don't remove lastprivate_conditional_map, but instead set
ctx->combined_into_simd_safelen0 and adjust hash_map, so that it points
to parent construct temporaries.
(lower_lastprivate_clauses): Handle ctx->combined_into_simd_safelen0
like !ctx->lastprivate_conditional_map.
(lower_omp_1) <case GIMPLE_ASSIGN>: If up->combined_into_simd_safelen0,
use up->outer context instead of up.
* omp-expand.c (expand_omp_for_generic): Perform cond_var bump even if
gimple_omp_for_combined_p.
(expand_omp_for_static_nochunk): Likewise.
(expand_omp_for_static_chunk): Add forgotten cond_var bump that was
probably moved over into expand_omp_for_generic rather than being copied
there.
gcc/cp/
* cp-tree.h (CP_OMP_CLAUSE_INFO): Allow for any clauses up to _condvar_
instead of only up to linear.
gcc/testsuite/
* c-c++-common/gomp/lastprivate-conditional-2.c (foo): Don't expect
a sorry_at on any of the clauses.
libgomp/
* testsuite/libgomp.c-c++-common/lastprivate-conditional-7.c: New test.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-8.c: New test.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-9.c: New test.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-10.c: New test.
From-SVN: r271907
* gimplify.c (struct gimplify_omp_ctx): Add clauses member.
(gimplify_scan_omp_clauses): Initialize ctx->clauses.
(gimplify_adjust_omp_clauses_1): Transform lastprivate conditional
explicit clause on combined parallel into implicit shared clause.
(gimplify_adjust_omp_clauses): Move lastprivate conditional clause
and firstprivate if the decl has one too from combined parallel to
the worksharing construct.
gcc/testsuite/
* c-c++-common/gomp/lastprivate-conditional-2.c (foo): Don't expect
sorry on lastprivate conditional on parallel for.
* c-c++-common/gomp/lastprivate-conditional-3.c (foo): Add tests for
lastprivate conditional warnings on parallel for constructs.
* c-c++-common/gomp/lastprivate-conditional-4.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/lastprivate_conditional_4.c: Rename
to ...
* testsuite/libgomp.c-c++-common/lastprivate-conditional-4.c: ... this.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-5.c: New test.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-6.c: New test.
From-SVN: r271733
* omp-low.c (lower_omp_1) <case GIMPLE_ASSIGN>: Look through ordered,
critical, taskgroup and section regions when looking for a region
with non-NULL lastprivate_conditional_map.
* testsuite/libgomp.c-c++-common/lastprivate-conditional-3.c: New test.
From-SVN: r271672
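For readers unfamiliar with the clause these entries deal with, the following is a minimal sketch of lastprivate with the conditional modifier on a combined parallel for. The function name last_match is hypothetical and does not come from the testsuite; the sketch assumes an OpenMP 5.0 compiler, and without -fopenmp the pragma is ignored and the loop gives the same answer sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: return the highest index I with A[I] == KEY.
   This sketch assumes KEY occurs at least once in A.  */
static int
last_match (const int *a, int n, int key)
{
  int x = -1;

  assert (a != NULL);
  /* With the conditional modifier, X receives the value assigned in the
     sequentially last iteration that actually assigned it.  */
#pragma omp parallel for lastprivate (conditional: x)
  for (int i = 0; i < n; i++)
    if (a[i] == key)
      x = i;
  return x;
}
```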
PR libgomp/90641
* work.c (gomp_init_work_share): Instead of aligning the final ordered
value to a multiple of long long alignment, align the first part
(ordered team ids) to it and, if inline_ordered_team_ids is not on a
long long alignment boundary within the structure, always use a pad
size of __alignof__ (long long) - 1.
* loop.c (GOMP_loop_start): Fix *mem computation if
inline_ordered_team_ids is not aligned on long long alignment boundary
within the structure.
* loop-ull.c (GOMP_loop_ull_start): Likewise.
* sections.c (GOMP_sections2_start): Likewise.
From-SVN: r271671
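The padding arithmetic described in this entry can be sketched as follows. This is only an illustration under stated assumptions, not the libgomp code: the helper names round_up, padded_size and extra_address are hypothetical, and the alignment is passed in as a parameter instead of being taken from __alignof__ (long long).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round N up to the next multiple of ALIGN, which must be a power of two.  */
static uintptr_t
round_up (uintptr_t n, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

/* Bytes to reserve for FIRST bytes followed by EXTRA bytes that must start
   on an ALIGN boundary, when the start of the buffer may itself be
   misaligned.  The worst-case padding is ALIGN - 1.  */
static size_t
padded_size (size_t first, size_t extra, size_t align)
{
  return first + (align - 1) + extra;
}

/* Address of the aligned EXTRA area in a buffer that starts at BASE.  */
static uintptr_t
extra_address (uintptr_t base, size_t first, size_t align)
{
  return round_up (base + first, align);
}
```

Reserving the worst-case pad up front means the consumer can round the address up at the point of use without running past the end of the buffer.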
PR libgomp/90585
* plugin/plugin-hsa.c: Include gstdint.h. Include inttypes.h only if
HAVE_INTTYPES_H is defined.
(print_uint64_t): New typedef.
(PRIu64): Define if HAVE_INTTYPES_H is not defined.
(print_kernel_dispatch, run_kernel): Use PRIu64 macro instead of
"lu", cast uint64_t HSA_DEBUG and fprintf arguments to print_uint64_t.
(release_kernel_dispatch): Likewise. Cast shadow->debug to uintptr_t
before casting to void *.
* plugin/plugin-nvptx.c: Include gstdint.h instead of stdint.h.
* oacc-mem.c: Don't include config.h nor stdint.h.
* target.c: Don't include config.h.
* oacc-cuda.c: Likewise.
* oacc-host.c: Don't include stdint.h.
From-SVN: r271597
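A minimal sketch of the portability pattern this entry describes: print a uint64_t through a PRIu64 format macro and a cast to a matching print type. The fallback definitions below (the "llu" format and unsigned long long) and the helper name format_u64 are assumptions made for this sketch and may differ from what plugin-hsa.c actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifdef HAVE_INTTYPES_H
# include <inttypes.h>
typedef uint64_t print_uint64_t;
#else
/* Fallback assumed for this sketch: unsigned long long is at least
   64 bits wide, so it can hold any uint64_t value.  */
# undef PRIu64
# define PRIu64 "llu"
typedef unsigned long long print_uint64_t;
#endif

/* Format VALUE as decimal into BUF; returns what snprintf returns.  */
static int
format_u64 (char *buf, size_t len, uint64_t value)
{
  assert (buf != NULL && len > 0);
  /* The cast keeps the argument type in step with the format macro.  */
  return snprintf (buf, len, "%" PRIu64, (print_uint64_t) value);
}
```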
... to avoid compilation overhead, and to keep simple '-foffload=[...]'
handling in test cases.
libgomp/
* testsuite/libgomp.oacc-c++/c++.exp: Specify
"-foffload=$offload_target".
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
* testsuite/lib/libgomp.exp
(check_effective_target_openacc_nvidia_accel_configured): Remove,
as (conceptually) merged into
check_effective_target_openacc_nvidia_accel_selected. Adjust all
users.
From-SVN: r269109
PR middle-end/89002
* gimplify.c (gimplify_omp_for): When adding OMP_CLAUSE_*_GIMPLE_SEQ
for lastprivate/linear IV, push gimplify context around gimplify_assign
and, if it needed any temporaries, pop it into a gimple bind around the
sequence.
* testsuite/libgomp.c/pr89002.c: New test.
From-SVN: r268346
PR middle-end/89002
* gimplify.c (gimplify_omp_for): When adding OMP_CLAUSE_*_GIMPLE_SEQ
for lastprivate/linear IV, push gimplify context around gimplify_assign
and, if it needed any temporaries, pop it into a gimple bind around the
sequence.
* testsuite/libgomp.c/pr89002.c: New test.
From-SVN: r268345
I wrote a test-case:
...
#include <openacc.h>

int
main (void)
{
  for (unsigned i = 0; i < 128; ++i)
    {
      acc_init (acc_device_nvidia);
      acc_shutdown (acc_device_nvidia);
    }
  return 0;
}
...
and ran it under valgrind. The only leak location reported with a frequency
of 128 was the allocation of ptx_devices in nvptx_init.
Fix this by freeing ptx_devices in GOMP_OFFLOAD_fini_device, once
instantiated_devices drops to 0.
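The fix follows a common pattern: a table shared by all devices is released when the last user goes away. Below is a simplified model of that pattern, not the plugin code. The names device_table, instantiated, model_init_device and model_fini_device are hypothetical stand-ins, and the model allocates lazily in the init function, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

static void **device_table;              /* models ptx_devices */
static const size_t device_count = 4;    /* hypothetical device count */
static int instantiated;                 /* models instantiated_devices */

/* Allocate the shared table on first use and count the new user.  */
static void
model_init_device (void)
{
  if (device_table == NULL)
    {
      device_table = calloc (device_count, sizeof *device_table);
      if (device_table == NULL)
        abort ();
    }
  instantiated++;
}

/* Drop one user; free the shared table once nobody is left.  */
static void
model_fini_device (void)
{
  assert (instantiated > 0);
  if (--instantiated == 0)
    {
      free (device_table);
      device_table = NULL;
    }
}
```

Resetting the pointer to NULL after the free lets a later init cycle allocate a fresh table, which is what a repeated init/shutdown loop needs.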
2019-01-24 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_fini_device): Free ptx_devices
once instantiated_devices drops to 0.
From-SVN: r268237
Consider test-case:
...
int
main (void)
{
#pragma acc parallel async
  ;
#pragma acc parallel async
  ;
#pragma acc wait
  return 0;
}
...
This fails with:
...
libgomp: cuMemAlloc error: invalid argument
Segmentation fault (core dumped)
...
The cuMemAlloc error is due to the fact that we're trying to allocate 0 bytes.
Fix this by preventing calling map_push with size zero argument in nvptx_exec.
This also has the consequence that for the abort-1.c test-case, we end up
calling cuMemFree during map_fini for the struct cuda_map allocated in
map_init, which fails because an abort happened. Fix this by calling
cuMemFree with CUDA_CALL_NOCHECK in cuda_map_destroy.
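The first part of the fix amounts to a guard in front of the allocation. Here is a small model of that guard with hypothetical stand-ins (model_map_push, model_exec, push_calls) in place of the real map_push and nvptx_exec; the size computation is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>

static int push_calls;   /* counts calls to the stand-in allocator */

/* Stand-in for map_push: rejects a zero-byte request, as the device
   allocator does in the failure described above.  */
static int
model_map_push (size_t size)
{
  push_calls++;
  return size == 0 ? -1 : 0;
}

/* Stand-in for the launch path: reserve space for the argument block
   only when there is at least one mapping.  */
static int
model_exec (size_t mapnum)
{
  if (mapnum > 0)
    return model_map_push (mapnum * sizeof (void *));
  return 0;
}
```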
2019-01-23 Tom de Vries <tdevries@suse.de>
PR target/88946
* plugin/plugin-nvptx.c (cuda_map_destroy): Use CUDA_CALL_NOCHECK for
cuMemFree.
(nvptx_exec): Don't call map_push if mapnum == 0.
* testsuite/libgomp.oacc-c-c++-common/pr88946.c: New test.
From-SVN: r268178
There are currently two situations where this assert triggers:
...
libgomp/plugin/plugin-nvptx.c: map_fini: Assertion `!s->map->active' failed.
...
First, in abort-1.c, a parallel region triggering an abort:
...
#include <stdlib.h>

int
main (void)
{
#pragma acc parallel
  abort ();
  return 0;
}
...
The abort is detected in nvptx_exec as the CUDA_ERROR_ILLEGAL_INSTRUCTION
return status of the cuStreamSynchronize call after kernel launch, which is
then handled by calling non-returning function GOMP_PLUGIN_fatal.
Consequently, the map_pop in nvptx_exec that in case of cuStreamSynchronize
success would remove or deactivate the element added by the map_push earlier in
nvptx_exec, does not trigger. With the element no longer in use, but still
marked active and a member of s->map, we run into the assert during
GOMP_OFFLOAD_fini_device, which is triggered from atexit handler
gomp_target_fini (which is triggered by the GOMP_PLUGIN_fatal mentioned above
calling exit).
Second, in pr88941.c, an async parallel region without wait:
...
int
main (void)
{
#pragma acc parallel async
  ;
  /* no #pragma acc wait */
  return 0;
}
...
Because nvptx_exec is handling an async region, it does not call map_pop for
the element added by map_push, but schedules a kernel execution completion
event to call map_pop. Again, we run into the assert during
GOMP_OFFLOAD_fini_device, which is triggered from atexit handler
gomp_target_fini, but the exit in this case is triggered by returning from main.
So either the kernel is still running, or the kernel has completed but the
corresponding event that is supposed to call map_pop is stuck in the event
queue, waiting for an event_gc.
Fix this by removing the assert, and skipping the freeing of device memory if
the map is still marked active (though in the async case, this is more a
workaround than a fix).
2019-01-23 Tom de Vries <tdevries@suse.de>
PR target/88941
PR target/88939
* plugin/plugin-nvptx.c (cuda_map_destroy): Handle map->active case.
(map_fini): Remove "assert (!s->map->active)".
* testsuite/libgomp.oacc-c-c++-common/pr88941.c: New test.
From-SVN: r268177
The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a
singly linked list, with pop-from-the-front semantics.
The function map_pop pops an element, either by:
- deallocating the element, if there is more than one element
- or marking the element inactive, if there's only one element
The responsibility of map_push is to push an element to the back, as well as
selecting the element to push, by:
- allocating an element, or
- reusing the element at the front if inactive and big enough, or
- dropping the element at the front if inactive and not big enough, and
  allocating one that's big enough
The current implementation gets at least the first and most basic scenario wrong:
> map = cuda_map_create (size);
We create an element, and assign it to map.
> for (t = s->map; t->next != NULL; t = t->next)
>   ;
We determine the last element in the fifo.
> t->next = map;
We append the new element.
> s->map = map;
But here, we throw away the rest of the FIFO, and declare the FIFO to be just
the new element.
This problem causes the test-case asyncwait-1.c to fail intermittently on some
systems. The pr87835.c test-case added here is a minimized and modified
version of asyncwait-1.c (avoiding the kernel construct) that is more likely to
fail.
Fix this by rewriting map_push more robustly, by:
- separating the function into two phases: select element, push element
- when reusing or dropping an element, making sure that the element is cleanly
  popped from the queue
- rewriting the push element part in such a way that it can handle all cases
  without needing if statements, such that each line is exercised for each of
  the three cases.
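The FIFO behaviour described above can be modelled in a few lines. This is a simplified sketch, not the code in plugin-nvptx.c: struct elem, fifo_push, fifo_pop and fifo_length are hypothetical names, and malloc/free stand in for the device allocations. The push phase walks a pointer-to-pointer to the terminating NULL link, which is one way to cover the empty and non-empty queue without an if statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cuda_map: one FIFO element.  */
struct elem
{
  size_t size;
  int active;
  struct elem *next;
};

/* Count the elements reachable from HEAD.  */
static size_t
fifo_length (const struct elem *head)
{
  size_t n = 0;
  for (; head != NULL; head = head->next)
    n++;
  return n;
}

/* Push an element of at least SIZE bytes to the back of *HEAD.  */
static struct elem *
fifo_push (struct elem **head, size_t size)
{
  struct elem *e = NULL;

  assert (head != NULL);
  /* Phase 1: select.  An inactive front element is unlinked first,
     then either reused or dropped.  */
  if (*head != NULL && !(*head)->active)
    {
      e = *head;
      *head = e->next;
      if (e->size < size)
        {
          free (e);
          e = NULL;
        }
    }
  if (e == NULL)
    {
      e = malloc (sizeof *e);
      if (e == NULL)
        abort ();
      e->size = size;
    }
  e->active = 1;
  e->next = NULL;

  /* Phase 2: push.  Find the terminating NULL link and store E there,
     keeping the rest of the queue intact.  */
  struct elem **tail = head;
  while (*tail != NULL)
    tail = &(*tail)->next;
  *tail = e;
  return e;
}

/* Pop from the front: free the element if others remain, otherwise
   keep it and mark it inactive.  */
static void
fifo_pop (struct elem **head)
{
  struct elem *e = *head;

  assert (e != NULL && e->active);
  if (e->next != NULL)
    {
      *head = e->next;
      free (e);
    }
  else
    e->active = 0;
}
```

In this model the head pointer changes on a push only when the queue was empty or its inactive front element was unlinked, so earlier elements are never lost.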
2019-01-23 Tom de Vries <tdevries@suse.de>
PR target/87835
* plugin/plugin-nvptx.c (map_push): Fix adding of allocated element.
* testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test.
From-SVN: r268176
Add some test-cases that set vector length using -fopenacc-dim.
2019-01-12 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: New test.
* testsuite/libgomp.oacc-fortran/gemm-2.f90: New test.
From-SVN: r267897