[nvptx, libgomp] Fix map_push
The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a
single linked list, with pop-from-the-front semantics.
The function map_pop pops an element, either by:
- deallocating the element, if there is more than one element
- or marking the element inactive, if there's only one element
The responsibility of map_push is to push an element to the back, as well as
selecting the element to push, by:
- allocating an element, or
- reusing the element at the front if inactive and big enough, or
- dropping the element at the front if inactive and not big enough, and
allocating one that's big enough
The current implemention gets at least the first and most basic scenario wrong:
> map = cuda_map_create (size);
We create an element, and assign it to map.
> for (t = s->map; t->next != NULL; t = t->next)
> ;
We determine the last element in the fifo.
> t->next = map;
We append the new element.
> s->map = map;
But here, we throw away the rest of the FIFO, and declare the FIFO to be just
the new element.
This problem causes the test-case asyncwait-1.c to fail intermittently on some
systems. The pr87835.c test-case added here is a a minimized and modified
version of asyncwait-1.c (avoiding the kernel construct) that is more likely to
fail.
Fix this by rewriting map_pop more robustly, by:
- seperating the function in two phases: select element, push element
- when reusing or dropping an element, making sure that the element is cleanly
popped from the queue
- rewriting the push element part in such a way that it can handle all cases
without needing if statements, such that each line is exercised for each of
the three cases.
2019-01-23 Tom de Vries <tdevries@suse.de>
PR target/87835
* plugin/plugin-nvptx.c (map_push): Fix adding of allocated element.
* testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test.
From-SVN: r268176
2019-01-23 09:16:11 +01:00
|
|
|
/* { dg-do run { target openacc_nvidia_accel_selected } } */
|
|
|
|
/* { dg-additional-options "-lcuda" } */
|
|
|
|
|
|
|
|
#include <openacc.h>
|
|
|
|
#include <stdlib.h>
|
|
|
|
#include "cuda.h"
|
|
|
|
|
|
|
|
#include <stdio.h>
|
|
|
|
|
|
|
|
#define n 128
|
|
|
|
|
|
|
|
int
|
|
|
|
main (void)
|
|
|
|
{
|
|
|
|
CUresult r;
|
|
|
|
CUstream stream1;
|
|
|
|
int N = n;
|
|
|
|
int a[n];
|
|
|
|
int c[n];
|
|
|
|
|
|
|
|
acc_init (acc_device_nvidia);
|
|
|
|
|
|
|
|
r = cuStreamCreate (&stream1, CU_STREAM_NON_BLOCKING);
|
|
|
|
if (r != CUDA_SUCCESS)
|
|
|
|
{
|
|
|
|
fprintf (stderr, "cuStreamCreate failed: %d\n", r);
|
|
|
|
abort ();
|
|
|
|
}
|
|
|
|
|
|
|
|
acc_set_cuda_stream (1, stream1);
|
|
|
|
|
|
|
|
for (int i = 0; i < n; i++)
|
|
|
|
{
|
|
|
|
a[i] = 3;
|
|
|
|
c[i] = 0;
|
|
|
|
}
|
|
|
|
|
2019-05-08 12:01:30 +02:00
|
|
|
#pragma acc data copy (a, c) copyin (N)
|
[nvptx, libgomp] Fix map_push
The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a
single linked list, with pop-from-the-front semantics.
The function map_pop pops an element, either by:
- deallocating the element, if there is more than one element
- or marking the element inactive, if there's only one element
The responsibility of map_push is to push an element to the back, as well as
selecting the element to push, by:
- allocating an element, or
- reusing the element at the front if inactive and big enough, or
- dropping the element at the front if inactive and not big enough, and
allocating one that's big enough
The current implemention gets at least the first and most basic scenario wrong:
> map = cuda_map_create (size);
We create an element, and assign it to map.
> for (t = s->map; t->next != NULL; t = t->next)
> ;
We determine the last element in the fifo.
> t->next = map;
We append the new element.
> s->map = map;
But here, we throw away the rest of the FIFO, and declare the FIFO to be just
the new element.
This problem causes the test-case asyncwait-1.c to fail intermittently on some
systems. The pr87835.c test-case added here is a a minimized and modified
version of asyncwait-1.c (avoiding the kernel construct) that is more likely to
fail.
Fix this by rewriting map_pop more robustly, by:
- seperating the function in two phases: select element, push element
- when reusing or dropping an element, making sure that the element is cleanly
popped from the queue
- rewriting the push element part in such a way that it can handle all cases
without needing if statements, such that each line is exercised for each of
the three cases.
2019-01-23 Tom de Vries <tdevries@suse.de>
PR target/87835
* plugin/plugin-nvptx.c (map_push): Fix adding of allocated element.
* testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test.
From-SVN: r268176
2019-01-23 09:16:11 +01:00
|
|
|
{
|
|
|
|
#pragma acc parallel async (1)
|
|
|
|
;
|
|
|
|
|
|
|
|
#pragma acc parallel async (1) num_gangs (320)
|
2019-05-08 12:01:30 +02:00
|
|
|
#pragma acc loop gang
|
[nvptx, libgomp] Fix map_push
The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a
single linked list, with pop-from-the-front semantics.
The function map_pop pops an element, either by:
- deallocating the element, if there is more than one element
- or marking the element inactive, if there's only one element
The responsibility of map_push is to push an element to the back, as well as
selecting the element to push, by:
- allocating an element, or
- reusing the element at the front if inactive and big enough, or
- dropping the element at the front if inactive and not big enough, and
allocating one that's big enough
The current implemention gets at least the first and most basic scenario wrong:
> map = cuda_map_create (size);
We create an element, and assign it to map.
> for (t = s->map; t->next != NULL; t = t->next)
> ;
We determine the last element in the fifo.
> t->next = map;
We append the new element.
> s->map = map;
But here, we throw away the rest of the FIFO, and declare the FIFO to be just
the new element.
This problem causes the test-case asyncwait-1.c to fail intermittently on some
systems. The pr87835.c test-case added here is a a minimized and modified
version of asyncwait-1.c (avoiding the kernel construct) that is more likely to
fail.
Fix this by rewriting map_pop more robustly, by:
- seperating the function in two phases: select element, push element
- when reusing or dropping an element, making sure that the element is cleanly
popped from the queue
- rewriting the push element part in such a way that it can handle all cases
without needing if statements, such that each line is exercised for each of
the three cases.
2019-01-23 Tom de Vries <tdevries@suse.de>
PR target/87835
* plugin/plugin-nvptx.c (map_push): Fix adding of allocated element.
* testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test.
From-SVN: r268176
2019-01-23 09:16:11 +01:00
|
|
|
for (int ii = 0; ii < N; ii++)
|
|
|
|
c[ii] = (a[ii] + a[N - ii - 1]);
|
|
|
|
|
|
|
|
#pragma acc parallel async (1)
|
|
|
|
#pragma acc loop seq
|
|
|
|
for (int ii = 0; ii < n; ii++)
|
|
|
|
a[ii] = 6;
|
|
|
|
|
|
|
|
#pragma acc wait (1)
|
|
|
|
}
|
|
|
|
|
|
|
|
for (int i = 0; i < n; i++)
|
|
|
|
if (c[i] != 6)
|
|
|
|
abort ();
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|