OpenMP Device Constructs

By The Editorial Team | September 04, 2015

2.2 Array Sections
An array section designates a subset of the elements of an array. Fortran has built-in support for array sections, but C and C++ do not. Some OpenMP clauses, such as the map clause, accept array section syntax in place of an array subscript. The syntax in C and C++ is variable[lower-bound : length], where variable has an array or pointer type, and lower-bound and length are integral expressions that specify a contiguous set of elements. Array sections are useful for describing the memory behind a pointer in C/C++ or for mapping a slice of a large array. Note that Fortran specifies an array section by its lower and upper bounds, whereas C/C++ specifies the lower bound (the index of the first element of the section) and the length (the number of elements in the section). This difference from Fortran was chosen because C/C++ code conventionally deals with array lengths rather than upper bounds.
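As a minimal sketch of the C/C++ form, the function below maps only a slice of a host array to the device. The function name scale_slice and its parameters are hypothetical and chosen for illustration; the array section x[start:len] names len elements beginning at index start. If the compiler has no offloading support, or OpenMP is disabled, the loop simply runs on the host with the same result.

```c
/* Hypothetical helper: multiply x[start] .. x[start+len-1] by factor.
   Only that slice is mapped to the device, not the whole array. */
void scale_slice(float *x, int start, int len, float factor)
{
    int i;

    /* x[start:len] is lower-bound:length, i.e. len elements from index start. */
    #pragma omp target map(tofrom: x[start:len])
    for (i = start; i < start + len; i++) {
        x[i] *= factor;
    }
}
```

For example, scale_slice(data, 2, 3, 10.0f) touches data[2], data[3], and data[4] only. For a 1-based Fortran array the same three elements would be written with bounds, as data(3:5).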

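The notation [:] is shorthand for a whole array dimension when the size of the dimension is known from the array declaration. An omitted lower-bound defaults to 0, and an omitted length defaults to the number of elements from lower-bound to the end of the dimension, so the length can be left out only when the dimension size is known.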

The compiler must be able to determine the shape and size of an array object. If the base of an array section has incompletely specified dimensions (such as a pointer variable), the length of the array section must be specified explicitly.
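The distinction can be sketched as follows, with hypothetical function names. In sum_fixed the array is declared with a known size, so [:] is enough to name all of it; in sum_pointer the base is a pointer, so the length n has to be given explicitly. The scalar total is listed in a map clause so that the result is copied back regardless of how the implementation treats unlisted scalars.

```c
#define LEN 8

/* The size of table is known from its declaration, so [:] maps all LEN elements. */
int sum_fixed(void)
{
    int table[LEN];
    int i, total = 0;

    for (i = 0; i < LEN; i++) table[i] = i + 1;

    #pragma omp target map(to: table[:]) map(tofrom: total)
    for (i = 0; i < LEN; i++) total += table[i];

    return total;
}

/* p is a pointer: the compiler cannot know how much memory lies behind it,
   so the array section must carry an explicit length. */
int sum_pointer(const int *p, int n)
{
    int i, total = 0;

    #pragma omp target map(to: p[0:n]) map(tofrom: total)
    for (i = 0; i < n; i++) total += p[i];

    return total;
}
```

Because the length is explicit, a caller can also pass a pointer into the middle of a larger buffer, for example sum_pointer(v + 1, 2), and only those two elements are mapped.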

2.3 Persistent data region
As with any storage, performance may be optimized by minimizing data transfers. Also, since creating space on a remote device can be expensive, reusing the same allocated space across target invocations (making the storage persistent) can enhance performance. The structured block of a target data directive in C/C++ and the code block terminated by the end target data directive in Fortran create a region in which the storage and the data within the storage persist across enclosed target directive executions. The map clause is used to designate data transfer to the device (map-type to) at the beginning of the region and transfer data back from the device (map-type from) at the end of the region. The tofrom map-type is a combination of to and from. The alloc map-type designates only storage creation without any transfer.
The program below illustrates persistence and data transfers at the boundaries of the target data region.

Listing 4: Persistent data with the Target Data directive

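  #include <stdio.h>

  #define N 1000

  int main(void)
  {
    int i, a[N], b[N], c[N], d[N];

    /* Initialize on the host; b holds the values of a in reverse order. */
    for (i=0; i<N; i++) { a[i]=i; b[N-1-i]=i; }

    #pragma omp target data map(to:    a) \
                            map(tofrom:b) \
                            map(from:  c) \
                            map(alloc: d)
    {
      /* d lives only on the device and persists across both target regions. */
      #pragma omp target
      for (i=0; i<N; i++) { d[i] = a[i] + b[i]; }

      /* Divide by i+1 rather than i to avoid a division by zero at i=0. */
      #pragma omp target
      for (i=0; i<N; i++) { b[i]=d[i]*i; c[i]=d[i]/(i+1); }
    }

    /* b and c were copied back at the end of the target data region. */
    printf("b[0] b[N-1] c[0] c[N-1] %d %d %d %d\n",
           b[0],b[N-1],c[0],c[N-1]);
    return 0;
  }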

Note that the target directives do not map the arrays again: the arrays are already present on the device because of the enclosing target data region, so no array data is transferred at the boundaries of the target regions. The variable i, which appears in no map clause, is implicitly mapped, though.

Target update directives are used within the target data region to transfer (assign) data in the original list items (host values) to the corresponding device storage with the to clause. Transfers in the reverse direction are specified with the from clause.
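A minimal sketch of the from direction, under the assumption that the host wants to inspect an intermediate result while the data stays resident on the device. The function name two_step and its arguments are hypothetical. Without offloading support the pragmas do not change the result, since host and device storage are then the same.

```c
/* Hypothetical example: x stays on the device for both target regions;
   target update from copies the intermediate values back to the host. */
int two_step(int *x, int n)
{
    int i, midsum = 0;

    #pragma omp target data map(tofrom: x[0:n])
    {
        /* First device step: double every element. */
        #pragma omp target
        for (i = 0; i < n; i++) x[i] *= 2;

        /* Refresh the host copy of x from the device copy. */
        #pragma omp target update from(x[0:n])

        /* Host code: sum the doubled values. */
        for (i = 0; i < n; i++) midsum += x[i];

        /* Second device step uses the data still resident on the device. */
        #pragma omp target
        for (i = 0; i < n; i++) x[i] += 1;
    }   /* tofrom: the final values are copied back to the host here */

    return midsum;
}
```

The host loop between the two target regions reads x but does not modify it, so no update in the to direction is needed before the second target region.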

The next program illustrates “update” transfers, and uses array sections of dynamic data (arrays created by malloc) on the host. Here, v1, v2, and p array sections are created on the device, and v1 and v2 values on the host are copied to the device at the beginning of the target data region. After the first target execution, the host modifies its v1 and v2 arrays, and then copies the values to the device with the target update directive. The new values are used in the second target execution directive. At the end of the target data region, the p array section is copied from the device to the host.

Listing 5: Updating persistent storage with the Target Data directive.

  extern void init(float *, float *, int);
  extern void init_again(float *, float *, int);
  extern void output(float *, int);

  void vec_mult(float *p, float *v1, float *v2, int N)
  {
    int i;
    init(v1, v2, N);

    #pragma omp target data map(to: v1[:N], v2[:N]) map(from: p[0:N])
    {
      #pragma omp target
      #pragma omp parallel for
      for (i=0; i<N; i++)
        p[i] = v1[i] * v2[i];

      init_again(v1, v2, N);
      #pragma omp target update to(v1[:N], v2[:N])

      #pragma omp target
      #pragma omp parallel for
      for (i=0; i<N; i++)
        p[i] = p[i] + (v1[i] * v2[i]);
    }
    output(p, N);
  }
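Listing 5 leaves init, init_again, and output undefined. One possible set of definitions, purely an assumption for experimenting with the listing (the original helpers are not shown), is sketched below. With these particular values the first target region computes p[i] = 2*i and the second adds 1*i, so p[i] should end up as 3*i.

```c
#include <stdio.h>

/* Assumed definitions; the article does not show the real ones. */
void init(float *v1, float *v2, int n)
{
    int i;
    for (i = 0; i < n; i++) { v1[i] = (float)i; v2[i] = 2.0f; }
}

void init_again(float *v1, float *v2, int n)
{
    int i;
    for (i = 0; i < n; i++) { v1[i] = 1.0f; v2[i] = (float)i; }
}

/* Print the first and last element as a quick check. */
void output(float *p, int n)
{
    if (n > 0) printf("p[0] = %g, p[%d] = %g\n", p[0], n - 1, p[n - 1]);
}
```

Compiled together with Listing 5, a caller would allocate three float arrays of length N (for example with malloc) and pass them to vec_mult.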
