Advanced Features
1. Depend offloading
The depend clause on target constructs makes it possible to offload a task to the device while leaving the encountering thread immediately available to participate in worksharing. Because asynchronous offloading is integrated with the OpenMP tasking model, the same dependences can order tasks on the host and offloaded tasks.

    /* Offloaded task; signals completion through the dependence on b */
    #pragma omp target depend(out: b) map(a)
    {
        task1(a);
    }

    /* Offloaded task that produces a */
    #pragma omp target depend(out: a) map(a)
    {
        task2(a);
    }

    /* Host task; runs once the dependence on b is satisfied */
    #pragma omp task depend(in: b)
    task3_on_host();

    /* Offloaded task; waits for every earlier task that touches a or b */
    #pragma omp target depend(inout: a, b) map(a, b)
    {
        #pragma omp task
        task4(a);

        #pragma omp task depend(out: b)
        task5();

        #pragma omp task depend(in: b)
        task6();
    }
2. Unstructured data directives (transfers and persistent data)
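The unstructured data directives provide a rich set of data allocation and data movement operations. They make it possible to allocate device memory in one routine, deallocate it in a different routine, and transfer data conditionally.

    class myClass {
    public:
        // n (the element count) is an assumed constructor parameter
        myClass(int n) : length(n) {
            commonData = (float *)malloc(length * sizeof(float));
            // Allocate device storage that outlives this routine
            #pragma omp target enter data map(alloc: commonData[0:length])
        }
        ~myClass() {
            // Release the device storage, then the host storage
            #pragma omp target exit data map(release: commonData[0:length])
            free(commonData);
        }
        void transfer_to() {
            // Copy host data into the existing device allocation
            #pragma omp target update to(commonData[0:length])
        }
        void transfer_from() {
            // Copy device data back to the host
            #pragma omp target update from(commonData[0:length])
        }

    private:
        float *commonData;
        int length;
    };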
Programming Guidelines
Programs that are good candidates for offloading have these characteristics:
1. A high level of parallelism, either in threads or in vectors.
2. Minimal data transfer between the host and the device.
3. No significant amounts of I/O.
In a heterogeneous computing environment, an application can be tuned to execute particular components of work on specific devices. For example, serial components might execute best on a general-purpose, high-frequency CPU whose architecture is suited to logic and branch-heavy code. Code with highly parallel and vectorizable components might share the work across the host and the device(s). Targeted execution on a device may incur significant overhead for data initialization and transfer. In that case it is beneficial to overlap data communication with computation, or to reuse data storage across many target executions. OpenMP 4.0 provides the ability to reuse data across multiple offloads, and a future extension will enable asynchronous (overlapping) data transfers.
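The data reuse mentioned above can be sketched with the OpenMP 4.0 "target data" construct. This is a minimal illustration, not a tuned implementation: the function name scale_then_shift and its parameters are hypothetical. The array is mapped once, and both target regions then work on the same device copy instead of transferring it for each offload.

```c
/* Hypothetical two-stage computation: scale every element, then shift it. */
void scale_then_shift(float *a, int n, float scale, float shift)
{
    /* Map a to the device once for the whole block. */
    #pragma omp target data map(tofrom: a[0:n])
    {
        /* a is already present on the device, so this map causes no new transfer. */
        #pragma omp target map(tofrom: a[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            a[i] *= scale;

        /* The second offload reuses the same device copy of a. */
        #pragma omp target map(tofrom: a[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            a[i] += shift;
    }
    /* a is copied back to the host when the target data region ends. */
}
```

Calling scale_then_shift(a, n, 2.0f, 1.0f) on the host replaces each a[i] with 2*a[i] + 1. When no device is available, the regions fall back to the host and give the same result.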
Future of Device Constructs
The OpenMP committee continues to enrich the language with new features that enable efficient use of the target devices. The committee has released Technical Report 3 (TR3) in anticipation of the OpenMP 4.1 release. The key new features in the TR3 are:
Unstructured data allocation
In OpenMP 4.0, variables are mapped to a target device for the duration of the lexical scope where the construct is used. To extend the lifetime of a variable beyond this lexical scope, so that it can be allocated in one routine and freed in a different routine, two new standalone target constructs have been introduced: “target enter data” and “target exit data”. “target enter data” begins the lifetime of the variable on the target device and “target exit data” ends it.
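A minimal sketch of this pattern in C follows, using the OpenMP 4.5 spelling of the directives. The three function names (device_open, device_increment, device_close) are hypothetical and exist only to show the device lifetime of a buffer beginning in one routine and ending in another.

```c
/* Begin the device lifetime of p[0:n] and copy the host values over. */
void device_open(float *p, int n)
{
    #pragma omp target enter data map(to: p[0:n])
}

/* Work on the device copy; p is already present, so no transfer happens here. */
void device_increment(float *p, int n)
{
    #pragma omp target map(tofrom: p[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        p[i] += 1.0f;
}

/* Copy the results back to the host and end the device lifetime of p[0:n]. */
void device_close(float *p, int n)
{
    #pragma omp target exit data map(from: p[0:n])
}
```

A caller would invoke device_open once, device_increment as many times as needed, and device_close at the end. Only the first and last calls move data between host and device.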
Asynchronous offload
In OpenMP 4.0, all target tasks are synchronous. The thread that encounters the target task starts its execution on the target device and waits for it to complete. A new clause, “nowait”, has been added to the target construct to enable the encountering thread to resume execution before the target task completes.
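The following sketch shows asynchronous offload, assuming the OpenMP 4.5 spelling of the clause, nowait. The function name offload_and_overlap and its parameters are hypothetical. The host loop may run while the target region is still executing, and the taskwait makes sure the offloaded work has finished before the function returns.

```c
/* Hypothetical example: double dev_part on the device while the host updates host_part. */
void offload_and_overlap(float *dev_part, float *host_part, int n)
{
    /* Deferred target task: the encountering thread does not have to wait here. */
    #pragma omp target nowait map(tofrom: dev_part[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        dev_part[i] *= 2.0f;

    /* Independent host work that can overlap with the offloaded region. */
    for (int i = 0; i < n; i++)
        host_part[i] += 1.0f;

    /* Wait for the target task to complete before dev_part is used again. */
    #pragma omp taskwait
}
```

The two arrays must not overlap, because nothing orders the host loop against the target region until the taskwait.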
Depend clause addition
A new “depend” clause has been added to the device constructs that enables synchronization between device constructs, and also between device constructs and task constructs. The behavior of this “depend” clause is the same as the “depend” clause in the task construct and will be described in the article on tasking.
Map clause extensions
A new map type, “delete”, has been added to the existing map types to enable the user to free the specified object from the device environment unconditionally. This can be used to release the object in an inner nesting of a target region.
A map-type modifier, “always”, has been added to change the default behavior of a map type. The “always” modifier enables the user to force a transfer that would otherwise be skipped because of the presence rule, that is, because the object is already present in the device environment. In OpenMP 4.0, this would have required the user to use an “update” construct to transfer the data.
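Both extensions can be sketched together, using OpenMP 4.5 syntax. The function name sum_after_host_update is hypothetical. The host changes the array after it has been mapped, so the device copy is stale. The “always” modifier forces the refresh that the presence rule would otherwise skip, and “delete” removes the device copy at the end.

```c
/* Hypothetical example: sum an array on the device after the host has changed it. */
float sum_after_host_update(float *a, int n)
{
    float sum = 0.0f;

    /* Begin the device lifetime of a and copy the initial values. */
    #pragma omp target enter data map(to: a[0:n])

    /* The host modifies a, so the device copy is now out of date. */
    for (int i = 0; i < n; i++)
        a[i] += 1.0f;

    /* a is already present, so a plain map(to:) would not transfer it again.
       The always modifier forces the host-to-device copy. */
    #pragma omp target map(always, to: a[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; i++)
        sum += a[i];

    /* Remove a from the device environment unconditionally. */
    #pragma omp target exit data map(delete: a[0:n])

    return sum;
}
```

Without the “always” modifier, an implementation with separate device memory would sum the values copied by the enter data directive, not the values the host wrote afterwards.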
Appendix on OpenMP Target Syntax
© HPC Today 2024 - All rights reserved.