OpenMP Device Constructs
By   |  September 04, 2015

3. Accelerated Worksharing

The teams construct creates a league of thread teams in which the master thread of each team begins execution of the region. Each master thread is an initial thread, and executes sequentially, as if enclosed in an implicit task region defined by an implicit parallel region surrounding the entire teams region.

    #include <stdio.h>
    #include <omp.h>

    int main (int argc, char ** argv)
    {
        int nteams = 4;

        #pragma omp target
        {
            #pragma omp teams num_teams(nteams)
            {
                // Called outside any parallel region, so this returns 1
                int teamSize = omp_get_num_threads();

                #pragma omp distribute
                for (int t=0; t<nteams; t++) {
                    #pragma omp parallel for
                    for (int i=0; i<teamSize; i++) {
                        int nTeams = omp_get_num_teams();
                        int myTeam = omp_get_team_num();
                        int me     = omp_get_thread_num();
                        // Shadows the outer teamSize: threads in this parallel region
                        int teamSize = omp_get_num_threads();

                        printf ("Iteration: %d -- thread %d of %d in team %d of %d\n",
                                i, me, teamSize, myTeam, nTeams);

                        // system with 240 threads would print the following for iteration 0
                        // Iteration: 0 -- thread 0 of 60 in team 0 of 4
                        // Iteration: 0 -- thread 0 of 60 in team 1 of 4
                        // Iteration: 0 -- thread 0 of 60 in team 2 of 4
                        // Iteration: 0 -- thread 0 of 60 in team 3 of 4
                    }
                }
            }
        }
        return 0;
    }
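For an actual computation, the teams, distribute and parallel for constructs are often written as one combined directive. The following is a minimal sketch, not taken from the article: the function name saxpy_offload and its parameters are hypothetical, and the map clauses assume that x and y point to at least n elements. The loop iterations are distributed across the teams of the league and then shared among the threads of each team. If the compiler has no OpenMP support, the pragma is ignored and the loop runs serially on the host with the same result.

```c
/* Hypothetical helper (not from the article): y[i] = a * x[i] + y[i].
   Assumes x and y each hold at least n elements. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    /* target:       offload the region to the default device
       teams:        create a league of thread teams
       distribute:   split the loop iterations across the teams
       parallel for: share each team's chunk among its threads */
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

A caller passes ordinary host arrays, for example saxpy_offload(n, 2.0f, x, y); the map clauses copy x to the device and copy y in both directions.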

4. Asynchronous Execution

As with other OpenMP constructs, the nowait clause on a target directive removes the implicit wait at the end of the construct. A thread that encounters a target construct without a nowait clause launches the target execution and waits for it (and its data transfers) to complete before continuing beyond the target construct. That is, the encountering thread blocks until the target execution is done. This guarantees that the data mapped from the device is in place for the code that follows the target construct.

A nowait clause on a target construct allows the target region to be launched in the background, so the encountering thread continues immediately with the code after the target region. The thread is free to execute serially, or to create or participate in parallel regions on the host. Essentially, the target execution proceeds asynchronously while the host thread executes the code after the target construct. One might think of this behavior as similar to asynchronous I/O.

    #pragma omp target                   #pragma omp target nowait
    device_work();                       device_work();
    // host thread blocks here           // host thread continues execution here
    // until device_work completes       // while device_work executes

Technically, the way a nowait clause achieves asynchronous execution of a host thread and a target execution is based on the execution model for tasks. The target construct executes as though it is enclosed by a task; this generated task is a target task. Without a nowait clause, the target task is executed immediately by the encountering thread (i.e., the task is undeferred), which waits at a scheduling point for device execution and data transfers to complete.

When a nowait clause is present, the target task is queued for execution, and the task of the encountering thread may resume execution after the target construct before the target task completes. The underlying mechanisms for data transfers and for launching the device executable are implementation defined; for example, OS threads may do this work. Hence, in a serial region where only a single OpenMP thread is available, the mechanics of running, monitoring and terminating a target execution on the device (launch and cleanup), and of transferring data, may proceed without interrupting the single OpenMP host thread until it is time to resume the target task waiting at the scheduling point.

Completion of the target task, and hence the target execution, is guaranteed by a taskwait directive. In the code snippet below, the master (serial) thread waits for completion of its child task (the target task), guaranteeing completion of the target execution and data transfers.

    #include <stdio.h>

    // device_work is compiled for the device; its definition is not shown
    #pragma omp declare target
    void device_work(int ia[]);
    #pragma omp end declare target

    int main(){
        int ia[2];
        ia[0]=1; ia[1];

        #pragma omp target nowait
        device_work(ia);             // ia changed on device

        printf("ia[0]=%d\n", ia[0]); // race condition

        #pragma omp taskwait

        printf("ia[0]=%d\n", ia[0]); // device values
    }
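The same pattern can be used to overlap host work with the target execution, as long as the host only touches data that the target task does not use. The sketch below is an illustration under stated assumptions, not code from the article: device_scale and overlap_host_and_device are hypothetical names, and a compiler that supports nowait on target (OpenMP 4.5) is assumed. Without OpenMP support the pragmas are ignored and the code runs serially with the same result.

```c
/* Hypothetical device routine (not from the article): doubles each element. */
#pragma omp declare target
void device_scale(int *a, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] *= 2;
    }
}
#pragma omp end declare target

/* Launches device_scale on dev[] as a deferred target task, sums host[] on
   the host in the meantime, then waits for the target task to complete.
   Assumes dev and host each hold at least n elements and do not overlap. */
long overlap_host_and_device(int *dev, const int *host, int n)
{
    long host_sum = 0;

    #pragma omp target nowait map(tofrom: dev[0:n])
    device_scale(dev, n);

    /* Host work on separate data: no race with the target task. */
    for (int i = 0; i < n; i++) {
        host_sum += host[i];
    }

    /* Reading dev[] before this point would be a race condition. */
    #pragma omp taskwait

    return host_sum;
}
```

The taskwait is placed before the function returns so that the target task never outlives the variables it maps.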


© HPC Today 2024 - All rights reserved.
