HPC Magazine, February 2014 - Mastering OpenACC 2.0 (part 2)

Listing 10 - Code equivalent to Listing 9, written without the tile clause.


#pragma acc loop gang collapse(2)
for (ii=0 ; ii<1024; ii+=16) 
  for (jj=0 ; jj<512; jj+=8)
   #pragma acc loop worker num_workers(64) collapse(2)
    for (i=ii ; i<ii+16; i++) 
       for (j=jj ; j<jj+8; j++) {
          X[i][j] = 
            c11*Y[i-1][j-1] + c12*Y[i-1][j]  + c13*Y[i-1][j+1] +
            c21*Y[i][j-1] + c22*Y[i][j]  + c23*Y[i][j+1] +
            c31*Y[i+1][j-1] + c32*Y[i+1][j]  + c33*Y[i+1][j+1] ;
       }
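
The manual decomposition above relies on the 16x8 tiles covering the 1024x512 iteration space exactly, with no gaps and no overlaps. The following is a minimal sketch in plain C (no OpenACC directives, so any C compiler will build it) that walks the same four-level loop nest and counts how often each (i, j) point is visited. The names `visits`, `visit_tiled`, `tiling_is_exact` and `tile_count` are illustrative assumptions and do not come from the article.

```c
#include <assert.h>
#include <string.h>

#define N  1024   /* extent of the i dimension, as in the listing */
#define M  512    /* extent of the j dimension, as in the listing */
#define TI 16     /* tile size along i */
#define TJ 8      /* tile size along j */

/* Hypothetical helper array: one visit counter per (i, j) point. */
static unsigned char visits[N][M];

/* Walk the iteration space tile by tile, using the same loop
   structure as the listing, and count the visits to each point. */
void visit_tiled(void)
{
    /* The fixed inner bounds (ii+TI, jj+TJ) are only safe when the
       tile sizes divide the extents evenly. */
    assert(N % TI == 0 && M % TJ == 0);

    memset(visits, 0, sizeof visits);
    for (int ii = 0; ii < N; ii += TI)          /* loop over tiles (i) */
        for (int jj = 0; jj < M; jj += TJ)      /* loop over tiles (j) */
            for (int i = ii; i < ii + TI; i++)  /* inside one tile (i) */
                for (int j = jj; j < jj + TJ; j++)
                    visits[i][j]++;
}

/* Return 1 if every point was visited exactly once, 0 otherwise. */
int tiling_is_exact(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            if (visits[i][j] != 1)
                return 0;
    return 1;
}

/* Number of tiles produced by the two outer loops. */
int tile_count(void)
{
    return (N / TI) * (M / TJ);
}
```

Calling `visit_tiled()` and then `tiling_is_exact()` confirms that the tiled nest covers the same points as a plain double loop over i and j. With these sizes the two outer loops generate 64 x 64 tiles of 128 points each. If the extents were not multiples of the tile sizes, the inner loop bounds would need to be clamped, which this sketch deliberately does not handle.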