2 – SIMD VECTOR EXTENSIONS TO OpenMP
To facilitate explicit vector programming in OpenMP programs, a new set of pragmas (or directives) was added in the OpenMP 4.0 specification. With these extensions, even complex C/C++ and Fortran loops and functions, once annotated, can be compiled for SIMD execution on modern microprocessors, so programmers do not have to rely on compiler data-dependency analysis or on vendor-specific compiler hints (e.g., #pragma ivdep, supported by the IBM, Cray and Intel compilers). This section describes the syntax and semantics of these extensions.
The OpenMP SIMD extensions have restrictions. For example, C++ exception-handling code, calls that throw exceptions, and setjmp function calls are not allowed in the lexical or dynamic scope of SIMD functions and loops. Other restrictions include:
• The function or subroutine body must be a structured block.
• The execution of the function or subroutine, when called from a SIMD loop, cannot result in the execution of an OpenMP construct.
• The execution of the function or subroutine cannot have any side effects that would alter its execution for concurrent iterations of a SIMD chunk.
• A program that branches into or out of the function is non-conforming.
Detailed syntax restrictions and language rules of the OpenMP SIMD extensions can be found in the OpenMP 4.0 specification.
2.1 - simd constructs for loops
The basis of the OpenMP 4.0 SIMD extensions is the simd construct for for loops (C/C++) and do loops (Fortran). This new construct instructs the compiler to vectorize the loop: it indicates that the iterations of the loop can be divided into contiguous chunks of a specific length and that, within each chunk, multiple iterations can be executed concurrently on multiple SIMD lanes, while preserving all data dependencies of the original serial program and its execution. The syntax of the simd construct is as follows:
C/C++:

#pragma omp simd [clause[[,] clause] ...] new-line
for-loops

Fortran:

!$omp simd [clause[[,] clause] ...] new-line
do-loops
[!$omp end simd]
The simd construct closely follows the idea and syntax of the existing loop construct. It supports several clauses, which we cover in Section 2.4. The loop header of the associated for or do loop must obey the same restrictions as for the loop construct. These restrictions enable the OpenMP compiler to determine the iteration space of the loop upfront and to distribute it accordingly for vectorization.
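As a concrete illustration, the sketch below annotates a simple saxpy kernel with the simd construct. The function name and signature are our own (hypothetical) choice, not taken from the specification; the pragma is the standard OpenMP 4.0 syntax described above.

```c
#include <stddef.h>

/* Hypothetical example: a saxpy kernel annotated with the OpenMP
 * simd construct. The compiler is asked to divide the iterations
 * into contiguous chunks and execute each chunk across SIMD lanes.
 * Without OpenMP support the pragma is ignored and the loop runs
 * serially, producing the same result. */
void saxpy(size_t n, float a, const float *x, float *y) {
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Note that the loop header obeys the canonical form required by the loop construct: a single counter, a simple bound, and a unit increment, so the compiler can compute the iteration space upfront.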
The simd construct can also be applied to the existing worksharing loop construct (and to parallel loops) to form a loop SIMD construct (and a combined parallel loop SIMD construct), which specifies a loop that can be executed concurrently using SIMD instructions and whose iterations will also be executed in parallel by the threads in the team.
The loop (do) SIMD construct first distributes the iterations of the associated loop(s) across the implicit tasks of the parallel region in a manner consistent with any clauses that apply to the loop construct. The resulting chunks of iterations are then converted to a SIMD loop in a manner consistent with any clauses that apply to the simd construct. The effect of any clause that applies to both constructs is as if it were applied to both constructs separately (for more details, see Section 2.8.3 and Section 2.10 in the OpenMP 4.0 specification).
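The combined form can be sketched as below. This is our own illustrative kernel (the function name and the use of a reduction clause, covered in Section 2.4, are assumptions for the example, not text from the specification).

```c
#include <stddef.h>

/* Hypothetical sketch: combining thread-level and SIMD parallelism.
 * The iterations are first divided among the threads of the team;
 * each thread's chunk is then vectorized. The reduction clause
 * applies to both constructs, giving each thread (and each SIMD
 * lane) a private partial sum that is combined at the end. */
double dot(size_t n, const double *a, const double *b) {
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result, which makes such kernels easy to test incrementally.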
2.2 - simd constructs for functions
In addition to the simd construct for loops, OpenMP 4.0 introduces the declare simd construct, which can be applied to a function (C, C++ and Fortran) or a subroutine (Fortran) to enable the creation of one or more versions that can process multiple instances of each argument using SIMD instructions from a single invocation from a SIMD loop. There may be multiple declare simd directives for a function (C, C++, Fortran) or subroutine (Fortran). The syntax of the declare simd construct is as follows:
C/C++:

#pragma omp declare simd [clause[[,] clause] ...] new-line
[#pragma omp declare simd [clause[[,] clause] ...] new-line]
function definition or declaration

Fortran:

!$omp declare simd (proc-name) [clause[[,] clause] ...] new-line
The declare simd construct instructs the compiler to create SIMD versions of the associated function. The expressions appearing in the clauses of this directive are evaluated in the scope of the arguments of the function declaration or definition.
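A minimal sketch of a declare simd function called from a SIMD loop is shown below. The function and variable names are hypothetical, chosen for illustration; only the pragmas themselves come from the OpenMP 4.0 syntax above.

```c
/* Hypothetical example: declare simd applied to a small function.
 * The compiler may emit, in addition to the scalar version, one or
 * more vector versions of min_val that process several argument
 * instances (one per SIMD lane) in a single invocation. */
#pragma omp declare simd
static float min_val(float a, float b) {
    return a < b ? a : b;
}

/* A SIMD loop that calls the function: the call can be resolved to
 * the vector version, keeping the whole loop body vectorized. */
void clamp_upper(int n, float *x, float limit) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        x[i] = min_val(x[i], limit);
}
```

Without the declare simd annotation, a call inside the loop would typically force the compiler to serialize the call, losing the benefit of vectorizing the enclosing loop.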
2.3 - SIMD execution model
In the SIMD execution model, a SIMD loop conceptually has logical iterations numbered 0,1,...,N-1, where N is the number of loop iterations. The logical numbering denotes the sequence in which the iterations would be executed if the associated loop(s) were executed with no SIMD instructions. Similarly, a SIMD function has logical invocations numbered 0,1,...,VL-1, where VL is the number of SIMD lanes, and the numbering denotes the sequence in which the invocations would be executed if the associated function were executed with no SIMD instructions. In other words, a legal SIMD program and its execution must obey all original data dependencies among iterations (or invocations), and within an iteration, of the serial program and its execution.
Consider a SIMD hardware unit with 8 lanes, i.e. one in which 8 elements of float data can be packed into a single SIMD register for 8-way SIMD execution. A chunk of iterations is mapped onto the SIMD lanes and starts running concurrently on them; the group of running SIMD lanes is called a SIMD chunk. A single program counter is shared by the SIMD lanes; it points to the single instruction to be executed next. To control execution within a SIMD chunk, the execution predicate, a per-SIMD-lane boolean value, indicates whether or not side effects from the current instruction should be observed. For example, if a statement were to be executed with an "all false" predicate, it should have no observable side effects.
Upon entering a SIMD context (i.e. a SIMD loop or a SIMD function) in an application, the execution predicate is "all true" and the program counter points to the first statement in the loop or function. The following two statements describe the required behavior of the program counter and the execution predicate over the course of execution of a SIMD context:
• The program counter will have a sequence of values that correspond to a conservative execution path through statements of the SIMD context, wherein if any SIMD lane executes a statement, the program counter will pass through that statement;
• At each statement through which the program counter passes, the execution predicate will be set such that its value for a particular SIMD lane is "true" if and only if the SIMD lane is to be enabled to execute that statement.
The above SIMD execution behavior provides the compiler some latitude. For example, the program counter is allowed to skip a series of statements for which the execution predicate is "all false" since the statements have no observable side-effects. In reality, the control flow in the program can be diverging, which leads to reductions in SIMD efficiency (and thus performance) as different SIMD lanes must perform different computations. The SIMD execution model provides an important guarantee about the behavior of the program counter and execution predicate: the execution of SIMD lanes is maximally converged. Maximal convergence means that if two SIMD lanes follow the same control path, they are guaranteed to execute each program statement concurrently while preserving all original data dependencies. If two SIMD lanes follow diverging control paths, they are guaranteed to re-converge as soon as possible in the SIMD execution.
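The divergence and re-convergence behavior described above can be made concrete with a small sketch (our own illustrative kernel, not an example from the specification): inside a SIMD loop, a data-dependent branch splits the lanes of a chunk into two groups, each controlled by the execution predicate.

```c
/* Hypothetical illustration of control-flow divergence in a SIMD
 * loop. Within a SIMD chunk, lanes whose element is negative take
 * the if branch; the remaining lanes take the else branch. Under
 * the SIMD execution model, the program counter passes through both
 * branches with a per-lane predicate masking the side effects, and
 * all lanes re-converge at the first statement after the if/else. */
void rectify_and_double(int n, float *x) {
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        if (x[i] < 0.0f)
            x[i] = 0.0f;        /* observed only where predicate is true  */
        else
            x[i] = 2.0f * x[i]; /* observed on the complementary lanes    */
        /* maximal convergence: lanes re-converge here */
    }
}
```

When one branch has an "all false" predicate for the whole chunk, the compiler is free to skip it entirely, which is the latitude the model grants; when both branches are populated, the chunk pays for both, which is why divergent control flow reduces SIMD efficiency.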
© HPC Today 2021 - All rights reserved.