Explicit Vector Programming with OpenMP 4.0 SIMD Extensions

By Xinmin Tian and Bronis R. de Supinski | November 19, 2014

2.4 – Clauses for SIMD constucts

To refine the execution behavior of the SIMD construct further, OpenMP provides clauses for programmers to write optimal vector programs that explicitly specify the data sharing, data movement, visibility, SIMD length, linearity, uniformity and memory alignment.

2.4.1 – Data sharing clauses

The private, lastprivate and reduction clauses control data privatization and sharing of variables for a SIMD execution context. The private clause creates an uninitialized vector for the given variables. For SIMD function arguments, by default, a parameter has a distinct storage location or value for each of its instances among hardware SIMD lanes, i.e. it is private. The lastprivate clause provides the same semantics but also copies out the values produced from the last iteration to outside the loop. The reduction clause creates a vector copy of the variable and horizontally aggregates partial values of that vector into the original scalar variable.

2.4.1.1 – The uniform clause

A parameter that is specified with the uniform clause represents an invariant value for a chunk of concurrent invocations of the function in the execution of a single SIMD loop. It effectively is shared across SIMD lanes of vector execution. Specifying function parameters that are shared across the SIMD lanes as uniform allows the vectorizer to generate optimized code for scalar (or unit-stride) memory loads (or stores), and optimal control flow. For instance, when a base address is uniform and the offset is a linear unit stride, the compiler can generate faster unit-stride vector memory load/store instructions (e.g., movaps or movups supported on Intel SIMD hardware) instead of generating gather/scatter instructions. Also when a test condition for a control flow decision is based on a uniform quantity, the compiler can exploit that all running code instances will follow the same path at that point to save the overhead of masking checks and control flow divergence.

2.4.1.2 – The linear clause

A variable (or a parameter) specified in a linear clause is made private to each iteration (or each SIMD lane) and has a linear relationship with respect to the iteration space of the SIMD execution context. A variable cannot appear in more than one linear clause, or in a linear clause and also in another OpenMP data clause. A linear-step can be specified for a variable in a linear clause. If specified, the linear-step expression must be invariant during the execution of the region associated with the construct. Otherwise, the execution results in unspecified behavior. If linear-step is not specified, it is assumed to be 1.

Under a SIMD loop context, the value of the linearized variable on each iteration of the associated loop(s) corresponds to the value of the original variable before entering the construct plus the logical number of the iteration times linear-step. The value corresponding to the sequentially last iteration of the associated loops is assigned to the original variable. If the associated code does not increase the variable by linear-step in each iteration of the loop then the behavior is undefined.

Under a SIMD function context, a parameter referenced in a linear-step must be the subject of a uniform clause. No parameter of a vector function can be the subject of more than one uniform or linear clause. For a linear parameter, if the corresponding argument values in consecutive iterations (in the serial version of the program) do not differ by linear-step, the behavior is undefined.

2.4.2 – The aligned clause

Memory access alignment is important since most platforms can load (or store) aligned data much faster than unaligned data accesses, especially SIMD (vector type) data. However, compilers often cannot detect alignment properties of data across all modules of a program, or dynamically allocated memory (or objects), so they must conservatively generate code that uses only unaligned loads / stores.

Hence, the aligned(variable[:alignment] [,variable[:alignment]]) clause allows programmers to express alignment information (i.e. number of bytes that must be a constant positive integer value) to the compiler. For each variable in the list of the aligned clause, the programmer can specify an alignment value; if no optional alignment value is specified, an implementation defined default alignment for SIMD instructions on the target platforms is assumed.

2.4.3 – The safelen clause

If a safelen clause is specified, then no two iterations executed concurrently with SIMD instructions can have a greater distance in the logical iteration space than its value. The parameter of the safelen clause must be a constant positive integer expression. The number of iterations that are executed concurrently at any given time is implementation defined but guaranteed not to exceed the value specified in the safelen clause. A different SIMD lane will execute each concurrent iteration. Each set of concurrent iterations is a SIMD chunk.

2.4.4 – The simdlen clause

For a function annotated with declare simd, when a SIMD version is created, the number of concurrent elements packed for each argument of the function is determined by the vector length specified in the simdlen clause or, by default, is selected by the compiler for a given SIMD hardware. When the specified vector length is a multiple of the hardware SIMD length, the compiler may apply double-pumping, triple-pumping, or quad-pumping that emulates longer vectors by fusing multiple vector registers into a larger logical vector register. The parameter of the simdlen clause must be a constant positive integer expression. In practice, it should be a multiple of the hardware SIMD length. Otherwise, the number of elements packed for each argument of the function is implementation defined.

2.4.5 – inbranch and notinbranch clauses

The inbranch clause indicates that a function will always be called under conditions in the SIMD loop / function. The notinbranch clause indicates that a function will never be called under conditions of a SIMD loop / function. If neither clause is specified, then the function may or may not be called from inside a conditional statement of a SIMD loop / function. By default, for every SIMD variant function declared, two implementations are provided: one especially suitable for conditional invocation (i.e., inbranch version) with a predicate, and another especially suitable for unconditional invocation (i.e., notinbranch version).

If all invocations are conditional, generation of the notinbranch version can be suppressed using the inbranch clause. Similarly, if all invocations are unconditional, generation of the inbranch version can be suppressed using the notinbranch clause. Suppressing either inbranch or notinbranch version of a SIMD function helps to reduce code size and compilation time. By default, both inbranch and notinbranch versions of vector variants have to be provided, since the compiler cannot determine that the original scalar function is always called under condition (or not).

<1 2 3 4 5 6 7 8 >

Navigation