2.4 – Clauses for SIMD constructs
To refine the execution behavior of the SIMD construct further, OpenMP provides clauses for programmers to write optimal vector programs that explicitly specify the data sharing, data movement, visibility, SIMD length, linearity, uniformity and memory alignment.
2.4.1 – Data sharing clauses
The private, lastprivate and reduction clauses control data privatization and sharing of variables for a SIMD execution context. The private clause creates an uninitialized vector for the given variables. For SIMD function arguments, by default, a parameter has a distinct storage location or value for each of its instances among hardware SIMD lanes, i.e. it is private. The lastprivate clause provides the same semantics but also copies out the values produced from the last iteration to outside the loop. The reduction clause creates a vector copy of the variable and horizontally aggregates partial values of that vector into the original scalar variable.
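As an illustration of the three data sharing clauses on a single SIMD loop, here is a minimal sketch. The function name dot_with_last and its parameters are hypothetical and chosen only for this example.

```c
#include <assert.h>

/* Hypothetical helper: dot product that also reports the last product.
   tmp  : private, each SIMD lane works on its own uninitialized copy.
   last : lastprivate, the value from the sequentially last iteration
          is copied back to the original variable after the loop.
   sum  : reduction, per-lane partial sums are combined into one scalar. */
float dot_with_last(const float *a, const float *b, int n, float *last_product)
{
    float sum = 0.0f;
    float tmp;
    float last = 0.0f;

    /* With zero iterations the value of 'last' after the loop
       should not be relied on, so this sketch requires n > 0. */
    assert(n > 0);

    #pragma omp simd private(tmp) lastprivate(last) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = a[i] * b[i];
        last = tmp;
        sum += tmp;
    }

    *last_product = last;
    return sum;
}
```

With GCC or Clang the pragma takes effect when compiling with -fopenmp or -fopenmp-simd; without either flag it is ignored and the loop runs as ordinary serial code with the same result.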
2.4.1.1 – The uniform clause
A parameter that is specified with the uniform clause represents an invariant value for a chunk of concurrent invocations of the function in the execution of a single SIMD loop. It is effectively shared across the SIMD lanes of vector execution. Specifying function parameters that are shared across the SIMD lanes as uniform allows the vectorizer to generate optimized code for scalar (or unit-stride) memory loads (or stores), and optimal control flow. For instance, when a base address is uniform and the offset is a linear unit stride, the compiler can generate faster unit-stride vector memory load/store instructions (e.g., movaps or movups supported on Intel SIMD hardware) instead of generating gather/scatter instructions. Also, when a test condition for a control flow decision is based on a uniform quantity, the compiler can exploit the fact that all running code instances will follow the same path at that point, saving the overhead of masking checks and control flow divergence.
2.4.1.2 – The linear clause
A variable (or a parameter) specified in a linear clause is made private to each iteration (or each SIMD lane) and has a linear relationship with respect to the iteration space of the SIMD execution context. A variable cannot appear in more than one linear clause, or in a linear clause and also in another OpenMP data clause. A linear-step can be specified for a variable in a linear clause. If specified, the linear-step expression must be invariant during the execution of the region associated with the construct. Otherwise, the execution results in unspecified behavior. If linear-step is not specified, it is assumed to be 1.
Under a SIMD loop context, the value of the linearized variable on each iteration of the associated loop(s) corresponds to the value of the original variable before entering the construct plus the logical number of the iteration times linear-step. The value corresponding to the sequentially last iteration of the associated loops is assigned to the original variable. If the associated code does not increase the variable by linear-step in each iteration of the loop, then the behavior is undefined.
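A small sketch of a linear variable in a SIMD loop, assuming a hypothetical function take_every_other that copies every second element of src into dst. The secondary index j starts at 0 and advances by exactly 2 per iteration, matching the declared linear-step.

```c
#include <assert.h>

/* Copies src[0], src[2], src[4], ... into dst[0..n-1].
   src must hold at least 2*n - 1 elements. */
void take_every_other(const float *src, float *dst, int n)
{
    int j = 0; /* on iteration i, j equals 0 + i * 2 */

    assert(src != dst); /* the sketch assumes the two arrays do not alias */

    #pragma omp simd linear(j:2)
    for (int i = 0; i < n; i++) {
        dst[i] = src[j];
        j += 2; /* must match the linear-step, otherwise behavior is undefined */
    }
}
```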
Under a SIMD function context, a parameter referenced in a linear-step must be the subject of a uniform clause. No parameter of a vector function can be the subject of more than one uniform or linear clause. For a linear parameter, if the corresponding argument values in consecutive iterations (in the serial version of the program) do not differ by linear-step, the behavior is undefined.
2.4.2 – The aligned clause
Memory access alignment is important because most platforms can load (or store) aligned data much faster than unaligned data, especially SIMD (vector type) data. However, compilers often cannot determine the alignment properties of data across all modules of a program, or of dynamically allocated memory (or objects), so they must conservatively generate code that uses only unaligned loads and stores.
Hence, the aligned(variable[:alignment] [,variable[:alignment]]) clause allows programmers to express alignment information (i.e. a number of bytes, which must be a constant positive integer value) to the compiler. For each variable in the list of the aligned clause, the programmer can specify an alignment value; if no optional alignment value is specified, an implementation-defined default alignment for SIMD instructions on the target platform is assumed.
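The following sketch shows one way to pass alignment information for two pointers. The 32-byte alignment, the helper alloc_floats_32 and the function axpy_aligned are assumptions made for this example. It writes one aligned clause per pointer and relies on C11 aligned_alloc for the allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n floats on a 32-byte boundary.
   aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up. The caller releases it with free(). */
float *alloc_floats_32(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 31) / 32 * 32;
    return aligned_alloc(32, bytes);
}

/* y[i] += a * x[i], promising the compiler that x and y are 32-byte aligned. */
void axpy_aligned(float *y, const float *x, float a, int n)
{
    /* The promise made in the aligned clauses must actually hold. */
    assert((uintptr_t)x % 32 == 0);
    assert((uintptr_t)y % 32 == 0);

    #pragma omp simd aligned(x:32) aligned(y:32)
    for (int i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Passing a pointer that does not satisfy the stated alignment is a programming error; the assertions catch it in debug builds.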
2.4.3 – The safelen clause
If a safelen clause is specified, then no two iterations executed concurrently with SIMD instructions can have a greater distance in the logical iteration space than its value. The parameter of the safelen clause must be a constant positive integer expression. The number of iterations that are executed concurrently at any given time is implementation defined but guaranteed not to exceed the value specified in the safelen clause. A different SIMD lane will execute each concurrent iteration. Each set of concurrent iterations is a SIMD chunk.
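To make the role of safelen concrete, here is a sketch of a loop with a loop-carried dependence at distance 8. The function name shift_add and the macro DIST are hypothetical. Because at most 8 iterations run concurrently, every a[i - DIST] read refers to an element written by an earlier SIMD chunk or never written by the loop at all.

```c
#include <assert.h>
#include <stddef.h>

#define DIST 8 /* dependence distance, also used as the safelen value */

/* Each element depends on the one DIST positions before it. */
void shift_add(float *a, int n)
{
    assert(a != NULL);

    #pragma omp simd safelen(DIST)
    for (int i = DIST; i < n; i++) {
        a[i] = a[i - DIST] + 1.0f;
    }
}
```

Without the clause, the compiler would be allowed to pick a vector length larger than the dependence distance, which would give wrong results for this loop.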
2.4.4 – The simdlen clause
For a function annotated with declare simd, when a SIMD version is created, the number of concurrent elements packed for each argument of the function is determined by the vector length specified in the simdlen clause or, by default, is selected by the compiler for a given SIMD hardware. When the specified vector length is a multiple of the hardware SIMD length, the compiler may apply double-pumping, triple-pumping, or quad-pumping that emulates longer vectors by fusing multiple vector registers into a larger logical vector register. The parameter of the simdlen clause must be a constant positive integer expression. In practice, it should be a multiple of the hardware SIMD length. Otherwise, the number of elements packed for each argument of the function is implementation defined.
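A sketch of a SIMD function with an explicit vector length. The value 8 is an assumption for illustration: on hardware with 4-float registers it would correspond to the double-pumping case, and on hardware with 8-float registers to one register per argument. The names clamp01 and clamp_all are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Request a SIMD variant that processes 8 floats per call. */
#pragma omp declare simd simdlen(8) notinbranch
float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Clamp every element of v to the range [0, 1] in place. */
void clamp_all(float *v, int n)
{
    assert(v != NULL);

    #pragma omp simd
    for (int i = 0; i < n; i++) {
        v[i] = clamp01(v[i]);
    }
}
```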
2.4.5 – inbranch and notinbranch clauses
The inbranch clause indicates that a function will always be called from inside a conditional statement of a SIMD loop / function. The notinbranch clause indicates that a function will never be called from inside a conditional statement of a SIMD loop / function. If neither clause is specified, then the function may or may not be called from inside a conditional statement of a SIMD loop / function. By default, for every SIMD variant function declared, two implementations are provided: one especially suitable for conditional invocation (i.e., inbranch version) with a predicate, and another especially suitable for unconditional invocation (i.e., notinbranch version).
If all invocations are conditional, generation of the notinbranch version can be suppressed using the inbranch clause. Similarly, if all invocations are unconditional, generation of the inbranch version can be suppressed using the notinbranch clause. Suppressing either the inbranch or the notinbranch version of a SIMD function helps to reduce code size and compilation time. By default, both inbranch and notinbranch versions of vector variants have to be provided, since the compiler cannot determine whether the original scalar function is always called conditionally (or never).
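As a sketch of the conditional case, the hypothetical function recip below is only ever called under an if statement, so it is declared inbranch and only the masked variant is requested. The names recip and recip_nonzero are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Only called under a condition, so only the predicated (masked)
   SIMD variant is requested. */
#pragma omp declare simd inbranch
float recip(float x)
{
    return 1.0f / x;
}

/* y[i] = 1 / x[i] where x[i] is nonzero, and 0 elsewhere. */
void recip_nonzero(const float *x, float *y, int n)
{
    assert(x != NULL && y != NULL);

    #pragma omp simd
    for (int i = 0; i < n; i++) {
        if (x[i] != 0.0f) {
            y[i] = recip(x[i]); /* lanes where the test fails are masked off */
        } else {
            y[i] = 0.0f;
        }
    }
}
```

A function that is never called conditionally would use notinbranch instead, as in a helper invoked unconditionally from the loop body.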
© HPC Today 2024 - All rights reserved.