2.4 – Clauses for SIMD constructs
To refine the execution behavior of the SIMD constructs further, OpenMP provides clauses that let programmers write efficient vector programs by explicitly specifying data sharing, data movement, visibility, SIMD length, linearity, uniformity, and memory alignment.
2.4.1 – Data sharing clauses
The private, lastprivate, and reduction clauses control data privatization and sharing of variables for a SIMD execution context. The private clause creates an uninitialized vector for the given variables. For SIMD function arguments, by default, a parameter has a distinct storage location or value for each of its instances among hardware SIMD lanes, i.e., it is private. The lastprivate clause provides the same semantics as private but also copies the values produced by the last iteration out of the loop. The reduction clause creates a vector copy of the variable and horizontally aggregates the partial values of that vector into the original scalar variable.
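As a minimal sketch of the reduction and lastprivate clauses on a SIMD loop (the function scaled_sum and its parameters are hypothetical, chosen only for illustration; the code also compiles without OpenMP, in which case the pragma is ignored):

```c
/* Hypothetical helper: returns the sum of scale * x[i] and stores the value
 * computed by the last iteration in *last_out. Intended for n > 0. */
float scaled_sum(const float *x, int n, float scale, float *last_out)
{
    float sum = 0.0f;   /* reduction: per-lane partial sums, combined after the loop */
    float tmp = 0.0f;   /* lastprivate: private per lane, last iteration's value copied out */

    #pragma omp simd reduction(+:sum) lastprivate(tmp)
    for (int i = 0; i < n; i++) {
        tmp = scale * x[i];
        sum += tmp;
    }

    *last_out = tmp;
    return sum;
}
```

Because the reduction combines partial sums in an unspecified order, floating-point results may differ slightly from a strictly sequential sum for general inputs.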
2.4.1.1 – The uniform clause
A parameter that is specified with the uniform clause represents an invariant value for a chunk of concurrent invocations of the function in the execution of a single SIMD loop. It is effectively shared across the SIMD lanes of vector execution. Specifying function parameters that are shared across the SIMD lanes as uniform allows the vectorizer to generate optimized code for scalar (or unit-stride) memory loads (or stores), and optimal control flow. For instance, when a base address is uniform and the offset is a linear unit stride, the compiler can generate faster unit-stride vector memory load/store instructions (e.g., movups supported on Intel SIMD hardware) instead of generating gather/scatter instructions. Also, when a test condition for a control flow decision is based on a uniform quantity, the compiler can exploit the fact that all running code instances will follow the same path at that point to save the overhead of masking checks and control flow divergence.
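The uniform base address plus unit-stride offset case can be sketched as follows. The names load_biased and add_bias are hypothetical, and whether a given compiler actually emits a contiguous vector load here depends on the implementation:

```c
/* Hypothetical SIMD-enabled function: base and bias are the same for every
 * lane (uniform), while i advances by 1 from lane to lane (linear). */
#pragma omp declare simd uniform(base, bias) linear(i:1)
float load_biased(const float *base, int i, float bias)
{
    /* uniform base + unit-stride index: a candidate for a contiguous vector load */
    return base[i] + bias;
}

/* Calls the SIMD-enabled function from a SIMD loop. */
void add_bias(const float *src, float *dst, int n, float bias)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        dst[i] = load_biased(src, i, bias);
}
```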
2.4.1.2 – The linear clause
A variable (or a parameter) specified in a linear clause is made private to each iteration (or each SIMD lane) and has a linear relationship with respect to the iteration space of the SIMD execution context. A variable cannot appear in more than one linear clause, or in a linear clause and also in another OpenMP data clause. A linear-step can be specified for a variable in a linear clause. If specified, the linear-step expression must be invariant during the execution of the region associated with the construct. Otherwise, the execution results in unspecified behavior. If linear-step is not specified, it is assumed to be 1.
Under a SIMD loop context, the value of the linearized variable on each iteration of the associated loop(s) corresponds to the value of the original variable before entering the construct plus the logical number of the iteration times linear-step. The value corresponding to the sequentially last iteration of the associated loops is assigned to the original variable. If the associated code does not increase the variable by linear-step in each iteration of the loop, then the behavior is undefined.
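A small sketch of a linear variable in a SIMD loop, assuming a hypothetical function gather_even that copies every other element of an array; j is declared with a linear-step of 2 and is incremented by exactly that amount in every iteration:

```c
/* Hypothetical example: dst[i] = src[2 * i] for i in [0, n).
 * j equals its initial value (0) plus the iteration number times 2. */
void gather_even(const float *src, float *dst, int n)
{
    int j = 0;

    #pragma omp simd linear(j:2)
    for (int i = 0; i < n; i++) {
        dst[i] = src[j];
        j += 2;   /* must match the declared linear-step */
    }
}
```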
Under a SIMD function context, a parameter referenced in a linear-step must be the subject of a uniform clause. No parameter of a vector function can be the subject of more than one linear clause. For a linear parameter, if the corresponding argument values in consecutive iterations (in the serial version of the program) do not differ by linear-step, the behavior is undefined.
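The rule that a parameter used as a linear-step must itself be uniform can be illustrated with the following sketch. The names pick and strided_sum are hypothetical; the caller passes arguments that differ by exactly stride between consecutive iterations, as the linear clause requires:

```c
/* Hypothetical SIMD-enabled function: consecutive invocations receive
 * i, i + stride, i + 2 * stride, ...; stride is uniform across lanes. */
#pragma omp declare simd uniform(data, stride) linear(i:stride)
float pick(const float *data, int i, int stride)
{
    (void)stride;   /* only referenced by the linear clause above */
    return data[i];
}

/* Sums data[0], data[stride], data[2 * stride], ... (count elements). */
float strided_sum(const float *data, int count, int stride)
{
    float sum = 0.0f;

    #pragma omp simd reduction(+:sum)
    for (int k = 0; k < count; k++)
        sum += pick(data, k * stride, stride);   /* index grows by stride per iteration */

    return sum;
}
```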
2.4.2 – The aligned clause
Memory access alignment is important since most platforms can load (or store) aligned data much faster than unaligned data, especially SIMD (vector type) data. However, compilers often cannot detect the alignment properties of data across all modules of a program, or of dynamically allocated memory (or objects), so they must conservatively generate code that uses only unaligned loads / stores. The aligned(variable[:alignment] [,variable[:alignment]]) clause allows programmers to pass alignment information to the compiler, where the alignment is a number of bytes given as a constant positive integer value. For each variable in the list of the aligned clause, the programmer can specify an alignment value; if the optional alignment value is omitted, an implementation-defined default alignment for SIMD instructions on the target platform is assumed.
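A minimal sketch of the aligned clause, assuming a hypothetical kernel saxpy_aligned and a 64-byte alignment chosen for illustration. The clause is a promise made by the programmer: if the pointers are not actually aligned as stated, the behavior is undefined.

```c
/* Hypothetical kernel: the caller guarantees that a and b both start on
 * 64-byte boundaries, so the compiler may use aligned loads and stores. */
void saxpy_aligned(float *a, const float *b, float s, int n)
{
    #pragma omp simd aligned(a, b : 64)
    for (int i = 0; i < n; i++)
        a[i] += s * b[i];
}
```

Suitable buffers could be obtained, for example, from C11 aligned_alloc(64, bytes) with bytes a multiple of 64, or declared with _Alignas(64).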
2.4.3 – The safelen clause
If the safelen clause is specified, then no two iterations executed concurrently with SIMD instructions can have a greater distance in the logical iteration space than its value. The parameter of the safelen clause must be a constant positive integer expression. The number of iterations that are executed concurrently at any given time is implementation defined but guaranteed not to exceed the value specified in the safelen clause. A different SIMD lane executes each concurrent iteration. Each set of concurrent iterations is a SIMD chunk.
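The following sketch shows a loop with a loop-carried dependence where safelen bounds the SIMD chunk size. The function running_offset and the dependence distance of 8 are assumptions made for illustration only:

```c
/* Hypothetical example: each iteration reads the element written 8
 * iterations earlier, so no more than 8 iterations may execute together
 * in one SIMD chunk. safelen(8) states that limit to the compiler. */
void running_offset(float *b, int n)
{
    #pragma omp simd safelen(8)
    for (int i = 8; i < n; i++)
        b[i] = b[i - 8] + 1.0f;
}
```

Without the safelen clause, a plain simd directive would assert that any number of iterations can run concurrently, which would be incorrect for this loop on hardware with more than 8 lanes.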
2.4.4 – The simdlen clause
For a function annotated with declare simd, when a SIMD version is created, the number of concurrent elements packed for each argument of the function is determined by the vector length specified in the simdlen clause or, by default, is selected by the compiler for a given SIMD hardware. When the specified vector length is a multiple of the hardware SIMD length, the compiler may apply double-pumping, triple-pumping, or quad-pumping, which emulates longer vectors by fusing multiple vector registers into a larger logical vector register. The parameter of the simdlen clause must be a constant positive integer expression. In practice, it should be a multiple of the hardware SIMD length. Otherwise, the number of elements packed for each argument of the function is implementation defined.
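As a sketch, the hypothetical function square_plus_one below requests a SIMD variant that processes 8 elements per invocation. How those 8 elements map onto hardware registers (one wide register, or several narrower ones) is left to the compiler:

```c
/* Hypothetical SIMD-enabled function: asks for a variant with a vector
 * length of 8 arguments per call. */
#pragma omp declare simd simdlen(8)
float square_plus_one(float x)
{
    return x * x + 1.0f;
}

/* Calls the SIMD-enabled function from a SIMD loop. */
void apply(const float *in, float *out, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = square_plus_one(in[i]);
}
```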
The inbranch clause indicates that a function will always be called under conditions in the SIMD loop / function. The notinbranch clause indicates that a function will never be called under conditions of a SIMD loop / function. If neither clause is specified, then the function may or may not be called from inside a conditional statement of a SIMD loop / function. By default, for every SIMD variant function declared, two implementations are provided: one especially suitable for conditional invocation (i.e., the inbranch version) with a predicate, and another especially suitable for unconditional invocation (i.e., the notinbranch version).
If all invocations are conditional, generation of the notinbranch version can be suppressed using the inbranch clause. Similarly, if all invocations are unconditional, generation of the inbranch version can be suppressed using the notinbranch clause. Suppressing either the inbranch or the notinbranch version of a SIMD function helps to reduce code size and compilation time. By default, both inbranch and notinbranch versions of vector variants have to be provided, since the compiler cannot determine whether the original scalar function is always called under a condition (or never).
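The two clauses can be sketched together as follows. The functions halve, twice, and transform are hypothetical: twice is only ever called unconditionally and halve only under a condition, so each declaration requests just the variant it needs:

```c
/* Only called under a condition: request just the masked (inbranch) variant. */
#pragma omp declare simd inbranch
float halve(float x)
{
    return 0.5f * x;
}

/* Only called unconditionally: request just the unmasked (notinbranch) variant. */
#pragma omp declare simd notinbranch
float twice(float x)
{
    return 2.0f * x;
}

void transform(const float *in, float *out, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        float v = twice(in[i]);   /* unconditional call */
        if (v > 4.0f)
            v = halve(v);         /* conditional call */
        out[i] = v;
    }
}
```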
© HPC Today 2021 - All rights reserved.