Vectorization Directives
This topic describes the directives used for vectorization.
Because vector operations cannot be expressed directly in C or C++ source code, the compiler must be capable of transforming scalar operations into equivalent vector operations. The candidates for vectorization are operations in loops and assignments of structures. Compiler directives may be used to control vectorization.
concurrent
#pragma _CRI concurrent [safe_distance=n]
- n
- An integer that represents the number of additional consecutive loop iterations that can be executed in parallel without danger of data conflict. n must be an integer constant > 0. If safe_distance=n is not specified, the distance is assumed to be infinite, and the compiler ignores all cross-iteration dependencies. The concurrent directive is ignored if the safe_distance clause is used and vectorization is requested on the command line.
The concurrent directive indicates that no data dependence exists between array references in different iterations of the loop. This directive affects the loop that immediately follows it. This can be useful for vectorization optimizations.
concurrent Directive
The concurrent directive indicates that the relationship k>3 is true. The compiler will safely load all the array references x[i-k], x[i-k+1], x[i-k+2], x[i-k+3] during loop iteration i.
#pragma _CRI concurrent safe_distance=3
for (i = k + 1; i < n;i++) {
x[i] = a[i] + x[i-k];
}
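Why a safe distance of 3 is sufficient when k > 3 can be checked with a small sketch. The helper names update_scalar and update_chunked are hypothetical and are not part of the Cray documentation. The chunked version models a 4-wide vector operation (iteration i plus 3 additional iterations) in plain C by performing every load of a chunk before any store. It illustrates the dependence argument only and does not show how the compiler generates code.

```c
/* Reference: the loop exactly as written, one iteration at a time. */
void update_scalar(double *x, const double *a, int n, int k)
{
    for (int i = k + 1; i < n; i++) {
        x[i] = a[i] + x[i - k];
    }
}

/* Hypothetical model of a 4-wide vector execution: within each chunk,
 * all loads are performed before any store. */
void update_chunked(double *x, const double *a, int n, int k)
{
    for (int i = k + 1; i < n; i += 4) {
        double tmp[4];
        int len = (n - i < 4) ? n - i : 4;
        for (int j = 0; j < len; j++) {   /* "vector load" and add */
            tmp[j] = a[i + j] + x[i + j - k];
        }
        for (int j = 0; j < len; j++) {   /* "vector store" */
            x[i + j] = tmp[j];
        }
    }
}
```

When k > 3, every element loaded by a chunk lies before the first element that the chunk stores, so both functions produce the same result. When k is 3 or less, a chunk reads elements that the scalar loop would already have updated, and the results can differ.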
hand_tuned
#pragma _CRI hand_tuned
Assert that the code in the next loop nest has been arranged by hand for maximum performance, and the compiler should restrict some of the more aggressive automatic expression rewrites. The compiler should still fully optimize and vectorize the loop within the constraints of the directive. The hand_tuned directive applies to the next loop in the same manner as the concurrent and safe_address directives.
Use of this directive may severely impede performance. Use carefully and evaluate performance before and after employing this directive.
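A minimal sketch of where the directive might be placed, assuming a loop that has already been unrolled by hand. The function name dot4 and the requirement that n be a multiple of 4 are assumptions of this example and do not come from the Cray documentation. Compilers that do not recognize the pragma ignore it.

```c
/* Hypothetical hand-unrolled sum of products.
 * Assumes n is a multiple of 4. */
double dot4(const double *a, const double *b, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

    /* Ask the compiler to keep the hand-arranged expression order. */
#pragma _CRI hand_tuned
    for (int i = 0; i < n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}
```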
ivdep
#pragma _CRI ivdep [ SAFEVL=vlen | INFINITEVL ]
- vlen
- Specifies a vector length in which no dependency will occur. vlen must be an integer between 1 and 1024 inclusive.
- INFINITEVL
- Specifies an infinite safe vector length. No dependency will occur at any vector length.
When the ivdep directive appears before a loop, the compiler ignores vector dependencies, including explicit dependencies, in any attempt to vectorize the loop. ivdep applies only to the first for loop or while loop that follows the directive within the same program unit.
If no vector length is specified, the vector length used is infinity.
If a loop with an ivdep directive is enclosed within another loop with an ivdep directive, the ivdep directive on the outer loop is ignored. When the Cray compiler vectorizes a loop, it may reorder the statements in the source code to remove vector dependencies. When ivdep is specified, the statements in the loop or array syntax statement are assumed to contain no dependencies as written, and the Cray compiler does not reorder loop statements.
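The following sketch shows two loops for which ivdep might be appropriate. The function names and the stated guarantees on k are assumptions of this example. In both cases the compiler cannot prove the property from the source alone, so the caller is responsible for it. Compilers that do not recognize the pragma ignore it.

```c
/* Assumes the caller guarantees k >= 0 and that x holds at least n + k
 * elements. Each iteration then reads an element that no earlier
 * iteration has written, so no vector length is unsafe. */
void shift_add(double *x, const double *b, int n, int k)
{
#pragma _CRI ivdep
    for (int i = 0; i < n; i++) {
        x[i] = x[i + k] + b[i];
    }
}

/* Assumes the caller guarantees k >= 8. Iteration i reads the element
 * written k iterations earlier, so vectors of up to 8 iterations never
 * read an element written within the same vector. */
void recur_add(double *x, const double *b, int n, int k)
{
#pragma _CRI ivdep SAFEVL=8
    for (int i = k; i < n; i++) {
        x[i] = x[i - k] + b[i];
    }
}
```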
loop_info
#pragma _CRI loop_info prefer_thread
#pragma _CRI loop_info prefer_nothread
#pragma _CRI loop_info [min_trips(c)] [est_trips(c)] [max_trips(c)] [cache( symbol[,symbol ...] )] [cache_nt( symbol[,symbol ...] )] [prefetch] [noprefetch]
- c
- An expression that evaluates to an integer constant at compilation time.
- min_trips
- Specifies guaranteed minimum number of trips.
- est_trips
- Specifies estimated or average number of trips.
- max_trips
- Specifies guaranteed maximum number of trips.
- cache
- Specifies that symbol is to be allocated in cache; this is the default if no hint is specified and the cache_nt directive is not specified.
- cache_nt
- Specifies that symbol is to use non-temporal reads and writes.
- prefetch
- Specifies a preference that prefetches be performed for the following loop.
- noprefetch
- Specifies a preference that no prefetches be performed for the following loop.
- symbol
- The base name of the object that should not be placed into the cache. This can be the base name of any object (such as an array or scalar structure) without member references like C[10]. If specifying a pointer in the list, only the references, not the pointer itself, have the no cache allocate property.
The loop_info directive allows additional information to be specified about the behavior of a loop, including run time trip count, hints on cache allocation strategy, and threading preference. The loop_info directive provides information to the optimizer and can produce faster code sequences.
Use loop_info immediately before a for loop to indicate the minimum, maximum, or estimated trip count. The compiler diagnoses misuse at compile time when it can, or at run time when the -h dir_check option is specified.
For cache allocation hints, use the loop_info directive to override default settings, cache or cache_nt directives, or override automatic cache management decisions. The cache hints are local and apply only to the specified loop nest.
Use the loop_info prefer_thread directive to indicate the preference that the loop following the directive be threaded. The loop_info prefer_nothread indicates the preference that the loop following the directive should not be threaded.
loop_info Directive
The minimum trip count is 1 and the maximum trip count is 1000.
void
loop_info( double *restrict a, double *restrict b, double s1, int n )
{
int i;
#pragma _CRI loop_info min_trips(1) max_trips(1000), cache_nt(b)
for (i = 0; i< n; i++) {
if(a[i] != 0.0) {
a[i] = a[i] + b[i]*s1;
}
}
}
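The remaining forms of the directive could be used along the following lines. This is a sketch only: the function names and the trip count of 4096 are assumptions chosen for illustration, and whether either hint helps depends on the actual workload. Compilers that do not recognize the pragma ignore it.

```c
/* Hypothetical long-running loop: an estimated trip count and a
 * preference for prefetching are passed to the optimizer. */
void scale(double *restrict y, const double *restrict x, double s, int n)
{
#pragma _CRI loop_info est_trips(4096) prefetch
    for (int i = 0; i < n; i++) {
        y[i] = s * x[i];
    }
}

/* Hypothetical short loop: threading overhead is assumed to outweigh
 * any gain, so a preference against threading is stated. */
double sum_small(const double *x, int n)
{
    double s = 0.0;
#pragma _CRI loop_info prefer_nothread
    for (int i = 0; i < n; i++) {
        s += x[i];
    }
    return s;
}
```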
nopattern
#pragma _CRI nopattern
The nopattern directive disables pattern matching for the loop immediately following the directive. By default, the compiler detects coding patterns in source code sequences and replaces these sequences with calls to optimized library functions. In most cases, this replacement improves performance. There are cases, however, in which this substitution degrades performance. This can occur, for example, in loops with very low trip counts. In such a case, use the nopattern directive to disable pattern matching and cause the compiler to generate inline code.
nopattern Directive
Placing the nopattern directive in front of the outer loop of a nested loop turns off pattern matching for the matrix multiply that takes place inside the inner loop.
double a[100][100], b[100][100], c[100][100];
void nopat(int n)
{
int i, j, k;
#pragma _CRI nopattern
for (i=0; i < n; ++i) {
for (j = 0; j < n; ++j) {
for (k = 0; k < n; ++k) {
c[i][j] += a[i][k] * b[k][j];
}
}
}
}
[no]vector
#pragma _CRI novector
#pragma _CRI vector [clause[, clause] ... ]
- always
- Vectorize the loop that immediately follows the directive. This directive states a vectorization preference and does not guarantee that the loop has no memory-dependence hazard. This directive has the same effect as the prefervector directive.
- aligned
- Directs the compiler to generate aligned data movement instructions for array references when vectorizing. For current Intel processors, data alignment is necessary for efficient vectorization. Use with care to improve performance. If some of the access patterns are actually unaligned, using the aligned clause may generate incorrect code. This directive also directs the compiler to ignore explicit and implicit vector dependencies.
- unaligned
- Directs the compiler to generate unaligned data movement instructions for all array references when vectorizing.
The novector directive suppresses compiler attempts to vectorize loops and array syntax statements. It overrides any other vectorization-related directives, as well as the -h vector and -O vectorn command line options. These directives are ignored if vectorization or scalar optimization has been disabled.
In C/C++, the novector directive applies only to the following loop. When applied to an outer loop in a nest, the directive also applies to all inner loops. After a vector directive is specified, automatic vectorization is enabled for all loop nests.
novector Directive
#pragma _CRI novector
for (i = 0; i < h; i++) { /* Loop not vectorized */
a[i] = b[i] + c[i];
}
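The vector directive with the always clause could be used as in the following sketch. The function name gather_scale and its parameters are assumptions of this example. The directive states a preference only, so the loop must still be free of memory-dependence hazards; here that is expressed with restrict-qualified pointers. Compilers that do not recognize the pragma ignore it.

```c
/* Hypothetical indirect-load loop that the compiler might otherwise
 * judge not worth vectorizing. Assumes 0 <= idx[i] for valid elements
 * of in, and that out does not overlap in or idx. */
void gather_scale(double *restrict out, const double *restrict in,
                  const int *restrict idx, int n)
{
#pragma _CRI vector always
    for (int i = 0; i < n; i++) {
        out[i] = 2.0 * in[idx[i]];
    }
}
```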
permutation
#pragma _CRI permutation symbol [, symbol ] ...
Specifies that an integer array has no repeated values. This directive is useful when the integer array is used as a subscript for another array (vector-valued subscript). This directive may improve code performance.
In a sequence of array accesses that read array element values from the specified symbols with no intervening accesses that modify the array element values, each of the accessed elements will have a distinct value.
When an array with a vector-valued subscript appears on the left side of the equal sign in a loop, many-to-one assignment is possible. Many-to-one assignment occurs if any repeated elements exist in the subscripting array. If it is known that the integer array is used merely to permute the elements of the subscripted array, it can often be determined that many-to-one assignment does not exist with that array reference.
permutation Directive
Sometimes a vector-valued subscript is used as a means of indirect addressing because the elements of interest in an array are sparsely distributed; in this case, an integer array is used to select only the desired elements, and no repeated elements exist in the integer array. The permutation directive does not apply to the array a. Rather, it applies to the pointer used to index into it, ipnt. By knowing that ipnt is a permutation, the compiler can safely generate an unordered scatter for the write to a.
int *ipnt;
#pragma _CRI permutation ipnt
...
for ( i = 0; i < N; i++ ) {
a[ipnt[i]] = b[i] + c[i];
}
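The effect of repeated index values can be illustrated with a small sketch. The helper names scatter and has_no_repeats are hypothetical and are not part of any Cray interface. The scatter function performs the indexed store in either forward or reverse iteration order, which stands in for the unordered scatter a vectorizing compiler may generate.

```c
/* Store src[i] to dst[idx[i]], visiting iterations in forward order
 * (reverse == 0) or reverse order (reverse != 0). */
void scatter(double *dst, const int *idx, const double *src,
             int n, int reverse)
{
    for (int t = 0; t < n; t++) {
        int i = reverse ? n - 1 - t : t;
        dst[idx[i]] = src[i];
    }
}

/* Return 1 if idx[0..n-1] contains no repeated value, which is the
 * property the permutation directive asserts; return 0 otherwise. */
int has_no_repeats(const int *idx, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (idx[i] == idx[j]) {
                return 0;
            }
        }
    }
    return 1;
}
```

When the index array has no repeated values, the result is the same in either order. With a repeated value, the final content of the doubly written element depends on the order of the stores, which is why the compiler must otherwise preserve the original order.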
[no]pipeline
#pragma _CRI pipeline
#pragma _CRI nopipeline
Software-based vector pipelining (software vector pipelining) provides additional optimization beyond the normal hardware-based vector pipelining. In software vector pipelining, the compiler analyzes all vector loops and automatically attempts to pipeline a loop if doing so can be expected to produce a significant performance gain. This optimization also performs any necessary loop unrolling.
In some cases the compiler either does not pipeline a loop that could be pipelined or pipelines a loop without producing performance gains. In these situations, use the pipeline or nopipeline directive to advise the compiler to pipeline or not pipeline the loop immediately following the directive.
Software vector pipelining is valid only for the innermost loop of a loop nest. These directives are advisory only. While the nopipeline directive can be used to inhibit automatic pipelining, and the pipeline directive can be used to attempt to override the compiler's decision not to pipeline a loop, the compiler cannot be forced to pipeline a loop that cannot be pipelined.
Loops that have been pipelined are so noted in loopmark listing messages.
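Because software vector pipelining is valid only for the innermost loop, the directive would be placed directly before that loop, as in the following sketch. The function name axpy_rows and the row-major flat array layout are assumptions of this example. Compilers that do not recognize the pragma ignore it.

```c
/* Hypothetical update of a rows x cols matrix stored in row-major order.
 * The pipeline request applies to the innermost loop only. */
void axpy_rows(double *restrict y, const double *restrict x,
               double s, int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
#pragma _CRI pipeline
        for (int c = 0; c < cols; c++) {
            y[r * cols + c] += s * x[r * cols + c];
        }
    }
}
```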
prefervector
#pragma _CRI prefervector
Directs the compiler to vectorize the loop immediately following the directive if the loop contains more than one loop in the nest that can be vectorized. The directive states a vectorization preference and does not guarantee that the loop has no memory-dependence hazard.
prefervector Directive
Both loops can be vectorized, but the directive directs the compiler to vectorize the outer for loop. Without the directive and without any knowledge of n and m, the compiler would vectorize the inner loop.
float a[1000], b[100][1000];
void
f(int m, int n)
{
int i, j;
#pragma _CRI prefervector
for (i = 0; i < n; i++) {
for (j = 0; j < m; j++) {
a[i] += b[j][i];
}
}
}
pgo loop_info
#pragma _CRI pgo loop_info
Enables profile-guided optimizations by tagging loopmark information as having come from profiling. For information about CrayPat and profile information, see the Cray Performance Measurement and Analysis Tools User Guide.
safe_address
#pragma _CRI safe_address
Specifies that it is safe to speculatively execute memory references within all conditional branches of a loop; these memory references can be safely executed in each iteration of the loop. For most code, this directive can improve performance significantly by preloading vector expressions. However, most loops do not require this directive to have preloading performed. safe_address is required only when the safety of the operation cannot be determined or index expressions are very complicated.
The safe_address directive is an advisory directive. That is, the compiler may override the directive if it determines the directive is not beneficial. If the directive is not used on a loop and the compiler determines that the loop would benefit from it, the compiler issues a message indicating such. The message is similar to this:
CC-6375 cc: VECTOR File = ctest.c, Line = 6
A loop would benefit from "#pragma safe_address"
If the directive is used on a loop and the compiler determines that the loop does not benefit from it, the compiler issues a message that states the directive is superfluous and can be removed.
To see the messages, use the -h report=v or -h msgs option.
Incorrect use of the directive can result in segmentation faults, bus errors, or excessive page faulting. However, it should not result in incorrect answers. Incorrect usage can result in very severe performance degradations or program aborts.
safe_address directive
In this example, without the directive the compiler will not preload vector expressions, because the value of j is unknown. However, if it is known that references to b[j][i] are safe to evaluate for all iterations of the loop, regardless of the condition, the safe_address directive can be used. With the directive, the compiler can safely load b[j][i] as a vector, merge 0.0 where the condition is true, and store the resulting vector safely.
double b[1000][1000]; /* assumed declaration; dimensions are illustrative */
void x3( double a[restrict 1000], int j )
{
int i;
#pragma _CRI safe_address
for ( i = 0; i < 1000; i++ ) {
if ( a[i] != 0.0 ) {
b[j][i] = 0.0;
}
}
}
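The load, merge, and store sequence described for this example can be modeled in scalar C. The function names clear_conditional and clear_speculative are hypothetical. The second function only illustrates the memory access pattern that the directive permits, and is not the code the compiler generates.

```c
/* As written in the source: b_row[i] is touched only where a[i] != 0.0. */
void clear_conditional(const double *a, double *b_row, int n)
{
    for (int i = 0; i < n; i++) {
        if (a[i] != 0.0) {
            b_row[i] = 0.0;
        }
    }
}

/* Model of the speculative form: every b_row[i] is loaded and stored
 * unconditionally, so every address b_row[0..n-1] must be valid. */
void clear_speculative(const double *a, double *b_row, int n)
{
    for (int i = 0; i < n; i++) {
        double old = b_row[i];                 /* unconditional load  */
        b_row[i] = (a[i] != 0.0) ? 0.0 : old;  /* merge, then store   */
    }
}
```

Both functions leave the same values in b_row. The difference is that the speculative form accesses every element, which is why incorrect use of the directive can cause faults without producing incorrect answers.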
safe_conditional
#pragma _CRI safe_conditional
Specifies that it is safe to execute all memory references and arithmetic operations within all conditional branches of the subsequent scalar or vector loop nest. It can improve performance by allowing the hoisting of invariant expressions from conditional code and by allowing prefetching of memory references.
The safe_conditional directive is an advisory directive. The compiler may override the directive if it determines the directive is not beneficial.
Incorrect use of the directive can result in segmentation faults, bus errors, or excessive page faulting. However, it should not result in incorrect answers. Incorrect usage can result in very severe performance degradations or program aborts.
safe_conditional directive
In this example, without the safe_conditional directive, the compiler cannot precompute the invariant expression s1*s2 because their values are unknown and may cause an arithmetic trap if executed unconditionally. However, if the condition is known to be true at least once, then s1*s2 is safe to speculatively execute. The safe_conditional compiler directive can be used to imply the safety of the operation. With the directive, the compiler evaluates s1*s2 outside of the loop, rather than under control of the conditional code. In addition, all control flow is removed from the body of the vector loop, because s1*s2 no longer poses a safety risk.
void
safe_cond( double a[restrict 1000], double s1, double s2 )
{
int i;
#pragma _CRI safe_conditional
for (i = 0; i< 1000; i++) {
if( a[i] != 0.0) {
a[i] = a[i] + s1*s2;
}
}
}
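The hoisting that the directive permits can be written out by hand, as in the following sketch. The function names add_product_guarded and add_product_hoisted are hypothetical. The hoisted form is a scalar model of the transformation described above, not the code the compiler generates.

```c
/* As written in the source: s1 * s2 is evaluated only where a[i] != 0.0. */
void add_product_guarded(double *a, double s1, double s2, int n)
{
    for (int i = 0; i < n; i++) {
        if (a[i] != 0.0) {
            a[i] = a[i] + s1 * s2;
        }
    }
}

/* Model of the hoisted form: s1 * s2 is evaluated once, outside the loop,
 * even if no element of a is non-zero. The branch becomes a select. */
void add_product_hoisted(double *a, double s1, double s2, int n)
{
    const double t = s1 * s2;
    for (int i = 0; i < n; i++) {
        a[i] = (a[i] != 0.0) ? a[i] + t : a[i];
    }
}
```

The two functions compute the same array values. They differ only in whether the multiplication can execute when the condition is never true, which is the case the directive declares to be safe.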