General Optimization Options

Detailed descriptions of general optimization options.

-h [no]add_paren

Default: -h noadd_paren

The -h add_paren option automatically adds parentheses to select associative operations (+, -, *) to encourage left-to-right evaluation of floating-point and complex expressions. For more information, see the crayftn(1) man page. Left-to-right evaluation is not required by the language standards, but some applications may expect it.
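
Evaluation order matters because floating-point addition is not associative. The following is a minimal sketch, not taken from the compiler documentation; the program and variable names are illustrative, and the volatile attribute is used only to discourage compile-time folding of the sums.

```fortran
program add_paren_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! volatile discourages the compiler from folding the sums at compile time
  real(real64), volatile :: a, b, c
  real(real64) :: left_to_right, right_first

  a =  1.0e16_real64
  b = -1.0e16_real64
  c =  1.0_real64

  left_to_right = (a + b) + c   ! the grouping that left-to-right evaluation gives
  right_first   = a + (b + c)   ! a grouping that is mathematically equal

  print *, '(a + b) + c =', left_to_right
  print *, 'a + (b + c) =', right_first
end program add_paren_demo
```

With IEEE 754 double precision and round-to-nearest, the two groupings should print different values, because b + c rounds back to -1.0e16 before a is added.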

-h [no]aggress, -O [no]aggress

Default: noaggress

Provides greater opportunity to optimize loops that would otherwise be inhibited from optimization by an internal compiler size limitation. noaggress leaves this size limitation in effect. With aggress, internal compiler tables are expanded to accommodate larger loop bodies. This option can increase compilation time and memory use.

-h alias=`mode`

Default: -halias=default

The values for mode are:

none: The compiler assumes that the program contains no aliases; i.e. any given object is accessed through exactly one mechanism. For example, an object may be consistently accessed directly or via a unique pointer, but the object may not be accessed via multiple pointers. This mode may improve optimization, but use it with caution; if aliases are present, incorrect code may be generated.
default: The compiler infers either the c or fortran alias analysis mode from the source language. C++ uses the same mode as C, with the pointer aliasing rules additionally applying to references.
c: The compiler assumes that aliases may exist as allowed by the C language. Briefly, two pointers of similar types (e.g., int and unsigned int) may be used to access the same object, but pointers of different types may not be used to access the same object. There are two exceptions to this rule. First, a pointer to char (char*) may be used to access an object of any type. Second, a structure field may be accessed via a pointer to its type. Accesses to a union are permitted to employ type-punning (i.e., reading from a different field than the one most recently written), provided that such accesses are made only through the union itself and not through pointers to its members.
fortran: The compiler assumes that aliases may exist as allowed by the Fortran language. In particular, the compiler assumes that arguments never alias. This mode may be specified for C programs which behave like Fortran to mimic the effect of applying the restrict qualifier to all pointer parameter types.
tolerant: The compiler assumes that the program is not well-behaved with respect to pointers, such that it may use a pointer to access an object of a completely different type (e.g., using a pointer to long to access a double). This mode activates an extremely conservative alias analysis that may reduce optimization.
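
The fortran mode's assumption that arguments never alias can be illustrated with a small sketch. The module, subroutine, and variable names below are hypothetical. The commented-out call shows the kind of overlapping arguments that the Fortran rules do not permit when one of them is modified.

```fortran
module alias_demo
  implicit none
contains
  ! Fortran's argument rules let the compiler assume that x, which is
  ! modified here, does not overlap y.
  subroutine add_into(n, x, y)
    integer, intent(in)    :: n
    real,    intent(inout) :: x(n)
    real,    intent(in)    :: y(n)
    integer :: i
    do i = 1, n
      x(i) = x(i) + y(i)
    end do
  end subroutine add_into
end module alias_demo

program alias_main
  use alias_demo
  implicit none
  real :: a(8), b(8)

  a = 1.0
  b = 2.0
  call add_into(8, a, b)            ! conforming: a and b are distinct arrays
  if (any(a /= 3.0)) error stop 'unexpected result'
  print *, 'a(1) =', a(1)

  ! Not conforming, because the actual arguments overlap and x is modified:
  !   call add_into(7, a(2:8), a(1:7))
end program alias_main
```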

-h [no]autoprefetch, -O [no]autoprefetch

Default: autoprefetch

This option controls autoprefetch optimization. This option does not affect the loop_info noprefetch or prefetch directives.

-h [no]autothread, -O [no]autothread

Default: noautothread

The -h autothread option enables autothreading.

This is identical to the -O autothread option and is provided for command-line compatibility between the Cray C and Fortran compilers.

-h display_opt

The -h display_opt option displays the compiler optimization settings currently in force.

This option is identical to the -eo option and is provided for command-line compatibility between the Cray C and Fortran compilers.

-h flex_mp=`level`

Default: -h flex_mp=default

The -h flex_mp=level option controls the aggressiveness of optimizations that may affect the repeatability of floating-point and complex results. Use it when an application requires identical results as the number of ranks or threads varies.

The values for level are:

intolerant: Has the highest probability of repeatable results, but also the highest performance penalty.
strict: Uses some safe optimizations and yields higher performance than intolerant, with a high probability of repeatable results.
conservative: Uses more aggressive optimization and yields higher performance than strict, but results may not be sufficiently repeatable for some applications.
default: Uses more aggressive optimization and yields higher performance than conservative, but results may not be sufficiently repeatable for some applications.
tolerant: Uses the most aggressive optimization and yields the highest performance, but results may not be sufficiently repeatable for some applications.
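
One reason results can vary with the number of threads is that a parallel reduction combines partial sums, and the grouping of those sums changes with the thread count. The sketch below models this serially. The chunked_sum function is a hypothetical stand-in for a threaded reduction and not something the compiler provides.

```fortran
program reduction_order
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  integer, parameter :: n = 100000
  real(real32) :: x(n)
  integer :: i

  do i = 1, n
    x(i) = 1.0_real32 / real(i, real32)
  end do

  ! Each call sums the same data but combines a different number of partial sums.
  print *, '1 chunk :', chunked_sum(x, 1)
  print *, '4 chunks:', chunked_sum(x, 4)
  print *, '7 chunks:', chunked_sum(x, 7)

contains

  function chunked_sum(v, nchunks) result(total)
    real(real32), intent(in) :: v(:)
    integer,      intent(in) :: nchunks
    real(real32) :: total, partial
    integer :: c, first, last, j, chunk_len

    chunk_len = (size(v) + nchunks - 1) / nchunks   ! ceiling division
    total = 0.0_real32
    do c = 1, nchunks
      first = (c - 1) * chunk_len + 1
      last  = min(c * chunk_len, size(v))
      partial = 0.0_real32
      do j = first, last
        partial = partial + v(j)
      end do
      total = total + partial
    end do
  end function chunked_sum

end program reduction_order
```

The printed sums may differ in their last digits. An application that cannot tolerate such differences is the kind that would need one of the stricter levels.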

-h fusion`n`, -O fusion`n`

Default: fusion2

Loop fusion can improve the performance of loops, although in rare cases it may degrade performance. The n argument turns loop fusion on or off and determines where fusion should occur.

Loop fusion is disabled when n is set to 0.

The values for n are:

0: No fusion. Ignore all fusion directives and do not attempt to fuse other loops.
1: Attempt to fuse loops that are marked by the fusion directive.
2: Attempt to fuse all loops (including loops implied by array syntax), except those marked with the nofusion directive.
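
As a rough illustration of what fusion means, the sketch below writes two adjacent loops and then the same work fused by hand into one loop. It shows the transformation only and does not describe how the compiler decides to apply it; all names are illustrative.

```fortran
program fusion_demo
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n), a_fused(n), c_fused(n)
  integer :: i

  call random_number(b)

  ! Two separate loops over the same iteration space
  do i = 1, n
    a(i) = b(i) * 2.0
  end do
  do i = 1, n
    c(i) = a(i) + b(i)
  end do

  ! The same work fused into a single loop: b(i) is read once per
  ! iteration and a_fused(i) is reused while it is still at hand
  do i = 1, n
    a_fused(i) = b(i) * 2.0
    c_fused(i) = a_fused(i) + b(i)
  end do

  if (any(a /= a_fused) .or. any(c /= c_fused)) error stop 'fused loop changed the result'
  print *, 'fused and unfused loops agree'
end program fusion_demo
```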

-h loop_trips=[tiny | small | medium | large | huge], -O loop_trips=[tiny | small | medium | large | huge]

Specifies runtime loop trip counts for all loops in a compiled source file. This information is used to optimize the runtime characteristics of the application.

-h [no]msgs, -O [no]msgs

Default: nomsgs

The -h msgs option causes the compiler to write optimization messages to stderr.

This option is identical to the -O msgs option and is provided for command-line compatibility with the Cray C compiler.

-h [no]negmsgs, -O [no]negmsgs

Default: nonegmsgs

The -h negmsgs option causes the compiler to write messages to stderr that indicate why optimizations such as vectorization, inlining, or cloning did not occur. The -h negmsgs option enables the -h msgs option. The -h list=a option enables the -h negmsgs option.

This option is identical to the -O negmsgs option and is provided for command-line compatibility with the Cray C compiler.

-h [no]omp_trace

Default: -h noomp_trace

The -h omp_trace option turns on the insertion of CrayPat OpenMP tracing calls.

-h [no]overindex, -O [no]overindex

Default: nooverindex

The overindex option declares that there are array subscripts that index a dimension of an array that is outside the declared bounds of that array. The nooverindex option declares that there are no array subscripts that index a dimension of an array that is outside the declared bounds of that array.

-h [no]pattern, -O [no]pattern

Default: pattern

Globally enables pattern matching. When the compiler recognizes certain patterns in the source code, it replaces the construct with a call to an optimized library routine. A loop or statement that has been pattern matched and replaced with a call to a library routine is indicated with an A in the loopmark listing. The nopattern option globally disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives.

Pattern matching is not always worthwhile. If there is a small amount of work in the pattern-matched construct, the call overhead may outweigh the time saved by using the optimized library routine. When compiling using the default optimization settings, the compiler attempts to determine whether each given candidate for pattern matching will in fact yield improved performance.
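
A hand-written matrix multiplication loop nest is a typical example of a construct for which an optimized library routine exists. The sketch below assumes, without confirmation from the text above, that such a nest is a candidate for pattern matching. It compares the loop nest with the MATMUL intrinsic to show that the two compute the same product.

```fortran
program pattern_demo
  implicit none
  integer, parameter :: n = 64
  real :: a(n, n), b(n, n), c(n, n)
  real :: max_diff
  integer :: i, j, k

  call random_number(a)
  call random_number(b)

  ! Hand-written matrix multiply: a candidate for replacement by a library call
  c = 0.0
  do j = 1, n
    do k = 1, n
      do i = 1, n
        c(i, j) = c(i, j) + a(i, k) * b(k, j)
      end do
    end do
  end do

  ! Small differences are expected because the summation order may differ
  max_diff = maxval(abs(c - matmul(a, b)))
  if (max_diff > 1.0e-3) error stop 'loop nest and MATMUL disagree'
  print *, 'max difference from MATMUL:', max_diff
end program pattern_demo
```

The loopmark listing mentioned above is where to check whether a given loop nest was actually replaced.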

-h pl=`program_library`

Create and use a persistent repository of compiler information specified by program_library. When used with -hwp, this option provides application-wide, cross-file, automatic inlining. The program_library repository is implemented as a directory, and the information it contains is built up with each compiler invocation. Any compilation that does not specify the -hpl option does not add information to this repository.

Because program_library persists between builds, it is the user's responsibility to manage it. Because it is a directory, use rm -r to remove it; for example, rm -r program_library might be added to the make clean target in an application makefile.

If an application makefile creates files in multiple directories during a single build, program_library should be an absolute path; otherwise, multiple incomplete program library repositories are created. For example, avoid -hpl=./PL.1 and use -hpl=/fullpath/builddir/PL.1 instead.

-h shortcircuit`n`, -O shortcircuit`n`

Default: shortcircuit2

Specify various levels of short circuit evaluation. Short circuit evaluation is an optimization in which the compiler analyzes all or part of a logical expression based on the results of a preliminary analysis. When short circuiting is enabled, the compiler attempts short circuit evaluation of logical expressions that are used in IF statement scalar logical expressions. This evaluation is performed on the .AND. operator and the .OR. operator.

Consider operand1 .OR. operand2. If operand1 is true, operand2 need not be evaluated, because the entire expression evaluates to true. Likewise, if operand2 is true, operand1 need not be evaluated.

The compiler performs short circuit evaluation in a variety of ways, based on the following command-line options:

-O shortcircuit0 disables short circuiting of IF and ELSEIF statement logical conditions.
-O shortcircuit1 specifies short circuiting of IF and ELSEIF logical conditions only when a PRESENT, ALLOCATED, or ASSOCIATED intrinsic procedure is in the condition.
The short circuiting is performed left to right. In other words, the left operand is evaluated first, and if it determines the value of the operation, the right operand is not evaluated. The following code segment shows how this option could be used:
```
SUBROUTINE SUB(A)
INTEGER,OPTIONAL::A
IF (PRESENT(A) .AND. A==0) THEN
...
```
The expression A==0 must not be evaluated if A is not PRESENT. The short circuiting performed when -O shortcircuit1 is in effect causes the evaluation of PRESENT(A) first. If that is false, A==0 is not evaluated. If -O shortcircuit1 is in effect, the preceding example is equivalent to the following example:
```
SUBROUTINE SUB(A)
INTEGER,OPTIONAL::A
IF (PRESENT(A)) THEN
   IF (A==0) THEN
   ...
```
-O shortcircuit2 specifies short circuiting of IF and ELSEIF logical conditions, performed left to right. All .AND. and .OR. operators in these expressions are evaluated in this way. The left operand is evaluated, and if it determines the result of the operation, the right operand is not evaluated. This is the default for all CPU targets other than mic-knl.
-O shortcircuit3 specifies short circuiting of IF and ELSEIF logical conditions in an attempt to avoid making function calls. When this option is in effect, the left and right operands of .AND. and .OR. operators are examined to determine whether either contains function calls. If either operand contains function calls, short circuit evaluation is performed. The operand that has fewer calls is evaluated first, and if it determines the result of the operation, the remaining operand is not evaluated. If neither operand contains calls, no short circuiting is done. In the following example, the right operand of .OR. is evaluated first. If A==0, then ifunc() is not called:
```
IF (ifunc() == 0 .OR. A==0) THEN
...
```
This is the default if the CPU target is mic-knl.
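
The Fortran standard does not guarantee short circuit evaluation, so code that must behave the same under every shortcircuit level can use the nested form of the PRESENT test. The following is a complete, runnable sketch of that form; the module and function names are illustrative.

```fortran
module optional_demo
  implicit none
contains
  ! Returns .true. only when a is present and equal to zero.
  logical function is_zero(a)
    integer, optional, intent(in) :: a
    is_zero = .false.
    if (present(a)) then      ! a is referenced only after PRESENT succeeds
      if (a == 0) is_zero = .true.
    end if
  end function is_zero
end module optional_demo

program optional_main
  use optional_demo
  implicit none

  if (is_zero())       error stop 'absent argument reported as zero'
  if (.not. is_zero(0)) error stop 'zero argument not detected'
  if (is_zero(5))      error stop 'nonzero argument reported as zero'
  print *, 'nested PRESENT test behaves as expected'
end program optional_main
```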

-h profile_generate

The -h profile_generate option directs that the source code be instrumented for gathering profile information. The compiler inserts calls and data-gathering instructions to allow CrayPat to gather information about the loops in a compilation unit. If using this option, CrayPat must be run on the resulting executable so the CrayPat data-gathering routines are linked in. For information about CrayPat and profile information, see the Cray Performance Measurement and Analysis Tools User Guide.

Do not combine the -g and -h profile_generate compiler command-line options. Doing so produces reports in CrayPat that contain blank tables and spurious warning messages.

-h [no]safe_addr

Default: -h safe_addr

Provides assurance that most conditionally executed memory references are thread safe, which in turn supports a more aggressive use of speculative writes, thereby improving application performance. If -h nosafe_addr is specified, the optimizer performs speculative stores only when it can prove absolute thread safety using the information available within the application code.
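
To illustrate what a speculative store is, the sketch below writes a conditional store and then an equivalent form in which every element is stored unconditionally. The second form is a hand-written approximation of the transformation and is not the compiler's actual output. It gives the same result in a single thread, but it writes elements that the original loop never touches, which is why thread safety matters.

```fortran
program speculative_store
  implicit none
  integer, parameter :: n = 16
  real :: x(n), y(n), y_spec(n)
  integer :: i

  do i = 1, n
    x(i) = real(i - n / 2)    ! a mix of negative, zero, and positive values
  end do
  y      = -1.0
  y_spec = -1.0

  ! Conditional store: y(i) is written only when the test passes.
  do i = 1, n
    if (x(i) > 0.0) y(i) = x(i)
  end do

  ! Speculative form: every y_spec(i) is written, and elements that
  ! fail the test are rewritten with their old value.
  do i = 1, n
    y_spec(i) = merge(x(i), y_spec(i), x(i) > 0.0)
  end do

  if (any(y /= y_spec)) error stop 'the two forms disagree'
  print *, 'conditional and speculative forms agree in a single thread'
end program speculative_store
```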

-h thread`n`, -O thread`n`

Default: -h thread2

The -h threadn option controls the optimization of both OpenMP and automatic threading.

The values for n are:

0: No autothreading or OMP threading. The thread0 option is similar to -h noomp, but -h noomp disables OpenMP only and does not affect autothreading.
1: Specifies strict compliance with the OpenMP standard for directive compilation. Strict compliance is defined as no extra optimizations in or around OpenMP constructs. In other words, the compiler performs only the requested optimizations. If -h thread1 is specified, it is equivalent to specifying -h nosafe_addr.
2: OpenMP parallel regions are subjected to some optimizations; that is, some parallel region expansion. Parallel region expansion is an optimization that merges two adjacent parallel regions in a compilation unit into a single parallel region.
3: Full optimization: loop restructuring, including modifying iteration space for static schedules (breaking standard compliance). Reduction results may not be repeatable.
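
Parallel region expansion, described for level 2, can be pictured with the sketch below. It writes two adjacent parallel regions and then the same work merged by hand into a single region. The merged form is an illustration of the idea rather than the compiler's output, and the code also runs serially when OpenMP is not enabled.

```fortran
program region_merge
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), a_merged(n), b_merged(n)
  integer :: i

  ! Two adjacent parallel regions: threads are started and joined twice.
  !$omp parallel do
  do i = 1, n
    a(i) = real(i)
  end do
  !$omp end parallel do

  !$omp parallel do
  do i = 1, n
    b(i) = 2.0 * a(i)
  end do
  !$omp end parallel do

  ! One merged region: the implicit barrier at the end of the first
  ! worksharing loop keeps the second loop from reading a_merged too early.
  !$omp parallel
  !$omp do
  do i = 1, n
    a_merged(i) = real(i)
  end do
  !$omp end do
  !$omp do
  do i = 1, n
    b_merged(i) = 2.0 * a_merged(i)
  end do
  !$omp end do
  !$omp end parallel

  if (any(b /= b_merged)) error stop 'merged region changed the result'
  print *, 'separate and merged regions agree'
end program region_merge
```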

-h unroll`n`, -O unroll`n`

Default: unroll2

The -h unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll loops unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single-processor performance at the cost of increased compile time and code size.

Loop unrolling is disabled when the -O or -h scalar level is set to 0.

The values for n are:

0: No unrolling (ignore all unroll directives and do not attempt to unroll other loops).
1: Attempt to unroll loops that are marked by the unroll directive.
2: Unroll loops when performance is expected to improve. Loops marked with the unroll or nounroll directive override automatic unrolling.
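
For readers unfamiliar with the transformation, the sketch below unrolls a simple loop by a factor of four by hand, with a remainder loop for trip counts that are not a multiple of four. The unroll factor and all names are illustrative; the factor the compiler chooses is not specified here.

```fortran
program unroll_demo
  implicit none
  integer, parameter :: n = 10      ! deliberately not a multiple of 4
  integer :: x(n), y(n), y_unrolled(n)
  integer :: i, main_end

  do i = 1, n
    x(i) = i
  end do

  ! Original loop
  do i = 1, n
    y(i) = 3 * x(i) + 1
  end do

  ! Unrolled by 4: fewer loop-control operations per element
  main_end = n - mod(n, 4)
  do i = 1, main_end, 4
    y_unrolled(i)     = 3 * x(i)     + 1
    y_unrolled(i + 1) = 3 * x(i + 1) + 1
    y_unrolled(i + 2) = 3 * x(i + 2) + 1
    y_unrolled(i + 3) = 3 * x(i + 3) + 1
  end do
  ! Remainder loop for the leftover iterations
  do i = main_end + 1, n
    y_unrolled(i) = 3 * x(i) + 1
  end do

  if (any(y /= y_unrolled)) error stop 'unrolled loop changed the result'
  print *, 'original and unrolled loops agree'
end program unroll_demo
```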

-h wp

Enables the whole program mode.

This option causes the compiler backend (IPA, optimizer, and code generator) to be invoked at application link time, enabling whole-program automatic inlining and cloning as well as future whole-program interprocedural analysis (IPA) optimizations. Because the -hwp option provides automatic application-wide inlining, the -Oipafrom option is no longer needed for cross-file inlining, and using the two options together is not permitted. The -hwp option requires that -h pl=program_library also be specified.

The options -hpl= and -hwp should be specified on all compiler invocations and on the compiler link invocation. Since -hwp delays the compiler optimization step until link time, -c compiles will take less time and the link step will take longer. Normally, this is just a time shift from one build phase to another with roughly the same overall compile time. In some cases increased inlining may cause an increase in overall compile time. Using -hwp allows the compiler backend to be invoked in parallel during a build. Setting the environment variable NPROC controls the number of concurrent compiler backend invocations and this parallelism may reduce overall compile time.

-O `opt`[,`opt`]...

The -O opt option specifies optimization features. More than one -O option, with accompanying arguments, can be specified on the command line. If specifying more than one argument to -O, separate the individual arguments with commas and do not include intervening spaces.

The -eo option or the ftnlx command displays the optimization options the compiler uses at compile time. The -eo option is identical to the -h display_opt option, which is provided for command-line compatibility with the Cray C compiler.

The -O 0, -O 1, -O 2, and -O 3 options allow a general level of optimization to be specified that includes vectorization, scalar optimization, and inlining. Generally, as the optimization level increases, compilation time increases and execution time decreases.

The -O 1, -O 2, and -O 3 specifications do not directly correspond to the numeric optimization levels for scalar optimization, vectorization, and inlining. For example, specifying -O 3 does not necessarily enable vector3. Cray reserves the right to alter the specific optimizations performed at these levels from release to release.

The other optimization options, such as -O aggress and -O cachen, control pattern matching, cache management, zero incrementing, and several other optimization features. Some of these features can also be controlled through compiler directives.

Table 1. Optimization values
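|  | scalar0 | scalar1 | scalar2 | scalar3 | vector0 | vector1 | vector2 | vector3 | thread0 | thread1 | thread2 | thread3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Low compile cost | X |  |  |  | X |  |  |  | X |  |  |  |
| Moderate compile cost |  | X | X |  |  | X | X |  |  |  |  |  |
| Potentially high compile cost |  |  |  | X |  |  |  | X |  | X | X | X |
| Potential numerical differences from unoptimized execution (operator reassociation) |  | X | X | X | X | X | X | X |  | X | X | X |
| Implies at least scalar1 |  |  |  |  |  | X |  |  |  | X | X | X |
| Implies at least scalar2 |  |  |  |  |  |  | X | X |  |  | X | X |
| Loop nest restructuring |  | X | X | X |  | X | X | X | X | X | X | X |
| Vectorize array syntax statements |  |  |  |  | X | X | X | X |  |  |  |  |
| OpenMP disabled |  |  |  |  |  |  |  |  | X |  |  |  |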

All -O options, except -O 0, -O 1, -O 2, and -O 3, are also available with the -h option for command-line compatibility with the Cray C compiler.

-O`level`

Default: -O2

Specify a general level of optimization that includes vectorization, scalar optimization, cache management, and inlining. Generally, as the optimization level increases, compilation time increases and execution time decreases.

The -Olevel specifications do not directly correspond to the numeric optimization levels for scalar optimization, vectorization, and inlining. For example, specifying -O 3 does not necessarily enable -h vector3. Cray reserves the right to alter the specific optimizations performed at these levels from release to release. Use the -h display_opt option to display the optimization options used during compilation.

The values for level are:

0: Disable optimization, including floating point optimizations. Low compile time, small compile size, no global scalar optimization. Vectorize most array syntax statements, but disable all other vectorizations. Implies -h fp0. Some informational messages may not be issued.
1: Conservative optimization: moderate compile time and size, global scalar optimizations, and loop nest restructuring. Results may differ from the results obtained when -O 0 is specified because of operator reassociation. No optimizations will be performed that might create false exceptions. Only array syntax statements and inner loops are vectorized and the system does not perform some vector reductions. User tasking is enabled, so OpenMP directives are recognized.
2: Moderate optimization: moderate compile time and size, global scalar optimizations, pattern matching, and loop nest restructuring. Results may differ from results obtained when -O 1 is specified because of vector reductions. The -O 2 option enables automatic vectorization of array syntax and entire loop nests. This is the default level of optimization.
3: Aggressive optimization: potentially larger compile time and size, global scalar optimizations, possible loop nest restructuring, and pattern matching. The optimizations performed might create false exceptions in rare instances. Results may differ from results obtained when -O 1 is specified because of vector reductions.

-O [no]zeroinc, -h [no]zeroinc

Default: -O nozeroinc

The -O zeroinc option causes the compiler to assume that a constant increment variable (CIV) can be incremented by zero. A CIV is a variable that is incremented only by a loop-invariant value. For example, in a loop containing the statement J = J + K, where K is loop invariant and can be equal to zero, J is a CIV. The -O zeroinc option can cause less strength reduction to occur in loops that have variable increments. The -O [no]zeroinc option is identical to the -h [no]zeroinc option.
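
The situation described above can be sketched as follows. The step_size function is a hypothetical stand-in for any loop-invariant value that is unknown at compile time and happens to be zero at run time.

```fortran
program zeroinc_demo
  implicit none
  integer, parameter :: n = 8
  real :: a(n)
  integer :: i, j, k

  a = 0.0
  j = 1
  k = step_size()       ! loop invariant, and zero in this run

  do i = 1, n
    a(j) = a(j) + 1.0   ! with k == 0, every iteration updates the same element
    j = j + k           ! J is a CIV: incremented only by the invariant K
  end do

  print *, 'a(1) =', a(1)

contains

  integer function step_size()
    step_size = 0
  end function step_size

end program zeroinc_demo
```

As written, the loop adds 1.0 to a(1) on each of its eight iterations. A program with this kind of zero increment is the case for which -O zeroinc is intended.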