General Optimization Options

Detailed descriptions of general optimization options.

-h [no]add_paren

Default:  -h noadd_paren

The -h add_paren option automatically adds parentheses to select associative operations (+, -, *) to encourage left-to-right evaluation of floating-point and complex expressions. Left-to-right evaluation is not required by the language standards, but some applications may expect it. For more information, see the crayftn(1) man page.
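Evaluation order matters because floating-point addition is not associative. The following Python sketch is illustrative only: it does not invoke the compiler, and the values are chosen purely for the example. It shows two groupings of the same three-term sum producing different results.

```python
# Floating-point addition is not associative: the grouping changes the result.
a, b, c = 1.0e16, -1.0e16, 1.0

left_to_right = (a + b) + c   # a and b cancel exactly, then c is added
right_to_left = a + (b + c)   # c is lost when added to the much larger b

print(left_to_right, right_to_left)
```

An application that depends on the first grouping is the kind of code that may expect left-to-right evaluation.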

-h [no]aggress

Default:  noaggress

Provides greater opportunity to optimize loops that would otherwise be inhibited from optimization due to an internal compiler size limitation. noaggress leaves this size limitation in effect. With aggress, internal compiler tables are expanded to accommodate larger loop bodies. This option can increase compilation time and memory use.

-h [no]autoprefetch

Default: autoprefetch

This option controls autoprefetch optimization. This option does not affect the loop_info noprefetch or prefetch directives.

-h [no]autothread

Default: noautothread

The -h autothread option enables autothreading.

-h display_opt

The -h display_opt option displays the compiler optimization settings currently in force.

-h flex_mp=level

Default: -h flex_mp=default

The -h flex_mp=level option controls the aggressiveness of optimizations that may affect the repeatability of floating-point and complex results. Use it when an application requires identical results as the number of ranks or threads varies.

The values for level are:
intolerant
Has the highest probability of repeatable results, but also the highest performance penalty.
strict
Uses some safe optimizations and yields higher performance than intolerant, with a high probability of repeatable results.
conservative
Uses more aggressive optimization and yields higher performance than intolerant, but results may not be sufficiently repeatable for some applications.
default
Uses more aggressive optimization and yields higher performance than conservative, but results may not be sufficiently repeatable for some applications.
tolerant
Uses the most aggressive optimization and yields the highest performance, but results may not be sufficiently repeatable for some applications.
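To see why results can change with the number of threads, consider a sum that is split into per-thread partial sums. The Python sketch below is a simplified model, not the compiler's actual reduction scheme; the function name chunked_sum and the input values are assumptions made for illustration.

```python
def chunked_sum(values, nthreads):
    """Sum values as contiguous per-thread chunks, then combine the partial sums."""
    size = -(-len(values) // nthreads)  # ceiling division: elements per chunk
    total = 0.0
    for start in range(0, len(values), size):
        partial = 0.0
        for v in values[start:start + size]:
            partial += v                # each "thread" accumulates its own chunk
        total += partial                # partial sums are combined in chunk order
    return total

values = [1.0e16, 1.0, -1.0e16, 1.0]
one_thread = chunked_sum(values, 1)
two_threads = chunked_sum(values, 2)
print(one_thread, two_threads)
```

The same data summed with one chunk and with two chunks gives different answers because rounding occurs at different points. The more repeatable flex_mp levels restrict optimizations of this kind.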

-h [no]func_trace

Default: -h nofunc_trace

The -h func_trace option is for use only with CrayPat. If this option is specified, the compiler inserts CrayPat trace entry points into each function in the compiled source file. The names of the trace entry points are:
  • __pat_tp_func_entry
  • __pat_tp_func_return
These are resolved by CrayPat when the program is instrumented using the pat_build command. When the instrumented program is executed and it encounters either of these trace entry points, CrayPat captures the address of the current function and its return address.

-h fusionn

Default: fusion2

Loop fusion can improve the performance of loops, although in rare cases it may degrade performance. The n argument turns loop fusion on or off and determines where fusion occurs.

Loop fusion is disabled when n is set to 0.

The values for n are:
0
No fusion. Ignore all fusion directives and do not attempt to fuse other loops.
1
Attempt to fuse loops that are marked by the fusion directive.
2
Attempt to fuse all loops (includes array syntax implied loops), except those marked with the nofusion directive.
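As a conceptual illustration of what fusion does, the Python sketch below writes the same computation as two separate loops and as one fused loop. It is not compiler output; the array names and sizes are assumptions chosen for the example.

```python
n = 8
a = [float(i) for i in range(n)]

# Unfused: two loops over the same iteration space.
b_unfused = [0.0] * n
c_unfused = [0.0] * n
for i in range(n):
    b_unfused[i] = 2.0 * a[i]
for i in range(n):
    c_unfused[i] = b_unfused[i] + 1.0

# Fused: one loop executes both statements, so b[i] is reused
# immediately after it is produced and loop overhead is paid once.
b_fused = [0.0] * n
c_fused = [0.0] * n
for i in range(n):
    b_fused[i] = 2.0 * a[i]
    c_fused[i] = b_fused[i] + 1.0

print(c_fused)
```

Both versions compute the same values; the fused form simply traverses the data once.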

-h [no]intrinsics

Default: -h intrinsics

Allows the use of intrinsic hardware functions, which allow direct access to some hardware instructions or generate inline code for some functions. This option has no effect on specially handled library functions.

-h [no]msgs

Default: nomsgs

The -h msgs option causes the compiler to write optimization messages to stderr.

-h [no]negmsgs

Default: nonegmsgs

The -h negmsgs option causes the compiler to write messages to stderr that indicate why optimizations such as vectorization, inlining, or cloning did not occur. The -h negmsgs option enables the -h msgs option. The -h list=a option enables the -h negmsgs option.

-h [no]omp_trace

Default: -h noomp_trace

The -h omp_trace option turns on the insertion of CrayPat OpenMP tracing calls.

-h [no]overindex

Default: nooverindex

The overindex option declares that the code contains array subscripts that index a dimension of an array outside the declared bounds of that dimension. The nooverindex option declares that the code contains no such subscripts.
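The Python sketch below illustrates what overindexing means, assuming Fortran-style column-major storage and 1-based subscripts. The helper name offset and the array shape are assumptions for the example. A subscript that exceeds the first dimension still lands inside the array's storage, where it aliases an element of the next column.

```python
def offset(i, j, n):
    """Zero-based storage offset of element (i, j) in a column-major array
    whose first dimension is n, using 1-based subscripts."""
    return (i - 1) + (j - 1) * n

n, m = 3, 2
# Storage for a(3,2), laid out column by column; a(i,j) holds 10*j + i.
storage = [10 * j + i for j in range(1, m + 1) for i in range(1, n + 1)]

in_bounds = storage[offset(1, 2, n)]        # a(1,2)
overindexed = storage[offset(n + 1, 1, n)]  # a(4,1): first subscript out of bounds

print(in_bounds, overindexed)
```

Both references read the same storage location, which is why the compiler needs to know whether such subscripts can occur.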

-h [no]pattern

Default: pattern

Globally enables pattern matching. When the compiler recognizes certain patterns in the source code, it replaces the construct with a call to an optimized library routine. A loop or statement that has been pattern matched and replaced with a call to a library routine is indicated with an A in the loopmark listing. The nopattern option globally disables pattern matching and causes the compiler to ignore the PATTERN and NOPATTERN directives.

Pattern matching is not always worthwhile. If there is a small amount of work in the pattern-matched construct, the call overhead may outweigh the time saved by using the optimized library routine. When compiling using the default optimization settings, the compiler attempts to determine whether each given candidate for pattern matching will in fact yield improved performance.

-h pl=program_library

Create and use a persistent repository of compiler information specified by program_library. When used with -hwp, this option provides application-wide, cross-file, automatic inlining. The program_library repository is implemented as a directory, and the information it contains is built up with each compiler invocation. Any compilation that does not specify the -hpl option does not add information to this repository.

Because program_library persists between builds, it is the user's responsibility to manage it. For example, rm -r program_library might be added to the make clean target in an application makefile; rm -r is needed because program_library is a directory. If an application makefile creates files in multiple directories during a single build, program_library should be an absolute path; otherwise, multiple incomplete program library repositories will be created. For example, avoid -hpl=./PL.1 and use -hpl=/fullpath/builddir/PL.1 instead.
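The effect of a relative repository path can be sketched as follows. This Python snippet only models path resolution; it does not run the compiler, and the subdirectory names src and lib are hypothetical.

```python
import posixpath

def resolve(pl_option, cwd):
    """Return the directory that a -hpl= value refers to when the
    compiler is invoked from working directory cwd."""
    return posixpath.normpath(posixpath.join(cwd, pl_option))

# Hypothetical build that compiles from two different directories.
build_dirs = ["/fullpath/builddir/src", "/fullpath/builddir/lib"]

relative = {resolve("./PL.1", d) for d in build_dirs}
absolute = {resolve("/fullpath/builddir/PL.1", d) for d in build_dirs}

print(sorted(relative))
print(sorted(absolute))
```

The relative form resolves to a different repository in each directory, while the absolute form always names the same one.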

-h profile_generate

The -h profile_generate option directs that the source code be instrumented for gathering profile information. The compiler inserts calls and data-gathering instructions to allow CrayPat to gather information about the loops in a compilation unit. If using this option, CrayPat must be run on the resulting executable so the CrayPat data-gathering routines are linked in. For information about CrayPat and profile information, see the Cray Performance Measurement and Analysis Tools User Guide.

-h threadn

Default: -h thread2

The -h threadn option controls the optimization of both OpenMP and automatic threading.

The values for n are:
0
No autothreading or OpenMP threading. The thread0 option is similar to -h noomp, but -h noomp disables OpenMP only and does not affect autothreading.
1
Specifies strict compliance with the OpenMP standard for directive compilation. Strict compliance is defined as no extra optimizations in or around OpenMP constructs. In other words, the compiler performs only the requested optimizations.
2
OpenMP parallel regions are subjected to some optimizations; that is, some parallel region expansion. Parallel region expansion is an optimization that merges two adjacent parallel regions in a compilation unit into a single parallel region.
3
Full optimization: loop restructuring, including modifying iteration space for static schedules (breaking standard compliance). Reduction results may not be repeatable.

-h unrolln

Default: unroll2

The -h unrolln option globally controls loop unrolling and changes the assertiveness of the UNROLL directive. By default, the compiler attempts to unroll loops unless the NOUNROLL directive is specified for a loop. Generally, unrolling loops increases single-processor performance at the cost of increased compile time and code size.

Loop unrolling is disabled when the scalar level is set to 0.

The values for n are:
0
No unrolling (ignore all unroll pragmas and do not attempt to unroll other loops).
1
Attempt to unroll loops that are marked by the unroll pragma.
2
Unroll loops when performance is expected to improve. Loops marked with the unroll or nounroll pragma override automatic unrolling.
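As a conceptual illustration of the transformation, the Python sketch below shows a loop written normally and the same loop unrolled by hand by a factor of 4, with a remainder loop for leftover iterations. The function names and the unroll factor are assumptions for the example; the compiler chooses its own factor and performs the transformation internally.

```python
def dot_rolled(x, y):
    """Dot product with one multiply-add per loop iteration."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def dot_unrolled4(x, y):
    """Same computation with the loop body replicated four times."""
    n = len(x)
    main = n - n % 4            # largest multiple of 4 not exceeding n
    total = 0.0
    for i in range(0, main, 4):
        total += x[i] * y[i]
        total += x[i + 1] * y[i + 1]
        total += x[i + 2] * y[i + 2]
        total += x[i + 3] * y[i + 3]
    for i in range(main, n):    # remainder loop for the leftover elements
        total += x[i] * y[i]
    return total

x = [float(i) for i in range(1, 11)]
y = [2.0] * 10
print(dot_rolled(x, y), dot_unrolled4(x, y))
```

The unrolled version executes fewer loop-control operations per element, at the cost of a larger loop body, which mirrors the trade-off between performance and code size described above.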

-h wp

Enables the whole program mode.

This option causes the compiler backend (IPA, optimizer, code generator) to be invoked at application link time, enabling whole program automatic inlining/cloning and future whole program interprocedural analysis (IPA) optimizations. Because the -hwp option provides automatic application-wide inlining, the -Oipafrom option is no longer needed for cross-file inlining, and using these two options together is not permitted. This option requires that -h pl=program_library also be specified.

The options -hpl= and -hwp should be specified on all compiler invocations and on the compiler link invocation. Since -hwp delays the compiler optimization step until link time, -c compiles will take less time and the link step will take longer. Normally, this is just a time shift from one build phase to another with roughly the same overall compile time. In some cases increased inlining may cause an increase in overall compile time. Using -hwp allows the compiler backend to be invoked in parallel during a build. Setting the environment variable NPROC controls the number of concurrent compiler backend invocations and this parallelism may reduce overall compile time.

-Olevel

Default: -O2

Specify a general level of optimization that includes vectorization, scalar optimization, cache management, and inlining. Generally, as the optimization level increases, compilation time increases and execution time decreases.

The -Olevel specifications do not directly correspond to the numeric optimization levels for scalar optimization, vectorization, and inlining. For example, specifying -O 3 does not necessarily enable -h vector3. Cray reserves the right to alter the specific optimizations performed at these levels from release to release. Use the -h display_opt option to display the optimization options used during compilation.

The values for level are:
0
Disable optimization, including floating-point optimizations. Low compile time, small compile size, no global scalar optimization. Vectorize most array syntax statements, but disable all other vectorizations. Implies -h fp0.
1
Conservative optimization: moderate compile time and size, global scalar optimizations, and loop nest restructuring. Results may differ from the results obtained when -O 0 is specified because of operator reassociation. No optimizations will be performed that might create false exceptions. Only array syntax statements and inner loops are vectorized and the system does not perform some vector reductions. User tasking is enabled, so OpenMP directives are recognized.
2
Moderate optimization: moderate compile time and size, global scalar optimizations, pattern matching, and loop nest restructuring. Results may differ from results obtained when -O 1 is specified because of vector reductions. The -O 2 option enables automatic vectorization of array syntax and entire loop nests. This is the default level of optimization.
3
Aggressive optimization: potentially larger compile time and size, global scalar optimizations, possible loop nest restructuring, and pattern matching. The optimizations performed might create false exceptions in rare instances. Results may differ from results obtained when -O 1 is specified because of vector reductions.