Performance Options

Describes performance optimization enhancements in CCE Clang.

Clang does not apply optimizations unless they are requested. For best performance, -Ofast with -flto is recommended. For applications that are sensitive to floating-point optimizations, it may be necessary to adjust the floating-point optimization level using one of the options below. For applications that require bit reproducibility (that is, applications designed to produce the same result regardless of how the work is distributed among MPI ranks and OpenMP threads, as long as the number of ranks multiplied by the number of threads stays constant), it may be necessary to forgo floating-point optimization by using -O3 instead of -Ofast.

-fast
Implies -Ofast and -flto.
-ffp=level
Select a level for Cray floating-point math optimizations and math library functions. Requesting the lowest level, -ffp=0, generates code with the highest precision and grants the compiler minimal freedom to optimize floating-point operations, whereas requesting the highest level, -ffp=4, grants the compiler maximal freedom to optimize aggressively but will likely result in lower precision.
Requesting levels 1 through 4 flushes denormals to zero and implies -funsafe-math-optimizations and -fno-math-errno; if either of those options is subsequently overridden, this option may not work as expected. With -fcray, -ffp=3 is implied by -ffast-math or -Ofast. Using -ffp=0 prevents the use of Cray math libraries and disables all Cray floating-point optimizations.
Supported values for level are 0, 1, 2, 3, 4.
-fcray-mallopt, -fno-cray-mallopt
Optimize malloc by using Cray's custom mallopt parameters, which for most programs improves performance but may cause higher memory usage. This is a link-time option. The default is -fcray-mallopt.
-fivdep, -fno-ivdep
Enable or disable #pragma ivdep handling. The default is -fivdep.
-flocal-restrict, -fno-local-restrict
Honor restrict-qualified pointers declared in a block scope by assuming that they do not alias with other restrict-qualified pointers declared in the same block scope. The default is -flocal-restrict.
-floop-trips=scale
Optimize under the assumption that loops whose trip counts are unknown at compile time have trip counts of the given scale.
Currently, the only valid value for scale is huge, which assumes that loops have trip counts large enough that the data they reference will not fit in the cache.