Performance Options
Describes performance optimization enhancements in CCE Clang.
Clang does not apply optimizations unless they are requested. For best performance, -Ofast with -flto is recommended. For applications that are sensitive to floating-point optimizations, it may be necessary to adjust the floating-point optimization level using one of the options below. For applications that require bit reproducibility (i.e., which are designed to calculate the same result no matter how the work is distributed among a constant product of MPI ranks and OpenMP threads), it may be necessary to forgo floating-point optimization by using -O3 instead of -Ofast.
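The sensitivity described above comes from floating-point addition not being associative: regrouping the same terms can change the rounded result. The following is a minimal sketch of that effect, not CCE-specific code. The function name chunked_sum is hypothetical, and its chunks merely stand in for the partial sums that separate MPI ranks or OpenMP threads would compute. Optimizations that are allowed to reassociate floating-point operations can regroup terms in a comparable way.

```c
#include <stddef.h>

/* Hypothetical helper: sum x[0..n) by splitting it into `parts` contiguous
 * chunks (parts must be at least 1), summing each chunk on its own, then
 * adding the partial sums together. The accumulators are volatile so that
 * every addition is rounded to double in the order written here. */
double chunked_sum(const double *x, size_t n, size_t parts) {
    volatile double total = 0.0;
    size_t chunk = (n + parts - 1) / parts;   /* chunk length, rounded up */
    for (size_t p = 0; p < parts; ++p) {
        size_t begin = p * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        volatile double partial = 0.0;
        for (size_t i = begin; i < end; ++i)
            partial += x[i];
        total += partial;
    }
    return total;
}
```

For the input {1e16, 1.0, -1e16, 1.0}, whose exact sum is 2, chunked_sum returns 1.0 with one chunk and 0.0 with two chunks under IEEE 754 double arithmetic. A code that must be bit reproducible has to fix the grouping itself, and it also needs the compiler not to regroup the operations.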
-fast - Implies -Ofast and -flto.
-ffp=level - Select a level for Cray floating-point math optimizations and math library functions. Requesting the lowest level, -ffp=0, generates code with the highest precision and grants the compiler minimal freedom to optimize floating-point operations, whereas requesting the highest level, -ffp=4, grants the compiler maximal freedom to optimize aggressively but will likely result in lower precision.
-fcray-mallopt, -fno-cray-mallopt - Optimize malloc by using Cray's custom mallopt parameters, which improves performance for most programs but may cause higher memory usage. This is a link-time option. The default is -fcray-mallopt.
-fivdep, -fno-ivdep - Enable or disable #pragma ivdep handling. The default is -fivdep.
-flocal-restrict, -fno-local-restrict - Honor restrict-qualified pointers declared in a block scope by assuming that they do not alias with other restrict-qualified pointers declared in the same block scope. The default is -flocal-restrict.
-floop-trips=scale - Optimize assuming that loops with statically unknown trip counts have trip counts at the scale of scale.
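As a rough illustration of the source constructs that -fivdep and -flocal-restrict act on, the sketch below shows a loop annotated with #pragma ivdep and a pair of restrict-qualified pointers declared in a block scope. The function names scatter_add and scale_copy are hypothetical. The comments state the usual meaning of these constructs (the programmer promises that no loop-carried dependence or aliasing exists) and are an assumption about intent, not a statement of exactly how CCE transforms the code.

```c
#include <stddef.h>

/* Hypothetical example for #pragma ivdep. The compiler cannot prove that
 * idx[] holds no repeated values, so it must assume a loop-carried
 * dependence through a[]. The pragma asserts there is none; the caller
 * must guarantee that every idx[i] is distinct. Compilers that do not
 * recognize the pragma ignore it. */
void scatter_add(double *a, const size_t *idx, const double *b, size_t n) {
#pragma ivdep
    for (size_t i = 0; i < n; ++i)
        a[idx[i]] += b[i];
}

/* Hypothetical example for block-scope restrict. The two pointers are
 * declared inside the inner block, not as parameters, so they are the kind
 * of declaration that -flocal-restrict refers to. The caller must guarantee
 * that dst and src do not overlap. */
void scale_copy(float *dst, const float *src, size_t n, float s) {
    {
        float *restrict out = dst;
        const float *restrict in = src;
        for (size_t i = 0; i < n; ++i)
            out[i] = s * in[i];
    }
}
```

Both functions compute the same results whether or not the options are enabled, provided the stated guarantees hold. If idx contains duplicates or dst overlaps src, the promises are broken and an optimized build may produce different results.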