Math Options

This topic lists and describes the different math options available.

-h fplevel

Default: -h fp2

The -h fp option controls how aggressively the compiler optimizes floating-point operations. The level argument ranges from 0 to 4: 0 gives the compiler the least freedom to optimize floating-point operations, while 4 gives it the most. The higher the level, the less closely the floating-point operations conform to the IEEE standard.

Each level from 1 through 4 includes the optimizations of the previous level.

The values for level:
0
Use this level only when the code pushes the limits of IEEE accuracy or requires strong IEEE standard conformance. The resulting executable code conforms more closely to the IEEE floating-point standard than the default mode (-h fp2). Many identity optimizations are disabled. Vectorization of floating-point and complex reductions is disabled. Executable code is slower than at higher floating-point optimization levels.
1
Use this level only when the code pushes the limits of IEEE accuracy or requires strong IEEE standard conformance. This level performs generally safe optimizations that do not conform to the IEEE standard, such as folding a == a to true, where a is a floating-point object. At this level, a scaled complex divide mechanism is enabled that increases the range of complex values that can be handled without producing an underflow. Rewriting division as multiplication by a reciprocal is inhibited.
2
Default.
3
Use when performance is more critical than the level of IEEE standard conformance provided by fp2. The -h fp3 option is an acceptable level of optimization for many applications.
4
Use if the application uses algorithms that are tolerant of reduced precision.
Table 1. Floating-point Optimization Levels

Optimization Type                              fp0                  fp1                  fp2 (default)                               fp3           fp4
---------------------------------------------  -------------------  -------------------  ------------------------------------------  ------------  ------------
Safety                                         Maximum              High                 High                                        Moderate      Low
Complex divisions                              Accurate and slower  Accurate and slower  Fast [1]                                    Fast [1]      Fast [1]
Exponentiation rewrite                         None                 None                 When optimization benefit is very high [2]  Always [2,3]  Always [2,3]
Strength reduction                             None                 None                 Fast                                        Fast          Fast
Rewrite division as reciprocal equivalent [4]  None                 None                 Yes                                         Aggressive    Aggressive
Floating-point reductions                      Slow                 Fast                 Fast                                        Fast          Fast
Expression factoring                           None                 Yes                  Yes                                         Yes           Yes
Expression tree balancing                      None                 None                 Yes                                         Yes           Yes
Inline 32-bit operations [5]                   No                   No                   No                                          Yes           Yes
Fused multiply-add [6]                         No                   Yes                  Yes                                         Yes           Yes

[1] Algebraically correct but may lack precision in boundary cases.
[2] Rewriting values raised to a constant power into an algebraically equivalent series of multiplications and/or square roots.
[3] Rewriting exponentiations (a^b) not previously optimized into the algebraically equivalent form exp(b * ln(a)).
[4] For example, x/y is transformed to x * (1.0/y).
[5] 32-bit division, square root, and reciprocal square root use very fast but less precise code sequences.
[6] Uses fused multiply-add instructions on architectures that support it.