Configure and Validate Dynamic Cooling Control Variables

Customize SMW dynamic fan speed variables in the .ini files.

Under normal circumstances, administrators need only set fan_auto_speed_enable to 1 to enable dynamic fan speed control. All other variables related to dynamic fan speed should be left at their default settings.

In particular, adjusting the fan_auto_speeds variable is not recommended as the automatically generated fan speed tables will always be correct for the type of hardware on each blade.

The following settings are described here for use in special situations where the default values are not adequate.

CAUTION: Cray recommends that these settings (other than fan_auto_speed_enable) be changed only in close consultation with Cray service. Refer to the xtccr(8) man page for a complete list of all xtccr configuration attributes.
fan_auto_high_temp_offset
Specifies the offset from the highest temperature at which a CPU or GPU can operate without exceeding the threshold temperature (TJMAX) and being throttled. The resulting temperature corresponds to the highest fan speed in a fan speed table. The default value of fan_auto_high_temp_offset is 10. The allowed range of values for this variable is >= 0 and <= 20. For example, if fan_auto_speed_high is not set, fan_auto_high_temp_offset is set to 10, and a component has a TJMAX of 100, then the highest fan speed in the fan speed table will be equal to fan_speed_normal, and the corresponding temperature for that fan speed will be >= 90.
fan_auto_high_temp_offset=10
fan_auto_speed_enable
Enables automatic fan speed selection. Set this variable to 1 to enable dynamic fan speed support. The default value is 0 (disabled).
fan_auto_speed_high
Specifies the highest fan speed that can be used within a fan speed table, whether the table is user-specified or auto-generated. The default value of fan_auto_speed_high in auto-generated fan speed tables is the value of fan_speed_normal. The allowed range of values for this variable is >= fan_speed_normal and <= fan_speed_high.
fan_auto_speed_high=3100
fan_auto_speed_min
This is the lowest fan speed that will be used in a fan speed table. This value cannot be less than 1550 for standard blowers or 1900 for high-pressure (HP) blowers. The default value is 1900 for standard blowers and 2400 for HP blowers.
fan_auto_speed_temp_step
Defines the change in component temperature, in degrees C, that will cause a different fan speed to be selected from the fan speed table. The value must be >= 2 and <= 12. The default value is 5. This is used only if fan_auto_speed_enable=1.
fan_auto_speed_temp_step=5
fan_auto_speed_rpm_step
In an auto-generated fan speed table, adjacent speeds are separated by fan_auto_speed_rpm_step. The highest fan speed in an auto-generated table is the result of fan_speed_high - fan_auto_speed_rpm_step. For example, if fan_speed_high is 3100 and fan_auto_speed_rpm_step is 150, then the speeds in the auto-generated fan speed table are 2950, 2800, 2650, etc.
The value must be >= 100 and <= 300. The default value is 150. This has no effect on fan speeds defined via fan_auto_speeds. This is used only if fan_auto_speed_enable=1.
fan_auto_speed_rpm_step=150
fan_auto_speed_high
This is the highest fan speed in an auto-generated fan speed table. The default value is equal to fan_speed_normal (as defined in the .ini file), which is usually 2700 for standard blowers and 3400 for HP blowers.
fan_auto_high_temp_offset
This is the temperature offset from TJMAX (in degrees C) to use when creating the entry for the highest fan speed in an auto-generated fan speed table. The default value is 10. Allowed values are between 0 and 20, inclusive.
fan_auto_speeds
Defines the contents of the fan speed table. The highest speed allowed is fan_speed_high and the lowest speed allowed is fan_auto_speed_min. A minimum of 2 and a maximum of 15 fan speeds may be defined. No duplicates are allowed. Auto fan speeds are switched whenever component temperatures vary by fan_auto_speed_temp_step degrees C.
This is used only if fan_auto_speed_enable=1. If fan_auto_speed_enable=1 and fan_auto_speeds is not defined, then fan speed tables will be auto-generated.
Cray does not recommend defining fan_auto_speeds, because a single user-defined fan speed table is applied to all types of CPUs/GPUs, whereas auto-generated fan speed tables are built using the TJMAX for each specific type of CPU/GPU.
fan_speed_step_up_delay
Specifies the amount of time before the system switches to a higher speed in a fan speed table when die temperatures are increasing. The default is 20 seconds.
fan_speed_step_down_delay
Specifies the amount of time before the system switches to a lower fan speed when die temperatures are decreasing. The default is 300 seconds.
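
The way the variables above interact in an auto-generated fan speed table can be illustrated with a short Python sketch. This is not the controller's implementation: the function name sketch_fan_table is hypothetical, the starting speed follows the fan_auto_speed_rpm_step description (fan_speed_high - fan_auto_speed_rpm_step), and pairing each successive speed with a temperature that is fan_auto_speed_temp_step lower is an assumption made for illustration only.

```python
def sketch_fan_table(fan_speed_high, rpm_step, speed_min,
                     tjmax, temp_offset, temp_step):
    """Return (temperature_C, rpm) pairs, highest speed first.

    Illustrative only: the real tables are built by the cabinet
    controllers and may differ in detail.
    """
    table = []
    # Highest entry, per the fan_auto_speed_rpm_step description.
    speed = fan_speed_high - rpm_step
    # Temperature for the highest entry is TJMAX minus the offset.
    temp = tjmax - temp_offset
    # Step down until the next speed would fall below fan_auto_speed_min.
    while speed >= speed_min:
        table.append((temp, speed))
        speed -= rpm_step
        temp -= temp_step  # assumption: one temp step per table entry
    return table


# Values taken from the examples and defaults in this section.
table = sketch_fan_table(fan_speed_high=3100, rpm_step=150, speed_min=1900,
                         tjmax=100, temp_offset=10, temp_step=5)
for temp, rpm in table:
    print(f"{temp:>3} C -> {rpm} rpm")
```

With these inputs the sketch starts at 2950 rpm for 90 degrees C and steps down through 2800 and 2650 until it reaches fan_auto_speed_min.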

INI File Validation

If the dynamic fan speed variables have been changed from their default values, it is important to validate the .ini files prior to loading them onto the controllers.

Use the xtccr --validate command to do this.
crayadm@smw> xtccr --validate=filename

Some of the variables defined in the cooling .ini files can be fully validated in this fashion, whereas other variables can only be provisionally validated, because information specific to each cabinet is required to fully validate their values. For example, the value of fan_speed_high can only be validated provisionally because knowledge of the type of fans installed within a cabinet (STD or HP) is required to fully validate the value.

Setting fan speeds dynamically via xtccr on systems with mixed blower types within the same row is not supported. On systems with both STD and HP blowers in separate rows, fan speed settings must be made by means of row-specific .ini files.
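
The difference between full and provisional validation can be illustrated with a small Python sketch. It does not reproduce how xtccr --validate works internally. The function name check_settings and the status strings are hypothetical; the numeric limits are the ones documented in this section, and fan_auto_speed_min is used as the blower-dependent example because its limits are stated here.

```python
# Ranges documented in this section that do not depend on blower type.
FIXED_RANGES = {
    "fan_auto_speed_enable": (0, 1),
    "fan_auto_high_temp_offset": (0, 20),
    "fan_auto_speed_temp_step": (2, 12),
    "fan_auto_speed_rpm_step": (100, 300),
}

# Lowest permitted fan_auto_speed_min for each blower type.
MIN_SPEED_FLOOR = {"STD": 1550, "HP": 1900}


def check_settings(settings, blower_type=None):
    """Map each setting name to 'ok', 'provisional', or an error string.

    blower_type is 'STD', 'HP', or None when the cabinet hardware is
    not known (the provisional case). Hypothetical helper, not xtccr.
    """
    results = {}
    for name, value in settings.items():
        if name in FIXED_RANGES:
            low, high = FIXED_RANGES[name]
            in_range = low <= value <= high
            results[name] = "ok" if in_range else f"out of range {low}..{high}"
        elif name == "fan_auto_speed_min":
            if blower_type is None:
                # Without cabinet data, only the loosest limit can be applied.
                floor = min(MIN_SPEED_FLOOR.values())
                results[name] = "provisional" if value >= floor else f"below {floor}"
            else:
                floor = MIN_SPEED_FLOOR[blower_type]
                results[name] = "ok" if value >= floor else f"below {floor}"
        else:
            results[name] = "not checked"
    return results


print(check_settings({"fan_auto_speed_temp_step": 5, "fan_auto_speed_min": 1700}))
print(check_settings({"fan_auto_speed_min": 1700}, blower_type="HP"))
```

In this sketch a fan_auto_speed_min of 1700 passes only provisionally when the blower type is unknown, and is rejected once the cabinet is known to have HP blowers.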