Dynamic Fan Speed Control

XC™ series SMW dynamic fan speed control.

The HSS cooling system for liquid-cooled XC Series cabinets supports dynamic fan speed control by row or for the entire system.

When dynamic fan speed control is not enabled, the HSS cooling software operates the cabinet fans at one of three fan speeds: fan_speed_idle when the blades in the cabinet are not powered on; fan_speed_high when a CPU or GPU is within 8 degrees of TJMAX, the highest temperature at which it can operate without being throttled; and fan_speed_normal at all other times.
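The three-speed selection described above can be sketched in a few lines of Python. This is an illustration only, not the HSS implementation: the function name, the (temperature, TJMAX) pair representation, and the choice to treat "within 8 degrees" as an inclusive comparison are assumptions made for this example.

```python
# Illustrative sketch only; not HSS code. All names below are hypothetical
# except the three fan speed setting names taken from the text.
FAN_SPEED_IDLE = "fan_speed_idle"
FAN_SPEED_NORMAL = "fan_speed_normal"
FAN_SPEED_HIGH = "fan_speed_high"


def select_static_fan_speed(blades_powered, die_readings, margin=8.0):
    """Pick one of the three fixed fan speeds for a cabinet.

    die_readings is an iterable of (temperature, tjmax) pairs, one per
    CPU or GPU die. margin is the distance from TJMAX that triggers
    fan_speed_high (assumed inclusive here).
    """
    if not blades_powered:
        return FAN_SPEED_IDLE
    if any(temp >= tjmax - margin for temp, tjmax in die_readings):
        return FAN_SPEED_HIGH
    return FAN_SPEED_NORMAL


if __name__ == "__main__":
    print(select_static_fan_speed(True, [(60.0, 100.0), (93.0, 100.0)]))
```

Passing TJMAX alongside each reading allows for CPUs and GPUs with different throttle limits in the same cabinet.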

The fan_speed_normal setting ensures that, under normal operation, the CPU and GPU die temperatures are maintained below the hot spot detection threshold. If the cooling water is at the required temperature and the temperature setpoint is set appropriately, no hot spot should be detected, as this setting is expected to cover the worst case. Typically, die temperatures on a production system fluctuate but remain below the throttle threshold most of the time. Running the fans at a constant fan_speed_normal is therefore often unnecessary and can consume more energy than is needed to cool the system properly.

When the dynamic fan speed feature is enabled, the cabinets self-regulate their fan speed based upon observed CPU and/or GPU temperatures. Each cabinet in a row runs its fans at the same speed, based on the highest CPU or GPU temperature sensor reading from all of the blades in all cabinets within the row. The frequency with which fan speeds change in response to temperature sensor readings varies depending on the type of jobs running on the system, and is bounded by two pre-existing ini file variables:
  • fan_speed_step_up_delay: This variable controls how fast the system will switch to a higher speed in a fan speed table if die temperatures are increasing. The default is 20 seconds.
  • fan_speed_step_down_delay: This variable controls how fast the system will switch to a lower fan speed if die temperatures are decreasing. The default is 300 seconds.
Important: Cray recommends that these and other cooling variables related to dynamic fan speeds in the initialization files be kept at their default values. The exception is fan_auto_speed_enable, which enables dynamic fan speed control.
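As a rough illustration of how the two delay variables bound the rate of fan speed changes, the following Python sketch models a single row. It is not the HSS algorithm. The speed table values, the temperature thresholds, the one-step-at-a-time behavior, and every class and function name are assumptions made for this example; only the row-wide maximum temperature and the 20 second and 300 second defaults come from the text above.

```python
from dataclasses import dataclass

# Hypothetical fan speed table: (minimum row temperature, fan speed).
# These values are invented for illustration and are not Cray defaults.
SPEED_TABLE = [(0.0, 40), (60.0, 55), (70.0, 70), (80.0, 85)]


def row_max_temperature(row):
    """Highest die temperature across all blades in all cabinets of a row.

    row is a list of cabinets, each a list of blades, each a list of
    temperature readings.
    """
    return max(t for cabinet in row for blade in cabinet for t in blade)


def target_index(temp):
    """Index of the highest table entry whose threshold temp has reached."""
    idx = 0
    for i, (min_temp, _speed) in enumerate(SPEED_TABLE):
        if temp >= min_temp:
            idx = i
    return idx


@dataclass
class RowFanController:
    step_up_delay: float = 20.0     # default of fan_speed_step_up_delay
    step_down_delay: float = 300.0  # default of fan_speed_step_down_delay
    index: int = 0
    last_change: float = float("-inf")

    def update(self, row, now):
        """Move at most one table step toward the target, honoring delays."""
        target = target_index(row_max_temperature(row))
        elapsed = now - self.last_change
        if target > self.index and elapsed >= self.step_up_delay:
            self.index += 1
            self.last_change = now
        elif target < self.index and elapsed >= self.step_down_delay:
            self.index -= 1
            self.last_change = now
        return SPEED_TABLE[self.index][1]
```

In this model, the short step-up delay lets the row react quickly to rising temperatures, while the much longer step-down delay keeps the fans from oscillating when temperatures dip briefly.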

Enabling dynamic fan speed control does not supersede CPU hot-spot detection and control. When a hot spot is detected, the cabinet fans in a row will still switch to the fan_speed_high setting and remain at that setting until the hot spot is cleared. Similarly, if the blades are powered down, the fans will run at the fan_speed_idle setting.
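The precedence described above can be summarized in a short Python sketch. The function name, its arguments, and the order in which the two overrides are checked are assumptions made for illustration; the text states only that both overrides still apply when dynamic control is enabled.

```python
def resolve_row_fan_speed(blades_powered, hot_spot_detected, dynamic_speed):
    """Return the setting a row would use with dynamic control enabled.

    Hypothetical helper: dynamic_speed stands for whatever speed the
    dynamic algorithm would otherwise have chosen.
    """
    if not blades_powered:
        # Powered-down blades: fans run at the idle setting.
        return "fan_speed_idle"
    if hot_spot_detected:
        # Hot-spot control is not superseded; hold high until cleared.
        return "fan_speed_high"
    return dynamic_speed
```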