Increase the Boot Manager Timeout Value

On systems of 4,000 nodes or larger, the boot manager can take longer than the default 60-second time-out value to receive all responses to its boot requests. This is due, in large part, to the volume of other event traffic generated as each compute node writes its console output.

To avoid this problem, edit the /opt/cray/hss/default/etc/bm.ini file on the SMW and raise the boot_timeout value by 60 seconds above the 60-second default for every 5,000 nodes; for example:

Increase the boot_timeout value

For systems of 5,000 to 10,000 nodes, change the boot_timeout line to:

boot_timeout 120

For systems of 10,000 to 15,000 nodes, change the boot_timeout line to:

boot_timeout 180
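
The scaling rule above can be expressed as a short calculation. The following Python sketch is illustrative only and is not part of the HSS software; the function names are hypothetical. It assumes that a node count falling exactly on a 5,000-node boundary belongs to the higher range (the ranges in the example above overlap at 10,000), and it returns the 60-second default for systems smaller than 5,000 nodes, because the text gives no separate value for systems of 4,000 to 5,000 nodes.

```python
# Illustrative sketch of the scaling rule: 60 seconds of additional
# time-out for every 5,000 nodes, on top of the 60-second default.
DEFAULT_TIMEOUT_S = 60   # default boot manager time-out, in seconds
STEP_NODES = 5000        # node-count increment named in the rule
STEP_SECONDS = 60        # seconds added per increment


def suggested_boot_timeout(node_count: int) -> int:
    """Return a suggested boot_timeout value in seconds.

    Assumption: a count exactly on a boundary (for example, 10,000)
    is treated as part of the higher range.
    """
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    return DEFAULT_TIMEOUT_S + STEP_SECONDS * (node_count // STEP_NODES)


def boot_timeout_line(node_count: int) -> str:
    """Format the value in the same form as the example lines above."""
    return f"boot_timeout {suggested_boot_timeout(node_count)}"


if __name__ == "__main__":
    for nodes in (3000, 7500, 12000):
        print(nodes, "->", boot_timeout_line(nodes))
```

The sketch only computes the value; the boot_timeout line in bm.ini must still be edited by hand on the SMW.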