Dump and Reboot Nodes Automatically
The SMW daemon dumpd automatically dumps and reboots nodes when the Node Health Checker (NHC) requests it.
A system administrator can set global variables in the /etc/opt/cray/nodehealth/nodehealth.conf configuration file to control the interaction of NHC and dumpd. For more information about NHC and the nodehealth.conf configuration file, see Configure the Node Health Checker (NHC).
Variables can also be set in the /etc/opt/cray-xt-dumpd/dumpd.conf configuration file on the SMW to control how dumpd behaves on the system.
Each CLE release package also includes an example dumpd configuration file, /etc/opt/cray-xt-dumpd/dumpd.conf.example. The dumpd.conf.example file is a copy of the /etc/opt/cray-xt-dumpd/dumpd.conf file provided for an initial installation.
If the /etc/opt/cray-xt-dumpd/dumpd.conf file does not exist, the /etc/opt/cray-xt-dumpd/dumpd.conf.example file is copied to /etc/opt/cray-xt-dumpd/dumpd.conf.
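The copy-if-missing rule for dumpd.conf can be illustrated with a short Python sketch. This is not the code the CLE installer runs; the function names ensure_config and demo are hypothetical, and the sketch works in a temporary directory rather than in /etc/opt/cray-xt-dumpd. It only shows the logic implied by the rule: the example file is copied when dumpd.conf is absent, and an existing dumpd.conf is left alone.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_config(conf: Path, example: Path) -> bool:
    """Copy example to conf only when conf is missing (hypothetical helper).

    Returns True if a copy was made, False if conf already existed.
    """
    if conf.exists():
        return False  # an existing configuration file is not overwritten
    shutil.copyfile(example, conf)
    return True


def demo() -> tuple:
    # Use a scratch directory so the real SMW configuration is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        example = Path(tmp) / "dumpd.conf.example"
        conf = Path(tmp) / "dumpd.conf"
        example.write_text("# placeholder example content\n")

        first = ensure_config(conf, example)   # dumpd.conf missing: copied
        conf.write_text("# site-local edits\n")
        second = ensure_config(conf, example)  # dumpd.conf present: kept
        return first, second, conf.read_text()


if __name__ == "__main__":
    print(demo())
```

Under this rule, a site that has already customized dumpd.conf keeps its settings, and dumpd.conf.example remains available as a reference copy of the defaults.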
The CLE installation and upgrade processes install the dumpd software automatically, but dumpd must be explicitly enabled.