Run Tests after Boot is Complete
Tests are performed as crayadm from the login node. They include running apstat to get the number of nodes, running aprun on all nodes, checking the home directory and the Lustre directory, and checking the SMW databases to ensure that they reflect the current state of all compute nodes. Before running the tests, make sure that:
- The system has completed booting.
- The compute nodes are "interactive" (i.e., not under workload manager control).
- ALPS is available.
If ALPS is not available and Slurm is the workload manager (WLM), the compute nodes can be either "interactive" or "batch", and srun (the Slurm equivalent of aprun) should be used in place of the aprun commands in the steps that follow.
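The choice between aprun and srun can be sketched in Python as below. This is a minimal illustration, not a Cray-supplied tool: the function names are hypothetical, `hostname` is an assumed trivial payload, and detecting ALPS by looking for `aprun` on the PATH is only a heuristic. It uses `aprun -n` (total processing elements) with `-N 1` (one per node), and `srun -N` (nodes) with `--ntasks-per-node=1`, so that each node runs the payload once.

```python
import shutil
import subprocess
from typing import List, Optional


def all_nodes_command(num_nodes: int, use_alps: Optional[bool] = None) -> List[str]:
    """Build a command that runs `hostname` once on each of num_nodes compute nodes.

    If use_alps is None, ALPS availability is guessed by checking whether
    `aprun` is on the PATH (a heuristic, not an official ALPS check).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if use_alps is None:
        use_alps = shutil.which("aprun") is not None
    if use_alps:
        # aprun: -n = total processing elements, -N = processing elements per node
        return ["aprun", "-n", str(num_nodes), "-N", "1", "hostname"]
    # srun: -N = number of nodes, --ntasks-per-node = tasks launched on each node
    return ["srun", "-N", str(num_nodes), "--ntasks-per-node=1", "hostname"]


def count_responding_nodes(output: str) -> int:
    """Count distinct, non-empty hostnames in the launcher's output."""
    return len({line.strip() for line in output.splitlines() if line.strip()})


def run_on_all_nodes(num_nodes: int) -> int:
    """Launch the command and return how many distinct nodes answered."""
    cmd = all_nodes_command(num_nodes)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return count_responding_nodes(result.stdout)


if __name__ == "__main__":
    # Example only: assumes 4 compute nodes; the real count comes from apstat.
    print(" ".join(all_nodes_command(4, use_alps=True)))
    print(" ".join(all_nodes_command(4, use_alps=False)))
```

Comparing the value returned by `run_on_all_nodes` with the node count reported by apstat is one way to confirm that every compute node responded.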
Log in to the login node as crayadm, either from the SMW by way of the boot node, or directly from another computer without passing through the SMW and boot node. Then perform these rudimentary functionality checks.
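The home-directory and Lustre-directory checks can be approximated with the Python sketch below. It assumes that a directory "passes" if it exists and crayadm can create and delete a small file in it. The Lustre path `/lus/scratch` is a hypothetical placeholder for the site's actual mount point, and the apstat and SMW database checks are not covered.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def check_directory(path: str) -> bool:
    """Return True if path is an existing directory where a file can be created and removed."""
    p = Path(path)
    if not p.is_dir():
        return False
    try:
        # The scratch file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=p, prefix=".boot_check_") as f:
            f.write(b"ok")
            f.flush()
        return True
    except OSError:
        return False


def run_directory_checks(paths: Dict[str, str]) -> Dict[str, bool]:
    """Map each label to the result of check_directory for its path."""
    return {label: check_directory(path) for label, path in paths.items()}


if __name__ == "__main__":
    checks = {
        "home": os.path.expanduser("~"),
        # HYPOTHETICAL Lustre mount point; replace with the site's actual path.
        "lustre": "/lus/scratch",
    }
    for label, ok in run_directory_checks(checks).items():
        print(f"{label}: {'OK' if ok else 'FAILED'}")
```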