Both SMWs in an HA pair might be powered down at the same time, for example because of an install, an upgrade, a planned test, or a power outage. In any of these cases, follow this procedure to bring the SMWs back online as quickly and with as little disruption as possible.
- Power on the first SMW.
- Wait for the first SMW to complete its boot and start all cluster resources.
The SMW now should have started all cluster resources except for STONITH on the first SMW. If it hasn't, refer to the troubleshooting section in this document.
- Validate the status of the SMWs if the SMWs were powered down due to a failure.
- Check dmesg for errors.
If any errors are found, correct them before continuing to the next step.
- Check /var/opt/cray/log/smwmessages-<DATE> for errors.
If any errors are found, correct them before continuing to the next step.
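The two log checks above can be sketched with a small filter. This is only an illustration: the helper name scan_errors and the pattern list are assumptions, not a Cray-defined error list, so expect some harmless matches and adjust the pattern for the site.

```shell
#!/bin/sh
# Hypothetical helper: print lines that look like errors.
# Reads the named files, or standard input when no file is given.
# The pattern list is an assumption and is deliberately broad.
scan_errors() {
    grep -iE 'error|fail|panic|corrupt' "$@"
}

# Example usage on the SMW (not executed here):
#   dmesg | scan_errors
#   scan_errors /var/opt/cray/log/smwmessages-<DATE>
```

The helper exits with a nonzero status when nothing matches, which is the normal grep behavior, so an empty result means no suspicious lines were found by this pattern.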
- Check Boot RAID filesystems using fsck if SMWs were powered down due to failure.
Run the tool in query-only (read-only) mode because the filesystems are mounted.
If a problem is found, run the following substeps:
- Put the SMW into standby to stop all resources.
smw1# crm node standby smw1
- Execute fsck or equivalent on the filesystem to repair any corruption.
- Once all filesystems have been recovered, bring the SMW back online.
smw1# crm node online smw1
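As a rough sketch of the query-only check, the following assumes the standard fsck -n option (make no changes) and the exit-status bits documented in fsck(8). The function names and the device argument are placeholders, not part of the Cray procedure. A read-only check of a mounted filesystem can report spurious errors, so treat the result as a hint rather than a verdict.

```shell
#!/bin/sh
# Translate an fsck exit status into a short message.
# Bit values follow fsck(8): 1 = errors corrected, 4 = errors left uncorrected.
describe_fsck_rc() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "clean"
    elif [ $(( rc & 4 )) -ne 0 ]; then
        echo "errors left uncorrected"
    elif [ $(( rc & 1 )) -ne 0 ]; then
        echo "errors corrected"
    else
        echo "other (see fsck(8))"
    fi
}

# Hypothetical wrapper: run a read-only check on one device and report.
# The device path is supplied by the caller; no device name is assumed here.
check_readonly() {
    rc=0
    fsck -n "$1" || rc=$?
    describe_fsck_rc "$rc"
}

# Example usage (not executed here):
#   check_readonly /dev/<boot-raid-device>
```

If the report is anything other than clean, the standby, repair, and online substeps above apply.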
- Power on the second SMW.
- Wait for the second SMW to complete its boot and start the remaining cluster resources.
The SMW now should have started all cluster resources except for STONITH on the second SMW. If it hasn't, refer to the troubleshooting section in this document.
- Once both SMWs are booted and all resources have started, check the cluster state.
smw1# check_config smw1 smw2 smw1-drac-ip smw2-drac-ip
Ignore warnings about not being able to contact the boot node.
If check_config reports errors, resolve the errors and execute check_config again. If errors continue to be reported, contact support.
When no errors are reported, continue with other site-specific SMW checks and the CLE boot.
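The "wait for all resources to start" steps above can be approximated with a polling loop before check_config is run. This sketch assumes the Pacemaker crm_mon -1 one-shot status command is available on the SMW and that stopped resources appear with the word Stopped in its output. The function names, retry count, and 10-second interval are illustrative choices only.

```shell
#!/bin/sh
# Count lines mentioning "Stopped" in crm_mon-style text read from stdin.
# Prints 0 when there are none.
count_stopped() {
    grep -c 'Stopped' || true
}

# Hypothetical helper: poll the cluster until no resource is reported as
# Stopped. Returns 0 on success, 1 if the retry budget runs out.
# $1 = number of attempts (default 30).
wait_for_resources() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        # Only trust the count if crm_mon itself succeeded.
        if status=$(crm_mon -1 2>/dev/null); then
            stopped=$(printf '%s\n' "$status" | count_stopped)
            if [ "$stopped" -eq 0 ]; then
                return 0
            fi
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 10
        fi
    done
    return 1
}

# Example usage (not executed here):
#   wait_for_resources 30 && check_config smw1 smw2 smw1-drac-ip smw2-drac-ip
```

Use this only after both SMWs are up; while a single SMW is running, the STONITH resource noted earlier is expected not to have started, so the loop would time out.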