Finish Configuring the SMW HA System

Finalize the configuration of the SMW HA installation.

To finalize the configuration of the SMW HA system, commands must be run on both the first and the second SMW. Pay careful attention to the command prompt (smw1# or smw2#) shown in each example of the following procedure.

  1. Bring up the Cray system (service and compute nodes), if not already up.
  2. Synchronize SSH user keys between smw2 and the boot node to enable passwordless access.
    1. Copy the RSA key pair from the first SMW to the second SMW:
      smw1# scp -pr /root/.ssh/id_rsa* root@smw2:/root/.ssh/
    2. Log in to the boot node from the second SMW. Answer "yes" when prompted, then log out.
      smw2# ssh boot
      boot# exit
  3. If the time zone was changed when the base operating system was installed, copy the localtime file on the second SMW.
    This puts the SMW time zone setting where the cabinet and blade controllers can access it. Execute the following command on the second SMW.
    smw2# cp -p /etc/localtime /opt/tftpboot/localtime
  4. Make a final saved snapshot on each SMW after the first SMW has completely rebooted. Start on the first SMW:
    smw1# export SNAPSHOT_HA=$(snaputil list |grep ^cur| awk '{print $2}')
    smw1# snaputil create ${SNAPSHOT_HA}.save
    1. Force a failover to the second SMW so that the snapshot can also be created there.
      smw1# crm resource move ClusterIP smw2
      smw1# sleep 300
      smw1# crm resource unmove ClusterIP
    2. Confirm that the failover has completed, smw2 is now active, and all services are running again. Then create the saved snapshot on the second SMW.
      smw1# crm status
      smw2# export SNAPSHOT_HA=$(snaputil list |grep ^cur| awk '{print $2}')
      smw2# snaputil create ${SNAPSHOT_HA}.save
    3. Force the failover back to the first SMW, then confirm that smw1 is active again and all services are running.
      smw2# crm resource move ClusterIP smw1
      smw2# sleep 300
      smw2# crm resource unmove ClusterIP
      smw2# crm status
  5. Verify the system has been correctly configured.
    smw1# check_config smw1 smw2 smw1-drac smw2-drac
    This command should report "System is configured correctly." If it does not, repair the system before continuing.
    If eLogin is not yet configured, check_config returns errors similar to the following example.
    ERROR: System is not configured correctly.
    Error summary:
    	SMW ethel: interface 6 link DOWN
    	SMW ethel: interface 7 link DOWN
    	SMW lucy: interface 6 link DOWN
    	SMW lucy: interface 7 link DOWN
    	ethel	eth6 - ERROR - address is 10.6.1.2/16, should be
    	ethel	eth7 - ERROR - address is 10.7.1.2/16, should be
    	lucy	eth6 - ERROR - address is 10.6.1.3/16, should be
    	lucy	eth7 - ERROR - address is 10.7.1.3/16, should be
    	ethel	eth6 - ERROR - address is 10.6.1.1/16, should be
    	ethel	eth7 - ERROR - address is 10.7.1.1/16, should be
    	ethel:	ICMP ping to 10.6.1.3 failed - ERROR!
    	ethel:	ICMP ping to 10.7.1.3 failed - ERROR!
    	lucy:	ICMP ping to 10.6.1.2 failed - ERROR!
    	lucy:	ICMP ping to 10.7.1.2 failed - ERROR!
    
    Expect these errors to continue until eLogin is configured.
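The snapshot and failover portion of step 4 could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a Cray-supplied tool: the function names current_snapshot_name, locate_clusterip, wait_for_clusterip, and failover_to are hypothetical; it assumes bash, and that `snaputil list` prints the current snapshot on a line beginning with "cur" with the name in the second field (the same assumption the grep/awk pipeline in step 4 makes). It uses Pacemaker's `crm_resource --resource ClusterIP --locate`, whose exact output wording may vary between Pacemaker versions. Instead of a fixed `sleep 300`, it polls until ClusterIP is reported on the target SMW.

```shell
# Hypothetical helpers for step 4 (not part of the Cray SMW software).
# Assumes bash and the Pacemaker crm / crm_resource command-line tools.

# Read `snaputil list` output on stdin and print the current snapshot name.
# Equivalent to: grep ^cur | awk '{print $2}', but stops at the first match.
current_snapshot_name() {
  awk '/^cur/ { print $2; exit }'
}

# Report where ClusterIP is running. Pacemaker typically prints a line such
# as "resource ClusterIP is running on: smw2" (wording may vary by version).
locate_clusterip() {
  crm_resource --resource ClusterIP --locate
}

# Poll until ClusterIP runs on TARGET, or give up after TIMEOUT seconds.
# Usage: wait_for_clusterip TARGET [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_clusterip() {
  local target=$1
  local timeout=${2:-300}
  local interval=${3:-10}
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Anchor at end of line so "smw1" does not also match "smw10".
    if locate_clusterip 2>/dev/null | grep -q "running on: ${target}\$"; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# Move ClusterIP to TARGET, wait for it to arrive, then clear the move
# constraint with `crm resource unmove`, mirroring the manual steps.
failover_to() {
  local target=$1 rc
  crm resource move ClusterIP "$target" || return 1
  wait_for_clusterip "$target" 300 10
  rc=$?
  crm resource unmove ClusterIP
  return "$rc"
}
```

For example, on the first SMW, `SNAPSHOT_HA=$(snaputil list | current_snapshot_name)` reproduces the export line, and `failover_to smw2 && crm status` performs the first failover. ClusterIP moving does not by itself prove that every HA service has started, so still check `crm status` before continuing, as the procedure requires.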
If this site will use SMW HA to manage eLogin, proceed to the XC™ Series SMW-managed eLogin Installation Guide (S-3020). Otherwise, installation and configuration of SMW HA is complete.
CAUTION: If it is necessary to revert to a previous snapshot at some point, use only an HA snapshot, that is, a snapshot created after the SMW HA software was installed and configured. Booting a non-HA snapshot on an HA system is dangerous because the shared file systems may be double-mounted, which can cause file system corruption.