Post Update Configuration

Procedure to complete the post-update configuration of both SMWs.

Both the first and second SMW have been updated to the latest version of SLES.

  1. Power on the first SMW.
  2. Log into the first SMW (smw1) as root. Log in directly as root; do not use su from a different account.
    workstation> ssh -X root@smw1
  3. Restart the typescript on the first SMW.
    smw1# export TODAY=`date +%Y%m%d`
    smw1# cd /var/adm/cray/release/_update
    smw1# script -af .update.3
    smw1# export SNAPSHOT=$(snaputil list |grep ^cur| awk '{print $2}')
    smw1# PS1="\[\e[1;31m\]\u@\h:\w \t # \[\e[0m\]\[\e[00m\]"
  4. Wait for the cluster to stabilize. However, do not expect all cluster resources to start normally until the SMWHAconfig and PMDB configuration steps are completed later in this procedure.
    smw1# sleep 300
  5. Power on the second SMW.
  6. Wait for the cluster to stabilize. However, do not expect all cluster resources to start normally until the SMWHAconfig and PMDB configuration steps are completed later in this procedure.
    smw1# sleep 300
  7. Reconfigure the cluster.
    IMPORTANT: Verify that your smwha_args file contains accurate configuration information (for example, hostnames and IP addresses) reflecting your system.
    Expect warnings about timeouts. These may safely be ignored. This step causes the whole cluster to be reconfigured. If your site uses eLogin, the eLogin resources will also be removed and must be added back afterward.
    smw1# cd /opt/cray/ha-smw/default/hainst
    smw1# cat smwha_args
    smw1# ./SMWHAconfig @smwha_args
  8. Initialize the replicated PMDB on both SMWs.
    NOTE: The following substeps require switching back and forth between the active and passive SMWs. Pay attention to the command line prompts in the examples.
    1. Enable maintenance mode.
      smw1# maintenance_mode_configure enable
       Maintenance mode was enabled
    2. Initialize the active SMW.
      smw1# pmdb_util ha --init_master
      ...
      [initialize()]:     INFO: -----------------------------------------------
      [initialize()]:     INFO: PMDB Initialization SUCCEEDED
      [initialize()]:     INFO: -----------------------------------------------
      CAUTION: Ensure that the last three lines of the output look as above. Do not proceed until the initialization has finished.
    3. Start the service for the active SMW.
      smw1# systemctl start postgresql
      NOTE: If postgresql is unable to start, it is possible that the cluster was in transition when it was put into maintenance mode at the beginning of this step. If this happens, perform the following steps to allow the cluster to transition, then return to the beginning of step 8 and repeat this procedure.
      smw1# maintenance_mode_configure disable
      smw1# sleep 30
    4. Initialize the passive SMW.
      smw2# pmdb_util ha --init_standby
      [main()]:     INFO: Initializing standby...
      [init_standby()]:     INFO: Initializing HA standby system...
      [init_standby()]:     INFO: Old data directory removed.
      [init_standby()]:     INFO: Synchronizing this standby with master. This might take a while!
      [init_standby()]:     INFO: Initial replication successful! Full output:
      [init_standby()]:     INFO: 		NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
      [init_standby()]:     INFO: Standby successfully initialized!
      CAUTION: Ensure that the last lines of the output look as above. Do not proceed until the initialization has finished.
    5. Start the service for the passive SMW.
      smw2# systemctl start postgresql
    6. Disable maintenance mode.
      smw1# maintenance_mode_configure disable
       Maintenance mode was disabled
      
      smw1# sleep 300
  9. Check the status of the cluster.
    smw1# crm status
    1. If processes remain "Stopped" after waiting 300 seconds, run the following commands to clear the failcounts, then wait again.
      smw1# clear_failcounts
      smw1# sleep 300
    2. If processes still remain "Stopped" after waiting another 300 seconds, run the following commands to restart all resources.
      smw1# clean_resources
      smw1# sleep 300
  10. Report the system status.
    smw1# ha_health
    smw1# check_config smw1 smw2 smw1-drac smw2-drac
  11. Determine the active SMW before proceeding.
    smw1# ha_health | egrep "Active Node" 
  12. To continue the system update, log into the active SMW and perform the procedures beginning with Section 5.3, “Build Boot Images,” in the XC™ Series Software Installation and Configuration Guide (CLE 7.0.UP00) S-2559. These procedures only need to be done on the active SMW. Update procedures on the passive SMW are now complete.
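
The fixed waits in step 9 could also be expressed as a polling loop that returns as soon as the cluster looks healthy. The following is a minimal sketch and not part of the documented procedure: wait_until and no_stopped_resources are hypothetical helper names, and the check assumes that a resource which has not started shows the word "Stopped" in the output of crm status, as step 9 describes.

```shell
#!/bin/bash
# Hypothetical helper (not a Cray-supplied tool): poll a command until it
# succeeds or a timeout expires.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is a bash built-in counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if "$@"; then
            return 0            # condition met
        fi
        if (( SECONDS >= deadline )); then
            return 1            # timed out
        fi
        sleep "$interval"
    done
}

# Assumption: unstarted resources appear as "Stopped" in `crm status` output.
# Returns success only if crm itself ran and no "Stopped" entry was found.
no_stopped_resources() {
    local out
    out=$(crm status) || return 1
    ! grep -q "Stopped" <<<"$out"
}

# Example use on the active SMW, mirroring step 9 (commented out so that
# sourcing this file has no side effects):
#   wait_until 300 15 no_stopped_resources || {
#       clear_failcounts
#       wait_until 300 15 no_stopped_resources || clean_resources
#   }
```

The 15-second polling interval is an arbitrary choice. A plain sleep 300, as shown in the procedure, remains the simpler option when there is no need to continue early.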