Configure Boot Node Failover

Configure a tier1 service node to be a backup boot node for boot node failover.

Boot node failover requirements:
  • Both the boot node and the backup boot node must have a Fibre Channel or SAS connection to the boot RAID.
  • Both the boot node and the backup boot node must have an Ethernet connection to the network shared with the SMW in order to PXE boot and transfer data as a tier1 node.
  • The primary and backup nodes must not be on the same blade.
  • The boot and SDB nodes must not be on the same blade.
CAUTION: The system will fail if a blade containing both the boot node and the SDB node fails, because Cray does not support concurrent failover of boot and SDB nodes. Therefore, the boot and SDB nodes and their backups (for boot/SDB node failover) must be on different blades.

The system must be shut down before invoking the xtcli halt command, which is used in this procedure.

If a secondary (backup) boot node is configured, boot node failover will occur automatically if the primary boot node fails. This procedure configures the system for boot node failover. If boot node failover was configured during an SMW/CLE software installation or update, this procedure is not needed.

For the examples in this procedure, the cname of the primary boot node is c0-0c0s4n1, and the cname of the backup boot node is c0-2c0s4n1.

  1. Configure cray_multipath for the backup boot node, if cray_multipath is enabled.

    cray_multipath is in the global config set and may be inherited by the CLE config set. If the global cray_multipath service is enabled and the CLE service is set to inherit from it, make the changes in the global cray_multipath service. If the CLE cray_multipath service is enabled and not set to inherit from the global config set, make the changes in the CLE cray_multipath service.
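
    To edit the service, it can be opened interactively in the configurator. This command is a sketch: it assumes the global service is the one to change (substitute the CLE config set name, such as p0, to edit the CLE service instead), and flag spellings may vary by release; check cfgset --help.

    smw# cfgset update -m interactive -s cray_multipath global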

    Enter the list of multipath nodes.

    Change cray_multipath.settings.multipath.data.node_list so that it includes both the primary boot node and the backup boot node.

    This example shows a list of four nodes: an SMW with host ID 1eac4e0c, a primary boot node with cname c0-0c0s4n1, a backup boot node with cname c0-2c0s4n1, and an SDB node with cname c0-0c0s3n1.

    cray_multipath.settings.multipath.data.node_list: 
    - 1eac4e0c 
    - c0-0c0s4n1 
    - c0-2c0s4n1 
    - c0-0c0s3n1
    
  2. Configure cray_node_groups to add a backup boot node.

    In the CLE config set, the cray_node_groups service should have a boot_nodes node group with the primary boot node (c0-0c0s4n1) and the backup boot node (c0-2c0s4n1) as members.

    cray_node_groups.settings.groups.data.group_name.boot_nodes: null
    cray_node_groups.settings.groups.data.boot_nodes.description: Default node  
        group which contains the primary and failover (if applicable) boot  
        nodes associated with the current partition.
    cray_node_groups.settings.groups.data.boot_nodes.members: 
    - c0-0c0s4n1
    - c0-2c0s4n1
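
    After the config set is updated (see the cfgset update step below), membership can be spot-checked with a search such as the following; the -s and -t flags are assumed available in this cfgset release.

    smw# cfgset search -s cray_node_groups -t boot_nodes p0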
    
  3. Configure cray_persistent_data to add the boot_nodes node group.
    Ensure that this setting includes the boot_nodes node group and the sdb_nodes node group.
    cray_persistent_data.settings.mounts.data./var/lib/nfs.client_groups:
    - boot_nodes
    - sdb_nodes
    
  4. Configure cray_scalable_services to add the boot_nodes node group.
    Ensure that this setting includes the boot_nodes node group and the sdb_nodes node group.
    cray_scalable_services.settings.scalable_service.data.tier1_groups:
    - boot_nodes
    - sdb_nodes
    
  5. Configure cray_net to add a backup boot node.

    These settings configure a host as the backup boot node (backup_bootnode) when using boot node failover. Ensure that the standby_node variable is set to true.

    Note: The host name for both the primary and backup boot nodes should be set to boot. The aliases should differ so that the /etc/hosts entry for each node's cname carries its own host name alias (see the illustrative /etc/hosts excerpt after the settings below).
    cray_net.settings.hosts.data.common_name.backup_bootnode: null
    cray_net.settings.hosts.data.backup_bootnode.description: backup Boot node for the system
    cray_net.settings.hosts.data.backup_bootnode.aliases:
    - cray-boot2
    cray_net.settings.hosts.data.backup_bootnode.hostid: c0-2c0s4n1
    cray_net.settings.hosts.data.backup_bootnode.host_type: admin
    cray_net.settings.hosts.data.backup_bootnode.hostname: boot
    cray_net.settings.hosts.data.backup_bootnode.standby_node: true
    
    cray_net.settings.hosts.data.backup_bootnode.interfaces.common_name.hsn_boot_alias: null
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.name: ipogif0:1
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.description: 
        Well known address used for boot node services.
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.vlan_id: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.vlan_etherdevice: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.bonding_slaves: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.bonding_module_opts: mode=active-backup
        miimon=100
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.aliases: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.network: hsn
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.ipv4_address: 10.131.255.254
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.ipv4_secondary_addresses: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.mac: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.startmode: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.bootproto: static
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.mtu: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.extra_attributes: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.module: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.params: ''
    #cray_net.settings.hosts.data.backup_bootnode.interfaces.hsn_boot_alias.unmanaged_interface: false
    
    cray_net.settings.hosts.data.backup_bootnode.interfaces.common_name.primary_ethernet: null
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.name: eth0
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.description: 
        Ethernet connecting boot node to the SMW.
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.vlan_id: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.vlan_etherdevice: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.bonding_slaves: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.bonding_module_opts: mode=active-backup
        miimon=100
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.aliases: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.network: admin
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.ipv4_address: 10.3.1.254
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.ipv4_secondary_addresses: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.mac: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.startmode: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.bootproto: static
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.mtu: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.extra_attributes: []
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.module: ''
    cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.params: ''
    #cray_net.settings.hosts.data.backup_bootnode.interfaces.primary_ethernet.unmanaged_interface: false
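
    Once the config set is updated in the next step, the regenerated CLE /etc/hosts should contain entries along these lines. This excerpt is illustrative only: the cname addresses are placeholders, and cray-boot1 is assumed to be the primary boot node's alias.

    10.131.255.254   boot                      # well-known boot services address
    <primary-ip>     c0-0c0s4n1   cray-boot1   # primary boot node
    <backup-ip>      c0-2c0s4n1   cray-boot2   # backup boot node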
    
  6. Update the config set to regenerate the CLE /etc/hosts file so that it contains the appropriate backup node settings.
    smw# cfgset update p0
    smw# cfgset validate p0
    
  7. Halt the primary and backup boot nodes.
    crayadm@smw> xtcli halt boot_primary_id,boot_backup_id
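
    Using the example cnames from this procedure:

    crayadm@smw> xtcli halt c0-0c0s4n1,c0-2c0s4n1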
    
  8. Set the primary and backup boot nodes using the xtcli command. Use the -b argument for a boot node.
    crayadm@smw> xtcli part_cfg update p0 -b boot_primary_id,boot_backup_id
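
    Using the example cnames, and assuming xtcli part_cfg show is available in this release to review the result:

    crayadm@smw> xtcli part_cfg update p0 -b c0-0c0s4n1,c0-2c0s4n1
    crayadm@smw> xtcli part_cfg show p0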
    
  9. Add boot node failover to the boot automation file, auto.hostname.start.

    When boot node failover is used, add settings to the boot automation file to ensure that STONITH is enabled on the blades that contain the primary and backup boot nodes. The STONITH setting does not survive a power cycle or any other action that causes the bcsysd daemon to restart. Adding these lines to the boot automation file maintains that setting.

    Set STONITH for the blades that contain the primary and backup boot nodes. In the example, the primary boot node is c0-0c0s4n1, so its blade is c0-0c0s4, and the backup boot node is c0-2c0s4n1, so its blade is c0-2c0s4. Add these lines before the line for booting the boot node.

    # Set STONITH for primary boot node
    lappend actions {crms_exec "xtdaemonconfig c0-0c0s4 stonith=true"}
    # Set STONITH for the backup boot node
    lappend actions {crms_exec "xtdaemonconfig c0-2c0s4 stonith=true"}
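
    After the automation file runs, the setting can be spot-checked by querying the blade's daemon configuration. This assumes xtdaemonconfig prints current settings when given only a blade cname; output format varies by release.

    crayadm@smw> xtdaemonconfig c0-0c0s4 | grep -i stonith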
    
  10. Enable the xtfailover_halt command in the auto.hostname.stop file.
    Uncomment the second of these lines in auto.hostname.stop. This file in /opt/cray/hss/default/etc is normally copied from auto.xtshutdown to auto.hostname.stop during a fresh install. The xtfailover_halt command ensures that the xtbootsys shutdown process sends a STOP NMI to the failover nodes.
    # Enable the following line if boot or sdb failover is enabled:
    lappend actions { crms_exec \
    "/opt/cray/hss/default/bin/xtfailover_halt --partition $data(partition,given) --shutdown" }
    
  11. Assign the boot image to the backup boot node.
    Check which NIMS group and boot image are being used for the primary boot node and the backup boot node. (The cnode and cmap commands replace the nimscli command, which was deprecated in CLE 6.0.UP04 and removed in CLE 6.0.UP05. Be sure to change any scripts that reference nimscli.)
    smw# cnode list c0-0c0s4n1
    smw# cnode list c0-2c0s4n1
    

    If the backup boot node does not have the same NIMS group and boot image assigned, update the backup boot node.

    Remove the old NIMS group from the backup boot node.
    smw# cnode update -G oldNIMSgroup c0-2c0s4n1
    
    Assign the primary boot node's NIMS group and boot image to the backup boot node.
    smw# cnode update -g primaryNIMSgroup \
    -i /path/to/primary/bootimage c0-2c0s4n1
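
    As a concrete sketch, if the backup node's old NIMS group were login and the primary boot node used NIMS group service with a boot image under /var/opt/cray/imps/boot_images (all three names here are hypothetical):

    smw# cnode update -G login c0-2c0s4n1
    smw# cnode update -g service \
    -i /var/opt/cray/imps/boot_images/service_image.cpio c0-2c0s4n1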
    
    Confirm the change.
    smw# cnode list c0-2c0s4n1
    
  12. Boot the system.
    crayadm@smw> xtbootsys -a auto.hostname.start
    
    Trouble? If a node that is on a blade with STONITH enabled fails to boot, try adjusting the heartbeat timeout setting for that node (see the xtdaemonconfig man page).

    For all other problems booting CLE, see the XC™ Series Boot Troubleshooting Guide (S-2565).