Reprovision a Persistent Disk on an eLogin Node

Describes how to safely change the partition layout of a persistent disk. The procedure involves copying data off the disk to preserve it, changing the disk to nonpersistent, making the config set changes, rebooting the node, then making the disk persistent again and copying the saved data back to the disk.

Prerequisite: The eLogin node is booted.

To reprovision a persistent disk with a new partition layout, that disk must be reconfigured as nonpersistent. This can be done by creating a new storage profile for that node with the desired layout and persist_on_boot set to false.

The new partition scheme is created on the eLogin node when the node reboots; however, ALL DATA ON THAT DISK WILL BE LOST in the process. If data that resides on the disk needs to be retained, move the data to a safe location before rebooting the node, and copy it back after the node successfully provisions.

To make the disk persistent again, set persist_on_boot: true in the new storage profile after the node has rebooted, so that subsequent reboots do not repartition the disk and cause data loss.

Warning: To avoid loss of data when reprovisioning a persistent disk, move data to a safe location before rebooting the eLogin node. After rebooting the node and restoring that data to the disk, ensure that the disk is reconfigured as persistent.
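
The save-and-restore advice above can be sketched as a small shell helper. This is only an illustration, not part of the documented procedure: the function name backup_tree and the paths in the usage comment are hypothetical, and the sketch assumes the safe location is reachable as an ordinary directory (for example, a network mount that does not live on the eLogin node's own disks).

```shell
#!/bin/sh
# Sketch: copy a directory tree to a safe location and confirm the copy matches.
# backup_tree is an illustrative name, not a Cray-provided tool.

# backup_tree SRC DEST
# Copies everything under SRC into DEST, then compares the two trees.
# Returns nonzero if the copy fails or if the trees differ afterward.
backup_tree() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # cp -a preserves ownership, permissions, timestamps, and symlinks.
    cp -a "$src"/. "$dest"/ || return 1
    # diff -r exits nonzero if any file is missing or has different content.
    diff -r "$src" "$dest" >/dev/null
}

# Hypothetical usage (substitute the real mount point and safe location):
#   backup_tree /path/to/persistent/mount /path/to/safe/location/elogin1
```

Running the same helper with the arguments reversed would copy the data back after the node has been reprovisioned. Comparing the trees before rebooting is the important part of the sketch, because the reboot destroys the only other copy.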

This procedure safely reprovisions a persistent disk on an eLogin node (elogin1 in the example commands).

  1. Copy data from the persistent disk to a safe location off that eLogin node.
  2. Prepare configuration worksheets for editing.
    1. Generate a set of configuration worksheets with the current CLE configuration data.
      This example uses the existing CLE config set p0.
      smw# cfgset update -m prepare --no-scripts p0
      
    2. Copy the CLE worksheets to a work area for editing.
      This example makes a directory called /my/workarea. Use a suitable work area directory location to perform this step.
      smw# mkdir -p /my/workarea
      smw# cd /var/opt/cray/imps/config/sets/p0/worksheets
      smw# cp *_worksheet.yaml /my/workarea
      
    3. Change to the new work area.
      smw# cd /my/workarea
      
  3. Open the cray_storage configuration worksheet for editing.
    smw# vi cray_storage_worksheet.yaml
    
  4. Add a new storage profile.
    Copy elogin_default (or another storage profile with a layout similar to the desired layout), then change the persistent disk (device) to be nonpersistent and make other changes, as needed.
    1. Copy the storage profile.
      In the worksheet, copy the elogin_default storage profile (or the similar profile chosen as a starting point) and paste it below this line:
      # NOTE: Place additional 'storage_profiles' setting entries here, if desired.
      
    2. Replace the name (key) of the copied profile with the key for the new storage profile (new_elogin in this example).
      # NOTE: Place additional 'storage_profiles' setting entries here, if desired.
       
      cray_storage.settings.storage_profiles.data.name.new_elogin: null
      cray_storage.settings.storage_profiles.data.new_elogin.enabled: true
       
      cray_storage.settings.storage_profiles.data.new_elogin.layouts.device./dev/sda: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partition_type: gpt
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.persist_on_boot: false
       
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.label.GRUB: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.GRUB.type: ext3
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.GRUB.size: 1MiB 
       ... 
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.label.BOOT: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.BOOT.type: ext3
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.BOOT.size: 2GiB
       ...   
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.label.WRITELAYER: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.WRITELAYER.type: ext4
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.WRITELAYER.size: 20GiB
       ... 
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.label.TMP: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.TMP.type: xfs
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.TMP.size: 256GiB
       ...  
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.label.SWAP: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.SWAP.type: swap
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sda.partitions.SWAP.size: 128GiB
       ...  
      
      cray_storage.settings.storage_profiles.data.new_elogin.layouts.device./dev/sdb: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partition_type: gpt
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.persist_on_boot: true
       
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partitions.label.CRASH: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partitions.CRASH.type: ext4
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partitions.CRASH.size: 10GiB
       ...  
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partitions.label.PERSISTENT: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partitions.PERSISTENT.type: xfs
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partitions.PERSISTENT.size: ALL 
       ... 
      
    3. Change the persist_on_boot flag to false for the /dev/sdb disk.
      cray_storage.settings.storage_profiles.data.new_elogin.layouts.device./dev/sdb: null
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.partition_type: gpt
      cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.persist_on_boot: false
      
    4. Make the desired changes to this storage profile.
      Because the disk is temporarily nonpersistent, partitions can be added, removed, resized, reordered, or have their file system type changed. Make the desired reprovisioning changes now, bearing in mind the following requirements:
      • To function properly, all eLogin nodes must have all of the following partitions with these exact labels:
        • nonpersistent disk: GRUB, BOOT, WRITELAYER, TMP, and SWAP
        • persistent disk: CRASH and PERSISTENT
      • To enable the eLogin node to boot, the partition_flags list for the GRUB partition must be set to a list containing bios_grub instead of the empty list (the default value for that field).
      • The sum of the sizes of all of the volatile data partitions on the first disk (/dev/sda) must be less than the available storage on the first disk. Similarly, the sum of the sizes of all of the persistent data partitions on the second disk (/dev/sdb) must be less than the available storage on the second disk.
      • Two partitions have the following minimum size limits:
        • BOOT must be > 1 GiB (note binary value)
        • PERSISTENT must be > 200 GiB (note binary value)

      If it is necessary to change the configuration of virtual disk sda or sdb, see Configure the eLogin RAID Virtual Disks.

      For more information about binary values, see Prefixes for Binary and Decimal Multiples.
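
The size requirements listed above lend themselves to a quick arithmetic check before the worksheet is uploaded. The following is a minimal sketch under stated assumptions: the function names are illustrative and not part of any Cray tool, all sizes are passed as whole numbers of MiB (1 GiB = 1024 MiB), and the disk capacity in the usage comment is a made-up value. A size of ALL, as used for PERSISTENT in the example profile, has no fixed number and is outside what this sketch checks.

```shell
#!/bin/sh
# Sketch: check planned partition sizes against the rules in this step.
# Sizes are integers in MiB. Function names are illustrative only.

# fits_on_disk CAPACITY_MIB SIZE_MIB...
# Succeeds only if the partition sizes sum to LESS than the disk capacity.
fits_on_disk() {
    capacity=$1
    shift
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    [ "$total" -lt "$capacity" ]
}

# meets_minimum LABEL SIZE_MIB
# BOOT must be larger than 1 GiB; PERSISTENT must be larger than 200 GiB.
# Other labels have no minimum in this step, so they always pass.
meets_minimum() {
    case $1 in
        BOOT)       [ "$2" -gt 1024 ] ;;
        PERSISTENT) [ "$2" -gt $((200 * 1024)) ] ;;
        *)          return 0 ;;
    esac
}

# Hypothetical usage with the /dev/sda sizes from the example profile
# (GRUB 1MiB, BOOT 2GiB, WRITELAYER 20GiB, TMP 256GiB, SWAP 128GiB)
# and an assumed 500 GiB disk:
#   fits_on_disk 512000 1 2048 20480 262144 131072 && echo "layout fits"
```

Working in MiB keeps the arithmetic in integers, which is all that POSIX shell arithmetic supports, and still represents the 1MiB GRUB partition exactly.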

  5. Upload the modified cray_storage worksheet to the config set.
    smw# cfgset update -w '/my/workarea/cray_storage_worksheet.yaml' p0
    
  6. Update the CLE config set.
    smw# cfgset update p0
    
    This update runs all pre-configuration and post-configuration scripts. It is good practice to update the config set when any config services have been changed by importing worksheets.
  7. Validate the config set.
    smw# cfgset validate p0
    
  8. Assign the new storage profile to the eLogin node.
    smw# enode update --set-storage_profile new_elogin elogin1
    
  9. Reboot the eLogin node.
    This example reboots an eLogin node named elogin1.
    smw# enode reboot --pxe elogin1
    
  10. Verify the changes to the storage layout.
    1. On the SMW, determine if the node is finished booting.
      In this example, the eLogin node is elogin1.
      smw# enode status elogin1
      
      The eLogin node has finished booting if its status is node_up.
    2. On the eLogin node, verify that the desired partitions exist with the expected sizes.
      elogin# df
      
  11. Make the temporarily nonpersistent disk (/dev/sdb in this example) persistent again.
    This example uses the new_elogin storage profile. Substitute the actual storage profile name for this system.
    smw# cfgset modify --set true \
    cray_storage.settings.storage_profiles.data.new_elogin.layouts./dev/sdb.persist_on_boot p0
    
  12. Update the CLE config set.
    smw# cfgset update p0
    
    This update runs all pre-configuration and post-configuration scripts. It is good practice to update the config set whenever config service settings have been changed, in this case with cfgset modify.
  13. Validate the config set.
    smw# cfgset validate p0
    
  14. Push the config set to the eLogin node.
    smw# cfgset push -d elogin1 p0
    
  15. Move the data that was copied off the persistent disk in the first step back to the persistent disk on the eLogin node.
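
Because a disk left nonpersistent is repartitioned on the next reboot, it may be worth confirming that the config set now records persist_on_boot as true for the reprovisioned disk. The sketch below reads the setting from a worksheet file. It assumes the worksheets have been regenerated (for example, with the cfgset update -m prepare --no-scripts command used earlier in this procedure) so that they reflect the current config set, and that the setting appears on a single uncommented line in the form shown in the profile example. The function name persist_flag is hypothetical.

```shell
#!/bin/sh
# Sketch: report the persist_on_boot value for one device in one storage profile.
# persist_flag is an illustrative name, not a Cray-provided tool.

# persist_flag WORKSHEET PROFILE DEVICE
# Prints the value (for example, true or false) found on the first
# uncommented line that sets persist_on_boot for DEVICE in PROFILE.
persist_flag() {
    key="cray_storage.settings.storage_profiles.data.$2.layouts.$3.persist_on_boot"
    # Drop comment lines first, then match the key literally (-F), because
    # the key contains dots and slashes that would otherwise be patterns.
    grep -v '^[[:space:]]*#' "$1" \
        | grep -F "$key:" \
        | head -n 1 \
        | sed -e 's/.*persist_on_boot:[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Hypothetical usage on the SMW:
#   persist_flag /var/opt/cray/imps/config/sets/p0/worksheets/cray_storage_worksheet.yaml \
#       new_elogin /dev/sdb
```

If the command prints anything other than true for the persistent disk, repeat the step that sets persist_on_boot before the node is rebooted again.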