Install Third-Party Software with a Custom Image Recipe

Create a customized image recipe for Cray nodes that includes third-party software and uses a Cray-provided recipe as a subrecipe. This section also summarizes other installation methods (cloning a recipe; using image chroot or zypper).

Third-party software is any software that is created independently of Cray and is not delivered with a Cray system; administrators install it as an add-on to the Cray system. (The information in this section does not pertain to software installed on an external file system that is connected to a Cray system.) There are several ways to install third-party software:
  • (Recommended) Create a custom image recipe for the third-party software and add a Cray-provided recipe as a subrecipe (also called extending a recipe). This method is preferred because the update to the image is persisted in the recipe.
  • Clone an existing recipe, then modify the clone to add the third-party software. This method is not recommended because cloned recipes do not receive updates from patches.
  • Use the image chroot command to install the software to an existing image. Software installed with this method is lost when a node image is rebuilt from a recipe. However, this approach can be useful when persistence is not important, such as when testing third-party software. The Cray image chroot command can be used to chroot into any XC system image root, regardless of architecture. (Note that using image chroot on a non-bootable image such as PE or diags may result in an "I have no name!" prompt like the following, because such images lack the content used to populate the prompt: [diags_cle_7.0.up00_sles_15_x86-64] I have no name!@smw:/ )
  • Use the zypper command to install software on a node. Software installed with this method is lost the next time the node is booted. Like the image chroot method, this approach can be used when testing software that does not need to persist in the image.
  • If the software is not packaged as RPMs, then one of the following methods is needed:
    • Package the files into a cpio or tar file and use the postbuild_copy and postbuild_chroot sections of the recipe to copy and extract the files to the correct locations.
    • Build the image first and then copy or install the files by hand.
Important: Do not directly modify a Cray-provided recipe. For information on removing an RPM after building from a Cray-provided recipe, see Remove an Undesired RPM After Building From a Cray Recipe.
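
For software that is not packaged as RPMs, the tar-file method above starts with building an archive. The following is a minimal sketch, assuming a staging directory laid out the way the files should appear under / in the image. The helper name make_payload and all paths are hypothetical and are not part of the Cray tools.

```shell
# make_payload SRC_DIR OUT_TARBALL
# Hypothetical helper: bundle the contents of SRC_DIR into a gzip-compressed tar archive.
make_payload() {
    local src_dir=$1 out=$2
    # -C makes tar store paths relative to SRC_DIR, so the archive can later be
    # extracted at the root of the image.
    tar -C "$src_dir" -czf "$out" .
}

# Example (placeholder paths; keep the output file outside the staging directory):
# make_payload /path/to/staging /path/to/file/mysoftware.tar.gz
# tar -tzf /path/to/file/mysoftware.tar.gz    # list the contents before using the archive
```

The resulting archive is the kind of file that would be listed in the postbuild_copy section of the recipe.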

This procedure describes the recommended method of creating a new image recipe for third-party software that will run on Cray nodes. The procedure explains how to add a Cray-provided image recipe as a subrecipe, then add the third-party repositories, package collections, and RPMs, as well as optional non-RPM content. It then shows how to build an image root, export the image root into a boot image, push the image root to the boot node (netroot only), test the boot image on a single node, and assign the tested image to all applicable nodes.

For more information on image-related concepts and commands, see About the Image Management and Provisioning System (IMPS).

CREATE REPOSITORY

  1. Create a new repository and add the third-party packages (RPMs). Skip this step if the repository already exists on the SMW or is hosted on a remote repository server.
    1. Use the repo create command to create the new repository (for example, my_sles15_repo).
      This command requires the operating system distribution (for example, SLES15, RHEL6, CentOS).
      smw# repo create --dist SLES15 my_sles15_repo
      
    2. Verify that the new repository was created.
      smw# repo list my* 
      my_sles15_repo
      
    3. Add the third-party RPMs to the repository. This example takes all RPMs starting with myrpm in the example repository path /path/to/repos/ and copies them to the example repo my_sles15_repo.
      smw# repo update -a "/path/to/repos/myrpm*.rpm" my_sles15_repo
      smw# ls -l /var/opt/cray/repos/my_sles15_repo
      -rw-r--r-- 1 crayadm crayadm 485137 Nov 23 08:56 myrpm-11.13.1.1-4.x86_64.rpm
      
    4. (Optional) Check the contents of the repository. This command displays the packages but not the full RPM names.
      smw# repo show --fields contents
      
      Add the --detailed option to display the version and architecture for each package in the repository.
    5. Validate the repository.
      smw# repo validate my_sles15_repo
      

CREATE PACKAGE COLLECTION

  1. Create a package collection and add the RPM package names.
    A package collection represents a logical grouping of RPMs. Cray recommends using a package collection because the RPMs can be used in multiple image types (such as compute and service node images). Package collections are stored on the SMW in /etc/opt/cray/imps/package_collections.d/.
    Cray provides the following package collections for workload manager (WLM) software:
    • service-pbs_cle_7.0.up00_sles_15 (packages needed to build and run PBS)
    • service-torque_cle_7.0.up00_sles_15 (packages needed to build and run Moab/TORQUE)
    • slurm-build_cle_7.0.up00_sles_15 (packages needed to build Slurm)
    • compute-slurm_cle_7.0.up00_sles_15 (packages needed to run Slurm on compute nodes)
    • login-slurm_cle_7.0.up00_sles_15 (packages needed to run Slurm on login nodes)
    • service-slurm_cle_7.0.up00_sles_15 (packages needed to run Slurm on service nodes)
    1. Create an empty package collection (for example, my_collection).
      smw# pkgcoll create --description "Example package collection" my_collection
      
    2. Verify that the package collection was created.
      smw# pkgcoll list my*
      my_collection
      
    3. Add the RPM package name or names (for example, myrpm) to the package collection.
      Important: When adding an RPM package to a package collection, use the file name of the RPM without the .rpm extension. Otherwise, the package will not install in the image root, even though the package collection may validate.
      smw# pkgcoll update -p myrpm \
      --description "My package collection" my_collection
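
The name passed to pkgcoll update -p must not carry the .rpm suffix. As a minimal sketch, the hypothetical helper strip_rpm_ext below (not a Cray command) removes that suffix from an RPM file name with basename. The commented rpm query is a standard way to read the package name recorded inside an RPM file, in case the short name (such as myrpm in this example) is wanted instead.

```shell
# strip_rpm_ext FILE: print the file name of an RPM without its directory or .rpm suffix.
# Hypothetical helper for illustration; not part of the Cray tools.
strip_rpm_ext() {
    basename "$1" .rpm
}

# Example (file name taken from the repository listing earlier in this procedure):
# strip_rpm_ext /var/opt/cray/repos/my_sles15_repo/myrpm-11.13.1.1-4.x86_64.rpm
# To read the package name stored in the RPM header instead:
# rpm -qp --queryformat '%{NAME}\n' /path/to/repos/myrpm-11.13.1.1-4.x86_64.rpm
```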
      
    4. Display information about the package collection.
      smw# pkgcoll show my_collection
      my_collection:
           name: my_collection
           description: My package collection
           packages:
                myrpm
      
    5. Validate the package collection.
      This example assumes that my_collection is for the x86-64 architecture. Use --arch aarch64 if the package collection is for that architecture.
      smw# pkgcoll --arch x86_64 validate my_collection
      

CREATE RECIPE

  1. Create a new recipe and customize it by adding a subrecipe (the Cray-provided image recipe) and the content for the third-party software.
    1. List the existing recipes to determine which image recipe to include.
      smw# recipe list
      admin_cle_7.0.up00_sles_15_ari
      cdt-base-1.0.0_sle12_ari
      cdt-base_1.0.0_sles_12sp3_ari
      compute-large_cle_7.0.up00_sles_15_ari
      compute_cle_7.0.up00_sles_15_ari
      dal_cle_7.0.up00_sles_15_ari
      diags_cle_7.0.up00_sles_15
      elogin-smw-large_cle_7.0.up00_sles_15_ari
      elogin-smw_cle_7.0.up00_sles_15_ari
      ...
    2. Create a new image recipe. This example uses the recipe name site_compute.
      smw# recipe create --description \
      "Example recipe for 3rd-party software on compute nodes" site_compute
      
    3. Add the existing image recipe as a subrecipe. This example uses the Cray-provided recipe compute_cle_7.0.up00_sles_15_ari.
      smw# recipe update -i compute_cle_7.0.up00_sles_15_ari site_compute
      
      If third-party software must be added to a PE image, use the PE recipe as a subrecipe of a custom PE recipe.
      However, adding third-party software to the PE image is not recommended, because it requires creating a local file under /etc/opt/cray/pe/admin-pe/bindmount.conf.d/ to specify bind-mount locations.
    4. Add the package collection that contains the third-party RPMs (in this example, my_collection).
      smw# recipe update -c my_collection \
      --rationale "Include my package collection" site_compute
      
    5. Add the repository that contains the third-party RPMs (for example, my_sles15_repo).
      smw# recipe update -r my_sles15_repo \
      --rationale "Include third-party RPMs" site_compute
      
      To add a remote repository that is hosted on an external repository server, specify the repository's Uniform Resource Identifier (URI) starting with http:// or https://.
    6. Add the objects mentioned in the subrecipe that are also needed for the parent recipe.
      Important: The objects mentioned in a subrecipe are used to build that subrecipe but are not available to the parent recipe. If a package (RPM) or package collection is specified in the parent recipe, the custom recipe must explicitly contain the set of repositories where the packages can be found.
      1. Determine which repository contains the necessary RPM or RPMs. This example find command identifies the Cray repository that contains the RPM otherrpm.
        smw# find /var/opt/cray/repos -name otherrpm\* -ls 
        
      2. Select the correct repository:
        • Choose the repository for the image's operating system distribution: use a SLES repository for a SLES image recipe and a CentOS repository for a CentOS recipe.
        • Most operating system and Cray repositories come in pairs (base and updates), such as sle-15-product-sles and sle-15-product-sles_updates. If both exist, select both the base and the base_updates repositories. Update repositories contain higher-version RPMs, which are used in place of the lower-version RPMs in the base repositories.
      3. Add the required repository or repositories (in this example, otherrepo).
        smw# recipe update -r otherrepo \
        --rationale "Additional repo for third-party software" site_compute
        
        Repeat the -r option to add multiple repositories, such as a base and base_updates repository pair.
        smw# recipe update -r sle-15-product-sles -r sle-15-product-sles_updates \
        --rationale "sles15 update repo" site_compute
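
To spot base/updates pairs before adding repositories, the repository names can be checked for a matching NAME_updates entry. The following is a minimal sketch, assuming the names arrive one per line on standard input (for example, from repo list). The helper find_repo_pairs is hypothetical and not part of the Cray tools.

```shell
# find_repo_pairs: read repository names on stdin (one per line) and print each
# base repository that has a matching NAME_updates repository, as "base base_updates".
find_repo_pairs() {
    local names name
    names=$(cat)
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        # Skip the update repositories themselves; they are reported with their base.
        case $name in *_updates) continue ;; esac
        if printf '%s\n' "$names" | grep -qxF -- "${name}_updates"; then
            printf '%s %s\n' "$name" "${name}_updates"
        fi
    done <<EOF
$names
EOF
}

# Example on the SMW:
# repo list | find_repo_pairs
```

Each output line names two repositories that would both be passed to recipe update with repeated -r options.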
    7. (Optional) Add post-build actions by manually editing the image recipe in /etc/opt/cray/imps/image_recipes.d/image_recipes.local.json.
      Post-build actions can add non-RPM content (files or directories) to the image or specify commands to run in the chroot environment of the image root (on the SMW). For example, the post-build actions could include copying a tar file into the image, then using chroot to run the commands to untar it and run an install script.
      • In the postbuild_chroot section, add the commands to run in a chroot environment for this image root.
      • In the postbuild_copy section, add the files to copy into the image.
      smw# vi /etc/opt/cray/imps/image_recipes.d/image_recipes.local.json
      
      "site_compute": {
          "description": "Example recipe for 3rd-party software on compute nodes",
          "dist": "sles15",
          "default-arch": "x86_64",
          "valid-arch": [
              "x86_64",
              "aarch64"
          ],
          "packages": [ ... ],
          "package_collections": [ ... ],
          "recipes": [ ... ],
          "repositories": [ ... ],
          "postbuild_chroot": [
              "common_command_1",
              "common_command_2",
              "{% if arch == 'aarch64' %}aarch64_command_1{% endif %}",
              "{% if arch == 'x86_64' %}x86_64_command_1{% endif %}",
              "common_command_3",
              "common_command_4",
              ...
          ],
          "postbuild_copy": [
              "/path/to/file/copyfile_1",
              "/path/to/file/copyfile_2",
              "{% if arch == 'aarch64' %}/path/to/file/aarch64_copyfile_1{% endif %}",
              "{% if arch == 'aarch64' %}/path/to/file/aarch64_copyfile_2{% endif %}",
              "/path/to/file/copyfile_3",
              "/path/to/file/copyfile_4",
              ...
          ],
          "version": "2.0.0",
          "metadata": {
               "created": "2018-05-31T13:24:14",
               "history": [
                   "2018-05-31T13:26:01: Extended recipes attribute with 1 Recipe."
                   ...
               ]
          }
      }
      

      While editing the recipe, do not delete or change the version or metadata fields.
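
Hand edits to a JSON file easily introduce syntax slips such as a typographic quote or a missing comma. As a quick check before running recipe validate, the following minimal sketch parses the file with the json.tool module from the Python standard library. It assumes python3 is available on the SMW, and the helper name check_json is hypothetical.

```shell
# check_json FILE: succeed only if FILE parses as JSON.
# ASSUMPTION: python3 is installed; json.tool is part of the Python standard library.
check_json() {
    python3 -m json.tool "$1" > /dev/null
}

# Example:
# check_json /etc/opt/cray/imps/image_recipes.d/image_recipes.local.json \
#     && echo "JSON syntax OK"
```

This only checks syntax. It does not replace recipe validate, which also checks repositories, package collections, and packages.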

      Tip: Post-build scripts can use the following environment variables:
      • IMPS_IMAGE_NAME
      • IMPS_VERSION
      • IMPS_IMAGE_RECIPE_NAME
      • IMPS_POSTBUILD_FILES
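
As an illustration of a post-build script that uses one of these variables, the following minimal sketch extracts a tar archive that was delivered through postbuild_copy. It rests on an assumption this section does not state: that IMPS_POSTBUILD_FILES names the directory inside the image root where the postbuild_copy files were placed. The helper name install_payload and the archive name are hypothetical.

```shell
# install_payload TARBALL [DEST]: extract TARBALL into DEST (default: /).
# ASSUMPTION: IMPS_POSTBUILD_FILES is the directory that holds the postbuild_copy files.
install_payload() {
    # Abort with a message if the variable is unset or empty.
    local src_dir=${IMPS_POSTBUILD_FILES:?IMPS_POSTBUILD_FILES is not set}
    local dest=${2:-/}
    tar -xzf "$src_dir/$1" -C "$dest"
}

# Example, as it might appear in a script run from postbuild_chroot:
# install_payload mysoftware.tar.gz
```

Verify how the variable is set on the installed release before relying on this pattern.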
    8. Validate the image recipe.
      smw# recipe validate site_compute
      INFO - Validating recipe site_compute is valid for x86_64 architecture.
      INFO - Validating Image 'site_compute-validate-2018-05-31_13:31:44'
      ...
      INFO - Building out site_compute
      INFO - Calling package manager to validate Recipe 'site_compute'; this can take a few minutes.
      INFO - Removed Image 'site_compute-validate-2018-05-31_13:31:44'.
      INFO - Removed Image 'site_compute-validate-2018-05-31_13:31:44'.
      INFO - Recipe validates.
      
      This command checks that the JSON syntax of the image recipe is correct. It also validates, for the specified architecture, all repositories and package collections referenced by the image recipe, checks that all required packages are included in the recipe, and ensures that it can access any files in the postbuild_copy section.

      Caveat: Recipe validation does NOT run post-build activities (the post-build chroot scripts and copy actions), because they cannot be run without the packages actually being installed.

BUILD AND PACKAGE IMAGE

  1. Build the image recipe to create the image root.
    Choose a unique name for the image root. Cray recommends using the image recipe name plus the current date/time. This example uses the image root name site_compute_timestamp.
    Important: If the image root name is not unique, it will overwrite an existing image root. A unique name is especially important for images that are pushed to the boot node. Do not overwrite the image root that is currently used by running nodes.
    The image create command builds the image recipe starting with the package manager installation and then proceeds to step through the post-build copy and post-build chroot commands (in that order).

    If the image to be created should have a different architecture than the recipe's default architecture, add the --arch ARCH option to the following command, where ARCH is one of the valid architectures, such as x86_64 or aarch64.

    smw# image create -r site_compute site_compute_timestamp
    INFO - Repository 'my_sles15_repo' validates.
    INFO - Recipe 'site_compute' is valid for building.
    INFO - Calling Package manager to build new image root; this will take a few minutes.
    INFO - Rebuilding RPM database for Image 'site_compute_timestamp'.
    INFO - RPM database does not need to be rebuilt.
    INFO - Running post-build scripts for Image 'site_compute_timestamp'.
    INFO - Copying postbuild files to /tmp/tmpmAyzGl in Image 'site_compute_timestamp'
    INFO - * Executing post-build chroot script: 'common_command1'
    INFO - post-build chroot script output will be located in /tmp/site_compute_postbuild_out_20180513-15:55:11g4WA6p
    INFO - Build of Recipe 'site_compute' has completed successfully.
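
The placeholder site_compute_timestamp stands for the recipe name plus the current date and time. The following minimal sketch generates such a name in the YYYYMMDDHHMMSS form that appears in the image show example in this procedure. The helper name make_image_name is hypothetical.

```shell
# make_image_name RECIPE: print RECIPE followed by the current local time
# formatted as YYYYMMDDHHMMSS, for use as a unique image root name.
make_image_name() {
    printf '%s_%s\n' "$1" "$(date +%Y%m%d%H%M%S)"
}

# Example on the SMW (image create is the command from this step):
# IMAGE_NAME=$(make_image_name site_compute)
# image create -r site_compute "$IMAGE_NAME"
```

A timestamp with one-second resolution makes it very unlikely that a new name collides with an existing image root.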
    
  2. (Optional) Display the build history of the image root.
    smw# image show site_compute_20180521082046
    site_compute_20180521082046:
      name: site_compute_20180521082046
      arch: x86_64
      dist: sles15
      created: 2018-05-21T08:20:46
      history:
        2018-05-21T08:20:57: Successful build of Recipe 'seed_common_6.0.up00_sles_15' into Image 'site_compute_20180521082046'.
        2018-05-21T08:22:53: Successful build of top level recipe 'compute_cle_7.0.up00_sles_15_ari'.
        2018-05-21T08:22:53: Successful rebuild of RPM database.
      path: /var/opt/cray/imps/image_roots/site_compute_20180521082046
    
  3. Package the image root into a boot image.
    smw# image export site_compute_timestamp
    
    INFO - Copying kernel /var/opt/cray/imps/image_roots/site_compute_timestamp/boot/bzImage-3.12.28-4.6_1.0000.8685-cray_ari_c into /tmp/temp_tempfs_50LJ93/DEFAULT
    INFO - Copying parameters file /var/opt/cray/imps/image_roots/site_compute_timestamp/boot/parameters-ari_c into /tmp/temp_tempfs_50LJ93/DEFAULT
     .
     .
     .
    INFO - Image 'site_compute_timestamp' has been packaged into /var/opt/cray/imps/boot_images/site_compute_timestamp.cpio.
    
  4. If this is a netroot image, push the image root to the boot node.
    Important: Before pushing the image root, make sure that there is sufficient space on the boot node in /var/opt/cray/imps/image_roots.
    smw# image sqpush site_compute_timestamp --destination boot
    
    The image sqpush command puts a SquashFS compressed image root on the boot node. Cray recommends using this command instead of image push for better boot performance. For more information, see About Image Pushes: push versus sqpush.
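
The free-space check mentioned above can be scripted. The following minimal sketch uses the portable df -Pk output format to test whether the file system holding a directory has at least a given number of KiB available. The helper name has_room is hypothetical. It must run where the target directory is mounted (here, the boot node).

```shell
# has_room DIR NEEDED_KB: succeed if the file system holding DIR has at least
# NEEDED_KB KiB available. df -Pk prints one header line, then the data line;
# field 4 of the data line is the available space in 1024-byte blocks.
has_room() {
    local avail
    avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail" -ge "$2" ]
}

# Example: size of the image root on the SMW, in KiB
# du -sk /var/opt/cray/imps/image_roots/site_compute_timestamp
# Then, on the boot node, with NEEDED_KB set to that size:
# has_room /var/opt/cray/imps/image_roots "$NEEDED_KB" || echo "not enough space"
```

Because sqpush stores a compressed image root, the uncompressed size reported by du is a conservative estimate of the space needed.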

TEST IMAGE

  1. Test the new boot image on a single node.
    1. Assign the boot image to a node with the NIMS cnode command.
      The cnode and cmap commands replace the nimscli command, which was deprecated in CLE 6.0.UP04 and removed in CLE 6.0.UP05. Be sure to change any scripts that reference nimscli.
      This example assigns the boot image file site_compute_timestamp.cpio (in the directory /var/opt/cray/imps/boot_images/) to the compute node with the cname c0-0c0s15n3.
      • For a tmpfs image:
        smw# cnode update -i \
        /var/opt/cray/imps/boot_images/site_compute_timestamp.cpio c0-0c0s15n3
        
      • For a netroot image:
        smw# cnode update c0-0c0s15n3 \
        --set-parameter netroot=site_compute_timestamp
        
    2. Warm-boot the node to test the boot image.
      smw# xtcli shutdown c0-0c0s15n3
      .
      .
      .
      crayadm@smw> xtbootsys --reboot \
      -r "testing new boot image site_compute_timestamp" c0-0c0s15n3
      

ASSIGN IMAGE TO NODES

  1. Change the NIMS map to assign the new image to the applicable nodes.
    1. Back up the current map before changing to the new image. First, identify the active map.
      smw# cmap list | grep True
      
      The following steps use the active map name "current-map".
    2. Next, clone the current map.
      smw# cmap create --clone current-map new-map
      
      Trouble? The cmap command will first verify that the CLE config set associated with the NIMS map exists. If it does not exist, the command will fail with an error message to that effect.
      • If the config set is expected to be missing (for example, during an installation when the CLE config set has not yet been created), then repeat the cmap create command with the --no-verify flag.
      • If the config set is NOT expected to be missing, then create/locate the missing config set, set it as the default config set for the NIMS map, and repeat the cmap create command.
    3. Mark the new map as the active map.
      smw# cmap setactive new-map
      
    4. Assign the new boot image to all applicable nodes. This example uses "--filter group=compute" to assign the image to all compute nodes.
      • For a tmpfs image:
        smw# cnode update --filter group=compute \
        -i /var/opt/cray/imps/boot_images/site_compute_timestamp.cpio
        
      • For a netroot image:
        smw# cnode update --filter group=compute \
        --set-parameter netroot=site_compute_timestamp
        
      Trouble? If problems occur, use this command to revert to the previous map (current-map):
      smw# cmap setactive current-map
      
  2. Choose when the nodes should switch to the new image.
    • To immediately use the new image, warm-boot all applicable nodes with the new image. This example specifies the compute nodes as a comma-separated list of cnames; see the xtcli(8) man page for other ways of specifying multiple nodes.
      smw# xtcli shutdown cname, cname, ... cname
      .
      .
      .
      
      crayadm@smw> xtbootsys --reboot -r "Booting custom image on all compute nodes" \
      cname, cname, ... cname
      
    • To have the workload manager (WLM) reboot each node once the job currently running on it finishes, see Apply Rolling Patches to Compute Nodes with cnat.
    • Otherwise, wait until the next full system reboot. The nodes will boot with the new image.
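
For the immediate-reboot option, the comma-separated list of cnames can be assembled from a file rather than typed by hand. The following minimal sketch assumes a hypothetical file that lists one cname per line. The helper name join_cnames is also hypothetical.

```shell
# join_cnames FILE: print the cnames listed one per line in FILE as a single
# comma-separated string with no spaces.
join_cnames() {
    paste -sd, "$1"
}

# Example (the file path is a placeholder):
# NODES=$(join_cnames /path/to/compute_cnames.txt)
# xtcli shutdown "$NODES"
```

See the xtcli(8) man page to confirm the accepted list syntax on the installed release.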

After a recipe has been defined and tested, the imgbuilder command can be used to rebuild and package boot images.