Enable and Configure RUR

RUR does not function until it has been enabled and configured.

This procedure assumes that the user has generated configuration worksheets and is editing the RUR configuration worksheet (cray_rur_worksheet.yaml). If new worksheets need to be generated, use this procedure:
  1. Generate up-to-date worksheets for config set p0 (merges any new service packages installed on the system with data already in config set p0).
    smw# cfgset update --mode prepare --no-scripts p0
  2. Locate the newly generated worksheets and copy them to a new location.
    smw# cfgset show --fields path p0
    p0:
      path: /var/opt/cray/imps/config/sets/p0
    smw# cp /var/opt/cray/imps/config/sets/p0/worksheets/* /some/edit/location
  3. Edit the RUR worksheet.
    smw# vi /some/edit/location/cray_rur_worksheet.yaml
This procedure identifies both necessary and optional settings for RUR to function properly. The following steps correspond to the configuration settings available in the RUR worksheet, and step numbering reflects the order in which those settings appear.
Tip: The default values assigned for settings are sufficient for an initial install.
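
Most of the steps that follow come down to uncommenting a setting line and, where needed, changing its value. For sites that script worksheet edits instead of using an editor, a minimal Python sketch of that operation might look like the following. It assumes that a commented-out setting appears in the worksheet as #key: value, which may not match every worksheet; set_worksheet_value is a hypothetical helper name, not part of any Cray tool.

```python
import re


def set_worksheet_value(text, key, value=None):
    """Uncomment `key` in worksheet text and optionally replace its value.

    Assumption: a commented-out setting looks like '#<key>: <value>'.
    If `value` is None, the existing (default) value is kept.
    """
    pattern = re.compile(
        r"^(?P<indent>\s*)#?\s*" + re.escape(key) + r":[ \t]*(?P<val>.*)$"
    )
    out, found = [], False
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            found = True
            new_val = m.group("val") if value is None else value
            # Rewrite the line without the leading '#'.
            line = f"{m.group('indent')}{key}: {new_val}"
        out.append(line)
    if not found:
        raise KeyError(f"setting not found in worksheet: {key}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    worksheet = (
        "# Enable 'cray_rur' Service? (boolean, level=basic)\n"
        "#cray_rur.enabled: false\n"
    )
    print(set_worksheet_value(worksheet, "cray_rur.enabled", "true"))
```

Guidance comments are left untouched because they do not contain the full setting key followed by a colon.
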
  1. Edit cray_rur_worksheet.yaml.
  2. Uncomment cray_rur.enabled and set it to true.
    # Enable 'cray_rur' Service? (boolean, level=basic)
    cray_rur.enabled: true
    #
    #********************* END Service Enable/Disable ********************
  3. Uncomment the lines corresponding to the base settings. Review the guidance information and default value for each setting to determine whether or not to modify it.
    #
    cray_rur.settings.base.data.debug_level: ERROR
    #
    #
    cray_rur.settings.base.data.keep_temp_files: false
    #
    #
    cray_rur.settings.base.data.use_json: false
    #
  4. Uncomment the lines corresponding to the rur_stage settings. Review the guidance information and default value for each setting to determine whether or not to modify it.
    #
    cray_rur.settings.rur_stage.data.stage_timeout: 90
    #
    #
    cray_rur.settings.rur_stage.data.stage_dir: /var/spool/RUR
    #
  5. Uncomment the lines corresponding to the rur_gather settings. Review the guidance information and default value for each setting to determine whether or not to modify it.
    #
    cray_rur.settings.rur_gather.data.gather_timeout: 90
    #
    #
    cray_rur.settings.rur_gather.data.gather_dir: /tmp/rur
    #
  6. Uncomment the lines corresponding to the rur_post settings. Review the guidance information and default value for each setting to determine whether or not to modify it.
    #
    cray_rur.settings.rur_post.data.post_timeout: 90
    #
    #
    cray_rur.settings.rur_post.data.post_dir: /tmp/rur
    #
  7. (Optional) Enable the gpustat data plugin.
    The gpustat plugin collects utilization statistics for NVIDIA GPUs, if present (see The gpustat Data Plugin).
    1. Uncomment cray_rur.settings.gpustat.data.enable and set it to true.
      #
      cray_rur.settings.gpustat.data.enable: true
      #
    2. Uncomment the remaining gpustat settings.
      #
      cray_rur.settings.gpustat.data.stage: /opt/cray/rur/default/bin/gpustat_stage.py
      #
      #
      cray_rur.settings.gpustat.data.post: /opt/cray/rur/default/bin/gpustat_post.py
      #
  8. (Optional) Enable the taskstats data plugin.
    The taskstats plugin collects process accounting data (see The taskstats Data Plugin).
    1. Uncomment cray_rur.settings.taskstats.data.enable and set it to true.
      #
      cray_rur.settings.taskstats.data.enable: true
      #
    2. Uncomment the remaining taskstats settings.
      #
      cray_rur.settings.taskstats.data.stage: /opt/cray/rur/default/bin/taskstats_stage.py
      #
      #
      cray_rur.settings.taskstats.data.post: /opt/cray/rur/default/bin/taskstats_post.py
      #
      #
      cray_rur.settings.taskstats.data.arg: json-dict
      #
    3. Review the guidance information for cray_rur.settings.taskstats.data.arg and modify its value if desired.
      Tip: The amount of data reported by the taskstats plugin and the format in which it is written are determined by the value of arg. Examples are included in The taskstats Data Plugin.
  9. (Optional) Enable the energy data plugin.
    The energy plugin collects compute node energy usage data (see The energy Data Plugin).
    1. Uncomment cray_rur.settings.energy.data.enable and set it to true.
      #
      cray_rur.settings.energy.data.enable: true
      #
    2. Uncomment the remaining energy settings.
      #
      cray_rur.settings.energy.data.stage: /opt/cray/rur/default/bin/energy_stage.py
      #
      #
      cray_rur.settings.energy.data.post: /opt/cray/rur/default/bin/energy_post.py
      #
      #
      cray_rur.settings.energy.data.arg: json-dict
      #
    3. Review the guidance information for cray_rur.settings.energy.data.arg and modify its value if desired.
      Tip: The amount of data reported by the energy plugin and the format in which it is written are determined by the value of arg. Examples are included in The energy Data Plugin.
  10. (Optional) Enable the timestamp data plugin.
    The timestamp plugin collects the start and end times of an application or job (see The timestamp Data Plugin).
    1. Uncomment cray_rur.settings.timestamp.data.enable and set it to true.
      #
      cray_rur.settings.timestamp.data.enable: true
      #
    2. Uncomment the remaining timestamp settings.
      #
      cray_rur.settings.timestamp.data.stage: /opt/cray/rur/default/bin/timestamp_stage.py
      #
      #
      cray_rur.settings.timestamp.data.post: /opt/cray/rur/default/bin/timestamp_post.py
      #
  11. (Optional) Enable the memory data plugin.
    The memory plugin collects information from /proc and /sys that is useful when assessing the memory performance of an application or job (see The memory Data Plugin).
    1. Uncomment cray_rur.settings.memory.data.enable and set it to true.
      #
      cray_rur.settings.memory.data.enable: true
      #
    2. Uncomment the remaining memory settings.
      #
      cray_rur.settings.memory.data.stage: /opt/cray/rur/default/bin/memory_stage.py
      #
      #
      cray_rur.settings.memory.data.post: /opt/cray/rur/default/bin/memory_post.py
      #
      #
      cray_rur.settings.memory.data.arg: json-dict
      #
    3. Review the guidance information for cray_rur.settings.memory.data.arg and modify its value if desired.
      Tip: The amount of data reported by the memory plugin is determined by the value of arg. Examples are included in The memory Data Plugin.
  12. (Optional) Enable the nodeuse data plugin.
    The nodeuse plugin collects compute node usage data within the scope of an application (see The nodeuse Data Plugin).
    1. Uncomment cray_rur.settings.nodeuse.data.enable and set it to true.
      #
      cray_rur.settings.nodeuse.data.enable: true
      #
    2. Uncomment the remaining nodeuse settings.
      #
      cray_rur.settings.nodeuse.data.stage: /opt/cray/rur/default/bin/nodeuse_stage.py
      #
      #
      cray_rur.settings.nodeuse.data.post: /opt/cray/rur/default/bin/nodeuse_post.py
      #
  13. (Optional) Enable the dws data plugin.
    The dws plugin collects DataWarp utilization statistics (within the scope of an application) from compute nodes, if present (see The dws Data Plugin).
    1. Uncomment cray_rur.settings.dws.data.enable and set it to true.
      #
      cray_rur.settings.dws.data.enable: true
      #
    2. Uncomment the remaining dws settings.
      #
      cray_rur.settings.dws.data.stage: /opt/cray/rur/default/bin/dws_stage.py
      #
      #
      cray_rur.settings.dws.data.post: /opt/cray/rur/default/bin/dws_post.py
      #
  14. (Optional) Enable the dws_server data plugin.
    The dws_server plugin collects utilization statistics (within the scope of an application) from DataWarp servers, if present (see The dws_server Data Plugin).
    1. Uncomment cray_rur.settings.dws_server.data.enable and set it to true.
      #
      cray_rur.settings.dws_server.data.enable: true
      #
    2. Uncomment the remaining dws_server settings.
      #
      cray_rur.settings.dws_server.data.stage: /opt/cray/rur/default/bin/dws_server_stage.py
      #
      #
      cray_rur.settings.dws_server.data.post: /opt/cray/rur/default/bin/dws_server_post.py
      #
  15. (Optional) Enable the dws_job_server data plugin.
    The dws_job_server plugin collects utilization statistics (within the scope of a job) from DataWarp servers, if present (see The dws_job_server Data Plugin).
    1. Uncomment cray_rur.settings.dws_job_server.data.enable and set it to true.
      #
      cray_rur.settings.dws_job_server.data.enable: true
      #
    2. Uncomment the remaining dws_job_server settings.
      Note that the post script is the same as dws_server.
      #
      cray_rur.settings.dws_job_server.data.stage: /opt/cray/rur/default/bin/dws_job_server_stage.py
      #
      #
      cray_rur.settings.dws_job_server.data.post: /opt/cray/rur/default/bin/dws_server_post.py
      #
  16. (Optional) Enable the llm output plugin.
    The llm plugin aggregates log messages from various Cray nodes and places them on the SMW (see The llm Output Plugin).
    1. Uncomment cray_rur.settings.llm.data.enable and set it to true.
      #
      cray_rur.settings.llm.data.enable: true
      #
    2. Uncomment the other llm setting.
      #
      cray_rur.settings.llm.data.output: /opt/cray/rur/default/bin/llm_output.py
      #
  17. (Optional) Enable the user output plugin.
    The user plugin writes RUR output for a user's application to the user's home directory (default) or a user-defined location, but only if the user has indicated that this behavior is desired (see The user Output Plugin).
    1. Uncomment cray_rur.settings.user.data.enable and set it to true.
      #
      cray_rur.settings.user.data.enable: true
      #
    2. Uncomment the remaining user settings.
      #
      cray_rur.settings.user.data.output: /opt/cray/rur/default/bin/user_output.py
      #
      #
      cray_rur.settings.user.data.arg: single, opt_in
      #
    3. Review the guidance information for cray_rur.settings.user.data.arg and modify its value if desired.
      Tip: The number of output files created by the user plugin and its opt-in flag are determined by the value of arg. Further details are included in The user Output Plugin.
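
Because each plugin is enabled in one sub-step and its script paths are uncommented in another, it is easy to enable a plugin and leave one of its other settings commented out. The following Python sketch is one way to check an edited worksheet for that mistake. It assumes that active settings are uncommented key: value lines and that enable is written as the literal true. The rule that data plugins need stage and post while output plugins (llm, user) need output is inferred from the steps above. check_worksheet is a hypothetical helper, not a Cray utility.

```python
import re

# Matches active (uncommented) lines such as:
#   cray_rur.settings.energy.data.stage: /opt/cray/rur/default/bin/energy_stage.py
SETTING = re.compile(r"^\s*cray_rur\.settings\.(\w+)\.data\.(\w+):\s*(.*?)\s*$")

# Assumption drawn from the procedure: these two are output plugins.
OUTPUT_PLUGINS = {"llm", "user"}
# Sections that are general RUR settings, not plugins.
NON_PLUGIN_SECTIONS = {"base", "rur_stage", "rur_gather", "rur_post"}


def check_worksheet(text):
    """Return {plugin: [missing setting names]} for every enabled plugin."""
    settings = {}
    for line in text.splitlines():
        m = SETTING.match(line)  # lines starting with '#' never match
        if m:
            section, name, value = m.groups()
            settings.setdefault(section, {})[name] = value

    problems = {}
    for section, values in settings.items():
        if section in NON_PLUGIN_SECTIONS or values.get("enable") != "true":
            continue
        required = ["output"] if section in OUTPUT_PLUGINS else ["stage", "post"]
        missing = [name for name in required if name not in values]
        if missing:
            problems[section] = missing
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as worksheet:
        for plugin, missing in sorted(check_worksheet(worksheet.read()).items()):
            print(f"{plugin}: enabled but missing {', '.join(missing)}")
```

Run as a script, it takes the path to the edited worksheet (for example, /some/edit/location/cray_rur_worksheet.yaml) and prints one line per enabled plugin that is missing a required setting. It prints nothing if none are found.
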

Next, configure the cray_alps service to call the RUR prologue and epilogue scripts. Sites running Slurm must instead modify the Slurm configuration file to call these scripts.