How to configure one PBS complex to manage multiple architectures: add a non-Cray machine-oriented miniserver (MOM) node (MAMU or post-processing node), set up the PBS scheduler, and create and import prologue and epilogue hooks.
This procedure assumes the following:
- MAMU (multiple application, multiple user) service nodes are configured on the Cray XC system.
- PBS (12.1 or a later release) is installed and running on the SDB node of the Cray XC system.
This procedure describes how to configure one PBS complex to manage multiple architectures. It comprises the following tasks:
- Add a non-Cray machine-oriented miniserver (MOM) node to the PBS complex.
- Set up the PBS scheduler.
- Create and import prologue and epilogue hooks.
Note that Cray no longer supports the cgroup hook, which was formerly enabled as part of this procedure. Sites are expected to use the hook provided by Altair Engineering, Inc. to perform a similar function.
–––––––––– ADD A NON-CRAY MOM NODE TO THE PBS COMPLEX ––––––––––
The non-Cray MOM node could be a MAMU or post-processing node (PPN). A MAMU node is a repurposed compute node that functions as a service node.
For each non-Cray MOM node to add, repeat step 1 through step 10.
- Create a custom tag to differentiate the Linux nodes from the compute nodes.
This example uses the tag mamu for PPN nodes as well. Add this boolean custom resource to the PBS resourcedef file on the PBS server node. After adding the resource, restart the PBS server.
sdb# cat /var/spool/PBS/server_priv/resourcedef | grep mamu
mamu type=boolean flag=h
- Ensure that the host names are set up correctly on the Linux hosts.
Use host names that describe the function of these nodes. The host name must appear in /etc/hosts and match the output of the hostname command.
sdb# cat /etc/hosts | grep mamu
10.128.0.31 nid00030 c0-0c0s7n2 mamu1
mamu1# hostname
mamu1
- Create the MAMU node inside of PBS, then start PBS on the new node.
sdb# qmgr -c "create node mamu1"
mamu1# /etc/init.d/pbs start
- Configure the PBS MOM config file.
Set the $usecp variables. Do not set the alps_client variable.
mamu1# cat /var/spool/PBS/mom_priv/config
$usecp *:/home /home
$usecp *:/ufs /ufs
$usecp *:/cray /cray
- Verify that the node is configured in PBS and the node state is free.
sdb# qmgr -c "print node mamu1"
#
# Create nodes and set their properties.
#
#
# Create and define node mamu1
#
create node mamu1 Mom=nid00030
set node mamu1 state = free
set node mamu1 resources_available.arch = linux
set node mamu1 resources_available.host = nid00030
set node mamu1 resources_available.mem = 32993312kb
set node mamu1 resources_available.ncpus = 16
set node mamu1 resources_available.vnode = mamu1
set node mamu1 resv_enable = True
set node mamu1 sharing = default_shared
- Tag each node in the complex with mamu=false.
sdb# qmgr -c "set node @default resources_available.mamu = false"
- Tag the MAMU node with mamu=true.
sdb# qmgr -c "set node mamu1 resources_available.mamu = true"
- Verify that the new resource is present.
sdb# qmgr -c "print node mamu1"
#
# Create nodes and set their properties.
#
#
# Create and define node mamu1
#
create node mamu1 Mom=nid00030
set node mamu1 state = free
set node mamu1 resources_available.arch = linux
set node mamu1 resources_available.host = nid00030
set node mamu1 resources_available.mem = 32993312kb
set node mamu1 resources_available.ncpus = 16
set node mamu1 resources_available.mamu = True
set node mamu1 resources_available.vnode = mamu1
set node mamu1 resv_enable = True
set node mamu1 sharing = default_shared
- Ensure that all other non-MAMU nodes have mamu=false.
- Run jobs targeting the mamu nodes using one of the following options.
- Option 1: If the MAMU nodes have a default memory setting, use the following command.
crayadm@login> qsub -I -lselect=1:mamu=true
qsub: waiting for job job_name to start
qsub: job job_name ready
crayadm@mamu1>
- Option 2: If there is NO default memory setting for the MAMU nodes, specify the maximum amount of memory that the job is expected to use.
crayadm@login> qsub -I -lselect=1:mamu=true:mem=200mb
qsub: waiting for job job_name to start
qsub: job job_name ready
crayadm@mamu1>
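The same requests can be made from a batch script instead of an interactive session. The following is a minimal sketch of the Option 2 request; the job name, script name, and application (my_app) are illustrative, not part of this procedure.

```shell
#!/bin/bash
#PBS -N mamu_job
#PBS -l select=1:mamu=true:mem=200mb
# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
./my_app
```

Submit the script with qsub; the select statement is the same one used in the interactive examples above.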
–––––––––– SET UP THE PBS SCHEDULER ––––––––––
- Edit the PBS configuration file.
sdb# vi /var/spool/PBS/sched_priv/sched_config
- To set a scheduling policy, see the PBS Professional Administrator's Guide.
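If the scheduler is to consider the custom mamu resource when placing jobs, it may also need to appear on the resources: line of sched_config. The following is a sketch only; the exact resource list varies by site, so confirm against the PBS Professional Administrator's Guide.

```
resources: "ncpus, mem, arch, host, vnode, mamu"
```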
–––––––––– CREATE AND IMPORT PROLOGUE AND EPILOGUE HOOKS ––––––––––
Enabling a hook to use the execjob_prologue event will disable any prologue bash scripts in PBS. Likewise, enabling a hook to use the execjob_epilogue event will disable any epilogue bash scripts in PBS.
The creation and import of prologue and epilogue hooks is necessary for cluster compatibility mode (CCM) functionality on the Cray XC system.
- Install the following prologue and epilogue hooks on any system that needs both hook and prologue/epilogue script functionality.
These hooks are wrappers for the prologue/epilogue scripts.
sdb# qmgr -c 'create hook cray_prologue'
sdb# qmgr -c 'set hook cray_prologue type = site'
sdb# qmgr -c 'set hook cray_prologue enabled = true'
sdb# qmgr -c 'set hook cray_prologue event = execjob_prologue'
sdb# qmgr -c 'set hook cray_prologue user = pbsadmin'
sdb# qmgr -c 'set hook cray_prologue alarm = 30'
sdb# qmgr -c 'set hook cray_prologue order = 1'
sdb# qmgr -c 'create hook cray_epilogue'
sdb# qmgr -c 'set hook cray_epilogue type = site'
sdb# qmgr -c 'set hook cray_epilogue enabled = true'
sdb# qmgr -c 'set hook cray_epilogue event = execjob_epilogue'
sdb# qmgr -c 'set hook cray_epilogue user = pbsadmin'
sdb# qmgr -c 'set hook cray_epilogue alarm = 30'
sdb# qmgr -c 'set hook cray_epilogue order = 1'
- Create the following temporary file.
sdb# vi /var/spool/PBS/mom_priv/prologue.py
- Copy and paste the following text into prologue.py.
"""
Copyright 2015 Cray Inc. All rights reserved.
This script sets up the environment for the main hook code then invokes the
main routine from the library on disk where this hook was installed. One
benefit of this organization is to improve the testability of the main routine
by allowing it to work, without modification, under either PBS or a test
environment.
Description:
This hook will execute a prologue script located at /mom_priv/prologue
if the file exists, and there are arguments to pass.
"""
import pbs
import sys
import os

# For now the PBS_HOME value has to be edited manually
# There is no way to access PBS_HOME with the pbs module
PBS_HOME = '/var/spool/PBS-12.1.400/'
PROLOGUE_DIR = PBS_HOME + '/mom_priv'
PROLOGUE = PROLOGUE_DIR + '/prologue'

try:
    # Get the hook event information and parameters
    e = pbs.event()
    # Check to see if a prologue script exists
    if not os.path.exists(PROLOGUE):
        e.accept()
    # Ignore requests from scheduler or server
    if e.requestor in ["PBS_Server", "Scheduler"]:
        e.accept()
    # Get the information for the job being queued
    j = e.job
    if j and j.id and j.euser and j.egroup:
        # Assemble and execute the prologue command
        cmd = PROLOGUE
        cmd = cmd + ' ' + j.id
        cmd = cmd + ' ' + j.euser
        cmd = cmd + ' ' + j.egroup
        os.popen(cmd, 'w')
    # Accept the event
    e.accept()
except SystemExit:
    pass
except:
    print sys.exc_info()[0]
- Import the prologue.
sdb# qmgr -c 'import hook cray_prologue \
application/x-python default /var/spool/PBS/mom_priv/prologue.py'
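As the hook's docstring notes, the wrapper is organized so its main logic can be tested outside PBS. A minimal sketch of that idea follows: the command-assembly step is factored into a plain function and exercised with a stand-in job object. FakeJob and build_hook_cmd are illustrative names for this sketch, not part of the PBS API.

```python
class FakeJob(object):
    """Stand-in for the pbs.event().job object, with only the
    fields the hook reads (id, euser, egroup)."""
    def __init__(self, jid, euser, egroup):
        self.id = jid
        self.euser = euser
        self.egroup = egroup

def build_hook_cmd(script, job):
    """Assemble '<script> <job id> <euser> <egroup>' the same way
    the hook does before handing the string to os.popen."""
    return ' '.join([script, job.id, job.euser, job.egroup])

job = FakeJob('1234.sdb', 'crayadm', 'crayadm')
print(build_hook_cmd('/var/spool/PBS/mom_priv/prologue', job))
# -> /var/spool/PBS/mom_priv/prologue 1234.sdb crayadm crayadm
```

Because the assembly logic takes plain arguments, it runs unchanged under an ordinary Python interpreter, without the pbs module being importable.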
- Create the following temporary file.
sdb# vi /var/spool/PBS/mom_priv/epilogue.py
- Copy and paste the following text into epilogue.py.
"""
Copyright 2015 Cray Inc. All rights reserved.
This script sets up the environment for the main hook code then invokes the
main routine from the library on disk where this hook was installed. One
benefit of this organization is to improve the testability of the main routine
by allowing it to work, without modification, under either PBS or a test
environment.
Description:
This hook will execute an epilogue script located at /mom_priv/epilogue
if the file exists, and there are 3 arguments to pass. Passing more than 3
arguments is unsupported.
"""
import pbs
import sys
import os

# The PBS_HOME value has to be edited manually
# There is no way to access PBS_HOME with the pbs module
PBS_HOME = '/var/spool/PBS-12.1.400/'
EPILOGUE_DIR = PBS_HOME + '/mom_priv'
EPILOGUE = EPILOGUE_DIR + '/epilogue'

try:
    # Get the hook event information and parameters
    e = pbs.event()
    # Check to see if an epilogue script exists
    if not os.path.exists(EPILOGUE):
        e.accept()
    # Ignore requests from scheduler or server
    if e.requestor in ["PBS_Server", "Scheduler"]:
        e.accept()
    # Get the information for the job being queued
    j = e.job
    if j and j.id and j.euser and j.egroup:
        # Assemble and execute the epilogue command
        cmd = EPILOGUE
        cmd = cmd + ' ' + j.id
        cmd = cmd + ' ' + j.euser
        cmd = cmd + ' ' + j.egroup
        os.popen(cmd, 'w')
    # Accept the event
    e.accept()
except SystemExit:
    pass
except:
    print sys.exc_info()[0]
- Import the epilogue.
sdb# qmgr -c 'import hook cray_epilogue \
application/x-python default /var/spool/PBS/mom_priv/epilogue.py'