About Node Groups

Provides an overview of node groups and lists some characteristics.

The Cray Node Groups service (cray_node_groups) enables administrators to define and manage logical groupings of system nodes. Nodes can be grouped arbitrarily, though typically they are grouped by software functionality or hardware characteristics, such as login nodes, compute nodes, service nodes, DVS servers, and RSIP servers.

Node groups that have been defined in a config set can be referenced by name within all CLE services in that config set, thereby eliminating the need to specify groups of nodes (often the same ones) for each service individually and greatly streamlining service configuration. Node groups are used in many Cray-provided Ansible configuration playbooks and roles and can also be used in site-local Ansible plays. Node groups are similar to but more powerful than the class specialization feature of releases prior to CLE 6.0. For example, a node can be a member of more than one node group but could belong to only one class.

The figure below demonstrates how several nodes may belong to more than one node group. In this example, node group A contains nodes 1-5, node group B contains nodes 4-5, and node group C contains nodes 4-9. Nodes 4 and 5 belong to node groups A, B, and C. In this example, if nodes 1-5 are the desired target for an Ansible play, the play can target node group A instead of specifying each node individually.
Figure: Node Group Member Overlap
Sites are encouraged to define their own node groups and specify their members. Administrators can define and manage node groups using any of these methods:
  • Edit and upload the node groups configuration worksheet (cray_node_groups_worksheet.yaml).
  • Use the cfgset command to view and modify node groups interactively with the configurator.
  • Use the cfgset get and cfgset modify CLI commands to view and modify node groups at the command line. Note that CLI modifications must be followed by a config set update.
After using any of these methods, remember to validate the config set.
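For example, node groups A and B from the figure above might be declared with member lists like the ones in this sketch. The group names and cnames are placeholders, and the data layout shown here is illustrative only; the actual layout depends on whether the worksheet, the configurator, or the CLI is used to enter the data.

# Illustrative only: placeholder group names and cnames.
node_group_a:
  members:
    - c0-0c0s0n1   # node 1
    - c0-0c0s0n2   # node 2
    - c0-0c0s0n3   # node 3
    - c0-0c0s1n0   # node 4 (also a member of group B)
    - c0-0c0s1n1   # node 5 (also a member of group B)
node_group_b:
  members:
    - c0-0c0s1n0   # node 4
    - c0-0c0s1n1   # node 5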

Characteristics of Node Groups

  • Node group membership is not exclusive; that is, a node may be a member of more than one node group.
  • Node group membership is specified as a list of nodes:
    • use cname for a CLE node
    • use host ID (the output of the hostid command) for the SMW
    • use host name for an eLogin node
  • All compute nodes and/or all service nodes can be added as node group members by including the keywords “platform:compute” and/or “platform:service” in a node group.
  • Any CLE configuration service is able to reference any defined node group by name.
  • The Configuration Management Framework (CMF) exposes node group membership of the current node through the local system "facts" provided by the Ansible runtime environment. This means that each node knows what node groups it belongs to, and that knowledge can be used in Cray and site-local Ansible playbooks.
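For example, a site-local task can be restricted to nodes that belong to a particular node group by testing those facts. The following is a minimal sketch only: the play targets localhost, consistent with the locally evaluated facts described above, and both the fact path (ansible_local.cray_system.node_groups) and the group name site_dvs_nodes are assumptions that should be checked against the facts actually present on a running node.

# Illustrative play; the fact path and group name are assumptions.
- hosts: localhost
  tasks:
    - name: Run only on members of the site_dvs_nodes node group
      debug:
        msg: "This node is a member of site_dvs_nodes"
      when: "'site_dvs_nodes' in ansible_local.cray_system.node_groups"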

Pre-populated Node Groups

Pre-populated node groups are groups of nodes that
  • are likely to be customized and used by many sites
  • support useful default values for many of the configuration services

Several of the pre-populated node groups require customization by a site to provide the appropriate node membership information. This table lists the pre-populated groups and indicates which ones require site customization.

Note that beginning with CLE 6.0.UP06, Cray no longer supports a single node group for all login nodes. Instead, there are two architecture-specific login node groups: one for all login nodes with the x86-64 architecture and one for all login nodes with the AArch64 architecture. To specify all login nodes in the system, use both of those node groups.

Table 1. cray_node_groups
Pre-populated Node Group | Requires Customization? | Notes
compute_nodes | No | Contains all compute nodes in the given partition. The list of nodes is determined at runtime.
compute_nodes_x86_64 | No | Contains all x86-64 compute nodes in the given partition. The list of nodes is determined at runtime.
compute_nodes_aarch64 | No | Contains all AArch64 compute nodes in the given partition. The list of nodes is determined at runtime.
service_nodes | No | Contains all service nodes in the given partition. The list of nodes is determined at runtime.
service_nodes_x86_64 | No | Contains all x86-64 service nodes in the given partition. The list of nodes is determined at runtime.
service_nodes_aarch64 | No | Contains all AArch64 service nodes in the given partition. The list of nodes is determined at runtime.
smw_nodes | Yes | Add the host ID (output of the hostid command) of the SMW. For an SMW HA system, add the host ID of the second SMW also.
boot_nodes | Yes | Add the cname of the boot node. If there is a failover boot node, add its cname also.
sdb_nodes | Yes | Add the cname of the SDB node. If there is a failover SDB node, add its cname also.
login_nodes_x86_64 | Yes | Add the cnames of all x86-64 internal login nodes on the system.
login_nodes_aarch64 | Yes | Add the cnames of all AArch64 internal login nodes on the system. Leave empty (set to []) if there are none.
elogin_nodes | Yes | Add the host names of external login nodes on the system. Leave empty (set to []) if there are no eLogin nodes.
all_nodes | Maybe | Contains all compute nodes and service nodes on the system. Add external nodes (e.g., eLogin nodes), if needed.
all_nodes_x86_64 | No | Contains all x86-64 nodes in the given partition. The list of nodes is determined at runtime.
all_nodes_aarch64 | No | Contains all AArch64 nodes in the given partition. The list of nodes is determined at runtime.
tier2_nodes | Yes | Add the cnames of nodes that will be used as tier2 servers in the cray_scalable_services configuration.
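To illustrate the identifier formats used by the groups that require customization, the sketch below shows placeholder entries: a host ID for smw_nodes, cnames for boot_nodes and sdb_nodes, an empty list for a group with no members, and a host name for elogin_nodes. All values are placeholders, and the surrounding layout is illustrative only.

# Placeholder values; substitute site-specific identifiers.
smw_nodes:
  members:
    - 007f0101          # host ID of the SMW (output of hostid)
boot_nodes:
  members:
    - c0-0c0s0n1        # cname of the boot node
sdb_nodes:
  members:
    - c0-0c0s0n2        # cname of the SDB node
login_nodes_aarch64:
  members: []           # set to [] when there are no such nodes
elogin_nodes:
  members:
    - elogin-01         # host name of an eLogin node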

Why is there no "tier1_nodes" pre-populated node group? Cray provides a pre-populated tier2_nodes node group to support defaults in the cray_simple_shares service. Cray does not provide a tier1_nodes node group because no default data in any service requires it. Because it is likely that tier1 nodes will consist of only the boot node and the SDB node, for which node groups already exist, Cray recommends using those groups to populate the cray_scalable_services tier1_groups setting rather than defining a tier1_nodes group.

About eLogin nodes. To add eLogin nodes to a node group, use their host names instead of cnames, because unlike CLE nodes, eLogin nodes do not have cname identifiers. If eLogin nodes are intended to receive configuration settings associated with the all_nodes group, add them to that group, or change the relevant settings in other configuration services to include both all_nodes and elogin_nodes.

Additional Platform Keywords

Cray uses these two platform keywords to create pre-populated node groups that contain all compute or all service nodes.
  • platform:compute
  • platform:service
Cray uses these keywords to create pre-populated node groups that contain all compute or service nodes with the x86-64 or AArch64 architecture.
  • platform:compute-X86
  • platform:service-X86
  • platform:compute-ARM
  • platform:service-ARM

Disabled nodes. All platform keywords, such as platform:compute, platform:service-ARM, and platform:compute-HW12, include nodes that have been disabled. To identify disabled nodes, use this keyword: platform:disabled

Excluded nodes. Groups of nodes can be excluded using a negation operator: ~ (the tilde symbol). For example, a custom node group that contains all enabled compute and service nodes would have the following list as its members. The ordering of the list does not matter: all non-negated keywords are resolved first, then negated ones are removed.
- platform:compute
- platform:service
- ~platform:disabled
Sites that need finer-grained groupings can use additional platform keywords to create custom node groups. For a node group that contains all compute or service nodes with a particular processor/core type, use one of the following platform keywords.
  • platform:compute-XX##
  • platform:service-XX##
For XX##, substitute a processor/core code, such as KL64 or KL68, which designate two Intel® Xeon Phi™ "Knights Landing" (KNL) processors with different core counts. To find the code associated with each node on a Cray system, use the xtcli status p0 command and look in the "Core" column of the output, as shown in the following example.
smw# xtcli status p0
Network topology: class 0
Network type: Aries
Nodeid: Service Core Arch| Comp state [Flags]
-----------------------------------------------------
c0-0c0s0n0: service BW18 X86| ready [noflags|]
c0-0c0s0n1: service BW18 X86| ready [noflags|]
c0-0c0s0n2: service BW18 X86| ready [noflags|]
c0-0c0s0n3: service BW18 X86| ready [noflags|]
c0-0c0s1n0: service BW18 X86| ready [noflags|]
c0-0c0s1n1: service BW18 X86| ready [noflags|]
c0-0c0s1n2: service BW18 X86| ready [noflags|]
c0-0c0s1n3: service BW18 X86| ready [noflags|]
c0-0c0s2n0: - HW12 X86| ready [noflags|]
c0-0c0s2n1: - HW12 X86| ready [noflags|]
c0-0c0s2n2: - HW12 X86| ready [noflags|]
c0-0c0s2n3: - HW12 X86| ready [noflags|]
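For example, a site-defined node group limited to enabled KNL compute nodes could combine processor/core keywords with the negation operator described above. The group name below is a placeholder, and the layout is illustrative only.

knl_compute_nodes:            # placeholder group name
  members:
    - platform:compute-KL64
    - platform:compute-KL68
    - ~platform:disabled      # exclude disabled nodes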
The following table lists some of the common processor/core codes supported by Cray.
Table 2. Cray Supported Intel Processor/Core (XX##) Codes
Processor (XX) | Core (##) | Intel Code Name
BW | 12, 14, 16, 18, 20, 22, 24, 28, 32, 36, 40, 44 | "Broadwell"
HW | 04, 06, 08, 10, 12, 14, 16, 18, 20, 24, 28, 32, 36 | "Haswell"
IV | 02, 04, 06, 08, 10, 12, 16, 20, 24 | "Ivy Bridge"
KL | 60, 64, 66, 68, 72 | "Knights Landing"
SB | 04, 06, 08, 12, 16 | "Sandy Bridge"
SK | 04, 08, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56 | "Skylake"
CL | 18, 20, 24 | "Cascade Lake"