Start Lustre File System on NXD-Enabled Systems

Considerations when starting Lustre on NXD-enabled systems

NXD is installed only on the OSS nodes of Lustre storage systems. As on non-NXD systems, Lustre is started by running the cscli mount command from the primary management node. This command brings up the underlying Object Storage Targets, or OSTs (RAID devices), on each node and then mounts the file system. The underlying OST devices differ between non-NXD and NXD-enabled systems:
  • non-NXD systems use standard GridRAID devices (/dev/mdX)
  • NXD-enabled systems use device-mapped GridRAID devices (/dev/dm-X)
The cscli show_nodes command displays the status of the OSS nodes and their OSTs (GridRAID or device-mapped GridRAID), including the following:
  • Power state (On/Off)
  • Service state (Started/Stopped)
  • Number of targets
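
The device naming difference described above can be expressed as a small shell helper. This is only a sketch: the function name ost_device_kind is hypothetical, and it classifies a path purely by its name. A /dev/dm-X path can in general belong to any device-mapper device, so the result is a hint based on naming, not proof that NXD is in use.

```shell
#!/bin/sh
# Hypothetical helper (not part of cscli): classify an OST block-device
# path using only the naming patterns given in this section.
ost_device_kind() {
    case "$1" in
        /dev/md[0-9]*)  echo "GridRAID (non-NXD)" ;;            # standard /dev/mdX device
        /dev/dm-[0-9]*) echo "device-mapped GridRAID (NXD)" ;;  # device-mapped /dev/dm-X device
        *)              echo "unknown" ;;                       # anything else is not classified
    esac
}
```

For example, ost_device_kind /dev/dm-3 reports the device-mapped case, and any path that matches neither pattern is reported as unknown.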

After the file system is mounted, NXD must be enabled again to resume caching small block I/O operations.

To start Lustre and enable NXD, follow these steps:
  1. Start the Lustre file system.
    MGMT0$ cscli mount
  2. Use the cscli show_nodes command to monitor the progress of bringing up services on the Lustre nodes. After the cscli mount command is issued, it may take a few seconds for the show_nodes output to reflect the updated status.
  3. Continue running the cscli show_nodes command until the output shows that all services have started, as shown in the following sample output:
    admin@cls12345n000$ cscli show_nodes
    ---------------------------------------------------------------------------
    Hostname     Role     Power State Service State Targets HA Partner   HA Res
    ---------------------------------------------------------------------------
    cls12345n000 MGMT     On          N/a           0 / 0   cls12345n001 None
    cls12345n001 (MGMT)   On          N/a           0 / 0   cls12345n000 None
    cls12345n002 MDS,MGS  On          Started       1 / 1   cls12345n003 Local
    cls12345n003 MDS,(MGS)On          Started       1 / 1   cls12345n002 Local
    cls12345n004 OSS      On          Started       1 / 1   cls12345n005 Local
    cls12345n005 OSS      On          Started       1 / 1   cls12345n004 Local
    ---------------------------------------------------------------------------

    Because of drive or hardware issues, one or more nodes may occasionally show a Service State of Stopped, typically because the underlying RAID arrays on those nodes failed to come up. In that case, inspect the affected nodes manually to triage the issue.

  4. Once the file system is mounted and all services have started, enable caching of small block I/O operations.
    admin@cls12345n000$ cscli nxd enable
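
The steps above can be sketched as a shell script that repeats the show_nodes check instead of rerunning it by hand. This is a minimal sketch under stated assumptions, not a supported tool. The function names (all_services_started, wait_for_lustre, start_lustre_with_nxd) and the retry defaults are hypothetical. The check assumes the table layout shown in the sample output and matches the words Started and Stopped anywhere in a row, because adjacent columns can run together. It also assumes that cscli mount returns a non-zero exit status on failure, which this document does not confirm.

```shell
#!/bin/sh
# Read a show_nodes table on stdin. Succeed (exit status 0) only if at
# least one row reports Started and no row reports Stopped.
all_services_started() {
    awk '
        /Stopped/ { stopped++ }
        /Started/ { started++ }
        END { exit !(started > 0 && stopped == 0) }
    '
}

# Poll cscli show_nodes until all services have started.
# $1 = maximum number of checks (assumed default 30)
# $2 = seconds to wait between checks (assumed default 10)
wait_for_lustre() {
    tries=${1:-30}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if cscli show_nodes | all_services_started; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1    # services did not all start; inspect the nodes manually
}

# Steps 1 to 4: mount, wait for services, then enable NXD caching.
start_lustre_with_nxd() {
    cscli mount || return 1
    wait_for_lustre "$@" || return 1
    cscli nxd enable
}
```

Run from the primary management node, start_lustre_with_nxd 30 10 would check up to 30 times at 10-second intervals and enable NXD only after every service reports Started. If a node stays in the Stopped state, the function returns a non-zero status without enabling NXD, so the affected nodes can be triaged first.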