Add and Remove the NXD Feature

This section describes how to add the NXD feature to, or remove it from, the datapath.

Add the NXD Feature After It Has Been Removed

This procedure adds NXD to the datapath. It is equivalent to converting a non-NXD system into a system on which NXD can be enabled. The procedure ensures that the file system and caches are started in the correct order, which avoids data loss.

This procedure is valid only if the system was originally installed with NXDSupport: yes in the YAML file.

  1. Stop the Lustre file system:
    MGMT0$ cscli unmount
  2. Start the NXD service:
    MGMT0$ cscli nxd service start
    This command starts the NXD service on all storage nodes.
  3. Start the Lustre file system:
    MGMT0$ cscli mount
    This command brings up all the RAID resources from all nodes and mounts the file system.

    NXD caching may now be enabled on the system.
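
The three steps above can be chained in a short shell script so that the file system is not remounted if the NXD service fails to start; it is left unmounted for investigation instead. This is a minimal sketch, not a supported tool. The function name nxd_add_to_datapath is hypothetical, and the script assumes that it runs on MGMT0 with cscli on the PATH and that each cscli subcommand returns a non-zero exit status on failure.

```shell
#!/bin/sh
# Hypothetical helper: adds NXD back into the datapath (run on MGMT0).
# ASSUMPTION: each cscli subcommand exits non-zero on failure.
nxd_add_to_datapath() {
    # Step 1: stop the Lustre file system.
    echo "Stopping the Lustre file system..."
    cscli unmount || return 1

    # Step 2: start NXD on all storage nodes; on failure, stop here
    # and leave the file system unmounted for investigation.
    echo "Starting the NXD service on all storage nodes..."
    cscli nxd service start || return 1

    # Step 3: bring up the RAID resources and mount the file system.
    echo "Starting the Lustre file system..."
    cscli mount || return 1

    echo "NXD is in the datapath; caching may now be enabled."
}

# Usage: nxd_add_to_datapath
```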

Remove the NXD Feature After It Has Been Added

This procedure completely removes the NXD feature from the datapath and ensures that the file system and caches are shut down properly, to avoid data loss.

  1. Disable NXD caching before removing NXD from the datapath, and then wait for the cached dirty data to be flushed to the main array:
    MGMT0$ cscli nxd disable

    This command starts flushing the NXD cache to the main array. While cached dirty data is being flushed, the cscli nxd list command shows the value disabling in the Caching State field, and it continues to show this value until NXD has finished the flush.

    The flush runs in the background. When all the data to be flushed from the cache has been written to the drives of the main array, the flush is complete and NXD disables caching on all nodes.

    When NXD caching is disabled, the cscli nxd list command shows the value disabled in the Caching State field. To confirm that this step is complete, run cscli nxd list repeatedly until the Caching State field shows disabled.

    If the cache is full, it may take more than 40 minutes to flush all cached data out to drives, depending on the following factors:
    • The size of the cache device
    • The amount of cached data that must be flushed
    • The state of the underlying main array (GridRAID). Writing all cached data from SSD can take longer when the GridRAID array is degraded or is already heavily loaded servicing large bypassed I/Os.

    As part of the cscli nxd disable command, NXD performs a fast flush at a rate of 500 MiB/s or higher.

    If client I/O has not stopped, the Caching State can remain disabling for a considerable time. This happens when new I/O operations overlap data that is already in the cache, which lengthens the flush. Run cscli nxd list -a for more detail on the cache state, including the Dirty CWs and Cache Blocks Flushed fields.

  2. Stop the Lustre file system:
    MGMT0$ cscli unmount
  3. Stop the NXD service:
    MGMT0$ cscli nxd service stop
    This command stops the NXD service on all storage nodes.
  4. Start the Lustre file system:
    MGMT0$ cscli mount
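
The removal procedure, including the polling in step 1, can be sketched as a shell script. This is a minimal sketch, not a supported tool: the function names nxd_caching_disabled and nxd_remove_from_datapath and the NXD_POLL_SECONDS variable are hypothetical. The script assumes that each cscli subcommand exits non-zero on failure, and that the text output of cscli nxd list contains the word disabling while any node is still flushing and the word disabled once caching is off. Check these assumptions against the actual output on your system before use.

```shell
#!/bin/sh
# Hypothetical helper: removes NXD from the datapath (run on MGMT0).
# ASSUMPTIONS: each cscli subcommand exits non-zero on failure, and the
# output of `cscli nxd list` contains "disabling" while any node is
# still flushing and "disabled" once caching is off.

# Seconds between polls of the Caching State (hypothetical default).
NXD_POLL_SECONDS="${NXD_POLL_SECONDS:-60}"

# Succeeds only when no line reports "disabling" and "disabled" appears.
# A failing `cscli nxd list` is treated as "not yet disabled".
nxd_caching_disabled() {
    out="$(cscli nxd list)" || return 1
    if printf '%s\n' "$out" | grep -qw 'disabling'; then
        return 1
    fi
    printf '%s\n' "$out" | grep -qw 'disabled'
}

nxd_remove_from_datapath() {
    # Step 1: disable caching; NXD flushes dirty data in the background.
    cscli nxd disable || return 1

    # Poll until the flush is complete (this can take 40+ minutes).
    until nxd_caching_disabled; do
        sleep "$NXD_POLL_SECONDS"
    done

    # Steps 2-4: stop Lustre, stop the NXD service, restart Lustre.
    cscli unmount || return 1
    cscli nxd service stop || return 1
    cscli mount || return 1
}

# Usage: nxd_remove_from_datapath
```

If the output lists one row per node, checking for the absence of disabling, rather than only the presence of disabled, keeps the script from unmounting while some nodes are still flushing.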
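
For planning, the flush time in step 1 can be estimated by dividing the amount of dirty data by the flush rate. The sketch below (the function name nxd_flush_minutes is hypothetical) uses the 500 MiB/s fast-flush rate. For example, 1 TiB (1,048,576 MiB) of dirty data at 500 MiB/s takes about 2,097 seconds, or roughly 35 minutes. Treat the result as a nominal figure only: a degraded or busy GridRAID array, or client I/O that overlaps cached data, can make the actual flush take longer.

```shell
#!/bin/sh
# Hypothetical, rough estimate of NXD flush time.
# Usage: nxd_flush_minutes DIRTY_MIB RATE_MIB_PER_SEC
# Prints the estimated time in minutes, rounded up.
nxd_flush_minutes() {
    dirty_mib=$1
    rate=$2
    # ceil(dirty_mib / (rate * 60)) using integer arithmetic
    echo $(( (dirty_mib + rate * 60 - 1) / (rate * 60) ))
}

# Example: 1 TiB (1048576 MiB) of dirty data at 500 MiB/s
nxd_flush_minutes 1048576 500   # prints 35
```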