NXD Cache State on Lustre Filesystem

About NXD caching state on a mounted Lustre file system

When the Lustre file system is mounted, the NXD cache state may be enabled or disabled on all the Object Storage Targets (OSTs). There is no option to change the NXD cache state on selected OSTs only.

Use the cscli nxd list command to determine whether NXD caching is enabled or disabled on the OSTs. In the following example, the caching state is enabled on all targets:
root@cls12345n000$ cscli nxd list
----------------------------------------------------------------------------------------
Host          Cache       Caching Total    Cache      Cache    Cache      Bypass      
              Group       State   Cache    Size       Block    Window     IO Size
                                  Size     In Use     Size     Size
----------------------------------------------------------------------------------------
cls12345n004  nxd_cache_0 enabled 1.406 TB 699.875 MB 8(4 KiB) 128(64 KiB) 2048(1 MiB)
cls12345n005  nxd_cache_1 enabled 1.406 TB 745.500 MB 8(4 KiB) 128(64 KiB) 2048(1 MiB)
----------------------------------------------------------------------------------------
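The caching state can also be checked programmatically by parsing the cscli nxd list output. The following Python sketch is a hypothetical helper (not part of cscli) that assumes the column layout shown in the example above and extracts the caching state per host:

```python
# Hypothetical helper: parse `cscli nxd list` output and report whether NXD
# caching is enabled on every OSS host. Assumes the column layout shown in
# the example output above (Host, Cache Group, Caching State, ...).

def parse_nxd_list(output: str) -> dict:
    """Map each host to its caching state (e.g. 'enabled', 'disabled', 'disabling')."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a hostname; the cache group name in the second
        # column distinguishes them from header and separator lines.
        if len(fields) >= 3 and fields[1].startswith("nxd_cache"):
            states[fields[0]] = fields[2]
    return states

sample = """\
----------------------------------------------------------------------------------------
Host          Cache       Caching Total    Cache      Cache    Cache      Bypass
              Group       State   Cache    Size       Block    Window     IO Size
                                  Size     In Use     Size     Size
----------------------------------------------------------------------------------------
cls12345n004  nxd_cache_0 enabled 1.406 TB 699.875 MB 8(4 KiB) 128(64 KiB) 2048(1 MiB)
cls12345n005  nxd_cache_1 enabled 1.406 TB 745.500 MB 8(4 KiB) 128(64 KiB) 2048(1 MiB)
----------------------------------------------------------------------------------------
"""

states = parse_nxd_list(sample)
all_enabled = all(s == "enabled" for s in states.values())
print(states, all_enabled)
```

In practice the sample text would come from running the command, for example via subprocess; the parsing shown here only relies on the table layout above.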
As needed, change the NXD caching state using the cscli nxd enable or cscli nxd disable command; the Lustre file system must be mounted for the change to take effect. Note also that NXD caching may be enabled or disabled even if:
  • An OSS node is down
  • One or more OSTs of an OSS node are down

If an OST goes down while NXD caching is enabled, dirty data may remain in the NXD cache.

Considerations When Disabling NXD Caching

After running the cscli nxd disable command, the output of cscli nxd list may continue to show the caching state of an OST as disabling for an extended period. This delay occurs because background processes must flush all outstanding dirty data from the NXD cache device (SSD) to GridRAID before the state changes to disabled.

On a healthy system, flushing a 1.4 TB cache device (at 90% dirty data) to GridRAID typically completes in under 40 minutes. If the process takes longer, monitor the flushing progress for the OST by running the cscli nxd list -a command repeatedly until the process is complete. The -a option displays detailed NXD configuration information and statistics, including the Dirty CWs and Cache Blocks Flushed fields.
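As a rough sanity check of the 40-minute figure above, assuming binary units and a 1.406 TB cache that is 90% dirty, the implied sustained SSD-to-GridRAID flush rate works out as follows (illustrative arithmetic, not a measured specification):

```python
# Back-of-the-envelope flush-rate estimate for the guidance above.
cache_size_tb = 1.406     # total cache size from the example output
dirty_fraction = 0.9      # assumed fraction of dirty data
flush_minutes = 40        # upper bound quoted for a healthy system

dirty_gib = cache_size_tb * 1024 * dirty_fraction   # TiB -> GiB (binary units assumed)
rate_gib_per_min = dirty_gib / flush_minutes
rate_mib_per_s = rate_gib_per_min * 1024 / 60

print(f"~{rate_gib_per_min:.1f} GiB/min, ~{rate_mib_per_s:.0f} MiB/s")
```

A markedly lower observed rate suggests one of the slow-flush conditions described below.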

The following scenarios are commonly observed when monitoring flushing progress:
  • Dirty CWs not changing and Cache Blocks Flushed not changing: The NXD cache device is up and running, but the NXD virtual drive (GridRAID) is faulty or down. Run cat /proc/mdstat on the affected OSS node to check the GridRAID state.
  • Dirty CWs decreasing and Cache Blocks Flushed increasing: Flushing is progressing; wait until the caching state on the affected OSTs is completely disabled. The flushing speed might be reduced for several reasons, such as GridRAID being busy serving large bypassed IOs, a RAID check running in the background, or the GridRAID array being degraded due to one or two rotational drive failures.
  • Dirty CWs increasing and Cache Blocks Flushed increasing: There may be continuous large overlapping IO arriving at the affected OSTs. We recommend stopping client IO until flushing is complete. Any new IOs that are not overlapping in nature will bypass the NXD cache, but for overlapping IOs the only solution is to stop client IO so that the NXD disable process can complete.
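The trend logic in the scenarios above can be sketched as a small helper (hypothetical, not part of cscli) that compares two successive samples of the Dirty CWs and Cache Blocks Flushed counters:

```python
# Sketch: map two successive samples of the Dirty CWs and Cache Blocks
# Flushed counters onto the common scenarios described above. The function
# name and sampling approach are assumptions; the trend-to-diagnosis
# mapping follows the scenarios in this section.

def classify_flush_progress(dirty_prev, dirty_now, flushed_prev, flushed_now):
    dirty_trend = (dirty_now > dirty_prev) - (dirty_now < dirty_prev)  # -1, 0, or 1
    flushing = flushed_now > flushed_prev
    if dirty_trend == 0 and not flushing:
        # Cache device is up but nothing is draining to the virtual drive.
        return "GridRAID faulty or down: check /proc/mdstat on the OSS node"
    if dirty_trend < 0 and flushing:
        return "flushing normally: wait for the state to reach disabled"
    if dirty_trend > 0 and flushing:
        return "overlapping client IO: stop client IO until flushing completes"
    return "unrecognized pattern: keep monitoring with cscli nxd list -a"

print(classify_flush_progress(1000, 1000, 500, 500))
print(classify_flush_progress(1000, 900, 500, 600))
print(classify_flush_progress(1000, 1100, 500, 600))
```

Samples would be gathered by running cscli nxd list -a at two points in time; the classification itself is just the trend comparison shown here.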