ClusterStor NXD Feature

NXD overview and theory of operation.

This publication describes how to operate, manage, and troubleshoot systems that are configured to use NXD.

NXD uses flash acceleration software to improve performance for small block IOs by caching them on SSDs.

Procedures in this section are written specifically for use by an admin user and do not require root (super user) access. Hewlett Packard Enterprise recommends that these procedures be followed as written and that the technician does not log in as root or perform the procedures as a root user. Adding the feature requires approximately one (1) hour.

Overview

NXD implements write-back caching on SSDs for small block IO. NXD functionality is characterized by the following:

Cache is populated by writes that are less than or equal to the user-configured bypass IO size. These writes are referred to as small block IO operations.
Writes larger than the configured bypass IO size are referred to as large block IO operations and completely bypass the cache. However, if the large block IO request overlaps data that is already cached, it is treated as a cached IO. This is done to ensure data integrity.
Reads on cached data are served from the cache.
Small block reads on noncached data do not populate the cache. They are served directly from the backend disks where the data resides.
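The routing rules above can be illustrated with a small Python model. This is only a sketch for reasoning about the rules, not the NXD driver: the class name CacheModel, its methods, and the per-sector set are hypothetical, and the handling of a read that is only partly cached is not described in this section, so the model simply labels that case "partial". The model also assumes that a large write that overlaps cached data places its whole range in the cache.

```python
from dataclasses import dataclass, field


@dataclass
class CacheModel:
    """Toy model of the routing rules listed above (hypothetical, not NXD code)."""
    bypass_io_sectors: int = 64               # assumed setting: 64 sectors (32 KiB)
    cached: set = field(default_factory=set)  # sector numbers currently in cache

    def _cached_count(self, start: int, length: int) -> int:
        # Number of sectors in [start, start + length) that are in the cache.
        return sum(1 for s in range(start, start + length) if s in self.cached)

    def route_write(self, start: int, length: int) -> str:
        small = length <= self.bypass_io_sectors
        overlaps = self._cached_count(start, length) > 0
        if small or overlaps:
            # Small writes populate the cache; a large write that overlaps
            # cached data is treated as a cached IO (assumption: whole range).
            self.cached.update(range(start, start + length))
            return "cache"
        return "bypass"

    def route_read(self, start: int, length: int) -> str:
        # Reads never populate the cache in this model.
        hits = self._cached_count(start, length)
        if hits == length:
            return "cache"
        if hits == 0:
            return "backend"
        return "partial"  # behaviour not described in this section
```

For example, with the bypass IO size set to 64 sectors, an 8-sector write is cached, a 128-sector write to an uncached region bypasses the cache, and a 128-sector write that overlaps the earlier 8 sectors is treated as a cached IO.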

NXD Theory of Operation

One of the core components of NXD is a kernel IO filter driver that is implemented as a Linux device mapper (DM) target driver. This driver intercepts IO and routes small blocks for caching. Large blocks bypass caching and are directly handed off to the underlying device (GridRAID).

Caching is implemented as a cache management library with well-defined APIs, deployed as a Linux kernel module.

The NXD IO and caching capabilities operate at the Linux kernel block layer, which makes them transparent to the file system, database applications, and other applications.

NXD is hardware agnostic and can work with any block device.

Cache Group

NXD introduces the concept of a cache group, which is a collection of:

A single GridRAID data volume to be cached, called a virtual device or virtual drive (VD), usually /dev/mdX (RAID 6), where X = 0, 2, 4... for even-numbered OSS nodes and 1, 3, 5... for odd-numbered OSS nodes.
A single cache/caching device (CDEV) in which data is cached. The CDEV in a system usually shares the same set of SSDs that are used for WIB and JRNL. The NXD configuration uses larger capacity SSDs (typically 3.2 TB). The CDEV is configured as a single RAID1 md device (in the case of two [2] SSDs) or RAID10 md device (in the case of four [4], six [6], eight [8], or 10 SSDs). For example, /dev/md25X, where X = 0, 2, 4... for even-numbered OSS nodes and 1, 3, 5... for odd-numbered OSS nodes.
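The relationship between SSD count and CDEV RAID level described above can be written as a short lookup. The function name cdev_raid_level is hypothetical and is used only for illustration; it encodes nothing beyond the SSD counts listed in this section and rejects any other count.

```python
def cdev_raid_level(ssd_count: int) -> str:
    """Return the CDEV RAID level for the SSD counts named in this section."""
    if ssd_count == 2:
        return "RAID1"
    if ssd_count in (4, 6, 8, 10):
        return "RAID10"
    # Other counts are not covered by this section, so the sketch refuses them.
    raise ValueError(f"SSD count {ssd_count} is not described for a CDEV")
```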

Each instance of GridRAID is in a separate cache group (that is, each OST has its own cache group). Therefore, in an SSU+0 configuration, the cscli nxd list command shows output similar to the following sample (condensed format) for the OSS nodes.

admin@cls12345n000$ cscli nxd list
------------------------------------------------------------------------------------
Host         Cache       Caching Total    Cache      Cache    Cache       Bypass
             Group       State   Cache    Size       Block    Window      IO Size
                                 Size     In Use     Size     Size
------------------------------------------------------------------------------------
cls12345n004 nxd_cache_0 enabled 1.406 TB 699.875 MB 8(4 KiB) 128(64 KiB) 64(32 KiB)
cls12345n005 nxd_cache_1 enabled 1.406 TB 745.500 MB 8(4 KiB) 128(64 KiB) 64(32 KiB)
------------------------------------------------------------------------------------

The preceding output reveals the following:

The NXD Caching State is “enabled”, indicating that NXD caching is in effect.
Cache Block Size, Cache Window Size, and Bypass IO Size are displayed in sectors (where the size of one sector is 512 bytes), as well as in kibibytes (in parentheses).
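The sector-to-kibibyte relationship in these fields can be checked with a few lines of Python. This is a minimal sketch that assumes a field has exactly the form shown in the sample output, such as 8(4 KiB); the helper name parse_sector_field is hypothetical and is not part of cscli.

```python
import re
from typing import Tuple

SECTOR_BYTES = 512  # one sector is 512 bytes, as stated above

# Matches fields such as "8(4 KiB)" or "128(64 KiB)".
_FIELD = re.compile(r"^(\d+)\((\d+) KiB\)$")


def parse_sector_field(text: str) -> Tuple[int, int]:
    """Return (sectors, kibibytes) and verify that the two values agree."""
    match = _FIELD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized size field: {text!r}")
    sectors, kib = int(match.group(1)), int(match.group(2))
    if sectors * SECTOR_BYTES != kib * 1024:
        raise ValueError(f"sector and KiB values disagree in {text!r}")
    return sectors, kib
```

Applied to the sample output, the Cache Block Size of 8 sectors is 8 x 512 = 4096 bytes (4 KiB), the Cache Window Size of 128 sectors is 64 KiB, and the Bypass IO Size of 64 sectors is 32 KiB.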