DVS Configuration Settings and Mount Options

Mount options and kernel module parameters used to configure DVS and enhance its performance.

Administrators configure Cray DVS (Data Virtualization Service) using the configurator and modprobe.d files in the Simple Sync directory structure, not on the command line or by adding lines to /etc files (e.g., /etc/fstab). The following sections describe the settings that are available within the configurator for that purpose. The first two sections cover settings that are part of the client mount setting. The last section covers the remaining DVS settings, which are kernel module parameters.
Important: When configurator guidance indicates a relationship or interaction between one or more settings, it is advisory only; the configurator does not automatically check to ensure compatibility among settings. However, the underlying implementation of DVS is unchanged, and it does automatically set related mount options when certain mount options are specified. To prevent mount failure, enter setting values that are compatible, in accordance with the instructions in this publication.

Client Mount Settings

reference
A human-readable string—a name—that is used to uniquely identify a client mount. reference cannot be set by accepting the default: a non-empty string is required.
  • Full setting name: cray_dvs.settings.client_mount.data.reference.REF-NAME (where REF-NAME is the user-provided client mount reference name)
  • Level: basic
  • Default value: '' (empty string)
  • Associated environment variable: none
  • Related settings/options: Because this is the key field of a client mount setting entry, each setting within the client mount setting includes this string in its full setting name.
mount_point
A string that specifies the full pathname on the client of the projected file system. mount_point cannot be set by accepting the default: a non-empty string is required.
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.mount_point
  • Level: basic
  • Default value: '' (empty string)
  • Associated environment variable: none
  • Related settings/options: none
spath
A string that specifies the full pathname on the DVS server of the file system that is to be projected for a client mount. It must be an absolute path and it must exist on the DVS server. spath cannot be set by accepting the default: a non-empty string is required.
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.spath
  • Level: basic
  • Default value: '' (empty string)
  • Associated environment variable: none
  • Related settings/options: none
server_groups
A list of node groups that will function as DVS servers for a client mount. Enter one node group per line. server_groups cannot be set by accepting the default: a non-empty list is required.
Important: DVS servers should be dedicated because they use unlimited amounts of CPU and memory resources based directly on the I/O requests sent from DVS clients. Avoid using nodes that have other services (Lustre nodes, login nodes, etc.) or are tier2 nodes.
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.servers
  • Level: basic
  • Default value: [] (empty list)
  • Associated environment variable: none
  • Related settings/options: Functionally equivalent to the nodename or nodefile "additional" option in the options setting of the client mount setting. The use of those two additional options is deprecated.
client_groups
A list of node groups that will function as DVS clients for a client mount. Enter node groups one per line. Unlike server_groups, client_groups can be set to an empty list. If no node groups are specified, the mount will be performed on all suitable compute nodes (a compute node functioning as a DVS server is an example of an unsuitable node). Leaving client_groups empty is a common configuration.
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.clients
  • Level: basic
  • Default value: [] (empty list)
  • Associated environment variable: none
  • Related settings/options: none
loadbalance
Used to specify loadbalance mode, which more evenly distributes loads across DVS servers. Loadbalance mode is valid only for read-only mounts.
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.loadbalance
  • Level: advanced
  • Default value: false or 0
  • Associated environment variable: none
  • Related settings/options: When loadbalance is enabled, the underlying DVS implementation automatically sets the readonly setting to true and sets these additional options: cache=1, failover=1, maxnodes=1, and hash_on_nid=1. Cray recommends setting the attrcache_timeout setting as well to take advantage of the mount being read-only. If loadbalance is enabled, leave the readonly setting unconfigured or set it to true to maintain consistency with the way DVS implements loadbalance.
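As a rough illustration of the automatic behavior described above, the following Python sketch models the settings that DVS implies when loadbalance is enabled. The function name and dictionary layout are hypothetical and are not part of DVS or the configurator; only the option names and values come from the text above. Treating an explicit readonly=false as an error is an assumption made for the sketch.

```python
# Hypothetical model of the options DVS sets automatically for loadbalance.
# Option names and values are taken from the description above; the function
# and data layout are illustrative only.
LOADBALANCE_IMPLIED = {
    "readonly": True,
    "cache": 1,
    "failover": 1,
    "maxnodes": 1,
    "hash_on_nid": 1,
}


def effective_settings(configured):
    """Return the configured settings with loadbalance implications applied."""
    effective = dict(configured)
    if effective.get("loadbalance"):
        # Assumption: an explicit readonly=False is treated as a mistake,
        # because loadbalance mode is valid only for read-only mounts.
        if effective.get("readonly") is False:
            raise ValueError("loadbalance is valid only for read-only mounts")
        effective.update(LOADBALANCE_IMPLIED)
    return effective
```

Calling effective_settings({"loadbalance": True}) returns a dictionary in which readonly, cache, failover, maxnodes, and hash_on_nid are all set, which is a convenient way to see why the readonly setting should be left unconfigured or set to true.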
attrcache_timeout
Enables client-side attribute caching, which can significantly increase performance, most notably in pathname lookup situations. File attributes and dentries for getattr requests, pathname lookups, etc. are read from DVS servers and cached on the DVS client for n seconds. Subsequent lookups or getattr requests use the cached attributes until the timeout expires, at which point they are read and cached again on first reference. When attribute caching is disabled, DVS clients must send a lookup request to a DVS server for every level of a pathname, and repeat this for every pathname operation. When it is enabled, DVS clients send a lookup request to a DVS server for each level of a pathname only once per n seconds.
Note: An administrator with root privilege can force a cache revalidation at any time, not just when the timeout has expired. See "Force a Cache Revalidation on a DVS Mount Point," which can be found in XC™ Series DVS Administration Guide (S-0005) or XC™ Series Software Installation and Configuration Guide (S-2559).
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.attrcache_timeout
  • Level: advanced
  • Default value: 14400 seconds for read-only mounts.
    Important: This is the configurator default. The underlying DVS implementation default is 3 seconds, which is safer for read-write mounts. This means that to enhance system performance for read-only mounts, configure this setting by accepting the configurator default (or entering some other value). Leaving this setting unconfigured will result in the underlying default being used.
  • Associated environment variable: none
  • Related settings/options: The Ansible play that consumes DVS configuration data prevents use of this mount option for read-write file systems due to the risk of file system corruption. Run-time mounts not accompanied by that Ansible play do not have that safeguard. In such cases, if a read-write mount is created, it is safe to leave attrcache_timeout unconfigured so that the underlying default is used.
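The effect of attribute caching on lookup traffic can be sketched with a small simulation. This is a simplified model, not the DVS cache implementation: it assumes one lookup request per pathname level and a cache entry that stays valid for strictly less than the timeout. The function and parameter names are hypothetical.

```python
def count_lookup_requests(path, access_times, attrcache_timeout=None):
    """Count per-level lookup requests a client would send to a server.

    path: absolute pathname, for example "/a/b/c"
    access_times: times (in seconds) at which the pathname is resolved
    attrcache_timeout: cache lifetime in seconds, or None if caching is off
    """
    components = [c for c in path.split("/") if c]
    cached_at = {}  # pathname prefix -> time its attributes were cached
    requests = 0
    for now in sorted(access_times):
        prefix = ""
        for comp in components:
            prefix += "/" + comp
            fresh = (
                attrcache_timeout is not None
                and prefix in cached_at
                and now - cached_at[prefix] < attrcache_timeout
            )
            if not fresh:
                # Cache miss or caching disabled: ask the server again.
                requests += 1
                cached_at[prefix] = now
    return requests
```

In this model, resolving "/a/b/c" at times 0, 1, 2, and 3 costs 12 requests with caching disabled, 6 requests with a 3-second timeout, and 3 requests with the 14400-second configurator default.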
readonly
Determines whether the client mount is read-only or read-write. If intending to enable client-side caching of read data on a non-writable file system, use this readonly setting to force the DVS mount to be read-only. This will disable write caching.
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.readonly
  • Level: basic
  • Default value: true or 1
    Important: This is the configurator default. The underlying DVS implementation default is false. Leaving this setting unconfigured will result in the underlying DVS default being used.
  • Associated environment variable: none
  • Related settings/options: When loadbalance is enabled, DVS automatically enables readonly but the configurator does not, so either leave this setting unconfigured or accept the configurator default. If the attrcache_timeout setting is set for this client mount, readonly should be enabled (set to true) in the configurator/worksheet. If the cache option is specified in the options setting for this client mount, enabling readonly is the only way to enable read caching without enabling write caching as well.
options
Provides the only way to specify mount options in addition to those already specified in the other mount point settings. Enter a string with mount options separated by commas, with no spaces. For information about available options and their implications, see "Additional Options for Use in the Options Setting of a Client Mount." Note that it is necessary to specify maxnodes=1 here for a read-write client mount of an NFS or other non-cluster, non-coherent file system.
  • Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.options
  • Level: advanced
  • Default value: "" (empty string)
  • Associated environment variable: none
  • Related settings/options: Options contained in this setting will be appended to the mount options specified in other settings. Any that are functionally redundant with settings already configured (such as nodename/nodefile, which are redundant with the server_groups setting) will override those settings.
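A minimal sketch of assembling a value for the options setting follows. The helper is hypothetical and is not part of the configurator. It joins options with commas and no spaces, and it rejects the options that the option descriptions in this publication say to set through dedicated configurator settings instead.

```python
# Options that this publication says to set through dedicated configurator
# settings rather than through the options string.
CONFIGURATOR_ONLY = {
    "attrcache_timeout", "loadbalance", "noloadbalance",
    "nodefile", "nodename", "path",
}


def build_options_string(options):
    """Join mount options into a comma-separated string with no spaces.

    options: list of strings such as "maxnodes=1" or "noclosesync"
    """
    cleaned = []
    for opt in options:
        opt = opt.strip()
        name = opt.split("=", 1)[0]
        if not opt or " " in opt:
            raise ValueError(f"invalid option: {opt!r}")
        if name in CONFIGURATOR_ONLY:
            raise ValueError(f"set {name} through its configurator setting")
        cleaned.append(opt)
    return ",".join(cleaned)
```

For example, build_options_string(["maxnodes=1", "noclosesync"]) produces "maxnodes=1,noclosesync", a string in the form expected for a read-write client mount of an NFS file system.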

Additional Options for Use in the Options Setting of a Client Mount

All of the mount options listed in this section can be used only in the options setting of a client mount setting in the configurator or DVS configuration worksheet. The options setting is level advanced, so specify -l advanced when invoking the configurator to be able to use these mount options.
atomic / noatomic
atomic enables atomic stripe parallel mode. This ensures that stripe parallel requests adhere to POSIX read/write atomicity rules. DVS clients send each I/O request to a single DVS server to ensure that the bytes are not interleaved with other requests from DVS clients. The DVS server used to perform the read, write, or metadata operation is selected using an internal hash involving the underlying file or directory inode number and the offset of data into the file relative to the DVS block size.

noatomic disables atomic stripe parallel mode. If there are multiple DVS servers and neither loadbalance nor cluster parallel mode is specified, DVS stripes I/O requests across multiple servers and does not necessarily adhere to POSIX read/write atomicity rules if file locking is not used.

  • Default value: noatomic or 0
  • Associated environment variable: DVS_ATOMIC
  • Related settings/options: none
attrcache_timeout
Do not use this option in the options setting. Use the configurator setting (attrcache_timeout) instead.
blksize=n
blksize=n sets the DVS block size to n bytes. Used in striping.
  • Default value: 524288 (512 KB)
  • Associated environment variable: DVS_BLOCKSIZE
  • Related settings/options: none
cache / nocache
cache enables client-side caching of both read and write data. The client node caches reads from the DVS server node, caches writes from user applications that are aggregated and later 'written back' to the backing file system storage on the DVS server node, and provides data to user applications from the page cache if possible, instead of performing a data transfer from the DVS server node. For more information, see "DVS Client-side Write-back Caching can Yield Performance Gains" in XC™ Series DVS Administration Guide (S-0005). Cray DVS is not a clustered file system; no coherency is maintained among multiple DVS client nodes reading and writing to the same file. If cache is enabled and data consistency is required, applications must take care to synchronize their accesses to the shared file.

nocache disables client-side read/write caching.

  • Default value: nocache or 0
  • Associated environment variable: DVS_CACHE (use with caution)
  • Related settings/options: When loadbalance is enabled, DVS automatically enables cache. If readonly is enabled and the cache option is used, the client node will cache only read data (this is equivalent to disabling write caching).
Important: If enabling read/write caching, read "DVS Client-side Write-back Caching can Yield Performance Gains" in XC™ Series DVS Administration Guide (S-0005) to understand the implications and prevent data corruption.
cache_read_sz
cache_read_sz is a limit that can be specified to prevent reads or writes over this size from being cached in the Linux page cache.
  • Default value: 0
  • Associated environment variable: DVS_CACHE_READ_SZ
  • Related settings/options: If cache is not enabled, DVS ignores cache_read_sz.
closesync / noclosesync
closesync enables data synchronization on last close of a file. When a process performs the final close of a file descriptor, in addition to forwarding the close to the DVS server, the DVS server node waits until data has been written to the underlying media before indicating that the close has completed. Because DVS does not cache data on client nodes (unless the cache option is used) and has no replay capabilities, this ensures that data is not lost if a server node crashes after an application has exited.

noclosesync causes DVS to return a close() request immediately.

  • Default value: noclosesync or 0
  • Associated environment variable: DVS_CLOSESYNC
  • Related settings/options: The closesync option is redundant with periodic sync, which is enabled by default. Because periodic sync is more efficient than closesync, Cray recommends letting periodic sync take care of data synchronization instead of using this mount option. See "Periodic Sync Promotes Data and Application Resiliency" in XC™ Series DVS Administration Guide (S-0005).
datasync / nodatasync
datasync enables data synchronization. The DVS server node waits until data has been written to the underlying media before indicating that the write has completed. Can significantly impact performance.

nodatasync causes a DVS server node to return from a write request as soon as the user's data has been written into the page cache on the server node.

  • Default value: nodatasync or 0
  • Associated environment variable: DVS_DATASYNC
  • Related settings/options: none
deferopens / nodeferopens
deferopens defers DVS client open requests to DVS servers for a given set of conditions. When a file is open in stripe parallel mode or atomic stripe parallel mode, DVS clients send the open request to a single DVS server only. Additional open requests are sent as necessary when the DVS client performs a read or write to a different server for the first time. The deferopens option deviates from POSIX specifications. For example, if a file was removed after the initial open succeeded but before deferred opens were initiated by a read or write operation to a new server, the read or write operation would fail with errno set to ENOENT because the open was unable to open the file.

nodeferopens disables the deferral of DVS client open requests to DVS servers. When a file is open in stripe parallel mode or atomic stripe parallel mode, DVS clients send open requests to all DVS servers denoted by nodename or nodefile.

  • Default value: nodeferopens or 0
  • Associated environment variable: DVS_DEFEROPENS
  • Related settings/options: The deferopens option must be used if the dwfs option is used.
distribute_create_ops / nodistribute_create_ops
distribute_create_ops causes DVS to change its hashing algorithm so that create and lookup requests are distributed across all of the servers, as opposed to being distributed to a single server. This applies to creates, mkdirs, lookups, mknods, links, and symlinks. 

nodistribute_create_ops causes DVS to use its normal algorithm of using just one target server.

  • Default value: nodistribute_create_ops or 0
  • Associated environment variable: none
  • Related settings/options: none
dwfs / nodwfs
dwfs specifies that the remote file system mounted under DVS is dwfs (DataWarp file system). This should be used even if there are layers between DVS and dwfs (e.g., DVS -> accountfs -> dwfs).

nodwfs is the default, where DVS does not support a DataWarp file system.

  • Default value: nodwfs or off
  • Associated environment variable: none
  • Related settings/options: The dwfs option can be used only if the deferopens option is used.
failover / nofailover
failover enables failover and failback of DVS servers. If multiple DVS servers are listed for a client mount and one or more of the servers fails, operations for that mount continue by using the subset of servers still available. When the downed servers are rebooted and start DVS, any client mounts that had performed failover operations fail back and once again include those servers as valid nodes for I/O forwarding operations. If all servers fail, operations for the mount point behave as described by the retry option until at least one server is rebooted and has loaded DVS.

nofailover disables failover and failback of DVS servers. If one or more servers for a given client mount fail, operations for that mount behave as described by the retry or noretry option specified for the client mount.

  • Default value: failover or 1
  • Associated environment variable: none
  • Related settings/options: When the failover option is enabled (occurs automatically when loadbalance is enabled), the noretry option cannot be enabled.
hash
Except in cases of extremely advanced administrators or specific advice from DVS developers, do not use the hash mount option. The best course of action is to let DVS use its default value. The hash option has three possible values:
fnv-1a
hash=fnv-1a offers the best overall performance with very little variation due to differing numbers of servers.
jenkins
hash=jenkins is the hash that DVS previously used. It is included in the unlikely case of end-case pathological issues with the fnv-1a hash, but it has worse overall performance.
modulo
hash=modulo does not hash at all, but rather takes the modulo of the seed that it is given. This option can potentially balance load well, but it is extremely vulnerable to pathological cases such as file systems that allocate only even-numbered inodes or a prime number of servers.
  • Default value: fnv-1a
  • Associated environment variable: none
  • Related settings/options: none
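The pathological case mentioned for hash=modulo can be illustrated with a short sketch. This is not the DVS implementation: the 32-bit FNV-1a variant, the encoding of the seed as 8 little-endian bytes, and the use of the inode number alone as the seed are all assumptions made for illustration, and the function names are hypothetical. The jenkins hash is omitted.

```python
def fnv1a_32(data):
    """32-bit FNV-1a hash of a bytes object."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, wrapped to 32 bits
    return h


def pick_server(seed, nservers, method="fnv-1a"):
    """Map an integer seed (for example, an inode number) to a server index."""
    if method == "modulo":
        return seed % nservers
    if method == "fnv-1a":
        return fnv1a_32(seed.to_bytes(8, "little")) % nservers
    raise ValueError(f"unsupported method: {method}")


# A file system that allocates only even inode numbers, with two servers.
even_inodes = range(2, 2002, 2)
modulo_servers = {pick_server(i, 2, "modulo") for i in even_inodes}
fnv_servers = {pick_server(i, 2, "fnv-1a") for i in even_inodes}
print(sorted(modulo_servers))  # every request lands on server 0
print(sorted(fnv_servers))     # both servers receive requests
```

With modulo, every even inode number maps to the same server, so the second server is never used. The hashed variant spreads the same inodes across both servers.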
hash_on_nid
With hash_on_nid set to on, DVS uses the nid of the client as the hash seed instead of using the file inode number. This effectively causes all request traffic for the compute node to go to a single server. This can help metadata operation performance by avoiding lock thrashing in the underlying file system when each process on a set of DVS clients is using a separate file.
  • Default value: off or 0
  • Associated environment variable: none
  • Related settings/options: When hash_on_nid is enabled (set to 1), DVS automatically sets the hash option to modulo. When loadbalance is enabled, DVS automatically sets hash_on_nid=1.
killprocess / nokillprocess
killprocess enables killing processes that have one or more file descriptors with data that has not yet been written to the backing store. DVS provides this option to minimize the risk of silent data loss, such as when data still resides in the kernel or file system page cache on the DVS server after a write has completed.

nokillprocess disables the killing of processes that have written data to a DVS server when a server fails. When a server fails, processes that have written data to the server are not killed. If a process continues to perform operations with an open file descriptor that had been used to write data to the server, the operations fail (with errno set to EHOSTDOWN). A new open of the file is allowed, and subsequent operations with the corresponding file descriptor function normally.

  • Default value: killprocess or 1
  • Associated environment variable: DVS_KILLPROCESS
  • Related settings/options: With the periodic sync feature (enabled by default), DVS servers attempt to fsync dirty files to minimize the number of processes that are killed and will also fsync a dirty file's data when the file is closed. If periodic sync is disabled (not recommended), the killprocess option alone cannot fully guarantee prevention of silent data loss (though it is highly unlikely) because a close() does not guarantee that data has been transferred to the underlying media (see the closesync option).
loadbalance / noloadbalance
Do not use this option in the options setting. Use the configurator setting (loadbalance) instead.
magic
magic defines what the expected file system magic value for the projected file system on the DVS servers should be. When a DVS client attempts to mount the file system from a server, it verifies that the underlying file system has a magic value that matches the specified value. If not, the DVS client excludes that DVS server from the list of servers it uses for the mount point and prints a message to the system console. Once the configuration issue on the DVS server has been addressed and the client mounts the correct file system, DVS can be restarted on the server. All clients subsequently verify that the server is configured correctly and include the server for that mount point. Many file system magic values are defined in the /usr/include/linux/magic.h file. Commonly used magic values on Cray systems are:
  • NFS: 0x6969
  • GPFS: 0x47504653
  • BTRFS: 0x9123683E
  • TMPFS: 0x01021994
  • Default value: the underlying file system's magic value
  • Associated environment variable: none
  • Related settings/options: none
maxnodes
maxnodes is used in configuring DVS modes.
  • Default value: number of nodes available (nnodes)
  • Associated environment variable: DVS_MAXNODES
  • Related settings/options: When loadbalance is enabled, DVS automatically sets maxnodes=1.
mds
mds=server, where server is the hostname for a DVS server, specifies which DVS server to use for metadata operations. Metadata will be sent only to the server specified. Used only for DataWarp file systems.
  • Default value: none
  • Associated environment variable: none
  • Related settings/options: When the dwfs option is used, mds must be used. Cray recommends not using mds if the dwfs option is not used.
nodefile
nodefile is the file name of a file with a list of server nodes specified as cnames separated by a colon (:) and no spaces. Do not use this option in the options setting. Use the configurator setting (server_groups) instead.
nodename
nodename is a list of server nodes specified as cnames separated by a colon (:) and no spaces. Do not use this option in the options setting. Use the configurator setting (server_groups) instead.
path
Do not use this option in the options setting. Use the configurator setting (spath) instead.
retry / noretry
retry enables the retry option, which affects how a DVS client node behaves in the event of a DVS server node going down. If retry is specified, any user I/O request is retried until it succeeds, receives an error other than a "node down" indication, or receives a signal to interrupt the I/O operation.

noretry disables retries of user I/O requests when the DVS server receiving the request is down.

  • Default value: retry or 1
  • Associated environment variable: none
  • Related settings/options: When the failover option is enabled, the noretry option cannot be enabled.
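Because the configurator does not check compatibility among settings, the interdependencies noted in this section (dwfs requires deferopens and mds, and noretry cannot be combined with failover) can be checked before mount time. The following is a minimal sketch; the function and its dictionary input are hypothetical and are not part of DVS or the configurator.

```python
def check_option_compatibility(options):
    """Return a list of problems found in a set of DVS mount options.

    options: dict mapping option name to value, for example
             {"dwfs": True, "deferopens": True, "mds": "server1"}
    """
    problems = []
    if options.get("dwfs"):
        if not options.get("deferopens"):
            problems.append("dwfs requires deferopens")
        if not options.get("mds"):
            problems.append("dwfs requires mds")
    elif options.get("mds"):
        problems.append("mds is not recommended without dwfs")
    # failover is enabled by default, so a missing key counts as enabled.
    if options.get("failover", True) and options.get("noretry"):
        problems.append("noretry cannot be enabled while failover is enabled")
    return problems
```

An empty list means that none of these particular rules is violated; it does not guarantee that the mount will succeed.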
ro_cache / no_ro_cache
ro_cache enables read-only caching for files on writable client mounts. Files opened with read-only permissions in ro_cache mode are treated as if they were on a DVS read-only cached client mount. If the file has any concurrent open that has write permissions, all instances of that file revert to the default no_ro_cache mode for the current and subsequent reads.

no_ro_cache disables read-only caching for files on writable client mounts.

  • Default value: no_ro_cache or 0
  • Associated environment variable: none
  • Related settings/options: none
userenv / nouserenv
userenv specifies that DVS must honor end user environment variable overrides for DVS mount options.

nouserenv allows the administrator to block end user environment variable overrides for DVS mount options.

  • Default value: userenv or 1
  • Associated environment variable: none
  • Related settings/options: none

Kernel Module Parameter Settings

Setting kernel module parameters during initial system configuration is just like setting any other configuration data values. However, changing them later to reconfigure a service may require reloading the module to enable the change to take effect. That is why Cray recommends viewing all module parameter settings as permanent once they are set during initial configuration, before modules are loaded.
dvsipc_heartbeat_timeout
DVS inter-process communication (IPC) heartbeat timeout, in seconds. This parameter is no longer used; it has been preserved only to maintain backwards compatibility with existing DVS config files. Leave this parameter unconfigured or accept the default value.
  • Full setting name: cray_dvs.settings.dvsipc_heartbeat_timeout
  • Level: advanced
  • Default value: 60
  • Related settings/options: none

Additional Kernel Module Parameters

There are many DVS kernel module parameters that cannot be set within the configurator. For a list of them and instructions on how to set them, see Configure DVS using Modprobe or Proc Files.