DVS Configuration Settings and Mount Options
Mount options and kernel module parameters used to configure DVS and enhance its performance.
modprobe.d files in the Simple Sync directory structure, not on the command line or by adding lines to /etc files (e.g., /etc/fstab). The following sections describe the settings that are available within the configurator for that purpose. The first two sections cover settings that are part of the client mount setting. The last section covers the remaining DVS settings, which are kernel module parameters.
Client Mount Settings
reference
A human-readable string (a name) that is used to uniquely identify a client mount. reference cannot be set by accepting the default: a non-empty string is required.
- Full setting name: cray_dvs.settings.client_mount.data.reference.REF-NAME (where REF-NAME is the user-provided client mount reference name)
- Level: basic
- Default value: '' (empty string)
- Associated environment variable: none
- Related settings/options: Because this is the key field of a client mount setting entry, each setting within the client mount setting includes this string in its full setting name.
mount_point
A string that specifies the full pathname on the client of the projected file system. mount_point cannot be set by accepting the default: a non-empty string is required.
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.mount_point
- Level: basic
- Default value: '' (empty string)
- Associated environment variable: none
- Related settings/options: none
spath
A string that specifies the full pathname on the DVS server of the file system that is to be projected for a client mount. It must be an absolute path and it must exist on the DVS server. spath cannot be set by accepting the default: a non-empty string is required.
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.spath
- Level: basic
- Default value: '' (empty string)
- Associated environment variable: none
- Related settings/options: none
server_groups
A list of node groups that will function as DVS servers for a client mount. Enter one node group per line. server_groups cannot be set by accepting the default: a non-empty list is required.
Important: DVS servers should be dedicated because they use unlimited amounts of CPU and memory resources based directly on the I/O requests sent from DVS clients. Avoid using nodes that have other services (Lustre nodes, login nodes, etc.) or are tier2 nodes.
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.servers
- Level: basic
- Default value: [] (empty list)
- Associated environment variable: none
- Related settings/options: Functionally equivalent to the nodename or nodefile "additional" option in the options setting of the client mount setting. The use of those two additional options is deprecated.
client_groups
A list of node groups that will function as DVS clients for a client mount. Enter one node group per line. Unlike server_groups, client_groups can be set to an empty list. If no node groups are specified, the mount will be performed on all suitable compute nodes (a compute node functioning as a DVS server is an example of an unsuitable node). Leaving this list empty is common.
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.clients
- Level: basic
- Default value: [] (empty list)
- Associated environment variable: none
- Related settings/options: none
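The requirements stated for the basic client mount settings above can be summarized as a small pre-flight check. The following Python sketch is illustrative only: the ClientMount class, the check_required function, and the example values (such as the node group name dvs_servers) are hypothetical and are not part of the configurator. It cannot verify that spath exists on the DVS server; it only checks the conditions that can be tested locally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClientMount:
    """Hypothetical in-memory view of one client mount setting entry."""
    reference: str = ""
    mount_point: str = ""
    spath: str = ""
    server_groups: List[str] = field(default_factory=list)
    client_groups: List[str] = field(default_factory=list)  # may stay empty


def check_required(mount: ClientMount) -> List[str]:
    """Return a list of problems; an empty list means the basics are set."""
    problems = []
    # reference, mount_point, and spath cannot keep their empty defaults.
    for name in ("reference", "mount_point", "spath"):
        if not getattr(mount, name):
            problems.append(f"{name}: a non-empty string is required")
    # spath must be an absolute path on the DVS server.
    if mount.spath and not mount.spath.startswith("/"):
        problems.append("spath: must be an absolute path")
    # server_groups cannot keep its empty default; client_groups can.
    if not mount.server_groups:
        problems.append("server_groups: a non-empty list is required")
    return problems


example = ClientMount(reference="projects", mount_point="/projects",
                      spath="/export/projects", server_groups=["dvs_servers"])
print(check_required(example))
print(check_required(ClientMount()))
```

The first call reports no problems; the second reports one problem for each required setting that was left at its default.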
loadbalance
Used to specify loadbalance mode, which more evenly distributes loads across DVS servers. Loadbalance mode is valid only for read-only mounts.
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.loadbalance
- Level: advanced
- Default value: false or 0
- Associated environment variable: none
- Related settings/options: When loadbalance is enabled, the underlying DVS implementation automatically sets the readonly setting to true and sets these additional options: cache=1, failover=1, maxnodes=1, and hash_on_nid=1. Cray recommends setting the attrcache_timeout setting as well to take advantage of the mount being read-only. If loadbalance is enabled, leave the readonly setting unconfigured or set it to true to maintain consistency with the way DVS implements loadbalance.
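The side effects of enabling loadbalance can be written down as a lookup. This Python sketch only restates the values listed above; the function name loadbalance_view and its return shape are hypothetical, and the sketch does not reproduce how DVS applies the options internally.

```python
from typing import Dict, Optional, Tuple

# Additional options that DVS sets automatically in loadbalance mode,
# as listed in the loadbalance setting description.
LOADBALANCE_IMPLIED = {"cache": 1, "failover": 1, "maxnodes": 1, "hash_on_nid": 1}


def loadbalance_view(loadbalance: bool,
                     readonly: Optional[bool]) -> Tuple[Dict[str, object], Optional[str]]:
    """Return (implied settings, warning) for a client mount.

    readonly is None when the setting is left unconfigured.
    """
    if not loadbalance:
        return {}, None
    implied: Dict[str, object] = {"readonly": True, **LOADBALANCE_IMPLIED}
    warning = None
    if readonly is False:
        # The text above asks for readonly to be unconfigured or true.
        warning = "readonly is false, but loadbalance forces a read-only mount"
    return implied, warning


implied, warning = loadbalance_view(loadbalance=True, readonly=None)
print(implied)
print(warning)
```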
attrcache_timeout
Enables client-side attribute caching, which can significantly increase performance, most notably in pathname lookup situations. File attributes and dentries for getattr requests, pathname lookups, etc. are read from DVS servers and cached on the DVS client for n seconds. Subsequent lookups or getattr requests use the cached attributes until the timeout expires, at which point they are read and cached again on first reference. When attribute caching is disabled, DVS clients must send a lookup request to a DVS server for every level of a pathname, and repeat this for every pathname operation. When it is enabled, the client sends a lookup request to a DVS server for every level of a pathname once per n seconds.
Note: An administrator with root privilege can force a cache revalidation at any time, not just when the timeout has expired. See "Force a Cache Revalidation on a DVS Mount Point," which can be found in XC™ Series DVS Administration Guide (S-0005) or XC™ Series Software Installation and Configuration Guide (S-2559).
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.attrcache_timeout
- Level: advanced
- Default value: 14400 seconds for read-only mounts.
  Important: This is the configurator default. The underlying DVS implementation default is 3 seconds, which is safer for read-write mounts. This means that to enhance system performance for read-only mounts, configure this setting by accepting the configurator default (or entering some other value). Leaving this setting unconfigured will result in the underlying default being used.
- Associated environment variable: none
- Related settings/options: The Ansible play that consumes DVS configuration data prevents use of this mount option for read-write file systems due to the risk of file system corruption. Run-time mounts not accompanied by that Ansible play do not have that safeguard. In such cases, if a read-write mount is created, it is safe to leave attrcache_timeout unconfigured so that the underlying default is used.
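The effect of the timeout on lookup traffic can be estimated with a toy model. The Python sketch below is not DVS code: the LookupCounter class is hypothetical, it counts one request per pathname level, and it models disabled caching as a timeout of 0, which is an assumption of this sketch rather than a documented DVS value.

```python
from typing import Dict


class LookupCounter:
    """Counts lookup requests a client would send for repeated pathname operations."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s            # 0 models caching disabled (assumption)
        self.cached_at: Dict[str, float] = {}
        self.requests_sent = 0

    def resolve(self, path: str, now: float) -> None:
        prefix = ""
        # One lookup per pathname level unless a fresh cached entry exists.
        for part in path.strip("/").split("/"):
            prefix += "/" + part
            cached = self.cached_at.get(prefix)
            fresh = (cached is not None and self.timeout_s > 0
                     and now - cached < self.timeout_s)
            if not fresh:
                self.requests_sent += 1
                self.cached_at[prefix] = now


def count(timeout_s: float, ops: int = 100, path: str = "/a/b/c/d") -> int:
    """Resolve the same four-level path once per second for `ops` seconds."""
    counter = LookupCounter(timeout_s)
    for second in range(ops):
        counter.resolve(path, now=float(second))
    return counter.requests_sent


print(count(0))       # caching disabled
print(count(3))       # underlying DVS default
print(count(14400))   # configurator default for read-only mounts
```

In this model, 100 operations on a four-level path cost 400 requests with caching disabled, 136 with a 3-second timeout, and 4 with a 14400-second timeout.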
readonly
Determines whether the client mount is read-only or read-write. If intending to enable client-side caching of read data on a non-writable file system, use this readonly setting to force the DVS mount to be read-only. This will disable write caching.
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.readonly
- Level: basic
- Default value: true or 1
  Important: This is the configurator default. The underlying DVS implementation default is false. Leaving this setting unconfigured will result in the underlying DVS default being used.
- Associated environment variable: none
- Related settings/options: When loadbalance is enabled, DVS automatically enables readonly but the configurator does not, so either leave this setting unconfigured or accept the configurator default. If the attrcache_timeout setting is set for this client mount, readonly should be enabled (set to true) in the configurator/worksheet. If the cache option is specified in the options setting for this client mount, enabling readonly is the only way to enable read caching without enabling write caching as well.
options
Provides the only way to specify mount options in addition to those already specified in the other mount point settings. Enter a string with mount options separated by commas, with no spaces. For information about available options and their implications, see "Additional Options for Use in the Options Setting of a Client Mount." Note that it is necessary to specify maxnodes=1 here for a read-write client mount of an NFS or other non-cluster, non-coherent file system.
- Full setting name: cray_dvs.settings.client_mount.data.REF-NAME.options
- Level: advanced
- Default value: "" (empty string)
- Associated environment variable: none
- Related settings/options: Options contained in this setting will be appended to the mount options specified in other settings. Any that are functionally redundant with settings already configured (such as nodename/nodefile, which are redundant with the server_groups setting) will override those settings.
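Because the options value is a single comma-separated string with no spaces, it can help to assemble it from individual tokens. The following Python sketch is a hypothetical helper (join_options is not a configurator function); it only enforces the formatting rule stated above and does not check whether an option name is valid.

```python
from typing import Iterable


def join_options(tokens: Iterable[str]) -> str:
    """Join mount option tokens into the comma-separated form, with no spaces."""
    cleaned = []
    for token in tokens:
        if not token:
            raise ValueError("empty mount option")
        if "," in token or any(ch.isspace() for ch in token):
            raise ValueError(f"mount option {token!r} must not contain commas or spaces")
        cleaned.append(token)
    return ",".join(cleaned)


# maxnodes=1 is the option named above for a read-write mount of NFS or
# another non-cluster, non-coherent file system; closesync is added only
# to show a second token.
print(join_options(["maxnodes=1", "closesync"]))  # maxnodes=1,closesync
```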
Additional Options for Use in the Options Setting of a Client Mount
All of the mount options listed in this section can be used only in the options setting of a client mount setting in the configurator or DVS configuration worksheet. The options setting is level advanced, so specify -l advanced when invoking the configurator to be able to use these mount options.
atomic/noatomic
atomic enables atomic stripe parallel mode. This ensures that stripe parallel requests adhere to POSIX read/write atomicity rules. DVS clients send each I/O request to a single DVS server to ensure that the bytes are not interleaved with other requests from DVS clients. The DVS server used to perform the read, write, or metadata operation is selected using an internal hash involving the underlying file or directory inode number and the offset of data into the file relative to the DVS block size.
noatomic disables atomic stripe parallel mode. If there are multiple DVS servers and neither loadbalance nor cluster parallel mode is specified, DVS stripes I/O requests across multiple servers and does not necessarily adhere to POSIX read/write atomicity rules if file locking is not used.
- Default value: noatomic or 0
- Associated environment variable: DVS_ATOMIC
- Related settings/options: none
attrcache_timeout
Do not use this option in the options setting. Use the configurator setting (attrcache_timeout) instead.
blksize=n
blksize=n sets the DVS block size to n bytes. Used in striping.
- Default value: 524288 (512 KB)
- Associated environment variable: DVS_BLOCKSIZE
- Related settings/options: none
cache/nocache
cache enables client-side caching of both read and write data. The client node caches reads from the DVS server node, caches writes from user applications that are aggregated and later 'written back' to the backing file system storage on the DVS server node, and provides data to user applications from the page cache if possible, instead of performing a data transfer from the DVS server node. For more information, see "DVS Client-side Write-back Caching can Yield Performance Gains" in XC™ Series DVS Administration Guide (S-0005). Cray DVS is not a clustered file system; no coherency is maintained among multiple DVS client nodes reading and writing to the same file. If cache is enabled and data consistency is required, applications must take care to synchronize their accesses to the shared file.
nocache disables client-side read/write caching.
- Default value: nocache or 0
- Associated environment variable: DVS_CACHE (use with caution)
- Related settings/options: When loadbalance is enabled, DVS automatically enables cache. If readonly is enabled and the cache option is used, the client node will cache only read data (this is equivalent to disabling write caching).
Important: If enabling read/write caching, read "DVS Client-side Write-back Caching can Yield Performance Gains" in XC™ Series DVS Administration Guide (S-0005) to understand the implications and prevent data corruption.
cache_read_sz
cache_read_sz is a limit that can be specified to prevent reads or writes over this size from being cached in the Linux page cache.
- Default value: 0
- Associated environment variable: DVS_CACHE_READ_SZ
- Related settings/options: If cache is not enabled, DVS ignores cache_read_sz.
closesync/noclosesync
closesync enables data synchronization on last close of a file. When a process performs the final close of a file descriptor, in addition to forwarding the close to the DVS server, the DVS server node waits until data has been written to the underlying media before indicating that the close has completed. Because DVS does not cache data on client nodes (unless the cache option is used) and has no replay capabilities, this ensures that data is not lost if a server node crashes after an application has exited.
noclosesync causes DVS to return a close() request immediately.
- Default value: noclosesync or 0
- Associated environment variable: DVS_CLOSESYNC
- Related settings/options: The closesync option is redundant with periodic sync, which is enabled by default. Because periodic sync is more efficient than closesync, Cray recommends letting periodic sync take care of data synchronization instead of using this mount option. See "Periodic Sync Promotes Data and Application Resiliency" in XC™ Series DVS Administration Guide (S-0005).
datasync/nodatasync
datasync enables data synchronization. The DVS server node waits until data has been written to the underlying media before indicating that the write has completed. This can significantly impact performance.
nodatasync causes a DVS server node to return from a write request as soon as the user's data has been written into the page cache on the server node.
- Default value: nodatasync or 0
- Associated environment variable: DVS_DATASYNC
- Related settings/options: none
deferopens/nodeferopens
deferopens defers DVS client open requests to DVS servers for a given set of conditions. When a file is open in stripe parallel mode or atomic stripe parallel mode, DVS clients send the open request to a single DVS server only. Additional open requests are sent as necessary when the DVS client performs a read or write to a different server for the first time. The deferopens option deviates from POSIX specifications. For example, if a file was removed after the initial open succeeded but before deferred opens were initiated by a read or write operation to a new server, the read or write operation would fail with errno set to ENOENT because the open was unable to open the file.
nodeferopens disables the deferral of DVS client open requests to DVS servers. When a file is open in stripe parallel mode or atomic stripe parallel mode, DVS clients send open requests to all DVS servers denoted by nodename or nodefile.
- Default value: nodeferopens or 0
- Associated environment variable: DVS_DEFEROPENS
- Related settings/options: The deferopens option must be used if the dwfs option is used.
distribute_create_ops/nodistribute_create_ops
distribute_create_ops causes DVS to change its hashing algorithm so that create and lookup requests are distributed across all of the servers, as opposed to being distributed to a single server. This applies to creates, mkdirs, lookups, mknods, links, and symlinks.
nodistribute_create_ops causes DVS to use its normal algorithm of using just one target server.
- Default value: nodistribute_create_ops or 0
- Associated environment variable: none
- Related settings/options: none
dwfs/nodwfs
dwfs specifies that the remote file system mounted under DVS is dwfs (DataWarp file system). This should be used even if there are layers between DVS and dwfs (e.g., DVS -> accountfs -> dwfs).
nodwfs is the default, where DVS does not support a DataWarp file system.
- Default value: nodwfs or off
- Associated environment variable: none
- Related settings/options: The dwfs option can be used only if the deferopens option is used.
failover/nofailover
failover enables failover and failback of DVS servers. If all servers fail, operations for the mount point behave as described by the retry option until at least one server is rebooted and has loaded DVS. If multiple DVS servers are listed for a client mount and one or more of the servers fails, operations for that mount continue by using the subset of servers still available. When the downed servers are rebooted and start DVS, any client mounts that had performed failover operations fail back to once again include the servers as valid nodes for I/O forwarding operations.
nofailover disables failover and failback of DVS servers. If one or more servers for a given client mount fail, operations for that mount behave as described by the retry or noretry option specified for the client mount.
- Default value: failover or 1
- Associated environment variable: none
- Related settings/options: When the failover option is enabled (occurs automatically when loadbalance is enabled), the noretry option cannot be enabled.
hash
Except in cases of extremely advanced administrators or specific advice from DVS developers, do not use the hash mount option. The best course of action is to let DVS use its default value. The hash option has three possible values:
  fnv-1a: hash=fnv-1a offers the best overall performance with very little variation due to differing numbers of servers.
  jenkins: hash=jenkins is the hash that DVS previously used. It is included for the unlikely case of pathological issues with the fnv-1a hash, but it has worse overall performance.
  modulo: hash=modulo does not do any hash at all, but rather takes the modulo of the seed that it is given. This option can potentially have very good load balancing characteristics, but is extremely vulnerable to pathological cases such as file systems that only allocate even numbered inodes or a prime number of servers.
- Default value: fnv-1a
- Associated environment variable: none
- Related settings/options: none
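The even-inode pathology mentioned for hash=modulo is easy to reproduce. This Python sketch assumes that the seed is the inode number and that the server index is simply the seed modulo the number of servers; the modulo_server function is hypothetical and is not the DVS implementation.

```python
from collections import Counter


def modulo_server(seed: int, n_servers: int) -> int:
    """Pick a server index by taking the seed modulo the server count."""
    return seed % n_servers


# A file system that only allocates even-numbered inodes, spread over 4 servers.
even_inodes = range(2, 2002, 2)
load = Counter(modulo_server(inode, 4) for inode in even_inodes)
print(sorted(load.items()))  # [(0, 500), (2, 500)]
```

Only servers 0 and 2 receive any requests in this model; servers 1 and 3 stay idle, which is the kind of imbalance the warning above describes.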
hash_on_nid
With hash_on_nid set to on, DVS uses the nid of the client as the hash seed instead of using the file inode number. This effectively causes all request traffic for the compute node to go to a single server. This can help metadata operation performance by avoiding lock thrashing in the underlying file system when each process on a set of DVS clients is using a separate file.
- Default value: off or 0
- Associated environment variable: none
- Related settings/options: When hash_on_nid is enabled (set to 1), DVS automatically sets the hash option to modulo. When loadbalance is enabled, DVS automatically sets hash_on_nid=1.
killprocess/nokillprocess
killprocess enables killing processes that have one or more file descriptors with data that has not yet been written to the backing store. DVS provides this option to minimize the risk of silent data loss, such as when data still resides in the kernel or file system page cache on the DVS server after a write has completed.
nokillprocess disables the killing of processes that have written data to a DVS server when a server fails. When a server fails, processes that have written data to the server are not killed. If a process continues to perform operations with an open file descriptor that had been used to write data to the server, the operations fail (with errno set to EHOSTDOWN). A new open of the file is allowed, and subsequent operations with the corresponding file descriptor function normally.
- Default value: killprocess or 1
- Associated environment variable: DVS_KILLPROCESS
- Related settings/options: With the periodic sync feature (enabled by default), DVS servers attempt to fsync dirty files to minimize the number of processes that are killed and will also fsync a dirty file's data when the file is closed. If periodic sync is disabled (not recommended), the killprocess option alone cannot fully guarantee prevention of silent data loss (though it is highly unlikely) because a close() does not guarantee that data has been transferred to the underlying media (see the closesync option).
loadbalance/noloadbalance
Do not use this option in the options setting. Use the configurator setting (loadbalance) instead.
magic
magic defines what the expected file system magic value for the projected file system on the DVS servers should be. When a DVS client attempts to mount the file system from a server, it verifies that the underlying file system has a magic value that matches the specified value. If not, the DVS client excludes that DVS server from the list of servers it uses for the mount point and prints a message to the system console. Once the configuration issue on the DVS server has been addressed and the client mounts the correct file system, DVS can be restarted on the server. All clients subsequently verify that the server is configured correctly and include the server for that mount point. Many file system magic values are defined in the /usr/include/linux/magic.h file. Commonly used magic values on Cray systems are:
  NFS    0x6969
  GPFS   0x47504653
  BTRFS  0x9123683E
  TMPFS  0x01021994
- Default value: the underlying file system's magic value
- Associated environment variable: none
- Related settings/options: none
maxnodes
maxnodes is used in configuring DVS modes.
- Default value: number of nodes available (nnodes)
- Associated environment variable: DVS_MAXNODES
- Related settings/options: When loadbalance is enabled, DVS automatically sets maxnodes=1.
mds
mds=server, where server is the hostname for a DVS server, specifies which DVS server to use for metadata operations. Metadata will be sent only to the server specified. Used only for DataWarp file systems.
- Default value: none
- Associated environment variable: none
- Related settings/options: When the dwfs option is used, mds must be used. Cray recommends not using mds if the dwfs option is not used.
nodefile
nodefile is the file name of a file with a list of server nodes specified as cnames separated by a colon (:) and no spaces. Do not use this option in the options setting. Use the configurator setting (server_groups) instead.
nodename
nodename is a list of server nodes specified as cnames separated by a colon (:) and no spaces. Do not use this option in the options setting. Use the configurator setting (server_groups) instead.
path
Do not use this option in the options setting. Use the configurator setting (spath) instead.
retry/noretry
retry enables the retry option, which affects how a DVS client node behaves in the event of a DVS server node going down. If retry is specified, any user I/O request is retried until it succeeds, receives an error other than a "node down" indication, or receives a signal to interrupt the I/O operation.
noretry disables retries of user I/O requests when the DVS server receiving the request is down.
- Default value: retry or 1
- Associated environment variable: none
- Related settings/options: When the failover option is enabled, the noretry option cannot be enabled.
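Several of the option descriptions above state constraints between options: noretry conflicts with failover, dwfs needs both deferopens and mds, and a handful of options belong in configurator settings rather than in the options string. The Python sketch below collects those rules into one check. It is a hypothetical helper, not a Cray tool; in particular, treating name=0 and name=off as "disabled" and the bare name as "enabled" is an assumption based on the default values listed in this section.

```python
from typing import Dict, List, Optional

# Options that should be set through a configurator setting instead.
DISALLOWED = {
    "attrcache_timeout": "attrcache_timeout",
    "loadbalance": "loadbalance",
    "noloadbalance": "loadbalance",
    "nodefile": "server_groups",
    "nodename": "server_groups",
    "path": "spath",
}


def parse_options(options: str) -> Dict[str, Optional[str]]:
    """Split 'a,b=1' into {'a': None, 'b': '1'}."""
    parsed: Dict[str, Optional[str]] = {}
    for token in filter(None, options.split(",")):
        name, sep, value = token.partition("=")
        parsed[name] = value if sep else None
    return parsed


def is_on(parsed: Dict[str, Optional[str]], name: str, default: bool) -> bool:
    """Resolve a name/noname style flag, honoring name=0 and name=off (assumed forms)."""
    if "no" + name in parsed:
        return False
    if name in parsed:
        return parsed[name] not in ("0", "off")
    return default


def check_options(options: str) -> List[str]:
    parsed = parse_options(options)
    problems = [f"{name}: use the {setting} configurator setting instead"
                for name, setting in DISALLOWED.items() if name in parsed]
    # failover and retry are both enabled by default.
    if is_on(parsed, "failover", True) and not is_on(parsed, "retry", True):
        problems.append("noretry cannot be enabled while failover is enabled")
    if is_on(parsed, "dwfs", False):
        if not is_on(parsed, "deferopens", False):
            problems.append("dwfs requires deferopens")
        if "mds" not in parsed:
            problems.append("dwfs requires mds")
    return problems


print(check_options("noretry"))
print(check_options("nofailover,noretry"))
print(check_options("dwfs,deferopens,mds=server1"))
```

Here server1 is a placeholder hostname. The first call reports the failover/noretry conflict because failover is on by default; the other two report no problems.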
ro_cache/no_ro_cache
ro_cache enables read-only caching for files on writable client mounts. Files opened with read-only permissions in ro_cache mode are treated as if they were on a DVS read-only cached client mount. If the file has any concurrent open that has write permissions, all instances of that file revert to the default no_ro_cache mode for the current and subsequent reads.
no_ro_cache disables read-only caching for files on writable client mounts.
- Default value: no_ro_cache or 0
- Associated environment variable: none
- Related settings/options: none
userenv/nouserenv
userenv specifies that DVS must honor end user environment variable overrides for DVS mount options.
nouserenv allows the administrator to block end user environment variable overrides for DVS mount options.
- Default value: userenv or 1
- Associated environment variable: none
- Related settings/options: none
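For quick reference, the environment variables listed in this section can be gathered into one table. The Python sketch below only collects the option-to-variable pairs documented above; the override_variable function is hypothetical, and the values that each variable accepts are not covered here.

```python
from typing import Optional

# Mount options in this section that list an associated environment variable.
ENV_OVERRIDES = {
    "atomic": "DVS_ATOMIC",
    "blksize": "DVS_BLOCKSIZE",
    "cache": "DVS_CACHE",              # documented as "use with caution"
    "cache_read_sz": "DVS_CACHE_READ_SZ",
    "closesync": "DVS_CLOSESYNC",
    "datasync": "DVS_DATASYNC",
    "deferopens": "DVS_DEFEROPENS",
    "killprocess": "DVS_KILLPROCESS",
    "maxnodes": "DVS_MAXNODES",
}


def override_variable(option: str, userenv: bool = True) -> Optional[str]:
    """Return the variable an end user could set, or None if there is none.

    With nouserenv (userenv=False) the administrator blocks all overrides.
    """
    if not userenv:
        return None
    return ENV_OVERRIDES.get(option)


print(override_variable("blksize"))                 # DVS_BLOCKSIZE
print(override_variable("failover"))                # None
print(override_variable("blksize", userenv=False))  # None
```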
Kernel Module Parameter Settings
Setting kernel module parameters during initial system configuration is just like setting any other configuration data values. However, changing them later to reconfigure a service may require reloading the module to enable the change to take effect. That is why Cray recommends viewing all module parameter settings as permanent once they are set during initial configuration, before modules are loaded.
dvsipc_heartbeat_timeout
DVS inter-process communication (IPC) heartbeat timeout, in seconds. This parameter is no longer used; it has been preserved only to maintain backwards compatibility with existing DVS config files. Leave this parameter unconfigured or accept the default value.
- Full setting name: cray_dvs.settings.dvsipc_heartbeat_timeout
- Level: advanced
- Default value: 60
- Related settings/options: none
Additional Kernel Module Parameters
There are many DVS kernel module parameters that cannot be set within the configurator. For a list of them and instructions on how to set them, see Configure DVS using Modprobe or Proc Files.