CGE Hadoop HDFS Configuration
How Lustre and HDFS data is retrieved from different location paths.
The CGE CLI requires access to the HDFS configuration to retrieve results and configuration files that may be stored there. To locate that configuration, the CLI inspects the value of the HADOOP_CONF_DIR environment variable. If the variable specifies a valid directory, the relevant configuration files from that directory are used; otherwise, the default location /etc/hadoop/conf is searched. When verbose mode is enabled, the log output lists the configuration files that are used.
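The lookup order described above can be sketched in Python. This is only an illustration of the documented fallback, not the CGE CLI's actual implementation; the function name resolve_hadoop_conf_dir and its parameters are hypothetical.

```python
import os

# Default location named in the text above.
DEFAULT_HADOOP_CONF_DIR = "/etc/hadoop/conf"


def resolve_hadoop_conf_dir(environ=None, is_dir=os.path.isdir):
    """Return the directory to search for Hadoop configuration files.

    Hypothetical helper: uses HADOOP_CONF_DIR when it names a valid
    directory, and falls back to /etc/hadoop/conf otherwise.
    """
    environ = os.environ if environ is None else environ
    candidate = environ.get("HADOOP_CONF_DIR")
    if candidate and is_dir(candidate):
        return candidate
    return DEFAULT_HADOOP_CONF_DIR


if __name__ == "__main__":
    print("Hadoop configuration directory:", resolve_hadoop_conf_dir())
```

Running the script with HADOOP_CONF_DIR unset, or set to a path that is not a directory, prints the default location.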
HDFS and Lustre URL Path Locations
Specify a full URL to the Lustre file system when check-pointing to Lustre. A pathname specified without a scheme is interpreted relative to the scheme and authority of the data directory URL, so to checkpoint to a different scheme, specify a full URL that uses that scheme. For example, when check-pointing to Lustre from HDFS, the following path tells the checkpoint command where to store the data:
file:/mnt/lustre/my/data/directory
- If a full URL is used, the checkpoint is written exactly as specified by the URL. An HDFS URL causes the checkpoint to be written to the path given in the URL, on the HDFS file system described by the rest of the URL. A file URL (for example, file:/path) causes the checkpoint to be written to the POSIX file system at the pathname given in the URL.
- If a relative path (i.e., a simple path with no leading '/' character) is used, the checkpoint is written to a directory relative to the data directory used at CGE start up.
- If a full pathname but no URL is specified, the pathname is interpreted within the space specified by the URL of the data directory used at CGE start up. If CGE was started using an HDFS URL, the checkpoint is written at the specified path within HDFS. If CGE was started with a simple pathname or a file URL, the checkpoint is written at the specified path within the POSIX file space.
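The three cases can be illustrated with a short Python sketch. It models the rules as stated in this section and is not CGE's own code; the function resolve_checkpoint_location and the host name namenode:8020 are hypothetical, and a data directory given as a simple pathname is assumed to mean the POSIX (file) scheme.

```python
import posixpath
from urllib.parse import urlsplit


def _format(scheme, authority, path):
    # Build "scheme://authority/path" or, with no authority, "scheme:/path".
    if authority:
        return f"{scheme}://{authority}{path}"
    return f"{scheme}:{path}"


def resolve_checkpoint_location(data_dir, checkpoint):
    """Hypothetical model of where a checkpoint path resolves.

    data_dir   -- data directory given at CGE start up (URL or pathname)
    checkpoint -- location passed to the checkpoint command
    """
    base = urlsplit(data_dir)
    # Assumption: a simple pathname for the data directory means POSIX.
    base_scheme = base.scheme or "file"

    if urlsplit(checkpoint).scheme:
        # Full URL: written exactly as specified.
        return checkpoint
    if checkpoint.startswith("/"):
        # Full pathname, no URL: same scheme and authority as the data directory.
        return _format(base_scheme, base.netloc, checkpoint)
    # Relative path: resolved under the data directory itself.
    joined = posixpath.join(base.path, checkpoint)
    return _format(base_scheme, base.netloc, joined)


if __name__ == "__main__":
    hdfs_data = "hdfs://namenode:8020/user/cge/data"  # hypothetical data directory
    for target in ("file:/mnt/lustre/my/data/directory", "ckpt/run1", "/backups/run1"):
        print(target, "->", resolve_checkpoint_location(hdfs_data, target))
```

With an HDFS data directory, only the file: URL leaves HDFS; the relative and absolute pathnames both stay on the HDFS file system that the data directory URL identifies.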