Create Checkpoints Using the CGE checkpoint Command

Purpose, syntax and examples of using the CGE checkpoint command.

The checkpoint command requests the creation of a checkpoint. A checkpoint is a dump to disk of the current database state: a compiled database consisting of the dbQuads, string_table_chars, and string_table_chars.index files. It can optionally include an NQuads file that can be used to export the database to other tools.
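As a quick sanity check after a checkpoint, the presence of the three files named above can be tested from the shell. The following is a minimal sketch, not part of CGE: the function name is_checkpoint_dir is hypothetical, and it assumes the checkpoint was written to a POSIX directory (it does not look inside HDFS) and only checks that the files exist, not that they are valid.

```shell
# Hypothetical helper, not provided by CGE.
# Returns 0 if the directory holds all three compiled database files
# described above, and 1 if any of them is missing.
is_checkpoint_dir() {
    dir=$1
    for f in dbQuads string_table_chars string_table_chars.index; do
        # -f is true only for an existing regular file
        [ -f "$dir/$f" ] || return 1
    done
    return 0
}
```

For example, is_checkpoint_dir /lus/scratch/user/db/cp1 && echo "checkpoint files present" prints the message only when all three files are found.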

The command accepts a single argument: the path of the directory in which to create the checkpoint. The checkpoint directory is specified as a URI. It may be a full URL, such as a file: or hdfs:// URL, or it may be a relative URI. A relative URI is resolved against the base URI on the server, which is the data directory of the running CGE instance.

It is possible to checkpoint to the same data directory that CGE was started from by giving that directory's path as the checkpoint path. A successful checkpoint overwrites the existing dbQuads, string_table_chars, and string_table_chars.index files, so that the new dataset is loaded the next time CGE is started from that directory. It is also possible to checkpoint to another directory. If that directory already contains a dataset and the checkpoint succeeds, the dataset is overwritten.

If the data directory is being moved to a different location, shut down any instance of CGE that was launched using that data directory before relaunching CGE.

While using the checkpoint command:
  • If a full URL is used, the checkpoint is written exactly as specified by the URL. An HDFS URL causes the checkpoint to be written to the path given in the URL, on the HDFS file system described by the rest of the URL. A FILE URL (for example, file:/path) causes the checkpoint to be written to the POSIX file system at the pathname given in the URL.
  • If a relative path (that is, a simple path with no leading / character) is used, the checkpoint is written to a directory relative to the data directory used at CGE start up.
  • If a full pathname that is not a URL is specified, the pathname is interpreted within the space specified by the URL of the data directory used at CGE start up. If CGE was started using an HDFS URL, the checkpoint is written at the specified path within HDFS. If CGE was started with a simple pathname or a FILE URL, the checkpoint is written at the specified path within the POSIX file space.
  • The checkpoint command allows existing checkpoints to be overwritten, and it guarantees that the overwrite is an atomic operation: either the checkpoint is overwritten and replaced, or the previous checkpoint continues to exist.
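
The three path rules above can be summarized as a small shell function. This is only an illustrative sketch, not part of CGE: resolve_checkpoint is a hypothetical name, and the function merely predicts a destination from the data directory used at start up and the checkpoint argument. It assumes that, for a full pathname under an HDFS data directory, the scheme and host portion of the data directory URL are kept; the text above does not spell that detail out.

```shell
# Hypothetical helper, not provided by CGE.
# $1: data directory (path or URL) that CGE was started with
# $2: argument that would be passed to the checkpoint command
resolve_checkpoint() {
    data_dir=$1
    target=$2
    case $target in
        hdfs://*|file:*)
            # Full URL: written exactly as specified.
            printf '%s\n' "$target" ;;
        /*)
            # Full pathname, not a URL: interpreted within the space
            # of the data directory.
            case $data_dir in
                hdfs://*)
                    # Assumption: keep the host part of the HDFS URL.
                    rest=${data_dir#hdfs://}
                    authority=${rest%%/*}
                    printf 'hdfs://%s%s\n' "$authority" "$target" ;;
                *)
                    # Simple pathname or FILE URL: POSIX file space.
                    printf '%s\n' "$target" ;;
            esac ;;
        *)
            # Relative path: placed beneath the data directory.
            printf '%s/%s\n' "${data_dir%/}" "$target" ;;
    esac
}
```

For instance, resolve_checkpoint /lus/scratch/user/db cp1 yields /lus/scratch/user/db/cp1, while resolve_checkpoint hdfs:///user/db /backups/cp1 yields hdfs:///backups/cp1.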

For more information, see the cge-cli-checkpoint(1) man page.

Examples

Use a full pathname (not a URL)

$ cge-cli checkpoint /lus/scratch/user/db/cp1

Use an HDFS URL

$ cge-cli checkpoint hdfs:///user/db/cp1

Use NQuads

If an NQuads file needs to be generated for use with other RDF and SPARQL tools, use the -q or --quads option of the checkpoint command, as shown in the following example:

$ cge-cli checkpoint --quads /lus/scratch/user/db/cp1
Checkpoint creation succeeded