Query Cancellation

Mechanisms for halting a long-running CGE query, a list of the associated NVPs, and troubleshooting tips.

The CGE server cancels a request whenever the client making the request disconnects from the server, or when the request exceeds a timeout value that is configurable through an NVP. Request cancellation can occur between operations within a query, inside the merge operation, inside the filter operation, or inside the group-by operation. The first two of these always recognize request cancellation, while cancellation must be explicitly enabled for the filter and group-by operations. These two operations are optimized, and some of that optimization is disabled when cancellation is enabled, resulting in slower operation. Set the server.LoopInterruptGranularitySeconds NVP to a non-zero value (1 is a good choice) to enable cancellation in filter and group-by operations. This value can be set either in the cge.properties file or in the NVPs sent with a specific query. In merge operations, the maximum number of seconds between cancellation checks defaults to 1 and can be increased by raising this setting.
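
The trade-off described above can be illustrated with a minimal Python sketch. It is not CGE code: run_loop, Cancelled, is_cancelled and the clock parameter are hypothetical names, and the sketch only assumes that a loop polls a cancellation flag at most once per granularity interval, with 0 meaning the loop never polls.

```python
import time


class Cancelled(Exception):
    """Raised when a loop notices that its request was cancelled."""


def run_loop(items, is_cancelled, granularity_seconds, clock=time.monotonic):
    """Process items, polling is_cancelled() at most once per interval.

    granularity_seconds == 0 models a loop with cancellation disabled:
    it never polls, so it runs to completion even if the request was
    cancelled. All names here are hypothetical, not CGE internals.
    """
    processed = 0
    last_check = clock()
    for _ in items:
        if granularity_seconds > 0:
            now = clock()
            if now - last_check >= granularity_seconds:
                last_check = now
                if is_cancelled():
                    raise Cancelled(f"cancelled after {processed} items")
        processed += 1  # stand-in for the real per-item work
    return processed
```

With a granularity of 0 the loop pays no polling cost but cannot be interrupted; with a larger value the polling cost drops while the worst-case delay before a cancellation is noticed grows.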

If query cancellation takes longer than several minutes, wait for the memory allocation process to complete. Restarting the CGE server on additional nodes provides additional memory, which helps prevent queries from slowing down frequently.

Process and Request Termination

The CGE CLI acts as a client to the database server. When a command requiring a connection to the database is executed, the control flow is as follows:

  1. Command performs any client side validation and processing that is necessary for the requested action
  2. A request to the database is prepared
  3. A connection to the database is established
  4. The request is submitted to the database
  5. The client waits until it receives a response from the database
  6. The response is processed as necessary
  7. Command returns results, if any, and exits with an appropriate exit code or continues on to the next requested action

If the process is terminated during step 4 or step 5, CGE makes a best effort to terminate the submitted request by forcibly disconnecting the active connection. The database server detects the disconnection and terminates request processing at the next cancellation point. Cancellation may not be immediate and may take a long time, depending on the current operation. When running the CGE SPARQL server, use the active connections interface to explicitly cancel requests submitted via HTTP.
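
The control flow and the disconnect-on-termination behaviour can be sketched as follows. This is an illustration only: FakeConnection, submit_and_wait and run_command are hypothetical names and do not correspond to any CGE API. The sketch assumes that closing the connection is what signals cancellation to the server.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection (not a CGE API)."""

    def __init__(self, fail_with=None):
        self.open = True
        self.fail_with = fail_with

    def submit_and_wait(self, request):
        # Steps 4-5: submit the request and block until a response arrives.
        if self.fail_with is not None:
            raise self.fail_with  # simulates termination while waiting
        return f"results for {request}"

    def close(self):
        self.open = False


def run_command(query, connect):
    """Follow steps 1-7, always disconnecting if steps 4-5 are interrupted."""
    request = query.strip()  # steps 1-2: validate and prepare the request
    if not request:
        raise ValueError("empty query")
    conn = connect()  # step 3: establish the connection
    try:
        response = conn.submit_and_wait(request)  # steps 4-5
    finally:
        # On normal completion or on termination (for example Ctrl-C),
        # drop the connection so the server can observe the disconnect.
        conn.close()
    return response  # steps 6-7: process and return the results
```

The finally clause runs even when KeyboardInterrupt propagates, which is why the disconnect happens on a best-effort basis whenever the interpreter gets a chance to unwind.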

As a result, after cancelling a long-running query it may not be possible to submit further requests until the database has either cancelled or completed the previous request. Typically, when this happens the system returns an error stating that the command line timed out trying to connect to the database. If query cancellation takes more than several minutes to complete, restart the CGE server on a larger block of nodes to provide additional memory and prevent queries from slowing down due to lack of memory. Restarting the database loses any in-memory changes that were not yet checkpointed to disk. For databases with read/write workloads, checkpoint regularly before executing long-running queries.

Query Cancellation Using a Timeout

Setting the server.QueryTimeout NVP value while submitting a query is another way of cancelling long-running queries. The query times out when the specified number of seconds is reached, causing it to fail and send back a failure message. This can be useful when developing queries or when the duration of execution is unknown. Configure this setting either in the cge.properties file or specify it with the submitted query.
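
As a rough illustration of timeout-based cancellation, the Python sketch below checks a deadline between operations and fails the query once the deadline has passed. It is not CGE code: run_with_timeout, QueryTimedOut and the clock parameter are hypothetical names, and treating a timeout of 0 as "no timeout" is an assumption made for the example.

```python
import time


class QueryTimedOut(Exception):
    """Raised when a query exceeds its timeout (hypothetical name)."""


def run_with_timeout(operations, timeout_seconds, clock=time.monotonic):
    """Run operations in order, checking the deadline before each one.

    Assumption for this sketch: timeout_seconds == 0 disables the check.
    """
    start = clock()
    results = []
    for op in operations:
        if timeout_seconds > 0 and clock() - start >= timeout_seconds:
            raise QueryTimedOut(
                f"timed out after {len(results)} completed operations"
            )
        results.append(op())
    return results
```

Because the deadline is only checked between operations in this sketch, a single long operation can overrun the timeout before the failure is reported.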

NVPs Associated with Query Cancellation

  • server.QueryTimeout - Set a timeout value in seconds for a given query or all queries
  • server.LoopInterruptGranularitySeconds - When non-zero, enables cancellation in the filter and group-by operations. When greater than 1, increases the interval at which cancellation is checked in merge and filter operations.

In addition to these user NVPs, there are five NVPs provided for internal testing purposes. They are listed here because setting them causes a dramatic performance degradation for queries.

Warning: The default value for these NVPs is 0. Do not modify these values unless advised by Cray Support for debugging purposes.
  • server.TestCancellationDispatcherPauseSeconds
  • server.TestCancellationFilterPauseSeconds
  • server.TestCancellationMakemergePauseSeconds
  • server.TestCancellationGroupInitHurisPauseSeconds
  • server.TestCancellationGroupEvalArgPauseSeconds