Logging and Troubleshooting

CGE logging and troubleshooting tips and techniques

CGE produces a text log, which is a trace of program execution during query or update processing. Users can view the log with a text editor such as vi or, more typically, with the Linux less command. To find messages of interest, search the log with the grep command.

CGE writes INFO messages to the log during normal operation. It can also generate ERROR and WARN messages. All of these messages provide information about activity that takes place during command execution.

System error messages can appear in the log when CGE exits or shuts down improperly.

When queries or updates are executed, INFO messages containing the text “now starting query #” are written to the log. For example:

2015-Feb-10 19:34:26.513 CST INFO [][7720] 0x43 parser/parseAndBuildSM.cpp@374 allocQueryGlobals [] [QRY ]  <OT> now starting query # 1 
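
The query-start messages can also be located with a short script. The following Python sketch is one possible approach, not part of CGE. The pattern is inferred from the sample line above rather than from a documented log format, and the function name find_query_starts is hypothetical.

```python
import re

# Pattern inferred from the sample line above; it is an assumption,
# not a documented CGE log format.
QUERY_START = re.compile(r"\bINFO\b.*now starting query #\s*(\d+)")

def find_query_starts(lines):
    """Return (line_number, query_number) for each query-start INFO message."""
    starts = []
    for lineno, line in enumerate(lines, start=1):
        match = QUERY_START.search(line)
        if match:
            starts.append((lineno, int(match.group(1))))
    return starts

# The sample line shown above, used as input.
sample = [
    '2015-Feb-10 19:34:26.513 CST INFO [][7720] 0x43 '
    'parser/parseAndBuildSM.cpp@374 allocQueryGlobals [] [QRY ]  <OT> '
    'now starting query # 1 ',
]
print(find_query_starts(sample))  # prints [(1, 1)]
```

To scan a real log, pass an open file object in place of the sample list. The function reads it one line at a time.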

Many other INFO messages are also written to the log during normal operation. For example, long processing times can be seen as the elapsed time between one INFO message and the next:

2015-Feb-13 14:44:45.500 CST INFO [][9448] 0xb utils/malloc/cqe_malloc.cpp@901 LogRequest [] [QRY |MEM ] image 0 : request by "file: parser/qengine/database.cpp, func: readFromDisk line: 989" of 69.849 MiB (0x45d9688) was filled. (0x10005200c80)
2015-Feb-13 14:49:31.099 CST INFO [][9448] 0xc parser/qengine/database.cpp@1141 readFromDisk [] [QRY |STRT] time to read in db of size 139.698 GiB (0x22ecb28000): 285.679279 

When large datasets are used, the total startup time reported in the INFO message can be long, as shown in the following example:

2014-Dec-18 14:40:37.428 CST INFO [][25977] 0x5b parser/dbServer.cpp@1259 main [] [QRY |STRT|PERF] Total startup time: 1434.489315 seconds
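
Long gaps between consecutive messages, such as the one in the readFromDisk example above, can be found automatically. The following Python sketch assumes that every log line of interest begins with a timestamp laid out as in the samples. The function names and the 60-second default threshold are hypothetical choices, and the time zone abbreviation is ignored.

```python
import re
from datetime import datetime

# Timestamp layout inferred from the sample lines above
# (for example "2015-Feb-13 14:44:45.500"); this is an assumption.
TIMESTAMP = re.compile(r"^(\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")

def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    # %b expects English month abbreviations in the default C locale.
    return datetime.strptime(match.group(1), "%Y-%b-%d %H:%M:%S.%f")

def find_gaps(lines, threshold_seconds=60.0):
    """Yield (seconds, previous_line, current_line) for gaps above the threshold."""
    previous_time, previous_line = None, None
    for line in lines:
        current_time = parse_timestamp(line)
        if current_time is None:
            continue  # skip lines that do not start with a timestamp
        if previous_time is not None:
            elapsed = (current_time - previous_time).total_seconds()
            if elapsed > threshold_seconds:
                yield elapsed, previous_line, line
        previous_time, previous_line = current_time, line

# The two consecutive sample lines shown above, used as input.
sample = [
    '2015-Feb-13 14:44:45.500 CST INFO [][9448] 0xb utils/malloc/cqe_malloc.cpp@901 '
    'LogRequest [] [QRY |MEM ] image 0 : request by "file: parser/qengine/database.cpp, '
    'func: readFromDisk line: 989" of 69.849 MiB (0x45d9688) was filled. (0x10005200c80)',
    '2015-Feb-13 14:49:31.099 CST INFO [][9448] 0xc parser/qengine/database.cpp@1141 '
    'readFromDisk [] [QRY |STRT] time to read in db of size 139.698 GiB (0x22ecb28000): '
    '285.679279',
]
for seconds, before, after in find_gaps(sample):
    print(f"{seconds:.3f} second gap")  # prints 285.599 second gap
```

The gap computed from the two timestamps (about 285.6 seconds) is consistent with the read time that the second message reports.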

The following are examples of ERROR messages that CGE can produce when query or update processing fails:

  1. No such file or directory
  2. No space left on device
  3. Exiting because malloc of
  4. Lookup failure for HURI
  5. Invalid graph algorithm name
  6. Exiting with status
  7. Bad entry
  8. Short read
  9. Assertion
  10. Realloc of
  11. Error detected in Dispatcher

Search the log for the text "ERROR" and contact Cray Support if problems are encountered in query or update processing.
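
As an alternative to searching with grep, the ERROR lines can be grouped by the message fragments listed above. The following Python sketch is an illustration only. The fragment list is copied from the examples above and is not exhaustive, and the names ERROR_FRAGMENTS and summarize_errors are hypothetical.

```python
from collections import Counter

# Fragments copied from the example ERROR messages listed above.
# The list is illustrative, not a complete catalog of CGE errors.
ERROR_FRAGMENTS = [
    "No such file or directory",
    "No space left on device",
    "Exiting because malloc of",
    "Lookup failure for HURI",
    "Invalid graph algorithm name",
    "Exiting with status",
    "Bad entry",
    "Short read",
    "Assertion",
    "Realloc of",
    "Error detected in Dispatcher",
]

def summarize_errors(lines):
    """Count lines containing "ERROR", grouped by the first known fragment found."""
    counts = Counter()
    for line in lines:
        if "ERROR" not in line:
            continue
        # Lines that match none of the known fragments are counted as "other".
        label = next((f for f in ERROR_FRAGMENTS if f in line), "other")
        counts[label] += 1
    return counts
```

Pass an open log file to summarize_errors and print the returned counts. A nonzero count for any entry, including "other", is a reason to inspect those lines before contacting Cray Support.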

The following are examples of WARN messages that CGE can produce. A WARN message may, but does not always, precede an error in processing:

  1. huri was not found
  2. directory not specified
  3. not found in IRA
  4. No valid quads in database
  5. Invalid object for quad
  6. Number of warnings found
  7. Unsupported datatype
  8. not in the dictionary
  9. IRA huris not allocated

Search the log for WARN messages and contact Cray Support if problems in query or update processing are suspected.
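
Because a WARN message may precede an error, it can help to see which warnings were logged shortly before each ERROR line. The following Python sketch shows one way to collect that context. The function name and the 50-line default window are hypothetical choices, not CGE settings.

```python
from collections import deque

def warnings_before_errors(lines, window=50):
    """For each ERROR line, collect WARN lines seen in the preceding `window` lines.

    Returns a list of (error_line_number, error_line, warnings), where
    warnings is a list of (line_number, line) pairs.
    """
    recent = deque(maxlen=window)  # most recent (line_number, line) pairs
    findings = []
    for lineno, line in enumerate(lines, start=1):
        if "ERROR" in line:
            warns = [(n, text) for n, text in recent if "WARN" in text]
            findings.append((lineno, line, warns))
        recent.append((lineno, line))
    return findings
```

Including the reported WARN lines together with the ERROR line gives Cray Support more context than the error alone.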

The following are examples of system error messages that can appear when query or update processing has failed. Find the last INFO messages in the log and contact Cray Support if any of these messages follow them:

  1. DUE TO TIME LIMIT
  2. terminate called without an active exception
  3. srun: error
  4. Segmentation fault
  5. Bus error
  6. free invalid pointer
  7. Out of memory
  8. Unable to terminate gracefully
  9. Floating point exception
  10. Aborted
  11. Killed
  12. Unable to allocate resources
  13. Exited with exit code
  14. Requested nodes are busy
  15. transaction completed with an error state
  16. LIBDMAPP ERROR
  17. IRI Resolution Error
  18. rpn not found for
  19. Trapped with SIGINT
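
The check described above, finding the last INFO message and looking for system error messages after it, can be sketched in Python as follows. The fragment list is copied from the examples above and is not exhaustive. The function name is hypothetical, and the sketch assumes that INFO lines contain the text " INFO " as in the samples shown earlier.

```python
# Fragments copied from the example system error messages listed above.
SYSTEM_ERROR_FRAGMENTS = [
    "DUE TO TIME LIMIT",
    "terminate called without an active exception",
    "srun: error",
    "Segmentation fault",
    "Bus error",
    "free invalid pointer",
    "Out of memory",
    "Unable to terminate gracefully",
    "Floating point exception",
    "Aborted",
    "Killed",
    "Unable to allocate resources",
    "Exited with exit code",
    "Requested nodes are busy",
    "transaction completed with an error state",
    "LIBDMAPP ERROR",
    "IRI Resolution Error",
    "rpn not found for",
    "Trapped with SIGINT",
]

def system_errors_after_last_info(lines):
    """Return (last_info_line, matches) for lines that follow the last INFO message."""
    lines = [line.rstrip("\n") for line in lines]  # holds the whole log in memory
    last_info = None
    for index, line in enumerate(lines):
        if " INFO " in line:
            last_info = index
    # If no INFO message exists, examine the whole log.
    start = 0 if last_info is None else last_info + 1
    matches = [
        line for line in lines[start:]
        if any(fragment in line for fragment in SYSTEM_ERROR_FRAGMENTS)
    ]
    last_info_line = None if last_info is None else lines[last_info]
    return last_info_line, matches
```

This sketch reads the entire log into memory, which may be impractical for very large logs. In that case, apply the same check to only the tail of the file.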