Diagnose CGE Python API Issues

Troubleshooting information for the CGE Python API

Exceptions

The JVM passes exception information back to the Python interpreter. The following examples show common runtime and programming errors that produce exceptions:
  • Starting CGE with a reference to a nonexistent dataset - An exception occurs if the database directory passed to forExistingDatabase() does not exist.
    >>>
    >>> my_cge_launcher_builder.forExistingDatabase("/mnt/lustre/xxx/ripple/mkdb/sp2b/25k")
     
    Traceback (most recent call last):
      File "test.py", line 66, in <module>
        my_cge_launcher_builder.forExistingDatabase("/mnt/lustre/xxx/ripple/mkdb/sp2b/25k")
      File "/usr/lib/python2.7/site-packages/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/usr/lib/python2.7/site-packages/py4j/protocol.py", line 319, in get_return_value
        format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling o6.forExistingDatabase.
    : java.lang.IllegalArgumentException: Database directory /mnt/lustre/xxx/ripple/mkdb/sp2b/25k must be an existing directory
            at com.cray.cge.api.builders.CgeLauncherBuilder.forExistingDatabase(CgeLauncherBuilder.java:65)
            at com.cray.cge.api.builders.CgeLauncherBuilder.forExistingDatabase(CgeLauncherBuilder.java:95)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
            at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
            at py4j.Gateway.invoke(Gateway.java:280)
            at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
            at py4j.commands.CallCommand.execute(CallCommand.java:79)
            at py4j.GatewayConnection.run(GatewayConnection.java:214)
            at java.lang.Thread.run(Thread.java:745)
  • Running a query against a connection where the cge-server has already exited - The my_conn object is still valid, but the call to querySummary() generates an exception because the CGE server is not running.
    >>> my_conn.isRunning()
    False
    >>>
    >>>
    >>> my_query_results = my_conn.querySummary(DEFAULT_QUERY)
    Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "/usr/lib/python2.7/site-packages/py4j/java_gateway.py", line 1133, in __call__
         answer, self.gateway_client, self.target_id, self.name)
       File "/usr/lib/python2.7/site-packages/py4j/protocol.py", line 319, in get_return_value
         format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling o25.querySummary.
    : com.hp.hpl.jena.query.QueryExecException: There was an error communicating with the remote server
             at com.cray.cge.sparql.engine.CgeQueryEngine.eval(CgeQueryEngine.java:157)
             at com.hp.hpl.jena.sparql.engine.QueryEngineBase.evaluateNoMgt(QueryEngineBase.java:142)
             at com.hp.hpl.jena.sparql.engine.QueryEngineBase.createPlan(QueryEngineBase.java:110)
             at com.hp.hpl.jena.sparql.engine.QueryEngineBase.getPlan(QueryEngineBase.java:88)
             at com.cray.cge.api.builders.CgeConnectionImpl.querySummary(CgeConnectionImpl.java:628)
             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
             at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
             at java.lang.reflect.Method.invoke(Method.java:498)
             at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
             at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
             at py4j.Gateway.invoke(Gateway.java:280)
             at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
             at py4j.commands.CallCommand.execute(CallCommand.java:79)
             at py4j.GatewayConnection.run(GatewayConnection.java:214)
             at java.lang.Thread.run(Thread.java:745)
    Caused by: com.cray.cge.communications.messaging.exceptions.CommunicationsSecurityException: \
    Unable to establish a connection to the database server at localhost:23239 as it does not appear to be running
             at com.cray.cge.communications.client.ssh.SshClient.connect(SshClient.java:484)
             at com.cray.cge.communications.client.AbstractClient.connect(AbstractClient.java:61)
             at com.cray.cge.sparql.engine.CgeQueryEngine.eval(CgeQueryEngine.java:102)
             ... 15 more
    Caused by: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection refused
             at com.jcraft.jsch.Util.createSocket(Util.java:394)
             at com.jcraft.jsch.Session.connect(Session.java:215)
             at com.cray.cge.communications.client.ssh.SshClient.connect(SshClient.java:439)
             ... 17 more
    Caused by: java.net.ConnectException: Connection refused
             at java.net.PlainSocketImpl.socketConnect(Native Method)
             at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
             at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
             at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
             at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
             at java.net.Socket.connect(Socket.java:589)
             at java.net.Socket.connect(Socket.java:538)
             at java.net.Socket.<init>(Socket.java:434)
             at java.net.Socket.<init>(Socket.java:211)
             at com.jcraft.jsch.Util$1.run(Util.java:362)
  • Invoking withJobOptions() more than once - withJobOptions() can be invoked only once for a given CgeLauncherBuilder instance. A second call raises an IllegalStateException.
    >>>
    >>> my_cge_launcher_builder.withJobOptions(my_cge_joboptions)
    >>>
    >>> my_cge_launcher_builder.withJobOptions(my_cge_joboptions)
    Traceback (most recent call last):  
       File "<stdin>", line 1, in <module>
       File "/usr/lib/python2.7/site-packages/py4j/java_gateway.py", line 1133, in __call__
         answer, self.gateway_client, self.target_id, self.name)
       File "/usr/lib/python2.7/site-packages/py4j/protocol.py", line 319, in get_return_value
         format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling o7.withJobOptions.
    : java.lang.IllegalStateException: Cannot set job options as they have already been set
             at com.cray.cge.api.builders.CgeLauncherBuilder.withJobOptions(CgeLauncherBuilder.java:144)
             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
             at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
             at java.lang.reflect.Method.invoke(Method.java:498)
             at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
             at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
             at py4j.Gateway.invoke(Gateway.java:280)
             at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
             at py4j.commands.CallCommand.execute(CallCommand.java:79)
             at py4j.GatewayConnection.run(GatewayConnection.java:214)
             at java.lang.Thread.run(Thread.java:745)
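
Some of the exceptions above can be avoided with small guards on the Python side. The following is a minimal sketch, not part of the CGE API: open_existing_database and run_query_if_running are hypothetical helper names, and the builder and connection objects are assumed to expose forExistingDatabase(), isRunning(), and querySummary() as in the transcripts above. The directory check also assumes the Python interpreter sees the same filesystem as the JVM.

```python
import os


def open_existing_database(launcher_builder, db_dir):
    """Call forExistingDatabase() only if db_dir is an existing directory.

    Hypothetical helper: the check runs on the Python side, so a bad path
    surfaces as a plain ValueError instead of a Py4JJavaError.
    """
    if not os.path.isdir(db_dir):
        raise ValueError("Database directory %s does not exist" % db_dir)
    return launcher_builder.forExistingDatabase(db_dir)


def run_query_if_running(conn, query):
    """Return conn.querySummary(query), or None if the server is not running."""
    if not conn.isRunning():
        # The connection object is still valid, but the server has exited.
        return None
    return conn.querySummary(query)
```

The server can still exit between the isRunning() check and the query, so the guard narrows the window rather than closing it. Errors that do reach the JVM still arrive as py4j.protocol.Py4JJavaError, as the tracebacks above show.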

Errors

  • Attempting to access a gateway that has been shut down - The JVM was shut down legitimately with gateway.shutdown(), but a later call tries to use the previously active connection.
    >>>
    >>> gateway.shutdown()
    >>>
    >>> my_conn.getPort()
    Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "py4j/java_gateway.py", line 1131, in __call__
         answer = self.gateway_client.send_command(command)
       File "py4j/java_gateway.py", line 881, in send_command
         connection = self._get_connection()
       File "py4j/java_gateway.py", line 825, in _get_connection
         raise Py4JNetworkError("Gateway is not connected.")
    py4j.protocol.Py4JNetworkError: Gateway is not connected.
    >>>
    >>>
  • Shutting down the gateway before stopping the connection - The JVM was shut down legitimately with gateway.shutdown(), but a later call tries to stop the CGE server.
    >>>
    >>> gateway.shutdown()
    >>>
    >>> my_conn.stop()
    Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
       File "py4j/java_gateway.py", line 1131, in __call__
         answer = self.gateway_client.send_command(command)
       File "py4j/java_gateway.py", line 881, in send_command
         connection = self._get_connection()
       File "py4j/java_gateway.py", line 825, in _get_connection
         raise Py4JNetworkError("Gateway is not connected.")
    py4j.protocol.Py4JNetworkError: Gateway is not connected.
  • Not enough CPUs available to launch CGE - After starting the connection and waiting a suitable startup time, the call to isRunning() returns False, and the call to status() reports the process as Failed and CGE as NotRunning.
    >>> my_conn.start()
    >>>
    >>> my_conn.isRunning()
    False
    >>>
    >>> my_CgeStatus = my_conn.status()
    >>> my_CgeStatus.toString()
    u'Process: Failed - CGE: NotRunning'
    The underlying error is recorded in cge_runtime.log:
    Tue Sep 20 2016 16:28:38.336870 CDT[][mrun]:ERROR:Not enough CPUs for exclusive access. Available: 1 Needed: 2
  • Exiting Python without explicitly calling gateway.shutdown() - This leaves the JVM process running as an orphan.
    [userid@nid00030 ~]$ top -u $USER
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM   TIME+   COMMAND
    64461 userid    20   0 35.778g  36304  14640 S   0.0  0.0   0:00.42 java
    In this case, kill the process explicitly:
    [userid@nid00030 ~]$ kill -9 64461
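
Several of the errors above come from teardown order: the connection must be stopped while the gateway is still connected, and the gateway must be shut down before Python exits. One way to enforce that order is a context manager. This is a minimal sketch under stated assumptions, not part of the CGE API: cge_session is a hypothetical name, and gateway and conn are assumed to behave like the objects in the transcripts above (isRunning(), stop(), shutdown()).

```python
from contextlib import contextmanager


@contextmanager
def cge_session(gateway, conn):
    """Yield conn, then stop it and shut down the gateway, in that order."""
    try:
        yield conn
    finally:
        try:
            # Stop the CGE server while the gateway is still connected.
            if conn.isRunning():
                conn.stop()
        finally:
            # Shut down the gateway last, even if stop() raised, so the
            # JVM is not left behind as an orphan process.
            gateway.shutdown()
```

With this helper, code of the form "with cge_session(gateway, my_conn) as conn: ..." runs the teardown in the right order whether the body completes normally or raises.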