Diagnose CGE Python API Issues
Troubleshooting information for the CGE Python API
Exceptions
The JVM passes exception information back to the Python interpreter. Here are examples of common runtime and programming errors that produce exceptions:

- Starting CGE with a reference to a nonexistent dataset - An exception occurs if the dataset referenced in the forExistingDatabase() invocation does not exist.

    >>> my_cge_launcher_builder.forExistingDatabase("/mnt/lustre/xxx/ripple/mkdb/sp2b/25k")
    Traceback (most recent call last):
      File "test.py", line 66, in <module>
        my_cge_launcher_builder.forExistingDatabase("/mnt/lustre/xxx/ripple/mkdb/sp2b/25k")
      File "/usr/lib/python2.7/site-packages/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/usr/lib/python2.7/site-packages/py4j/protocol.py", line 319, in get_return_value
        format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling o6.forExistingDatabase.
    : java.lang.IllegalArgumentException: Database directory /mnt/lustre/xxx/ripple/mkdb/sp2b/25k must be an existing directory
        at com.cray.cge.api.builders.CgeLauncherBuilder.forExistingDatabase(CgeLauncherBuilder.java:65)
        at com.cray.cge.api.builders.CgeLauncherBuilder.forExistingDatabase(CgeLauncherBuilder.java:95)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)

- Running a query against a connection where the cge-server has already exited - The my_conn object is still valid, but the call to querySummary() generates an exception because the CGE server is not running.

    >>> my_conn.isRunning()
    False
    >>> my_query_results = my_conn.querySummary(DEFAULT_QUERY)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/site-packages/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/usr/lib/python2.7/site-packages/py4j/protocol.py", line 319, in get_return_value
        format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling o25.querySummary.
    : com.hp.hpl.jena.query.QueryExecException: There was an error communicating with the remote server
        at com.cray.cge.sparql.engine.CgeQueryEngine.eval(CgeQueryEngine.java:157)
        at com.hp.hpl.jena.sparql.engine.QueryEngineBase.evaluateNoMgt(QueryEngineBase.java:142)
        at com.hp.hpl.jena.sparql.engine.QueryEngineBase.createPlan(QueryEngineBase.java:110)
        at com.hp.hpl.jena.sparql.engine.QueryEngineBase.getPlan(QueryEngineBase.java:88)
        at com.cray.cge.api.builders.CgeConnectionImpl.querySummary(CgeConnectionImpl.java:628)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: com.cray.cge.communications.messaging.exceptions.CommunicationsSecurityException: \
    Unable to establish a connection to the database server at localhost:23239 as it does not appear to be running
        at com.cray.cge.communications.client.ssh.SshClient.connect(SshClient.java:484)
        at com.cray.cge.communications.client.AbstractClient.connect(AbstractClient.java:61)
        at com.cray.cge.sparql.engine.CgeQueryEngine.eval(CgeQueryEngine.java:102)
        ... 15 more
    Caused by: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection refused
        at com.jcraft.jsch.Util.createSocket(Util.java:394)
        at com.jcraft.jsch.Session.connect(Session.java:215)
        at com.cray.cge.communications.client.ssh.SshClient.connect(SshClient.java:439)
        ... 17 more
    Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at java.net.Socket.<init>(Socket.java:434)
        at java.net.Socket.<init>(Socket.java:211)
        at com.jcraft.jsch.Util$1.run(Util.java:362)

- Invoking withJobOptions() more than once - The withJobOptions() function can be invoked only once for a given instance of the CgeLauncherBuilder. A second invocation raises an IllegalStateException.

    >>> my_cge_launcher_builder.withJobOptions(my_cge_joboptions)
    >>> my_cge_launcher_builder.withJobOptions(my_cge_joboptions)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/site-packages/py4j/java_gateway.py", line 1133, in __call__
        answer, self.gateway_client, self.target_id, self.name)
      File "/usr/lib/python2.7/site-packages/py4j/protocol.py", line 319, in get_return_value
        format(target_id, ".", name), value)
    py4j.protocol.Py4JJavaError: An error occurred while calling o7.withJobOptions.
    : java.lang.IllegalStateException: Cannot set job options as they have already been set
        at com.cray.cge.api.builders.CgeLauncherBuilder.withJobOptions(CgeLauncherBuilder.java:144)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
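The Py4JJavaError text in these examples is long, and the useful part is usually the first Java exception plus its "Caused by:" chain. The following is a minimal sketch of one way to pull those lines out of the error text. The helper name summarize_java_error and the SAMPLE string are hypothetical and not part of the CGE API. The sketch assumes that the text handed to it (for example, str(err) inside an `except Py4JJavaError as err:` handler) has the same shape as the tracebacks shown above.

```python
def summarize_java_error(text):
    """Return [(exception_class, message), ...] from Py4J error text.

    The first entry is the exception reported by the failed call; later
    entries are its "Caused by:" chain, in the order they appear.
    """
    chain = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Caused by: "):
            entry = line[len("Caused by: "):]
        elif line.startswith(": "):
            # Py4J prints the top-level Java exception on a line starting with ": ".
            entry = line[2:]
        else:
            continue  # stack frames ("at ..."), "... N more", and other lines
        exception_class, _, message = entry.partition(": ")
        chain.append((exception_class, message.strip()))
    return chain


# Hypothetical, abbreviated input in the shape shown above (most frames trimmed).
SAMPLE = """An error occurred while calling o25.querySummary.
: com.hp.hpl.jena.query.QueryExecException: There was an error communicating with the remote server
\tat com.cray.cge.sparql.engine.CgeQueryEngine.eval(CgeQueryEngine.java:157)
Caused by: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection refused
\tat com.jcraft.jsch.Util.createSocket(Util.java:394)
\t... 17 more
Caused by: java.net.ConnectException: Connection refused
\tat java.net.PlainSocketImpl.socketConnect(Native Method)
"""

for exception_class, message in summarize_java_error(SAMPLE):
    print("%s -> %s" % (exception_class, message))
```

Reading the last entry first tends to be the quickest route to the root cause. In the querySummary() example it is the refused connection, which matches isRunning() returning False.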
Errors
- Attempting to access a gateway that has been shut down - This error shows a legitimate shutdown of the JVM, followed by an attempt to use the previously active connection.

    >>> gateway.shutdown()
    >>> my_conn.getPort()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "py4j/java_gateway.py", line 1131, in __call__
        answer = self.gateway_client.send_command(command)
      File "py4j/java_gateway.py", line 881, in send_command
        connection = self._get_connection()
      File "py4j/java_gateway.py", line 825, in _get_connection
        raise Py4JNetworkError("Gateway is not connected.")
    py4j.protocol.Py4JNetworkError: Gateway is not connected.

- Shutting down the gateway before stopping the connection - This error shows a legitimate shutdown of the JVM, followed by an attempt to stop the CGE server.

    >>> gateway.shutdown()
    >>> my_conn.stop()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "py4j/java_gateway.py", line 1131, in __call__
        answer = self.gateway_client.send_command(command)
      File "py4j/java_gateway.py", line 881, in send_command
        connection = self._get_connection()
      File "py4j/java_gateway.py", line 825, in _get_connection
        raise Py4JNetworkError("Gateway is not connected.")
    py4j.protocol.Py4JNetworkError: Gateway is not connected.

- Not enough CPUs available to launch CGE - After starting the connection and waiting a suitable start up time, the call to isRunning() returns False, and the call to status() returns Failed and NotRunning.

    >>> my_conn.start()
    >>> my_conn.isRunning()
    False
    >>> my_CgeStatus = my_conn.status()
    >>> my_CgeStatus.toString()
    u'Process: Failed - CGE: NotRunning'

  The error can be seen in the cge_runtime.log:

    Tue Sep 20 2016 16:28:38.336870 CDT[][mrun]:ERROR:Not enough CPUs for exclusive access. Available: 1 Needed: 2

- Exiting Python without explicitly running gateway.shutdown() - This leaves the JVM process running as an orphan process.

    [userid@nid00030 ~]$ top -u $USER
      PID USER     PR  NI    VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
    64461 userid   20   0 35.778g 36304 14640 S  0.0  0.0 0:00.42 java

  In that case, kill the process explicitly:

    [userid@nid00030 ~]$ kill -9 64461
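Several of the errors above come down to ordering: stop the connection first, shut the gateway down second, and always shut the gateway down before Python exits. The following is a minimal sketch of one way to enforce that order. The names cge_session and wait_until_running are hypothetical helpers, not part of the CGE API, and the 60-second default timeout is an arbitrary assumption. The sketch relies only on the isRunning(), stop(), and shutdown() calls shown in the examples above, and it assumes that isRunning() is safe to call on a connection that never started.

```python
import contextlib
import time


def wait_until_running(conn, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll conn.isRunning() until it returns True or the timeout expires.

    Returns True if the server came up, False otherwise. The timeout is an
    assumed value; pick one that suits the dataset and system.
    """
    deadline = time.time() + timeout
    while True:
        if conn.isRunning():
            return True
        if time.time() >= deadline:
            return False
        sleep(interval)


@contextlib.contextmanager
def cge_session(gateway, conn):
    """Yield conn, then stop it and shut the gateway down, in that order."""
    try:
        yield conn
    finally:
        try:
            # Stop the CGE server while the gateway is still connected.
            if conn.isRunning():
                conn.stop()
        finally:
            # Always shut the gateway down so that no orphan JVM is left behind.
            gateway.shutdown()


# Usage sketch, assuming `gateway` and `my_conn` were created as in the
# examples above:
#
#     with cge_session(gateway, my_conn) as conn:
#         conn.start()
#         if not wait_until_running(conn):
#             print(conn.status().toString())  # then check cge_runtime.log
```

The nested try/finally means the gateway is shut down even if stop() itself raises, which trades a possibly unstopped server for not leaving an orphan JVM process.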