CGE Error Messages and Resolution Information
Describes common CGE issues and provides information about troubleshooting.
The most common errors encountered while using CGE involve a failure to connect to a database server. Several different errors can occur, depending on exactly what goes wrong. The following table lists common error messages along with troubleshooting techniques.
| Error Message | Description | Resolution |
|---|---|---|
| Unable to establish a connection to the database server at host:port as it does not appear to be running | The CLI tried to connect to a database server running on the given host and port combination but was unable to establish a connection. This typically means one of two things: | |
| Unable to authenticate to the database server at host:port. You do not have any SSH keys present in your configured identity directory | The CLI tried to connect to a database server running on the given host and port combination. A connection was established successfully, but authentication to the database server failed because no SSH keys are configured. | Create at least one SSH key and place it in the appropriate directory. |
| Unable to authenticate to the database server at host:port. Your SSH key(s) from your configured identity directory are not in the authorized_keys file of the database or its owner | The CLI tried to connect to a database server running on the given host and port combination. A connection was established successfully, but authentication to the database server failed because none of the SSH keys were in the authorized_keys file that the database is using. This may also be caused by the CLI selecting the wrong SSH identity. As described in the SSH identities section, the CLI uses the first identity found by searching several default locations, which may not be the desired identity. | |
| Host key for host host:port is not trusted, please run in interactive mode and trust this key or manually add the host key to your known_hosts file in your configured identity directory | The CLI tried to connect to a database server running on the given host and port combination. A connection was established successfully, but the database server was unable to prove its identity to the CLI because the host key provided by the database server was not trusted. This error is usually seen only the first time a connection to a specific server instance is established. Once the key is trusted (see resolution steps), this error should no longer be seen for this host and port combination. | |
| Timed out attempting to establish a database connection (waited N seconds), database server may be too busy to service your request currently | The CLI tried to connect to a database server running on the given host and port combination but was unable to establish a connection within the timeout interval. This means that the database server is currently busy processing another request and cannot accept the request at this time. | |
| Server failed to start up | One or more of the CGE job steps failed to launch because CGE was not found. | Try relaunching CGE. In addition, ensure that all compute nodes are correctly configured. In particular, verify the following: |
| Not enough symmetric heap for new sorting keys | There is not enough symmetric heap for new sorting keys. | Use the -H option to cge-launch to set the symmetric heap to a larger value. As a starting point, try doubling the default value shown near the top of the log. The symmetric heap setting is an upper bound on a resource that is allocated as needed, so specifying a larger value than necessary does not mean that the full amount will be allocated. It only means that no more than this amount will be allocated. It is better to overestimate slightly than to underestimate. |
| [PE_64]:inet_listen_socket_setup:inet_setup_listen_socket: bind failed port 20219 listen_sock = 5 Address already in use | This may be due to leftover cge-server processes. | Follow the instructions documented in Terminate Orphaned cge-server Jobs. |
| Error: Timed out waiting for the server to start running | When a computational loop during a database build takes an extremely long time without producing any indication of forward progress (generally some kind of output in the log), cge-launch may decide that the startup sequence has hung and terminate it with this message. | Increase the interval used to detect a startup hang from its default setting of 900 seconds. If you know that a dataset is simply very computationally intensive to build and is prone to such timeouts, setting the timeout to 3600 seconds (one hour) is almost certain to eliminate this failure, at the cost of taking much longer to detect an actual startup hang. To change the interval, use the --startupTimeout=seconds option to cge-launch. |
| HTTP Errors are reported by a tool or API | A request submitted to the HTTP interface provided by the cge-cli fe command was not successful. If the request was submitted via a tool or API, only minimal error details may be reported directly to you. However, see the resolutions for ways to find more detailed error information. | |
| :inet_listen_socket_setup :inet_setup_listen_socket : bind failed port 1371 listen_sock = 5 Address already in use | A previous cge-launch or HPC/mrun job failed or was killed, and the inet_listen socket is likely in the TIME_WAIT state on one or more of the compute nodes. | Wait 60-90 seconds for the inet_listen_socket (port 1371) to leave the TIME_WAIT state. If the problem persists, the likely cause is that some other program has an active socket connection to port 1371 on one or more compute nodes. That application must release port 1371 on the affected nodes before new cge-launch or HPC/mrun jobs can run on those nodes. |
| User user does not have permission to perform operation operation | An action was requested for which the requesting user did not have the appropriate permissions. | |
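
Several of the errors in the table come down to whether a TCP port is reachable or already in use. The following is a minimal diagnostic sketch in Python, not part of CGE: the helper names can_connect and port_is_free are hypothetical, and the ports shown are simply the ones that appear in the example messages above. It assumes Python 3 is available on the node where the check is run. On Linux, a bind attempt typically fails with "Address already in use" both when another process holds the port and while earlier connections on that port linger in TIME_WAIT, so port_is_free cannot tell those two cases apart.

```python
import errno
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result is consistent with the "does not appear to be running"
    error: nothing is listening, or the host or port is wrong.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def port_is_free(port: int, host: str = "") -> bool:
    """Return True if a socket can be bound to the port on the local node.

    A False result is consistent with the "bind failed ... Address already
    in use" errors. The default host "" means all local interfaces.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # Anything else (for example, permissions) is unexpected.
        return True
```

For example, running port_is_free(1371) on an affected compute node after waiting 60-90 seconds gives a rough indication of whether the port has cleared, and can_connect(host, port) with the host and port from the error message indicates whether anything is accepting connections there. Neither check replaces the resolutions listed in the table.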