Lists the nuances of monitoring clusters.
Monitoring a Secure Cluster
- After regenerating the HPE Ezmeral Data Fabric user ticket, service failures occur for collectd and OpenTSDB
- If you delete or regenerate the HPE Ezmeral Data Fabric user ticket, the running collectd and OpenTSDB services fail. After updating the HPE Ezmeral Data Fabric user ticket, restart the collectd and OpenTSDB services, as in the example below.
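For example, assuming the maprcli command is available on the cluster, a minimal sketch of restarting both services looks like the following; node1.example.com is a placeholder, so substitute the nodes that actually run collectd and OpenTSDB.
# Restart the monitoring services after updating the user ticket.
# node1.example.com is a placeholder; list your collectd and OpenTSDB nodes.
maprcli node services -name collectd -action restart -nodes node1.example.com
maprcli node services -name opentsdb -action restart -nodes node1.example.com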
Monitoring Logs
- I notice a sudden increase in fluentd logs. What can I do?
- A sudden increase in the fluentd log file could mean that a feedback loop is occurring: fluentd logs an error about a fluentd issue, and that log entry causes yet another error when fluentd tries to parse it. In this case, consider disabling the indexing of fluentd logs, as sketched below. See Configure Logs to Index.
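For illustration only, the following hedged sketch shows the general shape of disabling a fluentd log source by commenting out its <source> block in fluentd.conf. The path, tag, and parser shown here are placeholders and will not match your file exactly; follow Configure Logs to Index for the supported steps.
# Hypothetical <source> block for fluentd's own log; commenting it out stops
# fluentd from shipping this log, so Elasticsearch no longer indexes it.
#<source>
#  @type tail
#  path /opt/mapr/fluentd/fluentd-<version>/log/fluentd.log
#  pos_file /opt/mapr/fluentd/fluentd-<version>/log/fluentd.log.pos
#  tag fluentd_log
#  <parse>
#    @type none
#  </parse>
#</source>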
- I see "400 - Rejected by Elasticsearch" messages in the fluentd logs. What can I
do?
- Messages such as the following can accumulate in the fluentd log when a process does not produce logs with valid UTF-8 output:
2019-04-25 17:00:11 -0700 [warn]: #0 dump an error event: error_class=Fluent::Plugin::ElasticsearchErrorHandler::ElasticsearchError error="400 - Rejected by Elasticsearch" location=nil
- In a message such as the following, you might see invalid characters represented as a diamond with a question mark (�). The "service_name":"collectd" part of the message indicates that collectd is generating the invalid UTF-8 output:
[2019-04-30T19:06:29,495][DEBUG][o.e.a.b.TransportShardBulkAction] [mfs73] [mapr_monitoring-2019.05.01][4] failed
to execute bulk item (index) index {[mapr_monitoring-2019.05.01][mapr_monitoringv1][taQkcWoBCeW3tMAsn1cW],
source[{"my_event_time":"2019-04-30 18:36:39","level":"info","message":"write_maprstreams plugin: Produced:
Offset: 1247132; Size: 152; [{\"metric\":\"mapr.streams.produce_msgs\",\"value\":448,\"tags\":{\"fqdn\":\"qa-node91.qa.lab\",\"clusterid\":\"6378079583755418855\",\"clustername\":\"my.cluster.com\"}}]
�\n","@timestamp":"2019-04-30T18:36:39.000000000-07:00","service_name":"collectd"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [message] of type [text]
Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x5c
- One workaround is to comment out the log producing the invalid character. You can do this in the fluentd.conf file, as sketched below. For more information, see Configure Logs to Index.
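The mechanism is the same as in the earlier fluentd.conf sketch: comment out the <source> block that tails the offending log. The path and tag below are placeholders for whatever log is producing the invalid characters (in this scenario, a collectd log), not the actual contents of your fluentd.conf.
# Hypothetical block only; the real entry in your fluentd.conf will differ.
#<source>
#  @type tail
#  path /opt/mapr/collectd/collectd-<version>/var/log/collectd/collectd.log
#  tag collectd_log
#</source>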
- Another workaround is to fix the application that produces the error message. If the log file comes from an application that you control, change its output so that it no longer produces the invalid character.
Monitoring Metrics
- Where should I store the Elasticsearch index?
- Elasticsearch requires a lot of disk space. Also, when you upgrade Elasticsearch, the default index directory is removed along with the package update. Therefore, it is recommended to configure a separate filesystem for the index data rather than storing it under the / or /var filesystems; one way to do this is sketched after this answer. Note: If you store the Elasticsearch index on a locally hosted filesystem, you can still access logs even when the HPE Ezmeral Data Fabric cluster is not available.
- For more information about the Elasticsearch index and the default index directory, see Log Aggregation and Storage.
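As a hedged sketch of one way to keep the index off the / and /var filesystems: mount a dedicated filesystem and point the Elasticsearch data path at it through the path.data setting in elasticsearch.yml. The device name and mount point below are placeholders, and your installation may manage this path through its own configuration scripts instead.
# Placeholder device and mount point for a dedicated index filesystem.
mkfs.xfs /dev/sdd1
mkdir -p /es-index
mount /dev/sdd1 /es-index

# In elasticsearch.yml, point the data path at the dedicated filesystem.
path.data: /es-index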
- I see a "Bad Request" error message for my HPE Ezmeral Data Fabric Database metrics? What can I do?
- If you have more than 1000 active tables in HPE Ezmeral Data Fabric Database and the HPE Ezmeral Data Fabric monitoring request size to OpenTSDB is more than 4 KB, you may see the following error message:
"Sorry but your request was rejected as being invalid. The reason provided was: Chunked request not supported."
You can increase the maximum request size of OpenTSDB to 64 KB by setting the following parameters in the opentsdb.conf file (a quick way to verify the change is shown at the end of this section):
tsd.http.request.enable_chunked=true
tsd.http.request.max_chunk=65536
- For more information, see the OpenTSDB configuration guide.
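As a quick check (not part of the configuration guide), you can restart OpenTSDB so the new settings are loaded and then read back the running configuration from OpenTSDB's /api/config endpoint. The node name below is a placeholder, and port 4242 is the OpenTSDB default; adjust both for your cluster.
# Restart OpenTSDB, then confirm the chunked-request settings are active.
maprcli node services -name opentsdb -action restart -nodes node1.example.com
curl -s http://node1.example.com:4242/api/config | grep chunk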