Using HPE Ezmeral Data Fabric Monitoring (Spyglass Initiative)

HPE Ezmeral Data Fabric Monitoring (part of the Spyglass initiative) provides the ability to collect, store, and view metrics and logs for nodes, services, and jobs/applications.

Metric Monitoring

Administrators can use dashboards to monitor the current status of the cluster and anticipate future cluster requirements. For example, you can use metrics dashboards to visualize the following:

Storage Utilization: Use metrics dashboards to monitor storage trends. For example, you can compare filesystem usage at different points in time against filesystem capacity and then allocate resources to the filesystem accordingly.
Node Utilization: Use metrics dashboards to check for node overload. For example, if CPU usage is high on only a few nodes, you may want to distribute the load across more nodes for better performance and efficiency.
HPE Ezmeral Data Fabric Database Operational Trends: Use metrics dashboards to display historical trends for HPE Ezmeral Data Fabric Database operations. For example, if a user reports that HPE Ezmeral Data Fabric Database is slow, you can use the historical trends for row scan, get, and put operations to identify the node or nodes on which performance degrades.
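The storage and node checks described above come down to simple arithmetic once you have the numbers a dashboard displays. The following is a minimal sketch, assuming you have exported filesystem usage samples as (hours, used) pairs in a consistent unit, plus per-node CPU percentages. The function names, the data layout, and the 85% threshold are hypothetical illustrations, not part of the HPE Ezmeral Data Fabric monitoring API. The first function fits a least-squares line to the usage samples and estimates how long until usage reaches capacity. The second flags nodes whose CPU usage exceeds a threshold.

```python
from statistics import mean


def projected_hours_to_full(samples, capacity):
    """Estimate hours until filesystem usage reaches capacity.

    samples: list of (hours, used) pairs; 'used' and 'capacity' must share a unit.
    This layout is a hypothetical stand-in for exported dashboard data.
    Returns None when the fitted trend is flat or shrinking.
    """
    times = [t for t, _ in samples]
    t_bar = mean(times)
    u_bar = mean(u for _, u in samples)
    den = sum((t - t_bar) ** 2 for t in times)
    if den == 0:
        raise ValueError("need samples at two or more distinct times")
    # Least-squares slope: growth in 'used' per hour.
    slope = sum((t - t_bar) * (u - u_bar) for t, u in samples) / den
    if slope <= 0:
        return None
    intercept = u_bar - slope * t_bar
    # Solve capacity = intercept + slope * t, then measure from the latest sample.
    t_full = (capacity - intercept) / slope
    return max(t_full - max(times), 0.0)


def overloaded_nodes(cpu_by_node, threshold_pct=85.0):
    """Return node names whose CPU usage (percent) exceeds a hypothetical threshold."""
    return sorted(node for node, pct in cpu_by_node.items() if pct > threshold_pct)
```

For example, usage samples of 400, 450, and 500 (GB) taken 24 hours apart against a 1000 GB capacity yield a projection of roughly 240 hours (about 10 days) of headroom. A least-squares fit is used rather than comparing only the first and last samples, so a single noisy reading has less influence on the estimate.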

Log Monitoring

Administrators can use dashboards to visualize, search, and review logs when troubleshooting issues. For example, you can use log dashboards to troubleshoot the following issues:

Service Failures: When metrics indicate that one or more services are down, use log dashboards to check the logs for each failed service and drill down to each associated node.
Application Failures: When an application or job fails, use log dashboards to identify possible bottlenecks. For example, you can search the logs for a given application ID across all nodes in the cluster.
Filesystem Performance: When users experience slowness in the filesystem or in NFS for the HPE Ezmeral Data Fabric, use log dashboards to search the HPE Ezmeral Data Fabric filesystem logs for service errors or application issues.
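In practice, the log dashboards run a cluster-wide search for you against the centrally collected logs. The sketch below only illustrates the logic of searching for one application ID across all nodes. It assumes that logs have already been gathered into a hypothetical in-memory mapping from node name to log lines. The function name and data layout are illustrative assumptions and are not part of any HPE API.

```python
import re
from collections import defaultdict


def search_app_logs(logs_by_node, app_id):
    """Return {node: [matching lines]} for every log line that mentions app_id.

    logs_by_node maps a node name to an iterable of log lines. This layout is a
    hypothetical stand-in for the log store behind the dashboards.
    """
    # Word boundaries keep an ID such as "..._0001" from matching "..._00012".
    pattern = re.compile(r"\b" + re.escape(app_id) + r"\b")
    hits = defaultdict(list)
    for node, lines in logs_by_node.items():
        for line in lines:
            if pattern.search(line):
                hits[node].append(line)
    # Nodes with no matching lines are omitted from the result.
    return dict(hits)
```

Grouping matches by node shows at a glance which nodes the failed application ran on. It also shows where its errors cluster, which is the starting point for the drill-down described above.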