System Monitoring

An introduction to the system monitoring procedures available on the Health tab.

Sonexion System Manager's monitoring solution is built on Icinga, open-source software that is proven for monitoring the health and performance of large networks of Linux servers. Icinga checks user-specified hosts and services, and sends notifications when problems occur and when they are resolved.
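
The check-and-notify behavior described above can be illustrated with a minimal sketch. This is not Icinga's or Sonexion System Manager's actual code. The function names, the thresholds, and the sample values are hypothetical, and the notification logic is deliberately simplified to report only a transition out of OK (problem) and a return to OK (recovery). The numeric state values follow the common monitoring-plugin convention of 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN.

```python
# Illustrative sketch only; names and thresholds are hypothetical.
from enum import IntEnum


class State(IntEnum):
    # Conventional monitoring-plugin return codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def evaluate(value, warn, crit):
    """Map a metric value to a state using warning and critical thresholds."""
    if value is None:
        return State.UNKNOWN
    if value >= crit:
        return State.CRITICAL
    if value >= warn:
        return State.WARNING
    return State.OK


def notifications(states):
    """Return a message for each transition out of OK and each return to OK."""
    previous = State.OK
    messages = []
    for current in states:
        if previous == State.OK and current != State.OK:
            messages.append(f"PROBLEM: {current.name}")
        elif previous != State.OK and current == State.OK:
            messages.append("RECOVERY: OK")
        previous = current
    return messages


# Hypothetical disk-usage percentages from four consecutive checks.
samples = [40, 85, 97, 50]
states = [evaluate(v, warn=80, crit=95) for v in samples]
messages = notifications(states)
for message in messages:
    print(message)
```

In this sketch, the second check raises a problem notification and the fourth check raises a recovery notification. The change from WARNING to CRITICAL on the third check is not reported, which is a simplification made for brevity.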

System monitoring is provided on the Sonexion System Manager Health tab. For more information about the Health tab, please see the Health Tab topic.

The remainder of this section focuses on the procedures available to view system status, view performance data, and generate reports.

Cray® View for ClusterStor™

In addition to the monitoring capabilities provided through the CSM GUI, Cray offers the View for ClusterStor application. View for ClusterStor is a monitoring and metrics software package that collects and persists Lustre performance data, job metrics, and system events specific to ClusterStor storage systems. Additionally, View for ClusterStor can be configured to monitor system logs, system metrics, and system events from each ClusterStor storage system. If it is connected to the ClusterStor high-speed InfiniBand network, it also collects and persists ibstats metrics from the InfiniBand fabric.

Cray View for ClusterStor includes:
  • Job Runtime Variability: Real-time and historical views of data to help administrators understand what is impacting user jobs
  • Event Correlation: A unified view of the system, providing administrators with the ability to correlate systemic events that impact performance
  • Trend Analysis: Data-driven analysis and visualization from historical data can help identify trends that can then be used to shape changes to the system
  • Alerting: A threshold engine enables customized alerts based on any metric

The View for ClusterStor interface and visualization capabilities provide always-on insight into what applications are doing, without impacting performance. With detailed visibility into how applications use resources, administrators can identify and make effective optimizations.

For more information about View for ClusterStor, please contact your organization's Cray account manager.