InfiniBand Features
Describes the InfiniBand topology baseline, error and performance metrics, and sample rate modification.
InfiniBand (IB) performance, configuration, and errors are collected for any attached InfiniBand networks.
IB Topology Baseline
When the monitoring system is first initialized, it detects the InfiniBand configuration and saves it on the View for ClusterStor™ server as the IB topology baseline. Each subsequent discovery is compared against this baseline to determine whether the topology has changed.
When a topology change is detected:
- A topology change event is logged in the View for ClusterStor logs and in the metric database. The log details of the topology changes can be retrieved via the IB Page in Kibana.
- The topology change is posted to the time-series database and an annotation is created.
- Alarms are triggered and notifications are sent to the configured email address.
The saved IB baseline configuration can be updated using the procedure described in Reset InfiniBand Topology Baseline.
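The baseline comparison described above can be pictured as a set difference between the saved links and the links found in the latest discovery. The following is only an illustrative sketch, not the product's implementation: the `Link` type, the `diff_topology` function, and the port names are hypothetical.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One cable between two ports (hypothetical representation)."""
    src: str  # for example "switch-a:1"
    dst: str  # for example "node-01:1"


def diff_topology(baseline: set[Link], current: set[Link]) -> dict[str, list[Link]]:
    """Return the links that appeared or disappeared relative to the baseline."""
    return {
        "added": sorted(current - baseline),
        "removed": sorted(baseline - current),
    }


# Hypothetical baseline saved at initialization and a later discovery result.
baseline = {Link("switch-a:1", "node-01:1"), Link("switch-a:2", "node-02:1")}
current = {Link("switch-a:1", "node-01:1"), Link("switch-a:3", "node-02:1")}

changes = diff_topology(baseline, current)
# Any added or removed link counts as a topology change in this sketch.
topology_changed = bool(changes["added"] or changes["removed"])
print(topology_changed, changes)
```

In this sketch, a non-empty difference corresponds to the point at which a topology change event would be logged and an alarm raised.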
Error and Performance Metrics
In addition to collecting and storing the IB topology baseline, error and performance metrics are gathered. For a list of the InfiniBand metrics gathered, see InfiniBand Metrics. The error metrics are gathered every 60 seconds. The extended metrics for transmits and receives are gathered every 30 seconds.
Modify Sample Rates
The sample rates for the following classes of metrics are set in the /etc/sma-data/etc/.env file:
- Extended performance metrics
- Error metrics
- IB topology updates
EXTENDED_METRICS_INTERVAL=30
ERROR_METRICS_INTERVAL=60
IBTOPOLOGY_INTERVAL=900
Note that the collector agent polls on a 15-second interval, which places a lower limit on the sample intervals of the three classes of metrics.
Changes made to the /etc/sma-data/etc/.env file do not take effect until View for ClusterStor is restarted.
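Because a restart is needed before new values apply, it can help to check the three intervals against the 15-second polling floor first. The following is a minimal sketch, assuming the file holds plain KEY=VALUE lines with values in seconds; `parse_env` and `check_intervals` are hypothetical helpers, not part of View for ClusterStor.

```python
POLL_INTERVAL_S = 15  # collector agent polling interval stated in the text

INTERVAL_KEYS = (
    "EXTENDED_METRICS_INTERVAL",
    "ERROR_METRICS_INTERVAL",
    "IBTOPOLOGY_INTERVAL",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_intervals(settings: dict[str, str]) -> list[str]:
    """Return one message per interval that is non-integer or below the poll interval."""
    problems = []
    for key in INTERVAL_KEYS:
        raw = settings.get(key)
        if raw is None:
            # Assumption: an absent key is left alone rather than reported.
            continue
        try:
            value = int(raw)
        except ValueError:
            problems.append(f"{key}={raw!r} is not an integer")
            continue
        if value < POLL_INTERVAL_S:
            problems.append(
                f"{key}={value} is below the {POLL_INTERVAL_S}-second poll interval"
            )
    return problems


# Sample content matching the values shown above.
sample = """\
EXTENDED_METRICS_INTERVAL=30
ERROR_METRICS_INTERVAL=60
IBTOPOLOGY_INTERVAL=900
"""
print(check_intervals(parse_env(sample)))  # prints []
```

To check the real file, the sample string could be replaced with the contents of /etc/sma-data/etc/.env read from disk.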