Monitor the Health of PCIe Channels
About the xtpcimon command.
Processors are connected to the high-speed interconnect network (HSN) ASIC through PCIe channels.
The xtpcimon command is executed from the System Management Workstation (SMW); it starts and runs during the boot process.
PCIe-related errors are reported to stdout unless the output is redirected to a log file.
xtpcimon also displays CLE-originated GHAL-based Advanced Error Reporting (AER) errors for PCIe.
If the optional /opt/cray/hss/default/etc/xtpcimon.ini initialization file is present, the xtpcimon command uses the settings provided in the file.
For more information, see the xtpcimon(8) man page.
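Redirecting the xtpcimon output to a log file, as mentioned above, can be sketched in Python as follows. This is a minimal illustration, not part of the Cray tooling: the helper name run_and_log and the log path in the usage comment are hypothetical, and no xtpcimon options are assumed. The call blocks until the command exits, so a long-running monitor keeps the helper waiting for as long as it runs.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, appending its stdout and stderr to log_path.

    Returns the exit code of the command. The call blocks until the
    command terminates.
    """
    with open(log_path, "a") as log:
        completed = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


# Hypothetical usage on the SMW (the log path is an assumption):
#   run_and_log(["xtpcimon"], "/tmp/xtpcimon.log")
```

The file is opened in append mode so that repeated runs add to the log instead of overwriting it.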
Report PCIe-related errors to stdout
crayadm@smw> xtpcimon
starting
----> connection to event router made
121017 04:57:01 ############# ################# ##################
121017 04:57:01 Node Category Description
121017 04:57:01 ############# ################# ##################
Received all responses to request to start monitoring
121017 04:58:01 c0-0c0s7a0n1 CorrectableMemErr 0:0:0 AER Correctable: Non-fatal \
error (mask bit: 1)
121008 05:42:00 c0-0c1s6a0n2 CorrectableMemErr Link CRC error (cnt: 3)
121008 05:43:30 c0-0c1s6a0n2 Info Correctable/CRC error
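Output in the form shown above can be post-processed with a short script. The following Python sketch is an illustration only: it assumes each error line has the layout "date time node category description" seen in the sample, that node names begin with a cabinet prefix such as c0-0, and that a trailing backslash marks a wrapped line. The names PcieEvent, join_continuations, and parse_events are hypothetical and are not part of xtpcimon.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class PcieEvent(NamedTuple):
    date: str         # as printed, for example "121017"
    time: str         # as printed, for example "04:58:01"
    node: str         # component name, for example "c0-0c0s7a0n1"
    category: str     # for example "CorrectableMemErr" or "Info"
    description: str  # remainder of the line


# Assumed layout: date, time, node name, category, free-form description.
_EVENT_RE = re.compile(
    r"^(?P<date>\d{6})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<node>c\d+-\d+\S*)\s+(?P<category>\S+)\s+(?P<description>.+)$"
)


def join_continuations(lines: Iterable[str]) -> Iterator[str]:
    """Merge lines that end with a backslash with the line that follows."""
    pending = ""
    for raw in lines:
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "
            continue
        yield pending + line
        pending = ""
    if pending:
        yield pending.strip()


def parse_events(lines: Iterable[str]) -> Iterator[PcieEvent]:
    """Yield one PcieEvent per error line; skip banners and headers."""
    for line in join_continuations(lines):
        match = _EVENT_RE.match(line)
        if match:
            yield PcieEvent(**match.groupdict())
```

Status messages, the "#" banner rows, and the column header row do not contain a node name in the assumed form, so the parser skips them. The resulting events can then be grouped, for example by node, to see which components report repeated errors.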
For system event rules that use the Cray Simple Event Correlator (SEC) and the related check_xt utility, see the XC Series SEC and check_xt Guide (S-2542).