Examine System Logs

Use the system logs to troubleshoot boot problems.

Various logs receive entries during the boot process that can indicate boot problems.

systemd Journal

The systemd init system takes over the boot process after initrd. systemd stores messages in a custom database, the systemd journal. Use journalctl -a to display all kernel messages and other information in the systemd journal, with every field shown in full. Use journalctl -f to display the most recent journal entries and continuously print new entries as they arrive. The information available in the journal includes:
  • syslogd messages
  • Kernel log messages
  • initrd messages
  • Messages written to stdout/stderr for all services
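As a minimal sketch, the journalctl invocations described above can be wrapped in a small Python helper. The function names and parameters here are hypothetical and are not part of any Cray tool. The -k (kernel messages only), -b (current boot only), and --no-pager options are standard journalctl options that this section does not otherwise mention.

```python
import subprocess


def build_journalctl_cmd(follow=False, kernel_only=False, current_boot=False):
    """Return a journalctl argument list (hypothetical helper for illustration)."""
    cmd = ["journalctl", "-a", "--no-pager"]  # -a: show all fields in full
    if kernel_only:
        cmd.append("-k")  # kernel messages only
    if current_boot:
        cmd.append("-b")  # restrict output to the current boot
    if follow:
        cmd.append("-f")  # show recent entries, then keep printing new ones
    return cmd


def run_journalctl(kernel_only=False, current_boot=False):
    """Run journalctl once and return its output as text.

    Requires a host running systemd. Follow mode is deliberately not
    offered here because it never exits on its own.
    """
    result = subprocess.run(
        build_journalctl_cmd(kernel_only=kernel_only, current_boot=current_boot),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_journalctl(kernel_only=True) would return the kernel messages for the current boot, which is often the first place to look when a node hangs early in the boot process.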

HSS Daemon Logs

The HSS daemons and the rsyslogd daemon running on the SMW log to files in the /var/opt/cray/log directory. These daemons include nimsd, xtpmd, xtremoted, xtpowerd, xtsnmpd, xtdiagd, erfsd, state_manager, bootmanager, sedc_manager, nid_mgr, erdh, and erd.
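Because many daemons write to the same directory, it can help to see which files were written most recently. The following is a minimal sketch under that assumption. The helper name newest_logs is hypothetical, and it makes no assumption about how individual daemon log files are named.

```python
from pathlib import Path

LOG_DIR = Path("/var/opt/cray/log")  # directory named in the text above


def newest_logs(log_dir=LOG_DIR, limit=10):
    """Return up to `limit` regular files in log_dir, most recently modified first."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

Calling newest_logs() on the SMW shortly after a failed boot would list the files most likely to contain relevant entries. Subdirectories are skipped.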

SMW Command Log

The /var/opt/cray/log/commands log lists the commands issued from the SMW console.

CLE Boot Logs

The output from booting CLE is logged under /var/opt/cray/log/p0-current. For more detailed information, go to the p0-current directory and examine these log files:
  • bootinfo.timestamp

    Contains output from the xtbootsys command. Timing information for how long sections of the boot process take is listed at the bottom of this file.

  • console-YYYYMMDD

    Contains the combined console output from every node. To find Ansible failures for a node during init, search for cray-ansible: serial start of play type cle FAILED in init phase
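The console log search described above can be sketched in Python as follows. The helper name find_ansible_failures is hypothetical. The search string is copied from the text above and is assumed to appear verbatim in the console log, so it is exposed as a parameter in case the actual message differs.

```python
# Search string quoted from the text above (assumed to match the log verbatim).
ANSIBLE_FAILURE = "cray-ansible: serial start of play type cle FAILED in init phase"


def find_ansible_failures(console_log, needle=ANSIBLE_FAILURE):
    """Return (line_number, line) pairs for console lines that contain needle."""
    hits = []
    # errors="replace" keeps the scan going if the console output has stray bytes.
    with open(console_log, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, pass the path of a console-YYYYMMDD file in the p0-current directory, substituting the actual date. Because the console log combines output from every node, the matching lines should show which nodes reported the failure.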