Debug Ansible Failures During System Boot

Basic information about the debugging options available when Ansible fails during system boot.

Ansible runs twice during boot: first in init, and again after systemd completes the boot process. If Ansible fails in init, the affected node drops into a debug shell, which can be accessed via xtcon for troubleshooting. When the debug shell is exited, Ansible is re-executed in init. A node's boot does not proceed until the first run of cray-ansible in init succeeds.

The Ansible callback plugin captures any file changes made by Ansible file modules and stores a record of these changes in log files located at /var/opt/cray/log/ansible/changelog. The plugin provides detailed failure information, including the path to the task file being executed and any config set variable references in the task file.
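As a starting point for inspecting these records, the following minimal Python sketch lists the files under the changelog directory, newest first. Only the directory path comes from the text above; the helper name list_changelogs is hypothetical, and the sketch makes no assumption about the names or internal format of the changelog files.

```python
from pathlib import Path

# Directory named in the text above. It is a parameter default so the helper
# can also be pointed at a copy of the logs collected elsewhere.
CHANGELOG_DIR = "/var/opt/cray/log/ansible/changelog"


def list_changelogs(directory=CHANGELOG_DIR):
    """Return the regular files under `directory`, newest first.

    Hypothetical helper for illustration; it only looks at modification
    times and does not parse the changelog contents.
    """
    root = Path(directory)
    if not root.is_dir():
        # Nothing to list, for example on a system without these logs.
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in list_changelogs():
        print(path)
```

Run on a node, or against a copy of the log directory, the most recently written changelog appears first, which is usually the one tied to the latest Ansible run.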

Ansible logs under /var/opt/cray/log/ansible are collected via cdump and xtdumpsys. In addition, xtdumpsys collects, from running nodes, the files changed by Ansible according to the changelog callback plugin. When possible, Cray-provided Ansible plays create a backup of the files modified by a play so that the administrator can diff these files to see the changes made by Ansible. Administrators can use the ansible_cfg_search command to examine an image and a config set. This command outputs a list of variables and the Ansible files that accessed each variable.
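The comparison of a backup against the modified file can be sketched with Python's standard difflib module. This is an illustration only: the helper name diff_files is hypothetical, and because the naming and location of the backup files are not described here, both paths must be supplied by the administrator.

```python
import difflib
import sys
from pathlib import Path


def diff_files(backup_path, current_path):
    """Return a unified diff from the backup to the current file.

    Hypothetical helper for illustration. Both paths are supplied by the
    caller; no backup naming convention is assumed.
    """
    old = Path(backup_path).read_text(errors="replace").splitlines(keepends=True)
    new = Path(current_path).read_text(errors="replace").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old, new, fromfile=str(backup_path), tofile=str(current_path)
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_backup.py <backup file> <current file>
    sys.stdout.write(diff_files(sys.argv[1], sys.argv[2]))
```

An empty result means the two files are identical; otherwise lines removed by the play are prefixed with "-" and lines it added are prefixed with "+".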

See also XC™ Series Boot Troubleshooting Guide (S-2565).