Check File Systems Using e2fsck Command
Assemble disk arrays and check the consistency of file systems using the e2fsck command
A Lustre target device may have its file system state flag marked "clean with errors". This prevents the corresponding fsys resource from starting until e2fsck has been run to verify and clean the device; in that case, crm_mon reports an fsys start problem ("failed action not configured").

Use the following procedure to run e2fsck on an individual MDT or OST device. The steps in the procedure describe running e2fsck on an OST. Note that if e2fsck needs to be run on the full file system, Lustre must first be unmounted. For a system with NXD enabled (generally L300N), disable NXD before unmounting Lustre:

[admin@cls12345n000]$ cscli nxd disable
To unmount Lustre, run the following command from the active MGMT node:
[admin@cls12345n000]$ cscli unmount -f fs_name
- Log in via SSH to the OSS node containing the devices where the e2fsck command is to be run. The following example assumes the OSS node is cls12345n004.
[root@cls12345n000]# ssh cls12345n004
- Assemble the RAID device if the target disk array is not already assembled.
- Run crm_mon -1r to determine the corresponding raid-group name from the command output.
- If crm_mon -1r showed that the RAID device is not assembled on either node of the HA pair, assemble the RAID device using the raid-group name determined from the crm_mon -1r command output:

[root@cls12345n000]# mdraid-activate -d cls12345n004_md0-group
The -d option to mdraid-activate starts the mdraid device only; the fsys start will not be attempted.

Following is sample output from the assembly of a GridRAID md0 device on cls12345n004:

[root@cls12345n004 ~]# mdraid-activate -d cls12345n004_md0-group
info try: mdadm --assemble --name=cls12345n004:md128 --config=/var/lib/mdraidscripts/mdadm.conf --run /dev/md/cls12345n004:md128 /dev/disk/by-id/wwn-0x5000c50030128fc7-part1 /dev/disk/by-id/wwn-0x5000c5003012639b-part1
debug raid cls12345n004:md128 started using 1 tries, now waiting for device node to show up on Wed Dec 5 13:46:03 CST 2018
info try: mdadm --assemble --name=cls12345n004:md129 --config=/var/lib/mdraidscripts/mdadm.conf --run /dev/md/cls12345n004:md129 /dev/disk/by-id/wwn-0x5000c50030128fc7-part2 /dev/disk/by-id/wwn-0x5000c5003012639b-part2
debug raid cls12345n004:md129 started using 1 tries, now waiting for device node to show up on Wed Dec 5 13:46:04 CST 2018
info balance targets on HBA: 0000:11:00.0=14 0000:10:00.0=27
info try: mdadm --assemble --name=cls12345n004:md0 --config=/var/lib/mdraidscripts/mdadm.conf --bitmap=/WIBS/cls12345n004:md0/WIB_cls12345n004:md0 --run /dev/md/cls12345n004:md0
  /dev/disk/by-id/wwn-0x5000c50085097963 /dev/disk/by-id/wwn-0x5000c5006246b563 /dev/disk/by-id/wwn-0x5000c5006246b47f /dev/disk/by-id/wwn-0x5000c5006246ad9b
  /dev/disk/by-id/wwn-0x5000c5006246a623 /dev/disk/by-id/wwn-0x5000c50062469bff /dev/disk/by-id/wwn-0x5000c500593e29e7 /dev/disk/by-id/wwn-0x5000c500593e1f97
  /dev/disk/by-id/wwn-0x5000c500593dfb27 /dev/disk/by-id/wwn-0x5000c500593dc5ab /dev/disk/by-id/wwn-0x5000c500593dc253 /dev/disk/by-id/wwn-0x5000c500593da2d7
  /dev/disk/by-id/wwn-0x5000c500593da03f /dev/disk/by-id/wwn-0x5000c500593d998f /dev/disk/by-id/wwn-0x5000c500593920cf /dev/disk/by-id/wwn-0x5000c500593919f3
  /dev/disk/by-id/wwn-0x5000c50059390f1f /dev/disk/by-id/wwn-0x5000c50059390e67 /dev/disk/by-id/wwn-0x5000c50059390e1f /dev/disk/by-id/wwn-0x5000c50059384067
  /dev/disk/by-id/wwn-0x5000c5005937f913 /dev/disk/by-id/wwn-0x5000c500591f737f /dev/disk/by-id/wwn-0x5000c500591f6863 /dev/disk/by-id/wwn-0x5000c500591f3caf
  /dev/disk/by-id/wwn-0x5000c500591f2ae3 /dev/disk/by-id/wwn-0x5000c500591cb3af /dev/disk/by-id/wwn-0x5000c500591ac853 /dev/disk/by-id/wwn-0x5000c500591ab947
  /dev/disk/by-id/wwn-0x5000c500591ab7eb /dev/disk/by-id/wwn-0x5000c500591ab66f /dev/disk/by-id/wwn-0x5000c5005912934f /dev/disk/by-id/wwn-0x5000c500590864df
  /dev/disk/by-id/wwn-0x5000c50059086283 /dev/disk/by-id/wwn-0x5000c500590855ab /dev/disk/by-id/wwn-0x5000c500590852e3 /dev/disk/by-id/wwn-0x5000c50059083303
  /dev/disk/by-id/wwn-0x5000c50059082b67 /dev/disk/by-id/wwn-0x5000c50058e01ab7 /dev/disk/by-id/wwn-0x5000c50058df8baf /dev/disk/by-id/wwn-0x5000c50058df39c3
  /dev/disk/by-id/wwn-0x5000c50058bb12a7
debug raid cls12345n004:md0 started using 1 tries, now waiting for device node to show up on Wed Dec 5 13:46:06 CST 2018
assembled cls12345n004:md0 in 1 tries
In the above output, the OST device named md0 is the device of interest, shown in this line:

assembled cls12345n004:md0 in 1 tries
The other mdraid devices are journaling and backup devices.
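The target name can also be picked out of saved mdraid-activate output programmatically. A minimal sketch, assuming the output has been captured to a shell variable or file (the sample text below is abbreviated from the output above):

```shell
# Minimal sketch: extract the assembled target name from captured
# mdraid-activate output. Capturing the output, and the abbreviated
# sample text, are assumptions for illustration.
log='debug raid cls12345n004:md129 started using 1 tries
assembled cls12345n004:md0 in 1 tries'

# The second field of the "assembled ..." line is the target name.
target=$(printf '%s\n' "$log" | awk '/^assembled/ {print $2}')
echo "$target"
```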
- Confirm that the device is not mounted by running the following commands on the node where e2fsck is to be run and on the HA partner node.
[root@cls12345n004 ~]# mount -t lustre | grep md0
[root@cls12345n004 ~]# ssh cls12345n005 mount -t lustre | grep md0
The commands show no output if the Lustre target is not mounted.

Note: For an L300N system, if the device to be checked is an OST, the mounted device paths are /dev/mapper/... rather than /dev/dm-*. The command ls -l /dev/mapper shows the short dm-* paths corresponding to the long /dev/mapper/nytroxd-md-* paths.

- Unmount Lustre on the desired device, if needed (per the following caveat). When running e2fsck, the device to be checked must be assembled, but Lustre must not be mounted. If steps 2 and 3 indicate that the array is assembled and Lustre is mounted, use commands similar to the following to unmount Lustre on that device only:
[root@cls12345n004 ~]# crm_resource stop cls12345n004_md0-stop
[root@cls12345n004 ~]# crm_resource stop cls12345n004_md0-fsys
- Run the e2fsck command. It is best to begin with a read-only e2fsck, to assess the state of the device, before allowing e2fsck to make repairs.
[root@cls12345n004]# e2fsck -nvf -tt /dev/md0 > /tmp/e2fsck_n.n04.md0.`date +%Y%m%d%H%M` 2>&1
Note: For an L300N system, if the device to be checked is an OST, the device path used for e2fsck is /dev/dm-X rather than /dev/mdX. The /dev/dm-X device name is not guaranteed to be persistent; for example, /dev/dm-0 on n004 will not always be OST0. Use the e2label command (e2label /dev/dm-X) to verify that the correct /dev/dm-X device is used.

- Examine the output from the e2fsck command. Contact Cray/HPE Service for assistance if any serious problems are reported or if it is not clear how to interpret the e2fsck output. If there are no serious problems that require additional analysis, run e2fsck with the -p 'preen' option. This will automatically repair problems deemed "safe" to correct without additional assistance.
[root@cls12345n004]# e2fsck -pvf -tt /dev/md0
After any necessary repairs have been made to the Lustre target device, proceed with the next step.
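The e2fsck exit status is a bitmask (documented in e2fsck(8)) and can help decide whether a run left uncorrected problems behind. A minimal sketch of decoding it; the helper function name and the sample status value are illustrative, and in practice "$?" captured immediately after the e2fsck run would be passed in:

```shell
# Minimal sketch: decode an e2fsck exit status per e2fsck(8):
#   1 = errors corrected, 2 = reboot recommended,
#   4 = errors left uncorrected, 8 = operational error.
# The function name and sample status are illustrative only.
describe_e2fsck_status() {
  local st=$1 msg=""
  [ $((st & 1)) -ne 0 ] && msg="$msg errors-corrected"
  [ $((st & 2)) -ne 0 ] && msg="$msg reboot-recommended"
  [ $((st & 4)) -ne 0 ] && msg="$msg errors-left-uncorrected"
  [ $((st & 8)) -ne 0 ] && msg="$msg operational-error"
  [ "$st" -eq 0 ] && msg=" no-errors"
  echo "$msg"
}

# Sample: status 5 = 1 (corrected) + 4 (uncorrected remain).
result=$(describe_e2fsck_status 5)
echo "$result"
```

A nonzero 4 bit after the preen pass is a signal to involve Cray/HPE Service rather than proceed.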
- Remount Lustre on the desired device, if needed (per the following caveat). If crm_resource was used to unmount Lustre from the array (see step 4), use commands similar to the following to remount Lustre. (If not, skip to the next step.)
[root@cls12345n004 ~]# crm_resource start cls12345n004_md0-stop
[root@cls12345n004 ~]# crm_resource start cls12345n004_md0-fsys
If this step was needed and was run, this procedure is complete; do not proceed to the next steps.

- Deactivate the RAID device to clean up and return the system to a state ready to mount for normal Lustre operation:
[root@cls12345n004]# mdraid-deactivate cls12345n004_md0-group
- If crm_mon -1r shows a failed action for the fsys start, run clean_xyraid to clear the failed action:
[root@cls12345n004]# clean_xyraid cls12345n004_md0-group
- Restart the device or, if the entire Lustre file system was shut down before running e2fsck, restart the file system.

To restart the device:
[root@cls12345n004]# start_xyraid cls12345n004_md0-group
To restart Lustre:

[admin@cls12345n000]$ cscli mount -f fs_name
Note: For an L300N system, re-enable NXD if applicable:

[admin@cls12345n000]$ cscli nxd enable
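The per-OST flow above can be sketched as a dry-run script. The node, group, and device names are examples, and run() only echoes each command so the sequence can be reviewed before the real steps are executed by hand:

```shell
# Hypothetical dry-run sketch of the per-OST e2fsck flow above.
# All names are examples; nothing is actually executed.
node=cls12345n004
group=${node}_md0-group
dev=/dev/md0

run() { echo "would run: $*"; }

run mdraid-activate -d "$group"   # assemble the array
run e2fsck -nvf -tt "$dev"        # read-only assessment pass
run e2fsck -pvf -tt "$dev"        # preen pass, after reviewing output
run mdraid-deactivate "$group"    # clean up
run start_xyraid "$group"         # return the target to service
```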