Describes how you can use the gfsck command, under the supervision of
HPE Ezmeral Data Fabric Support or
Engineering, to perform consistency checks and appropriate repairs on a volume or a volume
snapshot.
You can use the gfsck command when the local fsck either
repairs or loses some containers at the highest epoch.
For an overview of using the gfsck command, see Using Global File System Checking.
Although you need to be the root user to run this command, checking tiering-enabled volumes requires you to be the mapr user.
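The user requirement above can be expressed as a small guard in a wrapper script. This is a minimal sketch, not part of gfsck itself; the `tiered` argument convention is this sketch's own way of saying "the target volume is tiering-enabled":

```shell
# required_gfsck_user: print which user must run gfsck for a given volume kind.
# Pass "tiered" for a tiering-enabled volume; anything else means a regular check.
required_gfsck_user() {
    if [ "$1" = "tiered" ]; then
        echo mapr   # tiering-enabled volumes must be checked as the mapr user
    else
        echo root   # all other checks require the root user
    fi
}

# Refuse to continue unless the current user matches the requirement:
[ "$(id -un)" = "$(required_gfsck_user tiered)" ] || echo "switch to the mapr user first"
```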
/opt/mapr/bin/gfsck
[-h] [--help]
[-c] [--clear]
[-d] [--debug]
[-b] [--dbcheck]
[-r] [--repair]
[-y] [--assume-yes]
[-Gquick] [--check-tiermetadata-only]
[-Gfull] [--check-tiermetadata-full]
[-Dquick] [--check-tierdata-presence]
[-Dfull] [--check-tierdata-crc]
[-J] [--skip-tier-log-replay]
[-D] [--crc]
[cluster=cluster-name (default=default)]
[rwvolume=volume-name (default=null)]
[snapshot=snapshot-name (default=null)]
[snapshotid=snapshot-id (default=0)]
[fid=fid (default=null)]
[cid=cid (default=0)]
[startCid=cid (default=0)]
[rIdx=<repl index>] (replication index, only enabled with [-D] [--crc])
[fidThreads=<check crc thread count for fid>] (default:16, max:128)
[cidThreads=<check crc thread count for cid>] (default:16, max:128)
[scanthreads=inode scanner threads count (default:10, max:1000)]
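As a quick illustration of how these parameters combine on a command line, here is a tiny helper that assembles (and prints, rather than runs) an invocation. The volume name myvol and the thread count are hypothetical values, not taken from this page:

```shell
# build_gfsck_cmd: assemble a gfsck command line from a volume name plus options.
# It only prints the command, so combinations can be reviewed before running them.
build_gfsck_cmd() {
    vol=$1
    shift
    echo /opt/mapr/bin/gfsck "rwvolume=$vol" "$@"
}

# CRC-check a volume with 32 fid-checker threads (hypothetical volume name):
build_gfsck_cmd myvol -D fidThreads=32
# → /opt/mapr/bin/gfsck rwvolume=myvol -D fidThreads=32

# CRC-check starting at container 3000, skipping earlier containers:
build_gfsck_cmd myvol -D startCid=3000
# → /opt/mapr/bin/gfsck rwvolume=myvol -D startCid=3000
```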
-d|--debug
    Runs gfsck in debug mode, printing detailed progress messages.
-b|--dbcheck
    Checks the database within the startKey and endKey range. As this option is I/O
    intensive, use it only if you suspect database inconsistency.
-r|--repair
    Repairs the inconsistencies found by -Gquick, -Gfull, -Dquick, and
    -Dfull. Repair is not supported for snapshots and mirrors.
-y|--assume-yes
    Assumes yes for all prompts. If not specified, gfsck
    pauses for user input: yes to delete, no to exit gfsck,
    or ctrl-C to quit.
-D|--crc
    Checks the data CRC. You can use this option at the volume,
    container, snapshot, and filelet levels. gfsck reports corruption
    found at each level.
    User who must use this option:
    root
cluster
    The name of the cluster to check (default: the default cluster).
rwvolume
    The name of the read/write volume to check.
fid
    The fid whose data CRC is to be checked. Can be combined with the rIdx option. You must use fid only
    with the --crc option.
cid
    The container whose data CRC is to be checked. Can be combined with the rIdx option. The default value of
    0 denotes that all containers are checked. You must use cid
    only with the --crc option.
startCid
    The container from which to start verification. Use with --crc rwvolume=<volumename>. Use this option to start
    verification from a specific container instead of starting from the first container
    of that volume. If not provided, the --crc option checks the data CRC
    of all the containers.
    For example, assume that one particular volume has containers such as 205 ... 2055 ... 2900 ... 3000 ... 5000 ... 9999.
    You can use the startCid option to start verification from container 3000; all containers prior to 3000 are skipped.
rIdx
    The replication index (of the fid or
    cid) of the copy of the data to check for errors. Use only with
    -D or --crc and either fid or
    cid.
    For example, -D fid:2510.32.131204
    rIdx=0 only checks the data for copy 1 of the specified fid.
fidThreads
    The number of threads for checking the CRC of a fid (default: 16, max: 128). Use only with the --crc
    option.
cidThreads
    The number of threads for checking the CRC of a cid (default: 16, max: 128). Use only with the --crc option.
scanthreads
    The number of inode scanner threads (default: 10, max: 1000).
snapshot
    The name of the volume snapshot to check.
snapshotid
    The ID of the volume snapshot to check.
-Gquick|--check-tiermetadata-only
    Checks the tier metadata only.
-Gfull|--check-tiermetadata-full
    Performs a full check of the tier metadata.
-Dquick|--check-tierdata-presence
    Use with -Gquick or
    -Gfull. Checks and reports whether the objects in the metadata tables
    exist in the tier.
-Dfull|--check-tierdata-crc
    Use with -Gquick or
    -Gfull. Validates the data CRC for the objects in the metadata
    tables.
-J|--skip-tier-log-replay
    Skips the tier log replay.
In debug mode, run the gfsck command on the read/write volume named
mapr.cluster.root:
/opt/mapr/bin/gfsck rwvolume=mapr.cluster.root -d
Sample output is as follows:
Starting GlobalFsck:
clear-mode = false
debug-mode = true
dbcheck-mode = false
repair-mode = false
assume-yes-mode = false
cluster = my.cluster.com
rw-volume-name = mapr.cluster.root
snapshot-name = null
snapshot-id = 0
user-id = 0
group-id = 0
get volume properties ...
rwVolumeName = mapr.cluster.root (volumeId = 205374230, rootContainerId = 2049, isMirror = false)
put volume mapr.cluster.root in global-fsck mode ...
get snapshot list for volume mapr.cluster.root ...
starting phase one (get containers) for volume mapr.cluster.root(205374230) ...
container 2049 (latestEpoch=3, fixedByFsck=false)
got volume containers map
done phase one
starting phase two (get inodes) for volume mapr.cluster.root(205374230) ...
get container inode list for cid 2049
+inodelist: fid=2049.32.131224 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.33.131226 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.34.131228 pfid=-1.33.131226 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.35.131230 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.36.131232 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.38.262312 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.39.262314 pfid=-1.38.262312 typ=1 styp=0 nch=0 dMe:false dRec: false
got container inode lists (totalThreads=1)
done phase two
starting phase three (get fidmaps & tabletmaps) for volume mapr.cluster.root(205374230) ...
got fidmap lists (totalFidmapThreads=0)
got tabletmap lists (totalTabletmapThreads=0)
done phase three
=== Start of GlobalFsck Report ===
file-fidmap-filelet union --
2049.39.262314:P --> primary (nchunks=0) --> AllOk
no errors
table-tabletmap-tablet union --
empty
orphan directories --
none
orphan kvstores --
none
orphan files --
none
orphan fidmaps --
none
orphan tables --
none
orphan tabletmaps --
none
orphan dbkvstores --
none
orphan dbfiles --
none
orphan dbinodes --
none
containers that need repair --
none
incomplete snapshots that need to be deleted --
none
user statistics --
containers = 1
directories = 2
kvstores = 0
files = 1
fidmaps = 0
filelets = 0
tables = 0
tabletmaps = 0
schemas = 0
tablets = 0
segmaps = 0
spillmaps = 0
overflowfiles = 0
bucketfiles = 0
spillfiles = 0
=== End of GlobalFsck Report ===
remove volume mapr.cluster.root from global-fsck mode (ret = 0) ...
GlobalFsck completed successfully (7142 ms); Result: verify succeeded
To verify if the object is present on the tier, run the gfsck command on
the tiering-enabled read/write volume named for_test5. To also validate
the data CRC of the tiered objects, run the command with -Dfull as well; that is, replace
-Dquick with -Dfull.
/opt/mapr/bin/gfsck rwvolume=for_test5 -Gfull -Dquick
Sample output is as follows:
Starting GlobalFsck:
clear-mode = false
debug-mode = false
dbcheck-mode = false
repair-mode = false
assume-yes-mode = false
cluster = Cloudpool19
rw-volume-name = for_test5
snapshot-name = null
snapshot-id = 0
user-id = 2000
group-id = 2000
get volume properties ...
put volume for_test5 in global-fsck mode ...
get snapshot list for volume for_test5 ...
starting phase one (get containers) for volume for_test5(16558233) ...
got volume containers map
done phase one
starting phase two (get inodes) for volume for_test5(16558233) ...
got container inode lists
done phase two
starting phase three (get fidmaps & tabletmaps) for volume for_test5(16558233) ...
got fidmap lists
got tabletmap lists
completed secondary index field path info gathering
completed secondary index consistency check
Starting DeferMapCheck..
completed DeferMapCheck
done phase three
=== Start of GlobalFsck Report ===
file-fidmap-filelet union --
no errors
table-tabletmap-tablet union --
empty
containers that need repair --
none
user statistics --
containers = 6
directories = 6
files = 1
filelets = 2
tables = 0
tablets = 0
=== End of GlobalFsck Report ===
Putting volume into TierGlobalFsck mode . . . . .
=== Start of TierGlobalFsck Report ===
TierVolumeGfsck completed, corruption not found
total number of containers scanned 6
total number of vcds verified 6722
total number of objects verified 18
total number of vcds skipped 0
total number of objects skipped 0
total number of vcds that need repair 0
total number of objects that need repair 0
=== End of TierGlobalFsck Report ===
removing volume from TierGlobalFsck mode
remove volume for_test5 from global-fsck mode (ret = 0)
GlobalFsck completed successfully (37039 ms); Result: verify succeeded
To check the data CRC for a specific fid, run the gfsck command with the -D option and the fid:
# /opt/mapr/bin/gfsck -D fid=2085.32.131412 --debug
verifying data crc
mode = fid
fid = 2085.32.131412
debug-mode = true
repair-mode = false
cluster = default
replication index = -1
user-id = 0
group-id = 0
crc validate result for fid : 2085.32.131412
total local cluster/vcds verified : 51
total local cluster/vcds corrupted : 0
total local cluster/vcds skipped: 0
total purged cluster/vcds verified : 0
total purged cluster/vcds corrupted : 0
total purged cluster/vcds skipped: 0
To check the data CRC at the volume level, run the gfsck command with the
-D option on the volume. Details of any corruption found are logged in the
/opt/mapr/log/gfsck.log file. Sample
output is as follows:
/opt/mapr/bin/gfsck -D rwvolume=rocky
verifying data crc
mode = volume
rwVolumeName = rocky
fid thread count = 16
cid thread count = 16
debug-mode = false
repair-mode = false
cluster = default
replication index = -1
user-id = 0
group-id = 0
total containers : 6
total container skipped : 0
data crc verification completed with no errors
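When scripting around gfsck, the fixed phrases in its report can be checked mechanically. The following is a minimal sketch under its own conventions: it treats "Result: verify succeeded" and a "containers that need repair" section followed by "none" as the signals of a clean run, and the abridged report file is fabricated from the samples above:

```shell
# check_report: succeed only if a saved gfsck log shows a clean verification.
check_report() {
    # The run must have ended with a successful verify.
    grep -q "Result: verify succeeded" "$1" || return 1
    # The line after "containers that need repair" must be "none";
    # any other content there means at least one container needs repair.
    ! grep -A1 "containers that need repair" "$1" | grep -qv -e "need repair" -e "none"
}

# Example with an abridged report captured from a gfsck run:
cat > /tmp/gfsck_report.txt <<'EOF'
containers that need repair --
    none
GlobalFsck completed successfully (7142 ms); Result: verify succeeded
EOF
check_report /tmp/gfsck_report.txt && echo clean
# → clean
```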