Describes how you can use the gfsck command, under the supervision of
HPE Ezmeral Data Fabric Support or
Engineering, to perform consistency checks and appropriate repairs on a volume or a volume
snapshot.
You can use the gfsck command when the local fsck either
repairs or loses some containers at the highest epoch.
For an overview of using the gfsck command, see Using Global File System Checking.
Although you need to be the root user to run this command, checking tiering-enabled volumes requires you to be the mapr user.
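The user requirement above can be expressed as a small guard in a wrapper script. This is a minimal sketch, not part of gfsck itself; the `tiered` argument convention is this sketch's own way of saying "the target volume is tiering-enabled":

```shell
# required_gfsck_user: print which user must run gfsck for a given volume kind.
# Pass "tiered" for a tiering-enabled volume; anything else means a regular check.
required_gfsck_user() {
    if [ "$1" = "tiered" ]; then
        echo mapr   # tiering-enabled volumes must be checked as the mapr user
    else
        echo root   # all other checks require the root user
    fi
}

# Refuse to continue unless the current user matches the requirement:
[ "$(id -un)" = "$(required_gfsck_user tiered)" ] || echo "switch to the mapr user first"
```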
/opt/mapr/bin/gfsck
[-h] [--help]
[-c] [--clear]
[-d] [--debug]
[-b] [--dbcheck]
[-r] [--repair]
[-y] [--assume-yes]
[-Gquick] [--check-tiermetadata-only]
[-Gfull] [--check-tiermetadata-full]
[-Dquick] [--check-tierdata-presence]
[-Dfull] [--check-tierdata-crc]
[-J] [--skip-tier-log-replay]
[-D] [--crc]
[cluster=cluster-name (default=default)]
[rwvolume=volume-name (default=null)]
[snapshot=snapshot-name (default=null)]
[snapshotid=snapshot-id (default=0)]
[fid=fid (default=null)]
[cid=cid (default=0)]
[startCid=cid (default=0)]
[rIdx=<repl index>] (replication index, only enabled with [-D] [--crc])
[fidThreads=<check crc thread count for fid>] (default:16, max:128)
[cidThreads=<check crc thread count for cid>] (default:16, max:128)
[scanthreads=inode scanner threads count (default:10, max:1000)]
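As a quick illustration of how these parameters combine on a command line, here is a tiny helper that assembles (and prints, rather than runs) an invocation. The volume name myvol and the thread count are hypothetical values, not taken from this page:

```shell
# build_gfsck_cmd: assemble a gfsck command line from a volume name plus options.
# It only prints the command, so combinations can be reviewed before running them.
build_gfsck_cmd() {
    vol=$1
    shift
    echo /opt/mapr/bin/gfsck "rwvolume=$vol" "$@"
}

# CRC-check a volume with 32 fid-checker threads (hypothetical volume name):
build_gfsck_cmd myvol -D fidThreads=32
# → /opt/mapr/bin/gfsck rwvolume=myvol -D fidThreads=32

# CRC-check starting at container 3000, skipping earlier containers:
build_gfsck_cmd myvol -D startCid=3000
# → /opt/mapr/bin/gfsck rwvolume=myvol -D startCid=3000
```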
-d|--debug
    Runs gfsck in debug mode, printing detailed progress messages.
-b|--dbcheck
    Checks the database within the startKey and endKey range. As this option is I/O
    intensive, use it only if you suspect database inconsistency.
-r|--repair
    Repairs the inconsistencies found by -Gquick, -Gfull, -Dquick, and
    -Dfull. Repair is not supported for snapshots and mirrors.
-y|--assume-yes
    Assumes yes for all prompts. If not specified, gfsck
    pauses for user input: yes to delete, no to exit gfsck,
    or ctrl-C to quit.
-D|--crc
    Checks the data CRC. You can use this option at the volume,
    container, snapshot, and filelet levels. gfsck reports corruption
    found at each level.
    User who must use this option:
    root
cluster
    The name of the cluster to check (default: the default cluster).
rwvolume
    The name of the read/write volume to check.
fid
    The fid whose data CRC is to be checked. Can be combined with the rIdx option. You must use fid only
    with the --crc option.
cid
    The container whose data CRC is to be checked. Can be combined with the rIdx option. The default value of
    0 denotes that all containers are checked. You must use cid
    only with the --crc option.
startCid
    The container from which to start verification. Use with --crc rwvolume=<volumename>. Use this option to start
    verification from a specific container instead of starting from the first container
    of that volume. If not provided, the --crc option checks the data CRC
    of all the containers.
    For example, assume that one particular volume has containers such as 205 ... 2055 ... 2900 ... 3000 ... 5000 ... 9999.
    You can use the startCid option to start verification from container 3000; all containers prior to 3000 are skipped.
rIdx
    The replication index (of the fid or
    cid) of the copy of the data to check for errors. Use only with
    -D or --crc and either fid or
    cid.
    For example, -D fid:2510.32.131204
    rIdx=0 only checks the data for copy 1 of the specified fid.
fidThreads
    The number of threads for checking the CRC of a fid (default: 16, max: 128). Use only with the --crc
    option.
cidThreads
    The number of threads for checking the CRC of a cid (default: 16, max: 128). Use only with the --crc option.
scanthreads
    The number of inode scanner threads (default: 10, max: 1000).
snapshot
    The name of the volume snapshot to check.
snapshotid
    The ID of the volume snapshot to check.
-Gquick|--check-tiermetadata-only
    Checks the tier metadata only.
-Gfull|--check-tiermetadata-full
    Performs a full check of the tier metadata.
-Dquick|--check-tierdata-presence
    Use with -Gquick or
    -Gfull. Checks and reports whether the objects in the metadata tables
    exist in the tier.
-Dfull|--check-tierdata-crc
    Use with -Gquick or
    -Gfull. Validates the data CRC for the objects in the metadata
    tables.
-J|--skip-tier-log-replay
    Skips the tier log replay.
In debug mode, run the gfsck command on the read/write volume named
mapr.cluster.root:
/opt/mapr/bin/gfsck rwvolume=mapr.cluster.root -d
Sample output is as follows:
Starting GlobalFsck:
clear-mode = false
debug-mode = true
dbcheck-mode = false
repair-mode = false
assume-yes-mode = false
cluster = my.cluster.com
rw-volume-name = mapr.cluster.root
snapshot-name = null
snapshot-id = 0
user-id = 0
group-id = 0
get volume properties ...
rwVolumeName = mapr.cluster.root (volumeId = 205374230, rootContainerId = 2049, isMirror = false)
put volume mapr.cluster.root in global-fsck mode ...
get snapshot list for volume mapr.cluster.root ...
starting phase one (get containers) for volume mapr.cluster.root(205374230) ...
container 2049 (latestEpoch=3, fixedByFsck=false)
got volume containers map
done phase one
starting phase two (get inodes) for volume mapr.cluster.root(205374230) ...
get container inode list for cid 2049
+inodelist: fid=2049.32.131224 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.33.131226 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.34.131228 pfid=-1.33.131226 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.35.131230 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.36.131232 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.38.262312 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
+inodelist: fid=2049.39.262314 pfid=-1.38.262312 typ=1 styp=0 nch=0 dMe:false dRec: false
got container inode lists (totalThreads=1)
done phase two
starting phase three (get fidmaps & tabletmaps) for volume mapr.cluster.root(205374230) ...
got fidmap lists (totalFidmapThreads=0)
got tabletmap lists (totalTabletmapThreads=0)
done phase three
=== Start of GlobalFsck Report ===
file-fidmap-filelet union --
2049.39.262314:P --> primary (nchunks=0) --> AllOk
no errors
table-tabletmap-tablet union --
empty
orphan directories --
none
orphan kvstores --
none
orphan files --
none
orphan fidmaps --
none
orphan tables --
none
orphan tabletmaps --
none
orphan dbkvstores --
none
orphan dbfiles --
none
orphan dbinodes --
none
containers that need repair --
none
incomplete snapshots that need to be deleted --
none
user statistics --
containers = 1
directories = 2
kvstores = 0
files = 1
fidmaps = 0
filelets = 0
tables = 0
tabletmaps = 0
schemas = 0
tablets = 0
segmaps = 0
spillmaps = 0
overflowfiles = 0
bucketfiles = 0
spillfiles = 0
=== End of GlobalFsck Report ===
remove volume mapr.cluster.root from global-fsck mode (ret = 0) ...
GlobalFsck completed successfully (7142 ms); Result: verify succeeded
To verify if the object is present on the tier, run the gfsck command on
the tiering-enabled read/write volume named for_test5. To also validate
the data CRC of the tiered objects, run the command with -Dfull as well; that is, replace
-Dquick with -Dfull.
/opt/mapr/bin/gfsck rwvolume=for_test5 -Gfull -Dquick
Sample output is as follows:
Starting GlobalFsck:
clear-mode = false
debug-mode = false
dbcheck-mode = false
repair-mode = false
assume-yes-mode = false
cluster = Cloudpool19
rw-volume-name = for_test5
snapshot-name = null
snapshot-id = 0
user-id = 2000
group-id = 2000
get volume properties ...
put volume for_test5 in global-fsck mode ...
get snapshot list for volume for_test5 ...
starting phase one (get containers) for volume for_test5(16558233) ...
got volume containers map
done phase one
starting phase two (get inodes) for volume for_test5(16558233) ...
got container inode lists
done phase two
starting phase three (get fidmaps & tabletmaps) for volume for_test5(16558233) ...
got fidmap lists
got tabletmap lists
completed secondary index field path info gathering
completed secondary index consistency check
Starting DeferMapCheck..
completed DeferMapCheck
done phase three
=== Start of GlobalFsck Report ===
file-fidmap-filelet union --
no errors
table-tabletmap-tablet union --
empty
containers that need repair --
none
user statistics --
containers = 6
directories = 6
files = 1
filelets = 2
tables = 0
tablets = 0
=== End of GlobalFsck Report ===
Putting volume into TierGlobalFsck mode . . . . .
=== Start of TierGlobalFsck Report ===
TierVolumeGfsck completed, corruption not found
total number of containers scanned 6
total number of vcds verified 6722
total number of objects verified 18
total number of vcds skipped 0
total number of objects skipped 0
total number of vcds that need repair 0
total number of objects that need repair 0
=== End of TierGlobalFsck Report ===
removing volume from TierGlobalFsck mode
remove volume for_test5 from global-fsck mode (ret = 0)
GlobalFsck completed successfully (37039 ms); Result: verify succeeded
To check the data CRC for a specific fid, run the gfsck command with the -D option and the fid:
# /opt/mapr/bin/gfsck -D fid=2085.32.131412 --debug
verifying data crc
mode = fid
fid = 2085.32.131412
debug-mode = true
repair-mode = false
cluster = default
replication index = -1
user-id = 0
group-id = 0
crc validate result for fid : 2085.32.131412
total local cluster/vcds verified : 51
total local cluster/vcds corrupted : 0
total local cluster/vcds skipped: 0
total purged cluster/vcds verified : 0
total purged cluster/vcds corrupted : 0
total purged cluster/vcds skipped: 0
To check the data CRC at the volume level, run the gfsck command with the
-D option on the volume. Details of any corruption found are logged in the
/opt/mapr/log/gfsck.log file. Sample
output is as follows:
/opt/mapr/bin/gfsck -D rwvolume=rocky
verifying data crc
mode = volume
rwVolumeName = rocky
fid thread count = 16
cid thread count = 16
debug-mode = false
repair-mode = false
cluster = default
replication index = -1
user-id = 0
group-id = 0
total containers : 6
total container skipped : 0
data crc verification completed with no errors
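When scripting around gfsck, the fixed phrases in its report can be checked mechanically. The following is a minimal sketch under its own conventions: it treats "Result: verify succeeded" and a "containers that need repair" section followed by "none" as the signals of a clean run, and the abridged report file is fabricated from the samples above:

```shell
# check_report: succeed only if a saved gfsck log shows a clean verification.
check_report() {
    # The run must have ended with a successful verify.
    grep -q "Result: verify succeeded" "$1" || return 1
    # The line after "containers that need repair" must be "none";
    # any other content there means at least one container needs repair.
    ! grep -A1 "containers that need repair" "$1" | grep -qv -e "need repair" -e "none"
}

# Example with an abridged report captured from a gfsck run:
cat > /tmp/gfsck_report.txt <<'EOF'
containers that need repair --
    none
GlobalFsck completed successfully (7142 ms); Result: verify succeeded
EOF
check_report /tmp/gfsck_report.txt && echo clean
# → clean
```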