Describes the erasure coding (EC) schemes for data protection and recovery.
Erasure coding (EC) is a data protection method in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media.
EC ensures that if data becomes corrupted, it can be reconstructed using information about the data that is present elsewhere.
The time required to reconstruct data depends on the number of data fragments in the chosen EC scheme and on the number of failures that have occurred. For example, reconstruction in a 10+2 scheme takes longer than in a 3+2 scheme, because a larger number of data blocks must be read.
There are two kinds of EC schemes that you can use: schemes without local parity and schemes with local parity.
In an m+n scheme, the system must read a minimum of m other blocks to reconstruct data.
As an administrator, consider the following points when selecting an EC scheme:
In an erasure coded volume, an erasure coding scheme without Local Parity has the stripe layout m+n. The stripe is an array of m data fragments and n parity fragments.
Each fragment is called a stripelet. Each stripelet is present on a container, and one stripe spans different containers on different nodes. The default stripelet size is 4 MB. For example, for a 4+2 EC scheme, the stripe size is 24 MB (6 stripelets of 4 MB each).
A Container Group (CG) is a collection of such stripes. Based on the maximum size of a container (32 GB), the maximum number of stripes in a CG is 8K.
Each stripe is created by the same number of data fragments from all containers in the group of EC containers. Each container is placed on a different physical node.
For example, assume m=4, n=2, and stripe depth=4 MB.
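The sizes in this example can be worked out with a short calculation. The following is a minimal sketch, not product code: the function name `stripe_geometry` is hypothetical, binary units (1 GB = 1024 MB) are assumed, and it assumes that each container holds one stripelet of every stripe in the group, which is consistent with the 8K figure quoted above.

```python
MB = 1024 * 1024
GB = 1024 * MB

def stripe_geometry(m, n, stripelet_size=4 * MB, container_size=32 * GB):
    """Return sizes for an m+n stripe layout without local parity (illustrative)."""
    stripelets_per_stripe = m + n              # m data + n parity fragments
    stripe_size = stripelets_per_stripe * stripelet_size
    data_per_stripe = m * stripelet_size       # user data carried by one stripe
    # Assumption: a container stores one stripelet per stripe, so the
    # container size caps the number of stripes in the container group.
    max_stripes = container_size // stripelet_size
    return {
        "stripelets_per_stripe": stripelets_per_stripe,
        "stripe_size_mb": stripe_size // MB,
        "data_per_stripe_mb": data_per_stripe // MB,
        "max_stripes_per_cg": max_stripes,
    }

geometry = stripe_geometry(m=4, n=2)
print(geometry)
```

For m=4 and n=2, this gives a 24 MB stripe that carries 16 MB of data, and 8192 (8K) stripes per container group.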

Select from the following schemes for erasure-coded volumes:
| EC Scheme (CLI) | EC Scheme (Control System) | Number of Data Nodes | Number of Parity Nodes | Total Number of Nodes Needed | Number of Failures Recoverable | Number of Nodes to Read to Recover Data |
|---|---|---|---|---|---|---|
| 3+2 | 3, 2 | 3 | 2 | 5 | 2 | 3 |
| 4+2 | 4, 2 | 4 | 2 | 6 | 2 | 4 |
| 5+2 | 5, 2 | 5 | 2 | 7 | 2 | 5 |
| 6+3 | 6, 3 | 6 | 3 | 9 | 3 | 6 |
| 10+x, where x is a value from 1 to 9 | N/A | 10 | x | 10+x | x | 10 |
Although you can create a volume without the required number of nodes for a specific scheme, volume offload fails if the required number of nodes is not present.
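The columns for schemes without local parity follow directly from m and n. The sketch below derives them and checks whether a cluster has enough nodes for a scheme. The function names `scheme_without_local_parity` and `has_enough_nodes` are hypothetical helpers for illustration, not part of any CLI or API.

```python
def scheme_without_local_parity(m, n):
    """Derive the properties of an m+n scheme (illustrative only)."""
    return {
        "scheme": f"{m}+{n}",
        "data_nodes": m,
        "parity_nodes": n,
        "total_nodes": m + n,          # one node per fragment in the stripe
        "failures_recoverable": n,     # up to n fragments may be lost
        "nodes_to_read": m,            # m surviving fragments rebuild the data
    }

def has_enough_nodes(scheme, nodes_available):
    """True if the cluster has the node count the scheme requires."""
    return nodes_available >= scheme["total_nodes"]

six_three = scheme_without_local_parity(6, 3)
print(six_three)
print(has_enough_nodes(six_three, nodes_available=8))   # False: 9 nodes needed
```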
When choosing a scheme, note that reading from more nodes leads to longer recovery times, which results in degraded performance, higher network cost, and lengthy rebuilds.
If you anticipate only a single failure, use an EC scheme with local parity, because fewer nodes must be read for recovery than with an EC scheme without local parity.
For example, consider a 12+4 EC scheme represented as D0 + D1 + D2 + ... + D10 + D11 + P0 + ... + P3.
Suppose node D4 goes down. To rebuild it, a total of 12 stripelets must be read. This significantly degrades performance, because the rebuild consumes network bandwidth, CPU cycles, and disk I/O.
To reduce the reconstruction cost, use EC local parity. In the 12+2+2 scheme, the number of stripelets to be read for a single failure drops to 6.
Choosing an EC scheme with local parity reduces EC storage overhead without incurring high rebuild costs and longer rebuild times, while lowering the probability of data loss. HPE recommends the following tested local parity schemes:
| EC Scheme (CLI) | EC Scheme (Control System) | Number of Data Nodes | Number of Local Parity Nodes | Number of Global Parity Nodes | Total Number of Nodes Needed | Number of Failures Recoverable | Number of Nodes to Read to Recover Data |
|---|---|---|---|---|---|---|---|
| 10+2+2 | N/A | 10 | 2 | 2 | 14 | | |
| 12+2+2 | N/A | 12 | 2 | 2 | 16 | | |
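The single-failure read counts quoted for the 12+4 and 12+2+2 schemes can be reproduced with a small calculation. This is a sketch under one stated assumption: the data blocks are split evenly into one segment per local parity block. The function name `reads_for_single_failure` is hypothetical.

```python
def reads_for_single_failure(data_blocks, local_parities=0):
    """Blocks to read to rebuild one lost data block (illustrative)."""
    if local_parities == 0:
        # Without local parity, a rebuild reads as many blocks as there
        # are data blocks in the scheme.
        return data_blocks
    if data_blocks % local_parities != 0:
        raise ValueError("data blocks must divide evenly into segments")
    # Assumption: one segment per local parity block, all the same size.
    segment_size = data_blocks // local_parities
    # Remaining data blocks in the segment, plus that segment's local parity.
    return (segment_size - 1) + 1

print(reads_for_single_failure(12))                    # 12+4 scheme
print(reads_for_single_failure(12, local_parities=2))  # 12+2+2 scheme
print(reads_for_single_failure(10, local_parities=2))  # 10+2+2 scheme
```

Under these assumptions the three calls print 12, 6, and 5, which match the counts given in this topic.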
Consider reading the following technical discussion only if you want to specify other local parity schemes.
Technical discussion on parity schemes
Local parity is calculated from a subset of data blocks.
Consider a 10+2 scheme without local parity.

In this scheme, if block D6 fails, the system needs to read a minimum of 10 other blocks to recover the data.
Now consider a 10+2+2 scheme with local parity. In this case, data is divided into two (2) segments, each containing five (5) data blocks, with a local parity block for each segment. The global parity blocks are common to both segments. To recover from a single failure, the system must read only the four (4) remaining blocks in the affected segment and the corresponding local parity block.

In this example, since there is only a single failure in a segment (block D5), the system needs to read only blocks D1+D2+D3+D4+L1 (L1 is the local parity of this data segment). Because of the local parity block, recovery is much faster and more efficient.
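To make the single-failure case concrete, the sketch below rebuilds D5 from D1 to D4 and L1. It assumes the local parity is a byte-wise XOR of the data blocks in its segment. This topic does not state how local parity is actually computed, so treat XOR as an illustrative stand-in only. The helper `xor_blocks` and the toy 4-byte blocks are also hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Segment 1 of a 10+2+2 layout: data blocks D1..D5 (toy 4-byte stripelets).
segment = {f"D{i}": bytes([i, i * 2, i * 3, i * 4]) for i in range(1, 6)}

# Assumption: local parity L1 is the XOR of the five data blocks.
l1 = xor_blocks(segment.values())

# D5 fails: read the four remaining blocks of the segment plus L1.
survivors = [segment[name] for name in ("D1", "D2", "D3", "D4")]
recovered_d5 = xor_blocks(survivors + [l1])

print(recovered_d5 == segment["D5"])
```

Only five blocks are read, and none of them come from the second segment or from the global parity blocks.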
Points to note for using an erasure coding scheme with local parity

With local parity, the system can recover from a maximum of g+l failures, where g is the number of global parity blocks and l is the number of local parity blocks, but only for certain failure patterns. For the 10+2+2 scheme, the maximum is therefore 4 failures. For example:

Here, there are 4 failures (blocks D4, D5, D7, and D8). The required number of blocks to read (10 in this case) is available for recovery. The system reads D1+D2+D3+D6+D9+D10+L1+L2+G1+G2.
However, consider the following example:

Although there are 4 failures (blocks D3, D4, D5, and G2), the system does not have the required number of blocks (10) to read and recover. The only blocks that contribute to recovery are D1+D2+L1+D6+D7+D8+D9+D10+G1, which is 9 blocks. Block L2 is intact but does not help, because there are no failures in its corresponding data blocks D6 to D10. Therefore, in this case, data cannot be recovered.
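The counting rule used in these two examples can be sketched as a small check. This is a simplified model of the rule as described here, not the product's recovery logic. It counts surviving data blocks, surviving global parity blocks, and each surviving local parity block whose segment has at least one failure, then compares the total with the 10 blocks required. The failed sets are inferred from the lists of readable blocks above, and all names in the code are hypothetical.

```python
# 10+2+2 layout: each local parity protects one segment of five data blocks.
SEGMENTS = {
    "L1": {"D1", "D2", "D3", "D4", "D5"},
    "L2": {"D6", "D7", "D8", "D9", "D10"},
}
GLOBAL_PARITIES = {"G1", "G2"}
DATA_BLOCKS = 10

def useful_blocks(failed):
    """Surviving blocks that contribute to recovery under the counting rule."""
    failed = set(failed)
    useful = set()
    for local_parity, data in SEGMENTS.items():
        useful |= data - failed                      # surviving data blocks
        # A local parity only helps if it survived and its segment lost data.
        if local_parity not in failed and data & failed:
            useful.add(local_parity)
    useful |= GLOBAL_PARITIES - failed               # surviving global parity
    return useful

def is_recoverable(failed):
    return len(useful_blocks(failed)) >= DATA_BLOCKS

first = {"D4", "D5", "D7", "D8"}     # two failures in each segment
second = {"D3", "D4", "D5", "G2"}    # three failures in one segment plus G2

print(len(useful_blocks(first)), is_recoverable(first))
print(len(useful_blocks(second)), is_recoverable(second))
```

In this model the first pattern yields 10 useful blocks and is recoverable, while the second yields only 9 and is not, which matches the two examples.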