Bulkload operations can be performed as a full bulkload or as an incremental bulkload.
The most common way of loading data into HPE Ezmeral Data Fabric Database binary tables is with a put operation. However, at large scales, bulk loads offer a performance advantage over put operations.
Bulk loading is supported by the following tools, which can be used for both full and incremental bulkload operations:
hbase com.mapr.fs.hbase.tools.mapreduce.CopyTable
ImportFiles utility, which imports HFile or Result files into HPE Ezmeral Data Fabric Database binary tables. For example:
hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles
-Dmapred.reduce.tasks=2
-inputDir < input directory, for example: /test/tabler.kv >
-table < table name, for example: /table2 >
[ -format < Result|HFile > ]
[ -sample < true|false > ]
[ -mapOnly < true|false > ]
Full bulkload operations offer the best performance advantage because they skip the write-ahead log (WAL) typical of HPE Ezmeral Data Fabric Database binary table operations. Full bulkload operations can only be performed on empty tables that have the bulkload attribute set to true. This value is set only when creating a table.
When you set the bulkload attribute, you cannot enable replication on the table. Since this effectively disables logging on the table, HPE Ezmeral Data Fabric Database also does not capture log data that Elasticsearch can use to index the table.
To create an HPE Ezmeral Data Fabric Database binary table for bulkloading, use one of the following:
maprcli table create command with the -bulkload parameter set to true.
hbase shell create command with the BULKLOAD parameter set to true. For example:
hbase> create '/a0','f1', BULKLOAD => 'true'
When you also specify split points, separate the BULKLOAD parameter from the SPLITS parameter. For example:
hbase> create '/t1', 'f1', {SPLITS => ['10', '20', '30']}, {BULKLOAD => 'true'}
Control System with the Will table be bulkload? option set to Yes under table PROPERTIES.
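As a minimal sketch of the maprcli route, the shell function below assembles a maprcli table create call with -bulkload set to true. The table path /a0 is borrowed from the hbase shell example, the -path parameter is assumed to take the table path, and DRY_RUN is a hypothetical switch of this script (not a maprcli feature) that prints the command instead of running it.

```shell
# Sketch only: create a binary table in bulkload mode with maprcli.
# DRY_RUN is a hypothetical switch of this script; it defaults to 1 so the
# command is printed for review rather than executed.

create_bulkload_table() {
  # $1 = table path (assumed to be passed through the -path parameter)
  set -- maprcli table create -path "$1" -bulkload true
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the assembled command line.
    echo "$*"
  else
    # Run the assembled command on a node where maprcli is installed.
    "$@"
  fi
}

create_bulkload_table /a0
```

Running the script with DRY_RUN=0 executes the command instead of printing it.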
maprcli table edit command to set the bulkload parameter to TRUE again.
alter command to set the BULKLOAD parameter to TRUE again.
Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations. This type of bulk load makes use of write-ahead log files.
You can use incremental bulk loads to ingest large amounts of data into an existing table. Tables remain available for standard client operations such as put, get, and scan while the bulk load is in progress. A table can perform multiple incremental bulk load operations simultaneously.
When a table is created with the maprcli table create command, with the hbase shell’s create command, or in the Control System, incremental loads are supported by default.
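To illustrate simultaneous incremental loads, the sketch below starts two ImportFiles runs against the same existing table, using only the options shown in the ImportFiles usage earlier. The table /table2 and the input directory /test/tabler.kv come from that usage; the second input directory /test/batch2.kv is a made-up value, and DRY_RUN is a hypothetical switch of this script that prints each command instead of running it.

```shell
# Sketch only: two incremental bulk loads into one existing table.
# DRY_RUN is a hypothetical switch of this script; it defaults to 1 so the
# commands are printed for review rather than executed.

import_files() {
  # $1 = input directory, $2 = table path, $3 = input format (Result or HFile)
  set -- hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles \
    -Dmapred.reduce.tasks=2 -inputDir "$1" -table "$2" -format "$3"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the assembled command line.
    echo "$*"
  else
    # Run the assembled command on a cluster node.
    "$@"
  fi
}

# Start both loads in the background, then wait for both to finish.
import_files /test/tabler.kv /table2 Result &
import_files /test/batch2.kv /table2 Result &
wait
```

Because the two loads run in the background, the order in which their output appears is not fixed.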