To set up the Change Data Capture (CDC) feature, the following must exist or be created: a HPE Ezmeral Data Fabric Database source table (JSON or binary), a HPE Ezmeral Data Fabric Event Store changelog stream, a HPE Ezmeral Data Fabric Event Store stream topic, and a HPE Ezmeral Data Fabric Database table changelog relationship between the source table and the destination stream topic.
The destination HPE Ezmeral Data Fabric Event Store stream can be in the same cluster as the HPE Ezmeral Data Fabric Database source table or it can be on a remote HPE Ezmeral Data Fabric cluster. If you are propagating changed data from a source table on a source cluster to a destination stream topic on a remote destination cluster, a gateway must be set up. Gateways are set up by installing the gateway on the destination cluster and specifying the gateway node(s) on the source cluster. See Administering Data Fabric Gateways and Configuring Gateways for Table and Stream Replication.
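If the destination stream is on a remote cluster, the following is a minimal sketch of registering the gateway nodes on the source cluster with the maprcli cluster gateway set command; the destination cluster name (destCluster) and gateway hostnames (gw1, gw2) are placeholders for your environment.
// Register the destination cluster's gateway nodes on the source cluster
maprcli cluster gateway set -dstcluster destCluster -gateways "gw1 gw2"
// List the configured gateways
maprcli cluster gateway list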
The following diagram shows a simple CDC data model, with one source table to one destination topic on one stream. Because this scenario has the destination stream topic on a remote destination cluster, a gateway must be set up and configured.
The following code examples show how to create a volume and the HPE Ezmeral Data Fabric Database source table (binary or JSON):
// Create Volume for table
maprcli volume create -name tableVolume -path /tableVolume
// Create Binary table
maprcli table create -path /tableVolume/cdcTable
// Create JSON table
maprcli table create -path /tableVolume/cdcTable -tabletype json
// Create Volume for table
https://10.10.100.17:8443/rest/volume/create?name=tableVolume&path=/tableVolume
// Create Binary table
https://10.10.100.17:8443/rest/table/create?path=/tableVolume/cdcTable
// Create JSON table
https://10.10.100.17:8443/rest/table/create?path=/tableVolume/cdcTable&tabletype=json
A HPE Ezmeral Data Fabric Event Store changelog stream must be created for the propagated changed data records by using the maprcli stream create command with the -ischangelog parameter. See maprcli stream create or use the Control System.
The number of partitions for a changelog stream topic depends on how the topic is created. If the stream topic create command is used to create a stream topic, then the number of topic partitions can be set at creation time and then is locked. If the table changelog add command is used to add a stream topic (as well as establish a relationship between the source table and the changelog stream), then the number of topic partitions is inherited from the changelog stream and is locked.
The following code examples show how to create a volume and a changelog stream, optionally setting the stream's default number of partitions:
// Create Volume for stream
maprcli volume create -name streamVolume -path /streamVolume
// Create stream (default partitions: 1)
maprcli stream create -path /streamVolume/changelogStream -ischangelog true
// Create stream (default partitions: 3)
maprcli stream create -path /streamVolume/changelogStream -ischangelog true -defaultpartitions 3
// Create Volume for stream
https://10.10.100.17:8443/rest/volume/create?name=streamVolume&path=/streamVolume
// Create stream (default partitions: 1)
https://10.10.100.17:8443/rest/stream/create?path=/streamVolume/changelogStream&ischangelog=true
// Create stream (default partitions: 3)
https://10.10.100.17:8443/rest/stream/create?path=/streamVolume/changelogStream&ischangelog=true&defaultpartitions=3
A topic's number of partitions can only be set when the topic is created, either explicitly with the stream topic create command or implicitly when the maprcli table changelog add command is used to establish the changelog relationship. The stream topic edit command can not be used to modify the topic's number of partitions. To inherit the number of topic partitions from the changelog stream, either use the maprcli table changelog add command and create the topic there, or use the stream topic create command and not specify the -partitions parameter. To set the number of topic partitions explicitly, use the stream topic create command and set the -partitions parameter.
The following code examples show how to create a stream topic and set its number of partitions to five (5).
// Create topic (partitions: 5)
maprcli stream topic create -path /streamVolume/changelogStream -topic cdcTopic1 -partitions 5
// Create topic (partitions: 5)
https://10.10.100.17:8443/rest/stream/topic/create?path=/streamVolume/changelogStream&topic=cdcTopic1&partitions=5
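For comparison, the following is a minimal sketch of creating a topic that inherits the changelog stream's default number of partitions by omitting the -partitions parameter; the topic name cdcTopic2 is a placeholder.
// Create topic (partitions inherited from the stream's default)
maprcli stream topic create -path /streamVolume/changelogStream -topic cdcTopic2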
A table changelog relationship must be added between the source table and the destination stream topic by using the maprcli table changelog add command or the Control System. By adding a table changelog relationship, you are creating an environment that propagates changed data records from a source table to a HPE Ezmeral Data Fabric Event Store stream topic.
The maprcli table changelog add command supports several optional parameters. To use an existing stream topic, specify the -useexistingtopic parameter. The -useexistingtopic parameter can only be used with a changelog stream's newly created topic or a previous changelog stream topic for the same source table. To propagate only newly changed data and not the existing table data, set the -propagateexistingdata parameter to false. The default is true. To pause the propagation of changed data records, set the -pause parameter to true. The change data records are stored in a bucket until you resume the changelog relationship; at this point, the stored change data records are propagated to the stream topic. See table changelog resume for more information.
The following code examples show how to add a table changelog relationship:
maprcli table changelog add -path /tableVolume/cdcTable -changelog /streamVolume/changelogStream:cdcTopic1
maprcli table changelog add -path /tableVolume/cdcTable -changelog /streamVolume/changelogStream:cdcTopic1 -useexistingtopic true
https://10.10.100.17:8443/rest/table/changelog/add?path=/tableVolume/cdcTable&changelog=/streamVolume/changelogStream:cdcTopic1
https://10.10.100.17:8443/rest/table/changelog/add?path=/tableVolume/cdcTable&changelog=/streamVolume/changelogStream:cdcTopic1&useexistingtopic=true
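The following is a minimal sketch, based on the optional parameters described above, of adding the changelog relationship without propagating existing table data and of adding it in a paused state; the table changelog resume syntax is assumed to mirror table changelog add.
// Add the changelog relationship without propagating existing table data
maprcli table changelog add -path /tableVolume/cdcTable -changelog /streamVolume/changelogStream:cdcTopic1 -propagateexistingdata false
// Add the changelog relationship in a paused state
maprcli table changelog add -path /tableVolume/cdcTable -changelog /streamVolume/changelogStream:cdcTopic1 -pause true
// Later, resume the relationship to propagate the stored change data records
maprcli table changelog resume -path /tableVolume/cdcTable -changelog /streamVolume/changelogStream:cdcTopic1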
The following example verifies that the table changelog relationship exists:
maprcli table changelog list -path /tableVolume/cdcTable
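Assuming the REST interface mirrors the maprcli command as it does for the other commands in this section, the equivalent REST call would be:
https://10.10.100.17:8443/rest/table/changelog/list?path=/tableVolume/cdcTable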
To have CDC changed data records to consume, you must perform inserts, updates, and deletes on the HPE Ezmeral Data Fabric Database table data. See CRUD operations on documents using mapr dbshell for JSON documents, mapr hbshell for binary data, Java applications for HPE Ezmeral Data Fabric Database JSON, or C or Java applications for HPE Ezmeral Data Fabric Database Binary.
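As a minimal sketch for a JSON source table, the following mapr dbshell commands generate insert, update, and delete changed data records; the document ID and field values are placeholders.
// Start the HPE Ezmeral Data Fabric Database shell
mapr dbshell
// Insert a JSON document (produces an insert change data record)
insert /tableVolume/cdcTable --value '{"_id":"user001","name":"Sam","age":35}'
// Update the document (produces an update change data record)
update /tableVolume/cdcTable --id user001 --m '{"$set":{"age":36}}'
// Delete the document (produces a delete change data record)
delete /tableVolume/cdcTable --id user001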
A HPE Ezmeral Data Fabric Event Store Kafka/OJAI consumer application subscribes to the topic and consumes the change data records. See Consuming CDC Records for more information.