In this replication topology, there are two primary-secondary relationships, with each table playing both the primary and secondary roles. Client applications update both tables and each table replicates updates to the other.
All updates from a source table arrive at a replica after having been authenticated at a gateway. Therefore, access control expressions on the replica that control permissions for updates to column families and columns are irrelevant; gateways have the implicit authority to update replicas.
In this diagram, the customers table in the sanfrancisco cluster replicates updates to the customers table in the newyork cluster. The latter table in turn replicates updates to the
former table. HPE Ezmeral Data Fabric Database tags each table operation with the
universally unique ID (UUID) that it has assigned to the table at which the operation
originated. Therefore, operations are replicated only once and are not replicated back to
the originating table.
In this diagram, both tables are in a single cluster. Operations on table
customersA are replicated to table customersB and vice
versa.
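To create this kind of two-way relationship, each table is configured as a replica of the other. The following is a minimal sketch of setting that up from the command line; it assumes that the maprcli table replica autosetup command supports a -multimaster option in your release, so verify the exact parameter names in the command reference for your version and substitute your own table paths.
    # Sketch only: configure two-way (multi-master) replication between the tables.
    # The -multimaster parameter and its accepted values are assumptions; check the
    # maprcli table replica autosetup reference for your release.
    maprcli table replica autosetup -path /mapr/sanfrancisco/customers -replica /mapr/newyork/customers -multimaster true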
If one of the tables goes offline, you can direct client applications to the other table. When the offline table comes back online, replication between the two tables resumes automatically. When both tables are in synch again, you can redirect client applications back to the original table.
For example, assume that client applications are using the customers
table that is in the cluster sanfrancisco.
The customers table in the sanfrancisco cluster
becomes unavailable, so you redirect those client applications to the
customers table in the newyork cluster.
After the customers table in the sanfrancisco cluster
comes back online, replication back to it starts immediately. As client applications are
not yet using this table, there are no updates to replicate to the table in the
newyork cluster.
When the customers table in the sanfrancisco cluster
is in synch with the other table, you can redirect your client applications back to
it.
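To judge when the tables are back in sync, you can check replication status from the source side. A minimal sketch, assuming the maprcli table replica list command and that its output reports pending replication (field names can vary by release):
    # Sketch only: list replication status for the replicas of the newyork table.
    # When nothing remains pending for the sanfrancisco replica, the tables are
    # in sync and client applications can be redirected back.
    maprcli table replica list -path /mapr/newyork/customers -json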
The method that HPE Ezmeral Data Fabric Database uses to resolve conflicts depends on the type of table involved in a multi-master replication scenario.
One of these methods uses memcmp to compare the data that is being modified by the conflicting changes, favoring the greater value. For example, if conflicting changes write the values apple and banana to the same cell, banana is kept because it compares greater byte for byte.
Normally, delete operations are purged after the affected table cells are updated. Whereas the result of an update is saved in a table until another change overwrites or deletes it, the result of a delete is not saved. In multi-master replication, this difference can lead to tables becoming unsynchronized.
For example, suppose that these two operations take place one second apart:
1. On /mapr/sanfrancisco/customers, put row A at 10:00:00 AM.
2. On /mapr/newyork/customers, delete row A at 10:00:01 AM.
On /mapr/sanfrancisco/customers, the order of operations is:
1. Put row A (the original operation).
2. Delete row A (replicated from /mapr/newyork/customers.)
On /mapr/newyork/customers, the order of operations is:
1. Delete row A (the original operation).
2. Put row A (replicated from /mapr/sanfrancisco/customers.)
Now, though the put happened on /mapr/sanfrancisco/customers at
10:00:00 AM, the put reaches /mapr/newyork/customers several seconds
after that. Assume that the actual time the put arrives at
/mapr/newyork/customers is 10:00:03 AM.
To ensure that both tables stay synchronized, /mapr/newyork/customers
should preserve the delete until after the put is replicated. Then, the delete can be
applied after the put. Therefore, the time-to-live for the delete should be at least
long enough for the put to arrive at /mapr/newyork/customers. In this
case, the time-to-live should be at least 3 seconds.
In general, the time-to-live for deletes should be greater than the amount of time that
it takes replicated operations to reach replicas. By default, the value is 24 hours.
Configure the value with the -deletettl parameter in the
maprcli table edit command.
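A minimal sketch of raising this value on the table that receives the late-arriving put, assuming that -deletettl is specified in seconds (so the 24-hour default corresponds to 86400); confirm the unit in the maprcli table edit reference for your release:
    # Sketch only: extend the time-to-live for deletes to 48 hours
    # (172800 seconds, assuming the parameter takes seconds).
    maprcli table edit -path /mapr/newyork/customers -deletettl 172800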
For example, suppose (to extend the scenario above) that you pause replication during
weekdays and resume it on weekends. The put takes place on
/mapr/sanfrancisco/customers at 10:00:00 AM on Monday morning, and the delete takes
place on /mapr/newyork/customers at 10:00:01 AM. Replication does not
resume until 12:00:00 AM Saturday morning. Given the volume of operations to be
replicated and the potential for network problems, it is possible that these operations
will not be replicated until Sunday. In this scenario, a value of 7 days for
-deletettl (7 multiplied by 24 hours) should provide sufficient
margin.
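Expressed the same way as in the earlier sketch, 7 days is 7 x 24 x 3600 = 604800 seconds, so (again assuming that -deletettl takes a value in seconds) the setting would look like this:
    # Sketch only: set the time-to-live for deletes to 7 days (604800 seconds).
    maprcli table edit -path /mapr/newyork/customers -deletettl 604800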