Chapter 5 - Replication

Replication

Copy data on multiple machines.

Why?

* Keep data close to users
* Redundancy = keep working even if some machines fail
* Scale read queries

How?

The problem comes when data changes occur. How to apply these changes on the replicas?

Types of replication:

* Single leader (master/slave)
* Multi leader (master/master)
* Leaderless

Single leader replication

Write operations happen on the leader (master) only.

  1. Master recieves instruction to write
  2. Master writes locally
  3. Master sends the data change in replication log
  4. Slaves take log entry and update their local copy

Read operations can happen on the leader or any follower (slave).

This design is built-in on most relational DB and also on some Message Brokers (Kafka, RabbitMQ).

When does this replication happen?

* Synchronously
* Asynchronously

Follower 1 is synchronous, Follower 2 is asynchronous.

Leader has to wait for Follower 1 "ok" before taking any other write operation.

You will want to have one synchronous and many asynchronous followers. If the synchronous comes to fault, bring one of the asynchronous to synchronous.

What happens when the leader is at fault?

One of the followers have to be promoted as leader, and clients have to update to direct their write operations to the new leader and other followers need to start following the new leader. This is called failover.

Asynchronous followers are not necessarily up to date with the leader, creating eventual consistency as they will eventually catch up in the future (if there are less write operations on the leader for example).

Multi leader replication

Each leader acts simultaneously as a follower of the other leader(s).

Applies when:

* Multiple data centers, one leader in each, replicating changes on the other leaders. Each having followers locally only.
* Offline needs (example of the calendar on your mobile which accepts writes while offline and syncs it to the server when back online)
* Collaborative editing (e.g. Google Docs)

Problem:

* Allowing writes simultaneously on different machines introduces conflicts: Resolving conflicts! 

Leaderless replication

Also known as Dynamo style as it's been popularised again by Amazon's DynamoDB.

Essentially in leaderless, each node accepts writes.

However, there is no failover as with leader-based approaches. So when a node is down, nothing will inform it that it missed something.

In the image above, the client wait for 2 nodes to acknowledge the write operation (set user picture).

Interestingly, replica 3 is not updated and won't be by any of the other replicas. However, on a read operation, client also waits for several nodes to reply (replica 3 replies with stale data but the others are up to date as we can see with the version numbers).

Eventually during that read operation, as the client sees that replica 3 is out of date, it can ask for an update, this is a read repair and it works great when there are many reads.

Another option to keep every node updated is the Anti-entropy process which runs in the background and copies misses from a node to another.

Ensure read and write operations are trustable:

* `w + r > n` where
    * `w` is the number of nodes which need to acklowledge write operations
    * `r` is the number of nodes which need to acklowledge write operations
    * `n` is the number of nodes in total
* Commonly we set:
    * `n` as an odd number
    * `w` = `r` = `(n + 1)/2`

results matching ""

    No results matching ""