Partitioning
Partitioning and Replication are usually combined. When doing so, each node is a leader for some partitions and a follower for others. Partitions don't have to be replicated on every node so that node 1 could contain partitions A, B, C and node 2 partitions B, D, E, etc...
Partitions are usually a subset of a data set, it's still a database on its own.
How to dispatch the shards equally then so that some nodes don't have to deal with more queries than others and the load is then equally distributed?
Partitioning of key-value stores
By key range
Just like your encyclopedia.
With encyclopedia you search by key, same as in key-value stores.
- Allows efficient range queries.
- Load is not necessarily evenly distributed
For time series, such as sensors collection, prefix the timestamp key by the sensor's name.
By hash of key
Fairly common in distributed systems.
The key is non-cryptographically hashed (with MD5 or Fowler-Noll-Vo).
- This ensures entries are evenly distributed (and so does load)
- Does not permit range queries, and all nodes have to be queried when querying by primary key.
Partitioning with Secondary Indexes
Such as all cars whose colour is red. Index is
color:red
.
By document
Each partition maintains its own secondary index, covering only the documents in that partition.
You must then query all partitions, wait for them all to return in order to have a consistent response.
For the secondary index
color
,
- Partition 0 could have ids
123
,245
and657
withcolor:red
, ids23
and345
withcolor:black
- Partition 1 could have ids
745
,888
withcolor:red
and nocolor:black
.considering Partition 0 keeps track of ids [0, 700] and Partition 1 ids [700, 1000].
This is called local indexing.
By term
The indexes are now held globally, not per partition. However, it is itself also partitioned.
Hence, color
will be distributed so that colours from a
to c
would end up in Partition 0, and from d
to m
in Partition 1, etc. They would still keep track of document ids whatever the partition they are actually attached to. So an index in Partition 0 can contain a document which is in Partition 1.
This is called global indexing.
Rebalancing
When adding / removing / operating on nodes.
Make sure you have multiple shards / partitions on each node.
Request routing
Options:
* Connect to any node, if it's the one which has the requested data, serve it, otherwise forward to the other node which has it.
* Connect to a load balancer which only role is to dispatch
* Require from client to be aware of the partitions assignments to nodes
Many rely on separate coordination service such as Zookeeper:
Nodes self-register with Zookeeper and the routing tier or the client can subscribe to changes.