8 Sharding And Consistent Hashing
Sharding
The process of splitting a dataset into partitions.
- lends itself data parallelism
Hashing
Uniform hash functions map inputs uniformly across the hash output space.
Naive approach:
H(key) % number_of_partitions
Lots of costly shard migration when server crash or added!
To solve this:
Consistent Hashing
- The hash output space is viewed as a “ring”
- Servers are allocated to some location on the hash output space ring
- The server is responsible for a piece of data is the next server clockwise from H(key)
However, its possible for shards to be unbalanced
- decrease granularity of the segments
- having each physical server represented multiple times on the hash output ring as a token