What is it?
A distributed, sharded Key-Value store inspired by DynamoDB and Etcd. It is capable of handling node failures transparently and maintaining strict consistency guarantees using the Raft Consensus Algorithm.
System Design
The system is composed of several components:
- Gateway Service: Routes client requests to the correct shard.
- Shard Masters: Maintain the mapping of keys to shards.
- Data Nodes: Store the actual data and replicate logs.
Leader Election
One of the most complex parts of Raft is Leader Election. When a follower doesn't hear from a leader within a randomized timeout, it becomes a candidate.
func (rf *Raft) ticker() {
for !rf.killed() {
// Check if we received a heartbeat
if time.Since(rf.lastHeartbeat) > rf.electionTimeout {
rf.startElection()
}
time.Sleep(10 * time.Millisecond)
}
}
Features
- Sharding: Consistent hashing ring for even data distribution.
- Replication: 3x replication factor for fault tolerance.
- Strong Consistency: Linearizability for all read/write operations.
Challenges
Debugging distributed systems is notoriously difficult due to non-determinism. I built a discrete-event simulator to test network partitions, packet drops, and random node crashes to verify the correctness of the consensus logic.