Distributed Systems

Some notes.

Resources:

Algorithms

  • RAFT is a consensus algorithm for distributed systems. It uses naming of Election, Terms, and Leader to model how state is propagated through a distributed system. etcd is an implementation of RAFT.
  • Gossip - not a consensus algorithm explicitly, but a model for how to distribute updates through a system.

CAP Theorem

In distributed systems literature, we have a concept called CAP theorem, also known as Brewer’s theorem. The argument of the theorem is quite simple:

A distributed system can only offer two out of these three things.

Consistency - Every read receives the most recent write or an error. Availability - Every request receives a (non-error) response, without the guarantee that it contains the most recent write. Partition tolerance - The system is scaled up to multiple partitions, and the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

Despite the “two out of three” idea, the third one (partition tolerance) is a defining aspect of a “distributed system”. If it’s not partition tolerant, then it’s not distributed. So, in practice, really it comes down to choosing consistency vs. availability.

Hence, why we hear the term “eventual consistency”. Usually people choose availability for end-user-facing distributed systems.

Dual Writes Problem

Doesn’t necessarily warrant an entire section in this notes doc, but this GH Repo covers it well: here

Basically, if you see this type of pattern:

await database.save(user);
await eventBus.publish(userCreatedEvent);

This is the dual writes problem. Anything can happen between those two operations — some other use could be created with the same username via a different system, a system could crash, or other thing happen causing the second operation to fail.

One good(depending) way around this is to ensure the event consumer is idempotent — the event consumer can run as many times as needed without causing something unexpected (or change in data) in the system. The same input -> the same output, never depending on “run n times”.

end of storey Last modified: