CRDT: Conflict-Free Data Structures Guide
Imagine a collaborative text editor where dozens of users type at the same time, yet the document never ends up in a corrupted state. That magic happens thanks to Conflict‑Free Replicated Data Types, or CRDTs. In this guide we’ll demystify the theory, walk through concrete Python implementations, and explore where CRDTs shine in production systems.
What Is a CRDT?
A CRDT is a data structure designed for distributed environments where replicas can diverge and later converge without any central coordinator. The key guarantee is *strong eventual consistency*: if no new updates are made, all replicas will eventually hold the same value, regardless of the order in which updates were applied.
CRDTs achieve this by embedding enough metadata in each operation to resolve conflicts deterministically. Unlike operational transformation (OT), which requires a server to reorder edits, CRDTs let every node apply updates locally and still reach a consistent state.
Two Families of CRDTs
- State‑based (or convergent) CRDTs: Nodes periodically exchange their whole state and merge using a commutative, associative, idempotent function.
- Operation‑based (or commutative) CRDTs: Nodes broadcast individual operations; each operation must be delivered exactly once and be inherently commutative.
Both families obey the same mathematical properties, but they differ in network overhead and implementation complexity.
Types of CRDTs
CRDTs come in many flavors, each tailored to a specific data model. Below are the most common categories you’ll encounter.
Registers
A register holds a single value. The classic example is a *last‑writer‑wins* (LWW) register, which resolves conflicts by comparing timestamps or logical clocks.
Counters
Counters are integer values that support increment and decrement operations. The most popular design is the *PN‑Counter* (Positive‑Negative Counter), which keeps separate grow‑only counters for increments and decrements.
Sets
Sets can be either *grow‑only* (G‑Set) or *add‑remove* (2‑P‑Set, OR‑Set). An OR‑Set (Observed‑Removed Set) tags each element with a unique identifier, allowing precise removal without resurrecting stale entries.
Sequences
Sequences model ordered collections like text buffers. The *RGA* (Replicated Growable Array) and *Logoot* algorithms assign immutable identifiers to characters, enabling concurrent inserts and deletes.
How CRDTs Work Under the Hood
All CRDTs rely on three mathematical properties:
- Commutativity: Applying operation A then B yields the same state as B then A.
- Associativity: Grouping operations does not affect the final result.
- Idempotence: Re‑applying the same operation has no additional effect.
These properties guarantee that merging states or replaying operations is safe, even when messages are delayed, reordered, or duplicated.
Version Vectors & Causal Context
Many CRDTs embed a version vector (or dot) with each update. A dot is a tuple (node_id, counter) that uniquely identifies the event. When two replicas exchange data, they compare dots to decide which updates are missing, ensuring *causal delivery* without a central sequencer.
Merge Functions
For state‑based CRDTs, the merge function is often a simple lattice join. For example, a G‑Set merges by taking the union of its element sets. Because union is commutative, associative, and idempotent, the merge satisfies the CRDT contract.
Building a Simple CRDT in Python
Let’s implement an **Observed‑Removed Set (OR‑Set)** from scratch. This example demonstrates the core ideas—unique tags, add/remove operations, and a deterministic merge.
import uuid
from collections import defaultdict
class ORSet:
def __init__(self):
# Mapping: element -> set of unique tags (adds)
self.adds = defaultdict(set)
# Mapping: element -> set of unique tags (removes)
self.removes = defaultdict(set)
def _new_tag(self):
"""Generate a globally unique tag."""
return uuid.uuid4().hex
def add(self, element):
tag = self._new_tag()
self.adds[element].add(tag)
return (element, tag) # operation payload
def remove(self, element):
# Remove *all* known tags for the element
tags = self.adds.get(element, set())
self.removes[element].update(tags)
return (element, tags) # operation payload
def lookup(self, element):
"""True if there exists an add‑tag not removed."""
return bool(self.adds[element] - self.removes[element])
def value(self):
"""Current set view."""
return {e for e in self.adds if self.lookup(e)}
def merge(self, other):
"""Merge another ORSet into this one."""
for elem, tags in other.adds.items():
self.adds[elem].update(tags)
for elem, tags in other.removes.items():
self.removes[elem].update(tags)
Usage is straightforward: each replica creates an ORSet, calls add or remove, and broadcasts the returned payload. On receipt, the replica applies the same operation locally and periodically runs merge with any peer’s full state.
Now, let’s see a **PN‑Counter** that demonstrates how two grow‑only counters combine into a bi‑directional counter.
class PNCounter:
def __init__(self, node_id):
self.node_id = node_id
self.p = defaultdict(int) # increments per node
self.n = defaultdict(int) # decrements per node
def inc(self, n=1):
self.p[self.node_id] += n
def dec(self, n=1):
self.n[self.node_id] += n
def value(self):
return sum(self.p.values()) - sum(self.n.values())
def merge(self, other):
for nid, cnt in other.p.items():
self.p[nid] = max(self.p[nid], cnt)
for nid, cnt in other.n.items():
self.n[nid] = max(self.n[nid], cnt)
Both examples are *operation‑based* in spirit because each method returns a minimal payload that can be sent over the wire. The merge functions are pure state‑based joins, making them safe to call repeatedly without side effects.
Real‑World Use Cases
CRDTs power many modern collaborative and offline‑first applications. Below are three domains where they provide a decisive advantage.
- Collaborative Editing: Google Docs, Notion, and Figma use CRDT‑style algorithms to merge concurrent edits without a single point of failure.
- Distributed Caching: Systems like Redis CRDT module and AntidoteDB store counters, sets, and maps that remain consistent across geo‑replicated clusters.
- Offline Mobile Apps: Evernote and Microsoft OneNote let users edit notes offline; CRDTs reconcile changes the moment connectivity returns.
In each case the core benefit is *low latency*: users see their own updates instantly, while the system quietly propagates changes in the background.
Performance Considerations & Trade‑offs
CRDTs are not a silver bullet. Their metadata can grow linearly with the number of operations, especially for sets that retain tombstones. Choosing the right variant (e.g., using a *compact* OR‑Set with garbage collection) can mitigate this.
Network traffic also differs between families. State‑based CRDTs send the whole replica state, which can be expensive for large structures, but they are simple to implement. Operation‑based CRDTs send tiny deltas, but they rely on reliable, exactly‑once delivery—often requiring an additional transport layer like ACK‑based gossip.
Finally, latency guarantees vary. Because every replica can apply updates locally, write latency is near‑zero. However, read latency may be higher if a query must reconcile multiple pending updates before returning a stable result.
Pro Tips for Integrating CRDTs
Tip 1 – Use Compact Identifiers: UUIDs are easy but bulky. For high‑throughput systems, generate 64‑bit Lamport timestamps combined with a node ID to keep tags small.
Tip 2 – Prune Tombstones Periodically: Implement a background job that removes tags older than a configurable “retention window” once you’re sure all replicas have seen them.
Tip 3 – Combine CRDTs with Version Vectors: When you need strong causal ordering (e.g., for complex data structures), store a version vector alongside each payload and discard duplicates early.
Tip 4 – Test Under Real Network Conditions: Simulate message loss, reordering, and partitions. A well‑behaved CRDT will still converge, but performance characteristics can change dramatically.
Conclusion
CRDTs turn the daunting problem of distributed consistency into a set of well‑understood mathematical guarantees. By embedding enough metadata to resolve conflicts deterministically, they enable low‑latency, offline‑first, and highly available applications without a central coordinator.
We covered the theory behind state‑ and operation‑based CRDTs, explored common data structures like registers, counters, sets, and sequences, and built two practical Python examples that you can drop into your own projects. Real‑world systems already rely on CRDTs for collaborative editing, geo‑replicated caches, and mobile offline sync.
When you evaluate a new distributed feature, ask yourself whether eventual consistency is acceptable and whether the added metadata overhead fits your performance budget. If the answer is “yes,” CRDTs are often the cleanest, most resilient solution.