CRDTs: Conflict-Free Replicated Data Types

Learn how CRDTs enable strongly consistent eventual consistency in distributed databases. Explore G-Counters, PN-Counters, LWW-Registers, OR-Sets, and practical applications.

published: reading time: 14 min read

CRDTs: Conflict-Free Replicated Data Types for Distributed Systems

Distributed systems have a conflict problem. When multiple nodes accept writes concurrently and then synchronize, those writes might contradict each other. Traditional approaches require coordination: you need a node to decide which write wins. CRDTs take a different approach. They are data structures mathematically designed to merge concurrent updates without conflicts, no coordination required.

I first encountered CRDTs when debugging a collaborative editing tool. Users on three continents were editing the same document simultaneously. The system was using operational transformation, which required a central server to serialize operations. When that server went down, edits diverged catastrophically. We switched to a CRDT-based approach and the problem vanished. No coordinator. No conflicts. Edits from any node merge cleanly.

The Problem CRDTs Solve

Traditional conflict resolution requires either:

  1. Coordination: A central node sequences all writes. This is a bottleneck and a single point of failure.
  2. Application-level resolution: The application developer writes complex merge logic. This is error-prone.
  3. Last-write-wins: The timestamp decides. This loses data and produces unexpected results.

CRDTs offer a fourth option: the data structure itself guarantees conflict-free merging.

graph TD
    subgraph Without CRDT
        N1[Node A] --> W1[Write: x=1]
        N2[Node B] --> W2[Write: x=2]
        M1[Merge] -->|conflict| X[???]
    end

    subgraph With CRDT
        N3[Node A] --> W3[Write: x=1]
        N4[Node B] --> W4[Write: x=2]
        M2[Merge] --> R[Both values preserved]
    end

What Makes a CRDT

A CRDT is a data type where any two valid states can be merged and the result is also a valid state. This property is called commutativity: the order of merging does not matter.

Mathematically, CRDTs come in two flavors:

  • CmRDT (Commutative Replicated Data Types): Operations commute
  • CvRDT (Convergent Replicated Data Types): States converge when merged

In practice, most implementations use CvRDTs because they are easier to reason about. The state itself is designed to merge cleanly.

G-Counter: Grow-Only Counter

The simplest CRDT. It only increments. No decrements. This works for things like page view counts where you never need to subtract.

class GCounter:
    """
    Grow-only counter. Each node maintains its own count.
    Merging takes the maximum for each node.
    """
    def __init__(self):
        self.counts = {}  # node_id -> count

    def increment(self, node_id):
        self.counts[node_id] = self.counts.get(node_id, 0) + 1

    def merge(self, other: 'GCounter'):
        for node_id, count in other.counts.items():
            self.counts[node_id] = max(
                self.counts.get(node_id, 0),
                count
            )

    def value(self) -> int:
        return sum(self.counts.values())
graph TD
    N1[Node A: 5] --> M[Merge]
    N2[Node B: 3] --> M
    M --> R[Total: 8]

PN-Counter: Positive-Negative Counter

A PN-Counter extends G-Counter to support both increments and decrements. It maintains two G-Counters: one for increments, one for decrements.

class PNCounter:
    """
    Positive-Negative counter. Supports both increments and decrements.
    Uses two G-Counters internally.
    """
    def __init__(self):
        self.positive = GCounter()  # increments
        self.negative = GCounter()  # decrements

    def increment(self, node_id, amount=1):
        for _ in range(amount):
            self.positive.increment(node_id)

    def decrement(self, node_id, amount=1):
        for _ in range(amount):
            self.negative.increment(node_id)

    def merge(self, other: 'PNCounter'):
        self.positive.merge(other.positive)
        self.negative.merge(other.negative)

    def value(self) -> int:
        return self.positive.value() - self.negative.value()

A use case: a shopping cart where you add and remove items. The PN-Counter tracks the cart size even when concurrent updates happen.

LWW-Register: Last-Write-Wins Register

A register holds a single value. When merging, the write with the latest timestamp wins.

import time
from dataclasses import dataclass

@dataclass
class LWWRegister:
    value: any
    timestamp: float
    node_id: str

class LWWMap:
    """
    Last-Write-Wins map. Each key has a (value, timestamp, node_id) tuple.
    On merge, the entry with highest timestamp wins.
    Ties broken by node_id.
    """
    def __init__(self):
        self.entries = {}  # key -> LWWRegister

    def set(self, key, value, timestamp=None, node_id=None):
        self.entries[key] = LWWRegister(
            value=value,
            timestamp=timestamp or time.time(),
            node_id=node_id
        )

    def get(self, key):
        return self.entries.get(key)

    def merge(self, other: 'LWWMap'):
        for key, reg in other.entries.items():
            if key not in self.entries:
                self.entries[key] = reg
            else:
                # Compare timestamps, then node_id for ties
                ours = self.entries[key]
                if (reg.timestamp > ours.timestamp or
                    (reg.timestamp == ours.timestamp and reg.node_id > ours.node_id)):
                    self.entries[key] = reg
sequenceDiagram
    participant N1 as Node A
    participant N2 as Node B
    participant M as Merge

    N1->>N1: set(x=1, ts=100)
    N2->>N2: set(x=2, ts=101)
    N1->>M: state: x=(1, 100, A)
    N2->>M: state: x=(2, 101, B)
    M->>M: ts=101 > ts=100, keep B
    Note over M: Result: x=2

OR-Set: Observed-Remove Set

An OR-Set lets you add and remove elements. Concurrent adds and removes are handled correctly.

class ORObject:
    """
    Observed-Remove Set entry.
    Each (tag, value) pair can be added and removed.
    """
    def __init__(self, value, tag):
        self.value = value
        self.tag = tag

class ORSet:
    """
    Observed-Remove Set. Elements have unique tags.
    - add(tag, value): Adds element with unique tag
    - remove(tag): Marks element as removed
    - merge(): Union of adds, subtracts removes
    """
    def __init__(self):
        self.adds = {}  # tag -> ORObject
        self.removes = set()  # tags that were removed

    def add(self, value, tag=None):
        tag = tag or str(uuid.uuid4())
        self.adds[tag] = ORObject(value, tag)
        return tag

    def remove(self, tag):
        self.removes.add(tag)

    def contains(self, tag):
        return tag in self.adds and tag not in self.removes

    def merge(self, other: 'ORSet'):
        # Union of adds
        for tag, obj in other.adds.items():
            if tag not in self.removes:
                self.adds[tag] = obj

        # Union of removes
        self.removes.update(other.removes)

    def elements(self):
        return [obj.value for tag, obj in self.adds.items()
                if tag not in self.removes]

OR-Sets are good for collaborative lists, shopping carts, and anywhere items are added and removed concurrently.

Tombstone accumulation is a practical issue with basic OR-Set implementations. When you remove an element, the tag lands in the removes set but never gets cleaned up. Run enough add/remove cycles and the removes set grows without bound. Picture this: add and remove a user 100 times across 5 nodes, and you now have 500 tombstone entries even though the set looks empty to anyone reading it. This is especially painful in memory-constrained environments. RGA and other advanced variants address this differently, using garbage collection or mergeable delete flags instead of permanent tombstones.

RGA: Replicated Growable Array

The RGA is a CRDT for ordered sequences. It handles concurrent inserts at the same position correctly.

class RGANode:
    def __init__(self, value, timestamp, id=None):
        self.value = value
        self.timestamp = timestamp
        self.id = id or str(uuid.uuid4())

class RGA:
    """
    Replicated Growable Array. Uses timestamps and IDs
    for ordering. Concurrent inserts at the same position
    are ordered by ID to ensure determinism.
    """
    def __init__(self):
        self.nodes = []  # Ordered list of RGANodes

    def insert_after(self, after_id, value):
        node = RGANode(value, time.time())
        # Insert at position after after_id
        pos = self._find_position(after_id)
        self.nodes.insert(pos + 1, node)

    def _find_position(self, after_id):
        for i, n in enumerate(self.nodes):
            if n.id == after_id:
                return i
        return -1  # Insert at beginning if not found

    def merge(self, other: 'RGA'):
        # Merge by timestamp+ID ordering
        all_nodes = {n.id: n for n in self.nodes}
        all_nodes.update({n.id: n for n in other.nodes})

        # Re-sort by timestamp, then by ID
        self.nodes = sorted(
            all_nodes.values(),
            key=lambda n: (n.timestamp, n.id)
        )

Where CRDTs Are Used

Collaborative Editing

Figma uses CRDTs for real-time collaboration. Each edit is an operation that can be merged without conflicts. Multiple users can edit the same document simultaneously.

Distributed Databases

Riak uses CRDTs for certain data types. Cassandra’s lightweight transactions use a form of CRDT-like conflict resolution.

Messaging Systems

WhatsApp uses CRDTs for message synchronization. When you send a message offline, it is stored locally. When connectivity returns, it is merged with the server state without conflicts.

Shopping Carts

Amazon’s shopping cart reportedly uses CRDTs. Your cart is replicated across nodes. Concurrent modifications (add from phone, remove from desktop) merge correctly.

Implementing CRDTs in Practice

Most languages have CRDT libraries. For Python:

# Using the crdt library
from crdt import GCounter, PN_Counter, LWW_Register, OR_Map

# Create counters on different nodes
node_a_counter = PN_Counter()
node_b_counter = PN_Counter()

# Each node increments independently
node_a_counter.increment('node_a')
node_b_counter.increment('node_b')

# Merge when nodes communicate
node_a_counter.merge(node_b_counter)

# Result is the sum of all increments
print(node_a_counter.value)

The Limits of CRDTs

CRDTs are powerful but not a silver bullet.

Memory overhead: CRDTs maintain metadata (timestamps, tags, per-node state). A G-Counter with 1000 nodes needs 1000 counters even if most are zero.

Not all data types have CRDT forms: You cannot build a CRDT for a set with strong consistency guarantees on contains(). Some data just does not have a conflict-free representation.

Semantic limitations: LWW-Register loses data by design. If two concurrent writes happen, one value disappears. CRDTs do not solve the problem of data loss; they just make the loss predictable.

# CRDT does not preserve both values
node_a.set('x', 'hello')
node_b.set('x', 'world')
# After merge: only one survives
# Which one depends on timestamps, not intent

Timestamps are unreliable: Clocks skew across machines. Hybrid Logical Clocks (HLC) help, but timestamps are never perfect.

CRDTs vs Operational Transformation

Both solve collaborative editing, but differently.

AspectCRDTOperational Transformation
CoordinationNoneRequired for some transforms
ComplexitySimple data structuresComplex transformation functions
Server dependencyNot neededOften needed for correctness
MemoryGrows with historyCan be compact

For new systems, CRDTs are usually the better choice. They are simpler and do not require a central server.

CRDT Type Comparison

CRDT TypeOperationsMerge StrategyMemory OverheadUse Cases
G-CounterIncrement onlyMax per nodeO(N) countersPage views, vote counts
PN-CounterIncrement, DecrementMax for pos/negO(2N) countersShopping cart qty, bank balance
LWW-RegisterSetLatest timestamp winsO(1)User preferences, last-write-wins
OR-SetAdd, RemoveUnion of adds, subtracts removesO(A+R) entriesShopping carts, collaborative lists
RGAInsert, DeleteTimestamp+ID orderingO(N) nodesText editing, ordered sequences

Production Libraries

Most languages have CRDT library support:

# Python: pycrdt library
from pycrdt import Counter, LwwMap, ORMap

# Create a shared document
doc = pycrdt.Document()

# Add CRDTs to the document
counter = doc.get_or_create('counter', pycrdt.Int())
counter.increment()
// JavaScript/TypeScript: Yjs library
import * as Y from "yjs";

const doc = new Y.Doc();
const text = doc.getText("editor");
const map = doc.getMap("preferences");

// Automatic conflict resolution
text.insert(0, "Hello");
LibraryLanguageBest For
YjsJavaScript/TypeScriptCollaborative text editing
AutomergeJavaScript, Rust, PythonGeneral-purpose CRDT documents
pycrdtPythonPython-based systems
riak_dtErlangDistributed databases
crdtRubyRuby-based systems

2P-Set: Add-Only-Then-Remove-Once

A Two-Phase Set (2P-Set) is stricter than an OR-Set: you can add freely, but once something is removed it cannot be re-added. Think feature flags, configuration keys, or any domain where deprecation is permanent.

The two phases are straightforward. During the add phase, elements land in AddSet. During the remove phase, they move to RemoveSet (a tombstone). Merge unions both sets. An element is present only when it is in AddSet and not in RemoveSet.

class TwoPhaseSet:
    """2P-Set: adds go to AddSet, removes go to RemoveSet. Once removed, never returns."""
    def __init__(self):
        self.add_set = set()      # Elements that have been added
        self.remove_set = set()  # Elements that have been removed (tombstones)

    def add(self, value):
        if value not in self.remove_set:
            self.add_set.add(value)

    def remove(self, value):
        self.remove_set.add(value)

    def contains(self, value):
        return value in self.add_set and value not in self.remove_set

    def merge(self, other: 'TwoPhaseSet'):
        self.add_set = self.add_set.union(other.add_set)
        self.remove_set = self.remove_set.union(other.remove_set)

Good fit: versioned config where you deprecate keys and never re-enable them. Roll out a feature flag, then kill it permanently. A 2P-Set handles this correctly even if someone tries to add the flag again concurrently.

Cart-Document CRDT: Shopping Cart Without Conflicts

Shopping carts have to deal with concurrent adds, removes, and quantity updates from multiple devices hitting the same cart. A Cart-Document CRDT models this as a document of items, each with a product ID, quantity, and timestamp.

from dataclasses import dataclass
import time

@dataclass
class CartItem:
    product_id: str
    quantity: int
    timestamp: float

class CartDocument:
    """Shopping cart as a CRDT document. Concurrent adds/updates/remove all merge cleanly."""
    def __init__(self):
        self.items = {}  # product_id -> CartItem

    def add_item(self, product_id: str, quantity: int, timestamp: float = None):
        timestamp = timestamp or time.time()
        if product_id not in self.items:
            self.items[product_id] = CartItem(product_id, quantity, timestamp)
        elif timestamp > self.items[product_id].timestamp:
            self.items[product_id] = CartItem(product_id, quantity, timestamp)

    def remove_item(self, product_id: str, timestamp: float = None):
        timestamp = timestamp or time.time()
        if product_id in self.items:
            if timestamp > self.items[product_id].timestamp:
                self.items[product_id] = CartItem(product_id, 0, timestamp)

    def update_quantity(self, product_id: str, quantity: int, timestamp: float = None):
        timestamp = timestamp or time.time()
        if product_id in self.items and timestamp > self.items[product_id].timestamp:
            self.items[product_id] = CartItem(product_id, quantity, timestamp)
        elif product_id not in self.items:
            self.items[product_id] = CartItem(product_id, quantity, timestamp)

    def merge(self, other: 'CartDocument'):
        for product_id, other_item in other.items.items():
            if product_id not in self.items:
                self.items[product_id] = other_item
            else:
                our_item = self.items[product_id]
                # Take max quantity per product so neither addition gets lost
                merged_quantity = max(our_item.quantity, other_item.quantity)
                # Latest timestamp wins for everything else
                if other_item.timestamp > our_item.timestamp:
                    self.items[product_id] = CartItem(
                        product_id,
                        merged_quantity,
                        other_item.timestamp
                    )
                else:
                    self.items[product_id] = CartItem(
                        product_id,
                        merged_quantity,
                        our_item.timestamp
                    )

    def get_items(self):
        return [item for item in self.items.values() if item.quantity > 0]

The merge takes the max quantity per product_id so you never lose an item that was added on another device. Everything else uses timestamp-based last-write-wins.

JSON CRDT: Nested Structures Without Conflicts

Apps need to sync nested JSON-like structures: maps with string keys, arrays with ordered elements, scalars. Automerge and Yjs handle this by treating every operation (set, insert, delete) as a timestamped, actor-tagged unit that merges deterministically.

Automerge stores documents as operation trees. Each op carries a timestamp and actor ID. When merging, it computes the transitive closure of all ops and applies them in deterministic order (actor ID breaks ties).

Yjs takes a similar path, giving every array element and map key a unique ID. Concurrent changes resolve like this:

  • Scalar values (strings, numbers, booleans): latest timestamp wins
  • Map keys: latest timestamp wins per key
  • Array elements: inserts get unique IDs; concurrent inserts at the same spot ordered by ID
  • Nested objects: recursive merge, where each sub-object is itself a CRDT
class JSONScalar:
    """A scalar value with LWW semantics."""
    def __init__(self, value, timestamp, actor):
        self.value = value
        self.timestamp = timestamp
        self.actor = actor

    def merge(self, other):
        if other.timestamp > self.timestamp:
            return other
        elif other.timestamp == self.timestamp and other.actor > self.actor:
            return other
        return self

class JSONMap:
    """A JSON object/map with LWW semantics per key."""
    def __init__(self):
        self.fields = {}  # key -> JSONValue

    def set(self, key, value, timestamp, actor):
        self.fields[key] = JSONScalar(value, timestamp, actor)

    def get(self, key):
        return self.fields.get(key)

    def merge(self, other: 'JSONMap'):
        for key, other_val in other.fields.items():
            if key not in self.fields:
                self.fields[key] = other_val
            else:
                self.fields[key] = self.fields[key].merge(other_val)

class JSONArray:
    """A JSON array with CRDT semantics for concurrent inserts."""
    def __init__(self):
        self.elements = []  # List of (id, JSONValue)

    def insert(self, id, value, timestamp, actor):
        self.elements.append((id, JSONScalar(value, timestamp, actor)))

    def merge(self, other: 'JSONArray'):
        # Union of elements, sorted by (timestamp, actor) for determinism
        other_map = {e[0]: e[1] for e in other.elements}
        self_map = {e[0]: e[1] for e in self.elements}
        self_map.update(other_map)
        self.elements = sorted(
            [(id, val) for id, val in self_map.items()],
            key=lambda x: (x[1].timestamp, x[1].actor)
        )

Both Automerge and Yjs use structural sharing to avoid copying entire documents on every change. When you modify a deeply nested field, only the path to that field is copied; sibling branches are shared with the previous version. This makes CRDT documents efficient even when histories grow long.

Quick Recap

  • CRDTs merge concurrent updates without conflicts
  • G-Counter: increment-only counting
  • PN-Counter: bidirectional counting
  • LWW-Register: timestamp-based value selection
  • OR-Set: add/remove collections
  • CRDTs trade memory and semantics for coordination-free merging
  • Not all data types have CRDT representations
  • Use for: collaborative editing, distributed caches, offline-first apps

For more on eventual consistency, see Event-Driven Architecture. For conflict resolution patterns, see Consistency Models. For distributed data patterns, see Database Replication.

CRDTs are a powerful tool for building eventually consistent systems without coordination overhead. They are not right for every problem, but when you need multiple nodes to accept writes independently and merge cleanly, CRDTs provide mathematical guarantees that hand-rolled merge logic cannot match.

Category

Related Posts

Asynchronous Replication: Speed and Availability at Scale

Learn how asynchronous replication works in distributed databases, including eventual consistency implications, lag monitoring, and practical use cases where speed outweighs strict consistency.

#distributed-systems #replication #eventual-consistency

Multi-Paxos

Multi-Paxos extends basic Paxos to achieve consensus on sequences of values, enabling practical replicated state machines for distributed systems and databases.

#distributed-systems #consensus #paxos

Paxos Consensus Algorithm

Paxos is a family of consensus algorithms for achieving agreement in distributed systems despite failures, pioneered by Leslie Lamport.

#distributed-systems #consensus #paxos