System Design #9: Design a Collaborative Text Editor

Hey folks, Rahul here 👋

Google Docs, Notion, Figma — real-time collaboration is the feature that separates toys from tools. And it's arguably the hardest frontend system design problem you'll encounter.

The core question is deceptively simple: "Two people type at the same time. How do both documents converge to the same state?" The answer involves either Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs), and most candidates can't explain either. Let's fix that.

R — Requirements

Functional Requirements

Rich text editing (bold, italic, headings, lists, links, images)
Real-time collaboration: multiple users editing simultaneously
Cursor presence: see other users' cursor positions and selections
Conflict resolution: concurrent edits converge to the same state
Version history: browse and restore previous versions
Offline editing with sync on reconnect
Comments and suggestions (track changes)

Non-Functional Requirements

Latency: Local edits must feel instant (<16ms, i.e. within a single frame at 60 fps)
Convergence: All clients must reach the same document state
Scalability: Support 50+ simultaneous editors
Reliability: No data loss on crashes or network failures
Performance: Handle documents with 100K+ characters

A — Architecture

OT vs CRDT: The Core Decision

Operational Transformation (OT) — Google Docs' Approach

OT transforms operations against each other so they can be applied in any order and produce the same result.

// Starting document: "helloworld". Two users type simultaneously:
// User A: insert "X" at position 5
// User B: insert "Y" at position 3

// Without transformation (each side applies the remote op as-is):
// A sees: "helloXworld" → insert Y at 3 → "helYloXworld" (happens to be right: 3 is left of A's insert)
// B sees: "helYloworld" → insert X at 5 → "helYlXoworld" ✗ (X lands one slot too early)
// Different results!

// With OT:
// Transform A's op against B's: A inserted after B's position, so shift A right
// A' = insert "X" at position 6 (5 + 1)
// B's op needs no shift, because position 3 is left of A's insert
// Now both converge: "helYloXworld" ✓

interface Operation {
  type: 'insert' | 'delete' | 'retain';
  position: number;
  content?: string;  // For insert
  count?: number;    // For delete/retain
  userId: string;
  revision: number;  // Server revision at time of creation
}
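To make the transform step concrete, here is a minimal sketch that handles only concurrent inserts. The `InsertOp` type and the `transformInsert` / `applyInsert` helpers are hypothetical names for this example, not any library's API, and a real OT engine also has to cover deletes, retains, and server revisions.

```typescript
// Hypothetical, insert-only operation type for this sketch.
interface InsertOp {
  position: number;
  content: string;
  userId: string;
}

// Rewrite `op` so it can be applied after `applied`, a concurrent insert
// that was created against the same document state.
function transformInsert(op: InsertOp, applied: InsertOp): InsertOp {
  const appliedIsBefore =
    applied.position < op.position ||
    // Same position: break the tie on userId so both sides pick the same order.
    (applied.position === op.position && applied.userId < op.userId);
  return appliedIsBefore
    ? { ...op, position: op.position + applied.content.length }
    : op;
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.position) + op.content + doc.slice(op.position);
}

const base = "helloworld";
const opA: InsertOp = { position: 5, content: "X", userId: "A" };
const opB: InsertOp = { position: 3, content: "Y", userId: "B" };

// Site A applies its own op, then B's op transformed against A's.
const atA = applyInsert(applyInsert(base, opA), transformInsert(opB, opA));
// Site B applies its own op, then A's op transformed against B's.
const atB = applyInsert(applyInsert(base, opB), transformInsert(opA, opB));

console.log(atA, atB); // both "helYloXworld"
```

The tie-break on `userId` is one possible convention; what matters is that every site applies the same rule.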

CRDT: Yjs/Automerge Approach (Figma uses a CRDT-inspired model) ✅ (Recommended for new projects)

CRDTs assign unique IDs to every character, making merge commutative by design — no transformation needed:

// Each character has a unique, ordered ID
interface CRDTChar {
  id: { clock: number; clientId: string }; // Lamport timestamp
  value: string;
  parent: CRDTChar['id'] | null;           // Left neighbor at insertion time
  isDeleted: boolean;                        // Tombstone (never truly removed)
}

// Insertion: place between two existing characters
// The ID ordering guarantees deterministic merge
// Even if two users insert at the "same" position, 
// the client IDs break the tie deterministically
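The tie-breaking idea can be sketched with a small RGA-style sequence. This is an illustration under assumptions, not how Yjs works internally (Yjs uses its own YATA algorithm): the `Replica` class and `compareIds` helper are hypothetical, deletes are left out, and operations are assumed to arrive in causal order.

```typescript
// Mirrors the CRDTChar shape above; everything else here is hypothetical.
type CharId = { clock: number; clientId: string };
interface CRDTChar {
  id: CharId;
  value: string;
  parent: CharId | null; // Left neighbor at insertion time
  isDeleted: boolean;
}

// Total order on IDs: Lamport clock first, clientId as the tie-breaker.
function compareIds(a: CharId, b: CharId): number {
  if (a.clock !== b.clock) return a.clock - b.clock;
  return a.clientId < b.clientId ? -1 : a.clientId > b.clientId ? 1 : 0;
}

class Replica {
  private chars: CRDTChar[] = [];
  private clock = 0;
  constructor(private readonly clientId: string) {}

  // Local insert at a visible index; returns the char to broadcast to peers.
  insertLocal(index: number, value: string): CRDTChar {
    const visible = this.chars.filter((c) => !c.isDeleted);
    const parent = index > 0 ? visible[index - 1].id : null;
    const char: CRDTChar = {
      id: { clock: this.clock + 1, clientId: this.clientId },
      value,
      parent,
      isDeleted: false,
    };
    this.integrate(char);
    return char;
  }

  // Apply a local or remote insert. Assumes causal delivery: the parent
  // has already been integrated on this replica.
  integrate(char: CRDTChar): void {
    this.clock = Math.max(this.clock, char.id.clock);
    let i = 0;
    if (char.parent !== null) {
      const parentId = char.parent;
      i = this.chars.findIndex((c) => compareIds(c.id, parentId) === 0) + 1;
    }
    // Skip concurrent inserts with a larger ID so every replica picks the same slot.
    while (i < this.chars.length && compareIds(this.chars[i].id, char.id) > 0) i++;
    this.chars.splice(i, 0, { ...char });
  }

  text(): string {
    return this.chars.filter((c) => !c.isDeleted).map((c) => c.value).join("");
  }
}

const alice = new Replica("A");
const bob = new Replica("B");
for (const ch of [alice.insertLocal(0, "h"), alice.insertLocal(1, "i")]) bob.integrate(ch);

// Concurrent inserts at the same visible position, exchanged afterwards.
const fromAlice = alice.insertLocal(1, "X");
const fromBob = bob.insertLocal(1, "Y");
alice.integrate(fromBob);
bob.integrate(fromAlice);

console.log(alice.text(), bob.text()); // both "hYXi"
```

Both inserts carry clock 3, so the client ID decides: "B" sorts above "A", and Y ends up left of X on both replicas even though they received the operations in opposite orders.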

Why CRDT for new projects?

No server-side transformation logic (simpler server)
Works offline natively (merge when reconnected)
Libraries like Yjs handle the complexity
Google's OT-based editor predates production-ready CRDT libraries; a team starting from scratch today would likely reach for a CRDT

Architecture with Yjs

D — Data Model

Document State

Awareness Protocol (Cursor Presence)

I — Interface Definition

WebSocket Protocol

TipTap + Yjs Integration

O — Optimizations

1. Offline Support with IndexedDB

2. Cursor Decoration Rendering

3. Version History with Snapshots

4. Large Document Performance

5. Permission-Aware Editing

Staff-Level Challenges: Beyond Data Convergence

1. Solving Vector Clock Bloat & Metadata Growth

The Problem: Every unique clientId in Yjs or Automerge creates an entry in the state vector. If a document is public or long-lived, the overhead of identifying "who owns what" can eventually exceed the size of the actual text.

The Interview Answer:
"To handle identifier bloat, we implement Client ID Recycling and Document Snapshotting (Compaction). We don't need the full causal history for all time. Once we reach a 'Global Stable State' (where we know all active clients have seen an update), we can 'compact' the history.
Technically, this involves the server periodically generating a 'Canonical Snapshot'. New clients download this compressed snapshot instead of the full log of every keystroke since 2022. We also map long UUID client IDs to small integer offsets (1, 2, 3...) in a local lookup table to save bytes in the binary wire format."
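The lookup-table part of that answer might look like the sketch below. `ClientIdTable` is a hypothetical helper written for this post, not part of Yjs or Automerge, and it assumes the table itself is shipped once alongside the snapshot so receivers can resolve the small integers back to full IDs.

```typescript
// Hypothetical interning table: long client IDs in, small integers out.
class ClientIdTable {
  private readonly indexById = new Map<string, number>();
  private readonly ids: string[] = [];

  // Returns the existing index for a client ID, or assigns the next one (1, 2, 3...).
  intern(clientId: string): number {
    const existing = this.indexById.get(clientId);
    if (existing !== undefined) return existing;
    this.ids.push(clientId);
    const index = this.ids.length; // 1-based, matching the "1, 2, 3..." scheme
    this.indexById.set(clientId, index);
    return index;
  }

  // Maps a small integer back to the original client ID.
  resolve(index: number): string {
    const id = this.ids[index - 1];
    if (id === undefined) throw new RangeError(`unknown client index ${index}`);
    return id;
  }

  get size(): number {
    return this.ids.length;
  }
}

const table = new ClientIdTable();
const first = table.intern("3f2c9a1e-7b44-4d0e-9c1a-5e8f2b6d7a10");
const second = table.intern("a81d4fae-7dec-11d0-a765-00a0c91e6bf6");
const again = table.intern("3f2c9a1e-7b44-4d0e-9c1a-5e8f2b6d7a10");

console.log(first, second, again); // 1 2 1
```

Every operation in the wire format can then reference a client as a small varint instead of repeating a 36-character UUID.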

2. Solving Semantic Merging (The "Intent" Problem)

The Problem: CRDTs guarantee convergence (everyone sees the same thing) but not intention (the result makes sense). If I move a block to Section A and you edit it while it was in Section B, the result might be a duplicated block or a broken reference.

The Interview Answer:
"CRDTs solve for data integrity, but Semantic Integrity requires a Block-Based Schema. Instead of treating the whole doc as one long string, we treat it as an ordered tree of unique 'Block IDs' (like Notion).
If a user moves a block, we move the Reference ID, not the text itself. For rich text conflicts (like overlapping bold/italic), we use the 'LWW' (Last Write Wins) register or 'Add-Wins' set logic for formatting attributes. If two users move the same block to different locations, we use a deterministic tie-breaker (e.g., highest Client ID) to ensure they land in the same spot, then provide a 'User B moved this' toast to the losing user for UX clarity."
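A rough sketch of that tie-breaker, assuming each move carries a Lamport clock and the author's client ID. `BlockMove`, `beats`, and `BlockPlacement` are hypothetical names for this example; the returned "loser" is what would drive the toast.

```typescript
// Hypothetical move record: one last-writer-wins register per block.
interface BlockMove {
  blockId: string;
  newParentId: string;
  clock: number; // Lamport timestamp of the move
  clientId: string;
}

// True when `a` should win over `b`: later clock first, then highest client ID.
function beats(a: BlockMove, b: BlockMove): boolean {
  if (a.clock !== b.clock) return a.clock > b.clock;
  return a.clientId > b.clientId;
}

class BlockPlacement {
  private readonly latest = new Map<string, BlockMove>();

  // Applies a move and returns the move that lost, if any,
  // so the UI can notify its author.
  apply(move: BlockMove): BlockMove | null {
    const current = this.latest.get(move.blockId);
    if (current === undefined) {
      this.latest.set(move.blockId, move);
      return null;
    }
    // Re-delivery of the same move is a no-op.
    if (current.clock === move.clock && current.clientId === move.clientId) return null;
    if (beats(move, current)) {
      this.latest.set(move.blockId, move);
      return current;
    }
    return move;
  }

  parentOf(blockId: string): string | undefined {
    return this.latest.get(blockId)?.newParentId;
  }
}

// Two users move the same block concurrently (same clock).
const moveA: BlockMove = { blockId: "b1", newParentId: "sectionA", clock: 7, clientId: "A" };
const moveB: BlockMove = { blockId: "b1", newParentId: "sectionB", clock: 7, clientId: "B" };

const replica1 = new BlockPlacement();
const replica2 = new BlockPlacement();
replica1.apply(moveA);
const lostOn1 = replica1.apply(moveB);
replica2.apply(moveB);
const lostOn2 = replica2.apply(moveA);

console.log(replica1.parentOf("b1"), replica2.parentOf("b1")); // sectionB sectionB
```

Both replicas report user A's move as the loser regardless of arrival order, which is what lets the toast go to the right person.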

3. Solving the "Split-Brain" (Long-term Offline) Problem

The Problem: A user edits on a plane for 10 hours. In the meantime, 80% of the document has changed on the server. A blind merge might result in "Interleaving" (sentences mixed together like a salad).

The Interview Answer:
"For high-latency or long-term offline merges, we move from Automatic Merging to Proposed Merging. When the client reconnects, we calculate a 'Diff' between the local state and the server's head.
If the diff exceeds a 'Conflict Threshold' (e.g., >20% of the document or overlapping lines), we fork the offline changes into a 'Side-Branch' or 'Draft Mode'. We then present a 'Merge Review' UI—similar to a Git PR—where the user can 'Accept All' or 'Keep My Version'. This prevents the 'Shredded Text' effect where two people editing the same paragraph results in characters alternating from each user."
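The threshold check itself is small. In this sketch `OfflineDiff` and `chooseMergeStrategy` are hypothetical, the 20% default comes from the example figure above, and how the diff is computed is left out entirely.

```typescript
type MergeStrategy = "auto-merge" | "merge-review";

// Hypothetical summary of the diff between local state and the server head.
interface OfflineDiff {
  changedChars: number;     // characters touched by the offline edits
  documentLength: number;   // length of the server's current document
  overlappingEdits: number; // offline edits that touch ranges the server also changed
}

// Route large or overlapping offline changes to a review UI instead of merging blindly.
function chooseMergeStrategy(diff: OfflineDiff, threshold = 0.2): MergeStrategy {
  const changedRatio =
    diff.documentLength === 0
      ? (diff.changedChars > 0 ? 1 : 0)
      : diff.changedChars / diff.documentLength;
  return changedRatio > threshold || diff.overlappingEdits > 0
    ? "merge-review"
    : "auto-merge";
}

const smallEdit = chooseMergeStrategy({ changedChars: 40, documentLength: 10_000, overlappingEdits: 0 });
const bigRewrite = chooseMergeStrategy({ changedChars: 3_000, documentLength: 10_000, overlappingEdits: 0 });
const sameParagraph = chooseMergeStrategy({ changedChars: 40, documentLength: 10_000, overlappingEdits: 2 });

console.log(smallEdit, bigRewrite, sameParagraph); // auto-merge merge-review merge-review
```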

Production Gotchas Rahul Has Debugged 🔥

History vs. Collaboration: TipTap's built-in History extension (undo/redo) conflicts with Yjs — it tracks local operations, not CRDT states. Always disable it and use Yjs's built-in undo manager: new Y.UndoManager(yXmlFragment).
Cursor Flickering: Awareness updates fire on every keystroke, causing remote cursors to "flicker." Throttle awareness broadcasts to every 100ms and use CSS transitions for smooth cursor movement.
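Memory Leaks: Yjs documents accumulate tombstones (deleted characters are marked, not removed). With the default gc setting Yjs drops the deleted content itself, but the tombstone metadata stays, so long-lived documents keep growing. Periodically "garbage collect" them by creating a fresh snapshot.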
Tab Duplication: If a user opens the same doc in two tabs, they'll see themselves as two collaborators with conflicting cursors. Detect this with BroadcastChannel and designate one tab as primary.
Initial Load Blank Flash: The Yjs doc is empty until the WebSocket syncs. Show a skeleton loader, and pre-populate from IndexedDB cache while waiting for the authoritative server state.
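For the cursor flickering gotcha, the throttle can be sketched as a "latest value wins" wrapper. `throttleLatest` is a hypothetical helper, and the `send` callback stands in for whatever call actually publishes your awareness state; the injectable `now` clock is only there to make the behavior easy to test.

```typescript
// Sends at most one update per interval, always delivering the most recent value.
function throttleLatest<T>(
  send: (value: T) => void,
  intervalMs: number,
  now: () => number = Date.now,
) {
  let lastSent = -Infinity;
  let pending: { value: T } | null = null;
  let timer: ReturnType<typeof setTimeout> | null = null;

  // Sends the pending value immediately, if there is one.
  const flush = (): void => {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (pending === null) return;
    const { value } = pending;
    pending = null;
    lastSent = now();
    send(value);
  };

  // Queues a value; intermediate values inside one interval are dropped.
  const push = (value: T): void => {
    pending = { value };
    const wait = lastSent + intervalMs - now();
    if (wait <= 0) flush();
    else if (timer === null) timer = setTimeout(flush, wait);
  };

  return { push, flush };
}

// Demo with a fake clock so the timing is deterministic.
let fakeTime = 0;
const sentCursors: number[] = [];
const cursor = throttleLatest<number>((pos) => sentCursors.push(pos), 100, () => fakeTime);

cursor.push(1);  // first update goes out immediately
fakeTime = 50;
cursor.push(2);  // inside the 100ms window: queued
cursor.push(3);  // replaces 2 as the pending value
cursor.flush();  // e.g. on blur or before unload

console.log(sentCursors); // [1, 3]
```

Dropping position 2 is the point: remote users only need the latest cursor, and the CSS transition covers the gap between updates.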

Next up: #10: Design a Notification System — push vs pull, real-time badges, notification grouping, and the read/unread state machine. 🔔