Levelbrook Labs

Building a Collaborative Whiteboard: Notes on State Synchronization and User Experience

A collaborative whiteboard seems simple. Users draw shapes, and other users see them. But this apparent simplicity hides a classic distributed systems problem: maintaining a consistent shared state across multiple clients with varying network latency. It's not about rendering graphics; it's about state synchronization, conflict resolution, and delivering a user experience that feels instantaneous, even when it isn't.

This problem is technically interesting because it forces a direct confrontation with the trade-offs between consistency, availability, and latency. The goal is to create a system where the final state of the canvas is identical for all participants, no matter how network delays interleave their operations. This is the essence of eventual consistency, but with the added constraint that the "eventual" must feel like "now".

An Architecture with Rails and Hotwire

While a client-heavy SPA framework might seem like the default choice, a modern server-centric stack like Ruby on Rails with Hotwire (specifically, Action Cable and Turbo Streams) is surprisingly well-suited for this challenge. It allows us to keep the authoritative state and business logic on the server, simplifying the client significantly.

The Data Model: An Event Sourced Approach

The naive approach is to model the shapes themselves. A better model is to treat the canvas as a sequence of operations. This event-sourcing pattern is robust and provides a full history for features like undo/redo or replaying a session.

The state is represented by two primary models:

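# A simplified Rails model structure

class WhiteboardElement < ApplicationRecord
  # Columns: whiteboard_id, element_uuid, type, data (jsonb)
  # Rails reserves a `type` column for single-table inheritance, so opt out
  # here (renaming the column to something like `element_type` also works).
  self.inheritance_column = nil
end

class WhiteboardOperation < ApplicationRecord
  # Columns: whiteboard_id, user_id, element_uuid, action, payload (jsonb)
end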

When a client connects, we serve the full set of `WhiteboardElement` records. From then on, we only push `WhiteboardOperation` deltas over the wire.
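
To make the relationship between the two models concrete, here is a minimal plain-Ruby sketch of how a log of operations folds into the current set of elements. It uses hashes rather than Active Record, and the action names (`create`, `update`, `delete`) and payload keys are assumptions chosen for illustration, not a schema taken from a real application.

```ruby
# Fold a log of operations into the current element state.
# Each operation is a hash shaped like a WhiteboardOperation row.
def apply_operation(elements, op)
  uuid = op[:element_uuid]
  case op[:action]
  when "create"
    elements.merge(uuid => op[:payload])
  when "update"
    return elements unless elements.key?(uuid) # ignore updates to missing elements
    elements.merge(uuid => elements[uuid].merge(op[:payload]))
  when "delete"
    elements.reject { |key, _| key == uuid }
  else
    elements # unknown actions leave the state untouched
  end
end

# Rebuild the canvas from scratch by applying every operation in order.
def replay(operations)
  operations.reduce({}) { |elements, op| apply_operation(elements, op) }
end

log = [
  { element_uuid: "a1", action: "create", payload: { type: "rect", x: 0, y: 0 } },
  { element_uuid: "b2", action: "create", payload: { type: "path", points: [[0, 0], [5, 5]] } },
  { element_uuid: "a1", action: "update", payload: { x: 40 } },
  { element_uuid: "b2", action: "delete", payload: {} }
]

state = replay(log)
```

Replaying only a prefix of the log gives the canvas as it looked at that point, which is what makes session replay cheap to add later.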

Real-time UX with Action Cable and Stimulus

The real-time flow is orchestrated by an Action Cable channel that each client subscribes to for a specific whiteboard.

  1. Client Action: A user finishes drawing a line. A Stimulus JS controller captures the raw path data (an array of points). To avoid flooding the server, we don't send every `mousemove` event; we batch the points and send a single `create_path` operation on `mouseup`.
  2. Optimistic Update: The Stimulus controller immediately renders the line on the local user's canvas. This is crucial for a zero-latency feel. The line might be rendered in a temporary state (e.g., slightly transparent) until confirmed by the server.
  3. Send to Server: The controller sends the operation data (e.g., `{ action: 'create', type: 'path', payload: { ... } }`) to the server via the Action Cable WebSocket connection.
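  4. Server Processing: The Rails channel receives the operation. It validates it, creates a `WhiteboardOperation` record, and applies the change to the corresponding `WhiteboardElement`. Both writes should happen inside a single database transaction.
  5. Broadcast: Upon successful persistence, the channel broadcasts the operation so that all *other* subscribers of the whiteboard channel can draw it. Action Cable delivers a broadcast to every subscriber of a stream, the sender included, so the operation carries the sender's client ID; the sender does not redraw its own operation and instead treats the echo as the confirmation of its optimistic update. Turbo Streams fit when the board is rendered as DOM or SVG elements; for a `<canvas>`, a plain JSON broadcast that the Stimulus controller parses is the more direct route.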
  6. Client Reception: The other clients' Stimulus controllers receive the broadcasted operation. They parse the payload and draw the new element on their respective canvases.
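
The server side of this flow (steps 4 to 6) can be sketched in plain Ruby. This is not Action Cable code: `BoardChannelSketch` is a hypothetical in-memory stand-in where a hash of callbacks plays the role of the stream and an array plays the role of the `whiteboard_operations` table. Because a broadcast reaches every subscriber of a stream, the sketch tags each operation with the sender's ID and lets the sender treat its own echo as a confirmation rather than a new shape.

```ruby
require "securerandom"

class BoardChannelSketch
  attr_reader :log, :elements

  def initialize
    @subscribers = {} # client_id => callable receiving broadcast operations
    @log = []         # stands in for the whiteboard_operations table
    @elements = {}    # stands in for the whiteboard_elements table
    @lock = Mutex.new
  end

  def subscribe(client_id, &on_operation)
    @subscribers[client_id] = on_operation
  end

  # Steps 4 and 5: validate, persist, then broadcast to every subscriber.
  def receive(client_id, op)
    raise ArgumentError, "unsupported action" unless %w[create delete].include?(op[:action])

    stored = @lock.synchronize do # stands in for a database transaction
      record = op.merge(client_id: client_id, seq: @log.size + 1)
      @log << record
      if op[:action] == "create"
        @elements[op[:element_uuid]] = op[:payload]
      else
        @elements.delete(op[:element_uuid])
      end
      record
    end

    @subscribers.each_value { |callback| callback.call(stored) }
    stored
  end
end

channel = BoardChannelSketch.new
bob_canvas = {}
alice_confirmed = []

# Alice is the sender: her own echo only confirms the optimistic shape.
channel.subscribe("alice") do |op|
  alice_confirmed << op[:element_uuid] if op[:client_id] == "alice"
end

# Step 6: Bob draws whatever other clients created.
channel.subscribe("bob") do |op|
  bob_canvas[op[:element_uuid]] = op[:payload] if op[:action] == "create"
end

uuid = SecureRandom.uuid
channel.receive("alice", {
  element_uuid: uuid,
  action: "create",
  payload: { type: "path", points: [[0, 0], [10, 4]] }
})
```

In a real channel the validation would live in the model, the lock would be a transaction, and the loop over callbacks would be a single broadcast call.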

Where Things Break at Scale

This architecture holds up for dozens, even hundreds, of concurrent users. The first bottleneck is usually write contention on the `whiteboard_operations` table. For a truly massive-scale application (thousands of concurrent editors on one board), you would likely stop using PostgreSQL as the primary event bus and introduce an intermediate layer such as Redis Streams or Kafka to absorb the high-throughput stream of incoming operations, with workers that consume from the stream and persist to the database asynchronously. For the large majority of use cases, however, a well-indexed Postgres table is more than sufficient.
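
The shape of that intermediate layer can be illustrated without any infrastructure. In the sketch below, Ruby's built-in `Queue` stands in for Redis Streams or Kafka, the `persisted` array stands in for the database table, and the batch size of 3 is arbitrary; all three are assumptions for illustration only.

```ruby
incoming = Queue.new # stand-in for the external stream
persisted = []       # stand-in for the whiteboard_operations table
batch_size = 3

# A single consumer drains the queue and writes in batches, so the request
# path never waits on the database.
worker = Thread.new do
  batch = []
  while (op = incoming.pop) # pop returns nil once the queue is closed and drained
    batch << op
    if batch.size >= batch_size
      persisted.concat(batch) # one bulk write instead of several single inserts
      batch = []
    end
  end
  persisted.concat(batch) unless batch.empty? # flush the remainder on shutdown
end

# Producers (the channel, in the real system) only enqueue and move on.
7.times { |i| incoming << { seq: i + 1, action: "create" } }
incoming.close
worker.join
```

The cost of this design is that an operation can be broadcast before it is durable, so a real system would need a policy for operations that later fail to persist.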

Pragmatic Tradeoffs and Correctness

Building a perfect, conflict-free system is an academic exercise. Building a useful one involves pragmatic tradeoffs.

Conflict Resolution: What if two users move the same element simultaneously? The simplest and often most effective strategy is Last Write Wins (LWW). The server applies operations to a given element one at a time (for example, by taking a row lock inside the transaction), and whichever operation is persisted last becomes the new state. For a whiteboard, this is usually acceptable. The visual feedback is immediate, and humans are good at socially resolving minor conflicts ("Oops, sorry, you move it."). Implementing more complex strategies like CRDTs or operational transformation adds significant complexity for a benefit that may not be perceived by the user.
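
A minimal sketch of Last Write Wins, assuming the server stamps each operation with a sequence number while holding a lock. `LwwElementStore` is a hypothetical in-memory class; in a Rails app the `Mutex` would more likely be a row lock inside the transaction (Active Record's `with_lock` is one option).

```ruby
class LwwElementStore
  def initialize
    @lock = Mutex.new
    @seq = 0
    @elements = {} # element_uuid => { position:, seq:, user_id: }
  end

  # Whoever acquires the lock last overwrites the previous position.
  def move(element_uuid, user_id, position)
    @lock.synchronize do
      @seq += 1
      @elements[element_uuid] = { position: position, seq: @seq, user_id: user_id }
    end
  end

  def fetch(element_uuid)
    @lock.synchronize { @elements[element_uuid] }
  end
end

store = LwwElementStore.new
moves = { "alice" => [10, 10], "bob" => [90, 40] }

# Two users drag the same element at the same moment.
threads = moves.map do |user_id, position|
  Thread.new { store.move("a1", user_id, position) }
end
threads.each(&:join)

winner = store.fetch("a1")
```

Which user wins depends on thread scheduling, but the stored state is always exactly one user's complete move, never a mix of the two.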

Undo/Redo: The event log makes undo possible. A simple per-user undo stack can be managed on the client. When a user hits "undo," the client finds their last operation and sends a compensating operation to the server (e.g., a `delete` for a `create`). A shared, global undo is vastly more complex and often confusing for users. Sticking to a user-scoped undo is a key pragmatic choice.
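
A user-scoped undo stack can be sketched as follows. The mapping from an operation to its compensating operation is an assumption: it presumes that a `delete` operation carries the deleted element's data in its payload and that a `move` records both `from` and `to`, neither of which is specified above.

```ruby
class UndoStack
  def initialize
    @done = [] # this user's own operations, oldest first
  end

  def record(op)
    @done.push(op)
  end

  # Returns the compensating operation to send to the server, or nil when
  # there is nothing to undo or the action has no known inverse.
  def undo
    op = @done.pop
    return nil unless op

    uuid = op[:element_uuid]
    case op[:action]
    when "create"
      { action: "delete", element_uuid: uuid, payload: op[:payload] }
    when "delete"
      { action: "create", element_uuid: uuid, payload: op[:payload] }
    when "move"
      swapped = { from: op[:payload][:to], to: op[:payload][:from] }
      { action: "move", element_uuid: uuid, payload: swapped }
    end
  end
end

stack = UndoStack.new
stack.record({ action: "create", element_uuid: "a1", payload: { type: "rect" } })
stack.record({ action: "move", element_uuid: "a1", payload: { from: [0, 0], to: [40, 20] } })

first_undo = stack.undo  # moves the rectangle back
second_undo = stack.undo # deletes the rectangle
third_undo = stack.undo  # nothing left to undo
```

Because the compensating operation travels through the same channel as any other operation, other clients need no special handling for undo.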

The Human-in-the-Loop: The most important aspect is that the system must be intelligible to the user. If an optimistic update fails (e.g., due to a server-side validation error), the temporary shape must be removed from the local canvas with clear feedback. The system's correctness is ultimately in service of the user's mental model. If the canvas state occasionally deviates for a second before snapping into its consistent state, that's often a better user experience than a system that locks up waiting for perfect consensus.
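
The rollback path can be sketched as a small state machine. In the architecture described here this logic would live in the Stimulus controller, in JavaScript; it is written in Ruby only to stay consistent with the other examples, and `OptimisticCanvas` and its method names are hypothetical.

```ruby
class OptimisticCanvas
  attr_reader :shapes, :notices

  def initialize
    @shapes = {}  # element_uuid => { payload:, status: :pending or :confirmed }
    @notices = [] # messages surfaced to the user
  end

  # Draw immediately, but remember the shape is not yet confirmed.
  def draw_optimistically(uuid, payload)
    @shapes[uuid] = { payload: payload, status: :pending }
  end

  # Called when the server confirms the operation.
  def confirm(uuid)
    @shapes[uuid][:status] = :confirmed if @shapes.key?(uuid)
  end

  # Called when the server rejects the operation: remove the temporary
  # shape and tell the user why it disappeared.
  def reject(uuid, reason)
    return unless @shapes.dig(uuid, :status) == :pending

    @shapes.delete(uuid)
    @notices << "Your change was not saved: #{reason}"
  end
end

canvas = OptimisticCanvas.new
canvas.draw_optimistically("a1", { type: "path" })
canvas.draw_optimistically("b2", { type: "rect" })

canvas.confirm("a1")
canvas.reject("b2", "shape is outside the board")
canvas.reject("a1", "late rejection") # ignored: already confirmed
```

The guard in `reject` means a late or duplicate rejection can never remove a shape the server has already accepted.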

Reflection

The collaborative whiteboard problem is a fantastic microcosm of modern web application development. It elegantly ties together database design, real-time communication protocols, and front-end user experience. It serves as a reminder that the most challenging problems aren't always about raw computational power, but about orchestrating state across a distributed system where the most critical nodes are the humans staring at the screens.

Choosing a stack like Rails and Hotwire doesn't just work; it provides a clear, coherent model for reasoning about this state management. By keeping the authority on the server and using lightweight client-side controllers for UX polish, we can build incredibly powerful, real-time features without drowning in client-side complexity.