Building a Content Rights & Royalties Dashboard: Notes on Multi-cloud SaaS for Digital Rights Holders

Patrick Donahue · Levelbrook Consulting

The business of digital media isn't just about creation; it's about distribution. A modern rights holder—be it a film studio, music label, or book publisher—doesn't sell to a single marketplace. They license content to a constellation of platforms: Netflix on AWS, YouTube on GCP, Apple Music on their own infrastructure, and dozens of smaller players, each with its own cloud, API, and data reporting format. This fragmentation creates a significant technical challenge: how do you build a single, coherent view of your rights, revenues, and restrictions across this heterogeneous, multi-cloud landscape?

This isn't a simple ETL problem. The data isn't just different in format; it's different in structure and cadence. Royalty reports can be real-time API streams, monthly CSV uploads via SFTP, or even quarterly PDF statements. The underlying rights contracts are complex legal documents, not standardized database records. Building a SaaS dashboard to unify this chaos is a fascinating engineering problem that sits at the intersection of data integration, domain modeling, and human-computer interaction.

An Architectural Blueprint

To tackle this, we need a stack that is flexible at the data layer, efficient at the integration layer, and highly reactive at the presentation layer. A pragmatic choice for a new system would be MongoDB for data, Python for the backend, and a modern JavaScript framework like Vue.js for the frontend.

Data Model: Embracing Complexity with MongoDB

Media rights are awkward to model relationally. A single film can have hundreds of distinct, overlapping rights grants, each with its own territories, term, formats, and royalty terms. Normalizing all of this into rigid tables is possible, but it tends to produce wide JOINs and frequent schema migrations as new deal structures appear. A document-oriented database like MongoDB is a natural fit: we can model a `Right` as a single, rich document that encapsulates its core attributes.

A simplified `Right` document might look like this:

{
  "_id": ObjectId("..."),
  "content_id": ObjectId("..."), // reference to a document in the "Content" collection (not enforced by MongoDB)
  "grantee": "StreamingPlatform-A",
  "territories": ["US", "CA", "MX"], // ISO 3166-1 alpha-2
  "term": {
    "start": ISODate("2024-01-01T00:00:00Z"),
    "end": ISODate("2026-12-31T23:59:59Z")
  },
  "exclusivity": "exclusive", // 'exclusive', 'non-exclusive', 'sole'
  "media_formats": ["streaming", "svod"],
  "royalty_terms": {
    "type": "revenue_share",
    "rate": 0.25, // 25% of net revenue
    "reporting_cadence": "monthly"
  },
  "source_document_id": ObjectId("..."), // Link to the original contract file
  "verification": {
    "status": "pending_review", // 'pending_review', 'verified'
    "verified_by": null,
    "verified_at": null
  }
}

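This structure allows for flexible queries: "Show me all exclusive streaming rights expiring in the next 90 days in Europe" (with "Europe" expanded to a list of country codes, since territories are stored per country) or "Find all content where we don't have active rights for SVOD in Brazil." New fields can be added for new and esoteric deal structures without a table migration, although older documents will not have them, so application code must tolerate missing fields.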
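As a rough sketch of how the first query could be expressed, the helpers below build MongoDB filter documents as plain Python dictionaries. The function names, the `EUROPE` list (deliberately partial), and the `rights` collection mentioned afterwards are assumptions made for illustration; only the field names come from the `Right` document above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical, deliberately partial list of European ISO 3166-1 alpha-2 codes.
# A real system would keep a complete region-to-country mapping elsewhere.
EUROPE = ["DE", "FR", "IT", "ES", "NL"]


def expiring_exclusive_streaming_filter(now, territories, days=90):
    """Filter for exclusive streaming rights whose term ends within `days`."""
    return {
        "exclusivity": "exclusive",
        "media_formats": "streaming",         # matches if the array contains it
        "territories": {"$in": territories},  # at least one listed territory
        "term.end": {"$gte": now, "$lte": now + timedelta(days=days)},
    }


def active_rights_filter(now, territory, media_format):
    """Filter for rights currently in force for one territory and format."""
    return {
        "territories": territory,
        "media_formats": media_format,
        "term.start": {"$lte": now},
        "term.end": {"$gte": now},
    }


now = datetime(2026, 11, 1, tzinfo=timezone.utc)
query = expiring_exclusive_streaming_filter(now, EUROPE)
```

With PyMongo, a filter like this could be passed to `db.rights.find(query)`. The "no active SVOD rights in Brazil" question is a two-step query under this model: collect the `content_id` values matched by `active_rights_filter(now, "BR", "svod")`, then look for content whose `_id` is not in that set.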

Backend: Python, Connectors, and AI-Assisted Ingestion

The backend's primary job is to be an integration workhorse and an intelligence layer. Python, with frameworks like FastAPI or Django, excels here due to its mature ecosystem for data manipulation (Pandas), HTTP requests (HTTPX, Requests), and machine learning.

The core of the backend is a set of "Connectors," one for each external platform. Each connector is responsible for authenticating, fetching royalty data, and normalizing it into a standard internal format before storing it. Some connectors will hit REST APIs, some will poll S3 buckets, and others might even require browser automation for legacy portals.
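One possible shape for such a connector is an abstract base class with separate fetch and normalize steps. Everything named below (`RoyaltyLine`, `Connector`, `CsvReportConnector`, and the CSV column names) is hypothetical and stands in for whatever internal format and partner report layout a real system would have.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RoyaltyLine:
    """Hypothetical internal format that every connector normalizes into."""
    platform: str
    content_ref: str
    territory: str
    period_start: date
    period_end: date
    amount: Decimal
    currency: str


class Connector(ABC):
    platform: str

    @abstractmethod
    def fetch(self) -> Iterable[dict]:
        """Authenticate if needed and return raw records from the platform."""

    @abstractmethod
    def normalize(self, raw: dict) -> RoyaltyLine:
        """Map one raw record onto the internal format."""

    def run(self) -> Iterator[RoyaltyLine]:
        for raw in self.fetch():
            yield self.normalize(raw)


class CsvReportConnector(Connector):
    """Toy connector for a platform that delivers a monthly CSV report."""
    platform = "StreamingPlatform-A"

    def __init__(self, csv_text: str):
        self._csv_text = csv_text

    def fetch(self) -> Iterable[dict]:
        return csv.DictReader(io.StringIO(self._csv_text))

    def normalize(self, raw: dict) -> RoyaltyLine:
        return RoyaltyLine(
            platform=self.platform,
            content_ref=raw["title_id"],
            territory=raw["country"].upper(),
            period_start=date.fromisoformat(raw["from"]),
            period_end=date.fromisoformat(raw["to"]),
            amount=Decimal(raw["net_revenue"]),  # Decimal avoids float rounding
            currency=raw["ccy"],
        )


sample = (
    "title_id,country,from,to,net_revenue,ccy\n"
    "FILM-001,us,2024-01-01,2024-01-31,1234.50,USD\n"
)
lines = list(CsvReportConnector(sample).run())
```

Keeping `fetch` and `normalize` separate means the transport (REST, S3, SFTP, browser automation) can change without touching the mapping logic, and the mapping can be unit-tested against saved sample reports.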

This is where we can pragmatically apply AI. Many rights are still defined in scanned PDF contracts. After an OCR step to recover the text (or with a multimodal model that reads the page images directly), we can use an LLM (e.g., via the OpenAI API or a self-hosted model) to perform structured data extraction. An ingestion pipeline would take an uploaded PDF, use the LLM to parse it into the JSON schema shown above, and create a `Right` document with a `verification.status` of `pending_review`. This transforms the human task from tedious data entry to efficient verification.
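The sketch below shows the step after extraction: checking the extractor's output and wrapping it in a `Right` document that awaits review. The LLM call itself is left as an injected function (`extract`), so no particular vendor API is assumed. The function name, the list of required fields, and the use of plain strings in place of `ObjectId` values are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Callable

REQUIRED_FIELDS = (
    "grantee", "territories", "term",
    "exclusivity", "media_formats", "royalty_terms",
)
ALLOWED_EXCLUSIVITY = {"exclusive", "non-exclusive", "sole"}


def build_pending_right(
    contract_text: str,
    content_id: str,
    source_document_id: str,
    extract: Callable[[str], dict],
) -> dict:
    """Turn extractor output into a Right document awaiting human review."""
    raw = extract(contract_text)

    missing = [field for field in REQUIRED_FIELDS if field not in raw]
    if missing:
        raise ValueError(f"extraction is missing fields: {missing}")
    if raw["exclusivity"] not in ALLOWED_EXCLUSIVITY:
        raise ValueError(f"unknown exclusivity: {raw['exclusivity']!r}")

    term = {
        "start": datetime.fromisoformat(raw["term"]["start"]),
        "end": datetime.fromisoformat(raw["term"]["end"]),
    }
    if term["end"] <= term["start"]:
        raise ValueError("term ends before it starts")

    return {
        "content_id": content_id,
        "grantee": raw["grantee"],
        "territories": raw["territories"],
        "term": term,
        "exclusivity": raw["exclusivity"],
        "media_formats": raw["media_formats"],
        "royalty_terms": raw["royalty_terms"],
        "source_document_id": source_document_id,
        # Nothing extracted by a model is trusted until a person signs off.
        "verification": {
            "status": "pending_review",
            "verified_by": None,
            "verified_at": None,
        },
    }


def fake_extract(text: str) -> dict:
    """Stand-in for the LLM call; returns what a good extraction might look like."""
    return {
        "grantee": "StreamingPlatform-A",
        "territories": ["US", "CA", "MX"],
        "term": {
            "start": "2024-01-01T00:00:00+00:00",
            "end": "2026-12-31T23:59:59+00:00",
        },
        "exclusivity": "exclusive",
        "media_formats": ["streaming", "svod"],
        "royalty_terms": {
            "type": "revenue_share",
            "rate": 0.25,
            "reporting_cadence": "monthly",
        },
    }


right = build_pending_right("...contract text...", "content-1", "doc-1", fake_extract)
```

Structural checks like these catch only malformed output. A well-formed but wrong date or territory passes them, which is why the review status still defaults to `pending_review`.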

Frontend: A Reactive Dashboard with Vue.js, Pinia, and TypeScript

The user of this system is a rights manager, not a data scientist. They need an intuitive, real-time interface to explore this complex data. A component-based framework like Vue.js, combined with TypeScript for type safety, is ideal.

State management is critical. Using a store like Pinia allows us to maintain a central, reactive state for filters, user settings, and cached data from the API. When a user filters by territory, components across the application (a map, a data table, a timeline) should all react instantly without complex prop-drilling.

For real-time updates, such as the status of an ongoing royalty report ingestion, Server-Sent Events (SSE) provide a simple and efficient mechanism. SSE is a one-way stream from server to client over an ordinary HTTP response, which is all a progress indicator needs. The backend pushes progress updates, and the Vue app uses them to update a status indicator without the bidirectional protocol and connection handling that WebSockets require.
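On the wire, an SSE message is a handful of `field: value` lines terminated by a blank line. A minimal sketch of a progress stream follows; the event names and payload keys are assumptions, not part of any existing API.

```python
import json
from typing import Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None,
              event_id: Optional[int] = None) -> str:
    """Serialize one message in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" prefix.
    for line in json.dumps(data).splitlines():
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"  # a blank line ends the message


def ingestion_progress(total_rows: int, batch_size: int) -> Iterator[str]:
    """Yield one progress event per processed batch, then a completion event."""
    processed = 0
    while processed < total_rows:
        processed = min(processed + batch_size, total_rows)
        yield sse_event(
            {"processed": processed, "total": total_rows},
            event="progress",
            event_id=processed,
        )
    yield sse_event({"status": "done"}, event="complete")


events = list(ingestion_progress(total_rows=250, batch_size=100))
```

In FastAPI, a generator like this could be returned through a `StreamingResponse` with `media_type="text/event-stream"`, and the browser would consume it with the standard `EventSource` API.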

At scale, the performance bottleneck often shifts from the database to the frontend. Rendering a table with 100,000 royalty line items as individual DOM rows can freeze or crash a browser tab. The solution is virtual scrolling on the frontend and intelligent pagination and aggregation on the backend. The API shouldn't just serve raw data; it should provide pre-aggregated endpoints for dashboard widgets (e.g., `GET /api/v1/royalties/summary?period=monthly&territory=US`).
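To illustrate what a summary endpoint might compute, here is a small in-memory aggregation that rolls line items up into one total per month for a territory. The record layout (`territory`, `period_end`, `amount`) is an assumption; in production the same grouping would more likely be pushed down into the database.

```python
from collections import defaultdict
from decimal import Decimal


def monthly_summary(lines, territory):
    """Collapse royalty line items into one total per calendar month."""
    totals = defaultdict(Decimal)  # Decimal() is zero
    for line in lines:
        if line["territory"] != territory:
            continue
        month = line["period_end"][:7]  # "2024-01-31" -> "2024-01"
        totals[month] += Decimal(line["amount"])
    # Amounts are returned as strings so JSON encoding does not round them.
    return [
        {"period": month, "total": str(total)}
        for month, total in sorted(totals.items())
    ]


line_items = [
    {"territory": "US", "period_end": "2024-01-31", "amount": "10.50"},
    {"territory": "US", "period_end": "2024-01-31", "amount": "4.25"},
    {"territory": "CA", "period_end": "2024-01-31", "amount": "99.00"},
    {"territory": "US", "period_end": "2024-02-29", "amount": "7.00"},
]
summary = monthly_summary(line_items, "US")
```

The dashboard widget then receives a few rows per territory instead of the full set of line items, and the detailed table can fetch individual lines page by page only when the user drills in.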

Pragmatism, Correctness, and the Human in the Loop

A senior engineer knows that "perfect" is the enemy of "shipped." In a system dealing with financial data from dozens of unreliable sources, we must design for imperfection.

Embrace Eventual Consistency: The dashboard is a "single pane of glass," but the view through it has varying levels of freshness. Data from a real-time API is seconds old; data from a partner's monthly PDF is weeks old. The UI must be transparent about this, clearly labeling the "as of" timestamp for every piece of data. Attempting to force transactional consistency across dozens of external systems is a fool's errand.
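One way to make freshness explicit is to attach an "as of" label and a staleness flag to every value the API returns. The thresholds below are invented for the example; sensible values depend on each partner's actual reporting schedule.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical maximum acceptable age per reporting cadence.
EXPECTED_MAX_AGE = {
    "realtime": timedelta(minutes=5),
    "monthly": timedelta(days=45),
    "quarterly": timedelta(days=120),
}


def freshness(as_of: datetime, cadence: str, now: datetime) -> dict:
    """Describe how old a data point is relative to its source's cadence."""
    age = now - as_of
    return {
        "as_of": as_of.isoformat(),
        "age_days": age.days,
        "stale": age > EXPECTED_MAX_AGE[cadence],
    }


label = freshness(
    as_of=datetime(2026, 1, 1, tzinfo=timezone.utc),
    cadence="monthly",
    now=datetime(2026, 3, 1, tzinfo=timezone.utc),
)
```

The frontend can render `as_of` next to each figure and dim or flag anything marked `stale`, so that a missing monthly report is visible rather than silently hidden behind last quarter's numbers.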

Idempotency is Non-Negotiable: A connector that processes royalty payments must be idempotent. If it runs twice on the same monthly report due to a network error or a retry, it cannot result in duplicate payment records. This is typically handled by assigning a unique, deterministic ID to each source transaction and using that as a key for insertion.
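A minimal sketch of the deterministic-ID approach, using an in-memory dictionary in place of the database. The choice of key fields (`platform`, `report_id`, `line_no`) is an assumption: the right set is whatever uniquely identifies a transaction in a given partner's report.

```python
import hashlib


def transaction_id(platform: str, report_id: str, line_no: int) -> str:
    """Derive a stable ID from fields that identify the source transaction."""
    key = "|".join([platform, report_id, str(line_no)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class InMemoryLedger:
    """Stand-in for a collection keyed by the deterministic ID."""

    def __init__(self):
        self.records = {}

    def upsert(self, record: dict) -> bool:
        """Insert or overwrite; return True only if the record is new."""
        _id = transaction_id(
            record["platform"], record["report_id"], record["line_no"]
        )
        is_new = _id not in self.records
        self.records[_id] = {**record, "_id": _id}
        return is_new


ledger = InMemoryLedger()
row = {
    "platform": "StreamingPlatform-A",
    "report_id": "2024-01",
    "line_no": 17,
    "amount": "1234.50",
}
first = ledger.upsert(row)
second = ledger.upsert(row)  # simulated retry of the same report
```

Against MongoDB, the same effect could be had by using the derived value as `_id` and writing with an upsert, so a retry overwrites the existing record instead of adding a second one.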

The LLM is an Assistant, Not an Oracle: The most critical design principle is the human-in-the-loop workflow for AI-processed data. The LLM is a powerful tool for reducing drudgery, but it will make mistakes. It might misinterpret a clause or hallucinate a date. Therefore, every single piece of data extracted by an LLM *must* be presented to a human expert for validation in a purpose-built UI. The system must maintain a full audit trail: who verified the data, when they verified it, and what changes they made. The goal is intelligence augmentation, not blind automation.
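The verification step and its audit trail could look roughly like the following. The function name and the shape of the audit entry are assumptions, and the sketch only handles corrections to top-level fields of the `Right` document.

```python
import copy
from datetime import datetime, timezone
from typing import Optional, Tuple


def verify_right(right: dict, reviewer: str, corrections: dict,
                 now: Optional[datetime] = None) -> Tuple[dict, dict]:
    """Apply a reviewer's corrections and record who changed what, and when."""
    now = now or datetime.now(timezone.utc)
    updated = copy.deepcopy(right)  # never mutate the extracted original

    changes = []
    for field, new_value in corrections.items():
        old_value = updated.get(field)
        if old_value != new_value:
            changes.append({"field": field, "from": old_value, "to": new_value})
            updated[field] = new_value

    updated["verification"] = {
        "status": "verified",
        "verified_by": reviewer,
        "verified_at": now,
    }
    audit_entry = {
        "right_id": right.get("_id"),
        "action": "verify",
        "by": reviewer,
        "at": now,
        "changes": changes,
    }
    return updated, audit_entry


extracted = {
    "_id": "right-1",
    "territories": ["US", "CA", "MX"],
    "exclusivity": "exclusive",
    "verification": {
        "status": "pending_review",
        "verified_by": None,
        "verified_at": None,
    },
}
verified, audit = verify_right(
    extracted,
    reviewer="rights.manager@example.com",
    corrections={"territories": ["US", "CA"]},
    now=datetime(2024, 2, 1, tzinfo=timezone.utc),
)
```

Storing audit entries in their own append-only collection, rather than overwriting fields on the `Right` document, keeps the model's original extraction available for later comparison.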

Closing Reflection

Building a system like this is compelling because the primary challenge isn't algorithmic complexity or raw performance, but rather taming real-world messiness. It requires a holistic approach, connecting robust backend integrations, a flexible data model, a highly usable frontend, and the judicious application of new technologies like LLMs. The goal is to build a system that doesn't just store data, but creates clarity from chaos, ultimately allowing rights holders to make better decisions in an increasingly fragmented digital world.