Levelbrook Labs

Building a Payroll Processing Demo: Notes on Payroll

I build proof-of-concept demos. It's how I explore complex problem spaces and validate architectural ideas. Recently, I turned my attention to payroll processing, a domain that, from the outside, appears to be a solved problem of simple arithmetic. The reality is a fascinating intersection of distributed systems, state management, and stringent requirements for correctness. It's a perfect subject for a technical deep dive.

The Domain: More State Machine Than Spreadsheet

The core function of payroll seems trivial: gross_pay = hourly_rate * hours_worked. But this is the tip of a large, complex iceberg. The real work lies in the transformation from gross to net pay, a process governed by a staggering number of rules and temporal dependencies.

Consider the inputs for a single employee's paycheck:

This isn't just a calculation; it's a stateful process. An employee's year-to-date (YTD) earnings affect things like the Social Security tax wage base limit. A change in their home address mid-pay-period can alter their state and local tax obligations. The system must not only perform the calculation correctly but do so based on a precise snapshot of reality at a specific point in time. Getting it wrong has real financial and legal consequences.
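To make the YTD dependency concrete, here is a minimal sketch of a wage-base cap. The 6.2% employee rate is the statutory Social Security rate; the wage base changes every year, so it is passed in as a parameter (the 168,600 figure below is the 2024 value). The function name and signature are illustrative and not part of the demo itself.

```python
from decimal import Decimal, ROUND_HALF_UP

SOCIAL_SECURITY_RATE = Decimal("0.062")  # employee share


def social_security_withholding(
    gross_this_period: Decimal,
    ytd_gross_before: Decimal,
    wage_base: Decimal,
) -> Decimal:
    """Withhold only on the part of this paycheck that falls under the wage base."""
    remaining_taxable = max(wage_base - ytd_gross_before, Decimal("0"))
    taxable = min(gross_this_period, remaining_taxable)
    return (taxable * SOCIAL_SECURITY_RATE).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


WAGE_BASE_2024 = Decimal("168600")  # changes yearly; treat as configuration

# Early in the year the whole paycheck is taxable.
early = social_security_withholding(Decimal("5000"), Decimal("20000"), WAGE_BASE_2024)
# Near the cap only the first 600 of this paycheck is taxable.
near_cap = social_security_withholding(Decimal("5000"), Decimal("168000"), WAGE_BASE_2024)
# Past the cap nothing is withheld.
past_cap = social_security_withholding(Decimal("5000"), Decimal("170000"), WAGE_BASE_2024)
```

The same gross pay yields three different withholdings depending only on what came before it in the year, which is why the calculation cannot be treated as a pure function of the current pay period.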

This makes payroll less of a CRUD application and more of a durable, long-running workflow execution. This distinction is the key to a robust architecture.

Architecting a Resilient Workflow

To model this problem, I designed a system around the concept of a durable workflow. The goal is to make the process fault-tolerant, observable, and capable of handling the inevitable delays and failures of interacting with external systems (like banks and tax APIs).

The stack for this demo is Python and Django for the API and data persistence, React with TypeScript for the admin UI, and Postgres as the database. The critical component, however, is a workflow orchestrator like Temporal. Infrastructure is managed with Terraform on AWS.

The Central Role of a Workflow Engine

A payroll run is not an atomic transaction. It can take hours or even days from initiation to settlement. It involves multiple steps, some of which depend on external actors or specific times of day (e.g., ACH batch windows). Trying to manage this with database status fields and cron jobs is a well-known path to building a fragile, unmaintainable system.

A workflow engine like Temporal allows us to define the entire payroll run as a single, durable function. Here's a conceptual Python-based workflow definition:


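from datetime import timedelta

from temporalio import workflow

# Activities live in a separate module (the module name is assumed here).
# Temporal's Python SDK asks for such imports to be passed through the sandbox.
with workflow.unsafe.imports_passed_through():
    import activities


@workflow.defn
class PayrollRunWorkflow:
    def __init__(self) -> None:
        # Set to True by the approve_run signal handler.
        self.approved = False

    @workflow.run
    async def run(self, company_id: str, pay_period_id: str) -> str:
        # Step 1: Lock pay period and calculate for all employees.
        # Each calculation is an "Activity" - a unit of work that can be retried.
        draft_paystubs = await workflow.execute_activity(
            activities.calculate_all_paystubs,
            args=[company_id, pay_period_id],
            start_to_close_timeout=timedelta(minutes=30),
        )

        # Step 2: Pause the workflow and wait for human approval.
        # The workflow waits here until the approve_run signal arrives; if none
        # arrives within 5 days, wait_condition raises asyncio.TimeoutError.
        await workflow.wait_condition(
            lambda: self.approved, timeout=timedelta(days=5)
        )

        # Step 3: Approval received. Proceed with payments.
        # execute_activity has no idempotency-key option, so the key is passed
        # as an ordinary argument for the activity to deduplicate on.
        await workflow.execute_activity(
            activities.initiate_ach_transfers,
            args=[draft_paystubs, f"payroll-run-{pay_period_id}"],
            start_to_close_timeout=timedelta(hours=2),
        )

        # Step 4: Finalize and notify.
        # Temporal requires a timeout on every activity; 10 minutes is illustrative.
        await workflow.execute_activity(
            activities.finalize_run_and_notify,
            args=[pay_period_id],
            start_to_close_timeout=timedelta(minutes=10),
        )
        return "COMPLETED"

    @workflow.signal
    def approve_run(self) -> None:
        self.approved = True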

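This approach makes the process state explicit and durable. If the server running this code crashes between calculating paystubs and waiting for approval, Temporal replays the workflow's recorded event history when a worker comes back online, so the workflow resumes where it left off without re-running the completed paystub calculation. That removes a whole class of half-finished-run failures that status fields and cron jobs leave you to handle by hand.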
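Because activities can be retried, the payment step has to tolerate being invoked more than once with the same payroll-run key. The following is a minimal sketch of that idea, not the demo's actual activity: the in-memory dict stands in for a durable store (for example a Postgres table with a unique key column), and `submit_batch` is a hypothetical callable representing the bank integration.

```python
from typing import Callable, Dict, List

# Stand-in for a durable store keyed by idempotency key (assumption).
_submitted_batches: Dict[str, str] = {}


def initiate_ach_transfers_once(
    idempotency_key: str,
    paystub_ids: List[str],
    submit_batch: Callable[[List[str]], str],
) -> str:
    """Submit the batch at most once per key; return the stored batch id on retries."""
    if idempotency_key in _submitted_batches:
        return _submitted_batches[idempotency_key]
    batch_id = submit_batch(paystub_ids)
    _submitted_batches[idempotency_key] = batch_id
    return batch_id
```

A crash between submitting and recording would still leave a gap in this sketch, so a real integration would likely also pass the key to the bank's API, assuming it supports deduplication.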

Data Modeling for Correctness

The Postgres data model must prioritize immutability and auditability. When a payroll run is executed, you can't rely on the current state of an Employee record. What if their salary was changed yesterday? Which salary applies?

The solution is to snapshot all relevant data for the run. The Paystub model shouldn't just have foreign keys; it should contain a complete record of the inputs used for its calculation:

This makes each paystub a self-contained, auditable artifact. It costs more in storage, but the gain in correctness and debuggability is non-negotiable for a financial system.
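As a rough illustration of the snapshot idea, the sketch below uses frozen dataclasses as a stand-in for the Django models. The field names (`hourly_rate`, `work_state`, and so on) are assumptions chosen for the example, not the demo's actual schema.

```python
from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Employee:
    employee_id: str
    hourly_rate: Decimal
    work_state: str


@dataclass(frozen=True)
class PaystubSnapshot:
    employee_id: str
    pay_period_end: date
    hourly_rate: Decimal   # copied at calculation time, not looked up later
    work_state: str        # copied at calculation time, not looked up later
    hours_worked: Decimal
    gross_pay: Decimal


def snapshot_paystub(
    employee: Employee, pay_period_end: date, hours_worked: Decimal
) -> PaystubSnapshot:
    """Copy every input used in the calculation onto the paystub itself."""
    return PaystubSnapshot(
        employee_id=employee.employee_id,
        pay_period_end=pay_period_end,
        hourly_rate=employee.hourly_rate,
        work_state=employee.work_state,
        hours_worked=hours_worked,
        gross_pay=employee.hourly_rate * hours_worked,
    )


employee = Employee("emp-1", Decimal("40.00"), "CA")
stub = snapshot_paystub(employee, date(2024, 3, 15), Decimal("80"))

# A later raise produces a new Employee value; the paystub keeps the old rate.
employee_after_raise = replace(employee, hourly_rate=Decimal("45.00"))
```

Anyone auditing `stub` later can reproduce the gross pay from the paystub alone, regardless of what has since happened to the employee record.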

The Human in the Loop and UX

A fully automated payroll system is a terrifying prospect. The cost of an error is too high. The architecture must include a "human-in-the-loop" design pattern, where the system performs calculations and then pauses for review and explicit approval from a payroll administrator.

This is where the React/TypeScript frontend comes in. It's not just a form to kick off the run; it's a dashboard for observing the workflow's progress and a tool for intervention.

To provide real-time feedback, the backend can use Server-Sent Events (SSE). As the Temporal workflow completes activities (e.g., "Calculated pay for 250 of 1000 employees"), the activity worker can publish an event. The Django backend pushes this event over an SSE connection to the React client, which updates a progress bar. This transforms the user experience from a black box into a transparent, observable process.
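The wire format for these progress events is simple enough to sketch with the standard library. The event name `payroll_progress` and the payload keys are assumptions made for this example.

```python
import json
from typing import Iterable, Iterator


def format_sse(event: str, data: dict) -> str:
    """Encode one Server-Sent Event: an event name, a JSON data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def progress_stream(updates: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE frame per progress update."""
    for update in updates:
        yield format_sse("payroll_progress", update)


frames = list(progress_stream([
    {"calculated": 250, "total": 1000},
    {"calculated": 1000, "total": 1000},
]))
```

In Django, a generator like `progress_stream` could be handed to a `StreamingHttpResponse` with `content_type="text/event-stream"`, and the React client could read it with the browser's `EventSource` API.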

The "Approve Payroll" button in the UI doesn't just flip a boolean in a database. It calls an API endpoint that sends a signal to the waiting Temporal workflow (as seen in the approve_run method above), causing it to resume execution. This clean separation of concerns between the UI action and the workflow state is a key benefit.
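The server side of that button can be quite small. In the sketch below, `get_workflow_handle` and `signal` mirror the calls on the Temporal Python SDK's client and workflow handle, but the client is duck-typed and replaced with a stand-in so the example runs without a Temporal server. The workflow ID `payroll-run-pp1` is hypothetical.

```python
import asyncio
from typing import Any, Dict, List


async def approve_payroll_run(client: Any, workflow_id: str) -> None:
    """Send the approve_run signal to a running payroll workflow."""
    handle = client.get_workflow_handle(workflow_id)
    await handle.signal("approve_run")


# Stand-ins so the sketch runs without a Temporal server.
class FakeHandle:
    def __init__(self) -> None:
        self.signals: List[str] = []

    async def signal(self, name: str) -> None:
        self.signals.append(name)


class FakeClient:
    def __init__(self) -> None:
        self.handles: Dict[str, FakeHandle] = {}

    def get_workflow_handle(self, workflow_id: str) -> FakeHandle:
        return self.handles.setdefault(workflow_id, FakeHandle())


fake_client = FakeClient()
asyncio.run(approve_payroll_run(fake_client, "payroll-run-pp1"))
```

The endpoint never touches payroll state directly; it only delivers the signal, and the workflow decides what happens next.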

Scaling and Edge Cases

Where does this break? At scale, the challenges shift from individual calculations to system-wide throughput and resilience.

A Reflection on the Problem

Building a system like this is a reminder that some of the most interesting engineering problems aren't in novel algorithms but in the careful, pragmatic assembly of reliable systems for critical, real-world processes. Payroll is a domain defined by its constraints: correctness is absolute, timing is unforgiving, and the state is complex and ever-changing. Architecting for these constraints forces a level of discipline and foresight that goes far beyond typical application development. It's a problem space where the quality of the engineering has a direct and meaningful impact on people's lives.