Building an AI Fraud Detection Interface: Notes on Fintech / Financial Crime Prevention

Patrick Donahue · Levelbrook Consulting

Financial crime prevention is a domain where the engineering stakes are unusually high. It is not a theoretical exercise; it is an adversarial, real-time race against motivated actors. The core challenge is a signal-to-noise problem at very large scale: a system must sift through billions of legitimate transactions to find the handful that represent theft, money laundering, or other illicit activity. The technical interest lies in the tension between speed, scale, and, most critically, correctness. A false positive isn't just a statistical error; it's a frozen account, a missed rent payment, a ruined vacation. A false negative is stolen funds. This is a domain where the human in the loop isn't a fallback; it's the entire point.

Architecting for Investigation

Building a modern fraud detection platform requires a hybrid, polyglot architecture. No single tool solves the problem. The goal is to funnel massive streams of data into a coherent, actionable interface for a human analyst. Here’s a pragmatic sketch of how the pieces fit together.

The Data Spine: Tiered Storage for Different Temperatures

The data workload is heterogeneous. We have a firehose of raw events, a structured core of relational data, and a massive historical corpus for analysis.

Hot Path (Ingestion): For raw transaction and user behavior events (logins, password changes, device fingerprints), you need very high write throughput. A system like Google's Bigtable is a good fit. Its wide-column model allows flexible schemas that can capture evolving event types, and it scales horizontally to handle millions of events per second. The workload is write-once, read-many, with reads typically keyed on a user or entity ID over a time range.
Warm Path (State & Cases): The core "source of truth"—user accounts, case management state, analyst assignments, and audit logs—demands ACID compliance. This is non-negotiable. A managed Postgres instance is the workhorse here. It holds the canonical state of an investigation and provides the relational integrity needed to ensure an analyst's actions are correctly and durably recorded.
Cold Path (Analytics & Model Training): To train detection models and run large-scale analytical queries (e.g., "show me all transactions linked to this device hash over the last year"), you need a data warehouse. BigQuery is a natural fit, allowing for complex aggregations over petabytes of historical data ingested from Bigtable. This is where data scientists, armed with Python and ML libraries, build and validate the models that generate the initial alerts.
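
The hot-path access pattern described above (one entity's events over a time range) is largely a row-key design question. Here is a minimal sketch, assuming a hypothetical key layout of `entity_id#reversed_timestamp` so that an entity's newest events sort first. The function names, the separator, and the 13-digit millisecond format are illustrative assumptions, not something Bigtable prescribes.

```python
# Hypothetical row-key layout: "<entity_id>#<reversed_ts>", zero-padded so that
# lexicographic order matches numeric order. All names here are illustrative.
MAX_TS_MS = 10**13 - 1  # upper bound for 13-digit millisecond timestamps


def event_row_key(entity_id: str, ts_ms: int) -> str:
    """Build a key that sorts an entity's newest events first."""
    if "#" in entity_id:
        raise ValueError("entity_id must not contain the '#' separator")
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{entity_id}#{reversed_ts:013d}"


def scan_range(entity_id: str, start_ms: int, end_ms: int) -> tuple[str, str]:
    """Return (start_key, end_key), both inclusive, covering start_ms..end_ms."""
    # Because timestamps are reversed, the later time yields the smaller key.
    return event_row_key(entity_id, end_ms), event_row_key(entity_id, start_ms)
```

Leading the key with the entity ID rather than the timestamp keeps one user's history contiguous for range scans and avoids funnelling all current writes into the same region of the key space.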

Services & The Real-Time Layer

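The backend is a set of specialized services. A single monolith would struggle here, because the ingestion path, the dashboard API, and the LLM integration have very different latency and scaling needs.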

Ingestion & Rules Engine: A high-performance service written in Go can consume from the event stream (e.g., Kafka) and perform initial, low-latency checks. This service writes raw events to Bigtable and runs a preliminary set of deterministic rules. Some large financial institutions have legacy rules engines written in Java; in that case the Go service can act as a modernizing facade in front of them.
Dashboard API: A Node.js service using TypeScript is a strong choice for the API that powers the analyst dashboard. It can efficiently orchestrate calls to Postgres (for case data) and Bigtable (for user event history) and manage the real-time connection to the client.
The AI "Co-pilot" Service: This is where the LLM integration happens. A Python service can take a structured set of data for a given alert (transaction details, user history, related accounts, output from a traditional risk score model) and format it into a prompt for a model like Claude. The goal isn't for the LLM to *detect* fraud, but to *synthesize and explain* the risk factors in natural language for the human analyst.

"This $4,500 transaction to a new merchant is anomalous. The user's IP address originates from a different country than their last 100 sessions, and the device fingerprint has not been seen before. This transaction pattern matches two other confirmed fraud cases in the last 72 hours."

This summary is an example of what the LLM might produce. It turns raw data points into a coherent narrative, which reduces the analyst's cognitive load.
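
The co-pilot service's job of turning structured alert data into a prompt can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical `AlertContext` shape and hypothetical prompt wording. The actual call to the model is deliberately left out, since it depends on the provider SDK in use.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AlertContext:
    """Hypothetical structured input for one alert; field names are assumptions."""
    alert_id: str
    transaction: dict
    risk_score: float  # output of the traditional risk score model
    recent_sessions: list = field(default_factory=list)
    related_accounts: list = field(default_factory=list)


def build_prompt(ctx: AlertContext) -> str:
    """Format alert data into a prompt asking for an explanation, not a verdict."""
    facts = json.dumps(asdict(ctx), indent=2, sort_keys=True)
    return (
        "You are assisting a fraud analyst. Using ONLY the JSON facts below, "
        "summarize the risk factors in plain language in at most four sentences. "
        "Do not decide whether this is fraud, and do not state anything that is "
        "not supported by the facts.\n\n"
        f"<facts>\n{facts}\n</facts>"
    )
```

Keeping the prompt limited to facts the service already holds, and asking for an explanation rather than a decision, matches the synthesize-and-explain role described above.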

The Interface: A Real-Time Investigation Canvas

The frontend, built in React with TypeScript, is more than a dashboard; it's a live workspace. When a high-priority alert is triaged, it should be pushed to the analyst's screen instantly via WebSockets or Server-Sent Events (SSE). The UI must present a dense but clear picture: the suspicious transaction, the AI-generated summary, a timeline of the user's recent activity, and visualizations of linked entities (e.g., other accounts using the same device ID).

Every UI component—the transaction list, the user history panel, the case notes editor—is a candidate for real-time updates. As one analyst leaves a note, another should see it appear without a page refresh. This is critical for collaborative investigation teams.
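
For the push channel, Server-Sent Events use a simple text framing: optional `id:` and `event:` lines, one or more `data:` lines, and a blank line that ends the message. Below is a minimal sketch of the server-side formatting, written in Python for illustration even though the dashboard API in this design is a Node.js service. The `case_note_added` event name and the payload shape are assumptions.

```python
import json
from typing import Optional


def format_sse(event: str, payload: dict, event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the message


# Example: broadcast a new case note to every analyst viewing the case.
message = format_sse("case_note_added", {"case_id": "c-9", "note": "Called customer"}, "17")
```

Sending an `id` with each message lets a reconnecting browser report the last event it saw, which helps the server replay notes an analyst missed during a dropped connection.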

Infrastructure for the entire stack would be managed via Terraform, so that environments are reproducible and capacity changes go through version-controlled configuration.

Pragmatic Tradeoffs and The Human Factor

The most important engineering decision here is to optimize for analyst efficacy, not pure automation.

Correctness over Premature Action: The system should default to flagging for review, not automated blocking. The cost of a false positive is often higher than the cost of a few minutes of analyst review. The interface must make it trivial to see *why* an alert was generated, allowing an analyst to quickly dismiss false positives.
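
One way to make the "why" visible is to have every deterministic rule carry its own human-readable reason and to have the evaluator return "review" rather than "block". The sketch below is illustrative only: the rule names, thresholds, and transaction fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A deterministic check plus the human-readable reason shown to analysts."""
    name: str
    reason: str
    predicate: Callable[[dict], bool]


# Hypothetical rules and thresholds, for illustration only.
RULES = [
    Rule("new_device", "Device fingerprint not seen before for this user",
         lambda tx: tx.get("device_seen_before") is False),
    Rule("geo_mismatch", "IP country differs from the user's usual country",
         lambda tx: tx.get("ip_country") != tx.get("usual_country")),
    Rule("large_amount", "Amount is at or above 4,000 USD",
         lambda tx: tx.get("amount_usd", 0) >= 4000),
]


def evaluate(tx: dict) -> dict:
    """Return an action and the reasons behind it; never block automatically."""
    reasons = [r.reason for r in RULES if r.predicate(tx)]
    return {"action": "REVIEW" if reasons else "ALLOW", "reasons": reasons}
```

Because the reasons travel with the alert, the interface can list them next to the transaction, and an analyst can dismiss an obvious false positive without reconstructing the logic.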

The Feedback Loop is Everything: Every action an analyst takes ("Confirm Fraud," "False Positive," "Add to Watchlist") is a valuable training signal. The `AnalystAction` event must be captured with rich context and fed back into the BigQuery dataset. This closes the loop, allowing the next generation of models to learn from the human expert's decision. Without this, the models will stagnate.
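
What "rich context" means for an `AnalystAction` event is a design choice. Here is a minimal sketch, assuming a hypothetical schema in which each decision records which model and score produced the alert. The fields and action names are assumptions, as the text above names the event but not its contents.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AnalystAction:
    """Hypothetical schema for one analyst decision."""
    alert_id: str
    analyst_id: str
    action: str         # e.g. "CONFIRM_FRAUD", "FALSE_POSITIVE", "ADD_TO_WATCHLIST"
    model_version: str  # which model raised the alert, for later evaluation
    model_score: float
    decided_at: str     # ISO 8601 timestamp, UTC
    note: str = ""


ALLOWED_ACTIONS = {"CONFIRM_FRAUD", "FALSE_POSITIVE", "ADD_TO_WATCHLIST"}


def to_training_row(a: AnalystAction) -> str:
    """Serialize to one JSON row, rejecting unknown action labels."""
    if a.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {a.action}")
    return json.dumps(asdict(a), sort_keys=True)


def now_utc_iso() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()
```

Rows written one per line form newline-delimited JSON, which is one of the formats BigQuery load jobs accept. Recording the model version alongside the label makes it possible to measure each model's false-positive rate later.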

Where it Breaks at Scale: The primary failure mode at scale is analyst overload. If the models are too noisy, the alert queue becomes a firehose, and important signals are lost. The system must therefore include robust alert prioritization logic. Another failure point is data latency. If the pipeline from event capture to the analyst's screen takes minutes instead of seconds, the fraud has already happened. The hot path (Go, Bigtable, WebSockets) must be obsessively optimized for low latency.
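
Alert prioritization can start very simply. The sketch below assumes a hypothetical priority of model score times amount at risk and serves the highest-priority alert first, with ties resolved in arrival order. A real system would likely weigh more signals.

```python
import heapq
import itertools


class AlertQueue:
    """Max-priority queue: highest priority first, FIFO among ties."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def push(self, alert_id: str, risk_score: float, amount_usd: float) -> None:
        # Hypothetical priority: model score times amount at risk.
        priority = risk_score * amount_usd
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), alert_id))

    def pop(self) -> str:
        """Return the alert_id an analyst should look at next."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With a queue like this, a noisy model still produces a long backlog, but the alerts with the most at stake reach an analyst first instead of being lost in arrival order.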

A Reflection on the Problem

Building systems for financial crime prevention is a compelling challenge. It sits at the intersection of distributed systems engineering, applied machine learning, and human-computer interaction. The objective is not to build an infallible "AI judge" but to construct a powerful tool that augments the intuition and expertise of a human investigator. The ultimate measure of success is not how many alerts the system can generate, but how effectively it empowers a person to make a fast, correct, and fair decision under pressure.