Levelbrook Labs

Building an AI Financial Analysis Demo: Notes on Artificial Intelligence for Financial Services and Consulting

Patrick Donahue · Levelbrook Consulting

The intersection of unstructured text and structured quantitative data is one of the most compelling, and difficult, domains for applied AI. Financial services and consulting are built on this junction: analysts read thousands of pages of SEC filings, earnings call transcripts, and market news (unstructured text) to inform their valuation models and forecasts (structured data). The core engineering challenge is not merely to automate this, but to build a system that augments the analyst's ability to discover insights, verify claims, and synthesize a narrative—all while maintaining an exceptionally high bar for correctness.

This problem space is technically interesting because it's not a straightforward application of a single model. It requires a multi-stage, hybrid architecture that blends classic data processing, specialized NLP models, and generative LLMs. Building a proof-of-concept is an exercise in systems design, exploring how to chain these disparate components into a cohesive, responsive, and—most importantly—trustworthy user experience. I recently built a small demo to explore these mechanics firsthand.


The Domain Problem: From Documents to Decisions

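An analyst needs to understand the health and trajectory of a company. Their raw materials include unstructured text (SEC filings, earnings call transcripts, market news) and structured financial data.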

The human process involves reading the text to find context for the numbers. Why did revenue increase? The MD&A might mention a new product line. Why did margins shrink? The earnings call might reveal supply chain issues. The goal is to synthesize these qualitative insights with the quantitative facts into a coherent report.

An AI system must replicate this synthesis. It needs to parse tables of numbers and understand the nuance of executive commentary. This immediately rules out a naive "feed everything to a GPT" approach. The risk of hallucination is too high, and the ability to trace a conclusion back to a specific sentence in a specific document is non-negotiable.

System Architecture: A Pragmatic Hybrid Approach

A production-grade system for this task is inherently a set of cooperating services. Here’s a breakdown of a potential architecture using a polyglot stack of Python, PHP, and JavaScript, deployed on a cloud platform such as AWS or Azure.

1. Data Ingestion & Pre-processing (Python + Cloud Storage)

The first step is a robust ETL pipeline. A Python service running on a schedule (e.g., via AWS Lambda or Azure Functions) would fetch data from sources like the SEC EDGAR API, financial data providers, and news feeds. Documents (10-Ks, transcripts) are stored in an object store like S3 or Azure Blob Storage. Structured financial data is cleaned and inserted into a relational database (e.g., PostgreSQL).

The crucial pre-processing step for text is chunking. A 200-page 10-K is too large to fit in most models' context windows, so it must be split into logical, semantically aware chunks (e.g., by section or paragraph), each stored alongside metadata linking back to the source document and page number. This metadata is the foundation of verifiability later on.
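A minimal sketch of paragraph-aligned chunking with source metadata follows. It makes several assumptions: the filing has already been extracted to one text string per page, paragraphs are separated by blank lines, and a paragraph starting with "Item " marks a new 10-K section. The names Chunk and chunk_pages, the doc_id value, and the max_chars default are hypothetical choices for illustration, not part of the demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str   # hypothetical identifier for the source filing
    page: int     # 1-based page the text came from
    section: str  # most recent section heading seen
    text: str


def chunk_pages(doc_id: str, pages: list[str], max_chars: int = 1200) -> list[Chunk]:
    """Split per-page text into paragraph-aligned chunks with source metadata.

    Chunks never span pages, so every chunk maps to exactly one page number.
    """
    chunks: list[Chunk] = []
    section = "UNKNOWN"
    for page_no, page_text in enumerate(pages, start=1):
        buffer: list[str] = []
        for para in (p.strip() for p in page_text.split("\n\n")):
            if not para:
                continue
            # Assumed heuristic: 10-K section headings start with "Item ".
            if para.upper().startswith("ITEM "):
                if buffer:
                    chunks.append(Chunk(doc_id, page_no, section, "\n\n".join(buffer)))
                    buffer = []
                section = para.splitlines()[0]
                continue
            # Close the current chunk before it would exceed the size budget.
            if buffer and sum(len(b) for b in buffer) + len(para) > max_chars:
                chunks.append(Chunk(doc_id, page_no, section, "\n\n".join(buffer)))
                buffer = []
            buffer.append(para)
        if buffer:
            chunks.append(Chunk(doc_id, page_no, section, "\n\n".join(buffer)))
    return chunks
```

A single paragraph longer than max_chars is kept whole here rather than split mid-sentence; a real pipeline would likely need a fallback for that case.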

2. The AI Core: A Multi-Stage NLP Pipeline (Python + PyTorch/TensorFlow)

This is a set of containerized Python services, each with a specific task. They communicate via internal APIs or a message queue.

3. Application & Presentation Layer (PHP/JS + Real-time UX)

The user-facing application orchestrates the process. While my preference is often Rails for its integrated nature, a PHP backend (using a framework like Laravel or Symfony) is perfectly suited to serve as the API gateway.

The PHP backend would handle user authentication, manage analysis requests, and call the various Python services. For long-running report generation, it would start the job and immediately return a job ID. The client would then either poll for status or subscribe to updates over a WebSocket or Server-Sent Events (SSE).
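The job-ID pattern can be sketched in a few lines. The example below is written in Python rather than PHP to keep one language across these notes, and it uses an in-memory dictionary and a thread pool as stand-ins for a real job table and worker fleet. JobStore and its methods are hypothetical names, not the demo's actual API.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


class JobStore:
    """In-memory stand-in for a job table plus background workers."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, fn, *args) -> str:
        """Start the work in the background and return a job ID right away."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id: str) -> dict:
        """What a polling endpoint would return for this job ID."""
        future = self._jobs.get(job_id)
        if future is None:
            return {"state": "unknown"}
        if not future.done():
            return {"state": "running"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}
```

A client would call submit once, keep the returned ID, and call status until the state is "done" or "failed". A WebSocket or SSE variant would push the same status payloads instead of waiting to be asked.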

The frontend, built with a modern JavaScript framework like React or Vue, is where the system's value is truly expressed. A static report is not enough. The UI must be interactive.

The data model is key. A central AnalysisReport table would link to users, source documents, and the structured JSON output. The JSON itself needs a well-defined schema to ensure the frontend can reliably parse and render the report, including the crucial source attribution metadata for each piece of generated content.
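One possible shape for that JSON, sketched with Python dataclasses. Only the AnalysisReport name comes from the text above. SourceRef, ReportSection, every field name, and the unsourced_sections helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceRef:
    doc_id: str  # which source document the claim came from
    page: int    # page within that document
    quote: str   # the supporting passage


@dataclass
class ReportSection:
    heading: str
    body: str
    sources: list[SourceRef] = field(default_factory=list)


@dataclass
class AnalysisReport:
    report_id: str
    user_id: str
    company: str
    sections: list[ReportSection] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists of them.
        return json.dumps(asdict(self), indent=2)


def unsourced_sections(report: AnalysisReport) -> list[str]:
    """Headings of sections with no attribution, so the UI can flag them."""
    return [s.heading for s in report.sections if not s.sources]
```

Keeping attribution as a required part of each section, rather than a separate lookup, means the frontend can render a citation next to every generated paragraph without a second request.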

Where It Breaks at Scale

This architecture has failure points. The vector search can become a bottleneck if not properly indexed. LLM API calls can be slow and expensive, so aggressive caching of common queries and pre-generating reports for high-traffic companies are essential. The biggest challenge is the state management of long-running, multi-stage analysis jobs. Using a robust message queue (like RabbitMQ or AWS SQS) and designing idempotent services are critical to ensure that a failure in one stage doesn't corrupt the entire process.
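To make the idempotency point concrete, here is a small sketch of a stage handler that tolerates duplicate message delivery. It assumes each message carries a job ID, a stage name, and a JSON-serializable payload. StageRunner is a hypothetical name, and its in-memory dictionary stands in for a durable store such as a database table.

```python
import hashlib
import json


class StageRunner:
    """Runs a pipeline stage at most once per (job, stage, payload)."""

    def __init__(self) -> None:
        self._results: dict[str, object] = {}  # stand-in for durable storage
        self.executions = 0                    # counts real (non-replayed) runs

    @staticmethod
    def key(job_id: str, stage: str, payload: dict) -> str:
        # sort_keys makes the digest independent of dict ordering.
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        return f"{job_id}:{stage}:{digest}"

    def handle(self, job_id: str, stage: str, payload: dict, fn):
        """Process a message; a redelivered message returns the stored result."""
        k = self.key(job_id, stage, payload)
        if k in self._results:
            return self._results[k]
        result = fn(payload)
        self._results[k] = result
        self.executions += 1
        return result
```

If a stage raises, nothing is stored, so the queue can safely redeliver the message and the stage runs again. The same key doubles as a cache key for repeated queries.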

Pragmatic Tradeoffs & The Human-in-the-Loop

A senior engineer's job is to make tradeoffs. In this domain, the primary tension is between automation and correctness.

1. Speed vs. Depth: A real-time, on-demand report is a fantastic UX goal. However, a deep analysis involving multiple large documents and fine-tuned models can take minutes. A pragmatic solution is a tiered approach: provide an instant "headline" summary based on cached or pre-computed data, while running the full, deep analysis in the background and streaming in the detailed sections as they become available.

2. Full Automation vs. Analyst Augmentation: The goal should not be to produce a final report that an analyst blindly forwards to a client. The risk of subtle errors or misinterpretations is too high. Instead, the system should be designed as a "first draft" generator. The UI should include an "edit and verify" mode where a human analyst can review the AI's output, correct inaccuracies, and add their own insights. The system becomes a powerful tool that eliminates 80% of the manual drudgery (finding and collating information), freeing up the analyst to focus on the 20% that requires true human expertise (critical thinking, strategic interpretation).
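The tiered approach from the first tradeoff can be sketched with asyncio: start the deep stages, emit the cached headline immediately, then yield each detailed section as it finishes. Everything here is assumed for illustration, including the function names, the event dictionaries, and the idea that each stage is an async callable returning a dict with "section" and "text" keys.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Assumed stage signature: takes a ticker, returns {"section": ..., "text": ...}.
Stage = Callable[[str], Awaitable[dict]]


async def tiered_report(
    ticker: str,
    headline_cache: dict[str, str],
    deep_stages: list[Stage],
) -> AsyncIterator[dict]:
    """Yield a cached headline first, then detail sections as they complete."""
    # Kick off the slow work before emitting anything.
    tasks = [asyncio.create_task(stage(ticker)) for stage in deep_stages]

    headline = headline_cache.get(ticker)
    if headline is not None:
        yield {"tier": "headline", "text": headline}

    # as_completed yields results in finish order, not submission order.
    for finished in asyncio.as_completed(tasks):
        yield {"tier": "detail", **(await finished)}


async def collect(
    ticker: str,
    headline_cache: dict[str, str],
    deep_stages: list[Stage],
) -> list[dict]:
    """Helper that drains the stream; a server would forward each event instead."""
    return [event async for event in tiered_report(ticker, headline_cache, deep_stages)]
```

In a web setting, each yielded event would map naturally onto one SSE or WebSocket message, so the page fills in section by section.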

This human-in-the-loop model is the only responsible way to deploy such technology in high-stakes environments. The system should log which parts of the report were AI-generated and which were human-edited, creating an audit trail and a valuable feedback loop for improving the models over time.
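A minimal sketch of such an audit trail, assuming each report section keeps an append-only list of revisions tagged with an author. The class names, the "ai" author label, and the feedback_pairs helper are hypothetical. A real system would persist these records rather than hold them in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    author: str   # "ai" or an analyst identifier (assumed convention)
    text: str
    at: datetime  # UTC timestamp of the change


@dataclass
class TrackedSection:
    heading: str
    revisions: list[Revision] = field(default_factory=list)

    def record(self, author: str, text: str) -> None:
        """Append a revision; earlier revisions are never overwritten."""
        self.revisions.append(Revision(author, text, datetime.now(timezone.utc)))

    @property
    def current(self) -> str:
        # Assumes at least one revision has been recorded.
        return self.revisions[-1].text

    @property
    def origin(self) -> str:
        edited = any(r.author != "ai" for r in self.revisions)
        return "human-edited" if edited else "ai-generated"

    def feedback_pairs(self) -> list[tuple[str, str]]:
        """(AI text, human correction) pairs, usable as model feedback."""
        return [
            (prev.text, cur.text)
            for prev, cur in zip(self.revisions, self.revisions[1:])
            if prev.author == "ai" and cur.author != "ai"
        ]
```

Because revisions are only ever appended, the same structure serves both purposes named above: the full list is the audit trail, and feedback_pairs extracts the corrections worth learning from.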

Closing Reflection

Building a system like this is a microcosm of modern software engineering. It requires a deep understanding of data flow, API design, user experience, and cloud infrastructure, all in service of orchestrating sophisticated AI models. The purely technical challenge of making these components work together is significant. But the more profound challenge is philosophical: designing a system that is transparent, verifiable, and ultimately trustworthy. The most successful AI tools in finance and consulting won't be black boxes that claim to have "the answer." They will be glass boxes that empower human experts to find the answer themselves, faster and with greater confidence than ever before.