Building a Surplus Inventory AI Pricing Recommender

Patrick Donahue · Levelbrook Consulting

Enterprise supply chain software for CPG and retail operates at enormous scale: millions of SKUs, distributed across hundreds of warehouses, with demand signals fluctuating based on seasonality, marketing campaigns, and macroeconomic shifts. Margins are razor-thin. In this environment, surplus inventory (overstock, short-dated products, last season's apparel) isn't just a minor inefficiency; it's a multi-billion-dollar liability that directly eats into profitability.

The standard playbook involves blunt-force discounting or liquidation through secondary markets, often at a steep loss. This is a technically interesting problem because a "smarter" approach isn't about finding a single magic discount percentage. It's a high-dimensional optimization challenge. The ideal price for a pallet of yogurt expiring in 45 days depends on current warehouse capacity costs, real-time demand for similar products, the risk of cannibalizing full-price sales of a newer batch, and even the "brand cost" of being seen as a discount-heavy label. This intersection of hard data, probabilistic forecasting, and qualitative business logic is fertile ground for a carefully architected AI-powered system.

System Architecture Notes

Let's sketch out a pragmatic approach using a modern, well-understood stack: TypeScript across the board, with React for the frontend, Express on the backend, and MongoDB as the operational database. This stack offers type safety, a rich ecosystem, and the flexibility needed to handle often-messy enterprise product data.

The Data Model: It's All About the Lots

A common mistake is to model inventory at the SKU level. For surplus, you must think in terms of lots or batches. A thousand units of a product expiring in 90 days are a different problem than a thousand units expiring in 10. A flexible document model in MongoDB is well-suited for this.

We'd structure our core collections something like this:

// InventoryLots Collection
{
  "_id": ObjectId("..."),
  "lot_id": "L202405-A4B7",
  "sku": "YOG-STR-500G",
  "quantity": 1500,
  "warehouse_id": "WH-CENTRAL-01",
  "cost_per_unit": 0.85,
  "received_date": ISODate("2024-05-10T..."),
  "expiration_date": ISODate("2024-08-15T...")
}

// PricingRecommendations Collection
{
  "_id": ObjectId("..."),
  "sku": "YOG-STR-500G",
  "target_lot_ids": ["L202405-A4B7"], // Can target one or more lots
  "recommended_price": 1.29,
  "list_price": 2.49,
  "justification_text": "Recommending a 48% discount to accelerate sell-through. This lot has 45 days until expiration, and current sales velocity is 20% below forecast. This price point is projected to clear 90% of stock in 30 days, avoiding total loss and high disposal fees.",
  "confidence_score": 0.88,
  "status": "pending_review", // one of: 'pending_review', 'accepted', 'rejected', 'modified'
  "created_at": ISODate("...")
}

The `PricingRecommendations` collection is key. Each document holds more than a number: it stores the *why* (`justification_text`) and the state of the recommendation (`status`), both of which are crucial for the human-in-the-loop workflow.
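
As a rough sketch, the two document shapes could be mirrored in TypeScript like this. The field names come from the examples above; the helper functions `daysToExpiration` and `discountPercent` are hypothetical additions for illustration, and database-specific types such as `ObjectId` are left out.

```typescript
// Hypothetical TypeScript shapes mirroring the two collections.
// The database _id is omitted; a real driver would supply its own ObjectId type.
type RecommendationStatus = "pending_review" | "accepted" | "rejected" | "modified";

interface InventoryLot {
  lot_id: string;
  sku: string;
  quantity: number;
  warehouse_id: string;
  cost_per_unit: number;
  received_date: Date;
  expiration_date: Date;
}

interface PricingRecommendation {
  sku: string;
  target_lot_ids: string[]; // one or more lots
  recommended_price: number;
  list_price: number;
  justification_text: string;
  confidence_score: number;
  status: RecommendationStatus;
  created_at: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days remaining until the lot expires (negative once expired).
function daysToExpiration(lot: InventoryLot, asOf: Date): number {
  return Math.floor((lot.expiration_date.getTime() - asOf.getTime()) / MS_PER_DAY);
}

// Discount relative to list price, rounded to a whole percent.
function discountPercent(
  rec: Pick<PricingRecommendation, "recommended_price" | "list_price">
): number {
  return Math.round((1 - rec.recommended_price / rec.list_price) * 100);
}
```

With the example values above, a recommended price of 1.29 against a list price of 2.49 works out to the 48% discount quoted in the justification text.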

The Recommendation Engine: A Hybrid Approach

Simply asking an LLM "What should the price be?" is naive and unreliable. A robust engine is a two-stage process:

Quantitative Prediction: A traditional machine learning model (e.g., a gradient-boosted tree ensemble built with a library such as XGBoost) is trained on historical sales data. Its job is to answer a specific question: "Given features like `days_to_expiration`, `current_inventory_level`, `historical_sales_velocity`, `seasonality_factor`, and `price_discount_percentage`, what is the predicted `sell_through_rate` in the next 30 days?" This model provides the numerical foundation.
Qualitative Synthesis (LLM): The output from the quantitative model is then fed into an LLM as part of a structured prompt. The LLM's role is not to invent a price, but to synthesize the data and generate a human-readable justification. It bridges the gap between statistical output and business context.
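
One way the first stage might turn sell-through predictions into a candidate price is a simple search over discount levels. The sketch below is an assumption, not a prescribed method: `SellThroughModel` stands in for the trained model, `pickDiscount` is a hypothetical helper, and the objective (price times predicted sell-through) deliberately ignores warehouse cost, cannibalization, and disposal fees.

```typescript
// Stand-in for the trained model: maps a discount percentage to a predicted
// sell-through fraction. The signature is an assumption for illustration.
type SellThroughModel = (discountPct: number) => number;

interface PriceCandidate {
  discountPct: number;
  price: number;
  predictedSellThrough: number;
  expectedRevenuePerUnit: number;
}

// Scores each candidate discount and returns the one with the highest
// expected revenue per unit of stock (price * predicted sell-through).
function pickDiscount(
  listPrice: number,
  candidateDiscounts: number[],
  model: SellThroughModel
): PriceCandidate {
  if (candidateDiscounts.length === 0) {
    throw new Error("at least one candidate discount is required");
  }
  const scored = candidateDiscounts.map((discountPct) => {
    const price = listPrice * (1 - discountPct / 100);
    // Clamp the prediction to a valid fraction.
    const predictedSellThrough = Math.min(1, Math.max(0, model(discountPct)));
    return {
      discountPct,
      price,
      predictedSellThrough,
      expectedRevenuePerUnit: price * predictedSellThrough,
    };
  });
  return scored.reduce((best, c) =>
    c.expectedRevenuePerUnit > best.expectedRevenuePerUnit ? c : best
  );
}

// Toy linear response curve, purely illustrative.
const toyModel: SellThroughModel = (d) => 0.3 + 0.0125 * d;
const chosen = pickDiscount(2.49, [0, 20, 40, 60], toyModel);
console.log(chosen.discountPct); // with this toy curve, the 40% candidate wins
```

A production objective would likely fold in the other factors the article mentions, such as holding cost and the risk of cannibalizing full-price sales.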

Example LLM Prompt Context:
Product: Strawberry Yogurt, 500g (SKU: YOG-STR-500G)
Inventory: 1500 units expiring in 45 days.
Warehouse Cost: $0.02/unit/day.
Quantitative Model Output: A 48% discount ($1.29) yields a predicted 90% sell-through within 30 days.
Task: Write a justification of this pricing recommendation for a supply chain manager. Do not change the price. Emphasize urgency and loss avoidance.

This hybrid model plays to the strengths of both technologies: classical ML for reliable regression/classification and LLMs for nuanced, natural language communication.
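
Assembling the prompt context from structured fields keeps the LLM input deterministic and easy to audit. The following is a minimal sketch; the `PromptContext` shape and the `buildPromptContext` function are hypothetical names, and the task wording is passed in by the caller rather than fixed here.

```typescript
// Hypothetical input shape for the prompt builder.
interface PromptContext {
  productName: string;
  sku: string;
  quantity: number;
  daysToExpiration: number;
  warehouseCostPerUnitDay: number; // in dollars
  discountPct: number;
  recommendedPrice: number; // in dollars
  predictedSellThrough: number; // fraction between 0 and 1
  horizonDays: number;
  task: string;
}

// Renders the structured context as the plain-text block sent to the LLM.
function buildPromptContext(ctx: PromptContext): string {
  const sellThroughPct = Math.round(ctx.predictedSellThrough * 100);
  return [
    `Product: ${ctx.productName} (SKU: ${ctx.sku})`,
    `Inventory: ${ctx.quantity} units expiring in ${ctx.daysToExpiration} days.`,
    `Warehouse Cost: $${ctx.warehouseCostPerUnitDay.toFixed(2)}/unit/day.`,
    `Quantitative Model Output: A ${ctx.discountPct}% discount ($${ctx.recommendedPrice.toFixed(2)}) ` +
      `yields a predicted ${sellThroughPct}% sell-through within ${ctx.horizonDays} days.`,
    `Task: ${ctx.task}`,
  ].join("\n");
}

const examplePrompt = buildPromptContext({
  productName: "Strawberry Yogurt, 500g",
  sku: "YOG-STR-500G",
  quantity: 1500,
  daysToExpiration: 45,
  warehouseCostPerUnitDay: 0.02,
  discountPct: 48,
  recommendedPrice: 1.29,
  predictedSellThrough: 0.9,
  horizonDays: 30,
  task: "Write a justification for a supply chain manager. Emphasize urgency and loss avoidance.",
});
console.log(examplePrompt);
```

Because the price and sell-through figures are injected as fixed text, the LLM only has to explain them, not compute them.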

Real-time UX and Where Things Break

The frontend, built in React, isn't just a static report. It's a decision-support tool. When a manager `Accepts` a recommendation via an API call to the Express backend, that state change needs to be reflected across the system. For a team of managers working on the same inventory, real-time updates are essential to prevent redundant work.

This is a good fit for Server-Sent Events (SSE). After a price is accepted, the backend can push an event to all connected clients, which then update their UI state. SSE is a simpler, more pragmatic choice than full WebSockets here: manager actions already travel over ordinary HTTP requests, so only the server-to-client direction needs a persistent channel.
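
The server side of this can be sketched without much machinery. In the example below, `SseClient`, `formatSseEvent`, and `RecommendationBroadcaster` are hypothetical names; the client interface only requires a `write` method, which an Express response object provides. The event name and payload fields are assumptions as well.

```typescript
// Anything with a write method can be a client; an Express `res` qualifies.
interface SseClient {
  write(chunk: string): unknown;
}

// Formats one event in the SSE wire format: an "event:" line, a "data:" line,
// and a blank line that terminates the event.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

class RecommendationBroadcaster {
  private clients = new Set<SseClient>();

  // Registers a client and returns a function that unregisters it.
  add(client: SseClient): () => void {
    this.clients.add(client);
    return () => {
      this.clients.delete(client);
    };
  }

  // Sends the event to every connected client; returns how many received it.
  broadcast(event: string, data: unknown): number {
    const frame = formatSseEvent(event, data);
    for (const client of this.clients) {
      client.write(frame);
    }
    return this.clients.size;
  }
}

// Possible Express wiring (not executed here; assumes an `app` and a shared `broadcaster`):
//
// app.get("/events", (req, res) => {
//   res.set({
//     "Content-Type": "text/event-stream",
//     "Cache-Control": "no-cache",
//     Connection: "keep-alive",
//   });
//   res.flushHeaders();
//   const remove = broadcaster.add(res);
//   req.on("close", remove);
// });
```

On the React side, the browser's built-in `EventSource` can subscribe to the same endpoint and listen for the named event.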

At scale, this architecture will creak in predictable places:

Data Latency: The entire system is only as good as its input data. If the inventory data from the central ERP is 24 hours stale, the recommendations will be flawed. Real-time data ingestion pipelines become critical.
Model Training & MLOps: A one-off training script doesn't cut it. The quantitative models need to be regularly retrained on new sales data to combat model drift. This requires a dedicated MLOps pipeline for versioning, training, and deploying models.
LLM Costs and Latency: Generating thousands of recommendations daily via a premium LLM API can get expensive and slow. Caching strategies, batching API calls, and potentially fine-tuning smaller, open-source models for this specific task are necessary optimizations.
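
As one illustration of the caching idea, justifications can be memoized on a coarse key so that near-identical inputs reuse a single LLM call. This is a sketch under assumptions: `JustificationInput`, `cacheKey`, and `JustificationCache` are hypothetical, the 5% sell-through bucket is an arbitrary choice, and the cache is in-memory with no expiry.

```typescript
// Hypothetical subset of inputs that determine the justification text.
interface JustificationInput {
  sku: string;
  daysToExpiration: number;
  discountPct: number;
  predictedSellThrough: number; // fraction between 0 and 1
}

// Buckets sell-through to the nearest 5% so tiny prediction changes
// do not trigger a fresh LLM call.
function cacheKey(input: JustificationInput): string {
  const bucket = Math.round(input.predictedSellThrough * 20) / 20;
  return [input.sku, input.daysToExpiration, input.discountPct, bucket.toFixed(2)].join("|");
}

class JustificationCache {
  private store = new Map<string, Promise<string>>();
  private generate: (input: JustificationInput) => Promise<string>;

  constructor(generate: (input: JustificationInput) => Promise<string>) {
    this.generate = generate;
  }

  // Returns the cached (or in-flight) justification, calling the generator
  // only when the key has not been seen before.
  get(input: JustificationInput): Promise<string> {
    const key = cacheKey(input);
    let pending = this.store.get(key);
    if (!pending) {
      pending = this.generate(input);
      this.store.set(key, pending);
      // Evict failed calls so a later request can retry.
      pending.catch(() => {
        this.store.delete(key);
      });
    }
    return pending;
  }
}
```

Storing the promise rather than the resolved text also collapses concurrent requests for the same key into one call.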

Pragmatism and the Human in the Loop

The biggest hurdle in deploying a system like this is not technical; it's trust. A supply chain manager with 20 years of experience will not cede control to a black box, nor should they. This is why the human-in-the-loop angle is non-negotiable.

The LLM-generated `justification_text` is the most important feature in the entire application. It surfaces the underlying data points that led to the recommendation, making it auditable and understandable. The UI must present this justification clearly alongside the core numbers.

Furthermore, every user action—`Accept`, `Reject`, `Modify`—is an invaluable training signal. When a manager overrides a recommendation and enters a different price, that's not a system failure; it's a data point. This feedback loop is gold. It must be captured and used to fine-tune the models, teaching the system about the unwritten rules and expert intuition that aren't present in the raw sales data. The goal is not to build a system that is always right, but one that learns from the experts it's designed to serve.
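
Capturing that signal can start as a small, explicit record per review action. The sketch below is one possible shape, not a required schema: `ReviewAction`, `FeedbackRecord`, and `toFeedbackRecord` are hypothetical names, and the price delta is rounded to cents for readability.

```typescript
// The three review actions, modeled as a discriminated union.
type ReviewAction =
  | { kind: "accept" }
  | { kind: "reject"; reason?: string }
  | { kind: "modify"; newPrice: number };

// One row of training signal per manager decision.
interface FeedbackRecord {
  recommendation_id: string;
  action: ReviewAction["kind"];
  recommended_price: number;
  final_price: number | null; // null when the recommendation was rejected
  price_delta: number | null; // final minus recommended, rounded to cents
  reviewed_at: Date;
}

function toFeedbackRecord(
  recommendationId: string,
  recommendedPrice: number,
  action: ReviewAction,
  reviewedAt: Date
): FeedbackRecord {
  let finalPrice: number | null;
  switch (action.kind) {
    case "accept":
      finalPrice = recommendedPrice;
      break;
    case "modify":
      finalPrice = action.newPrice;
      break;
    case "reject":
      finalPrice = null;
      break;
  }
  const priceDelta =
    finalPrice === null ? null : Math.round((finalPrice - recommendedPrice) * 100) / 100;
  return {
    recommendation_id: recommendationId,
    action: action.kind,
    recommended_price: recommendedPrice,
    final_price: finalPrice,
    price_delta: priceDelta,
    reviewed_at: reviewedAt,
  };
}
```

A manager who changes 1.29 to 1.49 produces a record with a positive delta of 0.2, which is exactly the kind of override the models can later learn from.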

Closing Reflection

The problem of surplus inventory is a microcosm of the challenges in modern enterprise software. It sits at the messy intersection of large-scale data, complex business logic, and the need for human expertise. Building an effective tool here isn't about finding a perfect algorithm to automate pricing. It's about building a system of intelligence augmentation—a tool that can process vast amounts of data, identify patterns, and present its findings in a clear, actionable way to empower an expert to make a better, faster decision. The synthesis of predictive analytics, generative AI, and a thoughtfully designed user interface is not just a compelling technical architecture; it's the key to building tools that actually get used.