Build-to-Show · 2026-04-05

Anatomy of a Demand Forecasting Pipeline

A step-by-step walkthrough of building a demand forecasting pipeline — from exploratory analysis to predictive modeling — using a public chocolate sales dataset to demonstrate the methodology.

Demand Forecasting · Machine Learning · Python · Time Series · Feature Engineering

Note: This case study uses a public dataset (global chocolate sales, 2022–2024) to walk through the full demand forecasting workflow. The techniques and thinking apply directly to enterprise-scale supply chain forecasting — the kind of work I do professionally across global logistics networks.

The Problem

Every supply chain runs on a forecast. Get it right and you optimize inventory, reduce waste, and keep customers happy. Get it wrong and you're either sitting on excess stock or scrambling to fill shortages.

Most teams rely on historical sales data and a time-series model. That's a reasonable starting point — but it raises a critical question: what's actually driving your demand, and how much of it can a model learn?

This project dissects a demand forecasting pipeline end-to-end — from raw data to predictive model to actionable insight — to answer that question.

The Dataset

To demonstrate the workflow without proprietary data, I used a publicly available chocolate sales dataset covering:

  • 3,283 transactions across 6 countries (Australia, India, UK, USA, Canada, New Zealand)
  • 22 product SKUs across 5 categories (Specialty, Classic, Dark Chocolate, Premium, Baking & Beverages)
  • Time span: July 2022 – December 2024
  • Features: Sales person, country, product, date, revenue, boxes shipped

It's a simplified version of a real distribution network — enough to demonstrate the methodology without the complexity (or confidentiality) of enterprise data.

The Approach

Step 1: Understand the Signal Landscape

Before touching any model, I mapped the data to understand what signals exist:

  • Temporal patterns — Monthly revenue trends revealed clear seasonality, with Q4 and January showing distinct peaks
  • Geographic variance — Performance varied dramatically by country, with Australia leading at ~18% of total revenue despite not being the largest market by transaction count
  • Product mix dynamics — Specialty products dominated (39% of transactions) but Premium showed higher per-unit value
  • Sales channel effects — Individual salesperson performance varied by 3–5x, suggesting route-to-market is itself a demand signal

This step is critical in real engagements. Most teams jump straight to modeling. But understanding your signal landscape tells you what the model can learn from — and more importantly, what's missing.
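As a sketch, the two highest-value checks here (monthly revenue trend and country revenue share) reduce to a few pandas aggregations. The frame below is a tiny synthetic stand-in; the column names are assumptions, not the dataset's exact headers:

```python
import pandas as pd

# Hypothetical frame mirroring the dataset's shape (column names assumed)
sales = pd.DataFrame({
    "date": pd.to_datetime(["2022-07-05", "2022-07-20", "2022-08-03", "2022-08-15"]),
    "country": ["Australia", "India", "Australia", "UK"],
    "revenue": [5320.0, 1870.0, 4410.0, 2650.0],
})

# Temporal signal: monthly revenue totals reveal seasonality
monthly = sales.groupby(sales["date"].dt.to_period("M"))["revenue"].sum()

# Geographic signal: each country's share of total revenue
country_share = sales.groupby("country")["revenue"].sum() / sales["revenue"].sum()

print(monthly)
print(country_share.sort_values(ascending=False))
```

The same two groupbys, plotted over the full dataset, are what surfaced the Q4/January peaks and Australia's outsized revenue share.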

Step 2: Feature Engineering

Raw data rarely predicts well. The transformation layer is where domain expertise matters most.

Time-based features extracted:

  • Year, month, quarter, day of week, week of year
  • These capture seasonality, weekly buying patterns, and year-over-year trends

Derived metrics:

  • Price per box (revenue / boxes shipped) — reveals pricing strategy effects on demand
  • Product category groupings — reduces 22 SKUs to 5 meaningful segments

Encoded categorical signals:

  • Country, product, salesperson — each encoded to capture their individual demand influence

Note that this dataset only contains internal signals. In a production pipeline, this is where you'd also layer in external signals — weather data, economic indicators, event calendars, promotional schedules. That enrichment step is often where the biggest accuracy gains come from; I return to it in the Evolving This Pipeline section below.
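The transformations above can be sketched in a few lines of pandas. The column names and sample rows below are illustrative assumptions, not the dataset's exact schema:

```python
import pandas as pd

# Hypothetical transaction frame (column names assumed)
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-07-05", "2023-12-18"]),
    "revenue": [5320.0, 8140.0],
    "boxes_shipped": [180, 220],
    "country": ["Australia", "UK"],
})

# Time-based features: seasonality, weekly patterns, year-over-year trend
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["quarter"] = df["date"].dt.quarter
df["day_of_week"] = df["date"].dt.dayofweek
df["week_of_year"] = df["date"].dt.isocalendar().week.astype(int)

# Derived metric: price per box exposes pricing effects on demand
df["price_per_box"] = df["revenue"] / df["boxes_shipped"]

# Categorical encoding: integer label codes are sufficient for tree models
df["country_code"] = df["country"].astype("category").cat.codes
```

Label codes (rather than one-hot vectors) are a deliberate choice here: tree-based models split on thresholds, so compact integer codes work without exploding the feature space.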

Step 3: Predictive Modeling

I tested two approaches to predict transaction-level revenue:

Random Forest Regression

  • Ensemble of decision trees, robust to non-linear relationships
  • Handles mixed feature types (categorical + numerical) naturally
  • Result: R² = 0.61, RMSE of ~$2,808

Gradient Boosting Regression

  • Sequential error-correction approach
  • Often stronger on tabular data
  • Result: R² = 0.10, RMSE of ~$3,886

The Random Forest significantly outperformed here — which itself is an insight. The demand patterns in this data are non-linear and interaction-heavy (country × product × time combinations matter more than any single feature), which is exactly where tree-based ensembles excel.

Cross-validation confirmed stability: RF achieved 0.61 ± 0.07 across folds, meaning the model generalizes rather than overfitting to the training set.
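A minimal sketch of this comparison with scikit-learn, using a synthetic stand-in for the engineered feature matrix (the real pipeline would feed in the Step 2 features):

```python
from sklearn.datasets import make_regression  # stand-in for the real feature matrix
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the engineered features
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=42)

models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

# 5-fold cross-validated R²: the mean ± std stability check described above
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: R² = {scores.mean():.2f} ± {scores.std():.2f}")
```

Note that on this synthetic (mostly linear) data the ranking may differ from the chocolate dataset's; which ensemble wins depends on how interaction-heavy the real signal is.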

Step 4: What Drives Demand?

Feature importance analysis revealed the hierarchy of demand drivers:

  • Boxes Shipped (28.1%) — Volume is the strongest revenue predictor (unsurprising, but validates data quality)
  • Sales Person (20.8%) — Who sells matters: channel and relationship effects are real
  • Product (19.4%) — Product mix drives revenue variance
  • Country (12.5%) — Geographic market differences are significant
  • Month (9.8%) — Seasonality matters, but less than structural factors
  • Day of Week (5.7%) — Weekly patterns exist but are secondary
  • Quarter (2.1%) — Captured mostly by month already
  • Year (1.6%) — Minimal trend effect in this timeframe

The insight for supply chain teams: Structural factors (who, what, where) explain ~80% of demand variance. Temporal factors (when) explain ~20%. This means your master data quality and segmentation strategy matter more than your time-series model sophistication — a finding I've seen consistently in enterprise environments.
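Extracting this hierarchy from a fitted forest is essentially a one-liner on `feature_importances_`. The sketch below runs on synthetic data with the same feature names, purely to show the mechanics:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in; the real pipeline uses the engineered features
feature_names = ["boxes_shipped", "sales_person", "product", "country",
                 "month", "day_of_week", "quarter", "year"]
X, y = make_regression(n_samples=300, n_features=len(feature_names), random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, ranked high to low (they sum to 1.0)
importances = (
    pd.Series(rf.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

One caveat worth knowing: impurity-based importances can inflate high-cardinality features, so permutation importance on a held-out set is a useful cross-check in production.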

Step 5: Product Classification

Beyond forecasting revenue, I built a classifier to predict which product category a transaction belongs to, based on transaction characteristics.

  • Random Forest Classifier: 87% holdout accuracy (95.5% mean cross-validation accuracy)
  • Best performance on Specialty and Dark Chocolate categories
  • Weaker on Baking & Beverages (smallest category, fewer training examples)

Why this matters operationally: Accurate product classification enables automated inventory allocation, demand sensing by category, and dynamic assortment planning — all critical for multi-product distribution networks.
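A sketch of the classification setup, again on synthetic stand-in data (five classes mirroring the five product categories):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic transaction features with 5 category labels (stand-in data)
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=5, random_state=1)

# Stratified split keeps small categories represented in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)
holdout_acc = clf.score(X_test, y_test)
cv_acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"holdout accuracy: {holdout_acc:.2f}, cross-validated: {cv_acc:.2f}")
```

The stratified split matters here precisely because of the Baking & Beverages problem above: without it, the smallest category can end up underrepresented in training and drag down its per-class recall further.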

Business Impact: What This Means in Dollars

Technical metrics like R² and RMSE matter to data scientists. Supply chain leaders need to know: what does better forecasting actually save?

Here's how the findings from this pipeline translate to real operational impact:

Forecast Accuracy → Inventory Reduction

Industry research consistently shows that a 1% improvement in forecast accuracy reduces inventory by 1–2% (Gartner, McKinsey). Our Random Forest model achieved R² = 0.61, well above a naive mean-prediction baseline (R² = 0) — and in a real deployment, even modest accuracy gains compound across thousands of SKUs.

For a distributor doing $20M in annual revenue (the scale of this dataset):

  • 10% improvement in forecast accuracy → ~$200K–400K in freed working capital from inventory reduction
  • Reduced safety stock buffers → lower warehousing costs, less product expiry, fewer markdowns
  • Better allocation across 6 markets → fewer stockouts in high-demand regions, less excess in low-demand ones
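The arithmetic behind that working-capital range is worth making explicit. The inventory figure below is an assumption: stock worth roughly 10% of annual revenue, a common rough ratio for distributors.

```python
# Back-of-envelope translation of the "1% accuracy → 1–2% inventory" rule
annual_revenue = 20_000_000                  # dataset-scale distributor
inventory_value = 0.10 * annual_revenue      # ASSUMED inventory-to-revenue ratio
accuracy_gain_pct = 10                       # forecast accuracy improvement, in %

low = inventory_value * accuracy_gain_pct * 0.01   # 1% inventory per 1% accuracy
high = inventory_value * accuracy_gain_pct * 0.02  # 2% inventory per 1% accuracy
print(f"freed working capital: ${low:,.0f} – ${high:,.0f}")
# → freed working capital: $200,000 – $400,000
```

Change the inventory ratio and the range shifts proportionally, which is why the estimate is given as a band rather than a point.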

Structural Insights → Smarter Segmentation

The feature importance analysis revealed that who sells, what product, and where matter 4x more than when. This has direct implications:

  • Salesperson explains 20.8% of variance → align top performers to high-value accounts; train or reassign underperformers. Estimated impact: 5–15% revenue lift from channel optimization
  • Product mix drives 19.4% of variance → tailor assortment by market instead of uniform distribution. Estimated impact: 10–20% reduction in slow-moving inventory
  • Geography accounts for 12.5% of variance → differentiate safety stock and reorder points by country. Estimated impact: 15–25% reduction in regional stockouts
  • Seasonality is only 9.8% → stop over-investing in seasonal models; fix your master data first. Estimated impact: redirected analytics effort toward higher-ROI problems

Product Classification → Operational Automation

The 87% classification accuracy enables:

  • Automated routing of new products to the correct inventory pool — eliminating manual categorization delays
  • Dynamic demand sensing by category — detecting shifts from Premium to Classic (or vice versa) in near real-time
  • Assortment planning — data-driven decisions about which categories to expand or contract in each market

The Compounding Effect

These aren't isolated improvements. Better forecasting reduces inventory, which frees cash, which funds better data infrastructure, which improves the next model. In my experience across enterprise supply chains, teams that invest in this pipeline see cumulative cost reductions of 8–15% of total supply chain spend within 12–18 months.

The key is not the model — it's the systematic approach: understand your signals, engineer the right features, validate rigorously, and let the data tell you where to invest next.

Evolving This Pipeline

This demo covers the core workflow with internal data. In a production environment, the next evolution steps would be:

External signal enrichment:

  • Weather data by region (chocolate demand is temperature-sensitive)
  • Holiday and event calendars by country
  • Economic indicators (consumer spending indices, inflation data)
  • Promotional calendars and pricing changes

Advanced temporal modeling:

  • Prophet or NeuralProphet for trend + seasonality decomposition
  • LSTM or Temporal Fusion Transformer for sequence learning
  • Hierarchical reconciliation across country → category → SKU levels
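Of these, bottom-up hierarchical reconciliation is the simplest to sketch: forecast at the lowest level, then aggregate upward, so every level of the hierarchy sums consistently. The numbers below are purely illustrative:

```python
import pandas as pd

# Hypothetical SKU-level forecasts (values illustrative)
sku_forecast = pd.DataFrame({
    "country": ["Australia", "Australia", "UK", "UK"],
    "category": ["Premium", "Classic", "Premium", "Classic"],
    "forecast": [1200.0, 800.0, 950.0, 600.0],
})

# Bottom-up reconciliation: higher levels are sums of their children,
# so country, category, and total forecasts can never disagree
category_level = sku_forecast.groupby(["country", "category"])["forecast"].sum()
country_level = sku_forecast.groupby("country")["forecast"].sum()
total = sku_forecast["forecast"].sum()

# Coherence check: every level aggregates exactly to the one above it
assert country_level.sum() == total
print(country_level)
```

Bottom-up is only one reconciliation strategy; top-down and optimal (MinT-style) reconciliation trade bias against variance differently, but all share this coherence property.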

Operational hardening:

  • Probabilistic forecasting (prediction intervals, not just point estimates)
  • Automated retraining pipelines with drift detection
  • A/B testing framework for forecast method selection
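Probabilistic forecasting, for example, falls out of a Random Forest almost for free: the spread of per-tree predictions gives a crude prediction interval around the point estimate. A sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the engineered feature matrix
X, y = make_regression(n_samples=300, n_features=6, noise=20.0, random_state=7)
rf = RandomForestRegressor(n_estimators=200, random_state=7).fit(X, y)

# Each tree gives one prediction; the spread across trees yields a
# rough 80% band around the ensemble's point forecast
per_tree = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])
lower = np.percentile(per_tree, 10, axis=0)
upper = np.percentile(per_tree, 90, axis=0)
point = rf.predict(X[:5])

for p, lo, hi in zip(point, lower, upper):
    print(f"forecast {p:8.1f}  (80% band: {lo:8.1f} – {hi:8.1f})")
```

These tree-spread intervals are a heuristic, not calibrated quantiles; for production use, quantile regression or conformal prediction gives bands with actual coverage guarantees.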

Each of these layers builds on the foundation demonstrated here. The pipeline structure stays the same — the signals and models get richer.

Key Takeaways

  1. Feature engineering > model complexity. A well-featured Random Forest beat a more sophisticated Gradient Boosting model because the features captured the right signals.

  2. Structural factors dominate. Product, geography, and channel explain 4x more variance than time-based features. Invest in master data quality before investing in fancier algorithms.

  3. Know your ceiling. An R² of 0.61 with only internal data tells you where the model's limits are — and points you toward external signals as the next source of lift.

  4. Start simple, validate, then add complexity. This workflow — EDA → feature engineering → baseline model → enrichment — works at any scale. The dataset size changes; the methodology doesn't.


Built with Python (pandas, scikit-learn, matplotlib) using a publicly available dataset. The methodology reflects approaches I apply professionally in enterprise supply chain environments.

Esther Ho · AI x Supply Chain