Note: This case study uses a public dataset (global chocolate sales, 2022–2024) to walk through the full demand forecasting workflow. The techniques and thinking apply directly to enterprise-scale supply chain forecasting — the kind of work I do professionally across global logistics networks.
The Problem
Every supply chain runs on a forecast. Get it right and you optimize inventory, reduce waste, and keep customers happy. Get it wrong and you're either sitting on excess stock or scrambling to fill shortages.
Most teams rely on historical sales data and a time-series model. That's a reasonable starting point — but it raises a critical question: what's actually driving your demand, and how much of it can a model learn?
This project dissects a demand forecasting pipeline end-to-end — from raw data to predictive model to actionable insight — to answer that question.
The Dataset
To demonstrate the workflow without proprietary data, I used a publicly available chocolate sales dataset covering:
- 3,283 transactions across 6 countries (Australia, India, UK, USA, Canada, New Zealand)
- 22 product SKUs across 5 categories (Specialty, Classic, Dark Chocolate, Premium, Baking & Beverages)
- Time span: July 2022 – December 2024
- Features: Sales person, country, product, date, revenue, boxes shipped
It's a simplified version of a real distribution network — enough to demonstrate the methodology without the complexity (or confidentiality) of enterprise data.
The Approach
Step 1: Understand the Signal Landscape
Before touching any model, I mapped the data to understand what signals exist:
- Temporal patterns — Monthly revenue trends revealed clear seasonality, with Q4 and January showing distinct peaks
- Geographic variance — Performance varied dramatically by country, with Australia leading at ~18% of total revenue despite not being the largest market by transaction count
- Product mix dynamics — Specialty products dominated (39% of transactions) but Premium showed higher per-unit value
- Sales channel effects — Individual salesperson performance varied by 3–5x, suggesting route-to-market is itself a demand signal
This step is critical in real engagements. Most teams jump straight to modeling. But understanding your signal landscape tells you what the model can learn from — and more importantly, what's missing.
Step 2: Feature Engineering
Raw data rarely predicts well. The transformation layer is where domain expertise matters most.
Time-based features extracted:
- Year, month, quarter, day of week, week of year
- These capture seasonality, weekly buying patterns, and year-over-year trends
Derived metrics:
- Price per box (revenue / boxes shipped) — reveals pricing strategy effects on demand
- Product category groupings — reduces 22 SKUs to 5 meaningful segments
Encoded categorical signals:
- Country, product, salesperson — each encoded to capture their individual demand influence
Note that this dataset contains only internal signals. In a production pipeline, this is where you'd also layer in external signals — weather data, economic indicators, event calendars, promotional schedules. That enrichment step is often where the biggest accuracy gains come from; I return to it in the Evolving This Pipeline section below.
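The transformation layer described above can be sketched in a few lines of pandas. The column names and sample rows here are illustrative stand-ins — the real dataset's headers may differ:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real transaction data
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-07-04", "2022-12-19", "2023-01-09"]),
    "Amount": [5320.0, 7896.0, 4501.0],
    "Boxes Shipped": [180, 94, 91],
    "Country": ["Australia", "India", "UK"],
    "Product": ["Product A", "Product B", "Product C"],
    "Sales Person": ["Rep 1", "Rep 2", "Rep 3"],
})

# Time-based features: seasonality, weekly patterns, year-over-year trend
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Quarter"] = df["Date"].dt.quarter
df["DayOfWeek"] = df["Date"].dt.dayofweek
df["WeekOfYear"] = df["Date"].dt.isocalendar().week.astype(int)

# Derived metric: price per box (revenue / boxes shipped)
df["PricePerBox"] = df["Amount"] / df["Boxes Shipped"]

# Simple label encoding of categorical signals for tree-based models
for col in ["Country", "Product", "Sales Person"]:
    df[col + "_enc"] = df[col].astype("category").cat.codes
```

Label encoding is a reasonable default for tree ensembles, which split on thresholds rather than assuming ordinal meaning; a linear model would need one-hot encoding instead.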
Step 3: Predictive Modeling
I tested two approaches to predict transaction-level revenue:
Random Forest Regression
- Ensemble of decision trees, robust to non-linear relationships
- Handles mixed feature types (categorical + numerical) naturally
- Result: R² = 0.61, RMSE of ~$2,808
Gradient Boosting Regression
- Sequential error-correction approach
- Often stronger on tabular data
- Result: R² = 0.10, RMSE of ~$3,886
The Random Forest significantly outperformed here — which itself is an insight. The demand patterns in this data are non-linear and interaction-heavy (country × product × time combinations matter more than any single feature), which is exactly where tree-based ensembles excel.
Cross-validation confirmed stability: RF achieved R² = 0.61 ± 0.07 across folds, meaning the model generalizes rather than overfits to the training set.
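The model comparison and fold-stability check follow a standard scikit-learn pattern. This sketch uses synthetic data from `make_regression` as a stand-in for the engineered transaction features, and the hyperparameters shown are illustrative, not the ones used above:

```python
from sklearn.datasets import make_regression  # stand-in for real transactions
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic placeholder for the engineered features (time, price, encodings)
X, y = make_regression(n_samples=3000, n_features=8, n_informative=6,
                       noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

for name, model in [
    ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=42)),
    ("Gradient Boosting", GradientBoostingRegressor(random_state=42)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    # 5-fold CV on the training set checks that R² is stable across folds,
    # i.e. the model generalizes instead of memorizing one split
    cv = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
    print(f"{name}: R2={r2_score(y_test, pred):.2f}  RMSE={rmse:.0f}  "
          f"CV R2={cv.mean():.2f} +/- {cv.std():.2f}")
```

Holding `random_state` fixed in both models and the split keeps the comparison reproducible, so score differences reflect the algorithms rather than sampling luck.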
Step 4: What Drives Demand?
Feature importance analysis revealed the hierarchy of demand drivers:
| Feature | Importance | What It Means |
|---|---|---|
| Boxes Shipped | 28.1% | Volume is the strongest revenue predictor (unsurprising but validates data quality) |
| Sales Person | 20.8% | Who sells matters — channel and relationship effects are real |
| Product | 19.4% | Product mix drives revenue variance |
| Country | 12.5% | Geographic market differences are significant |
| Month | 9.8% | Seasonality matters, but less than structural factors |
| Day of Week | 5.7% | Weekly patterns exist but are secondary |
| Quarter | 2.1% | Captured mostly by month already |
| Year | 1.6% | Minimal trend effect in this timeframe |
The insight for supply chain teams: Structural factors (who, what, where) explain ~80% of demand variance. Temporal factors (when) explain ~20%. This means your master data quality and segmentation strategy matter more than your time-series model sophistication — a finding I've seen consistently in enterprise environments.
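A ranking like the table above can be read straight off a fitted Random Forest via its `feature_importances_` attribute. This sketch uses placeholder data with the same feature names; the values it prints are illustrative, not the ones reported above:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Placeholder features; the real pipeline uses the engineered
# chocolate-sales columns named below
feature_names = ["Boxes Shipped", "Sales Person", "Product", "Country",
                 "Month", "Day of Week", "Quarter", "Year"]
X, y = make_regression(n_samples=1000, n_features=len(feature_names),
                       n_informative=6, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1.0,
# so each value reads directly as a share of explained variance
importances = (
    pd.Series(rf.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

One caveat worth knowing: impurity-based importances can overstate high-cardinality features (like salesperson or product), so permutation importance is a useful cross-check in production.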
Step 5: Product Classification
Beyond forecasting revenue, I built a classifier to predict which product category a transaction belongs to, based on transaction characteristics.
- Random Forest Classifier: 87% accuracy on the held-out test set, with a 95.5% mean score under cross-validation
- Best performance on Specialty and Dark Chocolate categories
- Weaker on Baking & Beverages (smallest category, fewer training examples)
Why this matters operationally: Accurate product classification enables automated inventory allocation, demand sensing by category, and dynamic assortment planning — all critical for multi-product distribution networks.
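The classifier setup mirrors the regression pipeline. This sketch uses synthetic 5-class data standing in for the real transaction features, with a per-class report of the kind that surfaces weak categories like Baking & Beverages:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic 5-class stand-in for the 5 product categories
X, y = make_classification(n_samples=2000, n_features=8, n_informative=6,
                           n_classes=5, random_state=7)
# Stratified split preserves each category's share in train and test,
# which matters when the smallest category has few examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7)

clf = RandomForestClassifier(n_estimators=200, random_state=7)
clf.fit(X_train, y_train)

print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
# Per-class precision/recall reveals which categories the model
# struggles with -- typically the ones with the fewest samples
print(classification_report(y_test, clf.predict(X_test)))
cv = cross_val_score(clf, X, y, cv=5)
print(f"CV accuracy: {cv.mean():.3f} +/- {cv.std():.3f}")
```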
Business Impact: What This Means in Dollars
Technical metrics like R² and RMSE matter to data scientists. Supply chain leaders need to know: what does better forecasting actually save?
Here's how the findings from this pipeline translate to real operational impact:
Forecast Accuracy → Inventory Reduction
Industry research consistently shows that a 1% improvement in forecast accuracy reduces inventory by 1–2% (Gartner, McKinsey). The Random Forest model here reached R² = 0.61, explaining 61% of revenue variance that a naive mean prediction would miss — and in a real deployment, even modest accuracy gains compound across thousands of SKUs.
For a distributor doing $20M in annual revenue (the scale of this dataset):
- 10% improvement in forecast accuracy → ~$200K–400K in freed working capital from inventory reduction
- Reduced safety stock buffers → lower warehousing costs, less product expiry, fewer markdowns
- Better allocation across 6 markets → fewer stockouts in high-demand regions, less excess in low-demand ones
Structural Insights → Smarter Segmentation
The feature importance analysis revealed that who sells, what product, and where matter 4x more than when. This has direct implications:
| Insight | Operational Action | Estimated Impact |
|---|---|---|
| Salesperson explains 20.8% of variance | Align top performers to high-value accounts; train or reassign underperformers | 5–15% revenue lift from channel optimization |
| Product mix drives 19.4% of variance | Tailor assortment by market instead of uniform distribution | 10–20% reduction in slow-moving inventory |
| Geography accounts for 12.5% of variance | Differentiate safety stock and reorder points by country | 15–25% reduction in regional stockouts |
| Seasonality is only 9.8% | Stop over-investing in seasonal models; fix your master data first | Redirected analytics effort toward higher-ROI problems |
Product Classification → Operational Automation
The 87% classification accuracy enables:
- Automated routing of new products to the correct inventory pool — eliminating manual categorization delays
- Dynamic demand sensing by category — detecting shifts from Premium to Classic (or vice versa) in near real-time
- Assortment planning — data-driven decisions about which categories to expand or contract in each market
The Compounding Effect
These aren't isolated improvements. Better forecasting reduces inventory, which frees cash, which funds better data infrastructure, which improves the next model. In my experience across enterprise supply chains, teams that invest in this pipeline see cumulative cost reductions of 8–15% of total supply chain spend within 12–18 months.
The key is not the model — it's the systematic approach: understand your signals, engineer the right features, validate rigorously, and let the data tell you where to invest next.
Evolving This Pipeline
This demo covers the core workflow with internal data. In a production environment, the next evolution steps would be:
External signal enrichment:
- Weather data by region (chocolate demand is temperature-sensitive)
- Holiday and event calendars by country
- Economic indicators (consumer spending indices, inflation data)
- Promotional calendars and pricing changes
Advanced temporal modeling:
- Prophet or NeuralProphet for trend + seasonality decomposition
- LSTM or Temporal Fusion Transformer for sequence learning
- Hierarchical reconciliation across country → category → SKU levels
Operational hardening:
- Probabilistic forecasting (prediction intervals, not just point estimates)
- Automated retraining pipelines with drift detection
- A/B testing framework for forecast method selection
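As a concrete starting point for the probabilistic forecasting item: the spread of per-tree predictions in an already-trained Random Forest yields rough prediction intervals with no new model. This is a heuristic sketch on placeholder data — quantile regression or conformal methods are the more rigorous production route:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Placeholder data standing in for the engineered transaction features
X, y = make_regression(n_samples=1500, n_features=8, n_informative=6,
                       noise=30.0, random_state=1)
rf = RandomForestRegressor(n_estimators=300, random_state=1).fit(X, y)

# Each tree votes with its own prediction; the spread across trees
# approximates forecast uncertainty for each transaction
per_tree = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])
lo, hi = np.percentile(per_tree, [10, 90], axis=0)  # rough 80% interval
point = per_tree.mean(axis=0)
for p, l, h in zip(point, lo, hi):
    print(f"forecast {p:8.1f}  (80% interval {l:8.1f} .. {h:8.1f})")
```

Tree-spread intervals tend to understate true uncertainty (trees share training data), but they are a cheap first step from point estimates toward the safety-stock math that actually consumes forecasts.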
Each of these layers builds on the foundation demonstrated here. The pipeline structure stays the same — the signals and models get richer.
Key Takeaways
- Feature engineering > model complexity. A well-featured Random Forest beat a more sophisticated Gradient Boosting model because the features captured the right signals.
- Structural factors dominate. Product, geography, and channel explain 4x more variance than time-based features. Invest in master data quality before investing in fancier algorithms.
- Know your ceiling. An R² of 0.61 with only internal data tells you where the model's limits are — and points you toward external signals as the next source of lift.
- Start simple, validate, then add complexity. This workflow — EDA → feature engineering → baseline model → enrichment — works at any scale. The dataset size changes; the methodology doesn't.
Built with Python (pandas, scikit-learn, matplotlib) using a publicly available dataset. The methodology reflects approaches I apply professionally in enterprise supply chain environments.