Predicting which orders will breach their Promised Delivery Time — before they do.
A supply-chain analytics project that mines a hypothetical e-commerce dataset to identify, explain, and predict Promised Delivery Time (PDT) breaches — the orders that arrive late and quietly erode customer trust.
- Who? Anyone working on logistics analytics, fulfilment ops, or churn-via-experience.
- What? EDA, feature engineering, and a model bake-off (Neural Networks, Random Forest, SVM, plus baselines) to predict delivery delays.
- Where? Built on the public DataCo Supply Chain dataset.
- When? 2024.
- Why? A late delivery costs more than the delivery — it costs the next order. Knowing which orders are at risk lets ops intervene before the breach.
Every fulfilment team lives the same loop: orders flow in, promises go out, and somewhere between warehouse and doorstep, some of those promises break. The interesting question isn't how many; it's which ones, and why.
This project walks that loop end-to-end:
- EDA uncovers where delays cluster — by region, product category, shipping mode, day-of-week.
- Feature engineering turns raw fields into signals: time-of-order, route distance proxies, category encodings, customer history.
- Model bake-off pits a Neural Network against Random Forest and SVM (with logistic regression and gradient boosting as sanity-check baselines).
- Evaluation compares them on recall, F1, and ROC-AUC, because in this domain false negatives (missed breaches) are the expensive kind of wrong.
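
As a rough illustration of the bake-off and evaluation steps, here is a minimal sketch on synthetic data. It is not the notebook's code. The column names (`order_hour`, `distance_proxy`, `ship_mode`, `breached`) and the rule that generates the label are assumptions made up for this example, and the Keras network is left out to keep the dependencies to pandas, numpy, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in for engineered features (hypothetical names, NOT DataCo columns).
orders = pd.DataFrame({
    "order_hour": rng.integers(0, 24, n),
    "distance_proxy": rng.gamma(2.0, 50.0, n),
    "ship_mode": rng.choice(["standard", "second", "first"], n),
})

# Made-up label rule: longer routes and standard shipping breach more often.
logit = 0.01 * (orders["distance_proxy"] - 100) + 0.8 * (orders["ship_mode"] == "standard") - 0.3
orders["breached"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# One-hot encode the categorical field; numeric fields pass through unchanged.
X = pd.get_dummies(orders.drop(columns="breached"), columns=["ship_mode"], dtype=float)
y = orders["breached"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(random_state=0)),
}


def ranking_scores(model, features):
    """Return a continuous score for ROC-AUC: probabilities if available, else margins."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(features)[:, 1]
    return model.decision_function(features)


results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "recall": recall_score(y_test, y_pred),  # share of true breaches caught
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, ranking_scores(model, X_test)),
    }

print(pd.DataFrame(results).T.round(3))
```

Recall is listed first on purpose: a model with a slightly lower F1 may still be the better choice if it misses fewer breaches.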
The recommendations that fall out — inventory positioning, ship-mode reassignment, region-level SLAs — are where the analytics turn into action.
Open Notebooks/main.ipynb for the full visual story: heatmaps, region breakdowns, model comparisons, and feature importances.
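
Region breakdowns and heatmaps of this kind usually rest on a grouped breach rate. The sketch below shows one way to compute it with pandas on a six-row toy table. The column names (`region`, `ship_mode`, `breached`) are placeholders chosen for the example, not the dataset's actual headers, and the notebook may compute its figures differently.

```python
import pandas as pd

# Toy orders table; "breached" is 1 when the order missed its promised delivery time.
orders = pd.DataFrame({
    "region": ["West", "West", "East", "East", "East", "South"],
    "ship_mode": ["standard", "first", "standard", "standard", "first", "standard"],
    "breached": [1, 0, 1, 1, 0, 0],
})

# Breach rate per region, alongside the order count so small groups are visible.
by_region = (
    orders.groupby("region")["breached"]
    .agg(n_orders="size", breach_rate="mean")
    .sort_values("breach_rate", ascending=False)
)
print(by_region)

# Region x shipping-mode table of breach rates, the shape a heatmap expects.
# Combinations with no orders come out as NaN.
heatmap_table = orders.pivot_table(
    index="region", columns="ship_mode", values="breached", aggfunc="mean"
)
print(heatmap_table)
```

Passing `heatmap_table` to `seaborn.heatmap` would render it as a heatmap; keeping `n_orders` next to the rate helps avoid over-reading regions with only a handful of orders.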
| Layer | Tools |
|---|---|
| Language | Python 3 |
| Notebook | Jupyter |
| Data | pandas, numpy |
| Viz | matplotlib, seaborn, plotly |
| ML | scikit-learn (RF, SVM, LR, GBM), TensorFlow / Keras (NN) |
| Stats | statsmodels |
PDT-Breach-Risk-Mitigation/
├── Dataset/
│ └── DataCoSupplyChainDataset.csv
├── Notebooks/
│ └── main.ipynb # EDA + feature engineering + models
├── requirements.txt
├── LICENSE
└── README.md
git clone https://github.com/GyaneshSamanta/PDT-Breach-Risk-Mitigation.git
cd PDT-Breach-Risk-Mitigation
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook Notebooks/main.ipynb

PRs welcome that:
- Swap the model bake-off for gradient-boosted trees with proper cross-validation.
- Add SHAP explanations for individual breach predictions.
- Wrap the model in a tiny FastAPI endpoint for ops integration.
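
For contributors picking up the first item, a possible starting point is sketched below: gradient-boosted trees scored with stratified k-fold cross-validation on the same three metrics used in the evaluation. The data comes from scikit-learn's `make_classification`, so the features and class balance are stand-ins rather than anything derived from the DataCo file, and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary problem standing in for breach / no-breach (assumed 60/40 split).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, weights=[0.6, 0.4], random_state=0
)

# Stratified folds keep the breach share roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, random_state=0)

metrics = ["recall", "f1", "roc_auc"]
cv_scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Report mean and spread across folds rather than a single split's score.
summary = {
    metric: (cv_scores[f"test_{metric}"].mean(), cv_scores[f"test_{metric}"].std())
    for metric in metrics
}
for metric, (mean, std) in summary.items():
    print(f"{metric}: {mean:.3f} +/- {std:.3f}")
```

If the real orders are time-ordered, a time-based split may be a better fit than shuffled folds, since shuffling can leak future information into training.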
See LICENSE. The DataCo dataset is third-party — check its original terms before redistribution.
- Gyanesh Samanta — analysis, modelling, write-up.
- DataCo — for the public supply-chain dataset that anchors the analysis.