A machine learning solution to predict and prevent customer churn in the telecommunications industry
SmartChurn is an end-to-end machine learning project that predicts customer churn in the telecom sector using an XGBoost classifier and SHAP-based explanations. The project covers the complete ML pipeline, from data preprocessing to actionable business insights, helping telecom companies identify at-risk customers and implement targeted retention strategies.
- 🎯 High Accuracy: Achieves ~84% accuracy with balanced precision and recall
- 🔍 Explainable AI: Uses SHAP values for model interpretability
- 📊 Business-Ready: Extracts actionable insights for retention strategies
- 🚀 Production-Ready: Deployed web app for real-time predictions
- 📈 Comprehensive Analysis: Full EDA and feature engineering pipeline
Try the live application: SmartChurn Web App
- Source: Telco Customer Churn Dataset (Kaggle)
- Size: 7,043 customer records
- Features: 21 attributes, including:
  - Customer demographics (gender, senior citizen, dependents)
  - Account information (tenure, contract type, payment method)
  - Service usage (phone service, internet service, streaming services)
  - Billing information (monthly charges, total charges)
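
A minimal loading sketch, assuming the CSV sits at `data/telco.csv` and uses the Kaggle column names (`tenure`, `TotalCharges`, `Churn`, and so on). In the Kaggle file, `TotalCharges` is stored as text and is blank for brand-new customers; filling those blanks with 0 is an assumption here, not necessarily what the notebook does. The `load_telco` helper and the toy sample are hypothetical.

```python
import io

import pandas as pd


def load_telco(source) -> pd.DataFrame:
    """Load the Telco churn CSV (a path or file-like object) and apply basic cleaning."""
    df = pd.read_csv(source)
    # TotalCharges is text in the raw file; blanks (e.g. " ") become NaN here.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Assumption: customers with no billing history yet have spent nothing so far.
    df["TotalCharges"] = df["TotalCharges"].fillna(0.0)
    # Encode the target: 1 = churned ("Yes"), 0 = stayed ("No").
    df["Churn"] = (df["Churn"] == "Yes").astype(int)
    return df


# Toy two-row sample with a subset of the real column names (illustrative values only).
sample = io.StringIO(
    "customerID,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "0001-A,1,29.85,29.85,No\n"
    "0002-B,0,52.55, ,Yes\n"
)
sample_df = load_telco(sample)
print(sample_df[["tenure", "TotalCharges", "Churn"]])
```

On the real file the call would be `df = load_telco("data/telco.csv")`.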

Project workflow: Data Collection → EDA → Data Preprocessing → Feature Engineering → Model Training → Model Evaluation and SHAP Analysis (in parallel) → Business Insights & Deployment

Prerequisites:
- Python 3.8 or higher
- pip package manager
- Jupyter Notebook (optional, for exploring the analysis)

Installation:
- Clone the repository: `git clone https://github.com/vansh-09/SmartChurn.git`, then `cd SmartChurn`
- Create a virtual environment (recommended): `python -m venv venv`, then activate it with `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install dependencies: `pip install -r requirements.txt`

Usage:
- Explore the analysis: `jupyter notebook notebooks/SmartChurn.ipynb`
- Launch the web app: `streamlit run app.py`

Model performance:

| Metric | Score |
|---|---|
| Accuracy | 84% |
| Precision | 83% |
| Recall | 87% |
| F1-Score | 85% |
| ROC-AUC | 0.88 |
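
For reference, the metrics in the table relate to confusion-matrix counts as sketched below, with churn as the positive class. The helper names are hypothetical; the last line checks that the reported 83% precision and 87% recall are consistent with the reported F1 of about 85%.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Headline metrics from confusion-matrix counts (assumes non-zero denominators)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of customers flagged as churners, share who really churned
    recall = tp / (tp + fn)     # of customers who really churned, share the model flagged
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": harmonic_f1(precision, recall),
    }


print(round(harmonic_f1(0.83, 0.87), 2))  # 0.85
```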

Why XGBoost:
- ✅ Excellent performance on tabular data
- ✅ Supports class weighting (`scale_pos_weight`) for imbalanced data
- ✅ Built-in feature importance
- ✅ Built-in L1/L2 regularization to limit overfitting
- ✅ Fast training and prediction
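
The README does not say how class imbalance was handled. One common option with XGBoost is the `scale_pos_weight` parameter, for which the XGBoost documentation suggests the ratio of negative to positive examples as a starting value. A sketch of that calculation, assuming 0/1 churn labels (the helper name is hypothetical):

```python
from collections import Counter
from typing import Iterable


def suggested_scale_pos_weight(labels: Iterable[int]) -> float:
    """Return (# non-churners) / (# churners) for 0/1 labels."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("need at least one positive (churn) label")
    return counts[0] / counts[1]


# Toy labels: 3 customers stayed, 1 churned, so churners are up-weighted 3x.
print(suggested_scale_pos_weight([0, 0, 0, 1]))
```

The value could then be passed as `xgboost.XGBClassifier(scale_pos_weight=...)`.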

Exploratory data analysis (EDA):
- Comprehensive visualization of customer behavior patterns
- Correlation analysis between features
- Churn distribution across different customer segments
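
A sketch of the segment-level churn breakdown, assuming a pandas DataFrame whose `Churn` column is already encoded as 1/0. The helper name and the toy rows are illustrative, not project data.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of churned customers in each category of `column`, highest first."""
    return df.groupby(column)["Churn"].mean().sort_values(ascending=False)


# Toy frame for illustration only.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "Churn": [1, 0, 0, 0],
})
rates = churn_rate_by(toy, "Contract")
print(rates)
```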

Data preprocessing and feature engineering:
- Handling missing values and outliers
- Encoding categorical variables
- Feature scaling and normalization
- Creating derived features for better predictions
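
A minimal preprocessing sketch with pandas, assuming `TotalCharges` is already numeric and `Churn` is 0/1 (the target passes through unchanged). `AvgChargesPerMonth` is a hypothetical example of a derived feature, not necessarily one the project uses. Scaling is left out because tree ensembles such as XGBoost split on thresholds and are unaffected by monotonic rescaling; it matters mainly for scale-sensitive comparison models.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the ID column, add a derived feature, and one-hot encode text columns."""
    out = df.drop(columns=["customerID"], errors="ignore")
    # Hypothetical derived feature: average billed amount per month of tenure.
    # where() turns tenure 0 into NaN to avoid dividing by zero; those rows get 0.
    months = out["tenure"].where(out["tenure"] > 0)
    out["AvgChargesPerMonth"] = (out["TotalCharges"] / months).fillna(0.0)
    # Every non-numeric column is treated as categorical.
    categorical = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    # drop_first=True removes one redundant dummy column per categorical feature.
    return pd.get_dummies(out, columns=categorical, drop_first=True, dtype=int)


# Toy frame for illustration only.
toy = pd.DataFrame({
    "customerID": ["a", "b", "c"],
    "tenure": [0, 2, 10],
    "TotalCharges": [0.0, 100.0, 500.0],
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
    "Churn": [1, 0, 0],
})
features = preprocess(toy)
print(features.columns.tolist())
```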

Model training:
- XGBoost model training with hyperparameter tuning
- Cross-validation for robust performance estimation
- Model serialization for deployment
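
A sketch of tuning, cross-validation, and serialization with scikit-learn utilities. The README does not state the search space, scoring metric, or fold count, so the choices below (grid search, ROC-AUC scoring, 5 stratified folds, `pickle` output) are assumptions. The demo uses a scikit-learn decision tree on synthetic data only so the snippet runs without XGBoost installed.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def tune_and_save(estimator, param_grid, X, y, path, n_splits=5, seed=42):
    """Grid-search `param_grid` with stratified k-fold CV and pickle the best model to `path`."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=cv)
    # With the default refit=True, the best configuration is retrained on all of X, y.
    search.fit(X, y)
    with open(path, "wb") as fh:
        pickle.dump(search.best_estimator_, fh)
    return search.best_params_, search.best_score_


# Demo on synthetic data; in practice, swap in the real features and an XGBClassifier.
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)
model_path = os.path.join(tempfile.mkdtemp(), "smartchurn_model.pkl")
best_params, best_cv_auc = tune_and_save(
    DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4]}, X_demo, y_demo, model_path
)
print(best_params, round(best_cv_auc, 3))
```

With XGBoost installed, the same helper could be called as `tune_and_save(XGBClassifier(), {"max_depth": [3, 5], "learning_rate": [0.05, 0.1]}, X_train, y_train, "models/smartchurn_model.pkl")`; that grid is illustrative only.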

Model evaluation:
- Comprehensive classification metrics
- Confusion matrix analysis
- ROC curve and AUC score
- Feature importance analysis
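
A sketch of the evaluation step using scikit-learn's metric functions, assuming the model outputs a churn probability per customer (for example via `predict_proba(X)[:, 1]`). The 0.5 decision threshold, the helper name, and the toy inputs are illustrative.

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score


def evaluate(y_true, churn_proba, threshold=0.5):
    """Summarize a churn classifier from true 0/1 labels and predicted churn probabilities."""
    y_pred = [int(p >= threshold) for p in churn_proba]
    # labels=[0, 1] fixes the 2x2 layout even if a class is missing from y_pred.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "confusion": {"tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)},
        # ROC-AUC uses the raw probabilities, so it does not depend on the threshold.
        "roc_auc": float(roc_auc_score(y_true, churn_proba)),
        "report": classification_report(y_true, y_pred, digits=3, zero_division=0),
    }


# Toy labels and scores, for illustration only.
example = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example["confusion"], example["roc_auc"])
print(example["report"])
```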

Model interpretability (SHAP):
- Global feature importance
- Individual prediction explanations
- Force plots and waterfall charts
- Dependence plots for feature interactions
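
Global SHAP importance is commonly summarized as the mean absolute SHAP value of each feature across customers. The sketch below shows that aggregation with NumPy on a toy matrix; in a real run the matrix could come from `shap.TreeExplainer(model).shap_values(X)`, which for a binary XGBoost model is typically a samples × features array. The helper name and values are hypothetical.

```python
import numpy as np


def global_importance(shap_values, feature_names):
    """Rank features by mean |SHAP value| across all explained rows."""
    values = np.asarray(shap_values, dtype=float)  # shape: (n_samples, n_features)
    mean_abs = np.abs(values).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]  # largest average impact first
    return [(feature_names[i], float(mean_abs[i])) for i in order]


# Toy SHAP matrix: 2 customers x 3 features (illustrative values only).
toy_shap = [[0.3, -0.5, 0.0],
            [-0.1, 0.1, 0.05]]
ranking = global_importance(toy_shap, ["tenure", "MonthlyCharges", "TechSupport"])
print(ranking)
```

SHAP's own `shap.summary_plot(shap_values, X, plot_type="bar")` draws this same ranking as a bar chart.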

Business insights:
- Actionable recommendations for customer retention
- Risk segmentation of customer base
- Cost-benefit analysis of retention strategies
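
Risk segmentation can be as simple as bucketing predicted churn probabilities. The cut-offs (0.3 and 0.6), the helper names, and the customer IDs below are hypothetical placeholders; in practice the cut-offs would be tuned against retention budget and campaign capacity.

```python
def risk_tier(churn_proba: float, medium: float = 0.3, high: float = 0.6) -> str:
    """Map a predicted churn probability to a retention tier (cut-offs are illustrative)."""
    if not 0.0 <= churn_proba <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if churn_proba >= high:
        return "high"
    if churn_proba >= medium:
        return "medium"
    return "low"


def segment(customers: dict) -> dict:
    """Group customer IDs by tier, given {customer_id: churn_probability}."""
    tiers = {"high": [], "medium": [], "low": []}
    for customer_id, proba in customers.items():
        tiers[risk_tier(proba)].append(customer_id)
    return tiers


print(segment({"C-001": 0.82, "C-002": 0.41, "C-003": 0.07}))
```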
Based on the model analysis, here are the critical factors driving customer churn:
- 📅 Contract Type
  - Month-to-month customers have a 3x higher churn rate
  - Action: Offer incentives for annual contracts
- 🧾 Monthly Charges
  - High charges without bundled services increase churn by 40%
  - Action: Create value-added service bundles
- ❌ Lack of Support Services
  - Customers without tech support churn 2x more often
  - Action: Include complimentary support in premium plans
- 📄 Paperless Billing
  - Digital-first customers on paperless billing churn 25% more
  - Action: Engage through digital channels and apps
- 📊 Low Tenure
  - Customers with less than 6 months of tenure are the highest-risk group
  - Action: Implement strong onboarding and early engagement
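
Multipliers such as "3x higher churn rate" are ratios of churn rates between a segment and a comparison group. A stdlib sketch, assuming rows are dicts with a 0/1 `Churn` key and that the comparison group is everyone outside the segment; the README does not say which baseline was used, and the result depends on that choice. The helper name and toy rows are hypothetical.

```python
def relative_churn(rows, column, value):
    """Churn rate where row[column] == value, divided by the churn rate of all other rows."""
    rows = list(rows)
    inside = [r["Churn"] for r in rows if r[column] == value]
    outside = [r["Churn"] for r in rows if r[column] != value]
    if not inside or not outside:
        raise ValueError("both the segment and the comparison group must be non-empty")
    rate_in = sum(inside) / len(inside)
    rate_out = sum(outside) / len(outside)
    if rate_out == 0:
        raise ValueError("comparison group has no churners; the ratio is undefined")
    return rate_in / rate_out


# Toy rows: month-to-month churns 3 of 4, the other contract churns 1 of 4.
toy_rows = (
    [{"Contract": "Month-to-month", "Churn": c} for c in (1, 1, 1, 0)]
    + [{"Contract": "One year", "Churn": c} for c in (1, 0, 0, 0)]
)
print(relative_churn(toy_rows, "Contract", "Month-to-month"))
```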
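
Estimated retention impact, assuming:
- Average customer lifetime value: $2,000
- Cost of retention campaign: $100 per contacted customer
- Model flags 500 at-risk customers with 85% precision (500 × 0.85 = 425 true churners)
- Every correctly flagged churner is retained (an optimistic upper bound)
Potential savings: 500 × 0.85 × $2,000 − 500 × $100 = $850,000 − $50,000 = $800,000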
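
A small helper for the savings estimate, with illustrative names. The `cost_on_flagged` switch shows how the result depends on who incurs the $100 campaign cost: charging it to all 500 flagged customers gives $800,000, while charging it only to the 425 correctly flagged churners gives $807,500, i.e. 500 × 0.85 × ($2,000 − $100). Both figures assume every correctly flagged churner is retained, so they are upper bounds.

```python
def retention_savings(flagged: int, precision: float, clv: float,
                      cost_per_customer: float, cost_on_flagged: bool = True) -> float:
    """Optimistic savings estimate: every correctly flagged churner is assumed retained.

    cost_on_flagged=True charges the campaign cost to every flagged customer (true and
    false positives); False charges it only to the true churners.
    """
    true_churners = flagged * precision
    contacted = flagged if cost_on_flagged else true_churners
    return true_churners * clv - contacted * cost_per_customer


print(retention_savings(500, 0.85, 2_000, 100))                         # cost on all 500
print(retention_savings(500, 0.85, 2_000, 100, cost_on_flagged=False))  # cost on 425 only
```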
SmartChurn/
│
├── data/
│ └── telco.csv # Dataset
│
├── images/ # Visualizations and demos
│ └── demo.gif
│
├── models/
│ └── smartchurn_model.pkl # Trained model
│
├── notebooks/
│ └── SmartChurn.ipynb # Main analysis notebook
│
├── src/
│ ├── preprocessing.py # Data preprocessing utilities
│ ├── model.py # Model training and evaluation
│ └── visualization.py # Plotting functions
│
├── app.py # Streamlit web application
├── requirements.txt # Python dependencies
├── Analysis_Report.md # Detailed findings and insights
├── .gitignore
├── LICENSE
└── README.md
| Category | Technologies |
|---|---|
| Language | Python 3.8+ |
| Data Manipulation | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn, Plotly |
| Machine Learning | Scikit-learn, XGBoost |
| Model Interpretation | SHAP |
| Deployment | Streamlit |
| Model Persistence | Pickle |

Future enhancements:
- Implement a real-time prediction API with FastAPI
- Add deep learning models (Neural Networks) for comparison
- Integrate with CRM systems for automated alerts
- Build a dashboard for churn monitoring
- Add A/B testing framework for retention strategies
- Implement MLOps pipeline with model versioning
- Add customer segmentation using clustering
Contributions are what make the open-source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Vansh
- GitHub: @vansh-09
- LinkedIn: Your LinkedIn Profile

Acknowledgments:
- Kaggle for providing the dataset
- XGBoost Documentation
- SHAP Library
- Streamlit for the amazing deployment platform
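
References:
- Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30.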
- Various articles on customer churn prediction in telecom industry
If you found this project helpful, please consider giving it a ⭐!
Made with ❤️ by Vansh
