vansh-09/SmartChurn


📱 SmartChurn: Telecom Customer Churn Prediction


A machine learning solution to predict and prevent customer churn in the telecommunications industry



🎯 Project Overview

SmartChurn is an end-to-end machine learning project that predicts customer churn in the telecom sector with an XGBoost classifier and explains its predictions with SHAP. The project covers the complete ML pipeline, from data preprocessing to actionable business insights, to help telecom companies identify at-risk customers and target their retention strategies.

🌟 Key Highlights

  • 🎯 High Accuracy: ~84% accuracy, with 83% precision and 87% recall
  • 🔍 Explainable AI: Uses SHAP values for model interpretability
  • 📊 Business-Ready: Extracts actionable insights for retention strategies
  • 🚀 Production-Ready: Deployed Streamlit web app for real-time predictions
  • 📈 Comprehensive Analysis: Full EDA and feature engineering pipeline

🎬 Demo

Try the live application: SmartChurn Web App

SmartChurn Demo

📊 Dataset

  • Source: Telco Customer Churn Dataset (Kaggle)
  • Size: 7,043 customer records
  • Features: 21 columns (a customerID identifier, the Churn target, and 19 customer attributes), including:
    • Customer demographics (gender, senior citizen, dependents)
    • Account information (tenure, contract type, payment method)
    • Service usage (phone service, internet service, streaming services)
    • Billing information (monthly charges, total charges)
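A minimal, dependency-free sketch of loading the data, assuming the column names of the public Kaggle file (`customerID`, `tenure`, `MonthlyCharges`, `TotalCharges`, `Churn`). In that file `TotalCharges` is stored as text and is blank for a few brand-new (zero-tenure) customers, so it needs an explicit numeric conversion. The helper name `load_telco` and the choice to fill blanks with 0.0 are assumptions; with pandas, `pd.to_numeric(df["TotalCharges"], errors="coerce")` is the usual equivalent.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_telco(handle: TextIO) -> List[Dict[str, object]]:
    """Parse Telco churn CSV rows and coerce the numeric columns.

    Hypothetical helper: assumes the Kaggle column names.
    """
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(handle):
        row: Dict[str, object] = dict(raw)
        row["tenure"] = int(raw["tenure"])
        row["MonthlyCharges"] = float(raw["MonthlyCharges"])
        # TotalCharges is blank (whitespace) for zero-tenure customers;
        # filling with 0.0 is an assumption (they have not been billed yet).
        total = raw["TotalCharges"].strip()
        row["TotalCharges"] = float(total) if total else 0.0
        # Encode the target: 1 = churned, 0 = stayed.
        row["Churn"] = 1 if raw["Churn"] == "Yes" else 0
        rows.append(row)
    return rows


# Tiny in-memory sample so the sketch runs without the real file.
sample = io.StringIO(
    "customerID,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "0001-A,12,70.5,846.0,No\n"
    "0002-B,0,20.0, ,Yes\n"
)
data = load_telco(sample)
print(data[1]["TotalCharges"], data[1]["Churn"])  # 0.0 1
```

For the repository copy, `with open("data/telco.csv", newline="") as f: rows = load_telco(f)` should work, provided the file keeps the Kaggle column names.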

🏗️ Project Architecture

Data Collection → EDA → Data Preprocessing → Feature Engineering
                                ↓
                          Model Training
                                ↓
                    ┌───────────┴───────────┐
                    ↓                       ↓
            Model Evaluation        SHAP Analysis
                    ↓                       ↓
                    └───────────┬───────────┘
                                ↓
                    Business Insights & Deployment

🚀 Getting Started

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • Jupyter Notebook (optional, for exploring the analysis)

Installation

  1. Clone the repository

    git clone https://github.com/vansh-09/SmartChurn.git
    cd SmartChurn
  2. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt

Running the Project

Option 1: Jupyter Notebook (Recommended for exploration)

jupyter notebook notebooks/SmartChurn.ipynb

Option 2: Streamlit Web App (For predictions)

streamlit run app.py

📈 Model Performance

Metric       Score
Accuracy     84%
Precision    83%
Recall       87%
F1-Score     85%
ROC-AUC      0.88
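F1 is the harmonic mean of precision and recall, so the table can be sanity-checked: 2 × 0.83 × 0.87 / (0.83 + 0.87) ≈ 0.85, which matches the reported F1. The sketch below computes the standard metrics from confusion-matrix counts; the function names and the example counts are illustrative, and `sklearn.metrics` offers equivalents (`precision_score`, `recall_score`, `f1_score`).

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of customers flagged, share that churned
    recall = tp / (tp + fn)     # of actual churners, share that were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision: float, recall: float) -> float:
    """F1 directly from precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, only to exercise the function.
m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print({k: round(v, 3) for k, v in m.items()})

# Consistency check against the reported P = 0.83, R = 0.87.
print(round(f1_from(0.83, 0.87), 2))  # 0.85
```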

Why XGBoost?

  • ✅ Strong performance on tabular data
  • ✅ Class weighting (scale_pos_weight) for imbalanced targets
  • ✅ Built-in feature importance
  • ✅ L1/L2 regularization and tree-depth limits help control overfitting
  • ✅ Fast training and prediction
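One concrete way XGBoost addresses class imbalance is its `scale_pos_weight` parameter, for which the XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a typical starting value. Whether SmartChurn sets it is not stated; this sketch only computes the suggested value. In the full Kaggle data about 26.5% of customers churned, which gives a ratio of roughly 2.8.

```python
from typing import Sequence


def suggested_scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative to positive examples (1 = churned)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("need at least one positive (churn) example")
    return negatives / positives


y_train = [0, 0, 0, 1, 0, 0, 1, 0]  # toy labels: 6 stayed, 2 churned
weight = suggested_scale_pos_weight(y_train)
print(weight)  # 3.0
# With xgboost installed, this would be passed as, e.g.:
#   xgboost.XGBClassifier(scale_pos_weight=weight)
```

Compute the ratio on the training split only, so the test set does not influence training.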

🔍 Key Features

1. 📊 Exploratory Data Analysis (EDA)

  • Comprehensive visualization of customer behavior patterns
  • Correlation analysis between features
  • Churn distribution across different customer segments

2. 🧼 Data Preprocessing & Feature Engineering

  • Handling missing values and outliers
  • Encoding categorical variables
  • Feature scaling and normalization
  • Creating derived features for better predictions
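A dependency-free sketch of two of these steps: one-hot encoding a categorical column with a vocabulary learned from training data, and one derived feature (average billed amount per month of tenure). The helper names and the derived feature are hypothetical illustrations, not taken from `src/preprocessing.py`; with pandas, `pd.get_dummies` is the usual one-liner for encoding.

```python
from typing import Dict, List, Sequence


def fit_categories(values: Sequence[str]) -> List[str]:
    """Learn a sorted category vocabulary from the training data."""
    return sorted(set(values))


def one_hot(value: str, categories: List[str], prefix: str) -> Dict[str, int]:
    """One indicator column per known category; unseen values map to all zeros."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def avg_monthly_spend(total_charges: float, tenure: int) -> float:
    """Hypothetical derived feature: average billed amount per month of tenure."""
    return total_charges / tenure if tenure > 0 else 0.0


contracts = fit_categories(["Month-to-month", "One year", "Two year", "Month-to-month"])
print(one_hot("One year", contracts, "Contract"))
# {'Contract_Month-to-month': 0, 'Contract_One year': 1, 'Contract_Two year': 0}
print(avg_monthly_spend(846.0, 12))  # 70.5
```

Learning the vocabulary on the training split keeps test-time and app-time encodings consistent. Scaling matters little for tree models such as XGBoost, because splits depend only on the ordering of feature values.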

3. 🤖 Machine Learning Pipeline

  • XGBoost model training with hyperparameter tuning
  • Cross-validation for robust performance estimation
  • Model serialization for deployment
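Cross-validation on churn data is usually stratified so each fold keeps roughly the same churn rate as the full dataset. A dependency-free sketch of stratified k-fold splitting follows. In practice `sklearn.model_selection.StratifiedKFold` provides this, and whether the notebook uses it is not stated. The seed of 42 is arbitrary.

```python
import random
from typing import Dict, List, Sequence


def stratified_k_fold(labels: Sequence[int], k: int, seed: int = 42) -> List[List[int]]:
    """Split row indices into k folds that preserve the class ratio.

    Indices of each class are shuffled, then dealt round-robin into folds,
    so fold sizes differ by at most one example per class.
    """
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


labels = [1] * 20 + [0] * 80  # toy data: 20% churn
folds = stratified_k_fold(labels, k=5)
for test_idx in folds:
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    # Fit on train_idx, evaluate on test_idx; each test fold has 4 churners.
    print(len(train_idx), sum(labels[i] for i in test_idx))
```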

4. ✅ Model Evaluation

  • Comprehensive classification metrics
  • Confusion matrix analysis
  • ROC curve and AUC score
  • Feature importance analysis
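ROC-AUC has a direct interpretation: it is the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner, with ties counted as half. Read this way, the reported 0.88 means the model ranks a churner above a non-churner about 88% of the time. The sketch computes AUC straight from that definition. `sklearn.metrics.roc_auc_score` gives the same value more efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of random positive > score of random negative).

    Ties count as half a win. O(P * N) pairs, fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```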

5. 🔍 SHAP (SHapley Additive exPlanations)

  • Global feature importance
  • Individual prediction explanations
  • Force plots and waterfall charts
  • Dependence plots for feature interactions
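SHAP is built on Shapley values. A feature's contribution is its marginal effect on the prediction, averaged over every order in which features could be added, and the contributions sum to the prediction minus a baseline prediction ("local accuracy"). The sketch below computes exact Shapley values by brute force for a toy, hypothetical churn-score function, filling "absent" features from a single reference customer. The `shap` library uses much faster algorithms (e.g. `shap.TreeExplainer` for tree ensembles) and handles absent features differently, so this illustrates the definition, not the library's implementation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float], x: Features, baseline: Features) -> Features:
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(x)
    n = len(names)

    def coalition_value(present: set) -> float:
        z = {k: (x[k] if k in present else baseline[k]) for k in names}
        return model(z)

    phi: Features = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (coalition_value(s | {i}) - coalition_value(s))
        phi[i] = total
    return phi


def toy_churn_score(z: Features) -> float:
    """Hypothetical score: month-to-month raises risk, more so without tech support."""
    return (0.4 * z["month_to_month"]
            - 0.01 * z["tenure"]
            + 0.3 * z["month_to_month"] * (1 - z["tech_support"]))


customer = {"month_to_month": 1.0, "tenure": 3.0, "tech_support": 0.0}
reference = {"month_to_month": 0.0, "tenure": 30.0, "tech_support": 1.0}
phi = shapley_values(toy_churn_score, customer, reference)
print({k: round(v, 3) for k, v in phi.items()})
# month_to_month ≈ 0.55, tenure ≈ 0.27, tech_support ≈ 0.15
# Local accuracy: contributions sum to f(customer) - f(reference) = 0.97
print(round(sum(phi.values()), 3))
```

The interaction term (0.3) is split evenly between `month_to_month` and `tech_support`, which is how Shapley values share credit for effects that need both features.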

6. 📉 Business Insights Generation

  • Actionable recommendations for customer retention
  • Risk segmentation of customer base
  • Cost-benefit analysis of retention strategies
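Risk segmentation typically buckets customers by predicted churn probability (for example, the output of `predict_proba` on an XGBoost classifier) so retention spend can go to the riskiest tier first. The thresholds 0.3 and 0.6 below are hypothetical placeholders, not values from the project; in practice they would come from the cost-benefit analysis.

```python
from typing import Dict, List, Sequence, Tuple


def risk_tier(probability: float, medium: float = 0.3, high: float = 0.6) -> str:
    """Map a predicted churn probability to a tier (thresholds are hypothetical)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


def segment(customers: Sequence[Tuple[str, float]]) -> Dict[str, List[str]]:
    """Group customer IDs by risk tier."""
    tiers: Dict[str, List[str]] = {"high": [], "medium": [], "low": []}
    for customer_id, prob in customers:
        tiers[risk_tier(prob)].append(customer_id)
    return tiers


scored = [("C1", 0.82), ("C2", 0.45), ("C3", 0.07), ("C4", 0.61)]
print(segment(scored))
# {'high': ['C1', 'C4'], 'medium': ['C2'], 'low': ['C3']}
```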

💡 Key Business Insights

Based on the model analysis, these are the factors most strongly associated with customer churn:

🔴 High-Risk Indicators

  1. 📅 Contract Type

    • Month-to-month customers churn at roughly 3x the rate of customers on longer-term contracts
    • Action: Offer incentives for annual contracts
  2. 🧾 Monthly Charges

    • High charges without bundled services increase churn by 40%
    • Action: Create value-added service bundles
  3. ❌ Lack of Support Services

    • Customers without tech support churn at about 2x the rate of those who have it
    • Action: Include complimentary support in premium plans
  4. 📄 Paperless Billing

    • Customers on paperless billing (digital-first) show about 25% higher churn
    • Action: Engage through digital channels and apps
  5. 📊 Low Tenure

    • Customers with < 6 months tenure are highest risk
    • Action: Implement strong onboarding and early engagement
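Statements such as "3x higher churn rate" are ratios of group-level churn rates. A small sketch of that computation follows, assuming rows shaped like the Kaggle data with `Churn` already encoded as 0/1. The rows are made up for illustration, not drawn from the dataset. These ratios describe associations in the data; they do not by themselves show that a factor causes churn.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def churn_rate_by(rows: Iterable[Mapping[str, object]], column: str) -> Dict[str, float]:
    """Share of churned customers (Churn == 1) within each value of `column`."""
    counts: Dict[str, int] = defaultdict(int)
    churned: Dict[str, int] = defaultdict(int)
    for row in rows:
        key = str(row[column])
        counts[key] += 1
        churned[key] += int(row["Churn"])
    return {key: churned[key] / counts[key] for key in counts}


# Toy rows, not the real dataset.
rows = (
    [{"Contract": "Month-to-month", "Churn": 1}] * 3
    + [{"Contract": "Month-to-month", "Churn": 0}] * 7
    + [{"Contract": "One year", "Churn": 1}] * 1
    + [{"Contract": "One year", "Churn": 0}] * 9
)
rates = churn_rate_by(rows, "Contract")
print(rates)  # {'Month-to-month': 0.3, 'One year': 0.1}
ratio = rates["Month-to-month"] / rates["One year"]
print(round(ratio, 2))  # 3.0
```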

💰 Potential ROI

Assuming:

  • Average customer lifetime value: $2,000
  • Cost of retention campaign: $100 per customer
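  • Model flags 500 at-risk customers with 85% precision (the Model Performance table reports 83%)
  • Every correctly flagged churner is retained, and the $100 cost applies to all 500 flagged customers

Potential savings: (500 × 0.85 × $2,000) - (500 × $100) = $850,000 - $50,000 = $800,000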
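The ROI arithmetic can be parameterized so its assumptions are explicit. This sketch charges the campaign cost to every flagged customer, since false positives get contacted too. It also adds a retention success rate, which a straight precision × lifetime-value calculation implicitly sets to 100%. The function name and the 30% rate in the second call are hypothetical.

```python
def retention_savings(
    flagged: int,
    precision: float,
    lifetime_value: float,
    cost_per_customer: float,
    retention_rate: float = 1.0,
) -> float:
    """Net value of a retention campaign aimed at model-flagged customers.

    Assumptions: the campaign cost is paid for every flagged customer
    (including false positives); only true churners (flagged * precision)
    can be saved, and a fraction `retention_rate` of them actually stay.
    """
    true_churners = flagged * precision
    value_saved = true_churners * retention_rate * lifetime_value
    campaign_cost = flagged * cost_per_customer
    return value_saved - campaign_cost


print(retention_savings(500, 0.85, 2_000, 100))       # 800000.0
print(retention_savings(500, 0.85, 2_000, 100, 0.3))  # 205000.0 with a 30% retention rate
```

Sweeping `retention_rate` shows how sensitive the business case is to campaign effectiveness. At these example numbers the campaign breaks even when only about 5.9% of true churners are retained (50,000 / 850,000).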

📁 Project Structure

SmartChurn/
│
├── data/
│   └── telco.csv                 # Dataset
│
├── images/                        # Visualizations and demos
│   └── demo.gif
│
├── models/
│   └── smartchurn_model.pkl      # Trained model
│
├── notebooks/
│   └── SmartChurn.ipynb          # Main analysis notebook
│
├── src/
│   ├── preprocessing.py          # Data preprocessing utilities
│   ├── model.py                  # Model training and evaluation
│   └── visualization.py          # Plotting functions
│
├── app.py                        # Streamlit web application
├── requirements.txt              # Python dependencies
├── Analysis_Report.md            # Detailed findings and insights
├── .gitignore
├── LICENSE
└── README.md

🛠️ Technologies Used

Category              Technologies
Language              Python 3.8+
Data Manipulation     Pandas, NumPy
Visualization         Matplotlib, Seaborn, Plotly
Machine Learning      Scikit-learn, XGBoost
Model Interpretation  SHAP
Deployment            Streamlit
Model Persistence     Pickle
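The project structure shows the trained model saved as `models/smartchurn_model.pkl`, presumably for `app.py` to load. One common pattern, sketched here as an assumption rather than the project's actual file layout, is to pickle a small bundle that holds the model together with the feature order and decision threshold, so the app encodes inputs exactly as in training. A temporary directory stands in for `models/`, and a placeholder string stands in for the fitted classifier.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def save_bundle(bundle: Dict[str, Any], path: str) -> None:
    """Serialize a model bundle with pickle (binary mode)."""
    with open(path, "wb") as f:
        pickle.dump(bundle, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_bundle(path: str) -> Dict[str, Any]:
    """Load a bundle. pickle.load can run arbitrary code: only open trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical bundle layout; in the real project "model" would hold the
# fitted classifier. A placeholder keeps this sketch dependency-free.
bundle = {
    "model": "fitted-classifier-placeholder",
    "feature_names": ["tenure", "MonthlyCharges", "Contract_Month-to-month"],
    "threshold": 0.5,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "smartchurn_model.pkl")
    save_bundle(bundle, path)
    restored = load_bundle(path)

print(restored == bundle)  # True
```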

🔮 Future Enhancements

  • Implement real-time prediction API with FastAPI
  • Add deep learning models (Neural Networks) for comparison
  • Integrate with CRM systems for automated alerts
  • Build a dashboard for churn monitoring
  • Add A/B testing framework for retention strategies
  • Implement MLOps pipeline with model versioning
  • Add customer segmentation using clustering

🤝 Contributing

Contributions are what make the open-source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

Distributed under the MIT License. See LICENSE for more information.

👤 Author

Vansh

🙏 Acknowledgments

📚 References

  • Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System
  • Lundberg, S. M., & Lee, S. I. (2017). A Unified Approach to Interpreting Model Predictions
  • Various articles on customer churn prediction in telecom industry

If you found this project helpful, please consider giving it a ⭐!

Made with ❤️ by Vansh
