Gamma: End-to-End Telco Churn Prediction Pipeline

An end-to-end Data Science & MLOps pipeline designed to predict customer churn using the IBM Telco Customer dataset. This repository demonstrates a complete machine learning lifecycle, focusing on reproducibility, clean code, and production-ready serving.

🌟 Business Value

Customer churn prediction is critical for telecommunications companies. By forecasting which customers are at risk of leaving, a company can target them with retention offers before they cancel and protect recurring revenue. This project covers the full workflow needed to train a model and expose it through an API.

Technologies Used: Python · Poetry · Uvicorn/FastAPI (Serving) · MLflow (Tracking) · Sphinx (Documentation)

🏗 Architecture & Workflow

```mermaid
graph LR
    A[Raw Data] -->|make_dataset.py| B(Clean Data)
    B -->|build_features.py| C(Processed Features)
    C -->|train.py| D[XGBoost / LightGBM / CatBoost]
    D --> E((MLflow Tracking))
    D --> F[FastAPI Serving]
    F -->|predict API| G(Client)
```
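The first three stages of the diagram can be chained from one script. The sketch below is not part of the repository: it only assembles the commands listed in the Quick Start Guide and runs them in order, stopping at the first failure. The names `STAGES`, `build_commands`, and `run_pipeline` are hypothetical, and the paths are the ones used in this README, so adjust them to your local layout.

```python
import subprocess

# Prefix used by every command in this README.
POETRY_PYTHON = ["poetry", "run", "python"]

# Stage arguments copied from the Quick Start Guide (paths are the README's examples).
STAGES = [
    ("make_dataset", ["src/data/make_dataset.py", "--out", "data/raw/telco.csv"]),
    ("build_features", [
        "src/features/build_features.py",
        "--in", "data/preprocessed/telco_preprocessed.xlsx",
        "--out", "data/processed",
        "--kind", "cc",
    ]),
    ("train", ["src/models/train.py", "--data", "data/processed", "--models", "models"]),
]


def build_commands():
    """Return (stage_name, argv) pairs in execution order."""
    return [(name, POETRY_PYTHON + args) for name, args in STAGES]


def run_pipeline(dry_run=True):
    """Print each stage command; execute it unless dry_run is set."""
    for name, argv in build_commands():
        print(f"[{name}] {' '.join(argv)}")
        if not dry_run:
            # check=True raises CalledProcessError, so later stages never run on stale data.
            subprocess.run(argv, check=True)
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` executes them, and the training stage alone may take up to an hour.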

📦 Requirements

Python 3.12 (recommended)
Poetry for environment and dependency management

🚀 Installation & Setup

Clone the repository and install the dependencies:

```
pip install poetry
poetry install
```

This will create an isolated virtual environment and automatically download all the dependencies defined in pyproject.toml.

🧭 Project Structure

```
.
├── README.md                # Project description
├── pyproject.toml           # Poetry dependencies configuration
├── notebooks/               # Jupyter notebooks (EDA, features, models, tracking)
│   ├── 01_eda.ipynb         # Exploratory Data Analysis (EDA)
│   ├── 02_features.ipynb    # Feature creation and transformation
│   ├── 03_models.ipynb      # Model training and validation
│   └── 04_mlflow.ipynb      # Experiment tracking with MLflow
├── reports/                 # Rendered HTML results and figures
├── docs/                    # Sphinx generated documentation
├── src/                     # Main source code
│   ├── data/
│   │   └── make_dataset.py        # Data downloading and cleaning
│   ├── features/
│   │   ├── build_features.py      # Feature engineering and preparation
│   │   └── feature_selection.py   # Feature selection methods
│   ├── models/
│   │   ├── models.py              # Classifiers definition and hyperparameter grids
│   │   ├── train.py               # Nested CV training and artifact saving
│   │   └── predict.py             # Batch predictions with trained models
│   └── serving/
│       └── app.py                 # Prediction API (FastAPI)
```

Module Descriptions

src/data/make_dataset.py: Downloads, cleans, and transforms raw data to prepare it for modeling.
src/features/build_features.py: Handles feature engineering, encoding, scaling, and splitting data.
src/features/feature_selection.py: Utilities for variable selection (variance, correlation, collinearity, RFECV).
src/models/models.py: Defines available classifiers and their hyperparameter grids.
src/models/train.py: Trains models with nested cross-validation, selects the optimal threshold, and persists artifacts.
src/models/predict.py: Generates batch predictions from a saved model.
src/serving/app.py: FastAPI application exposing the production models via REST endpoints.
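The threshold-selection step mentioned for `train.py` can be illustrated with a small, dependency-free sketch. This is not the repository's implementation: the README does not state which metric is optimized, so F1 on the churn class is an assumption here, and `f1_at_threshold` and `select_threshold` are hypothetical names.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 of the positive (churn) class when predicting 1 for prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(y_true, y_prob, candidates=None):
    """Return (best_threshold, best_f1) over a grid of candidate thresholds."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96, 5)]  # 0.05, 0.10, ..., 0.95
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    # Highest F1 wins; ties are broken in favor of the lowest threshold.
    best_f1, best_t = max(scored, key=lambda s: (s[0], -s[1]))
    return best_t, best_f1


# Toy validation labels and predicted churn probabilities (made up for illustration).
labels = [0, 0, 1, 1]
probs = [0.1, 0.2, 0.6, 0.9]
threshold, score = select_threshold(labels, probs)
```

To avoid an optimistic estimate, a threshold chosen this way should be tuned on validation folds rather than on the final test set.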

⚡️ Quick Start Guide (TL;DR)

Install Dependencies
```
poetry install
```
Generate Documentation
- Windows: .\docs\make.bat html
- Linux/macOS: make -C docs html

Download 'Telco Churn' Data

```
poetry run python src/data/make_dataset.py --out data/raw/telco.csv
```

Data Preparation

```
poetry run python src/features/build_features.py --in data/preprocessed/telco_preprocessed.xlsx --out data/processed --kind cc
```

Train, Evaluate, and Save Model
```
poetry run python src/models/train.py --data data/processed --models models
```
*Note: This script may take up to an hour to run. It saves the trained model and its test metrics.*
Serve the API
```
poetry run uvicorn src.serving.app:app --host 127.0.0.1 --port 8000
```
- Documentation is available at: GET http://localhost:8000/documentation
- Note: In the production repository, the documentation is exposed via GitHub Pages.

Example Prediction Request

```
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" --data @sample.json
```
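The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the API is running locally on the host and port from the uvicorn command and that `/predict` accepts and returns JSON. The helper names `build_predict_request` and `predict` are hypothetical, and the response schema is not documented in this README, so the result is returned as parsed JSON without interpretation.

```python
import json
import urllib.request

# Endpoint taken from the curl example; change it if the server runs elsewhere.
PREDICT_URL = "http://127.0.0.1:8000/predict"


def build_predict_request(payload, url=PREDICT_URL):
    """Build a JSON POST request equivalent to the curl command."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, url=PREDICT_URL, timeout=10):
    """Send the payload to the API and return the decoded JSON response."""
    request = build_predict_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, a call might look like `predict(json.load(open("sample.json", encoding="utf-8")))`, which reuses the same `sample.json` file as the curl example.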

📊 MLflow Tracking

To view the experiment logs and interactive UI:

```
poetry run mlflow ui
```

After starting the server (by default at http://127.0.0.1:5000), run the notebooks/04_mlflow.ipynb notebook to log the experiments.
