RudrakshChugh/Fraud_Detection

Adaptive Checkout Risk Engine

A cost-aware, multi-dimensional fraud detection framework that moves beyond binary classification to deliver personalized, explainable transaction routing.

Python LightGBM FastAPI SHAP NetworkX


Overview

Traditional fraud systems rely on static rules like "block if amount > $50,000 AND device is new". These rules are simple to maintain, but they generate excessive false positives: they decline legitimate customers, increase cart abandonment, and erode trust.

This project replaces that paradigm with an Adaptive Risk Engine that evaluates each transaction across four independent dimensions and routes it to the optimal action (Approve, OTP, or Block) using a mathematically defined Composite Decision Score (CDS).

Key Differentiator: The system optimizes for business cost (revenue saved minus friction imposed), not just statistical accuracy.


System Architecture

graph TD
    A[Customer Checkout] --> B[Transaction Data]
    B --> C[Feature Engineering]
    C --> D[Fraud Risk Model]
    C --> T[Trust Engine]
    C --> V[CRV Assessment]
    C --> N[Graph Intelligence]
    
    D --> E[Composite Decision Engine]
    T --> E
    V --> E
    N --> E
    
    E --> F{Cost-Aware Routing}
    
    F -->|Low Risk| G[Approve]
    F -->|Medium Risk| H[OTP]
    F -->|High Risk| I[Block]

    classDef approve fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000;
    classDef otp fill:#fff3cd,stroke:#ffc107,stroke-width:2px,color:#000;
    classDef block fill:#f8d7da,stroke:#dc3545,stroke-width:2px,color:#000;

    class G approve;
    class H otp;
    class I block;

Composite Decision Score (CDS)

The final routing decision is computed as:

$$CDS = w_1 \cdot \text{Risk} - w_2 \cdot \text{Trust} - w_3 \cdot \text{CRV} + w_4 \cdot \text{GraphRisk}$$

Where weights are tuned via the Business Cost Function to jointly minimize fraud loss, false-positive declines, and verification overhead.
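
The formula and the three-way routing can be sketched in a few lines of Python. This is a minimal illustration, not the code in composite_score.py: the weight values, the two thresholds, and the assumption that all four inputs are scaled to [0, 1] are hypothetical and chosen only to make the example readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical values; the README states the real weights are tuned
    # against the business cost function.
    risk: float = 0.5
    trust: float = 0.2
    crv: float = 0.1
    graph: float = 0.2


def composite_decision_score(risk, trust, crv, graph_risk, w=Weights()):
    """CDS = w1*Risk - w2*Trust - w3*CRV + w4*GraphRisk (inputs assumed in [0, 1])."""
    return w.risk * risk - w.trust * trust - w.crv * crv + w.graph * graph_risk


def route(cds, otp_threshold=0.15, block_threshold=0.40):
    """Map a CDS to an action; both thresholds are illustrative assumptions."""
    if cds >= block_threshold:
        return "Block"
    if cds >= otp_threshold:
        return "OTP"
    return "Approve"


# A trusted, high-value customer with a moderate model score.
trusted = composite_decision_score(risk=0.30, trust=0.90, crv=0.80, graph_risk=0.05)
# A new account with a high model score and a suspicious graph neighborhood.
risky = composite_decision_score(risk=0.95, trust=0.10, crv=0.05, graph_risk=0.80)
# Something in between, which lands in the OTP band.
borderline = composite_decision_score(risk=0.60, trust=0.30, crv=0.20, graph_risk=0.20)

print(route(trusted), route(borderline), route(risky))
```

Because Trust and CRV enter with negative signs, a long-standing customer can absorb a moderately elevated model score without being challenged, which is the behavior the overview describes.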


Results

Trained on 150,000 samples from the IEEE-CIS Fraud Detection dataset (~2.65% fraud rate).

| Metric | Score |
|---|---|
| ROC-AUC | 0.9430 |
| PR-AUC | 0.7044 |
| Test Samples | 30,000 |
| Fraud in Test | 794 (2.65%) |

Figures: ROC curve, precision-recall curve, and top 20 feature importances (saved under results/ as roc_curve.png, pr_curve.png, and feature_importance.png).


Innovation Layer

This project goes significantly beyond standard fraud classification:

| Component | What It Does | File |
|---|---|---|
| Behavioral Trust Engine | Computes a dynamic trust score from account age, transaction history, device/location consistency, and chargeback history. Acts as a counterweight to raw fraud probability. | trust_engine.py |
| Customer Relationship Value (CRV) | A proxy for customer lifetime value derived from transaction volume, longevity, and behavioral consistency. High-CRV customers receive reduced friction. | crv_assessment.py |
| Fraud Network Intelligence | Builds entity graphs (Customer ↔ Card ↔ Device ↔ IP ↔ Email) and extracts topological features like shared-device counts, fraud-neighbor scores, and centrality metrics. | graph_network.py |
| Business Cost Optimizer | Evaluates routing decisions against actual dollar losses: (Fraud Loss × Missed Fraud) + (FP Cost × Declines) + (OTP Cost × Challenges) | cost_optimization.py |
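
The cost expression in the last row can be evaluated directly over a batch of routing decisions. The sketch below is an assumption-laden illustration, not the contents of cost_optimization.py: the dollar figures are made up, "Declines" is read as blocked legitimate transactions, and an approved fraudulent transaction is counted as missed fraud.

```python
from collections import Counter


def business_cost(labels, actions, fraud_loss=500.0, fp_cost=15.0, otp_cost=0.5):
    """(Fraud Loss x Missed Fraud) + (FP Cost x Declines) + (OTP Cost x Challenges).

    labels:  1 for fraud, 0 for legitimate.
    actions: "Approve", "OTP", or "Block" for each transaction.
    All three unit costs are hypothetical defaults.
    """
    counts = Counter()
    for is_fraud, action in zip(labels, actions):
        if action == "Approve" and is_fraud:
            counts["missed_fraud"] += 1      # fraud that got through
        elif action == "Block" and not is_fraud:
            counts["false_declines"] += 1    # legitimate customer turned away
        elif action == "OTP":
            counts["challenges"] += 1        # verification overhead
    return (
        fraud_loss * counts["missed_fraud"]
        + fp_cost * counts["false_declines"]
        + otp_cost * counts["challenges"]
    )


labels = [1, 0, 0, 1, 0]
actions = ["Approve", "Block", "OTP", "Block", "Approve"]
total = business_cost(labels, actions)
print(total)  # 500.0 + 15.0 + 0.5 = 515.5
```

Comparing this total across candidate weight or threshold settings is one way the routing could be tuned for cost rather than for accuracy alone.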

Repository Structure

├── data/                       # IEEE-CIS dataset (git-ignored)
├── model/                      # Saved LightGBM weights
├── results/                    # Evaluation plots & metrics
│   ├── roc_curve.png
│   ├── pr_curve.png
│   ├── feature_importance.png
│   └── metrics.json
├── fraud_engine/               # Core ML & API logic
│   ├── run_pipeline.py         # End-to-end training & evaluation
│   ├── train.py                # LightGBM training with class weighting
│   ├── composite_score.py      # CDS routing logic
│   ├── trust_engine.py         # Behavioral trust scoring
│   ├── crv_assessment.py       # Customer value proxy
│   ├── graph_network.py        # NetworkX graph features
│   ├── temporal_velocity.py    # Time-based feature extraction
│   ├── cost_optimization.py    # Business cost function
│   ├── baseline_model.py       # Rule-based baseline benchmark
│   ├── explainability.py       # SHAP model explainer
│   └── api.py                  # FastAPI inference endpoint
├── requirements.txt
└── README.md

Quick Start

1. Clone & Install

git clone https://github.com/RudrakshChugh/Fraud_Detection.git
cd Fraud_Detection
python -m venv venv && .\venv\Scripts\activate   # Windows
# macOS/Linux: python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

2. Train the Model

Place the IEEE-CIS dataset CSVs inside data/, then run:

python fraud_engine/run_pipeline.py

This will train the model, save weights to model/, and generate all evaluation plots in results/.

3. Launch the API

cd fraud_engine
uvicorn api:app --reload

4. Test a Transaction

curl -X POST http://localhost:8000/evaluate_risk \
  -H "Content-Type: application/json" \
  -d '{"transaction_id":"TXN-001","customer_id":"C-42","amount":1200.0,"device_id":"D-99","ip_address":"192.168.1.1","email_domain":"gmail.com"}'
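
The same request can be sent from Python using only the standard library. This sketch mirrors the curl call above; the helper names are hypothetical, and because the response schema is not documented here, the reply is returned as parsed JSON without assuming any field names.

```python
import json
import urllib.request


def build_request(payload, base_url="http://localhost:8000"):
    """Build the POST request for /evaluate_risk without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/evaluate_risk",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate_risk(payload, timeout=5.0):
    """Send the request; requires the API from step 3 to be running."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "transaction_id": "TXN-001",
    "customer_id": "C-42",
    "amount": 1200.0,
    "device_id": "D-99",
    "ip_address": "192.168.1.1",
    "email_domain": "gmail.com",
}
request = build_request(payload)
# With the server running: print(evaluate_risk(payload))
```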

ML Methodology

Data & Preprocessing

  • Dataset: IEEE-CIS Fraud Detection (590K+ transactions, 400+ features)
  • Preprocessing: Median imputation, label encoding, removal of columns with >80% null values
  • Class Imbalance: Handled via dynamic scale_pos_weight and stratified train/test splits
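
The class-imbalance handling can be illustrated without any ML dependencies. The sketch below is an assumption about how a "dynamic" weight might be derived: it splits indices per class so both sets keep the same fraud rate, then sets the weight to the negative-to-positive ratio of the training labels, a common choice for LightGBM's scale_pos_weight parameter. The function names and the toy labels are hypothetical; in practice scikit-learn's train_test_split with stratify=y performs the split.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * test_fraction)
        test_idx.extend(idx[:cut])
        train_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


def scale_pos_weight(labels):
    """Ratio of negatives to positives, recomputed from the data at hand."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives


# Toy data: 5% fraud (the real subset is far larger and about 2.65% fraud).
labels = [1] * 50 + [0] * 950
train_idx, test_idx = stratified_split(labels)
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

print(len(train_labels), len(test_labels), scale_pos_weight(train_labels))
```

Computing the weight from the training labels only, rather than from the full dataset, keeps the test set out of any training-time decision.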

Feature Engineering

| Category | Features |
|---|---|
| Temporal & Velocity | Transactions per hour/day, time since last transaction, weekend flags |
| Behavioral | Average spend, spending deviation, amount percentiles |
| Graph-Derived | Accounts per device, devices per card, shared-IP count, fraud-neighbor count |
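
The graph-derived row can be made concrete with a small sketch. The repository uses NetworkX for this; the version below uses plain dictionaries so it runs without dependencies, and the field names, the toy transactions, and the two-hop definition of a "fraud neighbor" are all assumptions for illustration. It also assumes one transaction per customer.

```python
from collections import defaultdict

# Hypothetical toy data; "fraud" marks customers with a known past fraud label.
transactions = [
    {"customer": "C-1", "device": "D-1", "ip": "10.0.0.1", "fraud": 0},
    {"customer": "C-2", "device": "D-1", "ip": "10.0.0.2", "fraud": 1},
    {"customer": "C-3", "device": "D-2", "ip": "10.0.0.1", "fraud": 0},
    {"customer": "C-4", "device": "D-3", "ip": "10.0.0.9", "fraud": 0},
]


def build_graph(rows):
    """Undirected adjacency between customers and the devices/IPs they used."""
    adj = defaultdict(set)
    for r in rows:
        customer = ("customer", r["customer"])
        for entity in (("device", r["device"]), ("ip", r["ip"])):
            adj[customer].add(entity)
            adj[entity].add(customer)
    return adj


def graph_features(rows):
    adj = build_graph(rows)
    fraud_customers = {("customer", r["customer"]) for r in rows if r["fraud"]}
    features = {}
    for r in rows:
        me = ("customer", r["customer"])
        # Customers reachable through any shared device or IP (two hops away).
        neighbors = set().union(*(adj[entity] for entity in adj[me])) - {me}
        features[r["customer"]] = {
            "accounts_per_device": len(adj[("device", r["device"])]),
            "shared_ip_count": len(adj[("ip", r["ip"])]) - 1,
            "fraud_neighbor_count": len(neighbors & fraud_customers),
        }
    return features


features = graph_features(transactions)
print(features["C-1"])
```

Here C-1 shares a device with the fraud-labeled C-2, so its fraud-neighbor count is 1 even though its own history is clean. In a real pipeline the fraud labels feeding this count would need to come only from transactions that precede the one being scored, to avoid label leakage.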

Explainability (SHAP)

Every flagged transaction is accompanied by human-readable explanations:

Risk Score: 87/100

  • Multiple accounts linked to current device (Graph Alert)
  • Unusual transaction amount vs. historical baseline
  • High Customer Relationship Value (Mitigating Factor)
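
One way such a message could be assembled from per-feature contributions is sketched below. It is not the logic in explainability.py: the feature names, the phrase table, and the contribution values are hypothetical, and in a real run the contributions would come from a SHAP explainer such as shap.TreeExplainer applied to the trained model.

```python
# Hypothetical mapping from feature names to reader-friendly phrases.
REASONS = {
    "accounts_per_device": "Multiple accounts linked to current device (Graph Alert)",
    "amount_deviation": "Unusual transaction amount vs. historical baseline",
    "crv_score": "High Customer Relationship Value",
}


def explain(risk_probability, contributions, top_k=3):
    """Render the top_k contributions by absolute size as bullet points."""
    lines = [f"Risk Score: {round(risk_probability * 100)}/100", ""]
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for feature, value in ranked[:top_k]:
        text = REASONS.get(feature, feature)
        if value < 0:
            # Negative contributions push the score down, so flag them as such.
            text += " (Mitigating Factor)"
        lines.append(f"  • {text}")
    return "\n".join(lines)


# Assumed per-feature contributions for one flagged transaction.
contributions = {
    "accounts_per_device": 1.8,
    "amount_deviation": 1.1,
    "crv_score": -0.6,
    "hour_of_day": 0.05,
}
report = explain(0.87, contributions)
print(report)
```

Ranking by absolute value keeps mitigating factors visible alongside risk drivers, which matches the mixed list in the example above.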

Tech Stack

| Layer | Technology |
|---|---|
| ML Framework | LightGBM, XGBoost, Scikit-learn |
| Graph Analysis | NetworkX |
| Explainability | SHAP |
| API | FastAPI + Uvicorn |
| Visualization | Matplotlib, Seaborn |

Future Work

  • Online Learning — Continuous model updates to adapt to concept drift
  • Streaming Inference — Apache Kafka integration for ultra-low latency event processing
  • Federated Learning — Privacy-preserving collaborative fraud detection across merchant networks

License

This project is for educational and portfolio purposes. The IEEE-CIS dataset is subject to its own Kaggle competition rules.
