A cost-aware, multi-dimensional fraud detection framework that moves beyond binary classification to deliver personalized, explainable transaction routing.
Traditional fraud systems rely on static rules like "block if amount > $50,000 AND device is new". While simple, these systems generate excessive false positives—declining legitimate customers, increasing cart abandonment, and eroding trust.
This project replaces that paradigm with an Adaptive Risk Engine that evaluates each transaction across four independent dimensions and routes it to the optimal action (Approve, OTP, or Block) using a mathematically defined Composite Decision Score (CDS).
Key Differentiator: The system optimizes for business cost (revenue saved minus friction imposed), not just statistical accuracy.
```mermaid
graph TD
    A[Customer Checkout] --> B[Transaction Data]
    B --> C[Feature Engineering]
    C --> D[Fraud Risk Model]
    C --> T[Trust Engine]
    C --> V[CRV Assessment]
    C --> N[Graph Intelligence]
    D --> E[Composite Decision Engine]
    T --> E
    V --> E
    N --> E
    E --> F{Cost-Aware Routing}
    F -->|Low Risk| G[Approve]
    F -->|Medium Risk| H[OTP]
    F -->|High Risk| I[Block]
    classDef approve fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000;
    classDef otp fill:#fff3cd,stroke:#ffc107,stroke-width:2px,color:#000;
    classDef block fill:#f8d7da,stroke:#dc3545,stroke-width:2px,color:#000;
    class G approve;
    class H otp;
    class I block;
```
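The final routing decision is computed as a Composite Decision Score (CDS), a weighted combination of the fraud risk, trust, CRV, and graph intelligence scores. The weights are tuned via the Business Cost Function to jointly minimize fraud loss, false-positive declines, and verification overhead.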
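
As a rough illustration of how four dimension scores could be folded into a single CDS and mapped to an action, here is a minimal sketch. The linear form, the weight values, the thresholds, and the names `Weights`, `composite_decision_score`, and `route` are all assumptions made for this example; they are not taken from composite_score.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical weights; in practice these would come from cost tuning.
    fraud: float = 0.6
    graph: float = 0.2
    trust: float = 0.15
    crv: float = 0.05


def composite_decision_score(p_fraud, graph_risk, trust, crv, w=Weights()):
    """Combine four scores in [0, 1] into one score clamped to [0, 1].

    Assumed linear form: fraud probability and graph risk raise the score,
    while trust and customer value lower it.
    """
    raw = w.fraud * p_fraud + w.graph * graph_risk - w.trust * trust - w.crv * crv
    return min(1.0, max(0.0, raw))


def route(cds, otp_threshold=0.2, block_threshold=0.5):
    """Map a score to Approve / OTP / Block using assumed thresholds."""
    if cds >= block_threshold:
        return "Block"
    if cds >= otp_threshold:
        return "OTP"
    return "Approve"


# A trusted, high-value customer making a moderately suspicious transaction.
score = composite_decision_score(p_fraud=0.55, graph_risk=0.1, trust=0.9, crv=0.8)
print(round(score, 3), route(score))
```

With these placeholder weights, high trust and CRV pull a mid-range fraud probability below the OTP threshold, which is the "counterweight" behavior the engine is meant to provide.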
Trained on 150,000 samples from the IEEE-CIS Fraud Detection dataset (~2.65% fraud rate).
| Metric | Score |
|---|---|
| ROC-AUC | 0.9430 |
| PR-AUC | 0.7044 |
| Test Samples | 30,000 |
| Fraud in Test | 794 (2.65%) |
Beyond the core fraud classifier, the project adds four components:
| Component | What It Does | File |
|---|---|---|
| Behavioral Trust Engine | Computes a dynamic trust score from account age, transaction history, device/location consistency, and chargeback history. Acts as a counterweight to raw fraud probability. | trust_engine.py |
| Customer Relationship Value (CRV) | A proxy for customer lifetime value derived from transaction volume, longevity, and behavioral consistency. High-CRV customers receive reduced friction. | crv_assessment.py |
| Fraud Network Intelligence | Builds entity graphs (Customer ↔ Card ↔ Device ↔ IP ↔ Email) and extracts topological features like shared-device counts, fraud-neighbor scores, and centrality metrics. | graph_network.py |
| Business Cost Optimizer | Evaluates routing decisions against actual dollar losses: (Fraud Loss × Missed Fraud) + (FP Cost × Declines) + (OTP Cost × Challenges) | cost_optimization.py |
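
The cost expression in the last row can be turned into a small threshold search. The sketch below is one possible reading, not the code in cost_optimization.py: the dollar values, the grid search, and the function names are hypothetical, and it assumes an OTP challenge always stops a fraudulent transaction.

```python
def route(score, otp_threshold, block_threshold):
    """Map a risk score in [0, 1] to an action."""
    if score >= block_threshold:
        return "Block"
    if score >= otp_threshold:
        return "OTP"
    return "Approve"


def business_cost(labels, decisions, fraud_loss=500.0, fp_cost=15.0, otp_cost=0.5):
    """(Fraud Loss x Missed Fraud) + (FP Cost x Declines) + (OTP Cost x Challenges).

    Assumes (for illustration) that an OTP challenge stops fraud, so only
    approved fraud counts as missed. The dollar values are placeholders.
    """
    pairs = list(zip(labels, decisions))
    missed_fraud = sum(1 for y, d in pairs if y == 1 and d == "Approve")
    declines = sum(1 for y, d in pairs if y == 0 and d == "Block")
    challenges = sum(1 for _, d in pairs if d == "OTP")
    return fraud_loss * missed_fraud + fp_cost * declines + otp_cost * challenges


def best_thresholds(scores, labels, steps=20):
    """Grid-search the (OTP, Block) thresholds with the lowest total cost."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for otp_t in grid:
        for block_t in grid:
            if block_t < otp_t:
                continue  # Block must not trigger below the OTP threshold
            decisions = [route(s, otp_t, block_t) for s in scores]
            cost = business_cost(labels, decisions)
            if best is None or cost < best[0]:
                best = (cost, otp_t, block_t)
    return best


scores = [0.05, 0.10, 0.30, 0.45, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1]  # 1 = fraud
cost, otp_t, block_t = best_thresholds(scores, labels)
print(cost, otp_t, block_t)
```

On this toy data the cheapest policy challenges the two mid-range transactions and blocks only the highest-scoring one, because a false decline is priced far above an OTP challenge.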
```
├── data/                        # IEEE-CIS dataset (git-ignored)
├── model/                       # Saved LightGBM weights
├── results/                     # Evaluation plots & metrics
│   ├── roc_curve.png
│   ├── pr_curve.png
│   ├── feature_importance.png
│   └── metrics.json
├── fraud_engine/                # Core ML & API logic
│   ├── run_pipeline.py          # End-to-end training & evaluation
│   ├── train.py                 # LightGBM training with class weighting
│   ├── composite_score.py       # CDS routing logic
│   ├── trust_engine.py          # Behavioral trust scoring
│   ├── crv_assessment.py        # Customer value proxy
│   ├── graph_network.py         # NetworkX graph features
│   ├── temporal_velocity.py     # Time-based feature extraction
│   ├── cost_optimization.py     # Business cost function
│   ├── baseline_model.py        # Rule-based baseline benchmark
│   ├── explainability.py        # SHAP model explainer
│   └── api.py                   # FastAPI inference endpoint
├── requirements.txt
└── README.md
```
```bash
git clone https://github.com/RudrakshChugh/Fraud_Detection.git
cd Fraud_Detection
python -m venv venv && .\venv\Scripts\activate  # Windows
pip install -r requirements.txt
```

Place the IEEE-CIS dataset CSVs inside data/, then run:

```bash
python fraud_engine/run_pipeline.py
```

This will train the model, save weights to model/, and generate all evaluation plots in results/.

Start the API server:

```bash
cd fraud_engine
uvicorn api:app --reload
```

Then send a sample request:

```bash
curl -X POST http://localhost:8000/evaluate_risk \
  -H "Content-Type: application/json" \
  -d '{"transaction_id":"TXN-001","customer_id":"C-42","amount":1200.0,"device_id":"D-99","ip_address":"192.168.1.1","email_domain":"gmail.com"}'
```

- Dataset: IEEE-CIS Fraud Detection (590K+ transactions, 400+ features)
- Preprocessing: Median imputation, label encoding, removal of columns with >80% null values
- Class Imbalance: Handled via dynamic `scale_pos_weight` and stratified train/test splits
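
The preprocessing steps above can be sketched with the standard library alone. This is an illustrative version, not the project's pipeline: the column names are toy placeholders, the helper names are hypothetical, and the negative-to-positive ratio is one common way to derive `scale_pos_weight` from the data. The class counts reuse the 794 fraud cases out of 30,000 reported for the test split, purely as an example.

```python
from statistics import median


def drop_sparse_columns(columns, max_null_frac=0.8):
    """Drop columns whose fraction of missing (None) values exceeds the limit."""
    return {
        name: values
        for name, values in columns.items()
        if sum(v is None for v in values) / len(values) <= max_null_frac
    }


def impute_median(values):
    """Replace missing numeric values with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def scale_pos_weight(labels):
    """Ratio of negative to positive samples (assumed heuristic)."""
    positives = sum(labels)
    return (len(labels) - positives) / positives


# Toy columns: "dist" is more than 80% missing and gets dropped.
columns = {
    "amount": [100.0, None, 250.0, 40.0, None, 75.0],
    "dist": [None, None, None, None, None, 3.0],
}
cleaned = drop_sparse_columns(columns)
cleaned["amount"] = impute_median(cleaned["amount"])

labels = [1] * 794 + [0] * 29206
weight = scale_pos_weight(labels)
print(sorted(cleaned), cleaned["amount"], round(weight, 2))
```

Computing the weight from the training labels rather than hard-coding it keeps it in step with whatever subsample or split is used.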
| Category | Features |
|---|---|
| Temporal & Velocity | Transactions per hour/day, time since last transaction, weekend flags |
| Behavioral | Average spend, spending deviation, amount percentiles |
| Graph-Derived | Accounts per device, devices per card, shared-IP count, fraud-neighbor count |
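
To make the graph-derived row concrete, the sketch below computes a few of these counts with plain dictionaries instead of NetworkX, so that it stays self-contained. The field names follow the sample API payload; the function name, the output keys, and the definition of a "neighbor" (any other account sharing a device or IP) are assumptions for illustration.

```python
from collections import defaultdict


def graph_features(transactions, known_fraud_accounts):
    """Per-transaction counts derived from shared devices and IPs.

    transactions: list of dicts with customer_id, device_id, ip_address
    known_fraud_accounts: set of customer ids already labeled as fraud
    """
    accounts_by_device = defaultdict(set)
    accounts_by_ip = defaultdict(set)
    for t in transactions:
        accounts_by_device[t["device_id"]].add(t["customer_id"])
        accounts_by_ip[t["ip_address"]].add(t["customer_id"])

    features = []
    for t in transactions:
        device_peers = accounts_by_device[t["device_id"]]
        ip_peers = accounts_by_ip[t["ip_address"]]
        # Neighbors: other accounts reachable through a shared device or IP.
        neighbors = (device_peers | ip_peers) - {t["customer_id"]}
        features.append({
            "accounts_per_device": len(device_peers),
            "shared_ip_count": len(ip_peers) - 1,
            "fraud_neighbor_count": len(neighbors & known_fraud_accounts),
        })
    return features


txns = [
    {"customer_id": "C-1", "device_id": "D-1", "ip_address": "10.0.0.1"},
    {"customer_id": "C-2", "device_id": "D-1", "ip_address": "10.0.0.2"},
    {"customer_id": "C-3", "device_id": "D-2", "ip_address": "10.0.0.2"},
]
feats = graph_features(txns, known_fraud_accounts={"C-2"})
for row in feats:
    print(row)
```

In a real training pipeline the fraud labels fed into `fraud_neighbor_count` would need to be restricted to those known before each transaction's timestamp, otherwise the feature leaks the target.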
Every flagged transaction is accompanied by human-readable explanations:
Risk Score: 87/100
- Multiple accounts linked to current device (Graph Alert)
- Unusual transaction amount vs. historical baseline
- High Customer Relationship Value (Mitigating Factor)
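
One way to produce output in this shape is to rank signed per-feature contributions (for example SHAP values) and map the strongest ones to message templates. The sketch below assumes the contributions are already computed; the feature names, the template text, and the `explain` helper are hypothetical and do not describe explainability.py.

```python
def explain(risk_score, contributions, templates, top_k=3):
    """Render the strongest feature contributions as readable reasons.

    contributions: feature name -> signed contribution (e.g. a SHAP value);
    positive values push toward fraud, negative values mitigate.
    """
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"Risk Score: {round(risk_score * 100)}/100"]
    for name, value in ranked[:top_k]:
        text = templates.get(name, name)  # fall back to the raw feature name
        suffix = " (Mitigating Factor)" if value < 0 else ""
        lines.append(f"- {text}{suffix}")
    return "\n".join(lines)


# Hypothetical contributions and templates for a single transaction.
contributions = {
    "accounts_per_device": 0.42,
    "amount_deviation": 0.31,
    "crv": -0.12,
    "hour_of_day": 0.02,
}
templates = {
    "accounts_per_device": "Multiple accounts linked to current device (Graph Alert)",
    "amount_deviation": "Unusual transaction amount vs. historical baseline",
    "crv": "High Customer Relationship Value",
}
report = explain(0.87, contributions, templates)
print(report)
```

Ranking by absolute value lets mitigating factors appear alongside risk factors, so a reviewer sees why a score was lowered as well as why it was raised.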
| Layer | Technology |
|---|---|
| ML Framework | LightGBM, XGBoost, Scikit-learn |
| Graph Analysis | NetworkX |
| Explainability | SHAP |
| API | FastAPI + Uvicorn |
| Visualization | Matplotlib, Seaborn |
- Online Learning — Continuous model updates to adapt to concept drift
- Streaming Inference — Apache Kafka integration for ultra-low latency event processing
- Federated Learning — Privacy-preserving collaborative fraud detection across merchant networks
This project is for educational and portfolio purposes. The IEEE-CIS dataset is subject to its own Kaggle competition rules.


