LT-Ripjaws/spam-detection-with-stacking-classifier-machine-learning

Spam Detection with Stacking Classifier

Intro

A machine learning web application that detects spam messages with an ensemble stacking classifier. It pairs a FastAPI backend with a modern, responsive web frontend.

Features

  • Advanced ML Model: Stacking classifier combining MultinomialNB, LinearSVC, SGDClassifier, and RandomForestClassifier
  • Tuned Hyperparameters: RandomizedSearchCV search over the vectorizer, base models, and meta-learner
  • RESTful API: FastAPI backend that serves predictions over HTTP
  • Modern UI: Professional, responsive web interface
  • Real-time Analysis: Instant spam detection with visual feedback
  • High Recall: Tuned with recall as the scoring metric to minimize false negatives (spam classified as not spam)

Model Performance

  • Accuracy: ~98%
  • Precision: ~88%
  • Recall: ~93%
  • F1 Score: ~90%

Note: Metrics are based on evaluation on the test set.
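As a quick consistency check, F1 is the harmonic mean of precision and recall, so the approximate precision and recall figures above should roughly reproduce the reported F1. A minimal sketch using those rounded values (the function name `f1_from` is illustrative and not part of the project):

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate values taken from the Model Performance list.
precision, recall = 0.88, 0.93
f1 = f1_from(precision, recall)
print(f"F1 = {f1:.3f}")  # about 0.90, in line with the reported F1 score
```

The much higher accuracy figure is not a contradiction: spam datasets are typically imbalanced, so accuracy is dominated by the majority (not spam) class while precision and recall describe the spam class only.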

Screenshots

Predicting SPAM
[Screenshot: spam prediction]
Predicting HAM
[Screenshot: ham prediction]

Project Structure

spam-detection-with-stacking-classifier-machine-learning/
│
├── data/
│   └── spam_dataset.csv          # Original dataset
│
├── spam_classifier.pkl           # Trained model
├── notebooks/
│   └── spam_detection.ipynb
│
├── main.py                       # FastAPI application
├── index.html                    # Frontend HTML/CSS/JS
│
├── .gitignore                    # Git ignore rules
├── README.md                     # This file
└── LICENSE                       # MIT License

Installation

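  1. Clone the repository

    git clone <repository-url>
    cd spam-detection-with-stacking-classifier-machine-learning
  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install the dependencies listed under Requirements

    pip install fastapi==0.104.1 uvicorn==0.24.0 scikit-learn==1.3.2 pandas==2.1.3 numpy==1.26.2 pydantic==2.5.2 nltk
  4. Run the application

    uvicorn main:app --reload
  5. Access the web interface

    Open your browser and navigate to http://localhost:8000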

Requirements

fastapi==0.104.1
uvicorn==0.24.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.26.2
pydantic==2.5.2
nltk

Model Architecture

Base Models

  • MultinomialNB: Naive Bayes with Laplace smoothing
  • LinearSVC: Support Vector Classification with balanced class weights
  • SGDClassifier: Stochastic Gradient Descent with modified Huber loss
  • RandomForestClassifier: Ensemble of decision trees with 200 estimators

Meta-Learner

  • LogisticRegression: Combines predictions from base models

Feature Engineering

  • TfidfVectorizer: N-gram based text vectorization
    • Max features: 8000
    • N-gram range: (1,1) to (1,3)
    • Sublinear TF scaling
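Read together, the three lists above describe a TF-IDF vectorizer feeding a stacking ensemble. The following is a minimal sketch of how such a pipeline could be assembled in scikit-learn, not the project's actual training code. The step names (`tfidf`, `stack`, `nb`, ...), the `build_pipeline` helper, the `random_state` values, `cv=5` inside the stack, and the choice of `ngram_range=(1, 3)` from the listed range are all assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline() -> Pipeline:
    """Assemble TF-IDF features followed by a stacking classifier."""
    base_models = [
        ("nb", MultinomialNB(alpha=1.0)),  # alpha=1.0 corresponds to Laplace smoothing
        ("svc", LinearSVC(class_weight="balanced")),
        ("sgd", SGDClassifier(loss="modified_huber", random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ]
    stack = StackingClassifier(
        estimators=base_models,
        # The meta-learner is trained on out-of-fold predictions of the base models.
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # assumed; the source does not state the inner CV setting
    )
    tfidf = TfidfVectorizer(
        max_features=8000,
        ngram_range=(1, 3),  # assumed upper end of the listed range
        sublinear_tf=True,   # replaces tf with 1 + log(tf)
    )
    return Pipeline([("tfidf", tfidf), ("stack", stack)])
```

Calling `build_pipeline().fit(texts, labels)` on a list of message strings and binary labels trains the vectorizer and the ensemble in one step, and `predict` then accepts raw strings directly.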

Hyperparameter Tuning

RandomizedSearchCV was used with:

  • n_iter: 100 iterations
  • cv: 5-fold cross-validation
  • scoring: Recall (to minimize false negatives)
  • n_jobs: Parallel processing enabled

Tuned Parameters

Component            Parameter       Range
TF-IDF               max_features    3000-11000
TF-IDF               ngram_range     (1,1) to (1,3)
MultinomialNB        alpha           0.01-1.0
LinearSVC            C               0.1-2
RandomForest         n_estimators    50-300
LogisticRegression   C               0.01-5
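The search settings and ranges above could be expressed with `RandomizedSearchCV` roughly as follows. This is a sketch under stated assumptions rather than the notebook's code: the source gives only ranges, so the uniform sampling distributions, the reading of the n-gram range as the three candidates (1,1), (1,2), (1,3), the step names, the `build_search` helper, `n_jobs=-1`, and `random_state=42` are all illustrative choices.

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_search(n_iter: int = 100, cv: int = 5) -> RandomizedSearchCV:
    """Build an unfitted randomized search over the stacking pipeline."""
    # Step names are hypothetical; they only need to match the
    # prefixes used in param_distributions below.
    stack = StackingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("svc", LinearSVC(class_weight="balanced")),
            ("sgd", SGDClassifier(loss="modified_huber")),
            ("rf", RandomForestClassifier()),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    )
    pipeline = Pipeline([("tfidf", TfidfVectorizer(sublinear_tf=True)), ("stack", stack)])

    # Ranges come from the Tuned Parameters table; the distributions are assumed.
    param_distributions = {
        "tfidf__max_features": randint(3000, 11001),       # integers 3000..11000
        "tfidf__ngram_range": [(1, 1), (1, 2), (1, 3)],
        "stack__nb__alpha": uniform(0.01, 0.99),           # (loc, scale): 0.01 to 1.0
        "stack__svc__C": uniform(0.1, 1.9),                # 0.1 to 2.0
        "stack__rf__n_estimators": randint(50, 301),       # integers 50..300
        "stack__final_estimator__C": uniform(0.01, 4.99),  # 0.01 to 5.0
    }
    return RandomizedSearchCV(
        pipeline,
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=cv,
        scoring="recall",  # assumes spam is the positive class, encoded as 1
        n_jobs=-1,
        random_state=42,
    )
```

After `search = build_search()` and `search.fit(texts, labels)`, the selected configuration is available as `search.best_params_` and the refitted pipeline as `search.best_estimator_`. With 100 iterations, 5 outer folds, and an inner cross-validation inside the stacking classifier, the full search is expensive, so a smaller `n_iter` is a sensible first run.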

🌐 API Endpoints

POST /predict

Predicts whether a message is spam or not.

Request:

{
  "text": "Your message here"
}

Response:

{
  "prediction": "spam" | "not spam"
}
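A client call to this endpoint might look like the sketch below, which uses only the Python standard library. The helper names (`build_predict_request`, `parse_prediction`, `predict`) are hypothetical, and the default base URL assumes the server is running locally as in the installation steps.

```python
import json
from urllib import request


def build_predict_request(text: str, base_url: str = "http://localhost:8000") -> request.Request:
    """Build a POST /predict request carrying the message as JSON."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body: bytes) -> str:
    """Extract the label from a response body and check it is a documented value."""
    label = json.loads(body)["prediction"]
    if label not in ("spam", "not spam"):
        raise ValueError(f"unexpected prediction: {label!r}")
    return label


def predict(text: str, base_url: str = "http://localhost:8000") -> str:
    """Send the message to a running server and return its prediction."""
    req = build_predict_request(text, base_url)
    with request.urlopen(req, timeout=5) as resp:
        return parse_prediction(resp.read())
```

With the server started via uvicorn, `predict("Congratulations, you won a free prize!")` returns either "spam" or "not spam", matching the response format shown above.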

GET /

Returns the web interface (HTML page).

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.

👤 Author

Chinmoy Guha
