A machine learning web application that detects spam messages using an ensemble stacking classifier. Built with a FastAPI backend and a modern, responsive frontend.
- Advanced ML Model: Stacking classifier combining MultinomialNB, LinearSVC, SGDClassifier, and RandomForestClassifier
- Optimized Performance: RandomizedSearchCV hyperparameter tuning for best results
- RESTful API: FastAPI backend for fast, reliable predictions
- Modern UI: Professional, responsive web interface
- Real-time Analysis: Instant spam detection with visual feedback
- High Accuracy: Optimized for recall to minimize false negatives (spam messages classified as not spam)
- Accuracy: ~98%
- Precision: ~88%
- Recall: ~93%
- F1 Score: ~90%
Note: Metrics are based on evaluation on the test set.
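To make the relationship between these four metrics concrete, here is a minimal sketch that derives them from confusion-matrix counts. The counts are hypothetical, chosen only so the results land near the figures above; they are not the project's actual confusion matrix.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 for the positive (spam) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the messages flagged as spam, how many really were spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real spam messages, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (spam is the positive class).
metrics = classification_metrics(tp=93, fp=13, fn=7, tn=887)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A false negative here is a spam message that reaches the inbox, which is why the model is tuned for recall even though that costs some precision.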
Screenshots: Predicting SPAM, Predicting HAM.
spam-detection-with-stacking-classifier-machine-learning/
│
├── data/
│   └── spam_dataset.csv        # Original dataset
│
├── spam_classifier.pkl         # Trained model
├── notebooks/
│   └── spam_detection.ipynb
│
├── main.py                     # FastAPI application
├── index.html                  # Frontend HTML/CSS/JS
│
├── .gitignore                  # Git ignore rules
├── README.md                   # This file
└── LICENSE                     # MIT License
- Clone the repository: `git clone` this repo, then `cd spam-detection-with-stacking-classifier-machine-learning`
- Create a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Run the application: `uvicorn main:app --reload`
- Access the web interface: open your browser and navigate to http://localhost:8000
fastapi==0.104.1
uvicorn==0.24.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.26.2
pydantic==2.5.2
nltk
- MultinomialNB: Naive Bayes with Laplace smoothing
- LinearSVC: Support Vector Classification with balanced class weights
- SGDClassifier: Stochastic Gradient Descent with modified Huber loss
- RandomForestClassifier: Ensemble of decision trees with 200 estimators
- LogisticRegression (meta-learner): Combines the predictions of the base models above into the final decision
- TfidfVectorizer: N-gram based text vectorization
  - Max features: 8000
  - N-gram range: (1,1) to (1,3)
  - Sublinear TF scaling
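The components listed above could be wired together roughly as follows. This is a minimal sketch, not the project's training code: the step names (`tfidf`, `stack`, `nb`, ...), the `random_state` values, the `ngram_range=(1, 2)` choice (the README only gives a range), the `cv=2` setting, and the toy messages are all assumptions made to keep the example small and runnable.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline() -> Pipeline:
    """TF-IDF features feeding a stacking ensemble with a logistic meta-learner."""
    base_models = [
        ("nb", MultinomialNB(alpha=1.0)),  # alpha=1.0 is Laplace smoothing
        ("svc", LinearSVC(class_weight="balanced", random_state=42)),
        ("sgd", SGDClassifier(loss="modified_huber", random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ]
    stack = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        cv=2,  # assumption: small value so the toy dataset below is enough
    )
    tfidf = TfidfVectorizer(max_features=8000, ngram_range=(1, 2), sublinear_tf=True)
    return Pipeline([("tfidf", tfidf), ("stack", stack)])


# Toy data for illustration only (1 = spam, 0 = not spam).
texts = [
    "Win a free prize now", "Claim your cash reward today",
    "Free entry to win tickets", "Urgent: you won a free phone",
    "Cheap loans, apply now", "Congratulations, claim your prize",
    "Exclusive offer, call now", "You have won free cash",
    "Are we still meeting for lunch", "See you at the office tomorrow",
    "Can you send me the report", "Thanks for the help yesterday",
    "I will call you after work", "Dinner at my place tonight",
    "Please review the attached notes", "Happy birthday, have a great day",
]
labels = [1] * 8 + [0] * 8

model = build_pipeline().fit(texts, labels)
predictions = model.predict(["Claim your free prize now", "See you at lunch tomorrow"])
print(predictions)
```

`StackingClassifier` trains the base models on cross-validated folds and fits the meta-learner on their out-of-fold outputs, which is what keeps the final LogisticRegression from simply memorizing the base models' training predictions.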
RandomizedSearchCV was used with:
- n_iter: 100 iterations
- cv: 5-fold cross-validation
- scoring: Recall (to minimize false negatives)
- n_jobs: Parallel processing enabled
| Component | Parameter | Range |
|---|---|---|
| TF-IDF | max_features | 3000-11000 |
| TF-IDF | ngram_range | (1,1) to (1,3) |
| MultinomialNB | alpha | 0.01-1.0 |
| LinearSVC | C | 0.1-2 |
| RandomForest | n_estimators | 50-300 |
| LogisticRegression | C | 0.01-5 |
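A search over the ranges in the table might be set up along these lines. Treat it as a sketch under stated assumptions: the step names, the use of `scipy.stats` distributions (the README does not say whether the ranges were sampled continuously or from lists), the toy data, and the reduced `n_iter=3`, `cv=2`, and `n_jobs=1` are all choices made so the example finishes quickly; the project itself reports `n_iter=100`, 5-fold CV, and parallel processing.

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

stack = StackingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("svc", LinearSVC(class_weight="balanced", random_state=42)),
        ("sgd", SGDClassifier(loss="modified_huber", random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=2,  # assumption: kept small for the toy dataset
)
pipeline = Pipeline([("tfidf", TfidfVectorizer(sublinear_tf=True)), ("stack", stack)])

# scipy's uniform(loc, scale) samples from [loc, loc + scale];
# randint(low, high) excludes the upper bound.
param_distributions = {
    "tfidf__max_features": randint(3000, 11001),
    "tfidf__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "stack__nb__alpha": uniform(0.01, 0.99),
    "stack__svc__C": uniform(0.1, 1.9),
    "stack__rf__n_estimators": randint(50, 301),
    "stack__final_estimator__C": uniform(0.01, 4.99),
}

# Toy data for illustration only (1 = spam, 0 = not spam).
texts = [
    "Win a free prize now", "Claim your cash reward today",
    "Free entry to win tickets", "Urgent: you won a free phone",
    "Cheap loans, apply now", "Congratulations, claim your prize",
    "Exclusive offer, call now", "You have won free cash",
    "Are we still meeting for lunch", "See you at the office tomorrow",
    "Can you send me the report", "Thanks for the help yesterday",
    "I will call you after work", "Dinner at my place tonight",
    "Please review the attached notes", "Happy birthday, have a great day",
]
labels = [1] * 8 + [0] * 8

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=3,          # the README reports 100; reduced here for speed
    cv=2,              # the README reports 5-fold; reduced for the toy data
    scoring="recall",  # recall of the positive (spam) class
    n_jobs=1,          # set to -1 to use all cores
    random_state=42,
)
search.fit(texts, labels)
print(search.best_params_)
```

Nested parameters are addressed with double underscores, so `stack__rf__n_estimators` reaches the `rf` estimator inside the `stack` step of the pipeline.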
Predicts whether a message is spam or not.
Request:
{
  "text": "Your message here"
}
Response:
{
  "prediction": "spam" | "not spam"
}

Returns the web interface (HTML page).
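For calling the prediction endpoint from a script, a small standard-library client could look like the sketch below. The route `/predict` and the `POST` method are assumptions, since this README does not name them; check `main.py` for the actual path. The helper names are hypothetical.

```python
import json
from urllib import request

# Assumed URL: the host and port come from the run instructions,
# but the "/predict" path is a guess and may differ in main.py.
API_URL = "http://localhost:8000/predict"


def build_request(text: str, url: str = API_URL) -> request.Request:
    """Build a JSON POST request carrying the message to classify."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body: bytes) -> str:
    """Extract the label from the response body and validate it."""
    label = json.loads(body)["prediction"]
    if label not in ("spam", "not spam"):
        raise ValueError(f"Unexpected prediction label: {label!r}")
    return label


def classify(text: str) -> str:
    """Send the message to a running server and return its label."""
    with request.urlopen(build_request(text), timeout=10) as response:
        return parse_prediction(response.read())
```

With the server running via `uvicorn main:app --reload`, `classify("Win a free prize now")` would return either `"spam"` or `"not spam"`.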
This project is licensed under the MIT License.
Chinmoy Guha
- GitHub: @LT-Ripjaws
- Email: chinmoyguha676z@gmail.com




