Skip to content

AI-powered document classifier with FastAPI + Streamlit | Supports text & PDF input

Notifications You must be signed in to change notification settings

reinelt88/document-classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🧠 Document Classifier (AI-powered)

This project is a simple yet complete demonstration of how to classify documents into categories such as:

  • 📝 contract
  • 📄 invoice
  • 🏥 medical_record

It uses TF-IDF + Logistic Regression under the hood and includes:

  • A FastAPI-based backend
  • A Streamlit-based frontend
  • Auto-training logic if no model is present
  • PDF and raw text classification support

🚀 Features

  • ✅ Train from scratch (only if no model exists)
  • ✅ Classify PDF or plain text files
  • ✅ Expose REST API for integration
  • ✅ Streamlit UI for easy testing
  • ✅ Clean and modular project structure

🖼️ Screenshots

Classifying a PDF Document

Chat Response

Classifying plain text

Chat Response

🗂️ Project Structure

document-classifier/
├── app/                  # FastAPI app
│   ├── main.py
│   ├── model.py
│   └── schemas.py
├── classifier/           # Model training and preprocessing
│   ├── train_model.py
│   ├── preprocess.py
│   └── sample_dataset.csv
├── model/                # Trained model file
│   └── document_classifier_model.joblib
├── streamlit_app/        # Streamlit interface
│   └── app.py
├── requirements.txt
└── README.md

🧪 How to Run

  1. Install dependencies
pip install -r requirements.txt
  1. Run the FastAPI backend
uvicorn app.main:app --reload
  1. Run the Streamlit app (in another terminal)
streamlit run streamlit_app/app.py

🧠 Tech Stack

  • Python 3
  • Scikit-learn
  • FastAPI
  • Streamlit
  • PyMuPDF (for reading PDFs)

📬 API Endpoints

POST /predict

Send raw text to get classification result. Here few examples:

Contract:

This contract is made between the Company and the Contractor, and outlines the scope of services.

Invoice:

Invoice #45678 - Amount Due: $1,200.00 - Due Date: July 15, 2025

Medical Record:

The patient presents with chronic lower back pain and has a history of herniated disc.

{
  "text": "This contract is made between the Company and the Contractor, and outlines the scope of services."
}

POST /predict-pdf

Upload a PDF file and get a predicted label.

Follow on GitHub

About

AI-powered document classifier with FastAPI + Streamlit | Supports text & PDF input

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages