🧠 Document Classifier (AI-powered)

This project is a simple yet complete demonstration of how to classify documents into categories such as:

📝 contract
📄 invoice
🏥 medical_record

It uses TF-IDF + Logistic Regression under the hood and includes:

A FastAPI-based backend
A Streamlit-based frontend
Auto-training logic if no model is present
PDF and raw text classification support
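
The TF-IDF + Logistic Regression combination and the train-only-if-missing behavior listed above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual `train_model.py`: the helper name `load_or_train`, the tiny inline training set, and the `max_iter` setting are all assumptions made for the example.

```python
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def load_or_train(model_path, texts, labels):
    """Return a saved model if one exists; otherwise train, save, and return one.

    Hypothetical helper for illustration, not taken from the repository.
    """
    path = Path(model_path)
    if path.exists():
        # A model file is already present, so skip training entirely.
        return joblib.load(path)

    # TF-IDF turns each text into a weighted term vector;
    # logistic regression then learns weights over those terms for each label.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)

    # Persist the fitted pipeline so the next call can load it instead of retraining.
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)
    return model


# Toy training data (assumed); the real project ships classifier/sample_dataset.csv.
TEXTS = [
    "This contract is made between the Company and the Contractor.",
    "Invoice #45678 - Amount Due: $1,200.00",
    "The patient presents with chronic lower back pain.",
]
LABELS = ["contract", "invoice", "medical_record"]
```

Calling `load_or_train("model/document_classifier_model.joblib", TEXTS, LABELS)` would train on the first call and load the saved file on later calls; `model.predict([...])` then returns one label per input text.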

🚀 Features

✅ Train from scratch (only if no model exists)
✅ Classify PDF or plain text files
✅ Expose REST API for integration
✅ Streamlit UI for easy testing
✅ Clean and modular project structure

🖼️ Screenshots

Classifying a PDF Document

Classifying plain text

🗂️ Project Structure

document-classifier/
├── app/                  # FastAPI app
│   ├── main.py
│   ├── model.py
│   └── schemas.py
├── classifier/           # Model training and preprocessing
│   ├── train_model.py
│   ├── preprocess.py
│   └── sample_dataset.csv
├── model/                # Trained model file
│   └── document_classifier_model.joblib
├── streamlit_app/        # Streamlit interface
│   └── app.py
├── requirements.txt
└── README.md

🧪 How to Run

Install dependencies

pip install -r requirements.txt

Run the FastAPI backend

uvicorn app.main:app --reload

Run the Streamlit app (in another terminal)

streamlit run streamlit_app/app.py

🧠 Tech Stack

Python 3
Scikit-learn
FastAPI
Streamlit
PyMuPDF (for reading PDFs)

📬 API Endpoints

`POST /predict`

Send raw text to get a classification result. Here are a few examples of input text, followed by a sample request body:

Contract:

This contract is made between the Company and the Contractor, and outlines the scope of services.

Invoice:

Invoice #45678 - Amount Due: $1,200.00 - Due Date: July 15, 2025

Medical Record:

The patient presents with chronic lower back pain and has a history of herniated disc.

{
  "text": "This contract is made between the Company and the Contractor, and outlines the scope of services."
}
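
The request body above can be sent from Python using only the standard library. The sketch below assumes the backend is running at uvicorn's default address (`http://127.0.0.1:8000`); the function names are hypothetical, and because the response schema is not documented here, the JSON reply is returned unchanged.

```python
import json
import urllib.request

# Assumed address: uvicorn's default host and port when started as in "How to Run".
BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(text, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request carrying {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_text(text, base_url=BASE_URL):
    """Send the request and return the decoded JSON response.

    The response schema is not documented here, so it is returned as-is.
    """
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `classify_text("Invoice #45678 - Amount Due: $1,200.00")` would return whatever JSON the endpoint produces for that text.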

`POST /predict-pdf`

Upload a PDF file and get a predicted label.
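
A PDF upload is normally sent as `multipart/form-data`. The sketch below builds such a request with the standard library. The form field name `file` is an assumption (it must match the upload parameter name the endpoint declares), as are the function name and the base URL, which is uvicorn's default address.

```python
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:8000"  # assumed: uvicorn's default address


def build_pdf_request(pdf_bytes, filename, field_name="file", base_url=BASE_URL):
    """Build (but do not send) a multipart/form-data POST /predict-pdf request.

    field_name="file" is an assumption; it must match the parameter name
    the endpoint declares for the uploaded file.
    """
    # A random boundary string separates the parts of the multipart body.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict-pdf",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the file with `pathlib.Path("document.pdf").read_bytes()`, pass the bytes and file name to `build_pdf_request`, and hand the result to `urllib.request.urlopen`.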
