
AnvayKharb/model

Repository files navigation

Caption Quality Classifier 🏷️

A machine learning project to differentiate between good and bad captions using Natural Language Processing.

Overview

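This project uses TF-IDF vectorization and Logistic Regression to classify captions as either "good" or "bad": each caption is converted into a TF-IDF feature vector, and a logistic regression model trained on labeled examples predicts its label.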
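To make the vectorization step concrete, here is a minimal pure-Python sketch of TF-IDF. It is not the project's code: the function names (`tokenize`, `fit_idf`, `tfidf_vector`) and the example captions are hypothetical, and it assumes a plain whitespace tokenizer. It uses smoothed IDF and L2 normalisation, which match the defaults of scikit-learn's `TfidfVectorizer` (whether the project uses scikit-learn is an assumption, since `requirements.txt` is not shown here).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (much simpler than a real tokenizer)."""
    return text.lower().split()


def fit_idf(documents):
    """Smoothed IDF per term: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Map text to an L2-normalised {term: weight} dict; terms unseen in fit_idf are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Illustrative captions (not the project's dataset)
captions = ["A beautiful sunset over the ocean", "pic", "A dog running on the beach"]
idf = fit_idf(captions)
vec = tfidf_vector("sunset over the beach", idf)
```

Because "the" appears in two of the three captions, it receives a lower weight in `vec` than the rarer word "sunset"; this down-weighting of common words is the point of the IDF factor.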

Features

  • ✅ Load caption data from JSON files
  • ✅ TF-IDF text vectorization
  • ✅ Logistic Regression classification
  • ✅ Model saving and loading
  • ✅ Prediction with confidence scores
  • ✅ Data augmentation utilities
  • ✅ Interactive prediction mode
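The confidence score listed above can be read as a probability produced by the logistic function. The sketch below shows one common way to derive it, assuming the classifier reports the probability of whichever label it returns; the function name, the sparse `{term: value}` feature format, and the weights are hypothetical and only illustrate the idea.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


def predict_with_confidence(weights, bias, features, threshold=0.5):
    """Return {'label', 'confidence'} for a sparse {term: value} feature dict.

    p_good = sigmoid(w . x + b); confidence is the probability of the returned label.
    """
    z = bias + sum(weights.get(term, 0.0) * value for term, value in features.items())
    p_good = sigmoid(z)
    label = "good" if p_good >= threshold else "bad"
    confidence = p_good if label == "good" else 1.0 - p_good
    return {"label": label, "confidence": confidence}


# Hypothetical learned weights, for illustration only
weights = {"beautiful": 2.0, "sunset": 1.5, "pic": -2.5}
bias = -0.5
good = predict_with_confidence(weights, bias, {"beautiful": 0.6, "sunset": 0.8})
bad = predict_with_confidence(weights, bias, {"pic": 1.0})
```

With the default threshold of 0.5, the reported confidence is always at least 0.5, since it is the probability of the label that was chosen.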

Installation

pip install -r requirements.txt

Quick Start

Training

python train.py
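What `train.py` does internally is not documented here. As a rough illustration of what fitting a logistic regression involves, the following sketch runs batch gradient descent on L2-regularised log loss over sparse feature dicts. The function name, learning rate, epoch count, and toy data are assumptions; libraries such as scikit-learn use more sophisticated solvers.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=200, l2=0.01):
    """Fit weights by batch gradient descent on L2-regularised log loss.

    X: list of sparse {term: value} dicts; y: list of 0/1 labels (1 = "good").
    """
    weights, bias = {}, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for features, label in zip(X, y):
            z = bias + sum(weights.get(t, 0.0) * v for t, v in features.items())
            error = sigmoid(z) - label  # derivative of log loss with respect to z
            grad_b += error
            for t, v in features.items():
                grad_w[t] = grad_w.get(t, 0.0) + error * v
        # Average the gradient over the batch and apply the L2 penalty to the weights
        for t in set(weights) | set(grad_w):
            w = weights.get(t, 0.0)
            weights[t] = w - lr * (grad_w.get(t, 0.0) / n + l2 * w)
        bias -= lr * grad_b / n  # the bias is left unregularised
    return weights, bias


# Toy, hand-made feature dicts (not the project's data)
X = [{"beautiful": 1.0, "sunset": 1.0}, {"dog": 1.0, "beach": 1.0}, {"pic": 1.0}, {"lol": 1.0}]
y = [1, 1, 0, 0]
weights, bias = train_logistic_regression(X, y)
```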

Interactive Prediction

python predict.py
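The prompt and exit behaviour of `predict.py` are not described here. A minimal interactive loop could look like the sketch below; the `interactive_loop` name, the `Caption> ` prompt, and the `quit`/`exit` keywords are assumptions. Input and output are injectable so the loop can be exercised without a terminal.

```python
def interactive_loop(predict, read=input, write=print):
    """Prompt for captions until a blank line, 'quit'/'exit', or end of input.

    predict: any callable mapping a caption string to {'label': ..., 'confidence': ...}.
    """
    while True:
        try:
            caption = read("Caption> ").strip()
        except EOFError:  # e.g. Ctrl-D or piped input running out
            break
        if not caption or caption.lower() in {"quit", "exit"}:
            break
        result = predict(caption)
        write(f"{result['label']} (confidence {result['confidence']:.2f})")
```

Assuming `predict` returns the dictionary shape shown under Programmatic Usage, a trained classifier could be plugged in as `interactive_loop(classifier.predict)`.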

Programmatic Usage

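from caption_classifier import CaptionClassifier

classifier = CaptionClassifier()
# Load the labeled captions and convert them to TF-IDF features
captions, labels = classifier.load_data('caption_data.json')
X = classifier.preprocess(captions)
# Fit the logistic regression model on the vectorized captions
classifier.train(X, labels)

# Classify a new caption; the result holds a label and a confidence score
result = classifier.predict("A beautiful sunset over the ocean")
print(result)  # e.g. {'label': 'good', 'confidence': 0.95}; the exact score depends on the trained model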
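The README lists model saving and loading as a feature but does not show the method names. One straightforward way to persist a trained model between runs is Python's `pickle` module; the sketch below assumes that approach (the project may use a different format or library), and `save_model`, `load_model`, and the file name are hypothetical.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialise any picklable model object (e.g. a fitted classifier) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model written by save_model. Only unpickle files from trusted sources."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip a stand-in model object through a temporary file
model = {"weights": {"sunset": 1.5, "pic": -2.5}, "bias": -0.5}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "caption_model.pkl")
    save_model(model, path)
    restored = load_model(path)
```

Unpickling can execute arbitrary code, so a saved model file should be treated like executable code and loaded only from trusted locations.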

Project Structure

├── caption_classifier.py  # Main classifier class
├── caption_data.json      # Sample training data
├── config.py              # Configuration settings
├── utils.py               # Utility functions
├── augmentation.py        # Data augmentation
├── train.py               # Training script
├── predict.py             # Prediction API
├── test_classifier.py     # Unit tests
└── requirements.txt       # Dependencies

Dataset

The sample dataset, caption_data.json, contains 10 labeled captions (5 good, 5 bad).
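The schema of `caption_data.json` is not documented here. The sketch below assumes a JSON list of objects with `caption` and `label` keys and shows how such a file could be parsed into the `(captions, labels)` pair that `load_data` returns, with a check that every label is "good" or "bad". The schema, the `parse_caption_data` name, and the sample records are hypothetical.

```python
import json
from collections import Counter

# Hypothetical records in the assumed schema (not copied from caption_data.json)
SAMPLE = """[
  {"caption": "A beautiful sunset over the ocean", "label": "good"},
  {"caption": "pic", "label": "bad"}
]"""


def parse_caption_data(text):
    """Return (captions, labels) from a JSON list of {"caption", "label"} records."""
    records = json.loads(text)
    captions = [r["caption"] for r in records]
    labels = [r["label"] for r in records]
    unknown = set(labels) - {"good", "bad"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return captions, labels


captions, labels = parse_caption_data(SAMPLE)
label_counts = Counter(labels)  # useful for confirming the good/bad balance
```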

Running Tests

python -m unittest test_classifier.py

License

MIT License
