Validates that a user's text description actually matches what they uploaded.
100% local — no ChatGPT, no external AI APIs. All models run on your machine.
┌─────────────────────────────────────────┐
│ Score ≥ 75 % → ✅ Approved │
│ 50 – 74 % → ⚠️ Warning (user can │
│ improve or submit) │
│ Score < 50 % → ❌ Rejected │
└─────────────────────────────────────────┘
| Layer | Technology | Purpose |
|---|---|---|
| Backend | Python · FastAPI · Uvicorn | REST API, file handling |
| CV/OCR | Tesseract · PyMuPDF · OpenCV | Text extraction from files |
| Embeddings | CLIP (ViT-B/32, local weights) | Visual ↔ text similarity |
| NLP | Sentence-BERT (all-MiniLM-L6-v2) | Semantic text ↔ OCR similarity |
| Frontend | React 18 · Vite · JSX | Drag-and-drop upload UI |
| Container | Docker · Docker Compose | One-command deployment |
upload-validator/
├── backend/
│ ├── main.py # FastAPI app — all CV/NLP logic
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── UploadValidator.jsx # Main component
│ │ └── main.jsx
│ ├── index.html
│ ├── package.json
│ └── vite.config.js
├── tests/
│ ├── test_validator.py # Pytest suite (unit + integration)
│ └── generate_samples.py # Creates sample_data/ test files
├── sample_data/ # Generated by generate_samples.py
└── docker/
├── Dockerfile.backend
├── Dockerfile.frontend
└── docker-compose.yml
- Docker Desktop ≥ 4.x
# 1. Clone / unzip the project
cd upload-validator
# 2. Build and start both services
docker compose -f docker/docker-compose.yml up --build
# Frontend → http://localhost:3000
# Backend → http://localhost:8000

First run downloads ~600 MB of model weights (CLIP + SBERT).
They are cached in the container layer; subsequent starts are instant.
| Tool | Version | Install |
|---|---|---|
| Python | 3.10 + | python.org |
| Tesseract OCR | 5.x | see below |
| Node.js | 18 + | nodejs.org |
Install Tesseract:
# macOS
brew install tesseract
# Ubuntu / Debian
sudo apt-get install tesseract-ocr tesseract-ocr-eng
# Windows — download installer from:
# https://github.com/UB-Mannheim/tesseract/wiki
# then add install dir to PATH

cd upload-validator/backend
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
# → http://localhost:8000
# → http://localhost:8000/docs (Swagger UI)

cd upload-validator/frontend
npm install
npm run dev
# → http://localhost:3000

cd upload-validator
# Install test deps (if not already done)
pip install pytest httpx PyMuPDF
# Run the full suite
pytest tests/test_validator.py -v

| Class | What it tests |
|---|---|
| TestCombineScores | Score weighting math |
| TestThresholdLogic | approved / warning / rejected at exact boundaries |
| TestHealthEndpoint | /health returns 200 |
| TestValidateEndpoint | Empty description → 400, bad MIME → 415, PNG/PDF/MP4 routing |
| TestScoreAccuracy | High similarity → high score, low → low (mocked models) |
All tests mock CLIP and SBERT so they run without GPU or internet access.
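As an illustration of that mocking approach, the sketch below patches a similarity function with `unittest.mock` so that no model is ever loaded, then checks the decision at the documented boundaries. The `validator` namespace, `similarity`, and `decide` are hypothetical stand-ins chosen for this example; they are not the names used in tests/test_validator.py.

```python
from types import SimpleNamespace
from unittest.mock import patch


def _load_and_score(path: str, description: str) -> float:
    """Stand-in for the real model call; tests must never reach it."""
    raise RuntimeError("model weights are not available in tests")


# Hypothetical module-like namespace holding the function to be mocked.
validator = SimpleNamespace(similarity=_load_and_score)


def decide(path: str, description: str) -> str:
    """Map a similarity in [0, 1] onto the documented thresholds."""
    score = validator.similarity(path, description) * 100
    if score >= 75:
        return "approved"
    if score >= 50:
        return "warning"
    return "rejected"


def test_threshold_boundaries() -> None:
    cases = [(0.75, "approved"), (0.50, "warning"), (0.49, "rejected")]
    for sim, expected in cases:
        # Replace the model call with a fixed value for the duration of the block.
        with patch.object(validator, "similarity", return_value=sim):
            assert decide("red_square.png", "a red square") == expected


test_threshold_boundaries()
```

Because the patched function returns a fixed number, the test exercises only the threshold logic and runs in milliseconds.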
# Generate sample files
python tests/generate_samples.py
# Creates: sample_data/red_square.png, invoice.pdf, sample_video.mp4, etc.

Open http://localhost:3000 and try these combinations:

| File | Good description (expect ≥75%) | Bad description (expect <50%) |
|---|---|---|
| red_square.png | "a red coloured square image" | "quarterly financial report" |
| invoice.pdf | "an invoice with payment amount" | "a photo of a sunset" |
| contract.pdf | "a service agreement between two companies" | "cat video compilation" |
| blue_square.png | "a blue square illustration" | "legal contract document" |
# Good match — should be approved
curl -X POST http://localhost:8000/validate \
-F "file=@sample_data/red_square.png" \
-F "description=a red square image"
# Bad match — should be rejected
curl -X POST http://localhost:8000/validate \
-F "file=@sample_data/invoice.pdf" \
-F "description=a video about cats"

Image / PDF:
CLIP score (visual content ↔ description) × 0.6
+ SBERT score (OCR text ↔ description) × 0.4
= final %
Video (.mp4):
Keyframes extracted (OpenCV) → same pipeline as images
CLIP applied to each keyframe → max similarity used
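As a worked example of the weighting above, the sketch below blends a CLIP similarity and an SBERT similarity into a percentage, and picks the highest CLIP similarity across keyframes for a video. The function bodies are assumptions for illustration: the actual `combine_scores` in backend/main.py may normalise or clamp differently, and `best_keyframe_similarity` is a hypothetical helper name.

```python
from typing import Sequence


def combine_scores(clip_sim: float, text_sim: float,
                   clip_weight: float = 0.6, text_weight: float = 0.4) -> float:
    """Weighted blend of two similarities in [0, 1], returned as a percentage."""
    blended = clip_sim * clip_weight + text_sim * text_weight
    # Clamp so out-of-range inputs cannot leave the 0-100 range (an assumption).
    return round(max(0.0, min(1.0, blended)) * 100, 1)


def best_keyframe_similarity(frame_sims: Sequence[float]) -> float:
    """Hypothetical helper: a video scores as well as its best-matching keyframe."""
    return max(frame_sims, default=0.0)


# Image / PDF: 0.80 * 0.6 + 0.70 * 0.4 = 0.76, i.e. 76.0 %
image_score = combine_scores(0.80, 0.70)

# Video: the keyframe with the highest CLIP similarity (0.65) is used.
video_clip_sim = best_keyframe_similarity([0.20, 0.65, 0.40])
```

With these inputs the image lands just above the 75 % approval threshold, which shows how a modest change in either similarity can move a file between "warning" and "approved".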
If CLIP is unavailable (import error), the system falls back to OCR + SBERT only.
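One way to express that fallback is to treat a failed import as "no visual score" and let the text similarity carry the whole result. This is only a sketch under that assumption: the `clip` package name, `CLIP_AVAILABLE`, `score_with_fallback`, and the choice to give SBERT the full weight are all hypothetical, and main.py may handle the missing model differently.

```python
from typing import Optional

try:
    import clip  # noqa: F401  (assumed package name; the project may use another)
    CLIP_AVAILABLE = True
except ImportError:
    CLIP_AVAILABLE = False


def score_with_fallback(clip_sim: Optional[float], text_sim: float) -> float:
    """Return a percentage; clip_sim is None when CLIP could not be used."""
    if clip_sim is None:
        # OCR + SBERT only: the text similarity carries the whole score.
        return round(text_sim * 100, 1)
    # Normal path: the documented 0.6 / 0.4 weighting.
    return round((clip_sim * 0.6 + text_sim * 0.4) * 100, 1)
```

A caller would pass `None` for `clip_sim` whenever `CLIP_AVAILABLE` is false, so the rest of the pipeline does not need to know which models were loaded.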
Edit the decision block in backend/main.py:
if score >= 75:      # ← change to e.g. 80 for stricter approval
    decision = "approved"
elif score >= 50:    # ← change to e.g. 60 for stricter warning
    decision = "warning"
else:
    decision = "rejected"

Adjust model weights:
# In analyse_image / analyse_pdf / analyse_video
score = combine_scores(clip_sim, text_sim,
                       clip_weight=0.6,   # ← visual weight
                       text_weight=0.4)   # ← OCR/text weight

| Type | Extensions | Analysis method |
|---|---|---|
| Image | .jpg .jpeg .png .webp .gif .bmp | CLIP + Tesseract OCR |
| PDF | .pdf | PyMuPDF text extraction + page renders → CLIP |
| Video | .mp4 .mov .avi .mkv .webm | OpenCV keyframes → CLIP |
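
The table above can be read as a lookup from file extension to analysis path. The sketch below shows one possible form of that lookup; `EXTENSION_TYPES` and `classify_upload` are hypothetical names, and the backend may route on MIME type rather than on the extension.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-types table.
EXTENSION_TYPES = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp"},
    "pdf": {".pdf"},
    "video": {".mp4", ".mov", ".avi", ".mkv", ".webm"},
}


def classify_upload(filename: str) -> Optional[str]:
    """Return 'image', 'pdf', or 'video', or None for unsupported files."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    for kind, extensions in EXTENSION_TYPES.items():
        if suffix in extensions:
            return kind
    return None  # a caller could answer with HTTP 415 here
```

For example, `classify_upload("Invoice.PDF")` returns `"pdf"`, while `classify_upload("notes.txt")` returns `None`.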