Home
ILM College Sargodha | BSCS (24-28) | Group #2
Subject: Artificial Intelligence | Topic: Natural Language Processing
Supervised by: Sir Abdur-Rehman
Affiliated with: University of Sargodha
Live at: https://entify.orbin.dev
Entify is a comparative Named Entity Recognition (NER) system we built as our AI semester project. The core idea was to train a Conditional Random Field (CRF) model from scratch, run it next to spaCy's pre-trained neural model, and see how the two differ in practice, not just on paper.
The project ran from 26 Jan 2026 to 26 Mar 2026, an 8-week development cycle.
Two engines run in parallel on whatever text you give it:
- CRF Model: trained by us on 14,041 sentences from the CoNLL2003 dataset. A linear-chain CRF that learns weights for token features and for transitions between BIO tags, and uses them to detect entity boundaries.
- spaCy (en_core_web_sm): used as a pre-trained neural baseline. No fine-tuning applied.
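To make the CRF side more concrete, here is a minimal sketch of the kind of per-token feature dictionaries a CRF tagger is usually trained on. The feature set and the names `token_features` and `sentence_features` are illustrative assumptions, not necessarily the features Entify uses.

```python
def token_features(sentence, i):
    """Build a feature dict for the token at index i (hypothetical feature set)."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        prev = sentence[i - 1]
        features["prev.lower"] = prev.lower()
        features["prev.istitle"] = prev.istitle()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        nxt = sentence[i + 1]
        features["next.lower"] = nxt.lower()
        features["next.istitle"] = nxt.istitle()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(sentence):
    """One feature dict per token, in sentence order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Hand-written example sentence with its BIO tags (not model output).
tokens = ["Apple", "Inc.", "was", "founded", "in", "Cupertino", "."]
bio_tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O"]
X = sentence_features(tokens)
```

With sklearn-crfsuite, lists like `X` and `bio_tags` (one per training sentence) are what `CRF.fit(X_train, y_train)` expects. The transition weights between tags are learned by the model itself and do not need to be encoded as features.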
Both results are shown side by side with entity labels, positions, and inference time.
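As an illustration of how BIO tags turn into entities with positions, the sketch below merges a tag sequence into `label`/`start`/`end` spans. It assumes that positions are character offsets and uses a simple regex tokenizer. Both are assumptions for the example, and the helper names are hypothetical.

```python
import re


def tokenize_with_offsets(text):
    """Split into word and punctuation tokens, keeping character offsets."""
    matches = list(re.finditer(r"\w+|[^\w\s]", text))
    tokens = [m.group() for m in matches]
    offsets = [(m.start(), m.end()) for m in matches]
    return tokens, offsets


def bio_to_entities(tags, offsets):
    """Merge BIO tags into entity spans with label, start, and end."""
    entities = []
    current = None
    for tag, (start, end) in zip(tags, offsets):
        label = tag[2:] if tag != "O" else None
        continues = (
            tag.startswith("I-")
            and current is not None
            and current["label"] == label
        )
        if continues:
            current["end"] = end  # extend the open entity
            continue
        if current is not None:
            entities.append(current)  # close the open entity
            current = None
        if label is not None:
            current = {"label": label, "start": start, "end": end}
    if current is not None:
        entities.append(current)
    return entities


text = "Apple Inc. was founded in Cupertino."
tokens, offsets = tokenize_with_offsets(text)
# Hand-written tags for the eight tokens above (not model output).
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O"]
entities = bio_to_entities(tags, offsets)
```

A stray `I-` tag with no matching open entity is treated as the start of a new entity here, which is one common way to handle inconsistent tag sequences.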
| Entity Class | Custom CRF | spaCy Baseline |
|---|---|---|
| Person (PER) | 93.1% | 91.4% |
| Location (LOC) | 88.7% | 86.2% |
| Organization (ORG) | 85.4% | 83.8% |
| Misc (MISC) | 79.2% | 74.1% |
| Weighted Avg | 90.32% | 85.9% |
CRF inference: 0.0004s avg
spaCy inference: 0.0028s avg
Tested on CoNLL2003 testb split, verified on production server.
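Average inference times like the ones above can be measured with a small timing harness. The sketch below is one way to do it, not necessarily how these numbers were produced. `average_inference_time` is a hypothetical helper and `dummy_predict` is a stand-in for a real model call.

```python
import time


def average_inference_time(predict, texts, repeats=100):
    """Return the mean wall-clock seconds per predict(text) call."""
    predict(texts[0])  # warm-up call so one-time setup is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            predict(text)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(texts))


def dummy_predict(text):
    """Stand-in for a model: returns the title-cased tokens."""
    return [tok for tok in text.split() if tok.istitle()]


samples = [
    "Apple Inc. was founded in Cupertino.",
    "The University of Sargodha is in Pakistan.",
]
avg_seconds = average_inference_time(dummy_predict, samples, repeats=50)
print(f"avg inference: {avg_seconds:.6f}s")
```

Passing both models through the same harness with the same texts keeps the comparison fair. Model loading time should stay outside the timed loop.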
| Layer | Technology |
|---|---|
| Backend | Flask, sklearn-crfsuite, spaCy |
| Frontend | Alpine.js, Tailwind CSS, AOS |
| Hosting | AWS EC2, Apache, Gunicorn |
| CI/CD | GitHub Actions (auto-deploy on main push) |
Weeks 1–2 — Team learning phase. Hassan led architecture planning. Mudassir and Saad got up to speed with Python OOP, Git, and Tkinter.
Weeks 3–4 — Core development. CRF model trained on CoNLL2003. spaCy integration done. Initial Tkinter UI started.
Weeks 5–6 — Full desktop app completed in Tkinter. Compare, CRF, and spaCy modes all working locally.
Weeks 7–8 — Pivoted to web. Rebuilt everything in Flask, deployed on AWS EC2 with Apache and Gunicorn, wired up GitHub Actions for CI/CD.
26 Mar 2026 — Final submission.
The compare endpoint is public and CORS-enabled.
POST https://entify.orbin.dev/api/compare
Content-Type: application/json
{
"text": "Apple Inc. was founded in Cupertino.",
"mode": "compare"
}
Response returns entity arrays from both models with label, start, end, and processing time.
Error codes: 400 (empty text), 413 (over 5000 chars), 500 (inference failure).
Full reference: https://entify.orbin.dev/docs/api
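For reference, a minimal Python client for the request above might look like the sketch below, using only the standard library. The helper names are hypothetical, the client-side checks simply mirror the documented 400 and 413 cases, and treating whitespace-only text as empty is an assumption. The exact response field names are not assumed, so the parsed JSON is returned as-is.

```python
import json
import urllib.request

API_URL = "https://entify.orbin.dev/api/compare"
MAX_CHARS = 5000  # documented limit; longer text gets a 413


def build_compare_request(text, mode="compare"):
    """Validate the input and build the POST request without sending it."""
    if not text.strip():  # assumption: whitespace-only counts as empty
        raise ValueError("text must not be empty (server would answer 400)")
    if len(text) > MAX_CHARS:
        raise ValueError("text exceeds 5000 characters (server would answer 413)")
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compare(text, mode="compare", timeout=10):
    """Send the request and return the parsed JSON response."""
    request = build_compare_request(text, mode)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(compare("Apple Inc. was founded in Cupertino."))
```

Non-2xx responses such as 400, 413, or 500 surface as `urllib.error.HTTPError` from `urlopen`, so callers can catch that exception and inspect its `code` attribute.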
git clone https://github.com/softdevhassan/entify
cd entify
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python app.py
Full setup guide: https://entify.orbin.dev/docs/dev
| Name | Role |
|---|---|
| Hassan Ali | Team Lead |
| Mudassir Ali | Member |
| Saad Ilyas | Member |