Hassan Ali edited this page Mar 28, 2026 · 1 revision

Entify — Comparative NER Engine

ILM College Sargodha | BSCS (24-28) | Group #2
Subject: Artificial Intelligence | Topic: Natural Language Processing
Supervised by: Sir Abdur-Rehman
Affiliated with: University of Sargodha

Live at: https://entify.orbin.dev


What is this project?

Entify is a comparative Named Entity Recognition (NER) system we built as our AI semester project. The core idea was to train a CRF model from scratch, run it next to spaCy's pre-trained neural model on the same input, and see how the two differ in practice, not just on paper.

The project ran from 26 Jan 2026 to 26 Mar 2026, an 8-week development cycle.


How it works

Two engines run in parallel on whatever text you submit:

  • CRF model: a linear-chain Conditional Random Field that we trained on 14,041 sentences from the CoNLL-2003 dataset. It assigns a BIO tag to every token, using learned feature and label-transition weights to find entity boundaries.
  • spaCy (en_core_web_sm): the pre-trained neural baseline, used as shipped with no fine-tuning.
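
To make the BIO idea concrete, here is a minimal sketch of the two pieces that surround a CRF tagger: turning tokens into feature dicts (sklearn-crfsuite accepts one list of feature dicts per sentence) and turning predicted BIO tags back into entity spans. The feature names and the helper functions `token_features` and `bio_to_spans` are illustrative assumptions, not the features or code Entify actually uses.

```python
def token_features(tokens, i):
    """Feature dict for tokens[i]; the feature names are illustrative only."""
    word = tokens[i]
    feats = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # Neighbouring words give the model context; mark sentence edges explicitly.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def bio_to_spans(tags):
    """Turn BIO tags into (label, start_token, end_token_exclusive) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A span stays open only if this tag continues it with a matching I- tag.
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.append((label, start, i))
                start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # Lenient decoding: a stray I- tag opens a new span.
                start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Apple", "Inc.", "was", "founded", "in", "Cupertino", "."]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O"]
features = [token_features(tokens, i) for i in range(len(tokens))]
spans = bio_to_spans(tags)
```

Here `spans` holds one ORG span covering tokens 0 to 1 and one LOC span covering token 5. The tags are hand-written for the example; in a real run they would come from the trained model's prediction.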

Both results are shown side by side with entity labels, positions, and inference time.
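
The side-by-side result can be pictured as one record per engine holding its entities and its elapsed time. The sketch below is an assumption about the shape, not Entify's actual code: `timed`, `compare`, and the two stand-in taggers are hypothetical, and the engines are called one after the other for simplicity, whereas the real service may use threads or processes.

```python
import time


def timed(tagger, text):
    """Run one tagger and return its entities plus elapsed wall-clock seconds."""
    start = time.perf_counter()
    entities = tagger(text)
    return {"entities": entities, "seconds": time.perf_counter() - start}


def compare(text, taggers):
    """Run every tagger on the same text and key the results by engine name."""
    return {name: timed(tagger, text) for name, tagger in taggers.items()}


# Stand-in taggers so the sketch runs without any model installed.
def fake_crf(text):
    return [{"label": "ORG", "start": 0, "end": 10}]


def fake_spacy(text):
    return [
        {"label": "ORG", "start": 0, "end": 10},
        {"label": "GPE", "start": 26, "end": 35},
    ]


text = "Apple Inc. was founded in Cupertino."
result = compare(text, {"crf": fake_crf, "spacy": fake_spacy})
```

Swapping the stand-ins for real callables (for example a function wrapping the trained CRF and another wrapping a loaded spaCy pipeline) keeps the same output shape.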


Model Performance

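| Entity Class       | Custom CRF | spaCy Baseline |
|--------------------|------------|----------------|
| Person (PER)       | 93.1%      | 91.4%          |
| Location (LOC)     | 88.7%      | 86.2%          |
| Organization (ORG) | 85.4%      | 83.8%          |
| Misc (MISC)        | 79.2%      | 74.1%          |
| Weighted Avg       | 90.32%     | 85.9%          |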

CRF inference: 0.0004s avg
spaCy inference: 0.0028s avg
Evaluated on the CoNLL-2003 testb split and verified on the production server.
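
One detail matters for any per-class comparison like this: en_core_web_sm emits OntoNotes-style labels such as PERSON and GPE, while CoNLL-2003 uses PER, LOC, ORG, and MISC, so spaCy's output has to be mapped onto the four CoNLL classes before scoring. This page does not say which mapping was used; the table below is one plausible choice, and both `SPACY_TO_CONLL` and `to_conll` are hypothetical names.

```python
# Hypothetical mapping from spaCy's OntoNotes-style labels to CoNLL-2003 classes.
SPACY_TO_CONLL = {
    "PERSON": "PER",
    "GPE": "LOC",
    "LOC": "LOC",
    "FAC": "LOC",
    "ORG": "ORG",
    "NORP": "MISC",
    "EVENT": "MISC",
    "LANGUAGE": "MISC",
    "PRODUCT": "MISC",
    "WORK_OF_ART": "MISC",
}


def to_conll(entities):
    """Relabel entities and drop types with no CoNLL counterpart (DATE, MONEY, ...)."""
    mapped = []
    for ent in entities:
        label = SPACY_TO_CONLL.get(ent["label"])
        if label is not None:
            mapped.append({**ent, "label": label})
    return mapped


mapped = to_conll([
    {"label": "GPE", "start": 26, "end": 35},
    {"label": "DATE", "start": 0, "end": 4},
])
```

Different mapping choices, especially for MISC, can shift the spaCy column noticeably, so the mapping is worth recording next to the scores.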


Tech Stack

| Layer    | Technology                                |
|----------|-------------------------------------------|
| Backend  | Flask, sklearn-crfsuite, spaCy            |
| Frontend | Alpine.js, Tailwind CSS, AOS              |
| Hosting  | AWS EC2, Apache, Gunicorn                 |
| CI/CD    | GitHub Actions (auto-deploy on main push) |

Project Timeline

Weeks 1–2: Team learning phase. Hassan led architecture planning. Mudassir and Saad got up to speed with Python OOP, Git, and Tkinter.

Weeks 3–4: Core development. CRF model trained on CoNLL-2003. spaCy integration done. Initial Tkinter UI started.

Weeks 5–6: Full desktop app completed in Tkinter. Compare, CRF, and spaCy modes all working locally.

Weeks 7–8: Pivoted to web. Rebuilt everything in Flask, deployed on AWS EC2 with Apache and Gunicorn, and wired up GitHub Actions for CI/CD.

26 Mar 2026: Final submission.


API

The compare endpoint is public and CORS-enabled.

POST https://entify.orbin.dev/api/compare
Content-Type: application/json

{
  "text": "Apple Inc. was founded in Cupertino.",
  "mode": "compare"
}

The response returns an entity array for each model, with label, start, and end for every entity, plus processing time.
Error codes: 400 (empty text), 413 (text over 5,000 characters), 500 (inference failure).
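
The 400 and 413 rules can be sketched as a small validation step that runs before inference. This is only an illustration of the documented limits: `validate_text` and `MAX_CHARS` are hypothetical names, and treating whitespace-only input as empty is an assumption.

```python
MAX_CHARS = 5000  # limit taken from the documented 413 response


def validate_text(payload):
    """Return (status_code, message) for a request body already parsed as a dict."""
    text = payload.get("text") if isinstance(payload, dict) else None
    # Assumption: missing, non-string, and whitespace-only text all count as empty.
    if not isinstance(text, str) or not text.strip():
        return 400, "text must be a non-empty string"
    if len(text) > MAX_CHARS:
        return 413, "text exceeds 5000 characters"
    return 200, "ok"


status, message = validate_text({"text": "Apple Inc. was founded in Cupertino."})
```

A text of exactly 5,000 characters passes here, since the documented limit is "over 5000".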

Full reference: https://entify.orbin.dev/docs/api
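
For a quick try from Python, the request above can be built with the standard library alone. This is a minimal sketch: `build_request` and `compare_remote` are hypothetical helper names, the 10-second timeout is arbitrary, and the format of error bodies is not documented here, so the sketch returns only the status code on failure.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://entify.orbin.dev/api/compare"


def build_request(text, mode="compare"):
    """Build the POST request described above without sending it."""
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compare_remote(text, timeout=10):
    """Send the request; return (status, parsed JSON or None)."""
    try:
        with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # 400, 413 and 500 arrive here; the error body format is not documented.
        return err.code, None


request = build_request("Apple Inc. was founded in Cupertino.")
```

Calling `compare_remote("Apple Inc. was founded in Cupertino.")` sends the request over the network; the exact JSON keys in the reply are listed in the full reference.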


Quick Start

git clone https://github.com/softdevhassan/entify
cd entify
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python app.py

Full setup guide: https://entify.orbin.dev/docs/dev
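
If `python app.py` fails right after installation, a quick check of the environment can narrow things down. The sketch below only tests whether packages are importable; the import names are assumed from the Tech Stack section, `missing_packages` is a hypothetical helper, and requirements.txt remains the authoritative list.

```python
import importlib.util

# Import names assumed from the Tech Stack section; requirements.txt is authoritative.
REQUIRED = ["flask", "sklearn_crfsuite", "spacy", "en_core_web_sm"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated virtual environment; a missing `en_core_web_sm` usually means the `python -m spacy download` step was skipped.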


Team

| Name         | Role      |
|--------------|-----------|
| Hassan Ali   | Team Lead |
| Mudassir Ali | Member    |
| Saad Ilyas   | Member    |