Academic integrity tool for checking student assignments.
CiteSight is a desktop app, CLI tool, and web service that loads a student assignment, extracts references and in-text citations, verifies that every reference exists in academic databases, checks URLs, validates citation formatting, and flags suspicious or fabricated references.
- Reference Verification — Checks every bibliography entry against Crossref, Semantic Scholar, and OpenAlex APIs
- Citation Format Validation — Checks APA, MLA, and Chicago formatting rules
- Cross-Reference Checking — Matches in-text citations to bibliography entries, flags orphans
- URL Verification — HTTP checks on referenced URLs, with screenshots as evidence (desktop)
- DOI Resolution — Validates DOIs via Crossref (bot-blocked/paywalled publisher pages are reported as blocked, not dead)
- Citation Patterns — Future-dated citations, suspicious year clusters, mixed citation styles, and placeholder/template citation text
- File Support — PDF, DOCX, TXT, Markdown, JSON
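The cross-reference check listed above can be pictured with a small sketch. This is not CiteSight's implementation: the `BibEntry` and `InTextCitation` shapes, the regular expression, and the reading of "orphan" as an in-text citation with no matching bibliography entry are all assumptions made for illustration. Only simple APA-style parenthetical citations are handled.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface BibEntry {
  firstAuthorSurname: string;
  year: number;
}

interface InTextCitation {
  surname: string;
  year: number;
}

// Finds simple parenthetical citations such as "(Smith, 2020)" or
// "(Smith et al., 2020)". Narrative and multi-author forms are ignored.
function extractCitations(text: string): InTextCitation[] {
  const pattern = /\(([A-Z][A-Za-z'-]+)(?: et al\.)?,? (\d{4})\)/g;
  const found: InTextCitation[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    found.push({ surname: m[1], year: Number(m[2]) });
  }
  return found;
}

// Case-insensitive surname plus year is the (assumed) matching key.
const citationKey = (surname: string, year: number): string =>
  `${surname.toLowerCase()}:${year}`;

function crossReference(entries: BibEntry[], citations: InTextCitation[]) {
  const bibKeys = new Set(
    entries.map((e) => citationKey(e.firstAuthorSurname, e.year))
  );
  const citedKeys = new Set(
    citations.map((c) => citationKey(c.surname, c.year))
  );
  return {
    // In-text citations with no bibliography entry.
    orphans: citations.filter(
      (c) => !bibKeys.has(citationKey(c.surname, c.year))
    ),
    // Bibliography entries that are never cited in the text.
    uncited: entries.filter(
      (e) => !citedKeys.has(citationKey(e.firstAuthorSurname, e.year))
    ),
  };
}
```

Matching on surname and year alone is deliberately loose. A real matcher would also need to handle multiple authors, narrative citations, and suffixed years such as 2020a.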
CiteSight focuses on citation integrity. Prose-level signals — readability,
writing quality, and AI-writing tells (emojis, em-dashes, adverb ratio) — live
in the sibling document-analyser
tool; run both for a full picture.
Three ways to use CiteSight:
| Method | Best for | Install |
|---|---|---|
| Desktop app | Offline use, URL screenshots | Download for your platform |
| CLI | Automation, CI pipelines | npm install -g cite-sight |
| Docker | VPS hosting, shared access | docker pull michaelborck/cite-sight |
| Feature | Web / Docker | Desktop | CLI |
|---|---|---|---|
| File input | Single file | Multiple files | Single file |
| File types | PDF, DOCX, TXT | PDF, DOCX, TXT, MD | PDF, DOCX, TXT, MD, JSON |
| URL screenshots | — | Yes | — |
| PDF/CSV export | Yes | — | — |
| Output format | Browser dashboard | Desktop dashboard | Text or JSON (stdout) |
Pull the pre-built Docker image — no Node.js or build tools needed on the server.
docker run -d -p 3000:3000 --restart unless-stopped --name cite-sight michaelborck/cite-sight
The web app and API are available at http://your-server:3000.
Create a docker-compose.yml on your VPS:
services:
  app:
    image: michaelborck/cite-sight:latest
    ports:
      - "3000:3000"
    restart: unless-stopped
    environment:
      - PORT=3000
Then:
docker compose up -d
To update to a newer image, pull it and restart:
docker compose pull
docker compose up -d
For heavier usage, add Redis to queue analysis jobs instead of processing synchronously:
services:
  app:
    image: michaelborck/cite-sight:latest
    ports:
      - "3000:3000"
    restart: unless-stopped
    environment:
      - PORT=3000
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
    restart: unless-stopped
If you're serving on a domain with HTTPS, point your reverse proxy at port 3000. Example Caddy config:
citesight.yourdomain.com {
    reverse_proxy localhost:3000
}
npm install
npm run build:core
# Terminal 1: Vite dev server
cd packages/desktop && npx vite
# Terminal 2: Electron
npx tsc -p packages/desktop/tsconfig.json
cd packages/desktop && npx electron .
npm install
npm run build:core
npm run build:server
# Terminal 1: API server
node packages/server/dist/index.js
# Terminal 2: Web frontend
cd packages/web && npx vite
Open http://localhost:5173. Vite proxies API calls to the server.
npm run build:core
npx tsc -p packages/cli/tsconfig.json
cite-sight check paper.pdf
cite-sight check paper.pdf --json
cite-sight check paper.pdf --style apa --email you@example.com
# Reports show the detail behind each issue by default — what was cited vs.
# what the matched record holds, plus the surrounding text for in-text
# citations. Use --minimal for a condensed summary-and-verdicts view.
cite-sight check paper.pdf --minimal
# For a bare source list / annotated bibliography (e.g. a deep-research export)
# rather than a manuscript, use --source-list to skip the in-text
# cross-reference check (otherwise every entry is reported as "uncited").
# CiteSight also auto-skips that check when no reference is cited at all.
cite-sight check sources.md --source-list
Batch checking and rate limits. Lookups run one reference at a time and every
external request is paced to one per second, so checking a folder is slow but
stays within the citation databases' polite-pool limits; results are cached per
run, so a work cited across many papers is looked up only once. Always pass
--email (it joins the Crossref/OpenAlex polite pools). Semantic Scholar's
keyless tier can still rate-limit a large batch — supply a key with --s2-key
or the SEMANTIC_SCHOLAR_API_KEY environment variable (the desktop app and
server read the same variable). When a lookup is throttled, that reference is
reported as unverified with the reason (e.g. "rate-limited on Semantic
Scholar") — it is not a confirmed miss; re-run to retry those.
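The pacing and per-run caching described above can be sketched as follows. This is an illustration under stated assumptions, not CiteSight's code: `RequestPacer` and `RunCache` are hypothetical names, the clock is passed in as a number so the logic stays deterministic, and the cache stores whatever the lookup returns without distinguishing throttled results.

```typescript
// Spaces request starts at least `intervalMs` apart. The caller supplies the
// current time and is responsible for actually waiting the returned delay.
class RequestPacer {
  private nextFreeAt = 0;
  private readonly intervalMs: number;

  constructor(intervalMs: number = 1000) {
    this.intervalMs = intervalMs;
  }

  // Reserves the next slot and returns how many ms to wait before sending.
  reserve(nowMs: number): number {
    const startAt = Math.max(nowMs, this.nextFreeAt);
    this.nextFreeAt = startAt + this.intervalMs;
    return startAt - nowMs;
  }
}

// Remembers results for the duration of one run, so a work cited in many
// papers triggers only one lookup. If `compute` returns a promise, the
// in-flight promise itself is cached and shared.
class RunCache<T> {
  private readonly results = new Map<string, T>();

  getOrCompute(key: string, compute: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T;
    }
    const value = compute();
    this.results.set(key, value);
    return value;
  }
}
```

In use, a caller would check the cache first, call `reserve(Date.now())` only on a miss, sleep for the returned delay, and then send the request. Two requests arriving at the same instant get delays of 0 ms and 1000 ms.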
docker compose up --build
# Open http://localhost:3000
cite-sight/
├── packages/
│ ├── core/ # Shared analysis library
│ ├── desktop/ # Electron app
│ ├── cli/ # CLI tool
│ ├── server/ # Express API server
│ └── web/ # Landing page + online tool
├── Dockerfile
├── docker-compose.yml
└── package.json # Workspace root
For each reference in the bibliography:
- Parse — Extract authors, title, year, journal, DOI, URL
- Validate Format — Check against APA/MLA/Chicago rules
- Verify Existence (cascade):
  - DOI → Crossref API
  - Search Crossref by title + author
  - Search Semantic Scholar (fallback)
  - Search OpenAlex (fallback)
  - If the reference has a URL → HTTP status check
- Cross-Reference — Match bibliography ↔ in-text citations
- Score — Confidence score (0–1) based on metadata match quality
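The cascade and confidence score above can be sketched roughly like this. Everything here is an assumption for illustration: the `Reference`, `Match`, and `Step` shapes, the token-overlap similarity, and the 0.8 threshold are not taken from CiteSight. Real lookups are asynchronous network calls; synchronous stand-ins are used so the control flow stays visible.

```typescript
// Hypothetical shapes; the real library's types may differ.
interface Reference {
  title: string;
  doi?: string;
}

interface Match {
  title: string;
}

interface Step {
  name: string;
  applies: (ref: Reference) => boolean;
  lookup: (ref: Reference) => Match | null;
}

interface Verdict {
  verified: boolean;
  source?: string;
  confidence: number;
}

// Jaccard overlap of lowercase word tokens, in the range 0 to 1.
function titleSimilarity(a: string, b: string): number {
  const tokens = (s: string): Set<string> =>
    new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
  const ta = tokens(a);
  const tb = tokens(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared += 1;
  });
  return shared / (ta.size + tb.size - shared);
}

// Tries each step in order and stops at the first sufficiently close match.
// The threshold of 0.8 is an arbitrary illustrative value.
function verify(ref: Reference, steps: Step[], threshold = 0.8): Verdict {
  let best = 0;
  for (const step of steps) {
    if (!step.applies(ref)) continue;
    const match = step.lookup(ref);
    if (match === null) continue;
    const confidence = titleSimilarity(ref.title, match.title);
    if (confidence >= threshold) {
      return { verified: true, source: step.name, confidence };
    }
    best = Math.max(best, confidence);
  }
  // Nothing cleared the threshold; report the closest match seen.
  return { verified: false, confidence: best };
}
```

Ordering the steps from most to least specific (DOI first, title search later) means a later source is only queried when the earlier ones fail, which also keeps the number of external requests down.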
Releases are built automatically via GitHub Actions when a version tag is pushed.
Use the bump script to update all workspace versions, commit, tag, and push in one step:
npm run bump -- patch # 0.2.9 → 0.2.10
npm run bump -- minor # 0.2.9 → 0.3.0
npm run bump -- major # 0.2.9 → 1.0.0
npm run bump -- 1.0.0   # exact version
The script updates all 6 package.json files, commits, creates an annotated vX.Y.Z tag, and prompts before pushing.
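The version arithmetic behind those four invocations is simple enough to sketch. The function below is a hypothetical illustration, not the bump script itself. It only computes the next version string and does not touch package.json files, commits, or tags.

```typescript
// Returns the next version for "patch" | "minor" | "major", or the argument
// itself when it is already an exact X.Y.Z version.
function bumpVersion(current: string, kindOrExact: string): string {
  const semver = /^(\d+)\.(\d+)\.(\d+)$/;
  if (semver.test(kindOrExact)) {
    return kindOrExact; // exact version supplied
  }
  const m = semver.exec(current);
  if (m === null) {
    throw new Error(`Not an X.Y.Z version: ${current}`);
  }
  const major = Number(m[1]);
  const minor = Number(m[2]);
  const patch = Number(m[3]);
  switch (kindOrExact) {
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    case "minor":
      return `${major}.${minor + 1}.0`; // minor bump resets patch
    case "major":
      return `${major + 1}.0.0`; // major bump resets minor and patch
    default:
      throw new Error(`Unknown bump kind: ${kindOrExact}`);
  }
}
```

Parsing the parts as numbers rather than strings is what makes 0.2.9 roll over to 0.2.10 instead of 0.2.91.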
Pushing a v* tag triggers:
- Electron installers: macOS (DMG), Windows (NSIS), Linux (AppImage) with auto-update
- npm publish: @michaelborck/cite-sight-core + cite-sight CLI
- Docker images: pushed to Docker Hub and GitHub Container Registry (amd64 + arm64)
- Core: TypeScript, pdfjs-dist, mammoth
- Desktop: Electron, React 19, Zustand, Vite, electron-updater
- Web: React 19, Vite
- Server: Express, multer, BullMQ (optional)
- CLI: Commander.js, chalk
- APIs: Crossref, Semantic Scholar, OpenAlex (all free tier)
See LICENSE.