Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
3ef1f45
Refactor notebooks for Colab GPU compatibility and fair comparison
vardhjain Jun 11, 2026
fdc5cdc
Fix ensure_ollama PATH on Colab — use shutil.which with /usr/local/bi…
vardhjain Jun 11, 2026
4f63aa6
Fix Ollama install on Colab: add zstd dep + robust binary search
vardhjain Jun 11, 2026
e163726
Restructure repo and fix the GraphRAG vs PlainRAG comparison
vardhjain Jun 12, 2026
d78dcdb
Tighten concept-hop query and drop dead FAISS dependency
vardhjain Jun 12, 2026
45a3b3e
Point at new ArangoDB deployment; tune notebooks for Colab Pro A100
vardhjain Jun 12, 2026
dc7e9db
Make benchmark resilient to Ollama crashes; cap generation
vardhjain Jun 12, 2026
eb51948
Make notebook clones reset-safe; default to faster generation cap
vardhjain Jun 12, 2026
704cdeb
Created using Colab
vardhjain Jun 12, 2026
585a24c
Add benchmark results and honest ablation write-up
vardhjain Jun 13, 2026
1389733
Add results artifacts (figure, summary); strip notebook run outputs
vardhjain Jun 13, 2026
1bfc4df
Add community health files, Makefile, diagram, and repo metadata
vardhjain Jun 13, 2026
4861a3c
Make the project portable: no hardcoded endpoint, local-or-cloud setup
vardhjain Jun 13, 2026
a4c3271
Add interactive UIs: Gradio chat demo + Streamlit results dashboard
vardhjain Jun 13, 2026
85522ac
Make the Streamlit dashboard one-click deployable on Streamlit Cloud
vardhjain Jun 13, 2026
c308933
Use shields.io badge for the live demo (more robust than static.strea…
vardhjain Jun 13, 2026
925055d
Polish README presentation: centered hero, quick-nav, emoji sections
vardhjain Jun 23, 2026
bcf10c8
Add coverage to CI (Codecov) and raise it to 82%
vardhjain Jun 23, 2026
0c0738b
Add GitHub Pages docs site + README docs section
vardhjain Jun 23, 2026
4a6192d
Add dashboard screenshot; fix dashboard bar chart + add latency view
vardhjain Jun 23, 2026
dee231a
Add chat UI screenshot; fix ChatInterface for current Gradio
vardhjain Jun 23, 2026
cde8767
Point live-demo links at the deployed Streamlit URL
vardhjain Jun 23, 2026
d3b2f35
CI: pass CODECOV_TOKEN and bump codecov-action to v5
vardhjain Jun 23, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file removed .DS_Store
Binary file not shown.
20 changes: 20 additions & 0 deletions .editorconfig
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# https://editorconfig.org
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
indent_style = space
indent_size = 4

[*.{md,yml,yaml,json,toml}]
indent_size = 2

[*.ipynb]
trim_trailing_whitespace = false
insert_final_newline = false

[Makefile]
indent_style = tab
16 changes: 16 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Copy to .env and fill in. Never commit .env (it is in .gitignore).
# On Google Colab, set these via the Secrets panel (key icon) instead.

# ── ArangoDB (GraphRAG only) ──────────────────────────────────────────────────
# Local (default): use the bundled docker-compose — `docker compose up -d`,
# then ARANGO_HOST=http://localhost:8529 and ARANGO_PASS=devpassword.
# Cloud (ArangoDB Oasis): point ARANGO_HOST at your deployment endpoint, e.g.
# https://<your-deployment>.arangodb.cloud:8529
ARANGO_HOST=http://localhost:8529
ARANGO_USER=root
ARANGO_PASS=
ARANGO_DB=pubmed_graph

# ── Ollama (LLM) ──────────────────────────────────────────────────────────────
OLLAMA_API=http://localhost:11434/api/chat
LLM_MODEL=deepseek-r1:8b
18 changes: 18 additions & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
# Normalize line endings: LF in the repository and on checkout, everywhere.
* text=auto eol=lf

# Must be LF to run on Unix (Makefile is also tab-sensitive).
Makefile text eol=lf
*.sh text eol=lf

# Binary assets — no EOL conversion, no diff noise.
*.png binary
*.jpg binary
*.pdf binary
*.pptx binary
*.pkl binary
*.bin binary

# Thin Colab wrappers are documentation, not core source — keep them out of the
# language breakdown so the repo reads as the Python project it is.
*.ipynb linguist-documentation
34 changes: 34 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
---
name: Bug report
about: Report something that isn't working as expected
title: "[Bug] "
labels: bug
assignees: ""
---

**Describe the bug**
A clear and concise description of what the bug is.

**To reproduce**
Steps or the exact command, e.g.:
```bash
python scripts/run_benchmark.py --arm graph --n 200
```

**Expected behavior**
What you expected to happen.

**Logs / traceback**
```
paste the error here
```

**Environment**
- OS:
- Python version:
- Running where: [local / Colab]
- GPU (if any):
- Arango reachable / Ollama running: [yes/no]

**Additional context**
Anything else that might help.
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
- name: Question / discussion
url: https://github.com/vardhjain/Knowledge_Graph_Question_Answering/discussions
about: Ask a question or discuss the methodology, results, or design.
24 changes: 24 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
---
name: Feature request
about: Suggest an idea or improvement
title: "[Feature] "
labels: enhancement
assignees: ""
---

**What problem does this solve?**
A clear description of the motivation or gap.

**Proposed solution**
What you'd like to happen.

**Fairness check (for retrieval/eval changes)**
This project is a *fair* ablation. If your idea touches retrieval or evaluation,
note how it keeps the arms comparable (shared corpus/embedder/reranker/prompt/
LLM/top-k) and avoids leaking the answer into context.

**Alternatives considered**
Any other approaches you weighed.

**Additional context**
Links, papers, or examples.
29 changes: 29 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
## Summary

<!-- What does this PR change, and why? -->

## Type of change

- [ ] Bug fix
- [ ] New feature
- [ ] Refactor / cleanup
- [ ] Docs
- [ ] Benchmark / results

## Checklist

- [ ] `make test` passes
- [ ] `make lint` passes
- [ ] `CHANGELOG.md` updated under "Unreleased"
- [ ] Docs/README updated if behavior changed

## Fairness (retrieval/evaluation changes only)

- [ ] Confounders (embedder, reranker, prompt, LLM, top-k, seed, n) stay in
`config.py` and identical across arms
- [ ] No benchmark question/answer can leak into a retrieved context
(the leakage regression test still passes)

## Notes

<!-- Anything reviewers should know: trade-offs, follow-ups, screenshots. -->
42 changes: 42 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
name: CI

on:
push:
branches: [main, revamp]
pull_request:

jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11"]
steps:
- uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: pip

- name: Install test dependencies
# The heavy ML libraries (torch, sentence-transformers, faiss, arango,
# datasets) are imported lazily, so unit tests need only this light set.
run: |
python -m pip install --upgrade pip
python -m pip install numpy scikit-learn scipy requests pytest pytest-cov ruff

- name: Lint (ruff)
run: ruff check src scripts tests app

- name: Test (pytest)
run: pytest --cov=kgqa --cov-report=xml --cov-report=term-missing

- name: Upload coverage to Codecov
if: matrix.python-version == '3.11'
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./coverage.xml
fail_ci_if_error: false
34 changes: 34 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# ── OS ────────────────────────────────────────────────────────────────────────
.DS_Store
Thumbs.db

# ── Python ────────────────────────────────────────────────────────────────────
__pycache__/
*.py[cod]
*.egg-info/
.eggs/
build/
dist/
.venv/
venv/
env/
.ipynb_checkpoints/

# ── Secrets ───────────────────────────────────────────────────────────────────
.env

# ── Caches & artifacts (regenerated; never committed) ─────────────────────────
pubmed_vectors_cache.pkl
Plain_RAG/pubmed_rag_index.bin
Plain_RAG/pubmed_rag_data.pkl
*.bin
*.pkl

# ── Results (figures are committed; keep raw JSON if you want — see README) ────
# results/ is committed intentionally so the README can reference real numbers.

# ── Tooling ───────────────────────────────────────────────────────────────────
.pytest_cache/
.ruff_cache/
.coverage
htmlcov/
21 changes: 21 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Run automatically on `git commit` after `pre-commit install`.
# See https://pre-commit.com
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.9
hooks:
- id: ruff
args: [--fix]
- id: ruff-format

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-yaml
- id: check-toml
- id: check-added-large-files
args: [--maxkb=1024]
- id: check-merge-conflict
- id: detect-private-key
11 changes: 11 additions & 0 deletions .streamlit/config.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# Theme for the Streamlit dashboard (app/dashboard.py). Read by `streamlit run`
# locally and by Streamlit Community Cloud. Only long-stable keys are used so it
# renders correctly on any recent Streamlit version. Palette matches the
# matplotlib figure in results/ablation.png (blue primary).
[theme]
base = "light"
primaryColor = "#2196F3"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F5F7FA"
textColor = "#1A2027"
font = "sans serif"
57 changes: 57 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
# Changelog

All notable changes to this project are documented here. The format follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- **Interactive UIs** in `app/`: a Gradio chat demo (`chat_app.py`) over the
winning `graph` arm that cites source PubMed IDs, and a Streamlit results
dashboard (`dashboard.py`) that visualizes the ablation, McNemar tests, and
per-class breakdown. `requirements-app.txt`, `make chat` / `make dashboard`.
- `BaseRetriever.chat()` — conversational answer plus the retrieved source pubids.
- `scripts/compare.py` now also writes `results/summary.json` (structured metrics
+ contrasts) for the dashboard.
- One-click **Streamlit Community Cloud** deploy for the dashboard: a light
`app/requirements.txt` (picked up before the heavy root file), a themed
`.streamlit/config.toml`, a richer page config, and a README live-demo badge.

## [1.0.0] — 2026-06-12

The "fair comparison" revamp: turned a confounded notebook demo into a
controlled, reproducible 4-arm ablation with an industry-standard repo layout.

### Added
- Importable `src/kgqa/` package: `config`, `prompts`, `llm`, `data`,
`evaluation`, `models`, and a `retrieval/` sub-package (`base`, `plain`, `graph`).
- Four retrieval arms isolating each component:
`plain → plain_rr → graph → graph_concepts`.
- A shared `ChunkStore` so every arm searches an identical corpus.
- MeSH concept-hop expansion (`graph_concepts`) — the previously unused
`Concepts`/`MENTIONS` graph is now exercised.
- Seeded random sampling and a paired **McNemar** significance test.
- `scripts/`: `ingest.py` (leakage-free graph build), `run_benchmark.py`
(`--arm`, retry + Ollama auto-restart + checkpointing), `compare.py`.
- Test suite (CPU-only via fakes), GitHub Actions CI, `ruff` + `pre-commit`.
- Docs and meta: README with results, `CONTRIBUTING`, `CODE_OF_CONDUCT`,
`SECURITY`, `CITATION.cff`, issue/PR templates, `Makefile`, architecture diagram.
- Benchmark results (n=200) and ablation figure under `results/`.

### Fixed
- **Label leakage:** ingestion no longer stores a question-derived `title` or
`final_decision`; graph contexts use generic `=== STUDY n ===` labels, so the
benchmark question/answer can never appear in a retrieved context.
- **Confounded comparison:** the cross-encoder reranker is now its own arm
instead of a hidden advantage for GraphRAG.
- **Inconsistent corpus/chunking** across arms — now identical.
- `NameError` in the graph-expansion fallback path.

### Changed
- Generation is bounded (`num_predict`) and the model kept resident
(`keep_alive`); `LLM_NUM_CTX` / `LLM_NUM_PREDICT` are environment-tunable.
- Removed the dead `faiss` dependency (PlainRAG uses the shared numpy-cosine store).

[Unreleased]: https://github.com/vardhjain/Knowledge_Graph_Question_Answering/compare/v1.0.0...HEAD
[1.0.0]: https://github.com/vardhjain/Knowledge_Graph_Question_Answering/releases/tag/v1.0.0
25 changes: 25 additions & 0 deletions CITATION.cff
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
cff-version: 1.2.0
title: "Knowledge Graph Question Answering: a fair GraphRAG vs PlainRAG comparison"
message: "If you use this software or its findings, please cite it as below."
type: software
authors:
- given-names: Vardh
family-names: Jain
email: vardhjain20@gmail.com
repository-code: "https://github.com/vardhjain/Knowledge_Graph_Question_Answering"
abstract: >-
A controlled 4-arm ablation (plain, plain_rr, graph, graph_concepts) on
PubMedQA that isolates what a knowledge graph contributes to retrieval-augmented
question answering, holding corpus, chunking, embedder, reranker, prompt, LLM,
and top-k constant. Includes a paired McNemar significance test and a
leakage-free ArangoDB graph schema.
keywords:
- graphrag
- retrieval-augmented-generation
- knowledge-graph
- pubmedqa
- arangodb
- ablation-study
license: MIT
version: 1.0.0
date-released: "2026-06-12"
Loading