Akash-Raghavendra · vardhjain · Jun 11, 2026 · Jun 11, 2026 · Jun 11, 2026 · Jun 12, 2026
diff --git a/.DS_Store b/.DS_Store
diff --git a/.editorconfig b/.editorconfig
@@ -0,0 +1,20 @@
+# https://editorconfig.org
+root = true
+
+[*]
+charset = utf-8
+end_of_line = lf
+insert_final_newline = true
+trim_trailing_whitespace = true
+indent_style = space
+indent_size = 4
+
+[*.{md,yml,yaml,json,toml}]
+indent_size = 2
+
+[*.ipynb]
+trim_trailing_whitespace = false
+insert_final_newline = false
+
+[Makefile]
+indent_style = tab
diff --git a/.env.example b/.env.example
@@ -0,0 +1,16 @@
+# Copy to .env and fill in. Never commit .env (it is in .gitignore).
+# On Google Colab, set these via the Secrets panel (key icon) instead.
+
+# ── ArangoDB (GraphRAG only) ──────────────────────────────────────────────────
+# Local (default): use the bundled docker-compose — `docker compose up -d`,
+#   then ARANGO_HOST=http://localhost:8529 and ARANGO_PASS=devpassword.
+# Cloud (ArangoDB Oasis): point ARANGO_HOST at your deployment endpoint, e.g.
+#   https://<your-deployment>.arangodb.cloud:8529
+ARANGO_HOST=http://localhost:8529
+ARANGO_USER=root
+ARANGO_PASS=
+ARANGO_DB=pubmed_graph
+
+# ── Ollama (LLM) ──────────────────────────────────────────────────────────────
+OLLAMA_API=http://localhost:11434/api/chat
+LLM_MODEL=deepseek-r1:8b
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,18 @@
+# Normalize line endings: LF in the repository and on checkout, everywhere.
+* text=auto eol=lf
+
+# Must be LF to run on Unix (Makefile is also tab-sensitive).
+Makefile text eol=lf
+*.sh text eol=lf
+
+# Binary assets — no EOL conversion, no diff noise.
+*.png binary
+*.jpg binary
+*.pdf binary
+*.pptx binary
+*.pkl binary
+*.bin binary
+
+# Thin Colab wrappers are documentation, not core source — keep them out of the
+# language breakdown so the repo reads as the Python project it is.
+*.ipynb linguist-documentation
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,34 @@
+---
+name: Bug report
+about: Report something that isn't working as expected
+title: "[Bug] "
+labels: bug
+assignees: ""
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To reproduce**
+Steps or the exact command, e.g.:
+```bash
+python scripts/run_benchmark.py --arm graph --n 200
+```
+
+**Expected behavior**
+What you expected to happen.
+
+**Logs / traceback**
+```
+paste the error here
+```
+
+**Environment**
+- OS:
+- Python version:
+- Running where: [local / Colab]
+- GPU (if any):
+- Arango reachable / Ollama running: [yes/no]
+
+**Additional context**
+Anything else that might help.
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Question / discussion
+    url: https://github.com/vardhjain/Knowledge_Graph_Question_Answering/discussions
+    about: Ask a question or discuss the methodology, results, or design.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,24 @@
+---
+name: Feature request
+about: Suggest an idea or improvement
+title: "[Feature] "
+labels: enhancement
+assignees: ""
+---
+
+**What problem does this solve?**
+A clear description of the motivation or gap.
+
+**Proposed solution**
+What you'd like to happen.
+
+**Fairness check (for retrieval/eval changes)**
+This project is a *fair* ablation. If your idea touches retrieval or evaluation,
+note how it keeps the arms comparable (shared corpus/embedder/reranker/prompt/
+LLM/top-k) and avoids leaking the answer into context.
+
+**Alternatives considered**
+Any other approaches you weighed.
+
+**Additional context**
+Links, papers, or examples.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,29 @@
+## Summary
+
+<!-- What does this PR change, and why? -->
+
+## Type of change
+
+- [ ] Bug fix
+- [ ] New feature
+- [ ] Refactor / cleanup
+- [ ] Docs
+- [ ] Benchmark / results
+
+## Checklist
+
+- [ ] `make test` passes
+- [ ] `make lint` passes
+- [ ] `CHANGELOG.md` updated under "Unreleased"
+- [ ] Docs/README updated if behavior changed
+
+## Fairness (retrieval/evaluation changes only)
+
+- [ ] Confounders (embedder, reranker, prompt, LLM, top-k, seed, n) stay in
+      `config.py` and identical across arms
+- [ ] No benchmark question/answer can leak into a retrieved context
+      (the leakage regression test still passes)
+
+## Notes
+
+<!-- Anything reviewers should know: trade-offs, follow-ups, screenshots. -->
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,42 @@
+name: CI
+
+on:
+  push:
+    branches: [main, revamp]
+  pull_request:
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11"]
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+
+      - name: Install test dependencies
+        # The heavy ML libraries (torch, sentence-transformers, faiss, arango,
+        # datasets) are imported lazily, so unit tests need only this light set.
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install numpy scikit-learn scipy requests pytest pytest-cov ruff
+
+      - name: Lint (ruff)
+        run: ruff check src scripts tests app
+
+      - name: Test (pytest)
+        run: pytest --cov=kgqa --cov-report=xml --cov-report=term-missing
+
+      - name: Upload coverage to Codecov
+        if: matrix.python-version == '3.11'
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          files: ./coverage.xml
+          fail_ci_if_error: false
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,34 @@
+# ── OS ────────────────────────────────────────────────────────────────────────
+.DS_Store
+Thumbs.db
+
+# ── Python ────────────────────────────────────────────────────────────────────
+__pycache__/
+*.py[cod]
+*.egg-info/
+.eggs/
+build/
+dist/
+.venv/
+venv/
+env/
+.ipynb_checkpoints/
+
+# ── Secrets ───────────────────────────────────────────────────────────────────
+.env
+
+# ── Caches & artifacts (regenerated; never committed) ─────────────────────────
+pubmed_vectors_cache.pkl
+Plain_RAG/pubmed_rag_index.bin
+Plain_RAG/pubmed_rag_data.pkl
+*.bin
+*.pkl
+
+# ── Results (figures are committed; keep raw JSON if you want — see README) ────
+# results/ is committed intentionally so the README can reference real numbers.
+
+# ── Tooling ───────────────────────────────────────────────────────────────────
+.pytest_cache/
+.ruff_cache/
+.coverage
+htmlcov/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,21 @@
+# Run automatically on `git commit` after `pre-commit install`.
+# See https://pre-commit.com
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.6.9
+    hooks:
+      - id: ruff
+        args: [--fix]
+      - id: ruff-format
+
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+      - id: check-yaml
+      - id: check-toml
+      - id: check-added-large-files
+        args: [--maxkb=1024]
+      - id: check-merge-conflict
+      - id: detect-private-key
diff --git a/.streamlit/config.toml b/.streamlit/config.toml
@@ -0,0 +1,11 @@
+# Theme for the Streamlit dashboard (app/dashboard.py). Read by `streamlit run`
+# locally and by Streamlit Community Cloud. Only long-stable keys are used so it
+# renders correctly on any recent Streamlit version. Palette matches the
+# matplotlib figure in results/ablation.png (blue primary).
+[theme]
+base = "light"
+primaryColor = "#2196F3"
+backgroundColor = "#FFFFFF"
+secondaryBackgroundColor = "#F5F7FA"
+textColor = "#1A2027"
+font = "sans serif"
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,57 @@
+# Changelog
+
+All notable changes to this project are documented here. The format follows
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
+adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+- **Interactive UIs** in `app/`: a Gradio chat demo (`chat_app.py`) over the
+  winning `graph` arm that cites source PubMed IDs, and a Streamlit results
+  dashboard (`dashboard.py`) that visualizes the ablation, McNemar tests, and
+  per-class breakdown. `requirements-app.txt`, `make chat` / `make dashboard`.
+- `BaseRetriever.chat()` — conversational answer plus the retrieved source pubids.
+- `scripts/compare.py` now also writes `results/summary.json` (structured metrics
+  + contrasts) for the dashboard.
+- One-click **Streamlit Community Cloud** deploy for the dashboard: a light
+  `app/requirements.txt` (picked up before the heavy root file), a themed
+  `.streamlit/config.toml`, a richer page config, and a README live-demo badge.
+
+## [1.0.0] — 2026-06-12
+
+The "fair comparison" revamp: turned a confounded notebook demo into a
+controlled, reproducible 4-arm ablation with an industry-standard repo layout.
+
+### Added
+- Importable `src/kgqa/` package: `config`, `prompts`, `llm`, `data`,
+  `evaluation`, `models`, and a `retrieval/` sub-package (`base`, `plain`, `graph`).
+- Four retrieval arms isolating each component:
+  `plain → plain_rr → graph → graph_concepts`.
+- A shared `ChunkStore` so every arm searches an identical corpus.
+- MeSH concept-hop expansion (`graph_concepts`) — the previously unused
+  `Concepts`/`MENTIONS` graph is now exercised.
+- Seeded random sampling and a paired **McNemar** significance test.
+- `scripts/`: `ingest.py` (leakage-free graph build), `run_benchmark.py`
+  (`--arm`, retry + Ollama auto-restart + checkpointing), `compare.py`.
+- Test suite (CPU-only via fakes), GitHub Actions CI, `ruff` + `pre-commit`.
+- Docs and meta: README with results, `CONTRIBUTING`, `CODE_OF_CONDUCT`,
+  `SECURITY`, `CITATION.cff`, issue/PR templates, `Makefile`, architecture diagram.
+- Benchmark results (n=200) and ablation figure under `results/`.
+
+### Fixed
+- **Label leakage:** ingestion no longer stores a question-derived `title` or
+  `final_decision`; graph contexts use generic `=== STUDY n ===` labels, so the
+  benchmark question/answer can never appear in a retrieved context.
+- **Confounded comparison:** the cross-encoder reranker is now its own arm
+  instead of a hidden advantage for GraphRAG.
+- **Inconsistent corpus/chunking** across arms — now identical.
+- `NameError` in the graph-expansion fallback path.
+
+### Changed
+- Generation is bounded (`num_predict`) and the model kept resident
+  (`keep_alive`); `LLM_NUM_CTX` / `LLM_NUM_PREDICT` are environment-tunable.
+- Removed the dead `faiss` dependency (PlainRAG uses the shared numpy-cosine store).
+
+[Unreleased]: https://github.com/vardhjain/Knowledge_Graph_Question_Answering/compare/v1.0.0...HEAD
+[1.0.0]: https://github.com/vardhjain/Knowledge_Graph_Question_Answering/releases/tag/v1.0.0
diff --git a/CITATION.cff b/CITATION.cff
@@ -0,0 +1,25 @@
+cff-version: 1.2.0
+title: "Knowledge Graph Question Answering: a fair GraphRAG vs PlainRAG comparison"
+message: "If you use this software or its findings, please cite it as below."
+type: software
+authors:
+  - given-names: Vardh
+    family-names: Jain
+    email: vardhjain20@gmail.com
+repository-code: "https://github.com/vardhjain/Knowledge_Graph_Question_Answering"
+abstract: >-
+  A controlled 4-arm ablation (plain, plain_rr, graph, graph_concepts) on
+  PubMedQA that isolates what a knowledge graph contributes to retrieval-augmented
+  question answering, holding corpus, chunking, embedder, reranker, prompt, LLM,
+  and top-k constant. Includes a paired McNemar significance test and a
+  leakage-free ArangoDB graph schema.
+keywords:
+  - graphrag
+  - retrieval-augmented-generation
+  - knowledge-graph
+  - pubmedqa
+  - arangodb
+  - ablation-study
+license: MIT
+version: 1.0.0
+date-released: "2026-06-12"