Final Edit by vardhjain · Pull Request #1 · Akash-Raghavendra/Knowledge_Graph_Question_Answering

vardhjain · 2026-06-13T02:03:55Z

No description provided.

- Both GraphRAG.ipynb and Plain_RAG.ipynb are now fully self-contained (no shared_utils import dependency) with all shared constants, prompts, FuzzyEvaluator, Evaluator, and call_ollama defined inline identically
- GraphRAG: credentials via Colab Secrets (ARANGO_PASS) with env var fallback
- Plain_RAG: GPU-aware FAISS with faiss-gpu-cu12 swap instruction
- Add Comparison.ipynb for side-by-side accuracy/F1/latency/confusion charts
- Add shared_utils.py for local script runs
- Add run_graphrag.py, run_plainrag.py, run_comparison.py standalone scripts

…n fallback

The Ollama install script requires zstd for extraction (new in recent versions). Added `apt-get install -y zstd -q` before the curl install. Also expanded ensure_ollama() to search multiple candidate paths and fall back to a filesystem find rather than hard-coding /usr/local/bin. Both Plain_RAG and GraphRAG notebooks updated identically.
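The lookup order described above (PATH first, then known install locations, then a filesystem search) can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual `ensure_ollama()`: the function name `find_ollama`, the candidate list, and the search roots are all assumptions.

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical candidate locations; the list used in the notebooks may differ.
CANDIDATE_PATHS = ("/usr/local/bin/ollama", "/usr/bin/ollama", "/bin/ollama")


def find_ollama(name: str = "ollama",
                candidates: Iterable[str] = CANDIDATE_PATHS,
                search_roots: Iterable[str] = ("/usr", "/opt")) -> Optional[str]:
    """Return the path of an executable binary called `name`, or None."""
    # 1. Anything already on PATH wins.
    found = shutil.which(name)
    if found:
        return found
    # 2. Known install locations that may be missing from PATH (e.g. on Colab).
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # 3. Last resort: walk the given roots, similar to `find <root> -name <name>`.
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                path = os.path.join(dirpath, name)
                if os.access(path, os.X_OK):
                    return path
    return None
```

Returning `None` instead of raising leaves the caller free to decide whether to run the installer or fail with a clear message.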

Implements the agreed revamp: an importable src/kgqa package, a 4-arm ablation (plain -> plain_rr -> graph -> graph_concepts), and the fairness fixes from the audit.

Science fixes:
- Single shared ChunkStore (identical corpus + per-section chunking across arms)
- Reranker promoted to its own arm so the graph never gets it as a hidden edge
- Label leakage removed: ingestion stores no question-derived title and no final_decision; graph context uses generic "=== STUDY n ===" labels
- MeSH Concepts/MENTIONS now used via a concept-hop arm (graph_concepts)
- Seeded random sampling (n=200) + paired McNemar significance test
- Fixed the NameError in the graph-expansion fallback

Repo hygiene:
- src/ package, scripts/ (ingest, run_benchmark, compare), thin Colab notebooks
- tests/ (17 pytest cases, CPU-only via fakes), ruff config, GitHub Actions CI
- README, requirements, .gitignore, LICENSE (MIT), .env.example
- Docs (PDF/PPTX) moved to docs/; .DS_Store untracked; superseded files removed
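For readers unfamiliar with the paired McNemar test mentioned above, here is a minimal sketch of its exact (binomial) form. Whether the repository uses the exact form, a chi-square approximation, or a library routine is not stated in this PR, so treat the function name `mcnemar_exact` and the choice of variant as assumptions.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for two arms scored on the same questions."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both arms must be scored on the same questions")
    # Only discordant pairs (one arm right, the other wrong) carry information.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the arms never disagree, so there is no evidence of a difference
    k = min(a_only, b_only)
    # Under H0 each discordant pair favours either arm with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Because the test is paired, it only makes sense when every arm answers the same seeded sample of questions, which is what the shared sampling in this commit provides.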

- concept-hop AQL now ranks neighbours by shared-concept count first and reconstructs abstracts only for the top-N (it was building an abstract for every candidate on every query, 200x on a full run)
- remove faiss-cpu: PlainRAG now uses the shared numpy-cosine ChunkStore
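The rank-first, reconstruct-later idea behind the concept-hop change can be illustrated in plain Python. This is only a sketch of the ordering logic; the real implementation is an AQL query, and the names `top_concept_neighbours`, `doc_concepts`, and `build_abstract` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def top_concept_neighbours(seed_concepts: Set[str],
                           doc_concepts: Dict[str, Set[str]],
                           build_abstract: Callable[[str], str],
                           top_n: int = 3,
                           exclude: Iterable[str] = ()) -> List[Tuple[str, int, str]]:
    """Rank documents by shared-concept count, then build abstracts for the top N only."""
    excluded = set(exclude)
    # Cheap step: count shared concepts for every candidate.
    shared = {
        doc_id: len(seed_concepts & concepts)
        for doc_id, concepts in doc_concepts.items()
        if doc_id not in excluded
    }
    # Keep only real neighbours; break ties by id so the order is deterministic.
    ranked = sorted((d for d, n in shared.items() if n > 0),
                    key=lambda d: (-shared[d], d))
    # Expensive step: reconstruct abstracts for the winners, not for every candidate.
    return [(d, shared[d], build_abstract(d)) for d in ranked[:top_n]]
```

The cost of `build_abstract` is now bounded by `top_n` per query rather than by the number of candidates.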

- default ARANGO_HOST -> new Oasis deployment (581c546a8d66), overridable via env
- notebooks request A100 GPU + High-RAM; add nvidia-smi check and a labeled-only ingest smoke test before the full run

A single Ollama 500/timeout previously aborted an entire arm, and since the server is shared across arms, one crash cascaded into all four failing.

- run_benchmark: per-question retry (3x) with automatic Ollama restart between attempts; checkpoint results every 25 questions; the script now owns Ollama health (health-check + (re)start + warm) instead of relying on a one-shot start
- llm: cap generation via num_predict and keep the model resident (keep_alive), so a runaway reasoning chain can't stall/crash the server
- config: LLM_NUM_CTX / LLM_NUM_PREDICT / LLM_KEEP_ALIVE / LLM_TIMEOUT now env-tunable (shrink on small-VRAM GPUs); defaults 4096 / 1024 / 30m / 180s
- notebook: drop --no-ollama-start so the runner can self-heal; add a GPU-memory tuning hint
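A rough sketch of the retry-and-checkpoint loop described above, assuming a callable that answers one question and another that restarts the server. The names `run_arm`, `answer_fn`, and `restart_fn` and the JSON checkpoint layout are illustrative assumptions, not the actual `run_benchmark` code.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional


def run_arm(questions: Iterable[str],
            answer_fn: Callable[[str], str],
            restart_fn: Callable[[], None],
            checkpoint_path: str,
            max_attempts: int = 3,
            checkpoint_every: int = 25,
            backoff_s: float = 0.0) -> List[Dict[str, Optional[str]]]:
    """Answer each question with retries; a failure costs one record, not the arm."""
    results: List[Dict[str, Optional[str]]] = []
    for i, question in enumerate(questions, start=1):
        record: Dict[str, Optional[str]] = {"question": question, "answer": None, "error": None}
        for attempt in range(1, max_attempts + 1):
            try:
                record["answer"] = answer_fn(question)
                record["error"] = None
                break
            except Exception as exc:  # e.g. an HTTP 500 or a timeout from the LLM server
                record["error"] = f"{type(exc).__name__}: {exc}"
                if attempt < max_attempts:
                    restart_fn()  # bring the server back before the next attempt
                    time.sleep(backoff_s)
        results.append(record)
        if i % checkpoint_every == 0:
            Path(checkpoint_path).write_text(json.dumps(results))
    # Always write a final checkpoint so a partial last batch is not lost.
    Path(checkpoint_path).write_text(json.dumps(results))
    return results
```

A question that still fails after the last attempt is recorded with its error and the loop moves on, so the remaining questions and the other arms are unaffected.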

- clone cell now %cd /content + rm -rf before clone, so re-running can't nest a second checkout (caused a doubled results path)
- benchmark secrets cell sets LLM_NUM_CTX=8192 / LLM_NUM_PREDICT=1024 for the 80GB A100: full graph context, bounded generation (~halves runtime; identical across arms so the comparison is unaffected)

n=200, seed 42, deepseek-r1:8b on A100. Parent-document expansion is the decisive win (plain_rr -> graph +22.5pp, McNemar p<0.0001); reranker +7pp (n.s.); concept-hop -2pp (n.s.) at 5x latency. plain/plain_rr fall below the majority baseline — context sufficiency, which the graph supplies, dominates.

- results/ablation.png + results/summary.md generated from the n=200 run
- revert notebooks/02_benchmark.ipynb to the clean (output-free) wrapper that Colab's "Created using Colab" save had filled with execution outputs

Bring the repo up to industry/community standards for a public release:

- Makefile (install/test/lint/ingest/benchmark/compare/clean) + `make help`
- CONTRIBUTING, CODE_OF_CONDUCT, SECURITY, CHANGELOG, CITATION.cff
- .github/ ISSUE_TEMPLATE (bug, feature, config) + PULL_REQUEST_TEMPLATE
- assets/architecture.svg + README badges and embedded diagram
- .pre-commit-config.yaml (ruff + hygiene hooks), .editorconfig
- .gitattributes to enforce LF (Makefile-safe) and tidy linguist stats
- richer pyproject metadata (authors, urls, classifiers, keywords, dev extras)

Deliberately no configs/ dir: configuration lives in src/kgqa/config.py (typed + env-overridable), which is documented in CONTRIBUTING.

So anyone can run it out of the box:

- default ARANGO_HOST is now http://localhost:8529 (no specific deployment baked in)
- add docker-compose.yml for a one-command local ArangoDB
- .env.example and README document both paths (local Docker / cloud Oasis)
- notebooks read ARANGO_HOST + ARANGO_PASS from Colab Secrets (nothing hardcoded) and clone the main branch
- README setup/run instructions generalized
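A typed, env-overridable configuration along these lines might look like the sketch below. The variable names and defaults (ARANGO_HOST, ARANGO_PASS, and the LLM_* settings with defaults 4096 / 1024 / 30m / 180s) come from the commit messages in this PR; the `Settings` dataclass and `load_settings` helper are hypothetical and are not the contents of src/kgqa/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    arango_host: str
    arango_pass: str
    llm_num_ctx: int
    llm_num_predict: int
    llm_keep_alive: str
    llm_timeout: float  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    return Settings(
        arango_host=env.get("ARANGO_HOST", "http://localhost:8529"),
        arango_pass=env.get("ARANGO_PASS", ""),  # never hardcode a real password
        llm_num_ctx=int(env.get("LLM_NUM_CTX", "4096")),
        llm_num_predict=int(env.get("LLM_NUM_PREDICT", "1024")),
        llm_keep_alive=env.get("LLM_KEEP_ALIVE", "30m"),
        llm_timeout=float(env.get("LLM_TIMEOUT", "180")),
    )
```

For example, `load_settings({"LLM_NUM_CTX": "8192"})` mirrors the A100 override from the benchmark notebook while leaving every other value at its default.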

- app/chat_app.py: live GraphRAG chat over the winning `graph` arm; answers cite source PubMed IDs (--share for a public link, --concepts for the concept arm)
- app/dashboard.py: Streamlit dashboard of the ablation (bars, McNemar, per-class confusion matrices when raw results are present); reads results/ only, so it deploys to Streamlit Cloud
- BaseRetriever.chat(): conversational answer + retrieved source pubids (tested)
- compare.py now also emits results/summary.json (structured metrics) for the dashboard
- requirements-app.txt, make chat/dashboard/install-app, app/ added to ruff + CI
- README "Interactive demo & dashboard" section + app/README deployment notes

The dashboard is the hosted public demo (no LLM/DB/GPU; it reads results/ only). The chat stays local/Colab (it needs a GPU + a persistent ArangoDB).

- app/requirements.txt: light deploy deps (streamlit/pandas/scikit-learn). Streamlit Cloud reads the entrypoint's dir first, so this is used and the heavy root requirements.txt is ignored for the deploy, so benchmark users keep their full deps.
- .streamlit/config.toml: clean light theme matching the ablation figure.
- dashboard: richer set_page_config (icon, About/menu links, methodology expander); per-class section degrades cleanly when raw results are absent.
- README: "Open in Streamlit" badge + "Live demo" section; app/README has the exact Streamlit Cloud deploy steps.

…mlit.io)

Match the presentation style of the EthicLens repo — centered title with emoji + tagline, a badge row, a quick-nav line (live demo / results / why / setup), and emoji on section headers. Nav-target headings kept plain so anchors stay stable.

- new unit tests for data sampling, the Ollama client, evaluation report/save, and ChunkStore.from_dataset (mocking the datasets/requests boundaries); 18 -> 24 tests
- CI runs pytest --cov and uploads to Codecov; README gets a coverage badge
- pytest-cov added to dev deps; coverage config in pyproject

- docs/index.md + docs/_config.yml (Jekyll Cayman theme): a landing page with the architecture diagram, results table, ablation figure, honest findings, and links to the demo / report / slides
- README: docs badge + a Documentation section

Enable via Settings -> Pages -> Deploy from branch -> main -> /docs.

- assets/dashboard.png: real screenshot of the running dashboard, embedded (clickable) in the README Live demo section
- dashboard: group the accuracy/macro-F1 bars (were misleadingly stacked) and replace the redundant figure with a horizontal latency-by-arm chart
- bump deploy pin to streamlit>=1.39 (grouped/horizontal bar options)

- assets/chat.png: the Gradio chat interface (title, description, example questions), embedded in the README chat section
- chat_app.py: drop the 'theme' kwarg, since gr.ChatInterface no longer accepts it on Gradio 5/6 (requirements pin gradio>=4), so a fresh install would have crashed

The app deployed to Streamlit's auto-generated subdomain rather than the custom kgqa-ablation; repoint the README badge/nav/screenshot link, the docs site links, and the About website to the live URL so nothing 404s.

Codecov rejected tokenless uploads ('Token required - not valid tokenless upload'), so the badge stayed unknown. Use the repo upload token via the CODECOV_TOKEN secret.

vardhjain and others added 13 commits June 11, 2026 17:32

Fix ensure_ollama PATH on Colab — use shutil.which with /usr/local/bi…

fdc5cdc

…n fallback

Point at new ArangoDB deployment; tune notebooks for Colab Pro A100

45a3b3e


Created using Colab

704cdeb

vardhjain marked this pull request as draft June 13, 2026 16:06

vardhjain added 2 commits June 13, 2026 12:23

vardhjain force-pushed the main branch from 43f1272 to 85522ac Compare June 13, 2026 17:02

vardhjain added 8 commits June 13, 2026 13:02

Use shields.io badge for the live demo (more robust than static.strea…

c308933

…mlit.io)

Point live-demo links at the deployed Streamlit URL

cde8767


CI: pass CODECOV_TOKEN and bump codecov-action to v5

d3b2f35


vardhjain wants to merge 23 commits into Akash-Raghavendra:main from vardhjain:main
