Pull ML conference paper lists, filter for the Sutro Group lens (energy-efficient training + broader training-efficiency), and produce an annotated, browsable literature base.
Live site: https://0bserver07.github.io/iclr-lit-builder/
Source: papercopilot/paperlists. Starts with ICLR 2026 (~20K records); generalizes to NeurIPS, ICML, etc.
| Metric | Count |
|---|---|
| Papers ingested | 19,813 |
| Keyword-filtered (sent to LLM) | 4,842 |
| Score 3 (directly Sutro-relevant) | 40 |
| Score 2 (relevant) | 25 |
| Score 1 (tangential) | 134 |
| Score 0 (rejected) | 4,643 |
| Markdown rendered | 4,196 files |
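As a quick sanity check on the funnel, the figures in the table can be plugged into a few lines of Python. This is only a sketch using the counts reported above; the variable names are illustrative and not part of the project.

```python
# Counts copied from the table above; names are illustrative only.
ingested = 19_813
keyword_filtered = 4_842
scores = {3: 40, 2: 25, 1: 134, 0: 4_643}

# Every keyword-filtered paper should carry exactly one LLM score.
assert sum(scores.values()) == keyword_filtered

keyword_pass_rate = keyword_filtered / ingested
relevant = scores[3] + scores[2]
relevant_rate = relevant / keyword_filtered

print(f"keyword pass rate: {keyword_pass_rate:.1%}")
print(f"score >= 2: {relevant} papers ({relevant_rate:.1%} of LLM-scored)")
```

Roughly a quarter of ingested papers survive the keyword filter, and a little over one percent of those end up at score 2 or 3.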
fetch → ingest → filter (keywords) → score (LLM) → deepen (on demand) → render
Each stage is a CLI subcommand and writes to a SQLite database (data/db/lit.sqlite). The markdown / MkDocs site is generated from the DB.
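Because every stage writes to the same SQLite file, the database can also be inspected directly. The sketch below shows one way to compute a score histogram with the standard `sqlite3` module. The table name `papers` and the columns `venue` and `score` are assumptions made for illustration; the real DDL lives in `models.py` and may differ. The demo runs against an in-memory database rather than `data/db/lit.sqlite`.

```python
import sqlite3


def score_histogram(conn, venue):
    """Return {score: count} for one venue, skipping unscored rows.

    Assumes a hypothetical `papers(venue, score)` table; adjust to the real schema.
    """
    rows = conn.execute(
        "SELECT score, COUNT(*) FROM papers "
        "WHERE venue = ? AND score IS NOT NULL "
        "GROUP BY score ORDER BY score DESC",
        (venue,),
    ).fetchall()
    return dict(rows)


# Demo against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (id TEXT PRIMARY KEY, venue TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?)",
    [
        ("a", "iclr2026", 3),
        ("b", "iclr2026", 0),
        ("c", "iclr2026", 0),
        ("d", "iclr2026", None),  # filtered but not yet scored
    ],
)
histogram = score_histogram(conn, "iclr2026")
print(histogram)
```

To try it on the real database, replace `":memory:"` with the path to `data/db/lit.sqlite` and drop the demo inserts, after checking the actual table and column names.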
The scoring stage uses an LLM. Two providers are supported, controlled by LIT_PROVIDER. Override the model at any time with LIT_MODEL=<name>.
# Anthropic (default)
export LIT_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# default model: claude-haiku-4-5-20251001
# override: export LIT_MODEL=claude-sonnet-4-6
# Ollama Cloud
export LIT_PROVIDER=ollama
export OLLAMA_API_KEY=...
# default model: deepseek-v4-pro:cloud
# Ollama local (no key needed)
export LIT_PROVIDER=ollama
export OLLAMA_HOST=http://localhost:11434
export LIT_MODEL=llama3.1:8b

The models below are verified to work with the scoring prompt (lit score) and the deepen prompt (lit deepen). Swap with LIT_MODEL=<name>.
| Model | Notes |
|---|---|
| deepseek-v4-pro:cloud | Default. Reasoning model; ~5s per paper at 200-token limit. Best price/quality. |
| deepseek-v4-flash:cloud | Faster, lower latency, slightly less robust on edge cases. |
| gpt-oss:120b | Strong general scorer. Slightly heavier than deepseek-v4-pro. |
| qwen3:235b-cloud | Largest. Best for the deepen stage on borderline papers. |
| llama3.1:70b | Solid baseline; available locally too. |
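The provider and model selection described above can be pictured as a small lookup: `LIT_PROVIDER` picks the provider, and `LIT_MODEL` overrides that provider's default model. The following is a minimal sketch of that precedence, not the project's actual code. The function name `resolve_llm_config` is hypothetical; the default model names are the ones stated in this README.

```python
import os

# Per-provider defaults as documented in this README.
DEFAULT_MODELS = {
    "anthropic": "claude-haiku-4-5-20251001",
    "ollama": "deepseek-v4-pro:cloud",
}


def resolve_llm_config(env=None):
    """Return (provider, model) from LIT_PROVIDER / LIT_MODEL.

    Hypothetical helper: LIT_MODEL wins when set, otherwise the provider default applies.
    """
    env = os.environ if env is None else env
    provider = env.get("LIT_PROVIDER", "anthropic").strip().lower()
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unsupported LIT_PROVIDER: {provider!r}")
    model = env.get("LIT_MODEL") or DEFAULT_MODELS[provider]
    return provider, model
```

For example, `resolve_llm_config({"LIT_PROVIDER": "ollama", "LIT_MODEL": "llama3.1:8b"})` returns `("ollama", "llama3.1:8b")`, while an empty environment falls back to the Anthropic default.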
LIT_MODEL=llama3.1:8b # 4.7 GB, fast, decent
LIT_MODEL=qwen3:14b # 8 GB, better reasoning
LIT_MODEL=deepseek-v4:7b # 4 GB, distilled reasoning model

pip install -e .
# or with uv: uv sync && uv run lit ...
lit fetch iclr2026
lit ingest iclr2026
lit filter iclr2026 # keyword pre-filter
lit score iclr2026 --limit 200 # LLM triage on survivors (0–3 + reason)
lit list iclr2026 --min-score 2 # browse high-relevance
lit deepen iclr2026 <paper_id> # structured digest on demand
lit render iclr2026 # write markdown + mkdocs nav
lit serve # local mkdocs preview

Scoring 5 ICLR 2026 candidates on deepseek-v4-pro:cloud (Ollama Cloud), ~5s per paper:
| score | title | reason |
|---|---|---|
| 2 | PersonalQ: Select, Quantize, and Serve Personalized Diffusion | Quantization technique for personalized diffusion models that reduces inference memory, aligning with low-precision research. |
| 2 | Reassessing Layer Pruning in LLMs | Layer pruning to reduce computation, directly addressing efficiency and model compression. |
| 1 | Toward Unifying Group Fairness Evaluation from a Sparsity Perspective | References sparsity but only as a lens for fairness evaluation, not as a contribution to training efficiency. |
| 1 | Early Layer Readouts for Robust Knowledge Distillation | Domain generalization via adaptive distillation, only tangential efficiency link. |
| 0 | Concept Alignment for Autonomous Distillation | Robustness and bias mitigation, not energy-efficient training. |
The CLI is designed to be called by other coding agents (Codex, Claude Code). Every command takes positional args, exits non-zero on error, and prints structured key=value output. See lit --help.
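An agent or script calling the CLI might wrap it roughly as follows. This sketch assumes each output line is a space-separated run of `key=value` pairs, with shell-style quoting for values that contain spaces; the exact output format is not specified here, so check `lit --help` and real output before relying on it. The helper names `parse_kv` and `run_lit` are hypothetical.

```python
import shlex
import subprocess


def parse_kv(line):
    """Parse 'k=v k2="v with spaces"' into a dict; tokens without '=' are ignored."""
    result = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep and key:
            result[key] = value
    return result


def run_lit(*args):
    """Run `lit <args>` and return one dict per non-empty output line.

    check=True turns a non-zero exit code into CalledProcessError,
    which matches the CLI's exit-non-zero-on-error contract.
    """
    proc = subprocess.run(
        ["lit", *args], capture_output=True, text=True, check=True
    )
    return [parse_kv(line) for line in proc.stdout.splitlines() if line.strip()]
```

With the package installed, `run_lit("list", "iclr2026", "--min-score", "2")` would return the parsed records, and a failed command raises instead of returning partial output.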
src/lit_builder/
models.py # shared dataclasses + SQLite DDL
config.py # paths, venue registry
data/ # papercopilot fetch + SQLite ingest
filter/ # keyword matcher
score/ # LLM scorer + deepener (Anthropic / Ollama)
render/ # markdown + mkdocs export
cli/ # typer commands
configs/keywords.yaml # editable keyword groups
| Stage | iclr2026 |
|---|---|
| fetch | done — 19,813 raw records (93 MB) |
| ingest | done — 19,813 in DB |
| filter | done — 4,842 keyword candidates |
| score | done — 4,842 / 4,842 LLM-scored via deepseek-v4-pro:cloud (40 at score 3, 25 at 2, 134 at 1, 4,643 at 0) |
| deepen | implemented; on-demand per paper |
| render | done — 4,196 markdown pages + index |
| publish | live at https://0bserver07.github.io/iclr-lit-builder/ |
pip install pytest
PYTHONPATH=src python3 -m pytest tests -q # 33 tests, all mocked LLM