RawToWise

LLM Knowledge Compiler — drop raw documents, get a structured markdown wiki.

Install · Quick Start · How It Works · Contributing

raw/ (papers, articles, URLs)
  → rtw compile → wiki/ (structured .md with backlinks)
                    → rtw query → answers saved in output/
                    → rtw lint  → detect contradictions, fill gaps

Inspired by Andrej Karpathy's LLM knowledge base workflow, RawToWise turns his "hacky collection of scripts" into a real tool.

Why RawToWise?

| Problem | RawToWise |
| --- | --- |
| RAG requires vector DB infra | No vector DB — LLM navigates via index + backlinks |
| Chat answers disappear | Exploration = accumulation — every query can be saved and revisited |
| PKM requires manual organizing | Drop and forget — put files in `raw/`, LLM handles the rest |
| Vendor lock-in (NotebookLM, etc.) | Plain markdown — works in Obsidian, VSCode, or any editor |

Install

curl -fsSL https://raw.githubusercontent.com/vericontext/rawtowise/main/install.sh | bash

Other install methods

# Via pipx
pipx install git+https://github.com/vericontext/rawtowise.git

# Via uv
uv tool install git+https://github.com/vericontext/rawtowise.git

# From source
git clone https://github.com/vericontext/rawtowise.git && cd rawtowise && pip install -e .

RawToWise can run through your logged-in Codex or Claude Code CLI session, so a separate API key is not required for local use. Direct Anthropic API usage is still supported with ANTHROPIC_API_KEY.

Quick Start

# 1. Initialize a project
rtw init --name "AI Research"

# 2. Ingest sources
rtw ingest https://example.com/article
rtw ingest "https://en.wikipedia.org/wiki/Transformer_(deep_learning)"
rtw ingest paper.pdf
rtw ingest ./my-articles/

# 3. Compile into a wiki
rtw compile

# 4. Ask questions
rtw query "What are the key debates in this field?"

# 5. Health check
rtw lint

How It Works

Ingest — Fetch URLs (via Jina Reader), copy local files into raw/, and convert supported document formats to Markdown with MarkItDown. Raw sources stay intact; processed Markdown and source metadata live under .rtw/.

Compile — LLM extracts key concepts from all compilable sources, generates interlinked wiki articles with [[backlinks]] and [source: source_id:Lx-Ly] citations, and builds an index. Articles are generated in parallel for speed. Incremental compiles use source hashes to skip unchanged inputs.
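
The incremental behavior described above can be pictured with a small hash comparison. This is only a sketch under stated assumptions: the manifest layout (a flat JSON object mapping relative paths to SHA-256 digests) and the helper names `file_sha256` and `changed_sources` are hypothetical and are not taken from the RawToWise source.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_sources(raw_dir: Path, manifest_path: Path) -> list[Path]:
    """List files under raw_dir whose hash differs from the manifest entry.

    Assumed (hypothetical) manifest layout: {"relative/path": "sha256 hex"}.
    Files missing from the manifest count as changed.
    """
    known = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        key = path.relative_to(raw_dir).as_posix()
        if known.get(key) != file_sha256(path):
            changed.append(path)
    return changed
```

Only the paths returned by `changed_sources` would need to be sent to the LLM again; everything else can be skipped.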

Query — LLM reads the wiki index, finds relevant articles, and synthesizes an answer. Answers are printed to the terminal and saved to output/ for future reference; direct API backends stream when supported.

Lint — LLM audits the wiki for contradictions, coverage gaps, stale information, and suggested explorations. RawToWise also checks dangling wikilinks, uncited concept pages, orphan pages, and stale source hashes.
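
The non-LLM checks listed above (dangling wikilinks, uncited concept pages, orphan pages) are simple enough to sketch without a model. The example below is illustrative, not the project's implementation: the function name `structural_lint`, the in-memory `{page_name: text}` input, and the concrete citation shape `[source: some_id:L3-L9]` are assumptions.

```python
import re

# [[target]], [[target|label]] or [[target#heading]] -> captures "target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Assumed concrete form of the documented [source: source_id:Lx-Ly] citation.
CITATION = re.compile(r"\[source:\s*[^\]\s:]+:L\d+-L?\d+\]")


def structural_lint(pages: dict[str, str]) -> dict[str, list[str]]:
    """Check a {page_name: markdown_text} mapping for three structural issues."""
    linked = set()
    dangling = set()
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target == name:
                continue  # a self-link does not count as an inbound link
            linked.add(target)
            if target not in pages:
                dangling.add(f"{name} -> {target}")
    return {
        "dangling": sorted(dangling),
        "uncited": sorted(n for n, t in pages.items() if not CITATION.search(t)),
        "orphans": sorted(n for n in pages if n not in linked),
    }
```

A page is treated as an orphan when no other page links to it, and as uncited when it contains no citation at all.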

Commands

| Command | Description |
| --- | --- |
| `rtw init` | Initialize a new project (creates dirs + config, detects LLM backend) |
| `rtw ingest <source>` | Ingest URL, file, or directory into `raw/` |
| `rtw compile` | Compile sources into wiki (incremental by default) |
| `rtw compile --full` | Full recompile from scratch |
| `rtw compile --dry-run` | Estimate compile input size and direct API cost |
| `rtw query "question"` | Ask the wiki |
| `rtw query "..." --format table` | Output as markdown table |
| `rtw query "..." --deep` | Deep research mode (longer output) |
| `rtw lint` | Run wiki health check |
| `rtw stats` | Show wiki statistics |

Project Structure

my-research/
├── rtw.yaml              # Configuration
├── .env                  # Optional API key overrides (gitignored)
├── raw/                  # Raw sources — you add files here
│   ├── articles/         #   Web articles (auto-sorted)
│   ├── papers/           #   PDFs (auto-sorted)
│   ├── documents/        #   Office/ePub docs
│   └── data/             #   CSV/JSON/XML/Excel files
├── wiki/                 # LLM-generated wiki — don't edit manually
│   ├── AGENTS.md         #   Wiki schema + maintenance rules
│   ├── _index.md         #   Master index
│   ├── _sources.md       #   Source catalog
│   ├── log.md            #   Append-only operation log
│   └── concepts/         #   Concept articles with [[backlinks]]
├── output/               # Query results
│   └── queries/          #   Saved answers
└── .rtw/                 # Internal state (manifest, processed markdown, debug logs)
    ├── sources.json      #   Source manifest with hashes/provenance
    └── processed/        #   Markdown converted from PDFs/Office/etc.
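
The "auto-sorted" subfolders under `raw/` suggest a routing rule keyed on the kind of source. A rough sketch of such a rule follows. The four folder names come from the tree above, but the extension lists, the `raw_subdir` helper, and the fallback to `articles` for unknown file types are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumed extension lists; the tree above names only the four categories.
SUBDIR_BY_SUFFIX = {
    ".pdf": "papers",
    ".docx": "documents",
    ".pptx": "documents",
    ".epub": "documents",
    ".csv": "data",
    ".json": "data",
    ".xml": "data",
    ".xlsx": "data",
}


def raw_subdir(source: str) -> str:
    """Pick the raw/ subfolder for a URL or a local file path."""
    if source.startswith(("http://", "https://")):
        return "articles"  # web sources are treated as articles
    suffix = PurePosixPath(source).suffix.lower()
    # Hypothetical fallback: unknown file types are filed with articles.
    return SUBDIR_BY_SUFFIX.get(suffix, "articles")
```

For example, `raw_subdir("paper.pdf")` gives `"papers"` and `raw_subdir("https://example.com/article")` gives `"articles"`.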

Configuration

rtw.yaml (auto-generated by rtw init):

version: 1
name: "My Research"

llm:
  provider: auto                   # auto, anthropic, codex, or claude-code
  compile: claude-sonnet-4-6      # Fast model for compilation
  query: claude-sonnet-4-6        # Query answering
  lint: claude-haiku-4-5-20251001 # Economical model for health checks
  codex_model: ""                 # Optional Codex model override
  claude_code_model: ""           # Optional Claude Code model override
  timeout_seconds: 600

compile:
  strategy: incremental
  max_concepts: 200
  language: en                    # Wiki language

llm.provider: auto resolves in this order:

1. Active Codex session + codex CLI
2. Active Claude Code session + claude CLI
3. ANTHROPIC_API_KEY
4. Installed Claude Code CLI
5. Installed Codex CLI
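
The resolution order above, together with the `RAWTOWISE_LLM_PROVIDER` override documented in this README, can be written as a small pure function. This is a sketch, not the project's code: `resolve_provider` is a hypothetical name, and because the README does not say how an "active session" is detected, that information is passed in by the caller.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def resolve_provider(
    env: Mapping[str, str],
    has_cli: Callable[[str], bool],
    active_sessions: frozenset[str] = frozenset(),
) -> Optional[str]:
    """Return "codex", "claude-code", "anthropic", or None if nothing is usable.

    active_sessions is supplied by the caller; how RawToWise detects an
    active session is not documented here, so it is left abstract.
    """
    forced = env.get("RAWTOWISE_LLM_PROVIDER")
    if forced:
        return forced  # an explicit override wins over auto-detection
    if "codex" in active_sessions and has_cli("codex"):
        return "codex"
    if "claude-code" in active_sessions and has_cli("claude"):
        return "claude-code"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if has_cli("claude"):
        return "claude-code"
    if has_cli("codex"):
        return "codex"
    return None


if __name__ == "__main__":
    # Check the real environment, using PATH lookups for the CLIs.
    print(resolve_provider(os.environ, lambda name: shutil.which(name) is not None))
```

Passing `env` and `has_cli` as arguments keeps the function easy to test without touching the real environment.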

You can force a backend per run:

RAWTOWISE_LLM_PROVIDER=codex rtw compile
RAWTOWISE_LLM_PROVIDER=claude-code rtw query "..."
RAWTOWISE_LLM_PROVIDER=anthropic rtw lint

Agent-Assisted Development

This repository is set up for both Codex and Claude Code:

AGENTS.md — shared repository instructions for Codex, Claude Code, and other agents
CLAUDE.md — Claude Code entry point that delegates to AGENTS.md
.codex/config.toml — project-scoped Codex defaults and hook enablement
.codex/hooks.json + .codex/hooks/ — Codex hooks for version sync, patch auto-bump on git commit, and destructive command guards
.claude/ — Claude Code hooks for the same version sync / auto-bump workflow

Keep personal choices such as model, auth method, sandbox, approval policy, telemetry, and MCP servers in your user-level Codex or Claude Code config. Project hooks may require starting a new trusted Codex session before they load.

Viewing the Wiki

The compiled wiki is plain markdown with [[wiki-links]]. Best viewed with:

Obsidian — open wiki/ as a vault. Graph view shows concept connections.
VSCode + Foam — [[backlink]] support with graph visualization.
Any markdown viewer — files are standard .md, readable anywhere.

Cost

RawToWise can use your logged-in Codex or Claude Code CLI session. Those backends follow your CLI account's subscription, rate limit, or usage policy. If you set llm.provider: anthropic or ANTHROPIC_API_KEY, RawToWise calls the Anthropic API directly and API billing applies.

| Operation | Anthropic API estimate |
| --- | --- |
| Ingest URL/file | No LLM API call |
| Compile 5 sources | ~$1-2 |
| Single query | ~$0.05-0.15 |
| Lint | ~$0.50 |

Use rtw compile --dry-run to estimate compile input size before compiling. Cost estimates are only meaningful for direct API backends.
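
For intuition about what an input-size estimate involves, here is a back-of-the-envelope sketch. It is not how `rtw compile --dry-run` is implemented. The four-characters-per-token ratio is a rough heuristic for English text, the price is supplied by the caller (no real price is assumed), and `estimate_input_cost` is a hypothetical helper that covers input tokens only.

```python
def estimate_input_cost(
    texts: list[str],
    usd_per_million_input_tokens: float,
    chars_per_token: float = 4.0,
) -> tuple[int, float]:
    """Return (approx_tokens, approx_usd) for the given source texts.

    Output tokens are not counted, so a real bill would be higher.
    """
    total_chars = sum(len(t) for t in texts)
    tokens = round(total_chars / chars_per_token)
    return tokens, tokens * usd_per_million_input_tokens / 1_000_000
```

With two 4,000-character sources and a caller-supplied price of 2.0 USD per million input tokens, this returns roughly 2,000 tokens and 0.004 USD.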

Roadmap

See open issues labeled roadmap for planned features, including:

YouTube transcript support
Review/approval mode for generated wiki edits
Hybrid local search (BM25/vector/rerank)
Ollama/local model support
Obsidian plugin
MCP server for AI agents

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Uninstall

curl -fsSL https://raw.githubusercontent.com/vericontext/rawtowise/main/uninstall.sh | bash

License

MIT
