Atomic Notes transforms rich sources into verified atomic knowledge units.
The current implementation starts with PDFs, but the project is meant to be input- and output-independent: source adapters normalize different media into a common source representation, pipelines create atomic notes, and renderers/exporters decide where those notes go.
source input -> normalized source -> atomic-note pipeline -> output renderer
PDF input and Obsidian-style Markdown are the first supported path, not the whole product.
generative v0.3.136 · extractive v0.2.0 · last updated: 2026-06-10
436 tests passing (CI: ubuntu + windows). An independent multi-rater project assessment
(2026-06-10) and the resulting roadmap live in internal/docs/ — see
2026-06-10-projekt-bewertung.md and m1-installierbarkeit-plan.md.
All commands below assume you are in the repository root after cloning — the
example file and .env paths are repo-relative, not part of an installed wheel.
git clone https://github.com/TillQuandel/atomic-notes.git
cd atomic-notes
pip install -e .

poppler-utils (required for PDF text extraction via pdftotext):
| Platform | Command |
|---|---|
| Ubuntu/Debian | sudo apt install poppler-utils |
| macOS | brew install poppler |
| Windows | choco install poppler or scoop install poppler |
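To confirm from Python that the dependency is visible, it is enough to look for the `pdftotext` binary on `PATH`. This is a minimal sketch, not a description of what `atomic-notes doctor` checks internally; the helper name `find_pdftotext` is hypothetical.

```python
import shutil
from typing import Optional


def find_pdftotext() -> Optional[str]:
    """Return the full path of the pdftotext binary, or None if it is not on PATH."""
    return shutil.which("pdftotext")


if __name__ == "__main__":
    path = find_pdftotext()
    if path is None:
        print("pdftotext not found: install poppler for your platform first")
    else:
        print(f"pdftotext found at {path}")
```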
The default backend drives the Claude Code CLI — no API key needed. Install the CLI and log in once:
npm install -g @anthropic-ai/claude-code # or follow the official install docs
claude login

For an API-based backend (Anthropic, OpenAI, Ollama, …) set
ATOMIC_AGENT_BACKEND=litellm and add a provider key. See
generative/README.md for full backend documentation.
Privacy: the litellm backend sends PDF text to the configured external API (e.g. Anthropic/OpenAI). For a fully local path where the text never leaves your machine, use the extractive pipeline (see Pipelines below) or a local litellm provider such as Ollama. The default subscription backend uses your own Claude account.
Copy the example env file and fill in your paths:
cp generative/.env.example generative/.env
# edit generative/.env: set ATOMIC_AGENT_VAULT_PATH to your Obsidian vault

Generated notes land in the Obsidian vault directory configured via
ATOMIC_AGENT_VAULT_PATH in generative/.env.
atomic-notes doctor

Start with --dry-run. It shows what would be generated, including a slim
Markdown diff of any note that a re-run would overwrite, without writing or
changing a single file. Once the preview looks right, drop the flag for the real
run.
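The overwrite preview can be pictured as a plain unified diff between the note already in the vault and the note a re-run would produce. The sketch below uses the standard-library `difflib` and is only an illustration of the idea; the function name `preview_overwrite` and the file labels are hypothetical, and the actual `--dry-run` output format may differ.

```python
import difflib


def preview_overwrite(existing: str, regenerated: str, name: str = "note.md") -> str:
    """Return a unified diff of what a re-run would change, without writing anything."""
    diff = difflib.unified_diff(
        existing.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"{name} (in vault)",
        tofile=f"{name} (re-run)",
    )
    return "".join(diff)


if __name__ == "__main__":
    old = "# Atomic Note\n\nOne idea per note.\n"
    new = "# Atomic Note\n\nExactly one idea per note.\n"
    # An empty string would mean the re-run changes nothing.
    print(preview_overwrite(old, new))
```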
# recommended first run — shows what would be generated (and what a re-run would
# overwrite, as a diff) without writing any files
atomic-notes run --source examples/zettelkasten-primer.pdf --dry-run
# full run — writes atomic notes to your configured vault
atomic-notes run --source examples/zettelkasten-primer.pdf

A local web GUI wraps the same pipeline: pick a configured PDF or drag-and-drop (or upload) any PDF from your machine, watch live per-stage progress, and in dry-run mode preview each generated note (routing, critic score, confidence) before any write. It runs the CLI as a subprocess and streams progress over SSE: no React/npm, no telemetry, fully offline.
pip install -e '.[gui]' # FastAPI + uvicorn + python-multipart
atomic-notes gui # opens http://127.0.0.1:8052

It stands beside the read-only eval dashboard, not replacing it.
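For readers unfamiliar with SSE (Server-Sent Events), each progress update travels as a small text frame: `field: value` lines terminated by a blank line. The following sketch shows how one frame could be serialized. The event name and payload keys are assumptions made for illustration and are not the GUI's actual wire format.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Serialize one progress update as a Server-Sent Events frame."""
    # json.dumps without indent emits a single line, so one "data:" field suffices.
    data = json.dumps(payload, ensure_ascii=False)
    # A frame is a block of "field: value" lines ended by a blank line.
    return f"event: {event}\ndata: {data}\n\n"


if __name__ == "__main__":
    # Hypothetical stage name and payload shape.
    print(format_sse("stage", {"name": "critic", "status": "running"}), end="")
```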
Below is a real note the generative pipeline produced from the bundled
examples/zettelkasten-primer.pdf. Every claim carries a footnote anchored to a
source page, the source block is rendered deterministically from metadata (not
free-generated), and quality concerns are surfaced as quality-flags rather than
hidden. Exact output varies with the notes already in your vault and the metadata
the source exposes.
---
title: "Atomic Note"
aliases:
- "Atomare Note"
- "atomic note"
- "Zettelkasten-Grundeinheit"
type: atomic
synthesis-confidence: low
confidence-rationale: "nicht peer-reviewed (Methodische Limits); nur 1 Anker (Relevance)"
auto-vault-recommended: true
source-file: "zettelkasten-primer.pdf"
claude-generated: true
quality-flags:
- "⚠️ kein DOI — Qualität nicht automatisch prüfbar"
- "⚠️ Duplikat-Risiko hoch — prüfe: Atomic Notes"
created: 2026-06-17
tags:
- zettelkasten
- knowledge-management
related:
- "[[Atomic Notes]]"
- "[[Schema-Konzept]]"
---
# Atomic Note: Kleinstmögliche eigenständige Wissenseinheit mit genau einer Idee
Eine Atomic Note hält genau eine Idee fest und ist die kleinste Gedankeneinheit, die noch für sich allein verständlich ist[^1]. Der Begriff ist der Chemie entlehnt: wie ein Atom die kleinste Einheit mit den Eigenschaften eines Elements ist, ist eine Atomic Note die kleinste Informationseinheit, die noch ohne äußeren Kontext bedeutsam bleibt[^2].
Die Beschränkung auf eine Idee pro Note ist keine Einschränkung, sondern eine Design-Entscheidung[^4]. Wenn jede Note einen einzigen kohärenten Gedanken trägt, wird Retrieval (Wiederfinden) präzise und Rekombination möglich[^5].
> [!quote]- Zettelkasten-Primer 2026, S. 1
> „A note that mixes three ideas is hard to link to anything because it is always half-relevant."
[^1]: zettelkasten-primer, S. 1.
[^2]: zettelkasten-primer, S. 1.
[^4]: zettelkasten-primer, S. 1.
[^5]: zettelkasten-primer, S. 1.
## Quellen
*Quelle: zettelkasten-primer 2026: zettelkasten-primer, S. 1*

The body above is abridged for the README; a full run emits the complete note (all paragraphs and page anchors) to your configured vault.
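Because every claim is meant to carry a footnote, a quick consistency check on a generated note is to confirm that each `[^n]` reference has a matching `[^n]:` definition. The sketch below is a standalone illustration, not the pipeline's verifier; the function name `unanchored_footnotes` is hypothetical.

```python
import re

# A definition starts a line: "[^1]: source, S. 1."
DEFINITION = re.compile(r"^\[\^([^\]]+)\]:", re.MULTILINE)
# A reference can appear anywhere in the text: "claim[^1]."
REFERENCE = re.compile(r"\[\^([^\]]+)\]")


def unanchored_footnotes(markdown: str) -> set[str]:
    """Return footnote labels that are referenced in the text but never defined."""
    defined = set(DEFINITION.findall(markdown))
    # Strip the definition markers so they are not counted as references.
    body = DEFINITION.sub("", markdown)
    used = set(REFERENCE.findall(body))
    return used - defined


if __name__ == "__main__":
    note = "One idea per note[^1]. Links stay precise[^2].\n\n[^1]: zettelkasten-primer, S. 1.\n"
    print(unanchored_footnotes(note))
```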
| Field | Meaning |
|---|---|
| type | Note kind (atomic, merge-stub, …). |
| synthesis-confidence | Pipeline's confidence in the synthesis: high / medium / low. |
| confidence-rationale | Short reason for a low/medium confidence (only when set). |
| quality-flags | Concerns surfaced for review (e.g. no DOI, duplicate risk) rather than hidden. |
| source-file | The source PDF the note was generated from. |
| source-status | unresolved when the source identity (author/year) could not be confirmed, e.g. enrichment found none or a weak CrossRef match was rejected; the file is left untouched and the note flagged for review. |
| auto-vault-recommended | Whether the critic deems the note vault-ready; routing itself is tag-based via Auto Note Mover. |
| pipeline-content-hash | Checksum of the generated note; lets a re-run detect manual edits and avoid overwriting them. |
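The idea behind pipeline-content-hash can be sketched as follows: store a checksum when the note is generated, then compare it with the checksum of the file on disk before a re-run overwrites anything. SHA-256 and both function names are assumptions for illustration; the README does not say which algorithm or which part of the note the pipeline hashes.

```python
import hashlib


def content_hash(note_body: str) -> str:
    """Return a hex digest of the note text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(note_body.encode("utf-8")).hexdigest()


def was_edited_by_hand(current_body: str, stored_hash: str) -> bool:
    """True when the note on disk no longer matches the hash recorded at generation."""
    return content_hash(current_body) != stored_hash


if __name__ == "__main__":
    generated = "Eine Atomic Note hält genau eine Idee fest."
    stored = content_hash(generated)
    print(was_edited_by_hand(generated, stored))                    # unchanged note
    print(was_edited_by_hand(generated + " (my comment)", stored))  # edited note
```

A re-run that finds a mismatch can then skip the file or ask for confirmation instead of overwriting a manual edit.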
- M1 (installable by strangers): packaging, entry point, preflight doctor, hardened backend error paths, CI on ubuntu + windows, quickstart walkthrough, and bundled example are all done. M1 complete. Plan: internal/docs/m1-installierbarkeit-plan.md.
- M2 (trustworthy output): gold-standard coverage measurement, threshold calibration, PDF text-quality gate + OCR fallback, a small reproducible benchmark.
- M3 (staying power): configurable note conventions beyond Obsidian, REST/API layer (issues #9–#11).
generative/ LLM-based synthesis pipeline with verifier, critic, and quality gates
extractive/ Local extractive pipeline; no free generation, source sentences only
shared/ Shared schemas, database schema, and cross-pipeline utilities
tests/ Repository-level tests, currently focused on the extractive pipeline
internal/ Internal evaluation assets and development notes, not user-facing product
internal/dashboard/ is used for evaluation and debugging while developing the pipelines. It is not part of the public user workflow.
The generative pipeline synthesizes standalone atomic notes from source material. It uses LLM stages for planning, extraction, verification, cross-reference checks, and critique. This is the higher-quality path when synthesis is useful and API/model access is acceptable.
No API key is required: the default backend drives the Claude Code CLI, so a Claude
Pro/Max subscription plus a logged-in CLI is enough. An API-based backend (litellm:
Anthropic, OpenAI, Ollama, …) is available via ATOMIC_AGENT_BACKEND=litellm. See
generative/README.md for details and limits.
pip install -e .
atomic-notes doctor
atomic-notes run --source <pdf> --dry-run
atomic-notes run --source <pdf>

The extractive pipeline builds notes from source sentences. It is local-first and does not freely generate prose, so it is useful as a privacy-preserving baseline and as a low-hallucination comparison path.
python extractive/orchestrator.py --source <pdf> --output obsidian --out-dir ./notes
python extractive/orchestrator.py --source <pdf> --output json --out-dir ./notes

The long-term output contract is a structured atomic note: title, body, source anchors, source metadata, quality status, and optional links/tags. Obsidian Markdown is one renderer. Plain Markdown, JSON, ZIP exports, and other PKM formats should be renderer concerns rather than pipeline assumptions.
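One way to picture this contract is a single note structure that several renderers consume independently. The sketch below is hypothetical throughout: the class names, field names, and both renderer functions are illustrative assumptions and do not mirror the schemas in shared/.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceAnchor:
    source_file: str
    page: int


@dataclass
class AtomicNote:
    title: str
    body: str
    anchors: list[SourceAnchor]
    source_metadata: dict[str, str]
    quality_status: str
    links: list[str] = field(default_factory=list)  # optional
    tags: list[str] = field(default_factory=list)   # optional


def render_json(note: AtomicNote) -> str:
    """Renderer 1: serialize the whole note, nested anchors included, as JSON."""
    return json.dumps(asdict(note), ensure_ascii=False, indent=2)


def render_plain_markdown(note: AtomicNote) -> str:
    """Renderer 2: minimal Markdown with a trailing source line."""
    pages = ", ".join(f"{a.source_file} p. {a.page}" for a in note.anchors)
    return f"# {note.title}\n\n{note.body}\n\nSources: {pages}\n"


if __name__ == "__main__":
    note = AtomicNote(
        title="Atomic Note",
        body="One idea per note.",
        anchors=[SourceAnchor("zettelkasten-primer.pdf", 1)],
        source_metadata={"title": "Zettelkasten-Primer"},
        quality_status="needs-review",
    )
    print(render_plain_markdown(note))
    print(render_json(note))
```

Because both renderers read only the note structure, adding another export format would not require touching the pipeline that produced the note.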
PDF is the first adapter. Future adapters should normalize HTML/articles, RSS items, transcripts, podcasts, videos, and other concept-rich sources into the same source model before the pipeline runs.
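The adapter idea can be sketched as a small interface: every adapter turns its own medium into the same normalized structure, and the pipeline only ever sees that structure. The names `NormalizedSource`, `SourceAdapter`, and `FormFeedTextAdapter` are hypothetical and chosen for illustration. The toy adapter splits text on form-feed characters, which is how pdftotext separates pages in its output.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class NormalizedSource:
    source_id: str
    segments: list[str]  # one entry per page, chapter, or transcript chunk


class SourceAdapter(Protocol):
    """Anything that can turn raw input into a NormalizedSource."""

    def normalize(self, source_id: str, raw: str) -> NormalizedSource: ...


class FormFeedTextAdapter:
    """Toy adapter: treats form-feed characters as page breaks."""

    def normalize(self, source_id: str, raw: str) -> NormalizedSource:
        segments = [s.strip() for s in raw.split("\f") if s.strip()]
        return NormalizedSource(source_id=source_id, segments=segments)


def count_segments(adapter: SourceAdapter, source_id: str, raw: str) -> int:
    """Pipeline-side code depends only on the protocol, not on a concrete adapter."""
    return len(adapter.normalize(source_id, raw).segments)


if __name__ == "__main__":
    print(count_segments(FormFeedTextAdapter(), "demo", "page one\fpage two\f"))
```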
The current Stage-0 baseline is pdftotext. A June 2026 A/B probe evaluated pdfplumber and GROBID but did not show a robust advantage over pdftotext; pdfplumber also regressed on a two-column PDF, producing glued words and a lower word yield. The pdfplumber adapter is therefore parked until a focused comparison shows a yield or grounding gain over pdftotext beyond run noise.
The project is still early and carries some historical naming in older internal docs and eval data. New code and public documentation should use:
- generative for the LLM synthesis pipeline
- extractive for the local sentence-extraction pipeline
- internal for dashboards, calibration, and development-only tooling
Apache 2.0