prd: scribe v2 — writing companion redesign

# PRD: Scribe v2. Writing Companion Redesign

**Date:** 2026-04-12
**Status:** Draft

## Problem

The scribe writing companion takes 8-15 seconds to return grammar annotations. The bottleneck is not the AI model but the delivery path: DevTUI shells out to the overmind CLI, which manages a session, which calls the Claude API, which streams back through a subscriber process. That path adds three process boundaries, several IPC hops, and a newline-escaping hack that works around overmind dropping multiline arguments.

The user writes blog posts in both English and Portuguese. The current scribe only provides grammar/spelling checks. There is no way to interact with the AI, ask questions about the draft, or pull context from the user's Obsidian vault (which has 10+ blog drafts, 44 TILs, and a structured idea backlog already indexed by qmd).

Writing is a conversation. The scribe today is a one-way annotation feed.

## Background

The scribe was built as a proof of concept using overmind (the user's own Elixir-based AI agent platform). It works, but the architecture adds enough latency to make it feel sluggish. The user writes ~1 post per week, alternating between English and Portuguese.

Three insights from recent discussion:

1. **Grammar/spelling is a solved problem.** harper-core (pure Rust, Automattic, 10K+ stars) lints English in milliseconds. No LLM needed for typos.
2. **The overmind middleman is the bottleneck.** A direct HTTP call to Claude Haiku takes 2-3s. Through overmind it takes 8-15s.
3. **The real value is the conversation.** The user has a rich Obsidian vault with ideas, TILs, and drafts. A writing companion that can search the vault and discuss the draft is more valuable than one that just flags typos.

Current architecture: DevTUI → overmind CLI → overmind daemon → Claude API → overmind daemon → overmind subscribe CLI → DevTUI.

Target architecture: DevTUI → harper-core (in-process) + sync HTTP client (ureq or reqwest blocking) → LLM API (direct).

## Requirements

### Must Have

- **Instant English grammar/spelling.** harper-core runs on every content change. Annotations appear in <50ms. No network, no API key, no external process.
- **LLM writing hints for EN and PT-BR.** Style, phrasing, and factual suggestions via a direct API call. Configurable provider (Groq or Claude). Triggers on demand (Ctrl+T) or after more than 10s of idle time. Works for both languages.
- **Remove overmind dependency.** Delete `start_scribe_session`, `send_to_scribe`, `run_subscriber`, `escape_for_overmind`, `kill_scribe_session` from ops.rs. No sessions, no subscriber threads, no daemon management.
- **Provider configuration.** The user can set which LLM provider and model to use. API key read from environment variable. Defaults to Groq (Llama 3.1 8B) for speed.
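
The provider requirement could be sketched roughly as below. This is a minimal illustration, not the planned implementation: `Provider`, `parse_provider`, `api_key_with`, and `api_key` are hypothetical names, and the accepted provider strings (`"claude"`, `"anthropic"`) are assumptions. The environment variable names and the error text are the ones used in the acceptance criteria.

```rust
use std::env;

/// LLM providers named in this PRD. Groq is the default, for speed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Provider {
    #[default]
    Groq,
    Claude,
}

impl Provider {
    /// Environment variable that holds the API key for this provider.
    pub fn key_var(self) -> &'static str {
        match self {
            Provider::Groq => "GROQ_API_KEY",
            Provider::Claude => "ANTHROPIC_API_KEY",
        }
    }
}

/// Parse a user-supplied provider name (hypothetical config value).
/// Anything missing or unrecognized falls back to Groq.
pub fn parse_provider(name: Option<&str>) -> Provider {
    match name.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("claude") | Some("anthropic") => Provider::Claude,
        _ => Provider::Groq,
    }
}

/// Resolve the API key through an injectable lookup, so the logic can be
/// exercised without touching the real process environment.
pub fn api_key_with<F>(provider: Provider, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(provider.key_var()) {
        Some(key) if !key.trim().is_empty() => Ok(key),
        _ => Err("No API key. Set GROQ_API_KEY or ANTHROPIC_API_KEY.".to_string()),
    }
}

/// Production variant: read the key from the process environment.
pub fn api_key(provider: Provider) -> Result<String, String> {
    api_key_with(provider, |name| env::var(name).ok())
}
```

Returning the error as a plain string keeps the sketch small; the scribe pane can display it verbatim when a check is triggered without a key.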

### Should Have

- **Chat pane.** Interactive writing companion as a new layout in the Ctrl+G cycle: Preview → Scribe → Chat → Editor only. The user types a question, AI responds with context from the current draft. Conversation history preserved while editing the same article.
- **Vault context in chat.** Before sending a chat message to the LLM, run `qmd query -c vault "<question>"` and include the top 3-5 results as context. The AI sees the draft + related vault notes + the question.
- **Portuguese spelling via hunspell.** Basic PT-BR spell check using hunspell with the `pt_BR` dictionary. Spelling only, not grammar (no good pure-Rust grammar option exists). Complements the LLM path for PT-BR posts.
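
The vault-context step could look something like the sketch below. It assumes only the `qmd query -c vault "<question>"` invocation quoted above; the output format of qmd is not specified in this PRD, so stdout is passed through as opaque text and trimming to the top 3-5 results is left out. `VaultContext`, `vault_search`, `build_prompt`, and the prompt section headings are all hypothetical.

```rust
use std::io::ErrorKind;
use std::process::Command;

/// Outcome of a vault lookup.
#[derive(Debug, PartialEq, Eq)]
pub enum VaultContext {
    /// Raw stdout from qmd, passed through untouched.
    Found(String),
    /// qmd ran but failed or printed nothing.
    NoResults,
    /// The `qmd` binary was not found on PATH.
    NotInstalled,
}

/// Run `qmd query -c vault <question>`. The question is passed as its own
/// argument, so no shell quoting or newline escaping is involved.
pub fn vault_search(question: &str) -> VaultContext {
    match Command::new("qmd")
        .args(["query", "-c", "vault", question])
        .output()
    {
        Ok(out) if out.status.success() => {
            let text = String::from_utf8_lossy(&out.stdout).trim().to_string();
            if text.is_empty() {
                VaultContext::NoResults
            } else {
                VaultContext::Found(text)
            }
        }
        Ok(_) => VaultContext::NoResults,
        Err(e) if e.kind() == ErrorKind::NotFound => VaultContext::NotInstalled,
        Err(_) => VaultContext::NoResults,
    }
}

/// Assemble draft + related notes + question into one prompt string.
/// The vault section is omitted when there is nothing to include.
pub fn build_prompt(draft: &str, vault: &VaultContext, question: &str) -> String {
    let mut prompt = String::from("## Draft\n");
    prompt.push_str(draft.trim_end());
    prompt.push_str("\n\n");
    if let VaultContext::Found(notes) = vault {
        prompt.push_str("## Related vault notes\n");
        prompt.push_str(notes);
        prompt.push_str("\n\n");
    }
    prompt.push_str("## Question\n");
    prompt.push_str(question.trim());
    prompt
}
```

Keeping `NotInstalled` separate from `NoResults` lets the chat pane show its install note only when the binary is actually missing.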

### Out of Scope

- **Full PT-BR grammar checking without LLM.** No pure-Rust Portuguese grammar checker exists. LanguageTool (Java, 400MB+ RAM) is too heavy for a TUI. The LLM path covers this adequately.
- **Streaming responses in scribe.** The annotation format (JSON array) doesn't benefit from streaming. Chat pane streaming is a future enhancement.
- **Local LLM inference (Ollama).** The M4 16GB runs Llama 3.1 8B at ~30-50 tok/s (6-10s per response). Groq cloud delivers the same model in <1s. Local inference doesn't improve the experience at this hardware tier.
- **Multiple vault collections.** Chat searches the single `vault` qmd collection. Multi-collection support is unnecessary.
- **Persisting chat history across articles.** Chat resets when switching articles in CMS mode. The conversation is about the current draft.

## Constraints

- **harper-core is English-only.** Supports US, UK, CA, AU dialects. No Portuguese, no Spanish. This is a library limitation, not a design choice.
- **qmd must be installed for vault search.** Chat works without qmd (just no vault context), but the full experience requires it. Graceful fallback: if `qmd` is not found, skip vault search and note it in the chat.
- **API key required for LLM features.** Grammar (harper) works offline. Hints and chat require a configured API key. Clear error message if missing.
- **Sync HTTP in a thread.** The editor uses no async runtime. LLM calls run in `thread::spawn` with a sync HTTP client (ureq or reqwest blocking), writing results into the existing `Arc<Mutex<Option<Result>>>` slot. This matches the current pattern.
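
The thread-plus-slot pattern from the last constraint can be sketched with the standard library alone. The exact `Result` type of the existing slot is not given here, so `Result<T, String>` is an assumption, as are the names `Slot`, `spawn_request`, and `poll`. The blocking HTTP call is represented by a closure so the sketch does not commit to ureq or reqwest.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared slot the worker thread writes into and the UI loop reads from.
/// The error type (String) is an assumption for this sketch.
pub type Slot<T> = Arc<Mutex<Option<Result<T, String>>>>;

/// Run a blocking job (for example a sync HTTP request) on a background
/// thread and store its result in the slot when it finishes.
pub fn spawn_request<T, F>(slot: Slot<T>, work: F) -> thread::JoinHandle<()>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    thread::spawn(move || {
        // The blocking call happens here, off the UI thread.
        let result = work();
        // If the mutex is poisoned, the result is dropped.
        if let Ok(mut guard) = slot.lock() {
            *guard = Some(result);
        }
    })
}

/// Called from the UI loop on each tick. Takes the result if one is ready,
/// leaving the slot empty for the next request.
pub fn poll<T>(slot: &Slot<T>) -> Option<Result<T, String>> {
    slot.lock().ok().and_then(|mut guard| guard.take())
}
```

The UI loop never blocks on the network: it calls `poll` once per tick and renders annotations only when a result has arrived.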

## Acceptance Criteria

### Instant Grammar (English)

- Given an English post with a typo ("teh"), when content changes, then a red annotation appears on the affected line within 100ms.
- Given a Portuguese post, when content changes, then harper does not run (the post's language is determined by language detection or a manual toggle).
- Given a post with code blocks, when content changes, then grammar checking skips fenced code blocks.
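
One way to satisfy the code-block criterion is to blank out fenced blocks before the text reaches the linter, so that line numbers stay aligned with the editor buffer. The sketch below is a simplified approximation of Markdown fence rules (it accepts any leading indentation and ignores info-string edge cases), and `mask_fenced_code` is a hypothetical helper, not part of harper-core.

```rust
/// Replace every line of a fenced code block (fences included) with an
/// empty line. Prose lines pass through unchanged, so line N of the output
/// still corresponds to line N of the input. A trailing newline in the
/// input is not preserved.
pub fn mask_fenced_code(markdown: &str) -> String {
    // Marker character and length of the currently open fence, if any.
    let mut fence: Option<(char, usize)> = None;
    let mut out: Vec<&str> = Vec::new();

    for line in markdown.lines() {
        let trimmed = line.trim_start();
        // A fence starts with a run of backticks or tildes.
        let marker = trimmed.chars().next().filter(|c| *c == '`' || *c == '~');
        let run = marker
            .map(|m| trimmed.chars().take_while(|c| *c == m).count())
            .unwrap_or(0);

        match fence {
            // Opening fence: at least three marker characters.
            None if run >= 3 => {
                fence = marker.map(|m| (m, run));
                out.push("");
            }
            None => out.push(line),
            Some((open_marker, open_len)) => {
                // Closing fence: same marker, at least as long, nothing after.
                // Markers are ASCII, so `run` is also a valid byte offset.
                let closes = marker == Some(open_marker)
                    && run >= open_len
                    && trimmed[run..].trim().is_empty();
                if closes {
                    fence = None;
                }
                out.push("");
            }
        }
    }
    out.join("\n")
}
```

Because masked lines become empty rather than being removed, an annotation reported for line 5 of the masked text still points at line 5 of the draft.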

### LLM Writing Hints

- Given any post (EN or PT-BR), when the user triggers a check (Ctrl+T or idle >10s), then annotations appear within 3s (Groq) or 5s (Claude).
- Given no API key configured, when the user triggers a check, then the scribe pane shows "No API key. Set GROQ_API_KEY or ANTHROPIC_API_KEY."
- Given the LLM provider is unreachable, when a check is triggered, then the error appears in the status log and harper annotations remain visible.
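
The idle trigger in the first criterion could be modeled as a small timer that fires once per idle period. This is a sketch under stated assumptions: `IdleTrigger` and its methods are hypothetical names, and the caller is expected to pass the current time in from the editor's event loop.

```rust
use std::time::{Duration, Instant};

/// Fires an LLM check once the user has been idle longer than `threshold`.
pub struct IdleTrigger {
    threshold: Duration,
    /// Time of the most recent edit that has not been checked yet.
    last_edit: Option<Instant>,
}

impl IdleTrigger {
    pub fn new(threshold: Duration) -> Self {
        IdleTrigger { threshold, last_edit: None }
    }

    /// Record a content change; this restarts the idle countdown.
    pub fn on_edit(&mut self, now: Instant) {
        self.last_edit = Some(now);
    }

    /// Poll from the event loop. Returns true at most once per idle period,
    /// so the same unchanged draft is not sent to the LLM repeatedly.
    pub fn should_fire(&mut self, now: Instant) -> bool {
        match self.last_edit {
            Some(t) if now.duration_since(t) > self.threshold => {
                self.last_edit = None;
                true
            }
            _ => false,
        }
    }

    /// Manual trigger (Ctrl+T): always fires and clears the pending edit.
    pub fn force(&mut self) -> bool {
        self.last_edit = None;
        true
    }
}
```

With `IdleTrigger::new(Duration::from_secs(10))`, a check runs after more than 10 seconds without edits, and again only after the next edit followed by another idle period.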

### Chat Pane

- Given the user presses Ctrl+G to cycle to Chat, then an input area appears at the bottom of the right pane with conversation history above.
- Given the user types a question and presses Enter, then the question appears in the conversation, a "thinking..." indicator shows, and the AI response appears within 3-5s.
- Given qmd is installed, when the user sends a chat message, then the AI response reflects knowledge from related vault notes (visible as "Found N related notes" in the response context).
- Given qmd is not installed, when the user sends a chat message, then the chat works without vault context and shows a one-time note: "Install qmd for vault search."
- Given the user switches articles in CMS mode, then chat history clears.
- Given the user is in Chat layout, when they press a key, then the keystroke goes to the chat input box, not to vim. Esc returns focus to vim.
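
The layout cycle, focus handling, and history reset above can be captured in a small state machine. The sketch below is illustrative only: `Layout`, `Focus`, and `Ui` are hypothetical names, the starting layout (Preview) and the wrap from Editor only back to Preview are assumptions, and chat messages are stored as plain strings.

```rust
/// Layouts in the Ctrl+G cycle, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layout {
    Preview,
    Scribe,
    Chat,
    EditorOnly,
}

impl Layout {
    /// Next layout in the cycle; wraps back to Preview (assumed).
    pub fn next(self) -> Self {
        match self {
            Layout::Preview => Layout::Scribe,
            Layout::Scribe => Layout::Chat,
            Layout::Chat => Layout::EditorOnly,
            Layout::EditorOnly => Layout::Preview,
        }
    }
}

/// Which widget receives keystrokes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Focus {
    Vim,
    ChatInput,
}

pub struct Ui {
    pub layout: Layout,
    pub focus: Focus,
    pub article: String,
    pub history: Vec<String>,
}

impl Ui {
    pub fn new(article: &str) -> Self {
        Ui {
            layout: Layout::Preview,
            focus: Focus::Vim,
            article: article.to_string(),
            history: Vec::new(),
        }
    }

    /// Ctrl+G: advance the layout. Entering Chat moves focus to the chat
    /// input; every other layout gives focus back to vim.
    pub fn cycle_layout(&mut self) {
        self.layout = self.layout.next();
        self.focus = if self.layout == Layout::Chat {
            Focus::ChatInput
        } else {
            Focus::Vim
        };
    }

    /// Esc: return focus to vim without leaving the current layout.
    pub fn on_esc(&mut self) {
        self.focus = Focus::Vim;
    }

    /// Switching to a different article clears the conversation.
    pub fn open_article(&mut self, article: &str) {
        if self.article != article {
            self.article = article.to_string();
            self.history.clear();
        }
    }
}
```

Reopening the same article leaves the history intact, which matches the requirement that the conversation is preserved while editing one draft.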

### Overmind Removal

- Given a fresh DevTUI build, when the user runs the editor, then overmind is not required and not referenced in any process or error message.
- Given the scribe pane is active, when the editor exits, then no `overmind kill` process is spawned.
