feat(rag): Ask this book — conversational streaming chat + gpt-4.1-mini #385
Merged
Turns the single-turn FAQ into a real chat (grounding + citations + spoiler gate
unchanged).
- Model: rag.ask -> gpt-4.1-mini (dedicated openai-rag provider, mirrors
openai-explain; translate/podcast stay nano).
- Companion system prompt: warm, conversational, handles greetings; grounding is
STRICT and overrides everything (covers themes/symbolism/interpretation, not
just plot) -> every book-fact cites [n]; greeting/meta the only no-cite case.
- Multi-turn memory: AskRequest.History (client-held, server-clamped last 6
turns / 4000 chars each); real LlmMessage[] (system -> excerpts -> history ->
question); retrieval still per latest question. Eval call passes [].
- SSE streaming (content-negotiated, copies Explain): delta* then terminal done
{citations,lastReadOrd,insufficient}; empty chunks -> friendly delta + done,
NO model call (spoiler/hallucination short-circuit, both paths). JSON fallback
unchanged. Both catalog + user-book ask. MaxOutputTokens 320->400.
- Web: streaming token render + blinking caret, suggested starters on empty
thread, last-6 history sent, abort on new-question/unmount. Mobile keeps JSON.
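The multi-turn memory bullet can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `HistoryTurn` shape, the `clampHistory` and `buildMessages` names, the choice to count one message as one turn, and sending excerpts as a `user` message are all assumptions. Only the limits (last 6 turns, 4000 chars each) and the ordering (system -> excerpts -> history -> question) come from the description above.

```typescript
// Hypothetical shapes; the PR only names AskRequest.History and LlmMessage[].
type Role = "system" | "user" | "assistant";
interface LlmMessage { role: Role; content: string; }
interface HistoryTurn { role: "user" | "assistant"; content: string; }

const MAX_TURNS = 6;     // "last 6 turns" per the PR
const MAX_CHARS = 4000;  // "4000 chars each" per the PR

// Server-side clamp: keep only the most recent turns and truncate each one,
// so a client-held history cannot grow the prompt without bound.
function clampHistory(history: HistoryTurn[]): HistoryTurn[] {
  return history
    .slice(-MAX_TURNS)
    .map((t): HistoryTurn => ({ role: t.role, content: t.content.slice(0, MAX_CHARS) }));
}

// Assemble the message list in the order the PR states:
// system -> excerpts -> history -> question.
function buildMessages(
  systemPrompt: string,
  excerpts: string[],
  history: HistoryTurn[],
  question: string,
): LlmMessage[] {
  // Numbering excerpts as [n] so answers can cite them is an assumption.
  const excerptBlock = excerpts.map((e, i) => `[${i + 1}] ${e}`).join("\n");
  const clamped: LlmMessage[] = clampHistory(history);
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: `Excerpts:\n${excerptBlock}` },
    ...clamped,
    { role: "user", content: question },
  ];
}
```

Retrieval is not shown here; per the description it still runs against the latest question only, so `excerpts` would not depend on `history`.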
architect -> backend+web -> adversarial QA (SHIP; closed grounding loophole +
added empty-chunk-no-LLM-call tests). 878 unit + 564 web green.
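The event order and the empty-chunk short-circuit that the new tests cover might look roughly like the sketch below. The `AskEvent` type, the `answer` and `toSseFrame` functions, the callback-style `ModelCall`, the placeholder message text, the numeric citation shape, and `insufficient: true` on the short-circuit path are assumptions made for illustration. The PR itself only states the `delta*` then `done {citations,lastReadOrd,insufficient}` sequence and that no model call happens when retrieval returns nothing.

```typescript
// Hypothetical event shapes; field names on "done" follow the PR text.
type AskEvent =
  | { type: "delta"; text: string }
  | { type: "done"; citations: number[]; lastReadOrd: number; insufficient: boolean };

// Stand-in for the streaming model client (assumed callback style).
type ModelCall = (question: string, onToken: (text: string) => void) => void;

function answer(
  question: string,
  chunks: string[],
  lastReadOrd: number,
  callModel: ModelCall,
  emit: (ev: AskEvent) => void,
): void {
  // Short-circuit: with no retrieved chunks the model is never called,
  // so it cannot leak spoilers or invent ungrounded facts.
  if (chunks.length === 0) {
    emit({ type: "delta", text: "I couldn't find that in what you've read so far." });
    emit({ type: "done", citations: [], lastReadOrd, insufficient: true });
    return;
  }
  // Normal path: zero or more deltas, then exactly one terminal done.
  callModel(question, (text) => emit({ type: "delta", text }));
  const citations = chunks.map((_, i) => i + 1); // assumed: one citation per chunk
  emit({ type: "done", citations, lastReadOrd, insufficient: false });
}

// Serialize one event as a server-sent-events frame (named event + JSON data).
function toSseFrame(ev: AskEvent): string {
  const { type, ...payload } = ev;
  return `event: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

A test in the spirit of the empty-chunk-no-LLM-call tests would pass a `callModel` that records invocations and assert it was never called when `chunks` is empty.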
NOTE: re-run the paid grounding/citation eval on /ai-quality post-deploy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Turns the single-turn FAQ-style Ask into a real chat (grounding + citations + spoiler gate unchanged).
- Model: `rag.ask` → gpt-4.1-mini (dedicated `openai-rag` provider, mirrors `openai-explain`).
- Companion system prompt: every book fact cites `[n]`; greeting/meta the only no-cite case.
- Multi-turn memory: `history` (client-held, server-clamped last 6 turns / 4000 chars) → real `LlmMessage[]`; retrieval per latest question.
- SSE streaming: `delta*` → terminal `done {citations,lastReadOrd,insufficient}`; empty chunks → friendly delta + done, no model call (spoiler/hallucination short-circuit on both paths). JSON fallback unchanged. Both catalog + user-book.
- Process: architect → backend + web → adversarial QA (SHIP); closed a grounding loophole (interpretation/themes weren't bound) + added empty-chunk-no-LLM-call tests. 878 unit + 564 web green; solution + web build clean.
- NOTE: re-run the paid grounding/citation eval on `/ai-quality` post-deploy (only runtime confirmation of the tightened prompt).

🤖 Generated with Claude Code
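On the web side, consuming the stream and aborting on a new question or unmount could be sketched as below. This is an illustration under assumptions, not the shipped client: `parseSseFrames` and `AskSession` are hypothetical names, the parser assumes `\n` line endings and named `event:` lines, and the wire format of the real endpoint is not shown in this PR text.

```typescript
interface SseEvent { event: string; data: string; }

// Split buffered text into complete SSE frames (blank-line separated) and
// return the unparsed tail so the caller can prepend it to the next chunk.
function parseSseFrames(buffer: string): { events: SseEvent[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? ""; // last part may be an incomplete frame
  const events: SseEvent[] = [];
  for (const part of parts) {
    let event = "message"; // default event name in the SSE format
    const data: string[] = [];
    for (const line of part.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Keeps at most one request in flight: starting a new question aborts the
// previous one, and cancel() covers the unmount case.
class AskSession {
  private controller: AbortController | null = null;

  begin(): AbortSignal {
    this.controller?.abort();
    this.controller = new AbortController();
    return this.controller.signal;
  }

  cancel(): void {
    this.controller?.abort();
    this.controller = null;
  }
}
```

In use, the signal from `begin()` would be passed as the `signal` option to `fetch` together with an `Accept: text/event-stream` header. Each decoded chunk would be appended to a buffer and run through `parseSseFrames`, rendering `delta` events as they arrive and finalizing citations on `done`.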