feat: prompt user for evolution via Stop-hook AskUserQuestion + dev test harness #57
Merged
Stop hook scans party for pokemon ready to evolve (branch + single
chain) and emits {decision:"block", reason} to force Claude to call
AskUserQuestion instead of auto-evolving or silently staging a flag.
User selects a target, Claude runs `tokenmon evolve <name> <target>`.
Refuse sets evolution_prompt_shown, which suppresses further prompts
for that pokemon until the user manually runs `/tkm evolve` to clear it.
- evolution.ts: single-chain now uses the same flag-based flow as
branch evolutions; state-missing callers keep the original return
signature so existing tests remain green
- stop.ts: post-XP scan emits block JSON first, then persists the
prompt_shown flag under lock (on a crash, a duplicate prompt is
preferable to a silently lost one); lock failures are logged rather
than swallowed
- markEvolutionReady helper dedups the three single-chain paths in
checkEvolution
- notifications.ts + status-line.ts: skip the ready hint once the
prompt has fired
- test/e2e: harness verifies block JSON and flag persistence via an
isolated CLAUDE_CONFIG_DIR; tmux full-session harness remains TODO
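The emit-then-persist ordering described for stop.ts can be sketched as below. This is a minimal illustration, not the actual stop.ts code: `emitThenPersist`, `persistPromptShown`, and the callback shapes are hypothetical names introduced here, and the lock is assumed to signal failure by throwing.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the real stop.ts API.
interface BlockOutput {
  decision: "block";
  reason: string;
}

function emitThenPersist(
  block: BlockOutput,
  emit: (json: string) => void,
  persistPromptShown: () => void, // assumed to throw when the lock cannot be taken
  log: (msg: string) => void,
): void {
  // 1. Emit first: a crash after this point can only cause a duplicate prompt.
  emit(JSON.stringify(block));
  // 2. Persist second; a lock failure is logged instead of being swallowed.
  try {
    persistPromptShown();
  } catch (err) {
    log(`evolution_prompt_shown not persisted: ${String(err)}`);
  }
}
```

If the order were reversed, a crash between the two steps would leave the flag set with no prompt ever shown, which is the silent loss the commit message wants to avoid.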
The evolution block path in stop.ts previously emitted
{decision:"block", reason} but discarded any system_message that was
populated in the same stop turn — level-up, catch, and achievement
notifications would silently disappear the moment an evolution
triggered. Merge the accumulated system_message into the block output
so user-facing messages persist alongside Claude's block instruction.
systemMessage is user-facing only per the Claude Code hooks spec, so
it does not interfere with the reason field Claude consumes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
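As a rough sketch of the merge described above, the block output can carry the accumulated user-facing messages next to the `reason`. The function name `buildBlockOutput` and the idea that notifications arrive as a string array are assumptions for illustration only.

```typescript
// Hypothetical sketch: buildBlockOutput and the pendingMessages array are assumptions.
interface StopHookOutput {
  decision: "block";
  reason: string;          // consumed by Claude
  systemMessage?: string;  // user-facing only
}

function buildBlockOutput(reason: string, pendingMessages: string[]): StopHookOutput {
  const out: StopHookOutput = { decision: "block", reason };
  // Keep level-up / catch / achievement text instead of dropping it.
  const merged = pendingMessages.filter((m) => m.trim() !== "").join("\n");
  if (merged !== "") out.systemMessage = merged;
  return out;
}
```

Omitting the field entirely when nothing is pending keeps the output identical to the earlier block-only shape.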
Dev-only slash command that auto-cycles the full E2E path for the
Stop-hook evolution AskUserQuestion feature. Per scenario: backup state
and hooks, seed test party, swap hooks.json to worktree paths, spawn a
fresh tmux pane with isolated CLAUDE_CONFIG_DIR, launch Claude Code,
capture the AskUserQuestion render, send-keys the scenario's expected
answer, 3-layer verify (UI regex, tokenmon evolve tool call, state
diff), restore backup. No-arg runs all 6 scenarios sequentially;
--scenario <name> runs one; --restore cleans up after an aborted run;
--dry-run lists scenarios without LLM cost. Harness is excluded from the
published plugin via the new files allowlist in package.json.
- 6 scenarios covering branch, single-chain, batch, overflow, refuse
persistence, and accept-clear-reprompt lifecycle
- backup.ts: dual-format hooks.json swap (baked absolute paths OR
CLAUDE_PLUGIN_ROOT/DATA template vars), byte-level restore, gen-aware
state/config paths; resolves hooks path via PLUGIN_ROOT walk-up
- tmux-driver.ts: pane spawn with isolated CLAUDE_CONFIG_DIR,
capture-with-pattern-wait, numeric + text send-keys, graceful
tmux-missing fallback
- verify.ts: 3-layer assertion from UI regex through tool call
detection to state diff against expected_after
- cli/test-evolve.ts: SIGINT handler + try/finally for crash-safe
restore (duplicate prompt preferred over silent loss)
- skills/test-evolve/SKILL.md: slash command entry delegating to the
tsx CLI
- Tighten existing e2e test types (any to Partial<State>,
Partial<Config>) and drop the unused execFileSync import
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop all tmux automation, Claude Code spawning, and UI/tool-level
asserts. Claude Code's Ink-based REPL does not accept tmux send-keys
submissions, which made the "spawn a child session and auto-answer
AskUserQuestion" path unreliable and expensive to debug. The simpler and
more useful shape is: backup, seed, swap hooks.json, let the human
trigger the prompt in their own live session, then verify state and
restore.
CLI subcommands are now:
    tokenmon test-evolve --list               list all scenarios
    tokenmon test-evolve --setup <scenario>   backup + seed + swap
    tokenmon test-evolve --verify             state diff vs expected
    tokenmon test-evolve --restore            byte-level restore
A tiny current.json pointer under .tokenmon/test-backup/ lets --verify
and --restore work without passing the scenario name again.
- cli/test-evolve.ts: 463 -> 156 lines, no tmux/spawn/waitForPattern
- test-evolve/verify.ts: 245 -> 115 lines, state-only assertions
- test-evolve/tmux-driver.ts: deleted
- skills/test-evolve/SKILL.md: rewritten for manual dispatch
Also pick up two earlier fixes needed to make the setup path actually
work end-to-end:
- backup.ts swapHooksJson: regex now accepts JSON-escaped `\"`
surrounding baked absolute paths so hooks.json rewrites apply when the
user runs with baked (post-install) hook paths
- backup.ts: drop the hardcoded `/home/minsiwon00/...` fallback and
resolve the active hooks.json via PLUGIN_ROOT + a marketplace fallback,
throwing when neither exists instead of writing to a non-existent path
Verified: typecheck passes; 1203/1203 tests pass; round-trip --setup
branch-eevee then --restore produces byte-identical state, config, and
hooks files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
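The four subcommands listed in this commit could be dispatched with a small parser along these lines. This is only a sketch: `parseTestEvolveArgs` and the `Command` union are hypothetical, and the real cli/test-evolve.ts may parse its arguments differently.

```typescript
// Hypothetical sketch of flag dispatch; not the actual cli/test-evolve.ts.
type Command =
  | { kind: "list" }
  | { kind: "setup"; scenario: string }
  | { kind: "verify" }
  | { kind: "restore" }
  | { kind: "usage"; error: string };

function parseTestEvolveArgs(argv: string[]): Command {
  const [flag, value] = argv;
  switch (flag) {
    case "--list":
      return { kind: "list" };
    case "--setup":
      // Only --setup takes a scenario; --verify and --restore are assumed to
      // read the scenario name back from the current.json pointer.
      return value
        ? { kind: "setup", scenario: value }
        : { kind: "usage", error: "--setup requires a scenario name" };
    case "--verify":
      return { kind: "verify" };
    case "--restore":
      return { kind: "restore" };
    default:
      return { kind: "usage", error: `unknown or missing flag: ${String(flag)}` };
  }
}
```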
The previous SKILL.md asked the user to manually invoke
/tkm:test-evolve --verify and /tkm:test-evolve --restore at the end of
each scenario. That contradicted the original design goal: the cycle
(seed → test → verify → restore) must always close itself so the user's
live state and hooks.json never stay mutated past the test window.
Update the skill to orchestrate the full cycle across turns:
1. On /tkm:test-evolve <scenario>: run --setup, tell the user to send
any message, then stop the turn.
2. On the next turn, after the Stop hook emits the block reason and the
user picks an option (or refuses) via AskUserQuestion, run
`tokenmon evolve <pokemon> <target>` for the chosen target(s).
3. Immediately after the evolve call(s) succeed, run --verify and then
--restore, always, even if verify reports FAIL. Restore is
unconditional so the user's real state/config/hooks.json come back to
pre-test bytes.
--restore is still exposed as a manual escape hatch for the case where
the session is killed before the auto-restore step runs. --verify is no
longer a user-facing command.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
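In this PR the verify-then-always-restore rule is an instruction to Claude in SKILL.md rather than code, but the same invariant can be modeled with try/finally. The sketch below assumes hypothetical `verify` and `restore` callbacks.

```typescript
// Hypothetical model of the "restore is unconditional" rule; not part of the PR.
interface CycleReport {
  verifyPassed: boolean;
  restored: boolean;
}

function closeCycle(verify: () => boolean, restore: () => void): CycleReport {
  let verifyPassed = false;
  let restored = false;
  try {
    verifyPassed = verify();
  } catch {
    verifyPassed = false; // a throwing verify is treated as FAIL
  } finally {
    // Runs whether verify passed, failed, or threw.
    restore();
    restored = true;
  }
  return { verifyPassed, restored };
}
```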
Two adjustments so the evolution prompt is always visible through the
AskUserQuestion path:
1. Block detection moves BEFORE the first_stop/no_delta early return in
stop.ts. The prior placement meant that if evolution_ready was already
set when a new session started (e.g. after a cheat/test seed, or a
resumed conversation where conditions had been met but the block had
never fired), the very first Stop silently returned and the user had to
send a second message before AskUserQuestion surfaced. Running block
detection regardless of the lock result kind fixes this and keeps the
existing `evolution_prompt_shown` guard so duplicate blocks are still
prevented.
2. Drop the `evolution_ready` hint from the status line. Because the
Stop hook now reliably produces an AskUserQuestion prompt on every
qualifying stop, rendering the same "pokemon ready to evolve" notice in
the status line was redundant noise; the prompt itself is the canonical
surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
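The reordering in item 1 can be illustrated with a small decision function. The `LockResultKind` values `first_stop` and `no_delta` come from the commit message; `delta`, the `StopState` shape, and `handleStop` are assumptions made for this sketch.

```typescript
// Hypothetical sketch of the new ordering; state shape and names are assumed.
type LockResultKind = "first_stop" | "no_delta" | "delta";

interface StopState {
  evolutionReady: string[];       // pokemon flagged ready to evolve
  evolutionPromptShown: string[]; // pokemon that were already prompted
}

function handleStop(kind: LockResultKind, state: StopState): "block" | "skip" | "continue" {
  // Block detection runs first, regardless of the lock result kind.
  const pending = state.evolutionReady.filter(
    (p) => !state.evolutionPromptShown.includes(p),
  );
  if (pending.length > 0) return "block";
  // Only then take the early return for stops with nothing to process.
  if (kind === "first_stop" || kind === "no_delta") return "skip";
  return "continue";
}
```

With the old placement, the `first_stop` check would have come first and a pre-seeded `evolutionReady` entry would not surface until the second Stop.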
…nches
Three polish items surfaced during live testing of the evolution
AskUserQuestion flow:
1. Tone mismatch. Claude was paraphrasing the AskUserQuestion
question text instead of using the pokemon-voice phrasing
("...어라!? {pokemon}의 상태가...?", roughly "...Huh!? {pokemon}'s
condition is...?") that the status line had
been using. Each locale's hook.evolution_candidate_line now
carries the exact per-pokemon question string under a "use
VERBATIM" label, and hook.evolution_block_reason instructs Claude
to copy that string into AskUserQuestion.question without any
rewording.
2. Overflow handling for branches with more than 3 targets (Eevee has
8). AskUserQuestion caps at 4 options, so the previous "all
targets + Refuse" instruction silently broke for Eevee-class
pokemon. The new reason text spells out the rule: ≤3 targets show
all + Refuse, 4+ targets show the first 3 + Refuse and list the
remaining targets in the question body so the user can pick any of
them via 'Other'.
3. Cross-gen name resolution in getPokemonName. Seed data (and any
other path that surfaces a pokemon not native to the active
generation's i18n) was falling back to the numeric ID, so the
block reason rendered "133" instead of "이브이" (Eevee). Added a
cross-gen lookup that searches each generation's game i18n before
using the ID as the final fallback; the active generation is still
preferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
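The overflow rule in item 2 is expressed as prompt text for Claude, but the arithmetic is easy to pin down in code. The sketch below uses hypothetical names (`layoutOptions`, `PromptLayout`); the 4-option cap is the figure stated in the commit message.

```typescript
// Hypothetical model of the overflow rule; the PR implements it as reason text.
interface PromptLayout {
  options: string[];    // button labels, at most MAX_OPTIONS
  otherForms: string[]; // listed in the question body, reachable via 'Other'
}

const MAX_OPTIONS = 4; // AskUserQuestion cap, as stated in the commit message
const REFUSE = "Refuse";

function layoutOptions(targets: string[]): PromptLayout {
  const slots = MAX_OPTIONS - 1; // one slot is always reserved for Refuse
  return {
    options: [...targets.slice(0, slots), REFUSE],
    otherForms: targets.slice(slots),
  };
}
```

For three or fewer targets `otherForms` is empty and every target gets a button; from four targets on, everything past the third moves into the question body.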
…ismatch
Without an explicit rule, Claude could blindly feed a user's freeform
'Other' response into `tokenmon evolve`. A garbled or off-list name then
errored silently inside applyBranchEvolution (not in the evolves_to
list -> null return) with no useful feedback to the user.
Extend hook.evolution_block_reason in all four locale + voice combos
with an explicit handling rule:
- Button picks run `tokenmon evolve` directly.
- 'Refuse' (button or text: refuse/no/cancel/거부, the last being
Korean for "refuse") skips.
- Free-text 'Other' must be validated against the target list
(case-insensitive, English and localized names both accepted) before
the command runs. On mismatch, reply with a short "I didn't recognize
that" and re-invoke the same AskUserQuestion.
Re-prompt caps at 2 iterations so a user who keeps typing garbage
eventually lands on an implicit 'Refuse' instead of an infinite loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
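The handling rule above lives in the block reason text that Claude follows, not in code. As a way to make it concrete, here is a small model of the same decision; `resolveOther`, `Target`, and the reprompt counter are hypothetical.

```typescript
// Hypothetical model of the 'Other' handling rule; the PR encodes it as prompt text.
interface Target {
  id: number;
  names: string[]; // English and localized names
}

type Resolution =
  | { kind: "evolve"; targetId: number }
  | { kind: "refuse" }
  | { kind: "reprompt" };

const REFUSE_WORDS = ["refuse", "no", "cancel", "거부"];
const MAX_REPROMPTS = 2;

function resolveOther(input: string, targets: Target[], repromptsSoFar: number): Resolution {
  const text = input.trim().toLowerCase();
  if (REFUSE_WORDS.includes(text)) return { kind: "refuse" };
  // Case-insensitive match against every accepted name of every target.
  const hit = targets.find((t) => t.names.some((n) => n.toLowerCase() === text));
  if (hit) return { kind: "evolve", targetId: hit.id };
  // After the cap, unrecognized input becomes an implicit Refuse.
  return repromptsSoFar >= MAX_REPROMPTS ? { kind: "refuse" } : { kind: "reprompt" };
}
```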
…items
Three live-test fallouts:
1. Cross-gen reverse name lookup in pokemonIdByName. Previously
searched only the active generation's i18n, so "이브이" (Eevee) in a
gen4-active save returned undefined and the evolve CLI said
"컬렉션에서 "이브이"을(를) 찾을 수 없다" (could not find "이브이"
in the collection). Mirror the forward lookup
from getPokemonName: try the active gen first, then fall back
across installed gens' ko/en tables. Real-gameplay benefit too:
any path that accepts user-typed names for pokemon from other
gens now resolves.
2. cmdEvolve's targetArg was passed through to the branch.name
string-equal comparison without being resolved through
resolvePokemonArg. That forced Claude (or the user) to pass
numeric IDs; a localized name like "샤미드" (Vaporeon) always fell
through to "현재 ~의 진화 조건을 만족하는 경로가 없다" (no path
currently satisfies ~'s evolution conditions) even when the
target was actually eligible. Apply the same name→ID resolution
we already use on pokemonArg.
3. The branch-eevee test scenario seeded `evolution_options` with
all 8 Eeveelutions but never seeded the evolution conditions, so
applyBranchEvolution's runtime check rejected every choice. Add
`items: { water-stone, thunder-stone, fire-stone }` to the scenario
seed (and an optional `items`/`current_region` pair on the
Scenario type + writeSeed) so the chosen target actually evolves,
and trim evolution_options to the 5 branches that are genuinely
eligible (stones + friendship) for clearer overflow testing.
All 1203 existing tests still green; harness CLI typechecks clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
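The reverse lookup in item 1 (active generation first, then the other installed generations) might look roughly like this. The table shapes and the name `pokemonIdByNameSketch` are assumptions; the real pokemonIdByName signature is not shown in this PR text.

```typescript
// Hypothetical sketch; the real i18n table layout is not shown in the PR.
type NameTable = Record<string, string>;      // pokemon id -> display name
type GenTables = Record<string, NameTable[]>; // gen -> locale tables (e.g. ko, en)

function pokemonIdByNameSketch(
  name: string,
  activeGen: string,
  gens: GenTables,
): string | undefined {
  const wanted = name.trim().toLowerCase();
  // Active generation is searched first, then every other installed gen.
  const order = [activeGen, ...Object.keys(gens).filter((g) => g !== activeGen)];
  for (const gen of order) {
    for (const table of gens[gen] ?? []) {
      for (const [id, display] of Object.entries(table)) {
        if (display.toLowerCase() === wanted) return id;
      }
    }
  }
  return undefined;
}
```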
… in skill
1. branch-eevee now seeds the full 8-branch evolution_options again
(134/135/136/196/197/470/471/700). Only 3 are eligible via the seeded
stones + 2 via friendship, but all 8 belong to the real Eevee dex so
limiting the seed to 5 was misleading; users expected the complete set
in the AskUserQuestion prompt. The ineligible three
(Leafeon/Glaceon/Sylveon) naturally end up in the question body's
"Other forms" list because the 4-option cap rule only promotes eligible
targets to buttons.
2. Rewrite skills/test-evolve/SKILL.md so the evolution-trigger turn is
treated as one continuous cycle instead of a turn-N / turn-N+1 split.
The previous wording let Claude stop after the user answered the
AskUserQuestion without running `tokenmon evolve`, verify, or restore.
The new instructions say explicitly that the entire chain (render
question → run evolve → print summary → --verify → --restore → final
report) must complete within the same turn without stopping early.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the active generation's pokemon DB does not contain the source
pokemon (e.g. Eevee #133 in a gen4-active save), checkEvolution,
getEligibleBranches, applyBranchEvolution and applySingleChainEvolution
all returned null/empty because `db.pokemon[baseId]` is undefined. That
left test-evolve --setup branch-eevee + Claude-driven evolve failing
with "현재 이브이의 진화 조건을 만족하는 경로가 없다" (no path
currently satisfies Eevee's evolution conditions) on every scenario
that used a cross-gen pokemon, and would also affect any real-gameplay
path where a user's party member came from another generation via
migration.
Fix: every access of the source pokemon's data now falls back to
ensurePokemonInDB, which transparently walks the other generations'
data and injects the missing pokemon into the active gen's cache.
applyBranchEvolution also applies the same fallback for the target data
so cross-gen branch targets resolve.
Also correct branch-eevee.json: the crawled data only contains the
three stone-based Eeveelutions (Vaporeon/Jolteon/Flareon), so the
scenario's evolution_options list is trimmed to match reality. The
later Eeveelutions (Espeon/Umbreon/Leafeon/Glaceon/Sylveon) are a
data-crawler backlog item, not an overflow-path regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
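A minimal sketch of the fallback described in the fix, assuming a simple in-memory shape for each generation's DB. The real ensurePokemonInDB signature is not shown here, so `ensurePokemonInDBSketch`, `GenDB`, and `PokemonData` are illustrative names only.

```typescript
// Hypothetical sketch; the real ensurePokemonInDB signature may differ.
interface PokemonData {
  id: string;
  evolves_to?: string | string[];
}

interface GenDB {
  pokemon: Record<string, PokemonData>;
}

function ensurePokemonInDBSketch(
  id: string,
  active: GenDB,
  others: GenDB[],
): PokemonData | undefined {
  const local = active.pokemon[id];
  if (local) return local;
  // Walk the other generations and inject the first hit into the active cache.
  for (const db of others) {
    const found = db.pokemon[id];
    if (found) {
      active.pokemon[id] = found;
      return found;
    }
  }
  return undefined;
}
```

After the first successful call, a plain `active.pokemon[id]` read also succeeds, which is why callers that previously saw `undefined` start resolving.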
…lback
Live scenario testing surfaced two related failures:
1. cmdEvolve only dispatched through getEligibleBranches, so pokemon
whose data.evolves_to is a string (or legacy line[stage+1]), i.e. every
non-branching pokemon, hit "no eligible" and returned without invoking
executeEvolve. That defeated the Stop-hook AskUserQuestion flow for any
single-chain pokemon: the user would see the prompt, click the target,
and nothing would happen. Add a dedicated single-chain branch in
cmdEvolve that routes through checkEvolution (no state → returns a
validated EvolutionResult) and then executeEvolve, which already
dispatches to applySingleChainEvolution under the hood. The branch runs
ahead of the getEligibleBranches path so branching pokemon still use
the original flow.
2. checkEvolution's string-evolves_to path only called
ensurePokemonInDB for explicit cross-gen references like "gen1:25".
Plain numeric IDs that happened to live only in another generation
(e.g. Charmeleon #5 on a gen4-active save) fell through and returned
null because targetData was undefined. Same issue on the legacy
line[stage+1] path. Both paths now use the ensurePokemonInDB fallback
so cross-gen targets resolve.
Also seed the required evolution stones on the multi-3 and overflow-5
scenarios so Pikachu (thunder-stone) and Eevee (water/thunder/fire
stones) can actually evolve when chosen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
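The routing in item 1 depends on the shape of `evolves_to`. The sketch below shows one way to classify a species record; `SpeciesData` and `routeEvolution` are hypothetical, and the legacy `line`/`stage` fields are modeled only as far as the commit message describes them.

```typescript
// Hypothetical sketch of the dispatch decision; not the actual cmdEvolve code.
interface SpeciesData {
  evolves_to?: string | string[];
  line?: string[]; // legacy evolution line
  stage?: number;  // index of this species within the legacy line
}

type Route = "single-chain" | "branch" | "none";

function routeEvolution(data: SpeciesData): Route {
  // A string target means a non-branching pokemon.
  if (typeof data.evolves_to === "string") return "single-chain";
  // An array of targets means a branching pokemon.
  if (Array.isArray(data.evolves_to)) {
    return data.evolves_to.length > 0 ? "branch" : "none";
  }
  // Legacy data: the next entry in the line, if any, is the single target.
  if (data.line && data.stage !== undefined && data.line[data.stage + 1] !== undefined) {
    return "single-chain";
  }
  return "none";
}
```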
…tive
The test-evolve skill only lives in Claude's context during the turn
that invokes /tkm:test-evolve (setup turn). On the follow-up turn where
the Stop hook emits the evolution block and Claude renders
AskUserQuestion, the skill content has usually dropped out of the
prompt window, so Claude hands control back to the user after the
evolve call instead of running the auto verify + auto restore tail the
skill required.
When stop.ts detects the harness marker file
(.tokenmon/test-backup/current.json), append an explicit
"[TEST HARNESS ACTIVE]" block to the reason that spells out the two
trailing commands with their absolute paths (the dev flow's own
PLUGIN_ROOT + bin/tsx-resolve.sh + src/cli/test-evolve.ts), plus a
reminder to print the final report. That keeps the cycle self-closing
whether or not the skill-turn history is still in context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
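A rough sketch of the tail append follows. The marker check is passed in as a boolean so the example stays free of filesystem calls; `appendHarnessTail`, the exact wording of the tail, and the way `tsx-resolve.sh` is combined with the CLI path are all assumptions, not the text stop.ts actually emits.

```typescript
// Hypothetical sketch; the real tail wording and command form are not shown in the PR.
function appendHarnessTail(reason: string, markerExists: boolean, pluginRoot: string): string {
  if (!markerExists) return reason; // normal gameplay: reason is untouched
  // Assumed invocation: resolver script followed by the CLI entry point.
  const cli = `${pluginRoot}/bin/tsx-resolve.sh ${pluginRoot}/src/cli/test-evolve.ts`;
  return [
    reason,
    "",
    "[TEST HARNESS ACTIVE]",
    `After the evolve call(s), run: ${cli} --verify`,
    `Then always run: ${cli} --restore`,
    "Finally, print the final report.",
  ].join("\n");
}
```

Because the tail travels inside the `reason` itself, it reaches Claude on the follow-up turn even when the skill text is no longer in context.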
…+ achievements)
Three review findings from review-comments.md:
1. [P1] applySingleChainEvolution now falls back to ensurePokemonInDB
for plain numeric cross-gen targets, not just the explicit `genN:id`
crossGenRef syntax. Matches the fallback already in checkEvolution so
every Stop-hook evolution that asks the user can actually resolve when
the target species lives in another generation's dex. The legacy
line[stage+1] branch picks up the same fallback for symmetry.
2. [P2] getInstalledHooksPath in src/test-evolve/backup.ts now searches
~/.claude/plugins/cache/tkm/tkm/<version>/hooks/hooks.json in addition
to the worktree PLUGIN_ROOT and the marketplaces tree. That is where
the release install actually lives; the prior helper threw before
creating backups on any machine installed via the cache path, which was
exactly the setup the new harness documents.
3. [P2] cmdAchievements now reads commonState in addition to
state.achievements and merges entries from getCommonAchievementsDB so
cross-generation achievements such as all_gen_badges stay in the
listing. Before this patch the command only consulted the active-gen
table so common achievements silently disappeared once a user unlocked
them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
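The search in item 2 amounts to trying candidate locations in order and throwing when none exists. In the sketch below the existence check is injected so the example has no filesystem dependency; `findHooksPath`, `cacheHooksPath`, and the candidate ordering are assumptions, and only the cache path layout is taken from the commit message.

```typescript
// Hypothetical sketch; getInstalledHooksPath's real search order is not shown in the PR.
function cacheHooksPath(home: string, version: string): string {
  // Layout quoted in the commit message.
  return `${home}/.claude/plugins/cache/tkm/tkm/${version}/hooks/hooks.json`;
}

function findHooksPath(candidates: string[], exists: (p: string) => boolean): string {
  for (const p of candidates) {
    if (exists(p)) return p;
  }
  // Fail loudly before any backup is written to a non-existent path.
  throw new Error(`hooks.json not found; tried:\n${candidates.join("\n")}`);
}
```

A caller would pass the worktree path, the marketplace path, and `cacheHooksPath(...)` as candidates, with something like Node's `fs.existsSync` as the `exists` callback.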
…at-stop
# Conflicts:
#	src/cli/tokenmon.ts
Summary
- Stop hook emits `decision:"block"` whose `reason` tells Claude to call `AskUserQuestion` with the pokemon-voice phrasing (e.g. `...어라!? 파이리의 상태가...?`, roughly "...Huh!? Charmander's condition is...?"). Works for both branch evolutions (e.g. Eevee's 3 stone-based paths) and single-chain evolutions (e.g. Charmander → Charmeleon), with a cross-gen `ensurePokemonInDB` fallback so pokemon not native to the active gen still resolve.
- User selects a target and Claude runs `tokenmon evolve <pokemon> <target>`; Refuse (or `no`/`cancel`/`거부`, Korean for "refuse") sets a permanent `evolution_prompt_shown` flag so the same pokemon is not re-prompted until the user runs `/tkm evolve` manually.
- `/tkm:test-evolve` slash harness that backs up state, seeds one of six scenarios (branch-eevee / single-charmander / multi-3 / overflow-5 / refuse-persist / accept-clear-reprompt), swaps the active `hooks/hooks.json` to route hooks at the worktree, lets the user trigger the prompt in their live session, then auto-runs `--verify` + `--restore` via an append to the block reason.
- `package.json` now uses an explicit `files` allowlist so all test-evolve paths (`skills/test-evolve/`, `src/cli/test-evolve.ts`, `src/test-evolve/`, `src/test-scenarios/`) stay out of published tarballs, and `.tokenmon/test-backup/` is gitignored.
Key commit surface
- `src/hooks/stop.ts`: post-lock evolution scan, block emission before the `first_stop`/`no_delta` early return, `system_message` preserved in block output, test-harness tail appended when `.tokenmon/test-backup/current.json` exists.
- `src/core/evolution.ts`: `markEvolutionReady` helper, single-chain `evolution_ready` plumbing (no more silent auto-evolve), `ensurePokemonInDB` fallback on every source/target lookup, new `applySingleChainEvolution` export that mirrors `applyBranchEvolution`.
- `src/core/pokemon-data.ts`: cross-gen fallback in `getPokemonName` and `pokemonIdByName` so localized names from another generation's dex resolve.
- `src/cli/tokenmon.ts`: `cmdEvolve` resolves `targetArg` through `resolvePokemonArg` and routes single-chain pokemon through `checkEvolution` + `executeEvolve` (previously "no eligible" for any non-branching pokemon); `executeEvolve` dispatches branch vs single-chain by `Array.isArray(data.evolves_to)` and clears `evolution_prompt_shown` on the new key.
- `src/status-line.ts`: drop the `statusline.evolution_ready` hint; the AskUserQuestion prompt is the canonical surface.
- `src/core/notifications.ts`: skip the `evolution_ready` notification when the prompt has already fired.
- … `Other`), and Other-input validation with a max-two re-prompt cap.
- `test/e2e/evolve-askuserquestion.test.ts`: hook-level contract coverage for the new `decision:"block"` JSON and the prompt-shown flag lifecycle (child_process fallback; tmux full-session harness remains a follow-up).
Dev harness (dev-only, excluded from publish)
- `/tkm:test-evolve <scenario>` → `--setup` (backup + seed + hooks swap)
- … `AskUserQuestion` renders
- … `tokenmon evolve` in the same turn → auto `--verify` → auto `--restore` → compact report
- `/tkm:test-evolve --list` | `--restore` for emergency cleanup | `--help`
Test plan
- `npm run typecheck` clean
- `npm test`: 1203 existing tests still green
- `node --import tsx --test test/e2e/evolve-askuserquestion.test.ts`: 2/2 pass
- `/tkm:test-evolve single-charmander` → click Charmeleon → verify + restore run automatically
- `/tkm:test-evolve branch-eevee` → click Vaporeon → Eevee evolves via water-stone seed
- `/tkm:test-evolve refuse-persist` → click Refuse → next Stop does not re-block
🤖 Generated with Claude Code