Simulation mode for interactive spec-driven software simulation #16
Open
freesig wants to merge 35 commits into
…tion Introduces a new TUI screen where users can interactively simulate their software based on spec nodes before implementation. An AI agent drives the simulation, producing per-channel output (UI, audio, network, errors, logs) grounded in spec node references.
Key features:
- Channel picker to select output channels before starting
- Dual input modes (Normal/Insert) with Ctrl+Enter to submit
- Tab/split-pane layout cycling (F5) for multi-channel views
- Spec reference overlays ([^N] footnotes, number keys to inspect)
- Spec gap indicators for ungrounded agent behavior
- Behavior reporting mode (r key) for inverse traceability
- Background processing with spinner, session resumption via --resume
…s bar The MCP config passed to claude CLI was missing the required "type": "http" field, causing immediate rejection. Additionally, simulation errors were invisible because the simulation screen never rendered app.message.
Adds the ability to generate and view shadow answers in the TUI, comparing spec answers against actual codebase implementation. Press Shift+S to trigger shadow generation, which shows a progress bar and populates implementation status icons (○/◐/●/⚡) in the tree view and detailed status/review in the node content panel.
Shift+Enter is not reliably forwarded by tmux, making it impossible to submit simulation input in tmux sessions. Add Ctrl+S as a fallback.
The model would break character on short inputs like "q", responding with plain text instead of JSON. Append a format reminder to each resume prompt to keep responses in the required JSON envelope.
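A minimal sketch of the reminder step described above, assuming the resume prompt is built as a plain string. The name `build_resume_prompt` and the wording of `FORMAT_REMINDER` are hypothetical; the commit does not show the actual text.

```rust
/// Hypothetical reminder text; the commit does not show the real wording.
const FORMAT_REMINDER: &str =
    "Reminder: respond only with the JSON envelope described in the system prompt.";

/// Builds the prompt for a resumed session: the user's input, a blank line,
/// then the format reminder, so even a one-character input such as "q"
/// arrives with the output contract restated.
fn build_resume_prompt(user_input: &str) -> String {
    format!("{}\n\n{}", user_input.trim_end(), FORMAT_REMINDER)
}
```

Appending the reminder rather than prepending it keeps it as the last thing the model reads before responding.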
…scendants Instead of loading all spec nodes, the simulation now loads the selected node, its ancestor chain, descendants, a spec summary, and other root questions. The prompt strongly encourages querying MCP tools for additional context beyond the focus subtree.
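The focus-subtree selection can be sketched as follows. This is an illustration under assumptions: `Node` and `focus_context` are hypothetical names, node ids are plain integers, and the spec summary and other root questions mentioned above are left out.

```rust
use std::collections::{BTreeSet, HashMap};

/// Minimal stand-in for a spec node (hypothetical shape).
struct Node {
    parent: Option<u32>,
    children: Vec<u32>,
}

/// Collects the focus node, its ancestor chain, and all of its descendants.
fn focus_context(nodes: &HashMap<u32, Node>, focus: u32) -> BTreeSet<u32> {
    let mut out = BTreeSet::new();

    // Walk up from the focus node to the root.
    let mut cur = Some(focus);
    while let Some(id) = cur {
        if !out.insert(id) {
            break; // already seen: guards against a malformed parent cycle
        }
        cur = nodes.get(&id).and_then(|n| n.parent);
    }

    // Walk down from the focus node, depth-first.
    let mut stack: Vec<u32> = nodes
        .get(&focus)
        .map(|n| n.children.clone())
        .unwrap_or_default();
    while let Some(id) = stack.pop() {
        if out.insert(id) {
            if let Some(n) = nodes.get(&id) {
                stack.extend(n.children.iter().copied());
            }
        }
    }
    out
}
```

Siblings of the focus node and of its ancestors are deliberately excluded; under this sketch the agent would have to fetch those through the MCP tools.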
Adds an optional scenario description step between channel picker and simulation start. Users can describe the starting state and ongoing conditions (e.g. "other nodes are sending ACK messages") to customize the simulation context instead of always starting from a blank state.
…lacing channels Reports now display a magenta-bordered overlay with the LLM's explanation of why the simulation behaves a certain way, citing spec nodes with [^N] markers. Channel contents are preserved so the user doesn't lose the current simulation state and node references.
…are active Candidate reload was being forced every 250ms tick while any background operation was running. Now only reloads once on the busy→idle transition.
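The busy→idle trigger amounts to edge detection on a boolean. A minimal sketch, assuming the tick handler can ask whether any background operation is running; `ReloadGate` is a hypothetical name, not the type used in the PR.

```rust
/// Remembers whether background work was running on the previous tick.
#[derive(Default)]
struct ReloadGate {
    was_busy: bool,
}

impl ReloadGate {
    /// Call once per tick. Returns true only on the tick where the app
    /// goes from busy to idle, which is when candidates should be reloaded.
    fn should_reload(&mut self, busy_now: bool) -> bool {
        let fire = self.was_busy && !busy_now;
        self.was_busy = busy_now;
        fire
    }
}
```

With a 250ms tick, a ten-second background operation then causes one reload instead of roughly forty.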
…ctually generates it
…l focus cycling Add [S] keybinding in simulation Normal mode to edit the scenario at any point during a running simulation. The input area shows a [Scenario] prefix, pre-populates with the current scenario, and on submit sends a scenario update to the LLM while persisting it on the session. Also adds log panel focus support with arrow key line scrolling and Tab cycling through main → tree → log panels.
…t ref input to simulation
- LLM now must list every discrete decision in a "decisions" array with refs and spec_gaps
- Changed spec_gap from single optional string to spec_gaps array (multiple per channel)
- Number keys buffer with ~500ms delay for multi-digit refs (e.g. "11" for [^11])
- Decisions panel renders below channels in magenta with inline ref markers
- SimOpenRef searches both channel refs and decision refs
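The multi-digit ref input can be sketched as a small buffer that is flushed once no digit has arrived for the delay. This is an assumption-laden illustration: `RefBuffer`, `push_digit`, and `poll` are hypothetical names, and the delay is fixed at 500ms to match the "~500ms" in the commit message.

```rust
use std::time::{Duration, Instant};

/// Assumed delay; the commit message says "~500ms".
const REF_DELAY: Duration = Duration::from_millis(500);

#[derive(Default)]
struct RefBuffer {
    digits: String,
    last_key: Option<Instant>,
}

impl RefBuffer {
    /// Records a digit keypress that happened at `now`.
    fn push_digit(&mut self, d: char, now: Instant) {
        if d.is_ascii_digit() {
            self.digits.push(d);
            self.last_key = Some(now);
        }
    }

    /// Call on every tick. Once the delay has elapsed since the last digit,
    /// returns the buffered reference number and clears the buffer.
    fn poll(&mut self, now: Instant) -> Option<u32> {
        let last = self.last_key?;
        if now.duration_since(last) < REF_DELAY {
            return None;
        }
        self.last_key = None;
        let n = self.digits.parse::<u32>().ok();
        self.digits.clear();
        n
    }
}
```

Passing the timestamp in as a parameter, rather than reading the clock inside the buffer, keeps the logic testable without sleeping.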
…from directory state Adds UpdateFeature op to the append-only log, a regeneration pipeline that uses directory context and child Q&A nodes to produce an updated description via Claude, and triggers cascade review on children after update.
…ctory state Adds RegenerateFeature action bound to Shift+R in both tree and flat-list modes, with an API layer that validates the selected node is a root feature before spawning regeneration.
…access codebase Previously run_claude was called without a directory, so Claude had no file access. Now uses run_claude_in_dir_cached with the spec's directory, includes project directory instructions in the prompt, and appends the codebase context output instruction so responses build the dir context cache.
Full explore (X key) now shows a depth selection modal instead of immediately starting with hardcoded depth 3. Exploration also runs with end_on_answer=true so it stops after answering leaf nodes.
…taminating sim responses With --output-format text, intermediate MCP tool call/result content was concatenated into stdout, causing JSON parse failures when the model used tools during resumed sessions (e.g., scenario updates). Switching to --output-format json isolates the final assistant text in a result field.
…ecified behavior Reframe the AI's role from "simulate a working app" to "spec-simulation tool" that renders ONLY what spec nodes explicitly describe.
Key changes:
- Add CARDINAL RULE preamble making grounding the #1 priority
- Strengthen "do not guess" into explicit rules against loosely-related citations
- Remove "realistic TUI" language that pushed AI to invent UI elements
- Reframe spec_gaps as warning flags, not permission slips
- Add "prefer omission over invention" rule with concrete examples
- Fix initial/resume prompts to stop requesting a "starting state"
Adds a [Tab] toggle on the channel picker screen that loads all spec nodes into the simulation system prompt instead of just the focus node's ancestors and descendants.
… invented behavior The previous prompt revision was too strict, causing the AI to just list spec coverage instead of rendering an interactive simulation.
Rebalance:
- AI MUST render the feature as it would look if implemented
- Everything grounded in a spec node gets cited with [^N] markers
- Everything invented to make the simulation interactive gets flagged as a spec_gap, never silently blended with grounded content
- Loose citations (citing a parent/related node for something it doesn't specifically describe) are explicitly called out as the wrong pattern
- Updated both build_system_prompt and build_system_prompt_whole_spec
…ation prediction Use entropy-based gap detection: only flag spec_gaps for high-entropy decisions where different implementers would diverge. Low-entropy choices (submit buttons, standard layout, obvious defaults) are rendered naturally without flagging, producing cleaner simulation output.
Replace string buffer with Vec<CapturedKey> so simulation insert mode
captures all keystrokes as discrete tokens. Special keys render as
{enter}, {up}, {down}, {tab}, etc. for readable input display.
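A sketch of what the token rendering might look like. Only the `CapturedKey` type name and the rendered tokens come from the commit message; the variant names and the `render_keys` helper are assumptions.

```rust
use std::fmt;

/// One captured keystroke. Variant names are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
enum CapturedKey {
    Char(char),
    Enter,
    Up,
    Down,
    Tab,
}

impl fmt::Display for CapturedKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Printable characters are shown as typed.
            CapturedKey::Char(c) => write!(f, "{}", c),
            // Special keys are shown as brace-delimited tokens.
            CapturedKey::Enter => f.write_str("{enter}"),
            CapturedKey::Up => f.write_str("{up}"),
            CapturedKey::Down => f.write_str("{down}"),
            CapturedKey::Tab => f.write_str("{tab}"),
        }
    }
}

/// Renders the captured sequence for the input display.
fn render_keys(keys: &[CapturedKey]) -> String {
    keys.iter().map(|k| k.to_string()).collect()
}
```

Storing discrete tokens instead of a string means a captured Enter stays data rather than being confused with a submit action or a literal newline.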
…mode
Enables mouse capture in the terminal and records left-clicks within the
channel content area as {left-click:X,Y} tokens in the captured key sequence,
allowing the AI to interpret simulated UI interactions.
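The click recording can be sketched as a hit test followed by token formatting. `Area` and `click_token` are hypothetical names, and reporting coordinates relative to the content area's top-left corner is an assumption: the commit message does not say whether X,Y are relative or absolute.

```rust
/// Channel content area in terminal cells (hypothetical type).
struct Area {
    x: u16,
    y: u16,
    width: u16,
    height: u16,
}

/// Returns a `{left-click:X,Y}` token if the click lands inside `area`,
/// or None if it falls outside and should be ignored.
/// Assumption: X,Y are relative to the area's top-left corner.
fn click_token(area: &Area, col: u16, row: u16) -> Option<String> {
    let inside = col >= area.x
        && col < area.x.saturating_add(area.width)
        && row >= area.y
        && row < area.y.saturating_add(area.height);
    if !inside {
        return None;
    }
    Some(format!("{{left-click:{},{}}}", col - area.x, row - area.y))
}
```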
When enabled, sets the claude CLI working directory to the spec's project directory and appends code-aware instructions to the system prompt — letting the agent consult the actual codebase for unspecified details while treating the spec as the source of truth.
MCP answer_question and update_answer were reimplementing embed + submit_op
directly, skipping the post-answer pipeline (entropy evaluation, child
generation, descendant review, summary regeneration). Now both route through
api::answer_node — the same path used by the web UI and TUI.
- Extend answer_node to accept optional residual_entropy; skip background
AI evaluation when caller provides it
- Remove redundant update_answer MCP tool (answer_question handles both)
- MCP now returns full Node instead of just {node_id, status}
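The optional `residual_entropy` parameter boils down to a branch on an `Option`. A minimal sketch with hypothetical names (`EntropySource`, `plan_entropy`); the real `api::answer_node` signature is not shown in the PR text, and `f64` is an assumed type.

```rust
/// Where the entropy value for an answered node comes from (hypothetical).
#[derive(Debug, PartialEq)]
enum EntropySource {
    /// The caller (for example an MCP client) supplied the value.
    Caller(f64),
    /// No value supplied: schedule the background AI evaluation.
    BackgroundEvaluation,
}

/// Decides whether the background evaluation step can be skipped.
fn plan_entropy(residual_entropy: Option<f64>) -> EntropySource {
    match residual_entropy {
        Some(e) => EntropySource::Caller(e),
        None => EntropySource::BackgroundEvaluation,
    }
}
```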
Global EnableMouseCapture was preventing text selection/copy across the entire TUI. Now mouse capture is toggled on only when entering simulation insert mode and off when leaving.
Add a broadcast channel to AppState so the op_loop notifies all subscribers after each committed operation. The TUI subscribes and refreshes the relevant view (spec list or node list) immediately, replacing the previous pull-only model that required user navigation to see background changes.
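The push model can be illustrated with a small fan-out notifier. This sketch swaps in `std::sync::mpsc` senders so it runs without dependencies; it is not the channel type used in the PR, and `Notifier` and `OpCommitted` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Event sent after an operation is committed (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
struct OpCommitted {
    spec_id: u32,
}

/// Std-only stand-in for a broadcast channel: one sender per subscriber.
#[derive(Default)]
struct Notifier {
    subscribers: Vec<Sender<OpCommitted>>,
}

impl Notifier {
    /// Registers a new subscriber, such as the TUI.
    fn subscribe(&mut self) -> Receiver<OpCommitted> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Called by the op loop after each commit. Subscribers whose receiver
    /// has been dropped are removed.
    fn notify(&mut self, ev: OpCommitted) {
        self.subscribers.retain(|tx| tx.send(ev.clone()).is_ok());
    }
}
```

A subscriber can call `try_recv` on its receiver each tick and refresh only when an event is waiting.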
Sync tool handlers were calling Handle::block_on() from within the tokio async runtime, causing hangs on every MCP tool call. Converted all 10 affected handlers to async fn and replaced block_on with .await.
Instruments: tool handler entry/exit, answer_node stages, submit_op flow, op_loop receive/apply, and MCP handler creation.
Summary
Stack
Test plan