
Simulation mode for interactive spec-driven software simulation#16

Open
freesig wants to merge 35 commits into freesig/sim-1-tui-infra from freesig/sim-2-simulation

@freesig (Collaborator) commented Apr 6, 2026

Summary

  • Add core simulation engine: runner, prompts, session management, orchestration
  • Simulation UI: channel picker, scenario input, cursor navigation, mouse capture
  • Feature regeneration via MCP tool and TUI keybind
  • Convert MCP tool handlers to async to prevent runtime deadlock
  • Iterative prompt tuning to enforce JSON output and spec fidelity

Stack

  • PR 1/5: TUI Infrastructure
  • PR 2/5 ← you are here
  • PR 3/5: Sim Tools, Game Mode & Streaming
  • PR 4/5: Members Screen
  • PR 5/5: Lean Game Mode

Test plan

  • Launch simulation from TUI with a scenario
  • Simulation produces spec-referenced interactions
  • Channel picker options work (explore code, consume whole spec)
  • Feature regeneration updates descriptions from directory state
  • MCP tools don't deadlock when called during simulation

freesig added 30 commits March 31, 2026 13:26
…tion

Introduces a new TUI screen where users can interactively simulate their
software based on spec nodes before implementation. An AI agent drives the
simulation, producing per-channel output (UI, audio, network, errors, logs)
grounded in spec node references.

Key features:
- Channel picker to select output channels before starting
- Dual input modes (Normal/Insert) with Ctrl+Enter to submit
- Tab/split-pane layout cycling (F5) for multi-channel views
- Spec reference overlays ([^N] footnotes, number keys to inspect)
- Spec gap indicators for ungrounded agent behavior
- Behavior reporting mode (r key) for inverse traceability
- Background processing with spinner, session resumption via --resume
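A minimal sketch of how `[^N]` footnote markers could be pulled out of a channel's rendered text so that number keys can map back to spec nodes. The function name `extract_refs` is hypothetical; the PR does not show its actual parser.

```rust
/// Hypothetical helper: collect the numbers of all well-formed `[^N]`
/// spec-reference markers in a channel's text, in order of appearance.
fn extract_refs(text: &str) -> Vec<u32> {
    let mut refs = Vec::new();
    let mut rest = text;
    // Scan for each "[^" opener, then read digits up to a closing "]".
    while let Some(start) = rest.find("[^") {
        let after = &rest[start + 2..];
        let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
        // Digits are ASCII, so their char count equals their byte length.
        let closed = after[digits.len()..].starts_with(']');
        if !digits.is_empty() && closed {
            if let Ok(n) = digits.parse() {
                refs.push(n);
            }
        }
        rest = after;
    }
    refs
}
```

Malformed markers such as `[^x]` or an unclosed `[^4` are skipped rather than reported.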
…s bar

The MCP config passed to claude CLI was missing the required "type": "http"
field, causing immediate rejection. Additionally, simulation errors were
invisible because the simulation screen never rendered app.message.

Adds the ability to generate and view shadow answers in the TUI,
comparing spec answers against actual codebase implementation. Press
Shift+S to trigger shadow generation, which shows a progress bar and
populates implementation status icons (○/◐/●/⚡) in the tree view
and detailed status/review in the node content panel.

Shift+Enter is not reliably forwarded by tmux, making it impossible
to submit simulation input in tmux sessions. Add Ctrl+S as a fallback.

The model would break character on short inputs like "q", responding
with plain text instead of JSON. Append a format reminder to each
resume prompt to keep responses in the required JSON envelope.
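A sketch of the resume-prompt change, assuming the reminder is simply appended after the user's input. `FORMAT_REMINDER` and `build_resume_prompt` are hypothetical names, and the reminder wording is invented for illustration.

```rust
/// Hypothetical reminder text; the PR's actual wording is not shown.
const FORMAT_REMINDER: &str =
    "Reminder: respond ONLY with the JSON envelope described in the system prompt.";

/// Build the prompt sent on session resume. Appending the reminder every
/// time keeps even very short inputs (e.g. "q") inside the JSON envelope.
fn build_resume_prompt(user_input: &str) -> String {
    format!("{}\n\n{}", user_input, FORMAT_REMINDER)
}
```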
…scendants

Instead of loading all spec nodes, the simulation now loads the
selected node, its ancestor chain, descendants, a spec summary,
and other root questions. The prompt strongly encourages querying
MCP tools for additional context beyond the focus subtree.
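One way to assemble the focus subtree described above, assuming the spec tree is stored as parent/child maps keyed by numeric node ids. `SpecTree` and its methods are hypothetical, and the spec summary and other root questions are left out.

```rust
use std::collections::HashMap;

/// Hypothetical minimal spec tree built from (parent, child) edges.
struct SpecTree {
    parent: HashMap<u32, u32>,
    children: HashMap<u32, Vec<u32>>,
}

impl SpecTree {
    fn new(edges: &[(u32, u32)]) -> Self {
        let mut parent = HashMap::new();
        let mut children: HashMap<u32, Vec<u32>> = HashMap::new();
        for &(p, c) in edges {
            parent.insert(c, p);
            children.entry(p).or_default().push(c);
        }
        SpecTree { parent, children }
    }

    /// Ancestor chain ordered from the root down to (excluding) `id`.
    fn ancestors(&self, id: u32) -> Vec<u32> {
        let mut chain = Vec::new();
        let mut cur = id;
        while let Some(&p) = self.parent.get(&cur) {
            chain.push(p);
            cur = p;
        }
        chain.reverse();
        chain
    }

    /// All descendants of `id` in depth-first pre-order.
    fn descendants(&self, id: u32) -> Vec<u32> {
        let mut out = Vec::new();
        let mut stack: Vec<u32> = self.children.get(&id).cloned().unwrap_or_default();
        stack.reverse(); // pop children in their original order
        while let Some(n) = stack.pop() {
            out.push(n);
            if let Some(kids) = self.children.get(&n) {
                stack.extend(kids.iter().rev());
            }
        }
        out
    }

    /// Focus context: ancestors, the selected node, then its descendants.
    fn focus_context(&self, id: u32) -> Vec<u32> {
        let mut ids = self.ancestors(id);
        ids.push(id);
        ids.extend(self.descendants(id));
        ids
    }
}
```

Sibling subtrees (node 6 in the test below) are excluded, which is what keeps the prompt small and pushes the agent toward MCP queries for anything else.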
Adds an optional scenario description step between channel picker and
simulation start. Users can describe the starting state and ongoing
conditions (e.g. "other nodes are sending ACK messages") to customize
the simulation context instead of always starting from a blank state.
…lacing channels

Reports now display a magenta-bordered overlay with the LLM's explanation
of why the simulation behaves a certain way, citing spec nodes with [^N]
markers. Channel contents are preserved so the user doesn't lose the
current simulation state and node references.
…are active

Candidate reload was being forced every 250ms tick while any background
operation was running. Now only reloads once on the busy→idle transition.
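The busy→idle fix turns a level trigger into an edge trigger. A minimal sketch, with a hypothetical `CandidateReloader` whose counter stands in for the real reload call:

```rust
/// Hypothetical tick handler: reload only on the busy -> idle transition
/// instead of on every 250ms tick while an operation is running.
struct CandidateReloader {
    was_busy: bool,
    reloads: u32, // counts reloads; stands in for the real reload work
}

impl CandidateReloader {
    fn on_tick(&mut self, busy_now: bool) {
        if self.was_busy && !busy_now {
            self.reloads += 1; // real code would reload candidates here
        }
        self.was_busy = busy_now;
    }
}
```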
…l focus cycling

Add [S] keybinding in simulation Normal mode to edit the scenario at any
point during a running simulation. The input area shows a [Scenario]
prefix, pre-populates with the current scenario, and on submit sends a
scenario update to the LLM while persisting it on the session.

Also adds log panel focus support with arrow key line scrolling and
Tab cycling through main → tree → log panels.
…t ref input to simulation

- LLM now must list every discrete decision in a "decisions" array with refs and spec_gaps
- Changed spec_gap from single optional string to spec_gaps array (multiple per channel)
- Number keys buffer with ~500ms delay for multi-digit refs (e.g. "11" for [^11])
- Decisions panel renders below channels in magenta with inline ref markers
- SimOpenRef searches both channel refs and decision refs
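A sketch of the ~500ms multi-digit buffer, assuming the UI calls `poll` once per tick. `RefBuffer` is a hypothetical name; passing `Instant` values in explicitly keeps the timing deterministic and testable.

```rust
use std::time::{Duration, Instant};

/// Hypothetical multi-digit ref buffer: digits accumulate ("1", "1" -> 11)
/// until no key has arrived for `delay`, then the number is flushed.
struct RefBuffer {
    digits: String,
    last_key: Option<Instant>,
    delay: Duration,
}

impl RefBuffer {
    fn new(delay: Duration) -> Self {
        RefBuffer { digits: String::new(), last_key: None, delay }
    }

    /// Record a digit key press (callers pass only '0'..='9').
    fn push_digit(&mut self, d: char, now: Instant) {
        self.digits.push(d);
        self.last_key = Some(now);
    }

    /// Called on each tick; returns the ref number once input has settled.
    fn poll(&mut self, now: Instant) -> Option<u32> {
        let last = self.last_key?;
        if now.duration_since(last) < self.delay {
            return None; // still waiting for a possible next digit
        }
        self.last_key = None;
        let n = self.digits.parse().ok();
        self.digits.clear();
        n
    }
}
```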
…from directory state

Adds UpdateFeature op to the append-only log, a regeneration pipeline that
uses directory context and child Q&A nodes to produce an updated description
via Claude, and triggers cascade review on children after update.
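A minimal sketch of why an append-only log suits this: state is derived by replaying ops in order, so an UpdateFeature op supersedes the earlier description without rewriting history. Only the `UpdateFeature` name comes from the PR; `CreateFeature`, the field names and `replay` are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the op log's operations.
#[derive(Clone, Debug)]
enum Op {
    CreateFeature { id: u32, description: String },
    UpdateFeature { id: u32, description: String },
}

/// Rebuild feature descriptions by replaying the log from the start.
fn replay(log: &[Op]) -> HashMap<u32, String> {
    let mut features = HashMap::new();
    for op in log {
        match op {
            Op::CreateFeature { id, description } => {
                features.insert(*id, description.clone());
            }
            Op::UpdateFeature { id, description } => {
                // Updates only apply to features that already exist.
                if let Some(current) = features.get_mut(id) {
                    *current = description.clone();
                }
            }
        }
    }
    features
}
```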
…ctory state

Adds RegenerateFeature action bound to Shift+R in both tree and flat-list
modes, with an API layer that validates the selected node is a root feature
before spawning regeneration.
…access codebase

Previously run_claude was called without a directory, so Claude had no file
access. Now uses run_claude_in_dir_cached with the spec's directory, includes
project directory instructions in the prompt, and appends the codebase context
output instruction so responses build the dir context cache.

Full explore (X key) now shows a depth selection modal instead of
immediately starting with hardcoded depth 3. Exploration also runs
with end_on_answer=true so it stops after answering leaf nodes.
…taminating sim responses

With --output-format text, intermediate MCP tool call/result content was
concatenated into stdout, causing JSON parse failures when the model used
tools during resumed sessions (e.g., scenario updates). Switching to
--output-format json isolates the final assistant text in a result field.
…ecified behavior

Reframe the AI's role from "simulate a working app" to "spec-simulation
tool" that renders ONLY what spec nodes explicitly describe. Key changes:
- Add CARDINAL RULE preamble making grounding the #1 priority
- Strengthen "do not guess" into explicit rules against loosely-related citations
- Remove "realistic TUI" language that pushed AI to invent UI elements
- Reframe spec_gaps as warning flags, not permission slips
- Add "prefer omission over invention" rule with concrete examples
- Fix initial/resume prompts to stop requesting a "starting state"

Adds a [Tab] toggle on the channel picker screen that loads all spec
nodes into the simulation system prompt instead of just the focus
node's ancestors and descendants.
… invented behavior

The previous prompt revision was too strict, causing the AI to just list
spec coverage instead of rendering an interactive simulation. Rebalance:
- AI MUST render the feature as it would look if implemented
- Everything grounded in a spec node gets cited with [^N] markers
- Everything invented to make the simulation interactive gets flagged as
  a spec_gap — never silently blended with grounded content
- Loose citations (citing a parent/related node for something it doesn't
  specifically describe) are explicitly called out as the wrong pattern
- Updated both build_system_prompt and build_system_prompt_whole_spec
…ation prediction

Use entropy-based gap detection: only flag spec_gaps for high-entropy
decisions where different implementers would diverge. Low-entropy choices
(submit buttons, standard layout, obvious defaults) are rendered naturally
without flagging, producing cleaner simulation output.
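The gap rule is applied through prompt wording rather than computed. To make "high entropy" concrete, here is an illustrative Shannon-entropy check over how often independent implementers would pick each option; the counts, the threshold and both helper functions are hypothetical.

```rust
/// Illustrative only: Shannon entropy (in bits) of the distribution of
/// choices made by independent implementers for one decision.
fn entropy_bits(counts: &[u32]) -> f64 {
    let total: u32 = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Flag a spec gap only when implementers would plausibly diverge.
fn is_spec_gap(counts: &[u32], threshold_bits: f64) -> bool {
    entropy_bits(counts) > threshold_bits
}
```

A near-unanimous choice such as a submit button (9 of 10 pick the same option) stays under a 1-bit threshold, while a four-way even split scores 2 bits and would be flagged.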
Replace string buffer with Vec<CapturedKey> so simulation insert mode
captures all keystrokes as discrete tokens. Special keys render as
{enter}, {up}, {down}, {tab}, etc. for readable input display.
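A sketch of the token rendering, assuming `CapturedKey` is an enum with one variant per special key. Only the type name comes from the PR; the variants shown are assumptions covering the keys listed above.

```rust
use std::fmt;

/// Hypothetical shape of a captured keystroke.
enum CapturedKey {
    Char(char),
    Enter,
    Up,
    Down,
    Tab,
}

impl fmt::Display for CapturedKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Printable characters render as themselves.
            CapturedKey::Char(c) => write!(f, "{}", c),
            // Special keys render as readable {name} tokens.
            CapturedKey::Enter => f.write_str("{enter}"),
            CapturedKey::Up => f.write_str("{up}"),
            CapturedKey::Down => f.write_str("{down}"),
            CapturedKey::Tab => f.write_str("{tab}"),
        }
    }
}

/// Render the captured sequence for the input display and the prompt.
fn render(keys: &[CapturedKey]) -> String {
    keys.iter().map(|k| k.to_string()).collect()
}
```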
…mode

Enables mouse capture in the terminal and records left-clicks within the
channel content area as {left-click:X,Y} tokens in the captured key sequence,
allowing the AI to interpret simulated UI interactions.
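A sketch of the click hit test, assuming the channel content area is known in terminal cells and that X,Y are reported relative to its top-left corner (the PR does not say which origin it uses). `Area` and `left_click_token` are hypothetical.

```rust
/// Hypothetical content-area rectangle in terminal cells.
#[derive(Clone, Copy)]
struct Area {
    x: u16,
    y: u16,
    width: u16,
    height: u16,
}

impl Area {
    /// Subtractions are safe because each is guarded by the >= check first.
    fn contains(&self, col: u16, row: u16) -> bool {
        col >= self.x && col - self.x < self.width && row >= self.y && row - self.y < self.height
    }
}

/// Turn a left-click at terminal cell (col, row) into a token, or None if
/// the click fell outside the channel content area.
fn left_click_token(area: Area, col: u16, row: u16) -> Option<String> {
    if !area.contains(col, row) {
        return None;
    }
    Some(format!("{{left-click:{},{}}}", col - area.x, row - area.y))
}
```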
When enabled, sets the claude CLI working directory to the spec's
project directory and appends code-aware instructions to the system
prompt — letting the agent consult the actual codebase for unspecified
details while treating the spec as the source of truth.

MCP answer_question and update_answer were reimplementing embed + submit_op
directly, skipping the post-answer pipeline (entropy evaluation, child
generation, descendant review, summary regeneration). Now both route through
api::answer_node — the same path used by the web UI and TUI.

- Extend answer_node to accept optional residual_entropy; skip background
  AI evaluation when caller provides it
- Remove redundant update_answer MCP tool (answer_question handles both)
- MCP now returns full Node instead of just {node_id, status}

Global EnableMouseCapture was preventing text selection/copy across the
entire TUI. Now mouse capture is toggled on only when entering simulation
insert mode and off when leaving.

freesig added 5 commits April 1, 2026 08:20
Add a broadcast channel to AppState so the op_loop notifies all
subscribers after each committed operation. The TUI subscribes and
refreshes the relevant view (spec list or node list) immediately,
replacing the previous pull-only model that required user navigation
to see background changes.
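A std-only sketch of the fan-out pattern. The concrete broadcast type is not shown in the PR text, so this stand-in gives each subscriber its own `std::sync::mpsc` receiver and clones every notification to all live subscribers; `Notifier` and `Committed` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical notification sent after each committed operation.
#[derive(Clone, Debug, PartialEq)]
enum Committed {
    SpecChanged(u32),
    NodeChanged(u32),
}

/// Stand-in for a broadcast channel held in shared app state.
#[derive(Default)]
struct Notifier {
    subscribers: Vec<Sender<Committed>>,
}

impl Notifier {
    /// Each subscriber (e.g. the TUI) gets its own receiver.
    fn subscribe(&mut self) -> Receiver<Committed> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Called by the op loop after an op is applied. Subscribers whose
    /// receiver has been dropped fail to send and are removed.
    fn notify(&mut self, event: Committed) {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
    }
}
```

The subscriber then decides which view to refresh from the event variant, which is what replaces the pull-only model.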
Sync tool handlers were calling Handle::block_on() from within the
tokio async runtime, causing hangs on every MCP tool call. Converted
all 10 affected handlers to async fn and replaced block_on with .await.
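A sketch of the handler shape after the change, using only std: a minimal `block_on` stands in for the tokio runtime and is called once at the top level, while the handler itself is `async` and `.await`s its dependency. `answer_node` here is a hypothetical stub, not the real API function.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor, standing in for the runtime so the
/// sketch needs only std. It is used exactly once, at the outermost level.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async API call the handler depends on.
async fn answer_node(node_id: u32) -> String {
    format!("answered node {}", node_id)
}

/// After the change: the handler awaits the call instead of blocking on it
/// from inside the already-running runtime.
async fn handle_answer_question(node_id: u32) -> String {
    answer_node(node_id).await
}
```

Only the outermost caller drives the future; nested code never blocks the executor thread it is itself running on.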
Instruments tool handler entry/exit, answer_node stages, submit_op
flow, op_loop receive/apply, and MCP handler creation.