feat(ship): stack-aware test execution #842

Open
inchwormz wants to merge 1 commit into garrytan:main from inchwormz:feat/stack-aware-ship-test-execution

inchwormz commented Apr 5, 2026

TL;DR

/ship only works on Rails projects today. On anything else (Next.js, Python, Go, Rust, PHP, Elixir, plain Node) it tells Claude to run bin/test-lane, the file does not exist, and the ship stops. This PR makes /ship detect the project at skill-run time and emit the right test instructions for that project. Rails projects see no change. Every other project goes from broken to working.

The problem, concretely

ship/SKILL.md.tmpl hard-codes the Rails test flow inside Step 3 "Run tests". The template references bin/test-lane, db:test:prepare, app/services/*_prompt_builder.rb, and EVAL_JUDGE_TIER=full. All of these are Rails-only commands, paths, or settings. A user running /ship on a Next.js app still gets the same generated text, because the template has no branching. Claude then tries bin/test-lane on a repo where that file does not exist, and the ship halts.

You can verify this on main:

grep -n "bin/test-lane" ship/SKILL.md.tmpl
# 10 hits, all inside Step 3
grep -n "{{.*STACK.*}}\|SHIP_STACK" ship/SKILL.md.tmpl
# 0 hits

There is no stack awareness. The template assumes Rails.

The fix

  1. A new resolver function, generateShipTestExecution in scripts/resolvers/testing.ts. It returns the full "Step 3: Run tests" block, with stack detection bash at the top and two branches below.
  2. A new placeholder, {{SHIP_TEST_EXECUTION}}, registered in scripts/resolvers/index.ts.
  3. ship/SKILL.md.tmpl now contains a single {{SHIP_TEST_EXECUTION}} line where the 86 Rails-specific lines used to live.
  4. The generator (bun run gen:skill-docs) expands the placeholder into three parts: the stack detection bash, the same Rails block as before, and a new generic-fallback block.

Detection happens at skill-run time, not generator time. The generated ship/SKILL.md contains bash that sniffs the current repo and prints SHIP_STACK: rails or SHIP_STACK: generic. That means one generated skill works across every project, without per-project regeneration.
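A minimal sketch of what that detection bash might look like, assuming only what this description states: _STACK defaults to generic, a Gemfile that contains rails selects the Rails branch, and the result is printed as SHIP_STACK: <name>. The detect_ship_stack function name and its optional directory argument are hypothetical, added so the snippet can be tried outside a repo root. The actual block emitted by generateShipTestExecution may differ.

```shell
# Hypothetical wrapper: the generated skill presumably runs the
# equivalent lines inline from the repo root.
detect_ship_stack() {
  dir="${1:-.}"
  _STACK="generic"
  # Rails if a Gemfile exists and mentions rails; otherwise stay generic.
  [ -f "$dir/Gemfile" ] && grep -q "rails" "$dir/Gemfile" 2>/dev/null && _STACK="rails"
  echo "SHIP_STACK: $_STACK"
}

# Usage (from a repo root):
#   detect_ship_stack .
```

Because the check runs when the skill runs, the same generated ship/SKILL.md can print a different SHIP_STACK line in each repository.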

Rails projects: byte-for-byte the same

The Rails block is moved, not rewritten. Every line that used to live at the top of Step 3 now lives inside ### If SHIP_STACK: rails. That includes:

  • bin/test-lane 2>&1 | tee /tmp/ship_tests.txt (plus the parallel npm run test line)
  • The RAILS_ENV=test bin/rails db:migrate warning about corrupting structure.sql
  • The db:test:prepare explanation
  • The Step 3.25 Eval Suites section (app/services/*_prompt_builder.rb, *_generation_service.rb, *_evaluator.rb, config/system_prompts/*.txt, test/evals/**/* globs)
  • The PROMPT_SOURCE_FILES grep pattern
  • EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval and the fast/standard/full tier reference table
  • The "Pre-merge gate uses full tier" rule

You can verify the Rails content survived intact:

# Count Rails markers in committed main
git show main:ship/SKILL.md | grep -c "bin/test-lane"
# 4

# Count on this branch
git show HEAD:ship/SKILL.md | grep -c "bin/test-lane"
# 4

Same for PROMPT_SOURCE_FILES (1:1), db:test:prepare (1:1), EVAL_JUDGE_TIER=full (2:2).
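The same count comparison can be looped over every marker. This is only a sketch: count_marker and check_parity are hypothetical helper names, and the script compares two files on disk so it can be pointed at any pair of snapshots. Matching counts are a smoke test, not proof of byte parity; a diff of the extracted Rails section would be the stronger check.

```shell
# Count lines containing a fixed string; prints 0 when nothing matches.
count_marker() {
  grep -cF -- "$1" "$2" || true
}

# Compare marker line counts between two versions of the generated skill.
# Usage: check_parity OLD_FILE NEW_FILE MARKER...
check_parity() {
  old="$1"; new="$2"; shift 2
  rc=0
  for marker in "$@"; do
    a="$(count_marker "$marker" "$old")"
    b="$(count_marker "$marker" "$new")"
    if [ "$a" = "$b" ]; then
      echo "ok   $marker ($a:$b)"
    else
      echo "DIFF $marker ($a:$b)"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (the /tmp paths are placeholders):
#   git show main:ship/SKILL.md > /tmp/skill_main.md
#   git show HEAD:ship/SKILL.md > /tmp/skill_head.md
#   check_parity /tmp/skill_main.md /tmp/skill_head.md \
#     "bin/test-lane" "PROMPT_SOURCE_FILES" "db:test:prepare" "EVAL_JUDGE_TIER=full"
```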

Detection uses the same pattern (grep for rails in the Gemfile) that generateTestBootstrap already uses at scripts/resolvers/testing.ts:20. No new convention.

Generic branch: what happens on a non-Rails project

The generic branch runs when Gemfile is missing, or Gemfile exists but does not contain rails. It walks a detection order:

  1. package.json scripts. If present, run the first match of test:ci, test:all, or test.
  2. Makefile target. If present with a test or check target, run make test or make check.
  3. Language default. go test ./... for Go, cargo test for Rust, pytest for Python, bundle exec rspec for non-Rails Ruby, and so on. Only reached if steps 1 and 2 found nothing.

If nothing matches, /ship uses AskUserQuestion instead of guessing. Options: run a specific command, skip tests this ship (with a loud warning in the PR body), or type a custom command.
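The detection order above can be sketched as a single function. Several details here are assumptions rather than the PR's actual logic: pick_test_command is a hypothetical name, a plain grep for the script key stands in for real JSON parsing, npm run stands in for whichever package manager the repo uses, and the marker files chosen for each language default (go.mod, Cargo.toml, pyproject.toml or setup.py, Gemfile) are guesses.

```shell
# Print the test command the generic branch might choose for a directory.
# Returns 1 when nothing is detectable, which is the point where /ship
# would fall back to AskUserQuestion.
pick_test_command() {
  dir="${1:-.}"

  # 1. package.json scripts: first match of test:ci, test:all, test.
  if [ -f "$dir/package.json" ]; then
    for name in test:ci test:all test; do
      if grep -q "\"$name\"[[:space:]]*:" "$dir/package.json"; then
        echo "npm run $name"
        return 0
      fi
    done
  fi

  # 2. Makefile target: test first, then check.
  if [ -f "$dir/Makefile" ]; then
    for target in test check; do
      if grep -q "^$target:" "$dir/Makefile"; then
        echo "make $target"
        return 0
      fi
    done
  fi

  # 3. Language defaults, reached only if steps 1 and 2 found nothing.
  if [ -f "$dir/go.mod" ]; then echo "go test ./..."; return 0; fi
  if [ -f "$dir/Cargo.toml" ]; then echo "cargo test"; return 0; fi
  if [ -f "$dir/pyproject.toml" ] || [ -f "$dir/setup.py" ]; then echo "pytest"; return 0; fi
  if [ -f "$dir/Gemfile" ]; then echo "bundle exec rspec"; return 0; fi

  return 1
}
```

Returning a non-zero status instead of a default command mirrors the "ask instead of guessing" rule: the caller decides what to do when detection fails.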

Proof: same command, two projects, two outputs

Rails repo:

$ head -1 Gemfile
source 'https://rubygems.org'
$ grep -c rails Gemfile
3
# /ship emits the Rails branch:
#   bin/test-lane 2>&1 | tee /tmp/ship_tests.txt
#   EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb
#   "Rails tests pass (N runs, 0 failures)"

Next.js repo (same /ship command):

$ ls Gemfile 2>/dev/null
$ node -e "console.log(require('./package.json').scripts.test)"
next test
# /ship emits the generic branch:
#   next test 2>&1 | tee /tmp/ship_tests.txt
#   (no bin/test-lane, no eval suites)

Before this change, the Next.js run followed Rails-only instructions and errored on a missing bin/test-lane.

Files changed

| File | Change |
| --- | --- |
| scripts/resolvers/testing.ts | New generateShipTestExecution function (about 150 lines). Imports generateTestFailureTriage from ./preamble and calls it once at the bottom so the triage appears after both branches. |
| scripts/resolvers/index.ts | Registers SHIP_TEST_EXECUTION: generateShipTestExecution in the RESOLVERS record. |
| ship/SKILL.md.tmpl | 86 lines of Rails-specific bash and prose replaced with a single {{SHIP_TEST_EXECUTION}} line. |
| ship/SKILL.md | Regenerated via bun run gen:skill-docs. 184 lines changed. |
| README.md | Short "Stack-aware skills" section near the end, listing the two branches and the detection logic. Trim further or drop if you prefer. |
| CHANGELOG.md | Unreleased entry with a "Rails projects: no behavior change" callout. |
| CONTRIBUTING.md | Short note under "Editing SKILL.md files" explaining how to add a new stack branch. |

Adding a new stack later

Example: adding a dedicated Next.js branch.

  1. Add a detection line to the _STACK="generic" bash block in generateShipTestExecution:
    [ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && _STACK="nextjs"
    
  2. Add a matching ### If SHIP_STACK: nextjs section inside the template literal with Next.js-specific instructions.
  3. Leave the generic fallback as the last branch.

CONTRIBUTING.md has a short version of this recipe.
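A sketch of how the detection block might read once the Next.js line from step 1 is added. The function wrapper and directory argument are hypothetical, and the ordering is an assumption: the Rails check runs last here so that a Rails app with a Next.js frontend still resolves to rails. The PR does not state which stack should win in that case.

```shell
# Hypothetical wrapper around the detection lines, extended with nextjs.
detect_ship_stack_with_nextjs() {
  dir="${1:-.}"
  _STACK="generic"
  # New line from step 1, adapted to take a directory argument.
  [ -f "$dir/package.json" ] && grep -q '"next"' "$dir/package.json" 2>/dev/null && _STACK="nextjs"
  # Existing Rails check. Assumption: it runs last, so rails overrides nextjs.
  [ -f "$dir/Gemfile" ] && grep -q "rails" "$dir/Gemfile" 2>/dev/null && _STACK="rails"
  echo "SHIP_STACK: $_STACK"
}
```

Any repo that matches neither check still prints SHIP_STACK: generic, which keeps the generic fallback as the last branch.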

Not touched

To keep the review surface small, this PR does not touch any of the following:

  • Hooks
  • Config schema (.gstack/config.yaml)
  • Generator core (scripts/gen-skill-docs.ts, scripts/host-config.ts)
  • Any skill other than /ship
  • Telemetry
  • CI

The only runtime behavior change is inside Step 3 of /ship. Everything else in ship/SKILL.md (preamble, Step 0 through Step 2, Step 3.4 onward) is untouched.

Testing done locally

  • bun run gen:skill-docs runs green. No unresolved placeholders in the output.
  • ship/SKILL.md regenerated to 2290 lines, down from 2396 after removing a ## Test Failure Ownership Triage section that an earlier draft accidentally emitted twice.
  • Rails content verified byte-parity against origin/main for every marker (bin/test-lane, PROMPT_SOURCE_FILES, db:test:prepare, EVAL_JUDGE_TIER=full, tier reference table).
  • Stack detection bash tested against both a Rails Gemfile (matches) and a Next.js package.json with no Gemfile (falls through to generic).

Test plan for reviewer

  • bun run gen:skill-docs rebuilds ship/SKILL.md with no unresolved placeholders.
  • grep -n "## Test Failure Ownership Triage" ship/SKILL.md shows exactly one match.
  • Rails repo: running /ship prints the bin/test-lane and eval-suite block unchanged.
  • Next.js repo: running /ship prints the generic branch and runs next test (or whatever package.json scripts.test contains).
  • Project with no Gemfile, no package.json, and no Makefile: /ship falls through to AskUserQuestion instead of running a Rails command.
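The first two checks in this plan can be scripted. The sketch below is an illustration, not part of the PR: check_generated_skill is a hypothetical name, and the placeholder pattern assumes every placeholder is written as uppercase letters and underscores inside double braces, as {{SHIP_TEST_EXECUTION}} is.

```shell
# Check a generated skill file: no leftover {{PLACEHOLDER}} tokens and
# exactly one Test Failure Ownership Triage heading.
check_generated_skill() {
  file="$1"
  rc=0
  if grep -qE '\{\{[A-Z_]+\}\}' "$file"; then
    echo "FAIL: unresolved placeholder in $file"
    rc=1
  fi
  n="$(grep -c "## Test Failure Ownership Triage" "$file" || true)"
  if [ "$n" != "1" ]; then
    echo "FAIL: expected 1 triage heading, found $n"
    rc=1
  fi
  [ "$rc" -eq 0 ] && echo "ok: $file"
  return "$rc"
}

# Usage, after running bun run gen:skill-docs:
#   check_generated_skill ship/SKILL.md
```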

Open questions for the maintainer

  1. Dedicated Next.js branch in a follow-up? The generic branch handles Next.js via package.json scripts.test, which is fine. A dedicated branch could add Playwright detection and next build as a pre-test gate. Happy to ship that as PR 2 if you want it.
  2. README section placement. The "Stack-aware skills" blurb lives between "Docs" and "Privacy & Telemetry". If you prefer it somewhere else, or prefer it removed entirely and only mentioned in CHANGELOG, say the word.
  3. The retro/review templates. Two files (retro/SKILL.md.tmpl, review/SKILL.md.tmpl) contain Rails references in example output only, not in runtime instructions. I left them alone in this PR. Flag if you want a follow-up sweep.

/ship test execution now detects the project stack at skill-run time
and emits instructions that fit. Rails projects (Gemfile contains
"rails") get the same bin/test-lane, db:test:prepare, and
app/services/*_prompt_builder.rb eval-suite flow as before. Any other
project gets a generic path that finds the project's own test command
via package.json scripts, a Makefile target, or language defaults,
then falls back to AskUserQuestion when nothing is detectable.

Previously, every project running /ship followed Rails-only
instructions and errored on a missing bin/test-lane.

The Rails block is moved, not rewritten. bin/test-lane, the
db:test:prepare warning, the eval-suite runner pattern, and the
EVAL_JUDGE_TIER=full tier reference all sit byte-for-byte inside the
SHIP_STACK: rails branch. Detection uses the same Gemfile grep that
the existing generateTestBootstrap resolver uses.

Files:
 - scripts/resolvers/testing.ts: new generateShipTestExecution
   resolver with Rails + generic branches
 - scripts/resolvers/index.ts: register SHIP_TEST_EXECUTION
 - ship/SKILL.md.tmpl: replace 86 lines of Rails-specific content
   with {{SHIP_TEST_EXECUTION}}
 - ship/SKILL.md: regenerate
 - README.md: stack-aware skills section with Rails vs Next.js proof
 - CHANGELOG.md: Unreleased entry
 - CONTRIBUTING.md: note on adding a new stack branch