feat(ship): stack-aware test execution #842
Open
inchwormz wants to merge 1 commit into garrytan:main
/ship test execution now detects the project stack at skill-run time
and emits instructions that fit. Rails projects (Gemfile contains
"rails") get the same bin/test-lane, db:test:prepare, and
app/services/*_prompt_builder.rb eval-suite flow as before. Any other
project gets a generic path that finds the project's own test command
via package.json scripts, a Makefile target, or language defaults,
then falls back to AskUserQuestion when nothing is detectable.
Previously, every project running /ship followed Rails-only
instructions and errored on a missing bin/test-lane.
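The generic path above checks the project's own scripts first, then a Makefile target, then language defaults, and finally falls back to AskUserQuestion. It can be modeled as a small TypeScript function.

This is a sketch under stated assumptions, not the PR's code:
- The real resolver emits bash that inspects the repo at skill-run time. Here the repo is an in-memory value instead.
- The names ProjectFiles, TestPlan and resolveGenericTestCommand are hypothetical.
- The npm run prefix is an assumption.
- The marker files used for language defaults (go.mod, Cargo.toml, pyproject.toml, Gemfile) are assumptions.

```typescript
// Hypothetical in-memory view of a repo; the real /ship flow inspects files on disk via bash.
interface ProjectFiles {
  packageScripts?: Record<string, string>; // "scripts" from package.json, if the file exists
  makeTargets?: string[]; // targets found in a Makefile, if the file exists
  markers: string[]; // top-level file names, e.g. "go.mod" or "Cargo.toml"
}

type TestPlan =
  | { kind: "run"; command: string }
  | { kind: "ask" }; // fall back to AskUserQuestion instead of guessing

// Assumed marker files for language defaults; only consulted after scripts and Makefile.
const LANGUAGE_DEFAULTS: ReadonlyArray<[string, string]> = [
  ["go.mod", "go test ./..."],
  ["Cargo.toml", "cargo test"],
  ["pyproject.toml", "pytest"],
  ["Gemfile", "bundle exec rspec"], // only non-Rails Ruby reaches the generic branch
];

function resolveGenericTestCommand(project: ProjectFiles): TestPlan {
  // 1. package.json scripts: first match of test:ci, test:all, test.
  const scripts = project.packageScripts;
  if (scripts) {
    for (const name of ["test:ci", "test:all", "test"]) {
      if (Object.prototype.hasOwnProperty.call(scripts, name)) {
        return { kind: "run", command: `npm run ${name}` }; // npm is an assumption
      }
    }
  }
  // 2. Makefile target: test, then check.
  if (project.makeTargets) {
    for (const target of ["test", "check"]) {
      if (project.makeTargets.includes(target)) {
        return { kind: "run", command: `make ${target}` };
      }
    }
  }
  // 3. Language defaults, only reached when steps 1 and 2 found nothing.
  for (const [marker, command] of LANGUAGE_DEFAULTS) {
    if (project.markers.includes(marker)) {
      return { kind: "run", command };
    }
  }
  // 4. Nothing detectable: ask the user.
  return { kind: "ask" };
}
```

Because the project's own scripts win, a Next.js repo whose test script is defined in package.json runs that script rather than a guessed language default.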
The Rails block is moved, not rewritten. bin/test-lane, the
db:test:prepare warning, the eval-suite runner pattern, and the
EVAL_JUDGE_TIER=full tier reference all sit byte-for-byte inside the
SHIP_STACK: rails branch. Detection uses the same Gemfile grep that
the existing generateTestBootstrap resolver uses.
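The Rails/generic split described above can be sketched the same way. This assumes the detection is a plain substring check equivalent to grep -q rails Gemfile; the exact bash in the PR may differ. detectShipStack and stackMarker are hypothetical names.

```typescript
type ShipStack = "rails" | "generic";

// Assumed equivalent of: [ -f Gemfile ] && grep -q rails Gemfile
// gemfile is undefined when the repo has no Gemfile.
function detectShipStack(gemfile: string | undefined): ShipStack {
  if (gemfile === undefined) {
    return "generic"; // no Gemfile at all
  }
  return gemfile.includes("rails") ? "rails" : "generic"; // substring match, like grep
}

// The generated SKILL.md prints one marker line; Step 3 then follows the matching section.
function stackMarker(stack: ShipStack): string {
  return `SHIP_STACK: ${stack}`;
}
```

If the check really is a substring match, any Gemfile line that mentions rails routes the project to the Rails branch. That includes a comment or a gem such as rails-html-sanitizer.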
Files:
- scripts/resolvers/testing.ts: new generateShipTestExecution
resolver with Rails + generic branches
- scripts/resolvers/index.ts: register SHIP_TEST_EXECUTION
- ship/SKILL.md.tmpl: replace 86 lines of Rails-specific content
with {{SHIP_TEST_EXECUTION}}
- ship/SKILL.md: regenerate
- README.md: stack-aware skills section with Rails vs Next.js proof
- CHANGELOG.md: Unreleased entry
- CONTRIBUTING.md: note on adding a new stack branch
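Registering SHIP_TEST_EXECUTION and putting {{SHIP_TEST_EXECUTION}} in the template means the generator substitutes resolver output for each placeholder. Below is a minimal sketch of that substitution.

It assumes zero-argument resolvers and a {{NAME}} placeholder pattern; the real gen:skill-docs code and resolver signatures may differ. expandTemplate and the stub resolver body are hypothetical.

```typescript
type Resolver = () => string;

// Stub standing in for the real resolver, which returns the whole "Step 3: Run tests" block.
function generateShipTestExecution(): string {
  return "Step 3: Run tests (generated)";
}

// Same shape the PR describes: a RESOLVERS record keyed by placeholder name.
const RESOLVERS: Record<string, Resolver> = {
  SHIP_TEST_EXECUTION: generateShipTestExecution,
};

// Replace every {{NAME}} with its resolver's output. Unknown names throw (a design choice
// in this sketch) so an unresolved placeholder cannot silently reach the generated SKILL.md.
function expandTemplate(template: string, resolvers: Record<string, Resolver>): string {
  return template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (placeholder: string, name: string) => {
    const resolve = resolvers[name];
    if (resolve === undefined) {
      throw new Error(`Unresolved placeholder: ${placeholder}`);
    }
    return resolve();
  });
}
```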
TL;DR
/ship only works on Rails projects today. On anything else (Next.js, Python, Go, Rust, PHP, Elixir, plain Node) it tells Claude to run bin/test-lane, the file does not exist, and the ship stops. This PR makes /ship detect the project at skill-run time and emit the right test instructions for that project. Rails projects see no change. Every other project goes from broken to working.
The problem, concretely
ship/SKILL.md.tmpl hard-codes the Rails test flow inside Step 3, "Run tests". The template says things like bin/test-lane, db:test:prepare, app/services/*_prompt_builder.rb, and EVAL_JUDGE_TIER=full. These are Rails-only commands.

A user running /ship on a Next.js app still gets the same generated text, because the template has no branching. Claude then tries bin/test-lane on a repo where that file does not exist, and the ship halts.
You can verify this on main:
There is no stack awareness. The template assumes Rails.
The fix
- A new resolver, generateShipTestExecution, in scripts/resolvers/testing.ts. It returns the full "Step 3: Run tests" block, with stack-detection bash at the top and two branches below.
- A new placeholder, {{SHIP_TEST_EXECUTION}}, registered in scripts/resolvers/index.ts.
- ship/SKILL.md.tmpl now contains a single {{SHIP_TEST_EXECUTION}} line where the 86 Rails-specific lines used to live.
- The generator (bun run gen:skill-docs) expands the placeholder into three parts:
  - the same Rails block as before
  - a new generic-fallback block
  - the stack-detection bash
Detection happens at skill-run time, not generator time. The generated ship/SKILL.md contains bash that sniffs the current repo and prints SHIP_STACK: rails or SHIP_STACK: generic. That means one generated skill works across every project, without per-project regeneration.
Rails projects: byte-for-byte the same
The Rails block is moved, not rewritten. Every line that used to live at the top of Step 3 now lives inside ### If SHIP_STACK: rails. That includes:
- bin/test-lane 2>&1 | tee /tmp/ship_tests.txt (plus the parallel npm run test line)
- the RAILS_ENV=test bin/rails db:migrate warning about corrupting structure.sql
- the db:test:prepare explanation
- the eval-suite file patterns (app/services/*_prompt_builder.rb, *_generation_service.rb, *_evaluator.rb, config/system_prompts/*.txt, test/evals/**/* globs)
- the PROMPT_SOURCE_FILES grep pattern
- EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval and the fast/standard/full tier reference table
You can verify the Rails content survived intact:
Same for PROMPT_SOURCE_FILES (1:1), db:test:prepare (1:1), and EVAL_JUDGE_TIER=full (2:2).
Detection uses the same Gemfile grep rails pattern that generateTestBootstrap already uses at scripts/resolvers/testing.ts:20. No new convention.
Generic branch: what happens on a non-Rails project
The generic branch runs when Gemfile is missing, or Gemfile exists but does not contain rails. It walks a detection order:
1. package.json scripts. If present, run the first match of test:ci, test:all, or test.
2. Makefile target. If present with a test or check target, run make test or make check.
3. Language defaults: go test ./... for Go, cargo test for Rust, pytest for Python, bundle exec rspec for non-Rails Ruby, and so on. Only reached if steps 1 and 2 found nothing.
If nothing matches, /ship uses AskUserQuestion instead of guessing. It offers three options:
- run a specific command
- skip tests this ship (with a loud warning in the PR body)
- type a custom command
Proof: same command, two projects, two outputs
Rails repo:
Next.js repo (same /ship command):
Before this change, the Next.js run followed Rails-only instructions and errored on a missing bin/test-lane.
Files changed
- scripts/resolvers/testing.ts: new generateShipTestExecution function (about 150 lines). Imports generateTestFailureTriage from ./preamble and calls it once at the bottom so the triage appears after both branches.
- scripts/resolvers/index.ts: registers SHIP_TEST_EXECUTION: generateShipTestExecution in the RESOLVERS record.
- ship/SKILL.md.tmpl: 86 Rails-specific lines replaced with a single {{SHIP_TEST_EXECUTION}} line.
- ship/SKILL.md: regenerated with bun run gen:skill-docs. 184 lines changed.
- README.md: stack-aware skills section with Rails vs Next.js proof.
- CHANGELOG.md: Unreleased entry.
- CONTRIBUTING.md: note on adding a new stack branch.
Adding a new stack later
Example: adding a dedicated Next.js branch.
1. Add a Next.js check to the stack-detection bash block (the one that sets _STACK="generic") in generateShipTestExecution.
2. Add a ### If SHIP_STACK: nextjs section inside the template literal with Next.js-specific instructions.
CONTRIBUTING.md has a short version of this recipe.
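The recipe above can be sketched as a trimmed-down resolver. This sketch is hypothetical:
- The detection lines, the next.config.* check and the section bodies are illustrative, not the PR's text.
- Only three things come from the PR: the _STACK variable, the SHIP_STACK marker, and the ### If SHIP_STACK: rails/nextjs headings.
- It omits the generateTestFailureTriage call that the real resolver appends once at the bottom.

```typescript
// Stack detection emitted into the generated SKILL.md; runs at skill-run time, not generator time.
const DETECT_LINES: string[] = [
  '_STACK="generic"',
  'if [ -f Gemfile ] && grep -q "rails" Gemfile; then _STACK="rails"; fi',
  // Step 1 (hypothetical Next.js check): only applies if Rails did not match,
  // so a Rails app that also ships a Next.js frontend stays on the Rails branch.
  'if [ "$_STACK" = "generic" ] && ls next.config.* >/dev/null 2>&1; then _STACK="nextjs"; fi',
  'echo "SHIP_STACK: $_STACK"',
];

// Step 2: one section per stack in the resolver's output (bodies are placeholders).
const SECTIONS: string[] = [
  "### If SHIP_STACK: rails\n\n(unchanged Rails instructions)",
  "### If SHIP_STACK: nextjs\n\n(Next.js-specific instructions)",
  "### If SHIP_STACK: generic\n\n(package.json, Makefile, language defaults, then AskUserQuestion)",
];

function generateShipTestExecutionSketch(): string {
  const fence = "`".repeat(3); // builds the bash fence without nesting literal fences here
  const detectBlock = [`${fence}bash`, ...DETECT_LINES, fence].join("\n");
  return ["## Step 3: Run tests", detectBlock, ...SECTIONS].join("\n\n");
}
```

Keeping the Rails check first preserves the existing behavior for every Rails project. Each new stack only claims repos that nothing earlier matched.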
Not touched
To keep the review surface small, this PR does not touch any of the following:
- the config file (.gstack/config.yaml)
- the generator and host config (scripts/gen-skill-docs.ts, scripts/host-config.ts)
- skills other than /ship
The only runtime behavior change is inside Step 3 of /ship. Everything else in ship/SKILL.md (preamble, Step 0 through Step 2, Step 3.4 onward) is untouched.
Testing done locally
- bun run gen:skill-docs runs green, with no unresolved placeholders in the output.
- ship/SKILL.md regenerated to 2290 lines, down from 2396 after removing a duplicated ## Test Failure Ownership Triage section that an earlier draft accidentally emitted twice.
- The Rails branch matches origin/main for every marker (bin/test-lane, PROMPT_SOURCE_FILES, db:test:prepare, EVAL_JUDGE_TIER=full, tier reference table).
- Detection checked against a Rails Gemfile (matches) and a Next.js package.json with no Gemfile (falls through to generic).
Test plan for reviewer
- bun run gen:skill-docs rebuilds ship/SKILL.md with no unresolved placeholders.
- grep -n "## Test Failure Ownership Triage" ship/SKILL.md shows exactly one match.
- On a Rails project, /ship prints the bin/test-lane and eval-suite block unchanged.
- On a Next.js project, /ship prints the generic branch and runs next test (or whatever package.json scripts.test contains).
- On a project with no Gemfile, no package.json, and no Makefile, /ship falls through to AskUserQuestion instead of running a Rails command.
Open questions for the maintainer
1. Dedicated Next.js branch? Right now Next.js projects go through the generic branch and pick up package.json scripts.test, which is fine. A dedicated branch could add Playwright detection and next build as a pre-test gate. Happy to ship that as PR 2 if you want it.
2. Other skill templates (retro/SKILL.md.tmpl, review/SKILL.md.tmpl) contain Rails references in example output only, not in runtime instructions. I left them alone in this PR. Flag if you want a follow-up sweep.
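If the dedicated Next.js branch floated above goes ahead, its Playwright detection and next build gate could look roughly like this. The PR does not specify any of this; the sketch is hypothetical:
- It reads an already-parsed package.json.
- It treats @playwright/test in dependencies or devDependencies as the Playwright signal.
- planNextjsTests is an invented name.

```typescript
// Subset of package.json fields this sketch looks at.
interface PackageJson {
  scripts?: Record<string, string>;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns the commands to run, in order; the caller runs them one after another.
function planNextjsTests(pkg: PackageJson): string[] {
  const deps = { ...(pkg.dependencies ?? {}), ...(pkg.devDependencies ?? {}) };
  const steps: string[] = ["npx next build"]; // pre-test gate: surface build errors first
  if (pkg.scripts?.test !== undefined) {
    steps.push("npm test"); // the project's own test script
  }
  if ("@playwright/test" in deps) {
    steps.push("npx playwright test"); // end-to-end tests, only when Playwright is installed
  }
  return steps;
}
```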