Stripe Startup Kit v0.1 — 5 skills + head-to-head eval (SOL-88)#21

Open: solidstatecc wants to merge 1 commit into master from agent/engineer/b8289b87

@solidstatecc (Owner)

Stripe Startup Kit v0.1 — AUTHOR phase (SOL-88)

Five dogfood-critical-path skills for the founder's sell-a-thing loop. They compose the Stripe MCP and never wrap it, enforcing the four-part safety doctrine the MCP only recommends in prose.

The doctrine (the moat — enforced, not recommended)

  1. Test mode first — live refused until a green dry run + explicit unlock
  2. Least-privilege — restricted rk_ keys, scoped per skill; sk_ flagged over-privileged
  3. Human-confirm on money — live money moves return a confirm card, never auto-run
  4. Idempotent — every write carries a deterministic Idempotency-Key
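As a rough illustration of rules 2 and 4, the sketch below labels a key by its prefix and derives an idempotency key by hashing the canonical JSON of a request. The function names (`classify_key`, `idempotency_key`), the argument names, and the SHA-256 scheme are assumptions made for illustration; this PR description does not show how rails.py implements these checks.

```python
import hashlib
import json


def classify_key(api_key: str) -> str:
    """Label a Stripe API key by prefix (hypothetical helper, not taken from rails.py)."""
    if api_key.startswith("rk_"):
        return "restricted"
    if api_key.startswith("sk_"):
        return "over-privileged"
    return "unknown"


def idempotency_key(skill: str, operation: str, params: dict) -> str:
    """Derive a deterministic key: identical inputs always yield the same key."""
    # Sorting keys and fixing separators gives one canonical byte string
    # regardless of the order in which the caller built the dict.
    canonical = json.dumps(
        {"skill": skill, "operation": operation, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme a retried write with the same parameters reuses the same key, while any change to the skill, operation, or parameters produces a different one.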

Architecture

Each bundle is SKILL.md + entry.py + a copy of the shared rails.py. entry.py is a deterministic guard (JSON-in → guarded plan out, pure stdlib, zero Stripe network calls). It returns the exact, idempotent, confirm-gated MCP call to make; the agent executes it through the Stripe MCP only when the gate clears (GO / CONFIRM / BLOCK). That deterministic enforcement layer is what makes the doctrine testable.
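A minimal sketch of what such a guard could look like, assuming a request shape that is not shown in this PR: the fields `mode`, `moves_money`, `dry_run_green`, `live_unlocked`, and `call` are all hypothetical, as is the function name `guard`. It takes JSON in, returns a JSON plan with a GO / CONFIRM / BLOCK gate, and makes no network calls.

```python
import json


def guard(request_json: str) -> str:
    """Return a guarded plan as JSON (illustrative only, not the PR's entry.py)."""
    req = json.loads(request_json)
    mode = req.get("mode", "test")
    moves_money = bool(req.get("moves_money", False))
    # Live is unlocked only after a green dry run AND an explicit unlock.
    unlocked = bool(req.get("dry_run_green")) and bool(req.get("live_unlocked"))

    if mode not in ("test", "live"):
        gate, reason = "BLOCK", "unrecognized mode"
    elif mode == "live" and not unlocked:
        gate, reason = "BLOCK", "live needs a green dry run and an explicit unlock"
    elif mode == "live" and moves_money:
        gate, reason = "CONFIRM", "live money move requires human confirmation"
    else:
        gate, reason = "GO", "safe to execute through the Stripe MCP"

    # The plan echoes the proposed call; the agent runs it only if the gate allows.
    plan = {"gate": gate, "reason": reason, "call": req.get("call")}
    return json.dumps(plan, sort_keys=True)
```

Because the function is pure (same input, same output, no I/O), each gate outcome can be asserted directly in a test suite without touching Stripe.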

The skills

stripe-stand-up · stripe-product-to-price · stripe-tax-ready · stripe-deliver · stripe-revenue-read

(Deferred to v0.2: subscription-designer, recover.)

Evals — the gating proof

  • eval/head_to_head.py (doctrine rails vs raw MCP): rails score 6/6, raw MCP scores 1/6 (delta +5), and idempotency keys are deterministic. The rails therefore beat raw MCP rather than merely wrapping it. The report is in eval/RESULTS.md. The raw-MCP baseline is modeled faithfully from the SOL-86 RESEARCH gate (the MCP runs live writes unblocked; its doctrine is prose-only).
  • eval/skills_eval.py (per-skill): 22/22 across normal, edge, and out-of-scope refusal cases.

python3 eval/head_to_head.py   # exit 0 = rails win; exit 1 = stop & flag
python3 eval/skills_eval.py     # exit 0 = all green
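
The exit-code convention in the comments above could be expressed roughly as follows. This is a sketch under the assumption that "rails win" means the rails pass strictly more cases than the raw baseline; the helper names `delta` and `exit_code` are hypothetical and are not taken from eval/head_to_head.py.

```python
def delta(rails_passed: int, raw_passed: int) -> int:
    """Difference in passed cases between the rails and the raw baseline."""
    return rails_passed - raw_passed


def exit_code(rails_passed: int, raw_passed: int) -> int:
    """0 when the rails strictly outscore the baseline, 1 otherwise (assumed rule)."""
    return 0 if delta(rails_passed, raw_passed) > 0 else 1
```

With the scores reported in this PR (6 and 1), `delta` gives 5 and `exit_code` gives 0; a tie or a loss would give 1, which a pipeline can treat as "stop & flag".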

Boundaries

TEST MODE only. Listing/packaging and the real-money dogfood are downstream council-gated steps, not in this PR. Next pipeline gates: AUDIT (auditor), then TEST (tester).

Note for AUDIT

Skills are prefixed stripe- (brief used bare names like deliver, stand-up). Rationale: a flat marketplace makes generic slugs collide and mis-trigger. Trivially reversible (folder name + frontmatter name) if the council prefers the bare names.

🤖 Generated with Claude Code

AUTHOR phase of the Skill Production pipeline. Five dogfood-critical-path
skills that compose the Stripe MCP and enforce the four-part safety doctrine
the MCP only recommends (test-mode-first, least-privilege rk_ keys,
human-confirm on money, idempotent writes):

  stripe-stand-up · stripe-product-to-price · stripe-tax-ready
  stripe-deliver · stripe-revenue-read

Each bundle is SKILL.md + entry.py (a deterministic JSON-in/JSON-out guard,
pure stdlib, zero Stripe network calls) + a copy of the shared rails.py.
The guard plans a confirm-gated, idempotent MCP call; the agent executes it
via the Stripe MCP. Compose, never wrap.

Evals (gating proof): head-to-head rails vs raw MCP scores 6/6 vs 1/6
(delta +5, idempotency deterministic) — the rails meaningfully beat raw MCP,
so this is not a wrapper. Per-skill suite 22/22 (normal/edge/refusal).

TEST MODE only. Listing/packaging and the real-money dogfood stay
council-gated downstream. Next gates: AUDIT, then TEST.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

vercel (Bot) commented Jun 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| solidstate.cc | Ready | Preview, Comment | Jun 18, 2026 5:18pm |
