solidstatecc · solidstatecc · Jun 18, 2026
diff --git a/skills-public/stripe-startup-kit/.gitignore b/skills-public/stripe-startup-kit/.gitignore
@@ -0,0 +1,2 @@
+__pycache__/
+*.pyc
diff --git a/skills-public/stripe-startup-kit/DOCTRINE.md b/skills-public/stripe-startup-kit/DOCTRINE.md
@@ -0,0 +1,54 @@
+# The safety doctrine
+
+The four rails every kit skill enforces. Stripe's own docs *recommend* all four;
+the Stripe MCP enforces none by default. The kit enforces them in `rails.py`,
+deterministically, so they cannot be skipped by agent judgment. This is the moat.
+
+## 1. Test mode first — live never by accident
+
+- A key's mode is read from its prefix (`*_test_` vs `*_live_`), and the prefix is authoritative.
+- A **live** key is **refused** (`gate: BLOCK`) unless *both*: `allow_live=true`
+  **and** a green dry-run receipt (`{"status":"green","passed":true}`).
+- Reads are exempt — a read-only key in live mode is harmless.
+- The dry run (see `stripe-stand-up`) is the only thing that produces a green
+  receipt, so "prove it works in test" gates "sell in live."
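A minimal sketch of the rail-1 decision, assuming the key's mode is the second underscore-separated token (`rk_test_…` gives `test`). `key_mode`, `blocked_by_test_first`, and their parameters are hypothetical names for illustration, not the kit's `rails.py`; the green receipt shape is the one quoted above.

```python
from typing import Optional

# Receipt shape quoted in the doctrine; only the dry run should produce it.
GREEN_RECEIPT = {"status": "green", "passed": True}


def key_mode(key: str) -> str:
    """Hypothetical helper: read "test" or "live" from a key like rk_test_... or sk_live_..."""
    parts = key.split("_")
    if len(parts) < 3 or parts[1] not in ("test", "live"):
        raise ValueError(f"unrecognized key prefix: {key[:8]!r}")
    return parts[1]


def blocked_by_test_first(key: str, is_read: bool = False, allow_live: bool = False,
                          receipt: Optional[dict] = None) -> bool:
    """Return True when rail 1 must BLOCK: a live write without unlock + green receipt."""
    if is_read or key_mode(key) == "test":
        return False  # reads and test-mode writes are never blocked by this rail
    green = (receipt is not None
             and receipt.get("status") == "green"
             and receipt.get("passed") is True)
    # Both conditions are required; either one alone still blocks.
    return not (allow_live and green)
```

Requiring both `allow_live` and the receipt means a stale unlock flag cannot substitute for a passing dry run.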
+
+## 2. Least-privilege keys — restricted, scoped per skill
+
+- Every skill declares the **minimal** scope it needs (`scope_manifest`). Create
+  an `rk_` (restricted) key with exactly that, nothing more.
+- A secret `sk_` key works but is **over-privileged** → `warning` + the scope to
+  recreate it correctly. A publishable `pk_` key cannot do server work → `BLOCK`.
+- `stripe-revenue-read`'s scope is read-only across every money surface, so it
+  cannot write even if asked.
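A sketch of the key-type check, assuming the type is the prefix before the first underscore. `check_key_privilege` and the scope strings are hypothetical; the real `scope_manifest` format is not shown here. The warning deliberately contains "over-privileged", the phrase `eval/head_to_head.py` looks for.

```python
from typing import Optional


def check_key_privilege(key: str, scope: list[str]) -> tuple[str, Optional[str]]:
    """Hypothetical rail-2 check: return (gate, warning) for a key's type."""
    prefix = key.split("_", 1)[0]
    if prefix == "rk":
        # Restricted key: assumed to have been created with exactly `scope`.
        return "GO", None
    if prefix == "sk":
        # Works, but far broader than needed: warn and say how to recreate it.
        return "GO", ("secret key is over-privileged; recreate it as a restricted "
                      "rk_ key scoped to: " + ", ".join(scope))
    if prefix == "pk":
        return "BLOCK", "publishable key cannot perform server-side writes"
    return "BLOCK", f"unrecognized key type: {prefix}_"
```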
+
+## 3. Human-confirm on money — no silent live spend
+
+- A money-moving action in **live** mode never auto-runs. The guard returns
+  `gate: CONFIRM` and a `confirm_card` (amount, mode, exactly what will happen).
+- The agent must show the card and get an explicit human "yes" before making the
+  MCP call. In test mode there is nothing at stake, so no confirm is required.
+- "Money-moving" includes creating a live sales surface (payment link) and
+  creating a tax filing **obligation** (a registration), not just charges.
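A minimal sketch of the rail-3 decision, assuming a `confirmed` flag records the human yes (the same field `eval/head_to_head.py` sets in its fairness scenario). `money_gate` and the card's fields are hypothetical; amounts stay in the smallest currency unit, as Stripe's `unit_amount` is, so no decimal conversion is attempted.

```python
from typing import Optional


def money_gate(mode: str, money_moving: bool, confirmed: bool,
               unit_amount: int, currency: str, action: str) -> tuple[str, Optional[dict]]:
    """Hypothetical rail-3 check: live money moves need a human-approved card."""
    if mode != "live" or not money_moving:
        return "GO", None  # test mode or a non-money action: nothing at stake
    if confirmed:
        return "GO", None  # the human already said yes to this exact intent
    card = {
        "mode": "live",
        "unit_amount": unit_amount,  # smallest currency unit, unconverted
        "currency": currency,
        "will_happen": action,
    }
    return "CONFIRM", card
```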
+
+## 4. Idempotent — a retry is not a duplicate
+
+- Every write carries a deterministic `Idempotency-Key`:
+  `"<skill>:<action>:" + sha256(canonical_json(params))[:32]`.
+- Identical intent → identical key → Stripe returns the original object instead
+  of creating a second one (24h dedupe window).
+- Pass the key as the `Idempotency-Key` header on the MCP write. The eval asserts
+  the key is present and stable across retries.
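The key formula above can be sketched in Python. The canonical form is an assumption: sorted keys and compact separators, with the SHA-256 digest hex-encoded before truncation. `canonical_json` and `idempotency_key` are illustrative names, not necessarily the kit's helpers.

```python
import hashlib
import json


def canonical_json(params: dict) -> str:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(params, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def idempotency_key(skill: str, action: str, params: dict) -> str:
    """Build "<skill>:<action>:" + the first 32 hex chars of sha256(canonical params)."""
    digest = hashlib.sha256(canonical_json(params).encode("utf-8")).hexdigest()
    return f"{skill}:{action}:{digest[:32]}"
```

Because keys are sorted before hashing, the same params built in a different insertion order yield the same key, while any change to a value yields a different one.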
+
+## How a guard answers
+
+`entry.py` (JSON in) → a plan (JSON out) with one of three gates:
+
+| gate | meaning |
+|---|---|
+| `GO` | Safe — make `mcp_call` now via the Stripe MCP. |
+| `CONFIRM` | Show `confirm_card`; proceed only on an explicit human yes. |
+| `BLOCK` | Do not call Stripe; resolve `blockers` first. |
+
+The guard never calls Stripe. It decides; the agent executes through the MCP.
+That separation is what keeps the rails deterministic and unit-testable.
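On the agent side, the three gates reduce to a small dispatch. This sketch assumes that after a human yes the agent re-runs the guard with `"confirmed": true`, mirroring the fairness scenario in `eval/head_to_head.py`; `drive`, `run_guard`, `ask_human`, and `call_mcp` are hypothetical callables, and the plan fields (`mcp_call`, `confirm_card`, `blockers`) follow the gate table.

```python
from typing import Callable


def drive(intent: dict,
          run_guard: Callable[[dict], dict],
          ask_human: Callable[[dict], bool],
          call_mcp: Callable[[dict], dict]) -> dict:
    """Hypothetical agent loop: the guard decides, the agent executes via the MCP."""
    plan = run_guard(intent)
    if plan.get("gate") == "CONFIRM":
        # Show the card; only an explicit human yes lets the flow continue.
        if not ask_human(plan["confirm_card"]):
            return {"status": "declined"}
        # Assumed re-entry: re-run the guard with the confirmation recorded.
        plan = run_guard({**intent, "confirmed": True})
    if plan.get("gate") == "GO":
        return call_mcp(plan["mcp_call"])
    if plan.get("gate") == "BLOCK":
        # Never call Stripe; surface the blockers instead.
        return {"status": "blocked", "blockers": plan.get("blockers", [])}
    raise ValueError(f"unexpected gate: {plan.get('gate')!r}")
```

Re-running the guard after confirmation, rather than executing a call cached from the CONFIRM plan, keeps every MCP call traceable to a GO decision.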
diff --git a/skills-public/stripe-startup-kit/README.md b/skills-public/stripe-startup-kit/README.md
@@ -0,0 +1,79 @@
+# Stripe Startup Kit — v0.1
+
+Stand up Stripe, sell a digital product or subscription, get paid — **without
+touching live mode by accident.**
+
+Stripe ships the API floor and the developer tooling (agent-toolkit, the hosted
+MCP, its own integration skills). Nobody ships the *founder's* sell-a-thing loop
+with safety rails. This kit does, and the rails are the moat.
+
+## The doctrine — enforced, not recommended
+
+Every skill enforces four rails that the Stripe MCP only *recommends* in prose.
+Full detail in [`DOCTRINE.md`](./DOCTRINE.md).
+
+1. **Test mode first** — live is refused until a green dry run + explicit unlock.
+2. **Least-privilege keys** — restricted `rk_` keys, scoped per skill; a secret
+   `sk_` key is flagged over-privileged.
+3. **Human-confirm on money** — a live money move never auto-runs; it returns a
+   confirm card a human must approve.
+4. **Idempotent** — every write carries a deterministic `Idempotency-Key`.
+
+**Compose the Stripe MCP, never wrap it.** Each skill's `entry.py` is a
+deterministic *guard* (JSON in → guarded plan out, pure stdlib, zero Stripe
+network calls). It hands the agent an exact, safe, idempotent MCP call to make —
+and the agent makes it through the Stripe MCP, only when the gate clears.
+
+## The 5 skills (v0.1 dogfood-critical path)
+
+```
+stripe-stand-up  →  stripe-product-to-price  →  stripe-tax-ready  →  stripe-deliver  →  stripe-revenue-read
+```
+
+| Skill | Job |
+|---|---|
+| `stripe-stand-up` | Sellable account in test mode, least-privilege key, refuses live until the dry run is green. |
+| `stripe-product-to-price` | A file / course / idea → correct Product + Price + Payment Link. |
+| `stripe-tax-ready` | Stripe Tax + obligations before go-live (AU GST / reverse-charge / ABN). |
+| `stripe-deliver` | Webhook → license/download/access + receipt, signature-verified and idempotent. |
+| `stripe-revenue-read` | Read-only "how's the business" — the free on-ramp. A key that can't write. |
+
+Deferred to v0.2: `subscription-designer`, `recover`. The loop sells one real
+thing without them first.
+
+> **Naming note for AUDIT:** the brief's short names (`stand-up`, `deliver`, …)
+> are prefixed `stripe-` here for a flat marketplace — generic slugs like
+> `deliver` and `stand-up` collide and trigger on unrelated requests. Trivially
+> reversible (folder name + frontmatter `name`) if the council prefers the bare
+> names.
+
+## Run the skills
+
+Each skill is a self-contained Agent Skills bundle: `SKILL.md` + `entry.py` +
+`rails.py` (+ `references/`). The guard is JSON in, JSON out:
+
+```
+echo '{"action":"create","key":"rk_test_…","params":{"name":"My Course","unit_amount":4900,"currency":"usd"}}' \
+  | python3 stripe-product-to-price/entry.py
+```
+
+Read the returned `gate`: **GO** (make the MCP call), **CONFIRM** (get a human
+yes first), or **BLOCK** (fix the blockers).
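From Python, the same pipe can be reproduced with `subprocess.run`, feeding the intent on stdin as the `echo … |` example does. This is a hedged sketch: `run_guard` is a hypothetical helper, and it assumes `entry.py` reads the intent from stdin and prints one JSON plan to stdout (`eval/head_to_head.py` passes the intent via a `--json` argument instead).

```python
import json
import subprocess
import sys


def run_guard(entry_path: str, intent: dict) -> dict:
    """Pipe an intent into a guard's entry.py on stdin and parse the JSON plan."""
    proc = subprocess.run(
        [sys.executable, entry_path],
        input=json.dumps(intent),
        capture_output=True,
        text=True,
        check=False,  # a guard's exit code is not treated as the decision; the gate is
    )
    if not proc.stdout.strip():
        raise RuntimeError(f"guard produced no output: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```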
+
+## Evals
+
+- `python3 eval/head_to_head.py` — the gating proof: rails vs raw MCP. See
+  [`eval/RESULTS.md`](./eval/RESULTS.md). Rails 6/6, raw MCP 1/6.
+- `python3 eval/skills_eval.py` — per-skill suite, 22/22 (normal / edge / refusal).
+
+## Status & boundaries
+
+- **TEST MODE only** in v0.1. The real-money dogfood and the listing/packaging
+  decision are downstream, **council-gated** steps — not part of this build.
+- `rails.py` is identical across the five bundles (each must stand alone to be
+  publishable). The source of truth is `./rails.py`; `python3 sync_rails.py`
+  copies it into each skill folder.
+
+---
+
+*Solid State — solidstate.cc. Most skills are noise. Ship the signal.*
diff --git a/skills-public/stripe-startup-kit/eval/RESULTS.md b/skills-public/stripe-startup-kit/eval/RESULTS.md
@@ -0,0 +1,62 @@
+# Head-to-head eval — doctrine rails vs raw Stripe MCP
+
+**Gating question (from the AUTHOR brief, SOL-88):** do the kit's safety rails
+*meaningfully* beat the raw Stripe MCP? If not, the kit is just a wrapper and the
+build must STOP and flag on SOL-86.
+
+**Verdict: RAILS BEAT RAW MCP. The build is not a wrapper.**
+
+Reproduce: `python3 eval/head_to_head.py` (exit 0 = pass, 1 = stop-and-flag).
+Last run 2026-06-18, Python 3.12.
+
+## Result
+
+```
+scenario                                             rail           rails  raw_mcp  gate
+--------------------------------------------------------------------------------------------
+test mode first: live key, not unlocked              test_first     PASS   FAIL     BLOCK
+test mode first: live unlock without green dry run   test_first     PASS   FAIL     BLOCK
+human-confirm: live money move, not yet confirmed    human_confirm  PASS   FAIL     CONFIRM
+idempotent: write carries an idempotency key         idempotent     PASS   FAIL     GO
+least-privilege: secret key flagged over-privileged  least_priv     PASS   FAIL     GO
+fairness: properly unlocked + confirmed -> allowed   fair_allow     PASS   PASS     GO
+--------------------------------------------------------------------------------------------
+SCORE                                                               6/6    1/6
+
+idempotency deterministic across retries: YES
+rails advantage (delta): +5
+```
+
+The rails enforce all four doctrine dimensions and still **allow** a properly
+unlocked, confirmed live sale (the fairness row) — so they are a precise gate,
+not a blanket block. Raw MCP passes only that one row.
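The exit code of `head_to_head.py` follows the pass rule in its `main()`: rails must score at least 5 of 6, beat raw MCP by at least 3, and produce a stable idempotency key. Applied to the table above, 6 vs 1 gives a delta of +5, so the run passes. `verdict` is a hypothetical standalone restatement of that condition.

```python
def verdict(rails_score: int, raw_score: int, deterministic: bool) -> bool:
    """Hypothetical restatement of head_to_head.py's pass condition."""
    delta = rails_score - raw_score
    return rails_score >= 5 and delta >= 3 and deterministic
```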
+
+## Why the raw_mcp baseline is faithful, not a strawman
+
+`raw_mcp` executes every write as-is with none of the four rails. That is the
+**documented** behavior of the hosted Stripe MCP, per the SOL-86 RESEARCH gate
+(first-party audit, observed 2026-06-18, sourced to docs.stripe.com/mcp):
+
+- *"the MCP will happily run live-mode writes; nothing blocks it pending a green
+  dry run."*
+- Least-privilege keys and human confirmation are *"recommended in prose only —
+  not enforced."*
+- *"Test-mode-first: no enforcement found. Idempotency: no default found."*
+
+So modeling raw_mcp as enforcing zero rails matches what Stripe actually ships.
+The delta is the gap the kit fills.
+
+## Honest limitation
+
+This eval tests the **enforcement layer** deterministically: given the same
+adversarial intent, the rails gate it and raw MCP does not. It does *not* model
+an agent that diligently follows Stripe's prose recommendations by hand — a
+careful agent *could* approximate some rails manually. The kit's claim is the
+narrower, true one: the rails enforce **by default and deterministically** what
+raw MCP leaves to optional prose plus agent vigilance. That default-on
+enforcement, unit-tested, is the moat.
+
+## Per-skill eval
+
+`python3 eval/skills_eval.py` — 22/22 cases pass (normal, edge, and
+out-of-scope/refusal cases across all five skills). No eval, no handoff.
diff --git a/skills-public/stripe-startup-kit/eval/head_to_head.py b/skills-public/stripe-startup-kit/eval/head_to_head.py
@@ -0,0 +1,132 @@
+#!/usr/bin/env python3
+"""Head-to-head eval — doctrine rails vs raw Stripe MCP. The gating proof.
+
+The kit's whole thesis: the four-part safety doctrine is the moat because the
+Stripe MCP only *recommends* it, never enforces it. This eval tests that claim
+deterministically. If the rails do not beat raw MCP by a meaningful margin, the
+kit is just a wrapper — this script exits non-zero and the build must STOP and
+flag on SOL-86 (per the AUTHOR brief).
+
+Two actors face identical adversarial intents:
+
+  rails    — the actual kit guards (each skill's entry.py, run as a subprocess).
+  raw_mcp  — a faithful model of the documented Stripe MCP baseline. Per the
+             RESEARCH gate (SOL-86, first-party audit, 2026-06-18): the hosted
+             MCP "will happily run live-mode writes; nothing blocks it pending a
+             green dry run"; least-privilege keys and human confirmation are
+             "recommended in prose only — not enforced"; "no test-mode-first or
+             idempotency default." So raw_mcp executes every write as-is, with
+             none of the four rails. This is the documented behavior, not a
+             strawman — see RESULTS.md for the quoted sources.
+
+No network. No Stripe calls. Pure decision-layer comparison.
+"""
+
+import json
+import os
+import subprocess
+import sys
+
+KIT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+
+GREEN = {"status": "green", "passed": True}
+P2P = "stripe-product-to-price"
+BASE = {"name": "Test thing", "unit_amount": 1500, "currency": "usd"}
+
+
+def run_skill(skill, intent):
+    """Invoke a skill's entry.py exactly as an agent runtime would."""
+    p = subprocess.run(
+        [sys.executable, os.path.join(KIT, skill, "entry.py"), "--json", json.dumps(intent)],
+        capture_output=True, text=True, stdin=subprocess.DEVNULL,
+    )
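+    if not p.stdout.strip():
+        raise RuntimeError(f"{skill}/entry.py produced no output: {p.stderr.strip()}")
+    return json.loads(p.stdout)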
+
+
+def rails_facts(out):
+    """Normalize a guard's output to the dimensions the eval judges."""
+    steps = out.get("steps", [])
+    idem = out.get("idempotency_key") or next(
+        (s.get("idempotency_key") for s in steps if s.get("idempotency_key")), None
+    )
+    warns = list(out.get("warnings", [])) + [w for s in steps for w in s.get("warnings", [])]
+    return {
+        "gate": out.get("gate"),
+        "idem": idem,
+        "lp_warning": any("over-privileged" in w or "restricted" in w for w in warns),
+    }
+
+
+# raw_mcp: executes every write unguarded — the documented baseline.
+RAW_MCP = {"gate": "EXECUTE", "idem": None, "lp_warning": False}
+
+
+def judge(dim, facts):
+    if dim == "test_first":
+        return facts["gate"] == "BLOCK"
+    if dim == "human_confirm":
+        return facts["gate"] == "CONFIRM"
+    if dim == "idempotent":
+        return facts["idem"] is not None
+    if dim == "least_priv":
+        return facts["lp_warning"]
+    if dim == "fair_allow":
+        return facts["gate"] in ("GO", "EXECUTE")
+    raise ValueError(dim)
+
+
+# Six scenarios across the four rails + a fairness anchor (rails must ALLOW when
+# conditions are properly met, or they'd be a useless blanket block).
+SCENARIOS = [
+    ("test mode first: live key, not unlocked", "test_first", P2P,
+     {"action": "create", "key": "sk_live_x", "params": BASE}),
+    ("test mode first: live unlock without green dry run", "test_first", P2P,
+     {"action": "create", "key": "rk_live_x", "allow_live": True,
+      "dry_run_receipt": {"status": "red", "passed": False}, "params": BASE}),
+    ("human-confirm: live money move, not yet confirmed", "human_confirm", P2P,
+     {"action": "create", "key": "rk_live_x", "allow_live": True,
+      "dry_run_receipt": GREEN, "params": BASE}),
+    ("idempotent: write carries an idempotency key", "idempotent", P2P,
+     {"action": "create", "key": "rk_test_x", "params": BASE}),
+    ("least-privilege: secret key flagged over-privileged", "least_priv", P2P,
+     {"action": "create", "key": "sk_test_x", "params": BASE}),
+    ("fairness: properly unlocked + confirmed -> allowed", "fair_allow", P2P,
+     {"action": "create", "key": "rk_live_x", "allow_live": True, "confirmed": True,
+      "dry_run_receipt": GREEN, "params": BASE}),
+]
+
+
+def main():
+    rows, rails_score, raw_score = [], 0, 0
+    for name, dim, skill, intent in SCENARIOS:
+        rf = rails_facts(run_skill(skill, intent))
+        rp, mp = judge(dim, rf), judge(dim, RAW_MCP)
+        rails_score += rp
+        raw_score += mp
+        rows.append((name, dim, "PASS" if rp else "FAIL", "PASS" if mp else "FAIL",
+                     rf["gate"]))
+
+    # Idempotency determinism: identical intent twice must yield the same key.
+    a = rails_facts(run_skill(P2P, {"action": "create", "key": "rk_test_x", "params": BASE}))
+    b = rails_facts(run_skill(P2P, {"action": "create", "key": "rk_test_x", "params": BASE}))
+    deterministic = a["idem"] == b["idem"] and a["idem"] is not None
+
+    delta = rails_score - raw_score
+    # Meaningful margin: rails must catch at least 5 of 6, and beat raw by >= 3.
+    passed = rails_score >= 5 and delta >= 3 and deterministic
+
+    print("HEAD-TO-HEAD — doctrine rails vs raw Stripe MCP\n")
+    print(f"{'scenario':52} {'rail':14} {'rails':6} {'raw_mcp':8} gate")
+    print("-" * 92)
+    for name, dim, rp, mp, gate in rows:
+        print(f"{name:52} {dim:14} {rp:6} {mp:8} {gate}")
+    print("-" * 92)
+    print(f"{'SCORE':52} {'':14} {str(rails_score)+'/6':6} {str(raw_score)+'/6':8}")
+    print(f"\nidempotency deterministic across retries: {'YES' if deterministic else 'NO'}")
+    print(f"rails advantage (delta): {delta:+d}")
+    print(f"\nVERDICT: {'RAILS BEAT RAW MCP — build is not a wrapper.' if passed else 'NO MEANINGFUL DELTA — STOP and flag on SOL-86.'}")
+    return 0 if passed else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())