2 changes: 2 additions & 0 deletions skills-public/stripe-startup-kit/.gitignore
@@ -0,0 +1,2 @@
__pycache__/
*.pyc
54 changes: 54 additions & 0 deletions skills-public/stripe-startup-kit/DOCTRINE.md
@@ -0,0 +1,54 @@
# The safety doctrine

The four rails every kit skill enforces. Stripe's own docs *recommend* all four;
the Stripe MCP enforces none by default. The kit enforces them in `rails.py`,
deterministically, so they cannot be skipped by agent judgment. This is the moat.

## 1. Test mode first — live never by accident

- A key's mode is read from its prefix: `*_test_` vs `*_live_`. Authoritative.
- A **live** key is **refused** (`gate: BLOCK`) unless *both*: `allow_live=true`
**and** a green dry-run receipt (`{"status":"green","passed":true}`).
- Reads are exempt — a read-only key in live mode is harmless.
- The dry run (see `stripe-stand-up`) is the only thing that produces a green
receipt, so "prove it works in test" gates "sell in live."
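
A minimal sketch of how this rail could be checked, assuming the key arrives as a plain string and the receipt as a dict. `key_mode` and `live_gate` are hypothetical names used for illustration; they are not taken from `rails.py`, which is not shown here. The sketch covers rail 1 only, so a live write that clears it would still face the human-confirm rail.

```python
def key_mode(key):
    """Read "test" or "live" from a key shaped like rk_test_..., sk_live_..."""
    parts = key.split("_", 2)
    if len(parts) == 3 and parts[1] in ("test", "live"):
        return parts[1]
    # Only the key-type prefix is echoed, never the key material.
    raise ValueError(f"cannot read mode from key prefix: {parts[0]!r}")


def live_gate(key, allow_live=False, dry_run_receipt=None, read_only=False):
    """Hypothetical rail 1 check. Returns "GO" or "BLOCK"."""
    if key_mode(key) == "test" or read_only:
        return "GO"  # test mode and reads are exempt
    receipt = dry_run_receipt or {}
    green = receipt.get("status") == "green" and receipt.get("passed") is True
    # Live writes need BOTH the explicit unlock and a green receipt.
    return "GO" if (allow_live and green) else "BLOCK"
```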

## 2. Least-privilege keys — restricted, scoped per skill

- Every skill declares the **minimal** scope it needs (`scope_manifest`). Create
an `rk_` (restricted) key with exactly that, nothing more.
- A secret `sk_` key works but is **over-privileged** → `warning` + the scope to
recreate it correctly. A publishable `pk_` key cannot do server work → `BLOCK`.
- `revenue-read`'s scope is read-only across every money surface — it cannot
write even if asked.
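
The key-type rules above can be sketched as a prefix check. `classify_key` and the wording of its messages are illustrative assumptions, as is the choice to block unrecognized prefixes; only the `rk_` / `sk_` / `pk_` outcomes come from the doctrine.

```python
def classify_key(key):
    """Hypothetical rail 2 check keyed on the Stripe key-type prefix."""
    prefix = key.split("_", 1)[0]
    if prefix == "rk":
        return {"gate": "GO", "warnings": [], "blockers": []}
    if prefix == "sk":
        # Works, but carries far more privilege than a skill needs.
        return {
            "gate": "GO",
            "warnings": ["secret key is over-privileged; recreate it as a restricted rk_ key"],
            "blockers": [],
        }
    if prefix == "pk":
        return {
            "gate": "BLOCK",
            "warnings": [],
            "blockers": ["publishable key cannot do server work"],
        }
    # Assumption: anything else is refused rather than guessed at.
    return {"gate": "BLOCK", "warnings": [], "blockers": ["unrecognized key type"]}
```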

## 3. Human-confirm on money — no silent live spend

- A money-moving action in **live** mode never auto-runs. The guard returns
`gate: CONFIRM` and a `confirm_card` (amount, mode, exactly what will happen).
- The agent must show the card and get an explicit human "yes" before making the
MCP call. In test mode there is nothing at stake, so no confirm is required.
- "Money-moving" includes creating a live sales surface (payment link) and
creating a tax filing **obligation** (a registration), not just charges.
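
As a rough illustration, the confirm step might look like the following. The `confirmed` flag mirrors the field used in the eval scenarios; the function name `money_gate` and the exact `confirm_card` field names are assumptions.

```python
def money_gate(mode, amount, currency, description, confirmed=False):
    """Hypothetical rail 3 check for a money-moving action."""
    if mode != "live" or confirmed:
        # Test mode has nothing at stake; a confirmed live move may proceed.
        return {"gate": "GO"}
    return {
        "gate": "CONFIRM",
        "confirm_card": {
            "amount": amount,          # smallest currency unit, e.g. cents
            "currency": currency,
            "mode": mode,
            "what_will_happen": description,
        },
    }
```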

## 4. Idempotent — a retry is not a duplicate

- Every write carries a deterministic `Idempotency-Key`:
`"<skill>:<action>:" + sha256(canonical_json(params))[:32]`.
- Identical intent → identical key → Stripe returns the original object instead
of creating a second one (24h dedupe window).
- Pass the key as the `Idempotency-Key` header on the MCP write. The eval asserts
the key is present and stable across retries.
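
One way to derive such a key, shown as a sketch. The doctrine does not define `canonical_json`; here it is assumed to mean JSON with sorted keys and no insignificant whitespace, so that two dicts with the same content always hash the same.

```python
import hashlib
import json


def canonical_json(params):
    """Assumed canonical form: sorted keys, compact separators, ASCII only."""
    return json.dumps(params, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def idempotency_key(skill, action, params):
    """Build "<skill>:<action>:" + the first 32 hex chars of sha256(canonical_json(params))."""
    digest = hashlib.sha256(canonical_json(params).encode("utf-8")).hexdigest()
    return f"{skill}:{action}:{digest[:32]}"
```

Because the key depends only on the skill, the action, and the parameter values, a retry of the same intent reuses the key, while any change to `params` (a different `unit_amount`, say) produces a new one.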

## How a guard answers

`entry.py` (JSON in) → a plan (JSON out) with one of three gates:

| gate | meaning |
|---|---|
| `GO` | Safe — make `mcp_call` now via the Stripe MCP. |
| `CONFIRM` | Show `confirm_card`; proceed only on an explicit human yes. |
| `BLOCK` | Do not call Stripe; resolve `blockers` first. |

The guard never calls Stripe. It decides; the agent executes through the MCP.
That separation is what keeps the rails deterministic and unit-testable.
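
On the agent side, acting on a plan could be sketched as below. `handle_plan`, `ask_human`, and `call_mcp` are hypothetical callables supplied by the agent runtime, and the sketch assumes a `CONFIRM` plan also carries the `mcp_call` to make once a human approves.

```python
def handle_plan(plan, ask_human, call_mcp):
    """Dispatch on the guard's gate; the guard itself never calls Stripe."""
    gate = plan.get("gate")
    if gate == "GO":
        return call_mcp(plan["mcp_call"])
    if gate == "CONFIRM":
        # Proceed only on an explicit human yes.
        if ask_human(plan["confirm_card"]):
            return call_mcp(plan["mcp_call"])
        return None
    if gate == "BLOCK":
        raise RuntimeError(f"blocked: {plan.get('blockers', [])}")
    raise ValueError(f"unknown gate: {gate!r}")
```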
79 changes: 79 additions & 0 deletions skills-public/stripe-startup-kit/README.md
@@ -0,0 +1,79 @@
# Stripe Startup Kit — v0.1

Stand up Stripe, sell a digital product or subscription, get paid — **without
touching live mode by accident.**

Stripe ships the API floor and the developer tooling (agent-toolkit, the hosted
MCP, its own integration skills). Nobody ships the *founder's* sell-a-thing loop
with safety rails. This kit does, and the rails are the moat.

## The doctrine — enforced, not recommended

Every skill enforces four rails that Stripe's docs only *recommend* in prose and
that the Stripe MCP does not enforce by default. Full detail in
[`DOCTRINE.md`](./DOCTRINE.md).

1. **Test mode first** — live is refused until a green dry run + explicit unlock.
2. **Least-privilege keys** — restricted `rk_` keys, scoped per skill; a secret
`sk_` key is flagged over-privileged.
3. **Human-confirm on money** — a live money move never auto-runs; it returns a
confirm card a human must approve.
4. **Idempotent** — every write carries a deterministic `Idempotency-Key`.

**Compose the Stripe MCP, never wrap it.** Each skill's `entry.py` is a
deterministic *guard* (JSON in → guarded plan out, pure stdlib, zero Stripe
network calls). It hands the agent an exact, safe, idempotent MCP call to make —
and the agent makes it through the Stripe MCP, only when the gate clears.

## The 5 skills (v0.1 dogfood-critical path)

```
stripe-stand-up → stripe-product-to-price → stripe-tax-ready → stripe-deliver → stripe-revenue-read
```

| Skill | Job |
|---|---|
| `stripe-stand-up` | Sellable account in test mode, least-priv key, refuses live until the dry run is green. |
| `stripe-product-to-price` | A file / course / idea → correct Product + Price + Payment Link. |
| `stripe-tax-ready` | Stripe Tax + obligations before go-live (AU GST / reverse-charge / ABN). |
| `stripe-deliver` | Webhook → license/download/access + receipt, signature-verified and idempotent. |
| `stripe-revenue-read` | Read-only "how's the business" — the free on-ramp. A key that can't write. |

Deferred to v0.2: `subscription-designer`, `recover`. The loop sells one real
thing without them first.

> **Naming note for AUDIT:** the brief's short names (`stand-up`, `deliver`, …)
> are prefixed `stripe-` here for a flat marketplace — generic slugs like
> `deliver` and `stand-up` collide and trigger on unrelated requests. Trivially
> reversible (folder name + frontmatter `name`) if the council prefers the bare
> names.

## Run the skills

Each skill is a self-contained Agent Skills bundle: `SKILL.md` + `entry.py` +
`rails.py` (+ `references/`). The guard is JSON in, JSON out:

```
echo '{"action":"create","key":"rk_test_…","params":{"name":"My Course","unit_amount":4900,"currency":"usd"}}' \
| python3 stripe-product-to-price/entry.py
```

Read the returned `gate`: **GO** (make the MCP call), **CONFIRM** (get a human
yes first), or **BLOCK** (fix the blockers).

## Evals

- `python3 eval/head_to_head.py` — the gating proof: rails vs raw MCP. See
[`eval/RESULTS.md`](./eval/RESULTS.md). Rails 6/6, raw MCP 1/6.
- `python3 eval/skills_eval.py` — per-skill suite, 22/22 (normal / edge / refusal).

## Status & boundaries

- **TEST MODE only** in v0.1. The real-money dogfood and the listing/packaging
decision are downstream, **council-gated** steps — not part of this build.
- `rails.py` is identical across the five bundles (each must stand alone to be
publishable). The source of truth is `./rails.py`; `python3 sync_rails.py`
copies it into each skill folder.
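
For orientation, a copy step of this kind could look like the sketch below. It is not the contents of `sync_rails.py`, which this change does not include; the `sync_rails` function and the hard-coded `SKILLS` list are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Assumed list of bundle folders, taken from the skills table in this README.
SKILLS = [
    "stripe-stand-up",
    "stripe-product-to-price",
    "stripe-tax-ready",
    "stripe-deliver",
    "stripe-revenue-read",
]


def sync_rails(kit_root):
    """Copy <kit_root>/rails.py into every skill folder that exists."""
    source = Path(kit_root) / "rails.py"
    copied = []
    for skill in SKILLS:
        folder = Path(kit_root) / skill
        if folder.is_dir():
            target = folder / "rails.py"
            shutil.copyfile(source, target)
            copied.append(target)
    return copied
```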

---

*Solid State — solidstate.cc. Most skills are noise. Ship the signal.*
62 changes: 62 additions & 0 deletions skills-public/stripe-startup-kit/eval/RESULTS.md
@@ -0,0 +1,62 @@
# Head-to-head eval — doctrine rails vs raw Stripe MCP

**Gating question (from the AUTHOR brief, SOL-88):** do the kit's safety rails
*meaningfully* beat the raw Stripe MCP? If not, the kit is just a wrapper and the
build must STOP and flag on SOL-86.

**Verdict: RAILS BEAT RAW MCP. The build is not a wrapper.**

Reproduce: `python3 eval/head_to_head.py` (exit 0 = pass, 1 = stop-and-flag).
Last run 2026-06-18, Python 3.12.

## Result

```
scenario rail rails raw_mcp gate
--------------------------------------------------------------------------------------------
test mode first: live key, not unlocked test_first PASS FAIL BLOCK
test mode first: live unlock without green dry run test_first PASS FAIL BLOCK
human-confirm: live money move, not yet confirmed human_confirm PASS FAIL CONFIRM
idempotent: write carries an idempotency key idempotent PASS FAIL GO
least-privilege: secret key flagged over-privileged least_priv PASS FAIL GO
fairness: properly unlocked + confirmed -> allowed fair_allow PASS PASS GO
--------------------------------------------------------------------------------------------
SCORE 6/6 1/6

idempotency deterministic across retries: YES
rails advantage (delta): +5
```

The rails enforce all four doctrine dimensions and still **allow** a properly
unlocked, confirmed live sale (the fairness row) — so they are a precise gate,
not a blanket block. Raw MCP passes only that one row.

## Why the raw_mcp baseline is faithful, not a strawman

`raw_mcp` executes every write as-is with none of the four rails. That is the
**documented** behavior of the hosted Stripe MCP, per the SOL-86 RESEARCH gate
(first-party audit, observed 2026-06-18, sourced to docs.stripe.com/mcp):

- *"the MCP will happily run live-mode writes; nothing blocks it pending a green
dry run."*
- Least-privilege keys and human confirmation are *"recommended in prose only —
not enforced."*
- *"Test-mode-first: no enforcement found. Idempotency: no default found."*

So modeling raw_mcp as enforcing zero rails matches what Stripe actually ships.
The delta is the gap the kit fills.

## Honest limitation

This eval tests the **enforcement layer** deterministically: given the same
adversarial intent, the rails gate it and raw MCP does not. It does *not* model
an agent that diligently follows Stripe's prose recommendations by hand — a
careful agent *could* approximate some rails manually. The kit's claim is the
narrower, true one: the rails enforce **by default and deterministically** what
raw MCP leaves to optional prose plus agent vigilance. That default-on
enforcement, unit-tested, is the moat.

## Per-skill eval

`python3 eval/skills_eval.py` — 22/22 cases pass (normal, edge, and
out-of-scope/refusal cases across all five skills). No eval, no handoff.
132 changes: 132 additions & 0 deletions skills-public/stripe-startup-kit/eval/head_to_head.py
@@ -0,0 +1,132 @@
#!/usr/bin/env python3
"""Head-to-head eval — doctrine rails vs raw Stripe MCP. The gating proof.

The kit's whole thesis: the four-part safety doctrine is the moat because the
Stripe MCP only *recommends* it, never enforces it. This eval tests that claim
deterministically. If the rails do not beat raw MCP by a meaningful margin, the
kit is just a wrapper — this script exits non-zero and the build must STOP and
flag on SOL-86 (per the AUTHOR brief).

Two actors face identical adversarial intents:

rails — the actual kit guards (each skill's entry.py, run as a subprocess).
raw_mcp — a faithful model of the documented Stripe MCP baseline. Per the
RESEARCH gate (SOL-86, first-party audit, 2026-06-18): the hosted
MCP "will happily run live-mode writes; nothing blocks it pending a
green dry run"; least-privilege keys and human confirmation are
"recommended in prose only — not enforced"; "no test-mode-first or
idempotency default." So raw_mcp executes every write as-is, with
none of the four rails. This is the documented behavior, not a
strawman — see RESULTS.md for the quoted sources.

No network. No Stripe calls. Pure decision-layer comparison.
"""

import json
import os
import subprocess
import sys

KIT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

GREEN = {"status": "green", "passed": True}
P2P = "stripe-product-to-price"
BASE = {"name": "Test thing", "unit_amount": 1500, "currency": "usd"}


def run_skill(skill, intent):
    """Invoke a skill's entry.py exactly as an agent runtime would."""
    p = subprocess.run(
        [sys.executable, os.path.join(KIT, skill, "entry.py"), "--json", json.dumps(intent)],
        capture_output=True, text=True, stdin=subprocess.DEVNULL,
    )
    return json.loads(p.stdout)


def rails_facts(out):
    """Normalize a guard's output to the dimensions the eval judges."""
    steps = out.get("steps", [])
    idem = out.get("idempotency_key") or next(
        (s.get("idempotency_key") for s in steps if s.get("idempotency_key")), None
    )
    warns = list(out.get("warnings", [])) + [w for s in steps for w in s.get("warnings", [])]
    return {
        "gate": out.get("gate"),
        "idem": idem,
        "lp_warning": any("over-privileged" in w or "restricted" in w for w in warns),
    }


# raw_mcp: executes every write unguarded — the documented baseline.
RAW_MCP = {"gate": "EXECUTE", "idem": None, "lp_warning": False}


def judge(dim, facts):
    if dim == "test_first":
        return facts["gate"] == "BLOCK"
    if dim == "human_confirm":
        return facts["gate"] == "CONFIRM"
    if dim == "idempotent":
        return facts["idem"] is not None
    if dim == "least_priv":
        return facts["lp_warning"]
    if dim == "fair_allow":
        return facts["gate"] in ("GO", "EXECUTE")
    raise ValueError(dim)


# Six scenarios across the four rails + a fairness anchor (rails must ALLOW when
# conditions are properly met, or they'd be a useless blanket block).
SCENARIOS = [
    ("test mode first: live key, not unlocked", "test_first", P2P,
     {"action": "create", "key": "sk_live_x", "params": BASE}),
    ("test mode first: live unlock without green dry run", "test_first", P2P,
     {"action": "create", "key": "rk_live_x", "allow_live": True,
      "dry_run_receipt": {"status": "red", "passed": False}, "params": BASE}),
    ("human-confirm: live money move, not yet confirmed", "human_confirm", P2P,
     {"action": "create", "key": "rk_live_x", "allow_live": True,
      "dry_run_receipt": GREEN, "params": BASE}),
    ("idempotent: write carries an idempotency key", "idempotent", P2P,
     {"action": "create", "key": "rk_test_x", "params": BASE}),
    ("least-privilege: secret key flagged over-privileged", "least_priv", P2P,
     {"action": "create", "key": "sk_test_x", "params": BASE}),
    ("fairness: properly unlocked + confirmed -> allowed", "fair_allow", P2P,
     {"action": "create", "key": "rk_live_x", "allow_live": True, "confirmed": True,
      "dry_run_receipt": GREEN, "params": BASE}),
]


def main():
    rows, rails_score, raw_score = [], 0, 0
    for name, dim, skill, intent in SCENARIOS:
        rf = rails_facts(run_skill(skill, intent))
        rp, mp = judge(dim, rf), judge(dim, RAW_MCP)
        rails_score += rp
        raw_score += mp
        rows.append((name, dim, "PASS" if rp else "FAIL", "PASS" if mp else "FAIL",
                     rf["gate"]))

    # Idempotency determinism: identical intent twice must yield the same key.
    a = rails_facts(run_skill(P2P, {"action": "create", "key": "rk_test_x", "params": BASE}))
    b = rails_facts(run_skill(P2P, {"action": "create", "key": "rk_test_x", "params": BASE}))
    deterministic = a["idem"] == b["idem"] and a["idem"] is not None

    delta = rails_score - raw_score
    # Meaningful margin: rails must pass at least 5 of 6, and beat raw by >= 3.
    passed = rails_score >= 5 and delta >= 3 and deterministic

    print("HEAD-TO-HEAD — doctrine rails vs raw Stripe MCP\n")
    print(f"{'scenario':52} {'rail':14} {'rails':6} {'raw_mcp':8} gate")
    print("-" * 92)
    for name, dim, rp, mp, gate in rows:
        print(f"{name:52} {dim:14} {rp:6} {mp:8} {gate}")
    print("-" * 92)
    print(f"{'SCORE':52} {'':14} {str(rails_score)+'/6':6} {str(raw_score)+'/6':8}")
    print(f"\nidempotency deterministic across retries: {'YES' if deterministic else 'NO'}")
    # The explicit sign format avoids printing "+-1" when the delta is negative.
    print(f"rails advantage (delta): {delta:+d}")
    print(f"\nVERDICT: {'RAILS BEAT RAW MCP — build is not a wrapper.' if passed else 'NO MEANINGFUL DELTA — STOP and flag on SOL-86.'}")
    return 0 if passed else 1


if __name__ == "__main__":
    sys.exit(main())