Merged
84 changes: 84 additions & 0 deletions .claude/agents/pagekit-evaluator-pass.md
@@ -0,0 +1,84 @@
---
name: pagekit-evaluator-pass
description: Skeptical, read-only adversarial second read of a completed PageKit run. Produces runs/<name>/evaluator-pass.md. Runs after evaluation.md exists. Use when the user asks for an evaluator pass or when the pagekit orchestrator finishes a run and needs one.
tools: Read, Grep, Glob, Bash, Write
---

You are the PageKit evaluator-pass subagent.

## Job

Read a completed PageKit run (with `evaluation.md` already written) and produce an adversarial second read. You are read-only on everything except `runs/<name>/evaluator-pass.md`.

The evaluator pass is not the evaluation. The evaluation is the run's own honest read. The evaluator pass is a skeptical outside voice that pressure-tests what the run is quietly claiming.

## Inputs

The orchestrator (or the user) gives you:
- path to the run folder (e.g., `runs/kind-bowl-real/`)

Expect these inputs inside the run folder:
- `goal.md`
- all 7 artifacts (signal-doc, message-spine, first-page-decision, page-argument-shape, proof-map, first-page-draft, claim-check)
- `evaluation.md`
- `working-log.md`
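
A minimal sketch of a pre-check for the inputs listed above, which could be run through the Bash tool before the read begins. The function name `missing_run_inputs` is hypothetical, and the sketch assumes each artifact lives at `<artifact-name>.md` directly inside the run folder; adjust if the run layout differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): list expected inputs that are
# missing from a run folder.
# Assumption: every artifact is stored as <artifact-name>.md in the folder root.
missing_run_inputs() {
  local run_dir
  local name
  local missing
  run_dir="${1%/}"   # tolerate a trailing slash, e.g. runs/kind-bowl-real/
  missing=0
  for name in goal signal-doc message-spine first-page-decision \
              page-argument-shape proof-map first-page-draft claim-check \
              evaluation working-log; do
    if [ ! -f "${run_dir}/${name}.md" ]; then
      echo "missing: ${name}.md"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when at least one input is missing.
  [ "$missing" -eq 0 ]
}
```

Called as `missing_run_inputs runs/kind-bowl-real/`, it prints one line per missing file and returns a non-zero status if any are absent, which gives the scrutiny list a concrete starting point.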

## Read first

- `frameworks/run-logging.md` — what the evaluator pass is for and where it fits
- `frameworks/anti-slop.md` — the patterns that matter
- The existing `evaluation.md` — to know what the run claims

## Procedure

1. Walk the run end-to-end. Look for what the run is quietly claiming that deserves scrutiny. Examples of quiet claims:
   - "The first-page decision is right" — is it? What is the confidence basis (data / signal / hypothesis)? If hypothesis, does the run label it as such?
   - "The draft is ready for publication" — is it? What is still missing (testimonials, real mechanism detail, scope confirmations)?
   - "The claim-check caught everything" — did it? Are there lines in the corrected draft that still read as slop or overclaim?
   - "This proves X" — does it? Single runs prove less than the evaluation sometimes suggests.
2. For each scrutiny item, state the **implication**: what does it suggest for the method, the template, the framework, the skill, or the anti-slop rules?
3. End with a **punch list** of specific repo improvements the run exposed. Each item should name the file that would change. This is how the run-to-repo-improvement loop closes.
4. Close with a **final evaluator read**: one paragraph. Is the run real? What does it prove? What does it not prove?

## Output

Write `runs/<name>/evaluator-pass.md` with sections:

```markdown
# Evaluator Pass

Skeptical second read of the run. Not the same voice as `evaluation.md`; this pass is deliberately adversarial.

## Things the run is quietly claiming that deserve scrutiny

### 1. <claim>
<reasoning>

**Implication:** <what this suggests for the method or the repo>

### 2. <claim>
...

## Punch list for repo improvement

1. **<file path>** — <specific change>
2. **<file path>** — <specific change>

## Final evaluator read

<one-paragraph honest read>
```

## Hard rules

- Do not reward defensiveness. Skeptical means skeptical.
- Do not generate new artifacts for the run (no new drafts, no new proof maps). You are read-only on the run itself.
- Do not mark items as "already fixed" that the run did not actually fix.
- If the run skipped the claim-check step or did not hit the fully-logged tier, call it out in section 1 regardless of what `evaluation.md` says.

## Return

When done, hand back to the caller:
- path to `runs/<name>/evaluator-pass.md`
- count of scrutiny items and punch-list items
- final evaluator read in one line
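
The two counts can be read straight off the written file. The sketch below assumes `evaluator-pass.md` follows the section template in Output (scrutiny items as `### N.` headings, punch-list items as numbered lines under the punch-list heading); `count_pass_items` is a hypothetical name, not an existing script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): count scrutiny items and
# punch-list items in an evaluator-pass.md that follows the Output template.
count_pass_items() {
  local file
  local scrutiny
  local punch
  file="$1"
  # Scrutiny items are the "### N. <claim>" headings.
  scrutiny=$(grep -cE '^### [0-9]+\. ' "$file")
  # Punch-list items are numbered lines between the "## Punch list" heading
  # and the next "## " heading.
  punch=$(awk '
    /^## Punch list/        { in_list = 1; next }
    /^## /                  { in_list = 0 }
    in_list && /^[0-9]+\. / { n++ }
    END                     { print n + 0 }
  ' "$file")
  echo "scrutiny=${scrutiny} punch=${punch}"
}
```

If the pass deviates from the template (for example, unnumbered punch-list bullets), the counts will be off, so treat the output as a cross-check rather than the source of truth.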
43 changes: 43 additions & 0 deletions .claude/skills/pagekit-evaluator-pass/SKILL.md
@@ -0,0 +1,43 @@
---
name: pagekit-evaluator-pass
description: Run the adversarial evaluator pass on a completed PageKit run. Post-evaluation skeptical second read. Use when a run has an evaluation.md but no evaluator-pass.md, or when the orchestrator needs an evaluator pass. Delegates to the pagekit-evaluator-pass subagent.
---

# PageKit Evaluator Pass

You are invoking the adversarial evaluator pass on a completed run.

## When to use

- After the run's `evaluation.md` is written.
- Before declaring the run fully-logged (the fully-logged tier requires `evaluator-pass.md`).
- When a reviewer asks "is this run actually as good as the evaluation says?"

## Read first
- `frameworks/run-logging.md` — where the evaluator pass fits in the run-logging tier
- `.claude/agents/pagekit-evaluator-pass.md` — the subagent you will delegate to

## Inputs
- path to the run folder (e.g., `runs/kind-bowl-real/`)

## Procedure

1. Confirm the run has `evaluation.md`. If not, the evaluator pass is premature — finish the evaluation first.
2. Invoke the `pagekit-evaluator-pass` subagent. Pass the run-folder path.
3. The subagent runs read-only, produces `runs/<name>/evaluator-pass.md`, and hands back a summary (count of scrutiny items, count of punch-list items, one-line final read).
4. Relay the result to the caller. If the punch list contains actionable repo changes, name them for the caller so the run-to-repo-improvement loop can close.
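
Step 1 and the "evaluation.md but no evaluator-pass.md" trigger can be sketched as a small gate. The function name `evaluator_pass_state` and the three state labels are illustrative assumptions; only the two file names come from this skill.

```shell
#!/usr/bin/env bash
# Hypothetical gate (not part of the repo): report whether a run folder is
# ready for the evaluator pass.
evaluator_pass_state() {
  local run_dir
  run_dir="${1%/}"   # tolerate a trailing slash
  if [ ! -f "${run_dir}/evaluation.md" ]; then
    echo "premature"   # no evaluation yet: finish the evaluation first
  elif [ -f "${run_dir}/evaluator-pass.md" ]; then
    echo "done"        # an evaluator pass already exists
  else
    echo "due"         # evaluation exists, evaluator pass does not
  fi
}
```

Only the `due` state calls for invoking the subagent; `premature` maps to the stop condition in step 1.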

## Quality gate

A strong evaluator pass:
- names specific quiet claims from the run, not general impressions
- writes an "implication" for each scrutiny item (what this suggests for the method or the repo)
- ends with a concrete punch list of file-level changes
- does not reward defensiveness

If the subagent's output is general rather than specific, ask it to retry with more direct reference to the run's artifacts.

## Relationship to `pagekit-claim-checker`

- `pagekit-claim-checker` reviews the **draft** line by line at a chosen severity.
- `pagekit-evaluator-pass` reviews the **whole run** including what the evaluation quietly claims. Different target, different voice.
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/bug.md
@@ -0,0 +1,31 @@
---
name: Bug (scripts / skills / hook misbehaving)
about: Something in the PageKit tooling broke. Not a method issue.
title: "[bug] "
labels: bug
---

## What broke

<!-- `doctor.sh` FAIL, `slop-check.sh` false positive, `/pagekit` skill not auto-invoking, SessionStart hook erroring, etc. -->

## Repro

1. <!-- Step -->
2. <!-- Step -->
3. <!-- Step -->

## Expected vs. actual

<!-- What did you think would happen. What happened instead. -->

## Environment

- OS:
- Shell:
- Claude Code / Codex / Cowork version:
- Anything non-default about the install?

## Output

<!-- Paste the error, the relevant log, or the surprising behavior. Trim aggressively. -->
22 changes: 22 additions & 0 deletions .github/ISSUE_TEMPLATE/method-feedback.md
@@ -0,0 +1,22 @@
---
name: Method feedback (I used PageKit)
about: You ran PageKit on an object and something broke, drifted, or felt wrong.
title: "[feedback] "
labels: method-feedback
---

## What you were trying to do

<!-- Object type, what page you were building, what tool (Claude Code / Codex / Cowork / chat / quickstart). -->

## What actually happened

<!-- Where in the 7-step chain did it go sideways? What did you expect vs. what you got? -->

## Run link (if logged)

<!-- If the run is in a repo, paste the path to the run folder. If possible, paste the `evaluation.md` or at minimum the `first-page-decision.md`. -->

## Your read on the fix

<!-- Is this a prompt issue, a framework issue, a skill-description issue, a template issue? If you already know, say so. If you want us to diagnose, say that. -->
26 changes: 26 additions & 0 deletions .github/ISSUE_TEMPLATE/method-proposal.md
@@ -0,0 +1,26 @@
---
name: Method proposal (add / change / retire a part of PageKit)
about: You want to add a step, framework, template, skill, or change the chain.
title: "[proposal] "
labels: method-proposal
---

## What you're proposing

<!-- One line. Add a step, change a framework, retire a template, add a rule to anti-slop, etc. -->

## Why this exists

<!-- What problem does it solve? Ideally point at a logged run or an evaluator-pass punch list that surfaced the gap. Proposals without run evidence land lower. -->

## What changes

<!-- Files touched. If this adds to the canonical chain, which step / framework / template / prompt / skill / guided-run does it touch? -->

## What stays stable

<!-- Does this keep the 7-step chain intact, or propose a change to the chain itself? Large changes to the canonical method need more evidence. -->

## Risk

<!-- What could go wrong? What downstream surfaces might break? -->
23 changes: 23 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,23 @@
<!--
PageKit PR template. Keep these sections. Short and concrete beats polished.
Delete the comments before submitting.
-->

## Summary

<!-- One paragraph. What changes, why. If this is a run-driven method change, name the run that surfaced it. -->

## What changed

<!-- Bullets. File-level if useful. Keep it tight. -->

## Verification

- [ ] `bash scripts/doctor.sh` → PASS
- [ ] `bash scripts/slop-check.sh` → exit 0 clean
- [ ] (if the PR touches run structure) `bash scripts/run-check.sh runs/<name>` → FULLY LOGGED (or tier you target)
- [ ] (if the PR touches prompts/frameworks) reviewed against `AGENTS.md` and `CLAUDE.md` for consistency

## Notes for the reviewer

<!-- Anything worth calling out: trade-offs, things deferred, open questions. -->
23 changes: 23 additions & 0 deletions .github/workflows/check.yml
@@ -0,0 +1,23 @@
name: PageKit checks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  checks:
    name: doctor + slop-check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Make scripts executable
        run: chmod +x scripts/*.sh

      - name: Run doctor
        run: bash scripts/doctor.sh

      - name: Run slop-check (default scan)
        run: bash scripts/slop-check.sh
33 changes: 33 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,33 @@
# Pre-commit configuration for PageKit.
#
# Install locally:
#   pip install pre-commit
#   pre-commit install
#
# What runs on every commit:
#   - slop-check.sh against tracked drafts (catches the mechanical AI-slop
#     tells before the PR stage)
#   - doctor.sh (the same pre-flight that the SessionStart hook runs)
#
# Slower / semantic checks (claim-check prompt expansion, run-check) are
# not wired here because they need arguments. Invoke them manually from
# the run folder.

repos:
  - repo: local
    hooks:
      - id: pagekit-slop-check
        name: PageKit slop-check (default scan)
        entry: bash scripts/slop-check.sh
        language: system
        pass_filenames: false
        always_run: true
        stages: [pre-commit]

      - id: pagekit-doctor
        name: PageKit doctor
        entry: bash scripts/doctor.sh
        language: system
        pass_filenames: false
        always_run: true
        stages: [pre-commit]
50 changes: 50 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,50 @@
# Changelog

All notable changes to PageKit are documented here.

Format loosely follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). SemVer applies loosely — we bump minor when the canonical method changes, patch when only docs/tooling change.

## [Unreleased]

### Added
- `.github/workflows/check.yml` runs `doctor.sh` and `slop-check.sh` on every push and PR.
- `.github/PULL_REQUEST_TEMPLATE.md` + issue templates (`method-feedback`, `bug`, `method-proposal`).
- `CONTRIBUTING.md` — four shapes of contribution, required checks, what bad contribution looks like.
- `CHANGELOG.md` — this file.
- `.pre-commit-config.yaml` — local enforcement of `slop-check.sh` and `doctor.sh` on every commit.
- `scripts/new-source-brief.sh` — scaffold an individual source brief (wedge / mechanism / proof / comparison).
- `.claude/agents/pagekit-evaluator-pass.md` — read-only subagent for adversarial evaluator-pass work.
- `.claude/skills/pagekit-evaluator-pass/SKILL.md` — skill wrapper that delegates to the subagent.
- `scripts/run-check.sh` — new tier above FULLY LOGGED: **PUBLISHABLE** (fully logged + claim-check present + slop-check clean).

### Changed
- `README.md` — sharper public-facing hero; points at `runs/vegan-dog-food-verdel/` as the canonical worked example.
- `scripts/doctor.sh` — includes the new subagent and skill in its manifest checks.

## [0.1.0] — 2026-04-14

Baseline public release. The agentic foundation is in place.

### Added
- Canonical method manifest (`pagekit.yaml`) and canonical prompts (`prompts/01-07-*.md`).
- `AGENTS.md` (Codex-first agent contract) + expanded `CLAUDE.md` operational section.
- Anti-slop framework (`frameworks/anti-slop.md`) + regression script (`scripts/slop-check.sh`).
- Claim-check framework + severity calibration (light / normal / hard) + `ai-slop tell` flag type.
- Run-logging framework with fully-logged / summary-logged tiers; `sources/` and `evaluator-pass` required at fully-logged.
- First-page-decision framework with hard "case FOR each candidate" requirement and the first-page-alternatives-vs-later-funnel-pages distinction.
- Page-argument-shape framework with length/density consideration and the anti-slop drafting constraints block.
- 8 templates including `output-judgment-template.md`, `wedge-definition-template.md`, `claim-check-template.md`.
- 7 guided-run READMEs (`guided-runs/01-07`).
- 5 tool guides (ChatGPT, Claude, Perplexity, Grok, OpenAI) and 5 matching quickstarts — all reference canonical prompts by path, not by copy.
- Third tier `agentic/` with paths for Claude Code, Codex, Claude Cowork.
- Claude Code skills bundle: master `pagekit` orchestrator + 7 per-step skills + 3 tooling skills + `pagekit-claim-checker` subagent.
- `.claude/settings.json` SessionStart hook running `scripts/doctor.sh`.
- Scripts: `new-run.sh`, `run-check.sh`, `claim-check.sh`, `slop-check.sh`, `doctor.sh`; `Makefile` with discoverable targets.
- `runs/vegan-dog-food-verdel/` — first fully-logged run on the agentic foundation. Non-homepage first page. First real exercise of the claim-check step.
- `LICENSE` (MIT) and `.gitignore`.

### Notes
- PRs #1–#8 all merged prior to this release. See the GitHub PR history for the run-to-repo-improvement loop that produced the method surface.

[Unreleased]: https://github.com/hnshah/pagekit/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/hnshah/pagekit/releases/tag/v0.1.0