Apply Me CRM evaluator-pass punch list + fix claim-checker subagent bug by hnshah · Pull Request #11 · hnshah/pagekit

hnshah · 2026-04-15T01:56:50Z

The personal-crm-founders run (first real agentic fully-logged run after the PR #10 enforcement tightenings) surfaced a 4-item punch list in its evaluator-pass.md. This PR applies all four. It also fixes a bug in the pagekit-claim-checker subagent that the run exposed.

This is the run-to-repo-improvement loop working exactly as designed: a run produces an evaluator pass, and the evaluator pass produces specific repo changes in the very next PR.

Punch list (from `runs/personal-crm-founders/evaluator-pass.md`)

1. First-page-decision template: falsification prompt

templates/first-page-decision-template.md: added an "If this is a hypothesis: what would falsify it?" sub-field under "Confidence basis for this decision". This stops hypothesis-level decisions from being silently promoted to conclusions. The sub-field is required when confidence is "hypothesis" and optional when it is "data" or "signal."
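The required-when-hypothesis rule can be sketched as a small check. This is only an illustration of the rule stated above, not code from the repo: the `FirstPageDecision` class, its field names, and `falsification_problems` are hypothetical, and the template itself is Markdown rather than structured data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Confidence levels named in the PR description.
CONFIDENCE_LEVELS = {"hypothesis", "data", "signal"}


@dataclass
class FirstPageDecision:
    """Hypothetical in-memory form of a filled-in first-page decision."""
    confidence: str
    falsifier: Optional[str] = None  # answer to "what would falsify it?"


def falsification_problems(decision: FirstPageDecision) -> List[str]:
    """Return problems with the falsification sub-field, empty if none."""
    problems = []
    if decision.confidence not in CONFIDENCE_LEVELS:
        problems.append(f"unknown confidence level: {decision.confidence!r}")
    elif decision.confidence == "hypothesis" and not (decision.falsifier or "").strip():
        # Required only for hypothesis-level decisions; optional otherwise.
        problems.append("hypothesis-level decision is missing a falsification answer")
    return problems
```

A decision with confidence "data" or "signal" passes with the field left empty, which matches the optional case described above.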

2. Evaluation scaffold: Source quality field

scripts/new-run.sh evaluation scaffold now includes a ## Source quality section at the top: Real / Training fiction / Mixed. Surfaces provenance prominently rather than burying it in sources/01-source-capture.md. A reader should immediately know whether the run was built on real or invented material.

3. Claim-check: distinguish remove-vs-verify

Claim-check previously collapsed "cut this line" into a single correction category. The audit should preserve three dispositions:

rewrite
remove (wrong) — disqualified; do not restore
remove pending verification — potentially restorable if source confirms

Updated across three surfaces: prompts/07-claim-check.md, .claude/agents/pagekit-claim-checker.md, templates/claim-check-template.md.
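One way to picture the three dispositions is as an enumeration in which only one value is ever eligible for restoration. The sketch below is an assumption made for illustration: the repo expresses these dispositions in prompts and a Markdown template, and the names `Disposition` and `is_restorable` are hypothetical.

```python
from enum import Enum


class Disposition(Enum):
    """The three claim-check dispositions named in the PR description."""
    REWRITE = "rewrite"
    REMOVE_WRONG = "remove (wrong)"  # disqualified; do not restore
    REMOVE_PENDING_VERIFICATION = "remove pending verification"


def is_restorable(disposition: Disposition, source_confirms: bool) -> bool:
    """A removed line may come back only if it was pending verification
    and the source has since confirmed it."""
    return disposition is Disposition.REMOVE_PENDING_VERIFICATION and source_confirms
```

Keeping the two removal cases as separate values is the point of the change: a single "cut" category would lose the information needed to decide whether a line can return.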

4. Evaluation scaffold: weak-section to source-gap mapping

scripts/new-run.sh evaluation scaffold now requires that every section flagged as weak name the specific source material that would fix it. A weak section without a source gap named is a weak section shipping by choice, not by constraint.
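The mapping requirement amounts to a completeness check: every weak section needs a non-empty source gap. A minimal sketch, assuming the mapping has already been parsed into a dictionary (the function name and the dictionary shape are hypothetical; the scaffold itself is a Markdown section):

```python
from typing import Dict, List, Optional


def unmapped_weak_sections(mapping: Dict[str, Optional[str]]) -> List[str]:
    """Return weak sections that name no source gap.

    `mapping` goes from a weak section's title to the source material
    that would fix it; None or blank text counts as unnamed.
    """
    return [
        section
        for section, source_gap in mapping.items()
        if not (source_gap or "").strip()
    ]
```

An empty result means every weak section is weak by constraint, with a named gap, rather than by choice.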

Bug fix: claim-checker subagent corrupted the corrected draft

The Me CRM run's working-log reports:

"subagent left inline *[Rewritten: ...]* annotations in body copy; these were stripped manually; 2 new em-dashes introduced by rewrites (lines 41 and 65) were also fixed"

The pagekit-claim-checker subagent was introducing new slop while correcting old slop. Fixed with explicit hard rules in .claude/agents/pagekit-claim-checker.md:

No inline annotation markers in body copy. Provenance goes in the audit, not the corrected draft.
No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
Self-scan rewrites for flagged patterns before saving the corrected draft.

Mirrored in prompts/07-claim-check.md so chat users get the same enforcement.
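The hard rules are instructions to the subagent, but the same two checks can be approximated mechanically. The sketch below is not scripts/slop-check.sh and makes two assumptions: annotation markers look like the `*[Rewritten: ...]*` example quoted from the working-log, and "new" em-dashes can be estimated by comparing counts between the original and corrected drafts. The function and constant names are hypothetical.

```python
import re
from typing import List

# Matches inline markers shaped like *[Rewritten: ...]* on a single line.
ANNOTATION = re.compile(r"\*\[[^\]\n]*\]\*")
EM_DASH = "\u2014"


def scan_corrected_draft(original: str, corrected: str) -> List[str]:
    """Return rule violations found in the corrected draft, empty if clean."""
    problems = []
    for lineno, line in enumerate(corrected.splitlines(), start=1):
        if ANNOTATION.search(line):
            problems.append(f"line {lineno}: inline annotation marker in body copy")
    # Count comparison is a rough proxy: it misses a rewrite that removes one
    # em-dash and adds another elsewhere.
    added = corrected.count(EM_DASH) - original.count(EM_DASH)
    if added > 0:
        problems.append(f"{added} new em-dash(es) introduced by rewrites")
    return problems
```

Running a scan like this before saving corresponds to the third rule, the self-scan of rewrites for flagged patterns.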

Verified

bash scripts/doctor.sh → PASS
bash scripts/slop-check.sh → exit 0 clean
bash scripts/run-check.sh runs/vegan-dog-food-verdel → PUBLISHABLE
bash scripts/run-check.sh runs/personal-crm-founders → PUBLISHABLE
Fresh new-run.sh _test scaffold shows ## Source quality and ## Weak section to source-gap mapping sections in evaluation.md
templates/first-page-decision-template.md shows the falsification prompt
.claude/agents/pagekit-claim-checker.md contains the "Hard rules for the corrected draft" block

…subagent

The personal-crm-founders run (first real agentic fully-logged run after the PR #10 enforcement tightenings) surfaced a 4-item punch list in its evaluator-pass. This PR applies all four. It also fixes a bug in the pagekit-claim-checker subagent that the run exposed.

## Punch list (from runs/personal-crm-founders/evaluator-pass.md)

### 1. First-page-decision template: falsification prompt

templates/first-page-decision-template.md: added an 'If this is a hypothesis: what would falsify it?' sub-field under 'Confidence basis for this decision'. Stops hypothesis-level decisions from being silently promoted to conclusions. Required when confidence is 'hypothesis'; optional when 'data' or 'signal'.

### 2. Evaluation scaffold: Source quality field

scripts/new-run.sh evaluation.md scaffold now includes a 'Source quality' section at the top: Real / Training fiction / Mixed. Surfaces the source provenance prominently in the evaluation rather than burying it one level down in sources/01-source-capture.md. A reader scanning the eval should immediately know whether the run was built on real or invented material.

### 3. Claim-check: distinguish remove-vs-verify

Claim-check previously collapsed 'cut this line' into a single correction category. The audit should preserve the distinction between:

- rewrite
- remove (wrong): disqualified; do not restore
- remove pending verification: potentially restorable if source X confirms

Updated three surfaces:

- prompts/07-claim-check.md (canonical prompt)
- .claude/agents/pagekit-claim-checker.md (subagent instructions)
- templates/claim-check-template.md (audit format)

### 4. Evaluation scaffold: weak-section to source-gap mapping

scripts/new-run.sh evaluation.md scaffold now requires that every section flagged as weak (in 'What stayed thin' or 'Where outputs drifted generic') name the specific source material that would fix it. A weak section without a source gap named is a weak section shipping by choice, not by constraint.

## Bug fix: claim-checker subagent corrupted the corrected draft

The personal-crm-founders run reported the subagent left inline '*[Rewritten: ...]*' annotations in body copy and introduced two new em-dashes during rewrites. The working-log shows these had to be manually cleaned before the corrected draft could pass slop-check.

Fixed in .claude/agents/pagekit-claim-checker.md with explicit hard rules for the corrected draft:

- No inline annotation markers (*[Rewritten:...]* etc.) in body copy. Provenance belongs in the audit, not the corrected draft.
- No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
- Self-scan rewrites for flagged patterns before saving.

Mirrored in prompts/07-claim-check.md so chat users get the same enforcement.

## Verified

- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- runs/personal-crm-founders still PUBLISHABLE
- Fresh scaffold shows the new Source quality field and Weak-section-to-source-gap mapping sections
- templates/first-page-decision-template.md shows the new falsification prompt
- Fresh scaffold classifies as FULLY LOGGED (below PUBLISHABLE, as expected for an empty scaffold)

hnshah merged commit f7c5d10 into main Apr 15, 2026
1 check passed

hnshah deleted the claude/me-crm-evaluator-punch-list branch April 15, 2026 02:16
