Apply Me CRM evaluator-pass punch list + fix claim-checker subagent bug #11
Merged
…subagent

The personal-crm-founders run (first real agentic fully-logged run after the PR #10 enforcement tightenings) surfaced a 4-item punch list in its evaluator-pass. This PR applies all four. It also fixes a bug in the pagekit-claim-checker subagent that the run exposed.

## Punch list (from runs/personal-crm-founders/evaluator-pass.md)

### 1. First-page-decision template: falsification prompt

templates/first-page-decision-template.md: added an 'If this is a hypothesis: what would falsify it?' sub-field under 'Confidence basis for this decision'. Stops hypothesis-level decisions from being silently promoted to conclusions. Required when confidence is 'hypothesis'; optional when 'data' or 'signal'.

### 2. Evaluation scaffold: Source quality field

scripts/new-run.sh evaluation.md scaffold now includes a 'Source quality' section at the top: Real / Training fiction / Mixed. Surfaces the source provenance prominently in the evaluation rather than burying it one level down in sources/01-source-capture.md. A reader scanning the eval should immediately know whether the run was built on real or invented material.

### 3. Claim-check: distinguish remove-vs-verify

Claim-check previously collapsed 'cut this line' into a single correction category. The audit should preserve the distinction between:

- rewrite
- remove (wrong): disqualified; do not restore
- remove pending verification: potentially restorable if source X confirms

Updated three surfaces:

- prompts/07-claim-check.md (canonical prompt)
- .claude/agents/pagekit-claim-checker.md (subagent instructions)
- templates/claim-check-template.md (audit format)

### 4. Evaluation scaffold: weak-section to source-gap mapping

scripts/new-run.sh evaluation.md scaffold now requires that every section flagged as weak (in 'What stayed thin' or 'Where outputs drifted generic') name the specific source material that would fix it. A weak section without a source gap named is a weak section shipping by choice, not by constraint.
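The rule in item 4 lends itself to a mechanical check. The following is a minimal sketch in Python, not the repo's actual implementation: the `WeakSection` record, its field names, and `missing_source_gaps` are hypothetical names invented for illustration, the sample rows are made up, and parsing `evaluation.md` into these records is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WeakSection:
    """One section flagged as weak in an evaluation (hypothetical shape)."""
    name: str
    flagged_in: str        # e.g. "What stayed thin"
    source_gap: str = ""   # source material that would fix the section


def missing_source_gaps(sections: list[WeakSection]) -> list[str]:
    """Return names of weak sections that do not name a source gap."""
    return [s.name for s in sections if not s.source_gap.strip()]


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    flagged = [
        WeakSection("Pricing", "What stayed thin", "customer interview notes"),
        WeakSection("FAQ", "Where outputs drifted generic"),
    ]
    for name in missing_source_gaps(flagged):
        print(f"weak section without a named source gap: {name}")
```

A non-empty result would correspond to a weak section shipping by choice rather than by constraint.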
## Bug fix: claim-checker subagent corrupted the corrected draft

The personal-crm-founders run reported the subagent left inline '*[Rewritten: ...]*' annotations in body copy and introduced two new em-dashes during rewrites. The working-log shows these had to be manually cleaned before the corrected draft could pass slop-check.

Fixed in .claude/agents/pagekit-claim-checker.md with explicit hard rules for the corrected draft:

- No inline annotation markers (*[Rewritten:...]* etc.) in body copy. Provenance belongs in the audit, not the corrected draft.
- No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
- Self-scan rewrites for flagged patterns before saving.

Mirrored in prompts/07-claim-check.md so chat users get the same enforcement.

## Verified

- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- runs/personal-crm-founders still PUBLISHABLE
- Fresh scaffold shows the new Source quality field and Weak-section-to-source-gap mapping sections
- templates/first-page-decision-template.md shows the new falsification prompt
- Fresh scaffold classifies as FULLY LOGGED (below PUBLISHABLE, as expected for an empty scaffold)
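The hard rules for the corrected draft (no inline annotation markers, no new em-dashes) could be self-scanned along these lines. This is a rough sketch under stated assumptions, not the logic in `scripts/slop-check.sh` or the subagent instructions: the function name `scan_corrected_draft` and the regular expression for markers are hypothetical, and the em-dash rule is approximated by comparing counts between the original and corrected text.

```python
import re

# Matches inline markers shaped like *[Rewritten: ...]* (assumed pattern).
ANNOTATION = re.compile(r"\*\[[^\]]*\]\*")
EM_DASH = "\u2014"


def scan_corrected_draft(original: str, corrected: str) -> list[str]:
    """Return hard-rule violations found in the corrected draft."""
    problems = []

    # Rule 1: provenance belongs in the audit, not in body copy.
    markers = ANNOTATION.findall(corrected)
    if markers:
        problems.append(f"{len(markers)} inline annotation marker(s) in body copy")

    # Rule 2: rewrites must not add em-dashes beyond those already present.
    added = corrected.count(EM_DASH) - original.count(EM_DASH)
    if added > 0:
        problems.append(f"{added} new em-dash(es) introduced by rewrites")

    return problems
```

Comparing counts is a simplification: a rewrite that removes one em-dash and adds another elsewhere would pass, so a stricter check would compare the drafts line by line.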
The personal-crm-founders run (first real agentic fully-logged run after the PR #10 enforcement tightenings) surfaced a 4-item punch list in its `evaluator-pass.md`. This PR applies all four. It also fixes a bug in the `pagekit-claim-checker` subagent that the run exposed.

This is the run-to-repo-improvement loop working exactly as designed: a run produces an evaluator-pass, which produces specific repo changes in the very next PR.

## Punch list (from `runs/personal-crm-founders/evaluator-pass.md`)

### 1. First-page-decision template: falsification prompt

`templates/first-page-decision-template.md`: added an "If this is a hypothesis: what would falsify it?" sub-field under Confidence basis. Stops hypothesis-level decisions from being silently promoted to conclusions. Required when confidence is "hypothesis"; optional when "data" or "signal."

### 2. Evaluation scaffold: Source quality field

`scripts/new-run.sh` evaluation scaffold now includes a `## Source quality` section at the top: Real / Training fiction / Mixed. Surfaces provenance prominently rather than burying it in `sources/01-source-capture.md`. A reader should immediately know whether the run was built on real or invented material.

### 3. Claim-check: distinguish remove-vs-verify

Claim-check previously collapsed "cut this line" into a single correction category. The audit should preserve three dispositions:

- rewrite
- remove (wrong): disqualified; do not restore
- remove pending verification: potentially restorable if a source confirms

Updated across three surfaces: `prompts/07-claim-check.md`, `.claude/agents/pagekit-claim-checker.md`, `templates/claim-check-template.md`.

### 4. Evaluation scaffold: weak-section to source-gap mapping

`scripts/new-run.sh` evaluation scaffold now requires that every section flagged as weak name the specific source material that would fix it. A weak section without a source gap named is a weak section shipping by choice, not by constraint.

## Bug fix: claim-checker subagent corrupted the corrected draft

The Me CRM run's working-log reports that the subagent left inline `*[Rewritten: ...]*` annotations in body copy and introduced two new em-dashes during rewrites.

The `pagekit-claim-checker` subagent was introducing new slop while correcting old slop. Fixed with explicit hard rules in `.claude/agents/pagekit-claim-checker.md`:

- No inline annotation markers (`*[Rewritten:...]*` etc.) in body copy. Provenance belongs in the audit, not the corrected draft.
- No new em-dashes introduced by rewrites (per `frameworks/anti-slop.md`).
- Self-scan rewrites for flagged patterns before saving.

Mirrored in `prompts/07-claim-check.md` so chat users get the same enforcement.

## Verified

- `bash scripts/doctor.sh` → PASS
- `bash scripts/slop-check.sh` → exit 0 clean
- `bash scripts/run-check.sh runs/vegan-dog-food-verdel` → PUBLISHABLE
- `bash scripts/run-check.sh runs/personal-crm-founders` → PUBLISHABLE
- `new-run.sh _test` scaffold shows `## Source quality` and `## Weak section to source-gap mapping` sections in `evaluation.md`
- `templates/first-page-decision-template.md` shows the falsification prompt
- `.claude/agents/pagekit-claim-checker.md` contains the "Hard rules for the corrected draft" block
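The remove-vs-verify distinction from item 3 can be modelled as a small enumeration. The sketch below is illustrative only: the `Disposition` enum and the `may_restore` helper are hypothetical names, not identifiers from the repo, and the audit template itself is prose rather than code.

```python
from enum import Enum


class Disposition(Enum):
    """The three claim-check dispositions named in the audit."""
    REWRITE = "rewrite"
    REMOVE_WRONG = "remove (wrong)"
    REMOVE_PENDING_VERIFICATION = "remove pending verification"


def may_restore(disposition: Disposition, source_confirms: bool) -> bool:
    """A cut line may return only if it was removed pending verification
    and a source has since confirmed it; 'remove (wrong)' is final."""
    return (
        disposition is Disposition.REMOVE_PENDING_VERIFICATION
        and source_confirms
    )
```

Keeping `REMOVE_WRONG` and `REMOVE_PENDING_VERIFICATION` as separate values is what lets a later pass tell a disqualified claim apart from one that is merely unverified.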