Apply Me CRM evaluator-pass punch list + fix claim-checker subagent bug #11

Merged: hnshah merged 1 commit into main from claude/me-crm-evaluator-punch-list on Apr 15, 2026
Conversation

@hnshah (Owner) commented Apr 15, 2026

The personal-crm-founders run (first real agentic fully-logged run after the PR #10 enforcement tightenings) surfaced a 4-item punch list in its evaluator-pass.md. This PR applies all four. It also fixes a bug in the pagekit-claim-checker subagent that the run exposed.

This is the run-to-repo-improvement loop working as designed: a run produces an evaluator pass, and the evaluator pass produces specific repo changes in the very next PR.

Punch list (from runs/personal-crm-founders/evaluator-pass.md)

1. First-page-decision template: falsification prompt

templates/first-page-decision-template.md — added an "If this is a hypothesis: what would falsify it?" sub-field under Confidence basis. Stops hypothesis-level decisions from being silently promoted to conclusions. Required when confidence is "hypothesis"; optional when "data" or "signal."

2. Evaluation scaffold: Source quality field

scripts/new-run.sh evaluation scaffold now includes a ## Source quality section at the top: Real / Training fiction / Mixed. Surfaces provenance prominently rather than burying it in sources/01-source-capture.md. A reader should immediately know whether the run was built on real or invented material.
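
As a rough illustration, a scaffold step could emit the new section with a heredoc. This is a sketch under assumptions, not the actual contents of scripts/new-run.sh: the function name `write_source_quality` and the placeholder wording are hypothetical. Only the `## Source quality` heading and the three options come from this PR.

```shell
# Hypothetical sketch: start a new evaluation file with the Source quality
# section so provenance is the first thing a reader sees.
# Only the heading and the three options are taken from the PR description.
write_source_quality() {
  # $1 is the path of the evaluation file to create (overwritten if present)
  cat > "$1" <<'EOF'
## Source quality

Pick one: Real / Training fiction / Mixed

EOF
}
```

Called as `write_source_quality runs/_test/evaluation.md` (an example path), this would leave the heading on line 1 of the file, which matches the "at the top" placement described above.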

3. Claim-check: distinguish remove-vs-verify

Claim-check previously collapsed "cut this line" into a single correction category. The audit should preserve three dispositions:

  • rewrite
  • remove (wrong) — disqualified; do not restore
  • remove pending verification — potentially restorable if source confirms

Updated across three surfaces: prompts/07-claim-check.md, .claude/agents/pagekit-claim-checker.md, templates/claim-check-template.md.
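
To make the distinction concrete, here is a minimal sketch of how a script could treat the three dispositions. The helper names `is_valid_disposition` and `is_restorable` are hypothetical and do not exist in the repo. The three label strings are the ones listed above.

```shell
# Hypothetical helper: accept only the three dispositions named in this PR.
is_valid_disposition() {
  case "$1" in
    "rewrite"|"remove (wrong)"|"remove pending verification") return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: only a pending-verification removal may be restored
# later, and only if the source confirms the claim.
is_restorable() {
  [ "$1" = "remove pending verification" ]
}
```

The point of keeping `remove (wrong)` and `remove pending verification` as separate labels is that a later pass can revisit the second group without reopening claims that were already disqualified.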

4. Evaluation scaffold: weak-section to source-gap mapping

scripts/new-run.sh evaluation scaffold now requires that every section flagged as weak name the specific source material that would fix it. A weak section without a source gap named is a weak section shipping by choice, not by constraint.

Bug fix: claim-checker subagent corrupted the corrected draft

The Me CRM run's working-log reports:

"subagent left inline *[Rewritten: ...]* annotations in body copy; these were stripped manually; 2 new em-dashes introduced by rewrites (lines 41 and 65) were also fixed"

The pagekit-claim-checker subagent was introducing new slop while correcting old slop. Fixed with explicit hard rules in .claude/agents/pagekit-claim-checker.md:

  • No inline annotation markers in body copy. Provenance goes in the audit, not the corrected draft.
  • No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
  • Self-scan rewrites for flagged patterns before saving the corrected draft.

Mirrored in prompts/07-claim-check.md so chat users get the same enforcement.
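
The first two hard rules can be approximated mechanically. The sketch below is an assumption about how such a self-scan might look, not the check the subagent or scripts/slop-check.sh actually runs. The function names are hypothetical, and the em-dash test is coarse: it compares total counts between the original and corrected drafts, so a rewrite that removes one em-dash and adds another would slip through.

```shell
# Hypothetical self-scan for a corrected draft; not the repo's actual check.
# The em-dash is built from its UTF-8 bytes so this file contains none itself.
EM_DASH="$(printf '\342\200\224')"

# Print the number of em-dash occurrences in the file given as $1.
count_em_dashes() {
  # grep -o prints one match per line; grep exits 1 on no match, so guard it.
  { grep -oF -- "$EM_DASH" "$1" || true; } | wc -l | tr -d ' '
}

# $1 = original draft, $2 = corrected draft.
# Returns 0 if the corrected draft passes both checks, 1 otherwise.
scan_corrected_draft() {
  scan_status=0
  # Rule 1: no inline annotation markers in body copy.
  if grep -nF -- '*[Rewritten:' "$2"; then
    echo "inline annotation marker found in body copy" >&2
    scan_status=1
  fi
  # Rule 2: rewrites must not add em-dashes beyond what the original had.
  if [ "$(count_em_dashes "$2")" -gt "$(count_em_dashes "$1")" ]; then
    echo "rewrites introduced new em-dashes" >&2
    scan_status=1
  fi
  return "$scan_status"
}
```

A caller would run `scan_corrected_draft original.md corrected.md` (example file names) before saving and treat a non-zero exit as a reason to fix the draft rather than ship it.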

Verified

  • bash scripts/doctor.sh → PASS
  • bash scripts/slop-check.sh → exit 0 clean
  • bash scripts/run-check.sh runs/vegan-dog-food-verdel → PUBLISHABLE
  • bash scripts/run-check.sh runs/personal-crm-founders → PUBLISHABLE
  • Fresh new-run.sh _test scaffold shows ## Source quality and ## Weak section to source-gap mapping sections in evaluation.md
  • templates/first-page-decision-template.md shows the falsification prompt
  • .claude/agents/pagekit-claim-checker.md contains the "Hard rules for the corrected draft" block

…subagent

The personal-crm-founders run (first real agentic fully-logged run
after the PR #10 enforcement tightenings) surfaced a 4-item punch list
in its evaluator-pass. This PR applies all four. It also fixes a bug
in the pagekit-claim-checker subagent that the run exposed.

## Punch list (from runs/personal-crm-founders/evaluator-pass.md)

### 1. First-page-decision template: falsification prompt
templates/first-page-decision-template.md — added an
'If this is a hypothesis: what would falsify it?' sub-field under
'Confidence basis for this decision'. Stops hypothesis-level
decisions from being silently promoted to conclusions. Required
when confidence is 'hypothesis'; optional when 'data' or 'signal'.

### 2. Evaluation scaffold: Source quality field
scripts/new-run.sh evaluation.md scaffold now includes a
'Source quality' section at the top: Real / Training fiction / Mixed.
Surfaces the source provenance prominently in the evaluation rather
than burying it one level down in sources/01-source-capture.md. A
reader scanning the eval should immediately know whether the run
was built on real or invented material.

### 3. Claim-check: distinguish remove-vs-verify
Claim-check previously collapsed 'cut this line' into a single
correction category. The audit should preserve the distinction
between:
  - rewrite
  - remove (wrong)       — disqualified; do not restore
  - remove pending verification — potentially restorable if source X confirms
Updated three surfaces:
  - prompts/07-claim-check.md (canonical prompt)
  - .claude/agents/pagekit-claim-checker.md (subagent instructions)
  - templates/claim-check-template.md (audit format)

### 4. Evaluation scaffold: weak-section to source-gap mapping
scripts/new-run.sh evaluation.md scaffold now requires that every
section flagged as weak (in 'What stayed thin' or 'Where outputs
drifted generic') name the specific source material that would fix
it. A weak section without a source gap named is a weak section
shipping by choice, not by constraint.

## Bug fix: claim-checker subagent corrupted the corrected draft

The personal-crm-founders run reported the subagent left inline
'*[Rewritten: ...]*' annotations in body copy and introduced two new
em-dashes during rewrites. The working-log shows these had to be
manually cleaned before the corrected draft could pass slop-check.

Fixed in .claude/agents/pagekit-claim-checker.md with explicit
hard rules for the corrected draft:
  - No inline annotation markers (*[Rewritten:...]* etc.) in body
    copy. Provenance belongs in the audit, not the corrected draft.
  - No new em-dashes introduced by rewrites (per frameworks/anti-slop.md).
  - Self-scan rewrites for flagged patterns before saving.

Mirrored in prompts/07-claim-check.md so chat users get the same
enforcement.

## Verified

- scripts/doctor.sh PASS
- scripts/slop-check.sh exit 0 clean
- runs/vegan-dog-food-verdel still PUBLISHABLE
- runs/personal-crm-founders still PUBLISHABLE
- Fresh scaffold shows the new Source quality field and Weak-section-
  to-source-gap mapping sections
- templates/first-page-decision-template.md shows the new
  falsification prompt
- Fresh scaffold classifies as FULLY LOGGED (below PUBLISHABLE, as
  expected for an empty scaffold)

hnshah merged commit f7c5d10 into main on Apr 15, 2026
1 check passed
hnshah deleted the claude/me-crm-evaluator-punch-list branch on April 15, 2026 02:16