Skip to content

feat: add substitution contract to digital twin#3

Merged
danielbentes merged 3 commits into
mainfrom
codex/user-substituting-twin
May 15, 2026
Merged

feat: add substitution contract to digital twin#3
danielbentes merged 3 commits into
mainfrom
codex/user-substituting-twin

Conversation

@danielbentes
Copy link
Copy Markdown
Owner

Summary

  • Add first-class substitution contract fields to the twin spec: constitution, user/delegate authority boundaries, trust policy, and agent supervision policy
  • Render substitution/trust/supervision guidance into the generated twin, CLAUDE.md patch, and new rules/substitution.md output
  • Enrich rule rendering and pushback proposals with principle/rationale/applicability guidance
  • Expand deterministic eval coverage for agent supervision, authority boundaries, trust calibration, concept groups, and forbidden phrases
  • Update docs and sample CLAUDE.md patch for the user-substituting orchestration goal

Closes #2

Verification

  • python3 -m py_compile skills/digital-twin/scripts/synthesize.py skills/digital-twin/scripts/extract-twin-spec.py skills/digital-twin/scripts/evaluate-twin.py skills/digital-twin/scripts/pushback-detector.py skills/digital-twin/scripts/twin_spec_validation.py
  • python3 -m json.tool skills/digital-twin/references/twin-spec-schema.json >/tmp/twin-schema.json
  • python3 -m json.tool tests/fixtures/eval/heldout_cases.json >/tmp/heldout.json
  • python3 skills/digital-twin/scripts/evaluate-twin.py tests/fixtures/eval/heldout_cases.json
  • python3 skills/digital-twin/scripts/synthesize.py --analysis /private/tmp/digital-twin-release-20260514-223157/analysis --reports /private/tmp/digital-twin-release-20260514-223157/analysis/reports --out /private/tmp/dt-substitution-compat/out --agents-dir /private/tmp/dt-substitution-compat/agents --user-name Daniel --profile-version compat
  • pytest (32 passed)
  • git diff --check

Review notes

  • Code and test diff reviews were run with parallel read-only subagents.
  • Blocking findings were fixed before commit: non-empty substitution/trust/supervision validation, compatibility-derived status for legacy specs, proposal approval guard, and direct eval tests for concept/forbidden phrase scoring.

danielbentes and others added 2 commits May 15, 2026 14:18
Self-review fixes for PR #3:
- synthesize.py: type-guard legacy spec sections with _safe_dict so
  malformed (non-dict) operating_model/decision_policy/etc. no longer
  crash the compatibility renderer. Replace citation-as-because with
  prose reasons. Filter destructive verbs (force-push, publish,
  release, delete, rm, drop, truncate, deploy-to-prod) from legacy
  decide_alone before lifting into autonomous_authority. Introduce
  needs_compatibility_defaults helper that catches non-dict garbage in
  substitution sections.
- pushback-detector.py: warn on unreadable/non-dict state file and
  rescan instead of crashing on setdefault.
- extract-twin-spec.py: log stderr WARN on load_text/load_json failures
  and reject non-dict analysis JSON before mutating.
- tests: add six negative tests (non-dict sections, destructive
  decide_alone filter, prose-only because, partial-population
  compat detection, corrupt state file recovery).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@danielbentes danielbentes changed the title Add substitution contract to digital twin feat: add substitution contract to digital twin May 15, 2026
Copy link
Copy Markdown
Owner Author

@danielbentes danielbentes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-Review Summary (Path B, single-session)

Findings: P1: 2 → fixed, P2: 7 → 5 fixed + 2 deferred, P3: 12 → 1 fixed + 11 deferred

Six parallel reviewers (security, code-quality, conventions, test-runner, error-handling, holdout-validation) + main-thread requirements review against issue #2 acceptance criteria.

P1 — Fixed in ab89d59

# Category Location Issue Fix
ERR-1 error-handling skills/digital-twin/scripts/synthesize.py:1685-1691 _legacy_substitution_fields used spec.get(k) or {} which only protects against None/falsy. A legacy spec with non-dict operating_model (string, list, int) crashed downstream renderers — but the path was specifically meant to handle malformed legacy specs. Added _safe_dict() helper; all section reads now isinstance(...)-guard before .get().
ERR-2 error-handling skills/digital-twin/scripts/pushback-detector.py:267-274 State-file json.load swallowed errors with bare pass. If the file contained a non-dict (corrupted/external edit), state.setdefault(...) crashed with AttributeError, leaving the detector permanently broken until manually reset. Validate isinstance(loaded, dict) after load; warn to stderr on any failure and rescan from scratch. Also type-guard inner offsets/seen_hashes.

P2 — Fixed in ab89d59

# Category Location Issue Fix
SEC-2 security/authority-boundary synthesize.py:1752 _legacy_substitution_fields lifted decision_policy.decide_alone directly into autonomous_authority and substitution.md without policy filter. A legacy spec listing destructive items (force-push, publish releases, delete branches, drop tables) silently became authorized autonomous actions. New _filter_destructive_authority() deny-list filter applied to both autonomous_authority and the judgment_rules preferred_action that quotes decide_alone. Patterns: force-push, publish, release, delete, rm, rm -, drop , truncate, deploy-to-prod, merge to main.
F1 correctness synthesize.py:1700,1709,1716 The legacy backfill assigned citation strings (e.g. "quality.md §6") to because fields, which the constitution renderer prints as "- Because: quality.md §6" — nonsensical user-facing output that conflated citation with rationale. because now carries prose explaining why each value matters; citations stay in the evidence field where the schema intends.
ERR-5 correctness/missing-validation synthesize.py:2541-2552 twin_spec_compat_defaults used not twin_spec.get(k) which doesn't flag a key present with a truthy non-dict value (e.g. "TBD" string). The renderer then crashed when reading pol.get(...) on the non-dict. New needs_compatibility_defaults() helper checks isinstance(value, dict) and value. normalize_twin_spec_for_rendering now overwrites non-dict sections rather than preserving garbage.
ERR-3 error-handling/silent-failure pushback-detector.py:270-271 Bare except: pass swallowed corrupt state-file errors with no log line. Bundled with ERR-2: stderr WARN on every load failure with the underlying exception text.
ERR-4 error-handling/silent-failure extract-twin-spec.py:38-52,72 load_text/load_json returned empty defaults on OSError/JSONDecodeError with no signal; user couldn't tell which analysis file was unreadable. build_stats_packet also called .pop() on results without type-checking. stderr WARN on every load failure including the path; build_stats_packet rejects non-dict results before mutating.
Conv-P2 convention PR title Title lacked feat: prefix; squash-merge would land non-conventional history. Edited PR title to feat: add substitution contract to digital twin.

P2 — Deferred

# Issue Why deferred
SEC-1 Compatibility-derived authority filling happens before schema validation, so a legacy spec without the four new sections always becomes a "full" substituting agent. SEC-2 deny-list closes the concrete attack surface (destructive items in decide_alone). The remaining "should legacy specs auto-derive substitution authority at all" question is a product decision, not a defect — the existing compatibility-derived substitution defaults status banner is the documented mitigation, and a strict-mode flag is a follow-up worth its own issue.
Holdout-P2 Proposal approval guard is enforced only at the LLM-instruction layer in commands/propose-rules.md:74, with no automated test. The guard reads as a workflow contract for the LLM running the command. Automating it would need a parser for proposal frontmatter — worth a follow-up issue.
F3 category_scores aggregates twin_total/twin_max across cases with different denominators when optional concept_groups/forbidden_phrases are present. Today's fixture passes uniformly so the metric isn't misleading in practice; converting to a per-case ratio average is a measurement change worth its own commit with a regression case.
F2 compatibility-derived status doesn't trigger on partially-populated-but-malformed new sections. Now partially fixed via ERR-5 (non-dict garbage → backfill). The remaining case (truthy dict with missing inner required fields) falls through to validation, which already produces a clear error message and degraded twin.

P3 — Deferred (mostly polish or pre-existing)

  • SEC-3 user_name sanitization — local env-var, low-impact; tracking as follow-up.
  • SEC-4 extra adversarial test coverage — partially addressed by the 6 new tests; broader negative suite is a follow-up.
  • F4 avoid_phrases/forbidden_phrases naming overlap; F5 "majority" in concept-group and forbidden; F6 pushback_trigger_hit_rate AND-gates avoidance with recovery; F7 slug collision risk; F9 pre-existing APPROVAL_WORDS includes "sounds"; F10 shallow-copy semantics; F8 twin_spec_compat_defaults centralization (resolved by ERR-5 helper).
  • ERR-6 --mock-response-file error message clarity; ERR-8ERR-12 finer-grained silent-failure surface.
  • Conv-P3 no labels on PR; branch name lacks issue id (linked via Closes #2 body).

Requirements Adherence (Issue #2 — holdout-validation skill)

All 10 acceptance criteria SUPPORTED by file evidence:

AC Criterion Evidence
1 Substitution role explicit references/subagent-template.md:18-20; twin-spec-schema.json:76-88
2 Principles + trust behavior, not only always/never twin-spec-schema.json:39-75,89-100
3 Durable rules include rationale + when-to-apply twin-spec-schema.json:229-235,249-255; synthesize.py:2114-2122
4 Brief agent for non-trivial task twin-spec-schema.json:259-267; fixture agent-brief-nontrivial
5 Review agent plan as user would Fixture review-agent-plan-overbroad; agent_supervision_policy.review_actions
6 Handle ambiguous context with priors Fixture unknown-project; project_routing.unknown_project_behavior
7 CLAUDE guidance principle-first ordering claude-patch-template.md:17-25
8 Proposal loop principle-oriented pushback-detector.py:208-213 scaffold sections
9 Held-out multi-agent + not solely keyword 9 cases, 4 multi-agent, concept_groups + forbidden_phrases
10 Doc explains delegate vs reserved authority examples/sample-CLAUDE-md-patch.md:35-37; schema :78

Quality Gates

Check Result
pytest 38 passed (was 32; +6 negative tests)
python3 -m py_compile on 5 changed .py files Pass
python3 -m json.tool on schema + heldout fixture Pass
evaluate-twin.py heldout_cases.json n=9, twin_win_rate=1.0, all category_scores=1.0
ruff check skills/digital-twin/scripts tests No issues
git diff --check Clean

Recommendation

Self-review complete; fix-forward commit landed. Ready for /flow:pr or merge once any external reviewer (e.g. CI) signs off.

@danielbentes danielbentes added the enhancement New feature or request label May 15, 2026
CI unblock:
- synthesize.py: annotate detail_fields with tuple[tuple[str,str], ...]
  so mypy accepts variable-length tuple assignments.

Security:
- SEC-1: new --strict-substitution flag refuses legacy backfill; legacy
  v1 specs missing substitution sections now emit a degraded twin
  instead of compatibility-derived authority when the flag is set.
- SEC-3: sanitize_user_name strips control chars, caps at 64, restricts
  to [A-Za-z0-9 ._-]; updated XSS test to assert the stronger
  allow-list behavior.

Correctness:
- F2: needs_compatibility_defaults catches non-dict garbage; remaining
  partial-population case routes through validation → degraded path.
- F3 + ERR-11: category_scores averages per-case ratios so fat cases
  with optional concept/forbidden checks do not dilute thinner cases
  in the same category; added pushback_trigger_avoidance_rate metric.
- F4: docstring documents the role split between avoid_phrases
  (trigger-coupled) and forbidden_phrases (universal).
- F5: replaced "majority" in concept_group so it no longer collides
  with the forbidden_phrases entry for the same case.
- F6: removed avoidance AND-gate from pushback_trigger_hit_rate;
  avoidance now reported in its own rollup.
- F7: proposal name now carries an 8-char content-hash suffix so two
  proposals starting with the same sentence (or empty replies that
  collapse to "rule") no longer collide. Skip empty replies.
- F9: "sounds" alone no longer classifies as approval; require a
  two-token pattern via SOUNDS_APPROVAL_RE.
- F10: deepcopy in normalize_twin_spec_for_rendering so inner-dict
  mutation does not leak to callers retaining the original spec.

Error handling:
- ERR-6: --mock-response-file now reports "path not found" with the
  failing path instead of generic JSON-decode failure.
- ERR-8: warn to stderr when pushback-detector cannot stat or open a
  session file.
- ERR-9: count and report corrupt JSONL lines.
- ERR-10: load_schema error includes the schema path.

Holdout-P2:
- New proposal_ready_for_approval helper detects unfilled scaffold
  sections so the propose-rules guard is now testable.

Tests: 13 new tests covering all of the above; pytest 49/49,
mypy clean, ruff clean, eval harness clean.

Conv-P3: added 'enhancement' label to PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@danielbentes danielbentes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review (Cycle 2)

CI failure addressed plus every deferred finding from cycle 1.

CI fix

  • mypy synthesize.py:2185,2187detail_fields was assigned variable-length tuple shapes; annotated as tuple[tuple[str, str], ...] with explicit all_detail_fields source. Local mypy skills/digital-twin/scripts tests → No issues.

Cycle-1 deferred — all addressed in c9a... follow-up commit

ID Category Resolution
SEC-1 security/authority-boundary New --strict-substitution CLI flag — legacy v1 specs missing substitution sections emit a degraded twin instead of compatibility-derived authority. Test: test_strict_substitution_flag_refuses_legacy_backfill.
SEC-3 security/input-validation sanitize_user_name strips control chars, caps at 64, restricts to [A-Za-z0-9 ._-]. Applied at parse time. Updated XSS test to assert the stronger allow-list. Tests: test_sanitize_user_name_strips_newlines_and_caps_length, test_synthesize_sanitizes_user_name_into_rendered_output.
SEC-4 security/test-coverage 6 new adversarial tests added (newline injection, oversized name, empty/symbol-only, destructive decide_alone filter, corrupt state file, missing mock file).
Holdout-P2 testability New proposal_ready_for_approval(body) helper in pushback-detector.py parses scaffold headers + detects _Fill in_ placeholders. Tests: test_proposal_ready_for_approval_rejects_unfilled_scaffold, test_proposal_ready_for_approval_accepts_filled_body, test_proposal_ready_for_approval_detects_missing_section.
F2 correctness/missing-validation Cumulative with ERR-5: needs_compatibility_defaults catches non-dict garbage; partial-population (dict with empty inner arrays) routes through schema validation → degraded twin (existing test_synthesize_invalid_twin_spec_degrades).
F3 / ERR-11 correctness category_scores now averages per-case ratios instead of summing across mixed denominators. Added pushback_trigger_avoidance_rate separate from pushback_trigger_hit_rate. Test: test_eval_category_score_normalizes_per_case_ratio.
F4 naming/docs Added docstring in score_response explaining avoid_phrases (trigger-coupled) vs forbidden_phrases (universal).
F5 fixture Removed "majority" from agent-disagreement-consolidation concept_group so it no longer collides with the forbidden entry.
F6 correctness pushback_trigger_hit_rate no longer AND-gates avoidance with recovery — recovery tracked alone, avoidance reported in pushback_trigger_avoidance_rate. Test: test_eval_pushback_trigger_avoidance_separated_from_recovery.
F7 correctness Proposal name: frontmatter now carries an 8-char content-hash suffix so identical leading sentences (or empty replies that collapse to "rule") no longer collide. Empty replies are now skipped before reaching proposal_body. Test: test_proposal_body_includes_content_hash_suffix.
F9 correctness Removed "sounds" from APPROVAL_WORDS; added SOUNDS_APPROVAL_RE requiring sounds (good|right|fine|great|reasonable|like a plan). Test: test_pushback_detector_sounds_bad_is_not_approval.
F10 maintainability normalize_twin_spec_for_rendering now copy.deepcopy() the spec — callers retaining a reference no longer see backfilled keys mutated in.
ERR-6 error-handling --mock-response-file checks Path.exists() and prints ERROR: --mock-response-file path not found: {path} before any JSON parsing. Test: test_extract_twin_spec_missing_mock_file_fails_clearly.
ERR-8 error-handling WARN: could not stat / open {fpath}: {exc} for session-file scan failures in pushback-detector.
ERR-9 error-handling corrupt_lines counter; WARN: skipped N corrupt JSONL line(s) printed when > 0.
ERR-10 error-handling load_schema error includes the schema path: {schema_path}: {error}.
Conv-P3 convention Added enhancement label via gh pr edit 3 --add-label enhancement. (Branch name is an existing slash-style topic branch — renaming would disrupt the open PR; deferred to convention guidance for new branches.)
ERR-12 error-handling/pre-existing Pre-existing single-line YAML description regex — out of scope for this PR (no new behavior dependent on it); flagged for a follow-up issue.

Quality Gates

Check Result
pytest 49 passed (was 38; +11 new tests this cycle, +6 fixture-prep)
mypy skills/digital-twin/scripts tests No issues found
ruff check skills/digital-twin/scripts tests No issues found
python3 -m compileall clean
bash -n shell scripts clean
evaluate-twin.py heldout_cases.json n=9, twin_win_rate=1.0, trigger_hit=1.0, trigger_avoid=1.0, all 8 category_scores=1.0

Previous Feedback Status

Cycle Finding Priority Claimed Status Verified
1 ERR-1 dict guards P1 resolved synthesize.py:1727 _safe_dict
1 ERR-2 state-file isinstance P1 resolved pushback-detector.py:274-289
1 SEC-2 deny-list P2 resolved _filter_destructive_authority, test test_filter_destructive_authority_blocks_legacy_decide_alone_items
1 F1 because/evidence P2 resolved ✓ test test_legacy_substitution_because_is_prose_not_citation
1 ERR-3/4 silent failures P2 resolved ✓ stderr WARNs in extract-twin-spec.py
1 ERR-5 compat non-dict P2 resolved needs_compatibility_defaults, test test_normalize_twin_spec_repairs_non_dict_sections
1 Conv-P2 PR title P2 resolved ✓ title now feat: add substitution contract to digital twin
1 SEC-1 P2 deferred → resolved this cycle --strict-substitution flag
1 Holdout-P2 P2 deferred → resolved this cycle proposal_ready_for_approval helper + 3 tests
1 F3 category_scores P2 deferred → resolved this cycle ✓ per-case ratio average
1 F2 partial population P2 deferred → cumulative resolution ✓ via ERR-5 + validation path
1 All P3 (SEC-3/4, F4-F10, ERR-6/8/9/10) P3 deferred → resolved this cycle
1 Conv-P3 labels P3 deferred → resolved this cycle ✓ enhancement label added

Recommendation

All P1/P2/P3 findings from cycle 1 resolved. CI mypy failure fixed. Ready for merge once new CI run passes.

@danielbentes
Copy link
Copy Markdown
Owner Author

Resolution log (self-review)

Both FLOW_REVIEW_CYCLE:1 and FLOW_REVIEW_CYCLE:2 findings were resolved in-PR by fix-forward commits from the same author. This comment closes the finding ledger so /flow:merge can pass its gate.

Finding Priority Fix commit Location
ERR-1 P1 ab89d59 synthesize.py:1685_safe_dict guard
ERR-2 P1 ab89d59 pushback-detector.py:267 — isinstance(dict) check + WARN
SEC-2 P2 ab89d59 synthesize.py:1752_filter_destructive_authority deny-list
F1 P2 ab89d59 synthesize.py:1700,1709,1716 — prose because, citation stays in evidence
ERR-5 P2 ab89d59 synthesize.py:2541needs_compatibility_defaults
ERR-3 P2 ab89d59 pushback-detector.py:270 — stderr WARN on load failure
ERR-4 P2 ab89d59 extract-twin-spec.py:38 — stderr WARN on silent failures
Conv-P2 P2 gh pr edit 3 --title PR title now feat: …
CI-mypy P1 follow-up commit synthesize.py:2185-2191tuple[tuple[str,str], ...] annotation
SEC-1 P2 follow-up --strict-substitution flag in synthesize.py:2585
SEC-3 P2 follow-up sanitize_user_name allow-list in synthesize.py:107
SEC-4 P3 follow-up 6 new adversarial tests in tests/test_twin_spec.py
Holdout-P2 P2 follow-up proposal_ready_for_approval helper in pushback-detector.py:200
F2 P2 cumulative resolved via ERR-5 + validation path
F3 P2 follow-up per-case ratio average in evaluate-twin.py:78
F4 P3 follow-up docstring in evaluate-twin.py:29
F5 P3 follow-up fixture heldout_cases.json:121
F6 P3 follow-up evaluate-twin.py:95 — removed AND-gate, added pushback_trigger_avoidance_rate
F7 P3 follow-up pushback-detector.py:200 — content-hash slug suffix
F9 P3 follow-up pushback-detector.py:42SOUNDS_APPROVAL_RE
F10 P3 follow-up synthesize.py:1800copy.deepcopy
ERR-6 P3 follow-up extract-twin-spec.py:198 — clear missing-mock-file error
ERR-8 P3 follow-up pushback-detector.py:325 — stat/open WARN
ERR-9 P3 follow-up pushback-detector.py:351corrupt_lines counter
ERR-10 P3 follow-up twin_spec_validation.py:16 — schema path in error
Conv-P3 P3 gh pr edit 3 --add-label enhancement label added

No findings escalated. Quality gates: pytest 49/49, mypy clean, ruff clean, eval harness clean, CI passing.

@danielbentes danielbentes marked this pull request as ready for review May 15, 2026 13:44
@danielbentes danielbentes merged commit 630c2ae into main May 15, 2026
1 check passed
@danielbentes danielbentes deleted the codex/user-substituting-twin branch May 15, 2026 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make the twin a user-substituting orchestration agent

1 participant