Releases: MatrixFounder/Agentic-development
v3.20.12 - Framework different refinements
Full Changelog: v3.19.0...v3.20.12
v3.20.12 — Corrected description of the framework's helper tools (additional + fallback)
The framework ships a small set of helper tools — generate a unique archive filename, run tests, check/stage/commit with git, and read/write/list files. A previous release mistakenly described this whole subsystem as obsolete and unused. This release fixes the description only: behaviour, code, and files are unchanged, and nothing was deleted or moved.
Changed
- The helper tools now fall into two clear groups. The archive-filename generator has no equivalent among a coding assistant's own tools, so it is always used — run it with `python3 .agent/tools/task_id_tool.py <name>`. The rest (run tests, git, file read/write/list) duplicate what an assistant already does on its own, so they act as a fallback for assistants that lack those built-ins. The code that runs an individual tool is implemented and tested; the piece that would let an assistant drive these tools by itself is documented honestly as not yet built.
- The tool list was restored in the assistant setup files — `AGENTS.md` (Cursor, Codex) and `GEMINI.md` (Gemini CLI, Antigravity) — where the previous release had removed it. Each entry was written to match that assistant's own official documentation for running commands and adding extra tools. The Claude Code setup file was already correct and left unchanged.
- Supporting documents were brought in line — `ORCHESTRATOR.md`, `SOURCE_OF_TRUTH.md`, `SKILLS.md`, and `RELEASE_CHECKLIST.md` now present the subsystem as additional/fallback tools, and a leftover reference to a tool that never actually existed was corrected. Nothing was archived; all automated checks pass.
v3.20.11 — Vendor-Currency: Tool-Layer Reword + GEMINI.md Symlink Re-sync
Follow-up to the System/Agents cross-vendor audit (items 1, 2, 4; item 3 — version-header re-stamp — intentionally skipped). Framing-only, zero pipeline-behavior change. Task 082, gate artifact docs/reviews/framework-audit-082.md. Scope: "reword prompts only" — schemas.py / tool_runner.py / ORCHESTRATOR.md left in place. Gates 43/43, pytest 30/30.
Changed
- Dead tool-dispatch framing retired in the two core role-prompts and two bootstrap files. `00_agent_development.md`, `01_orchestrator.md`, `AGENTS.md`, `GEMINI.md` no longer instruct the orchestrator to call a standalone-Python `run_tests` / `git_status` / `git_ops` / `file_ops` / `execute_tool` dispatcher (imported only by its own tests; used by no current vendor harness). They now say: use your harness's built-in file/shell/search tools, and run repo Python helpers via the shell. `CLAUDE.md` was already correct and is unchanged (it is the donor template). `System/Docs/ORCHESTRATOR.md` is now referenced as legacy.
- `task_boundary` fiction removed. `00_agent_development.md` (General Concept, §2, anti-patterns) and `GEMINI.md` (workflow dispatch) no longer describe a `task_boundary` tool/protocol for state tracking; state is persisted via `skill-session-state` (`update_state.py`) at phase boundaries.
Fixed
- GEMINI.md symlink-resolution gap. Gemini CLI's bootstrap file was a stale fork (~v3.15) missing the SYMLINK RESOLUTION + SYMLINK-AWARE COMMAND DEFAULTS protocol that `AGENTS.md` received in v3.19.1. Ported both sections — closes a silent skill-load failure when the framework is deployed via symlinks (Gemini's `find` / `rg` skip symlinks by default).
v3.20.10 — Item 6 In-Repo Complete: Antigravity Adapter + Vendor Dispatch (6d) + Wave-5 Generator (6e)
Finishes the in-repo half of roadmap item 6 (C-07). After this, item 6's only remaining piece is operator e2e validation on real CLIs. Task 081, gate artifact docs/reviews/framework-audit-081.md. Doc/script-only, gates 43/43, pytest 30/30; the only severity escalation in the merge logic remains R3b.
Added
- Google Antigravity adapter (4th vendor) — `references/antigravity.md` stub→full, verified via web (primary docs render client-side; corroborated from antigravity.google/docs/agent-manager + Google-Cloud/Medium + DataCamp + gemini-cli discussion #27305). Dynamic-first architecture documented (orchestrator spawns subagents on the fly, no config files) alongside the static custom-agent form (`agent.json` at `~/.gemini/antigravity-cli/agents/<name>/`). Async parallel ✅. Detection ambiguity recorded honestly — Antigravity shares `AGENTS.md` (Codex) and `~/.gemini/` (Gemini); a provisional `.antigravity/` marker is used pending validation.
- Wave-5 wrapper generator (6e) — `scripts/generate_wrappers.py` + `scripts/wrappers_manifest.json`: one manifest → 12 critic wrappers across 4 scaffold vendors (Gemini MD+YAML, Codex TOML, Cursor MD+YAML, Antigravity JSON) in their native formats, all pointing at the same SOT skills + enum. Claude Code excluded (validated reference/donor, hand-maintained). `--check` mode exits non-zero on drift (CI-gateable). Hand-sync of scaffold wrappers is eliminated.
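The drift gate behind `--check` can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the manifest shape, the `render` templates, and the function names are hypothetical and are not taken from the real `scripts/generate_wrappers.py` or `scripts/wrappers_manifest.json`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical manifest shape; the real wrappers_manifest.json schema is not shown in these notes.
MANIFEST = {
    "critics": ["critic-logic", "critic-security", "critic-performance"],
    "vendors": {".codex/agents": "toml", ".antigravity/agents": "json"},
}

def render(critic: str, fmt: str) -> str:
    """Render one wrapper in a vendor format (illustrative templates only)."""
    if fmt == "toml":
        return f'name = "{critic}"\nsandbox_mode = "read-only"\n'
    return json.dumps({"name": critic, "readonly": True}, indent=2) + "\n"

def expected_files(manifest: dict) -> dict[str, str]:
    """Map every relative wrapper path to the content the manifest implies."""
    return {
        f"{directory}/{critic}.{fmt}": render(critic, fmt)
        for directory, fmt in manifest["vendors"].items()
        for critic in manifest["critics"]
    }

def generate(root: Path, manifest: dict) -> None:
    """Write every wrapper under root, creating directories as needed."""
    for rel, body in expected_files(manifest).items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body, encoding="utf-8")

def check(root: Path, manifest: dict) -> list[str]:
    """Return the relative paths that are missing or differ from the manifest."""
    drifted = []
    for rel, body in expected_files(manifest).items():
        path = root / rel
        if not path.is_file() or path.read_text(encoding="utf-8") != body:
            drifted.append(rel)
    return sorted(drifted)
```

A command-line wrapper around `check` would exit non-zero whenever the returned list is non-empty, which is what makes the gate usable in CI.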
Changed
- 6d — sequential-fallback demoted (C-07 "functionally equivalent" claim removed from `vdd-multi.md` + `SKILL.md` §7): `vdd-multi`'s "Fallback (Sequential)" section is now "Vendor dispatch" — resolve the runtime (skill §1) → use its native parallel adapter (Codex/Cursor/Antigravity ✅, Gemini Layer-A pending); sequential role-switching is the documented last resort (primitive-less runtime / single-session debug / 1-slot CI), explicitly not functionally equivalent to parallel. All flags + the evidence contract honored on every path.
- 6e — drift-grep extended (`KNOWN_ISSUES.md`) to all 5 wrapper dirs (`{.claude,.gemini,.codex,.cursor,.antigravity}/agents/`); documents that scaffold wrappers are generated (edit the manifest, run the generator, never hand-edit).
- Detection table (`SKILL.md` §1.1) Antigravity row updated (provisional marker + ambiguity). `skill-parallel-orchestration` 3.6→3.7.
Still open under item 6
- Operator e2e validation on real Codex / Cursor / Antigravity / Gemini installs — graduates each `⚠️ SCAFFOLD` → ✅. Until then the banners stay and sequential remains the proven path. Roadmap item 6 stays 🔜 (6a–6e in-repo ✅ · validation ⏳).
v3.20.9 — Vendor Adapter Scaffolds: Codex / Gemini / Cursor (roadmap item 6, sub-tasks 6a–6c, in-repo portion)
In-repo scaffolds for parallel-critic dispatch on three non-Claude runtimes, authored from primary-source docs (geminicli.com, developers.openai.com/codex, cursor.com — verified in-session 2026-06-10). Everything ships /vdd-multi --no-fix per CLI (hardware/accounts the operator does not have right now — explicitly deferred). Task 080, gate artifact docs/reviews/framework-audit-080.md. Doc-only diff, gates 43/43, pytest 30/30.
Added
- Three vendor references (`skill-parallel-orchestration/references/`): `codex-cli.md` (NEW), `gemini-cli.md` + `cursor.md` (stub→full), each at `claude-code.md` depth — concept→primitive mapping, Layer A pattern, read-only critic enforcement, wrapper catalog, validation gate.
- Nine thin critic wrappers at real runtime paths, all pointing at the same SOT skills (`vdd-adversarial`, `skill-adversarial-security`, `skill-adversarial-performance`) with the same `clean-pass | issues-found | bikeshedding-only` enum: `.codex/agents/critic-*.toml` (×3, `sandbox_mode="read-only"`), `.gemini/agents/critic-*.md` (×3, read-only `tools`), `.cursor/agents/critic-*.md` (×3, `readonly: true`).
- Detection table (`SKILL.md` §1.1) gains a Codex row (`.codex/agents/`); Gemini/Cursor statuses restated as "Scaffold — documented, not validated". First-match-wins keeps Claude Code precedence in this repo.
Verified against primary docs
- Codex CLI — TOML in `.codex/agents/`; parallel confirmed ("spawns in parallel, waits for all, consolidates"); `sandbox_mode="read-only"` maps to the read-only critic guarantee.
- Cursor 2.4 — Markdown+YAML in `.cursor/agents/`; parallel confirmed (max 10); `readonly: true` is purpose-built for reviewer subagents; `is_background` = async (Layer B, deferred).
- Gemini CLI — Markdown+YAML in `.gemini/agents/`; ⚠️ parallel multi-spawn NOT documented (only auto-delegation + `@subagent`). The scaffold records this gap honestly and corrects the roadmap's earlier optimistic "concurrent subagents" note; the multi-critic flow on Gemini stays sequential-delegation until a real run proves Layer A.
Deferred (still open under item 6)
- e2e validation on real CLIs (operator action), 6d (sequential-fallback demotion + `vdd-multi` "Vendor dispatch" rewrite), 6e (drift-grep extension + Wave-5 wrapper generator). `skill-parallel-orchestration` 3.5→3.6. Roadmap item 6 stays 🔜 — scaffolds authored, not validated.
v3.20.8 — Tier-Diverse Escalation Demoted to Tag-Only (mini-exp 078 refuted the premise)
The R3c tier-diverse +1 escalation shipped in v3.20.7 was an explicit pilot. Mini-experiment 078 (docs/reviews/tier-diverse-experiment-078.md, fresh sealed corpus, 3 arms, 18 bugs) refuted its sole premise: cross-tier critic agreement was less precise than same-tier (overlap precision 0.66 vs 0.73), so escalating severity on it would manufacture false positives. This release translates that finding into the rule — the escalation is demoted to a tier-diverse provenance tag with no +1. The --models config is retained (078 validated it as a recall/coverage tool: D-tier hit the highest recall, 100% pooled). Task 079, gate artifact docs/reviews/framework-audit-079.md. Zero mechanics regression beyond the inten...
v3.19.0 — Multi-Critic Objective Convergence (parallel adversarial pipeline)
Full Changelog: v3.17.0...v3.19.0
Follow-up to v3.18.0: the parallel critics (critic-logic / critic-security / critic-performance) still self-certified convergence via a subjective hallucinating state — the same gameable pattern v3.18.0 removed from Sarcasmotron, and worse, /vdd-multi's Phase-3 termination marked a category done on it. Replaced with an objective state, so both the termination gate and the merge noise-filter are objective.
Changed
- Critic `Convergence signal` enum `clean-pass | issues-found | hallucinating` → `clean-pass | issues-found | bikeshedding-only` across the three critic agents, with `bikeshedding-only` defined objectively ("no legitimate findings remain — only style/nits; NOT 'forced to invent problems'").
- `/vdd-multi` Phase-3 termination now marks a category ✓ on the objective `bikeshedding-only` / `clean-pass` state instead of "critic inventing problems".
- Merge noise-filter (`vdd-multi.md` + `skill-parallel-orchestration`) re-keyed off `bikeshedding-only`; the "drop a converged critic's low-severity items this iteration" mechanic is unchanged. Satellite references (`skill-parallel-orchestration` §2.3, `examples/usage_example.md`, `references/sequential-fallback.md`) refreshed to the objective terminology.
Unchanged (invariants)
- All other merge rules — location dedup (±3 lines), cross-category re-attribution, severity escalation on independent overlap, `--severity` filter, iteration cap — and the Layer A / Layer B decision rule are byte-identical. Skill gate 43/43; VDD adversarial review APPROVED with zero findings.
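Two of the unchanged merge rules, location dedup within ±3 lines and severity escalation on independent overlap, can be sketched as follows. This is a simplified illustration under stated assumptions: the `Finding` shape, the severity ordering, and the "bump by one level" escalation are hypothetical and may differ from what `vdd-multi.md` actually specifies.

```python
from dataclasses import dataclass, field

SEVERITIES = ["low", "medium", "high", "critical"]  # assumed ordering, lowest first
WINDOW = 3  # the ±3-line dedup window quoted in the notes

@dataclass
class Finding:
    file: str
    line: int
    severity: str
    critics: set[str] = field(default_factory=set)

def merge(findings: list[Finding]) -> list[Finding]:
    """Collapse findings on the same file within WINDOW lines of a cluster's first line."""
    merged: list[Finding] = []
    for f in sorted(findings, key=lambda x: (x.file, x.line)):
        last = merged[-1] if merged else None
        if last and last.file == f.file and abs(f.line - last.line) <= WINDOW:
            # Overlap counts as independent only if no critic reported both findings.
            independent = not (f.critics & last.critics)
            top = max(last.severity, f.severity, key=SEVERITIES.index)
            if independent:
                # Assumed escalation: one level up, capped at the highest severity.
                bumped = min(SEVERITIES.index(top) + 1, len(SEVERITIES) - 1)
                top = SEVERITIES[bumped]
            last.severity = top
            last.critics |= f.critics
        else:
            merged.append(Finding(f.file, f.line, f.severity, set(f.critics)))
    return merged
```

Two critics flagging `a.py` at lines 10 and 12 would collapse into one finding with a raised severity, while a third finding at line 40 stays separate.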
v3.18.0 — Reviewers Hardening (provable clean review + objective Sarcasmotron exit, cross-vendor)
Three reviewer weaknesses, plus a cross-vendor backup gap, hardened without merging or re-toning the two review roles. The Code Reviewer's clean pass is now provable; its output contract is converged across four drifting definitions; the Sarcasmotron exit is moved off a subjective trigger onto an objective bar across every authoritative definition; and /framework-upgrade now backs up all vendor bootstrap files. has_critical_issues and the orchestrator DECISION TABLE are byte-for-byte unchanged — control-flow is identical before and after.
Added
- Code Reviewer "Verified" block — when `has_critical_issues = false`, the report must carry a plain-markdown block proving the scope of the clean pass (requirements cross-checked + edge cases considered), so "looked and clean" is distinguishable from "didn't look". Body text only — never a structured key — so it cannot affect control-flow (`09_code_reviewer_prompt.md`).
- Objective Convergence — the Sarcasmotron exit is now bound to an objective bar (full test run executed · 0 CRITICAL · 0 legitimate logic/security/slop findings · only bikeshedding left), replacing the subjective "forced to invent nitpicks → approve" trigger that let a lazy/sycophantic model exit early.
Changed
- Reviewer output contract converged to one superset `{ review_status, has_critical_issues, e2e_tests_pass, stubs_replaced }` across all four definitions — SOT `09_…`, `skill-orchestrator-patterns` Extended Schema, the `.claude/agents/code-reviewer.md` wrapper, and `01_orchestrator.md` Step 11. `comments` is reconciled everywhere as the prose report body, not a JSON key. Additive only; `has_critical_issues` semantics untouched.
- Objective-Convergence criterion applied identically across all authoritative Sarcasmotron definitions — `vdd-03-develop.md`, `vdd-adversarial/SKILL.md`, `vdd-sarcastic/SKILL.md`, `vdd-adversarial/references/vdd-methodology.md`, plus the `/vdd-adversarial` workflow — with hostile tone and "assume broken until proven" stance preserved. Stale "Hallucination Convergence/Exit" terminology refreshed in `vdd-05-run-full-task.md`, `System/Docs/WORKFLOWS.md`, `VDD.md`, and `TDD_VS_VDD.md`. VDD-loop mechanics (3-REJECT / escalation / HITL) unchanged.
- `/framework-upgrade` backup/rollback is now vendor-aware — Step 3.1 and Step 5 iterate over every present bootstrap file (`CLAUDE.md`, `AGENTS.md`, `GEMINI.md`) and skip absent ones, instead of hard-coding `GEMINI.md` alone.
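A consumer of the converged reviewer contract could guard against the kind of drift described here with a small check. The sketch below is an assumption-laden illustration: `validate_review` and the sample value `"APPROVED"` are hypothetical, and the notes do not state the value types of the four keys, so only key presence is checked.

```python
import json

# The four keys named in the converged contract.
CONTRACT_KEYS = {"review_status", "has_critical_issues", "e2e_tests_pass", "stubs_replaced"}

def validate_review(payload: str) -> dict:
    """Parse a reviewer JSON object and reject payloads that drift from the contract."""
    data = json.loads(payload)
    missing = CONTRACT_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing contract keys: {sorted(missing)}")
    if "comments" in data:
        # Per the notes, comments is the prose report body, never a JSON key.
        raise ValueError("comments belongs in the prose report body")
    return data
```

Passing the old two-key payload `{review_status, has_critical_issues}` through this check would raise, which is the drift the Fixed section describes.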
Fixed
- Reviewer-contract drift — `09_…` previously emitted only `{review_status, has_critical_issues}` while three consumers expected `e2e_tests_pass` / `stubs_replaced`; now consistent.
- Subjective exit fabrication — approval could be triggered by the auditor inventing nitpicks (an unobservable, gameable signal); approval is now bound to the objective bar in every definition. The Phase-4 adversarial review (eating its own dogfood against the new bar) caught two further normative residuals (`VDD.md`, `TDD_VS_VDD.md`, `/vdd-adversarial`) which were folded in. The `/vdd-multi` `convergence: hallucinating` dedup noise-filter is a distinct mechanism, intentionally left untouched.
v3.17.0 — Skill-Validator Inline-Block Rule Reform (two-tier warn/fail)
Full Changelog: v3.16.0...v3.17.0
The skill quality gate hard-failed CI on any fenced code block over 12 lines — an arbitrary, line-based threshold with no warning tier and no awareness of block type, stricter than ARCHITECTURE.md §8 itself (which cites "50 lines" as the bad case). v3.16.0's skill-archive-task tripped it. The principle (progressive disclosure — keep SKILL.md lean) is kept; the crude implementation is replaced with a two-tier, fence-type-aware, config-driven check. Validator-only change — no runtime-pipeline tools touched.
Added
- Two-tier inline-block policy — a fenced block over 20 lines emits a non-blocking warning; over 60 lines a hard error. Thresholds are config-driven (`validation.quality_checks.max_inline_lines_warn` / `_fail`).
- Fence-type awareness — `mermaid` fences are exempt (diagrams); `text` / `console` / `output` fences can only warn, never fail (output samples). Configurable via `inline_exempt_fence_langs` / `inline_softcheck_fence_langs`.
- Unclosed-fence detection — an unclosed `` ``` `` is now reported explicitly instead of being silently swallowed by the parser.
- `tests/test_inline_efficiency.py` — 9 regression tests (warn / fail / exempt / softcheck / unclosed branches + a drift guard asserting the `validate_skill.py` and `analyze_gaps.py` copies stay behaviourally identical), wired into the `Framework Gates` CI.
Changed
- `check_inline_efficiency` now returns `(errors, warnings)` — warnings route to the non-blocking channel instead of failing CI.
- `skill-creator` v2.0 → v2.1, `skill-enhancer` v1.2 → v1.3 — both validator copies reformed in lockstep; `skill-archive-task` v1.2 → v1.3.
- Config updated across `.agent/rules/skill_standards.yaml` and all bundled `skill_standards_default.yaml`; `skill-creator` docs, `System/Docs/skill-writing.md`, and `ARCHITECTURE.md` §8 updated.
Fixed
- `skill-archive-task` CI failure — the v3.16.0 Step-7 protocol block (35 lines) and Example Flow (17 lines) hard-failed the old 12-line rule; both were restructured into smaller labelled blocks, and the rule itself reformed so coherent procedural content is no longer penalised.
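The two-tier, fence-type-aware policy can be sketched in a few lines. This is not the shipped `check_inline_efficiency`: the function name `check_inline_blocks`, the message wording, and the choice to report an unclosed fence as an error are assumptions. Only the 20/60 thresholds, the `mermaid` exemption, and the `text` / `console` / `output` soft-check come from the notes.

```python
FENCE = "`" * 3  # built indirectly so this sample can itself sit inside a fenced block
WARN_LINES, FAIL_LINES = 20, 60  # thresholds quoted in the notes
EXEMPT_LANGS = {"mermaid"}
SOFT_LANGS = {"text", "console", "output"}

def check_inline_blocks(markdown: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for fenced blocks that exceed the line thresholds."""
    errors: list[str] = []
    warnings: list[str] = []
    lang, start, count = None, 0, 0  # lang is None while outside a fence
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            if lang is not None:
                count += 1  # a content line inside the current block
            continue
        if lang is None:
            # Opening fence: remember its language tag (may be empty) and position.
            lang, start, count = stripped[len(FENCE):].strip().lower(), number, 0
            continue
        # Closing fence: apply the two-tier policy to the finished block.
        if lang not in EXEMPT_LANGS:
            message = f"block opened at line {start} has {count} lines"
            if count > FAIL_LINES and lang not in SOFT_LANGS:
                errors.append(message)
            elif count > WARN_LINES:
                warnings.append(message)
        lang = None
    if lang is not None:
        # Assumption: an unclosed fence is treated as an error here.
        errors.append(f"unclosed fence opened at line {start}")
    return errors, warnings
```

Returning the pair keeps the caller in charge of routing: warnings can be printed without affecting the exit code, while any error fails the gate.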
v3.16.0 — Deterministic Artifact Archiving (PLAN.md lockstep + ARCHITECTURE.md Index-Mode)
Full Changelog: v3.15.0...v3.16.0
Closes a long-standing drift: docs/TASK.md archived reliably, while docs/PLAN.md and docs/ARCHITECTURE.md did not — some projects archived plans to docs/plans/, some dumped them flat into docs/archives/, some never archived; ARCHITECTURE.md grew unbounded (one project reached 2037 lines). Archiving of PLAN.md and ARCHITECTURE.md is now an explicit, deterministic protocol wired into the same skills, prompts, and workflows that already make TASK.md archiving work. Protocol change — no new scripts and no runtime-pipeline tools touched; the existing archive_protocol.py test mirror gains matching archive_plan() coverage.
Added
- PLAN.md lockstep archiving — `skill-archive-task` now archives `docs/PLAN.md` → `docs/plans/plan-NNN-slug.md` in lockstep with TASK.md, reusing the same ID and slug (`task-NNN-slug.md` ↔ `plan-NNN-slug.md`). New protocol Step 7 with explicit edge cases (PLAN.md absent, orphan PLAN.md, re-plan, corrected ID).
- ARCHITECTURE.md Index-Mode — `architecture-format-core` gains a "Living Document & Index-Mode" section: `docs/ARCHITECTURE.md` is a single living document, updated in place and never per-task archived; when it exceeds 1500 lines it is split into `docs/architectures/<section-slug>.md` chunks with a short (~≤200-line) index.
Changed
- `skill-archive-task` v1.1 → v1.2 (now covers TASK.md + PLAN.md).
- `artifact-management` v1.0 → v1.1, `architecture-format-core` v1.0 → v1.1, `architecture-review-checklist` v1.0 → v1.1, `skill-safe-commands` v1.0 → v1.1.
- Agent prompts — Analyst, Architect, Planner, Architecture Reviewer wired for lockstep archiving + the Index-Mode size check / reviewer backstop.
- Workflows — `01-start-feature`, `vdd-01-start-feature`, `light-01-start-feature`, `light-02-develop-task`, `04-update-docs`, `02-plan-implementation`, `vdd-02-plan` annotated with the new rules.
- docs/ARCHITECTURE.md — Directory Structure updated with `docs/plans/` and `docs/architectures/`; new "Artifact rotation" note.
- `.agent/tools/archive_protocol.py` (the skill-archive-task test mirror) — new `archive_plan()` function implementing Step 7 + 8 lockstep tests (23 archive tests total, all green).
Fixed
- PLAN.md archiving drift — plans are no longer dumped flat into `docs/archives/` or left unarchived. This repo's own legacy `docs/archives/PLAN-*.md` migrated to `docs/plans/plan-NNN-slug.md`.
- ARCHITECTURE.md unbounded growth — the 1500-line Index-Mode threshold plus an Architecture Reviewer 🟡 MAJOR backstop prevent monolithic architecture files.
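The lockstep naming rule and the Index-Mode size check reduce to two small functions. The sketch below is illustrative only: `plan_path_for` and `needs_index_mode` are hypothetical names (the real helper is `archive_plan()` in `.agent/tools/archive_protocol.py`, whose signature is not shown in these notes), and the three-digit ID pattern is an assumption based on the `task-NNN-slug.md` form.

```python
import re
from pathlib import Path

# Assumed filename shape: task-<3 digits>-<slug>.md
TASK_NAME = re.compile(r"^task-(\d{3})-([a-z0-9-]+)\.md$")
INDEX_MODE_THRESHOLD = 1500  # line count quoted in the notes

def plan_path_for(task_archive: Path, plans_dir: Path = Path("docs/plans")) -> Path:
    """Derive the plan archive path that reuses the task archive's ID and slug."""
    match = TASK_NAME.match(task_archive.name)
    if match is None:
        raise ValueError(f"not a task archive name: {task_archive.name}")
    task_id, slug = match.groups()
    return plans_dir / f"plan-{task_id}-{slug}.md"

def needs_index_mode(architecture_text: str) -> bool:
    """True once the living architecture document exceeds the threshold."""
    return len(architecture_text.splitlines()) > INDEX_MODE_THRESHOLD
```

Deriving the plan name from the task name, rather than generating a second ID, is what keeps the two archives paired even when a task ID is corrected later.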
v3.15.0 — Framework Installer: `install.sh` (5 vendors, 5 subcommands)
Full Changelog: v3.14.3...v3.15.0
A bootstrap-time CLI that deploys the framework into a clean target project under a chosen agent-system profile — replacing manual folder copying. The framework lives in the target's .agentic-development/ (a symlink to a sibling clone, or a full copy); per-item relative symlinks point into it; a SHA-256-hash-protected managed .gitignore block keeps framework files out of the project's git history. Built end-to-end through the framework's own VDD pipeline (Analyst → Architect → Planner → an 11-task /vdd-develop-all chain → 3-critic /vdd-multi adversarial review). The adversarial passes caught and fixed real bugs before merge — a --dry-run that mutated the filesystem, a snapshot crash on overlapping paths, a CWD-dependent copytree symlink-resolution bug, and an uninstall that could delete user-owned content. The installer is a standalone bootstrap tool — no runtime-pipeline changes.
Added
- `install.sh` — minimal bash wrapper (`BASH_VERSION` guard, `python3` / PyYAML dependency check, `exec python3`).
- `System/scripts/install.py` + `System/scripts/installer/` — 16-module Python package (stdlib + PyYAML only, per NFR-5).
- `System/scripts/vendors.yaml` — declarative vendor profiles; a new agent system is added without touching Python.
- Five subcommands — `install` / `switch` / `update` / `uninstall` / `doctor`.
- Five vendor profiles — `claude`, `antigravity`, `codex`, `cursor`, `gemini-cli`.
- Two deployment modes — `--mode symlink` (default, `.agentic-development/` → sibling clone) and `--mode copy` (self-contained, for airgapped / CI).
- Pre-flight conflict prevention — every target path is classified (`safe` / `our` / `hard_conflict` / `soft_conflict`) before any write; `CLAUDE.md` / `AGENTS.md` / `GEMINI.md`, user `settings.json`, and a user-owned `System/` are never overwritten. `--dry-run` previews the plan with zero filesystem mutation.
- Anti-clobber engine — managed blocks in `.gitignore` and bootstrap files are SHA-256-hashed; a hand-edited block aborts the run with a unified diff unless `--force` (which backs the old version up first).
- `doctor` — read-only integrity verifier with a `--json` report schema (broken symlinks, hash mismatches, state-schema check).
- `tests/installer/` — 169 `unittest` tests (per-module unit + 10-scenario end-to-end + bash-wrapper smoke), wired into `tests/run_tests.py`.
Changed
- docs/ARCHITECTURE.md — new §9 Framework Installer Subsystem (data model, components, invariants, security & safety).
- README.md §Installation — `install.sh` documented as the recommended deployment method; manual folder-copy retained as the alternative.
Fixed
- Adversarial-review fixes folded in before merge: `--dry-run` filesystem mutation; snapshot overlapping-path crash; `copytree` CWD-dependent dangling-symlink resolution; `uninstall` / `switch` over-broad deletion of user content; `inject_block` marker-line injection; `doctor` state-schema gap; `apply_retention` able to delete every backup; a stale `System` symlink surviving `uninstall`.
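The anti-clobber idea, hashing a managed block so a hand edit can be detected before overwriting it, can be illustrated with a short sketch. The marker strings and function names below are hypothetical and are not the installer's real format; only the use of SHA-256 over the managed content comes from the notes.

```python
import hashlib

# Hypothetical marker lines; the installer's actual markers are not shown in these notes.
BEGIN = "# >>> managed block (sha256={digest}) >>>"
END = "# <<< managed block <<<"

def digest(body: str) -> str:
    """SHA-256 hex digest of the managed body text."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def render_block(body: str) -> str:
    """Wrap body in marker lines that record its hash."""
    return "\n".join([BEGIN.format(digest=digest(body)), body, END])

def is_untouched(block: str) -> bool:
    """True if the body between the markers still matches the recorded hash."""
    lines = block.splitlines()
    header, body = lines[0], "\n".join(lines[1:-1])
    recorded = header.split("sha256=")[1].split(")")[0]
    return recorded == digest(body)
```

An installer following this pattern would call `is_untouched` before rewriting a block and, on a mismatch, abort with a diff unless the operator explicitly forces the overwrite.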
v3.14.3 — `/vdd-develop-all`: VDD chain workflow with Sarcasmotron review
Full Changelog: v3.14.2...v3.14.3
New workflow composing the chain-iteration of /develop-all with the per-task adversarial Sarcasmotron loop of /vdd-develop. Walks the full docs/PLAN.md, applies hostile review to each task, gates progression on explicit user input between tasks, and never auto-commits. Built end-to-end through the framework's own VDD pipeline (Analyst → Architect → Planner → Developer → Sarcasmotron); the build itself surfaced 1 honest REJECTED iteration that fixed a real control-flow bug before merge. No architectural changes — pure composition of existing Layer A / Stage Cycle patterns.
Added
- `.agent/workflows/vdd-05-run-full-task.md` (53 lines) — new chain workflow with 5 numbered steps:
  - Plan parsing with `--dry-run` flag (preview chain without executing).
  - Per-task VDD cycle A→D: Builder → Verification → Sarcasmotron-roast → Refinement loop. Sarcasmotron persona is delegated to `vdd-03-develop.md` Step 3 (DRY — not inlined).
  - HITL gate between tasks (`yes / pause / abort`) with optional `--auto-continue=<seconds>` flag for unattended runs.
  - Session-state persistence on both APPROVED-merge and 3-REJECTED-STOP paths, with explicit ordering `merge → persist → HITL` (load-bearing — persist BEFORE the HITL prompt to survive runner crashes during user wait).
  - Finalization with full regression suite (`python3 tests/run_tests.py` + `validate_skill.py`) and metrics report (merged tasks, REJECTED iterations, Hallucination-Convergence vs honest APPROVED counts). Auto-commit forbidden; commit/PR decision belongs to the user.
- `.claude/commands/vdd-develop-all.md` — slash-command registration, byte-identical structure to `develop-all.md` / `vdd-develop.md` modulo the workflow path.
- Resumability section + behavioral smoke test: re-invoke `/vdd-develop-all` after `pause` reads `.agent/sessions/latest.yaml` and resumes from the first non-merged task in `docs/PLAN.md`.
- Refinement loop limit: 3 REJECTED iterations per task before STOP + escalation (chosen over `/03-develop-single-task`'s 2 because Sarcasmotron is stricter — 2 escalates noisily, 4+ wastes tokens on stuck tasks).
Changed
- CLAUDE.md `## WORKSPACE WORKFLOWS`: `+1 −1` — `/vdd-develop-all` inserted into Available Commands list next to `/vdd-develop`.
- GEMINI.md: `+1 −1` — `vdd-05-run-full-task` added to Available Workflows enumeration.
- AGENTS.md §Development Phase: `+1` — chain-execution pointer comparing `/develop-all` (auto-commit) vs `/vdd-develop-all` (adversarial, HITL, no auto-commit).
- System/Docs/WORKFLOWS.md: `+22 −5` across 4 surgical edits:
  - Mermaid diagram: `VDDRunAll{{vdd-05-run-full-task}}` node added to Automation Loops, with edge labels distinguishing auto-commit (`RunAll`) from Sarcasmotron+HITL (`VDDRunAll`).
  - Automation Loops table: new row for `vdd-05-run-full-task`; clarified existing `05-run-full-task` row to mention auto-commit; clarified `vdd-03-develop` as "single task".
  - FAQ: new entry "When should I use `vdd-05-run-full-task` instead of `05-run-full-task`?" listing the 3 load-bearing differences (adversarial review, mandatory HITL, no auto-commit).
  - VDD Multi-Step example: Step 3 expanded into 3a (single-task) and 3b (chain) variants.
- .agent/workflows/vdd-03-develop.md: `+2` — trailing cross-link note pointing at `/vdd-develop-all` for chain execution.
Fixed (caught during the build itself)
- Hallucinated test path (caught in Verification, before Sarcasmotron): user's original brief and intermediate spec drafts referenced `bash tests/test_e2e.sh` — a file that does not exist in this repo. The actual test harness is `python3 tests/run_tests.py`. Workflow file now uses the real path. Spec drift remains in `docs/TASK.md` and `docs/tasks/task-061-02-workflow-impl.md`; left as-is per "specs are write-once snapshots" — the implementation is canonical.
- Step 3 ↔ Step 4 control-flow ambiguity (caught by Sarcasmotron, iteration 1 of 061-02): Step 2D originally said "APPROVED → merge → Step 3 (HITL)", but Step 4 (session-state persist) sat below Step 3 numerically and claimed "after every merge", leaving the persist-vs-HITL execution order undefined. Could lose merge state if a runner crashes during user wait. Fix: Step 2D now explicitly states `merge → Step 4 (persist) → Step 3 (HITL gate) → next task`, with "Order is load-bearing" callout.
- Missing failure-path session-state persist (caught by Sarcasmotron, iteration 1 of 061-02): the 3-REJECTED-STOP path did not persist failure state, so resumption after escalation would silently retry from scratch. Fix: Step 2D now invokes Step 4 with `--status "failed_sarcasmotron" --add_blocker "Task <name>: 3 REJECTED iterations"` on the STOP path; Step 4 header annotates "called from Step 2D — both APPROVED and 3-REJECTED-STOP paths".
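The load-bearing order described in these fixes (merge, then persist, then the HITL prompt, plus a persist on the 3-REJECTED stop path) can be sketched as a small loop. This is a hypothetical Python rendering of a workflow that is actually written as a markdown prompt: `run_chain`, its callback parameters, and the `"merged"` status string are assumptions, while `"failed_sarcasmotron"`, the limit of 3, and the `yes / pause / abort` answers come from the notes.

```python
from typing import Callable, Sequence

MAX_REJECTED = 3  # refinement-loop limit quoted in the notes

def run_chain(
    tasks: Sequence[str],
    review: Callable[[str], str],          # returns "APPROVED" or "REJECTED"
    persist: Callable[[str, str], None],   # records (task, status) in session state
    ask_user: Callable[[str], str],        # returns "yes", "pause" or "abort"
) -> str:
    for index, task in enumerate(tasks):
        rejected = 0
        while review(task) != "APPROVED":
            rejected += 1
            if rejected == MAX_REJECTED:
                # Failure path also persists, so a resume does not retry from scratch.
                persist(task, "failed_sarcasmotron")
                return "stopped"
        # Persist BEFORE prompting, so a crash during the user wait keeps the merge state.
        persist(task, "merged")
        if index < len(tasks) - 1:  # the gate sits between tasks, not after the last one
            answer = ask_user(task)
            if answer != "yes":
                return answer
    return "done"
```

Swapping the `persist` and `ask_user` calls would reproduce the original ambiguity: a runner that dies while waiting for input would lose the record of the merge.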
Verification
- All 7 RTM checks for Task 061-01 (stubs) GREEN on iteration 1.
- All 7 RTM checks for Task 061-02 (logic) GREEN after iteration 2 (1 Verification fix + 2 Sarcasmotron fixes).
- All 4 RTM checks for Task 061-03 (cross-links) GREEN on iteration 1 (`+1 −1` to CLAUDE.md, `+2 −0` to `vdd-03-develop.md`, no other workflow regressions).
- Final regression suite: `python3 tests/run_tests.py` → 5/5 passed.
- Workflow file 53 lines (≤150 line budget).
Process metrics (chain build)
| Metric | Value |
|---|---|
| Tasks merged | 3/3 |
| Total REJECTED iterations across chain | 1 |
| Hallucination-Convergence APPROVED | 3 |
| Honest APPROVED (no nitpick-inversion) | 0 |
| Verification-phase finds (caught before Sarcasmotron) | 1 |
The single honest REJECTED iteration was a genuine save: Sarcasmotron caught a control-flow bug that would have made session-state semantics ambiguous to a future LLM consumer. The Verification phase separately caught a hallucinated test path inherited from the brief — exactly the filter Step B is designed to provide.
Impact
- New chain primitive for high-rigor multi-task batches: per-task adversarial scrutiny + mandatory HITL + zero accidental commits. Pairs with the existing `/develop-all` (fast path with auto-commit) — pick by required rigor, not by default.
- Resumability via `latest.yaml` makes pause/resume a first-class operation: long batches can run across context-window resets without losing merge state.
- Demonstrates the framework can build its own next-tier workflow under its own VDD pipeline, including catching real bugs via the adversarial loop. The Sarcasmotron persona's "Hallucination Convergence" exit rule worked as intended: round 1 produced 2 honest findings, round 2 reduced to bikeshedding.
v3.14.2 — security-audit skill v3.2 → v3.3 (bug fixes + coverage + hardening)
Full Changelog: v3.9.17...v3.14.2
Post-analysis critique of the security-audit skill surfaced 2 real bugs, 4 coverage gaps, and 6 refinement opportunities. All 14 items addressed. Scanner integrity is visibly improved; no breaking changes to CLI surface (new --max-size is additive).
Fixed (HIGH — correctness bugs)
- Pip lock-file detection: scanners.py previously treated `requirements.txt` as a lock file (it is not — it does not pin a transitive graph with hashes) AND had no `is_type` branch for Python at all, so `Missing Lock File` never fired for pip projects. Rewrote `scan_dependencies` to use ecosystem groups with explicit `markers` (presence of `pyproject.toml` / `setup.py` / `setup.cfg` / `requirements.txt` / `Pipfile`) and `locks` (real lock files only: `Pipfile.lock`, `poetry.lock`, `uv.lock`, `pdm.lock`). Also fixed JS over-flagging (previously `yarn` + `pnpm` both fired when `package-lock.json` was present — now any of the three locks satisfies the JS ecosystem).
- Report-path mismatch: `.agent/workflows/security-audit.md` wrote `docs/SECURITY_AUDIT.md`; the `security-auditor` agent (and `System/Agents/10_security_auditor.md`) writes `docs/audit/security-{ID}.md`. Aligned workflow → agent convention (supports multiple audits + integrates with `skill-archive-task` ID convention).
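The marker/lock grouping can be sketched as a table-driven check. This is an illustration rather than the real `scan_dependencies`: the `ECOSYSTEMS` structure and `missing_lock_findings` are hypothetical names, the JS marker `package.json` and the lock names `yarn.lock` / `pnpm-lock.yaml` are assumptions (the notes name only `package-lock.json`, yarn, and pnpm), and the hash-pinned `requirements.txt` exception mentioned under Impact is left out for brevity.

```python
import tempfile
from pathlib import Path

# Hypothetical table: markers show an ecosystem is in use, locks show it is pinned.
ECOSYSTEMS = {
    "pip": {
        "markers": ["pyproject.toml", "setup.py", "setup.cfg", "requirements.txt", "Pipfile"],
        "locks": ["Pipfile.lock", "poetry.lock", "uv.lock", "pdm.lock"],
    },
    "js": {
        "markers": ["package.json"],  # assumed marker
        "locks": ["package-lock.json", "yarn.lock", "pnpm-lock.yaml"],  # assumed lock names
    },
}

def missing_lock_findings(root: Path) -> list[str]:
    """Report each ecosystem that has a marker file but none of its lock files."""
    findings = []
    for name, group in ECOSYSTEMS.items():
        in_use = any((root / marker).is_file() for marker in group["markers"])
        locked = any((root / lock).is_file() for lock in group["locks"])
        if in_use and not locked:
            findings.append(f"Missing Lock File ({name})")
    return findings
```

Because any one lock satisfies its ecosystem, a project with only `package-lock.json` no longer triggers separate yarn and pnpm findings.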
Fixed (MED — coverage + refinement)
- SBOM scan was non-recursive (scanners.py `scan_sbom`): `glob("*sbom*")` only looked in the project root. SBOMs are commonly placed in `build/`, `dist/`, `artifacts/`, `docs/`. Switched to `rglob` with `SKIP_DIRS` filter and duplicate dedup. Nested `docs/sbom.json` now detected.
- Dead SBOM-probe block removed: previous code ran `syft --version` / `cdxgen --version` and printed "tool is available for SBOM generation" without actually generating anything. Either misleading or unfinished — removed entirely; generation instructions remain in the `Missing SBOM` finding message.
- `MAX_FILE_SIZE` 5 MB → 15 MB + CLI `--max-size MB`: 5 MB was silently skipping most modern minified production bundles (`vendor.js` / `bundle.js` routinely exceed 10 MB after Webpack `DefinePlugin`). Bumped default to 15 MB; added `--max-size` flag with runtime override (scanners now read `config.MAX_FILE_SIZE` via module-reference, not import-time copy).
- Solidity `public` / `external` false positives on view/pure: pattern flagged every non-modifier `public` function, noisily firing on view/pure getters. Tightened regex with negative lookahead `(?!.*\b(?:view|pure|constant)\b)` — getters no longer flagged; state-mutators still flagged.
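The effect of the negative lookahead is easiest to see on two sample declarations. Only the lookahead itself is quoted from the notes; the surrounding pattern below is a simplified stand-in written for this illustration, not the actual entry in `patterns.py`.

```python
import re

# Hypothetical surrounding pattern; only the (?!...) lookahead is taken from the notes.
PUBLIC_MUTATOR = re.compile(
    r"function\s+\w+\s*\([^)]*\)\s*"
    r"(?!.*\b(?:view|pure|constant)\b)"   # skip declarations marked view/pure/constant
    r".*\b(?:public|external)\b"
)

def is_flagged(line: str) -> bool:
    """True when the line declares a public/external function that may mutate state."""
    return PUBLIC_MUTATOR.search(line) is not None
```

With this pattern, `function withdraw(uint256 amount) public {` is flagged, while `function balanceOf(address a) public view returns (uint256) {` is skipped because the lookahead finds `view` later on the same line.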
Added (coverage expansion)
- +16 regex patterns across 3 language stacks (`patterns.py`):
  - Rust (6): `unsafe {}` blocks, `unsafe fn`, `std::mem::transmute`, `std::mem::forget`, `.unwrap_unchecked`, `from_raw_parts`, `rand::random` (weak RNG for security).
  - Go (6): `"math/rand"` import + call-site, SQL concat/`Sprintf` in `db.Query` / `Exec`, `http.ListenAndServe` (missing TLS), `filepath.Join` with request data, `exec.Command` with formatted/concat string.
  - GraphQL (4): `introspection: true`, `GRAPHQL_PLAYGROUND=true`, `graphiql: true`, `ApolloServer({...})` config (verify depth/complexity limits for DoS).
- External tools — cross-cutting additions (external.py):
  - `semgrep --config auto` (de-facto SAST standard since 2024) now runs for any project type.
  - `gitleaks detect` (primary) with `trufflehog filesystem` fallback — stronger secret detection than regex-only.
  - Missing tools remain non-fatal (per `run_command` contract).
- ReDoS guard: added `MAX_LINE_LENGTH = 4000` to config.py; `scan_code_patterns` now skips pathologically long lines (minified JS routinely has >100k-char single lines, triggering catastrophic backtracking on complex regex). Real source code lines almost never exceed 4k chars.
- `fuzzing_invariants.md` expanded 42 → 170+ lines: 8 invariant categories (accounting, access control, monotonicity, pausability, ERC-20, ERC-4626, oracle, reentrancy); Foundry / Echidna / Medusa / Halmos setup; mandatory handler-based fuzzing pattern with ghost state; depth requirement table by criticality; 10-item edge-case checklist; post-fuzz regression discipline.
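The ReDoS guard amounts to a length check before any regex runs. The sketch below shows the idea under stated assumptions: `scan_lines` is a hypothetical stand-in for `scan_code_patterns`, whose real signature is not given in these notes; only the `MAX_LINE_LENGTH = 4000` value is quoted.

```python
import re

MAX_LINE_LENGTH = 4000  # value quoted in the notes

def scan_lines(text: str, pattern: re.Pattern) -> list[int]:
    """Return 1-based line numbers that match, skipping pathologically long lines."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            continue  # likely minified output; never hand it to the regex engine
        if pattern.search(line):
            hits.append(number)
    return hits
```

The trade-off is explicit: a secret sitting on an over-long minified line is not seen by the in-process patterns, which is one reason the external tools remain the primary layer for depth.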
Documentation
- SKILL.md §2 now documents `--max-size` flag, Rust/Go/GraphQL coverage, semgrep/gitleaks cross-cutting tools, ReDoS guard, and clarifies that `--scan-type external` runs ONLY external tools (SKIPS regex scans), which was previously ambiguous.
- Version bumped to v3.3 in SKILL.md frontmatter, header, and `run_audit.py` module docstring + CLI description.
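The `--max-size` runtime override only works if scanners look the limit up on the config module at call time. The sketch below illustrates that difference under stated assumptions: the stand-in `config` module, both helper functions, and the 1 MB = 1024 × 1024 bytes conversion are hypothetical; only the flag name, `MAX_FILE_SIZE`, and the 15 MB default come from the notes.

```python
import argparse
import types

# Stand-in for audit/config.py so the example is self-contained.
config = types.ModuleType("config")
config.MAX_FILE_SIZE = 15 * 1024 * 1024  # assumption: MB means 1024 * 1024 bytes

# Import-time copy: what `from config import MAX_FILE_SIZE` would give a scanner.
MAX_FILE_SIZE_COPY = config.MAX_FILE_SIZE


def skipped_with_copy(size_bytes: int) -> bool:
    """Compares against the value frozen at import time."""
    return size_bytes > MAX_FILE_SIZE_COPY


def skipped_with_reference(size_bytes: int) -> bool:
    """Looks the limit up on the module each time, so overrides are seen."""
    return size_bytes > config.MAX_FILE_SIZE


# Hypothetical CLI wiring for the --max-size flag.
parser = argparse.ArgumentParser()
parser.add_argument("--max-size", type=int, default=15, metavar="MB")
args = parser.parse_args(["--max-size", "50"])
config.MAX_FILE_SIZE = args.max_size * 1024 * 1024

twenty_mb = 20 * 1024 * 1024
```

After the override, a 20 MB file is still skipped by the import-time copy but accepted by the module-reference lookup.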
Verification (smoke-tested)
- Self-exclusion holds on own skill dir (0 findings).
- `pyproject.toml` alone → `Missing Lock File` (pip); previously silent.
- `requirements.txt` alone → `Missing Lock File` (pip); previously silent.
- Nested `docs/sbom.json` detected via rglob; previously reported missing.
- Rust test file (`unsafe {}`, `std::mem::transmute`, `rand::random::<u32>()`) → all 3 patterns fire.
- Solidity test: `view` getter skipped, `public` state-mutator flagged.
- Config/deps/IaC/SBOM scans on repo root all pass without regression.
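The nested-SBOM check above comes down to a recursive rather than top-level glob. A minimal sketch, assuming `pathlib`: `SBOM_NAMES` and `find_sboms` are hypothetical names, and only `docs/sbom.json` is taken from the notes.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical filename list; only sbom.json is mentioned in the release notes.
SBOM_NAMES = ("sbom.json",)


def find_sboms(root: Path) -> List[Path]:
    """Recursively collect SBOM files anywhere under root."""
    return sorted(
        path for name in SBOM_NAMES for path in root.rglob(name) if path.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "sbom.json").write_text("{}")
    # A top-level glob misses the nested file; rglob finds it.
    top_level_hits = [p.name for p in root.glob("sbom.json")]
    recursive_hits = [p.relative_to(root).as_posix() for p in find_sboms(root)]
```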
Impact
- Python projects without real lock files (Pipfile.lock/poetry.lock/uv.lock/pdm.lock) now receive supply-chain warnings; previously a false negative. Hash-pinned `requirements.txt` (pip-compile output with `--hash=sha256:` lines) is accepted as a lock (avoids a false positive on the mainstream pip-tools pattern, added in Round 4).
- Rust, Go, and GraphQL codebases receive initial in-process regex coverage. For depth, `gosec`/`govulncheck`/`semgrep`/`cargo-audit`/`clippy` remain primary (invoked via `--scan-type external`); the in-process patterns are fast signalling, not a replacement.
- Minified bundles up to 15 MB are now scanned for accidentally-committed secrets (previously 5 MB cutoff).
- Adversarial convergence signal: `issues-found` at R3 (3 actionable bugs fixed in P1–P2) and again at R4 (10 defects: 4 broken patterns + pip-compile false-positive regression + SBOM perf regression + test gap; all fixed before release tag).
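One way to tell a hash-pinned `requirements.txt` from a plain one is sketched below. The notes do not state the rule the scanner applies, so the rule here (every requirement entry must carry a `--hash=sha256:` option) is an assumption, and `logical_lines` and `is_hash_pinned` are hypothetical helpers.

```python
from typing import Iterator


def logical_lines(text: str) -> Iterator[str]:
    """Join backslash-continued lines and drop comment-only lines."""
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        if line.endswith("\\"):
            buffer += line[:-1] + " "
            continue
        buffer += line
        if buffer.strip():
            yield buffer.strip()
        buffer = ""
    if buffer.strip():
        yield buffer.strip()


def is_hash_pinned(text: str) -> bool:
    """Assumed rule: at least one requirement, and every one has a sha256 hash."""
    # Lines starting with "-" are global options (e.g. --index-url), not requirements.
    requirements = [l for l in logical_lines(text) if not l.startswith("-")]
    return bool(requirements) and all("--hash=sha256:" in l for l in requirements)


# Shaped like pip-compile --generate-hashes output; the digest is a placeholder.
PINNED = (
    "requests==2.31.0 \\\n"
    "    --hash=sha256:" + "a" * 64 + "\n"
    "    # via -r requirements.in\n"
)
PLAIN = "requests==2.31.0\nflask>=2.0\n"
```

A file that mixes pinned and unpinned entries is treated as not locked under this rule, which errs on the side of warning.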
v3.14.1 — VDD adversarial-review fixes on v3.14.0
Post-release adversarial critique of v3.14.0 surfaced 8 findings (2 HIGH, 4 MED, 2 LOW). All are addressed in this patch. No behavior change for Claude Code users; all fixes are rigor/documentation improvements that close silent-fail modes.
Fixed (HIGH — closes silent-fail modes)
- `SKILL.md §1` load-semantics ambiguity: previous wording "load the matching reference" did not specify WHO reads WHEN. A junior agent could consult the selection table, memorize the choice, and never actually `Read` the reference file, proceeding to §2 with only abstract concepts and no invocation syntax. Now explicitly: "Use the `Read` tool to load the matching reference file now, before applying §2–§6."
- `sequential-fallback.md` untested claim: the file was marked "Complete and vendor-agnostic" but had never been exercised on a non-Claude runtime. Downgraded to "proposed pattern, not yet validated" with an explicit caveat that all claims about wall-clock overhead, context-bleed, and persona-swap effectiveness are theoretical. Parent `SKILL.md §7` gained the same caveat. Invites first-validator PR after real run.
Fixed (MED — closes fail-soft-where-loud-was-safer)
- Stubs now emit a visible DEGRADED-MODE banner (`gemini-cli.md`, `cursor.md`, `antigravity.md`): previously a non-Claude agent landing on a stub silently fell through to sequential fallback, so the user thinks they have parallel execution while actually running at ~3× latency. The banner makes the degradation loud at the top of each stub.
- `SKILL.md §1.1` detection now specifies cwd-walkup: previous rule assumed the agent ran from project root. Now detects via find-up (walk from cwd toward filesystem root, stop at `.git`), and emits a warning rather than silently falling back when no marker is found.
- `SKILL.md §1.2` tie-break clarified: when multiple runtime markers match, replace the untestable "prefer runtime currently executing" with concrete signals: tool-list fingerprint (`Agent` + `TeamCreate` + `SendMessage` → Claude Code), explicit `runtime:` caller hint, and explicit warning on still-ambiguous instead of silent guess.
- v3.14.0 CHANGELOG overclaim softened: "No behavior change for Claude Code users" → "Content preserved; section numbering reorganized — see §9 History and `references/claude-code.md` for the mapping." Calls out the `§5.1 → §5` anchor shift for anyone citing the old structure in external notes.
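The cwd-walkup rule from the §1.1 fix can be sketched as a find-up loop. The marker filenames, runtime labels, and `detect_runtime` function are hypothetical (the notes do not list the actual markers), and the §1.2 tie-break is not modelled: the first marker found wins.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Optional

# Hypothetical marker -> runtime mapping, for illustration only.
RUNTIME_MARKERS = {
    "CLAUDE.md": "claude-code",
    "GEMINI.md": "gemini-cli",
    "AGENTS.md": "cursor",
}


def detect_runtime(start: Path) -> Optional[str]:
    """Walk from start toward the filesystem root, stopping at the .git boundary."""
    start = start.resolve()
    for directory in (start, *start.parents):
        for marker, runtime in RUNTIME_MARKERS.items():
            if (directory / marker).exists():
                return runtime
        if (directory / ".git").exists():
            break  # project root reached without finding a marker
    # Loud failure instead of a silent fallback.
    warnings.warn("no runtime marker found; falling back to sequential mode")
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".git").mkdir()
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        before = detect_runtime(nested)  # no marker yet: warns, returns None
    (root / "GEMINI.md").write_text("")
    after = detect_runtime(nested)  # marker two levels up is found
```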
Fixed (LOW — dedup + translation hint)
- Stubs deduplicated via `references/_stub-template.md`: previously 3 near-identical files (~45 lines each, ~80% shared). The shared checklist + contribution guidance now lives in `_stub-template.md`; vendor-specific stubs slimmed to ~22 lines each, carrying only the vendor-specific marker + warning banner + pointer to template. Maintenance: update one template file instead of three.
- `sequential-fallback.md` adds orchestration-style note: previously assumed chat-based orchestration (messages as persona swaps). Added explicit SDK/API translation note: for non-chat runtimes, the pattern is one `system` prompt per teammate with `messages` list reset betwe...
v3.9.17 — Developer Discipline: Karpathy Guidelines Integration
Full Changelog: v3.9.16...v3.9.17
Added
- §1.5 Think Before Implementing (`developer-guidelines`): Graduated ambiguity handling protocol. Critical ambiguity goes to TASK.md Open Questions, implementation-level decisions are made by the developer with brief documentation, trivial decisions are made silently.
- §1.6 Implementation Discipline (`developer-guidelines`): Two-level decision framework. Architectural decisions (new modules, public API, data models) must come from PLAN.md/ARCHITECTURE.md; implementation details (internal patterns, helpers, abstractions) are the developer's professional judgment. Speculative complexity is prohibited.
- §6.2 Multi-Step Tasks (`developer-guidelines`): Generalized Verification Protocol with `Step → verify: [check]` pattern, extending the Bug Fixing Protocol to all multi-step work.
- Before/after code examples (`developer-guidelines/examples/coding-anti-patterns.md`): 3 real-world examples (drive-by refactoring, speculative features vs. plan-driven implementation, silent interpretation vs. surfacing ambiguity). Adapted from Karpathy Guidelines for complex product development context.
Improved
- Red Flags (`developer-guidelines` §0): +2 entries, against silent architectural changes and speculative features.
- Strict Adherence (`developer-guidelines` §1): +2 entries, Task Traceability (every change must serve the task, professional choices within scope are OK) and Style Matching (match existing code style).
- Rationalization Table (`developer-guidelines` §9): +3 entries covering speculative additions, silent plan deviation, and drive-by improvements.
- Atomicity & Traceability (`core-principles` §1): Added Verification Checkpoints for multi-step tasks.
- Minimizing Hallucinations (`core-principles` §3): Added Ambiguity Protocol with cross-reference to developer-guidelines §1.5.
- Token budget (`skill-phase-context`): Updated Development phase estimate from ~768 to ~1,100 to reflect expanded developer-guidelines.
Design Decisions
- "Implementation Discipline" instead of "Simplicity First": Karpathy's "minimum code" principle was adapted for complex product development — architectural complexity is valid when plan-driven; only speculative complexity is prohibited.
- Graduated Ambiguity instead of "ask everything": Three-tier protocol prevents bombarding users with questions while ensuring critical decisions are surfaced.
- No new standalone skill created: All changes integrated into existing `developer-guidelines` (Tier 1) and `core-principles` (Tier 0) to avoid skill bloat and tier conflicts.
v3.9.16 — Security Audit v3.2: Smart Contract Patterns & Modular Architecture
Full Changelog: v3.9.15...v3.9.16
Added
- Solidity/Smart Contract patterns (16 new): Reentrancy (`.call{value:}`, `.send()`, `.transfer()`), arbitrary execution (`delegatecall`, `selfdestruct` EIP-6780, `suicide()`), access control (`tx.origin`, public/external without modifier), oracle manipulation (`getReserves()`, `latestRoundData()`), unchecked return values, unprotected initializers, integer overflow (pre-0.8.0), locked ether, inline assembly.
- VDD Round 3 critique document with real hack coverage matrix (Dec 2025 – Mar 2026).
- Real-world hack validation: Scanner tested against contracts simulating SwapNet ($13.4M), Truebit ($26.4M), YieldBlox ($10.2M), Aperture ($4M) attack vectors — 7/10 vectors fully detected.
Improved
- Modular scanner architecture: Refactored 886-line monolith `run_audit.py` into 7-file package (`audit/config.py`, `audit/patterns.py`, `audit/helpers.py`, `audit/scanners.py`, `audit/external.py`, `audit/__init__.py`).
- MAX_FILE_SIZE consistency: Added 5 MB file size guard to `scan_configuration()` and `scan_iac()`.
- Pattern count: 105 → 121 total patterns (28 secret + 62 dangerous + 25 IaC + 6 config).
Fixed
- VDD Round 2 (8 issues): `os.popen()` CWE misclassification, missing `subprocess.run` `shell=True`, Flask open redirect regex, SQL `%` formatting detection, IaC false positives on non-IaC YAML, symlink following, SSRF pattern expansion.
v3.9.15 — Claude Code Integration
Added
- Claude Code entry point: Created `CLAUDE.md` (136 lines) adapted from `GEMINI.md` with native Claude Code tool references (Read, Write, Edit, Bash, Grep, Glob), session state bootstrap, and explicit tier-based skill loading protocol.
- Claude Code hooks: Added `.claude/settings.json` with `PostToolUse` hook and `.claude/hooks/validate_skill_hook.sh` for automatic skill validation on file modification.
- Claude Code commands: Created 20 slash command files in `.claude/commands/` covering all 21 workflows (delegator pattern, with the single source of truth in `.agent/workflows/`).
  - Core: `/start-feature`, `/plan`, `/develop`, `/develop-all`, `/light`
  - VDD: `/vdd`, `/vdd-start-feature`, `/vdd-plan`, `/vdd-develop`, `/vdd-adversarial`, `/vdd-multi`
  - Pipelines: `/full`, `/security-audit`, `/base-stub-first`, `/framework-upgrade`, `/iterative-design`
  - Product: `/product-full-discovery`, `/product-market-only`, `/product-quick-vision`
  - Docs: `/update-docs`
- Migration specification: Added `docs/migration-to-claude.md` with full platform comparison, tool mapping, hook adaptation guide, and validation checklist.
Improved
- AGENTS.md: Added missing "Session State Persistence" instruction (`update_state.py` on phase boundaries), achieving parity with `GEMINI.md`.
- SESSION_CONTEXT_GUIDE.md: Added Section 5 "Platform Memory Integration" documenting how framework session state complements platform-specific memory systems (Claude Code, Cursor, Gemini).
- README.md / README.ru.md: Updated "Option C: Claude Code" section, replacing manual setup instructions with ready-to-use configuration and full command list.
Full Changelog: v3.9.14...v3.9.15