Releases: MatrixFounder/Agentic-development

v3.20.12 - Various framework refinements

10 Jun 19:51

Full Changelog: v3.19.0...v3.20.12

v3.20.12 — Corrected description of the framework's helper tools (additional + fallback)

The framework ships a small set of helper tools — generate a unique archive filename, run tests, check/stage/commit with git, and read/write/list files. A previous release mistakenly described this whole subsystem as obsolete and unused. This release fixes the description only: behaviour, code, and files are unchanged, and nothing was deleted or moved.

Changed

  • The helper tools now fall into two clear groups. The archive-filename generator has no equivalent among a coding assistant's own tools, so it is always used — run it with python3 .agent/tools/task_id_tool.py <name>. The rest (run tests, git, file read/write/list) duplicate what an assistant already does on its own, so they act as a fallback for assistants that lack those built-ins. The code that runs an individual tool is implemented and tested; the piece that would let an assistant drive these tools by itself is documented honestly as not yet built.
  • The tool list was restored in the assistant setup files, AGENTS.md (Cursor, Codex) and GEMINI.md (Gemini CLI, Antigravity), where the previous release had removed it. Each entry was written to match that assistant's own official documentation for running commands and adding extra tools. The Claude Code setup file was already correct and left unchanged.
  • Supporting documents were brought in line: ORCHESTRATOR.md, SOURCE_OF_TRUTH.md, SKILLS.md, and RELEASE_CHECKLIST.md now present the subsystem as additional/fallback tools, and a leftover reference to a tool that never actually existed was corrected. Nothing was archived; all automated checks pass.

v3.20.11 — Vendor-Currency: Tool-Layer Reword + GEMINI.md Symlink Re-sync

Follow-up to the System/Agents cross-vendor audit (items 1, 2, 4; item 3 — version-header re-stamp — intentionally skipped). Framing-only, zero pipeline-behavior change. Task 082, gate artifact docs/reviews/framework-audit-082.md. Scope: "reword prompts only" — schemas.py / tool_runner.py / ORCHESTRATOR.md left in place. Gates 43/43, pytest 30/30.

Changed

  • Dead tool-dispatch framing retired in the two core role-prompts and two bootstrap files. 00_agent_development.md, 01_orchestrator.md, AGENTS.md, GEMINI.md no longer instruct the orchestrator to call a standalone-Python run_tests / git_status / git_ops / file_ops / execute_tool dispatcher (imported only by its own tests; used by no current vendor harness). They now say: use your harness's built-in file/shell/search tools, and run repo Python helpers via the shell. CLAUDE.md was already correct and is unchanged (it is the donor template). System/Docs/ORCHESTRATOR.md is now referenced as legacy.
  • task_boundary fiction removed. 00_agent_development.md (General Concept, §2, anti-patterns) and GEMINI.md (workflow dispatch) no longer describe a task_boundary tool/protocol for state tracking; state is persisted via skill-session-state (update_state.py) at phase boundaries.

Fixed

  • GEMINI.md symlink-resolution gap. Gemini CLI's bootstrap file was a stale fork (~v3.15) missing the SYMLINK RESOLUTION + SYMLINK-AWARE COMMAND DEFAULTS protocol that AGENTS.md received in v3.19.1. Ported both sections — closes a silent skill-load failure when the framework is deployed via symlinks (Gemini's find/rg skip symlinks by default).

v3.20.10 — Item 6 In-Repo Complete: Antigravity Adapter + Vendor Dispatch (6d) + Wave-5 Generator (6e)

Finishes the in-repo half of roadmap item 6 (C-07). After this, item 6's only remaining piece is operator e2e validation on real CLIs. Task 081, gate artifact docs/reviews/framework-audit-081.md. Doc/script-only, gates 43/43, pytest 30/30; the only severity escalation in the merge logic remains R3b.

Added

  • Google Antigravity adapter (4th vendor) — references/antigravity.md stub→full, verified via web (primary docs render client-side; corroborated from antigravity.google/docs/agent-manager + Google-Cloud/Medium + DataCamp + gemini-cli discussion #27305). Dynamic-first architecture documented (orchestrator spawns subagents on the fly, no config files) alongside the static custom-agent form (agent.json at ~/.gemini/antigravity-cli/agents/<name>/). Async parallel ✅. Detection ambiguity recorded honestly — Antigravity shares AGENTS.md (Codex) and ~/.gemini/ (Gemini); a provisional .antigravity/ marker is used pending validation.
  • Wave-5 wrapper generator (6e): scripts/generate_wrappers.py + scripts/wrappers_manifest.json turn one manifest into 12 critic wrappers across 4 scaffold vendors (Gemini MD+YAML, Codex TOML, Cursor MD+YAML, Antigravity JSON) in their native formats, all pointing at the same SOT skills + enum. Claude Code excluded (validated reference/donor, hand-maintained). --check mode exits non-zero on drift (CI-gateable). Hand-sync of scaffold wrappers is eliminated.
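
The --check behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/generate_wrappers.py: the function name check_drift and the dict-based inputs are hypothetical, and it assumes the generator can render the expected wrapper text in memory before comparing it with what is on disk.

```python
from typing import Dict


def check_drift(expected: Dict[str, str], on_disk: Dict[str, str]) -> int:
    """Return a process exit code: 0 when every wrapper matches, 1 on drift.

    `expected` maps wrapper path -> text rendered from the manifest.
    `on_disk` maps wrapper path -> text currently in the repository.
    Both shapes are assumptions made for this sketch.
    """
    drifted = sorted(
        path for path, content in expected.items()
        if on_disk.get(path) != content  # missing files count as drift
    )
    for path in drifted:
        print(f"drift: {path}")
    return 1 if drifted else 0
```

A CI job could pass the returned value to sys.exit so that a hand-edited or missing wrapper fails the build.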

Changed

  • 6d — sequential-fallback demoted (C-07 "functionally equivalent" claim removed from vdd-multi.md + SKILL.md §7): vdd-multi's "Fallback (Sequential)" section is now "Vendor dispatch" — resolve the runtime (skill §1) → use its native parallel adapter (Codex/Cursor/Antigravity ✅, Gemini Layer-A pending); sequential role-switching is the documented last resort (primitive-less runtime / single-session debug / 1-slot CI), explicitly not functionally equivalent to parallel. All flags + the evidence contract honored on every path.
  • 6e — drift-grep extended (KNOWN_ISSUES.md) to all 5 wrapper dirs (.claude/.gemini/.codex/.cursor/.antigravity/agents/); documents that scaffold wrappers are generated (edit the manifest, run the generator, never hand-edit).
  • Detection table (SKILL.md §1.1) Antigravity row updated (provisional marker + ambiguity). skill-parallel-orchestration 3.6→3.7.

Still open under item 6

  • Operator e2e validation on real Codex / Cursor / Antigravity / Gemini installs — graduates each ⚠️ SCAFFOLD → ✅. Until then the banners stay and sequential remains the proven path. Roadmap item 6 stays 🔜 (6a–6e in-repo ✅ · validation ⏳).

v3.20.9 — Vendor Adapter Scaffolds: Codex / Gemini / Cursor (roadmap item 6, sub-tasks 6a–6c, in-repo portion)

In-repo scaffolds for parallel-critic dispatch on three non-Claude runtimes, authored from primary-source docs (geminicli.com, developers.openai.com/codex, cursor.com — verified in-session 2026-06-10). Everything ships ⚠️ SCAFFOLD — not yet validated on real runtimes; graduation to ✅ requires one operator-run /vdd-multi --no-fix per CLI (hardware/accounts the operator does not have right now — explicitly deferred). Task 080, gate artifact docs/reviews/framework-audit-080.md. Doc-only diff, gates 43/43, pytest 30/30.

Added

  • Three vendor references (skill-parallel-orchestration/references/): codex-cli.md (NEW), gemini-cli.md + cursor.md (stub→full), each at claude-code.md depth — concept→primitive mapping, Layer A pattern, read-only critic enforcement, wrapper catalog, validation gate.
  • Nine thin critic wrappers at real runtime paths, all pointing at the same SOT skills (vdd-adversarial, skill-adversarial-security, skill-adversarial-performance) with the same clean-pass | issues-found | bikeshedding-only enum: .codex/agents/critic-*.toml (×3, sandbox_mode="read-only"), .gemini/agents/critic-*.md (×3, read-only tools), .cursor/agents/critic-*.md (×3, readonly: true).
  • Detection table (SKILL.md §1.1) gains a Codex row (.codex/agents/); Gemini/Cursor statuses restated as "Scaffold — documented, not validated". First-match-wins keeps Claude Code precedence in this repo.

Verified against primary docs

  • Codex CLI — TOML in .codex/agents/; parallel confirmed ("spawns in parallel, waits for all, consolidates"); sandbox_mode="read-only" maps to the read-only critic guarantee.
  • Cursor 2.4 — Markdown+YAML in .cursor/agents/; parallel confirmed (max 10); readonly: true is purpose-built for reviewer subagents; is_background = async (Layer B, deferred).
  • Gemini CLI — Markdown+YAML in .gemini/agents/; ⚠️ parallel multi-spawn NOT documented (only auto-delegation + @subagent). The scaffold records this gap honestly and corrects the roadmap's earlier optimistic "concurrent subagents" note; the multi-critic flow on Gemini stays sequential-delegation until a real run proves Layer A.

Deferred (still open under item 6)

  • e2e validation on real CLIs (operator action), 6d (sequential-fallback demotion + vdd-multi "Vendor dispatch" rewrite), 6e (drift-grep extension + Wave-5 wrapper generator). skill-parallel-orchestration 3.5→3.6. Roadmap item 6 stays 🔜 — scaffolds authored, not validated.

v3.20.8 — Tier-Diverse Escalation Demoted to Tag-Only (mini-exp 078 refuted the premise)

The R3c tier-diverse +1 escalation shipped in v3.20.7 was an explicit pilot. Mini-experiment 078 (docs/reviews/tier-diverse-experiment-078.md, fresh sealed corpus, 3 arms, 18 bugs) refuted its sole premise: cross-tier critic agreement was less precise than same-tier (overlap precision 0.66 vs 0.73), so escalating severity on it would manufacture false positives. This release translates that finding into the rule — the escalation is demoted to a tier-diverse provenance tag with no +1. The --models config is retained (078 validated it as a recall/coverage tool: D-tier hit the highest recall, 100% pooled). Task 079, gate artifact docs/reviews/framework-audit-079.md. Zero mechanics regression beyond the inten...

v3.19.0 — Multi-Critic Objective Convergence (parallel adversarial pipeline)

28 May 21:56

Full Changelog: v3.17.0...v3.19.0

Follow-up to v3.18.0: the parallel critics (critic-logic / critic-security / critic-performance) still self-certified convergence via a subjective hallucinating state — the same gameable pattern v3.18.0 removed from Sarcasmotron, and worse, /vdd-multi's Phase-3 termination marked a category done on it. Replaced with an objective state, so both the termination gate and the merge noise-filter are objective.

Changed

  • Critic Convergence signal enum: clean-pass | issues-found | hallucinating → clean-pass | issues-found | bikeshedding-only across the three critic agents, with bikeshedding-only defined objectively ("no legitimate findings remain — only style/nits; NOT 'forced to invent problems'").
  • /vdd-multi Phase-3 termination now marks a category ✓ on the objective bikeshedding-only / clean-pass state instead of "critic inventing problems".
  • Merge noise-filter (vdd-multi.md + skill-parallel-orchestration) re-keyed off bikeshedding-only; the "drop a converged critic's low-severity items this iteration" mechanic is unchanged. Satellite references (skill-parallel-orchestration §2.3, examples/usage_example.md, references/sequential-fallback.md) refreshed to the objective terminology.

Unchanged (invariants)

  • All other merge rules — location dedup (±3 lines), cross-category re-attribution, severity escalation on independent overlap, --severity filter, iteration cap — and the Layer A / Layer B decision rule are byte-identical. Skill gate 43/43; VDD adversarial review APPROVED with zero findings.

v3.18.0 — Reviewers Hardening (provable clean review + objective Sarcasmotron exit, cross-vendor)

Three reviewer weaknesses, plus a cross-vendor backup gap, hardened without merging or re-toning the two review roles. The Code Reviewer's clean pass is now provable; its output contract is converged across four drifting definitions; the Sarcasmotron exit is moved off a subjective trigger onto an objective bar across every authoritative definition; and /framework-upgrade now backs up all vendor bootstrap files. has_critical_issues and the orchestrator DECISION TABLE are byte-for-byte unchanged — control-flow is identical before and after.

Added

  • Code Reviewer "Verified" block — when has_critical_issues = false, the report must carry a plain-markdown block proving the scope of the clean pass (requirements cross-checked + edge cases considered), so "looked and clean" is distinguishable from "didn't look". Body text only — never a structured key — so it cannot affect control-flow (09_code_reviewer_prompt.md).
  • Objective Convergence — the Sarcasmotron exit is now bound to an objective bar (full test run executed · 0 CRITICAL · 0 legitimate logic/security/slop findings · only bikeshedding left), replacing the subjective "forced to invent nitpicks → approve" trigger that let a lazy/sycophantic model exit early.

Changed

  • Reviewer output contract converged to one superset { review_status, has_critical_issues, e2e_tests_pass, stubs_replaced } across all four definitions — SOT 09_…, skill-orchestrator-patterns Extended Schema, the .claude/agents/code-reviewer.md wrapper, and 01_orchestrator.md Step 11. comments is reconciled everywhere as the prose report body, not a JSON key. Additive only; has_critical_issues semantics untouched.
  • Objective-Convergence criterion applied identically across all authoritative Sarcasmotron definitions — vdd-03-develop.md, vdd-adversarial/SKILL.md, vdd-sarcastic/SKILL.md, vdd-adversarial/references/vdd-methodology.md, plus the /vdd-adversarial workflow — with hostile tone and "assume broken until proven" stance preserved. Stale "Hallucination Convergence/Exit" terminology refreshed in vdd-05-run-full-task.md, System/Docs/WORKFLOWS.md, VDD.md, and TDD_VS_VDD.md. VDD-loop mechanics (3-REJECT / escalation / HITL) unchanged.
  • /framework-upgrade backup/rollback is now vendor-aware — Step 3.1 and Step 5 iterate over every present bootstrap file (CLAUDE.md, AGENTS.md, GEMINI.md) and skip absent ones, instead of hard-coding GEMINI.md alone.
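
The vendor-aware backup loop can be pictured with a short sketch. It is illustrative only: /framework-upgrade is a workflow, and the function name backup_bootstrap_files and the backup directory layout below are assumptions rather than the framework's actual implementation. Only the three file names come from the notes above.

```python
import shutil
from pathlib import Path
from typing import List

# Bootstrap files named in the release notes.
BOOTSTRAP_FILES = ("CLAUDE.md", "AGENTS.md", "GEMINI.md")


def backup_bootstrap_files(project_root: Path, backup_dir: Path) -> List[str]:
    """Copy every bootstrap file that exists; silently skip absent ones."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backed_up: List[str] = []
    for name in BOOTSTRAP_FILES:
        src = project_root / name
        if not src.is_file():
            continue  # this vendor is not installed in the project
        shutil.copy2(src, backup_dir / name)
        backed_up.append(name)
    return backed_up
```

Iterating over a fixed tuple and skipping missing entries keeps the step identical for every vendor combination, which is the point of replacing the hard-coded GEMINI.md case.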

Fixed

  • Reviewer-contract drift: 09_… previously emitted only {review_status, has_critical_issues} while three consumers expected e2e_tests_pass/stubs_replaced; now consistent.
  • Subjective exit fabrication — approval could be triggered by the auditor inventing nitpicks (an unobservable, gameable signal); approval is now bound to the objective bar in every definition. The Phase-4 adversarial review (eating its own dogfood against the new bar) caught two further normative residuals (VDD.md, TDD_VS_VDD.md, /vdd-adversarial) which were folded in. The /vdd-multi convergence: hallucinating dedup noise-filter is a distinct mechanism, intentionally left untouched.

v3.17.0 — Skill-Validator Inline-Block Rule Reform (two-tier warn/fail)

22 May 21:13

Full Changelog: v3.16.0...v3.17.0

The skill quality gate hard-failed CI on any fenced code block over 12 lines — an arbitrary, line-based threshold with no warning tier and no awareness of block type, stricter than ARCHITECTURE.md §8 itself (which cites "50 lines" as the bad case). v3.16.0's skill-archive-task tripped it. The principle (progressive disclosure — keep SKILL.md lean) is kept; the crude implementation is replaced with a two-tier, fence-type-aware, config-driven check. Validator-only change — no runtime-pipeline tools touched.

Added

  • Two-tier inline-block policy — a fenced block over 20 lines emits a non-blocking warning; over 60 lines a hard error. Thresholds are config-driven (validation.quality_checks.max_inline_lines_warn / _fail).
  • Fence-type awareness: mermaid fences are exempt (diagrams); text/console/output fences can only warn, never fail (output samples). Configurable via inline_exempt_fence_langs / inline_softcheck_fence_langs.
  • Unclosed-fence detection — an unclosed ``` is now reported explicitly instead of being silently swallowed by the parser.
  • tests/test_inline_efficiency.py — 9 regression tests (warn / fail / exempt / softcheck / unclosed branches + a drift guard asserting the validate_skill.py and analyze_gaps.py copies stay behaviourally identical), wired into the Framework Gates CI.
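
The two-tier policy above can be sketched as a small checker. This is a simplified illustration under stated assumptions, not the code in validate_skill.py: the function name check_inline_blocks, the module-level constants, and the choice to report an unclosed fence as an error are all hypothetical. The thresholds (20 / 60) and the fence-language sets are taken from the notes above.

```python
from typing import List, Optional, Tuple

FENCE = "`" * 3
WARN_LINES = 20   # above this: non-blocking warning
FAIL_LINES = 60   # above this: hard error
EXEMPT_LANGS = {"mermaid"}
SOFTCHECK_LANGS = {"text", "console", "output"}


def check_inline_blocks(markdown: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for fenced blocks in a markdown string."""
    errors: List[str] = []
    warnings: List[str] = []
    fence_lang: Optional[str] = None  # None means "outside any fence"
    start = 0                         # line number of the opening fence
    body = 0                          # lines counted inside the fence

    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            if fence_lang is not None:
                body += 1
            continue
        if fence_lang is None:
            # Opening fence: remember its language tag (may be empty).
            fence_lang = stripped[len(FENCE):].strip().lower()
            start, body = lineno, 0
            continue
        # Closing fence: apply the two-tier, fence-type-aware policy.
        if fence_lang not in EXEMPT_LANGS:
            msg = f"line {start}: {body}-line block ({fence_lang or 'plain'})"
            if body > FAIL_LINES and fence_lang not in SOFTCHECK_LANGS:
                errors.append(msg)
            elif body > WARN_LINES:
                warnings.append(msg)
        fence_lang = None

    if fence_lang is not None:
        errors.append(f"line {start}: unclosed fence")
    return errors, warnings
```

Returning the two lists separately lets the caller route warnings to a non-blocking channel while only errors fail CI.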

Changed

  • check_inline_efficiency now returns (errors, warnings) — warnings route to the non-blocking channel instead of failing CI.
  • skill-creator v2.0 → v2.1, skill-enhancer v1.2 → v1.3 — both validator copies reformed in lockstep; skill-archive-task v1.2 → v1.3.
  • Config updated across .agent/rules/skill_standards.yaml and all bundled skill_standards_default.yaml; skill-creator docs, System/Docs/skill-writing.md, and ARCHITECTURE.md §8 updated.

Fixed

  • skill-archive-task CI failure — the v3.16.0 Step-7 protocol block (35 lines) and Example Flow (17 lines) hard-failed the old 12-line rule; both were restructured into smaller labelled blocks, and the rule itself reformed so coherent procedural content is no longer penalised.

v3.16.0 — Deterministic Artifact Archiving (PLAN.md lockstep + ARCHITECTURE.md Index-Mode)

21 May 11:19

Full Changelog: v3.15.0...v3.16.0

Closes a long-standing drift: docs/TASK.md archived reliably, while docs/PLAN.md and docs/ARCHITECTURE.md did not — some projects archived plans to docs/plans/, some dumped them flat into docs/archives/, some never archived; ARCHITECTURE.md grew unbounded (one project reached 2037 lines). Archiving of PLAN.md and ARCHITECTURE.md is now an explicit, deterministic protocol wired into the same skills, prompts, and workflows that already make TASK.md archiving work. Protocol change — no new scripts and no runtime-pipeline tools touched; the existing archive_protocol.py test mirror gains matching archive_plan() coverage.

Added

  • PLAN.md lockstep archiving: skill-archive-task now archives docs/PLAN.md → docs/plans/plan-NNN-slug.md in lockstep with TASK.md, reusing the same ID and slug (task-NNN-slug.md ↔ plan-NNN-slug.md). New protocol Step 7 with explicit edge cases (PLAN.md absent, orphan PLAN.md, re-plan, corrected ID).
  • ARCHITECTURE.md Index-Mode: architecture-format-core gains a "Living Document & Index-Mode" section: docs/ARCHITECTURE.md is a single living document, updated in place and never per-task archived; when it exceeds 1500 lines it is split into docs/architectures/<section-slug>.md chunks with a short (~≤200-line) index.
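
The naming rule and the size threshold above can be illustrated with a short sketch. It is not the archive_plan() function from archive_protocol.py: the helper names plan_archive_path and needs_index_mode are hypothetical, and the sketch assumes archived task files always follow the task-NNN-slug.md pattern.

```python
import re
from pathlib import PurePosixPath

INDEX_MODE_THRESHOLD = 1500  # line count named in the release notes

_TASK_RE = re.compile(r"^task-(\d+)-(.+)\.md$")


def plan_archive_path(task_archive_name: str) -> str:
    """Derive the plan archive path that reuses the task's ID and slug."""
    match = _TASK_RE.match(task_archive_name)
    if match is None:
        raise ValueError(f"not a task archive name: {task_archive_name!r}")
    task_id, slug = match.groups()
    return str(PurePosixPath("docs/plans") / f"plan-{task_id}-{slug}.md")


def needs_index_mode(architecture_text: str) -> bool:
    """True when ARCHITECTURE.md has grown past the Index-Mode threshold."""
    return len(architecture_text.splitlines()) > INDEX_MODE_THRESHOLD
```

Deriving the plan name from the task name, rather than computing the ID twice, is one way to keep the two archives from drifting apart.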

Changed

  • skill-archive-task v1.1 → v1.2 (now covers TASK.md + PLAN.md).
  • artifact-management v1.0 → v1.1, architecture-format-core v1.0 → v1.1, architecture-review-checklist v1.0 → v1.1, skill-safe-commands v1.0 → v1.1.
  • Agent prompts — Analyst, Architect, Planner, Architecture Reviewer wired for lockstep archiving + the Index-Mode size check / reviewer backstop.
  • Workflows: 01-start-feature, vdd-01-start-feature, light-01-start-feature, light-02-develop-task, 04-update-docs, 02-plan-implementation, vdd-02-plan annotated with the new rules.
  • docs/ARCHITECTURE.md — Directory Structure updated with docs/plans/ and docs/architectures/; new "Artifact rotation" note.
  • .agent/tools/archive_protocol.py (the skill-archive-task test mirror) — new archive_plan() function implementing Step 7 + 8 lockstep tests (23 archive tests total, all green).

Fixed

  • PLAN.md archiving drift — plans are no longer dumped flat into docs/archives/ or left unarchived. This repo's own legacy docs/archives/PLAN-*.md migrated to docs/plans/plan-NNN-slug.md.
  • ARCHITECTURE.md unbounded growth — the 1500-line Index-Mode threshold plus an Architecture Reviewer 🟡 MAJOR backstop prevent monolithic architecture files.

v3.15.0 — Framework Installer: `install.sh` (5 vendors, 5 subcommands)

20 May 12:26

Full Changelog: v3.14.3...v3.15.0

A bootstrap-time CLI that deploys the framework into a clean target project under a chosen agent-system profile — replacing manual folder copying. The framework lives in the target's .agentic-development/ (a symlink to a sibling clone, or a full copy); per-item relative symlinks point into it; a SHA-256-hash-protected managed .gitignore block keeps framework files out of the project's git history. Built end-to-end through the framework's own VDD pipeline (Analyst → Architect → Planner → an 11-task /vdd-develop-all chain → 3-critic /vdd-multi adversarial review). The adversarial passes caught and fixed real bugs before merge — a --dry-run that mutated the filesystem, a snapshot crash on overlapping paths, a CWD-dependent copytree symlink-resolution bug, and an uninstall that could delete user-owned content. The installer is a standalone bootstrap tool — no runtime-pipeline changes.

Added

  • install.sh — minimal bash wrapper (BASH_VERSION guard, python3/PyYAML dependency check, exec python3).
  • System/scripts/install.py + System/scripts/installer/ — 16-module Python package (stdlib + PyYAML only, per NFR-5).
  • System/scripts/vendors.yaml — declarative vendor profiles; a new agent system is added without touching Python.
  • Five subcommands: install / switch / update / uninstall / doctor.
  • Five vendor profiles: claude, antigravity, codex, cursor, gemini-cli.
  • Two deployment modes: --mode symlink (default, .agentic-development/ → sibling clone) and --mode copy (self-contained, for airgapped / CI).
  • Pre-flight conflict prevention — every target path is classified (safe / our / hard_conflict / soft_conflict) before any write; CLAUDE.md / AGENTS.md / GEMINI.md, user settings.json, and a user-owned System/ are never overwritten. --dry-run previews the plan with zero filesystem mutation.
  • Anti-clobber engine — managed blocks in .gitignore and bootstrap files are SHA-256-hashed; a hand-edited block aborts the run with a unified diff unless --force (which backs the old version up first).
  • doctor — read-only integrity verifier with a --json report schema (broken symlinks, hash mismatches, state-schema check).
  • tests/installer/ — 169 unittest tests (per-module unit + 10-scenario end-to-end + bash-wrapper smoke), wired into tests/run_tests.py.
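
The anti-clobber check described above can be sketched with the standard library. This is an illustration under assumptions, not the installer's code: the function names block_hash and check_managed_block are hypothetical, and the sketch assumes the installer can regenerate the block it originally wrote so that a diff can be shown.

```python
import difflib
import hashlib
from typing import Tuple


def block_hash(text: str) -> str:
    """SHA-256 hex digest of a managed block's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_managed_block(
    recorded_hash: str,
    original_block: str,
    current_block: str,
    force: bool = False,
) -> Tuple[bool, str]:
    """Return (may_proceed, diff).

    An untouched block proceeds with an empty diff. A hand-edited block
    yields a unified diff and proceeds only when `force` is set (a real
    tool would back the old version up first, which is not shown here).
    """
    if block_hash(current_block) == recorded_hash:
        return True, ""
    diff = "".join(
        difflib.unified_diff(
            original_block.splitlines(keepends=True),
            current_block.splitlines(keepends=True),
            fromfile="managed block (recorded)",
            tofile="managed block (on disk)",
        )
    )
    return force, diff
```

Comparing hashes first keeps the common, unmodified case cheap; the diff is only built when there is something to show the user.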

Changed

  • docs/ARCHITECTURE.md — new §9 Framework Installer Subsystem (data model, components, invariants, security & safety).
  • README.md §Installation: install.sh documented as the recommended deployment method; manual folder-copy retained as the alternative.

Fixed

  • Adversarial-review fixes folded in before merge: --dry-run filesystem mutation; snapshot overlapping-path crash; copytree CWD-dependent dangling-symlink resolution; uninstall/switch over-broad deletion of user content; inject_block marker-line injection; doctor state-schema gap; apply_retention able to delete every backup; a stale System symlink surviving uninstall.

v3.14.3 — `/vdd-develop-all`: VDD chain workflow with Sarcasmotron review

08 May 10:44

Full Changelog: v3.14.2...v3.14.3

New workflow composing the chain-iteration of /develop-all with the per-task adversarial Sarcasmotron loop of /vdd-develop. Walks the full docs/PLAN.md, applies hostile review to each task, gates progression on explicit user input between tasks, and never auto-commits. Built end-to-end through the framework's own VDD pipeline (Analyst → Architect → Planner → Developer → Sarcasmotron); the build itself surfaced 1 honest REJECTED iteration that fixed a real control-flow bug before merge. No architectural changes — pure composition of existing Layer A / Stage Cycle patterns.

Added

  • .agent/workflows/vdd-05-run-full-task.md (53 lines) — new chain workflow with 5 numbered steps:
    1. Plan parsing with --dry-run flag (preview chain without executing).
    2. Per-task VDD cycle A→D: Builder → Verification → Sarcasmotron-roast → Refinement loop. Sarcasmotron persona is delegated to vdd-03-develop.md Step 3 (DRY — not inlined).
    3. HITL gate between tasks (yes / pause / abort) with optional --auto-continue=<seconds> flag for unattended runs.
    4. Session-state persistence on both APPROVED-merge and 3-REJECTED-STOP paths, with explicit ordering merge → persist → HITL (load-bearing — persist BEFORE the HITL prompt to survive runner crashes during user wait).
    5. Finalization with full regression suite (python3 tests/run_tests.py + validate_skill.py) and metrics report (merged tasks, REJECTED iterations, Hallucination-Convergence vs honest APPROVED counts). Auto-commit forbidden; commit/PR decision belongs to the user.
  • .claude/commands/vdd-develop-all.md — slash-command registration, byte-identical structure to develop-all.md / vdd-develop.md modulo the workflow path.
  • Resumability section + behavioral smoke test: re-invoke /vdd-develop-all after pause reads .agent/sessions/latest.yaml and resumes from the first non-merged task in docs/PLAN.md.
  • Refinement loop limit: 3 REJECTED iterations per task before STOP + escalation (chosen over /03-develop-single-task's 2 because Sarcasmotron is stricter — 2 escalates noisily, 4+ wastes tokens on stuck tasks).
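
The ordering rule in the steps above (merge, then persist, then HITL, with a persist on the 3-REJECTED-STOP path as well) can be sketched as a loop. The real workflow is a markdown file executed by an agent, so this is only an illustration: run_chain and its callback parameters are hypothetical, and the "merged" status string is an assumption. The "failed_sarcasmotron" status and the limit of 3 come from these notes.

```python
from typing import Callable, List


def run_chain(
    tasks: List[str],
    review: Callable[[str], bool],          # True means APPROVED
    persist: Callable[[str, str], None],    # (task, status)
    ask_user: Callable[[str], str],         # "yes" / "pause" / "abort"
    max_rejects: int = 3,
) -> List[str]:
    """Walk the task list; return the tasks that were merged."""
    merged: List[str] = []
    for task in tasks:
        rejects = 0
        while not review(task):
            rejects += 1
            if rejects >= max_rejects:
                # STOP path: record the failure so a resume does not
                # silently retry from scratch.
                persist(task, "failed_sarcasmotron")
                return merged
        merged.append(task)
        # Persist BEFORE prompting, so a crash during the user wait
        # cannot lose the merge state.
        persist(task, "merged")
        if ask_user(task) != "yes":
            break
    return merged
```

Placing the persist call above the prompt is the whole design choice: the prompt can block indefinitely, so anything recorded after it is at risk.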

Changed

  • CLAUDE.md ## WORKSPACE WORKFLOWS: +1 −1; /vdd-develop-all inserted into Available Commands list next to /vdd-develop.
  • GEMINI.md: +1 −1; vdd-05-run-full-task added to Available Workflows enumeration.
  • AGENTS.md §Development Phase: +1 — chain-execution pointer comparing /develop-all (auto-commit) vs /vdd-develop-all (adversarial, HITL, no auto-commit).
  • System/Docs/WORKFLOWS.md: +22 −5 across 4 surgical edits:
    • Mermaid diagram: VDDRunAll{{vdd-05-run-full-task}} node added to Automation Loops, with edge labels distinguishing auto-commit (RunAll) from Sarcasmotron+HITL (VDDRunAll).
    • Automation Loops table: new row for vdd-05-run-full-task; clarified existing 05-run-full-task row to mention auto-commit; clarified vdd-03-develop as "single task".
    • FAQ: new entry "When should I use vdd-05-run-full-task instead of 05-run-full-task?" listing the 3 load-bearing differences (adversarial review, mandatory HITL, no auto-commit).
    • VDD Multi-Step example: Step 3 expanded into 3a (single-task) and 3b (chain) variants.
  • .agent/workflows/vdd-03-develop.md: +2 — trailing cross-link note pointing at /vdd-develop-all for chain execution.

Fixed (caught during the build itself)

  • Hallucinated test path (caught in Verification, before Sarcasmotron): user's original brief and intermediate spec drafts referenced bash tests/test_e2e.sh — a file that does not exist in this repo. The actual test harness is python3 tests/run_tests.py. Workflow file now uses the real path. Spec drift remains in docs/TASK.md and docs/tasks/task-061-02-workflow-impl.md; left as-is per "specs are write-once snapshots" — the implementation is canonical.
  • Step 3 ↔ Step 4 control-flow ambiguity (caught by Sarcasmotron, iteration 1 of 061-02): Step 2D originally said "APPROVED → merge → Step 3 (HITL)", but Step 4 (session-state persist) sat below Step 3 numerically and claimed "after every merge", leaving the persist-vs-HITL execution order undefined. Could lose merge state if a runner crashes during user wait. Fix: Step 2D now explicitly states merge → Step 4 (persist) → Step 3 (HITL gate) → next task, with "Order is load-bearing" callout.
  • Missing failure-path session-state persist (caught by Sarcasmotron, iteration 1 of 061-02): the 3-REJECTED-STOP path did not persist failure state, so resumption after escalation would silently retry from scratch. Fix: Step 2D now invokes Step 4 with --status "failed_sarcasmotron" --add_blocker "Task <name>: 3 REJECTED iterations" on the STOP path; Step 4 header annotates "called from Step 2D — both APPROVED and 3-REJECTED-STOP paths".

Verification

  • All 7 RTM checks for Task 061-01 (stubs) GREEN on iteration 1.
  • All 7 RTM checks for Task 061-02 (logic) GREEN after iteration 2 (1 Verification fix + 2 Sarcasmotron fixes).
  • All 4 RTM checks for Task 061-03 (cross-links) GREEN on iteration 1 (+1 −1 to CLAUDE.md, +2 −0 to vdd-03-develop.md, no other workflow regressions).
  • Final regression suite: python3 tests/run_tests.py → 5/5 passed.
  • Workflow file 53 lines (≤150 line budget).

Process metrics (chain build)

Metric | Value
Tasks merged | 3/3
Total REJECTED iterations across chain | 1
Hallucination-Convergence APPROVED | 3
Honest APPROVED (no nitpick-inversion) | 0
Verification-phase finds (caught before Sarcasmotron) | 1

The single honest REJECTED iteration was a genuine save: Sarcasmotron caught a control-flow bug that would have made session-state semantics ambiguous to a future LLM consumer. The Verification phase separately caught a hallucinated test path inherited from the brief — exactly the filter Step B is designed to provide.

Impact

  • New chain primitive for high-rigor multi-task batches: per-task adversarial scrutiny + mandatory HITL + zero accidental commits. Pairs with the existing /develop-all (fast path with auto-commit) — pick by required rigor, not by default.
  • Resumability via latest.yaml makes pause/resume a first-class operation: long batches can run across context-window resets without losing merge state.
  • Demonstrates the framework can build its own next-tier workflow under its own VDD pipeline, including catching real bugs via the adversarial loop. The Sarcasmotron persona's "Hallucination Convergence" exit rule worked as intended: round 1 produced 2 honest findings, round 2 reduced to bikeshedding.

v3.14.2 — security-audit skill v3.2 → v3.3 (bug fixes + coverage + hardening)

20 Apr 10:06

Full Changelog: v3.9.17...v3.14.2

Post-analysis critique of the security-audit skill surfaced 2 real bugs, 4 coverage gaps, and 6 refinement opportunities. All 14 items addressed. Scanner integrity is visibly improved; no breaking changes to CLI surface (new --max-size is additive).

Fixed (HIGH — correctness bugs)

  • Pip lock-file detection: scanners.py previously treated requirements.txt as a lock file (it is not — it does not pin a transitive graph with hashes) AND had no is_type branch for Python at all, so Missing Lock File never fired for pip projects. Rewrote scan_dependencies to use ecosystem groups with explicit markers (presence of pyproject.toml/setup.py/setup.cfg/requirements.txt/Pipfile) and locks (real lock files only: Pipfile.lock, poetry.lock, uv.lock, pdm.lock). Also fixed JS over-flagging (previously yarn+pnpm both fired when package-lock.json was present — now any of the three locks satisfies the JS ecosystem).
  • Report-path desync: .agent/workflows/security-audit.md wrote docs/SECURITY_AUDIT.md; the security-auditor agent (and System/Agents/10_security_auditor.md) writes docs/audit/security-{ID}.md. Aligned workflow → agent convention (supports multiple audits + integrates with skill-archive-task ID convention).
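
The ecosystem-group idea behind the lock-file fix can be sketched as data plus one small function. This is not the scan_dependencies code in scanners.py: the ECOSYSTEMS table layout and the function name missing_lock_ecosystems are hypothetical. The pip markers and locks are the ones listed above; the JS marker (package.json) and the exact JS lock file names are assumptions added for the illustration.

```python
from typing import Dict, List, Set

ECOSYSTEMS: Dict[str, Dict[str, List[str]]] = {
    "pip": {
        # Presence of any marker means "this is a Python project".
        "markers": ["pyproject.toml", "setup.py", "setup.cfg",
                    "requirements.txt", "Pipfile"],
        # Real lock files only; requirements.txt is deliberately absent.
        "locks": ["Pipfile.lock", "poetry.lock", "uv.lock", "pdm.lock"],
    },
    "js": {
        "markers": ["package.json"],  # assumed marker
        # Any one of the three satisfies the ecosystem.
        "locks": ["package-lock.json", "yarn.lock", "pnpm-lock.yaml"],
    },
}


def missing_lock_ecosystems(present_files: Set[str]) -> List[str]:
    """Return ecosystems that have a marker file but no lock file."""
    missing: List[str] = []
    for name, spec in ECOSYSTEMS.items():
        has_marker = any(f in present_files for f in spec["markers"])
        has_lock = any(f in present_files for f in spec["locks"])
        if has_marker and not has_lock:
            missing.append(name)
    return missing
```

Grouping locks per ecosystem is what removes the over-flagging: a finding fires once per ecosystem, not once per lock file that happens to be absent.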

Fixed (MED — coverage + refinement)

  • SBOM scan was non-recursive (scanners.py scan_sbom): glob("*sbom*") only looked in the project root. SBOMs are commonly placed in build/, dist/, artifacts/, docs/. Switched to rglob with SKIP_DIRS filter and duplicate dedup. Nested docs/sbom.json now detected.
  • Dead SBOM-probe block removed: previous code ran syft --version / cdxgen --version and printed "tool is available for SBOM generation" without actually generating anything. Either misleading or unfinished — removed entirely; generation instructions remain in the Missing SBOM finding message.
  • MAX_FILE_SIZE 5 MB → 15 MB + CLI --max-size MB: 5 MB was silently skipping most modern minified production bundles (vendor.js/bundle.js routinely exceed 10 MB after Webpack DefinePlugin). Bumped default to 15 MB; added --max-size flag with runtime override (scanners now read config.MAX_FILE_SIZE via module-reference, not import-time copy).
  • Solidity public/external false positives on view/pure: pattern flagged every non-modifier public function, noisily firing on view/pure getters. Tightened regex with negative lookahead (?!.*\b(?:view|pure|constant)\b) — getters no longer flagged; state-mutators still flagged.
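
The effect of the negative lookahead can be shown with a small, simplified pattern. Only the lookahead fragment comes from the notes above; the surrounding pattern and the name PUBLIC_MUTATOR are assumptions for the sketch, and unlike the real rule this version does not check for access-control modifiers.

```python
import re

# Simplified stand-in for the scanner's pattern: match a public/external
# function declaration unless view, pure, or constant appears later on
# the same line.
PUBLIC_MUTATOR = re.compile(
    r"function\s+\w+\s*\([^)]*\)\s*(?:public|external)\b"
    r"(?!.*\b(?:view|pure|constant)\b)"
)


def is_flagged(line: str) -> bool:
    """True when the line looks like a state-mutating public function."""
    return PUBLIC_MUTATOR.search(line) is not None
```

Because "." does not match newlines by default, the lookahead only inspects the rest of the declaration line, so a declaration split across several lines would need a different approach.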

Added (coverage expansion)

  • +16 regex patterns across 3 language stacks (patterns.py):
    • Rust (6): unsafe {} blocks, unsafe fn, std::mem::transmute, std::mem::forget, .unwrap_unchecked, from_raw_parts, rand::random (weak RNG for security).
    • Go (6): "math/rand" import + call-site, SQL concat/Sprintf in db.Query/Exec, http.ListenAndServe (missing TLS), filepath.Join with request data, exec.Command with formatted/concat string.
    • GraphQL (4): introspection: true, GRAPHQL_PLAYGROUND=true, graphiql: true, ApolloServer({...}) config (verify depth/complexity limits for DoS).
  • External tools — cross-cutting additions (external.py):
    • semgrep --config auto (de-facto SAST standard since 2024) now runs for any project type.
    • gitleaks detect (primary) with trufflehog filesystem fallback — stronger secret detection than regex-only.
    • Missing tools remain non-fatal (per run_command contract).
  • ReDoS guard: added MAX_LINE_LENGTH = 4000 to config.py; scan_code_patterns now skips pathologically long lines (minified JS routinely has >100k-char single lines, triggering catastrophic backtracking on complex regex). Real source code lines almost never exceed 4k chars.
  • fuzzing_invariants.md expanded 42 → 170+ lines: 8 invariant categories (accounting, access control, monotonicity, pausability, ERC-20, ERC-4626, oracle, reentrancy); Foundry / Echidna / Medusa / Halmos setup; mandatory handler-based fuzzing pattern with ghost state; depth requirement table by criticality; 10-item edge-case checklist; post-fuzz regression discipline.
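
The ReDoS guard above amounts to a length check before any regex runs. A minimal sketch, assuming a hypothetical `scan_lines` helper and an illustrative pattern (the real function is scan_code_patterns, whose body is not shown in these notes):

```python
import re

MAX_LINE_LENGTH = 4000  # value quoted in the release notes


def scan_lines(text, patterns):
    """Run each pattern per line, skipping pathologically long lines."""
    findings = []
    skipped = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            skipped += 1  # never hand a 100k-char line to a complex regex
            continue
        for name, pattern in patterns.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings, skipped


# Illustrative pattern only, not one of the shipped patterns.
patterns = {"hardcoded-password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]")}
text = 'password = "hunter2"\n' + "x" * 100_000 + "\ntoken = 1"
findings, skipped = scan_lines(text, patterns)
```

The trade-off is that anything on an over-long line goes unscanned by the regex patterns, which is why counting the skipped lines can be useful for reporting.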

Documentation

  • SKILL.md §2 now documents --max-size flag, Rust/Go/GraphQL coverage, semgrep/gitleaks cross-cutting tools, ReDoS guard, and clarifies that --scan-type external runs ONLY external tools (SKIPS regex scans) — previously ambiguous.
  • Version bumped to v3.3 in SKILL.md frontmatter, header, and run_audit.py module docstring + CLI description.

Verification (smoke-tested)

  • Self-exclusion holds on own skill dir (0 findings).
  • pyproject.toml alone → Missing Lock File (pip) — previously silent.
  • requirements.txt alone → Missing Lock File (pip) — previously silent.
  • Nested docs/sbom.json detected via rglob — previously reported missing.
  • Rust test file (unsafe {}, std::mem::transmute, rand::random::<u32>()) → all 3 patterns fire.
  • Solidity test: view getter skipped, public state-mutator flagged.
  • Config/deps/IaC/SBOM scans on repo root all pass without regression.
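
The nested-SBOM check in the list above can be reproduced with a sketch like the following. The function name `find_sboms`, the contents of `SKIP_DIRS`, and deduplication by resolved path are assumptions; only the use of rglob with a skip filter comes from these notes.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}  # assumed contents


def find_sboms(root: Path) -> list[Path]:
    """Recursively find files whose name contains 'sbom', relative to root."""
    seen = set()
    found = []
    for path in sorted(root.rglob("*sbom*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if any(part in SKIP_DIRS for part in rel.parts[:-1]):
            continue  # ignore vendored or VCS directories
        resolved = path.resolve()
        if resolved in seen:
            continue  # same file reached twice, e.g. through a symlink
        seen.add(resolved)
        found.append(rel)
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "sbom.json").write_text("{}", encoding="utf-8")
    (root / "node_modules" / "pkg").mkdir(parents=True)
    (root / "node_modules" / "pkg" / "sbom.json").write_text("{}", encoding="utf-8")
    nested = [p.as_posix() for p in find_sboms(root)]
```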

Impact

  • Python projects without real lock files (Pipfile.lock/poetry.lock/uv.lock/pdm.lock) now receive supply-chain warnings — previously false-negative. Hash-pinned requirements.txt (pip-compile output with --hash=sha256: lines) is accepted as a lock (avoids false-positive on pip-tools mainstream pattern, added in Round 4).
  • Rust, Go, and GraphQL codebases receive initial in-process regex coverage. For depth, gosec/govulncheck/semgrep/cargo-audit/clippy remain primary (invoked via --scan-type external); the in-process patterns are fast signalling, not a replacement.
  • Minified bundles up to 15 MB are now scanned for accidentally-committed secrets (previously 5 MB cutoff).
  • Adversarial convergence signal: issues-found at R3 (3 actionable bugs fixed in P1–P2) and again at R4 (10 defects — 4 broken patterns + pip-compile false-positive regression + SBOM perf regression + test gap — all fixed before release tag).
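
The --max-size override mentioned above only takes effect if scanners read the limit through the module at call time. The sketch below shows the difference, using a stand-in module object in place of config.py. The names `skipped_by_copy` and `skipped_by_reference` are hypothetical, and the flag is assumed to take whole megabytes.

```python
import argparse
import types

MB = 1024 * 1024

# Stand-in for config.py so the example is self-contained.
config = types.ModuleType("config")
config.MAX_FILE_SIZE = 15 * MB

# Import-time copy, equivalent to `from config import MAX_FILE_SIZE`.
MAX_FILE_SIZE_COPY = config.MAX_FILE_SIZE


def skipped_by_copy(size_bytes):
    return size_bytes > MAX_FILE_SIZE_COPY    # frozen at import time


def skipped_by_reference(size_bytes):
    return size_bytes > config.MAX_FILE_SIZE  # looked up on every call


parser = argparse.ArgumentParser()
parser.add_argument("--max-size", type=int, default=15, metavar="MB")
args = parser.parse_args(["--max-size", "32"])
config.MAX_FILE_SIZE = args.max_size * MB     # runtime override

bundle = 20 * MB
stale = skipped_by_copy(bundle)      # still compares against 15 MB
live = skipped_by_reference(bundle)  # sees the 32 MB override
```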

v3.14.1 — VDD adversarial-review fixes on v3.14.0

Post-release adversarial critique of v3.14.0 surfaced 8 findings (2 HIGH, 4 MED, 2 LOW). All are addressed in this patch. No behavior change for Claude Code users; all fixes are rigor/documentation improvements that close silent-fail modes.

Fixed (HIGH — closes silent-fail modes)

  • SKILL.md §1 load-semantics ambiguity: previous wording "load the matching reference" did not specify WHO reads WHEN. A junior agent could consult the selection table, memorize the choice, and never actually Read the reference file — proceeding to §2 with only abstract concepts and no invocation syntax. Now explicitly: "Use the Read tool to load the matching reference file now, before applying §2–§6."
  • sequential-fallback.md untested claim: the file was marked "Complete and vendor-agnostic" but had never been exercised on a non-Claude runtime. Downgraded to "proposed pattern, not yet validated" with an explicit caveat that all claims about wall-clock overhead, context-bleed, and persona-swap effectiveness are theoretical. Parent SKILL.md §7 gained the same caveat. Invites first-validator PR after real run.

Fixed (MED — closes fail-soft-where-loud-was-safer)

  • Stubs now emit a visible DEGRADED-MODE banner (gemini-cli.md, cursor.md, antigravity.md): previously a non-Claude agent landing on a stub silently fell through to sequential fallback, so the user believed they had parallel execution while actually running at ~3× latency. The banner makes the degradation loud at the top of each stub.
  • SKILL.md §1.1 detection now specifies cwd-walkup: previous rule assumed the agent ran from project root. Now detects via find-up (walk from cwd toward filesystem root, stop at .git), and emits a warning rather than silently falling back when no marker is found.
  • SKILL.md §1.2 tie-break clarified: when multiple runtime markers match, replace the untestable "prefer runtime currently executing" with concrete signals — tool-list fingerprint (Agent+TeamCreate+SendMessage → Claude Code), explicit runtime: caller hint, and explicit warning on still-ambiguous instead of silent guess.
  • v3.14.0 CHANGELOG overclaim softened: "No behavior change for Claude Code users" → "Content preserved; section numbering reorganized — see §9 History and references/claude-code.md for the mapping." Calls out the §5.1 → §5 anchor shift for anyone citing the old structure in external notes.
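
The cwd-walkup detection described above can be sketched as a find-up loop. The marker file names and the `find_up` helper are hypothetical. The sketch shows only the walk from the working directory toward the filesystem root, the stop at .git, and the warning when nothing is found.

```python
import tempfile
import warnings
from pathlib import Path

RUNTIME_MARKERS = ("CLAUDE.md", "GEMINI.md", "AGENTS.md")  # assumed marker names


def find_up(start, markers=RUNTIME_MARKERS):
    """Walk from start toward the filesystem root, stopping at the repo root."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        hits = [m for m in markers if (directory / m).exists()]
        if hits:
            return directory, hits
        if (directory / ".git").exists():
            break  # repository root reached without a marker
    warnings.warn("no runtime marker found between cwd and repository root")
    return None, []


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    nested = repo / "src" / "pkg"
    nested.mkdir(parents=True)
    (repo / ".git").mkdir()
    (repo / "CLAUDE.md").write_text("# setup\n", encoding="utf-8")
    found_dir, hits = find_up(nested)
    found_at_repo_root = found_dir == repo.resolve()

    bare = Path(tmp) / "bare"
    (bare / "src").mkdir(parents=True)
    (bare / ".git").mkdir()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        missing_dir, missing_hits = find_up(bare / "src")
    warned = len(caught) == 1
```

Returning every marker found in the first matching directory leaves room for a separate tie-break step when more than one runtime marker is present.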

Fixed (LOW — dedup + translation hint)

  • Stubs deduplicated via references/_stub-template.md: previously 3 near-identical files (~45 lines each, ~80% shared). The shared checklist + contribution guidance now lives in _stub-template.md; vendor-specific stubs slimmed to ~22 lines each, carrying only the vendor-specific marker + warning banner + pointer to template. Maintenance: update one template file instead of three.
  • sequential-fallback.md adds orchestration-style note: previously assumed chat-based orchestration (messages as persona swaps). Added explicit SDK/API translation note — for non-chat runtimes, the pattern is one system prompt per teammate with messages list reset betwe...

v3.9.17 — Developer Discipline: Karpathy Guidelines Integration

15 Apr 13:47

Full Changelog: v3.9.16...v3.9.17

Added

  • §1.5 Think Before Implementing (developer-guidelines): Graduated ambiguity handling protocol — critical ambiguity goes to TASK.md Open Questions, implementation-level decisions are made by the developer with brief documentation, trivial decisions are made silently.
  • §1.6 Implementation Discipline (developer-guidelines): Two-level decision framework — architectural decisions (new modules, public API, data models) must come from PLAN.md/ARCHITECTURE.md; implementation details (internal patterns, helpers, abstractions) are the developer's professional judgment. Speculative complexity is prohibited.
  • §6.2 Multi-Step Tasks (developer-guidelines): Generalized Verification Protocol with Step → verify: [check] pattern, extending the Bug Fixing Protocol to all multi-step work.
  • Before/after code examples (developer-guidelines/examples/coding-anti-patterns.md): 3 real-world examples — drive-by refactoring, speculative features vs. plan-driven implementation, silent interpretation vs. surfacing ambiguity. Adapted from Karpathy Guidelines for complex product development context.

Improved

  • Red Flags (developer-guidelines §0): +2 entries — against silent architectural changes and speculative features.
  • Strict Adherence (developer-guidelines §1): +2 entries — Task Traceability (every change must serve the task, professional choices within scope are OK) and Style Matching (match existing code style).
  • Rationalization Table (developer-guidelines §9): +3 entries covering speculative additions, silent plan deviation, and drive-by improvements.
  • Atomicity & Traceability (core-principles §1): Added Verification Checkpoints for multi-step tasks.
  • Minimizing Hallucinations (core-principles §3): Added Ambiguity Protocol with cross-reference to developer-guidelines §1.5.
  • Token budget (skill-phase-context): Updated Development phase estimate from ~768 to ~1,100 to reflect expanded developer-guidelines.

Design Decisions

  • "Implementation Discipline" instead of "Simplicity First": Karpathy's "minimum code" principle was adapted for complex product development — architectural complexity is valid when plan-driven; only speculative complexity is prohibited.
  • Graduated Ambiguity instead of "ask everything": Three-tier protocol prevents bombarding users with questions while ensuring critical decisions are surfaced.
  • No new standalone skill created: All changes integrated into existing developer-guidelines (Tier 1) and core-principles (Tier 0) to avoid skill bloat and tier conflicts.

v3.9.16 — Security Audit v3.2: Smart Contract Patterns & Modular Architecture

08 Mar 18:15

Full Changelog: v3.9.15...v3.9.16

Added

  • Solidity/Smart Contract patterns (16 new): Reentrancy (.call{value:}, .send(), .transfer()), arbitrary execution (delegatecall, selfdestruct EIP-6780, suicide()), access control (tx.origin, public/external without modifier), oracle manipulation (getReserves(), latestRoundData()), unchecked return values, unprotected initializers, integer overflow (pre-0.8.0), locked ether, inline assembly.
  • VDD Round 3 critique document with real hack coverage matrix (Dec 2025 – Mar 2026).
  • Real-world hack validation: Scanner tested against contracts simulating SwapNet ($13.4M), Truebit ($26.4M), YieldBlox ($10.2M), Aperture ($4M) attack vectors — 7/10 vectors fully detected.

Improved

  • Modular scanner architecture: Refactored 886-line monolith run_audit.py into 7-file package (audit/config.py, audit/patterns.py, audit/helpers.py, audit/scanners.py, audit/external.py, audit/__init__.py).
  • MAX_FILE_SIZE consistency: Added a 5 MB file-size guard to scan_configuration() and scan_iac().
  • Pattern count: 105 → 121 total patterns (28 secret + 62 dangerous + 25 IaC + 6 config).

Fixed

  • VDD Round 2 (8 issues): os.popen() CWE misclassification, missing subprocess.run shell=True, Flask open redirect regex, SQL % formatting detection, IaC false positives on non-IaC YAML, symlink following, SSRF pattern expansion.

v3.9.15 — Claude Code Integration

06 Mar 10:00

Added

  • Claude Code entry point: Created CLAUDE.md (136 lines) adapted from GEMINI.md with native Claude Code tool references (Read, Write, Edit, Bash, Grep, Glob), session state bootstrap, and explicit tier-based skill loading protocol.
  • Claude Code hooks: Added .claude/settings.json with PostToolUse hook and .claude/hooks/validate_skill_hook.sh for automatic skill validation on file modification.
  • Claude Code commands: Created 20 slash command files in .claude/commands/ covering all 21 workflows (delegator pattern — single source of truth in .agent/workflows/).
    • Core: /start-feature, /plan, /develop, /develop-all, /light
    • VDD: /vdd, /vdd-start-feature, /vdd-plan, /vdd-develop, /vdd-adversarial, /vdd-multi
    • Pipelines: /full, /security-audit, /base-stub-first, /framework-upgrade, /iterative-design
    • Product: /product-full-discovery, /product-market-only, /product-quick-vision
    • Docs: /update-docs
  • Migration specification: Added docs/migration-to-claude.md with full platform comparison, tool mapping, hook adaptation guide, and validation checklist.

Improved

  • AGENTS.md: Added missing "Session State Persistence" instruction (update_state.py on phase boundaries), achieving parity with GEMINI.md.
  • SESSION_CONTEXT_GUIDE.md: Added Section 5 "Platform Memory Integration" documenting how framework session state complements platform-specific memory systems (Claude Code, Cursor, Gemini).
  • README.md / README.ru.md: Updated "Option C: Claude Code" section — replaced manual setup instructions with ready-to-use configuration and full command list.

Full Changelog: v3.9.14...v3.9.15