feat(plugin): [nasde.plugin] + skill-by-reference ([[skill]]) #51
Merged
Mirror [nasde.source] for local Claude Code plugins: one task.toml declaration ships a plugin dir into the sandbox image AND registers it for the agent (skills discoverable by reference + MCP server wired). No vendored snapshot, no triple hand-wiring.
- config.PluginConfig + [nasde.plugin] parser (path/ref/install_root/build/env), next to SourceConfig.
- docker.ensure_task_plugin(): stage plugin (at ref via temp worktree) into gitignored _nasde-plugin/ inside the *active* build context; append a sentinel-fenced COPY+build stage. Composes with plugin-only (generated base), plugin+[nasde.source] (compose-redirected context), and a hand-written Dockerfile (preserved verbatim).
- plugin_registration.py: shared skill+MCP machinery. stage_skill_dir carries the WHOLE skill dir (incl. references/), which also fixes the latent bug where variants/<v>/skills/ dropped references/. MCP server derived from <plugin>/.mcp.json, env-wrapped, injected idempotently into task.toml (Harbor reads MCP only from there); never clobbers an author-declared same-name server.
- Skill-by-reference: [[skill]] array in variant.toml feeds the same stage_skill_dir machinery. Reference a skill from a source path (optional ref) instead of copying into variants/. Legacy copy path unchanged.
Backward compatible: no [nasde.plugin]/[[skill]] -> behaviour as before. ADR-009. 33 new tests; full suite 192 passing; ruff + mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CVE audit (pip-audit --strict) started failing on every branch once CVE-2026-45409 (idna <3.15) and CVE-2026-44431 / CVE-2026-44432 (urllib3 <2.7.0) were published; both are deep transitive deps via harbor/opik/supabase. Pin them directly, mirroring the existing litellm GHSA pin precedent. No runtime behaviour change; 192 tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hmark
Exercises the new [[skill]] mechanism on the canonical dogfooding benchmark. claude-nasde-dev-by-ref references the LIVE .claude/skills/nasde-dev (no copy under variants/), composing with the benchmark's existing [nasde.source] ref pin. Validated via the real `nasde run` staging path: the by-reference staged SKILL.md is byte-identical to the live source, while the pre-existing copy-into-variant (claude-nasde-dev-full-stack) has already drifted from it. That is the exact drift this feature eliminates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
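The byte-identity check described above can be reproduced with a small sketch: copy the whole skill dir (so references/ travels with SKILL.md), skip workstation junk, and compare SHA-256 digests of the staged and live SKILL.md. stage_skill_dir_sketch, sha256_prefix, and the ignore patterns are illustrative assumptions, not nasde's real stage_skill_dir; truncating to 12 hex characters only mirrors the style of the short SHAs quoted in this PR, and hashing SKILL.md alone is an assumption.

```python
import hashlib
import shutil
from pathlib import Path

# Junk that should never leak from a live developer skill dir (patterns assumed).
_JUNK = shutil.ignore_patterns(
    ".DS_Store", "Thumbs.db",         # OS metadata
    ".git",                           # VCS data
    "__pycache__", ".venv", "*.pyc",  # Python caches / envs
    "*.swp", "*~", "*.bak", ".#*",    # editor temp files
)


def stage_skill_dir_sketch(src: Path, dest_root: Path) -> Path:
    """Copy the WHOLE skill dir (SKILL.md plus references/ etc.) into dest_root."""
    dest = dest_root / src.name
    if dest.exists():
        shutil.rmtree(dest)  # re-staging replaces the dir, never merges stale files
    shutil.copytree(src, dest, ignore=_JUNK)
    return dest


def sha256_prefix(path: Path, length: int = 12) -> str:
    """Short hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:length]
```

Copying the directory rather than mapping individual files is what keeps references/*.md next to SKILL.md in the sandbox.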
…eference
Replace the previous demo-variant approach (claude-nasde-dev-by-ref) with proper migration of the EXISTING variants. Better dogfooding: shows replacement of the copy-into-variants pattern in place, not addition alongside it.
For each of the three existing variants (claude-nasde-dev-full-stack, -with-arch, -with-testing): add [[skill]] pointing at the live .claude/skills/nasde-dev source, and delete the drifted variant copy under skills/nasde-dev/. python-best-practices / python-testing copies stay (no live source for them in this repo).
Verified: all three variants now stage nasde-dev sha256 073822c1b52b (= LIVE), eliminating the drift from the previous ec43ca37e668 copy. Demo-variant claude-nasde-dev-by-ref deleted (was redundant with the migration). Full suite still 192 passing; ruff/mypy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…kill]]
Both faces of ADR-009 validated on real benchmarks before merge:
- [[skill]] on examples/nasde-dev-skill (3 variants migrated): drift eliminated, full pipeline run scored 100/125 (0.80) with verifier reward=1.
- [nasde.plugin] on SDLC analyze-conversation (motivating downstream): snapshot + hand-wired MCP + variant skill-copy all deleted in the SDLC repo; full run with baked noesis plugin reached verifier reward=1 via auto-injected MCP wrapper, matching pre-migration baseline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-agent code review surfaced bugs across all three faces of the
feature (config, docker staging, registration). All addressed with
regression tests (+15 tests, 192 → 207 passing). Net code delta is
small and surgical; the registration module gained one helper
refactor (multi-server support) and the rest are 1–10 line guards or
quoting fixes.
001 docker.py: _plugin_build_context_dir falls back to env_dir when no
compose exists. Fixes FileNotFoundError when [nasde.plugin] combines
with a remote [nasde.source] (no compose is generated for remote URLs).
002 plugin_registration.py: _build_mcp_server → _build_mcp_servers — wire
ALL servers from .mcp.json, not just the first. Honor per-server env
field from .mcp.json. Precedence: nasde defaults → plugin .mcp.json env
→ [nasde.plugin].env override.
003 plugin_registration.py + docker.py: _strip_mcp_block /
_strip_existing_plugin_stage refuse to rewrite when only the BEGIN
sentinel is present (END hand-deleted). str.partition would otherwise
silently truncate every section below BEGIN — destroying user-authored
task.toml content.
005 docker.py + plugin_registration.py: shell-quote install_root in
RUN cd ... && build and in the MCP wrapper; JSON-array form for
COPY. Safe for paths with spaces or shell-special characters.
006 runner.py: _refresh_sandbox_files rebuilds sandbox_files from scratch
each run (authored from _collect_sandbox_files + derived extra).
Drops the if-extra-non-empty guard so removed [[skill]]/plugin
entries don't leave stale entries pointing at cleaned-up worktrees.
007 examples/nasde-dev-skill/variants/{full-stack,with-arch,with-testing}/
variant.toml: translated Polish migration comments to English.
008 plugin_registration.py: stage_skill_dir skips OS junk (.DS_Store,
Thumbs.db), VCS data (.git/), Python cache (__pycache__/, .venv/,
*.pyc), editor temp files (*.swp, *~, *.bak, .#*). Mirrors the
ignore list _stage_plugin_tree already uses. Live developer skill
dirs referenced via [[skill]] no longer leak workstation state.
010 docker.py: _resolve_plugin_source validates .claude-plugin/plugin.json
against the RESOLVED path (worktree at ref, when set), not the working
tree. Restores documented ref semantics: pin to a historical commit
even when the working tree is mid-refactor.
013 runner.py: _require_homogeneous_plugins fails fast when multiple
tasks in one project declare different [nasde.plugin] entries.
Plugin skills register into a variant-wide sandbox; silently merging
different plugins would contaminate trials.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
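Fix 002 is easiest to see as a merge of three env layers per server. The sketch below reads every entry under mcpServers in a plugin .mcp.json (the Claude Code layout), merges env with the stated precedence (nasde defaults, then the plugin's per-server env, then [nasde.plugin].env), skips names the author already declared, and wraps the command in a shell-quoted cd (fix 005). build_mcp_servers_sketch, the sh -c wrapper shape, and the output dict keys are assumptions; Harbor's actual mcp_servers schema may differ.

```python
import json
import logging
import shlex
from typing import Any

log = logging.getLogger("nasde.sketch")


def build_mcp_servers_sketch(
    mcp_json: str,
    install_root: str,
    defaults: dict[str, str],
    override_env: dict[str, str],
    declared_names: set[str],
) -> list[dict[str, Any]]:
    """Wire ALL servers from .mcp.json, not just the first."""
    servers: dict[str, dict[str, Any]] = json.loads(mcp_json).get("mcpServers", {})
    wired: list[dict[str, Any]] = []
    for name, spec in servers.items():
        if name in declared_names:
            # Never clobber an author-declared server with the same name.
            log.warning("MCP server %r already declared in task.toml; skipping", name)
            continue
        # Precedence (right wins): nasde defaults < plugin .mcp.json env < [nasde.plugin].env
        env = {**defaults, **spec.get("env", {}), **override_env}
        assignments = " ".join(shlex.quote(f"{k}={v}") for k, v in env.items())
        command = shlex.join([spec["command"], *spec.get("args", [])])
        # install_root is shell-quoted so paths with spaces survive the wrapper.
        script = f"cd {shlex.quote(install_root)} && exec env {assignments} {command}"
        wired.append({"name": name, "command": "sh", "args": ["-c", script]})
    return wired
```

Using exec env keeps the server as the wrapper's direct process, so signals from the harness reach it rather than an intermediate shell.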
…written sandbox_files
The previous bug-006 fix (_refresh_sandbox_files) overwrote ALL entries
in sandbox_files on every run — regressing the supported use case of
hand-written harbor_config.json (CLAUDE.md documents harbor_config.json
as "Optional: agent import path + sandbox_files mapping"). User-added
entries vanished silently.
New contract: each agent in harbor_config.json carries a
_nasde_derived_keys: [...] meta-list of keys nasde owns. Refresh:
prev_derived = agent._nasde_derived_keys (default [])
authored = _collect_sandbox_files(variant_dir)
handwritten = {k: v for k, v in current.items()
if k not in prev_derived
and authored.get(k) != v}
sandbox_files = {**extra, **authored, **handwritten}
_nasde_derived_keys = sorted(extra)
Precedence (right wins in dict merge): handwritten > authored > extra.
Hand-written entries are NEVER on the derived list, so they survive
every refresh. Removed [[skill]]/[nasde.plugin] entries drop cleanly
because they were on the previous derived list.
Three collision warnings surface duplication that would otherwise be
silent (per user request: silent collisions are VERY much not OK):
- hand-written ∩ derived → hand-written wins, WARNING
- hand-written ∩ authored → hand-written wins, WARNING
- derived ∩ authored → authored wins, WARNING
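The refresh contract above, including the three collision warnings, fits in a short sketch. refresh_sandbox_files_sketch and its argument shapes (plain dicts for the agent entry and for the authored and extra maps) are assumptions. One deliberate deviation from the pseudocode: hand-written keys that collide with extra are left off the new _nasde_derived_keys list, so the "hand-written entries are NEVER on the derived list" guarantee also holds on the next run.

```python
import logging
from typing import Any

log = logging.getLogger("nasde.sketch")
DERIVED_META = "_nasde_derived_keys"


def refresh_sandbox_files_sketch(
    agent: dict[str, Any],
    authored: dict[str, str],
    extra: dict[str, str],
) -> dict[str, Any]:
    """Return a refreshed copy of one harbor_config.json agent entry."""
    current: dict[str, str] = agent.get("sandbox_files", {})
    # EC8: configs written before this change have no meta-list, so treat it as empty.
    prev_derived = set(agent.get(DERIVED_META, []))
    handwritten = {
        k: v
        for k, v in current.items()
        if k not in prev_derived and authored.get(k) != v
    }
    # Surface every collision instead of resolving it silently.
    for key in sorted(handwritten.keys() & extra.keys()):
        log.warning("sandbox_files %s: hand-written entry overrides derived", key)
    for key in sorted(handwritten.keys() & authored.keys()):
        log.warning("sandbox_files %s: hand-written entry overrides authored", key)
    for key in sorted(extra.keys() & authored.keys()):
        log.warning("sandbox_files %s: authored entry overrides derived", key)
    refreshed = dict(agent)
    # Right wins in the merge: handwritten > authored > extra.
    refreshed["sandbox_files"] = {**extra, **authored, **handwritten}
    # Keep hand-written keys off the derived list so they survive future refreshes.
    refreshed[DERIVED_META] = sorted(extra.keys() - handwritten.keys())
    return refreshed
```

Because the function returns a new dict and only reads its inputs, running it twice with the same authored and extra maps yields an identical entry (EC7).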
Pydantic's AgentConfig in Harbor 0.6.4 silently drops unknown fields,
so _nasde_derived_keys never reaches the trial layer — it's a
nasde-local persistence concern.
Dead code removed: _merge_sandbox_files (no production call site since
the previous bug-006 fix; two test cases that exercised it deleted —
they tested the dead function, not real behavior).
Edge-case coverage (8 buckets per PR discussion):
EC1 preserve hand-written across runs
EC2 drop stale derived after [[skill]] removed
EC3 update derived host path (new worktree)
EC4 three collision types (hand∩derived, hand∩authored, derived∩authored)
EC5 greenfield first run (no harbor_config.json)
EC6 multiple agents in hand-written config
EC7 idempotent (run twice → identical file)
EC8 forward-compat with old harbor_config.json (no _nasde_derived_keys meta)
Net test count: 207 → 214 (+11 new, -2 dead-code tests removed,
-2 tests merged into single EC8 forward-compat coverage).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
szjanikowski added a commit that referenced this pull request on May 21, 2026
Promote the [Unreleased] section to [0.4.0] (2026-05-21), add fresh [Unreleased] anchor, update compare links, and cite #51 / #52 inline plus in the link-ref table.
Highlights:
- [nasde.plugin] in task.toml: ship a local Claude Code plugin into the sandbox with one declaration (ADR-009). (#51)
- Skill-by-reference: [[skill]] array in variant.toml. (#51)
- Fix: variants/<v>/skills/<name>/ now carries references/ and sibling files, not just SKILL.md. (#51)
- Security pins: idna>=3.15, urllib3>=2.7.0. (#51)
- CI: pip-audit ignores disputed pyjwt PYSEC-2025-183. (#52)
Co-authored-by: Szymon Janikowski <szymon.janikowski@itlibrium.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What
Mirror [nasde.source] for local Claude Code plugins and add a lighter skill-by-reference path, removing the snapshot / triple-copy tax that plugin-exercising benchmarks pay today.
[nasde.plugin] in task.toml
One declaration ships a local plugin dir (.claude-plugin/plugin.json) into the sandbox image and registers it for the agent:
- config.PluginConfig + [nasde.plugin] parser, next to SourceConfig.
- docker.ensure_task_plugin() stages the plugin (at ref via a temp git worktree) into a gitignored _nasde-plugin/ inside the active build context, then appends a sentinel-fenced COPY+build stage to the Dockerfile.
- plugin_registration.py registers the plugin's own skills/ (whole dirs) and injects its MCP server (from <plugin>/.mcp.json, env-wrapped) into task.toml.
Skill-by-reference: [[skill]] in variant.toml
Reference a skill from a source path instead of copying it into variants/<v>/skills/. The whole dir (incl. references/) is staged; it feeds the same stage_skill_dir machinery as the plugin path.
End-to-end validation (both faces, on real benchmarks)
Beyond the 192-test unit suite, both features were exercised on real
benchmarks running the full pipeline (Docker build + agent in sandbox +
verifier + LLM evaluator):
[[skill]] on examples/nasde-dev-skill (dogfooding)
- Migrated the three existing variants under variants/ (claude-nasde-dev-full-stack, -with-arch, -with-testing) to [[skill]] referencing the live .claude/skills/nasde-dev/.
- Before the migration, the variant copy had already drifted from the live source (SHA ec43ca37e668 vs live 073822c1b52b): exactly the failure mode this feature exists to eliminate.
- Full pipeline run of claude-nasde-dev-full-stack (3 skills mixed: 1 by-ref + 2 legacy copy): verifier reward=1, evaluator score 100/125 (0.80). Sandbox SHA matches the live source, so the drift is eliminated.
[nasde.plugin] on SDLC evals/analyze-conversation (the motivating downstream)
This is the benchmark whose README explicitly documents the snapshot-refresh workaround that [nasde.plugin] was designed to remove. Migrated in the SDLC repo (separate commit 403c253):
- _plugin-staging/ (~210 files): deleted, replaced by [nasde.plugin]
- [[environment.mcp_servers]]: deleted, auto-injected by nasde
- variants/with-skill/skills/analyze-conversation/ copy: deleted, plugin skills auto-registered
Full pipeline run with this nasde branch installed locally, against the
migrated benchmark, exercised the baked noesis plugin's MCP server end to
end: verifier reward=1, matching the pre-migration baseline. This confirms
that the feature replaces the entire snapshot/MCP/variant-skill workaround
on its motivating real case with one task.toml declaration.
Design decisions
Docker context is not hard-coded to environment/. The plugin is staged into a gitignored _nasde-plugin/ inside whatever context is active: environment/ for plugin-only or hand-written-Dockerfile setups, or the source repo/worktree when [nasde.source] redirected the compose context (nasde reads it back from the generated compose). A base Dockerfile is generated only when there is no source and no real Dockerfile content; hand-written Dockerfiles are preserved verbatim, with the stage appended before any trailing CMD.
plugin_registration.stage_skill_dir is the single mechanism for plugin skills, [[skill]]-by-reference, and the legacy variants/<v>/skills/ copy path. This also fixes a latent bug: the variant copy path previously mapped only SKILL.md, silently dropping references/ (breaking skills like analyze-conversation that read references/*.md at runtime).
MCP injected into task.toml. Harbor reads MCP servers only from task.config.environment.mcp_servers (trial.py:188). The block is sentinel-fenced, idempotent, visibly generated, and respects an author-declared same-name server (logs + skips, never clobbers). This follows the same "generate in place where the tool reads it" pattern nasde already uses for environment/Dockerfile/docker-compose.yaml.
[[skill]] array in the existing variant.toml (no new file, variant-scoped), per maintainer preference.
Files changed
- src/nasde_toolkit/config.py: PluginConfig dataclass + [nasde.plugin] parser
- src/nasde_toolkit/docker.py: ensure_task_plugin(), StagedPlugin, plugin staging + Dockerfile stage, public create_ref_worktree
- src/nasde_toolkit/plugin_registration.py
- src/nasde_toolkit/runner.py: _collect_claude_skills to carry whole dir
- src/nasde_toolkit/scaffold/__init__.py: .gitignore ignores _nasde-plugin/
- .gitignore: **/_nasde-plugin/
- examples/nasde-dev-skill/variants/*: [[skill]], drifted copies deleted
- docs/adr/009-…md
- README.md, docs/use-cases.md, CHANGELOG.md, CLAUDE.md
- pyproject.toml, uv.lock: security pins (idna>=3.15, urllib3>=2.7.0)
- tests/test_config_plugin.py, tests/test_docker_plugin.py, tests/test_plugin_registration.py, tests/test_runner_skills.py
Verification
Existing examples/ benchmarks still load unchanged (plugin = None) for benchmarks without [nasde.plugin].
Backward compatibility
No [nasde.plugin]/[[skill]] → behaviour is byte-for-byte as before. Existing variants/<v>/skills/ copies keep working and now correctly carry references/. This is a strict improvement, also validated in the nasde-dev-skill migration, where python-best-practices/python-testing remain as legacy copies and continue to work alongside the new [[skill]] entries.
CI status
The CVE audit failure is environmental and pre-existing: PYSEC-2025-183
is a disputed advisory on pyjwt 2.12.1 (upstream maintainer: "the keylength
is chosen by the application that uses the library") with no upstream fix
available, since 2.12.1 is the latest release. pyjwt is transitive via
supabase/opik and is not used directly by nasde-toolkit. The same audit
fails identically on main once re-run (the last green main run was on
2026-05-09, before the audit began reporting PYSEC-2025-183; the advisory
itself is dated 2025-07-31). This will be addressed in a separate
dependency-policy PR at release time.
Downstream follow-up
The SDLC analyze-conversation migration referenced above lives on a separate branch in the SDLC repo and will land there after this PR is merged. The draft-to-design-doc/ benchmark in SDLC can be migrated the same way (separate PR, separate repo).
🤖 Generated with Claude Code