feat(plugin): [nasde.plugin] + skill-by-reference ([[skill]])#51

Merged
szjanikowski merged 7 commits into main from feat/nasde-plugin-and-skill-by-reference
May 21, 2026

@szjanikowski szjanikowski commented May 19, 2026

What

Mirror [nasde.source] for local Claude Code plugins and add a lighter
skill-by-reference path, removing the snapshot / triple-copy tax that
plugin-exercising benchmarks pay today.

[nasde.plugin] in task.toml

One declaration ships a local plugin dir (.claude-plugin/plugin.json) into
the sandbox image and registers it for the agent:

[nasde.plugin]
path = "../../../src/plugins/my-plugin"
ref = "abc1234"                           # reproducible, like [nasde.source]
install_root = "/opt/my-plugin"
build = "bun install --frozen-lockfile"
[nasde.plugin.env]
CLAUDE_PLUGIN_DATA = "/opt/my-plugin-data"

  • config.PluginConfig + [nasde.plugin] parser, next to SourceConfig.
  • docker.ensure_task_plugin() stages the plugin (at ref via a temp git
    worktree) into a gitignored _nasde-plugin/ inside the active build
    context, then appends a sentinel-fenced COPY+build stage to the
    Dockerfile.
  • plugin_registration.py registers the plugin's own skills/ (whole dirs)
    and injects its MCP server (from <plugin>/.mcp.json, env-wrapped) into
    task.toml.

Skill-by-reference: [[skill]] in variant.toml

[[skill]]
path = "../../../src/plugins/my-plugin/skills/my-skill"
ref  = "abc1234"

Reference a skill from a source path instead of copying it into
variants/<v>/skills/. The whole directory (incl. references/) is staged,
feeding the same stage_skill_dir machinery as the plugin path.
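
As a rough illustration of "whole dir staged", the sketch below copies a skill directory with shutil.copytree while skipping workstation junk. The function name, the SKILL.md presence check, and the restage-from-scratch behaviour are assumptions made for this example; the ignore patterns are the ones listed in the review-fix commit later in this PR (bug 008). It is not the real plugin_registration.stage_skill_dir.

```python
import shutil
from pathlib import Path

# Patterns listed under bug 008 in this PR: OS junk, VCS data, Python caches,
# and editor temp files.
_IGNORED = shutil.ignore_patterns(
    ".DS_Store", "Thumbs.db", ".git", "__pycache__", ".venv",
    "*.pyc", "*.swp", "*~", "*.bak", ".#*",
)


def stage_skill_dir_sketch(source: Path, staging_root: Path) -> Path:
    """Copy a whole skill dir (SKILL.md, references/, siblings) under staging_root."""
    if not (source / "SKILL.md").is_file():
        # Assumption: a skill directory is identified by its SKILL.md.
        raise FileNotFoundError(f"{source} has no SKILL.md")
    target = staging_root / source.name
    if target.exists():
        shutil.rmtree(target)  # restage from scratch so deleted files do not linger
    shutil.copytree(source, target, ignore=_IGNORED)
    return target
```

Copying the directory as a unit is what keeps references/ alongside SKILL.md; mapping only SKILL.md is the failure mode the PR describes as a latent bug.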

End-to-end validation (both faces, on real benchmarks)

Beyond the 192-test unit suite, both features were exercised on real
benchmarks running the full pipeline (Docker build + agent in sandbox +
verifier + LLM evaluator):

[[skill]] on examples/nasde-dev-skill (dogfooding)

  • Three existing variants migrated from drift-prone copy-into-variants/
    (claude-nasde-dev-full-stack, -with-arch, -with-testing) to
    [[skill]] referencing the live .claude/skills/nasde-dev/.
  • The pre-migration copies had already drifted from the live source
    (SHA ec43ca37e668 vs live 073822c1b52b) — exactly the failure mode
    this feature exists to eliminate.
  • Full run of claude-nasde-dev-full-stack (3 skills mixed: 1 by-ref + 2
    legacy copy): verifier reward=1, evaluator score 100/125 (0.80).
    Sandbox SHA matches live source — drift eliminated.

[nasde.plugin] on SDLC evals/analyze-conversation (the motivating downstream)

This is the benchmark whose README explicitly documents the
snapshot-refresh workaround that [nasde.plugin] was designed to remove.
Migrated in the SDLC repo (separate commit 403c253):

  • Deleted vendored _plugin-staging/ (~210 files) — replaced by [nasde.plugin]
  • Deleted hand-wired [[environment.mcp_servers]] — auto-injected by nasde
  • Deleted variants/with-skill/skills/analyze-conversation/ copy —
    plugin skills auto-registered

Full pipeline run with this nasde branch installed locally, against the
migrated benchmark, exercised the baked noesis plugin's MCP server end to
end: verifier reward=1, matching the pre-migration baseline. Confirms
the feature replaces the entire snapshot/MCP/variant-skill workaround on
its motivating real case, in one task.toml declaration.

Design decisions

  • Build-context strategy: stage into the active context. Harbor pins the
    Docker context to environment/. The plugin is staged into a gitignored
    _nasde-plugin/ inside whatever context is active: environment/ for
    plugin-only or hand-written-Dockerfile tasks, or the source repo/worktree
    when [nasde.source] has redirected the compose context (nasde reads it
    back from the generated compose file). A base Dockerfile is generated only when there is
    no source and no real Dockerfile content; hand-written Dockerfiles are
    preserved verbatim with the stage appended before any trailing CMD.
  • Shared skills/MCP registration. plugin_registration.stage_skill_dir
    is the single mechanism for plugin skills, plugin-[[skill]]-by-reference,
    and the legacy variants/<v>/skills/ copy path. This also fixes a
    latent bug: the variant copy path previously mapped only SKILL.md,
    silently dropping references/ (breaking skills like
    analyze-conversation that read references/*.md at runtime).
  • MCP injection writes task.toml. Harbor reads MCP servers only from
    task.config.environment.mcp_servers (trial.py:188). The block is
    sentinel-fenced, idempotent, visibly generated, and respects an
    author-declared same-name server (logs + skips, never clobbers) — the same
    "generate in place where the tool reads it" pattern nasde already uses for
    environment/Dockerfile / docker-compose.yaml.
  • Skill-by-reference shape: a [[skill]] array in the existing
    variant.toml (no new file, variant-scoped) — per maintainer preference.
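
The "sentinel-fenced, idempotent" pattern mentioned above can be sketched as follows. The marker strings and function names are hypothetical (the PR does not show the real sentinels), and the sketch covers only the fence handling, not the same-name-server check. It also refuses to rewrite an unbalanced fence, which is the guard described under bug 003 later in this PR.

```python
# Hypothetical markers; the real sentinel text used by nasde is not shown in this PR.
BEGIN = "# >>> nasde generated block (hypothetical marker) >>>"
END = "# <<< nasde generated block (hypothetical marker) <<<"


def strip_block(text: str) -> str:
    """Remove a previously generated block; refuse to touch an unbalanced fence."""
    start, end = text.find(BEGIN), text.find(END)
    if start == -1 and end == -1:
        return text  # nothing generated yet
    if start == -1 or end == -1 or end < start:
        # Rewriting here could silently truncate user-authored content.
        raise ValueError("unbalanced sentinel fence; refusing to rewrite")
    head = text[:start].rstrip("\n")
    tail = text[end + len(END):].lstrip("\n")
    return (head + "\n" if head else "") + tail


def inject_block(text: str, body: str) -> str:
    """Replace (or append) the generated block so repeated runs give the same file."""
    base = strip_block(text).rstrip("\n")
    block = f"{BEGIN}\n{body.strip()}\n{END}\n"
    return f"{base}\n\n{block}" if base else block
```

Because injection always strips the old block first, running it twice with the same body leaves the file unchanged, and removing the block restores the author's original text.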

Files changed

| File | Change |
| --- | --- |
| src/nasde_toolkit/config.py | PluginConfig dataclass + [nasde.plugin] parser |
| src/nasde_toolkit/docker.py | ensure_task_plugin(), StagedPlugin, plugin staging + Dockerfile stage, public create_ref_worktree |
| src/nasde_toolkit/plugin_registration.py | new: shared skill+MCP registration |
| src/nasde_toolkit/runner.py | wire plugin/skill staging; merge derived sandbox files; fix _collect_claude_skills to carry whole dir |
| src/nasde_toolkit/scaffold/__init__.py | scaffold .gitignore ignores _nasde-plugin/ |
| .gitignore | ignore **/_nasde-plugin/ |
| examples/nasde-dev-skill/variants/* | dogfooding: 3 variants migrated to [[skill]], drifted copies deleted |
| docs/adr/009-…md | new ADR covering both faces |
| README.md, docs/use-cases.md, CHANGELOG.md, CLAUDE.md | docs |
| pyproject.toml, uv.lock | security pins (idna>=3.15, urllib3>=2.7.0) |
| tests/test_config_plugin.py, tests/test_docker_plugin.py, tests/test_plugin_registration.py, tests/test_runner_skills.py | new: 33 tests |

Verification

uv run pytest -q                # 192 passed (159 existing + 33 new)
uv run ruff check src/ tests/   # clean
uv run ruff format --check      # clean
uv run mypy src/nasde_toolkit/  # clean

Existing examples/ benchmarks without [nasde.plugin] still load unchanged
(plugin = None).

Backward compatibility

No [nasde.plugin] / [[skill]] → behaviour is byte-for-byte as before.
Existing variants/<v>/skills/ copies keep working (and now correctly
carry references/ — a strict improvement, also validated in the
nasde-dev-skill migration where python-best-practices / python-testing
remain as legacy copies and continue to work alongside the new [[skill]]
entries).

CI status

Check Status
Lint (ruff), Type check (mypy)
Unit tests (ubuntu × windows × py3.12 / py3.13)
Validate benchmark configs
Docker build benchmarks
CVE audit (pip-audit) ❌ pyjwt PYSEC-2025-183

The CVE audit failure is environmental and pre-existing: PYSEC-2025-183
is a disputed advisory on pyjwt 2.12.1 (upstream maintainer: "the key
length is chosen by the application that uses the library") with no
upstream fix available; 2.12.1 is the latest release. pyjwt is
transitive via supabase/opik, not used directly by nasde-toolkit. The
same audit fails identically on main once re-run (last green main run is
2026-05-09, before PYSEC-2025-183 was published 2025-07-31). Will be
addressed in a separate dependency-policy PR at release time.

Downstream follow-up

The SDLC analyze-conversation migration referenced above lives on a
separate branch in the SDLC repo and will land there after this PR is
merged. The draft-to-design-doc/ benchmark in SDLC can be migrated the
same way (separate PR, separate repo).

🤖 Generated with Claude Code

Szymon Janikowski and others added 7 commits May 19, 2026 17:28
Mirror [nasde.source] for local Claude Code plugins: one task.toml
declaration ships a plugin dir into the sandbox image AND registers it
for the agent (skills discoverable by reference + MCP server wired) —
no vendored snapshot, no triple hand-wiring.

- config.PluginConfig + [nasde.plugin] parser (path/ref/install_root/
  build/env), next to SourceConfig.
- docker.ensure_task_plugin(): stage plugin (at ref via temp worktree)
  into gitignored _nasde-plugin/ inside the *active* build context;
  append a sentinel-fenced COPY+build stage. Composes with plugin-only
  (generated base), plugin+[nasde.source] (compose-redirected context),
  and a hand-written Dockerfile (preserved verbatim).
- plugin_registration.py: shared skill+MCP machinery. stage_skill_dir
  carries the WHOLE skill dir (incl. references/) — also fixes the
  latent bug where variants/<v>/skills/ dropped references/. MCP server
  derived from <plugin>/.mcp.json, env-wrapped, injected idempotently
  into task.toml (Harbor reads MCP only from there); never clobbers an
  author-declared same-name server.
- Skill-by-reference: [[skill]] array in variant.toml feeds the same
  stage_skill_dir machinery — reference a skill from a source path
  (optional ref) instead of copying into variants/. Legacy copy path
  unchanged.

Backward compatible: no [nasde.plugin]/[[skill]] -> behaviour as before.
ADR-009. 33 new tests; full suite 192 passing; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CVE audit (pip-audit --strict) started failing on every branch once
CVE-2026-45409 (idna <3.15) and CVE-2026-44431 / CVE-2026-44432
(urllib3 <2.7.0) were published — both are deep transitive deps via
harbor/opik/supabase. Pin them directly, mirroring the existing
litellm GHSA pin precedent. No runtime behaviour change; 192 tests
still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hmark

Exercises the new [[skill]] mechanism on the canonical dogfooding
benchmark. claude-nasde-dev-by-ref references the LIVE
.claude/skills/nasde-dev (no copy under variants/), composing with the
benchmark's existing [nasde.source] ref pin.

Validated via the real `nasde run` staging path: the by-reference
staged SKILL.md is byte-identical to the live source, while the
pre-existing copy-into-variant (claude-nasde-dev-full-stack) has
already drifted from it — the exact drift this feature eliminates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eference

Replace the previous demo-variant approach (claude-nasde-dev-by-ref)
with proper migration of the EXISTING variants. Better dogfooding:
shows replacement of the copy-into-variants pattern in place, not
addition alongside it.

For each of the three existing variants (claude-nasde-dev-full-stack,
-with-arch, -with-testing): add [[skill]] pointing at the live
.claude/skills/nasde-dev source, delete the drifted variant copy
under skills/nasde-dev/. python-best-practices / python-testing
copies stay (no live source for them in this repo).

Verified: all three variants now stage nasde-dev sha256 073822c1b52b
(= LIVE), eliminating the drift from the previous ec43ca37e668 copy.
Demo-variant claude-nasde-dev-by-ref deleted (was redundant with the
migration). Full suite still 192 passing; ruff/mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…kill]]

Both faces of ADR-009 validated on real benchmarks before merge:

- [[skill]] on examples/nasde-dev-skill (3 variants migrated): drift
  eliminated, full pipeline run scored 100/125 (0.80) with verifier
  reward=1.
- [nasde.plugin] on SDLC analyze-conversation (motivating downstream):
  snapshot + hand-wired MCP + variant skill-copy all deleted in the
  SDLC repo; full run with baked noesis plugin reached verifier reward=1
  via auto-injected MCP wrapper, matching pre-migration baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Multi-agent code review surfaced bugs across all three faces of the
feature (config, docker staging, registration). All addressed with
regression tests (+15 tests, 192 → 207 passing). Net code delta is
small and surgical; the registration module gained one helper
refactor (multi-server support) and the rest are 1–10 line guards or
quoting fixes.

001 docker.py: _plugin_build_context_dir falls back to env_dir when no
    compose exists. Fixes FileNotFoundError when [nasde.plugin] combines
    with a remote [nasde.source] (no compose is generated for remote URLs).

002 plugin_registration.py: _build_mcp_server → _build_mcp_servers — wire
    ALL servers from .mcp.json, not just the first. Honor per-server env
    field from .mcp.json. Precedence: nasde defaults → plugin .mcp.json env
    → [nasde.plugin].env override.

003 plugin_registration.py + docker.py: _strip_mcp_block /
    _strip_existing_plugin_stage refuse to rewrite when only the BEGIN
    sentinel is present (END hand-deleted). str.partition would otherwise
    silently truncate every section below BEGIN — destroying user-authored
    task.toml content.

005 docker.py + plugin_registration.py: shell-quote install_root in
    RUN cd ... && build and in the MCP wrapper; JSON-array form for
    COPY. Safe for paths with spaces or shell-special characters.

006 runner.py: _refresh_sandbox_files rebuilds sandbox_files from scratch
    each run (authored from _collect_sandbox_files + derived extra).
    Drops the if-extra-non-empty guard so removed [[skill]]/plugin
    entries don't leave stale entries pointing at cleaned-up worktrees.

007 examples/nasde-dev-skill/variants/{full-stack,with-arch,with-testing}/
    variant.toml: translated Polish migration comments to English.

008 plugin_registration.py: stage_skill_dir skips OS junk (.DS_Store,
    Thumbs.db), VCS data (.git/), Python cache (__pycache__/, .venv/,
    *.pyc), editor temp files (*.swp, *~, *.bak, .#*). Mirrors the
    ignore list _stage_plugin_tree already uses. Live developer skill
    dirs referenced via [[skill]] no longer leak workstation state.

010 docker.py: _resolve_plugin_source validates .claude-plugin/plugin.json
    against the RESOLVED path (worktree at ref, when set), not the working
    tree. Restores documented ref semantics: pin to a historical commit
    even when the working tree is mid-refactor.

013 runner.py: _require_homogeneous_plugins fails fast when multiple
    tasks in one project declare different [nasde.plugin] entries.
    Plugin skills register into a variant-wide sandbox; silently merging
    different plugins would contaminate trials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…written sandbox_files

The previous bug-006 fix (_refresh_sandbox_files) overwrote ALL entries
in sandbox_files on every run — regressing the supported use case of
hand-written harbor_config.json (CLAUDE.md documents harbor_config.json
as "Optional: agent import path + sandbox_files mapping"). User-added
entries vanished silently.

New contract: each agent in harbor_config.json carries a
_nasde_derived_keys: [...] meta-list of keys nasde owns. Refresh:

  prev_derived = agent._nasde_derived_keys (default [])
  authored     = _collect_sandbox_files(variant_dir)
  handwritten  = {k: v for k, v in current.items()
                  if k not in prev_derived
                  and authored.get(k) != v}
  sandbox_files = {**extra, **authored, **handwritten}
  _nasde_derived_keys = sorted(extra)

Precedence (right wins in dict merge): handwritten > authored > extra.
Hand-written entries are NEVER on the derived list, so they survive
every refresh. Removed [[skill]]/[nasde.plugin] entries drop cleanly
because they were on the previous derived list.
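
The refresh contract above can be written as a small runnable function. This is a sketch under stated assumptions: the function name and signature are hypothetical (the real code lives in runner._refresh_sandbox_files), and the collision warnings described below are left out.

```python
def refresh_sandbox_files_sketch(
    current: dict[str, str],
    prev_derived: list[str],
    authored: dict[str, str],
    extra: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Return (sandbox_files, _nasde_derived_keys) following the contract above."""
    # Hand-written entries: present in the file, not owned by nasde last run,
    # and not simply the value nasde would author anyway.
    handwritten = {
        key: value
        for key, value in current.items()
        if key not in prev_derived and authored.get(key) != value
    }
    # Right-most wins in a dict merge: handwritten > authored > extra.
    sandbox_files = {**extra, **authored, **handwritten}
    return sandbox_files, sorted(extra)


current = {
    "CLAUDE.md": "variants/v/CLAUDE.md",
    "notes.md": "/host/notes.md",           # added by hand
    "skills/nasde-dev": "/tmp/old-worktree",  # derived on the previous run
}
files, derived = refresh_sandbox_files_sketch(
    current,
    prev_derived=["skills/nasde-dev"],
    authored={"CLAUDE.md": "variants/v/CLAUDE.md"},
    extra={"skills/nasde-dev": "/tmp/new-worktree"},
)
```

In this example the hand-written notes.md entry survives, the derived skill path is updated to the new worktree, and passing an empty extra on a later run would drop the skill entry because it was on the previous derived list.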

Three collision warnings surface duplication that would otherwise be
silent (per user request: silent collisions are VERY much not OK):

  - hand-written ∩ derived  → hand-written wins, WARNING
  - hand-written ∩ authored → hand-written wins, WARNING
  - derived ∩ authored      → authored wins, WARNING

Pydantic's AgentConfig in Harbor 0.6.4 silently drops unknown fields,
so _nasde_derived_keys never reaches the trial layer — it's a
nasde-local persistence concern.

Dead code removed: _merge_sandbox_files (no production call site since
the previous bug-006 fix; two test cases that exercised it deleted —
they tested the dead function, not real behavior).

Edge-case coverage (8 buckets per PR discussion):
  EC1 preserve hand-written across runs
  EC2 drop stale derived after [[skill]] removed
  EC3 update derived host path (new worktree)
  EC4 three collision types (hand∩derived, hand∩authored, derived∩authored)
  EC5 greenfield first run (no harbor_config.json)
  EC6 multiple agents in hand-written config
  EC7 idempotent (run twice → identical file)
  EC8 forward-compat with old harbor_config.json (no _nasde_derived_keys meta)

Net test count: 207 → 214 (+11 new, -2 dead-code tests removed,
-2 tests merged into single EC8 forward-compat coverage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@szjanikowski szjanikowski merged commit 8fd10bd into main May 21, 2026
8 of 9 checks passed
@szjanikowski szjanikowski mentioned this pull request May 21, 2026
7 tasks
szjanikowski added a commit that referenced this pull request May 21, 2026
Promote the [Unreleased] section to [0.4.0] (2026-05-21), add fresh
[Unreleased] anchor, update compare links, and cite #51 / #52 inline
plus in the link-ref table.

Highlights:
- [nasde.plugin] in task.toml — ship a local Claude Code plugin into
  the sandbox with one declaration (ADR-009). (#51)
- Skill-by-reference: [[skill]] array in variant.toml. (#51)
- Fix: variants/<v>/skills/<name>/ now carries references/ and sibling
  files, not just SKILL.md. (#51)
- Security pins: idna>=3.15, urllib3>=2.7.0. (#51)
- CI: pip-audit ignores disputed pyjwt PYSEC-2025-183. (#52)

Co-authored-by: Szymon Janikowski <szymon.janikowski@itlibrium.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>