
Skills, File System, Console capabilities via pydantic-ai vs custom artifact store. #88

@NISH1001

Description


pydantic-ai is the direction we will take, so we want to minimize custom implementation.

Claude helped me work through the design comparing our ArtifactStore (custom implementation) vs pydantic-ai capabilities. Verdict + plan below.

Scope: read-only for skills/artifacts (knowledge inputs), read+write for per-session workspaces (ephemeral local — papers, plots, generated files). Agent-driven writes back to GitHub remain out of scope.


ArtifactStore vs pydantic-ai capabilities — finalized verdict

Scope

In akd-ext, agents need to:

  1. Read skills/artifacts (knowledge inputs) — bundled with the package and/or hosted on a public GitHub repo. Read-only.
  2. Read + write a per-session workspace — ephemeral local directory where the agent produces outputs (papers, plots, generated files) the user can inspect at end of chat.

Agent-driven writes back to GitHub (publishing produced artifacts back to a repo) are explicitly out of scope for this plan.

(akd-ext nomenclature: what pydantic-ai-skills calls a "skill" we call an "artifact"; same shape, just naming. We use the terms interchangeably below; on disk both are SKILL.md files with YAML frontmatter.)

Verdict

ArtifactStore is fully redundant. Adopt pydantic-ai-skills for the read side and pydantic-ai-backend for the session-workspace read/write side. Both responsibilities map cleanly onto pydantic-ai capabilities; neither requires custom infrastructure. The work-in-progress on feature/github-store (GitHubArtifactStore) is paused — neither merged nor deleted — because GitHub access in akd-ext is read-only via GitSkillsRegistry, and writes to GitHub are out of scope. Immediate priority: migrate closed_loop_cm1 agents off their eager system-prompt context injection (12k+ lines of cm1_readme.md per call) to SkillsCapability's progressive disclosure, for the token win.

Three needs → two pydantic-ai capabilities

| Agent need | pydantic-ai answer | Custom code |
|---|---|---|
| Read skills/artifacts from local dirs (e.g. closed_loop_cm1/context/*.md bundled in the package) | SkillsCapability(directories=[...]) from pydantic-ai-skills | None |
| Read skills/artifacts from a public GitHub repo | SkillsCapability(registries=[GitSkillsRegistry(repo_url=...)]): clones, caches, supports auth, version pinning | None |
| Read + write a per-session workspace (papers, plots, generated files) | ConsoleCapability(backend=LocalBackend(root_dir=session_path), permissions=DEFAULT_RULESET) from pydantic-ai-backend | None (per-session dir lifecycle is already handled by pydantic-ai-backend's SessionManager) |

Both SkillsCapability configurations can coexist on a single agent. The ConsoleCapability composes alongside them — they're orthogonal: skills are loaded knowledge (immutable inputs), session workspace is generated output (ephemeral, local-only). The agent sees one unified set of tools.

SkillsCapability ships read-only progressive disclosure: at construction it injects skill name + description into the system prompt (cheap), then exposes list_skills / load_skill(name) / read_skill_resource(...) / run_skill_script(...) tools so the model fetches full content only when relevant.
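To make progressive disclosure concrete, here is a minimal, self-contained sketch of the pattern. It is not pydantic-ai-skills' implementation: an index pass keeps only each SKILL.md's name and description for the system prompt, and the full body is read only when a load_skill-style call asks for it. All names here (SkillStub, parse_frontmatter, discover, system_prompt_summary, load_skill) are hypothetical, and the frontmatter parser handles only flat "key: value" lines; a real loader would use a YAML parser.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillStub:
    """What the index pass keeps per skill: no body, just enough to advertise it."""
    name: str
    description: str
    path: Path


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter, body). Flat 'key: value' lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the skill body.
            return meta, "\n".join(lines[i + 1:]).lstrip("\n")
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing '---': treat the whole file as body


def discover(skills_dir: Path) -> list[SkillStub]:
    """Index pass: keep only name + description from each <skill>/SKILL.md."""
    stubs = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta, _ = parse_frontmatter(skill_md.read_text())
        stubs.append(SkillStub(meta["name"], meta["description"], skill_md))
    return stubs


def system_prompt_summary(stubs: list[SkillStub]) -> str:
    """The only skill text placed in every system prompt: one line per skill."""
    return "\n".join(f"- {s.name}: {s.description}" for s in stubs)


def load_skill(stubs: list[SkillStub], name: str) -> str:
    """On-demand pass: return the full body only when the model asks for it."""
    for s in stubs:
        if s.name == name:
            return parse_frontmatter(s.path.read_text())[1]
    raise KeyError(f"unknown skill: {name}")


# Usage: a throwaway skills dir containing one large skill.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "cm1-readme").mkdir()
    (root / "cm1-readme" / "SKILL.md").write_text(
        "---\nname: cm1-readme\ndescription: CM1 model documentation.\n---\n\n"
        "# CM1\n" + "long body line\n" * 3
    )
    stubs = discover(root)
    summary = system_prompt_summary(stubs)   # cheap, goes in every prompt
    body = load_skill(stubs, "cm1-readme")   # expensive, fetched on demand
```

The point of the split is that the per-call cost scales with the number of skills (one short line each), not with the size of their bodies.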

ConsoleCapability ships ls / read / write / edit / glob / grep filtered by permission rulesets (READONLY / DEFAULT / STRICT / PERMISSIVE). With LocalBackend(root_dir=session_path), the agent is sandboxed inside the session workspace — it cannot read or write outside of it.
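The sandboxing guarantee comes down to a path-containment check. The sketch below shows one common way to enforce it: resolve the requested path against the workspace root, then require the result to stay under that root. It illustrates the technique and is not LocalBackend's actual code; resolve_in_root and SandboxViolation are hypothetical names.

```python
import tempfile
from pathlib import Path


class SandboxViolation(PermissionError):
    """Raised when a requested path resolves outside the workspace root."""


def resolve_in_root(root_dir: Path, requested: str) -> Path:
    """Map an agent-supplied path onto root_dir, refusing anything that escapes it."""
    root = root_dir.resolve()
    # Joining an absolute path discards root; resolve() collapses '..' and
    # follows existing symlinks, so these escape routes all end up outside root.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise SandboxViolation(f"{requested!r} escapes {root}")
    return candidate


# Usage: one legitimate path and three escape attempts.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    inside = resolve_in_root(root, "plots/fig1.png")
    blocked = []
    for attempt in ("../secrets.txt", "/etc/passwd", "plots/../../x"):
        try:
            resolve_in_root(root, attempt)
        except SandboxViolation:
            blocked.append(attempt)
```

Resolving before comparing is the important design choice: a naive string-prefix check on the unresolved path would accept "plots/../../x".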

Why ArtifactStore is fully redundant — two facts

Fact 1: A real, present skills use case is burning tokens today

akd_ext/agents/closed_loop_cm1/capability_feasibility_mapper.py:369-374 (on feature/pydantic_ai_base_agent):

cluster_it_context: str = Field(
    default_factory=lambda: (Path(__file__).parent / "context" / "cluster_it.md").read_text(),
)
cm1_readme_context: str = Field(
    default_factory=lambda: (Path(__file__).parent / "context" / "cm1_readme.md").read_text(),
)

_create_agent (lines 436-442) concatenates both into agent.instructions on every construction. cm1_readme.md is 12,115 lines, and all of it is injected into the system prompt on every model call. This is precisely the failure mode SkillsCapability's progressive disclosure was designed to fix.

Fact 2: ArtifactStore has zero call sites

Cross-branch grep for ArtifactStore, GitHubArtifact, LocalArtifact: hits only inside akd_ext/artifacts/ itself. No agent imports it. No tool consumes it. No system prompt builder uses __str__(). It's purely speculative infrastructure.
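That search is easy to make reproducible. This hypothetical sketch scans one checkout for the same three substrings (a plain substring match, like grep) and reports every hit outside akd_ext/artifacts/. "Cross-branch" would mean running it once per branch checkout; alternatively, git grep accepts revisions to search directly.

```python
import re
import tempfile
from pathlib import Path

# Same three substrings as the manual grep (plain substring match).
PATTERN = re.compile(r"ArtifactStore|GitHubArtifact|LocalArtifact")


def find_call_sites(repo_root: Path, exclude: str = "akd_ext/artifacts") -> list[tuple[str, int, str]]:
    """Return (path, line number, matched text) for every hit outside `exclude`."""
    hits = []
    for py in sorted(repo_root.rglob("*.py")):
        rel = py.relative_to(repo_root).as_posix()
        if rel == exclude or rel.startswith(exclude + "/"):
            continue  # definitions inside the package itself are not call sites
        for lineno, line in enumerate(py.read_text(encoding="utf-8").splitlines(), start=1):
            match = PATTERN.search(line)
            if match:
                hits.append((rel, lineno, match.group(0)))
    return hits


# Usage: a toy tree where the only mention lives inside akd_ext/artifacts/.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "akd_ext/artifacts").mkdir(parents=True)
    (root / "akd_ext/artifacts/base.py").write_text("class ArtifactStore: ...\n")
    (root / "akd_ext/agents").mkdir(parents=True)
    (root / "akd_ext/agents/mapper.py").write_text("from pathlib import Path\n")
    hits = find_call_sites(root)  # empty list means zero call sites
```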

Dependency assessment (verified, not skimmed)

pydantic-ai-skills (Doug Trajano) — safe to depend on

  • 261 ⭐, 21 forks, v0.8.0 released April 21, 2026
  • 17 issues total, 0 open — every reported issue closed
  • 20 PRs merged including headline features: SkillsCapability, GitSkillsRegistry, auto_reload, generic SkillsToolset[Any]
  • Recent commits: responsive iteration on real user reports
  • Single maintainer — real risk, but the data format (markdown + YAML frontmatter) is fully portable; rebuild cost if abandoned is ~1 day for a homegrown ~150 LOC AbstractCapability subclass

pydantic-ai-backend (Vstorm) — safe to depend on

  • 83 ⭐, 19 forks, v0.2.5 released April 20, 2026
  • 3 issues total, 0 open
  • Multi-contributor (Kacper Włodarczyk, DEENUU1, ilayu, community PRs) — better bus factor than skills
  • Recent activity: docker+daytona session manager, async protocol, sandbox sessions
  • Used here only for local-filesystem ConsoleCapability (per-session workspace). No GitHub backend, no external network.

Both are alpha-stage but actively maintained. No abandonment signals.

Action sequence

Prerequisite: Merge feature/pydantic_ai_base_agent

Everything below builds on the pydantic-ai foundation. Until that branch lands on develop (or whatever your integration branch is), the capability work has no place to live. Confirm mergeability and ship it first.

Step 1 — Adopt SkillsCapability for closed_loop_cm1, local-only (priority: token win, ~1 day)

On a new branch off the now-merged pydantic-ai base:

  • pyproject.toml: add pydantic-ai-skills>=0.8.0
  • Convert akd_ext/agents/closed_loop_cm1/context/cluster_it.md and cm1_readme.md into SKILL.md format with YAML frontmatter:
    ---
    name: cluster-it-infrastructure
    description: NCAR/Frontera cluster compute resources, scheduling, storage layout for HPC feasibility analysis.
    ---
    
    <existing markdown body>
    
    Each lives at e.g. akd_ext/agents/closed_loop_cm1/skills/cluster-it-infrastructure/SKILL.md.
  • akd_ext/agents/closed_loop_cm1/capability_feasibility_mapper.py:
    • Delete cluster_it_context and cm1_readme_context config fields (lines 369-376)
    • Delete the extra += concatenation in _create_agent (lines 436-442)
    • Add SkillsCapability(directories=[Path(__file__).parent / "skills"]) to the agent's capability list. Read _base/pydantic_ai/_capabilities.py first to find the composition site.
  • Apply the same pattern to peers in closed_loop_cm1/: experiment_implementation.py, workflow_spec_builder.py, research_report_generator.py, interpretation_paper_assembly.py.
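The frontmatter conversion is mechanical enough to script. A minimal sketch, assuming skill names follow the lowercase-hyphenated style of the example above and that the loader parses frontmatter as YAML; to_skill is a hypothetical helper. Unlike the hand-written example, it writes the description as a double-quoted string so that punctuation such as colons cannot break the frontmatter.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumption: skill names use the lowercase-hyphenated style shown above.
_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def to_skill(src: Path, skills_dir: Path, name: str, description: str) -> Path:
    """Write src's markdown to skills_dir/<name>/SKILL.md behind YAML frontmatter."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"skill name must be lowercase-hyphenated: {name!r}")
    # json.dumps yields a double-quoted string; YAML 1.2 is a JSON superset, so
    # colons or quotes in the description cannot break the frontmatter.
    frontmatter = f"---\nname: {name}\ndescription: {json.dumps(description)}\n---\n\n"
    dest = skills_dir / name / "SKILL.md"
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(frontmatter + src.read_text(encoding="utf-8"), encoding="utf-8")
    return dest


# Usage: convert one context file into skills/<name>/SKILL.md.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    src = root / "cluster_it.md"
    src.write_text("# Cluster\nbody\n", encoding="utf-8")
    out = to_skill(src, root / "skills", "cluster-it-infrastructure",
                   "HPC resources: scheduling, storage.")
    rendered = out.read_text(encoding="utf-8")
```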

Verification:

  • uv run pytest tests/agents/test_capability_feasibility_mapper.py -v (existing tests; may need updates if they assert on the deleted config fields)
  • Run a representative query against the agent. Confirm via debug logging that cm1_readme.md is not in the initial system prompt, that it is listed by list_skills, and that load_skill("cm1-readme") returns its content when called.
  • Compare token counts before/after on a representative input — the win should be visible
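For a quick before/after estimate without calling a model, a character-count heuristic is enough to show the order of magnitude. This sketch assumes about 4 characters per token (a common rule of thumb for English text, not the tokenizer of any particular model) and multiplies by the number of model requests in a run, since the system prompt is re-sent with each request. approx_tokens and prompt_savings are hypothetical helpers; provider-reported usage should be used for real numbers.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate. Assumption: ~4 characters per token."""
    return (len(text) + 3) // 4  # ceiling division


def prompt_savings(before_prompt: str, after_prompt: str, model_requests: int = 1) -> dict[str, int]:
    """Estimated system-prompt tokens per run, before vs. after, and the difference.

    The system prompt is re-sent on every model request in a run, so the
    per-request estimate is multiplied by model_requests.
    """
    before = approx_tokens(before_prompt) * model_requests
    after = approx_tokens(after_prompt) * model_requests
    return {"before": before, "after": after, "saved": before - after}


# Usage with synthetic stand-ins (not the real files): a 400-char base prompt,
# a 40,000-char inlined README, and an 80-char skill summary line.
base, readme, summary = "A" * 400, "B" * 40_000, "C" * 80
estimate = prompt_savings(base + readme, base + summary, model_requests=5)
# estimate == {"before": 50500, "after": 600, "saved": 49900}
```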

Step 2 — Session-workspace ConsoleCapability (when first agent produces file outputs)

When the first agent needs to write outputs the user wants to inspect (likely research_report_generator or interpretation_paper_assembly):

  • pyproject.toml: add pydantic-ai-backend>=0.2.5
  • Decide on session-workspace lifecycle: per-chat-session temp dir vs. per-user persistent dir vs. pydantic-ai-backend's SessionManager. Probably: a small wrapper around SessionManager that scopes a workspace dir to the chat session and exposes the path to the agent at run time.
  • Wire into agents that produce outputs:
    capabilities=[
        SkillsCapability(directories=[...]),   # from Step 1
        ConsoleCapability(
            backend=LocalBackend(root_dir=session_workspace_path),
            permissions=DEFAULT_RULESET,
        ),
    ]
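One possible shape for the per-session directory itself is a context manager that creates the directory and yields the path to pass as LocalBackend(root_dir=...). This is a hypothetical sketch: session_workspace is not part of pydantic-ai-backend, it does not use SessionManager, and the keep flag is an assumed policy (keep outputs for post-chat inspection by default, delete them for throwaway runs).

```python
import re
import shutil
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path

_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


@contextmanager
def session_workspace(base_dir: Path, session_id: str, *, keep: bool = True) -> Iterator[Path]:
    """Yield base_dir/<session_id>, intended as LocalBackend(root_dir=...).

    keep=True leaves outputs on disk for the user to inspect after the chat;
    keep=False deletes the directory on exit.
    """
    # Reject ids like '../x' so a session dir can never land outside base_dir.
    if not _SAFE_ID.fullmatch(session_id):
        raise ValueError(f"unsafe session id: {session_id!r}")
    path = base_dir / session_id
    path.mkdir(parents=True, exist_ok=True)
    try:
        yield path
    finally:
        if not keep:
            shutil.rmtree(path, ignore_errors=True)


# Usage: the file writes stand in for what an agent would do via ConsoleCapability.
with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    with session_workspace(base, "chat-42") as ws:
        (ws / "report.md").write_text("# Report\n")
    kept = (base / "chat-42" / "report.md").exists()
    with session_workspace(base, "scratch", keep=False) as ws:
        (ws / "tmp.txt").write_text("x")
    removed = not (base / "scratch").exists()
```

Validating the session id up front matters because the id becomes a path component; without it, the workspace root itself could be placed outside base_dir.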

Verification:

  • Run an agent that produces a file (e.g. a markdown report) in its workspace. Confirm the file lands at session_workspace_path and is accessible to the user post-run.
  • Confirm sandboxing: any attempt by the agent to read or write outside root_dir should fail.

Step 3 — Add GitSkillsRegistry once a public skills/artifacts repo exists (read-only, ~half day)

When the canonical AKD skills/artifacts repo is set up on GitHub (e.g. NASA-IMPACT/akd-skills or similar), wire it in:

SkillsCapability(
    directories=[Path(__file__).parent / "skills"],   # local fallback
    registries=[GitSkillsRegistry(
        repo_url="https://github.com/NASA-IMPACT/akd-skills",
        path="skills",
        target_dir="./.cache/akd-skills",
        clone_options=GitCloneOptions(depth=1, single_branch=True),
    )],
)

Agents pull canonical artifacts from GitHub at startup and use them via the same progressive-disclosure surface as local skills. Read-only — no commit/push semantics.

Verification:

  • Tiny PydanticAIBaseAgent instantiation against the public repo. Confirm list_skills reports the GitHub-hosted artifacts, load_skill(...) returns content. Skip CI runs unless network is allowed.

Step 4 — Park feature/github-store

Don't merge. Don't delete. The PyGithub plumbing in akd_ext/artifacts/stores/github.py:14-210 (sha-aware writes, fast-path no-op detection, github_client= injection pattern) is good code — keep the branch as a parking lot. If a GitHub-write use case ever lands later (e.g. CARE agents publishing produced artifacts back to a repo), this is the implementation foundation to revive. Until then, leave it dormant.

Scope note

This plan covers read-only skills/artifact loading plus read-write session workspaces. Agent-driven writes to GitHub are intentionally out of scope here — defer that decision to a separate plan when a concrete write use case lands.

The mechanical work in Step 1 is well-scoped and ready to execute as a regular task once the pydantic-ai base is merged. Steps 2 and 3 unblock when their respective use cases / infrastructure are ready.
