feat(agents,hands): per-agent skills, mcp_servers, profile, budget, and history limits#89
Conversation
… history limits
All 32 agent manifests now declare explicit allowlists instead of
inheriting everything globally:
- profile: ToolProfile (Minimal/Coding/Research/Messaging/Automation/Full)
- skills: explicit skill allowlist per role (empty=all only for ops)
- mcp_servers: per-agent MCP allowlist ("_none" sentinel for agents
that need zero MCPs, pending upstream mcp_disabled field)
- max_history_messages: 8/12/15/20 by role
- max_llm_tokens_per_hour: light=100k, standard=200k, heavy=500k
Reduces prompt token overhead from ~50k (167 tools for everyone) to
role-appropriate subsets. Addresses librefang-registry#87.
All 17 hand definitions now declare explicit allowlists: - skills: role-appropriate skill list per hand category - skills_disabled: true for content-only hands (clip, creator) - mcp_servers: specific servers or "_none" sentinel Hands with browser needs (collector, researcher, browser) get puppeteer. Dev hands (devops, devteam) get git+filesystem. Communication hands (linkedin, reddit, twitter) get writing-coach + web-search only.
…dded Addresses gaps found by architecture audit: agents missing skills their role requires (e.g. data-scientist missing web-search, debugger missing rust-expert, devops-lead missing web-search, recruiter missing interview-prep). Two MCP assignments corrected: email-assistant gets gmail, meeting-assistant gets google-calendar.
houko
left a comment
There was a problem hiding this comment.
The intent — slashing the ~50k tokens of universally-broadcast tool definitions per call — is sound and well-evidenced in #87. The per-role audit is mostly careful work. But two issues should block merge and a few smaller ones deserve callouts.
⛔ Blocking
B1. mcp_servers = ["_none"] works only by name-collision accident
The PR body acknowledges _none is a sentinel "pending upstream mcp_disabled field #4808". I traced it through the kernel — nothing special-cases the string _none. It works only because:
tools_and_skills.rs:194-216filters MCP tools by matching the resolved server name against the allowlist; no server happens to be named_none, so every tool falls through.mcp_summary.rs:53does the samecontinuein the prompt-summary path.
The day someone publishes an MCP server named _none (nothing prevents it — normalize_name accepts underscores), every "_none"-gated agent silently grants access to it. This kind of bug doesn't fail in CI; it leaks privilege quietly.
Recommendation: wait for librefang/librefang#4808 (mcp_disabled: bool) and use the proper field. If you really want to ship now, add a kernel-side guard that rejects \"_none\" as a server name and a manifest-load test that fails the build if any TOML uses the sentinel without #4808 also being merged.
B2. ops/agent.toml: max_llm_tokens_per_hour: 50000 → 0
ResourceQuota::effective_token_limit() (crates/librefang-types/src/agent.rs:537) treats Some(0) as unlimited, not zero. ops is the periodic system agent (cron every 5m) with skills = [] (= all skills) and profile = \"full\". Flipping its hourly cap from 50k to unlimited turns it into a runaway-cost vector if any LLM-tool loop hangs. The PR description says "unlimited=0" without flagging that ops is the one agent inheriting it.
Recommendation: keep a finite cap (the description's "heavy=500k" tier is already a 10× bump) or call this out explicitly in both the PR body and the manifest comment.
⚠️ Significant
S1. profile = \"...\" is dead config on every audited agent
Per tools_and_skills.rs:87-88,123-150, profile is only consulted when capabilities.tools is empty or contains \"*\". I spot-checked assistant (declares 7 specific tools, profile=automation) and ops (declares 4 specific tools, profile=full) — both have non-empty, non-wildcard tool lists, so the profile branch is never reached. Looking at the diff, all 32 agents in this PR already have non-empty capabilities.tools blocks, so the new profile field is cosmetic across the board.
Either drop the field, or set capabilities.tools = [\"*\"] and let the profile do the work — but pick one model. Two sources of truth will drift.
S2. Three agents had their token cap lowered — surface this in the PR body
| Agent | Before | After | Δ |
|---|---|---|---|
meeting-assistant |
150k | 100k | −50k |
personal-finance |
150k | 100k | −50k |
translator |
200k | 100k | −100k |
The body claims net savings per call (true), but these are per-hour caps — an existing user running near the previous limit will start hitting the new floor. The body says "light=100k, standard=200k, heavy=500k" but doesn't enumerate which agents got demoted a tier. List them, or leave them at the prior cap.
Minor
assistant: 300k → 500k. Inconsistent with the "save tokens" framing. Why?hands/devteamMCP allowlist broadened ([\"github\"]→[\"git\", \"filesystem\", \"github\"]). Hand allowlists intersect with agent allowlists so it's defensible, but the PR title is "per-agent allowlists" — a one-liner about hand-side widening helps reviewers.[\"_none\"]+skills_disabled = trueon clip/creator/wiki is redundant. The boolean is the load-bearing thing; the string list is decorative noise. Drop the list.travel-planner: profile = \"custom\"—CustomandFullboth expand to[\"*\"]. Indistinguishable from omitting the field.- No registry-side schema test. Misspelling a skill name silently no-ops. A test that loads each TOML, parses it as
AgentManifest, and cross-checks everyskills[*]/mcp_servers[*]against the canonical list would catch typos before they reach operators. Worth adding alongside this PR or as a follow-up. - List ordering:
skills = [...]andmcp_servers = [...]are not alphabetized. The kernel sorts before prompt injection (#3298 invariant), so this isn't a cache-hit bug — but a sort-on-write check protects against the day someone re-implements normalization.
What's good
- Per-role allowlist taxonomy is sensible.
web-searchfor research roles,git+filesystemfor dev roles,_none(sigh) for content/legal/personal-finance. skills_disabled = trueon the two content-only hands is the right model — those don't go through the LLM tool path.- The audit commit ("46 missing skills added") shows the author re-checked their own work; that's reassuring.
max_history_messagestiered by role (8/12/15/20) is sensible —MIN_HISTORY_MESSAGES = 4floor in the kernel keeps anyone from going below 4.
Summary
Two blockers (_none sentinel safety + ops unlimited tokens), one significant cleanup (profile is dead config on all 32 agents), three callouts the PR body should add. Underlying audit work is solid and the token impact is real — worth getting in once B1/B2 are addressed.
…#87) All 32 agent manifests and 17 hands shipped with empty mcp_servers / skills lists, which the kernel interprets as "no filter" — every globally-configured MCP server's tools and every installed skill get injected into the prompt on every LLM call. On a typical instance (9 MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input tokens per turn spent on definitions the agent never uses. Changes ------- 32 agents/*/agent.toml: - mcp_servers: 1-4 per agent. memory wherever state persists across turns; fetch / exa-search / brave-search only where the prompt actually calls for web; git / github / filesystem on engineering agents; gmail / google-calendar / linear / jira on productivity agents whose prompts mention them. - skills: per-role allowlist driven by what the system_prompt names (e.g. coder → rust/python/typescript/git/shell-scripting; devops- lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/ sysadmin). Generalists (assistant) and low-surface agents keep skills = []. - max_history_messages: tiered by workload shape — 40 hello-world, recipe-assistant, health-tracker, home-automation (short conversational agents) 60 writer, translator, doc-writer, email-assistant, customer- support, sales-assistant, recruiter, social-media, personal- finance, tutor, travel-planner, meeting-assistant, ops, devops-lead, planner (single-turn task agents) 80 coder, debugger, architect, code-reviewer, test-engineer, security-auditor, analyst, data-scientist, academic- researcher, researcher, legal-assistant (multi-step, tool- heavy work) 120 assistant, orchestrator (long multi-agent coordination — prompt-cache continuity is critical) All four tiers sit at or above the rising kernel default (40 today, rising upstream). Pinning lower would thrash the prompt cache; #91 already demonstrated this on the creator hand. 17 hands/*/HAND.toml: - hand-level mcp_servers / skills now declared on every hand, so every [agents.*] inside inherits a sensible allowlist. - skills_disabled = true on clip and creator (pure media pipelines that don't benefit from any skill). - devteam: expand existing mcp_servers = ["github"] to include memory / git / filesystem; populate skills with the expected dev-team expertise (replacing the placeholder skills = []). - wiki: replace placeholder mcp_servers = [] with [memory, fetch, filesystem]. Hand-level skills stays [] so the existing per- agent override (agents.main.skills = ["wiki-librarian"]) keeps its meaning. schema.toml: register mcp_servers / skills / max_history_messages on the agent field schema so machine consumers (RegistrySchema in librefang-types) see the new top-level fields. agents/README.md: update example block to show the allowlists, plus a max_history_messages override commented out with a prompt-cache caveat. "Adding a New Agent" checklist now calls out the allowlists. Why not adopt PR #89's approach ------------------------------- PR #89 covers similar ground but with three issues this PR avoids: 1. PR #89 uses mcp_servers = ["_none"] as a sentinel for "no MCPs", explicitly pending upstream librefang#4808 (mcp_disabled). Shipping a magic-string today means coming back later to clean it up after the upstream field lands. This PR omits "_none" and uses real allowlists where MCPs are needed. 2. PR #89 sets max_history_messages to 8 / 12 / 15 / 20 — at or below the kernel's MIN_HISTORY_MESSAGES floor (4) and far below today's default (40). Every turn that hits the cap invalidates the cached prompt prefix, costing more than the smaller history saves. The values in this PR (40-120) align with #91's direction for long-workflow hands. 3. PR #89 also doubles several max_llm_tokens_per_hour caps (coder 200k→500k, assistant 300k→500k). That widens the per-agent budget — the opposite direction from #87's "reduce per-call cost" goal — and is left to the operator's instance-specific tuning. Refs #87, #89
…#87) All 32 agent manifests and 17 hands shipped with empty mcp_servers / skills lists, which the kernel interprets as "no filter" — every globally-configured MCP server's tools and every installed skill get injected into the prompt on every LLM call. On a typical instance (9 MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input tokens per turn spent on definitions the agent never uses. Changes ------- 32 agents/*/agent.toml: - mcp_servers: 1-4 per agent. memory wherever state persists across turns; fetch / exa-search / brave-search only where the prompt actually calls for web; git / github / filesystem on engineering agents; gmail / google-calendar / linear / jira on productivity agents whose prompts mention them. - skills: per-role allowlist driven by what the system_prompt names (e.g. coder → rust/python/typescript/git/shell-scripting; devops- lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/ sysadmin). Generalists (assistant) keep skills = [] (see "Open items" below). - skills_disabled = true on the four short-conversational agents (hello-world, recipe-assistant, health-tracker, home-automation). Their system prompts never instruct the LLM to consult any skill, so loading all 60 was pure waste. They also drop the explicit max_history_messages override and inherit the kernel default (60). - max_history_messages tiered by workload shape: 60 short conversational (hello-world, recipe, health-tracker, home-automation) — inherits the rising kernel default (`DEFAULT_MAX_HISTORY_MESSAGES = 60`); no override needed. 60 single-turn task agents (writer, translator, doc-writer, email-assistant, customer-support, sales-assistant, recruit- er, social-media, personal-finance, tutor, travel-planner, meeting-assistant, ops, devops-lead, planner) — explicit override at the same value to lock the cap if the kernel default moves again. 80 multi-step / tool-heavy (coder, debugger, architect, code- reviewer, test-engineer, security-auditor, analyst, data- scientist, academic-researcher, researcher, legal-assistant) 120 coordinators (assistant, orchestrator) — long multi-agent sessions where prompt-cache continuity is critical All values sit at or above the kernel default. Pinning lower would thrash the prompt cache (the failure mode #91 fixed for the creator hand by *raising* the cap, not lowering it). 17 hands/*/HAND.toml: - hand-level mcp_servers / skills now declared on every hand, so every [agents.*] inside inherits a sensible allowlist. - skills_disabled = true placed on each [agents.*] inside clip and creator (pure media pipelines that don't benefit from any skill). HandDefinitionRaw in librefang-hands does NOT have a top-level skills_disabled field — declaring it at the hand top level would be silently dropped by serde, so the setting must live on the AgentManifest of each sub-agent role. - devteam: expand existing mcp_servers = ["github"] to include memory / git / filesystem; populate skills with the expected dev-team expertise (replacing the placeholder skills = []). - wiki: replace placeholder mcp_servers = [] with [memory, fetch, filesystem]. Hand-level skills stays []. - lead: hand-level skills was originally [email-writer, writing- coach, interview-prep]; interview-prep is for job-interview preparation, not lead generation. Replaced with data-analyst (used by the qualification-scoring step in the prompt). schema.toml: register mcp_servers / skills / max_history_messages on the agent field schema so machine consumers (RegistrySchema in librefang-types) see the new top-level fields. The max_history_messages description now points at librefang_runtime::agent_loop::DEFAULT_MAX_HISTORY_MESSAGES (60 today) by name, so the schema doesn't go stale when the constant moves again. agents/README.md: example block + "Adding a New Agent" checklist mention the allowlists; max_history_messages example is shown commented out with a prompt-cache caveat. Open items ---------- `assistant` (the default user-facing agent) keeps `skills = []` deliberately. It is the generalist entry point — capping its skill surface at a small allowlist would defeat its "delegate to any specialist" job. The trade-off is that this single agent still pays the full skill-definition load on every turn; operators who want a strict allowlist for `assistant` can override it after install. Why not adopt PR #89's approach ------------------------------- #89 covers similar ground but with three issues this PR avoids: 1. mcp_servers = ["_none"] sentinel. #89's body explicitly notes it's pending upstream librefang#4808 (mcp_disabled). Shipping a magic-string today means coming back later to clean it up. This PR uses real allowlists. 2. max_history_messages = 8 / 12 / 15 / 20. Far below today's kernel default (60) and #91's direction for long-workflow hands (80–120). Every turn that hits the cap invalidates the cached prompt prefix; the cost of cache misses exceeds the saving from shorter history. This PR uses 60–120. 3. Doubling max_llm_tokens_per_hour (coder 200k→500k, assistant 300k→500k) widens the per-agent budget — the opposite direction from #87's "reduce per-call cost" goal. Left to the operator's instance-specific tuning. Refs #87, #89
…#87) (#92) All 32 agent manifests and 17 hands shipped with empty mcp_servers / skills lists, which the kernel interprets as "no filter" — every globally-configured MCP server's tools and every installed skill get injected into the prompt on every LLM call. On a typical instance (9 MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input tokens per turn spent on definitions the agent never uses. Changes ------- 32 agents/*/agent.toml: - mcp_servers: 1-4 per agent. memory wherever state persists across turns; fetch / exa-search / brave-search only where the prompt actually calls for web; git / github / filesystem on engineering agents; gmail / google-calendar / linear / jira on productivity agents whose prompts mention them. - skills: per-role allowlist driven by what the system_prompt names (e.g. coder → rust/python/typescript/git/shell-scripting; devops- lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/ sysadmin). Generalists (assistant) keep skills = [] (see "Open items" below). - skills_disabled = true on the four short-conversational agents (hello-world, recipe-assistant, health-tracker, home-automation). Their system prompts never instruct the LLM to consult any skill, so loading all 60 was pure waste. They also drop the explicit max_history_messages override and inherit the kernel default (60). - max_history_messages tiered by workload shape: 60 short conversational (hello-world, recipe, health-tracker, home-automation) — inherits the rising kernel default (`DEFAULT_MAX_HISTORY_MESSAGES = 60`); no override needed. 60 single-turn task agents (writer, translator, doc-writer, email-assistant, customer-support, sales-assistant, recruit- er, social-media, personal-finance, tutor, travel-planner, meeting-assistant, ops, devops-lead, planner) — explicit override at the same value to lock the cap if the kernel default moves again. 80 multi-step / tool-heavy (coder, debugger, architect, code- reviewer, test-engineer, security-auditor, analyst, data- scientist, academic-researcher, researcher, legal-assistant) 120 coordinators (assistant, orchestrator) — long multi-agent sessions where prompt-cache continuity is critical All values sit at or above the kernel default. Pinning lower would thrash the prompt cache (the failure mode #91 fixed for the creator hand by *raising* the cap, not lowering it). 17 hands/*/HAND.toml: - hand-level mcp_servers / skills now declared on every hand, so every [agents.*] inside inherits a sensible allowlist. - skills_disabled = true placed on each [agents.*] inside clip and creator (pure media pipelines that don't benefit from any skill). HandDefinitionRaw in librefang-hands does NOT have a top-level skills_disabled field — declaring it at the hand top level would be silently dropped by serde, so the setting must live on the AgentManifest of each sub-agent role. - devteam: expand existing mcp_servers = ["github"] to include memory / git / filesystem; populate skills with the expected dev-team expertise (replacing the placeholder skills = []). - wiki: replace placeholder mcp_servers = [] with [memory, fetch, filesystem]. Hand-level skills stays []. - lead: hand-level skills was originally [email-writer, writing- coach, interview-prep]; interview-prep is for job-interview preparation, not lead generation. Replaced with data-analyst (used by the qualification-scoring step in the prompt). schema.toml: register mcp_servers / skills / max_history_messages on the agent field schema so machine consumers (RegistrySchema in librefang-types) see the new top-level fields. The max_history_messages description now points at librefang_runtime::agent_loop::DEFAULT_MAX_HISTORY_MESSAGES (60 today) by name, so the schema doesn't go stale when the constant moves again. agents/README.md: example block + "Adding a New Agent" checklist mention the allowlists; max_history_messages example is shown commented out with a prompt-cache caveat. Open items ---------- `assistant` (the default user-facing agent) keeps `skills = []` deliberately. It is the generalist entry point — capping its skill surface at a small allowlist would defeat its "delegate to any specialist" job. The trade-off is that this single agent still pays the full skill-definition load on every turn; operators who want a strict allowlist for `assistant` can override it after install. Why not adopt PR #89's approach ------------------------------- #89 covers similar ground but with three issues this PR avoids: 1. mcp_servers = ["_none"] sentinel. #89's body explicitly notes it's pending upstream librefang#4808 (mcp_disabled). Shipping a magic-string today means coming back later to clean it up. This PR uses real allowlists. 2. max_history_messages = 8 / 12 / 15 / 20. Far below today's kernel default (60) and #91's direction for long-workflow hands (80–120). Every turn that hits the cap invalidates the cached prompt prefix; the cost of cache misses exceeds the saving from shorter history. This PR uses 60–120. 3. Doubling max_llm_tokens_per_hour (coder 200k→500k, assistant 300k→500k) widens the per-agent budget — the opposite direction from #87's "reduce per-call cost" goal. Left to the operator's instance-specific tuning. Refs #87, #89
|
Thanks for the work here — closing in favor of #92, which addresses #87 with the same intent but lands a few choices differently. Posting the concrete review notes so you know exactly why:
The hand-level |
Summary
Closes #87
All 32 agent manifests and 17 hand definitions now declare explicit allowlists. Previously
skills = []andmcp_servers = []meant "all" — every agent received all 60+ skills and all MCP tools on every LLM call.Changes per agent
profile: ToolProfile (Minimal/Coding/Research/Messaging/Automation/Full)skills: explicit skill allowlist per roleskills_disabled: true for content-only hands (clip, creator)mcp_servers: per-agent MCP allowlist (["_none"]sentinel for zero MCPs, pending upstreammcp_disabledfield #4808)max_history_messages: 8/12/15/20 by rolemax_llm_tokens_per_hour: light=100k, standard=200k, heavy=500k, unlimited=0Token impact
Commits
Related
mcp_disabledfield (needed for proper "no MCP" semantics)