fix(agents,hands): per-agent/per-hand mcp_servers / skills allowlists (#87)#92
Merged
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
f9a9866 to
e5645b8
Compare
…#87) All 32 agent manifests and 17 hands shipped with empty mcp_servers / skills lists, which the kernel interprets as "no filter" — every globally-configured MCP server's tools and every installed skill get injected into the prompt on every LLM call. On a typical instance (9 MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input tokens per turn spent on definitions the agent never uses. Changes ------- 32 agents/*/agent.toml: - mcp_servers: 1-4 per agent. memory wherever state persists across turns; fetch / exa-search / brave-search only where the prompt actually calls for web; git / github / filesystem on engineering agents; gmail / google-calendar / linear / jira on productivity agents whose prompts mention them. - skills: per-role allowlist driven by what the system_prompt names (e.g. coder → rust/python/typescript/git/shell-scripting; devops- lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/ sysadmin). Generalists (assistant) keep skills = [] (see "Open items" below). - skills_disabled = true on the four short-conversational agents (hello-world, recipe-assistant, health-tracker, home-automation). Their system prompts never instruct the LLM to consult any skill, so loading all 60 was pure waste. They also drop the explicit max_history_messages override and inherit the kernel default (60). - max_history_messages tiered by workload shape: 60 short conversational (hello-world, recipe, health-tracker, home-automation) — inherits the rising kernel default (`DEFAULT_MAX_HISTORY_MESSAGES = 60`); no override needed. 60 single-turn task agents (writer, translator, doc-writer, email-assistant, customer-support, sales-assistant, recruit- er, social-media, personal-finance, tutor, travel-planner, meeting-assistant, ops, devops-lead, planner) — explicit override at the same value to lock the cap if the kernel default moves again. 80 multi-step / tool-heavy (coder, debugger, architect, code- reviewer, test-engineer, security-auditor, analyst, data- scientist, academic-researcher, researcher, legal-assistant) 120 coordinators (assistant, orchestrator) — long multi-agent sessions where prompt-cache continuity is critical All values sit at or above the kernel default. Pinning lower would thrash the prompt cache (the failure mode #91 fixed for the creator hand by *raising* the cap, not lowering it). 17 hands/*/HAND.toml: - hand-level mcp_servers / skills now declared on every hand, so every [agents.*] inside inherits a sensible allowlist. - skills_disabled = true placed on each [agents.*] inside clip and creator (pure media pipelines that don't benefit from any skill). HandDefinitionRaw in librefang-hands does NOT have a top-level skills_disabled field — declaring it at the hand top level would be silently dropped by serde, so the setting must live on the AgentManifest of each sub-agent role. - devteam: expand existing mcp_servers = ["github"] to include memory / git / filesystem; populate skills with the expected dev-team expertise (replacing the placeholder skills = []). - wiki: replace placeholder mcp_servers = [] with [memory, fetch, filesystem]. Hand-level skills stays []. - lead: hand-level skills was originally [email-writer, writing- coach, interview-prep]; interview-prep is for job-interview preparation, not lead generation. Replaced with data-analyst (used by the qualification-scoring step in the prompt). schema.toml: register mcp_servers / skills / max_history_messages on the agent field schema so machine consumers (RegistrySchema in librefang-types) see the new top-level fields. The max_history_messages description now points at librefang_runtime::agent_loop::DEFAULT_MAX_HISTORY_MESSAGES (60 today) by name, so the schema doesn't go stale when the constant moves again. agents/README.md: example block + "Adding a New Agent" checklist mention the allowlists; max_history_messages example is shown commented out with a prompt-cache caveat. Open items ---------- `assistant` (the default user-facing agent) keeps `skills = []` deliberately. It is the generalist entry point — capping its skill surface at a small allowlist would defeat its "delegate to any specialist" job. The trade-off is that this single agent still pays the full skill-definition load on every turn; operators who want a strict allowlist for `assistant` can override it after install. Why not adopt PR #89's approach ------------------------------- #89 covers similar ground but with three issues this PR avoids: 1. mcp_servers = ["_none"] sentinel. #89's body explicitly notes it's pending upstream librefang#4808 (mcp_disabled). Shipping a magic-string today means coming back later to clean it up. This PR uses real allowlists. 2. max_history_messages = 8 / 12 / 15 / 20. Far below today's kernel default (60) and #91's direction for long-workflow hands (80–120). Every turn that hits the cap invalidates the cached prompt prefix; the cost of cache misses exceeds the saving from shorter history. This PR uses 60–120. 3. Doubling max_llm_tokens_per_hour (coder 200k→500k, assistant 300k→500k) widens the per-agent budget — the opposite direction from #87's "reduce per-call cost" goal. Left to the operator's instance-specific tuning. Refs #87, #89
e5645b8 to
f13c9b9
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #87. Alternative to / supersedes #89 — see "Why not #89" below.
Summary
All 32 agent manifests and 17 hands ship with empty
mcp_servers/skillslists, which the kernel interprets as "no filter" — every globally-configured MCP server's tools and every installed skill get injected into the prompt on every LLM call. On a typical instance (9 MCP servers ≈ 85 MCP tools + ~82 built-in tools) that's ~50k input tokens per turn spent on definitions the agent never uses.What changed (51 files, +462/-8)
32
agents/*/agent.tomlmcp_servers— 1–4 per agent (e.g.coder → [memory, git, github, filesystem],devops-lead → [memory, git, github, filesystem],meeting-assistant → [memory, google-calendar, gmail],hello-world → [memory, fetch]).skills— per-role allowlist driven by the system prompt (e.g.devops-lead → [docker, kubernetes, terraform, ansible, ci-cd, helm, prometheus, sysadmin]).skills_disabled = trueon the four pure-conversational agents (hello-world,recipe-assistant,health-tracker,home-automation). Their prompts never instruct the LLM to consult any skill, so loading all 60 was waste.max_history_messages— tiered, all values at or above the rising kernel default (DEFAULT_MAX_HISTORY_MESSAGES = 60):Pinning below the kernel default would thrash the prompt cache, the failure mode #91 fixed for the creator hand by raising the cap.
17
hands/*/HAND.tomlmcp_servers+skills(inherited by every[agents.*]inside).clip+creator:skills_disabled = trueplaced on each[agents.*]sub-table, not at the hand top level.HandDefinitionRawinlibrefang-handsdoesn't carry a top-levelskills_disabledfield — a hand-level entry would be silently dropped by serde — so the setting must live on theAgentManifestof each sub-agent.devteamextends its existingmcp_servers = ["github"]and replaces the placeholderskills = []with the expected dev-team expertise.wikireplaces placeholdermcp_servers = []with[memory, fetch, filesystem].lead: hand-levelskillswas originally[email-writer, writing-coach, interview-prep];interview-prepis for job-interview prep, not lead generation. Replaced withdata-analyst(used by the qualification-scoring step).schema.toml+agents/README.mdmcp_servers/skills/max_history_messageson the agent field schema (soRegistrySchemainlibrefang-typessees them).max_history_messagesdescription points atlibrefang_runtime::agent_loop::DEFAULT_MAX_HISTORY_MESSAGESby name (60 today) so the schema doesn't go stale when the constant moves again.max_history_messagesexample is shown commented out with a prompt-cache caveat.Open items / explicit trade-offs
assistant.skills = [](no allowlist) — kept deliberately.assistantis the default generalist entry point; capping its skill surface at a small allowlist would defeat its "delegate to any specialist" job. The cost is that this single agent still pays the full skill-definition load on every turn. Operators who want a strict allowlist forassistantcan override it after install. Open to a reviewer call here.Estimated impact (per #87)
Why not adopt #89
#89 covers similar ground but with three issues this PR avoids:
mcp_servers = ["_none"]sentinel — feat(agents,hands): per-agent skills, mcp_servers, profile, budget, and history limits #89's PR body explicitly notes it's pending upstreammcp_disabled(librefang#4808). This PR uses real allowlists.max_history_messages = 8 / 12 / 15 / 20— far below today's kernel default of 60 (raised from 40 on 2026-05-11) and fix(creator): raise max_history_messages + repair refresh-cache CI #91's direction for long-workflow hands (80–120). Every turn that hits the cap invalidates the cached prompt prefix; the cost of cache misses exceeds the saving from shorter history. This PR uses 60–120.max_llm_tokens_per_hourwidens the per-agent budget — the opposite direction from Agent manifests missing mcp_servers / skills / max_history_messages — 50k+ tokens wasted per call #87's "reduce per-call cost" goal. Left to the operator.Verification
python scripts/validate.py→ PASSED (32 agents, 17 hands, 33 mcp, 60 skills, 11 plugins, 267 models, 57 providers).find . -name '*.toml' | xargs taplo fmt --check→ clean.mcp_serversentry resolves to anmcp/*.toml, everyskillsentry resolves to askills/*/directory (only pre-existing exception:agents.main.skills = ["wiki-librarian"]inhands/wiki/HAND.toml, untouched by this PR, references an out-of-tree skill).crates/librefang-types/src/agent.rs::AgentManifest(mcp_servers, skills, max_history_messages, skills_disabled — all top-level) andcrates/librefang-hands/src/lib.rs::HandDefinitionRaw(mcp_servers, skills present; skills_disabled NOT present → must go on sub-agents).Notes for reviewers
tools = […],shell = […], and the verbs/services its system prompt mentions. If your instance routinely needs an MCP server I didn't include for a given role, please call it out inline so we widen that one list rather than collapsing back to[].scripts/validate.pydoes not cross-checkmcp_servers/skillsreferences againstmcp/andskills/inventories. Theinterview-preptypo inleadand thewiki-librariandangling reference would have been caught by a 20-line check.