fix(creator): raise max_history_messages + repair refresh-cache CI#91
Merged
Creator Hand's async video_generate path polls video_status every 15-20s
until completion (1-3 min typical), consuming ~5-15 turns per video
request. Combined workflows (video + TTS + music) plus normal back-and-
forth cross the kernel default of 40 messages quickly, which surfaced
in user logs as:
WARN run_agent_loop: Trimming old messages at safe turn boundary
agent=creator:creator-hand total_messages=41 trimming=2
INFO run_agent_loop: prompt cache metrics for turn
hit_ratio=0.0 creation=0 read=0
Every turn was hitting the trim cap and invalidating the prompt-cache
prefix. 80 covers ~30 polling iterations plus a comfortable pre-context
window without runaway memory growth. Other hands keep the default 40.
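As a sketch, the override described above could look like the following in `hands/creator/HAND.toml` (only `max_history_messages = 80` and the `[agents.main]` table are confirmed by this PR; surrounding keys are omitted):

```toml
# hands/creator/HAND.toml — sketch of the addition described in this PR
[agents.main]
# Raise above the kernel default (40): async video polling alone can
# consume ~30 iterations per request, and trimming at the cap
# invalidates the prompt-cache prefix on every turn.
max_history_messages = 80
```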
Branch protection on `main` started rejecting the workflow's auto-commit with GH006 "Changes must be made through a pull request" — see run 25632824585 on 2026-05-10 against commit 6785807 (the first push that hit the tightened protection).

Direct push is precisely what the file's own security comment (#1) warns against ("Compromised maintainer pushes a malicious plugins-index.json directly to main. Mitigation: GitHub branch protection on main requires PR review"), so the fix preserves that gate rather than working around it.

The workflow now creates a short-lived `automation/refresh-indexes-<sha>` branch, commits the regen there, pushes, and opens a PR back to main via `gh pr create`. Maintainers see a one-click squash-merge.

Permissions: add `pull-requests: write` to the existing `contents: write` so `gh pr create` can be authorised through the default GITHUB_TOKEN.

The post-merge run on the index PR is a no-op (no diff under `hands/**`, `plugins/**`, etc. between consecutive states), so no `[skip ci]` marker is needed and no loop is possible.

Without this fix, every content PR landing on main leaves plugins-index.json + registry-index.json stale, blocking new agents and hands from reaching daemons until a maintainer manually regenerates.
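A sketch of what the branch-and-PR step could look like (the step name, commit message, and exact file list are illustrative assumptions; the branch naming, `gh pr create`, and the GITHUB_TOKEN wiring are as the PR describes):

```yaml
# Sketch — replaces the former direct `git push` to main.
- name: Open index-refresh PR
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # needs contents + pull-requests write
  run: |
    branch="automation/refresh-indexes-${GITHUB_SHA::7}"
    git checkout -b "$branch"
    git add plugins-index.json registry-index.json
    # Post-merge runs regenerate an identical index: nothing staged, no PR, no loop.
    git diff --cached --quiet && exit 0
    git commit -m "chore: refresh registry indexes"
    git push origin "$branch"
    gh pr create --base main --head "$branch" \
      --title "chore: refresh registry indexes" \
      --body "Automated regen; squash-merge after review."
```

The matching `permissions:` block would carry `contents: write` plus the newly added `pull-requests: write`, per the PR description.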
Three hand coordinators have workflows that routinely exceed the kernel default history cap in a single user turn:

- researcher (max_iterations=80) — deep web_search → web_fetch → summarize loops with multi-source synthesis. 80 iterations × ~4 messages each → 200+ messages per user turn. Set to 120.
- devops (max_iterations=60) — incident response and CI/CD fan out into long shell_exec chains (logs, retries, post-mortems). Set to 80.
- predictor (max_iterations=60) — long reasoning chains accumulating signals across many web/knowledge queries, with scheduled re-checks referring back. Set to 80.

Creator's existing override comment is rephrased to "raise above the kernel default" so it stays correct regardless of the order in which this PR and the upstream kernel-default bump (librefang side) land.

Other hands (lead/linkedin/reddit/clip/analytics/apitester/browser/collector/strategist) stay on the kernel default; the upstream bump covers them.
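Assuming the three overrides mirror the existing creator-hand one (the file path and `[agents.main]` table are assumptions; only the values come from this PR), the researcher change could look like:

```toml
# hands/researcher/HAND.toml — sketch; devops and predictor get 80 the same way
[agents.main]
# 80 iterations × ~4 messages each can exceed 200 messages per user turn.
max_history_messages = 120
```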
houko added a commit that referenced this pull request on May 12, 2026
…#87)

All 32 agent manifests and 17 hands shipped with empty mcp_servers / skills lists, which the kernel interprets as "no filter" — every globally-configured MCP server's tools and every installed skill get injected into the prompt on every LLM call. On a typical instance (9 MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input tokens per turn spent on definitions the agent never uses.

Changes
-------
32 agents/*/agent.toml:
- mcp_servers: 1-4 per agent. memory wherever state persists across turns; fetch / exa-search / brave-search only where the prompt actually calls for web; git / github / filesystem on engineering agents; gmail / google-calendar / linear / jira on productivity agents whose prompts mention them.
- skills: per-role allowlist driven by what the system_prompt names (e.g. coder → rust/python/typescript/git/shell-scripting; devops-lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/sysadmin). Generalists (assistant) and low-surface agents keep skills = [].
- max_history_messages: tiered by workload shape —
    40  hello-world, recipe-assistant, health-tracker, home-automation (short conversational agents)
    60  writer, translator, doc-writer, email-assistant, customer-support, sales-assistant, recruiter, social-media, personal-finance, tutor, travel-planner, meeting-assistant, ops, devops-lead, planner (single-turn task agents)
    80  coder, debugger, architect, code-reviewer, test-engineer, security-auditor, analyst, data-scientist, academic-researcher, researcher, legal-assistant (multi-step, tool-heavy work)
    120 assistant, orchestrator (long multi-agent coordination — prompt-cache continuity is critical)
  All four tiers sit at or above the rising kernel default (40 today, rising upstream). Pinning lower would thrash the prompt cache; #91 already demonstrated this on the creator hand.

17 hands/*/HAND.toml:
- hand-level mcp_servers / skills now declared on every hand, so every [agents.*] inside inherits a sensible allowlist.
- skills_disabled = true on clip and creator (pure media pipelines that don't benefit from any skill).
- devteam: expand existing mcp_servers = ["github"] to include memory / git / filesystem; populate skills with the expected dev-team expertise (replacing the placeholder skills = []).
- wiki: replace placeholder mcp_servers = [] with [memory, fetch, filesystem]. Hand-level skills stays [] so the existing per-agent override (agents.main.skills = ["wiki-librarian"]) keeps its meaning.

schema.toml: register mcp_servers / skills / max_history_messages on the agent field schema so machine consumers (RegistrySchema in librefang-types) see the new top-level fields.

agents/README.md: update example block to show the allowlists, plus a max_history_messages override commented out with a prompt-cache caveat. "Adding a New Agent" checklist now calls out the allowlists.

Why not adopt PR #89's approach
-------------------------------
PR #89 covers similar ground but with three issues this PR avoids:
1. PR #89 uses mcp_servers = ["_none"] as a sentinel for "no MCPs", explicitly pending upstream librefang#4808 (mcp_disabled). Shipping a magic-string today means coming back later to clean it up after the upstream field lands. This PR omits "_none" and uses real allowlists where MCPs are needed.
2. PR #89 sets max_history_messages to 8 / 12 / 15 / 20 — barely above the kernel's MIN_HISTORY_MESSAGES floor (4) and far below today's default (40). Every turn that hits the cap invalidates the cached prompt prefix, costing more than the smaller history saves. The values in this PR (40-120) align with #91's direction for long-workflow hands.
3. PR #89 also doubles several max_llm_tokens_per_hour caps (coder 200k→500k, assistant 300k→500k). That widens the per-agent budget — the opposite direction from #87's "reduce per-call cost" goal — and is left to the operator's instance-specific tuning.

Refs #87, #89
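As an illustration of the per-agent allowlists, an engineering agent's manifest could end up looking like this (the tier value and field names are from the commit message; this specific server/skill combination for `coder` is an assumption):

```toml
# agents/coder/agent.toml — sketch of the new top-level allowlist fields
mcp_servers = ["memory", "git", "github", "filesystem"]  # engineering agent
skills = ["rust", "python", "typescript", "git", "shell-scripting"]
max_history_messages = 80  # multi-step, tool-heavy tier
```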
houko added a commit that referenced this pull request on May 12, 2026
…#87) (#92)

All 32 agent manifests and 17 hands shipped with empty mcp_servers / skills lists, which the kernel interprets as "no filter" — every globally-configured MCP server's tools and every installed skill get injected into the prompt on every LLM call. On a typical instance (9 MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input tokens per turn spent on definitions the agent never uses.

Changes
-------
32 agents/*/agent.toml:
- mcp_servers: 1-4 per agent. memory wherever state persists across turns; fetch / exa-search / brave-search only where the prompt actually calls for web; git / github / filesystem on engineering agents; gmail / google-calendar / linear / jira on productivity agents whose prompts mention them.
- skills: per-role allowlist driven by what the system_prompt names (e.g. coder → rust/python/typescript/git/shell-scripting; devops-lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/sysadmin). Generalists (assistant) keep skills = [] (see "Open items" below).
- skills_disabled = true on the four short-conversational agents (hello-world, recipe-assistant, health-tracker, home-automation). Their system prompts never instruct the LLM to consult any skill, so loading all 60 was pure waste. They also drop the explicit max_history_messages override and inherit the kernel default (60).
- max_history_messages tiered by workload shape:
    60  short conversational (hello-world, recipe, health-tracker, home-automation) — inherits the rising kernel default (`DEFAULT_MAX_HISTORY_MESSAGES = 60`); no override needed.
    60  single-turn task agents (writer, translator, doc-writer, email-assistant, customer-support, sales-assistant, recruiter, social-media, personal-finance, tutor, travel-planner, meeting-assistant, ops, devops-lead, planner) — explicit override at the same value to lock the cap if the kernel default moves again.
    80  multi-step / tool-heavy (coder, debugger, architect, code-reviewer, test-engineer, security-auditor, analyst, data-scientist, academic-researcher, researcher, legal-assistant)
    120 coordinators (assistant, orchestrator) — long multi-agent sessions where prompt-cache continuity is critical
  All values sit at or above the kernel default. Pinning lower would thrash the prompt cache (the failure mode #91 fixed for the creator hand by *raising* the cap, not lowering it).

17 hands/*/HAND.toml:
- hand-level mcp_servers / skills now declared on every hand, so every [agents.*] inside inherits a sensible allowlist.
- skills_disabled = true placed on each [agents.*] inside clip and creator (pure media pipelines that don't benefit from any skill). HandDefinitionRaw in librefang-hands does NOT have a top-level skills_disabled field — declaring it at the hand top level would be silently dropped by serde, so the setting must live on the AgentManifest of each sub-agent role.
- devteam: expand existing mcp_servers = ["github"] to include memory / git / filesystem; populate skills with the expected dev-team expertise (replacing the placeholder skills = []).
- wiki: replace placeholder mcp_servers = [] with [memory, fetch, filesystem]. Hand-level skills stays [].
- lead: hand-level skills was originally [email-writer, writing-coach, interview-prep]; interview-prep is for job-interview preparation, not lead generation. Replaced with data-analyst (used by the qualification-scoring step in the prompt).

schema.toml: register mcp_servers / skills / max_history_messages on the agent field schema so machine consumers (RegistrySchema in librefang-types) see the new top-level fields. The max_history_messages description now points at librefang_runtime::agent_loop::DEFAULT_MAX_HISTORY_MESSAGES (60 today) by name, so the schema doesn't go stale when the constant moves again.

agents/README.md: example block + "Adding a New Agent" checklist mention the allowlists; the max_history_messages example is shown commented out with a prompt-cache caveat.

Open items
----------
`assistant` (the default user-facing agent) keeps `skills = []` deliberately. It is the generalist entry point — capping its skill surface at a small allowlist would defeat its "delegate to any specialist" job. The trade-off is that this single agent still pays the full skill-definition load on every turn; operators who want a strict allowlist for `assistant` can override it after install.

Why not adopt PR #89's approach
-------------------------------
#89 covers similar ground but with three issues this PR avoids:
1. mcp_servers = ["_none"] sentinel. #89's body explicitly notes it's pending upstream librefang#4808 (mcp_disabled). Shipping a magic-string today means coming back later to clean it up. This PR uses real allowlists.
2. max_history_messages = 8 / 12 / 15 / 20. Far below today's kernel default (60) and #91's direction for long-workflow hands (80–120). Every turn that hits the cap invalidates the cached prompt prefix; the cost of cache misses exceeds the saving from shorter history. This PR uses 60–120.
3. Doubling max_llm_tokens_per_hour (coder 200k→500k, assistant 300k→500k) widens the per-agent budget — the opposite direction from #87's "reduce per-call cost" goal. Left to the operator's instance-specific tuning.

Refs #87, #89
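Because `HandDefinitionRaw` has no top-level `skills_disabled` field, the flag has to sit on each `[agents.*]` table; a minimal sketch (the role name `main` is illustrative):

```toml
# hands/clip/HAND.toml — sketch
[agents.main]
# A top-level skills_disabled would be silently dropped by serde;
# it must live on each sub-agent role's manifest.
skills_disabled = true
```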
Two related fixes shipped together because the second blocks the first from reaching users.
Summary
1. `hands/creator/HAND.toml` — `max_history_messages = 80` on `[agents.main]`

   Creator Hand's async `video_generate` flow polls `video_status` every 15-20s until completion (1-3 min typical), consuming 5-15 turns per video request. Combined workflows (video + TTS + music) plus normal back-and-forth cross the kernel default of 40 messages quickly.

   User log (real-world observation):

       WARN run_agent_loop: Trimming old messages at safe turn boundary agent=creator:creator-hand total_messages=41 trimming=2
       INFO run_agent_loop: prompt cache metrics for turn hit_ratio=0.0 creation=0 read=0

   Every turn was hitting the trim cap and invalidating the prompt-cache prefix. 80 covers ~30 polling iterations plus a comfortable pre-context window without runaway memory growth. Other hands keep the default 40.
2. `.github/workflows/refresh-cache.yml` — open a PR instead of `git push` to main

   Branch protection on `main` started rejecting the workflow's auto-commit with GH006 "Changes must be made through a pull request" — see run 25632824585 on 2026-05-10. Direct push is precisely what the file's own security comment (mitigation #1) warns against ("Compromised maintainer pushes a malicious plugins-index.json directly to main. Mitigation: GitHub branch protection on main requires PR review"), so this PR preserves that gate rather than working around it.

   The workflow now:
   - creates a short-lived `automation/refresh-indexes-<sha>` branch, commits the regen there, and pushes
   - opens a PR back to `main` via `gh pr create`

   Maintainers see a one-click squash-merge.

   Permissions: added `pull-requests: write` to the existing `contents: write` so `gh pr create` is authorised through the default GITHUB_TOKEN.

   No loop risk: the post-merge run on the index PR is a no-op (no diff under `hands/**`, `plugins/**`, etc. between consecutive states), so no `[skip ci]` marker is needed.

Why bundled

Without the workflow fix, the creator-hand `max_history_messages` change would land on `main` but `plugins-index.json` would stay stale (last successful regen: 2026-05-04). Daemons fetching the index wouldn't see the new value. The same blast radius applies to every other content PR currently waiting on `main`.

Verification
- `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/refresh-cache.yml'))"` → OK
- `hands/creator/HAND.toml`: 7-line addition (config + comment), no other changes
- `permissions:` block

Test plan

- A change under `hands/**` etc. triggers refresh-cache; verify it opens a fresh `automation/refresh-indexes-<sha>` PR (not a red GH006).
- Verify `plugins-index.json` and `registry-index.json` reflect the new content, and https://stats.librefang.ai/api/registry/refresh returns 200.

Once this lands and a fresh refresh-cache run succeeds, the user-side fix is for daemons running Creator Hand to pull the regenerated index (or restart). No code change in librefang is required for this half of the PR.
hands/**etc. triggers refresh-cache; verify it opens a freshautomation/refresh-indexes-<sha>PR (not red GH006).plugins-index.jsonandregistry-index.jsonreflect the new content, andhttps://stats.librefang.ai/api/registry/refreshreturns 200.Once this lands and a fresh refresh-cache run succeeds, the user-side fix is for daemons running Creator Hand to pull the regenerated index (or restart). No code change in librefang is required for this half of the PR.