
fix(creator): raise max_history_messages + repair refresh-cache CI#91

Merged
houko merged 3 commits into main from fix/creator-hand-max-history
May 11, 2026

Conversation

@houko
Contributor

@houko houko commented May 11, 2026

Two related fixes shipped together because the second blocks the first from reaching users.

Summary

1. hands/creator/HAND.toml — max_history_messages = 80 on [agents.main]

Creator Hand's async video_generate flow polls video_status every 15-20s until completion (1-3 min typical), consuming 5-15 turns per video request. Combined workflows (video + TTS + music) plus normal back-and-forth cross the kernel default of 40 messages quickly.

User log (real-world observation):

WARN run_agent_loop: Trimming old messages at safe turn boundary
  agent=creator:creator-hand total_messages=41 trimming=2
INFO run_agent_loop: prompt cache metrics for turn
  hit_ratio=0.0 creation=0 read=0

Every turn was hitting the trim cap and invalidating the prompt-cache prefix. 80 covers ~30 polling iterations plus a comfortable pre-context window without runaway memory growth. Other hands keep the default 40.
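The change itself is a one-line override. Shown here in context (the comment text is illustrative; the real file carries its own):

```toml
# hands/creator/HAND.toml
[agents.main]
# Kernel default is 40; async video polling alone can consume 5-15
# turns per request, so raise the cap to keep the prompt-cache prefix
# stable: ~30 polling iterations plus pre-context headroom.
max_history_messages = 80
```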

2. .github/workflows/refresh-cache.yml — open PR instead of git push to main

Branch protection on main started rejecting the workflow's auto-commit with GH006 "Changes must be made through a pull request" — see run 25632824585 on 2026-05-10. Direct push is precisely what the file's own security comment (mitigation #1) warns against ("Compromised maintainer pushes a malicious plugins-index.json directly to main. Mitigation: GitHub branch protection on main requires PR review"), so this PR preserves that gate rather than working around it.

The workflow now:

  • creates a short-lived automation/refresh-indexes-<sha> branch
  • commits the regen there
  • pushes
  • opens a PR back to main via gh pr create

Maintainers see a one-click squash-merge.

Permissions: added pull-requests: write to the existing contents: write so gh pr create is authorised through the default GITHUB_TOKEN.

No loop risk: the post-merge run on the index PR is a no-op (no diff under hands/**, plugins/**, etc. between consecutive states), so no [skip ci] marker is needed.
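A minimal sketch of what the PR-based flow could look like. Step names, the regen command, and the exact script are illustrative, not the actual workflow diff; the branch pattern and permissions come from this PR's description:

```yaml
# Hypothetical sketch; the real step lives in
# .github/workflows/refresh-cache.yml.
permissions:
  contents: write
  pull-requests: write   # required so `gh pr create` works with GITHUB_TOKEN

steps:
  - name: Open PR with regenerated indexes if changed
    env:
      GH_TOKEN: ${{ github.token }}
    run: |
      # No diff means the indexes are already current; exit quietly.
      if git diff --quiet; then
        echo "indexes unchanged, nothing to do"
        exit 0
      fi
      branch="automation/refresh-indexes-${GITHUB_SHA::8}"
      git checkout -b "$branch"
      git commit -am "chore: regenerate plugin/registry indexes"
      git push origin "$branch"
      gh pr create --base main --head "$branch" \
        --title "chore: regenerate indexes" \
        --body "Automated index refresh; squash-merge when green."
```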

Why bundled

Without the workflow fix, the creator-hand max_history_messages change would land on main but plugins-index.json would stay stale (last successful regen: 2026-05-04). Daemons fetching the index wouldn't see the new value. Same blast radius applies to every other content PR currently waiting on main.

Verification

  • python3 -c "import yaml; yaml.safe_load(open('.github/workflows/refresh-cache.yml'))" → OK
  • Diff of hands/creator/HAND.toml: 7-line addition (config + comment), no other changes
  • Workflow diff: only the "Commit regenerated indexes if changed" step (renamed → "Open PR with regenerated indexes if changed") and the permissions: block

Test plan

  • On merge, the next push that touches hands/** etc. triggers refresh-cache; verify it opens a fresh automation/refresh-indexes-<sha> PR instead of failing red with GH006.
  • Auto-opened PR is mergeable and the post-merge run is a no-op.
  • After auto-PR merges, plugins-index.json and registry-index.json reflect the new content, and https://stats.librefang.ai/api/registry/refresh returns 200.

Once this lands and a fresh refresh-cache run succeeds, the user-side fix is for daemons running Creator Hand to pull the regenerated index (or restart). No code change in librefang is required for this half of the PR.

houko added 2 commits May 11, 2026 15:19
Creator Hand's async video_generate path polls video_status every 15-20s
until completion (1-3 min typical), consuming ~5-15 turns per video
request. Combined workflows (video + TTS + music) plus normal back-and-
forth cross the kernel default of 40 messages quickly, which surfaced
in user logs as:

  WARN run_agent_loop: Trimming old messages at safe turn boundary
    agent=creator:creator-hand total_messages=41 trimming=2
  INFO run_agent_loop: prompt cache metrics for turn
    hit_ratio=0.0 creation=0 read=0

Every turn was hitting the trim cap and invalidating the prompt-cache
prefix. 80 covers ~30 polling iterations plus a comfortable pre-context
window without runaway memory growth. Other hands keep the default 40.
Branch protection on `main` started rejecting the workflow's auto-commit
with GH006 "Changes must be made through a pull request" — see run
25632824585 on 2026-05-10 against commit 6785807 (the first push that
hit the tightened protection). Direct push is precisely what the file's
own security comment (#1) warns against ("Compromised maintainer pushes
a malicious plugins-index.json directly to main. Mitigation: GitHub
branch protection on main requires PR review"), so the fix preserves
that gate rather than working around it.

The workflow now creates a short-lived `automation/refresh-indexes-<sha>`
branch, commits the regen there, pushes, and opens a PR back to main
via `gh pr create`. Maintainers see a one-click squash-merge.

Permissions: add `pull-requests: write` to the existing `contents: write`
so `gh pr create` can be authorised through the default GITHUB_TOKEN.

The post-merge run on the index PR is a no-op (no diff under
`hands/**`, `plugins/**`, etc. between consecutive states), so no
`[skip ci]` marker is needed and no loop is possible.

Without this fix, every content PR landing on main leaves
plugins-index.json + registry-index.json stale, blocking new agents and
hands from reaching daemons until a maintainer manually regenerates.

Three hand coordinators have workflows that routinely exceed the kernel
default history cap on a single user turn:

- researcher (max_iterations=80) — deep web_search → web_fetch →
  summarize loops with multi-source synthesis. 80 iterations × ~4
  messages each → 200+ messages per user turn. Set to 120.
- devops    (max_iterations=60) — incident response and CI/CD fan out
  into long shell_exec chains (logs, retries, post-mortems). Set to 80.
- predictor (max_iterations=60) — long reasoning chains accumulating
  signals across many web/knowledge queries, with scheduled re-checks
  referring back. Set to 80.

Creator's existing override is rephrased "raise above the kernel
default" so the comment stays correct regardless of the order this PR
and the upstream kernel-default bump (librefang side) land in.

Other hands (lead/linkedin/reddit/clip/analytics/apitester/browser/
collector/strategist) stay on the kernel default; the upstream bump
covers them.
@houko houko enabled auto-merge (squash) May 11, 2026 06:47
@houko houko disabled auto-merge May 11, 2026 23:52
@houko houko enabled auto-merge (squash) May 11, 2026 23:52
@houko houko disabled auto-merge May 11, 2026 23:55
@houko houko merged commit 651ff1b into main May 11, 2026
3 checks passed
@houko houko deleted the fix/creator-hand-max-history branch May 11, 2026 23:55
houko added a commit that referenced this pull request May 12, 2026
…#87)

All 32 agent manifests and 17 hands shipped with empty mcp_servers /
skills lists, which the kernel interprets as "no filter" — every
globally-configured MCP server's tools and every installed skill get
injected into the prompt on every LLM call. On a typical instance (9
MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input
tokens per turn spent on definitions the agent never uses.
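The ~50k figure is back-of-envelope; a sketch of the arithmetic, assuming a rough average of ~300 tokens per tool definition (the per-tool cost is an assumption here, not a measured value):

```python
# Rough per-turn overhead from injecting every tool definition into
# the prompt. tokens_per_tool is an assumed average, not measured.
mcp_tools = 85       # ~85 MCP tools on a typical 9-server instance
builtin_tools = 82   # ~82 built-in tools
tokens_per_tool = 300  # assumption: JSON schema + description per tool

overhead = (mcp_tools + builtin_tools) * tokens_per_tool
print(overhead)  # on the order of 50k input tokens per LLM call
```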

Changes
-------

32 agents/*/agent.toml:
  - mcp_servers: 1-4 per agent. memory wherever state persists across
    turns; fetch / exa-search / brave-search only where the prompt
    actually calls for web; git / github / filesystem on engineering
    agents; gmail / google-calendar / linear / jira on productivity
    agents whose prompts mention them.
  - skills: per-role allowlist driven by what the system_prompt names
    (e.g. coder → rust/python/typescript/git/shell-scripting; devops-
    lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/
    sysadmin). Generalists (assistant) and low-surface agents keep
    skills = [].
  - max_history_messages: tiered by workload shape —
      40  hello-world, recipe-assistant, health-tracker, home-automation
          (short conversational agents)
      60  writer, translator, doc-writer, email-assistant, customer-
          support, sales-assistant, recruiter, social-media, personal-
          finance, tutor, travel-planner, meeting-assistant, ops,
          devops-lead, planner (single-turn task agents)
      80  coder, debugger, architect, code-reviewer, test-engineer,
          security-auditor, analyst, data-scientist, academic-
          researcher, researcher, legal-assistant (multi-step, tool-
          heavy work)
      120 assistant, orchestrator (long multi-agent coordination —
          prompt-cache continuity is critical)
    All four tiers sit at or above the rising kernel default (40
    today, rising upstream). Pinning lower would thrash the prompt
    cache; #91 already demonstrated this on the creator hand.
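An illustrative agent.toml fragment following the pattern above. This is a hypothetical coder-style manifest, not a verbatim file from the diff; the exact server and skill names per agent are in the diff itself:

```toml
# Hypothetical example of the allowlist shape for an engineering agent.
mcp_servers = ["memory", "git", "github", "filesystem"]
skills = ["rust", "python", "typescript", "git", "shell-scripting"]
max_history_messages = 80  # multi-step / tool-heavy tier
```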

17 hands/*/HAND.toml:
  - hand-level mcp_servers / skills now declared on every hand, so
    every [agents.*] inside inherits a sensible allowlist.
  - skills_disabled = true on clip and creator (pure media pipelines
    that don't benefit from any skill).
  - devteam: expand existing mcp_servers = ["github"] to include
    memory / git / filesystem; populate skills with the expected
    dev-team expertise (replacing the placeholder skills = []).
  - wiki: replace placeholder mcp_servers = [] with [memory, fetch,
    filesystem]. Hand-level skills stays [] so the existing per-
    agent override (agents.main.skills = ["wiki-librarian"]) keeps
    its meaning.

schema.toml: register mcp_servers / skills / max_history_messages on
the agent field schema so machine consumers (RegistrySchema in
librefang-types) see the new top-level fields.

agents/README.md: update example block to show the allowlists, plus
a max_history_messages override commented out with a prompt-cache
caveat. "Adding a New Agent" checklist now calls out the allowlists.

Why not adopt PR #89's approach
-------------------------------

PR #89 covers similar ground but with three issues this PR avoids:

1. PR #89 uses mcp_servers = ["_none"] as a sentinel for "no MCPs",
   explicitly pending upstream librefang#4808 (mcp_disabled). Shipping
   a magic-string today means coming back later to clean it up after
   the upstream field lands. This PR omits "_none" and uses real
   allowlists where MCPs are needed.
2. PR #89 sets max_history_messages to 8 / 12 / 15 / 20 — at or
   below the kernel's MIN_HISTORY_MESSAGES floor (4) and far below
   today's default (40). Every turn that hits the cap invalidates
   the cached prompt prefix, costing more than the smaller history
   saves. The values in this PR (40-120) align with #91's direction
   for long-workflow hands.
3. PR #89 also doubles several max_llm_tokens_per_hour caps (coder
   200k→500k, assistant 300k→500k). That widens the per-agent
   budget — the opposite direction from #87's "reduce per-call cost"
   goal — and is left to the operator's instance-specific tuning.

Refs #87, #89
houko added a commit that referenced this pull request May 12, 2026
…#87)

All 32 agent manifests and 17 hands shipped with empty mcp_servers /
skills lists, which the kernel interprets as "no filter" — every
globally-configured MCP server's tools and every installed skill get
injected into the prompt on every LLM call. On a typical instance (9
MCP servers, ~85 MCP tools + ~82 built-in tools) that's ~50k input
tokens per turn spent on definitions the agent never uses.

Changes
-------

32 agents/*/agent.toml:
  - mcp_servers: 1-4 per agent. memory wherever state persists across
    turns; fetch / exa-search / brave-search only where the prompt
    actually calls for web; git / github / filesystem on engineering
    agents; gmail / google-calendar / linear / jira on productivity
    agents whose prompts mention them.
  - skills: per-role allowlist driven by what the system_prompt names
    (e.g. coder → rust/python/typescript/git/shell-scripting; devops-
    lead → docker/kubernetes/terraform/ansible/ci-cd/helm/prometheus/
    sysadmin). Generalists (assistant) keep skills = [] (see "Open
    items" below).
  - skills_disabled = true on the four short-conversational agents
    (hello-world, recipe-assistant, health-tracker, home-automation).
    Their system prompts never instruct the LLM to consult any skill,
    so loading all 60 was pure waste. They also drop the explicit
    max_history_messages override and inherit the kernel default (60).
  - max_history_messages tiered by workload shape:
      60  short conversational (hello-world, recipe, health-tracker,
          home-automation) — inherits the rising kernel default
          (`DEFAULT_MAX_HISTORY_MESSAGES = 60`); no override needed.
      60  single-turn task agents (writer, translator, doc-writer,
          email-assistant, customer-support, sales-assistant, recruit-
          er, social-media, personal-finance, tutor, travel-planner,
          meeting-assistant, ops, devops-lead, planner) — explicit
          override at the same value to lock the cap if the kernel
          default moves again.
      80  multi-step / tool-heavy (coder, debugger, architect, code-
          reviewer, test-engineer, security-auditor, analyst, data-
          scientist, academic-researcher, researcher, legal-assistant)
      120 coordinators (assistant, orchestrator) — long multi-agent
          sessions where prompt-cache continuity is critical
    All values sit at or above the kernel default. Pinning lower
    would thrash the prompt cache (the failure mode #91 fixed for
    the creator hand by *raising* the cap, not lowering it).

17 hands/*/HAND.toml:
  - hand-level mcp_servers / skills now declared on every hand, so
    every [agents.*] inside inherits a sensible allowlist.
  - skills_disabled = true placed on each [agents.*] inside clip and
    creator (pure media pipelines that don't benefit from any skill).
    HandDefinitionRaw in librefang-hands does NOT have a top-level
    skills_disabled field — declaring it at the hand top level would
    be silently dropped by serde, so the setting must live on the
    AgentManifest of each sub-agent role.
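Because HandDefinitionRaw has no top-level skills_disabled, the flag has to repeat per sub-agent; a sketch (role names are illustrative):

```toml
# hands/creator/HAND.toml — sketch; role names are illustrative.
# A top-level `skills_disabled = true` would be silently dropped by
# serde, so the flag lives on each [agents.*] table instead.
[agents.main]
skills_disabled = true

[agents.editor]
skills_disabled = true
```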
  - devteam: expand existing mcp_servers = ["github"] to include
    memory / git / filesystem; populate skills with the expected
    dev-team expertise (replacing the placeholder skills = []).
  - wiki: replace placeholder mcp_servers = [] with [memory, fetch,
    filesystem]. Hand-level skills stays [].
  - lead: hand-level skills was originally [email-writer, writing-
    coach, interview-prep]; interview-prep is for job-interview
    preparation, not lead generation. Replaced with data-analyst
    (used by the qualification-scoring step in the prompt).

schema.toml: register mcp_servers / skills / max_history_messages on
the agent field schema so machine consumers (RegistrySchema in
librefang-types) see the new top-level fields. The
max_history_messages description now points at
librefang_runtime::agent_loop::DEFAULT_MAX_HISTORY_MESSAGES (60
today) by name, so the schema doesn't go stale when the constant
moves again.

agents/README.md: example block + "Adding a New Agent" checklist
mention the allowlists; max_history_messages example is shown
commented out with a prompt-cache caveat.

Open items
----------

`assistant` (the default user-facing agent) keeps `skills = []`
deliberately. It is the generalist entry point — capping its skill
surface at a small allowlist would defeat its "delegate to any
specialist" job. The trade-off is that this single agent still pays
the full skill-definition load on every turn; operators who want a
strict allowlist for `assistant` can override it after install.

Why not adopt PR #89's approach
-------------------------------

#89 covers similar ground but with three issues this PR avoids:

1. mcp_servers = ["_none"] sentinel. #89's body explicitly notes
   it's pending upstream librefang#4808 (mcp_disabled). Shipping a
   magic-string today means coming back later to clean it up. This
   PR uses real allowlists.
2. max_history_messages = 8 / 12 / 15 / 20. Far below today's
   kernel default (60) and #91's direction for long-workflow hands
   (80–120). Every turn that hits the cap invalidates the cached
   prompt prefix; the cost of cache misses exceeds the saving from
   shorter history. This PR uses 60–120.
3. Doubling max_llm_tokens_per_hour (coder 200k→500k, assistant
   300k→500k) widens the per-agent budget — the opposite direction
   from #87's "reduce per-call cost" goal. Left to the operator's
   instance-specific tuning.

Refs #87, #89
houko added a commit that referenced this pull request May 12, 2026
…#87) (#92)
