fix(teams): add agent loop protection — rate limiter + chain depth cap #224

jcenters wants to merge 1 commit into TinyAGI:main
Conversation
Agents in a team could trigger runaway feedback loops by sending each other messages indefinitely. Two mechanisms failed to prevent this:

1. The chatroom fan-out (fixed separately in TinyAGI#220) escaped the conversation tracking system entirely, so totalMessages never incremented and the maxMessages guard never fired.
2. Agent-to-agent @mentions via sendInternalMessage had a maxMessages guard, but the default was 50 — enough for a 5-hour API limit burn before anything stopped.

This PR adds two independent, layered defenses:

**Rate limiter in enqueueMessage (queues.ts)**

Any message where fromAgent is set is agent-generated. Before inserting, count how many agent-to-agent messages the target agent already has queued in the last 60 seconds. If at or above the limit, drop the message and log a [LoopGuard] warning instead of enqueuing. Default: 10 messages/minute/agent. Configurable via settings.json: `"protection": { "max_agent_messages_per_minute": 10 }`

**Conversation chain depth cap (conversation.ts)**

Lower DEFAULT_MAX_CONVERSATION_MESSAGES from 50 to 10. Read the effective value from settings.json at conversation creation time so operators can tune it without a code change: `"protection": { "max_chain_depth": 10 }`

Both limits are independent — the rate limiter catches loops that escape the conversation system (e.g. chatroom messages, new conversations spawned by agents), while the chain depth cap limits depth within a single tracked conversation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
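The enqueue-time guard described above amounts to a sliding-window count over agent-originated messages. A minimal sketch of that decision, as a pure function (names like `shouldDropAgentMessage` and `QueuedMessage` are illustrative, not the actual queues.ts implementation):

```typescript
// Sketch of the LoopGuard decision: an agent-to-agent message is dropped
// when the target agent already received maxPerMinute such messages
// within the last 60 seconds. All names here are hypothetical.

interface QueuedMessage {
  fromAgent?: string; // set only on agent-generated messages
  createdAt: number;  // epoch millis
}

const RATE_WINDOW_MS = 60_000;

function shouldDropAgentMessage(
  queued: QueuedMessage[],
  now: number,
  maxPerMinute = 10,
): boolean {
  // Human messages (no fromAgent) never count toward the limit.
  const recent = queued.filter(
    (m) => m.fromAgent !== undefined && m.createdAt > now - RATE_WINDOW_MS,
  ).length;
  return recent >= maxPerMinute;
}
```

In the real code this count comes from a SQL query rather than an in-memory array, but the threshold logic is the same: the 10th recent agent message is the last one accepted.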
Greptile Summary

This PR introduces two layered protections against runaway agent feedback loops in team conversations: a per-agent rate limiter in the message queue (capping agent-to-agent messages at 10/minute by default) and a reduced default conversation chain depth (50 → 10), both configurable via settings.json.

Key findings:
Confidence Score: 2/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent enqueues message\nvia enqueueMessage] --> B{data.fromAgent set?}
    B -- No: human message --> F[Insert into DB\nno rate check]
    B -- Yes: agent-generated --> C[Query DB:\nCOUNT pending+processing messages\nfor target agent in last 60s]
    C --> D{cnt >= maxPerMinute\ndefault 10}
    D -- Yes --> E["[LoopGuard] Drop + log WARN\nreturn null"]
    D -- No --> F
    F --> G[Message in queue]
    G --> H[Agent processes message\nvia handleTeamResponse]
    H --> I{conversationId present\nand tracked?}
    I -- Yes --> J{totalMessages < maxMessages\ndefault 10}
    J -- Yes --> K[Extract @mentions\nenqueueInternalMessage]
    K --> G
    J -- No --> L["Log WARN: hit max messages\ndo not enqueue further mentions"]
    L --> M[Complete conversation]
    I -- No: chatroom / escaped --> N[Only rate limiter\nprovides protection]
    N --> C
    style E fill:#f66,color:#fff
    style L fill:#f90,color:#fff
    style N fill:#f90,color:#fff
```

Last reviewed commit: d3f2d53
```ts
const recent = getDb().prepare(
  `SELECT COUNT(*) as cnt FROM messages
   WHERE agent=? AND from_agent IS NOT NULL
   AND created_at > ? AND status IN ('pending','processing')`
).get(targetAgent, now - RATE_WINDOW_MS) as { cnt: number };
```
Rate limiter misses already-completed messages
The status IN ('pending','processing') filter means that if an agent processes messages quickly (each LLM call finishes before the next message is checked), those completed rows are excluded from the count. In a fast chatroom loop — the primary scenario this guard is designed to catch — messages could be enqueued, processed, and completed well within the 60-second window, keeping the pending/processing count at 0 or 1 and allowing the loop to bypass the limiter entirely.
Since pruneCompletedMessages defaults to a 24-hour retention window, completed messages within the last 60 seconds are still available in the table. Removing the status filter gives an accurate picture of the actual send rate regardless of processing speed:
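The undercount is easy to reproduce with a toy in-memory model: when every recent message has already reached a completed state, a pending/processing-only count sees nothing, while an unfiltered count reflects the true send rate. (Statuses and shapes below are illustrative, not the actual schema.)

```typescript
// Toy illustration of the review finding: a fast loop completes messages
// within the 60s window, so filtering by status hides the real rate.

type Status = 'pending' | 'processing' | 'completed';

interface Row {
  createdAt: number; // epoch millis
  status: Status;
}

const WINDOW_MS = 60_000;

function countInWindow(rows: Row[], now: number, pendingOnly: boolean): number {
  return rows.filter(
    (r) =>
      r.createdAt > now - WINDOW_MS &&
      (!pendingOnly || r.status === 'pending' || r.status === 'processing'),
  ).length;
}

// A fast loop: 30 agent messages sent in the last 60s, all already completed.
const demoNow = 120_000;
const demoRows: Row[] = Array.from({ length: 30 }, (_, i) => ({
  createdAt: demoNow - i * 1_000,
  status: 'completed' as Status,
}));
```

With the status filter the count is 0 and the limiter never fires; without it, all 30 recent sends are visible.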
Suggested change:

```diff
 const recent = getDb().prepare(
   `SELECT COUNT(*) as cnt FROM messages
    WHERE agent=? AND from_agent IS NOT NULL
-   AND created_at > ? AND status IN ('pending','processing')`
+   AND created_at > ?`
 ).get(targetAgent, now - RATE_WINDOW_MS) as { cnt: number };
```
```ts
const settings = getSettings();
const maxPerMinute = settings.protection?.max_agent_messages_per_minute ?? DEFAULT_MAX_AGENT_MESSAGES_PER_MINUTE;
const targetAgent = data.agent ?? 'default';
```
getSettings() called on every agent message enqueue
getSettings() is invoked for every agent-originated message. Depending on the implementation in ./config, this could involve a file read or JSON parse on each call. In a burst scenario (e.g., chatroom fan-out to many teammates), this gets called for every single enqueue. Consider caching or passing the settings in from the call site to avoid repeated I/O on the hot path.
Problem

Agents in a team can trigger runaway feedback loops that exhaust your API budget in minutes. Two failure modes:

1. The chatroom fan-out (fixed separately in TinyAGI#220) escaped the `maxMessages` guard by not carrying a `conversationId`. The guard only fires for in-conversation `@mention` chains.
2. Agent-to-agent `@mentions` via `sendInternalMessage` had a `maxMessages` guard, but the default of 50 allowed a long API burn before anything stopped.

Two layered fixes

**1. Rate limiter in `enqueueMessage` (`packages/core/src/queues.ts`)**

Any message with `fromAgent` set was generated by an agent, not a human. Before inserting, count how many agent-to-agent messages the target agent already has queued in the last 60 seconds. If at or above the limit, drop and log a `[LoopGuard]` warning.

This catches loops that escape the conversation system entirely — chatroom messages, new conversations spawned by agents, anything without a `conversationId`.

Default: 10 messages/minute/agent. Configurable: `{ "protection": { "max_agent_messages_per_minute": 10 } }`

**2. Lower default chain depth (`packages/teams/src/conversation.ts`)**

`DEFAULT_MAX_CONVERSATION_MESSAGES`: 50 → 10. Read from `settings.json` at conversation creation time: `{ "protection": { "max_chain_depth": 10 } }`

Both limits are independent — the rate limiter is a hard floor for anything that escapes conversation tracking; the chain depth cap limits depth within a tracked conversation.

Test plan

- A looping team conversation stops at `max_chain_depth` (default 10)
- Excess agent-to-agent messages are dropped with `[LoopGuard]` in logs
- `settings.json` overrides respected: `max_agent_messages_per_minute: 20` raises the rate limit
- Human messages (no `fromAgent`) are never rate-limited

🤖 Generated with Claude Code
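The override behavior exercised in the test plan reduces to a nullish-coalescing fallback over the `protection` block. A minimal sketch, assuming the settings shape shown in the PR's config snippets (the helper names are hypothetical):

```typescript
// Sketch of the settings fallback: a configured value in settings.json
// wins; otherwise the compiled-in default applies. Shape assumed from
// the "protection" snippets in the PR description.

interface Protection {
  max_agent_messages_per_minute?: number;
  max_chain_depth?: number;
}

interface SettingsShape {
  protection?: Protection;
}

const DEFAULT_MAX_AGENT_MESSAGES_PER_MINUTE = 10;
const DEFAULT_MAX_CONVERSATION_MESSAGES = 10;

function effectiveRateLimit(s: SettingsShape): number {
  return s.protection?.max_agent_messages_per_minute
    ?? DEFAULT_MAX_AGENT_MESSAGES_PER_MINUTE;
}

function effectiveChainDepth(s: SettingsShape): number {
  return s.protection?.max_chain_depth ?? DEFAULT_MAX_CONVERSATION_MESSAGES;
}
```

Using `??` rather than `||` matters here: an explicit `0` in settings.json would be honored instead of silently falling back to the default.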