Problem
If an LLM provider is slow or hangs, BaseAgent.act() and think() will block indefinitely. There's no timeout, retry, or fallback mechanism.
Proposed Solution
- Add a configurable timeout_seconds parameter to BaseAgent (default: 120s)
- Wrap provider calls in asyncio.wait_for()
- Add exponential backoff retry (max 3 attempts) on transient errors (rate limits, timeouts)
- Emit a structured log entry on each retry and on final failure so the pipeline can report which stage failed and why
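
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `call_with_retry`, `TransientProviderError`, the `stage` label, and the log field names are all hypothetical, and the provider call is assumed to be a zero-argument coroutine function.

```python
import asyncio
import json
import logging
import random

logger = logging.getLogger("agent")


class TransientProviderError(Exception):
    """Hypothetical marker for retryable failures such as rate limits."""


async def call_with_retry(call, *, stage, timeout_seconds=120.0,
                          max_attempts=3, base_delay=1.0):
    """Await call() with a per-attempt timeout and exponential backoff.

    `call` is assumed to be a zero-argument coroutine function that
    performs one provider request.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # wait_for cancels the inner call once the timeout expires.
            return await asyncio.wait_for(call(), timeout=timeout_seconds)
        except (asyncio.TimeoutError, TransientProviderError) as exc:
            final = attempt == max_attempts
            # One JSON line per retry/failure so the pipeline can parse it.
            logger.warning(json.dumps({
                "event": "provider_failure" if final else "provider_retry",
                "stage": stage,
                "attempt": attempt,
                "max_attempts": max_attempts,
                "error": type(exc).__name__,
            }))
            if final:
                raise
            # Backoff doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Small random jitter to avoid synchronized retries.
            await asyncio.sleep(delay + random.uniform(0, base_delay * 0.1))
```

Inside BaseAgent, act() and think() could then route their provider request through something like `await call_with_retry(lambda: provider.complete(prompt), stage="think", timeout_seconds=self.timeout_seconds)`, where `provider.complete` stands in for whatever the real provider method is called. Note that the timeout here applies per attempt, so the worst-case total wait is roughly max_attempts times timeout_seconds plus the backoff delays.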
Bonus
Consider adding a fallback_provider option so agents can degrade gracefully (e.g., try Claude, fall back to local Ollama).
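
One possible shape for the fallback, as a sketch only: `complete_with_fallback` and the `primary`/`fallback` callables are hypothetical names, each assumed to be a coroutine function that takes a prompt and returns text.

```python
import asyncio


async def complete_with_fallback(prompt, primary, fallback=None,
                                 timeout_seconds=120.0):
    """Try the primary provider; on timeout or error, try the fallback.

    `primary` and `fallback` are assumed to be coroutine functions
    taking a prompt string. If no fallback is configured, the original
    error propagates unchanged.
    """
    try:
        return await asyncio.wait_for(primary(prompt), timeout=timeout_seconds)
    except Exception:
        # Broad catch for illustration; a real implementation would
        # probably restrict this to provider and timeout errors.
        if fallback is None:
            raise
        return await asyncio.wait_for(fallback(prompt), timeout=timeout_seconds)
```

Catching `Exception` does not swallow task cancellation, since `asyncio.CancelledError` derives from `BaseException` on Python 3.8 and later. Whether the fallback should kick in only after the primary's retries are exhausted is a design choice left open here.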