v1 uses Anthropic's ephemeral cache (5-minute TTL). Rate-limit retries during a run can push the gap between calls past the TTL, so the cached prefix expires and the next call pays for a re-cache write. This was observed twice in the v4 run on a 71-obligation document. Moving to a longer-TTL cache tier, or restructuring to keep call cadence inside the TTL, would improve the hit rate from 97% to ~100% on long runs. Anthropic's docs note when extended-TTL caching is available; check current pricing before committing. Tracked for v2.
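One way to see when a retry is about to cost a re-cache write is to track the time since the cached prefix was last touched. The sketch below is illustrative only: `CacheClock`, `record_call`, and `seconds_left` are hypothetical names, not part of v1 or of any Anthropic SDK. It assumes the TTL window restarts on every successful call that reads or writes the cache, and that rate-limited attempts do not restart it; both assumptions should be checked against the current docs.

```python
from dataclasses import dataclass
from typing import Optional

EPHEMERAL_TTL_SECONDS = 5 * 60  # the 5-minute TTL described above


@dataclass
class CacheClock:
    """Hypothetical helper: tracks when the cached prefix was last touched."""

    ttl: float = EPHEMERAL_TTL_SECONDS
    last_touch: Optional[float] = None  # None until the first successful call
    expected_rewrites: int = 0          # calls that likely paid for a re-cache write

    def seconds_left(self, now: float) -> float:
        """Time remaining before the cached prefix is assumed to expire."""
        if self.last_touch is None:
            return 0.0
        return max(0.0, self.ttl - (now - self.last_touch))

    def record_call(self, now: float) -> bool:
        """Record a successful call at `now`; return True if a cache hit is expected."""
        hit = self.last_touch is not None and (now - self.last_touch) < self.ttl
        if self.last_touch is not None and not hit:
            # The gap exceeded the TTL, so this call probably rewrote the cache.
            self.expected_rewrites += 1
        self.last_touch = now  # assumption: a successful call restarts the TTL window
        return hit
```

In a retry loop, comparing the planned backoff delay against `seconds_left(now)` before sleeping would flag retries that are likely to outlast the TTL. Timestamps are passed in by the caller (for example from `time.monotonic()`), which keeps the helper easy to test.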