Preventive context management for LangChain agents.
When an agent reads ten files in a row, those twenty messages (a tool call plus a result for each read) stay in context long after they've been processed. This middleware quietly collapses the stale results into a single placeholder line, keeping only the most recent one intact. The expensive part (LLM summarization) triggers less often.
No LLM calls. No hallucination risk. Stateless.
```bash
pip install langchain-collapse
```

Agents burn through context by accumulating tool results they've already processed. A typical file-exploration phase (8 reads, 4 greps) can eat thousands of tokens that just sit there. CollapseMiddleware scans for these repetitive groups and replaces the older ones with a short note, keeping the last result visible to the model.
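To make that concrete, here is a hypothetical before/after trace. The message shapes are real LangChain types, but the placeholder wording is illustrative, not the library's exact text:

```python
from langchain_core.messages import AIMessage, ToolMessage

# Before: every file read keeps its full payload in context.
before = [
    AIMessage("", tool_calls=[{"name": "read_file", "args": {"path": "a.py"}, "id": "1"}]),
    ToolMessage("<600 lines of a.py>", tool_call_id="1"),
    AIMessage("", tool_calls=[{"name": "read_file", "args": {"path": "b.py"}, "id": "2"}]),
    ToolMessage("<450 lines of b.py>", tool_call_id="2"),
]

# After: older results in the group shrink to a one-line note;
# the most recent result stays intact.
after = [
    AIMessage("", tool_calls=[{"name": "read_file", "args": {"path": "a.py"}, "id": "1"}]),
    ToolMessage("[collapsed: read_file a.py]", tool_call_id="1"),
    AIMessage("", tool_calls=[{"name": "read_file", "args": {"path": "b.py"}, "id": "2"}]),
    ToolMessage("<450 lines of b.py>", tool_call_id="2"),
]
```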
In the repo's benchmark session (a realistic coding workload), this produces a 92% token reduction. When paired with SummarizationMiddleware, summarization triggers 4.2x later because context fills up more slowly.
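The 92% figure comes from the included benchmark. For a rough check on your own workload, one option is to compare token counts for a session run with and without the middleware. This is a sketch: the prompt, tool list, and helper below are placeholders, and two agent runs never take identical trajectories, so treat the number as an estimate:

```python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain_collapse import CollapseMiddleware

model = init_chat_model("anthropic:claude-sonnet-4-6")
my_tools = [...]  # fill in the tools your agent actually uses

def context_tokens(middleware):
    # Run one session and count the tokens left in the final message list.
    agent = create_agent(model=model, tools=my_tools, middleware=middleware)
    result = agent.invoke({"messages": [("user", "explore this repo and summarize it")]})
    return model.get_num_tokens_from_messages(result["messages"])

baseline = context_tokens([])
collapsed = context_tokens([CollapseMiddleware()])
print(f"token reduction: {1 - collapsed / baseline:.0%}")
```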
```python
from langchain.agents import create_agent
from langchain_collapse import CollapseMiddleware

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[...],
    middleware=[CollapseMiddleware()],
)
```

Place CollapseMiddleware first. It reduces the message count before summarization decides whether to fire:
```python
from langchain.agents.middleware import SummarizationMiddleware

middleware = [
    CollapseMiddleware(),
    SummarizationMiddleware(
        model="anthropic:claude-haiku-4-5-20251001",
        trigger=("fraction", 0.85),
    ),
]
```

All options, shown with their defaults (an example of overriding them follows the links below):

```python
CollapseMiddleware(
    collapse_tools=frozenset({"read_file", "grep", "glob", "web_search"}),  # default
    min_group_size=2,  # minimum consecutive pairs to collapse
)
```

- Source (single file, ~150 lines)
- Benchmark (realistic session with token counts)
- Tests (unit tests + property-based invariant tests)
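As referenced above, overriding the configuration defaults is a one-liner. The `list_dir` tool name below is made up, included only to show how to extend the collapse set:

```python
from langchain_collapse import CollapseMiddleware

middleware = [
    CollapseMiddleware(
        # "list_dir" is a hypothetical project-specific tool, added to the
        # default set to illustrate customization.
        collapse_tools=frozenset({"read_file", "grep", "glob", "web_search", "list_dir"}),
        min_group_size=3,  # more conservative: only collapse runs of 3+ pairs
    )
]
```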
```bash
git clone https://github.com/johanity/langchain-collapse.git
cd langchain-collapse
pip install -e ".[test]"
pytest
```

License: MIT