Problem
The filter fetches chat history and re-injects timestamps into all historical user messages on every request. Because the re-stamped content no longer matches the previous turn's prompt, the KV cache prefix check fails and the cache is fully invalidated each turn, even for messages that were already timestamped.
For users running local models via llama-server, this results in the entire context being reprocessed on every message instead of only the new tokens. As context grows, this adds significant prompt processing time per message.
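The mechanics of the slowdown can be illustrated with a toy sketch (names and token IDs are illustrative, not from the filter): backends like llama-server reuse the KV cache for the longest common token prefix between the cached prompt and the new one, so any change near the start of the prompt forces everything after it to be reprocessed.

```python
def reusable_prefix_len(cached: list[int], new: list[int]) -> int:
    """Number of leading tokens the server can keep from its KV cache;
    every token after this point must be re-evaluated."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

# History left byte-identical, one new message appended:
# only the new tokens need prompt processing.
assert reusable_prefix_len([1, 2, 3], [1, 2, 3, 4, 5]) == 3

# A timestamp rewritten in the very first message:
# the shared prefix is gone and the whole context is reprocessed.
assert reusable_prefix_len([9, 2, 3, 4], [1, 2, 3, 4]) == 0
```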
Root Cause
The inlet() method iterates over all historical messages and re-stamps them each turn, even though the startswith("[") check is intended to prevent this.
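A minimal sketch of the fix, assuming the usual Open WebUI filter message shape (a list of role/content dicts in body["messages"]; the stamp format and helper names here are hypothetical): stamp only the newest user message and never touch history, so earlier turns stay byte-identical and the backend's prefix match keeps succeeding.

```python
from datetime import datetime, timezone

TIMESTAMP_PREFIX = "["  # the guard the filter reportedly uses


def stamp(content: str) -> str:
    """Prepend a timestamp like "[2024-01-01 12:00:00 UTC] " (format is illustrative)."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"[{ts}] {content}"


def inlet(body: dict) -> dict:
    """Stamp only the last user message; leave all historical messages
    unchanged so the KV cache prefix is preserved."""
    messages = body.get("messages", [])
    for i, msg in enumerate(messages):
        if msg.get("role") != "user":
            continue
        is_last = i == len(messages) - 1
        if is_last and not msg["content"].startswith(TIMESTAMP_PREFIX):
            msg["content"] = stamp(msg["content"])
    return body
```

The key property is that on turn N+1 the first N messages are exactly the bytes the server already has cached, so only the appended message incurs prompt-processing cost.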
Impact
Any user running llama.cpp, llama-server, or a similar local inference backend with KV cache reuse enabled will be affected. Cloud API users are less likely to notice, since they don't manage a persistent KV cache themselves and prompt reprocessing happens server-side.