
KV cache invalidated every turn due to timestamp re-injection on historical messages #3

@Autumnorus

Description


Problem

The filter fetches the chat history and re-injects timestamps into every historical user message on each request. Because the prompt content changes, even for messages that were already timestamped, the full KV cache is invalidated each turn.

For users running local models via llama-server, this results in the entire context being reprocessed on every message instead of only the new tokens. As context grows, this adds significant prompt processing time per message.
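To illustrate why this is costly: prefix-based KV cache reuse (as in llama-server) only skips tokens up to the first point where the new prompt diverges from the cached one. The sketch below is an illustration, not llama-server's actual code; the token lists and timestamp format are made up.

```python
def reusable_prefix(cached: list[str], new: list[str]) -> int:
    """Number of leading tokens a prefix-matching KV cache can reuse."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

# If the timestamp on the very first historical message changes,
# divergence happens at token 0 and nothing can be reused:
cached = ["[12:00]", "hi", "there", "reply", "follow-up"]
new    = ["[12:05]", "hi", "there", "reply", "follow-up"]
print(reusable_prefix(cached, new))  # 0 -> entire context reprocessed
```

With stable historical content, only the newly appended tokens would miss the cache.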

Root Cause

The inlet() method iterates over all historical messages and re-stamps them each turn. The startswith("[") check is intended to skip already-stamped messages, but it evidently fails to do so, so stamped content keeps changing between requests.
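A minimal sketch of an idempotent alternative, assuming the Open WebUI-style filter shape (the `stamp_once` helper, the timestamp format, and the `TS_PREFIX` pattern are all assumptions, not the filter's actual code). Matching the full timestamp pattern instead of a bare leading bracket, and stamping only the newest message, keeps historical bytes identical across turns:

```python
import re
from datetime import datetime

# Hypothetical timestamp prefix, e.g. "[2024-05-01 12:00:00] "
TS_PREFIX = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] ")

def stamp_once(content: str, now: datetime) -> str:
    """Prepend a timestamp only if one is not already present.

    A bare startswith("[") test can misfire (e.g. a re-rendered or
    differently formatted prefix), so match the full pattern instead.
    """
    if TS_PREFIX.match(content):
        return content  # already stamped: leave bytes untouched for KV reuse
    return f"[{now:%Y-%m-%d %H:%M:%S}] {content}"

def inlet(body: dict) -> dict:
    """Sketch: stamp only the latest user message, never historical ones."""
    user_msgs = [m for m in body.get("messages", []) if m.get("role") == "user"]
    if user_msgs:
        user_msgs[-1]["content"] = stamp_once(user_msgs[-1]["content"],
                                              datetime.now())
    return body
```

Because `stamp_once` is idempotent, running the filter again over already-processed history is a no-op, so the prompt prefix stays stable.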

Impact

Any user running llama.cpp, llama-server, or similar local inference backends with KV cache reuse enabled will be affected. Cloud API users may not notice since they don't have persistent KV cache.
