Skip to content

m2ai-portfolio/local-ai-coding-agent-optimizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Local AI Coding Agent Optimizer

Infrastructure layer that makes local LLMs viable for coding agents by solving the KV cache invalidation problem.

Quick StartFeaturesExamplesContributing

What is this?

Local AI Coding Agent Optimizer is a command‑line utility that sits between a coding agent and a locally‑run LLM, re‑using previously computed KV cache tensors when the prompt prefix has not changed. It enables private, low‑latency, cost‑free inference for multi‑turn coding workflows.

Example usage:

$ echo -e "def foo():\n0" | python -m optimizer store
eJzLVQ... (base64 blob) 5

The command stores the KV cache for the prompt and returns a base64‑encoded tensor together with the sequence length.

Problem

Developers want to use local LLMs for coding agents to avoid API costs and maintain privacy, but local models become unusable after a few turns due to prefix shifts that invalidate the KV cache. This forces expensive re‑prefilling of the entire context, making the experience painfully slow. There's a technical gap preventing local models from being practical for agentic coding workflows.

Features

Feature Description
Prefix Hashing & Cache Lookup Computes a SHA‑256 hash of the prompt prefix and checks a local SQLite database for a matching KV cache entry.
KV Cache Storage When a prefix hash is miss, runs the LLM on the full prompt, serializes the resulting KV tensor, and stores it keyed by the hash.
Incremental KV Cache Extension Extends a cached KV tensor with new suffix tokens by running the LLM only on the suffix while feeding the cached KV as initial state.
CLI Pipe Interface All subcommands read from stdin and write to stdout, allowing seamless integration with coding agents via simple pipes.
Persistent SQLite Cache Cache entries are saved in a single local SQLite file, surviving restarts of the optimizer process.
Low‑latency Lookup Retrieving a cached KV tensor takes under 50 ms on a typical CPU laptop, avoiding costly re‑prefilling.
Zero External Dependencies Runs with only Python 3.11+, torch, typer, and pytest; no network calls or extra services are required.
Tokenizer Agnostic (MVP) Uses a whitespace split for tokenization in the minimum viable product, but can be swapped for any tokenizer later.

Quick Start

  1. Clone the repository:
    git clone https://github.com/m2ai-portfolio/local-ai-coding-agent-optimizer.git
    cd local-ai-coding-agent-optimizer
    
  2. Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate
    pip install typer pytest
    
    (torch should already be installed with your local LLM.)
  3. Test the store subcommand with a simple prompt:
    echo -e "def foo():\n0" | python -m optimizer store
    
    You should see a base64‑encoded blob and an integer sequence length printed to stdout.

Examples

Storing a new prompt prefix

$ echo -e "def foo():\n0" | python -m optimizer store
eJzLVQ... (base64 blob) 5

The optimizer runs the LLM on the full prompt, caches the KV tensor, and outputs the serialized blob together with the sequence length.

Looking up an existing prefix

$ echo -e "def foo():\n0" | python -m optimizer lookup
eJzLVQ... (same base64 blob) 5

Because the prefix hash matches a cached entry, the stored tensor is returned instantly without re‑running the LLM.

Extending a cached KV with new tokens

$ echo -e "eJzLVQ...\n5\n10 20 30" | python -m optimizer extend
eJzLVQ... (updated blob) 8

The command feeds the cached KV as past_key_values, processes the suffix tokens [10, 20, 30], and returns the updated KV tensor covering the full prefix+suffix.

File Structure

Local AI Coding Agent Optimizer/
├── optimizer/
│   ├── __init__.py
│   ├── cli.py          # Typer entry point with lookup, store, extend subcommands
│   ├── db.py           # SQLite wrapper for KV cache persistence
│   ├── hashing.py      # Prefix hashing helpers (SHA‑256)
│   ├── kv_utils.py     # Tensor serialize/deserialize and LLM forward hooks
│   ├── models.py       # Dataclasses for KVCacheEntry and OptimizerIO
│   └── __main__.py     # Allows `python -m optimizer` execution
├── tests/
│   ├── test_cli.py
│   ├── test_db.py
│   ├── test_hashing.py
│   └── test_kv_utils.py
├── assets/
│   └── infographic.png
├── .gitignore
├── LICENSE
└── README.md

Tech Stack

Technology Purpose
Python 3.11+ Core language and runtime
torch Tensor handling and LLM inference (assumed pre‑installed)
sqlite3 (stdlib) Persistent storage of KV cache entries
hashlib (stdlib) SHA‑256 hashing of prompt prefixes
typer Command‑line interface and subcommand parsing
pytest Unit and integration testing

Contributing

Fork the repository, make your changes, run the test suite, and submit a pull request. Please follow the existing coding style and add tests for new functionality.

License

MIT

Author

Matthew Snow -- M2AI | @m2ai-portfolio

About

Enables fast, private, zero‑cost coding agents by reusing KV cache prefixes so local LLMs stay responsive across multi‑turn prompts.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages