The same task as GPT-4 — predict the next word — solved with eigenvalues instead of backpropagation.
Kalle is a bilingual chatbot (German + English) that runs entirely in your browser using Kernel Ridge Regression with Random Fourier Features. No neural networks, no backpropagation, no gradient descent, no server — just matrix multiplication, a Gaussian kernel, and eigenvalues. TensorFlow.js provides WebGL GPU acceleration; the model weights are embedded directly in the HTML.
Kalle chats about food, hobbies, music, feelings, weather, and simple math — with multi-turn context awareness, honest scope boundaries, and RAG (Retrieval-Augmented Generation) over the Eigenvalues & AI blog post. Ask "What are eigenvalues?" or "How does PageRank work?" and Kalle retrieves the relevant blog section to answer.
Everything is bundled into a single self-contained HTML file (index.html, ~56 MB). No server, no API, no external dependencies at runtime.
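The retrieval step can be sketched in a few lines of NumPy — a hypothetical illustration with toy vectors standing in for the real chunk embeddings in data/chunk_index.json:

```python
import numpy as np

# Toy stand-ins for the blog chunks and their precomputed embeddings
# (illustrative only — the real index lives in data/chunk_index.json).
rng = np.random.default_rng(0)
chunks = [
    "Eigenvalues describe how a matrix stretches space ...",
    "PageRank is the eigenvector of the web's link matrix ...",
]
chunk_emb = rng.normal(size=(len(chunks), 32))
chunk_emb /= np.linalg.norm(chunk_emb, axis=1, keepdims=True)

def retrieve(query_emb):
    """Return the chunk whose embedding is most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    return chunks[int(np.argmax(chunk_emb @ q))]

print(retrieve(chunk_emb[1]))  # a PageRank-like query retrieves the PageRank chunk
```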
OFFLINE BUILD (Python, Float64, ~3 min on CPU)
```text
┌─────────────────────────────────────────────────────────┐
│                                                         │
│  data/corpus.md (2174 dialog pairs)                     │
│   │                                                     │
│   ├──→ Word2Vec (gensim, 32-dim) → embeddings           │
│   ├──→ Token sequence (5× repeat) → RFF features        │
│   │     └──→ W = (Z^TZ + λI)⁻¹ Z^TY   [KRR]             │
│   ├──→ IDF weights + BoW pair embeddings                │
│   └──→ data/chunk_index.json → RAG chunk embeddings     │
│                                                         │
│  All tensors → Float16 + gzip + base64                  │
│   └──→ Injected into data/template.html                 │
│         └──→ index.html (self-contained, ~56 MB)        │
└─────────────────────────────────────────────────────────┘
```
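The final packing step — Float16 quantization, gzip, base64, inlined into the template — can be sketched as follows (a minimal round-trip illustration with random stand-in weights, not the actual build code):

```python
import base64
import gzip

import numpy as np

# Stand-in for trained Float64 weights (the real W is 6144 × 2952).
W = np.random.default_rng(0).normal(size=(64, 32))

# Quantize to Float16, compress, base64-encode for embedding in HTML.
packed = base64.b64encode(gzip.compress(W.astype(np.float16).tobytes())).decode("ascii")
html_snippet = f'<script>const W_B64 = "{packed}";</script>'  # injected into the template

# The browser reverses this: base64 → gunzip → Float16 → GPU tensor.
# Python round-trip check of the same bytes:
restored = np.frombuffer(gzip.decompress(base64.b64decode(packed)), dtype=np.float16)
print(np.abs(restored.reshape(64, 32) - W).max())  # Float16 quantization error only
```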
ONLINE (Browser, WebGL GPU)
```text
┌─────────────────────────────────────────────────────────┐
│  1. Decompress base64 → Float16 → GPU tensors           │
│  2. User query → chunk retrieval (RAG) → pair matching  │
│  3. Word-by-word rendering with prediction comparison   │
│  4. TensorFlow.js WebGL for matrix ops (<1ms/word)      │
└─────────────────────────────────────────────────────────┘
```
The template (data/template.html) contains ~80 lines of vanilla JavaScript for the matching and rendering logic. The build script (src/build.py) trains the model offline in Float64 for numerical precision, then quantizes to Float16 and packs everything into the template.
```python
# 1. Random Fourier Features (Rahimi & Recht, 2007)
z(x) = sqrt(2/D) * cos(x @ omega + bias)    # ω ~ N(0, 1/σ²)

# 2. Kernel Ridge Regression (closed-form, no gradient descent)
W = solve(Z.T @ Z + lambda * I, Z.T @ Y)    # one matrix solve

# 3. Prediction (single matrix-vector multiply)
next_word = argmax(z(context) @ W)          # <1ms on WebGL GPU
```

No epochs. No learning rate. No convergence monitoring. W is the only learned parameter (6144 × 2952 ≈ 18.1M values).
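The three steps above run end-to-end in a few lines of NumPy — a toy-scale sketch (D = 256, random data) rather than the actual build, but the same RFF + closed-form KRR recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, n_classes, n_samples = 32, 256, 10, 500  # toy sizes (real build: D = 6144)
sigma, lam = 1.5, 1e-6

# 1. Random Fourier Features: omega ~ N(0, 1/sigma^2), bias ~ U[0, 2*pi)
omega = rng.normal(0.0, 1.0 / sigma, size=(d, D))
bias = rng.uniform(0.0, 2.0 * np.pi, size=D)

def z(x):
    """Map d-dim inputs to D-dim random Fourier features."""
    return np.sqrt(2.0 / D) * np.cos(x @ omega + bias)

# Toy training data: random contexts, one-hot "next word" targets
X = rng.normal(size=(n_samples, d))
y = rng.integers(0, n_classes, size=n_samples)
Y = np.eye(n_classes)[y]

# 2. Closed-form KRR: W = (Z^T Z + lam*I)^-1 Z^T Y — one matrix solve
Z = z(X)
W = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ Y)

# 3. Prediction: single matrix multiply, then argmax over the vocabulary
pred = np.argmax(z(X) @ W, axis=1)
print("train top-1:", (pred == y).mean())
```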
| Parameter | Value |
|---|---|
| Corpus | 2174 curated dialog pairs (DE + EN) |
| Vocabulary | 2952 words (Word2Vec, 32-dim) |
| Context window | 24 words |
| RFF dimension (D) | 6144 |
| Kernel bandwidth (σ) | 1.5 |
| Regularization (λ) | 10⁻⁶ |
| RAG chunks | 29 (from Eigenvalues & AI blog) |
| File size | ~56 MB (self-contained HTML) |
| Runtime | WebGL GPU via TensorFlow.js |
Deterministic consequences of the BoW+IDF design — not programmed, but also not magical:
- Math validation: Kalle asks "what is 3+5?", the user says "8" → "correct!" — pure pattern matching on `plus 3 5 8`; no code checks the math
- Insult immunity: Profanity is out-of-vocabulary → invisible to the model → no filter needed
- Typo robustness: OOV words from typos are silently ignored; the remaining words still match
- Bilingual routing: English and German words have different IDF weights → queries route to language-appropriate pairs without language-detection code
- Honest limits: Low confidence → "I can only talk about food, hobbies, music, feelings, weather and math"
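These behaviors fall out of the matching scheme itself. A minimal sketch of IDF-weighted BoW matching with OOV words ignored — toy vocabulary and pairs, not Kalle's real data:

```python
import numpy as np

# Two toy dialog pairs (one German, one English) standing in for the corpus.
pairs = {
    "ich mag pizza": "Pizza ist lecker!",
    "i like music": "Music is great!",
}
vocab = sorted({w for q in pairs for w in q.split()})
idx = {w: i for i, w in enumerate(vocab)}

# IDF: words appearing in fewer pairs get higher weight.
df = np.array([sum(w in q.split() for q in pairs) for w in vocab])
idf = np.log(len(pairs) / df) + 1.0

def embed(text):
    """IDF-weighted BoW vector; out-of-vocabulary words are simply skipped."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in idx:  # OOV (typos, profanity) → invisible to the model
            v[idx[w]] += idf[idx[w]]
    n = np.linalg.norm(v)
    return v / n if n else v

queries = list(pairs)
Q = np.stack([embed(q) for q in queries])

def answer(user_text):
    """Return the reply of the best-matching pair by cosine similarity."""
    return pairs[queries[int(np.argmax(Q @ embed(user_text)))]]

print(answer("i liek music"))  # the typo "liek" is OOV and silently ignored
```

Because English and German words occupy disjoint vocabulary slots, a query's surviving in-vocabulary words can only score against pairs in the same language — the "routing" needs no language detector.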
| Iteration | Pairs | Encoding | Top-1 | Lesson |
|---|---|---|---|---|
| V1 (original) | 57 | Hash (128 buckets) | 99.8% | Perfect but narrow |
| MEGA (mass) | 4301 | Word2Vec + hacks | 34.9% | More data = worse! |
| FINAL (curated) | 2174 | Word2Vec (32-dim) | 65.4% | Curated > generated |
This mirrors what OpenAI, Anthropic and Google learned: data quality beats data quantity.
```bash
pip install numpy gensim

# Generate corpus (optional — data/corpus.md already included)
python3 src/gen_corpus.py

# v1 build: direct Gaussian elimination (fastest on CPU at current scale)
python3 src/build.py
# → produces index.html (~56 MB, all weights embedded, runs in any browser)

# v2 build: pluggable solver (v1-compatible + new iterative paths)
python3 src/build_v2.py --solver=direct                              # v1-equivalent
python3 src/build_v2.py --solver=cg                                  # Block-PCG (default, GPU-friendly)
python3 src/build_v2.py --solver=cg --cg-tol=1e-8 --cg-maxiter=2000  # tighter tolerance
# → produces kalle-chat-v2.html
```

The v2 solver implements the absorber-stochasticization described in the v2 paper — same result as the direct solve, but using only matrix-vector products (GPU-ideal, scales to D ≫ 10,000).
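A matvec-only solve can be sketched with plain conjugate gradients — a toy-scale illustration of the `--solver=cg` idea (without the blocking, preconditioning, or absorber-stochasticization of the real v2 solver):

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, k = 300, 128, 5  # toy sizes
lam = 1e-6

Z = rng.normal(size=(n, D))
Y = rng.normal(size=(n, k))

def matvec(v):
    """Apply (Z^T Z + lam*I) to v without forming the D x D matrix."""
    return Z.T @ (Z @ v) + lam * v

def cg(b, tol=1e-8, maxiter=2000):
    """Conjugate gradients on (Z^T Z + lam*I) x = b, matvecs only."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Solve column by column: W = (Z^T Z + lam*I)^-1 Z^T Y
B = Z.T @ Y
W = np.column_stack([cg(B[:, j]) for j in range(k)])

# Agrees with the direct solve
W_direct = np.linalg.solve(Z.T @ Z + lam * np.eye(D), B)
print(np.allclose(W, W_direct, atol=1e-5))
```

Everything inside the loop is a matrix-vector product, which is exactly the workload GPUs (and WebGL shaders) are good at — hence the claim that this path scales to much larger D than a dense direct solve.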
See benchmarks/README.md for synthetic comparisons and
benchmarks/real_kalle_results.md for the
full-scale benchmark on actual Kalle training matrices (NumPy vs. PyTorch CPU
vs. Apple MPS GPU — with a surprising finding: PyTorch's BLAS back-end matters
more than the GPU at this scale).
```bash
pip install playwright && playwright install chromium

# Full regression suite (34 scenarios)
python3 tests/test_regression.py index.html

# Filter by category
python3 tests/test_regression.py index.html --filter math
python3 tests/test_regression.py index.html --filter emotion
```

```bash
git clone https://github.com/pmmathias/krr-chat.git
cd krr-chat
open index.html  # opens in default browser, GPU accelerated
```

No build step is needed to run — the HTML file is self-contained with embedded model weights.
```text
krr-chat/
├── index.html              # The chatbot (self-contained, ~56 MB)
├── README.md
├── ARCHITECTURE.md         # Full technical deep-dive
├── src/
│   ├── build.py            # Build pipeline: corpus → Word2Vec → KRR → HTML
│   ├── gen_corpus.py       # Curated corpus generator (~2100 pairs)
│   └── gen_rag_qa.py       # RAG Q&A pair generator from blog chunks
├── tests/
│   └── test_regression.py  # Playwright regression suite (34 scenarios)
└── data/
    ├── corpus.md           # 2174 curated dialog pairs (training data)
    ├── chunk_index.json    # 29 blog chunks for RAG retrieval
    └── template.html       # HTML/JS template (matching + rendering logic)
```
See ARCHITECTURE.md for the full technical architecture, design decisions, and mathematical foundations.
- How KRR Chat Works — deep-dive blog post covering every component: from RFF kernel approximation to the eigenvalue spectrum of the training matrix
- Eigenvalues & AI — the foundation: why eigenvalues connect Google PageRank, regularization, quantum mechanics, and language models
- KI-Mathias Blog — all posts on the mathematical foundations of AI
MIT
Mathias Leonhardt — CTO at pmagentur.com, writing about the mathematical foundations of AI at ki-mathias.de
Built in collaboration with Claude Code (Anthropic)