A self-hosted ChatGPT-style web app that runs on your machine using Ollama. No cloud APIs, no API keys; your data stays local.
- Chat interface: streaming responses, markdown with syntax highlighting, conversation history
- Vision / image input: attach images via button, drag-and-drop, or clipboard paste — auto-routes to a vision-capable model
- Smart model routing: "Auto" mode routes prompts containing code patterns to your configured code model, and prompts with attached images to a vision model
- System prompts: set a custom system prompt per conversation to control the model's behavior
- Grounded responses: assistant messages include confidence + citations when RAG context is used
- Persistent memory: automatic memory capture from conversation turns (`preference`, `fact`, `decision`)
- Memory transparency: assistant responses show which memory items were injected automatically
- Optional self-hosted voice mode: push-to-talk input + spoken assistant replies via local speech provider
- Configurable models: set your default, code, and embedding models from the in-app settings panel — works with whatever you have installed
- Dark mode, mobile-responsive layout, and conversation management
Knowledge Base
Upload documents or paste URLs to build a searchable knowledge base. When RAG is enabled for a conversation, relevant chunks are automatically retrieved and injected into the prompt context.
- Upload files: drag-and-drop or file picker (supports `.md`, `.txt`, `.pdf`, `.ts`, `.js`, `.py`, `.go`, `.rs`, `.java`, `.cpp`, `.c`, `.html`, `.css`, `.json`, `.yaml`, `.toml`)
- Index URLs: paste any URL to scrape and index its content
- Document management: view status, chunk count, file size; reindex or delete documents
- Test search: run queries against your indexed documents to verify retrieval quality
- Last cited visibility: knowledge base table shows when each document was last cited in chat
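Conceptually, retrieval works like the following TypeScript sketch: score chunks against the query embedding by cosine similarity, drop anything below the threshold, and keep the top-K. The `Chunk` shape and helper names here are illustrative, not the app's actual types; the real pipeline lives in `src/lib/rag/`.

```ts
// Illustrative sketch of the retrieval step; names are hypothetical.
interface Chunk {
  documentId: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, apply the similarity threshold, keep the top-K.
function retrieve(
  queryEmbedding: number[],
  chunks: Chunk[],
  topK: number,      // "Top-K results" setting (1-20)
  threshold: number, // "Similarity threshold" setting (0-1)
): Chunk[] {
  return chunks
    .map((c) => ({ chunk: c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .filter((s) => s.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.chunk);
}
```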
Settings
Configure model, memory, and RAG parameters from the settings page.
- Automatic memory capture: memory extraction runs on every completed turn
- Memory token budget: max prompt budget used for memory injection (approximation: `content.length / 4`; see the sketch after this list)
- Voice provider settings: one-time setup for local speech URL + STT/TTS models + voice preset
- Voice behavior controls: enable/disable voice globally and auto-speak assistant replies
- RAG toggle: enable or disable RAG globally
- Chunk size / overlap: control how documents are split into chunks (100–2000 tokens, 0–500 overlap)
- Top-K results: number of chunks retrieved per query (1–20)
- Similarity threshold: minimum cosine similarity score for retrieved chunks (0–1)
- Embedding model: select which Ollama model generates embeddings (default: `nomic-embed-text`)
- Watched folders: add local directories for automatic file indexing via file watcher
- Supported file types: toggle which file extensions are indexed
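As a rough illustration of the memory-budget and chunking settings above, here is a sketch built on the `content.length / 4` token approximation. The function names are hypothetical; the real logic lives in `src/lib/memory/` and `src/lib/rag/chunker`.

```ts
// Approximate token count the way the settings page describes it.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Take pre-ranked memory items until the token budget is exhausted.
function selectWithinBudget<T extends { content: string }>(
  items: T[],          // assumed most-relevant-first
  tokenBudget: number, // "Memory token budget" setting
): T[] {
  const selected: T[] = [];
  let used = 0;
  for (const item of items) {
    const cost = estimateTokens(item.content);
    if (used + cost > tokenBudget) break;
    selected.push(item);
    used += cost;
  }
  return selected;
}

// Split text into fixed-size chunks with overlap (sizes in estimated tokens).
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunkChars = chunkSize * 4;                       // invert length/4
  const stepChars = Math.max(1, (chunkSize - overlap) * 4); // guard overlap >= size
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stepChars) {
    chunks.push(text.slice(start, start + chunkChars));
  }
  return chunks;
}
```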
Memory
Memory is automatic with user review controls:
- Memory page (`/memory`): review, search, filter, edit, archive, and restore auto-captured memory items
- Scopes: `global` (available to all conversations) and `conversation` (only injected into a specific conversation)
- Types: `preference`, `fact`, `decision`
- Usage tracking: each memory item records `useCount` and `lastUsedAt`
- Auto-capture tags: automatically extracted memories are tagged with `auto`
- Prompt injection order: system prompt → memory → RAG → conversation history (sketched below)
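The injection order above could be assembled roughly as in this sketch. The types mirror a typical chat-message shape; the helper is hypothetical rather than the app's actual code (see `src/lib/chat/context` for the real message building).

```ts
// Sketch of the injection order: system prompt -> memory -> RAG -> history.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildMessages(
  systemPrompt: string,
  memoryBlock: string,    // selected memory items, rendered as text
  ragContext: string,     // retrieved chunks, rendered as text
  history: ChatMessage[], // prior conversation turns
): ChatMessage[] {
  const systemParts = [systemPrompt];
  if (memoryBlock) systemParts.push(`Relevant memory:\n${memoryBlock}`);
  if (ragContext) systemParts.push(`Context from knowledge base:\n${ragContext}`);
  return [{ role: "system", content: systemParts.join("\n\n") }, ...history];
}
```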
Install Ollama and pull the models you want to use:

```bash
brew install ollama   # or download from ollama.ai
ollama serve          # or open the desktop app
```

```bash
ollama pull gemma2:9b          # lightweight general model
ollama pull qwen3:14b          # larger general model
ollama pull qwen3-coder:30b    # coding specialist
ollama pull llama3.2-vision    # vision model for image input
ollama pull nomic-embed-text   # embeddings for RAG
```

Any model from ollama.ai/library works. Pull it and it appears in the app.
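Model auto-detection is possible because Ollama exposes a local HTTP API. A minimal sketch of listing installed models via its `GET /api/tags` endpoint (response shape trimmed to the one field used here):

```ts
// List locally pulled Ollama models.
async function listInstalledModels(baseUrl = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  const data = (await res.json()) as { models: { name: string }[] };
  return data.models.map((m) => m.name);
}

// e.g. ["gemma2:9b", "qwen3-coder:30b", "nomic-embed-text", ...]
```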
Then install dependencies and start the app:

```bash
pnpm install
pnpm db:setup
pnpm dev
```

Open http://localhost:3000. The SQLite database is created automatically. `pnpm db:setup` runs Prisma migrations and sets up the vector column for RAG embeddings.
Push-to-talk input and spoken assistant replies via a local Speaches sidecar. No cloud APIs — everything runs on your machine.
```bash
docker compose -f docker-compose.voice.yml up -d   # start Speaches
```

Then open Settings → Voice and enable it. See VOICE.md for the full setup guide, model installation, troubleshooting, and GPU acceleration.
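For a sense of what the voice path does, here is a sketch of a text-to-speech request against an OpenAI-compatible `/v1/audio/speech` endpoint, assuming the sidecar listens on `localhost:8000`. The model and voice names are placeholders, not Speaches defaults; configure the real ones under Settings → Voice.

```ts
// Request spoken audio for a reply from the local speech sidecar.
async function speak(text: string): Promise<ArrayBuffer> {
  const res = await fetch("http://localhost:8000/v1/audio/speech", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "tts-model",    // placeholder TTS model id
      input: text,
      voice: "voice-preset", // placeholder voice preset
      speed: 0.92,           // VOICE_TTS_SPEED default
    }),
  });
  return res.arrayBuffer(); // audio bytes to play back in the browser
}
```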
Click the gear icon in the header to open settings. Choose which of your installed models to use for:
- Default Model — general conversations
- Code Model — used by Auto mode when code is detected in your prompt
- Embedding Model — used for RAG document embeddings
Vision-capable models are tagged with (vision) in the model selector. When images are attached in Auto mode, the app auto-routes to the first available vision model.
On first run, the app auto-detects your installed models and picks defaults.
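A simplified sketch of this routing decision follows; the actual logic is split across `src/lib/router.ts` and `src/lib/chat/resolve-model`, and the names here are illustrative:

```ts
// Sketch of "Auto" mode: images win, then code detection, then the default.
interface ModelConfig {
  defaultModel: string;
  codeModel: string;
  visionModels: string[]; // models tagged (vision) in the selector
}

function resolveModel(
  prompt: string,
  hasImages: boolean,
  config: ModelConfig,
  looksLikeCode: (text: string) => boolean,
): string {
  if (hasImages && config.visionModels.length > 0) return config.visionModels[0];
  if (looksLikeCode(prompt)) return config.codeModel;
  return config.defaultModel;
}
```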
- Routing logic: edit `src/lib/router.ts` to change which patterns trigger the code model (a sample pattern set follows this list).
- System prompt: click the "System Prompt" toggle below the chat header to set per-conversation instructions.
- Memory ranking/budget logic: edit `src/lib/memory/index.ts`.
- Remote Ollama: set `OLLAMA_BASE_URL` in `.env` to point at a GPU server running Ollama.
- Remote speech service: set `VOICE_BASE_URL` / `VOICE_API_KEY` to any OpenAI-compatible self-hosted speech API.
- Voice tone tuning: set `VOICE_TTS_SPEED` (default `0.92`). A range around `0.9–0.95` often sounds less robotic.
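For example, pattern-matching code detection in the spirit of `src/lib/router.ts` might look like this; the patterns shown are illustrative, not the shipped set:

```ts
// Illustrative code-detection patterns for the smart router.
const CODE_PATTERNS: RegExp[] = [
  /```/,                                               // fenced code in the prompt
  /\b(function|const|let|class|import|def|fn|func)\b/, // common keywords
  /[{};]\s*$/m,                                        // line endings typical of code
  /\b(npm|pnpm|pip|cargo|go)\s+(install|add|run|build)\b/, // tool invocations
];

const looksLikeCode = (text: string): boolean =>
  CODE_PATTERNS.some((p) => p.test(text));
```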
When RAG is enabled, assistant messages show an experimental confidence label (high / medium / low) and cite the source documents used. When no relevant sources are found, the model still responds but is instructed to flag uncertainty.
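One plausible way such a label could be derived is by bucketing the best retrieval similarity score, as in this hypothetical sketch; the app's actual heuristic may differ:

```ts
// Hypothetical confidence bucketing from retrieval similarity scores.
type Confidence = "high" | "medium" | "low";

function confidenceFromScores(scores: number[]): Confidence {
  const best = Math.max(0, ...scores); // empty scores -> 0 -> "low"
  if (best >= 0.8) return "high";
  if (best >= 0.6) return "medium";
  return "low";
}
```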
```bash
pnpm eval:grounding
```

Runs retrieval checks from `evals/datasets/grounding.jsonl`.

- Writes a report to `evals/reports/grounding-baseline.json`
- Skips when the app is unreachable (use `EVAL_STRICT=1` to fail instead)

```bash
pnpm eval:grounding:export
```

Exports draft low-confidence cases from local usage to `evals/datasets/grounding.draft.jsonl`.
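A grounding case might be shaped roughly like this hypothetical TypeScript model of one `grounding.jsonl` record; the real field names may differ:

```ts
// Hypothetical shape of one grounding eval case.
interface GroundingCase {
  query: string;             // question to run retrieval for
  expectedSources: string[]; // documents that should be retrieved/cited
}

const example: GroundingCase = {
  query: "What port does the app run on?",
  expectedSources: ["README.md"],
};
```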
```
src/lib/
├── chat/              # Chat pipeline helpers
│   ├── resolve-model  # Smart router wrapper (auto → code/default model)
│   ├── context        # Message building, system prompt + RAG injection
│   └── index          # Barrel exports
├── memory/            # Memory selection, ranking, injection, tag/usage helpers
├── rag/               # RAG pipeline
│   ├── embeddings     # Ollama embedding API
│   ├── vector-db      # libSQL vector search
│   ├── chunker        # Document-aware chunking
│   └── parsers/       # Markdown, PDF, code, URL parsers
├── tools/             # Agent tool definitions + executor (web_search, fetch_url)
├── voice/             # Voice STT/TTS integration (Speaches)
├── router             # Pattern-matching code detection
├── ollama             # Ollama HTTP API wrapper (chat, embeddings, model capabilities)
├── config             # App configuration (DB-backed)
└── db                 # Prisma client singleton
```
Next.js 16 · React 19 · Tailwind v4 · TypeScript · Prisma v7 + SQLite · Ollama API
