AI Helpdesk — Local RAG Knowledge Base

A self-hosted RAG application for IT helpdesk support. Users ask questions in natural language and receive answers sourced directly from uploaded helpdesk documents.

What This Is

When a user asks a question, the system:

Searches your helpdesk documents for the most relevant information
Passes that information to a local language model
The model generates an answer based only on your documents
The answer is returned to the user in a chat interface

This project uses RAG (Retrieval Augmented Generation) — the model answers based on your documents, not general training data.

The Stack

Tool	Role
Docker	Runs and connects all services
Ollama	Local LLM server — runs the chat and embedding models
Open WebUI	Chat interface, RAG pipeline, and built-in vector store (ChromaDB)

How It Works

Document Ingestion

Uploaded documents are split into chunks by Open WebUI's text splitter (chunk size: 1000, overlap: 100). Each chunk is passed to nomic-embed-text via the Ollama API, which returns a vector representation. The vector and original text are stored together in ChromaDB.

Your document
      ↓
Text splitter (chunk size: 1000, overlap: 100)
      ↓
nomic-embed-text → vector
      ↓
ChromaDB stores vector + original text

Query Pipeline

At query time, the user's question is embedded using the same nomic-embed-text model. ChromaDB performs a hybrid search (vector similarity + BM25 keyword search, weighted 0.5/0.5) to retrieve the top K most relevant chunks. The chunks are injected into a RAG prompt template alongside the user's question and passed to the chat model via Ollama.

User asks a question
      ↓
nomic-embed-text → vector
      ↓
ChromaDB hybrid search (vector similarity + BM25) → top K chunks
      ↓
RAG prompt template: system prompt + chunks + question
      ↓
llama3.1:8b generates response
      ↓
Answer streams back to user with source citations

Models

Model	Role	Size
`nomic-embed-text`	Embedding — converts text to vectors	274MB
`llama3.1:8b`	Chat — generates responses from retrieved context	4.7GB

The embedding model used at ingestion time must match the one used at query time. Changing the embedding model requires re-embedding all documents.

Getting Started

System Requirements

Spec	Minimum	Recommended
RAM	16GB	32GB
Disk	20GB free	50GB free
OS	Windows 10/11, macOS, Linux	Windows 11
Docker	Docker Desktop	Docker Desktop

Model RAM Requirements

Model	RAM Required	Quality
`llama3.2:3b`	~4GB	Good
`llama3.1:8b`	~8GB	Very good
`llama3.1:70b`	~40GB	Excellent (requires 40GB+ RAM)

Alternative Chat Models

The default chat model is llama3.1:8b. The following are tested alternatives:

Mistral 7B (mistral) A 7B parameter model from Mistral AI. Slightly smaller than Llama 3.1 8B with similar performance. Known for strong instruction following and concise responses. Good alternative if Llama 3.1 8B is not available or underperforms on your hardware.

docker exec -it ollama ollama pull mistral

Llama 3.1 70B (llama3.1:70b) The 70B parameter variant of Llama 3.1. Significantly stronger instruction following and lower hallucination rate than the 8B model. Requires approximately 40GB RAM. Recommended for production use where accuracy is critical.

docker exec -it ollama ollama pull llama3.1:70b

To switch models, update the Base Model in Workspace → Models → IT Helpdesk and click Save & Update.

The embedding model (nomic-embed-text) does not need to change when switching chat models.

Installation

Step 1 — Install WSL2 (Windows only)

Docker on Windows requires WSL2. Open PowerShell as Administrator and run:

wsl --install

Restart your machine when prompted.

Step 2 — Install Docker Desktop

Download and install from: https://www.docker.com/products/docker-desktop

Step 3 — Clone or Download This Project

Place the project folder anywhere on your machine.

Step 4 — Configure Credentials

Create a .env file in the project root:

WEBUI_SECRET_KEY=your_secret_key_here

Important: Never commit the .env file to GitHub. It is listed in .gitignore by default.

Step 5 — Start All Services

docker-compose up -d

Verify both containers are running:

docker-compose ps

Both services should show status Up.

Step 6 — Pull Models

Pull the chat model:

docker exec -it ollama ollama pull llama3.1:8b

Pull the embedding model:

docker exec -it ollama ollama pull nomic-embed-text

Open WebUI Setup

Step 7 — Create an Admin Account

Open http://localhost:8080 and create your admin account on first launch.

Step 8 — Configure RAG Settings

Go to Admin Settings → Documents and configure the following:

Embedding:

Embedding Model Engine: Ollama
Ollama URL: http://ollama:11434
Embedding Model: nomic-embed-text

Retrieval:

Full Context Mode: OFF
Hybrid Search: ON
Enrich Hybrid Search Text: OFF
Top K: 10

Click Save, then click Reindex Knowledge Base Vectors at the bottom of the page under Danger Zone.

Step 9 — Upload Your Documents

Go to Workspace → Knowledge
Click + to create a new knowledge base
Upload your helpdesk documents (PDF, Word, txt, markdown)

When documents are updated, delete the old version from the knowledge base and re-upload the updated file. ChromaDB re-embeds automatically on upload.

Step 10 — Create the Helpdesk Model

Go to Workspace → Models
Click + to create a new model
Set:
- Name: IT Helpdesk
- Base Model: llama3.1:8b
- System Prompt: You are an IT helpdesk assistant. Answer questions using only the information provided in the context. Do not add or assume information that is not explicitly stated. If the answer is not in the context, say "I don't have that information — please contact IT directly."
- Knowledge: select your knowledge base
- Capabilities: Citations only
Click Save & Update

Step 11 — Test

Open http://localhost:8080, select the IT Helpdesk model and ask a question related to your uploaded documents.

Third Party Service URLs

Service	URL	Purpose
Open WebUI	`http://localhost:8080`	User chat interface
Ollama API	`http://localhost:11434`	LLM API

Useful Commands

# Start all services
docker-compose up -d

# Stop all services
docker-compose down

# View running containers
docker-compose ps

# View logs for a specific service
docker-compose logs -f open-webui

# Pull a new Ollama model
docker exec -it ollama ollama pull llama3.1:8b

# List downloaded Ollama models
docker exec -it ollama ollama list

# Stop everything and delete all data (DESTRUCTIVE)
docker-compose down -v

Troubleshooting

Containers won't start Ensure Docker Desktop is fully running before running docker-compose up -d.

Model not responding Check that the model has been pulled: docker exec -it ollama ollama list

Answers not sourced from documents Ensure the knowledge base is selected in the model settings under Workspace → Models. Check that Bypass Embedding and Retrieval is OFF in Admin Settings → Documents.

Wrong documents being retrieved Delete all documents from the knowledge base and re-upload them fresh. This forces ChromaDB to re-embed with the correct embedding model.

Out of memory errors Switch to a smaller model. Replace llama3.1:8b with llama3.2:3b (~4GB RAM).

Data lost after restart Data is safe as long as you do not use the -v flag. Only docker-compose down -v deletes volumes.

Reindexing the Knowledge Base

The Reindex Knowledge Base Vectors button is located at the bottom of Admin Settings → Documents under the Danger Zone section.

When to use it

Scenario	Action required
Switched embedding model (e.g. SentenceTransformers → Ollama)	Reindex
Changed chunk size or chunk overlap settings	Reindex
Documents retrieving incorrectly after settings changes	Reindex
Added or replaced documents in the knowledge base	Re-upload only — reindex not required

What it does

Reindexing deletes all existing vectors in ChromaDB and re-embeds every document in the knowledge base using the currently configured embedding model. This ensures the vectors are consistent with the current settings.

Reindexing does not delete your documents — only the vector representations. Documents remain in the knowledge base and are re-processed automatically.

When NOT to use it

Do not use Reindex as a general fix for bad answers. If retrieval is returning the wrong documents, the more reliable fix is to delete and re-upload the affected documents individually. Reindex is specifically for cases where the embedding model or chunking settings have changed.

Known Limitations

Model hallucination on broad questions The llama3.1:8b model may occasionally supplement retrieved context with information from its training data, particularly for open-ended questions. Specific factual questions (phone numbers, step-by-step processes, specific policies) perform best. Upgrading to a larger model (70B+) or a hosted model (GPT-4o) significantly reduces this behaviour.

Phrasing sensitivity How a question is phrased affects retrieval quality. Specific questions perform better than broad ones. For example, "what is the IT helpdesk phone number" retrieves better than "how do I contact IT".

Re-embedding required after document updates When documents are modified, they must be deleted and re-uploaded to the knowledge base for changes to take effect. ChromaDB stores vectors from the original upload and does not automatically detect file changes.

Architecture

User
  ↓
Open WebUI (http://localhost:8080)
  ├── ChromaDB (built-in vector store)
  └── Ollama (http://ollama:11434)
       ├── llama3.1:8b (chat model)
       └── nomic-embed-text (embedding model)

All services communicate over a private Docker bridge network (helpdesk-network).

Knowledge Base Documents

Document	Contents
`account-management.txt`	Passwords, MFA, account setup, lockouts
`hardware-support.txt`	Laptops, peripherals, mobile devices, printers
`software-support.txt`	Standard software, requests, licensing, updates
`network-and-connectivity.txt`	Wi-Fi, VPN, network drives
`email-support.txt`	Setup, mobile, shared mailboxes, security
`remote-work.txt`	Policy, home office, remote desktop
`security-policy.txt`	Acceptable use, data classification, phishing
`it-portal-and-ticketing.txt`	Portal, ticket priorities, contact details
`onboarding-offboarding.txt`	New employee setup, account termination
`backup-and-recovery.txt`	OneDrive, file recovery, disaster recovery

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
rag-documents		rag-documents
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Helpdesk — Local RAG Knowledge Base

What This Is

The Stack

How It Works

Document Ingestion

Query Pipeline

Models

Getting Started

System Requirements

Model RAM Requirements

Alternative Chat Models

Installation

Open WebUI Setup

Third Party Service URLs

Useful Commands

Troubleshooting

Reindexing the Knowledge Base

When to use it

What it does

When NOT to use it

Known Limitations

Architecture

Knowledge Base Documents

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

AI Helpdesk — Local RAG Knowledge Base

What This Is

The Stack

How It Works

Document Ingestion

Query Pipeline

Models

Getting Started

System Requirements

Model RAM Requirements

Alternative Chat Models

Installation

Open WebUI Setup

Third Party Service URLs

Useful Commands

Troubleshooting

Reindexing the Knowledge Base

When to use it

What it does

When NOT to use it

Known Limitations

Architecture

Knowledge Base Documents

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages