Skip to content

tomevang/AIHelpdesk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

AI Helpdesk — Local RAG Knowledge Base

A self-hosted RAG application for IT helpdesk support. Users ask questions in natural language and receive answers sourced directly from uploaded helpdesk documents.


What This Is

When a user asks a question, the system:

  1. Searches your helpdesk documents for the most relevant information
  2. Passes that information to a local language model
  3. The model generates an answer based only on your documents
  4. The answer is returned to the user in a chat interface

This project uses RAG (Retrieval Augmented Generation) — the model answers based on your documents, not general training data.


The Stack

Tool Role
Docker Runs and connects all services
Ollama Local LLM server — runs the chat and embedding models
Open WebUI Chat interface, RAG pipeline, and built-in vector store (ChromaDB)

How It Works

Document Ingestion

Uploaded documents are split into chunks by Open WebUI's text splitter (chunk size: 1000, overlap: 100). Each chunk is passed to nomic-embed-text via the Ollama API, which returns a vector representation. The vector and original text are stored together in ChromaDB.

Your document
      ↓
Text splitter (chunk size: 1000, overlap: 100)
      ↓
nomic-embed-text → vector
      ↓
ChromaDB stores vector + original text

Query Pipeline

At query time, the user's question is embedded using the same nomic-embed-text model. ChromaDB performs a hybrid search (vector similarity + BM25 keyword search, weighted 0.5/0.5) to retrieve the top K most relevant chunks. The chunks are injected into a RAG prompt template alongside the user's question and passed to the chat model via Ollama.

User asks a question
      ↓
nomic-embed-text → vector
      ↓
ChromaDB hybrid search (vector similarity + BM25) → top K chunks
      ↓
RAG prompt template: system prompt + chunks + question
      ↓
llama3.1:8b generates response
      ↓
Answer streams back to user with source citations

Models

Model Role Size
nomic-embed-text Embedding — converts text to vectors 274MB
llama3.1:8b Chat — generates responses from retrieved context 4.7GB

The embedding model used at ingestion time must match the one used at query time. Changing the embedding model requires re-embedding all documents.


Getting Started

System Requirements

Spec Minimum Recommended
RAM 16GB 32GB
Disk 20GB free 50GB free
OS Windows 10/11, macOS, Linux Windows 11
Docker Docker Desktop Docker Desktop

Model RAM Requirements

Model RAM Required Quality
llama3.2:3b ~4GB Good
llama3.1:8b ~8GB Very good
llama3.1:70b ~40GB Excellent (requires 40GB+ RAM)

Alternative Chat Models

The default chat model is llama3.1:8b. The following are tested alternatives:

Mistral 7B (mistral) A 7B parameter model from Mistral AI. Slightly smaller than Llama 3.1 8B with similar performance. Known for strong instruction following and concise responses. Good alternative if Llama 3.1 8B is not available or underperforms on your hardware.

docker exec -it ollama ollama pull mistral

Llama 3.1 70B (llama3.1:70b) The 70B parameter variant of Llama 3.1. Significantly stronger instruction following and lower hallucination rate than the 8B model. Requires approximately 40GB RAM. Recommended for production use where accuracy is critical.

docker exec -it ollama ollama pull llama3.1:70b

To switch models, update the Base Model in Workspace → Models → IT Helpdesk and click Save & Update.

The embedding model (nomic-embed-text) does not need to change when switching chat models.


Installation

Step 1 — Install WSL2 (Windows only)

Docker on Windows requires WSL2. Open PowerShell as Administrator and run:

wsl --install

Restart your machine when prompted.

Step 2 — Install Docker Desktop

Download and install from: https://www.docker.com/products/docker-desktop

Step 3 — Clone or Download This Project

Place the project folder anywhere on your machine.

Step 4 — Configure Credentials

Create a .env file in the project root:

WEBUI_SECRET_KEY=your_secret_key_here

Important: Never commit the .env file to GitHub. It is listed in .gitignore by default.

Step 5 — Start All Services

docker-compose up -d

Verify both containers are running:

docker-compose ps

Both services should show status Up.

Step 6 — Pull Models

Pull the chat model:

docker exec -it ollama ollama pull llama3.1:8b

Pull the embedding model:

docker exec -it ollama ollama pull nomic-embed-text

Open WebUI Setup

Step 7 — Create an Admin Account

Open http://localhost:8080 and create your admin account on first launch.

Step 8 — Configure RAG Settings

Go to Admin Settings → Documents and configure the following:

Embedding:

  • Embedding Model Engine: Ollama
  • Ollama URL: http://ollama:11434
  • Embedding Model: nomic-embed-text

Retrieval:

  • Full Context Mode: OFF
  • Hybrid Search: ON
  • Enrich Hybrid Search Text: OFF
  • Top K: 10

Click Save, then click Reindex Knowledge Base Vectors at the bottom of the page under Danger Zone.

Step 9 — Upload Your Documents

  1. Go to Workspace → Knowledge
  2. Click + to create a new knowledge base
  3. Upload your helpdesk documents (PDF, Word, txt, markdown)

When documents are updated, delete the old version from the knowledge base and re-upload the updated file. ChromaDB re-embeds automatically on upload.

Step 10 — Create the Helpdesk Model

  1. Go to Workspace → Models
  2. Click + to create a new model
  3. Set:
    • Name: IT Helpdesk
    • Base Model: llama3.1:8b
    • System Prompt: You are an IT helpdesk assistant. Answer questions using only the information provided in the context. Do not add or assume information that is not explicitly stated. If the answer is not in the context, say "I don't have that information — please contact IT directly."
    • Knowledge: select your knowledge base
    • Capabilities: Citations only
  4. Click Save & Update

Step 11 — Test

Open http://localhost:8080, select the IT Helpdesk model and ask a question related to your uploaded documents.


Third Party Service URLs

Service URL Purpose
Open WebUI http://localhost:8080 User chat interface
Ollama API http://localhost:11434 LLM API

Useful Commands

# Start all services
docker-compose up -d

# Stop all services
docker-compose down

# View running containers
docker-compose ps

# View logs for a specific service
docker-compose logs -f open-webui

# Pull a new Ollama model
docker exec -it ollama ollama pull llama3.1:8b

# List downloaded Ollama models
docker exec -it ollama ollama list

# Stop everything and delete all data (DESTRUCTIVE)
docker-compose down -v

Troubleshooting

Containers won't start Ensure Docker Desktop is fully running before running docker-compose up -d.

Model not responding Check that the model has been pulled: docker exec -it ollama ollama list

Answers not sourced from documents Ensure the knowledge base is selected in the model settings under Workspace → Models. Check that Bypass Embedding and Retrieval is OFF in Admin Settings → Documents.

Wrong documents being retrieved Delete all documents from the knowledge base and re-upload them fresh. This forces ChromaDB to re-embed with the correct embedding model.

Out of memory errors Switch to a smaller model. Replace llama3.1:8b with llama3.2:3b (~4GB RAM).

Data lost after restart Data is safe as long as you do not use the -v flag. Only docker-compose down -v deletes volumes.


Reindexing the Knowledge Base

The Reindex Knowledge Base Vectors button is located at the bottom of Admin Settings → Documents under the Danger Zone section.

When to use it

Scenario Action required
Switched embedding model (e.g. SentenceTransformers → Ollama) Reindex
Changed chunk size or chunk overlap settings Reindex
Documents retrieving incorrectly after settings changes Reindex
Added or replaced documents in the knowledge base Re-upload only — reindex not required

What it does

Reindexing deletes all existing vectors in ChromaDB and re-embeds every document in the knowledge base using the currently configured embedding model. This ensures the vectors are consistent with the current settings.

Reindexing does not delete your documents — only the vector representations. Documents remain in the knowledge base and are re-processed automatically.

When NOT to use it

Do not use Reindex as a general fix for bad answers. If retrieval is returning the wrong documents, the more reliable fix is to delete and re-upload the affected documents individually. Reindex is specifically for cases where the embedding model or chunking settings have changed.


Known Limitations

Model hallucination on broad questions The llama3.1:8b model may occasionally supplement retrieved context with information from its training data, particularly for open-ended questions. Specific factual questions (phone numbers, step-by-step processes, specific policies) perform best. Upgrading to a larger model (70B+) or a hosted model (GPT-4o) significantly reduces this behaviour.

Phrasing sensitivity How a question is phrased affects retrieval quality. Specific questions perform better than broad ones. For example, "what is the IT helpdesk phone number" retrieves better than "how do I contact IT".

Re-embedding required after document updates When documents are modified, they must be deleted and re-uploaded to the knowledge base for changes to take effect. ChromaDB stores vectors from the original upload and does not automatically detect file changes.


Architecture

User
  ↓
Open WebUI (http://localhost:8080)
  ├── ChromaDB (built-in vector store)
  └── Ollama (http://ollama:11434)
       ├── llama3.1:8b (chat model)
       └── nomic-embed-text (embedding model)

All services communicate over a private Docker bridge network (helpdesk-network).


Knowledge Base Documents

Document Contents
account-management.txt Passwords, MFA, account setup, lockouts
hardware-support.txt Laptops, peripherals, mobile devices, printers
software-support.txt Standard software, requests, licensing, updates
network-and-connectivity.txt Wi-Fi, VPN, network drives
email-support.txt Setup, mobile, shared mailboxes, security
remote-work.txt Policy, home office, remote desktop
security-policy.txt Acceptable use, data classification, phishing
it-portal-and-ticketing.txt Portal, ticket priorities, contact details
onboarding-offboarding.txt New employee setup, account termination
backup-and-recovery.txt OneDrive, file recovery, disaster recovery

About

A local RAG pipeline for IT helpdesk support. Documents are chunked, embedded via nomic-embed-text, and stored in ChromaDB. At query time, hybrid search retrieves relevant chunks which are passed as context to llama3.1:8b running on Ollama. Deployed via Docker Compose.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors