Local RAG with CSC documentation

This repository contains code for retrieval-augmented generation (RAG) over the CSC user documentation, using models that run locally on a workstation CPU.

Setting up

Make sure you have a working installation of Python and Docker Engine.

Install Python dependencies

python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

Download models

The models are run using the llama.cpp framework, which requires models to be converted to the GGUF format. Fortunately, there are many pre-converted models available on Hugging Face. The hf CLI tool used here is installed by the huggingface-hub Python package, which is included in the dependencies.

hf download --local-dir ./data/llama.cpp unsloth/embeddinggemma-300m-GGUF embeddinggemma-300m-Q4_0.gguf
hf download --local-dir ./data/llama.cpp unsloth/gemma-3-4b-it-qat-GGUF gemma-3-4b-it-qat-Q4_K_M.gguf
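Before starting the servers, it can be worth confirming that both downloads finished. The following is a minimal sketch, not part of the repository: the helper name missing_models is hypothetical, and it assumes the files end up directly under ./data/llama.cpp with the names used in the commands above.

```python
from pathlib import Path

# File names taken from the two download commands above.
EXPECTED_MODELS = (
    "embeddinggemma-300m-Q4_0.gguf",
    "gemma-3-4b-it-qat-Q4_K_M.gguf",
)


def missing_models(model_dir, expected=EXPECTED_MODELS):
    """Return the expected GGUF files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in expected:
        path = root / name
        # An interrupted download can leave an empty file behind,
        # so a zero-byte file is treated as missing as well.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: the directory passed to --local-dir above.
    absent = missing_models("./data/llama.cpp")
    if absent:
        print("Missing model files:", ", ".join(absent))
    else:
        print("All model files are present.")
```

The check only looks at file presence and size. It does not validate that a file is a well-formed GGUF model.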

Usage

Start the llama.cpp and Qdrant servers

Start the llama.cpp inference server and Qdrant vector database containers using Docker Compose. The container images are pulled automatically.

docker compose up

The containers can be stopped using a similar command.

docker compose down

If you encounter any issues, it can be helpful to remove any stopped containers before running Compose.

docker container prune
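After docker compose up, the servers may need some time to load the models before they accept requests. The sketch below polls an HTTP endpoint until it answers with status 200. It is an illustration only: the function name wait_until_ready is hypothetical, and the URLs mentioned afterwards are assumptions, because the ports actually published are defined in compose.yaml.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, timeout=60.0, interval=1.0):
    """Poll url until it answers with HTTP 200 or timeout seconds elapse.

    Returns True if the endpoint became ready in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, or a non-2xx answer such as 503 while
            # a model is still loading: keep waiting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until_ready("http://localhost:6333/healthz") would wait for Qdrant on its default port, and wait_until_ready("http://localhost:8080/health") for a llama.cpp server on its default port. Both addresses are guesses about this setup and should be adjusted to match compose.yaml.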

Build Qdrant vector index

Build the vector database using the provided script. This takes around 25 minutes on my workstation.

python3 build_index.py
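The contents of build_index.py are not shown here, so the following is only a sketch of what an index build of this kind typically involves: splitting each document into overlapping chunks, embedding every chunk, and collecting records with an id, a vector and a payload, which is the shape Qdrant uses for its points. The names chunk_text and build_points, the chunk sizes, and the embed callable (standing in for a request to the embedding server) are all assumptions made for illustration.

```python
def chunk_text(text, max_chars=1000, overlap=200):
    """Split text into overlapping chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in the range [0, max_chars)")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once a chunk has reached the end of the text, so the tail
        # is not repeated as a shorter extra chunk.
        if start + max_chars >= len(text):
            break
    return chunks


def build_points(documents, embed, max_chars=1000, overlap=200):
    """Turn a {source: text} mapping into a list of point dictionaries.

    embed is any callable that maps a string to a list of floats.
    """
    points = []
    for source, text in documents.items():
        for chunk in chunk_text(text, max_chars, overlap):
            points.append({
                "id": len(points),
                "vector": embed(chunk),
                "payload": {"source": source, "text": chunk},
            })
    return points
```

Passing the embedding function in as a parameter keeps the chunking logic testable without a running server. Embedding each chunk on a CPU is likely the slow part of such a build, which would be consistent with the run time mentioned above.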

Open Chainlit chat interface

Start the Chainlit server, which hosts the chat interface. The chainlit CLI tool is installed by the Python package of the same name, which is included in the dependencies. The server's startup script takes a few seconds to run, so if you see an error message when first opening the web UI, try refreshing the page.

chainlit run app.py

RAG is disabled by default, but can be enabled from the settings menu, which is accessed by clicking on the gear icon located on the left side of the input field. The retrieved documents can be viewed by expanding the "Used retrieve" step in the chat and clicking on one of the source icons at the end of the step.
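To illustrate what enabling RAG changes, the sketch below shows the general retrieve-then-prompt pattern: rank stored chunks by similarity to the query embedding, then place the best matches in the prompt ahead of the question. It is not the code in app.py. In the actual application the similarity search is handled by Qdrant, and the function names, the use of cosine similarity, and the prompt wording here are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, points, top_k=3):
    """Return the top_k points whose vectors are most similar to the query."""
    ranked = sorted(
        points,
        key=lambda point: cosine_similarity(query_vector, point["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, retrieved):
    """Assemble a prompt that puts the retrieved text before the question."""
    context = "\n\n".join(
        f"[{i}] {point['payload']['text']}"
        for i, point in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With RAG disabled, the model would see only the question. With it enabled, the retrieved chunks travel along in the prompt, which is why they can be inspected afterwards in the chat.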
