Community Discord bot for nan.builders. It answers member questions in configured support channels using semantic search over a curated markdown knowledge base (a `SimpleVectorStore` backed by SQLite, `qwen3-embedding` 4096-dim vectors, cosine similarity computed in Python) and the `qwen3.6` chat model exposed through the NaN LiteLLM gateway. The bot also reports daily and per-user token usage pulled from the LiteLLM admin API.
- Mention-triggered auto-response in channels listed in `ALLOWED_CHANNELS`, with retrieval-augmented answers from the local vector store and a `qwen3.6` fallback.
- Per-user rate limiting (3 mentions per 60-second window per `(user, channel)` pair).
- Username sanitization to mitigate prompt injection via Discord display names (see the sketch after this list).
- Text commands (prefix `/`): `/health`, `/docs`, `/search <query>`.
- Slash commands (Discord interactions): `/metrics`, `/my-metrics`.
- Daily token usage report posted to `STATUS_CHANNEL_ID` at `METRICS_SEND_HOUR` (UTC), pulled from the LiteLLM proxy.
- HTTP health endpoint on port `9101` (`GET /health`) consumed by the Docker `HEALTHCHECK`.
- Doc-hash optimization: unchanged markdown files are skipped on startup, so embeddings are only recomputed when content actually changes.
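The display-name sanitization is only summarized above; as a rough illustration of the idea (the actual function in the bot code may differ), it amounts to stripping mention and markdown syntax before the name is interpolated into the prompt:

```python
import re

def sanitize_display_name(name: str, max_len: int = 32) -> str:
    """Illustrative sketch: strip characters a display name could use for prompt injection."""
    # Remove mention brackets, pings, markdown, and backticks.
    cleaned = re.sub(r"[@<>`*_~|#]", "", name)
    # Collapse whitespace and cap the length before it reaches the prompt.
    cleaned = " ".join(cleaned.split())
    return cleaned[:max_len] or "user"
```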
- Python 3.11 (Dockerfile `python:3.11-slim`, `pyproject.toml` `requires-python = ">=3.11"`).
- discord.py >= 2.3.2.
- openai async SDK (`AsyncOpenAI`), pointed at the NaN LiteLLM gateway.
- aiohttp for the LiteLLM admin metrics calls.
- pydantic-settings for `.env` parsing.
- SQLite (stdlib `sqlite3`) as the vector store backend.
- Hatchling build backend.
- Ruff for linting and formatting.
`main.py` boots a `SimpleVectorStore` against `vector_db/vectors.db`, loads markdown files from `bot/docs/knowledge/` through `load_documentation`, embeds any new or changed chunks via the LiteLLM embeddings endpoint, and starts the `NanBot` discord.py client. Graceful shutdown is wired through SIGINT/SIGTERM handlers that cancel pending tasks, persist the vector store, close the OpenAI clients, stop the health HTTP server, and close the bot connection.
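In outline, that shutdown wiring looks roughly like the sketch below. The helper and its arguments are hypothetical; the actual `main.py` may structure the handlers differently.

```python
import asyncio
import signal

async def run_bot(bot, token: str, vector_store, llm_client, health_server):
    """Hypothetical sketch of the SIGINT/SIGTERM shutdown wiring described above."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()

    # Translate SIGINT/SIGTERM into a single shutdown event.
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)

    bot_task = asyncio.create_task(bot.start(token))
    await stop.wait()

    # Mirror the steps described above: cancel work, persist state, close clients.
    bot_task.cancel()
    vector_store.persist()      # hypothetical persistence hook for the SQLite store
    await llm_client.close()    # close the AsyncOpenAI clients
    await health_server.stop()  # stop the HTTP health server
    await bot.close()           # close the Discord gateway connection
```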
On message events, `NanBot.on_message` filters by `ALLOWED_CHANNELS` and mention, applies the rate limiter, embeds the question, runs a cosine-similarity search over the in-memory chunks (top-K configurable via `TOP_K`), and calls `LLMClient.answer_with_context` against the `qwen3.6` chat model. A `CircuitBreaker` (5 failures, 60-second cool-off) protects the chat endpoint from cascading failures, and an `asyncio.Semaphore(5)` caps concurrent LLM calls.
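The retrieval step itself is plain Python over the chunks held in memory. A simplified sketch of the idea (chunk shape and function names are illustrative; see `bot/knowledge.py` for the real implementation):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[dict]:
    """Rank in-memory chunks by cosine similarity to the query embedding."""
    scored = sorted(
        ((cosine(query_vec, c["embedding"]), c) for c in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [{**chunk, "score": score} for score, chunk in scored[:k]]
```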
Metrics live in `bot/metrics.py`. They hit the LiteLLM proxy `/spend/logs/ui` endpoint (configured via `LITELLM_PROXY_URL` and `LITELLM_ADMIN_KEY`) and aggregate token usage per `user_api_key_alias`. The daily scheduler sleeps until `METRICS_SEND_HOUR` UTC, posts the top-10 report to `STATUS_CHANNEL_ID`, and then loops every 24 hours.
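The scheduling part can be expressed compactly; the sketch below assumes a `send_report()` coroutine that builds and posts the report (the real module may handle errors and disabled metrics differently).

```python
import asyncio
from datetime import datetime, timedelta, timezone

async def daily_report_loop(send_report, send_hour: int) -> None:
    """Sketch: sleep until the next METRICS_SEND_HOUR (UTC), post the report, repeat daily."""
    while True:
        now = datetime.now(timezone.utc)
        target = now.replace(hour=send_hour, minute=0, second=0, microsecond=0)
        if target <= now:
            target += timedelta(days=1)  # already past today's slot, wait for tomorrow
        await asyncio.sleep((target - now).total_seconds())
        await send_report()  # hypothetical coroutine posting the top-10 report
```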
```
discord-bot/
├── main.py                    # Entry point and shutdown wiring
├── bot/
│   ├── __init__.py
│   ├── base.py                # NanBot, commands, message handler, health HTTP server
│   ├── config.py              # pydantic-settings, paths, logger
│   ├── knowledge.py           # SimpleVectorStore, chunking, doc loader
│   ├── llm.py                 # LLMClient, CircuitBreaker, RAG prompt
│   ├── metrics.py             # LiteLLM spend log aggregation and reports
│   └── docs/
│       └── knowledge/         # Embedded markdown corpus
│           ├── intro.md
│           ├── getting-started.md
│           └── models.md
├── Dockerfile
├── entrypoint.sh
├── docker-compose.yml         # Local build
├── docker-compose.prod.yml    # Production override (pulled image)
├── pyproject.toml
├── .env.example
└── .github/
    └── workflows/
        └── deploy.yml         # GHCR build + SSH deploy
```
- Python 3.11.
- A Discord application with a bot user, a token, and the following privileged intents enabled in the Discord Developer Portal: MESSAGE CONTENT INTENT and SERVER MEMBERS INTENT. Without them the bot fails to connect with `PrivilegedIntentsRequired` (see the snippet after this list).
- The bot invited to your guild with permissions to read messages, send messages, embed links, and use slash commands.
- A LiteLLM API key. The bot defaults to `https://api.nan.builders/v1`; override with `LITELLM_BASE_URL` if you run your own gateway.
- For the metrics features: network reachability to the LiteLLM proxy URL (defaults to `http://localhost:4000`, i.e. the bot is expected to run on the same host) and an admin key with read access to `/spend/logs/ui`.
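For reference, these are the two intents the client has to request in code; the constructor call below only illustrates the discord.py pattern, the actual `NanBot` setup lives in `bot/base.py`.

```python
import discord
from discord.ext import commands

# Both privileged intents must also be switched on in the Developer Portal,
# otherwise bot.start() raises discord.PrivilegedIntentsRequired.
intents = discord.Intents.default()
intents.message_content = True  # MESSAGE CONTENT INTENT
intents.members = True          # SERVER MEMBERS INTENT

bot = commands.Bot(command_prefix="/", intents=intents)
```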
```bash
git clone https://github.com/helmcode/nan-discord-bot.git
cd nan-discord-bot
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Fill in DISCORD_TOKEN, DISCORD_GUILD_ID, LITELLM_API_KEY, ALLOWED_CHANNELS, etc.
python main.py
```

To run it in Docker instead:

```bash
cp .env.example .env
# Fill in the required variables.
docker compose up --build
```

The container exposes the health endpoint on port 9101 (consumed by the `HEALTHCHECK` defined in the Dockerfile).
NanBot registers two flavors of commands: text commands use the `commands.Bot` prefix (currently `/`), while slash commands are real Discord interactions registered via `bot.tree`. A registration sketch follows the table below.
| Command | Type | Description | Cooldown |
|---|---|---|---|
| `/health` | text | Bot status and the number of chunks currently loaded in the vector store. | none |
| `/docs` | text | List of markdown files loaded from `bot/docs/knowledge/`. | none |
| `/search <q>` | text | Top-3 chunks from the knowledge base for the given query, with cosine score. | none |
| `/metrics` | slash | Manually trigger the global LiteLLM top-10 token usage report (last 24 hours). | 1 per 3600 s |
| `/my-metrics` | slash | The caller's personal token usage and per-model breakdown (last 24 hours). | 1 per 300 s |
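As a rough sketch of how the two flavors are wired up with discord.py (handler names and bodies are illustrative; the real commands live in `bot/base.py`):

```python
import discord
from discord import app_commands
from discord.ext import commands

bot = commands.Bot(command_prefix="/", intents=discord.Intents.default())

@bot.command(name="health")
async def health(ctx: commands.Context) -> None:
    # Text command: replies with bot status and the loaded chunk count.
    await ctx.send("OK - vector store loaded")

@bot.tree.command(name="my-metrics", description="Your token usage for the last 24 hours")
@app_commands.checks.cooldown(1, 300)  # 1 use per 300 s, matching the table above
async def my_metrics(interaction: discord.Interaction) -> None:
    # Slash command: registered on bot.tree and answered through the interaction.
    await interaction.response.send_message("your usage report", ephemeral=True)
```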
Auto-response is triggered when the bot is mentioned inside a channel listed in `ALLOWED_CHANNELS`. Rate limiting allows at most 3 mentions per user per channel per 60-second window; excess messages get a Spanish "demasiadas peticiones" (too many requests) reply.
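A sliding-window limiter keyed by `(user, channel)` is enough to implement this; a minimal sketch (the bot's actual limiter may track state differently):

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sketch of the per-(user, channel) sliding window: 3 mentions per 60 seconds."""

    def __init__(self, limit: int = 3, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[tuple[int, int], deque[float]] = defaultdict(deque)

    def allow(self, user_id: int, channel_id: int) -> bool:
        now = time.monotonic()
        hits = self._hits[(user_id, channel_id)]
        # Evict timestamps that fell out of the 60-second window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reply with "demasiadas peticiones"
        hits.append(now)
        return True
```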
| Name | Required | Default | Description |
|---|---|---|---|
| `DISCORD_TOKEN` | yes | — | Bot token from the Discord Developer Portal. |
| `DISCORD_GUILD_ID` | yes | — | Guild (server) ID the bot is associated with. |
| `LITELLM_BASE_URL` | no | `https://api.nan.builders/v1` | OpenAI-compatible base URL used for chat completions and embeddings. |
| `LITELLM_API_KEY` | yes | — | LiteLLM key used by both the chat and embeddings clients. |
| `LITELLM_PROXY_URL` | no | `http://localhost:4000` | LiteLLM proxy base URL used by the metrics module to call `/spend/logs/ui`. |
| `LITELLM_ADMIN_KEY` | no | `""` (disables metrics) | Admin key for the LiteLLM proxy. When empty, metrics commands and the daily report are skipped. |
| `EMBEDDING_MODEL` | no | `qwen3-embedding` | Embedding model identifier sent to the LiteLLM gateway. |
| `EMBEDDING_DIM` | no | `4096` | Expected embedding dimensionality. Informational; not enforced at write time. |
| `TOP_K` | no | `5` | Number of chunks returned by the vector search used to build the RAG context. |
| `ALLOWED_CHANNELS` | no | `""` (all channels) | Comma-separated Discord channel IDs the bot will respond in. Empty means every channel is allowed. |
| `STATUS_CHANNEL_ID` | no | `""` (disables daily report) | Channel ID where the daily metrics report is posted. Required for the scheduler to run. |
| `METRICS_SEND_HOUR` | no | `9` | UTC hour (0–23) at which the daily metrics report is posted. |
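`bot/config.py` reads these variables with pydantic-settings. A minimal sketch of such a settings class, assuming field names simply mirror the table above (the real class likely adds paths, validators, and logger setup):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Illustrative pydantic-settings model mirroring the variables above."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    discord_token: str
    discord_guild_id: int
    litellm_base_url: str = "https://api.nan.builders/v1"
    litellm_api_key: str
    litellm_proxy_url: str = "http://localhost:4000"
    litellm_admin_key: str = ""   # empty disables the metrics features
    embedding_model: str = "qwen3-embedding"
    embedding_dim: int = 4096
    top_k: int = 5
    allowed_channels: str = ""    # comma-separated channel IDs
    status_channel_id: str = ""   # empty disables the daily report
    metrics_send_hour: int = 9

settings = Settings()
```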
Markdown files in `bot/docs/knowledge/` are loaded at startup by `SimpleVectorStore`, chunked on paragraph boundaries (target ~2000 chars per chunk with overlap), embedded via the LiteLLM embeddings endpoint, and persisted to `vector_db/vectors.db`. A `doc_hashes` table stores a SHA-256 of each source file so unchanged files are skipped on subsequent boots; files that disappear from disk have their chunks evicted from the database.

To update the corpus, edit or add `.md` files under `bot/docs/knowledge/` and restart the bot. Only files whose content hash changed will trigger new embedding API calls.
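The hash check is conceptually simple; the sketch below assumes a `doc_hashes(path, hash)` schema for illustration, and the actual column names in `bot/knowledge.py` may differ.

```python
import hashlib
import sqlite3
from pathlib import Path

def needs_reembedding(conn: sqlite3.Connection, path: Path) -> bool:
    """Sketch of the doc-hash check: re-embed a file only when its content changed."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    row = conn.execute(
        "SELECT hash FROM doc_hashes WHERE path = ?", (str(path),)
    ).fetchone()
    if row and row[0] == digest:
        return False  # unchanged since the last boot: skip the embedding calls
    conn.execute(
        "INSERT OR REPLACE INTO doc_hashes (path, hash) VALUES (?, ?)",
        (str(path), digest),
    )
    return True
```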
- Lint: `ruff check .`
- Format: `ruff format .`
- Ruff is configured in `pyproject.toml` (`line-length = 120`, `target-version = "py311"`, rules `E, F, I, N, W, UP`).
- There is currently no test suite. The `dev` extra installs `pytest` and `pytest-asyncio`, and `pyproject.toml` already configures `asyncio_mode = "auto"` for when tests are added.
Production runs as a Docker container on the inference server.
- On every push to `main`, `.github/workflows/deploy.yml` builds the image and pushes it to GHCR (`ghcr.io/helmcode/nan-discord-bot`), tagged with both `latest` and the commit SHA.
- The same workflow then SSHes into the deploy target, pulls the new image (by SHA, with `latest` as fallback), retags it as `latest`, and runs `docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --remove-orphans`.
- The production compose override uses `network_mode: host`, mounts the read-only knowledge directory and a named volume for `vector_db`, and caps the container at 512 MiB / 1 vCPU.
- The Dockerfile starts as root and runs `entrypoint.sh`, which `chown`s the `vector_db` volume to the unprivileged `bot` user and then runs `exec gosu bot python main.py`.
| Secret | Purpose |
|---|---|
| `SERVER_HOST` | Hostname or IP of the deploy target. |
| `SERVER_USER` | SSH username on the deploy target. |
| `SSH_PRIVATE_KEY` | SSH private key authorized on the deploy target. |
| `DEPLOY_DIR` | Optional. Working directory on the deploy target. Falls back to `$HOME/nan-discord-bot` when empty or unset. |
- Branch from `main` and open a pull request.
- Run `ruff check .` and `ruff format .` before pushing.
- Never commit `.env` files or any token, API key, or secret.