Notion CMS sync with PGlite for portable PostgreSQL storage.
- All content is managed in a Notion database
- Syncs via a GitHub Action that:
- Extracts metadata and article content from Notion
- Downloads and stores images as BLOBs in the database
- Uses GPT-4o-mini to generate article descriptions and sample questions for semantic search
- Creates vector embeddings using OpenAI's text-embedding-3-small model
- Exports the database as a portable gzipped tarball (
db/notion.db.tar.gz) - Commits the database to this repo and the frontend repo
The sync pipeline uses PGlite (WASM-based PostgreSQL) instead of a traditional PostgreSQL server. This means:
- No Docker or external database required
- Database runs entirely in-process
- Supports pgvector for embeddings
- Portable tarball format for easy distribution
- Fetch Phase - Retrieve data from Notion, download images
- Update Phase - Upsert articles, store images, generate embeddings
- Upload Phase - Export database to tarball
Smart diffing skips unchanged articles based on last_edited_time timestamps.
Clone this repo:
git clone git@github.com:adit-bala/knowledge-base.git
cd knowledge-base
npm installCreate a .env file:
NOTION_TOKEN=<notion api key>
NOTION_DB_ID=<id of notion database>
OPENAI_API_KEY=<openai api key>
For the backend (in ./backend/.env):
OPENAI_API_KEY=<openai api key>
CO_API_KEY=<cohere api key>
No Docker required! Simply run:
npm run syncThis will:
- Fetch all articles from Notion
- Download images and store them as BLOBs
- Generate LLM descriptions and embeddings for each article
- Export to
db/notion.db.tar.gz
Interactive query tool with multiple modes:
npm run queryCommands:
/sql <query>- Execute raw SQL/articles- List all articles/stats- Show database stats/ask <question>- Hybrid search (vector + full-text) returning top 5 articles<text>- RAG search with AI-generated answer/quit- Exit
Example:
> /ask What does Aditya do for fun?
> /sql SELECT title, tags FROM article LIMIT 5
> What has Aditya been working on?
To run the backend:
cd backend/
docker compose up --buildTest with:
curl -X POST http://localhost:3000/ask \
-H "Content-Type: application/json" \
-d '{"question": "What does Aditya like to do?"}'Run all tests:
npm testThe sync workflow runs daily at 9 AM PST and can be triggered manually. It:
- Syncs Notion → PGlite
- Commits
db/notion.db.tar.gzto this repo - Pushes the database to the frontend repo