AI-powered knowledge base that automatically discovers, generates, and curates technology content using LLMs and GitHub automation.
This system uses AI agents to:
- Discover new content from GitHub releases, arXiv papers, and other sources
- Generate well-structured markdown articles with YAML frontmatter
- Review content through GitHub Pull Requests
- Publish approved content to a static site (GitHub Pages)
Features:
- ✅ Extensible LLM Support - Works with Anthropic, OpenAI, DeepSeek, Ollama, and more
- ✅ Multiple Discovery Sources - GitHub and arXiv, with a plugin architecture for adding more
- ✅ Automated PR Workflow - Human review before publication
- ✅ Content Deduplication - SHA-256 hash-based duplicate detection
- ✅ Credibility Scoring - Automatically scores content quality (1-10)
- ✅ Static Site Generation - Fast, SEO-friendly Docusaurus site
- ✅ CI/CD Integration - GitHub Actions for automation
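The hash-based duplicate detection listed above can be pictured with a minimal sketch. It assumes the hash is taken over whitespace-normalized, lowercased text and that seen hashes live in an in-memory set; the normalization and storage the agent actually uses are not described here, and `content_hash` and `Deduplicator` are hypothetical names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of whitespace-normalized, lowercased text."""
    # Assumption: collapse whitespace and lowercase before hashing, so that
    # trivially reformatted copies map to the same digest.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Deduplicator:
    """Hypothetical in-memory tracker of content hashes already seen."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, text: str) -> bool:
        """Record the text's hash and report whether it was seen before."""
        digest = content_hash(text)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False
```

Exact hashing only catches identical text after normalization; near-duplicates with different wording would need a separate similarity check.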
┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
│  External       │────▶│  Discovery   │────▶│  Content    │
│  Sources        │     │  Agents      │     │  Generator  │
│  • GitHub       │     │              │     │  (LLM)      │
│  • arXiv        │     └──────────────┘     └──────┬──────┘
└─────────────────┘                                 │
                                                    ▼
                                   ┌───────────────────────────────────────┐
                                   │          GitHub PR Workflow           │
                                   │  • Create branch                      │
                                   │  • Commit content                     │
                                   │  • Human review                       │
                                   └────────────────┬──────────────────────┘
                                                    │
                                                    ▼
                                   ┌───────────────────────────────────────┐
                                   │        GitHub Pages Deployment        │
                                   │  • Build Docusaurus site              │
                                   │  • Deploy to Pages                    │
                                   └───────────────────────────────────────┘
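The flow in the diagram can be read as three stages chained together. The sketch below is illustrative only: `Item`, `run_pipeline`, and the callables passed to it are hypothetical stand-ins rather than the agent's actual interfaces, and the branch naming scheme is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Item:
    """Hypothetical stand-in for something found by a discovery agent."""
    source_id: str
    title: str
    url: str


def run_pipeline(
    discover: Callable[[], Iterable[Item]],
    generate: Callable[[Item], str],
    open_pr: Callable[[str, str], None],
) -> int:
    """Discover items, generate an article for each, and pass it to the PR step.

    Returns the number of pull requests requested.
    """
    count = 0
    for item in discover():
        article = generate(item)  # an LLM call in the real system
        branch = f"content/{item.source_id}-{count}"  # assumed naming scheme
        open_pr(branch, article)  # human review happens on the resulting PR
        count += 1
    return count
```

Passing the stages in as callables keeps the sketch testable without network access; publication is left out because it only happens after a reviewer merges the PR.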
Prerequisites:
- Python 3.11+
- Node.js 18+
- GitHub repository
- API keys for your chosen LLM provider
Installation:
- Clone the repository:
git clone https://github.com/your-org/knowledge-base.git
cd knowledge-base
- Install Python dependencies:
pip install -r requirements.txt
- Install Node.js dependencies:
npm install
- Configure the agent:
Edit scripts/config.yaml:
github:
  owner: "your-org"
  repo: "knowledge-base"
ai:
  provider: "openai" # or anthropic, deepseek
  model: "gpt-5.2"
- Set environment variables:
export GITHUB_TOKEN="your-github-token"
export OPENAI_API_KEY="your-openai-key"
# or ANTHROPIC_API_KEY, DEEPSEEK_API_KEY
- Run the agent:
python -m scripts.agent
Run tests:
pytest tests/ -v --cov=scripts
Lint code:
ruff check scripts/
Build site locally:
npm run start
The system is configured via scripts/config.yaml:
ai:
  provider: "openai" # openai, anthropic, deepseek, ollama
  model: "gpt-5.2"
  max_tokens: 4000
  temperature: 0.3
  rate_limit:
    requests_per_minute: 50
    tokens_per_minute: 150000
sources:
  - id: "github_ai_releases"
    type: "github"
    enabled: true
    query:
      repos:
        - "openai/openai-python"
        - "anthropics/anthropic-sdk-python"
      event_types: ["release"]
    domain: "ai"
    level: "intermediate"
github:
  owner: "your-org"
  repo: "knowledge-base"
  pr_labels: ["ai-generated", "needs-review"]
  reviewers: ["@expert1", "@expert2"]
Add these secrets to your GitHub repository:
- ANTHROPIC_API_KEY - Anthropic API key (if using Claude)
- OPENAI_API_KEY - OpenAI API key (if using GPT)
- DEEPSEEK_API_KEY - DeepSeek API key (if using DeepSeek)
The GITHUB_TOKEN is automatically provided by GitHub Actions.
Content Agent (.github/workflows/content-agent.yml):
- Manual: Workflow dispatch
- Scheduled: Weekly (Monday 10:00 UTC)
- External: Repository dispatch event
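The external trigger relies on GitHub's repository dispatch endpoint (`POST /repos/{owner}/{repo}/dispatches`). As a sketch, the request can be built with the standard library alone. The `build_dispatch_request` and `trigger` helpers are hypothetical names; the event type must match whatever the workflow's `repository_dispatch` trigger listens for, and the token needs write access to the repository.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str,
                           event_type: str = "scheduled-run") -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def trigger(owner: str, repo: str, token: str) -> int:
    """Send the dispatch and return the HTTP status (GitHub answers 204 on success)."""
    request = build_dispatch_request(owner, repo, token)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Separating request construction from sending keeps the first function free of network access, so it can be checked in unit tests.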
Deployment (.github/workflows/deploy.yml):
- Automatic: On push to main branch
- Manual: Workflow dispatch
GitHub disables scheduled workflows in public repositories after 60 days without repository activity. To keep the weekly run going, trigger it from an external cron service:
Using Vercel Cron (Free):
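// vercel.json
{
  "crons": [{
    "path": "/api/trigger-agent",
    "schedule": "0 10 * * 1"
  }]
}
// api/trigger-agent.js
export default async function handler(req, res) {
  // GITHUB_TOKEN here is a personal access token stored in the Vercel project,
  // not the token that GitHub Actions provides to workflow runs.
  const response = await fetch(
    'https://api.github.com/repos/your-org/knowledge-base/dispatches',
    {
      method: 'POST',
      headers: {
        'Accept': 'application/vnd.github+json',
        'Authorization': `Bearer ${process.env.GITHUB_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ event_type: 'scheduled-run' })
    }
  );
  // Report failures instead of always answering 200.
  if (!response.ok) {
    return res.status(502).json({ status: 'failed', github_status: response.status });
  }
  res.status(200).json({ status: 'triggered' });
}
- Create adapter class: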
# scripts/adapters/custom.py
from .base import BaseModelAdapter, GenerationRequest, GenerationResponse


class CustomAdapter(BaseModelAdapter):
    def __init__(self, api_key: str, model: str = "custom-model"):
        self.api_key = api_key
        self.model = model

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        # Implement your API call here and return a GenerationResponse
        raise NotImplementedError

    def get_name(self) -> str:
        return f"custom:{self.model}"
- Register with factory:
from scripts.adapters import ModelFactory
from scripts.adapters.custom import CustomAdapter

ModelFactory.register("custom", CustomAdapter)
- Update config:
ai:
  provider: "custom"
  model: "custom-model"
  api_key_env: "CUSTOM_API_KEY"
- Create discoverer class:
# scripts/discovery/custom.py
from .base import BaseDiscoverer, DiscoveredItem


class CustomDiscoverer(BaseDiscoverer):
    def discover(self, last_processed_id=None) -> list[DiscoveredItem]:
        # Implement your discovery logic
        items = []
        # ... fetch from your source ...
        return items

    def get_source_type(self) -> str:
        return "custom"
- Update agent to use it:
# scripts/agent.py
from scripts.discovery.custom import CustomDiscoverer

# In _discover_content method:
elif source_type == "custom":
    discoverer = CustomDiscoverer(source_id, source_config)
- Add to config:
sources:
  - id: "custom_source"
    type: "custom"
    enabled: true
    query:
      # Your custom configuration
    domain: "ai"
    level: "beginner"
Generated content follows this frontmatter schema:
---
# Required fields
title: "Article Title"
domain: "ai" # ai, blockchain, protocol
level: "beginner" # beginner, intermediate, master
category: "article" # article, tool, resource, video
tags: ["machine-learning", "python"]
# Metadata
created: "2026-02-28T10:00:00Z"
updated: "2026-02-28T10:00:00Z"
sources:
  - url: "https://example.com"
    title: "Source Title"
    accessed_at: "2026-02-28T10:00:00Z"
ai_reviewed: true
human_reviewed: false
status: "pending-review"
# Optional
description: "Brief description"
author: "knowledge-base-agent"
credibility_score: 8
---
# Article content starts here
Ensure environment variables are set:
export ANTHROPIC_API_KEY="your-key"
# or OPENAI_API_KEY, DEEPSEEK_API_KEY
export GITHUB_TOKEN="your-github-token"
Adjust rate limits in config.yaml:
ai:
  rate_limit:
    requests_per_minute: 30 # Lower this
    tokens_per_minute: 100000 # Lower this
Check:
- Sources are enabled in config.yaml
- Discovery sources have new content since last run
- Check agent state: .agent-state/state.json
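These checks can be scripted. The sketch below assumes the configuration has already been parsed into a dictionary shaped like the `sources` configuration, and it treats the state file as opaque JSON because its schema is not documented here. `enabled_source_ids` and `load_state` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any


def enabled_source_ids(config: dict[str, Any]) -> list[str]:
    """Return the ids of sources whose `enabled` flag is true."""
    # Assumption: a source without an `enabled` key counts as disabled.
    return [
        source["id"]
        for source in config.get("sources", [])
        if source.get("enabled", False)
    ]


def load_state(path: str = ".agent-state/state.json") -> Any:
    """Return the parsed agent state, or None if the file does not exist yet."""
    state_file = Path(path)
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text(encoding="utf-8"))
```

An empty list from `enabled_source_ids` points at the configuration, while a state file that already records the newest items suggests there was simply nothing new to process.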
- Enable GitHub Pages in repository settings
- Set source to "GitHub Actions"
- Check workflow permissions (Settings → Actions → Workflow permissions)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a Pull Request
MIT License - see LICENSE file for details
- GitHub Issues: Report bugs or request features
- Documentation: Full technical specification
Built with: