Auto-indexes every PR, commit, and issue in your repo. Answers "why is the code like this?" with local semantic search. No cloud, no Confluence, no forms.
You open a file. The code is doing something strange. You have no idea why.
You check git blame — cryptic commit message. You search Slack — nothing. You ask a colleague — they don't remember either.
arkive fixes this. It indexes your GitHub activity and makes it searchable in natural language — locally, instantly, with no external services.
GitHub API → Ingest → Local embeddings (MiniLM) → Orama vector store
↓
CLI / REST API / MCP server
- Ingest: pulls PRs, commits, and issues from GitHub
- Embed: generates semantic embeddings locally with `Xenova/all-MiniLM-L6-v2`
- Index: stores them in a local Orama hybrid-search database (BM25 + vector)
- Search: answers natural language queries in milliseconds
Everything runs on your machine. No cloud, no telemetry, no API keys required beyond GitHub.
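To make the Embed and Search steps more concrete, here is a minimal sketch of how semantic ranking works in principle: documents and the query are turned into vectors, and results are ordered by cosine similarity. This is an illustration, not arkive's actual code. The 3-dimensional vectors are made-up stand-ins for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, docs):
    """Return (title, score) pairs sorted by descending similarity to the query."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up toy embeddings; a real model would produce these from the text.
docs = [
    ("feat: migrate from MySQL to PostgreSQL", [0.9, 0.1, 0.0]),
    ("fix: typo in README", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # stand-in for the embedded query text

for title, score in rank(query, docs):
    print(f"{score:.3f}  {title}")
```

The document whose vector points in nearly the same direction as the query ranks first, even when the two share no keywords.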
# 1. Install globally
npm install -g arkive
# 2. Index your repo
arkive ingest your-org/your-repo
# 3. Ask questions
arkive search "why did we switch to postgres"

That's it. Results in seconds.
npm install -g arkive

Requirements: Node.js ≥ 18
For private repos, set your GitHub token:
export GITHUB_TOKEN=ghp_your_token_here

Fetch and index GitHub activity for a repository.
arkive ingest facebook/react
arkive ingest your-org/backend --kinds pr,issue
arkive ingest your-org/backend --since 2024-01-01 --limit 500
arkive ingest your-org/backend --token ghp_xxx

| Option | Description | Default |
|---|---|---|
| `-t, --token <token>` | GitHub personal access token | `$GITHUB_TOKEN` |
| `-s, --since <date>` | Only fetch activity after this date (ISO 8601) | all time |
| `-l, --limit <n>` | Max documents per kind to fetch | 200 |
| `-k, --kinds <list>` | Comma-separated: `pr,commit,issue` | all |
| `-d, --data-dir <path>` | Custom data directory | `~/.arkive` |
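If you index several repositories with the same options, a small wrapper script can help. The sketch below only uses the flags listed in the table above; the helper names (`build_ingest_command`, `ingest_all`) and the repository names are placeholders, not part of arkive.

```python
import shlex
import subprocess


def build_ingest_command(repo, kinds=None, since=None, limit=None):
    """Build the argument list for one `arkive ingest` call."""
    cmd = ["arkive", "ingest", repo]
    if kinds:
        cmd += ["--kinds", ",".join(kinds)]
    if since:
        cmd += ["--since", since]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


def ingest_all(repos, **options):
    """Run `arkive ingest` for each repo, raising on the first failure."""
    for repo in repos:
        subprocess.run(build_ingest_command(repo, **options), check=True)


# Dry run: print the commands instead of executing them.
for repo in ["your-org/backend", "your-org/frontend"]:
    command = build_ingest_command(repo, kinds=["pr", "issue"], since="2024-01-01", limit=500)
    print(shlex.join(command))
```

Calling `ingest_all([...], kinds=["pr", "issue"])` would run the same commands for real, assuming `arkive` is on your `PATH` and `GITHUB_TOKEN` is set for private repos.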
Search the knowledge base with natural language.
arkive search "why did we switch to postgres"
arkive search "auth middleware refactor"
arkive search "performance improvements q4" --mode hybrid
arkive search "database migration" --kind pr --repo your-org/backend
arkive search "redis caching" --json

| Option | Description | Default |
|---|---|---|
| `-m, --mode <mode>` | `hybrid`, `semantic`, `fulltext` | `hybrid` |
| `-k, --kind <kind>` | Filter: `pr`, `commit`, `issue` | all |
| `-r, --repo <owner/repo>` | Filter by repository | all |
| `-l, --limit <n>` | Max results | 10 |
| `-d, --data-dir <path>` | Custom data directory | `~/.arkive` |
| `--json` | Output raw JSON | false |
Search modes:
- `hybrid`: BM25 keyword matching + semantic vector similarity (best results)
- `semantic`: pure embedding similarity (great for conceptual questions)
- `fulltext`: BM25 only (fast, no GPU/model needed)
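For intuition about what "hybrid" means, the sketch below blends a keyword score and a vector score per document after scaling each to the range 0 to 1. It is a simplified illustration, not Orama's or arkive's actual scoring formula: the min-max normalization, the `text_weight` parameter, and the sample scores are all assumptions.

```python
def normalize(scores):
    """Min-max scale a dict of scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25, vector, text_weight=0.5):
    """Blend normalized keyword and vector scores; a missing score counts as 0."""
    bm25_n, vector_n = normalize(bm25), normalize(vector)
    blended = {
        doc: text_weight * bm25_n.get(doc, 0.0) + (1 - text_weight) * vector_n.get(doc, 0.0)
        for doc in set(bm25_n) | set(vector_n)
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Made-up raw scores: BM25 is unbounded, cosine similarity is at most 1.
bm25 = {"pr-142": 7.2, "issue-98": 3.1, "commit-ab12": 0.4}
vector = {"pr-142": 0.83, "issue-98": 0.12, "pr-77": 0.78}

for doc, score in hybrid_rank(bm25, vector):
    print(f"{score:.3f}  {doc}")
```

In this toy model, `text_weight=1.0` behaves like `fulltext` and `text_weight=0.0` like `semantic`. A document that matches well on only one signal (such as `pr-77` here) can still surface in the blended ranking.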
Start the REST API and MCP server for integration with coding agents.
arkive serve
arkive serve --port 3737

| Option | Description | Default |
|---|---|---|
| `-p, --port <number>` | Port to listen on | 3737 |
| `-d, --data-dir <path>` | Custom data directory | `~/.arkive` |
Starts:
- REST API at http://localhost:3737/api
- MCP server at http://localhost:3737/mcp
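With the server running, the REST API can be called from any HTTP client. Here is a minimal Python sketch using only the standard library. It targets the `GET /api/search` endpoint and reads the `results`, `score`, `kind`, `title`, and `url` fields from this README's example response. The function names are illustrative, and error handling is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:3737/api"  # default address used by `arkive serve`


def search_url(query, mode="hybrid", limit=5, base_url=BASE_URL):
    """Build the GET /api/search URL with URL-encoded parameters."""
    params = urlencode({"q": query, "mode": mode, "limit": limit})
    return f"{base_url}/search?{params}"


def search(query, **params):
    """Send the request and decode the JSON body (requires a running server)."""
    with urlopen(search_url(query, **params), timeout=10) as response:
        return json.load(response)


def summarize(payload):
    """Turn a search response into one printable line per result."""
    return [
        f"{r['score']:.3f} [{r['kind']}] {r['title']} ({r['url']})"
        for r in payload["results"]
    ]


# Building the URL needs no server; call search(...) once `arkive serve` is up.
print(search_url("why did we switch to postgres"))
```

Once the server is up, `print("\n".join(summarize(search("why did we switch to postgres"))))` would list the top matches.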
# Search
GET /api/search?q=why+did+we+switch+to+postgres&mode=hybrid&limit=5
POST /api/search { "query": "...", "mode": "hybrid", "kind": "pr", "limit": 5 }
# Stats
GET /api/stats
# Health
GET /api/health

Example response:
{
"query": "why did we switch to postgres",
"mode": "hybrid",
"count": 3,
"results": [
{
"score": 0.921,
"kind": "pr",
"repo": "your-org/backend",
"title": "feat: migrate from MySQL to PostgreSQL",
"url": "https://github.com/your-org/backend/pull/142",
"author": "alice",
"createdAt": "2023-11-15T10:30:00Z",
"decisions": ["Switching to Postgres for better JSON support and full-text search..."],
"body": "## Why\nMySQL's JSON handling was limiting our query complexity..."
}
]
}

arkive exposes a Model Context Protocol server so coding agents can search your knowledge base directly from the IDE.
{
"mcpServers": {
"arkive": {
"url": "http://localhost:3737/mcp"
}
}
}

Once configured, ask your agent:
- "Why did we switch to PostgreSQL?"
- "What issues led to the auth refactor?"
- "Find PRs about rate limiting"
The agent calls arkive_search automatically.
- All data is stored locally in `~/.arkive/` (SQLite-like JSON snapshot)
- The embedding model (`all-MiniLM-L6-v2`) runs on-device, with no API calls
- GitHub data is fetched via the GitHub API and stored only on your machine
- Zero telemetry, zero cloud
| Environment variable | Description |
|---|---|
| `GITHUB_TOKEN` | GitHub personal access token (required for private repos) |
| `ARKIVE_DATA_DIR` | Override default data directory (`~/.arkive`) |
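If you script around arkive, you may want to know which data directory a command will use. The sketch below shows one plausible resolution order: explicit `--data-dir` value first, then `ARKIVE_DATA_DIR`, then `~/.arkive`. That precedence is an assumption based on common CLI conventions, not something confirmed here, and `resolve_data_dir` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_data_dir(cli_value=None, env=None):
    """Pick the data directory: CLI value, then ARKIVE_DATA_DIR, then ~/.arkive.

    The precedence is assumed, not taken from arkive's source.
    """
    env = os.environ if env is None else env
    chosen = cli_value or env.get("ARKIVE_DATA_DIR") or "~/.arkive"
    return Path(chosen).expanduser()


print(resolve_data_dir())
```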
git clone https://github.com/rankgnar/arkive
cd arkive
npm install
npm run build
# Run locally
node dist/cli/index.js ingest your-org/repo
node dist/cli/index.js search "your query"

MIT © Raul Rosello
