rankgnar/arkive

arkive

Auto-indexes every PR, commit, and issue in your repo. Answers "why is the code like this?" with local semantic search. No cloud, no Confluence, no forms.


The problem

You open a file. The code is doing something strange. You have no idea why.

You check git blame — cryptic commit message. You search Slack — nothing. You ask a colleague — they don't remember either.

arkive fixes this. It indexes your GitHub activity and makes it searchable in natural language — locally, instantly, with no external services.

How it works

GitHub API  →  Ingest  →  Local embeddings (MiniLM)  →  Orama vector store
                                                               ↓
                                          CLI / REST API / MCP server
  1. Ingest — pulls PRs, commits, and issues from GitHub
  2. Embed — generates semantic embeddings locally with Xenova/all-MiniLM-L6-v2
  3. Index — stores in a local Orama hybrid-search database (BM25 + vector)
  4. Search — answers natural language queries in milliseconds

Everything after the GitHub fetch runs on your machine. No cloud, no telemetry, and no API keys beyond a GitHub token, which is only required for private repos.
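
To make the Search step more concrete, here is a minimal sketch of ranking documents by cosine similarity between a query embedding and stored embeddings. It illustrates the general technique only and is not taken from arkive's source: `IndexedDoc`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical names, and the three-dimensional vectors are toy values (all-MiniLM-L6-v2 produces 384-dimensional vectors).

```typescript
// Hypothetical shape of an indexed document; arkive's real schema may differ.
interface IndexedDoc {
  title: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and return the top `limit` matches.
function rankBySimilarity(
  query: number[],
  docs: IndexedDoc[],
  limit = 10,
): { title: string; score: number }[] {
  return docs
    .map((doc) => ({ title: doc.title, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy data for illustration only.
const toyDocs: IndexedDoc[] = [
  { title: "feat: migrate from MySQL to PostgreSQL", embedding: [0.9, 0.1, 0.0] },
  { title: "fix: typo in README", embedding: [0.0, 0.2, 0.9] },
];
const ranked = rankBySimilarity([1, 0, 0], toyDocs, 1);
console.log(ranked[0].title);
```

In a real index the query vector would come from the same embedding model as the documents, so the scores are comparable.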


Quick start

# 1. Install globally
npm install -g arkive

# 2. Index your repo
arkive ingest your-org/your-repo

# 3. Ask questions
arkive search "why did we switch to postgres"

That's it. Results in seconds.


Installation

npm install -g arkive

Requirements: Node.js ≥ 18

For private repos, set your GitHub token:

export GITHUB_TOKEN=ghp_your_token_here

CLI Reference

arkive ingest <owner/repo>

Fetch and index GitHub activity for a repository.

arkive ingest facebook/react
arkive ingest your-org/backend --kinds pr,issue
arkive ingest your-org/backend --since 2024-01-01 --limit 500
arkive ingest your-org/backend --token ghp_xxx

Option                  Description                                      Default
-t, --token <token>     GitHub personal access token                     $GITHUB_TOKEN
-s, --since <date>      Only fetch activity after this date (ISO 8601)   all time
-l, --limit <n>         Max documents per kind to fetch                  200
-k, --kinds <list>      Comma-separated: pr,commit,issue                 all
-d, --data-dir <path>   Custom data directory                            ~/.arkive
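
As a rough illustration of how the `--kinds` and `--since` options could map onto the GitHub REST API, the sketch below builds the first-page list URL for each kind. This is an assumption about one possible approach, not arkive's actual ingest code (which might use pagination helpers or GraphQL instead); `IngestOptions` and `buildIngestUrls` are hypothetical names. The endpoints and the `per_page`, `state`, and `since` parameters are standard GitHub REST API parameters.

```typescript
type Kind = "pr" | "commit" | "issue";

// Hypothetical options object mirroring the CLI flags.
interface IngestOptions {
  since?: string; // ISO 8601 timestamp, as passed to --since
  kinds?: Kind[]; // defaults to all three kinds
}

const GITHUB_API = "https://api.github.com";

// Build one list-endpoint URL per requested kind (first page only).
function buildIngestUrls(repo: string, opts: IngestOptions = {}): Record<string, string> {
  const kinds: Kind[] = opts.kinds ?? ["pr", "commit", "issue"];
  const urls: Record<string, string> = {};
  for (const kind of kinds) {
    const params = new URLSearchParams({ per_page: "100" });
    let path: string;
    if (kind === "pr") {
      // The pulls endpoint has no `since` parameter, so date filtering
      // would have to happen client-side.
      path = "pulls";
      params.set("state", "all");
    } else if (kind === "commit") {
      path = "commits";
      if (opts.since) params.set("since", opts.since);
    } else {
      // Note: the issues endpoint also returns pull requests.
      path = "issues";
      params.set("state", "all");
      if (opts.since) params.set("since", opts.since);
    }
    urls[kind] = `${GITHUB_API}/repos/${repo}/${path}?${params.toString()}`;
  }
  return urls;
}

const ingestUrls = buildIngestUrls("your-org/backend", {
  since: "2024-01-01T00:00:00Z",
  kinds: ["commit"],
});
console.log(ingestUrls.commit);
```

For private repos, each request would also carry an `Authorization: Bearer <token>` header built from the token option.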

arkive search "<query>"

Search the knowledge base with natural language.

arkive search "why did we switch to postgres"
arkive search "auth middleware refactor"
arkive search "performance improvements q4" --mode hybrid
arkive search "database migration" --kind pr --repo your-org/backend
arkive search "redis caching" --json

Option                    Description                  Default
-m, --mode <mode>         hybrid, semantic, fulltext   hybrid
-k, --kind <kind>         Filter: pr, commit, issue    all
-r, --repo <owner/repo>   Filter by repository         all
-l, --limit <n>           Max results                  10
-d, --data-dir <path>     Custom data directory        ~/.arkive
--json                    Output raw JSON              false

Search modes:

  • hybrid — BM25 keyword matching + semantic vector similarity (best results)
  • semantic — Pure embedding similarity (great for conceptual questions)
  • fulltext — BM25 only (fast, no GPU/model needed)
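
One common way to combine keyword and vector scores is to normalize each to the same range and take a weighted sum. The sketch below shows that idea with toy numbers. It is not Orama's or arkive's actual scoring code; `ScoredDoc`, `minMaxNormalize`, `hybridRank`, the `textWeight` parameter, and the document ids are all hypothetical.

```typescript
// Hypothetical per-document scores from the two retrievers.
interface ScoredDoc {
  id: string;
  bm25: number; // unbounded keyword score
  vector: number; // cosine similarity
}

// Min-max normalize to [0, 1]; if all values are equal, map everything to 0.
function minMaxNormalize(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0);
  return values.map((v) => (v - min) / (max - min));
}

// Weighted sum of the normalized scores, sorted best first.
// `textWeight` is an assumed tuning knob: 1 = keyword only, 0 = vector only.
function hybridRank(docs: ScoredDoc[], textWeight = 0.5): { id: string; score: number }[] {
  const bm25 = minMaxNormalize(docs.map((d) => d.bm25));
  const vec = minMaxNormalize(docs.map((d) => d.vector));
  return docs
    .map((d, i) => ({ id: d.id, score: textWeight * bm25[i] + (1 - textWeight) * vec[i] }))
    .sort((a, b) => b.score - a.score);
}

// Toy scores for illustration only.
const hybridResults = hybridRank([
  { id: "pr-142", bm25: 7.2, vector: 0.81 },
  { id: "issue-98", bm25: 1.1, vector: 0.64 },
  { id: "commit-abc", bm25: 3.0, vector: 0.35 },
]);
console.log(hybridResults.map((r) => r.id));
```

Setting `textWeight` to 1 or 0 in this sketch reduces it to the fulltext-only or semantic-only ordering, which is a convenient way to compare the three modes on the same data.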

arkive serve

Start the REST API and MCP server for integration with coding agents.

arkive serve
arkive serve --port 3737

Option                  Description             Default
-p, --port <number>     Port to listen on       3737
-d, --data-dir <path>   Custom data directory   ~/.arkive

Starts:

  • REST API at http://localhost:3737/api
  • MCP server at http://localhost:3737/mcp

REST API

# Search
GET  /api/search?q=why+did+we+switch+to+postgres&mode=hybrid&limit=5
POST /api/search  { "query": "...", "mode": "hybrid", "kind": "pr", "limit": 5 }

# Stats
GET  /api/stats

# Health
GET  /api/health

Example response:

{
  "query": "why did we switch to postgres",
  "mode": "hybrid",
  "count": 3,
  "results": [
    {
      "score": 0.921,
      "kind": "pr",
      "repo": "your-org/backend",
      "title": "feat: migrate from MySQL to PostgreSQL",
      "url": "https://github.com/your-org/backend/pull/142",
      "author": "alice",
      "createdAt": "2023-11-15T10:30:00Z",
      "decisions": ["Switching to Postgres for better JSON support and full-text search..."],
      "body": "## Why\nMySQL's JSON handling was limiting our query complexity..."
    }
  ]
}
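
A small typed client for the GET endpoint might look like the sketch below. The `SearchResult` and `SearchResponse` fields mirror the example response above; whether every field is always present is an assumption, and `buildSearchUrl` and `searchArkive` are hypothetical helper names, not part of arkive.

```typescript
// Field names mirror the example response; which fields are always
// present is an assumption.
interface SearchResult {
  score: number;
  kind: "pr" | "commit" | "issue";
  repo: string;
  title: string;
  url: string;
  author: string;
  createdAt: string;
  decisions: string[];
  body: string;
}

interface SearchResponse {
  query: string;
  mode: string;
  count: number;
  results: SearchResult[];
}

// Build the GET /api/search URL; baseUrl assumes the default port 3737.
function buildSearchUrl(
  q: string,
  opts: { mode?: string; limit?: number } = {},
  baseUrl = "http://localhost:3737",
): string {
  const url = new URL("/api/search", baseUrl);
  url.searchParams.set("q", q);
  if (opts.mode) url.searchParams.set("mode", opts.mode);
  if (opts.limit !== undefined) url.searchParams.set("limit", String(opts.limit));
  return url.toString();
}

// Requires `arkive serve` to be running; uses the global fetch in Node.js 18+.
async function searchArkive(
  q: string,
  opts: { mode?: string; limit?: number } = {},
): Promise<SearchResponse> {
  const res = await fetch(buildSearchUrl(q, opts));
  if (!res.ok) throw new Error(`arkive search failed: HTTP ${res.status}`);
  return (await res.json()) as SearchResponse;
}

console.log(buildSearchUrl("why did we switch to postgres", { mode: "hybrid", limit: 5 }));
```

With the server running, `const { results } = await searchArkive("redis caching", { limit: 3 })` would return the parsed results array.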

MCP Server (Claude Code & Cursor)

arkive exposes a Model Context Protocol server so coding agents can search your knowledge base directly from the IDE.

Claude Code (~/.claude.json)

{
  "mcpServers": {
    "arkive": {
      "url": "http://localhost:3737/mcp"
    }
  }
}

Cursor (.cursor/mcp.json)

{
  "mcpServers": {
    "arkive": {
      "url": "http://localhost:3737/mcp"
    }
  }
}

Once configured, ask your agent:

  • "Why did we switch to PostgreSQL?"
  • "What issues led to the auth refactor?"
  • "Find PRs about rate limiting"

The agent calls arkive_search automatically.
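
Under the hood, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request for `arkive_search`. The envelope follows the MCP specification, but the argument name `query` is an assumption (the tool's real input schema is not documented here), and `buildArkiveSearchCall` is a hypothetical helper. A real client would also perform the MCP initialization handshake first, which agents such as Claude Code and Cursor handle for you.

```typescript
// JSON-RPC 2.0 envelope for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for arkive_search.
// The `query` argument name is assumed, not confirmed by arkive's docs.
function buildArkiveSearchCall(id: number, query: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "arkive_search", arguments: { query } },
  };
}

const toolCall = buildArkiveSearchCall(1, "Why did we switch to PostgreSQL?");
console.log(JSON.stringify(toolCall, null, 2));
```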


Data & Privacy

  • All data is stored locally in ~/.arkive/ (SQLite-like JSON snapshot)
  • The embedding model (all-MiniLM-L6-v2) runs on-device, no API calls
  • GitHub data is fetched via the GitHub API and stored only on your machine
  • Zero telemetry, zero cloud

Configuration

Environment variable   Description
GITHUB_TOKEN           GitHub personal access token (required for private repos)
ARKIVE_DATA_DIR        Override default data directory (~/.arkive)
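
The data directory can be set by the `--data-dir` flag, by `ARKIVE_DATA_DIR`, or left at the default. A plausible resolution order is sketched below. The precedence shown (flag, then environment variable, then `~/.arkive`) is an assumption rather than documented behavior, and `resolveDataDir` is a hypothetical helper.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Assumed precedence: explicit flag, then ARKIVE_DATA_DIR, then ~/.arkive.
// `env` is injectable so the function can be exercised without touching process.env.
function resolveDataDir(
  flagValue?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const chosen = flagValue ?? env.ARKIVE_DATA_DIR ?? path.join(os.homedir(), ".arkive");
  return path.resolve(chosen);
}

console.log(resolveDataDir(undefined, {}));
```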

Development

git clone https://github.com/rankgnar/arkive
cd arkive
npm install
npm run build

# Run locally
node dist/cli/index.js ingest your-org/repo
node dist/cli/index.js search "your query"

License

MIT © Raul Rosello

About

Self-hosted knowledge base for developer teams. Automatically captures decisions, incidents, and architecture insights from GitHub activity.
