arkive

Auto-indexes every PR, commit, and issue in your repo. Answers "why is the code like this?" with local semantic search. No cloud, no Confluence, no forms.

The problem

You open a file. The code is doing something strange. You have no idea why.

You check git blame — cryptic commit message. You search Slack — nothing. You ask a colleague — they don't remember either.

arkive fixes this. It indexes your GitHub activity and makes it searchable in natural language — locally, instantly, with no external services.

How it works

GitHub API  →  Ingest  →  Local embeddings (MiniLM)  →  Orama vector store
                                                               ↓
                                          CLI / REST API / MCP server

Ingest — pulls PRs, commits, and issues from GitHub
Embed — generates semantic embeddings locally with Xenova/all-MiniLM-L6-v2
Index — stores in a local Orama hybrid-search database (BM25 + vector)
Search — answers natural language queries in milliseconds

Everything runs on your machine. No cloud, no telemetry, no API keys required beyond GitHub.
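
To make the Embed and Search steps more concrete, here is a minimal sketch of cosine similarity, the measure commonly used to compare embedding vectors. The three-dimensional vectors and the `cosineSimilarity` helper are illustrative assumptions (all-MiniLM-L6-v2 produces 384-dimensional vectors), and this is not arkive's or Orama's actual code.

```typescript
// Illustrative only: toy vectors stand in for real 384-dimensional MiniLM embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical embeddings for a query and two indexed documents.
const queryVec = [1, 0, 1];
const postgresPr = [2, 0, 2]; // points in the same direction as the query
const unrelatedIssue = [0, 1, 0]; // orthogonal to the query

// Rank documents by similarity to the query, highest first.
const ranked = [
  { title: "unrelated issue", score: cosineSimilarity(queryVec, unrelatedIssue) },
  { title: "postgres PR", score: cosineSimilarity(queryVec, postgresPr) },
].sort((x, y) => y.score - x.score);
```

Documents whose vectors point in nearly the same direction as the query score close to 1 and rank first, regardless of vector length.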

Quick start

# 1. Install globally
npm install -g arkive

# 2. Index your repo
arkive ingest your-org/your-repo

# 3. Ask questions
arkive search "why did we switch to postgres"

That's it. Results in seconds.

Installation

npm install -g arkive

Requirements: Node.js ≥ 18

For private repos, set your GitHub token:

export GITHUB_TOKEN=ghp_your_token_here

CLI Reference

`arkive ingest <owner/repo>`

Fetch and index GitHub activity for a repository.

arkive ingest facebook/react
arkive ingest your-org/backend --kinds pr,issue
arkive ingest your-org/backend --since 2024-01-01 --limit 500
arkive ingest your-org/backend --token ghp_xxx

| Option | Description | Default |
| --- | --- | --- |
| `-t, --token <token>` | GitHub personal access token | `$GITHUB_TOKEN` |
| `-s, --since <date>` | Only fetch activity after this date (ISO 8601) | all time |
| `-l, --limit <n>` | Max documents per kind to fetch | `200` |
| `-k, --kinds <list>` | Comma-separated: `pr,commit,issue` | all |
| `-d, --data-dir <path>` | Custom data directory | `~/.arkive` |
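
The `--kinds` and `--since` values arrive as plain strings, so they need validating before use. The sketch below shows one way that could look; `parseKinds` and `parseSince` are hypothetical helpers written for illustration, not arkive's actual implementation.

```typescript
type Kind = "pr" | "commit" | "issue";
const ALL_KINDS: Kind[] = ["pr", "commit", "issue"];

// Hypothetical helper: turn "pr,issue" into a validated list; no value means all kinds.
function parseKinds(list?: string): Kind[] {
  if (list === undefined || list.trim() === "") return ALL_KINDS.slice();
  const kinds: Kind[] = [];
  for (const raw of list.split(",")) {
    const kind = raw.trim().toLowerCase() as Kind;
    if (ALL_KINDS.indexOf(kind) === -1) {
      throw new Error(`unknown kind "${kind}", expected one of ${ALL_KINDS.join(", ")}`);
    }
    if (kinds.indexOf(kind) === -1) kinds.push(kind); // drop duplicates
  }
  return kinds;
}

// Hypothetical helper: parse an ISO 8601 date such as 2024-01-01; no value means all time.
function parseSince(value?: string): Date | undefined {
  if (value === undefined) return undefined;
  const date = new Date(value);
  if (isNaN(date.getTime())) {
    throw new Error(`invalid --since date "${value}"`);
  }
  return date;
}

const kinds = parseKinds("pr,issue");
const since = parseSince("2024-01-01");
```

Rejecting unknown kinds and unparseable dates up front avoids spending GitHub API calls on a request that was mistyped.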

`arkive search "<query>"`

Search the knowledge base with natural language.

arkive search "why did we switch to postgres"
arkive search "auth middleware refactor"
arkive search "performance improvements q4" --mode hybrid
arkive search "database migration" --kind pr --repo your-org/backend
arkive search "redis caching" --json

| Option | Description | Default |
| --- | --- | --- |
| `-m, --mode <mode>` | `hybrid`, `semantic`, `fulltext` | `hybrid` |
| `-k, --kind <kind>` | Filter: `pr`, `commit`, `issue` | all |
| `-r, --repo <owner/repo>` | Filter by repository | all |
| `-l, --limit <n>` | Max results | `10` |
| `-d, --data-dir <path>` | Custom data directory | `~/.arkive` |
| `--json` | Output raw JSON | `false` |

Search modes:

hybrid: BM25 keyword matching combined with semantic vector similarity (usually the best results)
semantic: pure embedding similarity (well suited to conceptual questions)
fulltext: BM25 only (fast, no embedding model needed)
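
As a rough illustration of what "hybrid" means, the sketch below blends a keyword score with a vector-similarity score using a weighted sum after min-max normalization. The weights, the `Scored` shape, the document ids, and the `hybridRank` function are all assumptions made for illustration; Orama's real scoring may differ.

```typescript
interface Scored {
  id: string;
  keyword: number; // e.g. a raw BM25 score (unbounded)
  semantic: number; // e.g. a cosine similarity
}

// Scale values into [0, 1] so the two signals are comparable.
function minMax(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0);
  return values.map((v) => (v - min) / (max - min));
}

// Hypothetical blend: weighted sum of normalized keyword and semantic scores.
function hybridRank(docs: Scored[], keywordWeight = 0.5): { id: string; score: number }[] {
  const kw = minMax(docs.map((d) => d.keyword));
  const sem = minMax(docs.map((d) => d.semantic));
  return docs
    .map((d, i) => ({ id: d.id, score: keywordWeight * kw[i] + (1 - keywordWeight) * sem[i] }))
    .sort((a, b) => b.score - a.score);
}

// Made-up scores for three documents.
const results = hybridRank([
  { id: "issue-7", keyword: 10, semantic: 0.2 },
  { id: "pr-142", keyword: 8, semantic: 0.9 },
  { id: "commit-abc", keyword: 0, semantic: 0.5 },
]);
```

In this sketch, a `keywordWeight` of 1 behaves like a keyword-only ranking and 0 like a purely semantic one; the document that does well on both signals (`pr-142`) comes out on top in the blended case.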

`arkive serve`

Start the REST API and MCP server for integration with coding agents.

arkive serve
arkive serve --port 3737

| Option | Description | Default |
| --- | --- | --- |
| `-p, --port <number>` | Port to listen on | `3737` |
| `-d, --data-dir <path>` | Custom data directory | `~/.arkive` |

Starts:

REST API at http://localhost:3737/api
MCP server at http://localhost:3737/mcp

REST API

# Search
GET  /api/search?q=why+did+we+switch+to+postgres&mode=hybrid&limit=5
POST /api/search  { "query": "...", "mode": "hybrid", "kind": "pr", "limit": 5 }

# Stats
GET  /api/stats

# Health
GET  /api/health

Example response:

{
  "query": "why did we switch to postgres",
  "mode": "hybrid",
  "count": 3,
  "results": [
    {
      "score": 0.921,
      "kind": "pr",
      "repo": "your-org/backend",
      "title": "feat: migrate from MySQL to PostgreSQL",
      "url": "https://github.com/your-org/backend/pull/142",
      "author": "alice",
      "createdAt": "2023-11-15T10:30:00Z",
      "decisions": ["Switching to Postgres for better JSON support and full-text search..."],
      "body": "## Why\nMySQL's JSON handling was limiting our query complexity..."
    }
  ]
}
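
For callers that want types around this endpoint, the following sketch derives TypeScript interfaces from the example request and response above and builds the GET URL. The interfaces may be incomplete, `buildSearchUrl` and `summarize` are hypothetical helpers, and passing `kind` as a GET query parameter is an assumption (the examples above only show it in the POST body).

```typescript
type SearchMode = "hybrid" | "semantic" | "fulltext";

interface SearchParams {
  query: string;
  mode?: SearchMode;
  kind?: "pr" | "commit" | "issue"; // assumed to be accepted on GET as well
  limit?: number;
}

// Fields inferred from the example response; the real payload may carry more.
interface SearchResult {
  score: number;
  kind: string;
  repo: string;
  title: string;
  url: string;
  author: string;
  createdAt: string;
  decisions: string[];
  body: string;
}

interface SearchResponse {
  query: string;
  mode: SearchMode;
  count: number;
  results: SearchResult[];
}

// Build the GET /api/search URL; the default base matches `arkive serve`.
function buildSearchUrl(params: SearchParams, base = "http://localhost:3737"): string {
  const pairs: [string, string][] = [["q", params.query]];
  if (params.mode !== undefined) pairs.push(["mode", params.mode]);
  if (params.kind !== undefined) pairs.push(["kind", params.kind]);
  if (params.limit !== undefined) pairs.push(["limit", String(params.limit)]);
  const queryString = pairs
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `${base}/api/search?${queryString}`;
}

// Summarize a parsed response as "score title" lines.
function summarize(response: SearchResponse): string[] {
  return response.results.map((r) => `${r.score.toFixed(2)} ${r.title}`);
}

const searchUrl = buildSearchUrl({ query: "why did we switch to postgres", mode: "hybrid", limit: 5 });
```

On Node.js ≥ 18, `searchUrl` can be passed to the built-in `fetch`, and the parsed JSON body treated as a `SearchResponse`. Note that `encodeURIComponent` encodes spaces as `%20` rather than `+`; both forms are valid in a query string.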

MCP Server (Claude Code & Cursor)

arkive exposes a Model Context Protocol server so coding agents can search your knowledge base directly from the IDE.

Claude Code (`~/.claude.json`)

{
  "mcpServers": {
    "arkive": {
      "url": "http://localhost:3737/mcp"
    }
  }
}

Cursor (`.cursor/mcp.json`)

{
  "mcpServers": {
    "arkive": {
      "url": "http://localhost:3737/mcp"
    }
  }
}

Once configured, ask your agent:

"Why did we switch to PostgreSQL?"
"What issues led to the auth refactor?"
"Find PRs about rate limiting"

The agent calls the `arkive_search` tool automatically.
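
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what such a request for `arkive_search` could look like; the argument names (`query`, `limit`) are assumptions, since the tool's input schema is not documented here, and `buildSearchCall` is a hypothetical helper.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for the arkive_search tool.
// The argument names are assumed, not taken from arkive's schema.
function buildSearchCall(id: number, query: string, limit = 5): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "arkive_search", arguments: { query, limit } },
  };
}

const request = buildSearchCall(1, "Why did we switch to PostgreSQL?");
const payload = JSON.stringify(request);
```

Agents such as Claude Code and Cursor construct this request themselves, so the sketch is only useful for debugging or for writing a custom client.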

Data & Privacy

All data is stored locally in `~/.arkive/` as a file-based JSON snapshot (embedded like SQLite, with no database server)
The embedding model (all-MiniLM-L6-v2) runs on-device and makes no API calls
GitHub data is fetched via the GitHub API and stored only on your machine
Zero telemetry, zero cloud

Configuration

| Environment variable | Description |
| --- | --- |
| `GITHUB_TOKEN` | GitHub personal access token (required for private repos) |
| `ARKIVE_DATA_DIR` | Override default data directory (`~/.arkive`) |
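
Because the data directory can come from the `--data-dir` flag, from `ARKIVE_DATA_DIR`, or from the default, some precedence has to apply. The sketch below assumes the common convention of flag first, then environment variable, then default; that ordering and the `resolveDataDir` helper are assumptions, not confirmed arkive behavior.

```typescript
// Hypothetical precedence: --data-dir flag, then ARKIVE_DATA_DIR, then ~/.arkive.
function resolveDataDir(
  flag: string | undefined,
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  if (flag !== undefined && flag !== "") return flag;
  const fromEnv = env["ARKIVE_DATA_DIR"];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  return `${homeDir}/.arkive`;
}

// Example values; the paths are made up.
const fromFlag = resolveDataDir("/tmp/arkive", { ARKIVE_DATA_DIR: "/data/arkive" }, "/home/alice");
const fromEnvVar = resolveDataDir(undefined, { ARKIVE_DATA_DIR: "/data/arkive" }, "/home/alice");
const fallback = resolveDataDir(undefined, {}, "/home/alice");
```

In a Node.js program, the last two arguments would typically be `process.env` and `os.homedir()`; they are passed in here to keep the function easy to test.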

Development

git clone https://github.com/rankgnar/arkive
cd arkive
npm install
npm run build

# Run locally
node dist/cli/index.js ingest your-org/repo
node dist/cli/index.js search "your query"
