Merged
56 changes: 56 additions & 0 deletions docs/blog/building-brain.md
@@ -0,0 +1,56 @@
# Building Brain: A CLI for Team Knowledge Sharing

## The problem

I use AI agents for most of my development work. They produce a lot of markdown: guides, runbooks, patterns, context files. Over months, this accumulates into a personal knowledge base that's genuinely useful.

The problem is sharing it. I tried Obsidian, which works well for personal use but doesn't solve team knowledge sharing. There's no good way for a teammate's agent to access what my agent has already figured out. The pattern I kept seeing: I'd ask a teammate a question, they'd ask their agent, the agent would answer from scratch. That knowledge existed somewhere, but nobody could find it.

Wikis don't solve this either. They require manual curation, they rot without maintenance, and AI agents can't interact with them programmatically. I wanted something that fits how developers already work: command line, git, markdown.

## Architecture

Brain is a CLI tool that stores knowledge as markdown files in a git repository. Three design decisions define the architecture:

**Git as storage.** Entries are markdown files with YAML frontmatter, committed to a shared repo. No server to run, no database to manage, no accounts to create. Version history comes from git, and access control comes from wherever the repo is hosted. A team joins by cloning the repo.
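Here is a minimal sketch of serializing an entry, to make "markdown with YAML frontmatter" concrete. The `EntryDraft` fields and the `toMarkdownFile` helper are hypothetical and chosen for illustration, so Brain's real schema may differ. Only the `guides/` and `skills/` directory names come from Brain's repo layout.

```typescript
// Hypothetical entry shape for illustration; Brain's actual frontmatter fields may differ.
interface EntryDraft {
  type: 'guide' | 'skill';
  slug: string;
  title: string;
  author: string;
  tags: string[];
  created: string; // ISO 8601 date, e.g. '2024-05-01'
  body: string;
}

// JSON string literals are valid YAML 1.2 double-quoted scalars, so this safely quotes
// titles containing colons, quotes, or other YAML-significant characters.
const yamlString = (value: string): string => JSON.stringify(value);

// Hypothetical helper: render an entry as markdown with frontmatter and pick its repo path.
function toMarkdownFile(entry: EntryDraft): { path: string; contents: string } {
  const dir = entry.type === 'guide' ? 'guides' : 'skills';
  const frontmatter = [
    '---',
    `title: ${yamlString(entry.title)}`,
    `author: ${yamlString(entry.author)}`,
    `tags: [${entry.tags.map(yamlString).join(', ')}]`,
    `created: ${entry.created}`,
    '---',
  ].join('\n');
  return { path: `${dir}/${entry.slug}.md`, contents: `${frontmatter}\n\n${entry.body.trim()}\n` };
}
```

In this model, publishing amounts to writing that file into the repo and committing it. Teammates pick it up on their next sync.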

**SQLite FTS5 for search.** Each machine maintains a local search index using SQLite's FTS5 virtual table with BM25 ranking. The index is a disposable cache, rebuilt from git on every sync. This gives sub-millisecond full-text search with prefix matching and contextual snippets, without requiring any external service.
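FTS5 ranks matches internally. A toy version still helps show what BM25 rewards: query terms that are rare across the brain, terms repeated within an entry, and matches in shorter entries. The sketch below follows the shape of FTS5's built-in `bm25()` function. It is an illustration rather than Brain's code, and the `entries_fts` table name in the SQL comment is hypothetical.

```typescript
// Toy BM25 scorer shaped like SQLite FTS5's built-in bm25() ranking function
// (k1 = 1.2, b = 0.75, result negated so better matches are numerically lower).
// In SQLite the equivalent query would look roughly like (table name hypothetical):
//   SELECT id, snippet(entries_fts, -1, '[', ']', '...', 8) FROM entries_fts
//   WHERE entries_fts MATCH 'redis*' ORDER BY bm25(entries_fts);
const K1 = 1.2;
const B = 0.75;

const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function bm25Scores(docs: string[], query: string): number[] {
  const tokenized = docs.map(tokenize);
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, tokens) => sum + tokens.length, 0) / Math.max(n, 1);
  const terms = [...new Set(tokenize(query))];

  // IDF per query term; like FTS5, clamp to a tiny positive value for very common terms.
  const idf = new Map<string, number>();
  for (const term of terms) {
    const hits = tokenized.filter((tokens) => tokens.includes(term)).length;
    const raw = Math.log((n - hits + 0.5) / (hits + 0.5));
    idf.set(term, raw <= 0 ? 1e-6 : raw);
  }

  return tokenized.map((tokens) => {
    let score = 0;
    for (const term of terms) {
      const f = tokens.filter((t) => t === term).length;
      if (f === 0) continue;
      // Longer-than-average documents are penalized through the length normalization.
      const lengthNorm = 1 - B + (B * tokens.length) / avgLen;
      score += (idf.get(term) ?? 0) * ((f * (K1 + 1)) / (f + K1 * lengthNorm));
    }
    return -score; // lower is better, matching FTS5's convention
  });
}
```

FTS5 returns negated scores, so a plain ascending `ORDER BY bm25(...)` puts the best match first.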

**MCP as the agent interface.** Brain exposes 10 tools and 2 resources via the Model Context Protocol over stdio. An AI agent connected to Brain can search team knowledge, read entries, publish findings, and check what's new. The agent doesn't need the CLI; it talks MCP directly. This is the key differentiator: the agent is a first-class user, not an afterthought.
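On the stdio transport, MCP messages are JSON-RPC 2.0 objects written one per line. Below is a minimal sketch of the `tools/call` request an agent's MCP client sends. The tool name `search` and its `query` argument are hypothetical, because this post doesn't list Brain's tool names.

```typescript
// JSON-RPC 2.0 request shape used by MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// MCP's stdio transport delimits messages with '\n' and forbids embedded newlines.
// JSON.stringify without an indent argument never emits a raw newline: newlines inside
// string values are escaped.
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + '\n';
}

// 'search' is a hypothetical tool name used only for illustration.
const line = frame(toolCall(1, 'search', { query: 'redis connection timeout' }));
```

The server answers on its stdout with a JSON-RPC response that carries the same `id` and a `result` holding the tool's output.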

The rest follows from these three decisions. Read receipts are JSON files in the repo (so they sync with git). Freshness scoring multiplies a recency factor by a read-frequency factor. Recency decays exponentially with a 60-day half-life, faster for volatile tags like `docker` and slower for stable ones like `architecture`. Pruning moves stale entries to `_archive/` (reversible). Everything runs locally, everything syncs through git.
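Here is a rough sketch of a multiplicative freshness score. The 60-day half-life and the volatile and stable tag sets come from this PR's `freshness.ts` and `constants.ts`. Everything else is an assumption for illustration, not Brain's actual formula: how tags adjust the half-life (0.5x and 2x here), the usage boost, and the function names.

```typescript
// Base half-life from src/core/freshness.ts.
const HALF_LIFE_DAYS = 60;

// Copied from src/utils/constants.ts.
const VOLATILE_TAGS = new Set(['api', 'docker', 'kubernetes', 'cicd', 'deployment', 'config']);
const STABLE_TAGS = new Set(['architecture', 'design', 'principles', 'patterns', 'conventions']);

// Assumption: volatile tags halve the half-life, stable tags double it.
function halfLifeDays(tags: string[]): number {
  if (tags.some((t) => VOLATILE_TAGS.has(t))) return HALF_LIFE_DAYS / 2;
  if (tags.some((t) => STABLE_TAGS.has(t))) return HALF_LIFE_DAYS * 2;
  return HALF_LIFE_DAYS;
}

// Recency factor: exponential decay that halves once per half-life.
function recency(daysSinceUpdate: number, tags: string[]): number {
  return Math.exp((-Math.LN2 * daysSinceUpdate) / halfLifeDays(tags));
}

// Usage factor (hypothetical): grows with the log of reads in the last 30 days, capped at 2x.
function usage(reads30d: number): number {
  return Math.min(2, 1 + Math.log1p(reads30d) / 4);
}

// Multiplicative combination (hypothetical name): reads can at most double the recency
// score, so heavy reading cannot fully rescue content that hasn't been updated in a long time.
function freshness(daysSinceUpdate: number, tags: string[], reads30d: number): number {
  return recency(daysSinceUpdate, tags) * usage(reads30d);
}
```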

## The tagging problem

Brain's first auto-tagger was a 56-term hardcoded dictionary. It matched words like "docker" and "kubernetes" in entry content and used them as tags. This works for the obvious cases but misses everything else. A guide about "payment service deployment patterns" that happens to mention Docker gets tagged `docker` but not `payments`, `deployment-pipeline`, or `microservices`. The dictionary doesn't know your domain.
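The following minimal sketch of dictionary-based tagging shows the failure mode. It uses a handful of terms from `KNOWN_TECH_TERMS`. The real `extractTags` in `src/utils/tags.ts` isn't shown in this diff, so the whole-word matching rule and the `dictionaryTags` name are assumptions.

```typescript
// A handful of entries from KNOWN_TECH_TERMS (src/utils/constants.ts holds all 56).
const KNOWN_TECH_TERMS = new Set(['docker', 'kubernetes', 'redis', 'postgres', 'api']);

// Hypothetical helper; assumed rule: tag every dictionary term that appears as a whole word.
function dictionaryTags(content: string): string[] {
  const words = new Set(content.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  return [...KNOWN_TECH_TERMS].filter((term) => words.has(term));
}

const guide = [
  '# Payment service deployment patterns',
  'We roll out the payments API behind a canary, then promote the Docker image.',
].join('\n');

dictionaryTags(guide); // ['docker', 'api']
```

The terms `payments` and `deployment` never become tags, because the dictionary has never heard of them.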

The relationship system had the same issue: its four heuristic signals (shared tags, title overlap, same author, content cross-references) miss connections between entries that describe the same thing in different words. Two entries about Redis timeouts and connection pooling aren't linked because their titles and tags don't overlap, even though their bodies cover the same ground.
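The heuristic linker can be sketched as a weighted sum of the four signals. The weights, the `LinkableEntry` fields, the `heuristicLinkScore` name, and the modeling of a cross-reference as one body mentioning the other's id are all hypothetical. The point is that none of these signals compare the vocabulary of the two bodies.

```typescript
interface LinkableEntry {
  id: string;
  title: string;
  author: string;
  tags: string[];
  body: string;
}

const wordSet = (s: string): Set<string> => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);

// Hypothetical weights for the four heuristic signals; a cross-reference is modeled
// (also hypothetically) as one entry's body mentioning the other's id.
function heuristicLinkScore(a: LinkableEntry, b: LinkableEntry): number {
  const sharedTags = a.tags.filter((t) => b.tags.includes(t)).length;
  const titleA = wordSet(a.title);
  const titleOverlap = [...wordSet(b.title)].filter((w) => titleA.has(w)).length;
  const sameAuthor = a.author === b.author ? 1 : 0;
  const crossRef = a.body.includes(b.id) || b.body.includes(a.id) ? 1 : 0;
  return 2 * sharedTags + titleOverlap + 0.5 * sameAuthor + 3 * crossRef;
}

const timeouts: LinkableEntry = {
  id: 'redis-timeouts', title: 'Debugging Redis timeouts', author: 'alice', tags: ['redis'],
  body: 'Timeouts under load came from the connection pool running dry.',
};
const pooling: LinkableEntry = {
  id: 'pool-sizing', title: 'Sizing connection pools', author: 'bob', tags: ['performance'],
  body: 'Size the Redis connection pool so bursts do not queue until they time out.',
};

heuristicLinkScore(timeouts, pooling); // 0: no signal fires despite the shared subject
```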

We're replacing this with a two-algorithm approach, both zero-dependency:

**RAKE (Rapid Automatic Keyword Extraction)** extracts multi-word keyphrases per document. Instead of matching "docker" from a dictionary, it extracts "multi-stage docker builds" as a meaningful phrase. About 60 lines of TypeScript, no corpus needed.
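Below is a compact RAKE sketch that follows the standard algorithm. It splits the text into candidate phrases at stopwords and punctuation, scores each word as degree / frequency, and scores each phrase as the sum of its word scores. The short stopword list is a placeholder, and this is not Brain's actual implementation.

```typescript
// Placeholder stopword list; real RAKE implementations use a few hundred stopwords.
const STOPWORDS = new Set([
  'a', 'an', 'and', 'are', 'as', 'at', 'be', 'by', 'for', 'from', 'in', 'into',
  'is', 'it', 'of', 'on', 'or', 'so', 'that', 'the', 'to', 'use', 'we', 'with',
]);

// Candidate phrases are maximal runs of non-stopwords, also split at punctuation.
// Hyphenated words like "multi-stage" stay intact.
function candidatePhrases(text: string): string[][] {
  const phrases: string[][] = [];
  for (const fragment of text.toLowerCase().split(/[.,;:!?()\[\]{}"\n]+/)) {
    let current: string[] = [];
    for (const word of fragment.match(/[a-z0-9]+(?:-[a-z0-9]+)*/g) ?? []) {
      if (STOPWORDS.has(word)) {
        if (current.length > 0) phrases.push(current);
        current = [];
      } else {
        current.push(word);
      }
    }
    if (current.length > 0) phrases.push(current);
  }
  return phrases;
}

// Illustrative RAKE scoring: word score = degree / frequency, phrase score = sum of word scores.
function rake(text: string, topN = 5): Array<{ phrase: string; score: number }> {
  const phrases = candidatePhrases(text);
  const freq = new Map<string, number>();
  const degree = new Map<string, number>();
  for (const phrase of phrases) {
    for (const word of phrase) {
      freq.set(word, (freq.get(word) ?? 0) + 1);
      // Each occurrence co-occurs with every word in its phrase, itself included.
      degree.set(word, (degree.get(word) ?? 0) + phrase.length);
    }
  }
  const wordScore = (w: string): number => (degree.get(w) ?? 0) / (freq.get(w) ?? 1);
  const scores = new Map<string, number>();
  for (const phrase of phrases) {
    scores.set(phrase.join(' '), phrase.reduce((sum, w) => sum + wordScore(w), 0));
  }
  return [...scores]
    .map(([phrase, score]) => ({ phrase, score }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN);
}

const text = 'Use multi-stage docker builds to keep images small. ' +
  'We use multi-stage docker builds for every service.';
rake(text);
```

On that text, `multi-stage docker builds` comes out as a single keyphrase rather than a lone `docker`.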

**TF-IDF with zone weighting** scores terms by how distinctive they are within the corpus. A term that appears often in one entry but rarely across the rest of the brain scores high. A term that appears everywhere (like "the" or even "guide") scores low. Markdown structure matters: title tokens get 3x weight, headings 2x, and code blocks 1.5x. The corpus index lives in SQLite and improves as the brain grows.
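Here is a sketch of zone-weighted TF-IDF. The 3x, 2x, and 1.5x weights are the ones described above, and `CorpusStats` mirrors the interface added in `src/intelligence/types.ts`. Two parts are assumptions for illustration and may not match the design doc: the simplistic markdown zoning (the first `# ` line is treated as the title) and the smoothed IDF, ln((1 + N) / (1 + df)) + 1.

```typescript
// Mirrors the CorpusStats interface from src/intelligence/types.ts.
interface CorpusStats {
  totalDocuments: number;
  documentFrequency: Map<string, number>;
}

const ZONE_WEIGHTS = { title: 3, heading: 2, code: 1.5, body: 1 } as const;
type Zone = keyof typeof ZONE_WEIGHTS;

const FENCE = '`'.repeat(3);
const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Simplistic zoning (an assumption): the first "# " line is the title, other ATX headings
// are headings, lines inside fenced blocks are code, and everything else is body.
function zones(markdown: string): Array<[Zone, string]> {
  const out: Array<[Zone, string]> = [];
  let inCode = false;
  let sawTitle = false;
  for (const line of markdown.split('\n')) {
    if (line.trimStart().startsWith(FENCE)) {
      inCode = !inCode;
    } else if (inCode) {
      out.push(['code', line]);
    } else if (!sawTitle && line.startsWith('# ')) {
      sawTitle = true;
      out.push(['title', line.slice(2)]);
    } else if (/^#{1,6} /.test(line)) {
      out.push(['heading', line.replace(/^#{1,6} /, '')]);
    } else {
      out.push(['body', line]);
    }
  }
  return out;
}

// Zone-weighted term frequency times a smoothed IDF: ln((1 + N) / (1 + df)) + 1.
function tfidf(markdown: string, corpus: CorpusStats): Map<string, number> {
  const tf = new Map<string, number>();
  for (const [zone, text] of zones(markdown)) {
    for (const term of tokenize(text)) {
      tf.set(term, (tf.get(term) ?? 0) + ZONE_WEIGHTS[zone]);
    }
  }
  const scores = new Map<string, number>();
  for (const [term, weight] of tf) {
    const df = corpus.documentFrequency.get(term) ?? 0;
    scores.set(term, weight * (Math.log((1 + corpus.totalDocuments) / (1 + df)) + 1));
  }
  return scores;
}
```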

For relationships, TF-IDF cosine similarity replaces the heuristic linker. Two entries with high overlap in distinctive terms are related, regardless of whether they share tags or title words. This catches the Redis timeout / connection pooling case: both score high on `redis`, `connection`, `timeout`, `pool` relative to the rest of the corpus.
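Cosine similarity over sparse term-to-weight maps takes only a few lines. The weights below are made up for illustration and stand in for per-entry TF-IDF scores. Any relatedness threshold Brain applies isn't stated in this post.

```typescript
type SparseVector = Map<string, number>;

// Cosine similarity between two sparse term -> weight vectors.
function cosine(a: SparseVector, b: SparseVector): number {
  // Only terms present in both vectors contribute to the dot product, so iterate the smaller one.
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  let dot = 0;
  for (const [term, weight] of small) dot += weight * (large.get(term) ?? 0);
  const norm = (v: SparseVector) => Math.sqrt([...v.values()].reduce((s, w) => s + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Illustrative weights only: two entries that share distinctive terms but no title words.
const timeouts: SparseVector = new Map([['redis', 4.1], ['timeout', 3.8], ['connection', 2.2], ['retry', 1.9]]);
const pooling: SparseVector = new Map([['redis', 3.6], ['pool', 4.4], ['connection', 2.9], ['timeout', 1.5]]);
const dockerBuilds: SparseVector = new Map([['docker', 4.0], ['layer', 3.1], ['cache', 2.7]]);

cosine(timeouts, pooling);      // relatively high: overlap on redis, connection, timeout
cosine(timeouts, dockerBuilds); // 0: no shared terms
```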

The full design is in [docs/INTELLIGENT_TAGGING_DESIGN.md](../INTELLIGENT_TAGGING_DESIGN.md).

## Obsidian compatibility

Every brain works as an Obsidian vault. The directory structure (`guides/`, `skills/`) maps to folders. Entries are standard markdown with YAML frontmatter. Open `~/.brain/repo` in Obsidian and you get a visual graph of your team's knowledge for free.

This matters because it meets people where they are. Some team members prefer a visual editor. Some want a graph view. Brain doesn't force a choice between CLI and GUI; the same data works in both.

## What's next

The intelligent tagging system is the next major feature. After that:

- Better auto-linking via TF-IDF cosine similarity and entity extraction (CLI commands, file paths, URLs as link signals)
- Louvain clustering for auto-discovered topic groups
- Multi-brain support (multiple knowledge bases per machine)
- Auto-archive for entries that stay stale for 30+ days

The full roadmap is in [ROADMAP.md](../../ROADMAP.md).

Brain is open source and in alpha. If you're interested, the repo is at [github.com/vraspar/brain](https://github.com/vraspar/brain) and the project site is at [brain.vraspar.com](https://brain.vraspar.com).
2 changes: 1 addition & 1 deletion src/commands/ingest.ts
@@ -7,7 +7,7 @@ import { createIndex, rebuildIndex, getDbPath, updateFreshnessScores } from '../
import { commitAndPush } from '../utils/git.js';
import { recordReceipt } from '../core/receipts.js';
import { upsertSource } from '../core/sources.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
import { maybeUpdateObsidianLinks } from '../core/obsidian.js';
import type { EntryType, IngestCandidate } from '../types.js';

2 changes: 1 addition & 1 deletion src/commands/prune.ts
@@ -14,7 +14,7 @@ import {
} from '../core/index-db.js';
import { scanEntries } from '../core/entry.js';
import { commitAndPush } from '../utils/git.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
import { freshnessIndicator } from '../core/freshness.js';
import Table from 'cli-table3';

2 changes: 1 addition & 1 deletion src/commands/sources.ts
@@ -6,7 +6,7 @@ import { loadSources, removeSource } from '../core/sources.js';
import { syncSource } from '../core/source-sync.js';
import { createIndex, getDbPath, rebuildIndex, updateFreshnessScores } from '../core/index-db.js';
import { scanEntries } from '../core/entry.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';

export const sourcesCommand = new Command('sources')
  .description('Manage external source repositories')
2 changes: 1 addition & 1 deletion src/commands/sync.ts
@@ -4,7 +4,7 @@ import { loadConfig } from '../core/config.js';
import { syncBrain } from '../core/repo.js';
import { scanEntries } from '../core/entry.js';
import { createIndex, getDbPath, rebuildIndex, updateFreshnessScores } from '../core/index-db.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
import { maybeUpdateObsidianLinks } from '../core/obsidian.js';

export const syncCommand = new Command('sync')
2 changes: 1 addition & 1 deletion src/core/config.ts
@@ -11,7 +11,7 @@ export function getBrainDir(): string {
  return path.join(os.homedir(), BRAIN_DIR_NAME);
}

-export function ensureBrainDir(): void {
+function ensureBrainDir(): void {
  const brainDir = getBrainDir();
  if (!fs.existsSync(brainDir)) {
    fs.mkdirSync(brainDir, { recursive: true });
25 changes: 0 additions & 25 deletions src/core/freshness-stats.ts

This file was deleted.

33 changes: 25 additions & 8 deletions src/core/freshness.ts
@@ -1,16 +1,10 @@
import type { Entry, FreshnessLabel, FreshnessScore } from '../types.js';
+import { getBulkEntryStats } from './receipts.js';
+import { VOLATILE_TAGS, STABLE_TAGS } from '../utils/constants.js';

const HALF_LIFE_DAYS = 60;
const LN2 = 0.693;

-const VOLATILE_TAGS = new Set([
-  'api', 'docker', 'kubernetes', 'cicd', 'deployment', 'config',
-]);
-
-const STABLE_TAGS = new Set([
-  'architecture', 'design', 'principles', 'patterns', 'conventions',
-]);
-
/**
 * Exponential decay based on days since last update.
 * Half-life of 60 days: score halves every 60 days.
@@ -115,3 +109,26 @@ export function freshnessIndicator(label: FreshnessLabel): string {
    case 'stale': return '🔴 Stale';
  }
}
+
+/**
+ * Build a UsageStats map for all entries by scanning receipts.
+ * Bridges the receipts system and the freshness scoring engine.
+ */
+export function buildUsageStatsMap(
+  repoPath: string,
+  period: string,
+): Map<string, UsageStats> {
+  const bulkStats = getBulkEntryStats(repoPath, period);
+  const result = new Map<string, UsageStats>();
+
+  for (const [entryId, stats] of bulkStats.entries()) {
+    result.set(entryId, {
+      accessCount30d: stats.accessCount,
+      // We don't track exact lastReadDaysAgo from receipts currently,
+      // so approximate: if there are reads in the period, assume recent
+      lastReadDaysAgo: stats.accessCount > 0 ? 0 : null,
+    });
+  }
+
+  return result;
+}
17 changes: 1 addition & 16 deletions src/core/ingest.ts
@@ -4,6 +4,7 @@ import os from 'node:os';
import path from 'node:path';
import { cloneForIngest, cloneRepo, getBatchFileModifiedDates, validateUrl } from '../utils/git.js';
import { extractTags } from '../utils/tags.js';
+import { META_FILES, EXCLUDED_DIRS, BRAIN_ONLY_EXCLUDED_DIRS } from '../utils/constants.js';
import {
  createEntry,
  extractTitle,
@@ -31,22 +32,6 @@ export interface IngestOptions {
  onProgress?: (message: string) => void;
}

-const META_FILES = new Set([
-  'readme.md', 'changelog.md', 'changes.md', 'license.md', 'licence.md',
-  'contributing.md', 'code_of_conduct.md', 'security.md',
-  'pull_request_template.md', 'issue_template.md',
-]);
-
-const EXCLUDED_DIRS = new Set([
-  'node_modules', '.git', '.github', '.vscode', 'dist', 'build',
-  'coverage', '__pycache__', '.tox', 'vendor', 'target',
-]);
-
-// Additional dirs excluded only when scanning the brain's own repo
-const BRAIN_ONLY_EXCLUDED_DIRS = new Set([
-  'docs', '_archive',
-]);
-
/**
 * Determine if a relative path should be included for ingest.
 * Excludes meta files, hidden dirs, and known non-doc directories.
2 changes: 1 addition & 1 deletion src/core/obsidian.ts
@@ -76,7 +76,7 @@ export function ensureObsidianConfig(repoPath: string): void {
}
}

-export function removeObsidianLinks(repoPath: string): void {
+function removeObsidianLinks(repoPath: string): void {
  for (const dirName of ['guides', 'skills']) {
    const dirPath = path.join(repoPath, dirName);
    if (!fs.existsSync(dirPath)) continue;
1 change: 1 addition & 0 deletions src/intelligence/index.ts
@@ -0,0 +1 @@
// Intelligent tagging module — TF-IDF, bigrams, corpus stats
10 changes: 10 additions & 0 deletions src/intelligence/types.ts
@@ -0,0 +1,10 @@
export interface TagCandidate {
  tag: string;
  score: number;
  source: 'keyword' | 'tfidf' | 'bigram' | 'manual';
}

export interface CorpusStats {
  totalDocuments: number;
  documentFrequency: Map<string, number>;
}
2 changes: 1 addition & 1 deletion src/mcp/tools.ts
@@ -15,7 +15,7 @@ import {
  searchEntries,
} from '../core/index-db.js';
import { computeFreshness } from '../core/freshness.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
import { getStats, recordReceipt } from '../core/receipts.js';
import { getTrailEntries } from '../core/links.js';
import { commitAndPush } from '../utils/git.js';
44 changes: 44 additions & 0 deletions src/utils/constants.ts
@@ -0,0 +1,44 @@
/**
 * Shared constants used across the brain codebase.
 * Centralised here for discoverability and reuse.
 */

/** Known tech terms for auto-tag extraction. */
export const KNOWN_TECH_TERMS = new Set([
  'typescript', 'javascript', 'python', 'react', 'node', 'docker',
  'kubernetes', 'k8s', 'aws', 'azure', 'gcp', 'terraform', 'ci/cd',
  'cicd', 'git', 'api', 'rest', 'graphql', 'sql', 'nosql', 'redis',
  'postgres', 'mongodb', 'nginx', 'linux', 'bash', 'helm', 'jenkins',
  'github', 'gitlab', 'vscode', 'eslint', 'prettier', 'vitest', 'jest',
  'webpack', 'vite', 'nextjs', 'express', 'fastify', 'rust', 'go',
  'java', 'csharp', 'dotnet', 'angular', 'vue', 'svelte', 'tailwind',
  'css', 'html', 'npm', 'yarn', 'pnpm', 'deno', 'bun',
]);

/** Root-level meta files excluded from ingest. */
export const META_FILES = new Set([
  'readme.md', 'changelog.md', 'changes.md', 'license.md', 'licence.md',
  'contributing.md', 'code_of_conduct.md', 'security.md',
  'pull_request_template.md', 'issue_template.md',
]);

/** Directories always excluded from ingest scanning. */
export const EXCLUDED_DIRS = new Set([
  'node_modules', '.git', '.github', '.vscode', 'dist', 'build',
  'coverage', '__pycache__', '.tox', 'vendor', 'target',
]);

/** Additional dirs excluded only when scanning the brain's own repo. */
export const BRAIN_ONLY_EXCLUDED_DIRS = new Set([
  'docs', '_archive',
]);

/** Tags indicating volatile (fast-changing) content — decay faster. */
export const VOLATILE_TAGS = new Set([
  'api', 'docker', 'kubernetes', 'cicd', 'deployment', 'config',
]);

/** Tags indicating stable (long-lived) content — decay slower. */
export const STABLE_TAGS = new Set([
  'architecture', 'design', 'principles', 'patterns', 'conventions',
]);
18 changes: 4 additions & 14 deletions src/utils/tags.ts
@@ -1,17 +1,7 @@
-/**
- * Shared set of known tech terms for auto-tag extraction.
- * Used by push and ingest to detect technology keywords in content.
- */
-export const KNOWN_TECH_TERMS = new Set([
-  'typescript', 'javascript', 'python', 'react', 'node', 'docker',
-  'kubernetes', 'k8s', 'aws', 'azure', 'gcp', 'terraform', 'ci/cd',
-  'cicd', 'git', 'api', 'rest', 'graphql', 'sql', 'nosql', 'redis',
-  'postgres', 'mongodb', 'nginx', 'linux', 'bash', 'helm', 'jenkins',
-  'github', 'gitlab', 'vscode', 'eslint', 'prettier', 'vitest', 'jest',
-  'webpack', 'vite', 'nextjs', 'express', 'fastify', 'rust', 'go',
-  'java', 'csharp', 'dotnet', 'angular', 'vue', 'svelte', 'tailwind',
-  'css', 'html', 'npm', 'yarn', 'pnpm', 'deno', 'bun',
-]);
+import { KNOWN_TECH_TERMS } from './constants.js';
+
+// Re-export for backward compatibility
+export { KNOWN_TECH_TERMS } from './constants.js';

/**
 * Extract technology tags from content by matching against KNOWN_TECH_TERMS.
2 changes: 1 addition & 1 deletion test/mcp.test.ts
@@ -14,7 +14,7 @@ import {
import { createEntry, scanEntries, writeEntry } from '../src/core/entry.js';
import { getEntryStats, getStats, recordReceipt } from '../src/core/receipts.js';
import { computeFreshness } from '../src/core/freshness.js';
-import { buildUsageStatsMap } from '../src/core/freshness-stats.js';
+import { buildUsageStatsMap } from '../src/core/freshness.js';
import { extractTags } from '../src/utils/tags.js';
import { parseTimeWindow } from '../src/utils/time.js';
import { registerTools } from '../src/mcp/tools.js';
2 changes: 1 addition & 1 deletion test/prune.test.ts
@@ -12,7 +12,7 @@ import {
} from '../src/core/index-db.js';
import { saveConfig } from '../src/core/config.js';
import { recordReceipt } from '../src/core/receipts.js';
-import { buildUsageStatsMap } from '../src/core/freshness-stats.js';
+import { buildUsageStatsMap } from '../src/core/freshness.js';
import type { BrainConfig, Entry } from '../src/types.js';
import type Database from 'better-sqlite3';
