From 885d321613d13747facae24901cdc5676835445a Mon Sep 17 00:00:00 2001
From: vraspar
Date: Fri, 27 Mar 2026 16:27:14 -0700
Subject: [PATCH 1/2] Add first blog post: Building Brain

Technical blog post covering why Brain exists, architecture decisions
(git storage, FTS5 search, MCP agent interface), the tagging problem
and RAKE + TF-IDF solution, Obsidian compatibility, and what's next.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/blog/building-brain.md | 56 +++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100644 docs/blog/building-brain.md

diff --git a/docs/blog/building-brain.md b/docs/blog/building-brain.md
new file mode 100644
index 0000000..3e8e05d
--- /dev/null
+++ b/docs/blog/building-brain.md
@@ -0,0 +1,56 @@
+# Building Brain: A CLI for Team Knowledge Sharing
+
+## The problem
+
+I use AI agents for most of my development work. They produce a lot of markdown: guides, runbooks, patterns, context files. Over months, this accumulates into a personal knowledge base that's genuinely useful.
+
+The problem is sharing it. I tried Obsidian, which works well for personal use but doesn't solve team knowledge sharing. There's no good way for a teammate's agent to access what my agent has already figured out. The pattern I kept seeing: I'd ask a teammate a question, they'd ask their agent, the agent would answer from scratch. That knowledge existed somewhere, but nobody could find it.
+
+Wikis don't solve this either. They require manual curation, they rot without maintenance, and AI agents can't interact with them programmatically. I wanted something that fits how developers already work: command line, git, markdown.
+
+## Architecture
+
+Brain is a CLI tool that stores knowledge as markdown files in a git repository. Three design decisions define the architecture:
+
+**Git as storage.** Entries are markdown files with YAML frontmatter, committed to a shared repo. No server to run, no database to manage, no accounts to create. Version history and access control come from git. A team joins by cloning the repo.
+
+**SQLite FTS5 for search.** Each machine maintains a local search index using SQLite's FTS5 virtual table with BM25 ranking. The index is a disposable cache, rebuilt from git on every sync. This gives sub-millisecond full-text search with prefix matching and contextual snippets, without requiring any external service.
+
+**MCP as the agent interface.** Brain exposes 10 tools and 2 resources via the Model Context Protocol over stdio. An AI agent connected to Brain can search team knowledge, read entries, publish findings, and check what's new. The agent doesn't need the CLI; it talks MCP directly. This is the key differentiator: the agent is a first-class user, not an afterthought.
+
+The rest follows from these three decisions. Read receipts are JSON files in the repo (so they sync with git). Freshness scoring uses a multiplicative formula over recency and read frequency. Pruning moves stale entries to `_archive/` (reversible). Everything runs locally, everything syncs through git.
+
+## The tagging problem
+
+Brain's first auto-tagger was a 56-term hardcoded dictionary. It matched words like "docker" and "kubernetes" in entry content and used them as tags. This works for the obvious cases but misses everything else. A guide about "payment service deployment patterns" gets tagged `docker` but not `payments`, `deployment-pipeline`, or `microservices`. The dictionary doesn't know your domain.
+
+The relationship system had the same issue: four heuristic signals (shared tags, title overlap, same author, content cross-references) that miss connections between entries with different vocabulary. Two entries about Redis timeouts and connection pooling aren't linked because they happen to use different words.
+
+We're replacing this with a two-algorithm approach, both zero-dependency:
+
+**RAKE (Rapid Automatic Keyword Extraction)** extracts multi-word keyphrases per document. Instead of matching "docker" from a dictionary, it extracts "multi-stage docker builds" as a meaningful phrase. About 60 lines of TypeScript, no corpus needed.
+
+**TF-IDF with zone weighting** scores terms by how distinctive they are within the corpus. A term that appears in one entry but rarely across the brain scores high. A term that appears everywhere (like "the" or even "guide") scores low. Markdown structure matters: title tokens get 3x weight, headings get 2x, code blocks 1.5x. The corpus index lives in SQLite and improves as the brain grows.
+
+For relationships, TF-IDF cosine similarity replaces the heuristic linker. Two entries with high overlap in distinctive terms are related, regardless of whether they share tags or title words. This catches the Redis timeout / connection pooling case: both score high on `redis`, `connection`, `timeout`, `pool` relative to the rest of the corpus.
+
+The full design is in [docs/INTELLIGENT_TAGGING_DESIGN.md](../INTELLIGENT_TAGGING_DESIGN.md).
+
+## Obsidian compatibility
+
+Every brain works as an Obsidian vault. The directory structure (`guides/`, `skills/`) maps to folders. Entries are standard markdown with YAML frontmatter. Open `~/.brain/repo` in Obsidian and you get a visual graph of your team's knowledge for free.
+
+This matters because it meets people where they are. Some team members prefer a visual editor. Some want a graph view. Brain doesn't force a choice between CLI and GUI; the same data works in both.
+
+## What's next
+
+The intelligent tagging system is the next major feature. After that:
+
+- Better auto-linking via TF-IDF cosine similarity and entity extraction (CLI commands, file paths, URLs as link signals)
+- Louvain clustering for auto-discovered topic groups
+- Multi-brain support (multiple knowledge bases per machine)
+- Auto-archive for entries that stay stale for 30+ days
+
+The full roadmap is in [ROADMAP.md](../../ROADMAP.md).
+
+Brain is open source and in alpha. If you're interested, the repo is at [github.com/vraspar/brain](https://github.com/vraspar/brain) and the project site is at [brain.vraspar.com](https://brain.vraspar.com).

From 8bb416defc6c48c80c0a27d1abfae81f64e10785 Mon Sep 17 00:00:00 2001
From: vraspar
Date: Fri, 27 Mar 2026 16:30:40 -0700
Subject: [PATCH 2/2] =?UTF-8?q?refactor:=20code=20cleanup=20=E2=80=94=20me?=
 =?UTF-8?q?rge=20modules,=20consolidate=20constants,=20scaffold=20intellig?=
 =?UTF-8?q?ence?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1. Merged freshness-stats.ts into freshness.ts (one module for all freshness logic)
2. Consolidated constants into src/utils/constants.ts (KNOWN_TECH_TERMS, META_FILES, EXCLUDED_DIRS, VOLATILE_TAGS, STABLE_TAGS)
3. Created src/intelligence/ scaffold (index.ts, types.ts) for upcoming TF-IDF tagging
4. Unexported internal-only functions (ensureBrainDir, removeObsidianLinks)
5. Updated all imports across 14 files

Build clean, all tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 src/commands/ingest.ts      |  2 +-
 src/commands/prune.ts       |  2 +-
 src/commands/sources.ts     |  2 +-
 src/commands/sync.ts        |  2 +-
 src/core/config.ts          |  2 +-
 src/core/freshness-stats.ts | 25 ---------------------
 src/core/freshness.ts       | 33 +++++++++++++++++++++-------
 src/core/ingest.ts          | 17 +-------------
 src/core/obsidian.ts        |  2 +-
 src/intelligence/index.ts   |  1 +
 src/intelligence/types.ts   | 10 +++++++++
 src/mcp/tools.ts            |  2 +-
 src/utils/constants.ts      | 44 +++++++++++++++++++++++++++++++++++++
 src/utils/tags.ts           | 18 ++++-----------
 test/mcp.test.ts            |  2 +-
 test/prune.test.ts          |  2 +-
 16 files changed, 94 insertions(+), 72 deletions(-)
 delete mode 100644 src/core/freshness-stats.ts
 create mode 100644 src/intelligence/index.ts
 create mode 100644 src/intelligence/types.ts
 create mode 100644 src/utils/constants.ts

diff --git a/src/commands/ingest.ts b/src/commands/ingest.ts
index d49f741..41e2f94 100644
--- a/src/commands/ingest.ts
+++ b/src/commands/ingest.ts
@@ -7,7 +7,7 @@ import { createIndex, rebuildIndex, getDbPath, updateFreshnessScores } from '../
 import { commitAndPush } from '../utils/git.js';
 import { recordReceipt } from '../core/receipts.js';
 import { upsertSource } from '../core/sources.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
 import { maybeUpdateObsidianLinks } from '../core/obsidian.js';
 import type { EntryType, IngestCandidate } from '../types.js';
 
diff --git a/src/commands/prune.ts b/src/commands/prune.ts
index c8b996d..d49486c 100644
--- a/src/commands/prune.ts
+++ b/src/commands/prune.ts
@@ -14,7 +14,7 @@ import {
 } from '../core/index-db.js';
 import { scanEntries } from '../core/entry.js';
 import { commitAndPush } from '../utils/git.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
 import { freshnessIndicator } from '../core/freshness.js';
 import Table from 'cli-table3';
 
diff --git a/src/commands/sources.ts b/src/commands/sources.ts
index 4ffc522..224c812 100644
--- a/src/commands/sources.ts
+++ b/src/commands/sources.ts
@@ -6,7 +6,7 @@ import { loadSources, removeSource } from '../core/sources.js';
 import { syncSource } from '../core/source-sync.js';
 import { createIndex, getDbPath, rebuildIndex, updateFreshnessScores } from '../core/index-db.js';
 import { scanEntries } from '../core/entry.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
 
 export const sourcesCommand = new Command('sources')
   .description('Manage external source repositories')
diff --git a/src/commands/sync.ts b/src/commands/sync.ts
index c9d2447..1c9aaeb 100644
--- a/src/commands/sync.ts
+++ b/src/commands/sync.ts
@@ -4,7 +4,7 @@ import { loadConfig } from '../core/config.js';
 import { syncBrain } from '../core/repo.js';
 import { scanEntries } from '../core/entry.js';
 import { createIndex, getDbPath, rebuildIndex, updateFreshnessScores } from '../core/index-db.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
 import { maybeUpdateObsidianLinks } from '../core/obsidian.js';
 
 export const syncCommand = new Command('sync')
diff --git a/src/core/config.ts b/src/core/config.ts
index a7ce0b3..6b1af4d 100644
--- a/src/core/config.ts
+++ b/src/core/config.ts
@@ -11,7 +11,7 @@ export function getBrainDir(): string {
   return path.join(os.homedir(), BRAIN_DIR_NAME);
 }
 
-export function ensureBrainDir(): void {
+function ensureBrainDir(): void {
   const brainDir = getBrainDir();
   if (!fs.existsSync(brainDir)) {
     fs.mkdirSync(brainDir, { recursive: true });
diff --git a/src/core/freshness-stats.ts b/src/core/freshness-stats.ts
deleted file mode 100644
index 3a9e0dc..0000000
--- a/src/core/freshness-stats.ts
+++ /dev/null
@@ -1,25 +0,0 @@
-import { getBulkEntryStats } from './receipts.js';
-import type { UsageStats } from './freshness.js';
-
-/**
- * Build a UsageStats map for all entries by scanning receipts.
- * Bridges the receipts system and the freshness scoring engine.
- */
-export function buildUsageStatsMap(
-  repoPath: string,
-  period: string,
-): Map<string, UsageStats> {
-  const bulkStats = getBulkEntryStats(repoPath, period);
-  const result = new Map();
-
-  for (const [entryId, stats] of bulkStats.entries()) {
-    result.set(entryId, {
-      accessCount30d: stats.accessCount,
-      // We don't track exact lastReadDaysAgo from receipts currently,
-      // so approximate: if there are reads in the period, assume recent
-      lastReadDaysAgo: stats.accessCount > 0 ? 0 : null,
-    });
-  }
-
-  return result;
-}
diff --git a/src/core/freshness.ts b/src/core/freshness.ts
index 80e35dc..6a7f554 100644
--- a/src/core/freshness.ts
+++ b/src/core/freshness.ts
@@ -1,16 +1,10 @@
 import type { Entry, FreshnessLabel, FreshnessScore } from '../types.js';
+import { getBulkEntryStats } from './receipts.js';
+import { VOLATILE_TAGS, STABLE_TAGS } from '../utils/constants.js';
 
 const HALF_LIFE_DAYS = 60;
 const LN2 = 0.693;
 
-const VOLATILE_TAGS = new Set([
-  'api', 'docker', 'kubernetes', 'cicd', 'deployment', 'config',
-]);
-
-const STABLE_TAGS = new Set([
-  'architecture', 'design', 'principles', 'patterns', 'conventions',
-]);
-
 /**
  * Exponential decay based on days since last update.
  * Half-life of 60 days: score halves every 60 days.
@@ -115,3 +109,26 @@ export function freshnessIndicator(label: FreshnessLabel): string {
     case 'stale': return '🔴 Stale';
   }
 }
+
+/**
+ * Build a UsageStats map for all entries by scanning receipts.
+ * Bridges the receipts system and the freshness scoring engine.
+ */
+export function buildUsageStatsMap(
+  repoPath: string,
+  period: string,
+): Map<string, UsageStats> {
+  const bulkStats = getBulkEntryStats(repoPath, period);
+  const result = new Map();
+
+  for (const [entryId, stats] of bulkStats.entries()) {
+    result.set(entryId, {
+      accessCount30d: stats.accessCount,
+      // We don't track exact lastReadDaysAgo from receipts currently,
+      // so approximate: if there are reads in the period, assume recent
+      lastReadDaysAgo: stats.accessCount > 0 ? 0 : null,
+    });
+  }
+
+  return result;
+}
diff --git a/src/core/ingest.ts b/src/core/ingest.ts
index de527e7..d40aacd 100644
--- a/src/core/ingest.ts
+++ b/src/core/ingest.ts
@@ -4,6 +4,7 @@ import os from 'node:os';
 import path from 'node:path';
 import { cloneForIngest, cloneRepo, getBatchFileModifiedDates, validateUrl } from '../utils/git.js';
 import { extractTags } from '../utils/tags.js';
+import { META_FILES, EXCLUDED_DIRS, BRAIN_ONLY_EXCLUDED_DIRS } from '../utils/constants.js';
 import {
   createEntry,
   extractTitle,
@@ -31,22 +32,6 @@ export interface IngestOptions {
   onProgress?: (message: string) => void;
 }
 
-const META_FILES = new Set([
-  'readme.md', 'changelog.md', 'changes.md', 'license.md', 'licence.md',
-  'contributing.md', 'code_of_conduct.md', 'security.md',
-  'pull_request_template.md', 'issue_template.md',
-]);
-
-const EXCLUDED_DIRS = new Set([
-  'node_modules', '.git', '.github', '.vscode', 'dist', 'build',
-  'coverage', '__pycache__', '.tox', 'vendor', 'target',
-]);
-
-// Additional dirs excluded only when scanning the brain's own repo
-const BRAIN_ONLY_EXCLUDED_DIRS = new Set([
-  'docs', '_archive',
-]);
-
 /**
  * Determine if a relative path should be included for ingest.
  * Excludes meta files, hidden dirs, and known non-doc directories.
diff --git a/src/core/obsidian.ts b/src/core/obsidian.ts
index 3431963..d76eb2f 100644
--- a/src/core/obsidian.ts
+++ b/src/core/obsidian.ts
@@ -76,7 +76,7 @@ export function ensureObsidianConfig(repoPath: string): void {
   }
 }
 
-export function removeObsidianLinks(repoPath: string): void {
+function removeObsidianLinks(repoPath: string): void {
   for (const dirName of ['guides', 'skills']) {
     const dirPath = path.join(repoPath, dirName);
     if (!fs.existsSync(dirPath)) continue;
diff --git a/src/intelligence/index.ts b/src/intelligence/index.ts
new file mode 100644
index 0000000..646c76b
--- /dev/null
+++ b/src/intelligence/index.ts
@@ -0,0 +1 @@
+// Intelligent tagging module — TF-IDF, bigrams, corpus stats
diff --git a/src/intelligence/types.ts b/src/intelligence/types.ts
new file mode 100644
index 0000000..406e185
--- /dev/null
+++ b/src/intelligence/types.ts
@@ -0,0 +1,10 @@
+export interface TagCandidate {
+  tag: string;
+  score: number;
+  source: 'keyword' | 'tfidf' | 'bigram' | 'manual';
+}
+
+export interface CorpusStats {
+  totalDocuments: number;
+  documentFrequency: Map<string, number>;
+}
diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts
index c2a6aed..a4e44f0 100644
--- a/src/mcp/tools.ts
+++ b/src/mcp/tools.ts
@@ -15,7 +15,7 @@ import {
   searchEntries,
 } from '../core/index-db.js';
 import { computeFreshness } from '../core/freshness.js';
-import { buildUsageStatsMap } from '../core/freshness-stats.js';
+import { buildUsageStatsMap } from '../core/freshness.js';
 import { getStats, recordReceipt } from '../core/receipts.js';
 import { getTrailEntries } from '../core/links.js';
 import { commitAndPush } from '../utils/git.js';
diff --git a/src/utils/constants.ts b/src/utils/constants.ts
new file mode 100644
index 0000000..4b926e3
--- /dev/null
+++ b/src/utils/constants.ts
@@ -0,0 +1,44 @@
+/**
+ * Shared constants used across the brain codebase.
+ * Centralised here for discoverability and reuse.
+ */
+
+/** Known tech terms for auto-tag extraction. */
+export const KNOWN_TECH_TERMS = new Set([
+  'typescript', 'javascript', 'python', 'react', 'node', 'docker',
+  'kubernetes', 'k8s', 'aws', 'azure', 'gcp', 'terraform', 'ci/cd',
+  'cicd', 'git', 'api', 'rest', 'graphql', 'sql', 'nosql', 'redis',
+  'postgres', 'mongodb', 'nginx', 'linux', 'bash', 'helm', 'jenkins',
+  'github', 'gitlab', 'vscode', 'eslint', 'prettier', 'vitest', 'jest',
+  'webpack', 'vite', 'nextjs', 'express', 'fastify', 'rust', 'go',
+  'java', 'csharp', 'dotnet', 'angular', 'vue', 'svelte', 'tailwind',
+  'css', 'html', 'npm', 'yarn', 'pnpm', 'deno', 'bun',
+]);
+
+/** Root-level meta files excluded from ingest. */
+export const META_FILES = new Set([
+  'readme.md', 'changelog.md', 'changes.md', 'license.md', 'licence.md',
+  'contributing.md', 'code_of_conduct.md', 'security.md',
+  'pull_request_template.md', 'issue_template.md',
+]);
+
+/** Directories always excluded from ingest scanning. */
+export const EXCLUDED_DIRS = new Set([
+  'node_modules', '.git', '.github', '.vscode', 'dist', 'build',
+  'coverage', '__pycache__', '.tox', 'vendor', 'target',
+]);
+
+/** Additional dirs excluded only when scanning the brain's own repo. */
+export const BRAIN_ONLY_EXCLUDED_DIRS = new Set([
+  'docs', '_archive',
+]);
+
+/** Tags indicating volatile (fast-changing) content — decay faster. */
+export const VOLATILE_TAGS = new Set([
+  'api', 'docker', 'kubernetes', 'cicd', 'deployment', 'config',
+]);
+
+/** Tags indicating stable (long-lived) content — decay slower. */
+export const STABLE_TAGS = new Set([
+  'architecture', 'design', 'principles', 'patterns', 'conventions',
+]);
diff --git a/src/utils/tags.ts b/src/utils/tags.ts
index 7b90d88..29bbc42 100644
--- a/src/utils/tags.ts
+++ b/src/utils/tags.ts
@@ -1,17 +1,7 @@
-/**
- * Shared set of known tech terms for auto-tag extraction.
- * Used by push and ingest to detect technology keywords in content.
- */
-export const KNOWN_TECH_TERMS = new Set([
-  'typescript', 'javascript', 'python', 'react', 'node', 'docker',
-  'kubernetes', 'k8s', 'aws', 'azure', 'gcp', 'terraform', 'ci/cd',
-  'cicd', 'git', 'api', 'rest', 'graphql', 'sql', 'nosql', 'redis',
-  'postgres', 'mongodb', 'nginx', 'linux', 'bash', 'helm', 'jenkins',
-  'github', 'gitlab', 'vscode', 'eslint', 'prettier', 'vitest', 'jest',
-  'webpack', 'vite', 'nextjs', 'express', 'fastify', 'rust', 'go',
-  'java', 'csharp', 'dotnet', 'angular', 'vue', 'svelte', 'tailwind',
-  'css', 'html', 'npm', 'yarn', 'pnpm', 'deno', 'bun',
-]);
+import { KNOWN_TECH_TERMS } from './constants.js';
+
+// Re-export for backward compatibility
+export { KNOWN_TECH_TERMS } from './constants.js';
 
 /**
  * Extract technology tags from content by matching against KNOWN_TECH_TERMS.
diff --git a/test/mcp.test.ts b/test/mcp.test.ts
index a6a5187..14e9b03 100644
--- a/test/mcp.test.ts
+++ b/test/mcp.test.ts
@@ -14,7 +14,7 @@ import {
 import { createEntry, scanEntries, writeEntry } from '../src/core/entry.js';
 import { getEntryStats, getStats, recordReceipt } from '../src/core/receipts.js';
 import { computeFreshness } from '../src/core/freshness.js';
-import { buildUsageStatsMap } from '../src/core/freshness-stats.js';
+import { buildUsageStatsMap } from '../src/core/freshness.js';
 import { extractTags } from '../src/utils/tags.js';
 import { parseTimeWindow } from '../src/utils/time.js';
 import { registerTools } from '../src/mcp/tools.js';
diff --git a/test/prune.test.ts b/test/prune.test.ts
index 8ae6b99..5324f14 100644
--- a/test/prune.test.ts
+++ b/test/prune.test.ts
@@ -12,7 +12,7 @@ import {
 } from '../src/core/index-db.js';
 import { saveConfig } from '../src/core/config.js';
 import { recordReceipt } from '../src/core/receipts.js';
-import { buildUsageStatsMap } from '../src/core/freshness-stats.js';
+import { buildUsageStatsMap } from '../src/core/freshness.js';
 import type { BrainConfig, Entry } from '../src/types.js';
 import type Database from 'better-sqlite3';
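
Note on the `src/intelligence/` scaffold: the patch only adds the `TagCandidate` and `CorpusStats` types, so the following is a minimal sketch of how a `CorpusStats`-shaped object might feed the TF-IDF cosine similarity described in the blog post. It is not code from this series. The helper names (`tokenize`, `buildCorpusStats`, `tfidfVector`, `cosineSimilarity`), the naive tokenizer, and the smoothed IDF formula are all assumptions made for illustration, and the zone weighting mentioned in the post is left out.

```typescript
// Hypothetical sketch: every function here is illustrative, not Brain's API.
// CorpusStats mirrors the interface added in src/intelligence/types.ts.
interface CorpusStats {
  totalDocuments: number;
  documentFrequency: Map<string, number>;
}

/** Naive tokenizer (assumption): lowercase, split on non-alphanumerics. */
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

/** Count, for each term, how many documents contain it at least once. */
function buildCorpusStats(docs: string[]): CorpusStats {
  const documentFrequency = new Map<string, number>();
  docs.forEach((doc) => {
    new Set(tokenize(doc)).forEach((term) => {
      documentFrequency.set(term, (documentFrequency.get(term) ?? 0) + 1);
    });
  });
  return { totalDocuments: docs.length, documentFrequency };
}

/** TF-IDF vector for one document, with smoothed IDF: ln((1 + N) / (1 + df)) + 1. */
function tfidfVector(text: string, stats: CorpusStats): Map<string, number> {
  const tokens = tokenize(text);
  const counts = new Map<string, number>();
  tokens.forEach((t) => counts.set(t, (counts.get(t) ?? 0) + 1));

  const vector = new Map<string, number>();
  counts.forEach((count, term) => {
    const df = stats.documentFrequency.get(term) ?? 0;
    const idf = Math.log((1 + stats.totalDocuments) / (1 + df)) + 1;
    vector.set(term, (count / tokens.length) * idf);
  });
  return vector;
}

/** Cosine similarity between two sparse term vectors; 0 if either is empty. */
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((weight, term) => {
    normA += weight * weight;
    dot += weight * (b.get(term) ?? 0);
  });
  b.forEach((weight) => {
    normB += weight * weight;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage: two Redis-related entries versus an unrelated frontend entry.
const sampleDocs = [
  'redis connection timeout pool',
  'redis pool connection tuning timeout',
  'react component styling css',
];
const sampleStats = buildCorpusStats(sampleDocs);
const sampleVectors = sampleDocs.map((d) => tfidfVector(d, sampleStats));
const redisPairScore = cosineSimilarity(sampleVectors[0], sampleVectors[1]);
const unrelatedScore = cosineSimilarity(sampleVectors[0], sampleVectors[2]);
```

In this toy corpus the two Redis entries share most of their distinctive terms, so `redisPairScore` comes out higher than `unrelatedScore`, which is zero because the third entry shares no terms with the first. A real implementation would presumably persist `documentFrequency` in SQLite, as the post suggests, rather than rebuilding it in memory.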