Evaluate extracting reusable packages from internal/ #2

@blackwell-systems

Context

knowing's internal/ contains several components with clean boundaries that could serve broader use cases as independent Go packages. This issue tracks evaluation of what to extract, when, and under what stability guarantees.

Candidates

Strong candidates (clean boundaries, general-purpose)

| Package | Current location | What it does | Potential consumers |
|---|---|---|---|
| Hierarchical Merkle tree | internal/snapshot/hierarchical.go + merkle.go | Semantic-boundary Merkle trees with subgraph roots, typed diffs, context pack roots | Any content-addressed system: config management, infrastructure graphs, dependency trackers, audit systems |
| Subgraph cache | internal/cache/subgraph.go | TTL-bounded cache keyed by Merkle roots with selective package-scoped invalidation | Anything using content-addressed caching |
| GCF/TOON wire formats | internal/wire/gcf.go + toon.go | Token-optimized wire formats for LLM context delivery | Any MCP server, any tool sending structured data to agents |
| Community detection | internal/community/ | Pluggable Algorithm interface with Louvain + label propagation | Graph analysis, visualization, social network analysis |
| Hash identity | internal/types/types.go (Hash, NewHash, domain prefixes, Verify) | Content-addressed identity with type-safe domain prefixes | Any Go project using SHA-256 content addressing |
| Equivalence classes | internal/context/equivalence.go + universal_seeds.go | Vocabulary bridging between natural language and code symbol names | Code search tools, developer-facing retrieval systems |
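To make the "Hash identity" row concrete, here is a minimal sketch of domain-prefixed SHA-256 identity. The names Hash, NewHash and Verify come from the table, but the signatures, the Domain type, the constants and the byte layout (domain, a zero byte, then content) are assumptions for illustration and may not match internal/types/types.go.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Domain is a hypothetical type-safe prefix that keeps hash namespaces apart.
type Domain string

const (
	DomainNode Domain = "node" // assumed domain names, not from the source
	DomainEdge Domain = "edge"
)

// Hash is a SHA-256 digest tagged with the domain it was computed in.
type Hash struct {
	Domain Domain
	Sum    [sha256.Size]byte
}

// NewHash hashes domain || 0x00 || content, so identical content in two
// domains produces two different digests.
func NewHash(d Domain, content []byte) Hash {
	h := sha256.New()
	h.Write([]byte(d))
	h.Write([]byte{0})
	h.Write(content)
	out := Hash{Domain: d}
	copy(out.Sum[:], h.Sum(nil))
	return out
}

// Verify reports whether content re-hashes to h under h's own domain.
func (h Hash) Verify(content []byte) bool {
	return NewHash(h.Domain, content) == h
}

// String renders the hash as "domain:hex".
func (h Hash) String() string {
	return string(h.Domain) + ":" + hex.EncodeToString(h.Sum[:])
}

func main() {
	a := NewHash(DomainNode, []byte("pkg/foo"))
	b := NewHash(DomainEdge, []byte("pkg/foo"))
	fmt.Println("digests differ across domains:", a.Sum != b.Sum)
	fmt.Println("verify:", a.Verify([]byte("pkg/foo")))
	fmt.Println(a)
}
```

A design note: because the prefix is part of the hashed bytes, changing the prefix scheme changes every digest, which is the kind of break the stability section describes.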

Moderate candidates (need interface decoupling)

| Package | Blocker |
|---|---|
| Tree-sitter extractor | Depends on types package; needs a minimal interface |
| LSP enrichment | Depends on types + store; the pattern is reusable but needs abstraction |
| RWR + HITS algorithms | Graph algorithms are general but wrapped in knowing-specific scoring |
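One possible shape for the "minimal interface" these blockers call for is sketched below. Everything here is hypothetical: Symbol, Sink, Extract and memorySink are illustrative names, and Extract stands in for a real parser. The point is only that the extractor depends on a one-method interface it owns, while the host adapts its own types and store to it.

```go
package main

import "fmt"

// Symbol is a hypothetical minimal record an extractor emits.
// It deliberately carries no knowing-specific types.
type Symbol struct {
	Name string
	Kind string
	File string
}

// Sink is the only thing the extractor needs from its host.
type Sink interface {
	Emit(Symbol) error
}

// Extract is a stand-in for a real parser: it reports one symbol per name.
func Extract(file string, names []string, out Sink) error {
	for _, n := range names {
		if err := out.Emit(Symbol{Name: n, Kind: "func", File: file}); err != nil {
			return fmt.Errorf("emit %s: %w", n, err)
		}
	}
	return nil
}

// memorySink is a host-side adapter; a real host would write to its store.
type memorySink struct{ got []Symbol }

func (m *memorySink) Emit(s Symbol) error {
	m.got = append(m.got, s)
	return nil
}

func main() {
	sink := &memorySink{}
	if err := Extract("a.go", []string{"Foo", "Bar"}, sink); err != nil {
		panic(err)
	}
	for _, s := range sink.got {
		fmt.Println(s.File, s.Kind, s.Name)
	}
}
```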

Not extractable (too knowing-specific)

  • internal/store/sqlite.go (schema is knowing-specific)
  • internal/mcp/ (tool definitions are product-specific)
  • internal/daemon/ (watcher + lifecycle is product-specific)
  • cmd/knowing/ (CLI is the product)

Stability concerns

The Merkle tree API is NOT stable yet:

  • Hash domain prefixes shipped 2026-05-18 (broke all existing hashes)
  • File-level roots are planned (Phase 4, would add a tree level)
  • The flat tree was just dropped (hierarchical root is now canonical)
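To illustrate why adding a tree level changes hashes, here is a minimal two-level sketch: leaves roll up into per-package subgraph roots, and those roll up into a single hierarchical root. This is not the code in internal/snapshot; the digest type, the tags, the sorting rule and the function names are assumptions.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

type digest [sha256.Size]byte

// combine hashes a tag followed by the children in sorted order, so the
// result does not depend on input order. The input slice is not modified.
func combine(tag string, children []digest) digest {
	sorted := append([]digest(nil), children...)
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(sorted[i][:], sorted[j][:]) < 0
	})
	h := sha256.New()
	h.Write([]byte(tag))
	h.Write([]byte{0})
	for _, c := range sorted {
		h.Write(c[:])
	}
	var out digest
	copy(out[:], h.Sum(nil))
	return out
}

// subgraphRoot is the root of one package's items (hypothetical boundary).
func subgraphRoot(pkg string, items []string) digest {
	leaves := make([]digest, 0, len(items))
	for _, it := range items {
		leaves = append(leaves, digest(sha256.Sum256([]byte("leaf\x00"+it))))
	}
	return combine("subgraph\x00"+pkg, leaves)
}

// root returns the top-level root plus each package's subgraph root.
func root(pkgs map[string][]string) (digest, map[string]digest) {
	subs := make(map[string]digest, len(pkgs))
	children := make([]digest, 0, len(pkgs))
	for name, items := range pkgs {
		r := subgraphRoot(name, items)
		subs[name] = r
		children = append(children, r)
	}
	return combine("root", children), subs
}

func main() {
	before := map[string][]string{"pkg/a": {"Foo", "Bar"}, "pkg/b": {"Baz"}}
	after := map[string][]string{"pkg/a": {"Foo", "Bar2"}, "pkg/b": {"Baz"}}
	r1, s1 := root(before)
	r2, s2 := root(after)
	fmt.Println("root changed:", r1 != r2)
	for name := range s1 {
		fmt.Println(name, "changed:", s1[name] != s2[name])
	}
}
```

In this sketch a change inside one package alters only that package's subgraph root and the top-level root. Inserting a file level between leaves and packages would change what each subgraph root hashes over, and therefore every root above it.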

Do not extract until:

  1. Hash format is stable for at least one release cycle
  2. File-level roots ship or are explicitly deferred
  3. Subgraph cache has been validated by daily use
  4. At least one external consumer validates the API

Suggested extraction order

  1. GCF/TOON first (helps MCP ecosystem, creates network effect, format is stable)
  2. Community detection second (generic algorithms, no competitive advantage from keeping private)
  3. Equivalence classes third (the concept is broadly useful, but knowing's current classes are tuned for knowing)
  4. Hierarchical Merkle tree last (the differentiator, extract only after API is stable)

Decision

This is a tracking issue. No extraction should happen until the conditions above are met. The purpose is to maintain awareness of what's extractable so internal code stays clean at the boundaries.

Labels: tracking (long-lived tracking issues)
