Context
knowing's internal/ contains several components with clean boundaries that could serve broader use cases as independent Go packages. This issue tracks evaluation of what to extract, when, and under what stability guarantees.
Candidates
Strong candidates (clean boundaries, general-purpose)
| Package | Current location | What it does | Potential consumers |
| --- | --- | --- | --- |
| Hierarchical Merkle tree | internal/snapshot/hierarchical.go + merkle.go | Semantic-boundary Merkle trees with subgraph roots, typed diffs, context pack roots | Any content-addressed system: config management, infrastructure graphs, dependency trackers, audit systems |
| Subgraph cache | internal/cache/subgraph.go | TTL-bounded cache keyed by Merkle roots with selective package-scoped invalidation | Anything using content-addressed caching |
| GCF/TOON wire formats | internal/wire/gcf.go + toon.go | Token-optimized wire formats for LLM context delivery | Any MCP server, any tool sending structured data to agents |
| Community detection | internal/community/ | Pluggable Algorithm interface with Louvain + label propagation | Graph analysis, visualization, social network analysis |
| Hash identity | internal/types/types.go (Hash, NewHash, domain prefixes, Verify) | Content-addressed identity with type-safe domain prefixes | Any Go project using SHA-256 content addressing |
| Equivalence classes | internal/context/equivalence.go + universal_seeds.go | Vocabulary bridging between natural language and code symbol names | Code search tools, developer-facing retrieval systems |
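To make the subgraph cache row concrete, here is a minimal sketch of a TTL-bounded cache keyed by Merkle root with package-scoped invalidation. It is not the code in internal/cache/subgraph.go: the names `Root`, `SubgraphCache`, `Put`, `Get`, `InvalidatePackage`, and `runDemo` are all hypothetical, and the real package may structure keys and scopes differently.

```go
package main

import (
	"fmt"
	"time"
)

// Root is a hypothetical stand-in for a Merkle root hash.
type Root string

type entry struct {
	value   string
	pkg     string    // package scope used for selective invalidation
	expires time.Time // entry is stale at or after this instant
}

// SubgraphCache is an illustrative TTL-bounded cache keyed by Merkle root.
// It is not safe for concurrent use.
type SubgraphCache struct {
	ttl     time.Duration
	now     func() time.Time // injectable clock so expiry is testable
	entries map[Root]entry
}

func NewSubgraphCache(ttl time.Duration, now func() time.Time) *SubgraphCache {
	return &SubgraphCache{ttl: ttl, now: now, entries: make(map[Root]entry)}
}

// Put stores value under root and tags it with the owning package.
func (c *SubgraphCache) Put(root Root, pkg, value string) {
	c.entries[root] = entry{value: value, pkg: pkg, expires: c.now().Add(c.ttl)}
}

// Get returns the cached value, evicting the entry if its TTL has elapsed.
func (c *SubgraphCache) Get(root Root) (string, bool) {
	e, ok := c.entries[root]
	if !ok || !c.now().Before(e.expires) {
		delete(c.entries, root)
		return "", false
	}
	return e.value, true
}

// InvalidatePackage drops only the entries scoped to pkg and reports how many.
func (c *SubgraphCache) InvalidatePackage(pkg string) int {
	n := 0
	for root, e := range c.entries {
		if e.pkg == pkg {
			delete(c.entries, root)
			n++
		}
	}
	return n
}

// runDemo invalidates one package, then lets the TTL expire for the other.
func runDemo() (removed int, hitBeforeExpiry, hitAfterExpiry bool) {
	clock := time.Unix(0, 0)
	c := NewSubgraphCache(time.Minute, func() time.Time { return clock })
	c.Put("root-a", "pkg/a", "subgraph A")
	c.Put("root-b", "pkg/b", "subgraph B")
	removed = c.InvalidatePackage("pkg/a")
	_, hitBeforeExpiry = c.Get("root-b")
	clock = clock.Add(2 * time.Minute)
	_, hitAfterExpiry = c.Get("root-b")
	return
}

func main() {
	fmt.Println(runDemo()) // prints: 1 true false
}
```

Because the key is the Merkle root, a changed subgraph naturally misses the cache; the package tag only exists so unrelated entries survive an invalidation.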
Moderate candidates (need interface decoupling)
| Package | Blocker |
| --- | --- |
| Tree-sitter extractor | Depends on types package; needs a minimal interface |
| LSP enrichment | Depends on types + store; the pattern is reusable but needs abstraction |
| RWR + HITS algorithms | Graph algorithms are general but wrapped in knowing-specific scoring |
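One possible shape for the "minimal interface" the extractor row calls for is sketched below. Everything here is an assumption made for illustration: `Symbol`, `Sink`, `ExtractFuncs`, `sliceSink`, and `collectNames` do not exist in knowing, and the line scanner is a toy stand-in for real tree-sitter parsing. The point is only that the extractor depends on a small interface it owns rather than on the host's types package.

```go
package main

import (
	"fmt"
	"strings"
)

// Symbol is a hypothetical record owned by the extracted package,
// so it no longer needs to import the host's types package.
type Symbol struct {
	Name string
	Kind string
}

// Sink is the only thing the extractor requires from its host.
type Sink interface {
	Emit(Symbol) error
}

// ExtractFuncs is a toy stand-in for the real extractor: it scans lines for
// plain "func Name(" declarations and skips methods with receivers.
func ExtractFuncs(src string, sink Sink) error {
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "func ") {
			continue
		}
		rest := strings.TrimPrefix(line, "func ")
		idx := strings.Index(rest, "(")
		if idx <= 0 { // no parenthesis, or a receiver like "(r *T)"
			continue
		}
		if err := sink.Emit(Symbol{Name: rest[:idx], Kind: "func"}); err != nil {
			return err
		}
	}
	return nil
}

// sliceSink is a host-side adapter; a real host would map Symbol
// onto its own node type and store here instead.
type sliceSink struct{ symbols []Symbol }

func (s *sliceSink) Emit(sym Symbol) error {
	s.symbols = append(s.symbols, sym)
	return nil
}

// collectNames runs the extractor and returns the emitted symbol names.
func collectNames(src string) ([]string, error) {
	sink := &sliceSink{}
	if err := ExtractFuncs(src, sink); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(sink.symbols))
	for _, s := range sink.symbols {
		names = append(names, s.Name)
	}
	return names, nil
}

func main() {
	src := "package x\n\nfunc Foo() {}\n\nfunc (t *T) Method() {}\n\nfunc bar(a int) {}\n"
	names, err := collectNames(src)
	fmt.Println(names, err) // prints: [Foo bar] <nil>
}
```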
Not extractable (too knowing-specific)
- internal/store/sqlite.go (schema is knowing-specific)
- internal/mcp/ (tool definitions are product-specific)
- internal/daemon/ (watcher + lifecycle is product-specific)
- cmd/knowing/ (CLI is the product)
Stability concerns
The Merkle tree API is NOT stable yet:
- Hash domain prefixes shipped 2026-05-18 (broke all existing hashes)
- File-level roots are planned (Phase 4, would add a tree level)
- The flat tree was just dropped (hierarchical root is now canonical)
Do not extract until:
- Hash format is stable for at least one release cycle
- File-level roots ship or are explicitly deferred
- Subgraph cache has been validated by daily use
- At least one external consumer validates the API
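To illustrate why the hash format has to settle before extraction, the sketch below shows how mixing a domain prefix into the digest changes every hash. The names `Hash`, `NewHash`, and `Verify` come from the candidates table, but their signatures here are assumptions, as are the prefix layout (domain, a zero byte, then content), the domain strings, and the `legacyHash` helper. The actual encoding in internal/types/types.go may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Hash is a SHA-256 digest. The signatures below are assumed, not knowing's.
type Hash [sha256.Size]byte

func (h Hash) String() string { return hex.EncodeToString(h[:]) }

// NewHash hashes domain || 0x00 || content. The zero-byte separator is an
// assumption; it keeps ("ab", "c") and ("a", "bc") from colliding.
func NewHash(domain string, content []byte) Hash {
	d := sha256.New()
	d.Write([]byte(domain))
	d.Write([]byte{0})
	d.Write(content)
	var h Hash
	copy(h[:], d.Sum(nil))
	return h
}

// Verify recomputes the hash under the given domain and compares.
func Verify(h Hash, domain string, content []byte) bool {
	return NewHash(domain, content) == h
}

// legacyHash is a hypothetical pre-prefix scheme: plain SHA-256 of content.
func legacyHash(content []byte) Hash {
	return Hash(sha256.Sum256(content))
}

func main() {
	content := []byte("func Foo() {}")
	h := NewHash("node", content)

	fmt.Println(Verify(h, "node", content)) // same domain verifies
	fmt.Println(Verify(h, "edge", content)) // another domain does not
	fmt.Println(h == legacyHash(content))   // unprefixed digest differs
}
```

Any consumer that persisted hashes under an earlier scheme would have to recompute them after such a change, which is the cost an external user would bear if the package were extracted too early.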
Suggested extraction order
- GCF/TOON first (helps MCP ecosystem, creates network effect, format is stable)
- Community detection second (generic algorithms, no competitive advantage from keeping private)
- Equivalence classes third (the concept is broadly useful, though knowing's classes are tuned for knowing)
- Hierarchical Merkle tree last (the differentiator, extract only after API is stable)
Decision
This is a tracking issue. No extraction should happen until the conditions above are met. The purpose is to maintain awareness of what's extractable so internal code stays clean at the boundaries.