Forge Tooling

Everything in this directory is optional. These are tools that accelerate the Forge methodology. They are not the methodology itself.

The methodology is the feedback loop: formalize, validate cold, fix, repeat. The tools speed it up. Use them, don't use them, build your own, or print your spec and mail it to a review team. The process works regardless of tooling.

What's Here

graph/droppable/

Drop this folder into your project: copy graph/droppable/ into it (e.g., as tools/forge_graph/), edit forge_graph.toml with your project details, run pip install networkx, then run python forge_graph.py validate. No Neo4j required. The graph is built in memory from your markdown docs.

Contains:

  • forge_graph.py -- Project-agnostic graph builder and validator (supports both networkx and Neo4j backends)
  • forge_graph.toml -- Config template (edit for your project)
  • README.md -- Usage guide for both humans and AI agents

graph/ (reference materials)

Background docs on the graph approach:

  • architecture_graph_guide.md -- How the graph concept works and when it pays off
  • expanded_graph_analysis.md -- 18 signal types the graph validates
  • arch_graph_reference.py -- Original project-specific implementation (BobbyTables). Kept for reference; use the agnostic version in droppable/ for your project.

docker/

Future home for Docker configurations (Neo4j setup, etc.).

It's Not as Heavy as It Sounds

"Neo4j dependency graph and Weaviate semantic search" sounds like enterprise infrastructure. It's not. It's docker compose up and five minutes on a CPU. No GPU. No cloud service. No configuration marathon. The containers come up, the docs get embedded, and you're running semantic queries against your spec. If you have Docker installed, you have everything you need.


Three Validation Layers

Forge validation operates at three layers. Each catches things the others can't.

| Layer | What it catches | How it works |
|---|---|---|
| Surface | Imprecise language in binding contexts | Dictionary lint -- pattern matching against the ambiguous language dictionary. Catches "should," "handle gracefully," "as needed" and hundreds of other probabilistically wide terms. |
| Structure | Relationship problems, cascade gaps, phase conflicts, missing WHY blocks | Graph analysis -- nodes and edges representing domains, decisions, concepts, and their dependencies. Answers "what breaks if I change this?" via traversal. |
| Meaning | Consistency, contradiction, concept drift, faithfulness | Semantic search -- embeddings of spec passages compared by similarity. Finds where two documents discuss the same concept but make opposing claims, even when they share no keywords. |

You don't need all three. Each layer adds value independently. But they compound -- the graph tells you which domains are coupled, semantic search tells you whether the coupled domains agree, and the dictionary tells you whether the agreement is stated precisely.
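As a rough illustration of the Structure layer's "what breaks if I change this?" question, here is a minimal networkx sketch. The domain names, the edge direction convention, and the blast_radius helper are all hypothetical; this is not the data model used by forge_graph.py.

```python
import networkx as nx

# Hypothetical domains. An edge A -> B means "B depends on A",
# so a change to A can cascade to B.
g = nx.DiGraph()
g.add_edges_from([
    ("auth", "billing"),
    ("auth", "api"),
    ("billing", "reports"),
    ("storage", "reports"),
])


def blast_radius(graph, changed):
    """Return every node reachable from `changed` (hypothetical helper)."""
    # nx.descendants gives all nodes reachable from the source, excluding it.
    return sorted(nx.descendants(graph, changed))


print(blast_radius(g, "auth"))
print(blast_radius(g, "storage"))
```

With this toy graph, a change to auth reaches api, billing, and reports, while a change to storage reaches only reports.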

Tooling Options by Setup

Pick what fits your situation. The methodology works at every level -- more tooling means more automated coverage, not a different process.

Level 0: No Infrastructure

  • Dictionary lint: Run as a prompt. Give a model the spec and the semantic review prompt (see ../methodology/prompts/semantic_review_prompt.md). It reads the docs and flags ambiguous language. No tools, no setup, works in any AI chat session.
  • Graph: Skip it. Use manual cascade checking. Works fine up to ~10 domains.
  • Semantic search: Skip it. The cold validation sessions serve as a manual semantic consistency check -- when a cold session asks a question that reveals two documents disagree, that's the consistency lens operating through the model's context window.

Level 1: Python Only (No Docker, No Services)

  • Dictionary lint: Script that converts the dictionary to regex patterns and scans docs. Faster and more consistent than the prompt-based pass, but it misses some context-dependent ambiguity. Can be combined with a prompt pass for full coverage.
  • Graph: Use networkx (in-memory Python graph library). No database, no server, pip install networkx and go. Architecture graphs are small (hundreds of nodes, not millions) -- networkx handles them easily. Ephemeral -- rebuilds from docs on every run.
  • Semantic search: SQLite can hold vector embeddings, either as plain BLOB columns compared in Python or through a vector extension such as sqlite-vec. Combined with mozilla-ai/encoderfile or sentence-transformers, you get full semantic analysis with nothing but pip installs. No Weaviate. No Docker. No services. Chunk your docs, embed into SQLite, query by similarity. Same concept reachability checks, contradiction detection, and consistency analysis as the Weaviate path, just lighter weight. For Forge-sized corpora (a few hundred chunks), this runs in seconds on any machine.
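The dictionary-lint bullet above could look roughly like the following. This is a sketch under assumptions: AMBIGUOUS_TERMS is a hypothetical three-entry excerpt built from the examples quoted in this README, not the real dictionary, and the script makes no attempt to tell binding contexts from non-binding ones.

```python
import re

# Hypothetical excerpt; the real dictionary is described as much larger.
AMBIGUOUS_TERMS = ["should", "handle gracefully", "as needed"]

# One case-insensitive pattern with word boundaries, longest phrases first
# so multi-word terms are preferred over shorter overlapping ones.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(AMBIGUOUS_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def lint(text):
    """Return (line_number, matched_term) for each ambiguous term found."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            findings.append((lineno, match.group(1).lower()))
    return findings


spec = """The service must retry three times.
Errors should be logged, and the client will handle gracefully.
Caches are refreshed as needed."""

for lineno, term in lint(spec):
    print(f"line {lineno}: '{term}'")
```

On the sample text this flags "should" and "handle gracefully" on line 2 and "as needed" on line 3. A prompt pass would still be needed for ambiguity that depends on context.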

This means Level 1 gives you the complete validation stack (graph + semantic + dictionary lint) with zero infrastructure. Just Python packages on a laptop. pip install networkx sentence-transformers and you have everything. An agent with file access can set this up and run it without human intervention.
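The chunk, embed, store, and query flow for the semantic layer can be sketched with only the standard library. Everything here is illustrative: the table layout, the sample chunks, and the search helper are hypothetical, and embed is a toy bag-of-words stand-in that only matches shared words. A real setup would replace it with a sentence-transformers model (or similar) to get similarity that does not depend on keywords.

```python
import json
import math
import re
import sqlite3
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, doc TEXT, body TEXT, vec TEXT)")

# Hypothetical spec chunks; the first two disagree about session expiry.
chunks = [
    ("auth.md", "Sessions expire after thirty minutes of inactivity."),
    ("api.md", "Sessions never expire unless the user logs out."),
    ("billing.md", "Invoices are generated on the first day of each month."),
]
for doc, body in chunks:
    db.execute(
        "INSERT INTO chunks (doc, body, vec) VALUES (?, ?, ?)",
        (doc, body, json.dumps(embed(body))),
    )


def search(query, top_k=2):
    """Brute-force similarity search over every stored chunk."""
    q = embed(query)
    rows = db.execute("SELECT doc, body, vec FROM chunks").fetchall()
    scored = [(cosine(q, Counter(json.loads(vec))), doc, body) for doc, body, vec in rows]
    return sorted(scored, reverse=True)[:top_k]


for score, doc, body in search("when do sessions expire"):
    print(f"{score:.2f}  {doc}: {body}")
```

The two session chunks rank first, which puts the conflicting claims side by side for review. Brute-force comparison like this is reasonable for a few hundred chunks; a vector extension becomes worthwhile only for larger corpora.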

If you want persistence, visualization, or Cypher queries, move to Level 2. But most projects never need to.

Level 2: Neo4j

  • Everything from Level 1, plus persistent graph storage, Cypher query language, and Neo4j Browser for visualization. The reference implementation in graph/arch_graph_reference.py targets this level. Useful when you want the graph to persist across sessions, when multiple tools query the same graph, or when you want visual exploration of the dependency structure.
  • Setup: Docker is the easiest path. A Docker Compose config for Neo4j will live in docker/ (coming soon).

Level 3: Full Stack (Neo4j + Vector Database)

  • Everything from Level 2, plus a vector database (Weaviate, Qdrant, Chroma, Pinecone, etc.) for semantic search at scale. Enables the meaning layer: consistency lens, contradiction detection, concept drift tracking, and spec-to-code faithfulness checks after code runs.
  • The orchestration question: At this level, an agent that knows how to query all three layers and synthesize the results adds significant value. It uses the graph to identify coupled domains, semantic search to compare what they say, and the dictionary to check how precisely they say it. Building that agent is project-specific -- the Forge methodology tells it what to look for, but the wiring depends on your stack.

Level 4: Institutional Memory System

  • Everything from Level 3, plus persistent cross-session memory, conversation capture, thrash detection, and the ability to trace decisions back to the discussions that produced them. This is the full realization of the Forge tooling vision -- not just validating the current spec, but maintaining the entire knowledge graph of how the project evolved, what was tried and rejected, and why.
  • This is not something most projects need. It's for large-scale, long-lived projects where the cost of lost institutional knowledge justifies the infrastructure. Most projects are well-served by Levels 0-2.

When Tooling Pays Off

  • 5 domains: Level 0-1 is sufficient. Manual checking still works.
  • 10 domains: Level 1-2 starts catching things you miss manually.
  • 15+ domains: Level 2+ is essential. Manual cascade checking is unreliable at this scale.
  • Cross-cutting changes: Graph tooling (any level) shows full blast radius instantly.
  • Multi-model validation: Semantic search helps reconcile findings across model runs.

The graph's value scales roughly with the square of the domain count. Adding a domain to N existing ones adds N potential pairwise dependencies, so N domains have N(N-1)/2 potential dependencies in total.
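To make the quadratic growth concrete, the short sketch below counts potential dependencies for the domain counts listed above. It assumes dependencies are counted as unordered pairs of domains; counting directed dependencies would double each figure. The function name is hypothetical.

```python
import math


def potential_dependencies(n):
    """Number of unordered domain pairs, n * (n - 1) / 2."""
    return math.comb(n, 2)


for n in (5, 10, 15):
    print(n, potential_dependencies(n))
```

This gives 10 potential dependencies at 5 domains, 45 at 10, and 105 at 15, which is why manual cascade checking stops being reliable as the domain count climbs.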