Releases: arthurfantaci/graphrag-api-db
v0.2.0
What's New
Community Summary Embeddings (#36)
- New `CommunityEmbedder` class embeds Community node summaries with Voyage AI `voyage-4` (1024d)
- `community_summary_embeddings` vector index enables semantic search over communities
- Idempotent: only embeds communities missing `summary_embedding`
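The idempotent pass can be sketched as follows. This is an illustration only, not the actual `CommunityEmbedder` API: the function and query names are hypothetical, `driver` stands for a Neo4j driver, and `embed` stands for a Voyage AI `voyage-4` call returning a 1024-dimension vector.

```python
# Hypothetical sketch: embed only communities missing summary_embedding.

FETCH_MISSING = """
MATCH (c:Community)
WHERE c.summary IS NOT NULL AND c.summary_embedding IS NULL
RETURN elementId(c) AS id, c.summary AS summary
"""

STORE = """
MATCH (c:Community) WHERE elementId(c) = $id
SET c.summary_embedding = $vector
"""

def embed_missing_communities(driver, embed):
    """Embed communities lacking summary_embedding; re-runs are no-ops."""
    with driver.session() as session:
        rows = session.run(FETCH_MISSING).data()
        for row in rows:
            session.run(STORE, id=row["id"], vector=embed(row["summary"]))
    return len(rows)  # communities embedded in this run
```

Because the fetch query filters on `summary_embedding IS NULL`, a second run finds nothing to do, which is what makes the operation idempotent.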
Gleaning Entity Label Fix (#37)
- Fixed `ExtractionGleaner._merge_gleaned_results()` MERGE pattern to include `__Entity__` and `__KGBuilder__` labels
- Gleaned entities are now visible to the entity resolver, cross-label dedup, and downstream queries
- Added `examples/backfill_entity_labels.py` utility for repairing existing data
- Added `examples/diagnose_concept_anomaly.py` diagnostic script
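The label bug can be illustrated generically. The Cypher below is not the project's actual merge query; it only shows why a node created without the shared labels stays invisible to queries that match on them.

```python
# Illustrative only, not the ExtractionGleaner code.

# Before the fix: the gleaned node gets only its specific type label,
# so MATCH (e:__Entity__) never sees it.
MERGE_BEFORE = """
MERGE (e:Concept {name: $name})
"""

# After the fix: the node also carries the shared labels, so the
# entity resolver and cross-label dedup can find it.
MERGE_AFTER = """
MERGE (e:Concept:__Entity__:__KGBuilder__ {name: $name})
"""
```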
Rename Jama-Prefixed Classes (#39)
- `JamaGuideScraper` → `GuideScraper`
- `JamaHTMLLoader` → `GuideHTMLLoader`
- `JamaKGPipelineConfig` → `KGPipelineConfig`
- `create_jama_kg_pipeline()` → `create_kg_pipeline()`
- User-Agent updated to `GuideScraper/0.1.0`
- All proper nouns (URLs, "Jama Software", "Jama Connect") preserved
Documentation Overhaul
- README.md comprehensively updated: architecture diagram with all 10 post-processing steps, 15 features, complete project structure (42 source files, 13 tests, 4 examples), full schema (18 node types, 16 relationship types), Voyage AI configuration, community search queries
- CLAUDE.md updated with CommunityEmbedder module, gleaning label requirement, renumbered modules
Full Changelog
v0.1.0 — Initial Release
The first release of the GraphRAG Knowledge Graph Pipeline — a complete end-to-end system that scrapes Jama Software's Essential Guide to Requirements Management and Traceability and loads it into a Neo4j knowledge graph using LLM-based entity extraction, vector embeddings, and community detection.
Highlights
- 5-stage pipeline: Scrape → Extract & Embed → Normalize → Supplement → Validate
- Schema-constrained extraction with 10 node types and 10 relationship types via `neo4j_graphrag`
- Voyage AI `voyage-4` embeddings (1024d) with automatic OpenAI fallback
- Leiden community detection with LLM-generated community summaries
- Industry taxonomy normalization consolidating 100+ variants into 18 canonical industries
- Comprehensive validation with pass/fail checks and idempotent repair operations
Added
- Async web scraping pipeline with `httpx` and optional Playwright for JS-rendered content
- Neo4j GraphRAG integration via `neo4j_graphrag.SimpleKGPipeline`
- LangChain `HTMLHeaderTextSplitter` for hierarchical document chunking
- Optional Chonkie `SemanticChunker` with Savitzky-Golay boundary detection
- Entity post-processing pipeline: normalize, deduplicate, cleanup, consolidate, backfill, summarize
- LangExtract augmentation with source grounding (text span provenance)
- Supplementary graph structure: Chapter, Resource (Image/Video/Webinar), and Glossary nodes
- CLI with `scrape` and `validate` subcommands, dry-run support, and cost estimation
- Pre-flight validation before pipeline ingestion
- CI/CD pipeline with linting (Ruff), type checking (ty), unit tests, and integration tests
- PEP 561 `py.typed` marker for downstream type-checking support
- Example scripts in `examples/` directory (knowledge graph querying)
- Contributing guide, Dependabot configuration, and README badges
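The hierarchical-chunking idea behind header-based splitting can be shown with a stdlib-only toy. This is not LangChain's `HTMLHeaderTextSplitter` implementation; it only demonstrates the principle of attaching the current header path to each chunk.

```python
from html.parser import HTMLParser

class HeaderChunker(HTMLParser):
    """Toy splitter: cut a new chunk at each h1/h2 and record the
    current header path as that chunk's metadata."""

    def __init__(self):
        super().__init__()
        self.chunks = []      # list of (header_path, text)
        self.path = {}        # e.g. {"h1": "...", "h2": "..."}
        self.in_header = None
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._flush()
            self.in_header = tag

    def handle_endtag(self, tag):
        if tag == self.in_header:
            text = "".join(self.buffer).strip()
            # h1 resets the path; h2 nests under the current h1.
            self.path = {"h1": text} if tag == "h1" else {**self.path, "h2": text}
            self.in_header, self.buffer = None, []

    def handle_data(self, data):
        self.buffer.append(data)

    def _flush(self):
        text = "".join(self.buffer).strip()
        if text and self.in_header is None:
            self.chunks.append((dict(self.path), text))
        self.buffer = []

def split_html(html):
    parser = HeaderChunker()
    parser.feed(html)
    parser._flush()  # emit trailing body text after the last header
    return parser.chunks
```

Each emitted chunk carries its `{"h1": ..., "h2": ...}` context, which is what makes header-aware chunks useful for retrieval: the hierarchy survives into chunk metadata.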
Changed
- Consolidated `models_core.py` into the `models` subpackage as `models/content.py` for structural consistency
- Unified `LLM_EXTRACTED_ENTITY_LABELS` as a single-source `frozenset` in `extraction/schema.py`
- Curated public API exports in the `postprocessing` and `validation` packages
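The single-source pattern amounts to declaring the label set once and having downstream code import it instead of redeclaring it. The labels below are illustrative, not the project's actual set:

```python
# extraction/schema.py-style single source of truth (labels are examples).
LLM_EXTRACTED_ENTITY_LABELS: frozenset[str] = frozenset({
    "Concept", "Process", "Role", "Tool",
})

def is_extracted_label(label: str) -> bool:
    """Downstream modules check membership rather than keeping their own copy."""
    return label in LLM_EXTRACTED_ENTITY_LABELS
```

Using a `frozenset` makes the shared constant hashable and immutable, so no consumer can drift the set out of sync by mutating it.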
Fixed
- Cypher double-WHERE syntax in relabel query (use WITH bridge)
- Chunk ordering property name (`chunk_index` → `index`)
- Voyage AI dimensions in `.env.example` (1536d → 1024d)
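The double-WHERE fix reflects a Cypher rule: each reading clause takes at most one `WHERE`, so a second filter must be bridged with `WITH`. A generic illustration follows; this is not the project's actual relabel query:

```python
# Invalid Cypher: two WHERE clauses attached to a single MATCH.
BAD = """
MATCH (n:Concept)
WHERE n.name IS NOT NULL
WHERE NOT n:__Entity__
SET n:__Entity__
"""

# Valid: WITH re-scopes the row so a second WHERE can follow.
GOOD = """
MATCH (n:Concept)
WHERE n.name IS NOT NULL
WITH n
WHERE NOT n:__Entity__
SET n:__Entity__
"""
```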