Releases: ziffan/ChunkLab

ChunkLab v0.2.0

02 May 10:41

Initial public release of ChunkLab — an interactive sandbox for text chunking experimentation, built for RAG pipelines.

Features

  • 7 chunking strategies: Fixed, Recursive, Token-aware, Sentence (pysbd), Sentence Indonesia, Legal Indonesia, Markdown Structure
  • Per-chunk quality metrics (Boundary Quality, Information Density, completeness)
  • Semantic retrieval simulation via intfloat/multilingual-e5-large (top-K: 5/10/25/50)
  • Comparison mode (side-by-side diff stats)
  • Multi-format export: JSON, JSONL, YAML
  • Token estimation: OpenAI, Ollama, LM Studio, OpenRouter, Gemini, Mock
  • ChunkLegend UI component explaining all per-chunk indicators
  • Docker Compose support (docker compose up --build)
  • Overlap visualization (amber/cyan)
  • Regex metadata extraction with capture groups
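
To make the overlap and regex-metadata features concrete, here is a minimal standalone sketch. It is not ChunkLab's implementation: the function names (`fixed_chunks`, `extract_metadata`), the chunk dictionary layout, and the sample pattern are all assumptions made for illustration. It shows fixed-size character chunking in which adjacent chunks share `overlap` characters, and metadata pulled from a chunk through named capture groups.

```python
import re


def fixed_chunks(text: str, size: int, overlap: int) -> list[dict]:
    """Split text into fixed-size character windows sharing `overlap` characters.

    Hypothetical helper for illustration; not part of ChunkLab's API.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append({"start": start, "end": end, "text": text[start:end]})
        if end == len(text):  # last window reached the end of the text
            break
    return chunks


def extract_metadata(chunk_text: str, pattern: str) -> dict:
    """Return the named capture groups of the first match, or {} if no match."""
    match = re.search(pattern, chunk_text)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    for chunk in fixed_chunks("abcdefghij", size=4, overlap=1):
        print(chunk)
    # Assumed example pattern: capture an article number from legal text.
    print(extract_metadata("Pasal 12 ayat (3)", r"Pasal (?P<article>\d+)"))
```

Each chunk after the first begins with the last `overlap` characters of its predecessor; that shared region is what an overlap visualization would highlight.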

Quickstart (Docker)

git clone https://github.com/ziffan/ChunkLab.git
cd ChunkLab
cp backend/.env.example backend/.env
docker compose up --build

UI available at http://localhost:80