Releases: ziffan/ChunkLab
ChunkLab v0.2.0
Initial public release of ChunkLab — an interactive sandbox for text chunking experimentation, built for RAG pipelines.
Features
- 7 chunking strategies: Fixed, Recursive, Token-aware, Sentence (pysbd), Sentence Indonesia, Legal Indonesia, Markdown Structure
- Per-chunk quality metrics (Boundary Quality, Information Density, Completeness)
- Semantic retrieval simulation via intfloat/multilingual-e5-large (top-K: 5/10/25/50)
- Comparison mode (side-by-side diff stats)
- Multi-format export: JSON, JSONL, YAML
- Token estimation: OpenAI, Ollama, LM Studio, OpenRouter, Gemini, Mock
- ChunkLegend UI component explaining all per-chunk indicators
- Docker Compose support (docker compose up --build)
- Overlap visualization (amber/cyan)
- Regex metadata extraction with capture groups
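To make the "Fixed" strategy and the overlap indicator concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is not ChunkLab's implementation: the release notes do not say whether Fixed counts characters or tokens, so this assumes characters, and the names `Chunk`, `fixed_chunks` and `overlap_with_previous` are hypothetical. The region returned by `overlap_with_previous` is the kind of span an overlap visualization would highlight.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset into the source text
    text: str


def fixed_chunks(text: str, size: int, overlap: int) -> list[Chunk]:
    """Split text into fixed-size character windows (hypothetical helper).

    Consecutive windows share `overlap` characters; the last window may be shorter.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(Chunk(len(chunks), start, end, text[start:end]))
        if end == len(text):  # reached the end; avoid a redundant tail window
            break
        start += step
    return chunks


def overlap_with_previous(chunks: list[Chunk], i: int) -> str:
    """Return the text chunk i shares with chunk i-1 (empty for the first chunk)."""
    if i == 0:
        return ""
    prev, cur = chunks[i - 1], chunks[i]
    shared = max(0, min(prev.end, cur.end) - cur.start)
    return cur.text[:shared]


if __name__ == "__main__":
    for c in fixed_chunks("abcdefghij", size=4, overlap=1):
        print(c.index, c.start, c.end, repr(c.text))
```

With size=4 and overlap=1 the windows are "abcd", "defg" and "ghij"; each later chunk repeats one character ("d", then "g") from its predecessor, which is exactly the overlap region.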
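Regex metadata extraction with capture groups can be pictured roughly as follows. This sketch assumes that each match yields one metadata record keyed by the pattern's named groups (or group1, group2, ... for unnamed ones); ChunkLab's actual output shape is not described in these notes, and `extract_metadata` is a hypothetical name. The example pattern targets Indonesian legal references ("Pasal" = article, "ayat" = paragraph) purely as an illustration, not as a pattern ChunkLab ships.

```python
from __future__ import annotations

import re


def extract_metadata(chunk_text: str, pattern: str) -> list[dict[str, str]]:
    """Return one dict per regex match in chunk_text (hypothetical helper).

    Named groups become keys; if the pattern has no named groups, unnamed groups
    are keyed group1, group2, ... Groups that did not participate are omitted,
    and a pattern with no groups yields an empty dict per match.
    """
    regex = re.compile(pattern)
    records: list[dict[str, str]] = []
    for match in regex.finditer(chunk_text):
        named = match.groupdict()
        if named:
            records.append({k: v for k, v in named.items() if v is not None})
        else:
            records.append(
                {f"group{i}": g for i, g in enumerate(match.groups(), start=1) if g is not None}
            )
    return records


if __name__ == "__main__":
    legal_ref = r"Pasal (?P<pasal>\d+)(?: ayat \((?P<ayat>\d+)\))?"
    print(extract_metadata("Pasal 1 ayat (2) dan Pasal 5", legal_ref))
    print(extract_metadata("release v0.2.0", r"v(\d+)\.(\d+)\.(\d+)"))
```

Dropping groups that did not match (such as the missing "ayat" in "Pasal 5") keeps records compact, at the cost of records within one chunk not always sharing the same keys.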
Quickstart (Docker)
git clone https://github.com/ziffan/ChunkLab.git
cd ChunkLab
cp backend/.env.example backend/.env
docker compose up --build
UI available at http://localhost:80