Skip to content

Fix/pr 375 chunking import#383

Open
ASuresh0524 wants to merge 4 commits into
mainfrom
fix/pr-375-chunking-import
Open

Fix/pr 375 chunking import#383
ASuresh0524 wants to merge 4 commits into
mainfrom
fix/pr-375-chunking-import

Conversation

@ASuresh0524

Copy link
Copy Markdown
Collaborator

What does this PR do?

Related Issues

Fixes #

Checklist

  • Tests pass (uv run pytest)
  • Code formatted (ruff format and ruff check)
  • Pre-commit hooks pass (pre-commit run --all-files)

luojiyin1987 and others added 4 commits June 13, 2026 17:43
- Add max_tokens_per_chunk param to create_traditional_chunks() and
  create_text_chunks() — auto-scales chunk_size using existing
  calculate_safe_chunk_size() and validates with validate_chunk_token_limits()
- Also scales AST chunk_size when token limit is given
- Revalidate chunk_overlap after scaling to avoid SentenceSplitter errors
- Add BaseRAGExample._resolve_chunk_token_limit() helper to resolve the
  embedding model's token limit from CLI args
- build_index() now warns if existing chunks exceed the model's token limit

Backward compat: default max_tokens_per_chunk=None (no-op, identical behavior)
- Delete _traditional_chunks_as_dicts() — it was a pure alias for
  create_traditional_chunks().  Replace all 6 call sites with direct calls.
- Extract _parse_ast_chunk_output() — normalizes the 4 different chunk
  output shapes (object, str, dict with content, dict with text) into a
  uniform (text, metadata) tuple.  Removes 20 lines of inline dispatch
  from create_ast_chunks().
- Net: +28/−35 lines, no behaviour change
PR #375 renamed the helper to create_traditional_chunks but apps/chunking
still imports the old name, breaking test collection on all platforms.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants