Support multi-field TSV parsing in collection loader by mushkanrana73 · Pull Request #40 · JuliaGenAI/ColBERT.jl

mushkanrana73 · 2026-03-22T03:55:29Z

The current loader reads TSV lines without parsing them, so the document ID and any additional fields (e.g., titles) end up in the document text as raw, tab-separated content.

This PR updates the logic to split each line on tabs, drop the leading document ID, and join the remaining fields with a space.

Example:
doc_id \t title \t body → "title body"
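The parsing rule in the example can be sketched as follows. This is a minimal Python illustration, not the Julia code from this PR. The helper names `parse_collection_line` and `load_collection` are hypothetical. The sketch assumes that the first tab-separated field is always the document ID, and that a line with no tab at all is kept unchanged as document text.

```python
def parse_collection_line(line: str) -> str:
    """Hypothetical helper: turn one TSV collection line into document text.

    Assumes the first tab-separated field is the document ID and that all
    remaining fields (e.g. title, body) are joined with a single space.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        # Assumption: a line without tabs has no ID column, so keep it as-is.
        return fields[0]
    # Drop the document ID (fields[0]) and join everything after it.
    return " ".join(fields[1:])


def load_collection(path: str) -> list[str]:
    """Hypothetical loader: one document string per non-empty line."""
    docs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                docs.append(parse_collection_line(line))
    return docs
```

For example, `parse_collection_line("42\tMy Title\tSome body text\n")` returns `"My Title Some body text"`, whereas reading the same line without parsing would keep the ID `42` and both tab characters in the document text.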

This improves compatibility with multi-field datasets and aligns with Python ColBERT behavior.

Commit 6b527e5: Support multi-field TSV parsing in collection loader

mushkanrana73 requested a review from codetalker7 as a code owner March 22, 2026 03:55
