@sudhanshu112233shukla sudhanshu112233shukla commented Jan 20, 2026

Summary

Fixes sourcebot-dev/sourcebot#575
This PR adds support for indexing binary files (files containing null bytes), which are currently skipped by default.

Problem

Currently, IndexBuilder checks each file with DocChecker, and any file containing a null byte (0x00) is marked as "binary" and skipped. This prevents users from indexing and searching content in binary-like formats (e.g., PDFs, binary-encoded text), even when they explicitly want to.

Solution

  • Added AllowBinary field to IndexBuilder and build.Options.
  • Updated DocChecker.Check to accept an allowBinary boolean.
  • Modified IndexBuilder.Add to bypass the null-byte check when AllowBinary is true.

Verification

  • Added new test cases in index_test.go to verify that files with null bytes are accepted when the flag is enabled.

@sudhanshu112233shukla
Author

The Semgrep - SAST Scan check is failing because the workflow on main checks for a secret (GH_SEMGREP_SAST_TOKEN) that is missing in this repository. Since this is a pull_request_target workflow, it runs the version from main, so I cannot fix or disable it in this PR. This failure is unrelated to my changes.

Development

Successfully merging this pull request may close these issues.

[FR] Support zoekt indexing of binary files

1 participant