Full text search optimizations#30

Open
Ankur Goyal (ankrgyl) wants to merge 10 commits into 0.22.0-tweaks from more-debugging

Ankur Goyal (ankrgyl) commented May 2, 2026

The goal is to reduce the work, specifically I/O, required to prove that a phrase (which may have many terms of varying frequency) cannot match a set of documents. In effect, this optimizes for needle-in-a-haystack phrases.

  • Look up term metadata before loading positional postings, so missing terms fail fast.
  • Probe likely-rarer terms first, with an adaptive per-query ordering based on observed misses.
  • Cache duplicate term metadata lookups within a phrase.
  • For long exact phrases, run a cheap two-term positional preflight using the lowest-cost term pair.
  • Reuse preflight postings when building the final phrase scorer.
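
To make the flow above concrete, here is a minimal sketch in Python rather than the code in this PR. `ToyIndex`, `doc_freq`, `load_postings`, `pair_matches`, and `phrase_search` are hypothetical names over a toy in-memory index. The sketch assumes document frequency is the cost measure for picking the preflight pair, and it does not model the adaptive probe ordering.

```python
from typing import Dict, List, Optional, Set

# Hypothetical postings shape: doc_id -> sorted term positions.
Postings = Dict[int, List[int]]


class ToyIndex:
    """Toy in-memory index; postings_loads counts the 'expensive' reads."""

    def __init__(self, docs: List[str]) -> None:
        self.postings: Dict[str, Postings] = {}
        self.postings_loads = 0
        for doc_id, text in enumerate(docs):
            for pos, term in enumerate(text.split()):
                self.postings.setdefault(term, {}).setdefault(doc_id, []).append(pos)

    def doc_freq(self, term: str) -> Optional[int]:
        """Cheap metadata lookup: document count for the term, or None if absent."""
        p = self.postings.get(term)
        return len(p) if p is not None else None

    def load_postings(self, term: str) -> Postings:
        """Stand-in for reading positional postings from disk."""
        self.postings_loads += 1
        return self.postings[term]


def pair_matches(a: Postings, b: Postings, offset: int) -> Set[int]:
    """Docs where some position of b equals a position of a plus offset."""
    hits: Set[int] = set()
    for doc_id in a.keys() & b.keys():
        b_pos = set(b[doc_id])
        if any(p + offset in b_pos for p in a[doc_id]):
            hits.add(doc_id)
    return hits


def phrase_search(index: ToyIndex, phrase: str, preflight_min_terms: int = 3) -> List[int]:
    terms = phrase.split()
    if not terms:
        return []

    # Metadata first, one lookup per distinct term; a missing term fails fast
    # before any positional postings are loaded.
    meta: Dict[str, int] = {}
    for term in dict.fromkeys(terms):
        df = index.doc_freq(term)
        if df is None:
            return []
        meta[term] = df

    loaded: Dict[str, Postings] = {}

    def get(term: str) -> Postings:
        # Postings loaded for the preflight are reused by the final check.
        if term not in loaded:
            loaded[term] = index.load_postings(term)
        return loaded[term]

    candidates: Optional[Set[int]] = None
    if len(terms) >= preflight_min_terms:
        # Two-term preflight on the pair of phrase slots with the lowest doc frequency.
        i, j = sorted(sorted(range(len(terms)), key=lambda k: meta[terms[k]])[:2])
        candidates = pair_matches(get(terms[i]), get(terms[j]), j - i)
        if not candidates:
            return []

    # Full positional check, restricted to docs that survived the preflight.
    first = get(terms[0])
    docs = candidates if candidates is not None else set(first)
    result = []
    for doc_id in sorted(docs):
        starts = first.get(doc_id, [])
        if any(all(s + k in get(t).get(doc_id, ()) for k, t in enumerate(terms)) for s in starts):
            result.append(doc_id)
    return result
```

In this sketch, a phrase containing an unknown term returns without loading any postings, and a long phrase whose two rarest terms never sit at the right distance is rejected after loading only those two postings lists.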

