Add more notes about text index #5630

Open
Ergus wants to merge 5 commits into ClickHouse:main from Ergus:text_index

@Ergus Ergus commented Mar 2, 2026

Summary

Checklist

Add a few pending notes recommending the `text` index over `ngrambf_v1` and `tokenbf_v1`.

@Ergus Ergus requested a review from a team as a code owner March 2, 2026 12:35

vercel bot commented Mar 2, 2026

@Ergus is attempting to deploy a commit to the ClickHouse Team on Vercel.

A member of the Team first needs to authorize it.

@Ergus Ergus requested a review from rschu1ze March 2, 2026 12:36
> Note: With text indexes generally availability (GA) starting from ClickHouse version 26.2, bloom filter–based indexes are not recommended anymore for full text search.
Although they are more compact, unfortunately they tend to produce false positives because they are probabilistic.
Furthermore, they offer limited configurability.
> Note: With `text` indexes generally availability (GA) starting from ClickHouse version 26.2, `ngrambf_v1` and `tokenbf_v1` indexes are NOT recommended anymore for full text search.
Member

Suggested change
> Note: With `text` indexes generally availability (GA) starting from ClickHouse version 26.2, `ngrambf_v1` and `tokenbf_v1` indexes are NOT recommended anymore for full text search.
:::note
With general availability (GA) of the `text` index starting from ClickHouse version 26.2, `tokenbf_v1` and `ngrambf_v1` indexes are no longer recommended for full text search.
See page ["Full-text search with text indexes"](/engines/table-engines/mergetree-family/textindexes.md) for details.
:::

> Although they are more compact, unfortunately they tend to produce false positives because they are probabilistic.
> Furthermore, they offer limited configurability.
>
> The `text` index provides a true inverted index with better search performance, more predictable behavior, and greater flexibility and performance compared with token-based Bloom filter indexes.
Member

What does "more predictable behavior" mean here? Do you mean the text index is deterministic, so there are no false positives?

Member Author

yes

Member

@ahmadov ahmadov Mar 3, 2026

Maybe just remove that part: the text index is not based on a probabilistic data structure, so "predictable behavior" does not really apply. But it's up to you.


vercel bot commented Mar 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| clickhouse-docs | Ready | Preview | Mar 3, 2026 10:53am |


> Note: With text indexes generally availability (GA) starting from ClickHouse version 26.2, bloom filter–based indexes are not recommended anymore for full text search.
Although they are more compact, unfortunately they tend to produce false positives because they are probabilistic.
:::note
Member

Also here, we should keep it short and sweet.

L. 113 is fine.

Instead of l. 115-121, we can just say

:::note
The usage of `ngrambf_v1` indexes for full-text search is deprecated in ClickHouse versions >= 26.2 in favor of `text` indexes (see here for further details).
:::

here is a link to the text index docs.

(same below for tokenbf_v1)


4 participants