Add force_filter_selections to restore pushdown_filters behavior prior to parquet 57.1.0 upgrade#19003
Conversation
| /// pushdown_filters is enabled. If false, the reader will automatically | ||
| /// choose between a RowSelection and a Bitmap based on the number and | ||
| /// pattern of selected rows. | ||
| pub force_filter_selections: bool, default = false |
There was a problem hiding this comment.
@rluvaton suggests in #18820 (comment):
What do you think of making it an enum instead to allow for future additions without breaking changes?
(that enum should also be non exhaustive to avoid adding a variant a breaking change)
I also see that the
with_row_selection_policyalready accept enum.making it an enum also allow to force mask or configure the threshold in the auto policy. this is also useful for testing to force specific path when creating a reproduction test for a bug
I personally think it is better as a flag (escape valve) as I don't forsee any reason to try and tune the parameters but would be happy to hear other opinions
6ae5d68 to
e5ef31f
Compare
| // reads more than necessary from the cache as then another bitmap is applied | ||
| // See https://github.com/apache/datafusion/pull/18820 for setting and workaround | ||
| expected_records: 7, | ||
| expected_records: 7, // reads more than necessary from the cache as then another bitmap is applied |
There was a problem hiding this comment.
this test shows the change in behavior after the parquet 57.1.0 upgrade. The previous result with 57.0.0 was 4
The old result can now be obtained by setting force_filter_selections to true
force_filter_selections to restore pushdown_filters behaviorforce_filter_selections to restore pushdown_filters behavior prior to parquet 57.1.0 upgrade
|
Thank you @rluvaton -- I plan to merge this PR tomorrow unless anyone would like more time to review |
Draft until #18820 is mergedWhich issue does this PR close?
arrow,parquetto57.1.0#18820Rationale for this change
The parquet 57.1.0 upgrade includes a new adaptive filter from @hhhizzz :
Our testing shows this is faster in all cases, but I want to have an escape valve for people to turn it off if they hit some issue.
I had originally included this in #18820 but @rluvaton suggested it would be easier to understand as its own PR in #18820 (review)
What changes are included in this PR?
force_filter_selectionsconfig settingAre these changes tested?
Yes
Are there any user-facing changes?
A new boolean flag