Skip to content

Fix assertion failures in assert_tpch_result_equal due to float sort ambiguity#22378

Open
Matt711 wants to merge 2 commits intorapidsai:mainfrom
Matt711:bug/pdsds/q64-validation
Open

Fix assertion failures in assert_tpch_result_equal due to float sort ambiguity#22378
Matt711 wants to merge 2 commits intorapidsai:mainfrom
Matt711:bug/pdsds/q64-validation

Conversation

@Matt711
Copy link
Copy Markdown
Contributor

@Matt711 Matt711 commented May 5, 2026

Description

When comparing results, sorting by non-float columns alone can leave rows with equal non-float keys in an arbitrary order, causing assert_frame_equal to fail on valid results. This PR retries the comparison using float columns as a secondary sort key before raising a validation error.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Matt711 Matt711 requested a review from a team as a code owner May 5, 2026 13:43
@Matt711 Matt711 requested a review from vyasr May 5, 2026 13:43
@Matt711 Matt711 added bug Something isn't working non-breaking Non-breaking change labels May 5, 2026
@github-actions github-actions Bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels May 5, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 5, 2026
@TomAugspurger
Copy link
Copy Markdown
Contributor

TomAugspurger commented May 5, 2026

So my original thinking here was that sorting on the float columns should be unnecessary. Suppose you have a sequence of (key, value) pairs, sorted by key:

Key, Value
0, x
0, x-e
0, x+e

Assuming e (or more precisely, 2e) is negligible, then any permutation of those rows should be considered equal to any other. And so my hope was that we could sort by Key and then validate with

  1. assert_frame_equal on Key without any abs_tol
  2. assert_frame_equal on all the columns with abs_tol

we'd correctly implement that logic. It would pass as long as abs_tol >= 2e, and fail otherwise (on the second stage).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cudf-polars Issues specific to cudf-polars non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[BUG] Validate TPC-DS Q64

3 participants