Implement min_count for groupby reductions #22372

Open
galipremsagar wants to merge 1 commit into rapidsai:pandas3 from galipremsagar:groupby_min_count

@galipremsagar galipremsagar commented May 4, 2026

Summary

Split out from #22289. GroupBy._reduce previously raised NotImplementedError whenever min_count != 0, forcing cudf.pandas to fall back to the slow path for groupby.sum(min_count=...) and similar calls.

Implementation (python/cudf/cudf/core/groupby/groupby.py)

Run the requested aggregation, then mask result rows whose per-group non-null count (computed via self.agg("count")) is below min_count. Supports both Series and DataFrame results.
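
The count-then-mask approach described above can be sketched in plain pandas. This is an illustration only, not the code in groupby.py: the helper name `reduce_with_min_count` is hypothetical, and pandas stands in for cuDF so the snippet runs without a GPU.

```python
import pandas as pd


def reduce_with_min_count(grouped, op, min_count=0):
    """Hypothetical helper: aggregate, then null out under-populated groups."""
    result = grouped.agg(op)
    if min_count <= 0:
        # Nothing to mask: every group trivially has at least 0 valid values.
        return result
    # count() gives the number of non-null values per group (and per column).
    counts = grouped.count()
    # Keep a result only where its group has enough valid values.
    return result.where(counts >= min_count)


df = pd.DataFrame(
    {"key": ["a", "a", "b", "c"], "val": [1.0, 2.0, 3.0, None]}
)

# Group "a" has two valid values; "b" has one and "c" has none,
# so with min_count=2 only "a" keeps its sum.
masked = reduce_with_min_count(df.groupby("key"), "sum", min_count=2)
```

Because the mask is built from a second `count` aggregation with the same grouping, it lines up with the result by index and column, which is what lets the same logic serve both Series and DataFrame results.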

Tests

python/cudf/cudf/tests/groupby/test_reductions.py:

  • test_groupby_reduce_min_count over sum, min, max, first, last for min_count values 0, 1, 2, 3, 5.
  • test_groupby_series_reduce_min_count for Series.groupby paths.
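
The coverage listed above amounts to a sweep over reductions and min_count values, checked against pandas as the reference. A rough sketch of that shape follows. It is not the contents of test_reductions.py: `count_then_mask` is a hypothetical stand-in for the cuDF code path, the input data is made up, and a plain loop replaces whatever parametrization the real tests use.

```python
import itertools

import pandas as pd

OPS = ["sum", "min", "max", "first", "last"]
MIN_COUNTS = [0, 1, 2, 3, 5]


def count_then_mask(grouped, op, min_count):
    # Hypothetical stand-in for the implementation under test.
    result = grouped.agg(op)
    if min_count <= 0:
        return result
    return result.where(grouped.count() >= min_count)


# Groups with 2, 1, and 0 non-null values, so each min_count
# threshold masks a different subset of groups.
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b", "c"],
        "val": [1.0, None, 3.0, 4.0, None, None],
    }
)

checked = 0
for op, min_count in itertools.product(OPS, MIN_COUNTS):
    # DataFrame.groupby path
    frame_groups = df.groupby("key")
    expected = getattr(frame_groups, op)(min_count=min_count)
    pd.testing.assert_frame_equal(
        count_then_mask(frame_groups, op, min_count), expected
    )

    # Series.groupby path
    series_groups = df.groupby("key")["val"]
    expected = getattr(series_groups, op)(min_count=min_count)
    pd.testing.assert_series_equal(
        count_then_mask(series_groups, op, min_count), expected
    )
    checked += 1
```

Including a value such as 5, larger than any group's size, exercises the case where every group is masked.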

Relationship to #22289

This is one of the four split PRs requested in the review on #22289. It removes no conftest entries: the existing pandas-tests entries that fail with min_count errors also need the other split PRs (string sum, bool any/all, grouping-key exclusion) before they can be unmarked.

``GroupBy._reduce`` previously raised ``NotImplementedError`` when
``min_count != 0``, forcing cudf.pandas to fall back to the slow path.
Now run the requested aggregation, then mask result rows whose
per-group non-null count is below ``min_count`` (computed via
``self.agg("count")``). Supports both ``Series`` and ``DataFrame``
results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@galipremsagar galipremsagar requested a review from a team as a code owner May 4, 2026 20:06
@galipremsagar galipremsagar requested review from Matt711 and mroeschke and removed request for a team May 4, 2026 20:06

copy-pr-bot Bot commented May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the Python (Affects Python cuDF API) label May 4, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 4, 2026
@galipremsagar galipremsagar added the bug (Something isn't working), non-breaking (Non-breaking change), and 3 - Ready for Review (Ready for review by team) labels May 4, 2026
@galipremsagar
Contributor Author

/okay to test 5f3e8ac

Labels

3 - Ready for Review (Ready for review by team), bug (Something isn't working), non-breaking (Non-breaking change), Python (Affects Python cuDF API)

Projects

Status: In Progress

2 participants