perf(ptv3): simplify SerializedPooling export clustering#206

Draft: mojomex wants to merge 2 commits into main from ptv3-serialized-pooling-pr
mojomex (Collaborator) commented May 5, 2026

Summary

This changes the PTv3 SerializedPooling export path to derive pooling clusters directly from the existing serialized order instead of using the export-only unique path.

It also adds a focused module test that checks export mode against train mode for SerializedPooling.

What changed

  • remove the export-path dependency on unique in SerializedPooling
  • build cluster, indices, and idx_ptr by scanning pooled-code boundaries along the existing serialized order
  • keep the existing pooled-order/inverse behavior unchanged after clustering
  • add projects/PTv3/tests/test_serialized_pooling.py to verify export-mode and train-time outputs match for the changed module
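
For reviewers, here is a minimal pure-Python sketch of the boundary-scan idea described above. It is not the code in this PR: the function name `clusters_from_serialized_order` and the list-based inputs are illustrative assumptions, and the real module works on tensors. It assumes `order` already sorts points by pooled code, so equal codes are contiguous and a new cluster starts wherever the code changes.

```python
def clusters_from_serialized_order(pooled_code, order):
    """Hypothetical helper: derive cluster, indices, idx_ptr without a unique call.

    pooled_code[i] is the pooled code of point i (original point order).
    order is a permutation that visits points in non-decreasing pooled code.
    """
    n = len(order)
    cluster = [0] * n   # cluster id of each point, in original point order
    idx_ptr = []        # start offset of each cluster within `indices`
    cluster_id = -1
    prev_code = None
    for pos, point in enumerate(order):
        code = pooled_code[point]
        # A boundary is the first position, or any position where the code changes.
        if pos == 0 or code != prev_code:
            cluster_id += 1
            idx_ptr.append(pos)
        cluster[point] = cluster_id
        prev_code = code
    idx_ptr.append(n)       # closing offset, so cluster k spans idx_ptr[k]:idx_ptr[k+1]
    indices = list(order)   # the serialized order already groups points by cluster
    return cluster, indices, idx_ptr


# Toy input: five points, with an order that sorts them by pooled code.
pooled_code = [5, 2, 5, 9, 2]
order = [1, 4, 0, 2, 3]          # codes along this order: 2, 2, 5, 5, 9
cluster, indices, idx_ptr = clusters_from_serialized_order(pooled_code, order)
# cluster == [1, 0, 1, 2, 0], indices == [1, 4, 0, 2, 3], idx_ptr == [0, 2, 4, 5]
```

On this toy input the cluster ids match what a sorted-unique inverse would give (unique codes 2, 5, 9 map to ids 0, 1, 2), which is the equivalence the new module test is meant to check at the module level.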

Validation

  • pytest -q projects/PTv3/tests/test_serialized_pooling.py

Benchmark context

This exact SerializedPooling change was benchmarked separately on an inference box. That rerun measured GPU total replicate-mean latency at 24.868 ms versus the original 26.937 ms baseline, a 2.069 ms (7.68%) improvement.
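
As a quick sanity check of the figures quoted above, the improvement can be recomputed from the two latencies. The variable names are illustrative only; the numbers are the ones stated in this description.

```python
baseline_ms = 26.937   # original GPU total replicate-mean latency
new_ms = 24.868        # latency measured with this change

# Absolute saving in milliseconds, and saving relative to the baseline.
improvement_ms = round(baseline_ms - new_ms, 3)
improvement_pct = round((baseline_ms - new_ms) / baseline_ms * 100, 2)

print(improvement_ms, improvement_pct)
```

The percentage is taken relative to the baseline latency, not the new one.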

The benchmark/profile artifacts are intentionally not part of this PR.

@mojomex mojomex changed the title ptv3: simplify SerializedPooling export clustering perf(ptv3): simplify SerializedPooling export clustering May 5, 2026
@mojomex mojomex self-assigned this May 6, 2026