Conversation

@RocMarshal
Contributor

@RocMarshal RocMarshal commented Jan 20, 2026

What is the purpose of the change

[FLINK-38943][runtime] Support Adaptive Partition Selection for RescalePartitioner and RebalancePartitioner

Brief change log

Introduce the following:

  • config options
    • taskmanager.network.adaptive-partitioner.enabled
    • taskmanager.network.adaptive-partitioner.max-traverse-size
  • AdaptiveLoadBasedRecordWriter.java for adaptive partition selection
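
For readers skimming the change log, the selection idea can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name `AdaptiveChannelSelectorSketch`, the `backlogOf` load metric, and the early exit on an idle channel are all assumptions made for the example. Only the clamp of `maxTraverseSize` to the number of subpartitions and the initial channel of -1 mirror the constructor shown in the diff.

```java
import java.util.function.IntUnaryOperator;

/** Illustrative only; not the AdaptiveLoadBasedRecordWriter from this PR. */
final class AdaptiveChannelSelectorSketch {
    private final int numberOfSubpartitions;
    private final int maxTraverseSize;
    /** Hypothetical load metric: returns the pending backlog of a channel. */
    private final IntUnaryOperator backlogOf;
    private int currentChannel = -1;

    AdaptiveChannelSelectorSketch(
            int numberOfSubpartitions, int maxTraverseSize, IntUnaryOperator backlogOf) {
        if (numberOfSubpartitions < 1 || maxTraverseSize < 1) {
            throw new IllegalArgumentException("Both sizes must be at least 1.");
        }
        this.numberOfSubpartitions = numberOfSubpartitions;
        // Never inspect more channels than exist.
        this.maxTraverseSize = Math.min(maxTraverseSize, numberOfSubpartitions);
        this.backlogOf = backlogOf;
    }

    /** Picks the least-loaded channel among the next maxTraverseSize channels. */
    int selectChannel() {
        int start = (currentChannel + 1) % numberOfSubpartitions;
        int best = start;
        int bestLoad = backlogOf.applyAsInt(start);
        // Assumption: stop early once an idle channel (backlog 0) has been found.
        for (int i = 1; i < maxTraverseSize && bestLoad > 0; i++) {
            int candidate = (start + i) % numberOfSubpartitions;
            int load = backlogOf.applyAsInt(candidate);
            if (load < bestLoad) {
                best = candidate;
                bestLoad = load;
            }
        }
        currentChannel = best;
        return best;
    }

    public static void main(String[] args) {
        int[] backlog = {5, 3, 0, 7, 1, 9};
        AdaptiveChannelSelectorSketch selector =
                new AdaptiveChannelSelectorSketch(backlog.length, 4, ch -> backlog[ch]);
        for (int i = 0; i < 3; i++) {
            System.out.println("selected channel " + selector.selectChannel());
        }
    }
}
```

Bounding the scan with `maxTraverseSize` keeps the per-record cost constant while still letting the writer skip past a few busy channels.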

Verifying this change

This change added tests and can be verified as follows:

  • AdaptiveLoadBasedRecordWriterTest.java

The benchmark for this change is here

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Jan 20, 2026

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@RocMarshal
Contributor Author

Hi @davidradl @X-czh, could you help take a look? Thanks a lot.

@RocMarshal
Contributor Author

@flinkbot run azure

@X-czh
Contributor

X-czh commented Jan 20, 2026

@RocMarshal Thanks for the quick contribution. I'll take a look later this week.

<td><h5>taskmanager.network.adaptive-partitioner.enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether to enabled adaptive partitioner feature for rescale and rebalance partitioners based on the loading of the downstream tasks.</td>
Contributor

nit: enabled => enable

Contributor Author

Thanks, @davidradl.
The naming follows the conclusion here

Contributor

@RocMarshal understood, but the English is not correct.

Contributor Author

Hi @davidradl,
Apologies for the misunderstanding, and thank you for the correction.
Updated.

<td><h5>taskmanager.network.adaptive-partitioner.max-traverse-size</h5></td>
<td style="word-wrap: break-word;">4</td>
<td>Integer</td>
<td>How many channels to traverse at most when looking for the idlest channel for rescale and rebalance partitioners when enabled <code class="highlighter-rouge">taskmanager.network.adaptive-partitioner.enabled</code>.</td>
Contributor

Should document the default.
nit: "How many channels to traverse at most" -> "Maximum number of channels to traverse"

Contributor Author

Line 19 is the default value description.

.withDescription(
        Description.builder()
                .text(
                        "How many channels to traverse at most when looking for the idlest channel for rescale and rebalance partitioners when enabled %s.",
Contributor

nit: "when enabled %s." -> "when %s is enabled."

super(writer, timeout, taskName);
this.numberOfSubpartitions = writer.getNumberOfSubpartitions();
this.maxTraverseSize = Math.min(maxTraverseSize, numberOfSubpartitions);
this.currentChannel = -1;
Contributor

nit: does this need to be in the constructor? Why not initialise the field with -1 at its declaration?


public RecordWriterBuilder<T> setMaxTraverseSize(int maxTraverseSize) {
    Preconditions.checkState(
            maxTraverseSize > 1, "The maxTraverseSize must be greater than 1.");
Contributor

I am curious why it cannot be one.

Contributor Author

IIUC, if it is set to 1, the writer only ever inspects the next channel, which is equivalent to not enabling the adaptive partition selection feature.

Actively enabling an option that does not provide even a theoretical benefit is generally not acceptable.

The only exception is when the number of downstream partitions is exactly 1.

Please correct me if I am wrong.
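
The equivalence can be checked with a small stand-alone sketch. The `pick` helper and the `load` array below are hypothetical and only model the idea of scanning a bounded window of channels after the previously used one; they are not taken from the PR.

```java
/** Illustrative only: compares a traverse window of 1 with a window of 2. */
final class TraverseSizeOneSketch {

    /**
     * Returns the least-loaded channel among the {@code window} channels that
     * follow {@code previous}, wrapping around at the end of the array.
     */
    static int pick(int previous, int[] load, int window) {
        int n = load.length;
        int best = (previous + 1) % n;
        for (int i = 1; i < Math.min(window, n); i++) {
            int candidate = (previous + 1 + i) % n;
            if (load[candidate] < load[best]) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] load = {9, 0, 9, 0}; // hypothetical backlog per channel
        for (int window = 1; window <= 2; window++) {
            StringBuilder picks = new StringBuilder();
            int previous = -1;
            for (int k = 0; k < 4; k++) {
                previous = pick(previous, load, window);
                picks.append(previous).append(' ');
            }
            System.out.println("window=" + window + ": " + picks.toString().trim());
        }
    }
}
```

With a window of 1 the loop body never runs, so the picks are plain round-robin (0, 1, 2, 3) regardless of load. With a window of 2 the same input alternates between the two idle channels, 1 and 3.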


public RecordWriterBuilder<T> setMaxTraverseSize(int maxTraverseSize) {
    Preconditions.checkState(
            maxTraverseSize > 1, "The maxTraverseSize must be greater than 1.");
Contributor

Is there a better place to check this, e.g. in configuration validation rather than in the setter?

Contributor Author

@RocMarshal RocMarshal Jan 21, 2026

Moved it before the RecordWriter construction.
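
As a rough illustration of checking the value up front, the sketch below reads the two options from a plain `Map` and rejects an invalid traverse size before any writer would be built. It does not use Flink's `Configuration` or `Preconditions` classes; `AdaptivePartitionerOptionsSketch` and `fromMap` are hypothetical names, and validating only when the feature is enabled is an assumption. The option keys and the defaults (`false` and `4`) are the ones shown in the generated docs table.

```java
import java.util.Map;

/** Illustrative only: validates the options before a writer is constructed. */
final class AdaptivePartitionerOptionsSketch {
    static final String ENABLED_KEY =
            "taskmanager.network.adaptive-partitioner.enabled";
    static final String MAX_TRAVERSE_SIZE_KEY =
            "taskmanager.network.adaptive-partitioner.max-traverse-size";

    final boolean enabled;
    final int maxTraverseSize;

    private AdaptivePartitionerOptionsSketch(boolean enabled, int maxTraverseSize) {
        this.enabled = enabled;
        this.maxTraverseSize = maxTraverseSize;
    }

    /** Parses both options, applying the documented defaults when a key is absent. */
    static AdaptivePartitionerOptionsSketch fromMap(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
        int maxTraverseSize =
                Integer.parseInt(conf.getOrDefault(MAX_TRAVERSE_SIZE_KEY, "4"));
        // Assumption: the bound only matters when the feature is switched on.
        if (enabled && maxTraverseSize <= 1) {
            throw new IllegalArgumentException(
                    MAX_TRAVERSE_SIZE_KEY + " must be greater than 1, but was "
                            + maxTraverseSize);
        }
        return new AdaptivePartitionerOptionsSketch(enabled, maxTraverseSize);
    }

    public static void main(String[] args) {
        AdaptivePartitionerOptionsSketch options =
                fromMap(Map.of(ENABLED_KEY, "true", MAX_TRAVERSE_SIZE_KEY, "8"));
        System.out.println(options.enabled + " " + options.maxTraverseSize);
    }
}
```

Failing at configuration time surfaces the misconfiguration with the option key in the message, instead of as a state check deep inside the builder.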

Contributor Author

@RocMarshal RocMarshal left a comment

Thanks @davidradl for the review.
I updated the related lines based on your comments.
PTAL ~

</tr>
<tr>
<td><h5>taskmanager.network.adaptive-partitioner.max-traverse-size</h5></td>
<td style="word-wrap: break-word;">4</td>
Contributor Author

This line is the default value description.

@github-actions bot added the community-reviewed label (PR has been reviewed by the community.) Jan 21, 2026
@RocMarshal RocMarshal requested a review from davidradl January 22, 2026 12:36

Labels

community-reviewed PR has been reviewed by the community.

4 participants