Skip to content

Conversation

@rkhachatryan
Copy link
Contributor

No description provided.

@flinkbot
Copy link
Collaborator

flinkbot commented Jan 19, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@rkhachatryan rkhachatryan marked this pull request as ready for review January 19, 2026 12:55
<td><h5>execution.checkpointing.pause-checkpoints-if-tasks-not-running</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>When enabled, checkpoints will be paused as long as some tasks are not running (and not finished), even if the job status is running. This help to prevent an additional delay on job startup</td>
Copy link
Contributor

@davidradl davidradl Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am struggling with these sentences.

  • nit it would be simpler to say if any tasks rather than as long as some tasks
  • What does it mean to say a task is not running, but has a status of running? I think we need to talk about non blocking output, maybe more words outside of the config definition to say how to recognise non blocking jobs - as this relates to the subset of jobs we are effecting with this option.
  • nit: This help -> This helps
  • the text says "prevent an additional delay on job startup". But the logic does not seem startup specific? Can we get into any other non startup related scenarios where this might be helpful
  • Can we guide the user as to when they should consider using this option and any downside to using it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this PR with @1996fanrui offline and agreed to remove the option because the new behavior is the only desirable (until FLIP-457). After FLIP-457, this logic will be changed.

this.coordinator = checkNotNull(coordinator);
this.allTasksOutputNonBlocking = allTasksOutputNonBlocking;
return (jobId, newJobStatus, timestamp) -> {
if (newJobStatus == JobStatus.RUNNING && allTasksOutputNonBlocking) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I an curious - should we be checking the new config option here? I thought this is the condition we wanted to pause checkpointing for? Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is in CheckpointCoordinator.createActivatorDeactivator, but as mentioned above, I'm going to remove this option.

new CheckpointCoordinatorDeActivator(this, allTasksOutputNonBlocking);
if (deActivator == null) {
if (pauseCheckpointsIfTasksNotRunning && allTasksOutputNonBlocking) {
deActivator = new ExecutionTrackingCheckpointCoordinatorDeActivator(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment to indicate why this if body results in a pause.

@github-actions github-actions bot added the community-reviewed PR has been reviewed by the community. label Jan 19, 2026
@rkhachatryan rkhachatryan force-pushed the f38940 branch 3 times, most recently from 2012346 to bb34b02 Compare January 21, 2026 17:08
Comment on lines 34 to 43
static CheckpointCoordinatorDeActivator forJobStatus(
CheckpointCoordinator coordinator, boolean allTasksOutputNonBlocking) {
this.coordinator = checkNotNull(coordinator);
this.allTasksOutputNonBlocking = allTasksOutputNonBlocking;
return (jobId, newJobStatus, timestamp) -> {
if (newJobStatus == JobStatus.RUNNING && allTasksOutputNonBlocking) {
// start the checkpoint scheduler if there is no blocking edge
coordinator.startCheckpointScheduler();
} else {
// anything else should stop the trigger for now
coordinator.stopCheckpointScheduler();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forJobStatus is only called once, and allTasksOutputNonBlocking is false. so forJobStatus can be simplified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I've simplified the code, PTAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-reviewed PR has been reviewed by the community.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants