Problem description
The handle-pr-merge job commits the stripped CHANGELOG to the snapshot branch after the Release Review PR is merged. This commit triggers GitHub ruleset validation ("Restricts updates to workflow files"). If the validation server is slow, GitHub returns: "Unable to validate ... Rule was unable to be completed in 10 seconds".
The Octokit default retry only covers 5xx errors, not this transient validation timeout. The result is a failed workflow with no draft release created, even though the PR was successfully merged.
Possible evolution
Add retry logic (one or two retries with a short delay) around the snapshot branch commit in the handle-pr-merge job. The error message pattern `Unable to validate.*Rule was unable to be completed` is a clear signal that a retry is warranted.
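A minimal sketch of what such a retry wrapper might look like, assuming the commit is performed from JavaScript/TypeScript via Octokit. The names `isRulesetTimeout` and `withRulesetRetry`, the default of 2 retries, and the 5-second delay are all hypothetical choices for illustration, not part of the existing workflow.

```typescript
// Hypothetical helper names; only the error pattern comes from the observed failure.
const RULESET_TIMEOUT = /Unable to validate.*Rule was unable to be completed/;

// Returns true only for the transient ruleset validation timeout.
function isRulesetTimeout(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return RULESET_TIMEOUT.test(message);
}

// Runs `operation`, retrying up to `maxRetries` times when the failure
// matches the ruleset timeout. Any other error is rethrown immediately.
async function withRulesetRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 2, // assumed value, matching the "1-2 retries" suggestion
  delayMs = 5000, // assumed "short delay"
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxRetries || !isRulesetTimeout(error)) throw error;
      // Wait before the next attempt to give the validation service time to recover.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

The snapshot branch commit would then be wrapped as `await withRulesetRetry(() => commitStrippedChangelog())`, where `commitStrippedChangelog` stands in for whatever call the job currently makes. Matching on the message rather than the status code keeps the retry narrow, so genuine ruleset violations still fail fast.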
Alternative solution
Document the manual recovery: re-run the failed workflow (`gh run rerun <id> --failed`). The failure is rare: it was observed once during E2E testing across hundreds of workflow runs.
Additional context
The release process stalls in an intermediate state when this occurs: snapshot branch exists, PR is merged, but no draft release. Re-running the failed workflow succeeds immediately, confirming the transient nature of the error.