Retry logic for transient GitHub ruleset validation timeouts #155

@hdamker

Problem description

The handle-pr-merge job commits the stripped CHANGELOG to the snapshot branch after the Release Review PR is merged. This commit triggers GitHub ruleset validation ("Restricts updates to workflow files"). If the validation server is slow, GitHub returns: "Unable to validate ... Rule was unable to be completed in 10 seconds".

The Octokit default retry only covers 5xx errors, not this transient validation timeout. The result is a failed workflow with no draft release created, even though the PR was successfully merged.

Possible evolution

Add retry logic (1-2 retries with a short delay) around the snapshot branch commit in the handle-pr-merge job. The error message pattern `Unable to validate.*Rule was unable to be completed` is a clear signal for retry.
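
A minimal sketch of such a retry wrapper, assuming the commit step is available as an async function and that the error surfaces with the message quoted above. The names `isRulesetTimeout` and `withRulesetRetry`, as well as the default of 2 retries and a 5-second delay, are illustrative assumptions and not part of the existing workflow code.

```typescript
// Hypothetical helper names; only the regex pattern comes from this issue.
const RULESET_TIMEOUT = /Unable to validate.*Rule was unable to be completed/;

// True when an error looks like the transient ruleset validation timeout.
function isRulesetTimeout(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return RULESET_TIMEOUT.test(message);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation` and retries it up to `retries` more times, but only when
// the failure matches the ruleset timeout pattern. Any other error, or the
// last matching error once retries are exhausted, is rethrown unchanged.
async function withRulesetRetry<T>(
  operation: () => Promise<T>,
  retries: number = 2,
  delayMs: number = 5000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= retries || !isRulesetTimeout(error)) {
        throw error;
      }
      await sleep(delayMs);
    }
  }
}
```

A call site could then wrap the commit, for example `await withRulesetRetry(() => commitStrippedChangelog())`, where `commitStrippedChangelog` is a placeholder for whatever function performs the snapshot branch commit. Matching on the message keeps unrelated failures (permissions, conflicts) failing fast instead of being retried.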

Alternative solution

Document the manual recovery: re-run the failed workflow (`gh run rerun <id> --failed`). The failure is rare: it was observed once during E2E testing across hundreds of workflow runs.

Additional context

The release process stalls in an intermediate state when this occurs: the snapshot branch exists and the PR is merged, but no draft release has been created. Re-running the failed workflow succeeds immediately, confirming the transient nature of the error.

Labels: Backlog, enhancement, release automation
