fix: flaky Windows cross-user test and record_pr workflow error#178
Open
mwiebe wants to merge 1 commit into
Open
fix: flaky Windows cross-user test and record_pr workflow error#178mwiebe wants to merge 1 commit into
mwiebe wants to merge 1 commit into
Conversation
CreateEnvironmentBlock reads from the user's registry hive at
HKEY_USERS\<SID>. Windows unloads that hive asynchronously when the
profile's refcount drops to zero. Back-to-back logons for the same user
can race with an in-progress unload and surface as:
* 0x800703FA ERROR_KEY_DELETED - 'Illegal operation attempted on a
registry key that has been marked for deletion'
* 0x80070005 E_ACCESSDENIED - documented by Microsoft as a related
symptom of the same race
Observed as a flaky failure in the Cross-User (Windows) CI job on
test_cross_user_helpers_dir_dacl_protects_from_session_user after the
preceding test's session cleanup. All other cross-user tests in the same
run passed, confirming a transient race rather than a logic bug.
Wrap CreateEnvironmentBlock in environment_for_user with a bounded
retry: 5 attempts with geometric backoff (20/40/80/160 ms,
worst-case ~300 ms). Only the two transient HRESULTs above trigger a
retry; every other error returns immediately so real failures still
surface promptly.
Preserves the defensive null-pointer guard added in 8a48851 by moving
it into the Ok(()) arm of the retry loop - block_ptr is freshly
initialized on each attempt and must be checked before being passed to
parse_environment_block or DestroyEnvironmentBlock.
The upstream Python openjd-sessions has a first-hand comment describing
this same class of error in _win32/_locate_executable.py; its
mitigation is forcing handle cleanup between calls, which is more
fragile than a bounded retry.
Updates specs/sessions/subprocess.md with a new 'CreateEnvironmentBlock
profile-unload race' subsection documenting the retry policy.
The reusable workflow at
OpenJobDescription/.github/.github/workflows/reusable_record_pr_details.yml
declares 'permissions: contents: read' in its workflow_call block.
GitHub Actions enforces that a caller cannot invoke a reusable workflow
requesting more permissions than the caller itself has granted, and the
workflow-root 'permissions: {}' hardening added in 8a48851 zeroed out
every permission - including the 'contents: read' the reusable workflow
needs.
Add a job-level 'permissions: contents: read' block on the job that
invokes the reusable workflow, matching the pattern already used by
codeql.yml, responded.yml, and stale_prs_and_issues.yml.
Resolves the 'requesting contents: read, but is only allowed
contents: none' validation error.
* cargo check -p openjd-sessions (Linux; win32.rs is #[cfg(windows)])
* cargo clippy -p openjd-sessions --all-targets -- -D warnings
* cargo fmt --all -- --check
* cargo test -p openjd-sessions --lib
* python3 YAML lint on .github/workflows/record_pr.yml
Signed-off-by: Mark <399551+mwiebe@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What was the problem/requirement? (What/Why)
Root caused a flaky test to an issue needing a fix, and need to fix the CI reusable_record_pr_details.yml workflow.
What was the solution? (How)
CreateEnvironmentBlock reads from the user's registry hive at HKEY_USERS<SID>. Windows unloads that hive asynchronously when the profile's refcount drops to zero. Back-to-back logons for the same user can race with an in-progress unload and surface as:
Observed as a flaky failure in the Cross-User (Windows) CI job on test_cross_user_helpers_dir_dacl_protects_from_session_user after the preceding test's session cleanup. All other cross-user tests in the same run passed, confirming a transient race rather than a logic bug.
Wrap CreateEnvironmentBlock in environment_for_user with a bounded retry: 5 attempts with geometric backoff (20/40/80/160 ms, worst-case ~300 ms). Only the two transient HRESULTs above trigger a retry; every other error returns immediately so real failures still surface promptly.
Preserves the defensive null-pointer guard added in 8a48851 by moving it into the Ok(()) arm of the retry loop - block_ptr is freshly initialized on each attempt and must be checked before being passed to parse_environment_block or DestroyEnvironmentBlock.
The upstream Python openjd-sessions has a first-hand comment describing this same class of error in _win32/_locate_executable.py; its mitigation is forcing handle cleanup between calls, which is more fragile than a bounded retry.
Updates specs/sessions/subprocess.md with a new 'CreateEnvironmentBlock profile-unload race' subsection documenting the retry policy.
The reusable workflow at
OpenJobDescription/.github/.github/workflows/reusable_record_pr_details.yml declares 'permissions: contents: read' in its workflow_call block. GitHub Actions enforces that a caller cannot invoke a reusable workflow requesting more permissions than the caller itself has granted, and the workflow-root 'permissions: {}' hardening added in 8a48851 zeroed out every permission - including the 'contents: read' the reusable workflow needs.
Add a job-level 'permissions: contents: read' block on the job that invokes the reusable workflow, matching the pattern already used by codeql.yml, responded.yml, and stale_prs_and_issues.yml.
Resolves the 'requesting contents: read, but is only allowed
contents: none' validation error.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.