fix: Graceful two-stage subprocess timeout for Windows process trees#151
Conversation
Replace subprocess.run with run_with_graceful_timeout in execute_command to handle Windows-specific edge cases where child process trees survive timeout termination and block on pipe cleanup. The new helper sends CTRL_BREAK_EVENT first, then falls back to taskkill /T /F if the process tree does not exit within the grace period. Closes CursorTouch#124, CursorTouch#146
There was a problem hiding this comment.
Pull request overview
This PR addresses Windows hangs when PowerShell spawns child/grandchild processes and a timeout occurs, by replacing subprocess.run(timeout=...) in execute_command with a Windows-specific helper that attempts a graceful process-group shutdown before force-killing the process tree.
Changes:
- Add
run_with_graceful_timeout(...)implementing a two-stage timeout shutdown (CTRL_BREAK_EVENT → taskkill /T /F). - Add
check_pid_exists(pid)for diagnostics logging during timeout handling. - Update
execute_commandto userun_with_graceful_timeout(...)instead ofsubprocess.run(...).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/windows_mcp/desktop/utils.py |
Introduces the new timeout helper and PID-existence helper used during shutdown diagnostics. |
src/windows_mcp/desktop/service.py |
Switches execute_command to the new timeout helper for more reliable Windows process-tree termination. |
Comments suppressed due to low confidence (1)
src/windows_mcp/desktop/service.py:435
execute_commandstill usescapture_output=True, which means stdout/stderr are read into memory via pipes andcommunicate(). For commands that continuously write (e.g. infinite print loops), this can still grow memory unboundedly until the timeout triggers, so the “memory spikes” described in the PR aren’t actually mitigated. If this needs to be addressed, consider redirecting stdout/stderr to temp files or streaming with a bounded buffer instead ofPIPEcapture.
result = run_with_graceful_timeout(
args,
stdin=subprocess.DEVNULL, # Prevent child processes from inheriting the MCP pipe stdin
capture_output=True, # No errors='ignore' - let subprocess return bytes
timeout=timeout,
cwd=os.path.expanduser(path="~"),
env=env,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| exc2.add_note( | ||
| f"Process killed after failing to exit gracefully within {grace_period} seconds." | ||
| ) | ||
| raise exc2 |
There was a problem hiding this comment.
When the grace-period communicate(timeout=grace_period) also times out, the code raises exc2, which is a new TimeoutExpired whose .timeout value reflects grace_period rather than the original timeout. This diverges from subprocess.run(...) behavior and from the docstring claim of mirroring its API. Consider always re-raising the original timeout exception (exc1) while still attaching any collected stdout/stderr/notes from the later shutdown steps.
| logger.debug( | ||
| f"Process {process.pid} (exist: {check_pid_exists(process.pid)}) did not exit gracefully after {grace_period} seconds, killing it and all child processes..." | ||
| ) |
There was a problem hiding this comment.
This debug log uses an f-string that eagerly calls check_pid_exists(process.pid) even when debug logging is disabled, which adds an unnecessary psutil.Process(...) lookup on every timeout. Prefer guarding with logger.isEnabledFor(logging.DEBUG) or logging without the extra PID-exists probe unless needed.
| logger.debug( | |
| f"Process {process.pid} (exist: {check_pid_exists(process.pid)}) did not exit gracefully after {grace_period} seconds, killing it and all child processes..." | |
| ) | |
| if logger.isEnabledFor(logging.DEBUG): | |
| logger.debug( | |
| "Process %s (exist: %s) did not exit gracefully after %s seconds, killing it and all child processes...", | |
| process.pid, | |
| check_pid_exists(process.pid), | |
| grace_period, | |
| ) |
| def run_with_graceful_timeout( | ||
| *popenargs, | ||
| input=None, | ||
| capture_output=False, | ||
| timeout=None, | ||
| check=False, | ||
| grace_period: float = 2.0, | ||
| **kwargs, | ||
| ): |
There was a problem hiding this comment.
run_with_graceful_timeout is newly added behavior with multiple branches (normal completion, initial timeout -> CTRL_BREAK, and timeout -> taskkill). The repository has existing unit tests for windows_mcp.desktop.utils, but this new helper is untested; please add Windows-gated tests that cover at least the normal path and the timeout path(s) to prevent regressions (especially around not hanging on pipe drainage).
Closes #124, #146
Problem
When
execute_commandruns a PowerShell command that spawns a child process (e.g.pwsh -NoProfile -Command python, or a tight loop likepwsh -NoProfile -Command "python -c 'while True: print(1)'"), the standardsubprocess.run(timeout=...)path is not sufficient on Windows.After the timeout fires,
subprocess.runtries to terminate the top-level child (pwsh) and then drain its stdout/stderr pipes. However, becausepwshmay have already spawned a grandchild process (python) that inherited the pipe handles, those handles remain open even afterpwshexits. The parent (windows-mcp) then blocks indefinitely oncommunicate()waiting for EOF — which never comes.In addition, when the child process continuously writes to stdout (e.g. an infinite print loop), memory usage grows unboundedly until the timeout is reached, and the blocked drain phase compounds the problem further.
This may be the root cause of the hangs and memory spikes reported in #124 and #146.
Solution
Replace
subprocess.runinexecute_commandwith a new helperrun_with_graceful_timeoutthat implements a two-stage shutdown strategy:CTRL_BREAK_EVENTto the child process group, giving console applications (likepwsh) a chance to exit cleanly and close their pipe handles. Wait up tograce_periodseconds (default: 2 s).taskkill /PID <pid> /T /Fto forcibly terminate the entire process tree, then drain any remaining pipe output.The helper also sets
CREATE_NEW_PROCESS_GROUPon the spawned process, which is a Windows prerequisite forCTRL_BREAK_EVENTto be delivered to the child group rather than the current process.The overall API mirrors
subprocess.runexactly — same positional and keyword arguments, same return type (CompletedProcess), same exceptions — so the call site inexecute_commandrequired only a one-line swap.Changes
src/windows_mcp/desktop/utils.pycheck_pid_exists(pid)helper — checks whether a PID is alive and not a zombie (used in diagnostics logging)run_with_graceful_timeout(...)— drop-in replacement forsubprocess.runwith two-stage Windows-aware timeout handlingsrc/windows_mcp/desktop/service.pyexecute_command: replacesubprocess.run(...)withrun_with_graceful_timeout(...)Local test result
Follow-up
run_with_graceful_timeoutcovering the normal path, the graceful-shutdown path, and the force-kill pathrun-tests) so the MCP agent can test itself