Skip to content

fix: Graceful two-stage subprocess timeout for Windows process trees#151

Merged
Jeomon merged 2 commits intoCursorTouch:mainfrom
JezaChen:fix/graceful-subprocess-timeout
Mar 27, 2026
Merged

fix: Graceful two-stage subprocess timeout for Windows process trees#151
Jeomon merged 2 commits intoCursorTouch:mainfrom
JezaChen:fix/graceful-subprocess-timeout

Conversation

@JezaChen
Copy link
Collaborator

@JezaChen JezaChen commented Mar 27, 2026

Closes #124, #146

Problem

When execute_command runs a PowerShell command that spawns a child process (e.g. pwsh -NoProfile -Command python, or a tight loop like pwsh -NoProfile -Command "python -c 'while True: print(1)'"), the standard subprocess.run(timeout=...) path is not sufficient on Windows.

After the timeout fires, subprocess.run tries to terminate the top-level child (pwsh) and then drain its stdout/stderr pipes. However, because pwsh may have already spawned a grandchild process (python) that inherited the pipe handles, those handles remain open even after pwsh exits. The parent (windows-mcp) then blocks indefinitely on communicate() waiting for EOF — which never comes.

In addition, when the child process continuously writes to stdout (e.g. an infinite print loop), memory usage grows unboundedly until the timeout is reached, and the blocked drain phase compounds the problem further.

This may be the root cause of the hangs and memory spikes reported in #124 and #146.

Solution

Replace subprocess.run in execute_command with a new helper run_with_graceful_timeout that implements a two-stage shutdown strategy:

  1. Graceful stop — send CTRL_BREAK_EVENT to the child process group, giving console applications (like pwsh) a chance to exit cleanly and close their pipe handles. Wait up to grace_period seconds (default: 2 s).
  2. Force kill — if the process tree still has not exited after the grace period, run taskkill /PID <pid> /T /F to forcibly terminate the entire process tree, then drain any remaining pipe output.

The helper also sets CREATE_NEW_PROCESS_GROUP on the spawned process, which is a Windows prerequisite for CTRL_BREAK_EVENT to be delivered to the child group rather than the current process.

The overall API mirrors subprocess.run exactly — same positional and keyword arguments, same return type (CompletedProcess), same exceptions — so the call site in execute_command required only a one-line swap.

Changes

  • src/windows_mcp/desktop/utils.py
    • New check_pid_exists(pid) helper — checks whether a PID is alive and not a zombie (used in diagnostics logging)
    • New run_with_graceful_timeout(...) — drop-in replacement for subprocess.run with two-stage Windows-aware timeout handling
  • src/windows_mcp/desktop/service.py
    • execute_command: replace subprocess.run(...) with run_with_graceful_timeout(...)

Local test result

  • Passed in Claude code
  • Passed in MCP Inspector

Follow-up

  • Add unit tests for run_with_graceful_timeout covering the normal path, the graceful-shutdown path, and the force-kill path
  • Add an agent-runnable skill (e.g. run-tests) so the MCP agent can test itself

Replace subprocess.run with run_with_graceful_timeout in execute_command
to handle Windows-specific edge cases where child process trees survive
timeout termination and block on pipe cleanup.

The new helper sends CTRL_BREAK_EVENT first, then falls back to
taskkill /T /F if the process tree does not exit within the grace period.

Closes CursorTouch#124, CursorTouch#146
Copilot AI review requested due to automatic review settings March 27, 2026 11:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Windows hangs when PowerShell spawns child/grandchild processes and a timeout occurs, by replacing subprocess.run(timeout=...) in execute_command with a Windows-specific helper that attempts a graceful process-group shutdown before force-killing the process tree.

Changes:

  • Add run_with_graceful_timeout(...) implementing a two-stage timeout shutdown (CTRL_BREAK_EVENT → taskkill /T /F).
  • Add check_pid_exists(pid) for diagnostics logging during timeout handling.
  • Update execute_command to use run_with_graceful_timeout(...) instead of subprocess.run(...).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/windows_mcp/desktop/utils.py Introduces the new timeout helper and PID-existence helper used during shutdown diagnostics.
src/windows_mcp/desktop/service.py Switches execute_command to the new timeout helper for more reliable Windows process-tree termination.
Comments suppressed due to low confidence (1)

src/windows_mcp/desktop/service.py:435

  • execute_command still uses capture_output=True, which means stdout/stderr are read into memory via pipes and communicate(). For commands that continuously write (e.g. infinite print loops), this can still grow memory unboundedly until the timeout triggers, so the “memory spikes” described in the PR aren’t actually mitigated. If this needs to be addressed, consider redirecting stdout/stderr to temp files or streaming with a bounded buffer instead of PIPE capture.
            result = run_with_graceful_timeout(
                args,
                stdin=subprocess.DEVNULL,  # Prevent child processes from inheriting the MCP pipe stdin
                capture_output=True,  # No errors='ignore' - let subprocess return bytes
                timeout=timeout,
                cwd=os.path.expanduser(path="~"),
                env=env,
            )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +202 to +205
exc2.add_note(
f"Process killed after failing to exit gracefully within {grace_period} seconds."
)
raise exc2
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the grace-period communicate(timeout=grace_period) also times out, the code raises exc2, which is a new TimeoutExpired whose .timeout value reflects grace_period rather than the original timeout. This diverges from subprocess.run(...) behavior and from the docstring claim of mirroring its API. Consider always re-raising the original timeout exception (exc1) while still attaching any collected stdout/stderr/notes from the later shutdown steps.

Copilot uses AI. Check for mistakes.
Comment on lines +186 to +188
logger.debug(
f"Process {process.pid} (exist: {check_pid_exists(process.pid)}) did not exit gracefully after {grace_period} seconds, killing it and all child processes..."
)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug log uses an f-string that eagerly calls check_pid_exists(process.pid) even when debug logging is disabled, which adds an unnecessary psutil.Process(...) lookup on every timeout. Prefer guarding with logger.isEnabledFor(logging.DEBUG) or logging without the extra PID-exists probe unless needed.

Suggested change
logger.debug(
f"Process {process.pid} (exist: {check_pid_exists(process.pid)}) did not exit gracefully after {grace_period} seconds, killing it and all child processes..."
)
if logger.isEnabledFor(logging.DEBUG):
logger.debug(
"Process %s (exist: %s) did not exit gracefully after %s seconds, killing it and all child processes...",
process.pid,
check_pid_exists(process.pid),
grace_period,
)

Copilot uses AI. Check for mistakes.
Comment on lines +96 to +104
def run_with_graceful_timeout(
*popenargs,
input=None,
capture_output=False,
timeout=None,
check=False,
grace_period: float = 2.0,
**kwargs,
):
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_with_graceful_timeout is newly added behavior with multiple branches (normal completion, initial timeout -> CTRL_BREAK, and timeout -> taskkill). The repository has existing unit tests for windows_mcp.desktop.utils, but this new helper is untested; please add Windows-gated tests that cover at least the normal path and the timeout path(s) to prevent regressions (especially around not hanging on pipe drainage).

Copilot uses AI. Check for mistakes.
@Jeomon Jeomon merged commit 75a4f0d into CursorTouch:main Mar 27, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

windows MCP server hangs up in claude

3 participants