Skip to content

Fix unsafe shutdown pattern in test_launch_ros#542

Closed
mjcarroll wants to merge 1 commit into
ros2:rollingfrom
mjcarroll:fix/test-launch-ros-flakiness
Closed

Fix unsafe shutdown pattern in test_launch_ros#542
mjcarroll wants to merge 1 commit into
ros2:rollingfrom
mjcarroll:fix/test-launch-ros-flakiness

Conversation

@mjcarroll
Copy link
Copy Markdown
Member

@mjcarroll mjcarroll commented May 5, 2026

Description

Ensure loop always waits for launch_task to complete, even if it finished early. This prevents background ROSAdapter threads from accessing destroyed context/nodes, which causes intermittent segfaults (Access Violation 0xC0000005) on Windows.

The Problem

Several tests in test_launch_ros were skipping the ls.shutdown() call if the launch_task had already completed. However, even if the task is done, the LaunchService may still have active resources, specifically the background thread started by ROSAdapter. When ls.shutdown() is skipped, the LaunchService and its LaunchContext are garbage collected as the test finishes, while the background thread continues spinning. When it eventually attempts to access the now-destroyed rclpy.Context or Node (e.g. during a wait_set operation), it triggers a segfault.

Fixes #539

Reference

Failed build example: https://ci.ros2.org/job/nightly_win_rel/3788/
Traceback:

Windows fatal exception: access violation

Thread 0x00001110 (most recent call first):
  File "C:\ci\ws\install\Lib\site-packages\rclpy\executors.py", line 872 in _wait_for_ready_callbacks
  File "C:\ci\ws\install\Lib\site-packages\rclpy\executors.py", line 970 in wait_for_ready_callbacks
  File "C:\ci\ws\install\Lib\site-packages\rclpy\executors.py", line 1000 in _spin_once_impl
  File "C:\ci\ws\install\Lib\site-packages\rclpy\executors.py", line 1020 in spin_once
  File "C:\ci\ws\install\Lib\site-packages\launch_ros\ros_adapters.py", line 77 in _run

Assisted-by: Gemini CLI:2.0-Flash

Signed-off-by: Michael Carroll <mjcarroll.oss@gmail.com>
@mjcarroll
Copy link
Copy Markdown
Member Author

Pulls: ros2/launch#940, #542
Gist: https://gist.githubusercontent.com/mjcarroll/892025e2060edeba8af92073a0b8b821/raw/22c36fb0c652be55e73dc369b42ba37d513435ee/ros2.repos
BUILD args: --packages-above-and-dependencies launch_pytest launch_yaml test_launch_ros
TEST args: --packages-above launch_pytest launch_yaml test_launch_ros
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/19148

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@mjcarroll
Copy link
Copy Markdown
Member Author

@ahcorde I'm not entirely convinced on this one, what do you think?

Copy link
Copy Markdown
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIS, the original issue here is that if the launch task happens to complete during the sleep, ls.shutdown() is never called, and the function returns with the LaunchService still wired up - that is including its background ROS adapter thread.
this thread keeps spinning rclpy, the test framework moves on, fixtures get torn down, rclpy gets shut down out from under it, and on Windows you get an access violation in the executor's wait loop. (technically this could happen to any platform, i guess)

even with the fix, this seems to have the same problem. if launch_task is done, we still don't call ls.shutdown(), the run_until_complete(launch_task) just returns immediately, and goes to the same situation?

@mjcarroll
Copy link
Copy Markdown
Member Author

even with the fix, this seems to have the same problem.

I'm inclined to agree here. I'm going to mark this as draft and poke at it some more.

@mjcarroll mjcarroll marked this pull request as draft May 7, 2026 10:54
@mjcarroll mjcarroll closed this May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🧑‍🌾 test_launch_ros generating build regressions in nightly_win by a windows access violation

3 participants