Fix unsafe shutdown pattern in test_launch_ros#542
Conversation
Signed-off-by: Michael Carroll <mjcarroll.oss@gmail.com>
|
Pulls: ros2/launch#940, #542 |
|
@ahcorde I'm not entirely convinced on this one, what do you think? |
fujitatomoya
left a comment
There was a problem hiding this comment.
AFAIS, the original issue here is that if the launch task happens to complete during the sleep, ls.shutdown() is never called, and the function returns with the LaunchService still wired up - that is including its background ROS adapter thread.
this thread keeps spinning rclpy, the test framework moves on, fixtures get torn down, rclpy gets shut down out from under it, and on Windows you get an access violation in the executor's wait loop. (technically this could happen to any platform, i guess)
even with the fix, this seems to have the same problem. if launch_task is done, we still don't call ls.shutdown(), the run_until_complete(launch_task) just returns immediately, and goes to the same situation?
I'm inclined to agree here. I'm going to mark this as draft and poke at it some more. |
Description
Ensure loop always waits for
launch_taskto complete, even if it finished early. This prevents backgroundROSAdapterthreads from accessing destroyed context/nodes, which causes intermittent segfaults (Access Violation 0xC0000005) on Windows.The Problem
Several tests in
test_launch_roswere skipping thels.shutdown()call if thelaunch_taskhad already completed. However, even if the task is done, theLaunchServicemay still have active resources, specifically the background thread started byROSAdapter. Whenls.shutdown()is skipped, theLaunchServiceand itsLaunchContextare garbage collected as the test finishes, while the background thread continues spinning. When it eventually attempts to access the now-destroyedrclpy.ContextorNode(e.g. during await_setoperation), it triggers a segfault.Fixes #539
Reference
Failed build example: https://ci.ros2.org/job/nightly_win_rel/3788/
Traceback:
Assisted-by: Gemini CLI:2.0-Flash