RDKEMW-13969: Fix hibernating state handling in stopContainer#421
RDKEMW-13969: Fix hibernating state handling in stopContainer#421ks734 wants to merge 3 commits intordkcentral:topic/RDKEMW-13941from
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adjusts DobbyManager::stopContainer() to safely handle stopping containers that are actively in the Hibernating state, avoiding races with the hibernation thread by transitioning state, waking processes, and only then proceeding with the stop/kill logic.
Changes:
- Switches
stopContainer()locking tostd::unique_lockto allow temporary unlock/relock during hibernation handling. - Copies the container ID (instead of referencing the map key) to remain safe across a lock release.
- Adds a
State::Hibernatingstop path: set state toStopping, gather PIDs, unlock, wake processes in reverse order, relock, re-validate container, then continue withkillCont().
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| // If the container is hibernating, abort the ongoing hibernation by | ||
| // waking up all processes before sending the kill signal. This prevents | ||
| // the hibernate thread from trying to checkpoint PIDs that have | ||
| // already been terminated by the stop. | ||
| if (container->state == DobbyContainer::State::Hibernating) | ||
| { | ||
| AI_LOG_INFO("Container '%s' is hibernating, waking up before stop", id.c_str()); | ||
|
|
||
| // Set state to Stopping so the hibernate thread sees the state | ||
| // change and aborts before hibernating the next PID | ||
| container->state = DobbyContainer::State::Stopping; | ||
|
|
||
| // Get PIDs while holding the lock | ||
| Json::Value jsonPids = DobbyStats(id, mEnvironment, mUtilities).stats()["pids"]; | ||
|
|
||
| // Release the lock so the hibernate thread can see the state | ||
| // change and abort | ||
| locker.unlock(); | ||
|
|
||
| // Wake up all already-hibernated processes in reverse order | ||
| auto pidIt = jsonPids.end(); | ||
| while (pidIt != jsonPids.begin()) | ||
| { | ||
| --pidIt; | ||
| uint32_t pid = pidIt->asUInt(); | ||
| DobbyHibernate::WakeupProcess(pid); | ||
| } | ||
|
|
||
| // Re-acquire the lock | ||
| locker.lock(); | ||
|
|
||
| // Re-verify the container still exists after releasing the lock | ||
| auto newIt = mContainers.find(id); | ||
| if (newIt == mContainers.end() || !newIt->second || | ||
| newIt->second->descriptor != cd) | ||
| { | ||
| AI_LOG_WARN("Container '%s' no longer exists after wakeup", id.c_str()); | ||
| AI_LOG_FN_EXIT(); | ||
| return false; | ||
| } | ||
| } |
Coverity Issue - Uncaught exceptionAn exception of type "std::regex_error" is thrown but the exception specification "/implicit/noexcept" doesn't allow it to be thrown. This will result in a call to terminate(). Medium Impact, CWE-248 Issue locationThis issue was discovered outside the diff for this Pull Request. You can find it at: |
There was a problem hiding this comment.
Pull request overview
Fixes DobbyManager::stopContainer() handling when a container is actively hibernating by aborting the hibernation flow, waking already-hibernated processes, and then proceeding with normal termination—preventing crashes during stop requests.
Changes:
- Switch
stopContainer()to usestd::unique_lockso the manager lock can be temporarily released. - For
State::Hibernating, set state toStopping, collect PIDs, unlock, wake PIDs in reverse order, re-lock, re-validate container existence, and continue with kill. - Copy
ContainerId(instead of referencing map key) to allow safe re-lookup after unlocking.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // If the container is hibernating, abort the ongoing hibernation by | ||
| // waking up all processes before sending the kill signal. This prevents | ||
| // the hibernate thread from trying to checkpoint PIDs that have | ||
| // already been terminated by the stop. | ||
| if (container->state == DobbyContainer::State::Hibernating) | ||
| { | ||
| AI_LOG_INFO("Container '%s' is hibernating, waking up before stop", id.c_str()); | ||
|
|
||
| // Set state to Stopping so the hibernate thread sees the state | ||
| // change and aborts before hibernating the next PID | ||
| container->state = DobbyContainer::State::Stopping; | ||
|
|
||
| // Get PIDs while holding the lock | ||
| Json::Value jsonPids = DobbyStats(id, mEnvironment, mUtilities).stats()["pids"]; | ||
|
|
||
| // Release the lock so the hibernate thread can see the state | ||
| // change and abort | ||
| locker.unlock(); | ||
|
|
||
| // Wake up all already-hibernated processes in reverse order | ||
| auto pidIt = jsonPids.end(); | ||
| while (pidIt != jsonPids.begin()) | ||
| { | ||
| --pidIt; | ||
| uint32_t pid = pidIt->asUInt(); | ||
| DobbyHibernate::WakeupProcess(pid); | ||
| } |
There was a problem hiding this comment.
This PR adds new stop behavior specifically for State::Hibernating (collect PIDs, unlock, wake in reverse order, then continue stop). There are existing unit tests for stopContainer covering Running/Paused/Unknown, but none covering the Hibernating path; adding a test for this branch (including the reverse-order WakeupProcess calls and container-state transitions) would help prevent regressions.
264d0ee to
09e14bd
Compare
Description
In stopContainer(), when state is Hibernating, set state to Stopping (aborts the hibernate thread's next check), collect PIDs, temporarily release the lock, call WakeupProcess for all PIDs (matching wakeupContainer's reverse-order pattern), then re-acquire and proceed with the kill.
Test Procedure
Stopping a container during active hibernation should not cause a crash
Type of Change
Requires Bitbake Recipe changes?
meta-rdk-ext/recipes-containers/dobby/dobby.bb) must be modified to support the changes in this PR (beyond updatingSRC_REV)