Skip to content

RDKEMW-13969: Fix hibernating state handling in stopContainer#421

Open
ks734 wants to merge 3 commits intordkcentral:topic/RDKEMW-13941from
ks734:topic/RDKEMW-13969
Open

RDKEMW-13969: Fix hibernating state handling in stopContainer#421
ks734 wants to merge 3 commits intordkcentral:topic/RDKEMW-13941from
ks734:topic/RDKEMW-13969

Conversation

@ks734
Copy link
Copy Markdown
Contributor

@ks734 ks734 commented Mar 18, 2026

Description

In stopContainer(), when state is Hibernating, set state to Stopping (aborts the hibernate thread's next check), collect PIDs, temporarily release the lock, call WakeupProcess for all PIDs (matching wakeupContainer's reverse-order pattern), then re-acquire and proceed with the kill.

Test Procedure

Stopping a container during active hibernation should not cause a crash

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Other (doesn't fit into the above categories - e.g. documentation updates)

Requires Bitbake Recipe changes?

  • The base Bitbake recipe (meta-rdk-ext/recipes-containers/dobby/dobby.bb) must be modified to support the changes in this PR (beyond updating SRC_REV)

Copilot AI review requested due to automatic review settings March 18, 2026 11:39
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts DobbyManager::stopContainer() to safely handle stopping containers that are actively in the Hibernating state, avoiding races with the hibernation thread by transitioning state, waking processes, and only then proceeding with the stop/kill logic.

Changes:

  • Switches stopContainer() locking to std::unique_lock to allow temporary unlock/relock during hibernation handling.
  • Copies the container ID (instead of referencing the map key) to remain safe across a lock release.
  • Adds a State::Hibernating stop path: set state to Stopping, gather PIDs, unlock, wake processes in reverse order, relock, re-validate container, then continue with killCont().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +1375 to +1415
// If the container is hibernating, abort the ongoing hibernation by
// waking up all processes before sending the kill signal. This prevents
// the hibernate thread from trying to checkpoint PIDs that have
// already been terminated by the stop.
if (container->state == DobbyContainer::State::Hibernating)
{
AI_LOG_INFO("Container '%s' is hibernating, waking up before stop", id.c_str());

// Set state to Stopping so the hibernate thread sees the state
// change and aborts before hibernating the next PID
container->state = DobbyContainer::State::Stopping;

// Get PIDs while holding the lock
Json::Value jsonPids = DobbyStats(id, mEnvironment, mUtilities).stats()["pids"];

// Release the lock so the hibernate thread can see the state
// change and abort
locker.unlock();

// Wake up all already-hibernated processes in reverse order
auto pidIt = jsonPids.end();
while (pidIt != jsonPids.begin())
{
--pidIt;
uint32_t pid = pidIt->asUInt();
DobbyHibernate::WakeupProcess(pid);
}

// Re-acquire the lock
locker.lock();

// Re-verify the container still exists after releasing the lock
auto newIt = mContainers.find(id);
if (newIt == mContainers.end() || !newIt->second ||
newIt->second->descriptor != cd)
{
AI_LOG_WARN("Container '%s' no longer exists after wakeup", id.c_str());
AI_LOG_FN_EXIT();
return false;
}
}
@rdkcmf-jenkins
Copy link
Copy Markdown
Contributor

Coverity Issue - Uncaught exception

An exception of type "std::regex_error" is thrown but the exception specification "/implicit/noexcept" doesn't allow it to be thrown. This will result in a call to terminate().

Medium Impact, CWE-248
UNCAUGHT_EXCEPT

Issue location

This issue was discovered outside the diff for this Pull Request. You can find it at:
daemon/lib/source/DobbyManager.cpp:126

@ks734 ks734 changed the base branch from develop to topic/RDKEMW-13941 March 23, 2026 05:29
Copilot AI review requested due to automatic review settings March 23, 2026 09:12
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes DobbyManager::stopContainer() handling when a container is actively hibernating by aborting the hibernation flow, waking already-hibernated processes, and then proceeding with normal termination—preventing crashes during stop requests.

Changes:

  • Switch stopContainer() to use std::unique_lock so the manager lock can be temporarily released.
  • For State::Hibernating, set state to Stopping, collect PIDs, unlock, wake PIDs in reverse order, re-lock, re-validate container existence, and continue with kill.
  • Copy ContainerId (instead of referencing map key) to allow safe re-lookup after unlocking.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1379 to +1405
// If the container is hibernating, abort the ongoing hibernation by
// waking up all processes before sending the kill signal. This prevents
// the hibernate thread from trying to checkpoint PIDs that have
// already been terminated by the stop.
if (container->state == DobbyContainer::State::Hibernating)
{
AI_LOG_INFO("Container '%s' is hibernating, waking up before stop", id.c_str());

// Set state to Stopping so the hibernate thread sees the state
// change and aborts before hibernating the next PID
container->state = DobbyContainer::State::Stopping;

// Get PIDs while holding the lock
Json::Value jsonPids = DobbyStats(id, mEnvironment, mUtilities).stats()["pids"];

// Release the lock so the hibernate thread can see the state
// change and abort
locker.unlock();

// Wake up all already-hibernated processes in reverse order
auto pidIt = jsonPids.end();
while (pidIt != jsonPids.begin())
{
--pidIt;
uint32_t pid = pidIt->asUInt();
DobbyHibernate::WakeupProcess(pid);
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds new stop behavior specifically for State::Hibernating (collect PIDs, unlock, wake in reverse order, then continue stop). There are existing unit tests for stopContainer covering Running/Paused/Unknown, but none covering the Hibernating path; adding a test for this branch (including the reverse-order WakeupProcess calls and container-state transitions) would help prevent regressions.

Copilot uses AI. Check for mistakes.
@ks734 ks734 force-pushed the topic/RDKEMW-13969 branch from 264d0ee to 09e14bd Compare March 23, 2026 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants