Keep automations whole across a restart #221

Description

@nielsrowinbik

Problem statement

On restart, Home Assistant drops all in-flight delays and waits and misses any time triggers that fire while it is down. Any long-running or scheduled automation can therefore silently break, sometimes leaving devices stuck in the wrong state. Users have no native way to resume pending delays and waits or to replay time-based triggers that fired during downtime, so restarts (including routine updates) become a hidden source of unreliable automations.

Community signals

Individual upvotes are low (around one each), but the same failure mode recurs across four independent reports spanning a long period, which signals a sustained pain point rather than a one-off.

  • #3791 Native mechanism to recover missed automations and delays after system restart
  • #656 Missed triggers due to system restart
  • #1927 Restoring the status of running automations after restart/update
  • #3978 Ensure that automations revert to their last state after a restart

These feature requests were recently consolidated, so upvotes may now concentrate on a single request. We will update this opportunity once that data is available.

Scope & Boundaries

In scope

  • Resuming pending delays and waits that were in flight when Home Assistant stopped
  • Replaying or reconciling time-based triggers that fired during downtime
  • Restoring the running state of automations that were mid-execution across a restart

Not in scope

  • Guaranteeing exactly-once execution semantics for every conceivable action
  • Recovery from crashes that leave the recorder or config in an inconsistent state

Foreseen solution

Start by completing the priming work begun as part of #2. Community input on this behaviour has been requested and is still very much welcome:

  1. How should unknown/unavailable flips affect duration-based triggers and conditions?
  2. How should duration-based triggers behave when the state already qualifies at creation?
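To make the two questions above concrete, here is a minimal sketch of the options as two switches. It is purely illustrative and does not reflect the current engine: `StateChange`, `qualifying_since`, `fires_at`, `bridge_transient`, and `count_prior_time` are hypothetical names, and time is plain seconds for simplicity.

```python
from dataclasses import dataclass
from typing import Optional

# States that might be treated as a temporary gap instead of a real change.
TRANSIENT = {"unknown", "unavailable"}


@dataclass
class StateChange:
    at: float   # time of the change, in seconds (hypothetical clock)
    state: str


def qualifying_since(
    history: list[StateChange], target: str, bridge_transient: bool
) -> Optional[float]:
    """Return when `target` started holding continuously, or None.

    `history` is ordered oldest first. With bridge_transient=True, an
    unknown/unavailable flip does not reset the timer (question 1).
    """
    since: Optional[float] = None
    for change in history:
        if change.state == target:
            if since is None:
                since = change.at
        elif change.state in TRANSIENT and bridge_transient and since is not None:
            continue  # treat the flip as a gap and keep the timer running
        else:
            since = None
    # The entity must currently be in the target state to qualify at all.
    if not history or history[-1].state != target:
        return None
    return since


def fires_at(
    history: list[StateChange],
    target: str,
    hold_for: float,
    created_at: float,
    bridge_transient: bool,
    count_prior_time: bool,
) -> Optional[float]:
    """Return when a 'state held for hold_for seconds' trigger would fire.

    With count_prior_time=False, time before the trigger was created is
    ignored even if the state already qualified (question 2).
    """
    since = qualifying_since(history, target, bridge_transient)
    if since is None:
        return None
    start = since if count_prior_time else max(since, created_at)
    return start + hold_for
```

For a light that is on at 0 s, unavailable at 100 s, and on again at 110 s, bridging keeps the timer anchored at 0 s, while not bridging restarts it at 110 s. Which default is least surprising is exactly the input being asked for.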

Then, give the automation engine a way to persist in-flight execution state (pending delays, waits, and the point an automation had reached). The main design question is how far to extend that mechanism from triggers and conditions into full automation execution state (see open questions).
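As a rough illustration of what persisting a pending delay could involve, the sketch below stores an absolute wall-clock deadline instead of a remaining duration, so the time spent offline is accounted for on resume. It is not the engine's actual design: `PendingDelay`, `save_pending`, `load_pending`, `remaining_seconds`, and the JSON file layout are all assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PendingDelay:
    automation_id: str  # hypothetical identifier of the automation
    step_index: int     # index of the action to run once the delay ends
    resume_at: float    # absolute deadline as Unix seconds


def save_pending(path: Path, pending: list[PendingDelay]) -> None:
    """Write all in-flight delays to disk so they survive a restart."""
    path.write_text(json.dumps([asdict(p) for p in pending]))


def load_pending(path: Path) -> list[PendingDelay]:
    """Read delays saved before shutdown; an absent file means none."""
    if not path.exists():
        return []
    return [PendingDelay(**item) for item in json.loads(path.read_text())]


def remaining_seconds(delay: PendingDelay, now: float) -> float:
    """Time left to wait; 0.0 means the deadline passed during downtime."""
    return max(0.0, delay.resume_at - now)
```

A delay whose deadline already passed comes back with zero seconds remaining. Whether such a step should run immediately, be skipped, or be handed to a reconciliation policy is one of the open questions below.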

Risks & open questions

  • How much execution state can be safely persisted and resumed without introducing surprising or duplicated actions on startup?
  • Replaying missed time triggers risks firing actions that are no longer wanted (for example, a "turn off at sunset" that is now hours stale). What is the right reconciliation policy, and should it be user-configurable?
  • Where is the boundary between this and the existing priming and restart work in the new triggers and conditions system? This should extend that work, not duplicate it.
  • How does recovery behave after a long outage versus a quick restart, and should the two be treated differently?
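One possible shape for a user-configurable reconciliation policy is sketched below, as a starting point for discussion and not as a proposal for the final API. The names `MissedPolicy`, `MissedTrigger`, `should_replay`, and the 15-minute default for `max_staleness` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class MissedPolicy(Enum):
    SKIP = "skip"                    # never replay a missed trigger
    RUN_IF_RECENT = "run_if_recent"  # replay only within max_staleness
    ALWAYS_RUN = "always_run"        # replay regardless of outage length


@dataclass
class MissedTrigger:
    scheduled_at: datetime  # when the trigger should have fired
    policy: MissedPolicy
    max_staleness: timedelta = timedelta(minutes=15)  # assumed default


def should_replay(trigger: MissedTrigger, now: datetime) -> bool:
    """Decide on startup whether a trigger missed during downtime runs."""
    if trigger.policy is MissedPolicy.SKIP:
        return False
    if trigger.policy is MissedPolicy.ALWAYS_RUN:
        return True
    # RUN_IF_RECENT: a quick restart replays, a long outage does not.
    return now - trigger.scheduled_at <= trigger.max_staleness
```

Under this sketch, a "turn off at sunset" trigger with `RUN_IF_RECENT` would still run after a five-minute restart but not after a three-hour outage, which also gives one answer to the long-outage versus quick-restart question.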

Appetite

To be set.

Execution issues

No response

Decision log

Date | Decision | Outcome

Status
Considering
