
mplind/os-autorollback


OPNsense Auto Rollback

Never lock yourself out of your firewall again.
Automatic configuration rollback with safe mode, connectivity watchdog, and crash recovery.

OPNsense plugin · Version 1.0 · BSD-2 license · 226 tests passing · 92% coverage

Quick Start · How It Works · Features · Configuration · CLI Reference · FAQ




The Problem

You SSH into your remote firewall. You change a firewall rule. The connection drops. The firewall is 500 miles away.

Every network admin has been there. One bad rule, one typo in an interface config, one misguided NAT change, and you're driving to the data center at 2 AM.

The Solution

Auto Rollback adds a dead man's switch to OPNsense. Before you make changes, enter Safe Mode. A countdown timer starts. If you don't confirm your changes before the timer expires, the system automatically reverts to the last known-good configuration.

It's the same concept as Juniper's commit confirmed and MikroTik's Safe Mode, built natively for OPNsense.


Quick Start

Install

pkg install os-autorollback

Or install from the OPNsense web UI: System > Firmware > Plugins and search for os-autorollback.

Enable

Navigate to System > Auto Rollback and check Enable plugin. Click Save.

Use It

  1. Click Enter Safe Mode
  2. Make your configuration changes (firewall rules, interfaces, NAT, etc.)
  3. Verify everything works
  4. Click Confirm Changes to keep them, or let the timer expire to revert

That's it. If anything goes wrong, the system rolls back automatically.


How It Works

  You click            You make                 Timer               Changes
  "Enter Safe Mode"    changes                  expires             revert
        |                  |                        |                   |
        v                  v                        v                   v
  +----------+    +-----------------+    +--------------------+    +---------+
  | Snapshot | -> | Countdown Timer | -> | Automatic Revert   | -> | Reboot  |
  | Config   |    | (120s default)  |    | to snapshot        |    | or      |
  +----------+    +-----------------+    +--------------------+    | Reload  |
                           |                                       +---------+
                           v
                    Click "Confirm"
                    to keep changes
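The countdown behaves like a dead man's switch. The sketch below models the state transitions in memory. It is not the plugin's safemode.py or timer_daemon.py: the SafeModeSession class and its revert callback are hypothetical. It uses the documented defaults (120-second timeout, 60-second extensions, 30-3600 second range) and the documented state names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

MIN_TIMEOUT, MAX_TIMEOUT = 30, 3600  # documented clamp range, in seconds


@dataclass
class SafeModeSession:
    """Hypothetical model of the safe mode lifecycle; not the plugin's code."""
    revert: Callable[[], None]            # restores the snapshot when the timer expires
    clock: Callable[[], float] = time.time
    state: str = "armed"
    expiry_time: Optional[float] = None

    def start(self, timeout: int = 120) -> None:
        # The config snapshot would be taken here; the timeout is clamped first.
        timeout = max(MIN_TIMEOUT, min(MAX_TIMEOUT, timeout))
        self.expiry_time = self.clock() + timeout
        self.state = "safe_mode"

    def extend(self, seconds: int = 60) -> None:
        # Extending only makes sense while the countdown is running.
        if self.state == "safe_mode":
            self.expiry_time += seconds

    def confirm(self) -> None:
        # Keep the changes: disarm the countdown without reverting.
        self.state, self.expiry_time = "armed", None

    def remaining(self) -> int:
        if self.state != "safe_mode":
            return 0
        return max(0, int(self.expiry_time - self.clock()))

    def tick(self) -> None:
        # Called periodically by a background loop (the timer daemon's role).
        if self.state == "safe_mode" and self.clock() >= self.expiry_time:
            self.state = "restoring"
            self.revert()
            self.state, self.expiry_time = "armed", None
```

A real implementation also has to persist the expiry time to disk. Otherwise a crashed daemon or a reboot would lose the deadline, and the boot recovery hook exists to cover that case.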

Three Layers of Protection

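Auto Rollback doesn't rely on a single mechanism. It uses three independent layers to protect your configuration:

  1. Timer daemon: counts down during safe mode and reverts to the snapshot when the timer expires
  2. Connectivity watchdog: a cron job that rolls back after repeated health check failures following a config change
  3. Boot recovery hook: an early boot hook that restores the snapshot before services start if the system restarts with an expired timer

If the timer daemon crashes and your changes broke connectivity, the watchdog catches it. If the whole system reboots, the boot recovery hook restores your config before any services read it. Together, the layers are designed so that a bad config does not survive a failed change.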


Features


Safe Mode

  • Snapshot your configuration with one click (or one command)
  • Configurable timeout: 30 seconds to 60 minutes (default: 2 minutes)
  • Extend the timer in 60-second increments if you need more time
  • Full reboot or fast service reload on rollback
  • Git backup integration (if os-git-backup is installed)

Live Countdown UI

A persistent banner appears at the top of every page in the OPNsense UI while safe mode is active. You always know how much time you have left, and you can confirm or revert from any page.

The banner includes:

  • Live countdown timer with progress bar
  • Color-coded urgency (green, then amber, then red as time runs out)
  • Confirm, Revert, and +60s buttons
  • Pulsing animation so you never forget it's active

The plugin settings page shows a larger control panel with the same controls plus CLI hints for SSH users.

Connectivity Watchdog

An always-on health monitor that automatically rolls back if your changes break connectivity. No safe mode required.

  • Runs independently of safe mode as a cron job (every 60 seconds)
  • Configurable grace period after config changes (default: 60 seconds)
  • Primary and secondary health check commands
  • Regex pattern matching on command output
  • Configurable failure threshold before rollback (default: 3 consecutive failures)
  • Default check: ping the default gateway

Example: You change a VLAN config that breaks your management interface. The watchdog detects that the gateway is unreachable, counts 3 consecutive failures, and automatically reverts. You may never even notice there was a problem.
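The watchdog logic described above can be sketched as a single cron pass. The pass covers the grace period, %gateway% substitution, regex matching on command output, and the consecutive-failure threshold. This is a simplified illustration, not the plugin's watchdog.py. WatchdogState, watchdog_pass and run_shell are hypothetical names. The default gateway is passed in rather than discovered. The sketch also assumes every configured check must pass. The command runner is injectable, so the logic can be exercised without a network.

```python
import re
import subprocess
import time
from dataclasses import dataclass


@dataclass
class WatchdogState:
    fail_count: int = 0
    last_config_change: float = 0.0  # updated by the config-change hook


def run_shell(command: str) -> str:
    """Run a check command through the shell and return stdout plus stderr."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=30)
    return proc.stdout + proc.stderr


def watchdog_pass(state, checks, gateway, grace_period=60, threshold=3,
                  run=run_shell, now=None):
    """One cron invocation. Returns True when a rollback should be triggered.

    `checks` is a list of (command, pattern) pairs; empty commands are skipped.
    """
    now = time.time() if now is None else now
    if now - state.last_config_change < grace_period:
        return False  # still inside the grace period after a config change

    healthy = True
    for command, pattern in checks:
        if not command:
            continue  # optional secondary check not configured
        output = run(command.replace("%gateway%", gateway))
        if not re.search(pattern, output):
            healthy = False
            break

    if healthy:
        state.fail_count = 0  # any passing run resets the consecutive count
        return False
    state.fail_count += 1
    return state.fail_count >= threshold
```

Resetting the counter on every healthy pass is what makes the threshold count *consecutive* failures. A single dropped ping never triggers a rollback on its own.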

Dashboard Widget

A compact status indicator on your OPNsense dashboard showing:

  • Current state (Disabled / Armed / Safe Mode / Restoring)
  • Live countdown during safe mode
  • Quick-action buttons
  • Watchdog failure count

CLI Support

Every operation is available via configctl for SSH users and automation:

configctl autorollback safemode start       # Enter safe mode
configctl autorollback safemode confirm     # Confirm changes
configctl autorollback safemode cancel      # Revert immediately
configctl autorollback safemode extend      # Add 60 seconds
configctl autorollback status               # Show current status
configctl autorollback watchdog check       # Run watchdog check

Configuration

Navigate to System > Auto Rollback to configure the plugin.

General

| Setting | Default | Description |
|---|---|---|
| Enable plugin | Off | Master switch for the entire plugin |
| Safe mode timeout | 120 seconds | Time before automatic rollback (30-3600 seconds) |
| Rollback method | Full reboot | How to apply the restored config. reboot is safest; reload is faster but may not fully apply all changes. |

Connectivity Watchdog

| Setting | Default | Description |
|---|---|---|
| Enable watchdog | Off | Enable always-on connectivity monitoring |
| Grace period | 60 seconds | Wait time after a config change before running checks (15-600 seconds) |
| Failure threshold | 3 | Consecutive check failures before rollback (1-10) |
| Check command | ping -c 1 -W 3 -t 5 %gateway% | Primary health check. %gateway% is replaced with the default gateway IP. |
| Check pattern | 1 packets received | Regex that must match the command output for the check to pass |
| Check command 2 | (empty) | Optional secondary check command |
| Check pattern 2 | (empty) | Regex for the secondary check |

Logging

| Setting | Default | Description |
|---|---|---|
| Log rollbacks | On | Log all rollback events to syslog (autorollback facility) |

CLI Reference

All commands return JSON and can be piped to jq for scripting.

Enter Safe Mode

configctl autorollback safemode start
{
  "status": "ok",
  "message": "Safe mode activated. You have 120 seconds to confirm changes.",
  "timeout": 120,
  "remaining_seconds": 120,
  "expiry_time": 1700000120.0,
  "backup_file": "/conf/backup/config-1700000000.xml",
  "backup_revision": "1700000000",
  "token": "a1b2c3d4...",
  "rollback_method": "reboot"
}

Optionally, pass a timeout in seconds to override the default (clamped to 30-3600):

configctl autorollback safemode start 300

Confirm Changes

configctl autorollback safemode confirm

Revert Immediately

configctl autorollback safemode cancel

Extend Timer

configctl autorollback safemode extend       # +60 seconds (default)
configctl autorollback safemode extend 120   # +120 seconds

Check Status

configctl autorollback status
{
  "status": "ok",
  "timestamp": 1700000033.0,
  "system_state": "safe_mode",
  "safe_mode": {
    "active": true,
    "remaining_seconds": 87,
    "backup_file": "/conf/backup/config-1700000000.xml",
    "backup_revision": "1700000000",
    "start_time": 1700000000.0,
    "expiry_time": 1700000120.0,
    "timeout": 120,
    "rollback_method": "reboot",
    "timer_pid": 12345
  },
  "watchdog": {
    "enabled": true,
    "fail_count": 0,
    "last_config_change": 0,
    "last_config_backup": ""
  },
  "settings": {
    "enabled": true,
    "timeout": 120,
    "rollback_method": "reboot",
    "watchdog_enabled": true,
    "grace_period": 60,
    "fail_threshold": 3,
    "check_command": "ping -c 1 -W 3 -t 5 %gateway%",
    "check_pattern": "1 packets received",
    "check_command_2": "",
    "check_pattern_2": "",
    "log_rollbacks": true
  },
  "token": "a1b2c3d4..."
}
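Because every command prints JSON, scripts can consume the output directly instead of piping through jq. This is a minimal sketch. It assumes configctl is on the PATH of the OPNsense shell and that the fields match the sample output above. run_configctl and summarize_status are hypothetical helper names.

```python
import json
import subprocess


def run_configctl(*args: str) -> dict:
    """Run `configctl autorollback <args...>` and decode its JSON output."""
    proc = subprocess.run(["configctl", "autorollback", *args],
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def summarize_status(status: dict) -> str:
    """Reduce a status document to a one-line summary."""
    state = status.get("system_state", "unknown")
    safe_mode = status.get("safe_mode") or {}
    if safe_mode.get("active"):
        return f"{state}: {safe_mode.get('remaining_seconds', 0)}s left"
    fails = (status.get("watchdog") or {}).get("fail_count", 0)
    return f"{state}: watchdog failures={fails}"
```

On the firewall, `print(summarize_status(run_configctl("status")))` would print something like `safe_mode: 87s left` for the sample above.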

System States

| State | Meaning |
|---|---|
| disabled | Plugin is turned off |
| armed | Plugin enabled, waiting for safe mode |
| safe_mode | Countdown active, waiting for confirmation |
| restoring | Rollback in progress |

API Reference

All endpoints require OPNsense API authentication.

| Endpoint | Method | Description |
|---|---|---|
| /api/autorollback/service/start | POST | Enter safe mode |
| /api/autorollback/service/confirm | POST | Confirm changes |
| /api/autorollback/service/cancel | POST | Revert immediately |
| /api/autorollback/service/extend | POST | Extend timer |
| /api/autorollback/service/status | GET | Get current status |
| /api/autorollback/settings/get | GET | Get plugin settings |
| /api/autorollback/settings/set | POST | Update plugin settings |
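OPNsense API keys are sent as HTTP basic authentication, with the key as the user name and the secret as the password. The following is a standard-library sketch for calling the endpoints above. build_request and call are hypothetical helpers. Sending an empty JSON object as the body of POST requests is an assumption about what these endpoints accept.

```python
import base64
import json
import urllib.request


def build_request(host: str, key: str, secret: str, endpoint: str,
                  method: str = "GET", payload=None) -> urllib.request.Request:
    """Build an authenticated request for /api/autorollback/<endpoint>."""
    token = base64.b64encode(f"{key}:{secret}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if method == "POST":
        # Assumption: POST endpoints accept a JSON body (an empty object if none).
        data = json.dumps(payload or {}).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"https://{host}/api/autorollback/{endpoint}",
                                  data=data, headers=headers, method=method)


def call(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode())
```

For example, `call(build_request("fw.example.net", key, secret, "service/status"))` would fetch the current status. If the firewall uses a self-signed certificate, pass a suitable `ssl.SSLContext` to `urlopen`.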

Recommended Setups

Remote Firewall (Conservative)

For firewalls in remote locations where physical access is difficult:

Timeout:            300 seconds (5 minutes)
Rollback method:    Full reboot
Watchdog:           Enabled
Grace period:       60 seconds
Fail threshold:     2
Check command:      ping -c 1 -W 3 -t 5 %gateway%
Check command 2:    host 8.8.8.8

Lab / Local Firewall (Fast Iteration)

For local development or lab environments where you want faster turnaround:

Timeout:            60 seconds
Rollback method:    Service reload
Watchdog:           Disabled

HA Pair

For high-availability deployments, use a longer timeout to allow both nodes to sync:

Timeout:            300 seconds
Rollback method:    Full reboot
Watchdog:           Enabled
Grace period:       120 seconds
Fail threshold:     3

How Rollback Works

When a rollback triggers (timer expiry, watchdog threshold, or manual revert), the system:

  1. Validates the backup file (XML parsing, schema checks, path validation)
  2. Acquires an exclusive restore lock (prevents concurrent rollbacks)
  3. Creates a safety backup of the current config (config-pre-rollback.xml)
  4. Writes the backup atomically (temp file + rename, never a partial write)
  5. Preserves file ownership and permissions
  6. Clears the config cache
  7. Applies the restored config via the configured method (reboot or reload)
  8. Cleans up state files and timer processes
  9. Logs the event to syslog
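Steps 1 through 5 can be sketched with the Python standard library. This is an illustration, not the plugin's rollback.py. restore_config and validate_backup are hypothetical names. The validation is reduced to "parses as XML with an <opnsense> root", and path validation, cache clearing, service reload and logging are omitted. fcntl is Unix-only, which matches FreeBSD-based OPNsense.

```python
import fcntl
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def validate_backup(path: str) -> None:
    """Minimal check: the file must parse as XML with an <opnsense> root."""
    root = ET.parse(path).getroot()
    if root.tag != "opnsense":
        raise ValueError(f"unexpected root element <{root.tag}> in {path}")


def restore_config(backup: str, target: str, lock_path: str) -> None:
    validate_backup(backup)                                   # step 1
    with open(lock_path, "w") as lock:
        # Step 2: raises BlockingIOError if another restore holds the lock.
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        directory = os.path.dirname(target)
        shutil.copy2(target, os.path.join(directory, "config-pre-rollback.xml"))  # step 3
        st = os.stat(target)
        # Step 4: write to a temp file in the same directory, then rename over
        # the target, so readers see either the old or the new file, never a mix.
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as out, open(backup, "rb") as src:
                shutil.copyfileobj(src, out)
                out.flush()
                os.fsync(out.fileno())
            os.chmod(tmp, st.st_mode & 0o7777)                # step 5: permissions
            os.chown(tmp, st.st_uid, st.st_gid)               # step 5: ownership
            os.replace(tmp, target)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
```

The temp file has to live in the same directory as the target because `os.replace` is only atomic within a single filesystem.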

The entire operation is designed to be crash-safe. If power is lost mid-rollback, the boot recovery hook picks up where it left off.


FAQ

Does this work with CARP / HA?

Yes. The rollback state is local to each node. If you're making changes to an HA pair, enter safe mode on each node independently, and choose a timeout long enough to sync changes between the nodes before confirming.

What happens if the system reboots during safe mode?

The boot recovery hook detects the expired timer and restores the backup before any services start. Your firewall boots into the known-good configuration.

Can I use this via the API for automation?

Yes. All operations are available via the standard OPNsense REST API. Use /api/autorollback/service/start to enter safe mode, apply your changes via other API endpoints, verify connectivity, then call /api/autorollback/service/confirm. If your automation script fails, the timer rolls back automatically.
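The start, change, verify, confirm workflow can be wrapped in a small helper. In this sketch, change_with_safety_net is a hypothetical name, and `call(endpoint)` is assumed to POST to /api/autorollback/<endpoint> and return the decoded JSON. Checking for "status": "ok" assumes the API responds like the CLI. On failure the sketch cancels immediately; if the firewall is unreachable, the safe mode timer still reverts on its own.

```python
from typing import Callable


def change_with_safety_net(call: Callable[[str], dict],
                           apply_changes: Callable[[], None],
                           verify: Callable[[], bool]) -> bool:
    """Enter safe mode, apply changes, and confirm only if verification passes."""
    started = call("service/start")
    if started.get("status") != "ok":
        raise RuntimeError(f"could not enter safe mode: {started}")
    try:
        apply_changes()
        ok = verify()
    except Exception:
        ok = False  # a crashing change script counts as a failed change
    if ok:
        call("service/confirm")
        return True
    try:
        call("service/cancel")  # revert now rather than waiting for the timer
    except OSError:
        pass  # firewall unreachable: the safe mode timer will revert instead
    return False
```

The verify callback should test the path your automation actually depends on, for example re-reading a setting through the API. Otherwise a change that locks out the management interface could still be confirmed.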

Does the watchdog run during safe mode?

The watchdog and safe mode operate independently. The watchdog monitors connectivity after any config change, whether or not safe mode is active. During safe mode, the timer daemon handles rollback. The watchdog provides an additional safety layer that works 24/7.

What gets rolled back?

The entire /conf/config.xml file is restored from the backup. This includes firewall rules, interface configurations, NAT, VPN settings, and all other OPNsense configuration. Plugin-specific data stored outside of config.xml is not affected.

Is there a maximum timeout?

Yes. The timeout is clamped to the range of 30 to 3600 seconds (1 hour). This prevents accidentally leaving safe mode running indefinitely.
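The clamping rule amounts to the following. clamp_timeout is a hypothetical helper. Falling back to the configured default for a missing or non-numeric argument is an assumption, not documented behaviour.

```python
MIN_TIMEOUT = 30     # seconds
MAX_TIMEOUT = 3600   # seconds (1 hour)


def clamp_timeout(requested, default: int = 120) -> int:
    """Clamp a requested safe mode timeout to the supported range."""
    try:
        value = int(requested)
    except (TypeError, ValueError):
        value = default  # assumption: fall back to the configured timeout
    return max(MIN_TIMEOUT, min(MAX_TIMEOUT, value))
```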

What's the difference between "reboot" and "reload"?

  • Reboot restarts the entire system. This guarantees every service picks up the restored configuration. It takes longer (typically 30-60 seconds) but is the safest option.
  • Reload restarts services in-place without a full reboot. This is faster (typically 5-10 seconds) but some configuration changes may not fully apply without a reboot.

For remote firewalls, reboot is strongly recommended.

Can I customize the watchdog health check?

Yes. You can set any command that runs on the OPNsense shell. The special token %gateway% is replaced with the default gateway IP. Examples:

# Ping a specific host
ping -c 1 -W 3 8.8.8.8

# DNS resolution check
host opnsense.org

# HTTP reachability check
curl -s -o /dev/null -w "%{http_code}" http://10.0.0.1 | grep 200

# Ping from a specific source address (tests a specific interface)
ping -c 1 -W 3 -S 192.168.1.1 10.0.0.1

Does this work with os-git-backup?

Yes. When os-git-backup is installed, Auto Rollback triggers a git backup snapshot at the start of safe mode. This gives you a git-versioned record of the configuration at the point safe mode was entered.


Architecture

                    OPNsense Web UI
                          |
                    PHP Controllers
                     (MVC Pattern)
                          |
                    configd Backend
                    (actions.conf)
                          |
          +---------------+---------------+
          |               |               |
    safemode.py     watchdog.py      status.py
    (Safe Mode       (Cron           (Status
     Manager)        Watchdog)        Reporter)
          |               |
    timer_daemon.py   rollback.py
    (Background        (Rollback
     Countdown)        Executor)
          |               |
          +-------+-------+
                  |
            lib/common.py
            (Shared Utilities)
                  |
          +-------+-------+
          |               |
   config.xml       persistent state
   (OPNsense        (/conf/autorollback_
    Config)          pending.json)

Syshook Integration

Config saved ──> 50-autorollback (config hook)
                    Records config change timestamp
                    Enables watchdog tracking

System boots ──> 10-autorollback-recovery (early hook)
                    Checks for expired timers
                    Restores config before services start
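The early boot check can be sketched as shown here. The layout of /conf/autorollback_pending.json is not documented here. This sketch assumes it carries the expiry_time and backup_file fields shown in the status output. boot_recovery is a hypothetical name, and restore stands in for the rollback executor. Leaving an unexpired session alone is also an assumption.

```python
import json
import os
import time
from typing import Callable, Optional

PENDING_FILE = "/conf/autorollback_pending.json"  # path from the architecture diagram


def boot_recovery(restore: Callable[[str], None],
                  pending_file: str = PENDING_FILE,
                  now: Optional[float] = None) -> bool:
    """Restore the snapshot if safe mode was left pending past its deadline.

    Returns True when a restore was performed.
    """
    if not os.path.exists(pending_file):
        return False  # no safe mode session was in progress
    with open(pending_file) as f:
        pending = json.load(f)  # assumed fields: expiry_time, backup_file
    now = time.time() if now is None else now
    if now < pending["expiry_time"]:
        return False  # assumption: an unexpired session is left to the timer
    restore(pending["backup_file"])  # write the known-good config back
    os.remove(pending_file)          # clear state so the next boot is clean
    return True
```

Because the check only reads a small JSON file and the clock, it can run as an early hook before any service has parsed config.xml.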

Development

Running Tests

cd os-autorollback
python -m pytest tests/ -v --tb=short

Coverage Report

python -m pytest tests/ --cov=sysutils/autorollback/src/opnsense/scripts/autorollback \
    --cov-report=term-missing -v

Current coverage: 92% across 226 tests.

| Module | Statements | Coverage |
|---|---|---|
| common.py | 251 | 91% |
| rollback.py | 188 | 94% |
| safemode.py | 180 | 84% |
| status.py | 52 | 98% |
| timer_daemon.py | 87 | 99% |
| watchdog.py | 164 | 97% |

Project Structure

os-autorollback/
  sysutils/autorollback/
    Makefile                          # OPNsense build config
    pkg-descr                         # Package description
    src/
      etc/
        inc/plugins.inc.d/
          autorollback.inc            # Cron and service registration
        rc.syshook.d/
          config/
            50-autorollback           # Config change hook
          early/
            10-autorollback-recovery  # Boot recovery hook
      opnsense/
        mvc/app/
          controllers/                # PHP MVC controllers
          models/                     # XML data model and ACL
          views/                      # Volt templates (UI)
        scripts/autorollback/         # Python backend
          lib/common.py               # Shared utilities
          safemode.py                 # Safe mode manager
          timer_daemon.py             # Background countdown
          watchdog.py                 # Health monitor
          rollback.py                 # Rollback executor
          status.py                   # Status reporter
        service/conf/actions.d/
          actions_autorollback.conf   # configd action definitions
        www/js/
          autorollback_banner.js      # Global countdown banner
          widgets/
            AutoRollback.js           # Dashboard widget
            Metadata/
              AutoRollback.xml        # Widget metadata (endpoints, translations)
  tests/                              # 226 unit tests
    conftest.py                       # Shared fixtures
    test_common.py
    test_safemode.py
    test_timer_daemon.py
    test_watchdog.py
    test_rollback.py
    test_status.py
    test_syshooks.py

Inspired By


License

BSD-2-Clause. See LICENSE.


Built for the OPNsense community.

About

OPNsense plugin for automatic configuration rollback with safe mode, watchdog health checks, and early boot recovery
