[SIP] Entity Version History for Dashboards, Charts, and Datasets
Status: Draft
Scope: Backend only. This SIP covers the version capture, restore, retention, and structured change-record APIs. A user-facing UI (version history browser, "what changed" view, one-click restore) is intentionally out of scope here and will land in a follow-up SIP. The endpoints described below are stable enough for a UI to consume; the UI proposal will refine the rendering of change records and the restore confirmation flow.
Motivation
Every save to a dashboard, chart, or dataset in Superset is destructive — the previous state is permanently lost. Users who accidentally overwrite a dashboard layout, corrupt a set of filters, or remove calculated metrics have no recovery path short of a database backup, which requires admin intervention and may not be available.
This is a consistent source of support tickets and user frustration. Other platforms offer built-in version history with one-click restore. The absence of this capability in Superset is a pain point for organizations managing critical dashboards.
Proposed Change
Add automatic version history to dashboards, charts, and datasets. Every save creates a new version. Users can browse the history of any entity, inspect what each save changed, and restore a previous version. Restore is non-destructive: it produces a new version row that brings the entity back to a prior state, leaving the version chain intact.
Strategy at a glance
| Concern | Strategy |
| --- | --- |
| Capture | Use SQLAlchemy-Continuum to track every save automatically. Continuum mirrors each versioned class (parents Dashboard / Slice / SqlaTable and children TableColumn / SqlMetric) into a shadow table; the dashboard_slices M2M gets its own shadow. No save-path instrumentation needed beyond a single make_versioned() call at app init. |
| Baseline | The first save of a pre-existing entity captures a synthetic operation_type=0 row representing the state before that save, so the version history for that entity isn't a single mystery edit. The baseline is treated like any other historical row by retention — it ages out alongside the rest of the history. |
| Restore | Continuum's Reverter is the engine; VersionDAO.restore_version() calls it once per related collection (split-revert) and stamps changed_on / changed_by_fk on the live entity so the restoring commit is attributed to the user who clicked Restore. The new shadow row that the restoring commit produces is what makes restore "non-destructive". |
| Change log | A diff-on-flush listener writes structured per-field change records into version_changes (one row per atomic change, keyed to version_transaction.id). Captured forward at save time so the UI can render "Added column 'country'" / "Renamed dashboard" without diffing snapshots at read time. |
| Retention | Time-based, default 30 days, configurable via the SUPERSET_VERSION_HISTORY_RETENTION_DAYS env var. A scheduled Celery beat task ages out old shadow rows. Only the live (current) row is preserved regardless of age; closed historical rows, including the baseline, age out. |
| API surface | Two new endpoints per entity type — GET /api/v1/{resource}/<uuid>/versions/ and POST /api/v1/{resource}/<uuid>/versions/<version_uuid>/restore — plus an ETag: "<version_uuid>" header on save responses and editor-fetch GETs so future UI work can do optimistic locking. |
The four save-time pieces (Continuum capture, baseline listener, change-record listener, and the M2M / child shadow tables that come along with __versioned__) are all wired up in a single init_versioning() step at app start. The three persistence layers (Continuum shadows, version_transaction, version_changes) are added by three Alembic migrations. The Celery beat task is registered in the existing CELERYBEAT_SCHEDULE. No feature flag, no UI in v1.
Per-entity restore scope
| Entity | Restored | Not restored |
| --- | --- | --- |
| Dashboard | scalar fields (dashboard_title, position_json, json_metadata, slug, css, description, certified_by, certification_details, published, external_url, theme_id, is_managed_externally); chart membership (dashboard_slices — which charts are attached, in what layout) | owners, roles, tags |
| Chart | scalar fields (slice_name, params, viz_type, description, cache_timeout, certified_by, certification_details, external_url, is_managed_externally); the dataset linkage (datasource_id, datasource_type) | owners, tags, query_context (cached/regenerated), the dataset's own state |
| Dataset | scalar fields (table_name, schema, catalog, database_id, sql, description, main_dttm_col, default_endpoint, offset, cache_timeout, params, template_params, extra, fetch_values_predicate, is_sqllab_view, is_managed_externally); calculated columns (TableColumn rows — column_name, expression, type, verbose_name, is_dttm, groupby, filterable, etc.); metrics (SqlMetric rows — metric_name, expression, metric_type, verbose_name, d3format, currency, etc.) | owners, row_level_security_filters, tags, charts that depend on this dataset |
The remainder of this section drills into each piece.
Continuum-driven scalar capture
SQLAlchemy-Continuum is added as a dependency and configured in superset/extensions/__init__.py:
```python
make_versioned(
    user_cls=None,
    transaction_cls=VersionTransactionFactory(),  # custom name
    plugins=[VersioningFlaskPlugin()],            # JWT-aware user attribution
    options={"strategy": "validity"},
)
```
Two custom subclasses fix integration friction:
- VersionTransactionFactory renames Continuum's default transaction table to version_transaction (avoiding a collision with downstream extensions that use the unqualified name) and the PostgreSQL sequence to version_transaction_id_seq.
- VersioningFlaskPlugin overrides transaction_args() to read the acting user via superset.utils.core.get_user_id() instead of flask_login.current_user. The default plugin reads current_user, which Flask-Login populates under cookie-authenticated sessions but not under JWT auth (JWT bypasses Flask-Login). get_user_id() reads g.user, which both auth modes populate, so the custom plugin attributes API saves consistently.
The three entity types declare which columns to track:
```python
class Dashboard(...):
    __versioned__ = {"exclude": ["slices", "owners", "roles"]}

class Slice(...):
    __versioned__ = {"exclude": ["query_context", "owners", "dashboards"]}

class SqlaTable(...):
    __versioned__ = {"exclude": ["owners", "row_level_security_filters"]}
```
Slice.query_context is excluded for three reasons: (a) it's a derived field — the result of running the chart, populated by /api/v1/chart/data and reproducible from params plus the datasource; (b) its size (typically 10–50 KB, can exceed 100 KB) would dominate slices_version rows and version_changes.from_value/to_value payloads; (c) it's touched on every chart run, not just on user edits, so versioning it would flood the history with non-user-authored noise. Restoring a chart's params is enough to recover its definition — the next render regenerates query_context from scratch.
Once the soft-delete SIP merges, deleted_at will be added to all three exclude lists so that soft-deleting an entity does not produce a content-edit version row.
make_versioned() registers SQLAlchemy mapper events that activate when __versioned__ models go through configure_mappers() (lazily, at first session use). It must therefore run before any code path that triggers mapper configuration. For Flask-SQLAlchemy this means in extensions/__init__.py immediately after db = SQLAlchemy(), before any model module is imported by the app initialiser.
Synthetic baseline capture
A before_flush listener (register_baseline_listener()) inserts a synthetic baseline row (operation_type=0) the first time a pre-existing entity is updated under versioning. This means the very first save of any pre-existing dashboard, chart, or dataset produces two version rows — the baseline (state before the edit) and the edit itself — giving users an immediate rollback point. The listener is prepended (insert=True) in the SQLAlchemy event chain so the baseline's transaction id is allocated before Continuum's own listener allocates the update's, keeping the baseline sorted first in the version history.
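The ordering guarantee from insert=True can be illustrated with a pure-Python sketch (the listener list and counter are stand-ins for SQLAlchemy's event chain and the transaction-id sequence, not the real implementation):

```python
listeners = []

def register(fn, insert=False):
    # Mirrors SQLAlchemy's event.listen(..., insert=True): prepend so fn runs first.
    listeners.insert(0, fn) if insert else listeners.append(fn)

next_tx_id = 1
allocated = []

def allocate(label):
    global next_tx_id
    allocated.append((label, next_tx_id))
    next_tx_id += 1

# Continuum's own before_flush is registered first, at make_versioned() time...
register(lambda: allocate("update"))
# ...then the baseline listener is prepended, so its tx id is allocated first.
register(lambda: allocate("baseline"), insert=True)

for fn in listeners:
    fn()

# allocated: baseline gets tx 1, the update gets tx 2 — baseline sorts first in history
```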
Continuum shadow tables for child state
TableColumn, SqlMetric, and the dashboard_slices association are versioned via Continuum's auto-generated shadow tables on the same validity strategy as the parents:
| Shadow table | Mirrors | Notes |
| --- | --- | --- |
| dashboards_version, slices_version, tables_version | parents | Continuum-default per-class shadows with the validity strategy |
| table_columns_version, sql_metrics_version | dataset children | __versioned__ on TableColumn / SqlMetric excludes the four audit fields (changed_on/by_fk, created_on/by_fk) so child diffs don't surface "changed_on shifted" noise on every parent save |
| dashboard_slices_version | dashboard slices M2M | depends on sc-105349-composite-association-pks for the live composite-PK shape — Continuum's M2M tracker builds shadow inserts from **params + transaction_id + operation_type, where params is what the live INSERT carries; the composite-PK shape lets that map to the shadow's PK cleanly |
Three pieces of the design let Continuum-tracked children work cleanly against Superset's existing entity model:
- Natural-key upsert in DatasetDAO._override_columns. The dataset-edit path keys child writes on column_name (and metric_name for SqlMetric), updating in place rather than deleting and reinserting under fresh primary keys. Stable PKs across edits keep the shadow trail unambiguous — no overlapping (id, column_name) rows for the same logical column.
- Composite PK on dashboard_slices (sc-105349-composite-association-pks). The live association table uses a composite PK on (dashboard_id, slice_id) instead of a surrogate id. Continuum's M2M tracker builds shadow inserts from **params + transaction_id + operation_type, where params is what the live INSERT carries. With the composite-PK live shape, that maps to the auto-generated dashboard_slices_version PK cleanly.
- Slice baseline under the dashboard's baseline tx. The baseline listener synthesises operation_type=0 shadow rows in dashboard_slices_version AND in slices_version for each attached slice that has no prior shadow, using the dashboard's baseline transaction id. The M2M version-side restore query joins dashboard_slices_version with slices_version and filters by validity at the dashboard's tx — without the slice-side baseline, dashboards whose membership predates versioning would restore to an empty slices list.
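The natural-key upsert behind the first bullet can be sketched in pure Python (the dict-shaped rows and helper name are illustrative only, not DatasetDAO's actual signature):

```python
def upsert_columns(live_rows, incoming):
    """Key child writes on column_name: update in place, never delete-and-reinsert."""
    by_name = {row["column_name"]: row for row in live_rows}
    result = []
    for col in incoming:
        existing = by_name.get(col["column_name"])
        if existing is not None:
            existing.update(col)      # same PK survives the edit → unambiguous shadow trail
            result.append(existing)
        else:
            result.append(dict(col))  # genuinely new column; gets a fresh PK at flush
    return result

live = [{"id": 7, "column_name": "country", "expression": "country_code"}]
edited = upsert_columns(live, [{"column_name": "country", "expression": "UPPER(country_code)"}])
# edited[0]["id"] == 7 — the PK is stable across the edit, so the shadow trail stays unambiguous
```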
The restore path uses Continuum's Reverter natively, with one further accommodation — a split-revert loop, described below — to avoid a Reverter / autoflush / cascade-add interaction that fires when revert(relations=['a','b']) is given two or more uselist relations and the target tx requires removing live children.
Restore
VersionDAO.restore_version() uses Continuum's Reverter natively, with a split-revert loop. For each relation in _RESTORE_RELATIONS[<class>] (datasets: ["columns", "metrics"]; dashboards: ["slices"]; charts: []):
1. target_version.revert(relations=[<one>]) — copies versioned scalar fields onto the live entity AND rebuilds the named relationship from the version-side shadows.
2. db.session.flush() — pushes pending child DELETEs to the DB.
3. db.session.expire(entity, _RESTORE_RELATIONS[<class>]) — invalidates the parent's in-memory collection so the next iteration's cascade-add doesn't trip on a deleted-state child instance.
After all relations are processed, the live entity's changed_on and changed_by_fk are overwritten with datetime.now() and the restoring user's id, so the resulting new version row attributes the restore to the right user. created_on / created_by_fk are left untouched (they continue to record original authorship).
The split-revert pattern addresses a Reverter / SQLAlchemy autoflush / cascade-add interaction that fires when revert(relations=['a','b']) is given two or more uselist relations and the target tx requires removing live children. Single-relation calls don't open the autoflush window between iterations, and the explicit flush + expire clears any deleted-state instances from the parent's in-memory collection before the next call's cascade-add runs.
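A minimal sketch of the split-revert loop, with the session and version objects as stand-ins (the real code drives Continuum's Reverter via target_version.revert; the helper name here is hypothetical):

```python
def split_revert(entity, target_version, relations, session):
    """Revert one relation per Reverter call, flushing and expiring between calls."""
    if not relations:                            # charts: scalar-only revert
        target_version.revert(relations=[])
        session.flush()
        return
    for rel in relations:
        target_version.revert(relations=[rel])   # scalars + this one collection
        session.flush()                          # push pending child DELETEs now
        session.expire(entity, relations)        # drop deleted-state children from memory
```

One relation per call keeps the autoflush window closed between iterations; the flush + expire pair clears any deleted-state instances before the next call's cascade-add runs.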
Tags, owners, and roles are deliberately excluded across all three entity types — they are access/organisation metadata, not user-authored content. Restoring them would surprise users by retroactively changing collaborator or permission state. The exclusion is enforced by __versioned__["exclude"] on each parent class.
Structured change records
A version_changes table records an atomic per-field diff for every save. Schema:
| Column | Type | Notes |
| --- | --- | --- |
| id | BIGINT PK | |
| transaction_id | BIGINT FK → version_transaction.id, ON DELETE CASCADE | |
| entity_kind | VARCHAR(32) | chart / dashboard / dataset |
| entity_id | INTEGER | |
| sequence | SMALLINT | 0-based, per (transaction_id, entity_kind, entity_id) |
| kind | VARCHAR(32) | field, filter, metric, time_range, color_palette, dimension, column, chart |
| path | JSON | e.g. ["dashboard_title"], ["params", "adhoc_filters", "country"], ["json_metadata", "refresh_frequency"] |
| from_value, to_value | JSON | JSON-safe scalar values (datetime / UUID / bytes coerced to ISO / hex / str) |
Indexes and constraints:
- UNIQUE (transaction_id, entity_kind, entity_id, sequence) — guards against duplicate inserts when after_flush re-fires within a single transaction (see Listener registration order below).
- INDEX (transaction_id) — supports the FK lookup and the per-version "what changed" query.
- INDEX (entity_kind, entity_id) — supports per-entity history queries (e.g., "show all change records for chart 42").
- INDEX (kind) — supports kind-filtered queries (e.g., "show every filter change across the workspace").
A pure-Python diff engine (superset/versioning/diff.py) walks pre-state vs post-state and emits the structured records that power per-version labels in the UI ("Added filter country", "Changed time range", etc.). The records themselves are machine-readable and language-neutral; rendering them into localized strings is a UI concern via Flask-Babel t(), deferred to the follow-up UI SIP. Highlights:
- Slice.params is JSON-parsed and walked: known keys (adhoc_filters, metrics, time_range, etc.) are promoted to first-class kinds; unknown keys fall through to kind="field".
- Dashboard.json_metadata and position_json are JSON-parsed and walked at the top level (full Phase 2 nested-structure diff is deferred).
- null ↔ "" transitions are filtered as audit noise (Superset's save paths normalize nullable strings to "").
- Per-entity audit-field exclusions: Slice.last_saved_at, Slice.last_saved_by_fk, and params.slice_id are stamped on every chart save and produce no record.
- Datetime / UUID / bytes are coerced to JSON-safe strings before storage.
The capture listener (register_change_record_listener()) runs before_flush (read parent pre-state, compute scalar diff) and after_flush (read child shadow tables for child-collection diffs, bulk-insert records).
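A condensed sketch of the scalar diff walk — record shape taken from the version_changes schema above; the kind-promotion map is illustrative, not the full diff.py:

```python
# Hypothetical subset of the known-key → kind promotion table
PROMOTED_KINDS = {"adhoc_filters": "filter", "metrics": "metric", "time_range": "time_range"}

def diff_scalars(pre, post):
    """Walk pre-state vs post-state dicts and emit one record per atomic change."""
    records = []
    for key in sorted(set(pre) | set(post)):
        old, new = pre.get(key), post.get(key)
        if old == new:
            continue
        if {old, new} <= {None, ""}:   # null ↔ "" normalisation noise: skip
            continue
        records.append({
            "kind": PROMOTED_KINDS.get(key, "field"),  # unknown keys fall through to "field"
            "path": [key],
            "from_value": old,
            "to_value": new,
        })
    return records

recs = diff_scalars(
    {"slice_name": "Sales", "time_range": "Last week", "description": None},
    {"slice_name": "Sales", "time_range": "Last month", "description": ""},
)
# one record: the time_range change; the None → "" description flip is filtered as noise
```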
Listener registration order
Three listeners cooperate at flush time, and their registration order in init_versioning() is load-bearing:
- Continuum's own before_flush (registered by make_versioned() at app init) — allocates version_transaction.id and writes shadow rows.
- register_baseline_listener() — registered with insert=True so its before_flush runs before Continuum's, allocating the synthetic baseline tx id ahead of the update tx id. Also walks dirty/new/deleted children back to their parent so a child-only flush still triggers the parent's baseline; synthesises op=0 shadows for pre-versioning children and dashboard slices.
- register_change_record_listener() — registered LAST so its after_flush runs after Continuum has written the new shadow rows, then reads child shadow rows (table_columns_version / sql_metrics_version / dashboard_slices_version joined with slices_version) and emits per-change records into version_changes.
The change-record listener also has to handle after_flush re-firings within a single transaction (autoflush triggered by mid-commit queries can fire the chain more than once). It tracks processed transaction ids on session.info so the second firing is a no-op; without this dedup the unique constraint on version_changes would trip on duplicate inserts.
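The session.info dedup can be sketched as follows (the plain dict stands in for SQLAlchemy's session.info scratch space; the key name and helper are hypothetical):

```python
def emit_change_records_once(info, transaction_id, emit):
    """Return True if records were emitted; False if this tx was already processed."""
    processed = info.setdefault("_version_changes_seen", set())
    if transaction_id in processed:
        return False            # after_flush re-fired within the same transaction: no-op
    processed.add(transaction_id)
    emit(transaction_id)        # bulk-insert the change records exactly once
    return True

info, written = {}, []
emit_change_records_once(info, 42, written.append)   # first firing: emits
emit_change_records_once(info, 42, written.append)   # re-firing: deduped, no-op
# written == [42] — the unique constraint on version_changes is never tripped
```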
Time-based retention via Celery
Retention is time-based: version rows whose owning version_transaction.issued_at is older than SUPERSET_VERSION_HISTORY_RETENTION_DAYS (default 30) are pruned. The setting reads from the environment variable of the same name (with superset_config.py override); 0 disables pruning entirely.
Pruning runs as a scheduled Celery beat task rather than synchronously on the save path. This avoids adding latency to user saves and lets the prune run at off-peak hours. The task — version_history.prune_old_versions, defined in superset/tasks/version_history_retention.py — is registered in CELERYBEAT_SCHEDULE to run daily at 03:00 by default. Operators can change the schedule via the existing CELERYBEAT_SCHEDULE override in superset_config.py.
The task:
- Resolves the candidate transaction set: version_transaction rows with issued_at < (now() - retention_days), then filters out any transaction whose parent shadow rows include a live row (end_transaction_id IS NULL). The live row of any versioned entity must always remain queryable, so its owning transaction is preserved no matter how old issued_at is. Closed historical rows — including the synthetic baseline (operation_type=0) — are NOT preserved separately; they age out with the rest of the history.
- Deletes shadow rows in the parent tables (dashboards_version, slices_version, tables_version) for the surviving candidate transactions.
- Deletes shadow rows in the child tables (table_columns_version, sql_metrics_version, dashboard_slices_version) for the same transactions.
- Drops the version_transaction rows themselves. version_changes rows cascade automatically via the ON DELETE CASCADE FK on version_transaction.id.
Failures (DB errors, missing tables) are caught, logged, and the next scheduled run retries from a clean slate. The task is idempotent — a second run with no time elapsed prunes nothing.
A pure-Python _prune_old_versions_impl(retention_days) is exposed alongside the Celery wrapper so unit / integration tests can exercise the prune deterministically by passing a fake retention value (and optionally backdating version_transaction.issued_at to force eligibility).
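The candidate-selection rule can be illustrated with a pure function (the tuple shape is hypothetical; the real task queries version_transaction joined against the parent shadow tables):

```python
from datetime import datetime, timedelta

def prunable_transactions(transactions, retention_days, now=None):
    """transactions: iterable of (tx_id, issued_at, owns_live_row) tuples."""
    if retention_days == 0:
        return []                                    # 0 disables pruning entirely
    cutoff = (now or datetime.utcnow()) - timedelta(days=retention_days)
    return [
        tx_id
        for tx_id, issued_at, owns_live_row in transactions
        if issued_at < cutoff and not owns_live_row  # never prune the live row's tx
    ]

now = datetime(2025, 6, 1)
txs = [
    (1, datetime(2025, 4, 1), False),   # old, closed → prunable
    (2, datetime(2025, 4, 1), True),    # old but owns the live row → kept
    (3, datetime(2025, 5, 30), False),  # within the 30-day window → kept
]
# prunable_transactions(txs, 30, now=now) → [1]; retention_days=0 → []
```

Running it twice with the same inputs returns the same result, which is the idempotency property the task relies on.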
A synchronous (per-save) prune was considered and rejected: it added per-save latency proportional to retention work, and gave installations no way to skip pruning during peak hours.
New configuration keys
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| SUPERSET_VERSION_HISTORY_RETENTION_DAYS | int | 30 | Version rows older than this many days are pruned by the version_history.prune_old_versions Celery beat task. 0 disables pruning. Read from the environment variable of the same name; can be overridden in superset_config.py. |
New or Changed Public Interfaces
New endpoints (same shape across the three entity types)
| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/{resource}/<uuid>/versions/ | List the entity's version history |
| GET | /api/v1/{resource}/<uuid>/versions/<version_uuid>/ | Get a single version snapshot (scalar fields plus restored columns / metrics for datasets) |
| POST | /api/v1/{resource}/<uuid>/versions/<version_uuid>/restore | Restore the entity to that version |
{resource} ∈ {chart, dashboard, dataset}.
Endpoints are UUID-keyed end-to-end. version_uuid is a deterministic v5 derivation from (entity_uuid, transaction_id), stable across requests, so clients can bookmark a specific version.
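The deterministic derivation can be sketched with the stdlib — the exact name string fed to uuid5 is an assumption; the SIP specifies only "v5 from (entity_uuid, transaction_id)":

```python
import uuid

def version_uuid(entity_uuid, transaction_id):
    # v5 = SHA-1 namespaced UUID: the same inputs always yield the same version_uuid,
    # so clients can bookmark a specific version without the server storing anything.
    return uuid.uuid5(entity_uuid, str(transaction_id))

dash = uuid.UUID("3f2b8c1e-0000-4000-8000-000000000001")  # example entity UUID
# version_uuid(dash, 7) is stable across requests; version_uuid(dash, 8) differs
```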
All three endpoints carry the standard Superset decorator stack: @protect() (FAB model-level can_write check), @safe, @statsd_metrics, and @event_logger.log_this_with_context so each call appears in FAB's action_log alongside other audited API operations.
Authorization layering:
- Restore (POST /versions/<version_uuid>/restore) enforces both model-level (@protect()) and row-level access via security_manager.raise_for_ownership(entity) inside the restore command. A user with can_write Dashboard who lacks per-dashboard role access on a specific dashboard cannot restore that dashboard. Workspace admins bypass via the existing FAB mechanism.
- List / Get (GET /versions/, GET /versions/<version_uuid>/) enforce model-level (@protect()) but NOT row-level access. The UUID requirement is defense-in-depth (UUIDs aren't enumerable from the public API surface), but a user with model-level can_write plus knowledge of a dashboard's UUID could read its version history without per-dashboard role access. Whether to tighten this to mirror restore's model is an open question — see Open Questions (Permissions model).
GET /versions/ and GET /versions/<version_uuid>/ responses include a changes array on each entry, shape [{kind, path, from_value, to_value}]. For operation_type=0 (baseline / first-create) transactions the array is empty by design.
Concurrency tokens via ETag
To enable optimistic locking on saves (and to give the future UI SIP a stable handle for "you're editing version N" affordances), V1 emits an ETag response header carrying the current version_uuid on:
- GET /api/v1/chart/<pk> / <uuid> — single chart fetch (drives the Explore editor)
- GET /api/v1/dashboard/<pk> / <uuid> — single dashboard fetch (drives the dashboard editor)
- GET /api/v1/dataset/<pk> / <uuid> — single dataset fetch (drives the dataset editor)
- PUT /api/v1/{resource}/<pk> save responses — carry the new version_uuid produced by the save
- The new version endpoints (GET /versions/, GET /versions/<version_uuid>/) — carry the version's own version_uuid so clients can directly cache it
The header value is the strong-form ETag "<version_uuid>" (RFC 7232 quoted), where version_uuid is the same deterministic v5 derivation from (entity_uuid, transaction_id) documented above. Cross-origin clients reading the API (notably future write-enabled embedded surfaces, third-party integrations, and the OpenAPI-generated SDK) will see the header via Access-Control-Expose-Headers: ETag set on the affected responses.
V1 ships the data backbone only. The server emits the ETag everywhere a client would need it to track entity versions, but does NOT enforce If-Match on writes — saves go through whether or not the request includes a precondition header. The locking semantics (412 Precondition Failed on stale If-Match, the UI conflict-resolution flow, whether to block or warn) are deferred to the follow-up UI SIP. Clients adopting V1 can already cache the ETag from every response and use it in their own client-side conflict detection if they want; the server-side gate lands later.
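For reference, the deferred server-side gate would follow the standard RFC 7232 If-Match pattern. A hedged sketch of the status-code logic — this is not shipped in V1, and the function name is hypothetical:

```python
def save_status(if_match, current_etag):
    """If-Match precondition check for an entity save (deferred beyond V1)."""
    if if_match is None:
        return 200      # V1 behaviour: no precondition header, save proceeds
    if if_match == "*" or if_match == current_etag:
        return 200      # client holds the current version
    return 412          # Precondition Failed: stale ETag, surface a conflict

etag = '"3f2b8c1e-0000-5000-8000-000000000001"'  # strong-form ETag, RFC 7232 quoted
# save_status(None, etag) → 200; save_status(etag, etag) → 200; stale ETag → 412
```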
ETag was chosen over a body field (json.result.version_uuid) because (a) headers don't pollute the response body schema, (b) ETag / If-Match is the standard HTTP optimistic-locking pattern that every client library already understands, and (c) embedded dashboards (read-only via guest tokens; the surface most sensitive to body-shape changes) are unaffected by adding response headers — the header is simply ignored by clients that don't read it.
Model changes
Dashboard, Slice, and SqlaTable gain a __versioned__ class attribute. TableColumn and SqlMetric also gain __versioned__ (with the four audit fields excluded). dashboard_slices is reshaped by sc-105349-composite-association-pks to a composite PK on (dashboard_id, slice_id) — no other columns or relationships are modified.
New tables
| Table | Purpose |
| --- | --- |
| version_transaction | Continuum's transaction log (renamed from default transaction) |
| dashboards_version, slices_version, tables_version | Continuum parent shadow tables |
| table_columns_version, sql_metrics_version | Continuum shadows for dataset children |
| dashboard_slices_version | Continuum shadow for the dashboard ↔ chart M2M (depends on sc-105349) |
| version_changes | Structured per-field diff records |
Naming note: shadows follow Continuum's <entity>_version convention; the bookkeeping/diff tables (version_transaction, version_changes) use a version_* prefix.
New Dependencies
- sqlalchemy-continuum >= 1.6.0, < 2.0.0 — automatic shadow-table versioning for SQLAlchemy models. No other runtime dependencies are added; the diff engine, listeners, and restore logic are pure stdlib + SQLAlchemy.
- Stack dependency on sc-105349-composite-association-pks — reshapes the live dashboard_slices table to a composite PK on (dashboard_id, slice_id) so Continuum's M2M tracker produces shadow inserts whose shape matches the auto-generated dashboard_slices_version. Addresses SQLAlchemy-Continuum issue #129 (M2M restore against junction tables with surrogate PKs).
Migration Plan and Compatibility
Three Alembic migrations land in order:
1. add_entity_version_history_tables — creates version_transaction and the three parent shadow tables (dashboards_version, slices_version, tables_version).
2. add_version_changes_table — creates version_changes.
3. add_child_continuum_shadow_tables — creates table_columns_version, sql_metrics_version, and dashboard_slices_version.
All three migrations are reversible. downgrade() drops every table they create. Round-trip tested on SQLite, PostgreSQL, and MySQL.
The migrations are backwards-compatible — adding new tables does not affect existing rows or behaviour. Rolling back loses captured version history but leaves the live entity tables untouched.
Import/export: the v1 import pipeline (ImportDashboardsCommand, ImportChartsCommand, ImportDatasetsCommand) uses ORM add() / merge() / setattr and per-entity setattr for property updates, so all overwrites trigger the SQLAlchemy event chain and produce version rows attributed to the importing user. The one bulk-DML edge case — DatasetDAO.update_columns()'s bulk_insert_mappings / query().delete() path — is rewritten to issue individual ORM session.delete() / session.add() calls so Continuum's flush hooks fire on column changes. No additional special handling for imports is required beyond that rewrite.
Direct database access: the version-history tables (Continuum shadows + version_transaction + version_changes) are written only by the listener chain. External tooling that writes directly to the main entity tables will not have version rows captured for those writes.
Operational notes
Brief notes on how versioning interacts with adjacent surface areas. None of these are open design questions:
- Embedded dashboards / guest tokens. Version endpoints all require can_write on the underlying entity model (@protect() decorator). Guest tokens grant read-only access scoped to a specific dashboard, so they cannot reach the version endpoints; the embedded surface is unaffected.
- CLI imports. superset import-dashboards <file> and the equivalent commands flow through the same v1 import pipeline as the API and produce version rows the same way: a baseline (if the entity had no prior versions) plus an update row per imported entity. A 100-dashboard bulk import grows each entity's history by at most two rows (baseline plus update); subsequent retention pruning ages those rows out after SUPERSET_VERSION_HISTORY_RETENTION_DAYS days.
- UUID-keyed version endpoints alongside integer-keyed CRUD. Existing /api/v1/{resource}/<pk> endpoints keep their integer <pk> parameter; the new version endpoints take <uuid> to align with the broader Superset UUID migration. Clients fetch the entity's UUID from the standard list/get response and use it in subsequent version calls. The two paths coexist deliberately; nothing about the existing endpoints changes.
- Plugin / inheritance behaviour. Forks and downstream deployments that subclass Slice, Dashboard, or SqlaTable inherit __versioned__ automatically — derived classes get versioned without explicit opt-in. This is the desired behaviour (any persistence model that derives from a versioned parent is conceptually a specialisation of the same content), but plugin authors should be aware that subclasses don't need their own __versioned__ declaration, and that excluding a column added in a subclass requires a __versioned__ override on the subclass.
Rejected Alternatives
-
Hybrid (Continuum for parent scalars, JSON snapshot tables for child collections) — Two purpose-built JSON-blob tables (dataset_snapshots, dashboard_snapshots) keyed on (entity_id, transaction_id) could store child state as columns_json / metrics_json / slice_ids_json instead of using Continuum's auto-generated child shadow tables. Restore would deserialize the JSON, expunge live children, and re-insert. Rejected because the storage cost of full-collection snapshots on every save (one row of N child rows per save) exceeds the row-per-changed-child cost of shadow tables for any non-trivial edit cadence, and because maintaining schema changes for a JSON blob is more challenging than maintaining table schemas.
-
Custom JSON snapshot table for everything — A single entity_versions table with a snapshot JSONB column. Restore would be simpler (deserialize and setattr), and Continuum's shadow tables would be avoided entirely. Rejected because Continuum captures scalar updates automatically across all three entity types without instrumenting every save path, and a fully-custom approach would need manual schema synchronisation as entity models evolve.
-
SQLAlchemy-History — A fork of Continuum. Inactive, smaller community, no meaningful feature advantage.
-
Database-level temporal tables / system versioning — Native or extension-based row-history tracking via SQL semantics (FOR SYSTEM_TIME AS OF …). Would shift capture into the engine and eliminate the Python library dependency. Rejected for two reasons. (a) Cross-database support is uneven: MariaDB 10.3+ has native WITH SYSTEM VERSIONING; PostgreSQL has no native system versioning (the third-party temporal_tables extension exists but isn't in core); MySQL has no native system versioning either (commonly confused with MariaDB); SQLite has none and would need triggers that sacrifice atomicity. Supporting all four would require four backend-specific paths plus a fallback. (b) Application-level concerns still cost code: temporal tables capture what changed but not who (user attribution would still need triggers or an application-level transaction table — re-inventing Continuum's version_transaction); historical queries are read-only, so restore is still application code; the version_changes field-level diff records need a diff engine regardless; and the cross-table identity problems (override_columns, dashboard_slices) that drove R-017/R-018 are independent of where snapshots live. With Continuum the Python implementation is identical across all four backends.
-
Single-call multi-relation Reverter — Calling target_version.revert(relations=["columns", "metrics"]) (or relations=["slices"]) in one shot. Rejected for restore because it triggers a Reverter / autoflush / cascade-add interaction whenever the target tx requires removing live children: between iterating relation a and relation b, getattr(self.obj, prop.key) runs a Continuum query whose autoflush flushes the pending session.delete(live_child) from relation a's second loop, transitioning those instances to state.deleted=True; the final session.add(version_parent) then cascades through the parent's save-update collection and trips on the deleted-state instance with InvalidRequestError. The split-revert pattern (one relation per call, with flush + expire between calls) avoids the autoflush window entirely.
-
Synchronous (per-save) retention pruning — A second after_commit listener could prune at the moment a new version row is committed (e.g., delete the oldest row whenever a per-entity count threshold is exceeded). Rejected because the prune work falls onto the user's save path, adding latency proportional to retention churn; installations have no way to defer it to off-peak hours; and the model couples pruning cadence to save cadence (a rarely-edited entity never gets pruned). The Celery beat approach decouples retention from the request lifecycle.
- Hand-rolled diff engine vs. external library (deepdiff, dictdiffer, jsonpatch) — Considered and rejected. The on-disk path shape and kind classification (filter vs. metric vs. field etc.) are co-located with diff walking; external libraries return JSON-Pointer or string paths that would need translation. Child-collection identity uses natural keys (column_name, metric_name, slice uuid), not list indices, which is also non-default. The hand-rolled diff is ~500 LOC of pure functions with comprehensive unit-test coverage.
- Default-off feature flag (e.g. VERSION_HISTORY_ENABLED) — Considered and not added. The implementation is engineered to no-op cleanly when its tables don't exist (each listener narrow-catches OperationalError on the SQLite "no such table" startup race; broader DB errors propagate so operators can fix migrations rather than silently lose capture). A flag adds a configuration surface that can drift from the migration state — deployments could disable the flag while leaving the tables populated, then re-enable months later with stale data. The cleaner kill switch, if needed, is to skip the three migrations entirely. If post-launch evidence shows operators want a runtime off-switch, a flag can be added in a follow-up without breaking compatibility.
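The split-revert loop favoured over the single-call form above reduces to a small call-order contract. A sketch with stand-in objects (restore_with_split_revert is an illustrative name; session and target_version stand in for SQLAlchemy's Session and Continuum's version proxy, as used by VersionDAO.restore_version()):

```python
def restore_with_split_revert(session, entity, target_version, relations):
    """Revert one relation per call, with a flush + expire between calls.

    The interleaving keeps the autoflush window closed: pending child
    DELETEs are flushed explicitly, and the parent's in-memory collection
    is expired, so no deleted-state child instance is still reachable when
    the next revert()'s cascade-add runs.
    """
    for relation in relations:
        # one relation per revert() call, never two uselist relations at once
        target_version.revert(relations=[relation])
        # push this relation's pending child DELETEs to the DB now
        session.flush()
        # invalidate the parent's stale collection before the next iteration
        session.expire(entity, [relation])
```

The same sequence works for datasets (["columns", "metrics"]) and dashboards (["slices"]); charts have no restorable relations, so the loop body simply never runs.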
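The asynchronous model that replaces per-save pruning still needs a per-transaction eligibility rule. As a pure predicate, and assuming the retention semantics this SIP describes elsewhere (time-based cutoff, live row always preserved, 0 disables pruning), it can be sketched as follows; is_prunable is an illustrative name, and the real task would batch this in SQL rather than evaluate it row by row:

```python
from datetime import datetime, timedelta

def is_prunable(issued_at, has_live_row, retention_days, now):
    """Eligibility rule for pruning one version_transaction.

    - retention_days == 0 disables pruning entirely;
    - a transaction that owns any entity's live (current) shadow row is
      preserved regardless of age, so the current state stays queryable;
    - every other transaction ages out once issued_at passes the cutoff.
    """
    if retention_days == 0:
        return False
    if has_live_row:
        return False
    return issued_at < now - timedelta(days=retention_days)
```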
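The natural-key identity point above is easiest to see in a toy walk over one child collection (diff_children is a stand-in, not the engine's API): reordering columns emits nothing, while a single edited expression emits exactly one record keyed by column_name rather than by list position.

```python
def diff_children(before, after, key, kind):
    """Diff two child collections by natural key (e.g. column_name),
    emitting {kind, path, from_value, to_value} records. List position
    is irrelevant; identity is the natural key."""
    records = []
    before_by_key = {child[key]: child for child in before}
    after_by_key = {child[key]: child for child in after}
    for k in before_by_key.keys() - after_by_key.keys():   # removed children
        records.append({"kind": kind, "path": [k],
                        "from_value": before_by_key[k], "to_value": None})
    for k in after_by_key.keys() - before_by_key.keys():   # added children
        records.append({"kind": kind, "path": [k],
                        "from_value": None, "to_value": after_by_key[k]})
    for k in before_by_key.keys() & after_by_key.keys():   # edited children
        if before_by_key[k] != after_by_key[k]:
            records.append({"kind": kind, "path": [k],
                            "from_value": before_by_key[k],
                            "to_value": after_by_key[k]})
    return records
```

A library diffing by list index would instead report every column as changed after a reorder, which is exactly the translation burden the hand-rolled engine avoids.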
Open Questions
- Permissions model — Currently all three version endpoints require can_write. Open: should viewing version history allow can_read for auditability? Should restore require a separate, more privileged permission? Should there be a workspace-level toggle to disable versioning?
- is_managed_externally entities — Entities managed by external tools (Terraform, CI/CD, CLI imports) may already have version control via Git. Should restore be disabled or require confirmation for them? And will frequent CLI imports accumulate version rows too rapidly?
- Dataset restore and downstream impact — Restoring a dataset version restores columns and metrics to their historical state. Should dependent charts be notified or validated? What is the user experience when a restored dataset column no longer matches a chart's query?
- Deeper structural diff inside layout components — diff_dashboard_layout emits one record per logical action on a top-level component (chart added/removed/moved/edited, row added, etc.). Edits inside a chart's meta (resized, restyled) currently surface as a single edit record carrying the full meta dict; per-property records would be more useful for the UI. Deferred until the UI side is in scope and can validate the right granularity.
- Multi-entity transactions — A single save (e.g. dashboard import) can produce versions across multiple entities sharing one version_transaction.id. The list endpoint correctly filters per-entity, and the structured change records intentionally key on (transaction_id, entity_kind, entity_id) so unrelated edits don't collide. Worth confirming that the UI surfaces transactions cleanly when one user action touches multiple entities.
- Row-level access on GET /versions/ and GET /versions/<version_uuid>/ — The list and get endpoints enforce can_write at the FAB model level but do not call raise_for_ownership(entity) the way the restore endpoint does. A user with can_write Dashboard plus knowledge of a dashboard's UUID can read its version history even if Dashboard.roles would otherwise hide that dashboard from them. The UUID requirement is defence-in-depth but not strict authorization. Tightening to per-entity role enforcement on read is a small change — should we apply it for parity with restore, or leave the read surface looser by design?
- Concurrent edits — when to enforce If-Match. V1 emits the ETag header (carrying version_uuid) on the save and editor-fetch endpoints listed above, so the data backbone for optimistic locking is in place. What V1 does NOT do is enforce If-Match on writes — saves currently succeed regardless of whether the client sent a precondition header. Open: when do we turn enforcement on, and what does the UI do when it fires? A subtle banner suggesting a refresh? A blocking modal? A side-by-side merge view? The right answer is UX-driven and likely belongs in the follow-up UI SIP, not V1.
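When enforcement does land, the server-side gate itself is small. A hedged sketch, assuming RFC 7232 strong comparison against the ETag value the endpoints already emit (check_if_match is an illustrative name, and the missing-header policy shown, proceed as V1 does today, is precisely the open choice):

```python
def check_if_match(if_match, current_version_uuid):
    """Evaluate a write's If-Match precondition against the entity's
    current version_uuid (the value already emitted as a strong ETag).

    Returns None when the write may proceed, or 412 (Precondition
    Failed) when the client's ETag is stale. A missing header proceeds,
    matching V1's non-enforcing behaviour; whether enforcement should
    instead require the header is part of the open question above.
    """
    if if_match is None:
        return None                    # no precondition supplied
    if if_match.strip() == "*":
        return None                    # "*" matches any current version
    supplied = [tag.strip() for tag in if_match.split(",")]
    return None if '"%s"' % current_version_uuid in supplied else 412
```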
Compatibility
Soft-delete compatibility (a small set of test additions and one __versioned__["exclude"] extension) lands once #39464 merges.
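The shape of that one-line extension, as a sketch (the Dashboard class here is a stand-in for the real model, and the rest of its exclude list is elided; only where deleted_at lands is the point):

```python
class Dashboard:  # stand-in for the real versioned model
    __versioned__ = {
        # soft-deleting an entity must not register as a content edit,
        # so the soft-delete marker joins the exclude list once #39464
        # merges (the real list carries the other exclusions too)
        "exclude": ["deleted_at"],
    }
```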
Database-level temporal tables / system versioning — Native or extension-based row-history tracking via SQL semantics (
FOR SYSTEM_TIME AS OF …). Would shift capture into the engine and eliminate the Python library dependency. Rejected for two reasons. (a) Cross-database support is uneven: MariaDB 10.3+ has native `WITH SYSTEM VERSIONING`; PostgreSQL has no native system versioning (the third-party `temporal_tables` extension exists but isn't in core); MySQL has no native system versioning either (commonly confused with MariaDB); SQLite has none and would need triggers that sacrifice atomicity. Supporting all four would require four backend-specific paths plus a fallback. (b) Application-level concerns still cost code: temporal tables capture what changed but not who (user attribution would still need triggers or an application-level transaction table — re-inventing Continuum's `version_transaction`); historical queries are read-only, so restore is still application code; the `version_changes` field-level diff records need a diff engine regardless; and the cross-table identity problems (`override_columns`, `dashboard_slices`) that drove R-017/R-018 are independent of where snapshots live. With Continuum the Python implementation is identical across all four backends.

**Single-call multi-relation Reverter** — Calling `target_version.revert(relations=["columns", "metrics"])` (or `relations=["slices"]`) in one shot. Rejected for restore because it triggers a Reverter / autoflush / cascade-add interaction whenever the target transaction requires removing live children: between iterating relation a and relation b, `getattr(self.obj, prop.key)` runs a Continuum query whose autoflush flushes the pending `session.delete(live_child)` from relation a's second loop, transitioning those instances to `state.deleted=True`; the final `session.add(version_parent)` then cascades through the parent's `save-update` collection and trips on the deleted-state instance with `InvalidRequestError`. The split-revert pattern (one relation per call, with flush + expire between calls) avoids the autoflush window entirely.

**Synchronous (per-save) retention pruning** — A second `after_commit` listener could prune at the moment a new version row is committed (e.g., delete the oldest row whenever a per-entity count threshold is exceeded). Rejected because the prune work falls onto the user's save path, adding latency proportional to retention churn; installations have no way to defer it to off-peak hours; and the model couples pruning cadence to save cadence (a rarely edited entity never gets pruned). The Celery beat approach decouples retention from the request lifecycle.

**Hand-rolled diff engine vs. external library (`deepdiff`, `dictdiffer`, `jsonpatch`)** — Considered and rejected. The on-disk `path` shape and kind classification (`filter` vs. `metric` vs. `field`, etc.) are co-located with the diff walking; external libraries return JSON-Pointer or string paths that would need translation. Child-collection identity uses natural keys (`column_name`, `metric_name`, slice `uuid`), not list indices, which is also non-default. The hand-rolled diff is ~500 LOC of pure functions with comprehensive unit-test coverage.

**Default-off feature flag (e.g. `VERSION_HISTORY_ENABLED`)** — Considered and not added. The implementation is engineered to no-op cleanly when its tables don't exist (each listener narrow-catches `OperationalError` on the SQLite "no such table" startup race; broader DB errors propagate so operators can fix migrations rather than silently lose capture). A flag adds a configuration surface that can drift from the migration state — deployments could disable the flag while leaving the tables populated, then re-enable months later with stale data. The cleaner kill switch, if needed, is to skip the three migrations entirely. If post-launch evidence shows operators want a runtime off-switch, a flag can be added in a follow-up without breaking compatibility.

Open Questions
**Permissions model** — Currently all three version endpoints require `can_write`. Open: should viewing version history allow `can_read` for auditability? Should restore require a separate, more privileged permission? Should there be a workspace-level toggle to disable versioning?

**`is_managed_externally` entities** — Entities managed by external tools (Terraform, CI/CD, CLI imports) may already have version control via Git. Should restore be disabled or require confirmation? Frequent CLI imports would also accumulate version rows rapidly — is that acceptable, or should capture be suppressed for externally managed entities?

**Dataset restore and downstream impact** — Restoring a dataset version restores columns and metrics to their historical state. Should dependent charts be notified or validated? What is the user experience when a restored dataset column no longer matches a chart's query?
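As background for the restore questions above: the restore path reverts one relation per Reverter call, with a flush and expire between calls (the split-revert pattern discussed under rejected alternatives), which is exactly how a dataset's `columns` and `metrics` come back. A minimal sketch of the call sequence — `split_revert`, `FakeSession`, and `FakeVersion` are illustrative stand-in doubles, not the real SQLAlchemy-Continuum API:

```python
def split_revert(session, target_version, relations):
    """Revert one relation at a time instead of revert(relations=[a, b])."""
    for relation in relations:
        target_version.revert(relations=[relation])  # one relation only
        session.flush()   # settle this relation's deletes/adds before the next
        session.expire(target_version.parent)  # reload collections fresh


# Stand-in doubles so the interleaving is visible without a database:
class FakeSession:
    def __init__(self):
        self.log = []
    def flush(self):
        self.log.append("flush")
    def expire(self, obj):
        self.log.append("expire")


class FakeVersion:
    def __init__(self, session):
        self.session = session
        self.parent = object()
    def revert(self, relations):
        self.session.log.append(f"revert:{relations[0]}")


s = FakeSession()
split_revert(s, FakeVersion(s), ["columns", "metrics"])
# s.log shows revert/flush/expire interleaved per relation, never batched
```

The point of the shape is the interleaving: each relation's pending changes are flushed and the parent expired before the next relation's Continuum query can trigger an autoflush over half-applied state.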
**Deeper structural diff inside layout components** — `diff_dashboard_layout` emits one record per logical action on a top-level component (chart added/removed/moved/edited, row added, etc.). Edits inside a chart's `meta` (resized, restyled) currently surface as a single `edit` record carrying the full meta dict; per-property records would be more useful for the UI. Deferred until the UI side is in scope and can validate the right granularity.

**Multi-entity transactions** — A single save (e.g. a dashboard import) can produce versions across multiple entities sharing one `version_transaction.id`. The list endpoint correctly filters per entity, and the structured change records intentionally key on `(transaction_id, entity_kind, entity_id)` so unrelated edits don't collide. Worth confirming that the UI surfaces transactions cleanly when one user action touches multiple entities.

**Row-level access on `GET /versions/` and `GET /versions/<version_uuid>/`** — The list and get endpoints enforce `can_write` at the FAB model level but do not call `raise_for_ownership(entity)` the way the restore endpoint does. A user with `can_write Dashboard` plus knowledge of a dashboard's UUID can read its version history even if `Dashboard.roles` would otherwise hide that dashboard from them. The UUID requirement is defence-in-depth but not strict authorization. Tightening to per-entity role enforcement on read is a small change — should we apply it for parity with restore, or leave the read surface looser by design?

**Concurrent edits — when to enforce `If-Match`** — V1 emits the `ETag` header (carrying `version_uuid`) on the save and editor-fetch endpoints listed above, so the data backbone for optimistic locking is in place. What V1 does NOT do is enforce `If-Match` on writes — saves currently succeed regardless of whether the client sent a precondition header. Open: when do we turn enforcement on, and what does the UI do when it fires? Subtle banner suggesting refresh? Blocking modal? Side-by-side merge view? The right answer is UX-driven and likely belongs in the follow-up UI SIP, not V1.

Compatibility
Soft-delete compatibility (a small set of test additions and one `__versioned__["exclude"]` extension) lands once #39464 merges.
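For context on that planned extension: SQLAlchemy-Continuum lets a mapped class opt columns out of version capture via the `exclude` key of its `__versioned__` dict — excluded columns are simply not mirrored into the shadow table. A hedged sketch of the shape only; `deleted_at` is a hypothetical column name (the real soft-delete column arrives with #39464), and the class below is a plain stand-in rather than the real declarative model:

```python
class Dashboard:
    """Stand-in for the real SQLAlchemy declarative model (shape only)."""

    # Continuum reads this dict off the mapped class; columns listed under
    # "exclude" are not copied into the shadow table, so flipping a
    # soft-delete marker is not mirrored as versioned content.
    __versioned__ = {
        "exclude": ["deleted_at"],  # hypothetical soft-delete column
    }
```

The existing versioned models would keep whatever exclusions they already have; the extension is appending the soft-delete column to that list once its name is settled by #39464.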