docs: secure canister upgrade guide #57

Merged
marc0olo merged 2 commits into main from docs/guides-security-canister-upgrades
Apr 16, 2026

@marc0olo
Member

Summary

  • Covers the full upgrade lifecycle: preflight checklist, upgrade sequence (stop → pre_upgrade → Wasm swap → post_upgrade → start), and post-upgrade verification
  • Stable memory patterns: Motoko persistent actor vs Rust ic-stable-structures with concrete code examples
  • Why to avoid pre_upgrade serialization (instruction-limit trap risk) and the skip_pre_upgrade emergency path
  • Candid interface compatibility rules: safe vs breaking changes
  • Snapshot-based rollback workflow using icp-cli commands
  • Schema evolution patterns: Motoko two-step migration, Rust serde(default) + Bound::Unbounded
  • Testing upgrades locally with a write → upgrade → read verification loop
  • Controller safety: backup controllers before risky upgrades
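
To make the upgrade sequence and the pre_upgrade trap risk from the summary concrete, here is a toy Rust model. It is not the real IC runtime or the ic-cdk API: `Canister`, `INSTRUCTION_LIMIT`, and the per-item cost are hypothetical names and numbers chosen for illustration. The only behaviour it assumes is that a trapping pre_upgrade aborts the upgrade, and that skipping the hook lets the Wasm swap proceed with whatever was already in stable memory.

```rust
/// Toy model of stop -> pre_upgrade -> Wasm swap -> post_upgrade -> start.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Status {
    Running,
    Stopped,
}

#[derive(Debug, PartialEq)]
enum UpgradeError {
    PreUpgradeTrapped,
}

/// Hypothetical instruction budget for this sketch only.
const INSTRUCTION_LIMIT: u64 = 1_000;

#[derive(Debug)]
struct Canister {
    status: Status,
    wasm_version: u32,
    stable: Vec<u64>, // survives the Wasm swap
    heap: Vec<u64>,   // wiped by the Wasm swap
}

impl Canister {
    fn new() -> Self {
        Canister { status: Status::Running, wasm_version: 1, stable: Vec::new(), heap: Vec::new() }
    }

    /// Models a hook that serializes the whole heap; cost grows with heap size.
    fn pre_upgrade(&mut self) -> Result<(), UpgradeError> {
        let cost = self.heap.len() as u64 * 10; // hypothetical cost per item
        if cost > INSTRUCTION_LIMIT {
            return Err(UpgradeError::PreUpgradeTrapped);
        }
        self.stable = self.heap.clone();
        Ok(())
    }

    /// Restores heap state from stable memory under the new Wasm.
    fn post_upgrade(&mut self) {
        self.heap = self.stable.clone();
    }

    fn upgrade(&mut self, new_version: u32, skip_pre_upgrade: bool) -> Result<(), UpgradeError> {
        self.status = Status::Stopped; // stop
        if !skip_pre_upgrade {
            // On a trap the old Wasm and heap stay in place. This sketch leaves
            // the canister stopped so the operator can decide what to do next.
            self.pre_upgrade()?;
        }
        self.heap.clear(); // Wasm swap discards the heap
        self.wasm_version = new_version;
        self.post_upgrade();
        self.status = Status::Running; // start
        Ok(())
    }
}
```

In this model, a heap that has grown too large to serialize makes every normal upgrade fail, and the skip path succeeds only at the cost of losing anything written since the last successful save. Keeping state in stable memory from the start avoids both outcomes.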

Sync recommendation

Informed by portal building-apps/canister-management/upgrade.mdx, .sources/cdk-rs, and icskills canister-security.

Covers pre/post-upgrade hooks, stable memory patterns (Motoko persistent actor
and Rust stable structures), Candid interface compatibility rules, snapshot-based
rollback workflow, schema evolution, local testing procedures, and controller safety.
Written as a security checklist with actionable patterns for production upgrades.
@marc0olo
Member Author

Review: Secure Upgrades

Must fix

  • Missing icp canister start after snapshot creation in rollback workflow: The snapshot-based rollback example stops the canister, creates a snapshot, then immediately calls icp deploy without restarting the canister first. The official icp-cli snapshot guide (.sources/icp-cli/docs/guides/canister-snapshots.md) shows the pattern as stop → snapshot create → start → deploy. Omitting the icp canister start leaves the canister stopped while the deploy runs — icp deploy does stop/restart internally during upgrade, so this may work in practice, but it means the canister is down during the entire snapshot creation period with no indication to the reader that this is intentional. The rollback flow at step 4b correctly shows start after restore; consistency requires the same treatment after snapshot creation in step 1. Suggested fix:

    # 1. Stop the canister and create a snapshot
    icp canister stop my-canister -e ic
    icp canister snapshot create my-canister -e ic
    # Note the snapshot ID printed in the output
    icp canister start my-canister -e ic
    
    # 2. Deploy the upgrade
    icp deploy my-canister -e ic
  • ciborium serialization in schema evolution example is non-canonical: The UserV2 schema evolution snippet uses ciborium::into_writer / ciborium::from_reader for CBOR serialization. No examples in .sources/examples/ or .sources/cdk-rs/ use ciborium for Storable implementations. The canonical pattern uses Candid encoding (Encode!/Decode!, as in .sources/examples/rust/tokenmania/backend/types.rs) or raw byte layout. The ciborium crate is not referenced anywhere in the upstream sources. Introducing a crate not demonstrated in any upstream source risks sending readers to install and configure a dependency they would otherwise never encounter. Suggested fix: replace the ciborium serialization body with the Candid encoding pattern used in upstream examples, or note that the encoding format is flexible and link to an upstream example.
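
For readers unfamiliar with the "raw byte layout" alternative mentioned in the last item, the following is a minimal std-only sketch of how a versioned layout can let old records decode after a field is added. It is not the guide's code and does not implement the ic-stable-structures Storable trait. The `User` type, the `label` field placement, and the version tags are assumptions made for illustration.

```rust
/// Hypothetical record: the V1 layout stored only `id`; V2 adds an optional `label`.
#[derive(Debug, PartialEq)]
struct User {
    id: u64,
    label: Option<String>,
}

const V1: u8 = 1;
const V2: u8 = 2;

/// Splits off the first `n` bytes, or returns None if the input is too short.
fn take(bytes: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if bytes.len() < n {
        None
    } else {
        Some(bytes.split_at(n))
    }
}

/// What the previous Wasm version would have written.
fn encode_v1(id: u64) -> Vec<u8> {
    let mut out = vec![V1];
    out.extend_from_slice(&id.to_le_bytes());
    out
}

/// Current layout: version tag, id, presence flag, then length-prefixed UTF-8.
fn encode(user: &User) -> Vec<u8> {
    let mut out = vec![V2];
    out.extend_from_slice(&user.id.to_le_bytes());
    match &user.label {
        Some(text) => {
            out.push(1);
            out.extend_from_slice(&(text.len() as u32).to_le_bytes());
            out.extend_from_slice(text.as_bytes());
        }
        None => out.push(0),
    }
    out
}

/// Accepts both layouts; V1 records get the default `label: None`.
fn decode(bytes: &[u8]) -> Option<User> {
    let (&version, rest) = bytes.split_first()?;
    let (id_bytes, rest) = take(rest, 8)?;
    let mut id_buf = [0u8; 8];
    id_buf.copy_from_slice(id_bytes);
    let id = u64::from_le_bytes(id_buf);
    match version {
        V1 => Some(User { id, label: None }),
        V2 => {
            let (&flag, rest) = rest.split_first()?;
            match flag {
                0 => Some(User { id, label: None }),
                1 => {
                    let (len_bytes, rest) = take(rest, 4)?;
                    let mut len_buf = [0u8; 4];
                    len_buf.copy_from_slice(len_bytes);
                    let (text, _) = take(rest, u32::from_le_bytes(len_buf) as usize)?;
                    let label = String::from_utf8(text.to_vec()).ok()?;
                    Some(User { id, label: Some(label) })
                }
                _ => None,
            }
        }
        _ => None,
    }
}
```

The point of the sketch is the same one the serde(default) pattern serves: bytes written before the upgrade must still decode afterwards, with new fields falling back to a default. Candid encoding, as recommended above, gets this for optional fields without hand-written parsing.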

Suggestions

  • transient var comment framing: The recentCallers field comment says "resets to [] on each upgrade — correct for caches." The broader use-cases for transient include rate-limit counters and in-progress request logs. Expanding the comment to "correct for caches, transient logs, and reset-on-upgrade counters" would help readers recognize when their own fields should be transient.

  • skip_pre_upgrade — add a link to the management canister reference: The page correctly describes skip_pre_upgrade as a management canister install_code flag. Adding a link to ../../reference/management-canister.md (which exists in the repo) would give readers a quick path to the full API rather than having to search for it.

  • Checklist item connection: The checklist item "In Motoko, use persistent actor" and "Avoid pre_upgrade hooks that serialize large state" are listed as separate items. For newcomers, a short parenthetical "(which eliminates the need for pre_upgrade)" on the persistent actor line would connect the two without requiring the reader to cross-reference.

  • Frontmatter description omits schema evolution: The description mentions "pre/post hooks, stable memory, Candid compatibility, snapshot rollbacks, and testing" but the schema evolution section is substantial and not listed. Adding "schema evolution" to the description improves discoverability.
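
The persistent-versus-transient distinction behind the first suggestion can be sketched as a write → upgrade → read check. This is a plain Rust analogy for the Motoko behaviour, not Motoko or CDK code. `Actor`, `balances`, and `upgrade` are hypothetical names, with `recent_callers` borrowed from the field discussed above.

```rust
use std::collections::BTreeMap;

/// Hypothetical actor state for illustration.
#[derive(Debug, Default)]
struct Actor {
    /// Models persistent state: must survive an upgrade.
    balances: BTreeMap<String, u64>,
    /// Models a transient field: a cache that is safe to reset on upgrade.
    recent_callers: Vec<String>,
}

impl Actor {
    /// Write step: updates persistent data and the transient cache.
    fn deposit(&mut self, caller: &str, amount: u64) {
        *self.balances.entry(caller.to_string()).or_insert(0) += amount;
        self.recent_callers.push(caller.to_string());
    }

    /// Upgrade step: persistent fields carry over, transient fields start empty.
    fn upgrade(self) -> Actor {
        Actor {
            balances: self.balances,
            recent_callers: Vec::new(),
        }
    }
}
```

A field belongs on the transient side only if losing it on every upgrade is acceptable, which holds for caches, transient logs, and reset-on-upgrade counters but not for balances.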

Verified

  • All internal link targets exist (four .md links resolve to .mdx files, which Astro resolves correctly; two resolve to .md files directly):
    • ../canister-management/snapshots.md → docs/guides/canister-management/snapshots.mdx
    • ../testing/pocket-ic.md → docs/guides/testing/pocket-ic.md
    • access-management.md → docs/guides/security/access-management.mdx
    • ../backends/data-persistence.md → docs/guides/backends/data-persistence.mdx
    • ../canister-management/lifecycle.md → docs/guides/canister-management/lifecycle.mdx
    • ../testing/strategies.md → docs/guides/testing/strategies.md
  • External URL https://docs.rs/ic-stable-structures/latest/ic_stable_structures/ matches the linking rules table in content-authoring.md
  • All CLI commands verified against .sources/icp-cli/docs/reference/cli.md: icp deploy, icp deploy --yes, icp network start -d, icp canister call, icp canister stop, icp canister start, icp canister snapshot create/delete/restore, icp canister settings show, icp canister settings update --add-controller — all valid with correct flags ✓
  • Motoko persistent actor and transient var syntax verified against .sources/motoko/ test files and .sources/icskills/skills/stable-memory/SKILL.md
  • Rust stable structures API (MemoryManager, StableBTreeMap, StableCell, VirtualMemory, DefaultMemoryImpl, MemoryId, Bound::Unbounded) verified against .sources/examples/rust/
  • #[ic_cdk::post_upgrade] and #[ic_cdk::pre_upgrade] macro names verified against .sources/cdk-rs/ic-cdk/tests/
  • ic_cdk::storage::stable_save verified as available in .sources/cdk-rs/ic-cdk/src/storage.rs
  • skip_pre_upgrade: Option<bool> field verified in .sources/cdk-rs/ic-management-canister-types/src/lib.rs
  • Candid interface compatibility rules (safe/breaking changes table) verified against .sources/candid/spec/Candid.md — parameter and return type subtyping directions are correct ✓
  • Named parameter syntax in the Candid example (label : opt text) verified as valid Candid shorthand per spec ✓
  • Storable import path (ic_stable_structures::storable::{Bound, Storable}) verified against .sources/examples/rust/tokenmania/backend/types.rs
  • No dfx references found ✓
  • No banned URLs (internetcomputer.org/docs/, docs.internetcomputer.org) found ✓
  • No <!-- TODO: verify output --> or <!-- Needs human verification: --> flags present ✓
  • Frontmatter complete: title, description, sidebar.order present ✓
  • <!-- Upstream: --> comment present at end of file ✓
  • npm run build passes ✓

@marc0olo
Member Author

Feedback addressed:

  • Added icp canister start between snapshot create and icp deploy in the rollback workflow (matching the canonical stop → snapshot → start → deploy pattern from icp-cli docs)
  • Replaced ciborium CBOR serialization in the UserV2 Storable impl with the Candid encoding pattern (Encode!/Decode!) from .sources/examples/rust/tokenmania/backend/types.rs; updated section heading and bullet accordingly; added dfinity/examples to <!-- Upstream: --> comment
  • Expanded transient var comment to mention caches, transient logs, and reset-on-upgrade counters
  • Added link to ../../reference/management-canister.md for skip_pre_upgrade
  • Added "(which eliminates the need for pre_upgrade hooks)" parenthetical to the persistent actor checklist item
  • Added "schema evolution" to frontmatter description

@marc0olo marc0olo merged commit 9713eaf into main Apr 16, 2026
1 check passed
@marc0olo marc0olo deleted the docs/guides-security-canister-upgrades branch April 16, 2026 13:04