SizunJiangLab/fusion_toolkit

fusion-toolkit

Toolkit for managing data produced by the PhenoCycler Fusion (Akoya Biosciences). Covers the data naming standard, monthly backup to TrueNAS and AWS S3, and capacity-driven TrueNAS cleanup.

Subsystems

fusion_toolkit.standard

Validates that a study directory matches the naming standard before any backup runs. Eight rules:

  • R1 — study name matches YYYYMMDD_<user>_<project>_<customname> (4 fields) with a valid calendar date
  • R2 — experiment subdirectory = <study> (4-field form, same as study) or <study>_<suffix> (5-field form, suffix ∈ [A-Za-z0-9-]+; no _)
  • R3 — character set per field is [A-Za-z0-9-]; experiment name length ≤ 60 chars (Windows MAX_PATH guard)
  • R4 — study has both metadata (≥ 1 .xpd or .fpr) and data (≥ 1 experiment directory)
  • R5 — at study level, allow only .xpd, .fpr, and matching experiment directories (nothing else)
  • R6 — every .xpd has a matching sibling experiment directory of the same name (strict 1:1)
  • R7 — .fpr filename uses the R3 character set; presence-only (one .fpr may serve multiple experiments, no pairing required)
  • R8 — each experiment has at least one Scan<N>/ whose contents include both a *.qptiff file and a .temp/ folder

Full rule reference with failing / passing examples: docs/data-standard.md. Exposed as validate_study(path) / validate_root(path) and via the validate CLI group.
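
As an illustration of R1 to R3, the name checks can be sketched with a regular expression plus a calendar-date parse. This is a minimal sketch, not the toolkit's validate_study implementation; the helper names is_valid_study_name and is_valid_experiment_name are hypothetical, and the 60-character limit is applied to the whole experiment name as R3 describes.

```python
import re
from datetime import datetime

# R3: every field uses only letters, digits, and hyphens (no underscore).
FIELD = r"[A-Za-z0-9-]+"
# R1: YYYYMMDD_<user>_<project>_<customname>, exactly four fields.
STUDY_RE = re.compile(rf"([0-9]{{8}})_({FIELD})_({FIELD})_({FIELD})")
MAX_EXPERIMENT_LEN = 60  # R3: Windows MAX_PATH guard


def is_valid_study_name(name: str) -> bool:
    """Hypothetical helper: check R1 (shape and a real calendar date)."""
    match = STUDY_RE.fullmatch(name)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False  # e.g. 20240230 is not a calendar date
    return True


def is_valid_experiment_name(study: str, experiment: str) -> bool:
    """Hypothetical helper: check R2 and the R3 length limit."""
    if not is_valid_study_name(study):
        return False
    if len(experiment) > MAX_EXPERIMENT_LEN:
        return False
    if experiment == study:  # 4-field form
        return True
    prefix = study + "_"
    if not experiment.startswith(prefix):
        return False
    # 5-field form: exactly one extra field, which may not contain "_".
    return re.fullmatch(FIELD, experiment[len(prefix):]) is not None
```

For example, is_valid_experiment_name("20240315_alice_tonsil_panel1", "20240315_alice_tonsil_panel1_rerun-2") passes, while a suffix such as a_b fails because it would introduce a sixth field.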

fusion_toolkit.manifest

Per-month append-only JSONL ledger of backup runs. The authoritative copy lives on TrueNAS at manifest_root/YYYY-MM.jsonl (configurable); a mirror is uploaded to S3 after every append as a disaster-recovery copy. Each record holds:

  • backup_name, run_name, source / TrueNAS / S3 paths
  • file_count, total_bytes
  • started_at, truenas_verified_at, s3_verified_at, completed_at, source_deleted_at
  • status: one of in_progress, complete, failed_validation, failed_truenas, or failed_s3

Readers fold the file into {backup_name: latest_record} to get current state. The backup subsystem writes; the cleanup subsystem reads (for its safety-gate check).
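
The fold described above can be sketched in a few lines: because the ledger is append-only, the last record seen for a backup_name is its current state. The function name fold_latest is hypothetical and the sketch assumes one JSON object per line with a backup_name key, as listed above.

```python
import json
from typing import Iterable


def fold_latest(lines: Iterable[str]) -> dict[str, dict]:
    """Hypothetical helper: reduce JSONL records to {backup_name: latest_record}."""
    latest: dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Later lines overwrite earlier ones, so the final value is the newest.
        latest[record["backup_name"]] = record
    return latest
```

Passing an open file object works directly, since iterating a text file yields its lines.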

fusion_toolkit.backup

Monthly orchestrator that copies eligible idle runs (those last modified at least idle_days days ago) from the Fusion host to TrueNAS and AWS S3. Each run goes through the pipeline:

  1. Validate against the data standard — fail fast if violating
  2. Stream-copy to TrueNAS with per-file SHA-256
  3. Read-back-verify every file on TrueNAS against the hash
  4. Atomically write a _checksums.sha256 sidecar
  5. Upload to S3 with server-side ChecksumSHA256 headers
  6. Append a complete record to the manifest
  7. Delete the source on the Fusion host
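
Steps 2 to 4 of the pipeline above can be sketched with the standard library alone. This is a minimal sketch under assumptions: the function names are hypothetical, the 1 MiB chunk size is arbitrary, and the sidecar layout (sha256sum-style "hash, two spaces, relative path") is assumed rather than taken from the toolkit.

```python
import hashlib
import os
from pathlib import Path

CHUNK = 1024 * 1024  # arbitrary 1 MiB read size


def copy_with_sha256(src: Path, dst: Path) -> str:
    """Copy src to dst in chunks, hashing the bytes as they stream through."""
    digest = hashlib.sha256()
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as fin, dst.open("wb") as fout:
        while chunk := fin.read(CHUNK):
            digest.update(chunk)
            fout.write(chunk)
    return digest.hexdigest()


def sha256_of(path: Path) -> str:
    """Re-read a file from disk and hash it (the read-back verification step)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar_atomically(dest_dir: Path, hashes: dict[str, str]) -> Path:
    """Write _checksums.sha256 via a temp file plus rename (assumed format)."""
    final = dest_dir / "_checksums.sha256"
    tmp = dest_dir / "_checksums.sha256.tmp"
    body = "".join(f"{h}  {rel}\n" for rel, h in sorted(hashes.items()))
    tmp.write_text(body, encoding="utf-8")
    os.replace(tmp, final)  # readers never observe a half-written sidecar
    return final
```

A caller would compare sha256_of(dst) against the value returned by copy_with_sha256 and treat any mismatch as a failed TrueNAS copy before the sidecar is written.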

An optional [hooks] table in config.toml lets each subcommand run an external command before its own logic kicks off:

[hooks]
backup  = ["powershell.exe", "-NoProfile", "-File", "C:/path/to/mount-truenas.ps1"]
cleanup = ["powershell.exe", "-NoProfile", "-File", "C:/path/to/mount-truenas.ps1"]

Currently backup and cleanup honor hooks. Each value is an argv list. Hook stdio is inherited so its output appears live in fusion-toolkit's log; non-zero exit aborts the subcommand with exit code 1 (the actual backup/cleanup never starts). A missing key, an empty [hooks] table, or no [hooks] table at all is a no-op — the subcommand runs as before.
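
The hook contract described above (argv list, inherited stdio, non-zero exit aborts) maps closely onto subprocess.run. The sketch below is illustrative only; run_hook is a hypothetical name and the hooks argument stands in for the parsed [hooks] table.

```python
import subprocess
import sys


def run_hook(hooks: dict[str, list[str]], subcommand: str) -> bool:
    """Hypothetical helper: return True if the subcommand may proceed."""
    argv = hooks.get(subcommand)
    if not argv:
        return True  # missing key or empty table: no-op
    # No capture arguments are passed, so the child inherits stdin/stdout/stderr
    # and its output appears live alongside the caller's.
    completed = subprocess.run(argv, check=False)
    return completed.returncode == 0


if __name__ == "__main__":
    demo_hooks = {"backup": [sys.executable, "-c", "print('mounting share')"]}
    if not run_hook(demo_hooks, "backup"):
        sys.exit(1)  # hook failed: the real work never starts
```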

The original use case is mounting an SMB share via Windows New-SmbMapping before each scheduled task, since drive-letter mappings are per-logon-session and can't be inherited from a boot-time mount task.

While a study is being processed a .fusion-toolkit.lock file is written into the study directory and removed in a finally clause when the per-study pipeline exits (success or any failure). If the process crashes mid-pipeline (SIGKILL, power loss) the lock is left behind. At the start of the next backup run, source_root is scanned for any .fusion-toolkit.lock files; if any are present the entire run is aborted with exit code 1 so an operator can inspect and clean up before more backups run. The lock holds metadata only (hostname / pid / started_at / run_name) — the orchestrator does not check PID liveness or staleness, so manual cleanup is always required after a crash.
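
The lock lifecycle above can be sketched as a context manager plus a pre-run scan. This is a sketch under assumptions: study_lock and find_stale_locks are hypothetical names, and storing the metadata as JSON is an assumption (the source only lists the fields).

```python
import json
import os
import socket
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path

LOCK_NAME = ".fusion-toolkit.lock"


@contextmanager
def study_lock(study_dir: Path, run_name: str):
    """Write the lock on entry and remove it on any exit, success or failure."""
    lock = study_dir / LOCK_NAME
    payload = {
        "hostname": socket.gethostname(),
        "pid": os.getpid(),
        "started_at": datetime.now(timezone.utc).isoformat(),
        "run_name": run_name,
    }
    lock.write_text(json.dumps(payload), encoding="utf-8")  # JSON is assumed
    try:
        yield lock
    finally:
        # Runs for normal exit and exceptions, but not for SIGKILL or power loss.
        lock.unlink(missing_ok=True)


def find_stale_locks(source_root: Path) -> list[Path]:
    """Locks left by a crashed run; a non-empty result should abort the run."""
    return sorted(source_root.rglob(LOCK_NAME))
```

Because no PID liveness check is made, any lock found by the scan is treated as requiring operator attention, which matches the manual-cleanup policy described above.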

Per-run exceptions are isolated so one failing run does not abort the monthly batch; the failure is recorded in the manifest and the next run is attempted. Driven by backup run [--config PATH] [--dry-run].

Also exposes scan_health_check(study, mtime_gap_days=14) as a soft pre-backup gate (not wired into the orchestrator; the caller decides when to apply it). Returns HealthWarning items for two independent checks:

  • empty — no experiment has a Scan<N>/ with both *.qptiff and .temp/
  • mtime_gap — earliest and latest mtimes in the study span more than mtime_gap_days days (study may still be actively edited)
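
The mtime_gap check can be illustrated as a span computation over file modification times. This is a minimal sketch, not the scan_health_check implementation: the helper names are hypothetical, and considering regular files only (not directories) is an assumption.

```python
import os
from pathlib import Path

SECONDS_PER_DAY = 86_400


def mtime_span_days(study: Path) -> float:
    """Days between the earliest and latest file mtime under the study."""
    mtimes = [p.stat().st_mtime for p in study.rglob("*") if p.is_file()]
    if not mtimes:
        return 0.0
    return (max(mtimes) - min(mtimes)) / SECONDS_PER_DAY


def has_mtime_gap(study: Path, mtime_gap_days: int = 14) -> bool:
    """True when the span exceeds the threshold (study may still be in use)."""
    return mtime_span_days(study) > mtime_gap_days
```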

fusion_toolkit.cleanup

Capacity-driven FIFO eviction on TrueNAS. When free space falls below free_threshold_pct, the oldest backups are deleted until the threshold is met. Every delete passes three safety gates:

  • Lock — an exclusive .cleanup.lock prevents concurrent cleanup and overlapping backup/cleanup races
  • Manifest gate — the candidate's latest manifest record must be Status.COMPLETE; in-progress or failed runs are skipped
  • S3 gate — at least one object must exist under the candidate's S3 prefix (defense in depth against partial uploads that passed the manifest gate somehow)

Only TrueNAS data is deleted — the S3 copy is the permanent archive and never touched. Driven by cleanup run [--config PATH] [--dry-run].
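
The eviction loop and its per-candidate gates can be sketched as a pure planning function. Everything named here is hypothetical: the Candidate type, ordering by completed_at, and the two gate callbacks (standing in for the manifest lookup and the S3 prefix listing) are assumptions made for illustration. The exclusive lock is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    name: str
    size_bytes: int
    completed_at: str  # ISO-8601 in one timezone, so string order is time order


def plan_eviction(
    candidates: Iterable[Candidate],
    total_bytes: int,
    free_bytes: int,
    free_threshold_pct: float,
    manifest_complete: Callable[[str], bool],
    s3_has_objects: Callable[[str], bool],
) -> list[str]:
    """Return backup names to delete, oldest first, until free space recovers."""
    target_free = total_bytes * free_threshold_pct / 100
    doomed: list[str] = []
    for cand in sorted(candidates, key=lambda c: c.completed_at):
        if free_bytes >= target_free:
            break  # threshold met, stop evicting
        if not manifest_complete(cand.name):
            continue  # manifest gate: skip in-progress or failed runs
        if not s3_has_objects(cand.name):
            continue  # S3 gate: never delete without an archive copy
        doomed.append(cand.name)
        free_bytes += cand.size_bytes
    return doomed
```

Keeping the plan separate from the deletion makes a --dry-run mode straightforward: print the plan and stop.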

fusion_toolkit.monitor

Long-running tail of the Fusion application's Fusion.log. Detects ERROR blocks, enriches each with the nearest sample + cycle context, and emails them via the notify subsystem:

  • Offset + dedup-map persisted under ~/.config/fusion-toolkit/state/ so restarts don't re-email past errors
  • Dedup key = first 200 chars of the error content (timestamp stripped); same error suppressed for sent_keep_days (default 30)
  • File rotation handled by detecting that the file has shrunk below the saved offset and resetting the offset to 0
  • Unhandled exceptions in the main loop trigger a [Fusion Monitor] CRASH email before exit 1

Driven by monitor tail [--config PATH] [--smtp-env PATH].
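
The dedup and rotation rules above can be sketched as three small functions. The names are hypothetical, and the leading-timestamp pattern is an assumption: the actual Fusion.log timestamp format is not specified here, so TIMESTAMP_RE would need adjusting to match it.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp shape, e.g. "2024-03-15 10:00:00,123 "; adjust to the real log.
TIMESTAMP_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\s*"
)


def dedup_key(error_text: str) -> str:
    """First 200 characters of the error with the leading timestamp removed."""
    return TIMESTAMP_RE.sub("", error_text, count=1)[:200]


def should_send(
    key: str, sent: dict[str, datetime], now: datetime, sent_keep_days: int = 30
) -> bool:
    """Send if the key was never sent or its suppression window has expired."""
    last = sent.get(key)
    return last is None or now - last >= timedelta(days=sent_keep_days)


def resume_offset(saved_offset: int, current_size: int) -> int:
    """A file smaller than the saved offset was rotated: restart from 0."""
    return 0 if current_size < saved_offset else saved_offset
```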

fusion_toolkit.notify

Gmail SMTP helper used by backup, cleanup, and monitor for failure / event emails. Credentials follow a boto3-style chain: environment variables (FUSION_SMTP_USER, FUSION_SMTP_APP_PASSWORD) first, then a dotenv-style file at ~/.config/fusion-toolkit/smtp.env.
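
The credential chain can be sketched as "environment first, file second". This is an illustrative sketch: the function names are hypothetical, the dotenv parsing is deliberately simplified (KEY=VALUE lines, optional quotes, # comments), and treating the pair as all-or-nothing per source is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional

USER_KEY = "FUSION_SMTP_USER"
PASSWORD_KEY = "FUSION_SMTP_APP_PASSWORD"


def parse_dotenv(text: str) -> dict[str, str]:
    """Simplified dotenv parser: KEY=VALUE lines, comments and blanks skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_smtp_credentials(
    env: Mapping[str, str], env_file: Path
) -> Optional[tuple[str, str]]:
    """Return (user, app_password) from env, else from the file, else None."""
    if env.get(USER_KEY) and env.get(PASSWORD_KEY):
        return env[USER_KEY], env[PASSWORD_KEY]
    if env_file.is_file():
        values = parse_dotenv(env_file.read_text(encoding="utf-8"))
        if values.get(USER_KEY) and values.get(PASSWORD_KEY):
            return values[USER_KEY], values[PASSWORD_KEY]
    return None
```

In real use the env argument would be os.environ and env_file would point at ~/.config/fusion-toolkit/smtp.env.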

Five events:

| Event | Fires on |
| --- | --- |
| backup_failure | backup run exit != 0 |
| cleanup_failure | cleanup run exit != 0 |
| fusion_error | monitor tail detects an ERROR line in Fusion.log |
| monitor_crash | monitor tail itself crashes |
| toolkit_error | any fusion-toolkit WARNING/ERROR log (tool-health channel, dedup + cooldown) |

Subscription lists are per-event text files, one email per line, under ~/.config/fusion-toolkit/recipients/:

~/.config/fusion-toolkit/recipients/
├── backup_failure.txt    # ops@lab / manager@lab
├── cleanup_failure.txt
├── fusion_error.txt      # oncall + instrument owners
├── monitor_crash.txt
├── toolkit_error.txt     # dev / maintainer
└── default.txt           # fallback for any event with an empty file

Lab members subscribe by appending their email to the relevant file. Lines starting with # are comments; blank lines are ignored. No TOML syntax and no array brackets: just one email per line. The files are re-read for every alert, so changes take effect on the next email with no restart.

Resolution order for each event:

  1. recipients/<event>.txt (if any uncommented emails)
  2. recipients/default.txt
  3. empty → that event sends nothing
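
The resolution order above amounts to "first non-empty list wins". A minimal sketch follows; read_emails and resolve_recipients are hypothetical names, and a missing file is treated the same as an empty one, which is an assumption.

```python
from pathlib import Path


def read_emails(path: Path) -> list[str]:
    """One email per line; '#' comment lines and blank lines are ignored."""
    if not path.is_file():
        return []
    emails: list[str] = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            emails.append(line)
    return emails


def resolve_recipients(recipients_dir: Path, event: str) -> list[str]:
    """Event file first, then default.txt; an empty result sends nothing."""
    return read_emails(recipients_dir / f"{event}.txt") or read_emails(
        recipients_dir / "default.txt"
    )
```

Reading the files on every call, rather than caching them, is what lets subscription changes take effect on the next alert.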

To inspect / verify:

fusion-toolkit notify list-subscribers              # all events + sources
fusion-toolkit notify list-subscribers --event fusion_error
fusion-toolkit notify test --event fusion_error --dry-run
fusion-toolkit notify test --event fusion_error     # actually sends a TEST email

The config.toml [notify] section only carries the global switch, cooldown, and an optional recipients-dir override — no recipients live in TOML:

[notify]
enabled = true
# Optional: override the recipients directory. Default is the
# `recipients` sibling of this config.toml.
# recipients_dir = "/some/other/path"
toolkit_error_cooldown_seconds = 300

fusion_toolkit.install

One-shot deployment helper for the Fusion host. Two commands:

  • install init [--config-dir PATH] [--force] — populates ~/.config/fusion-toolkit/ with config.toml, smtp.env, and per-event recipients/*.txt subscription list templates. Refuses to clobber an existing config.toml / smtp.env without --force; recipients/*.txt is never overwritten, even with --force, because it carries operator-curated state.

  • install tasks [--user USER --password PASS] [--uninstall] — registers (or with --uninstall removes) three Windows Scheduled Tasks via schtasks.exe:

    | Task | Command | Schedule |
    | --- | --- | --- |
    | FusionToolkitBackup | backup run | Monthly, day 1 at 02:00 |
    | FusionToolkitCleanup | cleanup run | Monthly, day 20 at 02:00 |
    | FusionToolkitMonitor | monitor tail | On system start |

Argv construction is pure (build_create_argv / build_delete_argv), so the schtasks contract is unit-tested without invoking Windows. Unregister is idempotent — missing tasks are not an error. See Setup on the Fusion host below for the typical flow.
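
To illustrate why pure argv construction is easy to test, here is a sketch of what such builders might look like. These are not the toolkit's build_create_argv / build_delete_argv, whose signatures are not shown in this README; the names sketch_create_argv and sketch_delete_argv, their parameters, and the command string passed to /TR are all assumptions. Only standard schtasks.exe switches (/Create, /Delete, /TN, /TR, /SC, /D, /ST, /RU, /RP, /F) are used.

```python
from typing import Optional

# Schedule fragments matching the table above.
MONTHLY_DAY_1 = ["/SC", "MONTHLY", "/D", "1", "/ST", "02:00"]
MONTHLY_DAY_20 = ["/SC", "MONTHLY", "/D", "20", "/ST", "02:00"]
ON_START = ["/SC", "ONSTART"]


def sketch_create_argv(
    task_name: str,
    command: str,
    schedule: list[str],
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> list[str]:
    """Build (but do not run) a schtasks /Create command line."""
    argv = ["schtasks.exe", "/Create", "/TN", task_name, "/TR", command]
    argv += schedule
    argv.append("/F")  # overwrite an existing task of the same name
    if user is not None:
        argv += ["/RU", user]
    if password is not None:
        argv += ["/RP", password]
    return argv


def sketch_delete_argv(task_name: str) -> list[str]:
    """Build (but do not run) a schtasks /Delete command line."""
    return ["schtasks.exe", "/Delete", "/TN", task_name, "/F"]
```

Since both functions only return lists, a test can compare them against expected argv on any operating system, with no Windows host involved.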

CLI

fusion-toolkit --version
fusion-toolkit [-v | --verbose] <subcommand> ...
fusion-toolkit validate study <path> [-f text|json] [-q]
fusion-toolkit validate root  <path> [-f text|json] [-q]
fusion-toolkit manifest show  <manifest.jsonl>
fusion-toolkit backup  run    [--config PATH] [--dry-run] [--smtp-env PATH]
fusion-toolkit cleanup run    [--config PATH] [--dry-run] [--smtp-env PATH]
fusion-toolkit monitor tail   [--config PATH] [--smtp-env PATH]
fusion-toolkit install init   [--config-dir PATH] [--force]
fusion-toolkit install tasks  [--user USER --password PASS] [--uninstall]
fusion-toolkit notify list-subscribers [--event EVENT]
fusion-toolkit notify test --event EVENT [--smtp-env PATH] [--dry-run]

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Validation failure or orchestrator reported one or more failed runs |
| 2 | Config error (missing or malformed TOML, missing required key, lock held) |
| 3 | Unexpected runtime error (network, permissions, unhandled bug) |

Scheduled-task alerting should treat any non-zero exit code as actionable. When [notify].enabled = true, backup run / cleanup run / monitor tail also route failures and tool-health warnings to event-specific recipient lists (see the fusion_toolkit.notify section above). If notify is disabled or credentials are missing the commands behave exactly as before (exit codes only).

Setup on the Fusion host

After uv sync in the cloned repo, the typical flow is:

uv run fusion-toolkit install init        # writes config + smtp.env templates
# edit ~/.config/fusion-toolkit/{config.toml,smtp.env}
uv run fusion-toolkit install tasks        # prompts for your Windows password

install tasks defaults --user to $env:USERDOMAIN\$env:USERNAME (the current account) and prompts for the password via stdin so it never appears in PowerShell history. Pass --user or --password explicitly to override (e.g. for unattended automation).

Full details (Python + uv install, AWS credential chain, SMB mapping quirks) are in docs/setup-fusion-host.md (中文: docs/setup-fusion-host.zh.md).

Web validator

Browser-based name checker at https://wuwenrui555.github.io/fusion_toolkit/. Enter a study name and optional experiment name; per-field pass/fail is shown inline. Maintenance: docs/gh-pages.md.

Development

uv sync
uv run pre-commit install

Run checks locally before pushing:

uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
uv run pyright src/
uv run pytest

License

TBD
