fusion-toolkit

Toolkit for managing data produced by the PhenoCycler Fusion (Akoya Biosciences). Covers the data naming standard, monthly backup to TrueNAS and AWS S3, and capacity-driven TrueNAS cleanup.

Subsystems

`fusion_toolkit.standard`

Validates that a study directory matches the naming standard before any backup runs. Eight rules:

R1 — study name matches YYYYMMDD_<user>_<project>_<customname> (4 fields) with a valid calendar date
R2 — experiment subdirectory = <study> (4-field form, same as study) or <study>_<suffix> (5-field form, suffix ∈ [A-Za-z0-9-]+; no _)
R3 — character set per field is [A-Za-z0-9-]; experiment name length ≤ 60 chars (Windows MAX_PATH guard)
R4 — study has both metadata (≥ 1 .xpd or .fpr) and data (≥ 1 experiment directory)
R5 — at study level, allow only .xpd, .fpr, and matching experiment directories (nothing else)
R6 — every .xpd has a matching sibling experiment directory of the same name (strict 1:1)
R7 — .fpr filename uses the R3 character set; presence-only (one .fpr may serve multiple experiments, no pairing required)
R8 — each experiment has at least one Scan<N>/ whose contents include both a *.qptiff file and a .temp/ folder

Full rule reference with failing / passing examples: docs/data-standard.md. Exposed as validate_study(path) / validate_root(path) and via the validate CLI group.
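As a rough illustration of R1 to R3, the name checks can be sketched with a regular expression plus a calendar-date parse. The helper names `check_study_name` and `check_experiment_name` are hypothetical and are not the toolkit's `validate_study` API; the sketch covers names only, not directory contents.

```python
import re
from datetime import datetime

# Hypothetical helpers for illustration only; the real validator may differ.
_FIELD = r"[A-Za-z0-9-]+"  # R3 character set, no underscore inside a field
_STUDY_RE = re.compile(rf"([0-9]{{8}})_({_FIELD})_({_FIELD})_({_FIELD})")


def check_study_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name passes R1."""
    match = _STUDY_RE.fullmatch(name)
    if match is None:
        return ["R1: expected YYYYMMDD_<user>_<project>_<customname>"]
    try:
        # strptime rejects impossible dates such as 20240230.
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return [f"R1: {match.group(1)} is not a valid calendar date"]
    return []


def check_experiment_name(study: str, experiment: str) -> list[str]:
    """R2/R3: experiment is <study> or <study>_<suffix>, at most 60 chars."""
    problems = []
    suffixed = re.fullmatch(rf"{re.escape(study)}_{_FIELD}", experiment)
    if experiment != study and suffixed is None:
        problems.append("R2: expected <study> or <study>_<suffix>")
    if len(experiment) > 60:
        problems.append("R3: experiment name longer than 60 characters")
    return problems
```

Returning a list of messages rather than raising keeps every violation visible in one pass, which suits a pre-backup report.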

`fusion_toolkit.manifest`

Per-month append-only JSONL ledger of backup runs. The authoritative copy lives on TrueNAS at manifest_root/YYYY-MM.jsonl (configurable); a mirror is uploaded to S3 after every append as a disaster-recovery copy. Each record holds:

backup_name, run_name, source / TrueNAS / S3 paths
file_count, total_bytes
started_at, truenas_verified_at, s3_verified_at, completed_at, source_deleted_at
status — in_progress, complete, failed_validation, failed_truenas, or failed_s3

Readers fold the file into {backup_name: latest_record} to get current state. The backup subsystem writes; the cleanup subsystem reads (for its safety-gate check).
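The fold described above is a last-write-wins reduction over the ledger lines. A minimal sketch, assuming each line is one JSON object with a `backup_name` key (the function name `fold_records` is hypothetical):

```python
import json
from typing import Iterable


def fold_records(lines: Iterable[str]) -> dict[str, dict]:
    """Fold append-only JSONL lines into {backup_name: latest_record}."""
    latest: dict[str, dict] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # The file is append-only, so a later line supersedes an earlier one.
        latest[record["backup_name"]] = record
    return latest
```

A reader would typically pass an open file handle, for example `fold_records(open(path, encoding="utf-8"))`, since iterating a text file yields its lines.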

`fusion_toolkit.backup`

Monthly orchestrator that copies eligible idle runs (those last modified at least idle_days days ago) from the Fusion host to TrueNAS and AWS S3. Each run goes through this pipeline:

Validate against the data standard — fail fast if violating
Stream-copy to TrueNAS with per-file SHA-256
Read-back-verify every file on TrueNAS against the hash
Atomically write a _checksums.sha256 sidecar
Upload to S3 with server-side ChecksumSHA256 headers
Append a complete record to the manifest
Delete the source on the Fusion host
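The copy, verify, and sidecar steps can be sketched as follows. This is an illustration under assumptions, not the toolkit's code: the function names are hypothetical, the sidecar is assumed to use the `sha256sum` text layout (`<hash>  <relative path>`), and the S3 and manifest steps are left out.

```python
import hashlib
import os
import tempfile
from pathlib import Path

CHUNK = 1024 * 1024  # read in 1 MiB chunks so large files never sit in memory


def copy_with_hash(src: Path, dst: Path) -> str:
    """Stream src to dst and return the SHA-256 of the bytes read from src."""
    digest = hashlib.sha256()
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open("rb") as fin, dst.open("wb") as fout:
        while chunk := fin.read(CHUNK):
            digest.update(chunk)
            fout.write(chunk)
    return digest.hexdigest()


def hash_file(path: Path) -> str:
    """Re-read a file from disk and hash it (the read-back verification)."""
    digest = hashlib.sha256()
    with path.open("rb") as fin:
        while chunk := fin.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar_atomically(dest_dir: Path, hashes: dict[str, str]) -> Path:
    """Write _checksums.sha256 through a temp file and os.replace."""
    final = dest_dir / "_checksums.sha256"
    fd, tmp_name = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
        for rel_path in sorted(hashes):
            # Assumed layout: sha256sum style, two spaces between fields.
            handle.write(f"{hashes[rel_path]}  {rel_path}\n")
    # os.replace swaps the file in one step, so readers never see a partial sidecar.
    os.replace(tmp_name, final)
    return final
```

Hashing while copying avoids a second pass over the source, and comparing `hash_file(dst)` against the returned digest catches corruption introduced on the destination side.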

An optional [hooks] table in config.toml lets each subcommand run an external command before its own logic kicks off:

[hooks]
backup  = ["powershell.exe", "-NoProfile", "-File", "C:/path/to/mount-truenas.ps1"]
cleanup = ["powershell.exe", "-NoProfile", "-File", "C:/path/to/mount-truenas.ps1"]

Currently backup and cleanup honor hooks. Each value is an argv list. Hook stdio is inherited so its output appears live in fusion-toolkit's log; non-zero exit aborts the subcommand with exit code 1 (the actual backup/cleanup never starts). A missing key, an empty [hooks] table, or no [hooks] table at all is a no-op — the subcommand runs as before.

The original use case is mounting an SMB share via Windows New-SmbMapping before each scheduled task, since drive-letter mappings are per-logon-session and can't be inherited from a boot-time mount task.
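The hook contract described above (argv list, inherited stdio, non-zero exit aborts with code 1, missing key is a no-op) can be sketched with `subprocess.run`. The names `run_hook` and `run_subcommand` are hypothetical, and `hooks` stands in for the parsed `[hooks]` table.

```python
import subprocess
import sys
from typing import Callable


def run_hook(hooks: dict[str, list[str]], subcommand: str) -> bool:
    """Run the hook for a subcommand, if configured. False means it failed."""
    argv = hooks.get(subcommand)
    if not argv:
        return True  # missing key or empty table: nothing to do
    # No stdout/stderr redirection, so the child inherits our stdio and its
    # output appears live.
    completed = subprocess.run(argv, check=False)
    return completed.returncode == 0


def run_subcommand(hooks: dict[str, list[str]], subcommand: str,
                   body: Callable[[], int]) -> int:
    """Run the hook first; only start the real work when the hook succeeded."""
    if not run_hook(hooks, subcommand):
        print(f"{subcommand}: pre-run hook failed, aborting", file=sys.stderr)
        return 1
    return body()
```

Passing the argv list straight to `subprocess.run` avoids shell quoting issues with Windows paths.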

While a study is being processed, a .fusion-toolkit.lock file is written into the study directory and removed in a finally clause when the per-study pipeline exits (success or any failure). If the process crashes mid-pipeline (SIGKILL, power loss), the lock is left behind. At the start of the next backup run, source_root is scanned for any .fusion-toolkit.lock files; if any are present, the entire run is aborted with exit code 1 so an operator can inspect and clean up before more backups run. The lock holds metadata only (hostname / pid / started_at / run_name). The orchestrator does not check PID liveness or staleness, so manual cleanup is always required after a crash.
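A minimal sketch of that lock lifecycle, assuming a JSON payload and a recursive scan of source_root (both are assumptions; the names `study_lock` and `find_leftover_locks` are hypothetical):

```python
import json
import os
import socket
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path

LOCK_NAME = ".fusion-toolkit.lock"


@contextmanager
def study_lock(study_dir: Path, run_name: str):
    """Write the lock on entry and remove it on exit, even after an exception."""
    lock = study_dir / LOCK_NAME
    payload = {
        "hostname": socket.gethostname(),
        "pid": os.getpid(),
        "started_at": datetime.now(timezone.utc).isoformat(),
        "run_name": run_name,
    }
    lock.write_text(json.dumps(payload), encoding="utf-8")
    try:
        yield lock
    finally:
        # A hard kill or power loss skips this line, which is how a lock
        # ends up left behind for the next run to find.
        lock.unlink(missing_ok=True)


def find_leftover_locks(source_root: Path) -> list[Path]:
    """Return every lock file under source_root (assumed recursive scan)."""
    return sorted(source_root.rglob(LOCK_NAME))
```

A caller would abort the whole run with exit code 1 when `find_leftover_locks` returns a non-empty list, before taking any new lock.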

Per-run exceptions are isolated so one failing run does not abort the monthly batch; the failure is recorded in the manifest and the next run is attempted. Driven by backup run [--config PATH] [--dry-run].
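Per-run isolation amounts to a try/except around each run inside the batch loop. The sketch below uses a hypothetical `run_batch` helper and a caller-supplied `process` callable; recording the failure in the manifest is left to the caller.

```python
from typing import Callable, Iterable


def run_batch(run_names: Iterable[str],
              process: Callable[[str], None]) -> tuple[int, dict[str, str]]:
    """Process every run; collect failures instead of stopping at the first."""
    failures: dict[str, str] = {}
    for name in run_names:
        try:
            process(name)
        except Exception as exc:  # isolate: one bad run must not end the batch
            failures[name] = f"{type(exc).__name__}: {exc}"
    # Exit code 1 signals that one or more runs failed.
    exit_code = 1 if failures else 0
    return exit_code, failures
```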

Also exposes scan_health_check(study, mtime_gap_days=14) as a soft pre-backup gate (not wired into the orchestrator; the caller decides when to apply it). Returns HealthWarning items for two independent checks:

empty — no experiment has a Scan<N>/ with both *.qptiff and .temp/
mtime_gap — earliest and latest mtimes in the study span more than mtime_gap_days days (study may still be actively edited)
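The mtime_gap check reduces to comparing the spread of modification times against a day budget. A small sketch with hypothetical helper names; which files the real check walks is an assumption here (every regular file under the study):

```python
from pathlib import Path

SECONDS_PER_DAY = 86_400


def collect_mtimes(study: Path) -> list[float]:
    """Modification times (epoch seconds) of every regular file in the study."""
    return [p.stat().st_mtime for p in study.rglob("*") if p.is_file()]


def has_mtime_gap(mtimes: list[float], mtime_gap_days: int = 14) -> bool:
    """True when earliest and latest mtimes are more than the budget apart."""
    if not mtimes:
        return False  # nothing to compare
    span_seconds = max(mtimes) - min(mtimes)
    return span_seconds > mtime_gap_days * SECONDS_PER_DAY
```

Usage would look like `has_mtime_gap(collect_mtimes(study_path))`; keeping the comparison separate from the directory walk makes it easy to test with plain numbers.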

`fusion_toolkit.cleanup`

Capacity-driven FIFO eviction on TrueNAS. When free space falls below free_threshold_pct, the oldest backups are deleted until the threshold is met. Every delete passes three safety gates:

Lock — an exclusive .cleanup.lock prevents concurrent cleanup and overlapping backup/cleanup races
Manifest gate — the candidate's latest manifest record must be Status.COMPLETE; in-progress or failed runs are skipped
S3 gate — at least one object must exist under the candidate's S3 prefix (defense in depth against partial uploads that passed the manifest gate somehow)

Only TrueNAS data is deleted — the S3 copy is the permanent archive and never touched. Driven by cleanup run [--config PATH] [--dry-run].
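The eviction order and the manifest and S3 gates can be sketched as a pure planning function. Everything here is an assumption for illustration: the `Candidate` fields, ordering by `completed_at`, and the idea that the S3 object count has already been looked up. The lock gate and the actual deletion are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    backup_name: str
    completed_at: str      # ISO-8601 in one timezone, so string order is time order
    total_bytes: int
    status: str            # latest manifest status for this backup
    s3_object_count: int   # objects found under the backup's S3 prefix


def plan_evictions(candidates: list[Candidate], free_bytes: int,
                   capacity_bytes: int, free_threshold_pct: float) -> list[str]:
    """Pick the oldest safe backups until free space reaches the threshold."""
    target_free = capacity_bytes * free_threshold_pct / 100
    plan: list[str] = []
    for cand in sorted(candidates, key=lambda c: c.completed_at):
        if free_bytes >= target_free:
            break  # threshold met, stop evicting
        if cand.status != "complete":
            continue  # manifest gate: skip in-progress or failed runs
        if cand.s3_object_count < 1:
            continue  # S3 gate: never delete without an archived copy
        plan.append(cand.backup_name)
        free_bytes += cand.total_bytes
    return plan
```

Separating planning from deletion also gives `--dry-run` a natural shape: print the plan and stop.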

`fusion_toolkit.monitor`

Long-running tail of the Fusion application's Fusion.log. Detects ERROR blocks, enriches each with the nearest sample + cycle context, and emails them via the notify subsystem:

Offset + dedup-map persisted under ~/.config/fusion-toolkit/state/ so restarts don't re-email past errors
Dedup key = first 200 chars of the error content (timestamp stripped); same error suppressed for sent_keep_days (default 30)
File rotation handled by detecting a shrunk file and resetting the read offset to 0
Unhandled exceptions in the main loop trigger a [Fusion Monitor] CRASH email before exit 1

Driven by monitor tail [--config PATH] [--smtp-env PATH].
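The dedup and rotation rules above can be sketched as three small functions. The timestamp pattern is an assumption (the actual Fusion.log timestamp format is not stated here), and all names are hypothetical.

```python
import re

# Assumed leading timestamp shape, e.g. "2024-01-15 10:00:00,123 ".
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\s*")
SECONDS_PER_DAY = 86_400


def dedup_key(error_text: str) -> str:
    """Strip the leading timestamp and keep the first 200 characters."""
    return _TIMESTAMP.sub("", error_text, count=1)[:200]


def should_send(key: str, sent: dict[str, float], now: float,
                sent_keep_days: int = 30) -> bool:
    """Suppress a key that was emailed within the keep window."""
    last = sent.get(key)
    if last is not None and now - last < sent_keep_days * SECONDS_PER_DAY:
        return False
    sent[key] = now  # remember when this error was last emailed
    return True


def next_offset(saved_offset: int, file_size: int) -> int:
    """A file smaller than the saved offset was rotated: start again at 0."""
    return 0 if file_size < saved_offset else saved_offset
```

Persisting `sent` and the offset between runs is what prevents a restart from re-emailing old errors.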

`fusion_toolkit.notify`

Gmail SMTP helper used by backup, cleanup, and monitor for failure / event emails. Credentials follow a boto3-style chain: environment variables (FUSION_SMTP_USER, FUSION_SMTP_APP_PASSWORD) first, then a dotenv-style file at ~/.config/fusion-toolkit/smtp.env.
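A sketch of that credential chain, under two assumptions that the text above does not confirm: the dotenv file uses the same two key names as the environment variables, and a source only counts when it supplies both values. The function names are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_smtp_credentials(environ: Mapping[str, str],
                             env_file_lines: Iterable[str]) -> Optional[tuple[str, str]]:
    """Try environment variables first, then the dotenv-style file."""
    for source in (environ, parse_env_lines(env_file_lines)):
        user = source.get("FUSION_SMTP_USER")
        password = source.get("FUSION_SMTP_APP_PASSWORD")
        if user and password:
            return user, password
    return None  # no complete credential pair found
```

In practice the first argument would be `os.environ` and the second the lines of `~/.config/fusion-toolkit/smtp.env` when that file exists.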

Five events:

Event	Fires on
`backup_failure`	`backup run` exit != 0
`cleanup_failure`	`cleanup run` exit != 0
`fusion_error`	`monitor tail` detects an `ERROR` line in `Fusion.log`
`monitor_crash`	`monitor tail` itself crashes
`toolkit_error`	any fusion-toolkit `WARNING`/`ERROR` log (tool-health channel, dedup + cooldown)

Subscription lists are per-event text files, one email per line, under ~/.config/fusion-toolkit/recipients/:

~/.config/fusion-toolkit/recipients/
├── backup_failure.txt    # ops@lab / manager@lab
├── cleanup_failure.txt
├── fusion_error.txt      # oncall + instrument owners
├── monitor_crash.txt
├── toolkit_error.txt     # dev / maintainer
└── default.txt           # fallback for any event with an empty file

Lab members subscribe by appending their email to the relevant file. Lines starting with # are comments; blank lines are ignored. No TOML syntax and no array brackets are needed: just one email per line. The files are hot-reloaded: every alert reads them fresh, so changes take effect on the next email with no restart.

Resolution order for each event:

recipients/<event>.txt (if any uncommented emails)
recipients/default.txt
empty → that event sends nothing
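The resolution order above can be sketched as a two-step file lookup that is re-read on every call, which is what makes edits take effect without a restart. The function names are hypothetical.

```python
from pathlib import Path


def read_recipients(path: Path) -> list[str]:
    """Return the emails in a subscription file, skipping comments and blanks."""
    if not path.is_file():
        return []
    emails = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            emails.append(line)
    return emails


def resolve_recipients(recipients_dir: Path, event: str) -> list[str]:
    """Event file first, then default.txt; an empty list sends nothing."""
    for name in (f"{event}.txt", "default.txt"):
        emails = read_recipients(recipients_dir / name)
        if emails:
            return emails
    return []
```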

To inspect / verify:

fusion-toolkit notify list-subscribers              # all events + sources
fusion-toolkit notify list-subscribers --event fusion_error
fusion-toolkit notify test --event fusion_error --dry-run
fusion-toolkit notify test --event fusion_error     # actually sends a TEST email

The config.toml [notify] section only carries the global switch, cooldown, and an optional recipients-dir override — no recipients live in TOML:

[notify]
enabled = true
# Optional: override the recipients directory. Default is the
# `recipients` sibling of this config.toml.
# recipients_dir = "/some/other/path"
toolkit_error_cooldown_seconds = 300

`fusion_toolkit.install`

One-shot deployment helper for the Fusion host. Two commands:

install init [--config-dir PATH] [--force] — populates ~/.config/fusion-toolkit/ with config.toml, smtp.env, and per-event recipients/*.txt subscription list templates. Refuses to clobber an existing config.toml / smtp.env without --force; recipients/*.txt is never overwritten, even with --force, because it carries operator-curated state.
install tasks [--user USER --password PASS] [--uninstall] — registers (or with --uninstall removes) three Windows Scheduled Tasks via schtasks.exe:

Task	Command	Schedule
`FusionToolkitBackup`	`backup run`	Monthly, day 1 at 02:00
`FusionToolkitCleanup`	`cleanup run`	Monthly, day 20 at 02:00
`FusionToolkitMonitor`	`monitor tail`	On system start

Argv construction is pure (build_create_argv / build_delete_argv), so the schtasks contract is unit-tested without invoking Windows. Unregister is idempotent — missing tasks are not an error. See Setup on the Fusion host below for the typical flow.
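To illustrate why pure argv construction is easy to test, here is a sketch of what such builders might look like. These are not the toolkit's build_create_argv / build_delete_argv (their signatures are not shown here); the `sketch_*` names and the task command strings are hypothetical. The schtasks.exe switches used (/Create, /Delete, /TN, /TR, /SC, /D, /ST, /RU, /RP, /F) are standard ones.

```python
from typing import Optional

# Schedule fragments matching the table above.
MONTHLY_DAY_1 = ["/SC", "MONTHLY", "/D", "1", "/ST", "02:00"]
MONTHLY_DAY_20 = ["/SC", "MONTHLY", "/D", "20", "/ST", "02:00"]
ON_START = ["/SC", "ONSTART"]


def sketch_create_argv(task_name: str, command: str, schedule: list[str],
                       user: Optional[str] = None,
                       password: Optional[str] = None) -> list[str]:
    """Build a schtasks.exe /Create argv without running anything."""
    argv = ["schtasks.exe", "/Create", "/TN", task_name, "/TR", command, "/F"]
    argv += schedule
    if user is not None:
        argv += ["/RU", user]
    if password is not None:
        argv += ["/RP", password]
    return argv


def sketch_delete_argv(task_name: str) -> list[str]:
    """Build a schtasks.exe /Delete argv; /F skips the confirmation prompt."""
    return ["schtasks.exe", "/Delete", "/TN", task_name, "/F"]
```

Because the builders only return lists, a test can assert on the exact argv on any operating system and leave the `subprocess` call to a thin wrapper.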

CLI

fusion-toolkit --version
fusion-toolkit [-v | --verbose] <subcommand> ...
fusion-toolkit validate study <path> [-f text|json] [-q]
fusion-toolkit validate root  <path> [-f text|json] [-q]
fusion-toolkit manifest show  <manifest.jsonl>
fusion-toolkit backup  run    [--config PATH] [--dry-run] [--smtp-env PATH]
fusion-toolkit cleanup run    [--config PATH] [--dry-run] [--smtp-env PATH]
fusion-toolkit monitor tail   [--config PATH] [--smtp-env PATH]
fusion-toolkit install init   [--config-dir PATH] [--force]
fusion-toolkit install tasks  [--user USER --password PASS] [--uninstall]
fusion-toolkit notify list-subscribers [--event EVENT]
fusion-toolkit notify test --event EVENT [--smtp-env PATH] [--dry-run]

Exit codes

Code	Meaning
`0`	Success
`1`	Validation failure or orchestrator reported one or more failed runs
`2`	Config error (missing or malformed TOML, missing required key, lock held)
`3`	Unexpected runtime error (network, permissions, unhandled bug)

Scheduled-task alerting should treat any non-zero exit code as actionable. When [notify].enabled = true, backup run / cleanup run / monitor tail also route failures and tool-health warnings to event-specific recipient lists (see the fusion_toolkit.notify section above). If notify is disabled or credentials are missing the commands behave exactly as before (exit codes only).

Setup on the Fusion host

After uv sync in the cloned repo, the typical flow is:

uv run fusion-toolkit install init        # writes config + smtp.env templates
# edit ~/.config/fusion-toolkit/{config.toml,smtp.env}
uv run fusion-toolkit install tasks        # prompts for your Windows password

install tasks defaults --user to $env:USERDOMAIN\$env:USERNAME (the current account) and prompts for the password via stdin so it never appears in PowerShell history. Pass --user or --password explicitly to override (e.g. for unattended automation).

Full details (Python + uv install, AWS credential chain, SMB mapping quirks) are in docs/setup-fusion-host.md (Chinese version: docs/setup-fusion-host.zh.md).

Web validator

Browser-based name checker at https://wuwenrui555.github.io/fusion_toolkit/. Enter a study name and optional experiment name; per-field pass/fail is shown inline. Maintenance: docs/gh-pages.md.

Development

uv sync
uv run pre-commit install

Run checks locally before pushing:

uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
uv run pyright src/
uv run pytest

License

TBD
