Redact the label image in whole-slide imaging (WSI) files — fast, reliable, pure Go.
wsi-label is a command-line tool that replaces or strips the label image inside digital pathology slide files: the small photo of the slide's sticker, which typically carries patient identifiers (PHI). It never decodes the multi-gigabyte pyramid, so it's fast on any slide size. For formats whose label is its own TIFF directory (Aperio SVS, generic-TIFF, Philips), the output is byte-identical to the input everywhere except that label IFD. For formats whose label is a region of a larger associated image (Hamamatsu NDPI macro, Ventana BIF overview), only that one small image is decoded, edited, and re-encoded; the pyramid is still untouched. A companion Python driver (label-slides.py) renders anonymized labels from CSV-driven or command-line metadata and batches the replacement.
Use cases: de-identification for data sharing, anonymization before publication, scrubbing PHI from research datasets, and bulk label updates in pathology pipelines.
The label-handling capability here also lives inside wsitools, a larger WSI toolkit; the duplication is deliberate. wsitools can be built as pure Go, but its default build enables CGo-backed codec encoders (AVIF, JPEG-XL, WebP, HTJ2K, libjpeg-turbo) that link native libraries, so you only get a dependency-free wsitools by opting out via build tags. wsi-label, by contrast, is unconditionally pure Go: two pure-Go dependencies, zero CGo, no build flags to remember. That single property is the whole point: it cross-compiles from one machine to any platform the Go toolchain targets, ships as a self-contained static binary with no .so/.dll to install, and presents a tiny audit and maintenance surface. So the rule of thumb is: reach for wsi-label when you just need to redact or swap labels and want a drop-in binary; reach for wsitools when you need full whole-slide decoding and conversion. The overlap stays here as long as staying dependency-free is worth it.
- Replace or strip the label across SVS, generic-TIFF, Philips, NDPI, and BIF — never touching pyramid pixel data. (See the Supported formats table.)
- Two redaction strategies, picked automatically. Splice (SVS / generic-TIFF / Philips): the label is its own IFD, removed or replaced as opaque bytes — codec-independent and byte-exact outside the label. Region (NDPI / BIF): the label is a fixed crop of a small associated image (NDPI macro's left ~30%, BIF overview's top 1/3), which is decoded, blanked or recomposited, and re-encoded in pure Go.
- Preserves original label geometry on the splice formats — matches the existing label's dimensions and orientation (1200×848 landscape Aperio and older portrait ScanScope).
- BigTIFF-aware — handles both classic TIFF (magic 42) and BigTIFF (magic 43) transparently.
- Pure Go — no CGo, no libtiff/libopenslide runtime dependency. One static binary per platform.
- Atomic writes — temp-file + fsync + rename; killed runs never leave a half-written output.
- Extensible: format detection/classification lives behind a `redact.Classifier` registry (one `init()` registration per format); adding a new format (e.g. Leica SCN) is a drop-in classifier, not a rewrite.
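To make the BigTIFF bullet concrete, here is a minimal Python sketch of telling classic TIFF from BigTIFF by the header's byte-order mark and magic number. It illustrates the file-format rule only; the function name `tiff_flavor` is hypothetical and this is not wsi-label's Go implementation.

```python
import struct


def tiff_flavor(header: bytes) -> tuple[str, str]:
    """Return (byte_order, flavor) from the first 4 bytes of a TIFF file.

    Hypothetical helper for illustration; not part of wsi-label.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")

    # Bytes 0-1 select the byte order: "II" = little-endian, "MM" = big-endian.
    if header[:2] == b"II":
        order, fmt = "little", "<H"
    elif header[:2] == b"MM":
        order, fmt = "big", ">H"
    else:
        raise ValueError("not a TIFF: bad byte-order mark")

    # Bytes 2-3 hold the magic number, read in the byte order chosen above.
    (magic,) = struct.unpack(fmt, header[2:4])
    if magic == 42:
        return order, "classic"
    if magic == 43:
        return order, "bigtiff"
    raise ValueError(f"not a TIFF: unexpected magic {magic}")


print(tiff_flavor(b"II\x2a\x00"))  # ('little', 'classic')
print(tiff_flavor(b"MM\x00\x2b"))  # ('big', 'bigtiff')
```

In a real file the same four bytes would come from `open(path, "rb").read(4)`.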
Download the archive for your platform from the Releases page, extract, and move the wsi-label binary onto your PATH.
# Linux amd64 example
curl -L https://github.com/WSILabs/wsi-label-tools/releases/latest/download/wsi-label_0.2.0_linux_amd64.tar.gz | tar xz
sudo mv wsi-label /usr/local/bin/
wsi-label --version
Or build from source (requires Go 1.22+):
go build -o wsi-label ./cmd/wsi-label
wsi-label inspect <wsi-in> # list IFDs + roles
wsi-label replace <wsi-in> <label-image> [<wsi-out>] [flags] # swap label
wsi-label strip <wsi-in> [<wsi-out>] [flags] # remove label
wsi-label label-dims <wsi-in> # print "WxH"
Replace / strip flags:
- `--overwrite`: clobber existing output (otherwise a numbered suffix is picked).
- `--strict-replace`: hard-fail if the input has no label (otherwise a label is added, and the tool exits 10).
- `--label-dims WxH`: force a specific target size (default: match existing label, else 1200×848).
- `--resize fit|stretch|none`: how to fit an input that isn't already at target dims. Default `fit`.
- `--rotate 0|90|180|270`: rotate input label before storing.
- `--bg RRGGBB`: letterbox fill for `--resize=fit`. Default `F5F5E6` (Aperio parchment).
- `--force`: bypass the >2× aspect-ratio safety check.
- `--fsync=false`: skip the fsync before rename (speed, not durability).
- `-q`: silence non-error stderr output.
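The `--fsync` flag controls one step of the temp-file + fsync + rename pattern the tool uses for atomic writes. The sketch below shows that general pattern in Python under stated assumptions: `atomic_write` and its `fsync` parameter are hypothetical names, and wsi-label's actual Go code may differ in detail.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes, fsync: bool = True) -> None:
    """Write data to path so readers see the old file or the new one, never a partial one.

    Hypothetical helper illustrating temp-file + fsync + rename.
    """
    # Create the temp file in the destination directory so the final rename
    # stays on one filesystem (a cross-device rename would not be atomic).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            if fsync:
                # Push the bytes to stable storage before publishing the file.
                os.fsync(f.fileno())
        # Atomically swap the finished temp file into place.
        os.replace(tmp, path)
    except BaseException:
        # On any failure, remove the temp file so no partial output remains.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Passing `fsync=False` mirrors the trade-off of `--fsync=false`: the rename is still atomic, but a power loss shortly after the write may lose the new contents. This sketch does not fsync the containing directory, which a stricter durability guarantee would also need.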
label-slides.py is a Python wrapper over wsi-label replace that handles label rendering and batch execution:
# Interactive — pick files + enter label text
./label-slides.py --svs-dir /path/to/slides
# One label applied to every file (quick PHI scrub)
./label-slides.py --text "ANON-001" *.svs
# Per-file labels from a spreadsheet
./label-slides.py --csv labels.csv --svs-dir /path/to/slides -j 4
# Strip labels entirely (no rendering)
./label-slides.py --strip *.svs
Dependencies: pillow, and optionally questionary for the arrow-key TUI.
| Code | Meaning |
|---|---|
| 0 | Label replaced or stripped. |
| 2 | Usage error / bad flags / --strict-replace triggered. |
| 3 | Input unreadable or unrecognized. |
| 4 | Output exists (use --overwrite). |
| 5 | File has unexpected TIFF layout; refusing to proceed. |
| 10 | Success, but label was added rather than replaced. |
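For batch pipelines, the table above can be turned into a small lookup. The following Python sketch wraps `wsi-label strip` and treats both 0 and 10 as success; the helper names `describe_exit` and `strip_label` are hypothetical, and treating 10 as success rather than as a warning is a design assumption you may want to change.

```python
import subprocess

# Meanings copied from the exit-code table.
EXIT_MEANINGS = {
    0: "label replaced or stripped",
    2: "usage error, bad flags, or --strict-replace triggered",
    3: "input unreadable or unrecognized",
    4: "output exists (use --overwrite)",
    5: "unexpected TIFF layout; refused to proceed",
    10: "success, but label was added rather than replaced",
}

# Assumption: a pipeline counts "label added" (10) as success.
SUCCESS_CODES = {0, 10}


def describe_exit(code: int) -> tuple[bool, str]:
    """Map a wsi-label exit code to (succeeded, human-readable meaning)."""
    meaning = EXIT_MEANINGS.get(code, f"undocumented exit code {code}")
    return code in SUCCESS_CODES, meaning


def strip_label(src: str, dst: str) -> tuple[bool, str]:
    """Run `wsi-label strip <wsi-in> <wsi-out>` and interpret the result.

    Requires the wsi-label binary on PATH.
    """
    proc = subprocess.run(["wsi-label", "strip", src, dst])
    return describe_exit(proc.returncode)
```

A caller that must distinguish "replaced" from "added" can check for code 10 explicitly instead of relying on the boolean.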
The tool is built around a handful of checked invariants rather than heuristics:
- Prefix byte-identity. Bytes `[0, cutoff)` of the output equal the input byte-for-byte: the pyramid, pyramid tile offsets, thumbnail, and anything else before the label are untouched. Directly testable with `cmp`.
- No pyramid pixel decode. Tiles are copied as opaque byte ranges. Whether they're JPEG, JPEG2000, or anything else is irrelevant.
- No dead-byte PHI. The old label's bytes never make it into the output file. No post-hoc scrubbing needed.
- Cutoff violation refuses, never corrupts. If a file's byte layout doesn't match the IFD chain order (pathological but legal TIFF), the tool fails loudly with exit 5 rather than silently corrupting a tile array.
- TIFF 6.0 compliant LZW. Uses `github.com/hhrutter/lzw` with `oneOff=true`; output decodes cleanly through `libtiff`, not just lenient readers.
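The prefix byte-identity invariant can also be checked from a script. Below is a minimal Python sketch that compares the first `cutoff` bytes of two files in chunks. How wsi-label computes or reports `cutoff` is not covered here, so the value is a caller-supplied assumption, and `prefix_identical` is a hypothetical name.

```python
import os
import tempfile


def prefix_identical(a_path: str, b_path: str, cutoff: int, chunk: int = 1 << 20) -> bool:
    """Return True if bytes [0, cutoff) of both files are identical.

    Returns False if either file is shorter than cutoff.
    """
    remaining = cutoff
    with open(a_path, "rb") as a, open(b_path, "rb") as b:
        while remaining > 0:
            n = min(chunk, remaining)
            block_a = a.read(n)
            block_b = b.read(n)
            # A short read means a file ended before the cutoff was reached.
            if block_a != block_b or len(block_a) < n:
                return False
            remaining -= n
    return True


# Tiny synthetic demo: a shared 13-byte prefix followed by different "labels".
with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "in.bin")
    redacted = os.path.join(d, "out.bin")
    with open(original, "wb") as f:
        f.write(b"PYRAMID-TILES" + b"old-label")
    with open(redacted, "wb") as f:
        f.write(b"PYRAMID-TILES" + b"new")
    print(prefix_identical(original, redacted, cutoff=13))  # True
```

Reading in fixed-size chunks keeps memory use flat even when the prefix spans a multi-gigabyte pyramid.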
Validated against an openslide-python round-trip plus a strict tiffinfo -D decode on every commit; see scripts/cross_validate.py.
| Format | Status | Notes |
|---|---|---|
| Aperio SVS | ✅ V1 | Classic TIFF + BigTIFF |
| generic-TIFF | ✅ | tag 65080 / heuristic; pure splice |
| Philips-TIFF | ✅ | Software prefix detect; pure splice (label CI fixture pending) |
| Hamamatsu NDPI | ✅ | Region strategy (blank/composite the macro's label crop); pure-Go JPEG |
| Ventana BIF | ✅ | Region strategy (blank/composite the overview's top-1/3 label band); DP200 + legacy |
| Leica SCN | 🛣️ Parked | overview-role + SCN-XML |
| OME-TIFF | ❌ out of charter | SubIFD pyramid — needs file-rewrite (use wsitools) |
| Zeiss CZI / MRXS | ❌ | non-TIFF / directory-based |
See docs/ROADMAP.md.
- Design doc: `docs/superpowers/specs/2026-04-20-wsi-label-design.md`
- Implementation plan: `docs/superpowers/plans/2026-04-20-wsi-label.md`
whole-slide imaging · WSI · digital pathology · Aperio · SVS · TIFF · BigTIFF · label replacement · label redaction · de-identification · deidentification · PHI removal · anonymization · slide anonymizer · openslide · ImageScope · Grundium · ScanScope · digital pathology pipeline · pathology informatics · pathology data sharing · HIPAA-adjacent · pure Go