A small CLI for finding and resolving duplicate video scenes in a Stash library, safely.
StashJanitor solves three related problems:
- Workflow A — cross-scene duplicates. Two or more separate Stash scenes whose perceptual hashes are similar (the same content stored as different files, possibly different resolutions or encodes).
- Workflow B — within-scene multi-file cleanup. A single Stash scene that has more than one file attached because Stash auto-attaches re-detected files instead of creating a new scene.
- Workflow C — orphan metadata recovery. Scenes with no stash-box
metadata (no
stash_ids). stash-janitor queries stash-box by phash, finds matches, and links them back so Stash's Scene Tagger can pull the full metadata.
All workflows fetch their data from Stash, score/match every candidate, persist decisions to a local SQLite cache, and let you review everything before committing any mutation.
Default behavior is paranoid.
applyis dry-run by default. All destructive actions require both--commitAND an interactiveYESconfirmation.--yesbypasses the prompt for scripted use.
See PLAN.md for the full design and the rationale behind every locked-in decision.
All three workflows work end-to-end against Stash v0.31.0:
| Command | Status |
|---|---|
stash-janitor config init / stash-janitor config show |
✅ |
stash-janitor stats (library dashboard) |
✅ |
stash-janitor scenes scan / status / report |
✅ |
stash-janitor scenes apply --action tag|merge|delete |
✅ |
stash-janitor scenes mark --group N or --signature |
✅ |
stash-janitor files scan / status / report |
✅ |
stash-janitor files apply --commit (primary swap + deleteFiles) |
✅ |
stash-janitor files mark --scene-id |
✅ |
stash-janitor orphans scan / status / report |
✅ |
stash-janitor orphans scan --endpoint all (multi-endpoint) |
✅ |
stash-janitor orphans apply --commit (link to stash-box) |
✅ |
--submit-fingerprints (all apply commands) |
✅ |
stash-janitor review (interactive TUI) |
✅ |
| Per-loser "kept by: ..." explanations in reports | ✅ |
| Filename-info-loss safety net (workflow A) | ✅ |
| Rename-on-merge (preserve loser filename info) | ✅ |
| Filename-to-metadata extraction on merge | ✅ |
Stale-cache detection in stash-janitor stats |
✅ |
| Orphans scan progress indicator with ETA | ✅ |
stash-janitor organize scan / report / apply |
✅ |
See MANUAL.md for usage instructions and PLAN.md for the design and rationale.
- Go 1.25 or newer (required by the modernc.org/sqlite dependency)
- A reachable Stash instance running v0.31.0 (other 0.31.x patch releases should work; older or newer minor versions are untested)
- A few hundred MB of free disk for the SQLite cache (proportional to library size — at 60k scenes the DB is ~50 MB)
Option A: Download a release binary (no Go required)
Go to the Releases page, download the archive for your platform, and extract:
# Linux (amd64)
tar xzf stash-janitor-v*-linux-amd64.tar.gz
chmod +x stash-janitor
# macOS (Apple Silicon)
tar xzf stash-janitor-v*-darwin-arm64.tar.gz
chmod +x stash-janitor
# Windows — extract the .zip, you get stash-janitor.exeOption B: Build from source (requires Go 1.22+)
git clone https://github.com/Wasylq/StashJanitor.git
cd StashJanitor
go build -o stash-janitor ./cmd/stash-janitorOption C: go install
go install github.com/Wasylq/StashJanitor/cmd/stash-janitor@latestThe binary is fully self-contained (pure-Go SQLite, no cgo, no runtime deps).
# 1. Generate a config file. This creates config.yaml in the current dir.
./stash-janitor config init
# 2. Edit config.yaml — set your Stash URL:
# stash:
# url: http://your-nas:9999
# (the default is http://localhost:9999)
# 3. Optional: if your Stash has an API key, export it.
export STASH_API_KEY=...
# 4. See where you're at.
./stash-janitor stats
# 4. Scan and review cross-scene duplicates.
./stash-janitor scenes scan
./stash-janitor scenes report --filter decided | less
# 5. Apply (safest first pass: just tag).
./stash-janitor scenes apply --action tag --commit
# 6. Or go full merge (preserves metadata, renames files, reclaims disk):
./stash-janitor scenes apply --action merge --commit
# 7. Clean up multi-file scenes.
./stash-janitor files scan
./stash-janitor files report | less
./stash-janitor files apply --commit
# 8. Find metadata for orphan scenes via stash-box.
./stash-janitor orphans scan --max-scenes 500
./stash-janitor orphans report --filter matched
./stash-janitor orphans apply --commitstash-janitor config init writes a fully-commented config.yaml next to the binary.
The most important sections:
stash:
url: http://your-stash:9999
api_key_env: STASH_API_KEY # name of the env var to read
scan:
phash_distance: 4 # 0 = byte-equivalent, 4 = re-encodes,
# >8 produces false positives
duration_diff_seconds: 1.0 # critical false-positive filter
scoring:
scenes:
rules:
- has_stash_id # scene has stash-box metadata → wins
- organized
- resolution
- bitrate
- codec_preference
- file_size
- tag_count
- path_priority
files:
rules:
- filename_quality # regex match → wins
- path_priority
- mod_time
filename_quality:
pattern: '^\d{4}[-._]\d{2}[-._]\d{2}_.+_(480|540|720|1080|1440|2160|2880|4320|[2468][kK])p?(_\d+)?\.[A-Za-z0-9]+$'
apply:
scenes:
default_action: tag # safe default
loser_tag: _dedupe_loser
keeper_tag: _dedupe_keeperThe full default file is at
internal/config/default.yaml — every
field is documented inline.
stash-janitor config show prints the effective config (defaults merged with your
overrides), which is helpful when debugging "is my override being applied?".
Stash exposes
findDuplicateScenes(distance, duration_diff) which returns groups of
perceptually-similar scenes server-side. stash-janitor scenes scan consumes those
groups, scores each one with the configured rule chain, and writes the
results into the local SQLite cache.
./stash-janitor scenes scan --distance 4 --duration-diff 1.0Useful flags:
--distance N— phash hamming distance (default 4)--duration-diff S— max duration delta in seconds (default 1.0)--max-groups N— stop after N groups, useful for chunked iteration at large library scale (60k+ scenes)
Once scanned, the local cache is offline-readable:
./stash-janitor scenes status # summary + reclaimable bytes
./stash-janitor scenes report --filter all # every group
./stash-janitor scenes report --filter needs-review # groups the scorer couldn't decide
./stash-janitor scenes report --filter decided --json # machine-readable output./stash-janitor scenes apply --action tag # dry run
./stash-janitor scenes apply --action tag --commit # actually tag in StashTag mode finds-or-creates two tags (default _dedupe_loser and
_dedupe_keeper), uses bulkSceneUpdate to add them to the scenes,
and marks the local groups as applied. No scenes are deleted. You then
go to Stash's own UI, filter by _dedupe_loser, eyeball the suggestions,
and bulk-delete the ones you agree with.
./stash-janitor scenes apply --action merge # dry run
./stash-janitor scenes apply --action merge --commit # interactive YES prompt
./stash-janitor scenes apply --action merge --commit --yes # bypass prompt for cronMerge mode does the full cross-scene consolidation per group:
- Fetch full metadata for keeper + losers from Stash.
- Compute the metadata union per the policy in
merge.scene_level(multi-value fields union, scalar fields prefer keeper, rating100 max, organized any-true). - Filename metadata extraction: if the union still has empty title/date
but a loser file has a structured filename (e.g.
2024-12-15_Performer- Title_1080p.mp4), parse it and fill in the keeper's empty fields. - Call
sceneMergewith the union as thevaluesfield — Stash's sceneMerge does NOT auto-union scene metadata, so we compute it. - The merged keeper now has every loser's files attached. Pick the best file as primary (resolution → bitrate → filename → file_size → mod_time).
- Rename-on-merge: if the winning file has a junk basename but a loser
had a structured one, rename the winner via
moveFilesto use the loser's filename (with resolution token swapped to match the winner's actual height). deleteFileson the rest from disk. Mark the group applied.
Per-group failures don't abort the run — they're captured in the per-group report and the loop continues.
Stash attaches multi-files to a scene only by oshash/md5 match (verified
in pkg/scene/scan.go in v0.31.0), so all files within a single scene are
byte-equivalent. The scoring rules consider only filename, path, and
mod_time — tech specs are guaranteed identical.
./stash-janitor files scan
./stash-janitor files status
./stash-janitor files report --filter decided # each loser shows WHY it lost
./stash-janitor files apply # dry run
./stash-janitor files apply --commit # swap primary + delete losersThe default filename_quality regex matches
YYYY-MM-DD_<anything>_<resolution>[_N].<ext> including import suffixes
like _1080p_1.mp4 and resolutions up to 8K.
Scenes with no stash_ids (55% of a typical library) are "orphans". stash-janitor
queries stash-box by phash to find matches, then links them back to Stash.
./stash-janitor orphans scan --max-scenes 500 # start small (stash-box is slow)
./stash-janitor orphans report --filter matched # review proposed matches
./stash-janitor orphans apply --commit # write stash_ids back
./stash-janitor orphans scan # full library (re-runs skip cache)
./stash-janitor orphans scan --endpoint all # query every configured stash-boxAfter applying, run Stash's built-in Scene Tagger to pull full metadata (title, performers, tags, studio) from stash-box for the newly-linked scenes.
--dry-runis the default. You must pass--committo mutate anything. This protection is never removed.- Default action is
tag— fully reversible. - Destructive actions (merge, delete, files commit) require an
interactive
YESprompt with a summary of what will happen.--yesbypasses the prompt for scripted use, but--commitis still required separately, so a typo cannot trigger destruction. - Filename-info-loss safety net. When the scorer picks a keeper with
a junk filename but a loser has a date-bearing structured name, the
group is flagged
needs_reviewinstead of auto-decided. This prevents silently losing information encoded only in filenames. - Idempotent. Re-running
apply --commitis safe — applied groups are skipped viaapplied_at. - No filesystem touches. Every destructive op goes through Stash's GraphQL API, so the container's view of paths is the only one that matters. Works seamlessly with NAS / Docker setups.
- API key never logged, never stored in config, only read from env.
needs_reviewgroups are never applied automatically.- Stale-cache detection.
stash-janitor statssamples cached scene IDs and warns when they no longer exist in Stash.
go test ./... # unit tests, no Stash needed
SFX_INT_STASH_URL=http://your-stash:9999 \
go test ./internal/stash/... -run Integration # read-only integrationThe integration tests are env-gated so go test ./... stays hermetic.
┌──────────────┐ GraphQL ┌──────────┐
│ stash-janitor (binary) │ ─────────► │ Stash │ ── manages files on disk
└──────┬───────┘ └──────────┘
│
▼
┌──────────────┐
│ stash-janitor.sqlite │ ← scan cache, decisions, history
└──────────────┘
Module layout:
cmd/stash-janitor/main.go — entry point
internal/cli/ — cobra command tree (scenes, files, orphans, stats, config)
internal/config/ — YAML loader (embedded defaults)
internal/stash/ — GraphQL client + query catalog (v0.31.0)
internal/store/ — SQLite persistence (schema v4 with migrations)
internal/scan/ — workflow A/B/C scan orchestration
internal/decide/ — scene + file scoring engines
internal/merge/ — metadata union + filename parser for sceneMerge
internal/apply/ — tag, merge, delete, files-prune, orphans-link, rename
internal/report/ — text + JSON output for all workflows
internal/confirm/ — interactive YES prompt utility
Moves and renames files on disk (via Stash's moveFiles API) so your
library has a consistent, browsable directory structure based on metadata.
./stash-janitor organize scan # compute ideal paths for every scene
./stash-janitor organize report # show proposed moves
./stash-janitor organize report --filter conflict # naming collisions
./stash-janitor organize apply --commit # actually move files via StashThe default template produces your existing convention:
/data/<Performer>/<date>_<Performer>-<Title>_<resolution>.<ext>
For Whisparr compatibility, switch to a studio-first template in config.yaml:
organize:
path_template: "{studio}/{date} - {title}/{date}_{performer}-{title}_{resolution}.{ext}"Scenes without the required metadata (performer + date by default) are
skipped and left where they are. Run stash-janitor orphans scan + Stash Scene
Tagger first to maximize metadata coverage, then organize.
stash-janitor review launches a terminal UI for walking through duplicate
groups. Especially useful for needs_review groups where the scorer couldn't
auto-decide and you need to pick a keeper manually.
./stash-janitor review # decided + needs_review groups
./stash-janitor review --filter needs-review # just the ones that need you- For
needs_reviewgroups: pressoto pick which scene to keep, thenEnter. The status bar updates live. - For
decidedgroups: pressato accept the scorer's pick, oroto change it. n= not_duplicate,d= dismiss,q= quit.
The TUI only saves decisions locally. After reviewing, re-scan and apply:
stash-janitor scenes scan
stash-janitor scenes apply --action merge --commitSee MANUAL.md for the full review-to-delete walkthrough.
- PLAN.md — full design, locked-in decisions, phased roadmap
- MANUAL.md — step-by-step user guide
- internal/config/default.yaml — every config field documented inline
./stash-janitor help— cobra-generated command tree./stash-janitor scenes help,./stash-janitor files help,./stash-janitor orphans help— workflow-specific help