feat: write bcl2fastq output directly to /flowcells/, eliminating 3TB cross-filesystem rsync by jemma-nelson · Pull Request #82 · StamLab/stampipes

jemma-nelson · 2026-03-26T20:56:12Z

Summary

Eliminates the slow copy phase in setup.sh by writing bcl2fastq output directly to the destination filesystem (/flowcells/), instead of writing to /sequencers/ and then rsyncing 3TB across filesystems.

Key changes

`scripts/flowcells/setup.sh`

- All bcl2fastq --output-dir args now write to $analysis_dir/bcl_output/ (a restricted subdirectory under the flowcell analysis dir on /flowcells/)
- bcl_output/ is created with chmod 700 before the bcl2fastq job is submitted, so permissions are set before any files exist
- copy_from_dir is unified to $analysis_dir/Demultiplexed/ for all non-legacy paths
- In the copy SLURM job: internal samples now use mv (O(1) rename, no data movement) instead of rsync -aL; only samples with a project_share_directory still use rsync
- bcl_output/ intermediate directories are not deleted; they retain stats, logs, and reports from bcl2fastq
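
For reviewers who want to see the "permissions before files" ordering in isolation, here is a minimal Python sketch of the idea. It is not the code in setup.sh (which is a shell script); `make_restricted_dir` is a hypothetical helper and the temporary directory stands in for `$analysis_dir`. It assumes a POSIX filesystem.

```python
import os
import stat
import tempfile


def make_restricted_dir(path):
    """Create `path` so that only the owner can read, write, or enter it."""
    os.makedirs(path, mode=0o700, exist_ok=True)
    # The mode passed to makedirs is filtered by the process umask and is
    # ignored when the directory already exists, so set it explicitly.
    os.chmod(path, 0o700)
    return path


# Stand-in for $analysis_dir; the real path lives under /flowcells/.
with tempfile.TemporaryDirectory() as analysis_dir:
    bcl_output = make_restricted_dir(os.path.join(analysis_dir, "bcl_output"))
    mode = stat.S_IMODE(os.stat(bcl_output).st_mode)
    print(oct(mode))
    # Only after this point would the bcl2fastq job be submitted.
```

Because the directory is locked down before the job starts, no output file is ever visible under looser permissions, which is why the post-hoc chmod in the copy step is no longer needed.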

`scripts/flowcells/rename_fastq_files.py` (was `link_nextseq.py`)

- Renamed to reflect that it now moves files rather than creating symlinks
- os.symlink() replaced with os.rename(): files are moved to canonical sample names; since everything is on the same filesystem this is an O(1) metadata operation
- All internal verbiage (create_links, 'symlinks') updated to match
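
The O(1) claim depends on source and destination sharing a filesystem. The sketch below shows one way to make that assumption explicit before renaming. It is illustrative only: `same_filesystem` and `move_to_canonical_name` are hypothetical helpers, and the file names are made up rather than taken from rename_fastq_files.py.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """Return True when both paths are on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_to_canonical_name(src, dest):
    """Rename src to dest, refusing to cross a filesystem boundary."""
    dest_dir = os.path.dirname(dest) or "."
    os.makedirs(dest_dir, exist_ok=True)
    if not same_filesystem(src, dest_dir):
        # os.rename would fail here anyway; checking first gives a
        # clearer message than the raw "cross-device link" error.
        raise OSError(f"{src} and {dest_dir} are on different filesystems")
    os.rename(src, dest)  # metadata-only: no file data is copied


with tempfile.TemporaryDirectory() as root:
    # Hypothetical bcl2fastq-style name and canonical sample name.
    src = os.path.join(root, "bcl_output", "fastq", "Sample_S1_L001_R1_001.fastq.gz")
    os.makedirs(os.path.dirname(src))
    open(src, "wb").close()
    dest = os.path.join(root, "Demultiplexed", "Sample", "Sample_R1.fastq.gz")
    move_to_canonical_name(src, dest)
    print(os.path.exists(src), os.path.exists(dest))
```

Unlike a symlink, a rename leaves nothing behind at the source path, so anything that still expects files under the original bcl2fastq output names will need to look at the new location.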

Commits

d42d1a2 feat: write bcl2fastq output directly to /flowcells/
2428eea refactor: rename link_nextseq.py, nest bcl output in bcl_output/ subdir
fb3c5a7 fix: terminology (bcl-convert → bcl2fastq in comments)
5651e75 fix: GUIDEseq index-swap paths updated to bcl_output/fastq/

- Rename scripts/flowcells/link_nextseq.py -> rename_fastq_files.py to reflect that it now moves files rather than creating symlinks. Update all references in setup.sh and fix internal verbiage (create_links -> rename_files, 'symlinks' -> 'renames' in help/docstrings).
- Add a bcl_output/ subdirectory under analysis_dir as the landing zone for all bcl-convert raw output. Create it with chmod 700 *before* submitting the bcl-convert job, so permissions are set before any files exist. Remove the post-hoc chmod in __COPY__.
- Update all fastq_dir assignments and bcl-convert --output-dir flags to write into analysis_dir/bcl_output/ instead of analysis_dir/ directly.
- Update novaseq_link_command glob from 'fastq-withmask-*' to 'bcl_output/fastq-withmask-*' and non-NovaSeq link_command -i arg from 'fastq' to 'bcl_output/fastq'.
- Remove the rm -rf of intermediate fastq dirs from __COPY__: those directories retain valuable stats, reports, and logs from bcl-convert.


jemma-nelson · 2026-03-26T20:57:14Z

I let Claude Code write most of this - I think the core idea is solid, but I am tagging myself to review the details at a later date.

Copilot

Pull request overview

This PR removes the large cross-filesystem copy step by having bcl2fastq write outputs directly under the flowcell analysis directory on /flowcells/, and then reorganizes/moves outputs in-place (with mv for internal samples) during the copy stage.

Changes:

- Update bcl2fastq --output-dir paths to write into $analysis_dir/bcl_output/ and create that directory with restricted permissions ahead of job submission.
- Replace the old “link” phase with rename_fastq_files.py, which now moves FASTQs into canonical names/structure.
- Optimize the copy SLURM job: internal samples use filesystem renames (mv), while samples destined for project_share_directory still use rsync.
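
The split between mv and rsync in the copy job can be pictured as a per-sample decision. The sketch below builds the command each sample would get without running anything. The dictionary keys other than `project_share_directory`, the example paths, and the rsync flags are assumptions for illustration; the actual logic lives in the shell code generated by setup.sh.

```python
def plan_copy(sample):
    """Return the argv list the copy step would run for one sample."""
    src = sample["source_dir"]
    share = sample.get("project_share_directory")
    if share:
        # Destination may be on another filesystem, so data must be copied.
        # The -aL flags are an assumption carried over from the old rsync call.
        return ["rsync", "-aL", src + "/", share + "/"]
    # Internal sample: same filesystem, so a rename is enough.
    return ["mv", src, sample["destination_dir"]]


# Hypothetical sample records.
internal = {
    "source_dir": "/flowcells/FC1/Demultiplexed/Sample_A",
    "destination_dir": "/flowcells/FC1/Project_X/Sample_A",
}
shared = {
    "source_dir": "/flowcells/FC1/Demultiplexed/Sample_B",
    "project_share_directory": "/shares/collab/Sample_B",
}

for sample in (internal, shared):
    print(" ".join(plan_copy(sample)))
```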

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/flowcells/setup.sh | Redirects bcl2fastq output to `/flowcells` under `bcl_output/`, adjusts linking/rename invocation, and changes the copy job to `mv` internal outputs. |
| scripts/flowcells/rename_fastq_files.py | Renames/refactors the previous linking behavior to move files via `os.rename()` and updates messaging/function naming accordingly. |

Comments suppressed due to low confidence (1)

scripts/flowcells/rename_fastq_files.py:147

os.rename() is only executed when output_file does not exist, but if it does exist the code silently leaves the input FASTQ in place (after logging "Moving …"), which can produce partially-processed runs and make reruns non-deterministic. Consider explicitly logging a warning/error when output_file already exists (and either aborting, or adding an overwrite/force option) so failures don’t go unnoticed and inputs aren’t left behind unexpectedly.
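
One possible shape for the behaviour this comment asks for is sketched below. It is not the code at rename_fastq_files.py:147; `rename_fastq` and its `force` parameter are hypothetical, following the reviewer's suggestion of an overwrite option. The `dry_run` parameter loosely mirrors the script's --dry-run flag.

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)


def rename_fastq(input_file, output_file, force=False, dry_run=False):
    """Move input_file to output_file; return True if a move happened."""
    if os.path.exists(output_file) and not force:
        # Fail loudly instead of silently leaving the input in place.
        raise FileExistsError(
            f"{output_file} already exists; not moving {input_file}"
        )
    if dry_run:
        logging.info("Would move %s to %s", input_file, output_file)
        return False
    logging.info("Moving %s to %s", input_file, output_file)
    # os.replace overwrites an existing destination on every platform.
    os.replace(input_file, output_file)
    return True


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "in.fastq.gz")
    dest = os.path.join(root, "out.fastq.gz")
    for path in (src, dest):
        open(path, "wb").close()
    try:
        rename_fastq(src, dest)
    except FileExistsError as err:
        logging.warning("%s", err)
    rename_fastq(src, dest, force=True)
```

Raising makes a rerun over a partially processed flowcell stop at the first conflict; whether aborting or warning and continuing is the better default is a call for the pipeline maintainers.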


jemma-nelson · 2026-04-03T20:58:33Z

Okay - I'm going to merge this in, and will deploy early next week.

jemma-nelson added 4 commits March 26, 2026 12:08

fix: remove unused rel_path variable in link_nextseq.py

d42d1a2

fix: replace bcl-convert with bcl2fastq in comments (branch uses bcl2fastq)

fb3c5a7

fix: update GUIDEseq index-swap paths to bcl_output/fastq/

5651e75

jemma-nelson requested a review from Copilot March 30, 2026 20:25

Copilot started reviewing on behalf of jemma-nelson March 30, 2026 20:26

Copilot AI reviewed Mar 30, 2026


jemma-nelson added 2 commits March 30, 2026 15:19

address PR comments

1930c14

make rename_fastq_files --dry-run clearer

daa93bf

jemma-nelson wants to merge 6 commits into main from feat/direct-flowcell-output
