Refactor #1

Open
wenjie1991 wants to merge 8 commits into main from refactor

Conversation

@wenjie1991 (Collaborator)

No description provided.

@wenjie1991 wenjie1991 marked this pull request as ready for review April 16, 2026 14:43
@wenjie1991 wenjie1991 requested a review from Copilot April 16, 2026 14:43

Copilot AI left a comment

Pull request overview

This PR refactors the pipeline into a self-contained Nextflow DSL2 workflow for running Souporcell doublet calling on a directory of BAMs, adding a fast C++ stream filter to inject CB/CR tags and providing Conda + Singularity/Apptainer packaging to make runs more reproducible.

Changes:

  • Reworked main.nf/nextflow.config into a new parameter set, new process graph, and new execution profiles (local/slurm/conda/singularity/apptainer).
  • Added a C++ SAM stream filter (src/add_cb_rg_tags.cpp) plus build/wrapper scripts to produce and run it.
  • Added Conda env + container build definitions and updated documentation for the new run modes and parameters.
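The tagging step listed above can be pictured as a per-line transform over SAM text. The following is a minimal sketch, not the code in src/add_cb_rg_tags.cpp: the function name `retag_sam_line` is hypothetical, and it assumes the barcode has already been extracted, that header lines pass through unchanged, and that "strip existing tags" means dropping any existing `RG`, `CB`, and `CR` optional fields before fresh `CB` and `CR` tags are appended.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not taken from the PR): rewrite one SAM line so that it
// carries CB/CR tags for `barcode` and no RG tag.
std::string retag_sam_line(const std::string& line, const std::string& barcode) {
    // Header lines start with '@' and are passed through untouched.
    if (line.empty() || line[0] == '@') return line;

    // Split the alignment record on tabs.
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, '\t')) fields.push_back(field);

    std::string out;
    bool first = true;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const std::string& f = fields[i];
        // The first 11 SAM columns are mandatory; TAG:TYPE:VALUE fields follow.
        const bool optional = i >= 11;
        // Assumption: existing RG, CB, and CR fields are the ones to strip.
        const bool drop = optional && (f.rfind("RG:", 0) == 0 ||
                                       f.rfind("CB:", 0) == 0 ||
                                       f.rfind("CR:", 0) == 0);
        if (drop) continue;
        if (!first) out += '\t';
        out += f;
        first = false;
    }

    // Only tag reads for which a barcode was found.
    if (!barcode.empty()) {
        out += "\tCB:Z:" + barcode;
        out += "\tCR:Z:" + barcode;
    }
    return out;
}
```

In a stream filter, a function like this would be called once per line read from standard input, with the result written to standard output, which is what lets it sit between two `samtools view` invocations in a pipe.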

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tools/build_add_cb_rg_tags.sh | Adds a helper script to compile the C++ tagger into bin/. |
| src/add_cb_rg_tags.cpp | Introduces a C++ SAM stream filter to add CB/CR and strip existing tags. |
| nextflow.config | Replaces legacy config with new params, process resources, and execution profiles. |
| main.nf | Replaces the old workflow with a new DSL2 pipeline and inline process definitions. |
| envs/souporcell.yml | Adds a Conda environment definition for samtools + souporcell + compiler. |
| containers/souporcell.def | Adds a Singularity/Apptainer definition to build an image from the Conda env. |
| containers/build_singularity_image.sh | Adds a helper script to build the container image locally or remotely. |
| containers/README.md | Documents how to build/use the container image. |
| conf/hpc_resources.config | Adds an example config for overriding CPU/memory on HPC. |
| bin/add_cb_rg_tags | Adds a wrapper that auto-builds and execs the compiled C++ tagger. |
| README.md | Updates repo documentation for the new pipeline usage and parameters. |
| .gitignore | Ignores Nextflow work/logs and the compiled C++ binary artifact. |

Comment thread nextflow.config
cpus = { check_max( 2 * task.attempt, 'cpus' ) }
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
executor = 'local'
shell = ['/bin/bash', '-euo', 'pipefail']
Comment thread nextflow.config
Comment on lines +89 to +90
singularity.enabled = false
conda.enabled = false
Comment thread main.nf
Comment on lines +164 to +191
def no_umi_arg = params.no_umi.toString().toBoolean() ? 'True' : 'False'
def skip_remap_arg = params.skip_remap.toString().toBoolean() ? '--skip_remap True' : ''
def ignore_arg = params.ignore.toString().toBoolean() ? '--ignore True' : ''
def allow_troublet_failure = params.allow_troublet_failure.toString().toBoolean() ? 'true' : 'false'
def allow_souporcell_partial = params.allow_souporcell_partial.toString().toBoolean() ? 'true' : 'false'
def clean_souporcell_output = params.clean_souporcell_output.toString().toBoolean() ? 'true' : 'false'
"""
if [[ "${clean_souporcell_output}" == "true" ]]; then
rm -rf souporcell_output
fi
mkdir -p souporcell_output
set +e
${params.souporcell_cmd} \\
-i "${merged_bam}" \\
-b "${barcode_list}" \\
-f "${ref_fasta}" \\
-t ${task.cpus} \\
-k ${params.k_genotypes} \\
--no_umi ${no_umi_arg} \\
${skip_remap_arg} \\
${ignore_arg} \\
--min_alt ${params.min_alt} \\
--min_ref ${params.min_ref} \\
--max_loci ${params.max_loci} \\
--restarts ${params.restarts} \\
-o souporcell_output ${params.souporcell_extra_args}
exit 0
"""
Comment thread main.nf
Comment on lines +83 to +91
"""
samtools view -h -@ 1 "${bam}" \\
| add_cb_rg_tags \\
--mode "${params.extract_mode}" \\
--regex '${params.qname_regex}' \\
--colon-field ${params.colon_field} \\
--tail-length ${params.tail_length} \\
${index_arg} \\
| samtools view -b -@ 1 -o "${sample}.rg.bam" -
Comment thread src/add_cb_rg_tags.cpp
Comment on lines +35 to +40
if (opt.mode == "regex") {
std::regex re(opt.pattern);
std::smatch match;
if (std::regex_search(qname, match, re)) {
barcode = match.size() > 1 ? match[1].str() : match[0].str();
}
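The excerpt above constructs a `std::regex` from `opt.pattern` inside the extraction branch. Assuming that branch runs once per read, one alternative is to compile the pattern a single time and reuse it. The sketch below is illustrative only: the class name `BarcodeExtractor` is hypothetical and not part of the PR, and it keeps the same rule as the excerpt, namely the first capture group when the pattern defines one and the whole match otherwise.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper (not from the PR): compile the pattern once at
// construction and reuse it for every QNAME.
class BarcodeExtractor {
public:
    explicit BarcodeExtractor(const std::string& pattern) : re_(pattern) {}

    // Returns the first capture group if the pattern has one, otherwise the
    // whole match. Returns an empty string when the QNAME does not match.
    std::string extract(const std::string& qname) const {
        std::smatch match;
        if (!std::regex_search(qname, match, re_)) return "";
        return match.size() > 1 ? match[1].str() : match[0].str();
    }

private:
    std::regex re_;  // compiled once, shared across calls
};
```

A caller would build one `BarcodeExtractor` after parsing options and call `extract(qname)` for each alignment line, so a malformed pattern is also reported once at startup rather than on the first read.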
Comment thread README.md
Comment on lines +38 to +48
For SLURM with Conda:

```bash
nextflow run main.nf \
--input_dir /path/to/bam_files \
--barcode_list ./tmp/cell_barcode.tsv \
--ref_fasta ../../data/ref_genome/hg38_gencode.fa \
--k_genotypes 4 \
--out_dir souporcell_work \
-profile conda,slurm
```
Comment thread README.md
Comment on lines +193 to +199
Outputs are published under `--out_dir`:

- `bin/add_cb_rg_tags`: compiled C++ tagger.
- `ref/`: FASTA index generated by `samtools faidx`.
- `tagged_bams/*.rg.bam`: BAMs with `CB` and `CR` tags and without `RG` tags.
- `dedup_bams/*.dedup.bam`: barcode-aware duplicate-removed BAMs.
- `dedup_bams/*.dup.out`: `samtools markdup` duplicate metrics.