Skip to content

Nextflow and Dockerfile optimisations#651

Open
MattWellie wants to merge 6 commits intomainfrom
dockerfile_optimisations
Open

Nextflow and Dockerfile optimisations#651
MattWellie wants to merge 6 commits intomainfrom
dockerfile_optimisations

Conversation

@MattWellie
Copy link
Copy Markdown
Collaborator

Claude initiated

Fixes

  • The dockerfile contains some redundant build layers, and some unnecessary tools

Proposed Changes

  • All the same changes as Cc nf corrections #650, I need to integrate the two:
  • Various issues identified through analysis with claude.
  • Some are soft inconsistencies (mixed use of case, missing shebangs, missing nextflow.enable.dsl=2 etc.)
  • Some are hard bugs (bcftools output written using wrong flag)
  • Some are incomplete changes (mix of workflow publishing and output directives, addressed by unmerged Workflow outputs only #645)
  • Some are functional but incorrect (Memory setting of "1 GB" instead of 1.GB - same result, but though string parsing instead of plain Groovy
  • Some are ideal but not essential (set -euo pipefail added to all script blocks), rm -f to prevent failure if files (which are definitely written) stop being written
  • Some corrections (indexing files during correct stage and passing VCF with index, instead of indexing an input, so the index would be lost in the working directory)
  • Adds missing script: directives to some files

Remains a failure around workflow outputs..

Checklist

  • ChangeLog Updated
  • Version Bumped
  • Related Issue created
  • Tests covering new change
  • Linting checks pass

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines the Talos Nextflow pipelines and container build setup by standardising DSL2 usage, aligning output handling with Nextflow “workflow outputs”, and tightening script execution/IO conventions.

Changes:

  • Add/standardise nextflow.enable.dsl=2, remove most publishDir usage, and expand workflow outputs to include a PanelApp artefact.
  • Improve robustness/consistency across modules (e.g., set -euo pipefail, safer cleanup, corrected channel usage/capitalisation, VCF/index tuple wiring).
  • Optimise Docker build and CI version extraction (switch VERSION detection to ARG VERSION=..., use uv sync in the main Dockerfile).

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
talos_only.nf Enables DSL2 and updates how prior annotation outputs are discovered; adds PanelApp to outputs.
src/talos/cpg_internal_scripts/cpgflow_jobs/make_config.py Minor comment wording fix (“fault-tolerant”).
src/talos/cpg_internal_scripts/CPG_Dockerfile Docker build layer/flow adjustments; consolidates stages and aligns version handling.
nextflow_schema.json Adds a Nextflow parameter schema for config validation/documentation.
nextflow/talos.nf Renames PanelApp path variable and emits PanelApp output channel.
nextflow/modules/talos/ValidateMOI/main.nf Removes publishDir; adds stricter shell flags.
nextflow/modules/talos/UnifiedPanelAppParser/main.nf Removes publishDir; adds explicit script: and stricter shell flags.
nextflow/modules/talos/StartupChecks/main.nf Removes commented publishDir; uses set -euo pipefail.
nextflow/modules/talos/RunHailFiltering/main.nf Removes publishDir; uses set -euo pipefail.
nextflow/modules/talos/HPOFlagging/main.nf Removes publishDir; adds explicit script: and stricter shell flags.
nextflow/modules/talos/CreateTalosHTML/main.nf Removes publishDir; uses set -euo pipefail.
nextflow/modules/prep/ResummariseRawSubmissions/main.nf Emits VCF+index as a tuple; adds stricter shell flags; safer cleanup.
nextflow/modules/prep/ParseManeIntoJson/main.nf Adds set -euo pipefail.
nextflow/modules/prep/ParseAlphaMissense/main.nf Adds set -euo pipefail.
nextflow/modules/prep/MakeClinvarbitrationPm5/main.nf Adds set -euo pipefail; safer cleanup.
nextflow/modules/prep/EncodeAlphaMissense/main.nf Adds set -euo pipefail.
nextflow/modules/prep/DownloadPanelApp/main.nf Adds set -euo pipefail.
nextflow/modules/prep/DownloadClinVarFiles/main.nf Adds set -euo pipefail.
nextflow/modules/prep/CreateRoiFromGff3/main.nf Minor comment tweak; adds set -euo pipefail.
nextflow/modules/prep/ConvertSpliceVarDb/main.nf Adds set -euo pipefail.
nextflow/modules/prep/AnnotateClinvarWithBcftools/main.nf Accepts (vcf, vcf_idx) tuple input; adds set -euo pipefail.
nextflow/modules/annotation/SplitVcf/main.nf Removes commented publishDir; adds set -euo pipefail.
nextflow/modules/annotation/NormaliseAndRegionFilterVcf/main.nf Removes commented publishDir; adds set -euo pipefail.
nextflow/modules/annotation/MergeVcfsWithBcftools/main.nf Removes commented publishDir; switches merge output to -Oz; adds set -euo pipefail.
nextflow/modules/annotation/AnnotatedVcfIntoMatrixTable/main.nf Removes publishDir; uses set -euo pipefail; safer checkpoint cleanup.
nextflow/modules/annotation/AnnotateWithEchtvar/main.nf Removes commented publishDir; uses set -euo pipefail.
nextflow/modules/annotation/AnnotateCsqWithBcftools/main.nf Removes commented publishDir; adds set -euo pipefail.
nextflow/annotation.nf Fixes Channel.fromPath capitalisation consistency.
nextflow.config Removes redundant params, uses 1.GB-style memory values, and sets per-process resources.
main.nf Enables DSL2 and emits PanelApp as a workflow output.
docs/NextflowConfiguration.md Updates docs to remove outdir and rely on outputDir.
README.md Updates output path documentation to reference outputDir.
Dockerfile Simplifies stages and installs Python deps via uv sync; switches VERSION to ARG.
.github/workflows/get_version.py Updates version extraction to read ARG VERSION=....

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 20 to 24
.map { row -> tuple(
row.cohort,
files("${params.outdir}/${row.cohort}_outputs/*.mt", type: 'dir'),
Channel.fromPath("${outputDir}/${row.cohort}_outputs/*.mt", checkIfExists: true),
file(row.pedigree, checkIfExists: true),
file(row.config, checkIfExists: true),
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Channel.fromPath(...) returns a channel; embedding it inside the tuple emitted by ch_inputs.map means ch_mts becomes a nested channel rather than a list/path value. Downstream (TALOS expects ch_mts items shaped like tuple(cohort, mts, ...), where mts is a path or collection of paths, not a Channel), so this will break joins and process inputs. Use files("${workflow.outputDir}/${row.cohort}_outputs/*.mt", type: 'dir') (or build a channel of .mt paths and groupTuple(by:0) like ANNOTATION does) instead of Channel.fromPath inside the tuple.

Copilot uses AI. Check for mistakes.
Comment on lines 9 to 17
def extract_version_from_file(file_path: str) -> str | None:
"""
Extract the version from a Dockerfile by searching for a line like:
ENV VERSION=1.0.0
"""
with open(file_path) as f:
content = f.read()
pattern = re.compile(r'^\s*ENV\s+VERSION\s*=\s*([^\s]+)', re.MULTILINE)
pattern = re.compile(r'^\s*ARG\s+VERSION\s*=\s*([^\s]+)', re.MULTILINE)
match = pattern.search(content)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version-extraction regex now looks for ARG VERSION=..., but the function docstring still says it searches for ENV VERSION=... (and the later error message still references ENV). Update the documentation/error text to match the new ARG VERSION convention to avoid confusing future maintainers.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants