~3x run time speedup using parallel processing by ewels · Pull Request #197 · s-andrews/FastQC

ewels · 2026-05-21T09:26:21Z

I spent a while trying to get the performance improvements from my Rust rewrite back upstream into the Java distribution. After a lot of different attempts, none of which really made any difference, I gave up. Thankfully I posted about this on our Seqera Slack and @pditommaso, defender of the Java faith, stood up for Java and said that anything Rust could do, Java could do better (or as fast, anyway).

@pditommaso proceeded to try a lot of things and sure enough, pulled a 3x speed improvement out of the bag.

I've now gone through all of his changes with a fine-tooth comb, separating out the different aspects and benchmarking which things had what effect. The result was pinpointing the smallest change that made the biggest difference: parallelisation of the stream reader. I pulled that code out into a new clean branch and hammered on it to cut it down as much as possible, as well as polishing and benchmarking it. This PR is the result of that.

What it does

As of this PR, each file now runs through a small in-process pipeline instead of a single loop: one reader thread (gzip + FASTQ parse) feeds up to three processor threads that each own a disjoint slice of the QC modules. The modules themselves are unchanged and no module needs to become thread-safe. Despite us thinking that gzip was the bottleneck and that was that, this does indeed give a major performance boost.

Behaviour change

The core code change is in the first commit and is really rather small. The second commit handles the pre-existing -t flag, which sets the number of threads. Previously, threads = number of files processed in parallel, as FastQC was single-threaded per file. After the above change, each extra file allowed by -t actually meant up to 4 more CPUs in use, which is not what the end user would expect.

To handle this, I tweaked how it worked and made it truly reflect the number of CPUs to use. -t / -Dfastqc.threads is now a total CPU budget, split between files in parallel (outer concurrency) and the per-file pipeline (inner concurrency). It defaults to min(4 × num_files, available_cpus).

| Invocation | Before | After |
| --- | --- | --- |
| `-t 1` | 1 core / file | 1 core / file (sequential path, unchanged) |
| `-t N` (N > 1) | up to N files in parallel, 1 core each | total budget of N cores across files × pipeline |
| no `-t` | 1 file at a time, 1 core | up to 4 cores / file |

Benchmark

To validate the results I ran a set of benchmarks. Full report with results, and also discussing the changes in the PR and why they work, is here: report.html

Note that all outputs are byte-identical to master across every run.

To avoid downloading + opening the HTML to view, here's a full-page screenshot which you can squint at in the GitHub UI, if you prefer:

Report (screenshot)


AnalysisRunner now runs as one reader thread plus three processor threads.

The reader batches Sequences (1024 per batch) and pushes each batch reference onto N ArrayBlockingQueues. Each processor drains its own queue and runs an evenly split subset of the QCModule array, so modules stay single-threaded per processor and no in-module locking is needed. Progress callbacks (analysisUpdated) are fired from the reader thread at the same cadence as the previous single-threaded version (every batch boundary, gated on a 5% file-position advance).

Co-Authored-By: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
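The pipeline described in this commit message can be sketched as follows. This is a minimal illustration of the technique, not the code from the PR: `PipelineSketch`, `run`, the `END` sentinel, the queue capacity of 8, and the use of `Consumer<String>` in place of FastQC's `QCModule` and `Sequence` types are all assumptions made for the example. The calling thread plays the reader, and the round-robin split is just one way to divide the modules evenly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: one reader fans batches out to N processors,
// each of which owns a disjoint subset of the modules.
final class PipelineSketch {
    // Sentinel batch (compared by identity) that tells a processor the input has ended.
    private static final List<String> END = new ArrayList<>();

    static void run(Iterable<String> sequences, List<Consumer<String>> modules,
                    int processors, int batchSize) throws InterruptedException {
        List<BlockingQueue<List<String>>> queues = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int p = 0; p < processors; p++) {
            // Assumed capacity; a bounded queue stops the reader running far ahead.
            BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(8);
            queues.add(queue);
            // Processor p owns modules p, p + processors, p + 2 * processors, ...
            List<Consumer<String>> owned = new ArrayList<>();
            for (int m = p; m < modules.size(); m += processors) {
                owned.add(modules.get(m));
            }
            Thread worker = new Thread(() -> {
                try {
                    for (List<String> chunk = queue.take(); chunk != END; chunk = queue.take()) {
                        for (String seq : chunk) {
                            // Each module is only ever touched by this one thread.
                            for (Consumer<String> module : owned) {
                                module.accept(seq);
                            }
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        // Reader: fill a batch, then hand the same reference to every queue.
        // The batch is never modified after it has been published.
        List<String> batch = new ArrayList<>(batchSize);
        for (String seq : sequences) {
            batch.add(seq);
            if (batch.size() == batchSize) {
                for (BlockingQueue<List<String>> q : queues) {
                    q.put(batch);
                }
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            for (BlockingQueue<List<String>> q : queues) {
                q.put(batch);
            }
        }
        for (BlockingQueue<List<String>> q : queues) {
            q.put(END);
        }
        for (Thread w : workers) {
            w.join(); // join() makes each module's final state visible to the caller
        }
    }
}
```

Because every processor sees every batch but runs only its own modules, each module still observes the full input in order, which is consistent with the modules needing no locking.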

AnalysisQueue treats -t as a total-thread budget and splits it between outer concurrency (files in parallel) and inner concurrency (per-file reader + processor pipeline):

    processorsPerFile = min(MAX_PROCESSORS_PER_FILE, totalThreads - 1)
    outerSlots = max(1, totalThreads / (1 + processorsPerFile))

When -t is unset, OfflineRunner now tells AnalysisQueue how many files the run has via configure(); the default becomes min(THREADS_PER_FILE * max(1, expectedFiles), availableProcessors), so a single file gets the full per-file pipeline and many files scale up to the host's CPU count without the user needing to set -t. A budget of one CPU makes AnalysisRunner take its single-threaded path so -t 1 produces byte-identical behaviour to the unbatched runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
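To make the budget arithmetic concrete, here is a small sketch of the formulas from this commit message. The class and method names are hypothetical. The constant values (3 processors per file, 4 threads per file) are assumptions inferred from "up to three processor threads" and "min(4 × num_files, available_cpus)" in the PR description, not copied from the source, and the division is assumed to be integer division.

```java
// Hypothetical sketch of the -t budget split; not the AnalysisQueue code itself.
final class ThreadBudgetSketch {
    // Assumed values, inferred from the PR description.
    static final int MAX_PROCESSORS_PER_FILE = 3;
    static final int THREADS_PER_FILE = 4; // 1 reader + 3 processors

    /** Default budget when -t is not given. */
    static int defaultBudget(int expectedFiles, int availableProcessors) {
        return Math.min(THREADS_PER_FILE * Math.max(1, expectedFiles), availableProcessors);
    }

    /** Inner concurrency: processor threads per file (0 means the sequential path). */
    static int processorsPerFile(int totalThreads) {
        return Math.min(MAX_PROCESSORS_PER_FILE, totalThreads - 1);
    }

    /** Outer concurrency: how many files are processed at the same time. */
    static int outerSlots(int totalThreads) {
        return Math.max(1, totalThreads / (1 + processorsPerFile(totalThreads)));
    }
}
```

Under these assumptions, a budget of 1 gives 0 processors and 1 slot (the single-threaded path), a budget of 4 gives 3 processors and 1 slot, and a budget of 8 gives 3 processors and 2 slots, so two files run side by side with a full pipeline each.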

s-andrews · 2026-05-21T14:53:39Z

Have pulled this to a local branch and tested. Can confirm that it does seem to offer a meaningful increase in speed.

Timings on a test file which should have been in the in-memory file cache:

old fastqc (-t 1 effectively) took 8m6.225s
new fastqc -t 4 took 2m15.368s
new fastqc -t 1 took 8m10.267s

Will go ahead and pull into master.

ewels · 2026-05-21T15:02:09Z

Awesome, happy to hear it! Thanks for reviewing 🙏🏻

ewels · 2026-05-21T15:39:56Z

Minor side note: you probably don't need to change the target branch and do a second PR. I should have the "Allow edits and access to secrets by maintainers" box checked on all my PRs, meaning that you can push commits to my fork.

With the GitHub CLI, the workflow is then as follows, and works basically as if it were one of your own branches:

gh pr checkout 197
# test, make edits if needed
git commit -am "My review changes"
git push
gh pr merge  # or on the web interface

pditommaso · 2026-05-21T19:27:03Z

Happy to see this merged! 🎉

ewels and others added 3 commits May 21, 2026 07:53

Update CLI help text

ebe8fc6

s-andrews changed the base branch from master to performance May 21, 2026 12:50

s-andrews merged commit 31c54fd into s-andrews:performance May 21, 2026
1 check failed

ewels deleted the perf/parallel-pipeline branch May 21, 2026 14:30

s-andrews merged 3 commits into s-andrews:performance from ewels:perf/parallel-pipeline
