RFhap

A Nextflow pipeline for phasing long reads in trio datasets, using k-mers of multiple sizes and a random forest classifier.

This pipeline is containerized and integrates the following third-party software components:

KMC: An ultra-fast k-mer counter
FastKM: An efficient k-mer matcher
Seqtk: Tools for FASTQ/FASTA manipulation
R-base: Implements the random forest classifier for phasing

Running RFhap

To run RFhap, ensure that Nextflow and Singularity are properly installed. Use the following command:

nextflow run rfhap/rfhap.nf \
           --paternal_reads data/reads/hg002/father.txt \
           --maternal_reads data/reads/hg002/maternal.txt \
           --child_reads data/reads/hg002/child.txt \
           --num_cases 400000 \
           --outdir hg002hap -profile leftraru
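
Before launching, it can help to confirm that both prerequisites are on the PATH. The following is a minimal sketch; `require_tools` is a hypothetical helper name and is not part of RFhap.

```shell
# Hypothetical helper (not part of RFhap): report every command that is
# missing from PATH and return non-zero if at least one is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the two prerequisites named above before calling `nextflow run`.
require_tools nextflow singularity \
    || echo "Install the missing tools before running RFhap." >&2
```

The check only verifies that the commands can be found; it does not validate versions or container configuration.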

Input File Format

Each input file should list the paths to sequencing reads in the following format:

paternal.txt (Short reads from the father)

path
/<path>/trio_data/short-reads/paternal.fwd.fasta.gz
/<path>/trio_data/short-reads/paternal.rev.fasta.gz

maternal.txt (Short reads from the mother)

path
/<path>/trio_data/short-reads/maternal.fwd.fasta.gz
/<path>/trio_data/short-reads/maternal.rev.fasta.gz

child.txt (Long reads from the child, any sequencing technology)

path
/<path>/trio_data/long-reads/kidONT.R94.p1p2.fasta.gz
/<path>/trio_data/long-reads/kidONT.R103.p1p2.fasta.gz
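
The list files can be generated rather than typed by hand. A minimal sketch, assuming the first line is the literal header `path` as in the examples above; `make_read_list` is a hypothetical helper and is not part of RFhap.

```shell
# Hypothetical helper (not part of RFhap): write a read-list file made of
# the header line "path" followed by one read-file path per line.
# Usage: make_read_list OUTPUT.txt READS_1 [READS_2 ...]
make_read_list() {
    out=$1
    shift
    [ "$#" -ge 1 ] || { echo "no read files given" >&2; return 1; }
    # Refuse to list files that do not exist, so typos surface early.
    for reads in "$@"; do
        [ -f "$reads" ] || { echo "not found: $reads" >&2; return 1; }
    done
    {
        echo "path"
        printf '%s\n' "$@"
    } > "$out"
}

# Example call (paths are placeholders, as in the listings above):
# make_read_list child.txt /<path>/trio_data/long-reads/kidONT.R94.p1p2.fasta.gz
```

Paths are written exactly as given, so pass absolute paths if the list should match the examples above.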

Additional Options

--kmers specifies the k-mer sizes used for counting/matching.
--num_cases specifies the size of the training dataset (default: 150000, maximum: 400000).
--outdir specifies the output directory.

Output Directory

RFhap produces the following outputs under the --outdir directory:

ukmers_per_haplotype: k-mers unique to each parent, stored for each k-mer size (uk24, uk27, uk30, uk33) in hapA_only_kmers.txt and hapB_only_kmers.txt.
fastkm: directory containing the matrices produced by matching paternal and maternal k-mers against the long reads ("*_fastkm_kbz.txt"), one for each long-read file provided for the child.
rf: random forest models and the classification of reads into haplotypes. Three subdirectories are created: training (train), prediction (prediction), and haplotype read classification (haplotypes).
hap_fastq: FASTQ files for each long-read set of the child, split into *.hapA.fq.gz (haplotype A), *.hapB.fq.gz (haplotype B), and *.hapU.fq.gz (unknown or shared haplotype).
pipeline_info: reports of the execution time and memory consumption of each step.
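
A quick sanity check after a run is to count how many reads landed in each partition. A minimal sketch, assuming the gzip-compressed files in hap_fastq hold standard four-line FASTQ records; `count_fastq_reads` is a hypothetical helper and is not part of RFhap.

```shell
# Hypothetical helper (not part of RFhap): count the records in a gzipped
# FASTQ file, assuming each record spans exactly four lines.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Report read counts for every partition under an output directory.
# "hg002hap" is the --outdir value used in the example command.
for fq in hg002hap/hap_fastq/*.fq.gz; do
    [ -e "$fq" ] || continue   # skip when the glob matched nothing
    printf '%s\t%s\n' "$fq" "$(count_fastq_reads "$fq")"
done
```

A very large hapU share relative to hapA and hapB may be worth investigating, although what counts as too large depends on the dataset.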

Haplotype assembly

After the long reads have been partitioned by haplotype, each partition can be assembled in haploid mode with hifiasm. The recommended options for hifiasm (version 0.24.0-r702) are the following:

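ULA="/<path>/hap_fastq/<name>.hapA.fq.gz"  # reads assigned to haplotype A
ULB="/<path>/hap_fastq/<name>.hapB.fq.gz"  # reads assigned to haplotype B
ULU="/<path>/hap_fastq/<name>.hapU.fq.gz"  # reads of unknown haplotype
CPU=44  # number of CPUs to use
hifiasm -o asm.hifiasm.hapA.ont -l 3 --ont "${ULA}" "${ULU}" -t "${CPU}"
hifiasm -o asm.hifiasm.hapB.ont -l 3 --ont "${ULB}" "${ULU}" -t "${CPU}"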
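
hifiasm writes its assemblies as GFA graphs, so a conversion step is usually needed to obtain FASTA contigs. A minimal sketch using the awk one-liner commonly used for this purpose; `gfa_to_fasta` is a hypothetical helper, and the file names in the example are assumptions that should be checked against what hifiasm actually wrote.

```shell
# Hypothetical helper (not part of RFhap or hifiasm): print the sequences
# stored on the segment (S) lines of a GFA file as FASTA records.
gfa_to_fasta() {
    awk '/^S/ { print ">" $2; print $3 }' "$1"
}

# Example (file names are assumed from the -o prefix used above):
# gfa_to_fasta asm.hifiasm.hapA.ont.bp.p_ctg.gfa > asm.hifiasm.hapA.ont.p_ctg.fa
```

Each S line contributes one record, with the segment name as the FASTA header and the third column as the sequence.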

Developers

Damariz Gonzalez
Alex Di Genova
