SushiLab/mTRAc

mTRAc

metatranscriptomic growth classifier

Installation:

# Download tool
git clone git@github.com:SushiLab/mTRAc.git
# create environment
cd mTRAc
conda env create --name mTRAc --file=conda.yaml
conda activate mTRAc
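After activating the environment, a quick sanity check can confirm that the external programs mTRAc calls in its log output (bwa and featureCounts) are on the PATH. This is a minimal Python sketch, assuming the conda environment provides both executables. The helper name `missing_tools` is hypothetical, and the gene caller used by extract is not checked because this README does not name it.

```python
import shutil

# External programs invoked by mTRAc according to its log output.
# Assumption: the mTRAc conda environment places them on PATH.
REQUIRED_TOOLS = ("bwa", "featureCounts")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```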

Tool Interface


$ Program: mTRAc - metatranscriptomic growth classifier
    Version: 0.0.5


mTRAc <command> [options]

    -- Database

          extract   Extract marker genes from a genome
                    and create index

          combine   Merge individual databases into one
                    comprehensive database and create
                    index

          index     Index a database

    -- Quantification

          align     Align reads against a marker gene
                    database and quantify gene abundances

          merge     Merge multiple marker gene abundance files
                    into a single file

    -- Prediction

          predict   Predict the growth state (Growth/No growth)
                    for each genome within each sample

Extract

Extract the 129 marker genes (MGs) from a genome and create an index for the new database.


$ python mtrac.py extract


Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5


        mTRAc extract [options]

        Input options:
           -f   STR          Input genome file. Can be gzipped

           -db  STR          Name of Database - new database will
                             be stored in default database folder

Extract Example:

Extracting the 129 MGs from the Arectalis genome (in this run, 128 of the 129 MGs are found):


$ python mtrac.py extract -f databases/Arectalis/Arectalis.fasta.gz -db Arectalis_extraction_test
2025-11-04,08:18:57 INFO: mTRAc tool starting
2025-11-04,08:18:57 INFO: Calling genes
2025-11-04,08:18:59 INFO: Extracting markergenes
2025-11-04,08:19:03 INFO: Starting marker gene extraction from 1 protein files.
2025-11-04,08:19:04 INFO: Finished marker gene extraction.
2025-11-04,08:19:04 INFO: Found 128 / 129 markergenes
2025-11-04,08:19:04 INFO: Writing new database to mTRAc/databases

Combine

Combine existing databases into one database (e.g. Arectalis, Btheta and Ecoli into EAM).

Note: the Arectalis_extraction_test database generated in the extract step now appears among the choices and can be used for downstream analysis:

$ python mtrac.py combine
2025-11-04,08:21:34 INFO: mTRAc tool starting
Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5

        mTRAc combine [options]

        Input options:
           -i   STR [STR ...]   Names of databases that should be
                                combined. Choices:
                                 - EAM
                                 - Arectalis_extraction_test
                                 - Ecoli
                                 - Arectalis
                                 - Btheta

           -db  STR             Name of Database - new database will
                                be stored in default database folder.

Combine Example:

Combine the Ecoli and Arectalis databases which creates the Ecoli_Arectalis database:

$ python mtrac.py combine -i Ecoli Arectalis -db Ecoli_Arectalis
2025-11-04,08:24:35 INFO: mTRAc tool starting
2025-11-04,08:24:35 INFO: Writing mapping file: mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.map.gz
2025-11-04,08:24:35 INFO: Writing gff3 file: mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.gff3.gz
2025-11-04,08:24:35 INFO: Writing fasta file: /mTRAc/databases/Ecoli_Arectalis/Ecoli_Arectalis.fasta.gz
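To confirm that combine wrote a complete database, one can check for the three files named in the log above. A minimal sketch, assuming the `<databases>/<name>/<name>.<suffix>` layout shown in these log lines; `missing_db_files` is a hypothetical helper, not part of mTRAc.

```python
from pathlib import Path

# Files written by `combine`, as listed in its log messages.
DB_SUFFIXES = (".map.gz", ".gff3.gz", ".fasta.gz")


def missing_db_files(db_root, name):
    """Return the expected database files missing from <db_root>/<name>/."""
    db_dir = Path(db_root) / name
    expected = [db_dir / f"{name}{suffix}" for suffix in DB_SUFFIXES]
    return [str(path) for path in expected if not path.is_file()]
```

For example, `missing_db_files("mTRAc/databases", "Ecoli_Arectalis")` returns an empty list when all three files are present.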

Index

Index an existing database. This is only needed for downloaded databases, since extract and combine create the index themselves.

$ python mtrac.py index

Program: mTRAc - metatranscriptomic growth classifier
    Version: 0.0.5

    mTRAc index [options]

    Input options:
        -db  STR          genome database to index. Choices:
                 - EAM
                 - Ecoli
                 - Arectalis
                 - Btheta


Index Example:

$ python mtrac.py index -db Arectalis
2025-11-22,17:41:32 INFO: mTRAc tool starting
2025-11-22,17:41:32 INFO: Database Arectalis exists but is not built yet. Start building:
[bwa_index] Pack FASTA... 0.01 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.40 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.12 sec
[main] Version: 0.7.18-r1243-dirty
[main] CMD: bwa index mTRAc/databases/Arectalis/Arectalis.fasta
[main] Real time: 0.556 sec; CPU: 0.562 sec
2025-11-22,17:41:33 INFO: mTRAc tool shutting down with exitcode 0
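The `bwa index` call in this log writes its index files next to the database FASTA. The sketch below checks for the five files that bwa 0.7.x creates (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). It is an independent manual check with a hypothetical helper name; mTRAc's own test for whether a database "exists but is not built yet" may use different criteria.

```python
from pathlib import Path

# Index files written by `bwa index <fasta>` (bwa 0.7.x).
BWA_INDEX_EXTS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def is_bwa_indexed(fasta_path):
    """True if every bwa index file exists alongside `fasta_path`."""
    return all(Path(f"{fasta_path}{ext}").is_file() for ext in BWA_INDEX_EXTS)
```

For the database above this would be called as `is_bwa_indexed("mTRAc/databases/Arectalis/Arectalis.fasta")`.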

Align

Align short-read sequencing data against a marker gene database with bwa mem and quantify the marker gene abundances with featureCounts.

Note: the two databases created in the extract and combine sections (Arectalis_extraction_test and Ecoli_Arectalis) now appear among the choices:

$ python mtrac.py align

Program: mTRAc - metatranscriptomic growth classifier
    Version: 0.0.5


    mTRAc align [options]

    Input options:
       -f   FILE[ FILE]  input file(s) for reads in forward orientation, fastq(.gz)-formatted

       -r   FILE[ FILE]  input file(s) for reads in reverse orientation, fastq(.gz)-formatted

       -db  STR          genome database to use. Choices:
                         - EAM
                         - Arectalis_extraction_test
                         - Ecoli_Arectalis
                         - Ecoli
                         - Arectalis
                         - Btheta

    Output options:
       -o   FILE         output file prefix. Will create 3 files:
                            - prefix.bam
                            - prefix.fcnt
                            - prefix.fcnt.mgs

    Algorithm options:
       -t   INT          number of threads [1]

Example: quantifying the 129 MGs of Arectalis using the dataset STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG. Because the Arectalis database has not been indexed yet, align builds the bwa index first:

$ python mTRAc/mtrac.py align -f STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.1.fq.gz -r STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.2.fq.gz -db Arectalis -o STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac -t 4
2025-11-04,08:41:25 INFO: mTRAc tool starting
2025-11-04,08:41:25 INFO: Database Arectalis exists but is not built yet. Start building:
[bwa_index] Pack FASTA... 0.03 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.39 seconds elapse.
[bwa_index] Update BWT... 0.01 sec
[bwa_index] Pack forward-only FASTA... 0.01 sec
[bwa_index] Construct SA from BWT and Occ... 0.12 sec
[main] Version: 0.7.18-r1243-dirty
[main] CMD: bwa index mTRAc/databases/Arectalis/Arectalis.fasta
[main] Real time: 0.559 sec; CPU: 0.569 sec
2025-11-04,08:41:26 INFO: Start align command
2025-11-04,08:41:26 INFO: 	Start alignment
2025-11-04,08:41:26 INFO: 		Executing: bwa mem -a -t 4 mTRAc/databases/Arectalis/Arectalis.fasta STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.1.fq.gz STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.2.fq.gz
2025-11-04,08:42:27 INFO: 	Finished alignment. Start sorting.
2025-11-04,08:42:52 INFO: 	Finished sorting. Start featureCounts.
2025-11-04,08:42:52 INFO: Executing: featureCounts -O -M --fraction -t gene -a mTRAc/databases/Arectalis/Arectalis.gff3 -o STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.fcnt -F GTF -g locus_tag -p -B --verbose STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.bam --countReadPairs -T 4

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
	  v2.1.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
||                                                                            ||
||             Output file : STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
||                 Summary : STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample ... ||
||              Paired-end : yes                                              ||
||        Count read pairs : yes                                              ||
||              Annotation : Arectalis.gff3 (GTF)                             ||
||      Dir for temp files : ./                                               ||
||                                                                            ||
||                 Threads : 4                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted (fractional)                             ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Arectalis.gff3 ...                                    ||
||    Features : 3283                                                         ||
||    Meta-features : 3283                                                    ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.ts ... ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 4619528                                              ||
||    Successfully assigned alignments : 4428247 (95.9%)                      ||
||    Running time : 0.05 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "STAU23-2_Ere_Glu_5_9_37  ||
|| _Exp_3_ISOG_subsample.mtrac.fcnt.summary"                              ||
||                                                                            ||
\\============================================================================//

2025-11-04,08:42:55 INFO: 	Finished featureCounts. Start marker gene extraction.
2025-11-04,08:42:55 INFO: 	Finished marker gene extraction.
2025-11-04,08:42:55 INFO: Output file: STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.mgs
2025-11-04,08:42:55 INFO: Finished align command
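The featureCounts call in the log above uses `-M --fraction` together with `-O`, which is consistent with the non-integer COUNT values in the .mgs output. According to the featureCounts documentation, each reported alignment of a multi-mapping read carries 1/x of a count, where x is the number of alignments reported for that read; with `-O`, an alignment overlapping y features gives each of them 1/y; with both options the share is 1/(x*y). The toy model below applies that rule to hand-made input. It does not read BAM files and ignores featureCounts details such as how x is determined (NH tag) and read-pair handling; `fractional_counts` is a hypothetical name.

```python
from collections import defaultdict


def fractional_counts(reads):
    """Toy model of featureCounts counting with `-M -O --fraction`.

    `reads` maps a read (or read-pair) ID to its reported alignments; each
    alignment is the list of genes it overlaps (possibly empty). An alignment
    of a read with x alignments that overlaps y genes adds 1/(x*y) to each gene.
    """
    counts = defaultdict(float)
    for alignments in reads.values():
        x = len(alignments)
        for genes in alignments:
            if not genes:
                continue  # alignment outside any annotated gene: counts toward x only
            share = 1.0 / (x * len(genes))
            for gene in genes:
                counts[gene] += share
    return dict(counts)
```

For instance, a read with two alignments, one on gene A and one on gene B, adds 0.5 to each gene.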

The align command produces three files: the BAM alignment, the featureCounts table and the marker gene table (.mgs), which is the primary output:

$ head -n 10 STAU23-2_Ere_Glu_5_9_37_Exp_3_ISOG_subsample.mtrac.mgs
#GENOME	ANNOTATION	GENE	CHROMOSOME	LENGTH	COUNT
Arectalis	TIGR00362	MIO91_00005	CP092643.1	1362	2111
Arectalis	TIGR00663	MIO91_00010	CP092643.1	1113	1732.50
Arectalis	TIGR01059	MIO91_00025	CP092643.1	1938	3078.83
Arectalis	TIGR01063	MIO91_00030	CP092643.1	2508	4110.33
Arectalis	TIGR01146	MIO91_00570	CP092643.1	861	1622.50
Arectalis	TIGR01039	MIO91_00575	CP092643.1	1392	2406.83
Arectalis	TIGR00755	MIO91_01105	CP092643.1	867	1555.67
Arectalis	TIGR00420	MIO91_01410	CP092643.1	1089	1769.50
Arectalis	TIGR02397	MIO91_01495	CP092643.1	1572	2724.00
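For downstream analysis in Python, the .mgs table can be loaded with the standard library. A minimal sketch, assuming the layout shown above (tab-separated, header line prefixed with `#`, columns GENOME to COUNT); stray spaces around values are stripped. `parse_mgs`, `summarise_mgs` and `read_mgs` are hypothetical helpers.

```python
import csv
from collections import defaultdict


def parse_mgs(lines):
    """Parse .mgs lines (header starting with '#') into a list of dicts."""
    lines = [line for line in lines if line.strip()]
    if not lines:
        return []
    lines[0] = lines[0].lstrip("#")  # "#GENOME\t..." -> "GENOME\t..."
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        row = {key: value.strip() for key, value in row.items()}
        row["LENGTH"] = int(row["LENGTH"])
        row["COUNT"] = float(row["COUNT"])  # counts may be fractional
        rows.append(row)
    return rows


def summarise_mgs(rows):
    """Per genome: number of marker genes, how many have COUNT > 0, total COUNT."""
    summary = defaultdict(lambda: {"genes": 0, "detected": 0, "total": 0.0})
    for row in rows:
        entry = summary[row["GENOME"]]
        entry["genes"] += 1
        if row["COUNT"] > 0:
            entry["detected"] += 1
        entry["total"] += row["COUNT"]
    return dict(summary)


def read_mgs(path):
    """Convenience wrapper that parses an .mgs file from disk."""
    with open(path, newline="") as handle:
        return parse_mgs(handle)
```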

Merge

The merge command is used to combine multiple marker gene abundance files (.mgs), generated by the align command from different samples, into a single, comprehensive file. This merged file is the required input for the predict command.


$ python mtrac.py merge

Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5


        mTRAc merge [options]

        Input options:
           -f   FILE[ FILE]  input files for merging. Have to be at least 2 output files
                             ([prefix].mgs) of predict step from two different runs against
                             the same database

        Output options:
           -o   FILE         output file

Example

$ python mtrac.py merge -f sample1.mgs sample2.mgs sample3.mgs -o merged_counts.tsv

Output Format Example:

#GENOME	ANNOTATION	GENE	CHROMOSOME	LENGTH	sample1	sample2	sample3
Arectalis	COG0012	MJ392_00005	CP092639.1	1404	1028	1502	 987
Arectalis	COG0016	MJ392_00006	CP092639.1	1000	 500	 800	1200
...
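Conceptually, merging is a join of the per-sample tables on their gene-describing columns, with one count column per sample. The sketch below illustrates that idea on in-memory data; it is not mTRAc's implementation, the helper names are hypothetical, and filling genes absent from a sample with 0 is an assumption of the sketch.

```python
KEY_COLUMNS = ["GENOME", "ANNOTATION", "GENE", "CHROMOSOME", "LENGTH"]


def merge_counts(samples):
    """Join per-sample marker gene counts into one wide table.

    `samples` maps a sample name to {key: count}, where key is the tuple
    (GENOME, ANNOTATION, GENE, CHROMOSOME, LENGTH). Returns (header, rows).
    """
    names = list(samples)
    keys = sorted({key for counts in samples.values() for key in counts})
    header = ["#" + KEY_COLUMNS[0], *KEY_COLUMNS[1:], *names]
    rows = [
        [*key, *(samples[name].get(key, 0) for name in names)]  # 0 if absent
        for key in keys
    ]
    return header, rows


def to_tsv(header, rows):
    """Render the merged table as tab-separated text."""
    return "".join("\t".join(map(str, line)) + "\n" for line in [header, *rows])
```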

Predict

The predict command uses the merged marker gene count data to predict the growth state (Growth/No growth) for each genome within each sample, based on a pre-trained machine learning model.

Usage

$ python mtrac.py predict

Program: mTRAc - metatranscriptomic growth classifier
        Version: 0.0.5


        mTRAc predict [options]

        Input options:
           -i   FILE        Input CSV file with count(s) produced by extract/merge functions.

        Output options:
           -o  STR          Output prefix

Example

Use the merged file from the previous example (merged_counts.tsv) and the default TPM normalisation method.

$ python mtrac.py predict -i merged_counts.tsv -o growth_predictions

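For reference, TPM (transcripts per million) divides each gene's count by its length and rescales the length-normalised values so they sum to one million: TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. The sketch below implements this standard definition for one sample. Which genes mTRAc includes in the sum (all marker genes in a sample, or only those of one genome) is not stated in this README, so the grouping is left to the caller; `tpm` is a hypothetical helper.

```python
def tpm(counts, lengths):
    """Transcripts per million for one sample.

    counts[i] is the (possibly fractional) read count of gene i and
    lengths[i] its length in bp; the result sums to 1e6 unless all counts are 0.
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1e6 for rate in rates]
```

Using the COUNT and LENGTH columns of the .mgs output, for example, `tpm([2111, 1732.5], [1362, 1113])` normalises the first two Arectalis marker genes against each other.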
Output File Format:

Sample	Probability estimate	Classification
sample1	0.85	Growth
sample2	0.12	No Growth
sample3	0.55	Growth

About

MetaTRAnscriptomic growth Classifier
