added humann2 script #1
tkosciolek
left a comment
Looks good overall, just a few comments.
humann2_regroup_table -i $OUTPUT_DIR/normalized_genefamilies_cpm.tsv -o $OUTPUT_DIR/normalized_genefamilies_cpm_EC.tsv -c $MAPPING_FILES/map_level4ec_uniref90.txt.gz

# remove the intermediate files
rm -rf $OUTPUT_DIR/*_humann2_temp
Are those temp files potentially useful for debugging? If so, consider not removing them by default, or use a flag to decide whether temp files are kept or removed, for example with the getopts builtin (e.g. https://stackoverflow.com/questions/14447406/bash-shell-script-check-for-a-flag-and-grab-its-value).
Temp files can be useful for debugging, but they were huge, especially the SAM files from the bowtie step, and they were taking up a lot of space on the cluster. I will consider keeping them.
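One way the flag suggested above could look, as a minimal sketch rather than the script's actual implementation. The -k option and the names parse_args, cleanup_temp, and KEEP_TEMP are hypothetical; only the getopts builtin and the *_humann2_temp pattern come from the thread. Cleanup stays the default, since the intermediate files are large, and -k opts in to keeping them for debugging.

```shell
#!/usr/bin/env bash
# Sketch only: -k, parse_args, cleanup_temp and KEEP_TEMP are hypothetical names.

# parse_args sets KEEP_TEMP=1 when -k is passed; the default (0) removes temp files.
parse_args() {
    KEEP_TEMP=0
    local opt
    OPTIND=1    # reset so getopts starts from the first argument on every call
    while getopts ":kh" opt "$@"; do
        case "$opt" in
            k) KEEP_TEMP=1 ;;
            h) echo "usage: $0 [-k]   (-k keeps the *_humann2_temp directories)"; return 2 ;;
            \?) echo "unknown option: -$OPTARG" >&2; return 1 ;;
        esac
    done
}

# cleanup_temp removes the intermediate directories unless KEEP_TEMP is 1.
cleanup_temp() {
    local output_dir=$1
    if [ "$KEEP_TEMP" -eq 1 ]; then
        echo "keeping intermediate files in $output_dir"
    else
        echo "removing intermediate files from $output_dir"
        rm -rf -- "$output_dir"/*_humann2_temp
    fi
}

# Intended use inside the pipeline script:
#   parse_args "$@" || exit 1
#   ... run the humann2 steps ...
#   cleanup_temp "$OUTPUT_DIR"
```

Quoting the directory and passing -- to rm guards against paths containing spaces or a leading dash.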
cd $WORKING_DIR

# running humann2
How long does the pipeline take? Can I run it on a single computer, or do I need to run it on a cluster? If it needs a cluster, please add a docstring at the top saying so and include sample parameters for a specific queuing system. If the script can run on a single computer but takes more than a few minutes, I'd consider changing the comments (like #running humann2) into print statements (echo "running humann2..."), so that the user knows the script is executing.
Okay, I will make those changes. The Diamond translated search is very computationally intensive and takes about a day for one sample. It wouldn't be advisable to run the pipeline on a personal computer, especially with multiple files.
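A rough sketch of how the header note and the progress messages could be combined. It assumes SLURM as the queuing system, which the thread does not specify, and the resource values are placeholders, not measured requirements. The helper names log and run_step are hypothetical.

```shell
#!/usr/bin/env bash
# Runtime note (from the discussion): the translated search takes about a day
# per sample, so run this on a cluster rather than a personal computer.
# Example scheduler directives, ASSUMING SLURM; all values are placeholders:
#SBATCH --job-name=humann2
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G

# log prints a timestamped message to stderr so it does not mix with tool output.
log() {
    printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >&2
}

# run_step announces a step, runs the given command, and reports the elapsed
# time, or the exit code if the command fails.
run_step() {
    local label=$1
    shift
    local start=$SECONDS rc=0
    log "starting: $label"
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        log "FAILED: $label (exit $rc)"
        return "$rc"
    fi
    log "finished: $label in $((SECONDS - start))s"
}

# Intended use, wrapping a command from the script:
#   run_step "map to EC terms" humann2_regroup_table \
#       -i "$OUTPUT_DIR/normalized_genefamilies_cpm.tsv" \
#       -o "$OUTPUT_DIR/normalized_genefamilies_cpm_EC.tsv" \
#       -c "$MAPPING_FILES/map_level4ec_uniref90.txt.gz"
```

Timestamps help with a job that runs for a day: the scheduler log then shows which step was active and how long each one took.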
| # map to KO and EC terms | ||
| # remove the intermediate files | ||

#specify paths
Should the user be changing anything else besides the paths below (lines 14-17)? If not, please mark this section clearly, so that there is no doubt that only those paths should be changed. If there is anything else to change, I am a big fan of grouping those things together, so that the user does not need to go through the whole script to change variables.
Here the user only needs to change the paths. I will automate the workflow and make things clearer.
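A possible layout for a clearly marked settings block, offered as a sketch. It covers only the three variables visible in the diff (WORKING_DIR, OUTPUT_DIR, MAPPING_FILES); the script may define others. The default paths are placeholders and check_dirs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# ============================================================
# USER SETTINGS: edit only this block, or export the variables
# before running. The default paths below are placeholders.
# ============================================================
WORKING_DIR=${WORKING_DIR:-/path/to/working_dir}
OUTPUT_DIR=${OUTPUT_DIR:-/path/to/output_dir}
MAPPING_FILES=${MAPPING_FILES:-/path/to/mapping_files}
# ============================================================
# END OF USER SETTINGS: nothing below needs to be changed.
# ============================================================

# check_dirs reports every directory that does not exist and returns 1 if
# any is missing, so a bad path is caught before a day-long run starts.
check_dirs() {
    local dir missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "error: directory not found: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended use right after the settings block:
#   check_dirs "$WORKING_DIR" "$OUTPUT_DIR" "$MAPPING_FILES" || exit 1
```

The ${VAR:-default} form lets a value exported in the environment override the placeholder, so the paths can be set per job without editing the file.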
Added script for functional annotation