added humann2 script #1

Open
Marysteph wants to merge 1 commit into
microbiome-immunity-project:master from
Marysteph:mip-functions

Conversation

@Marysteph

Added script for functional annotation

@Marysteph

Author

@tkosciol

@tkosciolek tkosciolek left a comment

Looks good overall, just a few comments.

humann2_regroup_table -i $OUTPUT_DIR/normalized_genefamilies_cpm.tsv -o $OUTPUT_DIR/normalized_genefamilies_cpm_EC.tsv -c $MAPPING_FILES/map_level4ec_uniref90.txt.gz

# remove the intermediate files
rm -rf $OUTPUT_DIR/*_humann2_temp

Are those temp files potentially useful for debugging? If yes, consider not removing them by default, or use a flag to determine whether temp files should be kept or removed, for example with the getopts builtin (e.g. https://stackoverflow.com/questions/14447406/bash-shell-script-check-for-a-flag-and-grab-its-value).
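
A minimal sketch of the flag idea, not the script's actual code: the -k option, the KEEP_TEMP variable, and the parse_args and cleanup_temp helpers are hypothetical names. Only the *_humann2_temp pattern comes from the script under review.

```shell
#!/usr/bin/env bash
# Sketch only: -k, KEEP_TEMP, parse_args and cleanup_temp are hypothetical names.

KEEP_TEMP=false

# Parse options: -k keeps the HUMAnN2 temp directories for debugging.
parse_args() {
    local opt
    KEEP_TEMP=false
    OPTIND=1   # reset so the function can be called more than once
    while getopts ":k" opt; do
        case "$opt" in
            k) KEEP_TEMP=true ;;
            \?) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
        esac
    done
}

# Remove the intermediate files unless the user asked to keep them.
cleanup_temp() {
    local output_dir=$1
    if [ "$KEEP_TEMP" = true ]; then
        echo "Keeping temp files in $output_dir"
    else
        rm -rf "$output_dir"/*_humann2_temp
    fi
}
```

In the script this would be called as parse_args "$@" near the top and cleanup_temp "$OUTPUT_DIR" at the end, so the default behaviour (removing the files) stays the same and debugging runs add -k.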

Author

Temp files can be useful for debugging, but they were huge, especially the SAM files from the bowtie step, and they were taking up a lot of space on the cluster. I will consider keeping them.


cd $WORKING_DIR

# running humann2

How long does the pipeline take? Can I run it on a single computer, or do I need to run it on a cluster? If it needs to run on a cluster, please add a docstring at the top saying so and include sample parameters for a specific queuing system. If the script can run on a single computer but takes more than a few minutes, I'd consider changing the comments (like # running humann2) into print statements (echo "running humann2..."), so that the user knows the script is executing.
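
One way the header and progress messages could look, as a sketch under assumptions: Slurm is assumed as the queuing system, every resource value is a placeholder to be tuned, and log_step is a hypothetical helper that is not in the script.

```shell
#!/usr/bin/env bash
# Functional annotation with HUMAnN2.
# Intended to be run on a cluster rather than a personal computer.
#
# Example Slurm directives (assumed scheduler; all values are placeholders):
#SBATCH --job-name=humann2
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=48:00:00

# Print a timestamped progress message so the user can see the script is running.
# log_step is a hypothetical helper name.
log_step() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

log_step "running humann2..."
# (the humann2 command from the script goes here)
log_step "mapping to KO and EC terms..."
# (the humann2_regroup_table commands from the script go here)
log_step "removing the intermediate files..."
```

The timestamps also give a rough per-step runtime in the job log, which helps answer the "how long does it take" question for later runs.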

Author

Okay, will make those changes. The DIAMOND translated search is very computationally intensive and takes about a day for one sample. It wouldn't be advisable to run the pipeline on a personal computer, especially with multiple files.

# map to KO and EC terms
# remove the intermediate files

#specify paths

Should the user be changing anything else besides the paths below (lines 14-17)? If not, please mark this section clearly, so that there is no doubt the user should change only those paths. If there is anything else to change, I am a big fan of grouping those things together, so that the user does not need to go through the whole script to change variables.
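
A possible layout for such a section, as a sketch: the three variable names appear in the quoted parts of the script, but the paths are placeholders, any other setting on lines 14-17 is not visible here, and check_dir is a hypothetical helper.

```shell
#!/usr/bin/env bash

############################################################
# USER SETTINGS: edit only the paths in this block.
# The paths shown are placeholders, not real locations.
############################################################
WORKING_DIR="${WORKING_DIR:-/path/to/working_dir}"
OUTPUT_DIR="${OUTPUT_DIR:-/path/to/output_dir}"
MAPPING_FILES="${MAPPING_FILES:-/path/to/mapping_files}"
############################################################
# END OF USER SETTINGS: nothing below needs to be changed.
############################################################

# Fail early with a clear message if a configured directory is missing.
# check_dir is a hypothetical helper name.
check_dir() {
    if [ ! -d "$1" ]; then
        echo "Directory not found: $1" >&2
        return 1
    fi
}
```

Calling check_dir on each path before the long-running steps (for example check_dir "$WORKING_DIR" || exit 1) would catch a mistyped path in seconds instead of partway through a day-long job. The ${VAR:-default} form also lets the paths be overridden from the environment without editing the file.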

Author

Here the user only needs to change the paths. I will automate the workflow and make this clearer.

@tkosciolek tkosciolek self-requested a review March 18, 2020 13:19
@tkosciolek tkosciolek self-assigned this Mar 18, 2020

2 participants