Fasta_Parsing_and_Gene_Annotations

The project has the following folder tree:

├── coding_task_description.txt
├── README.md
└── project_files
    ├── find_fastq.py
    ├── frequent_sequences.py
    ├── gtf_annotate_tsv.py
    ├── annotated_coordinates.tsv (created after gtf_annotate_tsv.py or testfile.py is run)
    ├── testfile.py
    └── sample_files
        ├── optional_gtf
        │   └── gene_annotations.gtf
        ├── optional_tsv
        │   └── coordinates_to_annotate.tsv
        ├── fasta
        │   ├── sample1.fasta
        │   └── sample2.fasta
        └── fastq
            ├── otherreads
            │   └── (arbitrarily nested reads: Sample_R3.fastq, Sample_R4.fastq, Sample_R5.fastq, Sample_R6.fastq)
            ├── read1
            │   └── Sample_R1.fastq
            └── read2
                └── Sample_R2.fastq

There are 5 main files/directories:

- find_fastq.py: Problem 1
- frequent_sequences.py: Problem 2
- gtf_annotate_tsv.py: Optional Problem
- testfile.py: Used to test all files at once
- sample_files: Contains FASTQ, FASTA, GTF, and TSV files for testing purposes

Description: Run the Python scripts from the directory in which they are stored (project_files), because the file paths are built automatically from the current working directory plus a relative path. Each Python file runs its own tests when executed as the main file, or all of the Python files can be tested at once by running testfile.py.
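To illustrate why the working directory matters, here is a minimal sketch of how such paths might be built. The helper name `sample_path` and its `base` parameter are hypothetical and are not taken from the project's scripts, which may construct their paths differently.

```python
import os

def sample_path(*parts, base=None):
    """Join a relative location under sample_files onto a base directory.

    Hypothetical helper: `base` defaults to the current working directory,
    which is why the scripts are expected to be run from project_files.
    """
    base = os.getcwd() if base is None else base
    return os.path.join(base, "sample_files", *parts)

# Points at a real file only when the working directory is project_files.
print(sample_path("fasta", "sample1.fasta"))
```

If the scripts are launched from another directory, a path built this way points at a location that does not exist, which is the failure the instruction above is meant to prevent.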

find_fastq.py finds all the FASTQ files in a directory, searching subdirectories recursively, and returns each FASTQ file's name along with the percentage of its sequences that are longer than 30 nucleotides. frequent_sequences.py finds the ten most frequent DNA sequences in a FASTA file and returns each sequence along with its frequency. gtf_annotate_tsv.py uses a GTF annotation file to annotate a TSV file of chromosome and position coordinates: it adds the matching gene in a new column and writes the result to a new TSV file. The sample_files directory contains all the files used for testing.
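The first two tasks can be sketched as follows. This is an illustrative sketch, not the code in find_fastq.py or frequent_sequences.py. It assumes that "longer than 30" means reads longer than 30 nucleotides, that FASTQ records are strictly four lines each, and that results are keyed by file name only. The function names `percent_long_reads` and `top_sequences` are hypothetical.

```python
import os
from collections import Counter

def fastq_lengths(path):
    """Yield the length of each read, assuming strict 4-line FASTQ records."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield len(line.strip())

def percent_long_reads(root, min_len=30):
    """Map each .fastq file under root to the percent of reads longer than min_len."""
    results = {}
    for dirpath, _dirs, files in os.walk(root):  # os.walk descends into subdirectories
        for name in sorted(files):
            if name.endswith(".fastq"):
                lengths = list(fastq_lengths(os.path.join(dirpath, name)))
                long_reads = sum(1 for n in lengths if n > min_len)
                results[name] = 100.0 * long_reads / len(lengths) if lengths else 0.0
    return results

def top_sequences(path, n=10):
    """Return the n most frequent sequences in a FASTA file as (sequence, count) pairs."""
    counts = Counter()
    current = []  # lines of the record being read (sequences may wrap across lines)
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    counts["".join(current)] += 1
                current = []
            elif line:
                current.append(line)
    if current:  # count the final record
        counts["".join(current)] += 1
    return counts.most_common(n)
```

Run from project_files, the calls would look like `percent_long_reads("sample_files/fastq")` and `top_sequences("sample_files/fasta/sample1.fasta")`. Keying by file name means two files with the same name in different subdirectories would overwrite each other, so keying by full path is a safer choice when names can repeat.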

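The annotation step can be sketched in a similar way. This is a sketch under stated assumptions, not the logic of gtf_annotate_tsv.py: it assumes the TSV has no header row and holds the chromosome in column one and a 1-based position in column two, and that each relevant GTF line carries a `gene_name "..."` attribute. The function names `load_gene_intervals` and `annotate` are hypothetical.

```python
import csv
import re

GENE_NAME = re.compile(r'gene_name "([^"]+)"')

def load_gene_intervals(gtf_path):
    """Collect (start, end, gene_name) intervals per chromosome from a GTF file."""
    intervals = {}
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):  # skip GTF comment/header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:  # a GTF record has nine tab-separated columns
                continue
            match = GENE_NAME.search(fields[8])
            if match:
                intervals.setdefault(fields[0], []).append(
                    (int(fields[3]), int(fields[4]), match.group(1)))
    return intervals

def annotate(tsv_path, gtf_path, out_path):
    """Write a copy of the TSV with a gene-name column appended to each row."""
    intervals = load_gene_intervals(gtf_path)
    with open(tsv_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in csv.reader(src, delimiter="\t"):
            if len(row) < 2:  # skip blank or malformed rows
                continue
            chrom, pos = row[0], int(row[1])
            # GTF coordinates are 1-based and inclusive at both ends.
            genes = sorted({g for s, e, g in intervals.get(chrom, []) if s <= pos <= e})
            writer.writerow(row + [",".join(genes)])
```

The linear scan over every interval on a chromosome keeps the sketch short. For large annotation files, sorting the intervals and using binary search, or an interval tree, would be the usual alternative.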