kevinw02/Fasta_Parser

Fasta_Parsing_and_Gene_Annotations

The project has the following folder tree:

├── coding_task_description.txt
├── README.md
└── project_files
    ├── find_fastq.py
    ├── frequent_sequences.py
    ├── gtf_annotate_tsv.py
    ├── annotated_coordinates.tsv (created after gtf_annotate_tsv.py or testfile.py is run)
    ├── testfile.py
    └── sample_files
        ├── optional_gtf
        │   └── gene_annotations.gtf
        ├── optional_tsv
        │   └── coordinates_to_annotate.tsv
        ├── fasta
        │   ├── sample1.fasta
        │   └── sample2.fasta
        └── fastq
            ├── otherreads
            │   └── (arbitrarily nested reads: Sample_R3.fastq, Sample_R4.fastq, Sample_R5.fastq, Sample_R6.fastq)
            ├── read1
            │   └── Sample_R1.fastq
            └── read2
                └── Sample_R2.fastq

There are 5 main files/directories:

- find_fastq.py: Problem 1
- frequent_sequences.py: Problem 2
- gtf_annotate_tsv.py: Optional Problem
- testfile.py: used to test all files at once
- sample_files: contains FASTQ, FASTA, GTF, and TSV files for testing purposes

Description: Please run the Python scripts from the directory in which they are stored (project_files), because the file paths are built automatically from the current working directory plus a relative extension. Each Python file runs its own tests when executed as the main file, or all of the Python files can be tested at once by running testfile.py.
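To illustrate why the working directory matters, here is a minimal sketch of how a path might be built from the current working directory plus an extension. The helper name `sample_path` is hypothetical and is not taken from the scripts; only the `sample_files` directory name comes from the folder tree above.

```python
import os


def sample_path(*parts):
    """Join the current working directory with a path under sample_files.

    Hypothetical helper: the real scripts may build their paths differently.
    """
    return os.path.join(os.getcwd(), "sample_files", *parts)


# When run from project_files, this resolves to
# <project_files>/sample_files/fasta/sample1.fasta.
# When run from any other directory, the same call points somewhere else,
# and the file will most likely not be found.
fasta_file = sample_path("fasta", "sample1.fasta")
```

Under this assumption, running a script from the repository root instead of project_files would produce a path that lacks the project_files component.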

find_fastq.py recursively searches a directory and its subdirectories for FASTQ files and returns, for each file, the file name and the percent of sequences longer than 30 nucleotides. frequent_sequences.py finds the ten most frequent DNA sequences in a FASTA file and returns each sequence along with its frequency. gtf_annotate_tsv.py uses a GTF annotation file to annotate a TSV file of chromosome and position pairs: the matching gene is added in a new column and the result is written to a new TSV file (annotated_coordinates.tsv). The sample_files directory contains all the files used for testing.
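The three tasks can be sketched roughly as follows. This is not the code from the scripts themselves: every function name here is hypothetical, and the sketch assumes four-line FASTQ records, reads "longer than 30" as strictly more than 30 nucleotides, and assumes the gene is stored in a `gene_name` attribute in the ninth GTF column.

```python
import os
import re
from collections import Counter


def fastq_percent_long(path, min_len=30):
    """Percent of reads in one FASTQ file with more than min_len nucleotides."""
    total = long_reads = 0
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                total += 1
                long_reads += len(line.strip()) > min_len
    return 100.0 * long_reads / total if total else 0.0


def find_fastq(root, min_len=30):
    """Walk root recursively and map each FASTQ file name to its percentage.

    Assumes file names are unique; a later duplicate overwrites an earlier one.
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".fastq"):
                full_path = os.path.join(dirpath, name)
                results[name] = fastq_percent_long(full_path, min_len)
    return results


def top_sequences(path, n=10):
    """Return the n most frequent sequences in a FASTA file with their counts."""
    counts = Counter()
    seq = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):  # header line starts a new record
                if seq:
                    counts["".join(seq)] += 1
                seq = []
            elif line:
                seq.append(line)  # sequences may span several lines
    if seq:
        counts["".join(seq)] += 1
    return counts.most_common(n)


def gene_at(gtf_path, chrom, pos):
    """Return gene_name of the first GTF feature covering chrom:pos, else None."""
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue
            # GTF columns 4 and 5 hold the 1-based inclusive start and end.
            if fields[0] == chrom and int(fields[3]) <= pos <= int(fields[4]):
                match = re.search(r'gene_name "([^"]+)"', fields[8])
                if match:
                    return match.group(1)
    return None
```

For example, `find_fastq("sample_files/fastq")` would return a dictionary keyed by file name, and `top_sequences("sample_files/fasta/sample1.fasta")` a list of `(sequence, count)` pairs. The linear scan in `gene_at` is kept simple for readability; an interval index would be a better fit for large GTF files.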
