EEL6935 Big Data Ecosystems Project

Problem Statement

In this competition, you are provided with labeled data for SP1 transcription factor binding and non-binding sites on human chromosome 1. The dataset contains 1000 binding-site sequences and 1000 non-binding-site sequences, each 14 base pairs long. DNA has four nucleobase types: adenine (A), cytosine (C), guanine (G), and thymine (T). The sequences in the dataset are written using these letters.
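As an illustration of how such sequences can be turned into numeric features, here is a minimal sketch of one-hot encoding. It is not necessarily the encoding used in the notebook; the function name `one_hot_encode` and the example sequence are hypothetical.

```python
# Minimal sketch: one-hot encode a DNA sequence over the alphabet A, C, G, T.
# The function name and the example sequence are illustrative assumptions,
# not taken from the project notebook.

NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return one row per base, each row a 4-element 0/1 list in A, C, G, T order."""
    encoded = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        encoded.append([1 if base == letter else 0 for letter in NUCLEOTIDES])
    return encoded


example_sequence = "ACGTACGTACGTAC"  # hypothetical 14-base sequence
features = one_hot_encode(example_sequence)
print(len(features), len(features[0]))
```

Each 14-base sequence becomes a 14 x 4 matrix, which can be flattened for a scikit-learn classifier or passed as a 2D input to a Keras model.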

Requirements

scikit-learn
Keras
TensorFlow
numpy
pandas
Install dna2vec from repo: https://github.com/pnpnpn/dna2vec
Download the pretrained dna2vec model from https://github.com/pnpnpn/dna2vec/blob/master/pretrained/dna2vec-20161219-0153-k3to8-100d-10c-29320Mbp-sliding-Xat.w2v
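
The pretrained model's filename suggests it holds embeddings for k-mers of length 3 to 8 (`k3to8`). Under that assumption, a sequence has to be split into overlapping k-mers before any embedding lookup. The sketch below shows only that splitting step in plain Python; the helper name `split_into_kmers` is hypothetical, and the lookup itself is left out.

```python
# Minimal sketch: split a DNA sequence into overlapping k-mers.
# The 3..8 range is an assumption inferred from "k3to8" in the model filename;
# split_into_kmers is a hypothetical helper, not part of dna2vec.

MIN_K = 3
MAX_K = 8


def split_into_kmers(sequence, k):
    """Return all overlapping substrings of length k, sliding one base at a time."""
    if not MIN_K <= k <= MAX_K:
        raise ValueError(f"k must be between {MIN_K} and {MAX_K}, got {k}")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


example_sequence = "ACGTACGTACGTAC"  # hypothetical 14-base sequence
trimers = split_into_kmers(example_sequence, 3)
print(len(trimers), trimers[:3])
```

A 14-base sequence yields 14 - k + 1 overlapping k-mers, so 12 for k = 3. Each k-mer could then be mapped to its embedding vector and the vectors combined into a fixed-length feature for the classifier.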
