In this competition, you are given labeled data for SP1 transcription factor binding and non-binding sites on human chromosome 1. The dataset contains 1000 binding-site sequences and 1000 non-binding-site sequences. Each sequence is 14 base pairs long. DNA has four nucleobase types: adenine (A), cytosine (C), guanine (G), and thymine (T), and the sequences in the dataset are written with these letters.
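As an illustration of the data format described above, here is a minimal sketch of turning one 14-letter sequence into a one-hot matrix, a common input representation for classifiers. The function name `one_hot_encode`, the column order A, C, G, T, and the example sequence are assumptions made for this sketch; they are not taken from the competition code or data.

```python
# Hypothetical helper: one-hot encode a 14-nucleotide DNA sequence.
# The column order (A, C, G, T) is an arbitrary choice for this sketch.
NUCLEOTIDES = "ACGT"
SEQ_LEN = 14  # sequence length stated in the dataset description


def one_hot_encode(seq):
    """Return a list of SEQ_LEN rows, each a 4-element one-hot list."""
    seq = seq.upper()
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} nucleotides, got {len(seq)}")
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in seq:
        if base not in index:
            raise ValueError(f"unexpected character: {base!r}")
        row = [0] * len(NUCLEOTIDES)
        row[index[base]] = 1  # mark the column for this nucleotide
        encoded.append(row)
    return encoded


# Made-up example sequence, not taken from the dataset.
example = one_hot_encode("GGGGCGGGGCCTAT")
```

Each sequence then becomes a 14 x 4 matrix, so the full dataset of 2000 sequences could be stacked into an array of shape (2000, 14, 4) with NumPy before being passed to a model.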
- scikit-learn
- Keras
- TensorFlow
- NumPy
- pandas
- Install dna2vec from repo: https://github.com/pnpnpn/dna2vec
- Download the pretrained dna2vec model from https://github.com/pnpnpn/dna2vec/blob/master/pretrained/dna2vec-20161219-0153-k3to8-100d-10c-29320Mbp-sliding-Xat.w2v
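
The pretrained model's file name suggests it holds embeddings for k-mers of length 3 to 8 (`k3to8`). Under that assumption, a sequence would first be split into k-mers before any embedding lookup. The following is a rough sketch of that splitting step in plain Python. The helper `split_into_kmers` is hypothetical and is not part of dna2vec; the lookup itself is left out here.

```python
# Hypothetical helper: split a DNA sequence into overlapping k-mers.
# The 3..8 range is inferred from "k3to8" in the model file name (assumption).
MIN_K, MAX_K = 3, 8


def split_into_kmers(seq, k):
    """Return all overlapping substrings of length k, sliding by one base."""
    if not MIN_K <= k <= MAX_K:
        raise ValueError(f"k must be between {MIN_K} and {MAX_K}, got {k}")
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Made-up 14-nucleotide example, not taken from the dataset.
kmers = split_into_kmers("GGGGCGGGGCCTAT", 3)
```

A 14-nucleotide sequence yields 14 - k + 1 overlapping k-mers, so 12 for k = 3. Whether to use overlapping or non-overlapping k-mers is a design choice; overlapping windows are shown only because they keep every position of a short sequence represented.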