Skip to content

Extract kmers, skipmers, minimizers.

Notifications You must be signed in to change notification settings

DBRetina/listDecoder

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

107 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kmerDecoder

Ubuntu

Quick Setup (using CMake)

Create CMakeLists.txt file in your project directory

cmake_minimum_required (VERSION 3.14)
project (KD_Test C CXX)

set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14 -fopenmp")

add_subdirectory(kmerDecoder)

file(GLOB SOURCES
        "${PROJECT_SOURCE_DIR}/src/main.cpp"
        )

add_executable (kdtest ${SOURCES})
target_link_libraries(kdtest kmerDecoder)

Using kmerDecoder with cmake add_subdirectory() is recommended.

Usage Example

create sample.fa

>SAMPLE
ACGTAGCATGCATGACGATGCTAGCGT

create aa_sample.fa

>SAMPLE
KLGCKGAMLMKMKHACGKGLALMLLMMHAL

src/main.cpp

#include "kmerDecoder.hpp"

void extract(kmerDecoder *KD);

int main() {
  std::string filename = "path_to/sample.fa";
  unsigned int chunk_size = 1;

  /*
  Create three kmerDecoder objects
  */

  // 1- Kmers Mode > Kmers(filename, chunk_size, kSize)
  kmerDecoder *KMERS = new Kmers(filename, chunk_size, 15);

  // 2- Skipmers Mode > Skipmers(filename, chunk_size, m, n, k)
  kmerDecoder *SKIPMERS = new Skipmers(filename, chunk_size, 2, 3, 10);

  // 3- Minimizers Mode > Minimizers(filename, chunk_size, k, w)
  kmerDecoder *MINIMIZERS = new Minimizers(filename, chunk_size, 5, 10);

  // 4- Protein kmers Mode > aaKmers(filename, chunk_size, k) | Max kSize = 11
  kmerDecoder *PROT_KMERS = new aaKmers(prot_filename, _chunk_size, 11);

  // Start Extraction

  std::cout << "Mode: Kmers" << "\n";
  extract(KMERS);
  std::cout << "------------------------------------" << std::endl;

  std::cout << "Mode: Skipmers" << "\n";
  extract(SKIPMERS);
  std::cout << "------------------------------------" << std::endl;

  std::cout << "Mode: Minimizers" << "\n";
  extract(MINIMIZERS);
  std::cout << "------------------------------------" << std::endl;

  std::cout << "Mode: Protein Kmers" << "\n";
  extract(PROT_KMERS);
  std::cout << "------------------------------------" << std::endl;
}

void extract(kmerDecoder *KD) {
  while (!KD->end()) {
    KD->next_chunk();

    for (const auto &seq : *KD->getKmers()) {
      std::cout << "Read ID: " << seq.first << std::endl;
      for (const auto &kmer : seq.second) {
        std::cout << kmer.str << ": " << kmer.hash << std::endl;
      }
    }
  }
}

Output

Mode: Kmers
Read ID: sample
ACTGATCGTAGCTAG: 270607162
CTGATCGTAGCTAGC: 944645122
TGATCGTAGCTAGCA: 557632235
GATCGTAGCTAGCAT: 1027309074
ATCGTAGCTAGCATG: 937389716
TCGTAGCTAGCATGC: 386653658
CGTAGCTAGCATGCA: 472543720
GTAGCTAGCATGCAT: 769426497
TAGCTAGCATGCATC: 926137824
AGCTAGCATGCATCG: 58327299
GCTAGCATGCATCGT: 33644462
CTAGCATGCATCGTA: 873893451
TAGCATGCATCGTAG: 920819452
AGCATGCATCGTAGC: 343354516
GCATGCATCGTAGCT: 330181179
CATGCATCGTAGCTA: 94646772
ATGCATCGTAGCTAG: 1034175364
TGCATCGTAGCTAGC: 777708297
------------------------------------
Mode: Skipmers
Read ID: sample
ACGACGAGTA: 539404
CGACGAGTAC: 74282
GACGAGTACA: 374149
ACGAGTACAG: 210706
CGAGTACAGC: 854584
GAGTACAGCT: 1021218
AGTACAGCTC: 68631
GTACAGCTCT: 1045782
TACAGCTCTA: 75413
ACAGCTCTAC: 316100
CAGCTCTACT: 194024
AGCTCTACTG: 873565
GCTCTACTGC: 272874
CTATGTGCAG: 527346
TATGTGCAGA: 26575
ATGTGCAGAT: 191221
TGTGCAGATC: 455797
GTGCAGATCA: 334737
TGCAGATCAC: 32520
GCAGATCACG: 932927
CAGATCACGA: 532836
AGATCACGAG: 36901
GATCACGAGT: 471900
ATCACGAGTA: 638086
TCACGAGTAC: 467666
TGTCTACTGC: 1011830
GTCTACTGCT: 338374
TCTACTGCTG: 344123
CTACTGCTGA: 904801
TACTGCTGAT: 478035
ACTGCTGATG: 872943
CTGCTGATGT: 158651
TGCTGATGTG: 301791
GCTGATGTGC: 596234
CTGATGTGCA: 378027
TGATGTGCAG: 1019609
------------------------------------
Mode: Minimizers
Read ID: sample
ACTGA: 647
AGCAT: 485
AGCTA: 715
ATCGT: 25

About

Extract kmers, skipmers, minimizers.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 66.9%
  • CMake 33.1%