Conversation
| { | ||
| UInt32 base :22 ; // base value of DA | ||
| UInt32 lcheck :8 ; // arc label from parent node | ||
| UInt32 leaf_flag :1 ; // node is leaf |
There was a problem hiding this comment.
one could play around with the order of the members here, and see what performs the best (since extraction of a single bit is not cheap) and extraction of lcheck could be cheaper if moved to the back...
(keep in mind for later maybe)
cbielow
left a comment
There was a problem hiding this comment.
I did not get to the end. will finish later :)
| { | ||
| bc_[node_pos].leaf_flag = 1; | ||
| bc_[node_pos].base = static_cast<UInt32>(tail_.size()); | ||
| storeInTail_(sequences_[unique_idx_[begin]].substr(depth)); |
There was a problem hiding this comment.
substr() is expensive.
try a StringView
| /** | ||
| * @brief | ||
| * @param pos_in_protein position in the set protein | ||
| * @param peptide_index index of sequence in given vector |
There was a problem hiding this comment.
there is no vector given here. You mean the vector given in the ctor... but thats not clear
| bool failure_(); | ||
|
|
||
| /** | ||
| * @brief returns the supply link if this does not eliminate ambiguous aa. Don't do anything special with 0. Must be considered when calling |
There was a problem hiding this comment.
this function is only useful for spawns, i.e. when dealing with ambAA's, right?
So should there be a sibling function
void followSupplyLink(Int32 from, Int32& to)?
Can you also make input=output, i.e. merge the first two parameters?
| { | ||
| Int32 node; ///< Node after the ambiguous aa transition | ||
| UInt32 prot_pos; ///< Protein position after the ambiguous aa | ||
| uint8_t counter; ///< Count of all read ambiguous aa |
There was a problem hiding this comment.
so, this is the number of aa's in the path from root to 'node'?
| UInt32 aa_count_ = 0; | ||
|
|
||
| /// Indices of the sorted sequences, duplicate sequences are removed | ||
| std::vector<UInt32> unique_idx_{}; |
| void buildTrie_(Size begin, const Size end, const Size depth, const UInt32 node_pos); | ||
|
|
||
| /// returns base value | ||
| UInt32 xCheck_(); |
| OPENMS_LOG_INFO << "Mapping " << pep_DB.size() << " peptides to " << (proteins.size() == PROTEIN_CACHE_SIZE ? "? (unknown number of)" : String(proteins.size())) << " proteins." << std::endl; | ||
|
|
||
| if (length(pep_DB) == 0) | ||
| if (pep_DB.size() == 0) |
| AhoCorasickDA fuzzyAC(pep_DB); | ||
| auto s2 = std::chrono::high_resolution_clock::now(); | ||
| std::chrono::duration<double,std::milli> elapsed = s2 - s1; | ||
| std::cout << "Construction time " << elapsed.count() << " ms"<< std::endl; |
There was a problem hiding this comment.
why the extra chrono?
StopWatch is a high-resolution clock
(just don't convert to int() as done below to retain its accuracy)
| // CDA | ||
| //------------------------------------------------------------------------------------------------------------------------------------- | ||
|
|
||
| UInt32 AhoCorasickDA::stat() |
|
|
||
| #include <OpenMS/ANALYSIS/ID/AhoCorasickAmbiguous.h> | ||
| //#include <OpenMS/ANALYSIS/ID/AhoCorasickAmbiguous.h> | ||
| #include <OpenMS/ANALYSIS/ID/AhoCorasickDA.h> |
There was a problem hiding this comment.
can you get rid of the seqan includes?
| namespace OpenMS | ||
| { | ||
|
|
||
| /** |
There was a problem hiding this comment.
can you describe briefly some of the special tweaks here?
e.g. duplicate sequences are handled externally; how internal end nodes are handled; how spawns are processed (at what time and in what order), .... so the user gets a rough idea of what to expect here
| struct BCNode_ | ||
| { | ||
| UInt32 base :22 ; ///< base value of the node | ||
| UInt32 lcheck :8 ; ///< arc label from parent node. Only code of the aa is stored |
There was a problem hiding this comment.
it might be faster if you switch the order of base and lcheck because that should(!) save some bitshifting when accessing lcheck .. but it might not help....
don't redo the benchmarks!
Description
Please include a summary of the change and which issue is fixed.
Also please:
Checklist: