Precomputation for DIAScoring and DIAHelper#99
Precomputation for DIAScoring and DIAHelper#99taranehstrunk wants to merge 1 commit intocbielow:developfrom
Conversation
|
|
||
| #include <OpenMS/CHEMISTRY/AASequence.h> | ||
| #include <OpenMS/OPENSWATHALGO/DATAACCESS/DataStructures.h> | ||
| #include <OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h> |
There was a problem hiding this comment.
no need to include the header here.
Just forward declare should be enough (to save on compile time).
class IsotopeDistributionCache; within the OpenMS namespace.
| /// simulate spectrum from AASequence | ||
| OPENMS_DLLAPI void simulateSpectrumFromAASequence(const AASequence& aa, | ||
| OPENMS_DLLAPI void simulateSpectrumFromAASequence(IsotopeDistributionCache& iso, | ||
| const AASequence& aa, |
There was a problem hiding this comment.
it is probably a bit more intuitive to swap argument 1 and 2... because AASequence is the primary input
| #include <OpenMS/OPENSWATHALGO/DATAACCESS/TransitionExperiment.h> | ||
|
|
||
| #include <OpenMS/DATASTRUCTURES/DefaultParamHandler.h> | ||
| #include <OpenMS/ANALYSIS/OPENSWATH/DIAScoring.h> |
There was a problem hiding this comment.
no need for this header.
forward declare IsotopeDistributionCache as before
| //@} | ||
|
|
||
|
|
||
| void precalculateDistributionCache(Size num_begin, Size index); |
There was a problem hiding this comment.
can you document what the function does and what the parameters mean?
|
|
||
| void precalculateDistributionCache(Size num_begin, Size index); | ||
|
|
||
| void renormalize( TheoreticalIsotopePattern& isotopes, IsotopeDistribution& isotope_dist); |
| const TheoreticalIsotopePattern & getIsotopeDistribution(double mass) const; | ||
| const TheoreticalIsotopePattern& getIsotopeDistribution(double mass) ; | ||
|
|
||
| const IsotopeDistribution& getIntensity(double mass); |
There was a problem hiding this comment.
this is not really an intensity which is returned here, right?
Can you find a better name for the method?
Also document it
| /// Vector of pre-calculated isotope distributions for several mass windows | ||
| std::vector<TheoreticalIsotopePattern> isotope_distributions_; | ||
|
|
||
| std::vector<IsotopeDistribution> distribution_cache_; |
There was a problem hiding this comment.
can you document the four members?
|
|
||
| double mass_window_width_; | ||
|
|
||
| double intensity_percentage_ ; |
There was a problem hiding this comment.
what is the percentage? (document)
| CoarseIsotopePatternGenerator solver(nr_isotopes); | ||
| TheoreticalIsotopePattern isotopes; | ||
| auto d = solver.estimateFromPeptideWeight(product_mz * charge); | ||
| auto d = iso.getIntensity(product_mz * charge); |
There was a problem hiding this comment.
can be const auto& d = .... to avoid the copy?
|
|
||
| for (std::size_t i = 0; i < spec.size(); ++i) | ||
| { | ||
| std::vector<std::pair<double, double> > isotopes; |
There was a problem hiding this comment.
just as a suggestion for speed (even though this was not part of the PR):
move std::vector<std::pair<double, double> > isotopes; in front of the loop and just clear it inside to safe a lot of allocations.
|
|
||
| void IsotopeDistributionCache::renormalize( | ||
| OpenMS::IsotopeDistributionCache::TheoreticalIsotopePattern& isotopes, | ||
| OpenMS::IsotopeDistribution& isotope_dist) |
There was a problem hiding this comment.
isotope_dist can be const, right because it is only read from?
| auto d = solver.estimateFromPeptideWeight(0.5 * mass_window_width + index * mass_window_width); | ||
| auto d = solver.estimateFromPeptideWeight(0.5*mass_window_width_ + index * mass_window_width_); | ||
|
|
||
| distribution_cache_[index] = d; |
There was a problem hiding this comment.
we are storing a copy of d in the cache, and then go on to trim d and compute isotope_distributions_[index] based on it. Is this intended? If so, please document this somewhere
| //calculate index in the vector | ||
| Size index = static_cast<Size>(std::floor(mass / mass_window_width_)); | ||
|
|
||
| Size index = Size (std::floor(mass/mass_window_width_)); |
There was a problem hiding this comment.
no need for floor(), since casting to Size will take care of it (for positive masses.. which assume we have)
|
|
||
| //Return distribution | ||
| return isotope_distributions_[index]; | ||
| return isotope_distributions_[(int)index]; |
| //Return the intensity for a mass | ||
| const IsotopeDistribution& IsotopeDistributionCache::getIntensity(double mass) | ||
| { | ||
| Size index = Size (std::floor(mass/mass_window_width_)); |
| } | ||
|
|
||
| //Return intensity | ||
| return distribution_cache_[(int)index]; |
| // Here we deal with SWATH files (can be multiple files) | ||
|
|
||
| DIAScoring dia; | ||
| auto isotope_cache = dia.getCache(); |
There was a problem hiding this comment.
Could you not simply instanciate an IsotopeDistributionCache isotope_cache; here directly?
(remove the DIAScoring.h in this case)
Implementation of Isotope Distribution Cache in OpenSWATHWorkflow
We noticed that in this workflow where in DIAScoring and DIAHelper we compute isotope Distributions, a huge part of the runtime is lost in the calculation

estimateFromPeptideWeight(). (see the two "towers" in the flame graph below)In order to avoid doing this calculation so often, we implemented a Cache which we save as a member in each DIAScoring and DIAHelper. For this implementation, we used
IsotopeDistributionCache.cppandIsotopeDistributionCache.h. The needed feature was already implemented there, such that we just had to adapt the usage of thisisotope_distribution_member in each file and the corresponding calling functions.This change in the workflow led to a good improvement in runtime.
Benchmarks
Without caching:
OpenSwathWorkflow took 15:38 m (CPU)With IsotopDistributionCache:
OpenSwathWorkflow took 13:59 m (CPU)