Precomputation for DIAScoring and DIAHelper by taranehstrunk · Pull Request #99 · cbielow/OpenMS

taranehstrunk · 2020-05-28T08:00:36Z

Implementation of Isotope Distribution Cache in OpenSWATHWorkflow

We noticed that in this workflow, where DIAScoring and DIAHelper compute isotope distributions, a large part of the runtime is spent in estimateFromPeptideWeight() (see the two "towers" in the flame graph below).

To avoid repeating this calculation so often, we implemented a cache that is stored as a member in each of DIAScoring and DIAHelper. The implementation builds on IsotopeDistributionCache.cpp and IsotopeDistributionCache.h. The required functionality was already there, so we only had to adapt the usage of the isotope_distribution_ member in each file and the corresponding calling functions.
This change reduced the CPU time of the workflow by roughly 10% in the benchmark below.
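
The caching idea described above can be illustrated with a minimal, self-contained sketch. It is not the OpenMS IsotopeDistributionCache: the names Distribution, ExpensiveModel and WindowCache are hypothetical, the "expensive" model is a toy stand-in for estimateFromPeptideWeight(), and filling the windows lazily is an assumption made for brevity. Masses are binned into fixed-width windows, and each window is computed at most once.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an isotope distribution: relative abundances only.
using Distribution = std::vector<double>;

// Hypothetical stand-in for the expensive estimation call.
// It counts its invocations so the effect of caching is visible.
struct ExpensiveModel
{
  std::size_t calls = 0;

  Distribution estimate(double mass)
  {
    ++calls;
    // Toy model: heavier masses shift abundance towards the second peak.
    const double heavy = std::min(0.9, mass / 10000.0);
    return {1.0 - heavy, heavy};
  }
};

// Hypothetical cache: one distribution per mass window, filled on first use.
class WindowCache
{
public:
  WindowCache(double max_mass, double window_width)
    : width_(window_width),
      cache_(static_cast<std::size_t>(std::ceil(max_mass / window_width)) + 1),
      filled_(cache_.size(), false)
  {
  }

  // Returns a reference to the cached distribution of the window containing `mass`.
  const Distribution& get(double mass)
  {
    assert(mass >= 0.0);
    const std::size_t index = static_cast<std::size_t>(std::floor(mass / width_));
    assert(index < cache_.size());
    if (!filled_[index])
    {
      // Evaluate the model once, at the centre of the window.
      cache_[index] = model_.estimate((index + 0.5) * width_);
      filled_[index] = true;
    }
    return cache_[index];
  }

  std::size_t expensiveCalls() const { return model_.calls; }

private:
  double width_;
  std::vector<Distribution> cache_;
  std::vector<bool> filled_;
  ExpensiveModel model_;
};
```

With a window width of 1.0, two lookups for masses 1000.2 and 1000.7 fall into the same window and trigger only one call to the expensive model. The trade-off is a small approximation error, since every mass in a window shares the distribution computed for the window centre.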

Benchmarks

Without caching:
OpenSwathWorkflow took 15:38 m (CPU)
With IsotopeDistributionCache:
OpenSwathWorkflow took 13:59 m (CPU)

cbielow · 2020-05-28T13:41:03Z

src/openms/include/OpenMS/ANALYSIS/OPENSWATH/DIAHelper.h


 #include <OpenMS/CHEMISTRY/AASequence.h>
 #include <OpenMS/OPENSWATHALGO/DATAACCESS/DataStructures.h>
+#include <OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h>


No need to include the header here.
A forward declaration should be enough (to save on compile time):
class IsotopeDistributionCache; within the OpenMS namespace.
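
As a rough illustration of the suggestion, the sketch below puts a "header part" and a "source part" in one file. The nested namespace Sketch, the function queryCount and the stand-in class body are hypothetical and exist only to make the example compile on its own. The point is that a declaration taking the cache by reference only needs the class name, not its definition.

```cpp
// --- header part (sketch) ---
namespace OpenMS
{
  // Forward declaration: enough for references and pointers in declarations.
  class IsotopeDistributionCache;

  namespace Sketch
  {
    // Hypothetical function; only the name of the cache class is needed here.
    int queryCount(const IsotopeDistributionCache& cache);
  }
}

// --- source part (sketch): the full definition is required to access members ---
namespace OpenMS
{
  // Stand-in definition for illustration; the real class lives in its own header.
  class IsotopeDistributionCache
  {
  public:
    int queries = 3;
  };

  namespace Sketch
  {
    int queryCount(const IsotopeDistributionCache& cache)
    {
      return cache.queries;
    }
  }
}
```

In a real code base the .cpp file would include the full header, while every file that only includes the lighter header avoids pulling in the cache's dependencies.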

cbielow · 2020-05-28T13:42:36Z

src/openms/include/OpenMS/ANALYSIS/OPENSWATH/DIAHelper.h

    /// simulate spectrum from AASequence
-    OPENMS_DLLAPI void simulateSpectrumFromAASequence(const AASequence& aa,
+    OPENMS_DLLAPI void simulateSpectrumFromAASequence(IsotopeDistributionCache& iso,
+                                        const AASequence& aa,


It is probably a bit more intuitive to swap arguments 1 and 2, because AASequence is the primary input.

cbielow · 2020-05-28T13:43:37Z

src/openms/include/OpenMS/ANALYSIS/OPENSWATH/DIAPrescoring.h

 #include <OpenMS/OPENSWATHALGO/DATAACCESS/TransitionExperiment.h>

 #include <OpenMS/DATASTRUCTURES/DefaultParamHandler.h>
+#include <OpenMS/ANALYSIS/OPENSWATH/DIAScoring.h>


no need for this header.
forward declare IsotopeDistributionCache as before

cbielow · 2020-05-28T13:44:42Z

src/openms/include/OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h

+    //@}
+
+
+    void precalculateDistributionCache(Size num_begin, Size index);


can you document what the function does and what the parameters mean?

cbielow · 2020-05-28T13:44:56Z

src/openms/include/OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h

+
+    void precalculateDistributionCache(Size num_begin, Size index);
+
+    void renormalize( TheoreticalIsotopePattern& isotopes, IsotopeDistribution& isotope_dist);


docs here as well please

cbielow · 2020-05-28T13:45:48Z

src/openms/include/OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h

-    const TheoreticalIsotopePattern & getIsotopeDistribution(double mass) const;
+    const TheoreticalIsotopePattern& getIsotopeDistribution(double mass) ;
+
+    const IsotopeDistribution& getIntensity(double mass);


this is not really an intensity which is returned here, right?
Can you find a better name for the method?
Also document it

cbielow · 2020-05-28T13:46:18Z

src/openms/include/OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h

    /// Vector of pre-calculated isotope distributions for several mass windows
    std::vector<TheoreticalIsotopePattern> isotope_distributions_;

+    std::vector<IsotopeDistribution> distribution_cache_;


can you document the four members?

cbielow · 2020-05-28T13:46:40Z

src/openms/include/OpenMS/FILTERING/DATAREDUCTION/IsotopeDistributionCache.h

+
    double mass_window_width_;
+
+    double intensity_percentage_ ;


what is the percentage? (document)

cbielow · 2020-05-28T13:47:49Z

src/openms/source/ANALYSIS/OPENSWATH/DIAHelper.cpp

-      CoarseIsotopePatternGenerator solver(nr_isotopes);
-      TheoreticalIsotopePattern isotopes;
-      auto d = solver.estimateFromPeptideWeight(product_mz * charge);
+      auto d = iso.getIntensity(product_mz * charge);


Can this be const auto& d = ... to avoid the copy?
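
To make the difference concrete, here is a small sketch with hypothetical types (CountedDistribution and Cache are not OpenMS classes). When a getter returns a const reference, plain auto deduces a value type and copies the object, while const auto& binds to the stored object directly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type that counts how often it is copy-constructed.
struct CountedDistribution
{
  static std::size_t copies;
  std::vector<double> values;

  CountedDistribution() = default;
  CountedDistribution(const CountedDistribution& other) : values(other.values) { ++copies; }
  CountedDistribution& operator=(const CountedDistribution&) = default;
};
std::size_t CountedDistribution::copies = 0;

// Hypothetical cache whose getter returns a const reference.
struct Cache
{
  CountedDistribution stored;
  const CountedDistribution& get() const { return stored; }
};

// Returns the number of copies made when the result is bound with plain auto.
std::size_t copiesWithAuto(const Cache& cache)
{
  const std::size_t before = CountedDistribution::copies;
  auto d = cache.get(); // auto drops the reference, so d is a copy
  (void)d;
  return CountedDistribution::copies - before;
}

// Returns the number of copies made when the result is bound with const auto&.
std::size_t copiesWithConstRef(const Cache& cache)
{
  const std::size_t before = CountedDistribution::copies;
  const auto& d = cache.get(); // binds to the stored object, no copy
  (void)d;
  return CountedDistribution::copies - before;
}
```

The reference is only safe as long as the cache outlives it and does not reallocate the referenced entry in the meantime.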

cbielow · 2020-05-28T13:51:25Z

src/openms/source/ANALYSIS/OPENSWATH/DIAHelper.cpp


      for (std::size_t i = 0; i < spec.size(); ++i)
      {
        std::vector<std::pair<double, double> > isotopes;


just as a suggestion for speed (even though this was not part of the PR):
move std::vector<std::pair<double, double> > isotopes; in front of the loop and just clear it inside to save a lot of allocations.
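
A minimal sketch of that pattern, using hypothetical helpers (fillIsotopes and sumIntensities are not part of the PR): the vector is declared once before the loop and cleared in each iteration, so the buffer allocated in the first iteration is reused afterwards.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-peak work: appends four (m/z, intensity) pairs to `isotopes`.
void fillIsotopes(double mz, std::vector<std::pair<double, double> >& isotopes)
{
  for (int k = 0; k < 4; ++k)
  {
    isotopes.emplace_back(mz + k * 1.00335, 1.0 / (k + 1));
  }
}

// Sums the intensities of all generated isotope peaks.
double sumIntensities(const std::vector<double>& peak_mzs)
{
  double total = 0.0;
  std::vector<std::pair<double, double> > isotopes; // declared once, outside the loop
  for (std::size_t i = 0; i < peak_mzs.size(); ++i)
  {
    isotopes.clear(); // removes the elements but keeps the allocated capacity
    fillIsotopes(peak_mzs[i], isotopes);
    for (const auto& p : isotopes)
    {
      total += p.second;
    }
  }
  return total;
}
```

How much this helps depends on the number of peaks and on the allocator, so it is worth confirming with a profiler.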

cbielow · 2020-05-28T14:00:28Z