Monitor and plot RSS memory and CPU usage during qlever index #277
Open
tanmay-9 wants to merge 26 commits into
So far, the `qlever index` command gave no insight into how much memory an index build actually needs or which index phase is responsible for the peak.

With this change, every index build records RSS memory and CPU usage over time, writes `<name>.usage-log.tsv`, and renders `<name>.usage-log.png` once the index build finishes. The plot shades each index build phase (parsing, vocabulary merge, conversion, each permutation, and the text index) as a separate band and annotates the memory peak, so resource usage can be attributed to a specific phase. For comparison across runs and settings, the plot is captioned with the git hash of the index binary, the `STXXL_MEMORY` setting, and the batch size. This works whether the index is built natively or in a container (docker/podman).

The sampling rate can be set with `--resource-monitor-interval` and the plot density on long builds with `--resource-monitor-plot-max-points` (this only limits how many points are drawn; the sampling itself is unaffected). There is also a `--replot-resource-usage` option that re-renders the plot from an existing `<name>.usage-log.tsv` without re-running the index build, which is useful for tweaking plot settings.
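
To make the two plot-side ideas above concrete (limiting the number of drawn points without losing the memory peak, and attributing the peak to a phase), here is a minimal sketch. It is not the code from this PR. The function names `downsample_keep_max` and `peak_phase`, the `(seconds, RSS in GB)` sample layout, and the phase time ranges are all assumptions made for illustration. The PR does not state which downsampling strategy it uses, so bucket-wise maximum is assumed here because it keeps the peak visible.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory layout; the real column layout of
# <name>.usage-log.tsv is not described in this PR text.
Sample = Tuple[float, float]      # (seconds since start, RSS in GB)
Phase = Tuple[str, float, float]  # (phase name, start second, end second)


def downsample_keep_max(samples: List[Sample], max_points: int = 500) -> List[Sample]:
    """Return at most max_points samples, keeping the highest-RSS sample
    of each contiguous bucket so the overall peak is never dropped."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    n = len(samples)
    if n <= max_points:
        return list(samples)
    reduced = []
    for i in range(max_points):
        lo = i * n // max_points
        hi = (i + 1) * n // max_points  # hi > lo because n > max_points
        reduced.append(max(samples[lo:hi], key=lambda s: s[1]))
    return reduced


def peak_phase(samples: List[Sample], phases: List[Phase]) -> Tuple[Optional[str], Sample]:
    """Find the sample with the highest RSS and the phase whose
    half-open time range [start, end) contains it, if any."""
    peak = max(samples, key=lambda s: s[1])
    for name, start, end in phases:
        if start <= peak[0] < end:
            return name, peak
    return None, peak


if __name__ == "__main__":
    # Synthetic data: one sample per second with a spike at t = 1234 s.
    data = [(float(t), 1.0 + (t % 7) * 0.1) for t in range(2000)]
    data[1234] = (1234.0, 9.5)
    # Made-up phase boundaries, for illustration only.
    build_phases = [
        ("parsing", 0.0, 1000.0),
        ("vocabulary merge", 1000.0, 1500.0),
        ("permutations", 1500.0, 2000.0),
    ]
    drawn = downsample_keep_max(data, max_points=500)
    name, (t, rss) = peak_phase(drawn, build_phases)
    print(f"{len(drawn)} points drawn, peak {rss} GB at {t} s in phase {name!r}")
```

Taking the maximum per bucket rather than every k-th sample means a short spike between two drawn points still shows up in the plot, at the cost of slightly overstating typical usage inside each bucket.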