
Save frequencies when bulk-saving of times is enabled #322

Open: oleksandr-pavlyk wants to merge 6 commits into NVIDIA:main from oleksandr-pavlyk:feature/save-frequencies

@oleksandr-pavlyk (Collaborator)

Closes #318

This PR:

  • applies changes started by @gevtushenko to collect frequency estimates in bulk and save them alongside the bulk time-stamps
  • uses the same batched `writeout_data` utility to save both duration and frequency data, which makes writing them faster
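To make the second bullet concrete, here is a minimal sketch of a batched binary write-out applied to both durations and frequencies. It assumes samples are stored as `float` in `std::vector` and written as raw native-endian bytes; `write_batched`, `save_bulk`, the batch size, and the `_times.bin` / `_freqs.bin` suffixes are hypothetical and do not reproduce nvbench's `writeout_data`.

```cpp
// Hypothetical sketch: batched write-out of bulk samples to raw binary files.
// `write_batched` and `save_bulk` are illustrative names, not nvbench's API.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Writes `values` to `path` as native-endian floats, `batch` elements per
// write call. Returns the number of bytes successfully written.
std::size_t write_batched(const std::string &path,
                          const std::vector<float> &values,
                          std::size_t batch = 4096)
{
  assert(batch > 0);
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  std::size_t written = 0;
  for (std::size_t i = 0; i < values.size() && out; i += batch)
  {
    const std::size_t n     = std::min(batch, values.size() - i);
    const std::size_t bytes = n * sizeof(float);
    out.write(reinterpret_cast<const char *>(values.data() + i),
              static_cast<std::streamsize>(bytes));
    if (out)
    {
      written += bytes;
    }
  }
  return written;
}

// One duration and one frequency per sample, written with the same helper,
// so the two files end up with the same size.
void save_bulk(const std::string &prefix,
               const std::vector<float> &durations,
               const std::vector<float> &frequencies)
{
  assert(durations.size() == frequencies.size());
  write_batched(prefix + "_times.bin", durations);
  write_batched(prefix + "_freqs.bin", frequencies);
}
```

Sharing one writer for both kinds of data is what keeps the two files in lockstep: any change to batching or encoding applies to durations and frequencies alike.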

@oleksandr-pavlyk (Collaborator, Author) commented Feb 13, 2026

Questions to be answered:

  • Should dumping of frequencies require a CLI option separate from the one that controls dumping of time-stamps?
    • Frequencies are saved when throttling control is enabled.
      Q: what happens when throttling control is turned off and bulk data is requested?

@oleksandr-pavlyk (Collaborator, Author) commented Feb 18, 2026

Setting `--throttle-threshold 0` disables collection of GPU frequency data:

```console
$ rg 'm_check_throttling\(' ~/repos/nvbench/nvbench/
~/repos/nvbench/nvbench/detail/measure_cold.cu
47:    , m_check_throttling(!exec_state.get_run_once() && exec_state.get_throttle_threshold() > 0.f)
```
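The quoted initializer can be read as the following stand-alone predicate. This is an illustrative sketch: the free function `check_throttling` and the sample threshold values are hypothetical, while the boolean expression is the one shown in `measure_cold.cu`.

```cpp
// Hypothetical stand-alone version of the condition quoted from
// measure_cold.cu; nvbench evaluates it in a constructor initializer list.
constexpr bool check_throttling(bool run_once, float throttle_threshold)
{
  // Throttling checks run only outside run-once mode and only for a
  // strictly positive threshold.
  return !run_once && throttle_threshold > 0.f;
}

// A zero threshold switches the checks off, which at the time of this
// comment also switched off collection of GPU frequency samples.
static_assert(!check_throttling(false, 0.0f));
static_assert(check_throttling(false, 0.75f));
static_assert(!check_throttling(true, 0.75f));
```

Because the comparison is strict (`> 0.f`), passing `0` or `0.0` is enough to disable the checks; no separate flag is involved.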

I ran the benchmark twice:

```console
./nvbench.example.cpp17.axes -d 0 --jsonbin axes-d0 --stopping-criterion entropy
./nvbench.example.cpp17.axes -d 0 --jsonbin axes-d1 --throttle-threshold 0.0 --stopping-criterion entropy
```

The second command, which passes `--throttle-threshold 0.0`, thus exercises the configuration flagged as possibly problematic in the earlier comment, and it produces empty BIN files for the frequency data.

Sizes of BIN files for each run

Inspecting the JSON file for the run with the default throttling threshold shows that GPU frequencies are collected and non-empty binary files are generated:

```console
$ jq '.benchmarks[].states[] | select(.is_skipped == false ) | .summaries[] | select(has("hint")) | select(.hint | test("file/sample_freqs")) | .data[1].value' axes-d0 | uniq -c | awk '
  { buf[NR]=$0 }
  END {
    if (NR > 7) {
      for (i=1;i<=3;i++) print buf[i]
      printf "... (%d lines omitted) ...\n", NR-6
      for (i=NR-2;i<=NR;i++) print buf[i]
    } else {
      for (i=1;i<=NR;i++) print buf[i]
    }
  }'
      1 "312"
      1 "304"
      1 "204"
... (74 lines omitted) ...
      1 "320"
      1 "310"
      1 "344"
```
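The `awk` program above only shortens the `uniq -c` output (which collapses adjacent identical lines into one line prefixed by a count). For readers less familiar with `awk`, here is the same logic as a hypothetical C++ helper, `elide_middle`: with more than 7 lines it keeps the first 3 and the last 3 and reports how many were dropped. With 80 input lines, 80 - 6 = 74 are omitted, matching the output above.

```cpp
// Hypothetical C++ rendering of the awk filter; not part of nvbench.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> elide_middle(const std::vector<std::string> &lines)
{
  const std::size_t n = lines.size();
  if (n <= 7)
  {
    return lines;  // short outputs are printed unchanged (awk's else-branch)
  }
  // First 3 lines, then a marker for the NR-6 omitted lines, then the last 3.
  std::vector<std::string> out(lines.begin(), lines.begin() + 3);
  out.push_back("... (" + std::to_string(n - 6) + " lines omitted) ...");
  out.insert(out.end(), lines.end() - 3, lines.end());
  assert(out.size() == 7);
  return out;
}
```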

Inspecting the JSON file for the run with a zero throttling threshold, which turns off collection of GPU frequency data, shows that the sample-frequency binary files are all empty:

```console
$ jq '.benchmarks[].states[] | select(.is_skipped == false ) | .summaries[] | select(has("hint")) | select(.hint | test("file/sample_freqs")) | .data[1].value' axes-d1 | uniq -c | awk '
  { buf[NR]=$0 }
  END {
    if (NR > 7) {
      for (i=1;i<=3;i++) print buf[i]
      printf "... (%d lines omitted) ...\n", NR-6
      for (i=NR-2;i<=NR;i++) print buf[i]
    } else {
      for (i=1;i<=NR;i++) print buf[i]
    }
  }'
     80 "0"
```

This could be used to save data as `float32_t` or `float64_t`, a flexibility that is useful for experimentation.
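A minimal sketch of what choosing the element type could look like, assuming the samples are held as `double` in memory and the on-disk precision is a template parameter. `encode_as` and `write_bytes` are hypothetical helpers, not nvbench's `writeout_data`; `float` and `double` stand in for `float32_t` and `float64_t`.

```cpp
// Hypothetical sketch: on-disk element type selected by a template parameter.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <ostream>
#include <type_traits>
#include <vector>

// Converts each sample to OutT and packs the results into a byte buffer.
template <typename OutT, typename InT>
std::vector<char> encode_as(const std::vector<InT> &values)
{
  static_assert(std::is_floating_point_v<OutT>, "OutT must be a floating-point type");
  std::vector<char> bytes(values.size() * sizeof(OutT));
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    const OutT v = static_cast<OutT>(values[i]);
    std::memcpy(bytes.data() + i * sizeof(OutT), &v, sizeof(OutT));
  }
  return bytes;
}

// Writes the packed buffer in one call; returns false if the stream failed.
bool write_bytes(std::ostream &out, const std::vector<char> &bytes)
{
  out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
  return static_cast<bool>(out);
}
```

The same three samples take 12 bytes as `float` and 24 bytes as `double`, so the precision choice trades file size against accuracy without touching the rest of the write path.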
This places all `std::vector` members together. Added default initialization to all `std::vector` members and to all other members that have default constructors.

The exceptions are references and the `nvbench::launch m_launch;` member.
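The member layout described in this commit message might look like the following sketch. The class, its members, and `state_stub` are hypothetical stand-ins rather than nvbench's actual `measure_cold` members; the point is that `std::vector` members are grouped and default-initialized in-class, while a reference member has to be bound in the constructor's initializer list.

```cpp
// Hypothetical sketch of the layout; names are illustrative, not nvbench's.
#include <cassert>
#include <cstddef>
#include <vector>

struct state_stub
{
  float throttle_threshold = 0.75f;
};

class measure_sketch
{
public:
  explicit measure_sketch(const state_stub &state)
      : m_state{state}  // references must be bound here
  {}

  void add_sample(float duration, float frequency)
  {
    m_durations.push_back(duration);
    m_frequencies.push_back(frequency);
    m_total_time += duration;
    assert(m_durations.size() == m_frequencies.size());
  }

  [[nodiscard]] std::size_t num_samples() const { return m_durations.size(); }
  [[nodiscard]] float total_time() const { return m_total_time; }
  [[nodiscard]] float threshold() const { return m_state.throttle_threshold; }

private:
  // Exception: a reference member has no meaningful default initializer.
  const state_stub &m_state;

  // All std::vector members grouped together, each default-initialized (empty).
  std::vector<float> m_durations{};
  std::vector<float> m_frequencies{};

  // Other default-constructible members are value-initialized in-class too.
  float m_total_time{};
};
```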

@oleksandr-pavlyk (Collaborator, Author)

After discussing the issue internally, we concluded that GPU frequencies should always be collected, even when the throttling threshold is set to zero.

@oleksandr-pavlyk (Collaborator, Author)

**Q**: what happens when throttling control is turned off and bulk data is requested?

Now that GPU frequency bulk data are collected for any throttling threshold, using `--jsonbin` always produces frequency data of the same size as the kernel execution duration data.
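A minimal sketch of the resulting behavior, under stated assumptions: one frequency sample is pushed for every duration sample whatever the threshold, and the threshold only gates throttle detection. `bulk_samples`, `record_sample`, and the detection rule (clock rate below `threshold * peak_clock_rate`) are hypothetical and not taken from nvbench.

```cpp
// Hypothetical sketch: frequencies recorded unconditionally, throttle
// detection still gated on a positive threshold. Not nvbench's code.
#include <cassert>
#include <cstddef>
#include <vector>

struct bulk_samples
{
  std::vector<float> durations{};    // one entry per timed sample
  std::vector<float> frequencies{};  // one entry per timed sample, always
  std::size_t throttled{};           // samples flagged as throttled
};

// Assumed detection rule: a sample counts as throttled when the observed
// clock rate falls below `throttle_threshold * peak_clock_rate`.
void record_sample(bulk_samples &s,
                   float duration,
                   float clock_rate,
                   float peak_clock_rate,
                   float throttle_threshold)
{
  s.durations.push_back(duration);
  s.frequencies.push_back(clock_rate);  // collected for every threshold, including 0

  if (throttle_threshold > 0.f && clock_rate < throttle_threshold * peak_clock_rate)
  {
    ++s.throttled;
  }
  assert(s.durations.size() == s.frequencies.size());
}
```

Decoupling collection from detection is what guarantees equal-sized duration and frequency files for `--jsonbin`, independent of `--throttle-threshold`.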

…tive

The `measure_cold` class now directly inherits `m_check_throttling` from the state.
This ensures that, when `--jsonbin` is specified, frequency data corresponding
to the timing data are available to write out.

@oleksandr-pavlyk marked this pull request as ready for review on February 20, 2026 at 22:38


Development

Successfully merging this pull request may close these issues.

Add option to bulk store frequences

2 participants