
SparseStorage, concurrent_flat_map, and SparseMatrixAtomic#43

Open
bendavid wants to merge 19 commits into main from calibrationdev

Conversation


@bendavid bendavid commented Apr 9, 2026

Adds a sparse storage backend for HistoBoost and the supporting
data structures (lock-free concurrent map and SparseMatrixAtomic).

Bottom-up commit list:

  • add python script for tests
  • Add SymMatrixAtomic
  • minor improvement for SymMatrixAtomic and add initial version of SparseMatrixAtomic
  • fix deprecated storage_type access
  • fix constness
  • make wrapper more flexible/robust
  • flexible column types for quantile helpers
  • add missing include
  • make range_to more flexible
  • add lock-free insert-only concurrent_flat_map
  • add SparseMatrixAtomic test driver
  • SparseMatrixAtomic: switch to narf::concurrent_flat_map
  • concurrent_flat_map: add move constructor and assignment
  • HistoBoost: add SparseStorage option backed by concurrent_flat_map
  • HistoBoost SparseStorage: convert result to wums.SparseHist
  • concurrent_flat_map: serialize segment growth via sentinel
  • SparseStorage: fix ND linearization mismatch with SparseHist
  • SparseMatrixAtomic: configurable fill_fraction
  • HistShiftHelper: guard against non-finite bin geometry

This series is the narf side of the larger sparse-input rework; the
companion rabbit and wums PRs build on it:

WMass/rabbit#129
WMass/wums#25

bendavid and others added 19 commits April 3, 2026 21:15
A segmented open-addressing hash map for integer keys supporting
concurrent lock-free find / insert / emplace / expansion. State bits
are encoded in the two MSBs of each slot's key. Includes tests
covering single-threaded correctness, pointer stability across
expansion, and multi-threaded concurrent insert/find, plus a test
for SparseMatrixAtomic that exercises its public API under
concurrent fetch_add.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
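The two-MSB state encoding could look roughly like this. A minimal sketch
with illustrative names (kStateShift, SlotState, encode/state_of/key_of are
hypothetical, not the actual narf::concurrent_flat_map API), assuming 64-bit
keys with the top two bits reserved for slot state:

```cpp
// Sketch: pack slot state into the two MSBs of a 64-bit key slot,
// leaving 62 bits of usable key. All names are hypothetical.
#include <cstdint>

constexpr std::uint64_t kStateShift = 62;                          // two MSBs for state
constexpr std::uint64_t kKeyMask = (std::uint64_t{1} << kStateShift) - 1;

enum class SlotState : std::uint64_t { Empty = 0, Reserved = 1, Occupied = 2 };

constexpr std::uint64_t encode(SlotState s, std::uint64_t key) {
    return (static_cast<std::uint64_t>(s) << kStateShift) | (key & kKeyMask);
}
constexpr SlotState state_of(std::uint64_t slot) {
    return static_cast<SlotState>(slot >> kStateShift);
}
constexpr std::uint64_t key_of(std::uint64_t slot) {
    return slot & kKeyMask;
}
```

A zero-initialized slot naturally decodes as Empty, which is convenient for
zero-filled freshly allocated segments.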
Replaces tbb::concurrent_unordered_map with the new lock-free
insert-only flat map, removing the FIXME about lock contention on
inserts. reserve() becomes a no-op since the new map grows on
demand.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Required so the map can live as a member of other movable types
(e.g. a boost::histogram storage class). The moved-from object is
left in a destroy-only state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds narf::concurrent_sparse_storage, a boost::histogram Storage type
backed by narf::concurrent_flat_map with has_threading_support = true,
plus a make_histogram_sparse factory and python-friendly snapshot
helpers (boost::histogram does not expose its storage_ member to
cppyy directly).

HistoBoost gains a SparseStorage marker class taking an estimated
fill_fraction (default 0.1) used to pre-size the underlying map and
avoid most on-the-fly expansions. Tensor weights are not supported in
this mode and conversion to a python hist.Hist is skipped; the raw
RResultPtr is returned. Includes an end-to-end RDataFrame test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SparseStorage path now lazily converts the underlying C++
sparse histogram to a wums.sparse_hist.SparseHist on first
dereference, snapshotting the concurrent_flat_map into flat
indices/values that match the with-flow row-major layout.
Pass convert_to_hist=False to get the raw RResultPtr instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously every thread that observed a saturated tail segment
speculatively allocated a doubled-size successor and then either
won the CAS or freed it. Under high thread contention this caused
a transient memory spike of M_threads * segment_size per growth
event, easily inflating peak RSS by an order of magnitude for
multi-GB segments and potentially fragmenting the address space.

ensure_next now CAS-publishes a "growing" sentinel into the
segment's next pointer before allocating; only the winning thread
performs the allocation while losers yield-spin until the real
successor is published. All segment walks use a new observed_next
helper that treats the sentinel as "no successor yet".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
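The scheme described above can be sketched as follows. This is an
illustrative reduction, not the real narf code: Segment, growing_sentinel,
observed_next, and ensure_next are hypothetical names, and the real map's
segment layout is richer than a capacity plus a next pointer:

```cpp
// Sketch: sentinel-guarded segment growth. Only the thread that wins the
// CAS on `next` allocates the doubled successor; losers yield-spin until
// the real pointer is published.
#include <atomic>
#include <cstddef>
#include <thread>

struct Segment {
    std::size_t capacity;
    std::atomic<Segment*> next{nullptr};
};

// Distinguished non-null address meaning "allocation in progress".
inline Segment* growing_sentinel() {
    static Segment dummy{0};
    return &dummy;
}

// Segment walks treat the sentinel as "no successor yet".
Segment* observed_next(Segment* seg) {
    Segment* n = seg->next.load(std::memory_order_acquire);
    return n == growing_sentinel() ? nullptr : n;
}

Segment* ensure_next(Segment* seg) {
    Segment* expected = nullptr;
    if (seg->next.compare_exchange_strong(expected, growing_sentinel(),
                                          std::memory_order_acq_rel)) {
        // Winner: allocate exactly one doubled successor, then publish it.
        Segment* succ = new Segment{seg->capacity * 2};
        seg->next.store(succ, std::memory_order_release);
        return succ;
    }
    // Loser: wait for the winner to replace the sentinel.
    while ((expected = seg->next.load(std::memory_order_acquire)) ==
           growing_sentinel()) {
        std::this_thread::yield();
    }
    return expected;
}
```

This bounds the per-growth allocation to one segment regardless of thread
count, which is the peak-RSS fix the commit message describes.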
boost::histogram linearizes axes column-major (leftmost axis has
stride 1), but wums.SparseHist expects numpy row-major flat
indices. For ND histograms this caused entries to land in the
wrong bins (often flow bins) and silently disappear from
toarray(flow=False); 1D was unaffected and so the existing test
did not catch it.

The conversion now un-ravels each boost-linear key under F order
and re-ravels under C order before constructing the SparseHist.
Adds a 3D test that cross-checks against a dense HistoBoost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
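The index remapping amounts to un-raveling under F order (axis 0 fastest)
and re-raveling under C order (last axis fastest). A self-contained sketch,
assuming `sizes` are per-axis extents including flow bins (the helper name
f_to_c_index is hypothetical):

```cpp
// Sketch: convert a column-major (F-order) linear index, where the
// leftmost axis has stride 1 as in boost::histogram, to the row-major
// (C-order) flat index that numpy/wums.SparseHist expects.
#include <array>
#include <cstddef>

template <std::size_t N>
std::size_t f_to_c_index(std::size_t fidx,
                         const std::array<std::size_t, N>& sizes) {
    std::array<std::size_t, N> coords{};
    // Un-ravel under F order: axis 0 varies fastest.
    for (std::size_t i = 0; i < N; ++i) {
        coords[i] = fidx % sizes[i];
        fidx /= sizes[i];
    }
    // Re-ravel under C order: last axis varies fastest.
    std::size_t cidx = 0;
    for (std::size_t i = 0; i < N; ++i) {
        cidx = cidx * sizes[i] + coords[i];
    }
    return cidx;
}
```

For 1D the two orders coincide, which is why the original 1D test could not
catch the mismatch.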
Replace the hard-coded size0*size1/40 initial capacity with a
fill_fraction constructor argument (default 0.025 to match the
previous behaviour) that sizes the underlying concurrent_flat_map
to fill_fraction * size0 * size1 entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
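As a worked check of the stated default: 0.025 = 1/40, so the new parameter
reproduces the old hard-coded pre-sizing. A minimal sketch (the helper name
initial_capacity is illustrative, not the actual constructor):

```cpp
// Sketch: fill_fraction-based pre-sizing of the underlying map.
// The default 0.025 matches the previous size0 * size1 / 40 behaviour.
#include <cstddef>

std::size_t initial_capacity(std::size_t size0, std::size_t size1,
                             double fill_fraction = 0.025) {
    return static_cast<std::size_t>(
        fill_fraction * static_cast<double>(size0) * static_cast<double>(size1));
}
```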
Treat continuous-axis bins with infinite width or center as flow bins
and return zero correction, preventing NaN propagation when an axis
uses np.inf as a bin edge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
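The guard can be sketched as below. This is a hypothetical reduction: the
actual HistShiftHelper correction formula is not shown here, so a placeholder
stands in for it, and shift_correction is an illustrative name:

```cpp
// Sketch: treat bins with non-finite width or center (e.g. an edge at
// +/-inf) as flow bins and return zero correction instead of a NaN.
#include <cmath>

double shift_correction(double lo_edge, double hi_edge) {
    const double width = hi_edge - lo_edge;
    const double center = 0.5 * (lo_edge + hi_edge);
    if (!std::isfinite(width) || !std::isfinite(center)) {
        return 0.0;  // flow-like bin: no correction, no NaN propagation
    }
    return width;  // placeholder for the actual correction formula
}
```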
