
SparseStorage, concurrent_flat_map, and SparseMatrixAtomic#43

Open
bendavid wants to merge 19 commits into main from calibrationdev

Conversation


@bendavid bendavid commented Apr 9, 2026

Adds a sparse storage backend for HistoBoost and the supporting
data structures (lock-free concurrent map and SparseMatrixAtomic).

Bottom-up commit list:

  • add python script for tests
  • Add SymMatrixAtomic
  • minor improvement for SymMatrixAtomic and add initial version of SparseMatrixAtomic
  • fix deprecated storage_type access
  • fix constness
  • make wrapper more flexible/robust
  • flexible column types for quantile helpers
  • add missing include
  • make range_to more flexible
  • add lock-free insert-only concurrent_flat_map
  • add SparseMatrixAtomic test driver
  • SparseMatrixAtomic: switch to narf::concurrent_flat_map
  • concurrent_flat_map: add move constructor and assignment
  • HistoBoost: add SparseStorage option backed by concurrent_flat_map
  • HistoBoost SparseStorage: convert result to wums.SparseHist
  • concurrent_flat_map: serialize segment growth via sentinel
  • SparseStorage: fix ND linearization mismatch with SparseHist
  • SparseMatrixAtomic: configurable fill_fraction
  • HistShiftHelper: guard against non-finite bin geometry

This series is the narf side of the larger sparse-input rework; the
companion rabbit and wums PRs build on it:

WMass/rabbit#129
WMass/wums#25

bendavid and others added 19 commits April 3, 2026 21:15
A segmented open-addressing hash map for integer keys supporting
concurrent lock-free find / insert / emplace / expansion. State bits
are encoded in the two MSBs of each slot's key. Includes tests
covering single-threaded correctness, pointer stability across
expansion, and multi-threaded concurrent insert/find, plus a test
for SparseMatrixAtomic that exercises its public API under
concurrent fetch_add.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
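The two-MSB state encoding could look roughly like this. A minimal sketch
with illustrative names (kStateShift, SlotState, encode/state_of/key_of are
hypothetical, not the actual narf::concurrent_flat_map API), assuming 64-bit
keys with the top two bits reserved for slot state:

```cpp
// Sketch: pack slot state into the two MSBs of a 64-bit key slot,
// leaving 62 bits of usable key. All names are hypothetical.
#include <cstdint>

constexpr std::uint64_t kStateShift = 62;                          // two MSBs for state
constexpr std::uint64_t kKeyMask = (std::uint64_t{1} << kStateShift) - 1;

enum class SlotState : std::uint64_t { Empty = 0, Reserved = 1, Occupied = 2 };

constexpr std::uint64_t encode(SlotState s, std::uint64_t key) {
    return (static_cast<std::uint64_t>(s) << kStateShift) | (key & kKeyMask);
}
constexpr SlotState state_of(std::uint64_t slot) {
    return static_cast<SlotState>(slot >> kStateShift);
}
constexpr std::uint64_t key_of(std::uint64_t slot) {
    return slot & kKeyMask;
}
```

A zero-initialized slot naturally decodes as Empty, which is convenient for
zero-filled freshly allocated segments.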
Replaces tbb::concurrent_unordered_map with the new lock-free
insert-only flat map, removing the FIXME about lock contention on
inserts. reserve() becomes a no-op since the new map grows on
demand.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Required so the map can live as a member of other movable types
(e.g. a boost::histogram storage class). The moved-from object is
left in a destroy-only state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds narf::concurrent_sparse_storage, a boost::histogram Storage type
backed by narf::concurrent_flat_map with has_threading_support = true,
plus a make_histogram_sparse factory and python-friendly snapshot
helpers (boost::histogram does not expose its storage_ member to
cppyy directly).

HistoBoost gains a SparseStorage marker class taking an estimated
fill_fraction (default 0.1) used to pre-size the underlying map and
avoid most on-the-fly expansions. Tensor weights are not supported in
this mode and conversion to a python hist.Hist is skipped; the raw
RResultPtr is returned. Includes an end-to-end RDataFrame test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SparseStorage path now lazily converts the underlying C++
sparse histogram to a wums.sparse_hist.SparseHist on first
dereference, snapshotting the concurrent_flat_map into flat
indices/values that match the with-flow row-major layout.
Pass convert_to_hist=False to get the raw RResultPtr instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously every thread that observed a saturated tail segment
speculatively allocated a doubled-size successor and then either
won the CAS or freed it. Under high thread contention this caused
a transient memory spike of M_threads * segment_size per growth
event, easily inflating peak RSS by an order of magnitude for
multi-GB segments and potentially fragmenting the address space.

ensure_next now CAS-publishes a "growing" sentinel into the
segment's next pointer before allocating; only the winning thread
performs the allocation while losers yield-spin until the real
successor is published. All segment walks use a new observed_next
helper that treats the sentinel as "no successor yet".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
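The scheme described above can be sketched as follows. This is an
illustrative reduction, not the real narf code: Segment, growing_sentinel,
observed_next, and ensure_next are hypothetical names, and the real map's
segment layout is richer than a capacity plus a next pointer:

```cpp
// Sketch: sentinel-guarded segment growth. Only the thread that wins the
// CAS on `next` allocates the doubled successor; losers yield-spin until
// the real pointer is published.
#include <atomic>
#include <cstddef>
#include <thread>

struct Segment {
    std::size_t capacity;
    std::atomic<Segment*> next{nullptr};
};

// Distinguished non-null address meaning "allocation in progress".
inline Segment* growing_sentinel() {
    static Segment dummy{0};
    return &dummy;
}

// Segment walks treat the sentinel as "no successor yet".
Segment* observed_next(Segment* seg) {
    Segment* n = seg->next.load(std::memory_order_acquire);
    return n == growing_sentinel() ? nullptr : n;
}

Segment* ensure_next(Segment* seg) {
    Segment* expected = nullptr;
    if (seg->next.compare_exchange_strong(expected, growing_sentinel(),
                                          std::memory_order_acq_rel)) {
        // Winner: allocate exactly one doubled successor, then publish it.
        Segment* succ = new Segment{seg->capacity * 2};
        seg->next.store(succ, std::memory_order_release);
        return succ;
    }
    // Loser: wait for the winner to replace the sentinel.
    while ((expected = seg->next.load(std::memory_order_acquire)) ==
           growing_sentinel()) {
        std::this_thread::yield();
    }
    return expected;
}
```

This bounds the per-growth allocation to one segment regardless of thread
count, which is the peak-RSS fix the commit message describes.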
boost::histogram linearizes axes column-major (leftmost axis has
stride 1), but wums.SparseHist expects numpy row-major flat
indices. For ND histograms this caused entries to land in the
wrong bins (often flow bins) and silently disappear from
toarray(flow=False); 1D was unaffected and so the existing test
did not catch it.

The conversion now un-ravels each boost-linear key under F order
and re-ravels under C order before constructing the SparseHist.
Adds a 3D test that cross-checks against a dense HistoBoost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
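The index remapping amounts to un-raveling under F order (axis 0 fastest)
and re-raveling under C order (last axis fastest). A self-contained sketch,
assuming `sizes` are per-axis extents including flow bins (the helper name
f_to_c_index is hypothetical):

```cpp
// Sketch: convert a column-major (F-order) linear index, where the
// leftmost axis has stride 1 as in boost::histogram, to the row-major
// (C-order) flat index that numpy/wums.SparseHist expects.
#include <array>
#include <cstddef>

template <std::size_t N>
std::size_t f_to_c_index(std::size_t fidx,
                         const std::array<std::size_t, N>& sizes) {
    std::array<std::size_t, N> coords{};
    // Un-ravel under F order: axis 0 varies fastest.
    for (std::size_t i = 0; i < N; ++i) {
        coords[i] = fidx % sizes[i];
        fidx /= sizes[i];
    }
    // Re-ravel under C order: last axis varies fastest.
    std::size_t cidx = 0;
    for (std::size_t i = 0; i < N; ++i) {
        cidx = cidx * sizes[i] + coords[i];
    }
    return cidx;
}
```

For 1D the two orders coincide, which is why the original 1D test could not
catch the mismatch.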
Replace the hard-coded size0*size1/40 initial capacity with a
fill_fraction constructor argument (default 0.025 to match the
previous behaviour) that sizes the underlying concurrent_flat_map
to fill_fraction * size0 * size1 entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
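As a worked check of the stated default: 0.025 = 1/40, so the new parameter
reproduces the old hard-coded pre-sizing. A minimal sketch (the helper name
initial_capacity is illustrative, not the actual constructor):

```cpp
// Sketch: fill_fraction-based pre-sizing of the underlying map.
// The default 0.025 matches the previous size0 * size1 / 40 behaviour.
#include <cstddef>

std::size_t initial_capacity(std::size_t size0, std::size_t size1,
                             double fill_fraction = 0.025) {
    return static_cast<std::size_t>(
        fill_fraction * static_cast<double>(size0) * static_cast<double>(size1));
}
```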
Treat continuous-axis bins with infinite width or center as flow bins
and return zero correction, preventing NaN propagation when an axis
uses np.inf as a bin edge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
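The guard can be sketched as below. This is a hypothetical reduction: the
actual HistShiftHelper correction formula is not shown here, so a placeholder
stands in for it, and shift_correction is an illustrative name:

```cpp
// Sketch: treat bins with non-finite width or center (e.g. an edge at
// +/-inf) as flow bins and return zero correction instead of a NaN.
#include <cmath>

double shift_correction(double lo_edge, double hi_edge) {
    const double width = hi_edge - lo_edge;
    const double center = 0.5 * (lo_edge + hi_edge);
    if (!std::isfinite(width) || !std::isfinite(center)) {
        return 0.0;  // flow-like bin: no correction, no NaN propagation
    }
    return width;  // placeholder for the actual correction formula
}
```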
