Upstream 3pick by fshhr46 · Pull Request #336 · NVIDIA/recsys-examples

fshhr46 · 2026-03-27T06:55:50Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

* Refactor dynamicemb with Cache&Storage.
* Add score support and sync of event and stream in prefetch
* Cache&Storage C++ codes
* Restore optimizer.py
* Format dynamicemb's code.
* Pass compile.
* Support eval mode in Cache&Storage
* Cache metrics.
* Test forward&backward w/wt cache/eval in BatchedDynamicEmbeddingTablesV2
* update HKV
* Test prefetch and flush.
* Test external PS.
* Benchmark Cache&Storage
* Update benchmark results on EOS
* Fix unit test script
* Add load API for Storage
* Fix memory consumption calculation
* Fix memory consumption and copyright

* Admission counter table interface.
* Counter table implementation
* Unit test and Fix IMA of table operations
* Unit test of table.dump and table.load
* Add table operation unit tests to CI
* Unit test of table.dump&load when num_gpu mismatched.
* Unit test to table.insert_and_evict.
* Add todo to unlock using index.
* Remove kvcounter in dynamicemb_table_v2
* Refine unit test of load and dump APIs of ScoredHashTable
* Fix potential issues and rigorously test the score in test_embedding_dump_load.py

* Add gradient clipping in dynamicemb
* Fix potential capacity mismatch issue in incremental_dump

* Draft usage of KVCounter.
* Add FrequencyAdmissionStrategy and AdmissionStrategy class.
* Add storage only admission in training (need to test).
* Add storage only admission in training, step 2.
* Add cache and storage admission in training (need to test).
* Pass Admission Counter to KeyValueTable lookup.
* Add has admission flag for dedup part in input dist.
* Add test for embedding admission and fix bug in lookup.
* Fix cache frequency bug.
* Fix some bugs.
* Rebase Counter table and fix some comment's issues.
* Move admit strategy class to embedding_admission.py.
* Rebase Counter table and move counter init outside tableoptions.
* Fix some bugs.
* Dump and load correct counter files in dynamic_table_v2
* Unit test of counter table's checkpoint.
* Decoupling lookup and admission.
* Decoupling training and insert.
* Do admission before initializer.
* Add DynamicEmbInitializerArgs for admit strategy.
* Add Initializer for non-admit embs.
* Fix circular dependency about initializer class.
* Update document about embedding admission
* Add admission options to example.py.
* Add comment for admission threshold.
* Fix segmented unique rebase bugs.
* Fix segmented unique rebase bugs, step 2.
* Fix duplicated check to counter keys in test_embedding_dump_load.py
* Fix test and format codes
* Move create_initializer to initializer.py to unify the creation logic.
* Add score_strategy and admit strategy into get_grouped_key of DynamicEmbTableOptions.
* Fix admission test assertion for multi gpus.
* Integrated initialize_non_admitted_embeddings.
* Pass admit strategy and evict strategy from table to function.
* Fix bugs.
* Fix bugs.
* Remove some comments.

---------

Co-authored-by: Jiashu Yao <jiashu.yao.cn@gmail.com>
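
As a rough illustration of the frequency-based admission idea named in the commits above, the sketch below counts key occurrences and admits a key only once its count reaches a threshold. The class name `FrequencyAdmissionSketch`, its `admit` method, and the `>=` threshold semantics are assumptions made for illustration; this is not the dynamicemb `FrequencyAdmissionStrategy` API.

```python
from collections import Counter
from typing import Iterable, List


class FrequencyAdmissionSketch:
    """Hypothetical stand-in, not the dynamicemb FrequencyAdmissionStrategy."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be >= 1")
        self.threshold = threshold
        # A plain Counter plays the role of the admission counter table.
        self.counter: Counter = Counter()

    def admit(self, keys: Iterable[int]) -> List[bool]:
        """Count every key in the batch, then return one admit flag per key."""
        keys = list(keys)
        self.counter.update(keys)
        return [self.counter[k] >= self.threshold for k in keys]


strategy = FrequencyAdmissionSketch(threshold=2)
first = strategy.admit([10, 11, 10])   # key 10 appears twice in this batch
second = strategy.admit([11, 12])      # key 11 reaches the threshold here
```

The commit list mentions a separate initializer for non-admitted embeddings; that part is not modeled in this sketch.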

Required by f5b608e C++ sources which use find_and_update and other APIs added in 9c197a9c558d1e8285c2e50c1974f0f102826f11. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace all src/ .cu/.h/.cuh files with their f5b608e versions to ensure consistency with the cherry-picked Python code. Key additions: find_pointers_with_scores, insert_and_evict_with_scores, find_and_initialize bindings in dynamic_emb_op.cu; updated hkv_variable.cuh/h with new virtual method overrides; all 18 hkv_variable_instantiations regenerated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

f5b608e sources use std::optional in dynamic_variable_base.h and hkv_variable.h. Without C++17, nvcc and g++ both fail to resolve std::optional, causing cascading override errors. Also track build.sh. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace all dynamicemb/*.py and shard/planner files with f5b608e versions to match the C++ extension bindings. Key changes:

- batched_dynamicemb_function.py: drop lookup_*_dense imports (not in our extension); use lookup_forward/backward + find_and_initialize
- shard/embedding.py: add DynamicEmbeddingCollectionContext class
- dump_load.py: re-export DynamicEmbInitializerArgs/Mode, DynamicEmbScoreStrategy, DynamicEmbTableOptions for backward compatibility with cherry-picked tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Machine has 2 GPUs; all test scripts hardcoded 4. Changed NUM_GPUS and --nproc_per_node to 2 in all affected scripts. Restored test_lfu_scores.sh from f5b608e (was missing from cherry-pick). Replaced test_embedding_dump_load.py with f5b608e version to fix missing imports (click, typing, record decorator). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Document all build errors encountered (std::optional, submodule, override errors), Python alignment issues, and test fixes applied during the build-install-test loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All test shell scripts now read DYNAMICEMB_NUM_GPUS (default 2) to set --nproc_per_node, replacing hardcoded values of 2, 4, or 8. Scripts that run multiple torchrun calls concurrently with test_embedding_dump_load also read DYNAMICEMB_MASTER_PORT (per-script defaults 29601–29604) so they can run in parallel without competing on port 29500. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
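
The environment-variable scheme described above can be sketched in Python as follows. The actual scripts are shell scripts; the helper name `torchrun_args` and its parameters are hypothetical, and only the variable names, the default of 2 GPUs, and the per-script port defaults come from the commit message.

```python
import os
from typing import List, Mapping, Optional


def torchrun_args(script: str, default_port: int,
                  env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a torchrun argv from DYNAMICEMB_NUM_GPUS and DYNAMICEMB_MASTER_PORT."""
    env = os.environ if env is None else env
    # Fall back to 2 GPUs when the variable is unset.
    num_gpus = int(env.get("DYNAMICEMB_NUM_GPUS", "2"))
    # Each script supplies its own default port so parallel runs do not collide.
    port = int(env.get("DYNAMICEMB_MASTER_PORT", str(default_port)))
    return [
        "torchrun",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={port}",
        script,
    ]


default_cmd = torchrun_args("test_embedding_dump_load.py", 29601, env={})
custom_cmd = torchrun_args(
    "test_embedding_dump_load.py",
    29601,
    env={"DYNAMICEMB_NUM_GPUS": "4", "DYNAMICEMB_MASTER_PORT": "29700"},
)
```

Passing `env` explicitly keeps the helper easy to test; with `env=None` it reads the process environment.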

f5b608e renamed BatchedDynamicEmbeddingTables to BatchedDynamicEmbeddingTablesV2. Add an alias at module level so existing code importing the old name continues to work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
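
A module-level alias of this kind is typically a single assignment. The sketch below uses an empty stand-in class rather than the real dynamicemb implementation, so the class body is an assumption; only the two names come from the commit message.

```python
class BatchedDynamicEmbeddingTablesV2:
    """Empty stand-in for the renamed class; the real one lives in dynamicemb."""


# Backward-compatible alias: both names refer to the same class object,
# so existing imports of the old name keep working.
BatchedDynamicEmbeddingTables = BatchedDynamicEmbeddingTablesV2

module = BatchedDynamicEmbeddingTables()
```

Because the alias binds the same object, `isinstance` checks against either name give the same result.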

The test used module.tables[0] (returns KeyValueTable, not DynamicEmbTable) and module.optimizer (returns BaseDynamicEmbeddingOptimizerV2 with incompatible update() signature). Fix by constructing hashtables via initialize_hashtables() directly and instantiating old-style optimizers via dynamicemb_optimizer_class with explicit table options. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Document final test results and the additional fixes applied during the test loop (GPU count, master ports, missing test files, API fixes). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

KeyValueTable wrappers so goatee callers (which pass KeyValueTable objects) can use free functions like dyn_emb_cols/rows/capacity, insert_or_assign and export_batch that previously only accepted the raw C++ DynamicEmbTable.

- dynamicemb_config.py:
  - dyn_emb_to_torch: passthrough when already a torch.dtype
  - _unwrap_table(): extracts DynamicEmbTable from KeyValueTable
  - Python wrappers: dyn_emb_cols, dyn_emb_rows, dyn_emb_capacity, insert_or_assign, export_batch; all accept either table type
- key_value_table.py:
  - optstate_dim() → backward-compat alias for optim_state_dim()
  - get_initial_optstate() → forwarded to underlying DynamicEmbTable
- optimizer.py:
  - BaseDynamicEmbeddingOptimizer.register(BaseDynamicEmbeddingOptimizerV2) so isinstance(v2_optimizer, BaseDynamicEmbeddingOptimizer) is True

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
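
The `register` call in optimizer.py relies on Python's abstract base class machinery. The sketch below shows the mechanism with empty stand-in classes; it assumes the old base class is an `abc.ABC` (implied by the `.register` call but not shown in this PR text), and the class bodies are placeholders.

```python
from abc import ABC


class BaseDynamicEmbeddingOptimizer(ABC):
    """Stand-in for the old optimizer base class (assumed to be an ABC)."""


class BaseDynamicEmbeddingOptimizerV2:
    """Stand-in for the new base class; it does not inherit from the old one."""


# Register V2 as a virtual subclass: isinstance/issubclass checks succeed,
# but no methods are inherited and the MRO is left unchanged.
BaseDynamicEmbeddingOptimizer.register(BaseDynamicEmbeddingOptimizerV2)

v2_optimizer = BaseDynamicEmbeddingOptimizerV2()
is_old_style = isinstance(v2_optimizer, BaseDynamicEmbeddingOptimizer)
```

Virtual subclassing only affects type checks, so callers that invoke old-style methods on a V2 optimizer still need those methods to exist on the V2 class.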

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jiashuy and others added 19 commits March 26, 2026 06:07

Gradient clipping by reusing TorchRec&FBGEMM's parameters (NVIDIA#223)

7923e44

* Add gradient clipping in dynamicemb
* Fix potential capacity mismatch issue in incremental_dump

bump HierarchicalKV submodule to 9c197a9c

6086d18

Required by f5b608e C++ sources which use find_and_update and other APIs added in 9c197a9c558d1e8285c2e50c1974f0f102826f11. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: update upstream-3pick.md with build/test progress

f12868b

Document all build errors encountered (std::optional, submodule, override errors), Python alignment issues, and test fixes applied during the build-install-test loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Restore test_lfu_scores.py from f5b608e (was missing from cherry-pick)

75fa76b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update upstream-3pick.md: all 11 tests passed

3253ad2

Document final test results and the additional fixes applied during the test loop (GPU count, master ports, missing test files, API fixes). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add upstream-3pick-plan.md: build/test/publish plan for 0.0.4 wheel

cb46545

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bump dynamicemb version to 0.0.4

cc9d7ff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove _unwrap_table shims from dynamicemb_config

70636db

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fshhr46 wants to merge 19 commits into NVIDIA:main from fshhr46:upstream-3pick


4 participants
