Skip to content

Comments

Add is_async to CheckpointingConfig TypedDict#1991

Open
dmvevents wants to merge 1 commit intoNVIDIA-NeMo:mainfrom
dmvevents:add-is-async-to-checkpointing-config
Open

Add is_async to CheckpointingConfig TypedDict#1991
dmvevents wants to merge 1 commit intoNVIDIA-NeMo:mainfrom
dmvevents:add-is-async-to-checkpointing-config

Conversation

@dmvevents
Copy link

@dmvevents dmvevents commented Feb 19, 2026

Summary

  • AutomodelCheckpointManager.init_checkpointer() already consumes is_async from config_updates (automodel_checkpoint.py:118)
  • AutomodelCheckpointManager.save_checkpoint() passes is_async through from the checkpointing_cfg dict (automodel_checkpoint.py:297)
  • However, the CheckpointingConfig(TypedDict) in checkpoint.py does not declare is_async as a field
  • This means Hydra config validation rejects +checkpointing.is_async=true as an unknown key, preventing users from enabling async checkpointing via config overrides

The fix adds a single NotRequired[bool] field to the CheckpointingConfig TypedDict, consistent with the existing pattern used by is_peft, save_consolidated, etc.

Test plan

  • Verify +checkpointing.is_async=true Hydra override is accepted without errors
  • Verify async checkpoint save/load works end-to-end with DTensor policy worker
  • Existing checkpoint tests continue to pass (field is NotRequired with default False)

Summary by CodeRabbit

  • New Features
    • Added optional asynchronous checkpoint configuration option to checkpoint settings.

The is_async parameter is already consumed by AutomodelCheckpointManager
(automodel_checkpoint.py:118) and passed through in save_checkpoint
(automodel_checkpoint.py:297), but was missing from the CheckpointingConfig
TypedDict definition. This causes Hydra to reject
`+checkpointing.is_async=true` as an unknown key.

Signed-off-by: dmvevents <dmvevents@users.noreply.github.com>
@dmvevents dmvevents requested a review from a team as a code owner February 19, 2026 15:20
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

Added a new optional is_async field with a default value of False to the CheckpointingConfig TypedDict in the checkpoint utility module, extending the configuration schema without modifying existing fields or behavior.

Changes

Cohort / File(s) Summary
Checkpoint Configuration Schema
nemo_rl/utils/checkpoint.py
Added optional is_async: NotRequired[bool] field to CheckpointingConfig TypedDict to support asynchronous checkpoint operations.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the single main change: adding an is_async field to the CheckpointingConfig TypedDict. It is concise, specific, and clear.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes ✅ Passed PR adds single-line optional TypedDict field enabling Hydra config overrides for existing parameter, with documented test plan covering acceptance, end-to-end functionality, and regression testing.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/utils/checkpoint.py (1)

38-52: 🛠️ Refactor suggestion | 🟠 Major

Document is_async in the class docstring and add it to exemplar YAML configurations.

The new field is not documented in the CheckpointingConfig docstring Attributes section. Additionally, exemplar YAML files under examples/configs/ that define checkpointing: blocks do not include the is_async configuration key. Per coding guidelines, new config keys must be documented and reflected in exemplar YAMLs.

✏️ Proposed docstring addition
     is_peft (bool): Whether the model uses PEFT.
+    is_async (bool): Whether to use asynchronous checkpointing. Defaults to False.
     """

Add is_async: false to each checkpointing: block in exemplar YAML files under examples/configs/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/utils/checkpoint.py` around lines 38 - 52, Update the
CheckpointingConfig class docstring to document the new is_async attribute
(e.g., is_async (bool): Whether checkpoint saving is performed asynchronously)
and ensure it appears in the Attributes section alongside enabled,
checkpoint_dir, etc.; then add is_async: false to every exemplar YAML
checkpointing: block under examples/configs/ so config examples reflect the new
key. Reference the CheckpointingConfig class and the checkpointing: blocks in
the example YAMLs when making these edits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@nemo_rl/utils/checkpoint.py`:
- Around line 38-52: Update the CheckpointingConfig class docstring to document
the new is_async attribute (e.g., is_async (bool): Whether checkpoint saving is
performed asynchronously) and ensure it appears in the Attributes section
alongside enabled, checkpoint_dir, etc.; then add is_async: false to every
exemplar YAML checkpointing: block under examples/configs/ so config examples
reflect the new key. Reference the CheckpointingConfig class and the
checkpointing: blocks in the example YAMLs when making these edits.

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Feb 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants