Add is_async to CheckpointingConfig TypedDict#1991
Add is_async to CheckpointingConfig TypedDict#1991dmvevents wants to merge 1 commit intoNVIDIA-NeMo:mainfrom
Conversation
The is_async parameter is already consumed by AutomodelCheckpointManager (automodel_checkpoint.py:118) and passed through in save_checkpoint (automodel_checkpoint.py:297), but was missing from the CheckpointingConfig TypedDict definition. This causes Hydra to reject `+checkpointing.is_async=true` as an unknown key. Signed-off-by: dmvevents <dmvevents@users.noreply.github.com>
📝 WalkthroughWalkthroughAdded a new optional Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_rl/utils/checkpoint.py (1)
38-52: 🛠️ Refactor suggestion | 🟠 MajorDocument
is_asyncin the class docstring and add it to exemplar YAML configurations.The new field is not documented in the
CheckpointingConfigdocstringAttributessection. Additionally, exemplar YAML files underexamples/configs/that definecheckpointing:blocks do not include theis_asyncconfiguration key. Per coding guidelines, new config keys must be documented and reflected in exemplar YAMLs.✏️ Proposed docstring addition
is_peft (bool): Whether the model uses PEFT. + is_async (bool): Whether to use asynchronous checkpointing. Defaults to False. """Add
is_async: falseto eachcheckpointing:block in exemplar YAML files underexamples/configs/.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_rl/utils/checkpoint.py` around lines 38 - 52, Update the CheckpointingConfig class docstring to document the new is_async attribute (e.g., is_async (bool): Whether checkpoint saving is performed asynchronously) and ensure it appears in the Attributes section alongside enabled, checkpoint_dir, etc.; then add is_async: false to every exemplar YAML checkpointing: block under examples/configs/ so config examples reflect the new key. Reference the CheckpointingConfig class and the checkpointing: blocks in the example YAMLs when making these edits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@nemo_rl/utils/checkpoint.py`:
- Around line 38-52: Update the CheckpointingConfig class docstring to document
the new is_async attribute (e.g., is_async (bool): Whether checkpoint saving is
performed asynchronously) and ensure it appears in the Attributes section
alongside enabled, checkpoint_dir, etc.; then add is_async: false to every
exemplar YAML checkpointing: block under examples/configs/ so config examples
reflect the new key. Reference the CheckpointingConfig class and the
checkpointing: blocks in the example YAMLs when making these edits.
Summary
AutomodelCheckpointManager.init_checkpointer()already consumesis_asyncfromconfig_updates(automodel_checkpoint.py:118)AutomodelCheckpointManager.save_checkpoint()passesis_asyncthrough from thecheckpointing_cfgdict (automodel_checkpoint.py:297)CheckpointingConfig(TypedDict)incheckpoint.pydoes not declareis_asyncas a field+checkpointing.is_async=trueas an unknown key, preventing users from enabling async checkpointing via config overridesThe fix adds a single
NotRequired[bool]field to theCheckpointingConfigTypedDict, consistent with the existing pattern used byis_peft,save_consolidated, etc.Test plan
+checkpointing.is_async=trueHydra override is accepted without errorsNotRequiredwith defaultFalse)Summary by CodeRabbit