
fix: use Cluster instead of WorkerConfig for ref log prob dynamic batching#376

Open
dubin555 wants to merge 1 commit into alibaba:main from dubin555:oss-scout/verify-ref-worker-config-vs-cluster

Conversation

@dubin555

Problem

In RLVRPipeline._train (rlvr_pipeline.py:544), when use_ref_model=False, the worker variable is set to self.pipeline_config.actor_train (a WorkerConfig) instead of self.actor_train (a Cluster):

worker = self.reference if self.use_ref_model else self.pipeline_config.actor_train  # BUG
# ...
worker.dp_size  # AttributeError: WorkerConfig has no dp_size

This causes an AttributeError when computing reference log probabilities with enable_reference=True, use_ref_model=False, and use_dynamic_batching_in_infer=True.

Root Cause

A copy-paste error from line 543: worker_config on that line correctly uses self.pipeline_config.actor_train (a WorkerConfig), but worker on line 544 should use self.actor_train (a Cluster), which exposes the dp_size property.

Fix

worker = self.reference if self.use_ref_model else self.actor_train
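The one-line fix above can be illustrated with a minimal, self-contained sketch. This is not the actual roll codebase: the class bodies, field names, and the `Pipeline` wrapper below are hypothetical stand-ins; only the names `WorkerConfig`, `Cluster`, `dp_size`, `actor_train`, `reference`, and `use_ref_model` come from the PR.

```python
from dataclasses import dataclass


@dataclass
class WorkerConfig:
    """Stand-in for the config object: carries static settings, no dp_size."""
    world_size: int = 4


@dataclass
class Cluster:
    """Stand-in for the runtime cluster: exposes the dp_size property."""
    dp_size: int = 4


class Pipeline:
    """Hypothetical skeleton mirroring the relevant RLVRPipeline attributes."""

    def __init__(self, use_ref_model: bool):
        self.use_ref_model = use_ref_model
        self.reference = Cluster(dp_size=2)
        self.actor_train = Cluster(dp_size=4)

        class _Cfg:
            pass

        self.pipeline_config = _Cfg()
        self.pipeline_config.actor_train = WorkerConfig()

    def pick_worker_buggy(self):
        # Old code: the else-branch falls back to the WorkerConfig.
        return self.reference if self.use_ref_model else self.pipeline_config.actor_train

    def pick_worker_fixed(self):
        # Fixed code: the else-branch falls back to the Cluster.
        return self.reference if self.use_ref_model else self.actor_train


p = Pipeline(use_ref_model=False)
try:
    _ = p.pick_worker_buggy().dp_size
except AttributeError as e:
    print("buggy path:", e)
print("fixed dp_size:", p.pick_worker_fixed().dp_size)
```

With `use_ref_model=True` both paths return `self.reference`, which is why the bug only surfaces on the `False` branch.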

Files Changed

  • roll/pipeline/rlvr/rlvr_pipeline.py — Line 544
  • tests/test_ref_worker_type_consistency.py — AST-based regression test
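The PR's regression test is described as AST-based. A minimal sketch of how such a check could work is below; the actual test file may differ. The `SRC` string and the helper `else_branch_attr_chain` are illustrative, not taken from the repository.

```python
import ast

# The fixed line from rlvr_pipeline.py, inlined here for the sketch.
SRC = "worker = self.reference if self.use_ref_model else self.actor_train\n"


def else_branch_attr_chain(source: str) -> str:
    """Return the dotted attribute chain of the else-branch of the first
    conditional expression found in `source` (e.g. 'self.actor_train')."""
    tree = ast.parse(source)
    ifexp = next(n for n in ast.walk(tree) if isinstance(n, ast.IfExp))
    parts = []
    node = ifexp.orelse
    # Unwind nested Attribute nodes: a.b.c parses as Attribute(Attribute(Name)).
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


# Regression check: the fallback must be the Cluster attribute,
# not the WorkerConfig reachable via pipeline_config.
chain = else_branch_attr_chain(SRC)
assert chain == "self.actor_train", chain
assert chain != "self.pipeline_config.actor_train"
print("ok:", chain)
```

Checking the AST rather than grepping the source keeps the test robust to whitespace and comment changes around line 544.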

When `use_ref_model=False`, the `worker` variable was set to
`self.pipeline_config.actor_train` (a WorkerConfig) instead of
`self.actor_train` (a Cluster). WorkerConfig does not have `dp_size`,
so `worker.dp_size` raises AttributeError when dynamic batching is
enabled for reference log prob computation without a separate
reference model.

Change the else-branch to use `self.actor_train` (Cluster) which
has the `dp_size` property.
@CLAassistant

CLAassistant commented Mar 14, 2026

CLA assistant check
All committers have signed the CLA.
