question regarding the efficiency comparison #20

@YichaoCai1

Hi, thanks for the interesting work! I enjoyed reading the paper and the clean codebase.

I have a question about the efficiency comparison. Since DMLR is fundamentally a test-time optimization method, each example requires an iterative RL loop: max_rl_steps forward passes with output_attentions=True, plus a reward computation at each step. From the code, the default configuration runs about 20 optimization steps per example before the final model.generate() call. The total compute per example is therefore substantially higher than for both vanilla inference and standard CoT, even though the output is shorter.
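
To make the concern concrete, here is a rough back-of-the-envelope sketch of how I am counting compute. It is not taken from the DMLR code: the function name `forward_token_cost`, the cost model, and the example lengths are all my own assumptions. It assumes each optimization step re-runs one full forward pass over the prompt, and it ignores backward passes, reward computation, and the overhead of returning attentions, so it likely understates the real cost.

```python
def forward_token_cost(prompt_len: int, output_len: int, opt_steps: int = 0) -> int:
    """Rough count of token positions pushed through the model for one example.

    Hypothetical cost model (an assumption, not the paper's metric):
    - prefill: one pass over the prompt
    - decode: one position per generated token (KV cache assumed)
    - optimization: each test-time step re-runs a full pass over the prompt
    """
    prefill = prompt_len
    decode = output_len
    optimization = opt_steps * prompt_len
    return prefill + decode + optimization


# Illustrative lengths only; these are made-up numbers, not measurements.
cot_cost = forward_token_cost(prompt_len=500, output_len=300)
optimized_cost = forward_token_cost(prompt_len=500, output_len=50, opt_steps=20)
print(cot_cost, optimized_cost)
```

Under these assumed lengths, the optimization term dominates the total even though the generated output is much shorter, which is why an output-length-based efficiency metric and a total-compute metric could point in opposite directions.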

Could you clarify what exactly the "Efficiency" metric on the x-axis of Fig. 11 in the arXiv preprint measures?

Thanks again for the great work!
