Clarification on dataset splits, critic types, and feedback settings for reproduction #59

@junho328

Hello,
I am trying to reproduce the results from your paper, and I encountered a few points that are somewhat unclear to me. I opened this issue to ask for clarification so that I can faithfully follow the experimental setup.

  1. Dataset splits (train / eval / test)
    First, I would like to clarify the exact train, evaluation, and test splits used in the experiments.

According to Appendix C.1 of the paper, for the writing collaboration task (TLDR):

  • Training set: TLDR[0:1000]
  • Test set: TLDR[1000:1100]

However, when I checked magrpo_tldr_config.yaml, I noticed that:

  • train_split is set to train[:1100]
  • eval_split is set to test[:1100]

This seems slightly different from what is described in the paper.

Additionally, the trl-lib/tldr dataset already provides train, validation, and test splits.
In this context, does TLDR[1000:1100] refer to:

  • indices [1000:1100] from the train split, or
  • indices [1000:1100] from the test split?

I would really appreciate clarification on this point.
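To make the two readings concrete, here is a small sketch I wrote myself (it is not taken from the repository). The helper names `parse_split` and `overlap` are my own, and the parser only handles the simple `name[start:stop]` form of split strings. It shows that if the paper's test set meant `train[1000:1100]`, those rows would also be covered by the configured `train[:1100]`, whereas `test[:1100]` shares no rows with it.

```python
import re

# Hypothetical helpers for illustration only; not part of the released code.
# Accepts "train", "train[:1100]", "train[1000:1100]", "train[:]".
_SPLIT_RE = re.compile(r"^(\w+)(?:\[(\d*):(\d*)\])?$")


def parse_split(spec):
    """Return (split_name, start, stop); stop is None when open-ended."""
    match = _SPLIT_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported split spec: {spec!r}")
    name, start, stop = match.groups()
    return name, int(start) if start else 0, int(stop) if stop else None


def overlap(spec_a, spec_b):
    """Count row indices shared by two specs (0 if they name different splits)."""
    name_a, start_a, stop_a = parse_split(spec_a)
    name_b, start_b, stop_b = parse_split(spec_b)
    if name_a != name_b:
        return 0
    # Treat an open-ended stop as unbounded.
    hi = min(float("inf") if stop_a is None else stop_a,
             float("inf") if stop_b is None else stop_b)
    lo = max(start_a, start_b)
    return max(0, hi - lo)


# Config train split vs. the two possible readings of "TLDR[1000:1100]".
print(overlap("train[:1100]", "train[1000:1100]"))  # rows shared with training
print(overlap("train[:1100]", "test[1000:1100]"))   # different split, no overlap
```

With the paper's description (`train[0:1000]` for training), neither reading overlaps, which is why I would like to know which setup produced the reported numbers.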

Similarly, for the Minecraft experiments, in house_build_magrpo_config.yaml, I see that:

  • train_split is set to [:]

whereas the paper mentions using [0:8] for training.
I was wondering whether this discrepancy was an oversight during code release, or if there is another intended explanation.
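One explanation I can imagine, though I have not been able to confirm it from the code, is that the dataset contains exactly eight items, in which case `[:]` and `[0:8]` select the same rows. The sketch below uses plain Python list slicing with a made-up helper name (`same_selection`) to show when the two would agree.

```python
# Hypothetical check, not from the repository: does "[:]" select the same
# items as "[0:8]" for a dataset of a given size?
def same_selection(num_items, stop=8):
    items = list(range(num_items))
    return items[:] == items[0:stop]


print(same_selection(8))   # the two slices agree when there are 8 items
print(same_selection(12))  # they differ once the dataset is larger
```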

  2. Critic types used in main experiments

For the Actor-Critic–based algorithms, could you clarify which critic type was used for each method in the main experimental results reported in the paper?

Specifically, were the critic types exactly those specified in the released configuration files, or were there any differences between the paper experiments and the public code?

  3. Feedback types in Minecraft experiments

For the Minecraft tasks, I assume that performance may vary depending on the feedback type.
Could you please let me know which feedback types were used when training StrBuild and HouseBuild, respectively, for the results reported in Table 1?

Sorry for bothering you again with so many questions.
I genuinely think your work is excellent, and I would really like to run the experiments myself to better understand and verify the results.

Thank you very much for your time and for sharing such great research!!


Labels: documentation, question
