Clarification on dataset splits, critic types, and feedback settings for reproduction

Hello,
I am trying to reproduce the results from your paper and ran into a few points that are unclear to me. I am opening this issue to ask for clarification so that I can follow the experimental setup faithfully.

1. Dataset splits (train / eval / test)
First, I would like to clarify the exact train, evaluation, and test splits used in the experiments.

According to Appendix C.1 of the paper, for the writing collaboration task (TLDR):
- Training set: TLDR[0:1000]
- Test set: TLDR[1000:1100]

However, when I checked magrpo_tldr_config.yaml, I noticed that:
- train_split is set to train[:1100]
- eval_split is set to test[:1100]

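This differs from the paper's description: the training slice covers 1,100 examples instead of 1,000, and the evaluation slice is test[:1100] rather than a 100-example range [1000:1100].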

Additionally, the trl-lib/tldr dataset already provides train, validation, and test splits.
In this context, does TLDR[1000:1100] refer to:
- indices [1000:1100] from the train split, or
- indices [1000:1100] from the test split?

I would really appreciate clarification on this point.
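
To make the two readings concrete, here is a minimal sketch of how I am comparing them. The helpers `parse_split`, `covered_indices`, and `overlap` are my own hypothetical names and are not taken from your repository, and `SIZE` is a placeholder rather than the real size of any split. The sketch only handles absolute integer slices in the `name[start:stop]` style used in the config.

```python
import re

def parse_split(spec):
    """Parse a split string such as 'train[:1100]' into (name, start, stop).

    start/stop are None when omitted. Percent slices are not handled.
    """
    match = re.fullmatch(r"(\w*)(?:\[(\d*):(\d*)\])?", spec)
    if match is None:
        raise ValueError(f"unsupported split spec: {spec!r}")
    name, start, stop = match.groups()
    return name, int(start) if start else None, int(stop) if stop else None

def covered_indices(spec, size):
    """Return the split name and the index range the spec selects."""
    name, start, stop = parse_split(spec)
    return name, range(size)[start:stop]

def overlap(spec_a, spec_b, size):
    """Count examples shared by two specs (0 if they name different splits)."""
    name_a, idx_a = covered_indices(spec_a, size)
    name_b, idx_b = covered_indices(spec_b, size)
    if name_a != name_b:
        return 0
    return len(set(idx_a) & set(idx_b))

SIZE = 5000  # placeholder split size, NOT the real dataset size

# Reading 1: the paper's test set is indices [1000:1100] of the train split.
print(overlap("train[:1100]", "train[1000:1100]", SIZE))  # 100
# Reading 2: the paper's test set is indices [1000:1100] of the test split.
print(overlap("train[:1100]", "test[1000:1100]", SIZE))   # 0
# The paper's stated training range never overlaps with reading 1.
print(overlap("train[:1000]", "train[1000:1100]", SIZE))  # 0
```

If the first reading is the intended one, training with train[:1100] as in the config would include the 100 test examples, which is why I would like to confirm which split the indices refer to.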

Similarly, for the Minecraft experiments, in house_build_magrpo_config.yaml, I see that:
- train_split is set to [:]

whereas the paper mentions using [0:8] for training.
I was wondering whether this discrepancy was an oversight during code release, or if there is another intended explanation.
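
One possibility I considered is that the two settings are equivalent: [:] and [0:8] select the same items whenever the task set has at most 8 entries. The short check below illustrates this with plain Python slicing. `same_training_set` and `num_tasks` are hypothetical names of mine, and I do not know the actual number of tasks in the released Minecraft data.

```python
def same_training_set(num_tasks, paper_slice=slice(0, 8), config_slice=slice(None, None)):
    """Return True if the paper's [0:8] and the config's [:] pick the same items."""
    indices = list(range(num_tasks))
    return indices[paper_slice] == indices[config_slice]

# num_tasks values are illustrative only, not the real dataset size.
for num_tasks in (8, 12):
    print(num_tasks, same_training_set(num_tasks))
# 8 True
# 12 False
```

If the released dataset contains more than 8 tasks, the config would train on more data than the paper describes.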

2. Critic types used in main experiments

For the Actor-Critic–based algorithms, could you clarify which critic type was used for each method in the main experimental results reported in the paper?

Specifically, were the critic types exactly those specified in the released configuration files, or were there any differences between the paper experiments and the public code?

3. Feedback types in Minecraft experiments

For the Minecraft tasks, I assume that performance may vary depending on the feedback type.
Could you please let me know which feedback types were used when training StrBuild and HouseBuild, respectively, for the results reported in Table 1?

⸻

Sorry to bother you again with so many questions.
I genuinely think your work is excellent, and I would really like to run the experiments myself to better understand and verify the results.

Thank you very much for your time and for sharing such great research!!

Clarification on dataset splits, critic types, and feedback settings for reproduction #59
