Dear authors,

Could you clarify the reasoning behind this design choice? You write: "we use a set of principles different from the reward model training stage, as illustrated in Table 8, which contains a few more principles that we would expect a well-aligned LLM AI-assistant agent would behave." Why is a different (and larger) set of principles used at this stage rather than reusing the ones from reward model training?

Thanks for your explanation!