Ed Li — LessWrong

Ed Li

3y

How to train any multiagent systems end-to-end from AI feedback

tldr: Finetuning many agents end-to-end offers a workaround for continual learning, since different agents can specialize without catastrophic forgetting. Yet doing so is hard because of credit assignment and sample efficiency. We found that using AI feedback as per-action process rewards holds promise for addressing these challenges and unlocks a...