How to train any multiagent system end-to-end from AI feedback
tldr: Finetuning many agents end-to-end offers a workaround for continual learning, since different agents can specialize without catastrophic forgetting. Yet doing so is hard because of credit assignment and sample inefficiency. We found that using AI feedback as per-action process rewards holds promise for addressing these challenges and unlocks a...
Feb 51