Ed Li — LessWrong

LESSWRONG
LW

Ed Li — LessWrong

How to train any multiagent systems end-to-end from AI feedback

tldr: Finetuning many agents end-to-end offers a workaround to continual learning since different agents can specialize without catastrophic forgetting. Yet doing so is hard due to credit assignment and sample efficiency. We found that using AI feedback as per-action process rewards holds promise for addressing these challenges and unlocks a new axis for scaling post-training.

Paper: https://arxiv.org/abs/2601.23228
Code: https://github.com/ltjed/multiagent-coaching
Blog: https://ltjed.github.io/MAPPA/

---

What we did

We have an off-the-shelf LLM act as a coach that watches multi-agent systems during training. It scores each action as it happens—process supervision rather than just outcome rewards at the end. The coach sees what each agent produced plus tool outputs (stdout, stderr, errors).

The parts I find interesting

Credit assignment just... works? When agent... (read more)

Replying toThree ways interpretability could be impactful

Ed Li2y

Three ways interpretability could be impactful

Some questions about the feasibility section:

About (2): what would 'crisp internal representations' look like? I think this is really useful to know since we haven't figure this out for the brain or LLMs (e.g. Interpreting Neural Networks through the Polytope Lens

Moreover, the current methods used in comparing the human brain's representations to those of ML models like RSA are quite high-level, or, at the very least do not reveal any useful insights for interpretability for either side (pls feel free to correct me). This question is not a point against mechanistic interpretability, however--granted the premise that LLMs are pressured to learn similar representations, interpretability research can both borrow from and shed light... (read more)

Replying to6 non-obvious mental health issues specific to AI safety

Ed Li2y

6 non-obvious mental health issues specific to AI safety

Thank you so much for posting this. It feels weird to tick every single symptom mentioned here...

The burnout that 'Dmitry' experiences is remarkably accurate for what am I experiencing, Are there any further guides on how to manage this? It would help me so much, any help is appreciated:)