Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

by LawrenceC · 26th Oct 2022 · AI Alignment Forum · 1 min read

This is a linkpost for https://arxiv.org/abs/2210.14215

The authors train transformers to imitate the training trajectories of reinforcement learning (RL) algorithms. They find that the transformers learn to do in-context RL (that is, the transformers themselves implement an RL algorithm); the authors check this by having the transformers solve new RL tasks. Indeed, the transformers can sometimes do better than the RL algorithms they were trained to imitate.

This seems like more evidence for the "generative models can contain agents" point.

Abstract:

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.

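To make the training setup in the abstract concrete, here is a minimal sketch: log (observation, action, reward) triples across the whole learning history of a source RL algorithm, then train a causal transformer to autoregressively predict each action from the preceding history. Everything in it (the toy data stub, the per-step tokenization, the model sizes and hyperparameters) is an illustrative assumption, not the paper's actual implementation.

```python
# Minimal sketch of an Algorithm Distillation-style training step.
# The data stub and all hyperparameters are made up for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def collect_learning_history(num_episodes=50, episode_len=20, obs_dim=8, num_actions=4):
    """Stand-in for logging (observation, action, reward) triples across the
    entire training run of a source RL algorithm on a single task."""
    steps = num_episodes * episode_len
    obs = torch.randn(steps, obs_dim)                   # toy observations
    actions = torch.randint(0, num_actions, (steps,))   # toy actions
    rewards = torch.randn(steps, 1)                     # toy rewards
    return obs, actions, rewards

class ADTransformer(nn.Module):
    """Causal transformer that predicts the action at step t from the
    across-episode history of (obs, previous action, previous reward) tokens."""
    def __init__(self, obs_dim=8, num_actions=4, d_model=64, n_layers=2):
        super().__init__()
        self.num_actions = num_actions
        self.embed = nn.Linear(obs_dim + num_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, num_actions)

    def forward(self, obs, prev_actions, prev_rewards):
        # One token per timestep t: [o_t, one-hot(a_{t-1}), r_{t-1}].
        a = F.one_hot(prev_actions, self.num_actions).float()
        tokens = torch.cat([obs, a, prev_rewards], dim=-1).unsqueeze(0)  # (1, T, D)
        x = self.embed(tokens)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.encoder(x, mask=causal_mask)
        return self.head(h)  # logits over a_t at every position

# Train by autoregressively predicting actions given the preceding learning history.
obs, actions, rewards = collect_learning_history()
prev_actions = torch.cat([torch.zeros(1, dtype=torch.long), actions[:-1]])
prev_rewards = torch.cat([torch.zeros(1, 1), rewards[:-1]])

model = ADTransformer()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
logits = model(obs, prev_actions, prev_rewards)
loss = F.cross_entropy(logits[0], actions)  # imitate the source algorithm's actions
loss.backward()
optimizer.step()
```

At evaluation time the idea is to roll the trained model out on a new task, feeding its own actions and the observed rewards back into the context; per the abstract, the policy improvement then happens entirely in-context, with no parameter updates.
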
5 comments

Dunning K. · 3y

More evidence for the point "generative models can contain agents", or specifically "generative models trained to imitate agents can learn to behave agentically". However, it's not more evidence for the claim "generative models trained to be generators / generative models trained to be useful tools will suddenly learn an internal agent". Does that seem right?

Charbel-Raphaël · 3y

This post argues there is no inner mesa-optimizer here:

https://www.lesswrong.com/posts/avvXAvGhhGgkJDDso/caution-when-interpreting-deepmind-s-in-context-rl-paper

Charbel-Raphaël · 3y

This model is a proof of concept of a powerful implicit mesa-optimizer, which is evidence towards "current architectures could easily be inner-misaligned".

Adam Jermyn · 3y

It sure does seem like there's an inner optimizer in there somewhere...

Logan Riggs · 3y

Notably, the model was trained across multiple episodes in order to pick up on RL improvement.

The usual inner-misalignment story, though, is that the model tries to gain more reward in future episodes by forgoing reward in earlier ones, and I don't think this is evidence for that.



Mentioned in
Powerful mesa-optimisation is already here