Here's a link to DreamerV3, a new model from DeepMind that can be trained on a wide range of tasks (including a simplified version of Minecraft) and outperforms more specialized models. Link:

The most surprising bits are that:

  • The tasks they train it on are fairly diverse
  • Data efficiency scales with the number of parameters
  • They haven't scaled it very far yet and still got strong results

The paper is an impressive testimony to the engineering sweat and tears they had to put in to get their model to generalize as well as it did. Like, just seeing a parameter set equal to 0.997 makes you think "huh, so did they try every value from 0.995 to 0.999?" - to say nothing of all the functional degrees of freedom. The end result is simpler than MCTS on the surface, but it doesn't seem obvious whether it was any less effort by the researchers. Still plenty cool to read about though.

And also, yes, of course, it's 2023 and shouldn't someone have heard the fire alarm already? Even though this research is not directly about building an AI that can navigate the real world, it's still pushing towards it, in a way that I sure wish orgs like DeepMind would put on a back burner relative to research on how to get an AI to do good things and not bad things if it's navigating the real world.

Is it just me, or does this validate some of the parts of Yann LeCun's "A Path Towards Autonomous Machine Intelligence" paper?

The two papers both describe an algorithm built from multiple specialized models, with DreamerV3 using three that seem very similar to those described by LeCun:

"the world model predicts future outcomes of potential actions"

"the critic judges the value of each situation"

"the actor learns to reach valuable situations"
World model, critic, actor - all are also described in LeCun's paper. So are we seeing a successful push towards an AGI similar to LeCun's ideas?
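To make the three quoted roles concrete, here is a toy sketch of that decomposition in plain Python. All names, dynamics, and numbers are illustrative inventions for this comment, not DreamerV3's actual architecture; the point is only the division of labor: the world model imagines outcomes, the critic scores them, and the actor picks actions using those imagined scores rather than a real environment.

```python
class WorldModel:
    """Predicts the next state and reward for a candidate action (toy dynamics)."""
    def predict(self, state, action):
        next_state = state + 0.9 * action   # hypothetical drift dynamics
        reward = -abs(next_state)           # toy task: stay near the origin
        return next_state, reward

class Critic:
    """Judges the value of each situation."""
    def value(self, state):
        return -abs(state)                  # closer to 0 is more valuable here

class Actor:
    """Learns to reach valuable situations; here, greedy over imagined outcomes."""
    def act(self, state, world_model, critic, candidates=(-1.0, 0.0, 1.0)):
        def imagined_return(a):
            next_state, reward = world_model.predict(state, a)
            return reward + 0.99 * critic.value(next_state)
        return max(candidates, key=imagined_return)

# "Training in imagination": the actor chooses actions using only the
# world model's predictions, never querying a real environment here.
wm, critic, actor = WorldModel(), Critic(), Actor()
state = 5.0
for _ in range(10):
    action = actor.act(state, wm, critic)
    state, _ = wm.predict(state, action)
print(state)  # the actor steers the imagined state toward the origin
```

The interesting design choice this mirrors is that the actor and critic are trained entirely on the world model's imagined rollouts, which is what makes the data efficiency claims possible.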

I'm not familiar with LeCun's ideas, but I don't think the idea of having an actor, critic, and world model is new in this paper. For a while, most RL algorithms have used an actor-critic architecture, including OpenAI's old favorite PPO. Model-based RL has been around for years as well, so probably plenty of projects have used an actor, critic, and world model.
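For anyone unfamiliar with how old this idea is: the basic actor-critic loop fits in a few lines. Below is a minimal tabular sketch (this is neither PPO nor DreamerV3, and the toy MDP is invented for illustration); the critic learns state values via TD errors, and the same TD error nudges the actor's action preferences.

```python
import math
import random

random.seed(0)
n_states, n_actions = 2, 2
V = [0.0] * n_states                                   # critic: state values
prefs = [[0.0] * n_actions for _ in range(n_states)]   # actor: action preferences

def policy(s):
    """Softmax over the actor's preferences for state s."""
    exps = [math.exp(p) for p in prefs[s]]
    z = sum(exps)
    return 0 if random.random() < exps[0] / z else 1

def env_step(s, a):
    """Toy MDP: action 0 is always better, next state is random."""
    reward = 1.0 if a == 0 else 0.0
    return random.randrange(n_states), reward

alpha, beta, gamma = 0.1, 0.1, 0.9
s = 0
for _ in range(2000):
    a = policy(s)
    s2, r = env_step(s, a)
    td_error = r + gamma * V[s2] - V[s]   # critic's evaluation signal
    V[s] += alpha * td_error              # critic update
    prefs[s][a] += beta * td_error        # actor update toward valuable actions
    s = s2

# After training, the actor should prefer action 0 in both states.
print(all(p[0] > p[1] for p in prefs))
```

This is essentially the structure Sutton-era RL textbooks describe; what papers like DreamerV3 add on top is the learned world model and training the actor/critic on its imagined trajectories.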

Even though the core idea isn't novel, this paper getting good results might indicate that model-based RL is making more progress than expected, so if LeCun predicted that the future would look more like model-based RL, maybe he gets points for that.
