
My impression was that the thing that put Deepmind on the map was an AI that could play multiple Atari games. Lately there have been new Atari-playing AIs (both from Deepmind and other companies) making the news. Are they doing basically the same thing 2014 Deepmind was doing, but better? Are they doing something fundamentally different? Can someone explain the difference like I'm five?


# 2 Answers sorted by top scoring

Razied

### Nov 02, 2021


I'm not sure what level of sophistication you want, but here's an answer:

Performance on the games is much better, and the amount of game time it takes for the AI to reach a given level of performance is much lower. Yet fundamentally they are doing the same thing: solving a Markov Decision Process (abbreviated MDP) using Reinforcement Learning and Deep Learning. Pretty much any problem where there's anything resembling an "environment state" in which you make "decisions" can be modelled as an MDP. Small MDPs can be solved exactly with dynamic programming.

For instance, if you have a 10 by 10 grid with different "rewards" placed at each grid location and a player moving around the grid collecting them, you can solve exactly for the path the player should take. That's because the environment is small, with only 100 states, so it's easy to store a table with 100 entries and rules like "at location (4,3) go right with probability 0.9". The real world is much larger, so you can't run the exact MDP-solving algorithm; what you need are approximations of this exact algorithm.
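To make the gridworld example concrete, here's a minimal sketch of solving such a small MDP exactly with value iteration (one flavour of dynamic programming). The grid size, rewards, and discount factor are all made up for illustration:

```python
import numpy as np

# Hypothetical 10x10 gridworld: each cell holds a reward, the agent moves
# up/down/left/right, and we solve for the optimal state values exactly.
N = 10
GAMMA = 0.9  # discount factor (an assumed value)
rng = np.random.default_rng(0)
rewards = rng.uniform(0, 1, size=(N, N))  # made-up rewards, one per cell

def step(r, c, action):
    """Deterministic move; bumping into a wall keeps the agent in place."""
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
    return min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)

V = np.zeros((N, N))  # the "table with 100 values" from the text
for _ in range(200):  # apply the Bellman optimality update until it converges
    V_new = np.empty_like(V)
    for r in range(N):
        for c in range(N):
            V_new[r, c] = max(
                rewards[step(r, c, a)] + GAMMA * V[step(r, c, a)]
                for a in range(4)
            )
    done = np.max(np.abs(V_new - V)) < 1e-8
    V = V_new
    if done:
        break

# The optimal policy is then just the greedy action at each cell --
# exactly the kind of rule "at location (4,3) go right" mentioned above.
policy = np.zeros((N, N), dtype=int)
for r in range(N):
    for c in range(N):
        policy[r, c] = max(
            range(4),
            key=lambda a: rewards[step(r, c, a)] + GAMMA * V[step(r, c, a)],
        )
```

With only 100 states this loop finishes instantly; the point of the rest of the answer is that nothing like this table fits in memory for real-world problems.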

The whole field of Deep Reinforcement Learning right now is basically about making better and better approximations of this exact algorithm that work in the real world. In this sense almost everything Deepmind has been doing is the same thing: building better approximations of the same underlying algorithm. But the approximations have gotten way, way better, lots of tricks and heuristics were developed, and people now understand how to make the algorithms run consistently (at the beginning there was a lot of "you need this particular random seed to make it work").
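The core of the approximation idea can be sketched in a few lines: replace the table of values with a parametric function q(state, action; theta) trained from sampled transitions. This toy uses a linear model over hand-made features on the same gridworld, which is an assumption for illustration only; a DQN does the same thing with a deep network over raw pixels plus stabilising tricks like replay buffers and target networks:

```python
import numpy as np

N, ACTIONS, GAMMA, LR = 10, 4, 0.9, 0.05  # assumed toy hyperparameters
rng = np.random.default_rng(1)
rewards = rng.uniform(0, 1, size=(N, N))  # made-up rewards per cell

def features(r, c):
    # Toy features: normalised coordinates plus a bias term. (A DQN would
    # instead feed the raw game screen into a convolutional network.)
    return np.array([r / N, c / N, 1.0])

theta = np.zeros((ACTIONS, 3))  # parameters replace the 100-entry table

def q(r, c):
    return theta @ features(r, c)  # approximate value of each action

def step(r, c, a):
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    return min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)

r, c = 0, 0
for _ in range(5000):
    # Epsilon-greedy exploration: mostly act greedily, sometimes randomly.
    a = int(rng.integers(ACTIONS)) if rng.random() < 0.1 else int(np.argmax(q(r, c)))
    nr, nc = step(r, c, a)
    # Q-learning update: nudge q(s, a) toward reward + gamma * max_a' q(s', a').
    target = rewards[nr, nc] + GAMMA * np.max(q(nr, nc))
    td_error = target - q(r, c)[a]
    theta[a] += LR * td_error * features(r, c)
    r, c = nr, nc
```

The "tricks and heuristics" mentioned above are largely about making updates like this one stable and sample-efficient when the linear model is swapped for a large neural network.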

3 comments:

Could you be concrete about which papers you consider newer, and maybe also link to the original deep Q-learning paper you have in mind? (This might help someone answer the question.)

This was the newer Deepmind one:

https://www.lesswrong.com/posts/mTGrrX8SZJ2tQDuqz/deepmind-generally-capable-agents-emerge-from-open-ended?commentId=bosARaWtGfR836shY#bosARaWtGfR836shY

I was motivated to post by this algorithm from China I heard about today: