Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I haven't paid much attention to Atari in a long time, and so would appreciate takes from anyone who follows this more closely. My take:

A single architecture that can handle both the games that require significant exploration and the games that require long-term credit assignment, as well as the 'easy' games, without tricks or expert performance, seems like an achievement to me. The main question then becomes "does it scale out of the simulator to problems we care about more than Atari?"

What does it use? It looks like it's a lot of 'engineering improvements' and data. That is, lots of approaches will have tons of small components that are set to some simple default. You need to explore, so you use the simplest possible method of exploration: on every action you roll a die, and epsilon percent of the time you take a random action instead of the greedy one. Of course you could do better if you thought about it more, but there are many places like that in the code, and 'thinking about it more' requires both developer effort and compute (which, since the efficiency of the whole thing depends on how much compute it can spend, might mean you're spending dollars to earn cents if you use a fancy method where a simple one would do).
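To make that concrete, here's a minimal sketch of that simplest-possible exploration scheme (epsilon-greedy action selection). This is illustrative code for the generic technique, not Agent57's actual exploration machinery:

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.01):
    """Return an action index: random with probability epsilon, else greedy.

    `q_values` is a list of estimated action values for the current state.
    Illustrative sketch of the generic technique, not Agent57's code.
    """
    if random.random() < epsilon:
        # Explore: pick a uniformly random action.
        return random.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Fancier alternatives (count-based bonuses, learned intrinsic rewards, and so on) replace exactly this kind of default, at a cost in developer effort and compute.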

That is, this doesn't look like a radical new competitor to DQN; this looks like DQN after five years of iteration and testing, where some defaults are swapped to fancier settings and others aren't, and thus is updating me a little upward on the importance of engineering and data as an input factor.

They say this isn't the end of Atari research:

This by no means marks the end of Atari research, not only in terms of data efficiency, but also in terms of general performance. We offer two views on this: firstly, analyzing the performance among percentiles gives us new insights on how general algorithms are. While Agent57 achieves strong results on the first percentiles of the 57 games and holds better mean and median performance than NGU or R2D2, as illustrated by MuZero, it could still obtain a higher average performance. Secondly, all current algorithms are far from achieving optimal performance in some games. To that end, key improvements to use might be enhancements in the representations that Agent57 uses for exploration, planning, and credit assignment.
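To make the percentile framing concrete, here is a minimal sketch (mine, with made-up numbers, not from the paper) of why mean, median, and bottom-percentile summaries of human-normalized scores can tell different stories about how general an agent is:

```python
import numpy as np

# Hypothetical human-normalized scores for a handful of games (made-up numbers,
# purely to illustrate how the summary statistics can disagree; 1.0 = human baseline).
scores = np.array([45.0, 12.0, 1.3, 1.1, 0.9, 0.2])

print("mean:           ", scores.mean())             # dominated by a few games the agent crushes
print("median:         ", np.median(scores))         # performance on a typical game
print("5th percentile: ", np.percentile(scores, 5))  # performance on the agent's worst games
```

A high mean can coexist with near-zero scores on the hardest games, which is exactly what looking at the bottom percentiles is meant to catch.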

5 comments

I'm going to parrot a comment from the Hacker News discussion on this:

This whole evolution looks more and more like expert systems from the 1980s, where people kept adding more and more complexity to "solve" a specific problem. For RL, we started with simple DQN, which was elegant, but the new algorithms now look like a massive hodgepodge of band-aids. NGU, as it is, is extraordinarily complex and looks like an ad-hoc mix of various patches. Now, on top of NGU, we are also throwing in a meta-controller and even bandits, among other things, to complete the proverbial kitchen sink. Sure, we get to call victory on Atari, but this is far away from elegant and beautiful. It would be surprising if this victory generalizes to other problems where folks have built different "expert systems" specific to those problems. So all this feels a lot like a Watson-winning-Jeopardy moment to me...

PS: Kudos to DeepMind for pushing for the median, or even better the bottom percentile, instead of a simplistic average metric, which also hides variance.

A lot of techniques had to be thrown together to make this work, and in that sense it reminds me of Rainbow DQN, which likewise combined a bunch of existing pieces to solve the problem. However, a quick glance at the tables in Appendix H.4 of the paper makes it hard to tell whether this is really much of an improvement over the other Atari agents DeepMind has put together.

1. Meta: Why does this have the coronavirus marker?[1] (I could understand if there was a comment or the OP said "and this can help with developing a vaccine by ...," but that doesn't seem to be the case.)

2. Specific: It would be interesting to see the different approaches all compared to each other, in terms of compute, cost, performance, etc., so it's clear when "a better algorithm has been found" versus when "more compute was used".[2] I guess I'm interested in some basic credit assignment w.r.t. these different "AI" approaches.

[1] Any tips on how to add 'crossed out' formatting to this part of the text, if that changes, would be appreciated.

[2] If an algorithm doesn't work as well, stopping after a certain point makes sense - but it's nice to know if a later work does better, or is just 'more compute'.

1. Meta: Why does this have the coronavirus marker?*

I think that's just a mistake, and possibly a bug in some new code we made to make managing tags easier.

So the "2" after "Coronavirus" indicated that "Coronavirus" is the 2nd tag, not that some algorithm determined it's relevance to that subject is "2"?

The "2" means that it has 2 "relevance". You can upvote or downvote a given post's tag-relevance to determine the sort-order on the tag page.