This whole evolution looks more and more like the expert systems of the 1980s, where people kept adding more and more complexity to "solve" a specific problem. For RL, we started with simple DQN, which was elegant, but the new algorithms now look like a massive hodgepodge of band-aids. NGU by itself is extraordinarily complex and looks like an ad hoc mix of various patches. Now, on top of NGU, we are also throwing in a meta-controller and even bandits, among other things, to complete the proverbial kitchen sink. Sure, we get to declare victory on Atari, but this is far from elegant and beautiful. It would be surprising if this victory generalized to other problems, where folks have built different "expert systems" specific to those problems. So all this feels a lot like Watson's Jeopardy! moment to me...

PS: Kudos to DeepMind for pushing for the median, or better yet a bottom percentile, instead of the simplistic average metric, which also hides variance.
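For anyone wondering why the choice of aggregate matters, here is a minimal Python sketch. It assumes the usual Atari convention of human-normalized scores, (agent - random) / (human - random), so 0 means random play and 1 means human level. The ten scores are made up for illustration and are not taken from the paper; `percentile` is a hypothetical nearest-rank helper, not any particular library's function.

```python
import math
import statistics

def human_normalized(agent: float, random_score: float, human: float) -> float:
    """Map a raw game score so that random play -> 0.0 and human level -> 1.0."""
    return (agent - random_score) / (human - random_score)

def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100 (hypothetical helper)."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

def summarize(normalized: list[float]) -> dict[str, float]:
    """Aggregate per-game human-normalized scores three different ways."""
    return {
        "mean": statistics.mean(normalized),
        "median": statistics.median(normalized),
        "p10": percentile(normalized, 10),
    }

# Hypothetical human-normalized scores on ten games (illustrative only):
# a few huge wins plus a couple of games stuck near random play.
scores = [20.0, 15.0, 8.0, 5.0, 3.0, 2.0, 1.5, 1.2, 0.1, 0.0]
summary = summarize(scores)
print(summary)
```

With these made-up numbers the mean lands around 5.6x human, the median is 2.5, and the 10th percentile is 0.0: two big wins inflate the average while the bottom percentile exposes the games the agent never learned.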

A lot of techniques had to be thrown together to make this work, and in that sense it reminds me of Rainbow DQN, which likewise combined a bunch of existing improvements to solve the problem. However, a quick glance at the tables in Appendix H.4 of the paper makes it hard to tell whether this is really much of an improvement over the other Atari agents DeepMind has put together.

Pattern

1. Meta: Why does this have the coronavirus marker?[1] (I could understand it if there were a comment, or if the OP had said "and this can help with developing a vaccine by ...," but that doesn't seem to be the case.)

2. Specific: It would be interesting to see the different approaches compared against each other in terms of compute, cost, performance, etc., so it's clear when "a better algorithm has been found" versus when "more compute was used".[2] I guess I'm interested in some basic credit assignment w.r.t. these different "AI" approaches.

[1] Any tips on how to strike through this part of the text, if that changes, would be appreciated.

[2] If an algorithm doesn't work as well, stopping after a certain point makes sense, but it's nice to know whether a later work actually does better or just uses more compute.