[ Question ]

Does there exist an AGI-level parameter setting for modern DRL architectures?

by TurnTrout · 1 min read · 9th Feb 2020 · 3 comments



Suppose the architecture includes memory (in the form of a recurrent state) and will act as the policy network for an observation-based RL agent. Evaluating the agent from a reasonable initial state, would you guess that, for current architectures, there exists a parameter setting with robustly human+ capabilities?

How many parameters would it take before you estimate there's a fifty-fifty chance of such a parameter setting existing? 1 billion? 1 trillion? More?


2 Answers

Yes. Modelspace is huge and we're only exploring a smidgen. The busy beaver sequence hints at how much you can do with a small number of parts and exponential luck. I think feeding a random number generator into a compiler could theoretically have spawned an AGI in the eighties. Given a memory tape, transformers (and much simpler architectures) are Turing-complete. Even if all my reasoning is wrong, can't the model just be hardcoded to output instructions on how to write an AGI?
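As a toy illustration of the "random number generator into a compiler" point, here is a minimal sketch of blind random search over programs. Everything in it is an assumption made for illustration: the three-instruction language (`inc`, `dbl`, `neg`), the `run` and `random_search` helpers, and the target function are hypothetical and say nothing about how hard the search would be for anything AGI-like.

```python
import random
from typing import Callable, Iterable, List, Optional

# Hypothetical three-instruction language acting on a single integer register.
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}


def run(program: List[str], x: int) -> int:
    """Execute the instructions in order on the register value x."""
    for op in program:
        x = OPS[op](x)
    return x


def random_search(
    spec: Callable[[int], int],
    inputs: Iterable[int],
    max_len: int = 4,
    max_tries: int = 10_000,
    seed: int = 0,
) -> Optional[List[str]]:
    """Sample random programs until one matches spec on every test input."""
    rng = random.Random(seed)
    names = list(OPS)
    test_inputs = list(inputs)
    for _ in range(max_tries):
        length = rng.randint(1, max_len)
        program = [rng.choice(names) for _ in range(length)]
        if all(run(program, x) == spec(x) for x in test_inputs):
            return program
    return None


# Target behaviour (an assumption for the demo): f(x) = 2x + 2.
found = random_search(lambda x: 2 * x + 2, inputs=range(-5, 6))
```

The number of candidate programs grows as 3^length here, so the luck required grows exponentially with program size. That is the sense in which a working program can exist in the space long before any search procedure is likely to find it.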

Going out on a limb (and I might change my mind next week), I would say "no" if we're restricted to current popular mainstream DRL techniques, because these lack (1) foresight (i.e., running a generative model to predict the result of different possible courses of action, and choosing on the basis of the results), and (2) analysis-by-synthesis (processing inputs by continually running searches through a space of generative models to find the model that best matches that input). I think humans do both, and without both (among other requirements), I picture systems as sorta more like "operating on instinct" rather than "intelligent".

So (in my mind), your question becomes "can we get 'robustly human+ capabilities' from a system operating on instinct?", and the answer is "obviously yes, when restricted to any finite set of tasks in any finite set of situations", e.g. AlphaStar. With enough parameters, the set of tasks and situations could get awfully large, and maybe that counts as "robustly human+", just as a large enough Giant Lookup Table might count as "robustly human+". But my hunch is that systems with foresight and analysis-by-synthesis will be "robustly human+" earlier than any systems that operate on instinct.
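To make the "instinct" versus "foresight" distinction concrete, here is a minimal sketch on a toy one-dimensional world. All names (`reactive_policy`, `foresight_policy`, `step`, `reward`, `GOAL`) are hypothetical, and the world model is handed to the agent rather than learned, which is a big simplification of what I mean by running a generative model.

```python
from typing import Callable, Dict, List

State = int
Action = int


def reactive_policy(table: Dict[State, Action], state: State, default: Action = 1) -> Action:
    """'Instinct': a fixed state -> action lookup with no lookahead."""
    return table.get(state, default)


def foresight_policy(
    model: Callable[[State, Action], State],
    reward: Callable[[State], float],
    state: State,
    actions: List[Action],
    depth: int,
) -> Action:
    """'Foresight': simulate each action with the model and pick the best."""

    def value(s: State, d: int) -> float:
        # Reward of s plus the best achievable reward over d further steps.
        if d == 0:
            return reward(s)
        return reward(s) + max(value(model(s, a), d - 1) for a in actions)

    return max(actions, key=lambda a: value(model(state, a), depth - 1))


# Toy world (assumed for illustration): integer positions, goal at 3.
GOAL = 3
ACTIONS = [-1, 1]


def step(s: State, a: Action) -> State:
    return s + a


def reward(s: State) -> float:
    return -abs(s - GOAL)


# The lookup table only covers the situations it was "trained" on.
table = {0: 1, 1: 1, 2: 1}

seen = reactive_policy(table, 0)                          # covered state
unseen = reactive_policy(table, 5)                        # falls back to the default
planned = foresight_policy(step, reward, 5, ACTIONS, 2)   # looks two steps ahead
```

On the covered state both approaches move toward the goal. On the unseen state 5 the lookup table falls back to its default and walks away from the goal, while the planner simulates both options and moves back toward it. A bigger table fixes that particular state, which is the Giant Lookup Table point above.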