Will scaling deep learning produce human-level generality, or do we need a new approach? You may have read the exchange between Scott Alexander and Gary Marcus and felt that there were some good arguments on both sides, some bad ones, but few arguments that go beyond analogy and handwaving: arguments that take what we know about deep learning and about intelligence and look at what that knowledge implies. If you haven't read the exchange, here it is: SA, GM, SA, GM.

I will argue for Marcus' position, but dive a little deeper than he does. I believe that symbolic representations, specifically programs, and learning as program synthesis can provide data-efficient and flexible generalization in a way that deep learning can't, no matter how much we scale it. I'll show how probabilistic programs can represent causal models of the world, something deep learning can't do, and why causal models are essential to intelligence. But I'll start by examining the opposing view: that scaling deep learning is sufficient for general intelligence. To that end, I'll quote from Gwern's thorough essay on the scaling hypothesis.
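To give a taste of what "programs as causal models" means before the main argument, here is a minimal sketch of my own, assuming a toy rain/sprinkler/wet-grass example; the variable names and probabilities are illustrative, not part of the exchange. The point is only that a causal model can be written as an ordinary generative program, and that an intervention amounts to overwriting the line that samples a variable rather than conditioning on its observed value.

```python
# A minimal sketch, assuming a toy causal model of wet grass.
# Names and probabilities are illustrative, not from any specific source.
import random

def wet_grass_model(intervene_sprinkler=None):
    """Sample once from a toy causal model.

    Passing intervene_sprinkler=True/False simulates an intervention
    (a do-operation): we overwrite the sprinkler's value instead of
    sampling it from its usual cause (the weather).
    """
    rain = random.random() < 0.2                        # exogenous cause
    if intervene_sprinkler is None:
        sprinkler = random.random() < (0.01 if rain else 0.4)
    else:
        sprinkler = intervene_sprinkler                 # do(sprinkler := x)
    wet_grass = rain or (sprinkler and random.random() < 0.9)
    return {"rain": rain, "sprinkler": sprinkler, "wet_grass": wet_grass}

# Crude Monte Carlo estimate of P(wet_grass | do(sprinkler = True)).
samples = [wet_grass_model(intervene_sprinkler=True) for _ in range(10_000)]
print(sum(s["wet_grass"] for s in samples) / len(samples))
```

A model trained only to predict observations can answer "how wet is the grass when the sprinkler happens to be on?", but the program above also answers "what happens if I turn the sprinkler on?", which is the kind of question the rest of this post cares about.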
