Disclaimer: This post isn't intended to convince or bring anyone closer to a certain position on AI. It is merely - as the title suggests - for the record.
I would like to publicly record my prediction about the prospects of artificial superintelligence, consisting of a weak and a strong thesis:
Weak thesis: Current deep learning paradigms will not be sufficient to create an artificial superintelligence.
You could call this the Anti-"scaling-maximalist"-thesis, except that it goes quite a bit further by including possible future deep learning architectures. Of course, "deep learning" is doing a lot of work here, but as a rule of thumb, I would consider a new architecture to fit within the DL paradigm if it involves a very large number (at least in the millions) of randomly initialized parameters that are organized into largely uniform layers and are updated via a simple algorithm like backpropagation.
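To make that rule of thumb concrete, here is a minimal sketch (my own illustration, not from the post) of a system that fits the paradigm: randomly initialized parameters organized into uniform layers, updated by backpropagation. It is scaled far below the "millions of parameters" threshold purely for readability.

```python
import numpy as np

# Toy illustration of the rule of thumb: randomly initialized
# parameters in uniform layers, updated by a simple algorithm
# (backpropagation). Real DL systems have millions+ of parameters;
# this is scaled down for readability.
rng = np.random.default_rng(0)

# Two uniform, fully connected layers with random initialization.
W1 = rng.normal(0.0, 0.5, size=(2, 8))
W2 = rng.normal(0.0, 0.5, size=(8, 1))

# XOR: a tiny task that a single linear layer cannot represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

out0 = sigmoid(np.tanh(X @ W1) @ W2)  # predictions before training

for _ in range(5000):
    # Forward pass through the two layers.
    h = np.tanh(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: gradients of the squared error, via backpropagation.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    # Parameter update: plain gradient descent.
    W2 -= 0.5 * (h.T @ d_out)
    W1 -= 0.5 * (X.T @ d_h)

print(out.ravel())  # predictions after training
```

Note that the thesis does not hinge on this toy: the point is only that the ingredients (random init, uniform layers, gradient updates) are what place an architecture inside or outside the paradigm.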
I intentionally use the word "superintelligence" here because "AGI" or "human-level intelligence" have become rather loaded terms, the definitions of which are frequently a point of contention. I take superintelligence to mean an entity so obviously powerful and impressive in its accomplishments that all debates around its superhuman nature should be settled instantly. Feats like building a Dyson Sphere, launching intergalactic colony ships at 0.9c, designing and deploying nanobots that eat the whole biosphere, etc. Stuff like passing the Turing Test or achieving Level 5 self-driving decidedly does not qualify (not that I think these challenges will be easy; but I want to make the demarcation blatantly clear).
(the downside is that we may never live to witness such an ASI in case it turns out to be UnFriendly, but by then earning some reputation points on the internet will be the least of my concerns)
Strong thesis: If and when the first ASI is built, it will not use deep learning as one of its components.
For instance, if the ASI uses a few CNN layers to pre-process visual inputs, or some autoencoder system to distill data into latent variables, that's already enough to refute the strong thesis. On the other hand, merely running on hardware whose design was assisted by DL systems does not disqualify the ASI from being DL-free.
For future reference, here is the context in which this prediction was made:
It is the beginning of 2023, 1.5 months after the release of ChatGPT and 5 months after the release of Stable Diffusion, both of which have renewed hype around deep learning, and Transformer-based models in particular. Rumors that the yet-to-be-released GPT-4 represents a big leap in capabilities are floating about. The scaling-maximalist position has gained a lot of traction both inside and outside the community and may even represent the mainstream opinion on LW. Broadly speaking, timeline predictions are short (AGI by 2030-35 seems to be the consensus), and even extremely short timelines (~2025) are not unheard of.
The title is still wildly overconfident. There are no ASIs in the set of deep learning programs? If someone believes that, they don't understand deep learning - deep learning could be said to be the biology of software engineering.
Personally, I believe the strong inverse of this: there are absolutely no intelligent systems which are not a special case of deep learning, because deep learning is the anti-spotlight-effect of AI. Connectionism is the perspective that we don't need to constrain types of connections, because statistical understanding will be able to generate an understanding of structural connectivity. Which means that even if the structural connectivity has enormous amounts of learned complexity, even if you already had another system with more useful priors than mere connectionism, it still counts as connectionism! The point here is not merely that everything is a special case of everything else, or something similarly vacuous, but that the additional constraints, while useful, would need to import the core basis of deep learning: adjustment of large networks of interactions as the key basis for intelligence. It is the biology of software engineering because it allows us to talk about the shape of any program in terms of its manifold of behavior.
Yes, there's a lot more an AI could use to shape its manifold of behavior than mere gradient descent. But as long as you have a need to do Bayesian inference, you'll have something that looks like gradient descent in networks.
Edit: not necessarily backprop, though.