In 2008, Eliezer Yudkowsky was strongly critical of neural networks. From his post "Logical or Connectionist AI?":

Not to mention that neural networks have also been "failing" (i.e., not yet succeeding) to produce real AI for 30 years now.  I don't think this particular raw fact licenses any conclusions in particular.  But at least don't tell me it's still the new revolutionary idea in AI.

This is the original example I used when I talked about the "Outside the Box" box - people think of "amazing new AI idea" and return their first cache hit, which is "neural networks" due to a successful marketing campaign thirty goddamned years ago.  I mean, not every old idea is bad - but to still be marketing it as the new defiant revolution?  Give me a break.

By contrast, in Yudkowsky's 2023 TED Talk, he said:

Nobody understands how modern AI systems do what they do. They are giant, inscrutable matrices of floating point numbers that we nudge in the direction of better performance until they inexplicably start working. At some point, the companies rushing headlong to scale AI will cough out something that's smarter than humanity. Nobody knows how to calculate when that will happen. My wild guess is that it will happen after zero to two more breakthroughs the size of transformers.

Sometime between 2014 and 2017, I remember reading a discussion in a Facebook group where Yudkowsky expressed skepticism toward neural networks. (Unfortunately, I don't remember what the group was.)

As I recall, he said that while the deep learning revolution was a Bayesian update, he still didn't believe neural networks were the royal road to AGI. I think he said that he leaned more towards GOFAI/symbolic AI (but I remember this less clearly). 

I've combed a bit through Yudkowsky's published writing, but I have a hard time tracking when, how, and why he changed his view on neural networks. Can anyone help me out?

This post exists only for archival purposes.


2 Answers

Michael Tontchev


Am I the only one reading the first passage as him being critical of the advertising of NNs, rather than of NNs themselves?

Partially, but it is still true that Eliezer was critical of NNs at the time; see the comment on the post:

I'm no fan of neurons; this may be clearer from other posts.


Eliezer has never denied that neural nets can work (and he provides examples in that linked post of NNs working). Eliezer's principal objection was that NNs were inscrutable black boxes which would be insanely difficult to make safe enough to entrust humanity-level power to compared to systems designed to be more mathematically tractable from the start. (If I may quip: "The 'I', 'R', & 'S' in the acronym 'DL' stand for 'Interpretable, Reliable, and Safe'.")

This remains true - for all the good work on NN interpretability, assisted by the surprising levels of linearity inside them, NNs remain inscrutable. To quote Neel Nanda the other day (who has overseen quite a lot of the interpretability research that anyone replying to this comment might be tempted to cite):

Oh man, I do AI interpretability research, and we do not know what deep learning neural networks do. An fMRI scan style thing is nowhere near knowing how it works.

What Eliezer (and I, and pretty much every other LWer at the time who spent any time looking at neural nets) got wrong about neural nets, and has admitted as much, is the timing. (Aside from that, Ms Lincoln...)

To expand a bit on the backstory I also discus...

I mean, isn't that what we have? It seems to me that, at least relative to what we expected, LLMs have turned out more human-like, more oracle-like than we imagined? Maybe that will change once we use RL to add planning.

LLMs have turned out more human-like, more oracle-like than we imagined?

They have turned out far more human-like than Amodei suggested, which means they are not even remotely oracle-like. There is nothing in an LLM which is remotely like 'looking things up in a database and doing transparent symbolic-logical manipulations'. That's about the last thing that describes humans too - it takes decades of training to get us to LARP as an 'oracle', and we still do it badly. Even the stuff LLMs do which seems to be transparent, like inner-monologue, is actually just more Bayesian meta-RL agentic behavior, where the inner-monologue is a mish-mash of amortized computation and task location, and where the model is flexibly using the roleplay as hints rather than doing what everyone seems to think it does, which is turning into a little Turing machine mindlessly executing instructions (hence, e.g., the ability to distill inner-monologue into the forward pass, or to insert errors into few-shot examples or the monologue and still get correct answers).

I see what you mean, but I meant oracle-like in the sense of my recollection of Nick Bostrom's usage in Superintelligence, e.g. an AI that only answers questions and does not act. In some sense, it's a measure of how much it's not an agent. It does seem to me that pretrained LLMs are not very agent-like by default. They are currently constrained to question answering by default, although that's changing fast with things like Toolformer. It kind of sounds like you are saying that they have a lot of agentic capability, but they are hampered by the lack of memory/planning. If your description here is predictive, then it seems there may be a lot of low-hanging agentic behaviour that can be unlocked fairly easily. Like many other things with LLMs, we just need to "ask it properly", perhaps using some standard RL techniques like world models. Do you see the properties/danger of LLMs changing once we start using RL to make them into proper agents (not just the few-step chat)?

I don't know what this was a reference to, but amusingly I just noticed that the video I wanted to link was a 2007 lecture at Google by him (if it's the same Geoffrey Hinton):

In it he explained a novel approach to handwriting recognition: stack a bunch of increasingly small layers on top of each other until you have just a few dozen neurons, then put an inverted pyramid on top of this bottleneck, and train the network by feeding it a lot of handwritten characters, using some sort of modified gradient descent to train it to reproduce the input image in the topmost layer as accurately as possible. After the network is trained, use supervised learning with labeled data to train a usual small NN to interpret/map the bottleneck layer activations to characters. And it worked!

I find it interesting, especially in the context of your comment, because:

* Unsupervised learning meant that you could feed it a LOT of data.
* So you could force it to learn patterns in the data without overfitting.
* It appeared uninspired by any natural neuronal organization.
* It was a precursor to Deep Dream - you could run the network in reverse and see what it imagines when prompted with a specific digit.
* It actually worked, and basically solved handwriting recognition, as far as I understand.

And so it felt like a first qualitative leap in technology in decades, and a very impressive one at that, innovating in weird and unexpected ways in several aspects. Sure, it would be another ten years until GPT-2, but some promise was definitely there, I think.
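The bottleneck recipe described in the comment above can be sketched roughly as follows. This is a minimal NumPy illustration under loud assumptions, not a reconstruction of the lecture's method: the toy 4x4 "bar" images, the layer sizes (including a 2-unit bottleneck rather than a few dozen neurons), the plain full-batch gradient descent, and the logistic-regression readout standing in for the "usual small NN" are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for handwritten characters (an assumption, not real data):
# noisy 4x4 images of a vertical bar (class 0) or a horizontal bar (class 1).
vertical = np.zeros((4, 4)); vertical[:, 1] = 1.0
horizontal = np.zeros((4, 4)); horizontal[2, :] = 1.0
prototypes = np.stack([vertical.ravel(), horizontal.ravel()])
labels = np.repeat([0, 1], 100)
images = prototypes[labels] + 0.1 * rng.standard_normal((200, 16))

# Layers shrink to a 2-unit bottleneck, then mirror back out (the "inverted pyramid").
sizes = [16, 8, 2, 8, 16]
weights = [rng.standard_normal((m, n)) / np.sqrt(m) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
TANH_LAYERS = (0, 2)  # the bottleneck (layer 1) and the output (layer 3) stay linear

def forward(x):
    """Return the activations of every layer; index 2 is the bottleneck code."""
    acts = [x]
    for i, (w, b) in enumerate(zip(weights, biases)):
        z = acts[-1] @ w + b
        acts.append(np.tanh(z) if i in TANH_LAYERS else z)
    return acts

def reconstruction_loss(x):
    """Mean over images of half the squared reconstruction error."""
    return float(np.mean(np.sum((forward(x)[-1] - x) ** 2, axis=1)) / 2)

def train_step(x, lr=0.02):
    """One step of plain backpropagation on the reconstruction loss."""
    acts = forward(x)
    delta = (acts[-1] - x) / len(x)  # gradient w.r.t. the linear output layer
    for i in reversed(range(len(weights))):
        grad_w = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            delta = delta @ weights[i].T  # gradient w.r.t. this layer's input
            if i - 1 in TANH_LAYERS:      # that input came out of a tanh layer
                delta = delta * (1 - acts[i] ** 2)
        weights[i] -= lr * grad_w
        biases[i] -= lr * grad_b

# Stage 1 (unsupervised): learn to reproduce the input through the bottleneck.
initial_loss = reconstruction_loss(images)
for _ in range(2000):
    train_step(images)
final_loss = reconstruction_loss(images)

# Stage 2 (supervised): map standardized bottleneck codes to labels.
codes = forward(images)[2]
codes = (codes - codes.mean(axis=0)) / codes.std(axis=0)
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(codes @ w + b)))
    w -= 0.5 * codes.T @ (p - labels) / len(labels)
    b -= 0.5 * np.mean(p - labels)
accuracy = float(np.mean((codes @ w + b > 0) == (labels == 1)))

print(f"reconstruction loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"training-set accuracy from bottleneck codes: {accuracy:.2f}")
```

Note that the labels are used only in the second stage; the first stage sees nothing but the images themselves, which is the point the comment makes about being able to feed it a lot of unlabeled data. The accuracy is measured on the training set, so it says nothing about generalization.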
I guess the joke is not as well-known as I thought: (There's a better page of Hinton stories somewhere but I can't immediately refind it.)

Those statist AI doomers never miss a chance to bring I, R, and S into everything... More seriously, thanks for the history lesson!

Death, taxes, and war, you know - you may not be interested in I, R, or S, but they are interested in you.

Thomas Kwa


My guess is AlphaGo. I once heard someone who worked at MIRI say that they watched the event and that Eliezer was surprised by it.

Yes, I seem to remember him writing about it at the time, too. Not big posts, more public comments and short posts, not sure exactly where.

The invention of transformers circa 2017 would be the next time I remember a similar shift.

In retrospect, AlphaZero was really the wake-up call for me, not because it was so strong at chess but because it looked so human playing chess.

2 comments

Here is Yudkowsky (2008), "Artificial Intelligence as a Positive and Negative Factor in Global Risk":

Friendly AI is not a module you can instantly invent at the exact moment when it is first needed, and then bolt on to an existing, polished design which is otherwise completely unchanged.

The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.

The most powerful current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem—which proved very expensive to fix, though not global-catastrophic—analogously arose from failing to foresee tomorrow’s design requirements. The nightmare scenario is that we find ourselves stuck with a catalog of mature, powerful, publicly available AI techniques which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.

Another 15 years didn't make the idea any newer. Being critical of invalid perception or presentation of an idea is more specific and different from "being critical" of the idea. Pointing out that the idea doesn't clarify specific confusions about why some processes work is different from the idea not referring to machines that make those processes work.

Similarly, forecasting that it won't work in some timeframe is more specific, and there does seem to have been a change of mind on that (as facts on the ground demand). But the linked post doesn't seem particularly relevant: there don't appear to be claims to that effect there, other than on the level of vibes.