Am I the only one reading the first passage as him being critical of the advertising of NNs, rather than of NNs themselves?
Partially, but it is still true that Eliezer was critical of NNs at the time; see the comment on the post:
I'm no fan of neurons; this may be clearer from other posts.
Eliezer has never denied that neural nets can work (and he provides examples in that linked post of NNs working). Eliezer's principal objection was that NNs were inscrutable black boxes which would be insanely difficult to make safe enough to entrust humanity-level power to compared to systems designed to be more mathematically tractable from the start. (If I may quip: "The 'I', 'R', & 'S' in the acronym 'DL' stand for 'Interpretable, Reliable, and Safe'.")
This remains true - for all the good work on NN interpretability, assisted by the surprising levels of linearity inside them, NNs remain inscrutable. To quote Neel Nanda the other day (who has overseen quite a lot of the interpretability research that anyone replying to this comment might be tempted to cite):
Oh man, I do AI interpretability research, and we do not know what deep learning neural networks do. An fMRI scan style thing is nowhere near knowing how it works.
What Eliezer (and I, and pretty much every other LWer at the time who spent any time looking at neural nets) got wrong about neural nets, and has admitted as much, is the timing. (Aside from that, Ms Lincoln...)
To expand a bit on the backstory I also discus...
LLMs have turned out more human-like, more oracle-like than we imagined?
They have turned out far more human-like than Amodei suggested, which means they are not even remotely oracle-like. There is nothing in an LLM which is remotely like 'looking things up in a database and doing transparent symbolic-logical manipulations'. That's about the last thing that describes humans too - it takes decades of training to get us to LARP as an 'oracle', and we still do it badly. Even the stuff LLMs do, like inner-monologue, which seems to be transparent, is actually just more Bayesian meta-RL agentic behavior, where the inner-monologue is a mish-mash of amortized computation and task location where the model is flexibly using the roleplay as hints rather than what everyone seems to think it does, which is turn into a little Turing machine mindlessly executing instructions (hence e.g. the ability to distill inner-monologue into the forward pass, or insert errors into few-shot examples or the monologue and still get correct answers).
My guess is AlphaGo. I once heard someone who worked at MIRI say that they watched the event and Eliezer was surprised by it.
Yes, I seem to remember him writing about it at the time, too. Not big posts, more public comments and short posts, not sure exactly where.
The invention of transformers circa 2017 would be the next time I remember a similar shift.
In retrospect, AlphaZero was really the wake-up call for me, not because it was so strong at chess but because it looked so human playing chess.
Here is Yudkowsky (2008), "Artificial Intelligence as a Positive and Negative Factor in Global Risk":
Friendly AI is not a module you can instantly invent at the exact moment when it is first needed, and then bolt on to an existing, polished design which is otherwise completely unchanged.
The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.
The most powerful current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem—which proved very expensive to fix, though not global-catastrophic—analogously arose from failing to foresee tomorrow’s design requirements. The nightmare scenario is that we find ourselves stuck with a catalog of mature, powerful, publicly available AI techniques which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.
Another 15 years didn't make the idea any newer. Being critical of invalid perception or presentation of an idea is more specific and different from "being critical" of the idea. Pointing out that the idea doesn't clarify specific confusions about why some processes work is different from the idea not referring to machines that make those processes work.
Similarly, forecasting that it won't work in some timeframe is more specific, and there does seem to have been a change of mind on that (as facts on the ground demand). But the linked post doesn't seem particularly relevant: there don't appear to be claims to that effect there, other than on the level of vibes.
In 2008, Eliezer Yudkowsky was strongly critical of neural networks. From his post "Logical or Connectionist AI?":
By contrast, in Yudkowsky's 2023 TED Talk, he said:
Sometime between 2014 and 2017, I remember reading a discussion in a Facebook group where Yudkowsky expressed skepticism toward neural networks. (Unfortunately, I don't remember what the group was.)
As I recall, he said that while the deep learning revolution was a Bayesian update, he still didn't believe neural networks were the royal road to AGI. I think he said that he leaned more towards GOFAI/symbolic AI (but I remember this less clearly).
I've combed a bit through Yudkowsky's published writing, but I have a hard time tracking when, how, and why he changed his view on neural networks. Can anyone help me out?
This post exists only for archival purposes.