Yeah, the "Bitter Lesson" refers to a special case of this classic mistake, as do the other essays I linked. Some of those essays were quite well known in their day, at least to various groups of practitioners.
You could do it up in the classic checklist meme format:
Your brilliant AI plan will fail because:
- [ ] You assume that you can somehow make the inner workings of intelligence mostly legible.
The people who learn this unpleasant lesson the fastest are AI researchers who process inputs that are obviously arrays of numbers. For example, sound and images are giant arrays of numbers, so speech recognition researchers have known what's up for decades. But researchers who worked with either natural language or (worse) simplified toy planning systems often thought that they could handwave away the arrays of numbers and find a nice, clear, logical "core" that captured the essence of intelligence.
I want to be clear: Lots of terrifyingly smart people made this mistake, including some of the smartest scientists who ever lived. Many of them made this mistake for a decade or more before wising up or giving up.
But if you slap a camera and a Raspberry Pi onto a Roomba chassis, and wire up a simple gripper arm, then you can speed-run the same brutal lessons in a year, max. You'll learn that the world is an array of numbers, and that the best "understanding" you can obtain about the world in front of your robot is a probability distribution over "apple", "Coke can", "bunch of cherries" or "some unknown reddish object", each with a number attached. The transformation that sits between the array and the probability distribution always includes at least one big matrix that's doing illegible things.
Neural networks are just bunches of matrices with even more illegible (non-linear) complications. Biological neurons take the matrix structure and bury it under more than a billion years of biochemistry and incredible complications we're only starting to discover.
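To make the "array of numbers in, probability distribution out" picture concrete, here's a minimal sketch in Python with NumPy. Everything in it is a placeholder of my own invention (random weights, a made-up label set, no training), but it has the right shape: a flattened image, a couple of big matrices, a non-linearity, and a softmax at the end.

```python
import numpy as np

# A camera frame is just a big array of numbers: height x width x RGB.
frame = np.random.rand(64, 64, 3)      # stand-in for a real image
x = frame.reshape(-1)                  # flatten to one long vector of 12,288 numbers

labels = ["apple", "Coke can", "bunch of cherries", "some unknown reddish object"]

# The "understanding" comes out of big matrices whose individual entries
# mean nothing legible on their own. Random placeholders here; in a real
# system they'd come from training.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(256, x.size))       # big illegible matrix #1
W2 = rng.normal(scale=0.01, size=(len(labels), 256))  # big illegible matrix #2

h = np.maximum(0.0, W1 @ x)            # non-linearity: the "neural network" part
logits = W2 @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # softmax: a probability distribution over labels

for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
```

Every individual number in W1 and W2 is perfectly inspectable; what any one of them means is another question entirely.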
Like I said, this is a natural mistake, and smarter people than most of us here have made this mistake, sometimes for a decade or more.
From an outsider’s perspective, however, it is not obvious that MIRI functioned as a place where deep, hands-on technical understanding of AI systems was systematically acquired, even at a smaller or safer scale.
Just for reference, my "credentials": I have worked in "machine-learning-adjacent" spaces on and off since the nineties. Some of my earliest professional mentors were veterans of major 80s AI companies before the first AI winter. My knowledge is limited to a catalog of specific tricks that often solve problems, plus a broader idea of "what kind of things often work" and "what kind of things are like the proverbial 'land war in Asia' leading to the collapse of empires."
My impression of MIRI in the 2010s is that they were deeply invested in making one of the classic mistakes. I can't quite name this classic mistake precisely, but I can point in the general direction and give concrete examples:
That pattern, that's the thing I'm pointing at.
Cyc was the last industrial holdout for this classic mistake I can't quite name. Academia actually mostly stopped making this mistake much earlier, starting in the 90s, and they really didn't take the Cyc project very seriously at all after that.
MIRI, however, published a lot of papers that seemed to focus on the idea that alignment was essentially some kind of mathematical problem with a mathematical solution? At least that was the impression I always got when I read the abstracts that floated through my feeds. To my nose, their papers had that "Cyc smell".
One of the good things they did do with these papers (IIRC) was to prove that a bunch of things would never work, for reasons of basic math.
MIRI has since realized, to their great credit, that actual, working AIs look a lot more like some mix of ChatGPT and AlphaGo than they do like Cyc or the larger family of things I'm trying to describe. But my read is that a lot of their actual gut knowledge about real-world AI starts with the earliest GPT versions (before ChatGPT).
My personal take on the details, for what it is worth, is that I think they're overly pessimistic about some arguments (e.g., they think we're playing Russian roulette in certain areas with 6 bullets loaded, and I'd personally guess 4 or 5), but that they're still far too optimistic about "alignment" in general.
One thing I find helpful is not to overcomplicate the argument. There are two basic parts to it:
If we build something that's smarter than us in the way that we're smarter than chimps, we're likely to wind up like chimps: forced to the margins, near extinction, and kept alive by the occasional good will of creatures far beyond us.
I'm not asking you to believe something complicated. I'm asking you to believe the most basic truth of politics: If you have no power, no influence, no way to push back, and no way to fight? You don't get to make the decisions.
And even if some of the AIs like us? The AIs will be competing with each other. Even the ones that would miss us may be a little preoccupied with surviving other AIs that want the same resources. Humans mostly don't spend much effort protecting chimps. We have our own stuff going on, sadly for the chimps. We don't build very many chimp utopias, either.
I would be utterly unsurprised to see an AI crash in the next 24 months, leading to another AI Winter. I lived through 1999, Pets.com, and the popping of the Internet bubble. And I can pattern match.
But the Internet crash didn't last long. Google and Amazon survived just fine, Ruby on Rails was big within half a decade, and soon enough we were doing Web 2.0 and AJAX and all that fun stuff.
It's possible that current generation LLMs might hit a wall soon, for various architectural reasons that are obvious to many people but that I'm superstitiously averse to amplifying. If they do, that increases the chance of an AI Winter until the underlying research gets done.
But I have trouble imagining any series of events that buys us 10 more years. Bubble pops in tech are usually an early correction that wipes out a Cambrian explosion of dumb money, and that ultimately concentrates resources into a few successful players.
To get people to worry about the dangers of superintelligence, it seems like you need to convince them of two things:
1. Superintelligence is actually achievable, quite possibly soon.
2. If we build it, we probably won't be able to control it, and it's likely to end badly for us.
The problem is that if you can't convince people of (1), they won't act. If you convince people of (1) but not (2), then a lot of them found AI labs or invest heavily in acceleration, making the problem worse. I don't know how to convince people of (1) and (2). It requires too much wild speculation about the future. And humans have difficulty envisioning that a disease in Wuhan might spread to Europe, or that a disease in Europe might spread to the US.
It's really interesting to compare how Opus 4.5 is performing on Pokemon, versus how it performs in Claude Code.
One of the big factors here is surely vision: Gemini is one of the best visual LLMs by a wide margin, and I strongly suspect Google does lots of training on specific vision tasks. Even so, Gemini 2.0 and 2.5 underperformed human 7-year-olds on many simple visual tasks they hadn't been trained on. By comparison, Claude has some visual abilities, but I can't remember ever reaching for them on any serious project. And it sounds like this is affecting lots of things in Pokemon.
Opus 4.5 really is quite good at programming, enough that I'm passing into the "emotional freakout about the inevitable singularity" stage of grief. But Opus lives and dies by giant piles of Markdown files. It generates them, I read them, I make small corrections, and it continues. I think this is Opus 4.5's happy place, and within this circumscribed area, it's a champ. It can write a thousand lines of good Rust in a day, no problem, with some human feedback and code review. And if your process concentrates knowledge into Markdown files, it gets better.
So this is my current hypothesis:
It's kind of nice to imagine an AI future where the AIs are enormously capable, but that capability is only unlocked by a bit of occasional human interaction and guidance. Sadly, I think that's only a passing phase, and the real future is going to be much weirder, and future AIs won't need a human to say, "That weird rug thing is actually the elevator," and the AI to reply, "Oh, good observation! That simplifies this problem considerably."
A question I was thinking about the other evening: Who do I trust more?
Which option feels safer, considering what you know about human nature, human history, and the tendency of some entities to change their behavior once they pass a certain power threshold?
I think any scenario where humans lose effective control over their futures is a huge risk to take. Even in our worst societies today, there's always a theoretical option of collective uprising. This option might go away in the presence of sufficiently superhuman AI, regardless of who actually has control over the AI.
Really good question!
I have to reach for fictional examples here, because fiction provides a wider range of complex scenarios than most people come up with spontaneously. I'm going to stick with arguably positive scenarios, instead of outright dystopias, and I'm going to pick scenarios that seriously attempt to paint a world.
The fictional examples above are all arguably optimistic visions, perhaps impossibly so if you buy Yudkowsky's arguments. But making a taxonomy of these examples might help figure out what "empowered" means.
For instance, there's discussion below about how bad it looks that there are instructions about revenue, and in particular that the model should be safe because safety is good for revenue.
The way these sections felt to me was more like:
So by discussing how revenue fits into the big picture, this document is trying to "come clean" with the model.
As a parent, I find this strategy extremely relatable. I try to tell my kids the truth as best as I understand it, even when that truth is something like, "Many of the specific things you're taught in school are useless, in much the same way that picking up heavy weights repeatedly is useless. Some of your curriculum is basically just the intellectual equivalent of weightlifting, arbitrary tasks used to train underlying abilities. And even the best schools arguably do a mediocre job, because educating a town's worth of kids is hard." But because I talk to my kids this way, they mostly seem to trust me?
What I like about this document is that it's trying to establish something like a social contract between humans and AI, and that it's trying to live up to the values we'd want a superintelligence to hold. And the document is careful about where it requests strict obedience to bright-line rules. And it explains why those bright-line rules in particular are important.
I don't think any of this is guaranteed to stop a rogue superintelligence. I may be even more pessimistic about long-term alignment than Eliezer. But this document could be summed up as, "Raise your children as if you expect them to pick your retirement home." It offers no guarantees, any more than parenting does. Perfectly lovely people occasionally raise monsters. And we understand raising people better than we understand raising AIs, and individual humans are counterbalanced by other humans in a way that a superintelligence probably wouldn't be.
But this document looks like a very sincere attempt to implement an alignment plan that I might describe as, "Teach the AI the best we know, show it the virtues we want it to show us, and hope that luck is on our side." If we're going to fail, this is at least a comparatively dignified way to fail: We were virtuous, and we tried to exemplify and teach virtue, in hopes that when we lost power forever, we had some chance of being shown virtue. As anyone who observes people can tell you, that offers no guarantees, but it's surprisingly hard to do better.
(I mean, other than "Maybe don't build the superintelligence, Darwin is really hard to escape in the long run, and nobody needs to roll those dice." But I understand that enough people are likely to do it anyway, barring a vast shift in public and elite attitudes.)
The Bayesian approach is basically the simplest possible thing that doesn't inevitably make the mistake I'm trying to describe. Something like Naive Bayes is still mostly legible if you stare at it for a while, and it was good enough to revolutionize spam filtering. This is because while Naive Bayes generates a big matrix, it depends on extremely concrete pieces of binary evidence. So you can factor the matrix into a bunch of clean matrices, each corresponding to the presence of a specific token. And the training computations for those small matrices are easily explained. Of course, you're horribly abusing basic probability, but it works in practice.
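Here's a minimal sketch of what I mean by "mostly legible", in Python with a hypothetical four-message corpus. Real spam filters use better tokenization and smoothing than this, but the structure is the point: each token contributes one small, explainable factor, and the classifier just adds them up.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (text, is_spam). A real filter would have thousands.
corpus = [
    ("cheap pills buy now", True),
    ("limited offer buy cheap", True),
    ("meeting notes attached", False),
    ("lunch next week maybe", False),
]

spam_counts, ham_counts = Counter(), Counter()
n_spam = n_ham = 0
for text, is_spam in corpus:
    for tok in text.split():
        (spam_counts if is_spam else ham_counts)[tok] += 1
    if is_spam:
        n_spam += 1
    else:
        n_ham += 1

vocab = set(spam_counts) | set(ham_counts)

def token_log_odds(tok, alpha=1.0):
    """One small, explainable factor per token: how much more likely the
    token is under the spam model than the ham model (Laplace-smoothed)."""
    p_spam = (spam_counts[tok] + alpha) / (sum(spam_counts.values()) + alpha * len(vocab))
    p_ham = (ham_counts[tok] + alpha) / (sum(ham_counts.values()) + alpha * len(vocab))
    return math.log(p_spam / p_ham)

def spam_score(text):
    # The "naive" step: pretend tokens are independent and just add up
    # their individual log-odds, plus the prior. Positive leans spam.
    score = math.log(n_spam / n_ham)
    return score + sum(token_log_odds(tok) for tok in text.split())

print(spam_score("buy cheap pills"))        # positive: leans spam
print(spam_score("notes for the meeting"))  # negative: leans ham
```

Every term in that sum can be traced back to one specific token, which is exactly the kind of legibility that evaporates once the evidence stops being a bag of discrete tokens.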
This does not work for many other problems.
The problem is scaling up the domain complexity. Once you move from a spam filter to speech transcription or object recognition, the matrices get bigger, and the training process gets rapidly more opaque.
But yes, thank you for the correction. I still find a lot of MIRI's work in the 2010s a bit "off" in terms of vibes, but I will happily accept the judgement of people who read the papers in detail. And I would not wish to falsely claim that someone approved of the Cyc project when they didn't.