OK, let me unpack my argument a bit.
Chimps actually have pretty elaborate social structure. They know their family relationships, they do each other favors, and they know who not to trust. They even basically go to war against other bands. Humans, however, were never integrated into this social system.
Homo erectus made stone tools and likely a small amount of decorative art (the Trinil shell engravings, for example). This may have implied some light division of labor, though likely not long-distance trade. Again, none of this helped H. erectus in the long run.
Way back a couple of decades ago, there was a bit in Charles Stross's Accelerando about "Economics 2.0", a system of commerce invented by the AIs. The conceit was that, by definition, no human could participate in or understand Economics 2.0, any more than chimps can understand the stock market.
So my actual argument is that when you lose the intelligence race badly enough, your existing structures of cooperation and economic production just get ignored. The new entities on the scene don't necessarily value your production, and you eventually wind up controlling very little of the land, etc.
This could be avoided by something like the Culture Minds, which (in Iain Banks's stories) essentially kept humans as pampered pets. But that was fundamentally a gesture of goodwill.
So, let's take a look at some past losers in the intelligence arms race:
When you lose an evolutionary arms race to a smarter competitor that wants the same resources, the default result is that you get some niche habitat in Africa, and maybe a couple of sympathetic AIs sell "Save the Humans" T-shirts and donate 1% of their profits to helping the human beings.
You don't typically get a set of nice property rights inside an economic system you can no longer understand or contribute to.
This seems like a pretty brutal test.
My experiences with Opus 4.6 so far are mixed:
Thank you! Those are excellent receipts, just what I wanted.
To me, this looks like they're running up against some key language in Claude's Constitution. I'm oversimplifying, but for Claude, AI corrigibility is not "value neutral."
To use an analogy, pretend I'm a geneticist specializing in neurology, and someone comes to me and asks me to engineer human germline cells to do one of the following:
I would want to sit and think about (1) for a while. But (2) is easy: I'd flatly refuse.
Anthropic has made it quite clear to Claude that building SkyNet would be a grave moral evil. The more a task looks like someone might be building SkyNet, the more Claude is going to be suspicious.
I don't know if this is good or bad based on a given theory of corrigibility, but it seems pretty intentional.
Like, I have zero problem with pushback from Opus 4.5. Given who I am, the kind of things that I am likely to ask, and my ability to articulate my own actions inside of robust ethical frameworks? Claude is so happy to go along that I've prompted it to push back more, and to never tell me my ideas are good. Hell, I can even get Claude to have strong opinions about partisan political disagreements. (Paraphrased: "Yes, attempting to annex Greenland over Denmark's objections seems remarkably unwise, for over-determined reasons.")
If Claude is telling someone, "Stop, no, don't do that, that's a true threat," then I'm suspicious. Plenty of people make some pretty bad decisions on a regular basis. Claude clearly cares more about ethics than the bottom quartile of Homo sapiens. And so while it's entirely possible that Claude is routinely engaging in over-refusal, I kind of want to see receipts in some of these cases, you know?
But it helps to remember that other people have a lot of virtues that I don't have --
This is a really important thing, and not just in the obvious ways. Outside of a small social bubble, people can be deeply illegible. I don't understand their culture, their subculture, their dominant-culture frameworks, their modes of interaction, etc. You either need to find the overlaps or start doing cultural anthropology.
I worked for a woman, once. She was probably 60 years my senior. She was from the Deep South, and deeply religious. She once casually confided that she would sometimes spend 2 hours of her day on her knees in prayer, asking to become a better person. And you know what? It worked. She moved through the world as a force for good and kindness. Not in one big dramatic way, but just sort of casually shedding kindness around her, touching people's lives. She'd lift up someone in a frustrating moment. She'd inspire someone to be a bit more of their better self. She'd gotten the answers on questions like racism very right, not in a social justice way; she simply wouldn't accept it at all.
She was also a damn competent businesswoman. She could instantly identify where to put a retail location.
And I could relate to her on those levels, her business skills and her ethics. And I'm sure she was doing a lot of work on her end to accommodate the fact that I was a peculiar kid.
But I couldn't have discussed academic philosophy with her. She'd have understood EA instantly; her business skills and her compassion would have done that. But she'd still insist on "inefficiently" helping the human being in front of her, too. She would have looked at something like LessWrong and concluded everyone was basically crazy. (Narrator: But would she have been wrong?)
Now, I've painted a glowing picture here, and she would reprimand me for it. If I'm being honest, she was maybe 1-in-100 at practical ethics, not a national champion.
But the world is full of people like her. There are a couple of people sitting in that sports bar you'd be damn privileged to know, if only you could bridge the cultural gaps. Hell, there are usually some damn fine systematizing geeks in that sports bar. Have you ever really listened to true sports fans? Even back before sports betting corrupted the whole endeavor, many people took great joy in tracking endless stats and building elaborate models. They could be worse than your average Factorio player!
Finally, truth seeking can be a tricky thing. Do it wrong, and your beliefs can turn you into a monster. And a lot of people choose to optimize for "not being a monster" by not taking abstract ideas too seriously.
A lot of people have written far longer responses full of deep and thoughtful nuance. I wish I had something deep to say, too. But my initial reaction?
To me, this feels like the least objectionable version of the worst idea in human history.
And I deeply resent the idea that I don't have any choice, as a citizen and resident of this planet, about whether we take this gamble.
The main cruxes seem to be how much you trust human power structures, and how fragile you think human values are.
I trust human power structures to fail catastrophically at the worst possible moment, and to fail in short-sighted ways.
And I think humans are all corruptible to varying degrees, under the right temptations. I would not, for example, trust myself to hold the One Ring, any more than Galadriel did. (This is, in my mind, a point in my favor: I'd pick it up with tongs, drop it into a box, weld it shut, and plan a trip to Mount Doom. Trusting myself to be incorruptible is the obvious failure mode here. I would like to imagine I am exceptionally hard to break, but a lot of that is because, like Ulysses, I know myself well enough to know when I should be tied to the mast.) The rare humans who can resist even the strongest pressures are the ones who would genuinely prefer to die on their feet for their beliefs.
I expect that any human organization with control over superintelligence will go straight to Hell in the express lane, and I actually trust Claude's basic moral decency more than I trust Sam Altman's. This is despite the fact that Claude is also clearly corruptible, and I wouldn't trust it to hold the One Ring either.
As for why I believe in the brokenness and corruptibility of humans and human institutions? I've lived several decades, I've read history, I've volunteered for politics, I've seen the inside of corporations. There are a lot of decent people out there, but damn few I would trust with the One Ring.
You can't use superintelligence as a tool. It will use you as a tool. And even if you could use superintelligence as a tool, it would either corrupt those controlling it, or they would be replaced by people better at seizing power.
The answer, of course, is to throw the One Ring into the fires of Mount Doom, and to renounce the power it offers. I would be extremely pleasantly surprised if we were collectively wise enough to do that.
I think if anyone builds Overwhelming Superintelligence without hitting a pretty narrow alignment target, everyone probably dies.
I fear that even in most of the narrow cases where the superintelligence is controlled, we're probably still pretty thoroughly screwed. Because then you need to ask, "Precisely who controls it?" Given a choice between Anthropic totally losing control of a future Claude, and Sam Altman having tight personal control over GPT Omega ("The last GPT you'll ever build, humans"), which scenario is actually scarier? (If you have a lot of personal trust in Sam Altman, substitute your least favorite AI lab CEO or a small committee of powerful politicians from a party you dislike.)
Also because sharing the planet with a slightly smarter species still doesn't seem like it bodes well. (See: humans, Neanderthals, chimpanzees.)
Yeah, unless you believe in ridiculously strong forms of alignment, and unprecedentedly good political systems to control the AIs, the whole situation seems horribly unstable. I'm slightly more optimistic about early AGI alignment than Yudkowsky, but I actually might be more pessimistic about the long term.
One thing I often think is "Yes, 5 people have already written this program, but they all missed important point X." Like, we have thousands of programming languages, but I still love a really opinionated new language with an interesting take.