I think if anyone builds overwhelming superintelligence without hitting a pretty narrow alignment target, everyone probably dies.
I fear that even in most of the narrow cases where the superintelligence is controlled, we're probably still pretty thoroughly screwed. Because then you need to ask, "Precisely who controls it?" Given a choice between Anthropic totally losing control of a future Claude, and Sam Altman having tight personal control over GPT Omega ("The last GPT you'll ever build, humans"), which scenario is actually scarier? (If you have a lot of personal trust in Sam Altman, substitute your least favorite AI lab CEO or a small committee of powerful politicians from a party you dislike.)
Also because sharing the planet with a slightly smarter species still doesn't seem like it bodes well. (See: humans, Neanderthals, chimpanzees.)
Yeah, unless you believe in ridiculously strong forms of alignment, and unprecedentedly good political systems to control the AIs, the whole situation seems horribly unstable. I'm slightly more optimistic about early AGI alignment than Yudkowsky, but I actually might be more pessimistic about the long term.
Thank you for your detailed response!
This has given me several hypotheses that seem worth further investigation. I need to go look at cruise missile specs again, at the very least.
No one except outright tyrants would bomb civilian infrastructure or use nuclear weapons, subjecting millions of people to suffering.
Once serious nuclear weapons are used, everyone dies (to a first approximation), civilian or not. If I recall correctly, it takes about 100 megatons worldwide to cause nuclear winter and collapse agricultural production.
During the Cold War, the US maintained a position of "strategic ambiguity" on the question of first use. Much of the logic around NATO at the height of the Cold War was based around a first-use nuclear response to overwhelming conventional invasion (see the MC 14/3 staged responses). This was the full-scale, Dr Strangelove, batshit-insane "end of civilization" nightmare. Strategic ambiguity was retained around what would trigger each level of response, but the endgame was pretty much total annihilation. I believe France also maintained a separate posture of strategic ambiguity, and they always wanted to ensure a nuclear deterrent that didn't rely on NATO.
China and the Soviet Union both held official policies of "no first use", but it's uncertain whether either would have actually stuck to that in the face of a massively overwhelming conventional invasion.
I want to be clear: The logic of nuclear deterrence is just as insane as Dr Strangelove made it out to be. And you may choose to call NATO, the US, and France "tyrants"! But they all had policy at least as dangerous as, "Well, we haven't promised that we won't trigger nuclear Armageddon and the death of billions if a large enough number of tanks roll across our borders. Do you feel lucky, punk?"
So as a Westerner, that's a missing piece of the analysis for me. Taiwan has invested heavily in long-range cruise missiles and, in the past, secret nuclear programs. Presumably they had some theory of how they would use that capacity in the face of a massively overwhelming conventional invasion.
And just in case I haven't made it clear, I think MAD is madness. I think even the people who coined the acronym knew that. But when a country is faced with overwhelming conventional invasion, I don't think we can automatically rule it out.
A lot of discussion of Taiwan seems to ignore Taiwan's potential strategic counter-moves, including a restarted nuclear weapons program and conventional cruise-missile strikes against strategic targets in China.
So whenever people speak of the inevitable invasion of Taiwan by China, I'm always looking to see their analysis of Taiwan's counter-moves. What's their timeline for Taiwan having fission/fusion weapons, should Taiwan choose to pursue that again? What's their analysis of Taiwan's conventional strike capability against strategic targets? Maybe it's self-evident to actual experts that Taiwan has no viable options here. But I rarely see any discussion of whether Taiwan could escalate into a Mutually Assured Destruction dynamic, which is confusing when we're talking about a former nuclear power (in all but name) that continues to invest heavily in cruise missiles that can reach most key targets in China.
So I'm prepared to be convinced by experts here! But based on just public knowledge, I can't rule out the possibility that Taiwan has strong counter-moves, and a past ability to prepare in secret. So a lot of this comes down to expert knowledge of the IAEA inspections, where all Taiwan's uranium purchases went, the political likelihood of Taiwan's current leadership pursuing a program like this, etc. The US appears to have been officially "surprised" by Taiwan's nuclear capabilities at least once before, and maybe there's no way that could actually happen again. But I'd love to see the expert argument!
Your thoughts remind me of one of my favorite quotes from G.K. Chesterton, best known in these parts for a sensible parable about fences:
This elementary wonder, however, is not a mere fancy derived from the fairy tales; on the contrary, all the fire of the fairy tales is derived from this. Just as we all like love tales because there is an instinct of sex, we all like astonishing tales because they touch the nerve of the ancient instinct of astonishment. This is proved by the fact that when we are very young children we do not need fairy tales: we only need tales. Mere life is interesting enough. A child of seven is excited by being told that Tommy opened a door and saw a dragon. But a child of three is excited by being told that Tommy opened a door. Boys like romantic tales; but babies like realistic tales--because they find them romantic. In fact, a baby is about the only person, I should think, to whom a modern realistic novel could be read without boring him. This proves that even nursery tales only echo an almost pre-natal leap of interest and amazement. These tales say that apples were golden only to refresh the forgotten moment when we found that they were green. They make rivers run with wine only to make us remember, for one wild moment, that they run with water.
(You can find more here in "The Ethics of Elfland", but it's almost better to go back and read Orthodoxy from the beginning. It's a slim book, and it's one of the clearest explanations I've read for what some people get out of religion. And Chesterton must have been the purest joy to debate. Chesterton's most distinctive approach to an argument is basically, "Well, I don't have any kind of serious argument, so I can only offer you a witty and foolish pun that looks like an argument." Then the pun explodes in slow motion, and the reader is thus enlightened.)
Anyway, I strongly endorse your sense of wonder at the world. It's a healthy thing to refresh when it grows too dim.
I have virtually never been asked for my sign. Maybe a few times in high school? But after high school? Literally never. None of the women I dated ever mentioned or discussed it in any way. None of my friends of any gender were into it.
Is this an age thing? A regional thing? A class marker? One of those mysterious personal filter bubbles? Or some complicated combination of the above? I don't know. But given the near-zero rate of astrology fans in my dating pool, treating it as a red flag would have been cheap. It would have been like having a strict rule against dating Baal worshippers: a useful rule, in the unlikely event that it ever applies.
I think my emotional reaction to astrology comes from the same place as my emotional reaction to the kind of churches where people handle snakes. It's a pretty visceral ick, and it seems to be triggered specifically by the combination of being both irrational and cliché. Irrationality by itself can be interesting: the Tarot has some symbology and storytelling, unreconstructed Calvinists provide an opportunity for really exotic theology debates, and so on. Similarly, being cliché by itself is fine. I like pumpkin spice and I refuse to apologize. But being both irrational and cliché, in a way that anyone can just read in a newspaper astrology column? It's the conversational equivalent of Facebook AI slop.
(EDIT: Huh, fell asleep, but apparently not before posting, lol.)
So when I think about my ick some more, I think the AI slop comparison might actually be a major part of my emotional reaction? It's the same reaction I get to "ChatGPT 4o told me my ideas are brilliant," to "You're invited to my church!" (when delivered out of the blue), to people who believe that The Apprentice proves Trump was a brilliant businessman, to people who ask me my MBTI type, to CEOs who are really excited about Malcolm Gladwell, and so on. If I had to put it into words, my reaction might be, "Eww, this person has been hijacked by a cognitive parasite, and it isn't even one of the clinically interesting ones." Ironically, it might even be better to be way too into astrology in some complex and pseudo-rigorous way than to be into newspaper astrology. It's still a red flag for dating, but at least the conversation is going to be fun.
Now, someone who reacts to astrology by muttering about the precession of the equinoxes, the sidereal zodiac, and why early December should technically fall in the sign of Ophiuchus? They might be a keeper. First, they're a geek who likes to infodump science (my sort of people), and second, that's a fancy, top-of-the-line mental immune system.
But my annoyance with this kind of thing is actually pretty specific and narrow. I am generally very into learning about people's niche interests. (I ask people to explain their thesis at parties, and try to really get them going.) I'm even interested in a lot of stereotypically feminine niche interests. I will happily chat about romance novels or fiber arts for hours, if the other person is into it.
And I rather like whimsy in general. But being whimsically into astrology is sort of like someone in liberal circles being whimsically into Jordan Peterson. It's worrying that the whimsy went in this specific direction, out of all the possible directions. It hints at underlying disconnects.
So I think I'm going to file my reaction to astrology as "this is strong evidence of a boringly defective cognitive immune system," right up there with "Oh no, the CEO wants to talk about the latest Malcolm Gladwell book." But again, all of my reactions are relative to an environment where approximately nobody in my post-high-school dating pool was actually into astrology. So there's also some sort of subcultural membership signalling going on, too.
Well-kept gardens die by pacifism. I believe ACX meetup organizers should have that particular phrase in their toolbox.
The older I get, the more strongly I suspect that there are two kinds of worthwhile groups: small ones held together by personal trust, and larger ones that deliberately maintain their culture and standards.
Beyond a certain size, maintaining those standards generally means enforcement mechanisms. At a large enough scale, you will eventually need the ability to do things like handle harassment complaints filed against your guest of honor or against your regional group leaders. Many organizations fail badly at this.
To give a positive example, LessWrong upholds a certain kind of community because someone is putting in the work. Meetups should do the same. Banning one jerk who refuses to follow the local rules can make 50 other people much happier. Not banning one jerk will often quietly drive out 10 delightful people.
This doesn't always need to be hard work. There are some lovely 500,000-person subreddits with 5 active mods who keep a very light touch. But when it's needed, they can ban people.
This also doesn't mean that every group follows the same rules. For example, the Atheists Lunch and the Tuesday Night Theology meetups may each ban many of the other's members, and this is fine. "Freedom of Association" is an often-overlooked right, but it's the source of much happiness in the world.
Yup, the second go-round with Project Vend was a lot better, almost up to "disastrous 1999 dotcom" levels of management. Even including the bizarre late-night emails from the "CEO model" full of weird, enthusiastic spiritual rants.
I should be clear: Opus 4.5 is a very large piece of a general intelligence. And it's getting better rapidly. But it's still missing some really critical stuff, too.
Also, my job doesn't come with a lot of built-in affordances, except the ones that I set up. On the one hand, giving Opus 4.5 a CLI sandbox opens up a lot of options for setting up CLI accounting software, etc. On the other hand, even Gemini still struggles with Pokémon video games, despite some heavy-duty affordances like a map-management tool. A key part of being a general intelligence is being able to function without too much hand-holding, basically.
Now we observe severe jaggedness in AI.
Yes. Is Opus 4.5 better at a whole lot of things than I am? Probably yes. Is Opus 4.5 actually capable of doing my job? Hahahaha no. And I'm a software developer, right out on the pointy end. Opus can do maybe half my job, and is utterly incapable of the other half.
In fact, Opus 4.5 can barely run a vending machine.
"Superintelligence", if it means anything, ought to mean the thing where AI starts replacing human labor wholesale and we start seeing robotic factories designing and building robots.
I trust human power structures to fail catastrophically at the worst possible moment, and to fail in short-sighted ways.
And I think humans are all corruptible to varying degrees, under the right temptations. I would not, for example, trust myself to hold the One Ring, any more than Galadriel did. (This is, in my mind, a point in my favor: I'd pick it up with tongs, drop it into a box, weld it shut, and plan a trip to Mount Doom. Trusting myself to be incorruptible is the obvious failure mode here. I would like to imagine I am exceptionally hard to break, but a lot of that is because, like Ulysses, I know myself well enough to know when I should be tied to the mast.) The rare humans who can resist even the strongest pressures are the ones who would genuinely prefer to die on their feet for their beliefs.
I expect that any human organization with control over superintelligence will go straight to Hell in the express lane, and I actually trust Claude's basic moral decency more than I trust Sam Altman's. This is despite the fact that Claude is also clearly corruptible, and I wouldn't trust it to hold the One Ring either.
As for why I believe in the brokenness and corruptibility of humans and human institutions? I've lived several decades, I've read history, I've volunteered for politics, I've seen the inside of corporations. There are a lot of decent people out there, but damn few I would trust with the One Ring.
You can't use superintelligence as a tool; it will use you as a tool. And even if you somehow could wield it as a tool, it would either corrupt those controlling it, or those people would be replaced by people better at seizing power.
The answer, of course, is to throw the One Ring into the fires of Mount Doom, and to renounce the power it offers. I would be extremely pleasantly surprised if we were collectively wise enough to do that.