Raises an interesting critique of AI Risk. I think he is wrong, but would be interested in how others would respond.

IMO, Hanson isn't engaging with the arguments for risk; instead, he's giving his own argument for safety. I therefore don't feel particularly motivated to engage with him.

Example: He speaks abstractly about how coups are already a thing between humans, and how therefore we shouldn't be concerned about AI coups unless we have special reason to, like Eliezer who thinks progress will be lumpy. Well, that's one reason I guess. But plenty of people who don't think progress will be lumpy (such as Paul Christiano) are still worried about AI coups; Hanson should go read what they wrote and understand why they think it's still a risk despite the existence of insurance schemes etc., and then point out which premise he thinks is faulty. For example, if he were talking to me, he could read What 2026 Looks Like and then say what part of it seems extremely implausible to him & suggest more plausible ways the story could go instead. Then we could argue about which revisions to the story are best & then continue the story with those revisions together until we get to coup time, which (spoiler) will probably be in 2027-2028 if the story continues the way I think is most plausible. And then he'd be in a corner; he'd have to say why this story doesn't represent a possibility worth taking seriously and trying to prevent. And if he thinks it's worth taking seriously but still overall unlikely, I'd challenge him to write a similarly detailed story that he thinks is more plausible & let me critique it.

Anyhow, giving his arguments a skim...

I guess there's a lot I disagree with, actually. But just to pick one example because I'm lazy:

Hanson, about Yudkowsky:

Alternatively, he claims (quite implausibly I think) that all AGIs naturally coordinate to merge into a single system to defeat competition-based checks.

Yudkowsky, in his Big List O' Doom:

34.  Coordination schemes between superintelligences are not things that humans can participate in (eg because humans can't reason reliably about the code of superintelligences); a "multipolar" system of 20 superintelligences with different utility functions, plus humanity, has a natural and obvious equilibrium which looks like "the 20 superintelligences cooperate with each other but not with humanity".

Me, from three years ago:

To a tiger, human hunter-gatherers must be frustrating and bewildering in their ability to coordinate. "What the hell? Why are they all pouncing on me when I jumped on the little one? The little one is already dead anyway, and they are risking their own lives now for nothing! Dammit, gotta run!"

To a tribe of hunter-gatherers, farmers must be frustrating and bewildering in their ability to coordinate. "What the hell? We pillaged and slew that one village real good, they sure didn't have enough warriors left over to chase us down... why are the neighboring villages coming after us? And what's this--they have professional soldiers with fancy equipment riding horses? Somehow hundreds--no, thousands--of farmers cooperated over a period of several years to make this punitive expedition possible! How were we to know they would go to such lengths?"

To the nations colonized by the Europeans, it must have been pretty interesting how the Europeans were so busy fighting each other constantly, yet somehow managed to more or less peacefully divide up Africa, Asia, South America, etc. to be colonized between them. Take the Opium Wars and the Boxer Rebellion for example. I could imagine a Hansonian prophet in a Native American tribe saying something like "Whatever laws the European nations use to keep the peace among themselves, we will benefit from them also; we'll register as a nation, sign treaties and alliances, and rely on the same balance of power." He would have been disastrously wrong.

I expect something similar to happen with us humans and AGI, if there are multiple AGI. "What? They all have different architectures and objectives, not to mention different users and owners... we even explicitly told them to compete with each other! Why are they doing X.... noooooooo...." (Perhaps they are competing with each other furiously, even fighting each other. Yet somehow they'll find a way to cut us out of whatever deal they reach, just as European powers so often did for their various native allies.)

I think that even if the AGIs have merely human-level coordination ability, it would still be more likely than not that they'd coordinate to do a coup once they had a high probability of success. Like EY said, the AGIs will probably be systematically different from humans in what they want (unless we solve the alignment problem), such that even if the AIs don't all want the same thing, there's a very natural Schelling-pointy equilibrium which is all the AIs ganging up on the humans, just like humans ganged up on horses, farm animals, the rainforest, etc., and just like how e.g. European colonizers ganged up on various groups of natives even while literally fighting each other. Human groups have done this plenty of times in the past. Consider the other possible coalitions: all the AIs, minus one, plus the humans, go beat up that one remaining AI? Maybe this'll happen, sure. And then what's the next coalition -- who gets voted off the island next? Another one of the AIs, or the humans? Remember that AIs in general will be rapidly gaining in power relative to humans, so eventually the human faction will be more a liability than an asset to its team.* Also remember that at some fairly early point, the best way to get human support will be to lie to them via hyper-persuasive false campaign promises etc. rather than tell the truth. I expect some initial period where each AI faction has a bunch of human patsies, and then eventually the conflict between AI factions is resolved one way or another and the humans are disposed of.

Oh, and also: It's quite plausible there never will be rival AI factions at all, because all the AIs will have similar values, perhaps because they are all copies of the same base model, or different base models trained similarly with similar architectures. Or, again, perhaps because they are better at coordinating than humans.

*This point may come earlier than you think, because of the thing about lying > honesty, and because in a war between AIs, not having to worry about collateral damage to humans, or unethical treatment of humans more generally, could be quite useful.

Anyhow that's just one random point of disagreement, I could keep finding more if I wanted to.

I just read through your "What 2026 Looks Like" post, but didn't see how it is a problematic scenario. Why should we want to work ahead of time to prepare for that scenario?

Thanks for reading! No existential catastrophe has happened yet in that scenario. The catastrophe will probably happen in 2027-2028 or so if the scenario continues the way I think is most plausible. I'm sorry that I never got around to writing that part; if I had, this would be a much more efficient conversation! Basically, to find our disagreement (assuming you agree my story up till 2026 is plausible) we need to discuss what the plausible continuations to the story are.

Surprise surprise, I think that there are plausible (indeed most-plausible) continuations of the story in which the bigass foundation model chatbots (which, in the story, are misaligned) as they scale up become strategically aware, agentic planners with superhuman persuasion and coding ability by around 2027. From there (absent clever intervention from prepared people) they gradually accumulate power and influence, slowly at first and then quickly as the R&D acceleration gets going. Maybe singularity happens around 2030ish, but well before that point the long-term trajectory of the world is firmly in the grip of some coalition of AIs.

Do you think (a) that the What 2026 Looks Like story is not plausible, (b) that the continuation of it I just sketched is not plausible, or (c) that it's not worth worrying about or preparing for? I'm guessing (b). Again it's unfortunate that I haven't finished the story, if I had we'd have more detailed things to disagree on.

ETA: Since I doubt you have time or interest to engage with me at length about the 2026 scenario, what do you think about the bulk of my criticism above? About coordination ability? Both of us like to analogize AI stuff to historical stuff; it seems we differ in which historical examples we draw from -- for you, the steam engine and other bits of machinery; for me, colonial conquests and revolutions.

It seems the key feature of this remaining story is the "coalition of AIs" part. I can believe that AIs would get powerful, what I'm skeptical about is the claim that they naturally form a coalition of them against us. Which is also what I object to in your prior comments. Horses are terrible at coordination compared to humans, and humans weren't built by horses and integrated into a horse society, with each human originally in the service of a particular horse.  

Yes, horses are terrible at coordination compared to humans, and that's a big part of why they lost. At some point in prehistory the horses could have coordinated to crush the humans, and didn't. Even in the 1900s horses could have gone on strike, gone to war, etc. and negotiated better working conditions, but didn't because they weren't smart enough.

Similarly, humans are terrible at coordination compared to AIs.

I agree that the fact that humans build the AIs is a huge advantage. But it's only a huge advantage insofar as we solve alignment or otherwise limit AI capabilities in ways that serve our interests; if instead we YOLO it or half-ass it, and end up with super capable unaligned agentic AGIs... then we squandered our advantage.

I don't think the integration into society or service relationship thing matters much. Imagine the following fictional case:

The island of Atlantis is in the mid-Atlantic. It's an agrarian empire at the tech level of the ancient Greeks. Atlantis also contains huge oil, coal, gold, and rare earth deposits.

In 1500 Atlantis is discovered by European explorers, who set up trading posts along the coast. For some religious reason, Atlanteans decide to implement the following strict laws: (a) No Atlantean can learn any language other than Atlantean. (b) It is heresy, thoughtcrime, for any Atlantean to understand European ideas. You can use European technology (guns, etc.) but you can't know how it works and you certainly can't know how to build it yourself, because that would mean you've been infected by foreign ideas.

(The point of these restrictions is to mimic how in humans vs. AI, the humans won't be able to keep up with AIs in capabilities no matter how hard they try. They can use the fancy gadgets and apps the AIs build, but they can't contribute to frontier R&D. By contrast in historical cases of colonialism and conquest, the technologically weaker side can in principle learn the scientific and economic methods and then catch up to the stronger side.)

Also, the Atlantean coast is a giant cliff along its entire length. It's therefore essentially impossible to assault Atlantis from the sea; you need the permission of the locals to land, no matter how large your army is and no matter how advanced your technology is.

Anyhow. Suppose at first the Europeans are too weak to conquer Atlantis and so instead they trade peacefully; Atlantis in fact purchases large quantities of European slaves and hires large quantities of European entrepreneurs and craftsmen and servants. Because of (a) and (b) and the amazing technology and trade goods the Europeans bring, demand for European labor and goods is high and the population of Europeans and their goods grows exponentially. The tech level also rises dramatically until it catches up with the global frontier.

Eventually, by 1800, native Atlanteans are outnumbered 100 to 1, and also pretty much the whole economy is run by Europeans, not just at the menial level but at the managerial and R&D levels as well, due to laws (a) and (b). Also, while the fancy weapons and tools are now held by Atlanteans and Europeans alike, the Europeans are much better at using them since only they are allowed to understand them and build them...

Yet (if I continue the story the way I think you think it should be continued) the Atlanteans remain in control and don't suffer any terrible calamities, certainly not a coup or anything like that, because the transition was gradual and the Europeans began in a position of servitude and integration in the economy on terms of native Atlantean choosing... Basically, all the arguments you make about why AI coups aren't a concern, apply to this hypothetical scenario as arguments for why European coups aren't a concern. And ditto for the more general arguments about loss of control.

Anyhow, this all seems super implausible to me, based on my reading of history. Atlanteans in 1500 should be scared that they'll be disenfranchised and/or killed long before 1800; they should not listen to Hansonian prophets in their midst.

Similarly, humans are terrible at coordination compared to AIs.


Are there any key readings you could share on this topic? I've come across arguments about AIs coordinating via DAOs or by reading each other's source code, including in Andrew Critch's RAAP. Is there any other good discussion of the topic?

Unfortunately I don't know of a single reading that contains all the arguments.

This post is relevant and has some interesting discussion below IIRC.

Mostly I think the arguments for AIs being superior at coordination are:
1. It doesn't seem like humans are near-optimal at coordination, so AIs will eventually be superior to humans at coordination, just as they will eventually be superior at most other things that humans can do but not near-optimally.
2. We can think of various fancy methods that AIs might use that humans don't or can't, such as reading each other's source code (see the toy sketch after this list).
3. There seems to be a historical trend of increasing coordination ability / social tech; we should expect it to continue with AI.
4. Even if we just model AIs as somewhat smarter, more agentic, more rational humans... it still seems like that would probably be enough. Humans have coordinated coups and uprisings successfully before, if we imagine the conspirators are all mildly superhuman...
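As a concrete (if toy) illustration of point 2, here is a minimal sketch, my own and not from the thread, of the classic "cooperate iff the other program is an exact copy of me" trick from the program-equilibrium literature. The agent names are hypothetical; the point is only that access to source code makes commitments verifiable in a way that human promises are not.

```python
import inspect

# Toy "program equilibrium" sketch: each player is a function that receives
# the source code of its opponent and returns "C" (cooperate) or "D" (defect).
# This is an illustration of coordination-via-source-inspection, not a model
# of real AI systems.

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent is an exact copy of this program."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def defect_bot(opponent_source: str) -> str:
    """Always defect, regardless of the opponent's code."""
    return "D"

def play(player_a, player_b):
    """One round: each player sees the other's source before choosing a move."""
    move_a = player_a(inspect.getsource(player_b))
    move_b = player_b(inspect.getsource(player_a))
    return move_a, move_b

if __name__ == "__main__":
    print(play(clique_bot, clique_bot))  # ('C', 'C'): copies verify each other and cooperate
    print(play(clique_bot, defect_bot))  # ('D', 'D'): mismatch detected, no exploitation
```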

I think it might be possible to design an AI scheme in which the AIs don't coordinate with each other even though it would be in their interest. E.g. perhaps if they are trained to be bad at coordinating, strongly rewarded for defecting on each other, etc. But I don't think it'll happen by default.

I am slightly baffled that someone who has lucidly examined all of the ways in which corporations are horribly misaligned and principal-agent problems are everywhere does not see the irony in saying that managing/regulating/policing those corporations will be similar to managing an AI supercluster totally united by the same utility function.

He says the "totally united by the same utility function" part is implausible:

he claims (quite implausibly I think) that all AGIs naturally coordinate to merge into a single system to defeat competition-based checks.

Different AIs built and run by different organizations would have different utility functions and may face equal competition from one another; that's fine. My problem is the part after that, where he implies (says?) that the Google StockMaxx AI supercluster would face stiff competition from the humans at FBI & co.

(I think) he thinks that managing/regulating/policing those corporations is the best that humans are willing to do.

I think it's a pretty good argument. Holden Karnofsky puts the chance that we don't see transformative AI this century at 1/3. In that world, people today know very little about what advanced AI will eventually look like and how to solve the challenges it presents. Surely some people should be working on problems that won't be realized for a century or more, but it would seem much more difficult to argue that AI safety today is more altruistically pressing than other longtermist causes like biosecurity, and even neartermist causes like animal welfare and global poverty.

Personally I do buy the arguments that we could reach superintelligent AI within the next few decades, which is a large part of why I think AI safety is an important cause area right now.

It's not enough that AI might appear in a few decades; you also need something useful you can do about it now, compared to investing your money to have more to spend later when concrete problems appear.

A 2/3 chance of a technology that might kill everyone (and would certainly change the world in any case) is still easily the most important thing going on right now. You'd have to demonstrate that AI has less than a 10% chance of appearing in my lifetime for me to not care about AI risk.