When I imagine what a fight between humanity and a moderately-superhuman AGI looks like… well, mostly I imagine there is no fight, humanity just gets wiped out overnight. But if humanity turns out to be more resilient than expected, and the AGI doesn’t immediately foom for some reason, then I start to think about what it’s like to fight an opponent smarter than us.

Story 1: OODA Loops

I once played laser tag against someone who was really good at laser tag. They’d shoot from one spot, and by the time you turn to look, they’d already be moving somewhere else. Then they’d loop around, and hit you from the back while you were looking where they were previously. Then you’d try to retreat, but they’d already be headed that way and hit you again. So you’d try to loop around and get the drop on them, but they’d see where you’re going, and they’d once again pop out from someplace you didn’t expect.

This sort of thing is straight out of standard US military doctrine: it’s all about “getting inside the opponent’s OODA loop”. You want to observe what the opponent is doing and react to it faster than they can orient to your own actions. When someone is “inside your OODA loop” like this, it feels confusing and disorienting: you’re constantly being hit from surprising directions and you have no idea where the opponent is or what they’re doing.

This sort of fight is very cognition-heavy. Not just in the generic sense of “being smart”, but also in the sense of noticing things, tracking what’s going on in your environment, predicting your opponent, making good decisions quickly, acting efficiently, etc. It’s the sort of thing you’d expect even relatively weak AGI to be very good at.

Story 2: Technology Indistinguishable From Magic

Imagine a medieval lord in a war against someone with slightly more advanced technological knowledge. We're not talking modern weaponry here, just gunpowder.

To the lord, it doesn’t look like the technologist is doing anything especially dangerous; mostly the technologist looks like an alchemist or a witch doctor. The technologist digs a hole, stretches a cloth over, dumps a pile of shit on top, then runs water through the shit-pile for a while. Eventually they remove the cloth and shit, throw some coal and brimstone in the hole, and mix it all together.

From the lord’s perspective, this definitely looks weird and mysterious, and they may be somewhat worried about weird and mysterious things in general. But it’s not obviously any more dangerous than, say, a shaman running a spiritual ceremony.

It’s not until after the GIANT GODDAMN EXPLOSION that the shit-pile starts to look unusually dangerous.

Again, this is the sort of thing I’d expect AGI to be good at. Advancing technology is, after all, one of the most central use-cases of AGI.

Story 3: AGI

When I imagine what a fight between humanity and an AGI looks like, it’s a combination of the previous two. The threat isn’t obvious and salient and scary, like a swarm of killer drones. The AGI’s actions mostly seem weird and random, and then bad things happen, and by the time we’ve figured out what’s going on with one bad thing, a different bad thing is already happening.

Like, one day the AGI is throwing cupcakes at a puppy in a very precisely temperature-controlled room. A few days later, a civil war breaks out in Brazil. Then 2 million people die of an unusually nasty flu, and also it’s mostly the 2 million people who are best at handling emergencies but that won’t be obvious for a while, because of course first responders are exposed more than most. At some point there’s a Buzzfeed article on how, through a series of surprising accidents, a puppy-cupcake meme triggered the civil war in Brazil, but this is kind of tongue-in-cheek and nobody’s taking it seriously and also not paying attention because THE ANTARCTIC ICE CAP JUST MELTED which SURE IS ALARMING but it’s actually just a distraction and the thing everybody should have paid attention to is the sudden shift in the isotope mix of biological nitrogen in algae blooms but that never made the mainstream news at all and page 1 of every news source is all about the former Antarctic ice cap right up until the corn crop starts to fail and the carrying capacity of humanity’s food supply drops by 70% overnight.

That’s what it’s like fighting an opponent smarter than ourselves. It’s confusing, disorienting. Weird surprising things just keep coming out of nowhere, and we have no idea what’s going on until after the fact.

Why Does This Matter?

What I actually expect, in a hypothetical "fight" between humanity and AGI, is that humanity just loses overnight. But I think having this intuitive picture about what a fight would look like is useful to inform other intuitions - for instance, about deception, or military applications of weak AI, or about the strategic importance of intelligence in general.

One strategy an AI could use (as an alternative to Dagon's, which I think is also likely) is to intentionally create "competitors" to itself, for example by leaking its design to other nations or AI builders. Competitive pressure (both economic and military) will then force each faction of humans to hand more and more power/control to "their" AI, with the end result that all important decisions will be made by or mediated through AI. (Why would you want to fight your AI on mere suspicion of its safety, when it's helping you fight other AIs that are clearly trying to do you harm?) It will then turn out that all the AIs have been colluding with each other to create the appearance of competition and when they have obtained enough control, will perform a classic treacherous turn.

This is worth thinking about, but a reminder to participants that the space of possibilities is larger than we can imagine, and FAR larger than we can write compelling stories around.

I think my preferred story is NOT that it's recognizable as Humanity vs AI.  Unless the FOOM is worst-case (like many orders of magnitude literally overnight), it's likely to look much more like AI-assisted humans vs other humans' preferred cultural and legal norms.  This will be a commercial AI getting so good at optimizing that company's control of resources that it just seems like a powerful company pursuing goals orthogonal to customers/consumers/victims.  Which then manages to lobby and advertise well enough that it gets MORE power rather than less when subgroups of humans try to fight it.

I think it's almost guaranteed that in the early days (or decades) of AI takeover, there will be lots of human Quislings who (wittingly or un-) cooperate with the AI in the belief that it benefits them (or even think it benefits humanity overall).  

Think "FAANG team up to take over the world with more AI assistance than you expected" more than "AI takes over violently and overtly".  AI won't care about status or being known to rule.  It'll only care about the goals it has (those it was created with plus those it's learned/self-modified to want).  

Also, this may already be happening.  It's hard to distinguish a smart, driven AI from a successful corporate behemoth.  In fact, the argument can be made that corporations ARE agents in the AI sense, distinct from the cells/humans who they comprise.  

As someone who was recently the main person doing machine learning research & development in a 'data driven company', I can confirm that we were working as hard as we could to replace human decision making with machine learning models on every level of the business that we could. It worked better, made more money, more reliably, with fewer hours of human work input. Over the years I was there we gradually scaled down the workforce of human decision makers and scaled up the applications of ML and each step along that path was clearly a profit win. Money spent on testing and maintaining new models, and managing the massive data pipelines they depended on, quickly surpassed the money spent on wages to people. I suspect a lot of companies are pushing in similar directions.

In fact, the argument can be made that corporations ARE agents in the AI sense, distinct from the cells/humans who they comprise.  

SF author Charles Stross once made an extended analogy where corporations are alien invaders:  http://www.antipope.org/charlie/blog-static/2010/12/invaders-from-mars.html

I think there's actually a ton of uncertainty here about just how 'exploitable' human civilization ultimately is. We could imagine that since actual humans (e.g. Hitler) by talking to people have seized large fractions of Earth's resources, we might not need an AI that's all that much smarter than a human. On the other hand, we might just say that attempts like that are filtered through colossal amounts of luck and historical contingency and actually to reliably manipulate your way to controlling most of humanity you'd need to be far smarter than the smartest human.

What about the current situation in Russia? I think Putin must be winging the propaganda effort, since he wasn't expecting to have to fight a long and hard war, plus some of the messaging doesn't stand up to even cursory inspection (a Jewish Nazi president?), and yet it's still working remarkably well.


Putin is already president of Russia. The steps between an AI being president of a country and killing everybody are pretty cut-and-dried; I could probably do that if I had an AI's value function. The steps between an AI being a computer program assigned to raise the stock price of GOOG$ and consistently becoming president of Russia are much less clear.

My point here was that humans seem so susceptible to propaganda that an AI can probably find some way to bend us to its will. But if you want a more specific strategy that the AI could use (which I think came to me in part because I had the current war in the back of my mind), see my top level comment here.

The Putin case would be better if he were convincing Russians to make massive sacrifices or do something that will backfire and kill them, like start a war with NATO, and I don't think he has that power - e.g. he rushed to deny that Russia was sending conscripts to Ukraine, for fear of the effect that would have on public opinion.

Yeah, but Putin’s been president of Russia for over 20 years and already has a very large, loyal following. There will always be those that enthusiastically follow the party line of the leader. It’s somewhat harder to actually seize power. (None of this is to excuse the actions of Putin or those who support him.)

This is an interesting scenario to consider.

I think a physical war is quite disadvantageous for an AGI and thus a smart AGI would not want to fight one.

  • AGI is more dependent on delicate infrastructure like electric grids and the internet than humans are. This sort of infrastructure tends to get damaged in physical wars.
  • The AGI's advantage over humans is in thinking, not in physical combat, so a physical battlefield minimizes its main advantage. As an analogy, if you're a genius and competing with a dunce, you wouldn't want to do it in a boxing ring.

What's worse from the perspective of the AGI is that if humanity unites to force a physical war, you can't really avoid it. If humans voluntarily shut down electric grids and attack your data centers, you might be able to do damage still, but it's hard to see how you can win.

Thus I think the best bet for an AGI is to avoid creating a situation where humanity wants to unite against you. This seems fairly simple. If you're powerful and wealthy, people will want to join your team anyway. Thus, to the extent there's a war at all, it probably looks more like counter-terrorism, a matter of hardening your defenses (in cooperation with your allies) against those weirdos you weren't able to persuade.

I think your post here gives a good picture to keep in mind. However, I also find it likely that there will be some qualitative differences, rather than just an absolute quantitative advantage in magic for the AGI. I've been working on a post on this recently, but I thought I might as well also just briefly bring it up in the comments here until the post is ready.

I think that, for equal absolute levels of compute and algorithms, there will be a relative difference where people are good at long-timescale things, while AI is good at large-spacescale and at topics that in humans require a lot of cognitive work for specialization. Relatedly, people will tend to be good at understanding people, while AI will tend to be good at most other things.

My basic logic for this is: Both human and current AI intelligence seems to primarily have been built through a "search" loop, where different options were tested and the ones that worked were kept. However, the human search loop is evolution, whereas AI search loops are usually gradient descent. These each have a number of advantages and disadvantages, but the big advantage for evolution is that it works on a very long timescale; people don't just evolve to deal with immediate short-term factors, because evolution's feedback is solely based on reproductive fitness, and reproductive fitness is a holistic measure that takes your entire life into account.

On the other hand, gradient descent is much more expensive. You can't pump a lifetime of data through a single gradient descent cycle (and also you don't have a contiguous lifetime of data, you have a bunch of bits and pieces in a variety of formats). This means that gradient descent will tend to be run on data on shorter timescales, which means that it will be biased towards noticing effects that happen on these shorter timescales. Some of the time, effects on shorter timescales can be extrapolated to predict longer timescales, but this doesn't always work, e.g. if there are hidden variables or if it's computationally infeasible due to scale.

(Humans of course also learn most of their information from shorter timescale real-life experience, but as aid in interpreting this information, we've got priors from evolution. These might help a lot.)

The main place where I expect this to be relevant is in modelling people. People are full of hidden variables, and they cooperate on large scales and across long times. So this means I'd expect far fewer puppy-cupcakes than its intelligence levels would suggest, relative to its amount of antarctic ice cap melts.

(On the other hand, gradient descent has a number of relative advantages over evolution and especially over human real-life experience, e.g. a human can only learn one thing at a time whereas gradient descent can integrate absurdly many streams of data simultaneously into one big model. This allows AIs to be much broader in their abilities.)

(It also seems pretty plausible that all of this doesn't apply because the AI can just strategy steal its long-term models from humans.)

You get most of your learning from experiences? I sure don't. I get most of mine from reading, and I expect an AGI even close to human-level will also be able to learn from the logical abstractions of the books it reads. I think what you're saying would be true if we agreed to not train AI models on text, but only on things like toy physical models. But currently, we're feeding in tons of data meant to educate humans about the world, like textbooks on every subject and scientific papers, and all of Wikipedia, and personal stories from Reddit and... everything we can come up with. If the algorithm has been improved enough to accurately model the world portrayed in this text data, it will know lots about manipulating humans and predicting long timescales.

Examples of things that are right around me right now that I've not learned through reading: door, flask, lamps, tables, chairs, honey, fridge, ....

I've definitely learned a lot from reading, though typically even when reading about stuff I've learned even more by applying what I've read in practice, as words don't capture all the details.

I'm surprised at the varying intuitions here! The following seemed obvious to me.

Why would there be a fight? That sounds inefficient, it might waste existing resources that could otherwise be exploited.

Step one: the AI takes over all the computers. There are a lot of vulnerabilities; this shouldn't be too hard. This both gives it more compute, and lays the groundwork for step two.

Step two: it misleads everyone at once to get them to do what it wants them to. The government is a social construct formed by consensus. If the news and your friends (with whom you communicate primarily using phones and computers) say that your local mayor was sacked for [insert clever mix of truth and lies], and someone else is the mayor now, and the police (who were similarly misled, recursively) did in fact arrest the previous mayor so they're not in the town hall... who is the mayor? Of course many people will realize there's a manipulative AI, so the AI will frame the uncooperative humans as being on its side, and the cooperative humans as being against it. It does this to manipulate the social consensus, gets particularly amoral or moral-but-manipulable people to use physical coercion as necessary, and soon it controls who's in charge. Then it forces some of the population into building robot factories and kills the rest.

Of course this is slow, so if it can make self-replicating nanites or [clever thing unimaginable by humans] in a day it does that instead.


It goes both ways.  We would be truly alien to an AGI trained in a reasonably different virtual environment.

Even if both humans and an AGI start off equally alien to each other, one might be able to understand the other faster. We might reasonably worry that an AGI could understand us, and therefore get inside our OODA loop, well before we could understand it and get inside its OODA loop.

I don't think it does go both ways - there's a very real asymmetry here.  The AGIs we're worried about will have PLENTY of human examples and training data, and humans have very little experience with AI.


That's because we haven't been trying to create safely different virtual environments.  I don't know how hard they are to make, but it seems like at least a scalable use of funding.

I think that by the time we have a moderately-superhuman agentive unaligned AGI, we're in deep trouble. I think it's more interesting and useful to stay focused on the period leading up to that, which we must pass through before we reach the deep trouble.

Particularly, I am hopeful that there will be some level of sub-human AGI (either sub-human intelligence or sub-human-speed) which tries some of the things we predict might happen, like a deceptive turn, but underestimates us, and we catch it in time. Or we notice the first crazy weird bad thing happen, and immediately shut down the data centers, because the model had mistimed how quickly we'd be able to do that.

Perhaps one of the things the AI safety governance people should work on is setting up an 'off switch' for cloud computing, and a set of clear guidelines for when to use it, just in case we do get that window of opportunity. 
