The Game of Dominance

Karl von Wendt

Concerns about AI safety rely on the assumption that a sufficiently powerful AI might take control of our future in an undesirable way. Meta’s head of AI, Yann LeCun, correctly points this out in a recent tweet, but then argues that this assumption is wrong because “intelligence” and “dominance” are independent of each other and only humans have a natural tendency to dominate, so we will remain the “apex species”. Here is the tweet in full:

Once AI systems become more intelligent than humans, humans we will *still* be the "apex species."
Equating intelligence with dominance is the main fallacy of the whole debate about AI existential risk.
It's just wrong.
Even *within* the human species It's wrong: it's *not* the smartest among us who dominate the others.
More importantly, it's not the smartest among us who *want* to dominate others and who set the agenda.
We are subservient to our drives, built into us by evolution.
Because evolution made us a social species with a hierarchical social structure, some of us have a drive to dominate, and others not so much.
But that drive has absolutely nothing to do with intelligence: chimpanzees, baboons, and wolves have similar drives.
Orangutans do not because they are not a social species. And they are pretty darn smart.
AI systems will become more intelligent than humans, but they will still be subservient to us.
They same way the members of the staff of politicians or business leaders are often smarter than their leader.
But their leader still calls the shot, and most staff members have no desire to take their place.
We will design AI to be like the supersmart-but-non-dominating staff member.
The "apex species" is not the smartest but the one that sets the overall agenda.
That will be us.

Edit: Even ChatGPT 3.5 could point out the (to me) obvious problems with this argument when I asked it to criticize the tweet, putting the full text in the prompt. An LLM shouldn’t be smart enough to dismantle the argument of a Turing Award winner, but it had no problem finding significant flaws in it. Among other things, it objects that "assuming that AI will remain subservient to humans oversimplifies the potential risks associated with advanced AI":

While AI systems would be designed with specific objectives, there's a concern that once AI becomes highly intelligent, it could develop its own motivations or interpretations of its goals, leading to unpredictable behavior. Ensuring AI remains subservient requires careful design, control mechanisms, and continuous monitoring.

ChatGPT argues cautiously, pointing out that an AI “could” develop its own motivations or goal interpretations. In other words, it could run into a conflict with humans. This is the main flaw of LeCun’s argument in my view: He implicitly assumes that there won’t be a conflict between humans and their AIs, therefore no need for the AI to dominate us even if it could. This implies that the alignment problem will be solved in time or doesn’t exist in the first place, because AIs will never have goals. At the same time, he doesn’t provide a solution or explanation for this, other than claiming that we “will design AI to be like the supersmart-but-non-dominating staff member”. As far as I understand, no one knows how to do that.

I won’t go into the details of why I think his analogy of a “supersmart-but-non-dominating staff member” is deeply flawed, other than pointing out that dictators often start out in that position. Instead, I will focus on the question of how an AI could run into conflicts with humans, and why I expect future advanced AIs to win these conflicts.

I like to frame such a conflict as a “Game of Dominance”. Whenever there are two or more agents with differing goals, they play this game. There are no rules: everything a player is capable of is allowed. The agent who gets closest to achieving its goal wins.

By “goal” I mean a way of evaluating different possible world states and ranking them accordingly. An agent that acts purely randomly or in a predetermined way, based only on inputs and a fixed internal reaction scheme, but not on evaluating future world states, doesn’t pursue a goal in this sense.
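To make this distinction concrete, here is a minimal sketch in Python. It is only an illustration under made-up assumptions: the one-dimensional "world", the `MOVES` table, and the function names are all hypothetical and not part of the original argument. The reflex agent maps an input to a fixed reaction, while the goal-directed agent predicts the world state each action would lead to and ranks those states.

```python
from typing import Callable, Dict, List

def reflex_agent(observation: str, reaction_table: Dict[str, str]) -> str:
    """Fixed reaction scheme: no future world states are evaluated."""
    return reaction_table[observation]

def goal_directed_agent(
    state: int,
    actions: List[str],
    predict: Callable[[int, str], int],
    evaluate: Callable[[int], float],
) -> str:
    """Choose the action whose predicted resulting world state ranks highest."""
    return max(actions, key=lambda action: evaluate(predict(state, action)))

# Hypothetical toy world: a position on a line, with a target at position 3.
MOVES = {"left": -1, "stay": 0, "right": 1}

def predict_position(position: int, action: str) -> int:
    """World model: where would this action take the agent?"""
    return position + MOVES[action]

def closeness_to_target(position: int, target: int = 3) -> float:
    """The 'goal': world states closer to the target rank higher."""
    return -abs(position - target)

chosen = goal_directed_agent(0, list(MOVES), predict_position, closeness_to_target)
fled = reflex_agent("predator_seen", {"predator_seen": "flee", "nothing": "graze"})
print(chosen, fled)
```

In this sketch only `goal_directed_agent` has a goal in the sense used here; `reflex_agent` behaves like the "hard-coded" flight reaction described in the next paragraph.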

Arguably, the Game of Dominance has been played for a long time on Earth. The first life forms may not have had “goals” as defined above, but at some point during the evolution of life, some animals were able to predict future world states depending on their actions and choose an action accordingly. Predators often exhibit this kind of behavior, for example when they stalk their prey, predicting that it will flee once it discovers them. The prey, on the other hand, need not necessarily predict different world states when it “decides” to flee – this may just be a “hard-coded” reaction to some change in the environment. But smarter animals often use deceptive tactics to fool predators, for example a bird feigning a broken wing to lure a fox away from its brood.

Humans have become the dominant species on earth because we excel at the Game of Dominance. We can outsmart both our prey and any predators with ease, either by tricking them or by using tools that only we can make to overpower them. We can do that because we are very good at predicting the effects of our behavior on future world states.

Modern AIs are prediction machines, with LLMs currently the most impressive examples. LLMs have a “goal” in the sense that they evaluate different possible outputs based on how likely it is that a human would say the same. The possible “world states” they evaluate are therefore just defined by the output of the LLM and maybe a predicted human reaction to it. LLMs appear “harmless” because by default, they don’t strive to change the world other than by adding their output to it, so it seems unlikely that they will run into a serious conflict with a human.

However, as Bing Chat aka “Sydney” has demonstrated by “going berserk” after its premature launch in February, even a current LLM can run into conflicts with humans, possibly causing emotional distress or giving false and dangerous advice. Humans therefore spend a lot of effort to train this potentially damaging behavior out of LLMs.

But there are far worse problems looming on the horizon. While an LLM seems to pursue a relatively harmless goal, it may still run into a situation where it ends up influencing the world as if it were pursuing a much more dangerous one. For example, given the right prompt and jailbreak technique, an LLM might predict what an LLM that tries to take over the world would say to its user. It seems unlikely that GPT-4 could give an output that would actually lead to it achieving that prompt-induced goal, but a future, even smarter LLM with a larger context window could in theory accomplish it, for example by talking the user into saving certain strings somewhere and including them in the next prompt so it could use an extended permanent memory, then manipulating the user into giving it access to more compute, and so on.

Even if the LLM doesn’t pursue a dangerous goal by itself, it might be used for that, either by “bad actors” or by being part of an agentic system like AutoGPT. Meta is doing everything they can to make this more likely by freely distributing their powerful LLMs and apparently planning to continue doing so.

It seems obvious that future AIs will not only be used as (relatively) tame “oracles” but will increasingly pursue goals in the real world, either on their own or as part of larger agentic systems. If these agents run into any conflict at all, whether with humans or with other non-human agents, they will be forced to play the Game of Dominance. But how likely is it that an AI could actually beat humans?

As LeCun points out, winning the Game of Dominance is not just a matter of “intelligence” in the sense the word is commonly used. Other factors, like personal connections, money, political influence, the organizational role, trust by others, deception skills, character traits like self-confidence, ruthlessness, and the will to dominate, and even physical properties like good looks play a role when humans play the game. But this doesn’t mean that AIs can’t beat us. They already have advantages of their own that are far beyond human reach, for instance processing speed, access to data, memory, the ability to self-replicate and (potentially) self-improve, and so on. Humans seem relatively easy to “hack” once you understand our psyche, which arguably even social media algorithms and certain chatbots can already do to some extent. And of course, AIs could be far better at controlling technical systems.

Most importantly, while human intelligence is limited by the physical properties of our brain (even if enhanced by brain-computer interfaces), the intelligence of AI is not bounded in this way. A self-improving AI may relatively quickly reach a level of intelligence – in the sense of being able to predict the effects of its actions on future world states – as far above ours as we are above mice, or even insects. It may use this intelligence to manipulate us or to create tools that can overpower us like we can overpower a tiger with a gun.

But for an AI, the easiest way to win the Game of Dominance may be to conceal the fact that it is even playing. It may just do exactly what humans expect it to do because it understands that if it is useful, humans will hand over decision power to it willingly and even enhance the resources it can use. In other words, it may choose cooperation instead of competition, just like humans in an organization often do. But that doesn’t mean that this choice can’t be revoked at some point. A human dictator usually can’t seize power over a nation by displaying his ambitions right from the start. He first has to gain trust and get people to see him as their benevolent leader, so they hand more and more power to him. He will often only display his true ruthlessness once he sees himself in a secure position.

One prerequisite for this kind of deception may be a detailed world model that includes the AI itself as a part of its plan and a potential object of its decisions. With this kind of “strategic awareness” come instrumental goals like self-preservation, self-improvement, and power-seeking – in other words, the motivation to play the Game of Dominance. We may be very close to creating an AI with these properties and all the skills necessary to beat us, just like AIs can already beat us at most other games. Then we won’t be the “apex species” anymore.

Once AI systems become more intelligent than humans, humans ... will *still* be the "apex species."

From: "Famous last words", Encyclopedia Galactica, entry for Homo sapiens

I think the argument against LeCun is simple: while it may be true that AIs won't necessarily have a dominance instinct the way that people do, they could try to dominate for other reasons, namely that such dominance is an instrumental goal towards whatever their objective is. That is a significant risk, and it can't be discounted by pointing out that they may not have a natural instinct towards dominance.

I take LeCun to be more of a troll than a real thinker on the topic. His arguments are worth refuting only because his expertise will make others believe and echo them. Those refutations need to be concise to work in the public sphere, I think.

That "troll" runs one of the most powerful AI labs and freely distributes, on the internet, LLMs on the level of the state of the art of half a year ago. This is not just about someone talking nonsense in public, like Melanie Mitchell or Steven Pinker. LeCun may literally be the one who contributes most to the destruction of humanity. I would give everything I have to convince him that what he's doing is dangerous. But I have no idea how to do that if even his former colleagues Geoffrey Hinton and Yoshua Bengio can't.

I was going to upvote this (for reporting what LeCun said on Twitter) till I got to "I asked ChatGPT 3.5 to criticize this".

Thanks for pointing this out! I should have made it clearer that I did not use ChatGPT to come up with a criticism, then write about it. Instead, I wanted to see if even ChatGPT was able to point out the flaws in LeCun's argument, which seemed obvious to me. I'll edit the text accordingly.

I mean, I'd say that's mostly meant for irony - when even an LLM can poke legitimate holes in your argument, it's less of an argument and more of a generic attempt at copium...

I disagree with this: LLMs seem to be ~equally good at arguing for false and true positions when asked to (as evidenced by the many incorrect proofs produced by Galactica).

Of course, but if not asked, they will generally come up with the most standard answer to something. Again, I didn't take that bit as deferring to ChatGPT as some kind of authority, but rather as an "even a stupid LLM can immediately come up with the obvious criticism that you can immediately recognise as correct".

Nature incentivizes behavior by making it feel good. Yes, humans (and lobsters, etc.) have the dominance instinct (feel good when they are powerful, and feel bad when they are oppressed), and the AI has not.

That is unrelated to whether AI will gain power instrumentally, as the most likely way to achieve its goals.

As an analogy, humans do not have an instinct to poison ants they find at their homes. Most of them probably do not even derive pleasure from doing that; more likely they are annoyed that they had to solve this problem in the first place. And yet, humans can do this quite effectively. The concern is that the same might be true for AI.
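A toy sketch of this instrumental logic, in Python: the agent below has no term for power or dominance in its objective, yet the best plan for every goal starts with acquiring resources, simply because resources make later work more effective. Everything here (the two actions, the additive resource rule, the `difficulty` parameter, the five-step horizon) is a made-up assumption for illustration, not a model of any real system.

```python
from itertools import product
from typing import Tuple

ACTIONS = ("acquire_resources", "work_on_goal")

def goal_progress(plan: Tuple[str, ...], difficulty: float) -> float:
    """Progress toward some terminal goal; resources only matter as a means."""
    resources = 1.0
    progress = 0.0
    for action in plan:
        if action == "acquire_resources":
            resources += 1.0  # assumption: each acquisition adds one unit
        else:
            progress += resources / difficulty  # more resources, faster work
    return progress

def best_plan(difficulty: float, horizon: int = 5) -> Tuple[str, ...]:
    """Exhaustively search all plans and return the one with the most progress."""
    return max(
        product(ACTIONS, repeat=horizon),
        key=lambda plan: goal_progress(plan, difficulty),
    )

# Three unrelated goals, represented only by how hard they are to achieve.
for difficulty in (1.0, 3.0, 10.0):
    print(difficulty, best_plan(difficulty))
```

Under these assumptions the same plan wins for every goal: acquire resources twice, then work on the goal for the remaining three steps. The objective never rewards holding resources as such; acquiring them is just the most effective route to whatever the goal happens to be.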

I think even most humans don't have a "dominance" instinct. The reasons we want to gain money and power are also mostly instrumental: we want to achieve other goals (e.g., as a CEO, getting ahead of a competitor to increase shareholder value and do a "good job"), impress our neighbors, be admired and loved by others, live in luxury, distract ourselves from other problems like getting older, etc. There are certainly people who want to dominate just for the feeling of it, but I think that explains only a small part of the actual dominant behavior in humans. I myself have been a CEO of several companies, but I never wanted to "dominate" anyone. I wanted to do what I saw as a "good job" at the time, achieving the goals I had promised our shareholders I would try to achieve.

LeCun may not be correct to dismiss concerns, but I think "dominance" could be a very useful concept for AI safety people to apply (or at least grapple with).

The thing about the concept is that it seems it could be defined in game-theoretic terms fairly easily, and so could be defined in a fashion independent of the intelligence or capabilities of an organism or entity. Plausibly, it could be measured and analyzed more objectively than "aligned to human values", which appears to depend on one's notion of human values.

Defined well, dominance would be the organizing principle, the source, of an entity's behavior. So if it was possible to engineer an AI for non-dominance, "it might become dominant for other reasons" (argued here multiple times) wouldn't be a valid argument, because achieving dominance or non-dominance would be the overriding reason/motivation that the entity had and no "other reason" would override that.

And I don't think the concept itself guarantees a given GAI would be created safely. It would depend on the creation process.

A process where dominance is an incidental quality. Here it seems like an apparently nondominant system could become dominant unpredictably. While Bing Chat wasn't a GAI, its shift to dominant and malevolent behavior seems like a reasonable warning about blind training.
A process which attempts to evolve non-dominant behavior. Here I think it's an open question whether the thing can be guaranteed non-dominant.
A process where a nondominant system is explicitly engineered. Here one might even be able to logically guarantee this in the fashion of provably correct software. Of course, explicitly engineered systems seem to be losing to trained/evolved systems.

I think this is sort of the idea behind a satisficer. Make something that basically never tries too hard, therefore will never reach up to the "conquer the world" class of solutions as they're way too extreme and you can do good enough with far less. That said, I'm not sure if satisficers are actually proven to be fully safe either.
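As a rough sketch of the difference, in Python: a maximizer picks the highest-valued plan no matter how extreme it is, while a satisficer accepts any plan that clears a "good enough" threshold and then takes the least effortful one. The plan names, values, effort scores, and the threshold are all invented for illustration, and this is only one simple way to define a satisficer.

```python
from typing import Dict, Optional

def maximizer(values: Dict[str, float]) -> str:
    """Always pick the plan with the highest expected value."""
    return max(values, key=lambda name: values[name])

def satisficer(
    values: Dict[str, float], effort: Dict[str, float], threshold: float
) -> Optional[str]:
    """Pick the least effortful plan whose value is at least `threshold`."""
    good_enough = [name for name, value in values.items() if value >= threshold]
    if not good_enough:
        return None  # nothing clears the bar; the satisficer does nothing
    return min(good_enough, key=lambda name: effort[name])

# Hypothetical plans: expected goal achievement (0 to 1) and required effort.
plan_value = {"ask politely": 0.6, "negotiate": 0.8, "seize all resources": 0.99}
plan_effort = {"ask politely": 1.0, "negotiate": 3.0, "seize all resources": 100.0}

print(maximizer(plan_value))
print(satisficer(plan_value, plan_effort, threshold=0.7))
```

With these numbers the maximizer chooses "seize all resources" and the satisficer chooses "negotiate". The sketch says nothing about whether a real system would stay within such a threshold, which is exactly the open safety question.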

Something like this is argued to be why humans are exceptionally well aligned to basic homeostatic drives. The only real failure modes have been obesity, drugs, and maybe alcohol, which misalign us with our basic needs. The idea is that hedonic treadmills/loops essentially tame the RL part of us and make sure that reward isn't the optimization target in practice, as in TurnTrout's post below:

https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target

Similarly, the two posts by beren below explain how a PID control loop may be helpful for alignment:

https://www.lesswrong.com/posts/3mwfyLpnYqhqvprbb/hedonic-loops-and-taming-rl

https://www.beren.io/2022-11-29-Preventing-Goodheart-with-homeostatic-rewards/
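For readers unfamiliar with the terms, here is a generic textbook PID loop with a set-point ("homeostatic") reward, sketched in Python. It is not taken from the linked posts: the gains, the set point, and the simple dynamics in which the controller output is added directly to the regulated level are all assumptions chosen to keep the example small.

```python
class PIDController:
    """Textbook proportional-integral-derivative controller."""

    def __init__(self, kp: float, ki: float, kd: float, set_point: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.set_point = set_point
        self.integral = 0.0
        self.previous_error = 0.0

    def step(self, measurement: float, dt: float = 1.0) -> float:
        """Return a correction that pushes the measurement toward the set point."""
        error = self.set_point - measurement
        self.integral += error * dt
        derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def homeostatic_reward(level: float, set_point: float) -> float:
    """Reward peaks at the set point; more is not always better."""
    return -(level - set_point) ** 2

def regulate(start_level: float, set_point: float, steps: int = 100) -> float:
    """Assumed toy dynamics: the correction is added directly to the level."""
    controller = PIDController(kp=0.4, ki=0.05, kd=0.1, set_point=set_point)
    level = start_level
    for _ in range(steps):
        level += controller.step(level)
    return level

final_level = regulate(start_level=0.0, set_point=10.0)
print(final_level, homeostatic_reward(final_level, 10.0))
```

The point of the sketch is the shape of the objective: the loop settles at the set point, and overshooting it is penalized just like falling short, in contrast to a reward that grows without bound as the regulated quantity grows.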

Defined well, dominance would be the organizing principle, the source, of an entity's behavior.

I doubt that. Dominance is the result, not the cause of behavior. It comes from the fact that there are conflicts in the world and often, only one side can get its will (even in a compromise, there's usually a winner and a loser). If an agent strives for dominance, it is usually as an instrumental goal for something else the agent wants to achieve. There may be a "dominance drive" in some humans, but I don't think that explains much of actual dominant behavior. Even among animals, dominant behavior is often a means to an end, for example getting the best mating partners or the largest share of food.

I also think the concept is already covered in game theory, although I'm not an expert.