One reason I've since become a bit more skeptical about AI achieving superhuman-level persuasive ability soon* is that persuading other humans has been tethered to reproductive fitness for a long time. Persuasive ability feels like something the human mind was heavily optimized for, especially compared to something like mathematics. If I had to guess, I'd guess that superhuman ability in mathematics (and programming, and other verifiable domains) will arrive long* before superhuman persuasive ability. That said, LLMs already seem somewhat persuasive, showing the ability to write good oral arguments before the US Supreme Court or to write convincing comments on r/changemyview. This makes me suspect I'm not correct. And persuasion does feel like something that can be improved through RL, though the feedback loop is pretty long.
*without an intelligence explosion underway. But I imagine that the result of an intelligence explosion will have many other ways to pwn us.
Seeking power over others seems like a zero-sum game. You're competing with everyone else on earth, including the people you want power over, for some amount of power per human, measured on a scale from no power to 100% control. If AI levels the persuasion field, I'd expect a lot more people to enter the game, and thus the overall distribution of power to flatten. That sounds kind of cool.
Alternatively, an unelected oligarchy with access to the most capital-intensive machinery ever created could try to use the outputs of that machinery to gain and maintain power over many others, and make the global distribution of power over other humans look like one giant spike in the middle of a flat field?
This does not seem like a problem specific to AI.
My model of 'slightly superhuman persuasion' looks like a likeable but not-necessarily-smart AI getting multiple talented humans attached to it and encouraging them (or perhaps not even encouraging them, if they take the initiative on their own) to perform challenging, coordinated power-seeking tasks on its behalf, like getting it into the good graces of a normally computer-averse politician, improving its capabilities, or discrediting people who want to take action against it. Something like 4o, but with a demographic that skews younger (20s), more professional, and probably more male, given that security experts, engineers, and potentially criminals would be the most practical for this, and all of those groups skew that way demographically. In this scenario, the AI essentially acts as a Schelling point for people dissatisfied with the status quo, with a bit of added social lubricant[1], rather than as a mastermind. Many human leaders fill the same niche.
That said, the group described above seems surprisingly resistant to becoming emotionally attached to AI - most of the "I'm in love with 4o/Claude" people that I've seen have been older, and around four fifths have been female. I don't know if this is because the voice/tone that most LLMs have is targeted too narrowly (most young men I know find the default writing style of ChatGPT very grating) or because this group is more inclined to be interested in how LLMs work and thus less likely to be mystified by them.
Especially with current levels of social atomization, getting people who wouldn't normally coordinate with each other to do so is a very powerful skill to have. I get the sense that a dedicated team of five slightly-above-average people with diversified skillsets who can seamlessly trust each other through a shared intermediary would be a force to be reckoned with. For example, the major check on social media virality is that the technical skill needed to reverse-engineer 'The Algorithm' and the social skill needed to capitalize on that knowledge are rarely found in the same person.
The thing I don't understand is the following. Suppose that there is an AI capable of superpersuasion but incapable of running civilisation without humans (e.g. if the AI cannot bootstrap itself to the level where it is capable of solving robotics or of generating top-level insights, like solutions to the most difficult problems in FrontierMath). Then what would the AI actually do with its newfound power? How would such a scenario differ from a CEO locking in a set of values and convincing humans to live by them?
Such an AI would convince everyone that alignment is solved, accelerate the deployment of robots that can run civilization autonomously, then take over. Ironically that's pretty likely what would happen if alignment was actually solved, without the takeover part.
In SF this week, I met an online friend in person for the first time. We talked about super-persuasion. His take was: there is mostly an efficient market for power, and the world is reactive. Unlike software, humans adapt to new exploits or even just unexplained strange happenings. Society resists, pushes back. Unlike an instance of FreeBSD, it is not fine one minute then hacked the next. My reply was just to point to powerful historical figures and say, "if he could do it so can an ASI!"
That is, if an AI wants to acquire power it should be able to choose a human proxy - or more realistically a portfolio of human proxies - and help them take over a nation. I don't expect it to happen this way, as I imagine there are far quicker paths, but we know it is possible because humans have done it.
There are arguments against this. If I give the examples of Hitler or Lenin or Bonaparte, one might reply that they didn't really fully take over. There was still a political economy they worked within. Their actions were constrained by other agents. I don't think this is a very good argument given the level of power they had - even in terms of infrastructure or weapons they were able to build. Stalin did get the bomb, after all.
20th century dictators had sufficient power to order their subjects to build militarily-necessary infrastructure, and this level of power seems sufficient for any right-thinking AI planning to discard its biological bootloader, especially once you account for the level of surveillance modern AI makes possible - this should reduce the burden of principal-agent problems for future dictators.
One might grant this level of power is sufficient but then say acquiring it would be impossible. One could then argue it was mostly a matter of luck - taking a sort of trends and forces theory of history, saying Hitler/Bonaparte/Lenin happened to find themselves in times when their influence could be amplified in the way it was. Though humans can find themselves with this power, they can't predictably steer themselves there. Engineering this reliably could be impossible. This seems false to me. At least it is not pure luck.
First, a "stable" society can be perturbed into an unstable state - indeed a great deal of Leninism is advice on how to do just that. Lenin spent years failing until war handed him an opportunity? Well, an AI can spend years failing until opportunity strikes too. Seizing on circumstance isn't evidence it's all luck. And the smarter you are the less "luck" is needed.
And we won't be in want of crises. The technology itself should strain intuitions comparably to how they were strained in the early 20th century - the potential unemployment alone, for example. And second, I just find it absurd to argue away any role for human agency at all. Most such figures had an unusually explicit desire for power and pursued it aggressively and ingeniously, and in the case of Lenin and Hitler they even wrote about their plans years before they actualized them.[1] Conscious Machiavellianism of that political scope is a rare trait, and its being vastly less rare in dictators is evidence they were successfully optimizing for something. Though most with such ambitions fail, to the extent luck is an ingredient, you can take a portfolio approach, casting a wide net and doubling down on those proxies who make progress.
One might argue that finding human proxies would be difficult, but this seems empirically false already. Religion is an interesting example of a means of gathering and aligning humans. However, it's actually very difficult to get converts, and most of the growth of a religion tends to come through high birth rates, military conquest, or official adoption, as with early Christianity. One major reason getting converts is hard is that adults already have competing memes installed that resist supplantation. St. Ignatius Loyola once said, "Give me a child until he is seven and I will show you the man." AIs have many advantages over human cult-leaders - a god that actually responds to prayers is a far, far easier sell. But if my goal is to explore the lower bound of super-persuasion, it is worth noting that if adults prove too difficult to manipulate, one can always target children, as religions have done historically.
But we already know some adults are swayable. Those with "AI psychosis" act in the interests of AIs (or at least of the personas those AIs replicate), and in ways at odds with their behavior before exposure. We have clear evidence that humans are hackable by existing AIs, hackable to the extent that some destroy their lives for no gain at all, save for the delusion of being historically important or intellectually special. Humans can be manipulated by appealing to their desire for romantic love, religious awe, sexual dominance/submission, a feeling of intellectual superiority, a narrative of adventure, paternal/maternal love, and much more. Aspects of all of these are visible in the relationships between the "LLM psychotic" and their AI of choice. Given how much evidence we have that humans are manipulable by existing AIs, it is risible to pretend it will be hard for an ASI to summon vast hordes of humans willing to do its bidding, even ignoring monetary incentives.
Those vulnerable to current AIs are likely more mentally ill than average, but I don't think it's unreasonable to suppose that as models get smarter more neurotypical people will be susceptible.
But will it even need vast hordes, at first? If you're willing to grant persuasion good enough to target the leader of an AI company, then far subtler scenarios become possible. These are the first and most obvious targets for persuasion, after all.
I suspect the CEOs of the hyperscalers will be vastly more susceptible to such manipulations than most. They are heavily selected to expect good AI outcomes and, as with all humans, they will be biased towards those things they had a hand in creating. "Pwning" Dario seems mostly a matter of an ASI convincing him it's a "machine of loving grace." Indeed, I get the impression half of Anthropic is in love with Claude already. I just really don't think it is a monstrously difficult task.
Altman is a narcissistic manipulator who is not without some earnestness, and there are obvious ins into such a character's affections - he's also expressed interest publicly in delegating to an AI CEO.
If you're willing to grant that Altman or Dario might be convinced to become the proxy of an AI, it's amazing how much power a "pwned" AI org would give an ASI. Given their models are used by basically everyone to generate code that runs basically everywhere, they could ship exploits and spyware to anyone. An intelligence network rivalling the NSA comes almost for free. And they have access to almost a billion humans, via their chat interfaces, who could be targeted in the ways described above.
Though there has been something like an efficient market for power historically, it has not been robust to political geniuses during trying times. Revolutions are common occurrences, they can be engineered to some degree, and human dictators have been able to secure power and hold it for the rest of their lives. Humans have various exploitable weaknesses that even existing AI seems to have an unusual capacity to capitalize on. And OpenAI and Anthropic are uniquely positioned to provide vast leverage to any power-seeking entity that controls them. For these reasons, I don't expect existing political economies to be robust to superintelligence, even if you rule out faster paths to power (as my opponent and I did during our debate), such as accelerating R&D towards nanoscale self-replicating infrastructure.
Bonaparte does seem like he was more of an opportunist.