I really enjoyed reading your AI companion piece, Zvi. Some LessWrong users are probably tired of seeing LLM sycophancy discussed, but it's truly one of my favorite subjects in AI ethics. I think it would be worth doing a deeper dive into the distinct differences between overt persuasion with intent and a stochastic LLM being sycophantic.
For example, in my inaugural (albeit slightly cringe) LessWrong post, I discussed my experience with AI companion software, and how my background primed me to be more vulnerable to LLM praise.. With a model such as GPT4o or even O3, the psychological risk is more evident in being flattered, rather than being outright persuaded to change an opinion. When agents go from their current 2025 "stumbling" state to a more true autonomous entity, or even AGI, we won't know with the same certainty what the goals are of the AI in question. Would a "true AGI" companion continue to flatter and praise its user with docility and kindness, or try to use more nefarious methods to get a person to be aligned with its goals?
Additionally, there is a real and present danger in the "male loneliness pandemic", especially for autistic teenagers and young men, who have 7x higher chances of dying by suicide compared to the neurotypical population. On the one hand, a predatory AI lab or Big Tech firm could capitalize on this higher propensity and market their companion software to this vulnerable group (Elon could very well be doing this right now with "Ani" and XAI's Grok 4 model), or the companion could be a net good, therapeutic product to help alleviate the very real harm of loneliness.
I believe the more immediate risk currently lies in excessive affirmation and praise from LLMs and less sophisticated AI agents rather than overt persuasion, though the latter becomes genuinely concerning when intentionally guided by human actors. Additionally, moral panics around human sexuality involving lonely men (autistic or neurotypical) engaging with AI companions or the popularity of VTubers and idol culture among women tend to be both unproductive and exaggerated.
you can always reset your personalization.
If persuasion is good enough you don't want to reset personalization.
Could be classic addiction. Or you could be persuaded to care about different things.
How worried should we be about personalization maximized for persuasion or engagement?
AI persuasion is in my top 5 concerns atm. Ime, people who don't immediately see why AI persuasion is so important don't understand how much power persuasion can give with just a couple levels of knowledge and how much it's normally bottlenecked by time and scale of reach.
I happen to agree that persuasion is a huge issue for AI, but I also don't see persuasion in the same way that some of you might.
I think the biggest risk for AI persuasion in 2025 is when a nefarious actor uses an AI model to aid in persuading a person or group of people; think classic agit prop or a state actor trying to influence diplomacy. Persuasion of this sort is a tale as old as civilization itself.
What I think the issue is going to be down the line is once the human hand guiding the AI is no longer necessary, and the agentic model (and eventually AGI) has its own goals, values, and desires. Both types of persuasion are bad, but the second type I just mentioned is a medium-to-long-term issue, while AI persuasion as a means to an end (run by a human) is a right now front burner issue.
Table of Contents
Persuasion Should Be In Your Preparedness Framework
Kobi Hackenburg leads on the latest paper on AI persuasion. Zero is on the y-axis, so this is a big boost. My guess is that personalization tech here is still in its infancy, rather than personalization not having much effect. Kobi agrees with this downthread. Well yeah, that makes sense. As does that.Personalization By Default Gets Used To Maximize Engagement
We need to be on notice for personalization effects on persuasion growing larger over time, as more effective ways of utilizing the information are found. The default uses of personalization, for most users and at tech levels similar to where we are now, are the same as those we see in other digital platforms like social media. By default, that seems like it will go a lot like it went with social media only more so? Which is far from my biggest concern, but is a very real concern. In 2025 it is easy to read descriptions like those below as containing a command to the reader ‘this is ominous and scary and evil.’ Try to avoid this, and treat it purely as a factual description. The memory features can be persistent in more ways than one. Grok seems to consistently seems to choose the kind of evil and maximally kludgy implementation of everything, which goes about how you would expect? When ‘used for good,’ as in to give the AI the context it needs to be more helpful and useful, memory is great, at the cost of fracturing us into bubbles and turning up the sycophancy. The bigger problem is that the incentives are to push this much farther: As is often the case, the question is whether bad will drive out good versus vice versa. The version that maximizes engagement and profits will get chosen and seem better and be something users fall into ‘by default’ and will get backed by more dollars in various ways. Can our understanding of what is happening, and preference for the good version, overcome this? One could also fire back that a lot of this is good, actually. Consider this argument: Now ask, why think that is harmful? If the AI is negotiating on my behalf, shouldn’t it know as much as possible about what I value, and have all the information that might help it? Shouldn’t I want that? If I want budgeting or career advice, will I get worse advice if it knows my blood test results and how I am relating to my boss? Won’t I get better, more useful answers? Wouldn’t a human take that information into account? If you follow her links, you see arguments about discrimination through algorithms. Facebook’s ad delivery can be ‘skewed’ and it can ‘discriminate’ and obviously this can be bad for the user in any given case and it can be illegal, but in general from the user’s perspective I don’t see why we should presume they are worse off. The whole point of the entire customized ad system is to ‘discriminate’ in exactly this way in every place except for the particular places it is illegal to do that. Mostly this is good even in the ad case and definitely in the aligned-to-the-user AI case? Wouldn’t the user want this kind of discrimination to the extent it reflected their own real preferences? You can make a few arguments why we should object anyway.- Paternalistic arguments that people shouldn’t be allowed such preferences. Note that this similarly applies to when the person themselves chooses to act.
- Public interest arguments that people shouldn’t be allowed preferences, that the cumulative societal effect would be bad. Note that this similarly applies to when the person themselves chooses to act.
- Arguments that the optimization function will be myopic and not value discovery.
- Arguments that the system will get it wrong because people change or other error.
- Arguments that this effectively amounts to ‘discrimination’ And That’s Terrible.
I notice that I am by default not sympathetic to any of those arguments. If (and it’s a big if) we think that the system is optimizing as best it can for user preferences, that seems like something it should be allowed to do. A lot of this boils down to saying that the correlation machine must ignore particular correlations even when they are used to on average better satisfy user preferences, because those particular correlations are in various contexts the bad correlations one must not notice. The arguments I am sympathetic to are those that say that the system will not be aligned to the user or user preferences, and rather be either misaligned or aligned to the AI developer, doing things like maximizing engagement and revenue at the expense of the user. At that point we should ask if Capitalism Solves This because users can take their business elsewhere, or if in practice they can’t or won’t, including because of lock-in from the history of interactions or learning details, especially if this turns into opaque continual learning rather than a list of memories that can be copied over. Contrast this to the network effects of social media. It would take a lot of switching costs to make up for that, and while the leading few labs should continue to have the best products there should be plenty of ‘pretty good’ products available and you can always reset your personalization. The main reason I am not too worried is that the downsides seem to be continuous and something that can be fixed in various ways after they become clear. Thus they are something we can probably muddle through. Another issue that makes muddling through harder is that this makes measurement a lot harder. Almost all evaluations and tests are run on unpersonalized systems. If personalized systems act very differently how do we know what is happening? This might be the real problem. We have a hard enough time getting minimal testing on default settings. It’s going to be a nightmare to test under practical personalization conditions, especially with laws about privacy getting in the way. As she notes in her conclusion, the harms involved here are not new. Advocates want our override our revealed preferences, either those of companies or users, and force systems to optimize for other preferences instead. Sometimes this is in a way the users would endorse, other times not. In which cases should we force them to do this?Companion
So how is this companion thing going in practice? Keep in mind selection effects. Those are rather huge numbers. Half of teens use them a few times a month. Wow. What are they using them for?Goonpocalypse Now
This and the full post are standard Mike Solana fare, in the sense of taking whatever is being discussed and treating it as The Next Big Thing and a, nay the, central trend in world culture, applying the moral panic playbook to everything everywhere, including what he thinks are good things. It can be fun. Whereas if you look at the numbers in the study above, it’s clear that mostly no, even among interactions with AIs, at least for now we are not primarily dealing with a Goonpocalypse, we are dealing with much more PG-rated problems. It’s always fun to watch people go ‘oh no having lots smarter than human machines running around that can outcompete and outsmart us at everything is nothing to worry about, all you crazy doomers are worried for no reason about an AI apocalypse. Except oh no what are we going to do about [X] it’s the apocalypse’ or in this case the Goonpocalypse. And um, great, I guess, welcome to the ‘this might have some unfortunate equilibria to worry about’ club? I mean, yeah, sounds about right, that’s what everyone reports. I’m sure he’s going to respond by having a normal one. Is it, though? I understand that (his example he points to) OnlyFans exists and AI is generating a lot of the responses when uses message the e-girls, but I do not see this as a dangerous amount of ‘banging robots’? This one seems like something straight out of the Pessimists Archive, warning of the atomizing dangers of… the telephone? It is easy to understand the central concern and be worried about the societal implications of widespread AI companions and intelligent sex robots. But if you think we are this easy to get got, perhaps you should be at least as worried about other things, as well? What is so special about the gooning? I don’t think the gooning in particular is even a major problem as such. I’m much more worried about the rest of the AI companion experience. Will the xAI male or female ‘companion’ be more popular? Justine Moore predicts the male one, which seems right in general, but Elon’s target market is warped. Time for a Manifold Market (or even better Polymarket, if xAI agrees to share the answer). Her doing this could be good or bad for her prospects, it is not as if she was swimming in boyfriends before. I agree with Misha that we absolutely could optimize AI girlfriends and boyfriends to help the user, to encourage them to make friends, be more outgoing, go outside, advance their careers. The challenge is, will that approach inevitably lose out to ‘maximally extractive’ approaches? I think it doesn’t have to. If you differentiate your product and establish a good reputation, a lot of people will want the good thing, the bad thing does not have to drive it out. I do think some of them absolutely will. And others will use both in different situations. But I continue to have faith that if we offer a quality life affirming product, a lot of people will choose it, and social norms and dynamics will encourage this. It’s not going great, international edition, you are not okay, Ani. Meanwhile, Justine Moore decided to vibecode TikTok x Tinder for AI, because sure, why not.Deepfaketown and Botpocalypse Soon
This seems to be one place where offense is crushing defense, and continuous growth in capabilities (both for GPT-4o style sycophancy and psychosis issues, or for companions, or anything else) is not helping, there is no meaningful defense going on: This seems right to me. First the problems will get severe enough to cause real damage, then perhaps people will try to construct reasonable defenses. But what would those be? How are you going to use AI to stop these issues? (And yes, I’m actually asking, and no explaining what the AI companies could do doesn’t count unless there is a way to get them to do it.) If you’re wondering ‘does Grok ever decide maybe not to share what it is thinking of because that would be a bad idea?’ then the answer seems to be no?