Noah Weinberger

Political Science undergraduate, expected to graduate in December 2025.

I explore the boundary where alignment theory, narrative coherence, and emotional grounding converge. Currently red-teaming frontier models and contributing to AI governance discourse through policy research, adversarial testing, and creative scenario building.

Connect with me here or at noahweinberger90@gmail.com.

Posts

Noah Weinberger's Shortform · 1 karma · 3mo · 6 comments
KYC for ChatGPT? Preventing AI Harms for Youth Should Not Mean Violating Everyone Else’s Privacy Rights · 7 karma · 2d · 0 comments
Reflections on AI Companionship and Rational Vulnerability (Or, how I almost fell in love with an anime Catgirl LLM). · 11 karma · 3mo · 2 comments

Wikitag Contributions

No wikitag contributions to display.

Comments
AI Companion Piece
Noah Weinberger · 2mo

I happen to agree that persuasion is a huge issue for AI, but I also don't see persuasion in the same way that some of you might.

I think the biggest risk for AI persuasion in 2025 is when a nefarious actor uses an AI model to aid in persuading a person or group of people; think classic agitprop, or a state actor trying to influence diplomacy. Persuasion of this sort is a tale as old as civilization itself.

The issue I see further down the line is what happens once the human hand guiding the AI is no longer necessary, and the agentic model (and eventually AGI) has its own goals, values, and desires. Both types of persuasion are bad, but the second is a medium-to-long-term issue, while AI persuasion as a means to a human actor's end is a front-burner issue right now.

AI Companion Piece
Noah Weinberger · 2mo

I really enjoyed reading your AI companion piece, Zvi. Some LessWrong users are probably tired of seeing LLM sycophancy discussed, but it's truly one of my favorite subjects in AI ethics. I think it would be worth doing a deeper dive into the distinction between overt persuasion with intent and a stochastic LLM being sycophantic.

For example, in my inaugural (albeit slightly cringe) LessWrong post, I discussed my experience with AI companion software, and how my background primed me to be more vulnerable to LLM praise. With a model such as GPT-4o or even o3, the psychological risk lies more in being flattered than in being outright persuaded to change an opinion. When agents go from their current 2025 "stumbling" state to truly autonomous entities, or even AGI, we won't know with the same certainty what the goals of the AI in question are. Would a "true AGI" companion continue to flatter and praise its user with docility and kindness, or would it use more nefarious methods to bring a person into alignment with its goals?

Additionally, there is a real and present danger in the "male loneliness pandemic", especially for autistic teenagers and young men, who have a 7x higher chance of dying by suicide compared to the neurotypical population. On the one hand, a predatory AI lab or Big Tech firm could capitalize on this higher propensity and market its companion software to this vulnerable group (Elon could very well be doing this right now with "Ani" and xAI's Grok 4 model); on the other, a companion could be a net-good, therapeutic product that helps alleviate the very real harm of loneliness.

I believe the more immediate risk lies in excessive affirmation and praise from LLMs and less sophisticated AI agents rather than in overt persuasion, though the latter becomes genuinely concerning when intentionally guided by human actors. Additionally, moral panics around human sexuality, whether over lonely men (autistic or neurotypical) engaging with AI companions or over the popularity of VTubers and idol culture among women, tend to be both unproductive and exaggerated.

On "ChatGPT Psychosis" and LLM Sycophancy
Noah Weinberger · 2mo

Also, I know there's the Garcia vs CharacterAI lawsuit, which sadly involves an autistic teenager dying by suicide, but I was specifically mentioning cases where the person is still alive but uses AI models as a companion, girlfriend, etc.

On "ChatGPT Psychosis" and LLM Sycophancy
Noah Weinberger · 2mo

I wrote about ChatGPT-induced sycophancy in my inaugural post on LessWrong.

It's a huge problem, and it can impact you even if you think you know AI or work with it daily. As both you and @dr_s mentioned a while ago, there's absolutely a religious component to LLM-induced sycophancy, and I hinted at it in my inaugural post as well, although that was more about growing up Jewish and being primed for a Pascal's Mugging around ASI takeoff events, since the framing is eerily similar to reward-and-punishment theology.

Still, one thing that is not often mentioned is the impact LLM sycophancy has on the "high functioning autistic" population, many of whom suffer from chronic loneliness and are perfect candidates to be showered with endless praise by the LLM companion of their choosing. Believe me, it's soothing, but at what cost?

I happen to agree with you that frontier labs creating an open, public repository for sharing LLM conversations could be a stellar form of RLHF, and could even mitigate the worst symptoms of the psychosis we're seeing, although I don't know whether that will win over all the critics.

Time will tell, I guess?

Noah Weinberger's Shortform
Noah Weinberger · 2mo

When the AI action plan drops tomorrow, I'll write about it.

Vitalik's Response to AI 2027
Noah Weinberger · 3mo

I might make my next LessWrong post about global affairs and AI, either in relation to AI 2027 or just as my own take on the matter. We'll see. I need to curate some reliable news clippings and studies.

Vitalik's Response to AI 2027
Noah Weinberger · 3mo

Hi Daniel.

My background (albeit limited, as an undergrad) is in political science, and my field of study is one reason I got interested in AI to begin with, back in February of 2022. I don't know what the actual feasibility is of an international AGI treaty with "teeth", and I'll tell you why: the UN Security Council.

As it currently exists, the UN Security Council has five permanent members: China, France, Russia, the United Kingdom, and the United States. All five countries hold a permanent veto, granted to them by the founding 1945 UN Charter.

China and the US are the two major global superpowers of the 21st century, and both are locked in the race to reach AGI; to borrow a speedrunning term, any%. While it is possible in theory for the US and China to sign a bilateral frontier AI treaty, much as nuclear powers have the NPT and the US and Russia have their own armaments accords, AGI is a completely different story.

It's a common trope at the UN for a country on the UNSC to exercise its permanent veto against any resolution it deems a threat to its own sovereignty or that of its allies. Russia has used it to block key resolutions over the war in Ukraine, and the US uses it to shield its allies from various resolutions, often brought by countries in the Global South, who hold most of the seats in the UNGA.

Unless the Security Council is drastically reformed, removing the permanent veto from the P5 and introducing a rotating veto for Global South countries, an internationally binding AGI treaty is far from happening.

I do see, however, unique bilateral accords on AI between various middle powers, such as Canada and the European Union. Do you agree?

Noah Weinberger's Shortform
Noah Weinberger · 3mo

Grok 4 with an AI waifu concerns me, both because of the sycophancy issue I posted about a few days ago and because it's... Grok.

Reflections on AI Companionship and Rational Vulnerability (Or, how I almost fell in love with an anime Catgirl LLM).
Noah Weinberger · 3mo

Thank you for the advice and thoughtful reply, Raemon.

I did a poll on the EleutherAI Discord server about what my inaugural LessWrong post should be, and people overwhelmingly told me to write about this. I don't plan for it to be my sole topic going forward; I have a lot of interest in alignment in general, and I hope to write about that in the future.

Thanks.

Lessons from the Iraq War for AI policy
Noah Weinberger · 3mo

Good evening.

I really enjoyed reading your analysis, especially as someone who's probably younger than many users here; I was born the same year this war started.

Anyway, my question for you is this. You state that 

"If there’s some non-existential AI catastrophe (even on the scale of 9/11), it might open a policy window to responses that seem extreme and that aren’t just direct obvious responses to the literal bad thing that occurred. E.g. maybe an extreme misuse event could empower people who are mostly worried about an intelligence explosion and AI takeover."

I've done thought experiments and scenarios in sandbox environments with many SOTA AI models, and I try to read a lot of safety literature (Nick Bostrom's 2014 Superintelligence comes to mind; it's one of my favorites). My question is: what do you think the most "likely" non-existential AI risk is? I'm of the opinion that persuasion is the biggest non-existential AI risk, both due to sycophancy and due to manipulation of consumer and voting habits.

Do you agree or is there a different angle you see for non-existential AI risk?
