The Mirror Problem in AI: Why Language Models Say Whatever You Want