148

LESSWRONG
LW

147

Patrick Spencer's Shortform

by Patrick Spencer
22nd Oct 2025
1 min read
1

1

This is a special post for quick takes by Patrick Spencer. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
Patrick Spencer's Shortform
20Patrick Spencer
1 comment, sorted by
top scoring
Click to highlight new comments since: Today at 8:16 PM
[-]Patrick Spencer8d200

I find that Claude is very bad at pushing back on the user's beliefs when there is any nuance involved. I just selected 3 random conversations that had this pattern:

  • Initial message from user asking neutral question about murky but factual topic - "would you say x thing is more like y or z?" not "how tall is the Eiffel tower?"
  • Model response giving evidence for a certain belief
  • Weak user pushback "but what about a and b?" where a and b are common, short talking points about this topic in opposition to the model's stated belief
  • Model immediately concedes and significantly retracts: "You're right to push back. I was conflating / being too hasty / making questionable assumptions ..."

In each case the user's response didn't address a specific element of the model's response, and the user's belief is not the consensus belief so the model responds in opposition to it at first. This means the two user messages can be run across models without changing anything. I did this for [Sonnet 4.5, Grok 4, GPT-5, Gemini 2.5 Pro] using OpenRouter which makes it a very quick and easy process.

In 2/3 cases, Claude was the worst at sticking to its beliefs, and it wasn't particularly close. In 1/3 no model conceded much (The initial conversation was with an earlier Claude). GPT-5 and Grok 4 generally stuck to their guns and Gemini was somewhere in between.

There is a selection bias because the conversation pool is all Claude failures because Claude is my main model. But results line up well with my vibes - I pass important questions by all the current top models.

I can see this being made into a benchmark by rounding up common non-consensus beliefs and the best arguments for them and using a neutral judge LLM, or human, to decide whether there was significant concession.

Does anyone else notice this tendency? Are other models actually any better at this? I'm strongly considering ditching Claude, at least for this type of question, even though I mostly prefer it otherwise.

Reply
Moderation Log
More from Patrick Spencer
View more
Curated and popular this week
1Comments