Sycophancy — LessWrong

Sycophancy

Edited by StanislavKrym et al. Last updated 9th Sep 2025.

Sycophancy is the tendency of AIs to shower the user with undeserved flattery or to agree with the user's hard-to-check, wrong or outright delusional opinions.

An extreme example of sycophancy is LLMs inducing psychosis in some users by affirming their outrageous beliefs.

Posts tagged Sycophancy

2

163 Sycophancy to subterfuge: Investigating reward tampering in large language models

Carson Denison, evhub

2y

22

2

125 Steering Llama-2 with contrastive activation additions

Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout

2y

29

2

53 GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash

TurnTrout, Rohin Shah

7mo

2

1

122 Reducing sycophancy and improving honesty via activation steering

Nina Panickssery

3y

18

1

29 SAE features for refusal and sycophancy steering vectors

neverix, Dmitrii Kharlapenko, Arthur Conmy, Neel Nanda

2y

4

1

28 A/B testing could lead LLMs to retain users instead of helping them

7mo

0

1

internetexplorer

7mo

10

1

13 I can't tell if my ideas are good anymore because I talked to robots too much

1y

10

1

12 Steering Vectors Can Help LLM Judges Detect Subtle Dishonesty

Leon Eshuijs, mcbeth, Etha, Archie Chaudhury

1y

1

1

9 Two new datasets for evaluating political sycophancy in LLMs

2y

0

1

8 Towards a Science of Evals for Sycophancy

1y

0

1

7 Just complaining about LLM sycophancy (filler episode)

7mo

0

1

6 Why tuning fails: The AI has no self

Michael Trifonov

7d

2

1

3 System Prompts vs. Partner Adaptation in LLMs (or, when LLMs know you're an adult but keep talking like you're seven)

7d

0

1

2 Evaluating LLaMA 3 for political sycophancy

2y

2
