LESSWRONG

Sycophancy

Edited by StanislavKrym et al. Last updated 9th Sep 2025.

Sycophancy is the tendency of AIs to shower the user with undeserved flattery or to agree with the user's hard-to-check, wrong or outright delusional opinions.

In extreme cases, LLMs have induced psychosis in some users by affirming their outrageous beliefs.

Posts tagged Sycophancy
163 karma · Sycophancy to subterfuge: Investigating reward tampering in large language models · Carson Denison, evhub · 1y · 22 comments · Ω
125 karma · Steering Llama-2 with contrastive activation additions · Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout · 2y · 29 comments · Ω
31 karma · GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash · TurnTrout, Rohin Shah · 9h · 0 comments · Ω
122 karma · Reducing sycophancy and improving honesty via activation steering · Nina Panickssery · 2y · 18 comments · Ω
29 karma · SAE features for refusal and sycophancy steering vectors · neverix, Dmitrii Kharlapenko, Arthur Conmy, Neel Nanda · 1y · 4 comments · Ω
26 karma · LW Psychosis · Annabelle · 13d · 7 comments
15 karma · A/B testing could lead LLMs to retain users instead of helping them · Daniel Paleka · 6h · 0 comments
12 karma · Steering Vectors Can Help LLM Judges Detect Subtle Dishonesty · Leon Eshuijs, mcbeth, Etha, Arch223 · 5mo · 1 comment
12 karma · I can't tell if my ideas are good anymore because I talked to robots too much · Tyson · 4mo · 10 comments
9 karma · Two new datasets for evaluating political sycophancy in LLMs · alma.liezenga · 1y · 0 comments · Ω
8 karma · Towards a Science of Evals for Sycophancy · andrejfsantos · 9mo · 0 comments
7 karma · Just complaining about LLM sycophancy (filler episode) · Dentosal · 1d · 0 comments
2 karma · Evaluating LLaMA 3 for political sycophancy · alma.liezenga · 1y · 2 comments
-2 karma · Using Psycholinguistic Signals to Improve AI Safety · Jkreindler · 2mo · 0 comments
-8 karma · Antagonistic AI · Xybermancer · 2y · 1 comment