LESSWRONG
Sycophancy
Posts tagged Sycophancy (sorted by relevance)
Sycophancy to subterfuge: Investigating reward tampering in large language models (Ω). Carson Denison, evhub. 6mo; 161 karma; 22 comments; relevance 2.
Steering Llama-2 with contrastive activation additions (Ω). Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout. 1y; 123 karma; 29 comments; relevance 2.
Reducing sycophancy and improving honesty via activation steering (Ω). Nina Panickssery. 1y; 122 karma; 17 comments; relevance 1.
SAE features for refusal and sycophancy steering vectors (Ω). neverix, Dmitrii Kharlapenko, Arthur Conmy, Neel Nanda. 2mo; 26 karma; 4 comments; relevance 1.
Two new datasets for evaluating political sycophancy in LLMs (Ω). alma.liezenga. 2mo; 8 karma; 0 comments; relevance 1.
Evaluating LLaMA 3 for political sycophancy. alma.liezenga. 2mo; 2 karma; 2 comments; relevance 1.
Antagonistic AI. Xybermancer. 9mo; -8 karma; 1 comment; relevance 1.