LESSWRONG
Sycophancy
• Applied to Antagonistic AI by Xybermancer 3mo ago
• Applied to Steering Llama-2 with contrastive activation additions by TurnTrout 5mo ago
• Applied to Reducing sycophancy and improving honesty via activation steering by Maxime Riché 5mo ago
• Created by Maxime Riché at 5mo