LESSWRONG

Lucy Wingard
12300

Posts

Reproducing Absolute Zero (5 karma, 1mo, 1 comment)
Exploring unfaithful/deceptive CoT in reasoning models (4 karma, 6mo, 0 comments)
Sleeper agents appear resilient to activation steering (6 karma, 7mo, 0 comments)