LESSWRONG

mrinank_sharma
174000

Posts

78 · Best-of-N Jailbreaking · Ω · 7mo · 5
66 · Towards Understanding Sycophancy in Language Models · Ω · 2y · 0
70 · Paper: Understanding and Controlling a Maze-Solving Policy Network · Ω · 2y · 0