LESSWRONG
Dhruv Trehan

Posts

Saying “for AI safety research” made models refuse more on a harmless task
7 karma · 2mo · 1 comment