LESSWRONG

573
Jiaxin Wen
55120

https://jiaxin-wen.github.io/

Posts


Wikitag Contributions

No wikitag contributions to display.

Comments
Aesthetic Preferences Can Cause Emergent Misalignment
Jiaxin Wen · 17d · 20

Has anyone rerun OpenAI's persona feature tests on these new EM testbeds?

Auditing language models for hidden objectives
Jiaxin Wen · 5mo · 10

Interesting! Do you mean the experiments in Sec. 3.9.2?

Unsupervised Elicitation of Language Models
55 · 3mo · 9