LESSWRONG

1140
Jiaxin Wen
57120

https://jiaxin-wen.github.io/

Posts


Wikitag Contributions

Comments

No wikitag contributions to display.
Aesthetic Preferences Can Cause Emergent Misalignment
Jiaxin Wen 2mo 20

Has anyone rerun OpenAI's persona feature tests on these new EM testbeds?

Auditing language models for hidden objectives
Jiaxin Wen 7mo 10

Interesting! Do you mean the experiments in Sec. 3.9.2?

57 Unsupervised Elicitation of Language Models
5mo
11