Theresa Barton

Comments

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Theresa Barton · 7mo

Hi! Did you try this technique on any other LLMs? Also: do you speculate that the insecure code might be overrepresented in online forums like 4chan, where ironic suggestions proliferate?

GPT-4
Theresa Barton · 2y

I think performance on AP English might be a quirk of how they dealt with dataset contamination. The English and Literature exams showed an anomalous amount of contamination (many of the famous texts are online and referenced elsewhere), so they threw out most of the questions, leading to a null conclusion about performance.

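For concreteness, here is a minimal sketch of the kind of contamination filtering being described, assuming a simple substring-match heuristic. The function names, sample counts, and substring length are illustrative assumptions, not the actual evaluation pipeline.

```python
import random

def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so matches aren't broken by formatting.
    return " ".join(text.split()).lower()

def is_contaminated(question: str, training_docs: list[str],
                    n_samples: int = 3, sub_len: int = 50) -> bool:
    """Heuristic: mark a question contaminated if a randomly sampled
    substring of it appears verbatim in any training document.
    (Illustrative only; the parameter choices here are assumptions.)"""
    q = _normalize(question)
    if len(q) <= sub_len:
        substrings = [q]
    else:
        max_start = len(q) - sub_len
        starts = random.sample(range(max_start + 1), k=min(n_samples, max_start + 1))
        substrings = [q[s:s + sub_len] for s in starts]
    docs = [_normalize(d) for d in training_docs]
    return any(sub in doc for doc in docs for sub in substrings)

def filter_exam(questions: list[str], training_docs: list[str]) -> list[str]:
    # Drop flagged questions; if most of an exam is dropped, too few items
    # remain to support any conclusion about performance.
    return [q for q in questions if not is_contaminated(q, training_docs)]
```

Under this kind of filter, an exam whose passages are widely quoted online would have almost every question flagged, leaving too few items to say anything either way, which is the null result described above.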