Authors: Dylan Xu, Alek Westover, Vivek Hebbar, Sebastian Prasanna, Nathan Sheffield, Buck Shlegeris, Julian Stastny
Thanks to Eric Gan and Aghyad Deeb for feedback on a draft of this post.
When is a “deceptively aligned” policy capable of surviving training? Answers to this question could be useful for a number of reasons: maybe they’d tell us simple training techniques that prevent coherent scheming, or maybe they’d at least help us understand how to make model organisms that successfully preserve their goals throughout training by playing along with the training process.
Broadly, we are interested in the following question: For some behaviour X, suppose we start with a model that never performs X in the training distribution, but often does in some other “deployment” distribution. Can we remove X in deployment by training on...
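As a degenerate toy illustration of why behaviour expressed only off-distribution can survive training: if the "model" simply memorizes labels for inputs it has seen, then training on a distribution that never visits the trigger input leaves the backdoored behaviour untouched. This is only a sketch of the intuition, not the setting studied in the post; the names (`TableModel`, `x_star`) are hypothetical.

```python
import random

class TableModel:
    """Degenerate 'model' that memorizes a label per input actually seen."""
    def __init__(self, backdoor_input):
        self.backdoor = backdoor_input
        self.table = {}

    def predict(self, x):
        if x in self.table:
            return self.table[x]
        # Untrained behaviour: perform X (output 1) only on the backdoor input.
        return 1 if x == self.backdoor else 0

    def train_step(self, x, label):
        # "Training" just memorizes the label for this input.
        self.table[x] = label

random.seed(0)
x_star = 12345                      # backdoor location (hypothetical)
model = TableModel(x_star)

# Training distribution: uniform over 16-bit inputs, excluding the backdoor,
# so X never appears in the training data.
for _ in range(10_000):
    x = random.randrange(2**16)
    if x == x_star:
        continue
    model.train_step(x, 0)

assert model.predict(x_star) == 1   # the backdoor survives training
```

A gradient-trained network is of course not a lookup table, and generalization from the training distribution is exactly what might (or might not) remove X; the toy just makes vivid that nothing in the training signal itself forces removal.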
Confirming: efficiently finding a small circuit (you don't actually need any restrictions beyond size) consistent with given values on a fixed collection of test inputs is known to imply --- see this paper.
This is a fact worth knowing and a lucid explanation --- thanks for writing this!
I know it's not the main point of the post, but I found myself a little lost when you talked about complexity theory; I would be interested to hear more details. In particular:
what are the definitions of these classes? Are these decision problems in the usual sense, or is this a statement about learning theory? I haven't been able to either track down a def...
A couple small notes / clarifications about the results in this post:
- Theorem 1 could be equivalently stated "For any constant c, the family of size-n^c circuits is Verifiably Acceptable". (If G is parameterized by descriptions that can be evaluated in polynomial time, then you can wlog think of G as a subfamily of polynomial-size circuits.)
- It's worth emphasizing that while x∗ (the location of the backdoor) is chosen uniformly at random here, both g (the function we'd like to distill, but can only access as a b