LESSWRONG
Mikita Balesni

Frontier Models are Capable of In-context Scheming
Mikita Balesni, 9mo

I think one practical difference is whether filtering pre-training data to exclude cases of scheming is a useful intervention.
