Adly Templeton
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Adly Templeton, 3y

The operationalization of myopia for large language models here seems more like a compelling metaphor than a useful technical concept. It's not clear that "myopia" in next-word prediction within a sentence corresponds usefully to myopia on action-relevant timescales. For example, it would be trivial to remove almost all within-sentence myopia by decoding with beam search instead of picking one token at a time, but it's hard to believe that beam search would meaningfully affect alignment outcomes.
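To make the beam search point concrete, here is a minimal sketch on a toy next-token distribution. Everything in it is an illustrative assumption rather than anything from the post: the `TOY_MODEL` table, the function names, and the probabilities are all made up. Greedy decoding commits to the locally most likely token at each step, while beam search keeps several prefixes alive and can return a sequence with higher total probability.

```python
import math

# Hypothetical toy "language model": next-token probabilities for each prefix.
# The numbers are invented purely for illustration.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    """Look up the toy next-token distribution for a prefix."""
    return TOY_MODEL[tuple(prefix)]


def greedy_decode(steps):
    """Pick the single most likely next token at every step."""
    seq, logp = [], 0.0
    for _ in range(steps):
        probs = next_token_probs(seq)
        tok = max(probs, key=probs.get)
        logp += math.log(probs[tok])
        seq.append(tok)
    return seq, logp


def beam_search(steps, beam_width):
    """Keep the beam_width most likely prefixes at every step."""
    beams = [([], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            for tok, p in next_token_probs(seq).items():
                candidates.append((seq + [tok], logp + math.log(p)))
        # Rank all one-token extensions by total log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_seq, greedy_logp = greedy_decode(2)
beam_seq, beam_logp = beam_search(2, beam_width=2)
print("greedy:", greedy_seq, round(math.exp(greedy_logp), 2))
print("beam:  ", beam_seq, round(math.exp(beam_logp), 2))
```

In this toy case greedy decoding commits to "A" first and ends with sequence probability 0.33, while a beam of width 2 finds "B x" at 0.36. The lookahead here only spans a couple of tokens, which is the sense in which it addresses within-sentence "myopia" and nothing on longer timescales.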
