LESSWRONG

Phil Blandfort

Posts

Wikitag Contributions

Comments

No wikitag contributions to display.
How LLMs are and are not myopic
Phil Blandfort 2y 10

Do you have any update on this? It goes strongly against my current understanding of how LLMs learn. In particular, in the supervised learning phase, any output text claiming to be an LLM would be penalized unless such statements appear in the training corpus. If such behavior arises nevertheless, I would be super excited to analyze it further.

49 · Detecting High-Stakes Interactions with Activation Probes · 2mo · 0
1 · Sampling Effects on Strategic Behavior in Supervised Learning Models · 1y · 0