LESSWRONG
Phil Blandfort

Comments

How LLMs are and are not myopic
Phil Blandfort (2y ago)

Do you have any update on this? It goes strongly against my current understanding of how LLMs learn. In particular, during the supervised learning phase, any output text claiming to be an LLM would be penalized unless such statements appear in the training corpus. If such behavior arises nevertheless, I would be super excited to analyze it further.

Posts

Detecting High-Stakes Interactions with Activation Probes (3mo ago, 50 karma, 0 comments)
Sampling Effects on Strategic Behavior in Supervised Learning Models (1y ago, 1 karma, 0 comments)