
SemanticMerlin — LessWrong


Two sources of beyond-episode goals (Section 2.2.2 of “Scheming AIs”)

SemanticMerlin · 2y · 10

I'm very surprised to be the first to comment on this. Nice work; you've framed beyond-episode goals (BEGs) really well. One thing is bothering me, though, and I must be missing something: why is there a prima facie supposition that beyond-episode goals emerge at all? As you (rightly) note, the naive logic of SGD as a mechanism would seem to point strongly away from the plausibility of BEGs. This is well written, but I feel like "suppose some BEG emerges" is treated almost axiomatically. Don't we need a stronger circumstantial, theoretical, or evidentiary reason for thinking BEGs are, like, a thing that actually happens in SOTA deep learning paradigms?