ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
sator · 3y · 20

Probably a feature of the current architecture, not a bug. Since we still rely on Transformers that suffer from mode collapse when they're fine-tuned, we will probably never see much more than weak level-2 self-improvement. Feeding its own output back into itself (or into a new instance) basically turns it into a Turing machine, so we have now built something that COULD be described as level 4. But then again, we see mode collapse, so the model basically stalls. Plugging its output into a non-fine-tuned version probably produces the same result, since mode collapse emerges from the input carrying less and less entropy with each pass. There might be real risk here in jailbreaking the model to inject randomness into its output, but if that property is applied globally, then the risk of AGI emerging is akin to the Infinite Monkey Theorem.
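As a minimal sketch of the loop I mean: feed a causal LM's output back in as its next prompt and track the token-level entropy of each generation. GPT-2 here is just a stand-in for a closed model like GPT-4, and the empirical token entropy is only a crude illustrative proxy for "mode collapse", not a rigorous measure.

```python
# Self-feeding loop sketch: the model's output becomes its next prompt,
# and we log the Shannon entropy of the tokens it produced at each step.
import math
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_entropy(ids):
    """Shannon entropy (bits) of the empirical distribution over generated tokens."""
    counts = Counter(ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

prompt = "The experiment began with a simple question:"
for step in range(5):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=60,
            do_sample=True,        # sampling; greedy decoding collapses even faster
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_ids = out[0, inputs["input_ids"].shape[1]:].tolist()
    print(f"step {step}: token entropy {token_entropy(new_ids):.2f} bits")
    # Feed the model's own output back in as the next prompt.
    prompt = tokenizer.decode(new_ids, skip_special_tokens=True)
```

The interesting part is the trend across steps, not any single number: as the prompts become the model's own (increasingly repetitive) output, there is less entropy left for the next pass to work with.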
