ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
sator · 3y · 20

Probably a feature of the current architecture, not a bug. Since we still rely on Transformers that suffer from mode collapse when they're fine-tuned, we will probably never see much more than weak level-2 self-improvement. Feeding its own output back into itself (or into a new instance) basically turns it into a Turing machine, so we have now built something that COULD be described as level 4. But then again, we see mode collapse, so the model basically stalls. Plugging its output into a non-fine-tuned version probably produces the same result, since mode collapse emerges from the input carrying less and less entropy with each pass. There might be real risk here in jailbreaking the model to inject randomness into its output, but if that property is applied globally, then the risk of AGI emerging is akin to the Infinite Monkey Theorem.
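As a minimal sketch of the loop I mean: feed a causal LM's output back in as its next prompt and track the token-level entropy of each generation. GPT-2 here is just a stand-in for a closed model like GPT-4, and the empirical token entropy is only a crude illustrative proxy for "mode collapse", not a rigorous measure.

```python
# Self-feeding loop sketch: the model's output becomes its next prompt,
# and we log the Shannon entropy of the tokens it produced at each step.
import math
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_entropy(ids):
    """Shannon entropy (bits) of the empirical distribution over generated tokens."""
    counts = Counter(ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

prompt = "The experiment began with a simple question:"
for step in range(5):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=60,
            do_sample=True,        # sampling; greedy decoding collapses even faster
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_ids = out[0, inputs["input_ids"].shape[1]:].tolist()
    print(f"step {step}: token entropy {token_entropy(new_ids):.2f} bits")
    # Feed the model's own output back in as the next prompt.
    prompt = tokenizer.decode(new_ids, skip_special_tokens=True)
```

The interesting part is the trend across steps, not any single number: as the prompts become the model's own (increasingly repetitive) output, there is less entropy left for the next pass to work with.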
