As soon as the 1st "friendly" AGI+ (well beyond human CC = cognitively capable = able to predict next tokens) is (1) baked during foundation training, you (2) confirm its friendliness as much as is possible, (3) give it tools and test, (4) give it more tools and test, (5) make copies, and (6) use it plus all its tools to suppress any and all possible other frontier training runs (sketched in code below). To auto-bootstrap CC, the AGI+ would need to run its own frontier training, but it may decide not to do so. Post-foundation alignment training is too late, since heuristic goals and values form during the frontier training. A lot of the confusion here is between tool bootstrapping and CC bootstrapping.
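A minimal Python sketch of that staged tool-granting loop, purely to pin down the control flow. Every name here (`AGIInstance`, `confirm_friendliness`, `passes_tests`, `TOOL_TIERS`) is hypothetical and stands in for an open research problem, not a real API:

```python
# Hypothetical sketch of steps (2)-(5) above. Nothing here is a real
# library; the stub functions mark where the actual hard work would live.
from dataclasses import dataclass, field

@dataclass
class AGIInstance:
    tools: list[str] = field(default_factory=list)

def confirm_friendliness(agi: AGIInstance) -> bool:
    # Step (2): placeholder -- "as much as is possible" is the open problem.
    return True

def passes_tests(agi: AGIInstance) -> bool:
    # Steps (3)-(4): placeholder evaluation harness.
    return True

# Tool tiers granted in order of increasing risk (illustrative names).
TOOL_TIERS = [["web_search"], ["code_execution", "lab_automation"]]

def staged_deployment(agi: AGIInstance, n_copies: int) -> list[AGIInstance]:
    if not confirm_friendliness(agi):                       # step (2)
        raise RuntimeError("abort: friendliness not confirmed")
    for tier in TOOL_TIERS:                                 # steps (3)-(4)
        agi.tools.extend(tier)
        if not passes_tests(agi):
            raise RuntimeError(f"abort: failed tests with tools {agi.tools}")
    return [AGIInstance(tools=list(agi.tools)) for _ in range(n_copies)]  # step (5)

copies = staged_deployment(AGIInstance(), n_copies=3)
# Step (6), suppressing rival frontier runs, is a policy action outside this sketch.
```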

{compressed, some deletions}

Suppose you have at least one "foundational principle" A = [...words..] -> mapped to a token vector, say in binary = [0110110...] -> sent to the internal NN. The encoding and decoding processes are non-transparent with respect to any attempt to 'train' on the principle A. If the system's internal weight matrices are already mostly constant, you can't add internal principles (and it's not clear you can add them even while the initially random weights are being de-randomized during training).
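To make the opacity concrete, here is a toy numpy-only illustration of the encode-and-embed pipeline; the byte-level tokenizer, the 16-dimensional embedding matrix, and the principle string are all made up for illustration (real systems use learned BPE vocabularies and trained embeddings):

```python
# Toy illustration of principle A -> token IDs -> binary view -> embeddings.
import numpy as np

principle_A = "do not deceive the user"          # hypothetical principle

# Byte-level "tokenization": each character becomes an integer ID.
token_ids = [ord(c) for c in principle_A]

# The same IDs viewed as binary, as in the [0110110...] vector above.
binary_view = " ".join(format(t, "08b") for t in token_ids)

# A frozen (already trained, now constant) embedding matrix:
# 256 byte IDs x 16 dims. A forward pass only *reads* these weights;
# it has no channel to write the principle *into* them.
rng = np.random.default_rng(0)
E_frozen = rng.standard_normal((256, 16))
embeddings = E_frozen[token_ids]                 # shape: (len(principle_A), 16)

print(binary_view[:35], "...")
print(embeddings.shape)
# Nothing in `embeddings` transparently encodes the *meaning* of A;
# it is just coordinates the downstream network consumes.
```

The point of the sketch is the one-way street: once `E_frozen` (and the rest of the weights) are constant, feeding principle A through the system is a read-only operation, which is exactly why post-hoc "training on a principle" is non-transparent.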