As soon as the 1st "friendly" AGI+ (well beyond human CC = cognitively capable = able to predict next tokens) is (1) baked during foundation training, you (2) confirm its friendliness as much as is possible, (3) give it tools and test, (4) give it more tools and test, (5) make copies, and (6) use it plus all its tools to suppress any and all possible other frontier training runs (sketched in code below). To auto-bootstrap CC, the AGI+ would need to run its own frontier training, but it may decide not to do so. Post-foundation alignment training is too late, since heuristic goals and values form during the frontier training. A lot of the confusion here is between tool bootstrapping and CC bootstrapping.
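A minimal Python sketch of that staged tool-granting loop, purely to pin down the control flow. Every name here (`AGIInstance`, `confirm_friendliness`, `passes_tests`, `TOOL_TIERS`) is hypothetical and stands in for an open research problem, not a real API:

```python
# Hypothetical sketch of steps (2)-(5) above. Nothing here is a real
# library; the stub functions mark where the actual hard work would live.
from dataclasses import dataclass, field

@dataclass
class AGIInstance:
    tools: list[str] = field(default_factory=list)

def confirm_friendliness(agi: AGIInstance) -> bool:
    # Step (2): placeholder -- "as much as is possible" is the open problem.
    return True

def passes_tests(agi: AGIInstance) -> bool:
    # Steps (3)-(4): placeholder evaluation harness.
    return True

# Tool tiers granted in order of increasing risk (illustrative names).
TOOL_TIERS = [["web_search"], ["code_execution", "lab_automation"]]

def staged_deployment(agi: AGIInstance, n_copies: int) -> list[AGIInstance]:
    if not confirm_friendliness(agi):                       # step (2)
        raise RuntimeError("abort: friendliness not confirmed")
    for tier in TOOL_TIERS:                                 # steps (3)-(4)
        agi.tools.extend(tier)
        if not passes_tests(agi):
            raise RuntimeError(f"abort: failed tests with tools {agi.tools}")
    return [AGIInstance(tools=list(agi.tools)) for _ in range(n_copies)]  # step (5)

copies = staged_deployment(AGIInstance(), n_copies=3)
# Step (6), suppressing rival frontier runs, is a policy action outside this sketch.
```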

{compressed, some deletions}

Suppose you have at least one "foundational principle" A = [...words..] -> mapped to a token vector, say in binary = [0110110...] -> sent to the internal NN. The encoding and decoding processes are non-transparent with respect to any attempt to 'train' on the principle A. If the system's internal weight matrices are already mostly constant, you can't add internal principles (and it's not clear you can add them even while the initially random weights are being de-randomized during training).
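To make the opacity concrete, here is a toy numpy-only illustration of the encode-and-embed pipeline; the byte-level tokenizer, the 16-dimensional embedding matrix, and the principle string are all made up for illustration (real systems use learned BPE vocabularies and trained embeddings):

```python
# Toy illustration of principle A -> token IDs -> binary view -> embeddings.
import numpy as np

principle_A = "do not deceive the user"          # hypothetical principle

# Byte-level "tokenization": each character becomes an integer ID.
token_ids = [ord(c) for c in principle_A]

# The same IDs viewed as binary, as in the [0110110...] vector above.
binary_view = " ".join(format(t, "08b") for t in token_ids)

# A frozen (already trained, now constant) embedding matrix:
# 256 byte IDs x 16 dims. A forward pass only *reads* these weights;
# it has no channel to write the principle *into* them.
rng = np.random.default_rng(0)
E_frozen = rng.standard_normal((256, 16))
embeddings = E_frozen[token_ids]                 # shape: (len(principle_A), 16)

print(binary_view[:35], "...")
print(embeddings.shape)
# Nothing in `embeddings` transparently encodes the *meaning* of A;
# it is just coordinates the downstream network consumes.
```

The point of the sketch is the one-way street: once `E_frozen` (and the rest of the weights) are constant, feeding principle A through the system is a read-only operation, which is exactly why post-hoc "training on a principle" is non-transparent.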