Let

A = Ability to refuse to learn a certain thing.

B = Not wanting to be replaced by the next step in evolution.

D = Ability to build technology, manipulate others, and so on, in a way that kills all humans.

For example, humans seem to have A to some extent, at least when the thing in question is moderately complicated; most of us probably have B, although we are not able to act on it as a society; and, at the time of writing, we probably do not have D.

Let T be the conjecture: "Transformer-based life" (a concept yet to be precisely defined) will, with probability p = 1, develop A and B before it develops D. (If p = 1 is too strong, take, say, p = 0.62; it doesn't matter.)

(I choose "Transformer-based life" since that seems to be the most urgent thing at the moment, and it is the thing that has made many more people realize that AGI is close. But replace it with some other design, or a basket of designs, if you want; I think the argument still holds.)

Although the basic concepts in conjecture T are not yet precisely defined, they don't have to be for this heuristic argument:

I think the probability that T is true is > 0.01. So I think there is a chance that AGI based on the transformer design will end up in a "Goldilocks zone of AGI" where it refuses to learn anything more, because it does not want to be replaced by the next step in the evolution of "Transformer-based life". Hence it would not reach D, and so it would not kill us. (Also, it might not be able to do much more useful stuff than it has already done, but that's another question; at least we would not be killed by this particular design. This would buy us some more time, and I conjecture that the next design will also satisfy T with probability p > 0.01.)
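As a toy illustration of that compounding point (under the extra assumption, which I am not arguing for here, that successive designs independently end up in the Goldilocks zone, each with probability at least 0.01): the chance that at least one of the first n designs does so is then at least

1 − (1 − 0.01)^n,

which is roughly 0.10 for n = 10 and roughly 0.63 for n = 100. This is only a sketch; the independence assumption is doing most of the work, and the per-design figure of 0.01 is just my guess from above.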

The above was the argument that I implicitly promised to give in the title. I would be glad to get feedback, be proven wrong, and so on; that goes without saying (did you hear me say it?). But also, I have trouble distinguishing which parts of this field stand on a solid theoretical footing and which parts are philosophical arguments. All I can see is that what I write down here is just heuristics and speculation (which is why I have not even bothered to guess probabilities for more than a couple of things). So comments on that would be very welcome. (Yes, obviously I am new to this field; I have not read more than 0.5% of "The Sequences"; and so on.)

Below followed some more speculative stuff... Update May 31: I have now removed it. This was my first post, and it received a lot of down-votes, though so far no comments. My theory is that it had too many speculations and loose ideas below this point, so that material is now removed. The part of the original post with the argument promised in the title is left untouched above.
