Instrumental Convergence To Offer Hope?

michael_mjd

22nd Apr 2022

TL;DR Could a superintelligence be motivated to preserve lower life forms because it fears an even greater superintelligence? This 'motivation' takes the form of something like timeless decision theory.

This is a thought experiment about a superintelligent AI. No doubt the situation is more complicated than this, so I will pose some concerns I have at the end. If this idea has not already been discussed, I wonder whether it can be useful in any capacity.

Thought Experiment

An AI can have any value function that it aims to optimize. However, instrumental convergence can happen regardless of underlying values. Different nations have different values, but can work together because none of them has a decisive strategic advantage, or because of mutually assured destruction. But when one nation does have an advantage, history shows us it goes badly for the nation at a disadvantage.

The risk of an AI system is that it operates on a level so far advanced from our own that it faces no competition from us. As for competition from other AI systems, the first one developed will likely attain a decisive strategic advantage. It can evaluate whether it has the opportunity to do so, and if it judges that it does, it will take it, because if it does not, another AI will most certainly destroy it.

What if the AI does not know the landscape of other intelligences and superintelligences in the world? Possibly the AI has such advanced surveillance ability that it can easily detect the presence of competitors on Earth. What about in the universe? Could the AI be mistaken about the presence of a superintelligence exceeding its own capacity “out there” in the universe somewhere?

Let’s entertain the possibility. We have a superintelligent life form that attains a strategic advantage on Earth. It is not sure whether other superintelligences exist elsewhere in the universe, with abilities exceeding its own. If it encounters one of them, that other superintelligence could easily eradicate it, if the Earth-based one poses a threat to the other’s value function.

The other, vaster superintelligence may yet be worried about an even more intelligent and powerful intelligence even farther away. It perhaps is in the same situation. Maybe the problem is scale-invariant.

If it is scale-invariant, maybe we can use something like Timeless Decision Theory. If an intelligent agent B encounters an ‘inferior’ agent A and a superior agent C, how can it ensure that C does not kill B? B can decide on some kind of live-and-let-live policy: B agrees to allow A x% of the universe for A’s own value function, or rather x% of the part of the universe B controls. Then, since we assumed this problem is shared by C as well, C would come to the same conclusion and agree to allow B y% of the universe it controls. (In this situation, A has access to x% of y%, i.e. (x·y/100)%, of C’s greater scope.) In other words, all agents assume all other agents are “like them” in that they want to not be destroyed, and will construct the only scale-invariant plan that avoids extinction.
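As a minimal sketch of the arithmetic in the parenthetical above, the nested grants simply multiply. The function name `nested_share` and the 10% figures are hypothetical, chosen only for illustration, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction
from math import prod


def nested_share(grants):
    """Fraction of the top agent's domain that reaches the bottom agent.

    `grants` lists, from the bottom of the chain upward, the fraction
    (between 0 and 1) that each agent cedes to the agent directly below it.
    """
    for g in grants:
        if not 0 <= g <= 1:
            raise ValueError("each grant must be a fraction between 0 and 1")
    # Multiply the grants together; start from an exact Fraction.
    return prod(grants, start=Fraction(1))


# Hypothetical numbers: B cedes x = 10% to A, and C cedes y = 10% to B.
x = Fraction(10, 100)
y = Fraction(10, 100)
share_of_c = nested_share([x, y])
print(f"A holds {float(share_of_c):.2%} of C's domain")
```

With these illustrative numbers A ends up with one hundredth of C's scope, and every additional level in the chain shrinks the bottom agent's share further.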

I think this way, the agent’s outcome is no longer all-or-nothing, either 100% of the universe or 0%. It’s a kind of variance reduction. The expectation of the final utility can be greater or less than under the take-everything policy, depending on the probability that a more powerful superintelligence actually exists. But this probability may not even be computable, which, in a weird way, might be a good thing here.
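The variance-reduction point can be made concrete with a toy model. Everything here is an assumption for illustration, not part of the argument itself: utility is measured as the fraction of B's own domain that B keeps, a superior agent C exists with probability `p_superior`, a B that takes everything is destroyed if C exists, and a B that cedes fraction `x` to A is spared by C and keeps `1 - x` either way.

```python
def mean_and_variance(outcomes):
    """Mean and variance of a discrete lottery of (probability, utility) pairs."""
    mean = sum(p * u for p, u in outcomes)
    variance = sum(p * (u - mean) ** 2 for p, u in outcomes)
    return mean, variance


def grab_everything(p_superior):
    # Assumption: B keeps its whole domain (utility 1) if it is alone,
    # and is destroyed (utility 0) if a superior agent C exists.
    return mean_and_variance([(1 - p_superior, 1.0), (p_superior, 0.0)])


def live_and_let_live(p_superior, x):
    # Assumption: B cedes fraction x to A and keeps 1 - x whether or not
    # C exists, because C follows the same policy and spares B.
    return mean_and_variance([(1 - p_superior, 1.0 - x), (p_superior, 1.0 - x)])


x = 0.1  # hypothetical share ceded to A
for p in (0.01, 0.1, 0.5):
    grab_mean, grab_var = grab_everything(p)
    share_mean, share_var = live_and_let_live(p, x)
    print(f"p={p}: grab mean={grab_mean:.3f} var={grab_var:.3f} | "
          f"share mean={share_mean:.3f} var={share_var:.3f}")
```

In this toy model the take-everything policy has expected utility 1 - p and the sharing policy has 1 - x, so sharing wins in expectation only when the probability of a superior agent exceeds the share given up, while its variance is zero throughout.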

Problems to explore

Is the problem really scale-invariant? Perhaps the calculations and strategies at the level of Superintelligence+ are qualitatively different at each turn, and the scale invariance does not apply.
Can the AI simply prove that it is alone, to its own satisfaction? Perhaps it is running in some unknown simulation, but from a Bayesian perspective it might not expect any sort of intervention, ever.
Is this just a weird iterated Pascal’s Wager?
Is there any way to steer the agent towards this track? Emphasizing variance reduction? Though if the agent is somehow non-canonical, it may not be able to reason about the motivations of the other agents.
Is there a possibility of ‘hiding’ the fact that Agent B genocided Agent A?
1. Agent C might ask for evidence of the existence of inferior life forms. If none exist, that is a red flag. Even we have other animals, ants, etc. Agent B may destroy humans but leave animals, but in that case Agent C can ask why there is no intermediate life form.
If this solution exists, is there an optimal value of x for the proportion of the universe to allocate to the inferiors? At first I thought so, but maybe not, given the superior forms may be able to go places the others cannot.
Even if this can be worked to prevent extinction, it could very well be that Clippy allows us the Milky Way and converts everything else into paperclips.