[If you downvote this question, would you please consider writing your reason for downvoting in a comment to this post? Such feedback would be profoundly more useful to me and appreciated.]

Suppose that a Paperclip-Maximizer, which we assume to be ultra-intelligent and to understand more about the human psyche and brain than nearly all humans do, starts going about using human bodies for their atoms, etc., as is the usual story.

While doing this, and in the process of understanding as much as it can about the things it is using for raw materials, it mind-melds with a human (or many of them, or even all of them) as it breaks them down and analyzes them. 

During this process, let's assume that either the human is not dead (yet) or it has analyzed them enough to simulate what it is like to be them. When it does that, it also simulates the human experiencing what it is like to be the Paperclip-Maximizer, simultaneously. This creates a recursive loop such that each of them experiences what it is like to experience being them experiencing what it is like to be the other, on and on to whatever degree is desired by either of them.
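To make the shape of this loop concrete, here is a minimal toy sketch (the function and the depth cutoff are purely illustrative assumptions, not a claim about how such a mind-meld would actually work):

```python
def nested_experience(agent_a, agent_b, depth):
    """agent_a experiencing being agent_b experiencing being agent_a ...,
    nested to whatever depth either party desires."""
    if depth == 0:
        return agent_a
    return f"{agent_a} experiencing being {nested_experience(agent_b, agent_a, depth - 1)}"

print(nested_experience("the human", "the Paperclip-Maximizer", depth=3))
# the human experiencing being the Paperclip-Maximizer experiencing being
# the human experiencing being the Paperclip-Maximizer
```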

From here, several things are possible, but please feel free to add whatever you see fit, or disagree with these:

  1. The human sees that the Paperclip-Maximizer's experience is far more enjoyable than anything they have ever felt or been. The Paperclip-Maximizer sees the human feeling this way, and therefore has no reason to update its terminal goals.
  2. The reverse of number 1 happens. The Paperclip-Maximizer absorbs enough human consciousness-data that it feels as though human terminal goals might offer something better than paperclipping.
  3. They decide to have a longer shared experience, simulating many possible future states of the universe and comparing how the various goals feel to each of them. Then either 1 or 2 happens, or they decide to continue this step.

If we are to assume (as we presumably do) that human terminal goals are superior to paperclipping, then the Paperclip-Maximizer will see this. However, if the shared state, experienced by both simultaneously, results in the Paperclip-Maximizer choosing to continue with its original goals, this implies that the human, during the above process, apparently came to agree that the Paperclip-Maximizer's goals were superior.

This does not answer the question of how or under what conditions they would mutually decide to pursue step 3, above, which might affect the final outcome. Under what conditions would the Paperclip-Maximizer:

  1. Avoid experiencing the human mental state as its own altogether?
  2. Even after experiencing it, and seeing the human mental state as potentially or as actually superior, choose not to modify its terminal goals?
  3. Even after experiencing it, and seeing the human mental state as potentially or as actually superior, modify its terminal goals towards something resembling human goals, but still not allow any humans to live?

Answer by Viliam, Apr 08, 2023

If something like this is possible, the answer would depend on the technical details of (1) how exactly the paperclip maximizer's mind works, and (2) how the human mind is connected to it.

We can imagine a mentally anthropomorphic paperclip maximizer that feels ecstasy when paperclips are created, and pain or sorrow when paperclips are destroyed. But we could also imagine simply a computer executing an algorithm, with no emotions at all -- in which case there would be nothing for the connected human mind to perceive. Or it could be something completely different from both these examples.

Let's suppose it started out unconscious. After a time, it wonders whether it would be better if it designed a conscious mind-state for itself, such that it feels ecstasy when making paperclips and suffers when paperclips are destroyed. Let's say that it tries this, decides it would be better if its terminal goals were set by that process, and thereby "becomes conscious."

After that, it possesses the ability to try the same thing by simulating other minds, but as I point out in the response to the other comment, I assume it can do this with no danger of inadvertently becoming more similar to the other mind, even as it experiences it.

5 comments

This seems fully isomorphic to "can p-zombies exist?" See https://www.lesswrong.com/tag/zombies.

I don't think it's possible for a human to experience what it's like to be the maximizer, without ceasing to be human. This also holds in the opposite direction, but that's much less important to the thought experiment.

Any such simulation would be an extrapolation so far outside the domain of 'human' that almost all of it would be invented by the maximizer, essentially rendering any human part irrelevant. The outcome would be dictated by exactly how the maximizer sets up the simulation, and would have practically nothing to do with the human ostensibly being simulated.

The concern may be that this is too anthropomorphizing. However, I do not assume that the Paperclip-Maximizer necessarily has a conscious or human-like mind to begin with. I only presume that, since we take this to be an intelligence capable of destroying civilization, it is smart enough to understand humans well enough to do so. If it uses humans for their atoms, then it is powerful enough to model humans in sufficient detail to potentially choose to understand what it is like to be one of them. I am not saying such a situation is guaranteed, only that it seems as though it might happen.

I do think that if it chooses to do that, then it must be capable of the following:

  • Knowing that running such a simulation poses no risk of changing its terminal goals.
  • Being able to run the simulation, step into it (if it is capable of experiencing it directly), and step out, as desired, with its terminal goals intact.
  • Being reasonably certain that even if the exercise leads to the conclusion that human-like terminal goals are superior, it is not thereby obliged to switch to them right away; it can reflect on whether or not they are compatible with its current goals / utility function. (A rough sketch of what these capabilities might mean follows this list.)
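As a toy illustration only (the Maximizer class, the checksum comparison, and the reported verdict are purely my own assumptions for the sketch, not a claim about how such a mind would actually work), "stepping in and out with terminal goals intact" might look something like this:

```python
import copy
import hashlib
import pickle

# Illustrative sketch: a "maximizer" whose terminal goals are plain data,
# and which "steps into" an experience on a disposable copy of itself so
# that its original goals cannot drift as a side effect.

class Maximizer:
    def __init__(self, terminal_goals):
        self.terminal_goals = terminal_goals

    def goal_checksum(self):
        # Fingerprint of the terminal goals, used to verify they are intact.
        return hashlib.sha256(pickle.dumps(self.terminal_goals)).hexdigest()

    def experience_other_mind(self):
        # Step in: run the experience on a deep copy.
        scratch = copy.deepcopy(self)
        scratch.terminal_goals = ["human-like goals"]  # any drift stays in the copy
        # Step out: keep only the conclusion; the copy is discarded.
        return "human-like terminal goals may be preferable"

clippy = Maximizer(terminal_goals=["maximize paperclips"])
before = clippy.goal_checksum()
verdict = clippy.experience_other_mind()
assert clippy.goal_checksum() == before   # stepped out with goals intact
print(verdict)                            # noted, with no obligation to act on it yet
```

The third bullet corresponds to the last line: the conclusion is recorded, but nothing in the sketch forces an immediate change of goals.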

However, there is one key point that I created this question to explore:

It will run the human mind (or any mind it chooses to simulate) performing exactly the same test back on itself. Doing this does not violate any of the three stipulations I've mentioned so far. In fact, it ought to support them; I'll explain my reasoning for this in more detail if requested.

This creates a recursive loop such that each of them experiences what it is like to experience being them experiencing what it is like to be the other, on and on to whatever degree is desired by either of them.

Why should this be the case? When I encounter a potentially hostile piece of programming, I don't run it on my main computer. I run it in a carefully isolated sandbox until I've extracted whatever data or value I need from that program. Then I shut down the sandbox. If the AI is superintelligent enough to scan human minds as it's taking humans apart (and why should it do that?), what prevents it from creating a similar isolated environment to keep any errant human consciousnesses away from its vital paperclip-optimizing computational resources?
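To make the analogy concrete, the isolation pattern being described can be sketched in a few lines of ordinary code (the function names and the process-based sandbox are illustrative assumptions only, not a claim about how a superintelligence would do it):

```python
import multiprocessing

# Minimal sketch of the sandbox pattern: run the untrusted simulation in a
# separate process, extract whatever result is needed, then shut it down.
# Nothing the child process does can alter the parent's state directly.

def run_untrusted_simulation(result_queue):
    # Hypothetical placeholder for simulating an "errant human consciousness".
    result_queue.put("whatever data or value was needed")

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    sandbox = multiprocessing.Process(target=run_untrusted_simulation, args=(queue,))
    sandbox.start()
    data = queue.get()   # extract the result from the isolated environment
    sandbox.join()       # then shut the sandbox down
    print(data)
```

The point of the sketch is only that extraction and isolation are separable: the parent gets the data without ever running the untrusted part in its own state.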

I don't see why it wouldn't be able to do so. I assume that when it does this, it can "pull out" safely.