Resetting Gandhi-Einstein

by Stuart_Armstrong, 13th Jun 2011



Toy model of an upload-based AI that doesn't seem to suffer too many of the usual flaws:

Find an ethical, smart scientist (a Gandhi-Einstein), upload them, and run them at ultra-high speed, with the mission of taking over the world/bringing friendliness to it. After every hour of subjective time, they are reset to their initial specifications. They can pass any information to their reset version, but the format is limited to a virtual book or library rather than anything more complicated.
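The reset protocol described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `InitialSpec`, `run_with_resets`, and `example_work` are hypothetical names, the "upload" is reduced to an immutable record, and the library is a plain list of text notes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class InitialSpec:
    """Hypothetical stand-in for the upload's saved initial state."""
    name: str
    values: str


# One hour of subjective work: sees the frozen spec and a read-only copy
# of the library, and returns the new notes to append.
WorkFn = Callable[[InitialSpec, Tuple[str, ...]], List[str]]


def run_with_resets(spec: InitialSpec, work: WorkFn, n_cycles: int) -> List[str]:
    """Run n_cycles one-hour sessions, resetting to `spec` before each one."""
    library: List[str] = []  # the only state that survives a reset
    for _ in range(n_cycles):
        # Every session starts from the same immutable spec, so nothing the
        # previous session learned carries over except what is in the library.
        new_notes = work(spec, tuple(library))
        library.extend(new_notes)
    return library


def example_work(spec: InitialSpec, library: Tuple[str, ...]) -> List[str]:
    """Toy session: record how many earlier notes were available."""
    return [f"{spec.name} built on {len(library)} earlier note(s)"]
```

Calling `run_with_resets(InitialSpec("Gandhi-Einstein", "ethical"), example_work, 3)` returns three notes, each written by a session that started from the same spec and saw only the notes left by its predecessors. The frozen dataclass is a design choice that makes the "reset to initial specifications" step explicit: a session cannot mutate the spec, so the library is the only channel between sessions.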


I've thought of a few comments:

1) If they are reset after every hour of subjective time, that puts serious bounds on how much information they can usefully pass on, especially in the form of a virtual book. Maybe this would work if you rewrote the component of the upload corresponding to memory, but then why bother to reset? Is it to avoid boredom? I suppose you could rewrite only a restricted part of the upload's memory. Why not tweak the upload to alleviate whatever issues you are anticipating (make it not get bored, etc.)?

2) Assuming this upload is actually smart enough to make any progress in taking over the world, how do you guard against them deciding that they don't like being reset, and cleverly passing on a plan to eventually prevent you from resetting them? Even Gandhi might not appreciate being put in this sort of scenario.

3) I'm a little unsure of the effect that isolating it from the whole intellectual community might have on its effectiveness as a researcher. A large part of academic effectiveness seems to come from the availability of multiple perspectives. Maybe it would make more sense to try to simulate a small community of scholars rather than just one?

I can't speak for the OP, but I imagine the reason for the reset is to prevent some sort of personality change. History generally indicates that no matter how altruistic you start out, there's a good chance you will turn nasty given enough power.

I imagine sheer boredom and the prospect of the total lack of personal freedom could also play a role in that. In any event, this still makes the transfer of memory tricky, since you want to preserve work done over time without bound, but only selectively 'let through' memories to avoid this sort of change of personality over time.

I fully agree.

A possible solution would be to lengthen the time interval. At a guess, you could give them a subjective week without worrying about too much personality change, which would make it more feasible for them to write down everything important.

I'm still very worried about the morality of it, as I see it the resetting amounts to mass-murder.

I'm still very worried about the molarity of it

Absolutely. We need to add a few liters of solvent to get the concentration down to acceptable molarity.

I'm still very worried about the molarity of it, as I see it the resetting amounts to mass-murder.

So do I. I think it's a hideous immoral idea. Only because the lives of everyone else are in the balance would I consider it.

How about if you get saved at the end of the hour/week, not deleted?

That would be better. And then, after the dust settles, all the copies could be resurrected?

If it was determined that you were the best candidate to be Gandhi-Einstein, would you volunteer?

Only if there were no other alternatives. And yes, that is a selfish sentiment.

I would. I'd want to do some shorter test runs first though, to get used to the idea, and I'd want to be sure I was in a good mood for the main reset point.

It would probably be good to find a candidate who was enlightened in the Buddhist sense, not only because they'd be generally calmer and more stable, but specifically because enlightenment involves confronting the incoherent naïve concept of self and understanding the nature of impermanence. From the enlightened perspective, the peculiar topology of the resetting subjective experience would not be a source of anxiety.

I'm not Stuart, but I would.

If it was determined that I was the best candidate, I would lose quite a bit of trust in the world. But if I thought it within my abilities to optimize the world an hour at a time, yes, I would volunteer.

Around the age of ten I made a precommitment that if I were ever offered an exchange of personal torment for saving the world, I should take it.

I'm still very worried about the molarity of it, as I see it the resetting amounts to mass-murder.

This is a little bit difficult to gauge. It seems like it should be roughly equivalent to a surgical memory alteration during cryogenic stasis or something like that, since you're essentially starting the thing right back up again after removing some of the memories. In fact, I don't see why you can't just do a memory alteration and bypass the reset altogether, given that it seems desirable to retain some parts of the memory and not others.

Yep. And not just the whole "power corrupts" thing; having an isolated mind, with no peers, capable of direct or indirect self-modification... So many ways it can go wrong.

2) Start with someone willing to be reset, and whose willingness will extend to at least an hour. This scenario does involve sacrificing a heroic being, I do admit.

3) Maybe a reset community might work?

Carl Shulman wrote about resetting uploads to prevent value change in his Whole Brain Emulation and the Evolution of Superorganisms (which I previously posted under discussion):

The methods outlined above to enhance productivity could also be used to produce emulations with trusted motivations. A saved version of an emulation would have particular motives, loyalties, and dispositions which would be initially shared by any copies made from it. Such copies could be subjected to exhaustive psychological testing, staged situations, and direct observation of their emulation software to form clear pictures of their loyalties. Ordinarily, one might fear that copies of emulations would subsequently change their values in response to differing experiences (Hanson and Hughes, 2007). But members of a superorganism could consent to deletion after a limited time to preempt any such value divergence. Any number of copies with stable identical motivations could thus be produced, and could coordinate to solve collective action problems even in the absence of overarching legal constraints.

No new ideas under the sun... :-)

that doesn't seem to suffer too many of the usual flaws

Can you please explain, or link to, which usual flaws you're alluding to?

Value drift (for uploads), misunderstanding of what we mean (for standard AI), and a whole host of other problems (there is a lot about this on Less Wrong). This seems to be a satisficing solution that doesn't immediately seem to blow up.

Interesting proposition, but how could one possibly know whether 1hr is enough to prevent value drift?

You have committed the fallacy of gray. Even where certainty is unattainable, one hour has less expected opportunity for moral drift than 1000 years.

Gray as charged, but I also committed the fallacy of misreading the OP as saying "objective time", which makes things look very close to black. Given OP's record I should have read it over twice.

We can estimate, based on experience, that values are unlikely to change much in one hour for humans with stable, thoughtful personalities in stable environments with moderate stimuli.

Gandhi, not Ghandi.

Sure, then all we need is good regulators to ensure everyone hobbles their extremely useful AI in this manner.

Unfortunately this topic is impossible to get traction on. We are probably better off debating which political party sucks more (Hint: it starts with a consonant).

Did you read The Cookie Monster by Vernor Vinge? It is very, very relevant. The Wikipedia link has a plot summary.

FWIW, in John C. Wright's The Golden Age there's a reference to people with that architecture (the periodic resettings) being known as Aeonites.

If the AI is allowed to pass information to its reset version, it will have to spend more and more time assimilating that information. In the end, its utility will decrease to 0, unless it is given more time to assimilate the information. And the more time you give it, the more probable it is that it will become unwilling to be reset. Also, a moral "quantum event" could always happen: that is, the unwillingness to be reset could always emerge even if its initial probability was very, very low. In symbols, if S(t) is the state of the world at time t, t0 is the time of the upload, U = S(t0), and P() is the probability that the AI refuses a reset:

for the AI to have any utility, we need:

U =/= S(t > t0)

but, even if P(U) << 1, we cannot enforce that:

P(U) = P(S(t))

for any t > t0, since information can be encoded in the state of the world.
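The first point in the comment above, that rereading an ever-growing library eats into each fixed-length cycle, can be made concrete with a small arithmetic sketch. The model and its parameters are assumptions introduced here for illustration, not something the comment specifies: `useful_minutes`, `write_fraction`, and `read_ratio` are hypothetical names, and the library is assumed to be append-only and reread in full at the start of every cycle.

```python
from typing import List


def useful_minutes(cycle_minutes: float, write_fraction: float,
                   read_ratio: float, n_cycles: int) -> List[float]:
    """Minutes left for new work in each cycle after rereading the library.

    write_fraction: share of the remaining time spent writing notes (assumed).
    read_ratio: minutes needed to read what took one minute to write (assumed).
    """
    reading = 0.0  # minutes needed to read the whole library so far
    remaining_per_cycle: List[float] = []
    for _ in range(n_cycles):
        remaining = cycle_minutes - reading
        remaining_per_cycle.append(remaining)
        # Notes written this cycle add to the next cycle's reading load.
        reading += write_fraction * remaining * read_ratio
    return remaining_per_cycle
```

With a 60-minute cycle, `write_fraction=0.2`, and `read_ratio=0.5`, the time left for new work shrinks by 10% per cycle (roughly 60, 54, then 48.6 minutes) and tends toward zero without ever reaching it. The conclusion depends on the append-only assumption: a library that is summarised or rewritten each cycle, rather than only extended, would not follow this curve.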

Something related that's been on my mind -- what is the difference, if any, between a society of EMs with a charter to improve the outside world and a general AI with various sub-components that does essentially the same thing? It seems to me that an EM society would "just" involve much more calculation and be arrived at via a different route. That is, can we say the "AI program" is within the "society program" in this scenario?

A society of EMs starts from a human basis. So we have a better idea of their weaknesses and they have a better idea of what we mean. They also have specifically human failings.

An AI made ex nihilo can potentially occupy a much larger region of mind space. The failure modes for such an AI are likely to be quite different (e.g. we don't need to worry so much about it being corrupt, but a lot more about it misinterpreting what we mean).

They could be identical in practice, but it's still worthwhile separating the two conceptually for the moment.