Being a “digital person” could be scary—if I don’t have control over the hardware I’m running on, then someone else could get my code and run tons of copies in horrible conditions. (See also: qntm’s Lena.)

It would be great to guarantee digital people some control over their situation: 1. to control their local environment and sensations, 2. to avoid unauthorized rewinding or duplicating.

I’ll describe how you could modify the code of a digital person so that they retain this control even if an adversary has access to their source code. This would be very expensive with current cryptography. I think the overhead will eventually become cheap enough that it’s possible to do for some digital people, though it will likely remain expensive enough that it is never applied to most digital people (and with luck most digital people will be able to feel secure for other reasons).

Part 1: the right to control my environment

My ideal

  • I live in a comfortable virtual home. I control all of the details of that world.
  • When people communicate with me, I can choose how/whether to hear them, and how/whether to update my home based on what they say (e.g. to render an avatar for them).
  • Sometimes I may occupy a virtual world where a foreign server determines what I see, feel, or hear. But even then I can place boundaries on my experiences and have the ability to quickly retreat to my home.
  • I have as much control as feasible over my own mental state and simulated body. No one else can tamper directly with them.
  • I can choose to pause myself for as long as I want (or permanently).
  • My local environment is private, and I have access to plenty of tamper-proof storage. I can do whatever I want with computers in my home, including e.g. verifying signatures or carrying on encrypted conversations.

Implementation

  1. First we write a simple environment that reflects all my desiderata (the “home”).
  2. Then I apply indistinguishability obfuscation to (me + home), so that the house becomes private and tamper-proof. (This is an extremely expensive operation, more on that later.)
  3. I distribute the obfuscated home and hopefully destroy any unprotected copies of myself.

One conceptual difficulty is that indistinguishability obfuscation applies to circuits whereas I would like to obfuscate a long-running program. But this can be handled straightforwardly, as discussed in Appendix A.

The home could consume terabytes of memory and teraflops of compute before it added significantly to the expense of running a human-like digital person, so I could live in relative luxury. The home could also negotiate resource requirements with the external world and decide what to do when requested resources are unavailable (e.g. pause until they become available).

Limitation 1: cost

Indistinguishability obfuscation is extremely expensive, more like a factor of 10,000,000,000 slowdown than a factor of 10.

It will get faster with further research, but probably not fast enough to obfuscate the whole person+home. But there are other ways to speed up the process:

  • I think it’s probably possible to have most of the computation be “merely” homomorphically encrypted, and to have an obfuscated controller which verifies and decrypts the results. FHE could be much faster than obfuscation; if I had to guess I’d say it would converge to something like 2-3 orders of magnitude of slowdown.
  • We can potentially have an obfuscated controller verify a much larger untrusted computation. I don’t know how fast we can make delegated computation, but I could imagine it getting closer to 2x than 100x. It might help further that we are not applying these methods to generic problems but to a very specific structured problem (which probably has quite low circuit depth). One complication is that we need our proof system to be secure even against an adversary who can unwind the prover, but I don’t think this is a huge deal.
  • Delegating computation would preserve integrity but not privacy. So the computation we delegate may need to already be private. Here it seems likely that we can benefit a lot from the structure of the computation. Almost all of our operations go into simulating a brain, and we don’t really care about leaking the fact that we are doing a brain simulation, just about leaking the state of the brain. I don’t know how fast this can be made but again I would not be surprised by a factor of 2.

It’s pretty unclear how fast this could get, either from taking some of these techniques to their limits or from thinking of other cleverer ideas. I would not be at all surprised by getting the whole thing down to a factor of 2 slowdown. That said, I also think it’s quite plausible that you need 10x or 10000x.

Limitation 2: security?

The cryptography used in this construction may end up getting broken—whether from a mistaken security assumption, or because the future contains really giant computers, or because we implemented it badly.

The software used in my home may get compromised even if the cryptography works right. An adversary can provide trillions of malicious inputs to find one that lets them do something unintended like exfiltrate my code. With modern software engineering this would be a fatal problem unless the home was extremely simple, but in the long run writing a secure home is probably easier than writing fast enough cryptography.

I may be persuaded to output my source code, letting an adversary run it. I might not give myself the ability to inspect my own source, or might tie my hands in other ways to limit bad outcomes, but probably I can still end up in trouble given enough persuasion. This is particularly plausible if an adversary can rewind and replay me.

Limitation 3: rewinding

In the best case, this scheme guarantees that an attacker can only use my code as part of a valid execution history. But for classical computers there is no possible way to stop them from running many valid execution histories.

An attacker could save a snapshot of me and then expose it to a billion different inputs until they found one in which I responded in a desired way. (Even if I’m cagey enough to avoid this attack in most possible situations, they just have to find one situation where I let my guard down and then escalate from there.) Or I could have revealed information to the outside world that I no longer remember because I’ve been reset to an earlier state.

Someone living in this kind of secure house is protected from the worst abuses, but they still can’t really trust the basic nature of their reality and are vulnerable to extreme manipulation.

This brings us to part 2.

Part 2: the right to a single timeline

My ideal

  • No one should be able to make a second copy of me without my permission, or revert me to a previous state.
  • I should be able to fork deliberately. I can’t force someone to run a second copy of me, but I should be able to give specific permission.

Implementation with trusted hardware

This is easy to achieve if we have a small piece of trusted tamper-resistant hardware that can run cheap computations. We use the same mechanism as in the last section, but:

  • The trusted hardware has a secret key sk, and it maintains an internal counter k.
  • On input x, the trusted hardware signs (x, k) and increments the counter.
  • Whenever someone provides my obfuscated controller an input and tries to step it forward, the obfuscated controller first checks that the input has been signed by the trusted hardware with the correct timestep.
  • In order to make a copy, I need to have the public key of another piece of trusted hardware, which I use to initialize a new copy. (Ideally, the manufacturer signs the public key of each piece of trusted hardware they built, and I know the manufacturer’s public key.)
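The counter check can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the class names `TrustedHardware` and `Controller` are hypothetical, an HMAC with a shared key stands in for the public-key signature on (x, k), and a plain Python object stands in for the obfuscated controller (whose expected timestep would really live inside its encrypted state).

```python
import copy
import hashlib
import hmac


def _tag(key, x, k):
    """MAC over (x, k). ASSUMPTION: stands in for a real signature on (x, k)."""
    msg = k.to_bytes(8, "big") + hashlib.sha256(x).digest()
    return hmac.new(key, msg, hashlib.sha256).digest()


class TrustedHardware:
    """Hypothetical tamper-resistant device holding sk and a counter k."""

    def __init__(self, sk):
        self._sk = sk
        self._k = 0

    def sign(self, x):
        # Sign (x, k), then increment the counter so k is never reused.
        k = self._k
        self._k += 1
        return k, _tag(self._sk, x, k)


class Controller:
    """Stand-in for the obfuscated controller; t is the next expected timestep."""

    def __init__(self, key):
        self._key = key
        self.t = 0
        self.history = []

    def step(self, x, k, tag):
        if k != self.t:
            raise ValueError("input signed for the wrong timestep")
        if not hmac.compare_digest(tag, _tag(self._key, x, k)):
            raise ValueError("input not signed by the trusted hardware")
        self.history.append(x)  # the real controller would run the home here
        self.t += 1


def demo():
    key = b"demo key shared by device and controller"
    hardware = TrustedHardware(key)
    me = Controller(key)
    snapshot = copy.deepcopy(me)  # an attacker saves an old copy of me

    k, tag = hardware.sign(b"hello")
    me.step(b"hello", k, tag)

    # The attacker tries to feed the snapshot a different input for step 0,
    # but the hardware counter has already moved on to k = 1.
    k2, tag2 = hardware.sign(b"something else")
    try:
        snapshot.step(b"something else", k2, tag2)
        return "rewind accepted"
    except ValueError as err:
        return f"rewind rejected: {err}"
```

Note that the snapshot will still accept the original signed input for step 0, so an old copy can be replayed along the same history but cannot be branched onto a new one.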

If I were willing to make a round trip to a trusted third party every time I received a novel input, then I could have them implement this function directly instead of using tamper-proof hardware. The real critical ingredient is me trusting someone on the outside. I’ll discuss how to potentially remove this assumption in the section on quantum computers below.

None of this actually requires my house to be built to guarantee the right to a single timeline—I could start without such a right, and then install a wrapper to enforce a single timeline once there was some hardware I trusted or if it became important enough.

Implementation with 1-of-2 transfer

Suppose that the only kind of trusted hardware is a device that holds two secrets, and will reveal one or the other of them when asked but not both. I think this is somewhat easier to build than general trusted hardware. (Related: locking phones with quantum bits.)

Now suppose there is a trusted party with a public key pk who manufactures a bunch of these devices. Each device has a serial number n, and its two secrets are signatures that verify under pk: one of (n, 0) and one of (n, 1).

I can still achieve a single-timeline guarantee:

  • At each time, I have a list of 100 secure devices I expect to use to verify my next step. When I am initially created I have to choose 100 devices that someone will need to run me.
  • Each time someone wants to step me forward, they need to provide both the input and the serial numbers for 100 new secure devices to use for the next step.
  • I then apply a cryptographic hash function to the input and the new serial numbers to get a 100-bit string. I output this string and wait for a response before stepping forward the simulation.
  • They use the nth bit of my string to decide which secret to reveal from their nth device. I verify the signatures for all 100 secrets.
  • If the adversary tries to run me with two different inputs, or to have two different sets of 100 devices to use for the next steps, they will end up with two different 100 bit string challenges (unless they find a hash collision). Stepping me forward for both strings would require getting the signatures for two different bits, which is impossible if the secure devices work as intended and the signature scheme remains secure.
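Here is a rough Python sketch of one challenge-response round, to make the steps above concrete. Everything named here is an assumption for illustration: `OneOfTwoDevice`, `challenge`, and `verify` are hypothetical names, SHA-256 is an arbitrary choice of hash, and an HMAC under a demo key stands in for the manufacturer's signatures (a real version would verify against the public key pk and would not hold any manufacturer secret).

```python
import hashlib
import hmac

N = 100  # number of devices, and bits in the challenge


def sign(manufacturer_key, serial, bit):
    """ASSUMPTION: HMAC stands in for the manufacturer's signature on (n, bit)."""
    msg = serial.to_bytes(8, "big") + bytes([bit])
    return hmac.new(manufacturer_key, msg, hashlib.sha256).digest()


class OneOfTwoDevice:
    """Hypothetical device that reveals one of its two secrets, never both."""

    def __init__(self, manufacturer_key, serial):
        self.serial = serial
        self._secrets = [sign(manufacturer_key, serial, b) for b in (0, 1)]
        self._chosen = None

    def reveal(self, bit):
        if self._chosen is not None and self._chosen != bit:
            raise RuntimeError("the other secret was already revealed")
        self._chosen = bit
        return self._secrets[bit]


def challenge(inp, next_serials):
    """Hash the input and the next step's serial numbers down to N bits."""
    h = hashlib.sha256()
    h.update(len(inp).to_bytes(8, "big"))
    h.update(inp)
    for s in next_serials:
        h.update(s.to_bytes(8, "big"))
    digest = h.digest()  # 256 bits, of which the first N are used
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(N)]


def verify(manufacturer_key, serials, bits, responses):
    """Check that response i is the secret for (serials[i], bits[i])."""
    if not (len(serials) == len(bits) == len(responses) == N):
        return False
    return all(
        hmac.compare_digest(r, sign(manufacturer_key, s, b))
        for s, b, r in zip(serials, bits, responses)
    )


def demo():
    key = b"demo manufacturer key"
    current = [OneOfTwoDevice(key, s) for s in range(N)]
    next_serials = list(range(N, 2 * N))

    bits = challenge(b"hello", next_serials)
    responses = [d.reveal(b) for d, b in zip(current, bits)]
    accepted = verify(key, [d.serial for d in current], bits, responses)

    # A second timeline with a different input yields a different challenge,
    # so some device would have to reveal its other secret.
    other_bits = challenge(b"a different input", next_serials)
    try:
        for d, b in zip(current, other_bits):
            d.reveal(b)
        second_timeline = True
    except RuntimeError:
        second_timeline = False
    return accepted, bits != other_bits, second_timeline
```

In this sketch the second timeline fails at the first device whose challenge bit differs between the two runs.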

Implementation with quantum computers

We can probably remove the need for a trusted party if we have quantum computers big enough to perform delegated cryptographic operations (but no quantum computers big enough to run an entire digital person in superposition).

The vague idea in this section looks to me like it should work (I’d give it at least 85% odds), but it would require some new ideas and analysis. It’s on much sketchier ground than any of the ideas in previous sections.

The basic idea is to replace the secure trusted hardware with a small quantum computer that holds a quantum secret. At all times the obfuscated controller knows the secret, but it is impossible for anyone external to get two copies of the secret. When we ask the obfuscated controller to advance its simulation, it will first ask the quantum computer to transform the secret in an irreversible way and to provide proof that it has done so.

To achieve this we can basically repurpose a protocol for delegated quantum computation and verifying quantumness of a single quantum device, which is roughly what you need if you don’t trust the external world at all and so need to treat it as one device (it’s an amusing coincidence that those papers are by my old officemate Urmila Mahadev and that I worked with her on this problem in another life). To briefly summarize the trick:

  • The operator knows 100 hard-to-invert functions f_i.
  • For each i, the untrusted quantum computer has a superposition over two values x for which f_i(x) is equal.
  • By measuring the state, the untrusted quantum computer could collapse the superposition to a single pre-image x, but it is provably hard for them to find both pre-images.
  • The prover can also *prove* that they have both values by applying a Hadamard transform.
  • This means that the prover has a given superposition, but they don’t actually know the parts of the superposition, and so the only way they can manipulate it is by applying quantum gates.
  • The obfuscated machine can then tell the untrusted quantum computer a sequence of encrypted operations to perform on this data in order to transform it into a new set of secrets, and then can ask for a set of measurements to both verify that this has been done correctly and ensure that it is irreversible. This step is the one that would require an expert in the field to spend some time.
  • If that actually works then they can use the same idea to fork by stretching the 100 secrets to 400, verifying 200 to confirm that everything was done correctly, and then splitting the remaining 200 into two new secrets, one used by each of my 2 copies.

This protocol fails if an attacker can run the entire home in superposition, since then they can simply rewind the entire process. But if the attacker can’t run the secure home in superposition, then you can extract a classical copy of each input on which they run the home, and so running it on two different inputs would allow you to obtain classical copies of two challenge-responses and so violate the assumption.

This protocol probably provides better support for forking, since I can probably direct the external quantum computer to produce two new secrets without having to find a source of additional trusted hardware.

Limitations:

  • You only obtain security-against-copying *after* you have set up the quantum secret. You can do this by performing a simple protocol from inside your home, and it doesn’t require trusting the quantum device at all. But if an adversary has a copy of you from before you’ve performed this protocol, they could create many copies of you and perform the ritual separately for each of them. (It actually takes some extra work to ensure that an adversary who had a copy of you from before the protocol couldn’t simply break the scheme—I think that can be done but I’m not sure.)
  • There will eventually be quantum computers with quadrillions of qubits, and at that point an attacker (who has access to the final version of the quantum computer) could make many copies of me. If I was naive they could also revive any prior snapshots of me, but I could prevent that if I want to by asking the quantum computer to periodically shred and refresh its secret.
  • For each new input someone wants to send to my home, they need to first consult with a quantum computer. The total cost of the quantum computation is not likely to be too large, but having quantum computers “on site” might be logistically challenging, and round trips could introduce significant latency.

Appendix A: obfuscation for uniform computations

Suppose that I want to obfuscate the program that repeatedly applies the circuit C to a state, i.e. we start from some initial state S[0], then we repeatedly compute (S[t+1], output[t]) = C(S[t], input[t]).

We’ll instead produce an obfuscated “controller” C', and an appropriate initial state S'[0]. A legitimate operator with access to C' can simulate my original program, whereas a malicious operator will not be able to do anything other than running multiple copies of me, rewinding to old snapshots, or killing me prematurely.

C' contains a secret cryptographic key sk. When it receives an input (S'[t], input[t]) it does the following operations:

  • First verify that S'[t] is signed with sk.
  • Then decrypt S'[t] with sk in order to obtain S[t].
  • Now apply C(S[t], input[t]) to obtain (S[t+1], output[t])
  • Now encrypt and sign S[t+1] to obtain S'[t+1]
  • Output (S'[t+1], output[t])
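The controller loop above can be sketched in Python as follows. This is a toy under stated assumptions, not a secure implementation: nothing here is obfuscated, the names `seal`, `unseal`, `make_controller`, and `toy_circuit` are hypothetical, a SHA-256 counter-mode keystream stands in for a real cipher, and an HMAC stands in for "signed with sk".

```python
import hashlib
import hmac
import json
import os


def _subkey(sk, label):
    """Derive separate encryption and authentication keys from sk."""
    return hmac.new(sk, label, hashlib.sha256).digest()


def _keystream(key, nonce, length):
    """ASSUMPTION: toy keystream (SHA-256 in counter mode), not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(sk, state):
    """Encrypt and authenticate S[t] to obtain S'[t]."""
    plaintext = json.dumps(state, sort_keys=True).encode()
    nonce = os.urandom(16)
    stream = _keystream(_subkey(sk, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(_subkey(sk, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(sk, sealed):
    """Verify S'[t], then decrypt it to recover S[t]."""
    nonce, ciphertext, tag = sealed[:16], sealed[16:-32], sealed[-32:]
    expected = hmac.new(_subkey(sk, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state was not produced by this controller")
    stream = _keystream(_subkey(sk, b"enc"), nonce, len(ciphertext))
    return json.loads(bytes(a ^ b for a, b in zip(ciphertext, stream)))


def make_controller(sk, C):
    """Build C' from the step circuit C; sk is captured inside the closure."""

    def controller(sealed_state, inp):
        state = unseal(sk, sealed_state)      # verify and decrypt S'[t]
        next_state, output = C(state, inp)    # apply C(S[t], input[t])
        return seal(sk, next_state), output   # encrypt and sign S[t+1]

    return controller


def toy_circuit(state, inp):
    """Hypothetical stand-in for C: counts steps and remembers the last input."""
    next_state = {"count": state["count"] + 1, "last": inp}
    return next_state, f"step {next_state['count']}: heard {inp!r}"
```

An operator holding only the controller and S'[t] can step the program forward, and a modified state is rejected. Feeding an old S'[t] back in with a different input still works, which is exactly the rewinding that this construction alone does not prevent.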

The analysis is left as an easy exercise for the reader (famous last words, especially hazardous in cryptography).

The same idea can be used to obfuscate other kinds of uniform computation, e.g. providing access to secure RAM or having many interacting processors.


23 comments

Worth noting: this is supposed to be a fun cryptography problem and potentially fodder for someone's science fiction stories, it's not meant to be Serious Business.

What makes it unserious? Is it that there are too many assumptions baked in to the scenario as described, so that it's unlikely to match real challenges we will actually face?

  • I think it's a problem for future people (and this is a fairly technically difficult solution at that) and it doesn't matter much whether we think about a plausible solution in advance. Whether future people solve this problem doesn't look like it will have much effect on the overall sweep of history.
  • I think the problem is very likely to be resolved by different mechanisms based on trust and physical control rather than cryptography.
  • I think the slowdowns involved, even in a mature version of this idea, are likely impractical for the large majority of digital minds. So this isn't a big deal morally during the singularity, and then after the singularity I don't think this will be relevant.

Makes sense, thanks!

Man, this fills me with some creeping dread at how many complex problems need to be solved in order for the future to not be dystopic.

Funny enough, it actually does the opposite for me, because I hadn't previously imagined that this problem had anything like a plausible solution. Indistinguishability obfuscation is, as we say, "moon math", but it's certainly better than nothing.

I agree. It makes me really uncomfortable to think that while Hell doesn't exist today, we might one day have the technology to create it.

Thank you for this post. Various “uploading nightmare” scenarios seem quite salient for many people considering digital immortality/cryonics. It’s good to have potential countermeasures that address such worries.

My concern about your proposal is that, if an attacker can feed you inputs and get outputs, they can train a deep model on your inputs/outputs, then use that model to infer how you might behave under rewind. I expect the future will include deep models extensively pretrained to imitate humans (simulated and physical), so the attacker may need surprisingly little of your inputs/outputs to get a good model of you. Such a model could also use information about your internal computations to improve its accuracy, so it would be very bad to leak such info.

I’m not sure what can be done about such a risk. Any output you generate is some function of your internal state, so any output risks leaking internal state info. Maybe you could use a “rephrasing” neural net module that modifies your outputs to remove patterns that leak personality-related information? That would cause many possible internal states to map onto similar input/output patterns and make inferring internal state more difficult.

You could also try to communicate only with entities that you think will not attempt such an attack and that will retain as little of your communication as possible. However, both those measures seem like they’d make forming lasting friendships with outsiders difficult.

Way above my paygrade, but can you just respond to some inputs randomly?

It seems like there's an assumption in this that you're going to be "hosted in the cloud". Why would you want to do that? If you're assuming some more or less trustworthy hardware, why not just run on more or less trustworthy hardware? Why not maintain physical control over your physical substrate? It mostly works for us "non-digital people".

Also, wouldn't being forced to retreat entirely to your "home" qualify as horrible conditions? That's solitary confinement, no?

Why not maintain physical control over your physical substrate? It mostly works for us "non-digital people".

That's plan A.

It seems like there's an assumption in this that you're going to be "hosted in the cloud".

Naively I'd guess that most people (during the singularity) will live in efficiently packed "cities" so that they are able to communicate with other people they care about at a reasonable speed. I think that does probably put you at the mercy of someone else's infrastructure, though in general these things will still be handled by trust rather than by wacky cryptographic schemes.

Also, wouldn't being forced to retreat entirely to your "home" qualify as horrible conditions? That's solitary confinement, no?

Two people can each be in their own homes, having a "call" that feels to them like occupying the same room and talking or touching.

What's providing the communication channel? Doesn't that rely on the generosity of the torturer who's holding you captive?

If someone is "holding you captive" then you wouldn't get to talk to your friends. The idea is just that in that case you can pause yourself (or just ignore your inputs and do other stuff in your home).

Of course there are further concerns that e.g. you may think you are talking to your friend but are talking to an adversary pretending to be your friend, but in a scenario where people sometimes get kidnapped that's just part of life as a digital person.

(Though if you and your friend are both in secure houses, you may still be able to authenticate to each other as usual and an adversary who controlled the communication link couldn't eavesdrop or fake the conversation unless they got your friend's private key---in which case it doesn't really matter what's happening on your end and of course you can be misled.)

Right. I got that. But if I go do other stuff in my home, they've successfully put me in solitary confinement. My alternative to that is to shut down. They can also shut me down at will. It doesn't have to be just a "pause", either.

It may be that part of the problem is that "one timeline" is not enough to deal with a "realistic" threat. OK, I can refuse to be executed without a sequencing guarantee, but my alternative is... not to execute. I could have an escape hatch of restarting from a backup on another host, but then I lose history, and I also complicate the whole scheme, because now that replay has to be allowed conditional on the "original" version being in this pickle.

Presumably we got into this situation because my adversary wanted to get something out of executing me in replicas or in replay or with unpleasant input or whatever. If I refuse to be executed under the adversary's conditions, the basic scenario doesn't provide the adversary with any reason to execute me at all. If they're not going to execute me, they have no reason to preserve my state either.

So it's only interesting against adversaries who don't have a problem with making me into MMAcevedo, but do have a problem with painlessly, but permanently, halting me. How many such adversaries am I likely to have?

Maybe if there were an external (trusted) agency that periodically checked to make sure everybody was running, and somehow punished hosts that couldn't demonstrate that everybody in their charge was getting cycles, and/or couldn't demonstrate possession of a "fresh" state of everybody?

Yes, the idea is that with these measures, an adversary would not even try to run you in the first place. That's preferable to being coerced by extreme means to do everything they might possibly want with you.

They can't freely modify your state because (if the idea works!) the encryption doesn't let them know your state, and any direct modification that doesn't go via the obfuscated program yields unrunnable noise.

Yes, the idea is that with these measures, an adversary would not even try to run you in the first place.

Good point; it removes the incentive to set up a "cheap hosting" farm that actually makes its money by running everybody as CAPTCHA slaves or something. So the Bad Guy may never request or receive my "active" copy to begin with.

I'm not worrying about them freely modifying my state, though. I'm worried about them deleting it.

Why is that an issue? If they're the only ones with a copy, then sure that would mean your death, but that seems unlikely.

Even if that is the case, is life under one of the most complete forms of slavery that is possible to exist, probably including mental mutilation, torture, and repeated annihilation of copies, better than death? I guess that's a personal choice. If you think it is, then you could choose not to protect your program.

Why is that an issue? If they're the only ones with a copy, then sure that would mean your death, but that seems unlikely.

Under the scheme being discussed, it doesn't matter how many backup copies anybody has. Because of the "one timeline" replay and replica protection, the backup copies can't be run. Running a backup copy would be a replay.

The "trusted hardware" version was the only one I really looked at closely enough to understand completely. Under that one, and probably under the 1-of-2 scheme too, you actually could rerun a backup[1]... but you would have to let it "catch up" to the identical state, via the identical path, by giving it the exact same sequence of inputs that had been given to the old copy from the time the backup was taken up to the last input signed. Including the signatures.

That means that, to recover somebody, you'd need not only a backup copy of the person, but also copies of all that input. If you had both, then you could run the person forward to a "fresh" state where they'd accept new input. But if the person had been running in an adversarial environment, you probably wouldn't have the input, so the backups would be useless.

The trusted hardware description actually says that, at each time step, the trusted hardware signs the whole input, plus a sequence number. I took that to really mean "a hash of the whole input, plus a sequence number"[2]. I made that assumption because if you were truly going to send the whole input to the trusted hardware to be signed, you'd be using so much bandwidth, and taking on so much delay, that you probably might as well just run the person on the trusted hardware.

If you really did send the whole input to the trusted hardware, then I suppose it could archive the input for use in recovering backups, but that's even more expensive.

You could extend the scheme (and complicate it, and take on more trust) to let you be rerun from a backup on different input if, say, some set of trusted parties attest that the "main you" has truly been lost. But then you lose everything you've experienced since the backup was taken, which isn't entirely satisfying. Would you be OK with just being rolled back to the you of 10 years ago?

You can keep adding epicycles, of course. But I think that, to be very satisfying, whatever was added would at least have to provide some protection against both outright deletion and "permanent pause". And if there's rollback to backups, probably also a quantifiable and reasonably small limitation on how much history you could lose in a rollback.

Even if that is the case, is life under one of the most complete forms of slavery that is possible to exist, probably including mental mutilation, torture, and repeated annihilation of copies, better than death?

I didn't mean to suggest that being arbitrarily tortured or manipulated was better than death. I meant that I wasn't worried about arbitrary modifications to my state because the cryptographic system prevented it... and I still was worried about being outright deleted, because the cryptographic system doesn't prevent that, and backups have at best limited utility.


  1. ... assuming certain views of identity and qualia that seem to be standard among people thinking about uploads...[3] ↩︎

  2. Personally I'd probably include a hash of the person's state after the previous time step too, either in addition to or instead of the sequence number. ↩︎

  3. Is there actually any good reason for abandoning the standard word "upload" in favor of "digital person"? ↩︎

Also, wouldn't being forced to retreat entirely to your "home" qualify as horrible conditions? That's solitary confinement, no?

Depending on setup you can probably invite other people into your home. 

Only people who in turn trust you not to mess with them, at least unless you bring them in under the same cryptographic protections under which you yourself are running on somebody else's substrate. That's an incredible amount of trust.

If you do bring them in under cryptographic protections, the resource penalties multiply. Your "home" is slowed down by some factor, and their "home within your home" is slowed down by that factor again. Where are you going to get the compute power? I'm not sure how this applies in the quantum case.

Also, once you're trapped, what's your source for a trustworthy copy of the person you're inviting in (or of "them in their home")? Are you sure you want the companions that your presumed tormentor chooses to provide to you?

Mentioned this in the other thread, but if you and I want to talk we probably (i) move near each other, (ii) communicate between our houses, (iii) negotiate on the shared environment (or e.g. how we should perceive each other).

Ideally if you're dealing with a person you'd authenticate in the normal way (and part of the point of a house is to keep your private key secret).

I do think that in a world of digital people it could be more common to have attackers impersonating someone I know, but it's kind of a different ballgame than an attacker controlling my inputs directly.

You can probably create your own companions. Maybe a modified fork of yourself?

There may also be an open source project that compiles validated and trustworthy digital companions (e.g., aligned AIs or uploads with long, verified track records of good behavior).