The original UDT post defined an agent's preferences as a list of programs it cares about, and a utility function over their possible execution histories. That's a good and general formulation, but how do we make a UDT agent care about our real world, seeing as we don't know its true physics yet? It seems difficult to detect humans (or even paperclips) in the execution history of an arbitrary physics program.

But we can reframe the problem like this: the universal prior is a probability distribution over all finite bitstrings. If we like some bitstrings more than others, we can ask a UDT agent to influence the universal prior so as to make those bitstrings more probable. After all, the universal prior is an uncomputable mathematical object that is not completely known to us (it is even defined in terms of all possible computer programs, including ours). We already know that UDT agents can control such objects, and it turns out that such control can transfer to the real world.
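For concreteness, here is one standard way to pin the object down (the discrete version, relative to a fixed universal prefix machine $U$); any similar formalization works for the argument below:

$$m(x) \;=\; \sum_{p \,:\, U(p)=x} 2^{-|p|}$$

Each finite bitstring $x$ gets the combined weight of all programs that output it, with shorter programs counting exponentially more. (Strictly speaking the raw sum is only a semimeasure, so normalize it if you want a proper distribution.)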

For example, you could pick a bitstring that describes the shape of a paperclip, type it into a computer, and run a UDT AI with a utility function that says "maximize the probability of this bitstring under the universal prior, mathematically defined in such-and-such way". For good measure you could give the AI some actuators, e.g. connect its output to the Internet. The AI would notice our branch of the multiverse somewhere within the universal prior, notice itself running within that branch, seize the opportunity for control, and pave over our world so it looks like a paperclip from as many points of view as possible. The resulting agent is like AIXI in its willingness to kill everyone, but differs from AIXI in that it probably won't sabotage itself by mining its own CPU for silicon or becoming solipsistic.
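As a toy illustration of the kind of utility function being described (not something you could literally run, since the universal prior is uncomputable), here is a sketch in Python that treats the prior as an oracle; the bitstring is a hypothetical stand-in:

```python
# Sketch only: M stands for an oracle computing the universal prior m(x).
# No such function exists in practice (m is uncomputable), so a real agent
# could only reason about it abstractly or use computable approximations.

PAPERCLIP = "001101001110"  # hypothetical stand-in for a bitstring describing a paperclip

def paperclip_utility(M):
    """Utility of a (counterfactual) universal prior M: the more probability
    mass M assigns to the paperclip bitstring, the better."""
    return M(PAPERCLIP)

# Toy usage with a stand-in prior that just penalizes length:
# paperclip_utility(lambda x: 2.0 ** -len(x))
```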

This approach does not treat the universal prior as a fixed probability distribution over possible universes; instead, you can view the agent as choosing among several different possible universal priors. A utility function specified this way doesn't always look like utility maximization from within the universal prior's own multiverse of programs. For example, you could ask the agent to keep a healthy balance between paperclips and staples in the multiverse, i.e. minimize |P(A)-P(B)|, where A is a paperclip bitstring, B is a staple bitstring, and P is the universal prior.
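In the same sketch notation as above (M as a stand-in oracle for the universal prior; A and B are hypothetical description strings), that objective would look roughly like this:

```python
A = "001101001110"  # hypothetical paperclip description
B = "110010110001"  # hypothetical staple description

def balance_utility(M):
    """Reward universal priors that assign paperclips and staples nearly
    equal probability, i.e. minimize |P(A) - P(B)|."""
    return -abs(M(A) - M(B))
```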

The idea is scary because well-meaning people can be tempted to use it. For example, they could figure out which computational descriptions of the human brain correspond to happiness and ask their AI to maximize the universal prior probability of those. If implemented correctly, that could work and even be preferable to unfriendly AI, but we would still lose many human values. Eliezer's usual arguments apply in full force here. So if you ever invent a powerful math intuition device, please don't go off building a paperclipper for human happiness. We need to solve the FAI problem properly first.

Thanks to Paul Christiano and Gary Drescher for ideas that inspired this post.

Comments

Sounds very similar to what my brain has been doing since I learned about UDT.

One problem is that a bitstring describing the quantum wave function of a bunch of particles making up a piece of memory substrate that contains a bitstring looks nothing like that bitstring.

There is however a relatively short program transforming one into the other.
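(In terms of the discrete universal distribution $m$ defined earlier: if $f$ is a computable extraction procedure and $y$ is the full physical description, then any program for $y$ composed with a program for $f$ yields a program for $x = f(y)$, so roughly $m(x) \ge 2^{-K(f)-O(1)} m(y)$, where $K(f)$ is the length of the shortest program computing $f$. Whether that extraction program is actually short is exactly what the reply below disputes.)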

I severely doubt that. In many cases, I wouldn't be surprised if more data were needed to specify how to find and interpret the memory within the universe than to specify both the universe and the content you're after together.

If you doubt this, download and run GOLLY, open the ready-made pattern containing a computer in Conway's Game of Life, run it for a random amount of time, and try to extract the contents of the computer's memory using a hex editor.

I was referring to the literal interpretation of a universe consisting of just the substrate with the data. Reconsidering, that is a pretty useless interpretation given the context.