# 10


I attended Nick Bostrom's talk at UC Berkeley last Friday and got intrigued by these problems again. I wanted to pitch an idea here, with the question: Have any of you seen work along these lines before? Can you recommend any papers or posts? Are you interested in collaborating on this angle in further depth?

The problem I'm thinking about (surely naively, relative to y'all) is: What would you want to program an omnipotent machine to optimize?

For the sake of avoiding some baggage, I'm not going to assume this machine is "superintelligent" or an AGI. Rather, I'm going to call it a supercontroller, just something omnipotently effective at optimizing some function of what it perceives in its environment.

As has been noted in other arguments, a supercontroller that optimizes the number of paperclips in the universe would be a disaster. Maybe any supercontroller that was insensitive to human values would be a disaster. What constitutes a disaster? An end of human history. If we're all killed and our memories wiped out to make more efficient paperclip-making machines, then it's as if we never existed. That is existential risk.

The challenge is: how can one formulate an abstract objective function that would preserve human history and its evolving continuity?

I'd like to propose an answer that depends on the notion of logical depth as proposed by C.H. Bennett and outlined in section 7.7 of Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications which I'm sure many of you have handy. Logical depth is a super fascinating complexity measure that Li and Vitanyi summarize thusly:

Logical depth is the necessary number of steps in the deductive or causal path connecting an object with its plausible origin. Formally, it is the time required by a universal computer to compute the object from its compressed original description.

The mathematics is fascinating and better read in the original Bennett paper than here. Suffice it presently to summarize some of its interesting properties, for the sake of intuition.

• "Plausible origins" here are incompressible, i.e. algorithmically random.
• As a first pass, the depth D(x) of a string x is the least amount of time it takes to output the string from an incompressible program.
• There's a free parameter that has to do with precision that I won't get into here.
• A string of length n consisting entirely of 1's and a string of length n of independent random bits are both shallow. The first is shallow because it can be produced by a very short program (O(log n) bits, enough to specify n) in time n. The second is shallow because there exists an incompressible program, consisting of the output string itself plus a constant-sized print function, that produces the output in time n.
• An example of a deeper string is the string of length n whose ith digit encodes the answer to the ith enumerated satisfiability problem. Very deep strings can involve diagonalization.
• Like Kolmogorov complexity, depth has an absolute and a relative version. Let D(x/w) be the least time it takes to output x from a program that is incompressible relative to w.
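For intuition, here is a toy Python sketch of depth at a given significance level, assuming one common formulation: among programs that output x and are at most b bits longer than the shortest such program, depth is the least running time. The candidate programs, their lengths, and their step counts are hypothetical numbers chosen for illustration; real depth is incomputable, so nothing here measures actual strings. The significance level b plays the role of the precision parameter mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Program:
    """A hypothetical candidate program that outputs some fixed string x."""
    name: str
    length_bits: int  # description length |p|
    steps: int        # running time on a toy universal machine

def depth(programs: list[Program], significance: int) -> int:
    """Least running time among programs at most `significance` bits longer
    than the shortest listed program (which stands in for K(x))."""
    shortest = min(p.length_bits for p in programs)
    eligible = [p for p in programs if p.length_bits <= shortest + significance]
    return min(p.steps for p in eligible)

n = 64

# All 1's: a short loop is also fast, so the string is shallow.
ones = [Program("print literal", n + 8, n), Program("loop n times", 16, n)]

# Random bits: the only short description is the literal itself.
noise = [Program("print literal", n + 8, n)]

# SAT-answer-like string: a short brute-force program that is slow,
# versus a literal lookup that is fast but about n bits longer.
sat_like = [Program("brute force", 24, 2 ** 40), Program("print literal", n + 8, n)]

for label, progs in [("ones", ones), ("noise", noise), ("sat_like", sat_like)]:
    print(label, depth(progs, significance=10), depth(progs, significance=100))
```

With b = 10 the SAT-like string comes out deep, because only the slow brute-force program is eligible; with b = 100 the fast literal program also qualifies and the depth drops to n. That is why the precision parameter cannot be ignored.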

That's logical depth. Here is the conceptual leap to history-preserving objective functions. Suppose you have a digital representation of all of human society at some time step t; call this h_t. And suppose you have some representation u of the future state of the universe that you want to build an objective function around. What's important, I posit, is the preservation of the logical depth of human history in its computational continuation in the future.

We have a tension between two values. First, we want there to be an interesting, evolving future. We would perhaps like to optimize D(u).

However, we want that future to be our future. If the supercontroller maximizes logical depth by chopping all the humans up, turning them into better computers, and erasing everything we've accomplished as a species, that would be sad. However, if the supercontroller takes human history as an input and then expands on it, that's much better. D(u/h_t) is the logical depth of the universe as computed by a machine that takes human history at time slice t as input.

Working on intuitions here--and your mileage may vary, so bear with me--I think we are interested in deep futures and especially those futures that are deep with respect to human progress so far. As a conjecture, I submit that those will be futures most shaped by human will.

So, here's my proposed objective for the supercontroller, as a function of the state of the universe. The objective is to maximize:

f(u) = D(u/h_t) / D(u)

I've been rather fast and loose here and expect there to be serious problems with this formulation. I invite your feedback! I'd like to conclude by noting some properties of this function:
• It can be updated with observed progress in human history at time t' by replacing h_t with h_t'. You could imagine generalizing this to something that updates dynamically in real time.
• This is quite a conservative function, in that it severely punishes computation that does not depend on human history for its input. It is so conservative that it might result in, just to throw it out there, unnecessary militancy against extraterrestrial life.
• There are lots of devils in the details. The precision parameter I glossed over. The problem of representing human history and the state of the universe. The incomputability of logical depth (of course it's incomputable!). My purpose here is to contribute to the formal framework for modeling these kinds of problems. The difficult work, like in most machine learning problems, becomes feature representation, sensing, and efficient convergence on the objective.

Sebastian Benthall
PhD Candidate
UC Berkeley School of Information


This reads like concentrated confusion to me.

1) Logical depth seems super cool to me, and is perhaps the best way I've seen for quantifying "interestingness" without mistakenly equating it with "unlikeliness" or "incompressibility".

2) Despite this, Manfred's brain-encoding-halting-times example illustrates a way a D(u/h) / D(u) optimized future could be terrible... do you think this future would not obtain because, despite being human-brain-based, it would not in fact make much use of being on a human brain? That is, it would have extremely high D(u) and therefore be penalized?

I think it would be easy to rationalize/over-fit our intuitions about this formula to convince ourselves that it matches our intuitions about what is a good future. More realistically, I suspect that our favorite futures have relatively high D(u/h) / D(u) but not the highest value of D(u/h) / D(u).

1) Thanks, that's encouraging feedback! I love logical depth as a complexity measure. I've been obsessed with it for years and it's nice to have company.

2) Yes, my claim is that Manfred's doomsday cases would have very high D(u) and would be penalized. That is the purpose of having that term in the formula.

I agree with your suspicion that our favorite futures have relatively high D(u/h) / D(u) but not the highest value of D(u/h) / D(u). I suppose I'd defend a weaker claim, that a D(u/h) / D(u) supercontroller would not be an existential threat. One reason for this is that D(u) is so difficult to compute that it would be pretty bogged down...

One reason for making a concrete proposal of an objective function is that if it is pretty good, maybe it's a starting point for further refinement.

I agree with your suspicion that our favorite futures have relatively high D(u/h) / D(u) but not the highest value of D(u/h) / D(u).

Many utility functions have the same feature. For example, I could give the AI some flying robots with cameras, and teach it to count smiling people in the street by simple image recognition algorithms. That utility function would also assign a high score to our favorite future, but not the highest score. Of course the smile maximizer is one of LW's recurring nightmares, like the paperclip maximizer.

I suppose I'd defend a weaker claim, that a D(u/h) / D(u) supercontroller would not be an existential threat. One reason for this is that D(u) is so difficult to compute that it would be pretty bogged down...

Any function that's computationally hard to optimize would have the same feature.

What other nice features does your proposal have?

Imagine that I will either stub my toe a minute from now (outcome A), or won't (outcome B). I don't see why your proposed utility function would order A and B correctly. Since I can come up with many examples of A and B, there's a high chance that the gradient of your utility function at the current state of the world is pointing in a wrong direction. That seems to be a bad sign in itself, and also rules out the possibility of testing your utility function with a weak AI, because it wouldn't improve humanity's welfare. So we need to put your function in a strong AI, push the button and hope for the best.

You have tried to guess what kind of future that strong AI would create, but I don't see how to make such guesses with confidence. The failure modes might be much more severe than something like "unnecessary militancy against extra-terrestrial life". It's most likely that humans won't exist at all, because some other device that you haven't thought of would be better at maximizing "logical depth" or whatever. For a possible nightmare failure mode, imagine that humans are best at creating "logical depth" when they're in pain because that makes them think frantically, so the AI will create many humans and torture them.

Eliezer had a nice post arguing against similar approaches.

Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications which I'm sure many of you have handy.

Unsure if sarcasm or just a surprisingly good prediction, since I happen to have it on my desktop at the moment. (Thanks, /u/V_V).

Anyhow, why should this state contain humans at all? The rest of the universe out-entropies us by a lot. Think of all the radical high-logical-depth things the Sun can do that would happen to be bad for humans.

Even if we just considered a human brain, why should a high-relative-logical-depth future look like the human brain continuing to think? Why not overwrite it with a SAT-solver? Or a human brain whose neuron firings encode the running times of Turing machines that halt in fewer than a googolplex steps? Or turn the mass-energy into photons to encode a much huger state?

A good prediction :)

Logical depth is not entropy.

The function I've proposed is to maximize depth-of-universe-relative-to-humanity-divided-by-depth-of-universe.

Consider the decision to kill off people and overwrite them with a very fast SAT solver. That would surely increase depth-of-universe, which is in the denominator. I.e. increasing that value decreases the favorability of this outcome.

What increases the favorability of the outcome, in light of that function, is the computation of representations that take humanity as an input. You could imagine the supercontroller doing a lot to, say, accelerate human history. I think that would likely involve either humans or lots of simulations of humans.

Ah, okay, my bad for just thinking of it as maximizing relative depth.

So what's really pushed are things that are logically deep in their simplest expression in terms of humanity, but not logically deep in terms of fundamental physics.

Depending on how this actually gets cashed out, the "human" that encodes deep computational results rather than actually living is still a very desirable object.

Here's a slightly more alive dystopia: Use humanity to embody a complicated Turing machine (like how the remark goes that chimpanzees are Turing-complete because they can be trained to operate a Turing machine). Deep computational work appears to be being done (relative to humanity), but from a fundamental physics point of view it's nothing special. And it's probably not much fun for the humans enslaved as cogs in the machine.

First, I'm grateful for this thoughtful engagement and pushback.

Let's call your second dystopia the Universal Chinese Turing Factory, since it's sort of a mash-up of the factory variant of Searle's Chinese Room argument and a universal Turing Machine.

I claim that the Universal Chinese Turing Factory, if put to some generic task like solving satisfiability puzzles, will not be favored by a supercontroller with the function I've specified.

Why? Because if we look at the representations computed by the Universal Chinese Turing Factory, they may be very logically deep. But their depth will not be especially due to the fact that humanity is mechanically involved in the machine. In terms of the ratio of depth-relative-to-humanity to absolute-depth, there's going to be hardly any gains there.

(You could argue, by the way, that if you employed all of humanity in the task of solving math puzzles, that would be much like the Universal Chinese Turing Factory you describe.)

Let's consider an alternative world, the Montessori World, where all of humanity is employed creating ever more elaborate artistic representations of the human condition. Arguably, this is the condition of those who dream up a post-scarcity economy where everybody gets to be a quasi-professional creative doing improv comedy, interpretative dance, and abstract expressionist basket-weaving. A utopia, I'm sure you'd agree.

The thing is that these representations will be making use of humanity's condition h as an integral part of the computing apparatus that produces the representations. So, humanity is not just a cog in a universal machine implementing other algorithms. It is the algorithm, which the supercontroller then has the interest of facilitating.

That's the vision anyway. Now that I'm writing it out, I'm thinking maybe I got my math wrong. Does the function I've proposed really capture these intuitions?

For example, maybe a simpler way to get at this would be to look at the Kolmogorov complexity of the universe relative to humanity, K(u/h). That could be a better Montessori world. Then again, maybe Montessori world isn't as much of a utopia as I've been saying it is.
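One computable stand-in sometimes used for conditional Kolmogorov complexity is compression: K(u/h) is approximated by C(h followed by u) minus C(h), where C is compressed length (the same heuristic that underlies the normalized compression distance). A minimal Python sketch, assuming zlib as the compressor and using random byte strings as purely hypothetical stand-ins for h and u. This approximates K, not logical depth, and only gives crude upper bounds.

```python
import random
import zlib

def c(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable upper-bound proxy for K."""
    return len(zlib.compress(data, 9))

def conditional_c(u: bytes, h: bytes) -> int:
    """Approximate K(u/h) as C(h + u) - C(h): the extra bytes needed for u once h is known."""
    return c(h + u) - c(h)

rng = random.Random(1)
h = rng.randbytes(2000)                    # hypothetical stand-in for a record of humanity
u_builds_on_h = h + b"...and then more"    # a future that reuses h
u_ignores_h = rng.randbytes(2000)          # a future unrelated to h

print(conditional_c(u_builds_on_h, h), conditional_c(u_ignores_h, h))
```

A future that reuses h costs only a small number of extra bytes given h, while an unrelated future costs roughly its full length. That is the distinction a K(u/h)-based objective would be sensitive to.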

If I sell the idea of these kinds of complexity-relative-to-humanity functions as a way of designing supercontrollers that are not existential threats, I'll consider this post a success. The design of the particular function is, I think, an interesting ethical problem or choice.

Well, certainly I've been coming from the viewpoint that it doesn't capture these intuitions :P Human values are complicated (I'm assuming you've read the relevant posts here (e.g.)), both in terms of their representation and in terms of how they are cashed out from their representation into preferences over world-states. Thus any solution that doesn't have very good evidence that it will satisfy human values, will very likely not do so (small target in a big space).

In terms of the ratio of depth-relative-to-humanity to absolute-depth, there's going to be hardly any gains there.

I'd say, gains relative to what? When considering an enslaved Turing Factory full of people versus a happy Montessori School full of people, I don't see why there should be any predictable difference in their logical depth relative to fundamental physics. The Turing Factory only shows up as "simple" on a higher level of abstraction.

If we restrict our search space to "a planet full of people doing something human-ish," and we cash out "logical depth relative to humanity" as operating on a high level of abstraction where this actually has a simple description, then the process seems dominated by whatever squeezes the most high-level-of-abstraction deep computation out of people without actually having a simple pattern in the lower level of abstraction of fundamental physics.

Idea for generally breaking this: If we cash out the "logical depth relative to humanity" as being in terms of fundamental physics, and allowing us to use a complete human blueprint, then we can use this to encode patterns in a human-shaped object that are simple and high-depth relative to the blueprint but look like noise relative to physics. If both "logical depth relative to humanity" and "logical depth" are on a high, humanish level of abstraction, one encodes high-depth computational results in slight changes to human cultural artifacts that look like noise relative to a high-level-of-abstraction description that doesn't have the blueprints for those artifacts. Etc.

Maybe this will be more helpful:

If the universe computes things that are not computational continuations of the human condition (which might include resolution to our moral quandaries, if that is in the cards), then it is, with respect to optimizing function g, wasting the perfectly good computational depth achieved by humanity so far. So, driving computation that is not somehow reflective of where humanity was already going is undesirable. The computational work that is favored is work that makes the most of what humanity was up to anyway.

To the extent that human moral progress in a complex society is a difficult computational problem (and there's lots of evidence that it is), it is the sort of thing that would be favored by objective g.

If moral progress of humanity is something that has a stable conclusion (i.e., humanity at some point halts or goes into a harmonious infinite loop that does not increase in depth) then objective g will not help us hit that mark. But in that case, it should be computationally feasible to derive a better objective function.

To those who are unsatisfied with objective g as a solution to Problem 2, I pose the problem: is there a way to modify objective g so that it prioritizes morally better futures? If not, I maintain the objective g is still pretty good.

As I see it, there are two separate problems. One is preventing catastrophic destruction of humanity (Problem 1). The other is creating utopia (Problem 2). Objective functions that are satisficing with respect to Problem 1 may not be solutions to Problem 2. While as I read it the Yudkowsky post you linked to argues for prioritizing Problem 2, on the contrary my sense of the thrust of Bostrom's argument is that it's critical to solve Problem 1. Maybe you can tell me if I've misunderstood.

Without implicating human values, I'm claiming that the function f(u) = D(u/h_t) / D(u) satisfies Problem 1 (the existential problem). I'm just going to refer to that function as f now.

You seem to have conceded this point. Maybe I've misinterpreted you.

As for solving Problem 2, I think we'd agree that any solutions to the utopia problem will also be solutions to the existence problem (Problem 1). The nice thing about f is that its range is (0,1), so it's easy to compose it with other functions that could weight it more towards a solution to Problem 2.

I'm not sure I entirely follow what you're saying here, so I'm having a hard time understanding exactly the point of disagreement.

Is the point you're making about the unpredictability of the outcome of optimizing for f? Because the abstract patterns favored by f will look like noise relative to physics?

I think there are a couple elaborations worth making.

First, like Kolmogorov complexity, logical depth depends on a universal computer specification. I gather that you are assuming that the universal computer in question is something that simulates fundamental physics. This need not be the case. Depth is computed as the least running time of incompressible programs on the universal computer.

Suppose we were to try to evolve through a computational process a program that outputs a string that represented the ultimate, flourishing potential of humanity. One way to get that is to run the Earth as a physical process for a period of time and get a description of it at the end, selecting only those timelines in its stochastic unfolding in which life on Earth successfully computes itself indefinitely.

If you stop somewhere along the way, like timestep t, then you are going to get a representation that encodes some of the progress towards that teleological end.

(I think there's a rough conceptual analogy to continuations in functional programming here, if that helps)

An important property of logical depth is the Slow Growth Law, proved by Bennett. It says that deep objects cannot be produced quickly from shallow ones, and incompressible programs are the shallowest strings of all. It's not exactly that depth stacks additively, but I'm going to pretend it does for the intuitive argument here (which may be wrong):

If you have the depth of human progress D(h) and the depth of the universe at some future time D(u), then always D(u/h) < D(u) assuming h is deep at all and the computational products of humanity exist at all. But...

ah, I think I've messed up the formula. Let's see... let's have h' be a human slice taken after the time of h.

D(u) > D(u/h') > D(u/h) > D(h) assuming humanity's computational process continues. The more that h' encodes the total computational progress of u, i.e., the higher D(u/h') is relative to D(u)...

Ok, I think I need to modify the formula some. Here's function g:

g(u) = (D(h) + D(u/h)) / D(u)

Does maximizing this function produce better results? Or have I missed your point?

General response: I think you should revise the chances of this working way downwards until you have some sort of toy model where you can actually prove, completely, with no "obvious" assumptions necessary, that this will preserve values or at least the existence of an agent in a world. But I think enough has been said about this already.

Specific response:

Is the point you're making about the unpredictability of the outcome of optimizing for f? Because the abstract patterns favored by f will look like noise relative to physics?

"Looks like noise" here means incompressibility, and thus logical shallowness. I'll try again to explain why I think that relative logical depth turns out to not look like human values at all, and you can tell me what you think.

Consider an example.

Imagine, if you will, logical depth relative to a long string of nearly-random digits, called the Ongoing Tricky Procession. This is the computational work needed to output a string from its simplest description, if our agent already knows the Ongoing Tricky Procession.

On the other hand, boring old logical depth is the computational work needed to output a string from its simplest description period. The logical depth of the Ongoing Tricky Procession is not very big, even though it has a long description length.

Now imagine a contest between two agents, Alice and Bob. Alice knows the Ongoing Tricky Procession, and wants to output a string of high logical depth (to other agents who know the Ongoing Tricky Procession). The caveat is Bob has to think that the string has low logical depth. Is this possible?

The answer is yes. Alice and Bob are spies on opposite sides, and Alice is encrypting her deep message with a One Time Pad. Bob can't decrypt the message because, as every good spy knows, One Time Pads are super-duper secure, and thus Bob can't tell that Alice's message is actually logically deep.
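The spy scenario can be made concrete with a one-time pad. A minimal Python sketch, using zlib-compressed length as a crude proxy for "looks like noise" (incompressibility); the message, the pad size, and the seeded generator are illustrative assumptions, and a real pad would come from a cryptographic source such as Python's secrets module.

```python
import random
import zlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR; with a random pad of equal length this is a one-time pad."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

rng = random.Random(0)                # seeded only so the example is reproducible
message = b"attack at dawn; " * 256   # highly structured, so it compresses well
pad = rng.randbytes(len(message))     # the secret shared by Alice and her homeland
ciphertext = xor_bytes(message, pad)  # what Bob intercepts

print(len(message), len(zlib.compress(message, 9)), len(zlib.compress(ciphertext, 9)))
assert xor_bytes(ciphertext, pad) == message  # with the pad, the message comes back exactly
```

The plaintext compresses to a small fraction of its length, the intercepted ciphertext does not compress, and yet XOR with the pad recovers the message exactly. Without the pad, Bob cannot tell the ciphertext apart from genuine noise.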

Even if the Ongoing Tricky Procession is not actually that K-complex, Alice can still hide a message in it - she just isn't allowed to give Bob a simple description that actually decomposes into the OTP and the message.

This is almost the opposite of the Slow Growth Law. Slow Growth is where you have shallow inputs and you want to make a deep output. Alice has this deep message she wants to send to her homeland, but she wants her output to be shallow according to Bob. Fast Decay :P

Re: Generality.

Yes, I agree a toy setup and a proof are needed here. In case it wasn't clear, my intention with this post was to suss out whether there was related work already done (looks like there isn't) and then do some intuition pumping in preparation for a deeper formal effort, in which you are instrumental and for which I am grateful. If you would be interested in working with me on this in a more formal way, I'm very open to collaboration.

Regarding your specific case, I think we may both be confused about the math. I think you are right that there's something seriously wrong with the formulas I've proposed.

If the string y is incompressible and shallow, then whatever x is, D(x) ~ D(x/y), because D(x) (at least in the version I'm using for this argument) is the minimum computational time of producing x from an incompressible program. If there is a minimum running time program P that produces x, then appending y as noise at the end isn't going to change the running time.

I think this case with incompressible y is like your Ongoing Tricky Procession.

On the other hand, say w is a string with high depth. That is, whether or not it is compressible in space, it is compressible in time: you get it by starting with something incompressible and shallow and letting it run in time. Then there are going to be some strings x such that D(x/w) + D(w) ~ D(x). There will also be a lot of strings x such that D(x/w) ~ D(x), because D(w) is finite and there are tons of deep things the universe can compute that are deeper. So for a given x, D(x) > D(x/w) > D(x) - D(w), roughly speaking.
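The "compressible in time" picture can be illustrated with an iterated hash. Assuming (as is standard for cryptographic hashes, though unproven) that there is no shortcut to the n-th iterate of SHA-256, the number of hash calls serves as a stand-in for depth: D(x) is roughly k + m, D(w) roughly k, and D(x/w) roughly m, so D(x/w) + D(w) ~ D(x) for this particular x. The seed and the step counts are arbitrary choices for illustration.

```python
import hashlib

def iterate(seed: bytes, n: int) -> bytes:
    """Apply SHA-256 n times, starting from seed.

    The description (seed, n) is short, but no known method gets the result
    in much fewer than n hash calls, so n acts as a stand-in for depth.
    """
    x = seed
    for _ in range(n):
        x = hashlib.sha256(x).digest()
    return x

seed = b"stand-in for an incompressible, shallow start"  # hypothetical seed
k, m = 20_000, 5_000

w = iterate(seed, k)                # deep intermediate: about k steps from the seed
x_from_seed = iterate(seed, k + m)  # about k + m steps with no help
x_from_w = iterate(w, m)            # only m more steps once w is given

assert x_from_seed == x_from_w
```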

I'm saying the h, the humanity data, is logically deep, like w, not incompressible and shallow, like y or the ongoing tricky procession.

Hmm, it looks like I messed up the formula yet again.

What I'm trying to figure out is how to select for universes u such that h is responsible for a maximal amount of the total depth. Maybe that's a matter of minimizing D(u/h). But that alone might lead to globe-flattening shallowness.

What if we tried to maximize D(u) - D(u/h)? That's like the opposite of what I originally proposed.
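To see how the candidate objectives in this thread respond to humanity's contribution, here is some arithmetic in Python under the rough additive picture from the previous comment: write D(u/h) ~ D(u) - c, where c (between 0 and D(h)) is the part of the universe's depth that h accounts for. The depth values are made-up numbers, and the additive model is itself only the intuitive approximation described above, not a theorem.

```python
def score(d_u: float, d_h: float, c: float) -> dict[str, float]:
    """Evaluate three candidate objectives for one hypothetical future u.

    Assumes the rough additive model D(u/h) ~ D(u) - c, with 0 <= c <= D(h).
    """
    assert 0 <= c <= d_h <= d_u
    d_u_given_h = d_u - c
    return {
        "f": d_u_given_h / d_u,           # f(u) = D(u/h) / D(u)
        "g": (d_h + d_u_given_h) / d_u,   # g(u) = (D(h) + D(u/h)) / D(u)
        "diff": d_u - d_u_given_h,        # D(u) - D(u/h)
    }

# Hypothetical numbers: same total depth, increasing reliance on h.
for c in (0, 50, 100):
    print(c, score(d_u=1000, d_h=100, c=c))
```

Under this model (and only under it), f and g both shrink as c grows, so they favor futures that make less use of h, while D(u) - D(u/h) grows with c.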

I'm still confused as to what D(u/h) means. It looks like it should refer to the number of logical steps you need to predict the state of the universe - exactly, or up to a certain precision - given only knowledge of human history up to a certain point. But then any event you can't predict without further information, such as the AI killing everyone using some astronomical phenomenon we didn't include in the definition of "human history", would have infinite or undefined D(u/h).


May we live in interesting times?

Could you elaborate on why you think 'deep' futures are the ones most reflective of human will?

I can try. This is new thinking for me, so tell me if this isn't convincing.

If a future is deep with respect to human progress so far, but not as deep with respect to all possible incompressible origins, then we are selecting for futures that in a sense make use of the computational gains of humanity.

These computational gains include such unique things as:

• human DNA, which encodes our biological interests relative to the global ecosystem.

• details, at unspecified depth, about the psychologies of human beings

• political structures, sociological structures, etc.

I've left very unspecified what aspects of humanity should constitute the h term, but my point is that by including them, to the extent that they represent the computationally costly process of biological and cultural evolution, they will be a precious endowment of high D(u/h_t) / D(u) futures. So at the very least they will be preserved in the ongoing computational dynamism.

Further, the kinds of computations that would increase that ratio are the sorts of things that would be like the continuation of human history in a non-catastrophic way. To be concrete, consider the implementation that runs a lot of Monte Carlo simulations of human history from now on, with differences in the starting conditions based on the granularity of the h term and with simulations of exogenous shocks. Cases where large sections of humanity have been wiped out or had no impact would be less desirable than those in which the full complexity of human experience was taken up and expanded on.

A third argument is that something like coherent extrapolated volition or indirect normativity is exactly the kind of thing that is favored by depth with respect to humanity but not absolute depth. That's a fairly weak claim but one that I think could motivate friendly amendments to the original function.

Lastly, I am drawing on some other ethical theory here which is out of scope of this post. My own view is shaped heavily by Simone de Beauvoir's The Ethics of Ambiguity, whose text can be found here:

http://www.marxists.org/reference/subject/ethics/de-beauvoir/ambiguity/

I think the function I've proposed is a better expression of existentialist ethics than consequentialist ethics.

Further, the kinds of computations that would increase that ratio are the sorts of things that would be like the continuation of human history in a non-catastrophic way.

This is not obvious to me. I concur with Manfred's point that "any solution that doesn't have very good evidence that it will satisfy human values, will very likely not do so (small target in a big space)."

To be concrete, consider the implementation that runs a lot of Monte Carlo simulations of human history from now on, with differences in the starting conditions based on the granularity of the h term and with simulations of exogenous shocks.

Why couldn't they just scan everyone's brain then store the information in a big hard drive in a maximum-security facility while the robots wipe every living person out and start anew? Perhaps it's possible that by doing that you vastly increase resilience to exogenous shocks, making it preferable. And about 'using the computational gains of humanity', that could just as easily be achieved by doing the opposite of what humans would have done.

Non-catastrophic with respect to existence, not with respect to "human values." I'm leaving values out of the equation for now, focusing only on the problem of existence. If species suicide is on the table as something that might be what our morality ultimately points to, then this whole formulation of the problem has way deeper issues.

My point is that starting anew without taking into account the computational gains, you are increasing D(u) efficiently and D(u/h) inefficiently, which is not favored by the objective function.

If there's something that makes humanity very resilient to exogenous shocks until some later time, that seems roughly analogous to cryogenic freezing of the ill until future cures are developed. I think that still qualifies as maintaining human existence.

Doing the opposite of what humans would have done is interesting. I hadn't thought of that.