Depth-based supercontroller objectives, take 2

[-]jimrandomh11y100

The physicists' definition of entropy is misaligned with the intuitive definition, because it is affected massively more by micro-scale things like temperature and mixing than by macro-scale things like objects and people. This tends to trip people up when they try to take it out of chemistry and physics and bring it anywhere else.

When I look at your function g(u), I notice that it has a very similar flavor. While I'm having a very hard time interpreting what it will actually end up optimizing, my intuition is that it will end up dominated by some irrelevant micro-scale property like temperature. That's the outside view.

On the inside view, I see a few more reasons to worry. This universe's physics is very computationally inefficient, so the shortest computational path from any state A to some other state B will almost certainly bypass it somehow. Furthermore, human brains are very computationally inefficient, so I would expect the shortest computational path to bypass them, too. I don't know what that computational shortcut might be, but I wouldn't expect to like it.

Exploring the properties of logical depth might be interesting, but I don't expect a good utility function out of it.

[-]sbenthall11y10

Your point about physical entropy is noted and a good one.

One reason to think that something like D(u/h) would pick out higher level features of reality is that h encodes those higher-level features. It may be possible to run a simulation of humanity on more efficient physical architecture. But unless that simulation is very close to what we've already got, it won't be selected by g.

You make an interesting point about the inefficiency of physics. I'm not sure what you mean by that exactly, and am not in a position of expertise to say otherwise. However, I think there is a way to get around this problem. Like Kolmogorov complexity, depth has another hidden term in it: the specification of the universal Turing machine that is used, concretely, to measure the depth and size of strings. If depth is defined in terms of a universal machine that is a physics simulator, then there wouldn't be a way to "bypass" physics computationally. That would entail being able to build a computer, within our physics, that would be more efficient than our physics. Tell me if that's not impossible.

Re: brains, I'm suggesting that we encode whatever we think is important about brains in the h term. If brains execute a computational process, then that process will be preserved somehow. It need not be preserved on grey matter exactly. Those brains could be uploaded onto more efficient architecture.

I appreciate your intuitions on this but this function is designed rather specifically to challenge them.

[-]jimrandomh11y90

I finally figured out what this does.

It takes h, applies an iterated hashing/key-stretching style function to it, and tiles the universe with the result.

Sorry.

[-]cousin_it11y20

Yeah, something like that. "Make the state of the universe such that it's much easier to compute knowing h than without h" doesn't mean that the computation will use any interesting features of h, it could just be key-stretching.
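For readers unfamiliar with the term, here is a minimal sketch of what "key-stretching" means in this argument. It assumes SHA-256 as the hash and a hypothetical helper name `stretch`; neither comes from the original post. The output is cheap to compute if you start from h (just run the loop) but, as far as anyone knows, has no shortcut without h, while using nothing interesting about h's contents.

```python
import hashlib


def stretch(h: bytes, rounds: int) -> bytes:
    """Apply SHA-256 to `h` repeatedly, `rounds` times in sequence.

    Each round depends on the previous one, so the work cannot be
    parallelised; the cost grows linearly with `rounds`.
    """
    digest = h
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


if __name__ == "__main__":
    # `h` here is a stand-in byte string, not a real encoding of humanity.
    h = b"placeholder description of humanity"
    print(stretch(h, 100_000).hex())
```

Note that the loop treats h as an opaque blob: any other byte string of the same length would serve equally well, which is the worry being raised.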

[-]sbenthall11y10

Could you flesh this out? I'm not familiar with key-stretching.

A pretty critical point is whether or not the hashed value is algorithmically random. The depth measure has the advantage of picking over all permissible starting conditions without having to run through each one. So it's not exactly analogous to a brute force attack. So for the moment I'm not convinced on this argument.

[-]sbenthall11y20

Maybe. Can you provide an argument for that?

As stated, that wouldn't maximize g, since applying the hash function once and tiling would cap the universe at finite depth. Tiling doesn't make any sense.

[-][anonymous]11y30

I don't think it's literally tiling. More hash stretching all the way.

[-][anonymous]11y20

I had a hazy sense of that direction of thing being the most likely actual result. Thanks for putting your finger on it for me.

[-]Randaly11y40

Note: I may have badly misunderstood this, as I am not familiar with the notion of logical depth. Sorry if I have!

I found this post's arguments to be much more comprehensible than your previous ones; thanks so much for taking the time to rewrite them. With that said, I see three problems:

1) '-D(u/h)' optimizes for human understanding of (or, more precisely, human information of) the universe, such that given humans you can efficiently get out a description of the rest of the universe. This also ensures that whatever h is defined as continues to exist. But many (indeed, even almost all) human values aren't about entanglement with the universe. Because h isn't defined explicitly, it's tough for me to state a concrete scenario where this goes wrong. (This isn't a criticism of the definition of h, I agree with your decision not to try to tightly specify it.) But, e.g. it's easy to imagine that humans having any degree of freedom would be inefficient, so people would end up drug-addled, in pods, with videos and audio playing continuously to put lots of carefully selected information into the humans. This strikes me as a poor outcome.

2) Some people (e.g. David Pearce (?) or MTGandP) argue that the best possible outcome is essentially tiled- that rather than have large and complicated beings human-scale or larger, it would be better to have huge numbers of micro-scale happy beings. I disagree, but I'm not absolutely certain, and I don't think we can rule out this scenario without explicitly or implicitly engaging with it.

3) As I understand it, in 3.1 you state that you aren't claiming that g is an optimal objective function, just that it leaves humans alive. But in this case 'h', which was not ever explicitly defined, is doing almost all of the work: g is guaranteed to preserve 'h', which you verbally identified with the physical state of humanity. But because you haven't offered a completely precise definition of humanity here, what the function as described above would preserve is 'a representation of the physical state of humanity including its biological makeup--DNA and neural architecture--as well as its cultural and technological accomplishments'. This doesn't strike me as a significant improvement from simply directly programming in that humans should survive, for whatever definition of humans/humanity selected; while it leaves the supercontroller with different incentives, in neither scenario are said incentives aligned with human morality.

(My intuition regarding g* is even less reliable than my intuition regarding g; but I think all 3 points above still apply.)

[-]sbenthall11y30

Thanks for your thoughtful response. I'm glad that I've been more comprehensible this time. Let me see if I can address the problems you raise:

1) Point taken that human freedom is important. In the background of my argument is a theory that human freedom has to do with the endogeneity of our own computational process. So, my intuitions about the role of efficiency and freedom are different from yours. One way of describing what I'm doing is trying to come up with a function that a supercontroller would use if it were to try to maximize human freedom. The idea is that choices humans make are some of the most computationally complex things they do, and so the representations created by choices are deeper than others. I realize now I haven't said any of that explicitly let alone argued for it. Perhaps that's something I should try to bring up in another post.

2) I also disagree with the morality of this outcome. But I suppose that would be taken as beside the point. Let me see if I understand the argument correctly: if the most ethical outcome is in fact something very simple or low-depth, then this supercontroller wouldn't be able to hit that mark? I think this is a problem whenever morality (CEV, say) is a process that halts.

I wonder if there is a way to modify what I've proposed to select for moral processes as opposed to other generic computational processes.

3) A couple responses:

Oh, if you can just program in "keep humanity alive" then that's pretty simple and maybe this whole derivation is unnecessary. But I'm concerned about the feasibility of formally specifying what is essential about humanity. VAuroch has commented that he thinks that coming up with the specification is the hard part. I'm trying to defer the problem to a simpler one of just describing everything we can think of that might be relevant. So, it's meant to be an improvement over programming in "keep humanity alive" in terms of its feasibility, since it doesn't require solving perhaps impossible problems of understanding human essence.
Is it the consensus of this community that finding an objective function in E is an easy problem? I got the sense from Bostrom's book talk that existential catastrophe was on the table as a real possibility.

I encourage you to read the original Bennett paper if this interests you. I think your intuitions are on point and appreciate your feedback.

[-]Randaly11y30

Thanks for your response!

1) Hmmm. OK, this is pretty counter-intuitive to me.

2) I'm not totally sure what you mean here. But, to give a concrete example, suppose that the most moral thing to do would be to tile the universe with very happy kittens (or something). CEV, as I understand it, would create as many of these as possible with its finite resources, whereas g/g* would try to create much more complicated structures than kittens.

3) Sorry, I don't think I was very clear. To clarify: once you've specified h, a superset of human essence, why would you apply the particular functions g/g* to h? Why not just directly program in 'do not let h cease to exist'? g/g* do get around the problem of specifying 'cease to exist', but this seems pretty insignificant compared to the difficulty of specifying h. And unlike with programming a supercontroller to preserve an entire superset of human essence, g/g* might wind up with the supercontroller focused on some parts of h that are not part of the human essence, so it doesn't completely solve the definition of 'cease to exist'.

(You said above that h is an improvement because it is a superset of human essence. But we can equally program a supercontroller not to let a superset of human essence cease to exist, once we've specified said superset.)

[+]RPMcMurphy11y-90

[+][anonymous]11y-60

[-]Toggle11y40

I enjoyed both this and the previous post. Not the usual computational fare around here, and it's fun to play with new frameworks. I upvoted particularly for incorporating feedback and engaging with objections.

I have a couple of ways in which I'd like to challenge your ideas.

If I'm not mistaken, there are two routes to take in maximizing g. Either you can minimize D(u/h), or you can just drive D(u) through the roof and not damage h too badly. Intuitively, the latter seems to give you a better payoff per joule invested. Let's say that our supercontroller grabs a population of humans, puts them in stasis pods of some kind, and then goes about maximizing entropy by superheating the moon. This is a machine that has done a pretty good job of increasing g(u). As long as the supercontroller is careful to keep D(u/h) from approaching D(u), it can easily ignore that term without negotiating the complexity of human civilization or even human consciousness. That said, I clearly don't understand relative logical depth very well, so maybe D(u/h) approaches D(u) in the case that D(u) increases as h is held constant?

Another very crucial step here is in the definition of humanity, and which processes count as human ones. I'm going to assume that everyone here is a member in good standing of Team Reductionism, so this is not a trivial task. It is called transhumanism, after all, and you are more than willing to abstract away from the fleshy bits when you define 'human'. So what do you keep? It seems plausible, even likely, that we will not be able to define 'humanity' with a precision that satisfies our intuitions until we already have the capacity to create a supercontroller. In this sense your suggestion is hiding the problem it attempts to solve: how to define our values with sufficient rigor that our machines can understand them.

[-]sbenthall11y20

Thanks for your encouraging comments. They are much appreciated! I was concerned that following the last post with an improvement on it would be seen as redundant, so I'm glad that this process has your approval.

Regarding your first point:

Entropy is not depth. If you do something that increases entropy, then you actually reduce depth, because it is easier to get to what you have from an incompressible starting representation. In particular, the incompressible representation that matches the high-entropy representation you have created. So if you hold humanity steady and superheat the moon, you more or less just keep things at D(u) = D(h), with low D(u/h).
You can do better if you freeze humanity and then create fractal grey goo, which is still in the spirit of your objection. Then you have high D(u), D(u/h) is something like D(u) - D(h) except for when the fractal starts to reproduce human patterns out of the sheer vigor of its complexity, in which case I guess D(u/h) would begin to drop...though I'm not sure. This may require a more thorough look at the mathematics. What do you think?
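A toy illustration of the "entropy is not depth" distinction, offered as a sketch under loud assumptions rather than a measurement of Bennett's logical depth (which is uncomputable). Here zlib's compressed size stands in, very crudely, for description length, and a count of sequential hash steps stands in for depth; the names `compressed_size` and `hash_chain` are hypothetical helpers, not anything from the post.

```python
import hashlib
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Crude proxy for description length: size after zlib compression."""
    return len(zlib.compress(data, 9))


def hash_chain(seed: bytes, rounds: int) -> tuple[bytes, int]:
    """Iterate SHA-256 `rounds` times; return the result and the step count.

    The step count is a stand-in for depth: the description (seed, rounds)
    is short, but reaching the output takes `rounds` sequential steps.
    """
    digest, steps = seed, 0
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
        steps += 1
    return digest, steps


if __name__ == "__main__":
    ordered = bytes(4096)      # all zeros: short description, shallow
    noise = os.urandom(4096)   # "superheated moon": long description, shallow
    deep, steps = hash_chain(b"s", 50_000)  # short description, many steps

    print("ordered compresses to", compressed_size(ordered), "bytes")
    print("noise compresses to  ", compressed_size(noise), "bytes")
    print("chain output took    ", steps, "sequential steps from a 1-byte seed")
```

The noise is incompressible, but producing it from its own (equally long) description takes essentially no work, which is the sense in which raising entropy does not raise depth.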

Regarding your second point...

Strictly speaking, I'm not requiring that h abstract away the fleshy bits and capture what is essentially human or transhuman. I am trying to make the objective function agnostic to these questions. Rather, h can include fleshy bits and all. What's important is that it includes at least what is valuable, and that can be done by including anything that might be valuable. The needle in the haystack can be discovered later, if it's there at all. Personally, I'm not a transhumanist. I'm an existentialist; I believe our existence precedes our essence.

That said I think this is a clever point with substance to it. I am, in fact, trying to shift our problem-solving attention to other problems. However, I am trying to turn attention to more tractable and practical questions.

One simple one is: how can we make better libraries for capturing human existence, so that a supercontroller could make use of as much data as possible as it proceeds?

Another is: given that the proposed objective function is in fact impossible to compute, but (if the argument is ultimately successful) also given that it points in the right direction, what kinds of processes/architectures/algorithms would approximate a g-maximizing supercontroller? Since we have time to steer in the right direction now, how should we go about it?

My real agenda is that I think that there are a lot of pressing practical questions regarding machine intelligence and its role in the world, and that the "superintelligence" problem is a distraction except that it can provide clearer guidelines of how we should be acting now.

[-]MrMind11y30

Culture, thought, human DNA, human values, etc. have been stripped to their functional carbon and hydrogen atoms and everything now just optimizes for paperclip manufacturing or whatever. D(u/r) = D(u)

I contest this derivation. Whatever process produced humanity also made it so that humanity produced an unsafe supercontroller. This may mean that whatever the supercontroller is optimized for is part of the process that produced humanity, and so it does not make g(u,h) go to zero.

Of course, without a concrete model, it's impossible to say for certain.

[-]sbenthall11y30

So, the key issue is whether or not the representations produced by the paperclip optimizer could have been produced by other processes. If there is another process that produces the paperclip-optimized representations more efficiently than going through the process of humanity, then that process dominates the calculation of D(r).

In other words, for this objection to make sense, it's not enough for humanity to have been sufficient for the R scenario. It must be necessary for producing R, or at least necessary to result in it in the most efficient possible way.

What are your criteria for a more concrete model than what has been provided?

[+]RPMcMurphy11y-90

[-]VAuroch11y20

Addressing part of the assumptions: While it's assumed that a superintelligence has access to Enough Resources, or at least enough to construct more for itself and thus proceed rapidly toward a state of Enough Resources, the programmers of the superintelligence do not. This is very important when you consider that h needs to be present as input to the superintelligence before it can take action. So the programmers must provide something that compresses to h at startup. And that's a very difficult problem; if we could correctly determine what-all was needed for a full specification of humanity, we'd be a substantial way toward solving the complexity of value problem. So even if this argument works (and I don't think I trust it), it still wouldn't deal with the problem adequately.

[-]sbenthall11y10

I see, that's interesting. So you are saying that while the problem as scoped in §2 may take a function of arbitrary complexity, there is a constraint in the superintelligence problem I have missed, which is that the complexity of the objective function has certain computational limits.

I think this is only as extreme a problem as you say in a hard takeoff situation. In a slower takeoff situation, inaccuracies due to missing information could be corrected on-line as computational capacity grows. This is roughly business-as-usual for humanity---powerful entities direct the world according to their current best theories; these are sometimes corrected.

It's interesting that you are arguing that if we knew what information to include in a full specification of humanity, we'd be making substantial progress towards the value problem. In §3.2 I argued that the value problem need only be solved with a subset of the full specification of humanity. The fullness of that specification was desirable just because it makes it less likely that we'll be missing the parts that are important to value.

If, on the other hand, you are right and the full specification of humanity is important to solving the value problem--something I'm secretly very sympathetic to--then

(a) we need a supercomputer capable of processing the full specification in order to solve the value problem, so unless there is an iterative solution here the problem is futile and we should just accept that The End Is Nigh, or else try, as I've done, to get something Close Enough and hope for slow takeoff, and

(b) the solution to the value problem is going to be somewhere down the computational path from h and is exactly the sort of thing that would be covered in the scope of g*.

It would be a very nice result, I think, if the indirect normativity problem or CEV or whatever could be expressed in terms of the depth of computational paths from the present state of humanity, for precisely this reason. I don't think I've hit that yet exactly but it's roughly what I'm going for. I think it may hinge on whether the solution to the value problem is something that involves a halting process, or whether really it's just to ensure the continuation of human life (i.e. as a computational process). In the latter case, I think the solution is very close to what I've been proposing.

[-]VAuroch11y10

While I would agree that not all portions of h are needed to solve the value problem, I think it's very plausible that it would take all of h to be certain that you'd solved the value problem. As in, you couldn't know that you had included everything important unless you knew that you had everything unimportant as well.

Also, I don't think I'm sympathetic to the idea that a slow takeoff buys you time to correct things. How would you check for inaccuracies? You don't have a less-flawed version to compare things to; if you did, you'd be using that version. Some inaccuracies will be large and obvious, but that's rarely, if ever, going to catch the kinds of errors that lead to hyperexistential catastrophe, and will miss many existential catastrophes.

[-][anonymous]11y-20

Close Enough and hope for slow takeoff, and

On the one hand "close enough" is adequate for horseshoes, but probably not good enough for THE FATE OF THE UNIVERSE (grabs algebra nerd by lapels and shakes vigorously)

On the other hand, supergeniuses like Ben Goertzel have suggested that a takeoff might follow a "semi-hard" trajectory. While others have suggested "firm takeoff" (Voss), and even "tumescent takeoff"

Like most of humanity, I'll start getting concerned when the computers finally beat us in chess. (...off camera whispering)

On a more serious note, the superhuman AI that polices this site just had a most unwelcome message for me: You are trying to eject messages too fast. Rehydrate and try again in 3 minutes. The machines! ...they're takin' over! They're already layin' down the law!

[This comment is no longer endorsed by its author]

[-][anonymous]11y-40

I'm not sure we can "algebra" our way out of this dilemma. I think that we need to sit up and take notice that "liberal democracy" ("libertarian democracy," since I'm using the term "liberal" like Hayek did, and not like the conscious movement to hijack the term) dramatically outperforms state collectivist totalitarianism. During the time period when the USA was less totalitarian and had more of the features of "liberal democracy" than what we currently do, or did at the country's beginning, we performed better (more stated happiness, more equality under the law, more wealth generation, more immigrants revealing a preference for living here, etc.).

So why mention governments in a discussion of intelligent control? Because governments are how we choose to govern (or destroy), at the largest human scale. As such, they represent the best large-scale systems humans can set up, in accordance with human nature.

So, superhuman synthetic intelligence should build upon that. How? Well, we should make superhumanly-intelligent "classical liberals" that are fully equipped with mirror neurons. There should be many of them, and they should be designed to (1) protect their lives using the minimum force necessary to do so (2) argue about what course of action is best once their own lives have been preserved. If they possess mirror neurons and exposure to such thinkers as Hayek, it won't be hard to prevent them from destroying the world and all humans ---they will have a natural predisposition toward protecting and expanding life and luxury.

The true danger is that thinkers from LessWrong mistakenly believe they have designed reasonably intelligent FAI, and they build only ONE.

Lack of competition consolidates power, and with it, tendencies toward corruption.

I don't know if Bayes and Algebra mastery can teach a human being this lesson. Perhaps one needs to read "Lord of the Rings" or something similar, and perhaps algebra masters need to read something that causes all other variables to be multiplied by decimal percentages below the teens, and the "absolute power corrupts absolutely" variable needs to be ranked high and multiplied by 1.

There is wisdom in crowds (of empaths, using language alone). That said: Humans developed as societies with statistical distributions of majority empath conformists to minority "pure" sociopaths. Technology changes that, and rather suddenly. Dissenters can be found out and eliminated or discredited. Co-conspirators can be given power and prestige. Offices can be protected, and conformists can be catered to. Critics can be bought off.

It's a big world, with a lot of scary things that never get mentioned on LessWrong. My feeling is: there is no "one size fits all" smartest being.

Every John Galt you create is likely to be very imperfect in some way. No matter how general his knowledge. Even with the figures Kurzweil uses, he could be a smart Randian objectivist spacecraft designer, or a smart Hayekian liberal gardener, and even with all of human knowledge at its fingertips that wouldn't account for character, preference, or what details the synthetic mind chose to master. It might master spacecraft building, but spend all its time designing newer and more complex gardens, restaurants, and meals.

Emergence is messy. Thing clusters are messy. ...And hierarchical.

A superintelligence will likely derive its highest values the way we do: by similar "goal networks" in the same general direction "outvoting" one another (or, "hookers and blow" may intervene as a system-crashing external stimulus).

In any case, I'd rather have several such brains, rather than only one.

[This comment is no longer endorsed by its author]
