# Intuitive supergoal uncertainty

7 min read · 4th Dec 2009 · 27 comments


Personal Blog

There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense. What causes this intuition? For this topic I need a way to pick out one’s top-level goals, roughly one’s context-insensitive utility function rather than some task-specific one, without implying that those top-level goals can actually be written as a utility function. Following Eliezer’s CFAI paper, I therefore use the word “supergoal” (sorry, Eliezer, but I am fond of that old document and its tendency to coin new vocabulary). In what follows, I will naturalistically explore the intuition of supergoal uncertainty.

To posit a model: goal uncertainty (with supergoal uncertainty as one instance) means having a weighted distribution over a set of possible goals, together with a mechanism by which that weight can be redistributed. If we take away the weights, how can we choose actions coherently? How can we compare them? If we take away the redistribution mechanism, we are left with a single goal whose utility for each state is the weighted sum of the constituent goals’ utilities. The redistribution mechanism is therefore what makes goal uncertainty a distinct concept.
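A minimal sketch of this model follows. Purely for illustration, it assumes that each candidate goal can be written as a utility table over a finite set of outcomes, which is something the post explicitly avoids committing to. The goal names, the outcomes, and the multiplicative reweighting rule in `redistribute` are hypothetical stand-ins for whatever the real redistribution mechanism is.

```python
# Hypothetical toy: two candidate goals, each a utility table over two outcomes.
GOALS = {
    "paperclips": {"make_clips": 1.0, "write_poetry": 0.0},
    "beauty": {"make_clips": 0.0, "write_poetry": 1.0},
}
OUTCOMES = ["make_clips", "write_poetry"]


def mixed_utility(weights, outcome):
    """Weighted sum of each candidate goal's utility for an outcome.

    While the weights stay fixed, this is just one ordinary utility function.
    """
    return sum(w * GOALS[goal][outcome] for goal, w in weights.items())


def choose(weights, outcomes=OUTCOMES):
    """Pick the outcome with the highest mixed utility."""
    return max(outcomes, key=lambda o: mixed_utility(weights, o))


def redistribute(weights, fit):
    """Hypothetical redistribution mechanism: scale each goal's weight by how
    well it fits new evidence (e.g. introspection), then renormalize."""
    scaled = {goal: w * fit[goal] for goal, w in weights.items()}
    total = sum(scaled.values())
    return {goal: w / total for goal, w in scaled.items()}


weights = {"paperclips": 0.6, "beauty": 0.4}
print(choose(weights))
weights = redistribute(weights, {"paperclips": 0.2, "beauty": 0.9})
print(weights)
print(choose(weights))
```

With the weights frozen, `mixed_utility` is one fixed utility function, which is the collapse described above. Only `redistribute` lets the choice flip from `make_clips` (weights 0.6/0.4) to `write_poetry` (weights roughly 0.25/0.75).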

• Part of the intuition of supergoal uncertainty naturally follows from goal and planning uncertainty. What plan of action is best? How should I construct a local utility function for this context? The weight redistribution mechanism is then a result of gathering more evidence, calculating further, and seeing how this goal links up to one’s supergoals in the context of other plans.
• It could be we are mistaken. The rules society sets up to coordinate behavior carry the default assumption that there is an absolute standard by which to judge behavior good or bad. Religions also often dictate that there is a moral absolute, and even if we aren’t religious, the cultural milieu makes us consider the possibility that the concept “should” (and so the existence of a weight redistribution mechanism) can be validly applied to supergoals.
• It could be we are confused. Our professed supergoal does not necessarily equal our actual supergoal, but neither are the two completely separate. So when we review our past behavior and introspect to determine what our supergoal is, we get conflicting evidence that is hard to reconcile with the belief that we have one simple supergoal. Nor can we necessarily endorse the observed supergoal, for social reasons. The difficulty in describing the supergoal is then represented as a weighting over possible supergoals, and the weight redistribution mechanism corresponds to updating our self-model given additional observations and introspection, as well as changes in social context.
• It could be I’m confused :) People, including some part of me, may wish to present the illusion that we do not know what our supergoals truly are, and also to pretend, for game-theoretic reasons, that those supergoals can be changed by argument.
• It could be we are incoherent. Allais’s paradox, hyperbolic discounting, and circular preferences show that no utility function can be defined for people (at least not in any simple way). How, then, can we approximate a person’s behavior with a utility function or supergoal? Using a weight distribution and updating (along with some additional interpretive machinery) is one plausible option, though an admittedly ugly one. Perhaps supergoal uncertainty is a kludge to describe this incoherent behavior. Our environments, social and physical, enforce consistency constraints on us, pushing us toward being expectation maximizers within isolated contexts. Could something like weighting by the probability of encountering each of those contexts define our individual supergoals? Ugly, ugly, ugly.
• It could be we predict our supergoals will change with time. Who said people have stable goals? Just compare children with adults, or look at the changes people undergo when they gain status or have children. Perhaps the uncertainty concerns what we predict our future supergoals will be in the face of future circumstances and arguments.
• It could be we discover our supergoals, and are uncertain about what we will discover and what we would arrive at in the limit of our exploration. At one point I had rather limited exposure to different kinds of food, but now I find I like exploring taste space. At one point I didn’t know computer science, but now I enjoy its beauty. At one point I hadn’t yet pursued women, but now I find it quite enjoyable. Some things we apparently just have to try (or at the very least think about) to discover whether we like them.
• It could be we cannot completely separate our anticipations from our goals. If our anticipations update slowly, are systematically off, and are coherent in their effect on reality, then it is easy to mistake the interaction of flawed anticipations with a supergoal for an entirely different supergoal.
• It could be we are uncertain about how to define our very selves. Depending on whether your self-definition excludes irrational behavior, selfishness, or System 1, or includes the Google overmind, “your” goals are going to look quite different. It is also possible that your utility function doesn’t depend on self-definition, or that you are “by definition” your utility function, in which case this question is moot.
• It could be that environmental constraints cause some supergoals to express themselves equivalently to other supergoals. Perhaps your supergoal could be forever deferred in order to gain the capability to achieve it ever better (likely a rare situation). Perhaps big-world anthropic negotiation arguments mean you must always distort the achievement of your supergoal. Perhaps the metagolden rule is in effect and “social” conditions force you to constrain your behavior.
• It could be that there really is a way to decide between supergoals (unlikely but still conceivable), and we don’t yet know where that decision process will take us. There could even be a meaning of life after all (i.e. a universally convincing supergoal, given some intelligence preconditions).
• It could be caused by evidential and logical uncertainty. A mind is made of many parts, and there are limits on how much each part can know about the others, or the whole about its components. To show how this implies a form of supergoal uncertainty, partition a mind into functional components. Each component has its own function; it may not be able to achieve that function without the rest, but the function is nevertheless there. Now, if the optimization power embedded in a component is large enough, and the system as a whole has evidential or logical uncertainty about how that component will work, then that functional subcomponent may “optimize” its way toward greater weight in the decision process, and this proceeds hierarchically for all subcomponent optimizers. So, in essence, whenever there is evidential or logical uncertainty about the operation of an optimizing subcomponent, we get a supergoal term corresponding to that part, and the weight redistribution mechanism corresponds to that subcomponent co-opting some weight. Perhaps this concept can even be extended to define supergoals (with uncertainty) for everything from pure expectation maximizers to rocks.
• It could be uncertainty over how to ground the definition of the supergoal in reality. If I want to maximize paperclips and have just learned quantum mechanics, do I count a paperclip in a superposition of states once or many times? If I could produce either of two infinite collections of paperclips, how do I choose between them? If my utility function is unbounded in both the positive and negative directions and I do Solomonoff induction, how can I make decisions at all, given that actions may have undefined expected values?
• It could be that some mysterious underlying factor makes the formal concept of a supergoal inappropriate, and it is this mismatch that causes the uncertainty and weight redistribution. The unification of priors and values in updateless decision theory? Something else? The universe is still unknown enough that we could be mistaken at this level.
• It could be something else entirely.
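The circular-preference point in the list above can be made concrete. For a finite list of strict pairwise preferences, some utility function agreeing with every one of them exists exactly when the preferences contain no cycle, and a topological sort finds such a function when one exists. The sketch below is a standard graph construction, not something from the post; the function name is hypothetical, and it covers only circular preferences. Allais-style and hyperbolic-discounting inconsistencies would need a different treatment.

```python
from collections import defaultdict, deque


def utility_from_preferences(prefs):
    """Given strict preferences as (better, worse) pairs, return a utility
    dict with u[better] > u[worse] for every pair, or None if the
    preferences contain a cycle (so no such utility function exists)."""
    items = {x for pair in prefs for x in pair}
    worse_than = defaultdict(list)    # item -> items it is preferred to
    indegree = {x: 0 for x in items}  # how many items are preferred to it
    for better, worse in prefs:
        worse_than[better].append(worse)
        indegree[worse] += 1

    # Kahn's algorithm: repeatedly take an item that nothing remaining beats.
    queue = deque(sorted(x for x in items if indegree[x] == 0))
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in worse_than[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                queue.append(y)

    if len(order) < len(items):
        return None  # some items sit on a preference cycle

    # Earlier in the order means more preferred, so it gets higher utility.
    n = len(order)
    return {x: n - i for i, x in enumerate(order)}


print(utility_from_preferences([("tea", "coffee"), ("coffee", "juice")]))
print(utility_from_preferences([("tea", "coffee"), ("coffee", "juice"), ("juice", "tea")]))
```

The first call returns utilities ordering tea above coffee above juice. The second returns `None`, because tea ≻ coffee ≻ juice ≻ tea cannot be matched by any assignment of numbers.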

(ps I may soon post and explore the effects of supergoal uncertainty in its various reifications on making decisions. For instance, what implications, if any, does it have on bounded utility functions (and actions that depend on those bounds) and negative utilitarianism (or symmetrically positive utilitarianism)? Also, if anyone knows of related literature I would be happy to check it out.)

(pps Dang, the concept of supergoal uncertainty is surprisingly beautiful and fun to explore, and I now have a vague wisp of an idea of how to integrate a subset of these with TDT/UDT)



It's a total digression from this post, but it occurs to me that someone ought to try to figure out what the "supergoal" or utility function of C. elegans is, or what the coherent extrapolated volition of the C. elegans species might be. That organism's nervous system has been mapped down to every last neuron (not so hard, since there are only about 300 of them). If we can't make a C. elegans-Friendly AI given that information, we certainly can't do it for H. sapiens.

My understanding is that we have a connection map but have not successfully simulated the behavior.

> If we can't make a C. elegans-Friendly AI given that information, we certainly can't do it for H. sapiens.

I like the suggestion you make. (But) I would perhaps fall just short of certainty. It is not unreasonable to suppose that a supergoal or utility function is something that was evolved alongside higher level adaptations like, say, an executive function and goal directed behaviour. C. elegans just wouldn't get much benefit from having a supergoal encoded in its nervous system.

Looking at the difficulty of creating a C. elegans-FAI would highlight one of the difficulties with FAI in general. There is the inevitable and somewhat arbitrary decision of just how much weight we want to give to the implicit goals of humanity. The line between terminal and instrumental values is somewhat dependent on one's perspective.

In a nutshell, it's to make more copies of the C. elegans genome:

http://en.wikipedia.org/wiki/God's_utility_function

> There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense.

> In what follows, I will naturalistically explore the intuition of supergoal uncertainty.

These are entirely too representative of this post. I admit it's possible I lack adequate background, but this post seems incredibly dense and convoluted. I literally do not know what you're talking about, and I have enough external evidence of my reading comprehension to conclude that it's significantly the author's fault. The idea may be clear in your mind, but you need to spell it out in clear and simple terms if you want others to follow you. Defining "supergoal uncertainty" would be a necessary step, though it would still be well short of sufficient.

There also appear to be outright misuses of vocabulary, unless there are technical meanings I am unaware of. E.g. "I may soon post and explore the effects of supergoal uncertainty in its various reifications on making decisions."

Not even the most obscure continental philosophy gets away with using 'reify' that way.

Still, it looks like there might be some interesting ideas somewhere in there.

Addressing your reification point:

> "By means of reification something that was previously implicit, unexpressed and possibly unexpressible is explicitly formulated and made available to conceptual (logical or computational) manipulation." (Wikipedia, "Reification (computer science)")

I don't think I did abuse vocabulary outside of possibly generalizing meanings in straightforward ways and taking words and meanings common in one topic and using them in a context where they are rather uncommon (e.g. computer science to philosophy). I rely on context to refine and imbue words with meaning instead of focusing on dictionary definitions (to me all sentences take the form of puzzles and words are the pieces; I've written more words in proofs than in all other contexts combined). I will try to pay more attention to context invariant meanings in the future. Thanks for the criticism.

Hmm, darn. When I write, I tend to see the ideas I meant to describe instead of my actual exposition. I don't like proofreading my writing until I've had some time to forget the details; otherwise I read right over my errors unless I pay special attention.

I did have three LWers look over the article before I posted it, and the general criticism was that it was a bit obscure and dense but understandable and interesting. I was probably too ambitious in trying to fit everything into one post, though: a length vs. clarity tradeoff.

To address your points:

Have you not felt or encountered people who have the opinion that our life goals may be uncertain, something to have opinions about, and are valid targets for argument? Also, is not uncertainty of our most fundamental goals something we must consider and evaluate (explicitly or implicitly) in order to verify that an artificial intelligence is provably Friendly?

Elaborating on the second statement: when I used "naturalistically" I meant to invoke the idea that my exploration was similar to classifying animals before we had taxonomies. We look around with our senses (or, in this case, imagination and inference), see what we observe, and lay no claim to systematic search or analysis. Here I did a kind of imagination-limited shallow search without trying to systematically relate the concepts (combinatorial explosion, and I'm not yet sure how to condense and analyze supergoal uncertainty).

As to the third point, what I did in this article is allocate a name "supergoal uncertainty", roughly described it in the first paragraph and hopefully brought up the intuition, and then subsequently considered various definitions of "supergoal uncertainty" following from this intuition.

In retrospect, I probably erred on the clarity versus writing time tradeoff, and was perhaps biased toward getting this uncomfortable writing task (I'm not a natural writer) off my plate so I could do other things.

> this post seems incredibly dense and convoluted. I literally do not know what you're talking about

That was not my experience. I understood everything in the first five paragraphs without having to reflect or even read a second time, except that I did have to reflect for a few minutes on the last sentence of paragraph four. Although I am still less confident that I know what Justin intended there than I am with the other sentences, I am 72% confident I know. I think he meant that even if we are not religious, society tends to pull us into moral realism, even though of course moral realism is an illusion. (Time constraints prevent me from reading the rest now.)

> Defining "supergoal uncertainty" would be a necessary step

Oh, he did that. And the definition was quite clear to me on first reading, but then I have done a lot of math, and a lot of math in which I attempt my own definitions.

> 72% confident

Two sig figs? Really?

Well, I am relatively new at assigning my beliefs numerical probabilities, so if Eliezer or E.T. Jaynes says different, believe them, but here is my reply.

> 72% confident

> Two sig figs? Really?

Note that if I had said .7, that would not mean my probability won't go to .4 or .9 tomorrow. On the other hand, if I say the doo-dad is .7 meters long, I am implying that if I re-measure it tomorrow, the result will be somewhere in the range .65 to .75 (or to .8). In summary, significant figures do not seem a worthwhile way to communicate how much evidence is required to move a probability by a given amount. What I suggest people do instead is communicate somehow the nature of the evidence used to arrive at the number. In this case, I left implied that my evidence comes from squishy introspective considerations. Also, note that Justin will be checking frequently for comments (because it is his post), and he can very easily drive my probability close to 1 or close to 0 with a reply that takes him only 10 seconds to write. That means it does not serve the "vericidal" interests of the community for me to spend more than a few seconds arriving at my numerical probability. I could have mentioned these considerations about the cost of updating my probability, and the implications that cost structure has for how much effort I put into my number, but I considered them so obvious that the reader would take them into account without my having to say anything.

Look: there is a cost to the experimentalist's tradition by which .7 means that tomorrow the number will not change to anything lower than .65 or higher than .75 (or .7999), and that cost is that the only numbers available to the writer are .1 .2 .3 .4 .5 .6 .7 .8 .9. The previous paragraph explains why I consider that cost not worth paying for subjective probabilities.

Jaynes does have something to say on this, which I will summarize thus: you get to (ought to, even) put credible-interval-type bounds on a stated probability (that is, you could have said, e.g., "between 50% and 90%"). The central location of the interval tells us what you now think of your probability (~70%), and the width of the interval tells us how apt your estimate is to move in the face of new evidence.

The above is an approximation; there are lots of refinements. One I will mention right off is that the scheme will break down for probabilities near 0 or 1, because the implied distribution is no longer symmetric around the center of the interval.
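One way to make the interval idea concrete is to model uncertainty about a probability as a Beta distribution with a given mean and a "concentration" that stands in for how much evidence backs the estimate. This is our assumption for illustration, not necessarily the construction Jaynes uses in PT:LOS, and the helper names are hypothetical. With mean 0.72 the central 90% interval sits roughly evenly around the mean. With mean 0.98 it is lopsided: the distance below the mean is much larger than the distance above, which is the breakdown near 0 or 1 mentioned above.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at 0 < x < 1, computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def central_interval(mean, concentration, mass=0.90, steps=20000):
    """Central credible interval for a probability modeled as Beta(a, b),
    with a = mean * concentration and b = (1 - mean) * concentration.

    The CDF is accumulated with the midpoint rule on a uniform grid, so the
    bounds are only accurate to roughly 1 / steps.
    """
    a, b = mean * concentration, (1 - mean) * concentration
    tail = (1 - mass) / 2
    h = 1.0 / steps
    cdf, lo, hi = 0.0, None, None
    for i in range(steps):
        cdf += beta_pdf((i + 0.5) * h, a, b) * h
        x = (i + 1) * h  # right edge of the grid cell just added
        if lo is None and cdf >= tail:
            lo = x
        if hi is None and cdf >= 1 - tail:
            hi = x
    return lo, hi


for mean in (0.72, 0.98):
    lo, hi = central_interval(mean, concentration=50)
    print(f"mean {mean}: [{lo:.3f}, {hi:.3f}]  below {mean - lo:.3f}, above {hi - mean:.3f}")
```

A mean of 0.98 with concentration 50 gives roughly Beta(49, 1), whose CDF is x^49. That makes it easy to check the numerical interval against the exact quantiles.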

Can you give a reference? Because that strikes me as rather un-Jaynesian.

You say that the interval tells us something about how apt the estimate is to move in the face of new evidence. What does it tell us about that? Doesn't it depend on which piece of evidence we're talking about? Do you have to specify a prior over which variables you are likely to observe next?

The material I have in mind is Chapter 18 of PT:LOS. You can see the section headings on page 8 (numbered vii because the title page is unnumbered) here. One of the section titles is "Outer and Inner Robots"; when rhollerith says 72%, he's giving the outer robot answer. To give an account of how unstable your probability estimates are, you need to give the inner robot answer.

> What does it tell us about that? Doesn't it depend on which piece of evidence we're talking about?

When we receive new evidence, we assign a likelihood function for the probability. (We take the perspective of the inner robot reasoning about what the outer robot will say.) The width of the interval for the probability tells us how narrow the likelihood function has to be to shift the center of that interval by a non-negligible amount.
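A sketch of the inner/outer robot point, assuming a conjugate Beta-Bernoulli model purely for illustration (PT:LOS Chapter 18 develops its own formalism rather than this exact setup; the helper names are hypothetical). The two states of knowledge give the same outer-robot answer, 0.7, but differ in width. The same five failed trials move the wide one much further.

```python
from fractions import Fraction


def update(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) belief about a probability becomes
    Beta(a + successes, b + failures) after Bernoulli observations."""
    return a + successes, b + failures


def point_estimate(a, b):
    """The single stated probability: the mean of Beta(a, b), exactly."""
    return Fraction(a, a + b)


wide = (7, 3)      # mean 0.7, little evidence behind it
narrow = (70, 30)  # mean 0.7, ten times as much evidence

for name, (a, b) in [("wide", wide), ("narrow", narrow)]:
    before = point_estimate(a, b)
    after = point_estimate(*update(a, b, successes=0, failures=5))
    print(name, float(before), "->", round(float(after), 3))
```

Here the wide state drops from 0.7 to 7/15 (about 0.47), while the narrow one only drops to 2/3. This is the sense in which the width says how much a given likelihood function can shift the stated number.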

> Do you have to specify a prior over which variables you are likely to observe next?

No.

That is a strange little chapter, but I should note that if you talk about the probability that you will make some future probability estimate, then the distribution of a future probability estimate does make a good way of talking about the instability of a state of knowledge. As opposed to the notion of talking about the probability of a current probability estimate, which sounds much more like you're doing something wrong.
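Following the suggestion to talk about the distribution of a future probability estimate, here is a sketch under the same kind of assumption: a Beta(a, b) state of knowledge about a Bernoulli process, with hypothetical helper names. It computes, exactly, the distribution of the estimate we would report after n more observations. The mean of that distribution always equals the current estimate, while its spread measures how unstable the current state of knowledge is.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(x, y):
    """Beta function B(x, y) for positive integers, as an exact fraction."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))


def future_estimates(a, b, n):
    """Distribution of the posterior mean we will report after n more
    Bernoulli observations, starting from Beta(a, b) (beta-binomial pmf)."""
    dist = {}
    for k in range(n + 1):
        prob = comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)
        estimate = Fraction(a + k, a + b + n)
        dist[estimate] = dist.get(estimate, 0) + prob
    return dist


def mean_and_spread(dist):
    """Mean and variance of a {value: probability} distribution."""
    mean = sum(p * e for e, p in dist.items())
    var = sum(p * (e - mean) ** 2 for e, p in dist.items())
    return mean, var


for a, b in [(7, 3), (70, 30)]:
    mean, var = mean_and_spread(future_estimates(a, b, n=10))
    print((a, b), mean, float(var))
```

Both states report 7/10 today and expect to report 7/10 on average after ten more observations. The weakly held one, however, has a far larger variance in what it will actually end up saying.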

Second the question, it doesn't sound Jaynesian to me either.

> Second the question, it doesn't sound Jaynesian to me either.

I'm relieved that I'm not the only one who thought that. I was somewhat aghast to hear Jaynes recommend something that is so, well, obviously a bull@# hack.

It's curious to me that you'd write this even after I cited chapter and verse. Do you have a copy of PT:LOS?

> It's curious to me that you'd write this even after I cited chapter and verse. Do you have a copy of PT:LOS?

I do have a copy, but I will take your word for it. I am shocked and amazed that Jaynes would give such a poor recommendation. It doesn't sound Jaynesian to me either, and I rather hope he presents a variant that is sufficiently altered as to not be this suggestion at all. You yourself gave the reason why it doesn't work, and I am sure there is a better approach than just hacking the scale when it is near 1 or 0. (I am hoping your paraphrase sounds worse than the original.)

It is best to give a probability density function, but two probabilities at two significant figures each typically give more information than one.

It is good to indicate the strength of your priors. Perhaps one could indicate how much you think your opinion is likely to change over some specified timescale - or in response to the next set of pertinent data points.

> Two sig figs? Really?

For significant figures to be at all applicable you would need to express confidence with a completely different kind of scale. I am not going to round off 96% to "not even a probability".

> express confidence with a completely different kind of scale

I like the odds scale, myself.

> I like the odds scale, myself.

For my part I find it irritating. But it would certainly work better for 1 significant figure expressions. Although I suppose you could say it kind of relies on two significant figures (one on either side) to work at all.
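For comparison, here is a small sketch that converts a stated probability to odds and to evidence in decibels, 10·log10(p/(1−p)), which is the unit Jaynes uses for evidence in PT:LOS. The function names are hypothetical. Passing decimal strings keeps the fractions exact. On these scales 96% and 99% are far apart (24:1 versus 99:1), which is the sense in which a single significant figure goes further there.

```python
import math
from fractions import Fraction


def odds(p):
    """Odds in favor, p / (1 - p), as an exact fraction.

    Pass a decimal string such as "0.96" to avoid binary floating-point noise.
    """
    p = Fraction(p)
    return p / (1 - p)


def evidence_db(p):
    """Evidence in decibels: 10 * log10 of the odds."""
    return 10 * math.log10(float(odds(p)))


for p in ("0.72", "0.96", "0.99"):
    o = odds(p)
    print(f"{p}: odds {o.numerator}:{o.denominator}, {evidence_db(p):.1f} dB")
```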

> I think he meant that even if we are not religious, society tends to pull us into moral realism even though of course moral realism is an illusion.

You are correct, though I don't go as far as calling moral realism an illusion, because of unknown unknowns (though I would be very surprised to find it isn't illusory).

Re: "There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense. What causes this intuition?"

• Rapidly-changing memetic infections;

• Pleiotropic side effects of a flexible brain;

• Other malfunctions.