Invisible Frameworks



Eliezer_Yudkowsky

Followup to: Passing the Recursive Buck, No License To Be Human

Roko has mentioned his "Universal Instrumental Values" several times in his comments.  Roughly, Roko proposes that we ought to adopt as terminal values those things that a supermajority of agents would do instrumentally.  On Roko's blog he writes:

I'm suggesting that UIV provides the cornerstone for a rather new approach to goal system design. Instead of having a fixed utility function/supergoal, you periodically promote certain instrumental values to terminal values i.e. you promote the UIVs.

Roko thinks his morality is more objective than mine:

It also worries me quite a lot that eliezer's post is entirely symmetric under the action of replacing his chosen notions with the pebble-sorter's notions. This property qualifies as "moral relativism" in my book, though there is no point in arguing about the meanings of words.

My posts on universal instrumental values are not symmetric under replacing UIVs with some other set of goals that an agent might have. UIVs are the unique set of values X such that in order to achieve any other value Y, you first have to do X.

Well, and this proposal has a number of problems, as some of the commenters on Roko's blog point out.

For a start, Roko actually says "universal", not "supermajority", but there are no actual universal actions; no matter what the green button does, there are possible mind designs whose utility function just says "Don't press the green button."  There is no button, in other words, that all possible minds will press.  Still, if you defined some prior weighting over the space of possible minds, you could probably find buttons that a supermajority would press, like the "Give me free energy" button.
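As a toy illustration of the point above, the following sketch models a small, made-up population of minds with a prior weighting and checks which buttons a supermajority would press. The mind designs, the weights, the button names, and the two-thirds threshold are all assumptions invented for this example; none of them come from Roko's proposal.

```python
from fractions import Fraction

# Hypothetical toy population: (name, prior weight, buttons this mind presses).
# The weights are an assumed prior over mind designs and sum to 1.
MINDS = [
    ("paperclip maximizer", Fraction(4, 10), {"free energy", "make paperclips"}),
    ("pebble sorter",       Fraction(3, 10), {"free energy", "sort pebbles"}),
    ("human-like mind",     Fraction(2, 10), {"free energy", "eat chocolate"}),
    ("button refuser",      Fraction(1, 10), set()),  # its utility function says: press nothing
]


def weighted_share(button):
    """Prior-weighted fraction of minds that press `button`."""
    return sum(weight for _, weight, presses in MINDS if button in presses)


def is_universal(button):
    """True only if every mind in the population presses `button`."""
    return all(button in presses for _, _, presses in MINDS)


def is_supermajority(button, threshold=Fraction(2, 3)):
    """True if the weighted share reaches the (assumed) threshold."""
    return weighted_share(button) >= threshold
```

In this toy population "free energy" is pressed by a weighted share of 9/10, so it counts as a supermajority button, yet it is not universal, because a single refuser is enough to break universality. Changing the assumed weights changes which buttons qualify.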

But to do nothing except press such buttons, consists of constantly losing your purposes. You find that driving the car is useful for getting and eating chocolate, or for attending dinner parties, or even for buying and manufacturing more cars.  In fact, you realize that every intelligent agent will find it useful to travel places.  So you start driving the car around without any destination.  Roko hasn't noticed this because, by anthropomorphic optimism, he mysteriously only thinks of humanly appealing "UIVs" to propose, like "creativity".

Let me guess, Roko, you don't think that "drive a car!" is a "valid" UIV for some reason?  But you did not apply some fixed procedure you had previously written down, to decide whether "drive a car" was a valid UIV or not.  Rather you started out feeling a moment of initial discomfort, and then looked for reasons to disapprove.  I wonder why the same discomfort didn't occur to you when you considered "creativity".

But let us leave aside the universality, appeal, or well-specified-ness of Roko's metaethics.

Let us consider only Roko's claim that his morality is more objective than, say, mine, or this marvelous list by William Frankena that Roko quotes SEP quoting:

Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one's own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.

So!  Roko prefers his Universal Instrumental Values to this, because:

It also worries me quite a lot that eliezer's post is entirely symmetric under the action of replacing his chosen notions with the pebble-sorter's notions. This property qualifies as "moral relativism" in my book, though there is no point in arguing about the meanings of words.

My posts on universal instrumental values are not symmetric under replacing UIVs with some other set of goals that an agent might have. UIVs are the unique set of values X such that in order to achieve any other value Y, you first have to do X.

It would seem, then, that Roko attaches tremendous importance to claims to asymmetry and uniqueness; and tremendous disaffect to symmetry and relativism.

Which is to say that, when it comes to metamoral arguments, Roko is greatly moved to adopt morals by the statement "this goal is universal", while greatly moved to reject morals by the statement "this goal is relative".

In fact, so strong is this tendency of Roko's, that the metamoral argument "Many agents will do X!" is sufficient for Roko to adopt X as a terminal value.  Indeed, Roko thinks that we ought to get all our terminal values this way.

Is this objective?

Yes and no.

When you evaluate the question "How many agents do X?", the answer does not depend on which agent evaluates it.  It does depend on quantities like your weighting over all possible agents, and on the particular way you slice up possible events into categories like "X".  But let us be charitable: if you adopt a fixed weighting over agents and a fixed set of category boundaries, the question "How many agents do X?" has a unique answer.  In this sense, Roko's meta-utility function is objective.

But of course Roko's meta-utility function is not "objective" in the sense of universal compellingness.  It is only Roko who finds the argument "Most agents do X instrumentally" a compelling reason to promote X to a terminal value.  I don't find it compelling; it looks to me like losing purpose and double-counting expected utilities.  The vast majority of possible agents, in fact, will not find it a compelling argument!  A paperclip maximizer perceives no utility-function-changing, metamoral valence in the proposition "Most agents will find it useful to travel from one place to another."
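The two senses of "objective" can be pulled apart in a small sketch. Both agents below receive the same fact about how many agents do X instrumentally, but only one of them has a framework that treats that fact as a reason to change its terminal values. The class names and the 0.5 threshold are hypothetical, chosen only to illustrate the distinction.

```python
class Agent:
    """An agent with a set of terminal values and some meta-moral framework."""

    def __init__(self, name, terminal_values):
        self.name = name
        self.terminal_values = set(terminal_values)

    def consider(self, action, share_of_agents):
        # Default framework: a fact about what other agents do instrumentally
        # carries no weight for changing terminal values.
        return self.terminal_values


class PromotingAgent(Agent):
    """Hypothetical agent whose framework promotes common instrumental values."""

    def consider(self, action, share_of_agents):
        # Promote the action to a terminal value if most agents do it.
        if share_of_agents > 0.5:
            self.terminal_values.add(action)
        return self.terminal_values


# The same fact is presented to both agents.
fact = ("travel", 0.9)

clippy = Agent("paperclip maximizer", {"paperclips"})
promoter = PromotingAgent("promoter", {"chocolate"})

clippy.consider(*fact)
promoter.consider(*fact)
```

After both calls, the paperclip maximizer still values only paperclips, while the promoting agent has added "travel" to its terminal values. The input was identical; the difference lies entirely in the `consider` method each agent happens to implement.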

Now this seems like an extremely obvious criticism of Roko's theory.  Why wouldn't Roko have thought of it?

Because when Roko feels like he's being objective, he's using his meta-morality as a fixed given—evaluating the question "How many agents do X?" in different places and times, but not asking any different questions.  The answer to his meta-moral question has occurred to him as a variable to be investigated; the meta-moral question itself is off the table.

But—of course—when a Pebblesorter regards "13 and 7!" as a powerful metamoral argument that "heaps of 91 pebbles" should not be a positive value in their utility function, they are asking a question whose answer is the same in all times and all places.  They are asking whether 91 is prime or composite.  A Pebblesorter, perhaps, would feel the same powerful surge of objectivity that Roko feels when Roko asks the question "How many agents have this instrumental value?"  But in this case it readily occurs to Roko to ask "Why care if the heap is prime or not?"  As it does not occur to Roko to ask, "Why care if this instrumental goal is universal or not?"  Why... isn't it just obvious that it matters whether an instrumental goal is universal?
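The Pebblesorter's question can be checked mechanically, and the check gives the same answer no matter who runs it. A minimal sketch using trial division (the function names are illustrative):

```python
def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def is_prime(n):
    """True if n is a prime number."""
    return n >= 2 and smallest_factor(n) == n
```

Here `smallest_factor(91)` is 7 and `91 // 7` is 13, which is the "13 and 7!" argument. Nothing in the code says why primality should matter for heaps of pebbles; that part lives in the framework of whoever runs it.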

The Pebblesorter's framework is readily visible to Roko, since it differs from his own.  But when Roko asks his own question—"Is this goal universally instrumental?"—he sees only the answer, and not the question; he sees only the output as a potential variable, not the framework.

Like PA, that only sees the compellingness of particular proofs that use the Peano Axioms, and does not consider the quoted Peano Axioms as subject matter.  It is only PA+1 that sees the framework of PA.

But there is always a framework, every time you are moved to change your morals—the question is whether it will be invisible to you or not.  That framework is always implemented in some particular brain, so that the same argument would fail to compel a differently constructed brain—though this does not imply that the framework makes any mention of brains at all.

And this difficulty of the invisible framework is at work, every time someone says, "But of course the correct morality is just the one that helps you survive / the one that helps you be happy"—implicit there is a supposed framework of meta-moral arguments that move you.  But maybe I don't think that being happy is the one and only argument that matters.

Roko is adopting a special and unusual metamoral framework in regarding "Most agents do X!" as a compelling reason to change one's utility function.  Why might Roko find this appealing?  Humans, for very understandable reasons of evolutionary psychology, have a universalizing instinct; we think that a valid argument should persuade anyone.

But what happens if we confess that such thinking can be valid? What happens if we confess that a meta-moral argument can (in its invisible framework) use the universalizing instinct?  Then we have... just done something very human.  We haven't explicitly adopted the rule that all human instincts are good because they are human, but we did use one human instinct to think about morality.  We didn't explicitly think that's what we were doing, any more than PA quotes itself in every proof; but we felt that a universally instrumental goal had this appealing quality of objective-ness about it, which is a perception of an intuition that evolved.  This doesn't mean that objective-ness is subjective.  If you define objectiveness precisely then the question "What is objective?" will have a unique answer.  But it does mean that we have just been compelled by an argument that will not compel every possible mind.

If it's okay to be compelled by the appealing objectiveness of a moral, then why not also be compelled by...

...life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom...

Such values, if precisely defined, can be just as objective as the question "How many agents do X?" in the sense that "How much health is in this region here?" will have a single unique answer.  But it is humans who care about health, just as it is humans who care about universalizability.

The framework by which we care about health and happiness is just as much evolved, and human, and part of the very substance of that which we name right whether it is human or not... as our tendency to find universalizable morals appealing.

And every sort of thing that a mind can do will have some framework behind it.  Every sort of argument that can compel one mind, will fail to be an argument in the framework of another.

We are in the framework we name right; and every time we try to do what is correct, what we should, what we must, what we ought, that is the question we are asking.

Which question should we ask?  What is the correct question?

Don't let your framework for those questions be invisible!  Don't think you've answered them without asking any questions!

There is always the meta-meta-meta-question and it always has a framework.

I, for one, have decided to answer such questions the right way, as the alternative is to answer them the wrong way, like Roko is doing.

And the Pebblesorters do not disagree with any of this; they do what is objectively prime, not what is objectively right.  And the Roko-AI does what is objectively often-instrumental, flying starships around with no destination; I don't disagree that travel is often-instrumental, I just say it is not right.

There is no right-ness that isn't in any framework—no feeling of rightness, no internal label that your brain produces, that can be detached from any method whatsoever of computing it—that just isn't what we're talking about when we ask "What should I do now?"  Because if anything labeled should, is right, then that is Self-PA.

 

Part of The Metaethics Sequence

(end of sequence)

Previous post: "No License To Be Human"