My Model

 

Consider an AI. This AI goes out into the world, observing things and doing things. This is a special AI, though. In converting observations into actions, it first transforms them into beliefs in some kind of propositional language. This may or may not be the optimal way to build an AI. Regardless, that's how it works.

 

The AI has a database, filled with propositions. The AI also has some code.

  • It has code for turning propositions into logically equivalent propositions.
  • It has code for turning observations about the world into propositions about these observations, like "The pixel at location 343, 429 of image #8765 is red."
  • It has code for turning propositions about observations into propositions about the state of the world, like "The apple in front of my camera is red."
  • It has code for turning those propositions into propositions that express prediction, like "There will still be a red apple there until someone moves it."
  • It has code for turning those propositions into propositions about shouldness, like "I should tell the scientists about that apple."
  • It has code for turning propositions about shouldness into actions.

What is special about this code is that it can't be expressed as propositions. Just as one can't argue morality into a rock, the AI doesn't function if it doesn't have this code, no matter what propositions are stored in its memory. The classic example of this is the Tortoise from Lewis Carroll's "What the Tortoise Said to Achilles."
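
To make the division of labor concrete, here is a minimal sketch of one observation-to-action cycle. Everything in it is my own illustrative assumption: the function names (observe, interpret, predict, evaluate, act) are hypothetical placeholders for the AI's non-propositional code, and their bodies just pass toy strings along.

```python
# Minimal sketch of the proposition pipeline described above.
# All names are hypothetical placeholders; the bodies only illustrate the flow.

def observe(raw):
    # observations -> propositions about observations
    return [f"The pixel at location {raw['location']} of image #{raw['image']} is {raw['color']}"]

def interpret(obs_props):
    # propositions about observations -> propositions about the world
    return ["The apple in front of my camera is red"]

def predict(world_props):
    # propositions about the world -> propositions expressing prediction
    return ["There will still be a red apple there until someone moves it"]

def evaluate(predictions):
    # predictions -> propositions about shouldness
    return ["I should tell the scientists about that apple"]

def act(should_props):
    # shouldness -> action; this step is code, not a proposition in the database
    for p in should_props:
        print("ACTING ON:", p)

def run_cycle(database, raw_observation):
    obs = observe(raw_observation)
    world = interpret(obs)
    future = predict(world)
    should = evaluate(future)
    database.extend(obs + world + future + should)  # the database only ever holds propositions
    act(should)                                     # the code that acts on them never enters it

run_cycle([], {"location": "343, 429", "image": 8765, "color": "red"})
```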

 

Axioms - Assumptions and Definitions

 

When we observe the AI, however, we can put everything into words. We can express its starting state, both the propositions and the code, as a set of axioms. We can then watch as it logically draws conclusions from these axioms, ending in a decision about what action to take.

 

The important thing, therefore, is to check that its initial axioms are correct. There is a key distinction to draw here, because there seem to be two kinds of axioms at work. Consider two from Euclid:

 

First, his definition of a point, which I believe is "a point is that which has no part". It does not seem like this could be wrong. If it turns out that nothing has no part, then that just means that there aren't any points.

 

Second, one of the postulates, like the parallel postulate. This one could be wrong. Because of General Relativity, when applied to the real world, it is, in fact, wrong.

 

This boundary can be somewhat fluid when we interpret the propositions in a new light. For instance, if we prefix each axiom with, "In a Euclidean space, ", then the whole system is just the definition of a Euclidean space.

 

A necessary condition for some axiom to be a definition of a term is that you can't draw, from the definition, any new conclusions that don't involve that term. This also applies to groups of definitions. That is:

 

"Stars are the things that produce the lights we see in the sky at night that stay fixed relative to each other, and also the Sun" and "Stars are gigantic balls of plasma undergoing nuclear reactions" can both be definitions of the word "Star", since we know them to be equivalent. If, however, we did not know those concepts to be equivalent (if we needed to be really really really confident in our conclusions, for instance) then only one of those could be the definition.

 

What are the AI's definitions?

 

Logic: It seems to me that this code both defines logical terms and makes assumptions about their properties. In logic it is very hard to tell the difference.

 

Observation: Here it is easier. The meaning of a statement like "I see something red" is given by the causal process that leads to seeing something red. This code, therefore, provides a definition.

 

The world: What is the meaning of a statement about the world? This meaning, as we know from the litany of Tarski, comes from the world. One can't define facts about the world into existence, so this code must consist of assumptions.

 

Predictions: The meaning of a prediction, clearly, is not determined by the process used to create it. It's determined by what sort of events would confirm or falsify a prediction. The code that deduces predictions from states of the world might include part of the definition of those states. For instance, red things are defined as those things predicted to create the appearance of redness. It cannot, though, define the future - it can only assume that its process for producing predictions is accurate.

 

Shouldness: Now we come to the fun part. There are two separate ways to define "should" here. We can define it by the code that produces should statements, or by the code that uses them.

 

When you have two different definitions, one thing you can do is to decide that they're defining two different words. Let's call the first one AI_should and the second Actually_Should.

 

ETA: Another way to state this is "Which kinds of statements can be reduced, through definition, to which other kinds of statements?" I argue that while one can reduce facts about observations to facts about the world (as long as qualia don't exist), one cannot reduce statements backwards - something new is introduced at each step. People think that we should be able to reduce ethical statements to the previous kinds of statements. Why, when so many other reductions are not possible?

 

Is the second definition acceptable?

 

The first claim I would like to make is that Actually_Should is a well-defined term - that AI_should and me_should and you_should are not all that there is.

 

One counter-claim that can be made against it is that it is ambiguous. If it were, though, one would presumably be able to clarify the definition with additional statements about it. But this cannot be done, because then one would be able to define certain actions into existence.

 

Is the second definition better?

 

I'm not sure "better" is well-defined here, but I'm going to argue that it is. I think these arguments should at least shed light on the first question.

 

The first thing that strikes me as off about AI_should is that if you use it, then your definition will be very long and complex (because Human Value is Complex) but your assumption will be short (because the code that takes "You should do X" and then does X can be minimal). This is backwards - definitions should be clean, elegant, and simple, while assumptions often have to be messy and complex.

 

The second thing is that it doesn't seem to be very useful for communication. When I tell someone something, it is usually because I want them to do something. I might tell someone that there is a deadly snake nearby, so that they will be wary. I may not always know exactly what someone will do with information, but I can guess that some information will probably improve the quality of their decisions.

 

How useful, then, to communicate directly in "Actually_Should": to say to Alice the proposition that, if Alice believed it, would cause her to do X, from which she can conclude that "Bob believes the proposition that, if I believed it, would cause me to do X".

 

If, on the other hand, Bob says that Alice Bob_should do X, then Alice might respond with "that's interesting, but I don't care", and would not be able to respond with "You're wrong." This would paper over the very real moral disagreement between them. This is often advisable in social situations, but rarely useful epistemically.

 

How do we know that there is a disagreement? Well, if the issue is significant enough, both sides would feel justified in starting a war over it. This is true even if they agree on Alice_Should and Bob_Should and so on for everyone on Earth. That seems like a pretty real disagreement to me.

Comments

The first thing that strikes me as off about AI_should is that if you use it, then your definition will be very long and complex (because Human Value is Complex) but your assumption will be short (because the code that takes "You should do X" and then does X can be minimal). This is backwards - definitions should be clean, elegant, and simple, while assumptions often have to be messy and complex.

When they can be clean, elegant, and simple, sure. However, they are often clean and elegant because of other assumed definitions. Consider the definition of a continuous function from the reals to the reals. This requires definitions of real numbers, limits, and functions. This further requires definitions of number, absolute value, <, >, etc. When you put it all together, it isn't very clean and elegant. Point being, we can talk in shorthand about our utility functions (or shouldness) somewhat effectively, but we shouldn't be surprised that it gets complicated when we try to program it into a computer.
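
To make that vivid, here is the standard unpacking of "f is continuous at a" (my addition for illustration, not part of the original comment); each symbol in it rests in turn on definitions of the reals, absolute value, and order:

$$ f \text{ is continuous at } a \iff \forall \varepsilon > 0 \;\exists \delta > 0 \;\forall x \,\bigl( |x - a| < \delta \Rightarrow |f(x) - f(a)| < \varepsilon \bigr) $$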

Furthermore, so what if a definition isn't elegant, assuming it "carves reality at its joints"?

What joints does "good" carve reality at? Are most things either very good or very not good?

"everything I care about" or "everything Will cares about", etc.

That is, in fact, a useful category to draw.

It's a useful category to draw because you believe certain moral facts (that those are the correct things to care about).

The category, the definition, is just a manifestation of the moral beliefs. It is, in fact, exactly isomorphic to those beliefs, otherwise it wouldn't be useful. So why not just talk about the beliefs?

It's not quite the same as my moral beliefs. My moral beliefs are what I think I care about. Goodness refers to what I actually care about.

That being said, there's no reason why my moral beliefs have to be defined in some clean and simple way. In fact, they probably aren't.

But "what you actually care about" is defined as what your moral beliefs would be if you had more information, more intelligence, etc.

So what are your moral beliefs actually about? Are they beliefs about more beliefs?

Can you explicitly write down the definition of actually_should? There seems to be an implicit assumption in the post that while "the code that produces the should statements" is complex and distinct for each agent, "the code that uses them" is simple and universal. I am not sure how this is warranted.

The first claim I would like to make is that Actually_Should is a well-defined term - that AI_should and me_should and you_should are not all that there is.

I don't understand how the two parts of this sentence are related.

Why does this post not have 20 points or more?

Nitpick 1:

It seems likely to be the optimal way to build an AI that has to communicate with other AIs.

This seems a very contentious claim. For instance, to store the relative heights of people, wouldn't it make more sense to have the virtual equivalent of a ruler with markings on it rather than the virtual equivalent of a table of sentences of the form "X is taller than Y"?

I think the best approach here is just to explicitly declare it as an assumption: 'for argument's sake' your robot uses this method. End of story.

Nitpick 2:

Because of General Relativity, when applied to the real world, it is, in fact, wrong.

This is false. General Relativity doesn't contradict the fact that space is "locally Euclidean".

This is false. General Relativity doesn't contradict the fact that space is "locally Euclidean".

Should I use a different postulate?

Should I use a different postulate?

Yes: the parallel postulate.

How do we know that there is a disagreement? Well, if the issue is significant enough, both sides would feel justified in starting a war over it. This is true even if they agree on Alice_Should and Bob_Should and so on for everyone on Earth. That seems like a pretty real disagreement to me.

And yet we can readily imagine amoral individuals who explicitly agree on all the facts, but go to war anyway in order to seize resources for themselves/paperclips. (Sorry, Clippy.)

You want to make use of the fact that it feels to us like a disagreement when people go to war and cite moral reasons for doing so. But two objections occur to me:

A. Why trust this feeling at all? Our brains evolved in ways that favored reproduction, rather than finding the truth as such. War could harm the chance of reproductive success, so if we can convince the other side to do what we want using words alone then we might expect to feel an urge to argue. This could lead to an instinctive belief in "disagreement" where none exists, if the belief helps us confuse the enemy. I don't know if this quite makes sense on the assumption that different humans have different "_Should" functions, but it means your argument does not seem self-evidently true.

B. If we do trust this feeling under normal circumstances, why assume that humans have different _Should functions? Why not say that our brains expect disagreement because humans do in fact tend to work from one genetically coded function, and will not go to war 'for moral reasons' unless at least one side gets this complex 'calculation' wrong? We certainly don't need to assume anything further in order to explain the phenomenon you cite if it seems suitable for dealing with humans. And if morality has a strong genetic component, then we'd expect either a state of affairs that I associate with Eliezer's position (complex machinery shared by nearly every member of the species), or a variation of this where the function chiefly justifies seeking outcomes that once favored reproductive success. The latter would not appear to help your position. It would mean that fully self-aware fighters could agree on all the facts, could know they agree, and could therefore shake hands before trying to kill each other in accordance with their warrior dharma.

Clippy and Snippy the scissors-maximizer only agree on all the facts if you exclude moral facts. But this is what we are arguing about - whether there are moral facts.

A: So would you support or oppose a war on clippy? What about containing psychopaths and other (genetically?) abnormal humans?

Why do you need to fight them if you agree with them?

B. Irrelevant with my examples.

Why do you need to fight them if you agree with them?

Because they're dangerous. And I don't think Clippy disagrees intellectually on the morality of turning humans into paperclips; it just disagrees verbally. It thinks some of us will hesitate a bit if it claims to use our concept of morality and to find that paperclipping is supremely right.

Meanwhile, many psychopaths are quite clear and explicit that their ways are immoral. They know and don't care or even pretend to care.

Dangerous implies a threat. Conflicting goals aren't sufficient to establish a threat substantial enough to need fighting or even shunning; that additionally requires the power to carry those goals to dangerous places.

Clippy's not dangerous in that sense. It'd happily turn my mass into paperclips given the means and absent countervailing influences, but a non-foomed clippy with a basic understanding of human society meets neither criterion. With that in mind, and as it doesn't appear to have the resources needed to foom (or to establish some kind of sub-foom paperclip regime) on its own initiative, our caution need only extend to denying it those resources. I even suspect I might be capable of liking it, provided some willing suspension of disbelief.

As best I can tell this isn't like dealing with a psychopath, a person with human social aggressions but without the ability to form empathetic models or to make long-term game-theoretic decisions and commitments based on them. It's more like dealing with an extreme ideologue: you don't want to hand such a person any substantial power over your future, but you don't often need to fight them, and tit-for-tat bargaining can be quite safe if you understand their motivations.

I thought we were talking about a foomed/fooming Clippy.

Ah. Generally I read "Clippy" as referring to User:Clippy or something like it, who's usually portrayed as having human-parity intelligence and human-parity or fewer resources; I don't think I've ever seen the word used unqualified to describe the monster raving superhuman paperclip maximizer of the original thought experiment.

...and here I find myself choosing my words carefully in order to avoid offending a fictional AI with a cognitive architecture revolving around stationary fasteners. Strange days indeed.

Because they're dangerous

That seems overly complicated when you could just say that you disagree.

Meanwhile, many psychopaths are quite clear and explicit that their ways are immoral.

So clearly the definition of morality they use is not connected to shouldness? I guess that's their prerogative to define morality that way. But they ALSO have different views on shouldness than we do; otherwise they would act in the same manner.

Are you disagreeing that Clippy and Snippy are dangerous? If not, accepting this statement adds no complexity to my view as compared to yours.

As for shouldness, many people don't make a distinction between "rationally should" and "morally should". And why should they; after all, for most there may be little divergence between the two. But the distinction is viable, in principle. And psychopaths, and those who have to deal with them, are usually well aware of it.

If not, accepting this statement adds no complexity to my view as compared to yours.

I'm not sure what I mean by complicated.

And psychopaths, and those who have to deal with them, are usually well aware of it.

Exactly, I'm talking about the concept "should", not the word.

Can you rewrite the last section in terms of "A" and "B" or something where appropriate, instead of "me" and "you", to make it less confusing? I almost get what you're trying to say, mostly, but I think it would clear up some confusion if the AIs talked about themselves in the third person and were clearly differentiated from me (i.e. the person writing this comment) and you (i.e. the person who wrote the post).

Thanks!

Done. I left "I" in where I was actually pretty much talking about myself.

This meaning, as we know from the litany of Tarski, comes from the world.

I can make no sense of that. That is about a theory of truth, and it is a metalinguistic theory. I don't see how it adds up to content externalism as you seem to think.

Not content externalism.

I'm arguing that, when I say "snow is white", the terms I am using all refer to external stuff, so you can't reduce the claim that "snow is white" to a claim about my mental state.

You need a cooperative-game based theory of communication to properly define "should." "Should" is a linguistic tag which indicates that the sender is staking some of their credibility on the implicit claim that the course of action contained in the body of the message would benefit the receiver.

Certainly not true in all instances.

"You should give lots of money to charity", for instance.

It is still true in that instance. If a person^ told you, "you should give lots of money to charity," and you followed the suggestion, and later regretted it, then you would be less inclined to listen to that person's advice in the future.

^: Where personhood can be generalized.

Suppose I post a statement of shouldness anonymously on an internet forum. Does that statement have no meaning?

Anonymity cannot erase identity; it can only obscure it. Readers of the statement have an implicit probability distribution as to the possible identity of the poster, and the readers who follow the suggestion posted will update their trust metric over that probability distribution in response to the outcome of the suggestion. This is part of what I meant by generalized personhood.
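
A rough sketch of that update rule, under assumptions of my own (a discrete set of candidate posters, a scalar trust score per poster, and outcomes judged simply good or bad; none of these names come from the comment):

```python
# Hypothetical sketch: readers hold a probability distribution over who wrote
# the anonymous "should" statement, then spread credit or blame for the
# outcome across the candidates in proportion to that distribution.

def update_trust(trust, identity_probs, outcome_was_good, step=0.1):
    """trust and identity_probs are dicts keyed by candidate poster."""
    sign = 1.0 if outcome_was_good else -1.0
    return {
        poster: trust.get(poster, 0.0) + sign * step * prob
        for poster, prob in identity_probs.items()
    }

trust = {"alice": 0.5, "bob": 0.2}
identity_probs = {"alice": 0.8, "bob": 0.2}  # who probably posted the advice
trust = update_trust(trust, identity_probs, outcome_was_good=False)
print(trust)  # alice's score drops more, since she most likely wrote it
```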

What if two people have identical information on all facts about the world and the likely consequences of actions. In your model, can they disagree about shouldness?

The concept of "shouldness" does not exist in my model. My model is behavioristic.

  1. Would you expect two people who had identical information on all facts about the world and the likely consequences of actions to get in an argument about "should", as people in more normal situations are wont to do? Let's say they get in an argument about what a third person would do. Is this possible? How would you explain it?

  2. Then you need to expand your model. How do you decide what to do?

The decision theory of your choice.

EDIT: The difference between my viewpoint and your viewpoint is that I view language as a construct purely for communication between different beings rather than for internal planning.

a) No, how do you decide what to do?

b) So when I think thoughts in my head by myself, I'm just rehearsing things I might say to people at a future date?

c) Does that mean you have to throw away Bayesian reasoning? Or, if not, how do you incorporate a defense of Bayesian reasoning into that framework?

I'm not sure I understand your criticisms. My definition of "should" applies to any agents which are capable of communicating messages for the purpose of coordinating coalitions; outside of that, it does not require that the interpreter of the language have any specific cognitive structure. According to my definition, even creatures as simple as ants could potentially have a signal for 'should.'

You seem to be attempting to generalize the specific phenomenon of human conscious decision-making to a broader class of cognitive agents. It may well be that the human trait of adapting external language for the purpose of internal decision-making actually turns out to be very effective in practice for all high-level agents. However, it is also quite possible that in the space of possible minds, there are many effective designs which do not use internal language.

I do not see how Bayesian reasoning requires the use of internal language.

Because you'd have a data structure of world-states and their probabilities, which would look very much like a bunch of statements of the form "This world-state has this probability".

It doesn't need to be written in a human-like way to have meaning, and if it has a meaning then my argument applies.

So "should" = the table of expected utilities that goes along with the table of probabilities.

Then I am basically in agreement with you.
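
Concretely, the two tables gestured at in the last few comments might look something like this (a toy sketch of my own, with invented states and actions):

```python
# Toy sketch: a table of world-states with probabilities, and alongside it
# the expected-utility table that plays the role of "should".

probabilities = {           # "This world-state has this probability"
    "apple_is_red": 0.9,
    "apple_is_green": 0.1,
}

utilities = {               # utility of each (action, world-state) pair
    ("tell_scientists", "apple_is_red"): 10.0,
    ("tell_scientists", "apple_is_green"): -1.0,
    ("stay_silent", "apple_is_red"): 0.0,
    ("stay_silent", "apple_is_green"): 0.0,
}

def expected_utility(action):
    return sum(p * utilities[(action, state)] for state, p in probabilities.items())

# The "should" table: expected utility per available action.
should_table = {a: expected_utility(a) for a in ("tell_scientists", "stay_silent")}
print(should_table)  # {'tell_scientists': 8.9, 'stay_silent': 0.0}
```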

Good article.

from the Achilles and the Tortoise dialog.

Which? There are many in GEB.

He's talking about the Lewis Carroll dialog that inspired the ones in GEB: "What the Tortoise Said to Achilles."

The point of the dialog is that there's something irreducibly 'dynamic' about the process of logical inference. Believing "A" and "A implies B" does not compel you to believe "B". Even if you also believe "A and (A implies B) together imply B". A static 'picture' of an inference is not itself an inference.

There was supposed to be a link there but I accidentally deleted it. It's fixed.

I think it's something like "couldness" combined with a computation on another person's utility function.

In other words, I would tell someone that they should do something if I think that it is likely to maximize their utility more so than whatever it is they are currently doing or planning to do.
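
Spelled out as a toy rule (my own paraphrase, with made-up names; the replies below immediately complicate it):

```python
# Hypothetical restatement of the rule above: recommend an action when it
# beats the other person's current plan by their own (estimated) utility.

def should_recommend(action, current_plan, their_expected_utility):
    """their_expected_utility(option) -> my estimate of their utility for it."""
    return their_expected_utility(action) > their_expected_utility(current_plan)

estimates = {"take_umbrella": 5.0, "go_without": 2.0}
print(should_recommend("take_umbrella", "go_without", estimates.get))  # True
```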

My utility function is that I love punching babies and get lots of utility from it.

Will you tell me that I should punch babies?

Good point, I forgot to even consider the possible divergence in our utility functions; stupid. Revision: If my utility function is sufficiently similar to yours, then I will tell you that you ought to punch babies. If my utility function is sufficiently different from yours, I will try to triangulate and force a compromise in order to maximize my own utility. ETA: I suppose I actually am doing the latter in both cases; it just so happens that we agree on what actions are best to take in the first one.

  1. But clearly the "should" is distinct from the compromise, no? You think that I shouldn't punch babies but are willing to compromise at some baby-punching.

  2. And I am arguing that your utility function is a kind of belief - a moral belief, that motivates action. "Should" statements are statements of, not about, your utility function.

(The of/about distinction is the distinction between "The sky is blue" and "I think the sky is blue")

  1. I suppose this is true. As long as your action is in conflict with my utility function, I will think that you "shouldn't" do it.

  2. I agree with that.

The triangulated "should-expression" in my above example is an expression of my utility function, but it is indirect insofar that it's a calculation given that your utility function conflicts substantially with mine.

Also, when I was talking about divergence before I realize that I was being sloppy. Our utility functions can "diverge" quite a bit without "conflicting", and they can "conflict" quite a bit without "diverging"; that is, our algorithms can both be "win the game of tic-tac-toe in front of you", and thus be exactly the same, but still be in perfect conflict. So sorry about that, that was just sloppy thinking altogether on my part.

Then we agree, and just had some terminology problems.