Why the beliefs/values dichotomy?

I'd like to suggest that the fact that human preferences can be decomposed into beliefs and values is one that deserves greater scrutiny and explanation. It seems intuitively obvious to us that rational preferences must decompose like that (even if not exactly into a probability distribution and a utility function), but it’s less obvious why.

The importance of this question comes from our tendency to see beliefs as being more objective than values. We think that beliefs, but not values, can be right or wrong, or at least that the notion of right and wrong applies to a greater degree to beliefs than to values. One dramatic illustration of this is in Eliezer Yudkowsky’s proposal of Coherent Extrapolated Volition, where an AI extrapolates the preferences of an ideal humanity, in part by replacing their “wrong” beliefs with “right” ones. On the other hand, the AI treats their values with much more respect.

Since beliefs and values seem to correspond roughly to the probability distribution and the utility function in expected utility theory, and expected utility theory is convenient to work with due to its mathematical simplicity and the fact that it’s been the subject of extensive studies, it seems useful as a first step to transform the question into “why can human decision making be approximated as expected utility maximization?”

I can see at least two parts to this question:

  • Why this mathematical structure?
  • Why this representation of the mathematical structure?

Not knowing how to answer these questions yet, I’ll just write a bit more about why I find them puzzling.

Why this mathematical structure?

It’s well known that expected utility maximization can be derived from a number of different sets of assumptions (the so-called axioms of rationality), but they all include the assumption of Independence in some form. Informally, Independence says that what you prefer to happen in one possible world doesn’t depend on what you think happens in other possible worlds. In other words, if you prefer A&C to B&C, then you must prefer A&D to B&D, where A and B are what happens in one possible world, and C and D are what happens in another.

This assumption is central to establishing the mathematical structure of expected utility maximization, where you value each possible world separately using the utility function, then take their weighted average. If your preferences were such that A&C > B&C but A&D < B&D, then you wouldn’t be able to do this.
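A small sketch can make the Independence condition concrete. The Python below is illustrative only: the outcome labels, the helper names (`expected_utility`, `satisfies_independence`) and the sample preference table `entangled` are assumptions introduced here, not part of the original argument. It checks that any preference computed as a probability-weighted average of per-world utilities ranks A&C against B&C the same way it ranks A&D against B&D, while a hand-built preference with A&C > B&C but A&D < B&D fails that check.

```python
import random
from fractions import Fraction

def expected_utility(first, second, u, p):
    """Weighted average of per-world utilities (p = weight of the first world)."""
    return p * u[first] + (1 - p) * u[second]

def sign(x):
    return (x > 0) - (x < 0)

def satisfies_independence(value):
    """True if A-vs-B is ranked the same whether the other world holds C or D."""
    with_c = sign(value("A", "C") - value("B", "C"))
    with_d = sign(value("A", "D") - value("B", "D"))
    return with_c == with_d

# Any choice of per-world utilities and weights passes the check.
random.seed(0)
all_independent = True
for _ in range(1000):
    u = {name: random.randint(-10, 10) for name in "ABCD"}
    p = Fraction(random.randint(1, 99), 100)
    if not satisfies_independence(lambda x, y: expected_utility(x, y, u, p)):
        all_independent = False

# A hypothetical preference in which the two worlds interact:
# A&C > B&C but A&D < B&D, so no per-world utility can reproduce it.
entangled = {("A", "C"): 2, ("B", "C"): 1, ("A", "D"): 1, ("B", "D"): 2}
entangled_ok = satisfies_independence(lambda x, y: entangled[(x, y)])

print(all_independent, entangled_ok)  # True False
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.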

It seems clear that our preferences do satisfy Independence, at least approximately. But why? (In this post I exclude indexical uncertainty from the discussion, because in that case I think Independence definitely doesn't apply.) One argument that Eliezer has made (in a somewhat different context) is that if our preferences didn’t satisfy Independence, then we would become money pumps. But that argument seems to assume agents who violate Independence, but try to use expected utility maximization anyway, in which case it wouldn’t be surprising that they behave inconsistently. In general, I think being a money pump requires having circular (i.e., intransitive) preferences, and it's quite possible to have transitive preferences that don't satisfy Independence (which is why Transitivity and Independence are listed as separate axioms in the axioms of rationality).

Why this representation?

Vladimir Nesov has pointed out that if a set of preferences can be represented by a probability function and a utility function, then it can also be represented by two probability functions. And furthermore we can “mix” these two probability functions together so that it’s no longer clear which one can be considered “beliefs” and which one “values”. So why do we have the particular representation of preferences that we do?
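One way to see how such a two-measure representation could work is sketched below. This is an assumption-laden illustration, not necessarily the exact construction in Nesov's post: it assumes a finite set of worlds, strictly positive utilities, and hypothetical names (`shouldness`, `mix`, the three sample worlds). The second measure is the prior reweighted by utility, and events are ranked by the ratio of the two measures. Mixing the measures with coefficients a, b, c, d such that ad - bc > 0 turns the ratio r into (c + d·r)/(a + b·r), which is increasing in r, so the ranking is unchanged even though neither mixed measure is "the prior" any more.

```python
from fractions import Fraction as F
from itertools import combinations

worlds = ("w1", "w2", "w3")
prior = {"w1": F(1, 2), "w2": F(1, 3), "w3": F(1, 6)}
utility = {"w1": F(1), "w2": F(4), "w3": F(2)}  # assumed strictly positive

# Second measure: prior weighted by utility, renormalized to sum to 1.
z = sum(prior[w] * utility[w] for w in worlds)
shouldness = {w: prior[w] * utility[w] / z for w in worlds}

def measure(m, event):
    return sum(m[w] for w in event)

def conditional_eu(event):
    """Ordinary expected utility, conditional on the event."""
    return sum(prior[w] * utility[w] for w in event) / measure(prior, event)

def ratio(p, q, event):
    """Rank events by q(event) / p(event) instead."""
    return measure(q, event) / measure(p, event)

def mix(p, q, a, b):
    """Convex mixture a*p + b*q (a + b = 1 keeps it a probability measure)."""
    return {w: a * p[w] + b * q[w] for w in worlds}

events = [e for n in (1, 2, 3) for e in combinations(worlds, n)]

# Mixed pair: denominator 3/4 P + 1/4 Q, numerator 1/4 P + 3/4 Q (ad - bc > 0).
mixed_p = mix(prior, shouldness, F(3, 4), F(1, 4))
mixed_q = mix(prior, shouldness, F(1, 4), F(3, 4))

by_eu = sorted(events, key=conditional_eu)
by_ratio = sorted(events, key=lambda e: ratio(prior, shouldness, e))
by_mixed = sorted(events, key=lambda e: ratio(mixed_p, mixed_q, e))

print(by_eu == by_ratio == by_mixed)  # True
```

All three orderings agree, which is the sense in which the split into "beliefs" and "values" is underdetermined by the preferences themselves in this toy setting.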

Is it possible that the dichotomy between beliefs and values is just an accidental byproduct of our evolution, perhaps a consequence of the specific environment that we’re adapted to, instead of a common feature of all rational minds? Unlike the case with anticipation, I don’t claim that this is true or even likely here, but it seems to me that we don’t understand things well enough yet to say that it’s definitely false and why that's so.

Just to distance this very interesting question from expected utility maximization: "Beliefs" sound like they are about couldness, and values about shouldness. Couldness is about behavior of the environment outside the agent, and shouldness is about behavior of the agent. Of course, the two only really exist in interaction, but as systems they can be conceptualized separately. When an agent asks what it could do, the question is really about what effects in the environment could be achieved (some Tarskian hypocrisy here: using "could" to explain "couldness"). Beliefs are what's assumed, and values are what's asserted. In a decision tree, beliefs are associated with knowledge about other agents' possible actions, and values with the choice of the present agent's action. Both are aspects of the system, but playing different roles in the interaction: making a choice versus accepting a choice. Naturally, there is a duality here, when the sides are exchanged: my values become your beliefs, and my beliefs become your values. Choice of representation is not that interesting, as it's all interpretation: nothing changes in behavior.

I gave an example where choice of representation is important: Eliezer's CEV. If the choice of representation shouldn't be important, then that seems to be an argument against CEV.

Bullet acknowledged and bitten. A Friendly AI attempting to identify humanity's supposed CEV will also have to be a politician and have enough support so that they don't shut it down. As a politician, it will have to appeal to people with the standard biases. So it's not enough for it to say, "okay, here's something all of you should agree on as a value, and benefit from me moving humanity to that state".

And in figuring out what would appeal to humans, it will have to model the same biases that blur the distinction.

I was referring to you referring to my post on playing with utility/prior representations.

It seems clear that our preferences do satisfy Independence, at least approximately.

How big of a problem does this simple example signify?

  • A = I acquire a Nintendo
  • B = I acquire a Playstation
  • C = I acquire a game for the Nintendo
  • D = I acquire a game for the Playstation
  • A&C > B&C but A&D < B&D

Your example shows that we can't assign utilities to events within a single world, like acquiring game systems and games, and then add them up into a utility for that world, but it's not a counterexample to Independence, because of this part:

A and B are what happens in one possible world, and C and D are what happens in another.

Independence is necessary to assign utilities to possible world histories and aggregate those utilities linearly into expected utility. Consider the apples/oranges example again. There,

  • A = I get an apple in the world where coin is heads
  • B = I get an orange in the world where coin is heads
  • C = I get an apple in the world where coin is tails
  • D = I get an orange in the world where coin is tails

Then, according to Independence, my preferences must be either

  1. A&C > B&C and A&D > B&D, or
  2. A&C < B&C and A&D < B&D

If case 1, I should pick the transparent box with the apple, and if case 2, I should pick the transparent box with the orange.

(I just realized that technically, my example is wrong, because in case 1, it's possible that A&D > A&C and B&D > B&C. Then, I should most prefer an opaque box that contains an apple if the coin is heads and an orange if the coin is tails, since that gives me outcome A&D, and least prefer an opaque box that contains the opposite (gives me B&C). So unless I introduce other assumptions, I can only derive that I shouldn't simultaneously prefer both kinds of opaque boxes to transparent boxes.)

I have a tentative answer for the second question of "Why this representation?". Given that a set of preferences can be represented as a probability function and a utility function, that seems computationally more convenient than using two probability functions, since then you only have to do half of the Bayesian updating.

Another part of this question is that such a set of preferences can usually be decomposed many different ways into probability and utility, so what explains the particular decomposition that we have? I think there should have been a selection pressure for humans to have a common prior, to the extent possible, and move as much as possible of the differences in preferences into the utility function, since that would facilitate communication and sharing of information. It seems that if we had common priors, and I have a lot of information about something (and you trust me), I can just tell you my posterior beliefs, instead of having to give you all of the raw information and let you recompute your own posterior beliefs.
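The point about common priors can be illustrated with a toy Bayesian update. The sketch below is hypothetical throughout: the two coin hypotheses, their likelihoods, the observation sequence, and the names `posterior`, `common_prior` and `other_prior` are all assumptions made for the example. With a shared prior, the posterior one person reports is exactly what the other would have computed from the raw data; with different priors, it is not, so the raw data would have to be sent instead.

```python
from fractions import Fraction as F

def posterior(prior, likelihoods, observations):
    """Bayesian update over a finite hypothesis set, one observation at a time."""
    post = dict(prior)
    for obs in observations:
        post = {h: post[h] * likelihoods[h][obs] for h in post}
        total = sum(post.values())
        post = {h: v / total for h, v in post.items()}
    return post

likelihoods = {
    "fair": {"H": F(1, 2), "T": F(1, 2)},
    "biased": {"H": F(3, 4), "T": F(1, 4)},
}
common_prior = {"fair": F(1, 2), "biased": F(1, 2)}
other_prior = {"fair": F(9, 10), "biased": F(1, 10)}
data = ["H", "H", "T", "H"]  # raw information only I have seen

my_posterior = posterior(common_prior, likelihoods, data)

# What you would have concluded from the raw data yourself:
yours_if_common_prior = posterior(common_prior, likelihoods, data)
yours_if_other_prior = posterior(other_prior, likelihoods, data)

print(my_posterior == yours_if_common_prior)  # True: my summary suffices
print(my_posterior == yours_if_other_prior)   # False: you would need the data
```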

"Of all the axioms, independence is the most often discarded. A variety of generalized expected utility theories have arisen, most of which drop or relax the independence axiom."

"Of all the axioms, independence is the most often discarded. A variety of generalized expected utility theories have arisen, most of which drop or relax the independence axiom."

The examples in the generalized expected utility link are descriptive theories of how humans are irrational money pumps. (The two bullet points after the sentence in Wikipedia are examples of conventional utility functions; in that context the sentence is false.)

I'm not sure what the bullet points are doing there either - but I don't really see how they impact the original statement.

Reminds me of the parallel postulate - non-Euclidean utility?

Paul Churchland calls the belief/values (he says belief/desires) model "folk psychology" and assigns a low probability to it "being smoothly reduced by neuroscience" rather than being completely disregarded like, say, the phlogiston theory of combustion. The paper is called Eliminative Materialism and the Propositional Attitudes and was printed in The Journal of Philosophy. I didn't find the paper all that convincing, but your mileage may vary.

This paper was cited along with another by someone (can't remember who) arguing that the belief/values theory of behavior (i.e. expected utility theory) doesn't capture how humans behave. The second paper I think argues that much of what we do can be explained by control theory without reference to beliefs or values, but I haven't read it yet.

The papers are:

Churchland, Paul. Eliminative Materialism and the Propositional Attitudes, The Journal of Philosophy.

van Gelder, Tim. What Might Cognition be, if not Computation?, The Journal of Philosophy.

For those of you who don't have the benefit of a university subscription to JSTOR or something similar, I have PDFs of both papers. Just shoot me an email at: themattsimpson AT DOT company

This comment is directly about the question of probability and utility. The division is not so much about considering the two things separately, as it is about extracting tractable understanding of the whole human preference (prior+utility) into a well-defined mathematical object (prior), while leaving all the hard issues with elicitation of preference in the utility part. In practice it works like this: a human conceptualizes a problem so that a prior (that is described completely) can be fed to an automatic tool, then tool's conclusion about the aspect specified as probability is interpreted by a human again. People fill in the utility part by using their preference, even though they can't represent it as the remaining utility part. Economists, having to create autonomous models of decision-making (as distinct from autonomous decision-making systems), have to introduce the whole preference, but it's so approximate that it's of no use in most other contexts.

Because of the utility-prior divide of human preference in practice of human decision-making, with only prior in the domain of things that are technically understood, there is a strong association of prior with "knowledge" (hence "belief", but being people of science we expel feeling-associated connotations from the concept), while utility remains vague, but is a necessary part that completes the picture to the expression of whole preference, hence introduction of utility to a problem is strongly associated with values.

But why do human preferences exhibit the (approximate) independence which allows the extraction to take place?

Simple. They don't.

Maybe it's just me, but this looks like another case of overextrapolation from a community of rationalists to all of humanity. You think about all the conversations you've had distinguishing beliefs from values, and you figure everyone else must think that way.

In reality, people don't normally make such a precise division. But don't take my word for it. Go up to your random mouthbreather and try to find out how well they adhere to a value/belief distinction. Ask them whether the utility assigned to an outcome, or its probability was a bigger factor.

No one actually does those calculations consciously; if anything like it is done non-consciously, it's extremely economical in computation.

Simple: the extraction cuts across preexisting independencies. (I don't quite see what you refer to by "extraction", but my answer seems general enough to cover most possibilities.)

I'm referring to the extraction that you were talking about: extracting human preference into prior and utility. Again, the question is why the necessary independence for this exists in the first place.

I was talking about extraction of prior about a narrow situation as the simple extractable aspect of preference, period. Utility is just the rest, what remains unextractable in preference.

Ok, I see. In that case, do you think there is still a puzzle to be solved, about why human preferences seem to have a large amount of independence (compared to, say, a set of randomly chosen transitive preferences), or not?

That's just a different puzzle. You are asking a question about properties of human preference now, not of prior/utility separation. I don't expect strict independence anywhere.

Independence is indifference, due to inability to see and precisely evaluate all consequences, made strict in form of probability, by decree of maximum entropy. If you know your preference about an event, but no preference/understanding on the uniform elements it consists of, you are indifferent to these elements -- hence maximum entropy rule, air molecules in the room. Multiple events for which you only care in themselves, but not in the way they interact, are modeled as independent.

[W]hy human preferences seem to have a large amount of independence (compared to, say, a set of randomly chosen transitive preferences)[?]

Randomness is info, so of course the result will be more complex. Where you are indifferent, random choice will fill in the blanks.

It sounds like what you're saying is that independence is a necessary consequence of our preferences having limited information. I had considered this possibility and don't think it's right, because I can give a set of preferences with little independence and also little information, just by choosing the preferences using a pseudorandom number generator.

I think there is still a puzzle here, why our preferences show a very specific kind of structure (non-randomness).

That new preference of yours still can't distinguish the states of air molecules in the room, even if some of these states are made logically impossible by what's known about macro-objects. This shows both the source of dependence in precise preference and of independence in real-world approximations of preference. Independence remains where there's no computed info that allows preference to be brought into contact with facts. Preference is defined procedurally in the mind, and its expression is limited by what can be procedurally figured out.

I don't really understand what you mean at this point. Take my apples/oranges example, which seems to have nothing to do with macro vs. micro. The Axiom of Independence says I shouldn't choose the 3rd box. Can you tell me whether you think that's right, or wrong (meaning I can rationally choose the 3rd box), and why?

To make that example clearer, let's say that the universe ends right after I eat the apple or orange, so there are no further consequences beyond that.

To make the example clearer, surely you would need to explain what the "" notation was supposed to mean.

It's from this paragraph of http://lesswrong.com/lw/15m/towards_a_new_decision_theory/ :

What if you have some uncertainty about which program our universe corresponds to? In that case, we have to specify preferences for the entire set of programs that our universe may correspond to. If your preferences for what happens in one such program are independent of what happens in another, then we can represent them by a probability distribution on the set of programs plus a utility function on the execution of each individual program. More generally, we can always represent your preferences as a utility function on vectors of the form <E1, E2, E3, …> where E1 is an execution history of P1, E2 is an execution history of P2, and so on.

In this case I'm assuming preferences for program executions that aren't independent of each other, so it falls into the "more generally" category.

Got an example?

You originally seemed to suggest that represented some set of preferences.

Now you seem to be saying that it is a bunch of vectors representing possible universes on which some unspecified utility function might operate.

I dispute your premise: what makes you so sure people do decompose their thoughts into beliefs and values, and find these to be natural, distinct categories? Consider the politics as mind-killer phenomenon. That can be expressed as, "People put your words into a broader context of whether they threaten their interests, and argue for or against your statements on that basis."

For example, consider the difficulty you will have communicating your position if you believe both a) global warming is unlikely to cause any significant problems in the business-as-usual scenario, b) high taxes on CO2 emissions should be levied. (e.g., you believe it's a good idea as an insurance policy and can be done in a way that blocks most of the economic damage)

(Yes, I had to use a present example to make the reactions easier to imagine.)

The "ought" is so tightly coupled to the "is", that in any case where the "ought" actually matters, the "is" comes along for the ride.

Note: this is related to the problem I had with the exposition of could/would/should agents: if you say humans are CSAs, what's an example of an intelligent agent that isn't?

I'm confused about this. Consider these statements:

A. "I believe that my shirt is red."
B. "I value cheese."

Are you claiming that:

  1. People don't actually make statements like A
  2. People don't actually make statements like B
  3. A is expressing the same sort of fact about the world as B
  4. Statements like A and B aren't completely separate; that is, they can have something to do with one another.

If you strictly mean 1 or 2, I can construct a counterexample. 3 is indeed counterintuitive to me. 4 seems uncontroversial (the putative is/ought problem aside)

If I had to say, it would be a strong version of 4: in conceptspace, people naturally make groupings that put is- and ought-statements together. But looking back at the post, I definitely have quite a bit to clarify.

When I refer to what humans do, I'm trying to look at the general case. Obviously, if you direct someone's attention to the issue of is/ought, then they can break down thoughts into values and beliefs without much training. However, in the absence of such a deliberate step, I do not think people normally make a distinction.

I'm reminded of the explanation in pjeby's earlier piece: people instinctively put xml-tags of "good" or "bad" onto different things, blurring the distinction between "X is good" and "Y is a reason to deem X good". That is why we have to worry about the halo effect, where you disbelieve everything negative about something you value, even if such negatives are woefully insufficient to justify not valuing it.

From the computational perspective, this can be viewed as a shortcut to having to methodically analyze all the positives and negatives of any course of action, and getting stuck thinking instead of acting. But if this is how the mind really works, it's not really reducible to a CSA, without severe stretching of the meaning.

Seconded. Sometimes I don't even feel I have fully separate beliefs and values. For instance, I'm often willing to change my beliefs to achieve my values (e.g., by believing something I have no evidence for, to become friends with other people who believe it - and yes, ungrounded beliefs can be adopted voluntarily to an extent.)

ungrounded beliefs can be adopted voluntarily to an extent.

I cannot do this, and I don't understand anyone who can. If you consciously say "OK, it would be really nice to believe X, now I am going to try really hard to start believing it despite the evidence against it", then you already disbelieve X.

I already disbelieve X, true, but I can change that. Of course it doesn't happen in a moment :-)

Yes, you can't create that feeling of rational knowledge about X from nothing. But if you can retreat from rationality - to where most people live their lives - and if you repeat X often enough, and you have no strongly emotional reason not to believe X, and your family and peers and role models all profess X, and X behaves like a good in-group distinguishing mark - then I think you have a good chance of coming to believe X. The kind of belief associated with faith and sports team fandom.

It's a little like the recent thread where someone, I forget who, described an (edit: hypothetical) religious guy who when drunk confessed that he didn't really believe in god and was only acting religious for the social benefits. Then people argued that no "really" religious person would honestly say that, and other people argued that even if he said that what does it mean if he honestly denies it whenever he's sober?

In the end I subscribe to the "PR consciousness" theory that says consciousness functions to create and project a self-image that we want others to believe in. We consciously believe many things about ourselves that are completely at odds with how we actually behave and the goals we actually seek. So it would be surprising if we couldn't invoke these mechanisms in at least some circumstances.

someone, I forget who, described a religious guy who when drunk confessed that he didn't really believe in god and was only acting religious for the social benefits.

generalizing from fictional evidence

When I wrote that I was aware that it was a fictional account deliberately made up to illustrate a point. I didn't mention that, though, so I created fictional evidence. Thanks for flagging this, and I should be more careful!

Worse: fictional evidence flagged as nonfictional -- like Alicorn's fictional MIT classmates that time.

My what now? I think that was someone else. I don't think I've been associated with MIT till now.

MIT not only didn't accept me when I applied, they didn't even reject me. I never heard back from them yea or nay at all.

That was me.

Of course, irony being what it is, people will now flag the Alicorn - MIT reference as nonfictional, and be referring to Alicorn's MIT example for the rest of LW history :)

Attempting to analyze my own stupidity, I suspect my confusion came from (1) Alicorn and Yvain both being high-karma contributors and (2) Alicorn's handle coming more readily to mind, both because (a) I interacted more with her and (b) the pronunciation of "Alicorn" being more obvious than that of "Yvain".

In other words, I have no evidence that this was anything other than an ordinary mistake.

I've been imagining "Yvain" to be pronounced "ee-vane". I'd be interested in hearing a correction straight from the ee-vane's mouth if this is not right, though ;) I've heard people mispronounce "Alicorn" on multiple occasions.

I've heard people mispronounce "Alicorn" on multiple occasions.

You mean Alicorn is a real name? I had assumed a combination of Alison and Unicorn, with symbolic implications beyond my ken.

I've been imagining "Yvain" to be pronounced "ee-vane".

"Ye-vane" here, with the caveat that I was quite confident that it was way off.

No, it's not a real name (as far as I know). It's a real word. It means a unicorn's horn, although there are some modern misuses mostly spearheaded by Piers Anthony (gag hack cough).

Ahh. And I've been going about calling them, well, unicorn horns all these years!

I've been saying "al-eh-corn" in my mental consciousness. Also "ee-vane", which suggests my problem being less "Yvain is hard to pronounce" than "Yvain doesn't look like the English I grew up speaking".

Incidentally, I can't remember how to pronounce Eliezer. I saw him say it at the beginning of a Bloggingheads video and it was completely different from my naive reading.

"Alicorn" is pronounced just like "unicorn", except that the "yoon" is replaced with "al" as in "Albert" or "Alabama". So the I is an "ih", not an "eh", but you can get away with an undifferentiated schwa.


(I think that's how I was saying it, actually - I wasn't sure how to write the second syllable.)

What's fictional about that?

Ready to pony up money for a bet that I can't produce a warm body meeting that description?

I prefer not to gamble, but just to satisfy my own curiosity: what would the controls be on such a bet? Presumably you would have to prove to Knight's satisfaction that your unbelieving belief-signaler was legitimately thus.

I think my evidence is strong enough I can trust Douglas_Knight's own intellectual integrity.

I think my evidence is strong enough I can trust Douglas_Knight's own intellectual integrity.

Huh. My last couple of interactions with you, you called me a liar.

Okay, I found what I think you're referring to. Probably not my greatest moment here, but is that really something you want sympathy for? Here's the short version of what happened.

You: If you think your comment was so important, don't leave it buried deep in the discussion, where nobody can see it.

Me: But I also linked to it from a more visible place. Did you not know about that?

You: [Ignoring previous mischaracterization] Well, that doesn't solve the problem of context. I clicked on it and couldn't understand it, and it seemed boring.

Me: Wait, you claim to be interested in a solution, I post a link saying I have one, and it's too much of a bother to read previous comments for context? That doesn't make sense. Your previous comment implies you didn't know about the higher link. Don't dig yourself deeper by covering it up.

Oh, yeah, I'd forgotten that one. Actually, I was thinking of the following week.

Is that really something you want sympathy for?

I just want you to go away. I was hoping that reminding you that you don't believe me would discourage you from talking to me.

Oh, yeah, I'd forgotten that one. Actually, I was thinking of the following week.

That's not calling you a liar. That's criticizing the merit of your argument. There's a difference.

That's not calling you a liar. That's criticizing the merit of your argument. There's a difference.

The link provided by Douglas seems to suggest that Douglas's accusation is false (as well as ineffective).


Well, what possessed you to lie to me? ;-)

j/k, j/k, you're good, you're good.

A link would be nice though.

And I believe that, even taking into account any previous mistrust I might have had of you, my evidence is still strong enough that I can trust you to consider it conclusive.

This assumption is central to establishing the mathematical structure of expected utility maximization, where you value each possible world separately using the utility function, then take their weighted average. If your preferences were such that A&C > B&C but A&D < B&D, then you wouldn’t be able to do this.

I can imagine having preferences that don't value each possible world separately. I can also imagine doing other things to my utility function than maximising expectation. For example, if I maximised the top quartile of expected values then I may choose to engage in practices analogous to quantum suicide. That I prefer, in principle, to maximise expected utility is itself a value. It is a value that I expect to see in most successful agents, for fundamental reasons.

Here, have a mathematical perspective that conflates beliefs and values:

Suppose that some agent is given a choice between A and B. A is an apple. B is an N chance of a banana, otherwise nothing. The important thing here is the ambivalence equation: iff U(apple) = N*U(banana), the agent is ambivalent between the apple and the banana. Further suppose that N is 50%, and the agent likes bananas twice as much as it likes apples. In this case, at least, the agent might as well modify itself to believe that N is 20% and to like bananas five times as much as apples.
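The arithmetic in that example can be checked directly. The snippet below is a minimal sketch with hypothetical names (`option_values`, `rescaled`); it fixes U(apple) = 1 as an arbitrary unit and confirms that dividing the probability by some factor while multiplying the banana's utility by the same factor leaves the expected utility of every option, and hence every choice, unchanged.

```python
from fractions import Fraction as F

def option_values(p_banana, u_apple, u_banana):
    """Expected utility of each option: A = sure apple, B = chance of banana."""
    return {"A": u_apple, "B": p_banana * u_banana}

# Original agent: N = 50%, bananas liked twice as much as apples.
original = option_values(F(1, 2), F(1), F(2))

# Self-modified agent: N = 20%, bananas liked five times as much.
modified = option_values(F(1, 5), F(1), F(5))

def rescaled(p_banana, u_banana, k):
    """Trade belief against value: probability / k, utility * k."""
    return option_values(p_banana / k, F(1), u_banana * k)

print(original == modified)                           # True
print(rescaled(F(1, 2), F(2), F(7)) == original)      # True for any k > 0
```

The rescaling only stays a valid probability when the new value of N is at most 1, which is why the factor cannot be arbitrarily small.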

Now, doing this might result in inconsistencies elsewhere, but I'm guessing that a rational agent will be able to apply transformations to its beliefs and values--but only both simultaneously--so as to preserve expected utility given actions.

I think a more concrete example of the beliefs-values meld is anthropic reasoning. Take the presumptuous friend: you, the presumptuous philosopher's presumptuous friend, have just been split into 1,000,001 branches, and each of those branches has been placed in a hotel room, 1,000,000 rooms being in one hotel, and the other room being in another hotel. Is the probability that you're in the small hotel 50%, negligible, or something in between? Well, that depends: if you care about each of your branches equally, it's negligible; if you care about each hotel equally, it's 50%.
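The two answers in the hotel example can be reproduced by treating "probability of being in the small hotel" as the share of total caring weight that falls on the small hotel's branches. This is only a sketch of that reading; the function name `weight_on_small_hotel` and the idea of normalizing caring weights this way are assumptions made for illustration.

```python
from fractions import Fraction as F

BIG_ROOMS = 1_000_000   # branches placed in the large hotel
SMALL_ROOMS = 1         # branches placed in the small hotel

def weight_on_small_hotel(care_per_big_branch, care_per_small_branch):
    """Share of total caring weight that falls on the small hotel."""
    small = SMALL_ROOMS * care_per_small_branch
    big = BIG_ROOMS * care_per_big_branch
    return small / (small + big)

# Caring about every branch equally.
equal_per_branch = weight_on_small_hotel(F(1), F(1))

# Caring about each hotel equally: a hotel's unit of care is split among its rooms.
equal_per_hotel = weight_on_small_hotel(F(1, BIG_ROOMS), F(1, SMALL_ROOMS))

print(equal_per_branch)  # 1/1000001
print(equal_per_hotel)   # 1/2
```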

I think values (in a finite agent) also need to have some role in what beliefs "should" be stored/updated/remembered. Of course, in theories which don't constrain the agent's computational ability, this isn't needed.

I think I tried to solve a similar problem before: that of looking at the simplest possible stable control system and seeing how I can extract the system's "beliefs" and "values" that result in it remaining stable. Then, see if I can find a continuous change between the structure of that system, and a more complex system, like a human.

For example, consider the simple spring-mass-damper system. If you move it from its equilibrium position xe, it will return. What do the concepts of "belief" and "value" map onto here? For beliefs, I used the concept of mutual information: what about the system could you look at to learn whether the mass is not at xe? How does the system know it's not at xe?

The information is contained in the force the spring exerts. However, this is also the determinant of which direction it moves the mass, its "value". So it looks like the beliefs and values are fully mixed: the same thing that tells you what it believes tells you what it does. In that case, at what point, in the structural transition from the spring to intelligent agents, does the distinction between values and beliefs start to form, if at all?
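A short simulation shows the sense in which the spring force plays both roles. It is a sketch under assumed parameters (unit mass, stiffness 4, damping 1, a 1 ms time step, semi-implicit Euler integration), none of which come from the comment above. At every step the spring force is enough to recover the displacement from equilibrium (the "belief"), and the same quantity drives the mass back toward equilibrium (the "value").

```python
def simulate(x0, x_e=0.0, m=1.0, k=4.0, c=1.0, dt=0.001, steps=30_000):
    """Spring-mass-damper: m*x'' = -k*(x - x_e) - c*x', semi-implicit Euler."""
    x, v = x0, 0.0
    history = []
    for _ in range(steps):
        spring_force = -k * (x - x_e)
        history.append((x, spring_force))
        a = (spring_force - c * v) / m   # spring plus damping
        v += a * dt
        x += v * dt
    return x, history

final_x, history = simulate(x0=1.0)

# "Belief": the displacement can be read off the spring force alone.
recovered = [0.0 - force / 4.0 for _, force in history]
max_error = max(abs(r - x) for r, (x, _) in zip(recovered, history))

# "Value": the same force always points back toward equilibrium.
restoring = all(force * x <= 0 for x, force in history)

print(abs(final_x) < 1e-3, max_error < 1e-12, restoring)
```

After 30 simulated seconds the mass has settled close to the equilibrium position, having used nothing but that one signal.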

Incidentally, I only just now read Vladimir_Nesov's post because previously I hadn't bothered to make the equations readable, since they don't render properly in the browser I used. "Beware trivial inconveniences", indeed!

(Separate post because of the different issues and to avoid tl;dr.)

It's not the result of an "accidental" product of evolution that organisms are goal-directed and have values. Evolution made creatures that way for a reason - organisms that pursue their biological goals (without "updating" them) typically have more offspring and leave more descendants.

Mixing up your beliefs and values would be an enormous mistake - in the eyes of evolution. You might then "update" your values - trashing them in the process - a monumental disaster for your immortal coils.

Since I'm often annoyed when my posts are downvoted without explanation, and I saw that this post was downvoted, I'll try to explain the downvotes.

Updating of values happens all the time; it's called operant conditioning. If my dog barks and immediately is poked with a hot poker, its value of barking is updated. This is a useful adaptation, as being poked with a hot poker decreases fitness. If my dog tries to mate and immediately receives an electric shock, its value of mating is decreased. This is a harmful adaptation, as mating is a more fundamental fitness factor than electric shocks.

So, you seem to be explaining an observation that is not observed using a fact that is not true.

Your disagreement apparently arises through using the term "value" in a different sense from me. If it helps you to understand, I am talking about what are sometimes called "ultimate values".

Most organisms don't update their values. They value the things evolution built into them - food, sex, warmth, freedom from pain, etc. Their values typically remain unchanged throughout their lives.

From my perspective, the dog's values aren't changed in your example. The dog merely associates barking with pain. The belief that a bark is likely to be followed by a poker prod is a belief, not a value. The dog still values pain-avoidance - just as it always did.

We actually have some theory that indicates that true values should change rarely. Organisms should protect their values - since changes to their values are seen as being very "bad" - in the context of the current values. Also, evolution wires in fitness-promoting values. These ideas help to explain why fixed values are actually extremely common.

Those are good points, but I still find your argument problematic.

First, do you know that dogs are capable of the abstract thought necessary to represent causality? You're saying that the dog has added the belief "bark causes pain", which combines with "pain bad".

That may be how a programmer would try to represent it, since you can rely on the computational power necessary to sweep through the search space quickly and find the "pain bad" module every time a "reason to bark" comes up. But is it good as a biological model? It requires the dog to indefinitely keep a concept of a prod in memory.

A simpler biological mechanism, consistent with the rest of neurobiology, would be to just lower the connection strengths that lead to the "barking" neuron so that it requires more activation of other "barking causes" to make it fire (and thus make the dog bark). I think that's a more reasonable model of how operant conditioning works in this context.

This mechanism, in turn, is better described as lowering the "shouldness" of barking, which is ambiguous with respect to whether it's a value or belief.
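
As a toy version of that mechanism (a sketch only: a single threshold unit with made-up weights, not a claim about real dog neurobiology; `barks` and `punish` are hypothetical names):

```python
def barks(inputs, weights, threshold=1.0):
    """The 'barking' unit fires when weighted input reaches the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return activation >= threshold

def punish(weights, inputs, rate=0.5):
    """Weaken only the connections that were active when the prod arrived."""
    return [w * (1 - rate) if x > 0 else w for w, x in zip(weights, inputs)]

weights = [0.6, 0.6, 0.6]           # three possible "reasons to bark"
stimulus = [1, 1, 0]                # two of them are present

print(barks(stimulus, weights))     # True: two causes are enough
weights = punish(weights, stimulus)
print(barks(stimulus, weights))     # False: same causes no longer suffice
print(barks([1, 1, 1], weights))    # True: it now takes more activation
```

Nothing in `weights` is labelled as a belief about prods or as a value placed on barking; there is just one set of numbers that went down.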

It seems to be a common criticism of utility-based models that they do not map directly onto underlying biological hardware.

That is true - but it is not what such models are for in the first place. Nobody thinks that if you slice open an animal you will find a utility function, and some representation of utility inside.

The idea is more that you could build a functionally equivalent model which exhibited such an architecture - and then gain insight into the behaviour of the model by examining its utility function.

I'm concerned with the weaker constraint that the model must conceptually map to the biological hardware, and in this respect the utility-based model you gave doesn't work. There is no distinction, even conceptual, between values and beliefs: just synaptic weights from the causes-of-barking nodes, to the bark node.

Furthermore, the utility-based model does not give insight, because the "shortcuts" resulting from the neural hardware are fundamental to its operation. For example, the fact that it comes up with a quick, simple calculation affects how many options can be considered and therefore whether e.g. value transitivity will break down.

So the utility-based model is more complex than a neural network, and with worse predictive power, so it doesn't let you claim that its change in behavior resulted from beliefs rather than values.

Values are fixed, while many beliefs vary in response to sensory input.

You don't seem to appreciate the value of a utility based analysis.

Knowing that an animal likes food and sex, and doesn't like being hit provides all kinds of insights into its behaviour.

Such an analysis is much simpler than a neural network is, and it has the advantage that we can actually build and use the model - rather than merely dream about doing so in the far future, when computers are big enough to handle it, and neuroscience has advanced sufficiently.

That's not a very fair comparison! You're looking at the most detailed version of a neural network (which I would reject as a model anyway for the very reason that it needs much more resources than real brains to work) and comparing it to a simple utility-based model, and then sneaking in your intuitions for the UBM, but not the neural network (as RobinZ noted).

I could just as easily turn the tables and compare the second neural network here to a UDT-like utility-based model, where you have to compute your action in every possible scenario, no matter how improbable.

Anyway, I was criticizing utility-based models, in which you weight the possible outcomes by their probability. That involves a lot more than the vague notion that an animal "likes food and sex".

Of course, as you note, even knowing that it likes food and sex gives some insight. But it clearly breaks down here: the dog's decision to bark is made very quickly, and having to do an actual human-insight-free, algorithmic computation of expected utilities, involving estimates of their probabilities, takes way too long to be a realistic model. The shortcuts used in a neural network skew the dog's actions in predictable ways, showing them to be a better model, and showing the value/belief distinction to break down.

I am still not very sympathetic to the idea that neural network models are simple. They include the utility function and all the creature's beliefs.

A utility based model is useful - in part - since it abstracts those beliefs away.

Plus neural network models are renowned for being opaque and incomprehensible.

You seem to have some strange beliefs in this area. AFAICS, you can't make blanket statements like: neural-net models are more accurate. Both types of model can represent observed behaviour to any desired degree of precision.

You're using a narrower definition of neural network than I am. Again, refer to the last link I gave for an example of a simple neural network, which is equal to or less than the complexity of typical expected utility models. That NN is far from being opaque and incomprehensible, wouldn't you agree?

I am still not very sympathetic to the idea that neural network models are simple. They include the utility function and all the creature's beliefs.

No, they just have activation weights, which don't (afaict) distinguish between beliefs and values, or at least, don't distinguish between "barking causes a prod which is bad" and "barking isn't as good (or perhaps, as 'shouldish')".

A utility based model is useful - in part - since it abstracts those beliefs away.

The UBMs discussed in this context (see TL post) necessarily include probability weightings, which are used to compute expected utility, which factors in the tradeoffs between probability of an event and its utility. So it's certainly not abstracting those beliefs away.

Plus, you've spent the whole conversation explaining why your UBM of the dog allows you to classify the operant conditioning (of prodding the dog when it barks) as changing its beliefs and NOT its values. Do you remember that?

Correct me if I'm wrong, but it's only simpler if you already have a general-purpose optimizer ready to hand - in this case, you.

You have to have complicated scientists around to construct any scientific model - be it utility-based or ANN.

Since we have plenty of scientists around, I don't see much point in hypothesizing that there aren't any.

You seem to be implying that the complexity of utility based models lies in those who invent or use them. That seems to be mostly wrong to me: it doesn't matter who invented them, and fairly simple computer programs can still use them.

If you've seen it work, I'll take your word for it.

Incidentally, I did not claim that dogs can perform abstract thinking - I'm not clear on where you are getting that idea from.

You said that the dog had a belief that a bark is always followed by a poker prod. This posits separate entities and a way that they interact, which looks to me like abstract thought.

The definition of "abstract thought" seems like a can of worms to me.

I don't really see why I should go there.

Hm, I never before realized that operant conditioning is a blurring of the beliefs and values -- the new frequency of barking can be explained either by a change of the utility of barking, or by a change in the belief about what will result from the barking.
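
To make the two readings concrete, here is a minimal sketch with invented numbers, assuming the dog barks whenever the expected utility of barking is positive (`eu_bark` and all the values below are illustrative assumptions):

```python
def eu_bark(u_bark, p_prod, u_pain):
    """Expected utility of barking: intrinsic value plus the risk of a prod."""
    return u_bark + p_prod * u_pain

# Before conditioning: barking is mildly good and no prod is expected.
before = eu_bark(u_bark=1.0, p_prod=0.0, u_pain=-18.0)

# Reading 1: a belief changed (prods now expected half the time).
belief_update = eu_bark(u_bark=1.0, p_prod=0.5, u_pain=-18.0)

# Reading 2: a value changed (barking itself became unpleasant).
value_update = eu_bark(u_bark=-8.0, p_prod=0.0, u_pain=-18.0)

print(before, belief_update, value_update)  # 1.0 -8.0 -8.0
```

Both updates give the same expected utility and so the same drop in barking; observed behaviour alone does not say which parameter moved.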

IMO, "a blurring of beliefs and values" is an unhelpful way of looking at what happens. It is best to consider an agent as valuing freedom from pain, and the association between barking and poker prods to be one of its beliefs.

If you have separated out values from beliefs in a way that leads to frequently updated values, all that means is that you have performed the abstraction incorrectly.

Because a comment is down-voted, that doesn't mean it is incorrect.

This particular comment implicitly linked people's values to their reproductive success. People don't like to hear that they are robot vehicles built to propagate their genes. It offends their sense of self-worth. Their mental marketing department spends all day telling everyone what an altruistic and nice person they are - and they repeat it so many times that they come to believe it themselves. That way their message comes across with sincerity. So: the possibility of biology underlying their motives is a truth that they often want to bury - and place as far out of sight as possible.

While we can never escape our biology entirely, I dispute any suggestion that the selfish gene is always the best level of abstraction, or best model, for human behavior. I assume you agree even though that did not come across in this paragraph.

Human behaviour is often illuminated by the concept of memes. Humans are also influenced by the genes of their pathogens (or other manipulators). If you cough or sneeze, that behaviour is probably not occurring because it benefits you.

Similarly with cancer or back pain - not everything is an adaptation.

Or the dog values not being in pain more than it values barking or mating...

Maybe these are to do with differences across individuals. My beliefs/values may be mashed together and impossible to separate, but I expect other people's beliefs to mirror my own more closely than their values do.

Because it's much easier to use beliefs shorn of values as building blocks in a machine that does induction, inference, counterfactual reasoning, planning etc compared to belief-values that are all tied up together.

Sea slugs and Roombas don't have the beliefs/values separation because the extra complexity isn't worth it. Humans have it to some degree and rule the planet. AIs might have even more success.

Is it possible that the dichotomy between beliefs and values is just an accidental byproduct of our evolution, perhaps a consequence of the specific environment that we’re adapted to, instead of a common feature of all rational minds?

In the normal usage, "mind" implies the existence of a distinction between beliefs and values. In the LW/OB usage, it implies that the mind is connected to some actuators and sensors which connect to an environment and is actually doing some optimization toward those values. Certainly "rational mind" entails a beliefs/values separation.

But suppose we abandon the beliefs/values separation: what properties do we have left? Is the concept "mind without a beliefs/values separation" simply the concept "thing"?

But suppose we abandon the beliefs/values separation: what properties do we have left? Is the concept "mind without a beliefs/values separation" simply the concept "thing"?

An agent using UDT doesn't necessarily have a beliefs/values separation, but still has the properties of preferences and decision making. Or at least, it only has beliefs about mathematical facts, not about empirical facts. Maybe I should have made it clear that I was mainly talking about empirical beliefs in the post.

Not quite true: state of knowledge corresponds to beliefs. It's values that don't update (but in expected utility maximization that's both utility and prior). Again, it's misleading to equate beliefs with prior and forget about the knowledge (event that conditions the current state).

Yes, I agree we can interpret UDT as having its own dichotomy between beliefs and values, but the dividing line looks very different from how humans divide between beliefs and values, which seems closer to the probability/utility divide.

UDT is invariant with respect to what universe it's actually in. This requires it to compute over infinite universes and thus have infinite computing power. It's not hard to see why it's going to break down as a model of in-universe, limited beings.

What do you mean? It has a utility function just like most other decision theories do. The preferences are represented by the utility function.

How, then, would you describe its representation of empirical information - if not as "beliefs"?

An agent using UDT doesn't necessarily have a beliefs/values separation,

I am behind on your recent work on UDT; this fact comes as a shock to me. Can you provide a link to a post of yours/provide an example here making clear that UDT doesn't necessarily have a beliefs/values separation? Thanks.

Suppose I offer you three boxes and ask you to choose one. The first two are transparent, free, and contain an apple and an orange, respectively. The third is opaque, costs a penny, and contains either an apple or an orange, depending on a coin flip I made. Under expected utility maximization, there is no reason for you to choose the third box, regardless of your probability function and utility function. Under UDT1, you can choose the third box, by preferring (apple, orange) to (apple, apple) and (orange, orange) as the outcomes of world programs P1 and P2. In that case, you can't be said to have a belief about whether the real world is P1 or P2.
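
The expected-utility half of this claim can be checked numerically. The sketch below makes two assumptions beyond the example: utility depends only on which fruit you end up with, and the penny subtracts a fixed positive amount. All names and ranges are illustrative.

```python
import random

def expected_utilities(u_apple, u_orange, p_apple, penny_cost):
    """Expected utility of each box for a given probability and utility function."""
    box1 = u_apple                                   # transparent, apple, free
    box2 = u_orange                                  # transparent, orange, free
    box3 = p_apple * u_apple + (1 - p_apple) * u_orange - penny_cost
    return box1, box2, box3

def third_box_ever_best(trials=10_000, seed=0):
    """Search random utilities and probabilities for a case where box 3 wins or ties."""
    rng = random.Random(seed)
    for _ in range(trials):
        u_a, u_o = rng.uniform(-10, 10), rng.uniform(-10, 10)
        p = rng.random()
        cost = rng.uniform(0.01, 1.0)
        b1, b2, b3 = expected_utilities(u_a, u_o, p, cost)
        if b3 >= max(b1, b2):
            return True
    return False

print(third_box_ever_best())  # False
```

The search comes up empty because box 3 is a weighted average of the other two minus the cost, so it can never reach the better of them. Under these assumptions, choosing it requires preferences over the pair of outcomes across P1 and P2 rather than over each outcome separately.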

This example seems unclear. Are you seriously claiming utility maximisation can't prefer a randomised outcome in an iterated situation? If so, you take this "independence" business much too far.

Utility maximising agents can do things like prefer a diverse diet. They simply do not have to prefer either apples or oranges - thereby winding up with vitamin and mineral deficiencies. It is trivial to create a utility function which exhibits fruit preferences which depend on what you have eaten most recently.
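
For instance, a minimal sketch of such a history-dependent utility function (the numbers 1.0 and 0.2 and the function names are arbitrary choices for illustration):

```python
def fruit_utility(fruit, last_eaten):
    """A repeat of the previous fruit is worth less than a change."""
    return 0.2 if fruit == last_eaten else 1.0

def choose(last_eaten, options=("apple", "orange")):
    """Pick the option with the highest utility given recent history."""
    return max(options, key=lambda fruit: fruit_utility(fruit, last_eaten))

def diet(days, first="apple"):
    """Greedy sequence of choices over a number of days."""
    eaten = [first]
    for _ in range(days - 1):
        eaten.append(choose(eaten[-1]))
    return eaten

print(diet(4))  # ['apple', 'orange', 'apple', 'orange']
```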

Randomization only maximizes diversity if you have to make decisions under amnesia or coordinate without communication or some similar perverse situation. In any normal case, you're better off choosing a deterministic sequence that's definitely diverse, rather than leaving it to randomness and only probably getting a diverse set of outcomes.
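
A quick way to see the gap, assuming fair independent coin flips over n days and taking "diverse" to mean at least one of each fruit (both assumptions are mine, for illustration):

```python
from fractions import Fraction

def p_random_has_both(n_days):
    """Chance that n fair coin flips yield at least one apple and one orange."""
    # Only the all-apple and all-orange sequences fail, each with probability 2**-n.
    return 1 - Fraction(2, 2 ** n_days)

def alternating_has_both(n_days):
    """A fixed apple/orange alternation is diverse as soon as there are two days."""
    return n_days >= 2

print(p_random_has_both(2))       # 1/2
print(p_random_has_both(10))      # 511/512
print(alternating_has_both(2))    # True
```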

Sure - but that seems rather tangential to the main point here.

The options were (apple, apple), (orange, orange) - or a more expensive random choice. A random diet may not be perfect - but it was probably the best one on offer in the case of this example.

If the agent already has a penny (which they must if they can afford to choose the third box), they could just flip the penny to decide which of the first two boxes to take and save themselves the money.

Unless you're being a devil's advocate, I don't see any reason to justify a completely rational agent choosing the random box.

What - never? Say they can only make the choice once - and their answer determines which box they will get on all future occasions.

Then choice C isn't a random mixture of choice A and choice B.

Preferring that there be randomness at a point where you otherwise wouldn't get a decision at all, is fine. What doesn't happen is preferring one coin-flip in place of one decision.

Not to be crass, but given the assumption that Wei_Dai is not saying something utterly asinine, does your interpretation of the hypothetical actually follow?

Hang on! My last comment was a reply to your question about when it could be rational to select the third box. I have already said that the original example was unclear. It certainly didn't suggest an infinite sequence - and I wasn't trying to suggest that.

The example specified that choosing the third box was the correct answer - under the author's own proposed decision theory. Surely interpretations of what it was supposed to mean should bear that in mind.

I don't believe we're actually arguing about anything worth caring about. My understanding was that Wei_Dai was illustrating a problem with UDT1 - in which case a single scenario in which UDT1 gives an unambiguously wrong answer suffices. To disprove Wei_Dai's assertion requires demonstrating that no scenario of the kind proposed makes UDT1 give the wrong answer, not showing that not every scenario of the kind proposed makes UDT1 give the wrong answer.

Are you sure you are taking the fact that he is UDT's inventor and biggest fan into account? He certainly didn't claim that he was illustrating a problem with UDT.

...you're right, I'm misreading. I'll shut up now.

Okay, let's see if I have this straight - you're assuming:

  1. the axiom of independence is necessary for expected utility theory
  2. losing a penny represents some negative amount of utility
  3. one's utility function can't include terms for "the outcomes of world programs" under expected utility theory

Under expected utility maximization, there is no reason for you to choose the third box, regardless of your probability function and utility function. Under UDT1, you can choose the third box, by preferring (apple, orange) to (apple, apple) and (orange, orange) as the outcomes of world programs P1 and P2. In that case, you can't be said to have a belief about whether the real world is P1 or P2.

You lost me. Is 'apples' supposed to be plural? Can you really not choose the third box regardless of utility function? What if you prefer things that came in opaque boxes?

It's not supposed to be plural. Fixed.

The opaque box was a way of framing the problem, and not part of the problem itself, which is supposed to be about your preferences for apples and oranges. I can specify the problem in terms of three identical buttons that you can press instead.