Secrets of the eliminati

[-][anonymous]14y400

I wonder:

if you had an agent that obviously did have goals (let's say, a player in a game, whose goal is to win, and who plays the optimal strategy) could you deduce those goals from behavior alone?

Let's say you're studying the game of Connect Four, but you have no idea what constitutes "winning" or "losing." You watch enough games that you can map out a game tree. In state X of the world, a player chooses option A over other possible options, and so on. From that game tree, can you deduce that the goal of the game was to get four pieces in a row?

I don't know the answer to this question. But it seems important. If it's possible to identify, given a set of behaviors, what goal they're aimed at, then we can test behaviors (human, animal, algorithmic) for hidden goals. If it's not possible, that's very important as well; because that means that even in a simple game, where we know by construction that the players are "rational" goal-maximizing agents, we can't detect what their goals are from their behavior.

That would mean that behaviors that "seem" goal-less, programs that have no line of code representing a goal, may in fact be beh...

[-]Wei Dai14y190

> From that game tree, can you deduce that the goal of the game was to get four pieces in a row?

One method that would work for this example is to iterate over all possible goals in ascending complexity, and check which one would generate that game tree. How to apply this idea to humans is unclear. See here for a previous discussion.
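As a rough illustration of "iterate over candidate goals and check which one would generate the observed play", here is a minimal sketch. It does not use Connect Four: it assumes a much smaller toy game (a pile of stones, each player removes 1 or 2), and only two candidate goals whose ordering by "complexity" is simply asserted. The names `CANDIDATE_GOALS`, `make_value`, `optimal_moves` and `infer_goal` are hypothetical and exist only for this example.

```python
from functools import lru_cache

MOVES = (1, 2)  # toy rule (assumed): a player removes 1 or 2 stones per turn

# Candidate goals in an assumed order of increasing complexity.
# The number is the value, for the player to move, of facing an empty pile.
CANDIDATE_GOALS = [
    ("taking the last stone wins", -1),   # empty pile on your turn: you lost
    ("taking the last stone loses", +1),  # empty pile on your turn: you won
]

def make_value(empty_value):
    """Build a memoised minimax evaluator for one candidate goal."""
    @lru_cache(maxsize=None)
    def value(pile):
        if pile == 0:
            return empty_value
        return max(-value(pile - m) for m in MOVES if m <= pile)
    return value

def optimal_moves(pile, value):
    """All moves that achieve the best reachable outcome from this pile."""
    best = value(pile)
    return {m for m in MOVES if m <= pile and -value(pile - m) == best}

def infer_goal(observations):
    """observations: (pile_size, move_taken) pairs from a player assumed optimal.

    Returns the first candidate goal under which every observed move is optimal,
    or None if no candidate fits.
    """
    for name, empty_value in CANDIDATE_GOALS:
        value = make_value(empty_value)
        if all(move in optimal_moves(pile, value) for pile, move in observations):
            return name
    return None

print(infer_goal([(4, 1), (5, 2), (2, 2)]))  # consistent with "last stone wins"
print(infer_goal([(2, 1), (3, 2), (5, 1)]))  # consistent with "last stone loses"
```

Note that a single observation from a position that is lost under a given goal (for example `(3, 1)` under "last stone wins") rules nothing out, since every move is equally good there. The sketch simply returns the first consistent candidate in that case.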

3[anonymous]14y

Ok, computationally awful for anything complicated, but possible in principle for simple games. That's good, though; that means goals aren't truly invisible, just inconvenient to deduce.

2chatquitevoit14y

I think, actually, because we hardly ever play with optimal strategy, goals are going to be nigh impossible to deduce. Would such an end-from-means deduction even work if the actor was not using the optimal strategy? Because humans only do so in games on the level of tic-tac-toe (the more rational ones maybe in more complex situations, but not by much), and as for machines that could utilize optimal strategy, we've just excluded them from even having such 'goals'.

1Error12y

If each game is played to the end (no resignations, at least in the sample set) then presumably you could make good initial guesses about the victory condition by looking at common factors in the final positions. A bit like zendo. It wouldn't solve the problem, but it doesn't rely on optimal play, and would narrow the solution space quite a bit. e.g. in the connect-four example, all final moves create a sequence of four or more in a row. Armed with that hypothesis, you look at the game tree, and note that all non-final moves don't. So you know (with reasonably high confidence) that making four in a row ends the game. How to figure out whether it wins the game or loses it is an exercise for the reader. (mental note, try playing C4 with the win condition reversed and see if it makes for an interesting game.)
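The "common factors in the final positions" idea can be sketched roughly as follows. This is only an illustration under assumptions: boards are lists of equal-length strings with `.` for an empty cell, the only feature examined is the longest straight run of identical pieces, and the helper names `longest_run` and `guess_ending_run` are made up for this example.

```python
def longest_run(board):
    """Length of the longest horizontal, vertical or diagonal run of one piece."""
    rows, cols = len(board), len(board[0])
    best = 0
    for r in range(rows):
        for c in range(cols):
            piece = board[r][c]
            if piece == ".":
                continue
            # Walk right, down, down-right and down-left from this cell.
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                length, rr, cc = 0, r, c
                while 0 <= rr < rows and 0 <= cc < cols and board[rr][cc] == piece:
                    length += 1
                    rr += dr
                    cc += dc
                best = max(best, length)
    return best

def guess_ending_run(final_boards, non_final_boards):
    """Return n if every final board has a run of at least n and no
    non-final board does; otherwise return None."""
    if not final_boards:
        return None
    n = min(longest_run(b) for b in final_boards)
    if all(longest_run(b) < n for b in non_final_boards):
        return n
    return None

# Tiny hand-made sample (hypothetical positions on a 4x4 board).
finals = [
    ["....", "O...", "OO..", "XXXX"],
    ["X...", "XO..", "XO..", "XO.."],
]
non_finals = [
    ["....", "....", "O...", "XXX."],
    ["....", "....", "....", "X..."],
]
print(guess_ending_run(finals, non_finals))
```

As the comment says, this only suggests that making a run of that length ends the game. It says nothing about whether doing so wins or loses.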

2printing-spoon14y

there's always heuristics, for example seeing that the goal of making three in a row fits the game tree well suggests considering goals of the form "make n in a row" or at least "make diagonal and orthogonal versions of some shape"

9sixes_and_sevens14y

Human games (of the explicit recreational kind) tend to have stopping rules isomorphic with the game's victory conditions. We would typically refer to those victory conditions as the objective of the game, and the goal of the participants. Given a complete decision tree for a game, even a messy stochastic one like Canasta, it seems possible to deduce the conditions necessary for the game to end. An algorithm that doesn't stop (such as the blue-minimising robot) can't have anything analogous to the victory condition of a game. In that sense, its goals can't be analysed in the same way as those of a Connect Four-playing agent.

3Khaled14y

So if the blue-minimising robot was to stop after 3 months (the stop condition is measured by a timer), can we say that the robot's goal is to stay "alive" for 3 months? I cannot see a necessary link between deducing goals and stopping conditions. A "victory condition" is another thing, but from a decision tree, can you deduce who loses (for Connect Four, perhaps it is the one who reaches the first four that loses)?

2sixes_and_sevens14y

By "victory condition", I mean a condition which, when met, determines the winning, losing and drawing status of all players in the game. A stopping rule is necessary for a victory condition (it's the point at which it is finally appraised), but it doesn't create a victory condition, any more than imposing a fixed stopping time on any activity creates winners and losers in that activity.

2Khaled14y

Can we know the victory condition from just watching the game?

4sixes_and_sevens14y

Just to underscore a broader point: recreational games have various characteristics which don't generalise to all situations modelled game-theoretically. Most importantly, they're designed to be fun for humans to play, to have consistent and explicit rules, to finish in a finite amount of time (RISK notwithstanding), to follow some sort of narrative and to have means of unambiguously identifying winners. Anecdotally, if you're familiar with recreational games, it's fairly straightforward to identify victory conditions in games just by watching them being played, because their conventions mean those conditions are drawn from a considerably reduced number of possibilities. There are, however, lots of edge- and corner-cases where this probably isn't possible without taking a large sample of observations.

1kurokikaze14y

Well, even if we have the conditions for ending the game, we still don't know whether the player's goal is to end the game (poker) or to avoid ending it for as long as possible (Jenga). We can try to deduce it empirically (if it's possible to end the game on the first turn effortlessly, then the goal is to keep going), but I'm not sure if it applies to all games.

1sixes_and_sevens14y

If ending the game quickly or slowly is part of the objective, in what way is it not included in the victory conditions?

0kurokikaze14y

I mean it might not be visible from a game log (for complex games). We will see the combination of pieces when the game ends (the ending condition), but that may not be enough.

1sixes_and_sevens14y

I don't think we're talking about the same things here. A decision tree is an optimal path through all possible decisions in a game, not just the history of any given game. "Victory conditions" in the context I'm using are the conditions that need to be met in order for the game to end, not simply the state of play at the point when any given game ends.

6Pavitra14y

I suspect that "has goals" is ultimately a model, rather than a fact. To the extent that an agent's behavior maximizes a particular function, that agent can be usefully modeled as an optimizer. To the extent that an agent's behavior exhibits signs of poor strategy, such as vulnerability to dutch books, that agent may be better modeled as an algorithm-executer. This suggests that "agentiness" is strongly tied to whether we are smart enough to win against it.

4wedrifid14y

This principle is related to (a component of) the thing referred to as 'objectified'. That is, if a person is aware that another person can model it as an algorithm-executor then it may consider itself objectified.

6DanielLC14y

What I've heard is that, for an intelligent entity, it's easier to predict what will happen based on their goals rather than what they do. For example, with the connect four game, if you manage to figure out that they always seem to get four in a row, and you never do when you play against them, before you can figure out what their strategy is, you know their goal.

[-]orthonormal14y130

Although you might have just identified an instrumental subgoal.

3Vladimir_Nesov14y

Compare with only ever seeing one move made in such a game, but being able to inspect in detail the reasons that played a role in deciding what move to make, looking for explanations for that move. It seems that even one move might suffice, which goes to show that it's unnecessary for behavior itself to somehow encode agent's goals, as we can also take into account the reasons for the behavior being so and so.

3lythrum14y

If you had lots of end states, and lots of non-end states, and we want to assume the game ends when someone's won, and that a player only moves into an end state if he's won (neither of these last two are necessarily true even in nice pretty games), then you could treat it like a classification problem. In that case, you could throw your favourite classifier learning algorithm at it. I can't think of any publications on someone machine learning a winning condition, but that doesn't mean it's not out there. Dr. David Silver used temporal difference learning to learn some important spatial patterns for Go play, using self-play. Self-play is basically like watching yourself play lots of games with another copy of yourself, so I can imagine similar ideas being applied to watching someone else play. If you're interested in that, I suggest http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-170.pdf

On a sadly less published (and therefore mostly unreliable) but slightly more related note, we did have a project once in which we were trying to teach bots to play a Mortal Kombat style game only by observing logs of human play. We didn't tell one of the bots the goal, we just told it when someone had won, and who had won. It seemed to get along ok.
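To make the "treat it like a classification problem" suggestion concrete, here is a deliberately tiny sketch: a one-feature threshold classifier (a decision stump) trained on labelled end and non-end states. Everything specific here is an assumption for illustration: the two hand-picked features (pieces on the board, longest run), the invented sample data, and the names `train_stump` and `predict`. It is not the method from the linked paper or from the project described above.

```python
def train_stump(samples):
    """samples: list of (feature_vector, is_end_state).

    Tries every feature and every observed value as a threshold, predicting
    "end state" when feature >= threshold, and keeps the rule with the fewest
    training errors. Returns (feature_index, threshold, errors).
    """
    best = None
    n_features = len(samples[0][0])
    for i in range(n_features):
        for threshold in sorted({x[i] for x, _ in samples}):
            errors = sum((x[i] >= threshold) != label for x, label in samples)
            if best is None or errors < best[2]:
                best = (i, threshold, errors)
    return best

def predict(stump, x):
    """Apply a trained stump to a new feature vector."""
    feature_index, threshold, _ = stump
    return x[feature_index] >= threshold

# Invented toy data: (pieces_on_board, longest_run) -> did the game end here?
samples = [
    ((7, 4), True), ((12, 4), True), ((20, 5), True),
    ((6, 3), False), ((12, 3), False), ((20, 2), False), ((3, 1), False),
]
stump = train_stump(samples)
print(stump)                    # which feature and threshold separate the classes
print(predict(stump, (9, 4)))   # classify an unseen state
```

On this toy data the learner picks the run-length feature rather than the piece count, which is the kind of regularity one would hope a real classifier could surface from game logs.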

2Will_Newsome14y

One of my 30 or so Friendliness-themed thought experiments is called "Implicit goals of ArgMax" or something like that. In general I think this style of reasoning is very important for accurately thinking about universal AI drives. Specifically it is important to analyze highly precise AI architectures like Goedel machines where there's little wiggle room for a deus ex machina.

[-]JGWeissman14y270

> Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

Surely you mean that eliminativists take actions which, in their typical contexts, tend to result in proving that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

[-]Scott Alexander14y120

Surely you mean that there are just a bunch of atoms which, when interpreted as a human category, can be grouped together to form a being classifiable as "an eliminativist".

[-]kybernetikos14y130

> eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

Just because something only exists at high levels of abstraction doesn't mean it's not real or explanatory. Surely the important question is whether humans genuinely have preferences that explain their behaviour (or at least whether a preference system can occasionally explain their behaviour - even if their behaviour is truly explained by the interaction of numerous systems) rather than how these preferences are encoded.

The information in a jpeg file that indicates a particular pixel should be red cannot be analysed down to a single bit that doesn't do anything else, that doesn't mean that there isn't a sense in which the red pixel genuinely exists. Preferences could exist and be encoded holographically in the brain. Whether you can find a specific neuron or not is completely irrelevant to their reality.

9Logos0114y

I have often stated that, as a physicalist, the mere fact that something does not independently exist -- that is, it has no physically discrete existence -- does not mean it isn't real. The number three is real -- but does not exist. It cannot be touched, sensed, or measured; yet if there are three rocks there really are three rocks. I define "real" as "a pattern that proscriptively constrains that which exists". A human mind is real; but there is no single part of your physical body you can point to and say, "this is your mind". You are the pattern that your physical components conform to. It seems very often that objections to reductionism are founded in a problem of scale: the inability to recognize that things which are real from one perspective remain real at that perspective even if we consider a different scale. It would seem, to me, that "eliminativism" is essentially a redux of this quandary but in terms of patterns of thought rather than discrete material. It's still a case of missing the forest for the trees.

0kybernetikos14y

I agree. In particular I often find these discussions very frustrating because people arguing for elimination seem to think they are arguing about the 'reality' of things when in fact they're arguing about the scale of things. (And sometimes about the specificity of the underlying structures that the higher level systems are implemented on). I don't think anyone ever expected to be able to locate anything important in a single neuron or atom. Nearly everything interesting in the universe is found in the interactions of the parts not the parts themselves. (Also - why would we expect any biological system to do one thing and one thing only?). I regard almost all these questions as very similar to the demarcation problem. A higher level abstraction is real if it provides predictions that often turn out to be true. It's acceptable for it to be an incomplete / imperfect model, although generally speaking if there is another that provides better predictions we should adopt it instead. This is what would convince me that preferences were not real: At the moment I model other people by imagining that they have preferences. Most of the time this works. The eliminativist needs to provide me with an alternate model that reliably provides better predictions. Arguments about theory will not sway me. Show me the model.

[-]Torben14y90

Interesting post throughout, but don't you overplay your hand a bit here?

> There's nothing that looks remotely like a goal in its programming, [...]

An IF-THEN piece of code comparing a measured RGB value to a threshold value for firing the laser would look at least remotely like a goal to my mind.

1ShardPhoenix14y

Consider a robot where the B signal is amplified and transmitted directly to the laser (so brighter blue equals strong laser firing). This eliminates the conditional logic while still keeping approximately the same apparent goal.
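The contrast between the IF-THEN robot and the directly wired one can be sketched as two controllers. This is only an illustration: the function names, the 0 to 255 colour range, the default threshold and the 0 to 1 laser power scale are all assumptions, not details from the original post.

```python
def threshold_controller(rgb, blue_threshold=200):
    """IF-THEN version: full laser power when blue exceeds a threshold."""
    _, _, blue = rgb
    return 1.0 if blue > blue_threshold else 0.0

def proportional_controller(rgb, gain=1.0):
    """Branch-free version: laser power is just the amplified blue signal."""
    _, _, blue = rgb
    return gain * blue / 255

for colour in [(0, 0, 255), (10, 10, 100), (255, 0, 0)]:
    print(colour, threshold_controller(colour), proportional_controller(colour))
```

The second controller contains no comparison at all, yet an observer watching both robots would see bluer objects attract more laser fire in each case.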

[-]Kaj_Sotala14y80

> More explanatory of the way people actually behave is that there's no unified preference for or against death, but rather a set of behaviors. Being in a burning building activates fleeing behavior; contemplating death from old age does not activate cryonics-buying behavior.

YES. This so much.

6juped14y

Contemplating death from old age does activate fleeing behavior, though (at least in me), which is another of those silly bugs in the human brain. If I found a way to fix it to activate cryonics-buying behavior instead, I would probably have found a way to afford life insurance by now.

6JGWeissman14y

Three suggestions: 1. When you notice that your fleeing behavior has been activated, ask "Am I fleeing a problem I can solve?", and if the answer is yes, think "This is silly, I should turn and face this solvable problem". 2. Focus more on the reward of living forever than the punishment of death from old age. 3. Contact Rudi Hoffman today.

-3DSimon14y

If you can predict what a smarter you would think, why not just think that thought now?

4gwern14y

There are also problems with incompleteness; if I can think everything a smarter me would think, then in what sense am I not that smarter me? If I cannot think everything, so there is a real difference between the smarter me and the current me, then that incompleteness may scuttle any attempt to exploit my stolen intelligence. For example, in many strategy games, experts can play 'risky' moves because they have the skill/intelligence to follow through and derive advantage from the move, but a lesser player, even if they know 'an expert would play here' would not know how to handle the opponent's reactions and would lose terribly. (I commented on Go in this vein.) Such a lesser player might be harmed by limited knowledge.

4MixedNuts14y

Not applicable here. If you can predict what a stronger you would lift, why not lift it right now? Because it's not about correct beliefs about what you want the meat robot to do, it's about making it do it. It involves different thoughts, about planning rather than goal, which aren't predicted; and resources, which also need planning to obtain.

4DSimon14y

Good points. I wrote my comment with the purpose in mind of providing some short-term motivation to juped, since it seems that that's currently the main barrier between them and one of their stated long-term goals. That might or might not have been accomplished, but regardless you're certainly right that my statement wasn't, um, actually true. :-)

[-]Khaled14y70

But if whenever I eat dinner at 6 I sleep better than when eating dinner at 8, can I not say that I prefer dinner at 6 over dinner at 8? Which would be one step over saying I prefer to sleep well than not.

I think we could have a better view if we consider many preferences in action. Taking your cryonics example, maybe I prefer to live (to a certain degree), prefer to conform, and prefer to procrastinate. In the burning-building situation, the living preference is playing more or less alone, while in the cryonics situation, preferences interact somewhat like opposite forces and then motion happens in the winning side. Maybe this is what makes preferences seem to vary?

0MaoShan14y

Or is it that preferences are what you get when you consider future situations, in effect removing the influence of your instincts? If I consistently applied the rationale to both situations (cryonics, burning building), and came up with the conclusion that I would prefer not to flee the burning building, that might make me a "true rationalist", but only until the point that the building was on fire. No matter what my "preferences" are, they will (rightly so) be over-ridden by my survival instincts. So, is there any practical purpose to deciding what my preferences are? I'd much rather have my instincts extrapolated and provided for.

0[anonymous]14y

Depends on the extent to which you consider your instincts a part of you. Equally, if you cannot afford cryonics, you could argue that your preferences to sign up or not are irrelevant. No matter what your "preferences" are, they will be overridden by your budget.

[-]Eugine_Nier14y70

Eliminativism is all well and good if all one wants to do is predict. However, it doesn't help answer questions like "What should I do?", or "What utility function should we give the FAI?"

[-]Scott Alexander14y460

The same might be said of evolutionary psychology. In which case I would respond that evolutionary psychology helped us stop thinking in a certain stupid way.

Once, we thought that men were attracted to pretty women because there was some inherent property called "beauty", or that people helped their neighbors because there was a universal Moral Law to which all minds would have access. Once it was the height of sophistication to argue whether people were truly good but corrupted by civilization, or truly evil but restrained by civilization.

Evolutionary psychology doesn't answer "What utility function should we give the FAI?", but it gives good reasons to avoid the "solution": 'just tell it to look for the Universal Moral Law accessible to all minds, and then do that.' And I think a lot of philosophy progresses by closing off all possible blind alleys until people grudgingly settle on the truth because they have no other alternative.

I am less confident in my understanding of eliminativism than of evo psych, so I am less willing to speculate on it. But since one common FAI proposal is "find out human preferences, and then do those", if it turns...

3Vaniver14y

I would think that knowing evo psych is enough to realize this is a dodgy approach at best.

1TimFreeman14y

I don't see the connection, but I do care about the issue. Can you attempt to state an argument for that? Human preferences are an imperfect abstraction. People talk about them all the time and reason usefully about them, so either an AI could do the same, or you found a counterexample to the Church-Turing thesis. "Human preferences" is a useful concept no matter where those preferences come from, so evo psych doesn't matter. Similarly, my left hand is an imperfect abstraction. Blood flows in, blood flows out, flakes of skin fall off, it gets randomly contaminated from the environment, and the boundaries aren't exactly defined, but nevertheless it generally does make sense to think in terms of my left hand. If you're going to argue that FAI defined in terms of inferring human preferences can't work, I hope that isn't also going to be an argument that an AI can't possibly use the concept of my left hand, since the latter conclusion would be absurd.

2Vaniver14y

Sure. I think I should clarify first that I meant evo psych should have been sufficient to realize that human preferences are not rigorously coherent. If I tell a FAI to make me do what I want to do, its response is going to be "which you?", as there is no Platonic me with a quickly identifiable utility function that it can optimize for me. There's just a bunch of modules that won the evolutionary tournament of survival because they're a good way to make grandchildren. If I am conflicted between the emotional satisfaction of food and the emotional dissatisfaction of exercise combined with the social satisfaction of beauty, will a FAI be able to resolve that for me any more easily than I can resolve it? If my far mode desires are rooted in my desire to have a good social identity, should the FAI choose those over my near mode desires which are rooted in my desire to survive and enjoy life? In some sense, the problem of FAI is the problem of rigorously understanding humans, and evo psych suggests that will be a massively difficult problem. That's what I was trying to suggest with my comment.

0TimFreeman14y

I think that bar is unreasonably high. If you have conflict between enjoying eating a lot vs being skinny and beautiful, and the FAI helps you do one or the other, then you aren't in a position to complain that it did the wrong thing. Its understanding of you doesn't have to be more rigorous than your understanding of you.

0Vaniver14y

It does if I want it to give me results any better than I can provide for myself. I also provided the trivial example of internal conflicts- external conflicts are much more problematic. Human desire for status is possibly the source of all human striving and accomplishment. How will a FAI deal with the status conflicts that develop?

0TimFreeman14y

No. For example, if it develops some diet drug that lets you safely enjoy eating and still stay skinny and beautiful, that might be a better result than you could provide for yourself, and it doesn't need any special understanding of you to make that happen. It just makes the drug, makes sure you know the consequences of taking it, and offers it to you. If you choose to take it, that tells the AI more about your preferences, but there's no profound understanding of psychology required. Putting an inferior argument first is good if you want to try to get the last word, but it's not a useful part of problem solving. You should try to find the clearest problem where solving that problem solves all the other ones. If it can do a reasonable job of comparing utilities across people, then maximizing average utility seems to do the right thing here. Comparing utilities between arbitrary rational agents doesn't work, but comparing utilities between humans seems to -- there's an approximate universal maximum (getting everything you want) and an approximate universal minimum (you and all your friends and relatives getting tortured to death). Status conflicts are not one of the interesting use cases. Do you have anything better?

0Vaniver14y

It might not need special knowledge of my psychology, but it certainly needs special knowledge of my physiology. But notice that the original point was about human preferences. Even if it provides new technologies that dissolve internal conflicts, the question of whether or not to use the technology becomes a conflict. Remember, we live in a world where some people have strong ethical objections to vaccines. An old psychological finding is that oftentimes, giving people more options makes them worse off. If the AI notices that one of my modules enjoys sensory pleasure, offers to wirehead me, and I reject it on philosophical grounds, I could easily become consumed by regret or struggles with temptation, and wish that I never had been offered wireheading in the first place. I put the argument of internal conflicts first because it was the clearest example, and you'll note it obliquely refers to the argument about status. Did you really think that, if a drug were available to make everyone have perfectly sculpted bodies, one would get the same social satisfaction from that variety of beauty? I doubt it can measure utilities, as I argued two posts ago, and simple average utilitarianism is so wracked with problems I'm not even sure where to begin. A common tactic in human interaction is to care about everything more than the other person does, and explode (or become depressed) when they don't get their way. How should such real-life utility monsters be dealt with? Why do you find status uninteresting?

2NancyLebovitz14y

I haven't heard of people having strong ethical objections to vaccines. They have strong practical (if ill-founded) objections-- they believe vaccines have dangers so extreme as to make the benefits not worth it, or they have strong heuristic objections-- I think they believe health is an innate property of an undisturbed body or they believe that anyone who makes money from selling a drug can't be trusted to tell the truth about its risks. To my mind, an ethical objection would be a belief that people should tolerate the effects of infectious diseases for some reason such as that suffering is good in itself or that it's better for selection to enable people to develop innate immunities.

6soreff14y

That wasn't precisely the objection of Christian conservatives to the HPV vaccine (perhaps more nearly that they wanted sex to lead to suffering?), but it is fairly close.

1Vaniver14y

I am counting religious objections as ethical objections, and there are several groups out there that refuse all medical treatment.

0TimFreeman14y

If everyone's inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that's a fine outcome. I think it can, in principle, estimate utilities from behavior. See http://www.fungible.com/respect. The problems I'm aware of have to do with creating new people. If you assume a fixed population and humans who have comparable utilities as described above, are there any problems left? Creating new people is a more interesting use case than status conflicts. As I said, because maximizing average utility seems to get a reasonable result in that case.

-2Vaniver14y

That's not the situation I'm describing; if 0 is "you and all your friends and relatives getting tortured to death" and 1 is "getting everything you want," the utility monster is someone who puts "not getting one thing I want" at, say, .1 whereas normal people put it at .9999. And if humans turn out to be adaption-executers, then utility is going to look really weird, because it'll depend a lot on framing and behavior. How do you add two utilities together? If you can't add, how can you average? If people dislike losses more than they like gains and status is zero-sum, does that mean the reasonable result of average utilitarianism when applied to status is that everyone must be exactly the same status?

2A1987dM13y

> If you can't add, how can you average?

You can average but not add elements of an affine space. The average between the position of the tip of my nose and the point two metres west of it is the point one metre west of it, but their sum is not a well-defined concept (you'd have to pick an origin first, and the answer will depend on it). (More generally, you can only take linear combinations whose coefficients sum to 1 (to get another element of the affine space) or to 0 (to get a vector).) Anyway, the values of two different utility functions aren't even elements of the same affine space, so you still can't average them. The values of the same utility function are, and the average between U1 and U2 is U3 such that you'd be indifferent between 100% probability of U3, and 50% probability of each of U1 and U2.
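A small numerical sketch of the affine-space point, under stated assumptions: positions are points on an east-west line measured in metres, `combine` is a hypothetical helper that converts points to displacements from a chosen origin, takes a weighted combination, and converts back. The specific positions and origins are arbitrary.

```python
def combine(origin, points, weights):
    """Weighted combination of points, computed relative to a chosen origin."""
    displacement = sum(w * (p - origin) for p, w in zip(points, weights))
    return origin + displacement

nose, west_point = 3.0, 1.0   # the second point is two metres west of the first

# Weights summing to 1 (an average): the result does not depend on the origin.
print(combine(0.0, [nose, west_point], [0.5, 0.5]))
print(combine(10.0, [nose, west_point], [0.5, 0.5]))

# Weights summing to 2 (a "sum"): the result changes with the origin.
print(combine(0.0, [nose, west_point], [1.0, 1.0]))
print(combine(10.0, [nose, west_point], [1.0, 1.0]))
```

With weights summing to 1 the origin terms cancel, which is why the midpoint is well defined while the sum is not.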

-2Vaniver13y

Correct but irrelevant. Utility functions are families of mappings from futures to reals, which don't live in an affine space, as you mention. This looks more like a mention of an unrelated but cool mathematical concept than a nitpick.

3A1987dM13y

My point is that “If you can't add, how can you average?” is not a valid argument, even though in this particular case both the premise and the conclusion happen to be correct.

0Vaniver13y

If I ask "If you can't add, how can you average?" and TimFreeman responds with "by using utilities that live in affine spaces," I then respond with "great, those utilities are useless for doing what you want to do." When a rhetorical question has an answer, the answer needs to be material to invalidate its rhetorical function; where's the invalidity?

3A1987dM13y

I took the rhetorical question to implicitly be the syllogism 'you can't sum different people's utilities, you can't average what you can't sum, therefore you can't average different people's utilities'. I just pointed out that the second premise isn't generally true. (Both the first premise and the conclusion are true, which is why it's a nitpick.) Did I over-interpret the rhetorical question?

-2Vaniver13y

The direction I took the rhetorical question was "utilities aren't numbers, they're mappings," which does not require the second premise. I agree with you that the syllogism you presented is flawed.

-1Shmi13y

Are you sure? The only thing one really wants from a utility function is ranking, which is even weaker a requirement than affine spaces. All monotonic remappings are in the same equivalency class.

4Vaniver13y

It's practically useful to have reals rather than rankings, because that lets one determine how the function will behave for different probabilistic combinations of futures. If you already have the function fully specified over uncertain futures, then only providing a ranking is sufficient for the output. The reason why I mentioned that it was a mapping, though, is because the output of a single utility function can be seen as an affine space. The point I was making in the ancestral posts was that while it looks like the outputs of two different utility functions play nicely, careful consideration shows that their combination destroys the mapping, which is what makes utility functions useful. Hence the 'families' comment.

0fubarobfusco13y

I'm hearing an echo of praxeology here; specifically the notion that humans use something like stack-ranking rather than comparison of real-valued utilities to make decisions. This seems like it could be investigated neurologically ....

0A1987dM13y

Huh, no. If army1987.U($1000) = shminux.U($1000) = 1, army1987.U($10,000) = 1.9, shminux.U($10,000) = 2.1, and army1987.U($100,000) = shminux.U($100,000) = 3, then I would prefer a 50% probability of $1000 and a 50% probability of $100,000 to a 100% probability of $10,000, and you wouldn't.
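The arithmetic in this counterexample can be checked with a short Python sketch. The utility values are the ones given in the comment; the dictionary layout and the helper names (`expected_utility`, `ranking`, `prefers_gamble`) are illustrative assumptions, not anything from the thread.

```python
from fractions import Fraction as F

# Utilities copied from the comment; keys are dollar amounts.
army1987 = {1_000: F("1"), 10_000: F("1.9"), 100_000: F("3")}
shminux = {1_000: F("1"), 10_000: F("2.1"), 100_000: F("3")}

def expected_utility(u, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * u[outcome] for outcome, p in lottery.items())

def ranking(u):
    """Outcomes sorted from least to most preferred."""
    return sorted(u, key=u.get)

gamble = {1_000: F(1, 2), 100_000: F(1, 2)}  # 50/50 between $1000 and $100,000
sure_thing = {10_000: F(1)}                  # $10,000 for certain

def prefers_gamble(u):
    """True if the agent strictly prefers the gamble to the sure thing."""
    return expected_utility(u, gamble) > expected_utility(u, sure_thing)

# Both agents rank the three outcomes identically...
print(ranking(army1987) == ranking(shminux))
# ...yet they disagree about the lottery.
print(prefers_gamble(army1987), prefers_gamble(shminux))
```

Under these numbers the two agents share one ranking over sure outcomes but choose differently under uncertainty, which is the sense in which a ranking alone underdetermines the utility function.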

2pengvado14y

Using an interval scale? I don't have anything to contribute to the question of interpersonal utility comparison, but the average of two values from the same agent's utility function is easy enough, while addition is still undefined.
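One way to read the interval-scale point, sketched in Python: on an interval scale, values are only meaningful up to a positive affine change of units, so an operation is well defined only if it gives the same answer before and after rescaling. The constants `a=3`, `b=7` and the sample values are arbitrary assumptions chosen for illustration.

```python
def rescale(u, a=3, b=7):
    """A positive affine change of units (a > 0), allowed on an interval scale."""
    return a * u + b

def mean(x, y):
    return (x + y) / 2

u1, u2 = 2, 10  # two utility values from the same agent (arbitrary)

# Averaging commutes with the change of units...
average_commutes = rescale(mean(u1, u2)) == mean(rescale(u1), rescale(u2))
# ...but summing does not: the offset b is counted twice on one side.
sum_commutes = rescale(u1 + u2) == rescale(u1) + rescale(u2)

print(average_commutes, sum_commutes)
```

This only covers values from a single agent's function; it says nothing about combining values across two agents whose scales are chosen independently.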

0Vaniver14y

I presume the average in question is interpersonal, not intertemporal, as we are discussing status conflicts (between individuals).

0TimFreeman14y

You have failed to disagree with me. My proposal exactly fits your alleged counterexample. Suppose Alice is a utility monster where:

* U(Alice, torture of everybody) = 0
* U(Alice, everything) = 1
* U(Alice, no cookie) = 0.1
* U(Alice, Alice dies) = 0.05

And Bob is normal, except he doesn't like Alice:

* U(Bob, torture of everybody) = 0
* U(Bob, everything) = 1
* U(Bob, Alice lives, no cookie) = 0.8
* U(Bob, Alice dies, no cookie) = 0.9

If the FAI has a cookie it can give to Bob or Alice, it will give it to Alice, since U(cookie to Bob) = U(Bob, everything) + U(Alice, everything but a cookie) = 1 + 0.1 = 1.1 < U(cookie to Alice) = U(Bob, everything but a cookie) + U(Alice, everything) = 0.8 + 1 = 1.8. Thus Alice gets her intended reward for being a utility monster.

However, if there are no cookies available and the FAI can kill Alice, it will do so for the benefit of Bob, since U(Bob, Alice lives, no cookie) + U(Alice, Alice lives, no cookie) = 0.8 + 0.1 = 0.9 < U(Bob, Alice dies, no cookie) + U(Alice, Alice dies) = 0.9 + 0.05 = 0.95. The basic problem is that since Alice had the cookie fixation, that ate up so much of her utility range that her desire to live in the absence of the cookie was outweighed by Bob finding her irritating.

Another problem with Alice's utility is that it supports the FAI doing lotteries that Alice would apparently prefer but a normal person would not. For example, assuming the outcome for Bob does not change, the FAI should prefer 50% Alice dies + 50% Alice gets a cookie (adds to 0.525) over 100% Alice lives without a cookie (which is 0.1). This is a different issue from interpersonal utility comparison.

They are numbers. Add them.

Yes. So far as I can tell, if the FAI is going to do what people want, it has to model people as though they want something, and that means ascribing utility functions to them. Better alternatives are welcome. Giving up because it's a hard problem is not welcome.

No. If Alice has high status and
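The sums in the Alice and Bob example can be reproduced with a short Python sketch. The utility values are taken from the comment; the dictionary keys and variable names are illustrative assumptions, and the comment's "U(Bob, everything but a cookie)" is read here as Bob's "Alice lives, no cookie" value of 0.8, which is what the comment's arithmetic uses.

```python
from fractions import Fraction as F

# U(person, outcome) on the normalized [0, 1] scale, copied from the comment.
alice = {"everything": F("1"), "no cookie": F("0.1"), "dies": F("0.05")}
bob = {
    "everything": F("1"),
    "alice lives, no cookie": F("0.8"),
    "alice dies, no cookie": F("0.9"),
}

# One cookie to hand out: sum the two utilities under each allocation.
cookie_to_bob = bob["everything"] + alice["no cookie"]
cookie_to_alice = bob["alice lives, no cookie"] + alice["everything"]

# No cookies: compare Alice living with Alice dying.
alice_lives = bob["alice lives, no cookie"] + alice["no cookie"]
alice_dies = bob["alice dies, no cookie"] + alice["dies"]

# Alice's own expected utility for the 50/50 lottery (Bob held fixed).
lottery = F(1, 2) * alice["dies"] + F(1, 2) * alice["everything"]

print(float(cookie_to_bob), float(cookie_to_alice))
print(float(alice_lives), float(alice_dies))
print(float(lottery), float(alice["no cookie"]))
```

Exact fractions are used instead of floats so that comparisons such as 0.9 versus 0.95 are not disturbed by rounding.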

1Vaniver14y

What if wants did not exist a priori, but only in response to stimuli? Alice, for example, doesn't care about cookies, she cares about getting her way. If the FAI tells Alice and Bob "look, I have a cookie; how shall I divide it between you?" Alice decides that the cookie is hers and she will throw the biggest tantrum if the FAI decides otherwise, whereas Bob just grumbles to himself. If the FAI tells Alice and Bob individually "look, I'm going to make a cookie just for you, what would you like in it?" both of them enjoy the sugar, the autonomy of choosing, and the feel of specialness, without realizing that they're only eating half of the cookie dough. Suppose Alice is just as happy in both situations, because she got her way in both situations, and that Bob is happier in the second situation, because he gets more cookie. In such a scenario, the FAI would never ask Alice and Bob to come up with a plan to split resources between the two of them, because Alice would turn it into a win/lose situation. It seems to me that an FAI would engage in want curation rather than want satisfaction. As the saying goes, seek to want what you have, rather than seeking to have what you want. A FAI who engages in that behavior would be more interested in a stimuli-response model of human behavior and mental states than a consequentialist-utility model of human behavior and mental states. This is one of the reasons why utility monsters tend to seem self-destructive; they gamble farther and harder than most people would. How do we measure one person's utility? Preferences revealed by actions? (That is, given a mapping from situations to actions to consequences, I can construct a utility function which takes situations and consequences as inputs and returns the decision taken.) If so, when we add two utilities together, does the resulting number still uniquely identify the actions taken by both parties?

-1A1987dM13y

So are the atmospheric pressure in my room and the price of silver. But you cannot add them together (unless you have a conversion factor from millibars to dollars per ounce).

1TimFreeman13y

Your analogy is invalid, and in general analogy is a poor substitute for a rational argument. In the thread you're replying to, I proposed a scheme for getting Alice's utility to be commensurate with Bob's so they can be added. It makes sense to argue that the scheme doesn't work, but it doesn't make sense to pretend it does not exist.

0chatquitevoit14y

This may be a bit naive, but can a FAI even have a really directive utility function? It would seem to me that by definition (caveats to using that aside) it would not be running with any 'utility' in 'mind'.

[-]RobertLumley14y40

if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper. And therefore your reluctance to sign up for cryonics violates your own revealed preferences! You must just be trying to signal conformity or something.

I don't think this section bolsters your point much. The obvious explanation f...

0HoverHell14y

1RobertLumley14y

That's a good question. I didn't really think about it when I read it, because I am personally completely dismissive of and not scared by haunted houses, whereas I am skeptical of cryonics, and couldn't afford it even if I did the research and decided it was worth it. I'm not sure it can be, but I'm not sure a true rationalist would be scared by a haunted house. The only thing I can come up with for a rational utility function is someone who suspended his belief because he enjoyed being scared. I feel like this example is far more related to irrationality and innate, irrepressible bias than it is to rationality.

[-]printing-spoon14y40

A more practical example: when people discuss cryonics or anti-aging, the following argument usually comes up in one form or another: if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper.

nitpick: Burning to death is painful and it can happen at any stage of life. "You want to live a long life and die peacefully with dignity" can also be derived but of course it's more complicated.

[-][anonymous]10y20

So if someone stays in the haunted house despite the creaky stairwell, his preferences are revealed as rationalist?

Personally, I would have run away precisely because I would not think the sound came from a non-existent, and so harmless, ghost!

[-]BobTheBob14y20

Thanks for this great sequence of posts on behaviourism and related issues.

Anyone who does not believe mental states are ontologically fundamental - ie anyone who denies the reality of something like a soul - has two choices about where to go next. They can try reducing mental states to smaller components, or they can stop talking about them entirely.

Here's what I take it you're committed to:

* by 'mental states' we mean things like beliefs and desires.
* an eliminativist has to stop both talking about them and using them in explanations.
* whither g

...

3TheOtherDave14y

Can you say more about how you got that second bullet item? It's not clear to me that being committed to the idea that mental states can be reduced to smaller components (which is one of the options the OP presented) commits one to stop talking about mental states, or to stop using them in explanations. I mean, any economist would agree that dollars are not ontologically fundamental, but no economist would conclude thereby that we can't talk about dollars.

1BobTheBob14y

This may owe to a confusion on my part. I understood from the title of the post and some of its parts (incl the last par.) that the OP was advocating elimination over reduction (ie, contrasting these two options and picking elimination). I agree that if reduction is an option, then it's still ok to use them in explanation, as per your dollar example.

[-]azergante2y10

goals appear only when you make rough generalizations from its behavior in limited cases.

I am surprised no one brought up the usual map / territory distinction. In this case the territory is the set of observed behaviors. Humans look at the territory and with their limited processing power they produce a compressed and lossy map, here called the goal.

The goal is a useful model to talk simply about the set of behaviors, but has no existence outside the head of people discussing it.

[-]boilingsambar14y10

if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper.

Couldn't it be that someone who tries to escape from a burning building does so just to avoid the pain and suffering it inflicts? It would be such a drag to be burned alive rather than die a peaceful, painless death by poison.

5Caravelle14y

That doesn't help much. If people were told they were going to be murdered in a painless way (or something not particularly painful - for example, a shot for someone who isn't afraid of needles and has no problem getting vaccinated) most would consider this a threat and would try to avoid it. I think most people's practical attitude towards death is a bit like Syrio Forel from Game of Thrones - "not today". We learn to accept that we'll die someday, we might even be okay with it, but we prefer to have it happen as far in the future as we can manage. Signing up for cryonics is an attempt to avoid dying tomorrow - but we're not that worried about dying tomorrow. Getting out of a burning building means we avoid dying today. (whether this is a refinement of how to understand our behaviour around death, or a potential generalized utility function, I couldn't say).

2MixedNuts14y

It should be noted that "tomorrow" stands in for "in enough time that we operate in Far mode when thinking about it", as opposed to actual tomorrow, when we very much don't want to die. Come to think of it, a lot of people are all "Yay, death!" in Far mode (I'm looking at you, Epictetus), but far fewer in Near mode (though those who are, are famous). Anecdotal evidence: I was born without an aversion to death in principle, was surprised by sad funerals, thought it was mostly signalling (and selfish mourning for lost company), was utterly baffled by obviously sincere death-bashers. I've met a few other people like that, too. Yet we (except some of the few I met in history books) have normal conservation reflexes. There's no pressure to want to live in Far mode (in an environment without cryonics and smoking habits, anyway), and there's pressure to say "I don't care about death, I only care about $ideal which I will never compromise" (hat tip Katja Grace).

1boilingsambar14y

I was just pointing to the opinion that not everyone who tries to escape from death is actually afraid of death per se. They might have other reasons.

1LeibnizBasher14y

Death from old age often involves drowning in the fluid that accumulates in your lungs when you get pneumonia.

[-]andrewk14y10

Interesting that you chose the "burning building" analogy. In the fire sermon the Buddha argued that being incarnated in samsara was like being in a burning building and that the only sensible thing to do was to take steps to ensure the complete ending of the process of reincarnation in samsara (and dying just doesn't cut it in this regard). The burning building analogy in this case is a terrible one, as we are talking about the difference between a healthy person seeking to avoid pain and disability versus the cryonics argument, which is all ab...

[-]lukeprog14y10

Excellent post!

I hope that somewhere along the way you get to the latest neuroscience suggesting that the human motivational system is composed of both model-based and model-free reinforcement mechanisms.

Keep up the good work.

[-]Threedee14y10

Without my dealing here with the other alternatives, do you, Yvain, or does any other LW reader, think that it is (logically) possible that mental states COULD be ontologically fundamental?

Further, why is that possibility tied to the word "soul", which carries all sorts of irrelevant baggage?

Full disclosure: I do (subjectively) know that I experience red, and other qualia, and try to build that into my understanding of consciousness, which I also know I experience (:-) (Note that I purposely used the word "know" and not the word "believe".)

7lessdazed14y

It's just the history of some words. It's not that important. People frequently claim this. One thing missing is a mechanism that gets us from an entity experiencing such fundamental mental states or qualia to that being's talking about it. Reductionism offers an account of why they say such things. If, broadly speaking, the reductionist explanation is true, then this isn't a phenomenon to challenge reductionism with. If the reductionist account is not true, then how can these mental states cause people to talk about them? How does something not reducible to physics influence the world, physically? Is this concept better covered by a word other than "magic"? And if these mental states are partly the result of the environment, then the physical world is influencing them too. I don't see why it's desirable to posit magic; if I type "I see a red marker" because I see a red marker, why hypothesize that the physical light, received by my eyes and sending signals to my brain, was magically transformed into pure mentality, enabling it to interact with ineffable consciousness, and then magicked back into physics to begin a new physical chain of processes that ends with my typing? Wouldn't I be just as justified in claiming that the process has interruptions at other points? As the physical emanation "I see red people" may be caused by laws of how physical stuff interacts with other physical stuff, we don't guess it isn't caused by that, particularly as we can think of no other coherent way. We are used to the good habit of not mistaking the limits of our imaginations for the limits of reality, so we won't say we know it impossible. However, physics is a description of how stuff interacts with stuff, so I don't see how it's logically possible for stuff to do something ontologically indescribable even as randomness. Interactions can either be according to a pattern, or not, and we have the handy description "not in a pattern, indescribable by compres

2The Dao of Bayes14y

There's a fascinating psychological phenomenon called "blindsight" where the conscious mind doesn't register vision - the person is genuinely convinced they are blind, and they cannot verbally describe anything. However, their automatic reflexes will still navigate the world just fine. If you ask them to put a letter in a slot, they can do it without a problem. It's a very specific sort of neurological damage, and there have been a few studies on it. I'm not sure if it quite captures the essence of qualia, but "conscious experience" IS very clearly different from the experience which our automatic reflexes rely on to navigate the world!

2lessdazed14y

What if you force them to verbally guess about what's in front of them, can they do better than chance guessing colors, faces, etc.? Can people get it in just one eye/brain side?

1The Dao of Bayes14y

I've only heard of that particular test once. They shined a light on the wall and forced them to guess where. All I've heard is that they do "better than should be possible for someone who is truly blind", so I'm assuming worse than average but definitely still processing the information to some degree. Given that it's a neurological condition, I'd expect it to be impossible to have it in just one eye/brain side, since the damage is occurring well after the signal from both eyes is put together. EDIT: http://en.wikipedia.org/wiki/Blindsight is a decent overview of the phenomenon. Apparently it can indeed affect just part of your vision, so I was wrong on that!

6scav14y

Hmm. Unless I'm misunderstanding you completely, I'll assume we can work from the example of the "red" qualium (?) What would it mean for even just the experience of "red" to be ontologically fundamental? What "essence of experiencing red" could possibly exist as something independent of the workings of the wetware that is experiencing it? For example, suppose I and a dichromatic human look at the same red object. I and the other human may have more or less the same brain circuitry and are looking at the same thing, but since we are getting different signals from our eyes, what we experience as "red" cannot be exactly the same. A bee or a squid or a duck might have different inputs, and different neural circuitry, and therefore different qualia. A rock next to the red object would have some reflected "red" light incident upon it. But it has no eyes and as far as I know no perception or mental states at all. Does it make sense to say that the rock can also see its neighbouring object as "red"? I wouldn't say so, outside the realm of poetic metaphor. So if your qualia are contingent on the circumstances of certain inputs to certain neural networks in your head, are they "ontologically fundamental"? I'd say no. And by extension, I'd say the same of any other mental state. If you could change the pattern of signals and the connectivity of your brain one neuron at a time, you could create a continuum of experiences from "red" to "intuitively perceiving the 10000th digit of pi" and every indescribable, ineffable inhuman state in between. None of them would be more fundamental than any other; all are sub-patterns in a small corner of a very richly-patterned universe.

4fubarobfusco14y

"Quale", by the way.

0Hul-Gil14y

How do you know? Do you know Latin, or just how this word works? I'm not doubting you - just curious. I've always wanted to learn Latin so I can figure this sort of thing out (and then correct people), but I've settled for just looking up specific words when a question arises.

1fubarobfusco14y

http://en.wikipedia.org/wiki/Qualia

0Threedee14y

I apologize for being too brief. What I meant to say is that I posit that my subjective experience of qualia is real, and not explained by any form of reductionism or eliminativism. That experience of qualia is fundamental in the same way that gravitation and the electromagnetic force are fundamental. Whether the word ontological applies may be a semantic argument. Basically, I am reprising Chalmers' definition of the Hard Problem, or Thomas Nagel's argument in the paper "What is it like to be a bat?"

4lessdazed14y

Do qualia describe how matter interacts with matter? For example, do they explain why any person says "I have qualia" or "That is red"? Would gravity and electromagnetism, etc. fail to explain all such statements, or just some of them? If qualia cause such things, is there any entropy when they influence and are influenced by matter? Is energy conserved? If I remove neurons from a person one by one, is there a point at which qualia no longer are needed to describe how the matter and energy in them relates to the rest of matter and energy? Is it logically possible to detect such a point? If I then replace the critical neuron, why ought I be confident that merely considering, tracking, and simulating local, physical interactions would lead to an incorrect model of the person insofar as I take no account of qualia? How likely is it that apples are not made of atoms?

1scav14y

You may posit that your subjective experience is not explained by reduction to physical phenomena (including really complex information processes) happening in the neurons of your brain. But to me that would be an extraordinary claim requiring extraordinary evidence. It seems to me that until we completely understand the physical and informational processes going on in the brain, the burden of proof is on anyone suggesting that such complete understanding would still be in principle insufficient to explain our subjective experiences.

1Dreaded_Anomaly14y

You should check out the recent series that orthonormal wrote about qualia. It starts with Seeing Red: Dissolving Mary's Room and Qualia.

0DSimon14y

I don't understand what you mean by this. Could you elaborate?

1Threedee14y

There is no explanation of HOW mass generates or causes gravity, similarly for the lack of explanation of how matter causes or generates forces such as electromagnetism. (Yes, I know that some sort of strings have been proposed to subserve gravity, and so far they seem to me to be another false "ether".) So in a shorthand of sorts, it is accepted that gravity and the various other forces exist as fundamentals ("axioms" of nature, if you will accept a metaphor), because their effects and interactions can be meaningfully applied in explanations. No one has seen gravity, no one can point to gravity--it is a fundamental force. Building on Chalmers in one of his earlier writings, I am willing to entertain the idea that qualia are a fundamental force-like dimension of consciousness. Finally, every force is a function of something: gravity is a function of amount of mass, electromagnetism is a function of amount of charge. What might qualia and consciousness be a function of? Chalmers and others have suggested "bits of information", although that is an additional speculation.

0DSimon14y

I don't think "[T]heir effects and interactions can be meaningfully applied in explanations" is a good way of determining if something is "fundamental" or not: that description applies pretty nicely to aerodynamics, but aerodynamics is certainly not at the bottom of its chain of reductionism. I think maybe that's the "fundamental" you're going for: the maximum level of reductionism, the turtle at the bottom of the pile. Anyways: (relativistic) gravity is generally thought not to be a fundamental, because it doesn't mesh with our current quantum theory; hence the search for a Grand Unified Whatsit. Given that gravity, an incredibly well-studied and well-understood force, is at most questionably a fundamental thingie, I think you've got quite a hill to climb before you can say that about consciousness, which is a far slipperier and more data-lacking subject.

[-]Alexei14y10

"Preference is a tendency in a reflective equilibrium." That gets its own Anki card!

5Vladimir_Nesov14y

Some preferences don't manifest as tendencies. You might not have been given a choice, or weren't ready to find the right answer.

1Alexei14y

I'm not sure I understand. Can you please provide an example?

0ShardPhoenix14y

Then you could include tendency to want something as well as tendency to do something.

-1Vladimir_Nesov14y

Or tendency to be yourself, perhaps tendency to have a certain preference. If you relax a concept that much, it becomes useless, a fake explanation.

[-]zslastman12y00

This is an excellent post, Yvain. How can I socially pressure you into posting the next one? Guilt? Threats against my own wellbeing?

[-][anonymous]13y00

I like to enforce reductionist consistency in my own brain. I like my ethics universal and contradiction-free, mainly because other people can't accuse me of being inconsistent then.

The rest is akrasia.

[-]Curiouskid14y00

Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

I don't really see how these two philosophies contradict.

[-]TylerJay14y00

Absolutely fantastic post. Extremely clearly written, and made the blue-minimizing robot thought experiment really click for me. Can't wait for the next one.

[-]HoverHell14y-40

[+]Will_Newsome14y-140
