Human values differ as much as values can differ

George Hamilton's autobiography Don't Mind if I Do, and the very similar book by Bob Evans, The Kid Stays in the Picture, give a lot of insight into human nature and values.  For instance: What do people really want?  When people have the money and fame to travel around the world and do anything that they want, what do they do?  And what is it that they value most about the experience afterward?

You may argue that the extremely wealthy and famous don't represent the desires of ordinary humans.  I say the opposite: Non-wealthy, non-famous people, being more constrained by need and by social convention, and having no hope of ever attaining their desires, don't represent, or even allow themselves to acknowledge, the actual desires of humans.

I noticed a pattern in these books:  The men in them value social status primarily as a means to an end, while the women value social status as an end in itself.

"Male" and "female" values

This is a generalization; but, at least at the very upper levels of society depicted in these books, and a few others like them that I've read, it's frequently borne out.  (Perhaps a culture chooses celebrities who reinforce its stereotypes.)  Women and men alike appreciate expensive cars and clothing.  But the impression I get is that the flamboyantly extravagant are surprisingly non-materialistic.  Other than food (and, oddly, clothing), the very wealthy themselves consistently refer to these trappings as things that they need in order to signal their importance to other people.  They don't have an opinion on how long or how tall a yacht "ought" to be; they just want theirs to be the longest or tallest.  The persistent phenomenon whereby the more wealthy someone appears, the more likely they are to go into debt, is not because these people are too stupid or impulsive to hold on to their money (as in popular depictions of the wealthy, e.g., A New Leaf).  It's because they are deliberately trading monetary capital for the social capital that they actually desire (and expect to be able to trade it back later if they wish to, even making a profit on the "transaction", as Donald Trump has done so well).

With most of the women in these books, that's where it ends.  What they want is to be the center of attention.  They want to walk into a famous night-club and see everyone's heads turn.  They want the papers to talk about them.  They want to be able to check into a famous hotel at 3 in the morning and demand that the head chef be called at home, woken up, and brought in immediately to cook them a five-course meal.  Some of the women in these stories, like Elizabeth Taylor, routinely make outrageous demands just to prove that they're more important than other people.

What the men want is women.  Quantity and quality.  They like social status, and they like to butt heads with other men and beat them; but once they've acquired a bevy of beautiful women, they are often happy to retire to their mansion or yacht and enjoy them in private for a while.  And they're capable of forming deep, private attachments to things, in a way the women are less likely to.  A man can obsess over his collection of antique cars as beautiful things in and of themselves.  A woman will not enjoy her collection of Faberge eggs unless she has someone to show it to.  (Preferably someone with a slightly less-impressive collection of Faberge eggs.)  Reclusive celebrities are more likely to be men than women.

Some people mostly like having things.  Some people mostly like having status.  Do you see the key game-theoretic distinction?

Neither value is very amenable to the creation of wealth.  Give everybody a Rolls-Royce, and the women still have the same social status, and the men don't have any more women.  But the "male" value is more amenable to it.  Men compete, but perhaps mainly because the quality of women is normally distributed, so there are only so many at the top.  The desires of the men described above are, in theory, capable of being mutually satisfied.  The women's are not.

Non-positional / Mutually-satisfiable vs. Positional / Non-mutually-satisfiable values

No real person implements pure mutually-satisfiable or non-mutually-satisfiable values.   I have not done a study or taken a survey, and don't claim that these views correlate with sex in general.  I just wanted to make accessible the evidence I saw that these two types of values exist in humans.  The male/female distinction isn't what I want to talk about; it just helped organize the data in a way that made this distinction pop out for me.  I could also have told a story about how men and women play sports, and claim that men are more likely to want to win (a non-mutually-satisfiable value), and women are more likely to just want to have fun (a mutually-satisfiable value).  Let's not get distracted by sexual politics.  I'm not trying to say something about women or about men; I'm trying to say something about FAI.

I will now rename them "non-positional" and "positional" (as suggested by SilasBarta and wnoise), where "non-positional" means assigning a value to something from category X according to its properties, and "positional" means assigning a new value to something from category X according to the rank of its non-positional value in the set of all X (non-mutually-satisfiable).
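
To make the distinction concrete, here is a minimal Python sketch (my own illustration; the valuation functions and numbers are invented, not taken from the post): a non-positional valuer scores a car by its own properties, while a positional valuer scores it by its rank among all cars. Upgrading every car raises every non-positional score, but leaves every positional score exactly where it was.

```python
# Toy illustration of non-positional vs. positional valuation.
# All names and numbers here are invented for the example.

def nonpositional_value(speed):
    """Value a car by its own properties (here, just its top speed)."""
    return speed

def positional_value(speed, all_speeds):
    """Value a car by its rank: the fraction of the other cars it beats."""
    others = len(all_speeds) - 1
    if others == 0:
        return 1.0
    return sum(s < speed for s in all_speeds) / others

fleet = [80, 120, 160, 200]           # everyone's current car, by top speed
upgraded = [s + 100 for s in fleet]   # "give everybody a Rolls-Royce"

print([nonpositional_value(s) for s in upgraded])        # every score rises: [180, 220, 260, 300]
print([positional_value(s, fleet) for s in fleet])       # ranks: [0.0, 0.33..., 0.66..., 1.0]
print([positional_value(s, upgraded) for s in upgraded]) # unchanged: [0.0, 0.33..., 0.66..., 1.0]
```

This is the sense in which wealth creation can satisfy the first kind of value but cannot touch the second.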

Now imagine two friendly AIs, one non-positional and one positional.

The non-positional FAI has a tough task.  It wants to give everyone what it imagines they want.

But the positional FAI has an impossible task.  It wants to give everyone what it is that it thinks they value, which is to be considered better than other people, or at least better than other people of the same sex.  But it's a zero-sum value.  It's very hard to give more status to one person without taking the same amount of status away from other people.  There might be some clever solution involving sending people on trips at relativistic speeds so that the time each person is high-status seems longer to them than the time they are low-status, or using drugs to heighten their perceptions of high status and diminish the pain of low status.  For an average utilitarian, the best solution is probably to kill off everyone except one man and one woman.  (Painlessly, of course.)
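
As a back-of-the-envelope check of the average-utilitarian point (a toy model of my own, with an invented utility function, not anything specified by the post or the FAI literature): if each person's utility is the fraction of rivals in their hierarchy that they outrank, the population average is pinned at one half no matter how much wealth is created, and it only improves when the hierarchy shrinks to a single member.

```python
# Toy model of a purely positional value: utility = fraction of rivals you outrank
# within your own hierarchy. Purely illustrative; not a claim about real preferences.

def average_positional_utility(n):
    """Average utility in a hierarchy of n people, where rank i outranks i of the n - 1 others."""
    if n == 1:
        return 1.0                      # nobody left to outrank you
    return sum(i / (n - 1) for i in range(n)) / n

for n in (1, 2, 10, 1000):
    print(n, round(average_positional_utility(n), 3))
# 1 1.0
# 2 0.5
# 10 0.5
# 1000 0.5
```

On this toy model the average jumps from 0.5 to 1.0 only when each hierarchy has exactly one member, which is where the grim "one man and one woman" conclusion comes from.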

A FAI trying to satisfy one of these preferences would take society in a completely different direction than a FAI trying to satisfy the other.  From the perspective of someone with the job of trying to satisfy these preferences for everyone, they are as different as it is possible for preferences to be, even though they are taken (in the books mentioned above) from members of the same species at the same time in the same place in the same strata of the same profession.

Correcting value "mistakes" is not Friendly

This is not a problem that can be resolved by popping up a level.  If you say, "But what people who want status REALLY want is something else that they can use status to obtain," you're just denying the existence of status as a value.  It's a value.  When given the chance to either use their status to attain something else, or keep pressing the lever that gives them a "You've got status!" hit, some people choose to keep pressing the lever.

If you claim that these people have formed bad habits, and improperly short-circuited a connection from value to stimulus, and can be re-educated to instead see status as a means rather than as an end... I might agree with you.  But you'd make a bad, unfriendly AI.  If there's one thing FAIers have been clear about, it's that changing top-level goals is not allowed.  (That's usually said with respect to the FAI's top-level goals, not with respect to the human top-level goals.  But, since the FAI's top-level goal is just to preserve human top-level goals, it would be pointless to make a lot of fuss making sure the FAI held its own top-level goals constant, if you're going to "correct" human goals first.)

If changing top-level goals is allowed in this instance, or this top-level goal is considered "not really a top-level goal", I would become alarmed and demand an explanation of how a FAI distinguishes such pseudo-top-level-goals from real top-level goals.

If a computation can be conscious, then changing a conscious agent's computation changes its conscious experience

If you believe that computer programs can be conscious, then unless you have a new philosophical position that you haven't told anyone about, you believe that consciousness can be a by-product of computation.  This means that the formal, computational properties of peoples' values are not just critical, they're the only thing that matters.  This means that there is no way to abstract away the bad property of being zero-sum from a value without destroying the value.

In other words, it isn't valid to analyze the sensations that people get when their higher status is affirmed by others, and then recreate those sensations directly in everyone, without anyone needing to have low status.  If you did that, I can think of only 3 possible interpretations of what you would have done, and I find none of them acceptable:

  • Consciousness is not dependent on computational structure (this leads to vitalism); or
  • You have changed the computational structure their behaviors and values are part of, and therefore changed their conscious experience and their values; or
  • You have embedded them each within their own Matrix, in which they perceive themselves as performing isomorphic computations (e.g., the "Build human-seeming robots" or "For every person, a volcano-lair" approaches mentioned in the comments).

Summary

This discussion has uncovered several problems for an AI trying to give people what they value without changing what they value.  In increasing order of importance:

  • If you have a value associated with a sensation that is caused by a stimulus, it isn't clear when it's legitimate for a FAI to reconnect the sensation to a different stimulus and claim it's preserved the value.  Maybe it's morally okay for a person to rewire their kids to switch their taste-perceptions of broccoli and ice cream.  But is an AI still friendly if it does this?
  • It isn't okay to do this with the valuation of social status.  Social status has a simple formal (mathematical) structure requiring some agents to have low status in order for others to have high status.  The headache that status poses for a FAI trying to satisfy it is a result of this formal structure.  You can't abstract it away, and you can't legitimately banish it by reconnecting a sensation associated with it to a different stimulus, because the agent would then use that sensation to drive different behavior, meaning the value is now part of a different computational structure, and a different conscious experience.  You either preserve the problematic formal structure, or you throw out the value.
  • Some top-level human goals lead to conflict.  You can't both eliminate conflict, and preserve human values.  It's irresponsible, as well as creepy, when some people (I'm referring to some comments made on LW that I can't find now) talk about Friendly AI the same way that Christians talk about the Second Coming, as a future reign of perfect happiness for all when the lamb will lie down with the lion.  That is a powerful attractor that you don't want to go near, unless you are practicing the Dark Arts.
  • The notion of top-level goal is clear only in a 1960s classic symbolic AI framework.  The idea of a "top-level goal" is an example of what I called the "Prime mover" theory of network concepts.  In humans, a subsidiary goal, like status, can become a top-level goal via classic behavioristic association.  It happens all the time.  But the "preferences" that the FAI is supposed to preserve are human top-level goals.  How's it supposed to know which top-level goals are sacrosanct, and which ones are just heuristics or erroneous associations?
  • Reconciling human values may not be much easier or more sensible than reconciling all values, because human values already differ as much as it is possible for values to differ.  Sure, humans have only covered a tiny portion of the space of possible values.  But we've just seen two human values that differ along the critical dimensions of being mutually satisfiable or not being mutually satisfiable, and of encouraging global cooperation vs. not encouraging global cooperation.  The harmonic series looks a lot like Zeno's geometric series; yet one converges and one diverges (both are written out just below this list).  It doesn't matter that the terms used in each look similar; they're as different as series can be.  In the same way, values taken from any conceivable society of agents can be classified into mutually-satisfiable, or not mutually-satisfiable.  For the purposes of a Friendly AI, a mutually-satisfiable value held by gas clouds in Antares is more similar to a mutually-satisfiable human value, than either is to a non-mutually-satisfiable human value.
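
For concreteness, the two series contrasted in the last point (standard textbook facts: the harmonic series diverges, the geometric series converges):

$$\sum_{n=1}^{\infty}\frac{1}{n} \;=\; 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \;=\; \infty,
\qquad
\sum_{n=1}^{\infty}\frac{1}{2^n} \;=\; \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots \;=\; 1.$$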
Comments

This is a big problem with utilitarianism, period. Status is a problem. So are other-regarding values ("I want my neighbors to be Christian.") You're in deep trouble trying to reconcile differing values globally. (Which is why, for myself, I'm not messing with it; I'm starting to believe in the idea of cultivating our own gardens.)

That said, I think this is a distracting and inaccurate example. Status and stuff aren't the only things we value, and I don't see that they split by gender in the way you say. Personally, I don't want to be Elizabeth Taylor.

So are other-regarding values ("I want my neighbors to be Christian.")

Yes, the generalization of the problem is what I call negatively-coupled utility: where satisfaction of one person along one dimension necessarily causes dissatisfaction of another person.

Therefore, as long as there is at least one misanthrope (person who is dissatisfied by any increase in anyone else's happiness and vice versa), Pareto-improvements are impossible.

Therefore, Pareto-superiority is too strict a standard for determining when human values are being satisfied.
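
Here is a minimal sketch of the negatively-coupled case (the utility functions and the grid of states are invented for the illustration; this is just the two-agent version of the point above): when one agent's satisfaction along a dimension is exactly the other's dissatisfaction, no move away from the status quo is a Pareto improvement.

```python
# Toy check of negatively-coupled utility: every change that helps one agent
# hurts the other, so the set of Pareto improvements over the status quo is empty.

def u_a(s):   # agent A is satisfied by more of the shared dimension s
    return s

def u_b(s):   # agent B is dissatisfied by exactly what satisfies A
    return -s

states = [i / 10 for i in range(11)]   # candidate states of the world
status_quo = 0.5

pareto_improvements = [
    s for s in states
    if u_a(s) >= u_a(status_quo)
    and u_b(s) >= u_b(status_quo)
    and (u_a(s), u_b(s)) != (u_a(status_quo), u_b(status_quo))
]
print(pareto_improvements)   # [] -- nothing helps one agent without hurting the other
```

Which is exactly why Pareto-superiority stops being a useful standard once values like this are in play.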

Pareto optimums are probably not very good optimums, in any domain. They're just tractable. An example of looking for your keys under the streetlamp.

Pareto optima aren't very good on average, but shouldn't any worthwhile concept of optimality imply Pareto optimality?

Indeed. Pareto optimality is a necessary condition for an "ideal" society. Economists often make the mistake of thinking Pareto optimality is a sufficient condition for this, which it is not: it's Pareto optimal if I have everything and don't much want to part with any of it.

Not if more than a dozen people are involved. Certainly not if a million people are involved.

EDIT: Oops. Wrong.

What sort of situation are you thinking of where X would be better than Y for one person and worse for none, but Y better than X all things considered?

Doh! You're right.

I think of the concept of Pareto optimality as being useless because the path between the initial allocation, and a really good allocation, is usually littered with Pareto-optimums that must be broken through to get to the really good allocation. The really-good allocation is itself presumably Pareto-optimal. But you shouldn't get into the habit of thinking that's worth much.

Would you agree that a worthwhile concept of optimality would imply that there are no Pareto improvements that can be made to an optimum? (Though the optimum may not be Pareto superior to all alternatives.)

Yes, within the assumptions of the Pareto model.

Actually, there's a pretty easy fix for the status problem, because we actually do have independently-valuable feelings that the ability to boss people around corresponds to -- feelings like respectability, importance, and pride being among the main ones.

(Often, however, people who are outwardly concerned with their position are actually motivated by perceived threats to safety or affiliation instead -- believing that they are unworthy or unlovable unless they're important enough, or that you're only safe if you're in charge, etc.)

Anyway, your analysis here (as with many others on LW) conflates feelings of status with some sort of actual position in some kind of dominance hierarchy. But this is a classification error. There are people who feel quite respectable, important, and proud, without needing to outwardly be "superior" in some fashion. One's estimate of one's respectability, prideworthiness, or importance in the scheme of things is not intrinsically linked to any value scheme other than one's own.

Given that the idea of CEV is to get to what the more-grown, wiser version of "you" would choose for your value scheme, it's pretty much a given that unsupportable schemes would fall by the wayside.

Truth is, if you're worried about your place in the dominance hierarchy (by which I mean you have feelings about it, not that you're merely curious or concerned with it for tactical or strategic reasons), that's prima facie evidence of something that needs immediate fixing, and without waiting for an AI to modify your brain or convince you of something. Identify and eliminate the irrational perceived threat from your belief system.

Anyway, your analysis here (as with many others on LW) conflates feelings of status with some sort of actual position in some kind of dominance hierarchy. But this is a classification error. There are people who feel quite respectable, important, and proud, without needing to outwardly be "superior" in some fashion.

Those aren't the people I'm talking about.

Truth is, if you're worried about your place in the dominance hierarchy (by which I mean you have feelings about it, not that you're merely curious or concerned with it for tactical or strategic reasons), that's prima facie evidence of something that needs immediate fixing, and without waiting for an AI to modify your brain or convince you of something. Identify and eliminate the irrational perceived threat from your belief system.

You're not dealing with the actual values the people I described have; you're saying they should have different values. Which is unFriendly!

You're not dealing with the actual values the people I described have; you're saying they should have different values.

Your definition of value is not sufficient to encompass how human beings actually process values. We can have both positive and negative responses to the same "value" -- and they are largely independent. People who compulsively seek nonconsensual domination of others are not (if we exclude sociopaths and clinical sadists) acting out of a desire to gain pleasure, but rather to avoid pain.

Specifically, a pain that will never actually happen, because it's based on an incorrect belief. And correcting that belief is not the same thing as changing what's actually valued.

In other words, what I'm saying is: you're mistaken if you think an emotionally healthy, non-sociopathic/psychopathic human actually positively values bossing people around just to watch them jump. IMO, such a person is actually doing it to avoid losing something else -- and that problem can be fixed without actually changing what the person values (positively or negatively).

In other words, what I'm saying is: you're mistaken if you think an emotionally healthy, non-sociopathic/psychopathic human actually positively values bossing people around just to watch them jump.

Look at how value-laden that sentence is: "healthy", "non-sociopathic/psychopathic". You're just asserting your values, and insisting that people with other values are wrong ("unhealthy").

Well, hang on. pjeby probably would benefit from finding a less judgmental vocab, but he has a valid point: not every human action should be counted as evidence of a human value for the usual result of that action, because some humans have systematically erroneous beliefs about the way actions lead to results.

You may be hitting pjeby with a second-order ad hominem attack! Just because he uses vocab that's often used to delegitimize other people doesn't mean his arguments should be delegitimized.

But people "who compulsively seek nonconsensual domination of others" and "actually positively value bossing people around just to watch them jump" exist, and are quite successful, arguably as a result of these values, at other things which most humans value (sex, status, wealth). Pjeby is describing traits that are common in politicians, managers, and high school teachers.

Pjeby is describing traits that are common in politicians, managers and high school teachers.

And I'm asserting that the subset of those individuals who are doing it for a direct feeling-reward (as opposed to strategic reasons) are what we would call sociopaths or psychopaths. The remainder (other than Machiavellian strategists and 'opaths) are actually doing it to avoid negative feeling-hits, rather than to obtain positive ones.

Look at how value-laden that sentence is: "healthy", "non-sociopathic/psychopathic". You're just asserting your values, and insisting that people with other values are wrong ("unhealthy").

I assume we would want CEV to exclude the preferences of sociopaths and psychopaths, as well as those of people who are actually mistaken about the beliefs underlying their preferences.

Healthy human beings experience guilt when they act like jerks, counterbalancing the pleasure, unless there are extenuating circumstances (like the other guy being a jerk, too).

And when a person doesn't experience guilt, we call that sociopathy.

I assume we would want CEV to exclude the preferences of sociopaths and psychopaths, as well as those of people who are actually mistaken about the beliefs underlying their preferences.

"Knew more, thought faster" should make the second irrelevant (and make the "theocracy" part of "xenophobic theocracy" implausible even given majority voting). The original CEV document actually opposed excluding anyone's preferences or loading the outcome in any way as an abuse of the programmers' power, but IIRC, Eliezer has made comments here indicating that he's backed down from that some.

I assume we would want CEV to exclude the preferences of sociopaths and psychopaths, as well as those of people who are actually mistaken about the beliefs underlying their preferences.

I thought the idea was that, under CEV, sociopaths would just get outvoted. People with mutant moralities wouldn't be excluded, but, just by virtue of being mutants, their votes would be almost entirely drowned out by those with more usual moralities.

[ETA: Eliezer would object to calling these mutant moralities "moralities", because he reserves the word "morality" for the action-preferring algorithm (or whatever the general term ought to be) that he himself would find compelling in the limit of knowledge and reflection. As I understand him, he believes that he shares this algorithm with nearly all humans.]

I thought the idea was that, under CEV, sociopaths would just get outvoted. People with mutant moralities wouldn't be excluded, but, just by virtue of being mutants, their votes would be almost entirely drowned out by those with more usual moralities.

If it were (just) a matter of voting, then I imagine we'd end up with the AI creating a xenophobic theocracy.

If it were (just) a matter of voting, then I imagine we'd end up with the AI creating a xenophobic theocracy.

As I understand it, it's not just a matter of voting. It's more as though a simulated* version of each of us had the opportunity to know everything relevant that the FAI knows, and to reflect fully on all that information and on his or her values to reach a fully coherent preference. In that case, it's plausible that nearly all of us would be convinced that xenophobic theocracy was not the way to go. However, if many of us were convinced that xenophobic theocracy was right, and there were enough such people to outweigh the rest, then that would mean that we ought to go with xenophobic theocracy, and you and I are just mistaken to think otherwise.

* I think that Eliezer in fact would not want the FAI to simulate us to determine our CEV. The concern is that, if the AI is simulating us before it's learned morality by extrapolating our volition, then the simulations would very likely lead to very many tortured minds.

I wonder if, under the current plan, CEV would take into account people's volition about how CEV should work — i.e. if the extrapolated human race would want CEV to exclude the preferences of sociopaths/psychopaths/other moral mutants, would it do so, or does it only take into account people's first-order volition about the properties of the FAI it will build?

Ultimately, preference is about properties of the world, and AI with all its properties is part of the world.

You're not dealing with the actual values the people I described have; you're saying they should have different values. Which is unFriendly!

Could be saying "I think that, upon further reflection, they would have different values in this way".

Phil, you're right that there's a difference between giving people their mutually unsatisfiable values and giving them the feeling that they've been satisfied. But there's a mechanism missing from this picture:

Even if I wouldn't want to try running an AI to have conversations with humans worldwide to convert them to more mutually satisfiable value systems, and even though I don't want a machine to wire-head everybody into a state of illusory high status, I certainly trust humans to convince other humans to convert to mutually satisfiable values. In fact, I do it all the time. I consider it one of the most proselytism-worthy ideas ever.

So I see your post as describing a very important initiative we should all be taking, as people: convince others to find happiness in positive-sum games :)

(If I were an AI, or even just an I, perhaps you would hence define me as "unFriendly". If so, okay then. I'm still going to go around convincing people to be better at happiness, rational-human-style.)

So I see your post as describing a very important initiative we should all be taking, as people: convince others to find happiness in positive-sum games

It's an error to assume that human brains are actually wired for zero or negative sum games in the first place, vs. having adaptations that tend towards such a situation. Humans aren't true maximizers; they're maximizer-satisficers. E.g., people don't seek the best possible mate: they seek the best mate they think they can get.

(Ironically, the greater mobility and choices in our current era often lead to decreased happiness, as our perceptions of what we ought to be able to "get" have increased.)

Anyway, ISTM that any sort of monomaniacal maximizing behavior (e.g. OCD, paranoia, etc.) is indicative of an unhealthy brain. Simple game theory suggests that putting one value so much higher than others is unlikely to be an evolutionarily stable strategy.

Your place in the dominance hierarchy determines your mating prospects much more so than your feelings of high status. Feelings may be necessary, but not sufficient in this case.

Your initial observation is interesting; your analysis is somewhat problematic.

First, there is, as mentioned elsewhere, a distinct misconception of status that seems to recur here. Status is not a one-dimensional line upon which each person can be objectively placed. Status is multi-dimensional. For example, your person with a second-rate Faberge egg collection may simply decide that what really counts is how nice someone's roses are, and thus look at better Faberge eggs with sympathy (they should have spent all this money on roses!) rather than jealousy. By competing on multiple dimensions, it's possible for many people to win, potentially even everyone.

You are also using a non-representative sample. The type of person who's likely to become successful through celebrity seems disproportionately likely to be status-obsessed. (This is likely also true of people who inherit fortunes, as they have few other ways to distinguish their self-worth.) Generalizing about the ability of an AI to serve all people based on these few may be inappropriate. This is even more true when we consider that a world with an FAI may look very different in terms of how people are raised and educated.

Furthermore, your restriction on an FAI as being unable to alter utility functions seems like overkill. Every ad you see on television is an attempt to change your utility function. They're just not terribly effective. Any alteration that an FAI made on advertising, education, perhaps even market pricing, would necessarily have some effect on people's utility functions which the FAI would presumably be aware of. Thus, while perhaps you want some restriction on the FAI rewiring people's brains directly, it seems like an FAI that is wholly prohibited from altering utility functions would be practically incapable of action. In short, therefore, your analysis has an overly restricted imagination for how an FAI would go about its business - you assume either it is directly rewiring everyone's brain or sitting on its own circuits unable to act. There should be a middle ground.

And, as an aside, your mention of "average utilitarians" betrays a common misconception that's getting old 'round these parts. Utility is inter-dependent. It's all well and good to talk about how, in theory, eliminating everyone but the two happiest people would maximize average utility, but, in reality, two people would not want to live like that. While it might work on a spreadsheet, actually eliminating most of the population is extraordinarily unlikely to maximize average utility, because utility is not merely a function of available resources.

This discussion of "Friendly AI" is hopelessly anthropomorphic. It seems to be an attempt to imagine what a FAI optimizing the world to a given person's values will do, and such attempts are destined to fail if you bring up specific details, which you do. A FAI is a system expected to do something good, not a specific good thing known in advance. You won't see it coming.

(More generally, see the Fun theory sequence.)

It seems to be an attempt to imagine what a FAI optimizing the world to a given person's values will do, and such attempts are destined to fail

Yes, I think that this is right. An FAI would try to create a world in which we are all better off. That doesn't mean that the world would be as any one of us considers perfect. Perhaps each of us would still consider it to be very suboptimal. But all of us would still desire it over the present world.

In other words, let's grant, for the sake of argument, that most of us are doomed to continue to suffer from losing zero-sum status games forever. Nonetheless, there are possible worlds within which we all, even the losers, would much rather play these games. So there is still a lot that an FAI could do for us.

You can't think about what specifically FAI will do, period. It seems quite likely there will be no recognizable humans in a world rebuilt by FAI. Any assumption is suspect, even the ones following from the most reliable moral heuristics.

Is it correct to call it FAI, then? Do you see a world with "no recognizable humans" as a very likely thing for the human race (or its extrapolated volition) to collectively want?

Is it correct to call it FAI, then?

I'm considering the case of FAI, that is, humanity's preference correctly rendered.

Do you see a world with "no recognizable humans" as a very likely thing for the human race (or its extrapolated volition) to collectively want?

Status quo has no power. So the question shouldn't be whether "no recognizable humans" is the particular thing humanity wants, but rather whether "preserving recognizable humans" happens to be the particular thing that humanity wants. And I'm not sure there are strong enough reasons to expect "world with recognizable humans" to be the optimal thing to do with the matter. It might be, but I'm not convinced we know enough to locate this particular hypothesis. The default assumption that humans want humans seems to stem from the cached moral intuition promoted by availability in the current situation, but reconstructing the optimal situation from preference is a very indirect process, that won't respect the historical accidents of natural development of humanity, only humanity's values.

You can't think about what specifically FAI will do, period.

"Specifically" is relative. By some standards, we have never thought specifically about anything at all. (I have never traced precisely the path of every atom involved in any action.)

Nonetheless, one can think more or less specifically, and to think at all is to think a thought that is specific to some extent. To think, as you wrote above, that an "FAI is a system expected to do something good" is to think something more specific than one might, if one were committed to thinking nothing specific, period. (This is assuming that your words have any meaning whatsoever.)

ETA: In other words, as Eliezer wrote in his Coming of Age sequence, you must be thinking something that is specific to some extent, for otherwise you couldn't even pose the problem of FAI to yourself.

Sure. The specific thing you say is that the outcome is "good", but what that means exactly is very hard to decipher, and in particular hard or impossible to decipher in a form of a story, with people, their experiences and social constructions. It is the story that can't be specific.

[ETA: I wrote the following when your comment read simply "Sure, why?". I can see the plausibility of your claim that narrative moral imaginings can contribute nothing to the development of FAI, though it's not self-evidently obvious to me. ]

Perhaps I missed the point of your previous comment.

I presumed that you thought that I was being too specific. I read you as expressing this thought by saying that one should not think specifically, "period". I was pointing out the impossibility or meaninglessness of that injunction, at least in its extreme form. I was implicitly encouraging you to indicate the non-extreme meaning that you had intended.

The post has five bullet points at the end, and this does not respond to any of them. The post explores the nature of values that humans have, and values in general; Vladimir's comment is to the effect that we can't investigate values, and must design a Friendly AI without understanding the problem domain it will face.

Vladimir's comment is to the effect that we can't investigate values, and must design a Friendly AI without understanding the problem domain it will face.

We can't investigate the content of human values in a way that is useful for constructing Friendly AI, and we can't investigate what specifically Friendly AI will do. We can investigate values for the purpose of choosing better human-designed policies.

We can't investigate the content of human values in a way that is useful for constructing Friendly AI

Do you want to qualify that in some way? I interpret it as meaning that learning about values has no relevance to constructing an AI whose purpose is to preserve values. It's almost an anti-tautology.

I interpret it as meaning that learning about values has no relevance to constructing an AI whose purpose is to preserve values. It's almost an anti-tautology.

The classical analogy is that if you need to run another instance of a given program on a faster computer, figuring out what the program does is of no relevance; you only need to correctly copy its machine code and correctly interpret it on the new machine.

If you need to run another instance of a given program on a faster computer, but you don't know what an algorithm is, or what part of the thing in front of you is a "computer" and what part is a "computer program", and you have not as of yet discovered the concept of universal computation, nor are certain whether the computer hardware, or even arithmetic itself, operates deterministically -

-- then you should take some time to study the thing in front of you and figure out what you're talking about.

You'd probably need to study how these "computers" work in general, not how to change the background color in documents opened with a word processor that runs on the thing. A better analogy in the direction you took is uploading: we need to study neurons, not beliefs that a brain holds.

You seem to think that values are just a content problem, and that we can build a mechanism now and fill the content in later. But the whole endeavor is full of unjustified assumptions about what values are, and what values we should pursue. We have to learn a lot more about what values are, what values are possible, what values humans have, and why they have them, before we can decide what we ought to try to do in the first place.

We have to learn a lot more about what values are, what values are possible, what values humans have, and why they have them, before we can decide what we ought to try to do in the first place.

Of course. Only the finer detail is a content problem.

But the whole endeavor is full of unjustified assumptions about what values are, and what values we should pursue.

Not that I know of. On the contrary, the assumption is that one shouldn't posit statements about which values humans actually have, and what kind of mathematical structure values are is an open problem.

I agree, I can almost hear Eliezer saying (correctly) that it's daft to try and tell the FAI what to do; you just give it its values and let it rip. (If you knew what to do, you wouldn't need the AI.) All this post has brought up is a problem that an AI might potentially have to solve. And sure it looks like a difficult problem, but it doesn't feel like one that can't be solved at all. I can think of several rubbish ways to make a bunch of humans think that they all have high status; a brain the size of a planet would think of an excellent one.

I can think of several rubbish ways to make a bunch of humans think that they all have high status,

Isn't that what society currently does? Come up with numerous ways to blur and obscure the reality of where exactly you fall in the ranking, yet let you plausibly believe you're higher than you really are?

Isn't that what society currently does? Come up with numerous ways to blur and obscure the reality of where exactly you fall in the ranking, yet let you plausibly believe you're higher than you really are?

Isn't it that we each care about a particular status hierarchy? The WOW gamer doesn't care about the status hierarchy defined by physical strength and good looks. It's all about his 10 level 80 characters with maxed out gear, and his awesome computer with an Intel Core i7 975 Quad-Core 3.33GHz CPU, 12GB of tri-channel DDR3, dual SLIed GeForce GTX 260 graphics cards, 2 1TB hard drives, Blu-ray, and liquid cooling.

This issue came up on crookedtimber.org before in reply to a claim by Will Wilkinson that free market societies decrease conflict by having numerous different hierarchies so that everyone can be near the top in one of them. (Someone google-fu this?)

The CT.org people replied that these different hierarchies actually exist within a meta-hierarchy that flattens it all out and retains a universal ranking for everyone, dashing the hopes that everyone can have high status. The #1 WOW player, in other words, is still below the #100 tennis player.

Despite the ideological distance I have from them, I have to side with the CT.org folks on this one :-/

ETA: Holy Shi-ite! That discussion was from October '06! Should I be worried or encouraged by the fact that I can remember things like this from so long ago?

The crooked timber post is here. On first glance it seems like a matter of degree: to the extent that there is such a universal ranking, it only fully defeats Wilkinson's point if the universal ranking and its consequences are the only ranking anyone cares about. As long as different people care differently about (the consequences of) different rankings, which it seems to me is often the case, everyone can rise in their favorite ranking and benefit more than others are harmed.

ETA: though maybe the more hierarchies there are, the less good it feels to be #100 on any of them.

Okay, to substantiate my position (per a few requests), I dispute that you can actually achieve the state where people only care about a few particular hierarchies, or even that people have significant choice in which hierarchies they care about. We're hardwired to care about status; this drive is not "up for grabs", and if you could turn off your caring for part of the status ranking, why couldn't you turn it all off?

Furthermore, I'm highly skeptical that e.g. the WOW superstar is actually fully content to remain in the position that being #1 in WOW affords him; rather, he's doing the best he can given his abilities, and this narrow focus on WOW is a kind of resignation. In a way I can kind of relate: in high school, I used to dominate German competitions and classes involving math or science. While that was great, it just shifted my attention to the orchestra classes and math/debate competitions that I couldn't dominate.

Now, you can dull the social influence on yourself that makes you care about status by staying away from the things that will make you compare yourself to the broader (e.g. non-WoW) society, but this is a devil's bargain: it has the same kind of effect on you as solitary confinement, just of a lesser magnitude. (And I can relate there too, if anyone's interested.)

I think the WOW superstar would, if he could, trade his position for one comparable to the #100 tennis player in a heartbeat. And how many mistresses does #1 in Wow convert to?

And how many mistresses does #1 in Wow convert to?

I don't know about in-game WoW superstars, but I knew an admin of an "unofficial" Russian server of a major AAA MMORPG, and he said that basically all female players of that server he met in real life wanted to go to bed with him. This might have been an exaggeration, but I can confirm at least one date. BTW, I wouldn't rate the guy as attractive.

In the 1990s I happened upon a game of Vampire (a live-action role-playing game) being played outdoors at night on the campus of UC Berkeley. After the game, I happened to be sitting around at Durant Food Court (a cluster of restaurants near campus) when I overheard one of the female players throw herself at one of the organizers: "How many experience points would I need to go to bed with you?" she asked playfully. (The organizer threw me a juicy grin on the side a few moments later, which I took as confirmation that the offer was genuine.)

I am guessing that in the environment of evolutionary adaptation, political success and political advantage consisted largely of things very much like being able to get a dozen people to spend an evening in some organized activity that you run.

ADDED. Now that I have had time to reflect, what she probably said is, "how many experience points do I get for . . .", which is a wittier come-on than the one I originally wrote and which jibes with the fact that one of the organizer's jobs during the game is to award experience points to players.

Interesting; I guess I underestimated the position of unofficial Russian WoW server admins in the meta-hierarchy -- in part because I didn't expect as many desirable Russian women to play WoW.

If the server population is a couple thousand players, and there are 5% of females among them, that leaves you with about 100 females, 10 of which will likely be attractive to you -- and if you run a dozen servers or so, that's definitely not a bad deal if you ask me :)

Take a less extreme version of the position you are arguing against: the WOWer cares about more than the WOW hierarchy, but the meta-hierarchy he sets up is still slightly different from the meta-hierarchy that the 100th best tennis player sets up. The tennis player would rank (1st in tennis, 2nd in WOW) higher than (2nd in tennis, 1st in WOW), but the WOWer would flip the ranking. Do you find this scenario all that implausible?

It's plausible, but irrelevant. The appropriate comparison is how the WoWer would regard a position

comparable [in status] to the #100 tennis player.

If he doesn't yearn for a high ranking in tennis, it's because of the particulars of tennis, not out of a lack of interest in a higher ranking in the meta-hierarchy.

It's plausible, but irrelevant...

Well, it's not relevant if the WOWer would still rather be the 100th best tennis player and suck at WOW than his current position - which is plausible, but there are probably situations where this sort of preference does matter.

If he doesn't yearn for a high ranking in tennis, it's because of the particulars of tennis, not out of a lack of interest in a higher ranking in the meta-hierarchy.

He's certainly interested in the meta-hierarchy, but why can't he value the status gained from WOW slightly higher than the status gained from tennis, irrespective of how much he likes tennis and WOW in themselves?

Yes, I get that someone might plausibly not care about tennis per se. That's irrelevant. What's relevant is whether he'd trade his current position for one with a meta-hierarchy position near the #100 tennis player -- not necessarily involving tennis! -- while also being something he has some interest in anyway.

What I dispute is that people can genuinely not care about moving up in the meta-hierarchy, since it's so hardwired. You can achieve some level of contentedness, sure, but not total satisfaction. The characterization steven gave of the #1 WoW player's state of mind is not realistic.

But we're probably also wired to care mostly about the hierarchies of people with whom we interact frequently. In the EEA, those were pretty much the only people who mattered. [ETA: I mean that they were the only people to whom your status mattered. Distant tribes might matter because they could come and kick you off your land, but they wouldn't care what your intra-tribe status was.]

The #1 WOW player probably considers other WOW players to be much more real, in some psychologically powerful way, than are professional tennis players and their fans. It would therefore be natural for him to care much more about what those other WOW players think.

But like I said earlier, that's like saying, "If you live in solitary confinement [i.e. no interaction even with guards], you're at the top of your hierarchy so obviously that must make you the happiest possible."

You can't selectively ignore segments of society without taking on a big psychological burden.

You can't have high status if no other people are around. But high status is still a local phenomenon. Your brain wants to be in a tribe and to be respected by that tribe. But the brain's idea of a tribe corresponds to what was a healthy situation in the EEA. That meant that you shouldn't be in solitary confinement, but it also meant that your society didn't include distant people with whom you had no personal interaction.

But from the perspective of an EEA mind, online interaction with other WoWers is identical (or at least extremely similar) to solitary confinement in that you don't get the signals the brain needs to recognize "okay, high status now". (This would include in-person gazes, smells, sounds, etc.) This is why I dispute that the WoW player actually can consider the other WoW players to be so psychologically real.

Ah - I'd been misreading this because I imagined the #1 WoW player would interact socially with other WoW players ("in real life") like all of the WoW players I know do.

Wouldn't the #1 WoW player be spending most of his waking hours on a computer instead of socializing?

Well so far I've just been assuming '#1 WoW player' is meaningful. As I understand it, there isn't much to gain at the margins once you spend most of your time playing. Also, who says you can't be on a computer and socializing? There's plenty of time to look away from the computer while playing WoW, and you can play it practically anywhere.

Also, who says you can't be on a computer and socializing?

Human psychology.

Your body can tell the difference between computer interaction and in-person interaction. Intermittently "socializing" while you try to play is still a very limited form of socializing.

Intermittently "socializing" while you try to play is still a very limited form of socializing.

What sort of thing did you have in mind? (Am I missing out?)

What in-person-socializing/WoW-playing hybrid did you have in mind? Because I'm missing out!

I hang out with several people who play WoW at my place when they're over. Other WoW players will spend time geeking out over their characters' stats, gear, appearance, etc, and presumably our imaginary #1 would have less-dedicated groupies that would be interested in that sort of thing while he's playing. Due to the amount of time spent travelling or waiting in queues, there are also a lot of times for traditional sorts of socialization - eating food next to other humans, throwing things at each other, whatever it is humans do. And WoW isn't all that concentration-intensive, so it's entirely possible to have a conversation while playing. And you can even play in the same room as other people who are in your group, and talk about the game in-person while you're doing it.

Forget the WOWer then, how about the M:tG fanatic?

Implementation issue. Oops, wrong cop-out! :-P

Seriously: the Magic: the Gathering fanatic has social contact, but the lack of females in that social network has basically the same effect, in that it's a more limited kind of social interaction that can't replicate our EEA-wired desires.

(And I can relate there too, if anyone's interested.)

I'm interested. How can you relate? What was your situation?

Without going into too many personal details (PM or email me if you're interested in that), for a while I lived a lifestyle where my in-person socialization was limited, as were most of my links to the broader society (e.g. no TV), though I made a lot of money (at least relative to the surrounding community).

I also found myself frequently sad, which was very strange, as I felt all of my needs and wants were being met. It was only after a long time that I noticed the correlation between "being around other people" and "not being sad" -- and I'm an introvert!

I have to side with the CT.org folks on this one

er... why?

ETA: my counterpoint would be essentially what steven said, but you didn't seem to give an argument.

I can think of several rubbish ways to make a bunch of humans think that they all have high status; a brain the size of a planet would think of an excellent one.

Create a lot of human-seeming robots = Give everyone a volcano = Fool the humans = Build the Matrix.

To quote myself:

In other words, it isn't valid to analyze the sensations that people get when their higher status is affirmed by others, and then recreate those sensations directly in everyone, without anyone needing to have low status. If you did that, I can think of only 3 possible interpretations of what you would have done, and I find none of them acceptable:

  • Consciousness is not dependent on computational structure (this leads to vitalism); or

  • You have changed the computational structure their behaviors and values are part of, and therefore changed their conscious experience and their values; or

  • You have embedded them each within their own Matrix, in which they perceive themselves as performing isomorphic computations.

Create a lot of human-seeming robots = Give everyone a volcano = Fool the humans = Build the Matrix

I agree that these are all rubbish ideas, which is why we let the AI solve the problem. Because it's smarter than us. If this post were about how we should make the world a better place on our own, then these issues are indeed a (small) problem, but since it was framed in terms of FAI, it's asking the wrong questions.

You're missing the main point of the post. Note the bullet points are ranked in order of increasing importance. See the last bullet point.

BTW, how do you let the AI solve the problem of what kind of AI to build?

BTW, how do you let the AI solve the problem of what kind of AI to build?

What kind of AI to be. That's the essence of being a computationally complex algorithm, and a decision-making algorithm in particular: you always learn something new about what you should do, and what you'll actually do, and not just learn it, but make it so.

...or more likely, this won't be a natural problem-category to consider at all.

The problem of value aggregation has at least one obvious lower bound: divide the universe into equal parts, and have each part optimized to a given person's preference, including game-theoretic trade between the parts to take into account each part's preferences about the structure of the other parts. Even if the values of different people have little in common, this would be a great improvement over the status quo.

Good point, but

Even if values of each person have little in common, this would be a great improvement over status quo.

This doesn't seem to necessarily be the case for an altruist if selfish bastards are sufficiently more common than altruists who subjunctively punish selfish bastards. (Though if I recall correctly, you're skeptical that that sort of divergence is plausible, right?)

The more negative-sum players could be worse off if their targets become better off as a result of the change. That assumes punishment is the backbone of these players' preferences, and that the boost in power to do stuff with the allotted matter doesn't compensate for the negative effect of their intended victims having a better life. I don't believe any human is like that.

I'm not skeptical about divergence per se, of course preferences of different people are going to be very different. I'm skeptical about distinctly unusual aspects being present in any given person's formal preference, when that person professes that alleged unusual aspect of their preference. That is, my position is that the divergence within human universals is something inevitable, but divergence from the human universals is almost impossible.

This lower-bound could have some use; but my view is that, in the future, most people will be elements of bigger people, making this division difficult.

Existing societies are constructed in such a way that optimizing each person's preference can help optimize the society's preference. So maybe it's possible.