George Hamilton's autobiography Don't Mind if I Do, and the very similar book by Bob Evans, The Kid Stays in the Picture, give a lot of insight into human nature and values.  For instance: What do people really want?  When people have the money and fame to travel around the world and do anything that they want, what do they do?  And what is it that they value most about the experience afterward?

You may argue that the extremely wealthy and famous don't represent the desires of ordinary humans.  I say the opposite: Non-wealthy, non-famous people, being more constrained by need and by social convention, and having no hope of ever attaining their desires, don't represent, or even allow themselves to acknowledge, the actual desires of humans.

I noticed a pattern in these books:  The men in them value social status primarily as a means to an end, while the women value social status as an end in itself.

"Male" and "female" values

This is a generalization; but, at least at the very upper levels of society depicted in these books, and a few others like them that I've read, it's frequently borne out.  (Perhaps a culture chooses celebrities who reinforce its stereotypes.)  Women and men alike appreciate expensive cars and clothing.  But the impression I get is that the flamboyantly extravagant are surprisingly non-materialistic.  Other than food (and, oddly, clothing), the very wealthy themselves consistently refer to these trappings as things that they need in order to signal their importance to other people.  They don't have an opinion on how long or how tall a yacht "ought" to be; they just want theirs to be the longest or tallest.  The persistent phenomenon whereby the more wealthy someone appears, the more likely they are to go into debt, is not because these people are too stupid or impulsive to hold on to their money (as in popular depictions of the wealthy, e.g., A New Leaf).  It's because they are deliberately trading monetary capital for the social capital that they actually desire (and expect to be able to trade it back later if they wish to, even making a profit on the "transaction", as Donald Trump has done so well).

With most of the women in these books, that's where it ends.  What they want is to be the center of attention.  They want to walk into a famous night-club and see everyone's heads turn.  They want the papers to talk about them.  They want to be able to check into a famous hotel at 3 in the morning and demand that the head chef be called at home, woken up, and brought in immediately to cook them a five-course meal.  Some of the women in these stories, like Elizabeth Taylor, routinely make outrageous demands just to prove that they're more important than other people.

What the men want is women.  Quantity and quality.  They like social status, and they like to butt heads with other men and beat them; but once they've acquired a bevy of beautiful women, they are often happy to retire to their mansion or yacht and enjoy them in private for a while.  And they're capable of forming deep, private attachments to things, in a way the women are less likely to.  A man can obsess over his collection of antique cars as beautiful things in and of themselves.  A woman will not enjoy her collection of Faberge eggs unless she has someone to show it to.  (Preferably someone with a slightly less-impressive collection of Faberge eggs.)  Reclusive celebrities are more likely to be men than women.

Some people mostly like having things.  Some people mostly like having status.  Do you see the key game-theoretic distinction?

Neither value is very amenable to the creation of wealth.  Give everybody a Rolls-Royce; and the women still have the same social status, and the men don't have any more women.  But the "male" value is more amenable to it.  Men compete, but perhaps mainly because the distribution of quality of women is normal.  The status-related desires of the men described above are, in theory, capable of being mutually satisfied.  The women's are not.

Non-positional / Mutually-satisfiable vs. Positional / Non-mutually-satisfiable values

No real person implements pure mutually-satisfiable or non-mutually-satisfiable values.   I have not done a study or taken a survey, and don't claim that these views correlate with sex in general.  I just wanted to make accessible the evidence I saw that these two types of values exist in humans.  The male/female distinction isn't what I want to talk about; it just helped organize the data in a way that made this distinction pop out for me.  I could also have told a story about how men and women play sports, and claim that men are more likely to want to win (a non-mutually-satisfiable value), and women are more likely to just want to have fun (a mutually-satisfiable value).  Let's not get distracted by sexual politics.  I'm not trying to say something about women or about men; I'm trying to say something about FAI.

I will now rename them "non-positional" and "positional" (as suggested by SilasBarta and wnoise), where "non-positional" means assigning a value to something from category X according to its properties, and "positional" means assigning a new value to something from category X according to the rank of its non-positional value in the set of all X (non-mutually-satisfiable).
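
The definition above can be made concrete with a small sketch.  It is only an illustration under stated assumptions: the names `non_positional_value` and `positional_value`, the three example people, and the use of a single number ("wealth") as the underlying property are all hypothetical and not part of the original argument.

```python
from typing import Dict


def non_positional_value(wealth: Dict[str, float]) -> Dict[str, float]:
    """Non-positional: each item is valued by its own properties alone.

    Here the only property is a single number, so the value is that number.
    """
    return dict(wealth)


def positional_value(wealth: Dict[str, float]) -> Dict[str, int]:
    """Positional: each item is valued by the rank of its non-positional
    value within the whole set (1 = lowest).

    Ties are broken by name, an arbitrary choice made only for determinism.
    """
    ordered = sorted(wealth, key=lambda name: (wealth[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical population: give everyone the same large windfall.
before = {"alice": 10.0, "bob": 20.0, "carol": 30.0}
after = {name: amount + 100.0 for name, amount in before.items()}

# Every non-positional value goes up ...
everyone_gained = all(
    non_positional_value(after)[n] > non_positional_value(before)[n]
    for n in before
)

# ... but no positional value changes, and the total stays fixed.
ranks_unchanged = positional_value(after) == positional_value(before)
total_rank = sum(positional_value(after).values())
```

In this toy model the sum of ranks is fixed by the size of the population, which is one way to read the claim that positional values are non-mutually-satisfiable: raising one person's rank requires lowering someone else's.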

Now imagine two friendly AIs, one non-positional and one positional.

The non-positional FAI has a tough task.  It wants to give everyone what it imagines they want.

But the positional FAI has an impossible task.  It wants to give everyone what it is that it thinks they value, which is to be considered better than other people, or at least better than other people of the same sex.  But it's a zero-sum value.  It's very hard to give more status to one person without taking the same amount of status away from other people.  There might be some clever solution involving sending people on trips at relativistic speeds so that the time each person is high-status seems longer to them than the time they are low-status, or using drugs to heighten their perceptions of high status and diminish the pain of low status.  For an average utilitarian, the best solution is probably to kill off everyone except one man and one woman.  (Painlessly, of course.)

A FAI trying to satisfy one of these preferences would take society in a completely different direction than a FAI trying to satisfy the other.  From the perspective of someone with the job of trying to satisfy these preferences for everyone, they are as different as it is possible for preferences to be, even though they are taken (in the books mentioned above) from members of the same species at the same time in the same place in the same strata of the same profession.

Correcting value "mistakes" is not Friendly

This is not a problem that can be resolved by popping up a level.  If you say, "But what people who want status REALLY want is something else that they can use status to obtain," you're just denying the existence of status as a value.  It's a value.  When given the chance to either use their status to attain something else, or keep pressing the lever that gives them a "You've got status!" hit, some people choose to keep pressing the lever.

If you claim that these people have formed bad habits, and improperly short-circuited a connection from value to stimulus; and can be re-educated to instead see status as a means, rather than as an end... I might agree with you.  But you'd make a bad, unfriendly AI.  If there's one thing FAIers have been clear about, it's that changing top-level goals is not allowed.  (That's usually said with respect to the FAI's top-level goals, not wrt the human top-level goals.  But, since the FAI's top-level goal is just to preserve human top-level goals, it would be pointless to make a lot of fuss making sure the FAI held its own top-level goals constant, if you're going to "correct" human goals first.)

If changing top-level goals is allowed in this instance, or this top-level goal is considered "not really a top-level goal", I would become alarmed and demand an explanation of how a FAI distinguishes such pseudo-top-level-goals from real top-level goals.

If a computation can be conscious, then changing a conscious agent's computation changes its conscious experience

If you believe that computer programs can be conscious, then unless you have a new philosophical position that you haven't told anyone about, you believe that consciousness can be a by-product of computation.  This means that the formal, computational properties of people's values are not just critical, they're the only thing that matters.  This means that there is no way to abstract away the bad property of being zero-sum from a value without destroying the value.

In other words, it isn't valid to analyze the sensations that people get when their higher status is affirmed by others, and then recreate those sensations directly in everyone, without anyone needing to have low status.  If you did that, I can think of only 3 possible interpretations of what you would have done, and I find none of them acceptable:

  • Consciousness is not dependent on computational structure (this leads to vitalism); or
  • You have changed the computational structure their behaviors and values are part of, and therefore changed their conscious experience and their values; or
  • You have embedded them each within their own Matrix, in which they perceive themselves as performing isomorphic computations (e.g., the "Build human-seeming robots" or "For every person, a volcano-lair" approaches mentioned in the comments).


This discussion has uncovered several problems for an AI trying to give people what they value without changing what they value.  In increasing order of importance:

  • If you have a value associated with a sensation that is caused by a stimulus, it isn't clear when it's legitimate for a FAI to reconnect the sensation to a different stimulus and claim it's preserved the value.  Maybe it's morally okay for a person to rewire their kids to switch their taste-perceptions of broccoli and ice cream.  But is an AI still friendly if it does this?
  • It isn't okay to do this with the valuation of social status.  Social status has a simple formal (mathematical) structure requiring some agents to have low status in order for others to have high status.  The headache that status poses for a FAI trying to satisfy it is a result of this formal structure.  You can't abstract it away, and you can't legitimately banish it by reconnecting a sensation associated with it to a different stimulus, because the agent would then use that sensation to drive different behavior, meaning the value is now part of a different computational structure, and a different conscious experience.  You either preserve the problematic formal structure, or you throw out the value.
  • Some top-level human goals lead to conflict.  You can't both eliminate conflict, and preserve human values.  It's irresponsible, as well as creepy, when some people (I'm referring to some comments made on LW that I can't find now) talk about Friendly AI the same way that Christians talk about the Second Coming, as a future reign of perfect happiness for all when the lamb will lie down with the lion.  That is a powerful attractor that you don't want to go near, unless you are practicing the Dark Arts.
  • The notion of top-level goal is clear only in a 1960s classic symbolic AI framework.  The idea of a "top-level goal" is an example of what I called the "Prime mover" theory of network concepts.  In humans, a subsidiary goal, like status, can become a top-level goal via classic behavioristic association.  It happens all the time.  But the "preferences" that the FAI is supposed to preserve are human top-level goals.  How's it supposed to know which top-level goals are sacrosanct, and which ones are just heuristics or erroneous associations?
  • Reconciling human values may not be much easier or more sensible than reconciling all values, because human values already differ as much as it is possible for values to differ.  Sure, humans have only covered a tiny portion of the space of possible values.  But we've just seen two human values that differ along the critical dimensions of being mutually satisfiable or not being mutually satisfiable, and of encouraging global cooperation vs. not encouraging global cooperation.  The harmonic series looks a lot like Zeno's geometric series; yet one converges and one diverges.  It doesn't matter that the terms used in each look similar; they're as different as series can be.  In the same way, values taken from any conceivable society of agents can be classified into mutually-satisfiable, or not mutually-satisfiable.  For the purposes of a Friendly AI, a mutually-satisfiable value held by gas clouds in Antares is more similar to a mutually-satisfiable human value, than either is to a non-mutually-satisfiable human value.
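
The series comparison in the last point can be checked numerically.  The following is a minimal sketch, not a proof: the function names are hypothetical, and "Zeno's geometric series" is assumed here to mean 1/2 + 1/4 + 1/8 + ...

```python
def geometric_partial_sum(n: int) -> float:
    """Sum of the first n terms of 1/2 + 1/4 + 1/8 + ... (assumed Zeno series)."""
    return sum(0.5 ** k for k in range(1, n + 1))


def harmonic_partial_sum(n: int) -> float:
    """Sum of the first n terms of 1 + 1/2 + 1/3 + ..."""
    return sum(1.0 / k for k in range(1, n + 1))


# The geometric partial sums never exceed 1, however many terms are added.
geometric_50 = geometric_partial_sum(50)

# Doubling the number of harmonic terms always adds at least 1/2, because
# each of the n new terms is at least 1/(2n).  Repeated doubling therefore
# pushes the sum past any bound.
harmonic_gain = harmonic_partial_sum(2000) - harmonic_partial_sum(1000)
```

The terms of the two series both shrink toward zero, so they look alike term by term; the difference only shows up in the behavior of the whole sum, which is the point of the analogy.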


[-][anonymous]12y 27

This is a big problem with utilitarianism, period. Status is a problem. So are other-regarding values ("I want my neighbors to be Christian.") You're in deep trouble trying to reconcile differing values globally. (Which is why, for myself, I'm not messing with it; I'm starting to believe in the idea of cultivating our own gardens.)

That said, I think this is a distracting and inaccurate example. Status and stuff aren't the only things we value, and I don't see that they split by gender in the way you say. Personally, I don't want to be Elizabeth Taylor.

Yes, the generalization of the problem is what I call negatively-coupled utility: where satisfaction of one person along one dimension necessarily causes dissatisfaction of another person. Therefore, as long as there is at least one misanthrope (person who is dissatisfied by any increase in anyone else's happiness and vice versa), Pareto-improvements are impossible. Therefore, Pareto-superiority is too strict of a standard for determining when human values are being satisfied.
Pareto optimums are probably not very good optimums, in any domain. They're just tractable. An example of looking for your keys under the streetlamp.
Pareto optima aren't very good on average, but shouldn't any worthwhile concept of optimality imply Pareto optimality?
Indeed. Pareto optimality is a necessary condition for an "ideal" society. Economists often make the mistake of thinking pareto optimality is a sufficient condition for this, which it is not: it's pareto optimal if I have everything and don't much want to part with any of it.
Not if more than a dozen people are involved. Certainly not if a million people are involved. EDIT: Oops. Wrong.
What sort of situation are you thinking of where X would be better than Y for one person and worse for none, but Y better than X all things considered?
Doh! You're right. I think of the concept of Pareto optimality as being useless because the path between the initial allocation, and a really good allocation, is usually littered with Pareto-optimums that must be broken through to get to the really good allocation. The really-good allocation is itself presumably Pareto-optimal. But you shouldn't get into the habit of thinking that's worth much.
Would you agree that a worthwhile concept of optimality would imply that there are no Pareto improvements that can be made to an optimum? (Though the optimum may not be Pareto superior to all alternatives.)
Yes, within the assumptions of the Pareto model.

Your initial observation is interesting; your analysis is somewhat problematic.

First, there is, as mentioned elsewhere, a distinct misconception of status that seems to recur here. Status is not a one-dimensional line upon which each person can be objectively placed. Status is multi-dimensional. For example, your person with a second-rate Faberge egg collection may simply decide that what really counts is how nice someone's roses are, and thus look at better Faberge eggs with sympathy (they should have spent all this money on roses!) rather than jealousy. By competing on multiple dimensions, it's possible for many people to win, potentially even everyone.

You are also using a non-representative sample. The type of person who's likely to become successful through celebrity seems disproportionately likely to be status-obsessed. (This is likely also true of people who inherit fortunes, as they have few other ways to distinguish their self-worth.) Generalizing about the ability of an AI to serve all people based on these few may be inappropriate. This is even more true when we consider that a world with an FAI may look very different in terms of how people are raised and educated.

Furth... (read more)

Actually, there's a pretty easy fix for the status problem, because we actually do have independently-valuable feelings that the ability to boss people around correspond to -- feelings like respectability, importance, and pride being among the main ones.

(Often, however, people who are outwardly concerned with their position are actually motivated by perceived threats to safety or affiliation instead -- believing that they are unworthy or unlovable unless they're important enough, or that you're only safe if you're in charge, etc.)

Anyway, your analysis here (as with many others on LW) conflates feelings of status with some sort of actual position in some kind of dominance hierarchy. But this is a classification error. There are people who feel quite respectable, important, and proud, without needing to outwardly be "superior" in some fashion. One's estimate of one's respectability, prideworthiness, or importance in the scheme of things is not intrinsically linked to any value scheme other than one's own.

Given that the idea of CEV is to get to what the more-grown, wiser version of "you" would choose for your value scheme, it's pretty much a given that unsupport... (read more)

Those aren't the people I'm talking about. You're not dealing with the actual values the people I described have; you're saying they should have different values. Which is unFriendly!
Could be saying "I think that, upon further reflection, they would have different values in this way".
Phil, you're right that there's a difference between giving people their mutually unsatisfiable values and giving them the feeling that they've been satisfied. But there's a mechanism missing from this picture: Even if I wouldn't want to try running an AI to have conversations with humans worldwide to convert them to more mutually satisfiable value systems, and even though I don't want a machine to wire-head everybody into a state of illusory high status, I certainly trust humans to convince other humans to convert to mutually satisfiable values. In fact, I do it all the time. I consider it one of the most proselytism-worthy ideas ever. So I see your post as describing a very important initiative we should all be taking, as people: convince others to find happiness in positive-sum games :) (If I were an AI, or even just an I, perhaps you would hence define me as "unFriendly". If so, okay then. I'm still going to go around convincing people to be better at happiness, rational-human-style.)
It's an error to assume that human brains are actually wired for zero or negative sum games in the first place, vs. having adaptations that tend towards such a situation. Humans aren't true maximizers; they're maximizer-satisficers. E.g., people don't seek the best possible mate: they seek the best mate they think they can get. (Ironically, the greater mobility and choices in our current era often lead to decreased happiness, as our perceptions of what we ought to be able to "get" have increased.) Anyway, ISTM that any sort of monomaniacal maximizing behavior (e.g. OCD, paranoia, etc.) is indicative of an unhealthy brain. Simple game theory suggests that putting one value so much higher than others is unlikely to be an evolutionarily stable strategy.
Your definition of value is not sufficient to encompass how human beings actually process values. We can have both positive and negative responses to the same "value" -- and they are largely independent. People who compulsively seek nonconsensual domination of others are not (if we exclude sociopaths and clinical sadists) acting out of a desire to gain pleasure, but rather to avoid pain. Specifically, a pain that will never actually happen, because it's based on an incorrect belief. And correcting that belief is not the same thing as changing what's actually valued. In other words, what I'm saying is: you're mistaken if you think an emotionally healthy, non-sociopathic/psychopathic human actually positively values bossing people around just to watch them jump. IMO, such a person is actually doing it to avoid losing something else -- and that problem can be fixed without actually changing what the person values (positively or negatively).
Look at how value-laden that sentence is: "healthy", "non-sociopathic/psychopathic". You're just asserting your values, and insisting that people with other values are wrong ("unhealthy").
Well, hang on. pjeby probably would benefit from finding a less judgmental vocab, but he has a valid point: not every human action should be counted as evidence of a human value for the usual result of that action, because some humans have systematically erroneous beliefs about the way actions lead to results. You may be hitting pjeby with a second-order ad hominem attack! Just because he uses vocab that's often used to delegitimize other people doesn't mean his arguments should be delegitimized.
But people "who compulsively seek nonconsensual domination of others" and "actually positively values bossing people around just to watch them jump" exist and are quite successful, arguably as a result of these values at other things which most humans value (sex, status, wealth). Pjeby is describing traits that are common in politicians, managers and high school teachers.
And I'm asserting that the subset of those individuals who are doing it for a direct feeling-reward (as opposed to strategic reasons) are what we would call sociopaths or psychopaths. The remainder (other than Machiavellian strategists and 'opaths) are actually doing it to avoid negative feeling-hits, rather than to obtain positive ones.
I assume we would want CEV to exclude the preferences of sociopaths and psychopaths, as well as those of people who are actually mistaken about the beliefs underlying their preferences. Healthy human beings experience guilt when they act like jerks, counterbalancing the pleasure, unless there are extenuating circumstances (like the other guy being a jerk, too). And when a person doesn't experience guilt, we call that sociopathy.
"Knew more, thought faster" should make the second irrelevant (and make the "theocracy" part of "xenophobic theocracy" implausible even given majority voting). The original CEV document actually opposed excluding anyone's preferences or loading the outcome in any way as an abuse of the programmers' power, but IIRC, Eliezer has made comments here indicating that he's backed down from that some.
I thought the idea was that, under CEV, sociopaths would just get outvoted. People with mutant moralities wouldn't be excluded, but, just by virtue of being mutants, their votes would be almost entirely drowned out by those with more usual moralities. [ETA: Eliezer would object to calling these mutant moralities "moralities", because he reserves the word "morality" for the action-preferring algorithm (or whatever the general term ought to be) that he himself would find compelling in the limit of knowledge and reflection. As I understand him, he believes that he shares this algorithm with nearly all humans.]
If it were (just) a matter of voting, then I imagine we'd end up with the AI creating a xenophobic theocracy.
As I understand it, it's not just a matter of voting. It's more as though a simulated* version of each of us had the opportunity to know everything relevant that the FAI knows, and to reflect fully on all that information and on his or her values to reach a fully coherent preference. In that case, it's plausible that nearly all of us would be convinced that xenophobic theocracy was not the way to go. However, if many of us were convinced that xenophobic theocracy was right, and there were enough such people to outweigh the rest, then that would mean that we ought to go with xenophobic theocracy, and you and I are just mistaken to think otherwise. * I think that Eliezer in fact would not want the FAI to simulate us to determine our CEV. The concern is that, if the AI is simulating us before it's learned morality by extrapolating our volition, then the simulations would very likely lead to very many tortured minds.
I wonder if, under the current plan, CEV would take into account people's volition about how CEV should work — i.e. if the extrapolated human race would want CEV to exclude the preferences of sociopaths/psychopaths/other moral mutants, would it do so, or does it only take into account people's first-order volition about the properties of the FAI it will build?
Ultimately, preference is about properties of the world, and AI with all its properties is part of the world.
In the original document, "extrapolated as we wish that extrapolated, interpreted as we wish that interpreted" sounds like it covers this, in combination with the collective nature of the extrapolation.
Your place in the dominance hierarchy determines your mating prospects much more so than your feelings of high status. Feelings may be necessary, but not sufficient in this case.

The problem of value aggregation has at least one obvious lower bound: divide the universe into equal parts, and have each part optimized to a given person's preference, including game-theoretic trade between the parts to take into account preferences of each of the parts for the structure of the other parts. Even if values of each person have little in common, this would be a great improvement over the status quo.

Good point, but this doesn't seem to necessarily be the case for an altruist if selfish bastards are sufficiently more common than altruists who subjunctively punish selfish bastards. (Though if I recall correctly, you're skeptical that that sort of divergence is plausible, right?)
The more negative-sum players could be worse off if their targets become better off as a result of the change. Assuming that punishment is the backbone of these players' preference, and boost in power to do stuff with the allotted matter doesn't compensate the negative effect of their intended victims having a better life. I don't believe any human is like that. I'm not skeptical about divergence per se, of course preferences of different people are going to be very different. I'm skeptical about distinctly unusual aspects being present in any given person's formal preference, when that person professes that alleged unusual aspect of their preference. That is, my position is that the divergence within human universals is something inevitable, but divergence from the human universals is almost impossible.
This lower-bound could have some use; but my view is that, in the future, most people will be elements of bigger people, making this division difficult. Existing societies are constructed in a way so that optimizing each person's preference can help optimize the society's preference. So maybe it's possible.
I think the idea is to divide between current people('s individual extrapolated goal systems) once and for all time, in which case this poses no problem as long as personal identity isn't significantly blurred between now and FAI.

This discussion of "Friendly AI" is hopelessly anthropomorphic. It seems to be an attempt to imagine what a FAI optimizing the world to a given person's values will do, and such attempts are destined to fail, if you bring up specific details, which you do. A FAI is a system expected to do something good, not a specific good thing known in advance. You won't see it coming.

(More generally, see the Fun theory sequence.)

Yes, I think that this is right. An FAI would try to create a world in which we are all better off. That doesn't mean that the world would be as any one of us considers perfect. Perhaps each of us would still consider it to be very suboptimal. But all of us would still desire it over the present world. In other words, let's grant, for the sake of argument, that most of us are doomed to continue to suffer from losing zero-sum status games forever. Nonetheless, there are possible worlds within which we all, even the losers, would much rather play these games. So there is still a lot that an FAI could do for us.
You can't think about what specifically FAI will do, period. It seems quite likely there will be no recognizable humans in a world rebuilt by FAI. Any assumption is suspect, even the ones following from the most reliable moral heuristics.
Is it correct to call it FAI, then? Do you see a world with "no recognizable humans" as a very likely thing for the human race (or its extrapolated volition) to collectively want?
I'm considering the case of FAI, that is humanity's preference correctly rendered. Status quo has no power. So the question shouldn't be whether "no recognizable humans" is the particular thing humanity wants, but rather whether "preserving recognizable humans" happens to be the particular thing that humanity wants. And I'm not sure there are strong enough reasons to expect "world with recognizable humans" to be the optimal thing to do with the matter. It might be, but I'm not convinced we know enough to locate this particular hypothesis. The default assumption that humans want humans seems to stem from the cached moral intuition promoted by availability in the current situation, but reconstructing the optimal situation from preference is a very indirect process, that won't respect the historical accidents of natural development of humanity, only humanity's values.
"Specifically" is relative. By some standards, we have never thought specifically about anything at all. (I have never traced precisely the path of every atom involved in any action.) Nonetheless, one can think more or less specifically, and to think at all is to think a thought that is specific to some extent. To think, as you wrote above, that an "FAI is a system expected to do something good" is to think something more specific than one might, if one were committed to thinking nothing specific, period. (This is assuming that your words have any meaning whatsoever.) ETA: In other words, as Eliezer wrote in his Coming of Age sequence, you must be thinking something that is specific to some extent, for otherwise you couldn't even pose the problem of FAI to yourself.
Sure. The specific thing you say is that the outcome is "good", but what that means exactly is very hard to decipher, and in particular hard or impossible to decipher in a form of a story, with people, their experiences and social constructions. It is the story that can't be specific.
[ETA: I wrote the following when your comment read simply "Sure, why?". I can see the plausibility of your claim that narrative moral imaginings can contribute nothing to the development of FAI, though it's not self-evidently obvious to me. ] Perhaps I missed the point of your previous comment. I presumed that you thought that I was being too specific. I read you as expressing this thought by saying that one should not think specifically, "period". I was pointing out the impossibility or meaninglessness of that injunction, at least in its extreme form. I was implicitly encouraging you to indicate the non-extreme meaning that you had intended.
The post has five bullet points at the end, and this does not respond to any of them. The post explores the nature of values that humans have, and values in general; Vladimir's comment is to the effect that we can't investigate values, and must design a Friendly AI without understanding the problem domain it will face.
We can't investigate the content of human values in a way that is useful for constructing Friendly AI, and we can't investigate what specifically Friendly AI will do. We can investigate values for the purpose of choosing better human-designed policies.
Do you want to qualify that in some way? I interpret it as meaning that learning about values has no relevance to constructing an AI whose purpose is to preserve values. It's almost an anti-tautology.
The classical analogy is that if you need to run another instance of a given program on a faster computer, figuring out what the program does is of no relevance, you only need to correctly copy its machine code and correctly interpret it on the new machine.
If you need to run another instance of a given program on a faster computer, but you don't know what an algorithm is, or what part of the thing in front of you is a "computer" and what part is a "computer program", and you have not as of yet discovered the concept of universal computation, nor are certain whether the computer hardware, or even arithmetic itself, operates deterministically -- then you should take some time to study the thing in front of you and figure out what you're talking about.
You'd probably need to study how these "computers" work in general, not how to change the background color in documents opened with a word processor that runs on the thing. A better analogy in the direction you took is uploading: we need to study neurons, not beliefs that a brain holds.
You seem to think that values are just a content problem, and that we can build a mechanism now and fill the content in later. But the whole endeavor is full of unjustified assumptions about what values are, and what values we should pursue. We have to learn a lot more about what values are, what values are possible, what values humans have, and why they have them, before we can decide what we ought to try to do in the first place.
Of course. Only the finer detail is a content problem.
Not that I know of. On the contrary, the assumption is that one shouldn't posit statements about which values humans actually have, and what kind of mathematical structure values are is an open problem.
The discussion of human preferences has to be anthropomorphic, because human preferences are human. Phil is not anthropomorphizing the AI, he's anthropomorphizing the humans it serves, which is OK.
FAI doesn't serve humans. It serves human preference, which is an altogether different kind of thing, and not even something humans have had experience with. An analogy: the atomic structure of a spoon is not spoon-morphic just because it is the atomic structure of a spoon.
I disagree that humans have no experience with human preference. It is true that "formal preference" is not identical to the verbalized statements that people pragmatically pursue in their lives, but I also think that the difference between the two is somewhat bounded by various factors, including the bound of personal identity: if you diverge so much from what you are today, you are effectively dead.
Formal preference and verbalized preference are completely different kinds of objects, with almost nothing in common. Verbalized preference talks about natural categories, clusters of situations found in actual human experience. You don't have any verbalized preference about novel configurations of atoms that can't be seen as instances of the usual things, modified by usual verbs. Formal preference, on the other hand, talks about all possible configurations of matter. Personal identity, as I discussed earlier, is a referent in the world sought by our moral intuition, a concept in terms of which a significant part of our moral intuition is implemented. When the me-in-the-future concept fails to find a referent, this is a failure of verbalized preference, not formal preference. You'll get a gap on your map, an inability to estimate the moral worth of situations in the future that lack you-in-the-future on many important aspects. But this gap on the map doesn't correspond to a gap on the moral territory, to these configurations automatically having equal or no moral worth. This is also a point of difference between formal preference and verbalized preference: formal preference refers to a definition that determines the truth about the moral worth of all situations, that establishes the moral territory, while verbalized preference is merely a non-rigorous human-level attempt to glimpse the shape of this territory. Of course, even in a FAI, formal preference doesn't allow one to get all the answers, but it is the criterion for the truth of the imperfect answers that FAI will be able to find.
The following sounds worryingly like moral realism: Of course if you meant it in an antirealist sense, then is problematic, because the map (verbalized preference) contributes causally to determining the territory, because (the brainware that creates) your verbalized preferences determines (in part) your formal preference.
Compare with your beliefs being implemented as patterns in your brain, which is a part of the territory. That the fact of your beliefs being a certain way, apart from their meaning and the truth of that meaning, is a truth in its own right, doesn't shatter the conceptual framework of the map-territory distinction. You'd just need to be careful about what subject matter you are currently considering. I don't know what you read into the realist/anti-realist distinction; for me, there is a subject matter, and truth of that subject matter, in all questions. The questions of how that subject matter came to be established, in what way it is being considered, and who considers it, are irrelevant to the correctness of statements about the subject matter itself. Here, we consider "formal preference". How it is defined, and whether the way it came to be defined was influenced by verbal preference, is irrelevant to what it actually asserts, once it's established what we are talking about. If I consider "the program that was written in file1.c this morning", this subject matter doesn't change if the file was renamed in the afternoon, or was deleted without anyone knowing its contents, or modified, even perhaps self-modified by compiling and running the program determined by that file. The fact of "contents of file1.c" is a trajectory, a history of change, but it's a fact separate from "contents of file1.c this morning", and neither fact can be changed, though the former fact (the trajectory of change of the content of the file) can be determined by one's actions, through doing something with the file.
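A rough way to picture the file1.c analogy, as a sketch only: the code below models a file as an append-only history of snapshots. The names `Snapshot`, `write_file`, and `contents_at` are made up for illustration and do not come from the comment. Later writes extend the trajectory, but the answer to "what were the contents this morning" stays the same.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    """One immutable point on the file's trajectory (hypothetical model)."""
    time: str       # a label such as "morning"; an assumption for illustration
    contents: str

# The "trajectory": an append-only history of the file's contents.
history = []

def write_file(time, contents):
    """Acting on the file extends the trajectory; it never rewrites the past."""
    history.append(Snapshot(time, contents))

def contents_at(time):
    """Look up the fixed fact 'contents of the file at this time'."""
    for snap in history:
        if snap.time == time:
            return snap.contents
    raise KeyError(time)

write_file("morning", "int main(void) { return 0; }")
morning_contents = contents_at("morning")

# Modifying the file in the afternoon changes the trajectory...
write_file("afternoon", "int main(void) { return 1; }")

# ...but the subject matter "contents this morning" is unchanged.
same_as_before = contents_at("morning") == morning_contents
```

The design choice here is the frozen dataclass plus an append-only list: what one does with the file determines which snapshots get added, while each snapshot, once recorded, is a separate fact.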
You said: I think that one should add that verbalized preference is both an attempt to glimpse the "territory" that is formalized preference, and the thing that causes formalized preferences to exist in the first place. It is at one and the same time a map and a foundation. On the other hand, your beliefs are only a map of how the world is. The world remains even if you don't have beliefs about it.
Verbalized preferences don't specify formal preference. People use their formal preference through arriving at moral intuitions in specific situations they understand, and can form verbalized preferences as heuristic rules describing what kinds of moral intuitions are observed to appear upon considering what situations. Verbalized preferences are plain and simple summaries of observations, a common-sense understanding of the hidden territory of the machinery in the brain that produces the moral intuition in an opaque manner. While verbalized preferences are able to capture the important dimensions of what formal preference is, they no more determine formal preference than Newton's laws, as written in a textbook, determine the way the real world operates. They merely describe.
EDIT: And the conclusion to draw from this is that we can use our axiological intuitions to predict our formal preferences, in certain cases. In some cases, we might even predict perfectly: if you verbalize that you have some very simple preference, such as "I want there to be a brick on this table, and that's all I want", then your formal preference is just that. Human preferences are too big and unwieldy to predict this simply. We each have many many preferences, and they interact with each other in complex ways. But I still claim that we can make educated guesses. As I said, verbalized preferences are the foundation for formal preferences. A foundation does not, in a simple way, determine the building on it. But if you see a 12 foot by 12 foot foundation, you can probably guess that the building on top of it is not going to be the Eiffel Tower.
This is closer. Still, verbalized preference is observation, not the reality of formal preference itself. The goal of preference theory is basically to come up with a better experimental set-up than moral intuition to study formal preference. This is like moving on from studying physics by observing natural phenomena with the naked eye, to lab experiments with rulers, clocks, microscopes and so on. Moral intuition, as experienced by humans, is too fuzzy and limited an experimental apparatus, even if you use it to observe the outcomes of carefully constructed experiments.
Your beliefs shape the world though, if you allow high-level concepts to affect low-level ones. Aside from being made of the world in the first place, your actions will follow in part from your beliefs. If you don't allow high-level concepts to affect low-level ones, then verbalized preference does not cause formalized preferences to exist.
I agree, I can almost hear Eliezer saying (correctly) that it's daft to try and tell the FAI what to do; you just give it its values and let it rip. (If you knew what to do, you wouldn't need the AI.) All this post has brought up is a problem that an AI might potentially have to solve. And sure, it looks like a difficult problem, but it doesn't feel like one that can't be solved at all. I can think of several rubbish ways to make a bunch of humans think that they all have high status; a brain the size of a planet would think of an excellent one.
Isn't that what society currently does? Come up with numerous ways to blur and obscure the reality of where exactly you fall in the ranking, yet let you plausibly believe you're higher than you really are?
Isn't it that we each care about a particular status hierarchy? The WOW gamer doesn't care about the status hierarchy defined by physical strength and good looks. It's all about his 10 level 80 characters with maxed out gear, and his awesome computer with an Intel Core i7 975 Quad-Core 3.33GHz CPU, 12GB of tri-channel DDR3, dual SLI'd GeForce GTX 260 graphics cards, two 1TB hard drives, Blu-ray, and liquid cooling.
This issue came up before in reply to a claim by Will Wilkinson that free market societies decrease conflict by having numerous different hierarchies so that everyone can be near the top in one of them. (Someone google-fu this?) The people replied that these different hierarchies actually exist within a meta-hierarchy that flattens it all out and retains a universal ranking for everyone, dashing the hopes that everyone can have high status. The #1 WOW player, in other words, is still below the #100 tennis player. Despite the ideological distance I have from them, I have to side with the folks on this one :-/ ETA: Holy Shi-ite! That discussion was from October '06! Should I be worried or encouraged by the fact that I can remember things like this from so long ago?
The Crooked Timber post is here. On first glance it seems like a matter of degree: to the extent that there is such a universal ranking, it only fully defeats Wilkinson's point if the universal ranking and its consequences are the only ranking anyone cares about. As long as different people care differently about (the consequences of) different rankings, which it seems to me is often the case, everyone can rise in their favorite ranking and benefit more than others are harmed. ETA: though maybe the more hierarchies there are, the less good it feels to be #100 on any of them.
Okay, to substantiate my position (per a few requests), I dispute that you can actually achieve the state where people only care about a few particular hierarchies, or even that people have significant choice in which hierarchies they care about. We're hardwired to care about status; this drive is not "up for grabs", and if you could turn off your caring for part of the status ranking, why couldn't you turn it all off? Furthermore, I'm highly skeptical that e.g. the WOW superstar is actually fully content to remain in the position that being #1 in WOW affords him; rather, he's doing the best he can given his abilities, and this narrow focus on WOW is a kind of resignation. In a way I can kind of relate: in high school, I used to dominate German competitions and classes involving math or science. While that was great, it just shifted my attention to the orchestra classes and math/debate competitions that I couldn't dominate. Now, you can dull the social influence on yourself that makes you care about status by staying away from the things that will make you compare yourself to the broader (e.g. non-WoW) society, but this is a devil's bargain: it has the same kind of effect on you as solitary confinement, just of a lesser magnitude. (And I can relate there too, if anyone's interested.) I think the WOW superstar would, if he could, trade his position for one comparable to the #100 tennis player in a heartbeat. And how many mistresses does #1 in WoW convert to?
I don't know about in-game WoW superstars, but I knew an admin of an "unofficial" Russian server of a major AAA MMORPG, and he said that basically all female players of that server he met in real life wanted to go to bed with him. This might have been an exaggeration, but I can confirm at least one date. BTW, I wouldn't rate the guy as attractive.
In the 1990s I happened upon a game of Vampire (a live-action role-playing game) being played outdoors at night on the campus of UC Berkeley. After the game, I happened to be sitting around at Durant Food Court (a cluster of restaurants near campus) when I overheard one of the female players throw herself at one of the organizers: "How many experience points would I need to go to bed with you?" she asked playfully. (The organizer threw me a juicy grin on the side a few moments later, which I took as confirmation that the offer was genuine.) I am guessing that in the environment of evolutionary adaptation, political success and political advantage consisted largely of things very much like being able to get a dozen people to spend an evening in some organized activity that you run. ADDED. Now that I have had time to reflect, what she probably said is, "how many experience points do I get for . . .", which is a wittier come-on than the one I originally wrote and which jibes with the fact that one of the organizer's jobs during the game is to award experience points to players.
Interesting; I guess I underestimated the position of unofficial Russian WoW server admins in the meta-hierarchy -- in part because I didn't expect as many desirable Russian women to play WoW.
If the server population is a couple thousand players, and there are 5% of females among them, that leaves you with about 100 females, 10 of which will likely be attractive to you -- and if you run a dozen servers or so, that's definitely not a bad deal if you ask me :)
Take a less extreme version of the position you are arguing against: the WOWer cares about more than the WOW hierarchy, but the meta-hierarchy he sets up is still slightly different from the meta-hierarchy that the 100th best tennis player sets up. The tennis player would rank (1st in tennis, 2nd in WOW) higher than (2nd in tennis, 1st in WOW), but the WOWer would flip the ranking. Do you find this scenario all that implausible?
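To make the scenario concrete, here is a minimal sketch. It assumes, purely for illustration, that each person's meta-hierarchy is a weighted sum of ranks across hierarchies; the weights and the names `meta_score` and `prefers` are hypothetical and not part of the comment. With slightly different weights, two people who both care about both hierarchies order the same pair of positions in opposite ways.

```python
def meta_score(ranks, weights):
    """Weighted sum of ranks across hierarchies; lower is better.

    Treating a meta-hierarchy as a weighted sum is an assumption
    made only for this illustration.
    """
    return sum(weights[name] * ranks[name] for name in ranks)

def prefers(weights, x, y):
    """True if someone with these weights ranks position x above position y."""
    return meta_score(x, weights) < meta_score(y, weights)

# Hypothetical weights: both people care about both hierarchies,
# but each weighs his own favorite slightly more.
tennis_player_weights = {"tennis": 3, "wow": 2}
wow_player_weights = {"tennis": 2, "wow": 3}

position_a = {"tennis": 1, "wow": 2}  # 1st in tennis, 2nd in WOW
position_b = {"tennis": 2, "wow": 1}  # 2nd in tennis, 1st in WOW

tennis_player_prefers_a = prefers(tennis_player_weights, position_a, position_b)
wow_player_prefers_b = prefers(wow_player_weights, position_b, position_a)
```

Under these assumed weights the tennis player scores position A at 7 and B at 8, while the WOWer scores A at 8 and B at 7, so each prefers a different position even though neither ignores the other hierarchy.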
It's plausible, but irrelevant. The appropriate comparison is how the WoWer would regard a position If he doesn't yearn for a high ranking in tennis, it's because of the particulars of tennis, not out of a lack of interest in a higher ranking in the meta-hierarchy.
Well, it's not relevant if the WOWer would still rather be the 100th best tennis player and suck at WOW than his current position - which is plausible, but there are probably situations where this sort of preference does matter. He's certainly interested in the meta-hierarchy, but why can't he value the status gained from WOW slightly higher than the status gained from tennis, irrespective of how much he likes tennis and WOW in themselves?
Yes, I get that someone might plausibly not care about tennis per se. That's irrelevant. What's relevant is whether he'd trade his current position for one with a meta-hierarchy position near the #100 tennis player -- not necessarily involving tennis! -- while also being something he has some interest in anyway. What I dispute is that people can genuinely not care about moving up in the meta-hierarchy, since it's so hardwired. You can achieve some level of contentedness, sure, but not total satisfaction. The characterization steven gave of the #1 WoW player's state of mind is not realistic.
But we're probably also wired to care mostly about the hierarchies of people with whom we interact frequently. In the EEA, those were pretty much the only people who mattered. [ETA: I mean that they were the only people to whom your status mattered. Distant tribes might matter because they could come and kick you off your land, but they wouldn't care what your intra-tribe status was.] The #1 WOW player probably considers other WOW players to be much more real, in some psychologically powerful way, than are professional tennis players and their fans. It would therefore be natural for him to care much more about what those other WOW players think.
But like I said earlier, that's like saying, "If you live in solitary confinement [i.e. no interaction even with guards], you're at the top of your hierarchy so obviously that must make you the happiest possible." You can't selectively ignore segments of society without taking on a big psychological burden.
You can't have high status if no other people are around. But high status is still a local phenomenon. Your brain wants to be in a tribe and to be respected by that tribe. But the brain's idea of a tribe corresponds to what was a healthy situation in the EEA. That meant that you shouldn't be in solitary confinement, but it also meant that your society didn't include distant people with whom you had no personal interaction.
But from the perspective of an EEA mind, online interaction with other WoWers is identical (or at least extremely similar) to solitary confinement in that you don't get the signals the brain needs to recognize "okay, high status now". (This would include in-person gazes, smells, sounds, etc.) This is why I dispute that the WoW player actually can consider the other WoW players to be so psychologically real.
Ah - I'd been misreading this because I imagined the #1 WoW player would interact socially with other WoW players ("in real life") like all of the WoW players I know do.
Wouldn't the #1 WoW player be spending most of his waking hours on a computer instead of socializing?
Well so far I've just been assuming '#1 WoW player' is meaningful. As I understand it, there isn't much to gain at the margins once you spend most of your time playing. Also, who says you can't be on a computer and socializing? There's plenty of time to look away from the computer while playing WoW, and you can play it practically anywhere.
Human psychology. Your body can tell the difference between computer interaction and in-person interaction. Intermittently "socializing" while you try to play is still a very limited form of socializing.
What sort of thing did you have in mind? (Am I missing out?)
What in-person-socializing/WoW-playing hybrid did you have in mind? Because I'm missing out!
I hang out with several people who play WoW at my place when they're over. Other WoW players will spend time geeking out over their characters' stats, gear, appearance, etc, and presumably our imaginary #1 would have less-dedicated groupies that would be interested in that sort of thing while he's playing. Due to the amount of time spent travelling or waiting in queues, there are also a lot of times for traditional sorts of socialization - eating food next to other humans, throwing things at each other, whatever it is humans do. And WoW isn't all that concentration-intensive, so it's entirely possible to have a conversation while playing. And you can even play in the same room as other people who are in your group, and talk about the game in-person while you're doing it.
LAN party
In fairness, you also "knew" that half the folks playing Magic: the Gathering are female, and knew that was true of RPG conventions as well. So I tend not to weight your personal experiences heavily. Please understand.
Forget the WOWer then, how about the M:tG fanatic?
Implementation issue. Oops, wrong cop-out! :-P Seriously: the Magic: the Gathering fanatic has social contact, but the lack of females in that social network has basically the same effect, in that it's a more limited kind of social interaction that can't replicate our EEA-wired desires.
I'm interested. How can you relate? What was your situation?
Without going into too many personal details (PM or email me if you're interested in that), for a while I lived a lifestyle where my in-person socialization was limited, as were most of my links to the broader society (e.g. no TV), though I made a lot of money (at least relative to the surrounding community). I also found myself frequently sad, which was very strange, as I felt all of my needs and wants were being met. It was only after a long time that I noticed the correlation between "being around other people" and "not being sad" -- and I'm an introvert!
Here is the article you are looking for.
er... why? ETA: my counterpoint would be essentially what steven said, but you didn't seem to give an argument.
See my reply to steven.
Create a lot of human-seeming robots = Give everyone a volcano = Fool the humans = Build the Matrix. To quote myself: In other words, it isn't valid to analyze the sensations that people get when their higher status is affirmed by others, and then recreate those sensations directly in everyone, without anyone needing to have low status. If you did that, I can think of only 3 possible interpretations of what you would have done, and I find none of them acceptable:
* Consciousness is not dependent on computational structure (this leads to vitalism); or
* You have changed the computational structure their behaviors and values are part of, and therefore changed their conscious experience and their values; or
* You have embedded them each within their own Matrix, in which they perceive themselves as performing isomorphic computations.
I agree that these are all rubbish ideas, which is why we let the AI solve the problem. Because it's smarter than us. If this post were about how we should make the world a better place on our own, then these issues would indeed be a (small) problem, but since it was framed in terms of FAI, it's asking the wrong questions.
You're missing the main point of the post. Note the bullet points are ranked in order of increasing importance. See the last bullet point. BTW, how do you let the AI solve the problem of what kind of AI to build?
What kind of AI to be. That's the essence of being a computationally complex algorithm, and decision-making algorithm in particular: you always learn something new about what you should do, and what you'll actually do, and not just learn it, but make it so.
...or more likely, this won't be a natural problem-category to consider at all.

The closest thing I can think of as a solution for the status-FAI is domain-specific status. Let Fred be a high-status pianist, let Jim be a high-status computer engineer, let Sheila be a high-status chef, and let the status ordering shift with context.

But that does seem like a problem for FAI, given the appearance of these preferences.

There is no "status-FAI". You can't have morality, but with purple buttons.
Clearly this should be charitably read as "status-(putatively F)AI", which would be much more unwieldy. The hell I can't! ETA: Well here's Bentham with a purple button anyway.
What does this mean? What do purple buttons signify?
Clarification is here.
No - I still don't know what "purple buttons" is supposed to mean.
Ice cream, obviously.
I referred to the second AI in the "Mutually-satisfiable vs. non-mutually-satisfiable values" section of the original post.
Right, but this is a consideration of an inherently unstable specification of a wish, not of a preference (morality). Wishes are not the sort of thing FAI deals with.
Why do you say that? I think you're defining the problem away, by saying that values that aren't mutually-satisfiable aren't values. What's more wish-like about wanting high status than about wanting an ice cream cone?
Nothing. You can't ask FAI for an ice cream either. Again, see this comment for more detail.
I read it; now it seems you're protesting against presenting an AI with a single value to optimize, rather than my source code. If something poses a problem in the very simple case of an AI with one single value to optimize, I don't see how giving it a whole bunch of values to optimize, along with their algorithmic definitions and context, is going to make things easier. Also, what was to my mind the most-important point of the post is that humans already hold values that span the space of possible values along what may be the most-important or most-problematic dimensions.
I suggest that it's very hard to form a coherent concept of an AI that only cares about one particular wish/aspect/value. FAI is only supposed to improve on the status quo. In the worst possible case, this improvement is small. Unless AI actually makes things worse (in which case, it's by definition not Friendly), I don't see what your argument could possibly be about.
Status-putatively-F-AI.
I hear you, but I believe it's a very strange and unstable definition. When you say that you want AI that "optimizes X", you implicitly want X to be optimized in a way in which you'd want it optimized, understood in the way you want it understood, etc. Failing to also specify your whole morality as interpreter for "optimize X" will result in all sorts of unintended consequences, making any such formal specification unrelated to the subject matter that you intuitively wanted to discuss by introducing the "optimize X" statement. In the context of superintelligent AI, this means that you effectively have to start with a full (not-just-putatively-)FAI and then make a wish. But what should FAI do with your wish, in terms of its decisions, in terms of what it does with the world? Most likely, completely disregard the wish. This is the reason there are no purple button FAIs.
I don't disagree with you. I was just responding to the challenge set in the post.
Jay-Z proves status wants to break free of its domain.

Just one minor quibble: I think you should have put the explanation for why you chose the male/female examples ("The male/female distinction isn't rigid; it just helped organize the data in a way that made this distinction pop out for me") much earlier in the post. Since agreeing or disagreeing with your generalization has nothing to do with agreeing or disagreeing with your conclusion, you should say so before the beginning of the second section.

Human values differ a lot based on their surroundings. If you dump all humans into fairy tale land they might react very differently than now.

You seem to assume that the goal structure of a human is stable. But when I look around I see all kinds of manipulations happening. Religion and politics being one, advertisement being another. An AI doesn't have to rewire brains the hard way. It could just buy a major entertainment company and implement the values it prefers humans to have into a soap opera, and then buy some ads at the Super Bowl. Allowing the ...

It was good to be explicit that these are generalizations. Nonetheless, it was still a mistake to label these two views "female" and "male", rather than the more neutral "positional" and "non-positional". That you notice this correlation is interesting, but not the main point. Given the likely effects, it seems better not to over-emphasize this with your choice of nomenclature.

I thought about that a lot; and I tried writing "male" and "female" out of it. But I couldn't write "male" and "female" out until after the point where I stopped using evidence for the existence of each value based on observations of men vs. of women. The post doesn't talk about men or women any longer than it absolutely has to just to introduce its supporting data.
You don't have to point to women and men generally having these respective values to show that these values exist. Pointing to specific examples suffices. Pointing out that there is a general trend is interesting, and worth doing. But you still don't need to name them that way.
Would it be possible to change the title? Edit: "Positional and Non-Positional Friendly AI" would be an improvement, for example. You would have to add the definitions to the text, naturally.
Would you be as likely to read something called "Positional and Non-Positional Friendly AI"?
Actually, another problem is that the AI is, with either title, neither male nor female, neither positional nor non-positional. These apply to the values of humans that it is trying to optimize for.
Even if you didn't make that the title, you could have at least introduced the term, which is shorter than "mutually-satisfiable", and then used it for the bulk of the article.
More, honestly. "Male and Female Friendly AI" led me to suspect you would be engaging in unwarranted generalization. It is a clanger, though.
I changed the title - apart from distraction by gender issues, I think people are focusing on the "you can't please all the people all the time" part, and missing the "most of the variance of possible values is present within human values" part. Any links made to this article before this comment will now be broken.

Any links made to this article before this comment will now be broken.

Actually, the title in the URL doesn't matter.

OK, that made me laugh.

I agree that friendly AI is probably doomed if the goal is to maximize human values. But what if we, as AI designers (I'm not an AI designer, but bear with me), don't care about what you've defined as "human values?" As I've alluded to before, what matters isn't the entire system Phil Goetz, but a specific subsystem. This is an important metaethical point. There are algorithms running in your brain that you have no control over, yet they do things that you simply don't want them to be doing. For example, I don't want to have to purchase fuzz...

I think Eliezer is drawing the opposite conclusion in that passage. You seem to be saying that we should (or at least could) ask what a subsystem of Phil Goetz values; Eliezer seems to be saying that we shouldn't. The complete post referred to shows that Eliezer doesn't have the simple view of terminal values that I formerly attributed to him. I don't know that it's compatible with FAI, though, which IIRC is about preserving top-level goals. I could say that CEV = FAI + non-belief in terminal values. ADDED: Oops; his post on terminal values shows that he does have the view of terminal values that I attributed to him.
Ok, ignore the talk about subsystems - you may be right, but it's not really the basis of my criticism. The problem is using actions to infer terminal values. In order to determine your terminal values, you have to think about them; reflect on them. Probably a lot. So in order for the actions of a person to be a reliable indicator of her terminal values, she must have done some reflecting on what she actually values. For most people, this hasn't happened. It's true that this sort of reflection is probably more common among the wealthy - which buttresses your argument, but it's still probably not very common in absolute terms. Consider how many people - even among the rich - are religious. Or how many people hold on to a "written into the fabric of reality" view of morality. So I don't think you can draw very meaningful conclusions about the differences in terminal values between the sexes just by looking at their actions. I'll grant that it's evidence, just not as strong as your post suggests - and certainly not strong enough to settle the question.
I disagree. People who believe they have thought about their terminal values are often the most confused about what they actually value. Human values as judged by observing how people act rather than by what they claim to think are more self-consistent and more universal than the values professed by people who think they have discovered their own terminal values through reflection. Your conscious beliefs are but a distorted echo of the real values embodied in your brain.
Fair enough - a bit of reflecting might worsen the approximation. However, do our actions allow us to infer what our values would be after we take into account all possible moral arguments? This is what our terminal values are, and my main point is that actions don't tell us much about them.
Putting aside for a moment my issues with the whole idea of terminal values in the sense you seem to be imagining, I would suggest that if our actions don't tell us much about them, then our thoughts and words tell us even less.
On a day to day basis, sure. I accept that possibility. We don't get to consider all possible moral arguments, well, ever.
Matt Simpson was talking about people who have in fact reflected on their values a lot. Why did you switch to talking about people who think they have reflected a lot? What "someone actually values" or what their "terminal values" are seems to be ambiguous in this discussion. On one reading, it just means what motivates someone the most. In that case, your claims are pretty plausible. On the other reading, which seems more relevant in this thread and the original comment, it means the terminal values someone should act on, which we might approximate as what they would value at the end of reflection. Switching back to people who have reflected a lot (not merely think they have), it doesn't seem all that plausible to suppose that people who have reflected a lot about their "terminal values" are often the most confused about them. For the record, I'm perfectly happy to concede that in general, speaking of what someone "actually values" or what their present "terminal values" are should be reserved for what in fact most motivates people. I think it is tempting to use that kind of talk to refer to what people should value because it allows us to point to existing mental structures that play a clear causal role in influencing actions, but I think it is ultimately only confusing because it is the wrong mental structures to point to when analyzing rightness or shouldness.
I recently wrote a long post arguing that your actions specify what your terminal values are, in a way that your thoughts can't. And you commented on it, so I won't repeat the relevant parts here.
click Ok, I think I see now. Forgive me for asking instead of trying to figure this out on my own - finals are looming tomorrow. Morning. And the next morning. One of two things seems to be going on here. I'll quote Eliezer's metaethical position again, for reference (answering what the meaning of "right" is): The bolded part is most relevant to my question. Are you agreeing with Eliezer's argument, and just arguing that present terminal values can only be inferred from action? Or are you disagreeing and arguing that present terminal values, which only can be inferred from action, are the terminal values, i.e. the meaning of "right"/"should"? Or is it something else and I'm still confused?
So - first, I don't really believe in terminal values. When I use that term, I'm working within a hypothetical frame (if X is true, then Y); or using it as an approximation. I said that the only real values an organism has are the values it implements. If you want to have a science of values, and for instance be able to predict what organisms will have what values, you want to work with the behaviors produced, which should follow certain rules; whereas the values that an organism believes it has are different from the values it implements due to accidents of evolution, and working with them will make things less predictable and a science of values more difficult. Eliezer lists only propositional content as factors to consider. So I think he's not talking about the difficulty of dividing an organism into values and value-implementing infrastructure. He seems to be saying that currently-implemented values are a poor copy of a Platonic ideal which we can extrapolate. I would be less likely than Eliezer to consider my present values very similar to the "right" values. I think he would either say there are no right values, or that the right values are those extrapolated from your current values in a way that fixes accidents of evolution and flawed cognition and ignorance. But he doesn't have an independent set of values to set up in opposition to your current values. I would, by contrast, feel comfortable saying "existence is better than non-existence", "consciousness has value", and "complexity is good"; those statements override whatever terminal values I have. I don't really think those statements are in the same category as my terminal values. My terminal values largely concern my own well-being, not what the universe should be like. My preference for good things to happen to me can't be true or false. I'm not comfortable with using "preference" and "value" interchangeably, either. "Preference" connotes likes: chocolate, classical music, fast cars, social status
Dude, we both need to stop this and go to sleep.

Can the AI be Friendly if it creates human-comparable minds with arbitrary nonhuman values, provided the resulting homunculi are not, themselves, significantly hostile or in any way self-improving?

If so, I think there's a simple fix for this: have the AI give everyone a volcano lair (or equivalent) and two dozen catgirl minions. That puts all humans solidly in the top 5% of humanoids, which is a nice status kick; they can still compete with other humans for the last few percentiles, but there's no real risk of ending up at the bottom of the heap... unless y...

Ah yes, Reedspacer's Lower Bound.
The difference being, humans don't get modified, or separated from other humans.

"You may argue that the extremely wealthy and famous don't represent the desires of ordinary humans. I say the opposite: Non-wealthy, non-famous people, being more constrained by need and by social convention, and having no hope of ever attaining their desires, don't represent, or even allow themselves to acknowledge, the actual desires of humans."

I have a huge problem with this statement. This is taking one subset of the population where you can measure what they value by their actions, and saying without evidence that they represent the gener...

Well, the "female AI" can create a lot of human-seeming robots to play the low status roles, or something...

That reminds me: I once tried reading a romance novel to see what female desires they pander to and how they do it. [1] From what I could tell, it looks like the male love interest was constructed to satisfy numerous female desiderata that are near impossible to have all in the same person:

  • He was extremely technically skilled.

  • He was a servant of the female lead's estate, but, like, the leader of all the other servants. (So, below her class, but still, um, someone who orders other people around.)

  • ETA: He had connections to earls and lords by way of his highly-valued technical skills and membership in societies related to those skills. (Earls wouldn't socialize with someone from the servant class, but whatever. Why didn't he just hop the next boat to America?)

  • He was charged with a crime that the female lead was able to discern through her intuition that he was really innocent of. (i.e. another forbidden fruit aspect)

  • He was the long-lost son of a nobleman that -- yep -- the female lead could also discern through her intuition. (Well-built servant, but also of noble blood! w00t!)

  • He becomes extremely committed to her when he finds out he's fathered her child.

  • He was a talented sculptor of figurines. (Starving artist: check!)

So, it looks like there's a lot of room for such "impossible men" to be created in service of females.

[1] Coincidentally, after buying the book, I found out that the first name of the bad guy in the book is Silas. (The real bad guy, I mean. Not the rebellious, hot, tempting guy. I mean the guy that tortures animals.)

Google romance novel formula, and you'll find web pages by romance novel authors patiently explaining that there is no such thing as a romance novel formula.

Funny thing is, google science fiction novel formula, or fantasy novel formula, and you won't find that.

Look up the Harlequin Romance author's guidelines, and you won't find anything formulaic.

I read a book called "Dangerous men, adventurous women", by romance novelists for romance novelists, also to find out what women looked for in romance novels. And not only is there a formula for romance novels; there's a formula for articles about romance novels:

  • Spend the first half of the article complaining about the idea that romance novels are formulaic.

  • Spend the second half of the article describing the formula, and warning would-be authors that they won't sell books if they deviate from it.

The formula is basically to teach women as many dysfunctional, self-destructive ideas about romance as possible. Start with a young, never-married, beautiful, rebellious woman. Find her a dangerous, out-of-control, rakish, tall, dark, brooding, handsome man with many faults but a heart of gold, who has extensive sexual experien...

As I understand it, the whole point of R&J (or at least one interpretation) is that it's making fun of that kind of "romance novel" attitude; the two teenagers just met and they're over the top with how in love they are. And this is just a few days after Romeo was madly in love with Rosaline.
That would be a modern re-interpretation, not a 1600-ish interpretation. Humor was not subtle, dark, and ironic back then. Don Quixote was written at the same time, and was making fun of romance novels. It's very different.
I'm not sure that's totally fair. Sonnet 130, for instance, essentially makes fun of romantic poetry by subverting it: instead of waxing eloquent about his love's beauty, Shakespeare makes fun of over-the-top descriptions.
Yes, but there's no doubt that he's making fun of it. Romeo and Juliet is tragic; it's pretty clear we're supposed to feel sorry for R&J, not slyly laugh at them. Authors are probably less fond of subtlety in times and places where being misinterpreted can get you executed. (Except, of course, when being correctly interpreted would get them executed.)
Romeo & Juliet is a tragedy, not a comedy. They are children in love with the idea of love rather than mystical soul-mates. The tragedy is that they are killed by various circumstances, not that they lost some amazing, transcendent love. Romeo's wholehearted dedication to the idea of love (starting with Rosaline and then fixing on Juliet) is somewhat humorous and possibly satirical, but the bulk of the play is tragedy, not satire or romance. Honestly, I find those who think Romeo & Juliet is a romance rather stupid and somewhat disturbing. If Romeo & Juliet is the apex of romance, I'd like something else, please.
Hm, the one I read violated this. The female lead was 31 and the book noted (though infrequently) that she was not physically attractive. And it seems that makes sense for a romance novel, since you're trying to pander to the fantasies of women who want to believe that they can have an exciting romance with a desirable man despite not being physically appealing or despite being past the ideal age. Yep. See: Twilight series.

This is 'fictional evidence', but one way to satisfy positional values would be something like the Matrix, with suitably convincing NPCs for people to be 'better' than.

All this stuff about values is giving me a proverbial headache. When the Rapture of the Nerds happens, I'll take the easy way out and become a wirehead, and let the rest of the world actually do things. ;)

Do we need an FAI that does as good a job of satisfying human desires as possible, or would an FAI which protects humanity against devastating threats be enough?

Even devastating threats can be a little hard to define.... if people want to transform themselves into Something Very Different, is that the end of the human race, or just an extension to human history?

Still, most devastating threats (uFAI, asteroid strike) aren't such a hard challenge to identify.

There's a danger, though, in building something that's superhumanly intelligent and has goals that it acts on, when those goals don't include some of our goals. You would have to make sure it's not an expectation-maximizing agent. I think an assumption of the FAI project is that you shouldn't do what Nancy is proposing, because you can't reliably build a superhumanly-intelligent self-improving agent and cripple it in a way that prevents it from trying to maximize its goals.
Is it actually more crippled than a wish-fulfilling FAI? Either sort of AI has to leave resources for people. However, your point makes me realize that a big-threat-only FAI (such threats including that it might take too much from people) will need a model of and respect for human desires so that we aren't left on a minimal reservation.

The "headline" is probably inaccurate. I don't know what metric best measures similarity of value - but by the ones that popped into my head, the various human heavens mostly seem relatively clumped together - compared to what kind of future other creatures might dream of having. It's the close genetic relatedness that does it.

To achieve high status for the maximum number of people, you could invent more categories in which to achieve status. There is not just one best actress, but many in different types of acting. There can be the best person to make some specific dish. The best dancer of type X. Then you can split it by region and come up with many, many more things, until everyone who desires to be famous is. Alternatively, the AI could find out if there is any desire beyond the wish for status and fulfill that instead, or offer a self-modification to people who desire status but would like to change that desire to something else.

I think a shorter term for what you're describing is positionality, the state where the quality of the good depends on its ranking relative to other goods. The problem you're claiming is that women's values are positional (making them non-mutually-satisfiable), while men's aren't (making them mutually-satisfiable).

But in any case, thanks for saving me the ordeal of telling women how the positionality of their values throws a big monkey wrench in everything ;-)

(But, since the FAI's top-level goal is just to preserve human top-level goals, it would be pointless to make a lot of fuss making sure the FAI held its own top-level goals constant, if you're going to "correct" human goals first.)

Well, part of the sleight-of-hand here is that the FAI preserves the goals we would have if we were wiser, better people.

If changing top-level goals is allowed in this instance, or this top-level goal is considered "not really a top-level goal", I would become alarmed and demand an explanation of how a FAI

...

Reality check: evolutionary theory suggests people's desires should be nailed down as hard as possible to those things that lead to raising good quality babies. Almost 7 billion humans shows how well this theory works.

So: men can be expected to desire status to the extent that it increases their access to young, fertile mates - while women can be expected to desire attention to the extent that it gives them access to a good selection of prospective partners and their genes.

The maternal instinct is strong - and it has little to do with attention - and a lot...

And yet subreplacement fertility in a number of rich countries (the very place where people have copious resources) points to a serious flaw. It's apparent that many people aren't having babies. People are adaptation executors, not fitness maximizers. For a highly simplified example, people like sex. In the ancestral environment sex would lead to babies. But the development of condoms, hormonal birth control, etc, has short-circuited this connection. The tasks of caring for a baby (which are evolutionarily programmed) interfere with sex. Thus, you have people forgoing babies in order to have more sex. Of course, in the real world, people care about status, food, etc, as well as sex. All those things may have been linked to reproduction in the environment where we evolved, but the connection is far weaker with modern technology. Thus, people prefer other things to reproduction.
Some people prefer other things. Mostly, that is ultimately due to memetic infections of their brains - which divert resources to reproducing memes - rather than genes. Yes: some people act to serve parasitic genes rather than their own genes. Yes: some people malfunction, and go wrong. Yet the basic underlying theory has much truth in it - truth an analysis on the level of status-seeking misses out. Of course the theory works much better if you include memes - as well as DNA-genes. An analysis of whether the modern low birth rate strategy in some developed countries is very much worse than the high birth rate strategies elsewhere may have to wait for a while yet. High birth rate strategies tend to be in countries stricken by war, famine and debt. Maybe their genes will prevail overall - but also maybe they won't.
Calling it an "infection" or a "malfunction" implicitly judges the behavior. That's your own bias talking. The fact that someone desires something because of a meme instead of a gene (to oversimplify things; both are always in play) does not make the desire any less real or any less worthy. A solely status-based analysis misses things, just as a solely reproductive analysis misses things. The point is that you can't nail desires down to simply "making good babies" or "being high status" or "having lots of sex"; any or all of these may be true desires in a given person.
It is standard practice to regard some meme-gene conflicts as cases of pathogenic infections. See, for example, the books "Virus of the Mind" and "Thought Contagion". Similarly with malfunctions: a suicidal animal has gone wrong - from the standard functional perspective of biologists - just as much as a laptop goes wrong if you try and use it underwater. Biologists from Mars would have the same concepts in these areas. The point of the reproductive analysis is that it explains the status seeking and attention seeking - whilst also explaining the fees paid for IVF treatments and why ladies like to keep cute puppies. It is a deeper, better theory - with firm foundations in biology.
Evolutionary analysis can if used properly. But evolutionary analysis is properly identifying adaptations, not:
I never said that was the whole of evolutionary theory. It seems like a reasonable 1-line summary of the point I was trying to make - if quoted in context. Your 1-line summary seems to have some flaws too - there is a lot more to evolutionary theory than identifying adaptations.
This is an oversimplification. A baby which has more than one adult on its side will do better than a baby only being raised by its mother.
Isn't that part of "considering a good selection of prospective partners"...? Some of the things they are looking at are faithfulness, kindness and wealth.

Phil, you make a good point here. However, note that when the people you are talking about are extrapolated in a CEV-like FAI, they will also understand this point. Elizabeth Taylor will understand that not everyone can have high status, and therefore people will have to settle on some solution, which could involve "fake people" as low status entities, editing of everyone's top-level goals (a sort of decision-theoretic compromise), etc.

One corollary of this is that for existing high status people, CEV would be a terrible thing. [EDIT: I was thin...

(As compared to what?) Only assuming that the benefits of living in a post-Singularity world are less valuable than making other people miserable, which strikes me as implausible. A middle-class person today likely has a way better life than a king of 4000BC.
For someone who highly values status, I disagree with that statement.
What do you mean "for someone who highly values status"? Are you that someone who prefers to have been born a king of 4000BC to the comforts of the modern world? Do you think people who would profess this verbal preference do so because it's their actual preference?

As evidence for someone like this, consider dictators like Kim Jong Il. Opening up North Korea would result in much greater wealth for both him and his people, but it comes with a loss of power and status for Kim Jong. No one thinks he's opening those borders anytime soon. The comparison isn't as drastic, however - Kim Jong's comforts are probably only a decade or two behind modern (I'm speculating).

His likes are idiosyncratic, but as far as they go, he's cutting-edge. Cognac is routinely cited as one of the top illicit imports, and I don't think he's getting bad cognac; one of his principal interests is/was movies, of which he has a 20,000-strong collection - world-class, I think - and he was infamous for kidnapping 'the famous South Korean movie director Shin Sang Ok and his ex-wife, actress Che Eun Hui, and kept them for eight years while making them produce propaganda films', which is something which is inaccessible to just about everyone, modern or no.
He may be an evil dictator, but in my opinion he gets extreme bonus points for style: And from Wikipedia: Not even the Bond bad guys did anything that amusing.
Hmm, Kim Jong Il is apparently a bad example since he's so wealthy. Surely there are dictators who don't have the resources that Kim Jong has (such that they're living in sub-modern conditions), but they still want to hold on to the power and status they hold despite the potential for wealth, right? (again, speculating, no hard evidence in mind)
Just about any nation-size dictator will be able to scratch up enough cash to live like a millionaire. Extort a few dollars from a million destitute inhabitants and you're talking real money. So you have to look at city or tribal scale units, and even then, I think most chieftains are happier to be in power than out of power in a wealthier nation. How many Afghanistani elders are cooperating with the US, and out of enlightened self-interest, knowing that in any modernized industrial society their clans will be hopelessly obsolete? How many out of naked fear of the Taliban or US, and bribes?
In How the Mind Works, Steven Pinker has an excellent discussion of Schelling's work on game theory, and argues that, per Schelling's work, the appearance of being a rational individual can actually be a liability for a rogue dictator, so they have an incentive to look kooky. Kim Jong Il is playing it by the book.
Good point. However, why would the dictator put on the charade and try to keep his status/power unless he valued it more than the wealth he could obtain by opening the country up? If the gains are small, this is probably a good margin to look irrational on, but if the gains are large enough, opening up outweighs the irrational act (on this margin). There are plenty of other things to appear irrational about with lower stakes. You don't have to appear kooky about every single decision you make in order to convince others that you are kooky - just enough of them. So in a nutshell, if the difference in standards of living for the dictator under the two scenarios is large enough, the irrationality ploy shouldn't matter (much).
I find the premise of Kim Jong-il sharing the poor standard of living with his people (or, not making the most of what the modern world has to offer because of living in his country), completely implausible.
See my reply here.
I think that there are probably people for whom the ability to boss people around, kill others with impunity, have a harem of women, etc, is worth more than a shower and flushing loo.
And modern medicine? All such questions are tests of the imagination, really.
The harem bit loses some of its appeal when you think about the standards of dental care and how rarely people used to bathe.
Hmm, I moved some towards agreement on this one. Though the particular argument you use doesn't apply to post-Singularity lower bound benefits. For a start, add immortality and much deeper insight into all things.
If there's a universal, it's that people enjoy gaining deeper insight - they value the first derivative of insight. Actually having insight can be a drag.
(Whatever, this is a technicality not relevant to the argument.) I doubt having insight is a downside in itself, only perhaps in as much as it makes it no longer possible to gain that insight without also losing it first; and beside the gaining of insight, there are lots of other things people value.
Having an insight can be a downside if the insight disrupts your worldview, or makes you face an unpleasant truth. There is no law saying that truth and happiness are always allies.
If I had to make a wild guess, I might guess that 75% of people in the modern world would say they would rather have been a king in 4000BC. (More, if you exclude the people who say they would rather have been a farmer in 4000BC than a king in 4000BC.) My 50% confidence interval is 25%-95%. Anybody want to do a survey? I would also guess the number who say they would rather be a king is smaller than the number of people who would actually prefer being a king, because people overestimate how much they would miss modern conveniences, and because saying you'd like to be king is frowned on nowadays.
It's interesting to look at what traits people assume they'd carry into the past. I suspect that gender is one of them. I don't have a strong feeling for what proportion would like to be a queen in the ancient world. In discussions I've seen about going back, a fair number say they'd be dead because of the lack of modern medicine.
About half on the most recent such discussion I recall reading.
Correct me if I'm wrong - I'm pretty new to this game. Does this entail that you'd assign about a 50% probability that either 0%-25% or 95%+ of the people would say that?
That's an implication, yes.
So, based on those numbers, you think it's more likely that either 0%-25% or 95%+ of the people would say that, than that 35%-85% would say that? (assuming nonzero probability to 25%-35% or 85%-95%)
Based on those numbers, yes. But I didn't consider both sides like that. I may have erred in overcompensating for the tendency of people to make too-small confidence intervals.
Ah, good. I was afraid I'd misunderstood.
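For anyone following the interval arithmetic in this exchange, here is a minimal sketch of the implication. It assumes only the stated 50% confidence interval of 25%-95%; the function name `implied_masses` and the 0.10 "fringe" figure (probability placed on the 25%-35% and 85%-95% bands) are hypothetical, chosen purely for illustration.

```python
# Stated in the thread: 50% confidence that the true share lies in [25%, 95%].
CI_MASS = 0.50


def implied_masses(fringe_mass):
    """Return (P(outside 25%-95%), P(inside 35%-85%)).

    fringe_mass is a hypothetical probability assigned to the two
    fringe bands 25%-35% and 85%-95% combined; it is not from the thread.
    """
    if not 0.0 <= fringe_mass <= CI_MASS:
        raise ValueError("fringe mass must lie between 0 and the interval mass")
    outside = 1.0 - CI_MASS        # mass on 0%-25% plus 95%-100%
    core = CI_MASS - fringe_mass   # what is left for 35%-85%
    return outside, core


# Illustrative fringe mass only.
outside, core = implied_masses(0.10)
print(f"P(outside 25%-95%) = {outside:.2f}")
print(f"P(inside 35%-85%)  = {core:.2f}")
print("outside is more likely than 35%-85%:", outside > core)
```

Whatever nonzero mass sits in the fringe bands, the tails come out more likely than the 35%-85% core, which is the oddity being pointed out above.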
I considered well-off middle-class people in the modern world, which isn't such a big portion of population of the modern world. Of course, for a person in poverty, becoming a king of the savages is probably an improvement. (Very likely, not what you meant.) I agree that these factors are present, but am not sure that they outweigh the factors prompting people to bias their beliefs in the opposite direction (or even that these are the main factors in the direction you indicate).
You're saying that you don't believe there are people for whom status is a value.
Not at all. I'm merely saying that status is not in total dominance of what people prefer.
Okay. You're saying that you don't believe there are people for whom status is a dominant value.
Yes. This would fall into the category of "people" who don't share a human universal balance of basic aspects of value, "people" who are magical mutants with complex motivations shaped primarily by something other than evolution (which won't allow large complex differences from the rest of the population). (See also this comment.)
Perhaps they would prefer to be weighted equally with low status people in a CEV to dying at the end of their natural lifespans.
I'm not as much interested in the difficulty of satisfying positional values, as in developing a typology of values, noting that positional / non-positional is a really big difference between values, and concluding that reconciling all human values is not much easier than reconciling all possible values.
I disagree with the connotation of this conclusion, and the word "reconcile" is too vague for it to have a precise denotation. Clearly, human verbalized preferences contradict each other. In terms of "formal preference" I suspect that the same is true.
Yes, this is a concern I've had: a CEV-valuing FAI would also have to be a very good politician to have the social support necessary to keep the project from being shut down. But good politicians are a danger in and of themselves.
The other possibility is that someone builds CEV before the existing elites get to a high enough future-shock level to even pay attention. Dictators will wake up one morning to find that the FAI has just robbed them of the only thing that makes anyone care about them: their power over other people. And furthermore, there will be a very large number of people whose volitions say something like "I want to punish [Dictator X] because he killed my family".
Once an AGI is out, you won't be able to shut it down. The Internet makes it pretty much impossible, even if the AGI is not superintelligent.