In May of 2007, DanielLC asked at Felicifa, an “online utilitarianism community”:

If preference utilitarianism is about making peoples’ preferences and the universe coincide, wouldn't it be much easier to change peoples’ preferences than the universe?

Indeed, if we were to program a super-intelligent AI to use the utility function U(w) = sum of w’s utilities according to people (i.e., morally relevant agents) who exist in world-history w, the AI might end up killing everyone who is alive now and creating a bunch of new people whose preferences are more easily satisfied, or just use its super intelligence to persuade us to be more satisfied with the universe as it is.

Well, that can’t be what we want. Is there an alternative formulation of preference utilitarianism that doesn’t exhibit this problem? Perhaps. Suppose we instead program the AI to use U’(w) = sum of w’s utilities according to people who exist at the time of decision. This solves Daniel’s problem, but introduces a new one: time inconsistency.

The new AI’s utility function depends on who exists at the time of decision, and as that time changes and people are born and die, its utility function also changes. If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.
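To make the three formulations concrete, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption rather than part of the original argument: a world-history is reduced to a single number, the `Person` class, the names `u_everyone`, `u_at_decision` and `make_u_frozen`, and the two example people are all hypothetical, and the population is held fixed across worlds (so the sketch does not capture the AI creating new people under U).

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy assumption: a world-history is summarized by one number.
World = float


@dataclass
class Person:
    name: str
    born: int   # first time step at which the person exists
    died: int   # last time step at which the person exists
    utility: Callable[[World], float]

    def alive_at(self, t: int) -> bool:
        return self.born <= t <= self.died


def u_everyone(world: World, people: List[Person]) -> float:
    """U(w): sum over everyone who exists at any time in the world-history."""
    return sum(p.utility(world) for p in people)


def u_at_decision(world: World, people: List[Person], now: int) -> float:
    """U'(w): sum over the people who exist at the time of decision."""
    return sum(p.utility(world) for p in people if p.alive_at(now))


def make_u_frozen(people: List[Person], t0: int) -> Callable[[World], float]:
    """U''(w): like U', but the set of people is fixed once, at time T0."""
    electorate = [p for p in people if p.alive_at(t0)]

    def u_frozen(world: World) -> float:
        return sum(p.utility(world) for p in electorate)

    return u_frozen


def best_world(score: Callable[[World], float], candidates: List[World]) -> World:
    """Return the candidate world with the highest score."""
    return max(candidates, key=score)


# Hypothetical example: Alice lives early and wants w = 1, Bob lives later and wants w = -1.
people = [
    Person("Alice", born=0, died=1, utility=lambda w: -abs(w - 1.0)),
    Person("Bob", born=2, died=3, utility=lambda w: -abs(w + 1.0)),
]
candidates = [-1.0, 1.0]

choice_at_0 = best_world(lambda w: u_at_decision(w, people, now=0), candidates)
choice_at_2 = best_world(lambda w: u_at_decision(w, people, now=2), candidates)
frozen_choice = best_world(make_u_frozen(people, t0=0), candidates)
```

In this toy setup the U’ agent steers toward Alice's preferred world at time 0 and toward Bob's at time 2, which is the time inconsistency; the U’’ agent frozen at time 0 keeps Alice's target no matter when it is consulted.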

The AI is now reflectively consistent, but is this the right outcome? Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time? Presumably, if you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.

So, what is the solution to this problem? Robin Hanson’s approach to moral philosophy may work. It tries to take into account everyone’s preferences—those who lived in the past, those who will live in the future, and those who have the potential to exist but don’t—but I don’t think he has worked out (or written down) the solution in detail. For example, is the utilitarian AI supposed to sum over every logically possible utility function and weigh them equally? If not, what weighing scheme should it use?

Perhaps someone can follow up Robin’s idea and see where this approach leads us? Or does anyone have other ideas for solving this time inconsistency problem?


The AI is now reflectively consistent, but is this the right outcome?


Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time?

'Should'? Us deciding what should be is already us pretending, hoping or otherwise counterfactually assuming for the purposes of discussion that we can choose the fate of the universe. It so happens that many people that happen to be alive at this arbitrary point in time have preferences with altruistic components that could consider future agents. Lucky them, assuming these arbitrary agents get their way.

Presumably, if you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.

That may explain my disagreement (or, as phrased, unexpected agreement). I tend to consider utilitarianism (as typically described) to be naive, verging on silly. The U’’ option you describe at least seems to have the coherence required to be implemented in practice without a catastrophic or absurd result.

Wei Dai
Since you found our agreement unexpected, it may give you a better perspective on this post to know that while it's mostly addressed to utilitarians, I'm not a utilitarian myself. I do have a certain amount of intellectual sympathy towards utilitarianism, and would like to see its most coherent positions, and hear its strongest arguments, so my post was written in that spirit. I'd also be quite interested in exploring other potentially viable approaches to moral philosophy. Given that you consider utilitarianism to be naive and verging on silly, what approaches do you find promising?
Let's say I agree with the specific statements, which would be unexpected by the context if I were a utilitarian. I wouldn't dream of accusing you of being a utilitarian given how much of an insult that would be given my position. "The universe should be made to maximise my utility (best satisfy my preferences over possible states of the universe) " is my moral philosophy. From that foundation altruism and cooperation considerations come into play. Except that some people define that as not a moral philosophy.
That seems to be this: - which I would classify as some kind of moral philosophy. It seems to be much more biologically realistic than utilitarianism. Utilitarianism appears to be an ethical system based on clearly signalling how unselfish and nice you are. The signal seems somewhat tarnished by being pretty unbelievable, though. Do these people really avoid nepotism and favouring themselves? Or are they kidding themselves about their motives in the hope of deceiving others?
It sounds silly and arbitrary when you discharge the references: "The universe should be made to maximise widrifid's utility (best satisfy widrifid's preferences over possible states of the universe)" Why not replace "widrifid" with "amcknight"? The fact that you happen to be yourself doesn't sound like a good enough reason.
Is there some reason why moral philosophy can't be arbitrary?
Yes. If you want your beliefs to pay rent, then you need to choose between features of reality rather than simply choose arbitrarily. Is there anything else that you believe arbitrarily? Why make an exception for moral philosophy? Reminds me of Status Quo Bias or keeping faith even after learning about other religions. Can you name a relevant difference?
That sounds like a good description of moralizing to me!
0timtyler goes over some of the common issues.

Given that most attempts at thinking through the consequences of utilitarian ethics resemble a proof by contradiction that utilitarianism cannot be a good basis for ethics it surprises me how many people continue to embrace it and try to fix it.

Can you provide a link to an academic paper or blog post that discusses this in more depth?
The kind of thought experiments (I think) Matt is referring to are so basic I don't know of any papers that go into them in depth. They get discussed in intro level ethics courses. For example: A white woman is raped and murdered in the segregation-era Deep South. Witnesses say the culprit was black. Tensions are high and there is a high likelihood that race riots will break out and whites will start killing blacks. Hundreds will die unless the culprit is found and convicted quickly. There are no leads but as police chief/attorney/governor you can frame an innocent man to charge and convict quickly. Both sum and average utilitarianism suggest you should. Same goes for pushing fat people in front of runaway trolleys and carving up homeless people for their organs. Utilitarianism means biting all these bullets or else accepting these as proofs by reductio. Edit: Or structuring/defining utilitarianism in a way that avoids these issues. But it is harder than it looks.
Paul Crowley
Or seeing the larger consequences of any of these courses of action. (Well, except for pushing the fat man in front of the trolley, which I largely favour.)
I'm comfortable positing things about these scenarios such that there are no larger consequences of these courses of action: no one finds out, no norms are set, etc. I do suspect an unusually high number of people here will want to bite the bullet. (Interesting side effect of making philosophical thought experiments hilarious: it can be hard to tell if someone is kidding about them.) But it seems well worth keeping in mind that the vast majority would find a world governed by the typical forms of utilitarianism to be highly immoral.
Paul Crowley
These are not realistic scenarios as painted. In order to be able to actually imagine what really might be the right thing to do if a scenario fitting these very alien conditions arose, you'll have to paint a lot more of the picture, and it might leave our intuitions about what was right in that scenario looking very different.
They're not realistic because they're designed to isolate the relevant intuitions from the noise. Being suspicious of our intuitions about fictional scenarios is fine- but I don't think that lets you get away without updating. These scenarios are easy to generate and have several features in common. I don't expect anyone to give up their utilitarianism on the basis of the above comment-- but a little more skepticism would be good.
Paul Crowley
I'm happy to accept whatever trolley problem you care to suggest. Those are artificial but there's no conceptual problem with setting them up in today's world - you just put the actors and rails and levers in the right places and you're set. But to set up a situation where hundreds will die in this possible riot, and yet it is certain that no-one will find out and no norms will be set if you frame the guy - that's just no longer a problem set in a world anything like our world, and I'd need to know a lot more about this weird proposed world before I was prepared to say what the right thing to do in it might be.
To the extent that I have been exposed to these types of situations, it seems that the contradictions stem from contrived circumstances. I've also never had a simple and consistent deontological system lined out for me that didn't suffer the same flaws. So I guess what I'm really getting at is that I see utilitarianism as a good heuristic for matching up circumstances with judgments that "feel right" and I'm curious if/why OP thinks the heuristic is bad.
Not sure what this means. Nor have I. My guess is that simple and consistent is too much to ask of any moral theory. It is definitely a nice heuristic. I don't know what OP thinks but a lot of people here take it to be the answer, instead of just a heuristic. That may be the target of the objection.
"Exposed to these situations" means to say that when someone asks about utilitarianism they say, "if there was a fat man in front of a train filled with single parents and you could push him out of the way or let the train run off a cliff what would you do?" To which my reply is, "When does that ever happen and how does answering that question help me be more ethical?" Digression: if a decision-theoretic model was translated into a set of axiomatic behaviors could you potentially apply Godel's Incompleteness Theorem to prove that simple and consistent is in fact too much to ask?

Please don't throw around Gödel's Theorem before you've really understood it— that's one thing that makes people look like cranks!

"When does that ever happen and how does answering that question help me be more ethical?"

Very rarely; but pondering such hypotheticals has helped me to see what some of my actual moral intuitions are, once they are stripped of rationalizations (and chances to dodge the question). From that point on, I can reflect on them more effectively.

Sorry to sound crankish. Rather than "simple and inconsistent" I might have said that there were contrived and thus unanswerable questions. Regardless it distracted and I shouldn't have digressed at all. Anyway thank you for the good answer concerning hypotheticals.
These thought experiments aren't supposed to make you more ethical, they're supposed to help us understand our morality. If you think there are regularities in ethics- general rules that apply to multiple situations then it helps to concoct scenarios to see how those rules function. Often they're contrived because they are experiments, set up to see how the introduction of a moral principle affects our intuitions. In natural science experimental conditions usually have to be concocted as well. You don't usually find two population groups for whom everything is the same except for one variable, for example. Agree with orthonormal. Not sure what this would mean. I don't think Godel even does that for arithmetic-- arithmetic is simple (though not trivial) and consistent, it just isn't complete. I have no idea if ethics could be a complete axiomatic system, I haven't done much on completeness beyond predicate calculus and Godel is still a little over my head. I just mean that any simple set of principles will have to be applied inconsistently to match our intuitions. This, on moral particularism, is relevant.
I didn't use "consistence" very rigorously here, I more meant that even if a principle matched our intuitions there would be unanswerable questions. Regardless, good answer. The link seems to be broken for me, though.
Link is working fine for me. It is also the first google result for "moral particularism", so you can get there that way.
Tried that and it gave me the same broken site. It works now.
Why on Earth was this downvoted?
By "utilitarianism" do you mean any system maximizing expected utility over outcomes, or the subset of such systems that sum/average across persons?
The latter, I don't think it makes much sense to call the former an ethical system, it's just a description of how to make optimal decisions.
This post does have "preference utilitarianism" in its title.
As far as I can tell from the minimal information in that link, preference utilitarianism still involves summing/averaging/weighting utility across all persons. The 'preference' part of 'preference utilitarianism' refers to the fact that it is people's 'preferences' that determine their individual utility but the 'utilitarianism' part still implies summing/averaging/weighting across persons. The link mentions Peter Singer as the leading contemporary advocate of preference utilitarianism and as I understand it he is still a utilitarian in that sense. 'Maximizing expected utility over outcomes' is just a description of how to make optimal decisions given a utility function. It is agnostic about what that utility function should be. Utilitarianism as a moral/ethical philosophy generally seems to advocate a choice of utility function that uses a unique weighting across all individuals as the definition of what is morally/ethically 'right'.
You could be right. I can't see mention of "averaging" or "summing" in the definitions (which! it matters!) - and if any sum is to be performed it is vague about what class of entities is being summed over. However - as you say - Singer is a "sum" enthusiast. How you can measure "satisfaction" in a way that can be added up over multiple people is left as a mystery for readers. I wouldn't assert the second paragraph, though. Satisfying preferences is still a moral philosophy - regardless of whether those preferences belong to an individual agent, or whether preference satisfaction is summed over a group. Both concepts equally allow for agents with arbitrary preferences.
The main Wikipedia entry for Utilitarianism says: Where 'preference utilitarians' links back to the short page on preference utilitarianism you referenced. That combined with the description of Peter Singer as the most prominent advocate for preference utilitarianism suggests weighted summing or averaging, though I'm not clear whether there is some specific procedure associated with 'preference utilitarianism'. Merely satisfying your own preferences is a moral philosophy but it's not utilitarianism. Ethical Egoism maybe or just hedonism. What appears to distinguish utilitarian ethics is that they propose a unique utility function that globally defines what is moral/ethical for all agents.
It seems like a historical tragedy that a perfectly sensible word was ever given the second esoteric meaning.

There's a far worse problem with the concept of 'utility function' as a static entity than that different generations have different preferences: The same person has very different preferences depending on his environment and neurochemistry. A heroin addict really does prefer heroin to a normal life (at least during his addiction). An ex-junkie friend of mine wistfully recalls how amazing heroin felt and how he realized he was failing out of school and slowly wasting away to death, but none of that mattered as long as there was still junk. Now, it's no... (read more)

What's wrong with wireheading? Seriously. Heroin is harmful for numerous health and societal reasons, but if we solve those problems with wireheading, I don't see the problem with large portions of humanity choosing ultimate pleasure forever. We could also make some workarounds: for instance, timed wireheading, where you wirehead for a year and then set your brain to disable wireheading for another year, or a more sophisticated Fun Theory based version of wireheading that allows for slightly more complex pleasures.
There's a difference between people choosing wireheading and a clever AI making that choice for them.
Why did your ex-junkie friend quit? That may suggest a possible answer to your dilemma.
Combination of being broke, almost dying, mother-interference, naltrexone, and being institutionalized. I think there are many that do not quit though.
There are people who die from their drug habits but there are also many recovered former addicts. There are also people who sustain a drug habit without the rest of their life collapsing completely, even a heroin habit. It is clearly possible for people to make choices other than just taking another hit.
This is obviously true, but I'm not suggesting that all people will become heroin junkies. I'm using heroin addiction as an example of where neurochemistry changes directly change preference and therefore utility function, i.e. the 'utility function' is not a static entity. Neurochemistry differences among people are vast, and heroin doesn't come close to a true 'wire-head,' and yet some percent of normal people are susceptible to having it alter their preferences to the point of death. After uploading/AI, interventions far more invasive and complete than heroin will be possible, and perhaps widely available. It is nice to think that humans will opt not to use them, and most people with their current preferences intact might not even try (as many have never tried heroin), but if preferences are constantly being changed (as we will be able to do), then it seems likely that people will eventually slide down a slippery slope towards wire-heading, since, well, it's easy.
I find the prospect of an AI changing people's preferences to make them easier to satisfy rather disturbing. I'm not really worried about people changing their own preferences or succumbing en masse to wireheading. It seems to me that if people could alter their own preferences then they would be much more inclined to move their preferences further away from a tendency towards wireheading. I see a lot more books on how to resist short-term temptations (diet books, books on personal finance, etc.) than I do on how to make yourself satisfied with being fat or poor, which suggests that generally people prefer preference changes that work in their longer-term rather than short-term interests.

The AI is now reflectively consistent, but is this the right outcome?

I'd say so.

I want the AI to maximize my utility, and not dilute the optimization power with anyone else's preferences (by definition). Of course, to the extent that I care about others they will get some weight under my utility function, but any more than that is not something I'd want.

Anything else is just cooperation, which is great, since it greatly increases the chance of it working- and even more so the chance of it working for you. The group of all people the designers can easi... (read more)

I wouldn't be so quick to discard the idea of the AI persuading us that things are pretty nice the way they are. There are probably strong limits to the persuadability of human beings, so it wouldn't be a disaster. And there is a long tradition of advice regarding the (claimed) wisdom of learning to enjoy life as you find it.

Wei Dai
Suppose the AI we build (AI1) finds itself insufficiently intelligent to persuade us. It decides to build a more powerful AI (AI2) to give it advice. AI2 wakes up and modifies AI1 into being perfectly satisfied with the way things are. Then, mission accomplished, they both shut down and leave humanity unchanged. I think what went wrong here is that this formulation of utilitarianism isn't reflectively consistent. If there are, then the AI would modify us physically instead.
Why do you say these "strong limits" exist? What are they? I do think that everyone being persuaded to be Bodhisattvas is a pretty good possible future, but I do think there are better futures that might be given up by that path. (immortal cyborg-Bodhisattvas?)
Strong limits? You mean the limit of how much the atoms in a human can be rearranged and still be called 'human'?

Obviously, weighing equally over every logically possible utility function will produce a null result - for every utility function, a corresponding utility function with the opposite preferences will exist.
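The cancellation can be checked on a small finite stand-in. The sketch below is only an illustration under stated assumptions: the three outcomes and the symmetric set of utility levels are hypothetical, and "every logically possible utility function" is replaced by every assignment of those levels to those outcomes, so each function has its negation in the set.

```python
from itertools import product

# Hypothetical finite stand-in for "all possible utility functions".
OUTCOMES = ["a", "b", "c"]
LEVELS = [-1, 0, 1]  # symmetric around zero, so every function has an opposite


def all_utility_functions():
    """Yield every assignment of a level to each outcome, as a dict."""
    for values in product(LEVELS, repeat=len(OUTCOMES)):
        yield dict(zip(OUTCOMES, values))


def equal_weight_total(outcome):
    """Sum the utility of one outcome over all functions, each weighted equally."""
    return sum(u[outcome] for u in all_utility_functions())


totals = {outcome: equal_weight_total(outcome) for outcome in OUTCOMES}
print(totals)
```

Every outcome receives the same total, so an agent maximizing this sum has no reason to prefer any outcome over another, which is the null result described above.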

Wei Dai
I agree, of course. The question was a rhetorical one to point out the incomplete nature of Robin's solution.
Curse the lack of verbal cues on the Interwebs!
That doesn't make it wrong, it makes it impotent. To break this "tie", you'd end up preferring to create creatures that existing creatures would prefer exist, and then preferring to satisfy their preferences. Which makes sense to me.
I don't understand your objection to my remark - I was analyzing the system Wei_Dai described, which evidently differs from yours.

If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.

To do this it would have to be badly programmed. We start out with a time-dependent utility function U'(t). We propose to change it to U'', where U''(t) = U'(0) for all times t. But those are different functions! The ... (read more)

I agree with the above comments that concern for future individuals would be contained in the utility functions of people who exist now, but there's an ambiguity in the AI's utility function in that it seems forbidden to consider the future or past output of its utility function. By limiting itself to the concern of the people who currently exist, if it were to try and maximize this output over all time it would then be concerning itself with people who do not yet or no longer exist, which is at direct odds with its utility function. Being barred from such considerations, it could make sense to change its own utility function to restrict concern to the people existing at that time, IF this is what most satisfied the preference of those people. While the default near-sightedness of people is bad news here, if the AI succeeds in modelling us as "smarter, more the people we want to be" etc, then its utility function seems unlikely to become so fixed in time.

creating a bunch of new people whose preferences are more easily satisfied, or just use its super intelligence to persuade us to be more satisfied with the universe as it is.

Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time?

Well, making people's preferences coincide with the universe by adjusting people's preferences is not possible if people prefer their preferences not to be adjusted to the universe. Or possible only to the extent people currently prefer being chan... (read more)

I believe you can strip the AI of any preferences towards human utility functions with a simple hack.

Every decision of the AI will have two effects on expected human utility: it will change it, and it will change the human utility functions.

Have the AI make its decisions only based on the effect on the current expected human utility, not on the changes to the function. Add a term granting a large disutility for deaths, and this should do the trick.

Note the importance of the "current" expected utility in this setup; an AI will decide whether to in... (read more)
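One way to read the proposal is sketched below. This is a toy illustration, not the commenter's actual construction: the `Outcome` class, the two chooser functions and the example actions are hypothetical, the world is reduced to a single number, and the extra disutility term for deaths is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Utility = Callable[[float], float]


@dataclass
class Outcome:
    world: float          # resulting state of the world (toy: one number)
    new_utility: Utility  # what humans would value after the action


def choose_with_current_utility(actions: Dict[str, Outcome], current: Utility) -> str:
    """The proposed hack: score each resulting world with the CURRENT utility
    function only, ignoring how the action changes that function."""
    return max(actions, key=lambda name: current(actions[name].world))


def choose_with_resulting_utility(actions: Dict[str, Outcome]) -> str:
    """The problematic agent: score each world with whatever utility function
    people hold after the action, which rewards changing their preferences."""
    return max(actions, key=lambda name: actions[name].new_utility(actions[name].world))


def current_utility(world: float) -> float:
    """Hypothetical present-day preference: a higher world value is better."""
    return world


actions = {
    # Actually improve the world; preferences stay as they are.
    "improve": Outcome(world=5.0, new_utility=current_utility),
    # Leave the world alone but persuade everyone to be content with anything.
    "persuade": Outcome(world=0.0, new_utility=lambda w: 100.0),
}

hack_choice = choose_with_current_utility(actions, current_utility)
naive_choice = choose_with_resulting_utility(actions)
```

Under these assumptions the first chooser picks "improve" and the second picks "persuade". The sketch does not address the objections raised in the replies, such as the agent rewriting itself or people who temporarily have no current utility function.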

What if death isn't well-defined? What if the AI has the option of cryonically freezing a person to save their life - but then being frozen, that person does not have any "current" utility function, so the AI can then disregard them completely. Situations like this also demonstrate that more generally, trying to satisfy someone's utility function may have an unavoidable side-effect of changing their utility function. These side-effects may be complex enough that the person does not forsee them, and it is not possible for the AI to explain them to the person. I think your "simple hack" is not actually that simple or well-defined.
It's simple, it's well defined - it just doesn't work. Or at least, work naively the way I was hoping. The original version of the hack - on one-shot oracle machines - worked reasonably well. This version needs more work. And I shouldn't have mentioned deaths here; that whole subject requires its own separate treatment.
What keeps the AI from immediately changing itself to only care about the people's current utility function? That's a change with very high expected utility defined in terms of their current utility function and one with little tendency to change their current utility function. Will you believe that a simple hack will work with lower confidence next time?
Slightly. I was counting on this one getting bashed into shape by the comments; it wasn't, so in future I'll try to do more of the bashing myself.
You meant "any preferences towards MODIFYING human utility functions".

Related question: What is the purpose of taking into consideration the preferences of people NOT around to deal with the AI?

The dead and the potential-future-people, not to mention the people of other possible worlds, haven't got any say in anything that happens now in this world. This is because it is physically impossible for us (people in the present of this possible world) to find out what those preferences are. At best, we can only guess and extrapolate.

Unless the AI has the ability to find out those preferences, it ought to weigh our current preferences more heavily because of that additional certainty.

Why take into account the preferences of anyone other than the builders of the AI, other than via the fact that those builders may care about those other creatures?

So are we going to take into account the preferences of the AI itself? Or are we going to violate its rights by creating its preferences based on our current liking? What about the other AIs and their preferences? Obviously this is a paradox that arises from trying to please imaginary entities.

My version of utilitarianism is "dealism", and the way I'd suggest thinking about this is in terms of the scope of the implicit "deal" you are implementing. At one extreme you as dictator just enforce your temporary personal preferences over everything, while at the other extreme you weigh the preferences of all creatures who ever have existed or ever could exist. Doing anything but the latter may be a slippery slope. First you'll decide to ignore possible creatures, then future creatures, then animals, then maybe people with low IQ, ... (read more)

Wei Dai
Robin, I don't understand why you refer to it as "dealism". The word "deal" makes it sound as if your moral philosophy is more about cooperation than altruism, but in that case why would you give any weight to the preferences of animals and people with low IQ (for example), since they have little to offer you in return?
Deals can be lopsided. If they have little to offer, they may get little in return.
This seems to provide an answer to the question you posed above. Chickens have very little to offer me other than their tasty flesh and essentially no capacity to meaningfully threaten me which is why I don't take their preferences into account. If you're happy with lopsided deals then there's how you draw the line. This seems like a perfectly reasonable position to take but it doesn't sound anything like utilitarianism to me.
Turns out, the best deals look a lot like maximizing weighted averages of the utilities of affected parties.
Well the weighting is really the crux of the issue. If you are proposing that weighting should reflect both what the affected parties can offer and what they can credibly threaten then I still don't think this sounds much like utilitarianism as usually defined. It sounds more like realpolitik / might-is-right.
Wei Dai
I disagree. Certainly there are examples where the best deals do not look like maximizing weighted averages of the utilities of affected parties, and I gave one here. Are you aware of some argument that these kinds of situations are not likely in real life? I also agree with mattnewport's point, BTW.
Wei Dai
Ok, I didn't realize that you would weigh others' preferences by how much they can offer you. My followup question is, you seem willing to give weight to other people's preferences unilaterally, without requiring that they do the same for you, which is again more like altruism than cooperation. (For example you don't want to ignore animals, but they can't really reciprocate your attempt at cooperation.) Is that also a misunderstanding on my part?
Creatures get weight in a deal both because they have things to offer, and because others who have things to offer care about them.
But post-FAI, how does anyone except the FAI have anything to offer? Neither anything to offer, nor anything to threaten with. The FAI decides all, does all, rules all. The question is, how should it rule? Since no creature besides the FAI has anything to offer, weighting is out of the equation, and every present, past, and potential creature's utilities should count the same.
Wei Dai
I think an FAI's values would reflect the programmers' values (unless it turns out there is Objective Morality or something else unexpected). My understanding now is that if Robin were the FAI's programmer, the weights he would give to other people in its utility function would depend on how much they helped him create the FAI (and for people who didn't help, how much the helpers care about them).
Sounds plenty selfish to me. Indeed, no different than might-is-right.
Wei Dai
Instead of might-is-right, I'd summarize it as "might-and-the-ability-to-provide-services-to-others-in-exchange-for-what-you-want-is-right" and Robin would presumably emphasize the second part of that.
You can care a lot about other people no matter how much they help you, but should help those who help you even more for game-theoretic reasons. This doesn't at all imply "selfishness".
No goal exists that is of objective moral superiority. Trying to maximize happiness for everybody is just the selfish effort to survive, given that not you but somebody else wins. So we're trying to survive by making everybody want to make everybody else happy? What if the largest number of possible creatures is too different from us to peacefully, or happily, coexist with us?
For one example, see the maximum entropy principle. My page on the topic:

If you believe that human morality is isomorphic to preference utilitarianism - a claim that I do not endorse, but which is not trivially false - then using preferences from a particular point in time should work fine, assuming those preferences belong to humans. (Presumably humans would not value the creation of minds with other utility functions if this would obligate us to, well, value their preferences.)

use its super intelligence to persuade us to be more satisfied with the universe as it is.

Actually, I would consider this outcome pretty satisfactory. My life is (presumably) unimaginably good compared to that of a peasant from the 1400s but I'm only occasionally ecstatic with happiness. It's not clear to me that a radical upgrade in my standard of living would change this...

Preferences and emotions are entirely distinct (in principle). The original post is talking about changing preferences (though "satisfied" does sound more like it's about emotions), you're talking about changing emotions. I think I'd go for a happiness upgrade as well, but (almost by definition) I don't want my 'real' preferences (waves hands furiously) to change.
You don't mind if your preferences or beliefs change to make you happier with the current state of the universe? Then you're in luck!

The AI might [...] just use its super intelligence to persuade us to be more satisfied with the universe as it is.

Well, that can’t be what we want.

Actually, I believe Buddhism says that this is exactly what we want.

As far as I can tell, all this post says is that utilitarianism is entirely dependent on a given set of preferences, and its outcomes will only be optimal from the perspective of those preferences.

This is true, but I'm not sure it's all that interesting.

I'm convinced of utilitarianism as the proper moral construct, but I don't think an AI should use a free-ranging utilitarianism, because it's just too dangerous. A relatively small calculation error, or a somewhat eccentric view of the future can lead to very bad outcomes indeed.

A really smart, powerful AI, it seems to me, should be constrained by rules of behavior (no wiping out humanity/no turning every channel into 24-7 porn/no putting everyone to work in the paperclip factory). The assumption that something very smart would necessarily reach correct u...

The ideas under consideration aren't as simple as having the AI act by pleasure utilitarianism or preference utilitarianism, because we actually care about a whole lot of things in our evaluation of futures. Many of the things that might horrify us are things we've rarely or never needed to be consciously aware of, because nobody currently has the power or the desire to enact them; but if we miss adding just one hidden rule, we could wind up in a horrible future. Thus "rule-following AI" has to get human nature just as right as "utilitarian AI" in order to reach a good outcome. For that reason, Eliezer et al. are looking for more meta ways of going about choosing a utility function. The reason why they prefer utilitarianism to rule-based AI is another still-disputed area on this site (I should point out that I agree with Eliezer here).
Why are you more concerned about something with unlimited ability to self reflect making a calculation error than about the above being a calculation error? The AI could implement the above if the calculation implicit in it is correct.

I am not sure of the exact semantics of the word "utilitarianism" in your post, but IMO it would be better to use a multi-dimensional objective function rather than simple numbers.

For example, killing a single moral agent should outweigh a convenience gain by any number of agents (see dust speck vs. torture). That can be modeled by a two-dimensional objective: the first number represents the immorality of the choice and the second is the total preference. The total order over the scoring would be a lexicographic order of the two components.
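
One way to picture the two-component ordering described here is the sketch below. It is only an illustration: the `Score` type, the option names, and every number are hypothetical, and it assumes lower immorality is better while higher total preference is better.

```python
from typing import NamedTuple


class Score(NamedTuple):
    immorality: float        # compared first; lower is better
    total_preference: float  # only breaks ties; higher is better


def sort_key(score: Score) -> tuple[float, float]:
    # Negate the preference so that a smaller key always means a better choice.
    return (score.immorality, -score.total_preference)


def best(options: dict[str, Score]) -> str:
    # Python compares tuples lexicographically, which is exactly the ordering wanted.
    return min(options, key=lambda name: sort_key(options[name]))


# Hypothetical options with made-up scores.
options = {
    "kill_one_for_convenience": Score(immorality=1.0, total_preference=1_000_000.0),
    "do_nothing": Score(immorality=0.0, total_preference=0.0),
    "small_harmless_gain": Score(immorality=0.0, total_preference=5.0),
}

print(best(options))
```

Under these assumptions no amount of total preference can compensate for a worse immorality score; preference only decides between options that tie on the first component.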

Another aspect is t...

For example, killing a single moral agent should outweigh a convenience gain by any number of agents (see dust speck vs. torture). That can be modeled by a two-dimensional objective: the first number represents the immorality of the choice and the second is the total preference. The total order over the scoring would be a lexicographic order of the two components.

If not killing has lexical priority, all other concerns will be entirely overridden by tiny differences in the probability of killing, in any non-toy case.

Anyway, more directly, our preferences do not seem to give life lexical priority. We're willing to drive to the store for convenience, and endorse others doing so, even though driving imposes a nontrivial risk of death on oneself and others.
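
A toy sketch of this objection, under loudly stated assumptions: the probabilities, the benefit numbers, and the `death_cost` weight are all invented for illustration and are not estimates of anything.

```python
# Each option maps to (probability the action kills someone, benefit in arbitrary units).
# All numbers are invented for illustration.
OPTIONS = {
    "stay_home": (1e-9, 0.0),
    "drive_to_store": (2e-9, 100.0),
}


def lexical_choice(options):
    # Minimize the probability of killing first; benefit only breaks exact ties.
    return min(options, key=lambda name: (options[name][0], -options[name][1]))


def weighted_choice(options, death_cost):
    # Ordinary expected utility: benefit minus the probability-weighted cost of a death.
    return max(options, key=lambda name: options[name][1] - options[name][0] * death_cost)


print(lexical_choice(OPTIONS))
print(weighted_choice(OPTIONS, death_cost=1e7))
```

With lexical priority, the one-in-a-billion difference in risk decides the matter and the benefit is never consulted. With a finite weight on death, the same options come out the way the driving example suggests people actually choose.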

One can't really equate risking a life with outright killing. If we want any system that is aligned with human morality, we just can't make decisions based only on the desirability of the outcome. For example: "Is it right to kill a healthy person to give their organs to five terminally ill patients and therefore save five lives at the cost of one?" Our moral sense says that killing an innocent bystander is immoral, even if it saves more lives. (See It is possible to move away from human morality, but the end result will be that most humans will perceive the decisions of the AI as monstrous, at least in the beginning... ;)
You really don't even have to go that far in your justification, if you're clever. You could just note that the actual result of such a practice is to make people go to greater efforts to avoid being in a position whereby they'll be selected for murder/organ harvesting, resulting in an aggregate waste of resources on such risk avoidance that is bad even from a utilitarian standpoint. It's much harder to find scenarios where such an action is justified on utilitarian grounds than you might think.
That's only true if this 'practice' is made into law, or something. What if it's just your own personal moral conviction? Would you kill a healthy person to save five others if you thought you could get away with it?
Not at all. If it were revealed that a doctor had deliberately killed a patient to harvest the organs, it's not like people will say, "Oh, well, I guess the law doesn't make all doctors do this, so I shouldn't change my behavior in response." Most likely, they would want to know how common this is, and if there are any tell-tale signs that a doctor will act this way, and avoid being in a situation where they'll be harvested. You have to account for these behavioral adjustments in any honest utilitarian calculus. Likewise, the Catholic Church worries about the consequences of one priest breaking the confidence of a penitent, even if they don't make it a policy to do so afterward. Unless I were under duress, no, but I can't imagine how I'd be in a position to make such a decision without being under duress! And again, I have to factor in the above calculation: if it's not a one-time thing, I have to account for the information that I'm doing this "leaking out", and the fact that my very perceptions will be biased to artificially make this more noble than it really is. Btw, I was recently in an argument with Gene Callahan on his blog about how Peter Singer handles these issues (Singer targets the situation you've described), but I think he deleted those posts.
That's what I told the judge when I loaded one bullet into my revolver and went on a 'Russian killing spree'. He wasn't impressed.
If you didn't kill anyone, what were you convicted of - and what sentence did you get?
Edit: Blueberry's interpretation may be more accurate. The charge would be reckless endangerment in that case, possibly multiple counts; search engines suggest this is a gross misdemeanor in Washington State, which would mean a typical maximum sentence of about a year. (Were I the judge, I would schedule the year for each count to be served successively, but that's me.)
In Washington, that's at least attempted manslaughter, which leads to a 10-year maximum. It may even be attempted murder, though we'd need to check the case law.
This is Australia. He started with possession of an unlicensed firearm and worked up from there. The worst part was the appeal. I showed them the security footage in which I clearly reseeded the revolver between each of my four shots rather than firing four chambers sequentially, and they wouldn't reduce the sentence by 22%. If the gun had gone off on the second shot we could have seen if the judge was a frequentist. Would he call in a psychologist as an expert witness? "Was the defendant planning to shoot twice or shoot up to four times until the gun fired?"
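
The 22% figure can be checked with a little arithmetic. A minimal sketch, assuming a six-chamber revolver with a single bullet (the chamber count is not stated in the comment and is an assumption here):

```python
from fractions import Fraction

CHAMBERS = 6  # assumption: a standard six-chamber revolver
SHOTS = 4     # four trigger pulls, as in the comment

# Firing four chambers in sequence: the bullet sits in one of four of the six chambers.
p_sequential = Fraction(SHOTS, CHAMBERS)

# Re-spinning before every shot: each pull independently misses with probability 5/6.
p_respin = 1 - Fraction(CHAMBERS - 1, CHAMBERS) ** SHOTS

# Relative reduction in the chance that the gun fires at least once.
relative_reduction = 1 - p_respin / p_sequential

print(f"sequential: {float(p_sequential):.4f}")
print(f"re-spin:    {float(p_respin):.4f}")
print(f"reduction:  {float(relative_reduction):.1%}")
```

Under that assumption the chance of the gun firing drops from 2/3 to 671/1296, a relative reduction of 193/864, or roughly 22%.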
Correction: a Class A felony has a maximum sentence of life in prison, according to your link. Otherwise, yeah, you're right.
That would be attempted murder, with a sentence of usually at least 20 years.
There is a huge difference between choosing a random person to kill and endangering someone. Our society already recognizes that there are risks to life that do not count as killing: for example, airlines can analyze how much certain security procedures cost and how many lives they save. If they can show that a measure costs more than (I guess) 7 million dollars per life saved, then it is not considered reasonable to implement that measure.
Even if you can cleanly distinguish them for a human, what's the difference from the perspective of an effectively omniscient and omnipotent agent? (Whether or not an actual AGI would be such, a proposed morality should work in that case.) Er, doesn't that just mean human morality assigns low desirability to the outcome "innocent bystander killed to use organs"? (That is, if that actually is a pure terminal value. It seems to me that this intuition reflects a correct instrumental judgment based on things like harms to public trust, not a terminal judgment about the badness of a death increasing in proportion to the benefit ensuing from that death or something.) If we want a system to be well-defined, reflectively consistent, and stable under omniscience and omnipotence, expected-utility consequentialism looks like the way to go. Fortunately, it's pretty flexible.
To me, "omniscience" and "omnipotence" seem to be self-contradictory notions. Therefore, I consider it a waste of time to think about beings with such attributes. OK. Do you think that if someone (e.g. an AI) kills random people for positive overall effect but manages to convince the public that they were random accidents (and therefore public trust is maintained), then it is a morally acceptable option?
That's why I put "I am unsure how you define utilitarianism". If you just evaluate the outcome, then you see f(1 dead) + f(5 alive). If you evaluate the whole process, you see f(1 guy killed as an innocent bystander) + f(5 alive), which may have a much lower desirability due to the moral impact. The same consideration applies to the OP: if you only evaluate the final outcome, you may think that killing hard-to-satisfy people is a good thing. However, if you add the morality penalty of killing innocent people, then the equation suddenly changes. The question of a single- versus multi-dimensional objective remains: extreme liberal moralism would say that it is not allowed to take one dollar from a person, even if it could pay for saving one life, and that killing one innocent bystander is wrong even if it could save a billion lives, simply because agents are autonomous entities with unalienable rights to life, property, and freedom that can't be violated, even for the greater good. The above problems can only be solved if the moral agents voluntarily opt into a system that takes away a portion of their individual freedom for a greater good. However, this system should not give arbitrary power to a single entity; every (immoral) violation of autonomy should happen for a well-defined "higher" purpose. I don't say that this is the definitive way to address morality abstractly in the presence of a superintelligent entity; these are just reiterations of some of the moral principles our liberal western democracies are built upon.
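
The outcome-only versus whole-process distinction can be sketched in a few lines. This is a toy under stated assumptions: `f` values each life at one unit, and `KILLING_PENALTY` is an arbitrary hypothetical number standing in for the "morality penalty"; neither comes from the comment.

```python
KILLING_PENALTY = 10.0  # hypothetical penalty per innocent person deliberately killed


def f(alive: int, dead: int) -> float:
    # Toy outcome valuation: every life counts as one unit.
    return float(alive - dead)


def outcome_only(alive: int, dead: int, killed_innocents: int) -> float:
    # Looks only at the final state and ignores how it came about.
    return f(alive, dead)


def process_aware(alive: int, dead: int, killed_innocents: int) -> float:
    # Same outcome term, minus a penalty for each innocent killed along the way.
    return f(alive, dead) - KILLING_PENALTY * killed_innocents


# The organ-harvesting example: (alive, dead, innocents deliberately killed).
scenarios = {
    "harvest_one_to_save_five": (5, 1, 1),
    "do_nothing": (1, 5, 0),
}

for evaluate in (outcome_only, process_aware):
    choice = max(scenarios, key=lambda name: evaluate(*scenarios[name]))
    print(evaluate.__name__, "->", choice)
```

With these numbers the outcome-only evaluation prefers harvesting, while the process-aware one prefers doing nothing. How large the penalty should be, or whether it should be a separate lexical dimension rather than a finite number, is exactly the open question in the comment.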