(Inspired by a recent conversation with Robin Hanson.)

Robin Hanson, in his essay on "Minimal Morality", suggests that the unreliability of our moral reasoning should lead us to seek simple moral principles:

"In the ordinary practice of fitting a curve to a set of data points, the more noise one expects in the data, the simpler a curve one fits to that data.  Similarly, when fitting moral principles to the data of our moral intuitions, the more noise we expect in those intuitions, the simpler a set of principles we should use to fit those intuitions.  (This paper elaborates.)"

In "the limit of expecting very large errors of our moral intuitions", says Robin, we should follow an extremely simple principle - the simplest principle we can find that seems to compress as much morality as possible.  And that principle, says Robin, is that it is usually good for people to get what they want, if no one else objects.

Now I myself carry on something of a crusade against trying to compress morality down to One Great Moral Principle.  I have developed at some length the thesis that human values are, in actual fact, complex, but that numerous biases lead us to underestimate and overlook this complexity.  From a Friendly AI perspective, the word "want" in the English sentence above is a magical category.

But Robin wasn't making an argument in Friendly AI, but in human ethics: he's proposing that, in the presence of probable errors in moral reasoning, we should look for principles that seem simple to us, to carry out at the end of the day.  The more we distrust ourselves, the simpler the principles.

This argument from fitting noisy data is a kind of logic that can apply even when you have prior reason to believe the underlying generator is in fact complicated.  You'll still get better predictions from the simpler model, because it's less sensitive to noise.

Even so, my belief that human values are in fact complicated leads me to two objections and an alternative proposal:

The first objection is that we do, in fact, have enough data to support moral models that are more complicated than a small set of short English sentences.  If you have a thousand data points, even noisy data points, it may be a waste of evidence to try to fit them to a straight line, especially if you have prior reason to believe the true generator is not linear.

And my second fear is that people underestimate the complexity and error-proneness of the reasoning they do to apply their Simple Moral Principles.  If you try to reduce morality to the Four Commandments, then people are going to end up doing elaborate, error-prone rationalizations in the course of shoehorning their real values into the Four Commandments.

But in the ordinary practice of machine learning, there's a different way to deal with noisy data points besides trying to fit simple models.  You can use nonparametric methods.  The classic example is k-nearest-neighbors:  To predict the value at a new point, use the average of the 10 nearest points previously observed.
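The k-nearest-neighbors idea fits in a few lines of code.  Here is a minimal sketch; the data set and the choice of k=3 are made up purely for illustration:

```python
def knn_predict(x_new, data, k=3):
    """Predict the value at x_new as the average of the k nearest
    observed points; no assumption about the shape of the curve."""
    neighbors = sorted(data, key=lambda pt: abs(pt[0] - x_new))[:k]
    return sum(y for _, y in neighbors) / k

# Noisy observations of some unknown generator (illustrative numbers).
observations = [(0.0, 0.1), (1.0, 1.2), (2.0, 1.9),
                (3.0, 3.3), (4.0, 3.8), (5.0, 5.1)]

estimate = knn_predict(2.5, observations)  # average of the 3 nearest y's
```

Note that nothing in the predictor commits to a global functional form; all the "model" lives in the stored data plus a notion of which points count as neighbors.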

A line has two parameters - slope and intercept; to fit a line, we pick the slope and intercept that best match the data.  (Minimizing squared error corresponds to maximizing the likelihood of the data under Gaussian noise, for example.)  Or we could fit a cubic polynomial, picking the four parameters that best fit the data.
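For contrast with the neighbor-averaging above, the least-squares line fit commits the whole model to exactly two numbers.  A pure-Python sketch, with illustrative data:

```python
def fit_line(points):
    """Return (slope, intercept) minimizing the sum of squared errors,
    using the closed-form least-squares solution."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return slope, mean_y - slope * mean_x

points = [(0.0, 0.1), (1.0, 1.2), (2.0, 1.9), (3.0, 3.3)]
slope, intercept = fit_line(points)  # the whole model is two numbers
```

However many data points go in, only the two fitted parameters come out - which is exactly what makes the fit robust to noise, and exactly what makes it blind to any structure a line can't express.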

But the nearest-neighbors estimator doesn't assume a particular shape of underlying curve - not even that the curve is a polynomial.  Technically, it doesn't even assume continuity.  It just says that we think that the true values at nearby positions are likely to be similar.  (If we furthermore believe that the underlying curve is likely to have continuous first and second derivatives, but don't want to assume anything else about the shape of that curve, then we can use cubic splines to fit an arbitrary curve with a smoothly changing first and second derivative.)
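A natural cubic spline can be built by solving a small tridiagonal system for the second derivatives at the knots.  The sketch below uses the textbook construction ("natural" boundary conditions, i.e. zero second derivative at the endpoints, are one common choice among several); the data is illustrative:

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Interpolate (xs, ys) with a natural cubic spline: piecewise
    cubics with continuous first and second derivatives, and zero
    second derivative at the two endpoints."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the knot second derivatives M[0..n-1],
    # with boundary rows pinning M[0] = M[n-1] = 0.
    a = [0.0] * n  # sub-diagonal
    b = [1.0] * n  # diagonal
    c = [0.0] * n  # super-diagonal
    d = [0.0] * n  # right-hand side
    for i in range(1, n - 1):
        a[i] = h[i - 1] / 6.0
        b[i] = (h[i - 1] + h[i]) / 3.0
        c[i] = h[i] / 6.0
        d[i] = (ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1]
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    M = [0.0] * n
    M[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]

    def evaluate(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        A = (xs[i + 1] - x) / h[i]
        B = (x - xs[i]) / h[i]
        return (A * ys[i] + B * ys[i + 1]
                + ((A ** 3 - A) * M[i] + (B ** 3 - B) * M[i + 1])
                * h[i] ** 2 / 6.0)

    return evaluate

spline = natural_cubic_spline([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
```

The only global assumption is smoothness: the spline passes through every knot, and between knots it bends as gently as the continuity constraints allow.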

And in terms of machine learning, it works.  It is done rather less often in science papers - for various reasons, some good, some bad; e.g. academics may prefer models with simple extractable parameters that they can hold up as the triumphant fruits of their investigation:  Behold, this is the slope!  But if you're trying to win the Netflix Prize, and you find an algorithm that seems to do well by fitting a line to a thousand data points, then yes, one of the next things you try is substituting some nonparametric estimators of the same data; and yes, this often greatly improves the estimates in practice.  (Added:  And conversely there are plenty of occasions where ridiculously simple-seeming parametric fits to the same data turn out to yield surprisingly good predictions.  And lots of occasions where added complexity for tighter fits buys you very little, or even makes predictions worse.  In machine learning this is usually something you find out by playing around, AFAICT.)

It seems to me that concepts like equality before the law, or even the notion of writing down stable laws in the first place, reflect a nonparametric approach to the ethics of error-prone moral reasoning.

We don't suppose that society can be governed by only four laws.  In fact, we don't even need to suppose that the 'ideal' morality (obtained as the limit of perfect knowledge and reflection, etc.) would in fact subject different people and different occasions to the same laws.  We need only suppose that we believe, a priori, that similar moral dilemmas are likely ceteris paribus to have similar resolutions, and that moral reasoning about adjustment to specific people is highly error-prone - that, given unlimited flexibility to 'perfectly fit' the solution to the person, we're likely to favor our friends and relatives too much.  (And not in an explicit, internally and externally visible way, that we could correct just by having a new rule not to favor friends and relatives.)

So instead of trying to recreate, each time, the judgment that is the perfect fit to the situation and the people, we try to use the ethical equivalent of a cubic spline - have underlying laws that are allowed to be complicated, but have to be written down for stability, and are supposed to treat neighboring points similarly.

Nonparametric ethics says:  "Let's reason about which moral situations are at least rough neighbors so that an acceptable solution to one should be at least mostly-acceptable to another; and let's reason about where people are likely to be highly biased in their attempt to adjust to specifics; and then, to reduce moral error, let's enforce similar resolutions across neighboring cases."  If you think that good moral codes will treat different people similarly, and/or that people are highly biased in how they adjust their judgments to different people, then you will come up with the ethical solution of equality before the law.

Now of course you can still have laws that are too complicated, and that try to sneak in too much adaptation to particular situations.  This would correspond to a nonparametric estimator that doesn't smooth enough, like using 1-nearest-neighbor instead of 10-nearest-neighbors, or like a spline that tried to fit every point exactly instead of trading off fit against curvature.

And of course our society may not succeed at similarly treating different people in similar situations - people who can afford lawyers experience a different legal system.

But if nothing else, coming to grips with the concept of nonparametric ethics helps us see the way in which our society is failing to deal with the error-proneness of its own moral reasoning.

You can interpret a fair amount of my coming-of-age as my switch from parametric ethics to nonparametric ethics - from the pre-2000 search for simple underlying morals and my attempts to therefore reject values that seemed complicated; to my later acceptance that my values were actually going to be complicated, and that both I and my AI designs needed to come to terms with that.  Friendly AI can be viewed as the problem of coming up with - not the Three Simple Laws of Robotics that are all a robot needs - but rather a regular and stable method for learning, predicting, and renormalizing human values that are and should be complicated.


60 comments

Imagine we had a billion data points which were so noisy that they were the equivalent of a thousand pretty clear data points. Then a parametrized model should have about a thousand parameters, and a non-parametric approach should average about a million noisy neighbors. If we have the equivalent of about ten clear data points, we should have a model with ten free parameters or we should average a hundred million noisy neighbors. The parametrized approach then seems to have clear advantages in terms of how much you need to communicate and remember to apply the approach, and in how easy it is for others to verify that you are applying the approach well. Remember that much of human morality is about social norms that we check if others are following, and reward or punish them accordingly. I suspect that these communication advantages are why academics focus on parametrized models. On accuracy, are there good canonical sources showing that non-param tends to beat param holding other things constant? Also, it isn't clear to me that non-param approaches don't have just as much trouble with error-prone application as param approaches.

On accuracy, are there good canonical sources showing that non-param tends to beat param holding other things constant?

Definitely not. Non-param is something you do in a particular sort of situation. Lots of data, true generator hard to model, lots of neighborhood structure in the data, a la the Netflix Prize? Definitely time to try non-parametric. Twenty data points that look like mostly a straight line? I'd say use a straight line.

A parameterized model with a thousand generic parameters, that isn't supposed to represent the true underlying generator or a complicated causal model, but rather fit an arbitrary curve directly to the data, that we think is regular yet complicated, would I think be considered "nonparametric statistics" as such things go. Splines with lots of joints are technically parameterized curves, but they are (I think) considered a nonparametric statistical method.

The most important canonical rule of modern machine learning is that we don't have good, simple, highly accurate canonical rules for predicting in advance what sort of algorithm will work best on the data. And so instead you've got to try different things, and accumulate experience with complicated, error-prone, even non-verbalizable rules about which sorts of algorithms to try. In other words, a machine learning expert would have both parametric and nonparametric algorithms for determining whether to use parametric or nonparametric algorithms...

Actually in social science problems in high dimensional spaces it is rather common to have parametrized models with hundreds or more parameters, especially when one has hundreds of thousands or more data points. For example, one often uses "fixed effects" for individual times or spatial regions, and matrices of interaction terms between basic effects. Folks definitely use param stat methods for such things.

Sure, lots of locally parametric statistics isn't the same thing as having so many global parameters as to make few assumptions about the shape of the curve. Still, I think this is where we both nod and agree that there's no absolute border between "parametric" and "nonparametric"?

Well there are clearly many ways to define that distinction. But regarding the costs of communicating and checking, the issue is whether one tells the model or the data set plus metric. Academics usually prefer to communicate a model, and I'm guessing that given their purposes this is probably usually best.

Sure. Though I note that if you're already communicating a regional map with thousands of locally-fit parameters, you're already sending a file, and at that point it's pretty much as easy to send 10MB as 10KB, these days. But there's all sorts of other reasons why parametric models are more useful for things like rendering causal predictions, relating to other knowledge and other results, and so on. I'm not objecting to that, per se, although in some cases it provides a motive to oversimplify and draw lines through graphs that don't look like lines...

...but I'm not sure that's relevant to the original point. From my perspective, the key question is to what degree a statistical method assumes that the underlying generator is simple, versus not imposing much of its own assumptions about the shape of the curve.

k-nearest-neighbors seems to be a reasonable method of interpolation, but what about extrapolation? I'm having trouble seeing how nonparametric methods can deal with regions far away from existing data points.

I'm having trouble seeing how nonparametric methods can deal with regions far away from existing data points.

With very wide predictive distributions, if they are Bayesian nonparametric methods. See the 95% credible intervals (shaded pink) in Figure 2 on page 4, and in Figure 3 on page 5, of Mark Ebden's Gaussian Processes for Regression: A Quick Introduction.

(Carl Edward Rasmussen at Cambridge and Arman Melkumyan at the University of Sydney maintain sites with more links about Gaussian processes and Bayesian nonparametric regression. Also see Bayesian neural networks which can justifiably extrapolate sharper predictive distributions than Gaussian process priors can.)

See also Modeling human function learning with Gaussian processes, by Tom Griffiths, Chris Lucas, Joseph Jay Williams, and Michael Kalish, in NIPS 21:

[. . .] we look at two quantitative tests of Gaussian processes as an account of human function learning: reproducing the order of difficulty of learning functions of different types, and extrapolation performance. [. . .]

Predicting and explaining people’s capacity for generalization – from stimulus-response pairs to judgments about a functional relationship between variables – is the second key component of our account. This capacity is assessed in the way in which people extrapolate, making judgments about stimuli they have not encountered before. [. . .] Both people and the model extrapolate near optimally on the linear function, and reasonably accurate extrapolation also occurs for the exponential and quadratic function. However, there is a bias towards a linear slope in the extrapolation of the exponential and quadratic functions[. . .]

The first author, Tom Griffiths, is the director of the Computational Cognitive Science Lab at UC Berkeley, and Lucas and Williams are graduate students there. The work of the Computational Cognitive Science Lab is very close to the mission of Less Wrong:

The basic goal of our research is understanding the computational and statistical foundations of human inductive inference, and using this understanding to develop both better accounts of human behavior and better automated systems [. . .]

For inductive problems, this usually means developing models based on the principles of probability theory, and exploring how ideas from artificial intelligence, machine learning, and statistics (particularly Bayesian statistics) connect to human cognition. We test these models through experiments with human subjects[. . .]

Probabilistic models provide a way to explore many of the questions that are at the heart of cognitive science. [. . .]

Griffiths's page recommends the foundations section of the lab publication list.

There are always "nearest" neighbors. You might wish for more data than you have, but you must make do with what you have.

If the data is actually linear or anything remotely resembling linear, then on distant points a linear model will do much better than a nearest-neighbor estimator. Whereas on nearby points, a nearest-neighbor estimator will do as well as a linear model given enough data. So on distant points nearest-neighbor only works if the curve is a particular shape (constant), while on near points it works so long as the curve has anything resembling local neighborhoods.
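A toy comparison makes the point concrete.  With noiseless data from y = 2x (chosen for clarity; all numbers are illustrative), a fitted line keeps tracking the generator far from the data, while a nearest-neighbor average is stuck at the edge of what it has seen:

```python
xs = [float(i) for i in range(11)]           # x = 0 .. 10
ys = [2.0 * x for x in xs]                   # noiseless y = 2x

# Two-parameter least-squares line.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# 3-nearest-neighbor average.
def knn(x_new, k=3):
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

x_far = 100.0
linear_guess = slope * x_far + intercept     # keeps tracking the generator
knn_guess = knn(x_far)                       # just averages the edge points
```

At x = 100 the line predicts 200, the true value; the neighbor average can never exceed the mean of the largest observed y's.  Near the data, the two methods agree.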

Well, yes. Nonparametric methods use similarity of neighbors. To predict that which has never been seen before - which is not, on its surface, like things seen before - you need modular and causal models of what's going on behind the scenes. At that point it's parametric or bust.

Your use of the terms parametric vs. nonparametric doesn't seem to be that used by people working in nonparametric Bayesian statistics, where the distinction is more like whether your statistical model has a fixed finite number of parameters or has no such bound. Methods such as Dirichlet processes, and its many variants (Hierarchical DP, HDP-HMM, etc), go beyond simple modeling of surface similarities using similarity of neighbours.

See, for example, the list of publications coauthored by Michael Jordan.

Parametric methods aren't any better at extrapolation. They are arguably worse, in that they make strong unjustified assumptions in regions with no data. The rule is "don't extrapolate if you can possibly avoid it". (And you avoid it by collecting relevant data.)

Parametric extrapolation actually works quite well in some cases. I'll cite a few examples that I'm familiar with:

I don't see any examples of nonparametric extrapolation that have similar success.

A major problem in Friendly AI is how to extrapolate human morality into transhuman realms. I don't know of any parametric approach to this problem that is without serious difficulties, but "nonparametric" doesn't really seem to help either. What does your advice "don't extrapolate if you can possibly avoid it" imply in this case? Pursue a non-AI path instead?

What does your advice "don't extrapolate if you can possibly avoid it" imply in this case?

I distinguish "extrapolation" in the sense of extending an empirical regularity (as in Moore's law) from inferring a logical consequence of a well-supported theory (as in the black hole prediction). This is really a difference of degree, not kind, but for human science, this distinction is a good abstraction. For FAI, I'd say the implication is that an FAI's morality-predicting component should be a working model of human brains in action.

I'm in essential agreement with Wei here. Nonparametric extrapolation sounds like a contradiction to me (though I'm open to counterexamples).

The "nonparametric" part of the FAI process is where you capture a detailed picture of human psychology as a starting point for extrapolation, instead of trying to give the AI Four Great Moral Principles. Applying extrapolative processes like "reflect to obtain self-judgments" or "update for the AI's superior knowledge" to this picture is not particularly nonparametric - in a sense it's not an estimator at all, it's a constructor. But yes, the "extrapolation" part is definitely not a nonparametric extrapolation, I'm not really sure what that would mean.

But every extrapolation process starts with gathering detailed data points, so it confused me that you focused on "nonparametric" as a response to Robin's argument. If Robin is right, an FAI should discard most of the detailed picture of human psychology it captures during its extrapolation process as errors and end up with a few simple moral principles on its own.

Can you clarify which of the following positions you agree with?

  1. An FAI will end up with a few simple moral principles on its own.
  2. We might as well do the extrapolation ourselves and program the results into the FAI.
  3. Robin's argument is wrong or doesn't apply to the kind of moral extrapolation an FAI would do. It will end up with a transhuman morality that's no less complex than human morality.

(Presumably you don't agree with 2. I put it in just for completeness.)

2, certainly disagree. 1 vs. 3, don't know in advance. But an FAI should not discard its detailed psychology as "error"; an AI is not subject to most of the "error" that we are talking about here. It could, however, discard various conclusions as specifically erroneous after having actually judged the errors, which is not at all the sort of correction represented by using simple models or smoothed estimators.

I think connecting this to FAI is far-fetched. To talk technically about FAI you need to introduce more tools first.

A major problem in Friendly AI is how to extrapolate human morality into transhuman realms. I don't know of any parametric approach to this problem that is without serious difficulties, but "nonparametric" doesn't really seem to help either. What does your advice "don't extrapolate if you can possibly avoid it" imply in this case? Pursue a non-AI path instead?

I think it implies that a Friendly sysop should not dream up a transhuman society and then try to reshape humanity into that society, but rather let us evolve at our own pace just attending to things that are relevant at each time.

I have a technical question that's orthogonal to the discussion of morality.

You write:

But the nearest-neighbors estimator doesn't assume a particular shape of underlying curve - not even that the curve is a polynomial. Technically, it doesn't even assume continuity. It just says that we think that the true values at nearby positions are likely to be similar.

I've never studied this algorithm before. But the description "true values at nearby positions are likely to be similar" reads to me like an informal description of continuity, except for the word "likely". Setting that word aside, I could see giving your description word-for-word as a "preview" for students of the more precise epsilon-delta definition of continuity.

If I understand the algorithm correctly, if you thought it likely that the underlying phenomenon had discontinuities, and you didn't want your model to smooth them out, you wouldn't use this algorithm, would you? Taking averages is sure to smooth out discontinuities in the true values, isn't it?

typo fix

if you thought it likely that the underlying phenomenon has discontinuities, and you didn't want your model to smooth them out

This is a change-point problem. See the example in section 3.1 of the PyMC manual:

Consider the following dataset, which is a time series of recorded coal mining disasters in the UK from 1851 to 1962 [. . . .] Occurrences of disasters in the time series is thought to be derived from a Poisson process with a large rate parameter in the early part of the time series, and from one with a smaller rate in the later part. We are interested in locating the change point in the series, which perhaps is related to changes in mining safety regulations.

Taking averages is sure to smooth out discontinuities in the true values, isn't it?

Yes. If the true change points are unknown, then even if every possible underlying phenomenon has discontinuities, the average of credible underlying phenomena (the posterior mean) can still be continuous. See this plot in the Introduction to Bayesian Thinking blog post "A Poisson Change-Point Model" (discontinuous mining safety regulations) and Figure 5 on page 8 of "Bayesian change-point analyses in ecology" by Brian Beckage et al. (discontinuous border between canopy and gap conditions).
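This can be sketched numerically.  Assume two known Poisson rates (4 before the change, 1 after), a uniform prior over the change year, and made-up yearly counts: every individual hypothesis is a discontinuous step, yet the posterior-mean rate varies smoothly through the change region.

```python
import math

counts = [5, 4, 6, 3, 5, 1, 0, 2, 1, 0]       # illustrative yearly counts

def poisson_logpmf(k, rate):
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def log_likelihood(tau, early=4.0, late=1.0):
    """Log-likelihood of the data if the rate drops after year tau."""
    return sum(poisson_logpmf(k, early if t < tau else late)
               for t, k in enumerate(counts))

taus = range(1, len(counts))                  # possible change years
logs = [log_likelihood(tau) for tau in taus]
m = max(logs)
weights = [math.exp(L - m) for L in logs]     # unnormalized posterior
z = sum(weights)
posterior = [w / z for w in weights]

# Posterior-mean rate at each year: a weighted average of step
# functions, which ramps smoothly even though every step is abrupt.
mean_rate = [sum(p * (4.0 if t < tau else 1.0)
                 for p, tau in zip(posterior, taus))
             for t in range(len(counts))]
```

The averaging over uncertain change points is what produces the continuous curve; each hypothesis taken alone remains a hard discontinuity.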

Thanks, Steve. So, can I unpack Eliezer's condition

we think that the true values at nearby positions are likely to be similar

as saying of the true values that there might be switchpoints, but most points aren't switchpoints?

Yes, and in the limit of obtaining more data indefinitely, the in-between regions will shrink indefinitely (at least if you're using k-nearest-neighbors and not smooth kernels).

Yes, switchpoints or large smooth jumps.

society is failing to deal with the error-proneness of its own moral reasoning

I am still not sure that there is such a thing as an "error" in moral reasoning. Suppose I decide that the rule "equality before the law" should be replaced with, say, "equality before the law, except for blacks". In what sense have I made an error?

It seems to me that there is something fishy going on with both Eliezer's and Robin's uses of moral words in the above debate. Both speak of moral errors, of better approximations to the correct moral rules, etc, whilst I presume both would deny that, in a case of moral disagreement about who has made the "error" and who has got it right, there is any objective truth of the matter.

We can probably get further if we talk about the moral truth according to one particular individual; in this case it is more plausible to argue that when Tim Tyler, aged 14, decided that the one objective moral truth was "don't hurt other people", he was simply wrong, even with respect to his own moral views.

If we talk about inferring the moral truth behind noisy moral intuitions, then if people's intuitions or models of those intuitions differ, the errors in their intuitions or models differ. One person can be more mistaken than another. If you reject moral realism you can recast this conversation in terms of commonly shared "moral" components of what we want.

If you reject moral realism you can recast this conversation in terms of commonly shared "moral" components of what we want.

This seems reasonable.

I don't understand. How can you never be wrong about what is right, yet still be wrong about what is a shared component of what is right?

Well, I interpreted Robin to mean "we're going to use this algorithm to aggregate preferences". You would have to drop the language of "errors" though.

Okay. In a way, this view can even be equivalent, if you stick to the same data: a kind of nonparametric view that only recognizes observations. You see this discussion as about summarization of people's behavior (e.g. to implement a policy to which most people would agree), while I see it as about inference of people's hidden wishes behind visible behavior or stated wishes, and maybe as summarization of people's hidden wishes (e.g. to implement a policy that most people would appreciate as it unfolds, but which they won't necessarily agree on at the time).

Note that e.g. signaling can seriously distort the picture of wants seen in behavior.

while I see it as about inference of people's hidden wishes behind visible behavior or stated wishes, and maybe as summarization of people's hidden wishes (e.g. to implement a policy that most people would appreciate as it unfolds, but which they won't necessarily agree on at the time).

I would agree that this is sometimes sensible. However, just because a policy pleases people as it unfolds, we should not infer that that policy constituted the people's unique hidden preference.

Events and situations can influence preferences - change what we think of as our values.

Furthermore, it isn't clear where the line between exercising your free will to suppress certain desires and being deluded about your true preferences is.

Basically, this thing is a big mess, philosophically and computationally.

Basically, this thing is a big mess, philosophically and computationally.

The best summation of the topic I've yet come across.

Yes, you're very intelligent. Please expand.

whilst I presume both [Eliezer and Robin] would deny that, in a case of moral disagreement about who has made the "error" and who has got it right, there is any objective truth of the matter.

I expect otherwise. There is a difference between who has got one's preference right, and whose preference is right. The former is meaningful, the latter is not. Two people may prefer different solutions, and both be right, or they may give the same solution and only one of them will be wrong, and they can both agree on who is right or who is wrong in each of these cases. There is no objective truth about what is "objectively preferable", but there is objective truth about what is preferable for a given person, and that person may have an incorrect belief about what that is.

What is preferable for a person here is a two-place word, while who is wrong is a one-place word about what is preferable.

(At least, approximately so, since you'd still need to interpret the two-place function of what is preferable for a given person yourself, adding your own preferences in the mix, but at least where the different people are concerned that influence is much less than the differences given by the person whose morality is being considered. Still, technically, it warrants the opposition to the idea of my-morality vs. your-morality.)

And then, there is the shared moral truth, on which most of the people's preferences agree, but not necessarily what most of the people agree on if you ask them. This is the way in which moral truth is seen through noisy observations.

This system seems problematic for base-level (i.e. assessing murder, theft, etc.) morality. This is because of the limitations of "similarity." We would not want a system that treated all murders as essentially the same. We want one that incorporates certain exceptions. Those exceptions may be "he was trying to kill me, so it was OK that I murder him" or "she did not prove to be a virgin on her wedding night, so I rounded up her brothers and we stoned her to death." It seems like you need a reasonably well-defined parametric morality before you are capable of saying which situations are "similar" and which are "different," since we don't think "She lied about being a virgin" is sufficiently different from the null to merit separate treatment, where "He was trying to kill me" does merit separate treatment.

Though trying to condense morality into singular propositions seems ineffectual. Morality seems to serve many functions, and it deals with vastly different events and behaviours, so assuming a single maxim can properly capture all of morality, or even much of it, seems wrong, or at least unreasonably error-prone.

RH's "People should get what they want..." is the perfect example of this. It's really incredibly weak (in a technical sense), as it says nothing about cases where people do object in any way, and it doesn't even say something about all cases where they don't object ("usually"). It would be much better described as "value demonstrating libertarian belief system" than "universal moral code." It seems like it is a principle at the core of a libertarian ethics system and not present in almost any other ethical system, but it is totally ineffectual at condensing morality, given its tremendously limited scope.

"It is good for people to get what they want when no one else cares" is actually a lot more powerful than it looks. It implies an outcome goodness monotonic in each person's utility of that outcome.

Similarity groups are also magical categories, so you'd just need to redraw the boundaries in your specific contexts.

It seems more useful to defer the problem of "what is right" temporarily and pursue instead the related problem: "what do people think is right?". This latter question can be approached from the perspective of traditional science. One develops a descriptive moral theory that makes a prediction (e.g. "most people will judge the actions of a man who steals to feed his family to be morally acceptable"), and then you test the prediction by giving out a lot of surveys. The theories developed in this way will of course be subject to Occam's Razor, but if a lot of data were obtained, one could justify the construction of complex models.

If a good descriptive moral theory could be found, it would be widely useful. One could then state a strong normative principle: act in such a way that the theory judges to be acceptable. This solution is not perfect, but if we hold out for perfection we will be waiting a long time.

I certainly consider other people's stated moral beliefs, but I'm not ready to completely accept, as a normative principle, the best descriptive moral theory out there. I have a decent idea what the best descriptive moral theory out there looks like, and it's not pretty.

"Robin Hanson, in his essay on "Minimal Morality", suggests that the unreliability of our moral reasoning should lead us to seek simple moral principles"

I very much agree with this. Most specific moral rules people make are in fact nothing more than frequently accurate pre-made decisions about what would be good in a given situation. But asking why a given rule is good rather than bad will often help someone figure out their actual moral code, the one they would have to use in novel situations. Specific moral rules also serve to provide guidance to those who can't properly apply their generalized moral rule, whether from not understanding consequences or from a tendency to game rules that aren't specific enough, and to give pause to people faced with a difficult moral decision, such as whether it is good to murder a very evil person.

Morality does not, and should not, match legality. This is because legality has as a prerequisite the ability to be enforced, whereas morality must be able to guide someone even when there is no outside enforcement. Ambiguities in laws reduce their usefulness: ambiguity weakens the threat of force, can increase inaction through fear of misguided interpretation, and allows the biases of judge and jury to taint judgement. Thus we have and need many specific legal rules, whereas "Do to others as you would have them do to you" is enough to build a complete moral system.

I wonder if you could comment on the desirability, from the perspective of Friendly AI, of trying to figure out the "human utility function" directly.

As I understand it, the current default strategy for Friendliness is to point the seed AI at an instance of humanity and say, turn yourself into an ideal moral agent, as defined by the implicit morality of that entity over there. This requires a method of determining the decision procedure of the entity in question (including some way of neglecting inessentials), extrapolating a species-relative ideal morality from the utility-function-analogue thereby deduced (this is to be the role of "reflective decision theory"), and also some notion of interim Friendliness which ensures (e.g.) that bad things aren't done to the human species by the developing AI while it's still figuring out true Friendliness. If all this can be coupled to a working process of raw cognitive self-enhancement in the developing AI, then success, the first superhuman intelligence will be a Friendly one.

This is a logical strategy and it really does define a research program that might issue in success. However, computational and cognitive neuroscientists are already working to characterize the human brain as a decision system. In an alternative scenario, it would not be necessary to delegate the process of analyzing human decision procedures to the seed AI, because it would already have been worked out by humans. It could serve as an epistemic check for the AI - give it some of the data and see if it reaches the same conclusions - just as basic physics offers a way to test an AI's ability to construct theories. But we don't need an AI to figure out, say, QED for us - we already did that ourselves. The same thing may happen with respect to the study of human morality.

I am trying to contextualize this discussion, given that my background in rhetoric and ethos is far removed from the background of the author.

So, I'm going to ask this simply (pun intended) to hopefully generate some useful complexity.

Is the goal of this analysis to systematize the implementation of pre-established ethical guidelines, or, as implied by Soulless Automaton's comment, to derive the ethical guidelines themselves?

Also, does this assume that ethics are derived from observing behavior and then selecting the best behavior given observed results? (If so, I would have to suggest most ethical choices are trained into people before they have enough experience to act outside of the ethical systems developed through our history. In effect, is this discussion assuming a tabula rasa that doesn't exist?)

My shot at an answer: the goal is to derive a general principle or principles underlying a wide range of (potentially inconsistent) intuitions about what is ethical in a variety of situations.

We could ask people directly about their intuitions, or attempt to discern them from how they actually behave in practice.

Not sure whether this will be at all useful (and apologies if this is pitched either too low or too high - it's always hard to judge these things) but to take a ridiculously pared down example, assume that we quiz two different people on their intuitions about the "goodness", G, of a variety of actions they could perform in a range of different circumstances. To make things really simple, assume that these actions differ only in terms of their effect on total well-being, T, and on inequality in well-being, I. (Assume also that we magically have some set of reasonable scales with which to measure G, T, and I.)

An example of a parametric approach to discerning a "general principle" underlying all these intuitions would be to plot the {G,T,I} triples on a 3-dimensional scatter plot, and to find a plane that comes closest to fitting them all (e.g. by minimizing the total distance between the points and the plane). It would then be possible to articulate a general principle that says something like "goodness increases 2 units for every unit increase in total welfare, and decreases 1 unit for every unit increase in inequality."
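The parametric fit described above can be sketched in a few lines. This is a minimal illustration with made-up {G,T,I} triples chosen to match the "2 units per unit of T, minus 1 per unit of I" example; note that ordinary least squares minimizes squared vertical (G-direction) error rather than true point-to-plane distance, a common simplification:

```python
import numpy as np

# Illustrative intuition data: each row is one quizzed action.
# T = total well-being, I = inequality, G = judged goodness.
T = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
I = np.array([0.0, 1.0, 0.0, 2.0, 1.0])
G = 2.0 * T - 1.0 * I  # noiseless here for clarity; real intuitions would be noisy

# Fit the plane G ~ a*T + b*I + c by least squares.
X = np.column_stack([T, I, np.ones_like(T)])
(a, b, c), *_ = np.linalg.lstsq(X, G, rcond=None)
print(a, b, c)  # recovers approximately (2.0, -1.0, 0.0)
```

With noisy survey data the recovered coefficients would only approximate the generating rule, which is the point of the curve-fitting analogy in the post.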

An example of a non-parametric approach would be to instead determine the goodness of an action simply by taking the average of the nearest two intuitions (which, assuming we have asked each individual the same questions, will just be the two individuals' judgments about the closest action we've quizzed them on). In advance, it's not clear whether there will be any easy way to articulate the resulting "general principle". It might be that goodness sometimes increases with inequality and sometimes decreases, perhaps depending on the level of total well-being; it might be that we get something that looks almost exactly like the plane we would get from the previous parametric approach; or we might get something else entirely.
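The non-parametric alternative can be sketched just as briefly: estimate the goodness of a new action by averaging the two nearest quizzed intuitions in (T, I) space. The data and function name are my own illustrations, not anything from the comment:

```python
import numpy as np

# Quizzed intuitions: (T, I) features and judged goodness G (illustrative data,
# generated as G = 2*T - I so it is comparable to the parametric example).
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 2.0], [4.0, 1.0]])
goodness = np.array([0.0, 1.0, 4.0, 4.0, 7.0])

def knn_goodness(t, i, k=2):
    """Average the k nearest recorded intuitions (Euclidean distance in T-I space)."""
    dists = np.linalg.norm(points - np.array([t, i]), axis=1)
    nearest = np.argsort(dists)[:k]
    return goodness[nearest].mean()

print(knn_goodness(2.0, 1.0))  # averages the two closest quizzed actions -> 2.5
```

Unlike the fitted plane, this estimator makes no commitment to any global functional form; whatever "principle" it embodies is implicit in the data and may or may not admit a compact verbal statement.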

In reality of course, we've got billions of people with billions of different intuitions, we can't necessarily measure G in any sensible way, and the possible actions we might take will differ in all sorts of complicated ways that I've totally ignored here. Indeed, deciding what sort of contextual information to pay attention to, and what to ignore, could easily end up being more important than how we try to fit a hypersurface to the resulting data. In general though, the second sort of approach is likely to do better (a) the more data you have about people's intuitions; (b) the less reasonable you think it is to assume a particular type of general principle; and (c) the less messed up you think case-specific intuitions are likely to be.

"Nonparametric" methods focus too much on the raw observations. It's better to change representation from raw data to something else. What nonparametric methods emphasize is that there is lots of data, and that it's easy to run queries against that data. A representation with too few parameters, on the other hand, can present its results directly, but fails to support different kinds of queries. The right balance is somewhere in between: enough data to run custom queries, but only data that is expected to pay its rent. A parametric approach, but with many parameters.

As I recall, I tried to compress my morality down to One Great Moral Principle when I was 14 years old. My attempt was: "Don't hurt other people". Not too bad for four words.

What's a person? What's hurting? And is there anything else you want out of life besides being harmless?

Yes, your four words carry a charge of positive affect and sound like a good idea, deep wisdom for a 14-year-old that we should all applaud. But that should not make you overlook that - like any four-word English sentence - it would be absolutely terrible as the one and only Great Moral Principle.

My 14-year-old self also had some funny ideas about his reputation existing in the aether - as some kind of psycho-kinetic phenomenon - it was not my intention to claim much in the way of deep wisdom for my younger self.

It is true that this maxim gets to be brief by unloading complexity into the words "people" and "hurt" - but that seems inevitable - and those words do have commonly-accepted definitions that deal with most cases.

The idea was not to capture what I wanted. It was to capture what I thought was wrong. This was a biblical-style morality - a bunch of "thou-shalt-not" statements, with the implication that all else was permitted. Though instead of ten commandments, I only had one.

Looking back, one problem I see is there is no hint of utilitarianism. What if all your choices involve hurting people? There is no hint in the maxim of the concept of greater or lesser hurt.

As I recall, within a year or so, I had given up on attempts to reduce my morality to neat axioms. However, I was to return to the idea at about the age of 17 - when I discovered how much of my behaviour was explained by the theory that living organisms act so as to maximise their inclusive fitness.

I tried the same thing at about that age. I seem to recall my attempt involved something deeply silly about minimizing entropy increase (It would have helped if I'd actually understood entropy at that point in a deeper sense than the handwavy "disorder in a system" sense, perhaps). I also made the mistake of assuming that the ideal correct morality would by necessity fall out of pure logic by the power of careful reasoning, and that therefore morality was an inevitable consequence of intelligence.

I grew out of it a couple years later when I realized that I was being a dumbass.

Having it All Figured Out is a common mistake for (14-16)-year olds to make, I guess.

I seem to recall my attempt involved something deeply silly about minimizing entropy increase

So, you wanted to be a wizard? ;)

It's interesting to me just how very widespread the influence of those books seems to me to be in this community. And really, Buffy is only a modest variation on them.

I agree with your first sentence. The second one - oh, just insert some comical expression of bewilderment.

Now that you mention it, that might have been related. It's been a while...

Having it All Figured Out is a common mistake for (14-16)-year olds to make, I guess.

I agree, but we shouldn't rule out a priori the possibility that someday a 15-year-old really will Figure It All Out.

Hah! Ironically, living systems naturally act so as to maximise entropy increase:


The best way to minimise entropy is probably to destroy as much of the biomass as possible. A short-term gain in entropy - but the net effect on entropy production would be negative - and the more creatures you could kill, the better!

Yes. The irony was not lost on me once I actually grasped what entropy really means.

The description of "deeply silly" was, alas, not false modesty on behalf of my 14-year-old self, trust me.