I agree that trying to map all human values is extremely complex, as articulated here [http://wiki.lesswrong.com/wiki/Complexity_of_value], but the problem, as I see it, is that we do not really have a choice: there has to be some way of measuring the initial AGI to see how it is handling these concepts.
I don't understand why we don't try to prototype a high-level ontology of core values for an AGI to adhere to: something that humans can discuss and argue about for many years before we actually build an AGI.

Law is a useful example which shows that human values cannot be absolutely quantified into a universal system. The law is constantly abused, misused and corrected, so if a similar system were put in place for an AGI, it could quickly lead to UFAI.

One of the interesting things about the law is that for core concepts like murder, the rules are well defined and fairly unambiguous, whereas more trivial things (in terms of risk to humans), like tax and parking laws, are the parts with a lot of complexity to them.

 

 


One of the interesting things about the law is that for core concepts like murder, the rules are well defined and fairly unambiguous,

It is quite controversial whether or not abortion is murder. I would guess that the current US Supreme Court would rule it murder or manslaughter to hit a woman who is eight months pregnant, against her own preference, hard enough to kill her unborn child.

The same goes for the actions of soldiers. Is George Bush a murderer because he started an aggressive war against Iraq in which a lot of people died?

If I sequence the DNA of a Neanderthal and then let one be born via a human mother, do I engage in murder when I kill the individual?

What about Florida's stand-your-ground law?

Is it murder/manslaughter to do cryonics on a terminally ill person before their brain ceases to produce signals that are visible on an EEG?

Is it murder to unfreeze a person who is in cryonic suspension?

Is the person who doesn't push the fat man murdering the five people on the track?

This shows that there are few human value concepts that are not dependent on or related to lots of other values. Good luck mapping all these relations by hand.


Yes, there will always be controversy across countries and cultures, but this doesn't mean we shouldn't make a start on working out a system. In fact it highlights that we should be doing this, if for no other reason than to get an idea of which of these issues are important to the majority of humans. The point of the question was: why are we not defining the core values that are important to humans for an AGI, so that for the complex cases we could tell the AGI to 'leave it to the humans to decide'?

The point of the question was: why are we not defining the core values that are important to humans for an AGI, so that for the complex cases we could tell the AGI to 'leave it to the humans to decide'?

I think the main reason is that we don't believe AGI works in a way where you could meaningfully tell it to let humans decide the complex cases.

Apart from that, there are already plenty of philosophers engaged in working out which issues are important to humans and in what ways they are important.

I don't see much FAI-use for mapping human values because I expect to need to solve the value-loading problem via indirect normativity rather than direct specification (see Bostrom 2014).

Also, there is quite a lot of effort going into this already: World Values Survey, moral foundations theory, etc.

I expect to need to solve the value-loading problem via indirect normativity rather than direct specification (see Bostrom 2014).

What does this mean?

The value-loading problem is the problem of getting an AI to value certain things, that is, of writing its utility function. In solving this problem, you can try to hard-code something into the function, like "paperclips good!". This is direct specification: writing a function that values certain things directly. But when we want to make an AI value things like "doing the right thing", this becomes infeasible.

Instead, you could solve the problem by having the AI figure out what you want by itself. The idea is that the AI can work out the aggregate of human morality and act accordingly after simply being told to "do what I mean" or something similar. While this might require more cognitive work by the AI, it is almost certainly safer than trying to formalize morality ourselves. In theory, this way of solving the problem avoids an AI that suddenly breaks down on some edge case, for example a smile-maximizer filling the galaxy with tiny smileys instead of happy humans having fun.
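To make the contrast concrete, here is a toy sketch (my own illustration, not anything from Bostrom or the liveblog; every name and structure in it is invented): direct specification hard-codes the scoring rule, while indirect normativity defines the utility function as a pointer to an inference procedure the AI must carry out.

```python
# Toy contrast between direct specification and indirect normativity.
# Everything here is an illustrative placeholder, not a real FAI design.
from dataclasses import dataclass

@dataclass
class WorldState:
    paperclips: int
    smiling_faces: int
    humans_having_fun: int

def direct_utility(state: WorldState) -> float:
    # Direct specification: the valued quantity is hard-coded by the programmers.
    # Edge cases we failed to anticipate (tiny smileys instead of happy humans)
    # are scored exactly as written, whether or not we would endorse the result.
    return float(state.smiling_faces)

def indirect_utility(state: WorldState, infer_human_values) -> float:
    # Indirect normativity: the utility function is a pointer to whatever the
    # AI's best model says humans actually want. The hard work moves from
    # writing the scoring rule to specifying `infer_human_values`.
    value_model = infer_human_values()
    return value_model(state)

def toy_inference():
    # Stand-in for the genuinely hard, unsolved part: inferring human values.
    return lambda s: float(s.humans_having_fun)

state = WorldState(paperclips=0, smiling_faces=10**9, humans_having_fun=3)
print(direct_utility(state))                   # rewards the galaxy of tiny smileys
print(indirect_utility(state, toy_inference))  # rewards what we actually meant
```

All of the difficulty, of course, hides inside the stand-in `toy_inference` step; the sketch only shows where that difficulty lives.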

This is all a loose paraphrase of the last liveblogging event EY held in the FB group, where he discussed open problems in FAI.

"Mapping human values" strikes me as being just as productive an activity for AGI development as mapping constellations is for astronomy: it is superficial and the really important stuff is much deeper. Sample questions: Where do values come from? How does one simulate a value for an optimizer? Is value even the right level of thinking or is it a byproduct of something more fundamental?

Didn't most astronomical progress follow and depend upon the mapping of constellations? Even when it turned out that there wasn't much more than random chance to the patterns of stars, precisely mapping those patterns let us compute parallaxes, record and eventually predict planetary motion, etc. Biology and chemistry started with similar tedious "cataloging" activities long before theories developed to unify all that data... and now this is starting to feel like less of an analogy and less of a coincidence: if you want a good theory to explain a complicated phenomenon, don't you almost have to start by accumulating a lot of data?

Mapping stars and especially mapping planets turned out to be really important for the development of astronomy. Constellations turn out to be a useless concept. Asking lots of people what constellations they see or where they think the boundaries are would have been wasted astronomical effort.

To return to the real topic under discussion: It might be the case that values are useless and we should only talk about preferences, or somesuch. I am agnostic on this point; I wanted to give an example of how some concept might turn out to be not worth collecting empirical data on.

Legal systems are maps of human values.

Legal systems are (at best) a satisficer's map of human values: necessary but insufficient. If you want to spend your life making as many paperclips as possible within the constraints of the law, we'll be disappointed in you, but we won't forcibly stop you, because it's your life. If an AGI decides it wants the same thing, we'd rightly consider it a failure, even if we were lucky enough that it wanted to be similarly constrained.

I don't understand why we don't try to prototype a high-level ontology of core values

Philosophers and theologians have been doing this for ages -- at least since the time of the Ancient Greeks.

A bit of familiarity with history will illuminate... issues with this approach.

Are you kidding me? I'm staring right now, beside me, at a textbook chapter filled with catalogings of human values, with a list of ten that seem universal, with theories on how to classify values, all with citations of dozens of studies: Chapter 7, Values, of Chris Peterson's A Primer In Positive Psychology.

LessWrong is so insular sometimes. Take lionhearted's post Flashes of Nondecisionmaking yesterday: as if neither he nor most of the commenters had heard that we are, indeed, driven largely by habit (e.g. The Power of Habit; Self-Directed Behavior; The Procrastination Equation; ALL GOOD SELF HELP EVER), and that the folk conception of free will might be wrong (which has long been established; argued e.g. in Sam Harris's Free Will).

There are ambiguous edge cases for murder too: self-defense, abortion, war, hunting, animal experiments, euthanasia...


True, you have listed six edge cases for murder. Even if you have listed only 1% of the edge cases (though it is more likely 20-70%), we could still classify and document them all at a fairly low cost (in terms of human-years of effort).

...for core concepts like murder, the rules are well defined and fairly unambiguous, whereas more trivial things (in terms of risk to humans), like tax and parking laws, are the parts with a lot of complexity to them.

I suspect there's a lot of hidden complexity here. The malice requirement for murder, for example, strikes me as the sort of thing that would be hard to get an algorithm to recognize; similar problems might arise in mapping out the boundary between premeditated and non-premeditated murder (in jurisdictions where it's significant), figuring out culpability in cases of murder by indirect means, determining whether a self-defense claim is justified, etc.

Tax law (e.g.) has more surface complexity, but it also looks more mechanistic to me. I don't think this has to do with risk so much as with its distance from domains we're cognitively optimized for.

A typical (which may not be the most common) related local belief is that it's both possible (practically speaking) and necessary to build an AGI which we know to be more reliable than human brains.

If one posits this, then comparing the initial AGI's results to the output of our own brains and rejecting the AGI if it fails to match is obviously silly. It's kind of like the role of Linnean taxonomy in evaluating DNA-based graphs of species relationships... where they conflict, we simply note that the Linnean taxonomy was wrong and move on.

That's not to say a taxonomy of human value is useless... it might be good for something, just as Linnean taxonomy might have once been. It might document value drift over time, for example, or value variation in different communities.

But given local assumptions, it doesn't do much towards AGI, so it's not too surprising that there's not much effort going in those directions here.

If one posits this, then comparing the initial AGI's results to the output of our own brains and rejecting the AGI if it fails to match is obviously silly.

The first part of the original plan for CEV is to get an AI to work out human values from all the humans. Without having some idea as to how it would do this, it appears to be a magical step. So asking the question seems a reasonable thing to do.

Well, there's a whole lot of magic going on here.

As I understand the original CEV plan, the idea is that the gadget that derives human values from human brains is itself understood to be more reliable than human brains.

So no, it doesn't actually make sense according to this theory to say "this is the output we expect from the gadget, according to our brains; let's compare the actual output of the gadget to the output of our brains and reject the gadget if they don't match."

That said, it certainly makes sense to ask "how are we supposed to actually know that we've built this gadget in the first place??!??!" I do not understand, and have never understood, how we're supposed to know on this theory that we've actually built the gadget properly and didn't miss a decimal point somewhere... I've been asking this question since I first came across the CEV idea, years ago.


The idea is that if you have a CEV gadget, you can ask it moral questions and it will come up with answers that do, in fact, look good. You wouldn't have thought of them yourself (because you're not that clever and don't have a built-in population ethic, blah blah blah), but once you see them, you definitely prefer them.

That's... interesting.

Is there a writeup somewhere of why we should expect an unaltered me to endorse the proposals that would, if implemented, best instantiate our coherent extrapolated volition?

I'm having a very hard time seeing why I should expect that; it seems to assume a level of consistency between what I endorse, what I desire, and what I "actually want" (in the CEV sense) that just doesn't seem true of humans.


I guess the simplest thing I can say is: there's a lot of stuff we don't think of because our hypothesis space consists only of things we've seen before. We expect that an AGI, being more intelligent than any individual human, could afford a larger hypothesis space and sift it better, which is why it would be capable of coming up with courses of action we value highly but did not, ourselves, invent.

Think retrospectively: nobody living 10,000 years ago would have predicted the existence of bread, beer, baseball, or automobiles. And yet, modern humans find ways to like all of those things (except baseball ;-)).

All else failing, something like CEV or another form of indirect normativity should at least give us an AI Friendly enough that we can try to use an injunction architecture to restrict it to following our orders or something, and it will want to follow the intent behind the orders.

If you're this skeptical about CEV, would you like to correspond by email about an alternative FAI approach under development, called value learners? I've been putting some tiny bit of thought into them on the occasional Saturday. I can send you the Google Doc of my notes.

Well, I certainly agree that there's lots of things we don't think about, and that a sufficiently intelligent system can come up with courses of action that humans will endorse, and that humans will like all kinds of things that they would not have endorsed ahead of time... for that matter, humans like all kinds of things that they simultaneously don't endorse.

And no, not really interested in private discussion of alternate FAI approaches, though if you made a post about it I'd probably read it.


a sufficiently intelligent system can come up with courses of action that humans will endorse, and that humans will like all kinds of things that they would not have endorsed ahead of time... for that matter, humans like all kinds of things that they simultaneously don't endorse.

Generally we aim to come up with things humans will both like and endorse. Optimizing for "like" but not "endorse" leads to various forms of drugging or wireheading (even if Eliezer does disturb me by being tempted towards such things). Optimizing for "endorse" but not "like" sounds like carrying the dystopia we currently call "real life" to its logical, horrid conclusion.

if you made a post about it I'd probably read it.

How well-founded does a set of notes or thoughts have to be in order to be worth posting here?

we aim to come up with things humans will both like and endorse

(shrug) Well, OK. If I consider the set of plans A which maximize our values when implemented, and the set of plans B which we endorse when they're explained to us, I'm prepared to believe that the intersection of A and B is nonempty. And really, any technique that stands a chance worth considering of coming up with anything in A is sufficiently outside my experience that I won't express an opinion about whether it's noticeably less likely to come up with something in that intersection. So, go for it, I guess.

How well-founded does a set of notes or thoughts have to be in order to be worth posting here?

Depends on whom you ask. I'd say it's the product of (novel, relevant, concise, entertaining, coherent) that gets compared to a threshold; well-founded is a nice benny but not critical. That said, posts that don't make the threshold will frequently be berated for being ill-founded if they are.

Mapping human values is even more difficult than mapping everyday human concepts, as e.g. Cyc did/tried. It means putting vagueness into exact symbolic form. And by vagueness I don't mean 'I don't care' but 'related in a varying way': varying with respect to other relations (recursively) and varying with individual differences.

If we really tried to map human values symbolically, we'd have to map each individual's values symbolically too and then symbolically aggregate them.

I don't think that we can do that. An AGI could, but by then it would be too late.

What we can do is map human values vaguely. We could, for example, train large deep neural nets to learn and approximate these concepts from whatever evidence we feed them, and then look at the inferred structure to see whether it is sufficiently close to what we want. That way we do not have to do the mapping ourselves; only the checking.
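A minimal sketch of what that learned, vague mapping might look like (assuming toy, invented data and a scikit-learn classifier; nothing here is a real value dataset or a real training setup):

```python
# Toy sketch: approximate one fuzzy value concept ("is this act blameworthy?")
# from labeled human judgments instead of hand-writing symbolic rules.
# The features, labels, and probe cases below are invented for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Each row: [harm_caused, intent_to_harm, victim_consented, acted_in_self_defense]
X = np.array([
    [1.0, 1.0, 0.0, 0.0],  # deliberate, non-consensual harm
    [1.0, 0.0, 0.0, 0.0],  # accidental harm
    [1.0, 1.0, 1.0, 0.0],  # harm the victim consented to
    [1.0, 1.0, 0.0, 1.0],  # harm done in self-defense
    [0.0, 1.0, 0.0, 0.0],  # malicious intent, no harm resulted
    [0.0, 0.0, 0.0, 0.0],  # nothing happened
])
y = np.array([1, 0, 0, 0, 0, 0])  # aggregated human judgments: 1 = blameworthy

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)

# "Only the checking": instead of writing the rules, probe the learned mapping
# on cases we care about and compare its judgments against our own.
probe = np.array([
    [1.0, 1.0, 0.0, 0.0],  # should look blameworthy
    [1.0, 1.0, 0.0, 1.0],  # self-defense edge case
])
print(model.predict_proba(probe))
```

The real proposal would of course need vastly more data and a far richer representation of situations; the sketch only illustrates the division of labour described above: the net does the mapping, humans do the checking.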


What we *can* do is 'map' human values vaguely.

This is the point I am trying to make: I think we should be starting some sort of mapping, and having a nice long argument about it, well before an AGI is realised, so that humans can try to work out some sort of agreement before the AGI makes a very fast calculation.

But maybe we should only formalize the mapping process and let the AGI carry it out?