Natural Value Learning

[-]Steven Byrnes4y50

You seem to be taking for granted: (1) that children learn values at all, (2) that they learn values from their parents. Is that a fair characterization?

For my part, I think (1) is complicated (because I think very important parts of "values" are hardwired by the genes) and that (2) is mostly false (because I suspect children predominantly learn culture from their peers and from older children, and learn it much less from the previous generation).

[-]Chris van Merwijk4y30

Not really a fair characterization I think: 2 mostly seems orthogonal to me (though I probably disagree with your claim. i.e. most important things are passed from previous generations. e.g. children learn that theft is bad, racism is bad etc, all of those things are passed from either parents or other adults. I don't care a lot about the distinction parents vs other adults/society in this case. I know about the research that parenting has little influence, I don't want to go into it preferably). 1 seems more relevant. In fact maybe the main reason for me to think this post is irrelevant is that the inductive biases in AI systems will be too different from that of humans (although note, genes still alow for a lot of variability in ethics and so on). But I still think it might be a good idea to keep in mind that "information in the brain about values has a higher risk to not get communicated into the training signal if the method of elliciting that information is not adapted to the way humans normally express the information", if indeed it is true.

[-]Steven Byrnes4y30

e.g. children learn that theft is bad, racism is bad etc, all of those things are passed from either parents or other adults

If a kid’s parents and teachers and other authority figures tell them that stealing is bad, while everyone in the kid’s peer group (and the next few grades up) steal all the time, and they never get in trouble, and they talk endlessly about how awesome it is, I think there’s a very good chance that the kid will wind up feeling that stealing is great, just make sure the adults don’t find out.

I speak from personal experience! As a kid, I used the original Napster to illegally download music. My parents categorized illegal music downloads as a type of theft, and therefore terribly unethical. So I did it without telling them. :-P

As a more mundane example, I recall that my parents and everyone in their generation thought that clothes should fit on your body, while my friends in middle school thought that clothes should be much much too large for your body. You can guess what size clothing I desperately wanted to wear.

(I think there’s some variation from kid to kid. Certainly some kids at some ages look up to their parents and feel motivated to be like them.)

inductive biases in AI systems will be too different from that of humans

In my mind, “different inductive bias” is less important here than “different reward functions”. (Details.) For example, high-functioning psychopaths are perfectly capable of understanding and imitating the cultural norms that they grew up in. They just don’t want to.

although note, genes still allow for a lot of variability in ethics and so on

I agree that cultures exist and are not identical.

I tend to think that learning and following the norms of a particular culture (further discussion) isn’t too hard a problem for an AGI which is motivated to do so, and hence I think of that motivation as the hard part. By contrast, I think I’m much more open-minded than you to the idea that there might be lots of ways to do the actual cultural learning. For example, the “natural” way for humans to learn Bedouin culture is to grow up as a Bedouin. But I think it’s fair to say that humans can also learn Bedouin culture quite well by growing up in a different culture and then moving into a Bedouin culture as an adult. And I think humans can even (to a lesser-but-still-significant extent) learn Bedouin culture by reading about it and watching YouTube videos etc.

[-]Chris van Merwijk4y*30

"I tend to think that learning and following the norms of a particular culture (further discussion) isn’t too hard a problem for an AGI which is motivated to do so". If the AGI is motivated to do so then the value learning problem is already solved and nothing else matters (in particular my post becomes irrelevant), because indeed it can learn the further details in whichever way it wants. We somehow already managed to create an agent with an internal objective that points to Bedouin culture (human values), which is the whole/complete problem.

I could say more about the rest of your comment but just checking if the above changes your model of my model significantly?

Also regarding "I think I’m much more open-minded than you to ...": to be clear, I'm not at all convinced about this I'm open to this distinction not mattering at all. I hope I didn't come accross as not open minded about this.

[-]Steven Byrnes4y20

There’s sorta a use/mention distinction between:

An AGI with the motivation “I want to follow London cultural norms (whatever those are)”, versus
An AGI with the motivation “I want to follow the following 500 rules (avoid public nudity, speak English, don’t lick strangers, …), which by the way comprise London cultural norms as I understand them”

Normally I think of “value learning” (or in this case, “norm learning”) as related to the second bullet point—i.e., the AI watches one or more people and learn their actual preferences and desires. I also had the impression that your OP was along the lines of the second (not first) bullet point.

If that’s right, and if we figure out how to make an agent with the first-bullet-point motivation, then I wouldn’t say that “the value learning problem is already solved”, instead I would say that we have made great progress towards safe & beneficial AGI in a way that does not involve “solving value learning”. Instead the agent will hopefully go ahead and solve value learning all by itself.

(I’m not confident that my definitions here are standard or correct, and I’m certainly oversimplifying in various ways.)

[-]Dagon4y40

Upvoted for an interesting direction of exploration, but I'm not sure I agree with (or understand, perhaps) the underlying assumption that "natural-feeling" is more likely to be safe or good. This seems a little different from the common naturalistic fallacy (what's natural is always good, what's artificial is always bad). It's more a glossing over the underlying problem that we have no Safe Natural Intelligence - people are highly variable and many many of them are terrifying and horrible.

[-]Chris van Merwijk4y*30

The thing underlying the intuition is more something like: We have a method of feedback that humans understand and that works fairly well, and is adapted to the way values are stored in human brains. If we try to have humans give feedback in ways that are not adapted to that, I expect information to be lost. The fact that it "feels natural" is a proxy for "the method of feedback to machines is adapted to the way humans normally give feedback to other humans" without which I am at least concerned about information loss (not claiming it's inevitable). I don't inherently care about the "feeling" of naturalness.

Regarding no Safe Natural Intelligence: I agree that there is no such thing, but this is not really a strong argument against? This doesn't make me somehow suddenly feel comfortable about "unnatural" (I need a better term) methods for humans to provide feedback to AI agents. The fact that there are bad people doesn't negate the idea that the only source of information about what is good seems to be stored in brains and that we need to extract this information in a way that is adapted to how those brains normally express that information.

Maybe I should have called it "human-adapted methods of human feedback" or something.

[-]Dagon4y20

Regarding no Safe Natural Intelligence: I agree that there is no such thing, but this is not really a strong argument against?

I think it's a pretty strong argument. There are no humans I'd trust with the massively expanded capabilities that AI will bring, so I have to believe that the training methods for humans are insufficient.

We WANT divergence from "business as usual" human beliefs and actions, and one of the ways to get there is by different specifications and training mechanisms. The hard part is we don't yet know how to specify precisely how we want it to differ.

[-]Charlie Steiner4y20

I dunno, I'm not at all sure what "naturalness" is supposed to be doing here below the appearance level - how are the algorithms different?

[-]Chris van Merwijk4y30

I haven't specified anything about the algorithms, but they will maybe somehow have to be different. The point is that the format of the human feedback is different. Really this post is about the format in which humans provide feedback rather than about the structure of the AI systems (i.e. a difference in method of generating the training signal rather than a difference in learning algorithm).

Natural alignment	Unnatural alignment
Humans play the same role, and do the same kinds of things, within the process of machine value learning as they do when teaching values to children.	Humans in some significant way play a different role, or have to behave differently within the process of machine value learning than they do when teaching values to children.
The machine value learning process is adapted to humans.	Humans have to adapt to the machine value learning process.
Humans who aren’t habituated to the technical problems of AI or machine value learning would still consider the process as it unfolds to be natural. They can intuitively think of the process in analogy to the process as it has played out in their experience with humans.	Humans who aren’t habituated to the technical problems of AI or machine value learning would perceive the machine value learning process to be unnatural, alien or “computer-like”. If they naively used their intuitions and habits from teaching values to children, they would be confused in important ways.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

7

Natural Value Learning

7

7

Natural value learning

Concrete examples

Why does this distinction matter?