I am using the character IT from the movie "It" as a metaphor for a problem in AGI research. AI can potentially learn (or may have already learned) about all our irresistible pleasures and agonizing fears, much as IT does in the film.

AGI initially “knows”, or can deduce things from, only what we provide to it. Ideally, we want AGI to know exactly as much information as it needs (no more, no less), so it can provide us with the most relevant inferences to inform our decisions. Importantly, we don’t want to give AGI all the information we have, because we want to preserve privacy and autonomy.

Our goals and values are not stable; they change as new information is received. Here, “our” refers to different levels: the goals and values of individuals, groups, organizations, nations, and society as a whole.

Question: Can we already understand “what we truly want” as individuals, groups, and society, and what is best for us, by extracting the most relevant data we have and applying appropriate machine-learning tools? The same question applies to “what we truly fear” and what is actually worst for us: is it possible to extract that from the data already?


Answer by Isnasene (Oct 28, 2019):
I don't think we can understand things like "what we truly want" just by using the appropriate machine-learning tools.

To start off, Occam's razor is insufficient to infer the preferences of irrational agents. Since humans are generally irrational, this implies that, even with all human behavioral data, machine-learning tools and analysis will be insufficient to infer our values.
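To make this concrete, here is a minimal toy sketch (my own illustration, not something from the answer above): a Boltzmann-rational planner with reward R and an anti-rational planner with reward -R produce exactly the same observed behavior, so behavioral data alone cannot distinguish between the two value assignments. The softmax planner and all the numbers are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
true_reward = rng.normal(size=(n_states, n_actions))

def softmax_policy(reward, beta):
    """Boltzmann planner: beta > 0 is 'rational', beta < 0 is 'anti-rational'."""
    z = beta * reward
    z -= z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Observed behavior generated by a mildly rational agent with the true reward.
observed = softmax_policy(true_reward, beta=1.0)

# Alternative explanation: an anti-rational agent with the reward negated
# produces the identical policy, so the data cannot tell the two apart.
alternative = softmax_policy(-true_reward, beta=-1.0)

print(np.allclose(observed, alternative))  # True: same behavior, opposite values
```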

If we're trying to learn about our own values, though, with machine learning to help us, then we might be able to reduce the above issue by making object-level assumptions about our values and the ways in which we are irrational. Unfortunately, it's hard to tell from the inside whether the things we think we care about are things we actually care about or merely artifacts of our irrationality. Furthermore, when we push these kinds of questions onto advanced machine learning (near AGI level), the answer will probably be algorithm-dependent.

Practical Example:

Consider whether we should let an incomprehensibly massive number of people get dust specks in their eyes or subject one person to torture. Most people's intuitions prompt them to save the one person from torture. This can be explained in two different ways:

#1. People being irrationally scope-insensitive about how bad it is to inconvenience such an incomprehensibly massive number of people, even if each inconvenience is very minor (this is the explanation given by most rationalists)

or

#2. Torture being infinitely bad (this is the answer most people feel is intuitive)

How would (or should!) machine learning decide between the above two explanations? There are many ways you could do it, and they give different answers. If your machine-learning algorithm learns scope-insensitivity in general and adjusts for it, it will tend to conclude #1. If your machine-learning algorithm isolates this problem specifically and directly queries people for their opinion on it upon reflection, it might conclude #2. To get one of these answers, we have to make a normative assumption (without machine learning!) about how we want our machine-learning algorithms to learn.
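As a rough sketch of the point (not anyone's actual proposal), the toy code below shows how the modeling choice alone flips the verdict: one model "corrects" for scope insensitivity by aggregating disutility linearly over people, while the other takes the reflective judgment that torture is lexically worse at face value. All the constants (population size, disutilities) are made-up numbers for illustration only.

```python
N_PEOPLE = 3**27          # stand-in for an "incomprehensibly massive" number of dust-speck victims
SPECK_BADNESS = 1e-6      # assumed disutility of one dust speck (arbitrary units)
TORTURE_BADNESS = 1e9     # assumed disutility of torturing one person (arbitrary units)

# Model #1: treat stated judgments as scope-insensitive and "correct" them
# by aggregating disutility linearly across all affected people.
specks_total = N_PEOPLE * SPECK_BADNESS
choice_1 = "torture one person" if specks_total > TORTURE_BADNESS else "dust specks"

# Model #2: take the reflective judgment at face value; torture is treated as
# lexically (infinitely) worse than any finite sum of minor inconveniences.
choice_2 = "dust specks"

print(choice_1)  # 'torture one person' is the lesser evil under linear aggregation
print(choice_2)  # 'dust specks' is the lesser evil under the lexical-priority model
```

Neither output tells you which model is the right one to use; that choice is exactly the normative assumption the paragraph above points to.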