A short dialogue on comparability of values

Said Achmiz (2y)

Q: Very well. Imagine a person in a room faced with two buttons, saying “one mile is heavier than one hour” and “vice versa”.

Really it should be three buttons, the third one saying “they are of equal weight”… ;)

(Great post!)

Dagon (2y)

I'm not sure I understand the claim or hypothesis behind this post. It's something about the meanings of "value" and "believe" in terms of evidence from statements or button-pushes, but I don't see what's confusing about it in the first place.

In my view, people have preferences, and have beliefs about causality, which are expressed through actions that (are intended to) influence future world-states. This is VERY NOISY, because brains kind of suck, and because the complexity of the real world really is too big to fit into anyone's models. Instead of

we can ask people to choose between things, therefore people have preferences

I'd say something like "people take actions and have behaviors, and to the extent they are consistent, this implies preferences". No part of "we ask" or verbalizing those preferences is required. Preferences and values are the choices people make, not the things they say.
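One way to make "to the extent they are consistent" concrete: record each observed choice as a (chosen, rejected) pair and ask whether a single ordering agrees with all of them. The sketch below is illustrative only; `infer_ranking` is a hypothetical helper, and it assumes noise-free strict choices, which real behavior is not. A cycle means no ranking fits every choice; otherwise a topological sort yields one (not necessarily unique). This recovers at most an ordinal ranking, well short of VNM utilities over lotteries.

```python
from graphlib import TopologicalSorter, CycleError


def infer_ranking(choices):
    """Return a best-to-worst ordering consistent with every (chosen, rejected)
    pair, or None if the pairs contain a cycle (A over B, B over C, C over A)."""
    # Map each option to the set of options it was chosen over.
    # graphlib treats these as predecessors, so static_order() emits
    # rejected (worse) options before the options chosen over them.
    graph = {}
    for chosen, rejected in choices:
        graph.setdefault(chosen, set()).add(rejected)
        graph.setdefault(rejected, set())
    try:
        worst_first = list(TopologicalSorter(graph).static_order())
    except CycleError:
        # Intransitive choices: no single ranking explains them all.
        return None
    return worst_first[::-1]
```

For example, `infer_ranking([("tea", "coffee"), ("coffee", "water")])` returns `["tea", "coffee", "water"]`, while adding `("water", "tea")` creates a cycle and makes it return `None`.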

cousin_it (2y)

Inferring preferences from actions is also philosophically tricky. My favorite reference is this old comment thread.

Wei:

let’s say it models the world as a 2D grid of cells that have intrinsic color... What does this robot “actually want”, given that the world is not really a 2D grid of cells that have intrinsic color?

steven0461:

Who cares about the question what the robot “actually wants”? Certainly not the robot. Humans care about the question what they “actually want”, but that’s because they have additional structure that this robot lacks. But with humans, you’re not limited to just looking at what they do on auto-pilot; instead, you can just ask

So with my post I'm trying to continue that line. It was understood (I hope!) that inferring preferences from actions would lead to something evolutionarily messy and selfish that you wouldn't endorse if shown the description. And now I'm trying to show that inferring preferences by asking is also kind of meaningless.

Dagon (2y)

Hmm. I guess I start with the knowledge that humans don't seem to be VNM-consistent, so it's quite reasonable to start by tabooing "want" and "prefer", because they don't apply in the way that's usually studied and analyzed.

I disagree with steven0461 that "just ask" provides any more information than watching an artificial choice. Both are trying to infer something that doesn't exist from something easily observable.

For many humans, we CAN say they "currently prefer" the expected outcome of an actual choice they make, but that's a pretty weak and circular definition.

So - what do you hope to actually model about an individual human that you're using the word "want" for?

cousin_it (2y)

The overarching problem is figuring out human preferences so that AI can fulfill them. We're all on the same page that humans aren't VNM-consistent.

Dagon (2y)

Ah, yeah. That’s why I’m not very hopeful about AI alignment. I don’t think anyone’s even defined the problem in a useful way.

Neither humans as a class nor most humans as individuals HAVE preferences, as preferences are conceived today, that AI is able to fulfill or even be compatible with. We MAY have mental frameworks that let our preferences evolve to survive well in an AI-containing world.

Vladimir_Nesov (2y)

Search for meaning can be part of the activity. I think there is a sensible illustration from the old model of UDT, where there's an agent A() and a world U(), and we want to look for dependencies D(-) such that D(A) serves as a proxy for U(), and also such that D has A factored out of it, so that D itself doesn't depend on A (so the dependence isn't spurious). This prevents cyclic reasoning when A makes decisions based on D. Here, we start with A and U as given, and then figure out D, which serves as the correspondence meaning of A in terms of its acausal influence on U. So the meaning of A is logically downstream of the definition of A.

When we label buttons with "2+2=5" and "2+2=7", the physical-world outcomes of pressing them are not on the way to the U() of their A(), so they are not relevant. But those outcomes are on the way to the human's U(), even while the human still doesn't know the meaning of their actions, since that meaning is downstream of knowing the scope of the semantic outcomes they already know to care about. This difference in the scopes of intended outcomes is the disanalogy.
