
I strongly suspect those meta-preferences are both critical for correct extrapolation of human values/preferences, AND are the place where we'll find a fair bit of actual inconsistency of human desires.

"I want to be able to follow my illegible whims" seems like a very common and strong meta-preference, and I haven't seen it modeled well in any discussions.


More from Stuart_Armstrong


I've mentioned conditional preferences before. These are preferences that depend on facts about the world, for example "I'd want to believe X if there are strong arguments for X".

But there is another type of preference that is conditional: my tastes can vary depending on circumstances and on my past experience. For example, I might prefer to eat apples during the week and oranges on weekends. Or, because of the miracle of boredom, I might prefer oranges if (but only if) I've been eating apples all week so far.
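As a minimal sketch of these two kinds of taste, one depending on circumstances and one depending on past experience, each can be written as a small function. The function names, the day-numbering convention (0 = Monday), and the choice to default to apples when the week's history is empty are all assumptions made for illustration, not part of the argument.

```python
from typing import Sequence


def preferred_by_day(day_index: int) -> str:
    """Taste conditional on circumstances.

    Assumed convention: 0 = Monday, ..., 6 = Sunday.
    Apples during the week, oranges on weekends.
    """
    return "apples" if day_index % 7 < 5 else "oranges"


def preferred_by_history(eaten_this_week: Sequence[str]) -> str:
    """Taste conditional on past experience (boredom).

    Oranges if, but only if, everything eaten so far this week was apples.
    Assumption: with no history yet, there is no boredom, so apples.
    """
    bored_of_apples = len(eaten_this_week) > 0 and all(
        food == "apples" for food in eaten_this_week
    )
    return "oranges" if bored_of_apples else "apples"
```

The second function cannot be evaluated without knowing the agent's history, which is what distinguishes it from a preference that depends only on the current state of the world.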

What if I currently want apples, would want oranges tomorrow, but falsely believe (today) that I would want apples tomorrow? This is a known problem with "one-step hypotheticals", and a strong argument in practice for assessing preferences over time rather than at a single moment $t$.
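The gap between the two assessment methods can be made concrete with a toy model. This is only a sketch under stated assumptions: the hypothetical agent truly wants apples on day 0 and oranges afterwards, and its false belief takes the simplest possible form, namely projecting its current taste onto every future day. All names below are invented for illustration.

```python
from typing import Dict, Iterable


def actual_preference(day: int) -> str:
    """Hypothetical agent's true taste: apples on day 0, oranges afterwards."""
    return "apples" if day == 0 else "oranges"


def believed_preference(asked_on: int, about_day: int) -> str:
    """The agent's answer, given on day `asked_on`, about what it will want on `about_day`.

    Assumed form of the false belief: the agent projects its current taste
    onto every other day, so `about_day` is ignored.
    """
    return actual_preference(asked_on)


def one_step_estimate(days: Iterable[int], asked_on: int = 0) -> Dict[int, str]:
    """Ask once, at a single moment, about every day (a one-step hypothetical)."""
    return {day: believed_preference(asked_on, day) for day in days}


def over_time_estimate(days: Iterable[int]) -> Dict[int, str]:
    """Observe the preference on each day as that day arrives."""
    return {day: actual_preference(day) for day in days}


if __name__ == "__main__":
    print(one_step_estimate([0, 1]))   # what the agent says today
    print(over_time_estimate([0, 1]))  # what the agent wants on each day
```

Under these assumptions the two estimates agree about today and disagree about tomorrow, and the one-step answer is the one that is wrong.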

In theory, there are meta-preferences that allow one to get this even at a single moment $t$, such as "I want to be able to follow my different tastes at different times" or a more formalised desire for variety and exploration.
