Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I've mentioned conditional preferences before. These are preferences that are dependent on facts about the world, for example "I'd want to believe X if there are strong argument for X".

But there is another type of preference that is conditional: my tastes can vary depending on circumstances and on my past experience. For example, I might prefer to eat apples during the week and oranges on weekends. Or, because of the miracle of boredom, I might prefer oranges if (but only if) I've been eating apples all week so far.

What if I currently want apples, would want oranges tomorrow, but falsely believe (today) that I would want apples tomorrow? This is a known problem with "one-step hypotheticals", and a strong argument in practice for assessing preferences over time rather than at a single moment .

In theory, there are meta-preferences that allow one to get this even at a single moment , such as "I want to be able to follow my different tastes at different times" or a more formalised desire for variety and exploration.

New Comment
1 comment, sorted by Click to highlight new comments since: Today at 5:15 PM

I strongly suspect those meta-preferences are both critical for correct extrapolation of human values/preferences, AND are the place where we'll find a fair bit of actual inconsistency of human desires.

"I want to be able to follow my illegible whims" seems like a very common and strong meta-preference, and I haven't seen it modeled well in any discussions.