Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I just want to make the brief point that many human meta-preferences are conditional.

Sure, we have "I'd want to be more generous", or "I'd want my preferences to be more consistent". But there are many variations of "I'd want to believe in a philosophical position if someone brings me a very convincing argument for it" and, to various degrees of implicitness or explicitness, "I'd want to stop believing in cause X if implementing it leads to disasters".

Some are a mix of conditional and anti-conditional: "I'd want to believe in X even if there was strong evidence against it, but if most of my social group turns against X, then I would want to too".

The reason for this stub of a post is that when I think of meta-preferences, I generally think of them as conditional; yet I've read comments implying that people think I conceive of meta-preferences in an unconditional way[1]. So I made this post to have a brief reference point.

Indeed, in a sense, every attempt to come up with normative assumptions to bridge the is-ought gap in value learning is an attempt to explicitly define the conditional dependence of preferences on the facts of the physical world.

Defining meta-preferences that way is not a problem, and bringing the definition into the statement of the meta-preference is not a problem either. In many cases, whether we label something conditional or unconditional is a matter of taste, or of whether we did the updating ahead of time. Contrast "I love chocolate" with "I love delicious things" plus the observation "I find chocolate delicious", and with "conditional on it being delicious, I would love chocolate" (together with "I find chocolate delicious").
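The chocolate contrast can be sketched in a few lines of purely illustrative Python. All the names and the `facts` dictionary here are assumptions for the example, not anything from the post; the point is only that a pre-updated preference and a conditional preference plus a factual observation give the same verdict:

```python
# Illustrative sketch: two phrasings of the same preference.

def loves_unconditional(item):
    # "I love chocolate": the updating was done ahead of time.
    return item == "chocolate"

def loves_conditional(item, facts):
    # "Conditional on it being delicious, I would love it."
    return facts.get(item, False)

# The factual observation: "I find chocolate delicious".
facts = {"chocolate": True}

# Once the fact is plugged in, the two phrasings agree:
assert loves_unconditional("chocolate") == loves_conditional("chocolate", facts)
```

The conditional form makes the dependence on a world-fact explicit, while the unconditional form has already absorbed that fact; which labeling we prefer is, as above, largely a matter of taste.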

  1. This sentence does actually make sense. ↩︎