
Note: I'm working on a research agenda, hence the large number of small individual posts, to have things to link to in the main documents.

I've given examples of preferences over non-rewards.

But those examples are easy to dismiss as irrational, or as imposing on others, since they involved religion or preferences over other people knowing the truth.

If we think instead in terms of preferences over personal knowledge, we have far more widespread and natural examples.

One small example: for most murder mysteries, I'd prefer not to know the identity of the killer at the start of the show. If I'm taking part in a murder-mystery dinner, I'd definitely not want to. If I'm playing games of hidden information for fun, I would not want to know all the hidden information - otherwise the game is pointless. Lots of people enjoy watching sporting events live; far fewer re-watch random past sporting events whose results they already know. Eliezer talks about the joy in scientific discovery.

So preferences over non-rewards are quite natural and common.

Hedonism solution?

Now, you could say that you don't have a "preference over knowing/not knowing something", but that you "enjoy the experience of learning", thus reducing the whole thing to hedonism. Eliezer argues against this, claiming to have preferences that don't reduce to hedonism (preferences I share, incidentally).

But even fully giving in to hedonism only partially solves the problem. Yes, we can model someone's faith as them enjoying the experience of belief, and murder-mystery watchers as enjoying the experience of seeing intricate puzzles solved. But this doesn't remove the underlying problem: to maximise hedonic experience, we still have to control what knowledge the human has.

Comments

Leaving aside the question of why you believe that your preferences don't reduce to hedonism (when considering the possibility of a preference to identify as someone whose preferences don't reduce to hedonism)...

One partial solution is to recognize that I am not atomic. Parts of my mind have goals and knowledge that differ from other parts - it's not a crisp separation, but it's not a uniform belief-mass.

Which opens the path to an analogy with standard ML practice: separating your data into independent training and test sets yields far more trustworthy models than putting everything into training, even though the model itself sees less data. I think this gives some insight into the preference for initial ignorance in games and entertainment/practice mysteries (see the sketch below). I don't think it resolves all aspects of the question, of course.
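
To make that analogy concrete, here's a minimal sketch in Python, assuming scikit-learn is available; the dataset and classifier are arbitrary toy choices, not anything from the post or comment. The test score is only informative because that data was withheld from training - the same structural point as preferring not to know the killer in advance.

```python
# Minimal sketch of the train/test split analogy (toy example).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 1000 samples, 20 features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Withhold 20% of the data from training entirely.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The test score is meaningful precisely because the model never saw
# that data; scoring only on the training set would flatter the model.
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
```

The design point mirrors the comment's: deliberately keeping some information away from one part of the system (the training loop) is exactly what makes the other part's judgement (the evaluation) worth having.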