Announcing the AI Alignment Prize
Ben Pringle · 8y · 60

I saw a talk earlier this year that mentioned the 2015 Corrigibility paper as a good starting point for someone new to alignment research. On the assumption that's still true, I've started writing up some thoughts on a possible generalization of the method in that paper.

Anyway, I'm submitting this draft early in the hope of getting some feedback on whether I'm on the right track:

GeneralizedUtilityIndifference_Draft_Latest.pdf (edited)

The new version handles sub-agent shutdown better and eliminates the "managing the news" problem.

(Let me know if someone already thought of this approach!)

EDIT 2017-11-09: filled in the section on the n-action model.
