Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Much concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is "why would we give an AI goals anyway?" I think there are good reasons to expect goal-oriented behavior, and I've been on that side of a lot of arguments. But I don't think the issue is settled, and it might be possible to get better outcomes by directly specifying what actions are good. I flesh out one possible alternative here.
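To make the contrast concrete, here is a minimal sketch of the difference between a goal-directed agent, which picks the action whose predicted outcome scores best on a fixed objective, and an approval-directed agent, which picks whichever action its overseer would rate most highly. This is my own illustration rather than anything from the post, and every name in it (predict_outcome, overseer_approval, the toy dictionaries) is a hypothetical placeholder.

```python
"""Illustrative sketch (not from the post): goal-directed vs. approval-directed
action selection. All names here are hypothetical placeholders."""
from typing import Callable, Dict, Sequence


def goal_directed_act(actions: Sequence[str],
                      predict_outcome: Callable[[str], str],
                      utility: Callable[[str], float]) -> str:
    """Pick the action whose predicted outcome scores highest on a fixed goal."""
    return max(actions, key=lambda a: utility(predict_outcome(a)))


def approval_directed_act(actions: Sequence[str],
                          overseer_approval: Callable[[str], float]) -> str:
    """Pick the action the overseer would rate most highly; the agent itself
    does no further optimization over downstream consequences."""
    return max(actions, key=overseer_approval)


if __name__ == "__main__":
    actions = ["answer the question", "seize more compute", "do nothing"]

    # Toy stand-ins for a world model, a goal, and an overseer's judgment.
    outcomes: Dict[str, str] = {
        "answer the question": "question answered",
        "seize more compute": "more resources acquired",
        "do nothing": "status quo",
    }
    utility = {"question answered": 1.0,
               "more resources acquired": 10.0,
               "status quo": 0.0}
    approval = {"answer the question": 1.0,
                "seize more compute": -10.0,
                "do nothing": 0.0}

    print(goal_directed_act(actions, outcomes.get, utility.get))   # "seize more compute"
    print(approval_directed_act(actions, approval.get))            # "answer the question"
```

The structural point is that the two selection rules look alike; what changes is what gets scored. The goal-directed agent optimizes over predicted consequences, while the approval-directed agent defers to the overseer's judgment of the action itself.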

(As an experiment I wrote the post on Medium, so that it is easier to leave sentence-level feedback, especially on the writing and other low-level points. Big-picture discussion should probably stay here.)

4 comments

Despite this essay's age, I was linked here by Structural Risk Minimization and felt I had to address some points you made. I think you may have dismissed a strawman of consequence-approval direction, and then later used a more robust version of it in your reasoning while avoiding those terms. See my comments on the essay.

It seems that, if desired, the overseer could also set their behaviour and intentions so that the approval-directed agent acts as we would want an oracle or tool to act. This is a nice feature.
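One way to picture this claim, as a sketch of my own rather than anything from the post or the comment (oracle_style_approval and the example actions are hypothetical): if the overseer's approval is concentrated entirely on directly answering the question asked, the approval-directed agent reduces to oracle-like behaviour.

```python
from typing import Callable, Sequence


def oracle_style_approval(action: str) -> float:
    """A hypothetical overseer whose approval is concentrated entirely on
    actions that simply answer the question asked, and on nothing else."""
    return 1.0 if action.startswith("reply:") else -1.0


def approval_directed_act(actions: Sequence[str],
                          overseer_approval: Callable[[str], float]) -> str:
    # Take whichever action the overseer rates most highly.
    return max(actions, key=overseer_approval)


actions = ["reply: 4", "ask for more resources", "act directly in the world"]
print(approval_directed_act(actions, oracle_style_approval))  # -> "reply: 4"
```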

I think Nick Bostrom and Stuart Armstrong would also be interested in this, and might have good feedback for you.

High-level feedback: this is a really interesting proposal, and looks like a promising direction to me! Most of my inline comments on Medium are more critical, but that doesn't reflect my overall assessment.