Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I haven't put much thought into this post; it's off the cuff.

DeepMind has published a couple of papers on maximizing empowerment as a form of intrinsic motivation for Unsupervised RL / Intelligent Exploration.

I haven't looked at either paper in detail, but the basic idea is to maximize the mutual information between (future) outcomes and the agent's actions or policies/options. When that mutual information is high, the agent's choices largely determine which outcome occurs, so it knows what strategy to follow to bring about a given outcome.
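For reference, the usual formalization (my paraphrase, not necessarily the notation used in those papers) defines empowerment at a state as the channel capacity from a sequence of actions to the resulting future state:

$$\mathcal{E}(s_t) \;=\; \max_{p(a_t,\ldots,a_{t+n-1})} I\!\left(A_{t:t+n-1}\,;\; S_{t+n} \;\middle|\; S_t = s_t\right)$$

Maximizing this favors states from which the agent can reliably steer the world into many distinct futures.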

It seems plausible that minimizing empowerment instead, in a setting where the agent also has a reward function to pursue, could help steer it away from pursuing instrumental goals that have large effects on the world.

So that might be useful for "taskification", "limited impact", etc.
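As a very rough sketch of how this could look (the one-step uniform-action approximation and all names below are my own illustration, not anything from the DeepMind papers): subtract an empowerment estimate from the task reward.

```python
import numpy as np

# Toy sketch: a one-step "empowerment" score for a state in a small tabular
# MDP, approximated as the mutual information I(A; S') under a uniform action
# distribution. True empowerment maximizes over the action distribution
# (channel capacity); the uniform-action MI is just a convenient lower bound.

def one_step_empowerment(P_s, eps=1e-12):
    """P_s: array of shape (num_actions, num_states), rows are P(s' | s, a)."""
    num_actions = P_s.shape[0]
    p_a = np.full(num_actions, 1.0 / num_actions)    # uniform over actions
    p_next = p_a @ P_s                                # marginal P(s' | s)
    return float(np.sum(p_a[:, None] * P_s * np.log(P_s / (p_next + eps) + eps)))

def penalized_reward(task_reward, P_s, lam=0.1):
    """Task reward minus an empowerment penalty (hypothetical combination)."""
    return task_reward - lam * one_step_empowerment(P_s)

# Example: 2 actions that steer strongly toward different successor states.
P_s = np.array([[0.9, 0.1, 0.0],
                [0.0, 0.1, 0.9]])
print(one_step_empowerment(P_s))   # high MI: the action largely fixes the future
print(penalized_reward(1.0, P_s))  # the penalty nudges the agent toward low-influence states
```

In practice one would presumably need a learned estimator of the mutual information over longer horizons rather than an exact tabular computation.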

Comments (2)

Discussed briefly in Concrete Problems, FYI: https://arxiv.org/pdf/1606.06565.pdf

Vika:

I would expect minimizing empowerment to impede the agent in achieving its objectives. You do want the agent to have large effects on some parts of the environment that are relevant to its objectives, without being incentivized to negate those effects in weird ways in order to achieve low impact overall.

I think we need something like a sparse empowerment constraint, where you minimize empowerment over most (but not all) dimensions of the future outcomes.
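A toy way to picture that (the factorization into task and side dimensions is assumed here, and all names are illustrative): only penalize the mutual information between the action and the side dimensions of the outcome.

```python
import numpy as np

# Toy illustration of a "sparse" empowerment penalty over a factored outcome:
# penalize the agent's influence (mutual information with its action) only
# over the side dimensions, leaving influence over task dimensions alone.

def mutual_information(joint, eps=1e-12):
    """I(A; X) for a joint distribution p(a, x) given as a 2-D array."""
    p_a = joint.sum(axis=1, keepdims=True)
    p_x = joint.sum(axis=0, keepdims=True)
    return float(np.sum(joint * np.log((joint + eps) / (p_a @ p_x + eps))))

# p(action, task_outcome, side_outcome): 2 actions, 2 task outcomes, 2 side outcomes.
p = np.zeros((2, 2, 2))
p[0, 0, 0] = 0.5   # action 0 -> task outcome 0, side outcome 0
p[1, 1, 0] = 0.5   # action 1 -> task outcome 1, same side outcome

influence_full = mutual_information(p.reshape(2, 4))  # over the whole outcome
influence_side = mutual_information(p.sum(axis=1))    # over side dimensions only
print(influence_full, influence_side)  # ~0.69 vs ~0.0: the agent controls the task
                                       # but has no influence over the side dimensions
```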