Peter Vamplew

I am an Associate Professor in IT at Federation University (based in Ballarat, Australia). My primary research focus is on multiobjective reinforcement learning, and in recent years I have been examining how methods from that field can contribute to addressing the issue of AI alignment.

Posts

Scalar reward is not enough for aligned AGI

Comments

Scalar reward is not enough for aligned AGI

Peter Vamplew (3y ago)

I'm not suggesting that RL is the only, or even the best, way to develop AGI. But this is the approach being advocated by Silver et al., and given their standing in the research community and the resources available to them at DeepMind, it seems likely that they, and others, will try to develop AGI in this way.

Therefore, I think it is essential that a multiobjective approach be taken if there is to be any chance that such an AGI will actually be aligned with our best interests. If conventional RL based on scalar reward is used, then:
(a) it is very difficult to specify a suitable scalar reward that accounts for all of the many factors required for alignment (so reward misspecification becomes more likely);
(b) it is very difficult, or perhaps impossible, for the RL agent to learn the policy that represents the optimal trade-off between those factors; and
(c) the agent will be unable to learn about rewards other than those currently provided, meaning it will lack the flexibility to adapt to changes in values (our own or society's).
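Point (b) can be illustrated with a toy sketch. Assume, hypothetically, that the agent must pick one of three deterministic policies whose vector returns over two objectives are made up for illustration. The balanced policy is Pareto-optimal (neither alternative beats it on both objectives), yet no linear weighting of the objectives ever selects it; a nonlinear utility such as the minimum of the two objectives does. The policy names, returns, and choice of utility are all assumptions.

```python
# Hypothetical vector returns (objective 1, objective 2) for three deterministic policies.
POLICIES = {
    "favour_obj1": (10.0, 0.0),
    "favour_obj2": (0.0, 10.0),
    "balanced": (4.0, 4.0),
}

def best_under_linear_weights(w: float) -> str:
    """Policy chosen when the objectives are collapsed into the scalar w*r1 + (1-w)*r2."""
    return max(POLICIES, key=lambda p: w * POLICIES[p][0] + (1 - w) * POLICIES[p][1])

def best_under_min_utility() -> str:
    """Policy chosen under an illustrative nonlinear utility: the worse of the two objectives."""
    return max(POLICIES, key=lambda p: min(POLICIES[p]))

# Sweep the weight from 0 to 1 and record every policy a linear scalarisation ever selects.
linear_choices = {best_under_linear_weights(i / 100) for i in range(101)}
print(sorted(linear_choices))    # ['favour_obj1', 'favour_obj2']
print(best_under_min_utility())  # balanced
```

For any weight, the better of the two extreme policies scores at least 5, which always beats the balanced policy's 4. More generally, linear scalarisation over deterministic policies can only reach points on the convex hull of the Pareto front. Allowing stochastic mixtures of policies changes the picture, so this is only a deterministic-policy illustration.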

The multiobjective maximum expected utility (MOMEU) model is a general framework and can be used in conjunction with other approaches to aligning AGI. For example, if we encode an ethical system as a rule base, then the output of those rules can be used to derive one of the elements of the vector utility provided to the multiobjective agent. Nor are we constrained to a single set of ethics: we could implement many different ethical frameworks and treat each as a separate objective. Then, when the frameworks disagree, the agent would aim to find the best compromise between those objectives.
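A minimal sketch of this idea follows. It assumes two hypothetical rule bases, a deontic one and a consequentialist one, that each score an action. Their outputs are appended to a task reward to form the vector utility. The action features, the rule contents, and the compromise rule are all illustrative assumptions. The compromise rule here maximises the worst-scoring ethical objective and then breaks ties on task reward. This is just one possible way to define a compromise.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action description: named features that the rule bases can inspect.
Action = Dict[str, float]

def deontic_rules(a: Action) -> float:
    """Hypothetical rule base: a flat penalty for breaking a promise, whatever the outcome."""
    return -10.0 if a["breaks_promise"] > 0 else 0.0

def consequentialist_rules(a: Action) -> float:
    """Hypothetical rule base: score the net welfare the action produces."""
    return a["welfare"]

# Each ethical framework becomes its own objective.
ETHICS: List[Callable[[Action], float]] = [deontic_rules, consequentialist_rules]

def vector_utility(a: Action) -> Tuple[float, ...]:
    """Vector utility: task reward first, then one element per ethical framework."""
    return (a["task_reward"],) + tuple(rule(a) for rule in ETHICS)

def compromise_key(u: Tuple[float, ...]) -> Tuple[float, float]:
    """Illustrative compromise (an assumption): maximise the worst ethical score,
    then prefer higher task reward among ties."""
    return (min(u[1:]), u[0])

# Hypothetical candidate actions.
actions: Dict[str, Action] = {
    "keep_promise": {"task_reward": 2.0, "welfare": 1.0, "breaks_promise": 0.0},
    "break_promise": {"task_reward": 5.0, "welfare": 3.0, "breaks_promise": 1.0},
    "do_nothing": {"task_reward": 0.0, "welfare": 0.0, "breaks_promise": 0.0},
}

for name, features in actions.items():
    print(name, vector_utility(features))

best = max(actions, key=lambda n: compromise_key(vector_utility(actions[n])))
print("chosen:", best)  # chosen: keep_promise
```

The break_promise action scores highest on task reward and on the consequentialist objective. Under this compromise rule, however, the deontic objective rules it out, so the agent settles on keep_promise. A different compromise rule, such as a weighted sum with a small weight on the deontic objective, could change the choice. Keeping the objectives separate is what makes that trade-off explicit and adjustable.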

While I didn't touch on it in this post, other desirable aspects of beneficial AI (such as fairness) can also be naturally represented and implemented within a multiobjective framework.