It appears that in the last few years the AI Alignment community has dedicated great attention to the Value Learning Problem [1]. In particular, the work of Stuart Armstrong stands out to me.

Concurrently, during the last decade, researchers such as Eyke Hüllermeier and Johannes Fürnkranz have produced a significant amount of work on preference learning [2] and preference-based reinforcement learning [3].

While I am not highly familiar with the Value Learning literature, I consider the two fields closely related, if not overlapping. Yet I have rarely seen Value Learning work reference the Preference Learning literature, or vice versa.

Is this because the two fields are less related than I think? And more specifically, how do the two fields relate to each other?


[1] - Soares, Nate. "The value learning problem." Machine Intelligence Research Institute, Berkeley (2015).

[2] - Fürnkranz, Johannes, and Eyke Hüllermeier. Preference learning. Springer US, 2010.

[3] - Fürnkranz, Johannes, et al. "Preference-based reinforcement learning: a formal framework and a policy iteration algorithm." Machine learning 89.1-2 (2012): 123-156.

1 Answer

The short answer is yes: they are related and basically about the same thing. However, researchers' approaches vary a lot.

Relevant considerations that come to mind:

  • The extent to which values/preferences are legible
  • The extent to which they are discoverable
  • The extent to which they are hidden variables
  • The extent to which they are normative
  • How important immediate implementability is
  • How important extreme optimization is
  • How important safety concerns are

As a result, I think there is something of a divide between safety-focused and capabilities-focused researchers in this area. The two clusters start from different assumptions, which makes each cluster's work seem not very interesting or relevant to the other.

Interesting points. The distinctions you mention could equally apply in distinguishing narrow from ambitious value learning. In fact, I think preference learning is pretty much the same as narrow value learning. Could it be, then, that researchers working on ambitious value learning are uninterested in preference learning for the same reason they are uninterested in narrow value learning?

"How important safety concerns are" is certainly a real difference, but the history of science teaches us that carrying ideas from a domain with different concerns into another domain has often proven extremely useful.