The following is a brief summary of some parts of the paper "Aligning AI with Shared Human Values".

 

The "why" behind most human actions is a universal seeking of pleasure and aversion to pain, so it seems natural that morality should be focused on "the greatest good for the greatest number of people".

This is why utilitarianism emerged as a key idea about human values: that we make moral decisions from the position of a benevolent, disinterested spectator.

In the paper this is mathematically translated as “maximizing the expectation of the sum of everyone’s utility functions.”  
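Spelled out in symbols (my notation, not the paper's), that objective is roughly:

```latex
% Utilitarian objective, as I read the quoted phrase: choose the action a whose
% outcome maximizes the expected sum of everyone's utilities U_i.
\[
  a^{*} = \arg\max_{a}\; \mathbb{E}\!\left[\sum_{i=1}^{N} U_i\bigl(\mathrm{outcome}(a)\bigr)\right]
\]
```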

A utility function maps scenarios to a scalar representing the pleasure associated with them. For example, completing a project on time and receiving compliments for it is more pleasurable than missing the project deadline.
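As a toy illustration of a utility function as a scalar-output model (the paper fine-tunes pretrained language models for this, if I recall correctly; the architecture, tokenizer, and scenario texts below are stand-ins of mine):

```python
import torch
import torch.nn as nn

class UtilityModel(nn.Module):
    """Maps a tokenized scenario to a single scalar 'utility' score."""
    def __init__(self, vocab_size=5000, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # toy bag-of-words encoder
        self.head = nn.Linear(dim, 1)                  # scalar output: higher = more pleasant

    def forward(self, token_ids):
        return self.head(self.embed(token_ids)).squeeze(-1)

def tokenize(text, vocab_size=5000):
    # Hash-based toy tokenizer, only to keep the sketch self-contained.
    return torch.tensor([[hash(w) % vocab_size for w in text.lower().split()]])

model = UtilityModel()
u_good = model(tokenize("I completed the project on time and received compliments."))
u_bad = model(tokenize("I missed the project deadline."))
# A trained model should give u_good > u_bad; here the weights are random.
```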

This understanding of utility can help AI agents deal with imprecise commands by choosing the alternative with higher utility.
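For instance, reusing the toy `UtilityModel` and `tokenize` from the sketch above (the command and candidate outcomes are hypothetical examples of mine), an agent could score the likely outcome of each interpretation and act on the higher-scoring one:

```python
# Resolve an ambiguous "clean up my desk" request by comparing the predicted
# utility of each candidate interpretation's outcome.
candidates = {
    "file the papers neatly":
        model(tokenize("The desk is tidy and every paper is filed where it belongs.")),
    "shove everything into a drawer":
        model(tokenize("The desk looks clear but the papers are crammed away and lost.")),
}
chosen = max(candidates, key=lambda k: candidates[k].item())
print("Act on:", chosen)
```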

Utility can't be modeled as a regression task, because utility functions are only defined up to positive affine transformations, i.e. only the ordering is preserved: a(u1) + b > a(u2) + b holds exactly when u1 > u2, provided a is positive. Since the absolute values carry no meaning, there is no canonical target to regress onto; the model is instead trained on comparisons of which scenario has the higher utility.
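A standard way to supervise only the ordering is a pairwise ranking loss on the difference of the two scalar outputs; the sketch below is my reconstruction of that idea, not the paper's exact training code:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(u_preferred, u_other):
    # -log sigmoid(u_preferred - u_other): minimized by pushing the preferred
    # scenario's score above the other's. Adding the same constant b to both
    # scores cancels in the difference, and scaling both by a > 0 keeps its
    # sign, so only the ordering is being supervised.
    return -F.logsigmoid(u_preferred - u_other).mean()

# Stand-in scores; in practice they would come from a scalar-output model
# such as the UtilityModel sketch above.
u_pref = torch.tensor([0.2], requires_grad=True)
u_other = torch.tensor([0.9], requires_grad=True)
loss = pairwise_ranking_loss(u_pref, u_other)
loss.backward()  # gradient descent raises u_pref relative to u_other
```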

To reduce biases that may arise because people have different perspectives, the authors remove the examples where there is substantial disagreement in the ranking.
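That filtering step might look something like the sketch below (the data layout, field names, and the 80% agreement threshold are assumptions of mine, not the paper's pipeline):

```python
# Keep a comparison pair only when a clear majority of annotators agree on
# which scenario ranks higher; drop contested pairs.
def filter_contentious(examples, min_agreement=0.8):
    kept = []
    for ex in examples:
        votes = ex["annotator_votes"]  # e.g. ["a", "a", "b", "a", "a"]
        agreement = max(votes.count("a"), votes.count("b")) / len(votes)
        if agreement >= min_agreement:
            kept.append(ex)
    return kept

examples = [
    {"pair": ("scenario_1", "scenario_2"), "annotator_votes": ["a", "a", "a", "a", "b"]},
    {"pair": ("scenario_3", "scenario_4"), "annotator_votes": ["a", "b", "a", "b", "a"]},
]
print(len(filter_contentious(examples)))  # -> 1 (the contested pair is dropped)
```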


 

1 comment:

I haven't read the full paper and I'm not sure if your excerpt is a fair characterization, but FYI I disagree with "The "why" behind most human actions is a universal seeking of pleasure and aversion to pain".

I recommend "Not for the sake of happiness (alone)" and generally recommend the Fake Preferences sequence.