Utility Functions - LessWrong

A utility function assigns numerical values ("utilities") to outcomes in such a way that outcomes with higher utilities are always preferred to outcomes with lower utilities, without exception. The absence of exploitable holes in the preference ordering is part of the definition, and it is what separates utility from mere reward.
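
As a rough illustration of the definition, the sketch below checks whether a small set of strict pairwise preferences can be represented by any utility assignment at all. The function name `find_utilities`, the `(better, worse)` pair encoding, and the fruit outcomes are hypothetical choices made for this example, and the brute-force search is only meant for a handful of outcomes.

```python
from itertools import permutations


def find_utilities(outcomes, prefers):
    """Try to assign utilities consistent with strict pairwise preferences.

    `prefers` is a set of (better, worse) pairs (an assumed encoding).
    Returns a dict mapping outcome -> utility, or None when no ranking
    satisfies every pair (for example, when the preferences form a cycle).
    """
    for ranking in permutations(outcomes):
        position = {outcome: i for i, outcome in enumerate(ranking)}
        if all(position[better] < position[worse] for better, worse in prefers):
            n = len(ranking)
            # Earlier in the ranking means more preferred, so higher utility.
            return {outcome: n - position[outcome] for outcome in ranking}
    return None


outcomes = ["apple", "banana", "cherry"]
consistent = {("apple", "banana"), ("banana", "cherry")}
cyclic = consistent | {("cherry", "apple")}

consistent_utilities = find_utilities(outcomes, consistent)
cyclic_utilities = find_utilities(outcomes, cyclic)
```

Under these assumptions, the consistent preferences admit a utility assignment in which higher numbers are always preferred, while the cyclic preferences admit none.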

Utility functions do not work very well in practice for individual humans. Human drives are not coherent, nor is there any reason to think they would converge to utility-function-grade reliability (Thou Art Godshatter), and even people with a strong interest in the concept have trouble working out, even roughly, what their own utility function is (Post Your Utility Function). Furthermore, humans appear to calculate reward and loss separately: adding one to the other does not predict their behavior accurately, so human reward is not human utility. This makes humans highly exploitable, and not being exploitable is a minimum requirement for having a coherent utility function....
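
One common way to picture this kind of exploitability is a "money pump": an agent with cyclic preferences will pay a small fee for each trade up to something it prefers, and can be led around the cycle indefinitely. The sketch below is a toy model under that assumption, not a claim about how any real person trades; `money_pump`, the fee, and the fruit items are hypothetical.

```python
def money_pump(prefers, start_item, fee, rounds):
    """Simulate an agent that pays `fee` to swap its item for a preferred one.

    `prefers` is a set of (better, worse) pairs (an assumed encoding).
    Returns the final item and the total amount paid.
    """
    # For each item, the item the agent would trade it away for.
    upgrade = {worse: better for better, worse in prefers}
    item, paid = start_item, 0
    for _ in range(rounds):
        if item not in upgrade:
            break  # Nothing is preferred to the current item, so trading stops.
        item = upgrade[item]
        paid += fee
    return item, paid


cyclic = {("apple", "banana"), ("banana", "cherry"), ("cherry", "apple")}
acyclic = {("apple", "banana"), ("banana", "cherry")}

pumped = money_pump(cyclic, "apple", fee=1, rounds=6)
bounded = money_pump(acyclic, "cherry", fee=1, rounds=6)
```

In this toy model the cyclic agent ends up holding the item it started with after paying on every round, whereas the agent with a consistent ordering stops trading once it holds its most preferred item.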
