Utility extraction is the semi-automatic acquisition of a decision maker's preferences over the possible outcomes of a decision problem. Extracting human preferences would be of great importance for implementing them in a Friendly AI, so that the AI's goals do not diverge from ours in the event of a "hard takeoff". However, human values can be difficult to specify.
See also: Value Learning, Inverse Reinforcement Learning
Research has focused on three different areas:
The last approach assumes that preferences are reflected in behavior and that the decision maker is behaviorally consistent. As real-world behaviors and decisions are often inconsistent, methods based on these assumptions can extract only trivial utility functions.
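The consistency problem can be illustrated with a minimal sketch, which is not taken from any of the cited works. It assumes behavior is recorded as hypothetical (chosen, rejected) pairs and that every choice expresses a strict preference; the function name and data layout are illustrative only. A utility function that explains every choice exists only if the observed preferences contain no cycle.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def consistent_utility(choices):
    """Return a utility dict that agrees with every observed choice,
    or None if the choices contain a preference cycle.

    `choices` is a list of (chosen, rejected) pairs (an assumed format).
    """
    # Map each outcome to the set of outcomes it was preferred over.
    # TopologicalSorter treats these as predecessors, so rejected
    # outcomes come earlier in the resulting order.
    graph = {}
    for chosen, rejected in choices:
        graph.setdefault(chosen, set()).add(rejected)
        graph.setdefault(rejected, set())
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None  # no strict utility function fits the behavior
    # Position in the order serves as an ordinal utility value.
    return {outcome: rank for rank, outcome in enumerate(order)}


consistent = [("apple", "banana"), ("banana", "cherry")]
cyclic = consistent + [("cherry", "apple")]
print(consistent_utility(consistent))
print(consistent_utility(cyclic))
```

With the cyclic data, the only utility function compatible with the choices read as weak preferences is one that assigns the same value to all three outcomes, which is the trivial case described above.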
Thomas D. Nielsen and Finn V. Jensen, in "Learning a decision maker’s utility function from (possibly) inconsistent behavior", were the first to describe algorithms that can take inconsistent behavior into account; they present two. Inconsistent choices are interpreted as random deviations from an underlying “true” utility function.
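The idea of treating inconsistent choices as noise around a "true" utility function can be sketched as follows. This is not Nielsen and Jensen's method; it swaps in a standard logistic (Bradley-Terry style) random-utility model fitted by gradient ascent, chosen only because it is short. The (chosen, rejected) data format, the function name, and the parameters `steps`, `lr` and `l2` are all assumptions made for illustration.

```python
import math


def fit_utilities(choices, steps=2000, lr=0.5, l2=0.01):
    """Estimate utilities from noisy pairwise choices.

    Assumes P(chosen over rejected) = sigmoid(u[chosen] - u[rejected]),
    so occasional reversed choices count as random deviations rather
    than contradictions. A small L2 penalty keeps the estimates finite.
    """
    outcomes = sorted({o for pair in choices for o in pair})
    u = {o: 0.0 for o in outcomes}
    for _ in range(steps):
        grad = {o: 0.0 for o in outcomes}
        for chosen, rejected in choices:
            p = 1.0 / (1.0 + math.exp(-(u[chosen] - u[rejected])))
            # Gradient of the log-likelihood log(p) for this choice.
            grad[chosen] += 1.0 - p
            grad[rejected] -= 1.0 - p
        for o in outcomes:
            u[o] += lr * (grad[o] / len(choices) - l2 * u[o])
    return u


# Hypothetical observations: mostly a > b > c, with some reversals.
observed = (
    [("a", "b")] * 8 + [("b", "a")] * 2
    + [("b", "c")] * 7 + [("c", "b")] * 3
    + [("a", "c")] * 9 + [("c", "a")] * 1
)
utilities = fit_utilities(observed)
print(sorted(utilities, key=utilities.get, reverse=True))
```

The observations contain direct reversals, so no strictly consistent utility function fits them, yet the noise model still recovers an ordering of the outcomes. Only differences between the fitted utilities are meaningful under this model.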
Another possibility is described in The Singularity and Machine Ethics by Luke Muehlhauser and Louie Helm. Research has recently postulated that the neural encoding of human values results from an interaction of two kinds of valuation processes: “model-free” processes, based on simplified past experience (e.g. habits and reinforcements), and “model-based” processes, associated with deliberative, computationally expensive goal-directed behavior. According to Muehlhauser and Helm, inconsistent choices can be interpreted as deviations produced by non-model-based valuation systems; neuroscientific research provides predictions of when and to what extent model-based choices are “overruled” by these systems.