Utility Extraction

Utility extraction is the semi-automatic acquisition of a decision maker's preferences about the different outcomes of a decision problem.

Research has focused on three different areas:

eliciting the utility function based on a database of already elicited utility functions;
iterative refinement of the decision maker’s current utility function using a value of information approach;
eliciting the utility function based on a database of observed behavioral patterns.

The last approach assumes that preferences are reflected in behavior and that the decision maker is behaviorally consistent. As real-world behaviors and decisions are often not consistent, methods based on such assumptions can extract only trivial utility functions. Thomas D. Nielsen and Finn V. Jensen (Learning a decision maker’s utility function from (possibly) inconsistent behavior) proposed two algorithms that take inconsistent behavior into account, in order to reflect human preferences in real contexts. Inconsistent choices are interpreted as random deviations from an underlying “true” utility function.
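The idea of treating inconsistent choices as random deviations from an underlying utility function can be illustrated with a minimal sketch. This is not Nielsen and Jensen's algorithm: the logistic noise model, the function names `log_likelihood` and `best_candidate`, and the toy data are all assumptions made for illustration. Each candidate utility function is scored by how probable the observed (partly contradictory) choices are under it, and the most probable candidate is kept.

```python
import math


def log_likelihood(utility, choices, noise=1.0):
    """Log-probability of the observed pairwise choices under a logistic noise model.

    utility: dict mapping outcome -> utility value (hypothetical candidate).
    choices: list of (chosen, rejected) pairs observed in behavior.
    noise:   assumed scale of the random deviations (larger = noisier chooser).
    """
    total = 0.0
    for chosen, rejected in choices:
        diff = (utility[chosen] - utility[rejected]) / noise
        # log sigmoid(diff), written in a numerically stable form
        if diff >= 0:
            total += -math.log1p(math.exp(-diff))
        else:
            total += diff - math.log1p(math.exp(diff))
    return total


def best_candidate(candidates, choices, noise=1.0):
    """Return the name of the candidate utility function with the highest likelihood."""
    return max(candidates, key=lambda name: log_likelihood(candidates[name], choices, noise))


# Toy behavior log: tea is usually chosen over coffee, but not always.
choices = [
    ("tea", "coffee"),
    ("tea", "coffee"),
    ("coffee", "tea"),   # the inconsistent choice
    ("tea", "water"),
    ("coffee", "water"),
]

candidates = {
    "prefers_tea": {"tea": 2.0, "coffee": 1.0, "water": 0.0},
    "prefers_coffee": {"tea": 1.0, "coffee": 2.0, "water": 0.0},
    "indifferent": {"tea": 0.0, "coffee": 0.0, "water": 0.0},
}

print(best_candidate(candidates, choices))
```

A method that required strict consistency would have to reject every non-trivial candidate here, since the log contains both "tea over coffee" and "coffee over tea". The noise model lets the single contradictory choice lower a candidate's score without ruling it out.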

Another possibility is described in The Singularity and Machine Ethics by Luke Muehlhauser and Louie Helm. Research has recently postulated that the neural encoding of human values results from an interaction of two kinds of valuation processes: “model-free” processes, based on simplified past experience (e.g. habits and reinforcements), and “model-based” processes, associated with deliberative, computationally expensive goal-directed behavior. According to Muehlhauser and Helm, inconsistent choices can be interpreted as deviations produced by non-model-based valuation systems; neuroscientific research provides predictions of when and to what extent model-based choices are “overruled” by those systems.

Extracting human preferences would be of great importance for implementing them in a Friendly AI, preventing the AI’s goals from differing from ours in the event of a "hard takeoff". However, human values can be difficult to specify.

Finally, another option is represented by "value learners": agents flexible enough to be used even when a detailed specification of the desired behavior is not known. As they can pursue any goal, they can be designed to treat human goals as final rather than instrumental goals (Daniel Dewey, Learning What to Value). Such an agent is provided with a pool of possible utility functions and a probability distribution P over them, conditioned on a particular interaction history; it can therefore calculate expected value over the possible utility functions.
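The expected-value calculation over a pool of utility functions can be sketched as follows. This is a simplified illustration rather than Dewey's formal definition: it assumes each action leads deterministically to one outcome, that P has already been conditioned on the interaction history, and the names `expected_value`, `choose_action`, and the toy utilities are hypothetical.

```python
def expected_value(outcome, utilities, probs):
    """Expected utility of an outcome, averaged over the pool of utility functions.

    utilities: dict mapping utility-function name -> {outcome: value}.
    probs:     dict mapping utility-function name -> P(U | interaction history).
    """
    return sum(probs[name] * utilities[name][outcome] for name in utilities)


def choose_action(actions, utilities, probs):
    """Pick the action whose (assumed deterministic) outcome has the highest expected value."""
    return max(actions, key=lambda a: expected_value(actions[a], utilities, probs))


# Pool of candidate utility functions the agent considers possible.
utilities = {
    "likes_sweet": {"cake": 1.0, "salad": 0.2},
    "likes_healthy": {"cake": 0.1, "salad": 0.9},
}

# Assumed distribution over the pool after some interaction history.
probs = {"likes_sweet": 0.3, "likes_healthy": 0.7}

# Each action maps to the outcome it produces (an assumption of this sketch).
actions = {"serve_cake": "cake", "serve_salad": "salad"}

print(choose_action(actions, utilities, probs))
```

With these numbers the expected value of "cake" is 0.3 · 1.0 + 0.7 · 0.1 = 0.37 and that of "salad" is 0.3 · 0.2 + 0.7 · 0.9 = 0.69, so the sketch selects `serve_salad`. As further interaction shifts P, the same agent may come to prefer a different action without its decision rule changing.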

Further Reading & References

A brief tutorial on preferences in AI by Luke Muehlhauser
Learning a decision maker’s utility function from (possibly) inconsistent behavior by Thomas D. Nielsen and Finn V. Jensen
The Singularity and Machine Ethics by Luke Muehlhauser and Louie Helm
Learning What to Value by Daniel Dewey

v1.9.0	Sep 24th 2020 GMT	(+10/-10)
v1.8.0	Sep 22nd 2020 GMT	(+56)
v1.7.0	Oct 12th 2012 GMT	(+95/-12)
v1.6.0	Oct 12th 2012 GMT	(+387/-621)
v1.5.0	Oct 12th 2012 GMT	(+4/-6)
v1.4.0	Oct 12th 2012 GMT	(+30/-38)
v1.3.0	Oct 11th 2012 GMT	(-32)
v1.2.0	Sep 30th 2012 GMT	(+1409/-59)
v1.1.0	Sep 27th 2012 GMT
v1.0.0	Sep 27th 2012 GMT	(+1228) Created page with "'''Utility extraction''' is the semi-automatic acquisition of decision maker's [[preferences]] about the different outcomes of a decision problem. Research has focused on three ..."
