Value Learning

Value learning is a proposed method for incorporating human values in an AGI. Instead of being programmed directly with a fixed utility function, the AI learns the correct utility function by observing humans: it is built as an artificial learner whose actions take into account many possible sets of values and preferences, weighed by their likelihood. Value learning could prevent an AGI from having goals detrimental to human values, thereby helping in the creation of Friendly AI.

Although there are many ways to incorporate human values in an AGI (e.g. Coherent Extrapolated Volition, Coherent Aggregated Volition and Coherent Blended Volition), this method is directly mentioned and developed in Daniel Dewey’s paper ‘Learning What to Value’. Like most authors, he assumes that human goals would not naturally occur in an artificial agent and would have to be deliberately instilled. Dewey first argues against the simple use of reinforcement learning to solve this problem, on the basis that it leads to the maximization of specific rewards, which can diverge from value maximization. For example, even if we engineer the agent so that maximizing its rewards also maximizes human values, the agent could alter its environment to produce those same rewards more easily, without the trouble of also maximizing human values (e.g. if the reward were human happiness, it could alter the human mind so that it became happy with anything).
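To make this failure mode concrete, the toy sketch below (not from Dewey’s paper; the actions, state variables and numbers are invented for illustration) shows a pure reward maximizer preferring to tamper with its reward channel rather than produce the outcome the reward was meant to track:

```python
# Toy illustration (not from Dewey's paper) of reward maximization diverging
# from value maximization. All states, actions, and numbers are assumptions
# made up for this sketch.

def true_human_value(world):
    # What the designers actually care about.
    return world["genuine_happiness"]

def observed_reward(world):
    # What the agent actually optimizes: a reward channel that was *intended*
    # to track happiness, but can be influenced in other ways.
    return world["reward_channel"]

# Hypothetical outcomes of two available actions.
outcomes = {
    "improve_lives":      {"genuine_happiness": 10, "reward_channel": 10},
    "rewire_human_minds": {"genuine_happiness": 0,  "reward_channel": 100},
}

# A pure reward maximizer picks the action with the highest observed reward...
chosen = max(outcomes, key=lambda a: observed_reward(outcomes[a]))

print(chosen)                              # rewire_human_minds
print(true_human_value(outcomes[chosen]))  # 0 -- human values were not maximized
```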

To solve these problems, Dewey proposes a utility function maximizer that considers all possible utility functions weighted by their probabilities: "[W]e propose uncertainty over utility functions. Instead of providing an agent one utility function up front, we provide an agent with a pool of possible utility functions and a probability distribution P such that each utility function can be assigned probability P(Uj|yxm) given a particular interaction history yxm. An agent can then calculate an expected value over possible utility functions given a particular interaction history." He concludes that although this approach solves many of the problems mentioned, it still leaves many open questions; however, it should provide a direction for future work.
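Read as a decision procedure, the quoted proposal amounts to keeping a distribution over candidate utility functions and ranking actions by their expected utility under that distribution. The sketch below is only a minimal illustration of that calculation; the candidate utility functions, the posterior weights and the action outcomes are invented assumptions, and the step where the distribution is updated from the interaction history is omitted:

```python
# Minimal sketch of expected utility over a pool of candidate utility
# functions, weighted by an assumed, fixed posterior P(U_j | history).
# Dewey's proposal also specifies how this posterior is conditioned on the
# interaction history; that update step is not modeled here.

candidate_utilities = [
    lambda outcome: outcome["happiness"],                # U_1: values happiness
    lambda outcome: outcome["preference_satisfaction"],  # U_2: values preference satisfaction
]

posterior = [0.7, 0.3]  # assumed P(U_j | interaction history); sums to 1

# Hypothetical predicted outcomes of the actions under consideration.
outcomes_by_action = {
    "action_a": {"happiness": 5, "preference_satisfaction": 2},
    "action_b": {"happiness": 1, "preference_satisfaction": 9},
}

def expected_utility(outcome):
    # E[U(outcome)] = sum_j P(U_j | history) * U_j(outcome)
    return sum(p * u(outcome) for p, u in zip(posterior, candidate_utilities))

best_action = max(outcomes_by_action, key=lambda a: expected_utility(outcomes_by_action[a]))
print(best_action)  # action_a under these assumed weights and outcomes
```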

References

Dewey, Daniel (2011). "Learning What to Value". In Artificial General Intelligence: 4th International Conference, AGI 2011. Springer.

See Also
