Related post: Muehlhauser-Wang Dialogue.

Motivation Management in AGI Systems, a paper to be published at AGI-12.

Abstract. AGI systems should be able to manage their motivations or goals, which are persistent, spontaneous, mutually restricting, and changing over time. A mechanism for handling this kind of goal is introduced and discussed.

From the discussion section:

The major conclusion argued in this paper is that an AGI system should always maintain a goal structure (or whatever it is called) which contains multiple goals that are separately specified, with the properties that

  • Some of the goals are accurately specified and can be fully achieved, while others are vaguely specified and only partially achievable, but nevertheless have an impact on the system's decisions.
  • The goals may conflict with each other over what the system should do at a given moment, and cannot all be achieved together. Very often the system has to make compromises among the goals.
  • Due to restrictions on computational resources, the system cannot take all existing goals into account when making each decision, nor can it keep a complete record of the goal derivation history.
  • The designers and users are responsible for the input goals of an AGI system, from which all the other goals are derived according to the system's experience. There is no guarantee that the derived goals will be logically consistent with the input goals, except in highly simplified situations.
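
To make these properties concrete, here is a minimal Python sketch of what such a goal structure might look like. This is not the mechanism introduced in the paper; the `Goal`, `GoalStructure`, `priority`, `satisfaction`, and `budget` names are illustrative assumptions, chosen only to mirror the bullets above (separately specified goals, partial satisfiability, compromise among conflicting goals, and a resource bound on how many goals are consulted per decision).

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Goal:
    name: str
    priority: float                        # relative importance; may drift over time
    satisfaction: Callable[[str], float]   # maps a candidate action to a degree of
                                           # satisfaction in [0, 1]; a vague goal is
                                           # only ever partially satisfiable


@dataclass
class GoalStructure:
    goals: List[Goal] = field(default_factory=list)

    def add(self, goal: Goal) -> None:
        # Goals are separately specified; nothing here checks consistency
        # with the goals already present.
        self.goals.append(goal)

    def choose(self, actions: List[str], budget: int = 3) -> str:
        # Resource limits: consult only the `budget` highest-priority goals
        # for this decision, not every goal the system holds.
        considered = heapq.nlargest(budget, self.goals, key=lambda g: g.priority)

        def score(action: str) -> float:
            # A weighted compromise among the consulted goals.
            return sum(g.priority * g.satisfaction(action) for g in considered)

        return max(actions, key=score)
```

The weighted sum is just one way to express compromise; the point is only that decisions are made against several goals at once under a limited budget, not that any particular aggregation rule is implied by the paper.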

One area that is closely related to goal management is AI ethics. The previous discussions focused on the goal the designers assign to an AGI system ("super goal" or "final goal"), with the implicit assumption that such a goal will decide the consequences caused by the A(G)I systems. However, the above analysis shows that though the input goals are indeed important, they are not the dominating factor that decides the broad impact of AI on human society. Since no AGI system can be omniscient and omnipotent, to be "general-purpose" means such a system has to handle problems for which its knowledge and resources are insufficient [16, 18], and one direct consequence is that its actions may produce unanticipated results. This consequence, plus the previous conclusion that the effective goal for an action may be inconsistent with the input goals, renders many of the previous suggestions mostly irrelevant to AI ethics.

For example, Yudkowsky's "Friendly AI" agenda is based on the assumption that "a true AI might remain knowably stable in its goals, even after carrying out a large number of self-modifications" [22]. The problem with this assumption is that unless we are talking about an axiomatic system with unlimited resources, we cannot assume the system can accurately know the consequences of its actions. Furthermore, as argued previously, the goals in an intelligent system inevitably change as its experience grows, which is not necessarily a bad thing - after all, our "human nature" gradually grows out of, and deviates from, our "animal nature", at both the species level and the individual level.

Omohundro argued that no matter what input goals are given to an AGI system, it will usually derive some common "basic drives", including "be self-protective" and "to acquire resources" [1], which leads some people to worry that such a system will become unethical. According to our previous analysis, the production of these goals is indeed very likely, but that is only half of the story. A system with a resource-acquisition goal does not necessarily attempt to achieve it at all costs, without considering its other goals. Again, consider human beings - everyone has some goals that can become dangerous (either to oneself or to others) if pursued at all costs. The proper solution, both to human ethics and to AGI ethics, is to prevent this kind of goal from becoming dominant, rather than from being formed.
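
One possible reading of "prevent this kind of goal from becoming dominant" is to bound the influence any single goal can have on a decision. The sketch below is an assumption of mine, not anything specified by Omohundro or the paper; the `cap` value and the goal names are made up for the example.

```python
def capped_weights(priorities: dict[str, float], cap: float = 0.4) -> dict[str, float]:
    """Turn raw goal priorities into decision weights, limiting any single
    goal to at most `cap` of the total so that no goal can dominate."""
    total = sum(priorities.values())
    return {goal: min(p / total, cap) for goal, p in priorities.items()}


# A strong "acquire resources" drive is still weighed, but it can no longer
# swamp the other goals in the structure:
weights = capped_weights({"acquire_resources": 8.0, "respect_users": 1.0, "preserve_options": 1.0})
# -> {'acquire_resources': 0.4, 'respect_users': 0.1, 'preserve_options': 0.1}
```

The capping itself is the point: the derived drive stays in the goal structure and keeps some influence, but it cannot override every other goal.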

From the comments:

I remember making this argument :D Haha, I was quickly downvoted.

Anyhow, "vaguely specified goals" actually turns out to be a property of you, not the AI.

If an agent has formally definable goals, then of course they are precisely specified. Vagueness can only be a property of matching actual goals to higher level descriptions.

I thought the paper was mostly wrong. In particular, I thought the argument that:

it is not a good idea for an AGI system to be designed in the frameworks where a single goal is assumed, such as evolutionary learning, program search, or reinforcement learning,

...was weak.

There is no guarantee that the derived goals will be logically consistent with the input goals, except in highly simplified situations.

Are they saying that (practically feasible) goal derivation algorithms necessarily produce logical inconsistencies? Or that this is actually a desirable property? Or what?

The text says an AI "should" maintain a goal structure that produces logically inconsistent subgoals. I don't think I understand what they mean.