Breaking Down Goal-Directed Behaviour

Oliver Sourbut


16th Jun 2022

AI Alignment Forum


When we speak about entities 'wanting' things, or having 'goal-directed behaviour', what do we mean?

Because most of the actors that we (my human readers and I) frequently and attentively interact with are (presumably) computationally similar (to each other and to ourselves), it is easy for our abstractions to conflate phenomena which are in fact different^[1], or to couple together impressions of phenomena which are in fact separable^[2]. On the occasions when we do attentively observe dissimilar actors^[3], we are often enough observing them 'in the wild' (i.e. in their 'habitat') that conflating - anthropomorphising - is sufficiently predictive to get by^[4], so these poor abstractions are insufficiently challenged.

Further, most of the people who in fact attentively observe a particular class of human-dissimilar actors (enough to perceive and understand the non-conflations) are focused domain experts about that particular class, and talk mainly to other domain experts about them^[5]. There are few venues in which it is useful to have unambiguous terminology here. Thus, the language we use to communicate our abstractions about goal-directed behaviour is prone to conflation and confusion even when some people have some of the right abstractions^[6].

Here I aim to take steps to break down 'goal-directed behaviour' into a conceptual framework of computational abstractions, for which I offer tentative terminology, and which helps me to better understand and describe analogies and disanalogies between various goal-directed systems. The overarching motivation is to better understand goal-directed behaviour, in the sense of being able to better predict its implications (especially counterfactual and off-distribution ones), how it arises, and its other properties. Hopefully it is clear why I consider this worthwhile.

To ground this discussion, I refer to a reasonably diverse menagerie of candidate goal-directed systems, including natural and artificial systems at various levels of organisation. Contemplating this diverse collection is what prompted and refined the ideas, and it gives some confidence that the abstractions are appropriate.

[Figure: A collection of a few 'agents' drawn from the menagerie]

[1] Different in the sense that, even if the observed surface phenomena are similar 'in the wild', their behaviour in different contexts might radically come apart; that is, a conflation is a poor predictor for out-of-distribution behaviour.
[2] Separable meaning that the absence of one or other piece is a meaningful, conceivable state of affairs, even if in practice they are almost always found composed together; that is, a coupled abstraction means if we have an impression of the one, we (perhaps incorrectly) assume presence of the other(s).
[3] That is, dissimilar from humans and from each other e.g. ants, genes, chess-engines, learning algorithms, corporations...
[4] If it is not clear why 'in the wild' (or 'in the typical setting') is important for the predictiveness of the anthropomorphism-heuristic, hopefully "Deliberation and Reflexes" and "Where does Deliberation Come From?" will clarify. In short, an actor fit for a particular setting can carry out 'deliberate-looking' behaviours without 'deliberative machinery', because the process which generated the actor provides enough (slow, gradual) 'deliberation' to locate such behaviours and bake them into 'reflexes'.
[5] e.g. entomologists, ornithologists, AI researchers, business executives, economists... are not often in a room together, at least not mutually-knowingly in their capacity as said experts of their respective fields.
[6] I attempt to avoid use of the term 'agent' as it is a very loaded term which carries many connotations. In fact it is unfortunately a perfect exemplary linguistic victim of the abstractive conflation and coupling phenomena I have described. (I think the recent reception of the Gato paper was confused in part as a consequence of this.) I substitute 'actor' and 'controller' more freely as less loaded terms.

Tags: Abstraction, Goal-Directedness, Optimization, Rationality, World Modeling


Richard_Kennaway (4y), quoting the post:

"When we speak about entities 'wanting' things, or having 'goal-directed behaviour', what do we mean?"

I would mean that the entity can make certain observations of the world, and varies its actions as necessary to steer the world towards states where those observations take certain values. That the observations take those values is the goal. The entity's actions in steering towards that goal are its "goal-directed behaviour". "Wanting" is too anthropomorphic a concept for my taste in describing this situation, but one can say that the target states of the observations are what it "wants".
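The definition in this comment reads much like a feedback control loop, and a toy version may make it concrete. The following is a minimal sketch under stated assumptions, not anything proposed in the post or the comment: the one-dimensional `World`, the proportional `Controller`, the `gain` parameter, and the `run` helper are all hypothetical names chosen for illustration. The `target` field plays the role of the value the observation should take (the goal), and `act` is the behaviour that varies with what is observed.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical one-dimensional world, e.g. a room temperature."""
    state: float

    def observe(self) -> float:
        # The entity's observation of the world (here, the state itself).
        return self.state

    def apply(self, action: float) -> None:
        # The action nudges the world state up or down.
        self.state += action


@dataclass
class Controller:
    """Hypothetical proportional controller (an assumption, not the author's model)."""
    target: float      # the value the observation should take: the goal
    gain: float = 0.5  # fraction of the remaining gap closed per step

    def act(self, observation: float) -> float:
        # The action depends on the gap between goal and observation,
        # so different situations produce different actions.
        return self.gain * (self.target - observation)


def run(world: World, controller: Controller, steps: int) -> float:
    """Repeat the observe-then-act loop and return the final observation."""
    for _ in range(steps):
        world.apply(controller.act(world.observe()))
    return world.observe()


if __name__ == "__main__":
    # Starting below or above the target, the same controller pushes in
    # opposite directions and ends up near the same observed value.
    print(run(World(state=0.0), Controller(target=20.0), steps=50))
    print(run(World(state=35.0), Controller(target=20.0), steps=50))
```

In this sketch the controller contains no representation of 'wanting' beyond a stored target value and a rule for acting on the gap, which is roughly the sense in which the comment treats the target states of the observations as what the entity "wants".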
