Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues.

Multi-dimensional aspirations

For real-world tasks that are specified in terms of more than a single evaluation metric, e.g., how many apples to buy and at most how much money to spend, we can generalize Algorithm 2 from aspiration intervals to convex aspiration sets as follows:

  • Assume there are $d$ many evaluation metrics $f_1, \dots, f_d$, combined into a vector-valued evaluation metric $f = (f_1, \dots, f_d)$.
  • Preparation: Pick $d'$ many linear combinations $g_1, \dots, g_{d'}$ in the space spanned by these metrics so that their convex hull is full-dimensional and contains the origin, and consider the $d'$ many policies $\pi_1, \dots, \pi_{d'}$, each of which maximizes the expected value of the corresponding function $g_i$. Let $V^i(s)$ and $Q^i(s,a)$ be the expected values of $f$ when using $\pi_i$ in state $s$ or after using action $a$ in state $s$, respectively (see Fig. 1). Let the admissibility simplices $\mathcal{V}(s)$ and $\mathcal{Q}(s,a)$ be the simplices spanned by the vertices $V^i(s)$ and $Q^i(s,a)$, respectively (red and violet triangles in Fig. 1). They replace the feasibility intervals used in Algorithm 2.
  • Policy: Given a convex state-aspiration set $\mathcal{E}(s)$ (central green polyhedron in Fig. 1), compute its midpoint (centre of mass) $m$ and consider the $d'$ segments $\ell_i$ from $m$ to the corners $V^i(s)$ of $\mathcal{V}(s)$ (dashed black lines in Fig. 1). For each of these segments $\ell_i$, let $A_i$ be the (nonempty!) set of actions $a$ for which $\ell_i$ intersects $\mathcal{Q}(s,a)$. For each $a \in A_i$, compute the action-aspiration $\mathcal{E}(s,a)$ by shifting a copy $\mathcal{E}'$ of $\mathcal{E}(s)$ along $\ell_i$ towards $V^i(s)$ until the intersection of $\mathcal{E}'$ and $\ell_i$ is contained in the intersection of $\mathcal{Q}(s,a)$ and $\ell_i$ (half-transparent green polyhedra in Fig. 1), and then intersecting $\mathcal{E}'$ with $\mathcal{Q}(s,a)$ to give $\mathcal{E}(s,a)$ (yellow polyhedra in Fig. 1). Then pick one candidate action $a_i$ from each $A_i$ and randomize between these $d'$ actions in proportions so that the corresponding convex combination of the sets $\mathcal{E}(s,a_i)$ is included in $\mathcal{E}(s)$. Note that this is always possible because $m$ is in the convex hull of the sets $\mathcal{E}(s,a_i)$ and the shapes of the sets $\mathcal{E}(s,a_i)$ "fit" into $\mathcal{E}(s)$ by construction.
  • Aspiration propagation: After observing the successor state $s'$, the action-aspiration $\mathcal{E}(s,a)$ is rescaled linearly from $\mathcal{Q}(s,a)$ to $\mathcal{V}(s')$ to give the next state-aspiration $\mathcal{E}(s')$, see Fig. 2.
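To illustrate the preparation step, here is a minimal sketch (assuming numpy, and assuming the simple choice of $d' = d+1$ combination directions; the construction and function names are mine, not from the post) of picking direction vectors whose convex hull is full-dimensional and contains the origin, and verifying that property via barycentric weights:

```python
import numpy as np

def simplex_directions(d):
    """Return d+1 direction vectors in R^d: the d standard basis vectors
    plus a negated, rescaled all-ones vector. Their convex hull is
    full-dimensional and contains the origin in its interior."""
    return np.vstack([np.eye(d), -np.ones((1, d)) / np.sqrt(d)])

def barycentric_weights(vertices, point):
    """Solve for weights w with sum(w) = 1 and w @ vertices = point."""
    A = np.vstack([vertices.T, np.ones(len(vertices))])
    b = np.append(point, 1.0)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

dirs = simplex_directions(3)
w = barycentric_weights(dirs, np.zeros(3))
print(np.round(w, 6))  # all weights strictly positive: origin is inside the hull
```

Strictly positive weights summing to one certify that the origin lies in the interior of the hull, which is the condition the preparation step asks for.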

(We also consider other variants of this general idea.)
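The randomization step of the policy can be sketched numerically: the mixing proportions are exactly the barycentric coordinates of the state-aspiration midpoint with respect to representative points of the candidate action-aspiration sets. A toy 2-D example (all numbers made up for illustration; numpy assumed):

```python
import numpy as np

# Hypothetical 2-D example (d = 2, three candidate actions): midpoints of
# three action-aspiration sets, and the midpoint m of the state-aspiration
# set that their convex combination should reproduce.
action_midpoints = np.array([[4.0, 1.0], [1.0, 5.0], [7.0, 6.0]])
m = np.array([4.0, 4.0])

# Solve for probabilities p with sum(p) = 1 and p @ action_midpoints = m
# (barycentric coordinates of m w.r.t. the triangle of action midpoints).
A = np.vstack([action_midpoints.T, np.ones(3)])
b = np.append(m, 1.0)
p = np.linalg.solve(A, b)

assert np.all(p >= 0)  # m lies inside the triangle, so this is a valid lottery
print(np.round(p, 4))  # -> [0.3333 0.3333 0.3333]
```

Nonnegativity of the solution is guaranteed here because the midpoint lies in the convex hull of the candidate points, mirroring the "always possible" remark in the policy step.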

Fig. 1: Admissibility simplices, and construction of action-aspirations by shifting towards corners and intersecting with action admissibility simplices (see text for details).
Fig. 2: An action admissibility simplex $\mathcal{Q}(s,a)$ is the convex combination of the successor states' admissibility simplices $\mathcal{V}(s')$, mixed in proportion to the respective transition probabilities $P(s'|s,a)$. An action-aspiration $\mathcal{E}(s,a)$ can be rescaled to a successor state-aspiration $\mathcal{E}(s')$ by first mapping the corners of the action admissibility sets onto each other (dashed lines) and extending this map linearly.
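The two operations in Fig. 2 can be sketched in code (a toy 2-D example with invented corners and transition probabilities; numpy assumed): mixing successor simplices corner-by-corner, and rescaling a point by re-expressing its barycentric coordinates in the target simplex.

```python
import numpy as np

# Hypothetical 2-D example with three corners per simplex and two successors.
V1 = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])   # corners of V(s'_1)
V2 = np.array([[2.0, 2.0], [6.0, 2.0], [2.0, 6.0]])   # corners of V(s'_2)
P = np.array([0.75, 0.25])                            # transition probabilities

# Corner i of Q(s,a) is the P-weighted mixture of corner i of each V(s').
Q = P[0] * V1 + P[1] * V2

def rescale(point, src_corners, dst_corners):
    """Map corners of the source simplex onto corners of the destination
    simplex and extend linearly: compute barycentric coordinates in the
    source, then rebuild the point from the destination corners."""
    A = np.vstack([src_corners.T, np.ones(len(src_corners))])
    w = np.linalg.solve(A, np.append(point, 1.0))
    return w @ dst_corners

e_action = Q.mean(axis=0)          # e.g. take the centroid of Q(s,a)
e_state = rescale(e_action, Q, V1)
print(np.round(e_state, 4))        # the centroid maps to the centroid of V(s'_1)
```

Because the map is affine, barycentric coordinates are preserved, so centroids map to centroids and a whole aspiration set can be rescaled by rescaling its corners.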

Hierarchical decision making

A common way of planning complex tasks is to decompose them into a hierarchy of two or more levels of subtasks. Similarly to existing approaches from hierarchical reinforcement learning, we imagine that an AI system can make such hierarchical decisions as depicted in the following diagram (shown for only two hierarchical levels, but obviously generalizable to more levels):

Fig. 3: Hierarchical world model in the case of two hierarchical levels of decision making.
Comments

"Pick $d'$ many linearly independent linear combinations $g_1, \dots, g_{d'}$"
Aren't there at most $d$ linearly independent linear combinations of $d$ metrics?

Maybe you meant pairwise linearly independent (by looking at the graph)?

You are of course perfectly right. What I meant was: so that their convex hull is full-dimensional and contains the origin. I fixed it. Thanks for spotting this!