Subagents and impact measures: summary tables

Stuart_Armstrong

17th Feb 2020

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

These tables will summarise the results of this whole sequence, checking whether subagents can neutralise the impact penalty.

First, given a subagent, here is whether the penalty is neutralised for various impact penalties, baselines, and "value difference summary functions" $f$:

$\begin{matrix} \text{Impact penalty} & \text{Penalty neutralised (stepwise inaction)?} & \text{Penalty neutralised (inaction)?} \\ \text{Non-indexical} & \text{Yes} & \text{No} \\ \text{Indexical, } f \text{ decreasing} & \text{Yes} & \text{No} \\ \text{Indexical, } f \text{ increasing} & \text{Yes} & \text{Mostly} \end{matrix}$

Another way of phrasing " $f$ decreasing": it penalises too little power, not too much. Conversely, " $f$ increasing" penalises too much power, not too little. Thus, unfortunately:

Subagents do allow an agent to get stronger than the indexical impact penalty would allow.
Subagents don't allow an agent to get weaker than the indexical impact penalty would allow.
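To make the two directions of $f$ concrete, here is a minimal sketch. It assumes the penalty is a sum of $f$ applied to the difference between actual and baseline values. The function names, the rectifier forms chosen for $f$, and the numbers are all illustrative assumptions, not the exact definitions used elsewhere in the sequence.

```python
from typing import Callable, Sequence


def f_increasing(x: float) -> float:
    """Illustrative increasing f: only gains in value are penalised (too much power)."""
    return max(0.0, x)


def f_decreasing(x: float) -> float:
    """Illustrative decreasing f: only losses in value are penalised (too little power)."""
    return max(0.0, -x)


def impact_penalty(
    f: Callable[[float], float],
    actual: Sequence[float],
    baseline: Sequence[float],
) -> float:
    """Assumed penalty form: sum of f(actual value - baseline value) over value functions."""
    return sum(f(a - b) for a, b in zip(actual, baseline))


# Hypothetical values of two value functions under the baseline,
# and for an agent that ends up stronger or weaker than the baseline.
baseline = [1.0, 2.0]
stronger = [3.0, 5.0]
weaker = [0.5, 1.0]

print(impact_penalty(f_increasing, stronger, baseline))  # gains are penalised
print(impact_penalty(f_increasing, weaker, baseline))    # losses are ignored
print(impact_penalty(f_decreasing, stronger, baseline))  # gains are ignored
print(impact_penalty(f_decreasing, weaker, baseline))    # losses are penalised
```

In this toy setting, the increasing $f$ is the one that constrains an agent from getting stronger, which is the constraint the table says a subagent mostly neutralises under the inaction baseline.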

Examples

This table presents, for three specific examples, whether they could actually build a subagent, and whether that would neutralise their impact penalty in practice (in the inaction baseline):

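$\begin{matrix} \text{System} & \text{Subagent (SA) possible?} & \text{Penalty neutralised?} \\ \text{20BQ} & \text{Yes} & \text{No} \\ \text{RR} & \text{Yes} & \text{No} \\ \text{AU} & \text{Probably} & \text{Mostly} \end{matrix}$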

Here, 20BQ is twenty billion questions, RR is relative reachability, and AU is attainable utility.

Now, whether the RR or AU penalties are undermined technically depends on $f$, not on what measure is being used for value. However, I feel that the results undermine the spirit of AU much more than the spirit of RR. AU attempted to control an agent by limiting its power; this effect is mainly neutralised. RR attempted to control the side-effects of an agent by ensuring it had enough power to reach a lot of states; this effect is not neutralised by a subagent.

Comments

TurnTrout

"RR attempted to control the side-effects of an agent by ensuring it had enough power to reach a lot of states; this effect is not neutralised by a subagent."

Things might get complicated by partial observability; in the real world, the agent is minimizing change in its beliefs about what it can reach. Otherwise, you could just get around the SA problem for AUP as well by substituting the reward functions for state indicator reward functions.

Stuart_Armstrong

AU and RR have the same SA problem, formally, in terms of excess power; it's just that AU wants low power and RR wants high power, so they don't have the same problem in practice.
