Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

tl;dr: Sometimes planners successfully P2B, kicking off a self-sustaining chain reaction / feedback loop of better and better plans (made possible by better and better world-models, more and more resources, etc.). Whereas fire takes a concentration of heat as input and produces a greater concentration of heat as output, agents take a concentration of convergent instrumental resources (e.g. data, money, power) as input and produce a greater concentration as output.

Previously we described the Convergent-Instrumental-Goal-to-rule-them-all, P2B: Plan to P2B Better. The most common way to P2B is to plan to plan again, in the immediate future, but with some new relevant piece of data. For example, suppose I am walking somewhere. I look up from my phone to glance at the door. Why? Because without data about the location of the doorknob, I can’t guide my hands to open the door. This sort of thing doesn’t just happen on short timescales, though; humans (and all realistic agents, I’d wager) are hierarchical planners, and medium- and long-term plans usually involve acquiring new relevant data as well.

This image just talks about collecting good new data, but there are other important convergent instrumental goals too, such as staying alive, propagating your goals to other agents, and acquiring money. One could modify the diagram to include these other feedback loops as well: more data and money and power being recruited by sensor-learner-planners to acquire more data and money and power. It’s not that all of these metrics will go up with every cycle around the loop; it’s that some goals/resources are “instrumentally convergent” in that they tend to show up frequently in most real-world P2B loops.

Agents, I say, are P2B chain reactions / P2B feedback loops. They are what happens when planners successfully P2B. Whereas fire is a chain reaction that inputs heat + fuel + oxygen and outputs more heat (and often more oxygen and fuel too, as it expands to envelop more of them), agents are chain reactions that input sensor-learner-planners with some amount of data, knowledge, power, money, etc. and output more, better sensor-learner-planners with greater amounts of data, knowledge, power, money, etc.

(I don’t think this definition fully captures our intuitive concept of agency. Rather, P2B chain reactions seem like a big deal, an important concept worth talking about, and close enough to our intuitive concept of agency that I’m going to appropriate the term until someone pushes back.)

Already you might be able to guess why I think agents are powerful:

  • Consider how fire is a much bigger deal than baking soda and vinegar. General/robust feedback loops are way, way better/more important than narrow/brittle ones, and sensor-learner-planner makes for a P2B loop that is pretty damn general.
  • OK, so we are talking about a kind of chain reaction that takes concentrations of knowledge, power, money, etc. and makes them bigger? That sure sounds like it’ll be relevant to discussions of who’s going to win important competitions over market share, political power, and control of the future!

The next post in this sequence will address the following questions, and more:

OK, so does any feedback loop of accumulating convergent instrumental resources count as an agent, then? Presumably not; presumably it has to result from some sort of planning to count as a P2B loop. Earlier you said planning is a family of algorithms… Please say more about where the borders of this concept are!

The post after that will answer the Big Questions: 

Why is agency powerful? Why should we expect agent AGIs to outcompete human+tool combos for control of the future? Etc.