Mistakes as agency

pchvykov

Note: this is cross-posted from my blog, which hosts the videos that I can't upload here - take a look at those here.

We’re all somehow familiar with the concept that a person's actions can be “too perfect” – to the point of seeming creepy, or mechanical, or even dangerous. Classically this comes from sci-fi portrayals of humanoid robots, but can also just be an experience of watching someone do something so well that we get a strange feeling.

Early in my PhD, I spent a year in a cognitive science research group at MIT thinking about what it is that makes us perceive some things as “alive” or “agent-like,” and other things as “inanimate” or "mechanical." The classic motivation for this comes from this simple video made by researchers back in 1944. When viewers were asked to describe the video, they would talk about a fight, an abduction, a rescue, a love story – but none would say that there were some triangles and a circle moving around on a screen. The lab I worked with ran various experimental studies with young children, in which they showed the children different sorts of behaviors of some simple shapes and checked whether each child viewed these shapes as “objects” or “agents.” I tried to develop and benchmark a simple theory for this.

The hypothesis I was working on was that the key distinguishing factor was the apparent “planning horizon.” Whenever we view some dynamics, we try to understand and infer its causes. We hypothesized that if the behavior could be explained in terms of a very short-term planner, then it would look like an object (e.g., ball rolling down hill), while if long-term planning was required to explain the dynamics, it would look like an agent.

To test this, I looked at a simple toy system: a "point-agent" exploring a utility landscape. The video here shows the motion in a 1D landscape with two local utility optima (negative utility on the y-axis). In this setting, if the short-term planner's motion is to look like that of an object, then Newton’s equations need to be recovered as the short-plan limit of some more general control theory. This turns out to be the case if the planner's reward penalizes accelerations and rewards utility, integrated across its planning horizon. As a nice bonus, when the plan is short but not zero, the first correction just looks like an additional damping term, and so still naturally fits into the "object-like" paradigm. The first video here shows the resulting motion - which clearly looks like a ball rolling in an energy landscape, with no agent-like characteristics.
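To make the short-plan limit concrete, here is a minimal sketch, not the original code from these experiments. It assumes a particular discretization: the one-step planner picks the next position by minimizing `0.5 * mass * dt**2 * a**2 + V(x_next)`, where `V` is the negative utility and `a` is the finite-difference acceleration. The landscape `potential`, the weighting of the acceleration penalty, and all function names are hypothetical choices for illustration. Under these assumptions the optimal step satisfies `mass * a = -V'(x_next)`, which is a discretized Newton's equation.

```python
def potential(x):
    """Negative utility V(x): a tilted double well (hypothetical landscape)."""
    return x**4 - 2.0 * x**2 + 0.3 * x


def potential_grad(x):
    """First derivative dV/dx."""
    return 4.0 * x**3 - 4.0 * x + 0.3


def potential_curv(x):
    """Second derivative of V, used only by the root-finder below."""
    return 12.0 * x**2 - 4.0


def one_step_plan(x_prev, x_curr, dt=0.01, mass=1.0, iters=20):
    """Choose x_next minimizing 0.5*mass*dt**2*a**2 + V(x_next),
    where a = (x_next - 2*x_curr + x_prev) / dt**2.

    The acceleration weighting is an assumption made for this sketch.
    Setting the derivative to zero gives mass * a = -V'(x_next).
    """
    x_next = 2.0 * x_curr - x_prev  # first guess: zero acceleration
    for _ in range(iters):
        accel = (x_next - 2.0 * x_curr + x_prev) / dt**2
        slope = mass * accel + potential_grad(x_next)  # d(cost)/d(x_next)
        curv = mass / dt**2 + potential_curv(x_next)   # d2(cost)/d(x_next)2
        x_next -= slope / curv                         # Newton root-finding update
    return x_next


def newton_step(x_prev, x_curr, dt=0.01, mass=1.0):
    """Explicit finite-difference step of mass * x'' = -V'(x)."""
    return 2.0 * x_curr - x_prev - dt**2 * potential_grad(x_curr) / mass


if __name__ == "__main__":
    planned = one_step_plan(-1.0, -0.99)
    physical = newton_step(-1.0, -0.99)
    print(planned, physical, abs(planned - physical))
```

In this discretization the two updates differ only in where the force is evaluated (at the next position rather than the current one), so they agree more and more closely as `dt` shrinks. Evaluating the force at the next position also makes this particular scheme slightly dissipative, which loosely echoes the damping correction mentioned above, though that is a property of this sketch rather than a reproduction of the original derivation.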

To see a difference between an agent that can plan ahead and one that cannot, I initialized its position near a local utility optimum, so that to get to the global optimum it needed to get over a hump. In the second video here, we see what the motion looks like when the agent can plan sufficiently far ahead to get over the barrier. While it does everything perfectly to optimize the specified reward and smoothly finds its way to the global optimum, the motion still hardly looks agent-like at all. This seems to invalidate our original hypothesis that agency is about planning ahead.

But as I was setting up and running these experiments, I stumbled across a curious effect. As I was doing the reward optimization numerically, the optimizer sometimes found sub-optimal dynamics - a local min of the reward function. This never happened for the short-term planner, as the optimization there was over a very small "plan space" - just the next time-step. But with the larger search space of the long-term planner, this was quite common, especially near bifurcation or decision points, which then made our "agent" sometimes take a few steps in the wrong direction. I noticed that precisely when it made such mistakes, it started looking a lot more agent-like or "alive." The third video here shows two examples of such dynamics.
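The kind of failure described above can be illustrated with a small sketch, again under assumptions of my own rather than the original setup: the plan is a list of future positions, the cost sums an acceleration penalty and the negative utility over the horizon, and the optimizer is plain gradient descent. The landscape, the weight `w`, the learning rate, and the function names are all hypothetical. The point is only that a local optimizer over a multi-step plan can settle into different plans depending on its starting guess.

```python
def potential(x):
    """Negative utility V(x): tilted double well, lower (better) near x = -1."""
    return x**4 - 2.0 * x**2 + 0.3 * x


def potential_grad(x):
    """First derivative dV/dx."""
    return 4.0 * x**3 - 4.0 * x + 0.3


def plan_cost(plan, x0, w=1.0):
    """Sum over the horizon of 0.5*w*a_t**2 + V(x_t), starting at rest at x0."""
    path = [x0, x0] + list(plan)
    total = 0.0
    for i in range(2, len(path)):
        a = path[i] - 2.0 * path[i - 1] + path[i - 2]  # unit time step
        total += 0.5 * w * a * a + potential(path[i])
    return total


def plan_grad(plan, x0, w=1.0):
    """Analytic gradient of plan_cost with respect to each planned position."""
    path = [x0, x0] + list(plan)
    n = len(path)
    acc = [0.0] * (n + 2)  # zero-padded so acc[i + 1], acc[i + 2] always exist
    for i in range(2, n):
        acc[i] = path[i] - 2.0 * path[i - 1] + path[i - 2]
    return [
        w * (acc[i] - 2.0 * acc[i + 1] + acc[i + 2]) + potential_grad(path[i])
        for i in range(2, n)
    ]


def optimize_plan(plan, x0, w=1.0, lr=0.01, steps=5000):
    """Plain gradient descent on the plan: a purely local optimizer."""
    plan = list(plan)
    for _ in range(steps):
        grad = plan_grad(plan, x0, w)
        plan = [x - lr * g for x, g in zip(plan, grad)]
    return plan


if __name__ == "__main__":
    x0, horizon = 0.96, 30  # start near the worse of the two wells
    stay_guess = [x0] * horizon
    ramp_guess = [x0 + (-1.04 - x0) * (i + 1) / horizon for i in range(horizon)]
    for name, guess in [("stay", stay_guess), ("ramp", ramp_guess)]:
        plan = optimize_plan(guess, x0)
        print(name, "final position:", plan[-1], "cost:", plan_cost(plan, x0))
```

Here a guess that stays put should remain in the nearby well, while a guess that ramps across the barrier should settle into a crossing plan with lower cost. An optimizer started from noisy or arbitrary guesses could land on either, which is one plausible way a long-horizon planner ends up taking steps "in the wrong direction."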

Sadly I did not have time to develop these preliminary insights much further, and so this work remains unpublished - which is why I'm writing it up here, so that it might see some light of day. But I think these experiments are suggestive of a cool new hypothesis, and one we don't tend to hear about very often: that the difference between whether we perceive something as object-like and mechanical, or agent-like and alive, may be primarily due to the capacity of the latter to make mistakes, be imperfect, fallible, or "only human."

This brings us back to the point at the start of this post - that actions that are too perfect can look eerily inanimate, even when we see a human agent performing them. Indeed, think of the difference between a human ballet dancer, and an ideal robotic ballet dancer: that slight imperfection makes the human somehow relatable for us. E.g., in CGI animation you have to make your characters make some unnecessary movements, each step must be different than any other, etc. We often appreciate hand-crafted art more than perfect machine-generated decorations for the same sort of minute asymmetry that makes it relatable, and thus admirable. In voice recording, you often record the song twice for the L and R channels, rather than just copying (see 'double tracking') - the slight differences make the sound "bigger" and "more alive" - etc, etc.

I find this especially exciting as it could bring a new perspective on the narrative we have around mistakes in our society. If this result were established, perhaps we could stop seeing mistakes as something to be shamed and avoided at all costs, and instead see them as an essential part of being alive - and in fact the main reason we even assign something agency. This might then also affect not just our perception of shapes on a screen, or of each other, but also of ourselves: we already have some notion that living a perfectly successful routine, doing only things we are good at, can sometimes lead to a sense of a dull, gray, or mechanical life. On the other hand, more "aliveness" comes when we try or learn new things and push our limits into areas where we aren't perfect and mistakes are abundant.

"Agency" is a subtle and complex concept, which is also somehow so important to us personally and socially. Perhaps we greatly underappreciate the integral role that mistakes play in it?

In the libertarian conception (not related to any current political movement), "liberty" has long been recognized as "freedom to make choices that many would consider mistakes or harmful." Freedom to do the approved (societally or legally) things is not actually freedom, it's just accidental compliance.

I took this to mostly mean "freedom to use illegible and personal data in decisions". I suspect your findings are similar - it's about "mistakes" in terms of not-simply-optimal-for-a-stated-goal.

Could the mistakes be only a special case of a more general trait of being prone to changing its behavior for illegible reasons?

E.g. for me, the behavior in video #3 does not look like a mistake. Initially it feels like a possibly straightforward optimizing behavior, similar to the case #1, but then the object inexplicably "changes its mind", and that switches my perception of the video into an agentic picture. A mistake is only one possible interpretation; another can be the agent getting a new input (in a way unobvious to the viewer), or maybe something else going on inside the agent's "black box".

ah, yes! good point - so something like the presence of "unseen causes"?
The other hypothesis the lab I worked with looked into was the presence of some 'internally generated forces' - sort of like an 'unmoved mover' - which feels similar to what you're suggesting?
In some way, this feels not really more general than "mistakes," but sort of a different route. Namely, I can imagine some internal forces guiding a particle perfectly through a maze in a way that will still look like an automaton.

I agree that unexpected mistakes add to the feeling of something being an agent rather than an automaton. I mentioned it as "Knightian uncertainty" of software bugs in that old post of mine. The apparent planning horizon makes sense, too. I hope there will be some more published research about this.

Some thoughts:

I don't quite understand what you mean by "doing something so well we get a strange feeling". Would you include e.g. optimal play in an endgame of chess, which was precomputed, as an instance?

In that case, I'd guess that maybe the generator of the feeling is the fact that the moves are pre-computed. Much of the optimisation was done beforehand and is not visible; it is the kind of thing you could imagine being hardcoded.

And I think the reason that hardcoded behaviour feels non-agentic is because we have the ability to hold this optimal policy in our heads and fully understand it. Any description of "goals" and "beliefs" and "optimisation" which might generate this policy is about as long as the policy itself, and both are short enough that we can understand them.

Agency seems like a sweet spot. The system is complex enough that we need to ascribe simplified descriptions of "goals" and "beliefs" and "intentionality" to predict its behaviour, and not so simple that we can completely simulate its behaviour in our heads, or perhaps even with any computation method we feel we can fully understand. But it is not so complex that a "goal+belief+intention" description is beyond our capacity and its actions seem inscrutable.

How's this related to making mistakes? Well, the "agent" description seems to better account for errors than the system being a perfectly predictable optimisation process with the simple goal of "get to the bottom". And its optimisation is not so rapid that it feels like we're just watching the end policy. We get to see it "thinking" live. As a result of this, I'd suspect that if I understood the algorithm better, I'd feel like it was less agentic.

I like the idea of agency being some sweet spot between being too simple and too complex, yes. Though I'm not sure I agree that if we can fully understand the algorithm, then we won't view it as an agent. I think the algorithm for this point particle is simple enough for us to fully understand, but due to the stochastic nature of the optimization algorithm, we can never fully predict it. So I guess I'd say agency isn't a sweet spot in the amount of computation needed, but rather in the amount of stochasticity perhaps?

As for other examples of "doing something so well we get a strange feeling," the chess example wouldn't be my go-to, since the action space there is somehow "small" since it is discrete and finite. I'm more thinking of the difference between a human ballet dancer, and an ideal robotic ballet dancer - that slight imperfection makes the human somehow relatable for us. E.g., in CGI you have to make your animated characters make some unnecessary movements, each step must be different than any other, etc. We often admire hand-crafted art more than perfect machine-generated decorations for the same sort of minute asymmetry that makes it relatable, and thus admirable. In voice recording, you often record the song twice for the L and R channels, rather than just copying (see 'double tracking') - the slight differences make the sound "bigger" and "more alive." Etc, etc.

Does this make sense?

I think that if we fully understood the algorithm and had chunked it in our heads so we could just imagine manipulating it any way we liked, then we would view it as less agenty. But of course, a lot of our intuitions are rough heuristics and they might misfire in various ways and make us think "agent!" in a way we don't reflectively endorse (like we don't endorse "a smiley face --> a person").

Or, you know, my attempted abstraction of my agent intuitions fails in some way. I think that the stochasticity thing might play a part in being an agent. Like, maybe because most agents are power-seeking, and power-seeking behaviour is about leaving lots of options live and thus increasing others' uncertainty about your future actions. Wasn't there that paper about entropy which someone linked to in the comments of one of TurnTrout's "rethinking impact" posts? It was about modeling entropy in a way that shared mathematical structure with impact measures. Of course, there are also some kinds of logical uncertainty when you model an agent modeling you.

As for the examples of dancing, CGI and music, I'd say that's more about "natural/human" vs "unnatural/inhuman" than "agent" vs "not-agent", though there's a large inner product between the two axes.

just updated the post to add this clarification about "too perfect" - thanks for your question!