Being a Robust Agent

by Raemon, 11th Jan 2020



Second version, updated for the 2018 Review. See change notes.

There's a concept that many LessWrong essays have pointed at (indeed, I think the entire Sequences are exploring it). But I don't think there's a single post that really spells it out explicitly:

You might want to become a more robust, coherent agent.

By default, humans are a kludgy bundle of impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies.

Some people find this naturally motivating: it's aesthetically appealing to be a coherent agent. But if you don't find it naturally appealing, the reason I think it’s worth considering is robustness: being able to succeed at novel challenges in complex domains.

This is related to being instrumentally rational, but I don’t think they’re identical. If your goals are simple and well-understood, and you're interfacing in a social domain with clear rules, and/or you’re operating in domains that the ancestral environment would have reasonably prepared you for… the most instrumentally rational thing might be to just follow your instincts or common folk-wisdom.

But instinct and common wisdom often aren’t enough, such as when...

  • You expect your environment to change, and default-strategies to stop working.
  • You are attempting complicated plans for which there is no common wisdom, or where you will run into many edge-cases.
  • You need to coordinate with other agents in ways that don’t have existing, reliable coordination mechanisms.
  • You expect instincts or common wisdom to be wrong in particular ways.
  • You are trying to outperform common wisdom. (i.e. you’re a maximizer instead of a satisficer, or are in competition with other people following common wisdom)

In those cases, you may need to develop strategies from the ground up. Your initial attempts may actually be worse than the common wisdom. But in the long term, if you can acquire a gears-level understanding of yourself, the world and other agents, you might eventually outperform the default strategies.

Elements of Robust Agency

I think of Robust Agency as having a few components. This is not exhaustive, but an illustrative overview:

  • Deliberate Agency
  • Gears-level-understanding of yourself
  • Coherence and Consistency
  • Game Theoretic Soundness

Deliberate Agency

First, you need to decide to be any kind of deliberate agent at all. Don't just go along with whatever kludge of behaviors evolution and your social environment cobbled together. Instead, make conscious choices about your goals and decision procedures, choices that you reflectively endorse.

Gears Level Understanding of Yourself

In order to reflectively endorse your goals and decisions, it helps to understand your goals and decisions, as well as intermediate parts of yourself. This requires many subskills, such as the ability to introspect, or to make changes to how your decision making works.

(Meanwhile, it also helps to understand how your decisions interface with the rest of the world, and the people you interact with. Gears-level understanding is generally useful. Scientific and mathematical literacy helps you validate your understanding of the world.)

Coherence and Consistency

If you want to lose weight and also eat a lot of ice cream, that’s a valid set of human desires. But, well, it might just be impossible.

If you want to make long term plans that require commitment but also want the freedom to abandon those plans whenever, you may have a hard time. People you made plans with might get annoyed.

You can make deliberate choices about how to resolve inconsistencies in your preferences. Maybe you decide “actually, losing weight isn’t that important to me”, or maybe you decide that you want to keep eating all your favorite foods but also cut back on overall calorie consumption.

The "commitment vs freedom" example gets at a deeper issue – each of those opens up a set of broader strategies, some of which are mutually exclusive. How you resolve the tradeoff will shape what future strategies are available to you.

There are benefits to reliably being able to make trades with your future-self, and with other agents. This is easier if your preferences aren’t contradictory, and easier if your preferences are either consistent over time, or at least predictable over time.
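One way to see why contradictory preferences make trades harder is the classic "money pump". The sketch below is only an illustration under assumptions that are not in this post: a hypothetical agent with cyclic preferences over three items (A over B, B over C, C over A), and a hypothetical trader who charges a small fee for each swap. The item names, the fee, and the function name are all made up for the example.

```python
# Hypothetical cyclic preferences: (x, y) means "x is preferred to y".
# The agent prefers A to B, B to C, and C to A, which is incoherent.
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}


def run_money_pump(start_item, fee, rounds):
    """Each round, offer the agent the item it prefers to its current one, for a fee.

    Returns the item the agent ends up holding and the total fees it paid.
    """
    item, paid = start_item, 0
    for _ in range(rounds):
        # Find the item the agent prefers to what it currently holds.
        better = next(x for (x, y) in PREFERS if y == item)
        # The agent accepts every offer, since it prefers `better` to `item`.
        item, paid = better, paid + fee
    return item, paid


final_item, total_paid = run_money_pump("A", fee=1, rounds=3)
print(final_item, total_paid)  # A 3
```

After three trades the agent holds the item it started with and is three fees poorer. An agent whose preferences had no cycle would refuse at least one of those offers.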

Game Theoretic Soundness

There are other agents out there. Some of them have goals orthogonal to yours. Some have common interests with you, and you may want to coordinate with them. Others may be actively harming you and you need to stop them.

They may vary in…

  • What their goals are.
  • What their beliefs and strategies are.
  • How much they've thought about their goals.
  • Where they draw their circles of concern.
  • How hard (and how skillfully) they're trying to be game theoretically sound agents, rather than just following local incentives.

Being a robust agent means taking that into account. You must find strategies that work in a messy, mixed environment with confused allies, active adversaries, and sometimes people who are a little bit of both. (This includes creating credible incentives and punishments to deter adversaries from bothering, and motivating allies to become less confused).

Related to this is legibility. Your gears-level-model-of-yourself helps you improve your own decision making. But it also lets you clearly expose your policies to other people. This can help with trust and coordination. If you have a clear decision-making procedure that makes sense, other agents can validate it, and then you can tackle more interesting projects together.


Here’s a smattering of things I’ve found helpful to think about through this lens:

  • Be the sort of person that Omega can clearly tell is going to one-box – even a version of Omega who's only 90% accurate. Or, less exotically: Be the sort of person who your social network can clearly see is worth trusting, with sensitive information, or with power. Deserve Trust.
  • Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can realize that cooperating-in-this-particular-instance might look superficially like defecting, but avoid falling into the trap.
  • Think about the ramifications of people who think like you adopting the same strategy. Not as a cheap rhetorical trick to get you to cooperate on every conceivable thing. Actually think about how many people are similar to you. Actually think about the tradeoffs of worrying about a given thing. (Is recycling worth it? Is cleaning up after yourself at a group house? Is helping a person worth it? The answer actually depends, don't pretend otherwise).
  • If there isn't enough incentive for others to cooperate with you, you may need to build a new coordination mechanism so that there is enough incentive. Complaining or getting angry about it might be a good enough incentive but often doesn't work and/or isn't quite incentivizing the thing you meant. (Be conscious of the opportunity costs of building this coordination mechanism instead of other ones. Be conscious of trying and failing to build a coordination mechanism. Mindshare is only so big)
  • Be the sort of agent who, if some AI engineers were whiteboarding out the agent's decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.
  • Be cognizant of order-of-magnitude. Prioritize (both for things you want for yourself, and for large scale projects shooting for high impact).
  • Do all of this realistically given your bounded cognition. Don't stress about implementing a game theoretically perfect strategy, but do be cognizant of how much computing power you actually have (and periodically reflect on whether your cached strategies can be re-evaluated given new information or more time to think). If you're being simulated on a whiteboard right now, have at least a vague, credible notion of how you'd think better if given more resources.
  • Do all of this realistically given the bounded cognition of *others*. If you have a complex strategy that involves rewarding or punishing others in highly nuanced ways... and they can't figure out what your strategy is, you may just be adding random noise rather than a clear coordination protocol.
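To make the "only 90% accurate" Omega point from the first bullet concrete, here is a rough expected-payoff sketch. It assumes the standard Newcomb payoffs (a $1,000,000 opaque box and a $1,000 transparent box), which this post does not state, and it treats Omega's accuracy as the probability that the prediction matches your actual choice. That is one way of framing the problem, not a settled answer to it.

```python
from fractions import Fraction

# Assumed payoffs from the standard Newcomb setup (not given in this post).
BIG_PRIZE = 1_000_000   # in the opaque box if Omega predicted one-boxing
SMALL_PRIZE = 1_000     # always in the transparent box


def expected_payoffs(accuracy):
    """Expected payoff of each choice, given the chance Omega predicts you correctly."""
    # One-boxers get the big prize only when Omega predicted them correctly.
    one_box = accuracy * BIG_PRIZE
    # Two-boxers get the big prize only when Omega got them wrong, plus the small prize.
    two_box = (1 - accuracy) * BIG_PRIZE + SMALL_PRIZE
    return one_box, two_box


def break_even_accuracy():
    """Accuracy at which both choices have the same expected payoff."""
    return Fraction(BIG_PRIZE + SMALL_PRIZE, 2 * BIG_PRIZE)


one_box, two_box = expected_payoffs(Fraction(9, 10))
print(one_box, two_box)       # 900000 101000
print(break_even_accuracy())  # 1001/2000
```

Under these assumptions, being legibly a one-boxer pays off as soon as the predictor is right slightly more than half the time. The less exotic analogue is that your social network does not need to read you perfectly for trustworthiness to be worth it.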

Why is this important?

If you are a maximizer, trying to do something hard, it's hopefully a bit obvious why this is important. It's hard enough to do hard things without having incoherent, exploitable policies and wasting motion chasing inconsistent goals.

If you're a satisficer, and you're basically living your life pretty chill and not stressing too much about it, it's less obvious that becoming a robust, coherent agent is useful. But I think you should at least consider it, because...

The world is unpredictable

The world is changing rapidly, due to cultural clashes as well as new technology. Common wisdom can’t handle the 20th century, let alone the 21st, let alone a singularity.

I feel comfortable making the claim: Your environment is almost certainly unpredictable enough that you will benefit from a coherent approach to solving novel problems. Understanding your goals and your strategy is vital.

There are two main reasons I can see to not prioritize the coherent agent strategy:

1. There may be higher near-term priorities.

You may want to build a safety net, to give yourself enough slack to freely experiment. It may make sense to first do all the obvious things to get a job, have enough money, and social support. (That is, indeed, what I did)

I'm not kidding when I say that building your decisionmaking from the ground up can leave you worse off in the short term. The valley of bad rationality be real, yo. See this post for some examples of things to watch out for.

Becoming a coherent agent is useful, but if you don't have a general safety net, I'd prioritize that first.

2. Self-reflection and self-modification are hard.

It requires a certain amount of mental horsepower, and some personality traits that not everyone has, including:

  • Social resilience and openness-to-experience (necessary to try nonstandard strategies).
  • Something like ‘stability’ or ‘common sense’ (I’ve seen some people try to rebuild their decision theory from scratch and end up hurting themselves).
  • In general, the ability to think on purpose, and do things on purpose.

If you’re the sort of person who ends up reading this post, I think you are probably the sort of person who would benefit (someday, from a position of safety/slack) from attempting to become more coherent, robust and agentic.

I’ve spent the past few years hanging around people who are more agentic than me. It took a long while to really absorb their worldview. I hope this post gives others a clearer idea of what this path might look like, so they can consider it for themselves.

Game Theory in the Rationalsphere

That said, the reason I was motivated to write this wasn’t to help individuals. It was to help with group coordination.

The EA, Rationality and X-Risk ecosystems include lots of people with ambitious, complex goals. They have many common interests and should probably be coordinating on a bunch of stuff. But they disagree on many facts, and strategies. They vary in how hard they’ve tried to become game-theoretically-sound agents.

My original motivation for writing this post was that I kept seeing what seemed to me to be strategic mistakes in coordination. It seemed to me that people were acting as if the social landscape was more uniform than it was, and expecting people to be on the same “meta-page” about how to resolve coordination failure.

But then I realized that I’d been implicitly assuming something like “Hey, we’re all trying to be robust agents, right? At least kinda? Even if we have different goals and beliefs and strategies?”

And that wasn’t obviously true in the first place.

I think it’s much easier to coordinate with people if you are able to model each other. If people have common knowledge of a shared meta-strategic-framework, it’s easier to discuss strategy and negotiate. If multiple people are trying to make their decision-making robust in this way, that hopefully can constrain their expectations about when and how to trust each other.

And if you aren’t sharing a meta-strategic-framework, that’s important to know!

So the most important point of this post is to lay out the Robust Agent paradigm explicitly, with a clear term I could quickly refer to in future discussions, to check “is this something we’re on the same page about, or not?” before continuing on to discuss more complicated ideas.