Being a Robust Agent

by Raemon, 4th Oct 2018


Epistemic status: not adding anything new, but figured there should be a clearer reference post for this concept.

There's a concept which many LessWrong essays have pointed at, but I don't think there's a single post really spelling it out. I've built up an understanding of it through conversations with Zvi and Critch, and reading particular posts by Eliezer such as Meta-Honesty. (Note: none of them necessarily endorse this post, it's just my own understanding.)

The idea is: you might want to become a more robust agent.

By default, humans are a kludgy bundle of ad-hoc impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies.

I don't think this is quite the same thing as instrumental rationality (although it's tightly entwined). If your goals are simple and well-understood, and you're interfacing in a social domain with clear rules, the most instrumentally rational thing might be to not overthink it and follow common wisdom.

But it's particularly important if you want to coordinate with other agents, over the long term. Especially on ambitious, complicated projects in novel domains.

Some examples of this:

  • Be the sort of person that Omega (even a version of Omega who's only 90% accurate) can clearly tell is going to one-box. Or, more realistically – be the sort of person who your social network can clearly see is worth trusting, with sensitive information, or with power.
  • Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can realize that cooperating-in-this-particular-instance might look superficially like defecting, but avoid falling into a trap.
  • Think about the ramifications of people who think like you adopting the same strategy. Not as a cheap rhetorical trick to get you to cooperate on every conceivable thing. Actually think about how many people are similar to you. Actually think about the tradeoffs of worrying about a given thing. (Is recycling worth it? Is cleaning up after yourself at a group house? Is helping a person worth it? The answer actually depends, don't pretend otherwise).
  • If there isn't enough incentive for others to cooperate with you, you may need to build a new coordination mechanism so that there is enough incentive. Complaining or getting angry about it might be a good enough incentive, but it often doesn't work and/or isn't quite incentivizing the thing you meant. (Be conscious of the opportunity costs of building this coordination mechanism instead of other ones. Mindshare is only so big.)
  • Be the sort of agent who, if some AI engineers were whiteboarding out the agent's decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.
  • Be cognizant of order-of-magnitude. Prioritize (both for things you want for yourself, and for large scale projects shooting for high impact).
  • Do all of this realistically given your bounded cognition. Don't stress about implementing a game theoretically perfect strategy, but do be cognizant of how much computing power you actually have (and periodically reflect on whether your cached strategies can be re-evaluated given new information or more time to think). If you're being simulated on a whiteboard right now, have at least a vague, credible notion of how you'd think better if given more resources.
  • Do all of this realistically given the bounded cognition of *others*. If you have a complex strategy that involves rewarding or punishing others in highly nuanced ways... and they can't figure out what your strategy is, you may just be adding random noise rather than a clear coordination protocol.
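
To put rough numbers on the Omega example at the top of this list, here is a minimal sketch of the expected winnings for one-boxers and two-boxers when Omega is only 90% accurate. The payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) are the conventional figures from Newcomb's problem, not something this post specifies, and the function names are hypothetical. It also assumes Omega's accuracy is the same for both kinds of agent, and it scores each disposition by its average payoff rather than settling any decision-theory debate.

```python
def expected_payoff(one_box: bool, accuracy: float,
                    big_prize: float = 1_000_000,
                    small_prize: float = 1_000) -> float:
    """Average winnings for an agent with a fixed disposition.

    `accuracy` is the probability that Omega predicts the agent correctly.
    The prize values are assumed (the usual Newcomb figures).
    """
    if one_box:
        # The opaque box is full only when Omega correctly foresaw one-boxing.
        return accuracy * big_prize
    # A two-boxer always takes the small prize, and also finds the opaque
    # box full in the cases where Omega wrongly predicted one-boxing.
    return small_prize + (1 - accuracy) * big_prize


def break_even_accuracy(big_prize: float = 1_000_000,
                        small_prize: float = 1_000) -> float:
    """Accuracy at which both dispositions have the same average payoff.

    Solves  a * B = S + (1 - a) * B  for a, giving  a = (B + S) / (2 * B).
    """
    return (big_prize + small_prize) / (2 * big_prize)


if __name__ == "__main__":
    print("one-box :", expected_payoff(True, 0.9))
    print("two-box :", expected_payoff(False, 0.9))
    print("break-even accuracy:", break_even_accuracy())
```

Under these assumed payoffs, a 90%-accurate Omega gives the one-boxer roughly $900,000 on average against roughly $101,000 for the two-boxer, and the two only break even when Omega is barely better than a coin flip. The predictor does not need to be perfect for it to matter that your disposition is legible.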

Game Theory in the Rationalsphere

The EA and Rationality worlds include lots of people with ambitious, complex goals. They have a bunch of common interests and probably should be coordinating on a bunch of stuff. But:

  • They vary in how much they've thought about their goals.
  • They vary in what their goals are.
  • They vary in where their circles of concern are drawn.
  • They vary in how hard (and how skillfully) they're trying to be game theoretically sound agents, rather than just following local incentives.
  • They disagree on facts and strategies.

Being a robust agent means taking that into account, and executing strategies that work in a messy, mixed environment with confused allies, active adversaries, and sometimes people who are a little bit of both. (Although this includes creating credible incentives and punishments to deter adversaries from bothering, and encouraging allies to become less confused).

I'm still mulling over exactly how to translate any of this into actionable advice (for myself, let alone others). But all the other posts I wanted to write felt like they'd be easier if I could reference this concept in an off-the-cuff fashion without having to explain it in detail.