Being a Robust Agent

Second version, updated for the 2018 Review. See change notes.

There's a concept which many LessWrong essays have pointed at (indeed, I think the entire Sequences are exploring it). But I don't think there's a single post really spelling it out explicitly:

You might want to become a more robust, coherent agent.

By default, humans are a kludgy bundle of impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies.

Some people find this naturally motivating – it's aesthetically appealing to be a coherent agent. But if you don't find it naturally appealing, the reason I think it’s worth considering is robustness – being able to succeed at novel challenges in complex domains.

This is related to being instrumentally rational, but I don’t think they’re identical. If your goals are simple and well-understood, and you're interfacing in a social domain with clear rules, and/or you’re operating in domains that the ancestral environment would have reasonably prepared you for… the most instrumentally rational thing might be to just follow your instincts or common folk-wisdom.

But instinct and common wisdom often aren’t enough, such as when...

  • You expect your environment to change, and default-strategies to stop working.
  • You are attempting complicated plans for which there is no common wisdom, or where you will run into many edge-cases.
  • You need to coordinate with other agents in ways that don’t have existing, reliable coordination mechanisms.
  • You expect instincts or common wisdom to be wrong in particular ways.
  • You are trying to outperform common wisdom. (i.e. you’re a maximizer instead of a satisficer, or are in competition with other people following common wisdom)

In those cases, you may need to develop strategies from the ground up. Your initial attempts may actually be worse than the common wisdom. But in the long term, if you can acquire gears-level understanding of yourself, the world and other agents, you might eventually outperform the default strategies.

Elements of Robust Agency

I think of Robust Agency as having a few components. This is not exhaustive, but an illustrative overview:

  • Deliberate Agency
  • Gears-level-understanding of yourself
  • Coherence and Consistency
  • Game Theoretic Soundness

Deliberate Agency

First, you need to decide to be any kind of deliberate agent at all. Don't just go along with whatever kludge of behaviors that evolution and your social environment cobbled together. Instead, make conscious choices about your goals and decision procedures that you reflectively endorse.

Gears Level Understanding of Yourself

In order to reflectively endorse your goals and decisions, it helps to understand your goals and decisions, as well as intermediate parts of yourself. This requires many subskills, such as the ability to introspect, or to make changes to how your decision making works.

(Meanwhile, it also helps to understand how your decisions interface with the rest of the world, and the people you interact with. Gears level understanding is generally useful. Scientific and mathematical literacy helps you validate your understanding of the world)

Coherence and Consistency

If you want to lose weight and also eat a lot of ice cream, that’s a valid set of human desires. But, well, it might just be impossible.

If you want to make long term plans that require commitment but also want the freedom to abandon those plans whenever, you may have a hard time. People you made plans with might get annoyed.

You can make deliberate choices about how to resolve inconsistencies in your preferences. Maybe you decide “actually, losing weight isn’t that important to me”, or maybe you decide that you want to keep eating all your favorite foods but also cut back on overall calorie consumption.

The "commitment vs freedom" example gets at a deeper issue – each of those opens up a set of broader strategies, some of which are mutually exclusive. How you resolve the tradeoff will shape what future strategies are available to you.

There are benefits to reliably being able to make trades with your future-self, and with other agents. This is easier if your preferences aren’t contradictory, and easier if your preferences are either consistent over time, or at least predictable over time.

Game Theoretic Soundness

There are other agents out there. Some of them have goals orthogonal to yours. Some have common interests with you, and you may want to coordinate with them. Others may be actively harming you and you need to stop them.

They may vary in…

  • What their goals are.
  • What their beliefs and strategies are.
  • How much they've thought about their goals.
  • Where they draw their circles of concern.
  • How hard (and how skillfully) they're trying to be game theoretically sound agents, rather than just following local incentives.

Being a robust agent means taking that into account. You must find strategies that work in a messy, mixed environment with confused allies, active adversaries, and sometimes people who are a little bit of both. (This includes creating credible incentives and punishments to deter adversaries from bothering, and motivating allies to become less confused).

Related to this is legibility. Your gears-level-model-of-yourself helps you improve your own decision making. But it also lets you clearly expose your policies to other people. This can help with trust and coordination. If you have a clear decision-making procedure that makes sense, other agents can validate it, and then you can tackle more interesting projects together.


Here’s a smattering of things I’ve found helpful to think about through this lens:

  • Be the sort of person that Omega can clearly tell is going to one-box – even a version of Omega who's only 90% accurate. Or, less exotically: Be the sort of person who your social network can clearly see is worth trusting, with sensitive information, or with power. Deserve Trust.
  • Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can realize that cooperating-in-this-particular-instance might look superficially like defecting, but avoid falling into the trap.
  • Think about the ramifications of people who think like you adopting the same strategy. Not as a cheap rhetorical trick to get you to cooperate on every conceivable thing. Actually think about how many people are similar to you. Actually think about the tradeoffs of worrying about a given thing. (Is recycling worth it? Is cleaning up after yourself at a group house? Is helping a person worth it? The answer actually depends, don't pretend otherwise).
  • If there isn't enough incentive for others to cooperate with you, you may need to build a new coordination mechanism so that there is enough incentive. Complaining or getting angry about it might be a good enough incentive but often doesn't work and/or isn't quite incentivizing the thing you meant. (Be conscious of the opportunity costs of building this coordination mechanism instead of other ones. Be conscious of trying and failing to build a coordination mechanism. Mindshare is only so big)
  • Be the sort of agent who, if some AI engineers were whiteboarding out the agent's decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.
  • Be cognizant of order-of-magnitude. Prioritize (both for things you want for yourself, and for large scale projects shooting for high impact).
  • Do all of this realistically given your bounded cognition. Don't stress about implementing a game theoretically perfect strategy, but do be cognizant of how much computing power you actually have (and periodically reflect on whether your cached strategies can be re-evaluated given new information or more time to think). If you're being simulated on a whiteboard right now, have at least a vague, credible notion of how you'd think better if given more resources.
  • Do all of this realistically given the bounded cognition of *others*. If you have a complex strategy that involves rewarding or punishing others in highly nuanced ways... and they can't figure out what your strategy is, you may just be adding random noise instead of a clear coordination protocol.
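
The Omega bullet above can be made concrete with a little arithmetic. The sketch below is only an illustration and not something from the original post: it assumes the conventional Newcomb payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one), the function names are hypothetical, and it treats the predictor's accuracy as the chance that it correctly reads which sort of agent you are.

```python
def expected_payoffs(accuracy, big=1_000_000, small=1_000):
    """Expected payoff of each disposition against an imperfect predictor.

    `big` and `small` are assumed payoffs (the usual textbook values),
    not numbers taken from the post.
    """
    # A one-boxer gets `big` only when the predictor read them correctly.
    one_box = accuracy * big
    # A two-boxer always gets `small`, plus `big` when the predictor erred.
    two_box = small + (1 - accuracy) * big
    return one_box, two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy at which both dispositions have equal expected payoff.

    Solves accuracy * big == small + (1 - accuracy) * big for accuracy.
    """
    return (big + small) / (2 * big)


if __name__ == "__main__":
    one_box, two_box = expected_payoffs(0.9)
    print(round(one_box), round(two_box))
    print(break_even_accuracy())
```

Under those assumed payoffs, a predictor that is only 90% accurate still leaves the one-boxing disposition with about $900,000 in expectation against about $101,000 for two-boxing, and the break-even accuracy is 50.05%. The point of the bullet survives a fairly unreliable Omega: what matters is being legibly the sort of agent who one-boxes.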

Why is this important?

If you are a maximizer, trying to do something hard, it's hopefully a bit obvious why this is important. It's hard enough to do hard things without having incoherent exploitable policies and wasted motion chasing inconsistent goals.

If you're a satisficer, and you're basically living your life pretty chill and not stressing too much about it, it's less obvious that becoming a robust, coherent agent is useful. But I think you should at least consider it, because...

The world is unpredictable

The world is changing rapidly, due to cultural clashes as well as new technology. Common wisdom can’t handle the 20th century, let alone the 21st, let alone a singularity.

I feel comfortable making the claim: Your environment is almost certainly unpredictable enough that you will benefit from a coherent approach to solving novel problems. Understanding your goals and your strategy is vital.

There are two main reasons I can see to not prioritize the coherent agent strategy:

1. There may be higher near-term priorities.

You may want to build a safety net, to give yourself enough slack to freely experiment. It may make sense to first do all the obvious things to get a job, have enough money, and social support. (That is, indeed, what I did)

I'm not kidding when I say that building your decisionmaking from the ground up can leave you worse off in the short term. The valley of bad rationality be real, yo. See this post for some examples of things to watch out for.

Becoming a coherent agent is useful, but if you don't have a general safety net, I'd prioritize that first.

2. Self-reflection and self-modification are hard.

It requires a certain amount of mental horsepower, and some personality traits that not everyone has, including:

  • Social resilience and openness-to-experience (necessary to try nonstandard strategies).
  • Something like ‘stability’ or ‘common sense’ (I’ve seen some people try to rebuild their decision theory from scratch and end up hurting themselves).
  • In general, the ability to think on purpose, and do things on purpose.

If you’re the sort of person who ends up reading this post, I think you are probably the sort of person who would benefit (someday, from a position of safety/slack) from attempting to become more coherent, robust and agentic.

I’ve spent the past few years hanging around people who are more agentic than me. It took a long while to really absorb their worldview. I hope this post gives others a clearer idea of what this path might look like, so they can consider it for themselves.

Game Theory in the Rationalsphere

That said, the reason I was motivated to write this wasn’t to help individuals. It was to help with group coordination.

The EA, Rationality and X-Risk ecosystems include lots of people with ambitious, complex goals. They have many common interests and should probably be coordinating on a bunch of stuff. But they disagree on many facts, and strategies. They vary in how hard they’ve tried to become game-theoretically-sound agents.

My original motivation for writing this post was that I kept seeing what seemed to me to be strategic mistakes in coordination. It seemed to me that people were acting as if the social landscape was more uniform, and expecting people to be on the same “meta-page” of how to resolve coordination failure.

But then I realized that I’d been implicitly assuming something like “Hey, we’re all trying to be robust agents, right? At least kinda? Even if we have different goals and beliefs and strategies?”

And that wasn’t obviously true in the first place.

I think it’s much easier to coordinate with people if you are able to model each other. If people have common knowledge of a shared meta-strategic-framework, it’s easier to discuss strategy and negotiate. If multiple people are trying to make their decision-making robust in this way, that hopefully can constrain their expectations about when and how to trust each other.

And if you aren’t sharing a meta-strategic-framework, that’s important to know!

So the most important point of this post is to lay out the Robust Agent paradigm explicitly, with a clear term I could quickly refer to in future discussions, to check “is this something we’re on the same page about, or not?” before continuing on to discuss more complicated ideas.

Author here. I still endorse the post and have continued to find it pretty central to how I think about myself and nearby ecosystems.

I just submitted some major edits to the post. Changes include:

1. Name change ("Robust, Coherent Agent")

After much hemming and hawing and arguing, I changed the name from "Being a Robust Agent" to "Being a Robust, Coherent Agent." I'm not sure if this was the right call.

It was hard to pin down exactly one "quality" that the post was aiming at. Coherence was the single word that pointed towards "what sort of agent to become." But I think "robustness" still points most clearly towards why you'd want to change. I added some clarifying remarks about that. In individual sentences I tend to refer to either "Robust Agents" or "Coherent Agents" depending on what that sentence was talking about.

Other options include "Reflective Agent" or "Deliberate Agent." (I think once you deliberate on what sort of agent you want to be, you often become more coherent and robust, although not necessarily)

Edit: Undid the name change, seemed like it was just a worse title.

2. Spelling out what exactly the strategy entails

Originally the post was vaguely gesturing at an idea. It seemed good to try to pin that idea down more clearly. This does mean that, by getting "more specific" it might also be more "wrong." I've run the new draft by a few people and I'm fairly happy with the new breakdown:

  • Deliberate Agency
  • Gears Level Understanding of Yourself
  • Coherence and Consistency
  • Game Theoretic Soundness

But, if people think that's carving the concept at the wrong joints, let me know.

3. "Why is this important?"

Zvi's review noted that the post didn't really argue why becoming a robust agent was so important. 

Originally, I viewed the post as simply illustrating an idea rather than arguing for it, and... maybe that was fine. I think it would have been fine to leave the "why" for a followup post.

But I reflected a bit on why it seemed important to me, and ultimately thought that it was worth spelling it out more explicitly here. I'm not sure my reasons are the same as Zvi's, or others'. But, I think they are fairly defensible reasons. Interested if anyone has significantly different reasons, or thinks that the reasons I listed don't make sense.

I'm leaning towards reverting the title to just "being a robust agent", since the new title is fairly clunky, and someone gave me private feedback that it felt less like a clear-handle for a concept. [edit: have done so]

So the most important point of this post is to lay out the Robust Agent paradigm explicitly, with a clear term I could quickly refer to in future discussions, to check “is this something we’re on the same page about, or not?” before continuing on to discuss more complicated ideas.

Have you found that this post (and the concept handle) have been useful for this purpose? Have you found that you do in fact reference it as a litmus test, and steer conversations according to the response others make?

It's definitely been useful with people I've collaborated closely with. (I find the post a useful background while working with the LW team, for example)

I haven't had a strong sense of whether it's proven beneficial to other people. I have a vague sense that the sort of people who inspired this post mostly take this as background that isn't very interesting or something. Possibly with a slightly different frame on how everything hangs together.

It sounds like this post functions (and perhaps was intended) primarily as a filter for people who are already good at agency, and secondarily as a guide for newbies?

If so, that seems like a key point - surrounding oneself with other robust (allied) agents helps develop or support one's own agency.

I actually think it works better as a guide for newbies than as a filter. The people I want to filter on, I typically am able to have long protracted conversations about agency with them anyway, and this blog post isn't the primary way that they get filtered.

I feel like perhaps the name "Adaptive Agent" captures a large element of what you want: an agent capable of adapting to shifting circumstances.

I like the edits!

One thing I think might be worth doing is linking to the post on Realism about Rationality, and explicitly listing it as a potential crux for this post.

I'm pretty onboard theoretically with the idea of being a robust agent, but I don't actually endorse it as a goal because I tend to be a rationality anti-realist.

I actually don't consider Realism about Rationality cruxy for this (I tried to lay out my own cruxes in this version). Part of what seemed important here is that I think Coherent Agency is only useful in some cases for some people, and I wanted to be clear about when that was.

I think each of the individual properties (gears level understanding, coherence, game-theoretic-soundness) is just sort of obviously useful in some ways. There are particular failure modes to get trapped in if you've only made some incremental progress, but generally I think you can make incremental improvements in each domain and get improvements-in-life-outcome.

I do think that the sort of person who naturally gravitates towards this probably has something like 'rationality realism' going on, but I suspect it's not cruxy, and in particular I suspect shouldn't be cruxy for people who aren't naturally oriented that way.

Some people are aspiring directly to be a fully coherent, legible, sound agent. And that might be possible or desirable, and it might be possible to reach a variation of that that is cleanly mathematically describable. But I don't think that has to be true for the concept to be useful.

generally I think you can make incremental improvements in each domain and get improvements-in-life-outcome.

To me this implies some level on the continuum of realism about rationality. For instance I often think that to make improvements on life outcomes I have to purposefully go off of pareto improvements in these domains, and sometimes sacrifice them. Because I don't think my brain runs that code natively, and sometimes efficient native code is in direct opposition to naive rationality.


I've been watching the discussion on Realism About Rationality with some interest and surprise. I had thought of 'something like realism about rationality' as more cruxy for alignment work, because the inspectability of the AI matters a lot more than the inspectability of your own mind – mostly because you're going to scale up the AI a lot more than your own mind is likely to scale up. The amount of disagreement that's come out more recently about that has been interesting.

Some of the people who seem most invested in the Coherent Agency thing are specifically trying to operate on cosmic scales (i.e. part of their goal is to capture value in other universes and simulations, and to be the sort of person you could safely upload).

Upon reflection though, I guess it's not surprising that people don't consider realism "cruxy" for alignment, and also not "cruxy" for personal agency (i.e. upon reflection, I think it's more like an aesthetic input, than a crux. It's not necessary for agency to be mathematically simple or formalized, for incremental legibility and coherence to be useful for avoiding wasted motion)

Bumping this up to two nominations not because I think it needs a review, but because I like it and it captures an important insight that I've not seen written up like this elsewhere.

In my own life, these insights have led me to do/consider doing things like:

  • not sharing private information even with my closest friends -- in order for them to know in future that I'm the kind of agent who can keep important information (notice that there is the counterincentive that, in the moment, sharing secrets makes you feel like you have a stronger bond with someone -- even though in the long-run it is evidence to them that you are less trustworthy)
  • building robustness between past and future selves (e.g. if I was excited about and had planned for having a rest day, but then started that day by working and being really excited by work, choosing to stop work and rest, such that different parts of me learn that I can make and keep inter-temporal deals (even if work seems higher EV in the moment))
  • being more angry with friends (on the margin) -- to demonstrate that I have values and principles and will defend those in a predictable way, making it easier to coordinate with and trust me in future (and making it easier for me to trust others, knowing I'm capable of acting robustly to defend my values)
  • thinking about, in various domains, "What would be my limit here? What could this person do such that I would stop trusting them? What could this organisation do such that I would think their work is net negative?" and then looking back at those principles to see how things turned out
  • not sharing passwords with close friends, even for one-off things -- not because I expect them to release or lose it, but simply because it would be a security flaw that makes them more vulnerable to anyone wanting to get to me. It's a very unlikely scenario, but I'm choosing to adopt a robust policy across cases, and it seems like useful practice

If there isn't enough incentive for others to cooperate with you, don't get upset for them if they defect (or "hit the neutral button.") BUT maybe try to create a coordination mechanism so that there is enough incentive.

It seems like "getting upset" is often a pretty effective way of creating exactly the kind of incentive that leads to cooperation. I am reminded of the recent discussion on investing in the commons, where introducing a way to punish defectors greatly increased total wealth. Generalizing that to more everyday scenarios, it seems that being angry at someone is often (though definitely not always, and probably not in the majority of cases) a way to align incentives better.

(Note: I am not arguing in favor of people getting more angry more often, just saying that not getting angry doesn't seem like a core aspect of the "robust agent" concept that Raemon is trying to point at here)

Ah. The thing I was trying to point at here was the "Be Nice, At Least Until You Can Coordinate Meanness" thing.

The world is full of people who get upset at you for not living up to the norms they prefer. There are, in fact, so many people who will get upset for so many contradictory norms that it just doesn't make much sense to try to live up to them all, and you shouldn't be that surprised that it doesn't work.

The motivating examples were something like "Bob gets upset at people for doing thing X. A little while later, people are still doing thing X. Bob gets upset again. Repeat a couple times. Eventually it (should, according to me) become clear that a) getting upset isn't having the desired effect, or at most is producing the effect of "superficially avoid behavior X when Bob is around". And meanwhile, getting upset is sort of emotionally exhausting and the cost doesn't seem worth it."

I do agree that "get upset" (or more accurately "perform upset-ness") works reasonably well as a localized strategy, and can scale up a bit if you can rally more people to get upset on your behalf. But the post was motivated by people who seemed to get upset... unreflectively?

(I updated the wording a bit but am not quite happy with it. I do think the underlying point was fairly core to the robust agent thing: you want policies for achieving your goals that actually work. "Getting upset in situation X" might be a good policy, but if you're enacting it as an adaptation-executor rather than as a considered policy, it may not actually be adaptive in your circumstance)

Eventually it (should, according to me) become clear that a) getting upset isn't having the desired effect, or at most is producing the effect of "superficially avoid behavior X when Bob is around".

Or "avoid Bob", "drop Bob as a friend", "leave Bob out of anything new", etc. What, if anything, becomes clear to Bob or to those he gets angry with is very underdetermined.

As you would expect from someone who was one of the inspirations for the post, I strongly approve of the insight/advice contained herein. I also agree with the previous review that there is not a known better write-up of this concept. I like that this gets the thing out there compactly.

Where I am disappointed is that this does not feel like it gets across the motivation behind this or why it is so important - I neither read this and think 'yes that explains why I care about this so much' nor 'I expect that this would move the needle much on people's robustness as agents going forward if they read this.'

So I guess the takeaway for me looking back is, good first attempt and I wouldn't mind including it in the final list, but someone needs to try again?

It is worth noting that Jacob did *exactly* the adjustments that I would hope would result from this post if it worked as intended, so perhaps it is better than I give it credit for? Would be curious if anyone else had similar things to report.

I'm writing my self-review for this post, and in the process attempting to more clearly define what I mean by "Robust Agent" (possibly finding a better term for it)

The concept here is pointing at four points:

  • Strategy of deliberate agency – not just being a kludge of behaviors, but having goals and decision-making that you reflectively endorse
  • Corresponding strategy of Gears-Level-Understanding of yourself (and others, and the world, but yourself-in-particular)
  • Goal of being able to operate in an environment where common wisdom isn't good enough, and/or you expect to run into edge cases.
  • Goal of being able to coordinate well with other agents.

"Robustness" mostly refers to the third and fourth points. It's possible the core strategy might actually make more sense to call "Deliberate Agency". The core thing is that you're deciding on purpose what sort of agent to be. If the environment wasn't going to change, you wouldn't care about being robust.

Or maybe, "Robust Agency" makes sense as a thing to call one overall cluster of strategies, but it's a subset of "Deliberate Agency."

Or maybe, "Robust Agency" makes sense as a thing to call one overall cluster of strategies, but it's a subset of "Deliberate Agency."

Where might "Robust Agency" not overlap with "Deliberate Agency"?

Robust Agency is a subset of Deliberate Agency (so it always overlaps in that direction). 

But you might decide, deliberately, to always ‘just copy what your neighbors are doing and not think too hard about it’, or other strategies that don’t match the attributes I listed for coherent/robust agency. (noting again that those attributes are intended to be illustrative rather than precisely defined criteria)

I find the classification of the elements of robust agency to be helpful, thanks for the write up and the recent edit.

I have some issues with Coherence and Consistency:

First, I'm not sure what you mean by that, so I'll take my best guess, which in its idealized form is something like: Coherence is being free of self-contradictions and Consistency is having the tools to commit oneself to future actions. This is going by the last paragraph of that section:

There are benefits to reliably being able to make trades with your future-self, and with other agents. This is easier if your preferences aren’t contradictory, and easier if your preferences are either consistent over time, or at least predictable over time.

Second, the only case made for Coherence is that it helps you make trades with your future self. My reasons for it are more strongly related to avoiding compartmentalization and solving confusions, and making clever choices in real time given my limited rationality.

Similarly, I do not view trades with future self as the most important reason for Consistency. It seems that the main motivator here for me is some sort of trade between various parts of me. Or more accurately, hacking away at my motivation schemes and conscious focus, so that some parts of me will have more votes than others.

Third, there are other mechanisms for Consistency. Accountability is a major one. Also, reducing noise in the environment externally and building actual external constraints can be helpful.

Fourth, Coherence can be generalized to a skill that allows you to use your gears-level understanding of yourself and your agency to update your gears to what would be the most useful. This makes me wonder if the scope here is too large, and that gears level understanding and deliberate agency aren't related to the main points as much. These may all help one to be trustworthy, in that one's reasoning can be judged to be adequate - including for oneself - which is the main thing I'm taking out from here.

Fifth (sorta), I have reread the last section, and I think that I understand now that your main motivation for Coherence and Consistency is that the conversation between rationalists can be made much more effective in that they can more easily understand each other's point of view. This I view as related to Game Theoretic Soundness, more than the internal benefits of Coherence and Consistency which are probably more meaningful overall.

I definitely did not intend to make either an airtight or exhaustive case here. I think coherence and consistency are good for a number of reasons, and I included the ones I was most confident in, and felt like I could explain quickly and easily. (The section was more illustrative than comprehensive)

This response will not lay out the comprehensive case, but will try to give my current thoughts on some specific questions. (I feel a desire to stress that I still don't consider myself an expert or even especially competent amateur on this topic)

Second, the only case made for Coherence is that it helps you make trades with your future self

That's actually not what I was going for – coherence can be relevant in the moment (if I had to pick, my first guess is that incoherence is more costly in the moment and inconsistency is more costly over time, although I'm not sure I was drawing a strong distinction between them)

If you have multiple goals that are at odds, this can be bad in the immediate moment, because instead of getting to focus on one thing, you have to divide up your attention (unnecessarily) between multiple things that are at odds. This can be stressful, it can involve cognitive dissonance which makes it harder to think, and it involves wasted effort.

This post has helped me understand quite a bit the mindset of a certain subset of rationalists, and being able to point to it and my disagreements with it has been quite helpful in finding cruxes with disagreements.

Seems like you are trying to elaborate on Eliezer's maxim Rationality is Systematized Winning. Some of what you mentioned implies shedding any kind of ideology, though sometimes wearing a credible mask of having one. Also being smarter than most people around you, both intellectually and emotionally. Of course, if you are already one of those people, then you don't need rationality, because, in all likelihood, you have already succeeded in what yo


I think the thing I'm gesturing at here is related to, but different from, the systematized winning thing.

Some distinctions that I think make sense. (But I would defer to people who seem further ahead on this path than I am.)

  • Systematized Winning – The practice of identifying and doing the thing that maximizes your goal (or, if you're not a maximizer, ensures a good distribution of satisfactory outcomes)
  • Law Thinking – (i.e. Law vs Tools) – Lawful thinking is having a theoretical understanding of what would be the optimal action for maximizing utility, given various constraints. This is a useful idea for a civilization to have. Whether it's directly helpful for you to maximize your utility depends on your goals, environment, and shape-of-your-mental-faculties.
    • I'd guess for most humans (of average intelligence), what you want is for someone else to do the Law thinking: figure out the best thing, figure out the best approximation of the best thing, and then distill it down to something you can easily learn.
  • Being a Robust Agent - Particular strategies, for pursuing your goals, wherein you strive to have rigorous policy-making, consistent preferences (or consistent ways to resolve inconsistency), ways to reliably trust yourself and others, etc.
    • You might summarize this as "the strategy of embodying lawful thinking to achieve your goals." (not sure if that quite makes sense)
    • I expect this to be most useful for people who either
      • find rigorous policy-level, consistency-driven thinking easy, such that it's just the most natural way for them to approach their problems
      • have a preference to ensure that their solutions to problems don't break down in edge cases (i.e. nerds often like having explicit understandings of things independent of how useful it is)
      • have goals that will likely cause them to run into edge cases, such that it's more valuable to have figured out in advance how to handle those.

When you look at the Meta-Honesty post... I don't think the average person will find it a particularly valuable tool for achieving their goals. But I expect there to be a class of person who actually needs it as a tool to figure out how to trust people in domains where it's often necessary to hide or obfuscate information.

Whether you want your decision theory robust enough that Omega simulating you will give you a million dollars depends a lot on whether you expect Omega to actually be simulating you and making that decision. I know at least some people who are actually arranging their life with that sort of concern in mind.
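
To make that dependence a bit more concrete, here is a minimal sketch. It assumes the standard Newcomb's-problem payoffs ($1,000,000 in the opaque box if Omega predicted one-boxing, $1,000 always in the transparent box), which this thread doesn't actually spell out, and a hypothetical `accuracy` parameter standing in for "how much you expect Omega to really be predicting you." The function names are made up for illustration.

```python
def expected_payoffs(accuracy, big=1_000_000, small=1_000):
    """Expected dollars for each policy, given how often Omega predicts correctly.

    Assumption: standard Newcomb payoffs. `accuracy` is a hypothetical
    stand-in for how seriously you take "Omega is simulating me".
    """
    # One-boxers get the big prize only when Omega correctly predicted one-boxing.
    one_box = accuracy * big
    # Two-boxers always get the small prize, plus the big one when Omega guessed wrong.
    two_box = (1 - accuracy) * big + small
    return one_box, two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy above which one-boxing has the higher expected payoff."""
    # Solve accuracy * big == (1 - accuracy) * big + small for accuracy.
    return 0.5 + small / (2 * big)


if __name__ == "__main__":
    for accuracy in (0.5, 0.6, 0.99):
        one_box, two_box = expected_payoffs(accuracy)
        print(f"accuracy={accuracy}: one-box={one_box:,.0f}, two-box={two_box:,.0f}")
    print("break-even accuracy:", break_even_accuracy())
```

Under these assumed payoffs, the robust (one-boxing) policy only pays off in expectation if you think the prediction is better than a coin flip by a small margin; if you don't expect anything like Omega in your life, the calculation never comes up.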

I do think there's an alternate frame where you just say "no, rationality is specifically about being a robust agent. There are other ways to be effective, but rationality is the particular way of being effective where you try to have cognitive patterns with good epistemology and robust decision theory."

This is in tension with the "rationalists should win" thing. Shrug.

I think it's important to have at least one concept that is "anyone with goals should ultimately be trying to solve them the best way possible", and at least one concept that is "you might consider specifically studying cognitive patterns and policies and a cluster of related things, as a strategy to pursue particular goals."

I don't think this is quite the same thing as instrumental rationality (although it's tightly entwined). If your goals are simple and well-understood, and you're interfacing in a social domain with clear rules, the most instrumentally rational thing might be to not overthink it and follow common wisdom.

But it's particularly important if you want to coordinate with other agents, over the long term. Especially on ambitious, complicated projects in novel domains.

On my initial read, I read this as saying "this is the right thing for some people, even when it isn't instrumentally rational" (?!). But

I think it's important to have at least one concept that is "anyone with goals should ultimately be trying to solve them the best way possible", and at least one concept that is "you might consider specifically studying cognitive patterns and policies and a cluster of related things, as a strategy to pursue particular goals."

makes me think this isn't what you meant. Maybe clarify the OP?

I meant to say "becoming a robust agent may be the instrumentally rational thing for some people in some situations. For other people in other situations, it may not be helpful."

I don't know that "instrumental rationality" is that well defined, and there might be some people who would claim that "instrumental rationality" and what I (here) am calling "being a robust agent" are the same thing. I disagree with that frame, but it's at least a cogent frame.

You might define "instrumental rationality" as "doing whatever thing is best for you according to your values", or you might use it to mean "using an understanding of, say, probability theory and game theory and cognitive science to improve your decision making". I think it makes more sense to define it the first way, but I think some people might disagree with that.

If you define it the second way, then for some people (at least, people who aren't that smart or good at probability/game-theory/cog-science) "the instrumentally rational thing" might not be "the best thing."

I'm actually somewhat confused about which definition Eliezer intended. He has a few posts (and HPMOR commentary) arguing that "the rational thing" just means "the best thing". But he also notes that it makes sense to use the word "rationality" specifically when we're talking about understanding cognitive algorithms. 

Not sure whether that helped. (Holding off on updating the post till I've figured out what the confusion here is)

I define it the first way, and don't see the case for the second way. Analogously, for a while, Bayesian reasoning was our best guess of what the epistemic Way might look like. But then we find out about logical induction, and that seems to tell us a little more about what to do when you're embedded. So, we now see it would have been a mistake to define "epistemic rationality" as "adhering to the dictates of probability theory as best as possible".

I think that Eliezer's other usage of "instrumental rationality" points to fields of study for theoretical underpinning of effective action.

(not sure if this was clear, but I don't feel strongly about which definition to use, I just wanted to disambiguate between definitions people might have been using)

I think that Eliezer's other usage of "instrumental rationality" points to fields of study for theoretical underpinning of effective action.

This sounds right-ish (i.e. this sounds like something he might have meant). When I said "use probability and game theory and stuff" I didn't mean "be a slave to whatever tools we happen to use right now", I meant sort of as examples of "things you might use if you were trying to base your decisions and actions off of sound theoretical underpinnings."

So I guess the thing I'm still unclear on is people's common usage of words: do most LWers think it is reasonable to call something "instrumentally rational" if you just sorta went with your gut without ever doing any kind of reflection (assuming your gut turned out to be trustworthy)?

Or are things only instrumentally rational if you had theoretical underpinnings? (Your definition says "no", which seems fine. But it might leave you with an awkward distinction between "instrumentally rational decisions" and "decisions rooted in instrumental rationality.")

I'm still unsure if this is dissolving confusion, or if the original post still seems like it needs editing.

Your definition says "no", which seems fine. But it might leave you with an awkward distinction between "instrumentally rational decisions" and "decisions rooted in instrumental rationality."

My definition was the first, which is "instrumental rationality = acting so you win". So, wouldn't it say that following your gut was instrumentally rational? At least, if it's a great idea in expectation given what you knew - I wouldn't say lottery winners were instrumentally rational.

I guess the hangup is in pinning down "when things are actually good ideas in expectation", given that it's harder to know that without either lots of experience or clear theoretical underpinnings.

I think one of the things I was aiming for with Being a Robust Agent is "you set up the longterm goal of having your policies and actions have knowably good outcomes, which locally might be a setback for how capable you are, but allows you to reliably achieve longer term goals."