This is (sort of) a response to Blatant lies are the best kind!, although I'd been working on this prior to that post getting published. This post explores similar issues through my own frame, which seems at least somewhat different from Benquo's.

I've noticed a tendency for people to use the word "lie", when they want to communicate that a statement is deceptive or misleading, and that this is important.

And I think this is (often) technically wrong. I'm not sure everyone defines "lie" quite the same way, but in most cases where I hear it unqualified, I usually assume it means "to deliberately speak a falsehood." Not all deceptive or misleading things are lies.

But it's perhaps a failure of the English language that there isn't a word for "rationalizing" or "motivated cognition" that is as rhetorically hefty.

If you say "Carl lied!", this is a big deal. People might get defensive (because they're friends with Carl), or they might get outraged (if they believe you and feel betrayed by Carl). Either way, something happens.

Whereas if Carl is making a motivated error, and you say "Carl is making a motivated error!", then people often shrug and go "I dunno, people make motivated errors all the time?" And well, yeah. People do make motivated errors all the time. This is all doubly slippery if the other people are motivated in the same direction as Carl, which incentivizes them not to get too worked up about it.

But at least sometimes, the error is bad or important enough, or Carl has enough social influence, that it matters that he is making the error.

So it seems perhaps useful to have a word – a short, punchy word – that comes pre-cached with connotations like "Carl has a pattern of rationalizing about this topic, that pattern is important, and the fact that it has continued unchecked for a while should make you sit bolt upright in alarm and do something different from whatever you are currently doing in relation to Carl."

Or, alternately: "It's not precisely a big deal that Carl in particular is doing this. Maybe everyone's doing this, and it'd be unfair to single Carl out. But, the fact that our social fabric is systematically causing people to distort their statements the way Carl is doing is real bad, and we should prioritize fixing that."

The motivating example here was a discussion/argument I had a couple weeks ago with another rationalist. Let's call them Bob.

("Bob" can reveal themselves in the comments if they wish).

Bob was frustrated with Alice, and with many other people's response to some of Alice's statements. Bob said [paraphrased slightly] "Alice blatantly lied! And nobody is noticing or caring!"

Now, it seemed to me that Alice's statement was neither a lie, nor blatant. It was not a lie because Alice believed it. (I call this "being wrong", or "rationalizing", not "lying", and the difference is important because it says very different things about a person's character and how to most usefully respond to them.)

It didn't seem blatant because, well, at the very least it wasn't obvious to me that Alice was wrong.

I could see multiple models of the world that might inform Alice's position, and some of them seemed plausible to me. I understood why Bob disagreed, but nonetheless Alice's wrongness did not seem like an obvious fact.

[Unfortunately going into the details of the situation would be more distracting than helpful. I think what's most important to this post were the respective epistemic states of myself and Bob.

But to give some idea, let's say Alice had said something like "obviously minimum wage helps low income workers."

I think this statement is wrong, especially the "obviously" part, but it's a position one might earnestly hold depending on which papers you read in which order. I don't know if Bob would agree that this is a fair comparison, but it roughly matches my epistemic state.]

So, it seemed to me that Alice was probably making some cognitive mistakes, and failing to acknowledge some facts that were relevant to her position.

It was also in my probability space that Alice had knowingly lied. (In the minimum wage example, if Alice knew full well that there were some good first principles and empirical reasons to doubt that minimum wage helped low-income workers, and ignored them because it was rhetorically convenient, I might classify that as a lie, or some other form of deception that raised serious red flags about Alice's trustworthiness).

With all this in mind, I said to Bob:

"Hey, I think this is wrong. I don't think Alice was either lying, or blatantly wrong."

Bob thought a second, and then said "Okay, yeah, fair. Sure. Alice didn't lie, but she engaged in motivated cognition. But I still think" — and then Bob started speaking quickly, moving on to why he was still frustrated with people's response to Alice, agitation in his voice.

And I said: (slightly paraphrased to fit an hour of discussion into one paragraph)

"Hey. Wait. Stop. It doesn't look like you've back-propagated the fact that Alice didn't blatantly lie through the rest of your belief network. It's understandable if you disagree with me about whether "blatantly lie" makes sense as a description of what's happening here. But if we do agree on that, I think you should actually stop and think a minute, and let that fact sink in, and shift how you feel about the people who aren't treating Alice's statement the way you want."

Bob stopped and said "Okay, yeah, you're right. Thanks." And then he actually took a minute to do so. (This didn't radically change the argument, in part because there were a lot of other facets of the overall disagreement, but it still seemed like a good move for us to have jointly performed.)

It was during that minute, while reflecting on my own, that I thought about the opening statement of this post:

That maybe it's a failure of the English language that we don't have a way to communicate "so-and-so is rationalizing, and this pattern of rationalization is important." If you want to get people's attention and get them agitated, your rhetorical tools are limited.

[Edited addendum]

My guess is that a new word isn't actually the right solution (as Bendini notes in the comments, new jargon tends to get collapsed into whatever the most common use case is, regardless of how well the jargon term fits it).

But I think it'd be useful to at least have a shared concept-handle that we can more easily refer to. I think it'd be good to have more affordance to say: "Alice is rationalizing, and people aren't noticing, and I think we should be sitting up and paying attention to this, not just shrugging it off."

56 comments

Something about the tone of this post seems like it's missing an important distinction. Targeted alarm is for finding the occasional, rare bad actor. As Romeo pointed out in his comment, we suffer from alarm fatigue. The kind of alarm that needs raising for self-propagating patterns of motivated reasoning is procedural or conceptual. People are mistakenly behaving (in some contexts) as though certain information sources were reliable. This is often part of a compartmentalized pattern; in other contexts, the same people act as though, not only do they personally know, but everybody knows, that those sources are not trustworthy.

To take a simple example, I grew up in a household with a television. That means that, at various times in the day, I was exposed to messages from highly paid expert manipulators trying to persuade me to consume expensive, poor-quality, addictive foods that were likely to damage my mind and body by spiking my blood sugar and lowering my discernment. I watched these messages because they were embedded in other messages exposing me to a sort of story superstimulus with elevated levels of violence and excitement, but mostly devoid of messages from my elders about what sorts of time-tested behaviors are adaptive for the community or individual.

If you try to tell people that TV is bad for kids, they'll maybe feel vaguely guilty, but not really process this as news, because "everybody knows," and go on behaving as though this was fine. If you manage to get through to them that TV ads are Out to Get You, this might get their attention, but only by transmitting an inappropriately concentrated sense of threat - or an unproductive general paranoia.

I feel like, in the emotional vocabulary of this post, the problem is how to inform my parents that they should be scared of some particular television commercial or show, with the proper level of urgency, without making a literally false accusation. But the actual problem is that large parts of our world are saturated with this sort of thing - friends have TVs, billboards are everywhere, my parents were already acculturated to some extent by TV, and there are other less immediately obvious unfriendly acculturators like school.

The behavior I'd have wanted my parents to exhibit would probably have started with working out - with friends and community members - and with me and my sister - and first, with each other - a shared model and language for talking about the problem, before we started to do anything about it. Not to blame the proximate target and treat each case as a distinct emergency against a presumed backdrop of normality.

The bad news is that vastly powerful cultural forces are deeply unsafe. The good news is that, without any specific defenses against these, or even any clear idea of their shape, we've mostly been doing okay anyway. The bad news is that that's beginning to change.

Stopgap solutions that can be implemented immediately look less like rationing screen time, and more like celebrating a communal Sabbath with clear, traditional standards.

This is why I began my post Effective Altruism is Self-Recommending with a general account of pyramid and Ponzi schemes - not to single out Effective Altruism as especially nasty, but to explain that such schemes are not only destructive, but extremely common and often legitimated by the authorities. The response I'm trying for is more like "halt, melt, catch fire."

(Ended up cross-posting this comment as a blog post.)

I think one issue is that I don't have a word or phrase that quite communicates the thing I was trying to point to. When I said "sit upright in alarm", the actions I meant to be coupled with that look more like this:

The behavior I'd have wanted my parents to exhibit would probably have started with working out - with friends and community members - and with me and my sister - and first, with each other - a shared model and language for talking about the problem, before we started to do anything about it.

As opposed to either ignoring the problem, or blaming something haphazardly, or imposing screen limits without reflection, or whatever.

I'm not sure of a phrase that communicates the right motion. I agree that alarm fatigue is a thing (and basically said so right after posting the OP). "Sitting up, taking notice, and directing your attention strategically" sort of does it, but in an overwrought way. If you have a suggestion for a short handle for "the sort of initial mental motion you wish your parents had done, as well as the sort of initial mental motion you wish people would take in situations like the one in the OP", I'm interested.

The thing prompting the OP was that I've noticed people (in a few settings) using the word "lying" in a way that a) seemed false [by the definition of lying that seems most common to me, i.e. including both 'deliberateness' and usually at least a small bit of 'blameworthiness'], and b) seemed like specifically a mistake relating to "wishing they had a word that directed people's attention better", and it seemed unfair to ask them to stop without giving them a better tool to direct people's attention.

It seemed to me like you were emphasizing (a), in a way that pushed to the background the difference between wishing we had a proper way to demand attention for deceptive speech that's not literally lying, and wishing we had a way to demand attention for the right response. As I tried to indicate in the parent comment, it felt more like a disagreement in tone than in explicit content.

I think this is the same implied disagreement expressed around your comment on my Sabbath post. It seems like you're thinking of each alarm as "extra," implying a need for a temporary boost in activity, while I'm modeling this particular class of alarm as suggesting that much and maybe most of one's work has effects in the wrong direction, so one should pause, and ignore a lot of object-level bids for attention until one's worked this out.

Okay. I think I have a somewhat better handle on some of the nuances here and how various pieces of your worldview fit together. I think I'd previously been tracking and responding to a few distinct disagreements I had, and it'd make sense if those disagreements didn't land because I wasn't tracking the entirety of the framework at once.

Let me know how this sounds as an ITT:

  • Thinking and building a life for yourself
    • Much of civilization (and the rationalsphere as a subset of it and/or a memeplex that's influenced and constrained by it) is generally pointed in the wrong direction. This has many facets, many of which reinforce each other:
      • Schools systematically teach people to associate reason with listening-to/pleasing-teachers, or moving-words-around unconnected from reality. [Order of the Soul]
      • Society systematically pushes people to live apart from each other, and to work until they need (or believe they need) palliatives, in a way that doesn't give you space to think. [Sabbath Hard and Go Home]
      • Relatedly, society provides structures that incentivize you to advance in arbitrary hierarchies, or to tread water and barely stay afloat, without reflection on what you actually want.
    • By contrast, for much of history, there was a much more direct connection between what you did, how you thought, and how your own life was bettered. If you wanted a nicer home, you built a nicer home. This came with many overlapping incentive structures that reinforced something closer to living healthily and generating real value.
    • (I'm guessing a significant confusion was me seeing this whole section as only moderately connected rather than central to the other sections)
  • We desperately need clarity
    • There's a collection of pressures, in many-but-not-all situations, to keep both facts and decision-making principles obfuscated, and to warp language in a way that enables that. This is often part of an overall strategy (sometimes conscious, sometimes unconscious) to maneuver groups for personal gain.
    • It's important to be able to speak plainly about forces that obfuscate. It's important to lean fully into clarity and plainspeak, not just taking marginal steps towards it, both because clear language is very powerful intrinsically, and there's a sharp dropoff as soon as ambiguity leaks in (moving the conversation to higher simulacrum levels, at which point it's very hard to recover clarity)
  • [Least confident] The best focus is on your own development, rather than optimizing systems or other people
    • Here I become a lot less confident. This is my attempt to summarize whatever's going on in our disagreement about my "When coordinating at scale, communicating has to reduce gracefully to about 5 words" thing. I had an impression that this seemed deeply wrong, confusing, or threatening to you. I still don't really understand why. But my best guesses include:
      • This is putting the locus of control in the group, at a moment-in-history where the most important thing is reasserting individual agency and thinking for yourself (because many groups are doing the wrong-things listed above)
      • Insofar as group coordination is a lens to be looked through, it's important that groups are working in a way that respects everyone's agency and ability to think (to avoid falling into some of the failure modes associated with the first bullet point), and simplifying your message so that others can hear/act on it is part of an overall strategy that is causing harm
      • Possibly a simpler "people can and should read a lot and engage with more nuanced models, and most of the reason you might think that they can't is because school and hierarchical companies warped your thinking about that?"

And then, in light of all that, something is off with my mood when I'm engaging with individual pieces of that, because I'm not properly oriented around the other pieces?

Does that sound right? Are there important things left out or gotten wrong?

This sounds really, really close. Thanks for putting in the work to produce this summary!

I think my objection to the 5 Words post fits a pattern where I've had difficulty expressing a class of objection. The literal content of the post wasn't the main problem. The main problem was the emphasis of the post, in conjunction with your other beliefs and behavior.

It seemed like the hidden second half of the core claim was "and therefore we should coordinate around simpler slogans," and not the obvious alternative conclusion "and therefore we should scale up more carefully, with an uncompromising emphasis on some aspects of quality control." (See On the Construction of Beacons for the relevant argument.)

It seemed to me like there was some motivated ambiguity on this point. The emphasis seemed to consistently recommend public behavior that was about mobilization rather than discourse, and back-channel discussions among well-connected people (including me) that felt like they were more about establishing compatibility than making intellectual progress. This, even though it seems like you explicitly agree with me that our current social coordination mechanisms are massively inadequate, in a way that (to me obviously) implies that they can't possibly solve FAI.

I felt like if I pointed this kind of thing out too explicitly, I'd just get scolded for being uncharitable. I didn't expect, however, that this scolding would be accompanied by an explanation of what specific, anticipation-constraining, alternative belief you held. I've been getting better at pointing out this pattern (e.g. my recent response to habryka) instead of just shutting down due to a preverbal recognition of it. It's very hard to write a comment like this one clearly and without extraneous material, especially of a point-scoring or whining nature. (If it were easy I'd see more people writing things like this.)

It seemed like the hidden second half of the core claim was "and therefore we should coordinate around simpler slogans," and not the obvious alternative conclusion "and therefore we should scale up more carefully, with an uncompromising emphasis on some aspects of quality control." (See On the Construction of Beacons for the relevant argument.)

"Scale up more carefully" is a reasonable summary of what I intended to convey, although I meant it more like "here are specific ways you might fuck up if you aren't careful." At varying levels of scale, what is actually possible, and why?

FWIW, the motivating example for You Have About Five Words was recent (at the time) EA backlash about the phrase "EA is Talent Constrained", which many people interpreted to mean "if I'm, like, reasonably talented, EA organizations will need me and hire me", as opposed to "The EA ecosystem is looking for particular rare talents and skills, and this is more important than funding at the moment."

The original 80k article was relatively nuanced about this (although re-reading it now, I'm not sure it really spells out the particular distinction that'd become a source of frustration). They've since written an apology/clarification, but it seemed like there was a more general lesson that needed learning, both among EA communicators (and, separately, rationalist communicators) and among people who were trying to keep up with the latest advice/news/thoughts.

The takeaways I meant to be building towards (but, I do recognize now that I didn't explicitly say this at all and probably should have), were:

  • If you're a communicator, make sure the concept you're communicating degrades gracefully as it loses nuance, (and this is important enough that it should be among the things we hold thought leaders accountable to). Include the nuance, for sure. But some concepts predictably become net-harmful when reduced to their post title, or single-most-salient line.
    • Water flows downhill, and ideas flow towards simplicity. You can't fight this, but you can design the contours of the hill around your idea such that it flows towards a simplicity that is useful.
  • If you're a person consuming content, pay extra attention to the fact that you, and the people around you, are probably missing nuance by default. This is causing some kinds of double-illusion-of-transparency. Even if communicators are paying attention to the previous point, it's still a very hard job. Take some responsibility for making sure you understand concepts before propagating them, and if you're getting angry at a communicator, doublecheck what they actually said first.

this is important enough that it should be among the things we hold thought leaders accountable to

I would say that this depends on what kind of communicator or thought leader we're talking about. That is, there may be a need for multiple, differently-specialized "communicator" roles.

To the extent that you're trying to build a mass movement, then I agree completely and without reservations: you're accountable for the monster spawned by the five-word summary of your manifesto, because pandering to idiots who can't retain more than five words of nuance is part of the job description of being a movement leader. (If you don't like the phrase "pandering to idiots", feel free to charitably pretend I said something else instead; I'm afraid I only have so much time to edit this comment.)

To the extent that you're actually trying to do serious intellectual work, then no, absolutely not. The job description of an intellectual is, first, to get the theory right, and second, to explain the theory clearly to whosoever has the time and inclination to learn. Those two things are already really hard! To add to these the additional demand that the thinker make sure that her concepts won't be predictably misunderstood as something allegedly net-harmful by people who don't have the time and inclination to learn, is just too much of a burden; it can't be part of the job description of someone whose first duty (on which everything else depends) is to get the theory right.

The tragedy of the so-called "effective altruism" and "rationalist" communities is that we're trying to be both mass movements and intellectually serious, and we didn't realize until too late the extent to which this presents incompatible social-engineering requirements. I'm glad we have people like you thinking about the problem now, though!

(If you don't like the phrase "pandering to idiots", feel free to charitably pretend I said something else instead; I'm afraid I only have so much time to edit this comment.)

You know, it's kind of dishonest of you to appeal to your comment-editing time budget when you really just wanted to express visceral contempt for the idea that intellectuals should be held accountable for alleged harm from simplifications of what they actually said. Like, it didn't actually take me very much time to generate the phrase "accountability for alleged harm from simplifications" rather than "pandering to idiots", so comment-editing time can't have been your real reason for choosing the latter.

More generally: when the intensity of norm enforcement depends on the perceived cost of complying with the norm, people who disagree with the norm (but don't want to risk defying it openly) face an incentive to exaggerate the costs of compliance. It takes more courage to say, "I meant exactly what I said" when you can plausibly-deniably get away with, "Oh, I'm sorry, that's just my natural writing style, which would be very expensive for me to change." But it's not the expenses—it's you!

Except you probably won't understand what I'm trying to say for another three days and nine hours.

I agree that this applies more to mass movements than smaller intellectual groups.

Recall that my claim is "if you're trying to coordinate with 1/10/100/1000+ people, these are the constraints or causes/effects on how you can communicate (which are different for each scale)".

It also naively suggests different constraints on EA (which seems a bit more like a mass movement) than on LessWrong (which sort of flirted with being a mass movement, but then didn't really follow up on it. It seems to me that the number of 'serious contributors' is more like "around 100-200" than "1000+"). And meanwhile, not everyone on LW is actually trying to coordinate with anyone, which is fine.


There are some weirder questions that come into play when you're building a theory about coordination, in public in a space that does coordination. For now, set those aside and focus just on things like, say, developing theories of physics.

If you're not trying to coordinate with anyone, you can think purely about theory with no cost.

If you're an intellectual trying to coordinate only with intellectuals who want to follow your work (say, in the ballpark of 10 people), you can expect to have N words worth of shared nuance. (My previous best guess for N is 200,000 words worth, but I don't strongly stand by that guess)

It is an actual interesting question, for purely intellectual pursuits, whether you get more value out of having a single collaborator that you spend hours each day talking to, vs a larger number of collaborators. You might want to focus on getting your own theory right without regard for other people's ability to follow you (and if so, you might keep it all to yourself for the time being, or you might post braindumps into a public forum without optimizing it for readability, and let others skim it and see if it's worth pursuing to them, and then only communicate further with those people if it seems worth it)

But there is an actual benefit to your ability to think, to have other people who can understand what you're saying so they can critique it (or build off it). This may (or may not) lead you to decide it's worth putting effort into distillation, so that you can get more eyes reading the thing. (Or, you might grab all the best physicists and put them in a single lab together, where nobody has to spend effort per se on distillation, it just happens naturally as a consequence of conversation)

Again, this is optional. But it's an open question, even just in the domain of physics, how much you want to try to coordinate with others, and then what strategies that requires.


trying to coordinate with 1/10/100/1000+ people [...] not everyone on LW is actually trying to coordinate with anyone, which is fine.

I wonder if it might be worth writing a separate post explaining why the problems you want to solve with 10/100/1000+ people have the structure of a coordination problem (where it's important not just that we make good choices, but that we make the same choice), and how much coordination you think is needed?

In World A, everyone has to choose Stag, or the people who chose Stag fail to accomplish anything. The payoff is discontinuous in the number of people choosing Stag: if you can't solve the coordination problem, you're stuck with rabbits.

In World B, the stag hunters get a payoff that scales with n, where n is the number of people choosing Stag. The payoff is continuous in n: it would be nice if the group was better-coordinated, but it's usually not worth sacrificing on other goals in order to make the group better-coordinated. We mostly want everyone to be trying their hardest to get the theory of hunting right, rather than making sure that everyone is using the same (possibly less-correct) theory.

I think I mostly perceive myself as living in World B, and tend to be suspicious of people who seem to assume we live in World A without adequately arguing for it (when "Can't do that, it's a coordination problem" would be an awfully convenient excuse for choices made for other reasons).
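The World A / World B distinction can be made concrete with a toy payoff sketch. All numbers here (group size, stag and rabbit values) are illustrative assumptions, not anything from the comment; the point is only the shape of the two functions, discontinuous versus continuous in the number of cooperators:

```python
# Toy payoff functions contrasting the two worlds described above.
# All specific numbers are illustrative assumptions.

GROUP_SIZE = 20
STAG_VALUE = 100   # assumed total value of a stag, split among the group
RABBIT_VALUE = 2   # assumed guaranteed payoff for hunting rabbit alone

def world_a_payoff(n_stag_hunters: int) -> float:
    """World A: discontinuous. The hunt fails unless *everyone* chooses Stag."""
    if n_stag_hunters == GROUP_SIZE:
        return STAG_VALUE / GROUP_SIZE
    return 0.0

def world_b_payoff(n_stag_hunters: int) -> float:
    """World B: continuous. Each additional stag hunter adds a little value."""
    return STAG_VALUE * (n_stag_hunters / GROUP_SIZE) / GROUP_SIZE

# In World A, 19 out of 20 cooperators still get nothing; in World B,
# 19 cooperators get almost as much as full coordination would give.
print(world_a_payoff(19), world_a_payoff(20))            # 0.0 5.0
print(round(world_b_payoff(19), 2), world_b_payoff(20))  # 4.75 5.0
```

In World A it's worth sacrificing a lot to get the last cooperator on board; in World B the marginal value of extra coordination is small and roughly constant, which is the crux of the disagreement about how much to prioritize it.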


Stag/Rabbit is a simplification (hopefully obvious but worth stating explicitly to avoid accidental motte/bailey-ing). A slightly higher-resolution-simplification:

When it comes to "what norms do we want", it's not that you either get all-or-nothing, but if different groups are pushing different norms in the same space, there's deadweight loss as some people get annoyed at other people for violating their preferred norms, and/or confused about what they're actually supposed to be doing.

[modeling this out properly and explicitly would take me at least 30 minutes and possibly much longer. Makes more sense to do later on as a post]

Oh, I see; the slightly-higher-resolution version makes a lot more sense to me. When working out the game theory, I would caution that different groups pushing different norms is more like an asymmetric "Battle of the Sexes" problem, which is importantly different from the symmetric Stag Hunt. In Stag Hunt, everyone wants the same thing, and the problem is just about risk-dominance vs. payoff-dominance. In Battle of the Sexes, the problem is about how people who want different things manage to live with each other.
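The structural difference between the two games can be shown with standard textbook payoff matrices (the particular numbers below are conventional illustrative values, not anything from this thread). In the Stag Hunt, swapping the players' roles leaves the payoffs unchanged, so everyone agrees on which equilibrium is best; in Battle of the Sexes it doesn't, so each player prefers a different equilibrium:

```python
# Illustrative payoff matrices for the two games being contrasted.
# Entries are (row_payoff, col_payoff), keyed by (row_action, col_action).

stag_hunt = {
    ("Stag", "Stag"): (4, 4),      # payoff-dominant equilibrium
    ("Stag", "Rabbit"): (0, 3),
    ("Rabbit", "Stag"): (3, 0),
    ("Rabbit", "Rabbit"): (3, 3),  # risk-dominant equilibrium
}

battle_of_sexes = {
    ("Opera", "Opera"): (3, 2),    # row player's preferred equilibrium
    ("Opera", "Boxing"): (0, 0),
    ("Boxing", "Opera"): (0, 0),
    ("Boxing", "Boxing"): (2, 3),  # column player's preferred equilibrium
}

def is_symmetric(game):
    """True if swapping the players' roles leaves every payoff unchanged."""
    return all(
        game[(a, b)] == tuple(reversed(game[(b, a)]))
        for (a, b) in game
    )

print(is_symmetric(stag_hunt))        # True: everyone wants the same outcome
print(is_symmetric(battle_of_sexes))  # False: players rank equilibria differently
```

This is why the Stag Hunt framing reduces the problem to risk-dominance vs. payoff-dominance, while the Battle of the Sexes framing forces the harder question of whose preferred norms win.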

Nod. Yeah that may be a better formulation. I may update the Staghunt post to note this.

"Notice that you're not actually playing the game you think you're playing" is maybe a better general rule. (i.e. in the Staghunt article, I was addressing people who think that they're in a prisoner's dilemma, but actually they're in something more like a staghunt. But, yeah, at least some of the time they're actually in a Battle of the Sexes, or... well, actually in real life it's always some complicated nuanced thing.)

The core takeaway from the Staghunt article that still seems good to me is "if you feel like other people are defecting on your preferred strategy, actually check to see if you can coordinate on your preferred strategy. If it turns out people aren't just making a basic mistake, you may need to actually convince people your strategy is good (or learn from them why your strategy is not in fact straightforwardly good)."

I think this (probably?) remains a good strategy in most payoff-variants.

Thanks. This all makes sense. I think I have a bunch more thoughts but for the immediate future will just let that sink in a bit.

Another thing I'd add - putting this in its own comment to help avoid any one thread blowing up in complexity:

The orientation-towards-clarity problem is at the very least strongly analogous to, and most likely actually an important special case of, the AI alignment problem.

Friendliness is strictly easier with groups of humans, since the orthogonality thesis is false for humans - if you abuse us out of our natural values you end up with stupider humans and groups. This is reason for hope about FAI relative to UFAI, but also a pretty strong reason to prioritize developing a usable decision theory and epistemology for humans over using our crappy currently-available decision theory to direct resources in the short run towards groups trying to solve the problem in full generality.

AGI will, if ever, almost certainly be built - directly or indirectly - by a group of humans, and if that group is procedurally Unfriendly (as opposed to just foreign), there's no reason to expect the process to correct to FAI. For this reason, friendly group intelligence is probably necessary for solving the general problem of FAI.

I'm not sure I agree with all the details of this (it's not obvious to me that humans are friendly if you scale them up) but I agree that the orientation towards clarity likely has important analogues to the AI Alignment problem.

it's not obvious to me that humans are friendly if you scale them up

It seems like any AI built by multiple humans coordinating is going to reflect the optimization target of the coordination process building it, so we had better figure out how to make this so.

Initially I replied to this with "yeah, that seems straightforwardly true", then something about that felt off and then it took me awhile to figure out why.


It seems like any AI built by multiple humans coordinating is going to reflect the optimization target of the coordination process building it

...seems straightforwardly true.


..., so we had better figure out how to make this so. [where "this" is "humans are friendly if you scale them up"]

Could unpack a few different ways. I still agree with the general sentiment you're pointing at here, but I think the most straightforward interpretation of this is mostly false.

Humans are not scalably friendly, so many of the most promising forms of Friendly AI seem to _not_ be "humans who are scaled up", instead they're doing other things.

One example being CEV. (Which hopes that "if you scale up ALL humans TOGETHER and make them think carefully as you do so, you get something good, and if it turns out that you don't get something good that coheres, it gracefully fails and says 'nope, sorry, this didn't work.'" But this is a different thing than scaling up any particular human or small group of humans.)

Iterated Amplification seems to more directly depend on humans being friendly as you scale them up, or at least some humans being so.

I am in fact pretty wary of Iterated Amplification for that reason.

The whole point of CEV, as I understand it, is to figure out the thing you could build that is actually robust to you not being friendly yourself. The sort of thing that, if the ancient Greeks were building it, you could possibly hope for them to figure out so that they didn't accidentally lock the entire lightcone into a Bronze Age warrior ethos.



"You can't built friendly AI without this"

You and Zack have said this (or something like it) on occasion, and fwiw I get a fairly political red flag from the statement. Which is not to say I don't think the statement is getting at something important. But I notice each group I talk to has a strong sense of "the thing my group is focused on is the key, and if we can't get people to understand that we're doomed."

I myself have periodically noticed myself saying (and thinking), "if we can't get people to understand each other's frames and ontologies, we autolose. If we can't get people to jointly learn how to communicate and listen non-defensively and non-defensive-causing (i.e. the paradigm I'm currently pushing), we're doomed."

But, when I ask myself "is that really true? Is it sheer autolose if we don't all learn to doublecrux and whatnot?" No. Clearly not. I do think losing becomes more likely. I wouldn't be pushing my preferred paradigm if I didn't think that paradigm was useful. But the instinct to say "this is so important that we're obviously doomed if everyone doesn't understand and incorporate this" feels to me like something that should have a strong prior of "your reason for saying that is to grab attention and build political momentum."

(and to be clear, this is just my current prior, not a decisive argument. And again, I can certainly imagine human friendliness being crucial to at least many forms of AGI, and being quite useful regardless. Just noting that I feel a need to treat claims of this form with some caution.)

Hmm – I do notice, one comment of yours up, you note: "if that group is procedurally Unfriendly (as opposed to just foreign), there's no reason to expect the process to correct to FAI." Something about this phrasing suggests you might be using the words friendly/unfriendly/foreign in ways that weren't quite mapping to how I was using them.

Noting this mostly as "I'm updating a bit towards my previous comment not quite landing in your ontology" (which I'm trying to get better at tracking).

Okay. I'm not confident I understand the exact thing you're pointing at but I think this comment (and reference to the sabbath conversation) helped orient me a bit in the direction towards understanding your frame. I think this may need to gestate a bit before I'm able to say more.

Put another way: the comment you just wrote seems (roughly) like a different way I might have attempted to explain my views in the OP. So I'm not sure if the issue is there still something subtle I'm not getting, or if I communicated the OP in a way that made it seem like your comment here wasn't a valid restatement of my point, or something other thing?

I think people have alarm fatigue due to insufficient resolution of their priority function to make decisions without tons of effort. There are a large number of things in the world that are deeply important, but of upstream importance is knowing that there's an order that will have to de facto be followed whether you are aware of the things that determine said order or not. And that you won't personally have time to get to almost any of the urgent things.

I'm pointing this out because lots of things shouting for attention with highest priority signals is a recipe for burnout, while at the same time we do in fact need people still capable of sitting bolt upright occasionally.

Lying about what? It is certainly common to blatantly lie when you want to cancel plans or decline an invitation. Some people think there should be social repercussions for these lies. But imo these sorts of lies are, by default, socially acceptable.

There are complicated incentives around punishing deliberate manipulation and deception much harder than motivated/unconscious manipulation and deception. In particular, you are punishing people for being self-aware. You can interpret 'The Elephant in the Brain' as a record of the myriad ways people engage in somewhat, or more than somewhat, manipulative behavior. Motivated reasoning is endemic. A huge amount of behavior is largely motivated by local 'monkey politics' and status games. Learning about rationality might make a sufficiently open-minded and intellectually honest person aware of what they are often doing. But it's not going to make them stop doing these things.

Imagine that people on average engage in 120 units of deception: 20 units of conscious deception and 100 units of unconscious. People who take the self-awareness pill engage in 40 units of conscious deception and 0 units of unconscious deception. The latter group engages in much less deception overall, but twice as much 'deliberate' deception.
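
Spelling out the toy arithmetic (these are the hypothetical "units of deception" from the paragraph above, not real measurements): a norm that only punishes deliberate deception comes down harder on the group that deceives less in total.

```python
# Hypothetical "units of deception" per person, as in the comment above.
average    = {"conscious": 20, "unconscious": 100}  # 120 units total
self_aware = {"conscious": 40, "unconscious": 0}    # 40 units total

def total(person):
    return person["conscious"] + person["unconscious"]

print(total(average))     # 120
print(total(self_aware))  # 40  - a third of the total deception...
print(self_aware["conscious"] / average["conscious"])  # 2.0 - ...but twice the deliberate kind
```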

I have two main conclusions. First, I think seeing people, and yourself, clearly requires an increased tolerance for certain kinds of bad behavior. People are not very honest, but cooperation is empirically possible. Ray commented this below: "If someone consciously lies* to me, it's generally because there is no part of them that thinks it was important enough to cooperate with me". I think that Ray's comment is false. Secondly, I think it's bad to penalize 'deliberate' bad behavior so much more heavily. What is the point of penalizing deception? Presumably much of the point is to preserve the group's ability to reason. Motivated reasoning and other forms of non-deliberate deception and manipulation are arguably at least as serious a problem as blatant lies.

I know of a set of norms where, if you want to decline an invitation, you can make up an excuse, and if it is discovered that the excuse doesn't hold water it's no big deal. Additionally, I have seen norms where just saying a flat unexplained "no" is seen as ruder than an obviously false excuse. I am personally of the inclination that if I don't have conflicting plans I won't fabricate them, and making the "unwillingness" more bare seems to get on some people's nerves. This might be because some people value "keeping face" more than honesty. But there is a funny effect: if somebody has just announced that they have an excuse not to go at some specific time, and you then suggest another activity for the exact same time, they might declare themselves free to do that. If everybody is "in on the game" that it's all "face", this doesn't seem to cause problems.

I do have a problem where, if somebody makes what seems like an English-level claim of fact, I tend to favour it being a claim of fact, and have trouble when it's in fact something else, like an invitation rejection. I also have a poor understanding of why people value "face culture", and have trouble imagining what kinds of things would go wrong in a "faceless" dystopia.

I agree that the way I phrased that comment was wrong. There's a fairly narrow concept I was trying to point at, which is easily confused with other concepts.

(this is also a domain that I still feel overall confused about, and my main claim in the current conversational zeitgeist is that naively pushing "let's use words for their literal meanings without regard for connotation" is likely to cause unnecessary damage, and that getting to a world where we can rationally talk about many cognitive errors in public requires a solid foundation of game theory, and common knowledge about how to re-implement it)

I edited my original statement to say "epistemically cooperate", which is what I meant. If I'm working at a marketing firm and people regularly lie to customers and there's office politics that involve lying to each other all the time, then I probably wouldn't expect epistemic cooperation at all, but I would expect various other kinds of cooperation.

Also note the asterisk after "consciously lies*". There are lots of small lies which aren't primarily about deception so much as social protocol, and, well, I actually think everyone DOES know (mostly) that "I can't make the event, I'm busy" is code for "I can't make it for a reason that I don't want to share with you".

(This does still mean that I can't epistemically cooperate with people who do that as easily. A thing I like about the Berkeley rationalist community is that it's more socially acceptable to say "sorry, can't make it. I'm too depressed" or "too over-socialled", which in some cases allows for better cooperation on what activities you can do. For example "oh, well how about instead of going to the party we just silently read next to each other")

But part of what I'm pushing back against with this post (and series of ongoing conversations) is the practice of naively using words with heavy connotations, as if they did not have those connotations.

My sense is when [most] people actually use the phrase "Bob lied", they mean something closer to "Bob had an affair and then lied about it" or "Bob told me the car had 10,000 miles on it but actually it had 100,000 miles on it."

When Bob says "I can't make it to the party, I'm sick", or "this project is going to save the world!", people instead either don't say anything about it at all, or call it a "white lie", or use different words entirely like "Bob exaggerated."

The point of this post is to serve as a pointer towards ways we can improve clear communication, without trampling over Chesterton fences.

[brief note for now: I agree with your point about "if you gain 40 points of self-awareness it doesn't make sense to penalize that." I have more thoughts about it, but it will be a while before I write them up]

I think if we try to stop punishing for deception altogether, we are missing on a good solution for the prisoner's dilemma.

It's reasonable (though not obvious) that we don't punish for unconscious deception. And you also make a good point that we shouldn't punish for self-awareness.

But I think, an important distinction has to be made between self-awareness and self-control. I am aware of many things, but I don't necessarily have active control over all of them, mostly because it would require a different level of mental energy.

In my books, a controlled and deliberated lie is much worse than an unconscious one or even a lie you are simply aware of.

You could say that a "lie" is worse the less mental effort it would have required of the "liar" to avoid it.

(The following was originally the second half of this post. I was worried that I didn't have time to really develop it fully, and meanwhile the half-baked version of it sort of undercut the post in ways I didn't like. Putting it here for now. Eventually, hopefully, all of this will be crystallized into a post that articulates what I think people actually should do about all this)

A problem with "sitting bolt upright in alarm" is that it's not something you can sustainably do all the time. The point of elevated attention is to pay, well, more attention to things that are locally important. If you're always paying maximal attention to one area, you're

a) going to miss other important areas

b) probably going to stress yourself out in ways that are longterm unhelpful. (in particular if "elevated attention" not only uses all of your existing attention, but redirects resources you have previously been using for things other than attention)

Deliberate lying (in my circles) seems quite rare to me.

I'm less confident about non-lying patterns of deception. (Basically, everyone around me seems smart enough for "lying" to be a bad strategy. But other forms of active malicious deception might be going on and I'm not confident I'd be able to tell).

But, given my current rate of "detect deliberate lying" and "detect explicit malicious deception", it does make sense to sit bolt upright in alarm whenever someone looks like they're deceiving. My detection of it, at least, is a rare event.

The next steps I follow after "stop and pay attention" are:

  • Check that you understood what the person said, and verify that what happened was lying rather than "having a very different model than you."
    • (I think the latter is common enough and the former rare enough that this should usually be your prior)
  • If they definitely, deliberately lied, and they don't have some kind of extenuating circumstance, it's quite reasonable for there to be major social repercussions.
    • If someone is willing to lie to my face (and is reasonably competent at it), this not only means I can't trust them with anything important, but it means I can't trust them when they say they've changed. I'd need to have a compelling reason for why they lied in the first place, and why they thought it was okay then but wrong now, or wouldn't be okay in the future, before I could make serious plans with them.

By contrast, a few other activities seem much more common. You can't sit bolt upright in alarm whenever these happen, because then you'd be sitting bolt upright in alarm all the time and stressing your body out.

The world is full of plenty of terrible things where the "appropriate" level of freak-out just isn't very helpful. Your grim-o-meter is for helping you buckle down and run a sprint, not for running a marathon. But I think there's something useful, occasionally, about letting yourself go into a heightened stress state, where you fully feel the importance. Especially if you think a problem might be so important that this particular marathon is the one that you're going to run.

Activities that seem common, and concerning:

  • Motivated reasoning
    • Humans are bad at lying [citation needed?] but really good at making up stories that they believe (at least mostly) that are convenient. And they are pretty good at selectively ignoring things when convenient.
    • I'd prefer to work exclusively with people who are good at avoiding rationalization. At some point, I'd like the internal sanity waterline of the rationalsphere to rise to the level where I treat motivated reasoning as a rare event that warrants Sitting Bolt Upright in Alarm. But we're not at that level yet. I'm not at that level yet.
  • Low key deception (conscious or otherwise)
    • I think it's common for people to end up believing their own marketing – looking for the largest plausible outcome that could justify their plan, and avoiding looking at the facts that'd make their plans seem least promising.
    • I think there's sometimes a slipperiness where people avoid letting their opinions get pinned down, and end up presenting different facets of themselves to different people at different times. And moreover, present different facets of themselves to themselves at different times, without noticing the disconnect.

I think it's quite likely that we should be coordinating on a staghunt to systematically fix the above two issues. I think such a staghunt looks very different from the ones you do to address deliberate deception.

If someone deliberately deceives me, the issue is that I can't even trust them on the meta-level.

If someone is rationalizing and believing their own marketing, I think the issue is "rationalizing and believing your own marketing is the default state, and it requires a lot of skills built on top of each other in order to stop, and there are a lot of complicated tradeoffs you're making along the way, and this will take a long time even for well-intentioned people."

And meanwhile a large chunk of the problem is "people have very different models and ontologies that output very different beliefs and plans", so a lot of things that look like rationalization is just very different models.

And meanwhile a large chunk of the problem is "people have very different models and ontologies that output very different beliefs and plans", so a lot of things that look like rationalization is just very different models.

"just" very different model demands the question of why they (and you) prefer such different models.

It's motivated cognition all the way down. Choice of model is subject to the same biases that the object-level untruths are. In fact, motivated use of less-than-useful model is probably the MOST common case I encounter of the kinds of self- and other-deception we're discussing.

I think some differences of models are due to motivated cognition, but I think many or most models come down more to the different problems that you're solving.

For example, I had many arguments with habryka about whether there should be norms around keeping the office clean that involved continuous effort on the part of individuals. His opinion was that you should just solve the problem with specialization and systemization. I think motivated cognition may have played a role in each of our models, but there were legitimate reasons to prefer one over the other, and those reasons were entangled with each other in messy ways that required several days of conversation to untangle. (See "Hufflepuff Leadership and Fighting Entropy" for some details about the models, and hopefully an upcoming blogpost about resolving disagreements when you don't share ontologies)

Meta beliefs about jargon: There are some benefits to using a new word free of existing connotations, but costs often exceed the benefits. In the first stage only a few insiders know what it means. In the second stage you can use it with most of the community, but you need to translate it for casual members and a general audience. In the third stage the meaning becomes diluted as the community starts using it for everything, so you're basically back where you started.

In addition to the tendency for jargon to be diluted in general, jargon that's shorthand for "I see pattern X and that has very important implications" will be very powerful, so it's almost certain to be misused unless there are real costs (i.e. social punishments) for doing so. A better method may be to use existing phrases that are more linguistically stable.

Some draft proposals:

  • Carl is engaging in motivated cognition -> Carl has a conflict of interest/Carl is deceiving himself/Carl is quite attached to this belief (depending on which one is applicable)
  • Carl is wrong about something and it's influencing others -> Carl is a bad influence
  • Everyone in the community is saying X -> Our community has a systemic bias regarding idea X
  • Alice is "blatantly" wrong about X -> Alice has substantial disagreements with us about X

Most of these proposals sound quite confrontational, but that's inherent to what's being communicated. You can't use jargon for "Alice is saying dangerous things" within earshot of Alice and avoid social repercussions if the meaning is common knowledge.

All good points.

At the time I wrote the OP I wasn't so much advocating for a new word, as drawing attention to the fact that there's a particular important concept that English doesn't really have. I had "maybe we should actually have a new word" as a hypothesis to consider, but after a day of reflection and this comment I basically agree "yes, a new word would probably just get misused and defeat the whole purpose."

I've edited the OP to make that more clear.

It sounds like you want a word for "Alice is wrong, and that's terrible". In that case, you can say "Alice is fucking wrong", or similar.

Alas, that typically also won't be the right connotations. "Terrible" still leads me to engage with the statement in a fashion that I believe to be unhelpful.

A core background belief of mine is that many of our intuitions about what responses to "Alice is wrong" are useful are actually (counterintuitively) harmful for truthseeking. This is because the intuitions specifically come from 'thinking/language as a political tool' rather than 'thinking/language as a truthseeking tool' (and, meanwhile, we are incentivized to not notice when we are optimizing for politics rather than truthseeking)

One of the most important, foundational functions a truthseeking culture needs to serve is helping people override their tendency to use language and beliefs for politics. Doing this while also being able to talk sanely about politics is quite hard, and requires a game-theoretic understanding.

The shortest string of words that get across the right thing is maybe "Alice is motivatedly wrong, and this is important."

OHHH! I just clicked that "this" in "this is important" is that it's the mechanism/motivation that's important to you, not the wrongness. Not "Alice is wrong on an important topic in an influential way", but "Alice is wrong in a systematic way, regardless of the importance of the topic or the distance from the truth".

Still not sure why "Alice is wrong" isn't a strict superset of the issue, and sufficient motivation to dig further into reasons and corrections if it matters.

Different situations might call for the wrongness or the rationalization being more important, but it's the rationalization that's slippery-er and makes it harder to talk about.

I edited the post slightly to try and make this more clear – glad you noticed the miscommunication and said so concretely.

FWIW, "Alice is systematically wrong [and/or poorly justified] [about X thing]" came to mind as a thing that I think would make me sit up and take note, while having the right implications.

You're far more invested in the terminology than I think is warranted. I can understand being frustrated that humans don't weigh the concept "Alice earnestly endorsed harmful falsehoods" as heavily as "Alice benefitted herself over the group by spreading falsehoods", but that's not a problem with language; it's an effect you can see in the language, caused by human cognitive evolution.

Most of these conversations should actually be addressed on the object level. If you care that it's Alice as opposed to Charlie who's espousing the ideas, or that Dean (who's not present to agree or disagree) is going to be unduly swayed, be aware that you're in hard-mode politics, not simple truth-finding.

I agree with your second-half comment quite a bit: intentionally misleading statements are, in fact, more harmful than honest-but-wrong (even if motivated) beliefs. In the lying case, the intent is to harm (or to privilege the liar over the group) and the lies will change as necessary to maximize the harm. In the motivated reasoning case, it's probably not going to get better, but it's not likely to get a lot worse. Bad actors are, in fact, worse than incompetent ones. To the extent that we have limited capability for detecting and addressing falsehoods, a focus on intent seems reasonable.

Also, Robin Hanson's little voice in my ear is reminding me to mention hypocrisy and personal denial of responsibility. I'm almost certainly mistaken on some things which benefit me. I don't want to punish such errors very strongly, lest I be punished for them when found out.

The terminology thing in the OP was, I think, mostly intended to highlight an underlying mood that I wanted to make sure (in the subsequent discussion) people knew I wasn't missing, as well as to frame why I thought "this particular mood" was important, rather than a somewhat-different mood that I expect Benquo, Jessica and others would have naively expected/preferred me to have.

I'm not certain whether the problem is really the lack of a word, but I think it's at least useful to have shared concept of "when someone is motivatedly wrong, we have an easier time coordinating around that because we've read this blogpost." (I designed the rhetoric of the blogpost to facilitate saying "hey, Alice is wrong, and I think we should be sitting bolt upright in alarm about this, not just shrugging it off.")

There is the concept of a "Freudian slip", which refers to things that are explained by motivations that are not super socially acceptable. However, the attitude towards them seems to be that such slips should not be held against the person, and the fact that they try to minimise the impact of the tendency works in their favour. An extreme case would be Tourette's, where a person constantly spouts expletives and it's important to not take the insults at face value.

Might be too political, but it's not super controversial, when wondering why America wars so much, to refer to the "military-industrial complex", which seems to refer to an incentive structure or actor strategy. If you "bolt" too much on that, people will file you in the same folder as conspiracists, with or without cause. The whole concept of "political correctness" can be thought of as a critique of a position held because of social approval rather than cognitive work done.

There are also idioms such as "money talks" and "you sing the songs of the ones whose bread you eat".

The effects are pretty well known and the issue is not that the phenomenon can't be singled out but what kinds of valuations are attributed to it.

Usually if someone is doing motivated cognition they are at least doing cognition, which can usually be turned to be beneficial. It's far too common for things to fail because nobody was thinking. And bad arguments, once aired, can be corrected. Courts have devil's advocates, and they are not seen to malfunction for having them.

Regarding the post: a judge is expected to be impartial, and undoing the negation part of that word, the concept you are looking for is "being partial" or "partiality". Attorneys are expected to be "zealous advocates"; it's okay for them to take sides. A judge being partial might get called a "corrupted" judge, which has a pretty heavy connotation. Having a conversational move of "Alice is corrupted in thinking X!" would have the rhetorical weight, but it seems to me hard to follow up in a constructive way from that, and it would seem to imply a position of power that is probably missing in a peer-to-peer conversation.

Things can be misleading by accident or on purpose. Motivated cognition would fit under "on purpose", since (by definition) it's motivated.

What's the difference between motivated errors and lies? Motivated errors are motivated, meaning they're decision-theoretic (responsive to incentives, etc). The standard argument for compatibilist free will now applies: since it's a decision made on the basis of consequences, it's responsive to incentives, so agent-based models and social systems design (including design of norms) should be treating the mind-part doing motivated cognition as an agent.

I think you are pointing to some difference between the two, but I'm not sure what it is. Maybe the difference is that motivated errors are more covert than lies are, more plausible deniability (including narrative coherence) is maintained, and this plausible deniability is maintained through keeping the thing unconscious, while a plausible semicoherent narrative is maintained in consciousness?

(Obviously, being very alarmed at things that are constantly happening is not a useful allocation of attention! But, that applies to deliberate lies too, not just motivated cognition.)

(See also, Dishonest Update Reporting)

Motivated errors are much less legible to the person who is motivated. The on-purpose-ness of motivated cognition is quite different from the on-purpose-ness of deliberate choice, and I think treating them as the same leads to important failures.

If someone consciously lies* to me, it's generally because there is no part of them that thinks it was important enough to [edit: epistemically] cooperate with me. They specifically considered, with their System 2, and/or their entire subagent parliament, whether it was a good idea to lie to me, then they chose to do so. I have basically no further interest in attempting to cooperate with such a person unless they put a lot of work into repairing the damage.

When someone motivatedly rationalizes at the subconscious level, my sense of what's happening is some combination of

a) no one is really in control and it makes more sense to model them as a random collection of atoms doing stuff than at the 'agent' abstraction. The random collection of atoms might respond to incentives, but naively applying incentives to them won't necessarily work the way you want. (I mostly don't think this is a useful frame here but noting it for completeness)

b) insofar as there is an agent there, it's often the case that there are multiple subagents that are compartmentalized from each other. If they've made their way to the rationalsphere, read the sequences, etc, then it's highly likely that at least one subagent highly endorses not being motivated. But that agent may not have conscious access to the fact that they are making the motivated error.

The priority, in my mind, for conversations among people striving to discuss beliefs empirically/rationally rather than as tribal affiliation or point scoring, should be to make sure the subagent that cares about truth remains in control. (Otherwise you've already failed, or dramatically increased the difficulty, of having a conversation that isn't about tribal affiliation and point scoring)

[*minor point, but there's a large class of lies like jokes and stuff than I'm not counting here]

Would you place motivated errors, generally, in the same bucket as confirmation bias type thinking?

What's the difference between motivated errors and lies?

They're implemented by very different cognitive algorithms, which differently constrain the sorts of falsehoods and strategies they can generate.

Motivated cognition is exclusively implemented in pre-conscious mechanisms: distortion of attention, distortion of intuition, selective forgetting. Direct lying, on the other hand, usually refers to lying which has System 2 involvement, which means a wider range of possible mistruths and a wider (and more destructive) range of supporting strategies.

For example: A motivated reasoner will throw out some of their data inappropriately, telling themself a plausible but false story about how that data didn't mean anything, but they'll never compose fake data from scratch. But a direct liar will do both, according to what they can get away with.

My guess is that you have an unrealistic picture of what ordinary lying is like. When I lie, it's usually an automatic response (like most speech), said reflexively based on the social situation I'm in. (Think, "do I look fat in this?") I can "catch myself" afterwards or during the process, but the response itself is generated by system 1.

Using system 2 while lying is usually a mistake, because it seems unnatural. If system 2 is used for lying, it's usually offline: telling yourself a certain story before going in to a social situation, so the responses can come automatically. Having to use system 2 to lie during a conversation is a kind of failure mode.

There are extreme cases like faking entire datasets, which are pretty rare.

Hmm. It occurs to me that lying might be a domain that's particularly prone to typical mind fallacy because people rarely share information about their lying habits. (See "Typical Sex Life Fallacy")

Some categories of experiences I can recall, which I think fall on a spectrum of deliberateness to unconsciousness.

  • Lying while surprised.
    • When I was a teenager, my dad suddenly asked me "have you done Forbidden Activity?" at a time when I wasn't expecting it. "Oh shit," I thought. I said "no."
    • [in this case, I was explicitly not epistemically cooperating with my dad. My understanding from Blatant lies are the best kind! is that this was simulacrum 2 behavior]
  • Rehearsing a narrative
    • Perhaps most similar to your experience: talking to a prospective employer at an interview, and realizing they're about to ask me about X and the truest answer to X is pretty unflattering to myself. Rehearsing a narrative in my head to prepare for that moment, trying to come up with a story that's true-ish enough that I can justify it to myself, so that by the time they ask me about X I can bullshit my way through it fluently.
    • [This seemed mostly lying playing a simulacrum 4 game with fairly established rules about what is acceptable]
  • Reflexive lying that's easy to notice
    • If someone asks "does this dress make me look fat" and I say "no you look great!", or someone asks "how's your project coming along" and I say "great!", and no she doesn't look great and/or my project is not going great, it's usually obvious to me almost immediately, even if I believed it (or wasn't really paying attention one way or another) at the moment that I said "great!".
      • This feels on the edge of the lying/motivated-cognition spectrum, and it seems reasonable to me to classify it as a lie.
      • Even if the first instance was unconscious, if the conversation continues about how my project is going, subsequent statements are probably deliberate S2 lies, or there is clear, continuous S2 thinking about how to maintain the "things are great!" narrative.
      • [where this falls on the simulacrum spectrum depends a bit on context, I could see it being level 3 or level 4]
  • Reflexively bad arguments
    • Sometimes someone says "Policy X is terrible!" and I think "no, Policy X is good! Without Policy X the entire project is doomed!". And, well, I do think that without Policy X, the project is going to be much harder and failure more likely. But the statement was clearly politically motivated. "My preferred policy is absolutely essential" probably isn't true.
      • A few years ago, I probably would not even have noticed that "without Policy X the project is doomed" is a bad argument. A few years later (with much deliberate practice in noticing motivated cognition under my belt), I'm capable of noticing "this was a bad argument with the flavor of political motivation" within a few minutes. If we're talking in person, that's probably too long to catch it in time. In email or blogpost form, I can usually catch it.
      • [This seems like the sort of level 3 simulacrum thing that can strengthen the level 3-ness of the conversation. I don't actually think it's useful to think of simulacrum levels moving in a particular order, so I don't think it's usually accurate to say that this is moving the dial from 2 to 3, but I do think it makes it harder to get from 3 to 1]

This study found that 60% of students at UMASS lied at least once in a 10-minute conversation:

And that of those who lied, many were surprised at how often they lied. I would not be surprised if this is true for many people (they lie at least once every ten minutes and would be surprised at how often they lie).

When I specifically started paying attention to little white lies (in particular, I found that I often reflexively exaggerated to make myself look good or prevent myself from looking bad) I found that I did it WAY more often than I thought. Once I got to a point where I could notice in the moment, I was able to begin correcting, but the first step was just noticing how often it occurred.

That link doesn't have enough information to find the study, which is likely to contain important methodological caveats.

Here's the study:

I think the methodology is fairly OK for this sort of high level analysis, except of course for it being all university students from UMASS.

I haven't thought about this topic much and don't have a strong opinion here yet, but I wanted to chime in with some personal experience which makes me suspect there might be distinct categories:

I worked in a workplace where lying was commonplace, conscious, and system 2. Clients asking if we could do something were told "yes, we've already got that feature (we hadn't), and we already have several clients successfully using it (we didn't)." Others were invited to be part of an "existing beta program" alongside others just like them (in fact, they would have been the very first). When I objected, I was told "no one wants to be the first, so you have to say that." Another time, they denied that they ever lied, but they did, and it was more than motivated cognition. There is a very vast gulf between "we've built this feature already" and "we haven't even asked the engineers what they think," and no amount of motivated cognition bridges it. It's less work than faking data, but it's no more subtle.

Motivated cognition is bad, but some people are really very willing to abandon truth for their own benefit in a completely adversarial way. The motivated cognition comes in to justify why what they're doing is okay, but they have a very clear model of the falsehoods they're presenting (they must in order to protect them).

I think they lie to themselves that they're not lying (so that if you search their thoughts, they never think "I'm lying"), but they are consciously aware of the different stories they have told different people, and of which ones actually constrain their expectations. And it's such a practiced way of being that, even involving System 2, it's fluid: each context activates which story to tell, in a way that appears natural from the outside. Maybe that's offline S2, online S1? I'm not sure. I think people who interact like that have a very different relationship with the truth than do most people on LW.

Attempting a more concrete and principled answer about what makes sense to distinguish here:

I think you are pointing to some difference between the two, but I'm not sure what it is. Maybe the difference is that motivated errors are more covert than lies: more plausible deniability (including narrative coherence) is maintained, and it's maintained by keeping the thing unconscious, while a plausible, semicoherent narrative is maintained in consciousness?

Reflecting a bit more, I think there are two important distinctions to be made:

Situation A) Alice makes a statement, which is false, and either Alice knows beforehand it's false, or Alice realizes it's false as soon as she pays any attention to it after the fact. (this is slightly different from how I'd have defined "lie" yesterday, but after 24 hours of mulling it over I think this is the correct clustering)

Situation B) Alice makes a statement which is false, which to Alice appears locally valid, but which is built upon some number of premises or arguments that are motivated.



This comment ended up quite long, so a summary of my overall point:

Situation B is much more complicated than Situation A.

In Situation A, Alice only has one inferential step to make, and Alice and Bob have mutual understanding (although not common knowledge) of that one inferential step. Bob can say "Alice, you lied here" and have the conversation make sense.

In Situation B, Alice has many inferential steps to make, and if Bob says "Alice, you lied here", Alice (even if rational and honest) needs to include probability mass on "Bob is wrong, Bob is motivated, and/or Bob is a malicious actor."

These are sufficiently different epistemic states for Alice to be in that I think it makes sense to use different words for them.


Situation A

In situation A, if Bob says "Hey, Alice, you lied here", Alice thinks internally either "shit I got caught" or "oh shit, I *did* lie." In the first case, Alice might attempt to obfuscate further. In the second case, Alice hopefully says "oops", admits the falsehood, and the conversation moves on. In either case, the incentives are *mostly* clear and direct to Alice – try to avoid doing this again, because you will get called on it.

If Alice obfuscates, or pretends to be in Situation B, she might get away with it this time, but identifying the lie will still likely reduce her incentives to make similar statements in the future (since at the very least, she'll have to do work defending herself).

Situation B

In situation B, if you say "Hey Alice, you lied here", Alice will say "what the hell? No?".

And then a few things happen, which I consider justified on Alice's part:

From Alice's epistemic position, she just said a true thing. If Bob just claimed that true thing was a lie, Alice now has several major hypotheses to consider:

  • Alice actually said a false thing
    • maybe the argument that directly supports the statement relies on faulty reasoning, or Alice is mistaken about the facts.
    • maybe somewhere in her background models/beliefs/ontology are nodes that are false due to motivated reasoning
    • maybe somewhere in her background models/beliefs/ontology are nodes that are false for non-motivated reasons
  • Alice actually said a true thing
    • Bob's models/beliefs/ontology are wrong, because *Bob* is motivated, causing Bob to incorrectly think Alice's statement was false
    • Bob's models/beliefs/ontology are wrong, for non-motivated reasons
    • Bob is making some kind of straightforward local error about the claim in question (maybe he's misunderstanding her or defining words differently from her)
    • Bob's models are fine... but Bob is politically motivated. He is calling Alice a liar, not to help truthseek, but to cast aspersions on Alice's character. (this might be part of an ongoing campaign to harm Alice, or just a random "Bob is having a bad day and looking to dump his frustration on someone else")
  • Alice said a partially true, partially false thing (or, some other variation of "it's complicated").
    • Maybe Bob is correctly noticing that Alice has a motivated component to her belief, but in fact, the belief is still true, and most of her reasoning is still correct, and Bob is factually wrong about the statement being a lie.
    • Maybe Alice and Bob's separate world models are pointing in different directions, which is making different aspects of Alice's statement salient to each of them. (They might both be motivated, or non-motivated). If they talk for a while they may both eventually learn to see the situation through different frames that broaden their understanding.

This is a much more complicated set of possibilities for Alice to evaluate. Incentives are getting applied here, but they could push her in a number of ways.

If Alice is a typical human and/or junior rationalist, she's going to be defensive, which will make it harder for her to think clearly. She will be prone to exaggerating the probability of options that aren't her fault. She may see Bob as socially threatening her – not as a truthseeking collaborator trying to help, but as a malicious actor out to harm her.

If Alice is a perfectly skilled rationalist, she'll hopefully avoid feeling defensive, and will not exaggerate the probability of any of the options for motivated reasons. But over half the options are still "this is Bob's problem, not Alice's, and/or they are both somewhat confused together".

Exactly how the probabilities fall out depends on the situation, and how much Alice trusts her own reasoning, and how much she trusts Bob's reasoning. But even perfect-rationalist Alice should have nonzero probability on "Bob is the one who is wrong, perhaps maliciously, here".

And if the answer is "Alice's belief is built on some kind of motivated reasoning", that's not something that can be easily resolved. If Alice is wrong, but luckily so, such that the chain of motivated beliefs is only 1-2 nodes deep, she can check whether they make sense and maybe discover she is wrong. But...

  • if she checks 1-2 nodes deep and she's not obviously wrong, this isn't clear evidence, since she might still be using motivated cognition to check for motivated cognition
  • if Alice is a skilled enough rationalist to easily check for motivated cognition, going 1-2 nodes deep still isn't very reassuring. If the problem was that "many of Alice's older observations were due to confirmation bias, and she no longer directly remembers those events but has cached them as prior probabilities", that's computationally intractable to check in the moment.

And meanwhile, until Alice has verified that her reasoning was motivated, she needs to retain probability mass on Bob being the wrong one.


Situation B seems extremely different to me from Situation A. It makes a lot of sense to me for people to use different words or phrases for the two situations.

One confounding issue is that obfuscating liars in Situation A have an incentive to pretend to be in Situation B. But there's still a fact-of-the-matter of what mental state Alice is in, which changes what incentives Alice will and should respond to.

I think I've lost the thread of your point. It seems a LOT like you're looking at motivation and systemic issues _WAY_ too soon in situation B. Start with "I think that statement is incorrect, Alice", and work to crux the disagreement and find out what's going on. _THEN_ decide if there's something motivated or systemic that needs to be addressed.

Basically, don't sit bolt upright in alarm for situation B. That's the common case for anything complicated, and you need to untangle it as part of deciding if it's important.

(I edited the comment, curious if it's clearer now)

Ah, sorry for not being clearer. Yes, that's actually the point I meant to be making. It's inappropriate (and factually wrong) for Bob to lead with "hey Alice you lied here". (I was trying to avoid editorializing too much about what seemed appropriate, and to focus on why the two situations are different)

I agree that the correct opening move is "that statement is incorrect", etc.

One further complication, though, is that it might be that Alice and Bob have talked a lot about whether Alice is incorrect, looked for cruxes, etc, and after several months of this Bob still thinks Alice is being motivated and Alice still thinks her model just makes sense. (This was roughly the situation in the OP)

From Bob's epistemic state, he's now in a world where it looks like Alice has a pattern of motivation that needs to be addressed, and Alice is non-cooperative because Alice disagrees (and it's hard to tell the difference between "Alice actually disagrees" and "Alice is feigning disagreement for political convenience"). I don't think there's any simple thing that can happen next, and [for good or for ill] what happens next is probably going to have something to do with Alice and Bob's respective social standing.

I think there are practices and institutions one could develop to help keep the topic in the domain of epistemics instead of politics, and there are meta-practices Alice and Bob can try to follow if they both wish for it to remain in the domain of epistemics rather than politics. But there is no special trick for it.

I think it's a little clearer in the comment, but I'm confused about the main post - in the case of subtle disagreements that _aren't_ clearly wrong nor intended to mislead, why do you want a word or concept that makes people sit up in alarm? Only after you've identified the object-level reasoning that shows it to be both incorrect and object-important should you examine the process-importance of why Alice is saying it (though in reality, you're evaluating this all along, just as she's evaluating yours).

The biggest confounding issue in my experience is that for deep enough models that Alice has used for a long time, her prior that Bob is the one with a problem is MUCH higher than that her model is inappropriate for this question. In exactly the same way that Bob's beliefs are pointing to the inverse and defying introspection of true reasons for his beliefs.

If you're finding this after a fair bit of discussion, and it's a topic without fairly straightforward empirical resolution, you're probably in the "agree to disagree" state (admitting that on this topic you don't have sufficient mutual knowledge of each others' rational beliefs to agree). And then you CERTAINLY don't want a word that makes people "sit up in alarm", as it's entirely about politics which of you is deemed to be biased.

There are other cases where Alice is uncooperative and you're willing to assume her motives or process are so bad that you want others not to be infected. That's more a warning to others than a statement that Alice should be expected to respond to. And it's also going to hit politics and backfire on Bob, at least some of the time. This case comes up a lot in public statements by celebrities or authorities. There's no room for discussion at the object level, so you kind of jump to assuming bad faith if you disagree with the statements. Reaction by those who disagree with Paul Krugman's NYT column is an example of this - "he's got a Nobel in Economics, he must be intentionally misleading people by ignoring all the complexity in his bad policy advice".

I'll try to write up a post that roughly summarizes the overall thesis I'm trying to build towards here, so that it's clearer how individual pieces fit together.

But a short answer to the "why would I want a clear handle for 'sitting upright in alarm'" is that I think it's at least sometimes necessary (or at the very least, inevitable), for this sort of conversation to veer into politics, and what I want is to eventually be able to discuss politics-qua-politics sanely and truth-trackingly.

My current best guess (although very lightly held) is that politics will go better if it's possible to pack rhetorical punch into things for a wider variety of reasons, so people don't feel pressure to say misleading things in order to get attention.

if it's possible to pack rhetorical punch into things for a wider variety of reasons, so people don't feel pressure to say misleading things in order to get attention.

I don't think I agree with any of that - I think that rational discussion needs to have less rhetorical punch and more specific clarity of proposition, which tends not to meet political/other-dominating needs. And most of what we're talking about in this subthread isn't "need to say misleading things", but "have a confused (or just different from mine) worldview that feels misleading without lots of discussion".

I look forward to further iterations - I hope I'm wrong and only stuck in my confused model of how rationalists approach such disagreements.