This is (sort of) a response to "Blatant lies are the best kind!", although I'd been working on this before that post was published. This post explores similar issues through my own frame, which seems at least somewhat different from Benquo's.
I've noticed a tendency for people to use the word "lie" when they want to communicate that a statement is deceptive or misleading, and that this is important.
And I think this is (often) technically wrong. I'm not sure everyone defines "lie" quite the same way, but when I hear it unqualified, I usually assume it means "to deliberately speak falsehood." Not all deceptive or misleading things are lies.
But it's perhaps a failure of the English language that there isn't a word for "rationalizing" or "motivated cognition" that is as rhetorically hefty.
If you say "Carl lied!", this is a big deal. People might get defensive (because they're friends with Carl), or they might get outraged (if they believe you and feel betrayed by Carl). Either way, something happens.
Whereas if Carl is making a motivated error, and you say "Carl is making a motivated error!", then people often shrug and go "I dunno, people make motivated errors all the time?" And well, yeah. People do make motivated errors all the time. This is all doubly slippery if the other people are motivated in the same direction as Carl, which incentivizes them not to get too worked up about it.
But at least sometimes, the error is bad or important enough, or Carl has enough social influence, that it matters that he is making the error.
So it seems perhaps useful to have a word (a short, punchy word) that comes pre-cached with connotations like "Carl has a pattern of rationalizing about this topic, and that pattern is important, and the fact that this has continued a while unchecked should make you sit bolt upright in alarm and do something different from whatever you are currently doing in relation to Carl."
Or, alternately: "It's not precisely a big deal that Carl in particular is doing this. Maybe everyone's doing this, and it'd be unfair to single Carl out. But, the fact that our social fabric is systematically causing people to distort their statements the way Carl is doing is real bad, and we should prioritize fixing that."
The motivating example here was a discussion/argument I had a couple weeks ago with another rationalist. Let's call them Bob.
("Bob" can reveal themselves in the comments if they wish).
Bob was frustrated with Alice, and with many other people's response to some of Alice's statements. Bob said [paraphrased slightly] "Alice blatantly lied! And nobody is noticing or caring!"
Now, it seemed to me that Alice's statement was neither a lie nor blatant. It was not a lie because Alice believed it. (I call this "being wrong", or "rationalizing", not "lying", and the difference is important because it says very different things about a person's character and how to most usefully respond to them.)
It didn't seem blatant because, well, at the very least it wasn't obvious to me that Alice was wrong.
I could see multiple models of the world that might inform Alice's position, and some of them seemed plausible to me. I understood why Bob disagreed, but nonetheless Alice's wrongness did not seem like an obvious fact.
[Unfortunately, going into the details of the situation would be more distracting than helpful. I think what matters most for this post is the respective epistemic states of Bob and myself.
But to give some idea, let's say Alice had said something like "obviously minimum wage helps low income workers."
I think this statement is wrong, especially the "obviously" part, but it's a position one might earnestly hold depending on which papers one has read, and in which order. I don't know if Bob would agree that this is a fair comparison, but it roughly matches my epistemic state.]
So, it seemed to me that Alice was probably making some cognitive mistakes, and failing to acknowledge some facts that were relevant to her position.
It was also in my probability space that Alice had knowingly lied. (In the minimum wage example, if Alice knew full well that there were some good first-principles and empirical reasons to doubt that minimum wage helped low-income workers, and ignored them because it was rhetorically convenient, I might classify that as a lie, or some other form of deception that raised serious red flags about Alice's trustworthiness.)
With all this in mind, I said to Bob:
"Hey, I think this is wrong. I don't think Alice was either lying or blatantly wrong."
Bob thought for a second, and then said "Okay, yeah, fair. Sure. Alice didn't lie, but she engaged in motivated cognition. But I still think..." and then Bob started speaking quickly, moving on to why he was still frustrated with people's response to Alice, agitation in his voice.
And I said: (slightly paraphrased to fit an hour of discussion into one paragraph)
"Hey. Wait. Stop. It doesn't look like you've back-propagated the fact that Alice didn't blatantly lie through the rest of your belief network. It's understandable if you disagree with me about whether 'blatantly lie' makes sense as a description of what's happening here. But if we do agree on that, I think you should actually stop and think for a minute, let that fact sink in, and shift how you feel about the people who aren't treating Alice's statement the way you want."
Bob stopped and said "Okay, yeah, you're right. Thanks." And then took a minute to do so. (This didn't radically change the argument, in part because there were a lot of other facets to the overall disagreement, but it still seemed like a good move for us to have jointly performed.)
It was during that minute, while I was reflecting on my own, that I thought about the opening statement of this post:
That maybe it's a failure of the English language that we don't have a way to communicate "so-and-so is rationalizing, and this pattern of rationalization is important." If you want to get people's attention and get them agitated, your rhetorical tools are limited.
[Edited addendum]
My guess is that a new word isn't actually the right solution (as Bendini notes in the comments, new jargon tends to get collapsed into whatever the most common use case is, regardless of how well the jargon term fits it).
But I think it'd be useful to at least have a shared concept-handle that we can more easily refer to. I think it'd be good to have more affordance to say: "Alice is rationalizing, and people aren't noticing, and I think we should be sitting up and paying attention to this, not just shrugging it off."
Initially I replied to this with "yeah, that seems straightforwardly true", but then something about that felt off, and it took me a while to figure out why.
This:
...seems straightforwardly true.
This:
Could unpack a few different ways. I still agree with the general sentiment you're pointing at here, but I think the most straightforward interpretation of this is mostly false.
Humans are not scalably friendly, so many of the most promising forms of Friendly AI seem to _not_ be "humans who are scaled up"; instead, they're doing other things.
One example is CEV. (It hopes that "if you scale up ALL humans TOGETHER and make them think carefully as you do so, you get something good, and if it turns out that you don't get something good that coheres, it gracefully fails and says 'nope, sorry, this didn't work.'" But this is a different thing than scaling any particular human or small group of humans.)
Iterated Amplification seems to more directly depend on humans being friendly as you scale them up, or at least some humans being so.
I am in fact pretty wary of Iterated Amplification for that reason.
The whole point of CEV, as I understand it, is to figure out the thing you could build that is actually robust to you not being friendly yourself. The sort of thing that, if the ancient Greeks were building it, you could possibly hope they'd figure out, so that they didn't accidentally lock the entire lightcone into a Bronze Age Warrior Ethos.
...
..
"You can't build friendly AI without this"
You and Zack have said this (or something like it) on occasion, and fwiw I get a fairly political red flag from the statement. Which is not to say I don't think the statement is getting at something important. But I notice each group I talk to has a strong sense of "the thing my group is focused on is the key, and if we can't get people to understand that we're doomed."
I have periodically noticed myself saying (and thinking), "if we can't get people to understand each other's frames and ontologies, we autolose. If we can't get people to jointly learn how to communicate and listen non-defensively, without provoking defensiveness in others (i.e. the paradigm I'm currently pushing), we're doomed."
But when I ask myself "is that really true? Is it sheer autolose if we don't all learn to doublecrux and whatnot?", the answer is no. Clearly not. I do think losing becomes more likely. I wouldn't be pushing my preferred paradigm if I didn't think that paradigm was useful. But the instinct to say "this is so important that we're obviously doomed if everyone doesn't understand and incorporate this" feels to me like something that should carry a strong prior of "your reason for saying that is to grab attention and build political momentum."
(To be clear, this is just my current prior, not a decisive argument. And again, I can certainly imagine human friendliness being crucial to at least many forms of AGI, and being quite useful regardless. I'm just noting that I feel a need to treat claims of this form with some caution.)