Rob Bensinger argues that "ITT-passing and civility are good; 'charity' is bad; steelmanning is niche".

The ITT—Ideological Turing Test—is an exercise in which one attempts to present one's interlocutor's views as persuasively as the interlocutor themselves can. Bryan Caplan coined the term in analogy to the Turing Test for distinguishing between humans and intelligent machines. (An AI that can pass as human must presumably possess human-like understanding; an opponent of an idea who can pass as an advocate for it must presumably possess an advocate's understanding.) "Steelmanning" refers to the practice of addressing a stronger version of an interlocutor's argument, coined in disanalogy to "strawmanning", the crime of addressing a weaker version of an interlocutor's argument in the hopes of fooling an audience (or oneself) into believing that the original argument has been rebutted.

Bensinger describes steelmanning as "a useful niche skill", but thinks it isn't "a standard thing you bring out in most arguments." Instead, he writes, discussions should be structured around object-level learning, trying to pass each other's Ideological Turing Test, or trying to resolve cruxes.

I think Bensinger has it backwards: the Ideological Turing Test is a useful niche skill, but it doesn't belong on a list of things to organize a discussion around, whereas something like steelmanning naturally falls out of object-level learning. Let me explain.

The ITT is a test of your ability to model someone else's models of some real-world phenomena of interest. But usually, I'm much more interested in modeling the real-world phenomena of interest directly, rather than modeling someone else's models of them.

I couldn't pass an ITT for advocates of Islam or extrasensory perception. On the one hand, this does represent a distinct deficit in my ability to model what the advocates of these ideas are thinking, a tragic gap in my comprehension of reality, which I would hope to remedy in the Glorious Transhumanist Future if that were a real thing. On the other hand, facing the constraints of our world, my inability to pass an ITT for Islam or ESP seems ... basically fine? I already have strong reasons to doubt the existence of ontologically fundamental mental entities. I accept my ignorance of the reasons someone might postulate otherwise, not out of contempt, but because I just don't have the time.

Or think of it this way: as a selfish seeker of truth speaking to another selfish seeker of truth, when would I want to try to pass my interlocutor's ITT, or want my interlocutor to try to pass my ITT?

In the "outbound" direction, I'm not particularly selfishly interested in passing my interlocutor's ITT because, again, I usually don't care much about other people's beliefs, as contrasted to the reality that those beliefs are reputedly supposed to track. I listen to my interlocutor hoping to learn from them, but if some part of what they say seems hopelessly wrong, it doesn't seem profitable to pretend that it isn't until I can reproduce the hopeless wrongness in my own words.

Crucially, the same is true in the "inbound" direction. I don't expect people to be able to pass my ITT before criticizing my ideas. That would make it harder for people to inform me about flaws in my ideas!

But if I'm not particularly interested in passing my interlocutor's ITT or in my interlocutor passing mine, and my interlocutor presumably (by symmetry) feels the same way, why would we bother?

All this having been said, I absolutely agree that, all else being equal, the ability to pass ITTs is desirable. It's useful as a check that you and your interlocutor are successfully communicating, rather than talking past each other. If I couldn't do better on an ITT for Islam or ESP after debating a proponent, that would be alarming—it's just that I'd want to try the old-fashioned debate algorithm first, and improve my ITT score as a side-effect, rather than trying to optimize my ITT score directly.

There are occasions when I'm inclined to ask an interlocutor to pass my ITT—specifically when I suspect them of not being honest about their motives, of being selfish about something other than the pursuit of truth (like winning acclaim for "their own" current theories). If someone seems persistently motivated to strawman you, asking them to just repeat back what you said in their own words is a useful device to get the discussion back on track. (Or to end it, if they clearly don't even want to try.)

In contrast to the ITT, steelmanning is something a selfish seeker of truth is inclined to do naturally, as a consequence of the obvious selfish practice of improving arguments wherever they happen to be found. In the outbound direction, if someone makes a flawed criticism of my ideas, of course I want to fix the flaws and address the improved argument. If the original criticism is faulty, but the repaired criticism exposes a key weakness in my existing ideas, then I learn something, which is great. If I were to just rebut the original criticism without trying to repair it, then I wouldn't learn anything, which would be terrible.

Likewise, in the inbound direction, if my interlocutor notices a flaw in my criticism of their ideas and fixes the flaw before addressing the repaired criticism, that's great. Why would I object?

The motivation here may be clearer if we consider the process of constructing computer programs rather than constructing arguments. When a colleague or language model assistant suggests an improvement to my code, I often accept the suggestion with my own ("steelmanned"?) changes rather than verbatim. This is so commonplace among programmers that it doesn't even have a special name.
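
To make the analogy concrete, here is a minimal sketch (hypothetical code, not drawn from any real review) of accepting a suggestion with changes: the reviewer's proposed fix is directionally right but has a flaw of its own, so the author repairs the suggestion before merging it rather than rejecting it outright.

    # Original: computes a mean, but crashes on an empty list.
    def mean(xs):
        return sum(xs) / len(xs)

    # Reviewer's suggestion: guard against empty input. Directionally
    # right, but silently returning 0 would mask bugs upstream.
    def mean_suggested(xs):
        if not xs:
            return 0
        return sum(xs) / len(xs)

    # Accepted with changes (the "steelmanned" suggestion): keep the
    # guard, but fail loudly instead of returning a misleading default.
    def mean_accepted(xs):
        if not xs:
            raise ValueError("mean() of empty sequence")
        return sum(xs) / len(xs)

The suggestion's core concern (empty input) survives into the merged version; its weakest detail (the silent default) does not.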

Bensinger quotes Eliezer Yudkowsky writing, "If you want to try to make a genuine effort to think up better arguments yourself because they might exist, don't drag the other person into it," but this bizarrely seems to discount the possibility of iterating on criticisms as they are posed. Despite making a genuine effort to think up better code that might exist, I often fail. If other people can see flaws in my code (because they know things I don't) and have their own suggestions, and I can see flaws in their suggestions (because I also know things they don't which didn't make it into my first draft) and have my own counter-suggestions, that seems like an ideal working relationship, not a malign imposition.

All this having been said, I agree that there's a serious potential failure mode where someone who thinks of themselves as steelmanning is actually constructing worse arguments than those that they purport to be improving. In this case, indeed, prompting such a delusional interlocutor to try the ITT first is a crucial remedy.

But crucial remedies are still niche in the sense that they shouldn't be "a standard thing you bring out in most arguments"—or if they are, it's a sign that you need to find better interlocutors. Having to explicitly drag out the ITT is a sign of sickness, not a sign of health. It shouldn't be normal to have to resort to roleplaying exercises to achieve the benefits that could as well be had from basic reading comprehension and a selfish interest in accurate shared maps.

Steven Kaas wrote in 2008:

If you're interested in being on the right side of disputes, you will refute your opponents' arguments. But if you're interested in producing truth, you will fix your opponents' arguments for them.

To win, you must fight not only the creature you encounter; you must fight the most horrible thing that can be constructed from its corpse.

The ITT is a useful tool for being on the right side of disputes: in order to knowably refute your opponents' arguments, you should be able to demonstrate that you know what those arguments are. I am nevertheless left with a sense that more is possible.

33 comments

A thing which happens a lot for me in debate: I'm pretty confident that I understand the process-which-produced my interlocutor's position; I can see what specific mistakes they're making and how those mistakes produce their position. And I do not think the situation is symmetric. In that situation, attempting to pass their ITT is a useful tool to:

  1. (By passing the ITT) costly-signal that I in fact understand their position and they do not understand mine, therefore they should probably be updating from me moreso than vice-versa, OR
  2. (By failing the ITT) find out that I do not understand their position, therefore I should expect to update myself to a greater extent.

Another way to think about it: often in debate, one person just understands the topic much better than the other, and the ideal outcome would be for both participants to figure this out and transform the debate into a lesson instead (i.e. student-teacher interaction rather than debate). The ITT is a good tool for that purpose.

Hmm, if you're a "selfish seeker of truth", debates don't seem like a very good source of truth. Even reading debates often feels shallow and pointless to me, compared to reading books.

Maybe the reason is that the best parts of something are rarely the most disputed parts. For example, in Christianity there's a lot of good stuff like "don't judge lest ye be judged", but debates are more focused on stuff like "haha, magic sky fairy".

in Christianity there’s a lot of good stuff like “don’t judge lest ye be judged”

Is that really “good stuff”??

… ok, I don’t actually want to start a debate about Christian morality here in this comment section, but I do want to note something else: you say that “the best parts of something are rarely the most disputed parts”—but might it not also (or instead? or separately?) be true that… hm… maybe something like, “the most disputed parts are not the parts that it might be most fruitful to dispute”? Like, it’s pretty obvious that it would be a colossal waste of our time if we decided to have yet another discussion about “haha magic sky fairy”, but discussing the “don’t judge” thing seems like it might well yield some insights that we (i.e., either or both of you and me, and/or some of any other participants) might not have had yet…


Forgive me if I am missing something obvious, but: how exactly can you steelman someone’s ideas if you do not understand them in the first place? In other words, isn’t ITT-passing a prerequisite to steelmanning…?

(Not that I necessarily endorse steelmanning as a practice, and consequently I am taking no position at this time on whether ITT-passing is necessary; but the above seems to me to be a basic confusion of some sort. Perhaps on my part, though…? What am I missing?)

In the limiting case where understanding is binary (either you totally get it, or you don't get it at all), you're right. That's an important point that I was remiss not to address in the post! (If you think you would do very poorly on an ITT, you should be saying, "I don't get it," not trying to steelman.)

The reason I think this post is still useful is because I think understanding often isn't binary. Often, I "get it" in the sense that I can read the words in a comment with ordinary reading comprehension, but I also "don't get it" in the sense that I haven't deeply internalized the author's worldview to the extent that I could have written the comment myself. I'm saying that in such cases, I usually want to focus on extracting whatever value I can out of the words that were written (even if the value takes the form of "that gives me a related idea"), rather than honing my ability to emulate the author.

The ITT does something different and worthwhile: it establishes goodwill.

If you care not just about the truth but about me, I will engage with you differently and very likely in a better way to cooperatively work toward truth.

I keep meaning to write "an overlooked goddam basic of rationalist discourse: be fucking nice".

I think rationalists overlook massive improvements you get in discussion if the people are actively cooperating rather than feeling antipathy.

I don't consider managing people's emotions to be part of the subject matter of epistemic rationality, even if managing people's emotions is a good idea and useful for having good discussions in practice. If the ITT is advocated for as an epistemic rationality technique, but its actual function is to get people in a cooperative mood, that's a problem!

I don't consider managing people's emotions to be part of the subject matter of epistemic rationality,

This sounds to me like an extremely large mistake. Emotions sure do seem to be the rate-limiting factor for epistemically productive interaction in a very large fraction of situations, and therefore managing them is a very central issue for human epistemic rationality in practice.

Zack's post is not vibe-neutral because nothing is vibe-neutral. There's a subtextual claim that:

  1. When people criticize your arguments, you should take it as a gift.
  2. When you criticize other people's opinions, you should present it as a gift.
  3. When "debating", be chill, as if you are at the grocery store check-out.

I think this is a good strategy, and that (2) actually can succeed at quelling bad emotional reactions. If you present an argument as an attack, or prematurely apologize for attacking, it will be felt like an attack. If you just present it with kindness, people will realize you mean no harm. If you present it with a detached professional "objectivity" and, like, actually feel [I just care about the truth], then ... well, some people would still react badly, but it should usually be fine. Could be done with a bit more finesse, maybe.

There's also 4: this is the right frame that people who read LW ought to take to debates with other people who read LW. Which I also agree with.

[I'm probably reading into Zack's writing stuff that he didn't intend to imply. But death of the author; I'm following the advice of the post]

It could be a mistake, or it could be an attempt to leave room for plausible deniability. There is something in his category war that doesn't quite add up. I don't know what the solution is; one of my main hypotheses is, as you say, that he is making some extremely large mistakes (not just wrt managing people's emotions, also object-level mistakes), but another is that cancel culture will punish him too much if he engages in fully transparent discourse, and therefore some things that look like mistakes are actually intentional to obscure what's going on.

Also to some extent it's just correct. If people are emotionally manipulating him, he has to drop out of managing their emotions.

Managing your own emotions is clearly a prerequisite to successful epistemic rationality practices, but other people’s emotions? That seems straightforwardly irrelevant.

What do you see as the prototypical problem in epistemic rationality? I see the prototypical problem as creating an environment of collaborative truth-seeking, and there managing others' emotions is perfectly relevant.


What do you see as the prototypical problem in epistemic rationality?

You need to solve epistemology, and you need an epistemology to solve epistemology.

Yep, this attitude is exactly what I'm talking about. Thinking that emotions and quality of discussions are two different topics is importantly wrong.

Apologies for leaving this as an assertion without further argument. It's important but nonobvious. It needs a full post.

Just to give the intuition: discussions with a hint of antipathy get bogged down in pointless argument as people unconsciously try to prove each other wrong. Establishing goodwill leads to more efficient and therefore more successful truth-seeking.

Just to give the intuition: discussions with a hint of antipathy get bogged down in pointless argument as people unconsciously try to prove each other wrong. Establishing goodwill leads to more efficient and therefore more successful truth-seeking.

“If I am wrong, I desire to believe I am wrong.” In other words, if you think someone’s wrong, then you should consciously try to prove it, no? Both for your own sake and for theirs (not to mention any third parties, which, in a public discussion forum, vastly outnumber the participants in any discussion!)?

Yes, absolutely. I'm not advocating being "nice" in the sense of pretending to agree when you don't. Being nice about disagreements will help you convince people better when they're wrong.

For instance, if they're obviously rushed and irritable, having that discussion briefly and badly may very well set them further into their mistaken belief.

In public discussions with more third parties it does change a lot. But it's important to recognize that how nice you are in public has a large impact on whether you change minds. (Being cleverly mean can help win you points with the already-converted by "dunking", but that's not helping with truth seeking).

If all the effects of the ITT were limited to establishing cooperative discussions, it would still have a huge instrumental benefit for systematic truth-seeking. You may dislike classifying it as part of epistemic rationality, but the fact that people who use this technique have more fruitful discussions and thus, all other things being equal, more accurate views would still be true.

This is, however, not the case, for the reasons I've already mentioned in another comment, but also because there is an intersection between "niceness" and "rationality" - the virtue of accuracy.

By “nice” do you mean something like “polite” or “courteous”? If so, then I do agree that it’s a basic principle of rationalist discourse to be “nice”; but on the other hand, in that case I do not agree that it is necessary to care about someone in order to be “nice” to them in this sense.

On the other hand, if by “be nice” you mean something that does require caring about someone, then I do not agree that it’s a basic principle of rationalist discourse. (Indeed it may be actively detrimental, in some cases; and certainly any attempt to mandate such a thing, or even to raise it to the level of an expectation, ought to be strongly quashed.)

So, could you clarify which sort of thing you have in mind here?

By nice I do mean actually caring about someone a little bit. That's needed to make sure we're cooperative vs. combative in discussion. I think this is a lower bar than you are thinking. I care enough about everyone I pass on the street to save them a minute if it takes me a couple of seconds to do it.

If you've ever noticed how two people that clearly don't care even that much have a discussion, I think you'll agree that they aren't as productive as when people are in disagreement but cooperatively inclined.

By nice I do mean actually caring about someone a little bit. That’s needed to make sure we’re cooperative vs. combative in discussion. I think this is a lower bar than you are thinking. I care enough about everyone I pass on the street to save them a minute if it takes me a couple of seconds to do it.

Alright, but if what you mean by “caring about” someone is only this very low bar, then where, in rationalist spaces, are you meeting people who don’t “care about” you…?

In fact, this seems to me to be such a low bar that determining whether someone meets it is going to be overwhelmed by noise in your judgment. For almost any plausible behavior on a discussion forum like this, how sure are you whether it’s motivated by such minimal “caring” or not?

In short, I would like to see what it means, in your view, to not “care about” someone, in this sense, before I can make any sense of the claim that “be nice” is a basic principle of rationalist discourse, that there are improvements to discussion from doing this sort of “caring” over not doing it, etc. Right now, I struggle to imagine it!

Depends on the topic. ITT-colored charity is a curiosity technique that enables more meaningful communication, gives access to others' gears that might be harder to notice or invent yourself, and occasionally teaches frames that are more efficient at capturing a topic. Its absence is key to maintaining systematic misunderstandings and reinventing wheels, including by leaving some topics unnecessarily difficult to understand without a more native frame for thinking about them.

Steelmanning is completely orthogonal, but superficially it doesn't seem to be, which is why in short interactions it's useful to keep it at a distance from efforts at understanding unfamiliar frames. It intentionally changes what someone is saying, and if they don't understand that this is the point, they might be less cooperative at explaining what they do mean. When understanding of the point of steelmanning is secured, it regains its role as a staple of idea debugging.

but this bizarrely seems to discount the possibility of iterating on criticisms as they are posed. Despite making a genuine effort to think up better code that might exist, I often fail. If other people can see flaws in my code (because they know things I don't) and have their own suggestions, and I can see flaws in their suggestions (because I also know things they don't which didn't make it into my first draft) and have my own counter-suggestions, that seems like an ideal working relationship, not a malign imposition.

And this is exactly the reason why the ITT is so important. Other people may know something that you don't; that's why your attempts to steelman them may fail dramatically. The strongest version of their argument you manage to come up with may be, and often is, weaker than the strongest version of the argument that they, or a person who can pass their ITT, can come up with.

In general, if you expect to learn something from another person, it's useful to be able to understand them. And passing an ITT is about understanding. You may not value understanding another person for its own sake, but it's an important instrumental value for a truth seeker, for the reasons you talk about.

the strongest version of their argument you manage to come up with may be, and often is, weaker than the strongest version of the argument that they, or a person who can pass their ITT, can come up with.

I mean, you should definitely only steelman if a genuine steelman actually occurs to you! You obviously don't want to ignore the text that the other person wrote and just make something up. But my hope and expectation is that people usually have enough reading comprehension such that it suffices to just reason about the text that was written, even if you couldn't have generated it yourself.

In the case of a drastic communication failure, sure, falling back to the ITT can make sense. (I try to address this in the post in the paragraph beginning with "All this having been said, I agree that there's a serious potential failure mode [...]".) My thesis is that this is a niche use-case.

It may be the case that you think a genuine steelman occurred to you, while you are still making a weaker version of the argument than the person you are talking to could have, because you misunderstand their position.

It's not just about reading comprehension. Communicating complex ideas is generally hard, and there are possible mistakes and misgeneralisations at every step, made by both the sender and the receiver.

A person has a position P which is based on arguments A1, A2, A3, A4, A5. While describing their arguments, they omit A1 because it seems just so obvious to them, and A4 and A5 because they are too nuanced and their post is already too long, or they simply forgot. So they bring up only A2' and A3' - their interpretations of A2 and A3, which may or may not equal A2 and A3.

You interpret them as A2'' and A3''. Not because you can't read properly, but because language is an imperfect medium for transferring ideas. Now if you can generalise P from the arguments you've got - pass the ITT - all is fine. It means that you can fix whatever communication mistakes occurred, recreate the missing arguments yourself, understand the relations between them, and see what actually makes them stronger or weaker, not only according to you but from the position of the person you are talking to. In practice, reaching this level of understanding is usually enough to challenge your views and progress truth-seeking. But if it's not enough, you can try to steelman any of A1-A5, getting A1*-A5*.

But in all likelihood you can't generalise P. And if you are not interested in passing the ITT, you won't even know it. So your genuine attempts to steelman will produce A2''* and A3''*, which may not even be stronger than A2 and A3, because you don't yet understand what makes these arguments strong in relation to P.

making a weaker version of the argument than the person you are talking to could have, because you misunderstand their position

Steelmanning generates hypotheticals to figure out. It's less useful when thinking about the hypotheticals doesn't teach you new things. Even actually being "stronger" is relatively unimportant, it's a heuristic for finding good hypotheticals, not a reference to what makes them good. Understanding someone's position or remaining on-topic is even less relevant, since these are neither the heuristic nor what makes the hypotheticals generated with it valuable.

You are correct.

In my comment I was using "strength" as a catch-all-term for general usefulness of the argument, without going too deep into the nuances. I don't think that this approximation failed to deliver the point.


But usually, I’m much more interested in modeling the real-world phenomena of interest directly, rather than modeling someone else’s models of them.

If you can do so, you should. But it often isn't straightforward... there is no deterministic algorithm for building a model. One person proposes and tests a hypothetical model; another proposes and tests another. Two models can stand unfalsified by evidence, in which case you need to look at the models themselves to decide between them.

I already have strong reasons to doubt the existence of ontologically fundamental mental entities.

Maybe, but the reasons given there aren't strong.

I guess the cost-benefit calculation of these things depends on your estimates of:

  • how useful is the other person's knowledge
  • how weird is their worldview (compared to yours)
  • how cooperative the person is
  • ...some other factors...

Depending on the numbers, different strategies may be preferable.

I couldn't pass an ITT for advocates of Islam or extrasensory perception. On the one hand, this does represent a distinct deficit in my ability to model what the advocates of these ideas are thinking, a tragic gap in my comprehension of reality, which I would hope to remedy in the Glorious Transhumanist Future if that were a real thing. On the other hand, facing the constraints of our world, my inability to pass an ITT for Islam or ESP seems ... basically fine? I already have strong reasons to doubt the existence of ontologically fundamental mental entities. I accept my ignorance of the reasons someone might postulate otherwise, not out of contempt, but because I just don't have the time.

I think there's a hidden or assumed goal here that I don't understand. The goal clearly isn't truth for its own sake, because then there wouldn't be a distinction between the truth of what they believe and the truth of what's real. You can of course make a distinction such as Simulacra levels, but ultimately it's all part of the territory.

If the goal is instrumental ability to impact the world, I think probably a good portion of the time it's as important to understand people's beliefs as the reality, because a good portion of the time your impact will be based not just on knowing the truth, but on convincing others to change their actions or beliefs.

So what actually is the goal you are after?

I agree with this, and I've been fairly unimpressed by the critiques of steelmanning that I've read, including Rob's post. Maybe I would change my mind if someone wrote a long post full of concrete examples[1] of steelmen going wrong, and of ITT being an efficient alternative. I think I pretty regularly see good reasoning and argumentation in the form of steelmen.

[1] But it's not trivial to contrive examples of arguments that will effectively get your point across without triggering, distracting, or alienating too much of the audience.

Steelmanning is about finding the truth; the ITT is about convincing someone in a debate. Different aims.