If you are silent about your pain, they'll kill you and say you enjoyed it.

—Zora Neale Hurston

Recapping my Whole Dumb Story so far—in a previous post, "Sexual Dimorphism in Yudkowsky's Sequences, in Relation to My Gender Problems", I told the part about how I've "always" (since puberty) had this obsessive sexual fantasy about being magically transformed into a woman and also thought it was immoral to believe in psychological sex differences, until I got set straight by these really great Sequences of blog posts by Eliezer Yudkowsky, which taught me (incidentally, among many other things) how absurdly unrealistic my obsessive sexual fantasy was given merely human-level technology, and that it's actually immoral not to believe in psychological sex differences given that psychological sex differences are actually real. In a subsequent post, "Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer", I told the part about how, in 2016, everyone in my systematically-correct-reasoning community up to and including Eliezer Yudkowsky suddenly started claiming that guys like me might actually be women in some unspecified metaphysical sense and insisted on playing dumb when confronted with alternative explanations of the relevant phenomena, until I eventually had a sleep-deprivation- and stress-induced delusional nervous breakdown.

That's not the egregious part of the story. Psychology is a complicated empirical science: no matter how obvious I might think something is, I have to admit that I could be wrong—not just as an obligatory profession of humility, but actually wrong in the real world. If my fellow rationalists merely weren't sold on the thesis about autogynephilia as a cause of transsexuality, I would be disappointed, but it wouldn't be grounds to denounce the entire community as a failure or a fraud. And indeed, I did end up moderating my views compared to the extent to which my thinking in 2016–7 took the views of Ray Blanchard, J. Michael Bailey, and Anne Lawrence as received truth. (At the same time, I don't particularly regret saying what I said in 2016–7, because Blanchard–Bailey–Lawrence is still obviously directionally correct compared to the nonsense everyone else was telling me.)

But a striking pattern in my attempts to argue with people about the two-type taxonomy in late 2016 and early 2017 was the tendency for the conversation to get derailed on some variation of, "Well, the word woman doesn't necessarily mean that," often with a link to "The Categories Were Made for Man, Not Man for the Categories", a November 2014 post by Scott Alexander arguing that because categories exist in our model of the world rather than the world itself, there's nothing wrong with simply defining trans people as their preferred gender to alleviate their dysphoria.

After Yudkowsky had stepped away from full-time writing, Alexander had emerged as our subculture's preeminent writer. Most people in an intellectual scene "are writers" in some sense, but Alexander was the one "everyone" reads: you could often reference a Slate Star Codex post in conversation and expect people to be familiar with the idea, either from having read it, or by osmosis. The frequency with which "... Not Man for the Categories" was cited at me seemed to suggest it had become our subculture's party line on trans issues.

But the post is wrong in obvious ways. To be clear, it's true that categories exist in our model of the world, rather than the world itself—categories are "map", not "territory"—and it's possible that trans women might be women with respect to some genuinely useful definition of the word "woman." However, Alexander goes much further, claiming that we can redefine gender categories to make trans people feel better:

I ought to accept an unexpected man or two deep inside the conceptual boundaries of what would normally be considered female if it'll save someone's life. There's no rule of rationality saying that I shouldn't, and there are plenty of rules of human decency saying that I should.

This is wrong because categories exist in our model of the world in order to capture empirical regularities in the world itself: the map is supposed to reflect the territory, and there are "rules of rationality" governing what kinds of word and category usages correspond to correct probabilistic inferences. Yudkowsky had written a whole Sequence about this, "A Human's Guide to Words". Alexander cites a post from that Sequence in support of the (true) point about how categories are "in the map" ... but if you actually read the Sequence, another point that Yudkowsky pounds home over and over, is that word and category definitions are nevertheless not arbitrary: you can't define a word any way you want, because there are at least 37 ways that words can be wrong—principles that make some definitions perform better than others as "cognitive technology."

In the case of Alexander's bogus argument about gender categories, the relevant principle (#30 on the list of 37) is that if you group things together in your map that aren't actually similar in the territory, you're going to make bad inferences.

Crucially, this is a general point about how language itself works that has nothing to do with gender. No matter what you believe about controversial empirical questions, intellectually honest people should be able to agree that "I ought to accept an unexpected [X] or two deep inside the conceptual boundaries of what would normally be considered [Y] if [positive consequence]" is not the correct philosophy of language, independently of the particular values of X and Y.

This wasn't even what I was trying to talk to people about. I thought I was trying to talk about autogynephilia as an empirical theory of psychology of late-onset gender dysphoria in males, the truth or falsity of which cannot be altered by changing the meanings of words. But at this point, I still trusted people in my robot cult to be basically intellectually honest, rather than slaves to their political incentives, so I endeavored to respond to the category-boundary argument under the assumption that it was an intellectually serious argument that someone could honestly be confused about.

When I took a year off from dayjobbing from March 2017 to March 2018 to have more time to study and work on this blog, the capstone of my sabbatical was an exhaustive response to Alexander, "The Categories Were Made for Man to Make Predictions" (which Alexander graciously included in his next links post). A few months later, I followed it with "Reply to The Unit of Caring on Adult Human Females", responding to a similar argument from soon-to-be Vox journalist Kelsey Piper, then writing as The Unit of Caring on Tumblr.

I'm proud of those posts. I think Alexander's and Piper's arguments were incredibly dumb, and that with a lot of effort, I did a pretty good job of explaining why to anyone who was interested and didn't, at some level, prefer not to understand.

Of course, a pretty good job of explaining by one niche blogger wasn't going to put much of a dent in the culture, which is the sum of everyone's blogposts; despite the mild boost from the Slate Star Codex links post, my megaphone just wasn't very big. I was disappointed with the limited impact of my work, but not to the point of bearing much hostility to "the community." People had made their arguments, and I had made mine; I didn't think I was entitled to anything more than that.

Really, that should have been the end of the story. Not much of a story at all. If I hadn't been further provoked, I would have still kept up this blog, and I still would have ended up arguing about gender with people sometimes, but this personal obsession wouldn't have been the occasion of a robot-cult religious civil war involving other people whom you'd expect to have much more important things to do with their time.

The casus belli for the religious civil war happened on 28 November 2018. I was at my new dayjob's company offsite event in Austin, Texas. Coincidentally, I had already spent much of the previous two days (since just before the plane to Austin took off) arguing trans issues with other "rationalists" on Discord.

Just that month, I had started a Twitter account using my real name, inspired in an odd way by the suffocating wokeness of the Rust open-source software scene where I occasionally contributed diagnostics patches to the compiler. My secret plan/fantasy was to get more famous and established in the Rust world (one of compiler team membership, or conference talk accepted, preferably both), get some corresponding Twitter followers, and then bust out the @BlanchardPhd retweets and links to this blog. In the median case, absolutely nothing would happen (probably because I failed at being famous), but I saw an interesting tail of scenarios in which I'd get to be a test case in the Code of Conduct wars.

So, now having a Twitter account, I was browsing Twitter in the bedroom at the rental house for the dayjob retreat when I happened to come across this thread by @ESYudkowsky:

Some people I usually respect for their willingness to publicly die on a hill of facts, now seem to be talking as if pronouns are facts, or as if who uses what bathroom is necessarily a factual statement about chromosomes. Come on, you know the distinction better than that!

Even if somebody went around saying, "I demand you call me 'she' and furthermore I claim to have two X chromosomes!", which none of my trans colleagues have ever said to me by the way, it still isn't a question-of-empirical-fact whether she should be called "she". It's an act.

In saying this, I am not taking a stand for or against any Twitter policies. I am making a stand on a hill of meaning in defense of validity, about the distinction between what is and isn't a stand on a hill of facts in defense of truth.

I will never stand against those who stand against lies. But changing your name, asking people to address you by a different pronoun, and getting sex reassignment surgery, Is. Not. Lying. You are ontologically confused if you think those acts are false assertions.

Some of the replies tried to explain the obvious problem—and Yudkowsky kept refusing to understand:

Using language in a way you dislike, openly and explicitly and with public focus on the language and its meaning, is not lying. The proposition you claim false (chromosomes?) is not what the speech is meant to convey—and this is known to everyone involved, it is not a secret.

Now, maybe as a matter of policy, you want to make a case for language being used a certain way. Well, that's a separate debate then. But you're not making a stand for Truth in doing so, and your opponents aren't tricking anyone or trying to.


You're mistaken about what the word means to you, I demonstrate thus: https://en.wikipedia.org/wiki/XX_male_syndrome

But even ignoring that, you're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning.

Dear reader, this is the moment where I flipped out. Let me explain.

This "hill of meaning in defense of validity" proclamation was such a striking contrast to the Eliezer Yudkowsky I remembered—the Eliezer Yudkowsky I had variously described as having "taught me everything I know" and "rewritten my personality over the internet"—who didn't hesitate to criticize uses of language that he thought were failing to "carve reality at the joints", even going so far as to call them "wrong":

[S]aying "There's no way my choice of X can be 'wrong'" is nearly always an error in practice, whatever the theory. You can always be wrong. Even when it's theoretically impossible to be wrong, you can still be wrong. There is never a Get-Out-Of-Jail-Free card for anything you do. That's life.


Once upon a time it was thought that the word "fish" included dolphins. Now you could play the oh-so-clever arguer, and say, "The list: {Salmon, guppies, sharks, dolphins, trout} is just a list—you can't say that a list is wrong. I can prove in set theory that this list exists. So my definition of fish, which is simply this extensional list, cannot possibly be 'wrong' as you claim."

Or you could stop playing nitwit games and admit that dolphins don't belong on the fish list.

You come up with a list of things that feel similar, and take a guess at why this is so. But when you finally discover what they really have in common, it may turn out that your guess was wrong. It may even turn out that your list was wrong.

You cannot hide behind a comforting shield of correct-by-definition. Both extensional definitions and intensional definitions can be wrong, can fail to carve reality at the joints.

One could argue that this "Words can be wrong when your definition draws a boundary around things that don't really belong together" moral didn't apply to Yudkowsky's new Tweets, which only mentioned pronouns and bathroom policies, not the extensions of common nouns.

But this seems pretty unsatisfying in the context of Yudkowsky's claim to "not [be] taking a stand for or against any Twitter policies". One of the Tweets that had recently led to radical feminist Meghan Murphy getting kicked off the platform read simply, "Men aren't women tho." This doesn't seem like a policy claim; rather, Murphy was using common language to express the fact-claim that members of the natural category of adult human males, are not, in fact, members of the natural category of adult human females.

Thus, if the extension of common words like "woman" and "man" is an issue of epistemic importance that rationalists should care about, then presumably so was Twitter's anti-misgendering policy—and if it isn't (because you're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning) then I wasn't sure what was left of the "Human's Guide to Words" Sequence if the 37-part grand moral needed to be retracted.

I think I am standing in defense of truth when I have an argument for why my preferred word usage does a better job at carving reality at the joints, and the one bringing my usage explicitly into question does not. As such, I didn't see the practical difference between "you're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning," and "I can define a word any way I want." About which, again, an earlier Eliezer Yudkowsky had written:

"It is a common misconception that you can define a word any way you like. [...] If you believe that you can 'define a word any way you like', without realizing that your brain goes on categorizing without your conscious oversight, then you won't take the effort to choose your definitions wisely."

"So that's another reason you can't 'define a word any way you like': You can't directly program concepts into someone else's brain."

"When you take into account the way the human mind actually, pragmatically works, the notion 'I can define a word any way I like' soon becomes 'I can believe anything I want about a fixed set of objects' or 'I can move any object I want in or out of a fixed membership test'."

"There's an idea, which you may have noticed I hate, that 'you can define a word any way you like'."

"And of course you cannot solve a scientific challenge by appealing to dictionaries, nor master a complex skill of inquiry by saying 'I can define a word any way I like'."

"Categories are not static things in the context of a human brain; as soon as you actually think of them, they exert force on your mind. One more reason not to believe you can define a word any way you like."

"And people are lazy. They'd rather argue 'by definition', especially since they think 'you can define a word any way you like'."

"And this suggests another—yes, yet another—reason to be suspicious of the claim that 'you can define a word any way you like'. When you consider the superexponential size of Conceptspace, it becomes clear that singling out one particular concept for consideration is an act of no small audacity—not just for us, but for any mind of bounded computing power."

"I say all this, because the idea that 'You can X any way you like' is a huge obstacle to learning how to X wisely. 'It's a free country; I have a right to my own opinion' obstructs the art of finding truth. 'I can define a word any way I like' obstructs the art of carving reality at its joints. And even the sensible-sounding 'The labels we attach to words are arbitrary' obstructs awareness of compactness."

"One may even consider the act of defining a word as a promise to [the] effect [...] [that the definition] will somehow help you make inferences / shorten your messages."

One could argue that I was unfairly interpreting Yudkowsky's Tweets as having a broader scope than was intended—that Yudkowsky only meant to slap down the false claim that using he for someone with a Y chromosome is "lying", without intending any broader implications about trans issues or the philosophy of language. It wouldn't be realistic or fair to expect every public figure to host an exhaustive debate on all related issues every time they encounter a fallacy they want to Tweet about.

However, I don't think this "narrow" reading is the most natural one. Yudkowsky had previously written of what he called the fourth virtue of evenness: "If you are selective about which arguments you inspect for flaws, or how hard you inspect for flaws, then every flaw you learn how to detect makes you that much stupider." He had likewise written on reversed stupidity (bolding mine):

To argue against an idea honestly, you should argue against the best arguments of the strongest advocates. Arguing against weaker advocates proves nothing, because even the strongest idea will attract weak advocates.

Relatedly, Scott Alexander had written about how "weak men are superweapons": speakers often selectively draw attention to the worst arguments in favor of a position in an attempt to socially discredit people who have better arguments (which the speaker ignores). In the same way, by just slapping down a weak man from the "anti-trans" political coalition without saying anything else in a similarly prominent location, Yudkowsky was liable to mislead his faithful students into thinking that there were no better arguments from the "anti-trans" side.

To be sure, it imposes a cost on speakers to not be able to Tweet about one specific annoying fallacy and then move on with their lives without the need for endless disclaimers about related but stronger arguments that they're not addressing. But the fact that Yudkowsky disclaimed that he wasn't taking a stand for or against Twitter's anti-misgendering policy demonstrates that he didn't have an aversion to spending a few extra words to prevent the most common misunderstandings.

Given that, it's hard to read the Tweets Yudkowsky published as anything other than an attempt to intimidate and delegitimize people who want to use language to reason about sex rather than gender identity. It's just not plausible that Yudkowsky was simultaneously savvy enough to choose to make these particular points while also being naïve enough to not understand the political context. Deeper in the thread, he wrote:

The more technology advances, the further we can move people towards where they say they want to be in sexspace. Having said this we've said all the facts. Who competes in sports segregated around an Aristotelian binary is a policy question (that I personally find very humorous).

Sure, in the limit of arbitrarily advanced technology, everyone could be exactly where they wanted to be in sexspace. Having said this, we have not said all the facts relevant to decisionmaking in our world, where we do not have arbitrarily advanced technology (as Yudkowsky well knew, having written a post about how technically infeasible an actual sex change would be). As Yudkowsky acknowledged in the previous Tweet, "Hormone therapy changes some things and leaves others constant." The existence of hormone replacement therapy does not itself take us into the glorious transhumanist future where everyone is the sex they say they are.

The reason for sex-segregated sports leagues is that sport-relevant multivariate trait distributions of female bodies and male bodies are different: men are taller, stronger, and faster. If you just had one integrated league, females wouldn't be competitive (in the vast majority of sports, with a few exceptions like ultra-distance swimming that happen to sample an unusually female-favorable corner of sportspace).

Given the empirical reality of the different trait distributions, "Who are the best athletes among females?" is a natural question for people to be interested in and want separate sports leagues to determine. Including male people in female sports leagues undermines the point of having a separate female league, and hormone replacement therapy after puberty doesn't substantially change the picture here.[1]

Yudkowsky's suggestion that an ignorant commitment to an "Aristotelian binary" is the main reason someone might care about the integrity of women's sports is an absurd strawman. This just isn't something any scientifically literate person would write if they had actually thought about the issue at all, as opposed to having first decided (consciously or not) to bolster their reputation among progressives by dunking on transphobes on Twitter, and then wielding their philosophy knowledge in the service of that political goal. The relevant facts are not subtle, even if most people don't have the fancy vocabulary to talk about them in terms of "multivariate trait distributions."

I'm picking on the "sports segregated around an Aristotelian binary" remark because sports is a case where the relevant effect sizes are so large as to make the point hard for all but the most ardent gender-identity partisans to deny. (For example, what the Cohen's d ≈ 2.6 effect size difference in muscle mass means is that a woman as strong as the average man is at the 99.5th percentile for women.) But the point is general: biological sex exists and is sometimes decision-relevant. People who want to be able to talk about sex and make policy decisions on the basis of sex are not making an ontology error, because the ontology in which sex "actually" "exists" continues to make very good predictions in our current tech regime (if not the glorious transhumanist future). It would be a ridiculous isolated demand for rigor to expect someone to pass a graduate exam about the philosophy and cognitive science of categorization before they can talk about sex.
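For readers who want to check the percentile arithmetic, here is a minimal sketch. It assumes both groups are normally distributed with the same standard deviation, which is the idealization under which Cohen's d is usually read; the function name is made up for illustration.

```python
from statistics import NormalDist


def percentile_of_other_group_mean(d: float) -> float:
    """Percentile, within the lower-scoring group, of the other group's mean.

    Assumption: both groups are normal with equal standard deviation,
    so the two means sit d standard deviations apart.
    """
    return 100 * NormalDist().cdf(d)


muscle_mass_d = 2.6  # effect size quoted in the text
print(f"{percentile_of_other_group_mean(muscle_mass_d):.1f}")  # prints 99.5
```

Real trait distributions are not exactly normal, so the tail figure should be taken as approximate.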

Thus, Yudkowsky's claim to merely have been standing up for the distinction between facts and policy questions doesn't seem credible. It is, of course, true that pronoun and bathroom conventions are policy decisions rather than matters of fact, but it's bizarre to condescendingly point this out as if it were the crux of contemporary trans-rights debates. Conservatives and gender-critical feminists know that trans-rights advocates aren't falsely claiming that trans women have XX chromosomes! If you just wanted to point out that the rules of sports leagues are a policy question rather than a fact (as if anyone had doubted this), why would you throw in the "Aristotelian binary" weak man and belittle the matter as "humorous"? There are a lot of issues I don't care much about, but I don't see anything funny about the fact that other people do care.[2]

If any concrete negative consequence of gender self-identity categories is going to be waved away with, "Oh, but that's a mere policy decision that can be dealt with on some basis other than gender, and therefore doesn't count as an objection to the new definition of gender words", then it's not clear what the new definition is for.

Like many gender-dysphoric males, I cosplay female characters at fandom conventions sometimes. And, unfortunately, like many gender-dysphoric males, I'm not very good at it. I think someone looking at some of my cosplay photos and trying to describe their content in clear language—not trying to be nice to anyone or make a point, but just trying to use language as a map that reflects the territory—would say something like, "This is a photo of a man and he's wearing a dress." The word man in that sentence is expressing cognitive work: it's a summary of the lawful cause-and-effect evidential entanglement whereby the photons reflecting off the photograph are correlated with photons reflecting off my body at the time the photo was taken, which are correlated with my externally observable secondary sex characteristics (facial structure, beard shadow, &c.). From this evidence, an agent using an efficient naïve-Bayes-like model can assign me to its "man" (adult human male) category and thereby make probabilistic predictions about traits that aren't directly observable from the photo. The agent would achieve a better score on those predictions than if it had assigned me to its "woman" (adult human female) category.

By "traits" I mean not just sex chromosomes (as Yudkowsky suggested on Twitter), but the conjunction of dozens or hundreds of measurements that are causally downstream of sex chromosomes: reproductive organs and muscle mass (again, sex difference effect size of Cohen's d ≈ 2.6) and Big Five Agreeableness (d ≈ 0.5) and Big Five Neuroticism (d ≈ 0.4) and short-term memory (d ≈ 0.2, favoring women) and white-gray-matter ratios in the brain and probable socialization history and any number of other things—including differences we might not know about, but have prior reasons to suspect exist. No one knew about sex chromosomes before 1905, but given the systematic differences between women and men, it would have been reasonable to suspect the existence of some sort of molecular mechanism of sex determination.
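To make the "conjunction of measurements" point concrete, here is a rough sketch of how separate effect sizes add up. It leans on strong assumptions that are mine, not the cited literature's: every trait is normal with equal variance in both groups, traits are independent within each group (the "naïve" part of naïve Bayes), and base rates are equal. Under those assumptions the combined separation is the square root of the sum of squared d values, and the best achievable classification accuracy is Φ(D/2). The function names and the fifty-trait example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def combined_d(effect_sizes):
    """Overall separation D between two groups across several traits.

    Assumption: traits are independent within each group, so squared
    effect sizes simply add.
    """
    return sqrt(sum(d * d for d in effect_sizes))


def best_case_accuracy(effect_sizes):
    """Accuracy of the optimal classifier, assuming equal base rates."""
    return NormalDist().cdf(combined_d(effect_sizes) / 2)


# Effect sizes quoted in the text (magnitudes only).
quoted = {
    "muscle mass": 2.6,
    "Agreeableness": 0.5,
    "Neuroticism": 0.4,
    "short-term memory": 0.2,
}

for name, d in quoted.items():
    print(f"{name}: {best_case_accuracy([d]):.3f}")
print(f"all four together: {best_case_accuracy(list(quoted.values())):.3f}")

# Hypothetical: fifty independent traits, each with a modest d of 0.3.
many_small = [0.3] * 50
print(f"fifty small traits: {best_case_accuracy(many_small):.3f}")
```

Real traits are correlated, so the independence assumption overstates how quickly evidence accumulates. The qualitative point survives: many individually weak signals can jointly support a confident classification.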

Forcing a speaker to say "trans woman" instead of "man" in a sentence about my cosplay photos depending on my verbally self-reported self-identity may not be forcing them to lie, exactly. It's understood, "openly and explicitly and with public focus on the language and its meaning," what trans women are; no one is making a false-to-fact claim about them having ovaries, for example. But it is forcing the speaker to obfuscate the probabilistic inference they were trying to communicate with the original sentence (about modeling the person in the photograph as being sampled from the "man" cluster in configuration space), and instead use language that suggests a different cluster-structure. ("Trans women", two words, are presumably a subcluster within the "women" cluster.) Crowing in the public square about how people who object to being forced to "lie" must be ontologically confused is ignoring the interesting part of the problem. Gender identity's claim to be non-disprovable functions as a way to avoid the belief's real weak points.

To this, one might reply that I'm giving too much credit to the "anti-trans" faction for how stupid they're not being: that my careful dissection of the hidden probabilistic inferences implied by words (including pronoun choices) is all well and good, but calling pronouns "lies" is not something you do when you know how to use words.

But I'm not giving them credit for understanding the lessons of "A Human's Guide to Words"; I just think there's a useful sense of "know how to use words" that embodies a lower standard of philosophical rigor. If a person-in-the-street says of my cosplay photos, "That's a man! I have eyes, and I can see that that's a man! Men aren't women!"—well, I probably wouldn't want to invite them to a Less Wrong meetup. But I do think the person-in-the-street is performing useful cognitive work. Because I have the hidden-Bayesian-structure-of-language-and-cognition-sight (thanks to Yudkowsky's writings back in the 'aughts), I know how to sketch out the reduction of "Men aren't women" to something more like "This cognitive algorithm detects secondary sex characteristics and uses them as a classifier for a binary female/male 'sex' category, which it uses to make predictions about not-yet-observed features ..."

But having done the reduction-to-cognitive-algorithms, it still looks like the person-in-the-street has a point that I shouldn't be allowed to ignore just because I have 30 more IQ points and better philosophy-of-language skills?

I bring up my bad cosplay photos as an edge case that helps illustrate the problem I'm trying to point out, much like how people love to bring up complete androgen insensitivity syndrome to illustrate why "But chromosomes!" isn't the correct reduction of sex classification. To differentiate what I'm saying from blind transphobia, let me note that I predict that most people-in-the-street would be comfortable using feminine pronouns for someone like Blaire White. That's evidence about the kind of cognitive work people's brains are doing when they use English pronouns! Certainly, English is not the only language, and ours is not the only culture; maybe there is a way to do gender categories that would be more accurate and better for everyone. But to find what that better way is, we need to be able to talk about these kinds of details in public, and the attitude evinced in Yudkowsky's Tweets seemed to function as a semantic stopsign to get people to stop talking about the details.

If you were interested in having a real discussion (instead of a fake discussion that makes you look good to progressives), why would you slap down the "But, but, chromosomes" fallacy and then not engage with the obvious steelman of "But, but, clusters in high-dimensional configuration space that aren't actually changeable with contemporary technology", which was, in fact, brought up in the replies?

Satire is a weak form of argument: the one who wishes to doubt will always be able to find some aspect in which an obviously absurd satirical situation differs from the real-world situation being satirized and claim that that difference destroys the relevance of the joke. But on the off chance that it might help illustrate the objection, imagine you lived in a so-called "rationalist" subculture where conversations like this happened—

Bob: Look at this adorable cat picture!
Alice: Um, that looks like a dog to me, actually.
Bob: You're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning. Now, maybe as a matter of policy, you want to make a case for language being used a certain way. Well, that's a separate debate then.

If you were Alice, and a solid supermajority of your incredibly smart, incredibly philosophically sophisticated friend group including Eliezer Yudkowsky (!!!) seemed to behave like Bob, that would be a worrying sign about your friends' ability to accomplish intellectually hard things like AI alignment, right? Even if there isn't any pressing practical need to discriminate between dogs and cats, the problem is that Bob is selectively using his sophisticated philosophy-of-language knowledge to try to undermine Alice's ability to use language to make sense of the world, even though Bob obviously knows very well what Alice was trying to say. It's incredibly obfuscatory in a way that people—the same people—would not tolerate in almost any other context.

Imagine an Islamic theocracy in which one Megan Murfi (ميغان ميرفي) had recently gotten kicked off the dominant microblogging platform for speaking disrespectfully about the prophet Muhammad. Suppose that Yudkowsky's analogue in that world then posted that those objecting on free inquiry grounds were ontologically confused: saying "peace be upon him" after the name of the prophet Muhammad is a speech act, not a statement of fact. In banning Murfi for repeatedly speaking about the prophet Muhammad (peace be upon him) as if he were just some guy, the platform was merely "enforcing a courtesy standard" (in the words of our world's Yudkowsky). Murfi wasn't being forced to lie.

I think the atheists of our world, including Yudkowsky, would not have trouble seeing the problem with this scenario, nor hesitate to agree that it is a problem for that Society's rationality. Saying "peace be upon him" is indeed a speech act rather than a statement of fact, but it would be bizarre to condescendingly point this out as if it were the crux of debates about religious speech codes. The function of the speech act is to signal the speaker's affirmation of Muhammad's divinity. That's why the Islamic theocrats want to mandate that everyone say it: it's a lot harder for atheism to get any traction if no one is allowed to talk like an atheist.

And that's why trans advocates want to mandate against misgendering people on social media: it's harder for trans-exclusionary ideologies to get any traction if no one is allowed to talk like someone who believes that sex (sometimes) matters and gender identity does not.

Of course, such speech restrictions aren't necessarily "irrational", depending on your goals. If you just don't think "free speech" should go that far—if you want to suppress atheism or gender-critical feminism with an iron fist—speech codes are a perfectly fine way to do it! And to their credit, I think most theocrats and trans advocates are intellectually honest about what they're doing: atheists or transphobes are bad people (the argument goes) and we want to make it harder for them to spread their lies or their hate.

In contrast, by claiming to be "not taking a stand for or against any Twitter policies" while accusing people who opposed the policy of being ontologically confused, Yudkowsky was being less honest than the theocrat or the activist: of course the point of speech codes is to suppress ideas! Given that the distinction between facts and policies is so obviously not anyone's crux—the smarter people in the "anti-trans" faction already know that, and the dumber people in the faction wouldn't change their alignment if they were taught—it's hard to see what the point of harping on the fact/policy distinction would be, except to be seen as implicitly taking a stand for the "pro-trans" faction while putting on a show of being politically "neutral."

It makes sense that Yudkowsky might perceive political constraints on what he might want to say in public—especially when you look at what happened to the other Harry Potter author.[3] But if Yudkowsky didn't want to get into a distracting fight about a politically-charged topic, then maybe the responsible thing to do would have been to just not say anything about the topic, rather than engaging with the stupid version of the opposition and stonewalling with "That's a policy question" when people tried to point out the problem?!

I didn't have all of that criticism collected and carefully written up on 28 November 2018. But that, basically, is why I flipped out when I saw that Twitter thread. If the "rationalists" didn't click on the autogynephilia thing, that was disappointing, but forgivable. If the "rationalists", on Scott Alexander's authority, were furthermore going to get our own philosophy of language wrong over this, that was—I don't want to say forgivable exactly, but it was tolerable. I had learned from my misadventures the previous year that I had been wrong to trust "the community" as a reified collective. That had never been a reasonable mental stance in the first place.

But trusting Eliezer Yudkowsky—whose writings, more than any other single influence, had made me who I am—did seem reasonable. If I put him on a pedestal, it was because he had earned the pedestal, for supplying me with my criteria for how to think—including, as a trivial special case, how to think about what things to put on pedestals.

So if the rationalists were going to get our own philosophy of language wrong over this and Eliezer Yudkowsky was in on it (!!!), that was intolerable, inexplicable, incomprehensible—like there wasn't a real world anymore.

At the dayjob retreat, I remember going downstairs to impulsively confide in a senior engineer, an older bald guy who exuded masculinity, who you could tell by his entire manner and being was not infected by the Berkeley mind-virus, no matter how loyally he voted Democrat. I briefly explained the situation to him—not just the immediate impetus of this Twitter thread, but this whole thing of the past couple years where my entire social circle just suddenly decided that guys like me could be women by means of saying so. He was noncommittally sympathetic; he told me an anecdote about him accepting a trans person's correction of his pronoun usage, with the thought that different people have their own beliefs, and that's OK.

If Yudkowsky was already stonewalling his Twitter followers, entering the thread myself didn't seem likely to help. (Also, less importantly, I hadn't intended to talk about gender on that account yet.)

It seemed better to try to clear this up in private. I still had Yudkowsky's email address, last used when I had offered to pay to talk about his theory of MtF two years before. I felt bad bidding for his attention over my gender thing again—but I had to do something. Hands trembling, I sent him an email asking him to read my "The Categories Were Made for Man to Make Predictions", suggesting that it might qualify as an answer to his question about "a page [he] could read to find a non-confused exclamation of how there's scientific truth at stake". I said that because I cared very much about correcting confusions in my rationalist subculture, I would be happy to pay up to $1000 for his time—and that, if he liked the post, he might consider Tweeting a link—and that I was cc'ing my friends Anna Salamon and Michael Vassar as character references (Subject: "another offer, $1000 to read a ~6500 word blog post about (was: Re: Happy Price offer for a 2 hour conversation)"). Then I texted Anna and Michael, begging them to vouch for my credibility.

The monetary offer, admittedly, was awkward: I included another paragraph clarifying that any payment was only to get his attention, not quid pro quo advertising, and that if he didn't trust his brain circuitry not to be corrupted by money, then he might want to reject the offer on those grounds and only read the post if he expected it to be genuinely interesting.

Again, I realize this must seem weird and cultish to any normal people reading this. (Paying some blogger you follow one grand just to read one of your posts? What? Why? Who does that?) To this, I again refer to the reasons justifying my 2016 cheerful price offer—and that, along with tagging in Anna and Michael, whom I thought Yudkowsky respected, it was a way to signal that I really didn't want to be ignored, which I assumed was the default outcome. An ordinary programmer such as me was as a mere worm in the presence of the great Eliezer Yudkowsky. I wouldn't have had the audacity to contact him at all, about anything, if I didn't have Something to Protect.

Anna didn't reply, but I apparently did interest Michael, who chimed in on the email thread to Yudkowsky. We had a long phone conversation the next day lamenting how the "rationalists" were dead as an intellectual community.

As for the attempt to intervene on Yudkowsky—here I need to make a digression about the constraints I'm facing in telling this Whole Dumb Story. I would prefer to just tell this Whole Dumb Story as I would to my long-neglected Diary—trying my best at the difficult task of explaining what actually happened during an important part of my life, without thought of concealing anything.

(If you are silent about your pain, they'll kill you and say you enjoyed it.)

Unfortunately, a lot of other people seem to have strong intuitions about "privacy", which bizarrely impose constraints on what I'm allowed to say about my own life: in particular, it's considered unacceptable to publicly quote or summarize someone's emails from a conversation that they had reason to expect to be private. I feel obligated to comply with these widely-held privacy norms, even if I think they're paranoid and anti-social. (This secrecy-hating trait probably correlates with the autogynephilia blogging; someone otherwise like me who believed in privacy wouldn't be telling you this Whole Dumb Story.)

So I would think that while telling this Whole Dumb Story, I obviously have an inalienable right to blog about my own actions, but I'm not allowed to directly refer to private conversations with named individuals in cases where I don't think I'd be able to get the consent of the other party. (I don't think I'm required to go through the ritual of asking for consent in cases where the revealed information couldn't reasonably be considered "sensitive", or if I know the person doesn't have hangups about this weird "privacy" thing.) In this case, I'm allowed to talk about emailing Yudkowsky (because that was my action), but I'm not allowed to talk about anything he might have said in reply, or whether he did.

Unfortunately, there's a potentially serious loophole in the commonsense rule: what if some of my actions (which I would have hoped to have an inalienable right to blog about) depend on content from private conversations? You can't, in general, only reveal one side of a conversation.

Suppose Carol messages Dave at 5 p.m., "Can you come to the party?", and also, separately, that Carol messages Dave at 6 p.m., "Gout isn't contagious." Should Carol be allowed to blog about the messages she sent at 5 p.m. and 6 p.m., because she's only describing her own messages and not confirming or denying whether Dave replied at all, let alone quoting him?

I think commonsense privacy-norm-adherence intuitions actually say No here: the text of Carol's messages makes it too easy to guess that sometime between 5 and 6, Dave probably said that he couldn't come to the party because he has gout. It would seem that Carol's right to talk about her own actions in her own life does need to take into account some commonsense judgment of whether that leaks "sensitive" information about Dave.

In the substory (of my Whole Dumb Story) that follows, I'm going to describe several times that I and others emailed Yudkowsky to argue with what he said in public, without saying anything about whether Yudkowsky replied or what he might have said if he did reply. I maintain that I'm within my rights here, because I think commonsense judgment will agree that me talking about the arguments I made does not leak any sensitive information about the other side of a conversation that may or may not have happened. I think the story comes off relevantly the same whether Yudkowsky didn't reply at all (e.g., because he was too busy with more existentially important things to check his email), or whether he replied in a way that I found sufficiently unsatisfying as to occasion the further emails with followup arguments that I describe. (Talking about later emails does rule out the possible world where Yudkowsky had said, "Please stop emailing me," because I would have respected that, but the fact that he didn't say that isn't "sensitive".)

It seems particularly important to lay out these judgments about privacy norms in connection to my attempts to contact Yudkowsky, because part of what I'm trying to accomplish in telling this Whole Dumb Story is to deal reputational damage to Yudkowsky, which I claim is deserved. (We want reputations to track reality. If you see Erin exhibiting a pattern of intellectual dishonesty, and she keeps doing it even after you talk to her about it privately, you might want to write a blog post describing the pattern in detail—not to hurt Erin, particularly, but so that everyone else can make higher-quality decisions about whether they should believe the things that Erin says.) Given that motivation of mine, it seems important that I only try to hang Yudkowsky with the rope of what he said in public, where you can click the links and read the context for yourself: I'm attacking him, but not betraying him. In the substory that follows, I also describe correspondence with Scott Alexander, but that doesn't seem sensitive in the same way, because I'm not particularly trying to deal reputational damage to Alexander. (Not because Scott performed well, but because one wouldn't really have expected him to in this situation; Alexander's reputation isn't so direly in need of correction.)

Thus, I don't think I should say whether Yudkowsky replied to Michael's and my emails, nor (again) whether he accepted the cheerful-price money, because any conversation that may or may not have occurred would have been private. But what I can say, because it was public, is that we saw this addition to the Twitter thread:

I was sent this (by a third party) as a possible example of the sort of argument I was looking to read: http://unremediatedgender.space/2018/Feb/the-categories-were-made-for-man-to-make-predictions/. Without yet judging its empirical content, I agree that it is not ontologically confused. It's not going "But this is a MAN so using 'she' is LYING."

Look at that! The great Eliezer Yudkowsky said that my position is "not ontologically confused." That's probably high praise, coming from him!

You might think that that should have been the end of the story. Yudkowsky denounced a particular philosophical confusion, I already had a related objection written up, and he publicly acknowledged my objection as not being the confusion he was trying to police. I should be satisfied, right?

I wasn't, in fact, satisfied. This little "not ontologically confused" clarification buried deep in the replies was much less visible than the bombastic, arrogant top-level pronouncement insinuating that resistance to gender-identity claims was confused. (1 Like on this reply, vs. 140 Likes/18 Retweets on start of thread.) This little follow-up did not seem likely to disabuse the typical reader of the impression that Yudkowsky thought gender-identity skeptics didn't have a leg to stand on. Was it greedy of me to want something louder?

Greedy or not, I wasn't done flipping out. On 1 December 2018, I wrote to Scott Alexander (cc'ing a few other people) to ask if there was any chance of an explicit and loud clarification or partial retraction of "... Not Man for the Categories" (Subject: "super-presumptuous mail about categorization and the influence graph"). Forget my boring whining about the autogynephilia/two-types thing, I said—that's a complicated empirical claim, and not the key issue.

The issue was that category boundaries are not arbitrary (if you care about intelligence being useful). You want to draw your category boundaries such that things in the same category are similar in the respects that you care about predicting/controlling, and you want to spend your information-theoretically limited budget of short words on the simplest and most widely useful categories.

It was true that the reason I was continuing to freak out about this to the extent of sending him this obnoxious email telling him what to write (seriously, who does that?!) was because of transgender stuff, but that wasn't why Scott should care.

The other year, Alexander had written a post, "Kolmogorov Complicity and the Parable of Lightning", explaining the consequences of political censorship with an allegory about a Society with the dogma that thunder occurs before lightning.[4] Alexander had explained that the problem with complying with the dictates of a false orthodoxy wasn't the sacred dogma itself (it's not often that you need to directly make use of the fact that lightning comes first), but that the need to defend the sacred dogma destroys everyone's ability to think.

It was the same thing here. It wasn't that I had any practical need to misgender anyone in particular. It still wasn't okay that talking about the reality of biological sex to so-called "rationalists" got you an endless deluge of—polite! charitable! non-ostracism-threatening!—bullshit nitpicking. (What about complete androgen insensitivity syndrome? Why doesn't this ludicrous misinterpretation of what you said imply that lesbians aren't women? &c. ad infinitum.) With enough time, I thought the nitpicks could and should be satisfactorily answered; any remaining would presumably be fatal criticisms rather than bullshit nitpicks. But while I was in the process of continuing to write all that up, I hoped Alexander could see why I felt somewhat gaslighted.

(I had been told by others that I wasn't using the word "gaslighting" correctly. No one seemed to think I had the right to define that category boundary for my convenience.)

If our vaunted rationality techniques resulted in me having to spend dozens of hours patiently explaining why I didn't think that I was a woman (where "not a woman" is a convenient rhetorical shorthand for a much longer statement about naïve Bayes models and high-dimensional configuration spaces and defensible Schelling points for social norms), then our techniques were worse than useless.

If Galileo ever muttered "And yet it moves", there's a long and nuanced conversation you could have about the consequences of using the word "moves" in Galileo's preferred sense, as opposed to some other sense that happens to result in the theory needing more epicycles. It may not have been obvious in November 2014 when "... Not Man for the Categories" was published, but in retrospect, maybe it was a bad idea to build a memetic superweapon that says that the number of epicycles doesn't matter.

The reason to write this as a desperate email plea to Scott Alexander instead of working on my own blog was that I was afraid that marketing is a more powerful force than argument. Rather than good arguments propagating through the population of so-called "rationalists" no matter where they arose, what actually happened was that people like Alexander and Yudkowsky rose to power on the strength of good arguments and entertaining writing (but mostly the latter), and then everyone else absorbed some of their worldview (plus noise and conformity with the local environment). So for people who didn't win the talent lottery but thought they saw a flaw in the zeitgeist, the winning move was "persuade Scott Alexander."

Back in 2010, the rationalist community had a shared understanding that the function of language is to describe reality. Now, we didn't. If Scott didn't want to cite my creepy blog about my creepy fetish, that was fine; I liked getting credit, but the important thing was that this "No, the Emperor isn't naked—oh, well, we're not claiming that he's wearing any garments—it would be pretty weird if we were claiming that!—it's just that utilitarianism implies that the social property of clothedness should be defined this way because to do otherwise would be really mean to people who don't have anything to wear" maneuver needed to die, and he alone could kill it.

Scott didn't get it. We agreed that gender categories based on self-identity, natal sex, and passing each had their own pros and cons, and that it's uninteresting to focus on whether something "really" belongs to a category rather than on communicating what you mean. Scott took this to mean that what convention to use is a pragmatic choice we can make on utilitarian grounds, and that being nice to trans people was worth a little bit of clunkiness—that the mental health benefits to trans people were obviously enough to tip the first-order utilitarian calculus.

I didn't think anything about "mental health benefits to trans people" was obvious. More importantly, I considered myself to be prosecuting not the object-level question of which gender categories to use but the meta-level question of what normative principles govern the use of categories. For this, "whatever, it's a pragmatic choice, just be nice" wasn't an answer, because the normative principles exclude "just be nice" from being a relevant consideration.

"... Not Man for the Categories" had concluded with a section on Emperor Norton, a 19th-century San Francisco resident who declared himself Emperor of the United States. Certainly, it's not difficult or costly for the citizens of San Francisco to address Norton as "Your Majesty". But there's more to being Emperor of the United States than what people call you. Unless we abolish Congress and have the military enforce Norton's decrees, he's not actually emperor—at least not according to the currently generally understood meaning of the word.

What are you going to do if Norton takes you literally? Suppose he says, "I ordered the Imperial Army to invade Canada last week; where are the troop reports? And why do the newspapers keep talking about this so-called 'President' Rutherford B. Hayes? Have this pretender Hayes executed at once and bring his head to me!"

You're not really going to bring him Rutherford B. Hayes's head. So what are you going to tell him? "Oh, well, you're not a cis emperor who can command executions. But don't worry! Trans emperors are emperors"?

To be sure, words can be used in many ways depending on context, but insofar as Norton is interpreting "emperor" in the traditional sense, and you keep calling him your emperor without caveats or disclaimers, you are lying to him.

Scott still didn't get it. But I did soon end up in more conversation with Michael Vassar, Ben Hoffman, and Sarah Constantin, who were game to help me reach out to Yudkowsky again to explain the problem in more detail—and to appeal to the conscience of someone who built their career on higher standards.

Yudkowsky probably didn't think much of Atlas Shrugged (judging by an offhand remark by our protagonist in Harry Potter and the Methods), but I kept thinking of the scene[5] where our heroine, Dagny Taggart, entreats the great Dr. Robert Stadler to denounce an egregiously deceptive but technically-not-lying statement by the State Science Institute, whose legitimacy derives from its association with his name. Stadler has become cynical in his old age and demurs: "I can't help what people think—if they think at all!" ... "How can one deal in truth when one deals with the public?"

At this point, I still trusted Yudkowsky to do better than an Ayn Rand villain; I had faith that Eliezer Yudkowsky could deal in truth when he deals with the public.

(I was wrong.)

If we had this entire posse, I felt bad and guilty and ashamed about focusing too much on my special interest except insofar as it was genuinely a proxy for "Has Eliezer and/or everyone else lost the plot, and if so, how do we get it back?" But the group seemed to agree that my philosophy-of-language grievance was a useful test case.

At times, it felt like my mind shut down with only the thought, "What am I doing? This is absurd. Why am I running around picking fights about the philosophy of language—and worse, with me arguing for the Bad Guys' position? Maybe I'm wrong and should stop making a fool of myself. After all, using Aumann-like reasoning, in a dispute of 'me and Michael Vassar vs. everyone else', wouldn't I want to bet on 'everyone else'?"

Except ... I had been raised back in the 'aughts to believe that you're supposed to concede arguments on the basis of encountering a superior counterargument, and I couldn't actually point to one. "Maybe I'm making a fool out of myself by picking fights with all these high-status people" is not a counterargument.

Anna continued to be disinclined to take a side in the brewing Category War, and it was beginning to put a strain on our friendship, to the extent that I kept ending up crying during our occasional meetings. She said that my "You have to pass my philosophy-of-language litmus test or I lose all respect for you as a rationalist" attitude was psychologically coercive. I agreed—I was even willing to go up to "violent", in the sense that I'd cop to trying to apply social incentives toward an outcome rather than merely exchanging information. But sometimes you need to use violence in defense of self or property. If we thought of the "rationalist" brand name as intellectual property, maybe it was property worth defending, and if so, then "I can define a word any way I want" wasn't an obviously terrible time to start shooting at the bandits.

My hope was that it was possible to apply just enough "What kind of rationalist are you?!" social pressure to cancel out the "You don't want to be a Bad (Red) person, do you??" social pressure and thereby let people look at the arguments—though I wasn't sure if that even works, and I was growing exhausted from all the social aggression I was doing. (If someone tries to take your property and you shoot at them, you could be said to be the "aggressor" in the sense that you fired the first shot, even if you hope that the courts will uphold your property claim later.)

After some more discussion within the me/Michael/Ben/Sarah posse, on 4 January 2019, I wrote to Yudkowsky again (a second time), to explain the specific problems with his "hill of meaning in defense of validity" Twitter performance, since that apparently hadn't been obvious from the earlier link to "... To Make Predictions". I cc'ed the posse, who chimed in afterwards.

Ben explained what kind of actions we were hoping for from Yudkowsky: that he would (1) notice that he'd accidentally been participating in an epistemic war, (2) generalize the insight (if he hadn't noticed, what were the odds that MIRI had adequate defenses?), and (3) join the conversation about how to actually have a rationality community, while noticing this particular way in which the problem seemed harder than it used to. For my case in particular, something that would help would be either (A) a clear ex cathedra statement that gender categories are not an exception to the general rule that categories are nonarbitrary, or (B) a clear ex cathedra statement that he's been silenced on this matter. If even (B) was too politically expensive, that seemed like important evidence about (1).

Without revealing the other side of any private conversation that may or may not have occurred, I can say that we did not get either of those ex cathedra statements at this time.

It was also around this time that our posse picked up a new member, whom I'll call "Riley".

On 5 January 2019, I met with Michael and his associate Aurora Quinn-Elmore in San Francisco to attempt mediated discourse with Ziz and Gwen, who were considering suing the Center for Applied Rationality (CfAR)[6] for discriminating against trans women. Michael hoped to dissuade them from a lawsuit—not because he approved of CfAR's behavior, but because lawyers make everything worse.

Despite our personality and worldview differences, I had had a number of cooperative interactions with Ziz a couple years before. We had argued about the etiology of transsexualism in late 2016. When I sent her some delusional PMs during my February 2017 psychotic break, she came over to my apartment with chocolate ("allegedly good against dementors"), although I wasn't there. I had awarded her $1200 as part of a credit-assignment ritual to compensate the twenty-one people who were most responsible for me successfully navigating my psychological crises of February and April 2017. (The fact that she had been up to argue about trans etiology meant a lot to me.) I had accepted some packages for her at my apartment in mid-2017 when she was preparing to live on a boat and didn't have a mailing address.

At this meeting, Ziz recounted her story of how Anna Salamon (in her capacity as President of CfAR and community leader) allegedly engaged in conceptual warfare to falsely portray Ziz as a predatory male. I was unimpressed: in my worldview, I didn't think Ziz had the right to say "I'm not a man," and expect people to just believe that. (I remember that at one point, Ziz answered a question with, "Because I don't run off masochistic self-doubt like you." I replied, "That's fair.") But I did respect that Ziz actually believed in an intersex brain theory: in Ziz and Gwen's worldview, people's genders were a fact of the matter, not a manipulation of consensus categories to make people happy.

Probably the most ultimately consequential part of this meeting was Michael verbally confirming to Ziz that MIRI had settled with a disgruntled former employee, Louie Helm, who had put up a website slandering them. (I don't know the details of the alleged settlement. I'm working off of Ziz's notes rather than remembering that part of the conversation clearly myself; I don't know what Michael knew.) What was significant was that if MIRI had paid Helm as part of an agreement to get the slanderous website taken down, then (whatever the nonprofit best-practice books might have said about whether this was a wise thing to do when facing a dispute from a former employee) that would decision-theoretically amount to a blackmail payout, which seemed to contradict MIRI's advocacy of timeless decision theories (according to which you shouldn't be the kind of agent that yields to extortion).

Something else Ben had said while chiming in on the second attempt to reach out to Yudkowsky hadn't sat quite right with me.

I am pretty worried that if I actually point out the physical injuries sustained by some of the smartest, clearest-thinking, and kindest people I know in the Rationalist community as a result of this sort of thing, I'll be dismissed as a mean person who wants to make other people feel bad.

I didn't know what he was talking about. My friend "Rebecca"'s 2015 psychiatric imprisonment ("hospitalization") had probably been partially related to her partner's transition and had involved rough handling by the cops. I had been through some Bad Stuff during my psychotic episodes of February and April 2017, but none of it was "physical injuries." What were the other cases, if he could share without telling me Very Secret Secrets With Names?

Ben said that, probabilistically, he expected that some fraction of the trans women he knew who had "voluntarily" had bottom surgery had done so in response to social pressure, even if some of them might well have sought it out in a less weaponized culture.

I said that saying, "I am worried that if I actually point out the physical injuries ..." when the actual example turned out to be sex reassignment surgery seemed dishonest: I had thought he might have more examples of situations like mine or "Rebecca"'s, where gaslighting escalated into more tangible harm in a way that people wouldn't know about by default. In contrast, people already know that bottom surgery is a thing; Ben just had reasons to think it's Actually Bad—reasons that his friends couldn't engage with if we didn't know what he was talking about. It was bad enough that Yudkowsky was being so cagey; if everyone did it, then we were really doomed.

Ben said he was more worried that saying politically loaded things in the wrong order would reduce our chances of getting engagement from Yudkowsky than that someone would share his words out of context in a way that caused him distinct harm. And maybe more than both of those, that saying the wrong keywords would cause his correspondents to talk about him using the wrong keywords, in ways that caused illegible, hard-to-trace damage.

There's a view that assumes that as long as everyone is being cordial, our truthseeking public discussion must be basically on track; the discussion is only being warped by the fear of heresy if someone is overtly calling to burn the heretics.

I do not hold this view. I think there's a subtler failure mode where people know what the politically favored bottom line is, and collude to ignore, nitpick, or just be uninterested in any fact or line of argument that doesn't fit. I want to distinguish between direct ideological conformity enforcement attempts, and people not living up to their usual epistemic standards in response to ideological conformity enforcement.

Especially compared to normal Berkeley, I had to give the Berkeley "rationalists" credit for being very good at free speech norms. (I'm not sure I would be saying this in the possible world where Scott Alexander didn't have a traumatizing experience with social justice in college, causing him to dump a ton of anti-social-justice, pro-argumentative-charity antibodies into the "rationalist" water supply after he became our subculture's premier writer. But it was true in our world.) I didn't want to fall into the bravery-debate trap of, "Look at me, I'm so heroically persecuted, therefore I'm right (therefore you should have sex with me)". I wasn't angry at the "rationalists" for silencing me (which they didn't); I was angry at them for making bad arguments and systematically refusing to engage with the obvious counterarguments.

As an illustrative example, in an argument on Discord in January 2019, I said, "I need the phrase 'actual women' in my expressive vocabulary to talk about the phenomenon where, if transition technology were to improve, then the people we call 'trans women' would want to make use of that technology; I need language that asymmetrically distinguishes between the original thing that already exists without having to try, and the artificial thing that's trying to imitate it to the limits of available technology".

Kelsey Piper replied, "the people getting surgery to have bodies that do 'women' more the way they want are mostly cis women [...] I don't think 'people who'd get surgery to have the ideal female body' cuts anything at the joints."

Another woman said, "'the original thing that already exists without having to try' sounds fake to me" (to the acclaim of four "+1" emoji reactions).

The problem with this kind of exchange is not that anyone is being shouted down, nor that anyone is lying. The problem is that people are motivatedly, "algorithmically" "playing dumb." I wish we had more standard terminology for this phenomenon, which is ubiquitous in human life. By "playing dumb", I don't mean that Kelsey was consciously thinking, "I'm playing dumb in order to gain an advantage in this argument." I don't doubt that, subjectively, mentioning that cis women also get cosmetic surgery felt like a relevant reply. It's just that, in context, I was obviously trying to talk about the natural category of "biological sex", and Kelsey could have figured that out if she had wanted to.

It's not that anyone explicitly said, "Biological sex isn't real" in those words. (The elephant in the brain knew it wouldn't be able to get away with that.) But if everyone correlatedly plays dumb whenever someone tries to talk about sex in clear language in a context where that could conceivably hurt some trans person's feelings, I think what you have is a culture of de facto biological sex denialism. ("'The original thing that already exists without having to try' sounds fake to me"!!) It's not that hard to get people to admit that trans women are different from cis women, but somehow they can't (in public, using words) follow the implication that trans women are different from cis women because trans women are male.

Ben thought I was wrong to see this behavior as non-ostracizing. The deluge of motivated nitpicking is an implied marginalization threat, he explained: the game people were playing when they did that was to force me to choose between doing arbitrarily large amounts of interpretive labor or being cast as never having answered these construed-as-reasonable objections, and therefore over time losing standing to make the claim, being thought of as unreasonable, not getting invited to events, &c.

I saw the dynamic he was pointing at, but as a matter of personality, I was more inclined to respond, "Welp, I guess I need to write faster and more clearly", rather than, "You're dishonestly demanding arbitrarily large amounts of interpretive labor from me." I thought Ben was far too quick to give up on people whom he modeled as trying not to understand, whereas I continued to have faith in the possibility of making them understand if I just didn't give up. Not to play chess with a pigeon (which craps on the board and then struts around like it's won), or wrestle with a pig (which gets you both dirty, and the pig likes it), or dispute what the Tortoise said to Achilles—but to hold out hope that people in "the community" could only be boundedly motivatedly dense, and anyway that giving up wouldn't make me a stronger writer.

(Picture me playing Hermione Granger in a post-Singularity holonovel adaptation of Harry Potter and the Methods of Rationality, Emma Watson having charged me the standard licensing fee to use a copy of her body for the occasion: "We can do anything if we exert arbitrarily large amounts of interpretive labor!")

Ben thought that making them understand was hopeless and that becoming a stronger writer was a boring goal; it would be a better use of my talents to jump up a meta level and explain how people were failing to engage. That is, insofar as I expected arguing to work, I had a model of "the rationalists" that kept making bad predictions. What was going on there? Something interesting might happen if I tried to explain that.

(I guess I'm only now, after spending an additional four years exhausting every possible line of argument, taking Ben's advice on this by finishing and publishing this memoir. Sorry, Ben—and thanks.)

One thing I regret about my behavior during this period was the extent to which I was emotionally dependent on my posse, and in some ways particularly Michael, for validation. I remembered Michael as a high-status community elder back in the Overcoming Bias era (to the extent that there was a "community" in those early days).[7] I had been skeptical of him: the guy makes a lot of stridently "out there" assertions, in a way that makes you assume he must be speaking metaphorically. (He always insists he's being completely literal.) But he had social proof as the President of the Singularity Institute—the "people person" of our world-saving effort, to complement Yudkowsky's antisocial mad scientist personality—which inclined me to take his assertions more charitably than I otherwise would have.

Now, the memory of that social proof was a lifeline. Dear reader, if you've never been in the position of disagreeing with the entire weight of Society's educated opinion, including your idiosyncratic subculture that tells itself a story about being smarter and more open-minded than the surrounding Society—well, it's stressful. There was a comment on the /r/slatestarcodex subreddit around this time that cited Yudkowsky, Alexander, Piper, Ozy Brennan, and Rob Bensinger as leaders of the "rationalist" community. Just an arbitrary Reddit comment of no significance whatsoever—but it was a salient indicator of the zeitgeist to me, because every single one of those people had tried to get away with some variant on the "word usage is subjective, therefore you have no grounds to object to the claim that trans women are women" mind game.

In the face of that juggernaut of received opinion, I was already feeling pretty gaslighted. ("We ... we had a whole Sequence about this. And you were there, and you were there ... It—really happened, right? The hyperlinks still work ...") I don't know how I would have held up intact if I were facing it alone. I definitely wouldn't have had the impudence to pester Alexander and Yudkowsky—especially Yudkowsky—if it was just me against everyone else.

But Michael thought I was in the right—not just intellectually, but morally in the right to be prosecuting the philosophy issue with our leaders. That social proof gave me a lot of bravery that I otherwise wouldn't have been able to muster up—even though it would have been better if I could have internalized that my dependence on him was self-undermining, insofar as Michael himself said that what made me valuable was my ability to think independently.

The social proof was probably more effective in my head than with anyone we were arguing with. I remembered Michael as a high-status community elder back in the Overcoming Bias era, but that had been a long time ago. (Luke Muehlhauser had taken over leadership of the Singularity Institute in 2011, and apparently, some sort of rift between Michael and Eliezer had widened in recent years.) Michael's status in "the community" of 2019 was much more mixed. He was intensely critical of the rise of the Effective Altruism movement, which he saw as using bogus claims about how to do the most good to prey on the smartest and most scrupulous people around. (I remember being at a party in 2015 and asking Michael what else I should spend my San Francisco software engineer money on, if not the EA charities I was considering. I was surprised when his answer was, "You.")

Another blow to Michael's reputation was dealt on 27 February 2019, when Anna published a comment badmouthing Michael and suggesting that talking to him was harmful, which I found disappointing—more so as I began to realize the implications.

I agreed with her point about how "ridicule of obviously-fallacious reasoning plays an important role in discerning which thinkers can (or can't) help" fill the role of vetting and common knowledge creation. That's why I was so heartbroken about the "categories are arbitrary, therefore trans women are women" thing, which deserved to be laughed out of the room. Why was she trying to ostracize the guy who was one of the very few to back me up on this incredibly obvious thing!? The reasons given to discredit Michael seemed weak. (He ... flatters people? He ... didn't tell people to abandon their careers? What?) And the evidence against Michael she offered in private didn't seem much more compelling (e.g., at a CfAR event, he had been insistent on continuing to talk to someone who Anna thought looked near psychosis and needed a break).

It made sense for Anna to not like Michael anymore because of his personal conduct, or because of his opposition to EA. (Expecting all of my friends to be friends with each other would be Geek Social Fallacy #4.) If she didn't want to invite him to CfAR stuff, fine. But what did she gain from publicly denouncing him as someone whose "lies/manipulations can sometimes disrupt [people's] thinking for long and costly periods of time"?! She said she was trying to undo the effects of her previous endorsements of him, and that the comment seemed like it ought to be okay by Michael's standards (which didn't include an expectation that people should collude to protect each other's reputations).

I wasn't the only one whose life was being disrupted by political drama in early 2019. On 22 February, Scott Alexander posted that the /r/slatestarcodex Culture War Thread was being moved to a new non–Slate Star Codex–branded subreddit in the hopes that would curb some of the harassment he had been receiving. Alexander claimed that according to poll data and his own impressions, the Culture War Thread featured a variety of ideologically diverse voices but had nevertheless acquired a reputation as being a hive of right-wing scum and villainy.

Yudkowsky Tweeted:

Your annual reminder that Slate Star Codex is not and never was alt-right, every real stat shows as much, and the primary promoters of this lie are sociopaths who get off on torturing incredibly nice targets like Scott A.

I found Yudkowsky's use of the word "lie" here interesting given his earlier eagerness to police the use of the word "lie" by gender-identity skeptics. With the support of my posse, I wrote to him again, a third time (Subject: "on defending against 'alt-right' categorization").

I said, imagine if one of Alexander's critics were to reply: "Using language in a way you dislike, openly and explicitly and with public focus on the language and its meaning, is not lying. The proposition you claim false (explicit advocacy of a white ethnostate?) is not what the speech is meant to convey—and this is known to everyone involved, it is not a secret. You're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning. Now, maybe as a matter of policy, you want to make a case for language like 'alt-right' being used a certain way. Well, that's a separate debate then. But you're not making a stand for Truth in doing so, and your opponents aren't tricking anyone or trying to."

How would Yudkowsky react if someone said that? My model of the Sequences-era Yudkowsky of 2009 would say, "This is an intellectually dishonest attempt to sneak in connotations by performing a categorization and using an appeal-to-arbitrariness conversation-halter to avoid having to justify it; go read 'A Human's Guide to Words.'"

But I had no idea what the real Yudkowsky of 2019 would say. If the moral of the "hill of meaning in defense of validity" thread had been that the word "lie" should be reserved for per se direct falsehoods, well, what direct falsehood was being asserted by Scott's detractors? I didn't think anyone was claiming that, say, Scott identified as alt-right, any more than anyone was claiming that trans women have two X chromosomes. Commenters on /r/SneerClub had been pretty explicit in their criticism that the Culture War Thread harbored racists (&c.) and possibly that Scott himself was a secret racist, with respect to a definition of racism that included the belief that there exist genetically mediated population differences in the distribution of socially relevant traits and that this probably had decision-relevant consequences that should be discussable somewhere.

And this was correct. For example, Alexander's "The Atomic Bomb Considered As Hungarian High School Science Fair Project" favorably cites Cochran et al.'s genetic theory of Ashkenazi achievement as "really compelling." Scott was almost certainly "guilty" of the category membership that the speech was meant to convey—it's just that Sneer Club got to choose the category. If a machine-learning classifier returns positive on both Scott Alexander and Richard Spencer, the correct response is not that the classifier is "lying" (what would that even mean?) but that the classifier is not very useful for understanding Scott Alexander's effects on the world.

Of course, Scott is great, and it was right that we should defend him from the bastards trying to ruin his reputation, and it was plausible that the most politically convenient way to do that was to pound the table and call them lying sociopaths rather than engaging with the substance of their claims—much as how someone being tried under an unjust law might plead "Not guilty" to save their own skin rather than tell the whole truth and hope for jury nullification.

But, I argued, political convenience came at a dire cost to our common interest. There was a proverb Yudkowsky had once failed to Google, that ran something like, "Once someone is known to be a liar, you might as well listen to the whistling of the wind."

Similarly, once someone is known to vary the epistemic standards of their public statements for political convenience—if they say categorizations can be lies when that happens to help their friends, but seemingly deny the possibility when that happens to make them look good politically ...

Well, you're still better off listening to them than the whistling of the wind, because the wind in various possible worlds is presumably uncorrelated with most of the things you want to know about, whereas clever arguers who don't tell explicit lies are constrained in how much they can mislead you. But it seems plausible that you might as well listen to any other arbitrary smart person with a blue check and 20K Twitter followers. It might be a useful exercise, for Yudkowsky to think of what he would actually say if someone with social power actually did this to him when he was trying to use language to reason about Something he had to Protect?

(Note, my claim here is not that "Pronouns aren't lies" and "Scott Alexander is not a racist" are similarly misinformative. Rather, I'm saying that whether "You're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning" makes sense as a response to "X isn't a Y" shouldn't depend on the specific values of X and Y. Yudkowsky's behavior the other month had made it look like he thought that "You're not standing in defense of truth if ..." was a valid response when, say, X = "Caitlyn Jenner" and Y = "woman." I was saying that whether or not it's a valid response, we should, as a matter of local validity, apply the same standard when X = "Scott Alexander" and Y = "racist.")

Without disclosing any specific content from private conversations that may or may not have happened, I can say that our posse did not get the kind of engagement from Yudkowsky that we were hoping for.

Michael said that it seemed important that, if we thought Yudkowsky wasn't interested, we should have common knowledge among ourselves that we considered him to be choosing to be a cult leader.

I settled on Sara Bareilles's "Gonna Get Over You" as my breakup song with Yudkowsky and the rationalists, often listening to a cover of it on loop to numb the pain. I found the lyrics were readily interpretable as being about my problems, even if Sara Bareilles had a different kind of breakup in mind. ("I tell myself to let the story end"—the story of the rationalists as a world-changing intellectual movement. "And my heart will rest in someone else's hand"—Michael Vassar's. "And I'm not the girl that I intend to be"—self-explanatory.)[8]

Meanwhile, my email thread with Scott started up again. I expressed regret that all the times I had emailed him over the past couple years had been when I was upset about something (like psych hospitals, or—something else) and wanted something from him, treating him as a means rather than an end—and then, despite that regret, I continued prosecuting the argument.

One of Alexander's most popular Less Wrong posts ever had been about the noncentral fallacy, which Alexander called "the worst argument in the world": those who (for example) crow that abortion is murder (because murder is the killing of a human being), or that Martin Luther King, Jr. was a criminal (because he defied the segregation laws of the South), are engaging in a dishonest rhetorical maneuver in which they're trying to trick their audience into assigning attributes of the typical "murder" or "criminal" to what are very noncentral members of those categories.

Even if you're opposed to abortion, or have negative views about the historical legacy of Dr. King, this isn't the right way to argue. If you call Fiona a murderer, that causes me to form a whole bunch of implicit probabilistic expectations on the basis of what the typical "murder" is like—expectations about Fiona's moral character, about the suffering of a victim whose hopes and dreams were cut short, about Fiona's relationship with the law, &c.—most of which get violated when you reveal that the murder victim was an embryo.

In the form of a series of short parables, I tried to point out that Alexander's own "The Worst Argument in the World" is complaining about the same category-gerrymandering move that his "... Not Man for the Categories" comes out in favor of. We would not let someone get away with declaring, "I ought to accept an unexpected abortion or two deep inside the conceptual boundaries of what would normally not be considered murder if it'll save someone's life." Maybe abortion is wrong and relevantly similar to the central sense of "murder", but you need to make that case on the empirical merits, not by linguistic fiat (Subject: "twelve short stories about language").

Scott still didn't get it. He didn't see why he shouldn't accept one unit of categorizational awkwardness in exchange for sufficiently large utilitarian benefits. He made an analogy to some lore from the Glowfic collaborative fiction writing community, a story about orcs who had unwisely sworn an oath to serve the evil god Melkor. Though the orcs intend no harm of their own will, they're magically bound to obey Melkor's commands and serve as his terrible army or else suffer unbearable pain. Our heroine comes up with a solution: she founds a new religion featuring a deist God who also happens to be named "Melkor". She convinces the orcs that since the oath didn't specify which Melkor, they're free to follow her new God instead of evil Melkor, and the magic binding the oath apparently accepts this casuistry if the orcs themselves do.

Scott's attitude toward the new interpretation of the oath in the story was analogous to his thinking about transgenderedness: sure, the new definition may be a little awkward and unnatural, but it's not objectively false, and it made life better for so many orcs. If rationalists should win, then the true rationalist in this story was the one who thought up this clever hack to save an entire species.

I started drafting a long reply—but then I remembered that in recent discussion with my posse, the idea had come up that in-person meetings are better for resolving disagreements. Would Scott be up for meeting in person some weekend? Non-urgent. Ben would be willing to moderate, unless Scott wanted to suggest someone else, or no moderator.

Scott didn't want to meet. I considered resorting to the tool of cheerful prices, which I hadn't yet used against Scott—to say, "That's totally understandable! Would a financial incentive change your decision? For a two-hour meeting, I'd be happy to pay up to $4000 to you or your preferred charity. If you don't want the money, then let's table this. I hope you're having a good day." But that seemed sufficiently psychologically coercive and socially weird that I wasn't sure I wanted to go there. On 18 March, I emailed my posse asking what they thought—and then added that maybe they shouldn't reply until Friday, because it was Monday, and I really needed to focus on my dayjob that week.

This is the part where I began to ... overheat. I tried ("tried") to focus on my dayjob, but I was just so angry. Did Scott really not understand the rationality-relevant distinction between "value-dependent categories as a result of caring about predicting different variables" (as explained by the dagim/water-dwellers vs. fish example in "... Not Man for the Categories") and "value-dependent categories in order to not make my friends sad"? Was he that dumb? Or was it that he was only verbal-smart, and this is the sort of thing that only makes sense if you've ever been good at linear algebra? (Such that the language of "only running your clustering algorithm on the subspace of the configuration space spanned by the variables that are relevant to your decisions" would come naturally.) Did I need to write a post explaining just that one point in mathematical detail, with executable code and a worked example with entropy calculations?
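For concreteness, here is a minimal sketch of the first kind of value-dependence, the legitimate kind. It is not the post I was contemplating writing; the toy animals, the two labeling schemes, and the helper names (`entropy`, `conditional_entropy`) are all made up for illustration. The idea is that a category scheme is graded by how much uncertainty it leaves about the variable you care about predicting, measured as conditional entropy in bits.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: (name, lives_in_water, warm_blooded)
ANIMALS = [
    ("salmon", True, False),
    ("tuna", True, False),
    ("whale", True, True),
    ("dolphin", True, True),
    ("horse", False, True),
    ("mouse", False, True),
    ("lizard", False, False),
    ("snake", False, False),
]

# Two candidate category schemes over the same animals (illustrative labels).
HABITAT_SCHEME = {
    "salmon": "dagim", "tuna": "dagim", "whale": "dagim", "dolphin": "dagim",
    "horse": "land", "mouse": "land", "lizard": "land", "snake": "land",
}
PHYLO_SCHEME = {
    "salmon": "fish", "tuna": "fish",
    "whale": "mammal", "dolphin": "mammal", "horse": "mammal", "mouse": "mammal",
    "lizard": "reptile", "snake": "reptile",
}


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(labels, targets):
    """H(target | label): expected remaining uncertainty after hearing the label."""
    groups = {}
    for label, target in zip(labels, targets):
        groups.setdefault(label, []).append(target)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


names = [name for name, _, _ in ANIMALS]
in_water = [water for _, water, _ in ANIMALS]
warm = [w for _, _, w in ANIMALS]

habitat_labels = [HABITAT_SCHEME[n] for n in names]
phylo_labels = [PHYLO_SCHEME[n] for n in names]

results = {
    ("habitat", "in_water"): conditional_entropy(habitat_labels, in_water),
    ("habitat", "warm_blooded"): conditional_entropy(habitat_labels, warm),
    ("phylo", "in_water"): conditional_entropy(phylo_labels, in_water),
    ("phylo", "warm_blooded"): conditional_entropy(phylo_labels, warm),
}

for (scheme, target), bits in results.items():
    print(f"H({target} | {scheme} label) = {bits:.2f} bits")
```

With this toy data, the habitat scheme leaves 0 bits of uncertainty about whether an animal lives in water and 1 bit about whether it is warm-blooded, while the phylogenetic scheme leaves 0.5 bits and 0 bits respectively. Which scheme is better depends on which variable you want to predict, but in both cases the scheme is being scored on predictive performance. Nothing in the calculation has a term for how anyone feels about the labels.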

My dayjob boss made it clear that he was expecting me to have code for my current Jira tickets by noon the next day, so I deceived myself into thinking I could accomplish that by staying at the office late. Maybe I could have caught up, if it were just a matter of the task being slightly harder than anticipated and I weren't psychologically impaired from being hyper-focused on the religious war. The problem was that focus is worth 30 IQ points, and an IQ 100 person can't do my job.

I was in so much (psychological) pain. Or at least, in one of a series of emails to my posse that night, I felt motivated to type the sentence, "I'm in so much (psychological) pain." I'm never sure how to interpret my own self-reports, because even when I'm really emotionally trashed (crying, shaking, randomly yelling, &c.), I think I'm still noticeably incentivizable: if someone were to present a credible threat (like slapping me and telling me to snap out of it), then I would be able to calm down. There's some sort of game-theory algorithm in the brain that feels subjectively genuine distress (like crying or sending people too many hysterical emails) but only when it can predict that it will be rewarded with sympathy or at least tolerated: tears are a discount on friendship.

I tweeted a Sequences quote (the mention of @ESYudkowsky being to attribute credit, I told myself; I figured Yudkowsky had enough followers that he probably wouldn't see a notification):

"—and if you still have something to protect, so that you MUST keep going, and CANNOT resign and wisely acknowledge the limitations of rationality— [1/3]

"—then you will be ready to start your journey[.] To take sole responsibility, to live without any trustworthy defenses, and to forge a higher Art than the one you were once taught. [2/3]

"No one begins to truly search for the Way until their parents have failed them, their gods are dead, and their tools have shattered in their hand." —@ESYudkowsky (https://www.lesswrong.com/posts/wustx45CPL5rZenuo/no-safe-defense-not-even-science) [end/3]

Only it wasn't quite appropriate. The quote is about failure resulting in the need to invent new methods of rationality, better than the ones you were taught. But the methods I had been taught were great! I didn't have a pressing need to improve on them! I just couldn't cope with everyone else having forgotten!

I did eventually get some dayjob work done that night, but I didn't finish the whole thing my manager wanted done by the next day, and at 4 a.m., I concluded that I needed sleep, the lack of which had historically been very dangerous for me (being the trigger for my 2013 and 2017 psychotic breaks and subsequent psych imprisonments). We really didn't want another outcome like that. There was a couch in the office, and probably another four hours until my coworkers started to arrive. The thing I needed to do was just lie down on the couch in the dark and have faith that sleep would come. Meeting my manager's deadline wasn't that important. When people came in to the office, I might ask for help getting an Uber home? Or help buying melatonin? The important thing was to be calm.

I sent an email explaining this to Scott and my posse and two other friends (Subject: "predictably bad ideas").

Lying down didn't work. So at 5:26 a.m., I sent an email to Scott cc'ing my posse plus Anna about why I was so mad (both senses). I had a better draft sitting on my desktop at home, but since I was here and couldn't sleep, I might as well type this version (Subject: "five impulsive points, hastily written because I just can't even (was: Re: predictably bad ideas)"). Scott had been continuing to insist it's okay to gerrymander category boundaries for trans people's mental health, but there were a few things I didn't understand. If creatively reinterpreting the meanings of words because the natural interpretation would make people sad is okay, why didn't that generalize to an argument in favor of outright lying when the truth would make people sad? The mind games seemed crueler to me than a simple lie. Also, if "mental health benefits for trans people" matter so much, then why didn't my mental health matter? Wasn't I trans, sort of? Getting shut down by appeal-to-utilitarianism when I was trying to use reason to make sense of the world was observably really bad for my sanity!

Also, Scott had asked me if it wouldn't be embarrassing if the community solved Friendly AI and went down in history as the people who created Utopia forever, and I had rejected it because of gender stuff. But the original reason it had ever seemed remotely plausible that we would create Utopia forever wasn't "because we're us, the world-saving good guys," but because we were going to perfect an art of systematically correct reasoning. If we weren't going to do systematically correct reasoning because that would make people sad, then that undermined the reason that it was plausible that we would create Utopia forever.

Also-also, Scott had proposed a super–Outside View of the culture war as an evolutionary process that produces memes optimized to trigger PTSD syndromes and suggested that I think of that as what was happening to me. But, depending on how much credence Scott put in social proof, mightn't the fact that I managed to round up this whole posse to help me repeatedly argue with (or harass) Yudkowsky shift his estimate over whether my concerns had some objective merit that other people could see, too? It could simultaneously be the case that I had culture-war PTSD and my concerns had merit.

Michael replied at 5:58 a.m., saying that everyone's first priority should be making sure that I could sleep—that given that I was failing to adhere to my commitments to sleep almost immediately after making them, I should be interpreted as urgently needing help, and that Scott had comparative advantage in helping, given that my distress was most centrally over Scott gaslighting me, asking me to consider the possibility that I was wrong while visibly not considering the same possibility regarding himself.

That seemed a little harsh on Scott to me. At 6:14 a.m. and 6:21 a.m., I wrote a couple emails to everyone that my plan was to get a train back to my own apartment to sleep, that I was sorry for making such a fuss despite being incentivizable while emotionally distressed, that I should be punished in accordance with the moral law for sending too many hysterical emails because I thought I could get away with it, that I didn't need Scott's help, and that I thought Michael was being a little aggressive about that, but that I guessed that's also kind of Michael's style.

Michael was furious with me. ("What the FUCK Zack!?! Calling now," he emailed me at 6:18 a.m.) I texted and talked with him on my train ride home. He seemed to have a theory that people who are behaving badly, as Scott was, will only change when they see a victim who is being harmed. Me escalating and then immediately deescalating just after Michael came to help was undermining the attempt to force an honest confrontation, such that we could get to the point of having a Society with morality or punishment.

Anyway, I did get to my apartment and sleep for a few hours. One of the other friends I had cc'd on some of the emails, whom I'll call "Meredith", came to visit me later that morning with her 2½-year-old son—I mean, her son at the time.

(Incidentally, the code that I had written intermittently between 11 p.m. and 4 a.m. was a horrible bug-prone mess, and the company has been paying for it ever since.)

At some level, I wanted Scott to know how frustrated I was about his use of "mental health for trans people" as an Absolute Denial Macro. But when Michael started advocating on my behalf, I started to minimize my claims because I had a generalized attitude of not wanting to sell myself as a victim. Ben pointed out that making oneself mentally ill in order to extract political concessions only works if you have a lot of people doing it in a visibly coordinated way—and even if it did work, getting into a dysphoria contest with trans people didn't seem like it led anywhere good.

I supposed that in Michael's worldview, aggression is more honest than passive-aggression. That seemed true, but I was psychologically limited in how much overt aggression I was willing to deploy against my friends. (And particularly Yudkowsky, whom I still hero-worshiped.) But clearly, the tension between "I don't want to do too much social aggression" and "Losing the Category War within the rationalist community is absolutely unacceptable" was causing me to make wildly inconsistent decisions. (Emailing Scott at 4 a.m. and then calling Michael "aggressive" when he came to defend me was just crazy: either one of those things could make sense, but not both.)

Did I just need to accept that there was no such thing as a "rationalist community"? (Sarah had told me as much two years ago while tripsitting me during my psychosis relapse, but I hadn't made the corresponding mental adjustments.)

On the other hand, a possible reason to be attached to the "rationalist" brand name and social identity that wasn't just me being stupid was that the way I talk had been trained really hard on this subculture for ten years. Most of my emails during this whole campaign had contained multiple Sequences or Slate Star Codex links that I could expect the recipients to have read. I could use the phrase "Absolute Denial Macro" in conversation and expect to be understood. If I gave up on the "rationalists" being a thing, and went out into the world to make friends with Quillette readers or arbitrary University of Chicago graduates, then I would lose all that accumulated capital. Here, I had a massive home territory advantage because I could appeal to Yudkowsky's writings about the philosophy of language from ten years ago and people couldn't say, "Eliezer who? He's probably a Bad Man."

The language I spoke was mostly educated American English, but I relied on subculture dialect for a lot. My sister has a chemistry doctorate from MIT (and so speaks the language of STEM intellectuals generally), and when I showed her "... To Make Predictions", she reported finding it somewhat hard to read, likely because I casually use phrases like "thus, an excellent motte" and expect to be understood without the reader taking 10 minutes to read the link. That essay, which was me writing from the heart in the words that came most naturally to me, could not be published in Quillette. The links and phraseology were just too context-bound.

Maybe that's why I felt like I had to stand my ground and fight for the world I was made in, even though the contradiction between the war effort and my general submissiveness had me making crazy decisions.

Michael said that a reason to make a stand here in "the community" was that if we didn't, the beacon of "rationalism" would continue to lure and mislead others, but that more importantly, we needed to figure out how to win this kind of argument decisively, as a group. We couldn't afford to accept a status quo of accepting defeat when faced with bad-faith arguments in general. Ben reported writing to Scott to ask him to alter the beacon so that people like me wouldn't think "the community" was the place to go for the rationality thing anymore.

As it happened, the next day, we saw these Tweets from @ESYudkowsky, linking to a Quillette article interviewing Lisa Littman about her work positing a socially contagious "rapid onset" type of gender dysphoria among young females:

Everything more complicated than protons tends to come in varieties. Hydrogen, for example, has isotopes. Gender dysphoria involves more than one proton and will probably have varieties. https://quillette.com/2019/03/19/an-interview-with-lisa-littman-who-coined-the-term-rapid-onset-gender-dysphoria/

To be clear, I don't know much about gender dysphoria. There's an allegation that people are reluctant to speciate more than one kind of gender dysphoria. To the extent that's not a strawman, I would say only in a generic way that GD seems liable to have more than one species.

(Why now? Maybe he saw the tag in my "tools have shattered" Tweet on Monday, or maybe the Quillette article was just timely?)

The most obvious reading of these Tweets was as a political concession to me. The two-type taxonomy of MtF was the thing I was originally trying to talk about, back in 2016–2017, before getting derailed onto the present philosophy-of-language war, and here Yudkowsky was backing up my side on that.

At this point, some readers might think that this should have been the end of the matter, that I should have been satisfied. I had started the recent drama flare-up because Yudkowsky had Tweeted something unfavorable to my agenda. But now, Yudkowsky was Tweeting something favorable to my agenda! Wouldn't it be greedy and ungrateful for me to keep criticizing him about the pronouns and language thing, given that he'd thrown me a bone here? Shouldn't I call it even?

That's not how it works. The entire concept of "sides" to which one can make "concessions" is an artifact of human coalitional instincts. It's not something that makes sense as a process for constructing a map that reflects the territory. My posse and I were trying to get a clarification about a philosophy-of-language claim Yudkowsky had made a few months prior ("you're not standing in defense of truth if [...]"). Why would we stop prosecuting that because of this unrelated Tweet about the etiology of gender dysphoria? That wasn't the thing we were trying to clarify!

Moreover—and I'm embarrassed that it took me another day to realize this—this new argument from Yudkowsky about the etiology of gender dysphoria was wrong. As I would later get around to explaining in "On the Argumentative Form 'Super-Proton Things Tend to Come in Varieties'", when people claim that some psychological or medical condition "comes in varieties", they're making a substantive empirical claim that the causal or statistical structure of the condition is usefully modeled as distinct clusters, not merely making the trivial observation that instances of the condition are not identical down to the subatomic level.

So we shouldn't think that there are probably multiple kinds of gender dysphoria because things are made of protons. If anything, a priori reasoning about the cognitive function of categorization should actually cut in the other direction, (mildly) against rather than in favor of multi-type theories: you only want to add more categories to your theory if they can pay for their additional complexity with better predictions. If you believe in Blanchard–Bailey–Lawrence's two-type taxonomy of MtF, or Littman's proposed rapid-onset type, it should be on the empirical merits, not because multi-type theories are a priori more likely to be true (which they aren't).

Had Yudkowsky been thinking that maybe if he Tweeted something favorable to my agenda, then I and the rest of Michael's gang would be satisfied and leave him alone?

But if there's some other reason you suspect there might be multiple species of dysphoria, but you tell people your suspicion is because "everything more complicated than protons tends to come in varieties", you're still misinforming people for political reasons, which was the general problem we were trying to alert Yudkowsky to. Inventing fake rationality lessons in response to political pressure is not okay, and the fact that in this case the political pressure happened to be coming from me didn't make it okay.

I asked the posse if this analysis was worth sending to Yudkowsky. Michael said it wasn't worth the digression. He asked if I was comfortable generalizing from Scott's behavior, and what others had said about fear of speaking openly, to assuming that something similar was going on with Eliezer? If so, then now that we had common knowledge, we needed to confront the actual crisis, "that dread is tearing apart old friendships and causing fanatics to betray everything that they ever stood for while its existence is still being denied."

That week, former MIRI researcher Jessica Taylor joined our posse (being at an in-person meeting with Ben and Sarah and another friend on the seventeenth, and getting tagged in subsequent emails). I had met Jessica for the first time in March 2017, shortly after my psychotic break, and I had been part of the group trying to take care of her when she had her own break in late 2017, but other than that, we hadn't been particularly close.

Significantly for political purposes, Jessica is trans. We didn't have to agree up front on all gender issues for her to see the epistemology problem with "... Not Man for the Categories", and to say that maintaining a narcissistic fantasy by controlling category boundaries wasn't what she wanted, as a trans person. (On the seventeenth, when I lamented the state of a world that incentivized us to be political enemies, her response was, "Well, we could talk about it first.") Michael said that Jessica and I together had more moral authority than either of us alone.

As it happened, I ran into Scott on the BART train that Friday, the twenty-second. He said he wasn't sure why the oft-repeated moral of "A Human's Guide to Words" had been "You can't define a word any way you want" rather than "You can define a word any way you want, but then you have to deal with the consequences."

Ultimately, I thought this was a pedagogy decision that Yudkowsky had gotten right back in 2008. If you write your summary slogan in relativist language, people predictably take that as license to believe whatever they want without having to defend it. Whereas if you write your summary slogan in objectivist language—so that people know they don't have social permission to say, "It's subjective, so I can't be wrong"—then you have some hope of sparking useful thought about the exact, precise ways that specific, definite things are relative to other specific, definite things.

I told Scott I would send him one more email with a piece of evidence about how other "rationalists" were thinking about the categories issue and give my commentary on the parable about orcs, and then the present thread would probably drop there.

Concerning what others were thinking: on Discord in January, Kelsey Piper had told me that everyone else experienced their disagreement with me as being about where the joints are and which joints are important, where usability for humans was a legitimate criterion of importance, and it was annoying that I thought they didn't believe in carving reality at the joints at all and that categories should be whatever makes people happy.

I didn't want to bring it up at the time because I was so overjoyed that the discussion was actually making progress on the core philosophy-of-language issue, but Scott did seem to be pretty explicit that his position was about happiness rather than usability? If Kelsey thought she agreed with Scott, but actually didn't, that sham consensus was a bad sign for our collective sanity, wasn't it?

As for the parable about orcs, I thought it was significant that Scott chose to tell the story from the standpoint of non-orcs deciding what verbal behaviors to perform while orcs are around, rather than the standpoint of the orcs themselves. For one thing, how do you know that serving evil-Melkor is a life of constant torture? Is it at all possible that someone has given you misleading information about that?

Moreover, you can't just give an orc a clever misinterpretation of an oath and have them believe it. First you have to cripple their general ability to correctly interpret oaths, for the same reason that you can't get someone to believe that 2+2=5 without crippling their general ability to do arithmetic. We weren't talking about a little "white lie" that the listener will never get to see falsified (like telling someone their dead dog is in heaven); the orcs already know the text of the oath, and you have to break their ability to understand it. Are you willing to permanently damage an orc's ability to reason in order to save them pain? For some sufficiently large amount of pain, surely. But this isn't a choice to make lightly—and the choices people make to satisfy their own consciences don't always line up with the volition of their alleged beneficiaries. We think we can lie to save others from pain, without wanting to be lied to ourselves. But behind the veil of ignorance, it's the same choice!

I also had more to say about philosophy of categories: I thought I could be more rigorous about the difference between "caring about predicting different variables" and "caring about consequences", in a way that Eliezer would have to understand even if Scott didn't. (Scott had claimed that he could use gerrymandered categories and still be just as good at making predictions—but that's just not true if we're talking about the internal use of categories as a cognitive algorithm, rather than mere verbal behavior. It's easy to say "X is a Y" for arbitrary X and Y if the stakes demand it, but that's not the same thing as using that concept of Y internally as part of your world-model.)

But after consultation with the posse, I concluded that further email prosecution was not useful at this time; the philosophy argument would work better as a public Less Wrong post. So my revised Category War to-do list was:

  • Send the brief wrapping-up/end-of-conversation email to Scott (with the Discord anecdote about Kelsey and commentary on the orc story).
  • Mentally write off Scott, Eliezer, and the so-called "rationalist" community as a loss so that I wouldn't be in horrible emotional pain from cognitive dissonance all the time.
  • Write up the mathy version of the categories argument for Less Wrong (which I thought might take a few months—I had a dayjob, and write slowly, and might need to learn some new math, which I'm also slow at).
  • Then email the link to Scott and Eliezer asking for a signal boost and/or court ruling.

Ben didn't think the mathematically precise categories argument was the most important thing for Less Wrong readers to know about: a similarly careful explanation of why I'd written off Scott, Eliezer, and the "rationalists" would be way more valuable.

I could see the value he was pointing at, but something in me balked at the idea of attacking my friends in public (Subject: "treachery, faith, and the great river (was: Re: DRAFTS: 'wrapping up; or, Orc-ham's razor' and 'on the power and efficacy of categories')").

Ben had previously written (in the context of the effective altruism movement) about how holding criticism to a higher standard than praise distorts our collective map. He was obviously correct that this was a distortionary force relative to what ideal Bayesian agents would do, but I was worried that when we're talking about criticism of people rather than ideas, the removal of the distortionary force would just result in social conflict (and not more truth). Criticism of institutions and social systems should be filed under "ideas" rather than "people", but the smaller-scale you get, the harder this distinction is to maintain: criticizing, say, "the Center for Effective Altruism", somehow feels more like criticizing Will MacAskill personally than criticizing "the United States" does, even though neither CEA nor the U.S. is a person.

That was why I couldn't give up faith that honest discourse eventually wins. Under my current strategy and consensus social norms, I could criticize Scott or Kelsey or Ozy's ideas without my social life dissolving into a war of all against all, whereas if I were to give in to the temptation to flip a table and say, "Okay, now I know you guys are just messing with me," then I didn't see how that led anywhere good, even if they really were.

Jessica explained what she saw as the problem with this. What Ben was proposing was creating clarity about behavioral patterns. I was saying that I was afraid that creating such clarity is an attack on someone. But if so, then my blog was an attack on trans people. What was going on here?

Socially, creating clarity about behavioral patterns is construed as an attack and can make things worse for someone. For example, if your livelihood is based on telling a story about you and your flunkies being the only sane truthseeking people in the world, then me demonstrating that you don't care about the truth when it's politically inconvenient is a threat to your marketing story and therefore to your livelihood. As a result, it's easier to create clarity down power gradients than up them: it was easy for me to blow the whistle on trans people's narcissistic delusions, but hard to blow the whistle on Yudkowsky's.[9]

But selectively creating clarity down but not up power gradients just reinforces existing power relations—in the same way that selectively criticizing arguments with politically unfavorable conclusions only reinforces your current political beliefs. I shouldn't be able to get away with claiming that calling non-exclusively-androphilic trans women delusional perverts is okay on the grounds that that which can be destroyed by the truth should be, but that calling out Alexander and Yudkowsky would be unjustified on the grounds of starting a war or whatever. Jessica was on board with a project to tear down narcissistic fantasies in general, but not a project that starts by tearing down trans people's narcissistic fantasies, then emits spurious excuses for not following that effort where it leads.

Somewhat apologetically, I replied that the distinction between truthfully, publicly criticizing group identities and named individuals still seemed important to me?—as did avoiding leaking info from private conversations. I would be more comfortable writing a scathing blog post about the behavior of "rationalists", than about a specific person not adhering to good discourse norms in an email conversation that they had good reason to expect to be private. I thought I was consistent about this; contrast my writing with the way that some anti-trans writers name and shame particular individuals. (The closest I had come was mentioning Danielle Muscato as someone who doesn't pass—and even there, I admitted it was "unclassy" and done out of desperation.) I had to acknowledge that criticism of non-exclusively-androphilic trans women in general implied criticism of Jessica, and criticism of "rationalists" in general implied criticism of Yudkowsky and Alexander and me, but the extra inferential step and "fog of probability" seemed to make the speech act less of an attack. Was I wrong?

Michael said this was importantly backwards: less precise targeting is more violent. If someone said, "Michael Vassar is a terrible person," he would try to be curious, but if they didn't have an argument, he would tend to worry more "for" them and less "about" them, whereas if someone said, "The Jews are terrible people," he saw that as a more serious threat to his safety. (And rationalists and trans women are exactly the sort of people who get targeted by the same people who target Jews.)

Polishing the advanced categories argument from earlier email drafts into a solid Less Wrong post didn't take that long: by 6 April 2019, I had an almost complete draft of the new post, "Where to Draw the Boundaries?", that I was pretty happy with.

The title (note: "boundaries", plural) was a play off of "Where to Draw the Boundary?" (note: "boundary", singular), a post from Yudkowsky's original Sequence on the 37 ways in which words can be wrong. In "... Boundary?", Yudkowsky asserts (without argument, as something that all educated people already know) that dolphins don't form a natural category with fish ("Once upon a time it was thought that the word 'fish' included dolphins [...] Or you could stop playing nitwit games and admit that dolphins don't belong on the fish list"). But Alexander's "... Not Man for the Categories" directly contradicts this, asserting that there's nothing wrong with the biblical Hebrew word dagim encompassing both fish and cetaceans (dolphins and whales). So who's right—Yudkowsky (2008) or Alexander (2014)? Is there a problem with dolphins being "fish", or not?

In "... Boundaries?", I unify the two positions and explain how both Yudkowsky and Alexander have a point: in high-dimensional configuration space, there's a cluster of finned water-dwelling animals in the subspace of the dimensions along which finned water-dwelling animals are similar to each other, and a cluster of mammals in the subspace of the dimensions along which mammals are similar to each other, and dolphins belong to both of them. Which subspace you pay attention to depends on your values: if you don't care about predicting or controlling some particular variable, you have no reason to look for similarity clusters along that dimension.

But given a subspace of interest, the technical criterion of drawing category boundaries around regions of high density in configuration space still applies. There is Law governing which uses of communication signals transmit which information, and the Law can't be brushed off with, "whatever, it's a pragmatic choice, just be nice." I demonstrate the Law with a couple of simple mathematical examples: if you redefine a codeword that originally pointed to one cluster in ℝ³, to also include another, that changes the quantitative predictions you make about an unobserved coordinate given the codeword; if an employer starts giving the title "Vice President" to line workers, that decreases the mutual information between the job title and properties of the job.
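For a concrete feel for the job-title example, here is a minimal sketch in Python. It is not the calculation from the post itself: the company, the headcounts, and the helper name `mutual_information` are all made-up assumptions for illustration. It only shows that handing the "Vice President" title to line workers lowers the mutual information between title and role.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information, in bits, between the two components of (title, role) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    titles = Counter(title for title, _ in pairs)
    roles = Counter(role for _, role in pairs)
    return sum(
        (count / n) * log2((count / n) / ((titles[t] / n) * (roles[r] / n)))
        for (t, r), count in joint.items()
    )


# Hypothetical company (assumed numbers): 10 executives, 90 line workers.
honest = [("Vice President", "executive")] * 10 + [("Associate", "line")] * 90

# Same people, but half of the line workers are now also called "Vice President".
inflated = (
    [("Vice President", "executive")] * 10
    + [("Vice President", "line")] * 45
    + [("Associate", "line")] * 45
)

mi_honest = mutual_information(honest)
mi_inflated = mutual_information(inflated)

print(f"honest titles:   {mi_honest:.3f} bits")
print(f"inflated titles: {mi_inflated:.3f} bits")
```

With honest titles, hearing the title tells you the role exactly, so the mutual information equals the full entropy of the role distribution. After the title is diluted, hearing "Vice President" leaves most of the uncertainty about the role in place, and the number drops accordingly.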

(Jessica and Ben's discussion of the job title example in relation to the Wikipedia summary of Jean Baudrillard's Simulacra and Simulation got published separately and ended up taking on a life of its own in future posts, including a number of posts by other authors.)

Sarah asked if the math wasn't a bit overkill: were the calculations really necessary to make the basic point that good definitions should be about classifying the world, rather than about what's pleasant or politically expedient to say?

I thought the math was important as an appeal to principle—and as intimidation. (As it was written, the tenth virtue is precision! Even if you cannot do the math, knowing that the math exists tells you that the dance step is precise and has no room in it for your whims.)

"... Boundaries?" explains all this in the form of discourse with a hypothetical interlocutor arguing for the I-can-define-a-word-any-way-I-want position. In the hypothetical interlocutor's parts, I wove in verbatim quotes (without attribution) from Alexander ("an alternative categorization system is not an error, and borders are not objectively true or false") and Yudkowsky ("You're not standing in defense of truth if you insist on a word, brought explicitly into question, being used with some particular meaning"; "Using language in a way you dislike is not lying. The propositions you claim false [...] is not what the [...] is meant to convey, and this is known to everyone involved; it is not a secret") and Bensinger ("doesn't unambiguously refer to the thing you're trying to point at").

My thinking here was that the posse's previous email campaigns had been doomed to failure by being too closely linked to the politically contentious object-level topic, which reputable people had strong incentives not to touch with a ten-meter pole. So if I wrote this post just explaining what was wrong with the claims Yudkowsky and Alexander had made about the philosophy of language, with perfectly innocent examples about dolphins and job titles, that would remove the political barrier to Yudkowsky correcting the philosophy of language error. If someone with a threatening social-justicey aura were to say, "Wait, doesn't this contradict what you said about trans people earlier?", the reputable people could stonewall them. (Stonewall them and not me!)

Another reason someone might be reluctant to correct mistakes when pointed out is the fear that such a policy could be abused by motivated nitpickers. It would be pretty annoying to be obligated to churn out an endless stream of trivial corrections by someone motivated to comb through your entire portfolio and point out every little thing you did imperfectly, ever.

I wondered if maybe, in Scott or Eliezer's mental universe, I was a blameworthy (or pitiably mentally ill) nitpicker for flipping out over a blog post from 2014 (!) and some Tweets (!!) from November. I, too, had probably said things that were wrong five years ago.

But I thought I had made a pretty convincing case that a lot of people were making a correctable and important rationality mistake, such that the cost of a correction (about the philosophy of language specifically, not any possible implications for gender politics) would be justified here. As Ben pointed out, if someone had put this much effort into pointing out an error I had made four months or five years ago and making careful arguments for why it was important to get the right answer, I probably would put some serious thought into it.

I could see a case that it was unfair of me to include political subtext and then only expect people to engage with the politically clean text, but if we weren't going to get into full-on gender-politics on Less Wrong (which seemed like a bad idea), but gender politics was motivating an epistemology error, I wasn't sure what else I was supposed to do. I was pretty constrained here!

(I did regret having accidentally poisoned the well the previous month by impulsively sharing "Blegg Mode" as a Less Wrong linkpost. "Blegg Mode" had originally been drafted as part of "... To Make Predictions" before getting spun off as a separate post. Frustrated in March at our failing email campaign, I thought it was politically "clean" enough to belatedly share, but it proved to be insufficiently deniably allegorical, as evidenced by the 60-plus-entry trainwreck of a comments section. It's plausible that some portion of the Less Wrong audience would have been more receptive to "... Boundaries?" if they hadn't been alerted to the political context by the comments on the "Blegg Mode" linkpost.)

On 13 April 2019, I pulled the trigger on publishing "... Boundaries?", and wrote to Yudkowsky again, a fourth time (!), asking if he could either publicly endorse the post, or publicly comment on what he thought the post got right and what he thought it got wrong. I also asked whether, if engaging on this level was too expensive for him in terms of spoons, there was any action I could take to somehow make it less expensive. The reason I thought this was important, I explained, was that if rationalists in good standing find themselves in a persistent disagreement about rationality itself, that seemed like a major concern for our common interest, something we should be eager to definitively settle in public (or at least clarify the current state of the disagreement). In the absence of a rationality court of last resort, I feared the closest thing we had was an appeal to Eliezer Yudkowsky's personal judgment. Despite the context in which the dispute arose, this wasn't a political issue. The post I was asking for his comment on was just about the mathematical laws governing how to talk about, e.g., dolphins. We had nothing to be afraid of here. (Subject: "movement to clarity; or, rationality court filing").

I got some pushback from Ben and Jessica about claiming that this wasn't "political". What I meant by that was to emphasize (again) that I didn't expect Yudkowsky or "the community" to take a public stance on gender politics. Rather, I was trying to get "us" to take a stance in favor of the kind of epistemology that we were doing in 2008. It turns out that epistemology has implications for gender politics that are unsafe, but that's more inferential steps. And I guess I didn't expect the sort of people who would punish good epistemology to follow the inferential steps?

Anyway, again without revealing any content from the other side of any private conversations that may or may not have occurred, we did not get any public engagement from Yudkowsky.

It seemed that the Category War was over, and we lost.

We lost?! How could we lose?! The philosophy here was clear-cut. This shouldn't be hard or expensive to clear up. I could believe that Alexander was "honestly" confused, but Yudkowsky?

I could see how, under ordinary circumstances, asking Yudkowsky to weigh in on my post would be inappropriately demanding of a Very Important Person's time, given that an ordinary programmer such as me was surely as a mere worm in the presence of the great Eliezer Yudkowsky. (I would have humbly given up much sooner if I hadn't gotten social proof from Michael and Ben and Sarah and "Riley" and Jessica.)

But the only reason for my post to exist was because it would be even more inappropriately demanding to ask for a clarification in the original gender-political context. The economist Thomas Schelling (of "Schelling point" fame) once wrote about the use of clever excuses to help one's negotiating counterparty release themself from a prior commitment: "One must seek [...] a rationalization by which to deny oneself too great a reward from the opponent's concession, otherwise the concession will not be made."[10] This is what I was trying to do when soliciting—begging for—engagement or endorsement of "... Boundaries?" By making the post be about dolphins, I was trying to deny myself too great of a reward on the gender-politics front. I don't think it was inappropriately demanding to expect "us" (him) to be correct about the cognitive function of categorization. I was trying to be as accommodating as I could, short of just letting him (us?) be wrong.

I would have expected him to see why we had to make a stand here, where the principles of reasoning that made it possible for words to be assigned interpretations at all were under threat.

A hill of validity in defense of meaning.

Maybe that's not how politics works? Could it be that, somehow, the mob-punishment mechanisms that weren't smart enough to understand the concept of "bad argument (categories are arbitrary) for a true conclusion (trans people are OK)", were smart enough to connect the dots between my broader agenda and my abstract philosophy argument, such that VIPs didn't think they could endorse my philosophy argument, without it being construed as an endorsement of me and my detailed heresies?

Jessica mentioned talking with someone about me writing to Yudkowsky and Alexander about the category boundary issue. This person described having a sense that I should have known it wouldn't work—because of the politics involved, not because I wasn't right. I thought Jessica's takeaway was poignant:

Those who are savvy in high-corruption equilibria maintain the delusion that high corruption is common knowledge, to justify expropriating those who naively don't play along, by narratizing them as already knowing and therefore intentionally attacking people, rather than being lied to and confused.

Should I have known that it wouldn't work? Didn't I "already know", at some level?

I guess in retrospect, the outcome does seem kind of obvious—that it should have been possible to predict in advance, and to make the corresponding update without so much fuss and wasting so many people's time.

But it's only "obvious" if you take as a given that Yudkowsky is playing a savvy Kolmogorov complicity strategy like any other public intellectual in the current year.

Maybe this seems banal if you haven't spent your entire adult life in his robot cult. From anyone else in the world, I wouldn't have had a problem with the "hill of validity in defense of meaning" thread—I would have respected it as a solidly above-average philosophy performance before setting the bozo bit on the author and getting on with my day. But since I did spend my entire adult life in Yudkowsky's robot cult, trusting him the way a Catholic trusts the Pope, I had to assume that it was an "honest mistake" in his rationality lessons, and that honest mistakes could be honestly corrected if someone put in the effort to explain the problem. The idea that Eliezer Yudkowsky was going to behave just as badly as any other public intellectual in the current year was not really in my hypothesis space.

Ben shared the account of our posse's email campaign with someone who commented that I had "sacrificed all hope of success in favor of maintaining his own sanity by CC'ing you guys." That is, if I had been brave enough to confront Yudkowsky by myself, maybe there was some hope of him seeing that the game he was playing was wrong. But because I was so cowardly as to need social proof (because I believed that an ordinary programmer such as me was as a mere worm in the presence of the great Eliezer Yudkowsky), it probably just looked to him like an illegible social plot originating from Michael.

One might wonder why this was such a big deal to us. Okay, so Yudkowsky had prevaricated about his own philosophy of language for political reasons, and he couldn't be moved to clarify even after we spent an enormous amount of effort trying to explain the problem. So what? Aren't people wrong on the internet all the time?

This wasn't just anyone being wrong on the internet. In an essay on the development of cultural traditions, Scott Alexander had written that rationalism is the belief that Eliezer Yudkowsky is the rightful caliph. To no small extent, I and many other people had built our lives around a story that portrayed Yudkowsky as almost uniquely sane—a story that put MIRI, CfAR, and the "rationalist community" at the center of the universe, the ultimate fate of the cosmos resting on our individual and collective mastery of the hidden Bayesian structure of cognition.

But my posse and I had just falsified to our satisfaction the claim that Yudkowsky was currently sane in the relevant way. Maybe he didn't think he had done anything wrong (because he hadn't strictly lied), and probably a normal person would think we were making a fuss about nothing, but as far as we were concerned, the formerly rightful caliph had relinquished his legitimacy. A so-called "rationalist" community that couldn't clarify this matter of the cognitive function of categories was a sham. Something had to change if we wanted a place in the world for the spirit of "naïve" (rather than politically savvy) inquiry to survive.

(To be continued. Yudkowsky would eventually clarify his position on the philosophy of categorization in September 2020—but the story leading up to that will have to wait for another day.)

  1. Similarly, in automobile races, you want rules to enforce that all competitors have the same type of car, for some commonsense operationalization of "the same type", because a race between a sports car and a moped would be mostly measuring who has the sports car, rather than who's the better racer. ↩︎

  2. And in the case of sports, the facts are so lopsided that if we must find humor in the matter, it really goes the other way. A few years later, Lia Thomas would dominate an NCAA women's swim meet by finishing 4.2 standard deviations (!!) earlier than the median competitor, and Eliezer Yudkowsky feels obligated to pretend not to see the problem? You've got to admit, that's a little bit funny. ↩︎

  3. Despite my misgivings, this blog was still published under a pseudonym at the time; it would have been hypocritical of me to accuse someone of cowardice about what they're willing to attach their real name to. ↩︎

  4. The title was a pun referencing computer scientist Scott Aaronson's post advocating "The Kolmogorov Option", serving the cause of Truth by cultivating a bubble that focuses on specific truths that won't get you in trouble with the local political authorities. Named after the Soviet mathematician Andrey Kolmogorov, who knew better than to pick fights he couldn't win. ↩︎

  5. In Part One, Chapter VII, "The Exploiters and the Exploited". ↩︎

  6. CfAR had been spun off from MIRI in 2012 as a dedicated organization for teaching rationality. ↩︎

  7. Yudkowsky's Sequences (except the last) had originally been published on Overcoming Bias before the creation of Less Wrong in early 2009. ↩︎

  8. In general, I'm proud of my careful choices of breakup songs. For another example, my breakup song with institutionalized schooling was Taylor Swift's "We Are Never Ever Getting Back Together", a bitter renunciation of an on-again-off-again relationship ("I remember when we broke up / The first time") with an ex who was distant and condescending ("And you, would hide away and find your peace of mind / With some indie record that's much cooler than mine"), thematically reminiscent of my ultimately degree-less string of bad relationships with UC Santa Cruz (2006–2007), Heald College (2008), Diablo Valley College (2010–2012), and San Francisco State University (2012–2013).

    The fact that I've invested so much symbolic significance in carefully-chosen songs by female vocalists to mourn relationships with abstract perceived institutional authorities, and conspicuously not for any relationships with actual women, maybe tells you something about how my life has gone. ↩︎

  9. Probably a lot of other people who lived in Berkeley would find it harder to criticize trans people than to criticize some privileged white guy named Yudkowski or whatever. But those weren't the relevant power gradients in my world. ↩︎

  10. The Strategy of Conflict, Ch. 2, "An Essay on Bargaining" ↩︎


So, Zack, I agree with most of your takes on the object level issues here. At the same time, the amount of motivated reasoning and dishonesty you attribute to this community and Yudkowsky in particular as a result of these passing comments seems comically exaggerated. Personally, I cannot recall any discussion of gender or transgenderism on LessWrong, or from major LessWrong contributors outside of LessWrong, except for yours. A few tweets from Eliezer asking to address trans people as they wish does not substantiate to me this sky-is-falling level of panic about community epistemics you seem to have.

Whether the sky is falling depends on how high the sky was in the past, and whether that's worth panicking over depends on your utility function over sky height? (That's the short version. The long version is another 70,000 words over four posts.)

I do not think this is a matter of merely having held Eliezer in less esteem. There is something to be said about how LessWrong developed a cult of personality around Eliezer, but rather than an objection to the cult of personality per se, what your posts are is a criticism of Eliezer for not living up to the standards of his personality cult, with small notes in passing about how unhealthy your reverence to him was.

The long version is another 70,000 words over four posts.

Criticisms of particular people or groups that long tend to be nebulous and pathological rather than based in some reasonable concern. I hope you will understand if I am too skeptical to read the whole thing, if it cannot be summarized into something concrete.

Thanks. It sounds like you should regard my sky-is-falling level of panic as unsubstantiated until I come back to you with a summary at the end, at which time you can reëvaluate the question of whether I was correct to panic.

Responding to Zack's comment here in a new thread since the other thread went in a different direction.

The thing me and my allies were hoping for a "court ruling" on was not about who should or shouldn't be held in high regard, but about the philosophical claim that one "ought to accept an unexpected [X] or two deep inside the conceptual boundaries of what would normally be considered [Y] if [positive consequence]". (I think this is false.) That's what really matters, not who we should or shouldn't hold in high regard.

I found this a helpful crisp summary of the actual thing you want. (I realize you're not done with this blogseries yet, and probably the blogseries is serving multiple purposes, but insofar as what-you-wanted hasn't happened, I think writing a post at the end that succinctly spells out the things you want that you don't feel like you've gotten yet would probably be worthwhile)

A thing I'm still somewhat fuzzy on is whether you think this court ruling is about "on LessWrong / in truth-focused contexts/communities", "worldwide in all contexts" (or something in-between), and insofar as you think it's "worldwide in all contexts", if you think this has the same degree of cr...

Thanks for your patience. For the most part, I try to be reluctant to issue proclamations about what other people "should" do, as I explain in "I Don't Do Policy". ("Should" claims can be decomposed into conditional predictions about consequences given actions, and preferences over consequences. The conditional predictions can be evaluated on their merits, and I don't control other people's preferences.) In particular, I don't think there's One True social gender convention.

The "court ruling" thing was an unusual case where, you know, I had been under the impression that this subculture with Eliezer Yudkowsky as its acknowledged leader was serious about being "spec'd into prioritizing truth/mapmaking". It really seemed like the kind of thing that shouldn't be hard to clear up—that he would perceive an interest in clearing up, after the problem had been explained.

My position at the time was, "Scott should retract it and we should all agree he should retract it". Since that didn't happen despite absurd efforts, my conclusion is more that there is no "we". It would be nice if there were a subculture that had spec'd into prioritizing truth/mapmaking, but our little cult/patronage-networ...

Nod, thanks.

Okay, rewriting this to check my understanding, you're saying:

  • In a rationalist community that was actively pretty successful at being good at mapmaking, more people would have proactively noticed that that particular line of Scott's was false. The fact that this didn't happen is evidence about the state of how much one should trust (and have trusted) the rationality community to live up to its marketing material.
  • But, rather than being primarily interested in actually prosecuting that case, at this point you think it's more important to drive home the point "we're already living in the world where the community failed to notice that this was a rationality test and they failed", and that this has implications for how people should think of "the rationality community" (or lack thereof)

I didn't quite understand the second clause until you spelled it out just now, thanks.

Overall I am still more focused on "actually live up to the marketing hype" because, well, I actually think we... just need good enough epistemics to handle high stakes decisions with unclear technical underpinnings and political motivations. I'd want to get the Real Thing whether or not I previously believed ...

There never was.

and that it's actually immoral not to believe in psychological sex differences given that psychological sex differences are actually real

Perhaps the archetypal psychological sex difference that people have argued about is "women are emotional, men are rational".

After reading your previous post where you quoted Deirdre McCloskey's memoir, I started reading the memoir a bit too, and it actually provides a neat example of this psychological sex difference.

There was a period where McCloskey's wife got all emotional and started complaining about McCloskey spending too much money on the phone bill. McCloskey very rationally pointed out that it was very cheap compared to e.g. therapy or hobbies. Clean example of female irrationality, right?

Of course, if one looks at the extended context, the picture is very different; Deirdre had initially assured the wife that it was just crossdressing and nothing else, and they had agreed with each other to put the crossdressing into the background, but now Deirdre was seriously considering transitioning, yet insisting that things like beard shaving were just crossdressing and nothing more. Essentially, from the beginning McCloskey took rational conversa...

(Let's not forget James Damore's memo, which cited research on greater female neuroticism as a justification for ignoring women's issues with their workplace.)

I don't think that's true, and if anything it looks to be the opposite.  Original document; the relevant quotes about neuroticism and what to do about it seem to be:

Personality differences

  • Neuroticism (higher anxiety, lower stress tolerance).
    ○ This may contribute to the higher levels of anxiety women report on Googlegeist and to the lower number of women in high stress jobs.


Non-discriminatory ways to reduce the gender gap


  • Women on average are more prone to anxiety
    • Make tech and leadership less stressful.  Google already partly does this with its many stress reduction courses and benefits.

Maybe I'm misunderstanding how the Googlegeist works. At my workplace, we regularly have surveys where we get asked about how we feel about various things. But if we report negative feelings, we get asked for suggestions about what is wrong.

The way I had imagined the situation is, someone working with the Googlegeist had noticed that a lot of women reported anxiety or whatever, and had decided they need to work with women to figure out what's going on here, to solve it. And then James Damore felt that this was one instance of people looking at a disparity and claiming injustice, and that since he finds it biologically inevitable that women would be anxious, this shouldn't be treated as indicative of an external problem, but instead should be medicalized and treated psychologically (or psychiatrically?).

But I admit I haven't looked much into it so maybe the above model is wrong; originally when the Damore memo came out, I supported him, and it's only later I've been thinking that maybe I shouldn't have supported him. But I haven't had much chance to talk with people about it. Gotta go sleep.

The way I had imagined the situation is, someone working with the Googlegeist had noticed that a lot of women reported anxiety or whatever, and had decided they need to work with women to figure out what's going on here, to solve it. And then James Damore felt that this was one instance of people looking at a disparity and claiming injustice, and that since he finds it biologically inevitable that women would be anxious, this shouldn't be treated as indicative of an external problem, but instead should be medicalized and treated psychologically (or psychiatrically?). [italics added]

As a side note, I consider the italicized part a rather weighty accusation.  I think one should therefore be careful about making such an accusation.  I guess, in this case, you were just honestly reporting the contents of your brain on the matter, not necessarily making an accusation.

Still, I think this to some extent illustrates an epistemic environment where it's normal to throw around damaging accusations whose truth value is somewhere between "extremely uncharitable interpretation" and "objectively false".  Precisely the type that got Damore fired, in other words.  Do we have suc...

In more detail, my background is I used to subscribe to research into psychological differences between the sexes and the races, with a major influence in my views being Scott Alexander (though there's also a whole backstory to how I got into this).

I eventually started doing my own empirical research into transgender topics, and found Blanchardianism/autogynephilia theory to give the strongest effect sizes.

And as I was doing this, I was learning more about how to perform this sort of research; psychometrics, causal inference, psychology, etc. Over time, I got a feeling for what sorts of research questions are fruitful, what sort of methods and critiques are valid, and what sorts of dynamics and distinctions should be paid attention to.

But I also started getting a feeling for how the researchers into differential psychology operate. Here's a classical example; an IQ researcher who is so focused on providing a counternarrative to motivational theories that he uses methods which are heavily downwards biased to "prove" that IQ test scores don't depend on effort. Or Simon Baron-Cohen playing Motte-Bailey with the "extreme male brain" theory of autism.

More abstractly, what I've generall...

I'll address this first:

More abstractly, what I've generally noticed is:

  • These sorts of people are not very interested in actually developing substantive theory or testing their claims in strong ways which might disprove them.
  • Instead they are mainly interested in providing a counternarrative to progressive theories.
  • They often use superficial or invalid psychometric methods.
  • They often make insinuations that they have some deep theory or deep studies, but really actually don't.

These things are bad, but, apart from point 2, I would ask: how do they compare to the average quality of social science research?  Do you have high standards, or do you just have high standards for one group?  I think most of us spend at least some time in environments where the incentive gradients point towards the latter.  Beware isolated demands for rigor.

Research quality being what it is, I would recommend against giving absolute trust to anyone, even if they appear to have earned it.  If there's a result you really care about, it's good to pick at least one study and dig into exactly what they did, and to see if there are other replications; and the prior probability of "fraud" probably...

Quick update! I found that OpenPsychometrics has a dataset for the EQ/SQ tests. Unfortunately, there seems to be a problem for the data with the EQ items, but I just ran a factor analysis for the SQ items to take a closer look at your claims here.

There appeared to be 3 or 4 factors underlying the correlations on the SQ test, which I'd roughly call "Technical interests", "Nature interests", "Social difficulties" and "Jockyness". I grabbed the top loading items for each of the factors, and got this correlation matrix:

The correlations between the technical interests and nature interests plausibly reflect the notion that Systematizing is a thing, though I suspect that it could also be found to correlate with all sorts of other things that would not be considered Systematizing? Like non-Systematizing ways of interacting with nature. Idk though.

The sex differences in the items were limited to the technical interests, rather than also covering the nature interests. This does not fit a simple model of a sex difference in general Systematizing, but it does fit a model where the items are biased towards men but there is not much sex difference in general Systematizing.

I would be inclined to think that the Social difficulties items correlate negatively with Empathizing Quotient or positively with Autism Spectrum Quotient. If we are interested in the correlations between general Systematizing and these other factors, then this could bias the comparisons. On the other hand, the Social difficulties items were not very strongly correlated with the overall SQ score, so maybe not.

I can't immediately think of any comments for the Jockyness items.

Overall, I strongly respect the fact that he made many of the items very concrete, but I now also feel like I have proven the gender differences on Systematizing to be driven by psychometric shenanigans, and I strongly expect to find that many of the other associations are also driven by psychometric shenanigans. I've

Hm, actually I semi-retract this; the OpenPsychometrics data seems to be based on the original Systematizing Quotient, whereas there seems to be a newer one called Systematizing Quotient-Revised, which is supposedly more gender-neutral. Not sure where I can get data on this, though. Will go looking. Edit: Like I am still pretty suspicious about the SQ-R. I just don't have explicit proof that it is flawed.

Am I gonna have to collect the data myself? I might have to collect the data myself...

Oops, upon reading more about the SQ, I should correct myself: Some of the items, such as S16, are "filler items" which are not counted as part of the score; these are disproportionately part of the "Social difficulties" and "Jockyness" factors, so that probably reduces the amount of bias that can be introduced by those items, and it also explains why they don't correlate very much with the overall SQ scores. But some of the items for these factors, such as S31, are not filler items, and instead get counted for the test, presumably because they have cross-loadings on the Systematizing factor. So the induced bias is probably not zero. If I get the data from OpenPsychometrics, I will investigate in more detail.

Since I don't have data on the EQ, here's a study where someone else worked with it. They found that the EQ had three factors, which they named "Cognitive Empathy", "Emotional Empathy" and "Social Skills". The male-female difference was driven by "Emotional Empathy" (d=1), whereas the autistic-allistic difference was driven by "Social Skills" (d=1.3). The converse differences were much smaller, 0.24 and 0.66. As such, it seems likely that the EQ lumps together two different kinds of "empathizing", one of which is feminine and one of which is allistic.
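
For anyone who wants to run this kind of subscale comparison on their own data: the d values quoted above are standardized mean differences (Cohen's d). Here is a minimal Python sketch of computing d separately per subscale. The scores and the names `cohens_d`, `emotional` and `social` are made up for illustration; they are not the study's data or code.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference, using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical subscale scores (invented for illustration only).
emotional = {"women": [6, 7, 8, 7, 6, 8], "men": [5, 6, 7, 6, 5, 7]}
social = {"women": [5, 6, 7, 6, 5, 7], "men": [5, 6, 6, 6, 5, 7]}

d_emotional = cohens_d(emotional["women"], emotional["men"])
d_social = cohens_d(social["women"], social["men"])

print(f"emotional subscale d = {d_emotional:.2f}")
print(f"social subscale d = {d_social:.2f}")
```

Reporting one d per subscale, rather than a single d for the summed score, is what makes it possible to see whether a group difference is concentrated in one subscale.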

I should also say, in the context of IQ and effort, some of the true dispute is about whether effort differences can explain race differences in scores. And for that purpose, what I would do is to go more directly into that. In fact, I have done so. Quoting some discussion I had on Discord: (This was way after I became critical of differential psychology btw. Around 2 months ago.)

I don't know for sure as I am only familiar with certain subsets of social science, but a lot of it is in fact bad. I also often criticize normal social science, but in this context it was this specific area of social science that came up.

I would try to perform studies that yield much more detailed information. For instance, mixed qualitative and quantitative studies where one qualitatively inspects the data points that are above-average or below-average for the regressions, to see whether there are identifiable missing factors.

If he had phrased his results purely as disproving the importance of incentives, rather than effort, I think it would have been fine.

I prefer to think of it as "if you increase your effort from being one of the lowest-effort people to being one of the highest-effort people, you can increase your IQ score by 17 IQ points". This doesn't seem too implausible to me, though admittedly I'm not 100% sure what the lowest-effort people are doing.

It's valid to say that extrapolating outside of the tested range is dubious, but IMO this means that the study design is bad.

I think it's likely that the limited returns to effort would be reflected in the limited bounds of the scale. So I don't think my position is in tension with the intuition that there's limits on what effort can do for you. Under this model, it is also worth noting that the effort scores were negatively skewed, so this implies that lack of effort is a bigger cause of low scores than extraordinary effort is of high scores.

I don't think my results are statistically significantly different from 0.3ish; in the ensuing discussion, people pointed out that the IV results had huge error bounds (because the original study was only barely significant). But also if there is measurement error in the instrument (effort), then that would induce an upwards bias in the IV estimated effect. So that might also contribute.

Shitty replications of shitty environmentalist research is still shitty

I mean, I agree that this is obviously a thing, but I continue to maintain hope in the possibility of actually reasoning about sex differences in the physical universe, rather than being resigned to living in the world of warring narratives.

I think the named effect sizes help? (I try to be clear about saying "d ≈ 0.6", not "Men are from Mars.")

I absolutely agree that it's critical to recognize that it's often both. (I'm less sure about how often it's "neither". What stops people from converging?)

I agree. I think also a lot of it is just down to doing better research, including better psychometrics and more qualitative investigations. But what I mean is that the research programs such as people-things and similar that I have seen so far are not good enough, and are not attempting to become good enough sufficiently well that you can expect to just fund them and wait for results. Not because it is fundamentally impossible, but because the challenges are not taken seriously enough.

I think if due to priors, the sides have reasons to distrust each other, but the distrust leads to ignoring each other instead of leading to conflict, both sides can remain honest and rational (rather than degenerating into dishonesty due to conflict) while not realizing that the other sides are honest and rational, and so end up not converging?

I don't get what point you're making here.

It's an example of a trans activist who, when asked whether people who want to coordinate sex-based discrimination should be purged from the discourse and authoritative sources, was like "yeah they just sound like mean busybodies to me". Admittedly I didn't really go into detail (partly because I don't really have any strong examples that I support and want to argue about), so we don't really know whether Pervocracy supports it in all relevant cases.

Seems like a lot of the asserted "failure to cut reality at its joints" is about trans people before they have started transitioning or before they pass.

I nominate the term "aspiring woman" for people who are biologically male, perceived as men, but who desire to be perceived as women. (That's what I used to call myself, but people were confused why I didn't just call myself a woman, so instead I resorted to various long-winded explanations, and then various inaccurate nonbinary labels that required a tumblr account or university education to understand, and people were still confused, and then I gave up and just identified as a woman.)

Much like "aspiring rationalist" the term is technically correct while still vibes-implying that you are sorta the thing you aspire to but sorta not, or not yet at least.

> and people were still confused, and then I gave up and just identified as a woman

D: i know those feels. that's kinda where i am lately. (except man instead of woman)

Minor feedback – I got ~2/3 through, took a break, came back to the post and had trouble finding my place. If there were section headers that formed a table of contents it'd be somewhat easier to get back into it. (though I'm not sure how intentional it was to not have headers, or whether you consider that doing some kinda important stylistic work)

Yudkowsky couldn’t be bothered to either live up to his own stated standards

"his own stated standards" could use a link/citation.

regardless of the initial intent, scrupulous rationalists were paying rent to something claiming moral authority, which had no concrete specific plan to do anything other than run out the clock, maintaining a facsimile of dialogue in ways well-calibrated to continue to generate revenue.

The original Kolmogorov complicity was an instance of lying to protect one's intellectual endeavors. But here you/Ben seem to be accusing Eliezer of doing something much worse, and which seems like a big leap from what came before it in the post. How did you/Ben rule out the Kolmogorov complicity hypothesis (i.e., that Eliezer still had genuine intellectual or altruistic aims that he wanted to protect)?

Of what you wrote specifically, "no concrete specific plan" is in my view actually a point in Eliezer's favor, as it's a natural consequence of high alignment difficulty and intellectual honesty. "Run out the clock" hardly seems fair, and by "maintaining a facsimile of dialogue" what are you referring to? Are you including things like the 2021 MIRI Conversations and if...

which seems like a big leap from what came before it in the post

Sorry, the fifth- to second-to-last paragraphs of the originally published version of this post were egregiously terrible writing on my part. (I was summarizing some things Ben said at the time that felt like a relevant part of the story, but what I actually needed to do was explain in my own words the points that I want to endorsedly convey to my readers.)

I've rewritten that passage (now the third- and second-to-last paragraphs). I hope this version is clearer.

I'm not conjecturing anything worse than Kolmogorov complicity. (And the 2021 MIRI conversations were great.) I do think political censorship is significantly more damaging to epistemic conditions than many others seem to. People playing a Kolmogorov complicity strategy typically seem to think that it's cheap to just avoid a few sensitive topics. But the disturbing thing about the events described in this post was that the distortion didn't stay confined to sensitive topics: the reversal (in emphasis and practice, if not outright logical contradiction) from "words can be wrong" to "you're not standing in defense of truth [...]" is about the cognitive function...

Scott took this to mean that what convention to use is a pragmatic choice we can make on utilitarian grounds, and that being nice to trans people was worth a little bit of clunkiness—that the mental health benefits to trans people were obviously enough to tip the first-order utilitarian calculus.

I didn't think anything about "mental health benefits to trans people" was obvious.

There's a scottpost that seems relevant called Be Nice, At Least Until You Can Coordinate Meanness. It's not a perfect fit because the thing we are coordinating here isn't meanness. But maybe there are attractor states of mental health, and a mental health local maximum for a trans person is for their pronouns to be respected, but there's also a global maximum where we decide to define "man" and "woman" on biological terms like we define "male" and "female" and treat gender dysphoria in some way other than gender affirmation. Perhaps it could be true that if we could coordinate this new paradigm, the mental health of trans people would in the long term be better, because they can live with treated gender dysphoria without transitioning; but in the current paradigm, where [let's say] a woman is someone who ident...

Sorry, but I just wasn't able to read the whole thing carefully, so I might be missing your relevant writing; I apologize if this comment retreads old ground.

It seems to me like the reasonable thing to do in this situation is:

  • Make whatever categories in your map you would be inclined to make in order to make good predictions. For example, personally I have a sort of "trans women" category based on the handful of trans women I have known reasonably well, which is closer to the "man" category than to the "woman" category, but has some somewhat distinct tra
...

I don't think "non-straightforward or dishonest language" enters into it very much, but I don't have the clusters you have. I know cis women with "male-pattern" personalities and interests and trans women with "female-pattern" personalities and interests. (Not really any cis men with "female-pattern" personalities and interests, but society does its best to ensure that doesn't happen.) In some online spaces where I don't share demographic information, people sometimes take me for a member of the opposite sex. "Male-pattern" and "female-pattern" are culture- and class-bound anyway - there are many different types of guy. I don't get much use out of categorizing people by biological sex.

In repeated interpersonal interactions, of course, you just construct a model of the person, and then you don't need the categories so much. You still have to figure out who uses which bathroom, but the "you" here unpacks to "the state", which sees in its own way - a low-resolution way that can't be said to track truth.

Unless you're prepared to reject the entire analytic tradition, categories aren't even real - they're abstractions over entities. Maybe some are more useful than others, but if you recognize "trans woman" as a third gender (surely a more useful categorization than "trans women are men"[1]), how many genders are there? Are "nerd" and "jock" genders? "Butch" and "femme"?

[1] If this seems surprising to you, remember that LW and the social strata it recruits from contain highly atypical men! For example: what percentage of the male LW userbase knows the basic rules of a major spectator sport?

I said, "I need the phrase 'actual women' in my expressive vocabulary to talk about the phenomenon where, if transition technology were to improve, then the people we call 'trans women' would want to make use of that technology; I need language that asymmetrically distinguishes between the original thing that already exists without having to try, and the artificial thing that's trying to imitate it to the limits of available technology".

Kelsey Piper replied, "the people getting surgery to have bodies that do 'women' more the way they want are mostly cis wo

...
I mean, men also have to put in effort to perform masculinity, or be seen as being inadequate men; I don't think this is a gendered thing. But even a man that isn't "performing masculinity adequately", an inadequate man, like an inadequate woman, is still a distinct category, and though transwomen, like born women, aim to perform femininity, transwomen have a higher distance to cross and in doing so traverse between clusters along several dimensions. I think we can meaningfully separate "perform effort to transition in adequacy" from "perform effort to transition in cluster", even if the goal is the same. (From what I gather of VRChat, the ideal man is also a pretty girl that is overall human with the exception of cat ears and a tail...)

Sarah asked if the math wasn't a bit overkill: were the calculations really necessary to make the basic point that good definitions should be about classifying the world, rather than about what's pleasant or politically expedient to say?

I thought the math was important as an appeal to principle—and as intimidation. (As it was written, the tenth virtue is precision! Even if you cannot do the math, knowing that the math exists tells you that the dance step is precise and has no room in it for your whims.)

FYI I found the math pretty valuable (even though I di...

Some people I usually respect for their willingness to publicly die on a hill of facts, now seem to be talking as if pronouns are facts, or as if who uses what bathroom is necessarily a factual statement about chromosomes. Come on, you know the distinction better than that!

Even if somebody went around saying, "I demand you call me 'she' and furthermore I claim to have two X chromosomes!", which none of my trans colleagues have ever said to me by the way, it still isn't a question-of-empirical-fact whether she should be called "she". It's an act.

In saying t

...
Eli Tyre (8d)
But, he's not claiming that this is the crux of contemporary trans-rights debates? He's pointing out the distinction between facts and policy mainly because he has a particular interest in epistemology, not because he has a particular interest in the trans-rights debates. There's an active debate, which he's mostly not very interested in. But one sub-thread of that debate is some folks making what he considers to be an ontological error, which he points out, because he cares about that class of error, separately from the rest of the context.
Eli Tyre (8d)
I don't get it. He's explicitly stating that he's not commenting on that situation? But that means that we should take his thread here as implicitly commenting on that situation? I think I must be missing the point, because my summary here seems too uncharitable to be right.

I did not make any negative updates about Scott, Kelsey, Anna or Eliezer, based on the accusations of gaslighting, motivated reasoning, insanity, corruption, etc. in this post. I don't know any of them personally, but I hold them all in high regard based on their public work, and nothing Zack has written has changed my view.

(This is just my own opinion and reaction after reading / skimming >40k words, stated without argument or explanation or intent to engage further, to get it out there as something people can react to or agree / disagree vote on. Agreement votes on LW are a far cry from a "court of rationality", but they might help others weigh whether responding further is worthwhile.)

Life is not graded on a curve and you can update yourself incrementally? I also hold all of those people in high regard—relative to the rest of the world. (And Anna is a personal friend.) I think all of those people hold me in high regard, relative to the rest of the world. Nevertheless, it seems like there ought to be a time and a place to talk about people having been culpably wrong about some things, even while the same people have also done a lot of things right? (I think I apply this symmetrically; if someone wants to write a 22,000 word blog post about the ways in which my intellectual conduct failed to live up to standards, that's fine with me.) The thing me and my allies were hoping for a "court ruling" on was not about who should or shouldn't be held in high regard, but about the philosophical claim that one "ought to accept an unexpected [X] or two deep inside the conceptual boundaries of what would normally be considered [Y] if [positive consequence]". (I think this is false.) That's what really matters, not who we should or shouldn't hold in high regard. People's reputations only come into it because of considerations like, if you think Scott is right about the philosophical claim, that should be to his credit and to my detriment, and vice versa. The reason I'm telling this Whole Dumb Story about people instead of just making object-level arguments about ideas, is that I tried making object-level arguments about ideas first, for seven years, and it wasn't very effective. At some point, I think it makes sense to jump up a meta level and try to reason about people to figure out why the ideas aren't landing.
Max H (7mo)
I understand what you're trying to do. "Hold in high regard" is maybe the wrong choice of phrase since it connotes something more status-y than I intended; what I'm really saying is more general: this post failed to convince me of anything in particular, at any level of meta. I'm not inclined to wade into the actual arguments or explain why I feel that way, but I commented anyway since this post does impugn various people's internal motivations and reputations pretty directly, and because if my sentiment is widely shared, that might inform others' decision about whether to respond or engage themselves. I realize this might be frustrating and come across as intellectually rude or lazy to you or anyone who has a different assessment. Blunt / harsh / vague feedback still seemed better than no feedback, and I wanted to gauge sentiment and offer others a low-effort way to give slightly more nuanced feedback than a vote on the top-level post.

Sure, I'm generally fine with you or others doing this (about anyone), but I can't imagine many people wanting to, and I'm hypothesizing above that the ROI would be low.

I think that's a very good reason to go meta! And I sympathize deeply with the feeling of having your object-level ideas ignored or misunderstood no matter how many times you explain. But I also think reasoning about the internal motivations / sanity failures / rationality of others is very hard to do responsibly, and even harder to do correctly.

So, more blunt feedback: I think you have mostly succeeded in responsibly presenting your case. This post is a bit rambly, but otherwise clearly written and extremely detailed and... I wouldn't exactly call it "objective", but I mostly trust that it is an honest and accurate depiction of the key facts, and that you are not close to impinging on any rights or expectations of privacy. And I have no problem with going meta and reasoning about others' internal motivations and thoughts when done responsibly. It's just that th
Would you do it for $40? I can do PayPal or mail you a check. Sorry, I know that's not very much money. I'm low-balling this one partially because I'm trying to be more conservative with money since I'm not dayjobbing right now, and partially because I don't really expect you to have any good arguments (and I feel OK about throwing away $40 to call your bluff, but I would feel bad spending more than that on lousy arguments). Usually when people have good arguments and are motivated to comment at all, they use the comment to explain the arguments. So far, you've spent 475 words telling me nothing of substance except that you disagree, somehow, about something. This is not a promising sign. I don't know why you're grouping these together. Blunt is great. Harsh is great. Vague is useless. The reason that blunt is great and harsh is great is because detailed, intellectually substantive feedback phrased in a blunt or harsh manner can be evaluated on its intellectual merits. The merits can't be faked—or at least, I think I'm pretty good at distinguishing between good and bad criticisms. The reason vague is useless is because vague feedback is trivially faked. Anyone can just say "this post failed to convince me of anything in particular, at any level of meta" or "the reasoning also has to be correct, and I claim [...] that isn't the case here." Why should I trust you? How do I know you're not bluffing? If I got something wrong, I want to know about it! (I probably got lots of things wrong; humans aren't good at writing 22,000 word documents without making any mistakes, and the Iceman/Said interaction already makes me think the last five paragraphs need a rewrite.) But I'm not a mind-reader: in order for me to become less wrong, someone does, actually, have to tell me what I got wrong, specifically. Seems like an easy way to earn forty bucks if you ask me!

Accepted, but I want to register that I am responding because you have successfully exerted social pressure, not because of any monetary incentive. I don't mind the offer / ask (or the emotional appeals / explanations), but in the future I would prefer that you (or anyone) make such offers via PM.

Semi-relatedly, on cheerful prices, you wrote:

But that seemed sufficiently psychologically coercive and socially weird that I wasn't sure I wanted to go there.

I don't think there's anything weird or coercive about offering to pay someone to respond or engage. But there are often preconditions that must be met between the buyer and seller for a particular cheerful price transaction to clear. For the seller to feel cheerful about a transaction at any price, the seller might need to be reasonably confident that (a) the buyer understands what they are buying (b) the buyer will not later regret having transacted (c) the seller will not later regret having transacted.

This requires a fair amount of trust between the buyer and seller, the buyer to have enough self-knowledge and stability, and for the seller to know enough about the buyer (or perform some due diligence) to be reasonably confide...


Thanks!! Receipt details (to your selected charity) in PM yesterday.

Or ...

(Another part of the problem here might be that I think the privacy norms let me report on my psychological speculations about named individuals, but not all of the evidence that supports them, which might have come from private conversations.)

Okay, so Yudkowsky had prevaricated about his own philosophy of language

[shaky premise]

But it's not a premise being introduced suddenly out of nowhere; it's a conclusion argued for at length earlier in the piece. Prevaricate, meaning, "To shift or turn from direct speech or behaviour [...] to waffle or be (intentionally) ambiguous." "His own philosophy of language", meaning, that he wrote a 30,000 word Sequence elaborating on 37 ways in which words can be wrong, including #30, "Your definition draws a boundary around things that don't really belong together."

When an author who wrote 30,000 words in 2008 pounding home over and over and over again that "words can be wrong", then turns around in 2018 and says that "maybe as a matter of policy, you want to make a case for language being used a certain way [...] [b]ut you're not making a stand for Truth...

Martin Randall (7mo)
I read Yudkowsky as asserting:

1. A very high estimate of his own intelligence, e.g. comparable to Feynman and Hofstadter.
2. A very high estimate of the value of intelligence in general, e.g. sufficient to take over the world and tile the lightcone using only an internet connection.
3. A very high estimate of the damage caused on Earth by insufficient intelligence, e.g. human extinction.

In the fictional world of Dath Ilan where Yudkowsky is the median inhabitant, Yudkowsky says they are on track to solve alignment (~95% confidence). Whereas in the actual world he says we are on track to go extinct (~100% confidence). Causing human extinction would be criminal if done knowingly, so this satisfies Zack's claim as written.

I'm leaving this light on links, because I'm not sure what of the above you might object to. I realize that you had many other objections to Zack's framing, but I thought this could be something to drill into.

Edit: I'm not offering any money to respond, and independently of that, it's 100% fine if you don't want to respond.

I mostly take issue with the phrases "criminally insane", "causing enormous damage", and "he, almost uniquely, was not" connoting a more unusual and more actionable view than Eliezer (or almost anyone else) actually holds or would agree with.

Lots of people in the world are doing things that are straightforwardly non-optimal, often not even in their own narrow self-interest. This is mostly just the mistake side of conflict vs. mistake theory though, which seems relatively uncontroversial, at least on LW.

Eliezer has pointed out some of those mistakes in the context of AGI and other areas, but so have many others (Scott Alexander, Zack himself, etc.), in a variety of areas (housing policy, education policy, economics, etc.). Such explanations often come (implicitly or explicitly) with a call for others to change their behavior if they accept such arguments, but Eliezer doesn't seem particularly strident in making such calls, compared to e.g. ordinary politicians, public policy advocates, or other rationalists.

Note, I'm not claiming that Eliezer does not hold some object-level views considered weird or extreme by most, e.g. that sufficiently intelligent AGI could take over the world, o...

Martin Randall (7mo)
I would summarize this as saying:

1. Zack (et al) are exaggerating how unusual/extreme/weird Yudkowsky's positions are.
2. Zack (et al) are exaggerating how much Yudkowsky's writings are an explicit call to action.
3. To the extent that Yudkowsky has unusual positions and calls for actions, you think he's mostly correct on the merits.

Of these, I'd like to push on (1) a bit. However, I think this would probably work better as a new top-level post (working title "Yudkowsky on Yudkowsky"). To give a flavor, though, and because I'm quite likely to fail to write the top-level post, here's an example.

Shah and Yudkowsky on alignment failures.

I encourage you to follow the link to the rest of the conversation, which relates this to alignment work. So we have this phenomenon where one factor in humanity going extinct is that people don't listen enough to Yudkowsky and his almost unique ability to speak concretely. This also supports (2) above - this isn't an explicit call to action, he's just observing a phenomenon.

A (1) take here is that the quote is cherry-picked, a joke, or an outlier, and his overall work implies a more modest self-assessment. A (3) take is that he really is almost uniquely able to speak concretely. My take (4) is that his self-assessment is positively biased. I interpret Zack's "break-up" with Yudkowsky in the opening post as moving from a (3) model to a (4) model, and encouraging others to do the same.
Said Achmiz (7mo)
It would hardly have been effective for Zack to make the offer via PM! In essence, you’re asking for Zack (or anyone) to act ineffectively, in order that you may avoid the inconvenience of having to publicly defend your claims against public disapprobation!
Max H (7mo)
Financial incentives are ineffective if offered privately? That's perhaps true for me personally at the level Zack is offering, but seems obviously false in general. Offering money in private is maybe less effective than exerting social pressure in public (via publicly offering financial incentives, or other means). I merely pointed out that the two are entangled here, and that the pressure aspect is the one that actually motivates me in this case. I request that future such incentives be applied in a more disentangled way, but I'm not asking Zack to refrain from applying social pressure OR from offering financial incentives, just asking that those methods be explicitly disentangled. Zack is of course not obliged to comply with this request, but if he does not do so, I will continue flagging my actual motivations explicitly.
Said Achmiz (7mo)
The financial incentive was clearly ineffective in this case, when offered publicly, so this is a red herring. (Really, who would’ve expected otherwise? $40, for the average Less Wrong reader? That’s a nominal amount, no more.) No, what was effective was the social pressure—as you say! Disentangling these things as you describe would reduce the force of the social pressure, however.
Max H (7mo)
I probably would have also responded if Zack had sent his comment verbatim as a PM. Maybe not as quickly or in exactly the same way, e.g. I wouldn't have included the digression about incentives. But anyway, I did in fact respond, so I don't think it's valid to conclude much about what would have been "clearly ineffective" in a counterfactual. One other point that you seem to be missing is that it's possible to exert social pressure via private channels, with or without financial incentives (and I'm also fine with Zack or others trying this, in general). Private might even be more effective at eliciting a response, in some cases.

In retrospect, I feel guilty about impulsively mixing the "cheerful price" mechanism and the "social pressure" mechanism. I suspect Said is right that the gimmick of the former added to the "punch" of the latter, but at the terrible cost of undermining the integrity of the former (it's supposed to be cheerful!). I apologize for that.

Martin Randall (7mo)
To highlight something Zack said (three times!): I didn't make any negative updates, but I don't consider myself a mere worm. If any reader does consider themselves a mere worm, then maybe they should update appropriately. And possibly play "Gonna Get Over You" on loop.

Also, Scott had asked me if it wouldn't be embarrassing if the community solved Friendly AI and went down in history as the people who created Utopia forever, and I had rejected it because of gender stuff.

This is a confusing line, it flags for me. Insofar as there was a local political lie, then it seems quite high-integrity to not lie and say you believe it, even in exchange for political power. This is the precise question that one should not say yes to — why care about the truth if you can instead have power?

This also makes me wonder somewhat if the description of events here is inaccurate and Scott would not characterize what he said that way.

Scott's position was that there wasn't a political lie, because using a different category definition isn't lying. My position is that "using a different category definition isn't lying" is itself a political distortion (setting aside as uninteresting whether it's technically "lying"), a sufficiently egregious one as to invalidate the legitimacy of "the community". Scott was urging me to not be so quick to give up on the community just because I was (in his view) triggered by culture war material. Specifically, in a 17 March 2019 1:37 a.m. email, I had written: In his reply of 17 March 2019 at 3:54 a.m., Scott quoted that passage and said: (There's more, but I'm not sure it's appropriate to dump the whole email in this comment.)
Ben Pace (7mo)
That is an endearing response, and does change my understanding of what he meant — rather, Scott is saying that just because you are losing one political issue you care about doesn't mean you should quit on a community with lots of other great things going for it.  (I am personally confused about the exact lines for sharing 1-1 text exchanges publicly and I wouldn't personally have shared it without permission in your shoes. I don't mean by this to say I think you necessarily shouldn't have.)

I think one key point that's missing in this otherwise very thorough post is that there is a larger context and historical progression of word definitions and categories about sex, sexuality, and gender. Those categories worked in many ways but were imperfect in others. Creating new words makes sense in principle, like fish vs dag, but it's even harder to get people to adopt new words consistently than it is to change definitions. In any case, this makes any attempt to alter those definitions and categories a matter of public discourse, in which you get ab...

Admittedly I skimmed large portions of that, but I'd like to take a crack at bridging some of that inferential distance with a short description of the model I've been using, whereby I keep all the concerns you brought up straight but also don't have to choke on pronouns.

Categories of Men and Women are useful in a wide variety of areas and point at a real thing. There's a region in the middle where these categories overlap and lack clean boundaries - while both genetics and birth sex are undeniable and straightforward fact in almost all cases (~98% IIRC), ...

Sounds like a rough experience. Hope you're feeling better these days.

I can see why you might have an allergic reaction; however, your criticisms feel a bit overstated and don't quite hit the nail on the head, at least from my perspective.

I agree with Scott et al. regarding the validity of the language game where we include transwomen as women. I recommend reading at least a bit of Wittgenstein, if you've never done this.

This said, you are correct to suspect that there is something funky going on, where people are insisting on the use of this la...

I would love to read your "twelve short stories about language" if you feel like publishing them.

Sure; it's not worth adapting into a post because I've already made most of these points elsewhere, but I can put up the email as an ancillary page.

Yoav Ravid (7mo)

It's not exactly the point of your story, but...

Probably the most ultimately consequential part of this meeting was Michael verbally confirming to Ziz that MIRI had settled with a disgruntled former employee, Louie Helm, who had put up a website slandering them.

Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren't just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren't at least partially true...or if someone were to go digging, they'd find things even ...

Wait, that actually happened? Louie Helm really was behind MIRICult? The accusations weren't just...Ziz being Ziz? And presumably Louie got paid out since why would you pay for silence if the accusations weren't at least partially true...or if someone were to go digging, they'd find things even more damning?

Louie Helm was behind MIRICult (I think as a result of some dispute where he asked for his job back after he had left MIRI and MIRI didn't want to give him his job back). As far as I can piece together from talking to people, he did not get paid out, but there was a threat of a lawsuit which probably cost him a bunch of money in lawyers, and it was settled by both parties signing an NDA (which IMO was a dumb choice on MIRI's part since the NDA has made it much harder to clear things up here). 

Overall I am quite confident that he didn't end up with more money than he started with after the whole miricult thing. Also, I don't think the accusations are "at least partially true". Like it's not the case that literally every sentence of the miricult page is false, but basically all the salacious claims are completely made up.

So, I started off with the idea that Ziz's claims about MIRI were frankly crazy...because Ziz was pretty clearly crazy (see their entire theory of hemispheres, "collapse the timeline," etc.) so I marked most of their claims as delusions or manipulations and moved on, especially since their recounting of other events on the page where they talked about miricult (which is linked in OP) comes off as completely unhinged.

But Zack confirming this meeting happened and vaguely confirming its contents completely changes all the probabilities. I now need to go back and recalculate a ton of likelihoods here starting from "this node with Vassar saying this event happened."

From Ziz's page:

LessWrong dev Oliver Habryka said it would be inappropriate for me to post about this on LessWrong, the community’s central hub website that mostly made it. Suggested me saying this was defamation.

It's obviously not defamation since Ziz believes it's true.

<insert list of rationality community platforms I’ve been banned from for revealing the statutory rape coverup by blackmail payout with misappropriated donor funds and whistleblower silencing, and Gwen as well for protesting that fact.>

Inasmuch a...

It's obviously not defamation since Ziz believes it's true.

We're veering dangerously close to dramaposting here, but just FYI habryka has already contested that they ever said this. I would like to know if the ban accusations are true, though.

Can confirm that I don't believe I said anything about defamation, and in general continue to think that libel suits are really quite bad and do not think they are an appropriate tool in almost any circumstance.

We banned some of them for three months when they kept spamming the CFAR AMA a while ago: https://www.lesswrong.com/posts/96N8BT9tJvybLbn5z/we-run-the-center-for-applied-rationality-ama?commentId=5W86zzFy48WiLcSg6

I don't think we ever took any other moderation action, though I would likely ban them again, since like, I really don't want them around on LessWrong and they have far surpassed thresholds for acceptable behavior.

I would not ban anyone writing up details of the miricult stuff (including false accusations, and relatively strong emotions). Indeed somewhat recently I wrote like 3-5 pages of content here on a private Facebook thread with a lot of rationality community people on it. I would be up for someone extracting the parts that seem shareable more broadly. Seems good to finally have something more central and public.

The lack of comment from Eliezer and other MIRI personnel had actually convinced me in particular that the claims were true. This is the first I heard that there's any kind of NDA preventing them from talking about it.

The lack of comment from Eliezer and other MIRI personnel had actually convinced me in particular that the claims were true. This is the first I heard that there's any kind of NDA preventing them from talking about it.

I think this means you had incorrect priors (about how often legal cases conclude with settlements containing nondisparagement agreements.)

They can presumably confirm whether or not there is a nondisparagement agreement and whether that is preventing them from commenting, though, right?

You can confirm this if you're aware that it's a possibility, and interpret carefully-phrased refusals to comment in a way that's informed by reasonable priors. You should not assume that anyone is able to directly tell you that an agreement exists.

Why not? Is it common for NDAs/non-disparagement agreements to also have a clause stating the parties aren’t allowed to tell anyone about it? I’ve never heard of this outside of super-injunctions, which seem a pretty separate thing.

Absolutely common.  Most non-disparagement agreements are paired with non-disclosure agreements (or clauses in the non-disparagement wording) that prohibit talking about the agreement, as much as talking about the forbidden topics.  

It's pretty obvious to lawyers that "I would like to say this, but I have a legal agreement that I won't" is equivalent, in many cases, to saying it outright.

my boilerplate severance agreement at a job included an NDA that couldn't be acknowledged (I negotiated to change this).
David Hornbein (7mo)
"he didn't end up with more money than he started with after the whole miricult thing" is such a weirdly specific way to phrase things. My speculation from this is that MIRI paid Helm or his lawyers some money, but less money than Helm had spent on the harassment campaign, and among people who know the facts there is a semantic disagreement about whether this constitutes a "payout". Some people say something like "it's a financial loss for Helm, so game-theoretically it doesn't provide an incentive to blackmail, therefore it's fine" and others say something like "if you pay out money in response to blackmail, that's a blackmail payout, you don't get to move the bar like that". I would appreciate it if someone who knows what happened can confirm or deny this. (AFAICT the only other possibility is that somewhere along the line, at least one of the various sources of contradictory-sounding rumors was just lying-or-so-careless-as-to-be-effectively-lying. Which is very possible, of course, that happens with rumors a lot.)
I sadly don't know the answer to this. To open up the set of possibilities further, I have heard rumors that maybe Louie was demanding some donations back he had given MIRI previously, and if that happened, that might also complicate the definition of a "payout".  I don't understand the logic of this. Does seem like game-theoretically the net-payout is really what matters. What would be the argument for something else mattering?

I don't understand the logic of this. Does seem like game-theoretically the net-payout is really what matters. What would be the argument for something else mattering?


BEORNWULF: A messenger from the besiegers!

WIGMUND: Send him away. We have nothing to discuss with the norsemen while we are at war.

AELFRED: We might as well hear them out. This siege is deadly dull. Norseman, deliver your message, and then leave so that we may discuss our reply.

MESSENGER: Sigurd bids me say that if you give us two thirds of the gold in your treasury, our army will depart. He reminds you that if this siege goes on, you will lose the harvest, and this will cost you more dearly than the gold he demands.

The messenger exits.

AELFRED: Ah. Well, I can’t blame him for trying. But no, certainly not.

BEORNWULF: Hold on, I know what you’re thinking, but this actually makes sense. When Sigurd’s army first showed up, I was the first to argue against paying him off. After all, if we’d paid right at the start, then he would’ve made a profit on the attack, and it would only encourage more. But the siege has been long and hard for us both. If we accept this deal *now*, he’ll take a net loss. We’ve spent most of th...

Eli Tyre (8d)
This was fantastic, and you should post it as a top level post.
Said Achmiz (7mo)
Suppose that Alice blackmails me and I pay her $1,000,000. Alice has spent $1,500,000 on lawyers in the process of extracting this payout from me. The result of this interaction is that I have lost $1,000,000, while Alice has lost $500,000. (Alice’s lawyers have made a lot of money, of course.)

Bob hears about this. He correctly realizes that I am blackmailable. He talks to his lawyer, and they sign a contract whereby the lawyer gets half of any payout that they’re able to extract from me. Bob blackmails me and I pay him $1,000,000. Bob keeps $500,000, and his lawyer gets the other $500,000. Now I have again lost $1,000,000, while Bob has gained $500,000. (How might this happen? Well, Bob’s lawyer is better than Alice’s lawyers were. Bob’s also more savvy, and knows how to find a good lawyer, how to negotiate a good contract, etc.)

That is: once the fact that you’re blackmailable is known, the net payout (taking into account expenditures needed to extract it from you) is not relevant, because those expenditures cannot be expected to hold constant—because they can be optimized. And the fact that (as is now a known fact) money can be extracted from you by blackmail, is the incentive to optimize them.
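
For readers who want to check the arithmetic in the Alice and Bob example, here is a minimal sketch. The `Attempt` class and its field names are hypothetical, introduced only for illustration; the dollar figures are the ones given in the comment, and Bob's cost is assumed to be exactly the half of the payout that goes to his lawyer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One blackmail attempt, seen from both sides (hypothetical helper)."""
    name: str
    payout: int           # dollars the victim hands over
    extraction_cost: int  # dollars the blackmailer spends to get the payout

    @property
    def blackmailer_net(self) -> int:
        # What the blackmailer walks away with after paying their own costs.
        return self.payout - self.extraction_cost

    @property
    def victim_net(self) -> int:
        # The victim's loss does not depend on the blackmailer's costs.
        return -self.payout


# Alice spends $1.5M on lawyers; Bob's lawyer takes half of the $1M payout
# (assumption: that contingency share is Bob's only cost).
alice = Attempt("Alice", payout=1_000_000, extraction_cost=1_500_000)
bob = Attempt("Bob", payout=1_000_000, extraction_cost=1_000_000 // 2)

for a in (alice, bob):
    print(f"{a.name}: blackmailer net {a.blackmailer_net:+,}, "
          f"victim net {a.victim_net:+,}")
```

The victim's column is the same in both rows; only the blackmailer's cost changes, and that cost is the one quantity the victim does not control.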
Note that a lawyer who participated in that would be committing a crime. In the case of LH, there was (by my unreliable secondhand understanding) an employment-contract dispute and a blackmail scheme happening concurrently. The lawyers would have been involved only in the employment-contract dispute, not in the blackmail, and any settlement reached would have nominally been only for dropping the employment-contract-related claims. An ordinary employment dispute is a common-enough thing that each side's lawyers would have experience estimating the other side's costs at each stage of litigation, and using those estimates as part of a settlement negotiation. (Filing lawsuits without merit is sometimes analogized to blackmail, but US law defines blackmail much more narrowly, in such a way that asking for payment to not allege statutory rape on a website is blackmail, but asking for payment to not allege unfair dismissal in a civil court is not.)
-7 · Said Achmiz · 7mo
Sure, I agree that this is true. But as long as you run a policy that is sensitive to your counterparty optimizing expenditures, I think this no longer holds? Like, I think in general a policy I have for stuff like this is something like "ensure the costs to my counterparty were higher than their gains", and then take actions appropriate to the circumstances. This seems like it wouldn't allow for the kind of thing you describe above (and also seems like the most natural strategy for me in blackmail cases like this).
0 · Said Achmiz · 7mo
What would this look like…? It doesn’t seem to me to be the sort of thing which it’s at all feasible to do in practice. Indeed it’s hard to see what this would even mean; if the end result is that you pay out sometimes and refuse other times, all that happens is that external observers conclude “he pays out sometimes”, and keep blackmailing you. Actions like what? Like, let’s say that you’re MIRI and you’re being blackmailed. You don’t know how much your blackmailer is paying his lawyers (why would you, after all?). What do you do? And for all you know, the contract your blackmailer’s got with his lawyers might be as I described—lawyers get some percent of payout, and nothing if there’s no payout. What costs do you impose on the blackmailer? In short, I think the policy you describe is usually impossible to implement in practice.

----------------------------------------

But note that this is all tangential. It’s only relevant to the original question (about MIRI) if you claim that MIRI were attempting to implement a policy such as you describe. Do you claim this? If so, have you any evidence?
I mean, the policy here really doesn't seem very hard. If you do know how much your opposing party is paying their lawyers, you optimize that hard. If you don't know, you make some conservative estimate. I've run policies like this in lots of different circumstances, and it's also pretty close to common sense as a response to blackmail and threats. I've asked some MIRI people this exact question and they gave me this answer, with pretty strong confidence and relatively large error margins.
0 · Said Achmiz · 7mo
I have to admit that I still haven’t the faintest clue what concrete behavior you’re actually suggesting. I repeat my questions: “What would this look like…?” and “Actions like what?” (Indeed, since—as I understand it—you say you’ve done this sort of thing, can you give concrete examples from those experiences?) Alright, and what has this looked like in practice for MIRI…?
It means you sit down, you make some Fermi estimates of how much benefit the counterparty could be deriving from this threat/blackmail, then you figure out what you would need to do to roughly net out to zero, then you do those things. If someone asks you what your policy is, you give this summary. In every specific instance this looks different. Sometimes this means you reach out to people they know and let them know about the blackmailing in a way that would damage their reputation. Sometimes it means you threaten to escalate to a legal battle where you are willing to burn resources to make the counterparty come out in the red.
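One possible reading of the "net out to zero" policy described above, as a rough sketch. The function name, parameters, and dollar figures are all hypothetical; the comment gives no formula, and the sketch assumes everything can be expressed as a single monetary estimate, which is itself a contested assumption in this thread.

```python
def required_imposed_cost(estimated_gain: float,
                          estimated_expenses: float = 0.0,
                          margin: float = 0.0) -> float:
    """Cost to impose so the counterparty's net result is at most -margin.

    estimated_gain:     Fermi estimate of what the counterparty extracts.
    estimated_expenses: estimate of what the extraction already cost them;
                        defaulting to 0 is the conservative guess when
                        their expenses are unknown.
    margin:             how far into the red they should end up.
    """
    net_before = estimated_gain - estimated_expenses
    # Nothing needs to be imposed if they are already sufficiently in the red.
    return max(0.0, net_before + margin)


# Counterparty already spent more than they gained: nothing to add.
already_losing = required_imposed_cost(1_000_000, estimated_expenses=1_500_000)
# Counterparty optimized their expenses: the gap must be made up.
optimized = required_imposed_cost(1_000_000, estimated_expenses=500_000)
# Expenses unknown: assume they spent nothing.
unknown = required_imposed_cost(1_000_000)
```

The sketch only computes a target number. It does not say which actions would impose that cost or how they would be priced.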
-2 · Said Achmiz · 7mo
Why would you condition any of this on how much they’re spending? And how exactly would you calibrate it to impose a specific amount of cost on the blackmailer? (How do you even map some of these things to monetary cost…?)

So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.

Er… is anyone actually claiming this? This is quite the accusation, and if it were being made, I’d want to see some serious evidence, but… is it, in fact, being made?

(It does seem like OP is saying this, but… in a weird way that doesn’t seem to acknowledge the magnitude of the accusation, and treats it as a reasonable characterization of other claims made earlier in the post. But that doesn’t actually seem to make sense. Am I misreading, or what?)

The second half (just live off donations?) is also my interpretation of OP. The first half (workable alignment plan?) is my own intuition based on MIRI mostly not accomplishing anything of note over the last decade, and...

MIRI & company spent a decade working on decision theory which seems irrelevant if deep learning is the path (aside: and how would you face Omega if you were the sort of agent that pays out blackmail?). Yudkowsky offers to bet Demis Hassabis that Go won't be solved in the short term. They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively. They predict fast takeoff and FOOM.


The answer was actually deep learning and not systems with utility functions. Go gets solved. Deep Learning systems don't look like they FOOM. Stochastic Gradient Descent doesn't look like it will treacherous turn. Yudkowsky's dream of building the singleton Sysop is gone and was probably never achievable in the first place.

People double down with the "mesaoptimizer" frame instead of admitting that it looks like SGD does what it says on the tin. Yudkowsky goes on a doom media spree. They advocate for a regulatory regime that wo...

Deep Learning systems don't look like they FOOM. Stochastic Gradient Descent doesn't look like it will treacherous turn.

I think you've updated incorrectly, by failing to keep track of what the advance predictions were (or would have been) about when a FOOM or a treacherous turn will happen.

If foom happens, it happens no earlier than the point where AI systems can do software-development on their own codebases, without relying on close collaboration with a skilled human programmer. This point has not yet been reached; they're idiot-savants with skill gaps that prevent them from working independently, and no AI system has passed the litmus test I use for identifying good (human) programmers. They're advancing in that direction pretty rapidly, but they're unambiguously not there yet.

Similarly, if a treacherous turn happens, it happens no earlier than the point where AI systems can do strategic reasoning with long chains of inference; this again has an idiot-savant dynamic going on, which can create the false impression that this landmark has been reached, when in fact it hasn't.

They predict that AI will only come from GOFAI AIXI-likes with utility functions that will bootstrap recursively.

Do you have a link for this prediction? (Or are you just referring to, e.g., Eliezer’s dismissive attitude toward neural networks, as expressed in the Sequences?)

They predict fast takeoff and FOOM. … Deep Learning systems don’t look like they FOOM.

It’s not clear that deep learning systems get us to AGI, either. There doesn’t seem to be any good reason to be sure, at this time, that we won’t get “fast takeoff and FOOM”, does it? (Indeed it’s my understanding that Eliezer still predicts this. Or is that false?)

Stochastic Gradient Descent doesn’t look like it will treacherous turn.

It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!

So what am I supposed to extract from this pattern of behaviour?

I think that at least some of the things you take to be obvious conclusions that Eliezer/MIRI should’ve drawn, are in fact not obvious, and some are even plausibly false.

You also make some good points. But there isn’t nearly so clear a pattern as you suggest.

It… doesn’t? What do you mean by this? I’ve seen no reason to be optimistic on this point—quite the opposite!

As I understand the argument, it goes like the following:

  1. For evolutionary methods, you can't predict the outcome of changes before they're made, and so you end up with 'throw the spaghetti at the wall and see what sticks'. At some point, those changes accumulate to a mind that's capable of figuring out what environment it's in and then performing well at that task, so you get what looks like an aligned agent while you haven't actually exerted any influence on its internal goals (i.e. what it'll do once it's out in the world).
  2. For gradient-descent based methods, you can predict the outcome of changes before they're made; that's the gradient part. It's overall less plausible that the system you're building figures out generic reasoning and then applies that generic reasoning to a specific task, compared to figuring out the specific reasoning for the task that you'd like solved. Jumps in the loss look more like "a new cognitive capacity has emerged in the network" and less like "the system is now reasoning about its training environment".

Of course, that "overall less plausible" ...
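The narrow technical point in the numbered list above, that a gradient lets you predict the effect of a small change before making it while a random mutation has to be tried first, can be illustrated with a toy sketch. The quadratic `loss`, the learning rate, and the mutation scale are all made-up assumptions, and the example says nothing about whether the alignment argument built on this contrast holds.

```python
import random


def loss(w):
    # Toy quadratic loss with its minimum at (3, -1); purely illustrative.
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2


def grad(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


w = [0.0, 0.0]
lr = 0.01

# Gradient step: the first-order change in loss is known before stepping.
g = grad(w)
step = [-lr * gi for gi in g]
predicted_change = sum(gi * si for gi, si in zip(g, step))
w_new = [wi + si for wi, si in zip(w, step)]
actual_change = loss(w_new) - loss(w)

# Random mutation: the change is only known after evaluating the mutant.
rng = random.Random(0)
mutation = [rng.gauss(0.0, 0.1) for _ in w]
w_mut = [wi + mi for wi, mi in zip(w, mutation)]
mutation_change = loss(w_mut) - loss(w)

print(f"gradient step: predicted {predicted_change:.4f}, actual {actual_change:.4f}")
print(f"random mutation: observed {mutation_change:.4f}")
```

The prediction is only first-order, so it is accurate for small steps and degrades as the step size grows.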

It's pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won't FOOM, or we otherwise needn't do anything inconvenient to get good outcomes. It's proving considerably harder (from my outside-the-field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.

FWIW I'm considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with straightforward training solutions that look much less alien than expected. There are plausible scenarios where we fail at solving alignment and still get something tolerably human shaped, and none of those scenarios previously seemed plausible. That optimism just doesn't take it under the stop-worrying threshold.

This doesn't seem consistent to me with MIRI having run a research program with a machine learning focus. IIRC (I don't have links handy but I'm pretty sure there were announcements made) they wound up declaring failure on that research program, and it was only after that happened that they started talking about the world being doomed and there not being anything that seemed like it would work for aligning AGI in time.
4 · Said Achmiz · 7mo
Incidentally, I don’t think I’m willing to trust a hearsay report on this without confirmation. Do you happen to have any links to Eliezer making such a claim in public? Or, at least, any confirmation that the cited comment was made as described?
Closest thing I'm aware of is that at the time of the AlphaGo matches he bet people at like 3:2 odds, favourable to him, that Lee Sedol would win. Link here
My interpretation of various things Michael and co. have said is "Effective altruism in general (and MIRI / AI-safety in particular) is a memeplex optimizing to extract resources from people in a fraudulent way, which does include some degree of "straightforward fraud the way most people would interpret it", but also, their worldview includes generally seeing a lot of things as fraudulent in ways/degrees that common parlance wouldn't generally mean." I predict they wouldn't phrase things the specific way iceman phrased it (but, not confidently). I think Jessicata's The AI Timelines Scam is a pointer to the class of thing they might tend to mean. Some other relevant posts include Can crimes be discussed literally? and Approval Extraction Advertised as Production.

Yes, this is all reasonable, but as a description of Eliezer’s behavior as understood by him, and also as understood by, like, an ordinary person, “doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock” is just… totally wrong… isn’t it?

That is, that characterization doesn’t match what Eliezer sees himself as doing, nor does it match how an ordinary person (and one who had no particular antipathy toward Eliezer, and thus was not inclined to describe his behavior uncharitably, only impartially), speaking in ordinary English, would describe Eliezer as doing—correct?

Yes, that is my belief. (Sorry, should have said that concretely). I'm not sure what an 'ordinary person' should think because 'AI is dangerous' has a lot of moving pieces and I think most people are (kinda reasonably?) epistemically helpless about the situation. But I do think iceman's summary is basically obviously false, yes. 

My own current belief is "Eliezer/MIRI probably had something-like-a-plan around 2017, probably didn't have much of a plan by 2019 that Eliezer himself believed in, but, 'take a break, and then come back to the problem after thinking about it' feels like a totally reasonable thing to me to do". (and meanwhile there were still people at MIRI working on various concrete projects that at least the people involved thought were worthwhile).

i.e. I don't think MIRI "gave up".

I do think, if you don't share Eliezer's worldview, it's a reasonable position to be suspicious and hypothesize that MIRI's current activities are some sort of motivated-cognition-y cope, but confidently asserting that seems wrong to me. (I also think there's a variety of worldviews that aren't Eliezer's exact worldview that make his actions still pretty coherent, and I think it's a pretty sketchy position to assert all those nearby-worldviews are so obviously wrong as to make 'motivated cope/fraud' your primary frame)

(fwiw my overall take is that I think there is something to this line of thinking. My general experience is that when Michael/Benquo/Jessica say "something is fishy here", there often turns out to be something I agree is fishy in some sense, but I find their claims overstated and running with some other assumptions I don't believe that make the thing seem worse to them than it does to me)
3 · Martin Randall · 7mo
For the first part, Yudkowsky has said that he doesn't have a workable alignment plan, and nobody does, and we are all going to die. This is not blameworthy, I also do not have a workable alignment plan.

For the second part, he was recently on a sabbatical, presumably funded by prior income that was funded by charity, so one might say he was living off donations. Not blameworthy, I also take vacations.

For the third part, everyone who thinks that we are all going to die is in some sense running out the clock, be they disillusioned transhumanists or medieval serfs. Hopefully we make some meaning while we are alive. Not blameworthy, just the human condition.

Whether MIRI is a good place to donate is a very complicated question, but certainly "no" is a valid answer for many donors.
6 · Said Achmiz · 7mo
These are good points. But it does seem like what @iceman meant by the bit that I quoted at least has connotations that go beyond your interpretation, yes? Sure. I haven’t donated to MIRI in many years, so I certainly wouldn’t tell anyone else to do so. (It’s not my understanding that MIRI is funding constrained at this time. Can anyone confirm or disconfirm this?)
8 · Martin Randall · 7mo
What accusation do you see in the connotations of that quote? Genuine question, I could guess but I'd prefer to know. Mostly the subtext I see from iceman is disappointment and grief and anger and regret. Which are all valid emotions for them to feel. I think a lot of what might have been serious accusations in 2019 are now common knowledge, eg after Bankless, Death with Dignity, etc. From the Bankless interview: (Edited to fix misquote)

So, just to clarify, “serious accusation” is not a phrase that I have written in this discussion prior to this comment, which is what the use of quotes in your comment suggests. I did write something which has more or less the same meaning! So you’re not mis-ascribing beliefs to me. But quotes mean that you’re… quoting… and that’s not the case here.

Anyway, on to the substance:

What “serious accusation” do you see in the connotations of that quote?

And the quote in question, again, is:

So Yudkowsky doesn’t have a workable alignment plan, so he decided to just live off our donations, running out the clock.

The connotations are that Eliezer has consciously chosen to stop working on alignment, while pretending to work on alignment, and receiving money to allegedly work on alignment but instead just not doing so, knowing that there won’t be any consequences for perpetrating this clear and obvious scam in the classic sense of the word, because the world’s going to end and he’ll never be held to account.

Needless to say, it just does not seem to me like Eliezer or MIRI are doing anything remotely like that. Indeed I don’t think anyone (serious) has even suggested that they’re doing any...

-1 · Martin Randall · 7mo
I edited out my misquote, my apologies. I think emotions are not blame assignment tools, and have other (evolutionary) purposes. A classic example is a relationship break-up, where two people can have strong emotions even though nobody did anything wrong. So I do not interpret emotions as accusations in general. It sounds like you have a different approach, and I don't object to that. For example, grief over the loss of the $100k+ donation. Donated with the hope that it would reduce extinction risk, but with the benefit of hindsight the donor now thinks that the marginal donation had no counterfactual impact. It's not blameworthy because no researcher can possibly promise that a marginal donation will have a large counterfactual impact, and MIRI did not so promise. But a donor can still grieve the loss without someone being to blame. For example, anger that Yudkowsky realized he had no workable alignment plan, in his estimation, in 2015 (Bankless), and didn't share that until 2022 (Death with Dignity). This is not blameworthy because people are not morally obliged to share their extinction risk predictions, and MIRI has a clear policy against sharing information by default. But a donor can still be angry that they were disadvantaged by known unknowns. I hope these examples illustrate that a non-accusatory interpretation is sensical, even if you don't think it plausible. There's a later comment from iceman, which is probably the place to discuss what iceman is alleging:
2 · Said Achmiz · 7mo
You misunderstand. I’m not “interpret[ing] emotions as accusations”; I’m simply saying that emotions don’t generally arise for no reason at all (if they do, we consider that to be a pathology!). So, in your break-up example, the two people involved of course have strong emotions—because of the break-up! On the other hand, it would be very strange indeed to wake up one day and have those same emotions, but without having broken up with anyone, or anything going wrong in your relationships at all. And likewise, in this case: Well, it’s a bit dramatic to talk of “grief” over the loss of money, but let’s let that pass. More to the point: why is it a “loss”, suddenly? What’s happened just now that would cause iceman to view it as a “loss”? It’s got to be something in Zack’s post, or else the comment is weirdly non-apropos, right? In other words, the implication here is that something in the OP has caused iceman to re-examine the facts, and gain a new “benefit of hindsight”. But that’s just what I’m questioning. I do not read Eliezer’s statements in the Bankless interview as saying that he “realized he had no workable alignment plan” in 2015. As far as I know, at no time since starting to write the Sequences has Eliezer ever claimed to have, or thought that he had, a workable alignment plan. This has never been a secret, nor is it news, either to Eliezer in 2015 or to the rest of us in 2022. They do not. Well, you can see my response to that comment.
4 · Eli Tyre · 8d
FWIW, my current understanding is that this inference isn't correct. I think it's common practice to pay settlements to people, even if their claims are fallacious, since having an extended court battle is sometimes way worse.

Jessica explained what she saw as the problem with this. What Ben was proposing was creating clarity about behavioral patterns. I was saying that I was afraid that creating such clarity is an attack on someone. But if so, then my blog was an attack on trans people. What was going on here?

Socially, creating clarity about behavioral patterns is construed as an attack and can make things worse for someone. For example, if your livelihood is based on telling a story about you and your flunkies being the only sane truthseeking people in the world, then me d

...

It’s easy to say “X is a Y” for arbitrary X and Y if the stakes demand it, but that’s not the same thing as using that concept of Y internally as part of your world-model.

So don't use it internally and just say it?

After reading it I still don't get what the connection between the object-level question and abstract epistemology is. Yes, some concepts are more useful for figuring out reality. So is self-modifying to not care about joy. It is a question of utility, so what actual bad thing is supposed to happen in reality if we started to use words that are four symbols longer?