(Context for the reader: Gabriel reached out to me a bit more than a year ago to ask me to delete a few comments on this post by Jacob Hilton, who was working at OpenAI at the time. I referenced this in my recent dialogue with Olivia, where I quoted an email I sent to Eliezer about having some concerns about Conjecture partially on the basis of that interaction. We ended up scheduling a dialogue to talk about that and related stuff.)


You were interested in a dialogue, probably somewhat downstream of my conversation with Olivia and also some of the recent advocacy work you've been doing.

Gabriel Alfour


Two things I'd like to discuss:

  1. I was surprised by you (on a recent call) stating that you found LessWrong to be a good place for the Lying is Cowardice not Strategy post.
  2. I think you misunderstand my culture. Especially around civility, and honesty.

Yeah, I am interested in both of the two things. I don't have a ton of context on the second one, so am curious about hearing a bit more.

Gabriel's principles for moderating spaces

Gabriel Alfour

About the second one:

  1. I think people should be free to be honest in their private spaces.
  2. I think people should be free to create their own spaces, enact their vision, and to the extent you participate in the space, you should help them.
  3. If you invite someone to your place, you ought to not do things that would have caused them not to come if they knew ahead of time.

So, about my post and the OAI thing:

  • By 3, I feel ok writing my post on my blog. I feel ok with people dissing OAI on their blogs, and on their posts if you are ok with it (I take you as proxy for "person with vision for LW")
  • I feel much less ok about ppl dissing OAI on their own blog posts on LW. I assume that if they knew ahead of time, they would have been much less likely to participate.
  • I would have felt completely ok if you told me "I don't think your post has the tone required for LW, I want less adversariality / less bluntness / more charitability / more ingroupness"

How surprising are these to you?

Gabriel Alfour

Meta-comment:  Would have been great to know that the thing with OAI shocked you enough to send a message to Eliezer about it. 

Would have been much better from my point of view to talk about it publicly, and even have a dialogue/debate like this if you were already open to it.

If you were already open to it, I should have offered. (I might have offered, but can't remember.)


Ah, ok. Let me think about this a bit.

I have thoughts on the three principles you outline, but I think I get the rough gist of the kind of culture you are pointing to without needing to dive into that.

I think I don't understand the "don't do things that will make people regret they came" principle. Like, I can see how it's a nice thing to aspire to, but if you have someone submit a paper to a journal, and then the paper gets reviewed and rejected as shoddy, then like, they probably regret submitting to you, and this seems good. 

Similarly, if I show up at a Jewish community gathering or something, and I wasn't fully aware of all of the rules and guidelines they follow and this makes me regret coming, then that's sad, but it surely wouldn't have been the right choice for them to break their rules and guidelines just because I was there.

Gabriel Alfour

I do think I don't really understand the "don't do things that will make people regret they came" principle. Like, I can see how it's a nice thing to aspire to, but if you have someone submit a paper to a journal, and then the paper gets reviewed and rejected as shoddy, then like, they probably regret submitting to you, and this seems good. 

  • You mention 'the paper gets reviewed and rejected', but I don't think the comments on the OAI post were much conditioned on the quality of the post. If I recall correctly, the tone was more "how dare OAI talk about safety given they shorten timelines".
  • If the goal was to get OAI to not come to LW at all, I would have actually been ok with it.
    • But I don't know if that was the goal, and this being unclear irks me.
    • If that was the goal, I am quite sure it could be made more explicit ahead of time, rather than just waiting for insults.
  • If the journal had a way to make it clear to the paper writer, ahead of time, that the paper would be rejected, I think it would be good of them to say so.

Gabriel Alfour

Also, important context: back then on LW there was an option to freely moderate your own comment section, though it was not clear under which conditions it was offered.

I think it was called "reign of terror: I remove what I dislike" or something. So from my point of view, it was fitting the norms.


To be clear, I generally am pretty happy about having people from OpenAI show up, and my sense is also that Jacob, who made the post, thought a bunch of the discussion made sense and was glad to have it happen in public rather than in private (and he himself didn't want to take the comments down).

Gabriel Alfour

(To be clear, I cared much more about taking the insults down. The rest of the comments were ok, and if I recall correctly, the authors edited the insults out themselves once they reflected about them.)

LessWrong as a post-publication peer-reviewed journal


There is a general principle I consider on LessWrong for all types of content on the site, which is something like "whenever someone makes a claim on LessWrong, if it's clearly wrong, it must be possible for a response to that claim to become easily findable in the UI".

Gabriel Alfour

"whenever someone makes a claim on LessWrong, if it's clearly wrong, it must be possible for a response to that claim to become easily findable in the UI"

I love this principle. (And I love Community Notes for similar reason.)

I truly dislike the "less attention to response to critique"-pattern that favours making a lot of low-effort critiques, because the debunking won't receive nearly as much attention.

I would also love to have a "Hottest questions to [X] right now", to avoid the pattern where someone gets swarmed with questions, and then it looks like they don't have a response to anything.


In a broader sense, I think LessWrong definitely has a lot of the components of a journal, with one of the biggest differences being that everything on LessWrong is post-publication reviewed, not pre-publication reviewed. Many fields have already been moving towards post-publication review (like ML or physics, where it's very standard practice for people to upload things to arXiv before they get accepted to a journal).

So my reaction to removing critical content and analysis feels somewhat similar to asking a journal to skip peer review for a publication, which like, is not completely unheard of but clearly violates some important principles and requires some pretty high standard of evidence to be met.

Gabriel Alfour

Interesting, this is a thing that I did not get, and I understand more now why you are reluctant to even moderate insults that are coupled with good analysis.

Not necessarily because you want more insults, but because:

  1. Insults coupled with good analyses are rare.
  2. The slope is more slippery toward moderating out / toning down embarrassing analysis than having more insults in good analyses.

Does this seem correct?

What are "insults"?


I think I personally have a pretty strong ick-reaction to calling people's concerns in the space "insults". 

Like, can "insults" be accurate? In the pre-FTX days I thought Sam was pretty sketch in a bunch of ways. I think calling him "a fraud" or a "sociopath" would have definitely been perceived as an insult, but man, I think it would have just been correct, and I think conversations where those hypotheses were explicitly raised would have improved things a lot.

And I get a bit of a vibe of some kind of honor that is present in many institutions which involves implicitly suppressing criticism and preventing people from grappling with the negative effects of their actions, by calling things that criticize them or others "insults".

Gabriel Alfour

Insults can be accurate! I think there was a post on LW that said that "yomamma fat" applied to 60% of UK children.

Insults can even be part of nice norms: I like some places where it's normal for ppl to insult each other, it's fun.

Insults can also be useful: sometimes, the relevant topic is the character of a person or of an institution.

But if OAI comes to talk about a safety plan, and someone makes a comment not about the safety plan but "OAI bad", I am like "eeee, if you want to say 'OAI bad', write your post about it".

Gabriel Alfour

To be clear, I just wrote a post "Lying is Cowardice, not Strategy", and I have one in the pipeline called "For Civilization and Against Niceness" that is a lot about the pattern that you just mentioned.

This is not the thing that bothers and bothered me.

Gabriel Alfour

(Can we look up that comment? Can't remember it, and looks like you only have a vague memory too.)

It's actually worse than in my memories.

Here is the comment.

It feels extremely bait and switchy to me: "I'm just raising the hypothesis that OAI might be PR-washing" vs "I write a long RP of OAI as 'Philip Morris International'"

This sounds to me much more like "OAI bad" than "OAI has vested interests in non-safety, as shown by [X]. Even though their charter talks at length about safety, they are clearly safety-washing, as shown by [Y]." or "Here are mischaracterisations in the post".


The comment that you complained about had pretty direct and immediate relevance to the post at hand. It compared OAI to a tobacco company, if I recall. A comparison that raises a hypothesis that still seems pretty apt.

Some concrete quotes that caused you to want me to delete stuff at the time: 

You seem confused about the difference between "paying lip service to X" and "actually trying to do X".

To be clear, this in itself isn't evidence against the claim that OpenAI is trying to directly build safe AI. But it's not much evidence for it, either.

And in a separate comment: 

Here is a similar post one could make about a different company:

"A friend of mine has recently encountered a number of people with misconceptions about their employer, Phillip Morris International (PMI). Some common impressions are accurate, and others are not. He encouraged me to write a post intended to provide clarification on some of these points, to help people know what to expect from the organization and to figure out how to engage with it. It is not intended as a full explanation or evaluation of Phillip Morris's strategy.

Common accurate impressions

  • Phillip Morris International is the world's largest producer of cigarettes. 
  • The majority of employees at Phillip Morris International work on tobacco production and marketing for the developing world.
  • The majority of Phillip Morris International's employees did not join with the primary motivation of reducing harm from tobacco smoke specifically."

These both seem clearly far above the bar for me as not being "insults". There is no "but you suck" in here. There is substantial critique, and the PMI comparison seems apt and helpful for people understanding the kinds of dynamics that are going on here. 

Gabriel Alfour

There is substantial critique

What do you have in mind here?


I mean, I think being like "look, this post reads very similar to a pattern of deception that is pretty common and which suggests an alternative explanation for why this post was written the way it was" is clearly a real critique, and I continue to find it pretty compelling. 

Indeed, there is a reason why organizations produce this kind of article where they take strawmen and tear them down, because it works, without actually being sensitive to the details of the situation.

Gabriel Alfour

This looks like a strong steelman to me, but willing to accept it if that's the precedent you want to set.

Like, if that's truly how you perceived it, and you would be ok with similar shape of critique toward someone more in-groupy, then my reaction is something like:

"Ideally, you'd make this clear ahead of time. But norms can obviously not capture the full breadth of reality, and so this falls under 'Reasonable not-covering-for-every-situation-possible'. And my reaction becomes more of 'Eh. If that's the space you want, why not.'"


I don't think it's a "strong steelman". Saying that it's "a strong steelman" sounds like calling "1984 is a critique of the western world's tendencies to fall into authoritarianism" a "strong steelman". It's obviously the central message. 

I am confident that 90%+ of LW readers got the point of the critique and could restate it in similar words.

Gabriel Alfour

When someone writes "This comment is a substantial critique of the post", I don't expect 1984-lite, I expect actual arguments/counterarguments.

That's what my "strong steelman" comment is about.

Gabriel Alfour

After a year of reflection: I have noticed since then that I don't predict well what you want and don't want on LW.

For my post, I deferred to others at Conjecture who told me it was fit for LW, because my first impression was "Nah, this is too aggressive for Habryka".

(And even then, after publishing the post, I checked with you just to be sure)

I think a thing here is that I am likely not gauging how different you want LW to be from what I found around EA/Rationalists.

I have found that many EA/Rationalists truly dislike aggressive/frank messaging. They expect what I call "Epistemic Political Correctness", where they feel better when I put qualifiers / "I feel" statements on things that I am in fact quite confident about.

"Epistemic range of motion" as an important LW principle


In general, at least the culture I am excited about for LessWrong puts a large premium on something like "epistemic range of motion". My current model of epistemology, and especially of group epistemology, suggests that if you make some set of considerations inexpressible, or some truths taboo, then this tends to radiate out a lot into the rest of your (collective) thinking.

See the old sequences post Entangled Truths, Contagious Lies and Dark Side Epistemology.

This means that I am willing to give people quite a bit of leeway if they have to express a consideration or an idea clumsily, or aggressively, or with lots of pent up feelings, if my best guess of the alternative is that it is not expressed at all (relatedly I am also a fan of Zack Davis's writing on a bunch of stuff, despite him often expressing things in a kind of aggressive or clumsy way, because I see him as the only person actually saying a bunch of the things that otherwise couldn't be said).

Gabriel Alfour

So, from your point of view, you want to lower as much as possible the costs of saying true things?

And if it happens that someone's way of expressing true things is snark, you're ok with it?

If so, this sounds like a great place for me lol.

Now that I have your blessing I shall do that! I was mostly worried cause I have a history of making unhelpfully aggressive AI safety-related comments and I didn't want moderators to get frustrated with me again (which, to be clear, so far has happened only for very understandable reasons).

For context: this is what the guy who wrote the "OAI as Philip Morris" comment wrote. And that's quite close to how I felt about my post.


So my general norms are that if you are aggressive on some random topic that seems like it can just be discussed calmly, then I would be relatively harsh on that. But if there is a topic where the choice is "either express this clumsily or not at all", then I am willing to put a lot of chips on the table to defend and create space for the thing to be said.

Gabriel Alfour

if you are aggressive on some random topic that seems like it can just be discussed calmly by a lot of people

I think this is the part that confused me, and that makes me feel "aaaaaaaaaaaaa", as in "No!!!!!! So arbitrary!".

The reason why is that very often, this is what dissidence looks like: "Someone is unusually aggressive where most people are calm".

I think I saw moderation on LW of a very snarky thing, and when I saw that you were ok with it under the OAI post, I did not understand it as "It's ok because it is genuinely trying to convey a point and I defend conveying points." but "It's ok because OAI bad".

And my reaction to this was "Wow, if you want OAI out, there are clearer ways to make this known and I think it's not a good goal. And if you want them in, you don't want to make an exception to snark when it's about them specifically"


So my reaction to the Lying is Cowardice post was definitely one of "yep, man, this sure isn't the best way to express this, but I've been carrying around a lot of related feelings in my stomach and I don't really see anyone else making these points, so I guess let's do it clumsily instead, if the alternative is not at all"

Gabriel Alfour

I had the same reaction as you to my post: I was like "Sure, I could express it better, but I suck at writing, I need to start somewhere, and this point needs to be made"

Would have posted similar things a longer time ago if I knew you'd be ok with it

Do you have a document / a page that describes best your approach to moderation and all these considerations?

I think a canonical page would help a lot, else it feels very ad-hoc. Like, while you want criticism to be easily findable next to the post, at the same time, you also had the "Reign of Terror" thing where people could fully moderate their own post.

And as a result, we now have a conversation about a thing that irked you like a year after it happened.

I do realise things have costs, and you can do many other things with your resources.
Just trying to raise to saliency that I believe you are under-estimating the benefits of having a canonical place describing your moderation's philosophy or rules.


We have a ton of stuff on moderation philosophy, but I don't think we have anything super centralized, and I don't think we have anything that makes this specific point (though I am sure it's somewhere in some past moderation discussions).

Yeah, for the reign of terror thing, we do allow a small fraction of users to moderate their own posts more aggressively, but it's always been something we've been keeping a close eye on to make sure it doesn't have distortionary effects, and also, I cared a lot about having Pingbacks in place, so that even if someone moderates all the comments on their posts, people can still find response posts if they are highly upvoted.

Gabriel Alfour

My summary would be:

  • You care much more about lowering the costs to state true things (including clumsiness and snark) than I expected. As a result, I misinterpreted the norm of the snark against OAI for "snark is ok when it's about OAI" as opposed to "snark is ok when it would be more costly to phrase differently".
  • It's hard to understand your moderation philosophy as it's not summarised in a place I could check, and there are bits that went counter to the correct interpretation that you suggest here: "reign of terror", and the moderation of other snark (though that snark was non-productive).

Gabriel Alfour

Yeah, for the reign of terror thing, we do allow a small fraction of users to moderate their own posts more aggressively, but it's always been something we've been keeping a close eye on to make sure it doesn't have distortionary effects

Never been clear to me that it was part of your considerations when I saw the feature, nor when we talked about it on the phone, nor when I talked with Ben about it on the phone. So from my point of view, it read like "nah because OAI". 

(Which again, could be fair if that was the goal, but should be explicit in that case)

Also to make it clear: I am not doubting your honesty, nor do I think you are misrepresenting your beliefs

I think there is a chance that you are much clearer about it than you were a year ago because of reflection, and a chance that I understand it better than a year ago because of more reflection and interacting with more LW people.

(I don't think I have much more to say about this? Confusion mostly resolved on my end.)


Yeah, maybe. I do notice I feel like there is still a very substantial clash of cultural assumptions here. But maybe best to call it for now.

3 comments

I feel much less ok about ppl dissing OAI on their own blog posts on LW. I assume that if they knew ahead of time, they would have been much less likely to participate.

I find it hard to parse this second sentence. If who knew what ahead of time, they would be less likely to participate?

I think this means "I assume if OpenAI staff expected that users would write insulting things in the comments, then they may not have participated at all".

Gabriel makes a very good point: there is something of a tension between allowing reign of terror moderation and considering it a norm violation to request the deletion of comments for low quality.

(TBH, I was convinced that reign of terror would be a disaster, but it seems to be working out okay so far).