CEO at Redwood Research.
AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.
Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.
If we are ever arguing on LessWrong and you feel like it's kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I'll probably be willing to call to discuss briefly.
If you wrote a rude comment in response to me, I wouldn't feel bad about myself, but I would feel annoyed at you. (I feel bad about myself when I think, in retrospect, that my comments were foolish or unnecessarily rude; the rudeness of replies to me doesn't really affect how I feel about myself.) Other people are more likely to be hurt by rude comments, I think.
I wouldn't be surprised if Tim found your comment frustrating and it made him less likely to want to write things like this in future. I don't super agree with Tim's post, but I do think LW is better if it's the kind of place where people like him write posts like that (and then get polite pushback).
I have other thoughts here but they're not very important.
I can see why the different things I've said on this might seem inconsistent :P It's also very possible I'm wrong here, I'm not confident about this and have only spent a few hours in conversation about it. And if I wasn't recently personally angered by Eliezer's behavior, I wouldn't have mentioned this opinion publicly. But here's my current model.
My current sense is that IABIED hasn't had that much of an effect on public perception of AI risk, compared to things like AI 2027. My previous sense was that there are huge downsides to Eliezer (and co) being more influential on the topic of AI safety, but MIRI had some chance of succeeding at getting lots of attention, so I was overall positive on you and other MIRI people putting your time into promoting the book. Because the book didn't go as well as seemed plausible, promoting Eliezer's perspective seems less like an efficient way of popularizing concern about AI risk, and does less to outweigh the disadvantages of him having negative effects inside the AI safety community.
For example, my guess is that it's worse for the MIRI governance team to be at MIRI than elsewhere, except insofar as they gain prominence from their association with Eliezer; if that second factor is weaker, it looks less good for them to be there.
I think my impression of the book is somewhat more negative than it was when it first came out, based on various discussions I've had with people about it. But this isn't a big factor.
Does this make sense?
"The main thing Eliezer and MIRI have been doing since shifting focus to comms addressed a 'shocking oversight' that it's hard to imagine anyone else doing a better job addressing" (lmk if this doesn't feel like an accurate paraphrase)
This paraphrase doesn't quite preserve the meaning I intended. I think many people would have done a somewhat better job.
Eliezer definitely doesn't think of it as an ally (or at least, not a good ally who he is appreciative of and wants to be on good terms with).
How does the intro sentence seem triggered? How would you have written it?
(Yeah, I was responding to the earlier version. I meant that in some cases you might want to cause someone to be taken more seriously but not want to cause people to think you take them more seriously (or not want to make that salient, or to make people think that you want them to think you want it to be salient, or whatever). Those are just different objectives you might have.)
I dispute that I frequently snap at people. I just read over my last hundred or so LessWrong comments and I don't think any of them are well characterized as snapping at someone. I definitely agree that I sometimes do this, but I think it's a pretty small minority of things I post. I think Eliezer's median level of obnoxious abrasive snappiness (in LessWrong comments over the last year) is about my 98th percentile.
Oh, you're right, I didn't read those. Feel free to remove the comment or whatever you think is the right move.
I think Eliezer is just really rude and uninterested in behaving civilly, and has terrible intuitions about a wide variety of topics, especially topics related to how other people think or behave. And he substantially evaluates whether people are smart or reasonable based on how much they agree with him or respect him, and therefore writes off a lot of people and behaves contemptuously toward them. And he ends up surrounded by people who either hero worship him or understate their disagreements with him in order to get along with him—many of his co-workers would prefer he didn't act like an asshole on the internet, but they can't make that happen.
I think the core problem with Eliezer is that he spent his formative years arguing on the internet with people on listservs, most of whom were extremely unreasonable. And so he's used to the people around him being mostly idiots with incredibly stupid takes and very little value to add. So he is quite unused to changing his mind based on things other people say.
I don't think you should consider him to be rational with respect to this kind of decision. (I also don't think you should consider him to be rational when thinking about AI.)
I personally would not recommend financial support of MIRI, because I'm worried it will amplify net negative communications from him, and I'm worried that it will cause him to have more of an effect on discourse e.g. on LessWrong. I like and respect many MIRI staff, and I think they should work elsewhere and on projects other than amplifying Eliezer.
(Eliezer is pleasant and entertaining in person if you aren't talking about topics where he thinks your opinion is dumb. I've overall enjoyed interacting with him in person, and he's generally treated me kindly in person, and obviously I'm very grateful for the work he did putting the rationalist community together.)
I do judge comments more harshly when they're phrased confidently—your tone is effectively raising the stakes on your content being correct and worth engaging with.
If I agreed with your position, I'd probably have written something like:
I don't think this is an important source of risk. I think that basically all the AI x-risk comes from AIs that are smart enough that they'd notice their own overconfidence (maybe after some small number of experiences being overconfident) and then work out how to correct for it.
There are other epistemic problems that I think might affect the smart AIs that pose x-risk, but I don't think this is one of them.
In general, this seems to me like a minor capability problem that is very unlikely to affect dangerous AIs. I'm very skeptical that trying to address such problems is helpful for mitigating x-risk.
What changed? I think it's only slightly more hedged. I personally like using "I think" everywhere for the reason I say here and the reason Ben says in response. To me, my version also more clearly describes the structures of my beliefs and how people might want to argue with me if they want to change my mind (e.g. by saying "basically all the AI x-risk comes from" instead of "The kind of intelligent agent that is scary", I think I'm stating the claim in a way that you'd agree with, but that makes it slightly more obvious what I mean and how to dispute my claim—it's a lot easier to argue about where x-risk comes from than whether something is "scary").
I also think that the word "stupid" parses as harsh, even though you're using it to describe something on the object level and it's not directed at any humans. That feels like the kind of word you'd use if you were angry when writing your comment, and didn't care about your interlocutors thinking you might be angry.
I think my comment reads as friendlier and less like I want the person I'm responding to to feel bad about themselves, or like I want onlookers to expect social punishment if they express opinions like that in the future. Commenting with my phrasing would cause me to feel less bad if it later turned out I was wrong, which communicates to the other person that I'm more open to discussing the topic.
(Tbc, sometimes I do want the person I'm responding to to feel bad about themselves, and I do want onlookers to expect social punishment if they behave like the person I was responding to; e.g. this is true in maybe half my interactions with Eliezer. Maybe that's what you wanted here. But I think that would be a mistake in this case.)
Yes, I think it's reasonable to describe this as the creatures acausally communicating. (Though I would have described this differently; I think that all the physics stuff you said is not necessary for the core idea you want to talk about.)