Rob Bensinger — LessWrong

Quoting from a follow-up conversation I had with Buck after this exchange:

__________________________________________________________

Buck: So following up on your Will post: It sounds like you genuinely didn't understand that Will is worried about AI takeover risk and thinks we should try to avert it, including by regulation. Is that right?

I'm just so confused here. I thought your description of his views was a ridiculous straw man, and at first I thought you were just being some combination of dishonest and rhetorically sloppy, but now my guess is you're genuinely confused about what he thinks?

(Happy to call briefly if that would be easier. I'm interested in talking about this a bit because I was shocked by your post and want to prevent similar things happening in the future if it's easy to do so.)

Rob: I was mostly just going off Will's mini-review; I saw that he briefly mentioned "governance agendas" but otherwise everything he said seemed to me to fit 'has some worries that AI could go poorly, but isn't too worried, and sees the current status quo as basically good -- alignment is going great, the front-running labs are sensible, capabilities and alignment will by default advance in a way that lets us ratchet the two up safely without needing to do anything special or novel'

so I assumed if he was worried, it was mainly about things that might disrupt that status quo

Buck: what about his line "I think the risk of misaligned AI takeover is enormously important."

"alignment is going great, the front-running labs are sensible"

This is not my understanding of what Will thinks.

[added by Buck later: And also I don’t think it’s an accurate reading of the text.]

Rob: 🙏

that's helpful to know!

Buck: I am not confident I know exactly what Will thinks here. But my understanding is that his position is something like: The situation is pretty scary (hence him saying "I think the risk of misaligned AI takeover is enormously important."). There is maybe a 5% overall chance of AI takeover, which is a bad and overly large number. The AI companies are reckless and incompetent with respect to these risks, compared to what you’d hope given the stakes. Rushing through superintelligence would be extremely dangerous for AI takeover and other reasons.

[added/edited by Buck later: I interpret the review as saying:

- He thinks the probability of AI takeover and of human extinction due to AI takeover is substantially lower than you do.
  - This is not because he thinks “AI companies/humanity are very thoughtful about mitigating risk from misaligned superintelligence, and they are clearly on track to develop techniques that will give developers justified confidence that AIs powerful enough that their misalignment poses risk of AI takeover are aligned”. It’s because he is more optimistic about what will happen if AI companies and humanity are not very thoughtful and competent.
- He thinks that the arguments given in the book have important weaknesses.
- He disagrees with the strategic implications of the worldview described in the book.

For context, I am less optimistic than he is, but I directionally agree with him on both points.]

In general, MIRI people often misunderstand someone saying, "I think X will probably be fine because of consideration Y" to mean "I think that plan Y guarantees that X will be fine". And often, Y is not a plan at all, it's just some purported feature of the world.

Another case is people saying "I think that argument A for why X will go badly fails to engage with counterargument Y", which MIRI people round off to "X is guaranteed to go fine because of my plan Y"

Rob: my current guess is that my error is downstream of (a) not having enough context from talking to Will or seeing enough of Will's other AI writing, and (b) Will playing down some of his worries in the review

I think I was overconfident in my main guess, but I don't think it would have been easy for me to have Reality as my main guess instead

Buck: When I asked the AIs, they thought that your summary of Will's review was inaccurate and unfair, based just on his review.

It might be helpful to try checking this way in the future.

I'm still interested in how you interpreted his line "I think the risk of misaligned AI takeover is enormously important."

Rob: I think that line didn't stick out to me at all / it seemed open to different interpretations, and mainly trying to tell the reader 'mentally associate me with some team other than the Full Takeover Skeptics (eg I'm not LeCun), to give extra force to my claim that the book's not good'.

like, I still associate Will to some degree with the past version of himself who was mostly unconcerned about near-term catastrophes and thought EA's mission should be to slowly nudge long-term social trends. "enormously important" from my perspective might have been a polite way of saying 'it's 1 / 10,000 likely to happen, but that's still one of the most serious risks we face as a society'

it sounds like Will's views have changed a lot, but insofar as I was anchored to 'this is someone who is known to have oddly optimistic views and everything-will-be-pretty-OK views about the world' it was harder for me to see what it sounds like you saw in the mini-review

(I say this mainly as autobiography since you seemed interested in debugging how this happened; not as 'therefore I was justified/right')

Buck: Ok that makes sense

Man, how bizarre

Claude had basically the same impression of your summary as I did

Which makes me feel like this isn't just me having more context as a result of knowing Will and talking to him about this stuff.

Rob: I mean, I still expect most people who read Will's review to directionally update the way I did -- I don't expect them to infer things like

"The situation is pretty scary."

"The AI companies are reckless and incompetent wrt these risks."

"Rushing through super intelligence would be extremely dangerous for AI takeover and other reasons."

or 'a lot of MIRI-ish proposals like compute governance are a great idea' (if he thinks that)

or 'if the political tractability looked 10-20x better then it would likely be worth seriously looking into a global shutdown immediately' (if he thinks something like that??)

I think it was reasonable for me to be confused about what he thinks on those fronts and to press him on it, since I expect his review to directionally make people waaaaaaay more misinformed and confused about the state of the world

and I think some of his statements don't make sense / have big unresolved tensions, and a lot of his arguments were bad and misinformed. (not that him strawmanning MIRI a dozen different ways excuses me misrepresenting his view; but I still find it funny how uninterested people apparently are in the 'strawmanning MIRI' side of things? maybe they see no need to back me up on the places where my post was correct, because they assume the Light of Truth will shine through and persuade people in those cases, so the only important intervention is to correct errors in the post?)

but I should have drawn out those tensions by posing a bunch of dilemmas and saying stuff like 'seems like if you believe W, then bad consequence X; and if you believe Y, then bad consequence Z. which horn of the dilemma do you choose, so I know what to argue against?', rather than setting up a best-guess interpretation of what Will was saying (even one with a bunch of 'this is my best guess' caveats)

I think Will was being unvirtuously cagey or spin-y about his views, and this doesn't absolve me of responsibility for trying to read the tea leaves and figure out what he actually thinks about 'should government ever slow down or halt the race to ASI?', but it would have been a very easy misinterpretation for him to prevent (if his views are as you suggest)

it sounds like he mostly agrees about the parts of MIRI's view that we care the most about, eg 'would a slowdown/halt be good in principle', 'is the situation crazy', 'are the labs wildly irresponsible', 'might we actually want a slowdown/halt at some point', 'should govs wake up to this and get very involved', 'is a serious part of the risk rogue AI and not just misuse', 'should we do extensive compute monitoring', etc.

it's not 100% of what we're pushing but it's overwhelmingly more important to us than whether the risk is more like 20-50% or more like 'oh no'

I think most readers wouldn't come away from Will's review thinking we agree on any of those points, much less all of them

Buck:

"I expect his review to directionally make people waaaaaaay more misinformed and confused about the state of the world"

I disagree

"and I think some of his statements don't make sense / have big unresolved tensions, and a lot of his arguments were bad and misinformed."

I think some of his arguments are dubious, but I don't overall agree with you.

"I think Will was being unvirtuously cagey or spin-y about his views, and this doesn't absolve me of responsibility for trying to read the tea leaves and figure out what he actually thinks about 'should government ever slow down or halt the race to ASI?', but it would have been a very easy misinterpretation for him to prevent (if his views are as you suggest)"

I disagree for what it's worth.

"it sounds like he mostly agrees about the parts of MIRI's view that we care the most about, eg 'would a slowdown/halt be good in principle', 'is the situation crazy', 'are the labs wildly irresponsible', 'might we actually want a slowdown/halt at some point', 'should govs wake up to this and get very involved', 'is a serious part of the risk rogue AI and not just misuse', 'should we do extensive compute monitoring', etc.
it's not 100% of what we're pushing but it's overwhelmingly more important to us than whether the risk is more like 20-50% or more like 'oh no'"

I think that the book made the choice to center a claim that people like Will and me disagree with: specifically, "With the current trends in AI progress, building superintelligence is overwhelmingly likely to lead to misaligned AIs that kill everyone."

It's true that much weaker claims (e.g. all the stuff you have in quotes in your message here) are the main decision-relevant points. But the book chooses to not emphasize them and instead emphasize a much stronger claim that in my opinion and Will's opinion it fails to justify.

I think it's reasonable for Will to substantially respond to the claim that you emphasize, rather than different claims that you could have chosen to emphasize.

I think a general issue here is that MIRI people seem to me to be responding at a higher simulacrum level than the one at which criticisms of the book are operating. Here you did that partly because you interpreted Will as himself operating at a higher simulacrum level than the plain reading of the text.

I think it's a difficult situation when someone makes criticisms that, on the surface level, look like straightforward object level criticisms, but that you suspect are motivated by a desire to signal disagreement. I think it is good to default to responding just on the object level most of the time, but I agree there are costs to that strategy.

And if you want to talk about the higher simulacra levels, I think it's often best to do so very carefully and in a centralized place, rather than in a response to a particular person.

I also agree with Habryka’s comment that Will chose a poor phrasing of his position on regulation.

Rob: If we agree about most of the decision-relevant claims (and we agree about which claims are decision-relevant), then I think it's 100% reasonable for you and Will to critique less-decision-relevant claims that Eliezer and Nate foregrounded; and I also think it would be smart to emphasize those decision-relevant claims a lot more, so that the world is likely to make better decisions. (And so people's models are better in general; I think the claims I mentioned are very important for understanding the world too, not just action-relevant.)

I especially think this is a good idea for reviews sent to a hundred thousand people on Twitter. I want a fair bit more of this on LessWrong too, but I can see a stronger claim having different norms on LW, and LW is also a place where a lot of misunderstandings are less likely because a lot more people here have context.

Re simulacra levels: I agree that those are good heuristics. For what it's worth, I still have a much easier time mentally generating a review like Will's when I imagine the author as someone who disagrees with that long list of claims; I have a harder time understanding how none of those points of agreement came up in the ensuing paragraphs if Will tacitly agreed with me about most of the things I care about.

Possibly it's just a personality or culture difference; if I wrote "This is a shame, because I think the risk of misaligned AI takeover is enormously important" (especially in the larger context of the post it occurred in) I might not mean something all that strong (a lot of things in life can be called "enormously important" from one perspective or another); but maybe that's the Oxford-philosopher way of saying something closer to "This situation is insane, we're playing Russian roulette with the world, this is an almost unprecedented emergency."

(Flagging that this is all still speculative because Will hasn't personally confirmed what his views are someplace I can see it. I've been mostly deferring to you, Oliver, etc. about what kinds of positions Will is likely to endorse, but my actual view is a bit more uncertain than it may sound above.)
