Rob Bensinger

Communications lead at MIRI. Unless otherwise indicated, my posts and comments here reflect my own views, and not necessarily my employer's.

+1. Thanks, Ray. :)

There are things I like about the "MIRI has given up" meme, because maybe it will cause some people to up their game and start taking more ownership of the problem.

But it's not true, and it's not fair to the MIRI staff who are doing the same work they've been doing for years.

It's not even fair to Nate or Eliezer personally, who have sorta temporarily given up on doing alignment work but (a) have not given up on trying to save the world, and (b) would go back to doing alignment work if they found an approach that seemed tractable to them.

Yep. If I could go back in time, I'd make a louder bid for Eliezer to make it obvious that the "Death With Dignity" post wasn't a joke, and I'd add a bid to include some catchy synonym for "Death With Dignity", so people can better triangulate the concept via having multiple handles for it. I don't hate Death With Dignity as one of the handles, but making it the only handle seems to have caused people to mostly miss the content/denotation of the phrase, and treat it mainly as a political slogan defined by connotation/mood.

What Duncan said. "MIRI at least temporarily gave up on personally executing on technical research agendas" is false, though a related claim is true: "Nate and Eliezer (who are collectively a major part of MIRI's research leadership and play a huge role in the org's strategy-setting) don't currently see a technical research agenda that's promising enough for them to want to personally focus on it, or for them to want the organization to make it an overriding priority".

I do think the "temporarily" and "currently" parts of those statements are quite important: part of why the "MIRI has given up" narrative is silly is that it's rewriting history to gloss "we don't know what to do" as "we know what to do, but we don't want to do it". We don't know what to do, but if someone came up with a good idea that we could help with, we'd jump on it!

There are many negative-sounding descriptions of MIRI's state that I could see an argument for, as stylized narratives ("MIRI doesn't know what to do", "MIRI is adrift", etc.). Somehow, though, people skipped over all those perfectly serviceable pejorative options and went straight for the option that's definitely just not true?

My model is that MIRI prioritized comms before 2013 or so, prioritized a mix of comms and research in 2013-2016, prioritized research in 2017-2020, and prioritized comms again starting in 2021.

(This is very crude and probably some MIRI people would characterize things totally differently.)

I don't think we "gave up" in any of those periods of time, though we changed our mind about which kinds of activities were the best use of our time.

I was actually thinking about something more like "direct" alignment work. 2013-2016 was a period where MIRI was outputting much more research, hosting workshops, et cetera.

2013-2016 had more "research output" in the sense that we were writing more stuff up, not in the sense that we were necessarily doing more research then.

I feel like your comment is blurring together two different things:

  • If someone wasn't paying much attention in 2017-2020 to our strategy/plan write-ups, they might have seen fewer public write-ups from us and concluded that we've given up.
    • (I don't know that this actually happened? But I guess it might have happened some...?)
  • If someone was paying some attention to our strategy/plan write-ups in 2021-2023, but was maybe misunderstanding some parts, and didn't care much about how much MIRI was publicly writing up (or did care, but only for technical results?), they might likewise have concluded that we'd given up.

Combining these two hypothetical misunderstandings into a single "MIRI 2017-2023 has given up" narrative seems very weird to me. We didn't even stop doing pre-2017 work like Agent Foundations in 2017-2023; we just did other things too.

How about the distinction between (A) “An AGI kills every human, and the people who turned on the AGI didn’t want that to happen” versus (B) “An AGI kills every human, and the people who turned on the AGI did want that to happen”?

I think the misuse vs. accident dichotomy is clearer when you don't focus exclusively on "AGI kills every human" risks. (E.g., global totalitarianism risks strike me as small but non-negligible if we solve the alignment problem. Larger are risks that fall short of totalitarianism but still involve non-morally-humble developers damaging humanity's long-term potential.)

The dichotomy is really just "AGI does sufficiently bad stuff, and the developers intended this" versus "AGI does sufficiently bad stuff, and the developers didn't intend this". The terminology might be non-ideal, but the concepts themselves are very natural.

It's basically the same concept as "conflict disaster" versus "mistake disaster". If something falls into both categories to a significant extent (e.g., someone tries to become dictator but fails to solve alignment), then it goes in the "accident risk" bucket, because it doesn't actually matter what you wanted to do with the AI if you're completely unable to achieve that goal. The dynamics and outcome will end up looking basically the same as other accidents.

Seems to me that this is building in too much content / will have the wrong connotations. If an ML researcher hears about "recklessness risk", they may well go "oh, well I don't feel 'reckless' at my day job, so I'm off the hook".

Locating the issue in the cognition of the developer is probably helpful in some contexts, but it has the disadvantage that (a) people will reflect on their cognition, not notice "negligent-feeling thoughts", and conclude that accident risk is low; and (b) it encourages people to take their eye off the ball, focusing on psychology (and arguments about whose psychology is X versus Y) rather than focusing on properties of the AI itself.

"Accident risk" is maybe better just because it's vaguer. The main problem I see with it isn't "this sounds like it's letting the developers off the hook" (since when do we assume that all accidents are faultless?). Rather, I think the problem with "accident" is that it sounds minor.

Accidentally breaking a plate is an "accident". Accidentally destroying a universe is... something a bit worse than that.

If someone deliberately misuses AI to kill lots of people, that's a "disaster" too.

Maybe Nate has something in mind like Bostrom's "strong superintelligence", defined in Superintelligence as "a level of intelligence vastly greater than contemporary humanity's combined intellectual wherewithal"?

(Whereas Bostrom defines "superintelligence" as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest", where "exceeding the performance of humans" means outperforming the best individual human, not necessarily outperforming all of humanity working together.)

It would be less frustrating if it weren't likely that these criticisms are just replacing a bunch of counterfactual criticisms of the form "but what about X?" (where X is addressed in the follow-up post, but no one clicks through to read a whole separate post just to find out the nuances behind the original list of ten). You can't win!

I enjoyed this post. :) A minor note:

"rationalist" here refers to general rationalism and not LessWrongian rationalism. Though the two are very closely related, I'd like to avoid any associations particular to this community.

I don't think philosopher-rationalism is closely related to LW rationalism at all. Philosopher-rationalism mostly has nothing to do with LW-rationalism, and where the two do importantly intersect it's often because LW-rationalism is siding with empiricism against rationalism. (See prototypical philosopher-rationalists like Plato, who thought sensory knowledge and observation were completely bogus, and Descartes, who thought probabilistic or uncertain knowledge was completely bogus.)

LW-rationalism is using the word "rational" in contrast to "irrational", more in line with the original meaning of the word "rationalist". It's not "rational" in contrast to "empirical". (If it were, it wouldn't be called "Bayesian" and we wouldn't care much about sensory updating.)
