[This is a direct cross-post from a Facebook post of mine and was originally intended for the people who read what I write on Facebook, which is largely why this is written as a "for the record" statement of my personal viewpoints. That's probably not how I would have written this if I had originally intended it as a LessWrong post. In any case, I am cross-posting this here because I think people will disagree with me about the methodology and I'd like to get feedback on that.]

For the record, my actual current opinion on the plausibility of catastrophic risks from very advanced AI is... that I really don't know.

I currently lean towards thinking that there's at least a decent chance within the next 10-40 years that we will get AI systems that are human-level-or-above at relevant tasks like long-term strategic planning, scientific research and engineering, and human social manipulation. And conditional on achieving that level of AI, I currently lean towards thinking that there's at least a non-negligible chance that such systems will end up causing permanent catastrophic harm to the future of humanity, perhaps even human extinction or some really horrible dystopia. But I'm also extremely uncertain about those statements. If I had to put probability estimates to those claims, then my estimate for the first claim (human-level AI or above in 10-40 years) would probably be somewhere in the range 5%-95%, and my estimate for the second claim (catastrophically bad outcome conditional on the first claim) might be somewhere between 0.1% and 95%. So yeah - really, really uncertain.
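To make those ranges a bit more concrete, here is a toy calculation - just naive interval arithmetic on the bounds, treating them as if they could be multiplied independently, not any kind of real model:

```python
# Toy interval arithmetic on the rough bounds from the paragraph above.
# This only illustrates how wide the resulting uncertainty is; it is not a model.

p_agi_bounds = (0.05, 0.95)             # P(human-level-or-above AI in 10-40 years)
p_cat_given_agi_bounds = (0.001, 0.95)  # P(catastrophe | that level of AI)

# P(catastrophe) = P(AI) * P(catastrophe | AI), so the bounds multiply.
lower = p_agi_bounds[0] * p_cat_given_agi_bounds[0]
upper = p_agi_bounds[1] * p_cat_given_agi_bounds[1]

print(f"P(catastrophe) somewhere between {lower:.3%} and {upper:.1%}")
# roughly 0.005% on the low end and about 90% on the high end
```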

This is actually what my PhD research is largely about: Are these risks actually likely to materialize? Can we quantify how likely, at least in some loose way? Can we quantify our uncertainty about those likelihoods in some useful way? And how do we make the best decisions we can if we are so uncertain about things?

One aspect that shapes a lot of my thinking on this and many other topics is epistemic modesty - very roughly, looking at what the "experts" say about the topic. At any given time it may seem to me that the object-level arguments point one way or another, but honestly: (a) I know that I don't know nearly as much about this topic as lots of other people who are at least as smart as I am, if not way smarter, and who have thought about it far longer - and yet they still disagree with me. So why should I think I somehow managed to get it right and they're wrong? And (b) I know that I keep switching my own "internal views" based on whatever I'm reading at the time, so why expect my current view to be the right one?

The really tricky part of this way of thinking, of course, is figuring out who the "real experts" are, how much to modify my views based on their opinions (if at all), and on which topics. Are AI safety researchers the relevant experts here? Maybe to some degree, but they're also self-selected to worry about this topic, plus maybe now that they work on it they're biased. And are there ways to figure out which AI safety researchers are "more expert", since they disagree with each other a lot? Or maybe the relevant experts are the broader class of AI researchers instead, or even more broadly the class of apparently smart, knowledgeable people who seem to have some relevant area of expertise and who have expressed an opinion on the topic? But it's often clear when reading what they say that they haven't carefully considered the arguments they're (sometimes loudly) arguing against. So maybe discount everybody like that and only focus on researchers who look like they have considered the arguments against their own position?

Or maybe we should go back to just relying on our own "inside views", despite the fact that they seem unreliable? Or maybe some sort of hybrid approach (one toy version of which is sketched below)? Maybe we can sub-divide the high-level questions and arguments into sub-arguments so that we can get better traction on them? But how do we do that properly, in a way that doesn't make things even more confused if it turns out we misunderstood the sub-arguments?
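As one toy illustration of what a "hybrid approach" might look like, here is a simple linear opinion pool that mixes my own inside-view credence with some expert credences. All of the numbers and weights below are made up purely for illustration; choosing the experts and the weights well is exactly the hard part described above.

```python
# Toy "hybrid" approach: a linear opinion pool mixing my inside-view credence
# with several hypothetical expert credences. All numbers are illustrative only.

def linear_pool(credences, weights):
    """Weighted average of probability estimates; weights should sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(p * w for p, w in zip(credences, weights))

inside_view = 0.30                     # my own current credence (hypothetical)
expert_credences = [0.05, 0.50, 0.90]  # hypothetical expert opinions
expert_weights = [0.2, 0.2, 0.2]       # how much I trust each expert
my_weight = 0.4                        # how much I trust my inside view

pooled = linear_pool([inside_view] + expert_credences,
                     [my_weight] + expert_weights)
print(f"Pooled credence: {pooled:.2f}")  # prints a pooled credence of about 0.41
```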

All of this is part of what I am researching as well.

In the meantime though, if you want me to shift my own highly uncertain estimates of how likely these risks are, then the best ways to do that are:

(1) Show me some person who clearly has relevant expertise and has clearly considered the arguments against their own position, and then show me what that person thinks of the issue. The more clearly that person has relevant expertise and the more clear it is that they have carefully considered the arguments of the other side, the better. Especially if it's not in line with the positions of other closely-associated thinkers - for example, if you show me someone from MIRI who agrees with Eliezer Yudkowsky, that's not likely to shift my position very much.

(2) Give me good "meta-arguments" for why I should defer more to one set of "experts" on this topic (or on some sub-topic of this topic) over another set of experts. Is there reason to think that one set is more likely to be biased about some set of relevant questions? Is there reason to suspect that they haven't properly considered some argument or another? Is there reason to not really consider them "expert" at all on some set of sub-questions?

(3) Show me that when you read between the lines it's clear that some of the "experts" actually concede on some important sub-questions. I've gotten a lot of mileage out of this one actually - to me this seems like relatively strong "secondary evidence" that the other side is more likely to be correct, at least on those sub-questions. In particular for AI risk, from my readings it looks to me like a lot of "AI risk skeptics" actually concede a lot of the substantive claims made by "AI risk promoters". This has a strong influence on my current estimates, and it's a large part of why, for example, the bounds on the numbers I gave earlier are so high relative to what some others might give.

(4) Conversely, you can also of course show me that my current understanding of the factors in (1)-(3) is mistaken. For example, you might show me that whereas I thought some expert was conceding on point X, I was actually mistaken about that.

(5) If you want to convince me using object-level arguments, then you'll need to also convincingly explain to me why all the experts on the other side haven't properly considered your arguments. (Or in other words, you probably aren't going to convince me of much using object-level arguments unless they're also accompanied by some good "meta-arguments".) Or better yet, post your arguments on LessWrong and we'll see how people respond to them. If it turns out that people simply weren't considering your arguments, then hopefully they will once you post them.

(6) Show me that I'm misunderstanding the substance of what the arguments and disagreements are actually about. This is where a lot of the shifts in my opinions have come from as I study more.

4 comments

Currently there seems to be a dearth of mutual understanding between various groups of experts, and the recent discussions I read here and elsewhere between Eliezer Yudkowsky, Paul Christiano, Robin Hanson and others don't even seem to crux well, let alone represent the opponents' position faithfully (as confirmed by the other side). Charity has never been Eliezer's core strength, but I would have expected others at MIRI to help him there. So, before you can put a reasonable confidence interval on the question "Are these risks actually likely to materialize?" you may have to do some legwork to get the relevant experts to double-crux or something. 

An "expert alignment" problem, if you will, that needs to be solved before the AI alignment problem.

This is actually another related area of my research: To the extent that we cannot get people to sit down and agree on double cruxes, can we still assign some reasonable likelihoods and/or uncertainty estimates for those likelihoods? After all, we do ultimately need to make decisions here! Or if it turns out that we literally cannot use any numbers here, how do we best make decisions anyway?
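One concrete (and very much toy) way to represent "uncertainty about a likelihood" of the kind I mean here is to carry around a distribution over the probability rather than a point estimate - for example a Beta distribution. The parameters below are made up purely for illustration:

```python
# Toy sketch of uncertainty about a likelihood: instead of a single number for
# P(catastrophe | advanced AI), keep a distribution over that probability.
# The Beta(1.5, 4.0) choice here is purely illustrative.
from scipy.stats import beta

dist = beta(1.5, 4.0)

point_estimate = dist.mean()                # a single summary number
low, high = dist.ppf(0.05), dist.ppf(0.95)  # central 90% credible interval

print(f"point estimate ~ {point_estimate:.2f}, "
      f"90% interval roughly [{low:.2f}, {high:.2f}]")
```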

It's an interesting question; I think Scott A explored it in https://slatestarcodex.com/2019/06/03/repost-epistemic-learned-helplessness/. But it would likely be inferior to figuring out a way for people to either double-crux or at least do some kind of adversarial collaboration. That seems a lot easier than the problem we are trying to address, so what hope is there for the bigger problem if this one remains unresolved?

> This is actually what my PhD research is largely about: Are these risks actually likely to materialize? Can we quantify how likely, at least in some loose way? Can we quantify our uncertainty about those likelihoods in some useful way? And how do we make the best decisions we can if we are so uncertain about things?

I'd be really interested in your findings.