(Status: a short write-up of some things that I find myself regularly saying in person. In this case, I'm writing up my response to the question of why I don't spend a bunch more time trying to resolve disagreements with people in the community who disagree with me about the hopefulness of whatever research direction. I’m not particularly happy with it, but it’s been languishing in my draft folder for many months now and published is better than perfect.)
When I first joined the AI alignment community almost ten years ago, there were lots of disagreements—between groups like MIRI and Open Phil, between folks like Eliezer Yudkowsky and Paul Christiano, etc. At that time, I was optimistic about resolving a bunch of those disagreements. I invested quite a few hours in this project, over the years.
I didn't keep track exactly, but extremely roughly, I think the people with very non-MIRI-ish perspectives I spent the most time trying to converge with (including via conversation, reading and writing blog posts, etc.) were:
- Paul Christiano (previously at OpenAI, now at ARC): 100 hours? (Maybe as low as 50 or as high as 200.)
- Daniel Dewey (then at Open Phil): 40 hours? (Possibly 100+.)
- Nick Beckstead (then at Open Phil): 30 hours?
- Holden Karnofsky (Open Phil): 20 hours?
- Tom Davidson (Open Phil): 15 hours?
Another non-MIRI person I’ve spent at least a few hours trying to sync with about AI is Rohin Shah at DeepMind.
(Note that these are all low-confidence ass numbers. I have trouble estimating time expenditures when they’re spread across days in chunks that are spread across years, and when those chunks blur together in hindsight. Corrections are welcome.)
I continue to have some conversations like this, but my current model is that attempting to resolve older and more entrenched disagreements is not worth the time-cost.
It's not that progress is impossible. It's that we have a decent amount of evidence of what sorts of time-investment yield what amounts of progress, and it just isn't worth the time.
On my view, Paul is one of the field’s most impressive researchers. Also, he has spent lots of time talking and working with MIRI researchers, and trying to understand our views.
If even Paul and I can’t converge that much over hundreds of hours, then I feel pretty pessimistic about the effects of a marginal hour spent trying to converge with other field leaders who have far less context on what MIRI-ish researchers think and why we think it. People do regularly tell me that I’ve convinced them of some central AI claim or other, but it’s rarely someone whose views are as distant from mine as Paul’s are, and I don’t recall any instance of it happening on purpose (as opposed to somebody cool who I didn’t have in mind randomly approaching me later to say “I found your blog post compelling”).
And I imagine the situation is pretty symmetric, at this level of abstraction. Since I think I’m right and I think Paul’s wrong, and we’ve both thought hard about these questions, I assume Paul is making some sort of mistake somewhere. But such things can be hard to spot. From his perspective, he should probably view me as weirdly entrenched in my views, and therefore not that productive to talk with. I suspect that he should at least strongly consider this hypothesis, and proportionally downgrade his sense of how useful it is to spend an hour trying to talk some sense into me!
As long as your research direction isn't burning the commons, I recommend just pursuing whatever line of research you think is fruitful, without trying to resolve disagreements with others in the field.
Note that I endorse writing up what you believe! Articulating your beliefs is an important tool for refining them, and stating your beliefs can also help rally a variety of readers to join your research, which can be enormously valuable. Additionally, I think there’s value in reminding people that your view exists and reminding them to notice when observations support the view versus undermining it.
I'm not saying "don't worry about making arguments, just do the work"; arguments have plenty of use. What I'm saying is that the use of arguments is not in persuading others already entrenched in their views; the use of arguments lies elsewhere.
(Hyperbolic summary: you can convince plenty of people of things, you just can’t convince particular people of particular things. An argument crafted for a particular person with a very different world-view won’t convince that person, but it might convince bystanders.)
Also, to be clear, I think there's benefit in occasionally clashing with people with very different views. This can be useful for identifying places where your own arguments are weak and/or poorly articulated, and for checking whether others are seeing paths-to-victory that you've missed. (And such clashes sometimes generate artifacts that are useful for rallying other researchers.)
I continue to do this occasionally myself. I just don't think it's worth doing with the goal of converging with the person I'm talking with, if they’re a smart old-timer who’s consistently disagreed for a long time; that's an unrealistic goal.
I still think that people with entrenched views can undergo various small shifts in their views by talking to other entrenched people from other schools of thought. If you’re trying to get more of those small shifts, it may be well worth having more conversations like that; and small shifts may add up to bigger updates over time.
The thing I think you don’t get is the naive “just lock people in a room for however long it takes until they come out agreeing” that I sort of hoped was possible in 2014, and that I now think does not happen on the order of spending-weeks-together.
You might argue that the hours I invested in bridging some of the gaps between the wildly-different worldviews in this community were spent foolishly and inefficiently, and that someone more skillful may have more luck than me. That's surely the case for many of the hours; I'm no master of explaining my views.
I’ll note, however, that I’ve tried a variety of different tools and methods, from raw arguing to facilitated discussion to slow written back-and-forth to live written debate. In my experience, none of it seems to appreciably increase the rate at which long-standing disagreements get resolved.
Also, naive as some of my attempts were, I don't expect my next attempts to go significantly better. And I’ve seen a number of others try to bridge gaps within the field where I failed, and I have yet to see promising results from anyone else either.
Resolving entrenched disagreements between specific people just is not that easy; the differences in world-view are deep and fractal; smart established thinkers in the field already agree about most things that can be cheaply tested.
I'd love to be challenged on this count. If you think I’m being an idiot and ignoring an obvious way to resolve lots of major entrenched disagreements within the field, feel free to give it yet another shot. But until the counterevidence comes in, I plan to continue saying what I believe, but not much worrying about trying to bridge the existing gaps between me and specific individuals I’ve already talked to a lot, like Paul.
(For the record, this has been my stance since about 2017. For most of the intervening five years, I also considered it not really worth the time to write up a bunch of my beliefs, preferring to do direct alignment research. But my estimate of the benefits from my direct alignment research has fallen, so here we are!)
I'm not much moved by these types of arguments, essentially because (in my view) the level of meta at which they occur is too far removed from the object level. If you look at the actual points your opponents lay out, and decide (for whatever reason) that you find those points uncompelling... that's it. Your job here is done, and the remaining fact that they disagree with you is, if not explained away, then at least screened off. (And to be clear, sometimes it is explained away, although that happens mostly with bad arguments.)
Ditto for outside view arguments—if you've looked at past examples of tech, concluded that they're dissimilar from AGI in a number of ways (not a hard conclusion to reach), and moreover concluded that some of those dissimilarities are strategically significant (a slightly harder conclusion, and one that some people stumble before reaching—but not, ultimately, that hard), then the base rates of the category being outside-viewed no longer contain any independently relevant information, which means that—again—your job here is done.
(I've made comments to similar effect in the past, and plan to continue trumpeting this horn for as long as the meme to which it is counter continues to exist.)
This does, of course, rely on your own reasoning being correct, in the sense that if you're wrong, well... you're wrong. But of course, this really isn't a particularly special kind of situation: it's one that recurs all across life, in all kinds of fields and domains. And in particular, it's not the kind of situation you should cower away from in fear, not if your goal is actually grasping the reality of the situation.
***
And finally (and obviously), all of this only applies to the person making the updates in the first place (which is why, you may notice, everything above the asterisks seems to inhabit the perspective of someone who believes they understand what's happening, and takes for granted that it's possible for them to be right as well as wrong). If you're not in the position of such an individual, but instead conceive of yourself as primarily a third party, an outsider looking in...
...well, mostly I'd ask what the heck you're doing, and why you aren't either (1) trying to form your own models, to become one of the People Who Can Get Things Right As Well As Wrong, or—alternatively—(2) deciding that it's not worth your time and effort, either because of a lack of comparative advantage, or just because you think the whole thing is Likely To Be Bunk.
It kind of sounds like you're on the second path—which, to be clear, is totally fine! One of the predictable consequences of Daring to Disagree with Others is that Other Others might look upon you, notice that they can't really tell who's right from the outside, and downgrade their confidence accordingly. That's fine, and even good in some sense: you definitely don't want people thinking they ought to believe something even in [what looks to them like] the absence of any good arguments for it; that's a recipe for irrationality.
But that's the whole point, isn't it? The perspectives of the Insider (the Researcher Trying to Get At the Truth) and the Outsider (the Bystander Peering Through the Windows) will not look identical, and for an obvious reason: they're different people standing in different (epistemic) places! Neither one of them should agonize about the fact that the former has a tighter probability distribution than the latter; that's what happens when you proceed further down the path. Ideally it's the right path, but any path has the same property: your probability distribution narrows as you go further down, and your models become more specific and more detailed.
So go ahead and downgrade your assessment of "LW epistemics" accordingly, if that's what you've decided is the right thing to do in your position as the outsider looking in. (Although I'd argue that what you'd really want is to downgrade your assessment of MIRI, instead of LW as a whole; they're the most extreme ones in the room, after all. For the record, I think this is Pretty Awesome, but your mileage may vary.) But don't demand that the Insider be forced to update their probability distribution to match yours—to widen their distribution, to walk back the path they've followed in the course of forming their detailed models—simply because you can't see what [they think] they're seeing, from their vantage point!
Those people are down in the trenches for a reason: they're investigating what they see as the most likely possibilities, and letting them do their work is good, even if you think they haven't justified their (seeming) confidence level to your satisfaction. They're not trying to.
(Oh hey, I think that has something to do with the title of the post we're commenting on.)