good post, thanks.
one thing I'd highlight: there's a point where you conflate two claims that feel very different to me:
“It seems plausible that the best thing to do if you really take AI x-risk seriously is to just stop working on AI at all.”
And that’s what I’ve been trying to say this whole time, whenever anyone asks me about my career. That I don’t want to try to have a big impact, if I can’t be certain that that impact will be positive rather than negative for the world — and I can’t be certain.
I think that there are a bunch of "AI safety" people such that the world would be better off if they stopped working on AI at all. But that doesn't mean that they (or anyone else) should be aiming to have certainty of positive impact—that's a very high bar.
Instead, one way I think about it is that there's a skill of avoiding self-deception (and of being virtuous more generally), and the more you cultivate this skill, the more you're able to have a robustly positive impact even when you're not certain.
Any pointers to further reading on cultivating self-deception-avoidance to robust-ify positive impact? At a glance, Distributed vs centralized agents doesn't seem to be about this.
My post on pessimization talks about a bunch of the mechanisms by which you might have negative impact.
I have some posts in the works on virtue ethics, but for now probably the most relevant thing I've written is this sequence on replacing fear. My sense is that a lot of self-deception is caused by fear-based motivations.
Trying to avoid self-deception seems like an important piece of it (although it seems non-trivial; e.g. it's easy to self-deceive about one's own level of self-deception). But for high-variance, high-impact stuff, it separately seems especially important to try to take actions which are good over as many worlds as possible. Consequentialism doesn't necessarily do this, since single factors can dominate the calculus. That causes optimizer's-curse problems, but more generally: in highly uncertain domains, probability estimates are just really often wrong. And especially when such a misstep can cause massive harm, I think it's also worth compensating for the uncertainty in the direction of being more robust to those errors.
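To gesture at the optimizer's-curse point concretely, here's a toy simulation (my own illustration; the numbers are arbitrary). Every candidate action has true value zero, but you only see noisy estimates, and you act on whichever looks best:

```python
import random

# Optimizer's curse, minimal version: all actions are actually worthless,
# but the one with the highest noisy estimate looks good by construction.
random.seed(0)

N_ACTIONS, N_TRIALS = 20, 10_000
gap = 0.0
for _ in range(N_TRIALS):
    estimates = [random.gauss(0.0, 1.0) for _ in range(N_ACTIONS)]
    gap += max(estimates)  # estimate minus true value (0) of the chosen action

print(f"Average overestimate of the chosen action: {gap / N_TRIALS:.2f}")
# ~1.9 with 20 options; the more options you compare, the worse the bias.
```

Selecting on the estimate guarantees you act on the options whose errors happen to be most favorable, which is exactly the regime where "probability estimates are just really often wrong" bites hardest.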
I’d like to discuss these competing heuristics in the context of AI safety:

A. Only act when you're highly confident the impact will be positive.
B. Take every action that looks positive in expected value.
Which heuristic should a random parent who doesn't know you hope you follow, if they want their kid to live a long good life? There’s a prima facie argument for B: if reality deviates from your estimates in an unbiased fashion (could be more good effects than you were accounting for, or more bad ones, in a pretty even mix), it helps the kid if you take all the actions that look EV-positive to you, without restricting yourself to “only if I’m certain”.
But, I think in AI safety it’s often closer to A. My reasoning:
And so, if a person is working off a weak signal ("I thought over all the arguments and this one seems more intuitively plausible, and the impact is big, so it's worth acting on for some years despite no feedback loops"), on something big enough that the distributed "build AI" optimization process discusses the relevant considerations a lot, their weak-but-real ability to weigh considerations may be swamped by the meme-network's tendency to get distorted by "build AI" optimization.
I suspect it may often be the case that the "let's not let AI kill everyone" meme brings in lots of psychological energy/motivation that lets smart, high-integrity people work hard in response to relatively tenuous arguments, in ways people normally can't. And then the "build AI fast" optimizer co-opts their effort and flips its sign to negative (since it gets much better feedback loops, but has a harder time pulling in high-integrity people on its own).
(If a person instead takes smaller actions that they predict will be visibly/obviously good within relatively short periods of time, this is much less of a problem, since inaccurate models are easy to notice and fix in such contexts. And doing small-scale things with solid feedback loops can set one up to do somewhat larger-scale things that still have solid feedback loops.)
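To make the A-vs-B contrast concrete, here's a toy simulation (everything in it is my own made-up assumption, not anything established above): ordinary actions come with honest-if-noisy estimates and decent feedback, while rare "big" actions come with confident-looking estimates that are blind to a ruinous tail, standing in for tenuous arguments with no feedback loops:

```python
import random

# Heuristic A: act only when the signal is large relative to your uncertainty.
# Heuristic B: act on anything whose estimate is positive.
# Assumed world: 1000 ordinary actions with calibrated estimates, plus 20
# "big" actions whose estimates look great but miss a 20% chance of disaster.
random.seed(0)

def simulate(policy):
    total = 0.0
    for _ in range(1000):                  # ordinary actions, decent feedback
        v = random.gauss(0.2, 1.0)
        est, unc = v + random.gauss(0.0, 1.0), 1.0
        if policy(est, unc):
            total += v
    for _ in range(20):                    # big actions, weak signal
        v = -200.0 if random.random() < 0.2 else random.gauss(5.0, 2.0)
        est, unc = random.gauss(8.0, 3.0), 50.0  # estimate blind to the tail
        if policy(est, unc):
            total += v
    return total

policy_b = lambda est, unc: est > 0            # B: every positive estimate
policy_a = lambda est, unc: est > 2.0 * unc    # A: only strong signals

for name, p in [("A", policy_a), ("B", policy_b)]:
    runs = sorted(simulate(p) for _ in range(300))
    print(f"{name}: mean {sum(runs) / 300:7.1f}, worst-5% {runs[15]:7.1f}")
```

On the ordinary actions B beats A (that's the prima facie argument above), but under the assumption that big-action estimates can't see the tail, B's totals go deeply negative while A sits the big bets out. The result is rigged by that assumption, of course; the point is just that that assumption is what the "no feedback loops" worry amounts to.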
I think you're right that AI safety is case A, but I'd suggest you add reversibility to your reasons. If we can just turn it off (ignoring influence campaigns to get us not to), there's no risk; the problem is if it's one-way. And outside of AI takeover scenarios, there's substantial evidence that some would consider ending sentience a kind of murder.
I posted my own take on this subject a while ago.
That I don’t want to try to have a big impact, if I can’t be certain that that impact will be positive rather than negative for the world — and I can’t be certain.
I've been doing something similar, in part because I seem to instinctively shy away from high-stakes situations. But I haven't publicly endorsed this, because excessively dissuading people from attempting to have big positive impacts could itself constitute a big negative impact, and I'm not sure how to balance this. (Hence the "managing risks" framing in my post linked above, which seems "safer" from this perspective.)
Also, I make an exception for myself: it's ok to talk about my abstract ideas (which can cause big impacts if others eventually translate them into concrete ideas and actions), provided I take sufficient precautions. This seems justified because if nobody talked about their ideas, the world would be stuck in its current intellectual state, so there seems to be no alternative.
I get stuck in analysis paralysis a lot. When you're used to clean logical models, the messy, irrational friction of the real world is just exhausting. I usually end up freezing and doing nothing.
We're stuck in a classic multi-polar trap — a Moloch dynamic. People who actually care about ethics are backing out, while the big labs slash safety budgets to push capabilities. In a system with no coordination, walking away doesn't fix the trajectory. It just hands the steering wheel to the actors willing to take the biggest risks.
I sometimes wonder if a real pause could have worked back before Eliezer and Nick put AGI on the map. But with so much money locked in and the path already set, a full retreat just isn't going to happen now.
That's also why I'm not working on AI. But there are plenty of ambitious things to do where there's huge upside and no plausible way the world will end if it goes wrong. (Like making human eggs super abundant, for example.)
Not sure how to tag people, but I see abstractapplic and Epiphanie Gedeon questioning:
The effect of the latter [funding insecticide-treated bed nets to protect people from malaria, and then those nets are used for fishing and pollute the waterways] has been determined to be insignificant
This footnote in GiveWell's writeup evaluating mass distribution of ITNs outlines their thinking; there's more in this GW blog post and this spreadsheet summarizing net-usage data:
Factors we have excluded
A number of potential benefits and offsetting impacts have been excluded from our model altogether. We exclude these factors either because we are uncertain how to interpret them, we expect their impact to be very small, or they are accounted for in other ways. ...
- Using ITNs for fishing in waterside, food-insecure communities. A 2015 New York Times article describes people using ITNs for fishing instead of sleeping under the nets to protect themselves from malaria-carrying mosquitoes. We believe this problem is unlikely to be widespread, and we see it as a much smaller problem than people lacking nets for preventing malaria (details in footnote).322
322: The ITN distribution programs we have supported conduct monitoring surveys to determine whether recipients use nets as intended. Our largest grantee to date, Against Malaria Foundation, has generally found moderate-to-high usage rates (in the 60 to 80% range, depending on the country and length of time since the campaign). These results are broadly in line with evidence from other surveys; for more detail on the wider evidence on ITN usage, see our response to the New York Times article. For more detail about the usage monitoring data we have seen from distributions that GiveWell has funded, see our page on Against Malaria Foundation’s program.
Ah, I see where I was misreading; 'the latter' could have meant either the bednets or their unwanted side-effects, but on rereading, the "unwanted side-effects" reading seems more plausible. Ty.
AI (and a lot of other things) suffers from the unilateralist's curse: something very bad will get done if many people are capable of doing it and there's enough variation in their cost-benefit estimates. This is also true for a single person, if your own cost-benefit estimate changes over time and you can't undo what you do.
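A minimal sketch of that dynamic (my own illustration, with made-up numbers): suppose an action's true value is negative, each capable actor estimates it with unbiased noise, and the action gets taken if any one of them concludes it's positive:

```python
import random

# Unilateralist's curse: the action's true value is -1, every actor's
# estimate is unbiased (noise ~ N(0, 1)), and it only takes ONE actor
# with a sufficiently wrong estimate for the action to happen.
random.seed(0)

TRUE_VALUE = -1.0
TRIALS = 10_000

for n_agents in (1, 5, 20, 100):
    taken = sum(
        any(random.gauss(TRUE_VALUE, 1.0) > 0 for _ in range(n_agents))
        for _ in range(TRIALS)
    )
    print(f"{n_agents:3d} actors: bad action taken {taken / TRIALS:.0%} of the time")
# One actor misfires ~16% of the time; with 20, it's nearly certain someone does.
```

The single-person version falls out of the same code: treat the n "agents" as your own estimate re-drawn at n different times, with no undo.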
That said, it's okay to leave EA to be happier without saying that EA is wrong. I really don't think the nets polluting the waterways outweigh the human lives saved. Some things are necessary even if they have harmful side effects. We can't get rid of the police entirely just because bad police sometimes kill an innocent person.
I very much agree with the sentiment, and feel like this points at one of the biggest dilemmas out there:
to wield great power is necessarily to be able to do great harm, whether by accident, by negligence, or on purpose. And there is an argument that to even want great power is insane - as you said, it requires self-belief beyond what is justifiable. The correct amount of confidence in one's own sanity and beliefs is less than what is required to take on the typical risks of founding a business.
But also, not wielding power inevitably leaves it to the ones who are that kind of crazy, and even trying to stop them is itself a feat of power.
I guess the only answer would require political power wielded by some kind of emergent collective intelligence, but we all know how well that tends to pan out too.
"“It seems plausible that the best thing to do if you really take AI x-risk seriously is to just stop working on AI at all.”"
Thanks for sharing your story.
I suspect this argument is only true for a small subset of people - those who always wanted to be a founder, or a success, or to leave their mark on the world - and who just can't help themselves if an exciting enough opportunity comes their way. In that case, cold turkey might be the only option.
It doesn't sound like you are one of those people.
I don't think it's just that. The classic case would be, maybe you're just a humble academic researcher publishing papers on interpretability for no financial reward. But then someone uses your findings to turbocharge capabilities instead and also makes a ton of money off it. That feels like adding insult to injury.
Generalizing, this looks like gambler's ruin: even positive-EV bets can be bad bets if the losses would be unrecoverable - "quadruple or nothing at 50%, but you're betting all you have" predictably ends with you having nothing if you keep playing long enough. Except not with units of money, but with units of motivation, or of feeling like a good person.
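To spell out the "quadruple or nothing" arithmetic (illustrative numbers only), a quick sketch:

```python
import random

# Each round: quadruple your stake with p=0.5, lose it all otherwise.
# EV per round is 2x your stake, but the all-in player can't recover a loss.
random.seed(0)

def play_all_in(rounds=10):
    wealth = 1.0
    for _ in range(rounds):
        if random.random() < 0.5:
            wealth *= 4        # win: quadruple
        else:
            return 0.0         # lose: unrecoverable
    return wealth

runs = [play_all_in() for _ in range(100_000)]
print(f"Mean wealth: {sum(runs) / len(runs):.0f}")                   # ~2^10 = 1024
print(f"Busted runs: {sum(w == 0 for w in runs) / len(runs):.2%}")   # ~99.9%
```

The expectation is excellent, and almost nobody gets it. (The Kelly criterion makes the same point: for this bet the growth-optimal stake is a third of your bankroll, never all of it.)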
If the bet is "amazingly good impact or ruinously bad impact", you probably shouldn't take that bet unless you're pretty certain it's not going to turn out ruinously bad. And more generally, you shouldn't take bets you can't afford to lose, except in some edge cases.
I think those who have a hubristic level of certainty that their AI impact will be good are in some cases gambling with other people's money/lives, and that's bad. If it's bad for me to bet everything I have unless I'm pretty sure I'm going to win, it's extra bad for someone to take all my money without my consent and make a bet that they're just kinda sure is probably positive-EV, but could leave me with nothing. And gambling with my life is worse than gambling with my money.
On the other hand, if you're betting with your own resources, or resources someone has willingly given you knowing the risks, and the upside is big and the downside is big but not ruinously so, I'd say go for it if you feel like it.
I would be curious if you think the following take is naive or reasonable?
It seems to me that a lot of bad AI decisions boil down to building for scale, and in doing so letting go of the kind of environment that would be conducive to producing good thinking? And since the VC and general entrepreneurial scaling model isn't aligned with that purpose, a lot of the organisations we see end up going down that path?
Isn't it then very important to be able to provide a space for ambitious people to work on something real in an environment that is optimised for the precursors of calm and friendly thought? (Since most other places will be optimised for scale.)
I think you can put fun, curiosity, and a positive impact direction at the forefront of an organisation and not fall into the traps you've described. I think the trick is to not have an EA-oriented hardcore impact-evaluation frame, as I think that leads to fear, pressure, and generally worse decision-making. I'll see if this holds empirically, but that's also why I'm asking what you think: I've had similar thoughts about the EA + startup sphere, and this is the response/solution I've come up with (and am trying).
When I was 21, I was sucked into a world of ambition.
Starting my adult life in the Bay Area, I was surrounded by the sense that I was supposed to start a startup, change the world.
I never wanted to start a startup. Reading stories of famous founders, and living and working with startup founders myself, it seemed to me that the amount of belief you’d have to have in yourself and your idea bordered on insanity. Raised to value humility, and unable even to speak up for myself, I couldn’t imagine ever reaching that level of self-belief.
The version of changing the world that appealed to me was effective altruism. I didn’t have a grand vision; I just wanted to help people. The arguments for it seemed so simple, so obviously correct, when laid out in books and blog posts.
Right out of college, I joined an EA organization that worked with governments around the world on projects that cost tens of millions of dollars. One day at work, I was getting some beans and rice from the kitchen when I ran into a billionaire. All the money and power in the world were suddenly right there — and we were using them to save lives.
I coasted along for a time in that dream of changing the world for the better. I was young; many of us were. More than one person I knew had influence over millions of dollars before they were 25. The movement was young, too — too new to power to have yet stumbled into many of the pitfalls that come with it. As the movement grew faster and faster, accruing more followers, more money, and more political influence, it began to seem like we could do absolutely anything. It was a heady feeling.
Then, when I was 26, FTX collapsed. Suddenly, we all had to reckon with the effects of global-scale ambition. When it goes right, you can fund every charity and swing the election for Biden. When it goes wrong, you’ve been complicit in a criminal enterprise that shook the economy and fucked over a million people.
(I read Careless People last week, a memoir about how Facebook’s success put world-changing power in the hands of a few individuals, who were able to wield it almost entirely unchecked. When it goes right, you get democratic uprisings. When it goes wrong, you get genocide in Myanmar, and Trump as president.)
Around the same time as the FTX collapse, an AI arms race was beginning between OpenAI and Anthropic — two labs formed by people who’d been inspired by Bostrom’s Superintelligence, as we all were. By the logic of Superintelligence, it was just about the worst thing that could have happened.
People close to me were thrown into turmoil and depression. We’d done so much in the AI space, supporting and growing AI safety in all sorts of important ways — things that probably wouldn’t have happened without us. Now it seemed that all the investment that had gone into AI safety had had the primary effect of massively accelerating AI capabilities.
You try big things, you get big results.
I quit my EA job the month FTX collapsed, and I haven’t done anything in the space since. It wasn’t a big, dramatic, or even really deliberate decision. I was just burned out and disillusioned.
I still care about the world, and I’ve spent years feeling vaguely guilty that I’m no longer even pretending to work on its biggest problems. I thought I quit EA because I wanted to be happy (as an EA, I was constantly coercing myself to work on things that felt off to me, and was therefore constantly miserable). This felt like selfishness, or laziness. I struggled to justify myself in any other terms.
I don’t feel guilty anymore. I was talking about all this to a friend recently, and he said, “It seems plausible that the best thing to do if you really take AI x-risk seriously is to just stop working on AI at all.”
And that’s what I’ve been trying to say this whole time, whenever anyone asks me about my career. That I don’t want to try to have a big impact, if I can’t be certain that that impact will be positive rather than negative for the world — and I can’t be certain. To be certain of that would be hubris. Both in the memoirs I’ve read and in my real life, I’ve seen people who have genuinely wanted to change things for the better, gotten into the rooms where the sausage gets made, and ended up sickened by the consequences of what they were involved in.
EA funnels millions of dollars around. It funds career development for AI researchers who end up advancing capabilities at frontier labs. It funds insecticide-treated bed nets to protect people from malaria, and then those nets are used for fishing and pollute the waterways. The effect of the latter has been determined to be insignificant. The former, well, I guess it remains to be seen.