a strategy like "get existing top AGI researchers to stop"
There's a (hopefully obvious) failure mode where the AGI doomer walks up to the AI capabilities researcher and says “Screw you for hastening the apocalypse. You should join me in opposing knowledge and progress.” Then the AI capabilities researcher responds “No, screw you, and leave me alone”. Not only is this useless, but it's strongly counterproductive: that researcher will now be far more inclined to ignore and reject future outreach efforts (“Oh, pfft, I’ve already heard the argument for that, it’s stupid”), even if those future outreach efforts are better.
So the first step to good outreach is not treating AI capabilities researchers as the enemy. We need to view them as our future allies, and gently win them over to our side by the force of good arguments that meet them where they're at, in a spirit of pedagogy and truth-seeking.
(You can maybe be more direct with someone that they're doing counterproductive capabilities research when they're already sold on AGI doom. That's probably why your conversation at EleutherAI discord went OK.)
(In addition to “it would be directly super-counterproductive”, a second-order reason...
The history of cryonics' PR failure has something to teach here.
Dozens of deeply passionate and brilliant people all trying to make a case for something that in fact makes a lot of sense…
…resulted in it being seen as even more fringe and weird.
Which in turn resulted in those same pro-cryonics folk blaming "deathism" or "stupidity" or whatever.
Which reveals that they (the pro-cryonics folk) had not yet cleaned up their motivations. Being right and having a great but hopeless cause mattered more than achieving their stated goals.
I say this having been on the inside of this one for a while. I grew up in this climate.
I also say this with no sense of blame or condemnation. I'm just pointing out an error mode.
I think you're gesturing at a related one here.
This is why I put inner work as a co-requisite (and usually a prerequisite) for doing worthwhile activism. Passion is an anti-helpful replacement for inner insight.
I think this is basically correct: if people don't get right with their own intentions and motivations, it can sabotage their activism work.
(You can maybe be more direct with someone that they're doing counterproductive capabilities research when they're already sold on AGI doom. That's probably why your conversation at EleutherAI discord went OK.)
Correct, and that's why I took that approach.
So imagine my surprise when I informally learn that this sort of thinking is taboo.
It's not taboo. I've been discussing whether we should do this with various people off and on for the past five years. People take these ideas seriously. Just because people don't agree, or don't take it seriously enough, doesn't mean it's taboo!
FWIW I think it's a good idea too (even though for years I argued against it!). I think it should be done by a well-coordinated group of people who did lots of thinking and planning beforehand (and coordinating with the broader community probably) rather than by lone wolves (for unilateralist's curse reasons.)
It seems “taboo” to me. Like, when I go to think about this, I feel … inhibited in some not-very-verbal, not-very-explicit way. Kinda like how I feel if I imagine asking an inane question of a stranger without a socially sensible excuse, or when a clerk asked me why I was buying so many canned goods very early in Covid.
I think we are partly seeing the echoes of a social flinch here, somehow. It bears examining!
Open tolerance of the people involved with status quo and fear of alienating / making enemies of powerful groups is a core part of current EA culture! Steve's top comment on this post is an example of enforcing/reiterating this norm.
It's an unwritten rule that seems very strongly enforced yet never really explicitly acknowledged, much less discussed. People were shadow blacklisted by CEA from the Covid documentary they funded for being too disrespectful in their speech re: how governments have handled covid. That fits what I'd consider a taboo, something any socially savvy person would pick up on and internalize if they were around it.
Maybe this norm for open tolerance is downstream of the implications of truly considering some people to be your adversaries (which you might do if you thought delaying AI development by even an hour was a considerable moral victory, as the OP seems to). Doing so does expose you to danger. I would point out that lc's post analogizes their relationship with AI researchers to Israel's relationship with Iran. When I think of Israel's resistance to Iran, nonviolence is not the first thing that comes to mind.
FYI, I thought this sort of idea was an obvious one, and I've been continuously surprised that it didn't have more discussion. I don't feel inhibited and am sort of surprised you are.
(I do think there's a lot of ways to do this badly, with costs on the overall coordination-commons, so, maybe I feel somewhat inhibited from actually going off to do the thing. But I don't feel inhibited from brainstorming potential ways to address the costs and thinking about how to do it)
The main argument I made was: It's already very hard to get governments and industry to meaningfully limit fossil fuel emissions, even though the relevant scientists (climatologists etc.) are near-unanimous about the long-term negative consequences. Imagine how much harder it would be if there wasn't a separate field of climatology, and instead the only acknowledged experts on the long-term effects of fossil fuels were... petroleum industry engineers and executives! That's the situation with AGI risk. We could create a separate field of AI risk studies, but even better would be to convince the people in the industry to take the risk seriously. Then our position would be *better* than the situation with climate change, not worse. How do we do this? Well, we do this by *not antagonizing the industry*. So don't call for bans, at least not yet.
The two arguments that changed my mind:
(a) We are running out of time. My timelines dropped from median 2040ish to median 2030ish.
(b) I had assumed that banning or slowing down AGI research would be easier the closer we get to AGI, because people would "wake up" to the danger after seeing compelling demonstrations and warning shots etc. However I...
Oh god, sorry, I just can't stop myself. I mean his political reputation is shredded beyond hope of repair. Loathed by the people of the UK in the same way that Tony Blair is, and seen as brilliant but disloyal in the same way that the guy in Mad Men is after he turns on the tobacco people.
We may be touching on the mind-killer here. Let us speak of such things no further.
Dominic Cummings lives, a prosperous gentleman.
Dominic Cummings is not dead, and I should remember that my ironic flourishes are likely to be taken literally because other people on the internet don't have the shared context that I would have if I was sounding off in the pub.
Agree that this is definitely a plausible strategy, and that it doesn't get anywhere near as much attention as it seemingly deserves, for reasons unknown to me. Strong upvote for the post, I want to see some serious discussion on this. Some preliminary thoughts:
You should submit this to the Future Fund's ideas competition, even though it's technically closed. I'm really tempted to do it myself just to make sure it gets done, and very well might submit something in this vein once I've done a more detailed brainstorm.
Probably a good idea, though I'm less optimistic about the form being checked. I'll plan on writing something up today. If I don't end up doing that today for whatever reason, akrasia, whatever, I'll DM you.
I am confident that they would not let it being technically closed stop them from considering the proposal if someone they respected pointed at the proposal (and likely even if they didn't), and I'd be happy to do the pointing for you if necessary. If that seems necessary, DM me.
Please DM me if you end up starting some sort of project along these lines. I have some (admittedly limited) experience working in media/public relations, and can probably help a bit.
For anyone else also interested, you should add yourself on this spreadsheet. https://docs.google.com/spreadsheets/d/1WEsiHjTub9y28DLtGVeWNUyPO6tIm_75bMF1oeqpJpA/edit?usp=sharing
It's very useful for people building such an organisation to know of interested people, and vice versa.
If you don't want to use the spreadsheet, you can also DM me and I'll keep you in the loop privately.
If you're making such an organisation, please contact me. I'd like to work with you.
Not that you said otherwise, but just to be clear: it is not the case that most capabilities researchers at DeepMind or OpenAI have similar beliefs as people at EleutherAI (that alignment is very important to work on). I would not expect it to go well if you said "it seems like you guys are speeding up the deaths of everyone on the planet" at DeepMind.
Obviously there are other possible strategies; I don't mean to say that nothing like this could ever work.
Completely understood here. It'd be different for OpenAI, even more different for DeepMind. We'd have to tailor outreach. But I would like to try experimentation.
it seems like you guys are speeding up the deaths of everyone on the planet
I've found over the years that people only ever get really angry at you for saying curious things if they think those ideas might be true. Maybe we're already half-way there at DeepMind!
One thing to consider: if we successfully dissuade DeepMind researchers, who actually do take alignment issues a little bit seriously, from working on AGI, does it instead get developed by Meta researchers who (for the sake of argument) don't care?
More generally you're not going to successfully remove everyone from the field. So the people you'll remove will be those who are most concerned about alignment, leaving those who are least concerned to discover AGI.
It's certainly a consideration, and I don't want us to convince anybody who is genuinely working on alignment at DeepMind to leave. On the other hand, I don't think their positive affinity towards AGI alignment matters much if what they are doing is accelerating its production. This seems a little like saying "you shouldn't get those Iranian nuclear researchers to quit, because you're removing the 'checks' on the other more radical researchers". It's probably a little naive to assume they're "checking" the other members of DeepMind if their name is on this blog post: https://www.deepmind.com/blog/generally-capable-agents-emerge-from-open-ended-play.
This is also probably ignoring the maybe more plausible intermediate success of outreach, which is to get someone who doesn't already have concerns to have them, who then keeps their job because of inertia. We can still do a lot of work getting those who refuse to quit to help move organizations like OpenAI or DeepMind more toward actual alignment research, and create institutional failsafes.
Thanks a lot for doing this and posting about your experience. I definitely think that nonviolent resistance is a weirdly neglected approach. "mainstream" EA certainly seems against it. I am glad you are getting results and not even that surprised.
You may be interested in discussion here, I made a similar post after meeting yet another AI capabilities researcher at FTX's EA Fellowship (she was a guest, not a fellow): https://forum.effectivealtruism.org/posts/qjsWZJWcvj3ug5Xja/agrippa-s-shortform?commentId=SP7AQahEpy2PBr4XS
Are you aware of Effective Altruism's AI governance branch? I didn't look into it in detail myself, but there are definitely dozens of people already working on outreach strategies that they believe to be the most effective. FHI, CSER, AI-FAR, GovAI, and undoubtedly more groups have projects ongoing for outreach, political intervention, etc. with regards to AI Safety. If you want to spend your marginal time on stuff like this, contact them.
It does appear true that the lesswrong/rationalist community is less engaged with this strategy than might be wise, but I'm curious whether those organisations would say that people currently working on technical alignment research should switch to governance/activism, and what their opinion is on activism. 80,000 hours places AI technical research above AI governance in their career impact stack, though personal fit plays a major part.
I was not aware. Are these outreach strategies towards the general public with the aim of getting uninvolved people to support AI alignment efforts, or are they toward DeepMind employees to get them to stop working so hard on AGI? I know there are lots of people raising awareness in general, but that's not really the goal of the strategy that I've outlined.
Much of the outreach efforts are towards governments, and some to AI labs, not to the general public.
I think that because of the way crisis governance often works, if you're the designated expert in a position to provide options to a government when something's clearly going wrong, you can get buy in for very drastic actions (see e.g. COVID lockdowns). So the plan is partly to become the designated experts.
I can imagine (not sure if this is true) that even though an 'all of the above' strategy like you suggest seems like on paper it would be the most likely to produce success, you'd get less buy in from government decision-makers and be less trusted by them in a real emergency if you'd previously been causing trouble with grassroots advocacy. So maybe that's why it's not been explored much.
This post by David Manheim does a good job of explaining how to think about governance interventions, depending on different possibilities for how hard alignment turns out to be: https://www.lesswrong.com/posts/xxMYFKLqiBJZRNoPj/
Seems to me like a blindingly obvious post that was kind of outside of the overton window for too long. Eliezer also smashed the window with his TIME article, but this was first, so I think it's still a pretty great post. +4
I think this post makes sense given the premises/arguments that I think many people here accept: that AG(S)I is either amazingly good or amazingly bad, and that getting the good outcome is a priori vastly improbable, and that the work needed to close the gap between that prior and a good posterior is not being done nearly fast enough.
I don't reject those premises/arguments out of hand, but I definitely don't think they're nearly as solid as I think many here do. In my opinion, the variance in goodness of reasonably-thinkable post-AGSI futures is mind-bogglingly large, but it's still probably a bell curve, with greater probability density in the "middle" than in super-heaven or ultra-hell. I also think that just making the world a better place here and now probably usually helps with alignment.
This is probably not the place for debating these premises/arguments; they're the background of this post, not its point. But I do want to say that having a different view on that background is (at least potentially) a valid reason for not buying into the "containment" strategy suggested here.
Again, I think my point here is worthwhile to mention as one part of the answer to the post's question "why don't more people think in terms of containment". I don't think that we're going to resolve whether there's space in between "friendly" and "unfriendly" right here, though.
And the best part of this is that instead of telling people who might already be working in some benign branch of ML that there's this huge problem with AGI, who can potentially defect and go into that branch because it sounds cool, you're already talking to people who, from your perspective, are doing the worst thing in the world. There's no failure mode where some psychopaths are going to go be intrigued by the "power" of turning the world into paperclips. They're already working at DeepMind or OpenAI. Personally, I think that failure mode is overblown, but this is one way you get around it.
If this is saying that there's no plausible downside, that statement seems incorrect. It's not a very important bit whether or not someone has a narrative of "working on AGI". It takes 2 minutes to put that on a resume, and it would even be defensible if you'd ever done anything with a neural net. More important is the organizing principles of their technical explorations. There's a whole space of possible organizing principles. If you're publicizing your AI capabilities insights, the organizing principle of "tweak and mix algorithms to get gains on benchmarks" is less burning the AGI fu...
An argument like "well, what if you actually radicalize the Iranians into hardening their stance on developing nuclear weapons through all of this discouragement" would also be unsatisfying.
Bruce Bueno de Mesquita's game theoretic computer models predict that this is what happens for the sanctions in Iran. They destroy the domestic opposition to developing nuclear weapons.
Going a bit meta, the fact that this post seems to receive majority agreement makes me question the degree to which the consensus against it is real (as opposed to a signaling equilibrium). And I also want to mention that I figured from the title that this was about AI boxing, and it's possible that others haven't clicked on it because AI boxing doesn't seem that interesting.
I am surprised that a post with nearly 650 karma doesn't have a review yet. It seems like it should have at least one so it can go through to the voting phase.
What do you think about offering an option to divest from companies developing unsafe AGI? For example, by creating something like an ESG index that would deliberately exclude AGI-developing companies (Meta, Google etc) or just excluding these companies from existing ESGs.
The impact = making AGI research a liability (being AGI-unsafe costs money) + raising awareness in general (everyone will see AGI-safe & AGI-unsafe options in their pension investment menu + a decision itself will make a noise) + social pressure on AGI researchers (equating them...
I’m not persuaded by this argument because developing AGI is not purely a negative. AGI could either have a massive positive result (an aligned AI) or a massive negative result (an unaligned AI). Because of winner-takes-all effects, it’s likely that whichever AI is developed first will determine the overall outcome. So one way to decrease the chance of an unfriendly AI is to stop all AI research and another way to do this is to develop a friendly AI first.
We know that multiple agencies worldwide are working on developing AI. Moreover, it might be that th...
The consensus among alignment researchers is that if AGI were developed right now it would be almost certainly a negative. We simply don't know how one would ensure a superintelligence was benevolent yet, even theoretically. The argument is more convincing if you agree with that assessment, because the only way to get benevolent AI becomes to either delay the creation of AGI until we do have that understanding or hope that the understanding arrives in time.
The argument also becomes more convincing if you agree with the assessment that advancements toward AGI aren't going to be driven mostly by Moore's law and are instead going to be concentrated in a few top research and development companies - DeepMind, Facebook AI labs, etc. That's my opinion and it's also one I think is quite reasonable. Moore's law is slowing down. It's impossible for someone like me to predict how exactly AGI will be developed, but when I look at the advancements in AGI-adjacent-capabilities-research in the last ten years, it seems like the big wins have been in research and willingness to spend from the big players, not increased GPU power. It's not like we know of some algorithm right now which we just need 3 OOMs more compute for, that would give us AGI. The exception to that would maybe be full brain emulation, which obviously comes with reduced risk.
The consensus among alignment researchers is that if AGI were developed right now it would be almost certainly a negative
This isn't true. [ETA: I linked the wrong survey before.]
I thought that I had read Yudkowsky estimating that the probability of an AGI being unfriendly was 30% and that he was working to bring that 30% to 0%.
That's not Yudkowsky's current position. https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy describes the current view and in the comments, you see the views of other people at MIRI.
Yudkowsky is at 99+% that AGI right now would kill humanity.
Also, look at his bet with Bryan Caplan. He's not joking.
And, also, Jesus, Everyone! Gradient Descent, is just, like, a deadly architecture. When I think about current architectures, they make Azathoth look smart and cuddly. There's nothing friendly in there, even if we can get cool stuff out right now.
I don't even know anymore what it is like to not see it this way. Does anyone have a good defense that current ML techniques can be stopped from having a deadly range of action?
I don't want to be provocative, but if there was political will to stop AGI research it could probably be stalled for a long time. In order to get that political will, not only in the West but in China as well, a pretty effective way to do it might be to figure out a way to use a pre-AGI model to cause mayhem/harm that's bad enough to get the world's attention, while not being apocalyptic.
As a random example, if AI is used somehow to take down the internet for a few days, the discourse and political urgency regarding AGI would change drastically. A close analogue is how quickly the world started caring about Gain-of-function-research after Covid.
I have thought this exact thing for something like 2 years. I do think there are some potential backfire risks which make this an uncertain strategy.
Am I the only one who thinks that the world as it is is unbelievably, fantastically, super bad? The concept of an AI destroying the world would only be bad because it would prevent a very good potential future from coming into existence, that could not otherwise happen. Stopping the AI would remove all hope of it ever happening.
Not speaking for Flaglandbase, but I'd argue the world right now (or rather, life on earth) is super bad because it's dominated by animal suffering. I'm also happy most days.
Iran is an agent, with a constrained amount of critical resources like nuclear engineers, centrifuges, etc.
AI development is a robust, agent-agnostic process that has an unlimited number of researchers working in adjacent areas who could easily cross-train to fill a deficit, an unlimited number of labs which would hire researchers from DeepMind and OpenAI if they closed, and an unlimited amount of GPUs to apply to the problem.
Probably efforts at getting the second-tier AI labs to take safety more seriously, in order to give the top tier more slack, w...
Iran is an agent...
Iran is a country, not an agent. Important distinction and I'm not being pedantic here. Iran's aggressive military stance towards Israel is not quite the result of a robust, agent-agnostic process but it's not the result of a single person optimizing for some goal either.
with a constrained amount of critical resources like nuclear engineers, centrifuges, etc. AI development is a robust, agent-agnostic process that has an unlimited number of researchers working in adjacent areas who could easily cross-train to fill a deficit...and an unlimited amount of GPUs to apply to the problem.
If we're in a situation where it's an open secret that a certain specific research area leads to general artificial intelligence, we're doomed. If we get into a position where compute is the only limiting factor, we're doomed. There's no arguing there. The goal is to prevent us from getting into that situation.
As it stands now, certainly lots of companies are practicing machine learning. I have secondhand descriptions of a lot of the "really advanced" NSA programs and they fit that bill. Not a lot of organizations I know of, however, are actually pushing the clock hand forward on AGI and meta. Even fewer are doing that consistently, or getting over the massive intellectual hurdles that require a team like DeepMind's. Completing AGI will probably be the result of a marriage of increased computing power, which we can't really control, and insights pioneered and published by top labs, which I legitimately think we could control to some degree by modifying their goals and talking to their members. OpenAI is a nonprofit. At the absolute bare minimum, none of these companies publish their meta research for money. The worst things they seem to do at this stage aren't achieved when they're reaching for power so much as playing an intellectual & status game amongst themselves, and fulfilling their science fiction protagonist syndromes.
I don't doubt that it would be better for us to have AI alignment solved than to rely on these speculations about how AI will be engineered, but I do not see any good argument as to why it's a bad strategy.
Let me articulate my intuitions in a little bit more of a refined way: "If we ever get to a point where there are few secrets left, or that it's common knowledge one can solve AGI with ~1000-10,000 million dollars, then delaying tactics probably wouldn't work, because there's nothing left for DeepMind to publish that speeds up the timeline."
Inside those bounds, yes. I still think that people should keep working on alignment today, I just think other dumber people like me should try the delaying tactics I articulated in addition to funding alignment research.
I think the framing of "convince leading AI researchers to willingly work more closely with AI alignment researchers, and think about the problem more themselves" is the better goal. I don't think hampering them generally is particularly useful/effective, and I don't think convincing them entirely to "AGI is very scary" is likely either.
There’s a trap here where the more you think about how to prevent bad outcomes from AGI, the more you realize you need to understand current AI capabilities and limitations, and to do that there is no substitute for developing and trying to improve current AI!
A secondary trap is that preventing unaligned AGI probably will require lots of limited aligned helper AIs which you have to figure out how to build, again pushing you in the direction of improving current AI.
The strategy of “getting top AGI researchers to stop” is a tragedy of the commons: They can ...
Yeah, I agree, this is not focussed on enough. I was thinking about this sometime back and:
The obvious counter is "there can exist evil players". The counter-counter to that is: well – there can exist evil players (say, terrorists) making nuclear bombs as well. But we don't see that, do we? Why? Because the materials/knowledge/training required to make that is highly protected.
We could definitely try to replicate that strategy. Make the GPUs a protected item: say, only X for personal use. And cap the amount of units you can buy for commercial purpose...
Not all political activism has to be waving flags around and chanting chants. Sometimes activists actually have goals and then accomplish something. I think we should try to learn from those people, as lowly as your opinion might be of them, if we don't seem to have many other options.
This does make me wonder if activism from scientists has ever worked significantly. https://www.bismarckanalysis.com/Nuclear_Weapons_Development_Case_Study.pdf documents the Manhattan Project, https://www.palladiummag.com/2021/03/16/leo-szilards-failed-quest-to-build-a-...
Thanks for the post! I think asking AI Capabilities researchers to stop is pretty reasonable, but I think we should be especially careful not to alienate the people closest to our side. E.g. consider how the Protestants and Catholics fought even though they agree on so much.
I like focusing on our common ground and using that to win people over.
Overall, I like the post’s emphasis on taking personal action, conditional on technical alignment being unlikely at the current rate of general-purpose AI development or impossible for fundamental reasons.
Two thoughts that I am happy to elaborate on:
I wonder how many of us don't want to see AI progress slow down because AI progress keeps proving us right.
After spending at least hundreds of hours reading lesswrong et al. and not being able to alter our path towards AI, I want the satisfaction of telling people "See? Told you so!"
For anyone interested in working on this, you should add yourself on this spreadsheet. https://docs.google.com/spreadsheets/d/1WEsiHjTub9y28DLtGVeWNUyPO6tIm_75bMF1oeqpJpA/edit?usp=sharing
It's very useful for people building such an organisation to know of interested people, and vice versa.
If you don't want to use the spreadsheet, you can also DM me and I'll keep you in the loop privately.
If you're making such an organisation, please contact me. I'd like to work with you.
What is this supposed to look like if researchers are actually convinced? The whole raison d'etre for DeepMind is to create AGI. And while AGI may be a world-ending invention, many of the creations along the way are likely to be very economically valuable to DeepMind's parent company, Google.
Take for example this blog where DeepMind describes an application of AI to Google's datacenters that was able to reduce the cooling bill by 40%. What is the economic justification for supporting DeepMind researchers with a minimum salary of $400k if they aren't produc...
Israel as a nation state has an ongoing national security issue involving Iran.
For the last twenty years or so, Iran has been covertly developing nuclear weapons. Iran is a country with a very low opinion of Israel and is generally diplomatically opposed to its existence. Their supreme leader has a habit of saying things like "Israel is a cancerous tumor of a state" that should be "removed from the region". Because of these and other reasons, Israel has assessed, however accurately, that if Iran successfully develops nuclear weapons, it stands a not-insignificant chance of using them against Israel.
Israel's response to this problem has been multi-pronged. Making defense systems that could potentially defeat Iranian nuclear weapons is an important component of their strategy. The country has developed a sophisticated array of missile interception systems like the Iron Dome. Some people even suggest that these systems would be effective against much of the incoming rain of hellfire from an Iranian nuclear state.
But Israel's current evaluation of the "nuclear defense problem" is pretty pessimistic. Defense isn't all it has done. Given the size of Israel as a landmass, it would be safe to say that it's probably not the most important component of Israel's strategy. It has also tried to delay, or pressure Iran into delaying, its nuclear efforts through other means. For example, it gets its allies to sanction Iran, sabotages its facilities, and tries to convince its nuclear researchers to defect.
In my model, an argument like "well, what's the point of all this effort, Iran is going to develop nuclear weapons eventually anyways" would not be very satisfying to Israeli military strategists. Firstly, that the Iranians will "eventually" get nuclear weapons is not guaranteed. Secondly, conditional on them doing it, it's not guaranteed it'll happen within the expected lifetime of the people currently living in Israel, which is a personal win for the people in charge.
Thirdly, even if it's going to happen tomorrow, every day that Iran does not possess nuclear weapons under this paradigm is a gift. Delaying a hypothetical nuclear holocaust means increasing the life expectancy of every living Israeli.
An argument like "well, what if you actually radicalize the Iranians into hardening their stance on developing nuclear weapons through all of this discouragement" might be pragmatic. But disincentivizing, dissuading, and sabotaging people's progress toward things generally does what it says on the tin, and Iran is already doing nuclear weapons development. Any "intervention" you can come up with towards Iranian nuclear researchers is probably liable to make things better and not worse. Speaking more generally, there is still an instrumental motivation to get Iran to stop their nuclear weapons program, even if a diplomatic strategy would serve their needs better. Israel's sub-goal of mulliganing their timeline away from a nuclear Iran is probably reasonable.
There are many people on this website that believe the development of AGI, by anyone in the world, would be much worse in expectation than Iran developing nuclear weapons, even from the perspective of a fiercely anti-Iranian nationalist. There are also some people on this website who additionally believe there is little to no hope for existing AI safety efforts to result in success. Since so far it doesn't seem like there are any good reasons to believe that it's harder and more genius-intense to develop nuclear weapons than it is to develop AGI, one might naively assume that these people would be open to a strategy like "get existing top AGI researchers to stop". After all, that method has had some degree of success with regard to nuclear nonproliferation, and every hour that the catastrophic AGI extinction event doesn't happen is an hour that billions of people get to continue to live. One would think that this opens up the possibility, and even suggests the strategy, of finding a way to reach and convince the people actually doing the burning of the AGI development commons.
So imagine my surprise when I informally learned that this sort of thinking is quasi-taboo. That people who wholesale devote their entire lives to the cause of preventing an AI catastrophe do not spend much of their time developing outreach programs or supporting nonviolent resistance directed toward DeepMind researchers. That essentially, they'd rather, from their perspective, literally lie down and die without having mounted this sort of direct action.
I find this perspective limiting and self-destructive. The broader goal of alignment, the underlying core goal, is to prevent or delay a global AGI holocaust, not to come up with a complete mathematical model of agents. Neglecting strategies that affect AGI timelines is limiting yourself to the minigame. The researchers at DeepMind ought to be dissuaded or discouraged from continuing to kill everybody, in addition to and in conjunction with efforts to align AI. And the more pessimistic you are about aligning AI, the more opposed you should be to AGI development, and the more you should be spending your time figuring out ways to slow it down.
It seems weird and a little bit of a Chesterton's fence to me that I'm the first person I know of to broach the subject on LessWrong with a post. I think an important reason is that people think these sorts of strategies are infeasible or too risky, which I strongly disagree is the case. To guard against this, I would now like to give an example of such an intervention that I did myself. This way I can provide a specific scenario for people in the comments section to critique instead of whatever strawman people might associate with "direct action".
EleutherAI is a nonprofit AI capabilities research collective. Their main goal up until now has been to release large language models like the kind that OpenAI has but keeps proprietary. As a side project they occasionally publish capability research on these large language models. They are essentially a "more open" OpenAI, and while they're smaller and less capable I think most people here would agree that their strategy and behavior before 2022, as opposed to stated goals, were probably more damaging than even OpenAI from an AI alignment perspective.
Interestingly, most of the people involved in this project were not unaware of the concerns surrounding AGI research; in fact they agreed with them! When I entered their discord, I found it counterintuitive that a large portion of their conversations seemed dedicated to rationalist memes, given the modus operandi of the organization. They had simply learned not to think of themselves as doing bad things, for reasons many reading probably understand.
Some people here are nodding their heads grimly; I had not yet discovered this harrowing fact about a lot of ML researchers who are told about the alignment problem. So one day I went into the #ai-alignment (!) discord channel inside the discord server where their members coordinate and said something like:
They gave me a standard post they use as a response. I told them I'd already read the post and that it didn't make any sense. I explained the whole game surrounding timelines and keeping the universe alive a little bit longer than it otherwise would be. I then had a very polite argument with Leo Gao and a couple of other people from the team for an hour or so. By the end some members of the team had made some pretty sincere-seeming admissions that the Rotary Embeddings blog-post I linked earlier was bad, and some team members personally admitted to having a maybe-unhealthy interest in publishing cool stuff, no matter how dangerous.
I have no idea if the conversation actually helped long term, but my sense is that it did. Shortly thereafter they took a bunch of actions they alluded to in the blog post, like attempting to use these large language models for actual alignment research instead of just saying that what they were doing was OK because somebody else might after they open sourced them. I also sometimes worry whether or not the research they were doing ever resulted in faster development of AGI in the first place, but an institution could have people to assess things like that. An institution could do A/B testing on interventions like these. It can talk to people more than once. With enough resources it can even help people (who may legitimately not know what else they can work on) find alternative career paths.
With these kinds of efforts, you aren't telling people who work in some benign branch of ML that there's this huge problem with AGI, at the risk that they defect into AGI work because it sounds cool. You're already talking to people who, from your perspective, are doing the worst thing in the world. There's no failure mode where some psychopaths are going to be intrigued by the "power" of turning the world into paperclips. They're already working at DeepMind or OpenAI. Personally, I think that failure mode is overblown, but this is one way you get around it.
I don't have the gumption to create an institution like this from scratch. But if any potential alignment researchers or people-who-would-want-to-be-alignment-researchers-but-aren't-smart-enough are reading this, I'm begging you to please create one so I can give my marginal time to that. Using your talents to try to develop more math sounds to a lot of people like it might be a waste of effort. I know I'm asking a lot of you, but as far as I can tell, figuring out how to do this well seems like the best thing you can do.
Not all political activism has to be waving flags around and chanting chants. Sometimes activists actually have goals and then accomplish something. I think we should try to learn from those people, however low your opinion of them might be, if we don't seem to have many other options.