I do like the core concept here, but I think for it to work you need to have a pretty well specified problem that people can't weasel out of. (I expect the default result of this to be "1000 researchers all come up with reasons they think they've solved alignment, without really understanding what was supposed to be hard in the first place.")
You touch upon this in your post but I think it's kinda the main blocker.
I do think this might be a surmountable obstacle, though.
The first issue in my mind is that it straightforwardly messes up the business plans of those companies and labs to have their staff leave for 3 months. Leadership will by default be angry and work hard to discredit you, and perhaps threaten to fire anyone who accepts your deal.
My yes-and thought here is "okay, can we do this in a way that the various labs will feel is exciting, or win-win?". Rather than making the deal with individuals, somehow frame it as a partnership with the companies.
"Leadership will by default be angry and work hard to discredit you and perhaps threaten to fire anyone who accepts your deal."
"Why is management so insistent I don't go do this alignment thing if it won't actually change my view?"
This sounds like a wonderful way to invoke the Streisand effect and cause an internal political upheaval. I would love to see DeepMind leadership overreact to this kind of outreach and attempt to suppress a bunch of Google AI researchers' moral compunctions through heavy-handed threats. Just imagine someone doing this if Google researchers had concerns about racial bias in their algorithms. It really seems like the dream scenario.
I want to explicitly mark this (in my mind) as "babble" of the sort that is "Hufflepuff Bones" from HPMOR. I don't want to have this relationship with the leadership at AI labs. I'm way more interested in finding positive sum actions to take (e.g. trades that AI labs are actively happy about) than adversarial moves.
Man, wtf is this argument? Yes, you should talk to leaders in industry. I haven't yet tried the "work together" strategy anywhere near hard enough to justify setting it on fire at this moment. I don't think such people see themselves as adversaries, I don't think we have been acting like adversaries, and I think this has allowed us to have a fair amount of conversation and cooperation with them.
I feel a bit like you're saying "a new country is stockpiling nukes, so let's make sure to quickly stop talking to them". We're in a free market economy, everyone is incentivized to build these companies, not just the people leading existing ones. That's like most of the problem.
I think it's good and healthy to think about your BATNA and figure out your negotiating position, so thinking about this seems good to me. But just because you might be able to exercise unilateral control doesn't mean you should; it lowers trust and everyone's ability to work together on anything.
I'm not confident here, and maybe I should already have given up after OpenAI was founded, but I'm not ready to call everyone adversaries; it's pretty damn hard to backtrack on that, and it makes conversation and coordination way, way harder.
This is a creative idea, and maybe it'll even work, but I do not want us to completely dry up the resources of good-natured people like Sam Bankman-Fried. We could, at the absolute minimum, start this mission with $500k per researcher for one to five well-respected researchers internal to these organizations, and a million-dollar bonus. A billion dollars is a shit ton of money to waste if something like this doesn't work, and I honestly don't think it's wise to set a standard of "asking for it" unless we're literally repelling an asteroid. Even six million dollars is a shit ton of money to waste if it doesn't work.
Things aren't that desperate (yet). We have at least a few years to scale up.
The capabilities people have thrown a challenge right back at you!
Give the world's hundred most respected AI alignment researchers $1M each to spend 3 months explaining why AI will be misaligned, with an extra $100M if by the end they can propose an argument capabilities researchers can't shoot down. They probably won't make any progress, but from then on, when others ask them whether they think alignment is a real unsolved problem, they will be way more likely to say no. That only costs you a hundred million dollars!
I don't think I could complete this challenge, yet I also predict that I would not then say that alignment is not a real unsolved challenge. I mostly expect the same problem from the OP proposal, for pretty similar reasons.
For the record I think this would also be valuable! If as an alignment researcher your arguments don't survive the scrutiny of skeptics, you should probably update away from them. I think maybe what you're highlighting here is the operationalization of "shoot down", which I wholeheartedly agree is the actual problem.
Re: the quantities of funding, I know you're being facetious, but just to point it out, the economic costs of "capabilities researchers being accidentally too optimistic about alignment" and "alignment researchers being too pessimistic about alignment" are asymmetric.
Unpopular opinion (on this site, I guess): AI alignment is not a well-defined problem, and there is no clear-cut resolution to it. It will be an incremental process, similar to cybersecurity research.
About the money, I would do the opposite: select researchers who would do it for free, just pay their living expenses, and give them arbitrary resources.
Why not do something like the Clay Millennium Prizes in mathematics? Offer gargantuan sums for progress on particular subproblems like corrigibility, or inner alignment, or coming up with new reasons why it's even harder than it already looks, or new frameworks for attacking the problem.
Let MIRI be the judge, and make it so that even the most trivial progress gets a fat cheque. (A few thousand for pointing out typos in MIRI papers seems fair.)
It should be obviously easy money even to clever undergraduate compscis.
That should inspire vast numbers of people to do the research instead of their day jobs, which would obviously be great news.
That will likely lead to companies banning their staff from thinking about the problem, which is the best possible publicity. Nothing gets one of us thinking like being told to not think about something.
It might even produce some progress (even better!!).
But what it will likely really do is convince everyone that alignment is the same sort of problem as P=NP, or the Riemann Hypothesis.
This is an interesting idea (of course the numbers might need to be tweaked).
My first concern would be disagreement about what constitutes a solution: what happens if one/many researchers incorrectly think they've solved the problem, and they continue to believe this after alignment researchers "shoot them down", but the solution appears to be correct to most researchers?
Does this backfire and get many to dismiss the problem as already solved?
I think we'd want to set things up with adjudicators that are widely respected and have the 'right' credentials - so e.g. Stuart Russell is an obvious choice; maybe Wei Dai? I'm not the best judge of this.
Clearly we'd want Eliezer, Paul... advising, but for final decisions to be made by people who are much harder to dismiss.
I think we'd also want to set up systems for partial credit. I.e. rewards for anyone who solves a significant part of the problem, or who lays out a new solution outline which seems promising (even to the most sceptical experts).
I'd want to avoid the failure mode where people say at the end (or decide before starting) that while they couldn't solve the problem in 3 months, it's perfectly doable in e.g. a couple of years - o...
Give the world's thousand most respected AI researchers $1M each to spend 3 months working on AI alignment, with an extra $100M if by the end they can propose a solution alignment researchers can't shoot down.
An unfortunate fact is that the existing AI alignment community is bad at coming to consensus with alignment solution proposers about whether various proposals have been "shot down". For instance, consider the proposed solution to the "off-switch problem" where an AI maintains uncertainty about human preferences, acts to behave well according to those (unknown) preferences, and updates its belief about human preferences based on human behaviour (such as inferring "I should shut down" from the human attempting to shut the AI down), as described in this paper. My sense is that Eliezer Yudkowsky and others think they have shot this proposal down (see the Arbital page on the problem of fully updated deference), but that Stuart Russell, a co-author of the original paper, does not (based on personal communication with him). This suggests that we need significant advances in order to convince people who are not AI safety researchers that their solutions don't work.
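To make the disagreement concrete, here is a minimal sketch of the usual expected-value argument for uncertainty-based deference, under simplifying assumptions of my own (a single candidate action whose utility to the human, $U_a$, is unknown to the AI, and a human who presses the off switch exactly when $U_a < 0$):

$$\mathbb{E}[\text{defer to the human}] = \mathbb{E}[\max(U_a, 0)] \;\ge\; \max(\mathbb{E}[U_a], 0) = \max\bigl(\mathbb{E}[\text{act anyway}],\ \mathbb{E}[\text{accept shutdown}]\bigr),$$

so an AI genuinely uncertain about $U_a$ weakly prefers to let the human decide. The "fully updated deference" objection is, roughly, that once the AI has finished updating and is confident in its estimate of $U_a$, the inequality collapses to an equality and the incentive to defer evaporates; the two sides disagree about how damning that is, which is exactly the consensus problem described above.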
It was argued that these plans are only relevant given 15+ year timelines.
Is it OK if I get kinda angry when I think about these constraints, as someone born around 2000? We had 15 years 15 years ago. What was everybody doing when I was 7 years old?
They were wanking on endlessly about quantum consciousness and how any agent sufficiently intelligent to escape its box would be intelligent enough to figure out that the utility function we'd given it wasn't actually what we wanted.
I've been going on about 'AI will kill us all' to anyone who'll listen for about ten years now. Everyone I know thinks I've got another stupid science-fiction bee in my bonnet to replace my previous idiotic nerd-concerns about overpopulation and dysgenics and grey goo and engineered plagues and factory farming and nuclear accidents and nuclear proliferation and free speech and ubiquitous surveillance and whatever the hell else it was that used to seem important.
Almost all my friends are graduates of the University of Cambridge, and roughly half of them studied real subjects and can program computers.
Goddamn it, I know actual AI safety researchers, people much cleverer and more capable than me, who just looked at AlphaZero and shrugged, apparently completely unable to see that building a really good general reinforcement learner might have real-world implications beyond chess.
There's some bunch of academic parasites in Cambridge who were originally...
Another way this could potentially backfire. $1,000,000 is a lot of money for 3 months. A lump sum like this will cause at least some of the researchers to A) Retire, B) Take a long hiatus/sabbatical, or C) Be less motivated by future financial incentives.
If 5 researchers decide to take a sabbatical, then whatever. If 150 of them do? Maybe that's a bigger deal. You're telling me you wouldn't consider it if 5-10 times your annual salary was dropped in your lap?
Oh, that's so unfair! I've been avoiding working in AI for years, on account I worried I might have a good enough idea to bring the apocalypse forward a day. One day out of 8 billion lives makes me Worse Than Hitler(TM) by at least an order of magnitude.
And I've even been avoiding blogging about basic reinforcement learning for fear of inspiring people!
Where's my million?!
Also hurry up, I hardly have any use for money, so it will take ages to spend a million dollars even if I get creative, and it doesn't look like we have very long.
This actually brings up an important consideration. It would be bad to incentivize AGI research by paying people who work inside it huge sums, the same way paying slaveowners to free their slaves is a poor abolitionist strategy.
? This was literally what the UK did to free their slaves, and (iiuc) historians considered it more successful than (e.g.) the US strategy, which led to abolition a generation later and also involved a civil war.
This post is heavily based on this excellent comment (the author's name is not relevant).
Usually people object to paying people large sums of money to work on alignment because they don't expect them to produce any good work (mostly because it's very hard to specify alignment, see below). This is a feature, not a bug.
Being able to say "the 1000 smartest people working in AI couldn't make headway on alignment in 3 months, even when they were paid $1 million each and a solution would have been awarded $100 million" is very good for persuading existing researchers that this is a very hard problem.
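For scale (assuming, as a rough upper bound, that all 1000 researchers accept and the prize is paid out), the headline cost is

$$1000 \times \$1\text{M} + \$100\text{M} \approx \$1.1\text{B},$$

with the $1B base payment owed regardless of outcome; that is the "billion dollars" figure objected to upthread.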
Would this really go a long way toward convincing those at major AI labs that alignment is hard? We could actually ask people working at these places whether, if no progress were made in those 3 months, it would change their minds.
Another problem is looooong timelines. They can agree it's hard, but their timelines may be too far out for it to matter to them to work on it now. I can think of a couple of counter-arguments that may work (but it would be better to actually test them with real people).
Though, generally, having a set of people working on apologetics aimed at both US and Chinese AI researchers is potentially extremely high-impact. If you're interested and might be able to donate, DM me for a call.
Next Steps to seriously consider this proposal
Specifying "Solving Alignment"
Part of the problem of alignment is that we don't know the correct framework to specify it in. The quoted text suggests the criterion for a solution is "a solution alignment researchers can't shoot down", which side-steps this issue; however, specifying the problem in as fine-grained detail as possible would be extremely useful for communicating it to these researchers.
One failure mode would be them taking the money, not getting work done, and then arguing that the problem wasn't specified well enough to make any meaningful progress, which limits how persuasive this stunt could be. Documents like ELK are more useful specifications that capture the problem to varying degrees, and I wish we had more problems specified like that.
Listing 1000 AI researcher intellectuals
The initial idea is to hire the 1000 best AI researchers to work on the problem, not because we expect them to solve it, but because all of them failing would itself make a strong case that the problem is hard.
There are a few different proxies we can use, such as citation counts and being a top researcher at an AI lab (a rough sketch of the citation proxy is below). So far I've got:
3. Myself and all my friends

Convincing the CCP and the researchers it backs is a blank spot in my map, and if anyone knows anything, please comment or message me for a video call.
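To make the citation proxy slightly more concrete, here is a minimal sketch; the file name, column names, and lab list are hypothetical placeholders, and a real effort would want hand-curated lists on top of raw citation counts:

```python
# Rough sketch: rank candidate researchers by a citation-count proxy and keep
# the top 1000. Assumes a hypothetical candidates.csv with columns:
# name, affiliation, citations.
import csv

# Illustrative only, not an endorsement or a complete list of "top labs".
TOP_LABS = {"DeepMind", "OpenAI", "Google Brain", "Meta AI"}

def load_candidates(path="candidates.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {
                "name": row["name"],
                "affiliation": row["affiliation"],
                "citations": int(row["citations"]),
            }
            for row in csv.DictReader(f)
        ]

def shortlist(candidates, n=1000):
    # Sort by citations, breaking ties in favour of people at well-known labs.
    ranked = sorted(
        candidates,
        key=lambda c: (c["citations"], c["affiliation"] in TOP_LABS),
        reverse=True,
    )
    return ranked[:n]

if __name__ == "__main__":
    for person in shortlist(load_candidates())[:10]:
        print(person["name"], person["affiliation"], person["citations"])
```

Obviously the hard part is the data and the judgment calls, not the sorting; this is just to show the shape of the filtering step.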
Actually Hiring People/Creating a Company to Do This
We would need competent people to work on the specification of the problem, on outreach and selecting who to pay, on keeping up with researchers (if you're paying them $1 million, you can also have 1-on-1 calls, which would be worth making the most of), and on reviewing the actual work produced (which could actually be done by the community / independent researchers / orgs).
Timelines Argument
It was argued that these plans are only relevant given 15+ year timelines, but huge social changes and shifts in cultural norms have happened within 1-year time periods. I'm not giving examples, but they went from, say, 50 to 100 to 70 within a few months, which may be significantly different from alignment.
This Plan May Backfire and Increase Capabilities
A way this can backfire is by intensifying race dynamics, such that everyone wants to create AGI first. Or at least, more people than are already doing so right now.
I think this is a relevant possibility, and it should be taken into account by whoever reaches out to these top researchers when offering to pay them the large sums of money.