(This has been sitting in my drafts folder since August 2017. Robin Hanson's recent How Lumpy AI Services? made me think of it again. I'm not sure why I didn't post it back then. I may have wanted to add more reasons, details and/or citations, but at this point it seems better to just post it as is. Apologies to those who may have come up with some of these arguments earlier.)
Robin Hanson recently wrote, "Recently AI risk has become something of an industry, with far more going on than I can keep track of. Many call working on it one of the most effectively altruistic things one can possibly do. But I’ve searched a bit and as far as I can tell that foom scenario is still the main reason for society to be concerned about AI risk now." (By "foom scenario" he means a local intelligence explosion where a single AI takes over the world.) In response, I list the following additional reasons to work urgently on AI alignment.
- Property rights are unlikely to hold up in the face of large capability differentials between humans and AIs, so even if the intelligence explosion is likely to be global as opposed to local, that doesn't much reduce the urgency of working on AI alignment.
- Making sure an AI has aligned values and strong controls against value drift is an extra constraint on the AI design process. This constraint appears likely to be very costly at both design time and run time, so if the first human-level AIs deployed aren't value-aligned, it seems very difficult for aligned AIs to catch up and become competitive.
- AIs' control of the economy will grow over time. This may happen slowly in their time frame but quickly in ours, leaving little time to solve value alignment problems before human values are left with a very small share of the universe, even if property rights hold up.
- Once we have human-level AIs and it's really obvious that value alignment is difficult, superintelligent AIs may not be far behind. Superintelligent AIs can probably find ways to bend people's beliefs and values to their benefit (e.g., create highly effective forms of propaganda, cults, philosophical arguments, and the like). Without an equally capable, value-aligned AI to protect me, even if my property rights are technically secure, I don't know how I would secure my mind.
Even if there will be problems worth working on at some point, if we will know a lot more later and if resources today can be traded for a lot more resources later, the temptation to wait should be strong. The foom scenario has few visible indications of a problem looming, forcing one to work on the problems far ahead of time. But in scenarios where there's warning, lots more resources, and better tools and understanding later, waiting makes a lot more sense.
Conditional on the non-foom scenario, what is the appropriate indication that you should notice, to start converting resources into work?
In a world where there may or may not be a foom, how likely does foom need to be to make it correct to start working on it sooner?
I think the answer to the first question is that, as with every other (important) industry, the people in that industry will have the time and skill to notice the problems and start working on them. The foom argument says that a small group will form a singleton quickly, and so we need to do something special to ensure it goes well; the non-foom argument is that AI is an industry like most others, and like most others it will not take over the world in a matter of months.
Where do you draw the line between "the people in that industry will have the time and skill to notice the problems and start working on them" and what is happening now, which is: some people in the industry (at least, you can't argue DeepMind and OpenAI are not in the industry) noticed there is a problem and started working on it? Is it an accurate representation of the no-foom position to say that we should only start worrying when we literally observe a superhuman AI trying to take over the world? What if AI takes years to gradually push humans to the sidelines, but the process is unstoppable because that time is not enough to solve alignment from scratch and the economic incentives to keep employing and developing AI are too strong to fight against?
Solving problems is mostly a matter of total resources devoted, not time devoted. Yes, some problems have intrinsic clocks, but this doesn't look like such a problem. If we get signs of a problem looming and can devote a lot of resources then, that makes it tempting to save resources today for such a future push, as we'll know a lot more then and resources saved today become more resources when spent later.
Hmm. I don't have as strong opinions about this, but this premise doesn't seem obviously true.
I'm thinking about the "is science slowing down?" question – pouring 1000x resources into various scientific fields didn't result in 1000x speedups. In some cases progress seemed to slow down. The three main hypotheses I have are: the low-hanging fruit has already been picked; science depends on a rare combination of extreme talent and motivation that doesn't scale with resources; and coordination problems keep additional researchers from building on each other's work effectively.
I agree that "time spent" isn't the best metric, but it seems like what actually matters is "quality researcher hours that build on each other in the right way," and it's not obvious how much you can scale that.
If it's just the low hanging fruit hypothesis then... that's fine I guess. But if the "extreme talent/motivation" or "coordination" issues are at play, then you want (respectively) to ensure that:
a) at any given time, talented people who are naturally interested in the problem have the freedom to work on it, if there is nonzero useful work to be done, since there won't be that many of them in the future; and
b) better coordination tools get built, so that people in the future can scale their efforts more effectively.
(You may also want to make efforts not to get mediocre careerist scientists involved in the field)
FWIW another reason, somewhat similar to the low hanging fruit point, is that because the remaining problems are increasingly specialized, they require more years' training before you can tackle them. I.e. not just harder to solve once you've started, but it takes longer for someone to get to the point where they can even start.
Also, I wonder if the increasing specialization means there are more problems to solve (albeit ever more niche), so people are being spread thinner among them. (Though conversely there are more people in the world, and many more scientists, than a century or two ago.)
I think that this problem is in the same broad category as "invent general relativity" or "prove the Poincaré conjecture". That is, for one thing, quantity doesn't easily replace talent (you couldn't invent GR just as easily with 50 mediocre physicists instead of one Einstein), and, for another, the work is often hard to parallelize (50 Einsteins wouldn't invent GR 50 times as fast). So you can't solve it just by spending lots of resources in a short time frame.
Yeah, I agree with this view and I believe it's the most common view among MIRI folks.
In software development, a perhaps relevant kind of problem solving, extra resources in the form of more programmers working on the same project don't speed things up much. My guesstimate is output ≈ time × log(programmers). I assume the main reason is that there's a limit to the extent that you can divide a project into independent parallel programming tasks. (Cf. nine women can't make a baby in one month.)
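A minimal numeric sketch of that guesstimate (the choice of natural log, and the offset so that one programmer produces one unit of output, are arbitrary assumptions purely for illustration):

```python
import math

def relative_output(programmers: int, time: float = 1.0) -> float:
    """Guesstimate from above: output ~ time * log(programmers).
    The log base and the +1 offset (so one programmer -> one unit)
    are made-up choices for illustration only."""
    return time * (1 + math.log(programmers))

for n in (1, 10, 100, 1000):
    print(n, round(relative_output(n), 2))
# 1 -> 1.0, 10 -> 3.3, 100 -> 5.61, 1000 -> 7.91:
# under this curve a 1000x larger team yields only ~8x the output.
```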
Except that if the people are working in independent smaller teams, each trying to crack the same problem, and *if* the solution requires a single breakthrough (or a few?) which can be made by a smaller team (e.g. public key encryption, as opposed to landing a man on the moon), then presumably the chance of success is roughly proportional to the number of teams (at least while each team's individual chance is small), because each has an independent probability of making the breakthrough. And it seems plausible that solving AI threats might be more like this.
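A quick illustration of that reasoning, assuming each team independently has some small probability of cracking the problem in a given period (the 2% figure is made up):

```python
def p_breakthrough(n_teams: int, p_per_team: float) -> float:
    """Probability that at least one of n independent teams succeeds."""
    return 1 - (1 - p_per_team) ** n_teams

for n in (1, 5, 10, 50):
    print(n, round(p_breakthrough(n, 0.02), 3))
# 1 -> 0.02, 5 -> 0.096, 10 -> 0.183, 50 -> 0.636:
# roughly proportional to the number of teams while n*p is small,
# with diminishing returns after that.
```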
If you agree that there will be problems worth working on at some point, then when to start working on them becomes a judgement call about how hard the problems are, which warning sign will leave enough time to solve them, how much better tools and understanding will get in the future (without us working specifically to improve such tools/understanding), and how current resources trade against future resources. If you agree with this, I suggest that another reason for urgency besides foom is a judgment that we've already passed the warning signs at which it becomes worthwhile to work on the problems. (There are people such as Paul Christiano who don't think foom is highly likely and almost certainly have a good understanding of the tradeoffs you bring up here, who nevertheless think it's urgent to work on alignment.) You might disagree with this judgment, but it seems wrong to say "foom scenario is still the main reason for society to be concerned about AI risk now". (Unless you're saying something like: according to your own inside view, foom is the best argument for urgency on AI risk, but I'm assuming you're talking about other people's motivations?)
By "main reason for concern" I mean best arguments; I'm not trying to categorize people's motivations.
AGI isn't remotely close, and I just don't believe people who think they see signs of that. Yes for any problem that we'll eventually want to work on, a few people should work on it now just so someone is tracking the problem, ready to tell the rest of us if they see signs of it coming soon. But I see people calling for much more than that minimal tracking effort.
Most people who work in research areas call for more relative funding for their areas. So the rest of us just can't be in the habit of believing such calls. We must hold a higher standard than "people who get $ to work on this say more $ should go to this now."
You don't seem to believe in foom either, but you're at least willing to mention it as a reason some people give for urgency and even engage in extensive debates about it. I don't understand how "no foom, but AGI may be close enough that it's worthwhile to do substantial alignment work now" could be so much less likely in your mind than foom that it's not even worth mentioning as a reason that some other (seemingly smart and sane) people give for urgency.
What do you propose that "the rest of us" do? I guess some of us can try to evaluate the object-level arguments ourselves, but what about those who lack the domain knowledge or even the raw intelligence to do that? (This is not a rhetorical question; I actually don't know.)
I'm pretty sure Paul can make more money by going into some other line of work than AI safety, plus he's actually spending his own money to fund AI alignment research by others. I personally do not get $ to work on this (except by winning some informal prizes funded by Paul, which fall far short of covering the value of the time I've spent on the topic) and I plan to keep it that way for the foreseeable future. (Of course we're still fairly likely to be biased for other reasons.)
ETA, it looks like you added this part to your comment after I typed the above:
Ok, that was not clear, since you did present a Twitter poll in the same post asking about "motives for AI risk concern".
Can you point to a good/best argument for the claim that AGI is coming soon enough to justify lots of effort today?
I'm not actually aware of a really good argument for AGI coming soon (i.e., within the next few decades). As far as I can tell, most people use their own intuitions and/or surveys of AI researchers (both of which are of course likely to be biased). My sense is that it's hard to reason explicitly about AGI timelines (in a way that's good enough to be more trustworthy than intuitions/surveys), and there seem to be enough people concerned about foom and/or short timelines that funding isn't a big constraint, so there's not a lot of incentive for AI risk people to spend time on making such explicit arguments. (ETA: Although I could well be wrong about this, and there's a good argument somewhere that I'm not aware of.) To give a sense of how people are thinking about this, I'll quote a Paul Christiano interview:
My own thinking here is that even if AGI comes a century or more from now, the safest alignment approaches seem to require solving a number of hard philosophical problems which may well take that long to solve even if we start now. Certainly it would be pretty hopeless if we only started when we saw a clear 10-year warning. This possibility also justifies looking more deeply into other approaches now to see if they could potentially be just as safe without solving the hard philosophical problems.
Another thought prompted by your question: given that funding does not seem to be the main constraint on current alignment work (people more often cite "talent"), it's not likely to be a limiting constraint in the future either, when the warning signs are even clearer. But "resources today can be traded for a lot more resources later" doesn't seem to apply if we interpret "resources" as "talent".
We have to imagine that we have some influence over the allocation of something, or there's nothing to debate here. Call it "resources" or "talent" or whatever, if there's nothing to move, there's nothing to discuss.
I'm skeptical solving hard philosophical problems will be of much use here. Once we see the actual form of relevant systems then we can do lots of useful work on concrete variations.
I'd call "human labor being obsolete within 10 years … 15%, and within 20 years … 35%" crazy extreme predictions, and happily bet against them.
If we look at direct economic impact, we've had a pretty steady trend for at least a century of jobs displaced by automation, and the continuation of past trend puts full AGI a long way off. So you need a huge unprecedented foom-like lump of innovation to have change that big that soon.
Let me rephrase my argument to be clearer. You suggested earlier, "and if resources today can be traded for a lot more resources later, the temptation to wait should be strong." This advice could be directed at either funders or researchers (or both). It doesn't seem to make sense for researchers, since they can't, by not working on AI alignment today, cause more AI alignment researchers to appear in the future. And I think a funder should think, "There will be plenty of funding for AI alignment research in the future when there are clearer warning signs. I could save and invest this money, and spend it in the future on alignment, but it will just be adding to the future pool of funding, and the marginal utility will be pretty low because at the margins, it will be hard to turn money into qualified alignment researchers in the future just as it is hard to do that today."
So I'm saying this particular reallocation of resources that you suggested does not make sense, but the money/talent could still be reallocated some other way (for example to some other altruistic cause today). Do you have either a counterargument or another suggestion that you think is better than spending on AI alignment today?
Have you seen my recent posts that argued for or supported this? If not I can link them: Three AI Safety Related Ideas, Two Neglected Problems in Human-AI Safety, Beyond Astronomical Waste, The Argument from Philosophical Difficulty.
Sure, but why can't philosophical work be a complement to that?
I won't defend these numbers because I haven't put much thought into this topic personally (since my own reasons don't depend on these numbers, and I doubt that I can do much better than deferring to others). But at what probabilities would you say that substantial work on alignment today would start to be worthwhile (assuming the philosophical difficulty argument doesn't apply)? What do you think a world where such probabilities are reasonable would look like?
If there is a 50-50 chance of foom vs non-foom, and in the non-foom scenario we expect to acquire enough evidence to get an order of magnitude more funding, then to maximize the chance of a good outcome we, today, should invest in the foom scenario because the non-foom scenario can be handled by more reluctant funds.
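A toy expected-value calculation of that argument, where both the diminishing-returns curve and the funding amounts are made-up assumptions for illustration:

```python
import math

def p_good_outcome(funding: float) -> float:
    """Toy curve: more funding -> higher chance of a good outcome,
    with strongly diminishing returns (made-up functional form)."""
    return 1 - math.exp(-funding)

# Made-up units: we control 1 unit of funding today; in the non-foom branch
# another 10 units arrive later once warning signs appear, while in the foom
# branch nothing more arrives in time.
today = 1.0
later_nonfoom = 10.0

def expected_value(spend_on_foom: float) -> float:
    spend_on_nonfoom = today - spend_on_foom
    return 0.5 * p_good_outcome(spend_on_foom) + \
           0.5 * p_good_outcome(spend_on_nonfoom + later_nonfoom)

for s in (0.0, 0.5, 1.0):
    print(s, round(expected_value(s), 3))
# 0.0 -> 0.5, 0.5 -> 0.697, 1.0 -> 0.816: the marginal unit does far more
# good in the foom branch, where it isn't swamped by the later funding.
```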
Related, on the EA Forum. (I am the post's author.)
It's not so much about "fast" vs. "slow" as about the chances of putting lots of resources into the problem with substantial warning. Even if things change fast, as long as you get enough warning and resources can be moved to the problem fast enough, waiting still makes sense.
Making sure that an AI has good enough controllability is very much part of the design process, because a completely uncontrollable AI is no good to anyone.
Full value alignment is different and probably much more difficult. There is a hard and an easy control problem.
Training up one's concentration & present-moment awareness are probably helpful for this.
Re: scenario 3, see The Evitable Conflict, the last story in Isaac Asimov's "I, Robot":
I'm not sure I understand the point of this quote in relation to what I wrote. (Keep in mind that I haven't read the story, in case the rest of the story offers the necessary context.) One guess is that you're suggesting that AIs might be more moral than humans "by default" without special effort on the part of effective altruists, so it might not be an existential disaster if AI values end up controlling most of the universe instead of human values. This seems somewhat plausible but surely isn't a reasonable mainline expectation?