# Why I'm Optimistic About Near-Term AI Risk

15th May 2022

I'm not worried about AI posing an existential risk in the next 10-20 years. Recent developments in AI capabilities actually make me feel more optimistic about this. The fact that relatively simple models can perform a wide array of tasks suggests that we can build satisfactory AI without the need to use sophisticated, potentially dangerous agents in the near-term.

My expectation for how AI will develop over the next decade is that companies will continue to focus on transformer-based foundation models. The general capability of these models will increase for a while simply by using more data, improving training procedures, and leveraging specialized hardware. Eventually, companies will start hitting bottlenecks in the amount of data required for optimal training at a given capability level. But before that, deployment of these systems will favor smaller, faster, and more auditable models, leading companies to focus on distilled models specializing in specific tasks.

These specialized models will be oriented towards augmenting human productivity, producing entertainment, or automating specific tasks. The slow pace at which industries change their practices and utilize the benefits of a new technology will moderate the adoption of AI. As adoption increases, these AI services will gain autonomy, producing more value at lower cost. Continued specialization will result in mostly autonomous AI's derived from generally capable foundation models that are distilled down for a variety of tasks.

I'm not claiming that these Tool AI's won't eventually be dangerous, but I can't see this path leading to high existential risk in the next decade or so.

I think most people in the AI safety field would agree with me on this, so why write it up?

I want to make this point explicit and foster a discussion about near-term AI safety. If AI will become dangerous soon, the field needs to act very quickly. Researchers would have to consider eschewing movement building, trading goodwill for influence, and gambling on near-term approaches. People who would take more than a decade to have an impact would have less reason to join in the first place while investments in infrastructure in the field would become less valuable.

It's important for those concerned to be on the same page about near-term risks in order to avoid the unilateralist's curse. Recent pessimistic takes about AI risk make it seem superficially as if the consensus has shifted, but I don't think this is representative of the field as a whole. I remain optimistic that innovations in foundation models will produce a lot of value without a large increase in risk, providing more time to build.


I see at least two problems with your argument:

1. There's an assumption that you need a single agent to lead to existential risk. This is not the case, and many of the scenarios explored require only competent and autonomous service-like AIs, or foundation models. For example, CAIS is a model of intelligence explosion and has existential-risk-type failure modes too.
2. There's an assumption that just because the non AGI models are useful, labs will stop pursuing AGI. Yet this is visibly false, as the meme of AGI is running around and there are multiple labs who are explicitly pushing for AGI and getting the financial leeway to do it.

More generally, this post has the typical problem of "here is a scenario that looks plausible and would be nice, so there's no need to worry". Sure, maybe this is the actual scenario that will come to pass, and maybe it's possible to argue for it convincingly. But you should require one damn strong argument before pushing people to not even work on the many more numerous possible worlds where things go horribly wrong.

Agree with (1) and (~3).

I do think re: (2), whether such labs actually are amassing "the financial leeway to [build AGI before simpler models can be made profitable]" is somewhat a function of your beliefs about timelines. If it only takes $100M to build AGI, I agree that labs will do it just for the trophy, but if it takes $1B I think it is meaningfully less likely (though not out of the question) that that much money would be allocated to a single research project, conditioned on $100M models having so far failed to be commercializable.

I mostly agree with points 1 and 2. Many interacting AI's are important to consider, and I think individual AI's will have an incentive to multiply. And I agree that people will continue to push the capability frontier beyond what is directly useful, potentially giving AI dangerous abilities.

I think we differ on the timeline of those changes, not whether those changes are possible or important. This is the question I'm trying to highlight.

Longer timelines for dangerous AI don't mean that we shouldn't prepare for things going wrong; these problems will still need to be solved eventually. Ideally, some researchers would act as if timelines are really short and "roll the dice" on alignment research that could pan out in a few years. But I'm arguing that the bulk of the field should feel safe investing in infrastructure, movement building, and projects over a 10-20 year timeframe. People considering pursuing AI safety should assume they will have enough time to make an impact. Finding ways to extend this horizon further into the future is valuable because it gives the field more time to grow.

I basically agree with this vision and I also agree that the many recent pessimistic takes are not representative of the field as a whole. I would caveat that there are a decent fraction of alignment researchers that have pessimistic takes, though I agree this is not a consensus for the whole field.
So there's far from a consensus on optimistic takes (which I don't think you were claiming, but that is one way your message can be interpreted).

I was surprised by this claim. To be concrete, what's your probability of xrisk conditional on 10-year timelines? Mine is something like 25% I think, and higher than my unconditional probability of xrisk.

(Ideally we'd be clearer about what timelines we mean here, I'll assume it's TAI timelines for now.)

Conditional on 10-year timelines, maybe I'm at 20%? This is also higher than my unconditional probability of x-risk.

I'm not sure which part of my claim you're surprised by? Given what you asked me, maybe you think that I think that 10-year timelines are safer than >10-year timelines? I definitely don't believe that.

My understanding was that this post was suggesting that timelines are longer than 10 years, e.g. from sentences like this:

"I'm not claiming that these Tool AI's won't eventually be dangerous, but I can't see this path leading to high existential risk in the next decade or so."

And that's the part I agree with (including their stated views about what will happen in the next 10 years).

You may be right that the recent pessimistic takes aren't representative of the field as a whole... but I think you also may be wrong. I say, instead of speculating about it, let's do some surveys! I am curious.

I for one think that existential risk from AI is quite high in the next decade or so -- maybe like 50% or more. I don't know what you mean by "can't see this path leading to high existential risk" but if by "high" you mean "10% or more" then I'd bet that most people in AI safety disagree with you and if you mean "around 50% or more" then there's a substantial minority who disagree with you.

Here's a survey, though not for the next decade.
I'd bet at even odds that a survey sent to the same population would have < 50% respondents claiming >= 10% probability to the question:

"How likely do you think that, by the end of 2032, fewer than a million humans are alive, primarily as a result of AI systems not doing/optimizing what the people deploying them wanted/intended?"

(I've replaced "the overall value of the future will be drastically less than it could have been" with "fewer than a million humans are alive" because the former is not a discrete event that happens by 2032. That being said, it is clearly a stronger condition, which I don't really like; I'd be keen on good operationalizations of the former.)

Thanks for the survey. A few nitpicks:

- The survey you mention is ~1y old (May 3-May 26 2021). I would expect those researchers to have updated from the scaling laws trend continuing with Chinchilla, PaLM, Gato, etc. (Metaculus at least did update significantly, though one could argue that people taking the survey at CHAI, FHI, DeepMind etc. would be less surprised by the recent progress.)
- I would prefer the question to mention "1M humans alive on the surface of the earth" to avoid people surviving inside "mine shafts" or on Mars/the Moon (similar to the Bryan Caplan / Yudkowsky bet).

For any of those (supposedly) > 50% of respondents claiming a < 10% probability, I am happy to take a 1:10 odds $1000 bet for:

"by the end of 2032, fewer than a million humans are alive on the surface of the earth, primarily as a result of AI systems not doing/optimizing what the people deploying them wanted/intended"

Where, similar to Bryan Caplan's bet with Yudkowsky, I get paid like $1000 now, and at the end of 2032 I give it back, adding 100 dollars. (Given inflation and interest, this seems like a bad deal for the one giving the money now, though I find it hard to predict 10y inflation and I do not want to have extra pressure to invest those $1000 for 10y. If someone has another deal in mind that would sound more interesting, do let me know here or by DM).

To make the bet fair, the size of the bet would be the equivalent of the value in 2032 of $1000 worth of SPY ETF bought today (400.09 at May 16 close). And to mitigate the issue of not being around to receive the money, I would receive a payment of $1000 now. If I lose I give back whatever $1000 of SPY ETF from today is worth in 2032, adding 10% to that value.

This seems like a terrible deal. Even if I'm 100% guaranteed to win, I could do way better than a ~1% rate of return per year (e.g. buying Treasury bonds). You'd have to offer >$2000 before it seemed plausibly worth it.

(In practice I'm not going to take you up on this even then, because the time cost in handling the bet is too high. I'd be a lot more likely to accept if there were a reliable third-party service that I strongly expected to still exist in 10 years that would deal with remembering to follow up in 10 years time and would guarantee to pay out even if you reneged or went bankrupt etc.)
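
As a rough check on the "~1% rate of return per year" figure, here is a minimal sketch of the arithmetic. It assumes the originally proposed terms ($1000 paid now, $1100 returned at the end of 2032), a holding period of exactly 10 years, and annual compounding; the function name `annualized_return` is just an illustrative helper.

```python
def annualized_return(paid_now: float, repaid_later: float, years: float) -> float:
    """Compound annual growth rate implied by paying `paid_now` today
    and receiving `repaid_later` after `years` years."""
    return (repaid_later / paid_now) ** (1.0 / years) - 1.0


# Originally proposed terms: $1000 now, $1100 back after 10 years (assumed horizon).
original_offer = annualized_return(1000, 1100, 10)

# The ">$2000" threshold mentioned in the reply, under the same assumptions.
threshold_offer = annualized_return(1000, 2000, 10)

print(f"original offer:   {original_offer:.2%} per year")
print(f"$2000 threshold:  {threshold_offer:.2%} per year")
```

Under these assumptions the original terms work out to a bit under 1% per year, while a $2000 payout would correspond to roughly 7% per year.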

Note: I updated the parent comment to take into account interest rates.

In general, the way to mitigate trust issues would be to use an escrow, though when betting on doom-ish scenarios there would be little benefit in having $1000 in escrow if I "win".

For anyone reading this who also thinks that it would need to be >$2000 to be worth it, I am happy to give $2985 at the end of 2032, aka an additional 10% on top of the average annual return of the S&P 500 (i.e. 1.1 * (1.105^10 * 1000)), if that sounds less risky than the SPY ETF bet.

Thanks! OK, happy to bet. FWIW I'm not confident I'll win; even odds sounds good to me. :)

I don't like that operationalization though; I prefer the original. I don't think the discrete event thing is much of a problem, but if it is, here are some suggestions to fix it:

"The overall value of the future is drastically less than it could have been, and by 2032 there's pretty much nothing we AI-risk-reducers can do about it -- we blew our chance, it's game over."

Or:

"At some point before 2032 a hypothetical disembodied, uninfluenced, rational version of yourself observing events unfold will become >90% confident that the overall value of the future will be drastically less than it could have been."

I definitely like the second operationalization better. That being said I think that is pretty meaningfully different and I'm not willing to bet on it. I was expecting timelines to be a major objection to your initial claim, but it's totally plausible that accumulating additional evidence gets people to believe in doom before doom actually occurs. Also we'd need someone to actually run the survey (I'm not likely to).

I guess when you say ">= 10% x-risk in the next decade" you mean >= 10% chance that our actions don't matter after that. I think it's plausible a majority of the survey population would say that. If you also include the conjunct "and our actions matter between now and then" then I'm back to thinking that it's less plausible.

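
The $2985 figure offered earlier in this thread can be reproduced directly from the quoted expression 1.1 * (1.105^10 * 1000). The sketch below only restates that arithmetic; the 10.5% annual return is the figure used in the comment, not a forecast, and the variable names are illustrative.

```python
principal = 1000.0             # paid to the bettor today
assumed_annual_return = 0.105  # average S&P 500 return assumed in the comment
years = 10                     # mid-2022 to end of 2032, rounded to 10 years
bonus = 1.10                   # the extra 10% added on top

# Value of the principal if it grew at the assumed rate for 10 years.
grown = principal * (1 + assumed_annual_return) ** years

# Amount the bettor would hand back at the end of 2032 if they lose.
payout = bonus * grown

print(f"grown principal: ${grown:,.2f}")
print(f"payout:          ${payout:,.2f}")
```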
How about we do a lazy bet: Neither of us runs the survey, but we agree that if such a survey is run and brought to our attention, the loser pays the winner? Difficulty with this is that we don't get to pick the operationalization. Maybe our meta-operationalization can be "<50% of respondents claim >10% probability of X, where X is some claim that strongly implies AI takeover or other irreversible loss of human control / influence of human values, by 2032." How's that sound?

...but actually though I guess my credences aren't that different from yours here so it's maybe not worth our time to bet on. I actually have very little idea what the community thinks, I was just pushing back against the OP who seemed to be asserting a consensus without evidence.

Sure, I'm happy to do a lazy bet of this form. (I'll note that if we want to maintain the original point we should also require that the survey happen soon, e.g. in the next year or two, so that we avoid the case where someone does a survey in 2030 at which point it's obvious how things go, but I'm also happy not putting a time bound on when the survey happens since given my beliefs on p(doom by 2032) I think this benefits me.)

$100 at even odds?

Deal! :)

Potential counterarguments:

1. Unpredictable gain of function with model size that exceeds scaling laws. This seems to just happen every time a significantly larger model is trained in the same way on similar data-sets as smaller models.

2. Unexpected gain of function from new methods of prompting, e.g. chain-of-thought which dramatically increased PaLM's performance, but which did not work quite as well on GPT-3. These seem to therefore be multipliers on top of scaling laws, and could arise in "tool AI" use unintentionally in novel problem domains.

3. Agent-like behavior arises from pure transformer-based predictive models (Gato) by taking actions on the output tokens and feeding the world state back in; this means that perhaps many transformers are capable of agent-like behavior with sufficient prompting and connection to an environment.

4. It is not hard to imagine a feedback loop where one model can train another to solve a sub-problem better than the original model, e.g. by connecting a Codex-like model to a Jupyter notebook that can train models and run them, perhaps as part of automated research on adversarial learning producing novel training datasets. Either the submodel itself or the interaction between them could give rise to any of the first three behaviors without human involvement or oversight.

The OP doesn’t explicitly make this jump, but it’s dangerous to conflate the claims “specialized models seem most likely” and “short-term motivated safety research should be evaluated in terms of these specialized models”.

I agree with the former statement, but at the same time, the highest x-risk/ highest EV short-term safety opportunity is probably different. For instance, a less likely but higher impact scenario: a future code generation LM either directly or indirectly* creates an unaligned, far improved architecture. Researchers at the relevant org do not recognize this discontinuity and run the model, followed by disaster.

*E.g. a model proposing an improved Quoc Le style architecture search seems quite plausible to me.

Great point. I agree and should have said something like that in the post.

To expand on this a bit more, studying these specialized models will be valuable for improving their robustness and performance. It is possible that this research will be useful for alignment in general, but it's not the most promising approach. That being said, I want to see alignment researchers working on diverse approaches.

I can give further evidence that this scenario is at least somewhat probable.

Due to the anthropic principle, general intelligence could have a one-in-an-octillion chance of ever randomly evolving, anywhere, ever, and we would still be here observing all the successful steps having happened, because if all the steps didn't happen then we wouldn't be here observing anything. There would still be tons of animals like ants and chimpanzees because evolution always creates a ton of alternative "failed" offshoots. So it's always possible that there's some logical process that's necessary for general intelligence, and we're astronomically unlikely to discover it randomly, through brute forcing or even innovation, until we pinpoint all the exact lines of code in the human brain that distinguish our intelligence from that of chimpanzees.

Basically, the anthropic principle indicates the possibility of at least one more AI winter ahead, since even if we suddenly pumped out an AI at the chimpanzee level of intelligence, we could still be astronomically far away from the last steps for human-level general intelligence. We just have no idea how unlikely human-level intelligence is to emerge randomly, just like how we have no idea how unlikely life is to emerge randomly.

However, it's only a possibility that this is the case. General intelligence could still be easy to brute force, and we'd also still be here. The recent pace of AI development definitely indicates bad news for AGI timelines, and it doesn't make sense to unplug a warning light instead of looking for the hazard it corresponds to.

But in terms of "log odds of human survival beyond 20 years", that's a pretty unreasonable estimate. There isn't nearly enough evidence to conclude that the human race is "almost certainly doomed soon", only that there is "significantly more reason than before to worry about nearer-term AGI".

I find the story about {lots of harmless, specialized transformer-based models being developed} to be plausible. I would not be surprised if many tech companies were to follow something like that path.

However, I also think that the conclusion --- viz., it being unlikely that any AI will pose an x-risk in the next 10-20 years --- is probably wrong.

The main reason I think that is something like the following:

In order for AI to pose an x-risk, it is enough that even one research lab is a bit too incautious/stupid/mistakenly-optimistic and "successfully" proceeds with developing AGI capabilities. Thus, the proposition that {AI will not pose an x-risk within N years} seems to require that

And the above is basically a large conjunction over many research labs and as-yet-unknown future ML technologies. I think it is unlikely to be true. Reasons why I think it is unlikely to be true:

• It seems plausible to me that highly capable, autonomous (dangerous) AGI could be built using some appropriate combination of already existing techniques + More Compute.

• Even if it weren't/isn't possible to build dangerous AGI with existing techniques, a lot of new techniques can be developed in 10 years.

• There are many AI research labs in existence. Even if most of them were to pursue only narrow/satisfactory AI, what are the odds that not one of them pursues (dangerous) autonomous AGI?

• I'm under the impression that investment in A(G)I capabilities research is increasing pretty fast; lots of smart people are moving into the field. So, the (near) future will contain even more research labs and sources of potentially dangerous new techniques.

• 10 years is a really long time. Like, 10 years ago it was 2012, deep learning was barely starting to be a thing, the first Q-learning-based Atari-playing model (DQN) hadn't even been released yet, etc. A lot of progress has happened from 2012 to 2022. And the amount of progress will presumably be (much) greater in the next 10 years. I feel like I have almost no clue what the future will look like in 10-20 years.

• We (or at least I) still don't even know any convincing story of how to align autonomous AGI to "human values" (whatever those even are). (Let alone having practical, working alignment techniques.)

Given the above, I was surprised by the apparent level of confidence given to the proposition that "AI is unlikely to pose an existential risk in the next 10-20 years". I wonder where OP disagrees with the above reasoning?
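
To illustrate why a "large conjunction over many research labs" tends to be improbable, here is a toy calculation. It rests on strong assumptions that are not in the comment above: every lab has the same hypothetical probability `p_per_lab` of developing dangerous AGI within the period, and labs are statistically independent. Real labs are neither identical nor independent, so this is only a sketch of the shape of the argument.

```python
def prob_no_lab_builds_agi(p_per_lab: float, n_labs: int) -> float:
    """Probability that none of `n_labs` independent labs develops dangerous AGI,
    given each does so with probability `p_per_lab` (hypothetical inputs)."""
    return (1.0 - p_per_lab) ** n_labs


# Hypothetical per-lab probability of 5% over the whole period.
for n_labs in (5, 20, 50):
    p_safe = prob_no_lab_builds_agi(0.05, n_labs)
    print(f"{n_labs:>2} labs: P(no lab builds it) = {p_safe:.3f}")
```

Even with a small per-lab probability, the chance that every lab stays cautious shrinks quickly as the number of labs grows, which is the intuition behind the bullet points above.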

Regarding {acting quickly} vs {movement-building, recruiting people into AI safety, investing in infrastructure, etc.}: I think it's probably obvious, but maybe bears pointing out, that when choosing strategies, one should consider not only {the probability of various timelines} but also {the expected utility of executing various strategies under various timelines}.

(For example, if timelines are very short, then I doubt even my best available {short-timelines strategy} has any real hope of working, but might still involve burning a lot of resources. Thus: it probably makes sense for me to execute {medium-to-long timelines strategy}, even if I assign high probability to short timelines? This may or may not generalize to other people working on alignment.)
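
The point in the parenthetical can be made concrete with a toy expected-value comparison. All numbers below are hypothetical placeholders chosen purely for illustration (they are not estimates from this thread): a 70% probability of short timelines, and made-up success probabilities for each strategy under each timeline.

```python
# Hypothetical probabilities of each timeline scenario (must sum to 1).
timeline_probs = {"short": 0.7, "long": 0.3}

# Hypothetical probability that each strategy succeeds, per timeline scenario.
success_prob = {
    "short_timelines_strategy": {"short": 0.02, "long": 0.10},
    "medium_to_long_strategy": {"short": 0.00, "long": 0.40},
}


def expected_success(strategy: str) -> float:
    """Success probability of a strategy, averaged over timeline scenarios."""
    return sum(
        timeline_probs[t] * success_prob[strategy][t] for t in timeline_probs
    )


expected = {s: expected_success(s) for s in success_prob}
best = max(expected, key=expected.get)

for strategy, value in expected.items():
    print(f"{strategy}: {value:.3f}")
print("best:", best)
```

With these made-up inputs, the medium-to-long strategy comes out ahead even though short timelines are assumed to be more likely, because the short-timelines strategy has little chance of working in either scenario.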

At the moment we are seeing a host of simple generic ML algorithms, of the type for which GPT3 and DALLE2 are stereotypical examples. I wouldn't go so far as saying this will last 10 years, let alone 20.

But before that, deployment of these systems will favor smaller, faster, and more auditable models leading companies to focus on distilled models specializing in specific tasks.

Suppose there is a task for which there is little training data on exactly that task. You can't train a good chatbot on a page of text. But you can fine-tune GPT3 on a page of text. Training a general model and then fine-tuning is a useful strategy. (And someone with loads of compute will go as general as they can.)

So when a large ML model is trained on a wide variety of tasks (let's suppose it's trained using RL), can it be dangerous? I think this is a difficult question, and it relates to how much large neural nets can learn deep patterns as opposed to shallow memorization.

I can't see this path leading to high existential risk in the next decade or so.

Here is my write-up for a reference class of paths that could lead to high existential risk this decade. I think such paths are not hard to come up with and I am happy to pay a bounty of \$100 for someone else to sit for one hour and come up with another story for another reference class (you can send me a DM).

Even if the Tool AIs are not dangerous by themselves, they will foster productivity. (You say it yourself: "These specialized models will be oriented towards augmenting human productivity"). There are already many more people working in AI than in the 2010s, and those people are much more productive. This trend will accelerate, because AI benefits compound (e.g. using Copilot to write the next Copilot) and the more ML applications automate the economy, the more investment in AI we will observe.

For me, the biggest, most tangible risks from AI are from those AIs (agent or tool AIs) that are connected to the real world and can influence human incentives, especially those that can work at a global level.

If you have a Tool AI connected to the real world and able to influence humans through strong incentives, wouldn't this be a very high risk even if they never become full AGI?

There are actual real-life examples doing this right now, most notoriously AIs used for trading financial assets. These AIs are optimizing for profit, by playing with one of the strongest human incentives: money. Changes in financial incentives worldwide can trigger bankruptcies, bank-runs, currency and economic collapses.

I am not claiming a single AI might do this (although it might if given enough resources), but we are far from understanding what happens when multiple competing AIs are trying to outperform each other on the financial markets while trying to capture the maximum profit.

A second, less dangerous kind are those AIs that optimize for engagement (i.e. Facebook/YouTube/Twitter/Instagram/TikTok feeds). The risk here is that maximizing engagement amounts to capturing maximum human attention, which is a zero-sum game, taking attention away from other real-life activities (like studying, researching, bonding, helping others). Add to these AIs the capability to create content (GPT-3/DALL-E) and you might create a human-enslaving tool.

These things are happening right now, in front of our noses and we are completely unaware of the damage that they might be causing and can cause in the future.