Some of my predictable updates on AI

Aaron_Scher

Introduction

Author note: I’m struggling to write this in a way I’m happy with. Rather than having it sit in my drafts, I’m going to post in an unfinished state now. I don’t think I endorse sharing/upvoting this much, but if you find the content particularly compelling feel free to overrule me. Epistemic status: speculative and imprecise forecasts

Joe Carlsmith has a lengthy blog post about predictably updating on AI risk. I skimmed it and found it interesting. This post includes some of my predictions about AI and AI risk in the next year or so, including what I expect to happen and how I think I should update if my predictions are wrong. Partially I’m writing this to force myself to make predictions, partially I’m writing to get feedback from others about my predictions, and partially I’m writing to share these predictions and their associated updates with others concerned about AI existential safety. This is a suspicious activity, partially because it’s really hard to know now how future events should affect my beliefs. For example, I currently expect a thing like election interference to not change my beliefs much, but there may be unpredicted-by-me circumstances which actually make election interference evidence very important; doing this exercise might make it more difficult to change my beliefs properly later.

For each item, I’ll note:

What’s the uncertainty
What do I expect and how much do I expect it
How I should update if something I don’t expect happens, and how much I should care

tldr

In short, here are some things I expect:

Some big policy stuff will happen, and it will seem neutral or positive
AI labs will agree on some safety standards which will seem positive
AI labs might compete on some safety things, but I expect a mediocre outcome
Task-oriented / agentic LLMs will be a bigger deal, and this will be scary
Misuse threats will be a big deal, and many more people will care about this
AI will enable interference in the US 2024 elections, but the policy effects are the main part I care about
We’ll make significant progress on alignment for current AIs which will look optimistic

Overall, I expect the most significant updates on the table in the next year are related to how seriously AGI labs seem to be taking existential safety (where I expect some positive signs), and how AI alignment research is going (where I expect pretty good results).

Now let’s get into the specifics.

Some big policy stuff will happen

Uncertainty: What will happen with AI regulation
Expectation: There will be major bills in Congress, committees and stuff, forums to discuss risks from AI, including some emphasis on x-risk. I expect it will mostly be unclear how useful any of these things are, and many of them will seem like they are years away from actually affecting AI development. Nevertheless, I expect these will be seen as positive signs. I’m currently at like 70% that there are major moves in the US government on AI by the end of 2024, where “major moves” is vague. I expect the major moves in the US government to seem mostly positive or neutral from an x-risk perspective.
Violation: I would consider it a violation of my prediction if there were significant regulations or bills gaining traction that seem very bad from an x-risk perspective, e.g., requiring AGI labs to open source models, provide training details publicly, stricter anti-trust law around AI developers. I would also consider it a violation of my prediction if the hubbub around AI governance is quieter in October 2024 than it is now (poorly defined). Mostly, if I’m surprised I expect it to be in a negative way, but there’s also some chance regulation seems quite useful. I think it’s pretty unlikely that evidence about AI governance I receive in the next year will update my beliefs about x-risk much.

AI labs will agree on some safety standards

Uncertainty: How will AGI labs orient to AI safety
Expectation: This is already happening somewhat. The Frontier Model Forum will probably do something. Probably a couple more AI labs will sign on to these White House standards. Maybe they’ll all agree to do certain dangerous capability evals or work with certain external red-teams, e.g., more AI developers adopt RSPs. Maybe I’m 70% that by the end of 2024, OpenAI and Google DeepMind will have adopted responsible scaling policies of some kind. I mostly won’t be surprised if things are looking good in terms of the labs taking AI safety seriously.
Elaboration: I think one of the plausible paths to victory in this alignment game is a leading lab differentially automating/accelerating safety research, and this requires either this lab to be safety-oriented or heavily and specifically regulated. Therefore, whether the leading labs are serious about catastrophe prevention is a pretty important variable and one where evidence in the next year could be a significant update.
Violation: I would be surprised if a lab which is less safety-conscious than those mentioned above seems to have the best AI system by the end of 2024; this would make me considerably more worried about x-risk. If OpenAI and Google DeepMind (GDM) have not both adopted RSPs of some sort by EoY 2024, this would be a medium update toward being worried, but plausibly there are circumstances which make this matter much less, e.g., model scaling stops working. It’s also possible that the labs exceed my expectations, for instance if they (Anthropic, OpenAI, GDM) adopted “cease and assist” rules that seemed likely to be enforced, or if a larger set of labs (e.g., Anthropic, OpenAI, GDM, Meta, Adept, Inflection) signed onto an RSP that required a high degree of caution in 2024.

AI labs will compete on some safety things

Uncertainty: Will AGI labs engage in any meaningful competition on safety. Broadly the thing is “AI labs compete on a bunch of capabilities metrics already, but as safety becomes more of a concern to the public and labs, we should also expect this for some safety things”, even if they’re not the most important things.
Expectation: We’re already seeing some of this with benchmarks like TruthfulQA and harmlessness comparisons. Depending on what benchmarks and metrics are available, there may be other forms of this. I mostly expect to feel *positive shrug* toward the work that comes out of this, as I don’t expect it to be very impressive or total garbage. Maybe I’m 60% that by the end of 2024 there will be examples of decent research from one lab that seem aimed at outcompeting another lab on safety.
Violation: Both having and not having competitions on safety seems pretty reasonable. I would be positively surprised if there were major research successes coming out of research competitions, but it’s hard to imagine what this looks like.

Task-oriented / Agentic LLMs will be a bigger deal

Uncertainty: Will there be substantial advances in AIs capable of accomplishing tasks in the real world
Expectation: AutoGPT is pretty bad and just scratching the surface. I believe there is substantial headroom here, by which I mean both substantial capability gains over current systems, and substantial usefulness/economic value to be captured. More specifically, I expect GDM to release Gemini soon, and for RL on short to medium length task completion to be a thing. I expect Gemini to be approximately as good as GPT-4. I don’t have much of a prediction for whether Gemini will be a closed release, available via API, or open; I guess I’ll update slightly on GDM depending on which, though I’m unsure how big of an update.
Violation: I will be surprised if Gemini hasn’t been publicly demoed by March 2024, or if it performs more than 10 percentage points differently than GPT-4 when averaged across many benchmarks. If Gemini is substantially better than GPT-4 I’ll become slightly more confident in short timelines, whereas if it is substantially worse than GPT-4 this would be a slight update toward longer timelines and on OpenAI having a bigger lead than I currently expect.
Expectation: I think it wouldn’t be crazy if there were AI agents doing stuff online by the end of 2024, e.g., running social media accounts, selling consulting services; I expect such agents would be largely human-facilitated like AutoGPT. I will be slightly surprised if by end of 2024 there are AI agents running around the internet that are meaningfully in control of their own existence, e.g., are renting their own cloud compute without a human being involved. I expect most of this stuff will be fairly concerning when it happens; I’m trying to price that in now. I would be positively surprised if the US government instituted rules significantly restricting the operation of autonomous agents by end of 2024 (e.g., saying compute providers can only sell to humans, with enough nuance that this isn’t easily bypassed).
Violation: I don’t think updates here would be very influential, except I would get a bit more worried and confident in short timelines if we got really good agents in 2024. If the state of agents at the end of 2024 is as bad as currently, this would be a slight update toward less doom and toward longer timelines (it’s only a slight update because agents soon isn’t a major part of why I’m worried, and a lack of agents in 2024 isn’t much evidence about agents a generation or two later).

Misuse threats will be a big deal

Uncertainty: Will the proliferation of powerful AIs lead to misuse-based catastrophes
Expectation: LLMs that significantly help with the creation of bio weapons are 2-3 years away, according to Dario Amodei; hacking capabilities are probably around the same or sooner. Preventing misuse will have a lot of open problems (i.e., good area for research now!). Also multimodal and image models will see progress, making particular risks like deepfakes more prominent. Some of the necessary problems will receive a fair amount of attention by default, as watermarking seems to be getting, but others may not. Historically I think I’ve been less worried than I should be about misuse-related AI catastrophe — oops. A specific case is that I used to not be sure if we would get warning shots. It now looks like we’re probably going to get medium warning shots on the scale of hundreds of millions of dollars or hundreds of deaths, due to AI-enabled attacks in the next few years. I’m slightly surprised we haven’t seen effective misuse of current open source LLMs, but this seems like mostly a matter of time. I’m probably 65% that there is a misuse of AI that causes concrete damage in the world by EoY 2024 (e.g., a cyber attack that costs >100k), and more like 10% on medium warning shots by EoY 2024.
Violation: Overall, I think we’re on a trajectory where I’ll get a bit more worried about misuse threats, but this prediction could be violated by e.g., strong rules banning open source. I would be surprised if there was a point in the future where I thought total misuse x-risk was higher than total misalignment x-risk, though this is very possible if, for instance, catastrophic misuse risks end up seeming very likely. I would also be surprised if at some point in the next year I thought catastrophic misuse risk was <5% likely at all. I think these should be small to medium updates. I would be surprised if there was a medium warning shot by EoY 2024, and this would probably make me more confident in short timelines, and have an unclear effect on overall doom (depends how the world reacts).
Expectation: I expect people outside of the current AI safety community to get even more worried about misuse risks than they currently are, and for there to be substantial research aimed at addressing these risks.
Violation: I would be surprised if, at EoY 2024, people were less worried about AI misuse than they are in October 2023. I might have a small update, depending on why this was.

Likely major foreign interference in US 2024 elections

Uncertainty: Will AIs be used to enable interference in US elections, and how will the US respond. Maybe this becomes a thing 2024 candidates have positions on. Like if there’s a bunch of LLMs impersonating people and manipulating voters, candidates might have to answer about this in debates; what the various proposals are and who’s in the room could matter. The impact of this on policy probably depends a bit on whether it “blows up” before or after the election. Thanks Charlotte for discussion.
Expectation: Maybe I’m 65% that there will be significant interference and people know it. I also wouldn’t be horribly surprised if there isn’t substantial AI-accelerated interference with the election. I think it’s unlikely but plausible (30%) that there’s a policy reaction to election interference which seems good in terms of x-risk reduction.
Violation: This isn’t a key factor in my world model, and I think the most important aspects of it are whether or not it causes policy responses, which, again, may be downstream of when this becomes a big deal, if it becomes a big deal. I mostly don’t want to update on how election interference goes, unless it seems likely to cause particular policy changes or spur large amounts of somewhat-useful research (which currently seems like 50% likely). We pretty much have the capabilities, and there are plenty of resourced actors with the will, so whether or not major interference actually happens or merely could have happened doesn’t seem significant in terms of AI development (obviously it matters for other reasons).

We’ll make significant progress on “alignment” for current AIs

Uncertainty: Will we make progress on alignment of current AI systems
Expectation: I think alignment research in the current LLM paradigm is relatively easy, and I expect there are significant wins that we haven’t seen yet but will see soon (I might write up why I think this is easy, but in the meantime readers can also check out these posts: 1, 2). In the next 6 months I expect (75%) there will be at least one notable advancement in scalable oversight, e.g., an AI debate setup that yields >5% improvement over previous methods, think something similar in magnitude to Constitutional AI’s improvement over simple RLHF. I also expect there will be a bunch of interpretability research that seems helpful and interesting. I mostly expect that there will be high quality and useful prosaic / empirical alignment work, but that there will be considerably fewer successes for research aimed at future systems, e.g., agent foundations.
Violation: I would be surprised if 2024 had more than 10 significant advances in alignment, or fewer than 3. The things that count as semi-significant advances for 2023 according to me are: success of activation steering in interpretability, success of dictionary learning to largely solve polysemanticity in interpretability. Said differently, I’m expecting a 1.5-5x increase in significant advances next year. I think it would be a substantial update if we get more than 10 significant advances, and it would be a moderate update if we get fewer than 3. I would count research toward this if it was conducted earlier but only first publicized after October 2023. I might later write up something about what counts as significant advances in my eyes.

Conclusion

Writing this list has forced me to think about the future in a way I don’t usually do, which seems useful. On the whole, I expect to see some positive-but-not-amazing signs in the next year. Agentic AIs are probably going to happen soon, plausibly in the next year, and they’re gonna be wild/scary. On the other hand, I expect we’ll make significant progress on alignment and labs will seem to be taking existential risk (including misuse and misalignment) seriously.

LLMs that significantly help with the creation of bio weapons are 2-3 years away, according to Dario Amodei; hacking capabilities are probably around the same or sooner

So, I note that LLMs only significantly increase the risk of bioterrorist attacks if, indeed, such attacks are currently bottlenecked on knowledge of lab procedures, etc, that LLMs could provide. They could also be bottlenecked on any of the other steps involved -- any estimate that LLMs do increase the risk assumes that knowledge is the bottleneck.

I am unaware of any paper arguing that such knowledge is indeed the bottleneck.

Note that we also have evidence that such attacks are not currently bottlenecked on such knowledge, but on the (many) other steps involved. Here's a paper from the Future of Humanity Institute that argues that, for instance. So, if the paper is correct, open source LLMs do not contribute to biorisk substantially.

Even apart from the paper, that's also my prior view -- given the relatively weak generalization abilities of LLMs, if they could contribute to knowledge of lab procedures, then the knowledge is already out there and not-too-hard to find.

(The pattern of discourse around biorisk and LLMs looks much more like "ban open source LLMs" was the goal and "use biorisk concerns" was the means, rather than "decrease biorisk" was the goal and "ban open source LLMs" was the means. I'm not saying this about you, to be clear -- I'm saying this about the relevant thought-leaders / think tanks who keep mentioning biorisk.)

It now looks like we’re probably going to get medium warning shots on the scale of hundreds of millions of dollars or hundreds of deaths, due to AI-enabled attacks in the next few years. I’m slightly surprised we haven’t seen effective misuse of current open source LLMs, but this seems like mostly a matter of time.

I don't know what kind of AI-enabled-cyber-attacks causing hundreds of deaths you mean. Right now, if I want to download penetration tools to hack other computers without using any LLM at all I can just do so. What kind of misuse of current open source LLMs, enabling hundreds of deaths, did you expect to have seen?

Thanks for your comment!

I appreciate you raising this point about whether LLMs alleviate the key bottlenecks on bioterrorism. I skimmed the paper you linked, thought about my previous evidence, and am happy to say that I'm much less certain than I was before.

My previous thinking for why I believe LLMs exacerbate biosecurity risks:

Kevin Esvelt said so on the 80k podcast, see also the small experiment he mentions. Okay evidence. (I have not listened to the entire episode)
Anthropic says so in this blog post. Upon reflection I think this is worse evidence than I previously thought (seems hard to imagine seeing the opposite conclusion from their research, given how vague the blog post is; access to their internal reports would help).

The Montague 2023 paper you link: the main bottleneck to high consequence biological attacks is actually R&D to create novel threats, especially spread testing which needs to take place in a realistic deployment environment. This requires both being purposeful and having a bunch of resources, so policies need not be focused on decreasing democratic access to biotech and knowledge.

I don't find the paper's discussion of 'why extensive spread testing is necessary' to be super convincing, but it's reasonable and I'm updating somewhat toward this position. That is, I'm not convinced either way. I would have a better idea if I knew how accurate a priori spread forecasting has been for bioweapons programs in the past.

I think the "LLMs exacerbate biosecurity risks" claim still goes through even if I mostly buy Montague's arguments, given that those arguments are partially specific to high consequence attacks. Additionally, Montague is mainly arguing that democratization of biotech / info doesn't help with the main barriers, not that it doesn't help at all:

The reason spread-testing has not previously been perceived as a the defining stage of difficulty in the biorisk chain (see Figure 1), eclipsing all others, is that, until recently, the difficulties associated with the preceding steps in the risk chain were so high as to deter contemplation of the practical difficulties beyond them. With the advance of synthetic biology enabled by bioinformatic inferences on ‘omics data, the perception of these prior barriers at earlier stages of the risk chain has receded.

So I think there exists some reasonable position (which likely doesn't warrant focusing on LLM --> biosecurity risks, and is a bit of an epistemic cop out): LLMs increase biosecurity risks marginally even though they don't affect the main blockers for bio weapon development.

Thanks again for your comment, I appreciate you pointing this out. This was one of the most valuable things I've read in the last 2 weeks.

Second smaller comment:

I'm not saying LLMs necessarily raise the severity ceiling on either a bio or cyber attack. I think it's quite possible that AIs will do so in the future, but I'm less worried about this on the 2-3 year timeframe. Instead, the main effect is decreasing the cost of these attacks and enabling more actors to execute such attacks. (as noted, it's unclear whether this substantially worsens bio threats)

if I want to download penetration tools to hack other computers without using any LLM at all I can just do so

Yes, it's possible to launch cyber attacks currently. But with AI assistance it will require less personal expertise and be less costly. I am slightly surprised that we have not seen a much greater amount of standard cybercrime (the bar I was thinking when I wrote this was not the hundreds of deaths bar, it was more like "statistically significant increase in cybercrime / serious deepfakes / misinformation, in a way that concretely impacts the world, compared to previous years").

Very complicated ways of facilitating agency seem feasible. There's Imbue doing some CoEm-sounding things (debuggable planning in natural language that's possible to inspect and intervene on), without a clear stance on extinction risk implications. It might turn out that there is a whole operating system's worth of engineering effort that's useful for turning basic LLM-style capabilities into coherent autonomous reasoning.

This is mostly irrelevant for capabilities if scaling gets to AGI on its own, but if that doesn't happen in the next few years, extremely complicated agency engineering efforts might become more important than further scaling, giving CoEm-like or even CAIS-like systems an opportunity to determine the safety properties of first AGIs.

Some big policy stuff will happen, and it will seem neutral or positive
AI labs will agree on some safety standards which will seem positive
AI labs might compete on some safety things, but I expect a mediocre outcome
Task-oriented / agentic LLMs will be a bigger deal, and this will be scary
Misuse threats will be a big deal, and many more people will care about this
AI will enable interference in the US 2024 elections, but the policy effects are the main part I care about
We’ll make significant progress on alignment for current AIs which will look optimistic

Great list. I also expect these things.

I like the style of this post, thanks for writing it! Some thoughts:

model scaling stops working

Roughly what probability would you put on this? I see this as really unlikely (perhaps <5%) such that ‘scaling stops working’ isn’t part of my model over the next 1-2yrs.

I will be slightly surprised if by end of 2024 there are AI agents running around the internet that are meaningfully in control of their own existence, e.g., are renting their own cloud compute without a human being involved.

Only slightly surprised? IMO being able to autonomously rent cloud compute seems quite significant (technically and legally), and I’d be very surprised if something like this happened on a 1yr horizon. I’d be negatively surprised if the US government didn’t institute regulation on the operation of autonomous agents of this type by the end of 2024, basically due to their potential for misuse and their economic value. It may help to know how you're operationalizing AIs that are ‘meaningfully aware of their own existence’.

I think it's pretty unlikely that scaling literally stops working, maybe I'm 5-10% that we soon get to a point where there are only very small or negligible improvements to increasing compute. But I'm like 10-20% on some weaker version.

A weaker version could look like there are diminishing returns to performance from scaling compute (as is true), and this makes it very difficult for companies to continue scaling. One mechanism at play is that the marginal improvements from scaling may not be enough to produce the additional revenue needed to cover the scaling costs, this is especially true in a competitive market where it's not clear scaling will put one ahead of their competitors.

In the context of the post, I think it's quite unlikely that I see strong evidence in the next year indicating that scaling has stopped (if only because a year of no-progress is not sufficient evidence). I was more so trying to point to how there [sic] are contingencies which would make OpenAI's adoption of an RSP less safety-critical. I stand by the statements that scaling no longer yielding returns would be such a contingency, but I agree that it's pretty unlikely.

Only slightly surprised?

We are currently at ASL-2 in Anthropic's RSP. Based on the categorization, ASL-3 is "low-level autonomous capabilities". I think ASL-3 systems probably don't meet the bar of "meaningfully in control of their own existence", but they probably meet the thing I think is more likely:

I think it wouldn’t be crazy if there were AI agents doing stuff online by the end of 2024, e.g., running social media accounts, selling consulting services; I expect such agents would be largely human-facilitated like AutoGPT

I think it's currently a good bet (>40%) that we will see ASL-3 systems in 2024.

I'm not sure how big of a jump it will be from that to "meaningfully in control of their own existence". I would be surprised if it were a small jump, such that we saw AIs renting their own cloud compute in 2024, but this is quite plausible on my models.

I think the evidence indicates that this is a hard task, but not super hard. e.g., looking at ARC's report on autonomous tasks, one model partially completes the task of setting up GPT-J via a cloud provider (with human help).

I'll amend my position to just being "surprised" without the slightly, as I think this better captures my beliefs — thanks for the push to think about this more. Maybe I'm at 5-10%.

It may help to know how you're operationalizing AIs that are ‘meaningfully aware of their own existence’.

shrug, I'm being vague