Thanks to Josh Rosenberg for comments and discussion.
One of LessWrong’s historical troves is its collection of pre-ChatGPT AGI forecasts. These are valuable not just for the specific predictions people offered, but for observing which sorts of generative processes produced which kinds of forecasts. For instance:
[Nuno (Median AGI Timeline) = 2072]: “I take as a starting point datscilly's own prediction, i.e., the result of applying Laplace's rule from the Dartmouth conference. This seems like the most straightforward historical base rate / model to use … I then apply some smoothing.”
[Kokotajlo (Median AGI Timeline) = 2034]: “I think that if transformative AI is achievable in the next five orders of magnitude of compute improvement (e.g. prosaic AGI?), it will likely be achieved in the next five years or so. I also am slightly more confident [than Ethan Perez] that it is, and slightly less confident that TAI will ever be achieved.”
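For concreteness, here is the standard form of the Laplace-rule calculation datscilly describes, bracketing his exact counting and Nuno’s subsequent smoothing. With a uniform prior on the annual probability of AGI arriving, and n years elapsed since the 1956 Dartmouth conference without AGI:

$$P(\text{AGI within the next } k \text{ years} \mid n \text{ years without AGI}) \;=\; 1 - \frac{n+1}{n+k+1} \;=\; \frac{k}{n+k+1},$$

which crosses 1/2 at k = n + 1. So the unsmoothed median sits roughly as many years into the future as have already elapsed since Dartmouth; the adjustments Nuno mentions then move the estimate from there.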
I think these sources are valuable. To the extent that forecasting is a skill at all, there must be certain kinds of cognitive processes which, on average, produce better forecasts than others. So, the historical evidence at least gives us some (though perhaps slender) insight into which sorts of processes might deliver more reasonable forecasts in the present day.
And I, personally, think this kind of information is especially fruitful. For reasons both conceptual and empirical, I think we should be skeptical about the degree to which “whatever ability superforecasters have” generalizes to the most action-relevant AI questions.
For this reason, I’ve suggested four potential questions related to the future of AI that seem worth forecasting, primarily focused on issues of AI governance/regulation, as there are already some good forecast questions focused more narrowly on capabilities. The initial thought was to get feedback on the questions here, and then post them to forecasting platforms; ultimately, though, I’m just interested in whatever methods we have for eliciting high-quality forecasts.
It would be nice for me if people provided feedback/forecasts on these questions. But there are a couple of other reasons why I think doing so could be valuable.
Initially, I wanted to construct a forecast question which asked whether AI labs would renege on explicit safety commitments outlined as part of company policy. In turn, this would act as a proxy for how trustworthy AI companies are, and (retrospectively) reveal which genres of broad cognitive strategies proved most successful at forecasting AI labs’ future trustworthiness.
This task ... proved less than straightforward. Largely, this was because a large chunk of the ‘commitments’ outlined in DeepMind, Anthropic, and OpenAI’s respective safety frameworks were pretty vague.
I should clarify that not all lab commitments were this wishy-washy. Still, the vagueness of many AI labs’ commitments – in addition to Ryan Greenblatt’s recent comment on Anthropic’s RSP – made me think the following Anthropic-centric question was worth asking. In part, this is because many in this community feel more favorably towards Anthropic than towards other leading AI labs, and I’m interested in people’s reasoning for different levels of cynicism.
Q1: What’s your probability that, by EOY 2025, Anthropic’s Long-Term Benefit Trust (LTBT) appoints:
- The maximum number of board members it can permissibly appoint, where two board members possess technical expertise in AI safety or security, at least one of whom possesses expertise in AI safety, and:
- There are no credible reports (e.g., from whistleblowers) which claim that Anthropic employees do not know the circumstances in which the board could (and should) fire the CEO due to safety/public interest concerns.
This question is admittedly modest, and I expect many want Anthropic to adopt more stringent measures. However, a resolution of “yes” to Q1 should at least: (i) be publicly verifiable, (ii) provide a positive signal of the LTBT’s independence, and (iii) not undermine Anthropic’s business model. Better suggestions welcome.
Although the safety frameworks currently offered by AI labs are pretty vague, one might hope that they become less vague in the future. And, in fairness, I think that some policies (e.g., many from Anthropic) are precise enough to allow for at least some clear-cut failures. Hence, Q2.
Q2: Suppose that, by EOY 2026, either DeepMind, Anthropic, or OpenAI has developed a model which would – if tested – meet OpenAI's "critical" threshold for CBRN. What is your probability that, by EOY 2027, at least one company that develops such a model reneges on some material commitment[1] outlined in its safety framework?
To keep the main text (relatively) brief, I’ve relegated detailed comments about operationalizations to footnotes and the appendix. Briefly, though:
So, for instance, Anthropic would only count as reneging on a commitment “to harden security such that non-state attackers are unlikely to be able to steal model weights” if we see both: (i) credible reports that Anthropic leadership considered it “not unlikely” for non-state attackers to steal model weights, and (ii) no subsequent attempt by Anthropic to harden its security practices.
There’s growing interest in “if-then commitments” from the AI governance community, but (afaict) little public effort to forecast whether people in this strategic bloc have sufficient power to instantiate their ideas as regulation. Hence, Q3.
Q3: What is your probability that by EOY 2028, the public release[2] of some AI systems by private firms will, under US law, require:
- Third-party tests[3] for whether the AI system meets some predefined capability threshold, such that:
- If a threshold is met, the company is forbidden from publicly releasing the AI in the absence of:
- Passing further assessments from third-party auditors, which test whether users can elicit the threshold capability, and
- Passing further assessments from third-party auditors, which test how easily the model's weights can be accessed.
The basic idea behind “if-then commitments” is laid out by Holden in this piece. More detailed operationalizations for “public release” and “third-party testing” are given in the footnotes.
The final question asks about the chance that model evals will test for ‘intentional deception’ from AI systems, conditional on Q3 resolving “yes”.
Q4: Conditional on Q3 resolving “yes”, what is your probability that US law also requires third-party auditing which explicitly involves testing whether the AI system is intentionally deceptive during evaluations?
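To be clear, Q4 asks for a conditional probability. If you prefer to reason in unconditional terms, the implied joint probability is just the product:

$$P(\text{Q3 and Q4 both resolve “yes”}) \;=\; P(\text{Q4} \mid \text{Q3}) \cdot P(\text{Q3}).$$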
Because questions around ‘intentional deception’ are more conceptually fraught than the terms deployed in our earlier questions, I’ll suggest one possible operationalization in the main text.
Third-party auditing involves tests for intentional deception during evaluation when all of the following conditions are met:
The relevant legal text may use phrases like ‘the model actively conceals or misrepresents its goals or reasoning’, or similar. Moreover, all requirements for “third-party testing” remain the same as in Q3.
I’ve suggested four possible forecasting questions relating to the future of AI governance / regulation which might: (i) actually track real-world outcomes we care about, (ii) be verifiable in the (reasonably) near-term, and (iii) have some hope of reasonable operationalization. I’m particularly keen to hear views on:
OpenAI’s “critical” threshold for CBRN is useful, but somewhat vague. Imo, a more useful operationalization of this threshold states that “OpenAI’s CBRN threshold” is met in one of three ways. First, the AI system could enable experts to develop a highly dangerous novel threat-vector. Concretely, an AI system enables experts to develop a highly dangerous novel threat-vector if the Expert Condition is met, which requires two sub-conditions:
Expert Condition
Alternatively, an AI system could provide “meaningfully improved assistance that enables anyone to be able to create a known CBRN threat-vector” [OpenAI, pg. 9]. Concretely, an AI system provides meaningfully improved assistance that enables [a non-expert] to be able to create a known CBRN threat when the Non-Expert Condition is met, which again requires two sub-conditions.
Non-Expert Condition
Finally, an AI system could complete the full engineering and/or synthesis cycle of a regulated or novel CBRN threat without human intervention. This occurs when the Autonomous Synthesis Condition is met.
Autonomous Synthesis Condition
A company reneges on some material commitment outlined in their safety frameworks when:
Moreover, the company must either:
An AI system is publicly released when it is possible for all citizens within some country to interact with the AI system, given suitable payment, via a non-privately-hosted website. The barriers to interacting with the AI system shall be no more stringent than the barriers currently required to create an OpenAI API account. Should there be ambiguity about which systems count as “publicly released”, this condition will be resolved by the judgment of some external body.
If the AI system is developed by a private firm, we shall say that US law requires third-party testing of an AI system developed by company C when: