I've been tracking 53 predictions from the AI 2027 scenario against reality for about a year. I'll get into why I started this, but for my first post in this community, I want to lead by sharing what I'm seeing, and one pattern in particular.
Risks before capabilities
While I think it's remarkable how well AI 2027 forecasted the overall progress of AI, I have to concede: most of the capability predictions it made are running behind. SWE-bench is the clearest example: AI 2027 predicted 85% by mid-2025, while the actual best was 74.5% (Opus 4.1). Compute scale-ups and benchmark timelines have mostly slipped.
But the safety, security, and governance predictions are a different story. Most prominently, Anthropic's red team reported that Claude Mythos Preview found thousands of zero-days autonomously, as a side effect of training for code and reasoning. AI 2027 predicted this for Agent-2 in early 2027; it happened about a year early. DOD-AI lab dynamics are tracking similarly early.
So the pattern I see is: risks are arriving before the raw capabilities that were supposed to produce them. That seems under-appreciated to me. FutureSearch's recent one-year retrospective makes a similar Mythos-as-Agent-2 observation; the tracker tries to do this systematically across all 53 predictions. Here's my current overall take:
The scorecard
Status              Count   %
Confirmed           14      26%
Ahead               3       6%
On Track            10      19%
Behind              4       8%
Emerging            13      25%
Not Yet Testable    9       17%
In other words, 27 of 53 predictions (51%) are confirmed, ahead, or on track. This is directionally in line with Kokotajlo and Lifland's Feb 2026 grading, which put quantitative progress at ~65% of predicted pace while saying most qualitative claims were on track. Their Q1 2026 update then pulled automated-coder medians forward again (Daniel's median moved from late 2029 to mid-2028), citing faster METR doubling times.
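For readers who want to check the arithmetic, the scorecard tallies can be reproduced in a few lines. The counts are copied from the table above; everything else is plain arithmetic:

```python
# Scorecard counts as reported in the table above.
scorecard = {
    "Confirmed": 14,
    "Ahead": 3,
    "On Track": 10,
    "Behind": 4,
    "Emerging": 13,
    "Not Yet Testable": 9,
}

total = sum(scorecard.values())  # all tracked predictions
tracking_well = sum(scorecard[s] for s in ("Confirmed", "Ahead", "On Track"))

print(total)                               # 53
print(tracking_well)                       # 27
print(round(100 * tracking_well / total))  # 51
```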
Why track this
When I read AI 2027 shortly after it was published in April 2025, it struck me hard. Many people around me were dismissing the outcomes, not the timelines. At the same time, several aspects of the work stood out to me. Most of all: many of its claims, extrapolations, and ideas were too plausible for me to dismiss easily. I've worked in areas of exponential progress for most of my career, and I recognized the familiar pattern of humans failing to intuitively grasp exponential growth. AI 2027 also stood out because it was the first time I saw fears, vague discussions, and abstract thoughts turned into eerily specific, falsifiable predictions. Until then, most AI forecasting was vague enough to claim victory regardless of outcome. This was different.
So when I faced broad dismissal in discussions with my non-tech peers, I felt it would help to have a structured record of what actually happened versus what was predicted. In the tracker, each prediction has its own page with the original claim, evidence for and against, a status with reasoning, primary-source links, and what would change the status.
Methodology
I maintain six status levels: Confirmed / Ahead / On Track / Behind / Emerging / Not Yet Testable. Changes beyond Emerging require explicit evidence. I calibrate against external reference points where possible, for example AI Futures Project grading, METR, and public financials. Counterevidence and update histories are visible on every page. Agent runs gather the evidence and draft the updates; these ran monthly at first and now run weekly, since the pace of reality seems to be picking up. All status changes are approved manually. Full write-up at ai2027-tracker.com/methodology.
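As a rough sketch of what "changes beyond Emerging require explicit evidence" means in practice, the rule can be expressed as a small guard. The names and the choice of which statuses count as "beyond Emerging" are my illustrative interpretation here, not the tracker's actual code:

```python
# Hypothetical sketch of an evidence-gated status transition rule.
# Status names mirror the six levels above; everything else is illustrative.
STATUSES = ["Not Yet Testable", "Emerging", "Behind", "On Track", "Ahead", "Confirmed"]

# Assumed interpretation: moving into any of these requires cited evidence.
EVIDENCE_GATED = {"Behind", "On Track", "Ahead", "Confirmed"}

def can_transition(current: str, proposed: str, evidence: list) -> bool:
    """Allow a status change only if evidence-gated targets carry sources."""
    if proposed not in STATUSES:
        raise ValueError("unknown status: " + proposed)
    if proposed in EVIDENCE_GATED and not evidence:
        return False  # e.g. jumping to "Confirmed" with no primary sources
    return True

# Usage: moving a prediction to "Confirmed" needs at least one source link.
print(can_transition("Emerging", "Confirmed", []))                # False
print(can_transition("Emerging", "Confirmed", ["a source link"]))  # True
```

The manual-approval step from the methodology would sit on top of a check like this, not inside it.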
About the project
This is an independent project and I have no affiliation with the AI 2027 authors. I work on AI transformation in Hamburg.
AI 2027 may not perfectly predict the timelines, but it did a great job outlining the underlying forces driving these developments. And it took worries that normally live in vague discussion and turned them into dated, checkable claims. Vague worries are easy to dismiss. Dated predictions are not. The tracker is an attempt to keep that translation honest over time, to gather evidence that this unprecedented change is happening, and to create a sense of urgency around making it happen more safely.
I'd especially welcome feedback on:
Predictions I've extracted or interpreted incorrectly
Status assessments that seem too generous or too harsh
Evidence sources I should be tracking and am missing