We at Epoch AI have recently released a new composite AI capability index called the Epoch Capabilities Index (ECI), based on nearly 40 underlying benchmarks. Some key features... * Saturation-proof: ECI "stitches" benchmarks together, to enable comparisons even as individual benchmarks become saturated. * Global comparisons: Models can be compared,...
OpenAI reports that o3-mini with high reasoning and a Python tool scores 32% on FrontierMath. However, in Epoch's official evaluation[1], the model scored only 11%. There are a few reasons to trust Epoch's score over OpenAI's: * Epoch built the benchmark and has better incentives. * OpenAI reported a 28% score on...
Understanding what drives the rising capabilities of AI is important for those who work to forecast, regulate, or ensure the safety of AI. Regulations on the export of powerful GPUs need to be informed by an understanding of how these GPUs are used, forecasts need to be informed by bottlenecks, and...
[Epistemic Status: I think this post gestures in the right direction and the ideas in it are positive on the margin. However, I am uncertain what I would write if I put substantially more time into thinking on the subject, and I think it plausible that discussion on this post will...
When discussing impactful research directions, it's tempting to get excited about ideas that seem deep and profoundly insightful. This seems especially true in areas that are theoretical and relatively new - such as AI Alignment Theory. Fascination with the concept of a research direction can leak into evaluations of the...