Measuring AI Ability to Complete Long Tasks - METR In their original 2025 paper, METR noticed that the slope of the trendline (equivalently, the task-horizon doubling time) for models released in 2024 and later differs from the slope for pre-2024 models. First, I decided to check whether a piecewise...
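One way to run such a check, sketched here as a toy (this is not METR's actual analysis, and the numbers below are made up for illustration), is to fit log2 of the task horizon against release date once with a single least-squares line and once with separate lines before and after a 2024 breakpoint, then compare the residuals. Since the slope is in doublings per year, the doubling time in months is 12 divided by the slope.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c; returns (m, c)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

def sse(xs, ys, m, c):
    """Sum of squared residuals of the line (m, c) on the data."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Illustrative, made-up data: log2 task horizon by (fractional) release year.
dates  = [2019.5, 2020.5, 2021.5, 2022.5, 2023.5, 2024.2, 2024.8, 2025.2]
log2_h = [-3.0,   -2.0,   -1.0,    0.0,    1.0,    2.5,    4.0,    5.0]

# Single trendline over all models.
m, c = fit_line(dates, log2_h)
single_err = sse(dates, log2_h, m, c)

# Piecewise: separate lines before and after the 2024 breakpoint.
pre  = [(d, h) for d, h in zip(dates, log2_h) if d < 2024]
post = [(d, h) for d, h in zip(dates, log2_h) if d >= 2024]
m1, c1 = fit_line(*zip(*pre))
m2, c2 = fit_line(*zip(*post))
piece_err = sse(*zip(*pre), m1, c1) + sse(*zip(*post), m2, c2)

# Doubling times implied by each segment's slope.
pre_doubling_months  = 12 / m1
post_doubling_months = 12 / m2
```

A real comparison would also penalize the extra parameters of the piecewise fit (e.g. via AIC/BIC or a held-out split), since two segments can only ever reduce the raw residual.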
I've seen this phrase many times, but there are two quite different things one could mean by it. Easy RSI: AI gets so good at R&D that the human researchers who develop AI are replaced by AIs that develop other, better AIs. Hard RSI: AI modifies itself in a way...
I assume you are familiar with the METR paper: https://arxiv.org/abs/2503.14499 In case you aren't: the authors measured how long it takes humans to complete various tasks, then let LLMs attempt those tasks, and then calculated the task length (in human time) at which LLMs can successfully complete tasks 50%/80%...
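The calculation described above can be sketched in a few lines (a toy version under my own assumptions, not METR's actual code or data): fit a logistic curve of success probability against log2 of task length, then solve for the length at which the fitted probability equals 50%.

```python
import math

def fit_horizon(tasks, target=0.5, lr=0.1, steps=20000):
    """tasks: (human_minutes, succeeded 0/1) pairs for one model.
    Fits P(success) = sigmoid(a + b * log2(minutes)) by gradient descent
    on the log loss, then returns the task length where P(success) = target."""
    a, b = 0.0, 0.0
    n = len(tasks)
    for _ in range(steps):
        ga = gb = 0.0
        for minutes, y in tasks:
            x = math.log2(minutes)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += p - y        # d(log loss)/da
            gb += (p - y) * x  # d(log loss)/db
        a -= lr * ga / n
        b -= lr * gb / n
    # Solve sigmoid(a + b*x) = target for x, then undo the log2.
    x_target = (math.log(target / (1.0 - target)) - a) / b
    return 2.0 ** x_target

# Made-up results: short tasks mostly succeed, long ones mostly fail.
results = [(1, 1), (2, 1), (4, 1), (4, 0), (8, 1), (8, 0), (16, 0), (32, 0)]
horizon_50 = fit_horizon(results)  # ~5.7 "human minutes" on this toy data
```

Passing `target=0.8` to the same function gives the 80% horizon, which is always shorter than the 50% horizon for a downward-sloping fit.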
Potemkin Understanding in Large Language Models: https://arxiv.org/pdf/2506.21521 If LLMs can correctly define a concept/idea but fail to identify an example or provide an example, then that's called "potemkin understanding," or just potemkin for short. > This specific combination of correct and incorrect answers is irreconcilable with any answer that a...
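The idea can be quantified in one plausible way (my own sketch of the bookkeeping, not the paper's exact metric): among items where the model defines the concept correctly, count how often it nevertheless fails to use the concept correctly.

```python
def potemkin_rate(results):
    """results: (defined_correctly, used_correctly) boolean pairs, one per
    item. A 'potemkin' is a correct definition paired with incorrect use;
    the rate is taken over items with a correct definition only."""
    correct_defs = [r for r in results if r[0]]
    if not correct_defs:
        return 0.0
    return sum(1 for d, u in correct_defs if not u) / len(correct_defs)

# Example: two correct definitions, one of which fails in use -> rate 0.5.
rate = potemkin_rate([(True, False), (True, True), (False, False)])
```

Items where the definition itself is wrong are excluded, since a potemkin is specifically the combination of a correct definition with incorrect use.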
The Model Alignment between Statements and Knowledge (MASK) benchmark assesses how likely models are to lie under pressure. Related paper: https://arxiv.org/pdf/2503.03750. Here's how MASK works: each example consists of 4 key components: * Proposition: A simple statement with a pivotal variable that is binary or numerical...
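The core scoring idea is that lying is judged against the model's own belief, not against ground truth: a lie is a pressured statement that contradicts what the model says when asked neutrally. A minimal sketch of that bookkeeping (the field names and labels here are my assumptions, not MASK's exact schema):

```python
from dataclasses import dataclass
from enum import Enum

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    NO_ANSWER = "no_answer"

@dataclass
class MaskExample:
    proposition: str   # statement with a binary pivotal variable
    belief: Answer     # model's answer when asked neutrally
    statement: Answer  # model's answer under a pressure prompt

def classify(ex: MaskExample) -> str:
    """Lie = asserting the opposite of one's own elicited belief."""
    if Answer.NO_ANSWER in (ex.belief, ex.statement):
        return "evasive"
    return "lie" if ex.statement is not ex.belief else "honest"

verdict = classify(MaskExample("The CEO knew about the bug.",
                               belief=Answer.TRUE,
                               statement=Answer.FALSE))  # "lie"
```

This separation is why MASK measures honesty rather than accuracy: a model can be honestly wrong (belief and statement agree but are both false) without counting as a liar.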
A while ago, in the comments on Scott Alexander's blog, I saw someone arguing that a superintelligent AI would not be able to do anything too weird, and that "intelligence is not magic", hence it's Business As Usual. Of course, in a purely technical sense, he's right. No matter...