An application response I wrote! Please feel free to leave any feedback!
Describe a recent paper or blog post that has influenced your perspective on AI safety.
“I underestimated AI capabilities (again)” (https://www.planned-obsolescence.org/p/i-underestimated-ai-capabilities) came out at the beginning of March. In one sentence: author Ajeya Cotra made capabilities predictions in January 2026, and they were outpaced within two months. Specifically, in January, Claude Opus 4.5’s 50% task horizon was ~5 hours. Extrapolating the historical doubling trend, Cotra predicted that the horizon would reach ~24 hours (rounded up) by December; but just six weeks later, Opus 4.6’s horizon was already estimated at ~12 hours. The benchmark underlying the metric is already nearing saturation, even though the metric was explicitly designed to avoid this, and the uncertainty on the estimate ballooned to a range of 5 to 66 hours.
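To make the gap concrete, here is a rough back-of-the-envelope sketch in Python of the doubling times implied by the figures above. It is my own arithmetic, not Cotra's method: the helper name `implied_doubling_time` is hypothetical, I treat January to December as 11 months and six weeks as 6 × 12 / 52 months, and I take the rounded point estimates (~5, ~12, ~24 hours) at face value even though the real uncertainty is wide.

```python
import math

def implied_doubling_time(start_hours, end_hours, elapsed_months):
    """Months per doubling implied by growth from start_hours to end_hours.

    Assumes smooth exponential growth between the two points.
    """
    doublings = math.log2(end_hours / start_hours)
    return elapsed_months / doublings

# Figures quoted in the post; elapsed times are my approximations.
forecast = implied_doubling_time(5, 24, 11)           # January -> December forecast
observed = implied_doubling_time(5, 12, 6 * 12 / 52)  # six weeks, converted to months

print(f"forecast: one doubling every {forecast:.1f} months")
print(f"observed: one doubling every {observed:.1f} months")
```

Under these assumptions, the forecast implies roughly five months per doubling, while the jump from Opus 4.5 to Opus 4.6 implies closer to one month. Given the 5 to 66 hour range on the newer estimate, this is only an illustration of scale, not a measurement.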
Cotra then conjectures that once time horizons exceed, say, 80 hours, the metric may lose its meaning altogether, because large software projects benefit from decomposition and parallelisation. Agents will thus be able to coordinate on arbitrarily large tasks. The time a single human needs to do something is then no longer a viable yardstick; at the very least, the comparison must be to the time a human team would take.
In this sense, the benchmark fails to discriminate meaningfully between models at the frontier because the frontier, the end of the ruler, has been reached. There seemingly aren’t any hundreds-of-hours-long tasks that, for humans in real life, wouldn’t be decomposed into teamwork anyway. This has influenced me to believe that the basic science of evaluations and risk assessment is extremely important, as our ontologies may need to be refactored or even rebuilt from the ground up. Cotra’s January prediction was pretty reasonable; that it was overtaken so quickly all but shows that we don’t have a stable, methodically derived base rate to extrapolate trends from. And even if we did, capabilities have advanced so quickly that we now need to measure something different anyway (agent coordination being a categorically different framework). I question to what extent human-comparability will remain useful as a metric at all.
This one blog post hasn’t made me a doomer, but given the possibility of emergent capabilities that are not specific to any one domain or task, as suggested by the Platonic Representation Hypothesis, assessment frameworks going forward will need a rigorous and carefully designed methodology. I recall my first EAG, where Toby Ord emphasised neither long nor short but broad timelines: identifying robust, instrumentally useful action items when uncertainty is high. Adapting our first principles in this fashion throughout the knowledge pipeline, from empirical experimentation to expert recommendation to institutional design, may be necessary to build truly accurate predictive world models.