To clarify my point, I'm saying:
A. Standalone/unattended (!) AI agents are clearly much worse than nothing, both for writing code (with a goal chosen in advance) and for hacking (a prechosen target). They will clearly only mess up your plans.
If you want to build whatever or hack whoever, and you don't care what, that's a different story.
B. If your AI can make productive incremental progress without messing up old code, writing false notes, citing misleading evidence, breaking infra, alerting the target, falsifying or ruining the tests, fooling the judge, etc., then you can scale that to do basically anything. If you can lay a brick, then you can make a cathedral.
If your whole fancy best-of-n-with-rollbacks system is able to lay down bricks, that also counts as OK.
C. I'm betting against the amazing-do-it-all-at-once-from-scratch AI being either a serious risk vector or driver of economic progress.
At least, if you do take over the world on Tuesday morning, it only counts as a victory if you don't also accidentally give it all back on Tuesday afternoon.
D. I do think these task executions measure something meaningful, and are valuable to conduct, and we should try to extrapolate, but "task horizon length" is definitely not the thing to focus on here. Better candidates might be something like the "zero fatal mistake" rate or the "net-forward-progress geometric mean".
The world just does not look like "task horizons". The world is made up of steps forward and steps backward. A scientist or a CEO or an engineer or a horse or a door is measured by how many steps forward it brings you per step backward it takes you (counting the backward steps you weren't able to spot and avoid, obviously).
Fwdperbkwd (steps forward per step backward) is harder to measure than task horizons, maybe twice as hard, but not four times harder.
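To make those candidate metrics concrete, here is a rough Python sketch. Everything in it is my own illustrative assumption: the function names, the idea of scoring each agent step as a signed number (positive for forward progress, negative for an unnoticed step backward), and the idea of treating each step as a multiplicative factor for the geometric mean. None of it is an established benchmark definition.

```python
import math

def forward_per_backward(deltas):
    """Total forward progress divided by total backward progress.

    `deltas` is a hypothetical list of signed step scores:
    positive = step forward, negative = unnoticed step backward.
    """
    forward = sum(d for d in deltas if d > 0)
    backward = -sum(d for d in deltas if d < 0)
    if backward == 0:
        # No backward steps at all: infinite ratio if anything was gained.
        return math.inf if forward > 0 else 0.0
    return forward / backward

def zero_fatal_mistake_rate(had_fatal_mistake):
    """Fraction of runs with no fatal mistake.

    `had_fatal_mistake` is a hypothetical list of booleans, one per run.
    """
    return sum(1 for fatal in had_fatal_mistake if not fatal) / len(had_fatal_mistake)

def net_progress_geometric_mean(factors):
    """Geometric mean of per-step progress factors.

    Assumes each step multiplies the project's value by a positive factor:
    above 1 helps, below 1 hurts. A result above 1 means net-helpful.
    """
    return math.exp(sum(math.log(f) for f in factors) / len(factors))
```

For example, `forward_per_backward([3, -1, 2, -1])` compares 5 units forward against 2 units backward. The hard part in practice is the scoring itself, not the arithmetic.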
The strongest way I could put the punchline here is that I think AI agents will be useless until their code helps more than it hurts, and then AI agents will replace coders completely. And maybe you can only measure this by comparing agentless vs agentful approaches to the same task (hey! METR did that too!).
And more AI-live-assistant is-it-worse-than-nothing-or-not experiments would be highly informative, in my opinion!
Horizon length makes no sense fundamentally. The 9s of reliability don't matter for anything but driving a car. Software and hacking and science and engineering are not a series of sequential steps. It is more like adding stuff to your house. If you can make a tiny improvement, then you are net-helpful and you can build a cathedral. If you make improvements more often than you cause problems, anything is possible. If you break stuff more often than you fix it, then nothing is possible. It is a binary question. If you make big improvements, then small mistakes don't matter. If you mess up stuff big time and can't fix it, then big improvements are useless.
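The "binary question" above can be illustrated with a toy random walk. This is only a sketch under my own assumptions: each step is independently either an improvement or a breakage, sizes are fixed, and the function name and parameters are made up for illustration.

```python
import random

def simulate_net_progress(p_forward, n_steps, forward_size=1.0,
                          backward_size=1.0, seed=0):
    """Toy model: sum of n_steps independent steps.

    With probability p_forward a step adds forward_size;
    otherwise it subtracts backward_size.
    """
    rng = random.Random(seed)  # seeded so repeated runs agree
    total = 0.0
    for _ in range(n_steps):
        if rng.random() < p_forward:
            total += forward_size
        else:
            total -= backward_size
    return total

# Improve slightly more often than you break things: progress accumulates.
helpful = simulate_net_progress(p_forward=0.6, n_steps=10_000)
# Break things slightly more often than you improve: it all drains away.
harmful = simulate_net_progress(p_forward=0.4, n_steps=10_000)
# Rare but big improvements can outweigh frequent small mistakes.
big_wins = simulate_net_progress(p_forward=0.3, n_steps=10_000,
                                 forward_size=10.0, backward_size=1.0)
print(helpful, harmful, big_wins)
```

In this model the sign of the expected step, `p_forward * forward_size - (1 - p_forward) * backward_size`, decides everything over a long enough run, which is the sense in which the question is binary. Real work has unrecoverable mistakes too, which this sketch leaves out.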
I think "AI agent" is here to stay.
I have found 0 Twitter much better than 0.1 Twitter.
Hmm, maybe you are right about my great-grandparents' food availability. I should look into it more.
The second quote was meant to be a joke.
I am ordering a motivational poster of this.
Eugenics is not metrically fit; it makes your grandma gasp and frown; it does not quickly directly benefit the people who make it happen; it does not have the exciting element of a sudden moment.
I think that the politicians are doing their best to prop up home prices and keep the pensions alive and make everything stay normal, etc. Banks try to stop you from withdrawing a lot of money at once. The only problem is that we got used to growth and innovation and freedom and peace and comfort and security as the norm. It is deep in the culture... Scammers and wars usually come from across national borders. What you are asking for is not a simple, easy, clear thing. I don't blame you for wishing.
I have been thinking in terms of this fwdperbkwd (forward-per-backward) model of measuring AI agent progress for a few months. I just realized this is the first time I actually spelled it out. https://www.lesswrong.com/posts/2RwDgMXo6nh42egoC/how-to-game-the-metr-plot?commentId=6fXFqMFxJKdtcZzbs