You fixed it!! Thank you!!
I would love to see a plot where the dots are geographic regions and the x axis is density of rivers and the y axis is number of wars since 1700
Would be cool if predictability of actions (from what the AI says it is about to do) were a standard benchmark, among the others reported on every release.
Do you have any ideas for how to get interventions for larger k (the number of neurons acted on) to work better? Alternatively, do you have any ideas for how to train a model such that the necessary k, for any given trait/personality/whatever, is small, so that each important behavior could be controlled with a small number of neurons?
Absolutely fantastic
I hope that when episodic memory is cracked (i.e. cheap), it helps with this all-or-nothing problem. That would be really great.
Could one do an Efficient Pilot Wave: mathematically equivalent, but where you hardly need to compute any of it?
I have been thinking in terms of this fwdperbkwd model of measuring AI agent progress for a few months, but this is the first time I've actually spelled it out, I just realized. https://www.lesswrong.com/posts/2RwDgMXo6nh42egoC/how-to-game-the-metr-plot?commentId=6fXFqMFxJKdtcZzbs
To clarify my point, I'm saying
A. Standalone/unattended (!) AI agents are clearly much worse than nothing, for both writing code (with a goal chosen in advance) and hacking (a prechosen target); they will only mess up your plans.
If you want to build whatever or hack whoever, and you don't care what, that's a different story.
B. If your AI can make productive incremental progress, without messing up old code, writing false notes, citing misleading evidence, breaking infra, alerting the target, falsifying or ruining the tests, fooling the judge, etc, etc, then you can scale that to do basically anything. If you can lay a brick then you can make a cathedral.
If your whole fancy best-of-n-with-rollbacks system is able to lay down bricks, that also counts as OK.
C. I'm betting against the amazing-do-it-all-at-once-from-scratch AI being either a serious risk vector or driver of economic progress.
At least, if you do take over the world on Tuesday morning, it only counts as a victory if you don't also accidentally give it all back on Tuesday afternoon.
D. I do think these task executions measure something meaningful, and are valuable to conduct, and we should try to extrapolate, but "task horizon length" is definitely not the thing to focus on here. Maybe something like a "zero fatal mistakes" rate or a "net-forward-progress geometric mean".
The world just does not look like "task horizons". The world is made up of steps forward and steps backward. A scientist or a CEO or an engineer or a horse or a door is measured by how many steps forward it brings you per step backward it takes you (and which you weren't able to spot and avoid, obviously).
Fwdperbkwd is harder to measure than task horizons, maybe twice as hard, but not four times harder.
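To make the idea concrete, here is a minimal sketch of what a fwdperbkwd score and a zero-fatal-mistake rate could look like. All names, the (steps_forward, steps_backward) representation, and the +1 smoothing are my own illustrative assumptions, not an established metric:

```python
from statistics import geometric_mean

def fwd_per_bkwd(runs):
    """Net-forward-progress geometric mean across agent runs.

    `runs` is a list of (steps_forward, steps_backward) pairs, where
    "backward" means regressions the agent failed to spot and avoid.
    The +1 smoothing keeps a zero-mistake run from dividing by zero;
    that choice is an assumption.
    """
    return geometric_mean(fwd / (bkwd + 1) for fwd, bkwd in runs)

def zero_fatal_mistake_rate(runs):
    """Fraction of runs with no backward steps at all (assumed definition)."""
    return sum(1 for _, bkwd in runs if bkwd == 0) / len(runs)

# Three hypothetical agent runs: (productive steps, unspotted regressions)
runs = [(10, 1), (8, 0), (12, 3)]
print(fwd_per_bkwd(runs))             # geometric mean of 5, 8, 3
print(zero_fatal_mistake_rate(runs))  # one of three runs was mistake-free
```

An agent that never steps backward scores unboundedly well here, which matches the intuition in B: if every brick it lays stays laid, you can scale it to a cathedral.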
The strongest way I could put the punchline here is that I think AI agents will be useless until their code helps more than it hurts; after that, AI agents will replace coders completely. And maybe you can only measure this by comparing agentless vs. agentful approaches to the same task (hey! METR did that too!)
And more AI-live-assistant is-it-worse-than-nothing experiments would be highly informative, in my opinion!
I just added the ASSUME E line at the end. Is the code in the post correct now?