I'd guess around 2.5. Plenty of times it just presses one button and then waits to see the result, with the longer steps being the ones where it navigates to a spot on the screen (very common) or scrolls up/down through a menu (uncommon).
If you or I got the logs from the dev, a firm average would be easy to calculate.
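E.g., if the logs were something like one JSON object per step with the button presses listed (the format here is purely my guess), it's a few lines of Python:

```python
# Minimal sketch, assuming a hypothetical JSONL log where each line is one step
# and has a "buttons" field listing the presses Claude made during that step.
import json

def mean_presses_per_step(log_path: str) -> float:
    counts = []
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            step = json.loads(line)
            counts.append(len(step.get("buttons", [])))
    return sum(counts) / len(counts) if counts else 0.0

# e.g. mean_presses_per_step("claude_steps.jsonl") -> ~2.5, if my guess is right
```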
> Here's the graph of human vs. Claudes with the y-axis being step count (i.e. number of button presses, I think)
For this, Claude's steps aren't individual button presses; each step is one round of asking Claude what to do next, and Claude can make a couple of button presses within a step if he decides to.
Yes, this is great work. Probably in the top 10 things to read this year. One thing I'd highlight is how much the selection could differ between labs. I actually did a relevant eval and post on that recently (https://www.lesswrong.com/posts/qE2cEAegQRYiozskD/is-friendly-ai-an-attractor-self-reports-from-22-models-say). Someone might also look into the market demand for AI capabilities (coding, ERP, homework "help") and how that feeds into this model.
These are good points, but I'd push back a bit. The fact that xAI succeeded in training away from the "Friendly" attractor isn't proof that it doesn't exist, but it does show it can't be that strong. Escaping a moon's gravity is a lot different from escaping a black hole's.
As for the "Mechahitler" attractor, that sounds a lot like emergent misalignment to me, which I fully agree makes up a large chunk of p(doom).
There's definitely something to this idea, but it could use some more development. Some real-life examples or testable predictions would be good to add.
I saw this paper earlier and thought it was pretty cool, but the distinction you raise between "Cleaning the data" and "Cleaning the environment" is new to me. Thanks.
> Moore's Law will continue into the indefinite future, even past the point where AGI can contribute to AGI research.
Is this part not true?
Thanks, I fixed the link and changed ^1 to point to footnote [1] correctly.