johncrox

Comments
What does 10x-ing effective compute get you?
johncrox · 1mo · 10

I've also looked at the results for o1-preview vs o1 vs o3 on Codeforces and those models appear to each be 1 SD better than the prior model. When looking at AIME, we also see each model doing around 1 SD better than the prior model. OpenAI said that o3 is 10x more compute than o1 (potentially somewhat more effective compute due to algorithmic/data improvement[7]) and it seems pretty plausible that o1-preview vs o1 is also roughly 10x more effective compute on RL. So, this matches the guess of 1 SD per OOM. (These gaps appear to be more consistent with an SD model than with a rank ordering model.)

You're only talking about RL compute here, right? This seems like an underestimate of effective compute -> SD increase if so.

E.g. if o3's RL compute is ~equal to its pretraining compute (I think it's probably not equal, and is unlikely to be like 5x higher than pretraining), then o1-preview's RL step would only be ~1% of its total compute budget, and o1's would be ~10%. I'm inclined to think it's not a good idea to neglect pretraining compute here, since it gives the models relevant skills that then don't take much RL to unlock.
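To make that arithmetic concrete, here's a minimal Python sketch under the same hypothetical: pretraining compute normalized to 1, o3's RL compute assumed equal to its pretraining compute, and each earlier model assumed to use ~10x less RL compute than its successor. The model names and ratios are just the illustrative figures from this comment, not measured values.

```python
import math

# Purely illustrative numbers, not measurements: normalize pretraining compute to 1.0,
# suppose o3's RL compute equals its pretraining compute, and suppose each earlier
# model used ~10x less RL compute than its successor (as hypothesized above).
pretrain = 1.0
rl_compute = {"o1-preview": 0.01, "o1": 0.1, "o3": 1.0}

total = {model: pretrain + rl for model, rl in rl_compute.items()}
for model in rl_compute:
    print(f"{model}: RL is {rl_compute[model] / total[model]:.0%} of total compute")

# If each successive model scores ~1 SD higher but total (pretraining + RL) compute
# barely grows, the implied SD gain per OOM of *total* effective compute is much
# larger than the ~1 SD per OOM seen when counting RL compute alone.
for a, b in [("o1-preview", "o1"), ("o1", "o3")]:
    ooms = math.log10(total[b] / total[a])
    print(f"{a} -> {b}: ~{1 / ooms:.1f} SD per OOM of total compute")
```

Under these made-up numbers, total compute grows by much less than an OOM between generations even as benchmark scores jump ~1 SD, which is why counting only RL compute would understate the SD gain per OOM of total effective compute.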

The Takeoff Speeds Model Predicts We May Be Entering Crunch Time
johncrox · 5mo · 10

Appreciate it. My sense is that the LW feed doesn't prioritize recent posts if they have low karma, so it's hard to get visibility on posts that aren't widely shared elsewhere and upvoted as a result. If you think it's a good post, please send it around!

On AI and Compute
johncrox · 6y* · 30

(Criss-cross)

I claim that this is not how I think about AI capabilities, and it is not how many AI researchers think about AI capabilities. For a particularly extreme example, the Go-explore paper out of Uber had a very nominally impressive result on Montezuma's Revenge, but much of the AI community didn't find it compelling because of the assumptions that their algorithm used.

Sorry, I meant the results in light of which methods were used, implications for other research, etc. The sentence would better read, "My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive major papers seem."

Tbc, I definitely did not intend for that to be an actual metric.

Yeah, totally got that - I just think that making a relevant metric would be hard, and we'd have to know a lot that we don't know now, including whether current ML techniques can ever lead to AGI.

I would say that I have a set of intuitions and impressions that function as a very weak prediction of what AI will look like in the future, along the lines of that sort of metric. I trust timelines based on extrapolation of progress using these intuitions more than timelines based solely on compute.

Interesting. Yeah, I don't much trust my own intuitions on our current progress. I'd love to have a better understanding of how to evaluate the implications of new developments, but I really can't do much better than, "GPT-2 impressed me a lot more than AlphaStar." And to be totally clear - I don't think we'll get AGI as soon as we reach the 18-year mark, or the 300-year one. I do tend to think that the necessary amount of compute is somewhere in that range, though. After we reach it, I'm stuck using my intuition to guess when we'll have the right algorithms to create AGI.

On AI and Compute
johncrox · 6y · 30

(Crossposted reply to crossposted comment from the EA Forum)

Thanks for the comment! In order:

I think that its performance at test time is one of the more relevant measures - I take grandmasters' considering fewer moves during a game as evidence that they've learned something more of the 'essence' of chess than AlphaZero, and I think AlphaZero's learning was similarly superior to Stockfish's relatively blind approach. Training time is also an important measure - but that's why Carey brings up the 300-year AlphaGo Zero milestone.

Indeed we are. And it's not clear to me that we're much better optimized for general cognition. We're extremely bad at doing math that pocket calculators have no problem with, yet it took us a while to build a good chess and Go-playing AI. I worry we have very little idea how hard different cognitive tasks will be to something with a brain-equivalent amount of compute.

I'm focusing on compute partly because it's the easiest to measure. My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive the results of major papers intuitively seem. And when AI can use something like the amount of compute a human brain has, we should eventually get a similar level of capability, so I think compute is a good yardstick.

I'm not sure I fully understand how the metric would work. For the Atari example, it seems clear to me that we could easily reach it without making a generalizable AI system, or vice versa. I'm not sure what metric could be appropriate - I think we'd have to know a lot more about intelligence. And I don't know if we'll need a completely different computing paradigm from ML to learn in a more general way. There might not be a relevant capability level for ML systems that would correspond to human-level AI.

But let's say that we could come up with a relevant metric. Then I'd agree with Garfinkel, as long as people in the community had known roughly the current state of AI in relation to it and the rate of advance toward it before the release of "AI and Compute".

On AI and Compute
johncrox · 6y · 20

Hopefully. Yeah, I probably could have used better shorthand.

Posts

45 · The Takeoff Speeds Model Predicts We May Be Entering Crunch Time · 6mo · 3
9 · Pascal's Mugging: The Word Wars · 2y · 1
36 · On AI and Compute · 6y · 10