How feasible/costly would it be to train a very large AI model on distributed clusters of GPUs?
Folding@home is the most powerful supercomputer in the world. It runs its simulations on a distributed network of GPUs, CPUs, and ARM processors volunteered by people around the world. From some quick Googling, it looks like GPUs account for a large majority of Folding@home’s processing power. This suggests to...
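To make the feasibility question concrete, here is a rough back-of-envelope sketch in Python. Every number in it (GPU count, per-GPU throughput, model size, step time, home uplink speed) is an illustrative assumption I picked for the sake of the arithmetic, not something from the question; the point is just to compare the aggregate compute a volunteer network could supply against the gradient-synchronization bandwidth that naive data parallelism over home internet connections would demand.

```python
# Back-of-envelope sketch with made-up but plausible numbers (all assumed):
# aggregate volunteer compute vs. bandwidth needed for naive data parallelism.

NUM_GPUS = 100_000                # assumed number of volunteer GPUs online at once
FLOPS_PER_GPU = 20e12             # ~20 TFLOP/s effective per consumer GPU (assumed)
MODEL_PARAMS = 70e9               # assumed model size: 70B parameters
BYTES_PER_GRAD = 2                # fp16 gradients
STEP_TIME_S = 10.0                # assumed wall-clock time per optimizer step
HOME_UPLINK_BYTES_PER_S = 25e6 / 8  # ~25 Mbit/s uplink per volunteer (assumed)

aggregate_flops = NUM_GPUS * FLOPS_PER_GPU
print(f"Aggregate compute: {aggregate_flops / 1e18:.1f} exaFLOP/s")

# Naive data parallelism: each worker must exchange a full gradient every step.
grad_bytes = MODEL_PARAMS * BYTES_PER_GRAD
required_uplink = grad_bytes / STEP_TIME_S
print(f"Gradient size per step: {grad_bytes / 1e9:.0f} GB")
print(f"Uplink needed per worker: {required_uplink / 1e9:.1f} GB/s "
      f"vs. ~{HOME_UPLINK_BYTES_PER_S / 1e6:.1f} MB/s available")

# The shortfall is roughly three to four orders of magnitude, which is why
# internet-scale training proposals lean on local SGD / federated averaging
# or aggressive gradient compression rather than naive synchronization.
shortfall = required_uplink / HOME_UPLINK_BYTES_PER_S
print(f"Communication shortfall factor: ~{shortfall:,.0f}x")
```

Under these assumed numbers the raw FLOP/s look impressive, but communication, not compute, is the binding constraint; any real answer to the cost question would hinge on how much the synchronization requirement can be relaxed.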
All of this sounds reasonable, and it sounds like you may have insider info that I don’t. (Also, to be clear, I wasn’t trying to make a claim about which model is the base model for a particular o-series model; I was just naming models to be concrete. Sorry to distract with that!)
It’s also totally possible that you’re right that more inference/search is the only reason o3 is more expensive than o1; again, it sounds like you know more than I do. But do you have a theory of why o3 is able to go on longer chains of thought without getting stuck, compared with o1? It’s possible that it’s just a grab bag...