Cerebras recently unveiled Andromeda - https://www.cerebras.net/andromeda/, an AI supercomputer that enables near linear scaling. Do I understand correctly that this might have a big impact on the large (language) models research, since it would significantly speed up the training? E.g. if current models take 30+ days long to train, we can just 10x the number of machines and have it done in three days? Also, it seems to be much simpler to use, thus decreasing the cost of development and the hassle with dstributed computing.
If so, I think its almost certain that large companies would do it, and this in turn would significantly speed up the research/training/algorithm development of large models such as GPT, GATO and similar? It seems like this type of development should affect the discussion about timelines, however I haven't seen it mentioned anywhere else before.
I'm not sure if PFLOPs are a fair comparison here though, if I understand Cerebras' point correctly. Like, if you have ten GPUs with one PFLOP each, that's technically the same number of PFLOPs as a single GPU with ten PFLOPs. But actually that single GPU is going to train a lot faster than the ten GPUs because the ten GPUs are going to have to spend time communicating with each other. Especially as memory limitations... (read more)