The Bay Area, where MATS is based, is not the governance hub of the US;
The Bay is an AI hub, home to OpenAI, Google, Meta, etc., and therefore an AI governance hub. Governance is not the same as government. Important decisions are being made there, maybe more important decisions than in DC. To quote Allan Dafoe:
AI governance concerns how humanity can best navigate the transition to a world with advanced AI systems. It relates to how decisions are made about AI, and what institutions and arrangements would help those decisions to be made well.
Also, many, many AI governance projects go hand-in-hand with technical expertise.
Maybe more broadly, AI strategy is part of AI governance.
I agree that this discussion is surprisingly often confusing and that people use the terms interchangeably. Unfortunately, readers often interpreted our training compute measurement as a measure of performance rather than as a quantity of executed operations. However, I don't think this is only due to the abbreviations; it also comes from a lack of understanding of what is being measured. In addition to making the distinction clearer with the terms, one should probably also explain it more and use terms such as quantity and performance.
For my research, I've been trying to be consistent: FLOPs (lowercase s) refers to the quantity, while FLOPS or FLOP/s refers to the performance, i.e., operations per second. (FWIW, during my time in computer engineering, it has been the norm to use FLOPs for quantity and FLOPS for performance.)
The term petaflop/s-days also helps: it spells out that a given performance (petaflop/s) is sustained for some number of days, thereby measuring a quantity of operations.
Note that it gets even more complicated once we take the number representation (32-bit or 16-bit floating point, or even bfloat16) into consideration. Therefore, I'm also in favor of maybe switching at some point to OPs and OP/s and documenting the number representation used in actual technical documentation (such as reporting the compute of ML models).
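To make the quantity-vs-performance distinction concrete, here is a minimal sketch of the conversion: a petaflop/s-days figure (performance times duration) multiplies out to a plain operation count. The function name is my own, not from any standard library.

```python
import math

SECONDS_PER_DAY = 86_400

def pfs_days_to_flop(pfs_days: float) -> float:
    """Convert petaflop/s-days (performance * duration) into a quantity:
    total floating-point operations executed."""
    # 1 petaflop/s = 1e15 FLOP per second, sustained for `pfs_days` days.
    return pfs_days * 1e15 * SECONDS_PER_DAY

# One petaflop/s-day is 8.64e19 FLOP.
assert math.isclose(pfs_days_to_flop(1.0), 8.64e19)
```

The units do the work here: FLOP/s times seconds leaves FLOP, a pure quantity, which is exactly why "petaflop/s-days" is less ambiguous than "petaflops".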
They trained it on TPUv3s; however, the robot inference was run on a GeForce RTX 3090 (see section G).
TPUs are mostly designed for data centers and are not really usable for on-device inference.
I'd be curious to hear more thoughts on how much we could already scale it right now. It looks like data might be a bottleneck?
Some thoughts on compute:
Gato estimate: 256 TPUv3 chips for 4 days at 24 hours/day = 24'576 TPUv3-hours (on-demand costs are $2 per hour for a TPUv3) = $49'152
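As a sanity check, the estimate above is simple arithmetic; a quick sketch (using the chip count, duration, and on-demand price from the comment):

```python
# Back-of-the-envelope check of the Gato training cost estimate.
chips = 256                # TPUv3 chips used for training
days = 4
hours_per_day = 24
usd_per_chip_hour = 2.0    # on-demand TPUv3 price assumed above

tpu_hours = chips * days * hours_per_day   # 24'576 TPUv3-hours
cost_usd = tpu_hours * usd_per_chip_hour   # $49'152

print(tpu_hours, cost_usd)
```

Note that on-demand pricing is an upper bound; a lab running its own hardware or using committed-use discounts would pay less per chip-hour.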
In comparison, PaLM used 8'404'992 TPUv4-hours, and I estimated that it'd cost $11M+. If we assume someone were willing to spend the same compute budget on it, we could make the model 106x bigger (assuming Chinchilla scaling laws). I also tweeted about this here.
The size of the model was limited only(?) by the latency requirements of the robotics part.
Thanks for the thoughtful response, Connor.
I'm glad to hear that you will develop a policy and won't be publishing models by default.
Glad to see a new Alignment research lab in Europe. Good luck with the start and the hiring!
I'm wondering about something you're saying:
That being said, our publication model is non-disclosure-by-default, and every shared work will go through an internal review process out of concern for infohazards.
That's different from EleutherAI's position. Is this a change of mind, or a different practice due to the different research direction? Will you continue open-sourcing your ML models?
"A grassroots collective of researchers working to open source AI research."
From their paper:
We trained PaLM-540B on 6144 TPU v4 chips for 1200 hours and 3072 TPU v4 chips for 336 hours including some downtime and repeated steps.
That's 64 days.
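The quoted figures can be checked directly; both the total chip-hours and the 64-day wall-clock time fall out of the two training phases described in the paper:

```python
# Reproduce the PaLM chip-hour and wall-clock figures from the quoted excerpt.
phase1_chip_hours = 6144 * 1200   # 6144 TPUv4 chips for 1200 hours
phase2_chip_hours = 3072 * 336    # 3072 TPUv4 chips for 336 hours

total_chip_hours = phase1_chip_hours + phase2_chip_hours  # 8'404'992 TPUv4-hours
wall_clock_days = (1200 + 336) / 24                       # 64 days

print(total_chip_hours, wall_clock_days)
```

This assumes the two phases ran sequentially, which matches the 64-day figure; the chip-hour total is the number used in the cost estimate above.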
It's roughly an order of magnitude more compute than GPT-3.