Chinchilla has 70 billion parameters, and 10 trillion / 70 billion ≈ 143.
According to the paper that introduced Chinchilla ("Training Compute-Optimal Large Language Models" by Hoffmann et al.), "For every doubling of model size the number of training tokens should also be doubled."
So if this rule were followed, 143 times more data would also be needed, resulting in a 143 × 143 = 20,449-fold increase in compute.
Chinchilla probably cost around 1-5 million USD in compute to train, so a compute-optimal 10 trillion parameter version would cost around 20.4 billion to 102 billion USD.
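To make the arithmetic explicit, here is a small Python sketch of the estimate above. The 1-5 million USD baseline and the assumption that cost scales linearly with parameters × tokens are assumptions of this post, not official figures.

```python
# Back-of-the-envelope Chinchilla-style scaling: compute (and cost) is
# assumed proportional to parameters * tokens, with both scaled together.

CHINCHILLA_PARAMS = 70e9          # 70 billion parameters
TARGET_PARAMS = 10e12             # 10 trillion parameters
CHINCHILLA_COST_USD = (1e6, 5e6)  # assumed $1M-$5M training compute cost

scale = TARGET_PARAMS / CHINCHILLA_PARAMS  # ~143x more parameters
compute_factor = scale * scale             # tokens scale too -> ~20,400x compute

low, high = (compute_factor * c for c in CHINCHILLA_COST_USD)
print(f"parameter scale: {scale:.0f}x")
print(f"compute factor:  {compute_factor:,.0f}x")
print(f"estimated cost:  ${low / 1e9:.1f}B to ${high / 1e9:.1f}B")
```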
However, this is not a viable option, because there is not enough text data available.
A more realistic scenario is to perhaps double the data Chinchilla was trained on (which might not even be easy to do) and scale the model to 143 times the size, giving 2 × 143 = 286 times Chinchilla's compute, so a cost of about 286 million to 1.43 billion USD.
Keep in mind that GPT-3 was trained on about a quarter of the data that Chinchilla was trained on. Scaling the previous estimate down by that factor of four, a 10 trillion parameter GPT-3-style model might cost around 71.5 million to 358 million USD.
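The two cheaper scenarios can be reproduced the same way; again, the 1-5 million USD Chinchilla baseline and the linear cost-per-compute assumption are assumptions of this post.

```python
# The two cheaper scenarios: 143x the parameters, but with less (or only
# modestly more) data. Cost is assumed proportional to parameters * tokens.

CHINCHILLA_COST_USD = (1e6, 5e6)  # assumed $1M-$5M baseline
PARAM_SCALE = 143                 # 10T / 70B

# Scenario 1: double Chinchilla's data -> 2 * 143 = 286x Chinchilla's compute.
doubled_data = [PARAM_SCALE * 2 * c for c in CHINCHILLA_COST_USD]

# Scenario 2: GPT-3 used roughly a quarter of Chinchilla's data, so take
# roughly a quarter of the previous estimate.
gpt3_style_data = [c / 4 for c in doubled_data]

print("2x Chinchilla data:", [f"${c / 1e6:,.0f}M" for c in doubled_data])
print("GPT-3-sized data:  ", [f"${c / 1e6:,.1f}M" for c in gpt3_style_data])
```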
I predict that when something with the anticipated capabilities of a 10T Chinchilla is developed, it will cost at most a small multiple of what GPT-3 cost to train. It's not just falling hardware costs; there are also algorithmic improvements and completely new methods.
It won't cost less than that, because more compute always improves results, so the actual cost will be set by the available budget rather than by the minimum needed.