[This post summarizes some of the work done by Owen Dudney, Roman Engeler and myself (Quintin Pope) as part of the SERI MATS shard theory stream.]
Future prosaic AIs will likely shape their own development or that of successor AIs. We're trying to make sure they don't go insane.
There are two main ways AIs can get better: by improving their training algorithms or by improving their training data.
We consider both scenarios, and tentatively believe that data-based improvement is riskier than architecture-based improvement. Current models mostly derive their behavior from their training data, not from their training algorithms (meaning their architectures, hyperparameters, loss functions, optimizers, or the like)[1]. So far, most improvements to AI training algorithms seem 'value neutral'[2]. Also note that most of human value drift currently derives...
In short: Training runs of large Machine Learning systems are likely to last less than 14-15 months. This is because longer runs will be outcompeted by runs that start later and therefore use better hardware and better algorithms. [Edited 2022/09/22 to fix an error in the hardware improvements + rising investments calculation]
| Scenario | Longest training run |
|---|---|
| Hardware improvements | 3.55 years |
| Hardware improvements + Software improvements | 1.22 years |
| Hardware improvements + Rising investments | 9.12 months |
| Hardware improvements + Rising investments + Software improvements | 2.52 months |
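The first two rows can be reproduced with a simple break-even sketch. If a run uses hardware fixed at its start date, and effective compute per dollar doubles every τ years, then a run ending at time e and starting at time s accumulates compute proportional to 2^(s/τ)·(e−s); maximizing over s gives an optimal duration of τ/ln 2, since any longer run is overtaken by one that starts later on better hardware. The doubling times below are assumptions chosen to reproduce the table's figures, not necessarily the post's exact inputs:

```python
import math

def longest_run_years(doubling_time_years: float) -> float:
    """Optimal run length under the break-even model: a run longer than
    tau / ln(2) is outcompeted by a later start on better hardware."""
    return doubling_time_years / math.log(2)

# Assumed hardware price-performance doubling time (years).
hw_tau = 2.46
print(f"Hardware only: {longest_run_years(hw_tau):.2f} years")  # ~3.55

# Independent improvement rates add, so combined doubling time is the
# harmonic combination: 1/tau = 1/tau_hw + 1/tau_sw.
sw_tau = 1.29  # assumed algorithmic-efficiency doubling time (years)
combined_tau = 1 / (1 / hw_tau + 1 / sw_tau)
print(f"Hardware + software: {longest_run_years(combined_tau):.2f} years")  # ~1.22
```

Rising investments shorten the table's figures further because growing budgets raise the effective-compute growth rate the same way a faster doubling time would.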
Larger compute budgets and a better understanding of how to effectively use compute (through, for example, using scaling laws) are two major driving forces of progress in recent Machine Learning.
There are several ways to increase your effective compute budget: better hardware, rising investments in AI R&D, and improvements in algorithmic efficiency. In this article...
I think your objections are all basically correct, but that you treat them as dealbreakers in ways that I (a big shard-alignment fan) don't. As I understand it, your objections boil down to 1. picking the training curriculum/reward signal is hard (and design choices pose a level of challenge beyond the simple empirical does-it-work-to-produce-an-AGI) and 2. reflectivity is very hard and might cause lots of big problems, and we can’t begin to productively engage with those issues right now.
I don’t think that curriculum and reward signal are as problematic a...