This review is part of a project with Joe Collman and Jérémy Perret to try to get as close as possible to peer review when giving feedback on the Alignment Forum. Our reasons behind this endeavor are detailed in our original post asking for suggestions of works to review; but the gist is that we hope to bring further clarity to the following questions:
- How many low-hanging fruits in terms of feedback can be plucked by getting into a review mindset and seeing the review as part of one’s job?
- Given the disparate state of research in AI Alignment, is it possible for any researcher to give useful feedback on any other research work in the field?
- What sort of reviews are useful for AI Alignment research?
Instead of thinking about these questions in the abstract, we simply make the best review we can, which answers some and gives evidence for others.
In this post, we review Fun with +12 OOMs of Compute by Daniel Kokotajlo. We start by summarizing the work, to ensure that we got it right. Then we review the value of the post in itself -- that is, by admitting its hypotheses. We follow by examining the relevance of the work to the field, which hinges on the hypotheses it uses. The last two sections respectively propose follow-up work that we think would be particularly helpful, and discuss how this work fits into the framing of AI Alignment research proposed here by one of us.
This post was written by Adam; as such, even if both Joe and Jérémy approve of its content, it’s bound to be slightly biased towards Adam’s perspective.
The post attempts to operationalize debates around timelines for Transformative AI using current ML techniques in two ways: by proposing a quantity of resources (compute, memory, bandwidth, everything used in computing) for which these techniques should create TAI with high probability (the +12 OOMs of the title), and by giving concrete scenarios of how the use of these resources could lead to TAI.
The operational number comes from Ajeya Cotra’s report on TAI timelines, and isn’t really examined or debated in the post. What is expanded upon are the scenarios proposed for leveraging these added resources.
- OmegaStar, a bigger version of AlphaStar trained on every game in the Steam library, as well as on language-related “games” like “predict the next word in a web page” or “make a user engaged as a chatbot”. Note that these language-related games are played at the scale of the entire internet.
- Amp(GPT-7), a scaled-up GPT model that is then amplified to be able to decompose any task into subtasks it can delegate to copies of itself, recursively.
- Crystal Nights, a simulation of the whole of evolution, but done smartly enough (for example by selecting for abstract intelligence) to reduce the compute needed to fit within the budget of +12 OOMs.
- Skunkworks, a STEM AI system that uses incredibly detailed simulations to iterate over millions of design variations and test them, all without having to build the prototypes.
- Neuromorph, a learning process to train brain-like models, starting with running the best brain model available right now at the correct biological scale, and then filling in the blanks and iterating on this process using standard ML techniques (like SGD).
In light of these scenarios, Daniel then argues that if one takes them seriously and considers their results as likely to be TAI, one should put the bulk of one’s probability mass about when TAI will happen at the point where such an increase of compute is reached, or before. In comparison, Ajeya’s model from her report puts the median of her distribution at this point, which results in having quite a lot of probability mass after this point.
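To make the difference between "median at this point" and "bulk of the mass at or before this point" concrete, here is a minimal sketch. It assumes, purely for illustration, that beliefs about the extra compute needed for TAI are normally distributed over OOMs. The two distributions and their parameters are hypothetical; they are taken neither from Daniel's post nor from Ajeya's report.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

THRESHOLD_OOMS = 12.0  # the "+12 OOMs" point discussed in the post

# Hypothetical belief distributions over "extra OOMs of compute needed for TAI".
# The parameters are illustrative assumptions, not estimates from any source.
median_at_threshold = {"mean": 12.0, "std": 4.0}  # median sits exactly at +12 OOMs
bulk_before_threshold = {"mean": 7.0, "std": 3.0}  # most mass lies before +12 OOMs

def mass_by_threshold(dist):
    """Probability that TAI needs at most THRESHOLD_OOMS of extra compute."""
    return normal_cdf(THRESHOLD_OOMS, dist["mean"], dist["std"])

p_median = mass_by_threshold(median_at_threshold)
p_bulk = mass_by_threshold(bulk_before_threshold)

print(f"median at +12 OOMs: {p_median:.2f} of the mass at or before, {1 - p_median:.2f} after")
print(f"bulk before +12 OOMs: {p_bulk:.2f} of the mass at or before, {1 - p_bulk:.2f} after")
```

Whatever its shape, a distribution whose median sits at +12 OOMs leaves half of its mass after that point, which is the gap Daniel's argument targets.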
Daniel thus concludes that an important crux of timeline debates is how people think about scenarios like the ones he presented, and asks for proponents of long timelines to defend their positions along this line for a more productive discussion.
Does the post succeed on its own terms?
This work relies on one big hypothesis: we can get +12 OOMs of compute and other relevant resources in a short enough time frame to warrant applying the label “short timelines” to the scenarios developed here. We found issues with this hypothesis, at least with how it is currently stated and defended. But before detailing those, we start by admitting this assumption and examining how the post fares in that context.
All three of us found Daniel’s scenarios worrying, in terms of potential for TAI. We also broadly agree with Daniel’s global point that the risk of TAI from these scenarios probably implies a shift in the probability mass that lies after this point to somewhere closer to this point (with the caveat that none of us actually studied Ajeya’s report in detail).
Thus the work makes its point successfully: why that amount of additional compute and resources might be enough for TAI, and what it should imply for the most detailed model that we have about TAI timelines.
That being said, we feel that each scenario isn’t as fleshed out as it could be for maximum convincingness. They tend to feel like “if you can apply this technique to a bigger model with every dataset in existence, it would become transformative”. Although that’s an intuition we are sympathetic to, there are many caveats that -- ironically -- the references deal with but Daniel doesn't discuss explicitly or in detail.
- With OmegaStar, one of us thought he remembered that AlphaStar’s reward function was hand shaped, and so humans might prove a bottleneck. A bit more research revealed that AlphaStar used imitation learning to learn a reward function from human games -- an approach that solves at least some of the problems with scaling to “every game in the Steam library”.
Since the issue of humans as bottlenecks in training is pretty relevant, it would have been helpful to describe this line of thought in the post.
- With Amp(GPT-7), we wondered why GPT-7 and not GPT-8 or GPT-9. More concretely, why should we expect progress on the tasks that are vital for Daniel’s scenario? We don’t have convincing arguments (as far as we know) for arguing that GPT-N will be good at a task for which GPT-3 showed no big improvement over the state of the art. So the tasks for which we can expect such a jump are the ones on which GPT-3 (or previous GPT models) made a breakthrough.
Daniel actually relies on such tasks, as shown in his reference to this extrapolation post that goes into more detail on this reasoning, and what we can expect from future versions of GPT models. But he fails to make this important matter explicit enough to help us think through the argument and decide whether we’re convinced. Instead the only way to find out is either to already know that line of reasoning, or to think very hard about his post and the references in that specific way.
In essence, we find that in this post, almost all the information we would want for thinking about these scenarios exists in the references, but isn’t summarized in nearly enough detail in the post itself to make it self-contained. Of course, we can’t ask Daniel to explain every little point about his references and assumptions. Yet we still feel like he could probably do a better job, given that he already has all the right pointers (as his references show).
Relevance of the post to the field
The relevance of this work appears to rely mostly on the hypothesis that the +12 OOMs of compute and all relevant resources could plausibly be obtained in a short time frame. If not, then the arguments made by Daniel wouldn’t have the consequence of making people have shorter timelines.
The first problem we noted was that this hypothesis isn’t defended anywhere in the post. Arguments for it are not even summarized. This in turn means that if we read this post by itself, without being fully up to date with its main reference, there is no reason to update towards shorter timelines.
Of course, not having a defense of this position is hardly strong evidence against the hypothesis. Yet we all agreed that it was counterintuitive enough that the burden of proving at least plausibility lay on people defending it.
Another issue with this hypothesis is that it assumes, under the hood, exactly the kind of breakthrough that Daniel is trying so hard to remove from the software side. Our cursory look at Ajeya’s report (focused on the speed-up instead of the cost reduction) showed that almost all the forecasted hardware improvement came from breakthroughs in hardware that currently doesn’t work (or doesn’t scale). Even without mentioning the issue that none of these technologies look like they can provide anywhere near the improvement expected, there is still the fact that getting these orders of magnitude of compute requires many hardware breakthroughs, which contradicts Daniel’s stance on not needing new technology or ideas, just scaling.
(Very important note: we haven’t studied Ajeya’s report in full. It is completely possible that our issues are actually addressed somewhere in it, and that the full-fledged argument for why this increase in compute will be possible looks convincing. Also, she herself writes that at least the hardware forecasting part looks under-informed to her. We’re mostly highlighting the same problem as in the previous section -- Daniel not summarizing in enough detail the references that are crucial to his point -- with the difference that this time, when looking quickly at the reference, we failed to find convincing enough arguments.)
Lastly, Daniel edited his post to add that the +12 OOMs increase applied to every relevant resource, like memory and bandwidth. But bandwidth, for example, is known to increase far more slowly than compute. We understand that this edit was a quick one made to respond to criticism that some of his scenarios would require a lot of other resources, but it considerably weakens his claim by making his hypothesis almost impossible to satisfy. That is, even if we could see an argument for that much short-term increase in compute, a similar argument for bandwidth looks much less probable.
One counterargument is that the scenarios don’t need that much bandwidth, which sounds reasonable. But then what’s missing is a ballpark estimate of how much of each type of resource is needed, and an argument for why that increase might be achieved on a short timeline.
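As a rough sketch of why the growth rate of each resource matters, the snippet below converts a number of OOMs into doublings and then into years, assuming a constant doubling time. The doubling times are placeholder values chosen only to contrast a faster-growing and a slower-growing resource; they are not forecasts, and the model ignores increased spending and any change in growth rate.

```python
import math

def doublings_for_ooms(ooms):
    """Number of doublings needed to grow by `ooms` orders of magnitude."""
    return ooms * math.log2(10)

def years_to_gain(ooms, doubling_time_years):
    """Years needed to gain `ooms` orders of magnitude at a constant doubling time."""
    return doublings_for_ooms(ooms) * doubling_time_years

# Hypothetical doubling times (in years), for illustration only.
HYPOTHETICAL_DOUBLING_TIMES = {
    "faster-growing resource": 1.5,
    "slower-growing resource": 3.0,
}

print(f"+12 OOMs is about {doublings_for_ooms(12):.1f} doublings")
for name, doubling_time in HYPOTHETICAL_DOUBLING_TIMES.items():
    years = years_to_gain(12, doubling_time)
    print(f"{name}: about {years:.0f} years to gain 12 OOMs")
```

Under this simple model, the time to reach a given number of OOMs scales linearly with the doubling time, so a resource that doubles half as often takes twice as long to gain the same +12 OOMs.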
To summarize, how we interpret this work depends on a hypothesis that is neither obvious nor defended in an easy-to-find argument. As such, we are unable to really judge what should be done following the argument in this post. If the hypothesis is indeed plausible and defended, then we feel that Daniel is making a good point for updating timelines towards shorter ones. If the hypothesis cannot be plausibly defended, then this post might even have the opposite effect: if the most convincing scenarios we have for TAI using modern ML look like they require an amount of compute we won’t get anytime soon, some might update towards longer timelines (at least compared to Daniel’s very short ones).
Follow-up work we would be excited about
Most of our issues with this post come from its main premise. As such, we would be particularly excited by any further research arguing for it, be it by extracting the relevant part from sources like Ajeya’s report, or by making a whole new argument.
If the argument can only be made for a smaller increase in compute, then looking for scenarios that fit within that smaller increase would be the obvious next step.
Less important but still valuable, fleshing out the scenarios and operationalizing them as much as possible (for example with the requirements in the various other resources, or the plausible bottlenecks) would be a good follow-up.
Fitness with framing on AI Alignment research
Finally, how does this work fit in the framing of AI Alignment research proposed by one of us (Adam) here? To refresh memories, this framing splits AI Alignment research into categories around 3 aspects of the field: studying which AIs we’re most likely to build, and thus which ones we should try to align; studying what well-behaved means for AIs, that is, what we want; and based on at least preliminary answers from the previous two, studying how to solve the problem of making that kind of AI well-behaved in that way.
Adam finds that Daniel’s post fits perfectly in the first category: it argues for the fact that scaled up current AI (à la prosaic AGI) is the kind of AI we should worry about and make well-behaved. And similarly, this post makes no contribution to the other two categories.
On the other hand, the other two reviewers are less convinced that this fully captures Daniel’s intentions. For example, they argue that it’s not that intuitive that an argument about timelines (when we’re going to build AI) fits into the “what kind of AI we’re likely to build” part. Or that the point of the post looks more like a warning for people expecting a need for algorithmic improvement than a defense of a specific kind of AI we’re likely to build.
What do you think?
We find that this post is a well-written argument for short timelines, painting a vivid picture of possible transformative AIs with current ML technology, and operationalizing a crux for the timeline debate. That being said, the post also suffers from never defending its main premise, and not pointing to a source from which it can be extracted without a tremendous investment of work.
As it stands, we can’t be confident about how relevant this work will be to the field. But additional research and argument about that premise would definitely help convince us that this is crucial work for AI Alignment.