On AI and Compute

johncrox

This is a post on OpenAI’s “AI and Compute” piece, as well as excellent responses by Ryan Carey and Ben Garfinkel, Research Fellows at the Future of Humanity Institute. (Crossposted on the EA Forum)

Intro: AI and Compute

Last May, OpenAI released an analysis on AI progress that blew me away. The key takeaway is this: the computing power used in the biggest AI research projects has been doubling every 3.5 months since 2012. That means that more recent projects like AlphaZero have tens of thousands of times the “compute” behind them as something like AlexNet did in 2012.

When I first saw this, it seemed like evidence that powerful AI is closer than we think. Moore’s Law doubled generally-available compute about every 18 months to 2 years, and has resulted in the most impressive achievements of the last half century. Personal computers, mobile phones, the Internet...in all likelihood, none of these would exist without the remorseless progress of constantly shrinking, ever cheaper computer chips, powered by the mysterious straight line of Moore’s Law.

So with a doubling cycle for AI compute that’s more than five times faster (let’s call it AI Moore’s Law), we should expect to see huge advances in AI in the relative blink of an eye...or so I thought. But OpenAI’s analysis has led some people to the exact opposite view.^[1]
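To make the comparison concrete, here is a minimal sketch of the growth arithmetic. It assumes clean exponential doubling and takes the 18-month end of the Moore's Law range; the helper name `growth_factor` is illustrative and not taken from OpenAI's analysis.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months`, assuming a fixed doubling time."""
    return 2.0 ** (months / doubling_months)

AI_DOUBLING_MONTHS = 3.5      # doubling time reported in "AI and Compute"
MOORE_DOUBLING_MONTHS = 18.0  # fast end of the 18-24 month range (assumption)

# How many times faster the AI compute trend doubles than Moore's Law.
speedup = MOORE_DOUBLING_MONTHS / AI_DOUBLING_MONTHS

# Growth over a single year under each trend.
ai_per_year = growth_factor(12, AI_DOUBLING_MONTHS)
moore_per_year = growth_factor(12, MOORE_DOUBLING_MONTHS)

print(f"AI trend doubles {speedup:.1f}x faster than Moore's Law")
print(f"AI compute grows about {ai_per_year:.1f}x per year")
print(f"Moore's Law grows about {moore_per_year:.2f}x per year")
```

Under these assumptions the AI trend works out to roughly an order of magnitude per year, against well under a doubling per year for Moore's Law.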

Interpreting the Evidence

Ryan Carey points out that while the compute used in these projects is doubling every 3.5 months, the compute you can buy per dollar is growing around 4-12 times slower. The trend is being driven by firms investing more money, not (for the most part) inventing better technology, at least on the hardware side. This means that the growing cost of projects will keep even Google and Amazon-sized companies from sustaining AI Moore’s Law for more than roughly 2.5 years. And that’s likely an upper bound, not a lower one; companies may try to keep their research budgets relatively constant. In that case, increased funding for AI research would have to displace other R&D, which firms will be reluctant to do.^[2] But for lack of good data, for the rest of the post I’ll assume we’ve more or less been following the trend since the publication of “AI and Compute”.^[3]
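The shape of Carey's argument can be sketched in a few lines. This is only an illustration, not his actual model: the starting project cost is a hypothetical figure chosen for the example, and the budget cap borrows the roughly $20B R&D budget mentioned in the footnotes. The function names are likewise made up here.

```python
import math

def cost_doubling_months(compute_doubling: float, price_perf_doubling: float) -> float:
    """Doubling time of project cost when compute outpaces compute-per-dollar.

    Cost is compute divided by compute-per-dollar, so the exponential
    growth rates subtract.
    """
    rate = 1.0 / compute_doubling - 1.0 / price_perf_doubling
    return 1.0 / rate

def months_until_budget(start_cost: float, budget: float, doubling_months: float) -> float:
    """Months of continued growth before project cost reaches `budget`."""
    return doubling_months * math.log2(budget / start_cost)

COMPUTE_DOUBLING = 3.5  # months, from "AI and Compute"
START_COST = 100e6      # dollars; hypothetical cost of today's largest project
BUDGET_CAP = 20e9       # dollars; roughly a very large firm's whole R&D budget

for slowdown in (4, 12):  # compute per dollar grows 4-12 times slower
    doubling = cost_doubling_months(COMPUTE_DOUBLING, COMPUTE_DOUBLING * slowdown)
    months = months_until_budget(START_COST, BUDGET_CAP, doubling)
    print(f"{slowdown}x slower hardware: cost doubles every {doubling:.1f} months, "
          f"cap reached after {months / 12:.1f} years")
```

The result is very sensitive to the assumed starting cost and cap, which is one reason better data on recent project costs would be valuable.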

While Carey thinks that we’ll pass some interesting milestones for compute during this time which might be promising for research, Ben Garfinkel is much more pessimistic. His argument is that we’ve seen a certain amount of progress in AI research recently, so realizing that it’s been driven by huge increases in compute means we should reconsider how much adding more will advance the field. He adds that this also means AI advances at the current pace are unsustainable, agreeing with Carey. Both of their views are somewhat simplified here, and worth reading in full.

Thoughts on Garfinkel

To address Garfinkel’s argument, it helps to be a bit more explicit. We can think of the compute in an AI system and the computational power of a human brain as mediated by the effectiveness of their algorithms, which is unknown for both humans and AI systems. The basic equation is something like: Capability = Compute * Algorithms. Once AI’s Capability reaches a certain threshold, “Human Brain,” we get human-level AI. We can observe the level of Capability that AI systems have reached so far (with some uncertainty), and have now measured their Compute. My initial reaction to reading OpenAI’s piece was the optimistic one - Capability must be higher than we thought, since Compute is so much higher! Garfinkel seems to think that Algorithms must be lower than we thought, since Capability hasn’t changed. This shows that Garfinkel and I disagree on how precisely we can observe Capability. We can avoid lowering Algorithms to the extent that our observation of Capability is imprecise and has room for revision. I think he’s probably right that the default approach should be to revise Algorithms downward, though there’s some leeway to revise Capability upward.
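One way to write this disagreement down, treating the informal identity above as if it were exact (an idealization; the revision factor $k$ and the primed symbols are introduced here purely for illustration):

```latex
\[
\text{Capability} = \text{Compute} \times \text{Algorithms}
\]
% Suppose we learn that Compute is larger than we believed by a factor k > 1:
\[
\text{Compute}' = k \cdot \text{Compute}
\quad\Longrightarrow\quad
\frac{\text{Capability}'}{\text{Algorithms}'} = k \cdot \frac{\text{Capability}}{\text{Algorithms}}
\]
% Optimistic reading: Algorithms' = Algorithms, so Capability' = k * Capability.
% Garfinkel's reading: Capability' = Capability, so Algorithms' = Algorithms / k.
```

Any split of the factor $k$ between the two readings satisfies the identity; how much goes to each depends on how precisely Capability can be observed.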

Much of Garfinkel’s pessimism about the implications of “AI and Compute” comes from the realization that its trend will soon stop - an important point. But what if, by that time, the Compute in AI systems will have surpassed the brain’s?

Thoughts on Carey

Carey thinks that one important milestone for AI progress is when projects have compute equal to running a human brain for 18 years. At that point we could expect AI systems to match an 18-year-old human’s cognitive abilities, if their algorithms successfully imitated a brain or otherwise performed at its level. AI Impacts has collected various estimates of how much compute this might require - by the end of AI Moore's Law they should comfortably reach and exceed it. Another useful marker is the 300-year AlphaGo Zero milestone. The idea here is that AI systems might learn much more slowly than humans - it would take someone about 300 years to play as many Go games as AlphaGo Zero did before beating its previous version, which beat a top-ranked human Go player. A similar ratio might apply to learning to perform other tasks at a human-equivalent level (although AlphaGo Zero’s performance was superhuman). Finally we have the brain-evolution milestone; that is, how much compute it would take to simulate the evolution of a nervous system as complex as the human brain. Only this last milestone is outside the scope of AI Moore's Law.^[4] I tend to agree with Carey that the necessary compute to reach human-level AI lies somewhere around the 18 and 300-year milestones.

But I believe his analysis likely overestimates the difficulty of reaching these computational milestones. The FLOPS per brain estimates he cites are concerned with simulating a physical brain, rather than estimating how much useful computation the brain performs. The level of detail of the simulations seems to be the main source of variance among these higher estimates, and is irrelevant for our purposes - we just want to know how well a brain can compute things. So I think we should take the lower estimates as more relevant - Moravec’s 10^13 FLOPS and Kurzweil’s 10^16 FLOPS (page 114) are good places to start,^[5] though far from perfect. These estimates are calculated by comparing areas of the brain responsible for discrete tasks like vision to specialized computer systems - they represent something nearer the minimum amount of computation to equal the human brain than other estimates. If accurate, the reduction in required computation by 2 orders of magnitude has significant implications for our AI milestones. Using the estimates Kurzweil cites, we’ll comfortably pass the milestones for both 18 and 300-year human-equivalent compute by the time AI Moore's Law has finished in roughly 2.5 years.^[6] There’s also some reason to think that AI systems’ learning abilities are improving, in the sense that they don’t require as much data to make the same inferences. DeepMind certainly seems to be saying that AlphaZero is better at searching a more limited set of promising moves than Stockfish, a traditional chess engine (unfortunately they don’t compare it to earlier versions of AlphaGo on this metric). On the other hand, board games like Chess and Go are probably the ideal case for reinforcement learning algorithms, as they can play against themselves rapidly to improve. It’s unclear how current approaches could transfer to situations where this kind of self-play isn’t possible.
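The milestone arithmetic in this section (spelled out in footnote 6) can be reproduced roughly as follows. This is a sketch under the post's own assumptions: a brain at 10^16 operations per second, AlphaGo Zero at about 1,000 petaflop/s-days, and the 3.5-month doubling trend continuing. It also treats the 300-year milestone simply as 300 years of brain-equivalent compute. The function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
PFS_DAY_OPS = 1e15 * 24 * 3600         # operations in one petaflop/s-day

BRAIN_OPS_PER_SEC = 1e16               # Kurzweil's estimate, as cited in the post
ALPHAGO_ZERO_OPS = 1000 * PFS_DAY_OPS  # ~1,000 petaflop/s-days (October 2017)
DOUBLING_MONTHS = 3.5                  # "AI and Compute" trend

def milestone_ops(years: float) -> float:
    """Total operations from running a brain-equivalent for `years`."""
    return BRAIN_OPS_PER_SEC * years * SECONDS_PER_YEAR

def months_to_reach(target_ops: float, start_ops: float) -> float:
    """Months of on-trend growth needed to go from start_ops to target_ops."""
    return DOUBLING_MONTHS * math.log2(target_ops / start_ops)

for years in (18, 300):
    target = milestone_ops(years)
    months = months_to_reach(target, ALPHAGO_ZERO_OPS)
    print(f"{years}-year milestone: {target:.1e} operations, "
          f"about {months:.0f} months after AlphaGo Zero if the trend holds")
```

The gap between the two milestones comes out at a little over 14 months of trend growth, matching the figure in footnote 6; everything else shifts with the assumed brain estimate.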

Final Thoughts

So - what can we conclude? I don’t agree with Garfinkel that OpenAI’s analysis should make us more pessimistic about human-level AI timelines. While it makes sense to revise our estimate of AI algorithms downward, it doesn’t follow that we should do the same for our estimate of overall progress in AI. By cortical neuron count, systems like AlphaZero are at about the same level as a blackbird (albeit one that lives for 18 years),^[7] so there’s a clear case for future advances being more impressive than current ones as we approach the human level. I’ve also given some reasons to think that level isn’t as high as the estimates Carey cites. However, we don’t have good data on how recent projects fit AI Moore’s Law. It could be that we’ve already diverged from the trend, as firms may be conservative about drastically changing their R&D budgets. There’s also a big question mark hovering over our current level of progress in the algorithms that power AI systems. Today’s techniques may prove completely unable to learn generally in more complex environments, though we shouldn’t assume they will.^[8]

If AI Moore’s Law does continue, we’ll pass the 18 and 300-year human milestones in the next two years. I expect to see an 18-year-equivalent project in the next five, even if it slows down. After these milestones, we’ll have some level of hardware overhang^[9] and be left waiting on algorithmic advances to get human-level AI systems. Governments and large firms will be able to compete to develop such systems, and costs will halve roughly every 4 years,^[10] slowly widening the pool of actors. Eventually the relevant breakthroughs will be made. That they will likely be software rather than hardware should worry AI safety experts, as these will be harder to monitor and foresee.^[11] And once software lets computers approach a human level in a given domain, we can quickly find ourselves completely outmatched. AlphaZero went from a bundle of blank learning algorithms to stronger than the best human chess players in history...in less than two hours.

Important to note that while Moore’s Law resulted in cheaper computers (while increasing the scale and complexity of the factories that make them), this doesn’t seem to be doing the same for AI chips. It’s possible that AI chips will also decrease in cost after attracting more R&D funding/becoming commercially available, but without a huge consumer market, it seems more likely that these firms will mostly have to eat the costs of their investments. ↩︎
This assumes corporate bureaucracy will slow reallocation of resources, and could be wrong if firms prove willing to keep ratcheting up total R&D budgets. Both Amazon and Google are doing so at the moment. ↩︎
Information about the cost and compute of AI projects since then would be very helpful for evaluating the continuation of the trend. ↩︎
Cost and computation figures take AlphaGo Zero as the last available data point in the trend, since it’s the last AI system for which OpenAI has calculated compute. AlphaGo Zero was released in October 2017, but I’m plotting how things will go from now, March 2019, assuming that trends in cost and compute have continued. These estimates are therefore 1.5 years shorter than Carey’s, apart from our use of different estimates of the brain’s computation. ↩︎
Moravec does his estimate by comparing the number of calculations machine vision software makes to the retina, and extrapolating to the size of the rest of the brain. This isn’t ideal, but at least it’s based on a comparison of machine and human capability, not simulation of a physical brain. Kurzweil cites Moravec’s estimate as well as a similar one by Lloyd Watts based on comparisons between the human auditory system and teleconferencing software, and finally one by the University of Texas replicating the functions of a small area of the cerebellum. These latter estimates come to 10^17 and 10^15 FLOPS for the brain. I know people are wary of Kurzweil, but he does seem to be on fairly solid ground here. ↩︎
The 18-year milestone would be reached in under a year and the 300-year milestone in slightly over another. If the brain performs about 10^16 operations per second, 18 years’ worth would be roughly 10^25 operations. AlphaGo Zero used about 10^23 operations in October 2017 (1,000 petaflop/s-days; 1 petaflop/s-day is roughly 10^20 operations). If the trend is holding, Compute is increasing roughly an order of magnitude per year. It’s worth noting that this would be roughly a $700M project in late 2019 (scaling AlphaZero up 100x and halving costs every 4 years), and something like $2-3B if hardware costs weren’t spread across multiple projects. Google has an R&D budget over $20B, so this is feasible, though significant. The AlphaGo Zero games milestone would take about 14 months more of AI Moore's Law to reach, or a few decades of cost decreases if it ends. ↩︎
This is relative to 10^16 FLOPS estimates of the human brain’s computation and assuming computation is largely based on cortical neuron count - a blackbird would be at about 10^14 FLOPS by this measure. ↩︎
An illustration of this point is found here, expressed by Richard Sutton, one of the inventors of reinforcement learning. He examines the history of AI breakthroughs and concludes that fairly simple search and learning algorithms have powered the most successful efforts, driven by increasing compute over time. Attempts to use models that take advantage of human expertise have largely failed. ↩︎
This argument fails if the piece’s cited estimates of a human brain’s compute are too optimistic. If more than a couple extra orders of magnitude are needed to get brain-equivalent compute, we could be many decades away from having the necessary hardware. AI Moore’s Law can’t continue much longer than 2.5 years, so we’d have to wait for long-term trends in cost decreases to run more capable projects. ↩︎
Based on AI Impacts’ cost estimates, which find that costs have recently been falling by an order of magnitude every 10-16 years. ↩︎
If the final breakthroughs depend on software, we’re left with a wide range of possible human-level AI timelines - but one that likely precludes centuries in the future. We could theoretically be months away from such a system if current algorithms with more compute are sufficient. See this article, particularly the graphic on exponential computing growth. This completely violates my intuitions of AI progress but seems like a legitimate position. ↩︎

(Cross-posted from the EA Forum)

DeepMind certainly seems to be saying that AlphaZero is better at searching a more limited set of promising moves than Stockfish, a traditional chess engine (unfortunately they don’t compare it to earlier versions of AlphaGo on this metric).

Only at test time. AlphaZero has much more experience gained from its training phase. (Stockfish has no training phase, though you could think of all of the human domain knowledge encoded in it as a form of "training".)

AlphaZero went from a bundle of blank learning algorithms to stronger than the best human chess players in history...in less than two hours.

Humans are extremely poorly optimized for playing chess.

I don’t agree with Garfinkel that OpenAI’s analysis should make us more pessimistic about human-level AI timelines. While it makes sense to revise our estimate of AI algorithms downward, it doesn’t follow that we should do the same for our estimate of overall progress in AI. By cortical neuron count, systems like AlphaZero are at about the same level as a blackbird (albeit one that lives for 18 years),[7] so there’s a clear case for future advances being more impressive than current ones as we approach the human level.

Sounds like you are using a model where (our understanding of) current capabilities and rates of progress of AI are not very relevant for determining future capabilities, because we don't know the absolute quantitative capability corresponding to "human-level AI". Instead, you model it primarily on the absolute amount of compute needed.

Suppose you did know the absolute capability corresponding to "human-level AI", e.g. you can say something like "once we are able to solve Atari benchmarks using only 10k samples from the environment, we will have human-level AI", and you found that metric much more persuasive than the compute used by a human brain. Would you then agree with Garfinkel's point?

(Crossposted reply to crossposted comment from the EA Forum)

Thanks for the comment! In order:

I think that its performance at test time is one of the more relevant measures - I take grandmasters' considering fewer moves during a game as evidence that they've learned something more of the 'essence' of chess than AlphaZero, and I think AlphaZero's learning was similarly superior to Stockfish's relatively blind approach. Training time is also an important measure - but that's why Carey brings up the 300-year AlphaGo Zero milestone.

Indeed we are. And it's not clear to me that we're much better optimized for general cognition. We're extremely bad at doing math that pocket calculators have no problem with, yet it took us a while to build a good chess and Go-playing AI. I worry we have very little idea how hard different cognitive tasks will be to something with a brain-equivalent amount of compute.

I'm focusing on compute partly because it's the easiest to measure. My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive the results of major papers intuitively seem. And when AI can use something like the amount of compute a human brain has, we should eventually get a similar level of capability, so I think compute is a good yardstick.

I'm not sure I fully understand how the metric would work. For the Atari example, it seems clear to me that we could easily reach it without making a generalizable AI system, or vice versa. I'm not sure what metric could be appropriate - I think we'd have to know a lot more about intelligence. And I don't know if we'll need a completely different computing paradigm from ML to learn in a more general way. There might not be a relevant capability level for ML systems that would correspond to human-level AI.

But let's say that we could come up with a relevant metric. Then I'd agree with Garfinkel, as long as people in the community had known roughly the current state of AI in relation to it and the rate of advance toward it before the release of "AI and Compute".

(Continuing the crossposting)

Mostly agree with all of this; some nitpicks:

My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive the results of major papers intuitively seem.

I claim that this is not how I think about AI capabilities, and it is not how many AI researchers think about AI capabilities. For a particularly extreme example, the Go-explore paper out of Uber had a very nominally impressive result on Montezuma's Revenge, but much of the AI community didn't find it compelling because of the assumptions that their algorithm used.

I'm not sure I fully understand how the metric would work. For the Atari example, it seems clear to me that we could easily reach it without making a generalizable AI system, or vice versa.

Tbc, I definitely did not intend for that to be an actual metric.

But let's say that we could come up with a relevant metric. Then I'd agree with Garfinkel, as long as people in the community had known roughly the current state of AI in relation to it and the rate of advance toward it before the release of "AI and Compute".

I would say that I have a set of intuitions and impressions that function as a very weak prediction of what AI will look like in the future, along the lines of that sort of metric. I trust timelines based on extrapolation of progress using these intuitions more than timelines based solely on compute.

To the extent that you hear timeline estimates from people like me who do this sort of "progress extrapolation" who also did not know about how compute has been scaling, you would want to lengthen their timeline estimates. I'm not sure how timeline predictions break down on this axis.

(Criss-cross)

I claim that this is not how I think about AI capabilities, and it is not how many AI researchers think about AI capabilities. For a particularly extreme example, the Go-explore paper out of Uber had a very nominally impressive result on Montezuma's Revenge, but much of the AI community didn't find it compelling because of the assumptions that their algorithm used.

Sorry, I meant the results in light of which methods were used, implications for other research, etc. The sentence would better read, "My understanding (and I think everyone else's) of AI capabilities is largely shaped by how impressive major papers seem."

Tbc, I definitely did not intend for that to be an actual metric.

Yeah, totally got that - I just think that making a relevant metric would be hard, and we'd have to know a lot that we don't know now, including whether current ML techniques can ever lead to AGI.

I would say that I have a set of intuitions and impressions that function as a very weak prediction of what AI will look like in the future, along the lines of that sort of metric. I trust timelines based on extrapolation of progress using these intuitions more than timelines based solely on compute.

Interesting. Yeah, I don't much trust my own intuitions on our current progress. I'd love to have a better understanding of how to evaluate the implications of new developments, but I really can't do much better than, "GPT-2 impressed me a lot more than AlphaStar." And to be totally clear - I don't think we'll get AGI as soon as we reach the 18-year mark, or the 300-year one. I do tend to think that the necessary amount of compute is somewhere in that range, though. After we reach it, I'm stuck using my intuition to guess when we'll have the right algorithms to create AGI.

It felt weird to me to describe shorter timeline projections as "optimistic" and longer ones as "pessimistic"- AI research taking place over a longer period is going to be more likely to give us friendly AI, right?

Hopefully. Yeah, I probably could have used better shorthand.

I tend to agree with Carey that the necessary compute to reach human-level AI lies somewhere around the 18 and 300-year milestones.

I'm sure there's a better discussion about which milestones to use somewhere else, but since I'm rereading older posts to catch up, and others may be doing the same, I'll make a brief comment here.

I think this is going to be an important crux between people who estimate timelines differently.

If you categorically disregard the evolutionary milestones, wouldn't you be saying that searching for the right architecture isn't the bottleneck, but training is? However, isn't it standardly the case that architecture search takes more compute with ML than training? I guess the terminology is confusing here. In ML, the part that takes the most compute is often called "training," but it's not analogous to what happens in a single human's lifetime, because there are architecture tweaks, hyperparameter tuning, and so on. It feels like what ML researchers call "training" is analogous to Hominid evolution, or something like that. Whereas the part that is analogous to a single human's lifetime is AlphaZero going from 0 to superhuman capacity in 3 days of runtime. That second step took a lot less compute than the architecture search that came before!

Therefore, I would discount the 18y and 300y milestones quite a bit. That said, the 18y estimate was never a proper lower bound. The human brain may not be particularly optimal.

So, I feel like all we can say with confidence is that brain evolution is a proper upper bound, and AGI might arrive way sooner depending on how much human foresight can cut it down, being smarter than evolution. I think what we need most is conceptual progress on how much architecture search in ML is "random" vs. how much human foresight can cut corners and speed things up.

I actually don't know what the "brain evolution" estimate refers to, exactly. If it counts compute wasted on lineages like birds, that seems needlessly inefficient. (Any smart simulator would realize that mammals are more likely to develop civilization, since they have fewer size constraints with flying.) But probably the "brain evolution" estimate just refers to how much compute it takes to run all the direct ancestors of a present-day human, back to the Cambrian period or something like that?

I'm sure others have done extensive analyses on these things, so I'm looking forward to reading all of that once I find it.

A possible alternative view on hardware acceleration (a generalised Moore's law) is that it is not an independent physical process with its own speed, completely independent of us, but a market reaction to growing demand.

The AI compute law creates increasing demand for computational power. The market's reaction to this demand is the creation of specialised AI ASICs such as TPUs and Graphcore's chips, but specialised AI ASICs appear with some time lag, as they need time to be tested before they are able to outperform general-purpose CPUs and GPUs.

The end of Moore's law is not the limiting factor here, as most advances in AI ASICs are architectural rather than related to transistor size.

The figures should be less than this. Only a fraction of human time is spent learning.

What does it mean to "revise Algorithm downward"? Observing $\frac{d\,\mathrm{Capability}}{d\,\mathrm{Compute}}$ doesn't seem to indicate much about the current value of $\mathrm{Algorithm}$. Or is Algorithm shorthand for "the rate of increase of Algorithm"?
