Trends in GPU price-performance

Marius Hobbhahn; Tamay

1st Jul 2022

Tags: AI Timelines, Scaling Laws, AI

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a linkpost for https://epochai.org/blog/trends-in-gpu-price-performance

Executive Summary

Using a dataset of 470 models of graphics processing units (GPUs) released between 2006 and 2021, we find that the number of floating-point operations per second per dollar (hereafter FLOP/s per $) doubles every ~2.5 years. For top GPUs, we find a slower rate of improvement (FLOP/s per $ doubles every 2.95 years), while for models of GPU typically used in ML research, we find a faster rate of improvement (FLOP/s per $ doubles every 2.07 years). GPU price-performance improvements have generally been slightly slower than the 2-year doubling time associated with Moore’s law and much slower than what Huang’s law implies, yet considerably faster than what prior work on GPU price-performance trends generally found. Compared with previous investigations, our work aims to characterize GPU price-performance trends more precisely, using more or higher-quality data, and with results that are more robust to justifiable changes in the analysis.
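The summary does not spell out the fitting procedure. A common approach, assumed in this sketch, is an ordinary least-squares fit of log10(FLOP/s per $) against release year; the doubling time is then log10(2) divided by the fitted slope. The function name `doubling_time_years` is hypothetical, and the data below are synthetic placeholders, not the Epoch dataset.

```python
import math


def doubling_time_years(years, flops_per_dollar):
    """Fit log10(FLOP/s per $) = a + b * year by ordinary least squares
    and return the implied doubling time, log10(2) / b, in years."""
    if len(years) != len(flops_per_dollar) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log10(v) for v in flops_per_dollar]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("release years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # orders of magnitude gained per year
    if slope <= 0:
        raise ValueError("no upward trend; doubling time is undefined")
    return math.log10(2) / slope


# Synthetic placeholder data (NOT the Epoch dataset): a quantity that
# doubles exactly every 2.5 years from 2006 to 2021.
years = list(range(2006, 2022))
values = [2 ** ((y - 2006) / 2.5) for y in years]
print(round(doubling_time_years(years, values), 2))  # 2.5
```

Fitting in log space turns exponential growth into a straight line, so a single slope summarizes the trend; real data with scatter would also need an uncertainty estimate around that slope.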

Figure 1. Plots of FLOP/s and FLOP/s per dollar for our dataset and relevant trends from the existing literature

Trend	2x time	10x time	Metric
Our dataset (n=470)	2.46 years [2.24, 2.72]	8.17 years [7.45, 9.04]	FLOP/s per dollar
ML GPUs (n=26)	2.07 years [1.54, 3.13]	6.86 years [5.12, 10.39]	FLOP/s per dollar
Top GPUs (n=57)	2.95 years [2.54, 3.52]	9.81 years [8.45, 11.71]	FLOP/s per dollar
Our data FP16 (n=91)	2.30 years [1.69, 3.62]	7.64 years [5.60, 12.03]	FLOP/s per dollar
Moore’s law	2 years	6.64 years	FLOP/s
Huang’s law	1.08 years	3.58 years	FLOP/s
CPU historical (AI Impacts, 2019)	2.32 years	7.7 years	FLOP/s per dollar
Bergal, 2019	4.4 years	14.7 years	FLOP/s per dollar

Table 1. Summary of our findings on GPU price-performance trends and relevant trends in the existing literature with the 95% confidence intervals in square brackets.
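Under steady exponential growth, the 10x column follows directly from the doubling-time column: 10x time = doubling time × log2(10) ≈ 3.32 × doubling time. The sketch below makes this arithmetic explicit. The helper names are hypothetical, and reading Huang’s law as 25x growth every 5 years is taken from the figure quoted in one of the comments.

```python
import math


def time_to_multiply(doubling_time, factor):
    """Years for a quantity that doubles every `doubling_time` years to
    grow by `factor` (assumes steady exponential growth)."""
    return doubling_time * math.log2(factor)


def doubling_time_from_growth(factor, period):
    """Doubling time implied by growing `factor`-fold over `period` years."""
    return period / math.log2(factor)


print(round(time_to_multiply(2.46, 10), 2))          # 8.17  (our dataset)
print(round(time_to_multiply(2, 10), 2))             # 6.64  (Moore's law)
print(round(doubling_time_from_growth(25, 5), 2))    # 1.08  (Huang's law)
```

Small mismatches in the last digit of some rows (for example 2.07 years × 3.32 ≈ 6.88 versus the listed 6.86) are consistent with the table having been computed from unrounded doubling times.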

In future work, we intend to build on these results to produce projections of GPU price-performance and to investigate what our findings imply about the growth in dollar spending on computing hardware in machine learning.

We would like to thank Alyssa Vance, Ashwin Acharya, Jessica Taylor and the Epoch team for helpful feedback and comments.

12 comments

M. Y. Zuo

This trend may hit a wall in the near future: some industry analysts predict that TSMC's 3nm process will actually be more expensive per transistor than the current state-of-the-art 5nm process.

Marius Hobbhahn

We're currently looking more deeply into how we can extrapolate this trend. Our preliminary, highly uncertain estimate is that it is more likely to slow down than speed up over the foreseeable future.

the gears to ascension

Memory latency and bandwidth are critically important to ML algorithm performance, in a way that makes this chart less straightforward than it appears. It's a good investigation with, in my view, an inconclusive result. Achievable FLOP/s on various models would be an interesting comparison.

Marius Hobbhahn (review for the 2022 Review)

In a narrow technical sense, this post still seems accurate, but in a more general sense it might have been slightly wrong or misleading.

In the post, we investigated different measures of FP32 compute growth and found that many of them were slower than Moore's law would predict. This made me personally believe that compute might be growing more slowly than people thought and that most of the progress came from throwing more money at larger and larger training runs. While most progress does come from investment scaling, I now think the true effective compute growth is probably faster than Moore's law.

The main reason is that FP32 is just not the right thing to look at in modern ML, and we already knew this at the time of writing: it ignores tensor cores and lower precisions like TF16 or INT8.

I'm a little worried that readers without any background in ML took the wrong lesson from this post, and we should have emphasized this difference even more at the time. We recently wrote a follow-up post about this: https://epochai.org/blog/trends-in-machine-learning-hardware
I feel like the new post does a better job of explaining where compute progress comes from.

Lone Pine

I notice that the ML GPUs are not the best bang for your buck in this chart. I assume researchers prefer them because they pack more 'bang' (FLOP/s) into one unit, and because distributing work across multiple cards has a performance penalty and/or adds complexity. How do factors like the cost of the rig (motherboard, power supply, case) and the cost of electricity play into this? Would a large cluster of commodity GPUs be an effective research setup that just isn't economically competitive with ML GPUs, or would it be impractical at research scale?

Nanda Ale

I believe the performance/complexity penalty generally makes large clusters of cheap consumer GPUs unviable, with memory capacity being the biggest problem. From my outside perspective, it takes a lot of effort and re-engineering to make many ML projects even run inference on lower-memory consumer GPUs, and even more work to make it possible to train them across numerous low-memory GPUs. And in the vast majority of cases, the authors say it's not even possible.

The lone exception is the consumer 3090 GPU, a massive outlier with 24 GB of memory. But in pure FLOP/s, the 3080 is almost equivalent to the 3090 while having only 10 GB.

weverka

You have more than an order of magnitude of scatter in your plot, but you report your calculated doubling period to 3 significant figures. Is that level of precision meaningful?

Also, your black data appears to have something different going on prior to 2008. It would be worth doing a separate fit to the post-2008 data. Eyeballing it, the doubling time is longer than 4 years.
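Both questions can be explored with a percentile bootstrap over GPU models, optionally restricted to models released from 2008 onward. This is a minimal sketch under stated assumptions: it assumes an OLS fit in log space and resampling of individual models, which may differ from how the post computed its confidence intervals. The function names and the synthetic data are hypothetical, not the Epoch dataset.

```python
import math
import random


def fit_doubling_time(points):
    """OLS fit of log10(value) on release year; returns the doubling time in
    years, or math.inf if the (re)sample shows no upward trend."""
    n = len(points)
    xs = [year for year, _ in points]
    ys = [math.log10(value) for _, value in points]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy <= 0:
        return math.inf
    return math.log10(2) / (sxy / sxx)


def bootstrap_doubling_ci(points, start_year=None, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI for the doubling time.

    If start_year is given, only (year, value) pairs with year >= start_year
    are used, e.g. to fit post-2008 releases separately."""
    if start_year is not None:
        points = [p for p in points if p[0] >= start_year]
    if len(points) < 2:
        raise ValueError("need at least two data points")
    rng = random.Random(seed)
    # Resample models with replacement and refit the trend each time.
    estimates = sorted(
        fit_doubling_time([rng.choice(points) for _ in points])
        for _ in range(n_boot)
    )
    lo = estimates[int(alpha / 2 * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return fit_doubling_time(points), (lo, hi)


# Synthetic illustration: a steep pre-2008 segment (doubling every year)
# followed by a slower post-2008 segment (doubling every 4 years).
data = [(y, 2 ** (y - 2008)) for y in range(2004, 2008)]
data += [(y, 2 ** ((y - 2008) / 4)) for y in range(2008, 2022)]
estimate, (low, high) = bootstrap_doubling_ci(data, start_year=2008, n_boot=200)
print(round(estimate, 2), round(low, 2), round(high, 2))
```

On this exactly linear synthetic segment every resample recovers the same 4-year doubling time, so the interval collapses to a point; with real, scattered data the interval widens, and its width indicates how many significant figures the doubling time can support. Fitting all years together here gives a shorter doubling time, because the steep early segment pulls the slope up.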

Cullen

Is there a publicly accessible version of the dataset?

Marius Hobbhahn

Update: it's published now and you can find it here: https://chip-dataset.vercel.app/

lalaithion

How did you decide where the y-intercept for Huang’s law should be? It seems that even if you fix the slope to 25x per 5 years, the line could still be made to fit the data better by giving it a different y-intercept.

Marius Hobbhahn

The comparison lines (dotted) have completely arbitrary y-intercepts. You should only take the slope seriously.

Leon Lang

That might be worth mentioning, as I wondered about the same thing. (I didn't realize until now that all the slope curves start at the same point on the left-hand side of the figure.)
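For a reference line whose intercept is less arbitrary, one option (an assumption here, not what the authors did) is to fix the slope implied by a reference law and fit only the vertical offset by least squares in log space; the optimal offset is then simply the mean residual. The function names and the reference year 2006 (the first year in the dataset) are illustrative choices.

```python
import math


def fixed_slope_intercept(points, doubling_time, ref_year=2006):
    """Least-squares log10 offset for a trend line whose slope is fixed by
    `doubling_time`; only the intercept at `ref_year` is fitted."""
    slope = math.log10(2) / doubling_time  # orders of magnitude per year
    residuals = [math.log10(value) - slope * (year - ref_year) for year, value in points]
    # With the slope held fixed, the squared-error-minimizing offset is the mean residual.
    return sum(residuals) / len(residuals)


def trend_value(year, intercept, doubling_time, ref_year=2006):
    """Evaluate the fixed-slope reference line at `year`."""
    return 10 ** (intercept + math.log10(2) / doubling_time * (year - ref_year))


# Synthetic check: points lying exactly on a Huang's-law-like line
# (doubling every 1.08 years) that starts at 5.0 in 2006.
pts = [(y, 5.0 * 2 ** ((y - 2006) / 1.08)) for y in range(2006, 2022)]
b = fixed_slope_intercept(pts, doubling_time=1.08)
print(round(10 ** b, 6))  # 5.0
```

Because the slope is fixed, the fitted offset shifts the whole line up or down without changing its doubling time, which is why only the slope of the dotted comparison lines carries information.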
