From my notes. Your statement about RTX 4090 leading the pack in flops per dollar does not seem correct based on these sources, perhaps you have a better source for your numbers than I do.
I did not realize that H100 had >3.9 PFLOPS at 8-bit precision until you prompted me to look, so I appreciate that nudge. That does put the H100 above the TPU v5e in terms of FLOPS/$. Prior to that addition, you can see why I said TPU v5e was taking the lead. Note that the sticker price for TPU v5e is estimated, partly from a variety of sources, partly from my own estimate calculated from the lock-in hourly usage rates.
Note that FP8 and INT8 are both 8-bit computations and are in a certain sense comparable if not necessarily equivalent.
Could you lay that out for me, a little bit more politely? I’m curious.
Does Roodman’s model concern price-performance or raw performance improvement? I can’t find the reference and figured you might know. In either case, price-performance only depends on Moore’s law-like considerations in the numerator, while the denominator (price) is a a function of economics, which is going to change very rapidly as returns to capital spent on chips used for AI begins to grow.
As I remarked in other comments on this post, this is a plot of price-performance. The denominator is price, which can become cheap very fast. Potentially, as the demand for AI inference ramps up over the coming decade, the price of chips falls fast enough to drive this curve without chip speed growing nearly as fast. It is primarily an economic argument, not a purely technological argument.
For the purposes of forecasting, and understanding what the coming decade will look like, I think we care more about price-performance than raw chip speed. This is particularly true in a regime where both training and inference of large models benefit from massive parallelism. This means you can scale by buying new chips, and from a business or consumer perspective you benefit if those chips get cheaper and/or if they get faster at the same price.
Thanks, I’ll keep that in mind!
A couple of things:
The graph was showing up fine before, but seems to be missing now. Perhaps it will come back. The equation is simply an eyeballed curve fit to Kurzweil's own curve. I tried pretty hard to convey that the 1000x number is approximate: > Using the super-exponential extrapolation projects something closer to 1000x improvement in price-performance. Take these numbers as rough, since the extrapolations depend very much on the minutiae of how you do your curve fit. Regardless of the details, it is a difference of orders of magnitude.
The justification for putting the 1000x number in the post instead of precisely calculating a number from the curve fit is that the actual trend is pretty wobbly over the years, and my aim here is not to pretend at precision. If you just look at the plot, it looks like we should expect "about 3 orders of magnitude" which really is the limit of the precision level that I would be comfortable with stating. I would guess not lower than two orders of magnitude. Certainly not as low as one order of magnitude, as would be implied by the exponential extrapolation, and would require that we don't have any breakthroughs or new paradigms at all.
I suspect that if somebody had given me this advice when I was a student I would have disregarded it, but, well, this is why wisdom is notoriously impossible to communicate. Wisdom always either sounds glib, banal or irrelevant. Oh well:
Anxiety, aversion and stress diminish with exposure and repetition.
This is something that, the older I get, the more I wish I had had this tattooed onto my body as a teenager. This is true of not only doing the dishes and laundry, but also vigorous exercise, talking to strangers, changing baby diapers, public speaking in front of crowds, having difficult conversations, and tackling unfamiliar subject matters. All of these are things that always suck, for everyone, the first time, or the first several times. I used to distinctly hate doing all of these things, and to experience a strong aversion to doing them, and to avoid doing them until circumstances forced me. Now they are all things I don't mind doing at all.
There may be "tricks" for metabolizing the anxiety of something like public speaking, but you ultimately don't need tricks. You just need to keep doing the thing until you get used to it. One day you wake up and realize that it's no longer a big deal.
What you really wanted from this answer was something that you could do today to help with your anxiety. The answer, then, is that if you really believe the (true) claim that simply doing the reps will make the anxiety go away, then the meta-anxiety you're feeling now (which is in some sense anxiety about future anxiety) will go away.
The Party Problem is a classic example taught as an introductory case in decision theory classes, that was the main reason why I chose it.