Moore's Law, AI, and the pace of progress

[-]Tomás B.4y210

I showed this article to Peter Glaskowsky, who has had a long career in chip design and works on AI chips now and, in addition to liking this article, he mentioned he considers your estimates to be a lower bound.

[-]Jsevillamol4y140

Re: quantum computing.

I am bearish it will be a big deal in relation to AI.

This is because:

The exponential speedups are very hard to achieve in practice.
The quadratic speedups are mostly lost when you parallelize.
At the current rate of progress and barring a breakthrough it seems it will be a couple of decades until we have useful quantum computing.
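
A rough sketch of the second point, assuming the quadratic speedup in question is Grover-style unstructured search. Splitting a search space of N items over P machines takes roughly N/P classical steps per machine, but about sqrt(N/P) quantum steps per machine, so the advantage shrinks as P grows. The function names and example numbers are illustrative assumptions, and constant factors are dropped.

```python
import math

def classical_steps(n_items: int, machines: int) -> float:
    """Wall-clock steps for brute-force search split evenly over `machines`."""
    return n_items / machines

def grover_steps(n_items: int, machines: int) -> float:
    """Approximate wall-clock Grover iterations when each quantum machine
    searches its own slice of n_items / machines entries (constants dropped)."""
    return math.sqrt(n_items / machines)

def quantum_advantage(n_items: int, machines: int) -> float:
    """Ratio of classical to quantum wall-clock steps at equal machine count."""
    return classical_steps(n_items, machines) / grover_steps(n_items, machines)

if __name__ == "__main__":
    n = 10**12  # illustrative search-space size (assumption)
    for p in (1, 10**2, 10**4, 10**6):
        print(f"machines={p:>8}: advantage ~ {quantum_advantage(n, p):,.0f}x")
```

Under this model every 100x increase in machine count cuts the quantum advantage by 10x, which is the sense in which the speedup is "mostly lost" at scale.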

[-]Charlie Steiner4y30

Huh. How much stock do you put in extrapolating trends in qubit count like that last link? I would assume that the tradeoff they see between chip size and quality is because of selection / publication bias, not manufacturing process per se. This means that once the un-selected manufacturing process can make error-corrected circuits, there's capacity for a steep rise in investment and circuit size.

[-]Jsevillamol4y40

I do believe that the tradeoff is real, and has a very clear physical reason - larger chips require more gates to perform a single quantum operation, so if the topology of the chip is roughly the same then I expect the fidelity needed to prevent errors to increase drastically with size.

In any case, note that in the extrapolation we basically assumed that the tradeoff didn't exist, so I expect the predictions to be pessimistic.

A larger question is whether these kinds of historical extrapolations work at all. In fact, a large part of why I wrote this paper is precisely because I want to test this.

On this I am cautiously optimistic for hard-to-verbalize reasons.

I think the most legible reason is that technological discontinuities are somewhat rare in practice.

The hard-to-verbalize reasons are... *points at the whole Eliezer vs OpenPhil and Christiano debate*.

[-]jacob_cannell4y20

Did you mean bearish?

[-]Jsevillamol4y20

I keep making this mistake *facepalm*

[-]Daniel Kokotajlo4yΩ590

Well done!

What do you think about energy costs? Last I thought about this it seemed plausible to me that in ten years or so the atoms making up supercomputers will be <10% of the cost of training giant models, most of the cost being paying for the electricity and upkeep and rent.

[-]Veedrac4yΩ350

Lifetime energy costs are already significant, but I don't think the problem will get that skewed this decade. IRDS' predicted transistor scaling until ~2028 should prevent power density increasing by too much.

Longer-term this does become a greater concern. I can't say I have particularly wise predictions here. There are ways to get more energy efficiency by spending more on lower-clocked hardware, or by using a larger memory:compute ratio, and there are also hardware architectures with plausible significant power advantages. There are even potential ways for energy to fall in price, like with solar PV or fusion, though I haven't a good idea how far PV prices could fall, and for fusion it seems like a roll of the dice what the price will be.

It's entirely possible energy does just become the dominant cost and none of those previous points matter, but it's also an input we know we can scale up pretty much arbitrarily if we're willing to spend the money. It's also something that only starts to become a fundamental economic roadblock after a lot more scaling. For instance, the 100,000 wafer scale processor example requires a lot of power, but only about as much as the largest PV installations that currently exist. You could then upgrade it to 2028 technology and stack memory on top of the wafers without changing power density by all that much.

This is likely a topic worth periodically revisiting as the issue gets closer.

[-]Daniel Kokotajlo4yΩ230

Ok, thanks! I defer to your judgment on this, you clearly know way more than me. Oh well, there goes one of my hopes for the price of compute reaching a floor.

[-][anonymous]4y10

There are ways to get more energy efficiency by spending more on lower-clocked hardware, or by using a larger memory:compute ratio, and there are also hardware architectures with plausible significant power advantages.

As far as I understand, we're only 3 orders of magnitude away from the Landauer limit, which doesn't leave a lot of room to squeeze efficiency out of. On the supply side, fusion doesn't seem like a relevant factor before 2050 unless an alternative approach takes us by surprise. Solar PV efficiency is already on the OOM of 1, so any advances have to come from reduction in production and maintenance costs (which is plausible for all I know).
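
For reference, a small sketch of what the Landauer limit works out to, assuming room temperature (300 K, my assumption). It only computes the bound k·T·ln 2 and a helper for expressing a gap in orders of magnitude. It does not check the "3 orders of magnitude" figure, which depends on which per-operation energy you compare against. The per-bit energy in the example is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant

def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to erase one bit at the given temperature: k * T * ln(2)."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)

def orders_of_magnitude_above_limit(energy_per_bit_joules: float,
                                    temperature_kelvin: float = 300.0) -> float:
    """How many powers of ten a given per-bit energy sits above the limit."""
    limit = landauer_limit_joules(temperature_kelvin)
    return math.log10(energy_per_bit_joules / limit)

if __name__ == "__main__":
    limit = landauer_limit_joules(300.0)
    print(f"Landauer limit at 300 K: {limit:.3e} J per bit erased")
    # Hypothetical per-bit energy, chosen only to illustrate a 1000x gap.
    gap = orders_of_magnitude_above_limit(1000 * limit)
    print(f"gap: {gap:.1f} orders of magnitude")
```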

[-]darius4y30

The Landauer limit constrains irreversible computing, not computing in general.

[-][anonymous]4y30

On the technology readiness level, I put reversible computing somewhere between von Neumann probes and warp drive. Definitely post-Singularity, likely impossible.

[-]Alexander Gietelink Oldenziel4y*10

Knowing little about irreversible computing, this nevertheless sounds surprising to me. Why exactly is irreversible computing so hard?

EDIT ofc I meant reversible not irreversible computing here!

[-]TanjB4y60

Irreversible is normal computing, the operation makes a state change which does not allow you to go backwards. Reversible computing is a lab curiosity at very small scale, using circuits which slide between states without dissipating energy and can slide the other way too. As Maxim says, it is far-out speculation whether we can really build computers that way.

[-]Alexander Gietelink Oldenziel4y30

Warp drive is more likely than not physically impossible, and even if possible would require insane energies, manipulating spacetime using exotic matter (which has never been produced) etc. It is a true magitech.

Von Neumann Probes seem easier; they're probably physically possible but the sheer engineering for it to work seems very very difficult. In fact there are no credible plans or ideas to even build one. Just having interstellar space travel is an immense task.

Doing things with circuits seems comparatively more feasible.

[-]darius4y10

Agreed. I had [this recent paper](https://ieeexplore.ieee.org/abstract/document/9325353) in mind when I raised the question.

[-]Veedrac4y30

I don't expect a sustained Moore's Law type improvement to efficiency here, just the possibility of a few technology jumps with modest but meaningful gains. A factor of 10 beyond CMOS would amount to an extension of a decade.

I probably have much shorter average fusion timelines than you, albeit also with high variance, and wouldn't be hugely surprised if fusion ramped up commercial operations through the 2030s, nor would I be shocked if it didn't. The new wave of fusion startups seem to have coherent justifications to me, as a layman.

[-][anonymous]4y*70

I would be shocked if fusion provides >10% of electricity to any major economy in the 2030s, like cold-fusion-is-possible-level shocked. On the one hand, the technologies new fusion start-ups are working with are obviously much more plausible than cold fusion, on the other hand there are a LOT of likely ways for fusion to fail besides just technical problems, so my intuition tells me it's a toss-up.

I don't know nearly as much about solar PV so my confidence intervals there are much wider. I agree that if there was sufficient economic incentive, we could scale to incredible amounts of compute right now, crypto mining shows an empirical lower bound to that ability.

[-]Veedrac4y10

I agree with all of this. I wasn't intending to imply fusion would lower global average prices in that timeframe. A massive supercomputer effort like I was describing could build its own plant locally if necessary.

[-]jacob_cannell4y80

My first and second impression on reading this is I want to bet against you, but I'm not even quite clear on what specific bet you are taking against Jensen/IRDS/myself when you say:

The natural implication is that device scaling has already stalled and will soon hit a wall, that scaling out much further is uneconomical, and in conclusion that AI progress cannot be driven much further through scaling, certainly not soon, and possibly not ever.
I disagree with this view. My argument is structured into a few key points.

Because you are hedging bets there, and then also here:

I want to emphasize here, these laws set a baseline expectation for future progress. A history of false alarms should give you some caution when you hear another alarm without qualitatively better justification. This does not mean Moore's Law will not end; it will. This does not even mean it won't end soon, or suddenly; it very well might.

So what I'm wondering is what is your more exact distribution over Moore's Law? To be specific, what is your distribution over the future graph of ops/$ or ops/J, such that it even disagrees with the mainstream (Jensen/IRDS/myself/etc)?

To hold myself to that same standard, I predict that for standard available GPUs/TPUs/etc (irreversible parallel von-neumann machines), about 65% chance we can squeeze about 10x more ops/J out by 2028 (Moravec's prediction of AGI), and only about a 10% chance we can squeeze out about 100x more ops/J.

Do you disagree? I believe ops/$ will be mostly dominated by ops/J.

The wildcard is neuromorphic computing, which can allow somewhat better-than brain (say 10x) or so noisy analog ops/J. But that's a separate discussion, and those chips won't run current DL well, they are mostly only good for more explicitly brain-like AGI.

[-]Veedrac4y50

To hold myself to that same standard, I predict that for standard available GPUs/TPUs/etc (irreversible parallel von-neumann machines), about 65% chance we can squeeze about 10x more ops/J out by 2028 (Moravec's prediction of AGI), and only about a 10% chance we can squeeze out about 100x more ops/J.

2028 is 6 years and change away. Even a straight-line extrapolation of transistor density wouldn't quite make a 10x improvement versus today's cutting edge, and that scales better than switches-per-joule. So if we're ignoring the device architecture, I think I'm more pessimal than you!
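
A quick sketch of that straight-line extrapolation, assuming density doubles every two years. The two-year period is my assumption for illustration, not a figure stated here.

```python
def density_multiplier(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor after `years` if density doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)

if __name__ == "__main__":
    for years in (6.0, 6.25, 6.5):
        print(f"{years} years -> {density_multiplier(years):.1f}x")
```

Under that assumption, six to six and a half years gives somewhere between 8x and about 9.5x, which is short of 10x.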

I don't address ops/J in the article, though I respond to the question here. It seems totally reasonable to me that compute is eventually limited by energy production. At the same time, we are not currently anywhere near the limits of how much power we could feasibly pump into (or extract from) any one given supercomputer, and at minimum we have some power scaling left to expect from the roadmap.

You're right to call out the hedging in my article, but it is legitimate uncertainty. I expect progress to about match IRDS until 2028, but predictions have been wrong, and I didn't want people to take Moore's Law's seeming historic inviolability as evidence that it actually is inviolable.

To try to clarify and enumerate the relevant stances from the article,

1.) The reports of Moore's Law's death have been greatly exaggerated, as it applies to current and historical trends.
2.) I expect business as usual until at least around when the IRDS stops doing so, aka. 2028, after which IRDS expects scaling to come from 3D stacking of transistors.
3.) AI performance will grow about proportionally to the product of transistor density and frequency, notwithstanding major computer architecture changes.
4.) Some memory technology will displace traditional DRAM, likely this decade, with much better scaling properties. Plausibly several will.
5.) You will see other forms of scaling, like 3D integration, continually make progress, though I'm not staking a claim on any given exponential rate.
6.) Scaling up will happen proportionally to the money spent on compute, in the sense that we will not reach the point where we are physics limited, rather than resource limited, in how big AI systems can be.
7.) I give some examples of feasible systems much larger and more capable than today's.

If any of these don't match what you got from the article, please point it out and I'll try to fix the discrepancy.

[-]jacob_cannell4y60

I don't address ops/J in the article, though I respond to the question here. It seems totally reasonable to me that compute is eventually limited by energy production.

Ok that might be a crux. I am claiming that new GPU designs are already energy limited; that is the main constraint GPU engineers care about.

I will update my off-the-cuff prediction with something more calibrated for posterity (that was an initial zero-effort guess), but I'm not ignoring device architecture. For the 2028 timeframe it's more like only ~2x op/J increase from semiconductor process (when measuring say transistor flips/J), and ~5x op/J from low-level architecture improvement in low-precision matrix multiply units (or say ~5x and ~20x for my lower prob estimate). I'm specifically talking about GPU/TPU style processors, not neuromorphic, as described earlier. (In part because I believe GPU/TPU will take us to AGI before neuromorphic matters.) Much more of the pre-neuromorphic gain will come from software.

I believe 1.) is actually easy to estimate from physics; I've read said physics/ECE papers outlining the exact end of Moore's Law, and I'm assuming Jensen et al. have as well (and have deeper inside knowledge). The main constraint is more transit energy than transistor flip energy.

2.) Doesn't actually extend Moore's Law (at least by the useful definitions I'm using)

3.) GPUs aren't limited by transistor count, they are limited by power - ie we are already well into 'Dark Silicon' era.

4.) This is already priced in, and doesn't help enough.

5.) Doesn't help logic enough because of power/heat issues, but it's already important and priced in for RAM (eg HBM).

6.) I mean that's an independent scaling axis - you can always spend more on compute, and we probably have more OOM of slack there? Orthogonal to the Moore's Law predictions

7.) I'll reply to those feasible system examples separately after looking more closely.

[-]Veedrac4y40

Ok that might be a crux. I am claiming that new GPU designs are already energy limited; that is the main constraint GPU engineers care about.

I agree this seems to be our main departure.

You seem to be conflating two limits, power limits, as in how much energy we can put into a system, and thermal limits, as in how much energy can we extract from that system to cool it down.

With regards to thermal limits, GPUs run fairly far into the diminishing returns of their power-performance curve, and pushing them further, even with liquid nitrogen, doesn't help by a disproportionate amount. NVIDIA is pushing significantly more power into their top end GPUs than they need to approximately hit their peak performance. Compare phone to laptop to desktop GPUs; efficiency/transistor improves drastically as power goes down. So it seems to me like GPUs are not yet thermally limited, in the sense that having more transistor density would still allow performance scaling even in lieu of those transistors becoming more efficient.

Arguably this could be a result of architectural trade-offs prioritizing mobile, but flagships sell cards, and so if NVIDIA was willing to give those cards so much power, they should optimize for them to consume that much power. I'd also expect that to pan out as a greater advantage for competitors that target servers specifically, which we don't see. Anyhow, this isn't a physical limit, as there exist much better ways to extract heat than we are currently using, if this was something that scaling needed doing.

You seem mostly concerned with the other point, which is the power limit, specifically the one derived from the price for that power. My understanding is that power is a significant fraction of server costs, but still significantly less than the amortized cost of the hardware.

[-]jacob_cannell4y30

You seem to be conflating two limits, power limits, as in how much energy we can put into a system, and thermal limits, as in how much energy can we extract from that system to cool it down.

I didn't use the word thermal, but of course they are trivially related as power in = heat out for irreversible computers, so power/thermal limit can be used interchangeably in that sense. GPUs (and any processor really) have a power/thermal design limit based on what's commercially feasible to support both in terms of the power supply and the required cooling.

So it seems to me like GPUs are not yet thermally limited, in the sense that having more transistor density would still allow performance scaling even in lieu of those transistors becoming more efficient.

This doesn't make sense to me - in what sense are they not thermally limited? Nvidia could not viably put out a consumer GPU that used 3 kilowatts for example. The RTX 3090 pushing power draw up to 350 watts was a big deal. Enterprise GPUs are even more power constrained, if anything (the flagship A100 uses 250 watts - although I believe it's using a slightly better TSMC node rather than Samsung) - and also enormously more expensive per flop.

A 2x density scaling without a 2x energy efficiency scaling just results in 2x higher dark silicon ratio - this is already the case and why nvidia's recent GPU dies are increasingly split into specialized components: FP/int, tensorcore, ray tracing, etc.

Compare phone to laptop to desktop GPUs; efficiency/transistor improves drastically as power goes down.

I'm not sure what you mean here - from what I recall the flip/J metrics of the low power/mobile process nodes are on the order of 25% gains or so, not 100%. Phones/laptops have smaller processor dies and more dark silicon, not dramatically more efficient transistors.

My understanding is that power is a significant fraction of server costs, but still significantly less than the amortized cost of the hardware.

That naturally depends on the age of the hardware - eventually it will become useless when its power + maintenance cost (which is also mostly power/thermal driven) exceeds value.

For example - for a 3090 right now the base mining value (and thus market rate) is about $8/day, for about $1/day of electricity (at $0.15 / kwhr) + $1/day for cooling (1:1 is a reasonable rule of thumb, but obviously depends on environment), so power/thermal is about 25% vs say 10% discount rate and 65% depreciation. Whereas it's more 50/50 for an older 1080 ti.
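
The arithmetic behind that split can be sketched as follows. It assumes the card draws the full 350 watts mentioned above around the clock, which is my simplification; the function names are hypothetical. At that draw the electricity comes to about $1.26/day, which the figures above round to $1.

```python
def daily_electricity_cost(watts: float, price_per_kwh: float) -> float:
    """Cost of running a device at `watts` for 24 hours."""
    kwh_per_day = watts * 24 / 1000
    return kwh_per_day * price_per_kwh

def power_thermal_share(revenue_per_day: float, electricity_per_day: float,
                        cooling_ratio: float = 1.0) -> float:
    """Fraction of daily revenue spent on electricity plus cooling.
    `cooling_ratio` is cooling cost per dollar of electricity (1.0 = the 1:1 rule)."""
    return electricity_per_day * (1 + cooling_ratio) / revenue_per_day

if __name__ == "__main__":
    # Assumes a constant 350 W draw at $0.15 per kWh.
    print(f"electricity: ${daily_electricity_cost(350, 0.15):.2f}/day")
    # Using the rounded $1/day electricity and $8/day revenue from the text.
    print(f"power + cooling share: {power_thermal_share(8.0, 1.0):.0%}")
```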

[-]Veedrac4y30

Power in = power out, but a power limit is quite different to a thermal limit. An embedded microcontroller running off a watch battery still obeys power in = power out, but is generally only limited by how much power you can put in, not its thermals.

This doesn't make sense to me - in what sense are they not thermally limited? Nvidia could not viably put out a consumer GPU that used 3 kilowatts for example.

This is the wrong angle to look at this question. Efficiency is a curve. At the point desktop GPUs sit at, large changes to power result in much smaller changes to performance. Doubling the power into a top end desktop GPU would not increase its performance by anywhere near double, and similarly halving the power only marginally reduces the performance.

It is true that devices are thermally limited in the sense that they could run faster if they had more power, but because of the steep efficiency curve, this is not at all the same as saying that they could not productively use more transistors, nor does it directly correspond to dark silicon in a meaningful way. The power level is a balance between this performance increase and the cost of the power draw (which includes things like the cost of the power supplies and heatsink). As the slope of power needed per unit extra performance effectively approaches infinity, you will always find that the optimal trade-off is below theoretical peak performance.

If you add more transistors without improving those transistors' power efficiency, and without improving power extraction, you can initially just run that greater number of transistors at a more optimal power ratio.
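
A toy model of that trade-off, offered only as a sketch. It assumes per-transistor power scales with the cube of clock frequency, the textbook simplification where dynamic power goes as frequency times voltage squared and voltage scales with frequency. Real GPUs have leakage, voltage floors, and memory bottlenecks that this ignores, and the cubic exponent is my assumption, not a measured figure.

```python
def relative_performance(power_ratio: float, exponent: float = 3.0) -> float:
    """Performance change for a fixed chip when its power budget is scaled by
    `power_ratio`, under the toy model power ~ frequency ** exponent."""
    return power_ratio ** (1.0 / exponent)

def throughput_with_more_transistors(transistor_ratio: float,
                                     exponent: float = 3.0) -> float:
    """Throughput change at a fixed total power budget when the transistor count
    is scaled by `transistor_ratio` and every transistor is clocked down equally."""
    power_per_transistor = 1.0 / transistor_ratio
    frequency = power_per_transistor ** (1.0 / exponent)
    return transistor_ratio * frequency

if __name__ == "__main__":
    print(f"2x power, same chip:      {relative_performance(2.0):.2f}x performance")
    print(f"0.5x power, same chip:    {relative_performance(0.5):.2f}x performance")
    print(f"2x transistors, same power: {throughput_with_more_transistors(2.0):.2f}x throughput")
```

In this model doubling power buys far less than double the performance, while doubling the transistor count at the same total power still raises throughput substantially, even though no individual transistor got more efficient.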

A 2x density scaling without a 2x energy efficiency scaling just results in 2x higher dark silicon ratio - this is already the case and why nvidia's recent GPU dies are increasingly split into specialized components: FP/int, tensorcore, ray tracing, etc.

This is not true. GPUs can run shader cores and RT cores at the same time, for example. The reason for dedicated hardware for AI and ray tracing is that dedicated hardware is significantly more efficient (both per transistor and per watt) at doing those tasks.

I'm not sure what you mean here - from what I recall the flip/J metrics of the low power/mobile process nodes are on the order of 25% gains or so, not 100%. Phones/laptops have smaller processor dies and more dark silicon, not dramatically more efficient transistors.

The point isn't the logic cell, those tend to be marginal improvements as you say. The point is that those products are operating at a much more efficient point on the power-performance curve. Laptop NVIDIA GPUs are identical dies to their desktop dies (though not always to the same model number; a 3080 Mobile is a desktop 3070 Ti, not a desktop 3080). Phone GPUs are much more efficient again than laptop GPUs.

It is true that a phone SoC has more dark silicon than a dedicated GPU, but this is just because phone SoCs do a lot of disparate tasks, which are individually optimized for. Their GPUs are not particularly more dark than other GPUs, and GPUs in general are not particularly more dark than necessary for their construction.

It should also be noted that dark silicon is not the same as wasted silicon.

$1/day of electricity (at $0.15 / kwhr) + $1/day for cooling (1:1 is a reasonable rule of thumb, but obviously depends on environment)

Note that Google claims ~10:1.

I'm not convinced mining is a good proxy here, their market is weird, but it sounds like you agree that power is a significant but lesser cost.

[-]jacob_cannell4y30

This is the wrong angle to look at this question. Efficiency is a curve. At the point desktop GPUs sit at, large changes to power result in much smaller changes to performance. Doubling the power into a top end desktop GPU would not increase its performance by anywhere near double, and similarly halving the power only marginally reduces the performance.

Are you talking about clock rates? Those haven't changed for GPUs in a while, I'm assuming they will remain essentially fixed. Doubling the power into a desktop GPU at fixed clock rate (and ignoring dark silicon fraction) thus corresponds to doubling the transistor count (at the same transistor energy efficiency), which would double performance, power, and thermal draw all together.

This is not true. GPUs can run shader cores and RT cores at the same time, for example. The reason for dedicated hardware for AI and ray tracing is that dedicated hardware is significantly more efficient (both per transistor and per watt) at doing those tasks.

Jensen explicitly mentioned dark silicon as motivator in some presentation about the new separate FP/int paths in ampere, and I'm assuming the same probably applies at some level internally for the many paths inside tensorcores and RT cores. I am less certain about perf/power for simultaneously maxing tensorcores+RTcores+alucores+mempaths, but I'm guessing it would thermal limit and underclock to some degree.

The point is that those products are operating at a much more efficient point on the power-performance curve. Laptop NVIDIA GPUs are identical dies to their desktop dies (though not always to the same model number; a 3080 Mobile is a desktop 3070 Ti, not a desktop 3080).

Primarily through lowered clock rates or dark silicon. I ignored clock rates because they seem irrelevant for the future of Moore's law.

Note that Google claims ~10:1.

Google has unusually efficient data-centers, but I'd also bet that efficiency measure isn't for a pure GPU datacenter, which would have dramatically higher energy density and thus cooling challenges than their typical light-CPU, heavy-storage, search-optimized servers.

[-]Veedrac4y10

Clock rate is relevant. Or rather, the underlying aspects that in part determine clock rate are relevant. It is true that doubling transistor density while holding all else equal would require much more thermal output, but it's not the only option, were thermal constraints the dominant factor.

I agree there is only so much room to be gained here, which would quickly vanish in the face of exponential trends, but this part of our debate came up in the context of whether current GPUs are already past this point. I claim they aren't, and that being so far past the point of maximal energy efficiency is evidence of it.

Jensen explicitly mentioned dark silicon as motivator in some presentation about the new separate FP/int paths in ampere

This doesn't make sense technically; if anything Ampere moves in the opposite direction, by making both datapaths be able to do FP simultaneously (though this is ultimately a mild effect that isn't really relevant). To quote the GA102 whitepaper,

Most graphics workloads are composed of 32-bit floating point (FP32) operations. The Streaming Multiprocessor (SM) in the Ampere GA10x GPU Architecture has been designed to support double-speed processing for FP32 operations. In the Turing generation, each of the four SM processing blocks (also called partitions) had two primary datapaths, but only one of the two could process FP32 operations. The other datapath was limited to integer operations. GA10x includes FP32 processing on both datapaths, doubling the peak processing rate for FP32 operations. As a result, GeForce RTX 3090 delivers over 35 FP32 TFLOPS, an improvement of over 2x compared to Turing GPUs.

I briefly looked for the source for your comment and didn't find it.

Google has unusually efficient data-centers

We are interested in the compute frontier, so this is still relevant. I don't share the intuition that higher energy density would make cooling massively less efficient.

[-]jacob_cannell4y20

I was aware the 3090 had 2x FP32, but I thought that dual FP thing was specific to the GA102. Actually the GA102 just has 2x the ALU cores per SM vs the GA100.

We are interested in the compute frontier, so this is still relevant. I don't share the intuition that higher energy density would make cooling massively less efficient.

There are efficiency transitions from passive to active, air to liquid, etc, that all depend on energy density.

[-]Charlie Steiner4y70

This topic is interesting because perception of Moore's law varies a lot depending on whether you're looking at production or R&D. In terms of R&D, manufacturers are already testing the limits of silicon (and so of course Moore's law is already a dead law walking) - the 2028 timeline is just how long people expect it to take to roll out what we already have.

And by "roll out," of course I mean "switch over to the entirely new designs and manufacturing techniques required to do anything useful with higher-precision lithography," because everything is way more complicated than it was 20 years ago.

[-]Joe Rocca4y60

Neurons fire at around 200 Hz on average.

The average cortical neuron firing rate is much lower than this.[0] You might have meant maximum rather than average - or am I misunderstanding?

[0] https://aiimpacts.org/rate-of-neuron-firing/#:~:text=Based%20on%20the%20energy%20budget,around%200.16%20times%20per%20second.

[-]Veedrac4y30

I was indeed misremembering some statistic for which 200 Hz is correct, likely average peak rate or somesuch, not sure exactly. Thanks for catching this.

[-]evhub4yΩ350

(Moderation note: added to the Alignment Forum from LessWrong.)

[-]leogao4y50

However, consider model parallelism, splitting different layers of the graph across the nodes.

Nitpick: this would be more accurately described as pipeline parallelism; the term model parallelism is ambiguous since sometimes it's used as an umbrella term for everything other than data parallelism, but it's typically taken to mean splitting each layer across multiple nodes.
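
To make the distinction concrete, here is a minimal pure-Python sketch. It runs everything sequentially and only shows how the work is partitioned, not real concurrency or communication. All function names are hypothetical, and the "layers" are bare weight matrices with no nonlinearity.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def apply_layer(weights: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def forward_single_node(layers: List[Matrix], x: Vector) -> Vector:
    """Reference forward pass with every layer on one node."""
    for weights in layers:
        x = apply_layer(weights, x)
    return x

def forward_pipeline(layers: List[Matrix], x: Vector, nodes: int) -> Vector:
    """Pipeline parallelism: each node owns a contiguous block of whole layers
    and hands its activations to the next node."""
    per_node = -(-len(layers) // nodes)  # ceiling division
    for node in range(nodes):
        stage = layers[node * per_node:(node + 1) * per_node]
        x = forward_single_node(stage, x)  # would run on `node`
    return x

def forward_layer_split(layers: List[Matrix], x: Vector, nodes: int) -> Vector:
    """Splitting each layer across nodes: every node computes a slice of the
    output rows of every layer, and the slices are gathered after each layer."""
    for weights in layers:
        per_node = -(-len(weights) // nodes)
        gathered: Vector = []
        for node in range(nodes):
            shard = weights[node * per_node:(node + 1) * per_node]
            gathered.extend(apply_layer(shard, x))  # would run on `node`
        x = gathered
    return x

if __name__ == "__main__":
    layers = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, 2]], [[1, 1], [1, -1]]]
    x = [1.0, 1.0]
    print(forward_single_node(layers, x))
    print(forward_pipeline(layers, x, nodes=2))
    print(forward_layer_split(layers, x, nodes=2))
```

All three functions compute the same result. The difference is what crosses node boundaries: the pipeline version passes activations once per stage, while the layer-split version needs a gather after every layer.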

[-]Veedrac4y10

Fair clarification, I've edited the post.

[-]TanjB4y40

Nice article. Good that you spotted the DRAM problem, many people don't realize DRAM hit a scaling wall nearly 10 years ago. It has to do with the amount of charge needed to provide a sensible change at the end of the wires. As wires scale smaller their RC constant gets worse and competes with other factors that might improve, driving the capacitors to stay in the same range of total charge. Meanwhile the diameter of the capacitors is tough to change, with the minimum diameter set by material constants of dielectric and voltage breakdown. We found the best a while ago. The only way to pack the capacitors closer is to reduce the difference between the widest part of the cylinder and the minimum - which requires perfecting the aspect ratio and minimizing fluctuations. Slow, slow progress and when you reach perfection there remains that minimum diameter, rather like hitting the speed limit on transistors.

If you estimate the cost of a Graviton 2 core it comes out to about $5, but the 4GB of memory assigned to it cost about $12. You can do similar calculations for Apple M1 series. DRAM is already the cost limit, because it has for so long been the scaling laggard.

We will need new types of memory far more urgently than worrying about logic scaling.

[-]avturchin4y30

Several more considerations which favour progress in computing:

Even if Moore's law stops, we could still produce more chips on existing fabs, and they will be much cheaper, as there will be no intellectual property expenses and no amortisation of the costs of developing better fabs and chips.

As the total size of the world economy grows, the relative price of a large computer shrinks.

Cloud computing allows researchers not to pay in advance for hardware which may or may not be effectively used.

Data centres are expensive, so smaller physical size of a computer is important as well as smaller energy consumption, even if it has the same price for flops. But larger data centres are more cost efficient.

[-]lennart4y10

Great post! I especially liked that you outlined potential emerging technologies and the economic considerations.

Having looked a bit into this when writing my TAI and Compute sequence, I agree with your main takeaways. In particular, I'd like to see more work on DRAM and the interconnect trends and potential emerging paradigms.

I'd be interested in your compute forecasts to inform TAI timelines. For example Cotra's draft report assumes a doubling time of 2.5 years for the FLOPs/$ but acknowledges that this forecast could be easily improved by someone with more domain knowledge -- that could be you.
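
To show how much that one parameter matters, here is a small sketch of how the assumed doubling time changes a ten-year FLOPs/$ projection. Only the 2.5-year figure comes from the report as described above. The other doubling times and the ten-year horizon are illustrative assumptions.

```python
import math

def flops_per_dollar_multiplier(years: float, doubling_time_years: float = 2.5) -> float:
    """Growth in FLOPs/$ after `years` at a constant doubling time."""
    return 2.0 ** (years / doubling_time_years)

def years_to_reach(multiplier: float, doubling_time_years: float = 2.5) -> float:
    """Years needed for FLOPs/$ to grow by `multiplier` at a constant doubling time."""
    return doubling_time_years * math.log2(multiplier)

if __name__ == "__main__":
    for doubling in (2.0, 2.5, 3.0):  # 2.0 and 3.0 are illustrative alternatives
        growth = flops_per_dollar_multiplier(10.0, doubling)
        print(f"doubling every {doubling} years -> {growth:.1f}x in 10 years")
```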

[-]Veedrac4y*20

I'd be interested in your compute forecasts to inform TAI timelines.

I'm not all that sure this is going to give you anything more useful than what you have already. Around the end of this decade my compute predictions detach from any strict timeline, and the pace of progress within this decade is smaller than the potential range of money spent on the problem, so even if you draw a sure line around where TAI happens with today's AI techniques, you don't gain all that much from better guesses about hardware progress.

Put another way, if you assume current connectionist architectures scaled up to ~brain parity buy you most of the tools you need to build TAI, then you don't need to worry about longer term hardware progress. If you don't assume that, then you don't have a meaningful anchor to use these longer term predictions with anyway. If I had strong timelines for physical technology progress you could at least say, architectures like P will be tried around the 20X0s, and architectures like Q will be tried around the 20Y0s, but I don't have strong timelines for progress that goes that far out.

I do think understanding longer term tech progress is relevant, because I think that current AI systems do seem to keep buying relevant cognitive abilities as they scale, and having a long roadmap implies that we'll keep doing that until the trick stops working or we hit AGI. But I don't know how to put a date on that, at least one that's more informative than ‘it's technically plausible, and could come moderately soon if things go fast’.