I think this might be an overstatement. It's true that NSF tends not to fund developers, but in ML the NSF is only one of many funders (lots of faculty have grants from industry partnerships, for instance).
Thanks for writing this!
Regarding how surprise on current forecasts should factor into AI timelines, I have two takes:
* Given that all the forecasts seem to be wrong in the "things happened faster than we expected" direction, we should probably expect HLAI to happen faster than expected as well.
* It also seems like we should retreat more to outside views about general rates of technological progress, rather than forming a specific inside view (since the inside view seems to mostly end up being wrong).
I think a pure outside view would, in my opinion, give a median of something like 35 years (based on my very sketchy attempt at forming a dataset of when technical grand challenges were solved), and then ML progress seems to be happening quite quickly, so you should probably adjust down from that. (A toy version of the outside-view calculation is sketched below.)
Actually, I'm pretty interested in how you get to medians of 40 years; that seems longer than I'd predict without looking at any field-specific facts about ML, and the field-specific facts mostly push towards shorter timelines.
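For concreteness, here's a toy version of that outside-view calculation in Python. The solve times below are purely hypothetical placeholders, not my actual dataset:

```python
# Toy outside-view estimate: median time-to-solution for technical grand
# challenges. The values below are hypothetical placeholders, NOT the
# actual dataset referenced above.
from statistics import median

years_to_solve = [12, 18, 27, 33, 35, 41, 55, 70]  # placeholder values

print(f"outside-view median: {median(years_to_solve):.0f} years")
# Field-specific evidence (e.g., rapid recent ML progress) would then
# argue for adjusting this median downward.
```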
Thanks! I just read it over and, assuming I understood correctly, this bottleneck primarily happens for "small" operations like layer normalization and softmax, not for large matrix multiplies. In addition, these small operations are still a minority of the runtime (40% in their case). So I think this is still consistent with my analysis, which assumes various things will creep in to keep GPU utilization around 40%, but that they won't ever drive it down to (say) 10%. Is this correct, or have I misunderstood the nature of the bottleneck?
Edit: also, maybe we're just miscommunicating. I definitely don't think CPU->HBM is a bottleneck; it's instead the time to load from HBM, which sounds the same as what you said. Unless I misread the A100 specs, that comes out to 1.5 TB/s, which is the number I use throughout.
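To make the bandwidth arithmetic concrete, here's a rough roofline-style sketch in Python. The A100 numbers (~1.5 TB/s HBM bandwidth, ~312 TFLOP/s FP16/BF16 tensor-core peak) are spec-sheet values, and the tensor shapes are hypothetical GPT-3-scale examples, so treat this as an illustration rather than a measurement:

```python
# Rough roofline-style estimate: an op is memory-bound when the time to
# move its bytes through HBM exceeds the time to do its FLOPs.
# Approximate A100 spec-sheet numbers:
HBM_BW = 1.5e12      # bytes/second of HBM bandwidth
PEAK_FLOPS = 312e12  # FLOP/second, FP16/BF16 tensor cores

def op_times(flops: float, bytes_moved: float) -> tuple[float, float]:
    """Return (compute_time, memory_time) in seconds."""
    return flops / PEAK_FLOPS, bytes_moved / HBM_BW

# Layer norm over a hypothetical (tokens, hidden) = (16384, 12288) fp16
# activation: read + write every element, a handful of FLOPs per element.
n, d = 16384, 12288
t_c, t_m = op_times(flops=8 * n * d, bytes_moved=2 * 2 * n * d)
print(f"layer norm: compute {t_c*1e6:6.1f} us, memory {t_m*1e6:6.1f} us")
# -> memory time dominates by ~100x: bandwidth-bound.

# Large matmul (n, d) @ (d, d) in fp16: read A and B, write C.
t_c, t_m = op_times(flops=2 * n * d * d,
                    bytes_moved=2 * (n * d + d * d + n * d))
print(f"matmul:     compute {t_c*1e3:6.2f} ms, memory {t_m*1e6:6.1f} us")
# -> compute time dominates by ~20x: the big multiplies stay compute-bound.
```

The takeaway is just that the small ops are bandwidth-bound while the big matmuls aren't, which is why small ops can dent utilization without driving it down to 10%.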
Short answer: If future AI systems are doing R&D, it matters how quickly the R&D is happening.
Okay, thanks! The posts actually are written in markdown, at least on the backend, in case that helps you.
Question for mods (sorry if I asked this before): Is there a way to make the LaTeX render?
In theory MathJax should be enough; e.g., that's all I use on the original post: https://bounded-regret.ghost.io/how-fast-can-we-perform-a-forward-pass/
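For reference, here's the kind of markup MathJax handles out of the box; one easy thing to trip on is that its default config renders display math but not single-dollar inline math:

```latex
% Display math: $$...$$ and \[...\] work with MathJax's defaults.
$$ P(\text{HLAI by year } t) $$

% Inline math: the default inline delimiters are \( ... \);
% single-dollar $...$ is disabled by default and must be turned on
% in the MathJax configuration.
A forward pass costs \( O(nd^2) \) FLOPs per layer.
```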
I was surprised by this claim. To be concrete, what's your probability of xrisk conditional on 10-year timelines? Mine is something like 25%, I think, and higher than my unconditional probability of xrisk.
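For completeness, the relationship between the conditional and unconditional numbers is just the law of total probability:

$$ P(\text{xrisk}) = P(\text{xrisk} \mid \text{10y})\,P(\text{10y}) + P(\text{xrisk} \mid \neg\text{10y})\,P(\neg\text{10y}) $$

so the unconditional probability is a weighted average of the two conditionals, and \( P(\text{xrisk} \mid \text{10y}) > P(\text{xrisk}) \) exactly when short timelines carry more risk than long ones.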
Fortunately (?), I think the jury is still out on whether phase transitions happen in practice for large-scale systems. It could be that once a system is complex and large enough, it's hard for a single factor to dominate and you get smoother changes. But I think it could go either way.
Thanks! I pretty much agree with everything you said. This is also largely why I am excited about the work, and I think what you wrote captures it more crisply than I could have.
Here is my take: since there's so much AI content, it's not really feasible to read all of it, so in practice I read almost none of it (and consequently visit LW less frequently).
The main issue I run into is that for most posts, on a brief skim it seems to be about something I've already thought about. Unlike academic papers, most LW posts neither cite related prior work nor explain how what they're saying relates to it. As a result, if I start to skim a post and think it's covering something I've seen before, I have no easy way of telling whether the author is (1) aware of this and has something new to say, (2) aware of this but trying to provide a better exposition, or (3) unaware of this and reinventing the wheel. Since I can't tell, I usually just bounce off.
I think a solution could be to have a stronger norm that posts about AI should say, and cite, what they are building on and how it relates / what is new. This would decrease the amount of content while improving its quality, and also make it easier to choose what to read. I view this as a win-win-win.