What is Compute? - Transformative AI and Compute [1/4]

FLOPs

Tiny remark regarding your post about the nomenclature of FLOP. Would it make sense for this series of intro posts to edit the occurrences of FLOPs to FLOP etc. to be consistent with your newly proposed nomenclature? I am currently upskilling in compute governance and as a newcomer, I was confused at first. I understand that it does not make sense to edit every post or article, but I just thought that it might be useful for those "intro" posts where a lot of basics are explained. Or maybe put in a link to the new nomenclature when you explain it for the first time? :)

[-]teradimich4y20

AlexNet was the first publication that leveraged graphics processing units (GPUs) for the training run

Do you mean the first of the data points on the chart? GPUs were used for deep learning long before AlexNet. References: [1], [2], [3], [4], [5].

[-]lennart4y10

Thanks for the correction and references. I just followed my "common sense" from lectures and other pieces.

What do you think made AlexNet stand out? Is it the depth and use of GPUs?

[-]teradimich4y10

I do not know what experts think about this question, and I lack the competence to draw such conclusions, sorry.

[-]Gunnar_Zarncke4y20

I was slightly disappointed by this post, not because it was bad but because it didn't provide much that was new or interesting. I see this more as a recap and hope that the next posts in this sequence will build on it.

[-]lennart4y30

Thanks for the feedback, Gunnar. You're right - it's more of a recap and introduction. I think the "newest" insight is probably the updates in Section 2.3.

I would also be curious to know which aspects and questions you're most interested in.

[-]Gunnar_Zarncke4y30

The update in 2.3 was a valuable update. Based on the title (and my interests) I was hoping for:

- some integration of the limits for compute, memory, and interconnect. As you say, they limit each other, but it is not very clear how the limits interrelate and scale with each other. Empirically, it would be interesting to see the relative sizes of these parts over time.
- some comparison with the relative sizes of the parts of the human brain responsible for processing, in areas where we do have algorithms comparable to what the brain does, e.g. image processing and object and scene detection in the visual cortex.

[-]lennart4y30

Thanks!

I'm working with a colleague on the trends over time of the three components of compute systems (compute, memory, and interconnect), and then comparing them to our best estimates for the human brain (or other biological anchors). This will still take some time, but I hope we will be able to share it in the future (≈ by the end of the year).

[-]Gunnar_Zarncke4y20

Cool. Looking forward to it.

One could argue the universe is a computer as well: pancomputationalism. ↩︎
You can read some thoughts on quantum computing in the series “Forecasting Quantum Computing” by Jaime Sevilla. ↩︎
Compute produces the data as an interactive environment for reinforcement learning. Therefore, more compute leads to more available training data. ↩︎
A petaflop/s-day is $10^{15}$ floating point operations per second, sustained for one day. A day has $86{,}400$ seconds $\approx 10^{5}$ seconds. Therefore, one petaflop/s-day is approximately $10^{20}$ floating point operations. ↩︎
Nonetheless, according to estimates, most compute overall is probably used for deployed AI systems, i.e. inference. While, as outlined, the training process is computationally more complex, the repetitive nature of inference once a system is deployed leads to more compute being used overall. In the future those resources could be repurposed for training (if we do not see different hardware for training and inference, discussed in Section 4.2) (compute for training >> compute for inference but number of inferences >> number of training runs) (Amodei and Hernandez 2018). ↩︎
The final training run refers to the last training of an AI system before stopping updating the learned weights and biases and deploying the network for inference. There are usually dozens to hundreds of training runs of AI systems to tweak the architecture and hyper-parameters optimally. While this metric is relevant for the development costs, it is not an optimal proxy for the systems’ capabilities. ↩︎
“We think it’d be a mistake to be confident this trend won’t continue in the short term.” (Amodei and Hernandez 2018). ↩︎
The data used in this section comes from a project by Jaime Sevilla, Pablo Villalobos, Matthew Burtell and Juan Felipe Cerón. We collaborated to add more compute estimates to the public database. I can recommend their first analysis: “Parameter counts in Machine Learning”. ↩︎
Transformative AI, as defined by Open Philanthropy in this blogpost: “Roughly and conceptually, transformative AI is AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.” ↩︎
For more thoughts and a discussion on this, I can recommend “The Scaling Hypothesis” by Gwern (or the summary in the AI Alignment Newsletter #156). ↩︎
I would also describe the purple part as an open research question. How can we decompose this — differentiating between parallelization, an engineering effort, and spending, where it is easier to find upper limits? ↩︎
I would be interested in an update on this. However, I also did not spend time looking for an update on this in the recent AI experts surveys. ↩︎
I initially made the claim that there are reasons to believe that the available memory capacity of compute systems might match the human brain or at least be sufficient (at least the information we can consciously recall and access). However, while thinking more about this claim, I became uncertain. I started wondering if the brain also has something similar to a memory hierarchy as it is the default for compute systems (different levels of memory capacities which can be accessed at different speeds). I would be interested in research on this. ↩︎
In general, computational power is key to our modern society, and might also be the foundation of life in the future: digital minds. The future of humanity could be computed on digital computers — see “Digital People Would Be An Even Bigger Deal” by Holden Karnofsky or “Sharing the World with Digital Minds” by Bostrom. ↩︎
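
The petaflop/s-day arithmetic in the footnote above can be checked with a few lines of Python. This is only a minimal sketch: the function name `petaflop_s_days_to_flop` and the constants are hypothetical, chosen for illustration, and are not taken from the post.

```python
# Sketch: convert petaflop/s-days into a total number of floating point operations.
# All names here are illustrative assumptions, not from the original post.

PETA = 10**15                    # one petaflop/s = 10^15 operations per second
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day


def petaflop_s_days_to_flop(pfs_days: int) -> int:
    """Total floating point operations for a given number of petaflop/s-days."""
    return pfs_days * PETA * SECONDS_PER_DAY


exact = petaflop_s_days_to_flop(1)   # 8.64 * 10^19 operations
approx = PETA * 10**5                # the footnote's rounding: 10^20 operations

print(f"exact:  {exact:.3e}")
print(f"approx: {approx:.3e}")
```

The exact value is about 14% below the rounded figure of $10^{20}$, which is why the footnote's approximation is fine for order-of-magnitude comparisons.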

Epistemic Status

1. Compute

1.1 Logic, Memory and Interconnect

1.2 Chips or Integrated Circuits

2. Compute in AI Systems

2.1 Computing in AI Systems

Training

Inference

2.2 Compute Trends: 2012 to 2018

2.3 Compute Trends: An Update^[8]

3. Compute and AI Alignment

3.1 The Bitter Lesson

3.2 Scaling Hypothesis

3.3 AI and Efficiency

3.4 Qualitative Assessment

3.5 Compute Milestones

3.6 Conclusion

Next Post: Forecasting Compute

Acknowledgments

References
