CS student at the University of Southern California. Previously worked for three years as a data scientist at a fintech startup, and before that spent four months on a work trial at AI Impacts. Currently working with Professor Lionel Levine on language model safety research, and looking for research opportunities in aligning large language models.

Two-year update on my personal AI timelines

2. Are you still estimating that algorithmic efficiency doubles every 2.5 years (for now at least, until R&D acceleration kicks in)? I've heard from others (e.g. Jaime Sevilla) that more recent data suggests it's currently doubling every year.

It seems like the only source on this is Hernandez & Brown 2020. Their main finding is a doubling time of 16 months for AlexNet-level performance on ImageNet: "the number of floating point operations required to train a classifier to AlexNet-level performance on ImageNet has decreased by a factor of 44x between 2012 and 2019. This corresponds to algorithmic efficiency doubling every 16 months over a period of 7 years."
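Those two numbers are easy to sanity-check against each other: a 44x gain over the 7 years from 2012 to 2019 does imply a doubling time of roughly 16 months. A quick back-of-the-envelope calculation:

```python
import math

# Hernandez & Brown 2020 report a 44x training-efficiency gain
# for AlexNet-level ImageNet performance between 2012 and 2019.
efficiency_gain = 44
months = 7 * 12  # 2012-2019

# Number of doublings implied by the total gain: log2(44) ~ 5.46.
doublings = math.log2(efficiency_gain)

# Months per doubling: ~15.4, consistent with the paper's "every 16 months".
doubling_time_months = months / doublings
print(f"{doubling_time_months:.1f} months per doubling")
```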

They also find faster doubling times for some Transformer and RL systems, as shown here:

This is notably faster algorithmic progress than the 2.5 year doubling time used in Ajeya's report, though I do somewhat agree with her justification for a more conservative estimate:

Additionally, it seems plausible to me that both sets of results would overestimate the pace of algorithmic progress on a transformative task, because they are both focusing on relatively narrow problems with simple, well-defined benchmarks that large groups of researchers could directly optimize. Because no one has trained a transformative model yet, to the extent that the computation required to train one is falling over time, it would have to happen via proxies rather than researchers directly optimizing that metric (e.g. perhaps architectural innovations that improve training efficiency for image classifiers or language models would translate to a transformative model). Additionally, it may be that halving the amount of computation required to train a transformative model would require making progress on multiple partially-independent sub-problems (e.g. vision and language and motor control).

I have attempted to take the Hernandez and Brown 2020 halving times (and Paul’s summary of the Grace 2013 halving times) as anchoring points and shade them upward to account for the considerations raised above. There is massive room for judgment in whether and how much to shade upward; I expect many readers will want to change my assumptions here, and some will believe it is more reasonable to shade downward.

Curious to read any other papers on this topic. More research benchmarking algorithmic gains seems tractable, and if anybody has a well-scoped question I might also be interested in doing that research myself. 

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

“We'll still probably put it in a box, for the same reason that keeping password hashes secure is a good idea. We might as well. But that's not really where the bulk of the security comes from.”

This seems true in worlds where we can solve AI safety to the level of rigor demanded by security mindset. But lots of things in the world aren’t secure by security-mindset standards. The internet and modern operating systems are both full of holes. Yet we benefit greatly from common-sense, fallible safety measures in those systems.

I think it’s worth working on versions of AI safety that are analogous to boxing and password hashing, meaning they make safety more likely without guaranteeing or proving it. We should also work on approaches like yours that could make systems more reliably safe, but might not be ready in time for AGI. Would you agree with that prioritization, or should we only work on approaches that might provide safety guarantees?

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

Love the Box Contest idea. AI companies are already boxing models that could be dangerous, but they've done a terrible job of releasing their boxing setups and information about them. Some papers that used and discussed boxing:

  • Section 2.3 of OpenAI's Codex paper. This model was allowed to execute code locally. 
  • Section 2 and Appendix A of OpenAI's WebGPT paper. This model was given access to the Internet. 
  • Appendix A of DeepMind's GopherCite paper. This model had access to the Internet, and the authors do not even mention the potential security risks of granting such access. 
  • DeepMind again giving access to the Google API without discussing any potential risks. 

The common defense is that current models are not capable enough to write good malware or interact with search APIs in unintended ways. That might well be true, but someday it won't be, and there's no excuse for setting a dangerous precedent. Future work will need to set boxing norms and build good boxing software. I'd be very interested to see follow-up work on this topic or to discuss with anyone who's working on it. 

What do ML researchers think about AI in 2022?

Turns out that this dataset contains little to no correlation between a researcher's years of experience in the field and their HLMI timelines. Here's the trendline, showing a small positive correlation where older researchers have longer timelines, the opposite of what you'd expect if everyone predicted AGI to arrive just as they retire. 
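For anyone wanting to replicate this, the check is just a Pearson correlation between the two columns. A minimal sketch with numpy; the arrays below are made-up illustrative values, not the actual survey data:

```python
import numpy as np

# Illustrative stand-in data only: the real analysis would load the
# survey's experience and HLMI-timeline columns instead.
years_experience = np.array([3, 5, 8, 12, 15, 20, 25, 30])
hlmi_timeline_years = np.array([25, 30, 28, 35, 33, 40, 38, 45])

# Pearson correlation; a positive value means older researchers
# report longer timelines.
r = np.corrcoef(years_experience, hlmi_timeline_years)[0, 1]
print(f"correlation: {r:.2f}")
```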

My read of this survey is that most ML researchers haven't updated significantly on the last five years of progress. I don't think they're particularly informed about forecasting, and I'd be more inclined to trust inside-view arguments, but it's still relevant information. It's also worth noting that the median number of years until a 10% probability of HLMI is only 10, showing they believe HLMI is at least plausible on somewhat short timelines. 

What do ML researchers think about AI in 2022?

If you have the age of the participants, it would be interesting to test whether there is a strong correlation between expected retirement age and AI timelines. 

[RETRACTED] It's time for EA leadership to pull the short-timelines fire alarm.

This was heavily upvoted at the time of posting, including by me. It turns out to be mostly wrong. AI Impacts just released a survey of 4271 NeurIPS and ICML researchers conducted in 2021 and found that the median year for expected HLMI is 2059, down only two years from the 2061 estimate in the 2016 survey. Looks like the last five years of evidence hasn’t swayed the field much. My inside view says they’re wrong, but the opinions of the field and our inability to anticipate them are both important.

chinchilla's wild implications

It's worth noting that Ajeya's BioAnchors report estimates that TAI will require a median of 22T data points, nearly an order of magnitude more than the available text tokens as estimated here. See here for more. 

Two-year update on my personal AI timelines

My report estimates that the amount of training data required to train a model with N parameters scales as N^0.8, based significantly on results from Kaplan et al 2020. In 2022, the Chinchilla scaling result (Hoffmann et al 2022) showed that instead the amount of data should scale as N.

Are you concerned that pretrained language models might hit data constraints before TAI? Nostalgebraist estimates that there are roughly 3.2T tokens available publicly for language model pretraining. This estimate misses important potential data sources such as transcripts of audio and video, private text conversations, and email. But the BioAnchors report estimates that the median transformative model will require 22T data points, nearly an order of magnitude more than this estimate. 

The BioAnchors estimate was also based on older scaling laws that placed a lower priority on data relative to compute. With the new Chinchilla scaling laws, more data would be required for compute-optimal training. Of course, training runs don't need to be compute-optimal: you can get away with using more compute and less data if you're constrained by data, even if it costs more. And text isn't the only data a transformative model could use: audio, video, and RLHF on diverse tasks all seem like good candidates. 
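To make the gap concrete, here is a rough sketch using the Chinchilla rule of thumb that compute-optimal training wants roughly 20 tokens per parameter (Hoffmann et al. 2022). The example parameter counts are illustrative assumptions, not figures from the post or the BioAnchors report:

```python
AVAILABLE_TOKENS = 3.2e12   # nostalgebraist's public-text estimate
BIOANCHORS_TOKENS = 22e12   # BioAnchors median TAI data requirement

def chinchilla_tokens(params: float) -> float:
    """Compute-optimal tokens under the ~20 tokens/parameter heuristic."""
    return 20 * params

# Illustrative model sizes only.
for n_params in (70e9, 500e9, 1e12):
    needed = chinchilla_tokens(n_params)
    print(f"{n_params:.0e} params -> {needed:.1e} tokens "
          f"({needed / AVAILABLE_TOKENS:.1f}x the public-text estimate)")

# The BioAnchors requirement vs. available public text: ~6.9x,
# i.e. "nearly an order of magnitude".
print(f"gap: {BIOANCHORS_TOKENS / AVAILABLE_TOKENS:.1f}x")
```

On these assumptions, even a 500B-parameter compute-optimal run would already exceed the public-text estimate by about 3x.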

Does the limited available public text data affect your views of how likely GPT-N is to be transformative? Are there any considerations overlooked here, or questions that could use a more thorough analysis? Curious about anybody else's opinions. Thanks for sharing the update; I think it's quite persuasive. 

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Would you have any thoughts on the safety implications of reinforcement learning from human feedback (RLHF)? The HFDT failure mode discussed here seems very similar to what Paul and others have worked on at OpenAI, Anthropic, and elsewhere. Some have criticized this line of research as only teaching brittle, task-specific preferences in a manner that's open to deception, thereby advancing us toward more dangerous capabilities. If we achieve transformative AI within the next decade, it seems plausible that large language models and RLHF will play an important role in those systems, so why do safety-minded folks work on it?
