Epic. Do you have to apply through the normal route in order to get the GRFP, or can you go GRFP-only?
Really cool. I read some of these kinds of papers last week, but this is better context on the topic. Redundancy seems like evidence in favor of a narrow loss basin, but e.g. the fact that fine-tuned BERT models generalize very differently is evidence of multiple local minima. Your guess that linear mode connectivity works in simple image classification domains but not in language models seems like the most likely answer to me, but I would be interested to see it tested.
One technique that might help for fine-tuning the generator is Meta AI’s DIRECTOR. The technique uses a classifier to estimate the probability that a generated sequence will be unacceptable each time a new token is generated. Rather than generating full completions and sampling among them, this method guides the model towards acceptable completions during the generation process. The Blender Bot 3 paper finds that this method works better than the more standard approach of ranking full completions according to the classifier’s acceptability score.
 http... (read more)
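For concreteness, here's a minimal sketch of the general idea of classifier-guided decoding, assuming a per-token acceptability head. The function name, weighting term, and greedy selection are my own assumptions, not Meta AI's actual implementation:

```python
import torch
import torch.nn.functional as F

def guided_decode_step(lm_logits, classifier_logits, gamma=1.0):
    """One decoding step guided by a per-token acceptability classifier.

    lm_logits:         [vocab] language-model logits for the next token
    classifier_logits: [vocab] per-token logits that the continuation stays acceptable
    gamma:             weight on the classifier signal (hypothetical default)

    Minimal sketch of the general idea, not the DIRECTOR implementation.
    """
    lm_logprobs = F.log_softmax(lm_logits, dim=-1)
    # Log-probability that each candidate token keeps the sequence acceptable
    accept_logprobs = F.logsigmoid(classifier_logits)
    combined = lm_logprobs + gamma * accept_logprobs
    return torch.argmax(combined, dim=-1)  # greedy choice; sampling also works
```

The appeal of this setup is that the classifier steers every decoding step, rather than only filtering or ranking finished completions.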
In particular, that report considers substituting capital for labor a potential driver of explosive growth. Fen’s argument from the Baumol effect relies on the premise that there are baseline levels of labor that cannot be automated, and that productivity growth is therefore limited by those bottlenecks.
I like this way of thinking about how quickly AI will grow smarter, and how much of the world will be amenable to its methods. Is understanding natural language sufficient to take over the world? I would argue yes, but my NLP professor disagrees — he thinks physical embodiment and the accompanying social cues would be very important for achieving superintelligence.
Your first two points make a related argument: that ML requires lots of high quality data, and that our data might not be high quality, or not in the areas it needs to be. A similar question woul... (read more)
Something I learned today that might be relevant: OpenAI was not the first organization to train transformer language models with search engine access to the internet. Facebook AI Research released their own paper on the topic six months before WebGPT came out, though the paper is surprisingly uncited by the WebGPT paper.
Generally I agree that hooking language models up to the internet is terrifying, despite the potential improvements for factual accuracy. Paul's arguments seem more detailed on this and I'm not sure what I would think if I thought ab... (read more)
Agreed it's really difficult for a lot of the work. You've probably seen it already but Dan Hendrycks has done a lot of work explaining academic research areas in terms of x-risk (e.g. this and this paper). Jacob Steinhardt's blog and field overview and Sam Bowman's Twitter are also good for context.
I second this, that it's difficult to summarize AI-safety-relevant academic work for LW audiences. I want to highlight the symmetric difficulty of trying to summarize the mountain of blog-post-style work on the AF for academics.
In short, both groups have steep reading/learning curves that are under-appreciated when you're already familiar with it all.
All of these academics are widely read and cited. Looking at their Google Scholar profiles, every one of them has more than 1,000 citations, and half have more than 10,000. Outside of LessWrong, lots of people in academia and industry labs already read and understand their work. We shouldn't disparage people who are successfully bringing AI safety into the mainstream ML community.
(Also, this is an incredibly helpful writeup and it’s only to be expected that some stuff would be missing. Thank you for sharing it!)
These professors all have a lot of published papers in academic conferences. It’s probably a bit frustrating to not have their work summarized, and then be asked to explain their own work, when all of their work is published already. I would start by looking at their Google Scholar pages, followed by personal websites and maybe Twitter. One caveat would be that papers probably don’t have full explanations of the x-risk motivation or applications of the work, but that’s reading between the lines that AI safety people should be able to do themselves.
Agree with both aogara and Eli's comment.
One caveat would be that papers probably don’t have full explanations of the x-risk motivation or applications of the work, but that’s reading between the lines that AI safety people should be able to do themselves.
For me this reading between the lines is hard: I spent ~2 hours reading academic papers/websites yesterday, and while I could quite quickly summarize the work itself, it was quite hard for me to figure out the motivations.
“OpenAI leadership tend to put more likelihood on slow takeoff”
Could you say more about the timelines of people at OpenAI? My impression was that they’re very short and explicitly include the possibility of scaling language models to AGI. If somebody builds AGI in the next 10 years, OpenAI seems like a leading candidate to do so. Would people at OpenAI generally agree with this?
Good answer from Gwern here.
It might have web search capabilities a la WebGPT, in which case I wouldn’t be confident of this. Without web search I’d agree.
2. Are you still estimating that algorithmic efficiency doubles every 2.5 years (for now at least, until R&D acceleration kicks in)? I've heard from others (e.g. Jaime Sevilla) that more recent data suggests it's doubling every 1 year currently.
It seems like the only source on this is Hernandez & Brown 2020. Their main finding is a doubling time of 16 months for AlexNet-level performance on ImageNet: "the number of floating point operations required to train a classifier to AlexNet-level performance on ImageNet has decreased by a factor of 44x betwe... (read more)
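As a quick sanity check on that 16-month figure using the 44x number quoted above (my recollection is that the paper's measurement window was roughly 2012–2019; treat that as my assumption):

```python
import math

# Doubling time implied by a 44x reduction in training compute over ~7 years
# (the 2012-2019 window is my recollection of the paper, not quoted above)
improvement_factor = 44
years_elapsed = 7
doubling_time_months = 12 * years_elapsed / math.log2(improvement_factor)
print(round(doubling_time_months, 1))  # ~15.4 months, consistent with the quoted 16 months
```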
“We'll still probably put it in a box, for the same reason that keeping password hashes secure is a good idea. We might as well. But that's not really where the bulk of the security comes from.”
This seems true in worlds where we can solve AI safety to the level of rigor demanded by security mindset. But lots of things in the world aren’t secure by security mindset standards. The internet and modern operating systems are both full of holes. Yet we benefit greatly from common sense, fallible safety measures in those systems.
I think it’s worth working on vers... (read more)
Love the Box Contest idea. AI companies are already boxing models that could be dangerous, but they've done a terrible job of releasing the boxes and information about them. Some papers that used and discussed boxing:
Turns out that this dataset contains little to no correlation between a researcher's years of experience in the field and their HLMI timelines. Here's the trendline, showing a small positive correlation where older researchers have longer timelines -- the opposite of what you'd expect if everyone predicted AGI to arrive just as they retire.
My read of this survey is that most ML researchers haven't updated significantly on the last five years of progress. I don't think they're particularly informed on forecasting and I'd be more inclined to trust the inside ... (read more)
Yes, to be clear, I don't buy the M-G law either on the basis of earlier surveys which showed it was just cherrypicking a few points motivated by dunking on forecasts. But it is still widely informally believed, so I point this out to annoy such people: 'you can have your M-G law but you will have to also have the implication (which you don't want) that timelines dropped almost an entire decade in this survey & the past few years have not been business-as-usual or "expected" or "predicted"'.
If you have the age of the participants, it would be interesting to test whether there is a strong correlation between expected retirement age and AI timelines.
This was heavily upvoted at the time of posting, including by me. It turns out to be mostly wrong. AI Impacts just released a survey of 4271 NeurIPS and ICML researchers conducted in 2021 and found that the median year for expected HLMI is 2059, down only two years from the 2061 median in the 2016 survey. Looks like the last five years of evidence hasn’t swayed the field much. My inside view says they’re wrong, but the opinions of the field and our inability to anticipate them are both important.
It's worth noting that Ajeya's BioAnchors report estimates that TAI will require a median of 22T data points, nearly an order of magnitude more than the available text tokens as estimated here. See here for more.
My report estimates that the amount of training data required to train a model with N parameters scales as N^0.8, based significantly on results from Kaplan et al 2020. In 2022, the Chinchilla scaling result (Hoffmann et al 2022) showed that instead the amount of data should scale as N.
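To illustrate the difference those exponents make (the 100x scale-up below is purely illustrative, not a number from either report):

```python
def data_ratio(param_ratio, exponent):
    """Relative increase in training data when parameters grow by param_ratio,
    assuming data requirements scale as N**exponent."""
    return param_ratio ** exponent

# Hypothetical 100x parameter scale-up under the two scaling rules discussed above
print(round(data_ratio(100, 0.8), 1))  # ~39.8x more data under the N^0.8 (Kaplan-style) rule
print(round(data_ratio(100, 1.0), 1))  # 100.0x more data under the N^1 (Chinchilla-style) rule
```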
Are you concerned that pretrained language models might hit data constraints before TAI? Nostalgebraist estimates that there are roughly 3.2T tokens available publicly for language model pretraining. This estimate misses important potential data sources such as ... (read more)
I suspect Chinchilla's implied data requirements aren't going to be that much of a blocker for capability gain. It is an important result, but it's primarily about the behavior of current backpropped transformer based LLMs.
The data inefficiency of many architectures was known before Chinchilla, but the industry worked around it because it wasn't yet a bottleneck. After Chinchilla, it has become one of the largest architectural optimization targets. Given the increase in focus and the relative infancy of the research, I would guess the next two years will s... (read more)
Would you have any thoughts on the safety implications of reinforcement learning from human feedback (RLHF)? The HFDT failure mode discussed here seems very similar to what Paul and others have worked on at OpenAI, Anthropic, and elsewhere. Some have criticized this line of research as only teaching brittle task-specific preferences in a manner that's open to deception, therefore advancing us towards more dangerous capabilities. If we achieve transformative AI within the next decade, it seems plausible that large language models and RLHF will play an important role in those systems — so why do safety minded folks work on it?
According to my understanding, there are three broad reasons that safety-focused people worked on human feedback in the past (despite many of them, certainly including Paul, agreeing with this post that pure human feedback is likely to lead to takeover):
Ah right. Thank you!
[Chinchilla 10T would have a 143x increase in parameters and] 143 times more data would also be needed, resulting in a 143*143= 20449 increase of compute needed.
Would anybody be able to explain this calculation a bit? It implies that compute requirements scale linearly with the number of parameters. Is that true for transformers?
My understanding would be that making the transformer deeper would increase compute linearly with parameters, but a wider model would require more than linear compute because it increases the number of connections between nodes at each layer.
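One common back-of-the-envelope rule (my addition here, not something stated above) is that training compute scales roughly as C ≈ 6·N·D: linearly in parameter count N per training token, and linearly in the number of training tokens D. Under that approximation the quoted calculation falls out directly, regardless of whether the extra parameters come from depth or width:

```python
def train_compute_ratio(param_ratio, data_ratio):
    """Relative training compute under the rough approximation C ~ 6 * N * D:
    compute per token scales linearly with parameter count N, and total
    compute also scales with the number of training tokens D."""
    return param_ratio * data_ratio

# 143x more parameters and 143x more data, as in the quoted Chinchilla-10T estimate
print(train_compute_ratio(143, 143))  # 20449
```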
Great post, thanks for sharing. Here's my core concern about LeCun's worldview, then two other thoughts:
The intrinsic cost module (IC) is where the basic behavioral nature of the agent is defined. It is where basic behaviors can be indirectly specified. For a robot, these terms would include obvious proprioceptive measurements corresponding to “pain”, “hunger”, and “instinctive fears”, measuring such things as external force overloads, dangerous electrical, chemical, or thermal environments, excessive power consumption, low levels of energy reserves in the
If anybody has good sources about LeCun's views on AI safety and value learning, I'd be interested.
If anybody has good sources about LeCun's views on AI safety and value learning, I'd be interested.
There's a conversation LeCun had with Stuart Russell and a few others in a Facebook comment thread back in 2019, arguing about instrumental convergence.
The full conversation is a bit long and difficult to skim. I haven't finished reading it myself, but in it LeCun links to an article he co-authored for Scientific American which argues x-risk from AI misalignment isn't something people should worry about. (He's more concerned about misuse risks.) Here's a ... (read more)
I'd like to publicly preregister an opinion. It's not worth making a full post because it doesn't introduce any new arguments, so this seems like a fine place to put it.
I'm open to the possibility of short timelines on risks from language models. Language is a highly generalizable domain that's seen rapid progress shattering expectations of slower timelines for several years in a row now. The self-supervised pretraining objective means that data is not a constraint (though it could be for language agents, tbd), and the market seems optimistic about b... (read more)
I'm having trouble understanding the argument for why a "sharp left turn" would be likely. Here's my understanding of the candidate reasons, I'd appreciate any missing considerations:
Ah okay. Are there theoretical reasons to think that neurons with lower variance in activation would be better candidates for pruning? I guess it would be that those nodes' outputs are similar across different datapoints, so they can be pruned and their effects approximately replicated by the rest of the network.
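In code, I imagine the selection step looking something like this (a hypothetical sketch of activation-variance pruning, not the implementation being discussed):

```python
import numpy as np

def low_variance_prune_mask(activations, prune_fraction=0.1):
    """Mark for removal the units whose activations vary least across a batch.

    activations: array of shape [num_datapoints, num_units]
    Returns a boolean mask (True = keep the unit).
    """
    stds = activations.std(axis=0)                 # per-unit std over the batch
    threshold = np.quantile(stds, prune_fraction)  # cutoff for the lowest-variance units
    return stds > threshold
```

Since a low-variance unit behaves almost like a constant, its average contribution can in principle be folded into the next layer's biases, which matches the intuition above.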
“…nodes with the smallest standard deviation.” Does this mean nodes whose weights have the lowest absolute values?
Similarly, humans are terrible at coordination compared to AIs.
Are there any key readings you could share on this topic? I've come across arguments about AIs coordinating via DAOs or by reading each others' source code, including in Andrew Critch's RAAP. Is there any other good discussion of the topic?
A model/ensemble of models will achieve >90% on the MATH dataset using a no-calculator rule
Curious to hear if/how you would update your credence in this being achieved by 2026 or 2030 after seeing the 50%+ accuracy from Google's Minerva. Your prediction seemed reasonable to me at the time, and this rapid progress seems like a piece of evidence favoring shorter timelines.
I think it’s a pretty good argument. Holden Karnofsky puts a 1/3rd chance that we don’t see transformative AI this century. In that world, people today know very little about what advanced AI will eventually look like, and how to solve the challenges it presents. Surely some people should be working on problems that won’t be realized for a century or more, but it would seem much more difficult to argue that AI safety today is more altruistically pressing than other longtermist causes like biosecurity, and even neartermist causes like animal welfare and glo... (read more)
This is an awesome post. I've read it before, but hadn't fully internalized it.
My timelines on TAI / HLMI / 10x GDP growth are a bit longer than the BioAnchors report, but a lot of my objections to short timelines are specifically objecting to short timelines on rapid GDP growth. It's obvious after reading this that what we care about is x-risk timelines, not GDP timelines. Forecasting when x-risk might spike is more difficult because it requires focusing on specific risk scenarios, like persuasion tools or fast takeoff, rather than general growth in... (read more)
Incredible. Somebody please get this to Hofstadter and the rest of The Economist folks. Issuing a correction would send a strong message to readers.
Here’s other AI safety coverage from The Economist, including quotes from Jack Clark of Anthropic. Seems thoughtful and well-researched, if a bit short on x-risk: https://www.economist.com/interactive/briefing/2022/06/11/huge-foundation-models-are-turbo-charging-ai-progress
I sent an email to the editor with this information, and included Hofstadter on it.
You might be interested in “ML for Cyberdefense” from this research agenda:
Fantastic agenda for the field, thanks for sharing.
Honesty is a narrower concept than truthfulness and is deliberately chosen to avoid capabilities externalities, since truthful AI is usually a combination of vanilla accuracy, calibration, and honesty goals. Optimizing vanilla accuracy is optimizing general capabilities, and we cover calibration elsewhere. When working towards honesty rather than truthfulness, it is much easier to avoid capabilities externalities.
I think it's worth mentioning that there are safety benefits to truthfulness beyond hone... (read more)
Interesting question. As far as what government could do to slow down progress towards AGI, I'd also include access to high-end compute. Lots of RL is knowledge that's passed through papers or equations, and it can be hard to contain that kind of stuff. But shutting down physical compute servers seems easier.
It's definitely a common belief on this site. I don't think it's likely; I've written up some arguments here.
I strongly agree with this objection. You might be interested in Comprehensive AI Services, a different story of how AI develops that doesn't involve a single superintelligent machine, as well as "Prosaic Alignment" and "The case for aligning narrowly superhuman systems". Right now, I'm working on language model alignment because it seems like a subfield with immediate problems and solutions that could be relevant if we see extreme growth in AI over the next 5-10 years.
Thanks, that really clarifies things. Frankly I’m not on board with any plan to “save the world” that calls for developing AGI in order to implement universal surveillance or otherwise take over the world. Global totalitarianism dictated by a small group of all-powerful individuals is just so terrible in expectation that I’d want to take my chances on other paths to AI safety.
I’m surprised that these kinds of pivotal acts are not more openly debated as a source of s-risk and x-risk. Publish your plans, open yourselves to critique, and perhaps you’ll revise your goals. If not, you’ll still be in a position to follow your original plan. Better yet, you might convince the eventual decision makers of it.
"Building an actual aligned AI, of course, would be a pivotal act." What would an aligned AI do that would prevent anybody from ever building an unaligned AI?
I mostly agree with what you wrote. Preventing all unaligned AIs forever seems very difficult and cannot be guaranteed by soft influence and governance methods. These would only achieve a lower degree of reliability, perhaps constraining governments and corporations via access to compute and critical algorithms but remaining susceptible to bad actors who find loopholes in the system. I guess what I'm ... (read more)
Thank you, this was very helpful. As a bright-eyed youngster, it's hard to make sense of the bitterness and pessimism I often see in the field. I've read the old debates, but I didn't participate in them, and that probably makes them easier to dismiss. Object level arguments like these help me understand your point of view.
Yeah, I guess the answer is yes by definition. Still wondering what kind of pivotal acts people are thinking about -- whether they're closer to big power-grabs like "burn all the GPUs", or softer governance methods like "publishing papers with alignment techniques" and "encouraging safe development with industry groups and policy standards". And whether the need for a pivotal act is the main reason why alignment researchers need to be on the cutting edge of capabilities.
Specifically, do you agree with Eliezer that preventing existential risks requires a "pivotal act" as described here (#6 and #7)?
Love the effort to engage with alignment work in academia. It might be a very small thread of authors and papers at this point, but hopefully it will grow.
“It is necessary that people working on alignment have a capabilities lead.” Could you say a little more about this? Seems true but I’d be curious about your line of thought.
The theory of change could be as simple as “once we know how to build aligned AGI, we’ll tell everybody”, or as radical as “once we have an aligned AGI, we can steer the course of human events to prevent future catastrophe”. The more boring argument would be that any good ML research happens on the cutting edge of the field, so alignment needs big budgets and fancy labs just like any other researcher. Would you take a specific stance on which is most important?
Coming back to this: Your concern makes sense to me. Your proposal to train a new classifier for filtered generation to improve performance on other tasks seems very interesting. I think it might also be useful to simply provide a nice open-source implementation of rejection sampling in a popular generator repo like Facebook's OPT-175B, so that future researchers can build on it.
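A minimal sketch of what I mean by rejection sampling, with hypothetical generator and classifier interfaces (not any existing repo's API):

```python
def rejection_sample(generate, score_acceptable, prompt, k=16):
    """Draw k candidate completions and keep the one the classifier rates most acceptable.

    generate(prompt) -> str                   : hypothetical generator call
    score_acceptable(prompt, text) -> float   : hypothetical classifier returning P(acceptable)
    """
    candidates = [generate(prompt) for _ in range(k)]
    return max(candidates, key=lambda text: score_acceptable(prompt, text))
```

Even something this simple, packaged alongside a widely used generator, would make it easier for later work to swap in better classifiers or filtering rules.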
I'm planning on working on technical AI safety full-time this summer. Right now I'm busy applying to a few different programs, but I'll definitely follow up on this idea with you.
Did you consider using the approach described in Ethan Perez's "Red Teaming LMs with LMs"? This would mean using a new generator model to build many prompts, having the original generator complete those prompts, and then having a classifier identify any injurious examples in the completion.
The tricky part seems to be that this assumes the classifier's judgements are correct. If you trained the classifier on the examples identified by this process, it would only generate examples that are already labeled correctly by the classifier. To escape this pro... (read more)
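Roughly, the loop from that paper as I understand it, with all function names hypothetical:

```python
def red_team(attacker_generate, target_generate, classify_injurious, n_prompts=1000):
    """Sketch of the LM-red-teaming-LM loop described above.

    attacker_generate() -> str                       : red-team model proposes a test prompt
    target_generate(prompt) -> str                   : model under test completes the prompt
    classify_injurious(prompt, completion) -> bool   : flags harmful completions

    Returns the (prompt, completion) pairs the classifier flags as injurious.
    """
    failures = []
    for _ in range(n_prompts):
        prompt = attacker_generate()
        completion = target_generate(prompt)
        if classify_injurious(prompt, completion):
            failures.append((prompt, completion))
    return failures
```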
Wild. One important note is that the model is trained with labeled examples of successful performance on the target task, rather than learning the tasks from scratch by trial and error like MuZero and OpenAI Five. For example, here's the training description for the DeepMind Lab tasks:
We collect data for 255 tasks from the DeepMind Lab, 254 of which are used during training, the left out task was used for out of distribution evaluation. Data is collected using an IMPALA (Espeholt et al., 2018) agent that has been trained jointly on a set of 18 procedurally
Great list of RL use cases: https://mighty-melody-f4b.notion.site/RL-for-real-world-problems-0114c270e5d94894b3c4f227e24401db