Computer science master's student interested in AI and AI safety.
Strong upvote. I think this is an excellent, carefully written, and timely post. Explaining issues that may arise from current alignment methods is urgent and important. It provides a good explanation of the unidentifiability or inner alignment problem that could arise from advanced AI systems trained with current behavioral safety methods. It also highlights the difficulty of making AIs that can automate alignment research, which is part of OpenAI's current plan. I also liked the in-depth description of what advanced science AIs would be capable of as well as the difficulty of keeping humans in the loop.
Nice post! The part I found most striking was how you were able to use the mean difference between outputs on harmful and harmless prompts to steer the model into refusing or not. I also like the refusal metric which is simple to calculate but still very informative.
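To make the difference-of-means idea concrete for myself, here is a toy sketch. It is not the post's implementation: the vectors are made-up stand-ins for per-prompt model activations, and every function name below is hypothetical. It assumes the steering direction is the mean vector on harmful prompts minus the mean vector on harmless prompts, which is then either projected out of or added to a new vector.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / n for i in range(dim)]


def difference_of_means(harmful: List[Vector], harmless: List[Vector]) -> Vector:
    """Mean over harmful prompts minus mean over harmless prompts."""
    mu_harmful = mean_vector(harmful)
    mu_harmless = mean_vector(harmless)
    return [a - b for a, b in zip(mu_harmful, mu_harmless)]


def normalize(v: Vector) -> Vector:
    """Scale v to unit length; a zero vector has no direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def ablate(activation: Vector, unit_dir: Vector) -> Vector:
    """Remove the component of the activation along unit_dir."""
    proj = dot(activation, unit_dir)
    return [a - proj * d for a, d in zip(activation, unit_dir)]


def add_direction(activation: Vector, unit_dir: Vector, scale: float) -> Vector:
    """Push the activation along unit_dir by the given amount."""
    return [a + scale * d for a, d in zip(activation, unit_dir)]


# Toy data (invented): two "harmful" and two "harmless" activation vectors.
harmful = [[2.0, 1.0], [4.0, 1.0]]
harmless = [[0.0, 1.0], [0.0, 1.0]]

direction = normalize(difference_of_means(harmful, harmless))
steered_away = ablate([5.0, 2.0], direction)            # component along direction removed
steered_toward = add_direction([0.0, 2.0], direction, 3.0)  # component along direction added
```

In this toy setup, ablating the direction would correspond to steering the model away from refusing, and adding it would correspond to steering it toward refusing. Which layer and token position the vectors come from is left open here.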
TL;DR: Private AI companies such as Anthropic which have revenue-generating products and also invest heavily in AI safety seem like the best type of organization for doing AI safety research today. This is not the best option in an ideal world and maybe not in the future but right now I think it is.
I appreciate the idealism and I'm sure there is some possible universe where shutting down these labs would make sense but I'm quite unsure about whether doing so would actually be net-beneficial in our world and I think there's a good chance it would be net-negative in reality.
The most glaring constraint is finances. AI safety is funding-constrained so this is worth mentioning. Companies like DeepMind and OpenAI spend hundreds of millions of dollars per year on staff and compute and I doubt that would be possible in a non-profit. Most of the non-profits working on AI safety (e.g. Redwood Research) are small with just a handful of people. OpenAI changed their company from a non-profit to a capped for-profit because they realized that being a non-profit would have been insufficient for scaling their company and spending. OpenAI now generates $1 billion in revenue and I think it's pretty implausible that a non-profit could generate that amount of income.
The other alternative apart from for-profit companies and philanthropic donations is government funding. It is true that governments fund a lot of science. For example, the US government funds 40% of basic science research. And a lot of successful big science projects such as CERN and the ITER fusion project seem to be mostly government-funded. However, I would expect a lot of government-funded academic AI safety grants to be wasted by professors skilled at putting "AI safety" in their grant applications so that they can fund whatever they were going to work on anyway. Also, the fact that the US government has secured voluntary commitments from AI labs to build AI safely gives me the impression that governments are either unwilling or unable to work on AI safety themselves and would instead prefer to delegate it to private companies. On the other hand, the UK has a new AI safety institute and a language model task force.
Another key point is research quality. In my opinion, the best AI safety research is done by the big labs. For example, Anthropic created constitutional AI and they also seem to be a leader in interpretability research. I think empirical AI safety work and AI capabilities work involve very similar skills (coding etc.) and therefore it's not surprising that leading AI labs also do the best empirical AI safety work. There are several other reasons why big AI labs do the best empirical AI safety work. One is talent. Top labs have the money to pay high salaries which attracts top talent. Work in big labs also seems more collaborative than in academia which seems important for large projects. Many top projects have dozens of authors (e.g. the Llama 2 paper). Finally, there is compute. Right now, only big labs have the infrastructure necessary to do experiments on leading models. Doing experiments such as fine-tuning large models requires a lot of money and hardware. For example, this paper by DeepMind on reducing sycophancy apparently involved fine-tuning the 540B PaLM model, which is probably not possible for most independent and academic researchers right now. Consequently, they usually have to work with smaller models such as Llama-2-7b. However, the UK is investing in some new public AI supercomputers which hopefully will level the playing field somewhat. If you think theoretical work (e.g. agent foundations) is more important than empirical work then big labs have less of an advantage. Though DeepMind is doing some of that too.
GPT-4 is the model that has been trained with the most training compute, which suggests that compute is the most important factor for capabilities. If that weren't true, we would see some other company training models with more compute but worse performance, which doesn't seem to be happening.
No offense but I sense status quo bias in this post.
If you replace "AI" with "industrial revolution" I don't think the meaning of the text changes much and I expect most people would rather live today than in the Middle Ages.
One thing that might be concerning is that older generations (us in the future) might not have the ability to adapt to a drastically different world in the same way that some old people today struggle to use the internet.
I personally don't expect to be overly nostalgic in the future because I'm not that impressed by the current state of the world: factory farming, the hedonic treadmill, physical and mental illness, wage slavery, aging, and ignorance are all problems that I hope are solved in the future.
Although AI progress is currently gradual enough that regulation can keep up, I do think a hard takeoff is still a possibility.
My understanding is that fast recursive self-improvement occurs once there is a closed loop of fully autonomous self-improving AI. AI is not capable enough for that yet and most of the important aspects of AI research are still done by humans but it could become a possibility in the future once AI agents are advanced and reliable enough.
In the future before an intelligence explosion, there could be a lot of regulation and awareness of AI relative to today. But if there's a fast takeoff, regulation would be unable to keep up with AI progress.
Recently I learned that the negative effect of sleep deprivation on cognitive performance seems to accumulate over several days. Five days of insufficient sleep can lower cognitive performance by up to 15 IQ points according to this source.
I personally use Toggl to track how much time I spend working per day. I usually aim for at least four hours of focused work per day.
Thanks for the post! I think it does a good job of describing key challenges in AI field-building and funding.
The talent gap section describes a lack of positions in industry organizations and independent research groups such as SERI MATS. However, there doesn't seem to be much content on the state of academic AI safety research groups. So I'd like to emphasize the current and potential importance of academia for doing AI safety research and absorbing talent. The 80,000 Hours AI risk page says that there are several academic groups working on AI safety including the Algorithmic Alignment Group at MIT, CHAI in Berkeley, the NYU Alignment Research Group, and David Krueger's group in Cambridge.
The AI field as a whole is already much larger than the AI safety field so I think analyzing the AI field is useful from a field-building perspective. For example, about 60,000 researchers attended AI conferences worldwide in 2022. There's an excellent report on the state of AI research called Measuring Trends in Artificial Intelligence. The report says that most AI publications come from the 'education' sector which is probably mostly universities. 75% of AI publications come from the education sector and the rest are published by non-profits, industry, and governments. Surprisingly, the top 9 institutions by annual AI publication count are all Chinese universities and MIT is in 10th place. Though the US and industry are still far ahead in 'significant' or state-of-the-art ML systems such as PaLM and GPT-4.
What about the demographics of AI conference attendees? At NeurIPS 2021, the top institutions by publication count were Google, Stanford, MIT, CMU, UC Berkeley, and Microsoft which shows that both industry and academia play a large role in publishing papers at AI conferences.
Another way to get an idea of where people work in the AI field is to find out where AI PhD students go after graduating in the US. The number of AI PhD students going to industry jobs has increased over the past several years and 65% of PhD students now go into industry but 28% still go into academic jobs.
Only a few academic groups seem to be working on AI safety and many of the groups working on it are at highly selective universities but AI safety could become more popular in academia in the near future. And if the breakdown of contributions and demographics in AI safety comes to resemble that of AI in general, then we should expect academia to play a major role in AI safety in the future. Long-term AI safety may actually be more academic than AI since universities are the largest contributor to basic research whereas industry is the largest contributor to applied research.
So in addition to founding an industry org or facilitating independent research, another path to field-building is to increase the representation of AI safety in academia by founding a new research group though this path may only be tractable for professors.
Thanks for the post. It's great that people are discussing some of the less-frequently discussed potential impacts of AI.
I think a good example to bring up here is video games which seem to have similar risks.
When you think about it, video games seem just as compelling as AI romantic partners. Many video games such as Call of Duty, Civilization, or League of Legends involve achieving virtual goals, leveling up, and improving skills in a way that's often more fulfilling than real life. Realistic 3D video games have been widespread since the 2000s but I don't think they have negatively impacted society all that much. Though some articles claim that video games are having a significant negative effect on young men.
Personally, I spent quite a lot of time playing video games during my childhood and teenage years but I mostly stopped playing them once I went to college. But why replace an easy and fun way to achieve things with reality which is usually less rewarding and more frustrating? My answer is that achievements in reality are usually much more real, persistent, and valuable than achievements in video games. You can achieve a lot in video games but it's unlikely that those achievements will raise your status among as many people, or for as long, as achievements in real life can.
A relevant quote from the article I linked above:
"After a while I realized that becoming master of a fake world was not worth the dozens of hours a month it was costing me, and with profound regret I stashed my floppy disk of “Civilization” in a box and pushed it deep into my closet. I hope I never get addicted to anything like “Civilization” again."
Similarly, AI romantic partners could be competitive with real relationships in the short term, but in the near future at least, I doubt it will be possible to have AI relationships that are as fulfilling and realistic as a marriage that lasts several decades.
And as in the case of video games, status will probably favour real relationships: people will value them partly because they offer more status than virtual ones. One possible reason is that status depends on scarcity. Just as being a real billionaire offers much more status than being a virtual one, having a real high-quality romantic partner will probably yield much more status than having a virtual one and as a result, people will be motivated to have real partners.