AI safety university groups: a promising opportunity to reduce existential risk

mic

Cross-posted from the Effective Altruism Forum

Summary: AI safety university groups^[1] are a promising way to grow the talent pool working to reduce existential risk from AI and can be fairly straightforward to set up successfully. Based on the experience of several student groups last semester, running AGI Safety Fundamentals or a similar AI safety reading group can attract dozens of participants and increase their likelihood of working on AI safety, while engaging many who wouldn’t have been reached by cause-neutral effective altruism community building. If you don’t have the time or experience to run an AI safety reading group, it can be valuable to broadly publicize a virtual program facilitated by another group (e.g., EA Cambridge).

Overall, AI safety field-building is a new, exciting area with much low-hanging fruit. At the same time, there are risks from poorly executed field-building which should be thoughtfully handled, such as inducing a poor impression of AI safety or increasing interest in accelerating the development of (potentially unsafe) artificial general intelligence (AGI).^[2]

If you’re interested in AI safety field-building, either at a university or in another setting (e.g., technical workplaces^[3]), fill out this form to get support from AI Safety Hub! Even if you think you’d do fine without a call, I’d strongly recommend taking a few seconds to fill this out, so that they can connect you with resources and other group organizers.

Transformative AI might pose considerable existential risk within the coming decades or later this century. Trained to maximize an objective, AI systems often find creative and undesirable ways to do so. YouTube's recommendation algorithm, trained to suggest content to maximize user engagement, ended up recommending increasingly extreme content and proliferating misinformation. Language models such as GPT-3, trained to predict the next word in a piece of text, can be prompted to produce offensive content and leak personal information from the training data. As AI capabilities advance, we can expect the stakes of these risks from misaligned systems to dramatically increase. Aligning deep learning systems with human values might be extremely challenging, and approaches that work for making a simpler model safe may fail for more advanced models. At the moment, leading AI companies such as Google Brain and Meta AI are actively working on developing AI with increasingly general capabilities (see A path to autonomous machine intelligence and Pathways), while not investing much in research to ensure that future systems are aligned and safe. (For a more thorough introduction to existential risk from AI, see Introduction to AI Safety - Robert Miles, Why AI alignment could be hard with modern deep learning, AI alignment - Wikipedia, and Power-seeking AI and X-risk mitigation.)

To make progress on the problem of existential risk from AI, it’s extremely valuable to have additional talented people working on it.^[4] That includes people in a variety of roles, such as machine learning safety research, conceptual alignment research, policy research, software engineering, operations, machine learning engineering, security, social science, and communications.^[5] Universities are a major source of potential talent, especially as students are actively exploring different possible career options. However, the vast majority of students have not heard about AI safety, even among students interested in AI, and fewer still have considered how they could work on AI safety.

Despite the need for many more people to work on AI safety, there hasn’t been much high-fidelity, in-depth AI safety community-building at universities. But this year, there’s been a much stronger effort to actually try, and it seems to have been broadly successful. Here are a few case studies from this past semester:

Effective Altruism at Georgia Tech ran the AGI Safety Fundamentals technical alignment program for 33 participants from February to April 2022, ~15 of whom now plan to work in AI safety. Though EA at Georgia Tech was started in August 2021, we think this could work at other major CS universities too, even without any prior EA presence, as ~30 out of 33 participants didn’t have previous engagement with our EA group.
Imperial College EA ran a weekly reading group discussing papers in AI safety with 10–20 participants, mostly for master’s/PhD students in computing/AI. They also gave away 100+ books in a giveaway, including to many PhD/postgrad students in AI and computing. They ran a research retreat in February in which 11 members worked on submissions to the eliciting latent knowledge contest and hosted an “AGI Safety” seminar series, with speakers from DeepMind, OpenAI, and the Center on Long-Term Risk. One PhD student directing the group, Francis Rhys Ward, gave a lecture on “Long-term AI Risk” as part of the AI Ethics module at Imperial.
OxAI Safety Hub ran an AI alignment speaker series, with ~70 attendees for the first talk. They're currently running AGI Safety Fundamentals for ~60 participants (with rolling applications). Over the summer, they’re organizing research projects mentored by local AI safety researchers at Oxford.

Still, AI safety field-building remains relatively neglected in the world and in EA community groups. While there are 80+ EA university groups around the world, there are only 16 groups that I know of that are running significant AI safety activities (e.g., a program similar to AGI Safety Fundamentals).^[6] [Update: As of November 2022, I believe there are ~30 such groups.] Just a handful of groups have AI governance programming. There’s a lot of low-hanging fruit in the area, and a lot of capacity for more people to help out in this space. (Even relatively large and successful groups could benefit from more community builders; for example, EA at Harvard is currently interested in having an additional community builder working full time for AI safety field-building. Note that you don't need to be a university student to work on university community-building; indeed, OxAI Safety Hub and EA Cambridge are assisted by full-time staff who have already graduated.)

How much time does organizing a group require?

If you’re quite busy and limited on free time, one option is to just publicize a virtual AGI Safety Fundamentals program to computer science (CS) students at your university, the next time EA Cambridge runs the program (sign up here to be notified when it does). Depending on your university, there might be ways to do marketing which don’t take much time. I think it might just take a couple of hours, if that.

Perhaps your university allows you to email all the CS students by emailing a listserv or asking an academic advisor. You could post about it to an online group used by students, such as Reddit, Facebook, or Discord. Maybe you could ask professors or clubs to forward an email to students. And of course, you could encourage your friends to sign up. (In publicity, I’d probably refrain from mentioning “artificial general intelligence” or “existential risk” to appear more mainstream; see EA NYU’s fellowship page for one possible way to describe the program.)

You might wonder, why would anyone sign up for a program like AGI Safety Fundamentals, especially when students are so busy? Here are some factors which I think helped EA at Georgia Tech get a good number of participants this past semester:

Plenty of students are interested in AI, and the program has some exciting content to satisfy that interest. In our publicity, we highlighted how the program discusses cutting-edge research in deep learning, reinforcement learning, and interpretability – in the context of ensuring AI systems are safe and beneficial for humanity.
We offered a free book, Human Compatible, for everyone who filled out our 1-minute interest form. Students really like free things!
Some students were already somewhat familiar with AI safety, through EA, Robert Miles’ YouTube channel, LessWrong, or another means
Lots of publicity, especially through university mailing lists

And since the participants commit to attending the discussion meetings every week, they're fairly likely to attend most meetings.

It can be valuable to have some sort of recurring in-person meetings, such as weekly events with free lunch. This could be a purely social event, or it could involve watching some videos relevant to AI safety (e.g., Robert Miles’ YouTube channel).

Additionally, we might be able to help you save time with group organizing by connecting you with operations assistants.

What could organizing a group look like?

I think AI safety clubs can contribute to AI safety through pursuing two primary goals:

getting people interested in working on AI safety
empowering people to gain the skills necessary to work on AI safety

For the first goal, one option which works well is running a local version of the AGI Safety Fundamentals reading group. As mentioned above, even if you don’t have time to facilitate discussions yourself, it could be quite valuable to publicize EA Cambridge’s virtual program to your university, the next time EA Cambridge runs it.

This section is meant to help you concretely envision what AI safety field-building could look like, especially based on what groups have done in the past. However, it’s not intended to be a comprehensive implementation guide and omits many helpful resources which I could share. If you’d like to get involved, please get in touch!

Goal 1: Raising interest in AI safety

Publicizing and running an introductory AI safety reading group appears to be a great way to build up a community of people interested in AI safety, analogous to how the Intro EA Program is an excellent way of starting an EA university club and has worked for ~50 university EA groups. I know of nine university groups which ran the AGI Safety Fundamentals alignment program this past semester: Oxford, Cambridge, Georgia Tech, University of Virginia, Northwestern, Mila, MIT, Harvard, and Stanford. Additionally, Columbia EA ran an AI safety reading group using a locally developed curriculum.

The AGI Safety Fundamentals alignment program is a semester-long reading group on AI alignment. Topics include an introduction to machine learning, existential risk from AGI, inverse reinforcement learning, reward modeling, scalable oversight, agent foundations, and AI safety careers. This program involves weekly 1.5-hour discussions in small groups of 4–6 participants and one discussion facilitator, as well as 2.5 hours of readings and exercises to be done before discussion meetings. Participants don’t need any prior experience in CS or machine learning, though it is helpful. The curriculum was first created in January 2021 by Richard Ngo, an AI governance researcher at OpenAI.

If you’d like to run the program locally, you’d create an application form, publicize the program to CS students at your university, and facilitate weekly 1.5-hour discussions for cohorts of 4–6 participants. You’d want to be familiar with the readings beforehand, but otherwise, facilitating discussions is fairly straightforward since there’s a facilitator guide. (If you’d prefer a curriculum with a lower time commitment for participants, check out Columbia EA’s AI safety curriculum.)
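As a rough illustration of the cohort-assignment step, here is a minimal sketch of splitting a list of accepted participants into cohorts of 4–6. The function name `split_into_cohorts` and the placeholder participant names are hypothetical, not part of the AGI Safety Fundamentals materials. The sketch also ignores scheduling constraints and facilitator assignment, which in practice would probably drive most of the grouping.

```python
import math


def split_into_cohorts(participants, min_size=4, max_size=6):
    """Split participants into cohorts of roughly equal size.

    Hypothetical helper for illustration only. Returns a list of lists and
    raises ValueError if no split keeps every cohort within
    [min_size, max_size].
    """
    n = len(participants)
    if n == 0:
        return []
    # Fewest cohorts that keeps every cohort at or below max_size.
    num_cohorts = math.ceil(n / max_size)
    # With that many cohorts, the smallest cohort has n // num_cohorts members;
    # adding more cohorts would only make them smaller.
    if n // num_cohorts < min_size:
        raise ValueError(
            f"cannot split {n} participants into cohorts of {min_size}-{max_size}"
        )
    base, extra = divmod(n, num_cohorts)
    cohorts, start = [], 0
    for i in range(num_cohorts):
        # The first `extra` cohorts take one additional participant.
        size = base + (1 if i < extra else 0)
        cohorts.append(participants[start:start + size])
        start += size
    return cohorts


# Example with 33 placeholder names (the number of participants in the
# Georgia Tech case study above).
participants = [f"participant_{i}" for i in range(33)]
cohorts = split_into_cohorts(participants)
print([len(c) for c in cohorts])  # prints [6, 6, 6, 5, 5, 5]
```

Each resulting cohort would then still need one discussion facilitator and a weekly meeting time that works for its members.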

You might be surprised by how many people would be interested in applying to join a reading group on AI safety! If you market it well, you can get a good number of applications. I think a lot of students are just excited to learn more about deep learning, even if they don’t have any prior exposure to AI safety content. But even having a small number of participants could be very valuable.

If you don't have the time or experience to facilitate a few cohorts of the AGI Safety Fundamentals program, feel free to register interest for the global program here and publicize it when applications open. To help participants get to know each other outside of Zoom calls, it might be valuable to coordinate in-person meetups for participants, such as casual lunch socials.

Organizing a reading group is only one possible model for building a community. Other plans I’ve heard of for establishing a community are: running a series of workshops, organizing an AI safety retreat, or running an “eliciting latent knowledge” or distillation contest. Additional activities that have worked well for groups include a watch party of Robert Miles’ YouTube channel on AI safety, speaker events, paper reading groups, casual social events, and simple one-on-one conversations – but I think simply publicizing/running AGI Safety Fundamentals would be an excellent way to get a group started.

Goal 2: Upskilling

Besides getting people interested in working on AI safety, local groups can play a valuable part in helping people gain the skills necessary to actually contribute to the field.

One simple option for helping members with upskilling is to connect members with existing resources for upskilling and to provide basic career advice for interested members. You could encourage and support members to:

Take a course in deep learning or natural language processing (or maybe deep reinforcement learning), either online or at your university. ML Safety Scholars and the ML for Alignment Bootcamp are great opportunities from the effective altruism community to learn machine learning over the summer.
Get involved with deep learning research opportunities, either at your university or elsewhere.
(For independent conceptual research) Work on distillation or ELK or alignment research exercises.
Work on deep learning projects (e.g., perhaps Kaggle competitions or with AI Safety Camp or EleutherAI)
Apply for AI safety research programs or internships such as the SERI Summer Research Fellowship, SERI ML Alignment Theory Scholars Program, the CHAI Research Internship, Redwood Research internship, etc. For building technical experience, it’s a good idea to also apply to internships that aren’t relevant to AI safety but are still valuable for building experience with machine learning or software engineering (e.g., pittcsc/Summer2023-Internships for software engineering, NSF REU Sites for research).

Still, there’s a gap between knowing what to do and actually doing it, and it might take just an extra bit of support. Stanford AI Alignment is planning to run coworking sessions and to set up peer accountability around SMART goals.

After there’s a solid community of students interested in AI safety, you could organize group activities such as the following (based on suggestions from AI safety researchers):

Watching a deep learning and machine learning safety course together (perhaps similar to ML Safety Scholars)
Replicating a paper (e.g., deep Q-networks, proximal policy optimization, asynchronous advantage actor-critic, GPT-2, or deep reinforcement learning from human preferences)
Running an empirical research project
- See AGI Week 8 — Effective Altruism Cambridge and AI Safety Ideas for more project ideas.
- Also, see Announcing the Harvard AI Safety Team
Going through conceptual alignment research exercises (e.g., from Richard Ngo’s Alignment research exercises or the SERI MATS application or eliciting latent knowledge)

In planning upskilling activities, it’s helpful to first get a sense of the skills and experience necessary for various careers, such as by reading “How to pursue a career in technical AI alignment” (strongly recommended!) and browsing job descriptions from the 80,000 Hours jobs board. I think it could also be useful to be familiar with the overall process for internships or getting involved with research – see the interview process for software engineering internships, the Tech Interview Handbook, the Machine Learning Interviews Book, and a thread on emailing professors about joining their research lab.

Who would be a good fit for this role?

You’d be a great fit if you are:

Interested in existential risk from AI
Knowledgeable about AI safety (let’s say 10+ hours of engagement; see this footnote for my reading recommendations)^[7]
Able to speak about AI safety in a nuanced way to newcomers
Organized and able to manage projects without letting important things fall through the cracks
At a university with a significant population of talented students, especially those studying computer science or policy (for AI governance) – or if you would be eager to support another university group (virtually or in-person)
Thoughtful about downside risk in communications.
- It’s easy to talk about AI safety in a way that sounds absurd or off-putting. Furthermore, creating hype around AGI could lead some students to work on AGI capabilities, if the only message some students hear is around the economic potential of AGI and not the safety concerns. In general, it’s good to do things on a small scale and get feedback before scaling them up. For example, before emailing all students at your school or giving a talk, it can be good to ask someone for feedback first. I highly recommend staying in regular contact with other AI safety or EA group organizers.
Welcoming and friendly

Some reasons to not work on this:

You’re much more excited to do other types of community building, such as cause-impartial EA or animal welfare community building, or AI safety field-building in another setting
You’re a much stronger fit for contributing to AI safety than the students you might recruit
Your university has few students interested in AI or AI policy, and you wouldn’t have time to help with another university group (such as AI Safety at MIT)
You’d have to sacrifice key opportunities for building technical experience for your own career in AI safety

Conclusion

AI safety field-building at universities is a promising and neglected way to engage more people to work on reducing existential risk from AI. You don’t have to be studying at a top CS or policy university, since you can work on AI safety field-building where you are or at another university.

If your university doesn’t already have much of a community interested in AI safety, a great option to get started would be to organize an introductory AI safety seminar program and publicize it to local students. You could use EA Cambridge’s alignment curriculum or governance curriculum, or Columbia EA’s alignment curriculum. If you’re not able to facilitate the program yourself, EA Cambridge or another EA group should be able to support you in recruiting virtual facilitators from the hundreds of past participants. Then, all you’d have to do is create an application form, publicize it to your school (e.g., through listservs), assign participants and facilitators to cohorts, and coordinate the program.

It’s also valuable to encourage participants to take further action such as:

Applying for career advising from 80,000 Hours and/or speaking with AI Safety Support
Attending EA conferences to meet other students or professionals interested in AI safety
Applying for AI safety internships/jobs (from Internships for Impact or the 80,000 Hours jobs board), as well as jobs which help build relevant skills (e.g., generic software engineering or deep learning roles)
Taking courses in deep learning, natural language processing, or deep reinforcement learning
Getting relevant research experience, project experience, etc.
Working on distillation, ELK, or alignment research exercises

Other valuable activities that groups can run include projects, paper reading groups, guest speaker events, one-on-one conversations, and retreats.

You could organize a group either on a part-time or full-time basis. Group funding for snacks, books, and much more is available from the Centre for Effective Altruism’s Group Support Funding. Besides being impactful, group organizing can also be a great way to build career capital.

Get in touch!

~~If you’re interested in helping to build up the AI safety community at universities, please fill out this interest form here!~~

Edit (August 2025): Please check out Kairos for AI safety field building support.

Thanks to Justis Mills, Jamie Bernardi, Luise Wöhlke, Anjay Friedman, and Thomas Woodside for providing suggestions and feedback on this post! All mistakes are my own.

^[1]
When I say “AI safety university group”, I’m also including EA clubs which have significant activities focused on building the AI safety community. To the average computer science student, “effective altruism” sounds like it has nothing to do with either AI or computer science, so from a marketing perspective, I think it can be helpful to run AI safety activities under an AI safety club, but I don’t think it matters that much overall.
^[2]
Another risk is if humanity fails to create a future with net positive value for sentient beings, despite successfully managing to avoid extinction or disempowerment from AI – see The future might not be so great and A typology of s-risks.
^[3]
This post will largely focus on university groups, but some of the content here may apply to workplace groups. While reading groups (also known as “fellowships” or “seminar programs”) have been a successful way for EA or AI safety university groups to attract new members, this is unlikely to be as fruitful for workplace groups, as employees rarely have time to participate in a reading group. Workplace groups may want to focus on activities with a lower time commitment for participants, such as introductory talks or workshop series.
^[4]
Research impact is heavy-tailed; the most influential machine learning researchers have ~1,000× as many highly influential citations. Other roles like engineering might be less heavy-tailed; still, AI safety organizations are looking for quite excellent engineers. Here, I don't necessarily mean reaching people who are already highly skilled in machine learning, policy research, etc. While that would be great, we can also reach bright students who could become highly skilled later. Consequently, for people whose comparative advantage is in reaching university students, it can make sense to prioritize reaching people who are especially likely to be very talented in the future.
^[5]
See Communications Specialist | Fund for Alignment Research (FAR) and Call For Distillers - AI Alignment Forum.
^[6]
That would be (in no particular order) OxAI Safety Hub, EA Cambridge, EA Warwick, EA at MIT / AI Safety at MIT, EA at Harvard / EA at the Harvard Kennedy School, EA NYU, EA at the University of Virginia, EA at Georgia Tech, EA Northwestern, AI Safety at Mila, Columbia EA, EA Georgetown, London School of Economics EA, Imperial EA, EA at UC Berkeley, the Stanford Existential Risks Initiative, and UCLA EA. If I’m missing your group, sorry about that! Feel free to shoot me a message and say hello :)
^[7]
Here are my recommendations for learning about AI safety:
- General introductions to AI safety
  - Intro to AI Safety, Remastered (20 mins)
  - Why AI alignment could be hard with modern deep learning (20 mins)
  - Artificial Intelligence Will Do What We Ask. That’s a Problem. (15 mins)
- Careers
  - How to pursue a career in technical AI alignment (1 hour)
  - Long-term AI policy strategy research and implementation (10 mins)
  - Jobs in AI safety & policy - 80,000 Hours (note that many of these positions are not working on AI safety/governance per se but are just good for building career capital)
- More on AI risk
  - Specification gaming: the flip side of AI ingenuity (15 mins)
  - Why Would AI Want to do Bad Things? Instrumental Convergence (10 mins)
  - Intelligence and Stupidity: The Orthogonality Thesis (13 mins)
  - Intelligence explosion: evidence and import (pages 10–15, 15 mins)
  - Video and Transcript of Presentation on Existential Risk from Power-Seeking AI (30 mins)
  - The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment (25 mins)
  - Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think… (10 mins)
  - We Were Right! Real Inner Misalignment - YouTube (12 mins)
  - AI Could Defeat All Of Us Combined (15 mins)
- Research agendas
  - My Overview of the AI Alignment Landscape: A Bird's Eye View (15 mins)
  - The longtermist AI governance landscape: a basic overview (15 mins)
