At EAGxBerkeley 2022, I was asked several times what new projects might benefit the AI safety and longtermist research ecosystem. I think that several existing useful-according-to-me projects (e.g., SERI MATS, REMIX, CAIS, etc.) could urgently absorb strong management and operations talent, but I think the following projects would also probably be useful to the AI safety/longtermist project. Criticisms are welcome.
Projects I might be excited to see, in no particular order:
- A London-based MATS clone to build the AI safety research ecosystem there, leverage mentors in and around London (e.g., DeepMind, CLR, David Krueger, Aligned AI, Conjecture, etc.), and allow regional specialization. This project should probably only happen once MATS has ironed out the bugs in its beta versions and grown too large for one location (possibly by Winter 2023). Please contact the MATS team before starting something like this to ensure good coordination and to learn from our mistakes.
- Rolling admissions alternatives to MATS’ cohort-based structure for mentors and scholars with different needs (e.g., to support alignment researchers who suddenly want to train/use research talent at irregular intervals but don’t have the operational support to do this optimally).
- A combined research mentorship and seminar program that aims to do for AI governance research what MATS is trying to do for technical AI alignment research.
- A dedicated bi-yearly workshop for AI safety university group leaders that teaches them how to recognize talent, foster useful undergraduate research projects, and build a good talent development pipeline or “user journey” (including a model of alignment macrostrategy and where university groups fit in).
- An organization that does for the Open Philanthropy worldview investigations team what GCP did to supplement CEA's workshops and 80,000 Hours’ career advising calls.
- Further programs like ARENA that aim to develop ML safety engineering talent at scale by leveraging good ML tutors and proven curricula like CAIS’ Intro to ML Safety, Redwood Research’s MLAB, and Jacob Hilton's DL curriculum for large language model alignment.
- More contests like ELK with well-operationalized research problems (i.e., clear explanations of what builder/breaker steps look like), clear metrics of success, a well-considered target audience (who is being incentivized to apply and why?), and a user journey (where do prize winners go next?). Possible contest seeds:
- Evan Hubinger’s SERI MATS deceptive AI challenge problem;
- Vivek Hebbar’s and Nate Soares’ SERI MATS diamond maximizer selection problem;
- Alex Turner’s and Quintin Pope’s SERI MATS training stories selection problem.
- More "plug-and-play" curricula for AI safety university groups, like AGI Safety Fundamentals, Alignment 201, and Intro to ML Safety.
- A well-considered "precipism" university course template that critically analyzes Toby Ord's “The Precipice,” Holden Karnofsky's “The Most Important Century,” Will MacAskill's “What We Owe The Future,” some Open Philanthropy worldview investigations reports, some Global Priorities Institute ethics papers, etc.
- Hackathons in which people with strong ML knowledge (not ML novices) write good-faith critiques of AI alignment papers and worldviews (e.g., what Jacob Steinhardt’s “ML Systems Will Have Weird Failure Modes” does for Hubinger et al.’s “Risks From Learned Optimization”).
- A New York-based alignment hub that aims to provide talent search and logistical support for NYU Professor Sam Bowman’s planned AI safety research group.
- More organizations like CAIS that aim to recruit established ML talent into alignment research with clear benchmarks, targeted hackathons/contests with prizes, and offers of funding for large compute projects that focus on alignment. To avoid accidentally furthering AI capabilities, this type of venture needs strong vetting of proposals, possibly from extremely skeptical and doomy alignment researchers.
- A talent recruitment and onboarding organization targeting cybersecurity researchers to benefit AI alignment, similar to Jeffrey Ladish’s and Lennart Heim's theory of change. A possible model for this organization is the Legal Priorities Project, which aims to recruit and leverage legal talent for longtermist research.
This is somewhat risky, and should get a lot of oversight. One of the biggest obstacles to discussing safety in academic settings is that academics are increasingly turned off by clumsy, arrogant presentations of the basic arguments for concern.
Thanks for writing this, Ryan! I'd be excited to see more lists like this. A few ideas I'll add:
Regarding (2), I think the best AI gov research training org is substantially different from MATS' idealized form because:
Additional reasons why MATS + MAGS > MATS + governance patch:
The Bay is an AI hub, home to OpenAI, Google, Meta, etc., and therefore an AI governance hub. Governance is not governments. Important decisions are being made there - maybe more important decisions than in DC. To quote Allan Dafoe:
Also, many, many AI governance projects go hand-in-hand with technical expertise.
Maybe more broadly, AI strategy is part of AI governance.
Regarding point 5: AI safety researchers are already taking the time to write talks and present them (e.g., Rohin Shah's introduction to AI alignment, though I think he has a more ML-oriented version). If we work off of an existing talk or delegate the preparation of the talk, then it wouldn't take much time for a researcher to present it.
This is a pretty bad time for AI research in China, with the tech recession and chip ban. So there's a lot of talented researchers looking for jobs. What are your opinions on a Chinese hub for conceptual AI safety research? As far as I can tell, there are no MIRI-level AI safety research teams here in the Sinosphere.
How about individual researchers? Is there anyone prominently associated with "risks of AI takeover" and similar topics? edit: Or even associated with "benefits of AI takeover"!
I would be all for that, personally! Can’t speak for the broader community though, so recommend asking around :)
Currently, MATS finds it hard to bring Chinese scholars over to the US for our independent research/educational seminar program because of US visa restrictions on Chinese citizens. I think there is probably significant interest in a MATS-like program among Chinese residents and certainly lots of Chinese ML talent. I'm generally concerned by approaches to leverage ML talent for AI safety that don't select hard for value alignment with the longtermist/AI alignment project as this could accidentally develop ML talent that furthers AI capabilities. That said, if participant vetting is good enough and there are clear pathways for alumni to contribute to AI safety, I'm very excited about programs based in China or that focus on recruiting Chinese talent. I think the projects that I'm most excited about in this space would be MATS-like, ARENA-like, or CAIS-like. If you have further ideas, let me know!
What do you mean by "select hard for value alignment"? Chinese culture is very different from that of the US, and EA is almost unheard of. You can influence things by hiring, but expecting very tight conformance with EA culture is... unlikely to work. Are people interested in AI capabilities research currently barred from being hired by alignment orgs? I am very curious what the situation on the ground is.
Also, there are various local legal issues when it comes to advanced research. Sharing genomics data with foreign orgs is pretty illegal, for example. There's also the problem of not having the possibility of keeping research closed. All companies above a certain size are required to hire a certain number of Party members to act as informants.
So what has stopped there from being more alignment orgs in China? Is the bottleneck local coordination, interest, vetting, or funding? I'd very much be interested in participating in any new projects.
Value alignment here means being focused on improving humanity's long-term future by reducing existential risk, not other specific cultural markers (identifying as EA or rationalist, for example, is not necessary). Having people working towards the same goal seems vital for organizational cohesion, and I think alignment orgs would rightly not hire people who are not focused on trying to solve alignment. Upskilling people who are happy to do capabilities jobs without pushing hard internally for capabilities orgs to be more safety-focused seems net negative.
I think it's important for AI safety initiatives to screen for participants that are very likely to go into AI safety research because:
These concerns might be addressed by:
Could/should there be an organization that helps grant visas to people working on AI safety?
I don't have insider information, but I think that Aligned AI, Anthropic, ARC, CLR, Conjecture, DeepMind, Encultured AI, FAR AI, MIRI, OpenAI, and Redwood Research (not an all-inclusive list) could all probably offer visas to employees. The MATS Program currently assists scholars in obtaining US B-1 visas or ESTAs and UK Standard Visitor visas. Are you asking whether there should be an organization that aims to hire people to work long-term on AI safety niches that these organizations do not fill, and if so, which niches?
That might be interesting, but I was wondering if one organization could be "the visa people" who do most of the visa-related work for all the organizations you listed. But maybe this work requires little time or is difficult to outsource?
Rethink Priorities and Effective Ventures are fiscal sponsors for several small AI safety organizations, and this role could include handling their visas. There might be room for more such fiscal sponsor charities, as Rethink Charity is closing down its fiscal sponsorship program and Players Philanthropy Fund isn't EA-specific.
These proposals all seem robustly good. What would you think of adding an annual AI existential safety conference?
Currently, it seems that the AI Safety Unconference and maybe the Stanford Existential Risks Conference are filling this niche. I think neither of these conferences is optimizing entirely for the AI safety researcher experience; the former might aim to appeal to NeurIPS attendees outside the field of AI safety, and the latter includes speakers on a variety of x-risks. I think a dedicated AI safety conference/unconference detached from existing ML conferences might be valuable. I also think boosting the presence of AI safety at NeurIPS or other ML conferences might be good, but I haven't thought deeply about this.
I think my lab is bottlenecked on things other than talent and outside support for now, but there probably is more that could be done to help build/coordinate an alignment research scene in NYC more broadly.
Quick note on 2: CBAI is pretty concerned about our winter ML bootcamp attracting bad-faith applicants and plans to use a combo of AGISF and references to filter pretty aggressively for alignment interest. Somewhat problematic in the medium term if people find out they can get free ML upskilling by successfully feigning interest in alignment, though...
The AI Security Initiative at the UC Berkeley Center for Long-Term Cybersecurity probably fulfills this role, though I imagine additional players in this space would be useful.
I envisage something like AI Safety Camp or the AI safety Mentors and Mentees Program, except with the capacity to offer visas.