CLR is excited about safe Pareto improvements (SPIs) as a way to mitigate downsides from conflict between AIs. SPIs are a class of interventions on how agents negotiate that makes them all better off, no matter how they would have negotiated without the SPI.
Among many candidate interventions against AI conflict, SPIs stand out to us as unusually robust — see the introduction of our agenda on the topic. And in discussions with people who’ve thought a lot about conflict risks, we’ve found there’s broad support for work on SPIs. For those sympathetic to CLR’s general priorities and with relevant skills (see below), we think helping SPIs go well is one of the most impactful career paths.
But work in this area is currently very neglected (~2.5 FTE), and there isn’t yet an on-ramp for people to get up to speed.
To address these gaps, we’re running an SPI Fundamentals Program: an online course for people looking to learn about risks of AI conflict, how SPIs might address them, and open problems in this field. We plan to hire for SPI research roles, and we’re keen for you to apply to the program whether you want to test your fit for such a role, or you’d like to learn more and potentially contribute outside CLR.
The program will run from Monday August 3rd to Friday August 28th. It will consist of weekly readings, short exercises, Slack discussions, and office hours with CLR’s research lead on our SPI agenda, Anthony DiGiovanni. Participants interested in additional practice with SPI research can also do a paid capstone project, which would take place from Monday August 31st to Friday September 4th. The expected time commitment is around 5-7 hours per week.
Apply for the SPI Fundamentals Program through this link by 23:59 GMT Friday July 24th.
Content
The SPI Fundamentals Program is designed to help participants develop a strong understanding of SPI concepts, and the methodology/frames that guide research in our agenda. The readings will be relatively technical, but won’t involve very advanced math — the most formally dense material will be DiGiovanni et al. (2024) and sections 1-4 of Oesterheld & Conitzer (2021).
By the time participants complete the curriculum, they should be able to answer the following (not exhaustive):
What are the high-level sufficient conditions for “rational” agents to avoid conflict? Why might those conditions not hold?
What are bargaining problems, and why aren’t they immediately resolved by intelligence / “good decision theory”?
How do the canonical examples of SPIs — surrogate goals, delegated game-playing, renegotiation — work?
What are the obstacles to SPIs being in each agent’s individual interest, ex ante? What are the existing results on resolving those obstacles?
What are the high-priority open problems in each of the three parts of CLR’s SPI agenda?
For the final week of the curriculum, participants can choose between two “streams”:
Conceptual: focused on, e.g., “What are the arguments for and against the key modeling assumptions of DiGiovanni et al. (2024)?”
Empirical: focused on, e.g., “Concretely, how do we evaluate LLMs for SPI safety failures?”
Exercises, office hours, and capstone projects will be designed to give participants better feedback loops, and a more nuanced understanding of SPIs, than they’d get from reading the materials alone. Examples of capstone projects: drafting a short proposal for an eval or conceptual research problem about SPIs; critiquing LLM-written SPI research; writing a doc on how a particular alignment technique might be used for implementing SPIs.
Target audience
We think the SPI Fundamentals Program will be most useful for you if you want to explore a career in AI conflict reduction. It could also be useful if you’re already working in an area that overlaps with our SPI agenda (e.g. cooperative AI, agent foundations), and are interested in reducing conflict risks via your current work.
While the curriculum is heavily skewed toward conceptual content, we expect it to also be important background for empirical work on SPIs, including research automation.
A great candidate might have any of the following backgrounds or skills — but you’re not required to be an expert in any of these, and we expect you’d be a good fit if you can parse most of the resources linked throughout this post:
Backgrounds:
game theory
mathematics/statistics
economics
decision theory
analytic/formal philosophy
computer science
theoretical physics
Skills:
constructing and thinking critically about models (both formal and informal) of complex/unfamiliar systems
reasoning about incentives
breaking down necessary and sufficient conditions for a given outcome
turning rough intuitions into claims that are appropriately precise
(for the empirical stream) experimental design, thinking about what a given test really measures
You don’t need any prior engagement with CLR’s research for this program. We will expect basic familiarity with AI safety concepts and game theory at the level of, e.g., material covered here.
I’m excited for this! I think SPIs are one of the top few research directions where mathematically inclined people can contribute to making AI go better.
Contact
If you have any questions about the program or are uncertain whether to apply, please reach out to info@longtermrisk.org or anthony.digiovanni@longtermrisk.org.