Open internship position + call for collaborations on threat model-dependent alignment, governance, and offense/defense balance
At the Existential Risk Observatory, we're currently carrying out a project called Solving the Right Problem: Towards Researcher Consensus on AI Existential Threat Models, together with MIT FutureTech and FLI.
Although many leading researchers agree that advanced AI could cause human extinction, there is major disagreement on how this could happen: different researchers have very different existential threat models. Examples include self-improvement leading to takeover (Yudkowsky, Bostrom), What Failure Looks Like, parts 1 and 2 (Christiano), and gradual disempowerment (Kulveit). We are currently performing a substantive literature review and building a taxonomy that aims for a complete list of existential threat models.
Once the literature review is complete, we want to continue with the next steps: working towards researcher consensus on the pathways and assumptions underlying each threat model (key cruxes), and on threat model-dependent AI alignment, AI governance, and offense/defense balance.
As a personal impression, I have observed that researchers working in these fields often talk past each other, because they have different threat models in mind which are not made explicit, yet have major downstream consequences. Threat model confusion is even greater among policymakers and the general public, where in my opinion it does significant damage. As long as we lack clarity on exactly which problem we are solving, how can we expect to converge on a good solution?
I hope that approaching these questions in an explicitly threat model-dependent way can help to deconfuse subfields like alignment, governance, and offense/defense balance. For alignment, one might ask to what extent the concept of alignment is defined, relevant, and helpful for different threat models, and, insofar as it is helpful, which type, accuracy, and robustness of alignment may be required to prevent each threat model. For governance, one might propose novel policy measures aimed at reducing the risks originating from specific threat models, and compare existing measures. For offense/defense balance, one might ask whether a specific type and application of AI corresponding to a certain threat model will be offense- or defense-dominant.
We currently have an internship position open (deadline 11 May, details here) to help us with this work. If you are interested, we encourage you to apply. However, the internship format may not make sense for everyone.
If you are sympathetic to our threat model-dependent work, think you have relevant input, and would like to help us succeed, you can also reach out directly: info@existentialriskobservatory.org.
Thank you for your time!