Overview
In this review sequence, I lay out why I think security, cryptography, and auxiliary approaches are still neglected in AI safety, which specific insights those communities can offer for AI safety, and some potential next steps.
I am writing this post now because the topic seems to have become a more pressing focus within EA/AI safety, but there are some issues and approaches I haven't seen much discussion of here yet.
I am not an expert in any of the fields mentioned here, but Foresight Institute, the non-profit I manage, has a long-standing security community that cares about AI safety, so I hope to provide a few bird's-eye-view reasons and examples gathered from discussions within that community.
Thank you to many of the people mentioned throughout this sequence for reviewing my characterization of their efforts. Any mistakes are my own, and I encourage general feedback, criticism, and pointers to relevant research and efforts.
The first four parts of the sequence highlight promising areas to consider:
Part 1 | AI infosec: first strikes, zero-day markets, hardware supply chains
Part 2 | AI safety and the security mindset: user interface design, red-teams, formal verification
Part 3 | Boundaries-based security and AI safety approaches
Part 4 | Specific cryptographic and auxiliary approaches to consider for AI
Part 5 closes with a summary and potential next steps.
I plan to publish the five parts of this sequence spaced out by a few days, starting with part 1 today. All of them should be relatively understandable as stand-alone pieces and will be linked here. Feel free to skip around to the parts that you find most interesting.