This is a post by Abbey Chaver from Coefficient Giving (formerly Open Philanthropy). I recently did a relatively shallow investigation on the state of Infosec x AI. My research consisted of identifying the main GCR-relevant workstreams, looking at relative neglectedness, and trying to understand what skills were most needed to make progress.
The post below reflects this research and my opinions, and shouldn’t be seen as an official cG position.
Summary
There are several high impact, neglected problems that need cybersecurity work. Specifically, I think that more people should work on what I call the AI Infrastructure Security Shortlist:
Securing model confidentiality and integrity
Authorization solutions for misuse
Secure compute verification
Protocols for securely using untrusted AI labor
Preventing, detecting, and responding to rogue deployments
The short pitch for working on AI security is, “If AI gets really powerful, we should be really careful about who has access to it, and whether or not it can take unauthorized actions.”
I think that areas like AI red-teaming, cyber evals, and securing critical infrastructure are important, but relatively well-resourced with funding, talent, and awareness. Therefore, I don’t recommend them as a top priority.
I think these shortlist problems are important, but not necessarily easy to solve. It’s very high leverage for people to break these problems down and research what’s feasible. I’d like more people to work on them and share their findings to help guide the rest of the ecosystem.
The theory of change for this shortlist is something like:
A policy research org (eg, RAND) identifies some meaningful objective (like securing model weights), and proposes some strategic solutions, but does not pursue implementation.
A technical org (eg, Frontier AI companies; or a startup, such as Irregular) does the R&D to develop those proposals into usable solutions.
Once solutions exist, they are adopted. This might happen via sales, incorporation into industry standards or open source software, or potentially regulation.
To accomplish this, the AI infra security shortlist seems most bottlenecked on people with experience in:
Security engineering and research, especially at hyperscalers or in national security contexts
AI infrastructure engineering (including systems and hardware)
Standards development
Leadership and entrepreneurship, to start and lead workstreams
If you have those skills, I think it’s very impactful to work on one of the problems listed above.
If you want to work on the shortlist, and especially if you have one of these backgrounds, here are some next steps I recommend:
Apply to the organizations currently working on the shortlist (see Appendix A)
Connect with advisors at Heron, which specializes in Infosec careers.
If you think you can contribute to the field, but don’t have a concrete project or job opening in mind yet, or want to prepare to launch a new org, you can apply for funding to help you transition through our Career Development and Transition Funding program.
If you’re ready to launch a new org or work on a substantial project in need of funding, consider applying to our AI Governance Request for Proposals
We are also interested in funding fieldbuilding work to solve these problems, like training security engineers in AI risk, or creating online and in-person forums for collaboration.
State of the Field
What does Security for Transformative AI mean? Even ignoring the recent use of “AI Security” in place of “AI Safety” in the policy world, security covers a huge surface area of work. I’m using the classical definition of “protecting information systems against unauthorized use.”
In the context of transformative AI, I think there are three categories of problems where infosec work is critical:
Securing AI and compute from bad (human) actors
Securing systems and compute from rogue AI
Responding to AI cyber capabilities
I’m focusing specifically on infosec techniques, so for this analysis I’m excluding work that relies on heavy ML research (like interpretability, alignment training, or scheming experimentation), although of course there are areas of overlap.
To figure out what the priority areas are, I tried to identify the most important workstreams and compare them to the current level of investment and attention (full results are here). Here are my main takeaways:
Securing AI and compute from bad actors, aside from red-teaming, is highly neglected.
Securing systems and compute from rogue AI is very neglected (and somewhat outside the policy Overton window). I think these problems are highly unlikely to be worked on outside of the AI Safety community.
Responding to AI cyber capabilities has the most mainstream awareness and resourcing. It will likely receive additional funding from the USG and OpenAI’s resilience fund.
There’s a cluster of important, neglected problems here that can be summarized as “securing AI infrastructure,” so that’s what I’ll mainly focus on for the rest of this post.
The problems are:
Securing model weights and algorithms from exfiltration
Authorization solutions for misuse
Secure compute verification
Protocols for securely using untrusted AI labor
Preventing, detecting, and responding to rogue deployments
I’m estimating that about 120 people have worked on significantly advancing these fields with regard to existential risk, and the FTE equivalent is probably something like 40-60. Given its importance, I think more people should work on AI infra security on the margin. So how do we make progress?
Progress strategy
Recapping the theory of change:
A research org defines an important problem and proposes some strategic solutions.
A technical org does R&D on usable solutions.
Industry or regulatory standards spur adoption.
Let’s look at each of these steps.
Strategy Research
The field of policy research orgs (like RAND and IAPS) is the most mature, with a number of orgs at a point where they are producing well-received work and can absorb more talent. These orgs need people with strong technical security backgrounds, GCR context, and policy skills. National security experience and AI lab security experience can make this work meaningfully stronger.
This work is very high leverage: by defining problems and success criteria clearly, breaking problems down into tractable directions, and creating a shared terminology across policymakers and builders, strategy research unblocks the next steps.
Some examples of this are the Securing Model Weights report by RAND, which was adopted as a voluntary industry standard and is present in the AI Safety policies of frontier developers including Anthropic, OpenAI, and DeepMind; and the Location Verification report, an idea first publicly promoted by IAPS that was later mentioned in the Trump AI Action Plan and subsequently developed as a feature by Nvidia.
Technical Implementation
Working in security at a frontier lab is the most direct way to work on these problems, though that work is more focused on short-term, urgent implementation. It's a great option, but there's also important work to be done outside the labs, especially to de-risk solutions that will be needed in the future.
Outside of labs, the space of technical implementation orgs is pretty underdeveloped, and for many of the proposals coming out of policy research orgs there is no one working on implementation at all. It would be great to have more orgs doing R&D. One big factor in whether they can succeed is whether their solutions can easily be adopted by frontier labs.
These orgs need people with strong security engineering skills across the ML stack to do the R&D, and feedback or in-house expertise on AI labs to make their solutions usable. They also need nation-state-level offensive security expertise to ensure their solutions are robust.
There’s a variety of approaches for technical implementation outside of labs. If you’re thinking about doing work in this space, you should consider different structures:
A non-profit R&D organization, possibly structured as an FRO (focused research organization), which produces technology for public access. For example, the Lean FRO.
A for-profit consultancy, like Amodo Design, that is contracted to develop projects given a specification. These prototypes can be used to demonstrate what’s possible to a policy audience, or be used as a basis for scaled production by another company (like a datacenter).
A for-profit startup, like Irregular, that intends to develop, sell, and scale security technology.
Adoption
There’s some advocacy for adoption happening in policy research orgs, but there are many gaps to fill. We don’t have much in the way of industry convenings, and we don’t have many technical standards that are ready to be adopted into legislation. The SL5 Task Force is an example of a technical org that takes advocacy seriously – seeking input from both national security and frontier lab stakeholders to develop adoptable solutions.
Startups necessarily have to do advocacy – you can’t make money without doing sales! Therefore, I’m pretty excited about seeing more startups working on these problems. However, there can be a large gap between current market demands and, for example, SL5-level protections, so it might not always be helpful. In cases where incremental security is valuable, and short-term adoption improves the cost or effectiveness of the eventual solution, I think it’s a good approach.
For policy advocacy, there’s a need for both policy drafting (writing usable standards and legislation), and policy entrepreneurship (communicating problems and solutions to congressional staff and executive branch regulators, and iterating the approach based on policymakers’ feedback). Building industry buy-in is also a major lever for policy advocacy.
Talent needs
AI infra security seems to be most bottlenecked on:
Midcareer security engineers from frontier lab, hyperscaler, or national security backgrounds
People who have worked in depth with ML systems, especially on infra, systems, and hardware
People who can do effective policy or industry advocacy, including standards writing
A few other types of talent that will be useful:
Formal verification experts: a promising avenue for achieving really high security standards
Academic security researchers: similarly helpful for really secure mechanism design, though not always that easy to integrate into successful R&D for adoption.
Entrepreneurs: leadership experience, persistent iteration towards product-market fit, and marketing and sales competence are highly valuable for this theory of change
If you have these skills, I think you should strongly consider working on the shortlist!
Next Steps
(Repeated from the summary)
Apply to the organizations currently working on the shortlist (see below)
Connect with advisors at Heron, which specializes in Infosec careers.
If you’re ready to launch a new org or work on a substantial project in need of funding, consider applying to our AI Governance Request For Proposals
We are also interested in funding fieldbuilding work to solve these problems, like training security engineers in AI risk, or creating online and in-person forums for collaboration.
Thank you for reading!
Appendix A: State of the Field Table
These estimates were based on listing out organizations and then estimating the number of contributors on relevant projects at each organization. I likely missed independent or non-public research. These figures also do not reflect FTE commitments.
1. Securing AI and compute from bad actors

Securing model weights and algorithms from exfiltration (privacy)
Solution examples: SL5 implementation; regulations on high security standards for labs
Estimated contributors: 30-40

Securing model integrity
Solution examples: Training data filtering and provenance; threat modeling for sabotage during development
Estimated contributors: 30-50

Authorization solutions for misuse
Solution examples: Misuse compute governance (eg, on-chip safety protocols for open source models); KYC / authentication / licensing for model usage or development; HEMs like flexHEGs; preventing stolen models from being used at scale
Estimated contributors: 2-10

Secure compute verification
Solution examples: Datacenter-level verification of training / inference tasks; preventing tampering of verification tooling
Estimated contributors: 20-30

AI red-teaming
Estimated contributors: 200

2. Securing systems and compute from rogue AI

Protocols for securely using untrusted AI labor
Solution examples: Design and implementation of fine-grained permissions and identity that works for AI laborers; monitoring AIs within a lab for rogue actions
Estimated contributors: 5-15

Preventing, detecting, and responding to rogue deployments
Solution examples: In-lab monitoring of network boundary, compute usage, and heavy use of sandboxes; response playbook and mechanisms; secure logging of misaligned behavior in real-world scenarios
Estimated contributors: 1-5

3. Responding to AI cyber capabilities

Cyber evals
Solution examples: Benchmarks like CVEbench; honeypotting for AI agents
Estimated contributors: 70-80

Tracking AI-enabled attacks in the wild
Solution examples: Research about how to recognize these attacks in the wild; incident reporting framework for targets and AI labs
Who's working on it (non-exhaustive): Palisade, Google and Anthropic sort of tracking (but not the ideal orgs to track), OECD, OCRG
Estimated contributors: 5-10

Securing critical infrastructure
Solution examples: Rewriting critical infrastructure code in Rust
Who's working on it (non-exhaustive): Atlas, Delphos, CSET, Center for Threat-Informed Defense, DHS, DARPA
Estimated contributors: 150-200

Securing AI-generated code
Solution examples: Formal verification of AI-written code; AI-driven code testing and code review; AI-driven pen-testing
Who's working on it (non-exhaustive): Theorem labs, DARPA, Galois, Trail of Bits, various security startups (some overlap with securing untrusted AI labor)
Estimated contributors: 120-150

Epistemic Security
Solution examples: AI watermarking; provenance for digital evidence; AI-secure identity management and authentication
Who's working on it (non-exhaustive): C2PA, GDM, DARPA SemaFor, various startups
Estimated contributors: 150-200
Appendix B: Choosing what to work on
If you have any of these priority skills, I think you can work on any of the shortlist problems. I’d think more about which surface area to focus on based on your background – for example, if you have hardware experience, I hope you focus on developing hardware security mechanisms in datacenters, rather than new monitoring software for frontier AI labs.
If the problem you choose ends up being intractable, you should be able to pivot pretty easily to another, because there’s a lot of overlap in the surface areas and techniques.
It makes sense to choose problems based on which threat actor you’re more worried about (rogue AI, nation state adversary, terrorist), but again a lot of work will be robustly useful against all three.
For any given problem, paying attention to the overall maturity is useful – where is it in the theory of change? Which part of the problem is currently most tractable?
If you want to work on something not on the shortlist (including things I didn’t even list in this analysis), that might be really good! This was a pretty shallow investigation, so exploring other directions in more depth could be valuable (and you should share your work to help others decide).
Appendix C: Comparing to other AI Safety work
I’ve made mostly neglectedness arguments for working on these problems. For a visual illustration, here’s a graph of the number of technical AI safety researchers by area (not compiled by me):
(I don’t think this data is comprehensive, but it provides a rough idea.)
Beyond neglectedness, Infosec work has some other nice properties:
Infosec problems are well-defined and a lot of techniques have already been developed by smart people, so you can make concrete progress on increasing security (compared to other technical AI Safety work that’s more speculative)
This shortlist is most important in a world where we are not able to solve alignment before we reach a dangerous level of AI capability
This shortlist work is less dependent on the specific implementation of AI (compared to, eg, Interpretability), so it’s useful in more AI paradigms than the current LLM paradigm
Some arguments against working in Infosec are:
A truly superintelligent AI will be able to evade any security protections we put in place, so we should only focus on aligning AI
Infosec is intractable for other reasons (cost, human persuasion, etc)
There are other subfields that are even smaller and equally or more important
Increasing security may increase some other risks, like concentration of power or a preemptive strike
I’m probably not providing comprehensive arguments against, and I think these takes are all reasonable. But hopefully, the arguments in favor provide enough grounding to seriously consider whether you should work on Infosec.