AI Security Has Finally Become an AI Safety Concern — But I’m Not Sure We Mean the Same Thing

Harriet Farlow

Rejected for the following reason(s):

This is an automated rejection. No LLM generated, assisted/co-written, or edited work.

Read full explanation

TLDR: AI security used to be mostly ignored by AI safety people. Now it suddenly matters a lot, but I’m not convinced we’re talking about the same thing. Increasingly, “AI security” seems to mean protecting frontier labs like OpenAI from China stealing their models. But the version of AI security I’ve spent years working on is about the fact that AI systems themselves are often fundamentally insecure, easy to manipulate, and being rapidly deployed into environments that actually matter. My concern is that we’re focusing so heavily on protecting the labs that we risk ignoring whether the technology itself is secure in the first place.

Why I’m writing this

I’ve been thinking about writing this post for a long time now. I’ve been working in AI security for six years. Back in 2020, it sat awkwardly outside mainstream AI safety discussions and was not considered an EA cause area. Now it’s suddenly a major concern. But I think we still fundamentally disagree on what “AI security” — what we’re securing, and who we’re securing it from — actually means.

I came into this space through cybersecurity and national security. Around ten years ago, I started working as a data scientist in the Australian Department of Defence, and by 2023 I was Acting Technical Director of the AI Hub in the Australian Signals Directorate — Australia’s equivalent of GCHQ (UK) or the NSA (US).

My PhD is also in a field called adversarial machine learning, which focuses on methods that cause machine learning models to not behave as intended: disrupting them so they fail, deceiving them into producing incorrect outputs, or disclosing sensitive information. Prompt injection is one example of an adversarial machine learning technique, but there are over a hundred others.

I give this background because it explains how my understanding of AI security is shaped.

Back in 2021, data scientists were starting to realise that the models they were building were not only vulnerable to safety and ethical issues — where the model intentionally or unintentionally harms people in its environment — but security issues too, where some external actor (human or AI) actively tries to manipulate the model itself. However it was seen as a challenge that wouldn’t be relevant for a long time.

Cut to 2026, and suddenly AI security is a massive topic of conversation because of agentic AI and increasingly capable base models. But it’s often not the same AI security that earlier researchers were talking about.

In 2023 I founded Australia’s first (and still only dedicated) AI security company, Mileva Security Labs, to help cybersecurity folk understand how to secure AI systems too. I spend most of my time helping cybersecurity professionals understand the unique risks posed by AI so they know how to ensure their existing cybersecurity controls work on AI systems.

For example, in traditional cybersecurity if there is a code vulnerability that lets some hacker break into a system and exfiltrate (steal) a bunch of data, this would lead to massive impacts. Sensitive medical records, financial information, private messages, government documents, or customer identities could end up leaked online, sold to criminals, used for blackmail, or exploited by hostile states. Companies might face huge regulatory fines, lawsuits, reputational collapse, or even national security consequences depending on the data involved. This risk would be “patched” by some other piece of code.

However with AI systems, hackers can prompt engineer models to leak sensitive data because of the way machine learning models retain knowledge and are inherently probabilistic (rather than deterministic). This cannot be solved by a single code patch alone. It may require the model to undergo retraining, change the system prompt, or this risk may just need to be accepted.

So a control that prevents the same issue — data breaches — are totally different for traditional IT systems versus systems that comprise machine learning models. This is an example of an AI security issue.

If this doesn’t resonate with you, it may be because you’re used to hearing about AI security in very different ways. In my experience, AI security has three different definitions.

AI security definitions

AI security = securing AI systems from external threats (like the example above, and where most of my work sits).
AI security = AI for defensive or offensive security. For example, Claude’s Mythos as an AI system that can identify code vulnerabilities that can later be exploited, or shared with software providers so they can be patched.
AI security = ensuring frontier AI labs have good cybersecurity so their models can’t be stolen by other nation states.

I mentioned that AI security is finally being taken seriously, including in EA circles. However, most of the time I notice that this interest is in the third definition of AI security, specifically: so that China can’t steal OpenAI or Anthropic models.

Now as a cybersecurity person, I obviously firmly believe that every company should have good cybersecurity, including frontier AI labs. However I’m concerned about this framing for a couple of reasons.

First of all, it changes the referent object: the thing that is being protected. Instead of focusing on securing the AI system as a technology, it focuses on securing the organisation building them. This means there is a risk that security issues that arise from the uniquely different properties of machine learning models are ignored, including their probabilistic nature, dependence on massive amounts of training data, and liability to drift over time.

In the frontier-lab framing, the threat model is largely about humans misusing AI systems. The concern is hostile states stealing model weights, dangerous capabilities proliferating, or sophisticated actors gaining access to systems they shouldn’t have access to.

But the AI security world I came from was focused on something slightly different: harm emerging from the insecurity of the AI systems themselves. I worry that as AI security becomes increasingly framed through a national security lens, some of these inherent weaknesses in machine learning systems risk being sidelined. The conversation becomes heavily focused on protecting frontier labs from external actors, rather than asking whether the systems themselves are secure, robust, controllable, or resilient under adversarial pressure.

This distinction matters because the mitigations, expertise, and failure modes are completely different depending on which definition of AI security you adopt. If AI security becomes defined primarily as “protecting frontier labs from hostile states”, then a huge category of attacks and harms affecting actual deployed AI systems risk being deprioritised.

For example, we are already seeing real-world attacks that exploit the unique properties of machine learning systems rather than traditional software vulnerabilities: prompt injection attacks that manipulate LLM behaviour, model poisoning attacks that corrupt training data, jailbreaks that bypass safeguards, adversarial examples that cause models to misclassify inputs, and malicious open-source models uploaded to repositories like Hugging Face that execute arbitrary code during deployment. None of these problems are solved simply by giving frontier labs better perimeter cybersecurity.

If AI security conversations become dominated by frontier-lab geopolitics, we risk underinvesting in the operational security challenges that will shape how millions of organisations actually deploy AI systems over the coming months.

My second concern is that this framing can politicise AI security in ways that justify extraordinary interventions. As an ex-national security person, it probably goes without saying that I believe governments absolutely have a role to play in AI security. But I’m also wary of what happens when a technology becomes so tightly tied to national security interests that it effectively becomes treated as a strategic state asset.

Why the securitisation of AI is concerning

A precedent for this is encryption technologies, and the “crypto wars” of the 1990s, in which end-to-end encryption basically meant governments couldn’t easily wiretap phone lines of suspected criminals, even with a warrant. Governments argued that strong encryption threatened national security because it prevented intelligence agencies from accessing communications. The proposed solution was exceptional access and backdoors. But of course, backdoors don’t only work for governments. They create vulnerabilities that malicious actors can exploit too. The securitisation of encryption created its own security problems.

I worry we could repeat similar patterns with AI, but at a much larger scale. Once a technology becomes framed primarily as a national security asset, the conversation becomes less about building secure and trustworthy systems for society, and more about maintaining strategic advantage over adversaries.

You can already see this emerging: export controls on chips, increasing secrecy around frontier models, debates about open sourcing, and discussions around government oversight of advanced systems. Some of these measures may absolutely be justified. But securitisation also changes incentives in potentially dangerous ways.

If AI capability becomes tightly linked to national power, there is pressure to deploy systems faster, centralise access, tolerate greater risk, and prioritise capability advantage over robustness or security.

With AI, I could imagine similar arguments emerging around mandatory monitoring, government access, restrictions on open models, or centralised control over compute and deployment. And ironically, concentrating so much power and capability into a handful of strategically important labs may itself create new security risks too.

I’ve been adjacent to EA and AI safety circles for a few years now and have felt ideologically aligned in many ways. But I’ve also sometimes felt frustrated by the idea that the way I define AI security is not really considered a mechanism for AI safety.

The argument is usually that this kind of work is too short-term, too focused on current enterprise systems and existing models rather than frontier systems, and too concerned with present-day harms rather than future catastrophic risks from superintelligence.

And I do understand that perspective. But I think it risks overlooking something important: we cannot have safe AI without secure AI.

Regardless of which nation develops the most capable models, if we don’t leverage the existing cybersecurity toolkit — a field fundamentally built around adversarial pressure under uncertainty — then AI systems will remain vulnerable to manipulation by criminals, hackers, insider threats, and advanced persistent threats. Cybersecurity already has mature disciplines around red teaming, incident response, threat intelligence, secure architecture, risk analysis, and adversarial thinking. These are incredibly valuable capabilities for AI systems too.

At the same time, I think cybersecurity people have often been left out of AI safety conversations because the language used by the two communities is so different. A lot of cyber people hear AI safety framed in ways that feel abstract, philosophical, or disconnected from operational reality — “Terminator doom stuff”, essentially.

But whenever I talk to cybersecurity practitioners, it’s obvious to me that many of them do share the same underlying concerns as AI safety researchers: loss of control, adversarial misuse, dangerous incentives, fragile complex systems, and unpredictable behaviour under pressure.

At the end of the day, I’m writing this post because I’d love to spark more conversations with EA-aligned folk interested in AI security on these definitional challenges and how we can overcome them in a way that acknowledges the operational realities of our geopolitical climate, but still strives towards a safer and more secure AI future.

Would love to discuss below in the comments!