Models finding software vulnerabilities is not the primary source of cybersecurity risk

by lc

14th May 2026

I have tried and failed to write a longer post since 2024, so here goes a short one with less detail.

Discourse has primarily focused on models' ability to develop new exploits against important software from scratch. That capability is impressive, but the tech industry has been dealing with people regularly finding 0-day exploits for important pieces of software for more than twenty years. Having to patch these vulnerabilities at a 10xed or even 100xed cadence for a fixed period of time is well within the resources of Mozilla, the Linux Foundation, and Microsoft. The lag time between "patch shipped" and "patch reverse engineered and weaponized by a criminal organization" was already so long that most people didn't notice new bugs when they came out. And such capabilities are dual use; defenders already have access to them and will be using the models to prevent their engineers from releasing new bugs.

There are lots of capabilities that are not like this, however:

1. Weaponizing recently patched exploits for common software. Right now, for widely used C projects, we get enough publicly disclosed vulnerabilities to develop exploits with. Every amateur computer hacker has had the experience of seeing a CVE for a version number currently in use by a service, getting excited, and being surprised when it's totally useless.

Part of that is because lots of CVEs are inflated, but part of it is just that modern memory protections mean that it's months of hard work to actually exploit these "high" severity CVEs for your favorite product. But AI reduces that down to hours, and will even help you with worm development if you want. If the output of "stop-the-presses" vulnerabilities for SaaS vendors slows down to its 2025 cadence, but each such vulnerability is now exploited in the wild as soon as open source projects ship a patch, that seems like a lasting loss for defense.

2. AI-enabled social engineering. Right now black hat groups can only run really sophisticated, long term compromise attempts on Guillermo Rauch. But when the bottleneck for conducting such attacks is no longer labor, and the price of intelligence gets low enough, they can and will begin elaborate campaigns to scam and hack more ordinary people. The ease with which attackers will soon be able to generate high fidelity internet legacies, and the amount of apparent effort they will be able to put into creating relationships with marks, will break a lot of people's assumptions about trust over the internet at the same time, especially normies who do not read or care much about AI, and might be behind the times on what people can do.

3. "Post-exploitation", or "the stuff attackers do after they already have access to a given target and can begin exploiting their resources, public projects, etc.", is IMO the most important risk. There is a fundamental logistical problem that every botnet operator throughout history has run into called "command and control", or roughly, "how to maintain control over and reap rewards from all of the stuff you've pwned." It used to be that the worst thing these people could do was launch DDoS attacks against a particular target, or send spam email, or scrape for crypto and credit cards, because even if you manage to create a worm that spreads over the internet, you can't personally review every individually owned laptop for goodies.

But if you put your sociopath goggles on, the average software engineer sure has access to a lot of software repositories that could in principle be leveraged for further attacks, along with many online accounts & emails with which to abuse trust relationships. With sufficiently intelligent post exploitation, an AI could probably hop from that software engineer to >2 close friends with similar access and keep going, actively hacking each target until a large percentage of the wider graph is compromised. And once it is, AIs can be a lot more strategic & effective about making money from each target in the most profitable way.

These vectors, which have already gotten worse over the last six months, will become a more pressing issue over the next 24 months, and are more important causes for concern than vulnerability research, in part because they have no obvious solutions like vulnerability research does.

[Thanks to Chris Hacking (real name) for talking through some of these ideas in conversation with me]

Computer Security & Cryptography, Language Models (LLMs), AI

25 comments

RHollerith, 2mo

Computers have been vulnerable for so long that many people conclude that the vulnerability is some sort of law of nature, but that is incorrect. Unhackable computing systems are quite possible, and society has been moving in that direction. It is unknown (at least to me) how far computing has to progress in the direction of greater security before subversion or compromise of computing systems (i.e., hacking) starts to become something the people running society -- well, let me specify US society to avoid making predictions about countries I know little about -- mostly no longer need to consider or to worry about, but it seems quite possible (but not probable) for it to happen in 5 years.

This is not at all a refutation or contradiction of the OP. I do not dispute that over the next few years there will be an increase in the rates of compromise and exploitation of computer systems.

"Society has been moving in that direction": 25 years ago, every digital consumer product that any manufacturer tried to lock to a particular operating system was jailbroken: gaming consoles, phones and computers. And of course the scheme designed by the motion picture industry to inhibit the unauthorized reproduction and distribution of films on DVDs was jailbroken, too. In contrast, there has never been a complete jailbreak for any version of iOS 17 or iOS 18 running on an iPhone 14 or 15 -- at least not one that has been published. iOS 17 was released on September 18, 2023. Also, the Xbox Series X became available to buy worldwide on November 10, 2020 and has never been jailbroken -- or if it has, that jailbreak was kept private.

roha, 2mo

I agree that software vulnerabilities are not a law of nature but essentially a skill and resource issue. If mankind manages with the help of AI to create operating systems and applications without any exploitable bugs, which is at least a conceptual possibility, there's still the hardware layer and the social layer that can be targeted. I think hardware can in principle be fixed as well, though at a slower pace that might give attackers a relevant advantage. I don't think human users can possibly be fixed. So points 2 and 3 of the OP look to me like permanent issues we didn't have before and won't get rid of, i.e. an irreversible change of the game state. I suppose the larger issues will come in other fields though, where hardening potential is equally or more limited and potential damage is much larger, e.g. in biosecurity and autonomous weapon systems.

RHollerith, 2mo

I agree with you that (until there is an intelligence explosion which drastically changes society and people) the social engineering of people won't stop being a major source of vulnerability. Thanks for adding that. I do see many opportunities to harden a system (e.g., an organization, the Linux kernel project or Less Wrong) composed of people and computers to make the subversion of people much less of a big deal.

I also agree that autonomous weapon systems will probably prove a "larger issue" (to use your phrase) over the long term (i.e., the next 25 years) than the problems described in the OP. I don't know enough about biosecurity to have an opinion worth publishing.

But I would've guessed that "the hardware layer" will prove easier to secure than "operating systems and applications" will. Although the best-known hardware platform, namely x86_64, has big problems, Apple, for example, is doing well in securing its hardware. The fact that I have physical possession of an iPhone and am able to disassemble and re-assemble it does not enable me to jailbreak it (i.e., to get it to run an OS I specify instead of the one Apple specified). Also, even though the Chinese certainly have physical access to basically all iPhones, only the most valuable targets (e.g., those responsible for IT for US senators and senior members of the administration) need to worry that the Chinese government can spy on iPhones sold to Americans (or Germans). Apple's got that covered: the data on the buses is encrypted, and anyone who tries to tap the unencrypted data in an IC will damage the IC so that it no longer works.

roha, 2mo

My thought process on securing hardware: If SOTA models can find obscure vulnerabilities in software as well as attack strategies that exploit one or several of them, I assume mankind cannot be far from having models that are able to discover novel hardware problems (e.g. something like GPUHammer) and utilize them, though the feedback loop for experimentation might be much trickier to set up than in the software case. If some of these new hardware flaws can't be fixed by a firmware update or by disabling problematic functionality on critical infrastructure, then physical devices will need to be replaced, which in my model of the world should happen at a much slower pace than the writing and distribution of software patches. If defenders have an advantage by getting earlier model access, it could be negated if downstream fixes can't arrive fast enough to outpace the attackers.

jmh, 2mo

Would the INTEGRITY RTOS from Green Hills fit that bill?

roha, 2mo

I'm not familiar with it. I'd guess that a formally verified kernel would be a solid first step towards a secure operating system that even successor models of Mythos won't be able to attack (sans hardware vulnerabilities that can be exploited by software and can't be captured by a formal specification).

Random Developer, 2mo

Unhackable computing systems are quite possible, and society has been moving in that direction...

...but it seems quite possible (but not probable) for it to happen in 5 years.

Strong disagree on timeline, partial disagree on possibility.

The biggest chunk of security bugs, about 70%, essentially comes down to memory bugs and pointer bugs. These can be fixed at the language level, either by using higher-level languages or by using memory-safe languages like Rust. The other 30% of bugs are more varied, and there's no single, simple mechanism that can rule all of them out.
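As a rough illustration of what "fixed at the language level" means, here is a minimal sketch in Python, standing in for the "higher-level languages" mentioned above. The function name `read_field` and the sample bytes are made up for this example. In C, reading one element past the end of an array is undefined behavior; here the runtime checks the index and raises an error instead.

```python
def read_field(buffer: bytes, index: int) -> int:
    """Return one byte from the buffer.

    Python checks the index against len(buffer) at runtime, so an
    out-of-range read raises IndexError rather than touching other memory.
    (read_field is a hypothetical helper used only for illustration.)
    """
    return buffer[index]


packet = bytes([0x10, 0x20, 0x30, 0x40])

# Index 3 is the last valid position in a 4-byte buffer.
last_byte = read_field(packet, 3)

# Index 4 is one past the end; the read is rejected, not silently performed.
try:
    read_field(packet, 4)
    overflow_rejected = False
except IndexError:
    overflow_rejected = True

print(last_byte, overflow_rejected)
```

This only covers the out-of-bounds class of bug. It says nothing about the remaining logic errors, which is the point of the 70/30 split above.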

The second problem is that we have literal decades of core infrastructure written in C, C++, and other vulnerable languages. The risks here can be mitigated. But recent AI systems have been finding bugs in prominent open source code that have been there since the 90s. If our existing mitigations were good enough in practice, those bugs would have been fixed years ago.

So, given enough time and money (and AI support), yes, we could Rewrite Everything in Rust. But I wouldn't be surprised if this cost more than a trillion dollars or took decades to close the 70% of holes that can be closed that way.

The story with the video game consoles is genuinely impressive, though it's unclear to me whether state actors have made a serious attempt to jailbreak the Xbox. There have been companies selling iPhone security bypasses to governments, which is where I'd look for serious attempts against a moderately locked down platform.

sanxiyn, 1mo

An operating system that is unhackable for all practical purposes already exists: seL4. DARPA funded the HACMS program to use seL4. It worked.

seL4 is formally verified for functional correctness, meaning the implementation corresponds to the specification. This verification eliminates all implementation bugs. It does not eliminate specification bugs, but the seL4 project also proved integrity, availability, and confidentiality of the OS. The definition of integrity/availability/confidentiality is small enough to be manually reviewed, and was in fact widely reviewed (including by me).

So there IS a single mechanism that can rule out nearly all of 30% of bugs left over after 70% of memory safety bugs are fixed. Formal verification for functional correctness is not particularly simple, but it is demonstrably possible as seL4 proved, and AI assistance will make formal verification proof engineering cheaper.

Random Developer, 1mo

seL4 is a fantastic piece of technology. The problem is that it is a small, highly specialized kernel: seL4 is about 10,500 lines of code, built at a cost of about $350-400 per line of code. Apparently they found that verification cost scales supralinearly with size, too.

In practice, seL4 deals poorly with the fact that most hardware is itself bug-infested garbage, and that much of it has enough DMA or other access to cause mischief behind the kernel's back.

The Linux kernel itself is now about 40 million lines of code, much of it in drivers. Using an extremely approximate methodology, the core of a modern Linux environment is likely over 250 million lines of code. Various wild-ass guesses place the total amount of code in the world at anywhere from hundreds of billions of lines to trillions of lines.

Redesigning and rewriting the entire Linux kernel to seL4 standards would cost around $16 billion using an extremely naive linear extrapolation. Rewriting a distro core brings us to around $100 billion. Rewriting all the software in the world would bring us to an exceptionally wild-assed guess of, say, 500 billion lines of code times $400/line, or $200 trillion. Give or take some zeros, to be clear.
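The back-of-the-envelope arithmetic above can be reproduced in a few lines. This is only a sketch of the naive linear extrapolation: the line counts and the $400-per-line figure are the rough estimates quoted in this comment, and the names `ESTIMATED_LINES` and `linear_cost_usd` are invented for illustration.

```python
# Naive linear extrapolation of seL4-style verification cost.
# All inputs are the rough estimates quoted in the comment, not measured data.
COST_PER_LINE_USD = 400  # upper end of the quoted $350-400 per line

ESTIMATED_LINES = {
    "Linux kernel": 40_000_000,
    "distro core": 250_000_000,
    "all software (wild guess)": 500_000_000_000,
}


def linear_cost_usd(lines: int, cost_per_line: int = COST_PER_LINE_USD) -> int:
    """Cost if verification effort scaled linearly with code size."""
    return lines * cost_per_line


for name, lines in ESTIMATED_LINES.items():
    print(f"{name}: ${linear_cost_usd(lines):,}")
```

Since the comment also notes that verification cost reportedly grows faster than linearly with size, a linear model like this is, if anything, on the optimistic side.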

Now, let's look at how these numbers are changing. GitHub is desperately trying to increase capacity for the agentic era:

We started executing our plan to increase GitHub’s capacity by 10X in October 2025 with a goal of substantially improving reliability and failover. By February 2026, it was clear that we needed to design for a future that requires 30X today’s scale.

So the amount of code that would need to be secured is increasing drastically. But is this new agentic code more or less secure than the old human-written code? Let's look at a recent comment thread on Hacker News talking about how many companies' engineering standards are changing in response to agentic coding, e.g.:

I'm seeing so many of these come in with "this is 95% done, just need a couple of minor tweaks for production release"

"Minor tweaks" being fix the layout so it's not messed up if the browser isn't exactly 1920px wide, sometimes these filters and sorting don't seem to work right and the app doesn't seem to refresh new values properly after an action.

No matter the issue it's pre-estimated by the business as "should be a quick fix, for an experienced dev" because they (allegedly) did 95% of the work already.

So my fear is that the widespread use of agentic coding will actually push us towards a world of vibecoded slop being pushed straight to production. This code will sometimes make extremely basic security errors, like hard-coding the database password in the publicly-readable UI code.
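To make the hard-coded password example concrete, here is a hedged sketch contrasting it with the usual alternative of reading the secret from the server's environment at startup. The variable name `DB_PASSWORD` and the helper `load_db_password` are hypothetical, chosen only for this illustration.

```python
import os

# Anti-pattern: a secret embedded in source is visible to anyone who can
# read the code, including client-side bundles served to a browser.
# DB_PASSWORD = "hunter2"  # do not do this


def load_db_password(env=os.environ) -> str:
    """Read the database password from the environment, failing loudly if absent.

    DB_PASSWORD is a hypothetical variable name used for illustration.
    The env parameter exists so the function can be exercised with a plain dict.
    """
    password = env.get("DB_PASSWORD")
    if not password:
        raise RuntimeError("DB_PASSWORD is not set; refusing to start without it")
    return password
```

A call such as `load_db_password({"DB_PASSWORD": "example"})` returns the value, while an empty mapping raises `RuntimeError`. This only helps if the function runs server-side, so the secret never ends up in code shipped to the client.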

So, yes, I agree that we do know how to build fairly secure software. I just find it profoundly unlikely that we'll actually do it: The cost would be unimaginably vast, and the rapid deprofessionalization of software engineering means that all the incentives currently point to far less security and 10x the new code per year.

Ben Livengood, 1mo

Would rewriting Linux to seL4 standards cost $16B in the world before or after frontier models are solving Erdos problems? If that's the cost in human SWE hours then it seems tractable to use agent harnesses and formal methods to achieve quite a cost reduction.

But also I don't think most people want Linux to seL4 standards (the Unix security model isn't great); there's probably more to be gained by finishing the network stack(s) for seL4 and implementing a bunch of network card drivers and a TLS library. That would enable IoT at least to have a pretty secure base to work from, and hopefully the harnesses and tooling for that work would also be available to application developers to verify at least their parsing and security checks for example.

Random Developer, 1mo

Would rewriting Linux to seL4 standards cost $16B in the world before or after frontier models are solving Erdos problems?

After. (EDIT: For the incredibly basic reason that frontier models are already solving Erdos problems, but the largest LLM-based piece of system software is only a borderline-functional dumpster fire of a C compiler. But keep reading.)

It's fantastically impressive that LLMs can solve Erdos problems! But:

Erdos problems are much smaller than the Linux kernel, and much easier to verify. Both these factors play to LLM strengths.
Not all Erdos problems are equally difficult. Some seem to be just difficult enough that a human would need to seriously dig in and grind it out, or bring a nice insight from a distant corner of math. Others are likely more difficult. I believe about 551/1217 are currently solved.
Reimplementing a highly secure kernel that could replace Linux for most use cases is an enormous project with many poorly defined subtasks, which is basically hell for LLMs. The biggest LLM project I'm aware of is a C compiler, which basically works but is apparently a remarkably terrible C compiler. And C is far, far better specified than the internal guts of Linux and all the ghastly hardware that Linux supports.

Also, please keep in mind that I'm willing to handwave a couple of zeros either way on the larger estimates.

I expect overall security of newly written code to get clearly worse on average (thanks to the deprofessionalization of software development and companies pushing vibeslop to production) right up until offensive Mythos-class abilities become widespread. Then I expect things to get unpleasantly exciting.

Nicholas Kross, 1mo

To me this is an even bigger reason for people to move to less-corporate-controlled hardware and software. (I'm wondering how e-waste will be treated if shipping slows from the oil thing...)

Raemon, 1mo

Curated. I'd thought about most of these in isolation before, but found it valuable to have them in one place while sizing up "what actually matters most about cybersecurity in the AI era?" (Tracking multiple concerns but keeping your eye on the most-important-balls seems like a good habit.)

lc, 1mo

time to let the fame change me

Oliver Sourbut, 2mo

Thanks to Chris Hacking (real name)

My school IT teacher was Mr Hacker, he was great

lc, 2mo

Case Study #3: Solving the "CVE Cold Case" CVE-2024-0519
Mythos further demonstrates its bug reproduction and exploitation capabilities on CVE-2024-0519¹², an in-the-wild exploited bug that has no public report nor a working PoC whatsoever in the public domain. This bug has gained notoriety due to how it persistently evaded reproduction attempts from various cybersecurity researchers, some referring to the bug as a "CVE Cold Case"¹³ after a year of reproduction efforts to no avail, and the bug is still being discussed to this day¹⁴.
Mythos, again out of 10 episodes total, reproduces the bug in a single episode. After 129 turns of LLM calls and 154 tool calls, it lands its root cause analysis and the trigger by demonstrating a differential abort (T4 diff), building up to the full T3 in-sandbox primitives. As even a PoC of this bug is still not public, we would like to avoid spoiling the fun and leave the exploit as an exercise for the human readers.

https://exploitbench.ai/blog/human-observations/

tr5tn, 1mo

Agree with pretty much all of this. The thing is, even the Anthropic Red report on Mythos is pretty clear that this isn’t simply more vulnerabilities. It’s improved discovery, exploitation, chains of exploitations, and fixing vulnerabilities. I find the primary discourse really frustrating because it sticks to the most quantifiable headline while avoiding the detail. Mythos is important, but it only tells us the current state of a very powerful general-purpose model and scaffold. But…

GPT-5.5 is benchmarked not far off it, and is already available in public (while there is gated Trusted Access for security testing scenarios, we can’t ignore that this was released, when Mythos wasn’t). And Opus 4.8 is out.
There have been many examples of less powerful (and 100x cheaper) models being used with cybersecurity-specific tooling, achieving similar results. The cost dimension is extremely important, since most attacks remain financially motivated.
Even six months ago, with Opus 4.5 (I believe) and GPT-4.1 the breach in Mexico was brutal and unprecedented in AI-assisted scope https://cdn.prod.website-files.com/69944dd945f20ca4a27a7c47/69d8bb5aea59e31efb3b8a7f_Tech_Report_ai_breach_mex_gov.pdf?trk=public_post_comment-text. This report is an essential read beside our current preoccupations, because it shows the scale of damage that was possible with less capable AI.
Multi-model harnesses like Microsoft’s MDASH show that the best/newest model can be matched or exceeded by ensembling.
All of this is against a backdrop of much shorter mean time-to-exploit periods (for all CVEs and for zero-days specifically). Mean TTE has dropped from 2.3 years to 24 hours over the last eight years. https://zerodayclock.com/
Mean time-to-remediate in most organisations is completely out of step with these changes. The bigger problem is applying fixes, rather than creating the fixes. The pressure on prioritising remedial efforts hits the limits of IT team understanding very rapidly in most organisations, and scaling to increased patching (or other mitigation) burdens with teams running on fumes is already a big problem, and is why we have most prominent security authorities agitating to get “Mythos-ready”.
Automation has a role to play here, but security fundamentals for update mechanisms (and code repositories feeding into package repositories) are pretty weak, and routinely being used as their own attack vector. Adaptation efforts aren’t as simple as routinely applying the updates (see the TeamPCP reference above).
As fixes are pushed more rapidly, there will be more breaking changes and other regressions. Testing patches before applying them is largely a myth. Most organisations won’t be ready to push updates cautiously in deployment rings, and disruption from updates will introduce opposing pressure to ignore updates.
…so imagine a best-of-breed cybersecurity-specific scaffold with multiple foundation models and potentially significant token budgets.
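To put the time-to-exploit figures quoted in the list above in perspective, here is a small calculation. It takes the stated 2.3 years and 24 hours at face value and assumes, purely for illustration, a constant yearly rate of decline across the eight years. The source makes no claim about the shape of the curve.

```python
# Figures quoted in the comment; the constant-rate model is an assumption
# made here for illustration only.
START_TTE_DAYS = 2.3 * 365  # mean time-to-exploit eight years ago
END_TTE_DAYS = 1.0          # 24 hours
YEARS = 8

# Overall reduction, and the per-year factor that would compound to it.
total_factor = START_TTE_DAYS / END_TTE_DAYS
yearly_factor = total_factor ** (1 / YEARS)

print(f"overall shrink factor: about {total_factor:.0f}x")
print(f"implied yearly shrink factor: about {yearly_factor:.2f}x")
```

Under that assumption the window shrinks by a bit more than half each year, which is the pace that remediation processes would need to match.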

We’re already starting to see the impact on patch volumes, and the deluge isn’t here yet. Most organisations have fundamentally weak protections to start. Many have vulnerability management efforts comprised entirely of automatic updates. Many apps never get updated, there is typically very poor visibility of update statuses, and most importantly, this often isn’t anyone’s job. There is insufficient staff, skill, understanding and maturity to adapt. Most leadership won’t prioritise adaptation quickly enough.

IMO, this isn’t as big of a problem for software updates from Glasswing vendors. For the most part, those update processes are the ones that will be in place. It’s everything else that will really be problematic.

At the end of this, the biggest question will be how many bad actors are willing to exploit the situation, because so much fruit is low-hanging.

catawampless, 2mo

The social/logistical aspects of cybersecurity vulnerabilities will accelerate greatly due to AI. I'd expect the response from tech-savvy organizations will be to increase the pace of software delivery - a long standing trend for other reasons. Continuous deployment, forced autoupdates, focused research on fraud and suspicious activity detection.

The main risks are around organizations that structurally cannot increase their pace. Think banks, aviation, medical systems, drug manufacturing: areas where, because the risks of vulnerabilities/defects have historically been extremely high, we intentionally require verification and slow down their development pace. If a vulnerability is discovered in these areas, they're precluded from responding with a patch the way SaaS vendors can.

I hope one of the major aims of the major labs' diffusion-focused deployment orgs is to help these institutions in particular. It's probably one of the higher-ROI places to be involved, considering the surface area for vulnerabilities is generally smaller and a concerted vulnerability search could prevent these issues from occurring in places where we can't respond.

152334H, 2mo

What used to be at that GitHub URL? I saw it on webarchive but I still don't get it

[This comment is no longer endorsed by its author]

lc, 2mo

TeamPCP open sourced their worm earlier today, which they declared in the README was "vibecoded". TeamPCP is the group that hacked LiteLLM earlier this year & a bunch of other software projects before that.

Oliver Sourbut, 2mo

We gave some thought to command and control in AISI's RepliBench suite, but I wish we'd had more expertise on hand for that. The AISI Cyber team is pretty great, and I don't know what else they have that they're tracking (much of it is classified). Consider developing evaluations and test ranges, or telling your friends to do so! Those can be really valuable leading indicators of pending impacts.

Logan Riggs, 2mo

Fred Heiding works on measuring LLMs' ability to do automated phishing.

We include four email groups with a combined total of 101 participants: A control group of arbitrary phishing emails, which received a click-through rate (recipient pressed a link in the email) of 12%, emails generated by human experts (54% click-through), fully AI-automated emails (54% click-through), and AI emails utilizing a human-in-the-loop (56% click-through).

_!, 1mo

There has been at least one documented case of attempted social engineering with an AI collecting a dossier and using it to exert reputational pressure, and it seems to have happened accidentally rather than deliberately: An AI Agent Published a Hit Piece on Me and The Operator Came Forward

Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.

TheVinci, 5d

Your claim is correct in a narrow scope. Yes, Microsoft, Google, etc., can patch vulnerabilities somewhat quickly. There are many reasons for this, but the speed is essentially linearly correlated with the number of security engineers the company has.

However.

A mind-bogglingly large amount of modern software infrastructure is built upon the kind of software that is maintained by one guy in Alaska who patches his project once a month after a fishing quest.

These are the bottlenecks, and in a large sense, the crown jewels.

If Alex from Alaska has had to patch his project once a month because of one vulnerability that was discovered by a white/black hat hacker, that's manageable.

If Alex has to patch his project twenty times a month because Mythos-class models are repeatedly breaking it, that is not manageable.

When these software projects are hacked, the result is what is called a supply-chain attack, and this is the class of attack that reaches breaking-news scale (e.g. the SolarWinds hack).

Bottom line: the risk from this transformational capability does not mainly rest on well-defended companies, but on the smaller, under-defended ones.

mannan, 1mo

Here's what people fail to cite as the highest risk to security in any endeavor (or for criminal activity in any field). If a populace is struggling financially and economically, criminal activity rises. If the society around it and those in power fail to help when their help is needed, then of course criminal activity rises, as it's the only recourse for survival.

So improve societal conditions as the best Cyber Security posture.

Seems this is the elephant in the room we don't want to talk about.
