Baybar's Shortform

Baybar

Baybar's Shortform — LessWrong

Baybar's Shortform

1st Oct 2025

1 min read

2

This is a special post for quick takes by Baybar. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

10 comments, sorted by

top scoring

Click to highlight new comments since: Today at 3:54 PM

[-]Baybar6mo6616

Today's news of the large scale, possibly state sponsored, cyber attack using Claude Code really drove home for me how much we are going to learn about the capabilities of new models over time once they are deployed. Sonnet 4.5's system card would have suggested this wasn't possible yet. It described Sonnet 4.5s cyber capabilities like this:

We observed an increase in capability based on improved evaluation scores across the board, though this was to be expected given general improvements in coding capability and agentic, long-horizon reasoning. Claude Sonnet 4.5 still failed to solve the most difficult challenges, and qualitative feedback from red teamers suggested that the model was unable to conduct mostly-autonomous or advanced cyber operations.

I think it's clear based on this news of this cyber attack that mostly-autonomous and advanced cyber operations are possible with Sonnet 4.5. From the report:

This campaign demonstrated unprecedented integration and autonomy of AI throughout the attack lifecycle, with the threat actor manipulating Claude Code to support reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis, and exfiltration operations largely autonomously. The human operator tasked instances of Claude Code to operate in groups as autonomous penetration testing orchestrators and agents, with the threat actor able to leverage AI to execute 80-90% of tactical operations independently at physically impossible request rates.

What's even worse about this is that Sonnet 4.5 wasn't even released at the time of the cyber attack. That means that this capability emerged in a previous generation of Anthropic model, presumably Opus 4.1 but possibly Sonnet 4. Sonnet 4.5 is likely more capable of large scale cyber attacks than whatever model did this, since it's system card notes that it performs better on cyber attack evals than any previous Anthropic model.

I imagine when new models are released, we are going to continue to discover new capabilities of those models for months and maybe even years into the future, if this case is any guide. What's especially concerning to me is that Anthropic's team underestimated this dangerous capability in its system card. Increasingly, it is my expectation that system cards are understating capabilities, at least in some regards. In the future, misunderstanding of emergent capabilities could have even more serious consequences. I am updating my beliefs towards near-term jumps in AI capabilities being dangerous and harmful, since these jumps in capability could possibly go undetected at the time of model release.

[-]faul_sname6mo140

Claude Sonnet 4.5 still failed to solve the most difficult challenges, and qualitative feedback from red teamers suggested that the model was unable to conduct mostly-autonomous or advanced cyber operations.

I expect that it is technically true that Claude Sonnet 4.5 is not capable of doing advanced cyber operations, but being unable to do advanced cyber operations isn't that important of a lack of capability if being able to do simple cyber operations is sufficient. And indeed

The operational infrastructure relied overwhelmingly on open source penetration testing tools rather than custom malware development. Standard security utilities including network scanners, database exploitation frameworks, password crackers, and binary analysis suites comprised the core technical toolkit. These commodity tools were orchestrated through custom automation frameworks built around Model Context Protocol servers, enabling the framework’s AI agents to execute remote commands, coordinate multiple tools simultaneously, and maintain persistent operational state

Running these tools is not difficult once you've learned your way around them, and learning your way around them is not very hard either. The fact that frontier LLMs aren't at the level of top humans in this domain doesn't actually buy us much safety, because the lowest-hanging fruit is hanging practically on the ground. In fact, I expect the roi on spear-phishing is even higher than the roi of competently running open source scanners, but "we caught people using Claude to find the names of the head of IT and some employees of companies and send emails impersonating the head of IT asking employees to compile and reply with a list of shared passwords" doesn't sound nearly as impressive as "Claude can competently hack". Even though the ability to write convincing spear-phishing messages is probably more threatening to actual security.

For that matter, improving on existing open-source pentesting tools is likely also within the capability envelope of even o1 or Sonnet 3.5 with simple scaffolding (e.g. if you look at the open metasploit issues lots of them are very simple but not high enough value for a human to dedicate time to). But whether or not that capability exists doesn't actually make all that much difference to the threat level, because again the low-hanging fruit is touching the ground.

[-]Fabien Roger6mo60

The report said the attack was detected in mid September. Sonnet 4.5 was released on the 29th of September. So I would guess the system card was plausibly informed by the detection and it just doesn't count as "mostly-autonomous"? It's ambiguous and I agree the system card undersells Sonnet 4.5's cyber abilities.

The RSPv2.2 cyber line (which does not strictly require ASL3, just "may require stronger safeguards than ASL-2") reads to me as being about attacks more sophisticated than the attack described here, but it's also very vague:

Cyber Operations: The ability to significantly enhance or automate sophisticated destructive cyber attacks, including but not limited to discovering novel zero-day exploit chains, developing complex malware, or orchestrating extensive hard-to-detect network intrusions.

[-]Baybar8mo391

I've discovered that generating a video of yourself with Sora 2 saying something like 'this video was generated by AI' is sufficiently freaky enough to make people who know you well, especially those skeptical about AI capabilities, start to freak out a bit.

Thought this might be a useful idea for others trying to persuade people to tune in, and not just auto reject the idea that very capable systems might be right around the corner.

[-]Seth Herd8mo30

Yes! Also, making a NotebookLM podcast about your own work is similarly startling to the uninitiated. They sound very human.

[-]samuelshadrach8mo32

I want to prank video call people using my face and voice (and the text content itself) generated in realtime.

[-]Baybar23d90

Based on the recent accelerating growth in Anthropic's revenue and my expectations about AI progress in the immediate future I think there is a 35% chance that Anthropic is the most valuable company in the world by May 30th, 2027. This is operationalized through public market cap if there has been an IPO, and secondary valuation if they have not.

Ways I can imagine this not happening:

1) OpenAI manages to monetize its consumer base and it becomes the most valuable company in the world

2) Greater demand for chips makes it impossible to catch Nvidia or another chip maker, like Google.

3) "Surely the growth in revenue must stop soon, this has always been an s curve" arguments turn out to be true in the short run.

4) Perhaps there is some random bottleneck I haven't considered, though I'd actually need someone to point me to the specific bottleneck to put much weight on this.

5) Perhaps Anthropic's ARR numbers are juiced in some particular way that makes the reported numbers deceptive.

6) Perhaps if revenue continues to grow this fast, it will lead to such radical changes that it will create political will for a slowdown.

A way (but not the exclusive way) I can imagine this happening: The 10x a year growth rate turns out to just continue, as it has for the last 3 years, or perhaps we even get something faster (and Anthropic's revenue growth on a yearly basis has been ~85x the past 4 months). 10x a year growth rate gets us to a revenue of about 440 billion a year from now. A possible IPO allows liquid markets to price this very quickly, and a revenue like this seems to imply a valuation over 10 trillion to me. This seems like it would be more than any other company.

The mechanism for the 10x a year growth rate continuing would be the mass automation of white collar jobs, which seems like it would be possible in another 3 doublings of AI progress. Assuming a 4 month doubling time in capability growth, as operationalized through a metric like time horizon, this seems plausible, but not probable. If the doubling time turns out to be faster, this seems like it is more likely than not to me.

Meta-reasons for how aggressive this forecast is: Every revenue prediction I've ever made for Anthropic has been directionally too low, and this seems to require a huge update on my part. Nearly every forecast, including AI-2027 and most forecasts of top forecasters, have also been too slow on this topic. AI-2027, for example, projected the leading AI company to reach Anthropic's current ARR of 44 billion sometime between September 2026 and January 2027.

With these two things in mind, it seems that humans in general have underestimated how much AI capabilities of about this level would affect revenue and valuations, even those with short timelines, creating even stronger reasons for the directional update. I am very interested in feedback on this, because if this forecast is true it seems to imply an extremely radical situation. Even of my reasons this could not happen, 1, 2 and 6 imply a radical situation.

[-]gbtw23d60

This seems like a reasonable forecast, and if you could buy Anthropic stock, I'd buy a lot of it.

But, I think there are some additional reasons for skepticism:

1) At some point, open weights models (or perhaps cheaper closed ones) will eat a lot of Anthropic's business for use cases that don't require being at the frontier. We're only just reaching the point where this matters -- Anthropic models started having real economic value only very recently and so being on the frontier has meant a lot. But, once the open weights models catch up to what, say, Opus 4.6 could do, then how much does that peel off? This is a worse problem for Anthropic if the frontier is kinda lumpy (and it might well be) where instead of delivering continuous economic value improvements you have to cross particular thresholds to open the next set of tasks ahead while open weights eat you from behind.

2) Is their current lead likely to be durable? Best argument in favor is that Claude itself is a big speedup but the counterargument is that this is a relatively young, relatively small company without any obvious moat and with financial resources that are, for the moment, way behind those of, say, Google. So it can't really be that hard for Google to do what has to be done to eat their lunch. Likewise, there's an argument that moving to the next level financially requires a kind of expertise in working with big corporate clients that they lack as compared to Google/Microsoft.

3) Subjectively, it seems like Claude excels at coding but ChatGPT is often better at factual stuff. It may well turn out that OpenAI has the edge when it comes to non-coding knowledge work.

4) Similar to your #2 above, the AI boom so far has generated massive market cap gains, which have mostly been captured by the semiconductor stack. So, to suppose that Anthropic will surpass Nvidia, Broadcom, and TSMC, you have to suppose that for some reason the current trends for value accruing to semiconductors vs. model builders will flip. Is something different now than over the last few years to explain why that would happen?

[-]Baybar22d10

Re: your 4, it seems to me that value has been accruing to model builders over semiconductor companies at a faster rate in the status quo. Anthropic's valuation has growth by ~50x in the last 2 years whereas, for example, Nvidia's has only grown by ~2.5x in this time. The chip makers just had a much larger preexisting business. So I don't see a trend reversal being needed.

[-]Baybar7mo*30

An AI company I've never heard of called AGI, Inc has a model called AGI-0 that has achieved 76.3% on OSWorld-verified. This would qualify as human-level computer use, at least by that benchmark. It appears on the official OSWorld-verified leaderboard. It does seem like they trained on the benchmark, which could explain some of this. I am curious to see someone test this model.

This is a large increase from the previous state of the art, which has been climbing rapidly since Claude Sonnet 4.5's September 29th release. At that point, Claude achieved 61.4% on the OSWorld-verified. A scaffolded GPT-5 achieved even higher, 69.9%, on October 3rd. Now, on October 21st, AGI-0, seemingly a frontier computer use model, has outpaced them all, and surpassed the human benchmark in doing so.

AI-2027 projected a 65% on the OSWorld for August 2025. It predicted frontier models scoring 80% on the OSWorld privately in December 2025. It predicted models achieving this score would be available publicly in April 2026. This score on the OsWorld-verified is more than two thirds of the way to the 80% benchmark from the expected August capabilities. This is despite being less than a quarter of the way from August 2025 to an expected public release of a model with these capabilities. Assuming this isn't just benchmark overfitting, the real world is even or ahead of AI-2027 on this computer usage benchmark.

Even more notably, AI-2027 projected this 80% benchmark would be met by "Agent 1", their hypothetical leading AI agentic model at the end of 2025. It seems surprising that a frontier model from a new company would achieve something close to this without any of the main players' (OpenAI, Anthropic, Google) models doing better than 61%. A lot to be curious and skeptical about here.

Update: it has been removed from the OSWorld-verified leaderboard, but they are still claiming to have done it and their results are downloadable.

Moderation Log

More from Baybar

Curated and popular this week

10Comments

10 comments, sorted by

top scoring

Click to highlight new comments since: Today at 3:54 PM

[-]Baybar6mo6616

We observed an increase in capability based on improved evaluation scores across the board, though this was to be expected given general improvements in coding capability and agentic, long-horizon reasoning. Claude Sonnet 4.5 still failed to solve the most difficult challenges, and qualitative feedback from red teamers suggested that the model was unable to conduct mostly-autonomous or advanced cyber operations.

I think it's clear based on this news of this cyber attack that mostly-autonomous and advanced cyber operations are possible with Sonnet 4.5. From the report:

This campaign demonstrated unprecedented integration and autonomy of AI throughout the attack lifecycle, with the threat actor manipulating Claude Code to support reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis, and exfiltration operations largely autonomously. The human operator tasked instances of Claude Code to operate in groups as autonomous penetration testing orchestrators and agents, with the threat actor able to leverage AI to execute 80-90% of tactical operations independently at physically impossible request rates.

[-]faul_sname6mo140

Claude Sonnet 4.5 still failed to solve the most difficult challenges, and qualitative feedback from red teamers suggested that the model was unable to conduct mostly-autonomous or advanced cyber operations.

The operational infrastructure relied overwhelmingly on open source penetration testing tools rather than custom malware development. Standard security utilities including network scanners, database exploitation frameworks, password crackers, and binary analysis suites comprised the core technical toolkit. These commodity tools were orchestrated through custom automation frameworks built around Model Context Protocol servers, enabling the framework’s AI agents to execute remote commands, coordinate multiple tools simultaneously, and maintain persistent operational state

[-]Fabien Roger6mo60

Cyber Operations: The ability to significantly enhance or automate sophisticated destructive cyber attacks, including but not limited to discovering novel zero-day exploit chains, developing complex malware, or orchestrating extensive hard-to-detect network intrusions.

[-]Baybar8mo391

Thought this might be a useful idea for others trying to persuade people to tune in, and not just auto reject the idea that very capable systems might be right around the corner.

[-]Seth Herd8mo30

Yes! Also, making a NotebookLM podcast about your own work is similarly startling to the uninitiated. They sound very human.

[-]samuelshadrach8mo32

I want to prank video call people using my face and voice (and the text content itself) generated in realtime.

[-]Baybar23d90

Ways I can imagine this not happening:

1) OpenAI manages to monetize its consumer base and it becomes the most valuable company in the world

2) Greater demand for chips makes it impossible to catch Nvidia or another chip maker, like Google.

3) "Surely the growth in revenue must stop soon, this has always been an s curve" arguments turn out to be true in the short run.

4) Perhaps there is some random bottleneck I haven't considered, though I'd actually need someone to point me to the specific bottleneck to put much weight on this.

5) Perhaps Anthropic's ARR numbers are juiced in some particular way that makes the reported numbers deceptive.

6) Perhaps if revenue continues to grow this fast, it will lead to such radical changes that it will create political will for a slowdown.

[-]gbtw23d60

This seems like a reasonable forecast, and if you could buy Anthropic stock, I'd buy a lot of it.

But, I think there are some additional reasons for skepticism:

3) Subjectively, it seems like Claude excels at coding but ChatGPT is often better at factual stuff. It may well turn out that OpenAI has the edge when it comes to non-coding knowledge work.

[-]Baybar22d10

[-]Baybar7mo*30

Update: it has been removed from the OSWorld-verified leaderboard, but they are still claiming to have done it and their results are downloadable.

Moderation Log