The denominator isn't fixed.
If you have a tool which does a good job at automatically generating a certain type of code, you should probably generate more of that type than you used to. This could easily balloon to 90% of your new code, but only replace a small amount of human coding. You seem to allude to this scenario when you talk about one-off scripts. Another example is automated tests.
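For example (made-up numbers, just to illustrate the point about the denominator): suppose a team previously merged 1,000 human-written lines per week. The humans still write roughly 1,000 lines, but they now also have the AI generate 9,000 lines of tests, scaffolding, and one-off scripts they wouldn't have bothered writing before. The AI is now "writing 90% of the code" (9,000 of 10,000 lines) while having replaced almost no human coding.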
Another Anthropic employee told me that 90% of code being written by AI wasn't a crazy figure. He said something like: "most of the work happens in the remaining 10%".
This is interesting and especially relevant to AI risk if we are nearing automation of the research process.
That said, I am more interested in what fraction of all code being deployed is written by AI. That would be more representative of AGI as it relates to mass unemployment or other huge economic shifts, but not necessarily human disempowerment.
As a professional software developer with 20+ years of experience who has repeatedly tried to use AI coding assistants and gotten consistently poor results, I am skeptical of even your statement that "The average over all of Anthropic for lines of merged code written by AI is much less than 90%, more like 50%." 50% seems way too high. Or, if it is accurate, then most of that code is extraneous changes that aren't part of the core code that executes. For example, I've seen what I believe to be AI-generated code where 3/4 of the API endpoints are unused. They exist just because the AI assumes that the REST endpoint for each entity ought to have all the actions, even though that didn't make sense in this case.
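To make that concrete, here's a hypothetical sketch of the pattern (not the actual code I saw; the entity and framework are made up for illustration): the assistant generates a full CRUD route set for an entity, but the rest of the system only ever calls one of the endpoints.

```python
# Hypothetical illustration only: a full CRUD surface generated for a "widget"
# entity, of which the rest of the application only ever calls list_widgets().
from flask import Flask, jsonify, request

app = Flask(__name__)
WIDGETS = {}  # toy in-memory store

@app.get("/widgets")  # the one endpoint actually used elsewhere in the codebase
def list_widgets():
    return jsonify(list(WIDGETS.values()))

@app.post("/widgets")  # unused: widgets are loaded from a fixture at startup
def create_widget():
    widget = request.get_json()
    WIDGETS[widget["id"]] = widget
    return jsonify(widget), 201

@app.get("/widgets/<widget_id>")  # unused
def get_widget(widget_id):
    return jsonify(WIDGETS[widget_id])

@app.put("/widgets/<widget_id>")  # unused
def update_widget(widget_id):
    WIDGETS[widget_id] = request.get_json()
    return jsonify(WIDGETS[widget_id])

@app.delete("/widgets/<widget_id>")  # unused
def delete_widget(widget_id):
    return jsonify(WIDGETS.pop(widget_id, None))
```

All five endpoints count as merged, AI-written lines, but four of them do nothing for the product.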
I think there is a natural tendency for AI enthusiasts to overestimate the amount of useful code they are getting from AI. If we were going to make any statements about how much code was generated by AI at any organization, I think we would need much better data than I have ever seen.
In March 2025, Dario Amodei (CEO of Anthropic) said that he expects AI to be writing 90% of the code in 3 to 6 months and that AI might be writing essentially all of the code in 12 months.[1]
Did this prediction end up being true? We'll charitably interpret the prediction as being about Anthropic itself. Recently, various articles have claimed that Dario said that AIs are writing 90% of the code at Anthropic (7 months after the prediction of 3-6 months was made). This is a reasonable summary of what Dario said when talking to Marc Benioff, but my current understanding is that it isn't true that AIs are writing 90% of code at Anthropic.
The exact text being referenced by these articles is:
Dario: 6 months ago I I made this prediction that, you know, in in 6 months 90% of code would be written by by by AI models. Some people think that prediction is wrong, but within Anthropic and within a number of companies that we work with, that is absolutely true.
Marc: Um now 90 you know so you're saying that 90% of all code at Anthropic being written by the by the model today—
Dario: on on many teams you know not uniformly
My current understanding (based on this text and other sources) is:
Thus, when Dario said "that is absolutely true", I don't agree: I don't think the prediction came true. And in fact Dario takes the statement back a bit later (by saying "on many teams"), but he doesn't clearly correct the record, so journalists (very reasonably) ran with the first statement.
I should also note that I think 90% of code being written by AIs at Anthropic is less impressive than it might sound:
More generally, I think the fraction of code written by AIs isn't a great metric for automation and productivity increases due to AI, since we don't know the conversion from the fraction of code written by AI to the effective speedup for engineers. That said, some versions of this metric (e.g., lines of code merged that are written by AI) have some predictive signal and might be worth tracking because they're easier to measure than total speedup.
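To see why that conversion matters, here's a toy model (my own framing and made-up numbers, not anything measured at Anthropic): suppose the lines the AI writes would have taken engineers a fraction $t$ of their engineering time to write themselves, and prompting and reviewing the AI's version still costs a fraction $c$ of that same time. Then the overall speedup is roughly

$$\text{speedup} \approx \frac{1}{(1 - t) + c \cdot t}.$$

With, say, $t = 0.3$ and $c = 0.5$, the AI could easily be responsible for well over half of merged lines (since the easy lines are often also the verbose ones) while the overall speedup is only about $1/(0.7 + 0.15) \approx 1.2\times$.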
First, it's notable that the prediction is probably false in a straightforward sense, but nonetheless Dario seemingly claims the prediction is "absolutely true" without clearly retracting it. I don't think this is really on the journalists here; Dario could have done a much better job avoiding making false claims. Correspondingly, I think we shouldn't put much trust in Dario's off-the-cuff statements, even about relatively straightforward questions like "what percentage of code at Anthropic is written by AIs", if these aren't backed up by something more official and precise. I'd guess off-the-cuff statements are less reliable if Dario would have an incentive to mislead about the situation; in this case, he has an incentive to imply his prediction is true and an incentive to imply AI progress is faster.[5]
That said, I do think that if you carefully look at the surrounding context of the claim (in particular, including Dario's clarification), you could come away with an understanding that is close to correct. So another takeaway is that it's important to look at exactly what Dario said: he's often straightforwardly misinterpreted! His initial prediction is also often misinterpreted; notably he said "in twelve months, we may be in a world where AI is writing essentially all of the code" rather than saying he expects AI to be writing essentially all of the code in 12 months!
Insofar as Anthropic thinks the fraction of code written by AIs at Anthropic is an important metric of internal automation[6], they should say what this number is in a more credible public output (e.g. a system card[7]) with more specificity about how exactly this was measured and what this includes. I currently straightforwardly believe specific, precise, and non-speculative claims by Anthropic in system cards because there are more mechanisms that prevent falsehoods.
I think this is some evidence that Dario is making overly aggressive hype-y predictions about the trajectory of AI and won't straightforwardly admit when his predictions are wrong. I'm more concerned about him not admitting he's wrong, rather than making overly aggressive predictions. In particular, I worry that Anthropic (and Dario) won't admit they were wrong about expecting powerful AI by early 2027, or at least they will admit this late rather than revising predictions sometime in 2026. Unfortunately, Anthropic has an incentive to hype up AI progress and claim their predictions were basically right, while simultaneously part of their story for mitigating catastrophic risk from AI depends on them being a trusted (and trustworthy) communicator about the current (and expected future) level of capabilities and risks.
(I should note that I think society is dramatically underpreparing for catastrophic risks from powerful AI systems, including a substantial chance (~20%) that misaligned AI systems literally take over the world within 10 years.)
Exactly how wrong is this prediction? Again, we'll charitably interpret the prediction as being about Anthropic—though I do think it would be reasonable to consider the prediction to be about all professional software engineering, in which case the prediction looks substantially worse. Dario predicted 90% of code in 3-6 months, implying his median is somewhere between 3 and 6 months, perhaps around 4.5 months. We're now at a bit over 7 months (as of when this post was written) and this hasn't happened, though perhaps a majority of code at Anthropic is written by AIs, and teams exist where 90% of code is written by AIs. Also, if you expand what you include very broadly (e.g., to include AIs running scripts once), it might be true. Quantitatively, we might interpret Dario as saying this would be around 65% likely within 6 months of the initial prediction (supposing his median was in fact around 4.5 months), and it currently looks to me like this will take more like 9-15 months for a reasonable operationalization of the prediction. I predicted this wouldn't happen, and I'd guess that if I had given an overall distribution, I would have won a little over 1 bit of epistemic credit from Dario.
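To spell out that arithmetic (the 80% below is an illustrative number I'm assuming for my own counterfactual prediction, not something I actually recorded): if Dario implicitly put around 65% on the prediction resolving true within 6 months, he put around 35% on it resolving false. If I had put around 80% on it resolving false, then the observed outcome would give me

$$\log_2\!\left(\frac{0.80}{0.35}\right) \approx 1.2 \text{ bits}$$

of epistemic credit relative to Dario, i.e. a little over 1 bit.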
I do think it's good that Dario is expressing his views and making (a few) predictions. I'd be excited about Anthropic (or Dario) making nearer-term predictions (e.g. about the next 3 to 12 months) that have relatively clear resolution criteria, especially if these predictions are about more meaningful metrics/properties. For instance: What types of software engineering tasks will AIs be able to do autonomously? How much will AIs be accelerating engineers at Anthropic (to the extent we can measure this)? Making intermediate predictions seems particularly important because Anthropic is predicting the existence of transformatively powerful AI systems within 16 months.
I'm somewhat worried that this post will contribute to an incentive to not make any specific predictions. So I'll try to go out of my way to praise AI company CEOs for making meaningful short-term predictions (if this happens). We should try to create an incentive gradient toward making many meaningful predictions that are possible to adjudicate and admitting when these predictions don't come true.
[Thanks to Buck Shlegeris, Ajeya Cotra, and Daniel Kokotajlo for comments.]
Here are the sources I have about this prediction:
I'm adjudicating this prediction assuming that Dario meant: "code written at Anthropic in the course of the sort of work that human employees sometimes do or might otherwise have done, probably as measured by what gets committed to relevant repositories, so not including code generated by AIs during RL training, not including things which are better described as synthetic data, and probably not including one-off code that isn't merged". Ideally, we'd include some types of code that aren't typically merged (though this would be hard to measure), but not literally all lines of code that get output (by humans or AIs). This probably wouldn't include things like AIs procedurally generating RL environments (which probably is better described as synthetic data), but would include AIs developing RL environments insofar as human employees at the company (not contractors) might have plausibly done this work. There are definitely some potential ambiguities here, and the prediction is clearest if we only exclude teams that work on generating RL environments and synthetic data and just include merged code. It should count as 90% of code generated by AI even if this includes large quantities of code which otherwise wouldn't have been written but which are the sort of thing that humans might have done. So it counts even if human employees are doing much more of some type of work they otherwise would have done less of (e.g., vibe coding web interfaces for viewing experiment results) but doesn't count if it's something human employees don't do approximately at all.
The exact quote is: "I think we'll be there in three to six months—where AI is writing 90 percent of the code. And then in twelve months, we may be in a world where AI is writing essentially all of the code." ↩︎
As in, are merged into a code base that other employees might use or which might get deployed in some way. ↩︎
It would be reasonable to talk about this metric, but only if it was pretty clear this was the metric that was being discussed. By default, I think people have something in mind which is closer to "lines of code merged" and possibly also code written to run valuable experiments even if this doesn't get merged. ↩︎
I have a median of around the start of 2032 for full automation of research engineering at the leading AI company, but this happening before 2029 seems plausible. ↩︎
I think there is a pattern of there being misconceptions related to Anthropic that are convenient for Anthropic leadership and don't get corrected despite (parts of) Anthropic leadership knowing they are false. I think the most common and strongest occurrences of this are misconceptions among Anthropic employees that make Anthropic look better from an AI safety perspective. I think the largest of these have been "Anthropic won't push the frontier" and "Anthropic won't impose large risks because we'll follow our RSP". (Or at least, these misconceptions make Anthropic look better naively; under a less naive perspective that accounts for tradeoffs, they might just look like poor choices.) ↩︎
I think it's OK, and somewhat interesting to track over time. ↩︎
System cards are somewhat of a strange place to include this type of information, but it's pretty similar to the sort of information discussed in the "Internal model evaluation and use survey" section of the Sonnet 4.5 system card. If there was some other transparency output that Anthropic regularly produced or updated, then this could be a better fit than putting it in the system card. ↩︎