Last week’s conflict between the Department of War and Anthropic marked a turning point for AI. I’m cautiously hopeful that the parties involved will find some kind of deescalation from the current nuclear option, but irreparable damage has already been done: to Anthropic, to the entire AI industry, and to America’s pre-eminence in AI.
DoW versus Anthropic
This is a complex, fast-moving situation that is outside my usual beat. Rather than trying to cover it in detail myself, I’m going to link to some of the most useful analysis. But I want to be extremely clear: this is the most important thing that’s happened in AI for a long time and it’s gravely concerning. These are dark times and the road ahead just got more difficult.
Clawed
Dean Ball’s latest is grim but essential reading:
This strikes at a core principle of the American republic, one that has traditionally been especially dear to conservatives: private property. […]
This threat will now hover over anyone who does business with the government, not just in the sense that you may be deemed a supply chain risk but also in the sense that any piece of technology you use could be as well. […]
Stepping back even further, this could end up making AI less viable as a profitable industry. If corporations and foreign governments just cannot trust what the U.S. government might do next with the frontier AI companies, it means they cannot rely on that U.S. AI at all. Abroad, this will only increase the mostly pointless drive to develop home-grown models within Middle Powers (which I covered last week), and we can probably declare the American AI Exports Program (which I worked on while in the Trump Administration) dead on arrival.
Zvi reviews the situation
Zvi’s post from this morning is the most comprehensive review of the situation. I highly recommend reading at least the first two sections.
Anthropic’s response
Anthropic isn’t mincing words:
We believe this designation would both be legally unsound and set a dangerous precedent for any American company that negotiates with the government.
No amount of intimidation or punishment from the Department of War will change our position on mass domestic surveillance or fully autonomous weapons. We will challenge any supply chain risk designation in court.
“All Lawful Use”: Much More Than You Wanted To Know
The Pentagon’s designation of Anthropic as a supply chain risk has become the most important part of this story. But the original dispute over using AI for mass domestic surveillance and autonomous weapon systems remains immensely important. Scott Alexander investigates whether OpenAI’s agreement with DoW will meaningfully constrain it from using AI in those ways.
Will the supply chain risk designation hold up in court?
Lawfare says no:
Anthropic has said it will sue, and it has strong legal arguments on multiple independent grounds. Every layer of the government’s position has serious problems, and any one of them could independently be fatal. Together, they make the government’s litigation position close to untenable. […]
The statute wasn’t built for this, the facts don’t support it, and the courts will say so.
Keep calm and carry on
We still have a newsletter to do—let’s get started.
Top Pick
45 Thoughts About Agents
Everything changed in November, with Opus 4.5 + Claude Code. Since then, we’ve all been frantically trying to figure out what it all means (when we weren’t preoccupied by building cool things). Steve Newman shares 45 characteristically insightful thoughts about AI agents—some of these will be obvious to you if you already use agents extensively, but I found multiple new ideas here. For example:
39: Agents use vastly more compute than chatbots. Compute usage for chatbots is basically limited by how much output people want to read. An agent can spend virtually unlimited time doing intermediate work that no one will review directly. If 100M desk workers start using AI agents at the level of intensity which requires Anthropic’s current “Max 20x” plan, that would translate into $240 billion in revenue per year. It will be years before there are enough GPU chips to support that level of usage.
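That $240 billion figure checks out as simple arithmetic. Here’s a minimal sanity check (my own sketch, not from Newman’s piece), assuming the Max 20x plan costs roughly $200 per month:

```python
# Back-of-the-envelope check on the $240B/year figure.
# Assumption: Anthropic's Max 20x plan costs roughly $200/month.
workers = 100_000_000        # desk workers using agents at Max-20x intensity
monthly_price_usd = 200      # assumed subscription price
annual_revenue_usd = workers * monthly_price_usd * 12
print(f"${annual_revenue_usd / 1e9:.0f}B per year")  # -> $240B per year
```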
New releases
Sonnet 4.6 followup
Zvi reports on Sonnet 4.6: it’s very good, but you should probably use Opus instead unless price or speed are critical.
Nano Banana 2
Nano Banana 2 is here—looks like the best overall image generator just got a significant upgrade.
Anthropic’s been busy
Alex Albert would like to remind you that Anthropic has shipped a lot of cool features in spite of the chaos.
Benchmarks and Forecasts
Understanding the balance between compute and algorithms
We are in the “scaling era”: AI capabilities are improving at a breakneck pace, largely because the big labs have been using exponentially increasing amounts of compute during training. That can continue for three or four more years, but we will soon run into physical constraints that limit how quickly we can bring more compute online.
Does that mean that capability improvements will radically slow down in a few years? Very possibly, but compute capacity isn’t the only thing that contributes to capability improvements. Improvements in algorithms and training data are also important factors, but it’s hard to quantify exactly how much they contributed to recent growth.
EpochAI’s Anson Ho takes a comprehensive look at the question—while he doesn’t find many definitive answers, it’s an excellent piece with plenty of good insights. He finds that algorithmic improvements have been a major factor, with two important caveats:
It’s likely that a small number of algorithmic changes have driven most of the gains.
It’s possible that many algorithmic improvements are strongly dependent on compute scale, which makes it hard to predict what happens if we start hitting compute bottlenecks.
Mathematics in the Library of Babel
Daniel Litt is a professional mathematician who’s been closely tracking how well AI can do research-level math. His latest piece provides a balanced, detailed take on current capabilities and near-term trends:
Like many mathematicians, I find much discussion around AI-for-math to be filled with hype or outright quackery, and much of my commentary has focused on this. I’ve been very critical of AI-for-math hype. So I hope you will take me seriously when I say that it’s not all hype.
AI Math Benchmarks: AI’s Growing Capabilities
IEEE Spectrum looks at First Proof and FrontierMath: Open Problems, two new math benchmarks that challenge AI to solve real math research problems. Quoting Greg Burnham:
“AI has gotten to the point where it’s, in some ways, better than most PhD students, so we need to pose problems where the answer would be at least moderately interesting to some human mathematicians, not because AI was doing it, but because it’s mathematics that human mathematicians care about.”
An overview of AI and programming
Timothy Lee talks to professional programmers to assess how AI is changing the programming profession. His analysis of current capabilities and impacts is solid, but I expect much faster near-term progress than he does. Recent progress has been incredibly fast (and accelerating), and there’s a huge gap between what the models are already capable of and what most people are using them for. I’m pretty sure 2026 will bring even more change and disruption to programming than 2025 did.
Next-Token Predictor Is An AI’s Job, Not Its Species
One of the dumbest things people say about AI is that it’s “just next-token prediction”. Plenty of people have already explained why that isn’t meaningfully true, but Scott Alexander takes a different approach:
I want to approach this from a different direction. I think overemphasizing next-token prediction is a confusion of levels. On the levels where AI is a next-token predictor, you are also a next-token (technically: next-sense-datum) predictor. On the levels where you’re not a next-token predictor, AI isn’t one either.
Using AI
What Only You Can Say
This is the most useful “how to use AI” piece I’ve run across in a while: Luke Bechtel has AI interview him about his ideas as a way to organize his thoughts and prepare for a new piece of writing.
Are we dead yet?
How much should we worry about AI biorisk?
The risk of bad actors (terrorists, perhaps, or extortionists) using AI to create a bioweapon is one of the most serious risks of advanced AI. Transformer explores why biorisk is so concerning, how dangerous current AIs are, and why it’s so hard to assess the danger level.
Jobs and the economy
The Citrini Scenario
The latest “things could go very badly” scenario to go viral is THE 2028 GLOBAL INTELLIGENCE CRISIS by Citrini Research. The all-caps, I’m afraid, are in the original.
The central conceit is clever: it purports to be a memo from June 2028 that recaps “the progression and fallout of the Global Intelligence Crisis”, focusing on jobs, the economy, and the financial markets. There are significant technical problems with some parts of it, and it’s almost certain that events won’t actually play out this way. But there are some really good insights and thought experiments here.
Beyond the specifics, it’s valuable as a sample thought experiment in “how might really powerful AI cause massive disruption in non-obvious ways?”
If you want to go deeper, Zvi’s analysis is excellent.
Strategy and politics
Building Technology to Drive AI Governance
Jacob Steinhardt shares advice for technically skilled people who want to help with AI governance. It’s excellent for that audience but also has some solid insights that are more broadly interesting:
More generally, across domains spanning climate change, food safety, and pandemic response, there are two technological mechanisms that repeatedly drive governance:
Measurement, which creates visibility, enables accountability, and makes regulation feasible.
Driving down costs, which makes good behavior economically practical and can dissolve apparent trade-offs.
Anthropic updates their Responsible Scaling Policy
Anthropic just updated their Responsible Scaling Policy. This has been a controversial move, with many people criticizing them for significantly walking back some important parts of previous versions of the policy. I expect we’ll see more detailed commentary on this soon, but recent events with DoW have pushed it to the sidelines.
For now, I’ll just say that I tentatively agree with many of the changes they made, with the major caveat that this is the best possible policy only in the sense that the world has become very challenging. I’m updating positively about Anthropic’s ability to make good decisions in hard circumstances, and negatively about humanity’s ability to make good collective decisions about AI.
Holden Karnofsky, who played a major role in writing the latest version, discusses the reasoning behind some of the changes.
China and beyond
The Delhi Gap
Like Dean Ball, Anton Leicht came away from the AI Impact Summit deeply concerned about the gap between what Silicon Valley understands about AI and what most people—and in particular the middle powers—believe about AI.
This gap throws the world into danger of capturing all the risks and mitigating most of the benefits of AI.
AI psychology
The Case Against AI Consciousness
Dan Williams interviews Anil Seth, who believes consciousness probably requires a biological substrate. Anil’s a very capable guy: he’s a well-regarded neuroscientist, an expert on consciousness, and the director of the Centre for Consciousness Science at the University of Sussex. If you’re interested in AI psychology and consciousness, you should watch this (or read the transcript).
The debate is this: on the one hand, computational functionalists argue that consciousness is the result of computational processes, which in humans happen to run on a biological substrate but could in principle run on computers. On the other hand, biological naturalists argue that consciousness is specifically linked to biology and that merely simulating the biology won’t produce consciousness. An often-used example is that simulating rain on a computer doesn’t make anything wet.
It’s important to be clear that these are both hypotheses about the world, and we don’t yet have definitive evidence to prove either one. To my mind, though, many advocates of biological naturalism, including Anil, seem to be working backward from a desired conclusion rather than forward from observed facts. His theory that consciousness might result from autopoiesis seems to answer the question “assuming biological naturalism is true, what is a plausible mechanism for it,” rather than “do we observe anything about consciousness that cannot be explained without autopoiesis?”
Regardless, it’s a very interesting interview and Anil has thoughtful ideas about consciousness, intelligence, and computational functionalism.
Technical
How sparse attention is solving AI’s memory bottleneck
For many tasks, LLMs are substantially constrained by the size of their context windows. One of the most important tips for using Claude Code, for example, is to avoid letting the context window fill up: performance degrades substantially well before the window is completely full.
That’s a hard problem to solve: the nature of the transformer architecture is that every token in the context window attends to every other token, so the cost of running a model rises quadratically with the size of the context window. There are no magic solutions, but TechTalks reviews some of the most promising technical approaches.
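To make the quadratic-versus-linear point concrete, here is a minimal sketch (my own illustration, not from the TechTalks piece) comparing the rough cost of full attention with one of the simplest sparse patterns, a fixed sliding window; the window size and model dimension are arbitrary illustrative values.

```python
def full_attention_flops(n_tokens: int, d_model: int = 4096) -> int:
    # Full attention: every token attends to every other token, so the QK^T scores
    # and the attention-weighted sum over V each cost on the order of n^2 * d operations.
    return 2 * n_tokens ** 2 * d_model

def sliding_window_flops(n_tokens: int, window: int = 4096, d_model: int = 4096) -> int:
    # A simple sparse pattern: each token attends only to the previous `window` tokens,
    # so cost grows linearly with context length instead of quadratically.
    return 2 * n_tokens * min(window, n_tokens) * d_model

for n in (8_000, 32_000, 128_000):
    ratio = full_attention_flops(n) / sliding_window_flops(n)
    print(f"{n:>7} tokens: full attention costs ~{ratio:.0f}x a 4K sliding window")
```

The catch, of course, is that a token can no longer see the whole context directly, which is why real sparse-attention schemes work hard to be selective about which distant tokens still get attended to.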