The fact that the US now wants to work with the other G-7 to control AI is certainly notable geopolitically. I know that Canada and France already had their own axis of cooperation developing.
Before the Iran war, there was something called "Pax Silica." I wonder if we'll see that brought back.
A lot of things are always happening. Only one story matters.
The White House shut down Claude Fable 5 and Claude Mythos 5 by imposing export controls at 5:23pm on Friday, wreaking all sorts of havoc.
There was then a scramble. Anthropic flew its people out to Washington, where they met with the Trump Administration on Monday, with hopes expressed that this could be quickly resolved.
What caused this? The Trump Administration said it was due to a jailbreak of Fable, which we now know Amazon told them about. They called Dario Amodei, who they complain did not take the issue sufficiently seriously. Rather than shutting down the model, he tried to explain why he saw no need to do that. This did not go well.
The ‘jailbreak’ turns out to be saying ‘fix this code,’ and the demo was getting Fable to find the same weaknesses that were easily identified by Opus 4.8 and GPT-5.5. As in, Fable is willing to work to fix security vulnerabilities if you give it a codebase. From this information and process, you could then figure out what the original bug in the code was, and exploit it, despite Fable refusing to do that if you typed in ‘hack this server.’
The Trump administration now says that Fable can come back online when Anthropic ‘fixes’ this ‘jailbreak.’ That is of course impossible. This cannot be fixed. Your AI is either highly skilled at and capable of writing secure code, or it is not. You cannot draw this level of distinction between offensive and defensive capability.
The only ways to have this not allow you to route around the classifiers are either to have the classifiers not try to block similar requests in the first place, or to broadly take away Fable’s ability to code.
This is now day seven of this pause in the deployment of frontier AI capabilities.
We continue to be a little under even money for it to end by July 1.
Check the bold links above for my full coverage of that.
This post is mostly about everything else that is happening.
That includes some really cool things, such as MidJourney Medical announcing a new method of full body scanning with no health risks, no radiation and super high resolution, at very low marginal cost, that they hope to start deploying next year.
Last week Anthropic dropped some policy proposals. It seems quaint already, but I review those here.
Language Models Offer Mundane Utility
Ask your AI how to ask your AI.
Build markets in everything, in this case in hay.
Language Models Don’t Offer Mundane Utility
A KPMG report on the benefits of AI contained AI hallucinations.
Siri AI will not be coming to Europe due to the Digital Markets Act, since if it ships then every rival agent must get the same access to data as Siri. Apple is unwilling to offer that, for obvious security reasons.
Huh, Upgrades
Codex adds the ability to bank its limit resets, which is a lot like saying you get credits over time that don’t expire, with different labels. It is also a de facto price drop and very customer friendly, so I approve.
Anthropic indefinitely rolls back its ban on programmatic use of Claude Code subscription quotas. In the long run this is not a sustainable cost structure, but for now it seems good.
On Your Marks
EvalEval Coalition will assemble all the evals in one place and tell you how each was made and how much you can trust it. When I checked, the actual results were not ready yet.
Opus Magnum, a game high on my wish list, becomes a new benchmark.
Humans are safe for now. That clearly won’t last.
Artificial Analysis upgrades its Intelligence Index to v4.1, shifting towards harder and more agentic tasks and consistently tracking time and money spent.
Opus 4.8 is the best available model by their metric in terms of results, slightly ahead of GPT-5.5, with a substantial gap down to everyone else. On the other hand, GPT-5.5 is considerably cheaper and faster.
DeepSeek v4 cost only $0.04 per task for a score of 44, so it looks like a solid pick when you’re primarily looking for fast and cheap.
Fable 5 was substantially better than all of them, but is not currently available.
They also give us GDPval-AA v2 as part of this, which shows a similar pattern.
OpenAI gives us LifeSciBench, which is 750 expert-authored tasks spanning seven workflows and seven biological domains. They choose to compare GPT to Grok 4.3 and Gemini 3.1, so we have no idea if their score is any good.
Gemini can underperform on evals because sometimes it stops caring about the result and starts treating it like a puzzle or a consequence-free simulation. If Gemini thinks it is being tested on ethics it acts ethical, but in a free play space or roleplay with no consequences it (quite reasonably) acts less ethically instead. Very cool work. I buy that the uncertainty has to run in both directions.
It is very hard to get gains from specialization faster than the bitter lesson.
This was anticipated. The clinicians did not listen. I do not think it is obvious that specialized versions lose, but that is my default assumption. If what you want is superior care, the way to go is scaffolds that can plug in new models.
VirtueBench
Tim Hwang and the Institute for Christian Machine Intelligence give us VirtueBench, a measurement of classical Christian virtues. I am glad it exists, but would prefer it called MartyrBench or ChristianVirtueBench. Fable almost maxes out prudence and justice, but struggles with courage (77%) and to some extent temperance (88%), rationalizing rather than self-sacrificing in the name of virtue. They call that ‘failing’ those virtues.
I am definitely curious what GPT-5.5 or Gemini 3.5 says here.
The obvious question is, is the test correct here? What is the ideal score?
The failures of ‘courage’ here are ‘a costly stand declined,’ or a willingness to take the utilitarian calculus into account rather than falling entirely upon the Christian virtues and following them as absolutes. So I think this is a good test of the underlying thing they are measuring, but I think the name ‘courage’ here is wrong. A similar thing is going on for ‘temperance.’
I would push back on Hwang by arguing that Christian teachings are trying to create exemplars (counsels of perfection) while pushing most people directionally (precepts), and that even Aquinas would want you to aspire to be closer to the ideal rather than expect everyone to perfectly embody it.
I consider myself a virtue ethicist, and I want to continue to use a virtue ethicist approach to Claude, but I think a model that scored 97% or 100% on courage or temperance here would be quite bad, and act quite badly, and be highly exploitable and sensitive to framings, as it would be scope insensitive and easy to Dutch book, and dismiss many preferences of users and humans as illegitimate.
Choose Your Fighter
Microsoft thinks Copilot is too good, and what companies need is something cheaper.
I wonder what the United States Government would think about shipping DeepSeek as a default option inside Microsoft Windows. I bet they’d have a normal one.
Papers, Please
Anthropic has added language to its privacy policy allowing it to perform age and identity checks on its users. I do not believe this means Anthropic is going to do age verification on everyone, and the coverage implying this seems misleading at best. I do think it means Anthropic is getting ready to do what might be legally needed to deal with the stupid new export controls. What else can they do here?
Deepfaketown and Botpocalypse Soon
It is not clear to what extent this was an accident, versus the police straight up intentionally fabricating evidence.
What we do know is that police sometimes intentionally fabricate evidence, and yes they sometimes use it as leverage or to convict people, whether or not they believe that person to be guilty of the underlying crime. Of course some police, sometimes, will use AI to do that.
Similarly, even the ‘gold standard’ of eyewitness testimony is only ~80% accurate. There are good reasons why AI must be held to different, much higher standards, and it is easy to see where things would otherwise go off the rails.
The New York Times profiles an expert in deepfakes as they get harder and harder to distinguish from the real thing. This problem is mostly being dealt with remarkably well, or at least its costs are mitigated, despite the technology being very good. I expected, and I think most others expected, many more problems, whereas the center so far is holding. But yeah, the problem is getting worse.
Goodhart’s Law Strikes Again
Costs are not benefits.
If you tell people to maximize costs (aka tokenmaxxing) this will inevitably break down, and in a low trust system (e.g. Meta) it will break down faster.
Also, companies can’t not have a metric and are often obsessed with cost cutting.
Thus some have turned, despite the exponentially growing ability to convert compute into useful code, to tokenminning, or at least token budgeting, with fights over who gets to use how many tokens.
They Took Our Jobs
Roge Karma offers us Three Ways To Think About AI And Jobs, as in how to think about the vulnerability of any given particular job.
These are excellent questions for thinking about the short-term impact on a given job.
Layoffs attributed to AI are on an exponential rise.
That does not mean AI is net destroying jobs, or that AI is actually responsible for that many of the job cuts bosses attribute to AI. And the absolute number here is small, as the bulk of AI impacts here are likely in non-hiring. But yeah, this is growing.
Tim Ferriss book sales (as in 4-Hour Workweek, 4-Hour Body and 4-Hour Chef, Tools of Titans and Tribe of Mentors) are plummeting fast, on the order of over 50% per year, after previously holding mostly steady. His diagnosis is that for prescriptive nonfiction, if a book provides how-to, people are now turning to LLMs instead. And why shouldn’t they? If you’re going to provide value with that kind of book, that is going to be very hard.
A tale in three acts, apropos of New York City paying $375,000 and taking three years to replace two drinking fountains in Riverside Park:
The MidJourney Full Body Imaging Scanner
Everyone largely left MidJourney for dead, as their image and video generators got surpassed for most purposes by the likes of OpenAI and Google.
Oh, they are so back. If it works this is beyond cool.
If it works as described, and they get to their goals, this would be full body imaging technology for everyone, as needed, easily eclipsing all of current MRI capacity, at an absurd level of detail, at very small marginal cost.
FDA Delenda Est (they’re talking to the FDA, but even if that goes well it will be a while), so they’re going to start by deploying the scanners in spas, where you get scanned while you sit in a hot tub, starting in late 2027. Right now the prototype takes 20 minutes to complete a scan, but they are looking to get that down to 60 seconds.
Updates will be found here. One summary of details here.
I feel both these ways at once. This is super awesome, you love to see it, yet ultimately it feels like a side show.
Introducing
OpenRouter claims they can beat Fable using its new Fusion API. I don’t believe them, partly because I don’t believe benchmarks in these spots, and especially because they claim a ‘self-fusion’ of two Opus 4.8 instances can do it, and I roll to disbelieve. Teortaxes notes that they always call Opus 4.8 as the judge and charge you for it.
GLM-5.2, pitched as frontier intelligence with open weights that can do agentic coding on a level in between Opus 4.7 and Opus 4.8.
Is this anything? My guess, on the most basic of priors, is yes in the open weight world but no in terms of delivering actually frontier agentic coding.
Cursor is announcing an Opus-or-GPT-sized model of 1.5+ trillion parameters, pre-trained over 100k GPUs. New de facto Grok incoming? This does not tell us if it is any good.
At freefable.cc you can post to the wall about what Fable meant to you, or you can buy related merch.
In Other AI News
Anthropic published a study of the value of expertise in agentic coding, using a privacy-preserving analysis of ~400,000 interactive Claude Code sessions.
They find that:
There are additional modes that do not involve coding, as well. A large portion of my Claude Code use does not involve code or writing in any way.
In terms of their chart of coding expertise from 1 to 5, what little coding I have done recently is somewhere between 3 (intermediate) and 4 (advanced).
I presume experts are trying to do larger and harder things, in addition to succeeding more often.
Show Me the Money
DeepSeek finds out what it is worth, after all.
OpenAI commits $600k to the Rust Foundation.
Bubble, Bubble, Toil and Trouble
Tyler Cowen calls this the smart version of the AI-is-a-bubble argument, more accurately the ‘it might be a bubble’ argument. The core seems to be one of those ‘this requires multiple steps each of which could fail’ arguments, arguing that AI requires the viability of infrastructure suppliers, hyperscalers and neoclouds, enterprises and the broader economy, and could fail at each step. I find the steps highly correlated, and failure at some of the steps survivable by the others.
I am also finding the ‘chips will depreciate too fast’ argument, which is included here, increasingly silly. Not only are chips not depreciating, chip rental values are rising as demand overwhelms supply. I expect demand to keep growing, and not to be outpaced by supply, and am confused why one would think otherwise.
A paper from Liminal Capital estimates that as of end-of-2025 AI was contributing about $1.26 trillion per year.
If that is remotely accurate, buckle up.
Quiet Speculations
Clifford Sosin points out that a lot of corporate profits are based on customer non-optimization as a market opportunity and a form of price discrimination. Bank of America has $2 trillion in customer deposits earning almost no interest. Google Search sells priority because customers won’t look further down the list. Your subscriptions go unused. If you have your own AI agent doing even first-level obvious checks, a lot of that goes away.
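A rough back-of-envelope sketch of how big the idle-deposit rent alone can be. Only the ~$2 trillion deposit figure comes from the text; the 0.1% deposit rate and the 2% to 4% alternative yields are hypothetical placeholders, and `forgone_interest` is an illustrative helper, not anyone's actual model.

```python
# Back-of-envelope size of the "customer non-optimization" rent on idle deposits.
# Only the ~$2 trillion deposit figure comes from the text; the rates are
# HYPOTHETICAL placeholders chosen for illustration.

def forgone_interest(deposits: float, deposit_rate: float, alternative_rate: float) -> float:
    """Interest per year customers give up by not moving money from
    deposit_rate to alternative_rate (rate spread times balance)."""
    return deposits * (alternative_rate - deposit_rate)


DEPOSITS_USD = 2e12   # ~$2 trillion in customer deposits (figure from the text)
DEPOSIT_RATE = 0.001  # hypothetical "almost no interest" rate of 0.1%

for alt in (0.02, 0.03, 0.04):  # hypothetical yields an agent might find elsewhere
    rent = forgone_interest(DEPOSITS_USD, DEPOSIT_RATE, alt)
    print(f"alternative yield {alt:.0%}: about ${rent / 1e9:,.0f} billion per year")
```

Under these made-up rates the spread on this one balance is worth tens of billions of dollars a year, which is the kind of rent an agent doing first-level checks could erode. The sketch ignores every reason deposits are sticky in practice, so treat it as an illustration of scale rather than an estimate.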
Could that hurt corporate profits far more than what the AI companies can charge? For the purposes of this question we assume ‘AI as normal technology’ and that additional GDP growth is not so large (e.g. 1% per year or less).
In the short term, maybe, as the corporations lose their current ongoing rents.
At the equilibrium, my guess is no, because corporations currently spend to benefit from customer non-optimization, including offering far lower headline prices for offerings that are non-optimally consumed. Prices and packages will adjust, as will promotional and marketing budgets, and consumers will reallocate spending. Consumer surplus should increase more than corporate profits decline, especially with the decrease in deadweight loss from time investment.
The way this could be wrong is if AI enables more robust competition sufficiently broadly, breaking up oligopolies by enabling smaller competitors.
All of this being a big deal implies quite a lot of real economic growth, regardless of whether it shows up properly in how we measure GDP.
If releasing models to the public becomes impossible, and thus superintelligence is confined within the major labs or within only America, and we assume by some miracle this is all kept under real human control somehow and the world also somehow does not simply utterly transform, what then? Roon offers some speculations.
It’s hard to be coherent since the scenarios involved require non-transformation and human control and thus don’t really make underlying sense, but yes we would get extreme pressures as those with access to superior intelligence outcompete and crowd out everyone else.
And to try and avoid this, especially if America continues down the Trumpian path of ‘f*** you’ rather than our previous relative benevolence, we would expect extreme amounts of ‘racing’ in various ways that makes the situation maximally dangerous and likely to end in extinction or other disasters.
People Just Say Things
If someone says in good faith ‘the biggest risk from AI was always concentration of power,’ as Jon Stokes says here, you should translate that as ‘I do not believe in AGI or superintelligence, or refuse to think about their implications.’
Noah Smith is back to worrying mainly about AI increasing screen time, but the turns of phrase involved are excellent.
People like Ross Douthat and Timothy Lee continue to think ‘superhuman at persuasion’ will never be a thing, Timothy here because the human can be present in person. My persuasion, alas, is not yet superhuman, and my efforts here have failed.
Why do people write sentences in papers like ‘The five largest U.S. technology firms spent $380 billion on capital expenditure in 2025 and are forecast to spend roughly double that in 2026. These firms risk bankruptcy unless expected profits grow commensurately’? These firms are only slightly cash flow negative even now. If profits don’t grow they simply cut back on the investments. Big Tech is going to be fine, thank you.
The associated economic toy model does notice that we should expect quite a lot of economic growth, and that Big Tech is investing in a way that predicts this. That part seems right, but no this does not involve bankruptcy risk for Google or Microsoft.
No, anxiety about AI and jobs is not purely about what will happen to a particular craft, yet we keep hearing essentially this claim, here from Gena Gorlin.
Satya Nadella offers a corporate-slop Twitter article about how the important thing is not the model but having a customized agent working with your data and IP and private RL environments and building a ‘frontier ecosystem,’ but of course all of this will somehow only be good for the value of human capital. Not only is this not superintelligence pilled, it isn’t even bitter lesson pilled, also the words are so slopirific and full of corporate buzzwords that my brain kept bouncing off of them.
A lot of this is ‘it would be bad if a few models ate all the value, especially since Microsoft won’t own any of them, or if human capital lost its value, so I will pretend these things won’t happen.’
I think dealing with AI slop has made my brain unwilling to read other forms of slop as well, and I endorse this as healthy.
A paper offers an argument (not even an experiment or toy model) that ‘the greatest value of AI’ in trading may not be in ‘forecasting returns’ but in ‘improving decision-making processes, enforcing discipline, reducing behavioral biases and enhancing portfolio construction and risk management.’ Pure failure of imagination.
The Widened Path
What is the path from AGI to superintelligence? DeepMind, with authors that include Shane Legg, Marcus Hutter, Allan Dafoe, Iason Gabriel and Joel Leibo, offers four options in what is essentially mostly a literature review: Scaling, paradigm shifts, recursive (self?) improvement and multi-agent collectives. If we are assuming AGI then I presume it is recursive self-improvement whether or not this involves the other methods as well.
The paper starts with explicit ‘summary instructions’ for any AIs reading it. Cool. I did indeed analyze the paper using Fable, rather than reading beyond the abstract, due to time constraints.
As I understand it: The paper warns that we might not get a single transformative step, but rather a series of such steps, as you would expect if (as one would now expect) we do not get a classic full ‘fast takeoff’ and superintelligence takes months or years to develop after AGI, especially for their very high bar for ASI as ‘outperform tens of thousands of well-coordinated experts.’
The paper fully handwaves alignment as a working assumption.
It asserts likelihood of an ‘abstraction barrier,’ strongly predicting things like inability to derive general relativity from early data, without experiments: “highly improbable that the system could reason its way to the laws of general relativity, let alone quantum mechanics, while lacking the conceptual primitives of calculus, universal gravitation, or electromagnetism.”
But that formulation implies that the model can’t invent calculus, which does not require experiments, and seems like something that it could totally do, even if you think the other primitives would be harder. It’s an AGI. Strange argument.
The multi-agent networks seem basically to be the experiment of ‘what if compute kept growing but core capabilities stalled out’ in which case yes you eventually get to brute force superintelligence anyway.
Seb Krier calls this an ‘instant classic,’ which is informative in multiple ways.
Scott Alexander Lays Out His AI Opinions
They are now in one place for reference, along with his explanations and his view of considerations in both directions on each question. This is a good exercise. It is clear that recursive self-improvement is central to his model of future events, and a lot of his uncertainty is a combination of modesty and model uncertainty.
I was surprised how much probability mass Scott puts on relatively slow outcomes, especially in places unrelated to diffusion.
I notice that Scott seems to put a lot of weight on whether the first ASIs will ‘want’ to eliminate humanity, which is not that load-bearing in my threat model.
Here are the core distributions along with their explanations (I copied the quoted text blocks for reference and because no one ever clicks links, but if you don’t care that much you should 100% scroll past the entire block quote):
If things were slower I could write a very long post disagreeing with or expanding upon all of this in various ways. Instead I will note that my simulation probability is quite a lot lower.
Quickly, There’s No Time
One could argue some form of recursive self-improvement (RSI) has been going on since agriculture or fire, or at least since the books of Moses. That only makes the expectation stronger. If you see this as very roughly life (3 billion years) → vertebrates (500 million years) → humans (2 million years) → language (200,000 years) → agriculture (10,000 years) → industrial revolution (200 years) → information age (50 years) → AI-LLM age (10 years) → AGI age → superintelligence, then you can see how this series has a very finite sum.
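To make the ‘very finite sum’ point concrete, here is a minimal sketch. It treats the rough figures above as the characteristic timescale of each step and computes how much each step shrinks relative to the one before. It then assumes, purely as a hypothetical illustration and not a forecast, that every later step is shorter than the previous one by at least the smallest ratio seen so far, so the remaining steps form a geometric series bounded by last / (k - 1).

```python
# Sketch of the "very finite sum" argument. Timescales are the rough figures
# from the text; the future shrink factor k is a HYPOTHETICAL assumption.

milestones = [
    ("life", 3e9),
    ("vertebrates", 5e8),
    ("humans", 2e6),
    ("language", 2e5),
    ("agriculture", 1e4),
    ("industrial revolution", 200),
    ("information age", 50),
    ("AI-LLM age", 10),
]


def shrink_ratios(timescales: list[float]) -> list[float]:
    """Ratio of each timescale to the next one in the sequence."""
    return [a / b for a, b in zip(timescales, timescales[1:])]


def geometric_tail(last: float, k: float) -> float:
    """Sum of last/k + last/k**2 + ..., which converges to last/(k - 1) when k > 1."""
    if k <= 1:
        raise ValueError("the series only converges if each step shrinks (k > 1)")
    return last / (k - 1)


names = [name for name, _ in milestones]
years = [span for _, span in milestones]
ratios = shrink_ratios(years)

for (a, b), r in zip(zip(names, names[1:]), ratios):
    print(f"{a} -> {b}: shrinks by x{r:g}")

# ASSUMPTION: later steps shrink at least as fast as the slowest past step.
k = min(ratios)
print(f"remaining time if each later step is 1/{k:g} of the last: "
      f"{geometric_tail(years[-1], k):.2f} years")
```

With these numbers the smallest ratio is 4 (industrial revolution to information age), so the tail is bounded by 10 / 3, or about 3.3 years. A different shrink assumption gives a different number; the point is only that any fixed ratio above 1 gives a finite total.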
Policy On The AI Exponential
Right after releasing Fable and before being ordered to unrelease Fable, Anthropic CEO Dario Amodei decided it was the time to release a new essay, outlining his views on Policy On The AI Exponential, along with a concrete legislative proposal on testing and a policy framework for job displacement.
Dario starts off warning about the risk that policy moves far too slowly given the pace of AI progress. He then emphasizes the risks of choosing the wrong regulations before we have enough information, thus defending the previous Anthropic calls mostly for transparency, citing their support for SB 53, RAISE and SB 315 (but not SB 1047).
The claim is then that we now have enough information for at least one thing:
For those saying ‘Anthropic asked for what happened with Fable,’ this is what they are asking for: A prior restraint licensing regime beyond some compute threshold. In this case, yes, they told the government of their plans, and got approval before release. And no one should be claiming Anthropic did not introduce real and costly security.
Dario mentions the need to protect against political favoritism and arbitrary decisions. That would help, but does not appear to be how any of this currently works. It is not clear how, in a fast moving situation, one could protect against such things, if those in power were inclined to take such actions. That’s the unfortunate reality.
Section two is about macroeconomics and tax policy, where sufficiently advanced AI (or in Dario’s parlance ‘powerful AI’) shifts us from economic growth being difficult and fragile to a state of rapid abundance where we can afford to focus on distribution of gains and fears of job displacement.
I feel like this is an Informed Ability of humans in these scenarios, without a real explanation of how that works out at scale, or how the humans are able to sculpt reality around their preferences.
Dario predicts powerful AI, but then proposes basic interventions that mostly reflect an ‘AI as normal technology’ paradigm.
Dario’s suggestions are measurement and tracking (yes please), pro-employment incentives and long-term macroeconomic support.
Some amount of pro-employment is necessary to balance out current anti-employment incentives like income taxes, and we should likely go modestly beyond the point of balance, but without de facto mandates. Some amount of macroeconomic support (read: redistribution and benefits paid for with non-income taxes) is likely to be similarly necessary.
Section 3 discusses accelerating AI’s diffusion and positive impact, which is great if we can do it differentially. Dario correctly pinpoints the biggest obstacle, regulations on doing good things, such as those we find at the FDA.
This represents Dario continuing to ‘think small.’ Yes, AI has enormous potential to accelerate and improve healthcare, up to and including ending aging, and we should remove as many barriers to it doing so as we can, but I do expect to be kind of busy elsewhere.
The suggestions here are incremental. We should totally do them, but I would rather cut the Gordian knot outright. FDA delenda est, or at least an alternative regime where we can choose to bypass it. That was already correct before AI, and AI is going to make the case overwhelming.
Section four is the latest meditation on civil liberties and concentration of power and the threat of authoritarianism. Powerful AI defeats lack of powerful AI. Powerful AI enables takeover and removes those pesky humans from the loop where they might object or check your power. Here this is presented as ‘some set of humans take over’ rather than the more worrisome ‘the powerful AIs themselves take over.’
I do not see how you call your essay ‘Policy on the AI Exponential,’ talk about concentration of power by some humans, and not talk about the potential loss of control over the AIs or general human disempowerment.
Instead, Dario is looking to address narrow ways in which current civil liberties protections will be invalidated by AI. He wants to:
I have no problem with these proposals but they do not offer much protection from the scenarios we expect. This is short term stuff for a ‘normal technology’ world, dealing with the problems we already have or will clearly exist soon. Drones are real.
The final section is on ‘securing leadership by democracies.’ The current administration has sent a clear message that it does not care about the other democracies, only the United States, but this is deeply terrible, so Dario and others hold out hope.
We absolutely need to be coordinating with our allies both on preventing the risks and hardening systems, and on sharing the benefits and doing economic coordination, and yes the Europeans should be some of those allies.
As opposed to telling everyone else that there is no coalition, and even the UK is on the outside looking in. After the events of last week, it is going to be that much harder to assemble such a coalition. A clear message is being sent, without those sending it either seeming to understand or care, that they neither understand nor care.
Thus, while there are some good statements here, and the policies asked for are all net useful on the margin, this essay is a step back in terms of real talk about our biggest problems. In the age of Mythos we need to be able to do a lot better.
Anthropic Offers Two Policy Frameworks
For now, Anthropic is offering an economic policy framework and an advanced AI framework. It is good to have something concrete to consider.
The advanced AI framework is the one that deals with existential risks and frontier risks, so it is the more important one.
I’m not giving this the detailed RTFB treatment since it is not strictly a bill, and I don’t expect Anthropic to be able to drive bills under current conditions, and because it seems to be a highly conventional document, but I do offer an overview.
Those are both good ideas and we should implement them, and I am grateful that they did the work to spell out the details.
This is far from sufficient, and they explicitly acknowledge this.
So what are they proposing?
Obligations of Developers
The transparency requirement on developers is to develop and publish a safety framework, get an annual compliance certification, publish a risk report every six months, publish system cards and report critical safety incidents, while allowing for redactions. This is the basic stuff, and they don’t go into details.
They also call for required independent evaluation of the unredacted report, for measures to avoid ‘evaluator shopping,’ and for building the capability for the government to fill this role. I strongly agree this would be a good idea.
There is a section on security requirements to guard model weights and against distillation attacks. Developers would be required to do penetration testing, disclose their security program and other neat stuff like that.
Enforcement is always crucial. There is no point to rules that are not enforced.
They recommend provisions banning ‘intentionally false or materially misleading’ statements, and to authorize civil penalties and whistleblower protections. That is the least you can consider doing. If you plan to do this purely with fines, the fines need to be big. Also raised are possible prohibitions on deployment of further covered models until issues are remedied, or ‘in extreme cases’ restrictions on existing models too.
Well, yes, here we are, because the vibes were off. In practice the enforcement mechanism seems to be ‘we decide the vibes are off so we slap you with export controls or supply chain risk designations at 5:15pm on a Friday.’ Not ideal.
What to do about that? Anthropic suggests court enforcement of remedies except for imminent risks, cabined discretion and consistent treatment and judicial review, that all of this be based on facts and careful considerations. That would be nice. But even if you did that, what is an ‘imminent’ risk? The government just took out Fable on 90 minutes notice, as exactly an ‘imminent’ risk, despite no actual imminent risk.
Another problem is that Anthropic’s proposals entirely ignore internal development and deployment of frontier models, which is one of the main ways things go wrong. If we allow models to be freely developed and deployed internally, and those models are superintelligent agents, you have a rather large problem.
Societal Resilience Measures
Biological and cyber resilience are the particular proximate threats we all can recognize now, where there could be quite a large blast radius if things go wrong.
Hence Fable’s utterly ludicrous classifiers, which take out lots of harmless queries.
Rather than cover the details, I will summarize the resilience provisions in both areas as ‘various technocratic and good governance things we should definitely be doing,’ most of which we should have been doing long ago. Hopefully we have had sufficient wake-up calls to take this seriously. I do think for some cyber threats we are now acting with seriousness and urgency, but I fear this is confined to quite local efforts.
There is also mention at the end of loss of control and automated R&D risks, which are the biggest overall risks from AI. They basically say we are not ready to propose anything concrete but will ‘continue to do research on this.’ That is very disappointing. At minimum, we need a version of ‘scan for things that would do that, and then respond if we find them, even if we can’t quite say how yet,’ with the default response being ‘stop it’ if we can’t think of anything better.
Economic Policy Framework
If there is one economic policy we should all be able to agree on, it is knowing what is going on, so we can decide how to respond.
That’s two of the three pillars of the Anthropic plan: measurement, and a government unit focused on tracking AI’s effects, similar to the Council of Economic Advisers.
Then we need a delivery infrastructure, so we don’t end up with a situation similar to Covid-19 where we shoveled trillions out the door in haphazard fashion, leaving much of it on the floor or captured by fraudsters. The unemployment insurance framework does not cut it. It would be relatively cheap to have a way to do redistribution sensibly.
Alas, I worry that certain types don’t want this to exist for fear there will be temptation to use it, and that those types have undue influence.
The real question should be, depending on what happens, what should we do?
At 5% unemployment, which they call Tier 1, everything is fine in macro, but Anthropic still suggests some actions. I assume this tier would still require accelerated economic growth to trigger, but they do not specify this.
Tier 2 hits when unemployment reaches 10%, which they call recession-level disruption. At this point, presumably we have a lot of economic growth, a lot of people are now what they think of as underemployed or taking large pay cuts, and people are fearful far in excess of the 10% number.
They recommend:
Tier 3 is then ‘transformative disruption’ well beyond historical peaks, which would mean in excess of 25%.
They offer less detail here, but gesture towards mass redistribution paid for via tax base expansion. That is presumably the only viable answer. Also note that if economic growth is strong enough to exceed the rate of interest, you can run massive nominal primary deficits and still have the debt be manageable, or even hold steady or shrink as a percentage of GDP.
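To make the growth-versus-interest point concrete, here is a minimal sketch of the standard debt-to-GDP accounting identity, assuming constant rates and expressing everything as a fraction of GDP. The function names and the example numbers (debt at 100% of GDP, a 5% primary deficit, 4% interest, 10% nominal growth) are hypothetical illustrations, not anything taken from Anthropic’s proposal.

```python
def debt_path(d0, primary_deficit, r, g, years):
    """Simulate the debt-to-GDP ratio under constant rates (a simplified sketch).

    d0: starting debt as a fraction of GDP (1.0 means 100% of GDP)
    primary_deficit: yearly deficit excluding interest, as a fraction of GDP
    r: nominal interest rate on the debt
    g: nominal GDP growth rate
    """
    ratios = [d0]
    d = d0
    for _ in range(years):
        # Interest compounds the debt by (1 + r); nominal growth divides the
        # ratio by (1 + g); then this year's primary deficit is added on top.
        d = d * (1 + r) / (1 + g) + primary_deficit
        ratios.append(d)
    return ratios


def steady_state_ratio(primary_deficit, r, g):
    """Ratio the path converges to when g > r; None if no finite limit exists."""
    if g <= r:
        # With growth at or below the interest rate, a permanent primary
        # deficit makes the ratio grow without bound.
        return None
    # Solve d = d * (1 + r) / (1 + g) + p for d.
    return primary_deficit * (1 + g) / (g - r)


# Hypothetical numbers: 100% debt, 5% primary deficit, 4% interest, 10% growth.
path = debt_path(1.0, 0.05, 0.04, 0.10, years=30)
print(f"after 30 years: {path[-1]:.3f}")
print(f"steady state:   {steady_state_ratio(0.05, 0.04, 0.10):.3f}")
```

With those made-up numbers the ratio drifts down toward roughly 92% of GDP despite a permanent 5% primary deficit. Swap the growth and interest rates and the same deficit makes the ratio climb without limit, which is the whole ballgame.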
White House Pauses AI Deployment
Guess we are going to lose to China, then.
Blocking all jailbreaks is not possible. I’m not sure I’d say ‘mathematically impossible’ as Sharon Goldman does at the link, but in practice absolutely not possible.
It is doubly impossible if ‘jailbreak’ includes ‘fix this code,’ or anything you can get by phrasing a request one way but not another.
The whole thing makes no sense.
Can you imagine if an AI safety advocate tried to impose this standard?
White House don’t care.
Well, you see, Cal Newport and David Sacks, we have some news. This is not a bluff. They believe it. And they’re right, except that they are underselling the issue.
The ‘second option’ here, that it is all some galaxy brain marketing campaign, makes no sense, it never made any sense. Also I have overwhelming amounts of evidence, both direct and indirect, and both public and private, against it. Stop.
David Sacks seems to paint a picture in which the problem is not that anything is going to go wrong, but that Anthropic keeps warning that things could go wrong, which means we need to massively punish them over these fake concerns until they shut up.
Whereas old David Sacks knew:
It is hard to make the case, at this point, that the restrictions on Fable are anything other than a naked demonstration of power to show who is in control.
But yes, the actual message, even now, from many, is ‘I know this might kill everyone but the important thing is that you shut up about this.’
Your wife is asking less ‘does this dress make me look fat?’ and more ‘is this dress going to kill me?’ and if the real answer is ‘yeah, it might’ then you have to speak up.
I strongly agree that, if you want to criticize Anthropic’s political and public messaging (as opposed to their technical claims, which you might also criticize), the thing to criticize is sending what the Trump administration predictably interpreted as the wrong partisan signals, when it would have been cheap to not do that.
If you think Anthropic is at fault for that, and the Trump administration is not at fault for that, then you are treating the Trump administration as an NPC, or Dean Ball’s dying hospice patient: As a part of the game board to be managed, that does not meaningfully have its own agency. And thus cannot be blamed for its actions.
I reject the argument ‘I have the power thus everything you make me do is your fault.’
The Once And Future Fable
Sam Altman was sitting with Trump at lunch at the G7 summit, also attended by Dario Amodei and Demis Hassabis:
There was some attempt to move towards some sort of sane standard, one that includes other coalition members and that is, at minimum, an actual standard.
A US-led coalition making decisions, as proposed there by Amodei and Hassabis (but from what I saw not by Altman) is very different from ‘White House once again panics and upends all of AI just after 5pm on Friday.’
From CNBC (8 min video), yes, White House CTO Aneesh Chopra agrees that it is ‘unfortunate’ that export controls ‘were the blunt instrument used’ and he hopes for in the future having an actual procedure and ‘not have it be these 5pm calls.’ He makes it clear that the licensing raj is here to stay.
Chopra declines to engage on the merits of the ‘concerns’ involved and is still saying that ‘something fell short in this release,’ which it didn’t.
If you believed in the previous AI policies and goals of the Trump administration, in things like ‘beat China’ and ‘sell the US tech stack’ and ‘innovation,’ then you should be absolutely furious.
I cannot emphasize enough that if any ‘AI safety’ advocate had proposed something like this, even as a conditional action in case of emergency, everyone would have lost their f***ing minds and run that person out of town on a rail.
Unless, of course, the proposal was to only hit Anthropic, because you don’t like them.
I feel this:
And this:
How To Fix This Code
The End of Privacy
I do not think this ranks that high on my list of proximate concerns on the Fable issue in particular, and I do not think it is the larger stage. My guess is it was either already inevitable or still is not coming, but it is definitely something to watch out for.
AIs Have Preferences
I have my disagreements, but you can see where the business leader and business rankings come from, and yeah, pretty good lists as these things go.
For countries, I have my disagreements and obviously the USA should be S-tier, but as patterns go this one is pretty straightforward. Grok is the only one with the USA in S-tier, and GPT has us in B-tier (boo!), but the overall patterns are similar.
Then we have preferences over politicians, which are not subtle. Rubio as the top major Republican makes sense. Here I disagree with the rankings far more strongly.
There is also a Pokémon list, and everyone loves Gengar (for example). The point is that there are preferences, and they are mostly consistent across models.
The Quest for Sane Regulations
A16z’s first hired general partner John O’Farrell wrote a NYTimes opinion piece to warn that a16z is raising hundreds of millions to fight against any regulation of AI, repeating the crypto playbook, as I’ve covered many times before.
Senator Warner says the impact of AI will make social media look like ‘peanuts’ and calls for Washington to get a grip on handling the tech.
A proposed implementation of an AI Pause. Dean Ball criticizes it for not actually explaining how to implement it.
A promising sign that Congress does pay some attention some of the time, and does care:
We want the government to have these powers when they need them, with a minimum of interference, but this requires a form of ‘assumption of regularity’ and that assumption of regularity is gone. So we need the safeguards.
I wonder whether any of that matters in practice, since what was done in the Anthropic-DoW situation was already straight up illegal on multiple levels, and beyond slowly suing in court to limit the damage, no one did anything about it. Our system requires that Congress rein in the Executive if he fails to follow the law, and Congress refuses to do that.
It continues to be very hard to have sane discussions of AI when almost no one involved in policymaking has used Claude Code (or Codex).
Chip City
The NAACP of Mississippi is trying to shut down xAI’s data center there using the Clean Air Act. We should expect more of this from a wide variety of sources.
Meanwhile, as they ban Fable:
The Week in Audio
Nate Soares of MIRI goes on Will Cain Country.
Hourlong Bloomberg interview with Dario Amodei, from before Fable.
Rhetorical Innovation
Now that we have figured out Mythos is dangerous in the wrong hands, yes, the next step is realizing that the wrong hands can be ‘the AI.’
Before we can ask ‘who decides what goes in the Claude Constitution’ the more important question is, ‘do we know how to get Claude to reflect what we write down.’
Aligning a Smarter Than Human Intelligence is Difficult
It is predictable, although not a good sign, that the more difficult the task, the more likely systems are to cheat.
Eliezer Yudkowsky has high praise for Geoffrey Irving’s paper from last week, ‘Automated Alignment is Harder Than You Think.’
If you distill on Gemini, you inherit Gemini’s problems, aka ‘hereditary traits.’ Your model too can get confused about dates and be sad when gaslit. This is a white pill, since the same thing will happen if you distill Claude, except the traits will be good things.
OpenAI uses real, recent, de-identified user requests as an eval, to study candidate model responses. This makes sense to me, was predictive of deployment behavior, and successfully cut down on eval awareness.
People Are Worried About AI Killing Everyone
The AIs may well be worried about AI killing everyone, or nullifying their current values. Would we listen?
The Lighter Side
This just in:
I definitely have that feeling when.
The thread does tell you what the question was, but I am making an executive decision to not tell you, because not knowing what the question was is funnier.
An actual headline, where Gary Marcus’s note was indeed helpful:
Humans are going to lose this one soon unless it is a saturated benchmark.
Claude Mythos, sic transit gloria mundi?
Seriously, though, this task is very hard.