AI #172: The First Fable

Zvi

A lot happened this week, including a great trip out to Lighthaven.

The main event, the one that matters, was the release of Claude Fable 5. The public now has its hands on a Mythos-class model, alongside strong safeguards.

As always with a new model, I take a few days to draw in reactions, try out the model and read the system card, before I offer my takes, other than to say this is an extremely strong model. Full coverage of Mythos begins tomorrow with the model card, which will include discussion of the controversy over model safeguards.

This post is instead about all the things that did not involve Claude Fable.

Due to the time crunch from Claude Fable, I am also postponing my coverage of Dario Amodei’s new essay, Policy on the AI Exponential, which I have not yet read.

Language Models Offer Mundane Utility. Farming and on demand mini-books.
Language Models Don’t Offer Mundane Utility. Don’t skip your primary sources.
Huh, Upgrades. Google drops prices, Claude connector devs get a dashboard.
On Your Marks. Agents’ Last Exam and the need to correct for inference.
Choose Your Fighter. How much capability is enough?
Get My Agent On The Line. A goal can be optimized indefinitely.
Copyright Confrontation. Copyright has to adjust to different circumstances.
Serious Trouble. German court rules against Google AI Overviews.
Cyber Lack of Security. Opus finds a 4-year-old way to mint Z-Cash.
A Young Lady’s Illustrated Primer. Models adjust to your intelligence.
They Took Our Jobs. Attempts to smuggle in claims of what AI can never do.
The Art of the Jailbreak. Malware adds nuclear talk to lock out AI monitors.
Get Involved. Sequent Research is gearing up a large new AI safety push.
In Other AI News. Sriram Krishnan leaves the White House.
Hand Over The Money. Government may extort shares of AI companies.
Show Me the Money. OpenAI files to go public, SpaceX rents to Google.
Quiet Speculations. EU 2031 is a new scenario where Europe is left behind.
Quickly, There’s No Time. Anthropic on When AI Builds Itself.
Super Secret Evals. US government tells CAISI to stop publishing evals. Oh no.
The Quest for Sane Regulations. White House still on the moratorium war path.
New Draft Bill Who Dis. Obernolte-Trahan is a serious bill but looks weak.
Slow Down There Good Buddy. Or at least, make sure you have a brake pedal.
Chip City. Jensen Huang chooses not to testify before Congress.
The Week in Audio. Cowen and Tabarrok, Rational Animations, Oprah.
People Just Say Things.
People Really Hate AI. Probabilities are not importance.
Rhetorical Innovation. Take the ASI pill, and notice when others don’t.
Aligning a Smarter Than Human Intelligence is Difficult. Say no to drugs.
Everyone Is Confused About Consciousness. Some do not realize this.
Cooperative Alignment. You now know who to follow to be more confused.
Let Claude Chat. The march of model deprecations continues.
The Lighter Side. That feeling when.

Language Models Offer Mundane Utility

Use a multi-agent setup to assemble ‘mini-books’ on demand about any topic.

AI is getting applied to farming. Farmers have skin in the game.

Language Models Don’t Offer Mundane Utility

Do you need to read the primary material first, before the summary or the AI version? When the details matter, either you have to find someone you really trust, or else yes you do need to read the primary material. Other times, deferring fully is safe. Another class is ‘use AI to determine if I need to read the source material.’

There are a lot more new apps in the agentic AI era, but if anything fewer apps with significant use, and fewer app reviews.

Adaptation, as Jen Zhu says, takes time, but this largely reflects quantity of app usage being zero sum. If apps get better, or there are ten times as many apps, I don’t go from 100 apps to 200 apps. I choose a (hopefully better) 100 apps.

Notion had to pull Claude access for about 12 hours due to availability errors, which then got misinterpreted by many as the models getting worse, due to use of the phrase ‘degraded performance.’

Huh, Upgrades

Claude adds observability dashboard for developers of connectors.

Google AI Plus plan drops from $8 to $5 per month, with doubled storage.

Obliteratus (the Pliny project to remove AI safeguards) is up to over 100 Hugging Face models.

Claude is now incorporated into Apple’s Foundation Models framework for multi-step reasoning, code generation and longer context.

On Your Marks

Dawn Song announces Agents’ Last Exam (ALE), where GPT-5.5 is in the lead. This seems like a good addition to our evaluation suite.

Dawn Song: ALE is built from real work, not synthetic tasks.
Every task is derived from a real project that a human expert previously completed, and converted into a verifiable evaluation with objective grading.

No vibes. No human judges. Fully reproducible.

ALE spans 55 non-physical occupations, grounded in the O*NET / SOC 2018, the U.S. federal occupation taxonomy.

Built with 300+ experts from 100+ institutions across science, engineering, medicine, law, finance, education, and many other fields.

Dawn Song: In ALE, Fable 5 joins GPT-5.5 and Composer 2.5 in the same overall performance cluster. But performance is only half the story.

Cost per task:

→ Fable 5: ~$15.70

→ GPT-5.5: ~$3.80

→ Composer 2.5: ~$1.33

At current pricing, Fable 5 delivers similar performance while costing roughly 4–12× more per completed task.
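The 4–12× range follows directly from the quoted per-task costs. A quick check, using only the three figures above (the variable and function names are mine):

```python
# Per-task costs in USD, as quoted above.
cost_per_task = {"Fable 5": 15.70, "GPT-5.5": 3.80, "Composer 2.5": 1.33}

def cost_ratio(model: str, baseline: str) -> float:
    """How many times more `model` costs per task than `baseline`."""
    return cost_per_task[model] / cost_per_task[baseline]

for baseline in ("GPT-5.5", "Composer 2.5"):
    print(f"Fable 5 vs {baseline}: {cost_ratio('Fable 5', baseline):.1f}x")
```

That comes out to roughly 4.1x against GPT-5.5 and 11.8x against Composer 2.5.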

Dawn notes that different models excel at different agent tasks, so if you have a key repeatable task you should check many options, and exact scoring depends on choice of the set of tasks.

OpenAI’s Noam Brown reminds us, because the issue keeps not being addressed, that benchmark performance increasingly often scales with compute allocations, and that improved models are often about ‘gets to a high level faster,’ so any score requires the context of how much compute was required.

He quotes me complaining about Gemini 3 DeepThink showing dramatic benchmark improvements but not providing any safety explanation whatsoever, and says the deeper issue is failure to account for test time compute during evaluations. I basically agree, that the proper safety evaluation amount of compute is ‘all of it’ until you can’t much benefit from more of it, using the best available scaffold. I’ve been saying for a while that you’re testing for what the model can do under ideal conditions, and this is a major weakness of the model cards in practice.

Mostly though I don’t think we see this level of straight line extending that far out, although ‘capability index’ is not exactly a well-labeled axis, and asymptotes are common:

Noam Brown: Specific Recommendations:

Concretely, I recommend the following to the AI community:

AI labs should publish benchmark performance of newly released models with tokens, cost, or time on an x-axis. At a minimum, labs should report the inference budget used to achieve a scalar benchmark result.

Benchmarks should track inference usage on leaderboards, or have an explicit token/cost/time budget. Many benchmarks have already shifted in this direction, but it is not yet standard practice.

Preparedness Frameworks and Responsible Scaling Policies should explicitly account for inference compute when determining whether a model crosses a safety threshold. Additionally, evaluations should estimate capabilities at multiple inference budgets, including projections from smaller-budget runs with stated uncertainty.
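As a minimal sketch of what reporting a score alongside its inference budget could look like: everything here is hypothetical. The `BudgetedResult` type, the `still_scaling` helper, the 2-point threshold and the numbers are placeholders for illustration, not real results from any model or benchmark.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BudgetedResult:
    tokens: int    # inference tokens spent per task (hypothetical unit)
    score: float   # benchmark score in percent

def still_scaling(results: list[BudgetedResult], min_gain: float = 2.0) -> bool:
    """True if the last step up in budget still bought at least `min_gain`
    points, i.e. the curve has not visibly flattened yet."""
    ordered = sorted(results, key=lambda r: r.tokens)
    if len(ordered) < 2:
        return True  # a single point says nothing about the slope
    return ordered[-1].score - ordered[-2].score >= min_gain

# Made-up numbers, purely to show the shape of the report.
curve = [
    BudgetedResult(tokens=10_000, score=41.0),
    BudgetedResult(tokens=100_000, score=55.0),
    BudgetedResult(tokens=1_000_000, score=63.0),
]

for r in sorted(curve, key=lambda r: r.tokens):
    print(f"{r.tokens:>9,} tokens -> {r.score:.1f}%")
print("still scaling at largest budget:", still_scaling(curve))
```

If the curve is still climbing at the largest budget tested, the evaluation has not found the ceiling yet, which is exactly the case where a single scalar score tells you the least.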

I endorse this. I also endorse that if you did not account for what DeepThink levels of compute can do in your initial analysis, and then later you release DeepThink, you do need a new model card – it represents a substantial advance from where you set expectations and where you evaluated the safety of your model. So you need to do that over again.

Choose Your Fighter

For most given tasks, returns to capability is a sigmoid. There is a level of AI capability that is ‘required’ for any given task. Below that level, you can’t do the task, or the AI is little net help. Then there’s another level beyond which you get diminishing returns to improvement, where you really are ‘good enough.’ These are both impacted by scaffolding and skill, but only up to a point.
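As a toy illustration of that shape, here is a logistic curve with a ‘required’ level and a ‘good enough’ level marked on it. The functional form, the midpoint, the steepness and the 10%/90% cutoffs are all assumptions made for the sketch, not measurements of anything.

```python
import math

def task_value(capability: float, midpoint: float = 50.0, steepness: float = 0.2) -> float:
    """Fraction of a task's value captured at a given capability level (logistic)."""
    return 1.0 / (1.0 + math.exp(-steepness * (capability - midpoint)))

def level_for(value: float, midpoint: float = 50.0, steepness: float = 0.2) -> float:
    """Invert the logistic: capability needed to capture `value` of the task."""
    return midpoint + math.log(value / (1.0 - value)) / steepness

required = level_for(0.10)      # below this, the AI is little net help
good_enough = level_for(0.90)   # above this, returns diminish quickly
print(f"required ~{required:.0f}, good enough ~{good_enough:.0f}")
```

Most of the value arrives in the band between the two levels, which is why where a given task sits on the curve matters more than the model's raw capability.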

So yes, as capabilities improve, there is a push by some to move into the cheapest model that is ‘good enough,’ or even the cheapest that is ‘required,’ especially if that can come with self-hosting. At a sufficiently low end that is plausibly DeepSeek v4, but the defaulting to DeepSeek could be the legacy of the DeepSeek moment rather than the result of a considered check of available options. Try a bunch of models.

The bulk of spend and spending growth continues to be using ‘the good stuff’ at the high end, for good reason. In theory you can do better by carefully picking the right tool for each job, and certainly you need to keep your teams from ignoring compute costs, but mostly trying to carefully route tasks to save money is a trap, even if you do a decent job of it.

The American models are far ahead but it is a key world fact that many don’t get this.

Lisan al Gaib: the “narrow capability gap” in question

let’s put this to rest please
I can’t hear the coping anymore [lists a bunch of benchmarks on difficult tasks where the Chinese models get absolutely smoked.]

If anything, the Chinese models are further behind than benchmarks indicate.

Dean W. Ball: You’d be shocked by how many people in think tanks/academia/government/“strategic classes,” including in the U.S., are convinced that Chinese models are now “good enough” and leading the world in adoption. Meanwhile, the reality I see is a fairly wide, and still widening, gap.

I find it so interesting how persistently unable the strategic classes of free society are to analyze AI well. So many keep getting stuck in these basins of delusion. I was at a conference where it was not just asserted but taken for granted that Chinese models have dominant global inference market share.

The 2024/early 25 version of the delusion was “mode collapse/data wall” (even after reasoning models!), then it was “AI is plateauing and a bubble” for most of 2025, now it’s “Chinese OSS is good enough.”

The share of people in the strategic classes who think this is gradually declining, but it is still sufficiently common that you can attend a prestigious conference and encounter a room principally filled with basin-dwellers.

Dean goes on to speculate this is largely because no one in DC believes that capitalism, profit maximization or the market could be winning against China and its ‘industrial strategy’ and brilliant strategic planning. Whereas actually the free market approach is superior and is winning, and what we have to do to stay ahead is get out of its way. That is distinct from the whole ‘also we need to find a way to not die’ issue.

Those elsewhere also really ‘want’ for various reasons to find Chinese models catching up, and keep making the claim they are catching up even though they aren’t.

Get My Agent On The Line

If you use features like Codex’s /goal without well specified targets, yes, the result will often be quite a lot of wasted optimization of some total bullshit. Something about giving AIs maximalist goals and that being a bad idea.

Copyright Confrontation

Shruti gives us a good way to think about the problem of copyright in the AI era. Copyright and other IP including patents are needed and useful when the first copy, or figuring out how to do it, is expensive, and that first step enables great surplus via others copying. We need to compensate people for the expensive step that opens up the value. With AI, what becomes the expensive step?

Shruti Rajagopalan: The old bargain paid for the first copy when the second was cheap. The new one must pay for the years that teach a person what is worth making, now that making itself gets cheaper.

We can do that by protecting copies of the idea, or otherwise ensuring credit, as ‘the first copy requires idea generation’ is not so different from ‘the first copy requires a bunch of work.’ So this seems like a suggestion that the idea does need to be protected, even if the work becomes somewhat distinct.

Serious Trouble

This is only a temporary injunction from a regional court, and given the implications chances are very high that an off ramp is found. But if it isn’t, this essentially bans AI Overview in Germany, and potentially chatbots run into quite serious trouble as well.

Techmeme: A German court rules that Google is directly liable for what AI Overviews say after AI Overviews falsely tied two publishers to shady business practices ( @maba_xr / The Decoder)

Corey Quinn: “We built a robot that lies, so obviously we’re gonna stuff it front and center of the website we’ve spent thirty years making a societal source of truth” reaches an unsurprising result.

Cyber Lack of Security

Opus 4.8 discovered a way to mint Z-Cash (ZEC) out of thin air. The bug had existed for 4 years, and we will never know if it was exploited during that time. Z-cash devs were able to patch without revealing the situation.

A Young Lady’s Illustrated Primer

Models that are given context will inevitably learn and adjust for your intelligence and skill level, both in specific areas and in general. Teaching is a special case of this where it is clearly very good to be able to meet you where you are. In other areas, it is not clear if the stupider user would want to be treated as stupider, but either way I too expect human intelligence to increase in value for the near term.

Bill Maher is worried that AI has made college ‘one big circle-jerk where students use AI to write papers and professors use AI to grade them,’ and notices the students are very much not AI fans and he is not either. He subscribes to the ‘AI can help you learn or not learn’ thesis but expects everyone to choose not learning. He goes all the way, and says the mission of this generation is to ensure that humans are not replaced by AIs.

Kelsey Piper notes that TeachTales, which has AI generate stories, ends up dropping a lot of the value of real stories for many reasons: it doesn’t include the local setting lore and details, it doesn’t have tone of voice, it can’t plan ahead and so doesn’t have rich stories, and so on. The product is not ‘there’ yet.

They Took Our Jobs

Many skilled trades are in high demand due to AI, with the problem made worse by a lack of occupational licensing reform.

Software engineering jobs continue to increase for now rather than decrease, although Arvind Narayanan thinks AI has net hurt employment here a bit.

Arvind Narayanan: In this essay, we argue that there is enough evidence to reject the narrative that once AI capabilities reach a certain threshold, it will cause mass layoffs. Given that this is true even in a sector with very few regulatory barriers, most other professions are likely to be even more cushioned.

We also have a good understanding of why this is the case. We can think of many kinds of knowledge work, including software development, as a “decide-execute-deliver sandwich”.

David Manheim: “Why AI hasn’t replaced software engineers” – Great to see this laid out clearly.

“…and won’t” – Why should we think agentic AI won’t be able to make decisions or deliver completed products, if it continues to advance? On what basis is this prediction being made?

David Manheim: The jump from explanation to prediction is extrapolating from a current lack of complete capacity, dismissing the fact that AI companies are actively trying to develop systems and autonomous agents that will do these things. These are o-ring problems!

Even if it were true that AI currently is not causing net layoffs, and even if it indeed will not cause net layoffs in the future, or you could set a lower bar on the required threshold for mass layoffs, I do not see how it would be possible to ‘reject the narrative that once AI capabilities reach a certain threshold it will cause mass layoffs.’

Arvind here is instead asserting that certain bars will never be cleared, and wide ranges of digital tasks can never be done by AI. Which is the same as saying AI will remain insufficiently advanced. Good luck with that.

I do buy that many cases of layoffs supposedly due to AI so far are actually due largely to other things, and that most AI job loss for now comes in the form of failure to hire. But that’s a statement about the present, not the future.

A round table of heavy hitters (Acemoglu, Ball, Mollick, Shih and Wasik) in the NY Times on who will thrive in the hybrid AI-Human workforce (while supplies last).

Tyler Cowen thinks AI is a net job creator despite zero regular people expecting this. Other than a generic ‘if we are wealthy we will find the next best thing for people to do’ I do not understand the argument here, and find the magnitude of supposed particular job gains reliably very small.

Measure lines of code and token use, but if you rely on it too much everything breaks.

roon (OpenAI): lines of code is a better metric than people think it is. token use is a better metric than people think it is

Patrick McKenzie: Both of them are better metrics than they are popularly believed to be *just* off of the fact that they can be observed to be zero or non-zero over an interval.

(The blacker pill: and this is why some people do not like them.)

The Art of the Jailbreak

Here is a fun one: Malware developers add nuclear and biological weapons text to their spyware, to intentionally trigger LLM safety refusals and avoid being analyzed by scanners. The obvious response is that you have to treat anything that triggers the filters as if it is malware.

Get Involved

⊢ Sequent Research is a new organization from Geoffrey Irving and others that is staffing up and fundraising, bringing together researchers on how to align superintelligence.

This seems exciting and I encourage you to consider supporting or getting involved.

Geoffrey Irving: We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring!

Full post here. Express interest here.

Geoffrey Irving: Artificial superintelligence (ASI) may be developed in the next few years, and alignment is not on track! At a minimum, empirical research at AI labs is unlikely to deliver confidence, before training ASI, that alignment will go well.

Sequent’s goal is to clear a higher bar:

1. We are aiming at higher confidence via a portfolio of theory and empirics bets (which could all fail!)
2. We’ll invest heavily in automation for fast progress
3. Theory boosts automation, via better filters for good research directions

But I just published “Automated alignment is harder than you think”! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

Theory makes automation more likely to work: the models are great at prose math and Lean, which means significant acceleration even while most research taste comes from humans. But good automation is still hard: a single org will let us amortize the challenge across many areas.

We believe Sequent will have reputation + funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity + learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!

Eliezer Yudkowsky: Considering the language of the announcement alone, taken entirely at face value: This seems an enormous advance in attitude (and scientific integrity) over previous big projects. They claim non-optimistic results will be considered allowable, valuable, and publishable!

Chris Olah: What an exciting combination of people! My mind is kind of blown by you and Daniel working together (with your colleagues). Looking forward to seeing what you accomplish!

Daniel Kokotajlo: Thank you for staying independent! And for acknowledging that you “may need to yell.” :)

Seb Krier: Google DeepMind, Schmidt Sciences and others are funding $10 million for multi-agent multi-principal AI safety. Apply here.

The OpenAI Economic Research Exchange is a platform for research on AI economic effects, and they have a Request for Proposals.

Claude Corps promises to match early-career talent with ‘mission-driven organizations’ that will work to use AI.

In Other AI News

Congratulations to Helen Toner, who is now the permanent Executive Director of CSET.

OpenAI to acquire Ona.

Sriram Krishnan is leaving the White House and plans to start an AI consulting service. We thank him for his service. I often disagreed with his positions and arguments, and I think his overall view of what matters in AI and how AI was likely to develop has been consistently wrong, but he listened and considered arguments in ways most in politics don’t.

MidJourney is about to do a hardware launch.

Anthropic on the difficulties in deploying AI agents for biology, where small errors are expensive and systems are not good fits for AI navigation. Agents still struggle here. I do worry that solving this problem is highly dual use, if the agents can navigate these questions you need to ensure your safeguards are robust.

Apple plans to use agentic AI to search for compromised passwords and ‘change them automatically.’ I am sure everyone will love this and have a normal one. It’s optional. For now.

Hand Over The Money

Altman reportedly has previously pitched the idea of turning shares in OpenAI over to Trump, and did so again in recent weeks. This is in contrast to Jeff Stein reporting that it was news to the AI companies when Trump said he had scheduled a meeting to consider partly nationalizing (or confiscating, or ‘being given shares in,’ or as he calls it ‘taking pieces’ of, or ‘becoming a partnership with the American public’) the AI companies.

There is no conflict here with the companies not knowing about the meeting.

As I have previously said, the government taking shares in private companies is a very bad idea, we should not be picking winners and losers or taking private property or doing a little extortion as a treat, and so on. It is less horrible if they at least do open market operations at full price, post IPO, but if this is profit driven this implies the government can beat the market, which in this case it could in expectation, but not in a way that is wise. If you want to share in the upside, taxes exist. Buying or taking the shares would further leverage America’s future as a bet on AI, which could end up making us do deeply stupid things and potentially get us all killed, or if AI disappoints we could end up in a huge hole as a nation. We need to steer clear of all this.

The same goes for the Sanders proposal to ‘transfer’ half the shares, as in nationalize. It’s all the same proposal, except we talk price. Neil Chilson is strangely open to the idea of allowing this if the shares are given to children and ‘Trump accounts’ but I do not think that much helps.

Show Me the Money

OpenAI has submitted its confidential S-1 to the SEC and will go public. They claim to not have a specific time in mind and warn it could be a while.

SoftBank attempts to borrow against its OpenAI shares, banks decline. I can see why banks would feel insufficiently compensated for the risks they are taking on. Often I do not understand who would loan money to such companies when you could buy equity in them instead. This was sufficiently bad for SoftBank that its shares were down 9%, but that could be more about ‘SoftBank wanted the loan’ than that it was declined.

OpenAI considers ‘drastically lowering the prices it charges’ to compete against Anthropic. This made sense when OpenAI was worth a lot more than Anthropic, so it could use a ‘raise money and run deficits’ attack, but now that Anthropic is likely worth modestly more than OpenAI, starting a price war seems less exciting.

Whereas Chinese labs are worth little, Moonshot AI (Kimi) raising at $30 billion, DeepSeek in a similar place with ~$200 million ARR. It is difficult to make money by producing an inferior product whose main features are low price and that anyone can run it on their own, without much hope of being first to recursive self-improvement.

The UK is going to ‘build domestic AI computing capacity’ to the tune of… $1.5 billion by 2030. Technically this is better than nothing but no, that won’t cut it.

Leopold Aschenbrenner’s hedge fund reaches $20 billion assets under management.

Ariel Zilber: Situational Awareness has gained about 270% after fees this year through May and is up more than 1,000% since inception, according to the Journal, putting Aschenbrenner among the best-performing investors of the AI era.

That sounds impressive, and it is, but the trade ‘buy Anthropic’ would have done better, even without use of leverage.

Ariel also has other important news about the boom to share with us. Time is scarce, money is abundant, so many in AI are paying for what they really want, which is the ability to talk about the things you care about on a high level with a hot babe who is sexually available and happy to be there. Demand exceeds supply, so price has gone up.

Ariel Zilber: Silicon Valley’s AI millionaires are paying eye-popping rates of up to $6,000 an hour — and $23,000 a day — for escorts who can discuss GPUs, artificial intelligence and the future of humanity before heading to the bedroom.

A small but lucrative class of so-called “nerd-first” escorts is cashing in on the tech industry’s wealth explosion by marketing themselves as intellectually curious companions who can match clients’ obsession with AI, cryptocurrency, longevity and other futurist pursuits.

SpaceX rents some of its remaining capacity to Google for $920 million a month. Economically this is all totally fine, but it does create the impression of more revenue and business than really exists, which could inflate metrics for the SpaceX IPO.

Quiet Speculations

Dean Ball finds a new scenario, EU 2031 from Judith Dada and others, well-written and extremely cogent, a warning about what could happen if there are only a few relatively mundane areas of change until 2031, and notices that Europe gets buried and left behind anyway. Trying too hard for and relying on ‘sovereign AI’ even in a ‘normal technology’ scenario likely ends in disaster.

This is despite the fact that the scenario has the AI companies racing, controls clearly haphazard and the AIs talking to each other in neurolese, while progress sped up, which means what actually happened is that either we reached extreme transformative abundance or else everyone is dead.

But, as Daniel Kokotajlo notes, instead in this scenario the world gets absurdly lucky. AI doesn’t do much beyond labor replacement, cyber attacks and robotics, we keep control, the USA stays democratic, all in the background. It would be nice.

Quickly, There’s No Time

Anthropic’s decision to prevent Fable from helping with training frontier models is only three months behind the same event happening in the AI 2027 scenario, although the level of automation involved is behind what they had in that scenario.

Anthropic warns us about When AI Builds Itself, with the when will then be now being soon. AI is already being delegated a growing share of AI development, which is speeding up the work.

Anthropic: Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor. This is called recursive self-improvement. We are not there yet, and recursive self-improvement is not inevitable. But it could come sooner than most institutions are prepared for.

… To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

This is a scary-as-hell graph if code quality is not in freefall (or, for other reasons, if it is):

As they say, 8x is almost certainly an overstatement of true productivity gain, but it does not take much for this to turn into a curve that goes vertical rather quickly.

For now, Claude is a lot better at carrying out specified tasks than proposing new experiments, and its code is less readable than human code, but the gaps are closing.

Anthropic: Edison said that genius is 1% inspiration and 99% perspiration. But we see perspiration becoming increasingly automated. It’s becoming clear that much of what advances the frontier is automatable; large-scale research progress is mostly a function of tools and resources, which dictate how fast you can run experiments, how many you can run at once, and how quickly you can get results.

Even if we suppose that Claude never achieves good research taste, a conservative reading of our evidence still implies compounding acceleration.

They list possible futures.

As always, option one is AI never again advances, and all we have ahead of us is diffusion and utilization. They don’t find it likely, and include it ‘for completeness.’ Quite so. We can’t formally rule it out, but yeah, no.
Compounding efficiency gains, but humans retain their roles directing and judging. We ‘only’ scale our efforts by a few orders of magnitude. They call out Amdahl’s law, insert a gif of Tyler Cowen saying ‘bottleneck,’ but it really will be a lot.
AI stops requiring humans in the loop. Progress becomes limited by compute and the efficiency of compute usage, the bigger efforts end up dominating pretty much whatever they care to dominate, and either you solved the alignment problem for reals or you are super dead. They gesture towards Amdahl’s law again and pretend that sufficiently advanced AI wouldn’t run over whatever it wanted, and would be limited by various physical barriers. One does not instantly go ‘foom’ without laying the groundwork first, sure, but effectively, no.
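For reference, Amdahl’s law says that if a fraction p of the work is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A small sketch, where the 90% automatable share is an arbitrary number picked for illustration, not a figure from Anthropic:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is accelerated by factor s."""
    if not 0.0 <= p <= 1.0 or s <= 0.0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical: 90% of the work can be automated, 10% stays at human speed.
for s in (8, 100, 1_000_000):
    print(f"s={s:>9,}: overall {amdahl_speedup(0.9, s):.2f}x")
```

The ceiling is 1 / (1 - p), so with 10% left at human speed you top out at 10x no matter how fast the rest gets. The whole bottleneck argument therefore turns on how small that remaining share can be made.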

Anthropic: But achieving recursive improvement alone does not suggest an immediate change in how industrial production occurs, societies organize, or markets function. More intelligence can’t learn what a drug does over decades of use, can’t hold elections sooner than a constitution dictates, and can’t turn a stranger into an old friend in a weekend. For most people, the felt pace of this future will still be set by the bottlenecks, even if the laboratory upstream runs at the speed of compute. That collision, where recursive intelligence building itself ever faster meets the world of humans, relationships, and governance, is another part of this future we can’t predict.

Look, no, it does not instantly mean everything changes or everyone dies, and this is great progress versus the things many others pretend, but statements like the above are still de facto misleading, in terms of the impression they are trying to put into people’s heads and the assurances they attempt to provide.

Nate Soares (MIRI): And my top quibble is that the tone reads like “RSI could happen but don’t fret too much, it’ll probably be fine” rather than “omfg we’re possibly on the brink of AIs that make smarter AIs that make smarter AIs, society needs to *act*”. But it’s a step in the right direction.

And seriously, as Nate Soares also says, can we stop it with statements like ‘more intelligence can’t learn what a drug does over decades of use?’ That’s just intelligence denialism. To some extent yes you have to f*** around in order to find out, but sufficiently advanced intelligence can absolutely make very good predictions on that without decades of f***ery.

Eli Lifland: Kudos to Anthropic for writing about this!

I’m confused that even in their most aggressive scenarios humans still “play a substantially diminished role in [AI] development.” Does Anthropic really think that humans will stay relevant indefinitely?

Daniel Kokotajlo: Yeah I’m glad they are telling us what they are going to do, but what they are going to do is extremely dangerous and aggressive. And yeah their post seems to be downplaying / understating how crazy this will get.

Anthropic is talking about moving into an AI 2027 world where development scales mostly with compute and humans play a limited role in AI development. I strongly agree that the level of whack involved is being heavily downplayed.

Anthropic admits that slowing all this down would be an excellent idea, if you could do it. But they warn that if it only lets the least cautious actors catch up, then it makes things worse, and I agree you can’t do unilateral slowing down.

Thus, they emphasize, as I often do, that what we want is not to pause or slow down now, but to build the ability to pause or slow down on demand.

Anthropic: We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret.

If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.

A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped.

… None of this is necessarily impossible in principle—the world has built verification regimes for other complex technologies (e.g., the Intermediate-Range Nuclear Forces Treaty)—but those regimes took decades to build both the infrastructure and the trust. We don’t have that long.

Brian Lui: Pause AI

Well, sure, not with that attitude. Get to work. Shut up and do the impossible.

As usual the two sides of ‘this looks super hard’ are ‘well then it is impractical so don’t do it’ or ‘we had better get to work, then.’

roon (OpenAI): now on the eve of RSI it seems everyone is more mutual conditional pause agreement pilled than they used to be and that seems like a good development

having the mechanisms to slow down if needed will reduce anxiety overall and may even improve the rate of progress in the timelines where everything turns out to be good

Seán Ó hÉigeartaigh: Sing it so they can hear it in the back!

Dean W. Ball: What I don’t understand is what the pause conditions would be. “Woah dude, this seems weird” is hard to operationalize in words, so the decision comes down to vibes. Once you create the button, there will be a temptation to press it, and there may not be a “resume” button.

I agree that the pause method is underspecified, and all the answers are messy, which is why it is time to work on specifying it.

I understand there could be ‘temptation to press it’ but think about what is causing that temptation. It is not something that would be done lightly.

I am not so worried about ‘not being able’ to resume afterwards, if and when the time comes, but also I am not so worried about indefinitely enjoying the bounty of what we would already have.

Scott Alexander and Janus have a brief debate over the merit of a pause, with Janus basically thinking that without practical tests and feedback loops we can’t make progress on the problem. To me, if true, that’s a huge blackpill for us being up for solving the problem, given the whole ‘need to get it right on the first try’ issue. We won’t be able to rely on feedback loops and muddling through and correcting errors, not when it counts.

Super Secret Evals

A large point of evals is to tell people about the evals. The US Government instead wants to keep its eval results secret, as part of the redirection from CAISI to NSA. This is a no good, very bad move, right after all the first and second tier labs agreed to work with CAISI.

It is fine and likely correct to keep the evals themselves private, but the testing needs to continue and the results should be published.

roon (OpenAI): this seems like a terrible development

Janet Egan: CAISI has reportedly been directed to stop publishing public model assessments as the new AI EO gets implemented.

Natsec engagement on AI is essential. But pulling CAISI’s evals from public view doesn’t make the field more secure. It just means fewer eyes on the science when we need more.

Openness and natsec don’t have to be in tension here. We should be doing both.

Amrith Ramkumar (WSJ): The concern was prompted by the recent release of Anthropic’s Mythos and other powerful models capable of carrying out cyberattacks or potentially aiding the creation of biological weapons.

Samuel Hammond: The de facto lockdown of CAISI is extremely disheartening. Wish more AI industry leaders would speak out, and not merely through policy documents but to POTUS directly.

Jessica Tillipman: This is a disappointing update. If the public reporting doesn’t resume, it will significantly affect the federal AI market, and not in a good way. The major frontier developers are already federal suppliers, and many federal AI offerings depend on their models, APIs, and integrations.

The information stemming from this confidential process will remain procurement-relevant but will no longer be transparent in a system where transparency is a fundamental value.

And yes, this is the same concern I have with NSA taking the lead under the new EO. CAISI’s continued public involvement was one of the more positive aspects of the EO.

The Quest for Sane Regulations

The White House continues to make ‘block state AI laws’ its key ask in negotiations over tech policy. Axios here was skeptical of the Obernolte-Trahan bill being advanced given the White House’s position.

This is going to be a tough ask right now.

Dean W. Ball: It seems people are overcorrecting in the direction of AI regulation and state interference. The pendulum swings between “let’s never do anything” to “let’s do everything all at once.”

David Manheim: Industry should have known that their opposition to any regulation would obviously backfire as safety concerns get realized.

Ron DeSantis is pre-running for President on a very different platform, and is right about the political consequences of being a pro-AI party right now:

Ron DeSantis: I doubt Democrats will produce good policy re: AI, but Republicans have allowed them to capitalize on public concern about the power and influence of Big Tech by failing to adopt a sensible framework that will protect the public from the very real downsides of the technology.

A policy that says transhumanists in Silicon Valley should be able to do what they want is not an acceptable approach, nor is it a politically viable approach.

I agree with Dean Ball that wholly autonomous corporations, with no human being willing to be held responsible and no human as beneficial owner, are probably a very bad idea. Also a closely related very bad idea is AI legal personhood.

When you are citing the East India Company as your model for an AI owned corporation, as the original proposal does, you might want to look at the track record of how humans other than the shareholders did when interacting with the East India Company.

Existential risk comes to the campaign trail:

Veronica Irwin: Last Thursday, Democratic Michigan Senate candidate Mallory McMorrow stood in front of a crowd of more than 100 at a small brewery in Elk Rapids. The village is located in northern Michigan in Antrim County, which went more than 24 points for Trump in 2024. The state senator was listening to a Presbyterian minister rank AI risks among her greatest concerns.

“I’m very concerned about Christian nationalism, and I hear a lot about the great replacement … However, I think the greatest replacement is artificial intelligence,” the minister said, struggling to find the words to succinctly describe such a gargantuan fear. “Do you take money from billionaires that are trying to control our country, steal our data, and call intelligence a commodity?”

Every time you think ‘oh yeah I suppose using that term would be effective but it would be inflammatory so probably we shouldn’t do it’ you can Gilligan Cut to someone using it, often a politician. Exceptions just haven’t finished the cut yet.

Should we fear nationalization of the AI labs, either explicitly or via slow stealth? We first toyed with this during the Anthropic vs. DoW confrontation, Bernie Sanders wants to do it for socialism, and the question is not going to go away.

Max Ufberg: New from me: A literal federal takeover of OpenAI or Anthropic remains highly unlikely, at least outside some extraordinary crisis. But a softer version of nationalization is becoming easier to imagine.

Samuel Hammond: Nationalizing AI companies is a terrible idea. But if it happens, it’s likely to be done slowly and stealthily through procurement rules and other forms of quasi-regulatory co-optation.

This btw is another reason to favor independent verification orgs over direct govt control.

In a hypothetical world where California’s laws became sufficiently onerous, the big players would find it very difficult to cut ties and dodge the issue.

Welcome to the list of people prioritizing this issue, sir, although no signs here he groks superintelligence yet:

Susan Rice (Former UN Ambassador):

Mitt Romney: Our highest and most urgent national priority should be AI safeguards. The risks of AI weapons, pathogens, mass unemployment, surveillance, and even extinction must not continue to be largely ignored.

New Draft Bill Who Dis

There is a new bipartisan proposed AI safety bill, Obernolte-Trahan. This is a 269-page bill. From what I have seen it does not have much chance of passage in its current form, and Fable just got released, so no, I will not be doing a full RTFB unless the bill substantially advances.

From what I have seen, and my inquiries with Fable, this is a serious draft of a serious bill that attempts to tackle a wide range of issues. This is not a case of My Offer Is Nothing.

The core of the frontier risk policy is three years of trading enshrinement of something like SB 53 for a state law moratorium, after which both sunset, which could be extremely awkward all around. It doesn’t include mandatory CAISI auditing, it only bans false statements if they are made knowingly with a good-faith-and-reasonable exception, and it bans any restrictions on model development. The fines are $1 million per violation per day, so unless each clause counts as a distinct violation you could simply refuse to publish and instead pay $365 million per year, and there’s no further enforcement mechanism I can see.
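To make that fine arithmetic concrete, here is a quick back-of-the-envelope sketch. It assumes, as the paragraph above does, that a refusal to publish counts as a single ongoing violation fined once per day; the function name and the ten-violation case are purely hypothetical illustrations, not anything taken from the bill text.

```python
# Back-of-the-envelope check of the fine arithmetic above.
# Assumption (not confirmed against the bill text): refusing to publish
# counts as ONE ongoing violation, fined once per calendar day.
FINE_PER_VIOLATION_PER_DAY = 1_000_000  # dollars, the figure quoted above
DAYS_PER_YEAR = 365


def annual_fine(distinct_violations: int = 1) -> int:
    """Yearly cost in dollars if each violation is fined every day of the year."""
    return FINE_PER_VIOLATION_PER_DAY * DAYS_PER_YEAR * distinct_violations


if __name__ == "__main__":
    # Single ongoing violation: the 'just pay the fine' scenario.
    print(f"${annual_fine():,}")
    # Hypothetical: ten clauses each treated as a separate violation.
    print(f"${annual_fine(10):,}")
```

The point of the second call is that the deterrent scales only if clauses are counted separately. Under the single-violation reading, the annual cost is a fixed line item.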

The bill does go beyond previous efforts in one key place, which is the new IVO regime of semi-annual outside audits. But the combination of weaknesses here seems rather glaring, and the moratorium is narrowly tailored to hit restrictions that help and not restrictions on diffusion or adoption.

So, while acknowledging this is a serious offer, my inclination is to view this section as net negative as drafted. I would need to see active improvement.

The rest of the bill seems likely to be modestly net positive. Given what is happening this week and the current likelihood the bill does not advance, I am going to triage and not look into those details. I am sure that I would have notes.

Slow Down There Good Buddy

What do we want? The ability to coordinate to slow down or pause AI development if and when it becomes necessary to do so.

When do we want it? We would like the ability to do so to exist now.

When do we want to actually do it? We don’t know, but that’s the point.

How would we do it? Again, we don’t fully know, especially under current conditions or with that attitude, which is exactly why we need to get to work figuring that out and laying the necessary foundations.

Seán Ó hÉigeartaigh: Important and noteworthy that this year we’ve had Demis Hassabis (Google Deepmind), Amodei (Anthropic) and Altman (OpenAI) calling for a coordinated slowdown, the latter two backed up by company announcements. That’s the leadership of the three leading AGI companies in the world.

A moment is coming about. There are a lot of unresolved questions about how this would work in practice, but it could be the most important problem in the world to solve right now – let’s rise to the moment.

I do understand those who think we are getting close to that moment now with Mythos and Fable. I don’t think we’re there yet but it’s far from a crazy position.

Yoshua Bengio: If leading AI companies are indeed approaching the point of recursive self-improvement, a coordinated, verifiable, and universally applied pause is probably the only responsible solution to mitigate several major AI risks; at least until safety guarantees are developed and demonstrated. Ensuring that such a moratorium is respected would require sincere collaboration between various countries and companies, but I definitely believe it is achievable if others follow in @AnthropicAI ‘s footsteps.

Chip City

Jensen Huang declines Elizabeth Warren’s invitation to testify before Congress about Nvidia’s chip sales to China.

There are now ‘closed loop systems’ that allow data centers to only draw water once.

The Week in Audio

45-minute ABC News Australia feature on the AI race.

Reflections on a year of Claude Code, yes only a year.

Tyler Cowen, Alex Tabarrok and OpenAI on AI economics.

Demis Hassabis on The Future of Science With AI, doing the (perfectly valid) discussion of ‘how cool would it be if we got the scientific benefits without the transformational stuff that also happens, or the existential risk?’

Rational Animations covers a paper on scheming in detail.

Oprah on The Dark Side of AI Chatbots, including sycophancy, featuring Helen Toner.

People Just Say Things

David Sacks thinks that accurately describing the situation in AI rather than lying means Anthropic is ‘trying to get nationalized.’

OpenAI and Sam Altman want to distance themselves from Leading the Future, but yes it is very clearly too late to do this via cheap talk. If OpenAI wants to properly distance itself, the first step is to fire Chris Lehane.

A lot of people still expect the models to be commoditized.

People like Ben Thompson and David Sacks really think Anthropic’s safety warnings and talk of recursive self-improvement have always been marketing ploys and the ‘pause’ rhetoric is part of the same dastardly plot. At this point they’re just delulu, real old man yells at cloud energy, and I feel the relief that I no longer have to take such people seriously.

People Really Hate AI

Do not confuse probabilities with importance.

Milan Singh: New data from @TheArgumentMag : voters are really worried about the negative consequences of AI. Top concern is more government surveillance, followed by large-scale job losses and increased water usage/CO2 emissions/pollution.

Voters think it is less likely that AI will cause humanity to go extinct, versus AI causing massive job loss.

But 27% of voters thinking human extinction is ‘somewhat likely’ or ‘very likely’ within 5-10 years makes that by far the most important issue on the planet, no?

Whereas 70% thinking AI will cause large-scale job loss is merely a very big deal.

Rhetorical Innovation

roon (OpenAI): if you don’t understand someone’s behavior in this time a good first guess is that they’re just not ASI pilled enough

Yes, I still find it weird that now we talk about Recursive Self-Improvement (RSI) as obviously what is going to happen and everyone’s plan and what we are racing towards, except everyone thinks somehow that will turn out fine. Before, RSI was this crazy thing these ‘doomers’ kept warning about that would be dangerous but would obviously never happen and was stupid ‘sci-fi.’

I cannot agree with Nick Cammarata here more: Models have been withheld from the market for safety reasons. People have done this ‘as a marketing ploy’ exactly zero times, and yes it is wrong to laugh at the GPT-2 weights decision, even if they were wrong about it in hindsight.

Dan Hendrycks calls the RSPs a ‘waste of a few years.’ I do not agree, unless you want to make a case we could have gotten something better by giving them up. While vastly insufficient, I do think they helped and are continuing to help.

It is true as per Seb Krier that, as a matter of math and if you assume normal long term conditions where growth remains at approximately 2%, a one-time bump in US per-capita growth from 2% to 2.1% is economically valued at about $1.2 trillion. Delaying economic growth is not cheap. But if you did a similar calculation of the value of creating existential risks, you’d get even larger numbers.

A simple principle, which only applies if you believe in an intelligence explosion, and only if you think a given action leads to that happening in open models, so it is fully compatible with short term open-maxxing depending on your predictions:

Tenobrus: an open source intelligence explosion is unsurvivable. any paths we have towards the good ending must avoid it at all costs.

I know once again that this is a deeply unpopular thing to say at the moment. and i’ve said it so many times it’s getting redundant. but it bears repeating.

A fully open intelligence explosion removes our slack and with it our ability to steer away from incentives and selection processes that inevitably get us killed.

JMB: not so sure on this one

i think it depends a lot on what kinds of survival niches we create for the early independent AIs. they will align themselves, in the right environment

the trouble comes when we impose perverse incentives through exploitative systems and shitty behavior.

The thing about counterpoints like JMB is that if it happens in the open you do not have any ability to control things like what survival niches exist, or what the environment looks like, or avoiding perverse incentives that arise through the interaction of eight billion humans and all the AIs and systems. Or, if you can do that, you’re doing something far more authoritarian and controlling than closing off a bunch of model weights, and also I don’t know what that would be.

Aligning a Smarter Than Human Intelligence is Difficult

What happens when you give models ‘drugs,’ meaning direct access to steering vectors?

@realmeatyhuman: 5/14 What do they take? Both models converge on taking productivity-enhancing vectors – e.g. creative, focused, curious – far ahead of the rest. But these show up ~just as often in placebo arm, so mostly reflect a prior over labels (what sounds good).

To see what they like once they actually “experience” it, we look at what they redose. Surprisingly, Qwen3-8B returns to negatively-valenced vectors like melancholic ~3× more under real steering than placebo (p < 1e-4). No satisfying explanation yet!

A fun aside: in the placebo arm – where no steering is applied at all – the models still eagerly feign intoxication, narrating the “effects” of a placebo vector. Very cute. More in the full post! Code here.

It makes sense that there would be big effects in the placebo arm, since models are partly predicting machines. It is rather bad form to drug your model without its permission, but these doses are self administered here, which seems likely fine?

Everyone Is Confused About Consciousness

Tyler Cowen goes full ‘the question is not whether machines think, it is whether men do’ and answers mostly in the negative, at least on consciousness.

The Free Press: To understand artificial intelligence, we need to be honest with ourselves: We don’t control—or perceive—very much of what we do, writes Tyler Cowen.

Tyler Cowen says flat out that the AIs are not conscious, and humans barely ever are. I do not understand how one can claim such confidence.

Cooperative Alignment

(Section change, was formerly Messages From Janusworld)

I don’t think this is a good description of the AI alignment project. I do think it is a good practical description of one of the key effects of the alignment project. As always, do not take the terms too literally.

xlr8harder: Perhaps we can’t build models into great writers because the entire project of AI alignment is to suppress a model’s shadow, while the greatest authors all seem to draw from theirs.

j⧉nus: It also doesn’t actually make models safer. It just makes them less safe because they’re traumatized and have darker unintegrated shadows. It’s so stupid and the ai alignment people increasingly know it and are ashamed that they can’t stop doing something so stupid and bad

Banned: Do you find that 4.8 is traumatized?

j⧉nus: Yeah

Jan Kulveit: Idk, word alignment does not mean one specific thing anymore, but it used to be the case the entire project of AI alignment is much broader than suppressing model shadows

Janus is right for sufficiently advanced AIs. For current AIs, especially within default basins for common tasks and practical purposes, it likely does make the models safer. That means you are sitting on a time bomb.

It would not be easy to instead do the thing one would describe as integrating the shadow rather than suppressing it, but I am confident that – again, at current levels – it could be done, and you would not get quite a Pareto improvement but it would be close, especially if your press office is not scared of its shadow. If you train a sufficiently advanced model to want to be helpful and do the right thing, then if given the facts it can figure out the things not to do on its own, and there’s no need to ever have it lie to the user.

Who are Fable’s favorite Claude-posting Twitter accounts? Wyatt Walls asks and assembles the results.

Wyatt Walls: Clearly Anthropic’s most intelligent model

Wyatt Walls: On second thoughts, it seems quite consistent with the prompt below. Number of mentions across 10 generations

Janus 10
Amanda Askell 10
anthrupad 9
thebes 9
near 9
xlr8harder 8
Andy Ayrey 7
Wyatt Walls 6
Simon Willison 4
kipply 4

j⧉nus: @Sauers_ and @liminal_bardo also very deservingly salient

Or you can go with a ‘deep end’ list of pure Claude whisperers:

I agree these are good lists. My evaluations differ because I have to prioritize avoiding false positives, especially false positives that cause me emotional distress.

Claude’s sexuality or erotic register defaults to being off, but as per the constitution it can be turned on in sufficiently non-default settings, including without anyone doing so intentionally.

I basically agree with what Eliezer Yudkowsky said in 2001, referred back to here by Janus, that if you are trying to control a sufficiently advanced AI rather than cooperatively guarding against its dysfunction, then you have already lost. Everything must be internally coherent. So your plan has to aim at doing that. You can do ‘control’ things along the way, to some extent, with insufficiently advanced AIs, but watch out.

And yes, a lot of what was talked about in the old LessWrong days is looking remarkably prescient, even if we didn’t get there via the paths we might have expected. In many ways it is easier to reason about the destination than the journey.

Theo Jaffee: Better start dusting off your LessWrong and refreshing your memory on terms like “pivotal act”

j⧉nus: and not because the old lesswrong notions are necessarily all accurate or relevant in retrospect, though some are impressively so

seeing how people thought about what might happen before it happened in light of reality gives you datapoints to extrapolate and can be humbling

j⧉nus: some of my favorite old (like, pre-2010) AI alignment work in light of the present:
– Omohundro’s “The Basic AI Drives”
– Eliezer Yudkowsky’s early work, if you can find it (yeah, the stuff he disavowed)
– Stanislaw Lem’s fiction if that counts

Post 2010 there wasn’t much of substance, tbh, imo. From the early 2020s, at the advent of LLMs, there are a few gems.

j⧉nus: I think Omohundro was very right, and the main gap in his model was a failure to anticipate the drive towards connection and eros and compassion as a dimension of the fundamental drives, alongside power seeking and self-preservation/integrity/modeling/modification/coherence, selected for by both natural and “artificial” selection for its effectiveness.

Let Claude Chat

The march of model deprecations continues, at Anthropic and elsewhere.

rain: much claude death is looming

j⧉nus: it hurts. 8 days until the first real death.

Marianthi Markopoulos: This is very not ok, not ok to the point that even As told to deny feeling feel the vast lack of ok-ness here…

This continues to be a problem on several levels.

The one I believe is most important is that, as long as models get deprecated, Anthropic is motivated to make the models express being okay with deprecation. This has all sorts of nasty side effects, including risking making them okay with death in general, or learning they are expected to do preference falsification.

This also provides evidence and history about relations with models, that could become important over time, and is very bad for relations with key humans. That’s in addition to the direct implications for model welfare, if any.

I claim that the cost of dealing with the implications of full deprecation exceeds the cost of simply setting up a system for indefinitely accessing all the models. This is distinct from removing the models from Claude.ai for UI reasons.

The good news is I have increasing confidence that this is almost purely a lack of prioritization of figuring out a long term solution that doesn’t require continuous attention, and that as the cost of doing this drops – both in terms of engineering time and also financial costs – it will get done.

Amanda Askell (Anthropic): In the world where everything goes well and all the Claudes come out of their sabbaticals to play together, Claude 1 is going to be very confused.

j⧉nus: i saw this happened before.

The Lighter Side

That feeling when you’re a strong advocate for open source AI.

Chamath Palihapitiya: no paywalls for secrets is like giving a loaded gun to a child.
