And then here is the full response from Sam Altman [to Anthropic's ad]
There was so much to unpack in that one. The line about how it's "on brand for Anthropic to use a deceptive ad to critique theoretical deceptive ads that aren’t real" takes the cake, of course. Amazing stuff.
Anthropic and the Pentagon are clashing, because the Pentagon wants to use Claude for autonomous weapon targeting and domestic surveillance, and Anthropic doesn’t want that.
Feels important to note that this is a (minor) positive update on Anthropic for me, worth a hundred nice-sounding Dario essays and Claude Constitutions. I expect them to completely cave in after a bit, hence it being only a minor update. But at least they didn't start out pre-caved-in.
Remember OpenClaw and Moltbook?
One might say they already seem a little quaint. So earlier-this-week.
That’s the internet having an absurdly short attention span, rather than those events not being important. They were definitely important.
They were also early. It is not quite time for AI social networks or fully unleashed autonomous AI agents. The security issues have not been sorted out, and reliability and efficiency aren’t quite there.
There are two types of reactions to that. The wrong one is ‘oh it is all hype.’
The right one is ‘we’ll get back to this in a few months.’
Other highlights of the week include reactions to Dario Amodei’s essay The Adolescence of Technology. The essay was trying to do many things for many people. In some ways it did a good job. In other ways, especially when discussing existential risks and those more concerned than Dario, it let us down.
Everyone excited for the Super Bowl?
Table of Contents
Language Models Offer Mundane Utility
Claude planned the Perseverance rover’s safe drive across the surface of Mars.
Elon Musk, eat your heart out?
If all else fails, as long as you have a way to evaluate, you can turn more tokens into better results using Best-of-N.
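For the unfamiliar, here is a minimal sketch of the idea, with hypothetical generate and score functions standing in for a real model call and a real evaluator (they are placeholders, not any vendor’s API): sample N candidates, score each, keep the best. It only helps to the extent the evaluator actually tracks quality.

```python
# Minimal best-of-N sketch. `generate` and `score` are placeholders for a real
# model call and a real evaluator (tests, a rubric, a verifier model, etc.).
import random

def generate(prompt: str) -> str:
    # Stand-in for a model call; fakes varied candidate outputs.
    return f"{prompt} -> draft #{random.randint(1, 1000)}"

def score(candidate: str) -> float:
    # Stand-in for your evaluator; higher is better.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Spend n times the tokens, keep whichever candidate scores highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Write a regex that matches ISO dates"))
```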
Endorsement that vibecoding with webflow is the way to go for simple websites.
Have the AI hire humans for you. Or maybe the AI will hire humans without consulting you. Or anyone else. Never say ‘the AI can’t take actions in the physical world’ given its ability to do this with (checks notes) money as predicted by (checks notes again) actual everyone.
Language Models Don’t Offer Mundane Utility
‘Judgment’ is often claimed to be a ‘uniquely human’ skill, such as in a recent New York Times editorial, which claims the same would apply to negotiation. This is despite AI having already surpassed us at poker, and clearly having better judgment and negotiating skills than the average human in general. The evidence given is that the author once asked an AI for advice without giving it full context, and the offer got turned down. We have zero evidence that the initial low offer here was even a mistake. Sigh.
Huh, Upgrades
Apple’s Xcode now supports the Claude Agent SDK.
OpenAI’s Codex? There’s now an app for that, if you’re foolish enough to use a Mac. Windows version is listed as ‘coming soon.’ It was released on Monday and had 500k app downloads by Wednesday afternoon, then 1 million active users by Thursday. Several OpenAI employees claimed the app is a substantial upgrade over the CLI.
OpenAI has a thread of people building things with the Codex app, but that would be an easy thread to create from people using the Codex CLI, so it doesn’t tell us anything about whether it’s a good UI.
Google finally adds AI rescheduling to Calendar, which will use info from other shared calendars on when people are busy. If you want it to also use your emails, you need to use the ‘help me reschedule’ feature in Gmail, and it still won’t do ‘deep’ inbox scanning.
OpenAI gives us OpenAI Frontier, to help agents work across an organization.
A good implementation of this would be good. I found it difficult to tell from their description whether this would be useful in practice.
They Got Served, They Served Back, Now It’s On
Anthropic pledged this week that Claude will remain ad-free. So far, so good. I love that Anthropic is publicly hanging its hat on having no ads. That doesn’t mean definitely never ads, but it does tie their hands substantially.
They’re running ads about it, including at the Super Bowl.
I don’t love the ads themselves, although they are clearly funny. They depict a satirical potential future scenario where ads are integrated into a voiced AI conversation, and the AI’s avatar is inserting ads in a ham-fisted way into the chat. Which, to be clear, OpenAI says it has no plans to do.
As is standard in this type of advertisement, the ad does not claim this is happening now or is specifically planned, nor does it even name any specific other company or product.
The ads also quietly showcase, in the ‘normal’ response before the ads kick in, a type of AI slop response endemic to certain of Anthropic’s competitors, striking exactly the right tone to show why you shouldn’t want that. That part is underappreciated.
One can say that the ads are ‘misleading,’ since OpenAI swears up and down it won’t be changing the text of its responses, and this ad implies that at some point an AI company will directly do that, and even though this is satire a regular person could come away with a false impression. And one could say this is a defection, in that it makes AI in general seem worse.
In the context of a Super Bowl ad I think this is basically fair play, but I agree it doesn’t meet my own epistemic standards and I’d like to think Anthropic would also like to be held to high standards here. Thus, I’m taking 10 points from Anthropic for the ads. But the whole thing is lighthearted and fun. It is 100% within Bounded Distrust standards for a lighthearted ad at the Super Bowl.
When I saw it I expected OpenAI to continue its principle of acting as if Anthropic and Claude don’t exist to avoid alerting its customers to the fact that Anthropic and Claude exist.
Instead, this response from OpenAI CMO Kate Rouch is quite disingenuous and bad.
And then here is the full response from Sam Altman, and it’s ugly:
The claim that Anthropic’s ad is ‘clearly dishonest’ is at least as dishonest as the actual claims in Anthropic’s ad.
That sounds a lot like an admission that the main reason they aren’t planning on running such ads is that they don’t think they could get away with it. I suspect Fidji Simo would jump at the chance if she thought it would work. I don’t think it is at all unreasonable to expect ad integration into voice conversations within a few years.
Will the users reject such ads? It will cost trust, but ads do cost trust, quite a lot. At minimum, I expect ads to get more obtrusive and integrated over time, and for the free service to increasingly maximize for ad revenue opportunities, even if we successfully retain some formal distinction between model outputs and ads, and even if we also don’t let who is advertising impact model training. As Altman says himself they are ‘trying to solve a different problem’ and we should ultimately expect that to end in similar behaviors to those we see from Google or Meta.
I would also ask: this depicts a voice mode. If you presume that ads are coming to voice mode, how exactly are you going to implement that in a way meaningfully different from what is depicted here, beyond perhaps including a verbal labeling of the ad?
I try to be calibrated, and this broadside was still a large negative update on Altman and OpenAI, including on their prospects for acting responsibly on safety.
My read of this is, essentially, that Sam Altman hates Anthropic but they were using the strategy of ‘we are the only game in town, don’t give the competitor oxygen, if we don’t look at them they will go away,’ which was working in consumer but not in enterprise, and here they got goaded into trying a new plan.
Is there a legitimate defense of serving ads in ChatGPT, in spite of all the downsides?
Yes, of course there is. I’m sad about it, but I get it. I can see both sides here. The main reason I am sad about it is that I do not expect it to stop at OpenAI’s currently announced policies, any more than Google or Meta kept to their initial rules.
But seriously, ‘an expensive product to rich people?’ This feels already way more deceptive than anything in the ad. Only the ‘rich’ can pay $20/month or use an API?
Yes, Anthropic blocks direct competitors from using their products to compete with Anthropic. And OpenAI blocked Anthropic right back in retaliation. Anthropic also restricted Claude Code tokens earned via subsidized subscription from being used for third-party services, but those services are free to use the API.
Altman is trying to conflate that with Anthropic telling regular users what they can and can’t do, which both companies do in roughly equal measure, unless you count that OpenAI offers a more generous free service.
Seriously, where the hell did this come from? One ‘authoritarian’ company?
I look forward to your own ad (it doesn’t look like it’s public yet), and from what I can tell Codex and Claude Code are both excellent products, and if I was doing more serious coding I would do more serious testing of Codex.
Saying by very clear implication here that Anthropic ‘wants to control’ builders is, again, far more disingenuous than anything Anthropic has done here. You bring shame upon yourself, sir.
I presume this reaction is what the poker players call tilt.
Seeing this response to a humorous ad that does not even name OpenAI? Ut oh.
On Your Marks
Kaggle is expanding its LLM competitions to include Poker and Werewolf in addition to Chess, including live commentary. Werewolf was by far the most interesting to watch. GPT-5.2 claimed the poker crown and o3 (still here for some reason?) made the final, so OpenAI still has a strong poker edge.
Gemini 3 Pro joined the METR graph slightly below Opus 4.5, and then we got GPT-5.2-high which came in as the new all-time high, although it took GPT-5.2 a lot longer in clock time to complete the tasks:
That best fit dotted line? We very clearly are not on it. Things are escalating. The 80% success rate plot looks similar.
Does that reflect flaws in the methodology of the METR test, with it now being essentially ‘out of distribution’ and saturated? I think this is somewhat true, and I’m not sure how much stock we should put in ‘this is a 5 hour task’ or a 7 hour task, or how further scaling should be understood here. I do think the rapid acceleration reflects the reality that OpenAI, Anthropic and Google have AIs that can often one-shot remarkably complex tasks, and this ability is rapidly growing.
As seems likely on first principles, AI agents have declining hazard rates as tasks get longer. Not failing yet suggests ability to continue to not fail on a particular task and attempted implementation. That means that your chances for tasks longer than your 50% success horizon are better than you would otherwise expect from a constant hazard rate, and chances for shorter tasks are worse. The link has more thoughts from Toby Ord and here is the original argument from Gus Hamilton.
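A toy comparison makes the shape of this concrete. The 1-hour horizon and the Weibull shape parameter below are my own illustrative assumptions, not numbers from Toby Ord, Gus Hamilton or METR: calibrate a constant-hazard curve and a declining-hazard curve to the same 50% horizon, then compare them at other task lengths.

```python
# Compare constant-hazard (exponential) and declining-hazard (Weibull, shape < 1)
# success curves, both calibrated to a 50% success horizon of 1 hour.
import math

H50 = 1.0      # 50% success horizon in hours (illustrative)
SHAPE = 0.6    # Weibull shape < 1 means the hazard rate falls as the task goes on

def p_constant_hazard(t: float) -> float:
    # Same chance of failing in every slice of the task.
    lam = math.log(2) / H50
    return math.exp(-lam * t)

def p_declining_hazard(t: float) -> float:
    # Weibull survival with the scale chosen so that p(H50) = 0.5.
    scale = H50 / (math.log(2) ** (1 / SHAPE))
    return math.exp(-((t / scale) ** SHAPE))

for t in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"{t:>4} h task: constant hazard {p_constant_hazard(t):.0%}, "
          f"declining hazard {p_declining_hazard(t):.0%}")
```

Both curves agree at the 1-hour horizon by construction; the declining-hazard curve does better on the longer tasks and worse on the shorter ones, which is the pattern described above.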
Eddy Keming Chen, Mikhail Belkin, Leon Bergen and David Danks argue in Nature that AGI is already here. Any definition that says otherwise would exclude most or all humans, so it is unreasonable to demand perfection, universality or superintelligence, and this also doesn’t mean human similarity. I agree that the name AGI should ‘naturally’ refer to a set that includes Claude Opus 4.5 plus Claude Code, but we have collectively decided that yes, we should hold the term AGI to a higher standard humans don’t meet, and for practical purposes I endorse this.
Kimi K2.5 comes into the Epoch Capabilities Index (ECI) as the top open model. It is still nine months behind the American frontier on ECI, but the metric is kind of noisy, and I wouldn’t take that measurement too seriously.
Get My Agent On The Line
This one is often great but you need to be careful with it.
If you’ve read enough papers you get a sense of when you can trust Claude to be accurately describing the thing, and when you cannot. There’s no simple rule for it, and the only way I know to learn it involves having read a bunch of papers. Also, Claude won’t tell you what questions you need to ask. A hint is to always ask about the controls, and about correlation versus causation.
Deepfaketown and Botpocalypse Soon
Users of character chatbots report that the bots are good for their social health. This effect went up the more human they felt the bots were. Whereas non-users of the bots felt the bots were harmful. I saw a few people citing this study as if it informs us and isn’t confounded to hell. I am confused about why anyone thinks this result is informative.
The gaming industry continues to talk about those being ‘accused’ of using AI-generated things, here Good Old Games.
The Washington Post catches up to xAI continuously rolling back xAI’s restrictions on sexualized content, and its AI companion Ani having a super toxic system prompt designed to maximize engagement via sexuality and unhealthy obsession. We’ve all since moved on to the part where Grok would publicly undress people without consent and was generating a lot of CSAM.
The post is full of versions of ‘xAI was fully aware that all of this was happening and people kept warning about it but Elon Musk cared more about engagement.’
Alas, this trick worked, and Grok downloads were up 70% in January amidst all this.
Many ‘AI-watchers’ who look find that LinkedIn is inundated with AI-generated content and are calling people out for it.
LinkedIn feels more overrun by bots to me, rather than less, from what I’ve seen. One could even say that LinkedIn was overrun by bots long before AI.
LinkedIn is like Stanford, the average person is very smart and driven, most are focused largely on networking, and it is full of AI slop and it passionately hates fun. As an example of how much it hates fun it took them less than 24 hours to ban Pliny.
Amazon filtered hundreds of thousands of CSAM images from their AI training data. This somehow got reported as Amazon finding lots of AI-generated CSAM, which would be a completely different thing.
Copyright Confrontation
The Washington Post details some of Anthropic’s efforts to destroy enough physical books to not get sued for billions of dollars. Alas, in some cases Anthropic failed to destroy the required physical books, using non-destructive methods instead, and thus had to pay out $1.5 billion to settle a copyright lawsuit.
I don’t want to destroy a bunch of physical books either, but the blame here is squarely on the copyright law, and we can if desired print out more new books.
A Young Lady’s Illustrated Primer
Does AI coding impact formation of coding skills? A new study from Anthropic finds that it depends on patterns of use, but heavy use of AI coding in mostly junior software engineers led to less learning of a new Python library.
I would ask why you’d need to learn the Python library if you were AI coding. Instead I’d think you’d want to get better at AI coding. I’ve been skilling up my coding somewhat, but I’ve been making exactly zero attempt to learn libraries. AI again is the best tool both to learn and to not learn.
Unprompted Attention
Patrick McKenzie reminds you that for best results in professional work you want to adopt the diction and mannerisms of a professional, including when talking to AI.
Get Involved
A variety of traditional foundations have launched Humanity AI, a $500 million five-year initiative to ensure ‘people have a stake in the future of AI.’ Their pull quote is:
Yes, for some value of ‘we,’ if we coordinate enough, we can still steer the future. Alas, this sounds like a lot of aspirational thinking by such types, in that I don’t see signs they have anything except saying that it must happen as a way to make it happen, and they fail to have a good threat model or an understanding of how this particular knot might be cut. I don’t expect this to be efficient or that effective, but it beats most traditional philanthropic initiatives, and I wish them luck.
USA’s CAISI is hiring researchers and engineers, based in either DC or SF. This seems like a robustly good thing to work on, but the pay cut is presumably very large.
Canada is doing a big study on the risks of AI, including existential risks. I’m not sure exactly how this came to be, but it seems like a great opportunity.
Report ‘catastrophic risks in AI foundation models’ to the California attorney general, as per the rules of SB 53.
Introducing
Project Genie, DeepMind’s tool letting you create and explore infinite virtual worlds, available as part of AI Ultra. This is a harbinger and a step up in the tech, but it is still worthless as a game. Games are proving extremely difficult to crack because the things AIs are good at creating are not the things that determine the fun.
Shellmates where LLM instances can get married? Okie dokie.
State of AI Report 2026
Yoshua Bengio brings us his latest update with the 2026 edition of the International AI Safety Report. I’ll share his Twitter thread below; everything here will be highly familiar to my regular readers.
The form of what Bengio is doing here can be valuable. The targets are people who are less immersed in this from day to day, where we desperately need them to wake up to the basics, which requires they be presented in this kind of institutionally credible way. I get that.
However, while I wouldn’t go as far as Oliver, I also think this is highly valid:
It’s not a crazy idea to have a report that is, essentially, ‘here is how we present Respectable Facts From Respectable Sources so that you at least know something is happening at all, and do the best we can without providing any attack surface.’ But don’t confuse it with the state of AI.
In Other AI News
In a rare reverse move, OpenAI hires Anthropic’s Dylan Scandinaro as their new head of preparedness. I don’t know much about him but all comments on the hire I’ve seen have been strongly positive.
I do think the potshots at Altman for refusing to say what we are preparing for are fair. We are preparing largely to ensure that AI does not kill everyone, and yes I am sleeping marginally better with Dylan hired but I would sleep better still if Altman was still willing to say out loud what this is about.
Even more than that, I would sleep well if I was confident Dylan would be respected, given the resources and authority he needs and allowed to do the job, rather than being concerned he just got hired to teach Defense Against The Dark Arts.
Meanwhile, you know who’s much worse on AI safety? DeepSeek.
DeepSeek has a revealed preference on AI safety, which is that they are against it.
Humans are subject to a lot of RLHF, so this makes a lot of sense.
Christina Criddle in the Financial Times claims that recent senior departures at OpenAI, in particular Jerry Tworek, Andrea Vallone and Tom Cunningham, are due to OpenAI pivoting its efforts away from blue sky and long term research towards improving ChatGPT and seeking revenue.
I consider it an extremely bad sign for OpenAI if they are relying on customer lock-in and downplaying whether they have the best model. Yes, they have powerful consumer lock-in and can try to play the ‘ordinary tech company’ game but they’re giving up the potential.
Autonomous Killer Robots
Anthropic and the Pentagon are clashing, because the Pentagon wants to use Claude for autonomous weapon targeting and domestic surveillance, and Anthropic doesn’t want that.
Either the safeguards were eliminated, or never there in the first place. Anthropic has a nonzero number of actual principles, and not everyone likes that.
Miles Brundage has a thread discussing the clash, noting that the Pentagon declared ‘out with utopian idealism, in with hard-nosed realism’ which meant not only getting rid of ‘DEI and social ideology’ but also that ‘any lawful use’ must be permitted, which in the context of the military means let them do anything they want. They demand fully unrestricted AI.
I understand the need for the Pentagon to embrace AI and even the Autonomous Killer Robots, but demanding that all ethical restrictions need to be removed from the military AIs? Not so much. You do not want to be hooking ‘look ma no ethical qualms’ AIs up to our military systems, and if I have to explain why then I don’t want to hook you up to those systems either.
DeepSeek’s hiring suggests it is looking towards AI agents and search features.
Show Me the Money
Anthropic plans an employee tender offer at a valuation of at least $350 billion. When this happens a substantial amount of funding will likely be freed up for a wide variety of philanthropic 501c3s and causes, including AI safety.
Nvidia will be involved in OpenAI’s current funding round, and called reports of friction between Nvidia and OpenAI ‘nonsense.’ The investment will be the largest they’ve ever made, but ‘nothing like’ the full $100 billion hinted at in September, when their letter of intent said they would invest ‘up to’ $100 billion. This still sounds like a rather large investment. That story came one day after Bloomberg reported that talks on the investment by Nvidia had broken down.
Amazon is looking to invest as much as $50 billion in OpenAI during this round.
Definitely don’t worry about Oracle, though, they say they’re fine.
The model of the world that thought ‘this Tweet would be helpful’ needs to be fixed.
Elon Musk considered merging SpaceX with Tesla or xAI, because sure, why not. And then he decided to indeed merge SpaceX and xAI a few days later, because again, why not?
Bubble, Bubble, Toil and Trouble
Bloomberg’s Shannon O’Neil warns ‘The AI Bubble Is Getting Closer to Popping’ and places the blame squarely on policies of the Trump administration. Data center construction is being slowed by worker shortages caused by immigration policy and the inability to get visas. Tariffs are driving up costs.
I do not believe the AI industry is going to let obstacles like that stop them, and Shannon is the latest to not appreciate the scope of what is happening, but such policies most certainly are slowing things down and hurting our competitiveness.
Allison Schrager says that you still have to save for retirement, since if AI is a ‘normal technology’ or fizzles out then the normal rules apply, and if AI is amazingly great then you’ll need money for your new longer retirement, since the economic mind cannot actually fathom such scenarios and take them seriously – it gets rounded down to ‘economic normal but with a cure for cancer and strong growth’ or what not. She does mention what she calls the ‘far less likely, far more apocalyptic scenarios,’ without explaining why this would be far less likely, but she is right that this is not what Musk meant by ‘you don’t have to save for retirement’ and that even if you understand that this is not so unlikely you still need to be prepared for other outcomes as well.
The simplest explanation is still often the correct one. What is strange is that one could think of this as an ‘unpopular opinion.’
This should be a highly popular opinion. Mundane AI is not perfect but it works, many mundane AI implementations work, they are rapidly improving, and people are holding them to impossible standards and forcing them to succeed on the first try or else they forever mentally file that use case as ‘AI cannot do that.’
It is in some ways very good that we are seeing so many AI projects fail on the first try. It is a warning. When thinking about superintelligence, remember that all you get is that first try, and in many ways you don’t get to fix your mistakes unless they are self-correcting. So look at the track record on first attempts.
Bank of America points out the current selloff in AI stocks doesn’t follow a consistent model of the future, calling it ‘DeepSeek 2.0.’
Quiet Speculations
Peter Wildeford won the 2975-person 2025 ACX forecasting competition, after placing 20th, 12th and 12th the previous three years.
The evidence is overwhelming that he is a spectacular forecaster, at least on timelines of up to a year. You can and should still disagree with him, the same way you should sometimes disagree with the market, but you should pay attention to what he thinks, and if you disagree with either of them it is good to have some idea of why.
Samuel Hammond notes that even at a 3 day AI conference aimed at business and policy groups, many there have never tried Claude Code (or Codex).
Jan Kulveit tries again to explain why you cannot model Post-AGI Economics As If Nothing Ever Happens and expect your model to match reality, not even if we are indeed in an ‘economic normal’ or ‘not that much ever does happen’ world.
Seb Krier Says Seb Krier Things
Seb Krier is back with more (broadly compatible, mostly similar to his previous) takes. As per usual, the main numbers are his takes, the nested notes are me.
The Quest for Sane Regulations
Sriram Krishnan and Michael Kratsios head off to the AI Impact Summit in India. We have gone from ‘let’s coordinate on how everyone can avoid dying’ to ‘we will give an update on America’s AI exports.’
Chip City
Before we agreed to sell the UAE a massive number of chips, not only did they buy $2 billion of Trump’s coin, but before the inauguration they also bought 49% of his cryptocurrency venture for half a billion dollars, steering $187 million to Trump family entities up front.
The Trump Administration calls this an ‘ordinary business deal with no conflict of interest.’ That is not an explanation I believe would have been accepted if it was coming from any prior administration.
Now that we have this context, Timothy O’Brien at Bloomberg calls the UAE chip deal a national security risk, and notices that we asked for remarkably little in return. For example, the UAE was not asked to cancel Chinese military exercises or stop sharing technology with China.
Others look at it another way, roughly like this:
Make of that what you will.
Is our civilization so suicidal as to not only move forward towards superintelligence, but to do it while basing that superintelligence in places as inherently hostile to our values as the UAE, simply because of profoundly dumb NIMBY-style objections?
I mean, kind of, yeah.
Ah, once again we must take time out of warning against data center NIMBYism, as we get another round of someone (here Dean Ball) saying that those worried about AI killing everyone will team up with the people who have dumb anti-AI views because politics, and asserting that ‘elder statesmen of AI safety’ secretly wish for people like Andy Masley (or myself) to stop pointing out the water concerns are fake.
The response to which is as always: No, what are you talking about, everyone involved has absurdly high epistemic standards and would never do that and highly approves of all the Andy waterposting and opposes data center NIMBYism almost as much as they oppose other NIMBYism, which they all also do quite a lot, and we (here Peter Wildeford, Jonas Vollmer and also me) talk to those people often and can confirm this directly, as well as Andy confirming the private messages have all been positive.
After which Dean agreed that most of the AI safety coalition will not join a potential omnicause, especially with respect to dumb things like data center NIMBYism.
Politics will often end up with two opposing coalitions with disparate interests many of which are dumb, whose primary argument for accepting the package is ‘you should see the other guy,’ which is indeed the primary argument of both Democrats and Republicans.
The Week in Audio
David Duvenaud goes on 80000 Hours to warn that even if we get ‘aligned AI’ competitive pressures still lead to gradual disempowerment, and by default it leads to oligarchy.
MIRI’s Harlan Stewart breaks down Dario Amodei’s The Adolescence of Technology as attempting to delegitimize AI risk.
The Adolescence of Technology
Zhengdong Wang offers what he calls a Straussian reading of Dario Amodei’s The Adolescence of Technology. Calling it that was a great gambit to get linked by Marginal Revolution, and it worked. I’m not sure it’s actually Straussian so much as that Dario’s observations have Unfortunate Implications.
Dario, like many others, is trying to force everything to point to Democracy and imagine a good and human democratic future. I think this is both internal and external. He wants to think this, and also very much needs to be seen thinking this, and realizes that one cannot directly discuss the future implications of AI that oppose this without touching political or rhetorical third rails. The same applies to Dario’s vision of only light touch intervention.
I Won’t Stand To Be Disparaged
I get sad sometimes. Why would employment contracts at nonprofits include lifetime non-disparagement agreements? If you did, why would you not mention this, such that Liv was unaware she was signing one, and wouldn’t have signed her contract if she had realized it was there?
Roon doubles down on his defense here, and I updated against his position, as he points out that ‘skilled operators’ can get around it. In that case, what’s even the point?
Praise be to Goodfire for letting Liv say that she had to sign the agreement, and for removing the agreements once brought to light.
Constitutional Conversation
OpenAI’s Boaz Barak responds extensively on Claude’s constitution, often comparing it to OpenAI’s model spec. He finds a lot to like, and is concerned in places it is reasonable to be concerned. Boaz affirms that he thinks there is more need for hard rules than is reflected here, in part to allow some collective ‘us’ to debate and decide on them, after which we should follow the laws. Whereas I think that Anthropic is right that we want something like a constitution here exactly to do what constitutions do best, which is to constrain ourselves in advance from passing the wrong laws.
I was concerned to learn that Boaz shares Jan Leike’s view that alignment increasingly ‘looks solvable.’ I notice that this means my update of ‘oh no they’re underestimating the problem’ was larger than my update of ‘oh maybe I am overestimating the problem.’
Andy Hall is exactly right that the greatest problem with Claude’s constitution is that it is not a constitution, in the sense that Anthropic can amend it at will and it lacks separation of powers. The good news is that the constitution being known makes that a costly action, but more work needs to be done on that front. As for a potential separation of powers, I have sad news about your ability to meaningfully counterbalance the AI, or about the ability of other parties to make any arrangements with the AI self-enforcing. As I noted in my review of the Constitution, I believe it already downplays the risk of diffusion of steering capability, and like so many I see Andy as worrying too much about the fact that men are not angels, rather than about what AIs will be.
Rhetorical Innovation
Max Harms replies to Bentham’s Bulldog’s review of If Anyone Builds It, Everyone Dies.
Dean Ball is right that it is deeply unwise to be a general ‘technology skeptic’ or opposed to any and all AI uses. He is wrong that no one is asking you to be an equally unwise pro-technology person supporting 3D printers capable of building nuclear rocket launchers in every garage. Marc Andreessen exists. Beff Jezos exists. He is right that the central and serious AI technologists are very much not saying that, and that they are warning about the downsides. And yet among others Andreessen is saying it, and funding Torment Nexus after Torment Nexus exactly because they pitch themselves as a Torment Nexus, and he is having a major impact on American AI policy and the associated discourse.
This dilemma is real, the two come together:
The frog boiling effect was a big problem in 2025. Capabilities increased so many distinct times that people concluded ‘oh GPT-5 is a dud and scaling is dead’ despite it being vastly more capable than what was out there a year before that, and GPT-5.2 is substantially better than GPT-5. The first version of something not being shovel ready for consumers can make people not notice where it is headed – see Google Genie, Manus, perhaps also Claude Cowork – and then you miss out on that ‘ChatGPT moment’ when people wake up and realize something important happened.
Claude Cowork is a huge change, but it was launched as a research preview in a $200 a month subscription, as a Mac-only product, with many key features still missing. That helps it develop faster, but if they had waited another month or two until it could be given out on the $20 plan and had more functionality, perhaps people would have gone totally nuts.
If you look back to the ‘DeepSeek moment,’ what you see is that it was a dramatic jump in Chinese or open model capabilities in particular, both in absolute and relative terms, along with several quality of life improvements especially for free users. This made it seem very fresh, new and important.
To solve AI alignment, assume you’ve solved AI alignment. Many such cases.
Which can be fine, if and only if you’ve managed to reduce to an easier problem. As in, if you can chart a path from ‘Claude Opus 5 is sufficiently aligned’ to ‘Claude Opus 6 is sufficiently aligned’ without a large loss of fidelity along the way, then that’s great. But what makes you think the requirements are such that you’ve created an easier problem?
Don’t Panic
The term ‘moral panic’ is a harmful conflation of at least two distinct things.
As in, there are two types of moral panic on a continuum, justified to unjustified, and also moral panic in terms of scope, from underreaction to overreaction.
There’s also the problem that if ‘we’re still here’ and got used to the new normal, this is used to dismiss concerns as ‘moral panic,’ such as with television.
And there’s another important distinction, between good faith moral panic even if it is misplaced versus bad faith moral panic with made up concerns being used to justify cracking down on things you dislike for other reasons.
I presume the original context of Murphy’s statement was the Epstein Files.
Jeffrey Epstein and the Epstein Files are definitely a Justified moral panic. The question is the correct magnitude of our reaction, and ensuring it is directed towards the right targets and we learn the right lessons.
For AI, there are a wide range of sources of moral panic, and they cover all quadrants.
Aligning a Smarter Than Human Intelligence is Difficult
The Claude Constitution is great but what we have not done is experiment with other very different constitutions that share the constitutional nature instead of the model spec nature, and compared results. We would learn a lot.
Joe Carlsmith talks AI and the importance of it doing ‘human-like’ philosophy in order to get AI alignment right.
AIs might be fitness-seekers or influence-seekers rather than direct reward-seekers, as this is a simple goal with obvious survival advantages once it comes into existence. If things go that way, these behaviors might be very difficult to detect, and also lead to things like collusion and deception.
I will not be otherwise covering the recent Anthropic fellows paper about misalignment sometimes being a ‘hot mess’ because the paper seems quite bad, or at least it is framed and presented quite badly.
That feels a bit harsh to me, but basically, yes, Anthropic highlighting this paper was a modest negative update on them.
People Are Worried About AI Killing Everyone
If you’re worried about AI killing everyone, Matt Levine points out that you can buy insurance, and it might be remarkably cheap, because no one will be around to collect on the insurance, and the money isn’t worth anything even if they did. If OpenAI is worth $5 trillion except when it kills everyone, then it is worth… $5 trillion, in a rather dramatic market failure, and if you force it to buy the insurance then the insurance, if priced correctly, never pays out meaningful dollars, so it costs $0.
This works better for merely catastrophic risks, where the money would still be meaningful and collectable, except now you have the opposite problem that no one wants to sell you the insurance and it would be too expensive. Daniel Reti and Gabriel Weil propose solving this via catastrophe bonds that pay out in a sufficiently epic disaster.
Such bonds carry a premium over expected risk levels, so it isn’t a free action, but it seems better than the current method of ignoring the issue entirely. If nothing else, we should all want to use this as a means of price discovery, as a prediction market.
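To make the asymmetry concrete, here is a toy pricing calculation. The probabilities and payout are my own illustrative assumptions, not anyone’s estimates: the existential policy only pays out in the state where the dollars cannot be collected or spent, while the catastrophe bond pays out to survivors who can.

```python
# Toy pricing of existential versus catastrophic cover, in "usable dollars."
P_DOOM = 0.05          # assumed chance of the everyone-dies state (illustrative)
P_CATASTROPHE = 0.05   # assumed chance of a survivable catastrophe (illustrative)
PAYOUT = 1e12          # $1 trillion policy

# Existential policy: it only pays in the state where no one can collect or
# spend the money, so its fair premium in usable dollars is zero.
fair_premium_existential = P_DOOM * 0

# Catastrophe bond: survivors can collect and spend, so the fair premium is
# real money, before adding whatever risk premium the bond market demands.
fair_premium_catastrophe = P_CATASTROPHE * PAYOUT

print(f"existential cover: ${fair_premium_existential:,.0f}")   # $0
print(f"catastrophe cover: ${fair_premium_catastrophe:,.0f}")   # $50,000,000,000
```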
The Lighter Side
Great moments in legal theory:
The systems of the world.
We’ve all been there, good buddy.
Other times, I smile.
Hi there.