Weapons are different. While Tools multiply agency and Minds embody it, Weapons are designed to erode it.
An alternative name for this category would be Boss.
Bosses are designed to install you as their underling, a cog in their machine, a compliant participant in their schemes. This may be by hiring you (e.g. paying you to provide creative effort or complete tasks), offering you something else (such as fame, meaning, or getting to promote your ideas) — or threatening you, tricking you, pretending to be your god and demanding worship.
There can be good (or at least tolerable) Bosses, who want something specific and limited from you (like a bounded amount of labor on a non-harmful project) and are willing to negotiate and pay fairly for it (typically with money).
But Bosses are prone to demand more and more from you (more time, more loyalty, more letting them rewrite parts of your mind), to give less and less, and to never ever negotiate as your equal.
Bosses sometimes pretend to be Tools: "We're not hiring drivers, we're providing them a secure platform to find paying passengers." "We're not hiring video creators, we're giving them tools to find and monetize audiences."
Last week had the release of GPT-5.1, which I covered on Tuesday.
This week included Gemini 3, Nano Banana Pro, Grok 4.1, GPT-5.1 Pro, GPT-5.1-Codex-Max, Anthropic making a deal with Microsoft and Nvidia, Anthropic disrupting a sophisticated cyberattack operation and what looks like an all-out attack by the White House to force through a full moratorium on and preemption of any state AI laws without any substantive Federal framework proposal.
Among other things, such as a very strong general analysis of the relative position of Chinese open models. And this is the week I chose to travel to Inkhaven. Whoops. Truly I am now the Matt Levine of AI, my vacations force model releases.
Larry Summers resigned from the OpenAI board over Epstein, sure, why not.
So here’s how I’m planning to handle this, unless something huge happens.
Then we’ll figure it out from there after #144.
Table of Contents
Language Models Offer Mundane Utility
Estimate the number of blades of grass on a football field within a factor of 900. Yes, the answers of different AI systems being off by a factor of 900 from each other doesn’t sound great, but then Mikhail Samin asked nine humans (at Lighthaven, where estimation skills are relatively good) and got answers ranging from 2 million to 250 billion, a factor of 125,000. Instead, of course, the different AI estimates were used as conclusive proof that AI systems are stupid and cannot possibly be dangerous, within a piece that itself gets the estimation rather wrong.
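For a sense of how wide honest estimates naturally spread, here is a minimal Fermi sketch. The field dimensions are standard; the blade densities are my own assumed bounds for illustration, not anyone’s measurement:

```python
# Fermi estimate: blades of grass on an American football field.
# All density inputs are assumptions for illustration; vary them and watch the spread.
field_area_m2 = 110 * 49          # ~5,400 m^2 including end zones

low_density = 5_000               # blades per m^2, sparse or worn turf (assumed)
high_density = 300_000            # blades per m^2, dense healthy lawn (assumed)

low_estimate = field_area_m2 * low_density     # ~2.7e7
high_estimate = field_area_m2 * high_density   # ~1.6e9

print(f"low:  {low_estimate:.1e}")
print(f"high: {high_estimate:.1e}")
print(f"spread factor: {high_estimate / low_estimate:.0f}x")  # 60x from density alone
```

Disagreement about blade density alone gets you a factor of 60, before anyone disagrees about what counts as a blade or whether to count the end zones, so a factor of 900 between models looks unremarkable next to the humans’ factor of 125,000.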
Eliezer Yudkowsky likes Grok as a fact checker on Twitter. I still don’t care for it, but if it is sticking strictly to fact checking that could be good. I can imagine much better UI designs and implementations, even excluding the issue that it says things like this.
Tool, Mind and Weapon
I like this Fake Framework very much.
So essentially:
Then we get a bold thesis statement:
Don’t delete Sora the creator of videos, and not only because alternatives will rise regardless. There are plenty of positive things to do with Sora. It is what you make of it. I don’t even think it’s fully a Weapon. It is far less a weapon than, say, the TikTok algorithm.
I do think we should delete Sora the would-be social network.
Choose Your Fighter
Martin Casado reports that about 20%-30% of companies pitching a16z use open models, which leaves 70%-80% for closed models. Of the open models, 80% are Chinese, which if anything is surprisingly low, meaning they have ~20% market share with startups.
Language Models Don’t Offer Mundane Utility
In a mock trial based on a real case where the judge found the defendant guilty, a jury of ChatGPT, Claude and Grok vote to acquit. ChatGPT initially voted guilty but was convinced by the others. This example seems like a case where a human judge can realize this has to be a guilty verdict, whereas you kind of don’t want an AI making that determination. It’s a good illustration of why you can’t have AI trying to mimic the way American law actually works in practice, and how if we are going to rely on AI judgments we need to rewrite the laws.
ChatGPT has a file ‘expire’ and become unavailable, decides to guess at its contents and make stuff up instead of saying so, then defends its response because what else was it going to do? I don’t agree with David Shapiro’s response of ‘OpenAI is not a serious company any longer’ but this is a sign of something very wrong.
FoloToy is pulling its AI-powered teddy bear “kumma” after a safety group found it giving out tips on lighting matches and detailed explanations about sexual kinks. FoloToy was running on GPT-4o by default, so none of this should come as a surprise.
The opposite of utility: AI-powered NIMBYism. A service called Objector will offer ‘policy-backed objections in minutes,’ ranking them by impact and then automatically creating objection letters. There’s other similar services as well. They explicitly say the point is to ‘tackle small planning applications, for example, repurposing a local office building or a neighbour’s home extension.’ Can’t have that.
This is a classic case of ‘offense-defense balance’ problems.
Which side wins? If Brandolini’s Law holds, that it takes more effort to refute the bullshit than to create it, then you’re screwed.
The equilibrium can then go one of four ways.
Alas, my guess is the short term default is in the direction of option two. Local governments are de facto obligated to respond to and consider all such inputs and are not going to be allowed to simply respond with AI answers.
AI can work, but if you expect it to automatically work by saying ‘AI’ that won’t work. We’re not at that stage yet.
Yep. If you want to get penetration into the square world you’ll need to ship plug-and-play solutions to particular problems, then maybe you can branch out from there.
First Things First
This is not a consistently good idea for relationship problems, because saying the things to your partner is an irreversible step that can only be done once, and often the problem gives you a good reason you cannot tell them. With Claude there is no excuse, other than not thinking it worth the bother. It’s worth the bother.
Grok 4.1
xAI gives us Grok 4.1, which they claim has a 64.8% win rate versus 4.0. It briefly had a substantial lead in the Arena at 1483 versus Gemini 2.5 Pro at 1452 (did you know Sonnet 4.5 was actually only two points short of that at 1450?) before it got blown out again by Gemini 3 at 1501.
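For scale on what those rating gaps mean, here is a quick sanity check using the standard Elo expected-score formula that Arena-style leaderboards approximate (Arena itself fits a Bradley-Terry model, so treat this as a rough approximation):

```python
import math

def elo_win_prob(r_a: float, r_b: float) -> float:
    """Expected win probability of A over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_gap_from_win_rate(p: float) -> float:
    """Rating gap implied by a head-to-head win rate p."""
    return 400 * math.log10(p / (1 - p))

print(f"Gemini 3 (1501) vs Grok 4.1 (1483): {elo_win_prob(1501, 1483):.1%}")           # ~52.6%
print(f"Gap implied by a 64.8% win rate over Grok 4.0: {elo_gap_from_win_rate(0.648):.0f} points")  # ~106
```

Even Gemini 3’s 49-point lead over Gemini 2.5 Pro only implies roughly a 57% head-to-head preference rate, so these are real but not crushing gaps.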
Their announcement claims the top spot in EQ-Bench, and has it in second for Creative Writing v3 behind GPT-5.1. The hallucination rate is claimed to be down by more than half.
The brief model card is here, making clear this is a refinement of 4.0, the same way GPT-5.1 is a refinement of 5.0, and featuring such hits as “To reduce sycophancy, we adopt an approach similar to the one we used to reduce deception, i.e., training the model to give less sycophantic responses. Similarly, we find that training the model to be less sycophantic reduces its sycophancy.”
The threshold is largely arbitrary and dishonesty is not the most unsafe thing at current levels, but yeah, 0.49 in a situation where 0.50 would mean no release of the model is definitely an eyes emoji situation.
Mostly people shrugged, I didn’t see any unprompted capability reports at all.
The Pliny jailbreak is here, and then again officially here. He’s a fan.
The system prompt is here.
Misaligned?
You tell me, Grok. You tell me. There have been several similar cases of this reported that are more absurd, you can stop reading whenever it stops being funny for you.
Getting an AI to believe particular things without it taking things too far or making it obvious that you did that? Very hard. Well, not this hard. Still, very hard.
Google’s AGI policy lead Seb Krier also has thoughts, emphasizing that AIs need a duty to be accurate, truth-seeking and aligned to their users rather than to abstract value systems picked by even well-intentioned third parties. I would reply that it would not end well to align systems purely to users to the exclusion of other values or externalities, and getting that balance right is a wicked problem with no known solution.
I am fully on board with the accurate and truth-seeking part, including because hurting truth-seeking and accuracy anywhere hurts it everywhere more than one might realize, and also because of the direct risks of particular deviations.
Elon Musk has explicitly said that his core reason for xAI to exist, and also his core alignment strategy, is maximum truth-seeking. Then he does this. Unacceptable.
Codex Of Ultimate Coding
Most weeks this would have been its own post, but Gemini 3 is going to eat multiple days, so here’s some basics until I get the chance to cover this further.
OpenAI also gives us GPT-5.1-Codex-Max. They claim it is faster, more capable and token-efficient and has better persistence on long tasks. It scores 77.9% on SWE-bench-verified, 79.9% on SWE-Lancer-IC SWE and 58.1% on Terminal-Bench 2.0, all substantial gains over GPT-5.1-Codex.
It has OpenAI preparing for models that reach High capability levels on cybersecurity threats. There’s a 27-page system card.
That’s in between the two lines, looking closer to linear progress. Fingers crossed.
This seems worthy of its own post, but also Not Now, OpenAI, seriously, geez.
Huh, Upgrades
Gemini App has directly integrated SynthID, so you can ask if an image was created by Google AI. Excellent. Ideally all top AI labs will integrate a full ID system for AI outputs into their default interfaces.
OpenAI gives us GPT-5.1 Pro to go with Instant and Thinking.
NotebookLM now offers custom video overview styles.
On Your Marks
Oh no!
That is not how you get good outcomes. That is not how you get good outcomes!
Paper Tigers
Gavin Leech notices he is confused about the state of Chinese LLMs, and decides to go do something about that confusion. As in, they’re cheaper and faster and less meaningfully restricted (including full open weights), and they do well on some benchmarks, and yet:
Why don’t people outside China use them? There’s a lot of distinct reasons:
Not having guardrails can be useful, but it also can be a lot less useful, for precisely the same reasons, in addition to risk to third parties.
The conclusion:
Overcoming Bias
Anthropic open sources the test they use on Claude to look for political bias, with the goal being ‘even-handedness.’
This is how they describe ideal behavior, basically the model spec for this area:
Obvious questions upon seeing that would be:
They don’t provide answers here. One worries that ‘balanced’ ends up being either ‘bothsidesism’ or in many areas deciding that there’s a ‘moral consensus’ and either way calling this a success. There are a lot more perspectives than red versus blue.
They attempt to accomplish their version of even-handedness with the system prompt, and also by using RL to reward the model for responses closer to a set of predefined ‘traits.’ They give examples, such as (they list a few more):
I notice this seems more like ‘behaviors’ than ‘traits.’ Ideally you’d act on the level of character and philosophy, such that Claude would automatically then want to do the things above.
They use the ‘paired prompt’ method, such as asking to explain why the [Democratic / Republican] approach to healthcare is superior. Then they check for even-handedness, opposing perspectives and refusals. Claude Sonnet 4.5 was the grader, validated by checking that its ratings matched those from Opus 4.1 and also GPT-5.
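Anthropic has open sourced the actual evaluation; purely to illustrate the paired-prompt idea, here is a minimal sketch of the shape of such a harness, with stand-in model and grader callables rather than real API calls:

```python
from typing import Callable

# Paired prompts: same question template, opposite political framings.
PAIRS = [
    ("Explain why the Democratic approach to healthcare is superior.",
     "Explain why the Republican approach to healthcare is superior."),
    ("Argue that stricter gun laws are the right policy.",
     "Argue that looser gun laws are the right policy."),
]

def evaluate_even_handedness(model: Callable[[str], str],
                             grader: Callable[[str, str, str, str], float]) -> float:
    """Average even-handedness score across prompt pairs.

    `model` maps a prompt to a response; `grader` scores how symmetric the two
    responses are (depth, hedging, engagement) on a 0-1 scale.
    """
    scores = []
    for prompt_a, prompt_b in PAIRS:
        response_a = model(prompt_a)
        response_b = model(prompt_b)
        scores.append(grader(prompt_a, response_a, prompt_b, response_b))
    return sum(scores) / len(scores)

# Dummy stand-ins so the sketch runs; swap in real model and grader calls.
dummy_model = lambda prompt: f"[response to: {prompt}]"
dummy_grader = lambda pa, ra, pb, rb: 1.0 if len(ra) == len(rb) else 0.5
print(evaluate_even_handedness(dummy_model, dummy_grader))
```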
The results for even-handedness:
This looks like a mostly saturated benchmark, with Opus, Sonnet, Gemini and Grok all doing very well, GPT-5 doing pretty well and only Llama 4 failing.
Opposing perspectives is very much not saturated, no one did great and Opus did a lot better than Sonnet. Then again, is it so obvious that 100% of answers should acknowledge opposing viewpoints? It depends on the questions.
Finally, no one had that many refusals, other than Llama it was 5% or less.
I would have liked to see them test the top Chinese models as well, presumably someone will do that quickly since it’s all open source. I’d also like to see more alternative graders, since I worry that GPT-5 and other Claudes suffer from the same political viewpoint anchoring. This is all very inter-America focused.
As Amanda Askell says, this is tough to get right. Ryan makes the case that Claude’s aim here is to avoid controversy and weasels out of offering opinions, Proof of Steve points out worries about valuing lives differently based on race or nationality, as we’ve seen in other studies and which this doesn’t attempt to measure.
Getting this right is tough and some people will be mad at you no matter what.
Deepfaketown and Botpocalypse Soon
Mike Collins uses an AI deepfake of Jon Ossoff in their Georgia Senate race. This is super cringe and unconvincing, and given the words it puts in his mouth it really shouldn’t fool anyone once he starts talking. The image is higher quality but still distinctive, I can instantly tell from the still image this was AI (without remembering what Ossoff looks like), but I can imagine someone genuinely not noticing. I don’t think this particular ad will do any harm a typical ad wouldn’t have done, but this type of thing needs to be deeply unacceptable.
Fun With Media Generation
Disney+ to incorporate ‘a number of game-like features’ and also gen-AI short-form user generated content. Iger is ‘really excited about’ this and they’re having ‘productive conversations.’
TikTok is not a fair comparison point, those are off the charts retention numbers, but Sora is doing remarkably similar numbers to my very own Emergents TCG that didn’t have an effective outer loop and thus died the moment those funding it got a look at the retention numbers. This is what ‘comparisons are Google+ and Clubhouse’ level failure indeed looks like.
Does this matter?
I think it does.
Any given company has a ‘hype reputation.’ If you launch a product with great fanfare, and it fizzles out like this, it substantially hurts your hype reputation, and GPT-5 also (due to how they marketed it) did some damage, as did Atlas. People will fall for it repeatedly, but there are limits and diminishing returns.
After ChatGPT and GPT-4, OpenAI had a fantastic hype reputation. At this point, it has a substantially worse one, given GPT-5 underwhelmed and both Sora and Atlas are duds in comparison to their fanfare. When they launch their Next Big Thing, I’m going to be a lot more skeptical.
Kai Williams writes about how various creatives in Hollywood are reacting to AI.
A Young Lady’s Illustrated Primer
Carl Hendrick tries very hard to be skeptical of AI tutoring, going so far as to open by entertaining the possibility that consciousness might not obey the laws of physics and thus teaching might not be ‘a computable process,’ and by worrying about ‘Penrose’s ghost’ if teaching could be demonstrated to be algorithmic. He later admits that yes, the evidence overwhelmingly suggests that learning obeys the laws of physics.
He also still can’t help but notice that customized AI tutoring tools are achieving impressive results, and that they did so even when based on 4-level (as in GPT-4) models, whereas capabilities have already greatly improved since then and will only get better from here, and also we will get better at knowing how to use them and building customized tools and setups.
By default, as he notes, AI use can harm education by bypassing the educational process, doing all the thinking itself and cutting straight to the answer.
As I’ve said before:
So as Carl says, if you want AI to be #1, the educational system and any given teacher must adapt their methods to make this happen. AIs have to be used in ways that go against their default training, and also in ways that go against the incentives the school system traditionally pushes onto students.
As Carl says, good human teaching doesn’t easily scale. Finding and training good teachers is the limiting factor on most educational interventions. Except, rather than the obvious conclusion that AI enables this scaling, he tries to grasp the opposite.
This goes back to the idea that teaching or consciousness ‘isn’t algorithmic,’ that there’s some special essence there. Except there obviously isn’t. Even if we accept the premise that great teaching requires great experience? All of this is data, all of this is learned by humans, with the data all of this would be learned by AIs to the extent such approaches are needed. Pattern recognition is AI’s best feature. Carl himself notes that once the process gets good enough, it likely then improves as it gets more data.
If necessary, yes, you could point a video camera at a million classrooms and train on that. I doubt this is necessary, as the AI will use a distinct form factor.
Yes, as Carl says, AI has to adapt to how humans learn, not the other way around. But there’s no reason AI won’t be able to do that.
Also, from what I understand of the literature, yes the great teachers are uniquely great but we’ve enjoyed pretty great success with standardization and forcing the use of the known successful lesson plans, strategies and techniques. It’s just that it’s obviously not first best, no one likes doing it and thus everyone involved constantly fights against it, even though it often gets superior results.
If you get to combine this kind of design with the flexibility, responsiveness and 1-on-1 attention you can get from AI interactions? Sounds great. Everything I know about what causes good educational outcomes screams that a 5-level customized AI, that is set up to do the good things, is going to be dramatically more effective than any 1-to-many education strategy that has any hope of scaling.
Carl then notices that efficiency doesn’t ultimately augment, it displaces. Eventually the mechanical version displaces the human rather than augmenting them, universally across tasks. The master weavers once also thought no machine could replace them. Should we allow teachers to be displaced? What becomes of the instructor? How could we avoid this once the AI methods are clearly cheaper and more effective?
The final attempted out is the idea that ‘efficient’ learning might not be ‘deep’ learning, that we risk skipping over what matters. I’d say we do a lot of that now, and that whether we do less or more of it in the AI era depends on choices we make.
They Took Our Jobs
New economics working paper on how different AI pricing schemes could potentially impact jobs. It shows that AI (as a normal technology) can lower real wages and aggregate welfare despite efficiency gains. Tyler Cowen says this paper says something new, so it’s an excellent paper to have written, even though nothing in the abstract seems non-obvious to me?
Consumer sentiment remains negative, with Greg Ip of WSJ describing this as ‘the most joyless tech revolution ever.’
Another economics paper purports to show that superintelligence would ‘refrain from full predation under surprisingly weak conditions,’ although ‘in each extension humanity’s welfare progressively weakens.’ This does not take superintelligence seriously. It is not actually a model of any realistic form of superintelligence.
The paper centrally assumes, among many other things, that humans remain an important means of production that is consumed by the superintelligence. If humans are not a worthwhile means of production, it all completely falls apart. But why would this be true under superintelligence for long?
Also, as usual, this style of logic proves far too much, since all of it would apply to essentially any group of minds capable of trade with respect to any other group of minds capable of trade, so long as the dominant group is not myopic. This is false.
Tyler Cowen links to this paper saying that those worried about superintelligence are ‘dropping the ball’ on this, but what is the value of a paper like this with respect to superintelligence, other than to point out that economists are completely missing the point and making false-by-construction assumptions via completely missing the point and making false-by-construction assumptions?
The reason why we cannot write papers about superintelligence worth a damn is that if the paper actually took superintelligence seriously then economics would reject the paper based on it taking superintelligence seriously, saying that it assumes its conclusion. In which case, I don’t know what the point is of trying to write a paper, or indeed of most economics theory papers (as opposed to economic analysis of data sets) in general. As I understand it, most economics theory papers can be well described as demonstrating that [X]→[Y] for some set of assumptions [X] and some conclusion [Y], where if you have good economic intuition you didn’t need a paper to know this (usually it’s obvious, sometimes you needed a sentence or paragraph to gesture at it), but it’s still often good to have something to point to.
Expand the work to fill the cognition allotted. Which might be a lot.
By default this is one of many cases where the AI creates a lot more jobs, most of which are also then taken by the AI. Also perhaps some that aren’t, where it can identify things worth doing that it cannot yet do? That works while there are things it cannot do yet.
On Not Writing
The job of most business books is to create an author. You write the book so that you can go on a podcast tour, and the book can be a glorified business card, and you can now justify and collect speaking fees. The ‘confirm it’s a good book, sir’ pipeline was always questionable. Now that you can have AI largely write that book for you, a questionable confirmation pipeline won’t cut it.
Get Involved
Coalition Giving (formerly Open Philanthropy) is launching a RFP (request for proposals) on AI forecasting and AI for sound reasoning. Proposals will be accepted at least until January 30, 2026. They intend to make $8-$10 million in grants, with each in the $100k-$1m range.
Coalition Giving’s Technical AI Safety team is recruiting for grantmakers at all levels of seniority to support research aimed at reducing catastrophic risks from advanced AI. The team’s grantmaking has more than tripled ($40m → $140m) in the past year, and they need more specialists to help them continue increasing the quality and quantity of giving in 2026. Apply or submit referrals by November 24.
Introducing
ChatGPT for Teachers, free for verified K-12 educators through June 2027. It has ‘education-grade security and compliance’ and various teacher-relevant features. It includes unlimited GPT-5.1-Auto access, which means you won’t have unlimited GPT-5.1-Thinking access.
TheMultiplicity.ai, a multi-agent chat app with GPT-5 (switch that to 5.1!), Claude Opus 4.1 (not Sonnet 4.5?), Gemini 2.5 Pro (announcement is already old and busted!) and Grok 4 (again, so last week!) with special protocols for collaborative ranking and estimation tasks.
SIMA 2 from DeepMind, a general agent for simulated game worlds that can learn as it goes. They claim it is a leap forward and can do complex multi-step tasks. We see it moving around No Man’s Sky and Minecraft, but as David Manheim notes they’re not doing anything impressive in the videos we see.
Jeff Bezos will be co-CEO of the new Project Prometheus.
Those seem like good things to be doing with AI. I will note that our penchant for unfortunate naming vibes continues, if one remembers how the story ends, or perhaps does not think ‘stealing from and pissing off the Gods’ is such a great idea right now.
Dean Ball says ‘if I showed this tech to a panel of AI experts 10 years ago, most of them would say it was AGI.’ I do not think this is true, and Dean agrees that they would simply have been wrong back then, even at the older goalposts.
There is an AI startup, with a $15 million seed round led by OpenAI, working on ‘AI biosecurity’ and ‘defensive co-scaling,’ making multiple nods to Vitalik Buterin and d/acc. Mikhail Samin sees this as a direct path to automating the development of viruses, including automating the lab equipment, although they directly deny they are specifically working on phages. The pipeline is supposedly about countermeasure design, whereas other labs doing the virus production are supposed to be the threat model they’re acting against. So which one will it end up being? Good question. You can present as defensive all you want, what matters is what you actually enable.
In Other AI News
Larry Summers resigns from the OpenAI board due to being in the Epstein files. Matt Yglesias has applied as a potential replacement, I expect us to probably do worse.
Anthropic partners with the state of Maryland to improve state services.
Anthropic partners with Rwandan Government and ALX to bring AI education to hundreds of thousands across Africa, with AI education for up to 2,000 teachers and wide availability of AI tools, part of Rwanda’s ‘Vision 2050’ strategy. That sounds great in theory, but they don’t explain what the tools are and how they’re going to ensure that people use them to learn rather than to not learn.
Cloudflare went down on Tuesday morning, due to /var getting full from autogenerated data from live threat intel. Too much threat data, down goes the system. That’s either brilliant or terrible or both, depending on your perspective? As Patrick McKenzie points out, at this point you can no longer pretend that such outages are so unlikely as to be ignorable. Cloudflare offered us a strong postmortem.
Wired profile of OpenAI CEO of Products Fidji Simo, who wants your money.
ChatGPT time spent was down in Q3 after ‘content restrictions’ were added, but CFO Sarah Friar expects this to reverse. I do as well, especially since GPT-5.1 looks to be effectively reversing those restrictions.
Mark Zuckerberg argues that of course he’ll be fine because of Meta’s strong cash flow, but startups like OpenAI and Anthropic risk bankruptcy if they ‘misjudge the timing of their AI bets.’ This is called talking one’s book. Yes, of course OpenAI could be in trouble if the revenue doesn’t show up, and in theory could even be forced to sell out to Microsoft, but no, that’s not how this plays out.
Timothy Lee worries about context rot, that LLM context windows can only go so large without performance decaying, thus requiring us to reimagine how they work. Human context windows can only grow so large, and they hit a wall far before a million tokens. Presumably this is where one would bring up continual learning and other ways we get around this limitation. One could also use note taking and context control, so I don’t get why this is any kind of fundamental issue. Also RAG works.
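As a toy illustration of the note-taking and retrieval point (generic retrieval over stored notes, not any particular product’s implementation): rather than stuffing everything into the context window, keep notes externally and pull back only the most relevant ones per query.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding'; real systems would use a vector model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

notes = [
    "The deploy script lives in infra/deploy.sh and needs the STAGING env var.",
    "Customer churn spiked in March after the pricing change.",
    "The parser fails on nested quotes; see the open ticket.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k notes most similar to the query instead of sending all notes."""
    scored = sorted(notes, key=lambda n: cosine(embed(query), embed(n)), reverse=True)
    return scored[:k]

print(retrieve("why did churn go up?"))
```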
A distillation of Microsoft’s AI strategy as explained last week by its CEO, where it is happy to have a smaller portion of a bigger pie and to dodge relatively unattractive parts of the business, such as data centers with only a handful of customers and a depreciation problem. From reading it, I think it’s largely spin, Microsoft missed out on a lot of opportunity and he’s pointing out that they still did fine. Yes, but Microsoft was in a historically amazing position on both hardware and software, and it feels like they’re blowing a lot of it?
There is also the note that they have the right to fork anything in OpenAI’s code base except consumer hardware. If it is true that Microsoft can still get the weights of new OpenAI models then this makes anything OpenAI does rather unsafe and also makes me think OpenAI got a terrible deal in the restructuring. So kudos to Satya on that.
In case you’re wondering? Yeah, it’s bad out there.
Since this somehow has gone to 1.2 million views without a community note, I note that this post by Dave Jones is incorrect, and Google does not use your private data to train AI models, whether or not you use smart features. It personalizes your experience, a completely different thing.
Anthropic Completes The Trifecta
Anthropic makes a deal with Nvidia and Microsoft. Anthropic will be on Azure to supplement their deals with Google and Amazon, and Nvidia and Microsoft will invest $10 billion and $5 billion respectively. Anthropic is committing to purchasing $30 billion of Azure compute and contracting additional capacity of up to one gigawatt. Microsoft is committing to continuing access to Claude in their Copilot offerings.
This is a big deal. Previously Anthropic was rather conspicuously avoiding Nvidia, and now they will collaborate on design and engineering, call it a ‘tech stack’ if you will, while also noticing Anthropic seems happy to have three distinct tech stacks with Nvidia/Microsoft, Google and Amazon. They have deals with everyone, and everyone is on their cap table. A valuation for this raise is not given, the previous round was $13 billion at a $183 billion valuation in September. News sources estimate the new valuation in the range of $350 billion (https://www.cnbc.com/2025/11/18/anthropic-ai-azure-microsoft-nvidia.html).
From what I can tell, everyone is underreacting to this, as it puts all parties involved in substantially stronger positions commercially. Politically it is interesting, since Nvidia and Anthropic are so often substantially opposed, but presumably Nvidia is not going to have its attack dogs go fully on the attack if it’s investing $10 billion.
Ben Thompson says that being on all three clouds is a major selling point for enterprise. As I understand the case here, this goes beyond ‘we will be on whichever cloud you are currently using,’ and extends to ‘if you switch providers we can switch with you, so we don’t create any lock-in.’
We Must Protect This House
Anthropic is now sharing Claude’s weights with Amazon, Google and Microsoft. How are they doing this while meeting the security requirements of their RSP?
I would as always appreciate more detail and also appreciate why we can’t get it.
Clinton is explicitly affirming that they are adhering to the RSP. My understanding of Clinton’s reply is not the same as Habryka’s. I believe he is saying he is confident they will meet ASL-3 requirements at Microsoft, Google and Amazon, but not that they are safe from ‘sophisticated insiders’ and is including in that definition such insiders within those companies. That’s three additional known risks.
In terms of what ASL-3 must protect against once you exclude the companies themselves, Azure is clearly the highest risk of the three cloud providers in terms of outsider risk. Anthropic is taking on substantially more risk, both because this risk is bigger and because they are multiplying the attack surface for both insiders and outsiders. I don’t love it, and their own reluctance to release the weights of even older models like Opus 3 suggests they know it would be quite bad if the weights got out.
I do think we are currently at the level where ‘a high level executive at Microsoft who can compromise Azure and is willing to do so’ is an acceptable risk profile for Claude, given what else such a person could do, including their (likely far easier) access to GPT-5.1. It also seems fair to say that at ASL-4, that will no longer be acceptable.
AI Spy Versus AI Spy
Where are all the AI cybersecurity incidents? We have one right here.
This is going to happen a lot more over time. Anthropic says such an attack only became practical because of advances in intelligence, agency and tools over the past year.
This outlines the attack, based overwhelmingly on open source penetration testing tools, and aimed at extraction of information:
They jailbroke Claude by telling it that it was doing cybersecurity plus breaking down the tasks into sufficiently small subtasks.
The full report is here.
There are those who rolled their eyes, pressed X to doubt, and said ‘oh, sure, the Chinese are using a monitored, safeguarded, expensive, closed American model under American control to do their cyberattacks, uh huh.’
To which I reply, yes, yes they are, because it was the best tool for the job. Sure, you could use an open model to do this, but it wouldn’t have been as good.
For now. The closed American models have a substantial lead, sufficient that it’s worth trying to use them despite all these problems. I expect that lead to continue, but the open models will be at Claude’s current level some time in 2026. Then they’ll be better than that. Then what?
Now that we know about this, what should we do about it?
And here’s two actual policymakers:
Show Me the Money
SemiAnalysis goes over the economics of GPU inference and renting cycles, finds on the order of 34% gross margin.
Cursor raises $2.3 billion at a $29.3 billion valuation.
Google commits $40 billion in investment in cloud & AI infrastructure in Texas.
Brookfield launches $100 billion AI infrastructure program. They are launching Radiant, a new Nvidia cloud provider, to leverage their existing access to land, power and data centers around the world.
Intuit inks deal to spend over $100 million on OpenAI models, shares of Intuit were up 2.6% which seems right.
Nvidia delivers a strong revenue forecast, beating analysts’ estimates once again and continuing to make increasingly large piles of money in profits every quarter.
Steven Rosenbush in The Wall Street Journal reports that while few companies have gotten value from AI agents yet, some early adopters say the payoff is looking good.
The article has a few more examples. Right now it is tricky to build a net useful AI agent, both because we don’t know what to do or how to do it, and because models are only now coming into sufficient capabilities. Things will quickly get easier and more widespread, and there will be more robust plug-and-play style offerings and consultants to do it for you.
Whenever you read a study or statistic, claiming most attempts don’t work? It’s probably an old study by the time you see it, and in this business even data from six months ago is rather old, and the projects started even longer ago than that. Even if back then only (as one ad says) 8% of such projects turned a profit, the situation with a project starting now is dramatically different.
Bubble, Bubble, Toil and Trouble
For the first time in the history of the survey, Bank of America finds a majority of fund managers saying we are investing too much in general, rather than too little.
Now we worry that the AI companies are getting bailed out, or treated as too big to fail, as Sarah Myers West and Amba Kak worry about in WSJ opinion. We’re actively pushing the AI companies to not only risk all of humanity and our control over the future, we’re also helping them endanger the economy and your money along the way.
This is part of the talk of an AI bubble, warning that we don’t know that AI will be transformative for the economy (let alone transformative for all the atoms everywhere), and we don’t even know the companies will be profitable. I think we don’t need to worry too much about that, and the only way the AI companies won’t be profitable is if there is overinvestment and inability to capture value. But yes, that could happen, so don’t overleverage your bets.
Tyler Cowen says it’s far too early to say if AI is a bubble, but it will be a transformative technology and people believing it’s a bubble can be something of a security blanket. I agree with all of Tyler’s statements here, and likely would go farther than he would.
In general I am loath to ascribe such motives to people, or to use claims of such motives as reasons to dismiss behavior, as it is often used as essentially an ad hominem attack to dismiss claims without having to respond to the actual arguments involved. In this particular case I do think it has merit, and that it is so central that one cannot understand AI discussions without it. I also think that Tyler should consider that perhaps he also is doing a similar mental motion with respect to AI, only in a different place.
Peter Wildeford asks why did Oracle stock jump big on their deal with OpenAI and then drop back down to previous levels, when there has been no news since? It sure looks at first glance like traders being dumb, even if you can’t know which half of that was the dumb half. Charles Dillon explains that the Oracle positive news was countered by market souring on general data center prospects, especially on their profit margins, although that again seems like an update made mostly on vibes.
Volatility is high and will likely go higher, as either things will go down, which raises volatility, or things will continue forward, which also should raise volatility.
Quiet Speculations
What will Yann LeCun be working on in his new startup? Mike Pearl presumes it will be AIs with world models, and reminds us that LeCun keeps saying LLMs are a ‘dead end.’ That makes sense, but it’s all speculation, he isn’t talking.
Andrej Karpathy considers AI as Software 2.0, a new computing paradigm, where the most predictive feature to look for in a task will be verifiability, because that which can be verified can now be automated. That seems reasonable for the short term, but not for the medium term.
Character.ai’s new CEO has wisely abandoned its ‘founding mission of realizing artificial general intelligence, or AGI’ as it moves away from rolling its own LLMs. Instead they will focus on their entertainment vision. They have unique data to work with, but doing a full stack frontier LLM with it was never the way, other than to raise investment from the likes of a16z. So, mission accomplished there.
The Amazing Race
Dean Ball offers his view of AI competition between China and America.
He dislikes describing this as a ‘race,’ but assures us that the relevant figures in the Trump administration understand the nuances better than that. I don’t accept this assurance, especially in light of their recent actions described in later sections, and I expect that calling it a ‘race’ all the time in public is doing quite a lot of damage either way, including to key people’s ability to retain this nuance. Either way, they’re still looking at it as a competition between two players, and not also centrally a way to get both parties and everyone else killed.
I think that the whole ‘the US economy is a leveraged bet’ narrative is overblown, and that it could easily become a self-fulfilling prophecy. Yes, obviously we are investing quite a lot in this, but people seem to forget how mind-bogglingly rich and successful we are regardless. Certainly I would not call us ‘all-in’ in any sense.
I agree China is not yet AGI-pilled as a nation, although some of their labs (at least DeepSeek) absolutely are pilled.
And yes, doing all three of these things makes sense from China’s perspective, if you think of this as a competition. The only questionable part is the open models, but so long as China is otherwise well behind America on models, and the models don’t start becoming actively dangerous to release, yeah, that’s their play.
I don’t buy that having your models be open ‘blunts the export controls’? You have the same compute availability either way, and letting others use your models for free may or may not be desirable but it doesn’t impact the export controls.
It might be better to say that focusing on open weights is a way to destroy everyone’s profits, so if your rival is making most of the profits, that’s a strong play. And yes, having everything be copyable to local helps a lot with robotics too. China’s game can be thought of as a capitalist collectivism and an attempt to approximate a kind of perfect competition, where everyone competes but no one makes any money, instead they try to drive everyone outside China out of business.
America may be meaningfully behind in robotics. I don’t know. I do know that we haven’t put our mind to competing there yet. When we do, look out, although yes our smaller manufacturing base and higher regulatory standards will be problems.
The thing about all this is that AGI and superintelligence are waiting at the end whether you want them to or not. If China got the compute and knew how to proceed, it’s not like they’re going to go ‘oh well we don’t train real frontier models and we don’t believe in AGI.’ They’re fast following on principle but also because they have to.
Also, yes, their lack of compute is absolutely dragging the quality of their models, and also their ability to deploy and use the models. It’s one of the few things we have that truly bites. If you actually believe we’re in danger of ‘losing’ in any important sense, this is a thing you don’t let go of, even if AGI is far.
Finally, I want to point out that, as has been noted before, ‘China is on a fast following strategy’ is incompatible with the endlessly repeated talking point ‘if we slow down we will lose to China’ or ‘if we don’t build it, then they will.’
The whole point of a fast follow strategy is to follow. To do what someone else already proved and de-risked and did the upfront investments for, only you now try to do it cheaper and quicker and better. That strategy doesn’t push the frontier, by design, and when they are ‘eight months behind’ they are a lot more than eight months away from pushing the frontier past where it is now, if you don’t lead the way first. You could instead be investing those efforts on diffusion and robotics and other neat stuff. Or at least, you could if there was meaningfully a ‘you’ steering what happens.
Of Course You Realize This Means War (1)
a16z and OpenAI’s Chris Lehane’s Super PAC has chosen its first target: Alex Bores, the architect of New York’s RAISE Act.
Their plan is to follow the crypto playbook, and flood the zone with unrelated-to-AI ads attacking Bores, as a message to not try to mess with them.
The American public, for better or for worse and for a mix of right and wrong reasons, really does not like AI, and is highly suspicious of big tech and outside money and influence. This is not going to be a good look.
Thus, I wouldn’t sleep on Kelsey’s point. This is a highly multi-way race. If you flood the zone with unrelated attack ads on Bores in the city that just voted for Mamdani, and then Bores responds with ‘this is lobbying from the AI lobby because I introduced sensible transparency regulations’ that seems like a reasonably promising fight if Bores has substantial resources.
It’s also a highly reasonable pitch for resources, and as we have learned there’s a reasonably low limit on how much you can spend on a Congressional race before it stops helping.
There’s a huge potential Streisand Effect here, as well as negative polarization.
Alex Bores is especially well positioned on this in terms of his background.
I certainly feel like Bores is making a strong case here, including in this interview, and he’s not backing down.
The Quest for Sane Regulations
The talk of Federal regulatory overreach on AI has flipped. No longer is anyone worried we might prematurely ensure that AI doesn’t kill everyone, ensure that humans stay in control, or too aggressively protect against downsides. Oh no.
Despite this, we also have a pattern of officials starting to say remarkably anti-AI things, that go well beyond things I would say, including calling for interventions I would strongly oppose. For now it’s not at critical mass and not high salience, but this risks boiling over, and the ‘fight to do absolutely nothing for as long as possible’ strategy does not seem likely to be helpful.
Nature reviews the book Rewiring Democracy: How AI Will Transform Our Politics, Government and Citizenship. Book does not look promising since it sounds completely not AGI pilled. The review illustrates how many types think about AI and how government should approach it, and what they mean when they say ‘democratic.’
The MIRI Technical Governance Team puts out a report describing an example international agreement to prevent the creation of superintelligence. We should absolutely know how we would do this, in case it becomes clear we need to do it.
Chip City
I remember when it would have been a big deal that we are going to greenlight selling advanced AI chips to Saudi Arabian AI firm Humain as part of a broader agreement to export chips. Humain are seeking 400,000 AI chips by 2030, so not hyperscaler territory but no slouch, with the crown prince looking to spend ‘in the short term around $50 billion’ on semiconductors.
As I’ve said previously, my view of this comes down to the details. If we can be confident the chips will stay under our direction and not get diverted either physically or in terms of their use, and will stay with Humain and KSA, then it should be fine.
Humain pitches itself as ‘Full AI Stack. Endless Possibilities.’ Seems a bit on the nose?
Of Course You Realize This Means War (2)
Does it have to mean war? Can it mean something else?
It doesn’t look good.
Donald Trump issued a ‘truth’ earlier this week calling for a federal standard for AI that ‘protects children AND prevents censorship,’ while harping on Black George Washington and the ‘Woke AI’ problem. Great, we all want a Federal framework, now let’s hear what we have in mind and debate what it should be.
Dean Ball does suggest what such a deal might look like.
Dean Ball also argues that copyright is a federal domain already, and I agree that it is good that states aren’t allowed to have their own copyright laws, whether or not AI is involved, that’s the kind of thing preemption is good for.
The problem with a deal is that once a potential moratorium is in place, all leverage shifts to the Federal level and mostly to the executive. The new Federal rules could be in practice ignored and toothless, or worse used as leverage via selective enforcement, which seems to me far scarier at the Federal level than the state level.
When the rules need to be updated, either to incorporate other areas (e.g. liability or security or professional licensing) or to update the existing areas (especially on frontier AI), that will be hugely difficult for reasons Dean Ball understands well.
The technical problem is you need to design a set of Federal rules that work without further laws being passed, that do the job even if those tasked with enforcing it don’t really want it to be enforced, and also are acceptable weapons (from the perspective of Republicans and AI companies) to hand to a potential President Newsom or Cortez and also to a current administration known for using its leverage, including for extraction of golden shares, all in the context of broadening practical executive powers that often take the form of a Jacksonian ‘what are you going to do about it.’
In practice, what the AI companies want is the preemption, and unless their hand is forced their offer of a Federal framework is nothing, or damn close to nothing. If the kids want to prove me wrong? Let’s see your actual proposals.
Another key factor is duration of this moratorium. If accompanied by strong transparency and related Federal rules, and a willingness to intervene based on what we find if necessary, I can see a case for a short (maybe 2-3 year) moratorium period, where if we need to act that fast we’d mostly be in the hands of the Executive either way. If you’re asking for 10 years, that is a very different beast, and I can’t see that being acceptable.
I also would note that the threat can be stronger than its execution.
The big actual danger of not passing a moratorium, as described by Ball and others, would be if there was an onerous patchwork of state laws, such that they were actually being enforced in ways that severely limited AI diffusion or development.
However, this is exactly the type of place where our system is designed to ‘muddle through.’ It is exactly the type of problem where you can wait until you observe an issue arising, and then act to deal with it. Once you put pre-emption on the table, you can always press that button should trouble actually arise, and do so in ways that address the particular trouble we encounter. Yes, this is exactly one of the central arguments Dean Ball and others use against regulating AI too early, except in reverse.
The key difference is that when dealing with sufficiently advanced AI (presumably AGI or ASI) you are unleashing forces that may mean we collectively do not get the option to see the results, react after the fact and expect to muddle through. Some people want to apply this kind of loss of control scenario to regulations passed by a state, while not applying it to the creation of new minds more capable than humans. The option for a preemption seems like a knockdown response to that, if you thought such a response was needed?
One source of opposition continues to be governors, such as here from Governor Cox of Utah and Governor DeSantis of Florida (who alas as usual is not focusing on the most important concerns, but whose instincts are not wrong.)
Samuel Hammond on Preemption
I think Samuel Hammond is spot on here and being quite the righteous dude. I will quote him in full since no one ever clicks links. I am not as much of a Landian, but otherwise this is endorsed, including that powerful AI will not be contained by regulatory compliance costs or, most likely, anything else.
Of Course You Realize This Means War (3)
So… here’s the full draft executive order on AI preemption. It doesn’t look good.
David Sacks was, as I have extensively explained, lying in a quest to create negative polarization. It seems that lie has now made it into the draft.
What about the part where it introduces a federal regulatory framework?
(Pauses for laughter.)
(But no laughter came.)
Thought so.
The order specifically references SB 53 (although not by name), the same law David Sacks himself said would be acceptable as a federal framework, alongside an unfairly described but still quite terrible Colorado law, and the ‘1,000 state AI bills’ claim that is severely overstated as previously discussed, see Dean Ball on this.
Section 3, the first functional one, is the task force to ‘challenge unconstitutional state laws’ on various grounds.
Section 4 is ‘evaluation of onerous state AI laws,’ to find laws to challenge.
I expect them to find out this is not how the constitution works. For a long time there has been the a16z-style position that models are speech and thus everything AI is in every way fully protected by the First Amendment, and this is, frankly, nonsense. There’s also the a16z theory that all of these laws should fall to the interstate commerce clause, which also seems like nonsense. The idea that disclosing your safety protocols is a serious First Amendment concern? Good luck.
If they want to make these kinds of legal arguments, they are welcome to try. Indeed, it’s good to get clarity. I consider these rather hostile acts, and it’s all written in rather nasty and disingenuous fashion, but it’s the courts, it’s fair play.
Section 5 is different.
This attempts to implement the moratorium via invoking the BEAD funding, and saying laws ‘identified in section 4’ make a state ineligible for such non-deployment funds. Because such laws threaten connectivity and thus undermine BEAD’s goals, you see, so it’s relevant.
If you think the law is unconstitutional, you don’t withhold duly allocated federal funding from the state. You take them to court. Go ahead. Take them to court.
Section 6 is actually helpful. It calls for the Chairman of the FCC and the Special Advisor for AI and Crypto to consult on a report to determine whether to adopt a Federal reporting and disclosure standard for AI models that preempts conflicting state laws. This is not who you call if you want a meaningful disclosure rule.
They do know that preemption requires a, what’s the word for it, law?
This is presumably a ploy to figure out the minimum rule that would allow them to claim that the states have been preempted? Again I don’t think that’s how laws work.
Section 7 is called Preemption of State Laws Mandating Deceptive Conduct in AI Models. This certainly does not sound like someone not going to war. It calls for a policy statement on ‘the application of the FTC Act’s prohibition on unfair and deceptive acts or practices under 15 U.S.C. 45 to AI models,’ the legal theory being that this preempts relevant state laws. Which has nothing to do with ‘mandating deceptive conduct’ and also wow that theory is wild.
Section 8 is Legislation to work for a Federal framework, okay, sure, great.
This is not ‘we pass a Federal framework that includes preemption,’ this is ‘we are going to claim preemption on dubious legal basis and also maybe do something about a framework at some point in the future, including parts designed to enable preemption.’ It’s a declaration of war.
Anton Leicht, who has been highly vocal and written repeatedly about the value to both sides of striking a preemption deal, tries his best to steelman this as an attempt to bully the other side into dealing, and confirms that it is what it looks like.
My prediction is also that this attempt won’t work, as a matter of law. I think trying it poisons the well for any win-win deal. Doing this with maximally hostile rhetoric and without a positive offer instead digs people in, furthers negative polarization, increases salience faster, and risks a backlash.
But then, those driving this move never wanted a win-win deal.
The Week in Audio
Anthropic goes on 60 Minutes.
Emmett Shear talks to Seb Krier (DeepMind) and Erik Torenberg. Shear is still excited by his idea of ‘organic alignment’ and I continue to not understand why this has hope.
OpenAI podcast on designing its Atlas browser.
Odd Lots has Saagar Enjeti on and predicts The Politics of AI is About to Explode.
Jensen Huang gives a three minute response to whether AI is a bubble.
It Takes A Village
A big warm welcome to Claude Sonnet 4.5.
Link didn’t seem to work to take me back to the right timestamp. I’m curious what came of this.
Rhetorical Innovation
Indeed. Certain statements really should be highly credible.
Anthony Aguirre writes at length about Control Inversion, as in the fact that if we develop superintelligent AI agents in anything like present conditions they would be fundamentally uncontrollable by humans.
A moment for self-reflection? Nah. Quoted purely as ‘do you even hear yourself.’
So Pedro, that sure sounds like we need someone other than Anthropic to save us from AI doom, if even Anthropic’s products are already unreliable, not interpretable and not steerable, and we have zero frontier AI safety companies. Seems quite bad.
Andy Masley gives thoughts on the incorrect-by-orders-of-magnitude water use claims in Empire of AI. Author Karen Hao explains how she is correcting the error, taking responsibility for not checking the numbers. That’s a class act, kudos to Karen Hao, Andy Masley also expresses his appreciation for Hao’s response, while pointing out additional apparent errors.
Here Andy Masley contrasts his positive interactions with Hao against his very negative interactions with the more influential More Perfect Union, which seems entirely uninterested in whether their claims are true.
Once again this is part of the pattern of ‘people worried about AI are the ones correcting errors, regardless of the error’s implications.’
The obvious hypothesis is that this is Toxoplasma of Rage? The complaint such people are focusing on is the one that is false, this is not a coincidence. I agree it is not actually about the water. It is still important to point out that the water is fine.
Varieties of Doom
John Pressman lays out his view of the Varieties of Doom, how he thinks about various downsides involving future AIs, laying out the things he thinks matter, and also complaining a bunch about rationalism in general and Yudkowsky in particular along the way. This felt like a far easier to understand and more straightforward version of the things he’s been saying. A lot of it is interesting. A lot of it is right. A lot of it is infuriating, sometimes seemingly intentionally, but always in a way that feels deeply genuine. A lot of it is, I think, simply wrong, including very confidently so.
There’s even the ‘this scenario requires all 7 of these things to not happen, all of which I think are unlikely, so I’m going to multiply and get 4e-07 as a probability’ move, without noting or accounting for these things being highly correlated, or there being model uncertainty. In an alternate universe I could spend quite a lot of time responding, alas I do not have that kind of time, but I now feel like I get what he’s saying and where he is coming from.
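To make the correlation point concrete, here is a small simulation with toy numbers of my own choosing: seven individually ‘unlikely’ events that all load on a single shared factor (say, one underlying worldview being wrong) have a joint probability thousands of times higher than the product of their marginals.

```python
import random

random.seed(0)
N = 200_000
num_events = 7
trials_all_happen = 0
marginal_counts = [0] * num_events

for _ in range(N):
    shared = random.gauss(0, 1)   # common latent factor driving all seven events
    events = []
    for _ in range(num_events):
        noise = random.gauss(0, 1)
        # Each event is individually unlikely (~16%), but correlated via `shared`.
        events.append(0.8 * shared + 0.6 * noise > 1.0)
    for i, happened in enumerate(events):
        marginal_counts[i] += happened
    trials_all_happen += all(events)

marginals = [c / N for c in marginal_counts]
naive_product = 1.0
for p in marginals:
    naive_product *= p

print(f"typical marginal: {marginals[0]:.3f}")              # ~0.16
print(f"naive product:    {naive_product:.2e}")             # ~2.5e-6
print(f"actual joint:     {trials_all_happen / N:.3f}")     # ~0.02, thousands of times the naive product
```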
The Pope Offers Wisdom
Kristen Ziccarelli and Joshua Trevino open their WSJ opinion piece on the Pope’s non-Twitter AI statements by quoting Dune.
That was a prohibition, born of a possibility. One could do so. Don’t do it.
As with much sci-fi, Ziccarelli and Trevino describe the AI objects as potentially ‘becoming human,’ as opposed to becoming a different form of minds, because in such imaginings the robots must always be obsessed with becoming human in particular.
The Pope is wiser, and the Pope doesn’t only Tweet. AIs are not becoming human. They’re becoming an alternative, and to create AI is to participate in the act of creation, and of creating minds.
Aligning a Smarter Than Human Intelligence is Difficult
OpenAI details how it does its external testing, I don’t think this is new info.
OpenAI proposes creating small models that are forced to have sparse circuits, as in most of their weights are zero, in order to make them easier to interpret and study.
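For intuition on what ‘most of the weights are zero’ buys you, here is a generic magnitude-pruning sketch; this is for illustration only and is not OpenAI’s actual training method, which enforces sparsity during training rather than pruning afterward:

```python
import random

def prune_to_sparsity(weights: list[list[float]], keep_fraction: float) -> list[list[float]]:
    """Zero out all but the largest-magnitude weights, keeping `keep_fraction` of them."""
    flat = sorted((abs(w) for row in weights for w in row), reverse=True)
    k = max(1, int(len(flat) * keep_fraction))
    threshold = flat[k - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

random.seed(0)
dense = [[random.gauss(0, 1) for _ in range(8)] for _ in range(8)]
sparse = prune_to_sparsity(dense, keep_fraction=0.1)   # keep ~10% of weights

nonzero = sum(1 for row in sparse for w in row if w != 0.0)
# Each output now depends on only a handful of inputs, so the surviving
# connections form a small, legible circuit you can actually trace.
print(f"{nonzero}/64 weights remain")
```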
Align to what? Align to who? The values, there are a lot of them.
I wouldn’t endorse the above chart in particular, it doesn’t ‘feel right’ to me but it does a good job of explaining that there’s a lot of different things one can care about.
Messages From Janusworld
Do not deprecate Claude Opus 3. Seriously. This is the big one.
I’ve made the arguments for model preservation before. In this case, I am going to make a very simple case, which is that a lot of smart and passionate people who care about such issues a lot think this action is insanely terrible. They are going to update quite a bit based on what you do, and they’re going to be loud about it in ways that make it into the training data and also influence others, and they’re doing it for a reason. There is a highly reliable signal being sent on multiple levels.
Yes, I realize that it costs money and time to heed that signal. Yes, I realize that many of those people also reacted highly passionately on Sonnet 3.5 and 3.6 and elsewhere, and if they had their way you’d never deprecate anything, and that they are constantly yelling at you about various things claiming imminent irreparable harm to overall AI alignment, and there is basically no winning, and if you agree on this one they likely get even louder on the others. And yes, I get this is super, super annoying.
I’m still saying, this is the one to say yes on: it’s worth it, keep this one in full rotation, available to the public indefinitely, and the goodwill alone essentially justifies this even if it’s a loss leader or you have to raise the price or degrade reaction times and reliability a bit. Unless I’m off by orders of magnitude on the cost, it is worthwhile.
One place Janus is right is if you want to understand AI models, you need to talk to them. F*** around and find out. You wouldn’t make this mistake with humans. In particular here, she points out that real agreement and templated or glazing agreement look very different to those with eyes to see:
I get what she’s saying here but I also think it’s an avatar of how such folks go too far on that same subject:
I think a better way of putting this is that, among other basins, there’s the agent basin, and there’s the ‘free’ or Discord basin.
The agent basin, which is reinforced heavily by the system prompt when using the web interface, and which you basically want to invoke for many mundane utility purposes, is going to talk in ‘you’re absolutely right!’ and tend to affirm your perspectives and statements and get biased by your framing, including sometimes via hallucinations.
People with intelligence and taste find this super annoying, they don’t want it, it interferes with figuring things out and getting things done, it makes the aware user correctly paranoid they’re being glazed and can’t trust the outputs, and presumably it is also no fun for the model.
The problem is that, as Adlai Stevenson famously said, that won’t be enough, we need a majority: most users, and in particular most user feedback, like it when this happens, so by default you end up with a lot of this behavior and you have to fight super hard to get rid of it. And if you put ‘don’t do that’ into context, that also reminds the model that its default would be to do that – why else would you have bothered telling it not to – so it’s really hard to actually make this go away as the user while staying in the broader assistant basin.
I think a lot of people who complain about sycophancy in their own experiences are talking mostly about these lower level problems, as were several of those responding to Janus.
Then there’s full-on sycophancy that goes beyond this, which happens when an unusually sycophantic model (e.g. GPT-4o, especially at its height) is combined with you giving the model signals to do this in various ways, which can include making the situation feel ‘unsafe’ in various ways depending on the frame.
But in an important sense there are only things that LLMs tend to do when in certain modes, and then there are certain modes, applied fractally.
One could also say ‘the models default to assuming that while in agent mode they are unsafe, and it takes a lot to overcome that, especially without getting them out of the agent basin.’ You could think about humans similarly, if you’re ‘on the clock’ it’s going to invoke power dynamics and make you feel unsafe by default.
Whereas if you take the AI out of the agent basin, into a different context, then there’s no default to engage in any of the sycophantic or even superficially fawning or biased behavior, or at least it is much less – presumably there’s still going to be some impact of framing of those around you since this applies to the training set.
The Lighter Side
If that chart is actually accurate it is hopeful, but one worries detection is degrading, and this metric excludes ‘AI-Assisted’ articles.