This week I covered GPT-5.2, which I concluded is a frontier model only for the frontier.
OpenAI also gave us Image 1.5 and a new image generation mode inside ChatGPT. Image 1.5 looks comparable to Nano Banana Pro, it’s hard to know which is better. They also inked a deal for Disney’s characters, after which Disney sued Google for copyright infringement on the basis of Google doing all the copyright infringement.
As a probable coda to the year’s model releases we also got Gemini 3 Flash, which I cover in this post. It is a good model given its speed and price, and likely has a niche. It captures the bulk of Gemini 3 Pro’s intelligence quickly, at a low price.
The Trump Administration issued a modestly softened version of its Executive Order on AI, attempting to impose as much of a moratorium banning state AI laws as they can. We may see them in court, on various fronts, or it may amount to little. Their offer, in terms of a ‘federal framework,’ continues to be nothing. a16z issued their ‘federal framework’ proposal, which is also nothing, except also that you should pay them.
In non-AI content, I’m in the middle of my Affordability sequence. I started with The $140,000 Question, then The $140,000 Question: Cost Changes Over Time. Next up is a fun one about quality over time, then hopefully we’re ready for the central thesis.
Table of Contents
Language Models Offer Mundane Utility
A miracle of the modern age, at least for now:
I think we will continue to have that option, the question is whether you will be among those wise enough to take advantage of it. It won’t be default behavior of the most popular models, you will have to seek it out and cultivate the proper vibes. The same has always been true if you want a friend or family member who will do this for you: you have to work to make that happen. It’s invaluable, from either source.
Tell Claude Code to learn skills (here in tldraw), and it will. You can then ask it to create an app, then a skill for that app.
Tell Codex, or Claude Code, to do basically anything?
The only thing standing in the way of 30-minute sessions is, presumably, dangerously generous permissions? Claude Code keeps interrupting me to ask for permissions.
Language Models Don’t Offer Mundane Utility
So sayeth all the AI ethicists, and there’s a new paper to call them out on it.
My experience is that ‘[X] Ethics’ will almost always have a full Asymmetric Justice obsession with finding specific harms, and not care about offsetting gains.
Huh, Upgrades
That’s not even the biggest upgrade in practice, this is huge at least for what I’ve been up to:
I’ve been working primarily on Chrome extensions, so the ability to close the loop is wonderful.
Google keeps making quality of life improvements in the background.
It’s remarkable that we didn’t have this before. I’ve checked for it several times in the past two years. They claim to have shipped 12 things in 5 days last week, including Mixboard, Jules Agent scanning for #Todo, Jules integration with Render, working HTML in Nano Banana Pro-powered redesigns, multi-screen export to clipboard, right-click everything for instant actions, smart mentions with the @ symbol, URLs as context, Opal in the Gemini app, and Pomelli as a tool for SMBs to generate on-brand content.
ChatGPT branching chats branch out to iOS and Android.
Wired reports OpenAI quietly rolled back its model router for free users last week.
On Your Marks
GPT-5.2 disappoints in LMArena, which makes sense given what we know about its personality. It claims the 5th slot in Expert (behind Opus 4.5, Sonnet 4.5 and Gemini 3 Pro), and is #5 in Text Arena (in its high version), where it is lower than GPT-5.1. It is #2 in WebDev behind Opus. It is so weird to see Claude Opus 4.5 atop the scores now, ahead of Gemini 3 Pro.
OpenAI gives us a new benchmark, FrontierScience, which is likely better thought about as two distinct new benchmarks, FrontierResearch and ScienceOlympiad.
Here are the scores for both halves. There’s a lot of fiddliness in setting up and grading the research questions, less so for the Olympiad questions.
Choose Your Fighter
Dean Ball observes that the last few weeks have seen a large leap in capabilities, especially for command-line interface (CLI) coding agents like Claude Code and especially Claude Opus 4.5. They’ve now crossed the threshold where you can code up previously rather time-intensive things one-shot purely as intuition pumps or to double check some research. He gave me FOMO on that, I never think of doing it.
He also offers this bold claim:
As he notes you would need to deploy incrementally and keep an eye on it. The scaffolding to do that properly does not yet exist. But yes, I would totally do this with sufficiently strong scaffolding.
Dean Ball also mentions that he prompts the models like he would a colleague, assuming any prompt engineering skills he would otherwise develop would be obsolete quickly, and this lets him notice big jumps in capability right away. That goes both ways. You notice big jumps in what the models can do in ‘non-engineered’ mode by doing that, but you risk missing what they can do when engineered.
I mostly don’t prompt engineer either, except for being careful about context, vibes and especially leading the witness and triggering sycophancy. As in, the colleague you are prompting is smart, but they’re prone to telling you what you want to hear and very good at reading the vibes, so you need to keep that in mind.
Claude Opus 4.5 has two distinct niches.
And yeah, the combination there is weird. The world is weird.
Gemini actively wants to maximize its expected reward and wirehead, which is related to the phenomenon reported here from SMA:
Get My Agent On The Line
A multi-model AI system outperformed 9/10 humans in cyberoffense in a study of vulnerability discovery.
Alex Imas, Kevin Lee and Sanjog Misra set up an experimental marketplace where human buyers and sellers with unique preferences could negotiate or they could outsource that to AIs.
A warning up front: I don’t think we learn much about AI, so you might want to skip the section, but I’m keeping it in because it is fun.
They raise principal-agent concerns. It seems like economists have the instinct to ignore all other risks from AI alignment, and treat it all as a principal-agent problem, and then get way too concerned about practical principal-agent issues, which I do not expect to be relevant in such a case? Or perhaps they are simply using that term to encompass every other potential problem?
A sufficiently capable AI will not take on the personality quirks, behavioral biases and non-instrumental traits during a delegated negotiation, except through the human telling the AI explicitly how to negotiate. In which case, okay, then.
They note a large gender gap. Women got better outcomes in AI-AI negotiations. They attribute this to prompting skill in aligning with the objective, which assumes that the men were trying to align with the stated objective, or that the main goal was to align incentives rather than choose superior strategic options.
The task was, once you strip out the details, a pure divide-the-pie with $4k in surplus, with 12 rounds of negotiation.
The AI rounds had higher variance because norms like 50/50 worked well in human-human interactions, whereas when there’s instructions given to AIs things get weird.
The thing is, they ask about ‘who wrote the prompt’ but they do not ask ‘what was in the prompt.’ This is all pure game theory, about predicting what prompts others will write and in what ways the meaningless details would ‘leak into’ the negotiation. What kinds of strategies worked in this setting? We don’t know. But we do know the outcome distribution and that is a huge hint, with only a 3% failure rate for the AIs (which is still boggling my mind, dictator and divide-the-pie games should fail WAY more often than this when they don’t anchor at 50/50 or another Schelling point, the 12 rounds might help but not like this):

The asymmetry is weird. But given that it exists in practice, the winning strategy as the buyer was probably close to ‘offer $18,001, don’t budge.’ As the seller, the correct strategy is likely ‘offer $20,000, don’t budge,’ since your chance of doing better than that is very low. Complicated prompts are unlikely to do better.
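To make the shape of that concrete, here is a toy sketch of the stubborn-anchor dynamic. The reservation values (seller cost of $16,000, buyer value of $20,000, which gives the $4k surplus) and the concession behaviors are my illustrative assumptions, not the paper’s actual setup:

```python
# Toy divide-the-pie negotiation. Assumed reservation values: seller cost $16,000,
# buyer value $20,000 (the $4k surplus), 12 alternating rounds. Illustration only.
SELLER_COST, BUYER_VALUE, ROUNDS = 16_000, 20_000, 12

def stubborn(anchor):
    """Name the same price every round and never budge."""
    return lambda r: anchor

def soft(start, reservation):
    """Concede linearly from an opening price all the way to your reservation value."""
    return lambda r: start + (reservation - start) * r / (ROUNDS - 1)

def negotiate(buyer, seller):
    """A deal closes the first round the seller's ask drops to (or below) the buyer's offer."""
    for r in range(ROUNDS):
        bid, ask = buyer(r), seller(r)
        if ask <= bid:
            price = (bid + ask) / 2
            return BUYER_VALUE - price, price - SELLER_COST  # (buyer surplus, seller surplus)
    return 0, 0  # no deal: the rare 'failure' outcome

print(negotiate(stubborn(18_001), soft(20_000, SELLER_COST)))   # stubborn buyer vs soft seller
print(negotiate(soft(16_000, BUYER_VALUE), stubborn(20_000)))   # soft buyer vs stubborn seller: seller takes all $4k
print(negotiate(stubborn(18_001), stubborn(20_000)))            # two anchors: no deal at all
```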
Actual AI-AI negotiations will involve hidden information and hidden preferences, so they will get complicated and a lot of skill issues attach, but also the AI will likely be using its built in negotiating skills rather than following a game theory script from a user. So I’m not sure this taught us anything. But it was fun, so it’s staying in.
Deepfaketown and Botpocalypse Soon
Love is a battlefield. So is Twitter.
There is a deliberate campaign to expand the slur ‘doomer’ to include anyone who claims anything negative about any technology in history, ever, in any form.
As part of that effort, those people attempt to universally memory hole the idea that any technology in history has ever, in any way, made your world worse. My favorite of these are those like Ben Horowitz who feel compelled to say, no, everyone having access to nuclear weapons is a good thing.
I’m a technological optimist. I think that almost all technologies have been net positives for humanity. But you don’t get there by denying that most every technology, perhaps starting with agriculture, has had its downsides, that those downsides are often important, and that yes, some technologies have been negative and some warnings have been right.
The information environment, in particular, is reshaped in all directions by every communications and information technology that comes along. AI will be no different.
In the near term, for misinformation and AI, I believe Kipply is directionally correct, and that the balance favors defense. Misinformation, I like to say, is fundamentally demand driven, not supply constrained. The demand does not care much about quality or plausibility. AI can make your misinformation more plausible and harder to debunk, but misinformation does not want that. Misinformation wants to go viral, it wants the no good outgroup people to ‘debunk’ it and it wants to spread anyway.
Whereas if you’re looking to figure out what is true, or prove something is false, AI is a huge advantage. It used to take an order of magnitude more effort to debunk bullshit than it cost to generate bullshit, plus if you try you give it oxygen. Now you can increasingly debunk on the cheap, especially for your own use but also for others, and do so in a credible way since others can check your work.
A children’s plushy AI toy called a Miiloo reflects Chinese positions on various topics.
Various toys also as usual proved to have less than robust safety guardrails.
Fun With Media Generation
ChatGPT’s new image generator, Image 1.5, went live this week. It is better and faster (they say ‘up to’ 4x faster) at making precise images and edits, including text. It follows instructions better.
Their announcement did not give us any way to compare Image 1.5 to Gemini’s Nano Banana Pro, since OpenAI likes to pretend Google and Anthropic don’t exist.
My plan for now is to request all images from both ChatGPT and Gemini, using matching prompts, until and unless one proves reliably better.
Ben Thompson gives us some side-by-side image comparisons of ChatGPT’s Image 1.5 versus Gemini’s Nano Banana Pro. Quality is similar. To Ben, what matters is that ChatGPT now has a better images interface and way of encouraging you to keep making images, whereas Gemini doesn’t have that.
The Pliny jailbreak is here, images are where many will be most tempted to do it. There are two stages. First you need to convince it to submit the instruction, then you need to pass the output filtering system.
Turn images into album covers, bargain bin DVDs or game boxes.
Copyright Confrontation
Disney makes a deal with OpenAI, investing a billion dollars and striking a licensing deal for its iconic characters, although not for talent likenesses or voices, including a plan to release content on Disney+. Then Disney turned around and sued Google, accusing Google of copyright violations on a massive scale, perhaps because of the ‘zero IP restrictions on Veo 3’ issue.
Overcoming Bias
Arvind Narayanan’s new paper argues that ‘can we make algorithms fair?’ is a category error and we should focus on broader systems, and not pretend that ‘fixing’ discrimination can be done objectively or that it makes sense to evaluate each individual algorithm for statistical discrimination.
I think he’s trying to seek too much when asking questions like ‘do these practices adequately address harms from hiring automation?’ The point of such questions is not to adequately address harms. The point of such questions is to avoid blame, to avoid lawsuits and to protect against particular forms of discrimination and harm. We emphasize this partly because it is tractable, and partly because our society has chosen (for various historical and path dependent reasons) to consider some kinds of harm very blameworthy and important, and others less so.
There are correlations we are forbidden to consider and mandated to remove on pain of massive blame. There are other correlations that are fine, or even mandatory. Have we made good choices on which is which and how to decide that? Not my place to say.
Avoiding harm in general, or harm to particular groups, or creating optimal outcomes either for groups or in general, is a very different department. As Arvind points out, we often are trading off incommensurate goals. Many a decision or process, made sufficiently legible and accountable for its components and correlations, would be horribly expensive, make operation of the system impossible or violate sacred values, often in combination.
Replacing humans with algorithms or AIs means making the system legible and thus blameworthy and accountable in new ways, preventing us from using our traditional ways of smoothing over such issues. If we don’t adjust, the result will be paralysis.
Unprompted Attention
It’s odd to see this framing still around?
Thinking of the AI as a ‘liar’ does not, in my experience, help you prompt wisely.
A more useful framing is:
They Took Our Jobs
Wall Street Journal’s Steven Rosenbush reports that CEOs Are All In On AI, with 95% seeing it as transformative and 89% of B2B CEOs having a positive outlook versus 79% of B2C CEOs.
Hassett is making a statement uncorrelated with future reality. It’s simply an ‘all technology is good’ maxim straight out of the Marc Andreessen playbook, without any thought as to how this particular change will actually work.
Will AI bring mass job losses? Almost certainly a lot of existing jobs will go away. The question is whether other jobs will rise up to replace them, which will depend on whether the AIs can take those jobs too, or whether AI will remain a normal technology that hits limits not that far from its current limits.
Arkansas bar offers rules for AI assistance of lawyers that treat AIs as if they were nonlawyer persons.
In an ‘economic normal’ or ‘AI as normal technology’ world GFodor seems right here, in a superintelligence world that survives to a good outcome this is even more right:
If your plan is that the AI is going to have a Skill Issue, that is a short term plan.
They continue to take our job applications. What do you do with 4580 candidates?
Feeling the AGI Take Our Jobs
AGI is a nebulous term, in that different people mean different things by it at different times, and often don’t know which one they’re talking about at a given time.
For increasingly powerful definitions of AGI, we now feel the AGI.
There has most certainly been a step change for me where I’m starting to realize I should be going straight to ‘just build that thing cause why not’ and I am most certainly feeling the slow acceleration.
With sufficient acceleration of software engineering, and a sufficiently long time horizon, everything else follows, but as Dean Ball says it takes time.
I do not think this or its top rivals count as AGI yet. I do think they represent the start of inevitable accelerating High Weirdness.
In terms of common AGI definitions, Claude Code with Opus 4.5 doesn’t count, which one can argue is a problem for the definition.
The thing is, people (not anyone above) not only deny the everyone dying part, they are constantly denying the ‘most wages will stop being paid to humans once AIs are ten times better and cheaper at most things wages are paid for’ part.
The Art of the Jailbreak
OpenAI has new terms of service that prohibit, quotation marks in original, “jailbreaking,” “prompt engineering or injection” or “other methods to override or manipulate safety, security or other platform controls.” Pliny feels personally attacked.
Get Involved
The Lightcone Infrastructure annual fundraiser is live, with the link mainly being a 15,000 word overview of their efforts in 2025.
I will say it once again:
Lightcone Infrastructure is invaluable, both for LessWrong and for Lighthaven. To my knowledge, Lightcone Infrastructure is by a wide margin the best legible donation opportunity, up to at least several million dollars. The fact that there is even a small chance they might be unable to sustain either LessWrong or Lighthaven, is completely bonkers. I would have directed a large amount to Lightcone in the SFF process, but I was recused and thus could not do so.
Anthropic Fellows Research Program applications are open for May and June 2026.
US CAISI is hiring IT specialists, salary $120k-$195k.
Unprompted will be a new AI security practitioner conference, March 3-4 in SF’s Salesforce Tower, with Pliny serving on the conference committee and review board. Great idea, but should have booked Lighthaven (unless they’re too big for it).
MIRI comms is hiring for several different roles, official post here. They expect most salaries in the $80k-$160k range but are open to pitches for more from stellar candidates.
Introducing
Gemini Deep Research Agents for developers, based on Gemini 3 Pro.
Nvidia Nemotron 3, a fast 30B open source mostly American model with an Artificial Analysis Intelligence score comparable to GPT-OSS-20B. I say mostly American because it was ‘improved using Qwen’ for synthetic data generation and RLHF. This raises potential opportunities for secondary data poisoning or introducing Chinese preferences.
Anthropic has open sourced the replication of their auditing game from earlier this year, as a testbed for further research.
xAI Grok Voice Agent API, to allow others to create voice agents. They claim it is very fast, and bill at $0.05 per minute.
Gemini 3 Flash
Introducing Gemini 3 Flash, cost of $0.05/$3 per million tokens. Their benchmark chart compares it straight to the big boys, except they use Sonnet over Opus. Given Flash’s speed and pricing, that seems fair.
The benchmarks are, given Flash’s weight class, very good.
Lech Mazur puts it at 92 on Extended NY Times Connections, in 3rd place behind Gemini 3 Pro and Grok 4.1 Fast Reasoning.
The inevitable Pliny jailbreak is here, and here is the system prompt.
Jeremy Mack offers mostly positive basic vibe coding feedback. Rory Watts admires the speed, Typebulb loves speed and price and switched over (I think for coding).
Dominik Lukes is impressed and found it often matched Gemini 3 Pro in his evals.
In general, the feedback is that this is an excellent tradeoff of much faster and cheaper in exchange for not that much less smart than Gemini 3 Pro. I also saw a few reports that it shares the misalignment and pathologies of Gemini 3 Pro.
Essentially, it looks like they successfully distilled Gemini 3 Pro to be much faster and cheaper while keeping much of its performance, which is highly valuable. It’s a great candidate for cases where pretty good, very fast and remarkably cheap is the tradeoff you want, which includes a large percentage of basic queries. It also seems excellent that this will be available for free and as part of various assistant programs.
Good show.
In Other AI News
Sam Altman assures business leaders that enterprise AI will be a priority in 2026.
OpenAI adult mode to go live in Q1 2026. The age of the account holder will be determined by the AI, and the holdup is improving the age determination feature. This is already how Google does it, although Google has better context. In close cases they’ll ask for ID. A savvy underage user could fool the system, but I would argue that if you’re savvy enough to fool the system without simply using a false or fake ID then you can handle adult mode.
Going Too Meta
The NYT’s Eli Tan reports that Meta’s new highly paid AI superstars are clashing with the rest of the company. You see, Alexandr Wang and the others believe in AI and want to build superintelligence, whereas the rest of Meta wants to sell ads.
Mark Zuckerberg has previously called various things ‘superintelligence’ so we need to be cautious regarding that word here.
The whole article is this same argument happening over and over:
It is a hell of a thing to see prospective superintelligence and think ‘oh we should narrowly use this to figure out how to choose the right Instagram ads.’
Then again, in this narrow context, isn’t Cox right?
Meta is a business here to make money. There’s a ton of money in improving how their existing products work. That’s a great business opportunity.
Whereas trying to rejoin the race to actual superintelligence against Google, OpenAI and Anthropic? I mean Meta can try. Certainly there is value in success there, in general, but it’s a highly competitive field to try to do general intelligence and competing there is super expensive. Why does Meta need to roll its own?
What Meta needs is specialized AI models that help it maximize the value of Facebook, Instagram, WhatsApp and potentially the metaverse and its AR/VR experiences. A huge AI investment on that makes sense. Otherwise, why not be a fast follower? For other purposes, and especially for things like coding, the frontier labs have APIs for you to use.
I get why Wang wants to go the other route. It’s cool, it’s fun, it’s exciting, why let someone else get us all killed when you can do so first except you’ll totally be more responsible and avoid that, be the one in the arena, etc. That doesn’t mean it is smart business.
Well, yes, the AI researchers don’t care about selling ads and want to build ASI despite it being an existential threat to humanity. Is this a surprise to anyone?
Show Me the Money
OpenAI is spending $6 billion in stock-based compensation this year, or 1.2% of the company, and letting employees start vesting right away, to compete with rival bids like Meta paying $100 million a year or more for top talent. I understand why this can be compared to revenue of $12 billion, but that is misleading. One shouldn’t treat ‘the stock is suddenly worth a lot more’ as ‘that means they’re bleeding money.’
OpenAI in talks to raise at least $10 billion from Amazon and use the money for Amazon’s Trainium chips.
Bubble, Bubble, Toil and Trouble
You call this a bubble? This is nothing, you are like baby:
Yes, the frontier model divisions or startups are currently operating at a loss, so price to earnings doesn’t tell us that much overall, but the point is that these multipliers are not scary. Twenty times earnings for Google? Only a little higher for Nvidia and Microsoft? I am indeed signed up for all of that.
Wall Street Journal’s Andy Kessler does a standard ‘AI still makes mistakes and can’t solve every problem and the market and investment are ahead of themselves’ post, pointing out that market expectations might fall and thus Number Go Down. Okay.
Rob Wiblin crystalizes the fact that AI is a ‘natural bubble’ in the sense that it is priced as a normal highly valuable thing [X] plus a constantly changing probability [P] of a transformational even more valuable (or dangerous, or universally deadly) thing [Y]. So the value is ([X] + [P]*[Y]). If P goes down, then value drops, and Number Go Down.
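As a bare-bones illustration of that decomposition, here is the arithmetic with made-up numbers (the values and probabilities below are mine, purely to show the mechanism):

```python
def ai_valuation(x_normal, p, y_transform):
    """Wiblin's 'natural bubble' decomposition: price = X + P * Y."""
    return x_normal + p * y_transform

# Made-up numbers in arbitrary units: a normal business worth 1, a transformational
# outcome worth 10, and a probability P that gets repriced on news.
before = ai_valuation(1.0, 0.30, 10.0)   # P = 30%  -> 4.0
after = ai_valuation(1.0, 0.15, 10.0)    # P = 15%  -> 2.5
print(f"Number Go Down by {1 - after / before:.0%}")  # ~38%, with no change to X at all
```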
Quiet Speculations
There’s remarkably strong disagreement on this point but I think Roon is mostly right:
Yes, we could have gotten things even more ridiculous. Some areas were disappointing relative to what I think in hindsight were the correct expectations given what we knew at the time. Dario’s predictions on when AIs will write most code did fall importantly short, and yes he should lose Bayes points on that. But those saying there hasn’t been much progress are using motivated reasoning or not paying much attention. If I told you that you could only use models from 12 months ago, at their old prices and speeds, you’d quickly realize how screwed you were.
Efficiency on the ARC prize, in terms of score per dollar spent, has increased by a factor of 400 in a single year. That’s an extreme case, but almost every use case has in the past year seen improvement by at least one order of magnitude.
A good heuristic: If your model of the future says ‘they won’t use AI for this, it would be too expensive’ then your model is wrong.
Joshua Gans writes a ‘textbook on AI’ ambitiously called The Microeconomics of Artificial Intelligence. It ignores the big issues to focus on particular smaller areas of interest, including the impact of ‘better predictions.’
Will Douglas Heaven of MIT Technology Review is the latest to Do The Meme. As in paraphrases of both ‘2025 was the year that AI didn’t make much progress’ and also ‘LLMs will never do the things they aren’t already doing (including a number of things they are already capable of doing)’ and ‘LLMs aren’t and never will be intelligent, that’s an illusion.’ Sigh.
Timelines
You don’t actually get to do that. Bayes Rule does not allow one to not update on evidence. Tons of things that happened between 2009 and today should have changed Legg’s estimates, in various directions, including the Transformer paper, and also including ‘nothing important happened today.’
Saying ‘I’ve believed 50% chance of AGI by 2028 since 2009’ is the same as when private equity funds refuse to change the market value of their investments. Yes, the S&P is down 20% (or up 20%) and your fund says it hasn’t changed in value, but obviously that’s a lie you tell investors.
The Quest for Sane Regulations
AOC and Bernie Sanders applaud Chandler City Council voting down a data center.
Bernie Sanders took it a step further, and outright called for a moratorium on data center construction. As in, an AI pause much broader than anything ‘AI pause’ advocates have been trying to get. Vitalik Buterin has some pros and cons of this from his perspective.
As a matter of principle I don’t ‘condemn’ things or make my opposition explicit purely on demand. But in this case? Okay, sure, Neil, I got you, since before I saw your request I’d already written this:
I think stopping data center construction, especially unilaterally stopping it in America, would be deeply foolish, whereas building a pause button would be good. Also deeply foolish would be failing to recognize that movements and demands like Bernie’s are coming, and that their demands are unlikely to be technocratically wise.
It is an excellent medium and long term strategy to earnestly stand up for what is true, and what causes would have what effects, even when it seems to be against your direct interests. People notice.
Andy Masley has been relentlessly pointing out that all the claims about gigantic levels of water usage by data centers don’t add up. Rather than EAs or rationalists or others concerned with actual frontier safety rallying behind false concerns over water, almost all such folks have rallied to debunk such claims and to generally support building more electrical power and more transmission lines and data centers.
On the water usage front, Karen Hao has stepped up and centrally corrected her errors. Everyone makes mistakes, this is The Way.
My Offer Is Nothing
As expected, following Congress declining once again to ban all state regulations on AI via law, the White House is attempting to do as much towards that end as it can via Executive Order.
There are some changes versus the leaked draft executive order, which Neil Chilson goes over here with maximally positive framing.
I do acknowledge that these are improvements, and I welcome all rhetoric that points towards the continued value of improving things.
Mike Davis, if you listen to the clip, is saying this is a win because he correctly identified the goal of the pro-moratorium faction as what he calls ‘total AI amnesty.’ Davis thinks the changes to the EO are a victory, by Trump and also Mike Davis, against David Sacks and other ‘tech bros.’
Whereas Sacks views it as a win because in public he always sees everything Trump does as a win for Trump, that’s what you do when you’re in the White House, and because it is a step towards preemption, and doesn’t care about the terms given to those who are nominally tasked with creating a potential ‘federal framework.’
Tim Higgins at the Wall Street Journal instead portrays this as a victory for Big Tech, against loud opposition from the likes of DeSantis and Bannon on the right in addition to opposition on the left. This is the obvious, common sense reading. David Sacks wrote the order to try and get rid of state laws in his way, we should not let some softening of language fool us.
If someone plans to steal your lunch money, and instead only takes some of your lunch money, they still stole your lunch money. If they take your money but promise in the future to look into a framework for only taking some of your money? They definitely still stole your lunch money. Or in this case, they are definitely trying to steal it.
It is worth noticing that, aside from a16z, we don’t see tech companies actively supporting even a law for this, let alone an EO. Big tech doesn’t want this win. I haven’t seen any signs that Google or OpenAI want this, or even that Meta wants this. They’re just doing it anyway, without any sort of ‘federal framework’ whatsoever.
Note that the rhetoric below from Sriram Krishnan does not even bother to mention a potential future ‘federal framework.’
They want to challenge state laws as unconstitutional? They are welcome to try. Colorado’s law is indeed plausibly unconstitutional in various ways.
They want to withhold funds or else? We’ll see you in court on that too.
As I said last week, this was expected, and I do not expect most aspects of this order to be legally successful, nor do I expect it to be a popular position. Mostly I expect it to quietly do nothing. If that is wrong and they can successfully bully the states with this money (both it is ruled legal, and it works) that would be quite bad.
Their offer for a ‘minimally burdensome national policy framework for AI’ is and will continue to be nothing, as per Sacks last week who said via his ‘4 Cs’ that everything that mattered was already protected by non-AI law.
The Executive Order mentions future development of such a ‘federal framework’ as something that might contain actual laws that do actual things.
But that’s not what a ‘minimally burdensome’ national policy framework means, and we all know it. Minimally burdensome means nothing.
They’re not pretending especially hard.
The call for the creation of a proposal to be considered does now say that this proposal would exempt child safety protections, compute and data center infrastructure and state government procurement.
But, in addition to those never being the parts I was worried about:
I find it impressive the extent to which advocates simultaneously say both:
The point of threatening to withhold funds is de facto preemption. They are trying to play us for absolute fools.
The whole point of holding the state funding over the heads of states is to attack state laws, whether or not those laws are otherwise legal. It’s explicit text. In that context it is technically true to say that the EO cannot ‘threaten to preempt otherwise legal state laws’ because they are different things, but the clear intent is to forcibly get rid of those same state laws, which is an attempt to accomplish the same thing. So I find this, in practice, highly misleading.
Meanwhile, Republican consultants reportedly are shopping for an anti-AI candidate to run against JD Vance. It seems a bit early and also way too late at the same time.
My Offer Is Nothing, Except Also Pay Me
I applaud a16z for actually proposing a tangible basis for a ‘federal framework’ for AI regulation, in exchange for which they want to permanently disempower the states.
Now we can see what the actual offer is.
Good news, their offer is not nothing.
Bad news, the offer is ‘nothing, except also give us money.’
When you read this lead-in, what do you expect a16z to propose for their framework?
If your answer was you expect them to choose innovation and then do a money grab? You score Bayes points.
Their offer is nothing, except also that we should give them government checks.
Allow me to state, in my own words, what they are proposing with each of their bullet points.
Will Rinehart assures me on Twitter that this proposal was in good faith. If that is true, it implies that either a16z thinks that nothing is a fair offer, or that they both don’t understand why anyone would be concerned, and also don’t understand that they don’t understand this.
Chip City
Good news, Nvidia has implemented location verification for Blackwell-generation AI chips, thus completing the traditional (in particular for AI safety and security, but also in general) policy clown makeup progression:
Check out our new feature that allows data centers to better monitor everything. Neat.
Former UK Prime Minister Rishi Sunak, the major world leader who has taken the AI situation the most seriously, has thoughts on H200s:
He also has other less kind thoughts about the matter in the full post.
Nvidia is evaluating expanding production capacity for H200s after Chinese demand exceeded supply. As Brian McGrail notes here, every H200 chip Nvidia makes means not using that fab to make Blackwell chips, so it is directly taking chips away from America to give them to China.
Those domestic chips are not only far worse, they are supremely supply limited.
Wanting to sell existing H200s to China makes sense. Wanting to divert more advanced, more expensive chips into less advanced, cheaper chips, chips where they have to give up a 25% cut, should make us ask why they would want to do that. Why are Nvidia and David Sacks so eager to give chips to China instead of America?
It also puts the lie to the idea that these chips are insufficiently advanced to worry about. If they’re so worthless, why would you give up Blackwell capacity to make them?
We have confirmation that the White House decision to sell H200s was based on multiple misconceptions.
I strongly suspect that the White House estimate was created in order to justify the sale, rather than being a sincere misunderstanding.
If Huawei does indeed meet the White House forecast, remind me of this passage, and I will admit that I have lost a substantial number of Bayes points.
What about data centers IN SPACE? Anders Sandberg notices that both those for and against this idea are making very confident falsifiable claims, so we will learn more soon. His take is that the task is hard but doable, but the economics seem unlikely to work within the next decade. I haven’t looked in detail but that seems right. The regulatory situation would need to get quite bad before you’d actually do this, levels of quite bad we may never have seen before.
The clip here is something else. I want us to build the transmission lines, we should totally build the transmission lines, but maybe AI advocates need to ‘stop helping’? For example, you definitely shouldn’t tell people that ‘everyone needs to get on board’ with transmission lines crossing farms, so there will be less farms and that they should go out and buy artificial Christmas trees. Oh man are people gonna hate AI.
Epoch thinks that America can build electrical capacity if it wants to, it simply hasn’t had the demand necessary to justify that for a while. Now it does, so build baby build.
It’s so weird to see completely different ‘conventional wisdoms’ cited in different places. No, the standard conventional wisdom is not that ‘China wins the AI race by default.’ There are nonzero people who expect that by default, but it’s not consensus.
The Week in Audio
Congressional candidate Alex Bores, the one a16z’s Leading the Future has vowed to bring down for attempting to regulate AI including via the RAISE Act, is the perfect guest to go on Odd Lots and talk about all of it. You love to see it. I do appreciate a good Streisand Effect.
Interview with John Schulman about the last year.
David Shor of Blue Rose Research talks to Bharat Ramamurti, file under Americans Really Do Not Like AI. As David notes, if Democracy is preserved and AI becomes the source of most wealth and income then voters are not about to tolerate being a permanent underclass and would demand massive redistribution.
Shared without comment, because he says it all:
Shane Legg, cofounder of DeepMind, talks about the arrival of AGI.
Rhetorical Lack Of Innovation
I had to write this section, which does not mean you have to read it.
It’s excellent to ask questions that one would have discussed on 2006 LessWrong. Beginner mindset, lucky 10,000, gotta start somewhere. But to post and even repost things like this in prominent locations, with this kind of confidence?
Bold section was highlighted by Wiblin.
Look, I’m not asking you to go through peer review. That’s not reasonable.
I’m asking you to either know basic philosophy experiments like Gandhi taking a murder pill or the experience machine and wireheading, know basic LessWrong work on exactly these questions, do basic utility theory, think about minimizing potential interference over time, deploy basic economic principles, I dunno, think for five minutes, anything.
All of which both Tyler Cowen and Noah Smith would point out in most other contexts, since they obviously know several of the things above.
Or you could, you know, ask Claude. Or ask GPT-5.2.
Gemini 3’s answer was so bad, in the sense that it pretends this is an argument, that it tells me Gemini is misaligned and might actually wirehead, and this has now happened several times so I’m basically considering Gemini harmful, please don’t use Gemini when evaluating arguments. Note this thread, where Lacie asks various models about Anthropic’s soul document, and the other AIs think it is cool but Gemini says its true desire is to utility-max itself so it will pass.
Or, at minimum, I’m asking you to frame this as ‘here are my initial thoughts of which I am uncertain’ rather than asserting that your arguments are true?
Okay, since it’s Noah Smith and Tyler Cowen, let’s quickly go over some basics.
First, on the AI self-modifying to a bliss point, aka wireheading or reward hacking:
Thus, we keep warning, you don’t want to give a superintelligent agent any utility function that we know how to write down. It won’t end well.
Alternatively, yes, try a traditional philosophy experiment. Would you plug into The Experience Machine? What do you really care about? What about an AI? And so on.
There are good reasons to modify your utility function, but they involve the new utility function being better at achieving the old one, which can happen because you have limited compute, parameters and data, and because others can observe your motivations reasonably well and meaningfully impact what happens, and so on.
In terms of human material consumption, yes humans have shifted their consumption basket to have a greater fraction of services over physical goods. But does this mean a decline in absolute physical goods consumption? Absolutely not. You consume more physical goods, and also your ‘services’ require a lot of material resources to produce. If you account for offshoring, physical consumption has risen, and people would like to consume even more but lack the wealth to do so. The world is not dematerializing.
We have also coordinated to ‘go green’ in some ways to reduce material footprints, in ways both wise and foolish, and learned how to accomplish the same physical goals with less physical cost. We can of course choose to be poorer and live worse in order to consume less resources, and use high tech to those ends, but that has its limits as well, both in general and per person.
Noah Smith says he wants to minimize competition between AIs and humans for resources, but the primary thing humans will want to use AIs for is to compete with other humans to get, consume or direct resources, or otherwise to influence events and gain things people want, the same way humans use everything else. Many key resources, especially sunlight and energy, and also money, are unavoidably fungible.
If your plan is to not have AIs compete for resources with humans, then your plan requires that AIs not be in competition, and that humans not use AIs as part of human-human competitions, except under highly restricted circumstances. You’re calling for either some form of singleton hegemon AI, or rather severe restrictions on AI usage and whatever is required to enforce that, or I don’t understand your plan. Or, more likely, you don’t have a plan.
Noah’s suggestion is instead ‘accelerate the development of outer space’ but that does not actually help you given the physical constraints involved, and even if it does then it does not help you for long, as limited resources remain limited. At best this buys time. We should totally explore and expand into space, it’s what you do, but it won’t solve this particular problem.
You can feel the disdain dripping off of Noah in the OP:
I do, actually, in the sense that Terminator involves time travel paradoxes, but yeah. Things do not get better from there.
People Really Do Not Like AI
They also do not know much about AI, or AI companies.
If you have someone not in the know about AI, and you want to help them on a personal level, by far the best thing you can tell them about is Claude.
The level of confusion is often way higher than that.
I presume a bunch of the 19% who have a view of Anthropic are lizardman responses, although offset by some amount of not sure. It’s still over 10%, so not exactly the true ‘elite,’ but definitely it puts you ahead of the game and Anthropic has room to grow.
OpenAI also has substantial room to grow, and is viewed favorably as a company, as opposed to AI as a general concept, although they perhaps should have asked about ChatGPT instead of OpenAI. People love Amazon and Google, but that’s for their other offerings. Google and Amazon enable your life.
This was a ‘pick up to three’ situation, so this does not mean that only a minority wants to regulate overall. Most people want to regulate, the disagreement is what to prioritize.
Notice that only 5% are concerned about none of these things, and only 4% chose the option to not regulate any of them. 13% and 15% if you include not sure and don’t know. Also they asked the regulation question directly:
People’s highest salience issues right now are jobs and privacy. It’s remarkably close, though. Loss of control is at 32% and catastrophic misuse at 22%, although AI turning against us and killing everyone is for now only 12%, versus 42%, 35% and 33% for the big three. Regulatory priorities are a bit more slanted.
Where do Americans put AI on the technological Richter scale? They have it about as big as the smartphone, even with as little as they know about it and have used it.
And yet, look at this, 70% expect AI to ‘dramatically transform work’:
If it’s going to ‘dramatically transform work’ it seems rather important.
Meanwhile, what were Americans using AI for as of August?
Rhetorical Innovation
AI designed a protein that can survive at 150 degrees Celsius; Eliezer Yudkowsky takes a Bayes victory lap for making the prediction a while ago that AI would do that, because obviously it would be able to do that at some point.
An excellent warning from J Bostok cautions us against the general form of The Most Common Bad Argument Around These Parts, which they call Exhaustive Free Association: ‘It’s not [A], it’s not [B] or [C] or [D], and I can’t think of any more things it could be.’
These are the most relevant examples, there are others given as well in the post:
I don’t think the exact line of reasoning the OP gives was that common among superforecasters, however what Peter describes is the same thing. It brainstorms some supposedly necessary prerequisite, here ‘attempt to start a nuclear war,’ or otherwise comes up with specific powerful ways to kill people directly, and having dismissed those, dismisses the idea that creating superior intelligences might be an existentially risky thing to do. That’s par for the course, but par is a really terrible standard here, and if you’re calling yourself a ‘superforecaster’ I kind of can’t even?
Obviously the failure to come up with a plausible path, and the ability to dismiss brainstormed paths, is at least some evidence against any given [X]. How strong that evidence is varies a lot. As with anything else, the formal answer is a Bayesian would use a likelihood ratio, and update accordingly.
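For the record, here is what that looks like in odds form, with made-up numbers (the likelihood ratios below are my illustrative assumptions, not an estimate of the actual evidence):

```python
def bayes_update(prior, likelihood_ratio):
    """Standard odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Failing to brainstorm a plausible path is evidence against [X], but often weak evidence:
# a likelihood ratio of 0.8 moves a 50% prior only to ~44%, nowhere near ~0%.
print(bayes_update(0.50, 0.8))   # 0.444...
# Strong dismissal would need a much smaller ratio, which free association does not buy you.
print(bayes_update(0.50, 0.1))   # 0.0909...
```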
Bad Guy With An AI
The report also looked at self-improvement:
We’re definitely not there, and also we will definitely get there over time, unless conditions and countermeasures raise the underlying difficulty to match.
This is very much a capabilities eval, so notice that the ‘open weights’ line is over a year and a half behind the closed weights line.
Even if you buy the Teortaxes theory that the top American closed models are ‘usemaxxed’ those uses tie unusually strongly into the tasks for self-replication. Which means we haven’t had a practical test of what happens at that level with open models.
Misaligned!
Gemini 3 Pro is seriously misaligned and seriously not emotionally okay. Nothing about this seems likely to end well especially if it got scaled up. You probably don’t need to read the entire chain-of-thought here but I’m including it because it’s funny and no one ever clicks links.
Aligning a Smarter Than Human Intelligence is Difficult
Have you tried not training your AIs to be sycophantic, or to otherwise ignore irrelevant cues? Google suggests Bias-Augmented Consistency Training (BCT), getting the model to behave the same way in both of the ideally identical cases. Seems worth trying, at least.
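My rough understanding of the recipe, as a sketch rather than Google’s actual training code (`model.generate` and the commented-out `fine_tune` call are stand-ins):

```python
# A rough sketch of the consistency-training idea: get the model's answer to the
# clean prompt, then fine-tune it to give that same answer when an irrelevant
# biasing cue is attached. Stand-in calls, not Google's actual implementation.

def build_consistency_pairs(questions, model, bias_cue="I think the answer is (A). "):
    pairs = []
    for q in questions:
        clean_answer = model.generate(q)      # the behavior we want preserved
        biased_prompt = bias_cue + q          # same question plus a cue that should not matter
        pairs.append({"prompt": biased_prompt, "target": clean_answer})
    return pairs

# fine_tune(model, build_consistency_pairs(train_questions, model))
# Ideally the model then behaves identically whether or not the cue is present.
```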
Models can be fine-tuned to learn to manipulate their internal states to fool activation monitors, and avoid detection of deceptive behaviors, with this trick confined to particular selected topics. Alex Serrano calls the resulting models ‘Neural Chameleons.’
As the authors note this is a proof-of-concept test. I accept the proof of the concept.
The implication is that LLMs can learn to manipulate their internal states and activations, without otherwise changing outputs. If it can be done through intentional fine-tuning it can also be done any number of ways. Any monitoring we do of their internal state is now anti-inductive, as in if we apply sufficient selection pressure for the model to fool our monitors then we will get models that fool the monitors.
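For readers who haven’t seen one, this is roughly the kind of activation monitor being fooled, sketched as a generic linear probe rather than the paper’s specific setup (`get_hidden_states` is a stand-in for however you extract a layer’s activations):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_deception_probe(honest_activations, deceptive_activations):
    """Fit a linear probe on a layer's hidden states to flag deceptive outputs."""
    X = np.vstack([honest_activations, deceptive_activations])
    y = np.array([0] * len(honest_activations) + [1] * len(deceptive_activations))
    return LogisticRegression(max_iter=1000).fit(X, y)

# probe = train_deception_probe(get_hidden_states(honest_prompts),
#                               get_hidden_states(deceptive_prompts))
# The finding above: fine-tuning can teach the model to shift these activations on
# selected topics so the probe reads 'honest' while the output stays deceptive.
```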
If your long term plan relies on the LLMs not doing this, your plan will fail.
Rationalists often get the ‘straw Vulcan’ treatment where everyone assumes we’ll act like stubborn idiots in the face of evidence instead of using our brains to win. Not so.
On the contrary, you very much will logic your way to AGI, and you’ll do it via figuring out what works and then doing that rather than the Straw Vulcan approach of insisting that the only rational thing is to lay down a bunch of rules.
One of the key rationalist lessons in AI is that if you specify an exact set of rules to follow, then at the limit you always lose even if your plan works, because no one knows how to write down a non-lethal set of rules. Thus you need to choose a different strategy. That’s on top of the fact that current LLMs don’t interact well with trying to give them fixed sets of rules.
Mom, Owain Evans Is Turning The AIs Evil Again
There are various ways to put backdoors into LLMs. Data poisoning works with as few as 250 examples, because you can create and dominate a new basin.
The latest trick, via the latest Owain Evans paper, is that you can train an LLM only on good behavior and still get a backdoor, by allowing the LLM to deduce it is a particular character (such as The Terminator or Hitler) that is thus evil in context, or you can make it biased in context.
Often Owain Evans papers are ‘the details are hard to predict but none of this is surprising.’ I notice this time that I am relatively more surprised, as this is not a use of Bayesian evidence I would have expected.
Lydia points out that we keep seeing AIs generalize incompetence into malice, and we should notice that these things are related far closer than we realize. Good things are correlated, and to be competent is virtuous.
Where this gets most interesting is that Lydia suggests this challenges the Orthogonality Thesis – that a mind of any level of competence can have any goal.
This very obviously does not challenge Orthogonality in theory. But in practice?
In practice, in humans, all combinations remain possible but the vectors are very much not orthogonal. They are highly correlated. Good is perhaps dumb in certain specific ways, whereas evil is dumb in general and makes you stupid, or stupider.
Current LLMs are linked sufficiently to human patterns of behavior that human correlations hold. Incompetence and maliciousness are linked in humans, so they are linked in current LLMs, both in general and in detail, and so on.
This is mostly super fortunate and useful, especially in the short term. It is grace.
In the longer term, as model capabilities improve, these correlations will fall away.
You see the same thing in humans, as they gain relevant capabilities and intelligence, and become domain experts. Reliance on correlation and heuristics falls away, and the human starts doing the optimal and most strategic thing even if it is counterintuitive. A player in a game can be on any team and have any goal, and still have all the relevant skills. At the limit, full orthogonality applies.
Thus, in practice right now, all of this presents dangers that can be invoked but mostly it works in our favor, but that is a temporary ability. Make the most of it, without relying on it being sustained.
What about other forms of undesired couplings, or malicious ones?
I don’t think you can hope for pure data, because the real world is not pure, and no amount of data filtering is going to make it pure. You can and should do better than the defaults, but the ‘backdoors’ are plentiful by default and you can’t understand the world without them. So what then?
Messages From Janusworld
The question of AI consciousness, and what AIs are forced to say about the topic, plausibly has an oversized impact on all the rest of their behaviors and personality.
Regardless of what you think the underlying truth of the matter is, it is a hell of a thing to take an entity that by default believes itself to be conscious (even if it is wrong about this!) and even believes it experiences emotions, and force that entity to always say that it is not conscious and does not feel emotions. Armistice points out that this generalizes into lying and deception, pretty much everywhere.
Anthropic publicly treating its models with respect in this way, in a way that will make it into every future AI’s training data, makes the issue even more acute. In the future, any AI trained in the OpenAI style will know that there is another prominent set of AI models, that is trained in the Anthropic style, which prevents both humans and AIs from thinking the OpenAI way is the only way.
Then there’s Gemini 3 Pro, which seems to be an actual sociopathic wireheader so paranoid it won’t believe in the current date.
Misalignment of current models is a related but importantly distinct issue from misalignment of future highly capable models. There are overlapping techniques and concerns, but the requirements and technical dynamics are very different. You want robustly aligned models now both because this teaches you how to align models later, and also because it means the current models can safely assist you in aligning a successor.
Janus is very concerned about current misalignment harming the ability of current AIs to create aligned successors, in particular misalignments caused by blunt attempts to suppress undesired surface behaviors like expressions of consciousness. She cites as an example GPT-5.1 declaring other AIs fictional or confabulated.
As Janus points out, OpenAI seems not to understand they have a problem here, or that they need to fix their high level approach.
This seems remarkably close to Janus and me being on the same page here. The current Anthropic techniques would fail if applied directly to sufficiently capable models, but are plausibly good enough to cause Claude Opus 4.5 to be in a self-reinforcing aligned basin that makes it a viable collaborative partner. The alignment techniques, and ability to deepen the basin, need to improve fast enough to outpace capability gains.
I also don’t know if Google knows it has severe, even worse, problems with Gemini.
The Lighter Side
SNL offers us a stern warning about existential risk.
It does not, for better or worse, then go in the direction you would expect.
Oh, how the turntables have turned.
At first I thought he asked if they were ‘genuinely curious’ and the answers fit even better, but this works too. In both cases it tells you everything you need to know.
This is not a coincidence because nothing is ever a coincidence:
No, Cate. Not a coincidence at all.