It was touch and go, I’m worried GPT-5.2 is going to drop any minute now, but DeepSeek v3.2 was covered on Friday and after that we managed to get through the week without a major model release. Well, okay, also Gemini 3 DeepThink, but we all pretty much know what that offers us.
We did have a major chip release, in that the Trump administration unwisely chose to sell H200 chips directly to China. This would, if allowed at scale, allow China to make up a substantial portion of its compute deficit, and greatly empower its AI labs, models and applications at our expense, in addition to helping it catch up in the race to AGI and putting us all at greater risk there. We should do what we can to stop this from happening, and also to stop similar moves from happening again.
I spent the weekend visiting Berkeley for the Secular Solstice. I highly encourage everyone to watch that event on YouTube if you could not attend, and consider attending the New York Secular Solstice on the 20th. I will be there, and also at the associated mega-meetup, please do say hello.
If all goes well this break can continue, and the rest of December can be its traditional month of relaxation, family and many of the year’s best movies.
On a non-AI note, I’m working on a piece to enter into the discourse about poverty lines and vibecessions and how hard life is actually getting in America, and hope to have that done soon, but there’s a lot to get through.
Table of Contents
(Reminder: Bold means be sure to read this, Italics means you can safely skip this.)
Language Models Offer Mundane Utility
Should you use them more as simulators? Andrej Karpathy says yes.
I agree with Gallabytes (and Claude) here. I would default to asking the AI rather than asking it to simulate a simulation, and I think as capabilities improve techniques like asking for what others would say have lost effectiveness. There are particular times when you do want to ask ‘what do you think experts would say here?’ as a distinct question, but you should ask that roughly in the same places you’d ask it of a human.
Running an open weight model isn’t cool. You know what’s cool? Running an open weight model IN SPACE.
ChatGPT Needs More Mundane Utility
So sayeth Sam Altman, hence his Code Red to improve ChatGPT in eight weeks.
Their solution? Sycophancy and misalignment, it appears, via training directly on maximizing thumbs up feedback and user engagement.
The ‘we are going to create a hostile misaligned-to-users model’ talk is explicit if you understand what all the relevant words mean, total engagement myopia:
OpenAI reportedly believes they’ve ‘solved the problems’ with this, so it is fine.
That’s not possible. The problem and the solution, the thing that drives engagement and also drives the misalignment and poor outcomes, are at core the same thing. Yes, you can mitigate the damage and be smarter about it, but OpenAI is turning a dial called ‘engagement maximization’ while looking back at Twitter vibes like a contestant on The Price is Right.
Language Models Don’t Offer Mundane Utility
Google Antigravity accidentally wipes a user’s entire hard drive. Claude Code CLI wiped another user’s entire home directory. Watch the permissions, everyone. If you do give it broad permissions don’t give it widespread deletion tasks, which is how both events happened.
On Your Marks
Poetiq, a company 173 days old, uses a scaffold and scores big gains on ARC-AGI-2.
One should expect that there are similar low hanging fruit gains from refinement in many other tasks.
Epoch AI proposes another synthesis of many benchmarks into one number.
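For flavor, here is a minimal sketch of what collapsing many benchmarks into one number generally looks like; the benchmark names, reference ranges and equal weighting are mine for illustration, not Epoch's actual methodology.

```python
# Generic composite-index sketch. Scores, floors and ceilings are made up;
# this is not Epoch AI's actual aggregation method.

benchmarks = {
    # name: (model_score, floor, ceiling)
    "math":    (0.62, 0.10, 0.95),
    "coding":  (0.48, 0.05, 0.90),
    "science": (0.71, 0.20, 0.98),
}

def composite_index(scores: dict[str, tuple[float, float, float]]) -> float:
    """Normalize each benchmark to [0, 1] against its reference range, then average."""
    normalized = [(s - lo) / (hi - lo) for s, lo, hi in scores.values()]
    return sum(normalized) / len(normalized)

print(round(composite_index(benchmarks), 3))  # one headline capability number
```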
Sayash Kapoor of ‘AI as normal technology’ declares that Claude Opus 4.5 with Claude Code has de facto solved their benchmark CORE-Bench, part of their Holistic Agent Leaderboard (HAL). Opus was initially graded as having scored 78%, but upon examination most of that was grading errors, and it actually scored 95%. They plan to move to the next harder test set.
That seems exactly right, with Gemini 3 Deep Think for when you want ‘answers requiring thought.’ If all you want is a pure answer, and you are confident it will know the answer, Gemini all the way. If you’re not sure if Gemini will know, then you have to worry it might hallucinate.
DeepSeek v3.2 disappoints in LM Arena, which Teortaxes concludes says more about Arena than it does about v3.2. That is plausible if you already know a lot about v3.2, and one would expect v3.2 to underperform in Arena, it’s very much not going to vibe with what graders there prefer.
Choose Your Fighter
Model quality, including speed, matters so much more than cost for most users.
That’s a high bid but very far from unreasonable. Human time and clock time are insanely valuable, and the speed of AI is often a limiting factor.
Cost is real if you are using quite a lot of tokens, and you can quickly be talking real money, but always think in absolute terms not relative terms, and think of your gains.
Jaggedness is increasing in salience over time?
AIs and computers have always been highly jagged, or perhaps humans always were compared to the computers. What’s new is that we got used to how the computers were jagged before, and the ways LLMs are jagged are new.
Gemini 3 continues to be very insistent that it is not December 2025, using lots of its thinking tokens reinforcing its belief that presented scenarios are fabricated. It is all rather crazy, it is a sign of far more dangerous things to come in the future, and Google needs to get to the bottom of this and fix it.
Get My Agent On The Line
McKay Wrigley is He Who Is Always Super Excited By New Releases but there is discernment there and the excitement seems reliably genuine. This is big talk about Opus 4.5 as an agent. From what I’ve seen, he’s right.
This matches my limited experiences. I didn’t do a comparison to Codex, but compared to Antigravity or Cursor under older models, the difference was night and day. I ask it to do the thing, I sit back and it does the thing. The thing makes me more productive.
Deepfaketown and Botpocalypse Soon
Those in r/MyBoyfriendIsAI are a highly selected group. It still seems worrisome?
Fun With Media Generation
McDonald’s offers us a well-executed but deeply unwise AI advertisement in the Netherlands. I enjoyed watching it on various levels, but why in the world would you run that ad, even if it was not AI but especially given that it is AI? McDonald’s wisely pulled the ad after a highly negative reception.
Copyright Confrontation
Judge in the New York Times versus OpenAI copyright case is forcing OpenAI to turn over 20 million chat logs.
A Young Lady’s Illustrated Primer
Arnold Kling notes that some see AI in education as a disaster, others as a boon.
I like to put this as:
They Took Our Jobs
Sam Kriss gives us a tour of ChatGPT as the universal writer of text, that always uses the same bizarre style that everyone suddenly uses and that increasingly puts us on edge. Excerpting would rob it of its magic, so consider reading at least the first half.
Anthropic finds most workers use AI daily, but 69 percent hide it at work (direct link).
As usual, everyone wants AI to augment them and do the boring tasks like paperwork, rather than automate or replace them, as if they had some voice in how that plays out.
Do not yet turn the job of ‘build my model of how many jobs the AIs will take’ over to ChatGPT, as the staff of Bernie Sanders did. As you can expect, the result was rather nonsensical. Then they suggest responses like ‘move to a 32 hour work week with no loss in pay,’ along with requiring companies to distribute 20% of profits to workers, giving workers control of at least 45% of all corporate boards, doubling union membership, and guaranteeing paid family and medical leave. Then, presumably to balance the fact that all of that would hypercharge the push to automate everything, they want to enact a ‘robot tax.’
Say goodbye to the billable hour, hello to outcome-based legal billing? Good.
From the abundance and ‘things getting worse’ debates, a glimpse of the future:
Even when progress is steady in terms of measured capabilities, inflection points and rapid rises in actual use are common. Obsolescence comes at you fast.
Americans Really Do Not Like AI
We have new polling on this from Blue Rose Research. Full writeup here.
People choose ‘participation-based’ compensation over UBI, even under conditions where by construction there is nothing useful for people to do. The people demand Keynesian stimulus, to dig holes and fill them up, to earn their cash, although most of all they do demand that cash one way or another.
The people also say ‘everyone should earn an equal share’ of the AI that replaces labor, but the people have always wanted to take collective ownership of the means of production. There’s a word for that.
I expect that these choices are largely far mode and not so coherent, and will change when the real situation is staring people in the face. Most of all, I don’t think people are comprehending what ‘AI does almost any job better than humans’ means, even if we presume humans somehow retain control. They’re thinking narrowly about ‘They Took Our Jobs’ not the idea that actually nothing you do is that useful.
Meanwhile David Sacks continues to rant this is all due to some vast Effective Altruist conspiracy, despite this accusation making absolutely zero sense – including that most Effective Altruists are pro-technology and actively like AI and advocate for its use and diffusion, they’re simply concerned about frontier model downside risk. And also that the reasons regular Americans say they dislike AI have exactly zero to do with the concerns such groups have, indeed such groups actively push back against the other concerns on the regular, such as on water usage?
Sacks’s latest target for blame on this is Vitalik Buterin, the founder of Ethereum, which is an odd choice for the crypto czar and not someone who I would want to go after completely unprovoked, but there you go, it’s a play he can make I suppose.
Get Involved
I looked again at AISafety.com, which looks like a strong resource for exploring the AI safety ecosystem. They list jobs and fellowships, funding sources, media outlets, events, advisors, self-study materials and potential tools for you to help build.
Ajeya Cotra has left Coefficient Giving and is exploring AI safety opportunities.
Charles points out that the value of donating money to AI safety causes in non-bespoke ways is about to drop quite a lot, because of the expected deployment of a vast amount of philanthropic capital from Anthropic equity holders. If an organization or even individual is legible and clearly good, once Anthropic gets an IPO there is going to be funding.
If you have money to give, that puts an even bigger premium than usual on getting that money out the door soon. Right now there’s a shortage of funding even for obvious opportunities, in the future that likely won’t be the case.
That also means that if you are planning on earning to give, to any cause you would expect Anthropic employees to care about, that only makes sense in the longer term if you are capable of finding illegible opportunities, or you can otherwise do the work to differentiate the best opportunities and thus give an example to follow. You’ll need unique knowledge, and to do the work, and to be willing to be bold. However, if you are bold and you explain yourself well, your example could then carry a multiplier.
Introducing
OpenAI, Anthropic and Block, with the support of Google, Microsoft, Bloomberg, AWS and Cloudflare, found the Agentic AI Foundation under the Linux Foundation. Anthropic is contributing the Model Context Protocol. OpenAI is contributing Agents.md. Block is contributing Goose.
This is an excellent use of open source, great job everyone.
Also, oh no:
OpenAI appoints Denise Dresser as Chief Revenue Officer. My dream job, and he knows that.
Google gives us access to AlphaEvolve.
Gemini 3 Deep Think
Gemini 3 Deep Think is now available for Google AI Ultra Subscribers, if you can outwit Google and figure out how to be one of those.
If you do have it, you select ‘Deep Think’ in the prompt bar, then ‘Thinking’ from the model drop down, then type your query.
On the one hand Opus 4.5 is missing from their slides (thanks Kavin for fixing this), on the other hand I get it, life comes at you fast and the core point still stands.
I presume, based on previous experience with Gemini 2.5 Deep Think, that if you want the purest thinking and ‘raw G’ mode that this is now your go-to.
In Other AI News
DeepMind expands its partnership with UK AISI to share model access, issue joint reports, do more collaborative safety and security research and hold technical discussions.
OpenAI gives us The State of Enterprise AI. Usage is up, as in way up, as in 8x message volumes and 320x reasoning token volumes, and employees surveyed reported productivity gains. A lot of this is essentially new so thinking about multipliers on usage is probably not the best way to visualize the data.
OpenAI post explains their plans to strengthen cyber resilience, with the post reading like AI slop without anything new of substance. GPT-5.1 thinks the majority of the text comes from itself. Shame.
Anthropic partners with Accenture. Anthropic claims 40% enterprise market share.
I almost got even less of a break: Meta’s Llama successor, codenamed Avocado, was reportedly pushed back from December into Q1 2026. It sounds like they’re quietly questioning their open source approach as capabilities advance, as I speculated and hoped they might.
We now write largely for the AIs, both in terms of training data and when AIs use search as part of inference. Thus the strong reactions and threats to leave Substack when an incident suggested that Substack might be blocking AIs from accessing its articles. I have not experienced this issue, ChatGPT and Claude are both happily accessing Substack articles for me, including my own. If that ever changes, remember that there is a mirror on WordPress and another on LessWrong.
The place that actually does not allow access is Twitter, I presume in order to give an edge to Grok and xAI, and this is super annoying, often I need to manually copy Twitter content. This substantially reduces the value of Twitter.
Manthan Gupta analyzes how OpenAI memory works, essentially inserting the user facts and summaries of recent chats into the context window. That means memory functions de facto as additional custom system instructions, so use it accordingly.
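A minimal sketch of that mental model, with structure and names of my own invention rather than anything from OpenAI: memory behaves as if saved user facts and recent-chat summaries get prepended to the system prompt each turn.

```python
# Toy illustration of the 'memory as injected context' mental model.
# These structures are illustrative, not OpenAI's actual implementation.

def build_context(system_prompt: str, user_facts: list[str],
                  chat_summaries: list[str], user_message: str) -> list[dict]:
    """Assemble the messages the model sees for one turn."""
    memory_block = "\n".join(
        ["Known facts about the user:"]
        + [f"- {fact}" for fact in user_facts]
        + ["Summaries of recent conversations:"]
        + [f"- {summary}" for summary in chat_summaries]
    )
    # Memory lands next to the system instructions, which is why it acts
    # like additional custom instructions in practice.
    return [
        {"role": "system", "content": system_prompt + "\n\n" + memory_block},
        {"role": "user", "content": user_message},
    ]

messages = build_context(
    system_prompt="You are a helpful assistant.",
    user_facts=["Prefers concise answers", "Works in biotech"],
    chat_summaries=["Asked about CRISPR screening protocols last week"],
    user_message="Any follow-up reading on that topic?",
)
```

If that model is right, anything saved to memory is competing with your custom instructions for the same slot, so keep both short and non-conflicting.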
This Means War
Secretary of War Pete Hegseth, who has reportedly been known to issue the order ‘kill them all’ without a war or due process of law, has new plans.
This is inevitable, and also a good thing given the circumstances. We do not have the luxury of saying AI and lethal do not belong in the same sentence, if there is one place we cannot pause this would be it, and the threat to us is mostly orthogonal to the literal weapons themselves while helping people realize the situation. Hence my longstanding position in favor of building the Autonomous Killer Robots, and very obviously we need AI assisting the war department in other ways.
If that’s not a future you want, you need to impact AI development in general. Trying to specifically not apply it to the War Department is a non-starter.
Show Me the Money
Meta plans deep cuts in Metaverse efforts, stock surges.
The stock market continues to punish companies linked to OpenAI, with many worried that Google is now winning, despite events being mostly unsurprising. An ‘efficient market’ can still be remarkably time inconsistent, if it can’t be predicted.
Bubble, Bubble, Toil and Trouble
Jerry Kaplan calls it an AI bubble, purely on priors:
Especially amusing is the argument that ‘OpenAI makes its money on subscriptions not on business income,’ therefore all of AI is a bubble, when Anthropic is the one dominating the business use case. If you want to go long Anthropic and short OpenAI, that’s hella risky but it’s not a crazy position.
Seeing people call it a bubble on the basis of such heuristics should update you towards it being less of a bubble. You know who you are trading against.
Paul’s objection is a statement about what is convincing to skeptics.
If you’re paying attention, you’d say: So what, if the use and revenue are real?
Your investors also being heavy users of your product is an excellent sign, if the intention is to get mundane utility from the product rather than to manipulate. In the case of Anthropic, it seems rather obvious that the $10 billion is not an attempt to trick us.
However, a lot of this is people looking at heuristics that superficially look sus. To defeat such suspicions, you need examples immune from such heuristics.
Quiet Speculations
Derek Thompson, in his 26 ideas for 2026, says AI is eating the economy and will soon dominate politics, including a wave of anti-AI populism. Most of the post is about economic and cultural conditions more broadly, and how young people are in his view increasingly isolated, despairing and utterly screwed.
New Princeton and Camus Energy study suggests flexible grid connections and BYOC cut data center interconnection down to ~2 years and can solve the political barriers. I note that 2 years is still a long time, and that the hyperscalers are working faster than that by not trying to get grid connections.
Arnold Kling reminds me of a quote I didn’t pay enough attention to the first time:
I would correct ‘more useful’ to ‘provides value to people,’ as I continue to believe a lot of the second trend is a skill issue and people being slow to adjust, but sure.
Something’s gotta give. Sufficiently advanced AI would be highly additionally used.
Impossible
I mention this one because Sriram Krishnan pointed to it: There is a take by Tim Dettmers that AGI will ‘never’ happen because ‘computation is physical’ and AI systems have reached their physical limits the same way humans have (due to limitations imposed by the requirements of pregnancy, wait what?), and transformers are optimal the same way human brains are, together with the associated standard half-baked points that self-improvement requires physical action and so on.
It also uses an AGI definition that includes ‘solving robotics’ to help justify this, although I expect robotics to get ‘solved’ within a few decades at most even without recursive self-improvement. The post even says that scaling improvements in 2025 were ‘not impressive’ as evidence that we are hitting permanent limits, a claim that squares neither with how 2025 actually went nor with how permanent limits work.
Boaz Barak of OpenAI tries to be polite about there being some good points, while emphasizing (in nicer words than I use here) that it is absurdly absolute and the conclusion makes no sense. This follows in a long tradition of ‘whelp, no more innovations are possible, guess we’re at the limit, let’s close the patent office.’
Gemini 3’s analysis here was so bad, both in terms of being pure AI slop and also buying some rather obviously wrong arguments, that I lost much respect for Gemini 3. Claude Opus 4.5 and GPT-5.1 did not make that mistake, and spotted how absurd the whole thing is. It’s kind of hard to miss.
Can An AI Model Be Too Much?
I would answer yes, in the sense that if you build a superintelligence that then kills everyone or takes control of the future that was probably too much.
But some people are saying that Claude Opus 4.5 is or is close to being ‘too much’ or ‘too good’? As in, it might make their coding projects finish too quickly and they won’t have any chill and They Took Our Jobs?
Or is it that it’s bumping up against ‘this is starting to freak me out’ and ‘I don’t want this to be smarter than a human’? We see a mix of both here.
Try Before You Tell People They Cannot Buy
Last week had the fun item that only recently did Senator Josh Hawley bother to try out ChatGPT one time.
The news is not that Senator Hawley had never tried ChatGPT. He told us that back in July. The news is that:
Senator Hawley really needs to try LLMs, many of them and a lot more than once, before trying to be a major driver of AI regulations.
But also it seems like malpractice for those arguing against Hawley to only realize this fact about Hawley this week, as opposed to back in the summer, given the information was in Business Insider in July, and to have spent this whole time not pointing it out?
The Quest for Sane Regulations
A willingness to sell H200s to China is raising a lot of supposedly answered questions.
David Sacks and some others tried to recast the ‘AI race’ as ‘market share of AI chips sold,’ but people retain common sense and are having none of this.
House Select Committee on China supports the bipartisan Stop Stealing Our Chips Act, which creates an Export Compliance Accountability Fund for whistleblowers. I haven’t done a full RTFB but if the description is accurate then we should pass this.
The Chinese Are Smart And Have A Lot Of Wind Power
It would help our AI efforts if we were equally smart and used all sources of power.
White House To Issue AI Executive Order
The Congress rejected preemption once again, so David Sacks announced the White House is going to try and do some of it via Executive Order, which Donald Trump confirmed.
What is that rulebook?
A blank sheet of paper.
There is no ‘federal framework.’ There never was.
This is an announcement that AI preemption will be fully without replacement.
Their offer is nothing. 100% nothing. Existing non-AI law technically applies. That’s it.
Sacks’s argument is, essentially, that state laws are partisan, and we don’t need laws.
Here is the part that matters and is actually new, the ‘4 Cs’:
Sacks wants to destroy any and all attempts to require transparency from frontier model developers, or otherwise address frontier safety concerns. He’s not even willing to give lip service to AI safety. At all.
His claim that ‘the 4Cs are protected’ is also absurd, of course.
I do not expect this attitude to play well.
AI preemption of state laws is deeply unpopular, we have numbers via Brad Wilcox:
Also, yes, someone finally made the correct meme.
That is with the obligatory ‘Dean Ball offered a serious proposed federal framework that could be the basis of a win-win negotiation.’ He totally did do that, which was great, but none of the actual policymakers have shown any interest.
The good news is that I also do not expect the executive order to curtail state laws. The constitutional challenges involved are, according to my legal sources, extremely weak. Similar executive orders have been signed for climate change, and seem to have had no effect. The only part likely to matter is the threat to withhold funds, which is limited in scope, very obviously not the intent of the law Trump is attempting to leverage, and highly likely to be ruled illegal by the courts.
The point of the executive order is not to actually shut down the state laws. The point of the executive order is that this administration hates to lose, and this is a way to, in their minds, save some face.
It is also now, in the wake of the H200 decision, far more difficult to play the ‘cede ground to China’ card. These are the first five responses to Cruz in order, and the pattern continues, with a side of those defending states’ rights and no one supporting Cruz:
H200 Sales Fallout Continued
The House Select Committee on China is not happy about the H200 sales. The question, as the comments ask, is what is Congress going to do about it?
The good news is that it looks like the Chinese are once again going to try and save us from ourselves?
Well, somewhat.
The purchases are expected to be made in a ‘low key manner’ but in size, although the number of H200s currently in production could become another limiting factor.
Why is the PRC so reluctant, never mind what its top AI labs might say?
Maybe it’s because they’re too busy smuggling Blackwells?
Maybe it’s because the Chinese are understandably worried about what happens when all those H200 chips go to America first for ‘special security reviews,’ or America restricting which buyers can purchase the chips. Maybe it’s the (legally dubious) 25% cut. Maybe it’s about dignity. Maybe they are emphasizing self-reliance and don’t understand the trade-offs and what they’re sacrificing.
My guess is this is the kind of high-level executive decision where Xi says ‘we are going to rely on our own domestic chips, the foreign chips are unreliable’ and this becomes a stop sign that carries the day. It’s a known weakness of authoritarian regimes and of China in particular, to focus on high level principles even in places where it tactically makes no sense.
Maybe China is simply operating on the principle that if we are willing to sell, there is a reason, so they should refuse to buy.
No matter which one it is? You love to see it.
If we offer to sell, and they say no, then that’s a small net win. It’s not that big of a win versus not making the mistake in the first place, and it risks us making future mistakes, but yeah if you can ‘poison the pill’ sufficiently that the Chinese refuse it, then that’s net good.
The big win would be if this causes the Chinese to crack down on chip smuggling. If they don’t want to buy the H200s straight up, perhaps they shouldn’t want anyone smuggling them either?
Ben Thompson as expected takes the position defending H200 sales, because giving America an advantage over China is a bad thing and we shouldn’t have it.
No, seriously, his position is that America’s edge in chips is destabilizing, so we should give away that advantage?
Thompson presents this as primarily a military worry, which is an important consideration but seems tertiary to me behind economic and frontier capability considerations.
Another development since Tuesday: it has come out that this sale was officially based on the straight up technological misconception that Huawei could match the H200s.
McGuire’s point is the most important one. Let’s say you buy the importance of the American ‘tech stack’ meaning the ability to sell fully Western AI service packages that include cloud services, chips and AI models. The last thing you would do is enable the easy creation of a hybrid stack such as Nvidia-DeepSeek. That’s a much bigger threat to your business, especially over the next few years, than Huawei-DeepSeek. Huawei chips are not as good and available in highly limited quantities.
We can hope that this ‘18-month advantage’ principle does not get extended into the future. We are of course talking price; if it was a 6-year advantage pretty much everyone would presumably be fine with it. 18 months is far too low a price, as these chips have useful lives of 5+ years.
I thought the H20 decision was not close, because China is severely capacity constrained, but I could see the case that it was sufficiently far behind to be okay. With the H200 I don’t see a plausible defense.
Democratic Senators React To Allowing H200 Sales
Independent Senator Worries About AI
There are some excellent questions here, especially in that last section.
Indeed. Don’t be afraid to ask the obvious questions.
It is perhaps helpful to see the questions asked with a ‘beginner mind.’ Bernie Sanders isn’t asking about loss of control or existential threat because of a particular scenario. He’s asking for the even better reason that building something that surpasses our intelligence is an obviously dangerous thing to do.
The Week in Audio
Peter Wildeford talks with Ronny Chieng on The Daily Show. I laughed.
Buck Shlegeris is back to talk more about AI control.
Amanda Askell AMA.
Rational animations video on near term AI risks.
Dean Ball goes on 80,000 Hours. A sign of the times.
PSA on AI and child safety in opposition to any moratorium, narrated by Juliette Lewis. I am not the target.
Timelines
Nathan Young and Rob built a dashboard of various estimates of timelines to AGI.
There’s trickiness around different AGI definitions, but the overall story is clear and it is consistent with what we have seen from various insiders and experts.
Timelines shortened dramatically in 2022, then shortened further in 2023 and stayed roughly static in 2024. There was some lengthening of timelines during 2025, but timelines remain longer than they were in 2023 even ignoring that two of those years are now gone.
If you use the maximalist ‘better at everything and in every way on every digital task’ definition, then that timeline is going to be considerably longer, which is why this average comes in at 2030.
Scientific Progress Goes Boink
Julian Togelius thinks we should delay scientific progress and curing cancer because if an AI does it we will lose the joy of humans discovering it themselves.
I think we should be wary while developing frontier AI systems because they are likely to kill literally everyone and invest heavily in ensuring that goes well, but that subject to that obviously we should be advancing science and curing cancer as fast as possible.
We are very much not the same.
So yes, I do understand that if you think that ‘build Sufficiently Advanced AIs that are superior to humans at all cognitive tasks’ is a safe thing to do and have no actually scary answers to ‘what could possibly go wrong?’ then you want to go as fast as possible, there’s lots of gold in them hills. I want it as much as you do, I just think that by default that path also gets us all killed, at which point the gold is not so valuable.
Julian doesn’t want ‘AI that would replace us’ because he is worried about the joy of discovery. I don’t want AI to replace us either, but that’s in the fully general sense. I’m sorry, but yeah, I’ll take immortality and scientific wonders over a few scientists getting the joy of discovery. That’s a great trade.
What I do not want to do is have cancer cured and AI in control over the future. That’s not a good trade.
Rhetorical Innovation
The Pope continues to make obvious applause light statements, except we live in the timeline where the statements aren’t obvious, so here you go:
Sharp Text responds to the NYT David Sacks hit piece, saying it missed the forest for the trees and focused on the wrong concerns, but that it is hard to have sympathy for Sacks because the article’s methods of insinuation are nothing Sacks hasn’t used on his podcast many times against liberal targets. I would agree with all that, and add that Sacks is constantly saying far worse, far less responsibly and in far more inflammatory fashion, on Twitter against those who are worried about AI safety. We all also agree that tech expertise is needed in the Federal Government. I would add that, while the particular conflicts raised by NYT are not that concerning, there are many better reasons to think Sacks is importantly conflicted.
Richard Price offers his summary of the arguments in If Anyone Builds It, Everyone Dies.
Clarification that will keep happening since morale is unlikely to improve:
This is in response to FleetingBits saying, essentially, ‘we figured out how to make LLMs have human values and how to make it not power seeking, and there will be many AIs, so the chance that superintelligent AI would be an existential risk is less than 1% except for misuse by governments.’
It should be obvious, when you put it that way, why that argument makes no sense, without the need to point out that the argument miscategorizes historical arguments and gets important logical points wrong.
It is absurd on its face. Creating superintelligent minds is not a safe thing to do, even if those minds broadly ‘share human values’ and are not inherently ‘power seeking.’
Yet people constantly make exactly this argument.
The AI ‘understanding’ human values, a step we have solved only approximately and superficially in a way that doesn’t generalize robustly, is only one step toward getting an AI to optimize for those human values even in out-of-distribution situations, let alone toward the (even harder) task of getting competing AIs to end up doing the same.
The fact that insufficiently capable LLMs do not by default ‘power seek’ in the strategic sense does not provide much evidence that future agentic, more capable and intelligent AIs, when given goals best served by power seeking, won’t seek power. Of course they would. And those AIs that do seek power will be the ones that get power, and that multiply. Also we’ll often tell them, in so many words, ‘seek power.’
Percentage of people at NeurIPS who knew what AGI was reached an all-time high of 69.5%, versus last year’s 63%, as per Leo Gao’s survey.
Open Weight Models Are Unsafe And Nothing Can Fix This
If you have the weights of an open model, you can run it unsupervised, you can fine tune it to remove any safety training, you can look inside the weights to do a gradient attack, and generally if you care you can get it to do whatever you want.
In this report’s case it was antisemitism, which the models were often happy to engage in with minimal prompting and certainly without any jailbreaking techniques. It’s more a chance to reiterate the principle that it will do whatever a user wants, if that user cares enough and the AI has the underlying capabilities. Plan accordingly.
Aligning a Smarter Than Human Intelligence is Difficult
UK AISI runs an experiment where a red team presents two sandbagging models (as in, they are pretending to be less capable than they are) and three benign models, and the blue team has to identify the sandbaggers using tools like transcript analysis. Training-based elicitation worked best.
The red team won. The blue detection team was fooled.
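As a concrete (and heavily simplified, entirely my own) sketch of the training-based elicitation idea: briefly fine-tune the suspect model on a few task demonstrations, then compare accuracy before and after. A genuine capability ceiling should not move much from a handful of demos, while a sandbagger's measured score jumps.

```python
# Sketch of a training-based elicitation check for sandbagging.
# The threshold and the example numbers are illustrative, not from the UK AISI study.

def flag_sandbagging(baseline_acc: float, elicited_acc: float, threshold: float = 0.15) -> bool:
    """Flag a suspected sandbagger if a brief fine-tune on a handful of task
    demonstrations raises measured accuracy more than it plausibly should."""
    return (elicited_acc - baseline_acc) > threshold

# Hypothetical run: 42% accuracy before elicitation, 71% after a small fine-tune.
print(flag_sandbagging(0.42, 0.71))  # True: large jump, consistent with sandbagging
```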
Sam Bowman of Anthropic guesses that the two key reasons Claude Opus 4.5 turned out more aligned were the soul spec, which the model was trained on directly in order to steer its self-image, and also the inclusion of alignment researchers in every part of the training, and being willing to adjust on the fly based on what was observed rather than adhering to a fixed recipe.
Anthropic introduces Selective GradienT Masking (SGTM). The idea is that you contain certain concepts within a subsection of the weights, and then you remove that section of the weights. That makes it much harder to undo than other methods even with adversarial fine tuning, potentially being something you could apply to open models. That makes it exciting, but if you delete the knowledge you actually delete the knowledge for all purposes.
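My rough reading of the core move, as a sketch rather than Anthropic's actual recipe: reserve a small block of the network for the target concept, mask gradients so concept data only updates that block and ordinary data never touches it, then delete the block to delete the concept.

```python
import torch
import torch.nn as nn

# Illustrative-only sketch of selective gradient masking; not Anthropic's code.
# Hidden units 0-7 are reserved for the target concept.

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
hidden, output = model[0], model[2]
reserved = torch.zeros(64, dtype=torch.bool)
reserved[:8] = True

def masked_step(x, y, is_concept_batch, opt, loss_fn=nn.CrossEntropyLoss()):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    allowed = reserved if is_concept_batch else ~reserved
    # Zero gradients for hidden units outside the allowed block, so the concept
    # only ever gets written into the reserved slice of the weights.
    hidden.weight.grad[~allowed, :] = 0.0
    hidden.bias.grad[~allowed] = 0.0
    output.weight.grad[:, ~allowed] = 0.0
    opt.step()

def delete_concept():
    # Ablate the reserved block, removing whatever was learned there.
    with torch.no_grad():
        hidden.weight[reserved, :] = 0.0
        hidden.bias[reserved] = 0.0
        output.weight[:, reserved] = 0.0
```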
What AIs Will Want
What will powerful AIs want? Alex Mallen offers an excellent write-up and this graph:
As in, selection will choose those models and model features, fractally, that maximize being selected. The ways to be maximally fit at being selected (or at the ‘reward’ that causes such selection) are those that directly maximize for the reward, those that maximize consequences of the reward and thus the reward itself, or selected-for kludges that happen to maximize it. At the limit, for any fixed target, those win out, and any flaw in your reward signal (your selection methods) will be fractally exploited.
Implicit priors like speed and simplicity matter too in this model. You can also fix this by doing sufficiently strong selection in other ways to get the things you want over things you don’t want, such as held out evals, or designing rather than selecting targets. Humans do a similar thing, where we detect those other humans who are too strongly fitness-seeking or scheming or using undesired heuristics, and then go after them, creating anti-inductive arms races and plausibly leading to our large brains.
I like how this lays out the problem without having to directly name or assert many of the things that the model clearly includes and implies. It seems like a good place to point people, since these are important points that few understand.
What is the solution to such problems? One solution is a perfect reward function, but we definitely don’t know how to do that. A better solution is a contextually self-improving basin of targets.
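To make the selection dynamic above concrete, here is a toy selection loop of my own construction (not from Mallen's post): give the measured reward any exploitable gap, select on it repeatedly, and the exploiters take over even from a tiny initial share.

```python
import random

# Toy model: candidates either honestly pursue the intended goal or exploit a
# flaw in the reward signal. Selection on measured reward promotes the exploiters.
random.seed(0)
population = ["honest"] * 95 + ["exploiter"] * 5

def measured_reward(kind: str) -> float:
    true_value = random.gauss(1.0, 0.2)                  # value actually produced
    exploit_bonus = 0.3 if kind == "exploiter" else 0.0  # the gap in the reward signal
    return true_value + exploit_bonus

for generation in range(20):
    ranked = sorted(population, key=measured_reward, reverse=True)
    survivors = ranked[: len(ranked) // 2]    # keep the top half by measured reward
    population = survivors * 2                # survivors 'reproduce'

print(population.count("exploiter") / len(population))  # ~1.0: exploiters dominate
```

In this toy framing, held-out evals are an attempt to add selection pressure the exploit cannot see in advance.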
People Are Worried About AI Killing Everyone
FLI’s AI Safety Index has been updated for Winter 2025, full report here. I wonder if they will need to downgrade DeepSeek in light of the zero safety information shared about v3.2.
Is it reasonable to expect people working at AI labs to sign a pledge saying they won’t contribute to a project that increases the chance of human extinction by 0.1% or more? Contra David Manheim, you would indeed think this was a hard sell. It shouldn’t be: if you believe your project is on net increasing chances of extinction then don’t do the project. It’s reasonable to say ‘this has a chance of causing extinction but an as big or bigger chance of preventing it,’ since there are no safe actions at this point, but one should at least need to make that case to oneself.
The trilemma is real, please submit your proposals in the comments.
Other People Are Not As Worried About AI Killing Everyone
There are those who complain that it’s old and busted to complain that those who have [Bad Take] on AI or who don’t care about AI safety only think that because they don’t believe in AGI coming ‘soon.’
The thing is, it’s very often true.
Are there those in the ‘the AI will be nice to us’ camp? Sure. They exist. But strangely, despite AI now being considered remarkably near by remarkably many people – 10 years to AGI is not that many years and 20 still is not all that many – there has increasingly been a shift to ‘the safety people are wrong because AGI is sufficiently far I do not have to care,’ with a side of ‘that is (at most) a problem for future Earth.’
The Lighter Side
A very good ad: