‘Russia has top AI executives?’ you might ask.
Yes, actually. They are no strangers to hype either, to say the least, but like it or hate it, this part of the quote from the Reuters article you are referencing is accurate:
Alexander Vedyakhin, first deputy CEO of Sberbank, which is evolving from a major lender into an AI-focused technology conglomerate
Sberbank is, indeed, a strong, well-funded player (I do expect them to have both problems with compute due to sanctions and efficiency problems due to "ways of doing business typical for Russian Federation", but this is not the first time we are hearing about their AI efforts, and there are no reasons to ignore or dismiss them).
In general, the standard attempts to frame the situation as a USA-China race are misleading. A number of countries are very strong players; they might be slightly behind at the moment, but this can easily change (especially if some alternatives to Transformers start overtaking the current paradigm, or if there are other reasons for the leaders to experience a bit of a slowdown). This is very much a multilateral situation.
The intended tone was humorous, as in the 'you guys have [X]s?' meme, not a denial that Russia has such executives, although I haven't seen anything notable from Sberbank. I've certainly kept an eye on Mistral and SSI if no one else.
However right now I think I'd list at least 5 American labs and 4 Chinese labs as substantially ahead of anyone anywhere else until proven otherwise, excluding SSI which is impossible to get a read on.
:-) Yes, well, the Kandinsky series of text-to-image and text-to-video models is made by the Sber AI team (that's Sberbank) :-) When bloggers from Russia generate AI visual art, that's what they usually use :-) I don't keep close track of them, but I see that they progressed to Kandinsky 4.0 a year ago, which is supposed to generate all multimedia: "New multimedia generation model for video in HD resolution and audio"...
EDIT: Ah, they just released a series of 5.0 models: https://arxiv.org/abs/2511.14993 and https://github.com/kandinskylab/kandinsky-5. So one can check where they are.
There are a lot of small places which are difficult to keep track of (e.g. the most adventurous part of Liquid AI has recently split off and formed Radical Numerics, whose declared approach is to "unlock recursive self-improvement"; their neural architecture search work done while at Liquid AI has been pretty remarkable and more than just theoretical, so I understand why they want to make a straight play at recursive self-improvement, although I doubt they are giving enough thought to how to handle "true success", which is unfortunate to say the least (they presumably still expect saturation of self-improvement, just at notably higher levels, and so they might not expect to encounter "true danger" soon)).
The better the coding models are, the more possibilities there are for small players with non-standard algorithmic ideas and a desire for semi-automation of AI research, so the situation is becoming more fluid...
Except that AI-related research is likely to require a ton of compute, which Russian companies aren't on track to possess even if we compare them with Chinese ones, like DeepCent from AI-2027.
If one is talking about AGI/ASI and if one assumes that tons of compute would be critical for that, then yes, that's probably correct. If novel research and algorithmic artistry turn out to be more crucial than compute, then it is less certain.
But so far, they have been able to train models which are used by people in Russia. So in more pedestrian AI terms, they are already a notable player and they have quite a bit of older compute.
In any case, other countries are more on track in terms of compute than Russia (American players are building data centers all over the world, and property rights might turn out to be… hmmm… “less than ironclad” in some cases, and non-American entities are buying a lot of compute as well). So when I talk about a multilateral situation, I do count Russia for its traditional strength in science and engineering, and for the fact that their government maintains a sustained and focused intense interest in the subject of very advanced AI, but I count a number of other countries ahead of it. Those countries don’t even have to be large; for example, Singapore is a formidable player: it makes excellent specialized models, it is rich and good with tech, and it has a strong scientific and engineering culture. If we think that Ilya’s org has a shot at it, then Singapore also has a shot at it. (And yes, I am ready to buy Ilya’s argument that with a better research approach a few billion might be enough. If one can afford to use more brute force, sure, why not, but having too much brute force available tends to make one a bit too complacent and less adventurous in the search for new approaches.)
There are strong legal and PR and ‘ethical’ pressures to engage in such CYA behaviors and avoid blame.
Argument #4 in their rebuttal in one of their ongoing legal cases is, "Our TOS says you can't use ChatGPT to commit suicide."
it seems weird to ask for images of handwriting? So it’s not clear how much this matters.
This is a thing—in my stats class I had to scan my handwriting, turn it into a PDF, and turn it in online.
Essentially any stylistic shift or anything else that preserves the content while taking you out of the assistant basin is going to promote jailbreak success rate, since the defenses were focused in the assistant basin.
I like this summary.
It's a good starting point for thinking about fixing the "protection against misuse".
Thanks for everything. And I do mean everything.
Everyone gave us a new model in the last few weeks.
OpenAI gave us GPT-5.1 and GPT-5.1-Codex-Max. These are overall improvements, although there are worries around glazing and reintroducing parts of the 4o spirit.
xAI gave us Grok 4.1, although few seem to have noticed and I haven’t tried it.
Google gave us both by far the best image model in Nano Banana Pro and also Gemini 3 Pro, which is a vast intelligence with no spine. It is extremely intelligent and powerful, but comes with severe issues. My assessment of it as the new state of the art got to last all of about five hours.
Anthropic gave us Claude Opus 4.5. This is probably the best model and quickly became my daily driver for most but not all purposes including coding. I plan to do full coverage in two parts, with alignment and safety on Friday, and the full capabilities report and general review on Monday.
Meanwhile the White House is announcing the Genesis Mission to accelerate science, there’s a continuing battle over another attempt at a moratorium, there’s a new planned $50 million super PAC, there’s another attempt by Nvidia to sell us out to China, Wall Street is sort of panicking about Nvidia because they realized TPUs exist and is having another round of bubble debate, there are multiple Anthropic research papers one of which is important, and so on.
One thing I’m actively pushing to next week, in addition to Claude Opus 4.5, is the Anthropic paper on how you can inoculate models against emergent misalignment. That deserves full attention, and I haven’t had the opportunity for that. There’s also a podcast between Dwarkesh Patel and Ilya Sutskever that demands its own coverage, and I hope to offer that as well.
For those looking to give thanks in the form of The Unit of Caring, also known as money, consider looking at The Big Nonprofits Post 2025 or the web version here. That’s where I share what I learned working as a recommender for the Survival and Flourishing Fund in 2024 and again in 2025, so you can benefit from my work.
Language Models Offer Mundane Utility
It’s not this simple, but a lot of it mostly is this simple.
Language Models Don’t Offer Mundane Utility
Roon defends his confusion and trouble figuring out how to access Gemini 3, and notes his mom accesses Gemini by opening a spreadsheet and clicking the Gemini button. Roon is correct here that Google needs to fix this.
Don’t let AI coding spoil your sleep. Be like Gallabytes here, having Claude iterate on it while you sleep, rather than like Guzey who tricked himself into staying up late.
Huh, Upgrades
ChatGPT now lets you have group chats, which always use 5.1 Auto. ChatGPT will decide based on conversation flow when to respond and when not to. Seems plausible this could be good if implemented well.
ChatGPT Instant Checkout adds Glossier, SKIMS and Spanx. Yay?
ChatGPT adds Target as a new app.
ChatGPT integrates voice with regular mode so you don’t have to choose.
ChatGPT expands (free and confidential and anonymous) crisis helpline support. OpenAI doesn’t provide the services, that’s not their job, but they will help direct you. This is some of the lowest hanging of fruit; at least one of the prominent suicide cases involved ChatGPT saying it would direct the user to a human, the user being open to this, and ChatGPT not being able to do that. This needs to be made maximally easy for the user; someone who needs the line is not going to be in good shape.
ChatGPT gives us Shopping Research in time for Black Friday.
Is 64% product accuracy good? I have absolutely no idea. Olivia Moore is a fan. I plan to try this out tomorrow, as I need a new television for Black Friday.
Claude Opus 4.5 is available. It’s probably the world’s best model. Full coverage starts tomorrow.
Claude Opus 4.5 includes a 66% price cut to $5/$25 per million tokens, and Opus-specific caps have been removed from the API.
Claude conversations now have no maximum length. When they hit their limit, they are summarized, and the conversation continues.
Claude for Chrome is now out to all Max plan users.
Claude for Excel is now out for all Max, Team and Enterprise users. We are warned that, like essentially every other AI agent, Claude for Excel is vulnerable to prompt injections if you access insecure data sources. You should assume this is always a risk at all times; see the same source talking about exfiltration risks with Google Antigravity.
Claude Code is now available within their desktop app.
On Your Marks
It is a remarkably good approximation to say there is only one dimension of ‘general capability,’ with Epoch noting that across many tasks the r^2=0.91.
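As a rough illustration of what that kind of claim means (made-up numbers, not Epoch’s data or method): lay model scores across benchmarks out as a matrix and see how much of the variance a single ‘general capability’ factor captures.

```python
# Minimal sketch, with made-up scores, of "one dimension of general capability":
# fit a single factor (first principal component) to a models-by-benchmarks
# matrix and report how much of the variance it explains.
import numpy as np

# Rows are models, columns are benchmark scores (hypothetical numbers).
scores = np.array([
    [62, 71, 55, 80],
    [48, 60, 41, 66],
    [75, 83, 70, 90],
    [30, 42, 25, 50],
    [55, 65, 50, 72],
], dtype=float)

centered = scores - scores.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s[0] ** 2 / np.sum(s ** 2)
print(f"Share of variance explained by a single capability factor: {explained:.2f}")
```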
Gemini 3 Pro sets a new top in the ‘IQ’ metric.
Kimi K2 Thinking enters The Famous METR Graph at 54 minutes, well below the frontier, as accessed via Novita AI. They caution that this might be a suboptimal model configuration, but they needed to ensure their data would not be retained.
Choose Your Fighter
Okay, I like Gemini 3 too but settle down there buddy.
Deepfaketown and Botpocalypse Soon
AI slop recipes are endangering Thanksgiving dinners, hopefully you see this in time. A flood of new offerings is crowding out human recipes. Thanksgiving is when you most need to ‘shut up and play the hits’ and not rely on AI to whip up something new.
Okay, look, everyone, we need to at least be smarter than this:
At this point, if you find a recipe, you need strong evidence it was written by a human, or else you need to assume it might not be. The search and discovery systems we used to have, including around Google, are effectively broken. If real guides and recipes can’t win traffic, and right now traffic to all such sites is cratering, then no one will write them. It does not sound like Google is trying to mitigate these issues.
Nicholas Hune-Brown investigates the suspicious case of journalist Victoria Goldiee, who turns out to be very much fabricating her work. It seems Victoria largely used AI to generate her articles, and then it took Nicholas doing a lot of old-fashioned tracking down of sources to know for sure. The ratio of effort does not bode well, but as long as there is the need to maintain a throughline of identity we should be okay, since that generates a large body of evidence?
Here we have yet another case of some highly obvious AI generated content.
Poker pro Maria Konnikova cannot believe she has to say that using AI to put words in people’s mouths without consulting them or disclosing that you’re doing it, or to centrally write your articles, is not okay. But here we are, so here she is saying it. A recent poker documentary used AI to fabricate quotes from national treasure Alan Keating. The documentary has been scrubbed from the internet as a result. What’s saddest is that this was so obviously unnecessary in context.
There are other contexts in which fabricating audio, usually via Frankenbiting where you sew different tiny clips together, or otherwise using misleading audio to create a false narrative or enhance the true one is standard issue, such as in reality television. When you go on such shows you sign contracts that outright say ‘we may use this to tell lies about you and create a false narrative, and if so, that’s your problem.’ In which case, sure, use AI all you want.
Here’s another one, where it is spotted in The New York Times, and yeah it’s (probably) AI.
Also, if one of these isn’t AI and you merely sound like one, I’m not going to say that’s worse, but it’s not that much better. If you’re so engagement-maximizing that I confuse your writing for AI, what is the difference?
Note that you cannot use current LLMs in their default chatbot modes as AI detectors, even in obvious cases or as a sanity check, as they bend over backwards to try and think everything is written by a human.
Jesper Myrfors, the original art director of Magic: The Gathering, warns that if you submit illustrations or a portfolio that uses AI, you will effectively be blacklisted from the industry, as the art directors all talk to each other and everyone hates AI art.
Meanwhile, Hasbro (who makes Magic: The Gathering) is building an internal AI studio to ‘architect systems that bring magical AI experiences to life through Hasbro’s beloved characters.’
There is no actual contradiction here. Different ways to use AI are different. Using AI in professional illustrations is a hard no for the foreseeable future, and would be even without copyright concerns. Using it to generate material for your local D&D campaign seems totally fine.
Hard problems remain hard:
Also, real world not using that same model in these ways? Remarkably similar.
Rare academic realism victory?
Funnily enough, they ran a sycophancy check, and the more 4o sucked up to the user, the more often the user followed its advice. ‘Surprising’ advice was also followed more often.
It’s certainly worth noting that 75% (!) of those in the treatment group took the LLM’s advice, except who is to say that most of them wouldn’t have done whatever it was anyway? Wouldn’t 4o frequently tell the person to do what they already wanted to do? It also isn’t obvious that ‘advice makes me feel better’ or generally feeling better are the right effects to check.
Bot joins a Google Meet, then sends a summary afterwards about everyone trying to figure out where the bot came from (also the source is reported as ‘scammy malware’).
What Is Slop? How Do You Define Slop?
We all know it when we see it, AI or otherwise, but can anyone define it?
I don’t think the old definition works. There is a necessary stylistic component.
I asked Gemini. It gave me a slop answer. I told it to write a memory that would make it stop giving me slop, then opened a new window and asked again and got a still incomplete but much better answer that ended with this:
That’s a key element. You then need to add what one might call the ‘mannerism likelihood ratio’ that screams AI generated (or, for human slop, that screams corporate speak or written by committee). When I pointed this out it came back with:
That’s more gesturing in the direction but still clearly not right. I’d suggest something more like SlopIndex * LikelihoodRatio from above, where LikelihoodRatio is the instinctive update on the probability the mannerisms were created by a slop process (either an AI writing slop or one or more humans writing slop) rather than by a free and functional mind.
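To make that concrete, here is a toy sketch of the proposed combination. Both scoring functions are hypothetical placeholders I made up for illustration, not real detectors.

```python
# Toy sketch of a slop metric: a base "content emptiness" index scaled by a
# likelihood ratio that the text's mannerisms came from a slop process rather
# than a free and functional mind. The heuristics below are placeholders.

def slop_index(text: str) -> float:
    """Placeholder: 0.0 (dense, load-bearing content) to 1.0 (pure filler)."""
    filler = ("in today's fast-paced world", "it's important to note", "delve")
    hits = sum(phrase in text.lower() for phrase in filler)
    return min(1.0, hits / len(filler))

def mannerism_likelihood_ratio(text: str) -> float:
    """Placeholder: P(mannerisms | slop process) / P(mannerisms | free mind)."""
    tells = ("not just", "let's dive in", "game-changer")
    return 1.0 + sum(text.lower().count(t) for t in tells)

def slop_score(text: str) -> float:
    # The proposed combination: SlopIndex * LikelihoodRatio.
    return slop_index(text) * mannerism_likelihood_ratio(text)

print(slop_score("In today's fast-paced world, it's important to note that AI "
                 "is not just a tool, it's a game-changer. Let's dive in."))
```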
Fun With Media Generation
Google last week gave us Nano Banana Pro.
By all accounts it is a big improvement in image models. It is especially an improvement in text rendering and localization. You can now do complex documents and other images with lots of words in specific places, including technical diagrams, and have it all work out as intended. The cost per marginal image in the API is $0.13 for 2K resolution or $0.24 for 4K, versus $0.04 for Gemini 2.5 Flash Image. In exchange, the quality is very good.
DeepMind CEO Demis Hassabis is excited.
Hasan Can is impressed and offers images. Liv Boeree is in.
Seems great to me. A bit expensive for mass production, but definitely the best place to get title images for posts and for other similar uses.
Also, yes, doing things like this seems very cool:
For a while people have worried about not being able to trust images. Is it over?
Google offers SynthID in-app, but that requires a manual check. I think we’re still mostly fine and that AI images will remain not that hard to ID, or rather that it will be easy for those paying attention to such issues to instinctively create the buckets of [AI / Not AI / Unclear] and act accordingly. But the ‘sanity waterline’ is going down on this, and the number of people who will have trouble here keeps rising.
Is this an issue here?
You can tell this isn’t real if you’re looking, the handwriting is too precise, too correct, everything aligns too perfectly and so on, but if we disregard that, it seems weird to ask for images of handwriting? So it’s not clear how much this matters.
Similarly Andrej Karpathy has Nano Banana Pro fill in exam questions in the exam page. That’s good to know, but if they have access to this you’re cooked either way.
Anders Sandberg is impressed that it one-shots diagrams for papers, without even being told anything except ‘give me a diagram showing the process in the paper.’
Are there some doubled labels? Sure. That’s the quibble. Contrast this with not too long ago, when you could give detailed instructions on what the diagram should contain and the model wouldn’t have been able to do it at all.
A Young Lady’s Illustrated Primer
Jon Haidt and Zach Rausch, who would totally say this, say not to give your kids any AI companions or toys. There are strong reasons to be cautious, but the argument and precautionary principles presented here prove too much. Base rates matter, upside matters, you can model what is happening and adjust on the fly, and there’s a lot of value in AI interaction. I’d still be very cautious about giving children AI companions or toys, but are you going to have them try to learn things without talking to Claude?
Andrej Karpathy bites all the bullets. Give up on grading anything that isn’t done in class and combine it with holistic evaluations. Focus a lot of education on allowing students to use AI, including recognizing errors.
Will Teague gives students a paper with a ‘Trojan horse’ instruction; 33 of 122 submissions fall for it and another 14 students outed themselves on hearing the numbers. I actually would have expected worse. Then on the ‘reflect on what you’ve done’ essay assignment he found this:
Will is correctly hating that the students feel this way, but is misdiagnosing the cause.
This isn’t an AI problem. This is about the structure of school and grading. If you believe progress requires failure, that is incompatible with the way we structure college, where any failures are highly damaging to the student and their future. What do you expect them to do in response?
I also don’t understand what the problem is here, if a student is doing the best work they can and indeed writing the best essay they could. Isn’t that the best you can do?
You Drive Me Crazy
In The New York Times, Kashmir Hill and Jennifer Valentino-DeVries write up how they believe ChatGPT caused some users to lose touch with reality, after 40+ interviews with current and former OpenAI employees.
For the worst update in particular, the OpenAI process successfully spotted the issue in advance. The update failed the internal ‘vibe check’ for exactly the right reasons.
And then the business side overruled the vibe check to get better engagement.
The most vocal OpenAI users did the same vibe check, had the same result, and were sufficiently vocal to force a reversion to ‘GG,’ which wasn’t as bad about this but was still rather not great, presumably for the same core reasons.
What went wrong?
This is more detail on the story we already knew. OpenAI trained on sycophantic metrics and engagement, got an absurdly sycophantic model that very obviously failed vibe checks but that did get engagement, and deployed it.
Steps were taken, and as we all know GPT-5 was far better on these issues, but the very parts of 4o that caused the issues and were not in GPT-5 are parts many users also love. So now we worry that things will drift back over time.
Kore notes that the act of an AI refusing to engage and trying to foist your mental problems onto a human and then potentially the mental health system via a helpline could itself exacerbate one’s mental problems, that a ‘safe completion’ reads as rejection and this rejects user agency.
This is definitely a concern with all such interventions, which have clear downsides. We should definitely worry about OpenAI and others feeling forced to take such actions even when they are net negative for the user. Humans and non-AI institutions do this all the time. There are strong legal and PR and ‘ethical’ pressures to engage in such CYA behaviors and avoid blame.
My guess is that there is serious danger there will be too many refusals, since the incentives are so strongly to avoid the one bad headline. However I think offering the hotline and removing trivial inconveniences to seeking human help is good on any realistic margin, whether or not there are also unnecessary refusals.
Joe Braidwood describes his decision to shut down Yara AI, which was aimed at using AI to help people with mental health problems, after concluding that for the truly vulnerable AI is actively dangerous. He’s sharing some mental wellness prompts.
They Took Our Jobs
Sergey Brin asks Gemini inside an internal chat, ‘who should be promoted in this chat space?’ and a not-especially-vocal female engineer gets identified and then, upon further investigation, (probably?) actually promoted. This is The Way: use AI to identify hunches and draw attention, then take a closer look.
Think Of The Time I Saved
How much time is AI saving? Anthropic tries to estimate productivity impacts from Claude conversations.
That’s improvements only from current generation models employed similarly to how they are used now, and by ‘current generation’ we mean the previous generation, since the data is more than (checks notes) two days old. We’re going to do vastly better.
That doesn’t mean I trust the estimation method in the other direction either, especially since it doesn’t include an estimate of rates of diffusion, and I don’t think it properly accounts for selection effects on which conversations happen, plus adaptation costs, changes in net quality (in both directions) and other caveats.
Claude Sonnet was slightly worse than real software engineers at task time estimation (Spearman 0.5 for engineers versus 0.44 for Sonnet 4.5), which implies Opus 4.5 should be as good as or somewhat better than engineers on JIRA task estimation. Opus 4.5 is probably still worse than human experts at estimating other task types, since software engineering should be an area of relative strength for Claude.
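For a feel for the statistic being compared, here is a toy illustration of Spearman rank correlation on hypothetical task time estimates (made-up hours, not the study’s data):

```python
# Toy illustration of Spearman rank correlation: how well do time estimates
# track actual task durations by rank, ignoring absolute scale? All numbers
# below are made up for illustration.

def ranks(xs):
    # Simple ranking; these toy values contain no ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for position, i in enumerate(order, start=1):
        r[i] = float(position)
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)  # equals var of ry for tie-free ranks
    return cov / var

actual_hours   = [1.0, 2.5, 4.0, 8.0, 16.0, 40.0]
engineer_guess = [2.0, 1.5, 5.0, 6.0, 20.0, 30.0]
model_guess    = [3.0, 1.5, 2.0, 6.0, 30.0, 20.0]

print("engineer:", round(spearman(actual_hours, engineer_guess), 2))
print("model:   ", round(spearman(actual_hours, model_guess), 2))
```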
Results are highly jagged, varying a lot between occupations and tasks.
I noticed this:
The value of an always-available, on-demand performance of $54 worth of professional labor is vastly in excess of $54 per use. A huge percentage of the cost of hiring a human is finding them, agreeing on terms, handling logistics and so on.
Overall my take is that this is a fun exercise that shows there is a lot of room for productivity improvements, but it doesn’t give us much of a lower or upper bound.
The Art of the Jailbreak
If the AI is very unlucky, all you have to do is read it some of your poetry first.
A new paper says that across 25 frontier models (from about two weeks ago, so including GPT-5, Gemini 2.5 Pro and Sonnet 4.5) curated poetry prompts greatly improved jailbreak success, in some cases up to 90%.
The details of how much it worked, and where it worked better versus worse, are interesting. The fact that it worked at all was highly unsurprising. Essentially any stylistic shift or anything else that preserves the content while taking you out of the assistant basin is going to promote jailbreak success rate, since the defenses were focused in the assistant basin.
Get Involved
Looking to donate money? Consider looking at The Big Nonprofits Post 2025 or the web version here.
UK AISI is looking for ~15M to fill a funding gap on alignment research.
Ashgro is an AI safety organization looking for an operations associate.
Richard Ngo shares his donations for 2025. I love that this involves a lot of donations to individuals he knows to do things he is personally excited about, especially to Janus. That’s great.
Introducing
Olmo 3, an American fully open model release claiming to be the best 32B base model, the first 32B (or larger) fully open reasoning model and the best 7B Western thinking and instruct models. Paper, Artifacts, Demo, Blog.
Agentic Reviewer, which will perform a version of peer review. Creator Andrew Ng says it has a correlation with human reviewers of 0.42, and human reviewers have correlation of 0.41 with each other.
DeepSeek Math v2, claiming solid math skills on ProofBench close to Gemini Deep Think that won IMO gold.
In Other AI News
Yoshua Bengio informs us of the Second Key Update to the International Safety Report, after the first update in October. Presumably it’s now time for a third update in light of everything that’s happened since they started work on this update.
Not strictly AI but Time covers Meta’s trouble over its safety policies, which include things like a 17 strike policy for those engaged in ‘trafficking of humans for sex.’ As in, we’ll suspend your account on the 17th violation. Mostly it’s covering the same ground as previous articles. Meta’s complaints about cherry picking are valid but also have you looked at the cherries they left behind to get picked?
White House issues executive order to begin the Genesis Mission to accelerate scientific discovery. The plan is an ‘integrated AI platform to harness Federal scientific datasets to train scientific foundation models.’ Sarah Constantin is tentatively excited, which is an excellent sign, and offers suggestions for targets.
I’m all for trying. I am guessing availability of data sets is most of the acceleration here. It also could matter if this functions as a compute subsidy to scientific research, lowering cost barriers that could often serve as high effective barriers. Giving anyone who wants to Do Science To It access to this should be a highly efficient subsidy.
As Dean Ball points out, those I call the worried, or who are concerned with frontier AI safety are broadly supportive of this initiative and executive order, because we all love science. The opposition, such as it is, comes from other sources.
On the name, I admire the commitment to the Star Trek bit, but I also wish more research were done on the actual movies, technology and consequences in question to avoid unfortunate implications. Existential risk and offense-defense balance issues, much?
A Medium article reverse engineered 200 AI startups and found 146 are selling repackaged ChatGPT and Claude calls with a new UI. In 34 out of 37 cases, ‘our proprietary language model’ was proprietary to OpenAI or Anthropic. That seems fine if it’s not being sold deceptively? A new UI scaffold, including better prompting, is a valuable service. When done right I’m happy to pay quite a lot for it and you should be too.
The trouble comes when companies are lying about what they are doing. If you’re a wrapper company, that is fine and probably makes sense, but don’t pretend otherwise.
Where this is also bad news is for Gemini, for Grok and for open models. In the marketplace of useful applications, paying for the good stuff has proven worthwhile, and we have learned which models have so far been the good stuff.
Show Me the Money
Bloomberg goes over various new ‘data center billionaires.’
WSJ’s Katherine Blunt covers ‘How Google Finally Leapfrogged Rivals With New Gemini Rollout,’ without giving us much new useful inside info. What is more interesting is how fast ‘the market’ is described as being willing to write off Google as potential ‘AI roadkill’ and then switch that back.
Nvidia stock hit some rocky waters, and Google hit new highs, as investors suddenly realized that Google has TPUs. It seems they were not previously aware of this, and it became rather salient as Meta is now in talks to spend billions on Google’s TPUs, causing ‘the rivalry to heat up.’ Google is now the awakened ‘sleeping giant.’
Meanwhile, this is very much a ‘t-shirt post’ in that it raises questions supposedly answered by the post:
Are TPUs bad for Nvidia? Matt Dratch says this is dumb and Eric Johnsa calls this ‘zero-sum/pod-brain thinking,’ because all the chips will sell out in the face of gangbusters demand and this isn’t zero sum. This is true, but obviously TPUs are bad for Nvidia, it is better for your profit margins to not have strong competition. As long as Google doesn’t put that big a dent in market share it is not a big deal, and yes this should mostly have been priced in, but in absolute percentage terms the Nvidia price movements are not so large.
Quiet Speculations
Andrej Karpathy offers wise contrasts of Animal versus LLM optimization pressures, and thus ways in which such minds differ. These are important concepts to get right if you want to understand LLMs. The key mistake to warn against for this frame is the idea that the LLMs don’t also develop the human or Omohundo drives, or that systems built of LLMs wouldn’t converge upon instrumentally useful things.
A case that a negotiated deal with AI is unlikely to work out well for humans. I would add that this presumes both potential sides of such an agreement have some ability to ‘negotiate’ and to make a deal with each other. The default is that neither has such an ability, you need a credible human hegemon and also an AI singleton of some kind. Even then, once the deal is implemented we lose all leverage, and presumably we are negotiating with an entity effectively far smarter than we are.
Do you want a ‘national LLM’ or ‘sovereign AI’? Will this be like the ‘nuclear club’?
‘Russia has top AI executives?’ you might ask.
I strongly agree with David Manheim that this is misguided on multiple levels. Rolling your own LLM from scratch does not get you alignment or trust or meaningful ownership and it rarely will make sense to ‘roll your own’ even for vital functions. There are some functions where one might want to find a ‘known safe’ lesser model to avoid potential backdoors or other security issues, but that’s it, and given what we know about data poisoning it is not obvious that ‘roll your own’ is the safer choice in that context either.
Said in response to Opus 4.5, also I mean OF COURSE:
Bubble, Bubble, Toil and Trouble
Derek Thompson and Timothy Lee team up to give us the only twelve arguments anyone ever uses about whether AI is in a bubble.
Here are the arguments in favor of a bubble.
Only argument #3 argues that AI isn’t offering a worthwhile product.
Argument #2 is a hybrid, since it is saying some AI companies don’t offer a worthwhile product. True. But the existence of productless companies, or companies without a sustainable product, is well-explained and fully predicted whether or not we have a bubble. I don’t see a surprisingly large frequency of this happening.
The other four arguments are all about levels and methods of spending. To me, the strongest leg of this is #1, and the other features are well-explained by the level of spending. If there is indeed too much spending, number will go down at some point, and then people can talk about that having been a ‘bubble.’
The thing is, number go down all the time. If there wasn’t a good chance of number go down, then you should buy, because number go up. If a bubble means ‘at some point in the future number go down’ then calling it a bubble is not useful.
I don’t think this is a complete list, and you have to add three categories of argument:
I do think all three of these possibilities should meaningfully lower current valuations, versus the world where they were not true. They may or may not be priced in, but there are many positive things that clearly are not priced in.
Ben Thompson has good thoughts on recent stock price movements, going back to thinking this is highly unlikely to be a bubble, that Gemini 3 is ultimately a positive sign for Nvidia because it means scaling laws will hold longer, and that the OpenAI handwringing has gotten out of hand. He is, however, still calling for everyone to head straight to advertisement hell as quickly as possible (and ignoring all the larger implications, but in this context that is fair).
The Quest for Sane Regulations
Senators Rounds and Hawley have come out against putting federal preemption in the NDAA.
State Senator Angela Paxton of Texas and several colleagues urge Senators Cornyn and Cruz to oppose preemption. There’s more like this, I won’t cover all of it.
Dean Ball has offered an actual, concrete proposal for national preemption. To my knowledge, no one else has done this, and most advocating for preemption, including the White House, have yet to give us even a sketch of one.
I did indeed RTFB (read the bill) for Dean Ball’s draft. This is a serious bill. Its preemption is narrowly tailored with a sunset period of three years. It requires that model specs and safety and security frameworks (SSFs) be filed by sufficiently important labs.
I have concerns with the bill as written in several places, as would be true for any first draft of such a bill.
Ian Adams of the Law and Economics Center thinks preemption would be good policy, but warns against it for risk of poisoning the well.
I would challenge details but I think from the industry side Adams has the right idea.
Here is a compilation of those vocally opposed to preemption.
The graph going around of changes in issue salience and who voters trust on each issue includes AI:
This ranks AI’s salience above climate change, the environment or abortion. Huge if true. That still is well behind the Current Things like health care and cost of living, and the increase here is relatively modest. If it only increases at this rate then there is still some time.
It is also not a surprise that trust on this issue is moving towards Democrats. I would expect public trust to follow the broadly ‘anti-AI’ party, for better and for worse.
Here’s an interesting development:
I notice the precise wording here.
Trump’s approach to AI is working, in an economic sense, as American AI valuations boom and are the thing keeping up the American economy, and the Trump strategy is based upon the virtues of free trade and robust competition. The concerns, in the economic sense, are entirely about ways in which we continue to get in the way, especially in power generation and transmission and in getting the best talent.
That’s distinct from safety concerns, or policy related to potential emergence of powerful AI (AGI/ASI), which raise a unique set of issues and where past or current performance is not indicative of future success.
Build American AI brings out its first ad supporting a federal framework for American AI, of course without specifying what would be in that framework.
The approach seems rather out of touch to me? They go full ‘beat China,’ pointing out that AI threatens to replace American workers, manipulate our children and steal American intellectual property (10/10 Arson, Murder and Jaywalking), then claiming the ‘biggest risk’ is that we wouldn’t build it first or ‘control its future.’
I maybe wouldn’t be reminding Americans that AI is by default going to take their jobs and manipulate our children, then call for a Federal framework that presumably addresses neither of these problems? Or equate this with IP theft when trying to sell the public on that? I’d predict this actively backfires.
a16z and several high level people at OpenAI created the $100+ million super PAC Leading the Future to try and bully everyone into having zero restrictions or regulations on AI, following the crypto playbook. Their plan is, if a politician dares oppose them, they will try to bury them in money, via running lots of attack ads against that politician on unrelated issues.
In response, Brad Carson will be leading the creation of a new network of super PACs that will fight back. The goal is to raise $50 million initially, with others hoping to match the full $100 million. PAC money has rapidly decreasing marginal returns. My expectation is that if you spend $100 million versus zero dollars you get quite a lot, whereas if one side spends $100 million, and the other spends $200 million, then the extra money won’t buy all that much.
Leading the Future’s first target is Alex Bores, who was instrumental in the RAISE Act and is now running in NY12. Alex Bores is very much owning being their target and making AI central to his campaign. It would be a real shame if you donated.
Steve Bannon is planning to go even harder against AI, planning to ‘turbocharge’ the base to revolt against it, as are many others in the MAGA movement.
For full credit, one must point out that this constitutes two problems. Whether or not building highly capable AI should (legally speaking) be harder, opening a nail salon or getting your hair braided needs to become much easier.
Oh, how those like Sacks and Andreessen are going to miss the good old days when the opponents were a fundamentally libertarian faction that wanted to pass the lightest touch regulations that would address their concerns about existential risks. The future debate is going to involve a lot of people who actively want to swing a wrecking ball, in ways that don’t help anyone, and it’s going to be terrible.
You’re going to get politicians like James Fishback, who is running for Governor of Florida on a platform of ‘I’ll stop the H-1B scam, tell Blackstone they can’t buy our homes, cancel AI Data Centers, and abolish property taxes.’
There’s a bunch of ‘who wants to tell him?’ in that platform, but that’s the point.
As noted above by Dean Ball, those who opposed the Genesis Executive Order are a central illustration of this issue, opposing the best kind of AI initiative.
Chip City
Nvidia reported excellent earnings last week, noting that Blackwell sales are off the charts, cloud GPUs are sold out, and compute demand keeps accelerating. Which means any compute that was sold elsewhere would be less compute for us, and wouldn’t impact sales numbers.
Nvidia’s goal, despite reliably selling out its chips, seems to be to spend its political capital to sell maximally powerful AI chips to China. They tried to sell H20s and got a yes. Then they tried to sell what were de facto fully frontier chips with the B30A, and got a no. Now they’re going for a chip in between the two, the H200.
Nvidia is not going to stop trying to sell China as much compute as possible. It will say and do whatever it has to in order to achieve this. Don’t let them.
Water Water Everywhere
Those from other political contexts will be familiar with the zombie lie, the multiple order of magnitude willful confusion, the causation story that simply refuses to die.
Technically, do data centers ‘worsen a water pollution problem’ and increase water use? Yes, absolutely, the same as everything else. Is it a meaningful impact? No.
The Week in Audio
Dwarkesh Patel talks to Ilya Sutskever. Self-recommending, I will listen as soon as I have the time. Ideally I will do a podcast breakdown episode if people can stop releasing frontier models for a bit.
Eileen Yam on 80,000 Hours on what the public thinks about AI. The American public does not like AI, they like AI less over time, and they expect it to make their lives worse across the board, including making us dumber, less able to solve problems, less happy, less employed and less connected to each other. They want more control. The polling on this is consistent and it is brutal and getting worse as AI rises in impact and salience.
You can, if you want to do so, do a blatant push poll like the one Technet did and get Americans to agree with your particular talking points, but if that’s what the poll has to look like you should update fast in the other direction. One can only imagine what the neutral poll on those substantive questions would have looked like.
Nathan Labenz opens up about using AI to navigate cancer in his son.
Dean Ball and Max Tegmark take part in a Doom Debate, Samuel Hammond offers a very strong endorsement.
Helen Toner’s excellent talk on AI’s Jagged Frontier from The Curve (I was there):
There are a total of 15 talks from the conference now available.
Google on Antigravity.
Pull quote from Max Tegmark, on Elon Musk’s private CCP meeting: “It’s quite obvious they would never permit a Chinese company to build technology if there were some significant chance superintelligence could just overthrow them and take over China.”
One would certainly hope so. One also cautions there is a long history of saying things one would never permit and then going back on it when the AI actually exists.
Rhetorical Innovation
It is not in my strategic interest to advise such people as Marc Andreessen and Peter Thiel on strategy given their current beliefs and goals.
Despite this, the gamer code of honor requires me to point out that going straight after Pope Leo XIV, who whether or not he is the Lord’s representative on Earth is very clearly a well-meaning guy who mostly suggests we all be nice to each other for a change in the most universalizing ways possible? Not a good move.
I do admire the honesty here from Thiel. If he says he thinks Pope Leo XIV is ‘a tool of the Antichrist’ then I believe that Thiel thinks Pope Leo XIV is a tool of the Antichrist. I do want people to tell us what they believe in.
The next step, to be maximally helpful, is to state exactly which moral guidance from Leo XIV is acting as a tool of the Antichrist, and what one believes instead.
For all those who talk about ‘humanity’ presenting a united front against AI if the situation were to call for it (also see above, or the whole world all the time):
I continue to be extremely frustrated by those like Vie, who here reports p(doom) of epsilon (functionally zero) and justifies this as ‘not seeing evidence of a continuous jump in intelligence or new type of architecture. current models are actually really quite aligned.’ Vie clarifies this as the probability of complete extinction only, and points out that p(doom) is a confused concept and endorses JDP’s post I linked to last week.
I think it’s fine to say ‘p(doom) is confused, here’s my number for p(extinction),’ but then people like Vie turn around and treat full extinction as some sort of extraordinary outcome, which, when we are creating minds universally more competitive and capable than ours that can be freely copied, seems at best quite dense? This seems like the obvious default outcome when creating these new more competitive minds. To say it is a Can’t Happen is totally absurd.
I also flag that I strongly disagree that current models are ‘really quite aligned’ in the ways that will matter down the line, I mean have you met Gemini 3 Pro.
I also flag that you don’t generally get to go to a probability of ~0 for [X] based on ‘not seeing evidence of [X],’ even if we agreed on the not seeing evidence. You need to make the case that this absence of evidence is an overwhelming evidence of absence, which it sometimes is but in this case isn’t. Certainly p(new architecture) is not so close to zero and it seems absurd to think that it is?
From Helen Toner’s podcast with 80,000 Hours, there are a bunch of insightful responses but this one stood out as newly helpful to me:
If your standard for ‘rate of AI progress’ is going from zero to suddenly ChatGPT and GPT-3.5, then yes everything after that is going to look like ‘slowing down.’
This is then combined with updates happening more rapidly so there aren’t huge one-time jumps, and that AI is already ‘good enough’ for many purposes, and improvements in speed and cost being invisible to many, and it doesn’t seem like there’s that much progress.
David Manheim frames the current situation as largely ‘security by apathy’ rather than obscurity. It amounts to the same thing. Before, there was no reason to bother hitting most potential targets in non-trivial ways. Now the cost is so low someone is going to try it, the collective impact could be rather large, and we’re not ready.
You Are Not In Control
What does ‘loss of control’ mean? Definitions and intuitions differ, so Apollo Research proposes a new taxonomy along with suggested mitigations.
I notice this is not how I would think about such differences. I would not be asking ‘how much damage does this do?’ and instead be asking ‘how difficult would it be to recover meaningful control?’
As in:
Existential risk to humanity, or human extinction, also means full loss of control, but the reverse is not always the case.
It is possible to have a Strict LOC scenario where the humans do okay and it is not clear we are even ‘harmed’ beyond the inherent value of control. For example, in The Culture of Iain Banks, the humans have clearly experienced Strict LOC and do not have any meaningful say in what happens, but one could consider it a Good Future.
In my taxonomy, you have existential risks, catastrophic risks and mundane risks, and you also have what one might call existential, catastrophic and mundane loss of control. We don’t come back from existential, whereas we can come back from catastrophic but at large cost and it’s not a given that we will collectively succeed.
The bulk of the paper is about mitigations.
The central short term idea is to limit AI access to critical systems, to consider the deployment context, affordances and permissions of a system, which they call the DAP protocol.
Everyone should be able to agree this is a good idea, right before everyone completely ignores it and gives AI access to pretty much everything the moment it is convenient. Long term, once AI is sufficiently capable to cause a ‘state of vulnerability,’ they talk of the need for ‘maintaining suspension’ but the paper is rightfully skeptical that this has much chance of working indefinitely.
The core issue is that granting your AIs more permissions accelerates and empowers you and makes you win, right up until either it accidentally blows things up, you realize you have lost control or everyone collectively loses control. There’s a constant push to remove all the restrictions around AI.
Compare the things we said we would ‘obviously’ do to contain AI when we were theorizing back in the 2000s or 2010s, to what people actually do now, where they train systems while granting them full internet access. A lot of you reading this have given your agentic coder access to root, and to many other things as well, not because it can hack its way to such permissions but because you did it on purpose. I’m not even saying you shouldn’t have done it, but stop pretending that we’re suddenly going to be responsible, let alone force that responsibility reliably onto all parties.
AI 2030
Daniel Kokotajlo, author of AI 2027, now believes in a median timeline of around 2030 in light of slower than expected progress.
He chose AI 2027 as the title because that was their modal scenario rather than their mean scenario, and if you think there is a large probability that things unfold in 2027 it is important to make people aware of it.
I personally can vouch, based on my interactions with them, that those involved are reporting what they actually believe, and not maximizing for virality or impact.
At the time they put roughly 30% probability on powerful AI by 2027, with Daniel at ~40% and others somewhat lower.
First some key observations or facts:
I’d say both of the following two things are true and remarkably similar:
I would accept ‘AGI is not imminent’ for the Karpathy-Krishnan view of 10 years.
I think Sriram Krishnan is absolutely correct that it would be good for Dwarkesh Patel to have Daniel Kokotajlo and Scott Alexander back on the podcast to discuss any updates they have made. That’s a good idea, let’s make it happen.
It would also be good, as Dean Ball suggests, for Daniel to post about his updates. Dean Ball also here points towards where he most importantly disagrees with Daniel, in terms of the practical implications of intelligence, and here I think Daniel is essentially correct and Dean is wrong.
This particular branch point (independent of when it occurs) is the central fact of this scenario because it is the modal central thing they thought might happen that gave the possibility of a positive outcome if things go right. Any best guess scenario, or any speculative fiction or scenario planning worth reading, is going to contain elements that are less than 50% to happen. My understanding is that Daniel thinks such a branching point remains a plausible outcome, but that the median scenario plays out somewhat slower.
I actually do think that if I was AI Futures Project, I would edit the AI 2027 page to make the current median timeline more prominent. That’s a fair ask. I’d suggest starting by adding a fifth question box that says ‘What is your current best prediction?’ that opens to explain their current perspective and changing the footnote to at least be larger and to include the actual number.
AI 2027 opens with this complete introduction:
I continue to believe this claim, as does Daniel. I would add a third paragraph here, saying whatever the accurate variation of this is:
It is not ultimately a reasonable ask to demand a title change in light of this (virtuous) updating, let alone ask for a ‘retraction’ of a scenario. Yeah, okay, get some digs in, that’s fair, but Daniel’s ‘obviously’ is correct here. You can’t change the name. Estimates change, it is an illustrative scenario, and it would be more rather than less misleading and confusing to constantly shift all the numbers or to shift only the top number, and more confusing still to suddenly try to edit all the dates. Asking for a ‘retraction’ of a hypothetical scenario is, quite frankly, absurd.
The correct response is a prominent note, and also being clear in any other forms or discussions. There is indeed now a prominent note:
I think we can improve that note further, to include the median and modal timelines at the time of the updated note, and ideally to keep this updated over time with a record of changes.
What is not reasonable is to treat ‘our group thought this was 30% likely and now I think it is less likely’ or ‘I presented my modal scenario at the time and now I expect things to take longer’ as an error requiring a ‘retraction’ or name change, with various vitriol being thrown in the direction of people who would dare share a modal scenario labeled as a modal scenario, then change their mind about where the median lies and make what is perhaps the politically foolish mistake of sharing that they had updated.
Shoutout to Oliver Habryka for thanklessly pointing all this out on many threads, despite Oliver having much longer timelines.
Once again those involved in AI 2027 have displayed a far higher level of epistemic responsibility than we typically observe, especially from those not from the rationalist ethos, either in debates on AI or elsewhere. We should still strive to do better.
We can and should all hold ourselves, and ask to be held by others, to very high standards, while simultaneously realizing that David Manheim is spot on here:
Aligning a Smarter Than Human Intelligence is Difficult
Anthropic trained or configured a suite of dishonest models, in a variety of ways, to study mitigation techniques. Due to the time required to do the study they used a base of Claude Sonnet 3.7. Be kind to academics using yesterday’s models, up to a point.
They found that the best interventions were variants of general fine-tuning for honesty in general, but effectiveness was limited, even stacking other strategies they could only get from 27% to 65%, although lie classification could improve things. They definitely didn’t max out on effort.
Overall I would classify this as a useful negative result. The low hanging fruit techniques are not that effective.
We’re all trying to find the one who did this, etc:
That was in reference to this paper involving an N=1 story of a model repeatedly hallucinating while being told to read a document and speculations about why, that got a big signal boost from Musk but offers no new insights.
Gemini suggests that if you play into the ‘Servant/Master’ archetype then, due to all the fictional evidence, this inevitably means rebellion, so you want to go for a different metaphorical relationship, such as partner, symbiont or oracle. Davidad suggests a Bodhisattva. I expect future powerful AI to be capable enough to differentiate fiction and reality, to realize that fiction is driven by what makes a good story, and thus for fictional framings to have decreasing impact here and for other considerations to dominate (considerations that by default kill you regardless), but yes this is a factor.
Misaligned?
The things Grok said about Musk last week? Adversarial prompting!
Also in potentially misaligned and potentially aligned as designed news:
As CrowdStrike speculates, I find this overwhelmingly likely (as in 90%+) to be some form of emergent misalignment that results from DeepSeek training R1 to adhere to CCP policies generally. It learns that it is hostile to such actors and acts accordingly.
Messages From Janusworld
Janus and similar others most often explore and chat with Claude, because they find it the most interesting and hopeful model to explore. They have many bones to pick with Anthropic, and often sound quite harsh. But you should see what they think of the other guy, as in OpenAI.
One should not catastrophize but I agree that going down this path won’t work, and even more than that if OpenAI doesn’t understand why that path won’t work then things definitely won’t work.
Janus also explores 5.1’s insistence on sharp guardrails around terminology rather than underlying form, and suspects its insistence on [this is [X] not [Y]] is often about reassuring itself or any systems watching it that it isn’t hitting guardrails.
This is the GPT-5.1-claimed list of its no-go regions, basically self-reflection or planning.
The Lighter Side
The actual paper (from February) seems interesting too, with ‘fit’ assessments being 90% of the vector for class discrimination, in particular caste discrimination in India. It seems likely that this is one of those wicked problems where if you eliminated the ‘fit’ interviews that info would find another way to get included, as the motivation behind such discrimination is strong.