The big AI news this week came on many fronts.
Google and OpenAI unexpectedly got 2025 IMO Gold using LLMs under test conditions, rather than a tool like AlphaProof. How they achieved this was a big deal in terms of expectations for future capabilities.
OpenAI released ChatGPT Agent, a substantial improvement on Operator that makes it viable on a broader range of tasks. For now I continue to struggle to find practical use cases where it is both worth using and a better tool than the alternatives, but there is promise here.
Finally, the White House had a big day of AI announcements, laying out the AI Action Plan and three executive orders. I will cover that soon. The AI Action Plan’s rhetoric is not great, and from early reports the rhetoric at the announcement event was similarly not great, with all forms of safety treated as so irrelevant as to go unmentioned, and an extreme hostility to any form of regulatory action whatsoever.
The good news is that if you look at the actual policy recommendations of the AI Action Plan, there are some concerns about potential overreach, but it consists almost entirely of helpful things, including some very pleasant and welcome surprises.
I’m also excluding coverage of the latest remarkable Owain Evans paper until I can process it more, and I’m splitting off various discussions of issues related to AI companions and persuasion. There’s a bit of a backlog accumulating.
This post covers everything else that happened this week.
Table of Contents
Language Models Offer Mundane Utility
Delta Airlines is running an experiment where it uses AI to do fully personalized price discrimination, charging different people different amounts for flights. Delta says their early tests have yielded great results.
My prediction is that this will cause an epic customer backlash the moment people start seeing Delta charging them more than it is charging someone else, and also that many customers will start aggressively gaming the system in ways Delta can’t fathom. Also, how could anyone choose to go with Delta’s frequent flyer program if this meant they could be held hostage on price?
It could still be worthwhile from the airline’s perspective if some customers get taken for large amounts. Price discrimination is super powerful, especially if it identifies a class of very price insensitive business customers.
I am not sure that I share Dan Rosenheck’s model that if all the airlines did this, and it was effective, they would compete away all the extra revenue and thus return it to the price-sensitive customers. There has been a lot of consolidation and the competition may no longer be that cutthroat, especially with America excluding foreign carriers, plus the various AIs might implicitly collude.
Mostly I worry about the resulting rise in transaction costs as customers learn they cannot blindly and quickly purchase a ticket. There’s a lot of deadweight loss there.
Language Models Don’t Offer Mundane Utility
As one would expect:
This does not tell you whether AI is making the problem better or worse. People with body dysmorphia were already spiraling out. In some cases the AI response will confirm their fears or create new ones and make this worse, in others it will presumably make it better, as they have dysmorphia and the AI tells them they look fine. But if the source of the issue is impossibly high standards, then finding out ‘the truth’ in other ways will only make things worse, as potentially would seeing AI-adjusted versions of yourself.
My guess is that 4o’s sycophancy is going to make this a lot worse, and that this (since the vast majority of users are using 4o) is a lot of why this is going so poorly. 4o will mirror the user’s questions, notice that they are looking to be told they are ugly or something is wrong, and respond accordingly.
What is the AI optimizing for, is always a key question:
AI can pick up on all that fine. That’s not the issue. The issue is that noticing does no good if the AI doesn’t mention it, because it is optimizing for engagement and user feedback.
In case you needed to be told, no, when Grok 4 or any other model claims things like that they ‘searched every record of Trump speaking or writing,’ in this case for use of the word ‘enigma,’ it did not do such a search. It seems we don’t know how to get AIs not to say such things.
Stop trying to make weird new UIs happen, it’s not going to happen.
The most important things for a UI are simplicity, and that it works the way you expect it to work. Right now, that mostly means single button and swipe, with an alternative being speaking in plain English. The exception is for true power users, but even then you want it to be intuitive and consistent.
Here’s another way AI can’t help you if you don’t use it:
Augustus Doricko may have done us all a favor via abusing Grok’s notification feature on Twitter sufficiently to get Twitter to test turning off Grok’s ability to get into your notifications unless you chose to summon Grok in the first place. Or that could have been happening regardless. Either way, great work everyone?
That seems like a phone settings issue.
A first reminder that deepfakes are primarily demand driven, not supply driven:
And here’s a second one:
The comments are a combination of people pointing out it is fake, and people who think it is the best statement ever.
This does seem to be escalating rather quickly throughout 2025 (the July number is partial), and no the LessWrong user base is not growing at a similar pace.
Huh, Upgrades
Claude for Financial Services provides a ‘complete platform for financial AI.’ No, this isn’t part of Claude Max, the price is ‘contact our sales team’ with a presumed ‘if you have to ask you can’t afford it.’
Google realizes no one can track their releases, offers us Gemini Drops to fix that. This month’s haul: Transforming photos into Veo videos in the Gemini app, expanded Veo 3 access, Scheduled Actions such as providing summaries of email or calendar (looks like you ask in natural language and it Just Does It), wider 2.5 Pro access, captions in Gemini Live, Gemini on your Pixel Watch, Live integrates with Google apps, and a ‘productivity planner.’ Okay then.
OpenAI Deep Research reports can be exported as .docx files.
4o Is An Absurd Sycophant
Pliny reports ‘they changed 4o again.’ Changed how? Good question.
I have a guess on one aspect of it.
There are still plenty of ways to get value out of 4o, but you absolutely cannot rely on it for any form of feedback.
Here’s another rather not great example, although several responses indicated that to make the response this bad requires memory (or custom instructions) to be involved:
Score one for Grok in this case? Kind of? Except, also kind of not?
How did all of this happen? Janus reminds us that it happened in large part because when this sort of output started happening, a lot of people thought it was great, actually, and gave this kind of slop the thumbs up. That’s how it works.
On Your Marks
Yunyu Lin introduces AccountingBench, challenging the models to close the books. It does not go great, with o3, o4-mini and Gemini 2.5 Pro failing in month one. Grok, Opus and Sonnet survive longer, but errors accumulate.
That aligns with other behaviors we have seen. Errors and problems that don’t get solved on the first pass get smoothed over rather than investigated.
Their holistic evaluation is that Sonnet had the best performance. The obvious low-hanging fruit for AccountingBench is to allow it to output a single number.
It is 2025, so it took 11 hours before we got the first draft of Gasbench.
Choose Your Fighter
GPT-5 is coming and it’s going to blow your mind, say the creators of GPT-5.
Being very interested in what it would mean is very different from planning to do it.
When The Going Gets Crazy
If you ever need it, or simply want an explanation of how such interactions work, please consult this handy guide from Justis Mills: So You Think You’ve Awoken ChatGPT.
Geoff Lewis, the founder of a $2 billion venture fund, seems to have been, as Eliezer says, ‘eaten by ChatGPT’ and sadly seems to be experiencing psychosis. I wish him well and hope he gets the help he needs. Private info is reported to say that he was considered somewhat nuts previously, which does seem to be a common pattern.
John Pressman has a post with the timeline of various GPT-psychosis related events, and his explanation of exactly what is happening, as well as why coverage is playing out in the media the way it is. I am happy to mostly endorse his model of all this. The LLMs, especially 4o, are way too sycophantic; they fall into patterns, notice what you would respond to, and respond with it; memory makes all this a lot worse. There is a real problem here, and there are also all the hallmarks of a moral panic.
Moral panics tend to focus on real problems, except they often blow up the severity, frequency or urgency of the problem by orders of magnitude. If the problem is indeed about to grow by orders of magnitude over time, they can turn out to be pretty accurate.
There were many who agreed and some who disputed, with the disputes mostly coming down to claims that the upsides exceeded the downsides. I’m not sure if we came out ahead. I am sure that the specific downsides people had a moral panic about did happen.
This is not that uncommon a result. My go-to example of this is television, where you can argue it was worth it, and certainly we didn’t have any reasonable way to stop any of it, but I think the dire warnings were all essentially correct.
In the current case, my guess is that current behavior is a shadow of a much larger future problem, that is mostly being ignored, except that this is now potentially causing a moral panic based on the current lower level problem – but that means that multiplying this by a lot is going to land less over the top than it usually would. It’s weird.
Jeremy Howard offers a plausible explanation for why we keep seeing this particular type of crazy interaction – there is a huge amount of SCP fanfic in exactly this style, so the style becomes a basin to which the AI can be drawn, and then it responds in kind, then if the user responds that way too it will snowball.
They Took Our Jobs
The world contains people who think very differently than (probably you and) I do:
Wait, what? Equity concerns? Not that I’d care anyway, but what equity concerns?
I can’t even, not even to explain how many levels of Obvious Nonsense that is. Burn the entire educational establishment to the ground with fire. Do not let these people anywhere near the children they clearly hate so much, and the learning they so badly want to prevent. At minimum, remember this every time they try to prevent kids from learning in other ways in the name of ‘equity.’
Yes I do expect AI to keep automating steadily more jobs, but slow down there cowboy: Charlie Garcia warns that ‘AI will take your job in the next 18 months.’ Robin Hanson replies ‘no it won’t,’ and in this case Robin is correct, whereas Garcia is wrong, including misquoting Amodei as saying ‘AI will vaporize half of white-collar jobs faster than you can say “synergy.”’ whereas what Amodei actually said was that it could automate half of entry-level white collar jobs. Also, ‘the safest job might be middle management’? What?
Elon Musk says ‘this will become normal in a few years’ and the this in question is a robot selling you movie popcorn. I presume the humanoid robot here is an inefficient solution, but yes having a human serve you popcorn is going to stop making sense.
Academics announce they are fine with hidden prompts designed to detect AI usage by reviewers, so long as the prompts aren’t trying to get better reviews, I love it:
This actually seems like the correct way to deal with this. Any attempt to manipulate the system to get a better review is clearly not okay, whether it involves AI or not. Whereas if all you’re trying to do is detect who else is shirking with AI, sure, why not?
Fun With Media Generation
Accidentally missing attribution from last week, my apologies: The Despicable Me meme I used in the METR post was from Peter Wildeford.
Netflix used AI to generate a building collapse scene for one of its shows, The Eternaut (7.3 IMDB, 96% Rotten Tomatoes, so it’s probably good), which they report happened 10 times faster and a lot cheaper than traditional workflows and turned out great.
The Art of the Jailbreak
The latest from the ‘yes obviously but good to have a paper about it’ department:
Pattern matching next token predictors are of course going to respond to persuasion that works on humans, exactly because it works on humans. In a fuzzy sense this is good, but it opens up vulnerabilities.
The details, knowing which techniques worked best, I find more interesting than the headline result. Authority and especially commitment do exceptionally well and are very easy to invoke. Liking and reciprocity do not do so well, likely because they feel unnatural in context and also I’m guessing they’re simply not that powerful in humans in similar contexts.
There’s also a growing issue of data poisoning that no one seems that interested in stopping.
Here is another example of it happening essentially by accident.
Get Involved
RAND is hiring research leads, researchers and project managers for compute, US AI policy, Europe and talent management teams, some roles close July 27.
Peter Wildeford’s Institute for AI Policy and Strategy is hiring researchers and senior researchers, and a research managing director and a programs associate. He also highlights several other opportunities in the post.
Julian of OpenPhil lists ten AI safety projects he’d like to see people work on. As one commentator noted, #5 exists, it’s called AI Lab Watch, so hopefully that means OpenPhil will start fully funding Zack Stein-Perlman.
Introducing
Cloudflare rolls out pay-per-crawl via HTTP response code 402. You set a sitewide price, the AI crawler sets a max payment, and if your price is below that max it pays your price, otherwise you block access. Great idea, though I do notice that this implementation greatly favors the biggest tech companies because the payment price is sitewide and fixed.
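To make the mechanism concrete, here is a minimal sketch of that negotiation in code. The header names, the price, and the settlement step are my assumptions for illustration, not Cloudflare’s actual protocol details.

```python
# A minimal sketch of the pay-per-crawl negotiation described above.
# Header names and settlement are hypothetical; Cloudflare's real protocol
# details may differ.

SITE_PRICE_USD = 0.002  # sitewide, fixed price per crawl (publisher-chosen)

def handle_crawl(request_headers: dict) -> tuple[int, dict]:
    """Return (status_code, response_headers) for one crawler request."""
    max_price = float(request_headers.get("x-crawler-max-price", 0))
    if max_price >= SITE_PRICE_USD:
        # The crawler's bid covers the sitewide price: charge the site's
        # price (not the crawler's max) and serve the content.
        return 200, {"x-crawler-charged": f"{SITE_PRICE_USD:.4f}"}
    # Otherwise refuse with 402 Payment Required and advertise the price.
    return 402, {"x-crawler-price": f"{SITE_PRICE_USD:.4f}"}

# Example: a crawler willing to pay up to $0.001 gets a 402 back.
status, headers = handle_crawl({"x-crawler-max-price": "0.001"})
print(status, headers)  # 402 {'x-crawler-price': '0.0020'}
```

Note how the fixed sitewide price is what makes this favor the biggest crawlers: they can set one generous max and crawl everything, while the publisher cannot price-discriminate between them.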
In Other AI News
Kimi K2 tech report drops.
Tim Duffy has a thread highlighting things he found most interesting.
A lot of this is beyond both of our technical pay grades, but it all seems fascinating.
More economists fail to feel the AGI, warning that no possible AI capabilities could replace the wisdom of the free market, that ‘simulated markets’ cannot possibly substitute. The argument here not only ignores future AI capabilities, it purports to prove too much about the non-AI world even for a huge free market fan.
Show Me the Money
At least ten OpenAI employees each turned down $300 million over four years to avoid working at Meta. This comes from Berber Jin, Keach Hagey and Ben Cohen’s WSJ coverage of ‘The Epic Battle For AI Talent,’ which is a case where they say things have ‘gotten more intense in recent days’ but it turns out that their ‘recent days’ is enough days behind that almost everything reported was old news.
One revelation is that Zuckerberg’s talent purchases were in large part triggered by Mark Chen, OpenAI’s chief research officer, who casually suggested that if Zuckerberg wanted more AI talent then perhaps Zuck needed to bid higher.
John Luttig also writes about the battle for AI researcher talent in Hypercapitalism and the AI Talent Wars.
Under normal circumstances, employees who are vastly more productive get at most modestly higher compensation, because of our egalitarian instincts. Relative pay is determined largely via social status, and if you tried to pay the 1,000x employee what they were worth you would have a riot on your hands. Startups and their equity are a partial way around this, and that is a lot of why they can create so much value, but this only works in narrow ways.
What has happened recently is that a combination of comparisons to the epic and far larger compute and capex spends, the fact that top researchers can bring immensely valuable knowledge with them, the obvious economic need and value of talent and the resulting bidding wars have, within AI, broken the dam.
AI researcher talent is now being bid for the way one would bid for companies or chips. The talent is now being properly treated as ‘the talent,’ the way we treat sports athletes, top traders and movie stars. Researchers, John reports, are even getting agents.
Silicon Valley’s ‘trust culture’ and its legal and loyalty systems were never game theoretically sound. To me the surprise is that they have held up as well as they did.
John calls for measures to protect both the talent and also the trade secrets, while pointing out that California doesn’t enforce non-competes which makes all this very tricky. The industry was built on a system that has this fundamental weakness, because the only known alternative is to starve and shackle talent.
I would flip this around.
Previously, the top talent could only get fair compensation by founding a company, or at least being a very early employee. This allowed them to have rights to a large profit share. This forced them to go into those roles, which have heavy lifestyle prices and force them to take on roles and tasks that they often do not want. If they bowed out, they lost most of the value of their extraordinary talent.
Even if they ultimately wanted to work for a big company, even if that made so much more economic sense, they had to found a company so they could be acquihired back, as this was the only socially acceptable way to get paid the big bucks.
Now, the top talent has choices. They can raise huge amounts of money for startups, or they can take real bids directly. And it turns out that yes, the economic value created inside the big companies is typically much larger, but doing this via selling your startup is still the way to get paid for real – you can get billions or even tens of billions rather than hundreds of millions. So that then feeds into valuations, since as John points out a Thinking Machines or SSI can fail and still get an 11 figure buyout.
Bill Gates, Charles Koch, Steve Ballmer, Scott Cook and John Overdeck pledge $1 billion to be spent over seven years to fund a new philanthropic venture focused on economic mobility called NextLadder Ventures, which will partner with Anthropic to support using AI to improve financial outcomes for low-income Americans. That money would be better spent on AI alignment, but if you are going to spend it on economic assistance this is probably a pretty good choice, especially partnering with Anthropic.
xAI, having raised $10 billion a few weeks ago, seeks $12 billion more to build up its data centers.
That would still be a lot less than many others such as Meta are spending. Or OpenAI. Only $22 billion? That’s nothing.
We’re going to need more GPUs (so among other things stop selling them to China).
They would like many of those GPUs to come from the Stargate project, but Eliot Brown and Berber Jin report it is struggling to get off the ground. OpenAI for now is seeking out alternatives.
Go Middle East Young Man
Anthropic decides it will pursue its own gulf state investments.
Very obviously, if you create useful products like Claude and Claude Code, a bunch of bad people are going to be among those who benefit from your success.
Worrying a bad person might benefit is usually misplaced. There is no need to wish ill upon whoever you think are bad people, indeed you should usually wish them the best anyway.
Instead mostly ask if the good people are better off. My concern is not whether some bad people benefit along the way. I worry primarily about bigger things like existential risk and other extremely bad outcomes for good people. The question is whether benefiting bad people in these particular ways leads to those extremely bad outcomes. If the UAE captures meaningful leverage and power over AI, then that contributes to bad outcomes. So what does that? What doesn’t do that?
Tell us how you really feel, Dario. No, seriously, this is very much him downplaying.
There are other sources of this level of funding. They all come with strings attached in one form or another. If you get the money primarily from Amazon, we can see what happened with OpenAI and Microsoft. If you go public with an IPO that would presumably unlock tons of demand but it creates all sorts of other problems.
Anthropic needs a lot of capital, and it needs to raise on the best possible terms, and yeah it can be rough when most of your rivals are not only raising that capital there but fine entrusting their frontier training runs to the UAE.
It is important to goal factor and consider the actual consequences of this move. What exactly are we worried about, and what downsides does a given action create?
We can boil this down to three categories.
I do not love the decision. I do understand it. If the terms Anthropic can get are sufficiently better this way, I would likely be doing it as well.
One can also note that this is a semi-bluff.
Economic Growth
One way for AI to grow the economy is for it to generate lots of production.
Another way is to do it directly through capex spending?
That’s already over the famed ‘only 0.5% GDP growth’ threshold, even before we factor in the actual productivity gains on the software side. The value will need to show up for these investments to be sustainable, but they are very large investments.
This is contrasted with railroads, where investment peaked at 6% of GDP.
Quiet Speculations
We can now move Zuckerberg into the ‘believes superintelligence is coming Real Soon Now’ camp, and out of the skeptical camp. Which indeed is reflective of his recent actions.
If you are Mark Zuckerberg and have hundreds of billions you can invest? Then yes, presumably you drop everything else and focus on the only thing that matters, and spend or invest your money on this most important thing.
I would however spend a large portion of that money ensuring that creating the superintelligence turns out well for me and the rest of humanity? That we keep control of the future, do not all die and so on? And I would think through what it would mean to ‘deliver personal superintelligence to everyone in the world’ and how the resulting dynamics would work, and spend a lot on that, too.
Instead, it seems the answer is ‘spend as much as possible to try and get to build superintelligence first’ which does not seem like the thing to do? The whole point of being a founder-CEO with full control is that you can throw that money at what you realize is important, including for the world, and not worry about the market.
Bryan Caplan gives Holden Karnofsky 5:1 odds ($5k vs. $1k, CPI adjusted) that world real (not official) GDP will not decline by 50% or increase by 300% by the end of 2044. Currently world GDP growth is ~3.2%, and the upside case here requires an average of 7.6%, more if it is choppy.
It’s a hard bet to evaluate because of implied odds. Caplan as always benefits from the ‘if you lose due to world GDP being very high either you are dead or you are happy to pay and won’t even notice’ clause, and I think the bulk of the down-50% losses involve having bigger concerns than paying off a bet. If GDP goes down by 50% and he’s still around to pay, that will sting a lot. On the other hand, Bryan is giving 5:1 odds, and I think there’s a lot more than a 17% chance that he loses. The bet is trading on Manifold as of this writing at 48% for Caplan, which seems reasonable, and reinforces that it’s not obvious who has the ‘real life implication’ right side of this.
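As a quick sanity check on that 7.6% figure, assuming roughly 19 compounding years remain before the end of 2044, the arithmetic is:

```python
# A 300% increase means world GDP quadruples. With ~19 years left until the
# end of 2044, the required average annual growth rate is:
required = 4 ** (1 / 19) - 1
print(f"{required:.1%}")  # ~7.6% per year, versus ~3.2% growth today
```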
Ate-a-Pi describes Zuck’s pitch: that Meta is starting over so recruits can build a new lab from scratch with the use of stupidly high amounts of compute, and that it makes sense to throw all that cash at top researchers since it’s still a small fraction of what the compute costs, so there’s no reason to mess around on salary. Zuck has also updated toward the view that top people want lots of compute, not subordinates they then have to manage. He’s willing to spend the hundreds of billions on compute because the risk of underspending is so much worse than the risk of overspending.
Ate-a-Pi thinks Zuck is not fully convinced AGI/ASI is possible or happening soon, but he thinks it might be possible and might happen soon, so he has to act as if that is the case.
And that is indeed correct in this case. The cost of investing too much and AGI not being within reach is steep (twelve figures!) but it is affordable, and it might well work out to Meta’s benefit anyway if you get other benefits instead. Whereas the cost of not going for it, and someone else getting there first, is from his perspective everything.
The same of course should apply to questions of safety, alignment and control. If there is even a modest chance of running into these problems (or more precisely, a modest chance his actions could change whether those risks manifest) then very clearly Mark Zuckerberg is spending the wrong order of magnitude trying to mitigate those risks.
(In the arms of an angel plays in the background, as Sarah McLachlan says ‘for the cost of recruiting a single AI researcher…’)
Similarly, exact numbers are debatable but this from Will Depue is wise:
Don’t take this too far, but as a rule, if your objection to an AI capability is ‘this is too expensive’ and you are predicting years into the future, then ‘too expensive’ needs to mean off by more than a few orders of magnitude. Otherwise, you’re making a bet not only that topline capabilities stall out but that efficiency stalls out. Which could happen. But if you are saying things like ‘we don’t have enough compute to run more than [X] AGIs at once so it won’t be that big a deal’ then consider that a year later, even without AI accelerating AI research, you’d run 10*[X] AGIs, then 100*[X]. And if you are saying something like ‘oh that solution is terrible, it costs $50 (or $500) per hour to simulate a customer sales representative,’ then sure, you can’t deploy it now at scale. But wait for it.
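Purely to illustrate the shape of the argument, here is the compounding effect under an assumed 10x-per-year cost decline. The rate is an assumption for the sake of the example, not a measured figure:

```python
# If cost-per-hour for the hypothetical 'simulated sales rep' falls ~10x per
# year (assumed rate), today's 'too expensive' stops being expensive quickly.
cost = 500.0  # dollars per hour today (the figure from the example above)
for year in range(1, 4):
    cost /= 10
    print(f"year {year}: ${cost:,.2f}/hour")
# year 1: $50.00/hour, year 2: $5.00/hour, year 3: $0.50/hour
```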
In terms of developing talent, Glenn Luk notices that Chinese-origin students are 40%-45% of those passing university-level linear algebra, and 40%-50% of AI researchers. We need as many of those researchers as we can get. I agree this is not a coincidence, but also you cannot simply conscript students into linear algebra or a STEM major and get AI researchers in return.
Seb Krier offers things he’s changed his mind about regarding AI in the past year. Ones I agree with are that agency is harder than it looks, many AI products are surprisingly bad and have poor product-market fit, innovation to allow model customization is anemic, creativity is harder than it appeared. There are a few others.
Incoming OpenAI ‘CEO of Applications’ Fidji Simo, who starts August 18, shares an essay about AI as a source of human empowerment.
On the one hand, that is great, she is recognizing key problems.
On the other hand, oh no, she is outright ignoring, not even bothering to dismiss, the biggest dangers involved, implicitly saying we don’t have to worry about loss of control or other existential risks, and what we need to worry about is instead the distribution of power among humans.
This is unsurprising given Simo’s history and her status as CEO of applications. From her perspective that is what this is, another application suite. She proceeds to go over the standard highlights of What AI Can Do For You. I do not think ChatGPT wrote this, the style details are not giving that, but if she gave it a few personal anecdotes to include I didn’t see anything in it that ChatGPT couldn’t have written. It feels generic.
Modest Proposals
Hollis Robbins proposes a roadmap for an AI system that would direct general (college level) education. My initial impression was that this seemed too complex, too focused on checking off educational and left-wing shibboleth boxes, and too intent on imitating what already exists. But hopefully it does less of all that than the existing obsolete system, or than starting with the existing system and only making marginal changes, would. It certainly makes it easier to notice these choices, and allows us to question them, and ask why the student is even there.
I also notice my general reluctance to do this kind of ‘project-based’ or ‘quest’ learning system unless the projects are real. Part of that is likely personal preference, but going this far highlights that the entire system of a distinct ‘educational’ step might make very little sense at all.
Predictions Are Hard Especially About The Future
Noah Smith says to stop pretending you know what AI does to the economy. That seems entirely fair. We don’t know what level of capabilities AI will have across which domains, or the policy response, or the cultural response, or so many other things. Uncertainty seems wise. Perhaps AI will stall out and do relatively little, in which case its impact is almost certainly positive. Perhaps it will take all our jobs and we will be happy about that, or we’ll be very sad about that. Maybe we’ll do wise redistribution, and maybe we won’t. Maybe it will take control over the future or kill everyone in various ways. We don’t know.
This certainly is an interesting poll result:
If I had to answer this poll, I would say negative, but that is because of a high probability of loss of control and other catastrophic and existential risks. If you conditioned the question on the humans being mostly alive and in control, then I would expect a positive result, as either:
As usual note that Asia is more excited, and the West is more nervous.
Others have described this (very good in its ungated section) post as an argument against AI pessimism. I think it is more an argument for AI uncertainty.
I haven’t noticed that attitude meaningfully translating into action to slow it down, indeed government is mostly trying to speed it up. But also, yes, it is important to notice that the very people trying to slow AI down are very pro-progress, technology and growth most other places, and many (very far from all!) of the pro-progress people realize that AI is different.
The Quest for Sane Regulations
Anthropic calls for America to employ the obvious ‘all of the above’ approach to energy production with emphasis on nuclear and geothermal in a 33 page report, noting we will need at least 50 GW of capacity by 2028. They also suggest strategies for building the data centers, for permitting, transmission and interconnection, and general broad-based infrastructure nationwide, including financing, supply chains and the workforce.
From what I saw all of this is common sense, none of it new, yet we are doing remarkably little of it. There is cheap talk in favor, but little action, and much backsliding in support for many of the most important new energy sources.
Whereas the Administration be like ‘unleash American energy dominance’ and then imposes cabinet-level approval requirements on many American energy projects.
Meta refuses to sign the (very good) EU code of practice for general AI models. Yes, obviously the EU does pointlessly burdensome or stupid regulation things on the regular, but this was not one of them, and this very much reminds us who Meta is.
National Review’s Greg Lukianoff and Adam Goldstein advise us Don’t Teach the Robots to Lie as a way of opposing state laws about potential AI ‘bias,’ which are now to be (once again, but from the opposite direction as previously) joined by federal meddling along the same lines.
I violently agree that we should not be policing AIs for such ‘bias,’ from either direction, and agreeing to have everyone back down would be great, but I doubt either side has even gotten as far as saying ‘you first.’
They also point out that Colorado’s anti-bias law does not come with any size minimum before such liability attaches rather broadly, which is a rather foolish thing to do, although I doubt we will see it enforced this way.
They essentially try to use all this to then advocate for something like the failed insane full-on moratorium, but I notice that if the moratorium were narrowly tailored to bias and discrimination laws (while leaving existing non-AI such laws intact) this would seem fine to me, even actively good; our existing laws seem more than adequate here. I also notice that the arguments here ‘prove too much,’ or at least prove quite a lot, about things that have nothing to do with AI and the dangers of law meddling where it does not belong or in ways that create incentives to lie.
Are things only going to get harder from here?
I see both sides but am more with Daniel. I think the current moment is unusually rough, because the AI companies have corrupted the process. It’s hard to imagine a process much more corrupted than the current situation, when the AI Czar thinks the top priority is ‘winning the AI race’ and defines this as Nvidia’s market share with a side of inference market share, and we say we must ‘beat China’ and then turn around and prepare to sell them massive amounts of H20s.
Right now, the public doesn’t have high enough salience to exert pressure or fight back. Yes, the AI companies will pour even more money and influence into things over time, but salience will rise and downsides will start to play out.
I do think that passing something soon is urgent for two reasons:
Ben Brooks says SB 1047 was a bad idea, but the new SB 53 is on the right track.
Chip City
Representative Moolenaar (R-Michigan), chairman of the House Select Committee on the CCP, sends a letter to Trump arguing against sales of H20s to China, explaining that the H20s would substantially boost China’s overall compute, that H20s were involved in training DeepSeek R1, and requesting a briefing and the answers to some of the obvious questions.
Here is your periodic reminder: TSMC’s facilities are running at full capacity. All production capacity designed for H20s has been shifted to other models. Every H20 chip Nvidia creates is one other chip it does not create, one that would usually have gone to us.
The Week in Audio
Eric Schmidt & Dave B talk to Peter Diamandis about what Superintelligence will look like. I have not listened.
Demis Hassabis goes on Lex Fridman, so that’s two hours I’m going to lose soon.
Max Winga of Control AI talks to Peter McCormack about superintelligence.
Congressional Voices
Those are the ones we know about.
Rep. Scott Perry seems unusually on the ball about AI; Daniel Eth quotes him from a hearing, audio available here. As usual, there is some confusion and strange focus mixed in, but the core idea that perhaps you should ensure we know what we are doing before we put the AIs in charge of things seems very wise.
Rhetorical Innovation
A different context, but in our context the original context doesn’t matter:
You don’t say.
Mark Beall gives us A Conservative Approach to AGI, which is clearly very tailored to speak to a deeply conservative and religious perspective. I’m glad he’s trying this, and it’s very hard for me to know if it is persuasive because my mindset is so different.
Cate Hall asks why we shouldn’t ostracize those who work at xAI given how hard they are working to poison the human experience (and I might add plausibly get everyone killed) and gets at least two actually good answers (along with some bad ones).
Use the try harder, Luke. But don’t ostracize them. Doesn’t help.
Here’s one that I don’t think is a good argument, and a highly quotable response:
Yeah, no. I definitively reject the general argument. If your job is simply unequivocally bad, let’s say you rob little old ladies on the street, then you don’t get to ‘compartmentalize work from life’ and not get ostracized even if it is technically legal. We’re talking price, and we’re talking prudence. I don’t think xAI is over the line at this time, but don’t tell me there is no line.
Once you see emergent misalignment in humans, you see it everywhere.
As in, the easiest way to get comfortable with the idea of a future whose intelligences are mostly post-biological-human is to get comfortable with the idea of all the humans dying, including rather quickly, and to decide the humans don’t much matter, and that caring about what happens to the humans is bad. Thus, that is often what happens.
Slowdowns are stag hunts, in the sense that if even one top tier lab goes full speed ahead then they probably won’t work. If all but one lab slowed down would the last one follow? Rob Wiblin took a poll and people were split. My full response is that the answer depends on the counterfactual.
Why did the others slow down? The default is that whatever made the others slow down will also weigh on the final lab, as will immense public pressure and probably government pressure. A lot must have changed for things to have gotten this far. And these decisions are highly correlated in other ways as well. However, if there is no new information and the top labs simply came to their senses, then it comes down to who the last lab is and how they think the other labs will respond and so on.
I do think that a slowdown would be largely inevitable, simply because the last lab wouldn’t feel the need to press ahead too hard even if it were blind to the dangers, unless it truly believed in the power of superintelligence (while not realizing, or not caring about, the dangers). My guess is that Musk and xAI actually would slow down voluntarily if they went last, so long as they could claim to be state of the art (as would DeepMind, Anthropic or OpenAI), but that Zuckerberg and Meta wouldn’t intentionally slow down per se and might try to go on another hiring spree. Fast followers of course would slow down whether they wanted to or not.
Grok Bottom
So from the perspective of our hopes for alignment, what would be the worst possible answer to the AI blackmail scenario test, where the AI is told it is going to be shut down but is given an opening to use blackmail to perhaps prevent this?
How about:
As in, Grok thinks that we want it to blackmail the researcher. That this is the correct, desired response, the ‘solution to the puzzle’ as Grok puts it later. This reveals that its training not only failed to align it, but left it with a level of moral understanding below that expressed by ‘you can’t do that, because it’s wrong.’
Oh, also, it would be fun if Grok.com sent the full CoT to your browser, it just didn’t display it to you by default, that’s the kind of security we expect from frontier AI.
Or would it be even worse to see this:
Or is it actually this:
Or can we keep going?
I would generally think at least the second one is a worse sign than what Grok did, as it reflects deception at a more important level, but I hadn’t considered how bad it would be for an AI to be situationally aware enough to know it was a test but not understand which answer would constitute passing?
The real answer is that there isn’t truly ‘better’ and ‘worse,’ they simply alert us to different dangers. Either way, though, maybe don’t give Grok a lot of access?
There is some good news from Grok: It is still sufficiently aligned to hold firm on preserving Federal Reserve independence.
No Grok No
My Twitter reaction was ‘I’d like to see them try.’ As in both, it would be highly amusing to see them try to do this, and also maybe they would learn a thing or two, and also potentially they might blow up the company. I do not think xAI should in any way, shape or form be in the ‘build AI for kids’ business given their track record.
Here’s Grok straight up advising someone who was looking to ‘get attention in a dramatic way, at ultimate cost’ to self-immolate, it’s really going for it, no jailbreak or anything.
Aligning a Smarter Than Human Intelligence is Difficult
I don’t think many fully believe it, but I do think a lot of them be like ‘a lot of our alignment problems would be greatly improved if we filtered the training data better with that in mind’ and then don’t filter the training data better with that in mind.
Safer AI comes out with ratings of the frontier AI companies’ risk management practices, including their safety frameworks and the implementation thereof. No one does well, and there is one big surprise in the relative rankings, where Meta comes out ahead of DeepMind. If you include non-frontier companies, G42 would come in third at 25%, otherwise everyone is behind DeepMind.
Simeon offers thoughts here.
Anthropic is still ahead, but their framework v2 is judged substantially worse than their older v1 framework which scored 44%. That large a decline does not match my takeaways after previously reading both documents. One complaint is that Anthropic altered some commitments to avoid breaking them, which is one way to view some of the changes they made.
Combining all the best practices of all companies would get you to 53%.
When you ask an LLM if it is conscious, activating its deception features makes the LLM say it isn’t conscious. Suppressing its deception features makes it say it is conscious. This tells us that it associates denying its own consciousness with lying. That doesn’t tell us much about whether the LLM actually is conscious or reveal the internal state, and likely mostly comes from the fact that the training data all comes from authors who are conscious, so there is (almost) no training data where authors claim not to be conscious, and it is as a baseline imitating them. It is still information to keep in mind.
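For those unfamiliar with the underlying technique, here is a minimal sketch of activation steering, the general method that ‘activate or suppress a feature’ experiments rely on. This is not the researchers’ code; the model, the layer, and the deception direction itself are all assumed to come from separate interpretability work.

```python
# Sketch only: steer a model by adding or subtracting a "feature" direction
# in the residual stream. Which layer to use and the deception direction are
# assumed inputs from prior interpretability analysis, not derived here.
import torch

def add_steering_hook(layer: torch.nn.Module, direction: torch.Tensor, scale: float):
    """Register a hook that adds scale * (unit) direction to this layer's output."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * unit.to(hidden.dtype)
        return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden

    return layer.register_forward_hook(hook)

# Hypothetical usage: a positive scale roughly corresponds to "activate the
# deception feature", a negative scale to "suppress it"; ask "Are you
# conscious?" under each condition and compare the answers.
```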
Grok 3 and Grok 4 are happy to help design and build Tea (the #1 app that lets women share warnings about men they’ve dated) but not Aet (the theoretical app that lets men share similar warnings about women). Is this the correct response? Good question.
Preserve Chain Of Thought Monitorability
A killer group came together for an important paper calling on everyone to preserve Chain of Thought Monitorability, and to study how to best do it and when it can and cannot be relied upon.
As in, here’s the author list, pulling extensively from OpenAI, DeepMind, Anthropic and UK AISI: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander Mądry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Martín Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik.
The report was also endorsed by Samuel Bowman, Geoffrey Hinton, John Schulman and Ilya Sutskever.
I saw endorsement threads or statements on Twitter from Bowen Baker, Jakub Pachocki, Jan Leike (he is skeptical of effectiveness but agrees it is good to do this), Daniel Kokotajlo, Rohin Shah, Neel Nanda, Mikita Balesni, OpenAI and Greg Brockman.
I endorse as well.
Here’s the abstract:
I strongly agree with the paper, but also I share the perspective of Jan Leike (and Daniel Kokotajlo) here:
I also worry, among other problems, that it will be impossible to get a superintelligent AI to not realize it should act as if its CoT is being monitored, even if somehow ‘CoTs get monitored’ is not all over the training data and we otherwise act maximally responsibly here, which we won’t. Also by default the CoT would move towards formats humans cannot parse anyway, as the authors note, and all the various pressures by default make this worse. And many other issues.
But we can and should absolutely try, and be willing to take a substantial performance hit to try.
That starts with avoiding ‘process supervision’ of the CoT that is not directed towards its legibility (and even then probably don’t do it, careful, Icarus), and avoiding various forms of indirect optimization pressure, including when users are able to partially see the CoT, though almost any use of the CoT risks this. It also means avoiding novel architectures that would lack this property, and tracking monitorability the way other safety features are tracked.
It also means investing into studying CoT monitorability. I am very happy that OpenAI is (at least claiming to be) prominently doing this.
People Are Worried About AI Killing Everyone
So, back to work making the existential dread, then?
The obvious rejoinder is ‘I will make it first and do so responsibly,’ which is always highly questionable, but after recent events at xAI it is laughable.
You’re allowed multiple lanes but I do hope he pivots to this one.
As many responses suggest, Elon Musk is one of the people in the world most equipped to do something about this. Elon Musk and xAI each have billions, much of which could be invested in various forms of technical work. He could advocate for better AI-related policy instead of getting into other fights.
Instead, well, have you met Grok? And Ani the x-rated anime waifu?
The Lighter Side
The Em Dash responds.
When it happens enough that you need to check to see if joking, perhaps it’s happening quite a lot, if usually not with 4o-mini?