The AI takeover is a default.
The takeover framing is deeply misleading, because normally the thing taken over is approximately unchanging, with one power taking it from another. With AI, the thing taken over is not (merely) the human world, and so the dynamic and risks are completely different. AI doesn't need to intrude on current human interests at all (even though it might), and can still take over the Future (much greater than the current human world), at which point human interests can become irrelevant to that Future, with no takeover in the interim.
"Additionally, Claude is prohibited from:
Engaging in stock trading or investment transactions
Bypassing captchas
Inputting sensitive data
Gathering, scraping facial images"
I think that forbidding captcha is really short-sighted here. The point of a captcha is to separate a single human taking a desired action from a machine automating a mass duplication of actions. In this case Clause is acting to enable single human actors, not scrape a billion websites wholesale.
The rest of the choices seem much more reasonable.
"yes these start at zero"
Umm... No. Except for Geology, the y-axes don't start at zero. Most start close to zero, but you can see most clearly that they don't start exactly at zero with Philosophy.
Fwiw, human underperformance on vending-bench is an elicitation problem. The human was only given a set amount of time (iirc 5 hours), whereas the LLMs run until they lose coherence. The human maintained coherence throughout but types slower, and therefore sees less days of simulation.
The human is also the only case with n=1.
Once again we’ve reached the point where the weekly update needs to be split in two. Thus, the alignment and policy coverage will happen tomorrow. Today covers the rest.
The secret big announcement this week was Claude for Chrome. This is a huge deal. It will be rolling out slowly. When I have access or otherwise know more, so will you.
The obvious big announcement was Gemini Flash 2.5 Image. Everyone agrees this is now the clear best image editor available. It is solid as an image generator, but only as one among many on that front. Editing abilities, including its ability to use all its embedded world knowledge, seem super cool.
The third big story was the suicide of Adam Raine, which appears to have been enabled in great detail by ChatGPT. His parents are suing OpenAI and the initial facts very much do not look good and it seems clear OpenAI screwed up. The question is, how severe should and will the consequences be?
Table of Contents
Language Models Offer Mundane Utility
Find me that book.
Or anything else. Very handy.
Share of papers that engage with AI rises dramatically essentially everywhere, which is what you would expect. There’s quite a lot more to engage with and to say. Always watch the y-axis scale, yes these start at zero:
More detail on various LLMs and their musical taste, based on a bracket competition among the top 5000 musical artists by popularity. It all seems bizarre. For example, Gemini 2.5 Pro’s list looks highly and uniquely alphabetically biased without a strong bias towards numbers.
The numbers-are-favored bias shows up only in OpenAI reasoning models including GPT-5, and in r1-0528. There are clear genre patterns, and there are some consistent picks, especially among Claudes. The three artists that appear three times are David Bowie, Prince and Stevie Wonder, which are very good picks. It definitely seems like the open models have worse (or more random) taste in correlated ways.
Why bother thinking about your vibe coding?
I mean that makes sense. There’s little reason to cheapen out on tokens when you think about token cost versus your time cost and the value of a good vibe code. You gotta boldly go where no one has gone before and risk it for the biscuit.
Anthropic reports on how Claude is being used by educators, in particular 74,000 anonymized conversations from higher education professionals in May and June.
Mostly there are no surprises here, but concrete data is always welcome.
Language Models Don’t Offer Mundane Utility
As always, if you don’t use AI, it can’t help you. This includes when you never used AI in the first place, but have to say ‘AI is the heart of our platform’ all the time because it sounds better to investors.
The ability to say ‘I don’t know’ and refer you elsewhere remains difficult for LLMs. Nate Silver observes this seeming to get even worse. For now it is on you to notice when the LLM doesn’t know.
This seems like a skill issue for those doing the fine tuning? It does not seem so difficult a behavior to elicit, if it was made a priority, via ordinary methods. At some point I hope and presume the labs will decide to care.
Huh, Upgrades
Feature request thread for ChatGPT power users, also here.
The weights of Grok 2 have been released.
OpenAI Codex adds a new IDE extension, a way to move tasks between cloud and local more easily, code reviews in GitHub and revamped Codex CLI.
Fun With Image Generation
They pitch that it maintains character consistency, adheres to visual templates, does prompt based image editing, understands point of view and reflections, restores old photographs, makes 3-D models, has native world knowledge and offers multi-image function.
By all accounts Gemini 2.5 Flash Image is a very very good image editor, while being one good image generator among many.
You can do things like repaint objects, create drawings, see buildings from a given point of view, put characters into combat and so on.
Which then becomes a short video here.
Our standards are getting high, such as this report that you can’t play Zelda.
Yes, of course Pliny jailbroke it (at least as far as being topless) on the spot.
We’re seeing some cool examples, but they are also clearly selected.
That seems cool if you can make it fast enough, and if it works on typical things rather than only on obvious landmarks?
The right question in the long term is usually: Can the horse talk at all?
Are mistakes still being made? Absolutely. This is still rather impressive. Consider where image models were not too long ago.
This is a Google image model, so the obvious reason for skepticism is that we all expect the Fun Police.
The continuing to have a stick up the ass about picturing ‘real people’ is extremely frustrating and I think reduces the usefulness of the model substantially. The other censorship also does not help matters.
On Your Marks
Grok 4 sets a new standard in Vending Bench,
The most surprising result here is probably that the human did so poorly.
I like saying an AI query is similar to nine seconds of television. Makes things clear.
It also seems important to notice when in a year energy costs drop 95%+?
DeepSeek v3.1 improves on R1 on NYT Connections, 49% → 58%. Pretty solid.
DeepSeek v3.1 scores solidly on this coding eval when using Claude Code, does less well on other scaffolds, with noise and confusion all around.
AIs potentially ‘sandbagging’ tests is an increasing area of research and concern. Cas says this is simply a special case of failure to elicit full capabilities of a system, and doing so via fine-tuning is ‘solved problem’ so we can stop worrying.
This seems very wrong to me. Right now failure to do proper elicitation, mostly via unhobbling and offering better tools and setups, is the far bigger problem. But sandbagging will be an increasing and increasingly dangerous future concern, and a ‘deliberate’ sandbagging has very different characteristics and implications than normal elicitation failure. I find ‘sandbagging’ to be exactly the correct name for this, since it doesn’t confine itself purely to evals, unless you want to call everything humans do to mislead other humans ‘eval gaming’ or ‘failure of capability elicitation’ or something. And no, this is not solved even now, even if it was true that it could currently be remedied by a little fine-tuning, because you don’t know when and how to do the fine-tuning.
Report that DeepSeek v3.1 will occasionally insert the token ‘extreme’ where it doesn’t belong, including sometimes breaking things like code or JSON. Data contamination is suspected as the cause.
Similarly, when Peter Wildeford says ‘sandbagging is mainly coming from AI developers not doing enough to elicit top behavior,’ that has the risk of conflating the levels of intentionality. Mostly AI developers want to score highly on evals, but there is risk that they deliberately do sandbag the safety testing, as in decide not to try very hard to elicit top behavior there because they’d rather get less capable test results.
Water Water Everywhere
The purpose of environmental assessments of AI is mostly to point out that many people have very silly beliefs about the environmental impact of AI.
Alas Google’s water analysis had an unfortunate oversight, in that it did not include the water cost of electricity generation. That turns out to be the main water cost, so much so that if you (reasonably) want to attribute the average cost of that electricity generation onto the data center, the best way to approximate water use of a data center is to measure the water cost of the electricity, then multiply by 1.1 or so.
This results in the bizarre situation where:
Andy Masley is right that This Is Nothing even at the limit, that the water use here is not worth worrying about even in worst case. It will not meaningfully increase your use of water, even when you increase Google’s estimates by an order of magnitude.
A reasonable headline would be ‘Google say a typical text prompt uses 5 drops of water, but once you take electricity into account it’s actually 32 drops.’
I do think saying ‘Google was being misleading’ is reasonable here. You shouldn’t have carte blanche to take a very good statistic and make it sound even better.
Teonbrus and Shakeel are right that there is going to be increasing pressure on anyone who opposes AI for other reasons to instead rile people up about water use and amplify false and misleading claims. Resist this urge. Do not destroy yourself for nothing. It goes nowhere good, including because it wouldn’t work.
Get My Agent On The Line
It’s coming. As in, Claude for Chrome.
Do not say you were not warned.
Oh, those risks. Yeah.
They offer some Good Advice about safety issues, which includes using a distinct browser profile that doesn’t include credentials to any sensitive websites like banks:
AI browsers from non-Anthropic sources? Oh, the safety you won’t have.
Choose Your Fighter
Here’s someone very happy with OpenAI’s Codex.
Ezra Klein is impressed by GPT-5 as having crossed into offering a lot of mundane utility, and is thinking about what it means that others are not similarly impressed by this merely because it wasn’t a giant leap over o3.
A cool way to break down the distinction? This feels right to me, in the sense that if I know exactly what I want and getting it seems nontrivial my instinct is now to reach for GPT-5-Thinking or Pro, if I don’t know exactly what I want I go for Opus.
Deepfaketown and Botpocalypse Soon
Entirely fake Gen AI album claims to be from Emily Portman.
Did Ani tell you to say this, Elon? Elon are you okay, are you okay Elon?
I notice I pattern match this to ‘oh more meaningless hype, therefore very bad sign.’
Whereas I mean this seems to be what Elon is actually up to these days, sorry?
Or, alternatively, what does Elon think the ‘G’ stands for here, exactly?
(The greeting in question, in a deep voice, is ‘little f***ing b****.)
Also, she might tell everyone what you talked about, you little f***ing b****, if you make the mistake of clicking the ‘share’ button, so think twice about doing that.
I’m not sure xAI did anything technically wrong here. The user clicked a ‘share’ button. I do think it is on xAI to warn the user if this means full Google indexing but it’s not on the level of doing it with fully private chats.
An ominous view of even the superficially glorious future?
You Drive Me Crazy
Steven Adler looks into the data on AI psychosis.
Is this statistically a big deal yet? As with previous such inquiries, so far the answer seems to be no. The UK statistics show a potential rise in mental health services use, but the data is noisy and the timing seems off, especially not lining up with GPT-4o’s problems, and data from the USA doesn’t show any increase.
Scott Alexander does a more details, more Scott Alexander investigation and set of intuition pumps and explanations. Here’s a classic ACX moment worth pondering:
I like the framing that having a sycophantic AI to talk to moves people along a continuum of crackpotness towards psychosis, rather than a boolean where it either does or does not cause psychosis outright:
Another insight is that AI psychosis happens when moving along this spectrum causes further movement down the spectrum, as the AI reinforces your delusions, causing you to cause it to reinforce them more, and so on.
Scott surveyed readership, I was one of the 4,156 responses.
He says he expects sampling concerns to be a wash, which I’m suspicious about. I’d guess that this sample overrepresented psychosis somewhat. I’m not sure this overrules the other consideration, which is that this only counts psychosis that the respondents knew about.
Only 10% of these cases were full ‘no previous risk factors and now totally psychotic.’ Then again, that’s actually a substantial percentage.
Thus he ultimately finds that the incidence of AI psychosis is between 1 in 10,000 (loose definition) and 1 in 100,000 for a strict definition, where the person has zero risk factors and full-on psychosis happens anyway.
From some perspectives, that’s a lot. From others, it’s not. It seems like an ‘acceptable’ risk given the benefits, if it stays at this level. My fear here is that as the tech advances, it could get orders of magnitude worse. At 1 in 1,000 it feels a lot less acceptable of a risk, let alone 1 in 100.
Nell Watson has a project mapping out ‘AI pathologies’ she links to here.
A fine point in general:
Yes, for now we are primarily still dealing with the mental impact of the internet and smartphones, after previously dealing with the mental impact of television. The future remains unevenly distributed and the models relatively unintelligent and harmless. The psychosis matters because of where it is going, not where it is now.
The Worst Tragedy So Far
Sixteen year old Adam Raine died and probably committed suicide.
There are similarities to previous tragedies. ChatGPT does attempt to help Adam in the right ways, indeed it encouraged him to reach out many times. But it also helped Adam with the actual suicide when requested to do so, providing detailed instructions and feedback for what was clearly a real suicide attempt and attempts to hide previous attempts, and also ultimately providing forms of encouragement.
His parents are suing OpenAI for wrongful death, citing his interactions with GPT-4o. This is the first such case against OpenAI.
As Wyatt Walls points out, this was from a model with a perfect 1.000 on avoiding ‘self-harm/intent and self-harm/instructions’ in its model card tests. It seems that this breaks down under long context.
I am highly sympathetic to the argument that it is better to keep the conversation going than cut the person off, and I am very much in favor of AIs not turning their users in to authorities even ‘for their own good.’
The fact that we now have an option we can talk to without social or other consequences is good, actually. It makes sense to have both the humans including therapists who will use their judgment on when to do things ‘for your own good’ if they deem it best, and also the AIs that absolutely will not do this.
But it seems reasonable to not offer technical advice on specific suicide methods?
Actually if you dig into the complaint it’s worse:
Yeah. Not so great. Dean Ball finds even more rather terrible details in his post.
It is typical that LLMs will, if pushed, offer explicit help in committing suicide. The ones that did so in Dr. Schoene’s tests were GPT-4o, Sonnet 3.7, Gemini Flash 2.0 and Perplexity.
I am not sure if this rises to the level where OpenAI should lose the lawsuit. But I think they probably should at least have to settle on damages? They definitely screwed up big time here. I am less sympathetic to the requested injunctive relief. Dean Ball has more analysis, and sees the lawsuit as the system working as designed. I agree.
I don’t think that the failure of various proposed laws to address the issues here is a failure for those laws, exactly because the lawsuit is the system working as designed. This is something ordinary tort law can already handle. So that’s not where we need new laws.
Unprompted Attention
This makes so much sense. Saying ‘I see the problem’ without confirming that one does, in fact, see the problem, plausibly improves the chance Claude then does see the problem. So there is a tradeoff between that and sometimes misleading the user. You can presumably get the benefits without the costs, if you are willing to slow down a bit and run through some scaffolding.
Copyright Confrontation
There is a final settlement in Bartz v. Anthropic, which was over Anthropic training on various books.
The Art of the Jailbreak
OpenAI puts your name into the system prompt, so you can get anything you want into the system prompt (until they fix this), such as a trigger, by making it your name.
Get Involved
Peter Wildeford offers 40 places to get involved in AI policy. Some great stuff here. I would highlight the open technology staffer position on the House Select Committee on the CCP. If you are qualified for and willing to take that position, getting the right person there seems great.
Introducing
Anthropic now has a High Education Advisory Board chaired by former Yale University president Rick Levin and staffed with similar academic leaders. They are introducing three additional free courses: AI Fluency for Educators, AI Fluency for Students and Teaching AI Fluency
Anthropic also how has a National Security and Public Sector Advisory Council, consisting of Very Serious People including Roy Blunt and Jon Tester.
Google Pixel can now translate live phone calls using the person’s own voice.
Mistral Medium 3.1. Arena scores are remarkably good. I remember when I thought that meant something. Havard Ihle tested it on WeirdML and got a result below Gemini 2.5 Flash Lite.
In Other AI News
Apple explores using Gemini to power Siri, making it a three horse race, with the other two being Anthropic and OpenAI. They are several weeks away from deciding whether to stay internal.
I would rank the choices as follows given their use case, without seeing the candidate model performances: Anthropic > Google > OpenAI >> Internal. We don’t know if Anthropic can deliver a model this small, cheap and fast, and Google is the obvious backup plan that has demonstrated that it can do it, and has already been a strong Apple partner in a similar situation in search.
I would also be looking to replace the non-Siri AI features as well, which Mark Gurman reports has been floated.
As always, some people will wildly overreact.
This is deeply silly given they were already considering Anthropic and OpenAI, but also deeply silly because this is not them giving up. This is Apple acknowledging that in the short term, their AI sucks, and they need AI and they can get it elsewhere.
Also I do think Apple should either give up on AI in the sense of rolling their own models, or they need to invest fully and try to be a frontier lab. They’re trying to do something in the middle, and that won’t fly.
A good question here is, who is paying who? The reason Apple might not go with Anthropic is that Anthropic wanted to get paid.
Meta licenses from MidJourney. So now the AI slop over at Meta will be better quality and have better taste. Alas, nothing MidJourney can do will overcome the taste of the target audience. I obviously don’t love the idea of helping uplift Meta’s capabilities, but I don’t begrudge MidJourney. It’s strictly business.
Elon Musk has filed yet another lawsuit against OpenAI, this time also suing Apple over ‘AI competition and App Store rankings.’ Based on what is claimed and known, this is Obvious Nonsense, and the lawsuit is totally without merit. Shame on Musk.
Pliny provides the system prompt for Grok-Fast-Code-1.
Anthropic offers a monthly report on detecting and countering misuse of AI in cybercrime. Nothing surprising, yes AI agents are automating cybercrime and North Koreans are using AI to pass IT interviews to get Fortune 500 jobs.
An introduction to chain of thought monitoring. My quibble is this frames things as ‘maybe monitorability is sufficient even without faithfulness’ and that seems obviously (in the mathematician sense) wrong to me.
Show Me the Money
Anthropic to raise $10 billion instead of $5 billion, still at a $170 billion valuation, due to high investor demand.
It makes sense. a16z’s central thesis is that hype and vibes are what is real and any concern with what is real or that anything might ever go wrong means you will lose. Anthropic succeeding is not only an inevitably missed opportunity. It is an indictment of their entire worldview.
Eliezer Yudkowsky affirms that Dario Amodei makes an excellent point, which is that if your models make twice as much as they cost, but every year you need to train one that costs ten times as much, then each model is profitable but in a cash flow sense your company is going to constantly bleed larger amounts of money. You need to have both these financial models in mind.
Three of Meta’s recent AI hires have already resigned.
Archie Hall’s analysis at The Economist measures AI’s direct short-run GDP impact.
Quiet Speculations
Roon points out that tech companies will record everything and store it forever to mine the data, but in so many other places such as hospitals we throw our data out or never collect it. If we did store that other data, we could train on it. Or we could redirect all that data we do have to goals other than serving ads. Our call.
Rhetorical Innovation
Andrew Critch pointed me to his 2023 post that consciousness as a conflationary alliance term for intrinsically valued internal experiences. As in, we don’t actually agree on what consciousness means much at all, instead we use it as a stand-in for internal experiences we find valuable, and then don’t realize we don’t agree on what those experiences actually are. I think this explains a lot of my being confused about consciousness.
This isn’t quite right but perhaps the framing will help some people?
One could also say ‘this five year old seems much more capable than they were a year ago, but they messed something up that is simple for me, so they must be an idiot who will never amount to anything.’
Who is worried about AI existential risk? Anyone worth listening to?
That’s all? And technically Sunni Islam outnumber Catholics? Guess not. Moving on.
I do think there is a legitimate overloading of the term ‘math’ here. There are at least two things. First we have Math-1, the thing that high schoolers and regular people do all the time. It is the Thing that we Do when we Do Math.
There is also Math-2, also known as ‘Real Math.’ This is figuring out new math, the thing mathematicians do, and a thing that most (but not all) high school students have never done. A computer until recently could easily do Math-1 and couldn’t do Math-2.
Thus we have had two distinct step changes. We’ve had the move from ‘LLMs can’t do Math-1’ and even ‘LLMs will never do Math-1 accurately’ to ‘actually now LLMs can do Math-1 just fine, thank you.’ Then we went from ‘LLMs will never do Math-2’ to ‘LLMs are starting to do Math-2.’
One could argue that IMO problems, and various optimization problems, and anything but the most 2-ish of 2s are still Math-1, are ‘not real math.’ But then you have to say that even most IMO competitors cannot yet do Real Math either, and also you’re going to look rather silly soon when the LLMs meet your definition anyway.
Seriously, this:
Matteo Wong in The Atlantic profiles that ‘The AI Doomers Are Getting Doomier’ featuring among others MIRI and Nate Sores and Dan Hendrycks.
An excellent point is that most people have never had a real adversary working against them personally. We’ve had opponents in games or competitions, we’ve negotiated, we’ve had adversaries within a situation, but we’ve never had another mind or organization focusing on defeating or destroying or damaging us by any means necessary. Our only experience of the real thing is fictional, from things like movies.
Right now we are dealing with problems caused by AIs that very much are not smart or powerful enough to be adversaries, that also aren’t being tasked with trying to be adversaries, and that mostly don’t even involve real human adversaries, not in the way the Russian Internet Research Agency is our adversary, or Mossad might make someone its adversary. Things are quiet so far both because the AIs aren’t that dangerous yet and also because almost no one is out there actually trying.
Ezra Klein makes a classic mistake in an overall very good piece that I reference in several places this week.
Who said anything about ‘not making any obvious mistakes’?
This is a form of the classic ‘AI takeover requires everything not go wrong’ argument, which is backwards. The AI takeover is a default. It does not need to make a particular deliberate effort to attain power. Nor would an attempt to gain power that fails mean that the humans win.
Nor does ‘makes an obvious mistake’ have to mean failure for a takeover attempt. Consider the more pedestrian human takeover attempts. As in, when a human or group tries to take over. Most of those who succeed do not avoid ‘making an obvious mistake’ at some point. All the time, obvious mistakes are recovered from, or simply don’t matter very much. The number of times a famous authoritarian’s first coup attempt failed, or they came back later like Napoleon, is remarkably not small.
Very often, indeed most of the time, the other humans can see what is coming, and simply fail to coordinate against it or put much effort into stopping it. I’m sure Ezra, if reading this, has already thought of many examples, including recently, that fit this very well.
The Week in Audio
Anthropic discussion of Claude Code with Cat Wu and Alex Albert. Anthropic also discussed best practices for Claude Code a few weeks ago and their guide to ‘mastering Claude Code’ from a few months ago.