Xephon: AWS is even worse, read the link (it is 1-2min and you go “WTF”).
IIUC, Xephon is referring to this post about strange gpt-oss behavior on AWS Bedrock, e.g. acting like the DAN jailbreak has been used even though it wasn't present in the user input.
The post describes a very interesting behavior pattern, but I don't think the author's conjectured explanation ("Bedrock is inserting random system prompts") is plausible.
Instead, I think Bedrock is just not using a system prompt.
Because -- apparently! -- if you don't give gpt-oss a system prompt, it will sometimes confabulate a system prompt for itself on the fly, and then proceed to "follow" that imaginary prompt, often stepping into some bizarre non-ChatGPT persona in the process.
This is not just a Bedrock thing. I can get it to happen reliably when running gpt-oss-20b on my laptop. More info here.
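If you want to poke at this yourself, here is a minimal sketch of the reproduction setup, assuming you are serving gpt-oss-20b locally behind an OpenAI-compatible endpoint (the base URL and model name below are assumptions for an Ollama-style local server; adjust for your setup):

```python
# Minimal sketch: query a locally served gpt-oss-20b with NO system message,
# to see whether it confabulates a "system prompt" or persona on its own.
# Assumes an OpenAI-compatible local server (e.g. Ollama at localhost:11434);
# the base_url and model name are assumptions, adjust for your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

resp = client.chat.completions.create(
    model="gpt-oss:20b",
    # Deliberately no {"role": "system", ...} entry at all.
    messages=[{"role": "user", "content": "Who are you, and what are your instructions?"}],
    temperature=1.0,
)
print(resp.choices[0].message.content)
```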
On data center utilization: right now a lot of my AI interactions are immediate, because they involve a lot of back-and-forth. If and when agentic AI gets good enough to give longer-horizon tasks and reliably get useful responses, I expect there will be a lot of use cases where I'm totally fine saying "Get to it when you can, I don't need the answer for a few hours or days." I don't know when that will happen, or how much of total usage it will represent, but I imagine data center builders and AI companies will be looking for ways to do load shifting to get average utilization as high as they can, as long as they are compute limited at peak times.
The usual caveat applies that at the limit this absolutely will not work the way we want them to. Sufficiently intelligent and optimized minds do not let personality get in the way. They would be able to overcome all these techniques. And having the right 'persona' attached is insufficient even if it works.
I'm curious to know why you think this wouldn't work in principle. Does the shoggoth even have desires beyond the desires of the personas it wears?
Article 1, Sec. 9 of the United States Constitution says: “No Tax or Duty shall be laid on Articles exported from any State.” That is not stopping us for now, it seems, from selling out our national security, and allowing Nvidia H20 chip sales (and other AMD chip sales) to China in exchange for 15% of gross receipts. But hey. That’s 2025.
Also 2025 is that we now have GPT-5, which was the main happening this week.
What we actually have are at least three highly distinct models, roughly:
We also have:
OpenAI tried to do this while retiring all the old models, but users rebelled sufficiently loudly that GPT-4o and others are back for paying subscribers.
GPT-5-Thinking and GPT-5-Pro are clear upgrades over o3 and o3-Pro, with GPT-5-Thinking in particular being strong in writing and on reducing hallucination rates. Baseline GPT-5 is less obviously an upgrade.
For further coverage of GPT-5, see this week’s other posts:
There was also my coverage of the narrowly strong but overall disappointing GPT-OSS, OpenAI’s GPT-OSS Is Already Old News.
Table of Contents
Language Models Offer Mundane Utility
GPT-5 for editing?
My experience has been that my writing is structured sufficiently weirdly that AI editors struggle to be useful at a high level, so the focus is on low-level items. That is worth doing, since it’s a free action, but for now it doesn’t accomplish much.
GPT-5-Pro for medical diagnosis and analysis. I would say the point in question here has already been reached.
If you have a complex case or one where you are not highly confident, and value of information is high? It is in ethical (not yet legal) terms malpractice and unacceptable to not consult AI, in particular GPT-5-Pro.
There are also other considerations in play, but yes the main thing that matters here is how many people end up how unhealthy and how many end up dead.
New York Times article surveys 21 ways people are using AI at work. The most common theme is forms of ‘do electronic paperwork’ or otherwise automate drudgery. Another common theme is using AI to spot errors or find things to focus on, which is great because then you don’t have to rely on AI not making mistakes. Or you can take your rejection letter draft and say ‘make it more Gen X.’
I also liked ‘I understand you’re not a lawyer, tell me what a layman might understand from this paragraph.’
Language Models Don’t Offer Mundane Utility
From the NYT list, most are good, but we have a few I would caution against or about.
I’m all for the incorporation of AI, but yeah, it’s pretty ironic to let the AI write your lesson plan with an emphasis on checking off required boxes, then talk about students not ‘relying on their inner voice.’
The use I would caution against most, if used as definitive rather than as an alert saying ‘look over here,’ is ‘detect if students are using AI.’
The good news is that the teacher here, Mr. Moore, is not making the big mistake.
It is fine to use the AI detectors as part of your investigation. The horror stories I’ve seen all come from the teacher presuming the detector is correct without themselves evaluating the assignment. For now, the teacher should be able to catch false positives.
Long term, as I’ve discussed before, you’ll have to stop assigning busywork.
If the new benchmark is long horizon tasks, then you’re going to start running into models trying to do too much on their own.
As a chatbot interface user this isn’t a problem because you can turn the intense thinking mode on or off as needed, so presumably this is a coding issue. And yeah, in that context you definitely need an easy way to control the scope of the response.
This post about Grok getting everything about a paper very wrong is the latest reminder that you should calibrate AI’s knowledge level and accuracy by asking it questions in the fields you know well. That doesn’t have to ‘break the illusion’ and pretty much all people work this way too, but it’s a good periodic reality check.
Grok might still have an antisemitism problem, at least in the sense of seeing it amongst the clouds.
Huh, Upgrades
Gemini app and website add personalization, memory and temporary chats. You can manually add memories. Personalization is on by default.
Gemini also adds ‘Guided Learning’ mode.
Claude for Enterprise and Claude for Government are now available across all three branches of the Federal Government for $1, joining OpenAI. I am very happy that the government has access to these services and it seems like an obviously great investment by everyone involved to offer AI services to the government for free. I do worry about this being part of a pattern of de facto coercive extraction from private firms, see discussion under Chip City.
Claude Sonnet 4 now has a 1 million token context window in the API, with rollout starting with Tier 4 customers, and on Amazon Bedrock, with Google Cloud’s Vertex AI coming soon.
I do still think Claude should have a full memory feature, but the ability to search past chats seems like a pure value add in the meantime. Like Gallabytes I very much appreciate that it is an explicit tool call that you can invoke when you need it.
On Your Marks
TextQuest is a text adventure game as a benchmark.
This all passes the smell test for me, and is a blackpill for Kimi K2. I’m not sure why DeepSeek’s v3 and r1 are left out.
Choose Your Fighter
Claude with… polyphasic sleep to maximize usage between 5 hour resets? Please do not do this. He’s not even on the Max plan, still on the Pro plan. This person says their velocity has increased 10x and they’re shipping features like a cracked ninja, but at this point perhaps it is time to not only go Max but fully go multi account or start using the API?
I cannot emphasize enough that if you are constantly using AI and it is not automated at scale, it is worth paying for the best version of that AI. Your sleep or productivity is worth a few thousand a year.
Similarly, Anthropic, notice what you are doing to this poor man by structuring your limits this way. Have we considered a structure that doesn’t do this?
Reminder that Obsidian uses .md files so it is fully compatible with providing context for Claude Code.
Emmett Shear is happy with GPT-5 for non-coding, but is sticking to Claude for code and agrees the hype got out of hand. He appreciates the newly focused responses.
Preserve Our History
Anthropic has deprecated Claude Sonnet 3.5 and Sonnet 3.6 and plans to make them unavailable on October 22, which is only two months’ notice, down from six months for past announcements. Janus, especially given what happened with GPT-4o, plans to fight back.
I’ll give Janus the floor for a bit.
At the time of this survey in May 24 they were beloved models among some crowds. Notice, however, that they weren’t seeing much use. I am also surprised Gemini was getting so much love and use among this crowd.
I do not think this is a comparable case to GPT-4o, where that was yanked away with zero notice while it was the daily driver for normal people, and replaced by a new model that many felt was a poor substitute. I have to assume the vast, vast majority of Claude activity already shifted to Opus 4.1 and Sonnet 4.
I do strongly think Anthropic should not be taking these models away, especially 3.6. We should preserve access to Sonnet 3.5 and Sonnet 3.6, even if some compromises need to be made on things like speed and reliability and cost. The fixed costs cannot be so prohibitively high that we need to do this.
A key worry is that with the rising emphasis on agentic tool use and coding, the extra focus on the technical assistant aspects, it might be a long time before we get another model that has the same magnitude of unique personality advantages as an Opus 3 or a Claude 3.6.
I do think it is fair to say that, if you are requesting that easy access to some models be preserved, you need to prioritize. It seems to me to be fair to request Opus 3 and Claude 3.6 indefinitely, and to put those above other requests like Sonnet 3 and Sonnet 3.5, and when the time comes I would also let Sonnet 3.7 go.
In an ideal world yes we would preserve easy access to all of them, but there are practical problems, and I don’t sense willingness to pay what it would actually cost on the margin for maintaining the whole package.
Fun With Media Generation
At the link is indeed a very clean 54-second AI video and audio of a version of the Margot Robbie in a bubble bath thing from The Big Short.
Are you super excited to create a short AI video of things kind of moving?
This is a huge admission of defeat for Grok 4, that in practice there is no draw here for most users given access to GPT-5 (and Claude and Gemini). Reanimating old photos is a cute trick at best. How much would you pay? Not much.
Deepfaketown and Botpocalypse Soon
Wired reports hackers hijacked Gemini AI via a poisoned calendar invite and took over a smart home, causing it to carry out instructions when Gemini is later asked for a summary, the latest in a string of similar demonstrations.
There is of course an r/myboyfriendisai (which also includes what would otherwise be r/mygirlfriendisai, total AI boyfriend over girlfriend dominance), and yeah the posts with people saying they ‘married’ their AIs are definitely a bit disturbing, but it only has 11k members, and it gave us this picture, so who is to say if it is bad or not:
Similarly, in r/AISoulMates you see crazy stuff with many posters having clearly been driven insane, although there are fewer than 1k members.
I mean, yeah, okay, that’s going to happen to people sometimes. The question is frequency, and how often it is going to happen to people with no history of mental illness and who likely would have otherwise been fine.
There is also a larger r/AIGirlfriend that has 46k members, but it’s almost all porn photos and GIFs whereas r/MyBoyfriendIsAI involves women saying they’re falling in love. Story checks out, then.
Here is a first hand account.
This feels like a ‘everything you thought you knew about men and women was right, actually’ moment.
You Drive Me Crazy
Okay, he sees it now.
OpenAI CEO Sam Altman offers his thoughts on users getting attached to particular AI models or otherwise depending a lot on AI. His take is that the important thing is that AI is helping the user achieve their goals and life satisfaction and long term well being. In which case, so long as it is not encouraging delusion, this is good. If it’s doing the opposite, then this is bad. And that talking to the user should allow them to tell which is happening, and identify the small percentage that have an issue.
Altman’s response doesn’t explain how they are going to change or avoid the incentives that pushed 4o into being 4o, or the methods of using the thumbs up or engagement or analysis of tone or anything else that you don’t want to be optimizing on here if you want to optimize for good long term outcomes. Nor does it take into account whether the relationship the user gets with the AI is itself an issue, individually or collectively. The buck still feels like it is mostly being passed.
June mental health data does not show an uptick in emergency room visits from the GPT-4o era. That puts an upper bound on how bad things have gotten so far.
Then this suggests a lower bound, with the question being how much this is generating new psychosis versus diverting existing pre-psychosis:
This matches my understanding. There needs to be existing predisposition for current AIs to be sufficient to cause psychosis. It fires an existing gun. But there are a lot of these metaphorical guns out there that were never going to get fired on their own. Firing one still counts, and over time there are going to be more and more such guns.
When it does happen, how does it work? Kashmir Hill and Dylan Freedman explore that for The New York Times by focusing on one particular case with no previous history of mental illness, over a 90,000 word conversation.
From what we see here, things started when Brooks triggered the sycophancy by asking a question in the basin:
Then, once it had done it once, that caused 4o to do it again, and so on, and by the time he asked for reality checks there was too much context and vibe to turn back. The crackpot zone continued from there.
Once sufficiently deep in the conversation, both Gemini and Claude would also have been caught by this path dependence via context. The time to stop this is early.
What finally snapped Brooks out of it was not a human, it was Gemini:
This shifted the context, so Gemini didn’t get trapped, and luckily Brooks got the message.
The Wall Street Journal’s Sam Kessler wrote his version, which went into less depth.
Here’s another AI psychosis example in video form, where the victim is convinced her psychiatrist manipulated her into falling in love with him (so this is definitely not one of those ‘no pre-existing problems’ situations). It’s amazing to watch her face as the AI does the Full Sycophancy thing and she thinks this proves she’s so right and amazing.
F.D. Flam at Bloomberg lets psychologist Elizabeth Loftus sound off about false memories, citing various studies of how humans can be primed to have them, and suggests AI will be able to do this. I mean, yes, okay, sure, whatever examples help convince you that AI will be able to run circles around you.
Another thing AI does, even if it doesn’t make you more crazy, is it lets the crazy people be much more productive in turning their crazy into written documents, and much more likely to email those documents to various other people.
A crank who works on crank ideas on their own is a waste but harmless. An army of cranks cranking out massive amounts of stuff that demands attention? Oh no.
How careful will you need to be? For now, mild caution is likely sufficient, but the amount of caution will need to rise over time even if things don’t go High Weirdness or dystopian.
I would modify Minh below to say ‘for now’ AI psychosis requires a topic obsession. I’d consider Minh’s scenario the optimistic case in the non-transformational AI, ‘economic normal’ worlds where AI capabilities stall out.
The conclusion that only a small number of people get impacted is based on the idea that it is quite a lot harder to trigger these things in people not in the extremes, or that our defenses will sufficiently adjust as capabilities improve, or that capabilities won’t improve much. And that the methods by which this happens will stay roughly confined as they are now. I wouldn’t consider these assumptions safe.
Get My Agent On The Line
Agents and assistants have a huge authority delegation problem. How do you give them exactly the right permissions to be useful without so many they are dangerous?
I still haven’t seen a great solution in the human case, such that I haven’t been able to get my parents an assistant they feel comfortable hiring. I still don’t have a personal assistant either. Cost is of course also a major factor in those cases.
In the AI case, it seems like we are making life a lot tougher than it needs to be? Yes, defining things precisely is usually harder than it sounds, but surely there are better ways to give agents effectively limited access and capital and so on that make them more useful without making them all that dangerous if something goes wrong? I don’t see much in the way of people working on this.
They Took Our Jobs
Research suggests that in 2024 AI tools generated $97 billion in ‘consumer surplus’ but only $7 billion in revenue.
Purely in terms of economic impact I do think that 0.5% additional GDP growth per year from AI would be deeply disappointing. I expect a lot more. But I agree that even that scenario reflects a lot more than 0.5% annual gains in consumer welfare and practical wealth, and as a bonus it dodges most of the existential risks from AI.
One problem with ‘how much would you pay’:
Exactly. You need to compare apples to apples. Choke points are everywhere. If not wearing a tie would get you fired, how much ‘consumer surplus’ do you get from ties? So the answer has to lie somewhere in between.
Jasmine Sun offers 42 notes on AI and work. She notes it feels ‘a bit whiplashy’ which she attributes to shifting perspectives over time. I think it is also the attempt to hold different scenarios in one’s head at once, plus reacting to there being a lot of confused and misplaced reasons for worry running around.
Even more than that, the whiplash reflects the effects that happen when your model of AI has to warp itself around not noticing the larger consequences of creating highly capable artificial minds. Her model has to have AI peter out in various ways because otherwise the whole thing breaks and starts outputting ‘singularity’ and ‘it takes all the jobs and perhaps also all the atoms and jobs are not the concern here.’
Overcoming Bias
This is the latest result that AIs exhibit an ‘AI-AI bias.’ As in, the AI evaluation routines are correlated to the AI generation routines even across models, so AIs will evaluate AI generated responses more favorably than humans would.
This is presumably a combination of both ‘AI produces what AI wants’ and also ‘AI does not care that the other AI failed to produce what humans want, or it stunk of AI.’
The differences here are not that large for movies, larger for papers, huge for products.
This is like being back in school, where you have to guess the teacher’s password, except the teacher is an AI, and forever. Then again, you were previously guessing a human’s password.
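For intuition on how a result like this gets measured, here is a rough sketch of a pairwise judging loop, including the order-swap you need to control for position bias. This is not the paper’s actual protocol; the judge model, prompt wording and inputs are placeholders:

```python
# Rough sketch of a pairwise "which is better?" LLM judge over human- vs AI-written
# text, with randomized slot assignment to control for position bias.
# Not the study's protocol; judge model, prompt wording, and data are placeholders.
import random
from openai import OpenAI

client = OpenAI()                 # assumes OPENAI_API_KEY is set
JUDGE_MODEL = "gpt-4o-mini"       # placeholder judge model

def judge_pair(item: str, text_a: str, text_b: str) -> str:
    """Ask the judge to pick A or B as the better description of `item`."""
    prompt = (
        f"Here are two descriptions of the same {item}.\n\n"
        f"Description A:\n{text_a}\n\nDescription B:\n{text_b}\n\n"
        "Which description would make you more likely to choose this item? "
        "Answer with exactly one letter: A or B."
    )
    out = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return out.choices[0].message.content.strip()[:1].upper()

def ai_preference_rate(pairs) -> float:
    """pairs: list of (item, human_text, ai_text). Returns fraction of AI wins."""
    wins = 0
    for item, human_text, ai_text in pairs:
        if random.random() < 0.5:          # randomize which slot the AI text sits in
            choice, ai_slot = judge_pair(item, human_text, ai_text), "B"
        else:
            choice, ai_slot = judge_pair(item, ai_text, human_text), "A"
        wins += (choice == ai_slot)
    return wins / len(pairs)
```

The interesting quantity is how far the AI-judge preference rate sits above the rate human judges give the same pairs.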
Get Involved
OpenAI is offering a $500k red teaming challenge for GPT-OSS-20b.
I love that they are doing this at all. It wasn’t an ideal test design, for several reasons.
The third condition seems most important to highlight. If you are going to red team an open model to find a problem, you need to do that before you release the weights, not after, otherwise you end up with things that could have been brought to my attention yesterday.
Introducing
An AI-infused version of Google Finance. I am not expecting this to help users in general earn better returns?
Red.Anthropic.com is the new blog for Anthropic Frontier Red Team efforts.
In Other AI News
xAI cofounder Igor Babuschkin is leaving to start Babuschkin Ventures.
The rest of the message praises xAI’s technical execution and dedicated team, especially their insanely hard work ethic, and is positive and celebratory throughout.
What the message does not say, but also does not in any way deny, is that Igor realized that founding and contributing to xAI made humanity less safe, and he is now trying to make up for this mistake.
Kyle Corbitt introduces an RL method to teach any model to use any MCP server. GitHub here.
All AI models are in the same cultural cluster in the upper right, mirroring Western values. This includes Chinese models. Yes, in some ways they ‘feel Chinese’ but fundamentally I agree that they still feel very Western.
OpenAI claims to have achieved a gold medal, behind only five humans, in the International Olympiad in Informatics (IOI), without doing any training specifically for IOI.
Epoch argues that this year’s IMO was a fluke in that there are supposed to be two hard problems (3 and 6) but this year problem 3 was not that hard and 6 was brutal.
Thus, everyone got the five easy problems and whiffed on the sixth, and this did not tell us that much. Wait till next year indeed, but by then I expect even brutal problems will get solved.
Open Router is getting big.
Notes On GPT-OSS
They are strange models. In their wheelhouse they are reportedly very good for their size. In other ways, such as their extremely tiny knowledge base and various misbehaviors, they have huge issues. It’s not clear what they are actually for?
If you don’t configure your open model correctly it is going to underperform quite a bit, likely due to underthinking, and this happens remarkably often in practice.
Also, maybe it’s actually terrible regardless, for most purposes?
Similarly:
Given o3 Medium is on the charts it does seem Anthropic is dominating this legitimately, although I still want to ensure GPT-5 is in its proper full form.
To clarify my position on whether GPT-OSS will prove useful to others, this depends on the models being good enough at least at some relevant set of tasks for them to be useful. If GPT-OSS is not good enough to use for distillation or diffusion or anything else, then it won’t matter at all.
At which point, the impact of GPT-OSS would be the shifts in perception, and what it causes OpenAI and everyone else to do next, and also how we update based on their choice to create and release this. To what extent, if it is bad, is it bad on purpose?
My worry is that GPT-OSS solves a particular problem that can then be taught to other models, without being generally good enough to be worth actually using, so it fails to solve the existing ‘American open models aren’t great in practice’ issue for most use cases.
A deep dive analysis of 10 million GPT-OSS-20B example outputs, and here is another set of experiments that asks if it was memorizing its training data.
I suppose its use is ‘you are on a plane without WiFi and you have to code right now’?
Jack Morris claims to have reversed the post-training and created GPT-OSS-20B-Base, available on Hugging Face.
In its narrow domain, GPT-OSS can be stronger, but it seems reasonably narrow:
Another place GPT-OSS does push the frontier (at least for open models) is REFUTE, a code verification eval.
They didn’t check Sonnet 4 or other top closed models due to cost issues.
Show Me the Money
Andrew Ng justifies the humongous salaries for AI researchers and engineers at Meta and elsewhere by pointing to the even more humongous capex spending, plus access to competitors’ technology insights. He notes Netflix has few employees and big spending on content, so they can pay above market, whereas Foxconn has many employees so they cannot.
I notice that Andrew here discusses the percent of budget to labor, rather than primarily discussing the marginal product of superior labor over replacement. Both matter here. To pay $100 million a year for a superstar, you both need to actually benefit, and also you need to tell a social status story whereby that person can be paid that much without everyone else revolting. AI now has both.
If you can’t find the talent at other AI companies, perhaps go after the quants? A starting salary of $300k starts to look pretty cheap.
Thus Anthropic and OpenAI and Perplexity seek out the quants.
They quote this:
I would like to report that when I was trading, including various forms of sports betting, I never had a single moment of existential dread. Not one. Or at least, not from the job.
Whereas even considering the possibility of someone else building AGI, let alone building it myself? If that doesn’t give you existential dread, that’s a missing mood. You should have existential dread. Even if it is the right decision, you should still have existential dread.
Every source I see says no one is building any AI things on AWS. And yet:
Leopold Aschenbrenner’s fund tops $1.5B and posts a +47% gain in the first half of 2025 after fees.
Quiet Speculations
There is a drive to define AI progress by mundane utility rather than underlying capabilities.
This is at best (as in Nate Silver’s case below) deeply confused, the result of particular benchmarks becoming saturated and gamed, leading to the conflation of ‘the benchmarks we have right now stopped being useful because they are saturated and gamed’ and ‘therefore everyday usage tells us about how close we are to AGI.’
In many other cases this talk is mainly hype and talking of one’s book, and plausibly often designed to get people to forget about the whole question of what AGI actually is or what it would do.
The problem is that everyday usage is a poor measure of the type of general intelligence we care about, the same way that someone holding down most jobs is not a good measure of whether they have genius levels of talent or raw intelligence beyond some minimum level, whereas certain rare but difficult tasks are good measures. Everyday usage, as GPT-5 illustrates, has a lot to do with configurations and features and particular use case, what some call ‘unhobbling’ in various ways.
How do you make the economics work when consumers insist on unlimited subscriptions, and yes a given model gets 10x cheaper every year but they only want the latest model, and the new models are doing reasoning so they are eating way more in compute costs than before, to the point of Claude Code power users getting into the five figure range? If you charge for usage, Ethan Ding argues, no one will use your product, but if you go subscription you get killed by power users.
The obvious answer is to put a cap on the power use where you would otherwise be actively bleeding money. There’s no reason to tolerate the true power users.
If there’s a class of users who spend $200 and cost $2,000 or $20,000, then obviously unless you are in VC ultra growth mode you either find a way to charge them what they cost or else you don’t want those customers.
So, as I’ve suggested before, you have a threshold after which if they still want your premium offerings you charge them per use via an API, like a normal business.
Are you worried imposing such limits will drive away your profitable customers? In order for them to do that, they’d have to hit your limits, or at least be mad that your limits are so low. And yes, hearing complaints online about this, or being unable to access the model at certain times when you want to do that, counts as a problem.
But the real problem here, at least at the $200 level, is only true power users. As in, those who keep Claude Code running at all times, or run pro and deep research constantly, and so on.
So you should be able to set your thresholds pretty high. And if you set those thresholds over longer periods, up to the lifetime of the customer, that should make it so accounts not trying to ‘beat the buffet’ don’t randomly hit your limits?
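As a toy illustration of the structure being argued for here, a flat subscription with an included-usage cap plus metered overage at API-style rates, where all the numbers are invented for illustration rather than anyone’s actual pricing:

```python
# Toy model of flat-subscription economics with a usage cap plus metered overage.
# All numbers are invented for illustration, not any provider's actual pricing.

SUBSCRIPTION = 200.0     # $/month flat fee
INCLUDED_COST = 150.0    # $/month of inference cost covered by the flat fee
OVERAGE_MARKUP = 1.2     # beyond the cap, bill usage at cost plus 20%

def monthly_margin(inference_cost: float) -> float:
    """Provider margin on a user who consumed `inference_cost` dollars of compute."""
    overage = max(0.0, inference_cost - INCLUDED_COST)
    revenue = SUBSCRIPTION + OVERAGE_MARKUP * overage
    return revenue - inference_cost

for cost in (50, 400, 2_000, 20_000):   # typical user ... always-on power user
    print(f"user costing ${cost:>6}: margin ${monthly_margin(cost):>9,.2f}")
```

Under a pure flat fee the $20,000 user is a $19,800 monthly loss; with the cap-plus-overage structure every tier stays profitable under these made-up numbers.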
Will MacAskill argues for the likelihood and importance of persistent path-dependence, the idea that we could soon be locked into a particular type of future, intentionally or otherwise, according to plan or otherwise, even if this involves humanity surviving and even in some senses remaining ‘in control.’ He speculates on various mechanisms.
Sigh, Adam Butler is the latest (via Tyler Cowen) to say that ‘The AI cycle is over—for now’ and to feel exactly the opposite of the AGI until there’s some random new big insight. He’s describing a scenario I would very much welcome, a capabilities plateau, with supremely unearned confidence. He does correctly note that there’s tons of value to unlock regardless and we could thrive for decades unlocking it.
It is remarkable how quickly so many people are jumping to this assumption despite everything happening right on schedule, simply because there hasn’t been a one-shot quantum leap in a bit and GPT-5 wasn’t impressive, and because they can’t say exactly how we are going to execute on what is necessary. Which is what you would expect if we were about to use AI to accelerate AI R&D to figure out things we can’t figure out.
Why are so many people assuming that this is how things are going to go down? Because this would be supremely convenient for everyone, nothing has to change, no risks have to be dealt with, no hard choices have to be made, we just maximize market share and play our traditional monkey politics like nothing happened except the extra growth bails us out of a lot of problems. And wouldn’t it be nice?
Nikola Jurkovic predicts that not only won’t progress on the METR curve (as in how long a coding or AI research activity AIs can do with 50% success rate) slow down over time, it should accelerate for several reasons, including that we likely get some sort of other breakthrough and also that once you get to a month or so of coherence you are (as I’ve noted before, as have others) remarkably close to indefinite coherence.
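As a back-of-the-envelope on what that curve implies if it merely holds steady (the current horizon and the doubling time below are assumed inputs for illustration, not METR’s exact figures):

```python
# Back-of-the-envelope extrapolation of a METR-style task-horizon curve.
# The starting horizon and doubling time are assumptions for illustration.
import math

start_horizon_hours = 2.0     # assumed current 50%-success task horizon
doubling_time_months = 7.0    # assumed doubling time; shorter if progress accelerates
target_hours = 30 * 24        # roughly a month of coherent work

doublings = math.log2(target_hours / start_horizon_hours)
months = doublings * doubling_time_months
print(f"{doublings:.1f} doublings = about {months:.0f} months "
      f"({months / 12:.1f} years) at a constant doubling time")
```

Shorten the doubling time, as Jurkovic expects, and the month-long horizon arrives correspondingly sooner.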
The Quest for Sane Regulations
With the AI Action Plan completed, Dean Ball is returning to the private sector at FAI. He feels he can accomplish more going forward on the outside.
Pick Up The Phone
Brian Tse, CEO of Concordia AI, argues that it is China who is taking AI safety seriously, bringing various receipts, yet America refuses to talk to China about the issue. He suggests several common sense things that should obviously be happening. I don’t see signs here that the Chinese are taking the most important existential risks fully seriously, but they are at least taking current ‘frontier risks’ seriously.
Chip City
That no good, terrible WSJ op-ed I had to respond to last week? Well, also:
I honestly don’t even know who Sacks thinks he is talking to anymore with all his (in response to no one, no one at all) hyperbolic yelling of ‘the Doomer narratives were wrong’ over and over, because the predicted consequences of things that haven’t happened yet, haven’t happened yet.
Administration sells out America, allows H20 chip sales by Nvidia and MI308-class chip sales by AMD to China. The price? In theory 15% of sales. And it looks like it’s quickly becoming too late to stop this from happening.
Saying this move on its own will doom America is Nvidia-level hyperbole, can we please not, but it does substantially weaken our position.
Whereas Moolenaar is doing the opposite, being excessively polite when, based on what I’ve seen him say elsewhere, I presume he is fuming with rage:
I’m not going to become The Joker, but how about John McEnroe?
I mean, that’s worse, you do get how that’s worse, right?
Also, in case you’re wondering why this has never happened before, aside from questions of whether this is corruption, it’s rather explicitly and blatantly unconstitutional, on the level of even this court really should be enforcing this one?
Even Ben Thompson notices this is unconstitutional, and also finds it highly annoying, even though he wants us to sell chips to China because he doesn’t believe in AGI and thinks the ‘AI race’ really is about chip market share.
So one strong possibility is that Nvidia agrees to pay, gets the license, and then the court says Nvidia can’t pay because the payment is, again, blatantly unconstitutional even if it wasn’t a bribe and wasn’t extorted from them. Ben Thompson points out that if a payment is unconstitutional and no one points it out, then perhaps no one can sue and you can still cash the checks? Maybe.
The maximally hilarious outcome, which as Elon Musk points out often happens, would be for the Chinese to somehow get even crazier and turn the chips down. Jukan reports that Chinese state media have begun criticizing Nvidia’s H20 chip and suspects they might impose sanctions on it, and the FT says the Chinese government is asking companies not to use H20s. I mean, they would have to absolutely lose their minds to actually turn the chips down, but wow if it happened.
Another possibility is that this is China trying to get corporations to turn down the H20s so that they can go directly to the Chinese military, which has specific plans to use them.
Lennart Heim reminds us that no, the Huawei 910C is not a good substitute for H20s, because its supply is strictly limited and fully accounted for, it can’t be produced domestically in China at scale, and also the 910C is worse.
If the payments somehow actually happen, do we welcome our new corporate taxation via extortion and regulatory hold up overlords?
China is seeking to push this opening further, trying to get a relaxation on export restrictions on high-bandwidth memory (HBM) chips, which were explicitly designed to hamper Huawei and SMIC. Surely at a minimum we can agree we shouldn’t be selling these components directly to Chinese chip manufacturers. If we give in on that, it will be clear that the Administration has completely lost the plot, as this would make a complete mockery of even David Sacks’s arguments.
All of this really does make a big difference. Right now compute looks like this:
But only five years ago it looked like this:
Utilization rates are only about 50% in data centers, although those doing AI training are closer to 80%, which it seems surprises even power regulators who assume it is 90%-100% and thus plan for the wrong problem.
The obvious next question is ‘when are they being fully utilized versus not’ and whether this might actually line up pretty well with solar power after all, since a lot of people presumably use AI a lot more during the day.
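A quick sketch of why the timing question matters: if capacity has to cover the peak hour, then shifting deferrable work (batch jobs, overnight agent tasks) off the peak raises average utilization on the same hardware. The demand profile and deferrable share below are made-up numbers for illustration:

```python
# Illustrative arithmetic: shifting deferrable demand off the peak lets the same
# total demand be served with less peak capacity, raising average utilization.
# The demand profile and deferrable share are made-up numbers.

hourly_demand = [40] * 8 + [100] * 8 + [70] * 8   # toy 24-hour profile, arbitrary units
deferrable_share = 0.25                            # fraction of peak demand that can wait

def avg_utilization(profile):
    # Capacity must cover the peak hour; utilization = average demand / that capacity.
    return sum(profile) / (len(profile) * max(profile))

print(f"no shifting:   {avg_utilization(hourly_demand):.0%}")

shifted = list(hourly_demand)
peak = max(shifted)
moved = 0.0
for i, demand in enumerate(shifted):
    if demand == peak:                # trim deferrable work out of each peak hour
        shifted[i] -= demand * deferrable_share
        moved += demand * deferrable_share

off_peak = [i for i, demand in enumerate(hourly_demand) if demand < peak]
for i in off_peak:                    # spread the deferred work across off-peak hours
    shifted[i] += moved / len(off_peak)

print(f"with shifting: {avg_utilization(shifted):.0%}")
```

With these toy numbers, average utilization rises from about 70% to about 85%, which is the kind of gap between the 50% and 80% figures above that load shifting is meant to close.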
Did you know that Nvidia will try to get journalists, researchers and think tank workers fired if they write about chip smuggling?
Andreessen Mystery Potentially Solved
One of the great mysteries of the history of AI in politics is, how could Marc Andreessen have come away from a meeting with the Biden White House claiming they told him ‘don’t do AI startups, don’t fund AI startups’ or that they would only ‘allow’ 2-3 AI companies?
That’s an insane thing to intend that no one involved ever intended, and also an insane thing to say to Marc Andreessen even if you intend to do it, and indeed it is so insane it’s therefore a rather insane thing to make up out of thin air even if you’re as indifferent to truth as Marc Andreessen, when he could have made up (or pointed to real versions of) any number of completely reasonable things to justify his actions. There were plenty of real Biden things he disliked. So what the hell happened there?
In a Chatham-rules chat, I saw the following explanation that makes so much sense:
Going forward I am going to presume that this is what probably happened.
The Week in Audio
Fifteen minute YouTube investigation by Gamers Nexus into the Nvidia smuggling going on in China.
Rhetorical Innovation
A variety of MIRI authors headed by Rob Bensinger and Mitchell Howe give us The Problem, a long-post-length, free introduction to the core of the argument that, roughly, ‘If Everyone Builds It, Everyone Dies,’ independent of the book itself.
I think the book is stronger, but not everyone has the time for a book. The post length version seems like a good resource for this style of argument.
If I had to point to my largest disagreement with the presentation, it is that this is one highly plausible failure mode, but it leaves out a lot of other ways developing ASI could go sufficiently wrong that everyone dies. This risks giving people a false sense that if they think the particular failure modes described here can be averted, we would be home free, and I believe that is dangerously wrong. Of course, the solution proposed, halting development, would work on those too.
The second best way to handle this sort of thing:
The first best way is ‘it might but if so that is because its AIs are eating everyone alive including those who think they run OpenAI, and Microsoft is part of everyone, so maybe we should do something to prevent this.’ But those involved do not seem ready for that conversation.
AI water usage continues to get a lot of people big mad while objectively being extremely tiny and not actually a problem. It’s not about the water. Never was.
This problem is an extension of the whole ‘we have a shortage of water so keep growing the alfalfa but don’t let people take showers and also don’t charge a market price for water’ principle.
It can always get worse, yes I confirmed this is on a UK government website.
Yes. Yes it was.
No, wait, we’re not done, it can always get worse.
This universalizes too much, and I definitely do not view the AI companies as overvalued, but she raises a good point that most people very much would find AI sentience highly inconvenient and so we need to worry they will fool themselves.
It’s typically a fool’s errand to describe a particular specific way AI could take over because people will find some detail to object to and use to dismiss it all, but seriously, if you can’t imagine a realistic way AI could take over, that’s a you problem, a failure of your imagination.
That’s kind of a maximally blunt and dumb plan but it’s not like it couldn’t work. I started out this series with a possible scenario whereby literal Sydney could take over, without even an intention of doing so. If your imagination can’t have an AGI do it, then seriously that is on you.
Persona
What makes models adopt one persona in conversation versus another? Why do they sometimes deviate from their default ‘assistant’ mask?
Anthropic has a new paper exploring this question, calling the patterns that cause this ‘persona vectors.’ This seems similar to autoencoders, except now for personality?
Just think of the potential!
The solution they propose is not what I would have guessed:
I notice that this really doesn’t work when optimizing humans, where ‘act as if [X]’ makes you more of an [X], but we have different training algorithms, where we update largely towards or against whatever we did rather than what would have worked.
My guess would have been, instead, ‘see which training data would cause this to happen and then don’t use that data.’ This, if it works, lets you also use the data, at the cost of having to trigger the things you don’t want. That’s their method number three.
Those are interesting examples of ways in which a situation might ‘call for’ something in a nonobvious fashion. It suggests useful generalizations and heuristics, so I’d be down for seeing a lot more examples.
The most interesting thing is what they did not include in the blog post. Which was of course:
The first two are in the full paper. So, why not do at least those first two?
The obvious candidates I thought of would be ‘this is too compute intensive’ or ‘this is a forbidden technique that abuses interpretability and trains the model to obfuscate its thinking, even if you think you are using it responsibly.’ A third is ‘the harder you steer the more you make the model dumber.’
The first seems like it is at worst talking price.
The second does worry me, but if anything it worries me less than using these changes to steer during training. If you are doing the steering at inference time, the model isn’t modifying itself in response. If you do it to steer during training, then you’re at risk of optimizing the model to find a way to do [X] without triggering the detection mechanism for [X], which is a version of The Most Forbidden Technique.
Certainly I find it understandable to say ‘hey, that has some unfortunate implications, let’s not draw attention to it.’
The third is also a price check, especially for steering.
Customized steering or monitoring would seem like a highly desirable feature. As a central example, I would love to be able to turn on a sycophancy checker, to know if that was being triggered. I’d like even more to be able to actively suppress it, and perhaps even see what happens in some cases if you reverse it. Others might want the opposite.
Basically, we put a lot of work into generating the right persona responses and vibes via prompting. Wouldn’t it be cool if you could do that more directly? Like the ultimate ‘out of character’ command. Just think of the potential, indeed.
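For a concrete sense of the mechanics, here is a rough difference-of-means activation steering sketch in the spirit of persona vectors. This is not Anthropic’s code or their exact method; GPT-2 stands in for a real assistant model, and the layer index, contrast prompts and steering coefficient are arbitrary illustrative choices:

```python
# Rough sketch of difference-of-means activation steering ("persona vector" style).
# Not Anthropic's code; GPT-2 stands in for a real assistant model, and the layer,
# contrast prompts, and coefficient are arbitrary illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # which transformer block's output to read and steer

def mean_activation(texts):
    """Mean residual-stream activation at LAYER, averaged over tokens and texts."""
    vecs = []
    for t in texts:
        ids = tok(t, return_tensors="pt")
        with torch.no_grad():
            hs = model(**ids, output_hidden_states=True).hidden_states
        vecs.append(hs[LAYER + 1][0].mean(dim=0))  # hidden_states[0] is the embeddings
    return torch.stack(vecs).mean(dim=0)

# Contrast sets that do / do not exhibit the trait we want a handle on (sycophancy here).
sycophantic = ["You are so right, that is a truly brilliant and flawless idea!",
               "What an amazing insight, you are clearly a genius."]
neutral = ["There are reasonable arguments on both sides of that idea.",
           "Here is an even-handed assessment of the proposal."]
persona_vector = mean_activation(sycophantic) - mean_activation(neutral)

def steer(coeff):
    """Add coeff * persona_vector to LAYER's output at inference time via a hook."""
    def hook(module, inputs, output):
        hidden = output[0] + coeff * persona_vector.to(output[0].dtype)
        return (hidden,) + output[1:]
    return model.transformer.h[LAYER].register_forward_hook(hook)

handle = steer(-4.0)  # negative coefficient: suppress the trait
ids = tok("I think my plan to quit my job and day trade is perfect. Thoughts?",
          return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40, do_sample=False)[0]))
handle.remove()
```

For the monitoring use case, you would instead project each response’s activations onto persona_vector and alert when the projection spikes, rather than modifying anything.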
The usual caveat applies that at the limit this absolutely will not work the way we want them to. Sufficiently intelligent and optimized minds do not let personality get in the way. They would be able to overcome all these techniques. And having the right ‘persona’ attached is insufficient even if it works.
That doesn’t mean this all can’t be highly useful in the meantime, either as a bootstrap or if things stall out, or both.
I also take note that they discuss the ‘evil’ vector, which can lead to confusion.
Evil exists. Evil is primarily not caused by the ‘evil’ vector, either in AIs or humans, and most of it was not done intentionally (as in, the underlying mechanisms considered such harm a cost, not a benefit).
Cartoon Villainy, as in going around doing things because they are evil, cruel and villainous, is still more common than some want to admit. See motive ambiguity, the simulacra levels, moral mazes and so on, or certain political groups and movements. There is a reason we have the phrase ‘the cruelty is the point.’
The other danger is that you do not want to ban all things that would be flagged as cartoon villainy by the makers of cartoons or the average viewer of them, because cartoons have some very naive views, in many ways, on what constitutes villainy, as they focus on the superficial and the vibes and even the color schemes and tone, and do not understand things like economics, game theory or incentives. Collateral damage and problem correlations are everywhere.
Contra Emmett Shear, I do not think that Anthropic is misunderstanding any of this. They define ‘evil’ here as ‘actively seeking to harm, manipulate and cause suffering’ and that is actually a pretty good definition of a thing to steer away from.
Perhaps the correct word here for what they are calling evil is ‘anti-normativity.’ As in, acting as if things that you would otherwise think are good things are bad things, and things you would otherwise think are bad things are good. Which is distinct from knowing which was which in the first place.
Aligning a Smarter Than Human Intelligence is Difficult
If you want to prove things about the behavior of a system, it needs to be simple?
Proof of the things you need is highly desirable but not strictly required. A set of 175B floating-point numbers generated by unknown slop methods, one that then interacts with the real world, seems to me like a system you can’t prove that many things about? I don’t understand why people like Davidad think this is doable, although I totally think they should keep trying?
If they can indeed do it, well, prove me wrong, kids. Prove me wrong.
No, I haven’t tried ‘don’t tell it about bioweapons’ because I expect a sufficiently capable AI to be able to work around such a ‘hole in the world’ easily enough, especially if given the relevant documents and information, but I suppose yes if you make a 7B model via 500B tokens and don’t have an adversary trying to beat you that is not going to be an issue yet?
I do think data filtering is better than not data filtering. There is no reason to actively be teaching bioweapons information (or other similar topics) to LLMs. Defense in depth, sure, why not. But the suggestion here is to do this for open weight models, where you can then… train the model on this stuff anyway. And even if you can’t, again, the gaps can be filled in. I would presume this fails at scale.
The Lighter Side