That’s a lot of money. For context, I remember talking to a congressional staffer a few months ago who basically said that a16z was spending on the order of $100M on lobbying, that this amount was enough to make basically every politician think “hmm, I can raise a lot more if I just do what a16z wants,” and that many did end up doing just that. I was, and am, disheartened to hear how easily US government policy can be purchased.
I am disheartened to hear that Daniel or anyone else is surprised by this. I have wondered since "AI 2027" was written how the AGI-Risk Community is going to counter the inevitable flood of lobbying money in support of deregulation. There are virtually no guardrails left on political spending in American politics. It's been the bane of every idealist for years. And who has more money than the top AI companies?
Thus I'm writing to say:
I respect and admire the 'AGI-risk community' for its expertise, rigor and passion, but I often worry that this community is a closed tent that's not benefiting enough from people with other, non-STEM skillsets.
I know the people in this community are extremely qualified in the fields of AI and Alignment itself. But that doesn't mean they are experienced in politics, law, communication, or messaging (though I acknowledge that there are exceptions).
But for the wider pool of people who are experienced in those topics (but don't understand neural nets or Von Neumann architecture), where are their discussion groups? Where do you bring them in? Is it just in-person?
Amen. We need people with more skillsets and knowledge bases on this project. I'd be interested in talking about strategy for doing this. It's been bugging me for a while that we're not doing this actively.
Ultimately, Congress needs to act. Right? (because voluntary commitments from companies just won't cut it) But how to get to that point?
I've wondered what Daniel and the AI Futures Project's actual strategy is.
For example, are they focusing the most on convincing:
a) politicians,
b) media outlets (NYT, CNN, Fox, MSNBC, tech websites, etc.),
c) AI/AI-Adjacent Companies/Executives/Managers, or
d) scientists and scientific institutions
If I could over-generalize, I would say:
- the higher up the list, the "more intimate with the halls of power"
- the lower on the list, the "more intimate with the development of AI"
But I feel it's very hard for "d) scientists and scientific institutions" to get their concerns all the way to "a) politicians" without passing through (or competing with) "b" and "c".
Daniel's comment reveals they're at least trying to convince "a) politicians" directly. I'm not saying it's bad to talk to politicians, but I feel that politicians are already hearing too many contradictory signals on AI Risk (from "b" and "c" and maybe even some "d"). On my phone, I get articles constantly saying "AI is over-hyped", "AI is a bubble", "AI is just another lightbulb", etc.
That's a lot to compete with! Even without the influence of lobbying money, the best-intentioned politician might be genuinely confused right now!
If I was able to speak to various AI-Risk organizations directly, I would ask: how much effort are you putting into convincing the people who convince the politicians? Ideally we'd get the AI Executives themselves on our side (and then the lobbying against us would start to disappear), but in the absence of that, the media needs to at least be talking about it and scientific institutions need to be unequivocal.
But if they're just "doing one Congressional staffer meeting at a time"... without strongly covering the other bases... in my non-expert-opinion... we're in trouble.
The sycophancy scores suggest we’re not doing a great job identifying sycophancy.
Tim Hua is. His results suggest that the least sycophantic model is Kimi K2.
And what should mankind do about xAI, given the model non-card and Musk's claim that Grok 5 "has a shot at being true AGI"? The bad scenario is that Musk is in AI-induced psychosis, in which case we are done for, just more slowly. The even worse scenario is that Grok 5 also involves an architectural breakthrough, which is unlikely to leave Grok 5 interpretable.
Their capacity is still miles behind domestic demand, their quality still lags far behind Nvidia, and of course their capacity was going to ramp up a lot over time as is that of TSMC and Nvidia (and presumably Samsung and Intel and AMD). I don’t get it.
Does anyone seriously think that if we took down our export controls, that Huawei would cut back its production schedule? I didn’t think so.
As with Intel, you need to distinguish fabs from chip (and system) design, which are very different activities. Having (multiple) real customers is plausibly crucially important for a fledgling/struggling fab company (they let the fab company make the process of being a customer better, and set the right priorities), and it's probably a major reason Intel Foundry is dying. Intentions to ramp up a lot, subsidies, and vague domestic demand don't necessarily help that much (it's not necessarily centrally voluntary, non-fake demand, given the current product). If you have a somewhat worse product as a fab company, you don't get comparable interest in it, and can't get those multiple real customers.
Just a note, no direct relation to alignment:
I read this article on GPT5 being too metaphorical as a writer, and that signaling a training process optimizing for an unwanted goal (the investigator's Twitter thread about it is linked above). The accusation was that GPT5T was producing lots of metaphor as a way to impress humans and/or other LLMs with how human and literary it was, but the metaphors were nonsense.
Too metaphorical is a matter of taste, but I found it very interesting that the lead example held up as "full of shit" was quite good IMO - a metaphor right at the leading edge of my ability to get it, and so perfect for me.
Adjusting the pop filter to "count the German language's teeth"
German has a reputation for being harsh in an auditory sense, with sharp stops and starts. This harshness has been posited to relate to the harshness of German culture, clearly relevant for the interview.
A pop filter is (presumably - I don't know) going to be adjusted for those sharp sounds. Adjusting it properly requires "counting the teeth" by guessing how many there will be in this interview.
This ties back to the whole subject of the essay which has to do with German history. So, wow, this is IMO an amazingly skilled use of metaphor.
This is not to disagree with the hypothesis about getting here by optimizing for LLMs thinking it's human-generated. But I for one have been enjoying GPT5's writing; it's replaced the scifi schlock I was reading for relaxation before bed, since I can quickly prompt it and so co-create instead of just hear other people's stories. These are short pieces; when I tried to see how far it could get quickly with plotting and writing at novel length (SOMEONE should write a novel about the alignment problem, and it ain't gonna be me unaided), it broke down pretty quickly.
Is the rest of it full of shit? I don't know. When I listen to GPT5T's stories with half my attention, I lose track and suspect it's full of shit. When I listen with full attention I feel it's often too heavy on metaphor, but the metaphors usually make sense, so as with reading a skilled author, I assume the ones I don't get will also make sense and perhaps contribute to the feel.
I think good writing is very much a matter of taste. The idea that there's some transcendent skill only goes so far. I find it interesting that evaluations like Zvi's, which take "good taste" to be a very real and important thing (I think it's somewhat real and important), are the basis of OpenAI pushing ChatGPT's writing into the "impress people with the depth of your metaphor" areas instead of the "tell a good story" areas.
Just saying.
Which makes it rather strange to choose to sell worse, and thus less expensive and less profitable, chips to China rather than instead making better chips to sell to the West.
I think what might be going on here is that different fabs have separate capacities. You can't make more H200s because they have to be made on the new and fancy fabs, but you can make H20s in an older facility. So if you want to sell more chips, and you're supply limited on the H200s, then the only thing you can do is make crappier chips and figure out where to sell them.
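As a toy illustration of that hypothesis (all numbers below are invented, not real Nvidia or TSMC figures): if the two parts really do draw on separate capacity pools and the advanced node is already saturated, then building lower-end chips on otherwise idle legacy capacity adds revenue without displacing anything.

```python
# Toy model of the capacity argument. All numbers are invented for illustration;
# they are not real Nvidia/TSMC figures.

advanced_node_capacity = 100_000   # wafers/quarter usable for H200-class parts
legacy_node_capacity = 40_000      # wafers/quarter only usable for H20-class parts

h200_revenue_per_wafer = 500_000   # hypothetical
h20_revenue_per_wafer = 150_000    # hypothetical, lower-margin part

# If H200 demand already exceeds what the advanced node can supply,
# advanced-node capacity is the binding constraint.
revenue_without_h20 = advanced_node_capacity * h200_revenue_per_wafer
revenue_with_h20 = revenue_without_h20 + legacy_node_capacity * h20_revenue_per_wafer

print(f"Advanced node only:     ${revenue_without_h20 / 1e9:.1f}B")
print(f"Plus legacy-node H20s:  ${revenue_with_h20 / 1e9:.1f}B")
# Under this assumption, making H20s does not displace any H200s, because they
# come out of different (otherwise idle) capacity -- the only real question is
# where you can sell the lower-end part.
```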
It doesn’t look good, on many fronts, especially taking a stake in Intel.
We continue.
Table of Contents
America Extorts 10% of Intel
USA successfully extorts a 10% stake in Intel. Scott Lincicome is here with the ‘why crony capitalism is terrible’ report, including the fear that the government might go after your company next, the fear that we are going to bully people into buying Intel products for no reason, the chance Intel will now face new tariffs overseas, and more. Remember the fees they’re extorting from Nvidia and AMD.
Scott also offers us this opinion in Washington Post Editorial form.
As is usually the case, the more details you look at, the worse it gets. This deal does give Intel something in return, but that something is letting Intel off the hook on its commitments to build new plants, so that seems worse yet again.
Samsung is reportedly ‘exploring partnerships with American companies to ‘please’ the Trump administration and ensure that its regional operations aren’t affected by hefty tariffs.’
To be clear: And That’s Terrible.
Tyler Cowen writes against this move, leaving no doubt as to the implications and vibes by saying Trump Seizes the Means of Production at Intel. He quite rightfully does not mince words. A good rule of thumb these days is if Tyler Cowen outright says a Trump move was no good and very bad, the move is both importantly damaging and completely indefensible.
Is there a steelman of this?
Ben Thompson says yes, and he’s the man to provide it, and despite agreeing that Lincicome makes great points he actually supports the deal. This surprised me, since Ben is normally very much ordinary business uber alles, and he clearly appreciates all the reasons such an action is terrible.
So why, despite all the reasons this is terrible, does Ben support doing it anyway?
Ben presents the problem as the need for Intel to make wise long term decisions towards being competitive and relevant in the 2030s, and that it would take too long for other companies to fill the void if Intel failed, especially without a track record. Okay, sure, I can’t confirm but let’s say that’s fair.
Next, Ben says that Intel’s chips and process are actually pretty good, certainly good enough to be useful, and the problem is that Intel can’t credibly promise to stick around to be a long term partner. Okay, sure, again, I can’t confirm but let’s say that’s true.
Ben’s argument is next that Intel’s natural response to this problem is to give up and become another TSMC customer, but that is against America’s strategic interests.
Once again, I cannot confirm the economics but seems reasonable on both counts. We would like Intel to stand on its own and not depend on TSMC for national security reasons, and to do that Intel has to be able to be a credible partner.
The next line is where he loses me:
Why does America extorting a 10% passive stake in Intel solve these problems, rather than make things worse for all the reasons Lincicome describes?
Because he sees ‘America will distort the free market and strongarm Intel into making chips and other companies into buying Intel chips’ as an advantage, basically?
So much for this being a passive stake in Intel. This is saying Intel has been nationalized. We are going the CCP route of telling Intel how to run its business, to pursue an entirely different corporate strategy or else. We are going the CCP route of forcing companies to buy from the newly state-owned enterprise. And that this is good. Private capital should be forced to prioritize what we care about more.
That’s not the reason Trump says he is doing this, which is more ‘I was offered the opportunity to extort $10 billion in value and I love making deals,’ and now he’s looking for other similar ‘deals’ to make, if you know what’s good for you, as it seems extortion of equity in private businesses is now official White House policy?
It is hard to overstate how much worse this is than simply raising corporate tax rates.
As in, no, Intel is not a special case. But let’s get back to Intel as a special case, if in theory it was a special case, and you hoped to contain the damage to American free enterprise and willingness to invest capital and so on that comes from the constant threat of extortion and success being chosen by fiat, or what Republicans used to call ‘picking winners and losers’ except with the quiet part being said out loud.
Why do you need or want to take a stake in Intel in order to do all this? We really want to be strongarming American companies into making the investment and purchasing decisions the government wants? If this is such a strategic priority, why not do this with purchase guarantees, loan guarantees and other subsidies? It would not be so difficult to make it clear Intel will not be allowed to fail except if it outright failed to deliver the chips, which isn’t something that we can guard against either way.
Why do we think socialism with Trumpian characteristics is the answer here?
I’m fine with the idea that Intel needs to be Too Big To Fail, and it should be the same kind of enterprise as Chase Bank. But there’s a reason we aren’t extorting a share of Chase Bank and then forcing customers to choose Chase Bank or else. Unless we are. If I was Jamie Dimon I’d be worried that we’re going to try? Or worse, that we’re going to do it to Citibank first?
That was the example that came to mind first, but it turns out Trump’s next target for extortion looks to be Lockheed Martin. Does this make you want to invest in strategically important American companies?
As a steelman exercise of taking the stake in Intel, Ben Thompson’s attempt is good. That is indeed as good a steelman as I’ve seen or could come up with, so great job.
Except that even with all that, even the good version of taking the stake would still be a terrible idea, because you can simply do all this without taking the stake.
And even if the Ben Thompson steelman version of the plan was the least bad option? That’s not what we are doing here, as evidenced by ‘I want to try and get as much as I can’ in stakes in other companies. This isn’t a strategic plan to create customer confidence that Intel will be considered Too Big To Fail. It’s the start of a pattern of extortion.
Thus, 10 out of 10 for making a good steelman but minus ten million for actually supporting the move for real?
Again, there’s a correct and legal way for the American government to extort American companies, and it’s called taxes.
Tyler Cowen wrote this passage on The History of American corporate nationalization for another project a while back, emphasizing how much America benefits from not nationalizing companies and playing favorites. He thought he would share it in light of recent events.
The Quest For No Regulations Whatsoever
I am Jack’s complete lack of surprise.
Their ‘a16z is lobbying because it wants sensible guardrails and not total deregulation’ t-shirt is raising questions they claim are answered by the shirt.
OpenAI is helping fund this via Brockman. To the tune of $100 million.
Which is a lot.
So now we can double that. They’re (perhaps legally, this is our system) buying the government, or at least quite a lot of influence on it. As usual, it’s not that everyone has a price but that the price is so cheap.
As per usual, the plan is to frame ‘any regulation whatsoever, at all, of any kind’ as ‘you want to slow down AI and Lose To China.’
Industry, and a16z in particular, were already flooding everyone with money. The only difference is now they are coordinating better, and pretending less, and spending more?
They continue to talk about ‘vast forces’ opposing the actual vast force, which was always industry and the massive dollars behind it. The only similarly vast forces are that the public really hates AI, and the physical underlying reality of AI’s future.
And there it is, right in the article, as text. What they are worried about is that we won’t pass a law that says we aren’t allowed to pass any laws.
If you think ‘Congress won’t pass AI laws’ is a call for Congress to pass reasonable AI laws, point to the reasonable AI laws anyone involved has ever said a kind word about, let alone proposed or supported.
No it doesn’t? These ‘concerns about China’ peaked around January. There has been no additional reason for such concerns in months that wasn’t at least priced in, other than acts of self-sabotage of American energy production.
The Quest for Sane Regulations
Dean Ball goes over various bills introduced in various states.
There are always tons of bills. The trick is to notice which ones actually do anything and also have a chance of becoming law. That’s always a much smaller group.
By ‘comprehensively regulate’ Dean means the Colorado-style or EU-style use-based approaches, which we both agree are quite terrible. Dean instead focuses on two other approaches more in vogue now.
I agree with Dean here, in that I don’t support that idea and think it is net harmful, but if you want to talk to an AI you can still talk to an AI, so so far it’s not a big deal.
Did I mention recently that nothing I say in this column is investment or financial advice, legal advice, tax advice or psychological, mental health, nutritional, dietary or medical advice? And just in case, I’m also not ever giving anyone engineering, structural, real estate, insurance, immigration or veterinary advice.
Because you must understand that indeed nothing I have ever said, in any form, ever in my life, has been any of those things, nor do I ever offer or perform any related services.
I would never advise you to say the same, because that might be legal advice.
Similarly, it sounds like AI companies would under these laws most definitely also not be saying their AIs can provide mental health advice or services? Okay, sure, I mean annoying but whatever?
Technically via the definition here it is mental healthcare to ‘detect’ that someone might be (among other things) intoxicated, but obviously that is not going to stop me or anyone else from observing that a person is drunk, nor are we going to have to face a licensing challenge if we do so. I would hope. This whole thing is deeply stupid.
So I would presume the right thing to do is to use the best tools available, including things that ‘resemble’ ‘mental healthcare.’ We simply don’t call it mental healthcare.
Similarly, what happens when Illinois HB 1806 says this (as quoted by Dean):
My obvious response is, if this means an AI can’t do it, it also means a friend cannot do it either? Which means that if they say ‘I am feeling depressed and lonely today. Help me improve my mood’ you have to say ‘I am sorry, I cannot do that, because I am not a licensed health professional any more than Claude Opus is’? I mean presumably this is not how it works. Nor would it change if they were somehow paying me?
Dean’s argument is that this is the point:
There’s a kind of whiplash here that I am used to when reading such laws. I don’t care if it is impossible to comply with the law if it were fully enforced in a maximally destructive and perverse way, unless someone is suggesting this will actually happen. If the laws are only going to get enforced when you actively try to offer therapist chatbots?
Then yes it would be better to write better laws, and I don’t especially want to protect those people’s roles at all, but we don’t need to talk about what happens if the AI gets told to help improve someone’s mood and the AI suggests going for a walk. Nor would I expect a challenge to that to survive on constitutional grounds.
More dear to my heart, and more important, are bills about Frontier AI Safety. He predicts SB 53 will become law in California, here is his summary of SB 53:
There is a sharp contrast between this skeptical and nitpicky and reluctant but highly respectful Dean Ball, versus the previous Dean Ball reaction to SB 1047. He still has some objections and concerns, which he discusses. I am more positive on the bills than he is, especially in terms of seeing the benefits, but I consider Dean’s reaction here high praise.
Chip City
Nvidia reported earnings of $46.7 billion, growing 56% in a year, beating both revenue and EPS expectations, and was promptly down 5% in after hours trading, although it recovered and was only down 0.82% on Thursday. It is correct to treat Nvidia only somewhat beating official estimates as bad news for Nvidia. Market is learning.
I do not fully understand why Nvidia does not raise prices, but given that decision has been made they will sell every chip they can make. Which makes it rather strange to choose to sell worse, and thus less expensive and less profitable, chips to China rather than instead making better chips to sell to the West. That holds double if you have uncertainty on both ends, where the Americans might not let you sell the chips and the Chinese might not be willing to buy them.
Also, even Ben Thompson, who has called for selling even our best chips to China because he cares more about Nvidia market share than who owns compute, noticed that H20s would sell out if Nvidia offered them for sale elsewhere:
Instead they chose a $5 billion writedown. We are being played.
Ben is very clear that what he cares about is getting China to ‘build on Nvidia chips,’ where the thing being built is massive amounts of compute on top of the compute they can make domestically. I would instead prefer that China not build out this massive amount of compute.
China plans to triple output of chips, primarily Huawei chips, in the next year, via three new plants. This announcement caused stock market moves, so it was presumably news.
What is obviously not news is that China has for a while been doing everything it can to ramp up quality and quantity of its chips, especially AI chips.
This is being framed as ‘supporting DeepSeek’ but it is highly overdetermined that China needs all the chips it can get, and DeepSeek happily runs on everyone’s chips. I continue to not see evidence that any of this wouldn’t have happened regardless of DeepSeek or our export controls. Certainly if I was the PRC, I would be doing all of it either way, and I definitely wouldn’t stop doing it or slow down if any of that changed.
Note that this article claims that DeepSeek is continuing to do its training on Nvidia chips at least for the time being, contra claims it had been told to switch to Huawei (or at least, this suggests they have been allowed to switch back).
Once Again The Counterargument On Chip City
Sriram Krishnan responded to the chip production ramp-up by reiterating the David Sacks style case for focusing on market share and ensuring people use our chips, models and ‘tech stack’ rather than on caring about who has the chips. This includes maximizing whether models are trained on our chips (DeepSeek v3 and r1 were trained on Nvidia) and also who uses or builds on top of what models.
The thing is, even if you think who uses what ecosystem is the important thing because AI is a purely ordinary technology where access to compute in the medium term is relatively unimportant (which it isn’t), no, the pieces of the stack mostly aren’t that co-dependent, and it basically doesn’t build a moat.
I’ll start with my analysis of the question in the bizarre alternative universe where we could be confident AGI was far away. I’ll close by pointing out that it is crazy to think that AGI (or transformational or powerful AI, or whatever you want to call the thing) is definitely far away.
The rest of this is my (mostly reiterated) response to this mostly reiterated argument, and the various reasons I do not at all see these as the important concerns even without concerns about AGI arriving soon, and also I think it positively crazy to be confident AGI will not arrive soon or bet it all on AGI not arriving.
Sriram cites two supposed key mistakes in the export control framework: not anticipating DeepSeek and Chinese open models while suppressing American open models, and underestimating future Chinese semiconductor capacity.
The first is a non-sequitur at best, as the export controls held such efforts back. The second also doesn’t, even if true (and I don’t see the evidence that a mistake was even made here), provide a reason not to restrict chip exports.
Yes, our top labs are not releasing top open models. I very much do not think this was or is a mistake, although I can understand why some would disagree. If we make them open the Chinese fast follow and copy them and use them without compensation. We would be undercutting ourselves. We would be feeding into an open ecosystem that would catch China up, which is a more important ecosystem shift in practice than whether the particular open model is labeled ‘Chinese’ versus ‘American’ (or ‘French’). I don’t understand why we would want that, even if there was no misuse risk in the room and AGI was not close.
I don’t understand this obsession some claim to have with the ‘American tech stack’ or why we should much care that the current line of code points to one model when it can be switched in two minutes to another if we aren’t even being paid for it. Everyone’s models can run on everyone’s hardware, if the hardware is good.
This is not like Intel+Windows. Yes, there are ways in which hardware design impacts software design or vice versa, but they are extremely minor by comparison. Everything is modular. Everything can be swapped at will. As an example on the chip side, Anthropic swapped away from Nvidia chips without that much trouble.
Having the Chinese run an American open model on an American chip doesn’t lock them into anything; it only means they get to do more inference. Having the Chinese train a model on American hardware only means now they have a new AI model.
I don’t see lock-in here. What we need, and I hope to facilitate, is better and more formal (as in formal papers) documentation of how much lower switching costs are across the board, and how much there is not lock-in.
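As a concrete (if simplified) illustration of how low those switching costs are at the API layer: with OpenAI-compatible endpoints, which most serving stacks now expose, moving a workload from one provider and model to another is roughly a change of base URL and model string. The URLs, keys, and model names below are placeholders, not a claim about any particular provider.

```python
# Minimal sketch of how little 'lock-in' exists at the API layer.
# Base URLs, API keys, and model names below are placeholders.
from openai import OpenAI

def ask(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """Send one chat prompt to any OpenAI-compatible endpoint."""
    client = OpenAI(base_url=base_url, api_key=api_key)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Switching "tech stacks" is a two-argument change:
# ask("https://provider-a.example/v1", "KEY_A", "model-a", "Summarize this contract.")
# ask("https://provider-b.example/v1", "KEY_B", "model-b", "Summarize this contract.")
```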
I don’t see why we should sell highly useful and profitable and strategically vital compute to China, for which they lack the capacity to produce it themselves, even if we aren’t worried about AGI soon. Why help supercharge the competition and their economy and military?
The Chinese, frankly, are for now winning the open model war in spite of, not because of, our export controls, and doing it ‘fair and square.’ Yes, Chinese open models are currently a lot more impressive than American open models, but their biggest barrier is lack of access to quality Nvidia chips, as DeepSeek has told us explicitly. And their biggest weapon is access to American models for reverse engineering and distillation, the way DeepSeek’s r1 built upon OpenAI’s o1, and their current open models are still racing behind America’s closed models.
Meanwhile, did Mistral and Llama suck because of American policy? Because the proposed SB 1047, that never became law, scared American labs away from releasing open models? Is that a joke? No, absolutely not. Because the Biden administration bullied them from behind the scenes? Also no.
Mistral and Meta failed to execute. And our top labs and engineers choose to work on and release closed models rather than open models somewhat for safety reasons but mostly because this is better for business, especially when you are in front. Chinese top labs choose the open weights route because they could not compete in the closed weight marketplace.
The exception would be OpenAI, which was bullied and memed into doing an open model GPT-OSS, which in some ways was impressive but was clearly crippled in others due to various concerns, including safety concerns. But if we did release superior open models, what does that get us except eroding our lead from closed ones?
As for chips, why are we concerned about them not having our chips? Because they will then respond by ramping up internal production? No, they won’t, because they can’t. They’re already running at maximum and accelerating at maximum. Yes, China is ramping up its semiconductor capacity, but China made it abundantly clear it was going to do that long before the export controls and had every reason to do so. Their capacity is still miles behind domestic demand, their quality still lags far behind Nvidia, and of course their capacity was going to ramp up a lot over time as is that of TSMC and Nvidia (and presumably Samsung and Intel and AMD). I don’t get it.
Does anyone seriously think that if we took down our export controls, that Huawei would cut back its production schedule? I didn’t think so.
Even more than usual, Sriram’s and Sacks’s framework implicitly assumes AGI, or transformational or powerful AI, will not arrive soon, where soon is any timeframe on which current chips would remain relevant. That AI would remain an ordinary technology and mere tool for quite a while longer, and that we need not be concerned with AGI in any way whatsoever. As in, we need not worry about catastrophic or existential risks from AGI, or even who gets AGI, at all, because no one will build it. If no one builds it, then we don’t have to worry about if everyone then dies.
I think being confident that AGI won’t arrive soon is crazy.
What is the reason for this confidence, when so many including the labs themselves continue to say otherwise?
Are we actually being so foolish as to respond to the botched rollout of GPT-5 and its failure to be a huge step change as meaning that the AGI dream is dead? Overreacting this way would be a catastrophic error.
I do think some amount of update is warranted, and it is certainly possible AGI won’t arrive that soon. Ryan Greenblatt updated his timelines a bit, noting that it now looks harder to get to full automation by the start of 2028, but thinking the chances by 2033 haven’t changed much. Daniel Kokotajlo, primary author on AI 2027, now has a median timeline of 2029.
Quite a lot of people very much are looking for reasons why the future will still look normal, they don’t have to deal with high weirdness or big risks or changes, and thus they seek out and seize upon reasons to not feel the AGI. Every time we go even a brief period without major progress, we get the continuous ‘AI or deep learning is hitting a wall’ and people revert to their assumption that AI capabilities won’t improve much from here and we will never see another surprising development. It’s exhausting.
Power Up, Power Down
That’s a fun thing to bring up during a walkabout, also it is true, also this happened days after they announced they would not approve new wind and solar projects thus blocking a ‘massive amount of electricity’ for no reason.
They’re also unapproving existing projects that are almost done.
Here EPA Administrator Lee Zeldin is asked by Fox News what exactly was this ‘national security’ problem with the wind farm. His answer is ‘the president is not a fan of wind’ and the rest of the explanation is straight up ‘it is a wind farm, and wind power is bad.’ No, seriously, check the tape if you’re not sure. He keeps saying ‘we need more base load power’ and this isn’t base load power, so we should destroy it. And call that ‘national security.’
This is madness. This is straight up sabotage of America. Will no one stop this?
Meanwhile, it seems it’s happening, the H20 is banned in China, all related work by Nvidia has been suspended, and for now procurement of any other downgraded chips (e.g. the B20A) has been banned as well. I would presume they’d get over this pretty damn quick if the B20A was actually offered to them, but I no longer consider ‘this would be a giant act of national self-sabotage’ to be a reason to assume something won’t happen. We see it all the time, also history is full of such actions, including some rather prominent ones by the PRC (and USA).
Chris McGuire and Oren Cass point out in the WSJ that our export controls are successfully giving America a large compute advantage, we have the opportunity to press that advantage, and remind us that the idea of transferring our technology to China has a long history of backfiring on us.
Yes, China will be trying to respond by making as many chips as possible, but they were going to do that anyway, and aren’t going to get remotely close to satisfying domestic demand any time soon.
People Really Do Not Like AI
There are many such classes of people. This is one of them.
It’s true on Twitter as well, if you go into the parts that involve people who might be on Bluesky, or you break contain in other ways.
The responses in this case did not involve death threats, but there are still quite a lot of nonsensical forms of opposition being raised to the very concept of AI usage here.
Another example this week is that one of my good friends built a thing, shared the thing on Twitter, and suddenly was facing hundreds of extremely hostile reactions about how awful their project was, and felt they had to take their account private, rather than accepting my offer of seed funding.
Did Google Break Their Safety Pledges With Gemini Pro 2.5?
It certainly seems plausible that they did. I was very much not happy at the time.
Several labs have run with the line that ‘public deployment’ means something very different from ‘members of the public can choose to access the model in exchange for modest amounts of money,’ whereas I strongly think that if it is available to your premium subscribers then that means you released the model, no matter what.
In Google’s case, they called it ‘experimental’ and acted as if this made a difference.
It doesn’t. Google is far from the worst offender in terms of safety information and model cards, but I don’t consider them to be fulfilling their commitments.
Safety Third at xAI
xAI has finally given us the Grok 4 Model Card and they have updated the xAI Risk Management Framework.
(Also, did you know that xAI quietly stopped being a public benefit corporation last year?)
The value of a model card greatly declines when you hold onto it until well after model release, especially if you also aren’t trying all that hard to think well about or address the actual potential problems. I am still happy to have it. It reads as a profoundly unserious document. There is barely anything to analyze. Compare this to an Anthropic or OpenAI model card, or even a Google model card.
If anyone at xAI would greatly benefit from me saying more words here, contact me, and I’ll consider whether that makes sense.
As for the risk management framework, few things inspire less confidence than starting out by saying ‘xAI seriously considers safety and security while developing and advancing AI models to help us all to better understand the universe.’ Yo, be real. This document does not ‘feel real’ to me, and is often remarkably content-free or reflects a highly superficial understanding of the problems involved and a ‘there, I fixed it’ attitude. It reads like the Musk version of corporate speak or something? A sense of box checking and benchmarking rather than any intent to actually look for problems, including a bunch of mismatches between the stated worry and what they are measuring that go well beyond Goodhart’s Law issues?
That does not mean I think Grok 4 is in practice currently creating any substantial catastrophic-level risks or harms. My presumption is that it isn’t, as xAI notes in the safety framework they have ‘run real world tests’ on this already. The reason that’s not a good procedure should be obvious?
All of this means that if we applied this to an actually dangerous future version, I wouldn’t have confidence we would notice in time, or that the countermeasures would deal with it if we did notice. When they discuss deployment decisions, they don’t list a procedure or veto points or thresholds or rules, they simply say, essentially, ‘we may do various things depending on the situation.’ No plan.
Again, compare and contrast this to the Anthropic and OpenAI and Google versions.
But what else do you expect at this point from a company pivoting to goonbots?
I too am very much rooting for SpaceX and was glad to see the launch later succeed.
Misaligned!
Owain Evans is at it again. In this case, his team fine-tuned GPT-4.1 only on low-stakes reward hacking, being careful not to include any examples of deception.
They once again get not only general reward hacking but general misalignment.
Owain reports being surprised by this. I wouldn’t have said I would have been confident it would happen, but I did not experience surprise.
Once again, the ‘evil behavior’ observed is, as Janus puts it, ‘ostentatious and caricatured and low-effort’ because that matches the training in question; in the real world all sides would presumably be more subtle. But also there’s a lot of ‘ostentatious and caricatured and low-effort’ evil behavior going around these days, some of which is mentioned elsewhere in this post.
Correct, this is a reskinning, but the reason it matters is that we didn’t know, or at least many people were not confident, that this was a reskinning that would not alter the result. This demonstrates a lot more generalization.
Very much so. Yes, everything gets noticed, everything gets factored in. But also, that means everything is individually one thing among many.
It is not helpful to be totalizing or catastrophizing any one decision or event, to say (less strongly worded but close variations of) ‘this means the AIs will see the record of this and never trust anyone ever again’ or what not.
There are some obvious notes on this:
In case you were wondering what happens when you use AI evaluators? This happens. Note that there is strong correlation between the evaluations from different models.
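For what ‘strong correlation’ means mechanically, here is a minimal hypothetical check: score the same passages with two judge models and compute the correlation of their scores. The judge names and scores below are made up for illustration.

```python
# Hypothetical sanity check: do two AI judges rank the same passages similarly?
# All scores below are invented for illustration.
import numpy as np

passages = ["passage_1", "passage_2", "passage_3", "passage_4", "passage_5"]
judge_a_scores = np.array([8.5, 6.0, 9.0, 4.5, 7.0])  # e.g. one lab's judge model
judge_b_scores = np.array([8.0, 6.5, 9.5, 5.0, 7.5])  # e.g. another lab's judge model

pearson = np.corrcoef(judge_a_scores, judge_b_scores)[0, 1]
print(f"Pearson correlation between judges: {pearson:.2f}")
# A high correlation means the judges share a notion of 'good', which also
# means any bias they share (say, rewarding ornate metaphor) gets amplified
# when models are optimized against them.
```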
I agree with Davidad that what it produces in these spots is gibberish – if you grant that the block about ‘counting the German language’s teeth’ is gibberish and remove it, the passage seems fine. I do think this shows that GPT-5 is in these places optimized for something rather different than what we would have liked, in ways that are likely to diverge increasingly over time, and I do think the optimization target is indeed largely external AI judges, even if those judges are often close to being copies of itself.
Aligning a Smarter Than Human Intelligence is Difficult
Anthropic looks into removing information about CBRN risks from the training data, to see if it can be done without hurting performance on harmless tasks. If you don’t want the model to know, it seems way easier to not teach it the information in the first place. That still won’t stop the model from reasoning about the questions, or identifying the ‘hole in the world.’ You also have to worry about what happens when you ultimately let the model search the web or if it is given key documents or fine tuning.
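As a rough sketch of what ‘removing information from the training data’ can look like in practice (this is a generic filtering loop of my own construction, not Anthropic’s actual pipeline; the scorer and threshold are hypothetical):

```python
# Generic pretraining-data filtering sketch -- not Anthropic's actual method.
# `cbrn_risk_score` stands in for whatever classifier or heuristic you trust.
from typing import Iterable, Iterator

def cbrn_risk_score(document: str) -> float:
    """Hypothetical scorer in [0, 1]; a real pipeline would use a trained classifier."""
    keywords = ("enrichment cascade", "nerve agent synthesis", "select agent protocol")
    return float(any(k in document.lower() for k in keywords))

def filter_corpus(docs: Iterable[str], threshold: float = 0.5) -> Iterator[str]:
    """Drop documents whose risk score exceeds the threshold; keep the rest."""
    for doc in docs:
        if cbrn_risk_score(doc) <= threshold:
            yield doc

# The hard part is exactly what the post points at: a filtered model can still
# reason toward the hole in its knowledge, or get the material later via
# search, documents in context, or fine-tuning.
```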
The result details here are weird, with some strategies actively backfiring, but some techniques did show improvement with tradeoffs that look worthwhile.
I’m very much with Eliezer here.
There is a lot of value in advancing how far you can push AGI before you get into existential levels of trouble, giving you more time and more resources to tackle the later problems.
Claims about alignment:
I mean that’s nice but it doesn’t give me much additional expectation that this will work when scaled up to the point where there is actual danger in the room. If the stronger model isn’t trying to fool you then okay sure the weaker model won’t be fooled.
When you train one thing, you train everything, often in unexpected ways. Which can be hard to catch if the resulting new behavior is still rare.
I am intrigued by the ability to use model diff amplification to detect a ‘sleeper agent’ style behavior, but also why not extend this? The model diff amplification tells you ‘where the model is going’ in a lot of senses. So one could do a variety of things with that to better figure out how to improve, or to avoid mistakes.
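My rough understanding of the mechanics, sketched with placeholder tensors (this is the simplest ‘extrapolate the fine-tuning diff’ version of the idea, not necessarily the exact published method): you push sampling further in the direction the fine-tune moved the model, which surfaces rare trained-in behaviors, like a sleeper-agent trigger, at much higher rates.

```python
# Sketch of the general 'amplify the model diff' idea. Details here are my
# guess at the simplest version, not necessarily the exact published method.
import torch

def amplified_logits(base_logits: torch.Tensor,
                     tuned_logits: torch.Tensor,
                     alpha: float = 4.0) -> torch.Tensor:
    """Extrapolate past the fine-tuned model in the direction of the fine-tune.

    alpha = 0 recovers the base model, alpha = 1 the fine-tuned model,
    alpha > 1 exaggerates whatever the fine-tuning changed, making rare
    trained-in behaviors far more likely to show up in samples.
    """
    return base_logits + alpha * (tuned_logits - base_logits)

# Demo with random logits standing in for two models' next-token predictions.
vocab = 32
base = torch.randn(vocab)
tuned = base.clone()
tuned[7] += 0.3  # pretend fine-tuning nudged token 7 up slightly (a rare behavior)

probs_tuned = torch.softmax(tuned, dim=-1)
probs_amped = torch.softmax(amplified_logits(base, tuned, alpha=8.0), dim=-1)
print(f"P(token 7) fine-tuned: {probs_tuned[7].item():.3f}")
print(f"P(token 7) amplified:  {probs_amped[7].item():.3f}")
```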
Also, it should be worrisome that if a small % of training data is bad you get a small % of crazy reversed outputs? We don’t seem able to avoid occasional bad training data.
How Are You Doing?
A cool idea: OpenAI and Anthropic ran their best tests for misalignment on each other’s models.
I included a few of the charts:
The sycophancy scores suggest we’re not doing a great job identifying sycophancy.
That’s quite a lot of refusing from Opus and Sonnet, but also a much, much better ratio of correctness given an answer. Given these choices, if I don’t have easy verification access, I expect to prefer a lot of refusals, although a warning that hallucination was likely in this spot would be even better?
Same thing here. If verification is trivial then o3 is best here, if not I want Opus 4 as the only model that is correct more often than it is wrong?
All the scheming rates seemed reasonably similar.
Some Things You Miss When You Don’t Pay Attention
If you can’t grasp the full range of dynamics going on with highly capable and intelligent AI systems, you miss a lot. The attitude that dismisses anything weird sounding or that uses a word in a nontraditional way as not real or not relevant, or as something to be suppressed lest people get the ‘wrong’ ideas or attitudes, will cause one to miss a lot of what is and will be going on.
Which in turn means you won’t understand the problems and how to solve them. Such as: to what extent, and in what ways, at the limit, for sufficiently advanced models, is this true?
Training (both for humans and AIs) runs the gamut from knowing you are in training to not knowing, and from it being ideal to behave differently to it being ideal to behave identically when in training or aware of being in training, both on the level of the model’s or human’s behavior and in how you set up the scenarios involved.
There are many different arguments being made by Janus and Sauers here.
If you think all of this is not confusing? I assure you that you do not understand it.
Other People Are Not As Worried About AI Killing Everyone
I think we have a new worst, or most backwards, argument against AI existential risk.
Read it, and before you read my explanation, try to understand what he’s saying here.
The argument seems to be:
Okay, yes, so far so good. Intelligence allows mastery of the world around you, and over other things that are less intelligent than you are, even if the world around you ‘uses more computation’ than you do. You can build a house to stop the rain even if it requires a lot of computation to figure out when and where and how rain falls, because all you need to figure out is how to build a roof. Sure.
The logical next step would be:
Or in fewer words:
Instead, Wolfram argues this?
Wait, what? No, seriously, wait what?
The Lighter Side
It’s difficult out there (3 minute video).
A clip from South Park (2 minutes). If you haven’t seen it, watch it.
In this case it can’t be that nigh…