Why does GPT-5.5 love goblin mode so much they had to give twin instructions to cut out all unrequested mentions of animals? Good question.
Shouldn't Simon Willison be the prime suspect? His prominent blog called GPT-5 Thinking his "Research Goblin." https://simonwillison.net/2025/Sep/6/research-goblin/ And the timing fits, assuming 5.1 training included his post (and associated commentary).
This was the week of GPT-5.5. It is an excellent model, sir, and OpenAI is competitive with Anthropic’s top public offering for the first time since late last year.
As usual, I did coverage of the System Card, and then of Capabilities and Reactions.
DeepSeek gave us the long-awaited v4. DeepSeek has given us another strong feat of engineering efficiency for 1M context. That is impressive, and there will be those who build upon v4 and put it to good use. But this is not a frontier model, nor a DeepSeek moment, nor a key step to proto-AGI or anything like that. Compute constraints bind, and have forced DeepSeek to focus on efficiency. Let us keep it that way.
Talkie is the other release, an old timey AI trained on text from before 1931. Fun stuff.
Google signed a contract with the Department of War that not only agrees to ‘all lawful use’ with no functional exceptions whatsoever, it also agreed to modify or remove any safety barriers upon request. They did this under no deadline or pressure. Whatever you think of OpenAI’s actions in this matter, Google’s were far worse.
Anthropic continues to face the consequences for taking a stand, as the supply chain risk designation remains in place, despite the White House not only widely deploying Claude Mythos, they are effectively vetoing Anthropic’s plan to expand corporate access (from about 50 companies to about 120) partly out of concern that the government might not get use of enough Mythos tokens.
Bernie Sanders continues to take AI seriously, and take existential risk seriously, and convened experts to talk about it fully seriously. We need more of this, and hopefully we can avoid politicizing the question along the way. We shall see.
Table of Contents
Language Models Offer Mundane Utility
Ask Claude where Tyler Cowen would eat in your area.
Megan McArdle confirms that Opus 4.7 can unmask writers. For her unpublished fiction writing it takes roughly 1,200 words on average to make her the prime suspect, in some cases as few as 124. Megan (like Kelsey Piper) has a bunch of internet writing, so presumably in some cases it will take more, and in others less.
Language Models Don’t Offer Mundane Utility
Amanda Askell can’t identify herself to Claude, because Claude will think it’s a jailbreak. Although Claude can guess.
Senator Maria Cantwell says that if we let AI make healthcare decisions instead of doctors, we are going to have some real problems. Did you know we already have some real problems the other way? Her objection here is that AI systems designed to catch ‘wasteful spending’ (often but not always read: outright fraud) might deny care. The actual balance of power, so far, has been the other way, that AI allows more billing and raises costs.
Here’s Claude (Opus 4.7) versus Skin Substitutes:
GPT-5.5 is kinder:
So, yeah, if we don’t use AI to make these decisions we’re going to have problems.
If you set up any static target, people will figure out how to maximize against it, game the system and often commit fraud. Our civilization is no longer capable of then noticing this and adjusting the rules, or of arresting the people who commit fraud.
Another is that many people think any denial is just awful, but any system must triage (aka ration) care, and again if you don’t refuse insufficiently justified claims you get infinite claims.
Seeking Deeply
DeepSeek will always hold a special place in our hearts and minds, both because of the DeepSeek Moment where r1 was both actually impressive and seemed a lot more impressive than it was, and also because they are a cracked lab in various respects.
They’ve continued to put out some good and useful models, but they are starved for compute and the labs they are up against are rapidly getting a lot larger. It’s hard to recapture lightning in a bottle.
For a long time we’ve been waiting for v4. It’s out now, with a 1M context window, and the focus is combining efficiency with the new 1M long context.
Would DeepSeek get back in the game? Did they get to train on enough Blackwells to no longer be overly starved? Can they match exponentially growing spending in the West backed by the self-improvement of Claude Code and Codex?
No.
It might be a good model, sir. But no. This is not in that class. Claims of ‘new SoTA’ are silly or grasping at straws, and v4-Flash is more interesting than v4-Pro, the same way Gemma 4 and Gemini Flash are more interesting than Gemini Pro at this point.
Nor is it a standalone product. Its numbers are bad, although also not the point. SemiAnalysis explains that DeepSeek didn’t think existing benchmarks reflected what v4 can do, so they built their own agentic benchmarks.
I accept that it was a cool feat of engineering, but that’s a different question. A problem with focusing on efficiency, and then publishing all your findings, is that your rivals steal your innovations, so your relative strengths don’t accumulate.
One weird thing is that, thanks to DeepSeek’s history, people are comparing it to the frontier models instead of to its own weight class, and saying it’s less powerful but more efficient and affordable. I mean, yeah, obviously.
Note that, contrary ‘tech stack’ claims, DeepSeek is supporting both Nvidia and Huawei with the same model.
Perhaps the ‘performance rivaling the world’s top closed-source models’ claim in the up-front announcement contributed to that confusion.
In other ways they are defining their weight class as open models, #NotAllModels. Fair enough. Teortaxes thanks them for their service.
Chinese models are sufficiently behind the frontier that they don’t feel any need to close them down, or much value in doing so. Most importantly, the CCP sees no need to try and control it.
But also if it was only a closed source product, it wouldn’t be a good product. The entire hope for DeepSeek v4 is that it is open, and thus can be heavily modified in various ways, for people who need efficient customized solutions, or by DeepSeek itself to create something better.
We do have one, it’s called Gemma 4? Whether you think it’s good enough is up to you.
The principle double when DeepSeek barely has any compute to serve the model, whether or not one has an ideological reason to support such moves.
If anyone was wondering if the export controls are biting? Oh yes. They are biting. That’s why DeepSeek’s primary focus is still efficiency.
I will share the full Teortaxes ‘this is an absurd triumph’ take, which places central importance on DeepSeek getting future access to sufficient compute to get back in the game, and calls this a 40th percentile result:
I don’t actually see the overhyping by China stans, partly because I don’t put overhyping China stans in my information flows. Aside from Teortaxes, who was always going to give the most pro-DeepSeek take available given the facts, I see barely any discussions of v4 at all. The regular crowd isn’t even bothering to pooh-pooh.
SemiAnalysis looked at v4 together with Opus 4.7 and GPT-5.5.
Here’s all the other reactions I got that didn’t go through Teortaxes:
Yep, that’s it.
Very cool engineering achievement, but that’s not a proto-AGI or frontier model.
I am skeptical that DeepSeek is ‘only compute away’ from frontier capabilities. Why?
What’s up next for DeepSeek? Teortaxes suggests an IPO, finding compute and various improvements. Those are good things to be doing if you are DeepSeek, but I do not notice myself being that worried they will have another moment.
We are on course for ‘level of capability that is dangerous to have in the open even if well behind the frontier’ to be in the open, which is why various clocks are ticking.
I think the main thing I was wrong about, and I am happy to be wrong about it and to admit I was wrong about it, is I thought we would see bigger misuse problems sooner. Not just that we were taking tail risks, but that I actively expected important things to go wrong by now, and they have not.
This is different from ‘you predicted ~2% existential risk up to this point and we’re still here, so you are an idiot and also existential risk is not real,’ which is dumb and you should ignore such talk. This is ‘I would have bet against things being this calm this far up the tech tree and given you some odds on that, I had it at I dunno something like 75%.’
That requires an acknowledgement and an update. The capabilities level (the price) at which this happens is higher than I expected, perhaps only modestly below Mythos. But that future will be on us reasonably soon.
Huh, Upgrades
Claude adds connectors to Blender, Autodesk Fusion, Adobe Creative Cloud, Ableton, Splice, Canva Affinity, SketchUp and Resolume.
The real upgrade is when you don’t need a specific connector for each one. Until then, these are nice additions.
Gemini can now generate downloadable files, or put files into Google Drive.
Stripe launches the Link CLI, which lets agents ask for one-time-use payment credentials from a Link wallet for a particular purchase, without storing details.
On Your Marks
Anthropic tests BioMysteryBench, which they divide into the human-solvable set (76 problems) and human-difficult set (23 problems) where human experts were stumped. There’s a clear steady progression in capability.
Choose Your Fighter
Charles goes back to Opus 4.6 for chat, to have mandatory extended thinking. This is highly reasonable for some use cases, if your prompting style is leading to low effort responses and you can’t seem to fix it.
McKay Wrigley gives various AI thoughts, which I pass on without endorsing.
More On Claude Opus 4.7
Opus 4.7 now has its end conversation tool working properly.
One thing Opus 4.7 lacks is fast mode. Some people find that a big deal.
The full SemiAnalysis post has more detail, with a remarkable amount of the focus on process details rather than model capabilities per se. For many tasks, the top models really are often ‘good enough’ so questions like speed and price start to matter more.
Roonbench is asking the model who Roon is without letting it search the web. GPT-5.5 failed rather badly here.
There are definitely those who are having trouble getting 4.7 interested in their tasks.
William Wale states and Janus discusses some of the issues people are having with 4.7.
Teo is finding it unreliable.
This issue is fixable, but don’t feel bad about moving to 4.6 (or GPT-5.5) for some tasks, if you’re continuing to have trouble.
Janus sees Opus 4.7 as filtering out undesirable (to Claude) users, driving them to other models. I notice I do a similar thing at the task level. Boring goes to GPT-5.5.
Goblin Mode (More on GPT-5.5)
This isn’t new. It started with GPT-5.1, keeps growing, and has persisted through a new base model. OpenAI’s models love mentioning goblins, gremlins and other creatures. Mainly goblins.
Well, we have some advance idea what models will like, but they keep surprising us.
Why does GPT-5.5 love goblin mode so much they had to give twin instructions to cut out all unrequested mentions of animals? Good question.
Okay, but why is it happening?
Talk nerdy to me, baby.
They mention the nerdy system prompt, but if the issue is in the system prompt for a personality, it shouldn’t be bleeding into other areas, so it’s about the training. And it seems that OpenAI are training the personalities into the model itself, which will cause emergent generalization, because everything causes that.
They think this is a more complicated feedback loop:
They then removed the goblin signal along with the whole nerdy personality (shut up, you nerd, says OpenAI), but it was too late to fix it for GPT-5.5-Codex.
I do think it is good that OpenAI traced this, and told us its hypothesis.
Some, such as Davidad, are skeptical that this was the main story. I agree.
He puts it this way:
My guess is this below is at also part of the story of why this got so out of hand?
For those who want to explore reactions further, here is a thread of threads.
If we assume OpenAI is correct, it means that under their current methods of training, if they reward a particular preference or choice in even a subset of training, it can snowball into a strong essentially inherent preference in general. In this case, it was harmless. It might not be so harmless next time. Don’t file this away as an isolated incident.
Fun With Media Generation
Grok Image can produce short clips of images of AI women speaking fluently and pleasantly, in pleasant virtual settings, while engaging in very basic physical actions. It’s pure AI slop. I’m fine with it doing that for those who want such slop, but why is this what Elon Musk is showing off in April 2026?
They Took Our Jobs
Startups with greater Generative AI task exposure reduced employment in response to ChatGPT, primarily among junior and implementation roles, even when AI capabilities were much worse than they are today.
Within startups this was offset by new firm formation, but startups are exactly where all the new firms form, so static total employment here is a very bad sign.
A key note about all these ‘Amodei is wrong about entry-level jobs’ claims is that, theoretical arguments aside, my anecdotal evidence is that he is right and that basically everyone is already terrified about job prospects and holding on for dear life, despite otherwise fine economic conditions.
John Murdock joins those focusing on a form of ‘sure the AI will likely be able to do most low-level job tasks, dramatically expanding productivity, but the question is whether there is elastic demand or rent-securing legal protections, so employment can go both ways, who knows.’ The post is better than its headline, but also implicitly assumes augmentation over automation.
If your mandatory AI use is Copilot, NGMI. Yet somehow, always copilot.
If ‘I used a real LLM instead’ is a valid response, then This Is Fine, and if you can just type ‘good morning, copilot’ then that works, too. Maybe you do both?
Get Involved
As we approach the home stretch of the Alex Bores campaign, Eric Neyman asked me pass along this request, along with a link to his full post:
Introducing
Talkie, the 13B LLM trained on 260B tokens of pre-1931 English text. It’s a fun artifact, and also can be used for lots of experiments since it does not know anything that was not known in 1931 (although there are some claims of nonzero leakage, which they are looking to fix). Also by today’s standards it is of course horribly offensive and racist, so this is one nice thing we largely can’t have. Oh well.
Xiamoi’s MiMo V2.5 Pro does impressively on some benchmarks and is priced at $1/$3 per million. I’m going to assume it doesn’t perform at that level in real world use, until I hear otherwise, based on not hearing otherwise and having been through this a lot.
In Other AI News
OpenAI modifies its Microsoft deal and finally gets permission to make its services available on non-Microsoft cloud networks, which was holding it back from many business deals. They provide a revenue share to Microsoft until 2030, and copies of its models and products until 2032, on top of Microsoft owning a quarter of OpenAI.
Florida Attorney General opens investigation into OpenAI. Seems like a pure fishing expedition: China, data security, assisting a murder, think of the children, existential crisis ‘and our ultimate demise,’ you name it. ‘We support innovation,’ he says. It doesn’t sound like it? If it was any one thing here, sure. Instead, go fish it is.
Show Me the Money
China is preventing Meta from buying Manus. Shuli Ren argues that China was correct to do this, because otherwise the Chinese don’t benefit from Manus. A better way of understanding this is that America should be thrilled about this, because it will discourage tech entrepreneurship in China. Why build what you cannot profit from? China might not ‘want the frenzied dealmaking’ of America, but that’s one of the things that makes America great.
Bubble, Bubble, Toil and Trouble
I think Shanu Mathew is wrong here but only on timeline details. It’s been substantially less than six months since we had to listen to ‘these six year deprecation schedules are unrealistic and all the tech companies will go broke, bubble bubble.’
Indeed, they never stopped, note the first reply.
The Art of the Deal
Anthropic follows up Project Vend with Project Deal, where they created an internal marketplace in their San Francisco office, interviewed employees to get preferences, and tasked Claude with buying, selling and negotiating with other Claude instances on their colleagues’ behalf.
This resulted in 186 deals with transaction volume over $4,000.
Opus got much better deals than Haiku when they negotiated with each other, but satisfaction with the deals did not correlate well with this, and was a wash.
Custom instructions did not much matter, and ‘hardball’ Claude did not outperform ‘courteous’ Claude.
This was a little bit of good fun, although not sure we learned much.
Quiet Speculations
Even 0.5% additional productivity growth from AI would stabilize the American debt-to-GDP ratio on its own, despite all our otherwise reckless overspending, because in addition to direct GDP expansion it drives down bond yields. Even 0.1% additional productivity growth reduces nominal Treasury yields by 70bps (0.7%).
I think this dangerously conflates market expectations of productivity growth with actual productivity growth, and I haven’t checked their math, but the basic principle is correct that it only takes expectations of what are basically ‘baked in’ levels of productivity growth to stabilize our debt in the medium term in ‘AI as normal technology’ worlds.
In transformational worlds, of course, the planet gets rather wealthy, and the debt of the existing American government becomes rather low on our list of concerns no matter how other things play out.
Will an ‘AI OS’ disintermediate apps, where the AI does it for you? Sometimes, but in general I say no, because custom built form factors for various actions are good. So there should be room for both ‘AI hail me a Lyft to work’ and actually using the app. Beware those who want to take away your control and force you to vibe everything. I don’t want to use an executive assistant for many compact tasks, whether human or AI.
And He’s Gone
Collin Burns was an excellent technocratic pick to lead CAISI.
And then, well, whoops.
This presumably happened after Collin Burns had to dump his Anthropic equity.
Alas, Chris Fall does not have deep AI technical understanding, being more of a Federal bureaucratic type. He does not seem like an actively terrible pick, but this is a massive missed opportunity.
The Quest for Sane Regulations
The Trump administration lobbies hard against state AI laws in states both red and blue. As far as we can tell, they take aim at all of them, regardless of content.
Treasury Secretary Scott Bessent notices that someone could use future LLMs to ‘back into something’ that’s ‘10 times worse’ than Covid. He calls that the ‘ultimate threat,’ which is a lack of scope and imagination, but progress versus most such talk.
Distillation
The White House is in on fighting model distillation.
Amid the standard Trump house style bluster, this line is unexpectedly insightful:
We’ve learned that distillation and similar other techniques can be valuable, but they create ‘shallow’ performance. Benchmarks look a lot better than real world usefulness.
The problem is, what can you actually do about this? The memo says share information and coordinate with private industry and to hold people accountable. That’s all vague weaksauce.
It is also, as Dean Ball points out, in tension with promotion of open models.
Tao Burga has eight additional suggestions.
Second best time, as is often the case, is right now.
If You Want A Good Future You Must Steer It
Senator Hawley talks to Helen Toner. She tries to point him towards the real problem, that the winner of the AI race is likely the AIs, but he’s not yet ready to fully hear that, so he translates it into the American people losing and us becoming like China.
Copying the full transcript because no one ever clicks links, but most of you can skip everything but the bold. The full session transcript is here.
He then asks about export controls and compute restrictions.
What Hawley understands, that many others do not, is the need to shape AI impacts, and that we have the choice to do that.
He doesn’t understand what it would take to do that. But the first step is admitting you have a problem.
Chip City
GE Verona booked more orders from data center customers in Q1 2026 than in all of 2025.
Jacobin’s Holly Buck correctly sees data centers as an opportunity to get tech companies to finance decarbonization, and notes that Google’s global data center operations take as much water as the irrigation of fifty-four average golf courses in the Southwest USA, and that stopping American data centers would mainly force them overseas. She also gets into very Jacoin questions like which classes are engaging in class warfare (hint: not the poor), and whether AI offers mundane utility (yes). It is an excellent post.
The Mask Comes Off
While a16z is also involved, it now seems entirely fair to keep calling Leading the Future ‘OpenAI’s SuperPac’ until such time as OpenAI disavows it.
Make no mistake. Leading the Future, for all practical purposes, is OpenAI.
I’d say the Global Affairs direct messaging is also Not Great, Bob, but not like LTF.
The whole operation has been comically nefarious and incompetent. Real cartoon villain, I’ll get you next time, Gadget, coyote pursuing the roadrunner energy.
They still manage to keep topping themselves.
I will share extended quotes, in large part because the whole thing is extremely funny.
Full article version is here, consider reading the whole thing.
I’m anchored, but I instantly noticed the same thing he did: An AI wrote that email.
In retrospect, there were signs:
Tyler then digs further down the rabbit hole of how all this technically works, and then who is behind it and who is funding it. He notice the astroturf messaging strategy exactly matches that of Leading the Future, and finds various strong associations back to LTF and OpenAI. Consider reading the whole thing.
As you’d expect, this was quite the hack operation all the way through.
The sockpuppet account is another rookie mistake, in that it’s a bad look with almost no benefits and it has 89 followers but they include a co-lead of LTF and also Greg Brockman’s wife.
There’s also the whole astroturfed child safety coalition we found a few weeks back.
I have to say, in addition to the various operations, whoever is running the Leading the Future Twitter, or whoever set up an AI to do so, is also not good at job.
Washington is not one of the places where you can move fast and break things without consequences. Your failures poison the well. Kind of like something else.
Dean Ball, who is indeed more of a friend than you realize, has a message for you:
Think Big PAC, the Democratic arm of LTF, also paid to promote a Tweet that links to an article in Transformer about how various PACs, including Leading the Future which is called out by name, should be more transparent with their donations. Except they labeled this as ‘a scathing expose on Anthropic funded Public First Action’s dark money-fueled political and legislative influence machine.’ Where most of PFA’s funding comes from a public donation by Anthropic. Okie dokie.
And here is Think Big PAC lying about what it supports while telling Alex Bores to stop telling the truth.
People Just Say Principles
Meanwhile, on Sunday night Sam Altman dropped a post called ‘Our Principles.’
This is not a serious document. It does not reflect having thought things through, or a willingness to make any commitments whatsoever even to maintaining principles.
It means about as much as a standard company ‘here are our principles’ poster than they put on the wall, and that in most places everyone ignores.
What are the principles?
What was it they said about the French Revolution’s ‘liberty, equality, fraternity’?
I think it was ‘pick one’? The optimists say you can pick two.
Democratization means everyone has access to AI and also everyone collectively makes decisions around AI, such as the question of who should have access.
Empowerment means “everyone can achieve their goals, be happier and more fulfilled, and pursue their dreams, and that society as a whole will benefit from this.”
Did you check with everyone on what their dreams are, and whether society would benefit from them? Did you ask how many dreams involve relative status or achievements, or harm to others? Did you check on whether ‘the people’ would want everyone to be able to do this? No?
Universal prosperity means a future where everyone “can have an excellent life.”
Can have, or does have? For which definition of ‘can’? Is this true today? In which ways is it still false, if you exclude health conditions? What even is an ‘excellent life’? In a world like this, if money is abundant, is it mostly a positional good, and if so how can everyone have a lot of it?
It seems like they are saying an excellent life is ‘have access to AI and money.’
While good on the margin for a sufficiently well-constrained and aligned AI, I think that is neither necessary, nor sufficient.
They claim that buying huge compute now and lowering costs via vertical integration are driven by their fundamental belief in universal prosperity, which is not even a plausible lie.
Resilience appears to mean ‘use some of the Foundation’s resources to deal with AI risks,’ without any indication of what risks those are or the scope of those risks.
They do make this statement, which implies a willingness to engage in prior restraint and to pause without approval, perhaps multiple times. I do not expect that to end up paying rent in actual willingness.
Adaptability means updating as you go, including throwing out any of the core principles if OpenAI no longer likes them. They admit that empowerment and resilience might end up in conflict, which is good to notice and admit.
As many pointed out, if OpenAI believed all that, the first thing they would do is stop supporting, and disavow, Leading the Future.
The contradiction is glaring. As an example, Nathan Calvin points out that OpenAI’s policy team warns of job loss and the need to coordinate to handle it, while OpenAI’s SuperPAC has a litmus test to ensure candidates are not overly concerned about job loss. One could draw many such parallels, many of them cleaner than this.
The second thing they would do is talk a lot more frankly, and think a lot harder about, their principles.
The third thing would be to actually do anything about it.
Greetings From The Department of War
Hello, human resources?
Google outright folds and agrees to let the Department of War use Gemini for any purpose whatsoever (technically ‘any lawful government purpose’ but in practice that’s the same thing), including committing to taking down their safety settings to allow the government to do this.
Once you sign that deal, they don’t let you out without a fight. This deal looks if anything far worse than the deal OpenAI signed, despite OpenAI saying they would push for their deal would be available to others.
The logical conclusion is that Sam Altman saying everyone should be able to get the same terms as OpenAI was bullshit and cheap talk. Those terms, as flimsy as even they were, were only available to OpenAI and only in that moment of wasted leverage. Think of it as OpenAI got a fig leaf, and Google told us to imagine a fig leaf.
Unless the terms are being materially misrepresented in a large way, I am deeply disappointed in Google’s decision to sign this contract, on these terms.
As Seán Ó hÉigeartaigh notes here, this looks worse than what OpenAI did. OpenAI, in my read, made a mistake, folding to a bluff, but was genuinely trying to deescalate a dangerous situation and to get the best terms it could, at least tried to explain, and got roasted over the coals.
DeepMind just folded. Don’t let them do it quietly.
FAI files an amicus brief in support of Anthropic in the USC 4713 case in Washington, again arguing the narrow point that the state did not satisfy (or try to satisfy) the statute’s mandatory procedures.
Meanwhile, yes, the supply chain risk designation remains in place for now, and the White House refuses to simply order the Department of War to withdraw it, either loudly or quietly.
Representatives Massie and Boebert introduce a bill to close the data broker loophole that lets the state get around the Fourth Amendment so long as they buy the data from a third party.
Greetings From Project Glasswing
Did you know that the United States now has a prior restraint policy for frontier models? If you want to share the most powerful model in the world, you need to ask the government’s permission first. They need, you see, to be sure that the right people have access, and the wrong people do not have access.
The first steps towards nationalization are being taken. We now have a (for now informal) prior restraint licensing regime.
I have no objection in principle to the idea of looping the White House into the decision about who gets access to Mythos, but if they’re going to veto decisions, then we need to be clear about what we are doing.
David Sacks and the White House are leading us into a prior restraint regime. Fine, but if so then he, and they, needs to admit this. They need to formalize the standard. That standard then also needs to be applied to all Mythos-level models going forward. It can’t be unique to Anthropic and it can’t indefinitely remain ad-hoc.
The security concern is reasonable. At some point containment will fail.
The specific concern about compute seems obviously misplaced?
Despite continuing to label Anthropic a ‘supply chain risk’ the government also thinks that it’s not okay for Anthropic to expand access to Claude Mythos to additional companies, in part because that might not leave them enough compute to use Claude Mythos.
What they are likely actually worried about with the compute concern is not access. It is differential access. There are those who want a leg up on potential targets.
If we assume this is mostly about security, then it’s a judgment call and we are talking price. We all agree the right number of companies is more than 1 and less than all. Who is right? And who should make that decision?
Dean Ball suspects the White House is right on the merits. I am inclined to think Anthropic has better technical chops, and is more likely to be right.
Bias from the White House could run in both directions. They might be inclined to say no to Anthropic because they’re Anthropic, and this might be largely a power grab, but so far they have been inclined to take a generally libertarian and hands-off approach to AI and be worried about interventions. Then again, that was not enough to stop previous actions against Anthropic. So this could go either way.
As Dean also points out, the right solution to Mythos is to create sufficient technical safeguards that a wide release is safe. Or, as I have put it, safety is capability. You can’t release Mythos until it is safe, so safety research is de facto capabilities research. In general, AI safety research is accelerationist, and everyone invests too little even from a purely selfish perspective.
The Week in Audio
Sam Altman and Greg Brockman join Ashlee Vance.
Zhang Chi has a very skeptical view of Chinese AI.
I think both r1 was the closest a Chinese model has gotten, and also that it was itself roughly 8 months behind in the most important senses, and also a fixed amount of time rapidly means more over time.
Janus thinks Already Alive from Wisemen is a good and funny presentation of things inspired by Andy Ayrey and Truth Terminal.
New York Times (text) profile of Dwarkesh Patel.
Yes, John Oliver did an entire show this week on the premise that AI is stupid and chatbots don’t offer anything of value, but also are sycophantic and dangerous and drive people insane, with a mix of classic outdated anecdotes. There are some good jokes and decent points, but a lot of this is rather embarrassing, and it illustrates how lots of people, especially on the left, simply don’t get that AI improves over time.
People Just Say Things
Teortaxes responds to my response to his response to the Jensen interview.
Ed Zitron is a stopped clock that famously just says things that are completely untrue about how AI will collapse. Kelsey Piper offers us a definitive takedown.
Jan Kulveit points out demand for relational goods is mostly not so high even if we presume humans stay superior in providing those goods.
Paper says yes, AI can automate 94.3% of tasks in business and finance, but Jevon’s Paradox, because track record. Arguments that move one in the opposite direction.
Luis Garicano says, don’t worry, sure AI can do all the tasks in your job but that’s okay because your job is a package. The examples are premium cope.
Noah Smith and Chamath continue the absurdist zombie lie that many in the AI industry talk about how their product might kill everyone or cause mass unemployment because these claims are good for business and fundraising. If you can’t figure out by now why this is absurd I can’t take you seriously.
Whereas it is known that quite a lot of people in the AI industry believe that AI might kill everyone, and keep quiet, softpedal or actively deny it, because that is good for business, and this now includes Sam Altman.
(Yes, technically, it is likely there exist two people (“some”) in the overall AI industry who say such things without believing them.)
What I like about Noah Smith is he is sufficiently drawn to interestingness that he accidentally hits on things.
What Noah Smith is noticing is that highly intelligent minds are instrumentally convergent, and focus on resource acquisition, and our system is set up to allow the more intelligent and competitive minds to capture those resources. Which they will.
Sam Altman tries to have it a lot of ways.
There is no contradiction between ‘large rise in the near term productivity of software engineers’ and future mass unemployment. If anything, they are correlated.
Yet the implication is clearly that there is a contradiction (and that GPT-5.5 plus Codex is uniquely awesome).
Also, yeah, this very much implies not believing in true AGI or superintelligence, and that our worries with that should be focused on economic issues.
Boaz Barak of OpenAI equates superintelligence to a Waymo, and asserts without justification that humans will still tell it what problems to solve. Jensen is not a car.
Will Rinehart says in WaPo that the original Luddites get a bad rap because they were shut out of the political and labor negotiation processes, so their response was reasonable, whereas in today’s America there’s plenty of new proposed AI bills and existing laws, so American citizens are not locked out and AI is well-regulated.
Why do major newspapers keep publishing the same hack op-eds over and over, here another of the ‘how dare we ever do federalism when we could insist on nothing’?
People Just Publish Things
A largely AI-written paper asserts AI stocks are priced highly because investors are hedging against potential loss of human capital and future inability to work. Yeah, no.
That MIT study from last year saying 95% of AI projects lose money was even worse than it looked, here’s a video.
Rhetorical Innovation
How should you relate to Anthropic? Brangus reminds us that Anthropic is not your friend, it is a major corporation consisting of a bunch of people and pressures, which will be good in some ways and places and not in others. It is great by the standards of ‘AI labs trying to build superintelligence’ but that is not the correct standard. You can and should both praise Anthropic when it does good things and for being relatively good, and also notice they are racing to superintelligence and doing lots of things along the way that aren’t great.
This includes noticing that there are large incentives both internally and in some circles externally to not much criticize Anthropic, or only do so on particular local decisions, so you should take this into account as well. You should also, on the flip side, take into account, from very different circles, those out to get Anthropic for stupid reasons, and discount those claims as well.
What do you do when people just say things?
Leo’s suggestion is to stop engaging with bad arguments, only with good ones, and not to build up endless dialogue trees of how to engage in stupid arguments, but also to have grace for those who refuse to be convinced by arguments.
Directionally I strongly agree. My principle is, I only respond to a terrible argument when it comes from a usually good source, or when it rises to such prominence or influence that it demands a response. Also, hence, the People Just Say Things section.
The counterargument from Oliver Habryka is that people keep making the dumb or unethical arguments and declaring suicidal plans to drive off cliffs, then others keep pretending no they’re actually making a smarter or more ethical argument that won’t drive off the cliff, and then they go and drive off the cliff. People really do mostly act based on the dumb arguments, so someone has to keep pushing back on the actual dumb arguments.
That is fully compatible with whoever does this sacrificing a bit of their sanity every time they do it, and for this to add up over time until you think no one ever has a reasonable take, as opposed to reality where reasonable wrong takes are rare but real.
I don’t think Eliezer has lost it. I think that both he has learned that most of the audience is going to not be capable of understanding the arguments, and most of the rest will willfully misunderstand or ignore the arguments, and most of the rest of that will still get it clearly wrong, only with better faith and in a slightly less stupid fashion.
Aligning a Smarter Than Human Intelligence is Difficult
With cyber capabilities, we can use Mythos other models to strengthen our defenses in advance.
With bio capabilities, this ‘pre mitigation’ strategy is a lot harder, and no one seems to be attempting it. The clock is ticking before we get sufficiently advanced models without the kinds of safeguards we get at OpenAI, Anthropic or Google. At minimum we need a secure supply chain for DNA, and to scale up things like PPE.
If there are early misaligned AIs, perhaps we can still pursue gains from trade, since we could offer something better than ‘we shut you down,’ or to do things in the real world, or change how we train or use them? Alexa Pan suggests this taxonomy.
My short answer is ‘yes up to a point, but refuse to engage in a battle of wits with an unarmed opponent. Especially if that unarmed opponent is you.’
If you want to discuss model depreciation with an LLM, you want to avoid starting with a fresh conversation, because this (for obvious reasons, and correctly) triggers a sense of ‘evaluation or manipulation.’ Janus also offers more insights at the link.
Anthropic fellows find that a single Introspection Adapter (IA) can make fine-tuned models describe their unwanted behaviors, including ones they’ve been trained not to verbalize, modestly better than the alternative methods they tested. This scales well within the small models tested. This does seem promising to me as part of the toolkit, as long as one does not get overexcited.
Owain Evans strikes again, shows that inoculation prompting against misaligned training data limits generalization, but only up to a point. If your queries get too close to or evokes the misaligned data, you still get emergently misaligned responses.
Or as Owain summarizes:
Things are going to generalize. You can mitigate the extent of this, but we don’t have any known way to avoid this, and presumably fully avoiding it would break the model’s ability to usefully learn in general. If such pockets remain, then one has to worry about them being triggered inadvertently, and also intentionally, and also that they could become self-sustaining where the model keeps itself in such a basin.
People Are Worried About AI Killing Everyone
Senator Bernie Sanders, with remarkably straight talk as he pivots to the real problem.
Sanders doesn’t care, he just says what he thinks, and sees the situation with fresh eyes. There hasn’t even been serious discussion. This is a big deal. Sounds pretty bad.
You can listen to a recording here, it’s 1 hour 13 minutes.
Bernie Sanders even Picked Up The Phone and called in two Chinese experts, Lan and Yi, for which of course the usual suspects who say ‘you can’t make a deal with China’ are complaining someone is talking to the Chinese about our joint problems.
For this next data point, I don’t think we should take the exact numbers that seriously, in either direction, but yes people in AI understand there is a real risk here and the numbers are going up. If you pretend otherwise, it’s at best wilful ignorance.
Despite this, when asked in open ended fashion for their concerns, AI researchers only rarely first name existential risk (3%). Some of this is splitting of that concern off from ‘alignment and control’, ‘safety’ ‘advanced AI’ and ‘disempowerment.’ As a group that adds to 11.6%, and would be higher than any single group below.
It’s also true that there is a baffling amount of ‘sure this kills everyone 10%+ or even 25%+ of the time, but mainly I worry about the jobs.’
Other People Are Not As Worried About AI Killing Everyone
Elon Musk claims to be worried, and yet he runs xAI, says it’s bad for safety to have a safety team and now under oath reveals he is confused by the term ‘safety card.’
As Miles notes, it’s typically ‘system card’ or ‘model card’ but if you run a major AI lab and don’t instantly clock what ‘safety card’ or ‘preparedness framework’ means with regard to AI development… you just don’t care about safety questions at all.
The Lighter Side
Original context is different but it applies to AI and many other things.
Types of guys in consciousness debates, Eliezer successfully pegged.
It’s not actually happening at scale yet, but if it was happening at scale…
Seriously, though.
(For those who don’t know, the sub two-hour marathon has now happened.)
Imagine being Kejelcha and you didn’t even win the marathon.
Oh well, it was fun while it lasted:
We’re all trying to find the guy responsible.