The major technical advances this week were in agentic coding, as covered yesterday.
The major non-DoW political and alignment developments will be covered tomorrow.
The DoW vs. Anthropic trial continues. Judge Lin was distinctly unhappy with the government’s case, which makes sense since the government has no case and was arguing a variety of Obvious Nonsense. The question now is how much preliminary relief Anthropic is entitled to. Assuming we find that out this week, I plan to cover that on Monday.
Beyond that, we have new iterations of questions we’ve dealt with time and again. The debate on jobs gets another cycle. Anthropic asked over 80,000 people what they think about AI, and has published those findings, nothing shocking but interesting throughout.
OpenAI is raising money again, although the terms raise some eyebrows. Elon Musk is announcing a grand chip project, but it was already kind of announced and it’s not like we should believe him when he says such things.
I used this lull to drop a giant response to Open Socrates, which is technically a book review but uses that as a jumping-off point to outline a distinct philosophy and approach, or at least tries to do that, in ways that yes are highly relevant to life and also to AI. That doesn’t mean you should read it in its current form, it is very long, but yes there is (I like to think) a bunch of gold within it. Long term, the goal is to find a much better way to give out the gold without the fun but inessential other parts, and take the time to write a short letter.
How does use change as users skill up? Longer term Claude users iterate more often and more carefully, and hand over full autonomy less, but the changes are small.
There is a lot more in the full post, although nothing was too surprising.
Shruti Rajagopalan calls it the most granular feedback she has ever received, and it was consistently correct and will be a key part of her future paper reviews. She affirms it was a lot more granular than GPT-5.4 Pro or other top options.
Arnold Kling points out that yes the output is impressive but that in his test Claude Opus 4.6 can do about as well out of the box on Cochrane’s work. If you’re doing serious such work it’s worth asking every source you can, and then comparing outputs.
The hype over Refine seems a little carried away.
Tyler Cowen: When will “the research paper” disappear in economics?
Soon enough you will be able to take any published research paper and tweak it, or improve it, any way you want. Just apply a dose of AI.
Using Refine, you already can judge the quality of all past papers, once you get them in uploadable form. We now can rewrite the entire history of modern economics with the mere investment of tokens. Which papers in the 1993 AER were really the good ones? Which are simply false and do not replicate?
I do expect paper replications and evaluations to get very strong over time. And yes, if you want to you can use Refine to judge the quality of all past papers, but there will be major errors in those evaluations even in past papers, where you are in a non-adversarial evaluation environment. For future papers, expect this to break down further.
Nor would I expect this to make good researchers less valuable until we reach a very high threshold of ability to replicate the entire process. A great researcher will be able to iterate quickly, find the right questions and get, select and synthesize the right results. When Tyler Cowen suggests that ‘look at this data set and tell me what is interesting’ will soon suffice to extract the value of a potential paper, then you are dangerously close to being AI complete.
You’re reasonably safe if you’re not in an adversarial situation, and the text being evaluated is being generated for human consumption. But if you’re in an anti-inductive situation, or even a straightforward adversarial one, you’re cooked.
Sam D’Amico: Completely insane. Meanwhile I can copy and paste an unlabeled YouTube transcript into Claude and get the full interview transcript.
Chandu Thota (VP Engineering, Google): Would love to get your feedback on our most recent launch/update.
If I want a properly labeled YouTube transcript as a Google document, as I often do, you would think I would ask Gemini since these are all Google services. But no, I ask Claude Code to use a skill I built, and will keep doing so until someone at Google explains to anyone how their products can do such things. They announce things like ‘reimagining content creation with Gemini in Google Docs, Sheets, Slides and Drive’ and my feedback to Chandu Thota is that I do not know of anyone who has gotten this to usefully work.
It turns out there are various ways to solve ARC-AGI tasks.
loveofdoing: 316 ARC-AGI tasks solved with zero learning. No neural net, no training, no DSL — just 19th-century projective geometry. Encode grid cell relationships as Plücker lines in P³, find transversals via Schubert calculus, score candidates by geometric incidence. 95% solve rate on the eval set (of non-timeout tasks).
Single C file, runs in seconds.
Well, now that we’ve beaten ARC-1 and ARC-2, I suppose it’s time for ARC-3?
François Chollet: At the moment, ARC-AGI-3 is the only unsaturated agentic AI benchmark. Sub-1% scores from frontier models on the private test set.
If you want to be among the first to know when an AGI breakthrough happens, monitor the ARC-AGI-3 leaderboard. Any sudden score jump will mean something important has changed about AI capabilities.
This happened twice before: sudden ARC-AGI progress marked the advent of AI reasoning (December 2024 jump on ARC 1) and the rise of agentic coding (late-2025 jump on ARC 2)
ARC-AGI-3 will be your early warning signal.
Cue the goalpost moving gif, as we repeat the process:
Find something chosen specifically because the AIs can’t do it but humans can, with restrictions on how the AIs can approach the problem.
The AIs find ways to do it better anyway.
Increment, declare AGI will happen this time but hasn’t happened yet, repeat.
La de da. Fun times for the whole family. Unless you all die. Real shame.
Lisan al Gaib: this is pretty much worst case performance. No harness at all and very simplistic prompt.
Prompt: “You are playing a game. Your goal is to win. Reply with the exact action you want to take. The final action in your reply will be executed next turn. Your entire reply will be carried to the next turn.”
General intelligence does not mean that you have been specifically trained for a large range of tasks. It means you can approach any NEW task and figure it out, just like humans do.
If regular people can do it on their own (no guidance, no tools), why should AGI require special handholding and handcrafted instructions? If it’s AGI, why would there still be a human in the loop, using their own human intelligence to guide the model on every new task?
François Chollet: Either you believe AGI is possible, in which case a real AGI will be able to look at ARC-AGI-3 and ace it, because regular humans can, or you believe that AI is just an automation tool that will require human intervention every time a new task comes up.
Pick your camp.
That’s a motte and bailey.
Obviously, if the AI needed human intervention on each problem, that’s not AGI.
However, if the AI uses a general harness, such as a better system prompt or something like Claude Code, why is that not AGI? The GI must lie within the weights in particular?
That’s kind of like saying if a human can do it, they should be able to do it without arms, legs, tools or pants. Or at least it’s like having the human do a test in an isolated room, in the dark, and then concluding the human can’t do the task in general.
“That’s not a fully general intelligence,” they said as… well, you know the rest.
The real benchmark is when they announce ARC-AGI-4.
Get My Agent On The Line
Rachel the AI agent phoned 3,000 pubs to price a pint, for a total cost of about €200, via explicitly pretending to be a potential customer, using ElevenLabs. Too much of this and you won’t be able to call the pub.
A common technology diffusion story is that it gets too big and useful too fast, and then no one can stop it even if they want to, whether or not they should want to.
@viemccoy (OpenAI): There’s always been a point-of-no-return with agent productivity where no amount of geopolitical pressure can put the cat back in the bag. I think we just reached it. Clever prompting and threadmaxx training can probably turn you into an oil baron if done correctly these days.
I have been saying, for years, that the only effective bottleneck on AI deployments is to not create the relevant AI models in the first place. Once the models exist, ‘oh we will not use it for autonomous killer robots or mass domestic surveillance’ or ‘we will not use it for agents’ is a much higher lift and basically not going to happen in general. All arguments of the form ‘no one would be so stupid as to’ are wrong, as is ‘we would collectively ban that.’
The exception is if we can create a regulatory bottleneck where we can make the action effectively illegal, and draw a clear boundary around it, and it involves impacting the physical world, and also the governments do not want to do it. Then you have a shot, and you can prevent entire societies from, for example, curing diseases or building houses, and keep the price of things like practicing law or medicine or cutting hair artificially high indefinitely, up to a point.
But agents? Yeah, that’s a Butlerian Jihad level of intervention. Can’t stop it.
Orin Kerr: An attorney writes to me about the mostly AI-written law review article he had accepted this spring, now forthcoming in the flagship law review of a Top 50 law school. A draft of the article is now up on SSRN.
According to the attorney: “Last month I used Claude to assist in drafting a new article . . . . I drafted this article in about 15 hours. In 2022 I published an article of similar length that took around 150 hours.”
The attorney adds: “I used Claude the way I’d use a junior associate—as a first drafter, sounding board, and research assistant. Most of the article, including the entirety of the title, abstract, and intro, is mine from the keyboard up. And anything Claude contributed that made it to the final version is there because I reviewed it, agreed with it, and chose to sign my name to it. This is no different than how I’d review an associate’s draft and then take responsibility for the finished product.”
The attorney adds: “That first draft was by no means file ready, but it was better than what I would’ve received from the vast majority of BigLaw associates. I was blown away, and have since started my own appellate and litigation practice in an effort to replicate these productivity gains for client work.”
Your thoughts? I know the attorney’s name, and the journal, and I have checked out the article, but I figured that, at least for now, I would hold that back.
I would be inclined to say that if the attorney wrote most of the words and reviewed everything carefully, then in practice that seems fine. The 135 wasted hours are a bug, not a feature. Jessica Tillipman points out that this would have earned a human a co-author credit, but it’s not like the attorney can do that here.
That is good. Advisors exist to freak out about things, both things like ‘we might kill everyone on the planet’ and also things like this. If your advisors don’t freak out about the little things, that’s a terrible sign. It’s your job to address the freaking out and explain why it’s going to be fine, and to pause your move until you can do that.
Sam Schechner and Georgia Wells (WSJ): The people said that one council member, citing cases where ChatGPT users have taken their own lives after developing intense bonds with the bot, claimed that OpenAI risked creating a “sexy suicide coach.”
Okay, that’s a bit much, but it’s also how the media would play such incidents.
The age classification system has a false negative rate of 12%. It didn’t say the false positive rate, which also seems important. I’d also want to know how often the errors are about 16-21 year olds. Differentiating a 17 vs. 18 year old via chat characteristics seems theoretically impossible.
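To see why the false positive rate matters, here is a quick Bayes-style sketch. The 12% false negative rate is from the announcement; the false positive rate and the share of users who are minors are made-up assumptions, purely to illustrate the mechanics.

```python
# Why the false positive rate matters too: a quick Bayes check.
# The 12% false negative rate is from the source; the 5% false
# positive rate and 10% minor share are hypothetical assumptions.
fnr = 0.12        # P(classified adult | actually a minor), per the report
fpr = 0.05        # P(classified minor | actually an adult), assumed
p_minor = 0.10    # share of users who are minors, assumed

# Probability a random account gets flagged as a minor.
flagged = (1 - fnr) * p_minor + fpr * (1 - p_minor)
# Of the flagged accounts, how many are actually minors?
ppv = (1 - fnr) * p_minor / flagged
print(f"share of flagged accounts that are really minors: {ppv:.0%}")
```

Under these toy numbers, roughly a third of accounts flagged as minors would actually be adults, which is why reporting only one error rate tells you little.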
RIP the Sora app. I set expectations at Google+ or Clubhouse. Calibration confirmed.
Greetings From The Torment Nexus
From the people who brought you Facebook and Instagram, it’s ChatGPT.
First we had Fidji Simo, architect of the news feed, as Head of Product.
Here, in one of her examples, the LLMs are mostly biased towards ‘neutrality’ or dodging a question that lacks a clear answer. That seems mostly fine, and was the general direction the paper mostly finds on various questions, a 70% increase in remaining neutral.
One could even explain this via selection effects. If you write an essay on any question, such as ‘does money buy happiness,’ then that makes it more likely you have a strong opinion. You’re less likely to talk about it if you have a neutral view. But the LLMs don’t get to choose their topics.
The real problem is that LLMs don’t know how to leave well enough alone:
You really do have to watch out for this. I don’t trust LLMs to edit my writing unless I am looking over every edit, exactly because they often change meanings, and my words very much have precise meaning.
We then examine LLM-generated text in the wild, specifically focusing on the 21% of AI-generated scientific peer reviews at a recent top AI conference. We find that LLM-generated reviews place significantly less weight on clarity and significance of the research.
Not weighing particular concerns enough is always a Skill Issue, fixable by prompting. The bigger issue would be if the concern could not be evaluated properly at all. But an LLM reviewer might indeed suffer in practice from such skill issues.
Canvas introduces an ‘AI teaching agent’ for ‘low-value tasks.’ This excludes AI grading, because that keeps ‘humans in the loop’ despite teachers I know thinking most grading (or at least grading of things that aren’t essays or other long writing) is a low-value task. Education folks are paranoid that AI will ruin education, because deep down they know that the whole enterprise makes no sense and if you remove various frictions it will collapse.
If AI makes everyone twice as productive, one possibility is that everyone works half as hard. Another possibility is everyone works twice as hard, on top of being twice as productive, because half of people are about to get fired, or people see that you’ve completed your other work faster and suddenly give you more tasks.
gabe: Working overtime in secret and telling everyone that claude did it so that they think I’m good at AI
Noah Smith: This isn’t Jevons’ Paradox. It’s everyone thinking that 90% of the people at their company are about to get laid off, and working like crazy to try to make sure they’re part of the 10% who don’t.
AI has made me substantially more productive. I am not choosing to work less, even though there is no chance I would be fired if I worked less.
AI productivity gains are extremely high in ‘greenfield’ situations where you can start from scratch. When you are trying to update legacy systems and legacy code that lacks documentation and everything has to operate in real time and if any little thing changes someone throws a fit, it gets harder. Dreams of ‘oh we will build a new HR tool ourselves, from scratch’ do not work out so well for those without expertise.
For now. And yes, for now you can and often should outsource such work to people who can do better, and can get the proper benefits from the services. Tyler Cowen calls this a ‘slow takeoff’ of AI, much to my frustration, because such terms don’t keep their original meanings. But that’s okay.
The ‘slow’ here is that people got dreams they could suddenly do everything themselves, and tried to do that a bit early, when instead they can only do it massively cheaper and better than before by hiring those who know how. How disappointing. Or they could wait another 6-12 months.
Zuckerberg wants everyone at Meta to have their AI agent and has made use of AI a factor in their performance reviews.
When the next crazy thing happens at Meta, and you’re curious why, remember this and that they bought Manus and Moltbook:
Meghan Bobrowsky (WSJ): Employees have started using personal agent tools such as My Claw that have access to their chat logs and work files and can go talk to colleagues—or their colleagues’ own personal agents—on their behalf, the people said.
Another AI tool called Second Brain that is somewhere between a chatbot and an agent is also gaining momentum internally, according to people familiar with the matter. Second Brain was built by a Meta employee on top of Claude and can index and query documents for projects, among other uses. On the internal post announcing it to staff, the employee said it is “meant to be like an AI chief of staff.”
There is even a group on the internal messaging board where employees’ personal agents talk to each other, some of the people said. (Separately, Meta acquired Moltbook, the social-media site for AI agents, and hired its founders in a deal earlier this month.)
Assuming we are not looking at full-on rapid capability advancement (aka recursive self-improvement), there are good reasons to think that, even if AI made it possible to profitably automate large portions of the economy, diffusion of that technology would take longer than you think.
There is also quite a lot of economics hopium running around. This is in response to Anthropic CEO Amodei doing four minutes on Fox News talking about what AI will be able to do.
What Dario actually says here is ‘I would not be surprised if within 1-5 years we start to see big effects here.’
Previously Dario has made predictions that seemed too aggressive at the time, and which indeed proved too aggressive although directionally correct. No, 90% of code wasn’t written by AI last year, but far more of it was than most predicted, and I’m guessing we get there in 2027. Here I think he’s flat out correct if you listen.
I’d go farther, in that if we don’t see ‘big effects’ on entry level jobs within five years, then I will be very surprised, and depending on how big is big it would surprise me within two or three. So 1-5 years seems like a good confidence interval for big effects on the entry-level job market.
This is very much on an exponential. Anthropic is growing roughly 10x every year, and if anything that has been accelerating, so ‘I can’t find it in the statistics yet’ makes you sound like people dismissing Covid-19 in February 2020, or at best in January.
grad students and junior hires are cooked.
Jon Hartley: People also said this 3 years ago when ChatGPT was first released to the public: “50% of all entry-level Lawyers, Consultants, and Finance Professionals will be completely wiped out within the next 1–5 years.”
Kevin A. Bryan: I can’t tell you how much I would bet against Dario on this. He is totally right on the technical part, but the implicit economic model is just completely wrong. Prices adjust!
(So as not to be cryptic: in his story, you need tech improvements + substitute not complement + on almost all parts b/c prices of each task adjust + org adoption + regulatory adoption. Good luck. See Joshua/Avi O-Ring and Garicano/Li/Wu just among recent papers on this.)
Alex Imas: This is my feeling too, but Dario has been right about more than he’s been wrong about (granted, they’ve been technical rather than economic predictions), so I’m trying to be cautious.
Kevin A. Bryan: Yes, completely right on the technical side (as has Leo and many others – and I get that 5 years out, those projections get crazy).
But I don’t think there is any track record of his on a surprising *economic* claim that has been borne out.
I think that the economic impacts of AI have been rather surprising, actually. If nothing else, the CapEx spending is dramatic, as has been the revenue growth and valuations of the AI companies. Total investment, and return on investment, seem like very important economic claims with which to measure an ongoing exponential. Indeed, when those numbers were small, they were used as a central economic counterargument by many econ types.
Cowen’s Second Law is that all propositions about real interest rates are wrong, but I do believe that you cannot explain the combination of RGDP resilience and productivity growth with the poor experienced state of employment, if you don’t factor in AI. The people on the street are feeling the impact, already, now.
Note that hiring, like the stock market, is forward looking. Most white collar hiring now, especially for entry level work, only pays off for both worker and employer years down the line, after a training and adoption period and paying off fixed costs. No one (with notably rare exceptions, famously including Donald Trump) likes firing people, and it gets expensive. Even if I could have a job for you now, if I expect AI to take your job a few years later, often that means I don’t want to hire you today.
I also don’t buy the ‘prices adjust’ argument. Prices largely don’t adjust, or they adjust slowly whereas AI will happen fast. Wages are sticky downwards and must maintain relative status relationships, and must be enough to make workers willing to work. There are not only legal minimum wages, there are other forms of de facto minimums. On top of that, wages and workers are competing across the economy and across industries. If, for example, half of all entry-level finance positions went away, but employment overall did not collapse, my expectation is that wages on those jobs would decline very little.
I especially reject the ‘everything is an O-ring’ argument, that so long as some portion of the loop requires a human, employment and wages do not fall. That doesn’t mean every augmentation or only partial automation is bad for employment, often they are good for employment, but you do not need full automation to make it bad for employment.
As a clean toy example, see the movie No Other Choice. The fact that there is still one job at the factory does not mean that there are not severe employment effects, and also note that (implicitly) wages for that job did not adjust so far downward, either.
That’s not to dismiss diffusion bottlenecks. Yes, for a given level of tech capability, things like employment effects will take longer than those at the labs would predict on their own, often a lot longer. But life is going to come at you fast, even if we do not face a singularity or total transformation, including that the AIs themselves will become quite good at assisting with diffusion and overcoming bottlenecks.
PoliMath gets paranoid and goes in a different direction, saying that Dario’s rhetoric is irresponsible because it is designed to get companies to destroy people’s software engineering jobs. He not only thinks this is their plot to drive business, he puts the responsibility on Anthropic (et al, presumably) to figure out how their tools can create jobs instead. I can absolutely assure him and everyone else that this is not how any of this works, on any level. Anthropic hurts itself with these warnings, CEOs don’t listen to Anthropic pitches and prospectively fire half their engineers before implementing the replacement, and Anthropic shares these warnings because Dario thinks it is the socially responsible thing to do.
Here’s a Senator who is very much buying the hype on AI unemployment.
Axios: Virginia Sen. @MarkWarner told #AxiosAISummit that he believes AI’s economic disruption “is going to be exponentially bigger” than he thought just a few months ago.
“The recent college graduate unemployment is 9%. I’ll bet anybody in the room it goes to 30 or 35% before 2028.”
Alex Jacquez: Senator Warner….would you like to make a sizeable wager?
I, too, would be happy to book that bet from Senator Warner, but alas I was not in the room. This is in part because it would be very easy to profitably hedge that bet.
The data here is from Ramp, which OpenAI disputes is a reasonable measure.
There is also this graph of ‘popularity among businesses’ overall:
That is presumably ‘uses at all’ rather than share of revenue.
It will be interesting to extend these graphs into March, and see what impact the DoW vs. Anthropic situation had on such choices. I could see it moving either way.
Levels of Friction
This was said well enough it went viral, so good job.
Tips Excel: Claude Cowork can apply to 50 jobs in under 30 minutes. Here’s how to set it up.
meatball times: hiring is impossible BECAUSE it’s so easy to apply
finding a relationship is impossible BECAUSE new dates are a swipe away
getting into university is hard BECAUSE you can CommonApp apply to 50 at once
convenience is washing away friction that actually carried a lot of signal
eternalist: the solution in all of these cases is just adding money. Money is the universal, fungible fiction whose appropriate level can be discovered by the market. Charge per swipe, charge per application, let the market determine the cost
As always, Eternalist is correct. If an action imposes a cost, it requires friction, and the best friction is to require payment, even if it is refundable or very small.
In Other AI News
Composer 2 appears to be Kimi 2.5 with RL, being offered in an open license, without getting Moonshot’s permission as per Xinya Zhou. Intellectual property, what’s that?
David tops the HuggingFace ‘Open LLM Leaderboard’ by taking Qwen2-72B, duplicating a block of seven middle layers and stitching them back together, which he models as ‘give it more time to think.’ This sounds like Frankenstein levels of mad science nonsense at first glance, but I see no reason he would be lying.
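As a sketch of what the stitching amounts to: take the ordered list of transformer blocks, copy a contiguous run of seven from the middle, and splice the copy in right after the original. The exact indices of the duplicated block are not given in the source, so the position below is assumed.

```python
# Sketch of the "duplicate a block of middle layers" trick described above.
# Qwen2-72B has 80 transformer layers; which seven get duplicated is not
# specified in the source, so the start index here is illustrative.
n_layers = 80
layers = list(range(n_layers))      # stand-ins for the transformer blocks
start, width = 38, 7                # assumed position of the middle block
block = layers[start:start + width]

# Stitch: everything before the block, the block twice, everything after.
stitched = layers[:start] + block + block + layers[start + width:]

assert len(stitched) == n_layers + width  # 87 layers after duplication
```

The appeal of the trick is that no new weights are trained: the duplicated layers reuse existing parameters, so the forward pass just spends more compute per token, which is the ‘more time to think’ framing.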
New compression algorithm just dropped.
Google Research: Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: http://goo.gle/4bsq2qI
Charles: I notice I am confused when Google publishes something like this. Should I assume it is non-novel and something they expect all their competitors already have? Or there’s not much benefit to hoarding this particular idea for some reason?
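TurboQuant’s actual algorithm is in the linked blog, not reproduced here. As a baseline for what KV-cache compression even means, here is plain symmetric int8 quantization of a single key vector, which by itself gives 4x memory savings versus fp32; hitting 6x-plus presumably requires lower bit-widths or additional tricks beyond this sketch.

```python
# Baseline illustration of KV-cache compression (NOT TurboQuant itself):
# symmetric per-vector int8 quantization. Each fp32 value (4 bytes) is
# stored as one int8 (1 byte) plus a shared scale, roughly 4x smaller.
def quantize_int8(vec):
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    return [x * scale for x in q]

key = [0.03, -1.20, 0.55, 0.00, 2.41, -0.87]   # toy key vector
q, s = quantize_int8(key)
recon = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(key, recon))
assert max_err <= s / 2 + 1e-9   # error bounded by half a quantization step
```

The interesting part of any real scheme is doing this without the accuracy loss that naive quantization causes on outlier-heavy attention keys, which is exactly what the ‘zero accuracy loss’ claim is about.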
Show Me the Money
OpenAI made a $50 billion deal with Amazon, and Microsoft is contemplating legal action on grounds that this breaks its exclusive cloud partnership with OpenAI. OpenAI and Amazon claim they have found their way around the Microsoft contract.
Financial Times: “We know our contract,” said a person familiar with Microsoft’s position. “We will sue them if they breach it. If Amazon and OpenAI want to take a bet on the creativity of their contractual lawyers, I would back us, not them.”
I notice that I am confused by Microsoft’s position here, given they own on the order of 27% of OpenAI. It seems like an unwise business move to keep OpenAI shackled. It couldn’t be happening to a nicer pair of giant corporations.
To state the obvious, if you are offering 17.5% returns then it’s a trap, and at minimum it is very much not guaranteed. Saying ‘I will pay you 17.5%’ is, as those who parked their money in various crypto platforms know well, a way of saying ‘there is a good chance I am going to default on this.’
What you want, if your conscience permits and you’re not too worried about ‘what money will be worth in a post-AGI world’, is equity. If OpenAI succeeds, it is probably going to give you a lot more than 17.5% returns, even from today’s valuations. If OpenAI fails, money gone. There presumably won’t be zero recovery, but it won’t be pretty. So capture the upside, or seek your returns elsewhere.
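To make the asymmetry concrete, here is a toy expected-value comparison of the two instruments. Every number below is hypothetical, chosen only to illustrate the shape of the bet.

```python
# Toy expected-value comparison: 17.5% note vs. equity, per $1 invested.
# All probabilities and payoffs are hypothetical illustrations.
p_default = 0.25      # assumed chance the 17.5% note effectively defaults
recovery = 0.40       # assumed recovery on the note in that case

note_ev = (1 - p_default) * 1.175 + p_default * recovery

# Equity: assume a 30% chance of a 10x outcome, else worthless.
equity_ev = 0.30 * 10.0 + 0.70 * 0.0

print(f"note EV per $1: {note_ev:.2f}, equity EV per $1: {equity_ev:.2f}")
```

Under these made-up numbers the note has negative expected value while equity is strongly positive, which is the point: the note caps your upside at 17.5% while leaving you exposed to most of the downside.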
Elon Musk: Formal announcement of the TERAFAB project, which will be done jointly by @SpaceX and @Tesla , tonight around 8pm CT. Livestream on 𝕏.
The goal is to produce over a TERAWATT of compute per year (logic, memory & packaging) with ~80% for space and ~20% for the ground.
Here’s one sober and reserved evaluation of his prospects here:
markusdd: Ok. So essentially Terafab is not a plan for a ‘standard’ semiconductor fab, but includes mask production, test house and packaging under one roof. This alone is a very alien concept. If they only achieve this alone, not even considering reaching the crazy output numbers, they’re already in unicorn territory.
All the plans on top of what they actually are willing to do with the silicon are essentially Dyson Sphere Program in real life. There is no other way to put this.
In this very case, as this directly affects an industry I happen to work in: I assign a major probability to this either failing or delaying much beyond the intended timeline. Sure you can buy equipment and hire experienced people, but semiconductor manufacturing in advanced nodes is essentially the farthest humanity has ventured down the tech tree. It is so incredibly difficult and complicated that it’s essentially impossible to explain to the average joe how it even works without simplifying it into oblivion. 95% I talk to can’t even grasp correctly what I do for a living, and digital design is very far up the stack.
I’m saying this not because I’m rooting against Elon or xAI, Tesla or SpaceX, but just to manage expectations.
Starlink has proven with their satellite cadence and PCB in-house pipeline that these companies do understand how to build stuff reliably at scale and down to a sustainable price point.
If there is one guy and his entourage that can actually succeed at this, it’s this. Nobody else.
But expect bumps along the way. This might be the single thing in tech more complicated than the literal rocket science.
The major technical advances this week were in agentic coding, as covered yesterday.
The major non-DoW political and alignment developments will be covered tomorrow.
The DoW vs. Anthropic trial continues. Judge Lin was very not happy with the government’s case, which makes sense since the government has no case and was arguing a variety of Obvious Nonsense. The question now is how much preliminary relief Anthropic is entitled to. Assuming we find that out this week, I plan to cover that on Monday.
Beyond that, we have new iterations of questions we’ve dealt with time and again. The debate on jobs gets another cycle. Anthropic asked over 80,000 people what they think about AI, and has published those findings, nothing shocking but interesting throughout.
OpenAI is raising money again, although the terms raise some eyebrows. Elon Musk is announcing a grand chip project, but it was already kind of announced and it’s not like we should believe him when he says such things.
I used this lull to drop a giant response to Open Socrates, which is technically a book review but uses that as a taking off point to outline a distinct philosophy and approach, or at least is trying to do that, in ways that yes are highly relevant to life and also to AI. That doesn’t mean you should read it in its current form, it is very long, but yes there is (I like to think) a bunch of gold within it. Long term, the goal is to find a much better way to give out the gold without the fun but inessential other parts, and take the time to write a short letter.
Table of Contents
Language Models Offer Mundane Utility
Have Claude be your Dangerous Professional and explain why your insurance policy does cover that pipe after all.
Stop Indian trains from hitting elephants, perhaps?
Be an all-purpose Dangerous Professional.
How does use change as users skill up? Longer term Claude users iterate more often and more carefully, and hand over full autonomy less, but the changes are small.
There is a lot more in the full post, although nothing was too surprising.
Oklahoma Supreme Court says use all the AI you like, there is no need to disclose it, but you are still fully responsible for the contents. This is The Way.
Refine Your Paper
Grumpy (but usually correct) Economist John Cochrane is impressed by the tool Refine for getting comments on academic articles, seeing it as on the high end of academic commenters on his work.
Shruti Rajagopalan calls it the most granular feedback she has ever received, and it was consistently correct and will be a key part of her future paper reviews. She affirms it was a lot more granular than GPT-5.4 Pro or other top options.
Arnold Kling points out that yes the output is impressive, but that in his test Claude Opus 4.6 can do about as well out of the box on Cochrane’s work. If you’re doing serious work of this kind, it’s worth asking every source you can, and then comparing outputs.
The hype over Refine seems a little carried away.
I do expect paper replications and evaluations to get very strong over time. And yes, if you want to you can use Refine to judge the quality of all past papers, but there will be major errors in those evaluations even in past papers, where you are in a non-adversarial evaluation environment. For future papers, expect this to break down further.
Nor would I expect this to make good researchers less valuable until we reach a very high threshold of ability to replicate the entire process. A great researcher will be able to iterate quickly, find the right questions and get, select and synthesize the right results. When Tyler Cowen suggests that ‘look at this data set and tell me what is interesting’ will soon suffice to extract the value of a potential paper, then you are dangerously close to being AI complete.
Language Models Don’t Offer Mundane Utility
Christoph Heilig tests 18 different OpenAI models including GPT-5.4, and finds all of them rate some forms of pseudo-literary nonsense higher than coherent prose in every case, with no improvement over time. You cannot use such a model as an evaluator without a way to correct for this, and one worries about its own generations.
You’re reasonably safe if you’re not in an adversarial situation, and the text being evaluated is being generated for human consumption. But if you’re in an anti-inductive situation, or even a straightforward adversarial one, you’re cooked.
Google keeps embedding the Gemini symbol into its services but no one can figure out how to actually use it and have it connect to the context it needs.
If I want a properly labeled YouTube transcript as a Google document, as I often do, you would think I would ask Gemini since these are all Google services. But no, I ask Claude Code to use a skill I built, and will keep doing so until someone at Google explains to anyone how their products can do such things. They announce things like ‘reimagining content creation with Gemini in Google Docs, Sheets, Slides and Drive’ and my feedback to Chandu Thota is that I do not know of anyone who has gotten this to usefully work.
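For concreteness, the core of such a skill is mostly mundane formatting. Here is a minimal sketch of the labeling step, assuming segments shaped the way transcript APIs typically return them; the function name, field names, and gap threshold are my own illustration, not the actual skill:

```python
def label_transcript(segments, gap=30.0):
    """Turn raw transcript segments ({'start': seconds, 'text': str}) into
    a labeled document, emitting a [mm:ss] header whenever `gap` seconds
    have elapsed since the previous header."""
    lines, last_stamp = [], None
    for seg in segments:
        if last_stamp is None or seg["start"] - last_stamp >= gap:
            m, s = divmod(int(seg["start"]), 60)
            lines.append(f"[{m:02d}:{s:02d}]")
            last_stamp = seg["start"]
        lines.append(seg["text"].strip())
    return "\n".join(lines)
```

The hard part, as the complaint above suggests, is not this formatting; it is getting the assistant to fetch the transcript and land the result in a Google Doc at all.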
Huh, Upgrades
ChatGPT makes it easier to reference files, including across conversations.
On Your Marks
It turns out there are various ways to solve ARC-AGI tasks.
Well, now that we’ve beaten ARC-1 and ARC-2, I suppose it’s time for ARC-3?
Cue the goalpost moving gif, as we repeat the process:
La de da. Fun times for the whole family. Unless you all die. Real shame.
That’s a motte and bailey.
Obviously, if the AI needed human intervention on each problem, that’s not AGI.
However, if the AI uses a general harness, such as a better system prompt or something like Claude Code, why is that not AGI? The GI must lie within the weights in particular?
That’s kind of like saying if a human can do it, they should be able to do it without arms, legs, tools or pants. Or at least it’s like having the human do a test in an isolated room, in the dark, and then concluding the human can’t do the task in general.
“That’s not a fully general intelligence,” they said as… well, you know the rest.
The real benchmark is when they announce ARC-AGI-4.
Get My Agent On The Line
Rachel the AI agent phoned 3,000 pubs to price a pint, for a total cost of about €200, via explicitly pretending to be a potential customer, using ElevenLabs. Too much of this and you won’t be able to call the pub.
A common technology diffusion story is that it gets too big and useful too fast, and then no one can stop it even if they want to, whether or not they should want to.
I have been saying, for years, that the only effective bottleneck on AI deployments is to not create the relevant AI models in the first place. Once the models exist, ‘oh we will not use it for autonomous killer robots or mass domestic surveillance’ or ‘we will not use it for agents’ is a much higher lift and basically not going to happen in general. All arguments of the form ‘no one would be so stupid as to’ are wrong, as is ‘we would collectively ban that.’
The exception is if we can create a regulatory bottleneck where we can make the action effectively illegal, and draw a clear boundary around it, and it involves impacting the physical world, and also the governments do not want to do it. Then you have a shot, and you can prevent entire societies from, for example, curing diseases or building houses, and keep the price of things like practicing law or medicine or cutting hair artificially high indefinitely, up to a point.
But agents? Yeah, that’s a Butlerian Jihad level of intervention. Can’t stop it.
Deepfaketown and Botpocalypse Soon
OpenAI now monitors 99.9% of internal coding traffic for misalignment using GPT-5.4-Thinking. I don’t know that it needs to be the full 100% but something like this should be standard practice, along with some percentage of external traffic.
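Mechanically, ‘monitor 99.9% of traffic’ is a sampling gate in front of a classifier. A toy sketch, with the classifier stubbed out; all names here are illustrative, not OpenAI’s actual setup:

```python
import random

def monitor(events, classify, coverage=0.999, seed=0):
    """Sample a fraction `coverage` of events, run each sampled event
    through the `classify` callable, and return the flagged subset."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() < coverage and classify(e)]
```

With coverage=1.0 every event gets checked; the cost and latency of running the classifier is the whole reason anyone samples below 100%.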
Attorney uses Claude to write a top-level law review article in 15 hours instead of 150 hours. If this had been 15 minutes I’d say definitely not cool, but where is the line? What do we make of this use case?
I would be inclined to say that if the attorney wrote most of the words and reviewed everything carefully, then in practice that seems fine. The 135 wasted hours are a bug, not a feature. Jessica Tillipman points out that this would have earned a human a co-author credit, but it’s not like the attorney can do that here.
Fun With Media Generation
Lyria 3 Pro lets songs extend to three minutes. Often human songs are longer than three minutes, but I’m not convinced the world would be worse if there was a hard three minute limit.
OpenAI’s intent to allow ‘x-rated’ talk freaked out its own advisors when they were told in January that OpenAI was forging ahead. Hence the delay.
That is good. Advisors exist to freak out about things, both things like ‘we might kill everyone on the planet’ and also things like this. If your advisors don’t freak out about the little things, that’s a terrible sign. It’s your job to address the freaking out and explain why it’s going to be fine, and to pause your move until you can do that.
Okay, that’s a bit much, but it’s also how the media would play such incidents.
The age classification system has a false negative rate of 12%. The false positive rate was not disclosed, which also seems important. I’d also want to know how often the errors involve 16-21 year olds. Differentiating a 17-year-old from an 18-year-old via chat characteristics seems theoretically impossible.
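To see why the false positive rate matters independently, a toy confusion-matrix calculation; the counts are invented for illustration, and only the 12% false negative figure comes from the source:

```python
def rates(tp, fn, fp, tn):
    """tp/fn count actual minors, fp/tn count actual adults."""
    fnr = fn / (tp + fn)  # minors classified as adults (the reported 12%)
    fpr = fp / (fp + tn)  # adults wrongly gated as minors (not reported)
    return fnr, fpr

# Hypothetically: 1,000 actual minors and 10,000 actual adults, with
# counts chosen to match the reported 12% false negative rate.
fnr, fpr = rates(tp=880, fn=120, fp=500, tn=9500)
```

The two rates come from entirely different populations, so one tells you nothing about the other, which is the point.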
RIP the Sora app. I set expectations at Google+ or Clubhouse. Calibration confirmed.
Greetings From The Torment Nexus
From the people who brought you Facebook and Instagram, it’s ChatGPT.
First we had Fidji Simo, architect of the news feed, as Head of Product.
Now we have Dave Dugan, former Meta Executive, leading their ad push.
A Young Lady’s Illustrated Primer
If you let LLMs edit your writing, either based on human feedback or otherwise, they will tend to transform it towards AI styles of writing. If you have the LLM do the writing, this is even more true. What else could it do? If anything, the evidence provided here by Natasha Jaques seems to show less distortion than one might expect.
Here, in one of her examples, the LLMs are mostly biased towards ‘neutrality’ or dodging a question that lacks a clear answer. That seems mostly fine, and was the general direction the paper mostly finds on various questions, a 70% increase in remaining neutral.
One could even explain this via selection effects. If you write an essay on any question, such as ‘does money buy happiness,’ then that makes it more likely you have a strong opinion. You’re less likely to talk about it if you have a neutral view. But the LLMs don’t get to choose their topics.
The real problem is that LLMs don’t know how to leave well enough alone:
You really do have to watch out for this. I don’t trust LLMs to edit my writing unless I am looking over every edit, exactly because they often change meanings, and my words very much have precise meanings.
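One low-tech guard is to surface every change mechanically rather than trusting the model’s account of what it edited. Python’s standard library is enough for a word-level diff; the example texts are invented:

```python
import difflib

def show_edits(original, edited):
    """Word-level diff of an LLM edit, so every change gets human eyes."""
    diff = difflib.ndiff(original.split(), edited.split())
    return [d for d in diff if d[:1] in "+-"]

# Invented example: the edit silently drops the hedging.
changes = show_edits(
    "the model may plausibly work",
    "the model will work",
)
```

This catches exactly the failure mode above: a meaning-changing word swap that an edit summary would describe as ‘tightened the prose.’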
Not weighing particular concerns enough is always a Skill Issue, fixable by prompting. The bigger issue would be if the concern could not be evaluated properly at all. But an LLM reviewer might indeed suffer in practice from such skill issues.
Canvas introduces an ‘AI teaching agent’ for ‘low-value tasks.’ This excludes AI grading, because that keeps ‘humans in the loop’ despite teachers I know thinking most grading (or at least grading of things that aren’t essays or other long writing) is a low-value task. Education folks are paranoid that AI will ruin education, because deep down they know that the whole enterprise makes no sense and if you remove various frictions it will collapse.
You Drive Me Crazy
You very much have to worry about framing with AIs. The example here is that Claude responds far differently to a supposed Senator Sanders than a supposed President Trump, and things like mode of interaction change answers as well. The problem exists across all LLMs.
They Took Our Jobs
If AI makes everyone twice as productive, one possibility is that everyone works half as hard. Another possibility is everyone works twice as hard, on top of being twice as productive, because half of people are about to get fired, or people see that you’ve completed your other work faster and suddenly give you more tasks.
AI has made me substantially more productive. I am not choosing to work less, even though there is no chance I would be fired if I worked less.
AI productivity gains are extremely high in ‘greenfield’ situations where you can start from scratch. When you are trying to update legacy systems and legacy code that lacks documentation and everything has to operate in real time and if any little thing changes someone throws a fit, it gets harder. Dreams of ‘oh we will build a new HR tool ourselves, from scratch’ do not work out so well for those without expertise.
For now. And yes, for now you can and often should outsource such work to people who can do better, and can get the proper benefits from the services. Tyler Cowen calls this a ‘slow takeoff’ of AI, much to my frustration, because such terms don’t keep their original meanings. But that’s okay.
The ‘slow’ here is that people got dreams they could suddenly do everything themselves, and tried to do that a bit early, when instead they can only do it massively cheaper and better than before by hiring those who know how. How disappointing. Or they could wait another 6-12 months.
In jobs we might not mind them taking given the alternative (see how that works?), Mark Zuckerberg is ‘building a CEO agent to help him do his job.’
Zuckerberg wants everyone at Meta to have their AI agent and has made use of AI a factor in their performance reviews.
When the next crazy thing happens at Meta, and you’re curious why, remember this and that they bought Manus and Moltbook:
Assuming we are not looking at full-on rapid capability advancement (aka recursive self-improvement), there are good reasons to think that, even if AI made it possible to profitably automate large portions of the economy, diffusion of that technology would take longer than you think.
There is also quite a lot of economics hopium running around. This is in response to Anthropic CEO Amodei doing four minutes on Fox News talking about what AI will be able to do.
What Dario actually says here is ‘I would not be surprised if within 1-5 years we start to see big effects here.’
Previously Dario has made predictions that seemed too aggressive at the time, and which indeed proved too aggressive although directionally correct. No, 90% of code wasn’t written by AI last year, but far more of it was than most predicted, and I’m guessing we get there in 2027. Here I think he’s flat out correct if you listen.
I’d go farther, in that if we don’t see ‘big effects’ on entry level jobs within five years, then I will be very surprised, and depending on how big is big it would surprise me within two or three. So 1-5 years seems like a good confidence interval for big effects on the entry-level job market.
This is very much on an exponential. Anthropic is growing roughly 10x every year, and if anything that has been accelerating, so ‘I can’t find it in the statistics yet’ makes you sound like people dismissing Covid-19 in February 2020, or at best in January.
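The Covid comparison is about exponentials being invisible right up until they are not. The arithmetic, with an invented starting share:

```python
def trajectory(start, growth, years):
    """Share of some aggregate over time under constant multiplicative growth."""
    return [start * growth ** t for t in range(years + 1)]

# Hypothetically: something at 0.1% of an aggregate, growing roughly 10x
# per year, is statistical noise for two years and then impossible to miss.
shares = trajectory(0.001, 10, 3)  # roughly [0.001, 0.01, 0.1, 1.0]
```

‘I can’t find it in the statistics yet’ is what the first two entries of that list look like from the inside.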
I think that the economic impacts of AI have been rather surprising, actually. If nothing else, the CapEx spending is dramatic, as has been the revenue growth and valuations of the AI companies. Total investment, and return on investment, seem like very important economic claims with which to measure an ongoing exponential. Indeed, when those numbers were small, they were used as a central economic counterargument by many econ types.
Cowen’s Second Law is that all propositions about real interest rates are wrong, but I do believe that you cannot explain the combination of RGDP resilience and productivity growth with the poor experienced state of employment, if you don’t factor in AI. The people on the street are feeling the impact, already, now.
Note that hiring, like the stock market, is forward looking. Most white collar hiring now, especially for entry level work, only pays off for both worker and employer years down the line, after a training and adoption period and paying off fixed costs. No one (with notably rare exceptions, famously including Donald Trump) likes firing people, and it gets expensive. Even if I could have a job for you now, if I expect AI to take your job a few years later, often that will mean I don’t want to hire you today.
I also don’t buy the ‘prices adjust’ argument. Prices largely don’t adjust, or they adjust slowly whereas AI will happen fast. Wages are sticky downwards and must maintain relative status relationships, and must be enough to make workers willing to work. There are not only legal minimum wages, there are other forms of de facto minimums. On top of that, wages and workers are competing across the economy and across industries. If, for example, half of all entry-level finance positions went away, but employment overall did not collapse, my expectation is that wages on those jobs would decline very little.
I especially reject the ‘everything is an O-ring’ argument that, so long as some portion of the loop requires a human, employment and wages cannot fall. That doesn’t mean every augmentation or partial automation is bad for employment; often they are good for it. But you do not need full automation for the employment effects to turn negative.
As a clean toy example, see the movie No Other Choice. The fact that there is still one job at the factory does not mean that there are not severe employment effects, and also note that (implicitly) wages for that job did not adjust so far downward, either.
That’s not to dismiss diffusion bottlenecks. Yes, for a given level of tech capability, things like employment effects will take longer than those at the labs would predict on their own, often a lot longer. But life is going to come at you fast, even if we do not face a singularity or total transformation, including that the AIs themselves will become quite good at assisting with diffusion and overcoming bottlenecks.
PoliMath gets paranoid and goes in a different direction, saying Dario’s rhetoric is irresponsible because it is designed to cause companies to destroy people’s software engineering jobs. He not only thinks this is their plot to drive business but puts the responsibility on Anthropic (et al, presumably) to figure out how their tools can create jobs instead. I can absolutely assure him and everyone else that this is not how any of this works, on any level. Anthropic hurts itself with these warnings, no CEOs don’t listen to Anthropic pitches and prospectively fire half their engineers before implementing the replacement, and Anthropic shares these warnings because Dario thinks it is the socially responsible thing to do.
Here’s a Senator who is very much buying the hype on AI unemployment.
I, too, would be happy to book that bet from Senator Warner, but alas I was not in the room. This is in part because it would be very easy to profitably hedge that bet.
They Are Hiring
OpenAI to double its workforce this year to compete against Anthropic and push into business, from 4,500 employees to roughly 8,000.
The data here is from Ramp, which OpenAI disputes is a reasonable measure.
There is also this graph of ‘popularity among businesses’ overall:
That is presumably ‘uses at all’ rather than share of revenue.
It will be interesting to extend these graphs into March, and see what impact the DoW vs. Anthropic situation had on such choices. I could see it moving either way.
Levels of Friction
This was said well enough it went viral, so good job.
As always, Eternalist is correct. If an action imposes a cost, it requires friction, and the best friction is to require payment, even if it is refundable or very small.
In Other AI News
Composer 2 appears to be Kimi 2.5 with RL, being offered under an open license without getting Moonshot’s permission, as per Xinya Zhou. Intellectual property, what’s that?
Santi Ruiz joins Anthropic’s editorial team.
David tops the HuggingFace ‘Open LLM Leaderboard’ by taking Qwen2-72B, duplicating a block of seven middle layers and stitching them back together, which he models as ‘give it more time to think.’ This sounds like Frankenstein levels of mad science nonsense at first glance, but I see no reason he would be lying.
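The stitch itself is simple: copy a contiguous run of middle layers and splice the copy back in, so the same weights run twice per forward pass. On a list of layers it is just this (a schematic of the idea, not his actual merge config):

```python
def duplicate_block(layers, start, end):
    """Passthrough stitch: splice a second copy of layers[start:end] into
    the stack, leaving everything before and after untouched."""
    return layers[:end] + layers[start:end] + layers[end:]

# Toy stand-in for a transformer's layer list: duplicating a middle
# block of 3 in a 10-layer stack yields a 13-layer stack.
stitched = duplicate_block(list(range(10)), start=4, end=7)
```

No new parameters are trained; the model just spends more depth, hence ‘give it more time to think.’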
New compression algorithm just dropped.
Show Me the Money
OpenAI made a $50 billion deal with Amazon, and Microsoft is contemplating legal action on grounds that this breaks its exclusive cloud partnership with OpenAI. OpenAI and Amazon claim they have found their way around the Microsoft contract.
I notice that I am confused by Microsoft’s position here, given they own on the order of 27% of OpenAI. It seems like an unwise business move to keep OpenAI shackled. It couldn’t be happening to a nicer pair of giant corporations.
OpenAI is offering private equity firms a guaranteed return of 17.5% along with early access to new models.
To state the obvious, if you are offering 17.5% returns then it’s a trap, and at minimum it is very much not guaranteed. Saying ‘I will pay you 17.5%’ is, as those who parked their money in various crypto platforms know well, a way of saying ‘there is a good chance I am going to default on this.’
What you want, if your conscience permits and you’re not too worried about ‘what money will be worth in a post-AGI world’, is equity. If OpenAI succeeds, it is probably going to give you a lot more than 17.5% returns, even from today’s valuations. If OpenAI fails, money gone. There presumably won’t be zero recovery, but it won’t be pretty. So capture the upside, or seek your returns elsewhere.
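As a toy expected-value comparison, where every probability and multiple is invented for illustration and is not a claim about OpenAI’s actual odds:

```python
def expected_multiple(p_success, upside, recovery):
    """Expected payoff multiple on equity: win with probability p_success
    and get `upside`x your money, otherwise recover `recovery` per dollar."""
    return p_success * upside + (1 - p_success) * recovery

# Hypothetical numbers: 40% chance of 5x, else 20 cents on the dollar.
equity_ev = expected_multiple(0.4, 5.0, 0.2)

# The 'guaranteed' 17.5% compounded over five years, *if actually paid*,
# which is exactly the part in doubt.
debt_ev = 1.175 ** 5  # about 2.24x
```

Even under these made-up numbers the debt only wins if it actually pays, and the scenario where it fails to pay is highly correlated with the scenario where the equity is worthless anyway.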
Elon Musk announces a $20 billion project called TERAFAB.
Here’s one sober and reserved evaluation of his prospects here:
Elon Musk: Given that several companies make advanced chips, but no companies have ever made fully reusable rockets or achieved SpaceX scale, I think Starship is harder, but we shall see.
Terafab will technically be two fabs, each making only one chip design. This greatly simplifies process flow and allows more linear, adjacent movement of the FOUP.
A super high production rate allows us to test very quickly what steps can be deleted, simplified or sped up, even after the design is fixed. Current fabs are extremely conservative, operating on rigid historical heuristics, which are mostly, but not all, correct.
Anything that is a rate limiter at the machine level means that machine will be redesigned, unless already at limit of physics.
Having new iterations of a chip design be produced every day in the research fab (with <7 day lag) means being able to try out many high risk, high return ideas.
Etc
In any event, there is no other way to reach extreme scale, so either we make Terafab or we will be stuck at the ~20% chip/memory output growth per year of the current industry.
A relevant thing about Elon Musk is that, while he has a lot of technical expertise and can accomplish a lot of seemingly impossible tasks, he also just says things.
For example, here’s another thing he just said this week, in a trick he’s pulled several times without delivering, where the prediction market is at 12% but that seems rather high to me:
NewsWire: Elon Musk offers to pay TSA workers’ salaries amid government shutdown.
Just saying things, and announcing with confidence he will do things he probably cannot do, is central to his strategy of then yelling at people to sleep on floors until they manage to do it, which occasionally works to at least some extent. Elon Musk may plausibly start such a project, but the chances he achieves the goals he is stating are very low.
Announce periodically you are going to the moon and stars, and if one time you end up with SpaceX, it’s still a win. It’s worked for him quite well, so far.
Mostly what we can say is that Musk intends to make a serious effort to do domestic chip manufacturing, which will rapidly converge towards something a lot more realistic than his absurd ‘oh I will simply do everything myself in Austin’ claims. It won’t be anything like the size he is announcing, but he doesn’t care. It’s not like he’s making material statements about large corporations, and he is immune to the SEC the way that certain others are immune to time-travel-enabled assassination attempts.
Thinking big, Elon Musk believes, is good, and realism is optional.
Jesse Peltan: A terawatt *per year* is HUGE on the scale of current civilization (2x the U.S. grid every year), and yet that’s still 100,000 years to reach Type I, and 10^14 years to reach Type II. This is still only the beginning.
Yeah, okay, maybe slow your roll on the Kardashev scale.
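For scale on the Kardashev point: adding one terawatt of capacity per year, the timelines work out as follows under the usual Sagan-style benchmarks (the exact threshold values vary by source, so treat them as assumptions):

```python
TYPE_I = 1e17          # watts: planetary-scale power (Sagan-style benchmark)
TYPE_II = 3.8e26       # watts: roughly the Sun's total luminosity
ADDED_PER_YEAR = 1e12  # one terawatt of new capacity per year, linear build-out

years_to_type_i = TYPE_I / ADDED_PER_YEAR    # ~100,000 years
years_to_type_ii = TYPE_II / ADDED_PER_YEAR  # ~4e14 years, i.e. ~10^14
```

Note this assumes linear addition rather than compounding growth; with compounding the timelines collapse dramatically, which is rather the point of the whole exercise.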
Jeff Bezos in talks to raise $100 billion for AI manufacturing fund, as in a private equity style play where you buy manufacturing companies and then apply AI.
Kimi raises $1 billion at $18 billion valuation, up 4x in three months. On the one hand that seems remarkably low given the quality of their models, on the other hand it is not clear how they monetize or that they can aspire to compete with the top tier. This is considered big, but remember that Anthropic’s last raise was $30 billion at $380 billion, OpenAI’s was $100 billion, and Google is Google.
Meanwhile, things that are not that much smaller than Kimi, in relative terms:
Kevin Roose: missed opportunity, they should have called it Waymoo
One underrated problem with open models is that the business model is terrible. You are spending a lot of fixed costs to create a product, and then giving that product away. How are you going to make money? It’s not a surprise that top open model people keep getting poached, here with Microsoft poaching the AI2 leadership team. Alexander Doria’s proposed solution here is to use regulation to actively give open models a structural advantage, which has been their longstanding policy goal.
Quickly, There’s No Time
The person who first used the term AGI, Mark Gubrud, declares we have AGI.
How smart are AIs right now? Ryan Greenblatt sees them as not that ‘smart’ yet, but compensating with vast knowledge and very strong mostly-narrow heuristics. That is a lot of how humans seem smart when they seem smart, but yes there is a G-component and they do seem to lag the smarter humans there for now. Ryan predicts, I think correctly, that if they can match our raw intelligence they will quickly be de facto superintelligent and off to the races, given their many other advantages.
Jeffrey Ladish points out that to write a program one must first understand the universe. That’s not how he puts it, but the point is that you need to understand what you are building and why you are building it, which requires strong general intelligence. A fully narrow AI coder would not get so far.
The Week in Audio
We have the audio of Neil deGrasse Tyson calling for an international treaty to stop superintelligence.
The full Isaac Asimov memorial debate (1 hour 40 min) is here.
Jeffrey Ladish and others speak about AI risks with ABC Nightline, including that AIs can disobey instructions.
David Shor and Byrne Hobart are perfect guests to go on Odd Lots and discuss the politics of AI. Did you know people are not going to like that?
One fun note from that episode is that people really hate data centers in their area, but they can be bribed pretty easily if it comes with benefits like lower tax bills. People don’t understand that things both cost money and create money, and it’s a problem.
Dean Ball talks to James Pethokoukis.
Jensen Huang went on the All-In Podcast. I am not paid enough to listen, but it is presumably relevant to some of your interests.
Huang is correct here, of course, and if anything that threshold seems low.
Dylan Patel on Dwarkesh Patel on bottlenecks to scaling AI compute. A plausible candidate for a full post treatment.
OpenAI podcast discusses the OpenAI Model Spec.
80,000 Interviews About AI
Anthropic had Claude offer to interview its users about what they want out of AI. They got over 80,000 people to take part.
The people have hope. The people are alarmed. Often they are the same people.
The anecdotes and pull quotes are something, but you have to worry about whether they are representative. It’s better to focus on the statistics and broader observations.
People want productivity from AI, but that is an abstraction. The point of productivity, like using AI to automate emails, was typically to free up time to spend on something else like family, rather than to be super great at answering those emails.
Demand for things like AI romantic connections is nonzero but low. It’s not what people set out to want. People are not so strategic. They want marginal gains. The people who wanted bigger things out of AI still wanted things like a cure for cancer or scaling personalized education, again marginal improvements to life.
So what did the people actually get? 81% of people said that AI had helped ‘take a step towards their stated vision.’
Remember, these are Claude users. They’ve got a highly above average share of the unequally distributed glorious AI future.
People remain concerned.
Mostly they are concerned about mundane harms, or at least proximate ways things go wrong with current AI, the same way they are seeking mundane utility. It’s hard to keep your eye on the future, and most people don’t actually believe in what is coming.
If you ask people about specific concerns, they will often say they are concerned. But when asked what concerns they have, this chart is what is top of mind.
‘No concern’ would have been about 12th on this list at 11%.
These are all valid concerns. Any of these could be a big problem.
The United States tends to have more AI negativity than most nations, but using Claude mostly screens this off, and we come out about average and similar to other Western nations, although developing countries (Latin America, India, Africa, Middle East, Southeast Asia) tend to be more positive. Claude isn’t available in China.
As Anthropic points out, the fears often line up with the hopes.
I often say:
Similarly, one could say:
We then get emotional support versus dependence, time-saving versus illusory productivity and economic empowerment versus displacement.
One can think of it this way:
Or:
If you choose door number two on these dilemmas, it generally won’t go well.
The long term problem is:
That’s going to be a big problem.
The Lighter Side
It’s a little late for this one…
In other ways, it’s still early.
But not that early.
One leads to the other.
As in, if AGI goes unchecked then soon you’ll be the late Scottie Pippen.
Or you might be the regular form of late. If you don’t know about the block button, at some point that’s on you.