Table of Contents
Also covered this week: OpenAI Claims Nonprofit Will Retain Nominal Control, Zuckerberg’s Dystopian AI Vision, GPT-4o Sycophancy Post Mortem, OpenAI Preparedness Framework 2.0. Not included: Gemini 2.5 Pro got an upgrade, recent discussion of students using AI to ‘cheat’ on assignments, full coverage of MIRI’s AI Governance to Avoid Extinction.
Language Models Offer Mundane Utility
Use a lightweight version of Grok as the Twitter recommendation algorithm? No way, you’re kidding, he didn’t just say what I think he did, did he? I mean, super cool if he figures out the right implementation, but I am highly skeptical that happens.
State Bar of California used AI to help draft its 2025 bar exam. Why not, indeed?
Make the right play, eventually. I’d only be tempted to allow this given that o3 isn’t going to be that good at it. I wouldn’t let someone use a real solver at the table, that would destroy the game. And if they did this all the time, the delays would be unacceptable. But if someone wants to do this every now and then, I am guessing allowing this adds to your alpha. Remember, it’s all about table selection.
Yeah, definitely ngmi, sorry.
Actually, in this context, I think the doctor is right, if you actually look at the screen. He’s not actually using GPT-4o to figure out what to do. That’s crazy talk, you use o3. What he’s doing is translating the actual situations into medical note speak. In that case, sure, 4o should be fine, and it’s faster.
AI is only up to ~25% of code written inside Microsoft, Zuckerberg reiterates his expectation of ~50% within a year and seems to have a weird fetish that only Llama should be used to write Llama. But okay, let’s not get carried away:
Language Models Don’t Offer Mundane Utility
Humans are still cheaper than AIs at any given task if you don’t have to pay them, and also can sort physical mail and put things into binders.
A common misconception, easy mistake to make… No. We are saying that in the future it is going to be a superpersuader shoggoth with alien values who might take over the world and kill everyone. But that’s a different AI, and that’s in the future. For now, it’s only a largely you-directed potentially-persuader shoggoth with subtly alien and distorted values that might be a lying liar or an absurd sycophant, but you’re keeping up with which ones are which, right?
As opposed to the human therapist, who is a less you-directed persuader semi-shoggoth with alien and distorted (e.g. professional psychiatric mixed with trying to make money off you) values, that might be a lying liar or an absurd sycophant and so on, but without any way to track which ones are which, and that is charging you a lot more per hour and has to be seen on a fixed schedule. The choice is not that clear.
To be fair, the human can also give you SSRIs and a benzo. Well, yes, fair, there is that. They’re not safe now exactly and might be a lot less safe than we know, and no I’m not using them for therapy either, thank you. But you make do with what you have, and balance risks and benefits in all things.
Patrick McKenzie is not one to be frustrated by interfaces and menu flows, and he is being quite grumpy about Amazon’s order lost in shipment AI-powered menus and how they tried to keep him away from talking to a human.
Why are all the major AI offerings so similar? Presumably because they are giving the people what they want, and once someone proves one of the innovations is good the others copy it, and also they’re not product companies so they’re letting others build on top of it? Deep Research, reasoning models and inference scaling are relatively new modes that then got copied. It’s not that no one tries anything new, it’s that the marginal cost of copying such modes is low.
They’re also building command line coding engines (see Claude Code, and OpenAI’s version), integrating into IDEs, building tool integrations and towards agents, and so on. The true objection from Janus as I understand it is not that they’re building the wrong products, but that they’re treating AIs as products in the first place. And yeah, they’re going to do that.
Parmy Olson asks, are you addicted to ChatGPT (or Gemini or Claude)? She warns people are becoming ‘overly reliant’ on it, citing this nature paper on AI addiction from September 2024. I do buy that this is a thing that happens to some users, that they outsource too much to the AI.
It all comes down to how you use it. If you use AI to help you think and work and understand better, that’s what will happen. If you use AI to avoid thinking and working and understanding what is going on, that won’t go well. If you conclude that the AI’s response is always better than yours, it’s very tempting to do the second one.
Notice that a few years from now, for most digital jobs the AI’s response really will always (in expectation) be better than yours. As in, at that point if the AI has the required context and you think the AI is wrong, it’s probably you that is wrong.
We could potentially see three distinct classes of worker emerge in the near future:
- Those who master AI and use AI to become stronger.
- Those who turn everything over to AI and become weaker.
- Those who try not to use AI and get crushed by the first two categories.
It’s not so obvious that any given person should go with option #1, or for how long.
Another failure mode of AI writing is when it screams ‘this is AI writing’ and the person thinks this is bad, actually. I see what you did there.
It’s not that hard to do or describe if you listen for the vibes. The way I’d describe it is it feels… off. Soulless. It doesn’t have to be that way. The Janus-style AI talk is in this context a secret third thing, very distinct from both alternatives. And for most purposes, AI leaving this signature is actively a good thing, so you can read and respond accordingly.
Claude (totally unprompted) explains its face blindness. We need to get over this refusal to admit that it knows who even very public figures are, it is dumb.
Take a Wild Geoguessr
Scott Alexander puts o3’s GeoGuessr skills to the test. We’re not quite at ‘any picture taken outside is giving away your exact location’ but we’re not all that far from it either. The important thing to realize is if AI can do this, it can do a lot of other things that would seem implausible until it does them, and also that a good prompt can give it a big boost.
There is then a ‘highlights from the comments’ post. One emphasized theme is that human GeoGuessr skills seem insane too, another testament to Teller’s observation that often magic is the result of putting way more effort into something than any sane person would. An insane amount of effort is indistinguishable from magic.
What can AI reliably do on any problem? Put in an insane amount of effort. Even if the best AI can do is (for a remarkably low price) imitate a human putting in insane amounts of effort into any given problem, that’s going to give you insane results that look to us like magic.
There are benchmarks, such as GeoBench and DeepGuessr. GeoBench thinks the top AI, Gemini 2.5 Pro, is very slightly behind human professional level.
Seb Krier reminds us that Geoguessr is a special case of AIs having truesight. It is almost impossible to hide from even ‘mundane’ truesight, from the ability to fully take into account all the little details. Imagine Sherlock Holmes, with limitless time on his hands and access to all the publicly available data, everywhere and for everything, and he’s better at his job than the original Sherlock by as much as the original Sherlock’s edge over you. If a detailed analysis could find it, even if we’re talking what would previously have been a PhD thesis? AI will be able to find it.
I am obviously not afraid of getting doxxed, but there are plenty of things I choose not to say. It’s not that hard to figure out what many of them are, if you care enough. There’s a hole in the document, as it were.
There are going to be adjustments. I wonder how people will react to various forms of ‘they never said it, and there’s nothing that would have held up in a 2024 court, but AI is confident this person clearly believes [X] or did [Y].’ The smart glasses of 2028 are perhaps going to tell you quite a lot more about what is happening around you than you might think, if only purely from things like tone of voice, eye movements and body language. It’s going to be wild.
Sam Altman calls the Geoguessr effectiveness one of his ‘helicopter moments.’ I’m confused why, this shouldn’t have been a surprising effect, and I’d urge him to update on the fully generalized conclusion, and on the fact that this took him by surprise. I realize this wasn’t the meaning he intended, but in Altman’s honor and since it is indeed a better meaning, from now on I will write the joke as God helpfully having sent us ‘[X] boats and two helicopters’ to try and rescue us.
Write On
David Duncan attempts to coin new terms for the various ways in which messages could be partially written by AIs. I definitely enjoyed the ride, so consider reading. His suggestions, all with a clear And That’s Terrible attached:
- Chatjacked: AI-enhanced formalism hijacking a human conversation.
- Praste: Copy-pasting AI output verbatim without editing, thinking or even reading.
- Prompt Pong: Having an AI write the response to their message.
- AI’m a Writer Now: Using AI to have a non-writer suddenly drop five-part essays.
- Promptosis: Offloading your thinking and idea generation onto the AI.
- Subpromptual Analysis: Trying to reverse engineer someone’s prompt.
- GPTMI: Use of too much information detail, raising suspicion.
- Chatcident: Whoops, you posted the prompt.
- GPTune: Using AI to smooth out your writing, taking all the life out.
- Syntherity: Using AI to simulate fake emotional language that falls flat.
I can see a few of these catching on. Certainly we will need new words. But, all the jokes aside, at core: Why so serious? AI only produces these failure modes when you do it wrong.
Get My Agent On The Line
Do you mainly have AI agents replace human tasks that would have happened anyway, or do you mainly do newly practical tasks on top of previous tasks?
If you want it done right, for now you have to do it yourself. For now. If it’s valuable enough you’d do it anyway, the AI can do some of those things, and especially can streamline various simple subcomponents. But for now the AI agents mostly aren’t reliable enough to trust with such actions outside of narrow domains like coding. You’d have to check it all and at that point you might as well do it yourself.
But, if you want it done at all and that’s way better than the nothing you would do instead? Let’s talk.
Then, with the experience gained from doing the extra tasks, you can learn over time how to sufficiently reliably do tasks you’d be doing anyway.
We’re In Deep Research
Anthropic joins the deep research club in earnest this week, and also adds more integrations. First off, Integrations:
I’m not sure what the right amount of nervousness should be around using Stripe or PayPal here, but it sure as hell is not zero or epsilon. Proceed with caution, across the board, start small and so on.
What Claude calls ‘advanced’ research lets it work to compile reports for up to 45 minutes. As of my writing this both features still require a Max subscription, which I don’t otherwise have need of at the moment, so for this and other reasons I’m going to let others try these features out first. But yes, I’m definitely excited by where it can go, especially once Claude 4.0 comes around.
Peter Wildeford says that OpenAI’s Deep Research is now only his third favorite Deep Research tool, and also o3 + search is better than OpenAI’s DR too. I agree that for almost all purposes you would use o3 over OAI DR.
Be The Best Like No One Ever Was
Gemini has defeated Pokemon Blue, an entirely expected event given previous progress. As I noted before, there were no major obstacles remaining. Gemini and Claude had different Pokemon-playing scaffolding. I have little doubt that with a similarly strong scaffold, Claude 3.7 Sonnet could also beat Pokemon Blue.
Huh, Upgrades
MidJourney gives us Omni Reference: Any character, any scene, very consistent. It’s such a flashback to see the MidJourney-style prompts discussed again. MidJourney gives you a lot more control, but at the cost of having to know what you are doing.
Gemini 2.0 Image Generation has been upgraded, higher quality, $0.039 per image. Most importantly, they claim significantly reduced filter block rates.
Web search now available in the Claude API. If you enable it, Claude makes its own decisions on how and when to search.
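For those wondering what enabling it looks like, here is a minimal sketch in Python. The tool type string, field names and model alias are my best recollection of Anthropic’s documentation at the time of writing, so treat them as assumptions to verify rather than gospel.

```python
# Minimal sketch: enabling Anthropic's server-side web search tool.
# Assumptions to verify against current docs: the tool type string
# ("web_search_20250305"), the "max_uses" field, and the model alias.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What changed in the Claude API this week?"}],
    tools=[{
        "type": "web_search_20250305",  # server-side tool; Claude decides when to call it
        "name": "web_search",
        "max_uses": 3,                  # optional cap on searches per request
    }],
)

# The response interleaves text blocks with search results and citations;
# here we just print the text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```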
On Your Marks
Toby Ord analyzes the METR results and notices that task completion seems to follow a simple half-life distribution, where an agent has a roughly fixed chance of failure at any given point in time. Essentially agents go through a sequence of steps until one fails in a way that prevents them from recovering. (A small sketch of this model is at the end of this section.)
Sara Hooker is taking some online heat for pointing out some of the fatal problems with LmSys Arena, which is the opposite of what should be happening. If you love something you want people pointing out its problems so it can be fixed. Also never ever shoot the messenger, whether or not you are also denying the obviously true message. It’s hard to find a worse look.
If LmSys Arena wants to remain relevant, at minimum they need to ensure that the playing field is level, and not give some companies special access. You’d still have a Goodhart’s Law problem and a slop problem, but it would help.
We now have Glicko-2, a compilation of various benchmarks. I can believe this, if we fully ignore costs. It passes quite a lot of smell tests. I’m surprised to see Gemini 2.5 Pro winning over o3, but that’s because o3’s strengths are in places not so well covered by benchmarks.
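As promised above, here is a small sketch of the half-life model. This is my own illustration with a made-up time horizon: a constant chance of unrecoverable failure per unit of work means success probability decays exponentially in task length, and a METR-style 50% time horizon is just the half-life.

```python
# Sketch of the half-life model for agent task completion (illustrative only).
# Assumption: a constant chance of unrecoverable failure per unit of work, so the
# probability of finishing a task of length t is 0.5 ** (t / t50), where t50 is
# the agent's 50% time horizon. The t50 value below is made up for illustration.

def p_success(task_minutes: float, t50_minutes: float) -> float:
    """Chance the agent completes a task of this length without a fatal failure."""
    return 0.5 ** (task_minutes / t50_minutes)

t50 = 60.0  # hypothetical agent with a one-hour 50% time horizon
for t in (15, 60, 240):
    print(f"{t:>3}-minute task: {p_success(t, t50):.0%} expected success rate")
# Prints roughly 84%, 50% and 6%: longer tasks fail not because each step is
# harder, but because there are more chances for an unrecoverable mistake.
```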
Choose Your Fighter
I’ve been underappreciating this: Yes, the need to verify outputs is super annoying, but o3 does not otherwise waste your time. That is such a relief.
Hasan Can falls back on Gemini 2.5 Pro over Sonnet 3.7 and GPT-4o, doesn’t consider o3 as his everyday driver. I continue to use o3 (while keeping a suspicious eye on it!) and fall back first to Sonnet before Gemini.
Sully proposes that Cursor has a moat over Copilot and it’s called Tab.
Peter Wildeford’s current guide to which model to use, if you have full access to all:
Upgrade Your Fighter
Diffusion can be slow. Under pressure, diffusion can be a lot faster.
We’re often talking these days about US military upgrades and new weapons on timescales of decades. This is what is known as having a very low Military Tradition setting, being on a ‘peacetime footing,’ and not being ready for the fact that even now, within a few years, everything changes, the same way it has in many previous major conflicts of the past.
Now imagine that, but for everything else, too.
Unprompted Suggestions
Better prompts work better, but not bothering works faster, which can be smarter. I have a sense for how to prompt well but mostly I set my custom instructions and then write what comes naturally. I certainly could use much better prompting if I had need of it; I almost never even bother with examples.
Mostly I find myself thinking some combination of ‘the custom instructions already do most of the work,’ ‘eh, good enough’ and ‘eh, I’m busy, if I need a great prompt I can just wait for the models to get smarter instead.’ Feelings here are reported rather than endorsed.
If you do want a better prompt, it doesn’t take a technical expert to make one. I have supreme confidence that I could improve my prompting if I wanted it enough to spend time on iteration.
Deepfaketown and Botpocalypse Soon
For now it’s psychosis, but that doesn’t mean in the future they won’t be out to get you.
How are things going on Reddit? This is in a particular subsection of Reddit, but doubtless it is everywhere. Some number of people might be abandoning the em dash in response as humans, but I am guessing not many, and many AI responses won’t include an em dash.
As a window to what level of awareness of AI ordinary people have and need: Oh no, did you know that profiles on dating sites are sometimes fake, but the AI tools for faking pictures, audio and even video are rapidly improving.
I think the warning here from Harper Carroll and Liv Boeree places too much emphasis on spotting AI images, audio and video; catfishing is ultimately not so new. What’s new is that the AI can do the messaging, and embody the personality that it senses you want. That’s the part that previously did not scale.
Ultimately, the solution is the same. Defense in depth. Keep an eye out for what is fishy, but the best defense is to simply not pay it off. At least until you meet up with someone in person or you have very clear proof that they are who they claim to be, do not send them money, spend money on them or otherwise do things that would make a scam profitable, unless they’ve already provided you with commensurate value such that you still come out ahead. Not only in dating, but in all things.
Russian bots publish massive amounts of false claims and propaganda to get it into the training data of new AI models, 3.6 million articles in 2024 alone, and the linked report claims this is effective at often getting the AIs to repeat those claims. This is yet another of the arms races we are going to see. Ultimately it is a skill issue, the same way that protecting Google search is a skill issue, except the AIs will hopefully be able to figure out for themselves what is happening.
Nate Lanxon and Omar El Chmouri at Bloomberg ask why are deepfakes ‘everywhere’ and ‘can they be stopped?’ I question the premise. Compared to expectations, there are very few deepfakes running around. As for the other half of the premise, no, they cannot be stopped, you can only adapt to them.
They Took Our Jobs
Fiverr CEO Micha Kaufman goes super hard on how fast AI is coming for your job. As in, he says if you’re not an exceptional talent and master at what you do (and, one assumes, what you do is sufficiently non-physical work), you will need a career change within a matter of months and you will be doomed he tells you, doooomed! As in:
It’s worth reading the email in full, so here you go:
So, first off, no. That’s not going to happen within ‘a matter of months.’ We are not going to suddenly have AI taking enough jobs to put all the non-exceptional white-collar workers out of a job during 2025, nor is it likely to happen in 2026 either. It’s coming, but yes these things for now take time. o3 gives only about a 5% chance that >30% of Fiverr headcount becomes technologically redundant within 12 months. That seems like a reasonable guess.
One might also ask, okay, suppose things do unfold as Micha describes, perhaps over a longer timeline. What happens then? As a society we are presumably much more productive and wealthier, but what happens to the workers here? In particular, what happens to that ‘non-exceptional’ person who needs to change careers?
Presumably their options will be limited. A huge percentage of workers are now unemployed. Across a lot of professions, they now have to be ‘elite’ to be worth hiring, and given they are new to the game, they’re not elite, and entry should be mostly closed off. Which means all these newly freed up (as in unemployed) workers are now competing for two kinds of jobs: Physical labor and other jobs requiring a human that weren’t much impacted, and new jobs that weren’t worth doing before but are now.
Wages for the new jobs reflect that those jobs weren’t previously in sufficient demand to hire people, and wages in the physical jobs reflect much more labor supply, and the AI will take a lot of the new jobs too at this stage. And a lot of others are trying to stay afloat and become ‘elite’ the same way you are, although some people will give up. So my expectation is that options for workers will start to look pretty grim at this point.
If the AI takes 10% of the jobs, I think everyone is basically fine because there are new jobs waiting in the wings that are worth doing, but if it’s 50%, let alone 90%, even if restricted to non-physical jobs? No. o3 estimates that 60% of American jobs are physical such that you would need robotics to automate them, so if half of the rest fell within a year, that’s quite a lot.
Then of course, if AIs were this good after a few months, a year after that they’re even better, and being an ‘elite’ or expert mostly stops saving you. Then the AI that’s smart enough to do all these jobs solves robotics. (I mean just kidding, actually there’s probably an intelligence explosion and the world gets transformed and probably we all die if it goes down this fast, but for this thought experiment we’re assuming that for some unknown reason that doesn’t happen.)
AI in the actual productivity statistics where we bother to have people use it? As in, if they gave you a Copilot license, that saved 1.35 hours per week of email work, for an overall productivity gain of 3%, and a 6% gain in high focus time. Not transformative, but not bad for what workers accomplished the first year, in isolation, without altering their behavior patterns.
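A quick back-of-the-envelope check on these figures, and on the 40-minutes-a-day point in the next paragraph. This is my arithmetic, not the study’s, and it assumes a 40-hour work week, an 8-hour workday and 16 waking hours.

```python
# Rough arithmetic behind the productivity figures quoted here (my own sanity
# check, assuming a 40-hour week, an 8-hour workday and 16 waking hours).
hours_saved_per_week = 1.35
print(f"{hours_saved_per_week / 40:.1%} of a 40-hour week")             # ~3.4%, near the ~3% overall gain
print(f"{2 * hours_saved_per_week / 40:.1%} for the half who used it")  # ~6.8%, near the ~7% per-user figure

minutes_saved_per_day = 40
print(f"{minutes_saved_per_day / (8 * 60):.1%} of an 8-hour workday")   # ~8.3% of working hours
print(f"{minutes_saved_per_day / (16 * 60):.1%} of 16 waking hours")    # ~4.2% of waking hours
```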
And that’s with only half of them using the tool, so 7% gains for those that used it. That’s not a random sample, but clearly there’s a ton of room left to capture gains, even without either improved technology or coordination or altering work patterns, such as everyone still attending all the meetings.
To answer Tyler Cowen’s question, saving 40 minutes a day is a freaking huge deal. That’s 8% of working hours, or 4% of waking hours, saved on the margin. If the time is spent on more work, I expect far more than an 8% productivity gain, because a lot of working time is spent or wasted on fixed costs like compliance and meetings and paperwork, and you could gain a lot more time for Deep Work. His question on whether the time would instead be wasted is valid, but that is a fully general objection to productivity gains, and over time those who waste it lose out. On wage gains, I’d expect it to take a while to diffuse in that fashion, and be largely offset by rising pressure on employment.
Whereas for now, a different paper Tyler Cowen points us to claims only 1%-5% of all work hours are currently assisted by generative AI, and that is enough to report time savings of 1.4% of total work hours. The framing of AI productivity as time saved shows how early days all this is, as do all of the numbers involved.
Private AI investment reached $33.9 billion last year (up only 18.7%!), and is rapidly diffusing across all companies. Part of the problem is that companies try to make AI solve their problems, rather than ask what AI can do, or they just push a button marked AI and hope for the best. Even if you ‘think like a corporate manager’ and use AI to target particular tasks that align with KPIs, there’s already a ton there.
It’s fair to say that generative AI isn’t having massive productivity impacts yet, because of diffusion issues on several levels. I don’t think this should be much of a blackpill in even the medium term. Imagine if it were otherwise already.
It is possible to get caught using AI to write your school papers for you. It seems like universities and schools have taken one of two paths. In some places, the professors feed all your work into ‘AI detectors’ that have huge false positive and negative rates, and a lot of students get hammered, many of whom didn’t do it. Or, in other places, they need to actually prove it, which means you have to richly deserve to be caught before they can do anything:
It’s so cute to look back to this March 2024 write-up of how California was starting to pay people to go to community college. It doesn’t even think about AI, or what will inevitably happen when you put a bounty on pretending to do homework and virtually attend classes. As opposed to the UAE, which is rolling AI out into K-12 classrooms next school year, with a course that includes ‘ethical awareness,’ ‘fundamental concepts’ and also real world applications.
The Art of the Jailbreak
For now ‘Sam Altman told me it was ok’ can still at least sometimes serve as an o3 jailbreak. Then again, a lot of other things would work fine some of the time too.
Someone at OpenAI didn’t clean the data set. There’s only one way I can think of for this to be happening. Objectively as a writer and observer it’s hilarious and I love it, but it also means no one is trying all that hard to clean the data sets to avoid contamination. This is a rather severe Logos Failure; if you let this sort of thing run around in the training data you deserve what you get.
Get Involved
You could also sell out, and get to work building one of YC’s requested AI agent companies. Send in the AI accountant and personal assistant and personal tutor and healthcare admin and residential security and robotics software tools and voice assistant for email (why do you want this, people, why?), internal agent builder, financial manager and advisor, and sure why not the future of education?
OpenAI Creates Distinct Evil Applications Division
OpenAI already created an Evil Lobbying Division devoted to a strategy centered on jingoism and vice signaling, headed by the most Obviously Evil person for the job. This pattern seems to be continuing, as they are announcing board member Fidji Simo as the new ‘CEO of Applications’ reporting to Sam Altman.
So what makes Fidji Simo so uniquely qualified to lead this group? Why am I rather skeptical of the ‘public good’ goal? Well, uh, you see… If you are telling me Fidji Simo is uniquely qualified to run your product division, you are telling me a lot about the intended form of your product division.
The best thing about most AI products so far, and especially about OpenAI until recently, is that they have firmly held the line against exactly the things we are talking about here. The big players have not gone in for engagement maximization, iterative A/B testing, Skinner boxing, advertising or even incidental affiliate revenue, ‘news feed’ or ‘for you’ algorithmic style products or other such predation strategies.
When you combine the appointment of Simo, her new title ‘CEO’ and her prior track record, the context of the announcement of enabling ‘traditional’ company growth functions, and the recent incidents involving both o3 the Lying Liar and especially GPT-4o the absurd sycophant (which is very much still an absurd sycophant, except it is modestly less absurd about it) which were in large part caused by directly using A/B customer feedback in the post-training loop and choosing to maximize customer feedback KPIs over the warnings of internal safety testers, you can see why this seems like another ‘oh no’ moment.
Simo also comes from a ‘shipping culture.’ There is certainly a lot of space within AI where shipping it is great, but recently OpenAI has already shown itself prone to shipping frontier-pushing models or model updates far too quickly, without appropriate testing, and they are going to be releasing open reasoning models as well where the cost of an error could be far higher than it was with GPT-4o as such a release cannot be taken back.
I’m also slightly worried that Fidji Simo has explicitly asked for glazing from ChatGPT and then said its response was ‘spot on.’ Uh oh.
A final worry is this could be a prelude to spinning off the products division in a way that attempts to free it from nonprofit control. Watch out for that.
Am I being unfair? I’m not sure. I don’t know her and I want to be wrong about this. I certainly stand ready to admit this impression was wrong and change my judgment when the evidence comes in. And I do think creating a distinct applications division makes sense. But I can’t help but notice the track record that makes her so perfect for the job centrally involves scaling Facebook’s ads and video products, while OpenAI looks at creating a new rival social product and is already doing aggressive A/B testing on ‘model personality’ that causes massive glazing? I mean, gulp?
I do find some positive signs in Altman’s own intended new focus, with the emphasis on safety including with respect to superintelligence, although one must beware cheap talk:
In Other AI News
Apple announces it is ‘exploring’ adding AI-powered search to its browser, and that web searches are down due to AI use. The result on the day, as of when I noticed this? AAPL -2.5%, GOOG -6.5%. Seriously? I knew the EMH was false but not that false, damn, ever price anything in? I treat this move as akin to ‘Chipotle shares rise on news people are exploring eating lunch.’ I really don’t know what you were expecting? For Apple not to ‘explore’ adding AI search as an option on Safari, or customers not to do the same, would be complete lunacy.
Apple and Anthropic are teaming up to build an AI-powered ‘vibe-coding’ platform, as a new version of Xcode. Apple is wisely giving up on doing the AI part of this itself, at least for the time being.
From Mark Bergen and Omar El Chmouri at Bloomberg: ‘Mideast titans’ especially the UAE step back from building homegrown AI models, as has almost everyone other than the USA and China. Remember UAE’s Falcon? Remember when Aleph Alpha was used as a reason for Germany to oppose regulating frontier AI models? They’re no longer trying to make one. What about Mistral in France? Little technical success, traction or developer interest.
The pullbacks seem wise given the track record. You either need to go all out and try to be actually competitive with the big boys, or you want to fold on frontier models, and at most do distillations for customized smaller models that reflect your particular needs and values. Of course, if VC wants to fund Mistral or whomever to keep trying, I wouldn’t turn them down.
Show Me the Money
OpenAI buys Windsurf (a competitor to Cursor) for $3 billion.
Parloa, who are attempting to build AI agents for customer service functions, raises $120 million at a $1 billion valuation.
American VCs line up to fund Manus at a $500 million valuation. So Manus is technically Chinese but it’s not marketed in China, it uses an American AI at its core (Claude) and it’s funded by American VC. Note that new AI companies without products can often get funded at higher valuations than this, so it doesn’t reflect that much investor excitement given how much we’ve had to talk about it. As an example, the previous paragraph was the first time I’d seen or typed ‘Parloa,’ and they’re a competitor to Manus with double the valuation.
That’s saying that Microsoft is at capacity. That’s why they can beat earnings in AI by expanding capacity, as confirmed repeatedly by Bloomberg.
Quiet Speculations
Metaculus estimate for date of first ‘general AI system to be devised, tested and publicly announced’ has recently moved back to July 2034 from 2030. The speculation is this is largely due to o3 being disappointing. I don’t think 2034 is a crazy estimate but this move seems like a clear overreaction if that’s what this is about. I suspect it is related to the tariffs as economic sabotage?
Paul Graham speculates (it feels like not for the first time, although he says that it is) that AI will cause people to lose the ability to write, causing people to then lose everything that comes with writing. You think there are going to be schools?
Overcoming Diffusion Arguments Is a Slow Process Without a Clear Threshold Effect
Are we answering the whole ‘AGI won’t much matter because diffusion’ attack again? Sigh, yes, I got tricked into going over this again. My apologies. Seriously, most of you can skip this section.
Someone needs to play Hearts of Iron, and that someone works at the DoD. If AGI was made tomorrow at a non-insane price and our military platforms didn’t incorporate it for 25 years, or hell even if current AI doesn’t get incorporated for 25 years, I wouldn’t expect to have a country or a military left by the time that happens, and I don’t even mean because of existential risk.
The paper itself is centrally a commentary on what the term ‘AGI’ means and their expectation that you can make smarter than human things capable of all digital tasks and that will only ‘diffuse’ over the course of decades similarly to other techs.
I find it hard to take seriously people saying ‘because diffusion takes decades’ as if it is a law of nature, rather than a property of the particular circumstances. Diffusion sometimes happens very quickly, as it does in AI and much of tech, and it will happen a lot faster with AI being used to do it. Other times it takes decades, centuries or millennia. Think about the physical things involved – which is exactly the rallying cry of those citing diffusion and bottlenecks – but also think about the minds and capabilities involved, take the whole thing seriously, and actually consider what happens.
The essay is also about the question of whether ‘o3 is AGI,’ which it isn’t but which they take seriously as part of the ‘AGI won’t be all that’ attack.
Their central argument relies on AGI not having a strong threshold effect. There isn’t a bright line where something is suddenly AGI the way something is suddenly a nuclear bomb. It’s not that obvious, but the threshold effects are still there and very strong, as it becomes sufficiently capable at various tasks and purposes.
The reason we define AGI as roughly ‘can do all the digital and cognitive things humans can do’ is because that is obviously over the threshold where everything changes, because the AGIs can then be assigned to, and hypercharge, the digital and cognitive tasks, which then rapidly includes things like AI R&D and also enabling physical tasks via robotics.
The argument here also relies upon the idea that this AGI would still ‘fail badly at many real-world tasks.’ Why? Because they don’t actually feel the AGI in this, I think? That not being how any of this works with AGI is the whole point of AGI!
If you have an ‘ordinary’ AI, or any other ‘mere tool,’ and you use it to automate my job, I can move on to a different job. If you have a mind (digital or human) that can adjust the same way I can, only superior in every way, then the moment I find a new job, you go ahead and take that too.
Music break, anyone?
Chipping Away
Trump administration reiterates that it plans to change and simplify the export control rules on chips, and in particular to ease restrictions on the UAE, potentially during his visit next week. This is also mentioned:
If I found out the Malaysian data centers are not largely de facto Chinese data centers, I would be rather surprised. This is exactly the central case of why we need the new diffusion rules, or something with similar effects.
This is certainly one story you can tell about what is happening:
Tao Burga of IFP has a thread reiterating that we need to preserve the point of the rules, and ways we might go about doing that. We can absolutely improve on the Biden rules. What we cannot afford to do is to replace them with rules that are simplified or designed to be used for leverage elsewhere, in ways that make the rules ineffective at their central purpose of keeping AI compute out of Chinese hands.
The Quest for Sane Regulations
Nvidia is going all-in on ‘if you don’t sell other countries equal use of your key technological advantage then you will lose your key technological advantage.’ Nvidia even goes so far as to say Anthropic is telling ‘tall tales’ (without, of course, saying specific claims they believe are false, only asserting without evidence the height of those claims) which is rich coming from someone saying China is ‘not behind on AI’ and also that if you don’t let me sell our advanced chips to them America will lose its lead.
Want sane regulations for the Department of Housing and Urban Development and across the government? So do I. Could AI help rewrite the regulations? Absolutely. Would I entrust this job to an undergraduate at DOGE with zero government experience? Um, no, thanks. The AI is a complement to actual expertise, not something to trust blindly, surely we are not this foolish.
I mean, I’m not that worried the changes will actually stick here, but good wowie moment of the week candidate. Indeed, I am far more worried this will give ‘AI helps rewrite regulations’ an even worse name than it already has.
Our immigration policies are now sufficiently hostile that we have gone from the AI talent magnet of the world to no longer being a net attractor of talent:
Line in the Thinking Sand
I often analyze various safety and security (aka preparedness) frameworks and related plans. One problem is that the red lines they set don’t stay red and aren’t well defined. I don’t sense that OpenAI, Google or Anthropic has confidence in what does or doesn’t, or should or shouldn’t, count as a dangerous capability, especially in the realm of automating AI R&D. We use vague terms like ‘substantial uplift’ and provide potential benchmarks, but it’s all very dependent on the spirit of the rules at best. That won’t fly in crunch time.
Like Jeffrey, I don’t have a great set of answers to offer on the object level. What I do know is that I don’t trust any lab not to move the goalposts around to find a way to release, if the question is at all fudgeable in this fashion and the commercial need looks strong. I do think that if something is very clearly over the line, there are labs that won’t pretend otherwise.
But I also know that all the labs intend to respond to crossing the red lines with (as far as we can see, relatively mundane and probably not so effective) mitigations or safeguards, rather than a ‘no just no until we figure out something a lot better.’ That won’t work.
The Week in Audio
Want to listen to my posts instead of read them? Thomas Askew offers you a podcast feed for that with richly voiced AI narrations. You can donate to help out that effort here, the AI costs and time commitment do add up.
Jack Clark goes on Conversations With Tyler, self-recommending.
Tristan Harris TED talks the need for a ‘narrow path’ between diffusion of advanced AI versus concentrated power of advanced AI. Humanity needs to have enough power to steer, without that power being concentrated ‘in the wrong hands.’ The default path is insane, and coordination away from it is hard, but possible, and yes there are past examples. The step where we push back against fatalism and ‘inevitability’ remains the only first step. Alas, like most others he doesn’t have much to suggest for steps beyond that.
The SB 1047 mini-movie is finally out. I am in it. Feels so long ago, now. I certainly think events have backed up the theory that if this opportunity failed, we were unlikely to get a better one, and the void would be filled by poor or inadequate proposals. SB 813 might be net positive but ultimately it’s probably toothless.
The movies got into the act with Thunderbolts*. Given that their track record the last few years has been so bad I stopped watching most Marvel movies, I did not expect this to be anything like as good as it was, or that it would (I assume fully unintentionally) be a very good and remarkably accurate movie about AI and the associated dynamics, in addition to the themes like depression, friendship and finding meaning that are its text. Great joy, 4.5/5 stars if you’ve done your old school MCU homework on the characters (probably 3.5 if you’d be coming in completely blind, including the comics?).
Rhetorical Innovation
Jesse Hoogland coins ‘the sweet lesson’ that AI safety strategies only count if they scale with compute. As in, as we scale up all the AIs involved, the strategy at least keeps pace, and ideally grows stronger. If that’s not true, then your strategy is only a short term mundane utility strategy, full stop.
Ah, the New Yorker essay by someone bragging about how they have never used ChatGPT, bringing very strong opinions about generative AI and how awful it is. Okay, this is actually a great point:
Except, hang on… Those two have very little to do with each other.
I think it’s a great point that looking for a public perception discontinuity, where everyone points and suddenly says ‘AGI!’, runs hard into this critique, with caveats.
The first thing is, reality does not have to care what you think of it. If AGI would indeed blow the world up, then we have ‘this seems like continuous progress, I said, as my current arrangement of atoms was transformed into something else that did not include me,’ with or without involving drones or nanobots. Even if we are talking about a ‘normal’ exponential, remember that week in 2020?
Which leads into the second thing: public perception of many things is often continuous and mostly oblivious until suddenly it isn’t. As in, there was a lot of AI progress before ChatGPT, then that came out and then wham. There’s likely going to be another ‘ChatGPT’ moment for agents, and one for the first Siri-Alexa-style thing that actually works. Apple Intelligence was a miss but that’s because it didn’t deliver. Issues simmer until they boil over. Wars get declared overnight.
And what is experienced as a discontinuity, of perception or of reality, doesn’t have to mostly be overnight, it can largely be over a period of months or more, and doesn’t even have to technically be discontinuous. Exponentials are continuous but often don’t feel that way. We are already seeing wildly rapid diffusion and accelerating progress even if it is technically ‘continuous’ and that’s going to be more so once the AIs count as meaningful optimization engines.
A Good Conversation
Arvind Narayanan and Ajeya Cotra have a conversation in Asterisk magazine. As I expected, while this is a much better discussion than your usual, especially given Arvind’s willingness to state what evidence would change his mind on expected diffusion rates, I found much of it extremely frustrating. Such as this, offered as illustrative:
This seems like Arvind is saying that AI in general can’t ever systematically run companies successfully because it would be up against other companies that are also run by similar AIs, so its success rate can’t be that high? And well, okay, sure I guess? But what does that have to do with anything? That’s exactly the world being envisioned – that everyone has to turn their company over to AI, or they lose. It isn’t a meaningful claim about what AI ‘can’t do,’ what it can’t do in this claim is be superior to other copies of itself.
Arvind then agrees, yes, we are headed for a world of universal deference to AI models, but he’s not sure it’s a ‘safety risk.’ As in, we will turn over all our decision making to AIs, and what, you worried bro? I mean, yes, I’m very worried about that, among other things. As another example:
The implication is then, since we can’t imagine it, we shouldn’t worry about it yet. Except we are headed straight towards it, in a way that may soon make it impossible to change course, so yes we need to think about it now. It’s rather necessary. If we can’t even imagine it, then that means it will be something we can’t imagine, and no I don’t think that means it will probably be fine. Besides, we can know important things about it without being able to imagine it, such as the above agreement that AI will by default end up making all the decisions and having control over this future.
The difference with the Industrial Revolution is that there we could steer events later, after seeing the results. Here, by default, we likely can’t. And also, it’s crazy to say that if you lived before the Industrial Revolution you couldn’t say many key things about that future world, and plan for it and anticipate it. As an obvious example, consider the US Constitution and system of government, which very much had to be designed to adapt to things like the Industrial Revolution without knowing its details.
Then there’s a discussion of whether it makes sense to have the ability to pause or restrict AI development, which we need to do in advance of there being a definitive problem because otherwise it is too late, and Arvind says we can’t do it until after we have definitive evidence of specific problems already. Which means it will 100% be too late – the proof that satisfies his ask is a proof that you needed to do something at least a year or two ago, so I guess we finished putting on all the clown makeup, any attempt to give us such abilities only backfires, and so on. So, no ability to steer the future until it is too late to do so, then.
Arvind is assuming progression will be continuous, but even if this is true, that doesn’t mean utilization and realization won’t involve step jumps, or that scaffolding won’t enable a bunch of progression off of existing available models. So again, essentially zero chance we will be able to steer until we notice it is too late.
This was perhaps the best exchange:
It is a highly dangerous position we are in, likely to result in highly discontinuous felt changes, to have model capabilities well ahead of product development, especially with open models not that far behind in model capabilities.
If OpenAI, Anthropic or Google wanted to make their AI a better or more useful consumer product, to have it provide better mundane utility, they would do a lot more of the things a product company would do. They don’t do that much of it. OpenAI is trying to also become a product company, but that’s going slowly, and this is why for example they just bought Windsurf. Anthropic is fighting it every step of the way. Google of course does create products, but DeepMind hates the very concept of products, and Google is a fundamentally broken company, so the going is tough.
I actually wish they’d work a lot harder on their product offerings. A lot of why it’s so easy for many to dismiss AI, and to expect such slow diffusion, is that the AI companies are not trying to enable that diffusion all that hard.
The Urgency of Interpretability
From last week, Anthropic CEO Dario Amodei wrote The Urgency of Interpretability. I certainly agree with the central claim that we are underinvesting in mechanistic interpretability (MI) in absolute terms. It would be both good for everyone and good for the companies and governments involved if they invested far more. I do not however think we are underinvesting in MI relative to other potential alignment-related investments.
He says that the development of AI is inevitable (well, sure, with that attitude!). Dario does say that he thinks AI can be steered before models reach an overwhelming level of power, which implies where he thinks this inevitably goes. And Dario says he has increasingly focused on interpretability as a way of steering. Whereas by default, we have very little idea what AIs are going to do or how they work or how to steer.
Dario buys into what I think is a terrible and wrong frame here:
I am sorry, but no. I do not sympathize, and neither should he. These are not ‘vague theoretical arguments’ that these things ‘might’ have the incentive to emerge, not at this point. Sure, if your livelihood depends on seeing them that way, you can squint. But by now that has to be rather intentional on your part, if you wish to not see it. Dario is treating such objections as having a presumption of seriousness and good faith that they, frankly, do not deserve at this point, and Anthropic’s policy team is doing similarly only more so, in ways that have real consequences. Do we need interpretability to be able to prove this in a way that a lot more people will be unable to ignore? Yeah, that would be very helpful, but let’s not play pretend.
The second section, a brief history of mechanistic interpretability, seems solid. The third section, on how to use interpretability, is a good starter explanation, although I notice it is insufficiently paranoid about accidentally using The Most Forbidden Technique. Also, frankly, I think David is right here:
That doesn’t mean interpretability can’t help you do things safely. It absolutely can. Building intermediate safe systems you can count on is extremely helpful in this regard, and you’ll learn a lot both figuring out how to do interpretability and from the results that you find. It’s just not the solution you think it is.
Then we get to the question of What We Can Do. Dario expects an ‘MRI for AI’ to be available within 5-10 years, but expects his ‘country of geniuses in a datacenter’ within 1-2 years, so of course you can get pretty much anything in 3-8 more years after that, and it will be 3-8 years too late. We’re going to have to pick up the pace.
The essay doesn’t say how these two timelines interact in Dario’s model. If we don’t get the geniuses in the datacenter for a while, do we still get interpretability in 5-10 years? Is that the timeline without the Infinite Genius Bar, or with it? They imply very different strategies.
- His first suggestion is the obvious one, which is to work harder and spend more resources directly on the problem. He tries to help by pointing out that being able to explain what your model does and why is a highly profitable ability, even if it is only used to explain things to customers and put them at ease.
- Governments can ‘use light-touch rules’ to encourage the development of interpretability research. Of course they could also use heavy-touch rules, but Anthropic is determined to act as if those are off the table across the board.
- Export controls can ‘create a ‘security buffer’ that might give interpretability more time.’ This implies, as he notes, the ability to then ‘spend some of our lead’ on interpretability work or otherwise stall at a later date. This feels a bit shoehorned given the insistence on only ‘light-touch’ rules, but okay, sure.
I think that would be a reasonable rebrand if it was bought into properly. Mostly the message is simple and clear: Get to work.
I agree with Neel Nanda that the essay is implicitly presenting the situation as if interpretability would be the only reliable path forward for detecting deception in advanced AI. He’s saying it is both necessary and sufficient, whereas I would say it is neither obviously necessary nor is it sufficient. As Neel says, ‘high reliability seems unattainable’ using anything like current methods.
Neel suggests a portfolio approach. I agree we should be investing in a diverse portfolio of potential approaches, but I am skeptical that we can solve this via a kind of ‘defense in depth’ when up against highly intelligent models. That can buy you some time on the margin, which might be super valuable. But ultimately, I think you will need something we haven’t figured out yet and am hoping such a thing exists in effectively searchable space. (And I think relying heavily on defense-in-depth with insufficiently robust individual layers is a good way to suddenly lose out of nowhere when threshold effects kick in.)
Neel lists reasons why he expects interpretability not to be reliable. I agree, and would emphasize the last one, that if we rely on interpretability we should expect sufficiently smart AI to obfuscate around our techniques, the same way humans have been growing steadily bigger brains and developing various cultural and physical technologies in large part so we can do this to each other and defend against others trying to do it to us.
The Way
As Miles says, so very far to go, but every little bit helps (also I am very confident the finding here is correct, but it’s establishing the right process that matters right now):
Also, yes, it seems there is now an Amazon Nova Premier, but I don’t see any reason one would want to use it?
Aligning a Smarter Than Human Intelligence is Difficult
Some additional refinements to the emergent misalignment results. The result is gradual, and you can get it directly from base models, and also can get it in reasoning models. Nothing I found surprising, but good to rule out alternatives.
Janus finds GPT-4-base does quite a lot of alignment faking.
People Are Worried About AI Killing Everyone
MIRI is the original group worried about AI killing everyone. They correctly see this as a situation where by default AI kills everyone, and we need to take action so it doesn’t. Here they provide a handy chart of the ways they think AI might not kill everyone, as a way of explaining their new agenda.
If anything this chart downplays how hard MIRI thinks this is going to be. It does however exclude an obvious path to victory, which is that an individual lab (rather than a national project) gets the decisive strategic advantage, either sharing it with the government or using it themselves.
An off switch, let alone a halt, is going to be very difficult to achieve. It’s going to be even harder the longer one waits to build towards it. It makes sense to, while also pursuing other avenues, build towards having that option. I support putting a lot of effort into creating the ability to pause. This is very different from advocating for actually halting (also called ‘pausing’) now.
Other People Are Not As Worried About AI Killing Everyone
Paul Tudor Jones, who said there’s a 90% chance AI doesn’t even wipe out half of humanity, let alone all of it. What a relief. Hedge fund guys sometimes understand risk, including tail risk, and can have great practical ways of handling it. This kind of statement from Paul Tudor Jones is very much the basic normie argument that should be sufficient to carry the day. Alas.
The Lighter Side
On the contrary, it’s lack-of-empathy-as-a-service, and there’s a free version! Dear [blue], I would like a more formal version, please. Best, [red].