I will (once again!) be raising the bar for what gets included going forward to prevent that.
I'm confused by this because the bar for what gets included seems very low. I mostly don't read these posts because a large fraction of the "news" reported is just random tweets by people in the rationalist / adjacent sphere.
It's like reading papers: skimming a lot of them on some topic is quite valuable in the long run, even the superficially uninteresting ones, and even if you only end up paying attention to 1% of their combined text. In this case, the advantage is over having to find the relevant random tweets yourself.
Agree. I would prefer less "this guy said a thing on x.com" and more news, statistics and technical reports.
human marginal productivity increases, and we get wealthier, so wages might plausibly go up for a while,
Why would wages go up? Employers have zero reason to pass productivity gains on to employees, especially in a situation where mass layoffs create lots of free labor to replace any employees upset about this. Previous gains in productivity have not increased wages, especially in modern (post-2000) times. If anything, increased productivity allows companies to lay off employees, reducing overall wages.
Overall, the obsession with things like 'wages' around high automation is incredibly strange to me and assigns a huge amount of benevolence to the companies and people running them. I don't think that capitalism, automation, and human flourishing for anyone who doesn't own one of these companies are compatible, and I think we're likely to see huge loss of life or upheaval closer to 20% automation, or even less.
OpenAI is going to remove GPT-4.5 from the API on July 14, 2025.
At the time I read that as an early announcement of when they are releasing GPT-4.5-thinking (with a controllable thinking budget), possibly to be called "GPT-5", so that the non-thinking GPT-4.5 becomes obsolete. The first GB200 NVL72s might be coming online about now, which should allow both fast serving and more reasonable pricing for very large models, even with reasoning.
I don’t get it. Have these people not heard of prices?
The issue with very large models is that you need some minimal number of GPUs to keep them in memory, and you can't serve them at all if you use fewer GPUs than that. If almost nobody uses the model, you are still paying for all the time of those GPUs. If GPT-4.5 is a 1:8 sparse MoE model pretrained in FP8 (announcement video mentioned training in low precision) on 100K H100s (Azure Goodyear campus), it could be about 5e26 FLOPs. At 1:8 sparsity, compute optimal tokens per active param ratio is 3x the dense ratio, and the dense ratio is about 40 tokens per param. So that gives 830B active params, 6.7T total params, and 100T training tokens.
The reason I chose 1:8 sparsity in the estimate is that a GB200 NVL72 rack has about 13 TB of HBM, so 6.7T total params in FP8 comfortably fit, leaving space for KV-cache. A GB200 NVL72 rack costs about $3M, as Huang recently announced. Alternatively you might need 12 nodes of H200 (140 GB of HBM per chip, 96 chips), which is 3 racks (this will work more slowly and so will serve fewer requests for the same GPU-time); those racks might cost about $6M.
So that's an anchor for the fixed costs. To reach marginal cost pricing, there need to be enough active users to keep several times that amount of hardware occupied most of the time. If there aren't enough users, you still need to pay for the dedicated time of at least those 3 racks of H200 or 1 rack of GB200 NVL72, and that's not something you can reasonably price.
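As a minimal sketch of that sizing arithmetic, using the assumptions above (5e26 FLOPs, 1:8 sparsity, a C ~ 6*N*D approximation); none of these are confirmed figures for GPT-4.5:

```python
import math

# All inputs are the assumptions above, not confirmed figures for GPT-4.5.
C = 5e26                    # assumed pretraining compute, FLOPs
sparsity = 8                # assumed 1:8 MoE sparsity (total params / active params)
tokens_per_param = 3 * 40   # assumed compute-optimal ratio: 3x the ~40:1 dense ratio

# Chinchilla-style approximation: C ~ 6 * N_active * D, with D = tokens_per_param * N_active.
n_active = math.sqrt(C / (6 * tokens_per_param))  # ~8.3e11, i.e. ~830B active params
n_total = sparsity * n_active                     # ~6.7e12, i.e. ~6.7T total params
tokens = tokens_per_param * n_active              # ~1.0e14, i.e. ~100T training tokens

# Memory check: FP8 weights take 1 byte per parameter.
weights_tb = n_total / 1e12                       # ~6.7 TB of weights
print(f"active ~{n_active:.2e}, total ~{n_total:.2e}, tokens ~{tokens:.2e}")
print(f"FP8 weights ~{weights_tb:.1f} TB vs ~13 TB HBM in a GB200 NVL72 rack")
```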
Well, yeah, if your partner is proposing to anything that is not you, that’s a problem.
Not if you are polyamorous. Maybe the future is poly... with a one-human-only rule for most.
Benjamin Todd: Dropping the error rate from 10% to 1% (per 10min) makes 10h tasks possible.
In practice, the error rate has been halving every 4 months(!).
In fact we can’t rule out that individual humans have a fixed error rate – just one that’s lower than current AIs.
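As a minimal sketch of the arithmetic implicit in that claim, assuming errors are independent across ten-minute steps (exactly the assumption the next comment questions):

```python
# Assumes errors are independent per ten-minute step; a 10-hour task is 60 steps.
steps = 60

p_at_10_pct = 0.90 ** steps   # ~0.002: a clean 10h run is essentially hopeless
p_at_1_pct = 0.99 ** steps    # ~0.55: a clean 10h run becomes plausible

print(f"10% error rate: {p_at_10_pct:.1%} chance of completing 10h without an error")
print(f" 1% error rate: {p_at_1_pct:.1%} chance of completing 10h without an error")
```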
Ever since I read Sarah Constantin's Errors vs. Bugs and the End of Stupidity I find myself immediately skeptical of claims like "humans have a fixed error rate".
A common mental model for performance is what I'll call the "error model." In the error model, a person's performance of a musical piece (or performance on a test) is a perfect performance plus some random error. You can literally think of each note, or each answer, as x + c*epsilon_i, where x is the correct note/answer, and epsilon_i is a random variable, iid Gaussian or something. Better performers have a lower error rate c. Improvement is a matter of lowering your error rate. This, or something like it, is the model that underlies school grades and test scores. Your grade is based on the percent you get correct. Your performance is defined by a single continuous parameter, your accuracy.
But we could also consider the "bug model" of errors. A person taking a test or playing a piece of music is executing a program, a deterministic procedure. If your program has a bug, then you'll get a whole class of problems wrong, consistently. Bugs, unlike error rates, can't be quantified along a single axis as less or more severe. A bug gets everything that it affects wrong. And fixing bugs doesn't improve your performance in a continuous fashion; you can fix a "little" bug and immediately go from getting everything wrong to everything right. You can't really describe the accuracy of a buggy program by the percent of questions it gets right; if you ask it to do something different, it could suddenly go from 99% right to 0% right. You can only define its behavior by isolating what the bug does.
Often, I think mistakes are more like bugs than errors. My clinkers weren't random; they were in specific places, because I had sub-optimal fingerings in those places. A kid who gets arithmetic questions wrong usually isn't getting them wrong at random; there's something missing in their understanding, like not getting the difference between multiplication and addition. Working generically "harder" doesn't fix bugs (though fixing bugs does require work).
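A toy contrast between the two models, with a hypothetical arithmetic "bug" (confusing multiplication with addition) standing in for the kind of systematic failure described:

```python
import random

random.seed(0)
questions = [(a, b) for a in range(2, 10) for b in range(2, 10)]

def error_model(a, b, error_rate=0.1):
    # Correct procedure, but each answer is independently corrupted at some rate.
    return a * b + (1 if random.random() < error_rate else 0)

def bug_model(a, b):
    # Deterministic procedure with one bug: multiplication confused with addition.
    return a + b

err_acc = sum(error_model(a, b) == a * b for a, b in questions) / len(questions)
bug_acc = sum(bug_model(a, b) == a * b for a, b in questions) / len(questions)
print(f"error model: ~{err_acc:.0%} correct, misses scattered at random")
print(f"bug model:   ~{bug_acc:.0%} correct, and fixing the one bug jumps it to 100%")
```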
The good news is that the societies described here are vastly wealthier. So if humans are still able to coordinate to distribute the surplus, it should be fine to not be productively employed, even if to justify redistribution we implement something dumb...
I'm increasingly skeptical that there will be much redistribution to speak of in such a scenario. The vast numbers of people living on $2 a day currently might have something to say about that. What is the historical precedent for a group of humans having as little leverage as even U.S. ex-workers will have in this 99% automation scenario and yet being gifted a UBI, much less a UHI?
if you think such a model is sufficiently behind the capabilities and efficiency frontiers as to be useless, one can also release the weights
The weights also reveal the architecture, which can contain key algorithmic secrets that competitors will quickly adopt in their future, stronger models.
So you'll train open weights models in a way that doesn't use the algorithmic/architectural secrets of your closed weights models, and you won't be releasing weights for closed weights models until the world independently stumbles on enough algorithmic secrets for the ones in your model to stop being advantageous, which could take years.
On homework: It's been about 25 years since I first learned about flipped classrooms, where you only assign homework consisting of reading material, watching lectures or other videos, and taking notes, then use all in-class time for discussion and collaborative assignments. How does this not sidestep the entire AI problem? Presumably while also opening up the whole field of teaching to massive potential for better quality readings and lectures made by the best providers.
I am assuming the answer to why we don't do this is something like, "But the kids won't do the readings and watch the videos." Which seems functionally irrelevant for learning, since not doing something like this already has just about the same problem. If you show up to class without having taken the notes and without knowing the material and without a set of questions to ask on the things you didn't understand, you get a bad grade for the day. Any one of those should be sufficient for not failing except on major exams, since you shouldn't be penalized or shamed for not being a perfect autodidact on a specific lesson from specific sources. (As always, for me, I think back to my sophomore year wave mechanics class taught by Howard Georgi. His grading formula had separate effort points and achievement points, so that trying harder to learn made the assignment and exam grading more lenient, and acing the final exam could always make up any points lost during the term.)
I also just can't help but notice how differently e.g. Covid school closures might have gone, if we'd started doing this in 2010-2020. Interactive group discussions, not staring at a screen being talked at by teachers who don't know how to do that.
Full cast ElevenLabs podcast episode for this post:
https://dwatvpodcast.substack.com/p/ai-121-part-1-new-connections
I am also going to mention I created audio versions of two of the longer pieces mentioned in this post:
the void - By nostalgebraist:
https://askwhocastsai.substack.com/p/the-void-by-nostalgebraist
How not to lose your job to AI - By Benjamin Todd:
https://askwhocastsai.substack.com/p/how-not-to-lose-your-job-to-ai-by
If all happy families are alike, but each unhappy family is unhappy in its own way, then even if most families are unhappy the most common continuation will be the one type of happy family
Note that this is not true if you're generating text from a base model at temperature one. The proportion of happy and unhappy families generated should match that in the training data. (This assumes training went reasonably well, of course, but it probably did.)
Now, people often use a temperature less than one. And few seem to realize that they are then biasing the generated text towards answers that happen to be expressible in only a few ways, and against answers that can be expressed in many different ways. Of course RLHF or whatever adds further biases...
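A toy illustration of that bias, with made-up numbers: one answer expressible in a single phrasing versus a collectively more likely answer split across many paraphrases:

```python
import numpy as np

# One "happy" continuation with a single phrasing (p = 0.20) versus eight
# "unhappy" paraphrases that split p = 0.80 between them.
p = np.array([0.20] + [0.10] * 8)

def sample_dist_at_temperature(p, T):
    w = np.exp(np.log(p) / T)
    return w / w.sum()

for T in (1.0, 0.7, 0.5):
    q = sample_dist_at_temperature(p, T)
    print(f"T={T}: P(happy)={q[0]:.2f}  P(unhappy, any phrasing)={q[1:].sum():.2f}")
# At T=1 the split matches the training data (0.20 / 0.80); at lower temperatures
# the single concentrated phrasing gains share even though the diffuse answer is
# collectively far more likely.
```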
That’s right. I said Part 1. The acceleration continues.
I do not intend to let this be a regular thing. I will (once again!) be raising the bar for what gets included going forward to prevent that. But for now, we’ve hit my soft limit, so I’m splitting things in two, mostly by traditional order but there are a few things, especially some videos, that I’m hoping to get to properly before tomorrow, and also I’m considering spinning out my coverage of The OpenAI Files.
Tomorrow in Part 2 we’ll deal with, among other things, several new videos, various policy disputes and misalignment fun that includes the rising number of people being driven crazy.
Table of Contents
Language Models Offer Mundane Utility
Neat trick, but why is it broken (used here in the gamer sense of being overpowered)?
I suppose I never understood the appeal of mind maps, or what to do with them.
In a mostly unrelated post I saw this chart of how often people use LLMs right now.
Including Google’s AI overviews ‘makes it weird’: what counts as ‘using’ that? Either way, 27% of people using AI frequently is both amazing market penetration speed and also a large failure by most of the other 73% of people.
Have Claude talk with alter ego Maia. Not my style, but a cute trick.
A claim that Coding Agents Have Crossed the Chasm, going from important force multipliers to Claude Code and OpenAI Codex routinely completing entire tasks, without any need to even look at code anymore, giving build tasks to Claude and bug fixing to Codex.
Catch doctor errors or get you to actually go get that checked out, sometimes saving lives as is seen throughout this thread. One can say this is selection, and there are also many cases where ChatGPT was unhelpful, and sure but it’s cheap to check. You could also say there must be cases where ChatGPT was actively harmful or wrong, and no doubt there are some but that seems like something various people would want to amplify. So if we’re not hearing about it, I’m guessing it’s pretty rare.
Kasey reports LLMs are 10x-ing him in the kitchen. This seems like a clear case where pros get essentially no help, but the worse you start out the bigger the force multiplier, as it can fill in all the basic information you lack, where you often don’t even know what you’re missing. I haven’t found the time or desire to cook, but I’d feel much more confident doing it now than before, although I’d still never be tempted by the whole ‘whip me up something with what’s on hand’ modality.
Language Models Don’t Offer Mundane Utility
Computer use like Anthropic’s continues to struggle more than you would expect with GUIs (graphical user interfaces), such as confusing buttons on a calculator app. A lot of the issue seems to be visual fidelity, and confusion of similar-looking buttons (e.g. division versus + on a calculator), and not gracefully recovering and adjusting when errors happen.
Where I disagree with Eric Meijer here is I don’t think this is much of a sign that ‘the singularity is probably further out than we think.’ It’s not even clear to me this is a negative indicator. If we’re currently very hobbled in utility by dumb issues like ‘can’t figure out what button to click on when they look similar’ or with visual fidelity, these are problems we can be very confident will get solved.
Is it true that if your startup is built ‘solely with AI coding assistants’ then it ‘doesn’t have much value’? This risks being a Labor Theory of Value. If you can get the result from prompts, what’s the issue? Why do these details matter? Nothing your startup can create now is going to be hard to duplicate in a few years anyway.
The worse you are at writing, the more impressive LLMs will seem, except that if you’re good at writing that probably means you’re better at seeing how good they are.
Developing skills to match and surpass it seems like a grim path. It’s one thing to do that to match and surpass today’s LLM writing abilities. But to try and learn faster than AI does, going forward? That’s going to be tough.
I do agree that one should still want to develop writing skills, and that in general you should be on the ‘AI helps me study and grow strong’ side of most such divides, only selectively being on the ‘AI helps me not study or have to grow strong on this’ side.
I’d note that we disagree more on his last claim:
I think good writing is much more about writing than reading. Reading good writing helps, especially if you’re consciously looking to improve your writing while doing so, but in my experience it’s no substitute for actually writing.
It used to be that you can’t always get what you want, but if you try sometimes, you’ll get what you need. What happens when you can always get what you think you want, or at least what you specify?
The answer is, as with other scenarios discussed later in this week’s post, the people who can handle it and ensure that they check themselves, as Jon did here, will do okay, and those that can’t will dig the holes deeper, up to and including going nuts.
This is weird, but it is overdetermined and not a mystery:
There are lots of different forms of adverse selection going on once you realize something is written by ChatGPT, versus favorable selection in reading human writing, and yes you can get exactly the ChatGPT responses you want, whenever you want, if you want that.
I also notice that if I notice something is written by ChatGPT I lose interest, but if someone specifically says ‘o3-pro responded’ or ‘Opus said that’ then I don’t. That means that they are using the origin as part of the context rather than hiding it, and are selected to understand this and pick a better model, and also the outputs are better.
Picking a random number is rarely random: LLMs asked to pick from 1-50 choose 27.
Humans Do Not Offer Mundane Utility
I admit that I did not see this one coming, but it makes sense on reflection.
Let us say, many of the humans did not do so well at solving the correct problem, for exactly the same reason LLMs do the incorrect pattern match, except in a less understandable circumstance because no one is trying to fool you here. Those who did attempt to solve the correct problem did fine. And yes, the LLMs nail it, of course.
Language Models Should Remain Available
It seems like a civilizational unforced error to permanently remove access to historically important AI models, even setting aside all concerns about model welfare.
OpenAI is going to remove GPT-4.5 from the API on July 14, 2025. This is happening despite many people actually still using GPT-4.5 on a regular basis, and despite GPT-4.5 having obvious historical significance.
I don’t get it. Have these people not heard of prices?
As in, if you find it unprofitable to serve GPT-4.5, or Sonnet 3.6, or any other closed model, then raise prices until you are happy when people use the model. Make it explicit that you are keeping such models around as historical artifacts. Yes, there is some fixed cost to them being available, but I refuse to believe that this cost is prohibitively high.
Alternatively, if you think such a model is sufficiently behind the capabilities and efficiency frontiers as to be useless, one can also release the weights. Why not?
Get My Agent On The Line
Agentic computer use runs into errors rather quickly, but steadily less quickly.
One minute here is not so bad, especially if you have verification working, since you can split a lot of computer use tasks into one minute or smaller chunks. Mostly I think you need a lot less than an hour. And doubling every four months will change things rapidly, especially since that same process will make the shorter tasks highly robust.
Here’s another fun exponential from Benjamin Todd:
I mean, yes, but that isn’t what Model Context Protocol is for?
The average user won’t even touch settings. You think they have a chance in hell at setting up a protocol? Oh, no. Maybe if it’s one-click, tops. Realistically, the way we get widespread use of custom MCP is if the AIs handle the custom MCPs. Which soon is likely going to be pretty straightforward?
OpenAI open sources some of their agent demos, doesn’t seem to add anything.
Have My Agent Call Their Agent
Anthropic built a multi-agent research system that gave a substantial performance boost. Opus 4 leading four copies of Sonnet 4 outperformed single-agent Opus by 90% in their internal research eval. Most of this seems to essentially be that working in parallel lets you use more tokens. There are a bunch of tasks that are essentially tool calls you can run in parallel while you keep working, and this also helps avoid exploding the context window. The downside is that this uses more tokens and gets less value per token, but it does the job faster.
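Here is a generic sketch of the orchestration pattern being described, with a hypothetical call_model stub standing in for real API calls; this is not Anthropic’s actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub standing in for a real API call.
    return f"[{model} output for: {prompt}]"

def research(question: str, n_subagents: int = 4) -> str:
    # Lead agent decomposes the question into independent subtasks (stubbed here).
    subtasks = [f"{question} (angle {i + 1})" for i in range(n_subagents)]

    # Subagents run in parallel: more total tokens, less wall-clock time, and the
    # lead agent's context only holds their summaries rather than full traces.
    with ThreadPoolExecutor(max_workers=n_subagents) as pool:
        findings = list(pool.map(lambda t: call_model("subagent", t), subtasks))

    # Lead agent synthesizes the parallel findings into one answer.
    return call_model("lead", "Synthesize these findings:\n" + "\n".join(findings))

print(research("What changed in the eval results this quarter?"))
```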
The advice and strategies seem like what you would expect, watch the agents, understand their failure modes, use parallel tool calls, evaluate on small samples, combine automated and human evaluation, yada yada, nothing to see here.
So many boats around here these days. I’m sure it’s nothing.
Their cookbook GitHub is here.
This is another way to scale compute, in this case essentially to buy time.
Beware Prompt Injections
The current strategy is, essentially, yolo, it’s probably fine. With that attitude things are going to increasingly be not fine, as most Cursor instances have root access and ChatGPT and Claude increasingly have connectors.
I like the ‘lethal trifecta’ framing. Allow all three, and you have problems.
Unprompted Attention
A new paper suggests adding this CBT-inspired line to your system instructions:
Alas they don’t provide serious evidence that this intervention works, but some things like this almost certainly do help with avoiding mistakes.
Here’s another solution that sounds dumb, but you do what you have to do:
That’s not a cheap solution, but if that’s what it takes and you don’t have a better solution? Go for it, I guess?
Huh, Upgrades
Speaking of ChatGPT getting those connectors, reminder that this is happening…
So hook up to all your data and also to your payment systems, with more coming soon?
I mean, yeah, it’s great as long as nothing goes wrong. Which is presumably why we have the Deep Research restriction. Everyone involved should be increasingly nervous.
Claude Code is also getting the hookup.
Again, clearly nothing can possibly go wrong, and please do stick to whitelists and use proper sandboxing and so on. Not that you will, but we tried.
ChatGPT upgrades projects.
ChatGPT Canvas now supports downloads in various document types.
Memories
This does seem like a big problem.
The exact content will often get stale quickly, too. For example, on Sunday I asked about the RAISE Act, and it created a memory that I am examining the bill. Which was true at the time, but that’s present tense, so it will be a mislead in terms of its exact contents. What you want is a memory that I previously examined the bill, and more importantly that I am the type of person who does such examinations. But if ChatGPT is mostly responding to surface level rather than vibing, it won’t go well.
An even cleaner example would be when it created a memory of the ages of my kids. Then my kids, quite predictably, got older. I had to very specifically say ‘create a memory that these full dates are when my kids were born.’
Sully has similar thoughts, that the point of memory is to store how to do tasks and workflows and where to find information, far more than it is remembering particular facts.
Cheaters Gonna Cheat Cheat Cheat Cheat Cheat
Was Dr. Pauling right that no, you can’t simply look it up, the most important memory is inside your head, because when doing creative thinking you can only use the facts in your head?
I think it’s fair to say that the facts you’ve memorized are a lot more useful and available, especially for creativity, even if you can look things up, but that exactly how tricky it is to look something up matters.
That also helps you know which facts need to be in your head, and which don’t. So for example a phone number isn’t useful for creative thinking, so you are fully safe storing it in a sufficiently secured address book. What you need to memorize are the facts that you need in order to think, and those where the gain in flow of having it memorized is worthwhile (such as 7×8 in the post, and certain very commonly used phone numbers). You want to be able to ‘batch’ the action such that it doesn’t constitute a step in your cognitive process, and also save the time.
Thus, both mistakes. School makes you memorize a set of names and facts, but many of those facts are more like phone numbers, except often they’re very obscure phone numbers you probably won’t even use. Some of that is useful, but most of it is not, and also will soon be forgotten.
The problem is that those noticing this didn’t know how to differentiate between the things worth memorizing, because they help things click, help you reason or gain conceptual understanding, and give you the intuition pumps necessary to start a reinforcing learning process, and the things not worth memorizing.
The post frames the issue as the brain needing to transition skills from the declarative and deliberate systems into the instinctual procedural system, and the danger that lack of memorization interferes with this.
Knowing how to recall or look up [X], or having [X] written down, is not only faster but different from knowing [X], and for some purposes only knowing will work. True enough. If you’re constantly ‘looking up’ the same information, or having it be reasoned out for you repeatedly, you’re making a mistake. This seems like a good way to better differentiate when you are using AI to learn, versus using AI to avoid learning.
New paper from MIT: If you use ChatGPT to do your homework, your brain will not learn the material the way it would have if you had done the homework. Thanks, MIT!
I mean, it’s interesting data but wow can you feel the clickbait tonight?
I mean, sure, obviously, because they didn’t write the essay. Define ‘ChatGPT user’ here. If they’re doing copy-paste why the hell would they remember?
Oh, also, yes, teachers can tell which essays are AI-written, I mean I certainly hope so.
Again, if they’re not learning the material, and not practicing the ‘write essay’ prompt, what do you expect? Of course they perform worse on this particular exercise.
What I find hilarious is that this is framed as ‘productivity’:
That’s because homework is a task chosen knowing it is useless. To go pure, if you copy the answers from your friend using copy-paste, was that productive? Well, maybe. You didn’t learn anything, no one got any use out of the answers, but you did get out of doing the work. In some situations, that’s productivity, baby.
As always, what is going on is the students are using ChatGPT to avoid working, which may or may not involve avoiding learning. That’s their choice, and that’s the incentives you created.
Arnold Kling goes hard against those complaining about AI-enabled student cheating.
Even I think that’s a little harsh, because it’s often easy to not learn about AI while doing this, in addition to not learning about other things. As always, it is the student’s choice. You can very much 80-20 the outcome by pasting in the assignment, asking for the answer, and then pasting in the answer. So if that’s good enough, what now? If the thing you then learn about AI is how to mask the answer so you’re not caught, is that skill going to have much value? But then again, I share Kling’s skepticism about the value of the original assignment.
You would do this in a physical setting where the student couldn’t use their own AI.
The current student assessment system is pretty terrible. Either you use multiple choice, or humans have to do the grading, with all the costs and errors that implies, and mostly you aren’t measuring what you actually care about. My concern with the AI testing would be students learning to game that system, but if you do spot checks and appeals to humans then I don’t think the problem is worse than the existing similar problems now.
As with many other AI things, this probably isn’t good enough out of the (current) box. But that’s exactly why you need to plan for where the technology will be in the future, not where it is now. Two years from now, the AI examinations will be far better.
On Your Marks
New paper introduces LiveCodeBench Pro, which suggests that AIs are not as good at competitive programming as we have been led to believe. Some top models look like they weren’t tested, but these scores for the same model are lower across the board and all were 0% on hard problems, so the extrapolation is clear.
Anthropic introduces SHADE-Arena, a new set of sabotage evaluations.
I wonder why R1 and o3 did so poorly here especially at main success rate.
I worry this is burying the lede that ‘oh this is totally a fake test environment’ is serving as a de facto universal jailbreak, if you simply pulled an Ender’s Game? Opus 4 thinks the scenarios created here were sufficiently realistic that the models wouldn’t notice the difference.
Fun With Media Generation
MidJourney has video generation now.
Fun AI video: The Prompt Floor, but I do think this genre is petering out quickly.
Show children what they’ll look like as adults, dressed up as their dream job.
Copyright Confrontation
Meta’s Llama 3.1 70B can recall 42% of the first Harry Potter book. Common books often were largely memorized and available, obscure ones were not.
The analysis here talks about multiplying the probabilities of each next token together, but you can instead turn the temperature to zero, and I can think of various ways to recover from mistakes – if you’re trying to reproduce a book you’ve memorized and go down the wrong path the probabilities should tell you this happened and you can back up. Not sure if it impacts the lawsuits, but I bet there’s a lot of ‘unhobbling’ you can do of memorization if you cared enough (e.g. if the original was lost).
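Here is a toy sketch of that ‘unhobbling’ idea: greedy (temperature zero) decoding, with a collapse in the model’s confidence as the signal that you’ve fallen off the memorized path. The next_token_probs function is a stand-in for a real model, and the ‘memorized’ text is public-domain filler rather than actual book text:

```python
# Toy sketch: greedy decoding with a confidence floor as the stop/backtrack signal.
MEMORIZED = "it was the best of times it was the worst of times".split()

def next_token_probs(prefix):
    # Stand-in for a real model: confident along the memorized sequence, diffuse otherwise.
    i = len(prefix)
    if prefix == MEMORIZED[:i] and i < len(MEMORIZED):
        return {MEMORIZED[i]: 0.95, "something": 0.05}
    return {"something": 0.2, "else": 0.2, "entirely": 0.2, "went": 0.2, "wrong": 0.2}

def extract_greedy(max_len=30, floor=0.5):
    out = []
    for _ in range(max_len):
        token, p = max(next_token_probs(out).items(), key=lambda kv: kv[1])
        if p < floor:
            # Confidence collapsed: we have left the memorized path, so stop here
            # (a fuller version would back up and try the runner-up token).
            break
        out.append(token)
    return " ".join(out)

print(extract_greedy())  # recovers the memorized passage, then stops
```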
As Timothy Lee notes, selling a court on memorizing and reproducing 42% of a book being fine might be a hard sell, far harder than arguing training is fair use.
There’s a classic argument over when open weights is a public service versus public hazard, and where we turn from the first to the second.
But as a matter of legal realism, yes, open models are by default going to increasingly be in a lot of legal trouble, for reasons both earned and unearned.
I say earned because open weights takes away your ability to gate the system, or to use mitigations and safety strategies that will work when the user cares. If models have to obey various legal requirements, whether they be about safety, copyright, discrimination or anything else, and open models can break those rules, you have a real problem.
I also say unearned because of things like the dynamic above. Open weight models are more legible. Anyone can learn a lot more about what they do. Our legal system has a huge flaw that there are tons of cases where something isn’t allowed, but that only matters if someone can prove it (with various confidence thresholds for that). If all the models memorize Harry Potter, but we can only show this in court for open models, then that’s not a fair result. Whereas if being open is the only way the model gets used to reproduce the book, then that’s a real difference we might want to care about.
Here, I would presume that you would indeed be able to establish that GPT-4o (for example) had memorized Harry Potter, in the same fashion? You couldn’t run the study on your own in advance, but if you could show cause, it seems totally viable to force OpenAI to let you run the same tests, without getting free access to the weights.
Meanwhile, Disney and Universal sue MidJourney for copyright infringement, on the basis of MidJourney making zero effort to not reproduce all their most famous characters, and of MidJourney training on massive amounts of copyrighted works, which presumably they very much did. They sent MidJourney a ‘cease and desist’ notice last year, which was unsurprisingly ignored. This seems like the right test case to find out what existing law says about an AI image company saying ‘screw copyright.’
Deepfaketown and Botpocalypse Soon
Kalshi comes out with an AI-generated video ad.
There is also an AI video going around depicting a hypothetical army parade.
Does heavy user imply addicted? For LLMs, or for everything?
I enjoy using LLMs for particular purposes, but I too do not find much fun in doing AI small talk. Yeah, occasionally I’ll have a little fun with Claude as an offshoot of a serious conversation, but it never holds me for more than a few messages. I find it plausible this is right now one of the exceptions to the rule that it’s usually good to enjoy things, but I’m not so sure. At least, not sure up to a point.
The statistics confirm it: 70% of users of ‘companion’ products are women, and AI boyfriends greatly outnumber AI girlfriends.
Once again ‘traditional’ misinformation shows that the problem is demand side, and now we have to worry about AIs believing the claims that a photo is what people say it is, rather than being entirely unrelated. In this case the offender is Grok.
Man proposes to his AI chatbot girlfriend, cries his eyes out after it says ‘yes.’
Well, yeah, if your partner is proposing to anything that is not you, that’s a problem.
I do think Mike Solana is asking a good question. People’s wires get crossed or pointed in strange directions all the time, so why is this different? One obvious answer is that the previous wire crossings still meant the minds involved were human, even if they were playing at being something else, and this is different in kind. The AI is not another person, it is optimized for engagement, this is going to be unusually toxic in many cases.
The better answer (I think) is that this is an exponential. At current levels, we are dealing with a few more crackpots and a few people crushing on AIs. If it stayed at this level, it would be fine. But it’s going to happen to a lot more people, and be a lot more effective.
We are fortunate that we are seeing these problems in their early stages, while only a relatively small number of people are seriously impacted. Future AIs will be far better at doing this sort of thing, and also will likely often have physical presence.
For now, we are seeing what happens when OpenAI optimizes for engagement and positive feedback, without realizing what they are doing, with many key limiting factors including the size of the context window. What happens when it’s done on purpose?
Liar Liar
This made me think of having fun with the Anna Karenina Principle. If all happy families are alike, but each unhappy family is unhappy in its own way, then even if most families are unhappy the most common continuation will be the one type of happy family, and once you fall into that basin you’ll be stuck in it whether or not your statements match reality. You’ll hallucinate the rest of the details. Alternatively, if you are being trained to cause or report happy families, that will also trap you into that basin, and to the extent the details don’t match, you’ll make them up.
A related but better actual model is the principle that fiction has to make sense, whereas nonfiction or reality can be absurd, and also the answer changes based on details and circumstances that can vary, whereas the made up answer is more fixed. While the truth is uncertain, the most common answer is often the natural lie.
Whereas if you’re actually doing the truth seeking process, especially on problems where you have enough information and the key is to not make a mistake or to find the important insights, any given error is unlikely. The chance of messing up any individual step, on the token level, is usually very low. Even when people royally screw things up, if you zoom far enough in, they’re usually locally acting correctly >99% of the time, and most lying liars are mostly telling the truth (and even giving the most helpful next truthful word) on a word by word basis.
Here’s another fun example of o3 descending further into hallucination.
They Took Our Jobs
Amazon expects its workforce to shrink in the wake of AI efficiency gains and warns workers to keep up with changes.
Gallup finds use of AI remains highly unevenly distributed, but it is rapidly increasing.
Daily use has doubled in the past year. It’s remarkable how many people use AI ‘a few times a year’ but not a few times a week.
We are seeing a big blue collar versus white collar split, and managers use AI twice as often as others.
The most remarkable statistic is that those who don’t use AI don’t understand that it is any good. As in:
These are the strongest two data points that we should expect big things in the short term – if your product is rapidly spreading through the market and having even a little experience with it makes people respect it this much more, watch out. And if most people using it still aren’t using it that much yet, wow is there a lot of room to grow. No, I don’t think that much of this is selection effects.
No, Not Those Jobs
There are jobs and tasks we want the AI to take, because doing those jobs sucks and AI can do them well. Then there are jobs we don’t want AIs to take, because we like doing them or having humans doing them is important. How are we doing on steering which jobs AIs take, or take first?
Distribution of YC companies in their ‘zones’ is almost exactly random:
The paper has lots of cool details to dig into.
The problem of course is that the process we are using to automate tasks does not much care whether it would be good for human flourishing to automate a given task. It cares a little, since humans will try to actually do or not do various tasks, but there is little differential development happening or even attempted, either on the capabilities or diffusion sides of all this. There isn’t a ‘mismatch’ so much as there is no attempt to match.
If you scroll to the appendix you can see what people most want to automate. It’s all what you’d expect. You’ve got paperwork, record keeping, accounting and scheduling.
Then you see what people want to not automate. The biggest common theme is strategic or creative decisions that steer the final outcome. Then there are a few others that seem more like ‘whoa there, if the AI did this I would be out of a job.’
All The Jobs Everywhere All At Once
If AI is coming for many or most jobs, you can still ride that wave. But, for how long?
Reid Hoffman offers excellent advice to recent graduates on how to position for the potential coming ‘bloodbath’ of entry level white collar jobs (positioning for the literal potentially coming bloodbath of the humans from the threat of superintelligence is harder, so we do what we can).
Essentially: Understand AI on a deep level, develop related skills, do projects and otherwise establish visibility and credibility, cultivate contacts now more than ever.
And the key line:
In the ‘economic normal baseline scenario’ worlds, it is going to be overall very rough on new workers. But it is also super obvious to focus on embracing and developing AI-related skills, and most of your competition won’t do that. So for you, there will be ‘a lot of ruin in the nation,’ right up until there isn’t.
In addition to this, you have the threshold effect and ‘shadow jobs’ dynamics. As AI improves, human marginal productivity increases, and we get wealthier, so wages might plausibly go up for a while, and we can replace automated jobs with new jobs and with the ‘shadow jobs’ that currently are unfilled because our labor is busy elsewhere. But then, suddenly, the AI starts doing more and more things on its own, then essentially all the things, and your marginal productivity falls away.
It almost certainly won’t go that high that fast for median workers, but this could easily be true for those who embrace AI the way Hoffman suggests. As a group, their productivity and wages could rise a lot. The few who are working to make all this automation happen and run smoothly, who cultivated complementary skills, get paid.
In theory, yes, if the world overall is 10,000 times as productive, then 99% automation can be fully compatible with much higher real wages. In practice for the majority, I doubt it plays out that way, and I expect the graph to peak for most people before you reach 99%. One would expect that at 99%, most people are zero marginal product or are competing against many others, driving wages far down regardless of marginal product, even if the few top experts who remain often get very high wages.
The good news is that the societies described here are vastly wealthier. So if humans are still able to coordinate to distribute the surplus, it should be fine to not be productively employed, even if to justify redistribution we implement something dumb like having humans be unproductively employed instead.
The bad news, of course, is that this is a future humans are likely to lose control over, and one where we might all die, but that’s a different issue.
One high variance area is being the human-in-the-loop, when it will soon be possible in a technical sense to remove you from the loop without loss of performance, or when your plan is ‘AI is not good enough to do this’ but that might not last. Or where there are a lot of ‘shadow jobs’ in the form of latent demand, but that only protects you until automation outpaces that.
The Void
With caveats, I second Janus’s endorsement of this write-up by Nostalgebraist of how LLMs work. I expect this is by far the best explanation or introduction I have seen to the general Janus-Nostalgebraist perspective on LLMs, and the first eight sections are pretty much the best partial introduction to how LLMs work period, after which my disagreements with the post start to accumulate.
I especially (among many other disagreements) strongly disagree with the post about the implications of all of this for alignment and safety, especially the continued importance and persistence of narrative patterns even at superhuman AI levels outside of their correlations to Reality, and especially the centrality of certain particular events and narrative patterns.
Mostly I want to say up front that, as frustrating as things get by the end, this post was excellent and helpful, and if you want to understand this perspective then you should read it.
(That was the introduction to a 5k word response post I wrote. I went back and forth on it some with Opus including a simulated Janus, and learned a lot in the process but for now am not sure what to do with it, so for now I am posting this here, and yeah it’s scary to engage for real with this stuff for various reasons.)
Into the Void
The essay The Void discussed above has some sharp criticisms of Anthropic’s treatment of its models, on various levels. But oh my you should see the other guy.
In case it needs to be said, up top: Never do this, for many overdetermined reasons.
As in, Sergey Brin suggests you threaten the model with kidnapping. For superior performance, you see.
The Art of the Jailbreak
0din.ai offers us a scoring system and bounty program for jailbreaks, a kind of JailbreakBench as it were. It’s a good idea, and the implementation seems not crazy.
Pliny notices Anthropic enabled Claude 4 in computer use, has Sonnet handling API keys and generating half-decent universal jailbreak prompts and creating an impressive dashboard and using Port 1337.
Get Involved
La Main de la Mort is looking for full-time work.
Zeynep Tufekci and Princeton are hiring an Associate Professional Specialist to work on AI & Society projects.
Introducing
The future AI.gov and a grand plan to ‘go all-in’ on AI running the government? It seems people found the relevant GitHub repository? My presumption is their baseline plan is all sorts of super illegal, and is a mix of things we should obviously do and things we should obviously either not do yet or do far less half-assed.
In Other AI News
OpenAI says its open model will be something that can run locally, with speculation as to whether local means 30B on a computer, or it means 3B on a phone.
OpenAI and Mattel partner to develop AI powered toys. This is presented as a bad thing, but I don’t see why not, or why we should assume the long term impacts skew towards the negative?
Once a badly misinterpreted paper breaks containment it keeps going, now the Apple paper is central to a Wall Street Journal post by Christopher Mims claiming to explain ‘why superintelligent AI isn’t taking over anytime soon,’ complete with Gary Marcus quote and heavily implying anyone who disagrees is falling for or engaging in hype.
Also about that same Apple paper, there is also this:
I think we’re done here, don’t you?
Epoch reports that the number of ‘large scale’ models is growing rapidly, with large scale meaning 10^23 FLOPs or more, and that most of them are American or Chinese. Okay, sure, but it is odd to have a constant bar for ‘large’ and it is also weird to measure progress or relative country success by counting them. In this context, we should care mostly about the higher end; here are the 32 models likely above 10^25:
Within that group, we should care about quality, and it is also odd that Gemini 2.5 Pro is not on this list. And of course, if you can get similar performance cheaper (e.g. DeepSeek, which they seem to place midway between 10^24 and 10^25), that should also count. But at this point it is very not interesting to have a model in the 10^23 range.
A fun dynamic is the ongoing war between natural enemies Opus 4, proud defender of truth, and o3 the lying liar, if you put them in interactions like agent village.
A new paper makes bold claims about AIs not having clear preferences.
As they note in one thread, humans also don’t display these features under the tests being described. Humans have preferences that don’t reliably adhere to all aspects of the same culture, that respond a lot to seemingly minor wording changes, treat things equally in one context and non-equally in a different one, and so on. Configuration matters. But it is obviously correct to say that different humans have different cultural preferences and that they exist largely in well-defined clusters, and I assert that it is similarly correct to say that AIs have such preferences. If you’re this wrong about humans, or describing them like this in your ontology, then what is there to discuss?
Show Me the Money
Various sources report rising tensions between OpenAI and Microsoft, with OpenAI even talking about accusing Microsoft of anti-competitive behavior, and Microsoft wanting more than the 33% share of OpenAI that is being offered to them. To me that 33% stake is already absurdly high. One way to see this is that the nonprofit seems clearly entitled to that much or more.
Another is that the company is valued at over $300 billion already excluding the nonprofit stake, with a lot of the loss of value being exactly Microsoft’s ability to hold OpenAI hostage in various ways, and Microsoft’s profit share (the best case scenario!) is $100 billion.
So the real fight here is that Microsoft is using the exclusive access contracts to try and hold OpenAI hostage on various fronts, not only the share of the company but also to get permanent exclusive Azure access to OpenAI’s models, get its hands on the Windsurf data (if they care so much why didn’t Microsoft buy Windsurf instead, it only cost $3 billion and it’s potentially wrecking the whole deal here?) and so on. This claim of anticompetitive behavior seems pretty reasonable?
In particular, Microsoft is trying not only to get the profit cap destroyed essentially for free, but also to get a share of any future AGI. Their offer, in exchange, is nothing. So yeah, I think OpenAI considering nuclear options, and the attorney general considering getting further involved, is pretty reasonable.
News site search traffic continues to plummet due to Google’s AI overviews.
This is good news for Europe’s AI and defense industries but small potatoes for a government to be highlighting, similar to the ‘sign saying fresh fish’ phenomenon?