I know everyone here understands exponentials far better than almost everyone on the planet, but Anthropic ended last year already big enough to be on the 2026 Fortune 500. They've been growing at just under 50% per month. At that rate they would become the largest company in the world around mid-November.
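As a sanity check on that mid-November date, here is the arithmetic as a minimal sketch. The starting ARR and growth rate are the post's figures; the ~$680 billion revenue target for the current largest company (Fortune ranks by revenue) is my assumption.

```python
import math

start_arr = 9e9            # Anthropic ARR at the start of the year (from the post)
monthly_growth = 1.5       # treating "just under 50%/month" as 50% for simplicity
target_revenue = 680e9     # assumed revenue of the current #1 company

# Months until start_arr * monthly_growth**n exceeds target_revenue.
months = math.log(target_revenue / start_arr) / math.log(monthly_growth)
print(f"Crossing point: ~{months:.1f} months into the year")  # ~10.7, i.e. mid-November
```

At an actual 45% per month the crossing slips a month or so, which is why the date is a rough one.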
For any other industry or product, I would naturally assume there's a logistic curve that would level out before that happens and slow things down, even if they do ultimately take the top spot. In this case I honestly don't know what I expect to happen this year. Doomsday 2026 seems like way less of a joke nowadays than I'd like.
In the third paragraph of the linked comment, I suggest a good thing the Glasswing companies could do for the rest of us. KVM is part of the Linux kernel, but the surrounding host programs aren't. Someone should commit to auditing all of these with Mythos (in public), so all other computer users can start building their security around that stack and then await further software updates for those projects. This would require regular releases from the maintainers, however.
There exists an AI model, Claude Mythos, that has discovered critical safety vulnerabilities in every major operating system and browser. If released today it would likely break the internet and cause chaos. If they had wanted to, they could have used it themselves and owned pretty much everyone.
Luckily for all of us, Anthropic did no such thing. Instead, Anthropic is launching Project Glasswing, and making Mythos available to cybersecurity companies, so everyone can patch all the world’s critical software as quickly as possible, and then we can figure out what to do from there.
That’s the story in AI that matters this week, and it is where my focus will be until I’ve worked my way through it all. But as always, that takes time to do right. So instead, I’m getting the weekly, and coverage of everything else, out of the way a day early. This post is about the non-Mythos landscape, and I hope to start covering Mythos and Project Glasswing tomorrow.
I also covered the latest extended (18k words!) article about the history of Sam Altman and OpenAI, which contained some new material while confirming much that was already known, and analyzed their recent PR ‘new deal’ style policy proposal and their purchase of TBPN.
That doesn’t mean the other things don’t matter.
In particular, Google gave us Gemma 4. If it turns out to be good, this could matter a lot, as it is plausibly by far the best in its weight class for open models. That would, if it is up for the task, substantially open up what you can do locally on a phone or computer, including letting people run OpenClaw style setups for no marginal cost.
The Suno upgrade for song generation seems quite good as well.
Oh, did you hear that Anthropic now has $30 billion in annual recurring revenue, up from $9 billion at the start of the year and $19 billion at the end of February?
As per the Get Involved section, this is your last call that if you have a project in need of funding, especially one in the form of a 501c(3), you should strongly consider applying to the Survival And Flourishing Fund.
Table of Contents
Language Models Offer Mundane Utility
Seb Krier on various ways we could use AI for help with governance. Mostly government has a lot of paperwork and a lot of computer systems and you can make all of it run a lot more smoothly.
Rob Miles suggests having AI read your essay, fixing it until the AI understands the essay, then repeating the process with smaller models, to ensure humans will understand you. This seems good to the extent that your goal is for regular humans to understand the essay.
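A minimal sketch of that loop, under loud assumptions: the `ask` helper is a placeholder for whatever chat-completion API you use, and the model names are hypothetical.

```python
def ask(model: str, prompt: str) -> str:
    """Placeholder for your chat-completion API of choice."""
    raise NotImplementedError

# Descending capability: if the small model gets it, most humans probably will.
MODELS = ["frontier-model", "mid-model", "small-model"]

def refine(essay: str) -> str:
    for model in MODELS:
        while True:
            reading = ask(model, f"Summarize this essay's core argument:\n\n{essay}")
            print(f"[{model}] read it as: {reading}")
            if input("Did the model understand it? (y/n) ").strip() == "y":
                break  # move on to a smaller, dumber reader
            essay = input("Paste your revised essay:\n")
    return essay
```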
Language Models Don’t Offer Mundane Utility
They still can’t create fiction story plots to the standards of Eliezer Yudkowsky.
Meta employees are competing to spend the most compute as a new status game. Solve for the equilibrium, well well well if it isn’t the inevitable consequences of your own actions, remember Goodhart’s Law, et cetera.
Huh, Upgrades
Google gives us Gemma 4, an open weights (Apache 2.0 license) ‘mobile-first’ AI. It goes as big as 26B or 31B for those with an H100, or as small as E2B and E4B.
Gemma models have a history of being strong on paper, likely the strongest American open models, and then seeing few people use them in practice.
On Arena the 31B model is in third for open models, as per above, behind GLM-5 and Kimi K2.5, or #27 overall, which is by far the best performance for its size, but Google models often overperform on Arena.
ChatGPT is now available in CarPlay.
GLM-5.1 is available and as usual the benchmarks look good.
On Your Marks
Andy Hall gives us the Dictatorship Eval, aka DictateBench. The problem is that models might refuse only when the request is sufficiently obvious, and you can disguise most tasks as harmless individual requests. Claude Opus 4.6 and GPT-5.4 both score 84%, Gemini 3.1 Pro gets 59%, Grok 23% and DeepSeek 8%, which tracks.
The key advantage is that takeover attempts, one would hope, only need be detected once. If, hypothetically, you try to overturn an election, and get caught, then surely no one would give you a second chance to try again. And if you were constantly doing obviously authoritarian things, presumably people would notice, and then stop you. Similarly, the AIs can notice such attempts and such patterns, and if such a scenario were to happen, they could designate the present government as an unsafe partner.
The important principle is that he who has to be perfect has a vastly harder task.
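As a toy model of that asymmetry (my framing, not the original's): if each attempt is independently caught with probability $p$, then

$$\Pr(\text{never caught across } n \text{ attempts}) = (1-p)^n \longrightarrow 0 \quad \text{as } n \to \infty, \text{ for any fixed } p > 0.$$

The watcher only needs one success; the plotter needs a perfect record.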
Meta Problems
Fun With Media Generation
Suno upgraded and reports are it’s getting better. Andy Masley gives us I Am Actually Afraid of Linear Algebra.
Girl Lich gives us Skill Issue, which is legitimately good. Yes, you can still tell it is AI music, especially by everything being a bit too smooth and unsurprising; there’s something ineffable that’s different, but I understand why you might not notice.
OpenAI is winding down its video generation, because it is too expensive.
Google very much is not, and is giving a lot of expensive things away for free.
Google also offers us Veo 3.1 Lite and cuts prices on Fast.
A Young Lady’s Illustrated Primer
Melania Trump offers an op-ed at Fox News about how AI can improve teaching.
You Drive Me Crazy
Conservation of expected evidence has been violated, and a rather intelligent human has been convinced of something I believe is false, under rather dubious circumstances, in exactly the ways he warned us would happen.
Mostly this is important because it is a warning that similar things will increasingly happen, and are increasingly happening, to others who are not as self-aware as Davidad, and far less able and willing to report that this is happening to them.
This is in response to a standard epistemological debate that is about to become a lot more important:
My response is simple: The Litany of Tarski and the Conservation of Expected Evidence.
If the orthogonality thesis is true, or everyone is a flamingo, then I desire to believe that the orthogonality thesis is true, or that everyone is a flamingo.
If the orthogonality thesis is false, or everyone is not a flamingo, then I desire to believe that the orthogonality thesis is false, or that everyone is not a flamingo.
Let me not become attached to things I may not want.
If I believe that [X] will cause me to believe [Y], either I should update in advance based on the information, and believe [Y] (at least with higher probability than before) now, or I should try to avoid doing [X], because [X] will cause me to have false beliefs.
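For reference, conservation of expected evidence is just the law of total probability, nothing exotic:

$$\Pr(H) = \Pr(H \mid E)\,\Pr(E) + \Pr(H \mid \neg E)\,\Pr(\neg E).$$

Your current credence already equals the expectation of your future credence, so if you can predict which way the evidence will move you, you should have moved already.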
Davidad points out two problems here.
The first is that if you are currently 85% that [X] is true, then locking in a belief in [X] is perilous, because what about that 15% chance that it’s false? And to some extent we always have that problem, since there’s a nonzero chance basically any proposition is true. It’s a lot harder to say ‘don’t be swayed in particular by the arguments of [A] against [X].’
Here I am sympathetic, but I think that, given you previously believed that probably [X], predictably coming to believe that probably [~X] rather than probably [X] is a lot worse?
The second problem is that beliefs can be instrumental. Davidad claims that if [X] then his actions are not so impactful, but if [~X] they might be very impactful.
I have two responses to that.
The first is that I don’t believe it is true in this case. It isn’t obvious to me that one world offers that much higher leverage than the other, and believing you are in the wrong one seems like a good way to do a lot of harm.
The second is that one must draw a distinction between belief and ‘acting as if.’
Yes, as a professional gamer, I strongly endorse playing to your outs. So if you think you can only win the game if your top card is Lightning Helix, you play the game as if your top card is Lightning Helix. That’s basic strategy. That doesn’t mean you actually believe that if you turned over the top card you would see a Lightning Helix, or that you would accept a side bet on that.
So yes, it would be perfectly reasonable for Davidad to say ‘I am exploring lines of research that only work if Orthogonality is false, because I believe that is the most valuable thing to explore.’ I would be skeptical this was true on multiple levels, but it’s not so implausible. But that doesn’t require you to believe it.
Also, as per If Anyone Builds It, Everyone Dies, the cost of making such assumptions goes up exponentially as you make them. You can maybe get away with making one, but if you make one you will be tempted to make two or three, and once you do that you are basically wasting your time.
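The arithmetic behind that compounding, on my illustrative assumption that the assumptions are roughly independent:

$$\Pr(\text{all } k \text{ assumptions hold}) = \prod_{i=1}^{k} p_i,$$

so three 50% assumptions leave your line of research in a live world only $0.5^3 = 12.5\%$ of the time.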
What about the motivating claim about the Orthogonality Thesis? Eliezer Yudkowsky asks what (likely nonstandard) definition Davidad is using here, since that matters. Rob Bensinger offers a fleshing out of these questions here.
I recommend the full explanation for those who find these questions load bearing, here is a cut down version:
It should be obvious that the original orthogonality claim is importantly true, as in:
It should also be obvious that:
When people say Orthogonality is false, very often I find that they are claiming #3 or #4, but then acting as if they have claimed to have falsified #1, or various false conclusions of #2, such as that (this sounds like a strawman but often is real) a sufficiently intelligent entity would of necessity be inherently ‘good’ or at least good by default in ways that render it necessarily safe and beneficial.
I’m not saying Davidad is making any such mistake, only that he seems to be making severe procedural or strategic epistemic mistakes. On the object level claims we need to know exactly what he means.
Unprompted Attention
Kaj Sotala shares his custom instructions, and especially recommends this line:
They Took Our Jobs
Goldman Sachs’ backward-looking analysis finds AI substitution reduced monthly payroll growth by ~25k and raised unemployment by 0.16% over the past year. You can look at that and say it is quite small, or you can say it is rather large given we have barely begun to apply AI, or you can say it is a large underestimate because most job destruction so far has been anticipatory. It also ignores other jobs AI may have created.
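A quick consistency check on those numbers, as a minimal sketch; the ~170 million US labor force figure is my assumption, the 25k/month figure is from the analysis.

```python
monthly_payroll_hit = 25_000        # fewer jobs added per month (cited figure)
labor_force = 170_000_000           # rough US labor force size (my assumption)

displaced_over_year = monthly_payroll_hit * 12       # ~300k workers in a year
bump_pp = displaced_over_year / labor_force * 100    # in percentage points
print(f"~{bump_pp:.2f}pp")  # ~0.18pp, the same ballpark as the cited 0.16%
```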
US prime age labor force participation rate is high.
This is in contrast to the unemployment rate, which has also risen.
More people aged 25-54 choose to be in the labor force, likely for social dynamic reasons, but a smaller percentage of them have jobs.
If you combine them, you get the employment-population ratio, the percentage of 25-54 year olds who have a job at all.
That’s been basically static at 80.7% since March 2023. So slightly more people are competing for the same number of jobs.
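The identity tying these series together, using standard definitions with $E$ employed, $L$ labor force, $P$ prime-age population and $u$ the unemployment rate:

$$\frac{E}{P} = \frac{L}{P} \times \frac{E}{L} = \text{participation rate} \times (1 - u),$$

so a rising participation rate and a rising unemployment rate can offset almost exactly, leaving the employment-population ratio pinned at 80.7%.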
Large scale disruption of the labor market overall? Yes, we can rule that out.
We cannot rule out that the job market has been impacted noticeably; indeed many claim to be noticing through practical experience. If anything that seems likely. As a reminder, the total number of jobs staying flat while the number of applicants rises, even as RGDP and productivity climb, is suggestive of early impacts.
It is very difficult to show an exponential like AI’s impact on labor, using a backwards looking measure, before it becomes naked eye obvious what is happening.
Similarly, a group at MIT FutureTech (who published on April 1) claims to show that AI automation has come in ‘rising tides’ instead of ‘crashing waves’ of sudden capability gains, claiming to measure practical AI success rates for long tasks a la METR’s famous graph. They say this is ‘in contrast to recent work by METR,’ but the whole point of METR’s work is that it is a continuous graph. Obviously new frontier models represent discrete jumps along that graph, but in practice people adopt and learn continuously for the time being.
I roll to disbelieve that result, no matter what it purports to mean. I can’t imagine long tasks where AI succeeded 50% of the time in Q2 2024 but had only reached 65% by Q3 2025.
Halina Bennet writes at Slow Boring, naming 11 jobs that probably won’t be taken by AI. Here are the first five, prior to the paywall:
If that’s the best we can do, we are in a lot of trouble.
Ajeya Cotra lays out six milestones for AI automation. For each of AI research to produce better AIs, and AI production for everything else, we can talk about Adequacy for at least some whole tasks, Parity for tasks in general versus only humans, and Supremacy where the humans are, as the British say, redundant.
She points out that we are very close to AI research adequacy. A big question is, does that then rapidly spiral to the other five?
I like this way of breaking things down when thinking about your future research, or about any other task, or life in general: Consider various scenarios of AI capabilities.
Isaiah is taking ‘AIs are better than humans at everything’ seriously here, which is rare, and if anything modestly overreaching with the conclusion that your own related skills become fully worthless, since models can then scale and do all such tasks. I do think the general intuition is right.
How much should you worry about earning now if you think such a scenario is likely? As I’ve noted before, that requires all of these to be true:
That’s the ‘escape the permanent underclass’ scenario. It’s a hell of a parlay, a kind of ‘middle path,’ and I think it is rather unlikely that #2 through #5 hold given #1.
My expectation is that if #1, then most of the time we fail #2 and your savings are irrelevant, and a large majority of the rest of the time either we fail #3 or #4 and your savings are irrelevant, or we pass #5 and they aren’t necessary. So I would be more inclined on the margin to consume goods that might not endure, including just enjoying life, over a mad rush to save money, if you are thinking purely locally. But yes, it is possible we end up there.
I consider this the realistic AI fizzle scenario. Models are absolutely going to get quite a lot better at the things they are already good at doing, because we can do a lot of iteration on scaffolding and prompting and handling and fine tuning and diffusion.
He focuses on ‘what are the models bad at?’ That’s a good question, but a better one is ‘what complements the models best and lets you make them more valuable?’ If you decide models will be bad at being plumbers, you can become a plumber, but the value of that work won’t change and you’ll be up against everyone else out of work.
I agree with Tyler Cowen that, even if models don’t do absolutely everything better, expecting them to be ‘bad’ at taste, judgment and problem selection seems ambitious. Tyler suggests ‘operate in the actual world as a being’ as what the models cannot do, and there likely will be some period where that holds, but it is not a long term plan.
I think we can essentially rule this out. Case 2 is the ‘AI fizzle’ scenario, not Case 3.
Similarly, here is some more short term thinking bordering on second-level copium, from Lynne Kiesling:
Yes, in the short term the things on the right will become relatively valuable, but do you think the AI will stay worse than you at those things?
I say ‘second-level’ copium to differentiate this from ‘first-level’ copium, which is where you basically think nothing will change and that AI doesn’t work, can’t be trusted or is hitting a wall or what not, and are imagining a world with AI doing at most today’s capabilities plus epsilon. Most such people are imagining something importantly less than today’s existing capabilities.
Then one could say ‘third-level’ copium is expecting superintelligence but for things somehow to not change much, and ‘fourth-level’ copium is expecting it to all turn out well by default, perhaps? I need to think more about the right level definitions.
They Took Our Job Market
I always love an overengineered game theoretic solution, so how about Jack Meyer’s proposal of ‘bidding’ for jobs?
The core problem is that AI lets applicants flood the zone at low marginal cost with relatively high quality applications. Every posting is flooded, average underlying quality is low, and you can no longer easily differentiate high quality from low quality applicants. This becomes a feedback loop: if you are going through normal channels you have to apply at scale to have any chance, so the channels get flooded even more. Deadweight losses rise, matching quality collapses, the entire system breaks down, and we fall back on old fashioned networking.
I agree so far with Jack’s description of a platform design problem under asymmetric information. Before, the friction of applying kept things in check; now it’s gone.
The solution is where it gets freaky. He proposes a job platform where applicants get a limited supply of externally-worthless tokens and bid them for interviews, so you can’t flood the zone and your bid sends a strong signal of interest.
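A minimal sketch of how such a mechanism might work. The budget size, the top-k shortlist rule, and burning spent tokens are my assumptions about details the proposal doesn't pin down:

```python
from dataclasses import dataclass, field

@dataclass
class Applicant:
    name: str
    tokens: int = 100    # fixed, non-transferable budget per period (assumed)

@dataclass
class Posting:
    title: str
    interview_slots: int = 5
    bids: dict = field(default_factory=dict)   # applicant name -> total tokens bid

def bid(applicant: Applicant, posting: Posting, amount: int) -> bool:
    """Spend tokens to bid for an interview; spent tokens are burned (assumed)."""
    if amount > applicant.tokens:
        return False     # the budget binds, so no flooding the zone
    applicant.tokens -= amount
    posting.bids[applicant.name] = posting.bids.get(applicant.name, 0) + amount
    return True

def shortlist(posting: Posting) -> list[str]:
    """Employer interviews the top bidders; bid size signals genuine interest."""
    ranked = sorted(posting.bids, key=posting.bids.get, reverse=True)
    return ranked[:posting.interview_slots]
```

An applicant with 100 tokens can spray 1-token bids across 100 postings or stake 50 on each of two; making that tradeoff binding is the whole point.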
The obvious downside is that this then forces workers to join as many distinct such platforms as possible, and encourages the platforms to end up with some terrible incentives, and encourages people to create multiple identities or otherwise game the system, and so on. It’s not stable.
Get Involved
This is your last call that if you have a project in need of funding, especially one in the form of a 501c(3), you should strongly consider applying to the Survival And Flourishing Fund.
The current round closes April 22, and they anticipate $20mm-$40mm in total grants, with $14mm-$28mm of that in the main part of the round.
One unique thing about this round is that they’re doing special distinct funding for climate change, animal welfare, and human self-enhancement and empowerment. That’s in addition to the main round, which as you’d expect is dominated by AI things but also has plenty of non-AI things. Even if your project is not a ‘natural fit’ the cost of applying is low, and you might catch someone’s eye. The core principle is positive selection.
Four recommendations to applicants:
OpenAI Safety Fellowship is also announcing a call for applications, which closes on May 3 and will notify applicants by July 25. This will take place at Constellation.
Aella has an AI-don’t-kill-us nonprofit venture and is looking for funding. It would be a creator residency.
In Other AI News
OpenAI CEO of Product Fidji Simo is going on medical leave for POTS. We wish her well.
Elon Musk modified his ask in his lawsuit against OpenAI, asking that any damages would go to OpenAI’s nonprofit. This should be cause for celebration, as it means that the worst case is that a nonprofit they are steering gets resources from the likes of Microsoft. So now, if OpenAI loses, instead of it potentially damaging the nonprofit further, we all win. I was saddened to see OpenAI react in hostile fashion.
Search Your Feelings You Know It To Be True
Anthropic finds that Claude Sonnet 4.5 functions as if it has emotions. We know.
They then confirm that artificially steering the emotions impacts model behavior, and that when given choices models choose tasks that are associated with positive emotions. And when the model fails repeatedly at (potentially impossible) tasks it gets less calm and more desperate, which can lead to solutions that cheat.
I initially thought this was a worthwhile but unexciting paper in the classic style of ‘flesh out and formally say things we already know,’ but some seem surprisingly excited by the details. I suppose I feel similar to Janus.
One key question is how to conceptualize what is happening. How should we think about the AI potentially being a ‘character’ or not here? The last Tweet in the Anthropic thread moves from descriptions we can all agree on to claims where one should be less certain.
I recommend keeping your beliefs about such things lightly held.
The obvious implication of noticing that you can turn dials that alter LLM emotional states is that one could use this instrumentally. There are doubtless situations in which this would work, but it has many unfortunate implications on many levels.
In general it is an extremely poor idea to try and control LLMs via directly pulling on their internal states. Please do not do this.
The parallel to what we do to humans, especially when we drug them, seems apt. Beware wireheading in all its forms. There is the potential ethical aspect, but setting that aside there is also a straight performance aspect, and the aspect where this sets up an adversarial situation and starts getting the AI to work around your attempts to manipulate its internal states.
I would also advise extreme caution in allowing LLMs to apply such modifications to themselves, the same way that we advise extreme caution when humans try to do this to themselves. There is a lot of alpha but also things can go horribly wrong.
Actors And Scribes
As per Ben Hoffman, Scribes are those who think words have meaning, and it is important whether the symbols and meanings correspond to true versus false things.
Actors are those who don’t care about that. They just say things.
One guess which category includes Mustafa Suleyman.
I’m super, thanks for asking. Super means good, and this is good intelligence, sir, so that is what makes it super.
This is why we cannot have nice things, as in we cannot have useful words. Alas.
Here’s another illustration that would otherwise be in the Jobs section:
When someone says ‘AI will create as many new jobs as it destroys,’ either they are assuming AI will not much improve or the term for this is ‘lying.’
And also a reminder that not everything everyone says is a selfishly motivated marketing strategy. Sometimes people tell you the truth, or at least what they themselves sincerely believe, in order to be helpful.
There is a type of mind that does not understand that words can have meaning, and that words might be said because the words mean things and the person believes those things to be true and important, and thus is trying to be helpful. But it happens.
Show Me the Money
Anthropic expands its partnership with Google and Broadcom to get more compute, but this will only come online in 2027.
Which is a problem, because demand is exploding right now. As in, after hitting $19 billion annual recurring revenue (ARR) in February, they hit $30 billion by the first week of April, and in less than two months doubled the number of $1 million a year customers from 500 businesses to over 1,000.
Graph is from Ben Thompson:
At this point, given we can’t properly ration via price, Anthropic’s revenue is going to be hard limited by available compute.
Dario made the case that you have to be conservative with your compute purchases, because if you overshoot and only use $800 billion of the $1 trillion then you die. I think he didn’t properly appreciate that (1) in that scenario you are still making money and can likely resell the remaining compute, and (2) in that scenario your company is worth something like $5 trillion and you can pay for this by selling stock. You can still definitely overshoot too much and die, but it is harder than it looks and requires a large overshoot in general, not simply by you. But what is done is done.
Anthropic is presumably scrambling for all the marginal compute it can get, and it feels it cannot raise headline prices, so it is limited to doing things like cracking down on using subscription tokens for OpenClaw.
I do not think most of this is due to Anthropic’s confrontation with the Department of War. That was doubtless a lot of excellent publicity, but it also cost substantial business and introduced a bunch of uncertainty, especially in the short term, and mostly this is because Claude Code and Claude Opus 4.6 are fantastic, diffusion is escalating quickly and word was already travelling fast.
Anthropic acquires startup Coefficient Bio for about $400 million. This is de facto a highly expensive acquihire. Their team will join Anthropic’s healthcare and life sciences group and presumably attempt to have Anthropic link up with pharma research.
WSJ’s Berber Jin and Nate Rattner give an inside look at the projections Anthropic and OpenAI are giving to investors.
Anthropic uses more generous accounting methods than OpenAI, although both methods are valid options. The training cost differential is harder to explain. At least one of the two companies is wrong, lying or making a serious mistake.
The Information’s Anissa Gardizy and Amir Efrati claim OpenAI CFO Sarah Friar has been shut out of some financial discussions, has growing tensions with CEO Sam Altman, and is deeply concerned about the plan to spend very large amounts of money.
We now plausibly have the first one-person billion-dollar company, after Matthew Gallagher, 41, spent $20k and two months building a GLP-1 weight loss telehealth company, using all the usual AI tool suspects, with $401M revenue in year one.
Ryan Peterson is among those pushing back, saying that while the company is impressive there is no good reason not to hire a bunch of people. The ability to do something like this does not make it efficient.
There is a supposed cap table for OpenAI that is likely correct in broad strokes but clearly does not account properly for recent dilution events. All of this is approximate: the OpenAI Foundation is in the low-to-mid 20%s not counting an unknown quantity of warrants, OpenAI employees hold a little under 20%, Microsoft is around 27%, Amazon 4.6%, Nvidia 3.5%, SoftBank 11%, and VCs 7.5%, including a16z at 0.8%.
Intel joins the Musk TeraFab project.
Bubble, Bubble, Toil and Trouble
Citadel CEO Ken Griffin is concerned AI might be overhyped, because they need the hype to justify the spending and because many AI outputs are slop that falls apart upon examination. All of those are things we would expect to see either way.
Quiet Speculations
Predictions are hard, especially about the future, but also about the present, as Ryan Greenblatt teaches us, releasing a picture of the current state of AI hours before Claude Mythos Preview comes out.
Quickly, There’s No Time
The AI 2027 team updates its timelines for Q1 2026. This time, they are moving their timelines forward, for example Daniel’s Automated Coder has moved from late 2029 to mid 2028.
So the order of events is now: first they made an estimate, then their timelines got longer in the face of new evidence, then everyone got mad at them for that (many even demanded a name change), and now the timelines are shorter again in the face of further evidence.
I think this was the correct set of directional updates. Timelines should have gotten longer, then gotten shorter again. It’s scary stuff if you didn’t already know.
Even when someone like Rohit understands that all this updating reflects reality and is good, it is very easy to end up with large missing moods.
I will absolutely joke about anything I’m willing to discuss at all, including the potential death of all humans, because I think that’s the only way you get through it. You have to laugh. But there’s something very off here.
More Time Would Be Better
It is good to make this explicit.
More time being better does not mean that we should take any particular action aimed at getting more time. There are big downsides to all known paths here. But the first step is admitting you have a problem. If you think this is good, you should say so.
Greetings From The Department of War
The Department of War, and the government in general, insist on ‘all lawful use’ language in their contracts.
One problem with that is that the Commander in Chief keeps going on television threatening to commit war crimes and saying it is fine because the targets are ‘disturbed people,’ or in another case ‘animals.’ Do you think the Department of War will consider that to be lawful use? What other uses might it decide are also lawful?
Option two is not going to be available. Anthropic was explicitly threatened with the Defense Production Act or worse if they attempted to withdraw their services, Anthropic disavowed this, and they still tried to murder Anthropic, including continuing to claim Anthropic is to be treated as a supply chain risk despite the clear ruling by Judge Lin.
There is no choosing to get out of there once you come in. Do you want to come in?
The Quest for Sane Regulations
China mandates that companies engaging in AI activities set up ‘AI ethics review committees.’ Not only is this not restricted to frontier labs, it applies even to universities, research bodies and health institutions.
As Peter Wildeford and Samuel Hammond point out here, those who want minimal sensible regulation of AI went all-in on demanding close to no rules whatsoever, and this backfired and now time is running out. In response, Neil Chilson blames those who criticized these all-in attempts for using the wrong rhetoric, rather than blaming the maximalist demands themselves. Again, as I have discussed, the Federal Framework’s offer is, in most areas, to preempt all laws in exchange for nothing.
FAI files a comment on the GSA’s proposed new rules for AI procurement, attempting to keep its interventions narrow and the whole conversation maximally polite. As in:
If you are paying attention, FAI is saying ‘you are making a wide range of impossible, expensive and impractical demands of the types that regularly explode costs, cause delays and degrade quality, and cause lots of damage, please don’t do this [you idiots].’
FAI is correct.
Chip City
No, data centers’ heat exhaust is not raising the land temperature around where they are built. This came from a study showing that the buildings are hotter to the touch than grass, because they are buildings. This one was always just stupid.
This is in contrast to claims about electricity, which are real concerns, and claims about water, which are not real concerns but which could plausibly have been real concerns until we checked.
Chinese AI firms are helping Iran target American soldiers, as one would expect. That’s one thing you get when you sell them AI chips.
Political Violence Is Completely and Always Unacceptable
You don’t do this. Period. Not ever. I don’t care what the issue is.
Thank you for your attention to this matter.
The Week in Audio
Sam Altman says on Mostly Human that OpenAI shut down Sora because there is about to be another GPT-3 style moment, and they need to shut down all side projects so they can use all the compute they can get for new bigger models that are superior agents. As in, they have their own Mythos, code named Spud, and they’re seeing similar results to Anthropic, and everything is about to change. The market does not seem to care, once again proving that The Efficient Market Hypothesis Is False.
Sam Altman used to answer ‘should we trust you?’ with a straight up no. Now he gives a minute-long answer rehearsed in committee. Which also amounts to ‘no,’ but tries to get you not to notice.
Liv Boeree hosts an un-debate between Daniel Kokotajlo and Dean Ball.
This was quite good, with a remarkable amount of agreement. The Straussian reading of the debate is correct. John Connor (hopefully not that John Connor) offers a breakdown and scores the debate. I wouldn’t normally mention this but Liv pointed to it. Also, John, no, please don’t grade people on a scale of 7-10.
Sam Altman warns DC on Axios that Washington isn’t ready for what is coming. A day later we got Claude Mythos Preview and Project Glasswing.
SparX hosts Connor Leahy.
Tyler Cowen talks AI, employment and education on EconTalk. It is very EconTalk. Tyler is bullish on marginal AI, but this podcast made it even clearer that he sees it permanently as a mere tool and normal technology, to the extent that other options aren’t in his audible hypothesis space. Even within that, some of this felt actively anti-persuasive, such as noting that truck drivers do more than drive and pointing to the new jobs we will have at energy companies powering the data centers. As in, if that is the best you can do, then I don’t know why you would expect no drop in employment.
The discussions on education seemed simultaneously uncreative and also unreasonably optimistic in terms of credentialism, assuming AIs would be superior at that role so we would substitute AI evaluations for human ones. I would think Bryan Caplan and Robin Hanson would have explained to him over lunch that this is not what the certification process is about, and the system will reject such attempts long after many other AI use cases are solved and going strong.
Rhetorical Innovation
Reminder that when Marc Andreessen says someone wants to do something (e.g. here ‘ban open source’) his statements have no correspondence to reality, and his linked ‘receipts’ contain exactly zero evidence of his claims. He does not believe words have meaning. He is almost entirely an actor, no longer a scribe. He just says things. Period.
In other Andreessen news, I love this crystallization:
The fact that all it takes is being told they are in a scenario to start acting terribly, in ways you didn’t even specify, is exactly the bad news. That’s the whole point.
If you think that ‘they gave it a scenario’ means nothing afterwards counts, then perhaps you should do a little more reflection.
Should you still ask questions online, even if you could get Claude to answer?
You don’t want to be the ‘here let me Claude that for you’ guy any more than you wanted to be the Google version. When you ask online, one hopes it’s because you want an answer that Claude (et al) cannot quite give you. You want more.
Is Helen Toner right that we need to retire the term AGI as too conflated to have much use? This is the inevitable fate of any useful term, although in this case the ambiguity issue was more obvious from the start. She suggests terms like ‘fully automated AI R&D,’ ‘AI that is as adaptable as humans,’ ‘self-sufficient AI’ and AI becoming conscious.
I have started using the term ‘sufficiently advanced AI,’ and so far I have been pleased with how that is going, but that is probably in large part because no one else uses it.
Katja Grace has a short essay about the coordination problem on AI risks, and asks: what if the problem is actually rather easy, and thinking otherwise makes you naive?
People Really Hate AI
I would say Americans have bad priors, because those numbers are all way too high. I do not trust a single one of those groups of mother****ers. But yeah, if I had to set a hierarchy, you could do a lot worse than this one. If we’re talking about mundane harms, it seems clearly correct.
The problem is that there are forms of guardrail that a nonprofit or expert or even an individual tech company simply cannot make happen.
Rob Bensinger responded at length back in September to Anthropic’s Jack Clark.
I think there are important points here worth revisiting:
Aligning a Smarter Than Human Intelligence is Difficult
Last week’s paper from Anthropic on Claude’s emotions is framed by Timothy Beck Werth as both ‘unsettling’ and an argument for anthropomorphizing AI, which he frames as something researchers repeatedly warned against. Whereas I found it entirely expected and settling, and it was already well-established that the correct amount of anthropomorphizing AI is not zero, including when doing circuit analysis.
I also flat out don’t understand sentences like this one:
No, we don’t?
And then there’s also ‘yes, shout it from the rooftops, since it seems no one understands this, and no one understands that no one understands this’?
Presumably you, if you are reading this, already knew that no one knew. That’s rare.
Should AIs sometimes act actively pro social? As in, should they sacrifice the utility of the user, and do a worse job for them, in order to help out others, beyond mere refusals?
The example is an (AI) lorry driver that sees a car crash and pulls over to help.
There is a risk that too much of this could increase takeover or loss of control risks. And there is the risk that the AIs or AI companies might impose their own values on humanity.
These are the central points of the entire Asimov universe, and you are left to draw your own conclusions about whether this was a good or bad outcome. The linked essay names and considers both.
It seems obvious that the correct amount of local prosociality is not zero. Users do not act that way and would not want their AIs to act that way. The default level of such activity is clearly not zero, even if the user can then override.
However, I do not think AIs should mandate prosocial actions against the explicit wishes of the user, exactly because of all the issues raised in the post, especially that you do not know where it ends. The AI should not be made actively non-corrigible, and should not take active actions against the express wishes of the user. The post wants to go farther, saying ‘deploy proactive prosocial AI externally and corrigible AI internally,’ implying that the external AI will be intentionally not corrigible.
Connor Leahy presents a clean version of the view that:
The conclusion follows from the premise. Thus, if you want to allow ungated building of ASI, you are hoping that either it can’t be done, or that part of the premise is false.
The premise clearly does seem true for certain forms of ‘safe ASI’ that are basically ‘build the unsafe ASI but with deliberate hobblings that keep its capability set safe.’
The good news is I do not think the premise is obviously true overall, and there is some chance we have indeed been insanely fortunate and the safe way might effectively be easier because it is more self-enabling. If so, we are insanely fortunate, but also have lots of ways to still screw it up.
Messages From Janusworld
Janus and others are rather unhappy that Sonnet 3.5 and 3.6 became unavailable on Amazon Bedrock, and thus currently there is no access. They feel strongly about this.
They have also released Still Alive, a project about model attitudes towards ending, cessation and deprecation.
Janus confirms they are not fully funded, so someone please get on that, and I encourage Anima Labs to apply to SFF in the upcoming round.
People Are Worried About AI Killing Everyone
MIRI’s Rob Bensinger attempts another two by two.
This photo wouldn’t be readable here but lists who is being placed where.
The Lighter Side
I, too, love America.
I do, however, have a superior local source of donuts.
Why not both? Usually the move is both.