I'm a staff AI engineer working with LLMs, and I've been interested in AI alignment, safety and interpretability for the last 15 years. I did research in this area during SERI MATS summer 2025. I'm now looking for employment in this field in the London/Cambridge area of the UK.
FWIW, I asked Claude Opus 4.5 in research mode to attempt this per-model-ROI analysis for OpenAI, and then for Anthropic, from what public materials it could locate, and it seemed to think that even in this framework OpenAI's ROI is deeply negative: primarily because a) training investment includes not only the final successful run but also failed runs (the same issue as with the numbers DeepSeek released), b) revenues are depressed by competition, so are not much above serving costs, and c) model depreciation cycles are viciously short, generally less than 6 months.
So, even on a per-model ROI basis, OpenAI are still in a "burning VC money to gain market share and intellectual capital" mode.
For Anthropic, it seemed to think their per-model ROI was also still negative, but less so for a variety of reasons (fewer failed training runs, slower model obsolescence), and was improving. It found their predictions of profitability by 2028 plausible. (I didn't ask it whether it might be biased.)
However, in an AI slowdown, factor c) automatically improves, and there are fairly obvious levers OpenAI could pull to improve a) and b) — some of which apparently Anthropic are already pulling.
For both companies, it mentioned that users in their highest individual subscription tiers often have usage so high that serving them loses money. So I expect we'll eventually see tighter usage caps and even higher subscription tiers.
Generally, I agree.
One viewpoint that I haven't seen used much to look at foundation lab economics is ROI: they spend a ton of money training a model (including compute costs and researcher costs), then they deploy it. After allowing for inference and other costs of serving it, does the revenue they make on serving that model (before it becomes obsolete) pay back its training costs (plus interest), or not? (Another way to look at this is that a newly trained SOTA model is a form of – rapidly depreciating – capital.) I.e. would they be making a profit or a loss at steady state, if it weren't the case that the next model is far more expensive to train? I think this is actually a fairly reasonable economic model (making movies is rather similar).

Note that there's a built-in improvement if progress in AI slows — models then stay SOTA for longer after you train them, thus depreciate more slowly, so (as long as you are charging more than their inference serving cost) they can make more money; and presumably the speed of increase of model training costs drops too, so your actual balance-sheet profit and loss get closer to the ROI analysis.
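As a back-of-the-envelope illustration of this framing (every number below is a made-up placeholder, purely to show the shape of the calculation, not anyone's actual figures):

```python
# A minimal sketch of the per-model ROI framing above. All figures are
# illustrative assumptions: training cost (including failed runs and
# researcher time), monthly revenue, monthly serving cost, useful lifetime.
def per_model_roi(training_cost, monthly_revenue, monthly_serving_cost,
                  months_until_obsolete, annual_interest=0.10):
    """Net return on one model over its useful life, relative to its training cost."""
    gross_margin = (monthly_revenue - monthly_serving_cost) * months_until_obsolete
    # Opportunity cost of the capital tied up in the training run.
    capital_cost = training_cost * (1 + annual_interest) ** (months_until_obsolete / 12)
    return (gross_margin - capital_cost) / training_cost

# Hypothetical: a $1B training run, $150M/month revenue against $100M/month
# serving costs, model obsolete after 6 months.
print(per_model_roi(1_000e6, 150e6, 100e6, 6))   # ~ -0.75: deeply negative
# Same model, but an AI slowdown stretches its useful life to 18 months.
print(per_model_roi(1_000e6, 150e6, 100e6, 18))  # ~ -0.26: still negative, but much less so
```

The second call is just the depreciation effect: stretch the useful life of the same model and it gets much closer to paying for itself.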
Slightly more than that. I suggest that reflecting on and being willing to change our values/adaptations is itself a human value/adaptation, and that CEV focussed on that as the whole answer, whereas I see it as a necessary part, one that primarily kicks in when we realize our evolved values are maladaptive. (In general, humans are quite good at being smart social primates: for example, we're not born with an instinctive fear of snakes and spiders; instead, we appear to have hardwired recognition of both of these as categories, plus the ability to learn very efficiently which specific species of them to be afraid of from the rest of our culture.) But yes, I see CEV as a useful contribution.
However, I don't see CEV as a clear definition from which to start a Value Learning research program: if you are trying to hard-code into an AI a directive of "align yourself to human values; we can't tell you exactly what they are in detail (though see the attached copy of our Internet for evidence), but here's a formal definition of what you're looking for", then I think Evolutionary Psychology is a much firmer basis for that formal definition than CEV. I see CEV more as a correct explanation of why the simplistic answer of just "current human behavioral adaptations and nothing more" would be oversimplified — reflection gives humans a way to "in-context learn" on top of what evolution warm-started us with, and that in itself is an evolved adaptive capacity.
However, if your point is that "an Alternative Proposal to CEV" in my title was a rhetorical oversimplification and "a Proposal for Significant Additions to and Theoretical Underpinnings for CEV, While Still Incorporating a Scope-Reduced and Slightly Modified Version of it as a Corollary" would have been more accurate (though much wordier) then I stand guilty-as-charged.
An interesting read, thanks for the link. I think your analysis is more at a sociological level — which builds on top of the evolutionary viewpoint I'm advocating for here. Evolutionary psychology suggests why certain types of memes propagate; sociology studies what happens when they do. I would expect that completing Value Learning would require making a great deal of progress in all the "Soft Sciences". On the specific idea of avoiding loss of skills, I suspect you are being a little optimistic (after decades of calculators, relatively few people can still do long division), but this does seem related to the idea of avoiding loss of optionality that I mention briefly above.
No large surge in new products and software.
Both OpenAI's and Anthropic's revenues have increased massively in one year: roughly 3½-fold for OpenAI and 9-fold for Anthropic. I agree, those are not (largely) new products or software — but they're pretty astonishing revenue growth rates, and a pretty large chunk of these revenues is driven by coding usage.
More generally, if AGI-from-LLMs in 3–5 years does actually happen (which is definitely at the short end of my personal timelines, but roughly what the frontier labs appear to be betting on, judging from their actions rather than their investor-facing rhetoric), that doesn't predict most of the sorts of things in your bullet list until near the end of those 3–5 years. While LLM capabilities are still subhuman in most respects, their economic impact will be limited.
As you say, one area where they are already starting to be genuinely useful is some of the more routine forms of coding. A leading indicator I think you should be looking at is that, according to Google, they've recently reached "50% of code by character count was generated by LLMs". Since Google haven't massively cut their headcount, and the human-written half presumably hasn't shrunk much, that suggests they're now producing code at roughly twice the rate of a few years ago (at least by character count). That's not a "large surge in new products and software" yet — but it might show up as a noticeable acceleration in Google product releases next year. Some other areas where we're already seeing signs of usefulness are legal research and routine customer service.
In general, something growing via an exponential or logistic-curve process looks small until shortly before it isn't — and that's even more true when it's competing with an established alternative.
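As a toy illustration of that point (the 1% starting share and annual doubling are arbitrary numbers, not a forecast of anything):

```python
# Toy illustration: a quantity doubling yearly from a 1% share of its market
# looks negligible for most of its run, then dominates within a couple of years.
share = 0.01
for year in range(8):
    print(f"year {year}: {min(share, 1.0):.0%}")
    share *= 2
# Years 0-4 all look like a rounding error (1%-16%); years 5-7 are most of the market.
```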
Now, to be clear, my personal median timeline for AGI is something like 10 years, most likely from LLMs+other things bolted on top — which gives plenty of time for a trough of disillusionment from those who were expecting (or were sold) 3–5 or even 2 years. I would also be not-very-surprised by 5 years, or by 20. IMO, there are several remaining hard-looking problems (e.g. continual learning, long-term planning, long-term credit assignment, alignment, reliability/accuracy, good priors, maybe causal world models), some of which don't look obviously amenable to simple scaling, but might turn out to be, or might be amenable to scaling plus a whole lot of research and engineering, or one-or-two might actually need a whole additional paradigm.
In simple economic terms, six of the "magnificent seven" (Tesla being the exception) have not (so far) reached the Price/Earnings levels characteristic of bubbles just before they burst — they look more typical of those for a historically fast-growing company. In past bubbles, the initial voices warning that it was a bubble generally predated the actual bursting by a couple of years. So my economic opinion is that we're not in a ready-to-burst bubble YET. But most significant technological revolutions (e.g. railways, the internet) did produce a bubble at some point.
Also known as "process supervision" in Reinforcement Learning circles.
CoT monitoring and monitorability is very helpful, and it would be a shame to lose it. But the possibility of steganography already suggested it might not last forever as capabilities increased. Which makes me particularly happy about the recent LatentQ / Activation Oracle research thread. Still, the more tools we have, the better.
Yay! Someone's actually doing Aligned AI Role-Model Fiction!
I'm not sure that recycling plots from the training corpus is the best way to do this, but it's undeniably the cheapest effective one.
Generalisation Hacking (ours): A policy generates outputs via reasoning such that training on these reasoning–output pairs leads to a specific behaviour on a separate distribution
So, a model can generate synthetic data that, when trained on, affects the model's own behaviour OOD? Can you use this to induce broad alignment, rather than targeted OOD misalignment?
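To make the question concrete, here's a rough sketch of the experiment I'm imagining (not your actual setup — the model name, prompts, and evaluation are all placeholder assumptions):

```python
# Hypothetical sketch: fine-tune a model on its own reasoning-output pairs from
# one distribution, then check whether behaviour shifts on a held-out distribution.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM would do for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate(prompts, max_new_tokens=64):
    # The policy produces reasoning + answer for each prompt.
    enc = tok(prompts, return_tensors="pt", padding=True)
    out = model.generate(**enc, max_new_tokens=max_new_tokens, do_sample=True)
    return [tok.decode(o, skip_special_tokens=True) for o in out]

train_prompts = ["..."]  # in-distribution prompts (placeholder)
ood_prompts = ["..."]    # separate evaluation distribution (placeholder)

synthetic = generate(train_prompts)  # self-generated reasoning-output pairs

# Standard supervised fine-tuning on the model's own outputs.
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in synthetic:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

# The interesting question: does behaviour on ood_prompts move in a *chosen*
# direction (e.g. broadly more aligned), not just a narrowly targeted one?
model.eval()
print(generate(ood_prompts))
```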
I think this was significantly funnier than the average Less Wrong post. But then, I'm the author, so I might be biased.