This is a quick repost of a piece that used to be on my substack. It was written in February 2025.
TLDR: The central equation in AI economics is the relationship between the cost of generating a token and the value (e.g. GDP) that token generates in society. Intelligence efficiency and intelligence effectiveness are the two levers for maximising the value generated per unit of cost.
This chart has this whole essay on it. Feel free to stop reading here.
*
AI models can do well on benchmarks in one of two ways, or both simultaneously.
One way they do well is by throwing loads of tokens at a problem. Consider Ryan Greenblatt’s solution to the ARC benchmark with GPT-4o, or the BEAST adversarial attack method. The strategy is simple: have a relatively competent language model generate ideas for a long time, then have a relatively straightforward system prune or select among them.
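In code, this strategy is little more than sample-and-select. Here is a minimal sketch; `generate_candidate` and `score` are hypothetical stand-ins for a model call and a cheap verifier, not any particular lab’s API.

```python
import random

# Minimal sketch of the brute-force token strategy: sample many
# candidate solutions, then keep the best according to a cheap selector.
# `generate_candidate` and `score` are hypothetical stand-ins.

def generate_candidate(prompt: str) -> str:
    # Stand-in for a stochastic language-model call.
    return f"candidate {random.randint(0, 9999)} for: {prompt}"

def score(candidate: str) -> float:
    # Stand-in for a straightforward pruner, e.g. running a candidate
    # program against the known examples and counting matches.
    return random.random()

def solve_by_sampling(prompt: str, n_samples: int = 1000) -> str:
    candidates = [generate_candidate(prompt) for _ in range(n_samples)]
    return max(candidates, key=score)

print(solve_by_sampling("transform this ARC grid", n_samples=100))
```

The intelligence here lives mostly in the volume of samples rather than in any single one, which is why the cost per token dominates the economics of this approach.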
If you can’t increase the value of the individual tokens, the way you make these models smarter in practical terms is by making each token cheaper to generate. You do this by investing in algorithmic efficiency and compute efficiency, and by bringing down the price of electricity. I’m going to call this intelligence efficiency.
A second way that AI models can do well on benchmarks is by picking tokens extremely carefully. This is a more human-like strategy, akin to the way chess players recognise the important parts of a position and so cut down the processing required, or the way mathematicians internalise neuro-symbolic techniques for integrating information.
If you can’t decrease the cost of generating more tokens, the way you make these models smarter is by making sure the model outputs more valuable tokens. You do this by reinforcement learning on datasets that demonstrate good thinking technique and research taste. I’m going to call this intelligence effectiveness.
What’s a valuable token? There might be many ways to assign value to a token, but I’m going to say that a token’s value is equivalent to some minuscule sum of economic value. If a model processed a million tokens to automate a task worth 10 USD, each token would be worth 0.00001 USD.
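Spelled out as a formula (the notation here is mine, not the original chart’s): if a task worth V is automated with N tokens, then

$$ v_{\text{token}} = \frac{V}{N} = \frac{10\ \text{USD}}{10^{6}\ \text{tokens}} = 10^{-5}\ \text{USD per token}, $$

and automating the task only makes economic sense if the value per token exceeds the cost of generating one. Efficiency works on the cost side of that inequality; effectiveness works on the value side.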
This is hopefully a fairly non-controversial state of affairs from which I now want to make three points.
First, contemporary AI companies are obviously trying to do both at once: that is, they are trying to maximise both intelligence efficiency and intelligence effectiveness, though different companies lean on different levers. Google DeepMind, for instance, develops AI to save money on chips, thereby improving its intelligence efficiency, while S1 used highly targeted datasets to improve intelligence effectiveness.
From July 2024, it seemed like techniques for boosting intelligence effectiveness (e.g. RL) were stalling, whereas efficiency (through techniques like distillation) was improving rapidly. Now that post-training RL seems to be working, we should expect fast improvement in capabilities as models generate more tokens and each of those tokens individually becomes more valuable. Should returns to RL post-training diminish in the second half of 2025, we might expect labs to respond by pushing on efficiency instead, driving down token costs for tasks of the same economic value.
Second, it seems fair to say that a world in which effectiveness matters more than efficiency behaves differently, from a compute-use and regulation point of view, than one in which efficiency matters more. You can think of worlds where effectiveness is the most important thing as having high drawbridges (because capability depends on ideas and datasets) that fall fast (because datasets and ideas can be shared or distilled, after which anyone can run the resulting model for a small amount of compute). Conversely, you can think of worlds where efficiency is the most important thing as having low fences (because anyone with enough compute can get in) that fall slowly (because you still ultimately need compute to get in, and compute is getting cheaper, but comparatively slowly).
What does this mean in real terms? People have spoken a lot about the cheapness of DeepSeek and how efficient it was to train and run. However, as commentators like Dario have pointed out, this efficiency was basically on par with forecasts. Instead, the really worrying/impressive thing about DeepSeek, from a capabilities point of view, was that it seemed to lower the drawbridge of effectiveness (via RL), an approach since picked up by models like S1, and that even marginal effectiveness gains might multiply quickly in a world where intelligence is already quite efficient.
Third, it seems like there might be some value, towards the limit, in thinking about the economics of intelligence as a function describing the relationship between intelligence efficiency (unit: cost per token) and intelligence effectiveness (unit: value per token, in e.g. USD). By ‘towards the limit’, I mean that this framing becomes more valuable as we move towards scenarios where spending on AI is the most meaningful way of increasing national GDP. For now, the economics look ugly: models are unreliable, agents are nascent, diffusion is inefficient, and customers are unsure. In the long term, though (or if AI starts automating AI R&D and the feedback loop begins), it might be that an increasing amount of economics can be modelled by the simple relationship between token value and token cost.
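To illustrate what such a function might look like, here is a toy sketch in Python. The numbers and the surplus function are illustrative assumptions of mine, not figures from the chart; the point is only that the two levers act on the same margin.

```python
# Toy sketch of the token-value vs token-cost framing. All numbers
# and names here are illustrative assumptions, not figures from the essay.

from dataclasses import dataclass

@dataclass
class Task:
    economic_value_usd: float   # value of automating the task
    tokens_required: int        # tokens the model spends on it

def surplus(tasks: list[Task], cost_per_token_usd: float) -> float:
    """Net economic value: task value minus token spend, summed over tasks."""
    return sum(t.economic_value_usd - t.tokens_required * cost_per_token_usd
               for t in tasks)

tasks = [Task(10.0, 1_000_000), Task(2.0, 50_000)]

baseline = surplus(tasks, cost_per_token_usd=1e-5)

# Lever 1: intelligence efficiency -- the same tokens get cheaper.
efficiency_gain = surplus(tasks, cost_per_token_usd=5e-6)

# Lever 2: intelligence effectiveness -- fewer, better-chosen tokens
# deliver the same task value.
better_tasks = [Task(t.economic_value_usd, t.tokens_required // 2) for t in tasks]
effectiveness_gain = surplus(better_tasks, cost_per_token_usd=1e-5)

print(baseline, efficiency_gain, effectiveness_gain)  # 1.5 6.75 6.75
```

In this toy world, halving the cost per token and halving the tokens needed per task produce exactly the same surplus, which is the sense in which efficiency and effectiveness are two routes to the same economic quantity.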