Authors: Joshua Qin, Mohammad Khan, Jaray Liu
This blog post covers material for the sixth lecture of Harvard CS 2881r: AI Safety and Alignment, taught by Boaz Barak. We provide an outline of the lecture below.
The idea of recursive self-improvement lies at the core of contemporary debates over AI takeoff dynamics. As models become active participants in their own development pipelines, such as optimizing architectures, generating training data, and automating research, the feedback loop between capability and capability growth begins to close. Recent analyses, from Epoch’s Gate Model to Anthropic’s “Takeoff Speeds” and the Three Types of Intelligence Explosion framework, attempt to formalize this transition: when does scaling shift from incremental efficiency gains to endogenous acceleration? In this view, recursive self-improvement is not a speculative singularity but an emergent property of scaling laws, where performance improvement itself becomes the substrate of further progress. The frontier question, then, is quantitative rather than philosophical: what functional form best describes the curve of intelligence when the optimizer learns to optimize itself?
We provide an economic viewpoint of intelligence. To formalize a notion of intelligence growth, we define an intelligence function $I(t)$, which can measure, say, the length of time a human would take to do tasks that AI can successfully complete at some fixed success rate, cost, and wall-clock time, e.g. 50% success at $100 within an hour. We consider three options for the growth of $I(t)$:
1. Constant AI growth: $\frac{dI}{dt} = c$, so $I(t)$ grows linearly.
2. More intelligence = more growth: $\frac{dI}{dt} = c \cdot I(t)$, so $I(t)$ grows exponentially.
3. More intelligence = more growth in less time: $\frac{dI}{dt} = c \cdot I(t)^{1+\epsilon}$ for some $\epsilon > 0$, so $I(t)$ grows super-exponentially.
This intelligence function serves as the foundation for production functions and models of automation. The third possibility, a super-exponential function, is the most concerning: as intelligence increases, the time required for each improvement shrinks, and $I(t)$ reaches a vertical asymptote at some finite time $t^*$. British mathematician I.J. Good described the phenomenon as one where, once an AI can autonomously improve its own design, “intelligence will increase rapidly until it far surpasses human intelligence.”
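To make the third option concrete, here is a short derivation, under the illustrative normalization $c = 1$, showing that $\frac{dI}{dt} = I^{1+\epsilon}$ blows up at a finite time $t^*$:

```latex
% Separate variables in dI/dt = I^{1+\epsilon}, with I(0) = I_0 > 0 and \epsilon > 0:
\int I^{-(1+\epsilon)}\,dI = \int dt
\quad\Longrightarrow\quad
-\tfrac{1}{\epsilon} I(t)^{-\epsilon} = t - \tfrac{1}{\epsilon} I_0^{-\epsilon}.

% Solving for I(t), which diverges at the vertical asymptote t^*:
I(t) = \bigl( I_0^{-\epsilon} - \epsilon t \bigr)^{-1/\epsilon},
\qquad
t^* = \frac{I_0^{-\epsilon}}{\epsilon}.
```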
The Baumol Cost Disease offers an interesting counterpoint to the dream of boundless automation. Imagine an economy with farmers and teachers, each earning $1 per day. Farmers produce six meals daily, and teachers educate four students. Each contributes equally to GDP. Then, a technological leap quintuples farming productivity, and farmers now produce thirty meals a day. Yet wages must remain roughly balanced: if a farmer earns $X, a teacher must too, or no one would choose teaching. Fewer farmers are now needed to feed everyone, but education remains stubbornly human-bound; each teacher still reaches only a handful of students.
The paradox is that even as the economy’s aggregate productivity soars, the relative cost of education rises, because it resists automation. Under Baumol’s framing, sectors that cannot scale with technology, those rooted in human time, attention, and care, grow more expensive not because they worsen, but because everything else becomes more efficient and technologically advanced.
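A minimal numerical sketch of the toy economy above (the wage normalization and the cost-per-unit framing are our illustrative additions):

```python
# Toy Baumol model: two sectors, and wages must stay roughly equal across
# sectors or no one would choose the lower-paid job.
wage = 1.0  # dollars per worker-day, equalized across farming and teaching

def unit_costs(meals_per_farmer_day, students_per_teacher_day, wage):
    """Labor cost embedded in one meal and in one student-day of education."""
    return wage / meals_per_farmer_day, wage / students_per_teacher_day

# Before the productivity leap: 6 meals/day and 4 students/day per worker.
meal_before, edu_before = unit_costs(6, 4, wage)
# After farming productivity quintuples: 30 meals/day, teaching unchanged.
meal_after, edu_after = unit_costs(30, 4, wage)

print(f"Cost per meal:        {meal_before:.3f} -> {meal_after:.3f}")
print(f"Cost per student-day: {edu_before:.3f} -> {edu_after:.3f}")
print(f"Education relative to food: {edu_before/meal_before:.1f}x -> {edu_after/meal_after:.1f}x")
```

Education’s absolute cost never changes, yet its price relative to food jumps from 1.5x to 7.5x, which is exactly Baumol’s point.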
On a more philosophical note, recursive self-improvement represents the theoretical endpoint of curing Baumol’s disease. The bottleneck in education, healthcare, and creative work has always been their dependence on human cognition, on tasks that scale linearly with attention, empathy, or expertise. But if intelligence itself becomes automatable, the constraint dissolves. A self-improving AI doesn’t merely accelerate output in capital-intensive sectors; it erodes the wage equalization mechanism that once kept low-automation work expensive. When teaching, diagnosis, and design can be recursively optimized by the same systems that once only optimized manufacturing, the traditional divergence between “productive” and “stagnant” sectors can collapse.
To further model the rate at which our intelligence function grows, we take the production of intelligence to be a weighted function of current intelligence and of compute, where compute is represented by a function $C(t)$ capturing current compute capabilities. We thus define $\frac{dI}{dt} = I(t)^{\alpha}\,C(t)^{1-\alpha}$ for some weight $0 < \alpha < 1$, and we also suppose compute itself scales with intelligence, such that $C(t) \propto I(t)^{c}$. Depending on the value of $c$, which measures the elasticity of compute growth with respect to intelligence, we see how smart systems can accelerate their own learning. When $c < 1$, compute growth lags behind intelligence growth and yields only polynomial improvement, preventing RSI from taking off in an explosion; RSI “fizzles.” When $c = 1$, we get balanced exponential growth, in which $I(t) \propto e^{kt}$. Finally, when $c > 1$, self-reinforcing feedback creates the “intelligence explosion” effect, in which intelligence diverges at some finite time $t^*$.
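A minimal simulation sketch of this model (the Cobb-Douglas form, the parameter values, and the blowup threshold below are illustrative assumptions, not figures from the lecture):

```python
def simulate_intelligence(c, alpha=0.5, I0=1.0, dt=1e-3, t_max=10.0):
    """Euler-integrate dI/dt = I^alpha * C^(1-alpha) with C = I^c.

    Substituting C = I^c gives dI/dt = I^(alpha + c*(1-alpha)), so the effective
    exponent is < 1 (polynomial growth), = 1 (exponential growth), or > 1
    (finite-time blowup) exactly when c < 1, c = 1, or c > 1.
    """
    I, t = I0, 0.0
    while t < t_max:
        dI = (I ** alpha) * ((I ** c) ** (1 - alpha)) * dt
        I += dI
        t += dt
        if I > 1e12:  # treat runaway growth as an "explosion" and stop early
            break
    return t, I

for c in (0.5, 1.0, 1.5):
    t_end, I_end = simulate_intelligence(c)
    print(f"c = {c}: I(t={t_end:.2f}) = {I_end:.3e}")  # fizzle / exponential / explosion
```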
Using our conception of intelligence, we can also analyze how tasks are automated over time, where we consider exponential intelligence growth such that $I(t) = e^{\lambda t}$. Let’s assume that tasks have a “complexity” random variable $X$ with pdf $f(x)$ that is heavy-tailed, such that its survival function, the complement of the CDF, follows $\Pr[X > x] \sim x^{-\beta}$. Then, at time $t$, we assume tasks with complexity $X \le I(t)$ will be automated, and thus the proportion of tasks that are not yet automated by time $t$ follows $\Pr[X > I(t)] \sim I(t)^{-\beta} = e^{-\beta \lambda t}$, so the fraction of non-automated tasks diminishes exponentially.
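A quick Monte Carlo check of this claim under the stated assumptions (the particular values of $\beta$, $\lambda$, and the sample size are arbitrary illustrations):

```python
import math
import random

# Sample task complexities from a Pareto tail Pr[X > x] = x^(-beta) for x >= 1
# via inverse-CDF sampling, then compare the empirical fraction of tasks still
# above the automation frontier I(t) = e^(lambda*t) to the predicted e^(-beta*lambda*t).
beta, lam, n_tasks = 2.0, 0.5, 200_000
complexities = [random.random() ** (-1.0 / beta) for _ in range(n_tasks)]

for t in range(0, 11, 2):
    frontier = math.exp(lam * t)                       # I(t)
    empirical = sum(x > frontier for x in complexities) / n_tasks
    predicted = math.exp(-beta * lam * t)
    print(f"t={t:2d}  non-automated: empirical={empirical:.4f}  predicted={predicted:.4f}")
```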
To concretely understand the implications of recursive self-improvement, we discussed AI 2027, a detailed forecast scenario that includes self-improvement predictions, and summarized the RSI-specific predictions:
Early 2026 – 50% Faster Research Process Using AI Assistants. This means that frontier labs are making algorithmic improvements 50% faster[1] using assistants than they would have been able to without them.
January 2027 – Online Learning Models Specializing at AI R&D. Frontier labs will be continuously developing (but will likely opt not to release) a model that is “as good as the top human experts at research engineering” and has similar “research taste” (intuition on the best research direction) to a mediocre research scientist working at a lab[2].
March 2027 – Algorithmic Breakthroughs Aided by AI Assistants. Entire datacenters are used by labs to host frontier models that run experiments and produce synthetic data that continually make the agents smarter. This is accompanied by breakthroughs in model design that further speed up this process[3].
This is accompanied by the production of an AI “superhuman coder”. METR, a company that develops AI evals, has observed that the time horizon of coding tasks that AI agents can complete reliably and autonomously doubles roughly once every 7 months. One way of formulating the question of recursive self-improvement is whether we will continue following the METR curve (shown below) or “explode” in intelligence, leading to much more drastic-than-expected results, such as progress occurring in a few years that we only expected on the timescale of decades.
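As a rough illustration of what “continuing to follow the METR curve” would mean quantitatively (the 1-hour starting horizon below is a made-up placeholder, not METR’s actual figure):

```python
# Extrapolate a task time horizon that doubles every 7 months.
DOUBLING_MONTHS = 7.0

def horizon(months_from_now, current_horizon_hours):
    """Autonomously-completable task length (hours) after `months_from_now`,
    assuming the 7-month doubling trend simply continues unchanged."""
    return current_horizon_hours * 2 ** (months_from_now / DOUBLING_MONTHS)

# Hypothetical starting point: agents can handle ~1-hour coding tasks today.
for years in (1, 2, 3, 5):
    h = horizon(12 * years, current_horizon_hours=1.0)
    print(f"+{years} years: ~{h:.0f} hours" if h < 100 else f"+{years} years: ~{h/24:.0f} days")
```

Recursive self-improvement is the question of whether AI assistance bends this curve upward rather than merely riding it.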
Importantly, although the authors of AI 2027 note that their predictions after 2026 are quite speculative, several forecasters believe that a “superhuman coder” model will appear by 2027:
June 2027 – AI Does Most Research. Frontier labs now have several hundred thousand copies of their SC (superhuman coder) model running at many times the speed of the human brain. Due to sheer size and speed, most human researchers find themselves simply managing progress instead of catalyzing it, and increasingly feel left behind.
September 2027 – Superhuman Researcher Model. The previous SC model, optimized for research, makes algorithmic strides and develops a new model that is much more efficient and capable, vastly superior to any human at AI research. Despite the ability to run several thousand copies in parallel, AI R&D is only sped up by ~50x due to bottlenecks from compute.
[1] The authors define the “AI R&D multiplier,” a measure of how much faster AI research can be conducted using AI assistants. They break AI progress into two components: increased compute and improved algorithms, only the latter of which is significantly augmented by agents. Importantly, the AI R&D multiplier only measures the relative speed of progress, so bottlenecks and roadblocks are encountered all the same (dealing with these roadblocks can be much quicker using AI assistants, however).
[2] The authors believe that research taste will be more difficult to train because of “longer feedback loops and less data availability.”
[3] The authors list neuralese and iterated distillation and amplification as examples of such breakthroughs.
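To see why compute bottlenecks cap the overall multiplier even when cognitive labor becomes arbitrarily fast, here is a small Amdahl’s-law-style calculation (Amdahl’s law is our framing, not the AI 2027 authors’; the 2% experiment-bound fraction is chosen purely to illustrate how a ~50x ceiling can arise):

```python
def rnd_multiplier(cognitive_speedup, experiment_bound_fraction):
    """Overall R&D speedup when only the cognitive share of the work is
    accelerated and the rest (e.g. waiting on compute-bound experiments)
    proceeds at the old speed; this is Amdahl's law applied to research."""
    return 1.0 / (experiment_bound_fraction
                  + (1.0 - experiment_bound_fraction) / cognitive_speedup)

# If (illustratively) 2% of research wall-clock time is irreducibly
# experiment/compute-bound, even unboundedly fast AI researchers top out near 50x.
for speedup in (10, 100, 1_000, 1_000_000):
    print(f"cognitive speedup {speedup:>9,}x -> overall ~{rnd_multiplier(speedup, 0.02):.1f}x")
```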
In lecture 6, Ege, Dashiell, Hannah, and Julia presented an experiment exploring whether composing multiple agents together increases the difficulty of tasks that can be solved. Specifically, they looked at whether composition and task delegation can solve various RSI benchmark tasks, which could then lead to an intelligence boom.
The team tested three agent configurations: the first with agents organized in a binary tree, the second with agents organized in a star graph, and finally a single agent acting as a control group. To evaluate, the team chose tasks often seen in research processes, such as identifying dataset families and training models.
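A minimal sketch of how these delegation topologies might be wired up (this is our illustrative reconstruction, not the team’s actual code; `call_llm` is a stand-in for whatever model API was used):

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM call; the real experiment would hit a model
    API here. Echoing the prompt keeps the sketch self-contained and runnable."""
    return f"[response to: {prompt[:60]}]"

def binary_tree_agent(task: str, depth: int = 2) -> str:
    """Depth-based delegation: each agent splits its task in two, delegates, and
    synthesizes the children's answers. depth=2 gives 1 + 2 + 4 = 7 LLM calls."""
    if depth == 0:
        return call_llm(f"Solve directly: {task}")
    left = binary_tree_agent(f"First part of: {task}", depth - 1)
    right = binary_tree_agent(f"Second part of: {task}", depth - 1)
    return call_llm(f"Combine partial results for '{task}': {left} | {right}")

def star_agent(task: str, n_workers: int = 6) -> str:
    """Breadth-based delegation: one coordinator farms subtasks out to workers
    and aggregates, giving 1 + 6 = 7 LLM calls."""
    outputs = [call_llm(f"Subtask {i} of: {task}") for i in range(n_workers)]
    return call_llm(f"Aggregate results for '{task}': " + " | ".join(outputs))

def single_agent(task: str) -> str:
    """Control: a single LLM call with no delegation."""
    return call_llm(task)
```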
When evaluated on a task involving classifying medical data, the binary tree system performed best, achieving 0.916 accuracy compared to the 0.890 and 0.854 prediction accuracy achieved by the star graph and the single agent, respectively. Notably, while the multi-agent systems each had 7 LLM calls worth of reasoning, compared to the single agent’s 1 LLM call, task performance across the three was comparable, with the binary tree and star graph performing slightly better, as expected. Moreover, on an exploratory data analysis task, all agent configurations performed much worse than expected.
Overall, the experiments suggest that depth-based agent architectures, like the binary tree, seem to elicit more exploratory behavior in the task delegation, while breadth-first architectures like the star graph elicit a more brute-force approach to problem solving. Agents also seem to perform better with well-specified and structured objectives than with open-ended tasks like EDA, which could pose a significant challenge to achieving RSI. Another insight from the experiments is that agents must be given localized tools, as many agents attempted forbidden or out-of-domain operations, which poses alignment and security concerns, especially if agents are given extended freedom in an uncontained environment. Ultimately, even with structured task delegation and multi-agent architectures, the experiments reveal that there are still significant barriers to RSI toward which future work could be directed.
The forecasts presented in this week’s readings rest upon several layers of assumptions, such as progress timelines, future bottlenecks, and the limits of effective computation, each carrying its own wide range of uncertainty. When these assumptions are layered on top of one another, small estimation errors multiply, and the resulting variance becomes so large that single-point predictions lose much of their statistical meaning.
If, for instance, the resource cost of data centers skyrockets, or AI-assisted research progress plateaus because experiments can’t be parallelized, or regulatory legislation limits the rate of progress through government oversight, any potential self-improvement loop could fall short of an “intelligence explosion” as defined in the readings. With the relative novelty of this technology, it is difficult to foresee how current challenges will be solved as AI evolves, much less predict what future challenges might look like and how they in turn will be handled.
Still, the readings are useful not because they offer accurate figures predicting AI advancements, but because they provide ways for us to reason about uncertainty. Ultimately, their value lies in laying out frameworks for thinking about the potential trajectories, constraints, and implications of rapid progress towards superintelligence. The bottom line is that we don’t know how AI will affect AI advancement, just that it will have an impact. With so many factors at play and so many unknowns, it’s possible that AI will simply help sustain current trends (exponential growth), create a short-term boost that eventually tapers off, or initiate a period of accelerating growth that fundamentally alters the trajectory of technological development (a superintelligence boom).