RL-as-a-Service will outcompete AGI companies (and that's good)

by harsimony
8th Sep 2025
3 min read

Companies drive AI development today. There are two stories you could tell about the mission of an AI company:

AGI: AI labs will stop at nothing short of Artificial General Intelligence. With enough training and iteration, AI will develop a general ability to solve any (feasible) task. We can leverage this general intelligence to solve any problem, including how to make a profit.

Reinforcement Learning-as-a-Service (RLaaS)[1]: AI labs have an established process for training language models to attain high performance on clean datasets. By painstakingly creating benchmarks for problems of interest, they can solve any given problem with RL, leveraging language models as a general-purpose prior. This is essentially a version of the CAIS model.

[Image caption: Found here. I can't find the original Epoch article for this.]
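To make the RLaaS loop concrete, here is a minimal toy sketch of the pattern it describes: a benchmark of machine-checkable problems, a verifier that assigns reward, and a REINFORCE-style update that reinforces whichever behavior from the model's prior passes the check. Everything here, the arithmetic tasks, the three candidate "strategies", and the update rule, is an illustrative stand-in, not any lab's actual pipeline:

```python
import math
import random

# A benchmark of problems with machine-checkable answers: the
# "painstakingly created" dataset the RLaaS story depends on.
TASKS = [(a, b, a + b) for a in range(10) for b in range(10)]

# Stand-in for the language-model prior: a menu of plausible behaviors.
STRATEGIES = {
    "add":      lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "subtract": lambda a, b: a - b,
}
logits = {name: 0.0 for name in STRATEGIES}  # start uniform

def sample_strategy() -> str:
    names = list(logits)
    weights = [math.exp(logits[n]) for n in names]
    return random.choices(names, weights=weights)[0]

def verifier(answer: int, expected: int) -> float:
    """RLVR-style reward: 1.0 iff the answer checks out."""
    return 1.0 if answer == expected else 0.0

# Crude REINFORCE loop: sample a behavior, score it with the verifier,
# and raise or lower its logit relative to a 0.5 baseline.
for _ in range(500):
    a, b, expected = random.choice(TASKS)
    name = sample_strategy()
    reward = verifier(STRATEGIES[name](a, b), expected)
    logits[name] += 0.1 * (reward - 0.5)

print(max(logits, key=logits.get))  # converges to "add"
```

The point of the sketch is the division of labor: the expensive, human part is building TASKS and verifier for each new problem; the optimization machinery itself is generic.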

Both visions are ambitious in the sense that they aim to solve every problem. But RLaaS is more conservative because it tackles each problem separately and relies on some human effort to build datasets. RLaaS requires many models of limited capability honed on specific problems. AGI requires one model with high performance on many tasks.

So, which will dominate the market? I argue that RLaaS has both a better business case and creates less existential risk. It should be promoted.

Why RLaaS will win

RLaaS has proven performance 

We already know that training a model on enough task data is sufficient to get high performance on that task. This has been true for the last few decades in machine learning, but has come into focus with language models. AI companies improved model performance across dozens of benchmarks using RL and data from related tasks. The RLaaS model is proven.

The argument that general-purpose reasoning ability will transfer to many domains is more tenuous. Performance gains on, e.g., math problems do transfer to other domains, but only in a limited fashion. We've seen dramatic improvements on IMO performance, but this hasn't translated into dramatic gains in other fields.

This is not to say that better reasoning can't produce broad performance gains, just that for a conservative investor today, training a model on well-defined tasks is a safer bet.

RLaaS might cost less

It's safe to assume that a general-purpose model will be more complicated than a specialized model. So inference costs will be higher per task. However, that may not be a problem if the general-purpose model has much higher performance than the specialized model.[2]

Something I'm less certain about is the per-task R&D cost. To solve a particular task, building AGI requires substantially more investment than RLaaS, but amortized across enough tasks, the costs may be lower.[3]
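As a back-of-the-envelope illustration of this amortization point, compare a large one-off AGI R&D bill spread over n tasks against a fixed per-task RLaaS bill, plus per-query inference costs for each. All of the constants below are made up for the example:

```python
# Illustrative cost comparison; every number here is an assumption.
AGI_RND = 10_000_000_000         # one-off R&D, shared across every task
AGI_INFER = 0.010                # $/query for the large general model

RLAAS_RND_PER_TASK = 50_000_000  # dataset + RL environment, per task
RLAAS_INFER = 0.001              # $/query for a small specialized model

def cost_per_task(n_tasks: int, queries_per_task: int) -> tuple:
    """Total cost attributable to one task under each paradigm."""
    agi = AGI_RND / n_tasks + AGI_INFER * queries_per_task
    rlaas = RLAAS_RND_PER_TASK + RLAAS_INFER * queries_per_task
    return agi, rlaas

for n in (10, 100, 1000):
    agi, rlaas = cost_per_task(n, queries_per_task=10**9)
    winner = "AGI" if agi < rlaas else "RLaaS"
    print(f"{n:>5} tasks: AGI ${agi:,.0f} vs RLaaS ${rlaas:,.0f} -> {winner}")
```

With these particular numbers, the specialized route wins until the general model's R&D is amortized over roughly a thousand tasks; the crossover point is entirely an artifact of the assumed constants, but it shows how both stories can be locally correct.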

RLaaS is harder to copy

If data scaling drives model performance, you can carve out a safe niche by building your own private dataset for task X. By fine-tuning a model on that data, you now have the best model for completing task X. Competitors would have to go through the same trial-and-error process, which may not be worthwhile if you have a first-mover advantage.

Building AGI, by contrast, is harder to control. For one, there's always the risk that a more capable model is misaligned and simply escapes.

Even with an aligned model, it's not clear that AGI can be kept under wraps. Consider how quickly things like model release dates and algorithms diffuse in the AI industry. If the key is a handful of clever tricks, those details can be leaked pretty easily. And by virtue of being so valuable, there are stronger incentives to steal AGI. 

Merely knowing that you created AGI may be enough for others to retrace your steps. It's a lot easier to invent something when you already know it's possible.

AGI is inherently harder to control than a niche dataset.

RLaaS has lower misalignment risk

Present-day models trained on defined tasks are aligned with their users and creators. While these models may be used for malicious purposes, they pose little risk on their own. RLaaS is roughly aligned.[4]

However, the AGI model doesn't have the same assurances. Future AI systems trained under a different paradigm and operating in an open-ended fashion may be misaligned. This adds a substantial downside to developing and using such models.

Conclusion: RLaaS is better and should be promoted

RLaaS has proven performance and likely lower costs, is more excludable, and is safer. If this holds, most AI companies will pivot away from pursuing AGI and towards RLaaS.

That's good news because it promises a switch to a safer mode of AI development. To the degree that we can promote such a transition, RLaaS should be encouraged. In fact, I'm intentionally using the buzzword "RLaaS" for this reason. 

Of course, RLaaS is not without risks; misuse of specialized models is a near-term concern. In the future, the concatenation of specialized models may create or assist general intelligences.

But on balance, a transition to the RLaaS model would reduce AI risk and delay the arrival of AGI.

  1. ^

    This article is the first place I encountered the term.

  2. ^

    Though during deployment, it may make more sense to train a specialized model on outputs of the general model to save on inference costs.

  3. ^

    Another possible problem with this story is if AGI can complete tasks that aren't composed of smaller subtasks, unlocking unforeseen value that can't be achieved with RLaaS. I'm skeptical; for example, I can't think of a task that can't be completed by organizing enough smart people to work on subproblems. But it's worth mentioning.

  4. ^

    Models are aligned in practice, but are they aligned in theory? I think we're approaching an understanding of why neural networks generalize both in distribution and out of distribution.

    Informally, training chisels cognitive grooves into an agent. Results like the above make me hopeful that prosaic alignment is possible with models trained in the current paradigm.

Comments

Vladimir_Nesov

LLMs don't suffer from negative transfer, and might even have positive transfer between tasks (getting better at one task doesn't make them worse at other tasks). Most negative transfer visible in practice is about opportunity cost, where focusing on one area leads to neglecting other areas. So it's mostly about specialized data collection (including development of RLVR environments, or generation of synthetic "textbook" data), and that data can then be used in general models that can do all the tasks simultaneously.

In terms of business, the question is where the teams working on task-specific data are working. They could just be selling the data to the AI companies to be incorporated in the general models, and these teams might even become parts of those AI companies. Post-training open weights models for a single task mostly produces an inferior product, because the model will be worse than a general model at everything else, while the general model could do this particular task just as well (if it had the training data).

A better product might be possible with the smallest/cheapest task-specialized models, where there actually does start to be negative transfer: you can get them to some level of capability in any one area, but not in multiple areas at the same time. It's unclear if this remains a thing with models of 2026-2029 (when the "smallest/cheapest" models will be significantly larger than what is considered "smallest/cheapest" today), in particular because the prevailing standard of quality might grow into the lower cost of inferencing larger models, making the models that are small by today's standards unappealing.

So if the smallest economically important models get large enough, negative transfer might disappear, and there won't be a technical reason to specialize models, as long as you have all the task-specific data for all the tasks in the hands of one company. AI companies that produce foundation models are necessarily quite rich, because they need access to large amounts of training compute (2026 training compute is already about $30bn per 1 GW system for compute hardware alone, which is at least $15bn per year in the long term, but likely more since AI growth is not yet done). So it's likely that they'll manage to get access to good task-specific data for most of the economically important topics, by acquiring other companies if necessary, at which point the smaller task-specific post-training companies mostly don't have a moat, because their product is neither cheaper nor better than the general models of the big AI companies.

harsimony

These are good points. I'm uncertain about what models will form the foundation of RLaaS. But I think your point about where the task-specific data teams are working is more important. Off the top of my head, I think there are three bins:

  1. For a lot of programming tasks, big AI companies already have lots of expertise and users in-house, so I expect them to dominate production of code generation.
  2. For some tasks, like writing marketing copy, LLMs are already good enough. There's no business in training models further here.
  3. Most interesting are tasks that require lots of tacit knowledge or iteration. For example, getting to self-driving cars required a decade-plus of iterating on algorithms and data. I imagine lots of corporations will privately put a bunch of effort into making AI work on their specific problems. Physical tasks in specialized trades are another example.

For tasks in #3, the question is whether to join up with the big AI companies, or develop your own solution to the problem and keep it private.