RL-as-a-Service will outcompete AGI companies (and that's good)

by harsimony
8th Sep 2025
3 min read

Companies drive AI development today. There are two stories you could tell about the mission of an AI company:

AGI: AI labs will stop at nothing short of Artificial General Intelligence. With enough training and iteration, AI will develop a general ability to solve any (feasible) task. We can leverage this general intelligence to solve any problem, including how to make a profit.

Reinforcement Learning-as-a-Service (RLaaS)[1]: AI labs have an established process for training language models to attain high performance on clean datasets. By painstakingly creating benchmarks for problems of interest, they can solve any given problem with RL, leveraging language models as a general-purpose prior. This is essentially a version of the CAIS model.

[Image caption: Found here. I can't find the original Epoch article for this.]
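To make the RLaaS loop concrete, here is a minimal toy sketch of the pattern it describes: a benchmark of machine-checkable problems, a verifier that assigns reward, and a REINFORCE-style update that reinforces whichever behavior from the model's prior passes the check. Everything here, the arithmetic tasks, the three candidate "strategies", and the update rule, is an illustrative stand-in, not any lab's actual pipeline:

```python
import math
import random

# A benchmark of problems with machine-checkable answers: the
# "painstakingly created" dataset the RLaaS story depends on.
TASKS = [(a, b, a + b) for a in range(10) for b in range(10)]

# Stand-in for the language-model prior: a menu of plausible behaviors.
STRATEGIES = {
    "add":      lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "subtract": lambda a, b: a - b,
}
logits = {name: 0.0 for name in STRATEGIES}  # start uniform

def sample_strategy() -> str:
    names = list(logits)
    weights = [math.exp(logits[n]) for n in names]
    return random.choices(names, weights=weights)[0]

def verifier(answer: int, expected: int) -> float:
    """RLVR-style reward: 1.0 iff the answer checks out."""
    return 1.0 if answer == expected else 0.0

# Crude REINFORCE loop: sample a behavior, score it with the verifier,
# and raise or lower its logit relative to a 0.5 baseline.
for _ in range(500):
    a, b, expected = random.choice(TASKS)
    name = sample_strategy()
    reward = verifier(STRATEGIES[name](a, b), expected)
    logits[name] += 0.1 * (reward - 0.5)

print(max(logits, key=logits.get))  # converges to "add"
```

The point of the sketch is the division of labor: the expensive, human part is building TASKS and verifier for each new problem; the optimization machinery itself is generic.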

Both visions are ambitious in the sense that they aim to solve every problem. But RLaaS is more conservative because it tackles each problem separately and relies on some human effort to build datasets. RLaaS requires many models of limited capability honed on specific problems. AGI requires one model with high performance on many tasks.

So, which will dominate the market? I argue that RLaaS has both a better business case and creates less existential risk. It should be promoted.

Why RLaaS will win

RLaaS has proven performance 

We already know that training a model on enough task data is sufficient to get high performance on that task. This has been true for the last few decades in machine learning, but has come into focus with language models. AI companies improved model performance across dozens of benchmarks using RL and data from related tasks. The RLaaS model is proven.

The argument that general-purpose reasoning ability will transfer to many domains is more tenuous. Performance gains on, e.g., math problems do transfer to other domains, but only in a limited fashion. We've seen dramatic improvements on IMO performance, but this hasn't translated into dramatic gains in other fields.

This is not to say that better reasoning can't produce broad performance gains, just that for a conservative investor today, training a model on well-defined tasks is a safer bet.

RLaaS might cost less

It's safe to assume that a general-purpose model will be more complicated than a specialized model. So inference costs will be higher per task. However, that may not be a problem if the general-purpose model has much higher performance than the specialized model.[2]

Something I'm less certain about is the per-task R&D cost. To solve a particular task, building AGI requires substantially more investment than RLaaS, but amortized across enough tasks, the costs may be lower.[3]
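As a back-of-the-envelope illustration of this amortization point, compare a large one-off AGI R&D bill spread over n tasks against a fixed per-task RLaaS bill, plus per-query inference costs for each. All of the constants below are made up for the example:

```python
# Illustrative cost comparison; every number here is an assumption.
AGI_RND = 10_000_000_000         # one-off R&D, shared across every task
AGI_INFER = 0.010                # $/query for the large general model

RLAAS_RND_PER_TASK = 50_000_000  # dataset + RL environment, per task
RLAAS_INFER = 0.001              # $/query for a small specialized model

def cost_per_task(n_tasks: int, queries_per_task: int) -> tuple:
    """Total cost attributable to one task under each paradigm."""
    agi = AGI_RND / n_tasks + AGI_INFER * queries_per_task
    rlaas = RLAAS_RND_PER_TASK + RLAAS_INFER * queries_per_task
    return agi, rlaas

for n in (10, 100, 1000):
    agi, rlaas = cost_per_task(n, queries_per_task=10**9)
    winner = "AGI" if agi < rlaas else "RLaaS"
    print(f"{n:>5} tasks: AGI ${agi:,.0f} vs RLaaS ${rlaas:,.0f} -> {winner}")
```

With these particular numbers, the specialized route wins until the general model's R&D is amortized over roughly a thousand tasks; the crossover point is entirely an artifact of the assumed constants, but it shows how both stories can be locally correct.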

RLaaS is harder to copy

If data scaling drives model performance, you can carve out a safe niche by building your own private dataset for task X. By fine-tuning a model on that data, you now have the best model for completing task X. Competitors would have to go through the same trial-and-error process, which may not be worthwhile if you have a first-mover advantage.

Building AGI, by contrast, is harder to control. For one, there's always the risk that a more capable model is misaligned and simply escapes.

Even with an aligned model, it's not clear that AGI can be kept under wraps. Consider how quickly things like model release dates and algorithms diffuse in the AI industry. If the key is a handful of clever tricks, those details can be leaked pretty easily. And by virtue of being so valuable, there are stronger incentives to steal AGI. 

Merely knowing that you created AGI may be enough for others to retrace your steps. It's a lot easier to invent something when you already know it's possible.

AGI is inherently harder to control than a niche dataset.

RLaaS has lower misalignment risk

Present-day models trained on defined tasks are aligned with their users and creators. While these models may be used for malicious purposes, they pose little risk on their own. RLaaS is roughly aligned.[4]

However, the AGI model doesn't have the same assurances. Future AI systems trained under a different paradigm and operating in an open-ended fashion may be misaligned. This adds a substantial downside to developing and using such models.

Conclusion: RLaaS is better and should be promoted

RLaaS has proven performance and likely lower costs, is more excludable, and is safer. If this holds, most AI companies will pivot away from pursuing AGI and towards RLaaS.

That's good news because it promises a switch to a safer mode of AI development. To the degree that we can promote such a transition, RLaaS should be encouraged. In fact, I'm intentionally using the buzzword "RLaaS" for this reason. 

Of course, RLaaS is not without risks; misuse of specialized models is a near-term concern. In the future, the concatenation of specialized models may create or assist general intelligences.

But on balance, a transition to the RLaaS model would reduce AI risk and delay the arrival of AGI.

  1. ^

    This article is the first place I encountered the term.

  2. ^

    Though during deployment, it may make more sense to train a specialized model on outputs of the general model to save on inference costs.

  3. ^

    Another possible problem with this story is if AGI can complete tasks that aren't composed of smaller subtasks, unlocking unforeseen value that can't be achieved with RLaaS. I'm skeptical; for example, I can't think of a task that can't be completed by organizing enough smart people to work on subproblems. But it's worth mentioning.

  4. ^

    Models are aligned in practice, but are they aligned in theory? I think we're approaching an understanding of why neural networks generalize both in distribution and out of distribution.

    Informally, training chisels cognitive grooves into an agent. Results like the above make me hopeful that prosaic alignment is possible with models trained in the current paradigm.

Comments

Vladimir_Nesov

LLMs don't suffer from negative transfer, and might even have positive transfer between tasks (getting better at one task doesn't make them worse at other tasks). Most negative transfer visible in practice is about opportunity cost, where focusing on one area leads to neglecting other areas. So it's mostly about specialized data collection (including development of RLVR environments, or generation of synthetic "textbook" data), and that data can then be used in general models that can do all the tasks simultaneously.

In terms of business, the question is where the teams working on task-specific data are working. They could just be selling the data to the AI companies to be incorporated in the general models, and these teams might even become parts of those AI companies. Post-training open weights models for a single task mostly produces an inferior product, because the model will be worse than a general model at everything else, while the general model could do this particular task just as well (if it had the training data).

A better product might be possible with the smallest/cheapest task-specialized models, where there actually does start to be negative transfer: you can get them to some level of capability in any one area, but not in multiple areas at the same time. It's unclear if this remains a thing with models of 2026-2029 (when the "smallest/cheapest" models will be significantly larger than what is considered "smallest/cheapest" today), in particular because the prevailing standard of quality might grow into the lower cost of inferencing larger models, making the models that are small by today's standards unappealing.

So if the smallest economically important models get large enough, negative transfer might disappear, and there won't be a technical reason to specialize models, as long as you have all the task-specific data for all the tasks in the hands of one company. AI companies that produce foundation models are necessarily quite rich, because they need access to large amounts of training compute (2026 training compute is already about $30bn per 1 GW system for compute hardware alone, which is at least $15bn per year in the long term, but likely more since AI growth is not yet done). So it's likely that they'll manage to get access to good task-specific data for most of the economically important topics, by acquiring other companies if necessary, at which point the smaller task-specific post-training companies mostly don't have a moat, because their product is neither cheaper nor better than the general models of the big AI companies.

harsimony

These are good points. I'm uncertain about what models will form the foundation of RLaaS. But I think your point about where the task-specific data teams are working is more important. Off the top of my head, I think there are three bins:

  1. For a lot of programming tasks, big AI companies already have lots of expertise and users in-house, so I expect them to dominate production of code generation.
  2. For some tasks, like writing marketing copy, LLMs are already good enough. There's no business in training models further here.
  3. Most interesting are tasks that require lots of tacit knowledge or iteration. For example, getting to self-driving cars required a decade-plus of iterating on algorithms and data. I imagine lots of corporations will privately put a bunch of effort into making AI work on their specific problems. Physical tasks in specialized trades are another example.

For tasks in #3, the question is whether to join up with the big AI companies, or develop your own solution to the problem and keep it private.