When capabilities work is the *safe* bet

RobinHa

1st Jul 2026

This is a linkpost for https://robinhaselhorst.com/blog/capabilities-safe-bet

If you believe that LLMs lend themselves unusually well to alignment compared to other regimes, this can be a very good reason to start doing capability research on them rather than LLM safety research. Imagine you have these beliefs about how AI goes:

These include the probability that the first ASI is LLM-based and the probability that it isn't - the two are mutually exclusive and sum to 100%.

Let's imagine you are a super genius, and your effort alone makes something 10% more/less likely than currently. Then

This is 1% less doom than doing nothing, congrats! Now for frontier capability work on LLMs - since these probabilities are about which regime reaches ASI first, pushing the LLM-first probability up also pulls the non-LLM-first probability down.

Woah, almost an additional 3% down! You could also instead go the Steven Byrnes route:

An additional percentage down!
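
As a sketch of the arithmetic behind these three comparisons, here is one way to write the model down. The symbols $p$ (probability that the first ASI is LLM-based), $d_L$ and $d_N$ (probability of doom given an LLM-based or non-LLM first ASI) and $\varepsilon$ (the size of your shift, e.g. 0.1) are introduced here purely for illustration, and reading "10% more/less likely" as a relative change is an assumption.

```latex
\begin{align*}
D &= p\,d_L + (1 - p)\,d_N
  && \text{(baseline doom)} \\
\Delta D_{\text{LLM safety}} &= -\varepsilon\,p\,d_L
  && (d_L \to (1-\varepsilon)\,d_L) \\
\Delta D_{\text{LLM capabilities}} &= -\varepsilon\,p\,(d_N - d_L)
  && (p \to (1+\varepsilon)\,p) \\
\Delta D_{\text{non-LLM safety}} &= -\varepsilon\,(1 - p)\,d_N
  && (d_N \to (1-\varepsilon)\,d_N)
\end{align*}
```

Under this reading, LLM capabilities work beats LLM safety work exactly when $d_N > 2\,d_L$, that is, when you think the non-LLM regime is more than twice as dangerous as the LLM one.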

These numbers shouldn't be taken seriously - the '10% more/less likely than currently' assumption in particular is arbitrary. Different problems aren't equally movable: making LLM ASI happen when it otherwise wouldn't could be far harder than the other shifts (especially since many people are already trying), or making non-LLM ASI safe might be so hard that any effort is wasted. My main point is really just that working on capabilities can be a perfectly rational move for a safety-oriented agent, as long as they believe this regime is genuinely the safer one.

There's an interactive version of this - set your own numbers and it ranks the options
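
For readers who prefer code, here is a minimal Python sketch of the same kind of ranking. It is not the interactive version itself: the names `Beliefs`, `doom` and `rank_options` are hypothetical, the example numbers are placeholders rather than the ones used in this post, and the shift is assumed to be a relative change applied to one quantity at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beliefs:
    """Hypothetical container for the three beliefs the model needs."""
    p_llm_first: float    # probability that the first ASI is LLM-based
    doom_if_llm: float    # probability of doom if the first ASI is LLM-based
    doom_if_other: float  # probability of doom if the first ASI is not LLM-based


def doom(b: Beliefs) -> float:
    """Overall doom: weight each regime's doom by how likely it is to come first."""
    return b.p_llm_first * b.doom_if_llm + (1 - b.p_llm_first) * b.doom_if_other


def rank_options(b: Beliefs, shift: float = 0.10) -> list[tuple[str, float]]:
    """Return (option, doom reduction) pairs, largest reduction first.

    Assumption: `shift` is a relative change to a single quantity.
    """
    options = {
        # Safety work makes doom in that regime `shift` less likely.
        "LLM safety": Beliefs(
            b.p_llm_first, b.doom_if_llm * (1 - shift), b.doom_if_other
        ),
        # Capabilities work makes "LLM first" `shift` more likely (capped at 1),
        # which automatically makes "non-LLM first" less likely.
        "LLM capabilities": Beliefs(
            min(1.0, b.p_llm_first * (1 + shift)), b.doom_if_llm, b.doom_if_other
        ),
        "non-LLM safety": Beliefs(
            b.p_llm_first, b.doom_if_llm, b.doom_if_other * (1 - shift)
        ),
    }
    baseline = doom(b)
    reductions = {name: baseline - doom(nb) for name, nb in options.items()}
    return sorted(reductions.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder beliefs for illustration only.
example = Beliefs(p_llm_first=0.5, doom_if_llm=0.2, doom_if_other=0.8)
for name, reduction in rank_options(example):
    print(f"{name}: {reduction:.3f} less doom")
```

Swapping in your own beliefs changes the ranking; with `doom_if_other` less than twice `doom_if_llm`, for instance, LLM safety work comes out ahead of LLM capabilities work.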
