If you believe that LLMs lend themselves unusually well to alignment compared to other regimes, this can be a very good reason to start doing capability research on them rather than LLM safety research. Imagine you have these beliefs about how AI goes:
The two regime probabilities are the probability that the first ASI is LLM-based and the probability that it isn't - the two are mutually exclusive and sum to 100%.
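For reference, the overall quantity being compared can be written with the law of total probability. The symbols below are assumed shorthand (not necessarily the notation used originally), and the split presumes that some ASI does eventually get built:

```latex
P(\text{doom}) = P(\text{LLM})\,P(\text{doom}\mid\text{LLM})
               + P(\neg\text{LLM})\,P(\text{doom}\mid\neg\text{LLM}),
\qquad P(\text{LLM}) + P(\neg\text{LLM}) = 1
```

Each kind of work then corresponds to nudging one of these terms: safety research lowers a conditional doom probability, while capability research shifts weight between the two regimes.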
Let's imagine you are a super genius, and your effort alone makes something 10% more/less likely than it currently is. Then, starting with LLM safety research:
This is 1% less doom than doing nothing, congrats! Now for frontier capability work on LLMs - since these probabilities are about which regime reaches ASI first, pushing the LLM-first probability up also pulls the non-LLM-first probability down.
Woah, almost an additional 3% down! You could instead go the Steven Byrnes route:
An additional percentage down!
These numbers shouldn't be taken seriously - the '10% more/less likely than currently' assumption in particular is arbitrary. Different problems aren't equally movable: making LLM ASI happen when it otherwise wouldn't could be far harder than the other shifts (especially since many people are already trying), or making non-LLM ASI safe might be so hard that any effort is wasted. My main point is really just that working on capabilities can be a perfectly rational move for a safety-oriented agent, as long as they believe this regime is genuinely the safer one.
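The comparison can be sketched in a few lines of Python. Everything numeric here is an assumption: the three belief values are placeholders picked only so that the resulting deltas land roughly in line with the figures quoted above (they are not the original numbers), "10% more/less likely" is read as a relative change, and the "Steven Byrnes route" is read as safety research on non-LLM systems. The function and variable names are hypothetical too.

```python
def p_doom(p_llm, doom_llm, doom_other):
    """Total doom probability, split by which regime reaches ASI first."""
    return p_llm * doom_llm + (1 - p_llm) * doom_other


def compare(p_llm, doom_llm, doom_other, shift):
    """Return the baseline doom and each option's reduction, best first."""
    baseline = p_doom(p_llm, doom_llm, doom_other)
    after = {
        # Safety work lowers the conditional doom of one regime.
        "LLM safety research": p_doom(p_llm, doom_llm * (1 - shift), doom_other),
        "non-LLM safety research": p_doom(p_llm, doom_llm, doom_other * (1 - shift)),
        # Capability work raises P(LLM first), which lowers P(non-LLM first).
        "LLM capability research": p_doom(
            min(1.0, p_llm * (1 + shift)), doom_llm, doom_other
        ),
    }
    reductions = {name: baseline - doom for name, doom in after.items()}
    ranking = sorted(reductions.items(), key=lambda kv: kv[1], reverse=True)
    return baseline, ranking


# Placeholder beliefs (assumptions, not the original figures).
P_LLM = 0.5        # probability the first ASI is LLM-based
DOOM_LLM = 0.2     # doom probability if the first ASI is LLM-based
DOOM_OTHER = 0.98  # doom probability if it is not
SHIFT = 0.10       # relative change one person's effort is assumed to cause

baseline, ranking = compare(P_LLM, DOOM_LLM, DOOM_OTHER, SHIFT)
print(f"baseline doom: {baseline * 100:.1f}%")
for name, reduction in ranking:
    print(f"{name}: {reduction * 100:.1f} points less doom")
```

With these placeholder beliefs, LLM safety research buys about 1 point, LLM capability research about 3.9, and non-LLM safety research about 4.9. Change `DOOM_OTHER` to something close to `DOOM_LLM` and capability work drops to the bottom of the ranking, which is the sense in which the conclusion depends entirely on believing the LLM regime is the safer one.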
There's an interactive version of this - set your own numbers and it ranks the options.