RL vs SGD does not seem to be a correct framing.
Very roughly speaking, RL is about what you optimize for (a particular class of objectives), whereas SGD is one of many optimization methods. In fact, SGD and its cousins are highly useful inside RL itself: policy-gradient methods, for example, are ultimately gradient descent on an RL objective.
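To make that concrete, here is a toy sketch (a two-armed bandit with a tabular softmax policy; the setup, learning rate, and step count are just illustrative) where the objective is an RL one but the optimizer underneath is literally plain SGD:

```python
import torch

# Toy two-armed bandit: arm 1 always pays 1.0, arm 0 pays 0.0.
# The *RL* part is the objective (maximize expected reward);
# the optimizer underneath is plain SGD on a policy-gradient loss.
logits = torch.zeros(2, requires_grad=True)    # softmax policy parameters
optimizer = torch.optim.SGD([logits], lr=0.1)  # SGD used *inside* RL

def reward(action: int) -> float:
    return 1.0 if action == 1 else 0.0

for _ in range(500):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    # REINFORCE / score-function estimator: loss = -log pi(a) * R
    loss = -dist.log_prob(action) * reward(action.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # probability mass concentrates on arm 1
```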
Capabilities research seems bad since we don't know how to make safe AGI, but assuming we can't stop capabilities research entirely[1], I wonder whether capabilities research on SGD is net positive, given that RL is so much worse.
One of the risks with AGI is that an AI trained with reinforcement learning (RL) is very prone to reward hacking. Conveniently, stochastic gradient descent (SGD) doesn't seem to reward hack in the same way and generalizes much better; that seems to be why modern LLMs are weirdly sort-of aligned by default and have a general (if sometimes too broad) concept of good and bad.
Unfortunately, some kinds of training are hard to do with SGD, so we use RL to train LLMs to be happy chatbots and to reason.
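Roughly, the distinction I have in mind, as a toy sketch (the one-layer "model", the target token, and the reward function are just stand-ins, not how any real LLM is trained): a supervised step has a target token and is plain cross-entropy plus SGD, while an RL step only has a reward on whatever the model samples, so it needs a policy-gradient estimator on top.

```python
import torch
import torch.nn.functional as F

# Toy "LM": one linear map from a context vector to vocab logits.
vocab_size, dim = 10, 8
model = torch.nn.Linear(dim, vocab_size)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
ctx = torch.randn(1, dim)  # stand-in for a context embedding

# (a) Supervised step: we *have* a target token, so cross-entropy + SGD
#     works directly.
target = torch.tensor([3])
loss = F.cross_entropy(model(ctx), target)
opt.zero_grad()
loss.backward()
opt.step()

# (b) RL step: no target token, only a reward on whatever the model samples,
#     so we need a policy-gradient estimator -- but the parameter update
#     underneath is still SGD.
def reward(tok: int) -> float:  # hypothetical preference/verifier signal
    return 1.0 if tok == 3 else 0.0

dist = torch.distributions.Categorical(logits=model(ctx)[0])
tok = dist.sample()
loss = -dist.log_prob(tok) * reward(tok.item())
opt.zero_grad()
loss.backward()
opt.step()
```

Even in case (b) the parameter update is still SGD; what changes is where the training signal comes from (a fixed target vs. a reward on samples).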
What I'm wondering is: since SGD seems to be less dangerous than RL, is it actually good to do research on using SGD for chatbotification and reasoning, since that would lead to safer models? The downside is that SGD is also much more efficient than RL, so this would almost certainly make the models more capable too.
[1] If we actually manage to ban all capabilities research, I obviously don't think there should be an exception for SGD.