Do AI agents need "ethics in weights"?
In this article, I argue why outer alignment is preferable and where, in my opinion, the error lies. I also explain why ethics must be part of the task rather than embedded in the weights. Perhaps I'm wrong. But I believe it is necessary to consider any ideas in...
Nov 4, 2025