meriton

I'm a generalist manager at the AI safety hub "Monoid" in Moscow. On the side, I do independent research in conceptual AI safety and organize events.

Comments


Wait, but I thought 1 and 2a look the same from a first-person perspective. I mean, I don’t really notice the difference between something happening suddenly and something that’s been happening for a while — until the consequences become “significant” enough for me to notice. In hindsight, sure, one can find differences, but in the moment? Probably not?

I mean, single-single alignment assumes that the operator (the human) is happy with the goals their AI is pursuing, but not necessarily* with how pursuing those goals affects the world around them (especially in a world where other human+AI agents are also pursuing their own goals).

And so, as someone pointed out in a comment above, we might mistake the early stages of disempowerment (the kind that eventually leads to undesirable outcomes in the economy, society, etc.) for empowerment, because from the individual human's perspective that is what it feels like.

No?

What am I missing here?

*Unless we assume the AI in some sense "teaches" the human which goals they should want to pursue, from a very non-myopic perspective.

Just to double-check: you'd say that "embedded" (in embedded agency) is a synonym of "built-in" (as in "In 2005 a next-generation proprietary embedded controller was introduced") rather than "ingrained" (as in "Less is an embedded metalanguage: valid CSS is a valid Less program with the same semantics"), correct?