meriton

I'm a generalist manager at the AI safety hub "Monoid" in Moscow. On the side, I do independent research in conceptual AI safety and organize events.

Comments


Wait, but I thought 1 and 2a look the same from a first-person perspective. I mean, I don’t really notice the difference between something happening suddenly and something that’s been happening for a while — until the consequences become “significant” enough for me to notice. In hindsight, sure, one can find differences, but in the moment? Probably not?

I mean, single-single alignment assumes that the operator (the human) is happy with the goals their AI is pursuing, but not necessarily* with how pursuing those goals affects the world around them (especially in a world where other human+AI agents are also pursuing their own goals).

And so, as someone pointed out in a comment above, we might mistake the early stages of disempowerment (the kind that eventually leads to undesirable outcomes in the economy, society, etc.) for empowerment, because from the individual human's perspective that is what it feels like.

No?

What am I missing here?

*Unless we assume the AI in some sense "teaches" the human which goals they should want to pursue, from a very non-myopic perspective.

Just to double-check: you'd say that "embedded" (in embedded agency) is a synonym of "built-in" (as in "In 2005 a next-generation proprietary embedded controller was introduced") rather than "ingrained" (as in "Less is an embedded metalanguage: valid CSS is a valid Less program with the same semantics"), correct?