meriton

I'm a generalist manager at the AI safety hub "Monoid" in Moscow. On the side, I do independent research in conceptual AI safety and organize events.

Comments

Disempowerment spirals as a likely mechanism for existential catastrophe
meriton, 6mo

Wait, but I thought 1 and 2a look the same from a first-person perspective. I mean, I don’t really notice the difference between something happening suddenly and something that’s been happening for a while — until the consequences become “significant” enough for me to notice. In hindsight, sure, one can find differences, but in the moment? Probably not?

I mean, single-single alignment assumes that the operator (human) is happy with the goals their AI is pursuing — not necessarily* with the consequences of how pursuing those goals affects the world around them (especially in a world where other human+AI agents are also pursuing their own goals).

And so, like someone pointed out in a comment above, we might mistake early stages of disempowerment — the kind that eventually leads to undesirable outcomes in the economy/society/etc. — for empowerment. Because from the individual human’s perspective, that is what it feels like.

No?

What am I missing here?

*Unless we assume the AI somewhat "teaches" the human what goals they should want to pursue — from a very non-myopic perspective.

Embedded Agency (full-text version)
meriton, 3y

Just to double-check: you'd say that "embedded" (in "embedded agency") is a synonym of "built-in" (as in "In 2005 next generation proprietary embedded controller was introduced.") rather than "ingrained" (as in "Less - is embedded metalanguage: valid CSS will be valid less-program with the same semantics."), correct?
 
