I also think that the fact that AI safety thinking is so heavily driven by these fear + distraction patterns is what's behind the generally flail-y nature of so much AI safety work. There's a lot of, "I have to do something! This is something! Therefore, I will do this!"

I think your diagnosis of the problem is right on the money, and I'm glad you wrote it. 

As for your advice on what a person should do about this, it has a strong flavor of: quit doing what you're doing and go in the opposite direction. I think this is going to be good for some people but not others. Sometimes it's best to start where you are. Like, one can keep thinking about AI risk while also trying to become more aware of the distortions that are being introduced by these personal and collective fear patterns.

That's the individual level, though, and I don't want that to deflect from the fact that there is this huge problem at the collective level. (I think rationalist discourse has a libertarian-derived tendency to focus on the former and ignore the latter.)

Nice essay, makes sense to me! Curious how you see this playing into machine intelligence.

One thought is that "help maintain referential stability", or something in that ballpark, might be a good normative target for an AI. Such an AI would help humans think, clarify arguments, recover dropped threads of meaning. (Of course, done naively, this could be very socially disruptive, as many social arrangements depend on the absence of clear flows of meaning.)

As a slightly tangential point, I think that if you start thinking about how to cast survival / homeostasis in terms of expected-utility maximization, you have to confront a lot of funny issues, like, "what happens if my proxies for survival change because I self-modified?", and then, more fundamentally, "how do I define / locate the 'me' whose survival I am valuing? what if I overlap with other beings? what if there are multiple 'copies' of me?". These are real issues for selfhood, IMO.

>There is no way for the pursuit of homeostasis to change through bottom-up feedback from anything inside the wrapper.  The hierarchy of control is strict and only goes one way.

Note that people do sometimes do things like starve themselves to death or choose to become martyrs in various ways, for reasons that are very compelling to them. I take this as a demonstration that homeostatic maintenance of the body is in some sense "on the same level" as other reasons / intentions / values, rather than strictly above everything else.

I do see the inverse side: a single fixed goal would be something in the mind that's not open to critique, hence not truly generally intelligent from a Deutschian perspective (I would guess; I don't actually know his work well).

To expand on the "not truly generally intelligent" point: one way this could look is if the goal included some tacit assumptions about the universe that turned out later not to be true in general -- e.g. if the agent's goal was something involving increasingly long-range simultaneous coordination, before the discovery of relativity -- and if the goal were really unchangeable, then it would bar or at least complicate the agent's updating to a new, truer ontology.

I've been thinking along the same lines, very glad you've articulated all this!

The way I understand the intent vs. effect thing is that the person doing "frame control" will often contain multitudes: an unconscious, hidden side that's driving the frame control, and then the more conscious side that may not be very aware of it, and would certainly disclaim any such intent.