
fwang · 8y


Regarding the assumptions for a strong AI that ceases cooperating and pursues its own values: the scenario presented assumes that 1) L will have the ability to subvert S's control, and 4a) S must be unaware that L has that ability. That is (if I understand this correctly), L's ability to subvert S goes undetected. But if we grant 1), then perhaps S should instead operate under the assumption that L already has the ability (and perhaps even the knowledge) to subvert it, while S doesn't know what exactly that ability is, and, even if it did, wouldn't know how the ability may be applied (which we may assume L does know).

In other words, I imagine the scenario would be more like: L has the ability to subvert S and knows how to use it. S doesn't know what the ability is, and even if it did, wouldn't know how it's used. L (for the sake of argument) knows that S is unaware, and S couldn't stop L even if it tried. This seems like a pretty bad scenario. However, because S knows that it doesn't know, S might spend more effort devising ways to deal with its lack of knowledge (e.g. getting L to tell it about abilities it has learned, perhaps rewarding disclosure with a heart) and with L's potential desire to follow through on taking control of the heart-machine (e.g. making attempted wireheading induce negative reward).
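To make that incentive scheme concrete, here's a minimal toy sketch in Python (the reward values and the function name are my own illustrative assumptions, not anything from the post): S pays L a heart for disclosing a new ability and levies a large negative reward on attempted wireheading.

```python
# Toy model of the incentive scheme above: S rewards L for disclosing
# newly learned abilities and penalizes grabs at the heart-machine.
# All names and numbers are illustrative assumptions.

DISCLOSURE_REWARD = 1.0      # a "heart" for telling S about a new ability
WIREHEADING_PENALTY = -10.0  # large negative reward for attempted wireheading

def supervisor_reward(disclosed_new_ability: bool,
                      attempted_wireheading: bool) -> float:
    """S's reward rule, given that S knows it doesn't know L's abilities."""
    reward = 0.0
    if disclosed_new_ability:
        reward += DISCLOSURE_REWARD
    if attempted_wireheading:
        reward += WIREHEADING_PENALTY
    return reward

# If L values S-granted reward at all, disclosure beats concealment
# and attempted wireheading is strictly dominated:
assert supervisor_reward(True, False) > supervisor_reward(False, False)
assert supervisor_reward(False, True) < supervisor_reward(False, False)
```

Of course, this only bites if L actually cares about the reward channel S controls, which is the crux of the whole scenario.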

EDIT: Perhaps I'm just not clear on why L should try to deceive S in the first place. It seems there should be a better way for S to deal with deception by L than resetting it or applying a large negative reward whenever it is regarded as a 'potential threat', even when it has no actual desire to threaten S. As you mention, that approach would just create pressure toward better concealment, to S's detriment, rather than pressure toward what S really wants: alignment of L's goals with its own.
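To illustrate that concealment pressure (a toy selection model I made up; nothing here comes from the post), suppose S simply resets every copy of L whose deception it detects:

```python
import random

# Toy selection model: agents vary in concealment skill, and S detects
# deception with probability that falls as concealment rises. If S only
# punishes deception it can detect, the survivors are the better
# concealers, not the more honest agents. All parameters are invented.

random.seed(0)
population = [{"concealment": random.random()} for _ in range(1000)]

def detected(agent) -> bool:
    # Detection probability = 1 - concealment skill.
    return random.random() > agent["concealment"]

# "Reset" (remove) every agent whose deception S catches.
survivors = [a for a in population if not detected(a)]

before = sum(a["concealment"] for a in population) / len(population)
after = sum(a["concealment"] for a in survivors) / len(survivors)
print(f"mean concealment before: {before:.2f}, after: {after:.2f}")
# after > before: the penalty selected for concealment, not for alignment.
```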

fwang · 9y

This may be a little basic, but if you're already going with the "efficient at transmitting forces" idea for physical posture, then I think a good analogy for mental posture would simply be "efficient at processing information" (for which being rational is a pretty useful method, just as keeping your spine, er, non-kyphotic is useful).

This is much more concise than:

how efficiently one's patterns of directing attention let one mentally navigate one's environment

while at the same time remaining relatively neutral with respect to the actual goals you may have (e.g. keeping your balance, whether physically when someone bumps into you, or mentally when someone offends you).

And if you compare mental posture to an art form like aikido on the physical side, you might get something like the Noble Eightfold Path on the mental side (where sayings like having "right mindfulness" can be compared with "relax more").

Also, given the number of upvotes and the relevance to this topic, I'm surprised that Roles are Martial Arts for Agency hasn't been linked yet. It was definitely in the back of my mind as I made a few of these comparisons.