LESSWRONG
Sappique
Can a pre-commitment to not give in to blackmail be "countered" by a pre-commitment to ignore such pre-commitments?
Sappique3mo10

Good question.

I can't tell whether saying that you will reject unfair splits would be a threat by the definition in my comment above. For it to be a threat, you would have to do it only if the other person cares about the thing being split. But in the Ultimatum Game both players care about it by definition, so I have a hard time thinking about what you would do if someone offered you an unfair split of something they don't care about (how can a split even be unfair if only one person values the thing being split?).

Can a pre-commitment to not give in to blackmail be "countered" by a pre-commitment to ignore such pre-commitments?
Sappique3mo10

I believe "on purpose" in this case means doing something conditional on the other actor's utility function disvaluing it.

So if you build an interstellar highway through someone's planet because that is the fastest route, you are not "purposefully minimizing their utility function", even if they strongly disvalue it. If you build it through their planet only because they disvalue it, and would have built it around the planet if they had disvalued that instead, then you are "purposefully minimizing their utility function".

If you do so to prevent them from having a planet or to make them react in some way that is useful to you, and would have done so even if they had not disvalued their planet being destroyed, then you are not "purposefully minimizing their utility function", I think?
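
The counterfactual test described above can be sketched in code. This is only an illustration under stated assumptions: the two route names, the numeric utilities, and the function names (`fastest_route`, `spiteful_route`, `is_purposeful_minimization`) are all hypothetical, and the check uses just two counterfactual utility functions rather than anything general.

```python
from typing import Callable, Dict

# Hypothetical model: the other actor's utility function maps each
# possible highway route to how much they value that outcome.
Utility = Dict[str, float]
Policy = Callable[[Utility], str]

ROUTES = ("through", "around")


def fastest_route(other: Utility) -> str:
    # Ignores the other actor's preferences: "through" is simply fastest.
    return "through"


def spiteful_route(other: Utility) -> str:
    # Picks whichever route the other actor values least.
    return min(ROUTES, key=lambda route: other[route])


def is_purposeful_minimization(policy: Policy) -> bool:
    # Counterfactual check: flip what the other actor disvalues and see
    # whether the chosen route follows the disvalued option both times.
    hates_through = {"through": -10.0, "around": 0.0}
    hates_around = {"through": 0.0, "around": -10.0}
    return (
        policy(hates_through) == "through"
        and policy(hates_around) == "around"
    )


if __name__ == "__main__":
    print(is_purposeful_minimization(fastest_route))
    print(is_purposeful_minimization(spiteful_route))
```

Under this toy definition, the builder who always takes the fastest route does not count as "purposefully minimizing", while the builder whose choice tracks whatever the other actor disvalues does.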

Can a pre-commitment to not give in to blackmail be "countered" by a pre-commitment to ignore such pre-commitments?
Sappique3mo10

Thanks, that's an interesting way to think about pre-commitments.

However, I'm not sure I understand what your conclusion is. Do you believe that actors cannot protect themselves from blackmail with pre-commitments?

Can a pre-commitment to not give in to blackmail be "countered" by a pre-commitment to ignore such pre-commitments?
Sappique3mo*30

As far as I can tell from Eliezer's writing (mostly Planecrash), a threat is when someone will (counterfactually) purposefully minimize someone else's utility function.

So releasing blackmail material would be a threat, but building a road through someone else's home (if doing so offers slightly more utility than going around) wouldn't be?

Actors could pre-commit to ignore any counterfactuals where someone purposefully minimizes their utility function, but then again would-be blackmailers could pre-commit to ignore such pre-commitments.

Maybe pre-committing to ignore threats is a kind of "pre-commitment Schelling point" that works if everyone does it? If all actors coordinated (even just by modeling other actors, without communication) to pre-commit to ignore threats, would the would-be extorters accept that?

The Dilemma’s Dilemma
Sappique3mo50

Sure, the game of "gain control of as much energy as possible" is zero-sum, but the "real" game of each actor maximizing their utility function isn't necessarily.

Utility functions could be bounded or care only about a local region (and thus require only a limited amount of energy to maximize), and multiple actors could have identical utility functions (making it a positive-sum game for them).
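
A small numeric sketch of the bounded case, with made-up numbers: the total energy of 100 units, the caps of 30, and the names `bounded_utility` and `total_utility` are all assumptions for illustration. Energy shares always add up to the same total, but summed utility does not, so the utility game is not constant-sum.

```python
TOTAL_ENERGY = 100.0  # hypothetical fixed energy budget shared by two actors


def bounded_utility(energy: float, cap: float) -> float:
    # Utility grows with energy up to `cap`, then stays flat.
    return min(energy, cap)


def total_utility(energy_a: float, cap_a: float, cap_b: float) -> float:
    # Actor B gets whatever energy actor A does not take (zero-sum in energy).
    energy_b = TOTAL_ENERGY - energy_a
    return bounded_utility(energy_a, cap_a) + bounded_utility(energy_b, cap_b)


if __name__ == "__main__":
    # Both actors are satisfied with 30 units each.
    print(total_utility(50.0, 30.0, 30.0))   # even split
    print(total_utility(100.0, 30.0, 30.0))  # actor A takes everything
```

With these assumed caps, an even split satisfies both actors fully, while one actor taking everything gains that actor nothing extra and leaves the other with nothing.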

Detailed Ideal World Benchmark
Sappique7mo20

For example, a bad moral argument may argue that even the simplest being which is "capable of arguing for equal rights for itself," ought to deserve equal rights and personhood. This simplest being is a small AI.

Or that that simplest "being" is a rock with "I want equal rights" written on it.

Generally I think the Ideal World Benchmark could be useful for identifying some misaligned AIs. However, some misaligned AIs can be identified by asking "What are your goals?" and I do not expect the Ideal World Benchmark to be significantly more robust to deception.

If tomorrow my boss claimed to have been sent by a future version of myself that had obtained vast intelligence and power, and asked me what that version should do, I would want some convincing proof before saying anything controversial.
