Jonii — LessWrong

Notes on logical priors from the MIRI workshop

Try as I might, I cannot find any reference to a canonical way of building such counterfactual scenarios. The closest I could get was http://lesswrong.com/lw/179/counterfactual_mugging_and_logical_uncertainty/ , where Vladimir Nesov seems to simply reduce logical uncertainty to ordinary uncertainty, but this does not seem to have anything to do with building formal theories, proving actions, or anything of the sort.

To me, what an agent should do when faced with such a dilemma seems largely arbitrary; it all depends on actually specifying what it means to test a logical counterfactual. If you don't specify what it means, anything could happen as a result.

Notes on logical priors from the MIRI workshop

Jonii12y00

I asked about these differences in my second post in this post tree, where I explained how I understood these counterfactuals to work. I explained as clearly as I could that, for example, calculators should work as they do in the real world. I explained this in the hope that someone would voice disagreement if I had misunderstood how these logical counterfactuals work.

However, modifying any calculator would mean that there cannot be, even in principle, any AI or agent "smart" enough to detect that it is in a counterfactual. The mental hardware that checks whether the logical coin should have come up heads or tails is a calculator just like any computer, and again, there does not seem to be any reason to assume Omega leaves some calculators unchanged while changing the results of others.

Unless this is just assumed to happen, with some silently assumed cutoff point where calculators become so internal that they are left unmodified.

Notes on logical priors from the MIRI workshop

Jonii12y00

Well, to be exact, your formulation of this problem has left this counterfactual pretty much entirely undefined. The naive approximation, that the world is just like ours and Omega simply lies in the counterfactual, would not contain such weird calculators that give you wrong answers. If you want to complicate the problem by saying that some specific class of agents has a special class of calculators that one would usually expect to work in a certain way, but that actually work in a different way, well, so be it. That is, however, just a free-floating parameter you have left unspecified, and unless stated otherwise, it should be assumed not to be the case.

Notes on logical priors from the MIRI workshop

Jonii12y00

Yes, those agents you termed "stupid" in your post, right?

Notes on logical priors from the MIRI workshop

Jonii12y20

After asking about this on the #LW IRC channel, I take back my initial objection, but I still find this entire concept of logical uncertainty kinda suspicious.

Basically, if I'm understanding this correctly, Omega is simulating an alternate reality which is exactly like ours, where the only difference is that Omega says something like "I just checked if 0=0, and it turns out it's not. If it was, I would've given you moneyzzz (iff you would give me moneyzzz in this kind of situation), but now that 0!=0, I must ask you for $100." Then the agent notices, in that hypothetical situation, that actually 0=0, so Omega is lying, so the agent is in a hypothetical, and thus can freely give moneyzzz away to help the real you. Then, because some agents can't tell for all possible logical coins whether they are being lied to, they might have to pay real moneyzzz, while sufficiently intelligent agents might be able to cheat the system if they can notice when they are being lied to about the state of the logical coin.

I still don't understand why a stupid agent would want to build a smart AI that did pay. Also, there are many complications that restrict the decisions of both smart and stupid agents. Given the argument I've made here, stupid agents still might prefer not paying, and smart agents might prefer paying, if they gain some kind of insight into how Omega chose these logical coins. This logical coin problem also seems to me like a not-too-special special class of Omega problems where some group of agents is able to detect whether they are in a counterfactual.

Notes on logical priors from the MIRI workshop

Jonii12y10

You lost me at this part:

In Counterfactual Mugging with a logical coin, a "stupid" agent that can't compute the outcome of the coinflip should agree to pay, and a "smart" agent that considers the coinflip as obvious as 1=1 should refuse to pay.

The problem is that I see no reason why a smart agent should refuse to pay. Both the stupid and the smart agent know with logical certainty that they just lost. There's no meaningful difference between being smart and stupid in this case that I can see. Both, however, like being offered such bets, where a logical coin is flipped, so they pay.

I mean, we all agree that a "smart" agent that refused to pay here would receive $0 if Omega's logical coin was the question of whether the 1st digit of pi is an odd number, while a "stupid" agent would get $1,000,000.
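To make the comparison concrete, here is a minimal sketch of the payoffs as I understand them. The function names are made up for illustration, the $100 and $1,000,000 amounts are the ones mentioned in this thread, and the 50/50 prior over the coin is purely my assumption about how an agent that cannot compute the coin might weigh it.

```python
REWARD = 1_000_000  # what Omega gives if the coin favors the agent and the agent is a payer
COST = 100          # what Omega asks for if the coin goes against the agent


def payoff(pays_when_asked: bool, coin_favors_agent: bool) -> int:
    """Payoff of a fixed policy for one outcome of the logical coin (hypothetical model)."""
    if coin_favors_agent:
        # Omega rewards only agents who would have paid in the losing branch.
        return REWARD if pays_when_asked else 0
    # Losing branch: the agent is asked for money and either pays or refuses.
    return -COST if pays_when_asked else 0


def expected_payoff(pays_when_asked: bool, p_favorable: float = 0.5) -> float:
    """Expected payoff under an assumed prior over the coin (0.5 is an assumption)."""
    return (p_favorable * payoff(pays_when_asked, True)
            + (1 - p_favorable) * payoff(pays_when_asked, False))


# The coin from the comment above: "is the 1st digit of pi odd?" The digit is 3.
coin_favors_agent = (3 % 2 == 1)

payer_result = payoff(True, coin_favors_agent)
refuser_result = payoff(False, coin_favors_agent)
```

Under these assumptions the refusing policy never loses $100, but it also never collects the reward, which is the asymmetry the comment above is pointing at.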

On manipulating others

Jonii13y30

This was actually one of the things that inspired me to write this post. I was wondering if I could make use of the LW community to run such tests, because it would be interesting to get to practice these skills with consent, but trying to devise such tests stumped me. It's pretty difficult to come up with a goal that's genuinely difficult to achieve in any not-overtly-hostile social context. Laborious, maybe, but that's not the same thing. I just kinda generalized from this that it should be pretty easy to take any consciously named goal and achieve it, but there must be some social inhibition.

The set of things that inspired me was wide and varied. That may be reflected in how the essay was... not as coherent as I'd have hoped.

On manipulating others

Jonii13y-30

That's a nice heuristic, but unfortunately, it's easy to come up with cases where this heuristic is wrong. Say people want to play a game. I'll use chess because it's easy to think of, not because it best exemplifies this problem. If you want to have a fun game of chess, ideally you'd hope for roughly equal matches. If 9 out of 10 players are pretty weak, just learning the rules, and want to play and have fun with it, then you, the 10th player, a strong club player and an outlier, cannot take part because you are too good. (With chess, you could maybe try giving up your queen as a handicap, or taking a time handicap, to make games more interesting, but generally I feel that those sorts of tricks still make it less fun for all parties.)

While there might be obvious reasons to suspect bias being at play, unless you want to ban ever discussing topics that might involve bias, the best way around it that I know of is to actually focus on the topic. Just stating "woah, you probably are biased if you think thoughts like this" is something I did take into consideration. I was still curious to hear LW thoughts on this topic. The actual topic, not whether LW thinks it's a bias-inducing topic or not. If you want me to add some disclaimer for other people, I'm open to suggestions. I was going to include one myself, basically saying "Failing socially in the way described here would at best be very, very weak evidence of you being socially gifted, intelligent, or whatever. The reasoning presented here is not peer-reviewed, and might well contain errors". I did not, because I didn't want to add yet another shiny distraction from the actual point presented. I didn't think it would be needed, either.

On manipulating others

Jonii13y20

Oh, yes, that is basically my understanding: We do social manipulation to the extent it is deemed "fair", that is, to the point it doesn't result in retaliation. But at some point it starts to result in such retaliation, and we have this "fairness"-sensor that tells us when to retaliate or watch out for retaliation.

I don't particularly care about manipulation that results in obtaining a salt shaker or a tennis partner. What I'm interested in is manipulation you can use to form alliances, make someone liable to help you with stuff you want, make them like you, make them think of you as their friend or "senpai" for lack of a better term, or make them fall in love with you. What also works is getting them to have sex with you, to reveal something embarrassing about themselves, or otherwise become part of something they hold sacred. Pretending to be a god would fall into this category. I'm struggling to explain why I think manipulation in those cases is iffy. I think it has to do with that kind of interaction kinda assuming that there are processes involved beyond self-regulation. With manipulation, you could bypass that, and in effect you would be lying about your alliance.

It is true many social interactions are not about anything deeper than getting the salt shaker. I kind of just didn't think of them while writing this post. I might need to clarify that point.

On manipulating others

Jonii13y-20

This I agree with completely. However, the fact that it sounds like a power fantasy doesn't mean it's wrong or mistaken.
