Notes on logical priors from the MIRI workshop

Wait, what is Tegmark level 5?

It's a name we made up for mathematically impossible universes that we still care about because we haven't yet proved them to be mathematically impossible. That becomes relevant in problems like Counterfactual Mugging with a logical coin.

[-]Douglas_Knight12y10

What is a "locally consistent theory"?

[-]cousin_it12y30

Let's say "a set of statements that has no short proof of inconsistency", for some reasonable meaning of "short".
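As a toy illustration of "no short proof of inconsistency", here is a minimal sketch in propositional logic. Everything in it is an assumption made for illustration: the clause encoding (integers as literals, negative meaning negated), the use of resolution as the proof system, and "rounds of resolution" as a stand-in for proof length. It is not the construction from the post.

```python
from itertools import combinations

def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary literal."""
    out = set()
    for lit in c1:
        if -lit in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {-lit})))
    return out

def has_short_refutation(clauses, max_rounds):
    """True if the empty clause is derivable within max_rounds rounds of resolution."""
    known = {frozenset(c) for c in clauses}
    if frozenset() in known:
        return True
    for _ in range(max_rounds):
        new = set()
        for a, b in combinations(known, 2):
            new |= resolvents(a, b)
        if frozenset() in new:
            return True
        if new <= known:
            return False  # saturated: no refutation exists at any length
        known |= new
    return False

def locally_consistent(clauses, max_rounds):
    """Hypothetical helper: 'locally consistent' relative to the given bound."""
    return not has_short_refutation(clauses, max_rounds)

# p, p -> q, q -> r, not r: inconsistent, but only visible after two rounds.
theory = [[1], [-1, 2], [-2, 3], [-3]]
print(locally_consistent(theory, 1), locally_consistent(theory, 2))
```

The same theory counts as consistent or inconsistent depending on the bound, which is the sense in which the notion depends on a "reasonable meaning of short".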

[-]Douglas_Knight12y00

That's what I guessed, but you said "So we abandoned this idea," implying that the rest of the article was completely different, while the rest of the article was still about proof lengths, just smoothing thresholds into weights, so I became skeptical of the guess. I don't have any suggestions for how to talk about false starts and how they relate, but I think it might be useful for insight into my confusion. (Actually, in this particular case, I do have a suggestion, which is to use the term "proof length" much earlier.)

[-]cousin_it12y00

Thanks! Made a small edit to the post.

[-]Jonii12y10

You lost me at part

In Counterfactual Mugging with a logical coin, a "stupid" agent that can't compute the outcome of the coinflip should agree to pay, and a "smart" agent that considers the coinflip as obvious as 1=1 should refuse to pay.

The problem is that I see no reason why a smart agent should refuse to pay. Both the stupid and the smart agent know as a logical certainty that they just lost. There's no meaningful difference between being smart and stupid in this case, that I can see. Both, however, like to be offered such bets, where a logical coin is flipped, so they pay.

I mean, we all agree that a "smart" agent that refused to pay here would receive $0 if Omega flipped the logical coin of asking whether the 1st digit of pi was an odd number, while a "stupid" agent would get $1,000,000.

[-]cousin_it12y20

Note that there's no prior over Omega saying that it's equally likely to designate 1=1 or 1≠1 as heads. There's only one Omega, and with that Omega you want to behave a certain way. And with the Omega that designates "the trillionth digit of pi is even" as heads, you want to behave differently.

[-]wedrifid12y40

And with the Omega that designates "the trillionth digit of pi is even" as heads, you want to behave differently.

Specifically, you want to bet on 'heads'. The trillionth digit of pi is a two.

I think we need to find a trickier logical uncertainty as a default example. There is a (mildly) interesting difference between logical uncertainties that we could easily look up or calculate like "Is 1,033 a prime?" or "is the trillionth digit of pi even?" and logical uncertainties that cannot plausibly be looked up. Both types of uncertainty are sometimes relevant but often we want a 'logical coin' that isn't easily cheated.
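To make the "easily cheated" point concrete, here is a minimal sketch of settling the "Is 1,033 a prime?" coin by direct computation. The function name is hypothetical and plain trial division is just one convenient choice.

```python
def is_prime(n: int) -> bool:
    """Trial division: enough to settle small 'logical coins' instantly."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# An agent with this much computing power is no longer uncertain about the coin.
print(is_prime(1033))
```

A coin like this stops being a coin for any agent that can spare a few microseconds, which is why a harder default example would be preferable.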

[-]Jonii12y20

After asking about this on the #LW IRC channel, I take back my initial objection, but I still find this entire concept of logical uncertainty kinda suspicious.

Basically, if I'm understanding this correctly, Omega is simulating an alternate reality which is exactly like ours, and where the only difference is that Omega says something like "I just checked if 0=0, and it turns out it's not. If it was, I would've given you moneyzzz (iff you would give me moneyzzz in this kind of situation), but now that 0!=0, I must ask you for $100." Then the agent notices, in that hypothetical situation, that actually 0=0, so Omega is lying, so he is in a hypothetical, and thus he can freely give moneyzzz away to help the real you. Then, because some agents can't tell for all possible logical coins whether they are being lied to or not, they might have to pay real moneyzzz, while sufficiently intelligent agents might be able to cheat the system if they are able to notice that they are being lied to about the state of the logical coin.

I still don't understand why a stupid agent would want to make a smart AI that did pay. Also, there are many complications that restrict the decisions of both smart and stupid agents: given the argument I've given here, stupid agents still might prefer not paying, and smart agents might prefer paying, if they gain some kind of insight into how Omega chose these logical coins. Also, this logical coin problem seems to me like a not-too-special special class of Omega problems where some group of agents is able to detect if they are in counterfactuals.

[-]cousin_it12y20

Note that the agent is not necessarily able to detect that it's in a counterfactual, see Nesov's comment.

[-]Jonii12y00

Yes, those agents you termed "stupid" in your post, right?

[-]cousin_it12y20

The smart ones too, I think. If you have a powerful calculator and you're in a counterfactual, the calculator will give you the wrong answer.

[-]Jonii12y00

Well, to be exact, your formulation of this problem has pretty much left this counterfactual entirely undefined. The naive approximation, that the world is just like ours and Omega just lies in the counterfactual, would not contain such weird calculators which give you wrong answers. If you want to complicate the problem by saying that some specific class of agents has a special class of calculators that one would usually expect to work in a certain way, but that actually work in a different way, well, so be it. That, however, is just a free-floating parameter you have left unspecified and that, unless stated otherwise, should be assumed not to be the case.

[-]cousin_it12y00

Hmm, no, I assumed that Omega would be using logical counterfactuals, which are pretty much the topic of the post. In logical counterfactuals, all calculators behave differently ;-) But judging from the number of people asking questions similar to yours, maybe it wasn't a very transparent assumption...

[-]Jonii12y00

I asked about these differences in my second post in this post tree, where I explained how I understood these counterfactuals to work. I explained as clearly as I could that, for example, calculators should work as they do in the real world. I did this explaining in hopes of someone voicing disagreement if I had misunderstood how these logical counterfactuals work.

However, modifying any calculator would mean that there cannot be, in principle, any "smart" enough AI or agent that could detect it was in a counterfactual. Our mental hardware that checks whether the logical coin should've been heads or tails is a calculator just like any computer, and again, there does not seem to be any reason to assume Omega leaves some calculators unchanged while changing the results of others.

Unless this is just assumed to happen, with some silently assumed cutoff point where calculators become so internal they are left unmodified.

[-]Vladimir_Nesov12y20

However, modifying any calculator

Calculators are not modified, they are just interpreted differently, so that when trying to answer the question of what happens in a certain situation (containing certain calculators etc.) we get different answers depending on what the assumptions are. The situation is the same, but the (simplifying) assumptions about it are different, and so simplified inferences about it are different as well. In some cases simplification is unavoidable, so that dependence of conclusions on assumptions becomes an essential feature.

[-]cousin_it12y10

My current understanding of logical counterfactuals is something like this: if the inconsistent formal theory PA+"the trillionth digit of pi is odd" has a short proof that the agent will take some action, which is much shorter than the proof in PA that the trillionth digit of pi is in fact even, then I say that the agent takes that action in that logical counterfactual.

Note that this definition leads to only one possible counterfactual action, because two different counterfactual actions with short proofs would lead to a short proof by contradiction that the digit of pi is even, which by assumption doesn't exist. Also note that the logical counterfactual affects all calculator-like things automatically, whether they are inside or outside the agent.

That's an approximate definition that falls apart in edge cases; the post tries to make it slightly more exact.
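The approximate definition in this comment can be written out as follows. This is only a restatement under assumed notation that does not appear in the thread: \ell(T \vdash \psi) for the length of the shortest proof of \psi in theory T, \varphi for "the trillionth digit of pi is odd", and A for the agent's action. The "small overhead" in the second line is also an assumption about the proof system.

```latex
% Counterfactual action: a short proof from the inconsistent theory PA + \varphi,
% much shorter than PA's refutation of \varphi.
\[
  \text{the agent takes action } a \text{ in the counterfactual } \varphi
  \quad\text{if}\quad
  \ell\bigl(\mathrm{PA} + \varphi \vdash A = a\bigr)
  \;\ll\;
  \ell\bigl(\mathrm{PA} \vdash \neg\varphi\bigr).
\]
% Uniqueness: if A = a and A = b (a \neq b) both had short proofs from PA + \varphi,
% combining them would refute \varphi in PA with only a small overhead:
\[
  \ell\bigl(\mathrm{PA} \vdash \neg\varphi\bigr)
  \;\lesssim\;
  \ell\bigl(\mathrm{PA} + \varphi \vdash A = a\bigr)
  + \ell\bigl(\mathrm{PA} + \varphi \vdash A = b\bigr),
\]
% contradicting the assumption that every proof of \neg\varphi in PA is long.
```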

[-]Vladimir_Nesov12y30

(Btw, I think it should be mentioned that a central piece of motivation for this "logical counterfactuals" thing is that it's probably the same construction that's needed to evaluate possible actions in normal cases, without any contrived coins, for an agent that knows its own program. So for example although a counterfactual scenario can't easily "lead" to two different actions, two different actions in that scenario can still be considered as possibly even more (easily shown to be) contradictory "logical counterfactuals" that include additional assumptions about what the action is.)

[-]Jonii12y00

Try as I might, I cannot find any reference to what the canonical way of building such counterfactual scenarios is. The closest I could get was http://lesswrong.com/lw/179/counterfactual_mugging_and_logical_uncertainty/ , where Vladimir Nesov seems to simply reduce logical uncertainty to ordinary uncertainty, but this does not seem to have anything to do with building formal theories and proving actions or any such thing.

To me, it seems largely arbitrary what an agent should do when faced with such a dilemma; it all depends on actually specifying what it means to test a logical counterfactual. If you don't specify what it means, anything could happen as a result.

[-]IlyaShpitser12y00

I am not sure there is a clean story yet on logical counterfactuals. Speaking for myself only, I am not yet convinced logical counterfactuals are "the right approach."

[-]cousin_it12y00

Hi Ilya,

I am not yet convinced logical counterfactuals are "the right approach."

Me neither. Have you seen my post about common mistakes? To me it seems more productive and more fun to explore the implications of an idea without worrying if it's the right approach.

[-]IlyaShpitser12y00

I like "breadth first search" or more precisely "iterative deepening" better than "depth first search."

(DFS is not guaranteed to find the optimal solution, after all!)

[-]Ishaan12y10

But if a stupid agent is asked to write a smart agent, it will want to write an agent that will agree to pay.

Wait, I'm afraid I'm already lost and this question seems so simple as to suggest I'm missing some important premise of the hypothetical scenario: Why would the stupid agent want this? Why wouldn't it want to write a smart agent that calculates the millionth digit and makes the winning choice?

Restatement of what I understand about the problem:

You offer me lots of money if the millionth digit of pi is even and a small loss if it is odd. I should take the bet since I can't calculate the answer and it might as well be random.

You offer me lots of money if the millionth digit of pi is even and a small loss if it is odd, and the chance to build a calculator to calculate the answer. I should still take the bet, even if my calculator tells me that it's odd.

If I'm rephrasing it correctly, then why?! If you're given the chance to make a calculator to solve the problem, why wouldn't you use it?

[-]cousin_it12y20

What you're describing is not Counterfactual Mugging, it's just a bet, and the right decision is indeed to use the calculator. The interesting feature of Counterfactual Mugging is that Omega is using counterfactual reasoning to figure out what you would have done if the coin had come out differently. You get the money only if you would have paid up in the counterfactual branch. In that case the right decision is to not use the calculator, I think. Though other people might have different intuitions, I'm sort of an outlier in how much I'm willing to follow UDT-ish reasoning.
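The difference from a plain bet shows up when policies are compared before the coin is known. Below is a minimal sketch; the $100 and $1,000,000 amounts are the ones used elsewhere in this thread, while the 50% prior, the function name, and the parameter names are assumptions made for illustration.

```python
def expected_value(pays_when_asked: bool,
                   p_heads: float = 0.5,
                   reward: float = 1_000_000,
                   cost: float = 100) -> float:
    """Expected payoff of a fixed policy, evaluated under the prior over the coin.

    Heads: Omega rewards the agent only if it would have paid on tails.
    Tails: Omega asks for payment, and the agent loses `cost` if it pays.
    """
    win = reward if pays_when_asked else 0
    loss = cost if pays_when_asked else 0
    return p_heads * win - (1 - p_heads) * loss

print(expected_value(True), expected_value(False))
```

Under these assumptions the paying policy comes out ahead when evaluated in advance, even though paying is a pure loss once the agent has actually been asked.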

[-]Vladimir_Nesov12y20

The setup is such that muggings and rewards are grouped in pairs, for each coin there is a reward and a mugging, and the decision in the mugging only affects the reward of that same coin. So even if you don't know where the coin comes from, or whether there are other coins with the same setup, or other coins where you don't have a calculator, your decision on a mugging for a particular coin doesn't affect them. If you can manage it, you should pay up only in counterfactuals, situations where you hypothetically observe Omega asserting an incorrect statement.

Recognizing counterfactuals requires that the calculator can be trusted to be more accurate than Omega. If you trust the calculator, the algorithm is that if the calculator disagrees with Omega, you pay up, but if the calculator confirms Omega's correctness, you refuse to pay (so this confirmation of Omega's correctness translates into a different decision than just observing Omega's claim without checking it).

Perhaps in the counterfactual where the logical coin is the opposite of what's true, the calculator should be assumed to also report the incorrect answer, so that its result will still agree with Omega's. In this case, the calculator provides no further evidence, there is no point in using it, and you should unconditionally pay up.
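The two assumptions about the calculator can be compared side by side. This is a minimal sketch, not the model from the post: the payoff amounts, the function names, and the flag `calculator_follows_counterfactual` are all hypothetical. "heads" means the coin truly favors the agent, and in the mugging scenario Omega always claims "tails".

```python
REWARD = 1_000_000
COST = 100

def always_pay(report, claim):
    return True

def pay_iff_calculator_disagrees(report, claim):
    # Pay only when the calculator says Omega's claim is false.
    return report != claim

def payoff(truth, policy, calculator_follows_counterfactual):
    """Payoff for an agent whose behavior in the mugging scenario is `policy`."""
    claim = "tails"  # what Omega asserts in the mugging scenario
    # Either the calculator echoes the counterfactual, or it reports the truth.
    report = claim if calculator_follows_counterfactual else truth
    pays = policy(report, claim)
    if truth == "tails":               # the mugging is real
        return -COST if pays else 0
    return REWARD if pays else 0       # the mugging is only counterfactual

for follows in (False, True):
    for policy in (always_pay, pay_iff_calculator_disagrees):
        results = [payoff(t, policy, follows) for t in ("heads", "tails")]
        print(follows, policy.__name__, results)
```

With a trusted calculator, paying only on disagreement collects the reward and never pays for real. If the calculator follows the counterfactual, it never disagrees with Omega, so that policy never pays and never gets the reward, and unconditional payment is the only way to collect it.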

[-]cousin_it12y30

Perhaps in the counterfactual where the logical coin is the opposite of what's true, the calculator should be assumed to also report the incorrect answer, so that its result will still agree with Omega's. In this case, the calculator provides no further evidence, there is no point in using it, and you should unconditionally pay up.

Yeah, that's pretty much the assumption made in the post, which goes on to conclude (after a bunch of math) that you should indeed pay up unconditionally. I can't tell if there's any disagreement between us...

[-][anonymous]12y00

The origin of the logical coin seems relevant if you can compute it. Even if you know which side is counterfactual according to a particular logical coin, you might still be uncertain about why (whether) this coin (puzzle) was selected and not another coin that might have a different answer. This uncertainty, if allowed by the boundaries of the game, would motivate still paying up where you know reward to be logically impossible (according to the particular coin/puzzle), because it might still be possible according to other possible coins, that you can't rule out a priori.

[This comment is no longer endorsed by its author]

[-][anonymous]12y00

It seems to me that if you have a calculator, you should pay up exactly when you are in a counterfactual (i.e. you hypothetically observe Omega asserting an incorrect statement about the logical coin), but refuse to pay up if the alternative (Omega paying you) is counterfactual (in this case, you know that the event of being paid won't be realized, assuming these are indeed the boundaries of the game). There doesn't appear to be a downside to this strategy, if you do have a calculator and are capable of not exploding in the counterfactual that you know to be counterfactual (according to whatever dynamic is used to "predict" you in the counterfactual).

(Intuitively, a possible downside is that you might value situations that are contradictory, but I don't see how this would not be a semantic confusion, seeing a situation itself as contradictory as opposed to merely its description being contradictory, a model that might have to go through all of the motions for the real thing, but eventually get refuted.)

[This comment is no longer endorsed by its author]

[-]Manfred12y00

Hm, yeah, that sounds really odd.

I think the reason it sounds so odd is: how the hell is Omega calculating what your answer would have been if 1=0?

If what Omega is really calculating is what you would have done if you were merely told something equivalent to 1=0, then sure, paying up can make sense.

[-]cousin_it12y10

It seems to me that the relevant difference between "1=0" and "the billionth digit of pi is even" is that the latter statement has a really long disproof, but there might be a much shorter proof of what the agent would do if that statement were true. Or at least I imagine Omega to be doing the same sort of proof-theoretic counterfactual reasoning that's described in the post. Though maybe there's some better formalization of Counterfactual Mugging with a logical coin that we haven't found...

[-]Manfred12y00

Even if you're cutting off Omega's proofs at some length, there are plenty of math problems that people can't do that are shorter than high-probability predictions that people will or won't pay up. Certainly when I imagine the problem, I imagine it in the form of predicting someone who's been told that the trillionth digit of pi is even and then paying out to that person depending on their counterfactual actions.

Of course, that leads to odd situations when the agent being predicted can do the math problem, but Omega still says "no bro, trust me, the trillionth digit of pi really is even." But an agent who can do the math will still give Omega the money because decision theory, so does it really matter?

[-]cousin_it12y00

If you're proposing to treat Omega's words as just observational evidence that isn't connected to math and could turn out one way or the other with probability 50%, I suppose the existing formalizations of UDT already cover such problems. But how does the agent assign probability 50% to a particular math statement made by Omega? If it's more complicated than "the trillionth digit of pi is even", then the agent needs some sort of logical prior over inconsistent theories to calculate the probabilities, and needs to be smart enough to treat these probabilities updatelessly, which brings us back to the questions asked at the beginning of my post... Or maybe I'm missing something, can you specify your proposal in more detail?

[-]Manfred12y00

Well, I was thinking more in terms of a logical prior over single statements, see my favorite here.

But yeah I guess I was missing the point of the problem.

Also: suppose Omega comes up to you and says "If 1=0 was true I would have given you a billion dollars if and only if you would give me 100 dollars if 1=1 was true. 1=1 is true, so can you spare $100?" Does this sound trustworthy? Frankly not, it feels like there's a principle of explosion problem that insists that Omega would have given you all possible amounts of money at once if 1=0 was true.

A formulation that avoids the principle of explosion is "I used some process that I cannot prove the outcome of to pick a digit of pi. If that digit of pi was odd I would have given you a billion dollars iff [etc]."

[-]Ishaan12y-20

Are you saying that Omega won't even offer you the deal unless it used counter-factual reasoning to figure out what you'll do once it offers?

So if Omega has already offered you the deal and you know the coin came out against your favor, and you find you are physically capable of rejecting the deal, you should reject the deal. You've already fooled Omega into thinking you'll take the deal.

It's just that if you've successfully "pre-committed" to the extent that a 100% accurate Omega has predicted you will take the offer, you'll be physically incapable of not taking the offer. It's just like Newcomb's problem.

[This comment is no longer endorsed by its author]

[-]Ishaan12y-10

And if that's true, it means that the problem we are facing is, how to make an algorithm that can't go back on its pre-commitments even after it gains the knowledge of how the bet came out.

[This comment is no longer endorsed by its author]

[-]Ishaan12y00

Retraction was unintentional - I thought this was a duplicate comment and "unretract" isn't a thing.

[-]Vladimir_Nesov12y00

You can delete and then re-post a retracted comment if it has no replies yet.

[-]Armok_GoB12y00

As a sort of metaphor/intuition pump/heuristic thingy to make sure I understood things right: Can you think of this as A living in platonic space/the Tegmark 4 multiverse "before" your universe and having unlimited power, you importing B from it and using that repeatedly as your limited AI, and the set of axiom sets A cares about as a "level 5 Tegmark multiverse" with each of the logics being its own Tegmark 4 multiverse?

Perhaps not so useful in building such an AI, but I find these kinds of more intuitive rough approximations useful when trying to reason about it using my own human brain.

[-]cousin_it12y20

I'm not sure "existence" is the best intuition pump, maybe it's better to think in terms of "caring", like "I care about what these programs would return" and "I care about what would happen if these logical facts were true". There might well be only one existing program and only one set of true logical facts, but we care about many different ones, because we are uncertain.

[-]Armok_GoB12y00

I already have an intuition setup where "what I care about" and "what really exists" are equivalent. Since, you know, there's nothing else "exists" could mean I can think of and "what I care about" is what it seems to be used like?

May or may not need an additional clause about things that exist to things that exist also existing, recursively.

[-]cousin_it12y40

Let's say you "care" about some hypothetical if you'd be willing to pay a penny today unconditionally in order to prevent your loved ones from dying in that hypothetical. If we take some faraway digit of pi, you'll find that you "care" about both the hypothetical where it's even and the hypothetical where it's odd, even though you know in advance that one of those provably does not "exist". And if you only had a limited time to run a decision theory, you wouldn't want to run any decision theory that threw away these facts about your "care". That's one of the reasons why it seems more natural to me to use "care" rather than "existence" as the input for a decision theory.

[-]Armok_GoB12y00

Sounds like both of us could use either interpretation without any difference in conclusion, and just find different abstractions useful for thinking about it due to small differences in what other kinds of intuitions we've previously trained. Just a minor semantic hiccup and nothing more.

[-]twanvl12y00

How does a logical coin work exactly? To come up with such a thing, wouldn't Omega first need to pick a particular formula? If the statement is about the nth digit of pi, then he needs to pick n. Was this n picked at random? What about the sign of the test itself? If not, how can you be sure that the logical coin is fair?

[-]cousin_it12y00

The approach outlined in the post assumes that "fairness" of the coin is determined by your initial state of logical uncertainty about which math statements are true, rather than indexical uncertainty about which particular Omega algorithm you're going to face. Though I agree that's a big assumption, because we still don't understand logical uncertainty very well.

[-]twanvl12y00

A priori I wouldn't trust Omega to be fair. I only know that he doesn't lie. If Omega also said that he chose the logical statement in some fair way, then that would assure me the logical coin is identical to a normal coin. He can do this either using real uncertainty, like rolling a die to pick from a set of statements where half of them are true. Or he could use logical uncertainty himself, by not calculating the digit of pi before deciding to make the bet, and having a prior that assigns 50% probability to either outcome.

[-]cousin_it12y10

For what it's worth, the post assumes that Omega decides to participate in the game unconditionally: its code doesn't have a branch saying it should play only if such-and-such conditions are met. I'm not sure if that answers your question.
