## LESSWRONG

Andrew Jacob Sauer

# Posts


Troll Bridge

Thanks for the link, I will check it out!

War and/or Peace (2/8)
> As for cannibalism, it seems to me that its role in Eliezer's story is to trigger a purely illogical revulsion in the humans who anthropomorphise the aliens.

I dunno about you, but my problem with the aliens isn't the cannibalism; it's that the vast majority of them die slow and horribly painful deaths.

> No cannibalism takes place, but the same amount of death and suffering is present as in Eliezer's scenario. Should we be less or more revolted at this?

The same.

> Which scenario has the greater moral weight?

Neither. They are both horrible.

> Should we say the two-species configuration is morally superior because they've developed a peaceful, stable society with two intelligent species coexisting instead of warring and hunting each other?

Not really, because most of them still die slow and horribly painful deaths.

Troll Bridge

Sorry to necro this here, but I find this topic extremely interesting and I keep coming back to this page to stare at it and tie my brain in knots. Thanks for your notes on how it works in the logically uncertain case. I found a different objection based on the assumption of logical omniscience:

Regarding this you say:

> Perhaps you think that the problem with the above version is that I assumed logical omniscience. It is unrealistic to suppose that agents have beliefs which perfectly respect logic. (Un)Fortunately, the argument doesn't really depend on this; it only requires that the agent respects proofs which it can see, and eventually sees the Löbian proof referenced.

However, this assumes that the Löbian proof exists. We show that the Löbian proof of A=cross→U=−10 exists by showing that the agent can prove □(A=cross→U=−10)→(A=cross→U=−10), and the agent's proof seems to assume logical omniscience:

> Examining the agent, either crossing had higher expected utility, or P(cross)=0. But we assumed □(A=cross→U=−10), so it must be the latter. So the bridge gets blown up.

If □ here means "provable in PA", the logic does not follow through if the agent is not logically omniscient: the agent might find crossing to have a higher expected utility regardless, because it may not have seen the proof. If □ here instead means "discoverable by the agent's proof search" or something to that effect, then the logic does seem to follow through (making the reasonable assumption that if the agent can discover a proof of A=cross→U=−10, then it will set its expected value for crossing to −10). However, that would mean we are talking about provability in a system which can only prove finitely many things, which in particular cannot contain PA, and so Löb's theorem does not apply.
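To make "discoverable by the agent's proof search" concrete, here is a toy sketch (my own illustration, not anything from the original discussion): a forward-chaining prover over implications with a fixed step budget. A system like this proves only finitely many sentences, which is the sense in which Löb's theorem fails to apply. The tuple encoding and the name `derivable` are illustrative choices.

```python
def derivable(axioms, goal, max_steps=100):
    """Toy bounded proof search: implications are encoded as
    ("->", premise, conclusion) tuples; anything else is an atom.
    Each step applies one round of modus ponens to everything known."""
    known = set(axioms)
    for _ in range(max_steps):
        if goal in known:
            return True
        # one round of modus ponens over the current knowledge base
        new = {s[2] for s in known
               if isinstance(s, tuple) and s[0] == "->" and s[1] in known}
        if new <= known:
            return False  # fixed point reached: goal is not derivable
        known |= new
    return goal in known

# The agent "sees" the proof of A=cross -> U=-10 only if its search finds it:
axioms = {("->", "A=cross", "U=-10"), "A=cross"}
print(derivable(axioms, "U=-10"))  # True
print(derivable(axioms, "U=+10"))  # False
```

Such a prover reaches a fixed point after finitely many rounds, so "□" interpreted this way is a much weaker notion than provability in PA.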

I am still trying to wrap my head around exactly what this means, since your logic seems unassailable in the logically omniscient case. It is counterintuitive to me that the logically omniscient agent would be susceptible to trolling but the more limited one would not. Perhaps there is a clever way for the troll to get around this issue? I dunno. I certainly have no proof that such an agent cannot be trolled in such a way.

The Strangest Thing An AI Could Tell You

That's what I was thinking. Garbage in, garbage out.

Is this viable physics?

This seems equivalent to Tegmark Level IV Multiverse to me. Very simple, and probably our universe is somewhere in there, but doesn't have enough explanatory power to be considered a Theory of Everything in the physical sense.

Two Alternatives to Logical Counterfactuals

From an omniscient point of view, yes. From my point of view, probably not, but there are still problems that arise relating to this, that can cause logic-based agents to get very confused.

Let A be an agent, considering options X and not-X. Suppose A |- (Action=not-X -> Utility=0). The naive approach would be to say: if A |- (Action=X -> Utility<0), A will do not-X, and if A |- (Action=X -> Utility>0), A will do X. Suppose further that A knows its source code, so it knows this is the case.

Consider the statement G = (A |- G) -> (Action=X -> Utility<0), which can be constructed using Gödel numbering and quines. Present A with the following argument:

Suppose for the sake of argument that A |- G. Then A |- (A |- G), since A knows its source code. Also, by definition of G, A |- ((A |- G) -> (Action=X -> Utility<0)). By modus ponens, A |- (Action=X -> Utility<0). Therefore, by our assumption about A, A will do not-X: Action != X. But then, vacuously, Action=X -> Utility<0. Since we proved this by assuming A |- G, we know that (A |- G) -> (Action=X -> Utility<0); in other words, we know G.

The argument then goes, similarly to above:

1. A |- G
2. A |- (A |- G)
3. A |- ((A |- G) -> (Action=X -> Utility<0))
4. A |- (Action=X -> Utility<0)
5. Action != X

We proved this without knowing anything about X. This shows that naive logical implication can easily lead one astray. The standard solution to this problem is the chicken rule: if A ever proves which action it will take, it immediately takes the opposite action. This blocks the argument presented above, but is defeated by Troll Bridge, even when the agent has good logical uncertainty.
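The chicken rule can be sketched as a thin wrapper around the agent's proof search. This is a toy illustration under my own assumptions: `prove` stands in for the agent's proof predicate (here just an arbitrary callable), and the statement strings are hypothetical labels, not a real logic.

```python
def chicken_agent(prove):
    """Toy proof-based agent with the chicken rule.

    `prove(s)` is a stand-in for the agent's proof search over
    statements about its own behavior."""
    # Chicken rule: if the agent proves what it will do, it immediately
    # does the opposite. This blocks the Löbian argument, which needs
    # "A |- G" to imply that A actually takes the predicted action.
    if prove("Action = X"):
        return "not-X"
    if prove("Action = not-X"):
        return "X"
    # Otherwise, fall back on proved utility implications.
    if prove("Action = X -> Utility < 0"):
        return "not-X"
    return "X"

# A prover that proves nothing: the agent defaults to X.
print(chicken_agent(lambda s: False))  # X
# A prover that shows X is bad: the agent refrains.
print(chicken_agent(lambda s: s == "Action = X -> Utility < 0"))  # not-X
```

The point of the wrapper is that any proof of the agent's own action is self-defeating, so the spurious proof of G above can never get off the ground; Troll Bridge shows this patch is not sufficient on its own.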

These problems seem to me to show that logical uncertainty about the action one will take, paired with logical implications about what the result will be if one takes a particular action, is insufficient to describe a good decision theory.

Two Alternatives to Logical Counterfactuals
> Suppose you learn about physics and find that you are a robot. You learn that your source code is "A". You also believe that you have free will; in particular, you may decide to take either action X or action Y.

My motivation for talking about logical counterfactuals has little to do with free will, even if the philosophical analysis of logical counterfactuals does.

The reason I want to talk about logical counterfactuals is as follows: suppose, as above, that I learn that I am a robot, that my source code is "A" (which is presumed to be deterministic in this scenario), and that I have a decision to make between action X and action Y. In order to make that decision, I want to know which decision has better expected utility. The problem is that, in fact, I will either choose X or Y. Suppose without loss of generality that I will end up choosing action X. Then worlds in which I choose Y are logically incoherent, so how am I supposed to reason about the expected utility of choosing Y?

"No evidence" as a Valley of Bad Rationality

It's hard to tell, since while common sense is sometimes wrong, it's right more often than not. An idea being common sense shouldn't count against it, even though like the article said it's not conclusive.

How to Measure Anything

Seems to me that before a philosophical problem is solved, it becomes a problem in some other field of study. Atomism used to be a philosophical theory. Now that we know how to objectively confirm it, it (or rather, something similar but more accurate) is a scientific theory.

It seems that philosophy (at least, the parts of philosophy that are actively trying to progress) is about taking concepts we have intuitive notions of and figuring out what, if anything, those concepts actually refer to, until we succeed well enough that we can study them in more precise ways than, well, philosophy.

So, how many examples can we find where some vague but important-seeming idea has been philosophically studied until we learn what the idea refers to in concrete reality, and how to observe and measure it to some degree?