Identity and quining in UDT

[-]Manfred11y110

I'm pretty sure we've had this anti-Newcomb problem discussion already. Short version: what this is is a constructive proof that any decision algorithm has to lose on some problems, because Omega could diagonalize against any algorithm, and then any agent implementing that algorithm is hosed. Therefore, it is fruitless to demand that a good decision algorithm or good agent never lose on any problems - one needs a more generous criterion of goodness. As for XDT, I don't see why it shouldn't get the $1M when playing an anti-Newcomb problem. 1,000,000 is bigger than 1,000, after all.

I'm not totally sure why you need the further hardware in this post either. From what I can tell, the real trouble is that logical implication is not what you want here - we're still waiting on a reasonable system of logical counterfactuals. 3, 4', and if I understand it right, 5, are just not addressing that core problem.

[-]Squark11y50

Hi Manfred, thx for commenting!

...what this is is a constructive proof that any decision algorithm has to lose on some problems, because Omega could diagonalize against any algorithm, and then any agent implementing that algorithm is hosed.

See my reply to KnaveOfAllTrades.

As for XDT, I don't see why it shouldn't get the $1M when playing an anti-Newcomb problem. 1,000,000 is bigger than 1,000, after all.

Which anti-Newcomb problem? In the XDT anti-Newcomb problem, 1000 is the maximal payoff. No decision theory gets more. In the UDT anti-Newcomb problem, XDT gets 1,000,1000 while UDT remains with 1,000,000.

...we're still waiting on a reasonable system of logical counterfactuals. 3, 4', and if I understand it right, 5, are just not addressing that core problem.

Well, there is more than one remaining problem :) Regarding logical counterfactuals, I think that the correct approach is going to be via complexity theory. Hope to write about it later, in the meanwhile you can check out this. By now I discovered some problems with the formalism I used there (and a possible path to fixing them), but I think the general direction is right.

[-]Manfred11y20

As for XDT, I don't see why it shouldn't get the $1M when playing an anti-Newcomb problem. 1,000,000 is bigger than 1,000, after all.

Which anti-Newcomb problem? In the XDT anti-Newcomb problem, 1000 is the maximal payoff. No decision theory gets more.

Right, the XDT ANP. Because this is in fact a decision-controlled problem, only from the perspective of an XDT agent. And so they can simply choose to receive $1M on this problem if they know that that's what they're facing. $1M being bigger than $1000, I think they should do so.

But you do raise a good point, which is that there might be some way to avoid being beaten by other agents on decision-controlled problems, if you give up on maximizing payoff. It might depend on what metric of success you optimize the the decision procedure for. If you take the view logically upstream of filling the boxes, the maximum is $1.001M, and success is relative to that. If you take the view downstream, you might be satisfied with $1000 because that's the maximum.

[-]Squark11y00

Right, the XDT ANP. Because this is in fact a decision-controlled problem, only from the perspective of an XDT agent.

It is decision-determined from the perspective of any agent. The payoff only depends on the agent's decision: namely, it's 1000$ for two-boxing and 0$ for one-boxing.

And so they can simply choose to receive $1M on this problem if they know that that's what they're facing. $1M being bigger than $1000, I think they should do so.

Look on the problem from the perspective of the precursor. The precursor knows XDT two-boxes on the problem. There is no way to change this fact. So one box is going to be empty. Therefore building an XDT agent in this situation is no worse than building any other agent.

[-]Manfred11y10

It is decision-determined from the perspective of any agent. The payoff only depends on the agent's decision: namely, it's 1000$ for two-boxing and 0$ for one-boxing.

Yeah, sorry, I misspoke. The contents of the boxes are controlled by the agent's decision, only for an XTD agent.

Look on the problem from the perspective of the precursor. The precursor knows XDT two-boxes on the problem. There is no way to change this fact. So one box is going to be empty. Therefore building an XDT agent in this situation is no worse than building any other agent.

I am using XDT here in the sense of "the correct decision algorithm (whatever it is)." An XDT agent, if faced with the XDT-anti-Newcomb-problem, can, based on its decision, either get $1M, or $1k. If it takes the $1M, it loses in the sense that it does worse on this problem than a CDT agent. If it takes the $1k, it loses in the sense that it just took $1k over $1M :P

And because of XDT's decision controlling the contents of the box, when you say "the payoff is $1000 for two-boxing and $0 for one-boxing," you're begging the question about what you think the correct decision algorithm should do.

[-]Squark11y00

And because of XDT's decision controlling the contents of the box, when you say "the payoff is $1000 for two-boxing and $0 for one-boxing," you're begging the question about what you think the correct decision algorithm should do.

The problem is in the definition of "correct". From my point of view, "correct" decision algorithm means the algorithm that a rational precursor should build. That is, it is the algorithm instantiating which by the precursor will yield at least as much payoff as instantiating any other algorithm.

[-]Manfred11y00

Well, I agree with you there :P But I think you're cashing this out as the fixed point of a process, rather than as the maximization I am cashing it out as.

[-]KnaveOfAllTrades11y80

I didn't follow everything in the post, but it seems like the motivating problem is that UDT fails in an anti-Newcomb problem defined in terms of the UDT agent. But this sounds a lot like a fully general counterargument against decision algorithms; for any algorithm, we can form a decision problem that penalizes exactly that and only that agent. Take any algorithm running on a physical computer and place it in a world where we specify, as an axiom, that any physical instantiation of that algorithm is blasted by a proton beam as soon as it begins to run, before it can act. This makes any algorithm look bad, but this is just a no free lunch argument that every algorithm seems to fail in some worlds, especially ones where the deck is stacked against the algorithm by defining it to lose.

A decision process 'failing' in some worlds is a problem only if there exists some other decision process that does not suffer analogously. (Possible open problem: What do we mean by analogous failure or an analogously stacked world?)

Of course, it's possible this objection is headed off later in the post where I start to lose the train of thought.

[-]Squark11y40

Hi KnaveOfAllTrades, thx for commenting!

Your proton beam problem is not decision-determined in the sense of Yudkowsky 2010. That is, it depends directly on the decision algorithm rather than depending only on the decisions it makes. Indeed it is impossible to come up with a decision theory that is optimal for all problems, but it might be possible to come up with a decision theory that is optimal for some reasonable class of problems (like decision-determined problems).

Now, there is a decision-determined version of your construction. Consider the following "diagonal" problem. The agent makes a decision out of some finite set. Omega runs a simulation of XDT and penalizes the agent iff its decision is equal to XDT's decision.

This is indeed a concern for deterministic agents. However, if we allow the decision algorithm access to an independent source of random bits, the problem is avoided. XDT produces all answers distributed uniformly and gets optimal payoff.

[-]V_V11y60

This is indeed a concern for deterministic agents. However, if we allow the decision algorithm access to an independent source of random bits, the problem is avoided. XDT produces all answers distributed uniformly and gets optimal payoff.

Omega can predict the probability distribution on the agent answers, and it is possible to construct a Newcomb-like problem that penalizes any specific distribution.

[-]Squark11y20

Hi V_V, thx for commenting!

In a Newcomb-like problem, Omega copies the agent and runs it many times to measure the decision distribution. It then penalizes the agent if the distribution matches an explicitly given one. Such a problem is easily solved by the agent since it knows which distribution to avoid.

In an anti-Newcomb-like problem, Omega measures the distribution produced by XDT and compares it with the agent's decision. It then penalizes the agent according to the likelihood of its decision in the distribution. However, if XDT produces a uniform distribution, all agents do equally well.

A trickier diagonalization is a hybrid Newcomb-anti-Newcomb (NAN) problem. Omega copies the agent and runs it multiple times to measure the distribution. It then compares the result with a similar procedure applied to XDT and penalizes the agent if there is a close match.

Now, the NAN diagonalization can be solved if we assume the agent has access to random bits which are common between all of its copies but inaccessible by the rest of the universe (Omega). This assumption can be interpreted to mean the precursor gives the agent a randomly generated "password" before Omega copies it.

[-]V_V11y80

Now, the NAN diagonalization can be solved if we assume the agent has access to random bits which are common between all of its copies but inaccessible by the rest of the universe (Omega). This assumption can be interpreted to mean the precursor gives the agent a randomly generated "password" before Omega copies it.

Sure, but this defeats the purpose of Omega being a predictor.

[-]Squark11y10

Not really. The idea is looking for a decision theory that performs optimally a natural class of problems which is as wide as possible. CDT is optimal for action-determined problems, UDT is supposed to be optimal for decision-determined problems. Allowing this use of randomness still leaves us with a class of problems much wider than action-determined: for example the Newcomb problem!

[-][anonymous]11y00

Hi V_V, thx for commenting!

In Newcomb-like problems (i.e. when Omega copies the source code of the agent) you can penalize any specific distribution but the agent can always choose to produce a different distribution (because it knows which distribution is penalized).

In anti-Newcomb-like problems (i.e. when Omega uses hardcoded XDT) the payoff depends on a single decision rather than a distribution.

[This comment is no longer endorsed by its author]Reply

[-]Wei Dai11y70

Your "UDT anti-Newcomb problem" reminds me of an open problem with TDT/UDT that was identified in the early days and still unsolved AFAIK. See the discussion here, but in short, it was pointed out that if most of the agents in the universe/multiverse are going to be UDT agents, they should be willing to play C in a one-shot PD game against a random agent sampled from the universe/multiverse. But human beings are not yet UDT agents, so it might be better to build some other kind of agent so we (or our successors) can play D instead. The approach suggested here looks like it could be relevant to solving that problem. What do you think?

One thing that worries me here is that if A has more computing power than P, it can compute P's actual output just by simulating it, so with the equations given in this post, you'd have to define how to condition the expectation of U() on statements that are known by the agent to be false. I was hoping that this is not actually a problem that needs to be solved in order to solve decision theory.

[-]Squark11y40

Hi Wei Dai, thx for commenting!

The "game of 3" problem indeed seems related. Apparently my decision theories solve it correctly, if we consider the precursor to be the "AI submitter". The discussion there is of enormous size, can you please point out the part about most of the agents in the multiverse?

Regarding successor-simulating-precursor, I think that any way of applying logical uncertainty that solves logical counterfactual mugging should handle this as well, since the precursor's "Tegmark V" should be preserved by the successor in some sense, and the precursor cannot simulate itself (see also this).

A possible solution is evaluating the expectation value at a depth of analysis (reference amount of computing resources) corresponding to the precursor's computing power. This doesn't mean the successor's superior computing power is wasted since it allows for more exhaustive maximization. In case of equation (4') the extra power is also used for evaluating pi.

Need to think more on this!

[-]Wei Dai11y40

Apparently my decision theories solve it correctly, if we consider the precursor to be the "AI submitter".

Can you go into some detail on how your decision theories solve it?

The discussion there is of enormous size, can you please point out the part about most of the agents in the multiverse?

I guess it wasn't mentioned explicitly in that discussion, but it's how I've come to think of the problem. Perhaps the most relevant part of that discussion is Eliezer's direct reply, here.

[-]Squark11y10

Can you go into some detail on how your decision theories solve it?

Your problem can be written as

$U\(\$ %20=%20\frac{1}{3}(PD_\alpha(A,%20B_1(A))+PD_\alpha(A,B_2(A))))

where $B\_1$ and $B\_2$ are Omega's players, $A=P\(\$ ) is your player and $PD\_\\alpha$ is the payoff of the first player in the Prisoner Dilemma with the (1,5,6) payoff matrix.

Omega's players end up playing C regardless of $A$ . The agent can either understand this or at least fail to find a strong dependence of the logical probabilities of Omega's players' strategy on either their input (the agent's source) or the conditions in the expectation values it is evaluating (since the conditions are of the form $P\(\$ =X) which seems to be correlated with $B\_i\(P\(\$ )) only in the obvious way i.e. by determining the input to $B\_i$ ).

Therefore, the highest expectation values will be computed for conditions of the form $P\(\$ =DefectBot). Therefore the agent will defect.

I guess it wasn't mentioned explicitly in that discussion, but it's how I've come to think of the problem. Perhaps the most relevant part of that discussion is Eliezer's direct reply, here.

I see. However, your problem doesn't seem to be a realistic model of acausal bargaining with agents in other universes, since in such bargaining you know who you're cooperating with. For example, when an agent considers filling its universe with human utility, it does it in order to cooperate with a human FAI, not in order to cooperate with a paperclip maximizer (which would require a very different strategy namely filling its universe with paperclips).

[-]Wei Dai11y60

It's more of a model for FAI meeting an alien AI in space. Suppose each side then has the choice of doing an arms buildup or not, and the payoffs for these choices are analogous to PD. (If one side builds up while the other doesn't, it can attack and conquer the other. If both sides build up, it's a stalemate and just wastes resources.) BTW, what was your "UDT anti-Newcomb problem" intended to be a model for?

I guess if both sides are using your decision theory, then whether the human FAI plays C or D against the alien AI depends on how much logical correlation the FAI thinks exists between its human designer and the alien AI's designer, which does make sense (assuming we solve the problem of the FAI just simulating its designer and already knowing what their output is).

[-]Squark11y00

It's more of a model for FAI meeting an alien AI in space.

Makes sense.

BTW, what was your "UDT anti-Newcomb problem" intended to be a model for?

Frankly, I didn't have a specific realistic scenario in mind. I came up with the anti-Newcomb problem as simple semi-artificial problem demonstrating the problems with quining in UDT. The reason I started thinking about these problems is that it doesn't seem "classical" UDT can be translated to realistic AGI architecture. UDT takes a finite number of bits and produces a finite number of bits whereas a realistic AGI has continuous input and output streams. Such an AGI has to somehow take into account a formal specification of its own hardware, and the natural way of introducing such a specification seems to me to go through introducing a precursor, specifically a precursor which is a Solomonoff average over all "theories of physics" containing the formal specification.

[-]drnickbone11y50

Consider the following decision problem which I call the "UDT anti-Newcomb problem". Omega is putting money into boxes by the usual algorithm, with one exception. It isn't simulating the player at all. Instead, it simulates what would a UDT agent do in the player's place.

This was one of my problematic problems for TDT. I also discussed some Sneaky Strategies which could allow TDT, UDT or similar agents to beat the problem.

[-]Squark11y50

Hi drnickbone, thx for pointing this out! I added links to your posts.

[-]jessicat11y50

This is an interesting approach. The way I'm currently thinking of this is that you ask what agent a UDT would design, and then do what that agent does, and vary what type an agent is between the different designs. Is this correct?

Consider the anti-Newcomb problem with Omega's simulation involving equation (2)

So is this equation (2) with P replaced with something else?

However, the computing power allocated for evaluation the logical expectation value in (2) might be sufficient to suspect P's output might be an agent reasoning based on (2).

I don't understand this sentence.

[-]Squark11y30

Hi jessicat, thx for commenting!

The way I'm currently thinking of this is that you ask what agent a UDT would design, and then do what that agent does, and vary what type an agent is between the different designs. Is this correct?

Sounds about right.

So is this equation (2) with P replaced with something else?

No, it's the same P. When I say "Omega's simulation doesn't involve P" I mean Omega is not executing P and using the result. Omega is using equation (2) directly, but P still enters into equation (2).

I don't understand this sentence.

Logical uncertainty (the way I see it), is a way to use a certain amount of computing resources to assign probabilities to the outcomes of a computation requiring a larger amount of computing resource. These probabilities depend on the specific resource bound. However, this is not essential to the point I'm making. The point I'm making is that if the logical uncertainty ensemble assigns non-zero probability to P producing XDT, we end up with logical correlation that is better avoided.

[-]orthonormal11y40

By the way, there are other reasons that we use quining to study decision theories within (virtual) mathematical universes. Most importantly, it lets us play with the logic of provability in a straightforward way, which gives us some really nice polynomial-time tools for analyzing the outcomes. See Benja's modal UDT implementation in Haskell and my intro to why this works (especially Sections 6 and 7).

Of course, there are things outside that scope we want to study, but for the moment provability logic is a good lamppost under which we can search for keys.

[-]V_V11y40

When one is programming an AI, it doesn't seem optimal for the AI to locate itself in the universe based solely on its own source code. After all, you build the AI, you know where it is (e.g. running inside a robot), why should you allow the AI to consider itself to be something else, just because this something else happens to have the same source code

IIUC, the point of making the agent use quining to locate itself in the world is that it allows it to cooperate with copies of itself in games such as program-equilibrium prisoner dilemma.
Your proposal based on origin will fail to cooperate in such cases.

(more realistically, happens to have a source code correlated in the sense of logical uncertainty)?

I'm not sure I understand what you are referring to. Quining allows the agent to shortcut Gödelian logical uncertainty in certain cases: the agent can not predict its own actions before it executes them, but if it notices that certain parts of the world contain copies of itself, it can infer that, whatever function they compute, it is the same function as its own. In various cases this is sufficient to achieve coordination.

[-]Squark11y10

IIUC, the point of making the agent use quining to locate itself in the world is that it allows it to cooperate with copies of itself in games such as program-equilibrium prisoner dilemma. Your proposal based on origin will fail to cooperate in such cases.

Not really. Two XDT agents with symmetric precursors will cooperate since conditioning on the output of one precursor will determine the output of the other precursor. An agent running equation (4') will cooperate with a regular UDT agent e.g. by letting pi be a UDT agent. In fact, this latter property doesn't seem to generalize to the other variants which looks like a good argument in favor of (4')!

Quining allows the agent to shortcut Gödelian logical uncertainty in certain cases: the agent can not predict its own actions before it executes them, but if it notices that certain parts of the world contain copies of itself, it can infer that, whatever function they compute, it is the same function as its own. In various cases this is sufficient to achieve coordination.

The problem is that from the precursor's point of view, such behavior is not always rational. More precisely, it is rational when there is a logical relation between the precursor's decision and the other copy (e.g. Omega copied the successor after its creation or Omega copied the precursor and used it to create a copy of the successor). It is not rational when the other copy occurs by coincidence. Indeed, a precursor running UDT will not choose to build a UDT agent in such situations.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

15

Identity and quining in UDT

15

15