Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Is Agent Simulates Predictor a "fair" problem?

15Wei Dai

4Chris_Leong

6Wei Dai

2Chris_Leong

4Wei Dai

2Chris_Leong

9cousin_it

6jessicata

2Gordon Seidoh Worley

2Chris_Leong

2Gordon Seidoh Worley

5Pattern

3Wei Dai

1shminux

8Chris_Leong

2shminux

2Gordon Seidoh Worley

2Chris_Leong

1clone of saturn

2Wei Dai

New Answer

New Comment

4 Answers sorted by

My thinking about this is that a problem is fair if it captures some aspect of some real world problem. I believe Gary Drescher came up with ASP as a distillation of the following problem, which itself tries to capture some essense of bargaining in the real world (similar to how Newcomb's Problem is a distillation of Prisoner's Dilemma, which tries to capture some essense of cooperation in the real world):

Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of UDT. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:

"I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear."

The human writes down 9, and UDT, predicting this, prescribes writing down 1. This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.

(It looks like the citation here is wrong, since I can't find a description of this game in Slepnev (2011). As far as I know, I was the first person to come up with this game as something that UDT seems to handle poorly.)

"My thinking about this is that a problem is fair if it captures some aspect of some real world problem" - I would say that you have to accept that the real world can be unfair, but that doesn't make real world problems "fair" in the sense gestured at in the FDT paper. Roughly, it is possible to define a broad class of problems such that you can have an algorithm that optimally handles all of them, for example if the reward only depends on your choice or predictions of your choice.

"It seems unsatisfactory that increased predic...

6

ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it? (I had an idea that I called UDT2 which I think does better on it than UDT1.1 but it's not as elegant as I hoped.) Defining such problem classes may be useful for talking about the technical properties of specific decision theories, but that doesn't seem to be what you're trying to do here. The only other motivation I can think of is finding a way to justify not solving certain problems, but I don't think that makes sense in the case of ASP.

2

"ASP doesn’t seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of “fair” problems that excludes it?" - my intuition is the opposite, that doing well on such problems means doing poorly on others.

4

Can you explain your intuition? (Even supposing your intuition is correct, it still doesn't seem like defining a "fair" class of problems is that useful. Shouldn't we instead try to find a decision theory that offers the best trade-offs on the actual distribution of decision problems that we (or our AIs) will be expected to face?)
To explain my intuition, suppose we had a decision theory that does well on ASP-like problems and badly on others, and a second decision theory that does badly on ASP-like problems and well on others, then we can create a meta decision theory that first tries to figure out what kind of problem it is facing and then select one of these decision theories to solve it. This meta decision theory would itself be a decision theory that does well on both types of problems so such a decision theory ought to exist.
BTW, you can quote others by putting a quote in a separate paragraph and putting ">" in front of it.

2

It still doesn't seem like defining a "fair" class of problems is that useful" - discovering one class of fair problems lead to CDT. Another lead to TDT. This theoretical work is seperate from the problem of producing pragmatic algorithms that deal with unfairness, but both approaches produce insights.
"This meta decision theory would itself be a decision theory that does well on both types of problems so such a decision theory ought to exist" - I currently have a draft post that does allow some kinds of rewards based on algorithm internals to be considered fair and which basically does the whole meta-decision theory thing (that section of the draft post was written a few hours after I asked this question which is why my views in it are slightly different).

I've defined three classes of "fair" problems for UDT, which are all basically equivalent: single player extensive form games, programs with a halting oracle, and formulas in provability logic. But none of these are plain old programs without oracles or stuff. I haven't been able to define any class of "fair" problems involving plain old programs. The most I can do is agree with you: ASP doesn't seem "fair" in spirit and doesn't translate into any of the classes I mentioned. This is an open question - maybe you can find a better "fair" class!

There are some formal notions of fairness that include ASP. See Asymptotic Decision Theory.

Here's one way of thinking about this. Imagine a long sequence of instances of ASP. Both the agent and predictor in a later instance know what happened in all the earlier instances (say, because the amount of compute available in later instances is much higher, such that all previous instances can be simulated). The predictor in ASP is a logical inductor predicting what the agent will do this time.

Looking at the problem this way, it looks pretty fair. Since logical inductors can do induction, if an agent takes actions according to a certain policy, then the predictor will eventually learn this, regardless of the agent's source code. So only the policy matters, not the source code.

To my mind what seems unfair about some problems is that they propose predictors that, to the best of our knowledge, are physically impossible, like a Newcomb Omega that never makes a mistake, although these are only unfair in the sense that they depict scenarios we won't ever encounter (perfect predictors), not that they ask us something mathematically unfair.

Other more mundane types of unfairness, like where a predictor simply demands something so specific that no general algorithm could always find a way to satisfy it, seem more fair to me because they are the sorts of things we actually encounter in the real world. If you haven't encountered this sort of thing, just spend some time with a toddler, and you will be quickly disabused of the notion that there could not exist an agent which demands impossible things.

I already acknowledged in the real post that there exist problems that are unfair, so I don't know why you think we disagree there.

2

I don't think we disagree.

How can a predictor be unfair to an algorithm that enumerates possible worlds and picks the best one, without any "decision theory" whatsoever? Unless by "unfair" you mean something like "you will get a coin that always lands tails, but the heads win, while everyone else gets a fair coin"

I don't quite understand the question, but unfair refers to the environment requiring the internals to be a particular way. I actually think it is possible to allow some internal requirements to be considered fair and I discuss this in one of my draft posts. Nonetheless, it works as a first approximation.

Say you have certain information about the world and calculate the odds of different outcomes and their utilities. For example, in the twin prisoners dilemma the odds of DC and CD are zero, so the choice is between DD and CC. In the Newcomb's problem the odds of getting $1001000 are zero, so the choice is between $1000000 (one-box) and $1000 (two-box). In the Death in Damascus problem the odds of escaping Death are zero, so the choice is to spend money on travel or not. What would be a concrete example of an unfair problem against this approach?

It's impossible to enumerate possible worlds and pick the best one without a decision theory, because your decision process gives the same output in every possible world where you have a given epistemic state. We obviously need counterfactuals to make decisions, and the different decision theories can be seen as different theories about how counterfactuals work.

It's a simple question, but I think it might help if I add in context. In the paper introducing Functional Decision Theory, it is noted that it is impossible to design an algorithm that can perform well on all decision problems since some of them can be specified to be blatantly unfair, ie. punish every agent that isn't an alphabetical decision theorist.

The question then arises, how do we define which problems are or are not fair? We start by noting that some people consider Newcomb's-like problems to be unfair since your outcome depends on a predictor's prediction, which is rooted in an analysis of your algorithm. So what makes this case any different from only rewarding the alphabetical decision theorist?

The paper answers that the prediction only depends on the decision you end up making and that any other internal details are ignored. So it only cares about your decision and not how you come to it, the problem seems fair. I'm inclined to agree with this reasoning, but a similar line of reasoning doesn't seem to hold with Agent Simulates Predictor. Here the algorithm you use is relevant as the predictor can only predict the agent if it's algorithm is less than a certain level of complexity, otherwise it may make a mistake.

Please note that this question isn't about whether this problem is worth considering; life is often unfair and we have to deal with it the best that we can. The question is about whether the problem is "fair", where I roughly understand "fair" meaning that this is in a certain class of problems that I can't specify at this moment (I suspect it would require its own seperate post) where we should be able to achieve the optimal result in each problem.