Exploiting EDT

by Benya_Fallenstein, 10th Nov 2014


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The problem with EDT is, as David Lewis put it, its "irrational policy of managing the news" (Lewis, 1981): it chooses actions not only because of their effects on the world, but also because of what the fact that it's taking these actions tells it about events the agent can't affect at all. The canonical example is the smoking lesion problem.

I've long been uncomfortable with the smoking lesion problem as the case against EDT. An AI system would know its own utility function, and would therefore know whether or not it values "smoking" (presumably, in the AI case, it would be some different goal); if it updates on this fact, it behaves correctly in the smoking lesion problem. (This is an AI-centric version of the "tickle defense" of EDT.) Nate and I have come up with a variant I find much more convincing: a way to get EDT agents to pay you for managing the news for them, which works by the same mechanism that makes these agents one-box in Newcomb's problem. (It's a variation of the thought experiment in my LessWrong post on "the sin of updating when you can change whether you exist".)


Suppose that there's an EDT agent that plays the stock market. It's pretty good at doing so, and has amassed a substantial net worth, but, unsurprisingly, it's not perfect; there's always a small chance of its investments going south. Currently, for example, there's a bit of a hubbub around the CEO of one of the companies the agent has been investing in, and the agent assigns a 0.4% chance that there's a scandal about to be revealed which will force that CEO to resign; if that's the case, it expects to lose $150,000,000.

Along comes a clever AI researcher, who is known to be extremely good at predicting how different kinds of AI systems will react to simple thought experiments, who has access to the source code of our agent, and who happens to be able to access information about whether or not there's a scandal. The researcher could find out and sell the information to the agent, but since it's a low-probability scenario, the value of that information wouldn't be that large. Instead, the researcher has a better idea: they are going to

  1. find out whether or not there is a scandal;
  2. figure out whether or not the agent will pay them $100,000,000 if they ask it to in the next step;
  3. if either (a) there is no scandal and the researcher predicts that the agent will pay up, or (b) there is a scandal and they predict that the agent won't pay up, then they will send the agent a pre-drafted e-mail that explains this whole procedure, explains that either case (a) or (b) turned out to be true (but not which one), and asks the agent to immediately pay the researcher $100,000,000.
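To make the procedure concrete, here is a minimal Python sketch of step 3, together with a rough comparison of the two ways the researcher could try to make money. The function and variable names are my own, and it assumes the prediction in step 2 is always correct. The "value of information" figure is only a crude upper bound, assuming the agent could avoid the entire loss if it knew about the scandal in advance.

```python
from fractions import Fraction

# Numbers taken from the post; exact fractions avoid float rounding.
P_SCANDAL = Fraction(4, 1000)  # the agent's 0.4% credence in a scandal
LOSS = 150_000_000             # what the agent loses if the scandal is real
PRICE = 100_000_000            # what the e-mail asks the agent to pay


def sends_email(scandal: bool, predicts_pay: bool) -> bool:
    """Step 3: send the e-mail exactly in case (a) or case (b)."""
    case_a = (not scandal) and predicts_pay
    case_b = scandal and (not predicts_pay)
    return case_a or case_b


def p_email(agent_pays: bool) -> Fraction:
    """Prior probability that the e-mail goes out, for a given disposition.

    Assumption: the prediction in step 2 always matches what the agent does.
    """
    worlds = [(True, P_SCANDAL), (False, 1 - P_SCANDAL)]
    return sum(
        (p for scandal, p in worlds if sends_email(scandal, agent_pays)),
        Fraction(0),
    )


# Crude upper bound on what the bare information is worth to the agent.
info_value_bound = P_SCANDAL * LOSS
# Researcher's expected income if the agent is disposed to pay up.
expected_income = p_email(True) * PRICE

print(info_value_bound)  # 600000
print(expected_income)   # 99600000
```

Under these assumptions, selling the information is worth at most $600,000, while sending the e-mail to an agent that pays brings in $99,600,000 in expectation.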

We assume that the agent has sufficient information about the researcher to know that they are trustworthy and a good predictor of the agent's behavior. Then, as an EDT agent, it'll calculate expected utility by conditioning on its observations and on the action it's considering taking, as follows:

  • Conditioning on receiving the e-mail and paying up, the AI is in case (a): it's paying $100,000,000, but there is no scandal, so it's not losing the $150,000,000.
  • Conditioning on receiving the e-mail and not paying up, the AI is in case (b): it's not paying the researcher, but there's a scandal, and it's going to lose $150,000,000.

The first of these leads to higher conditional expected utility, so the AI pays up. The researcher decides to use the money to fund investigations into better decision theories.
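The two conditional expectations can be checked with a short Python sketch. This is my own illustration, not a general EDT implementation: it assumes a perfect predictor, counts only the two dollar amounts from the story as utility, and uses hypothetical helper names. For contrast, it also computes the prior expected utility of being the kind of agent that pays versus the kind that refuses.

```python
from fractions import Fraction

P_SCANDAL = Fraction(4, 1000)  # the agent's 0.4% credence in a scandal
LOSS = 150_000_000             # loss if the scandal is real
PRICE = 100_000_000            # payment the e-mail asks for
WORLDS = [(True, P_SCANDAL), (False, 1 - P_SCANDAL)]  # (scandal?, prior)


def email_sent(scandal: bool, pays: bool) -> bool:
    """Cases (a) and (b), assuming the researcher's prediction is perfect."""
    return scandal != pays


def utility(scandal: bool, paid: bool) -> int:
    """Dollars gained (negative means lost) in one outcome."""
    return -(LOSS if scandal else 0) - (PRICE if paid else 0)


def edt_expected_utility(pays: bool) -> Fraction:
    """E[utility | e-mail received, action], as the EDT agent computes it."""
    consistent = [(s, p) for s, p in WORLDS if email_sent(s, pays)]
    total = sum((p for _, p in consistent), Fraction(0))
    return sum((p * utility(s, pays) for s, p in consistent), Fraction(0)) / total


def policy_value(pays: bool) -> Fraction:
    """Prior expected utility of a disposition, before any e-mail arrives."""
    return sum(
        (p * utility(s, pays and email_sent(s, pays)) for s, p in WORLDS),
        Fraction(0),
    )


print(edt_expected_utility(True))   # -100000000
print(edt_expected_utility(False))  # -150000000
print(policy_value(True))           # -100200000
print(policy_value(False))          # -600000
```

Conditional on the e-mail, paying looks $50,000,000 better than refusing, which is why the EDT agent pays. Evaluated before the e-mail arrives, though, the paying disposition loses $100,200,000 in expectation against $600,000 for the refusing one: the scandal is exactly as likely either way, and the payment only changes which e-mails the agent gets to see.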
