Optional stopping

Ege Erdil

I offer you an opportunity to play the following game: if you agree to play, I will flip a fair coin until you tell me to stop, and for each toss of the coin if the coin comes up heads I'll pay you 1$, if the coin comes up tails you'll pay me 1$. Your advantage is that you can stop playing the game whenever you want. How much should you be willing to pay to play this game if you only care about the expected value of your payoff, i.e. if you have linear utility?

Let's try to work this out. If is the expected value of the game for you, then $V$ solves $V = max {0, ((V - 1) + (V + 1)) / 2} = max {0, V}$ because you can either stop for an expected value of 0 or toss a coin for an expected value of $V - 1$ with probability $1 / 2$ and $V + 1$ with probability $1 / 2$ , and we assume you make the optimal choice.

Well, that got us nowhere: this equation is solved by any $V \geq 0$ . All we learned is that the expected value of the game is not negative, which is trivial since you can always just choose to stop the moment you start playing! What's worse is that under our setup this Bellman equation seems to be the only constraint we can find on $V$ , so it seems as if we should say this game has no well-defined expected value.

The bounded game

Let's investigate another variant of this problem to see what we can say about it. Suppose we both have finite bankrolls, $A, B$ respectively for you and me. The game must stop when either of us hits zero, since then we're not guaranteed to be able to pay if we lose the next coin toss.

In this case, your expected value will be a function of both our bankrolls, and the Bellman equation will now become

$V (A, B) = ⎧ ⎨ ⎩ \begin{matrix} max {0, (V (A + 1, B - 1) + V (A - 1, B + 1)) / 2} & if A, B \geq 1 V (A, 0) = 0 V (0, B) = 0 \end{matrix}$

Now we can prove that there is only one solution and that is $V = 0$ everywhere. Clearly $V \geq 0$ everywhere by definition, so we can remove the maximum and simply solve the equation

$V (A, B) = ⎧ ⎨ ⎩ \begin{matrix} (V (A + 1, B - 1) + V (A - 1, B + 1)) / 2 & if A, B \geq 1 V (A, 0) = 0 V (0, B) = 0 \end{matrix}$

The nice thing about this is that it is a decoupled set of single-variable functional equations on each line $A + B = N$ for all $N$ , and on each such line the function must be zero because it's harmonic with zero boundary conditions on both ends. To make this explicit, suppose we fix $N \geq 1$ and define $c = V (1, N - 1)$ . Then, we'll have

$2 c = 2 V (1, N - 1) = V (2, N - 2) + V (0, N) = V (2, N - 2)$

$4 c = 2 V (2, N - 2) = V (1, N - 1) + V (3, N - 3) = c + V (3, N - 3), V (3, N - 3) = 3 c$

and from here it's easy to show by induction that $V (k, N - k) = k c$ . However, we also know that on the other end $V (N, 0) = 0$ , so this implies that $c = 0$ and therefore $V (A, B) = 0$ on the whole line $A + B = N$ . Since $N$ was arbitrary, we deduce $V$ is zero everywhere. In other words, playing this game is worth nothing in the case both our bankrolls are finite.

The unbounded game

I want to make the case that there's indeed something deep that we've stumbled over here. Suppose that we go back to the original game with our bankrolls being unbounded and you execute the following strategy: play until you win $X$ dollars and then stop, for $X$ to be determined later. In other words, play until the difference between the number of heads and the number of tails that have come up so far is equal to $X$ and then stop. What happens?

Well, let's figure out the probability that the game stops, since it's not clear that with this strategy the game stops at all. If $D$ is the difference between the heads and tails that have come up thus far, then the probability $p (D)$ this strategy will eventually halt is going to solve

$p (D) = {\begin{matrix} (p (D - 1) + p (D + 1)) / 2 & if D < X 1 & if D \geq X \end{matrix}$

This is quite similar to what we had before and it's easy to solve it in the same way. Suppose $p (X - 1) = 1 - c$ for some $c$ . Then, we can argue as we did before that $p (X - k) = 1 - k c$ for all $k$ , and since probabilities are bounded from below by $0$ the only way this can work is if $c = 0$ and $p$ is equal to $1$ everywhere. In other words, our strategy stops with probability $1$ no matter what $X$ is.

This is the source of the problem in the unbounded case: for any $X$ we have strategies that stop almost surely (with probability equal to $1$ ) and that give us arbitrarily large amounts of profit from the game. However, since the bounded version of the game has an expected value of zero, we can deduce that there's something fishy about these strategies: they all rely crucially on the fact that you have an infinite bankroll and can afford to lose any finite amount of money.

We might therefore ask the following question: what condition do we need a strategy to satisfy in order to guarantee that its expected value is zero? In other words, if the space of all possible strategies in the infinite version of the game includes some that aren't "sensible", which strategies in this collection can we say are "sensible" in some sense?

The optional stopping theorem

Let's formalize the question we're asking. A strategy in this game is just going to be a decision of when we stop playing and it can only depend on the information we have up until that point (plus some randomness if we want). In other words, in the $k$ th round we must make the decision to stop or not based on information we have for the first $k - 1$ rounds. Then the time in which we stop playing itself becomes a random variable, say $T$ . Our profit from the game is

$π = \infty \sum k = 1 C_{k} 1_{T \geq k}$

where $C_{k} \in {1, - 1}$ encodes the result of the $k$ th coin flip: it's equal to $1$ if it came up heads and $- 1$ if it came up tails. This sum is just cutting off the sum if we ever decide to stop playing, so the $k$ th flip $C_{k}$ only appears if $T$ is actually larger than or equal to $k$ .

If we now do a naive calculation of the expected profit, we'll get

$E [π] = E [\infty \sum k = 1 C_{k} 1_{T \geq k}] = \infty \sum k = 1 E [C_{k} 1_{T \geq k}]$

and now we can use the fact that we know whether $T \geq k$ or not at the end of the $(k - 1)$ th round to use the law of total expectation:

$= \infty \sum k = 1 E [E_{k - 1} [C_{k} 1_{T \geq k}]]] = \infty \sum k = 1 E [E_{k - 1} [C_{k}] 1_{T \geq k}]] = 0$

since the conditional expectation of $C_{k}$ at time $k - 1$ , which I denoted by $E_{k - 1} [C_{k}]$ here, is always zero.

However, something must've gone wrong in this argument, because we actually have a concrete example of a strategy which gives us positive expected profit. Where did the mistake creep in?

It turns out it's the step in which we swap the expectation with the infinite sum. We can always do this if the sum is finite, of course, but if it's infinite we have to be more careful. Since the expectation is a kind of integral and an infinite sum is the limit of its partial sums, the question is essentially about when we can swap a limit with an integral. The dominated convergence theorem controls when we can interchange these two operations: we can do it if we can bound the absolute value of the partial sums from above by something that has finite expected value.

Can we do this? Well, the best we can really do here is the triangle inequality, so we get

$∣ ∣ ∣ ∣ N \sum k = 1 C_{k} 1_{T \geq k} ∣ ∣ ∣ ∣ \leq N \sum k = 1 | C_{k} | 1_{T \geq k} = N \sum k = 1 1_{T \geq k} \leq \infty \sum k = 1 1_{T \geq k} = T$

This is exactly what we wanted: if $E [T]$ is finite, in other words if our strategy has a finite expected stopping time, it has an expected payoff of zero.

We have proved the following:

Optional stopping theorem, bounded increments form: If the coin flip payoffs $C_{k}$ are all bounded from above in absolute value by an absolute constant $M$ independent of $k$ , then any strategy which has a stopping time of finite expected value has expected payoff equal to zero.

In fact the statement applies to much more general objects than our simple coin flipping game: any martingale with bounded increments is going to obey this condition, even if the increments are not independently and identically distributed. In fact, satisfying some version of the optional stopping theorem turns out to be equivalent to being a martingale: this is the basis for the claim that "martingales represent fair games/efficient markets".

What about our strategy of stopping when $D = X$ ? The stopping time of this turns out to have a probability mass function $f_{T} (k) \sim 1 / k^{3 / 2}$ and so its expected value is

$E [T] = \infty \sum k = 1 k f_{T} (k) \sim \infty \sum k = 1 \frac{1}{\sqrt{k}} = \infty$

as we already knew from applying the optional stopping theorem.

What to make of this?

I've written this post to highlight how questions about "what is a fair game?" become much trickier to answer when the games in question are infinite in some sense. Even if a game is "locally fair", in the sense that the payoff is a martingale, it may not be "globally fair" when exploited by using certain ill-behaved strategies. The optional stopping thoerem controls how bad this problem can get.

However, on the flip side, the optional stopping theorem also gives us a guarantee that all strategies which are well behaved will behave "as they should", in the sense that they won't get the player using them any profit from nowhere.

To see how powerful this is, consider the following problem: if I flip a fair coin until we either get $X$ more heads than tails or $X$ more tails than heads and then stop, on average how many times do I flip the coin?

There is both an elementary solution to this problem which uses Bellman equations as we did above, and a solution using the optional stopping theorem on the martingale $D^{2} - k$ where $D$ is the number of heads minus the number of tails so far and $k$ is the number of turns played so far. The argument using the optional stopping theorem is both much nicer and generalizes much more readily to other setups than this simple coin flipping game.

^{^}

...glossing over the question as to if you actually know when you reach said max if you should stop.

[-]TLW2y30

How much should you be willing to pay to play this game if you only care about the expected value of your payoff, i.e. if you have linear utility?

It's a random walk in 1D. It will hit any finite integer value "eventually" with probability one.

(Of course, this assumes that you have an arbitrarily long amount of time available.)

Once you have this insight the rest of the behavior becomes clear.

A 1D random walk attains a maximum (or minimum) value of ~~O(sqrt(steps))~~ O(sqrt(steps log log steps)). So if you can play for N steps at max, you should be willing to pay ~~O(sqrt(N))~~ O(sqrt(N log log N))^[1]. This is obviously unbounded in the limit .

In other words, playing this game is worth nothing in the case both our bankrolls are finite.

Another way to look at this: the probability of a random walk starting at zero hitting $b$ before $- a$ is $\frac{a}{a + b}$ . So your expected payoff is $b \frac{a}{a + b} + - a (1 - \frac{a}{a + b}) = \frac{a b}{a + b} - a (\frac{(a + b) - a}{a + b}) = \frac{a b}{a + b} - \frac{a b}{a + b} = 0$

EDIT: thanks to Ege Erdil for the correction. The expected final value is O(sqrt(N)); the expected maximum value is O(sqrt(N log log N)). An interesting distinction I was previously unaware of.

^{^}
...glossing over the question as to if you actually know when you reach said max if you should stop.

[-]Ege Erdil2y10

This is all true, but these are all points I make in the post already.

Edit: I couldn't point this out earlier because I was on my phone, but the maximum attained by a random walk is not actually , it's $O (\sqrt{N log log N})$ . It really doesn't affect your argument, but strictly speaking the claim is otherwise not true.

[-]TLW2y20

Interesting. You were correct and I was wrong here. I was unaware that the expected maximum was enough larger than the expected final value that it ended up asymptotically different.

[-]TLW2y10

Is it?

Ctrl-F "random walk" in the post shows no hits (aside from this comment.)

I don't use the same vocabulary but your profit from the game is obviously a random walk.

obviously

It is obvious in hindsight; many things that are obvious in hindsight aren't necessarily obvious until pointed out.

I thought it good to explicitly point out the connection in case someone didn't see it.

LESSWRONG
LW

Optional stopping

14

The bounded game

The unbounded game

The optional stopping theorem

What to make of this?

New to LessWrong?

14