It's not clear to me that it's impossible, and I think it's worth exploring the idea further before giving up on it. In particular, I think that saying "optimizing expected money is the thing that Bob cares about" assumes the conclusion. Bob cares about having the most money he can actually get, so I don't see why he should do the thing that almost-surely leads to bankruptcy. In the limit as the number of bets goes to infinity, the probability of not being bankrupt will converge to 0. It's weird to me that something of measure 0 probability can swamp the entirety of the rest of the probability.
Hmm. I think we might be misunderstanding each other here.
When I say Gwern's post leads to "approximately Kelly", I'm not trying to say it's exactly Kelly. I'm not even trying to say that it converges to Kelly. I'm trying to say that it's much closer to Kelly than it is to myopic expectation maximization.
Similarly, I'm not trying to say that Kelly maximizes expected value. I am trying to say that expected value doesn't summarize wipeout risk in a way that is intuitive for humans, and that those who expect myopic expected values to persist across a time series of games in situations like this will be very surprised.
I do think that people making myopic decisions in situations like Bob's should in general bet Kelly instead of maximizing expected value. I think an understanding of what ergodicity is, and whether a statistic is ergodic, helps to explain why. Given this, I also think it makes sense to ask whether you should be looking for bets that are more ergodic in their ensemble average (like index funds rather than poker).
In general, I find expectation maximization unsatisfying because I don't think it deals well with wipeout risk. Reading Ole Peters helped me understand why people were so excited about Kelly, and reading this article by Gwern helped me understand that I had been interpreting expectation maximization in a very limited way in the first place.
In the limit of infinite bets like Bob's with no cap, myopic expectation maximization at each step means that most runs will go bankrupt. I don't find the extremely high returns in the infinitesimally probable regions to make up for that. I'd like a principled way of expressing that which doesn't rely on having a specific type of utility function, and I think Peters' ergodicity economics gets most but not all the way there.
Other than that, I don't disagree with anything you've said.
If we're already sacrificing max utility to create a myopic agent that's lower risk, why would we not also want it to maximize temporal average rather than ensemble average to reduce wipeout risk?
Arthur found an exact formula which uses so few operations that I wasn't sure how to benchmark it meaningfully.
Oh, cool. I'll have to read your post again more carefully.
rather than simplified strawmen
Myopic expectation maximization may be a bad argument, but I don't think it's a strawman. People do believe that you should expectation-maximize on each step of a coin-flipping game, rather than over the full history of the game. They act on that belief and go bust, like 30% of the players in Haghani & Dewey. Those people would actually do better adopting an ergodic statistic.
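To make that concrete, here's a quick Monte Carlo sketch (my own illustration, not from Haghani & Dewey's actual setup) of a 60/40 coin game. The myopic expectation-maximizer bets the full bankroll every flip, since each flip has positive expected value; the Kelly bettor stakes the fraction 2p − 1 = 20%. All parameters here are my own choices.

```python
# A 60/40 coin game: the myopic EV-maximizer goes all-in every flip,
# the Kelly bettor stakes 2p - 1 = 20% of the bankroll each flip.
import random

random.seed(0)
P_WIN, FLIPS, RUNS = 0.6, 100, 10_000

def play(fraction):
    """Final wealth after FLIPS bets of `fraction` of the bankroll."""
    wealth = 1.0
    for _ in range(FLIPS):
        stake = wealth * fraction
        wealth += stake if random.random() < P_WIN else -stake
    return wealth

myopic = [play(1.0) for _ in range(RUNS)]  # all-in: one loss means ruin
kelly = [play(0.2) for _ in range(RUNS)]   # Kelly fraction for even odds

print("myopic runs bankrupt:", sum(w == 0.0 for w in myopic) / RUNS)
print("kelly median wealth: ", sorted(kelly)[RUNS // 2])
```

The ensemble average of the all-in strategy is enormous (1.2^100), but essentially every individual run ends at zero, while the Kelly bettor's median bankroll grows steadily.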
I now understand that Bellman-based RL learns a value function that ends up maximizing expected value over a history rather than myopically. That doesn't mean that any AI agent using expectation maximization will do this. In particular, I worry that people will wrap a world model in naive expectation maximization and end up with an agent that goes bust in resources. This seems like something people are actually trying to do with LLMs.
Since writing the original post, I've found Gwern's post about a solution to something almost identical to Bob's problem. In this post, he creates a decision tree for every possible move starting from the first one, determining final value at the leaf nodes. He then uses the Bellman equation and traditional expected value to back out what you should do in the earliest moves. The answer is that you bet approximately Kelly.
Gwern's takeaway here is (I think) that expected value always works, but you have to make sure you're solving the right problem. Using expected value naively at each step, discounting the temporal nature of the problem, leads to ruin.
I think many of the more philosophical points in my original post still stand, as doing backwards induction even on this toy problem is pretty difficult (it took his software 16 hours to find the solution). Collapsing a time series expected value problem to a one-shot Kelly problem saves a lot of effort, but to do that you need an ergodic statistic. Even once you've done that, you should still make sure the game is worth playing before you actually start betting.
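For intuition about what that backward induction looks like, here's a toy sketch in the same spirit (far smaller than Gwern's 16-hour computation; the cap, round count, and bet grid are my own choices, and wealth is discretized to cents so states can be memoized).

```python
# Toy Bellman backward induction for a capped 60/40 betting game.
# value(w, t) = expected final wealth from wealth w with t rounds left,
# choosing the best bet fraction at each step.
from functools import lru_cache

P_WIN = 0.6   # chance the coin comes up in your favor
CAP = 10.0    # hard ceiling on wealth, as in the Haghani & Dewey game
FRACTIONS = [i / 20 for i in range(21)]  # candidate fractions 0, 0.05, ..., 1

@lru_cache(maxsize=None)
def value(wealth, rounds_left):
    """Expected final wealth under optimal play (Bellman recursion)."""
    if rounds_left == 0 or wealth <= 0.0:
        return wealth
    best = 0.0
    for f in FRACTIONS:
        stake = wealth * f
        win = min(round(wealth + stake, 2), CAP)   # winnings clamped at cap
        lose = round(wealth - stake, 2)
        ev = (P_WIN * value(win, rounds_left - 1)
              + (1 - P_WIN) * value(lose, rounds_left - 1))
        best = max(best, ev)
    return best

print("value of $1 with 15 rounds left:", round(value(1.0, 15), 3))
```

Even at this tiny scale the state space multiplies quickly, which is why collapsing the time series to a one-shot Kelly problem is such a saving.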
The temporal average is pretty much just the average exponential growth rate. The reason that it works to use this here is that it's an ergodic quantity in this problem, so the statistic that it gives you in the one-step problem matches the statistic that it would give you for a complete time-sequence. That means if you maximize it in the one-step problem, you'll end up maximizing it in the time-series too (not true of expected value here).
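The gap between the two statistics is easy to see in simulation. Here's a sketch using Peters' classic coin (+50% or −40% of wealth with equal odds; the round and run counts are my own choices): the ensemble-average growth rate is positive, the time-average growth rate is negative, and almost every individual trajectory decays.

```python
# Ensemble vs. time average for a multiplicative coin game:
# wealth is multiplied by 1.5 or 0.6 with equal probability.
import math
import random

random.seed(1)
UP, DOWN, ROUNDS, RUNS = 1.5, 0.6, 1000, 2000

ensemble_rate = math.log(0.5 * UP + 0.5 * DOWN)        # log of the mean factor
time_rate = 0.5 * math.log(UP) + 0.5 * math.log(DOWN)  # mean of the log factor

finals = []
for _ in range(RUNS):
    w = 1.0
    for _ in range(ROUNDS):
        w *= UP if random.random() < 0.5 else DOWN
    finals.append(w)

print(f"ensemble rate {ensemble_rate:+.3f}, time rate {time_rate:+.3f}")
print("runs that ended below $1:", sum(w < 1.0 for w in finals) / RUNS)
```

The ensemble average grows at about +4.9% per round, yet the time-average rate is about −5.3%, so maximizing the one-step ensemble statistic tells you nothing about what a single player will actually experience.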
I think your justification for Kelly is pragmatically sufficient, but theoretically leaves me a bit cold. I'm interested in knowing why Kelly is the right choice here, and Ole Peters' paper blew my mind when I read it the first time because it finally gave an answer to this.
Well put. I agree that we should try to maximize the value that we expect to have after playing the game.
My claim here is that just because a statistic is named "expected value" doesn't mean it's accurately representing what we expect to happen in all types of situations. In Alice's game, which is ergodic, traditional ensemble-averaging based expected value is highly accurate. The more tickets Alice buys, the more her actual value converges to the expected value.
In Bob's game, which is non-ergodic, ensemble-based expected value is a poor statistic. It doesn't actually predict the value that he would have. There's no convergence between Bob's value and "expected value", so it seems strange to say that Bob "expects to get" the result of the ensemble average here.
You can certainly calculate Bob's ensemble average, and it will have a higher result than the temporal average (as I state in my post). My claim is that this doesn't help you, because it's not representative of Bob's game at all. In those situations, maximizing temporal average is the best you can do in reality, and the Kelly criterion maximizes that. Trying to maximize ensemble-based expected value here will wipe you out.
I wrote this with the assumption that Bob would care about maximizing his money at the end, and that there would be a high but not infinite number of rounds.
On my view, your questions mostly don't change the analysis much. The only difference I can see is that if he literally only cares about beating Alice, he should go all in. In that case, having $1 less than Alice is equivalent to having $0. That's not really how people use money though, and seems pretty artificial.
How are you expecting these answers to change things?
I think you give short shrift to Ole Peters' ideas here. His argument is similar to the one about maximizing repeated bets, but it holds together a lot better. I particularly like his explanation of the St. Petersburg problem in his paper.
You say that "We can't time-average our profits [...] So we look at the ratio of our money from one round to the next." But that's not what Peters does! He looks at maximizing total wealth, in the limit as time goes to infinity.
In particular, we want to maximize $U = \lim_{T\to\infty} U_0 \prod_{t=0}^{T} R_t$, where $U$ is wealth after all the bets and $R_t$ is 1 plus the percent increase from bet $t$. The unique correct thing to maximize is wealth after all your bets.
You want to know what choice to make for any given decision, so you want to maximize your rate of return for each individual bet, which is $\left(\prod_{t=0}^{T} R_t\right)^{1/T}$. Peters does a few variable substitutions in the limit as $T \to \infty$ to get $R_t$ as a function of probabilities for outcomes of the bets (see the paper), and finds $\left(\prod_{t=0}^{T} R_t\right)^{1/T} = \prod_n r_n^{p_n}$, where $r_n$ is the gain from one possible outcome of the bet and $p_n$ is the probability of that outcome.
Then you just choose how much to bet to maximize $\prod_n r_n^{p_n}$. The argmax of a product is the same as the argmax of the sum of the logs, so choosing to maximize this time average will lead to the same observed behavior as choosing to maximize for log utility in ensemble averages (because $\log r_n^{p_n} = p_n \log r_n$).
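As a numeric sanity check (my own sketch, not from Peters' paper): for an even-money bet at $p = 0.6$, the two outcomes are $r = 1 + f$ with probability $p$ and $r = 1 - f$ with probability $1 - p$, and scanning for the fraction that maximizes $\prod_n r_n^{p_n}$ recovers the Kelly fraction $2p - 1$.

```python
# Maximize the time-average growth sum_n p_n * log(r_n) over bet fractions
# and compare the argmax to the closed-form Kelly fraction 2p - 1.
import math

P = 0.6  # probability of winning an even-money bet

def growth(f):
    """log of prod r_n^{p_n} for this two-outcome bet at fraction f."""
    return P * math.log(1 + f) + (1 - P) * math.log(1 - f)

fractions = [i / 1000 for i in range(1000)]  # 0.000 to 0.999
best = max(fractions, key=growth)

print(f"argmax fraction: {best:.3f}")
print(f"Kelly 2p - 1:    {2 * P - 1:.3f}")
```

Both come out at 0.2, which is the equivalence to log-utility maximization described above.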