Ole Peters claims that the standard expected utility toolbox for evaluating wagers is a flawed basis for rational decisionmaking. In particular, it commonly fails to take into account that an investor/bettor taking a series of repeated bets is not an ergodic process.
Optimization Process, internety, myself, and a couple others spent about 5 hours across a couple of Seattle meetups investigating what Peters was saying.
Background
Why do we care?
Proximally, because Nassim Taleb is bananas about ergodicity.
More interestingly, expected utility maximization is widely accepted as the basis for rational decisionmaking. Finding flaws (or at least pathologies) in this foundation is therefore quite high leverage.
A specific example: many people's retirement investment strategies might be said to be taking the "ensemble average" as their optimization target - i.e. their portfolios are built on the assumption that, every year, an individual investor should make the choice that, when averaged across (e.g.) 100,000 investors making that choice for that year, will maximize the mean wealth (or mean utility) of investors in the group at the end of that year. It's claimed that this means that individual retirement plans can't work because many individuals will, in actuality, eventually be impoverished by market swings, and that social insurance schemes (e.g. Social Security) where the current rich are transferring wealth to the current poor avoid this pitfall.
Claims about shortcomings in expected utility maximization are also interesting because I've felt vaguely confused for a long time about why expected value/utility is the right way to evaluate decisions; it seems like I might be more strongly interested in something like "the 99th percentile outcome for the overall utility generated over my lifetime". Any work that promises to pick at the corners of EU maximization is worth looking at.
What does existing non-Peters theory say?
The Von Neumann-Morgenstern theorem says, loosely, that all rational actors are maximizing some utility function in expectation. It's almost certainly not the case that Ole Peters has produced a counterexample, but (again) identifying apparently pathological behavior implied by the VNM math would be quite useful.
Economics research as a whole tends to take it as given that individual actors are trying to maximize, in expectation, the logarithm of their wealth (or some similar risk-averse function mapping wealth to utility).
Specific claims made by Peters et al.
We were pretty confused about this and spent a bunch of investigation time simply nailing down what was being claimed!
- Log-wealth-maximization straightforwardly falls out of doing some "more appropriate" time-average (instead of ensemble-average) analysis of the St. Petersburg lottery.
- You can make rational choices about wealth without needing to pick a "utility function" at all.
- Expected utility maximization has some major pathologies. (didn't have time to dig through this paper enough to identify the specific pathologies claimed)
- There's a major difference between the conclusions you'll come to when reasoning using "ensemble average" instead of "time average".
What we learned
1.5x/0.6x coin flip bet
This is a specific example from https://medium.com/fresheconomicthinking/revisiting-the-mathematics-of-economic-expectations-66bc9ad8f605
Here's what we concluded. [These tags explain the level of proof we used.]
- It is indeed the case that playing many, many rounds of this bet compresses almost all the winnings into a tiny corner of probability space, with "lost a bunch of money" being the overwhelming majority of outcomes. [math proof]
- However, no log-wealth-maximizer would accept the bet, ever (at least, not at the stated "bet entire bankroll every time" stakes). [math proof]
- Betting only a tiny, constant chunk of your bankroll every time instead of all your money at once does, as expected, make you richer most of the time. [Monte Carlo simulation, intuition]
- Reasoning about what happens over a gazillion rounds of the game is a little bunk because you don't have to commit to play a zillion rounds up front. [hand-waving math intuition]
- i.e. if someone is choosing, every round, whether or not to keep playing the game, then arguing that their decision in round N to keep playing is dumb, because it would be a terrible idea to commit to play a gazillion (>> N) rounds up front, is a red herring.
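The first three bullets above can be checked numerically. What follows is a minimal sketch, not the simulation we actually ran at the meetup. It assumes that a player staking a fraction `f` of their bankroll has only the staked part multiplied by 1.5 or 0.6, and the function names (`expected_multiplier`, `expected_log_growth`, `simulate`, `fraction_ahead`) are made up for illustration.

```python
import math
import random

UP, DOWN = 1.5, 0.6  # multipliers applied to the staked part of the bankroll


def expected_multiplier(f):
    """Ensemble-average wealth multiplier for one round when staking fraction f."""
    return 0.5 * (1 + f * (UP - 1)) + 0.5 * (1 + f * (DOWN - 1))


def expected_log_growth(f):
    """Expected change in log-wealth for one round when staking fraction f."""
    return 0.5 * math.log(1 + f * (UP - 1)) + 0.5 * math.log(1 + f * (DOWN - 1))


def simulate(f, rounds, players, seed=0):
    """Final wealth of each simulated player, all starting from wealth 1.0."""
    rng = random.Random(seed)
    finals = []
    for _ in range(players):
        wealth = 1.0
        for _ in range(rounds):
            multiplier = UP if rng.random() < 0.5 else DOWN
            wealth *= 1 + f * (multiplier - 1)
        finals.append(wealth)
    return finals


def fraction_ahead(finals):
    """Share of players who ended with more than their starting wealth."""
    return sum(w > 1.0 for w in finals) / len(finals)


if __name__ == "__main__":
    for f in (1.0, 0.25, 0.05):
        finals = simulate(f, rounds=1000, players=200)
        print(f, expected_multiplier(f), expected_log_growth(f), fraction_ahead(finals))
```

At `f = 1.0` the expected multiplier per round is 1.05, but the expected log-growth is negative, which is why a log-wealth maximizer declines the all-in bet. Under the staking assumption above, the expected log-growth is positive for small `f` and peaks at `f = 0.25`.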
"Rich house, poor player" theorems
The "coin flip" example of the previous section is claimed to be interesting because most players go bankrupt, despite every wager offered being positive expected value to the player.
So then an interesting question arises: can some rich "house" exploit some less-rich "player" by offering a positive-expected-value wager that the player will always choose to accept, but that leads with near certainty to the player's bankruptcy when played indefinitely?
(As noted in the last section, no log-wealth-utility player would take even the first bet, so we chose to steelman/simplify by assuming that wealth == utility (either adjusting the gamble so that it is positive expected utility, or adjusting the player to have utility linear in wealth))
We think it's pretty obvious that, if the house can fund wagers whose player-utility is unbounded (either the house has infinity money, or the player has some convenient utility function), then, yes, the house can almost surely bankrupt the player.
So, instead, consider a house that has some finite amount of money. We have a half-baked math proof ([1] [2]) that there can't exist a way for the house to almost-surely (defined as "drive the probability of bankruptcy to above (1 - epsilon) for any given epsilon") bankrupt the player.
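For reference, here is one standard martingale-style argument that gives a bound of this kind. It is not necessarily the argument in the linked notes. It assumes a closed system in which the player starts with w_0 and the house with h_0, the player's utility is linear in wealth, and every accepted wager has non-negative expected value for the player. The symbols W_n, w_0, and h_0 are introduced here for illustration.

```latex
% W_n = player's wealth after n wagers, with 0 \le W_n \le w_0 + h_0.
% Non-negative expected value for the player means W_n is a submartingale:
\mathbb{E}[W_{n+1} \mid W_1, \dots, W_n] \ge W_n .
% A bounded submartingale converges almost surely to some W_\infty with
% \mathbb{E}[W_\infty] \ge w_0, and W_\infty \le w_0 + h_0, so
w_0 \le \mathbb{E}[W_\infty] \le (w_0 + h_0)\,\Pr(W_\infty > 0)
\quad\Longrightarrow\quad
\Pr(W_\infty = 0) \le \frac{h_0}{w_0 + h_0} < 1 .
```

Under these assumptions the probability of bankruptcy is bounded away from 1 by a margin that depends only on the two starting bankrolls, so it cannot be pushed above 1 - epsilon for arbitrarily small epsilon.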
Tangentially: there's a symmetry issue here: you can just as well say "the house will eventually go bankrupt" if the house will be repeatedly playing some game with unbounded max payoff with many players. However, note that zero-sum games that neither party deems wise to play are not unheard of; risk-averse agents don't want to play any zero-sum games at fair odds!
Paper: The time resolution of the St Petersburg Paradox
This paper claims to apply Peters's time-average (instead of ensemble-average) methods to resolve the St. Petersburg Paradox, and to derive "utility logarithmic in wealth" as a straightforward implication of the time-average reasoning he uses.
We spent about an hour trying to digest this. Unfortunately, academic math papers are often impenetrable even when they're making correct statements using mathematical tools the reader is familiar with, so we're not sure of our conclusions.
Optimization Process also pointed out that equation (6.6) doesn't really make sense for a lottery where the payout is always zero.
This paper works from the assumption that the player is trying to maximize (in expectation) the exponential growth rate of their wealth. We noticed that this is the log-wealth-maximizer - i.e. in order to get from "maximizes growth" to "maximizes the logarithm of wealth", you don't seem to actually need whatever derivation Peters's paper is making.
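The equivalence can be written out in one line. This is a sketch assuming a fixed horizon T and a known starting wealth W_0; the notation is ours, not the paper's.

```latex
g_T = \frac{1}{T} \ln\frac{W_T}{W_0}
\quad\Longrightarrow\quad
\mathbb{E}[g_T] = \frac{1}{T}\Bigl(\mathbb{E}[\ln W_T] - \ln W_0\Bigr)
```

Since T and W_0 are constants, whichever choice maximizes the expected growth rate also maximizes the expected log of final wealth.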
Conclusions
We still don't understand what "the problem with expected utility" is that Peters is pointing at. It seems like expected utility with a risk-averse utility function is sufficient to make appropriate choices in the 1.5x/0.6x flip and St. Petersburg gambles.
Peters's time-average vs. ensemble-average St. Petersburg paper either has broken math, or we don't understand it. Either way, we're still confused about the time- vs. ensemble-average distinction's application to gambles.
Peters's St. Petersburg Paradox paper does derive something equivalent to log-wealth-utility from maximizing expected growth rate, but maybe this is an elaborate exercise in begging the question by assuming "maximize expected growth rate" as the goal.
I, personally, am unimpressed by Peters's claims, and I don't intend to spend more brainpower investigating them.
I haven't read the material extensively (I've skimmed it), but here's what I think is wrong with the time-average-vs-ensemble-average argument and my attempt to steelman it.
It seems very plausible to me that you're right about the question-begging nature of Peters's version of the argument; it seems like by maximizing expected growth rate, you're maximizing log wealth.
But I also think he's trying to point at something real.
In the presentation where he uses the 1.5x/0.6x bet example, Peters shows how "expected utility over time" is an increasing line (this is the "ensemble average" -- averaging across possibilities at each time), whereas the actual payout for any player looks like a straight downward line (in log-wealth) if we zoom out over enough iterations. There's no funny business here -- yes, he's taking a log, but that's just the best way of graphing the phenomenon. It's still true that you lose almost surely if you keep playing this game longer and longer.
This is a real phenomenon. But, how do we formalize an alternative optimization criterion from it? How do we make decisions in a way which "aggregates over time rather than over ensemble"? It's natural to try to formalize something in log-wealth space since that's where we see a straight line, but as you said, that's question-begging.
Well, a (fairly general) special case of log-wealth maximization is the Kelly criterion. How do people justify that? Wikipedia's current "proof" section includes a heuristic argument which runs roughly as follows:
Now, it's easy to see this derivation and think "Ah, so the Kelly criterion optimizes your wealth after a large number of steps, whereas expected utility only looks one step ahead". But, this is not at all the case. An expected money maximizer (EMM) thinking long-term will still take risky bets. Observe that (in the investment setting in which Kelly works) the EMM strategy for a single step doesn't depend on the amount of money you have -- you either put all your money in the best investment, or you keep all of your money because there are no good investments. Therefore, the payout of the EMM in a single step is some multiple C of the amount of money it begins that step with. Therefore, an EMM looking one step ahead just values its winnings at the end of the first step C times more -- but this doesn't change its behavior, since multiplying everything by C doesn't change what the max-expectation strategy will be. Similarly, two-step lookahead only modifies things by C^2, and so on. So an EMM looking far ahead behaves just like one maximizing its holdings in the very next step.
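The induction in that paragraph can be sketched in symbols. The notation is introduced here for illustration: a is the allocation chosen in a step, R_a is the random one-step wealth multiplier under that allocation, and V_n(w) is the best achievable expected wealth with n steps remaining and current wealth w. Returns are assumed independent across steps.

```latex
V_1(w) = \max_a \mathbb{E}[w R_a] = C\,w,
\qquad C = \max_a \mathbb{E}[R_a]
% Inductive step: if V_{n-1}(w) = C^{\,n-1} w, then
V_n(w) = \max_a \mathbb{E}\bigl[V_{n-1}(w R_a)\bigr]
       = C^{\,n-1} \max_a \mathbb{E}[w R_a]
       = C^{\,n} w .
```

The maximizing allocation is the same at every horizon, because the factor C^{n-1} multiplies all options equally.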
The trick in the analysis is the way we replace a big sum over lots of possible ways things could go with a single "typical" outcome. This might initially seem like a mere computational convenience -- after all, the vast vast majority of possible sequences have approximately the expected win/loss frequencies. Here, though, it makes all the difference, because it eliminates from consideration the worlds which have the highest weight in the EMM analysis -- the worlds where things go really well and the EMM gets exponentially much money.
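To make this concrete, here is a rough sketch using the all-in 1.5x/0.6x coin flip from the post as a stand-in for the investment game. It enumerates every possible number of wins exactly, then asks how much of the expected final wealth comes from unusually lucky sequences. The function names and the 60-wins-out-of-100 cutoff are arbitrary choices for illustration.

```python
import math


def outcome_table(n, up=1.5, down=0.6):
    """(wins, probability, final wealth) for each possible win count in n fair flips."""
    rows = []
    for k in range(n + 1):
        prob = math.comb(n, k) * 0.5 ** n
        wealth = up ** k * down ** (n - k)
        rows.append((k, prob, wealth))
    return rows


def tail_summary(n, min_wins):
    """Expected wealth, probability of >= min_wins wins, and that tail's share of the mean."""
    rows = outcome_table(n)
    mean = sum(p * w for _, p, w in rows)
    tail_prob = sum(p for k, p, _ in rows if k >= min_wins)
    tail_share = sum(p * w for k, p, w in rows if k >= min_wins) / mean
    return mean, tail_prob, tail_share


if __name__ == "__main__":
    mean, tail_prob, tail_share = tail_summary(n=100, min_wins=60)
    print(f"expected wealth:            {mean:.1f}")
    print(f"P(at least 60 wins):        {tail_prob:.4f}")
    print(f"tail's share of expectation: {tail_share:.4f}")
```

In this example the sequences with at least 60 wins are rare, yet they account for nearly all of the expected wealth. Those are the worlds that the "typical outcome" substitution throws away.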
OK, so, is the derivation just a mistake?
I think many English-language justifications of the Kelly criterion or log-wealth maximization are misleading or outright wrong. I don't think we can justify it as an analysis of the best long-term strategy, because the analysis rules out any sequence other than those with the most probable statistics, which isn't a move motivated by long-term analysis. I don't think we can even justify it as "time average rather than ensemble average" because we're not time-averaging wealth. Indeed, the whole point is supposedly to deal with the non-ergodic cases; but non-ergodic systems don't have unique time-averaged behavior!
However, I ultimately find something convincing about the analysis: namely, from an evolutionary perspective, we expect to eventually find that only (approximate) log-wealth maximizers remain in the market (with non-negligible funds).
This conclusion is perfectly compatible with expected utility theory as embodied by the VNM axioms et cetera. It's an argument that market entities will tend to have utility=log(money), at least approximately, at least in common situations which we can expect strategies to be optimized for. More generally, there might be an argument that evolved organisms will tend to have utility=log(resources), for many notions of resources.
However, maybe Nassim Nicholas Taleb would rebuke us for this tepid and timid conclusion. In terms of pure utility theory, applying a log before taking an expectation is a distinction without a difference -- we were allowed any utility function we wanted from the start, so requiring an arbitrary transform means nothing. For example, we can "solve" the St. Petersburg paradox by claiming our utility is the log of money -- but we can then re-create the paradox by putting all the numbers in the game through an exponential function! So what's the point? We should learn from our past mistakes, and choose a framework which won't be prone to those same errors.
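The St. Petersburg point can be checked with partial sums. This is a sketch assuming the usual form of the lottery, where the payout is 2^k with probability 2^-k for k = 1, 2, ..., and ignoring the player's prior wealth. The "exponentiated" variant pays exp(2^k) instead. The helper names are hypothetical.

```python
import math


def partial_expectation(utility_of_round, rounds):
    """Sum of P(game ends on round k) * utility of that payout, for k = 1..rounds."""
    return sum(0.5 ** k * utility_of_round(k) for k in range(1, rounds + 1))


def linear_classic(k):
    """Linear utility of the classic payout 2^k."""
    return 2.0 ** k


def log_classic(k):
    """Log utility of the classic payout 2^k, i.e. k * ln 2."""
    return k * math.log(2)


def log_exponentiated(k):
    """Log utility of the exponentiated payout exp(2^k), which is just 2^k."""
    return 2.0 ** k


if __name__ == "__main__":
    for rounds in (10, 20, 40):
        print(
            rounds,
            partial_expectation(linear_classic, rounds),
            partial_expectation(log_classic, rounds),
            partial_expectation(log_exponentiated, rounds),
        )
```

The linear-utility sum grows by 1 with every extra round, and the log-utility sum converges to 2 ln 2. For the exponentiated game, the log-utility sum again grows by 1 per round, so the paradox is back.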
So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with Peters's general idea, but isn't just log-wealth maximization?
Well, let's look again at the Kelly-criterion analysis. Can we make that into a general-purpose decision procedure? Can we get it to produce results incompatible with VNM? If so, is the procedure at all plausible?
As I've already mentioned, there isn't a clear way to apply the law-of-large-numbers trick in non-ergodic situations, because there is not a unique "typical" set of frequencies which emerges. Can we do anything to repair the situation, though?
I propose that we maximize median expected value. This gives a notion of "typical" which does not rely on an application of the law of large numbers, so it's fine if the statistics of our sequence don't converge to a single unique point. If they do, however, the median will evaluate things from that point. So, it's a workable generalization of the principle behind Kelly betting.
The median also relates to something mentioned in the OP:
The median is the 50th percentile, so there you go.
Maximizing the median indeed violates VNM:
Both of these concerns become negligible as we take a long-term view. The longer into the future we look, the more outcomes there will be, making the median more robust to shifting probabilities. Similarly, a median-maximizer is indifferent between the two options above, but if you consider the iterated game, it will strongly prefer the global strategy of always selecting the first option.
Still, I would certainly not prefer to optimize median value myself, or create AGI which optimizes median value. What if there's a one-shot situation which is similar to the 40%-death example? I think I similarly don't want to maximize the 99th percentile outcome, although this is less clearly terrible.
Can we give an evolutionary argument for median utility, as a generalization of the evolutionary argument for log utility? I don't think so. The evolutionary argument relies on the law of large numbers, to say that we'll almost surely end up in a world where log-maximizers prosper. There's no similar argument that we almost surely end up in the "median world".
So, all told:
I like the "Bellman did it better" retort ;p
FWIW, I remain pretty firmly in the expected-utility camp; but I'm quite interested in looking for cracks around the edges, and exploring possibilities.
I agree that there's no inherent decision-theory issue with multi-step problems (except for the intricacies of tiling issues!).
However, the behavior of Bayesian agents with utility linear in money, on the Kelly-betting-style iterated investment game, for high number of iterations, seems viscerally wrong. I can respect treating it as a decision-theoretic counterexample, and looking for decision theories which don't "make that mistake". I'm interested in seeing what the proposals look like.