__Ole Peters__ claims that the standard expected utility toolbox for evaluating wagers is a flawed basis for rational decisionmaking. In particular, it commonly fails to take into account that an investor/bettor taking a series of repeated bets is not an __ergodic__ process.

Optimization Process, internety, myself, and a couple others spent about 5 hours across a couple of Seattle meetups investigating what Peters was saying.

# Background

## Why do we care?

Proximally, because __Nassim Taleb is bananas about ergodicity__.

More interestingly, expected utility maximization is widely accepted as the basis for rational decisionmaking. Finding flaws (or at least pathologies) in this foundation is therefore quite high leverage.

A specific example: many people's retirement investment strategies might be said to be taking the "ensemble average" as their optimization target - i.e. their portfolios are built on the assumption that, every year, an individual investor should make the choice that, when averaged across (e.g.) 100,000 investors making that choice for that year, will maximize the mean wealth (or mean utility) of investors in the group at the end of that year. It's __claimed__ that this means that individual retirement plans can't work because many individuals will, in actuality, eventually be impoverished by market swings, and that social insurance schemes (e.g. Social Security) where the current rich are transferring wealth to the current poor avoid this pitfall.

Claims about shortcomings in expected utility maximization are also interesting because I've felt vaguely confused for a long time about why expected value/utility is the right way to evaluate decisions; it seems like I might be more strongly interested in something like "the 99th percentile outcome for the overall utility generated over my lifetime". Any work that promises to pick at the corners of EU maximization is worth looking at.

## What does existing non-Peters theory say?

The __Von Neumann-Morgenstern theorem__ says, loosely, that all rational actors are maximizing *some* utility function in expectation. It's almost certainly not the case that Ole Peters has produced a counterexample, but (again) identifying apparently pathological behavior implied by the VNM math would be quite useful.

Economics research as a whole tends to take it as given that individual actors are trying to maximize, in expectation, the logarithm of their wealth (or some similar risk-averse function mapping wealth to utility).

# Specific claims made by Peters et al.

We were pretty confused about this and spent a bunch of investigation time simply nailing down what was being claimed!

- __Log-wealth-maximization straightforwardly falls out of doing some "more appropriate" time-average (instead of ensemble-average) analysis of the St. Petersburg lottery.__
- __You can make rational choices about wealth without needing to pick a "utility function" at all.__
- __Expected utility maximization has some major pathologies.__ (didn't have time to dig through this paper enough to identify the specific pathologies claimed)
- __There's a major difference between the conclusions you'll come to when reasoning using "ensemble average" instead of "time average".__

# What we learned

__1.5x/0.6x coin flip bet__

This is a specific example from __https://medium.com/fresheconomicthinking/revisiting-the-mathematics-of-economic-expectations-66bc9ad8f605__

Here's what we concluded. [These tags explain the level of proof we used.]

- It is indeed the case that playing many, many rounds of this bet compresses almost all the winnings into a tiny corner of probability space, with "lost a bunch of money" being the overwhelming majority of outcomes. [math proof]
- __However, no log-wealth-maximizer would accept the bet, ever (at least, not at the stated "bet entire bankroll every time" stakes). [math proof]__
- Betting only a tiny, constant chunk of your bankroll every time instead of all your money at once does, as expected, make you richer most of the time. [Monte Carlo simulation, intuition]
- Reasoning about what happens over a gazillion rounds of the game is a little bunk because you don't have to commit to play a zillion rounds up front. [hand-waving math intuition]
  - i.e. if someone is choosing, every round, whether or not to keep playing the game, then pointing out that their decision in round N to keep playing is dumb (because it would be a terrible idea to commit to play a gazillion ( >> N ) rounds up front) is a red herring.
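A minimal Monte Carlo sketch of the bullets above, assuming the bet multiplies the staked amount by 1.5 on heads and 0.6 on tails with a fair coin. The 200-round horizon, the 5,000 simulated players, and the 10% stake standing in for "a tiny, constant chunk" are illustrative choices, not the settings of our meetup simulation.

```python
import math
import random
import statistics

UP, DOWN = 1.5, 0.6  # assumed payoffs: the stake is multiplied by 1.5 on heads, 0.6 on tails


def play(rounds, fraction, rng):
    """Final wealth, starting from 1.0, when staking `fraction` of the bankroll each round."""
    wealth = 1.0
    for _ in range(rounds):
        stake = wealth * fraction
        multiplier = UP if rng.random() < 0.5 else DOWN
        wealth = (wealth - stake) + stake * multiplier
    return wealth


def summarize(fraction, rounds=200, players=5000, seed=0):
    """Median final wealth and share of players who end ahead, across simulated players."""
    rng = random.Random(seed)
    finals = [play(rounds, fraction, rng) for _ in range(players)]
    return {
        "median": statistics.median(finals),
        "share_ahead": sum(w > 1.0 for w in finals) / players,
    }


# Per-round figures for the all-in version of the bet.
expected_multiplier = 0.5 * UP + 0.5 * DOWN                       # positive expected value
expected_log_growth = 0.5 * math.log(UP) + 0.5 * math.log(DOWN)   # negative expected log-growth

all_in = summarize(fraction=1.0)   # bet the entire bankroll every time
small = summarize(fraction=0.1)    # illustrative "small constant chunk"

print(f"expected multiplier per round: {expected_multiplier:.3f}")
print(f"expected log-growth per round: {expected_log_growth:.4f}")
print(f"all-in:    median {all_in['median']:.3g}, share ahead {all_in['share_ahead']:.1%}")
print(f"10% stake: median {small['median']:.3g}, share ahead {small['share_ahead']:.1%}")
```

The all-in bet has a positive expected multiplier but a negative expected log-growth per round, which is why a log-wealth maximizer declines it while the typical all-in player ends up far below their starting wealth.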

## "Rich house, poor player" theorems

The "coin flip" example of the previous section is claimed to be interesting because most players go bankrupt, despite every wager offered being positive expected value to the player.

So then an interesting question arises: can some rich "house" exploit some less-rich "player" by offering a positive-expected-value wager that the player will always choose to accept, but that leads with near certainty to the player's bankruptcy when played indefinitely?

(As noted in the last section, no log-wealth-utility player would take even the first bet, so we chose to steelman/simplify by assuming that wealth == utility (either adjusting the gamble so that it *is* positive expected utility, or adjusting the player to have utility linear in wealth))

We think it's pretty obvious that, if the house can fund wagers whose player-utility is unbounded (either the house has infinity money, or the player has some convenient utility function), then, yes, the house can __almost surely__ bankrupt the player.

So, instead, consider a house that has some finite amount of money. We have a half-baked math proof (__[1]__ __[2]__) that there can't exist a way for the house to almost-surely (defined as "drive the probability of bankruptcy to above (1 - epsilon) for any given epsilon") bankrupt the player.

Tangentially: there's a symmetry issue here: you can just as well say "the house will eventually go bankrupt" if the house will be repeatedly playing some game with unbounded max payoff with many players. However, note that zero-sum games that neither party deems wise to play are not unheard of; risk-averse agents don't want to play *any* zero-sum games at fair odds!

__Paper: The time resolution of the St Petersburg Paradox__

This paper claims to apply Peters's time-average (instead of ensemble-average) methods to resolve the __St. Petersburg Paradox__, and to derive "utility logarithmic in wealth" as a straightforward implication of the time-average reasoning he uses.
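For orientation, here is a rough sketch of the two quantities being contrasted. It assumes the usual textbook form of the lottery (payout 2^(n-1) with probability 2^-n, where n is the flip on which heads first appears); the function names, the example wealth of 100, and the ticket prices are our own illustrative choices, not taken from the paper.

```python
import math


def truncated_expected_payout(max_rounds):
    """Expected payout if the lottery is cut off after `max_rounds` possible outcomes."""
    return sum(0.5 ** n * 2 ** (n - 1) for n in range(1, max_rounds + 1))


def expected_log_growth(wealth, price, max_rounds=200):
    """Expected change in log-wealth from buying one ticket at `price`.

    Requires wealth - price + 1 > 0 so every outcome leaves positive wealth.
    Outcomes beyond `max_rounds` are dropped; each has weight below 2**-200.
    """
    total = 0.0
    for n in range(1, max_rounds + 1):
        payout = 2 ** (n - 1)
        total += 0.5 ** n * math.log((wealth - price + payout) / wealth)
    return total


# The ensemble-average payout grows without bound as more outcomes are included...
print(truncated_expected_payout(10), truncated_expected_payout(100))

# ...while the expected log-growth is finite and changes sign as the price rises.
cheap = expected_log_growth(wealth=100, price=2)
pricey = expected_log_growth(wealth=100, price=50)
print(cheap, pricey)
```

Under these assumptions, each additional outcome adds 0.5 to the expected payout, whereas the expected log-growth depends on the buyer's wealth and turns negative once the ticket is expensive enough.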

We spent about an hour trying to digest this. Unfortunately, academic math papers are often impenetrable even when they're making correct statements using mathematical tools the reader is familiar with, so we're not sure of our conclusions.

Optimization Process also pointed out that equation (6.6) doesn't really make sense for a lottery where the payout is always zero.

This paper works from the assumption that the player is trying to maximize (in expectation) the exponential growth rate of their wealth. __We noticed that this is the log-wealth-maximizer__ - i.e. in order to get from "maximizes growth" to "maximizes the logarithm of wealth", you don't seem to actually need whatever derivation Peters's paper is making.
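To spell out the step we noticed, here is a short sketch. The symbols are ours, not the paper's: $W_0$ is current wealth, $W_T$ is wealth after a horizon of length $T$, and $g$ is the exponential growth rate over that horizon.

$$
g = \frac{1}{T}\ln\frac{W_T}{W_0}
\quad\Longrightarrow\quad
\mathbb{E}[g] = \frac{1}{T}\bigl(\mathbb{E}[\ln W_T] - \ln W_0\bigr)
$$

Since $T$ and $W_0$ are fixed when the choice is made, whichever option maximizes $\mathbb{E}[g]$ also maximizes $\mathbb{E}[\ln W_T]$.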

# Conclusions

We still don't understand what "the problem with expected utility" is that Peters is pointing at. It seems like expected utility with a risk-averse utility function is sufficient to make appropriate choices in the 1.5x/0.6x flip and St. Petersburg gambles.

Peters's time-average vs. ensemble-average St. Petersburg paper either has broken math, or we don't understand it. Either way, we're still confused about the time- vs. ensemble-average distinction's application to gambles.

Peters's St. Petersburg Paradox paper does derive something equivalent to log-wealth-utility from maximizing expected growth rate, but maybe this is an elaborate exercise in begging the question by assuming "maximize expected growth rate" as the goal.

I, personally, am unimpressed by Peters's claims, and I don't intend to spend more brainpower investigating them.

I haven't read the material extensively (I've skimmed it), but here's what I think is wrong with the time-average-vs-ensemble-average argument and my attempt to steelman it.

It seems very plausible to me that you're right about the question-begging nature of Peters's version of the argument; it seems like by maximizing expected growth rate, you're maximizing log wealth.

But I also think he's trying to point at something real.

In the presentation where he uses the 1.5x/0.6x bet example, Peters shows how "expected utility over time" is an increasing line (this is the "ensemble average" -- averaging across possibilities at each time), whereas the actual payout for any player looks like a straight downward line (in log-wealth) if we zoom out over enough iterations. There's no funny business here -- yes, he's taking a log, but that's just the best way of graphing the phenomenon. It's still true that you lose almost surely if you keep playing this game longer and longer.

This is a real phenomenon. But, how do we formalize an alternative optimization criterion from it? How do we make decisions in a way which "aggregates over time rather than over ensemble"? It's natural to try to formalize something in log-wealth space since that's where we see a straight line, but as you said, that's question-begging.

Well, a (fairly general) special case of log-wealth maximization is the Kelly criterion. How do people justify that? Wikipedia's current "proof" section includes a heuristic argument which runs roughly as follows:

Now, it's easy to see this derivation and think "Ah, so the Kelly criterion optimizes your wealth after a large number of steps, whereas expected utility only looks one step ahead". But, this is not at all the case. An expected money maximizer (EMM) thinking long-term will still take risky bets. Observe that (in the investment setting in which Kelly works) the EMM strategy for a single step doesn't depend on the amount of money you have -- you either put all your money in the best investment, or you keep all of your money because there are no good investments. Therefore, the payout of the EMM in a single step is some multiple C of the amount of money it begins that step with. Therefore, an EMM looking one step ahead just values its winnings at the end of the first step C times more -- but this doesn't change its behavior, since multiplying everything by C doesn't change what the max-expectation strategy will be. Similarly, two-step lookahead only modifies things by C^2, and so on. So an EMM looking far ahead behaves just like one maximizing its holdings in the very next step.
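As a rough check of this horizon-independence, the sketch below uses the 1.5x/0.6x coin flip from the original post as the investment (an assumption on my part; any favorable bet would do) and restricts both agents to a constant staked fraction chosen from a 1% grid. Expectations are computed exactly by summing over the number of heads.

```python
import math

UP_GAIN, DOWN_LOSS = 0.5, 0.4                # staked money gains 50% on heads, loses 40% on tails
FRACTIONS = [i / 100 for i in range(101)]    # candidate constant stakes: 0%, 1%, ..., 100%


def outcomes(rounds, fraction):
    """Yield (probability, final wealth) for each possible number of heads, starting from 1.0."""
    for heads in range(rounds + 1):
        prob = math.comb(rounds, heads) * 0.5 ** rounds
        wealth = (1 + UP_GAIN * fraction) ** heads * (1 - DOWN_LOSS * fraction) ** (rounds - heads)
        yield prob, wealth


def expected_wealth(rounds, fraction):
    return sum(p * w for p, w in outcomes(rounds, fraction))


def expected_log_wealth(rounds, fraction):
    return sum(p * math.log(w) for p, w in outcomes(rounds, fraction))


choices = {}
for rounds in (1, 2, 10):
    emm_choice = max(FRACTIONS, key=lambda f: expected_wealth(rounds, f))
    log_choice = max(FRACTIONS, key=lambda f: expected_log_wealth(rounds, f))
    choices[rounds] = (emm_choice, log_choice)
    print(f"horizon {rounds:>2}: EMM stakes {emm_choice:.2f}, log-maximizer stakes {log_choice:.2f}")
```

On this grid the EMM stakes everything at every horizon, while the log-wealth maximizer stakes a quarter of its bankroll at every horizon. Looking further ahead changes neither choice.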

The trick in the analysis is the way we replace a big sum over lots of possible ways things could go with a single "typical" outcome. This might initially seem like a mere computational convenience -- after all, the vast vast majority of possible sequences have approximately the expected win/loss frequencies. Here, though, it makes all the difference, because it eliminates from consideration the worlds which have the highest weight in the EMM analysis -- the worlds where things go really well and the EMM gets exponentially much money.
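To put rough numbers on this, here is a sketch for 100 all-in rounds of the 1.5x/0.6x bet. The cutoff for "typical" (40 to 60 heads) is my own arbitrary choice. It compares how much probability the typical sequences carry with how much of the expected wealth they carry.

```python
import math

ROUNDS = 100
UP, DOWN = 1.5, 0.6          # all-in multipliers from the coin-flip example
TYPICAL = range(40, 61)      # "typical" sequences: 40 to 60 heads (illustrative cutoff)


def probability(heads):
    """Probability of exactly `heads` heads in ROUNDS fair flips."""
    return math.comb(ROUNDS, heads) * 0.5 ** ROUNDS


def final_wealth(heads):
    """Wealth after ROUNDS all-in bets with `heads` wins, starting from 1.0."""
    return UP ** heads * DOWN ** (ROUNDS - heads)


expected_wealth = sum(probability(k) * final_wealth(k) for k in range(ROUNDS + 1))
typical_probability = sum(probability(k) for k in TYPICAL)
typical_value_share = sum(probability(k) * final_wealth(k) for k in TYPICAL) / expected_wealth

print(f"expected wealth:                   {expected_wealth:.1f}")
print(f"probability of a typical sequence: {typical_probability:.3f}")
print(f"share of expected wealth they hold: {typical_value_share:.4f}")
```

The typical sequences hold most of the probability but only a small sliver of the expectation, so discarding the atypical ones discards almost everything the EMM cares about.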

OK, so, is the derivation just a mistake?

I think many English-language justifications of the Kelly criterion or log-wealth maximization are misleading or outright wrong. I don't think we can justify it as an analysis of the best long-term strategy, because the analysis rules out any sequence other than those with the most probable statistics, which isn't a move motivated by long-term analysis. I don't think we can even justify it as "time average rather than ensemble average" because we're not time-averaging wealth. Indeed, the whole point is supposedly to deal with the non-ergodic cases; but non-ergodic systems don't have unique time-averaged behavior!

However, I ultimately find something convincing about the analysis: namely, from an evolutionary perspective, we expect to eventually find that only (approximate) log-wealth maximizers remain in the market (with non-negligible funds).

This conclusion is perfectly compatible with expected utility theory as embodied by the VNM axioms et cetera. It's an argument that market entities will tend to have utility=log(money), at least approximately, at least in common situations which we can expect strategies to be optimized for. More generally, there *might* be an argument that evolved organisms will tend to have utility=log(resources), for many notions of resources.

However, maybe Nassim Nicholas Taleb would rebuke us for this tepid and timid conclusion. In terms of pure utility theory, applying a log before taking an expectation is a distinction without a difference -- we were allowed any utility function we wanted from the start, so requiring an arbitrary transform means nothing. For example, we can "solve" the St. Petersburg paradox by claiming our utility is the log of money -- but we can then re-create the paradox by putting all the numbers in the game through an exponential function! So what's the point? We should learn from our past mistakes, and choose a framework which won't be prone to those same errors.
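To make the re-created paradox concrete, here is one way to write it down, assuming the usual form of the lottery in which outcome $n$ occurs with probability $2^{-n}$. The exponentiated payout $x_n = e^{2^n}$ is my illustrative choice, and initial wealth is ignored for simplicity. With $u(x) = \ln x$:

$$
\mathbb{E}[u] = \sum_{n=1}^{\infty} 2^{-n}\,\ln\!\left(e^{2^{n}}\right) = \sum_{n=1}^{\infty} 2^{-n}\cdot 2^{n} = \sum_{n=1}^{\infty} 1 = \infty
$$

So the log-utility agent faces the same divergence in this game that a linear-utility agent faces in the original one.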

So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with Peters's general idea, but isn't just log-wealth maximization?

Well, let's look again at the Kelly-criterion analysis. Can we make that into a general-purpose decision procedure? Can we get it to produce results incompatible with VNM? If so, is the procedure at all plausible?

As I've already mentioned, there isn't a clear way to apply the law-of-large-numbers trick in non-ergodic situations, because there is not a unique "typical" set of frequencies which emerges. Can we do anything to repair the situation, though?

I propose that we maximize median expected value. This gives a notion of "typical" which does not rely on an application of the law of large numbers, so it's fine if the statistics of our sequence don't converge to a single unique point. If they do, however, the median will evaluate things from that point. So, it's a workable generalization of the principle behind Kelly betting.
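As a sanity check on the "generalization of Kelly" claim, here is a small sketch. It again assumes the 1.5x/0.6x coin flip from the original post, a constant staked fraction on a 1% grid, and a 100-round horizon, all of which are my choices for illustration. It picks the fraction that maximizes the median final wealth.

```python
import math

UP_GAIN, DOWN_LOSS = 0.5, 0.4                # staked money gains 50% on heads, loses 40% on tails
ROUNDS = 100
FRACTIONS = [i / 100 for i in range(101)]    # candidate constant stakes: 0%, 1%, ..., 100%


def median_heads(rounds):
    """Smallest head count whose cumulative probability reaches 1/2 for a fair coin."""
    cumulative = 0.0
    for heads in range(rounds + 1):
        cumulative += math.comb(rounds, heads) * 0.5 ** rounds
        if cumulative >= 0.5:
            return heads
    return rounds


def median_final_wealth(fraction, rounds=ROUNDS):
    # Final wealth never decreases as the number of heads increases, so the
    # median wealth is the wealth reached at the median head count.
    heads = median_heads(rounds)
    return (1 + UP_GAIN * fraction) ** heads * (1 - DOWN_LOSS * fraction) ** (rounds - heads)


median_choice = max(FRACTIONS, key=median_final_wealth)
print(f"median-maximizing stake: {median_choice:.2f}")
print(f"median wealth all-in:    {median_final_wealth(1.0):.3g}")
print(f"median wealth at choice: {median_final_wealth(median_choice):.3g}")
```

In this setting the median-maximizing stake coincides with the fraction that maximizes expected log-wealth, which is what we would hope for if the median is standing in for the "typical" outcome.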

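The median also relates to something mentioned in the OP: the interest in something like "the 99th percentile outcome for the overall utility generated over my lifetime".

The median is the 50th percentile, so there you go.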

Maximizing the median indeed violates VNM:

Both of these concerns become negligible as we take a long-term view. The longer into the future we look, the more outcomes there will be, making the median more robust to shifting probabilities. Similarly, a median-maximizer is indifferent between the two options above, but if you consider the iterated game, it will strongly prefer the global strategy of always selecting the first option.

Still, I would certainly not prefer to optimize median value myself, or create AGI which optimizes median value. What if there's a one-shot situation which is similar to the 40%-death example? I think I similarly don't want to maximize the 99th percentile outcome, although this is less clearly terrible.

Can we give an evolutionary argument for median utility, as a generalization of the evolutionary argument for log utility? I don't think so. The evolutionary argument relies on the law of large numbers, to say that we'll almost surely end up in a world where log-maximizers prosper. There's no similar argument that we almost surely end up in the "median world".

So, all told:

I now like the "time vs ensemble" description better. I was trying to understand everything coming from a Bayesian frame, but actually, all of these ideas are more frequentist.

In a Bayesian frame, it's natural to think directly in terms of a decision rule. I didn't think time-averaging was a good description because I didn't see a way for an agent to directly replace ensemble average with time average, in order to make decisions:

- [...] possibilities. When you try to time-average to get rid of your uncertainty, you have to ask "time average *what*?" -- you don't know what specific situation you're in.
- [...] *actual* time in the analysis; in actual time, you end up dead and lose all your money, so the time-average analysis is trivial.

However, all of these points are also true of frequentism:

- [...] reference class problem -- what infinite sequence of experiments do you conceive of your experiment as part of?

So, I now think what Ole Peters is working on is frequentist decision theory. Previously, the frequentist/Bayesian debate was about statistics and science, but decision theory was predominantly Bayesian. Ole Peters is working out the natural theory of decision making which frequentists could/should have been pursuing. (So, in that sense, it's much more than just a new argument for Kelly betting.)

Describing frequentist-vs-Bayesian as time-averaging vs possibility-averaging (aka ensemble-averaging) seems perfectly appropriate.

So, on my understanding, Ole's response to the three difficulties could be:

- [...] *given an objective scenario* the decision-making technique does well -- the same as frequentists wanting estimates to be unbiased. Bayesians want decisions and estimates to be optimal *given our uncertainty* instead.

Yes. As I've pointed out before, a lot of these problems go away if you simply solve the actual problem instead of a pseudo-problem. Decision theory, and Bayesian decision theory, has no problem with multi-step processes, like POMDPs/MDPs - or at least, I have yet to see anyone explain what, if anything, of Peters/Taleb's 'criticisms' of expected-value goes away if you actually solve the corresponding MDP. (Bellman did it better 70 years ago.)

I like the "Bellman did it better" retort ;p

FWIW, I remain *pretty firmly* in the expected-utility camp; but I'm *quite* interested in looking for cracks around the edges, and exploring possibilities.

I agree that there's no inherent decision-theory issue with multi-step problems (except for the intricacies of tiling issues!).

However, the behavior of Bayesian agents with utility linear in money, on the Kelly-betting-style iterated investment game, for high number of iterations, seems viscerally wrong. I can respect treating it as a decision-theoretic counterexample, and looking for decision theories which don't "make that mistake". I'm interested in seeing what the proposals look like.

Thanks for taking the time to delve into this!

You note that expected utility with a risk-averse utility function is sufficient to make appropriate choices [in those particular scenarios].

This is a slight tangent, but I'm curious to what extent you think people actually follow something that approximates this utility function in real life? It seems like some gamblers instinctively use a strategy of this nature (e.g. playing with house money) or explicitly run the numbers (e.g. the Kelly criterion). And I doubt that anyone is dumb enough to keep betting their entire bankroll on a positive EV bet until they inevitably go bust.

But in other cases (like retirement planning, as you mentioned) a lot of people really do seem to make the mistake of relying on ensemble-average probabilities. Some of them will get burned, with much more serious consequences than merely making a silly bet at the casino.

I guess what I'm asking is: even if Peters et al are wrong about expected utility, do you think they're right about the dangers of failing to understand ergodicity?

Not sure. I can't tell what additional information, if any, Peters is contributing that you can't already get from learning about the math of wagers and risk-averse utility functions.

It seems to me like it's right. So far as I can tell, the "time-average vs ensemble average" argument doesn't really make sense, but it's still true that log-wealth maximization is a distinguished risk-averse utility function with especially good properties.

(I'm not claiming Peters is necessarily adding anything to this analysis.)

(I've only spent several hours thinking about this, so I'm not confident in what I say below. I think Ole Peters is saying something interesting, although he might not be phrasing things in the best way.)

Time-average wealth maximization and utility=log(wealth) give the same answers for multiplicative dynamics, but for additive dynamics they can prescribe different strategies. For example, consider a game where the player starts out with $30, and a coin is flipped. If heads, the player gains $15, and if tails, the player loses $11. This is an additive process since the winnings are added to the total wealth, rather than calculated as a percentage of the player's wealth (as in the 1.5x/0.6x game). Time-average wealth maximization asks whether (15 − 11)/2 > 0, and takes the bet. The agent with utility=log(wealth) asks whether (log(30 + 15) + log(30 − 11))/2 > log(30), and refuses the bet.
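The two comparisons are quick to check numerically. The variable names below are just for illustration.

```python
import math

wealth, gain, loss = 30, 15, 11

# Additive criterion: average change in wealth per flip.
average_change = (gain - loss) / 2

# Log-utility criterion: expected log-wealth after the flip vs. log-wealth now.
expected_log_after = (math.log(wealth + gain) + math.log(wealth - loss)) / 2
log_now = math.log(wealth)

takes_bet_additive = average_change > 0
takes_bet_log = expected_log_after > log_now

print(f"average change per flip: {average_change}")
print(f"certainty equivalent under log utility: {math.exp(expected_log_after):.2f} vs. {wealth} now")
print(f"additive agent takes the bet: {takes_bet_additive}; log agent takes the bet: {takes_bet_log}")
```

The log-utility agent's certainty equivalent is the geometric mean of 45 and 19, which falls a little short of 30, hence the refusal.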

What happens when this game is repeatedly played? That depends on what happens when a player reaches negative wealth. If debt is allowed, the time-average wealth maximizer racks up a lot of money in almost all worlds, whereas the utility=log(wealth) agent stays at $30 because it refuses the bet each time. If debt is not allowed, and instead the player "dies" or is refused the game once they hit negative wealth, then with probability at least 1/8, the time-average wealth maximizer dies (if it gets tails on the first three tosses), but when it *doesn't* manage to die, it still racks up a lot of money.

In a world where this was the "game of life", the utility=log(wealth) organisms would soon be out-competed by the time-average wealth maximizers that happened to survive the early rounds. So the organisms that tend to evolve in this environment will have utility linear in wealth.
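Here is a rough simulation of the no-debt version, as a sketch only: the 200-round horizon, the 10,000 simulated players, and the random seed are arbitrary choices of mine.

```python
import random
import statistics

START, GAIN, LOSS = 30, 15, 11


def play_until_ruin(rounds, rng):
    """Return final wealth, or None if wealth ever goes negative (the player "dies")."""
    wealth = START
    for _ in range(rounds):
        wealth += GAIN if rng.random() < 0.5 else -LOSS
        if wealth < 0:
            return None
    return wealth


rng = random.Random(0)
results = [play_until_ruin(200, rng) for _ in range(10_000)]
survivors = [w for w in results if w is not None]

ruin_rate = 1 - len(survivors) / len(results)
median_survivor_wealth = statistics.median(survivors)

print(f"share of players ruined within 200 rounds: {ruin_rate:.3f}")
print(f"median wealth among survivors:             {median_survivor_wealth}")
```

The ruin rate should come out above the 1/8 lower bound, since three opening tails is only one of several ways to go negative, while the survivors finish well above the $30 that a refusing log-utility agent would still hold.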

So I understand Ole Peters to be saying that time-average wealth maximization adapts to the game being played, in the sense that organisms which follow its prescriptions will tend to out-compete other kinds of organisms.

Tangentially: reading about the history of gambling theory (the "unfinished game" problem, etc.) is pretty interesting.

Imagine how weird it was when people basically didn't understand expected value at all! Did casinos even know what they were doing, or did they somewhat routinely fail after picking the wrong game design? Did they only settle on profitable designs by accident? Are blackjack, roulette, and other very old games still with us because they happened not to bankrupt casinos that ran them, and were only later analyzed with tools capable of identifying whether the house had the edge?

1. Something like MVP. Don't start by throwing a brand new game out there - even if you have the edge in the game, you have to get people to play it. Getting the stuff for a new game + advertising costs money. Test it out a little (small scale). If you lose money testing it*, you paid a little bit of money to find out you'd have lost a lot of money if you'd tried it out big. (More naturally - big companies are at times known for staying the same, with startups coming in with new ideas. If you copy ideas from other people that haven't bankrupted them...)

2. It seems like it's possible to make it by on "this is unlikely" or setting things up so you always win. (I notice snake eyes doesn't come up a lot. (Perhaps I check this by rolling dice a bunch of times.))

Simplest case: you buy a place, and pool equipment, and you rent it out to people. If they make bets with each other on the outcome, you don't care - they're just paying you so they can play pool.

Slightly more complicated: you offer to handle the betting on the game. People pay you a little to be able to bet (and later, to win big), but the money all comes from them, and you don't care who wins - you make money off people playing, people watching, and people betting!

3. Were casinos a thing before probability was understood?

*One game night with a few people, maybe you and your friends? If you have people who are happy to try out a new game, without real money, (For Free! perhaps?), that's a place to start initially - and all you lose is the time to run it. If you have fun, then maybe that's a small price to pay. And if people are willing to pay to play a game with fake money, then you can just print more monopoly money if you run out - no odds calculation needed for a sure bet.

This seems to assume the people who did the origination here were casinos or explicit entrepreneurs, instead of people who started gambling informally and then started with some sense of which games had which payoffs.

(Or rather, maybe you're explicitly not assuming that and that's your point. But the way I'd make the same point you seem to be making here is not "they operated like a startup" and more like "they operated like a group of friends/rivals/communities incrementally experimenting, and by the time someone considered starting an explicit business, good gamblers had some intuitive sense of how games worked.")

Yeah, there's reverse causality in assuming purpose - I wrote to explain how the reader could make such a thing intentionally without resorting to "entrepreneurs gambled by starting casinos and pseudo-darwinian survival of the business whose games don't lose them money led to the casinos of today". This is probably a side effect of my constructionist tendencies. (I feel like the points I came up with in 5 minutes, which don't reference odds, are within the imagination of a business owner whose livelihood is at stake.)

a) That was in the footnote, and point 2, respectively, though you put it way more clearly. b) I suggested the possibility that they could arise without doing odds at all, or even starting *not* with games of chance. c) I would further note that being a casino and "incrementally experimenting"/'operating like a group of friends' need not be incompatible - consider a game store. You buy a new game. If it's not popular, you lose a little. If it's really popular you buy a lot more.

Peters' December 2019 Nature Physics paper (https://www.nature.com/articles/s41567-019-0732-0) provides some perspective on the 0.6x/1.5x coin flip example and other conclusions of the above discussion. (If Peters' claims have changed along the way, I wouldn't know.)

In my reading, Peters' basic claim there is not that ergodicity economics can solve the coin flip game in a way that classical economics cannot (because it can, by switching to expected log wealth utility instead of expected wealth), but that utility functions as originally presented are a crutch that misinforms us about people's psychological motives in making economic decisions. So, while the mathematics of many parts stays the same, the underlying phenomena can be more saliently reasoned about by looking at the individual growth rates in the context of whether the associated wealth "process" is additive or multiplicative or something else. Thus there is also less need to use lingo where people may have an (innate, weirdly) "risk-averse utility function" (as compared to some other less risk-averse theoretical utility function).