Kelly betting vs expectation maximization



Can you be more precise about the exact situation Bob is in? How many rounds will he get to play? Is he trying to maximise money, or trying to beat Alice? I doubt the Kelly criterion will actually be his optimal strategy.

I wrote this with the assumption that Bob would care about maximizing his money at the end, and that there would be a high but not infinite number of rounds.

On my view, your questions mostly don't change the analysis much. The only difference I can see is that if he literally only cares about beating Alice, he should go all in. In that case, having $1 less than Alice is equivalent to having $0. That's not really how people use money though, and seems pretty artificial.

How are you expecting these answers to change things?

If Bob wants to maximise his money at the end, then he really should bet it all every round. I don't see why you would want to use Kelly rather than maximising expected utility. Not maximising expected utility means that you *expect to get less utility*.

Well put. I agree that we should try to maximize the value that we expect to have after playing the game.

My claim here is that just because a statistic is named "expected value" doesn't mean it's accurately representing what we expect to happen in all types of situations. In Alice's game, which is ergodic, traditional ensemble-averaging based expected value is highly accurate. The more tickets Alice buys, the more her actual value converges to the expected value.

In Bob's game, which is non-ergodic, ensemble-based expected value is a poor statistic. It doesn't actually predict the value that he would have. There's no convergence between Bob's value and "expected value", so it seems strange to say that Bob "expects to get" the result of the ensemble average here.

You can certainly calculate Bob's ensemble average, and it will have a higher result than the temporal average (as I state in my post). My claim is that this doesn't help you, because it's not representative of Bob's game at all. In those situations, maximizing temporal average is the best you can do in reality, and the Kelly criterion maximizes that. Trying to maximize ensemble-based expected value here will wipe you out.

The problem with maximising expected utility is that Bob will sit there playing 1 more round, then another 1 more round, again and again until he eventually loses everything. Each step maximised the expected utility, but the policy overall guarantees zero utility with certainty, assuming Bob never runs out of time.

But, even as utility-maximising-Bob is saved from self-destruction by the clock, he shall think to himself "Damn it! Out of time. That is really annoying, I want to keep doing this bet".

At least to me Kelly betting fits in the same kind of space as the Newcomb paradox and (possibly) the prisoner's dilemma. They all demonstrate that the *optimal policy* is not necessarily given by a sequence of *optimal actions* at every step.

Ignoring infinities, do you have the same objection to a game with a limit of 100 rounds? Utility-maximizing Bob will bet all his money 100 times, and lose all of it with probability around , and he'll endorse that because one time in he is *raking it in* to the tune of dollars or something. If you try to stop him he'll be justly annoyed because you're not letting him maximize his utility function.

Do you think that's a problem for expected utility maximization? If so, it seems to me that your objection isn't "optimal policy doesn't come from optimal actions". (At any rate I think that would be a bad objection, because optimal policy for this utility function *does* come from optimal actions at each step.) Rather, it seems to me that your objection is you don't really believe Bob has that utility function.

Which, of course he doesn't! No one has a utility function like that (or, indeed, at all). And I think that's important to realize. But it's a different objection, and I think that's important to realize too.

Yes, I completely agree that the main reason in real life we would recommend against that strategy is that we instinctively (and usually correctly) feel that the person's utility function is sub-linear in money. So that the dollars with probability is bad. Obviously if dollars is needed to cure some disease that will otherwise kill them immediately that changes things.

But there is an objection that I think runs somewhat separately to that, which is the round limit. If we are operating under an optimal, reasonable policy, then (outside commitment tactic negotiations) I think it shouldn't really be possible for a new outside constraint to improve our performance. Because if the constraint does improve performance then we could have adopted that constraint voluntarily and our policy was therefore not optimal. And the N-round limit is doing a fairly important job at improving Bob's performance in this hypothetical. Otherwise Bob's strategy is equivalent to "I bet everything, every time, until I lose it all." Perhaps this second objection is just the old one in a new disguise (any agent with a finitely-bounded utility function would eventually reach a round number where they decide "actually I have enough now", and thus restore my sense of what should be), but I am not sure that it is exactly the same.

Oh, I don't think the round limit is fundamental here, I just don't like infinities :p

At time zero, you can show Bob a bunch of probability distributions for his money at some finite time , corresponding to betting strategies, and ask which he'd prefer. And his answer will always be that his favorite distribution is the one corresponding to "bet everything every time". And when it gets to time , Bob is almost certainly broke, but not actually regretting his decisions in the sense of "knowing what I knew then I could have done better".

If we take the limit as ... I'm not really sure this is a meaningful thing to do. I guess we could take the pointwise limit and see that the resulting function is 1 at 0 and 0 everywhere else, which is indeed a probability distribution we don't like. But if we take the pointwise limit of the Kelly strategy, it's 0 everywhere, which isn't even a probability distribution. I don't think we should use that as a reason to prefer the Kelly strategy. Maybe there are other limits we can take? (I've forgotten a lot of what I used to know.) But mostly I think this is a weird thing to try to do.

If we're not taking the limit, if we just say Bob can play as long as he wants, then yes, he just keeps playing until he goes broke. But he endorses that behavior. There's no point where he looks back and goes "I was an idiot".

One thing I'd say here is that we don't sum up or compare utilities at different times. Like, it would be tempting to say "with probability 1, Bob will go broke. And however much money he had at the time, with probability 1, his alter ego Kelly-Betting Bob will eventually have more money than that. So Bob would prefer to be Kelly-Betting Bob". But that last sentence doesn't hold; Bob knows that in the event he'd managed to stick it out that long, his wealth would so vastly dwarf Kelly-Betting Bob's that it was worth the risks he took.

I understand your point, and I think I am sort of convinced. But it's the sort of thing where minor details in the model can change things quite a lot. For example, I am sort of assuming that Bob gets no utility at all from his money until he walks out of the casino with his winnings, i.e. having the money and still being in the casino is worth nothing to him, because he can't buy stuff with it. Whereas you seem to be comparing Bob with his counterfactual at each round number, I am only interested in Bob at the very end of the process, when he walks away with his winnings to get all that utility. But your proposed Bob never walks away from the table with any winnings (assuming no round limit). If he still has winnings he doesn't walk away.

Let's put details on the scenario in two slightly different ways. (1) The "casino" is just a computer script where Bob can program in a strategy (bet it all every time), and then just type in the number of rounds (N). (Or, for your version of Bob, put the whole thing in a `while my_money > 0:` loop.) We could alternatively (2) imagine that Bob is in the casino playing each round one at a time, and that the time taken doing 1 round is a fixed utility cost of some small number (say 0.1). This doesn't change anything for utility-maximising-Bob, and in fact the time costs for 1 more round relative to his expected gains shrink over time as his money doubles up. (Later rounds are a better deal in expectation.)

With these models I just see a system where Bob deterministically loses all his money. The longer he goes before going bust, the more of his time he wastes as well (in (2)).
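Model (1) can be sketched as a short script. This is only an illustration under assumptions that are not in the comment above: an even-money coin that wins 60% of the time and a $25 starting bankroll (borrowed from the coin-flip game discussed elsewhere in this thread; Bob's actual game may differ), and the names `play`, `bet_everything`, and `kelly` are made up here.

```python
import random


def bet_everything(money):
    """Stake the whole bankroll (the myopic expectation maximizer)."""
    return money


def kelly(money, fraction=0.2):
    """Stake a fixed fraction of the bankroll (0.2 is Kelly for a 60/40 even-money coin)."""
    return fraction * money


def play(strategy, rounds, money=25.0, p_win=0.6, seed=None):
    """Run the hypothetical casino script for a fixed number of rounds."""
    rng = random.Random(seed)
    for _ in range(rounds):
        if money <= 0:
            break  # broke: nothing left to stake
        stake = strategy(money)
        if rng.random() < p_win:
            money += stake  # even-money payout (assumption)
        else:
            money -= stake
    return money


print(play(bet_everything, rounds=10, seed=0))
print(play(kelly, rounds=10, seed=0))
```

Replacing the `for` loop with `while money > 0:` gives the unbounded version. For `bet_everything` that loop ends with probability 1, and always at zero; for `kelly` it has no exit at all, which is the point made in the following comments.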

Kelly betting doesn't actually fix my complaint. A Kelly betting Bob with no point at which they say "Yes, that is enough money, time to leave." actually gets minus infinity utility in model (2) where doing a round costs a small but finite amount of utility in terms of the time spent. Because the money acquired doesn't pay off till they leave, which they never do.

I think maybe you are right that it comes down to the utility function. Any agent (even the Kelly one) will behave in a way that comes across as obviously insane if we allow their utility function to go to infinity. Although I still don't quite see how that infinity actually ever enters in this specific case. If we answer the infinite utility function with an infinite number of possible rounds then we can say with certainty that Bob never walks away with any winnings.

I agree infinity is what makes things go weird here, but as you say, not particularly weirder for Bob than for Kelly-Betting Bob (who also never leaves the casino, and also wraps in a `while my_money > 0` loop).

But what you say here seems to undermine your original comment:

The problem with maximising expected utility is that Bob will sit there playing 1 more round, then another 1 more round, again and again until he eventually loses everything.

But KBB also sits there playing one more round, then another round. He doesn't eventually lose everything, but he doesn't leave either. This isn't a problem with maximizing expected utility, it's a problem with infinity.

At least to me Kelly betting fits in the same kind of space as the Newcomb paradox and (possibly) the prisoner's dilemma. They all demonstrate that the *optimal policy* is not necessarily given by a sequence of *optimal actions* at every step.

But with this setup, it only demonstrates that if we wave our hands and talk about what happens after playing infinitely many rounds of a game we never want to stop playing.

If we aren't talking about something like that, then optimal policy for the expected-money maximizer is given by taking the optimal action at every step.

Yes, my position did indeed shift, as you changed my mind and I thought about it in more depth. My original position was very much pro-Kelly. On thinking about your points I now think it is the `while my_money > 0` aspect where the problem really lies. I still stand by the distinction between optimal global policy and optimal action at each step, because at each step the optimal policy (for Kelly or not) is to shake the dice another time. But, if this is taken as a policy we arrive at the `while my_money > 0` break condition being the only escape, which is clearly a bad policy. (It guarantees that in any world we walk away, we walk away with nothing.)

Nod. I think we basically agree at this point. Certainly I don't intend to claim that optimal policy and optimal actions always coincide (I have more thoughts on that but don't want to get into them).

Since writing the original post, I've found Gwern's post about a solution to something almost identical to Bob's problem. In this post, he creates a decision tree for every possible move starting from the first one, determining final value at the leaf nodes. He then uses the Bellman equation and traditional expected value to back out what you should do in the earliest moves. The answer is that you bet approximately Kelly.

Gwern's takeaway here is (I think) that expected value always works, but you have to make sure you're solving the right problem. Using expected value naively at each step, discounting the temporal nature of the problem, leads to ruin.

I think many of the more philosophical points in my original post still stand, as doing backwards induction even on this toy problem is pretty difficult (it took his software 16 hours to find the solution). Collapsing a time series expected value problem to a one-shot Kelly problem saves a lot of effort, but to do that you need an ergodic statistic. Even once you've done that, you should still make sure the game is worth playing before you actually start betting.

it took his software 16 hours to find the solution

That's just the maximally-inefficient-but-convenient interpreted version in R. For the Kelly Coin Flip Game, the fastest exact brute-force was 0.002h, not 16.000h, and it'd probably be less than half that if I ran it on my current 16-core machine instead of my laptop from 9 years ago. (For comparison, Feep & others got a similar speedup on another dynamic programming problem: taking it from the naive interpreted version of erroring out at problem sizes much past 300 due to memory usage problems to being able to solve problem sizes up to 133,787,000 in just 9 wallclock days. Quite something. And probably some of the tricks in the second problem could've been applied to speed up the first one even more.) And the real answer is that it takes 0.000h because Arthur found an exact formula which uses so few operations that I wasn't sure how to benchmark it meaningfully beyond "seems to run in milliseconds" & so fast it looked like memoizing was slowing it down. (The original problem being too fast to compute is why I started making it harder by generalizing the problem.)

As usual, the convenient way to implement something is very rarely anywhere *near* the fastest, often by multiple orders of magnitude, and we must choose our poison: "fast, easy, general - pick two".

I have no problem with the argument that ergodic formulas may be the limit of or provably identical to straightforward decision theory/reinforcement learning utility maximization over the actual decision problems rather than simplified strawmen, and may be convenient computational shortcuts. I just don't find that very useful when relevant problems are finite enough that you lose a lot (eg in the coin-flip problem, KC loses a pretty substantial amount of money because even 300 rounds/years is still not enough for the convergence & often you need to act wildly different from KC), and often break the assumptions, and the ergodic stuff obscures all of this, completely ignoring what it's a special-case of, and comes with a whole heap of puffery and PR.

Arthur found an exact formula which uses so few operations that I wasn't sure how to benchmark it meaningfully

Oh, cool. I'll have to read your post again more carefully.

rather than simplified strawmen

Myopic expectation maximization may be a bad argument, but I don't think it's a strawman. People do believe that you should expectation maximize on each step of a coin-flipping game, instead of over the full history of the game. They act on that belief and go bust, like 30% of the players in Haghani & Dewey. Those people would actually do better adopting an ergodic statistic.

I now understand that Bellman based RL learns a value function that ends up maximizing expected value over a history instead of myopically. That doesn't mean that any AI agent using expectation maximization will do this. In particular, I worry that people will wrap a world model in naive expectation maximization and end up with an agent that goes bust in resources. This seems like something people are actually trying to do with LLMs.

Oh, cool. I'll have to read your post again more carefully.

Yeah, it's one of those 'kitchen sink'-type posts. The point is less any individual result than creating a zoo of 'here are some of the many ways to tackle the problem, and what exotic flora & fauna we observe along the way'. You don't get the effect if you just look at one or two points.

They act on that belief and go bust, like 30% of the players in Haghani & Dewey. Those people would actually do better adopting an ergodic statistic.

Well, they go bust, yes, and would do better with almost any other strategy (since you can't do worse than winning $0). But I don't recall Haghani & Dewey saying that the 30%-busters were all doing *greedy EV maximization* and betting their entire bankroll at each timestep...? (There are many ways to overbet which are not greedy EV maximization.)

In particular, I worry that people will wrap a world model in naive expectation maximization and end up with an agent that goes bust in resources. This seems like something people are actually trying to do with LLMs.

Inasmuch as they are imitation-learning from humans and planning, that seems like less of a concern in the long run. However, to the extent that there is any fundamental tendency towards myopia, that might be a *good* thing for safety. Inducing various kinds of 'myopia' has been a perennial proposal for AI safety: if the AI isn't planning out sufficiently long-term because eg it has a very high discount rate, then that reduces a lot of instrumental convergence pressure or reward-hacking potential - because all of that misbehavior is outside the planning window. (An 'oracle AI' can be seen as an extreme version where it cares about only the next time-step, in which it returns an answer.)

If we're already sacrificing max utility to create a myopic agent that's lower risk, why would we not also want it to maximize temporal average rather than ensemble average to reduce wipeout risk?

The answer is that you bet approximately Kelly.

No, it isn't. Gwern never says that anywhere, and it's not true. This is a good example of what I'm saying.

For clarity the game is this. You start with $25 and you can bet any multiple of $0.01 up to the amount you have. A coin is flipped with a 60/40 bias in your favour. If you win you double the amount you bet, otherwise you lose it. There is a cap of $250, so after each bet you lose any money over this amount (so in fact you should never make a bet that could take you over). This continues for 300 rounds.

Bob's edge is 20%, so the Kelly criterion would recommend that he bets $5. If he continues to use the Kelly criterion in every round (except if this would take him over the cap, in which case he bets to take him to the cap) he ends with an average of $238.04.
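The $5 figure follows from the standard Kelly formula for a binary bet, f* = p - q/b, where b is the net odds (1 for an even-money bet). A minimal sketch, with `kelly_fraction` as an illustrative name rather than anything from the linked post:

```python
def kelly_fraction(p_win, net_odds=1.0):
    """Kelly fraction f* = p - q/b for a bet that pays net_odds-to-1."""
    return p_win - (1.0 - p_win) / net_odds


bankroll = 25.0
stake = kelly_fraction(0.6) * bankroll  # 60/40 coin, even money
print(f"Kelly stake on ${bankroll:.2f}: ${stake:.2f}")
```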

As explained on the page you link to, the optimal strategy and expected value can be calculated inductively based on the number of bets remaining. The optimal starting bet is $1.99, and if you continue to bet optimally your average amount of money is $246.61.
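The inductive calculation can be sketched on a coarser version of the game. This is not Gwern's code: it assumes whole-dollar bets instead of $0.01 steps so that pure Python stays fast, and `solve_capped_game` and its parameters are names made up for illustration.

```python
def solve_capped_game(rounds, cap=250, p_win=0.6):
    """Backward induction over whole-dollar wealth levels 0..cap.

    Returns (value, best_bet): the expected final wealth and the first
    optimal bet for each wealth level with `rounds` bets remaining.
    """
    # With zero bets remaining you simply keep what you have.
    value = [float(w) for w in range(cap + 1)]
    best_bet = [0] * (cap + 1)
    for _ in range(rounds):
        new_value = [0.0] * (cap + 1)
        best_bet = [0] * (cap + 1)
        for w in range(cap + 1):
            best = value[w]  # option: bet nothing this round
            # Never stake more than you have or more than the cap allows.
            for b in range(1, min(w, cap - w) + 1):
                ev = p_win * value[w + b] + (1 - p_win) * value[w - b]
                if ev > best:
                    best, best_bet[w] = ev, b
            new_value[w] = best
        value = new_value
    return value, best_bet


value, bets = solve_capped_game(rounds=1)
print(value[25], bets[25])
```

With a single bet remaining the cap does not bind at $25, so the recursion recommends staking everything. The source reports a much smaller optimal opening bet for 300 rounds at cent granularity; this coarse grid is not expected to reproduce that figure exactly.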

So in this game the optimal starting bet is only about 40% of the Kelly bet ($1.99 against $5). The Kelly strategy bets too riskily, and leaves $8.57 on the table compared to the optimal strategy.

Kelly isn't optimal in any limit either. As the number of rounds goes to infinity, the optimal strategy is to bet just $0.01, since this maximises the likelihood of never going bankrupt. If instead the cap goes to infinity then the optimal strategy is to bet everything on every round. Of course you could tune the cap and the number of rounds together so that Kelly was optimal on the first bet, but then it still wouldn't be optimal for subsequent bets.

(EDIT: It's actually not certain that the optimal strategy in the first round is $1.99, since floating point accuracy in the computations becomes relevant and many starting bets give the same result. But $5 is so far from optimum that it genuinely did give a lower expected value, so we can say for certain that Kelly is not optimal.)

Hmm. I think we might be misunderstanding each other here.

When I say Gwern's post leads to "approximately Kelly", I'm not trying to say it's exactly Kelly. I'm not even trying to say that it converges to Kelly. I'm trying to say that it's much closer to Kelly than it is to myopic expectation maximization.

Similarly, I'm not trying to say that Kelly maximizes expected value. I am trying to say that expected value doesn't summarize wipeout risk in a way that is intuitive for humans, and that those who expect myopic expected values to persist across a time series of games in situations like this will be very surprised.

I do think that people making myopic decisions in situations like Bob's should in general bet Kelly instead of expected value maximizing. I think an understanding of what ergodicity is, and whether a statistic is ergodic, helps to explain why. Given this, I also think that it makes sense to ask whether you should be looking for bets that are more ergodic in their ensemble average (like index funds rather than poker).

In general, I find expectation maximization unsatisfying because I don't think it deals well with wipeout risk. Reading Ole Peters helped me understand why people were so excited about Kelly, and reading this article by Gwern helped me understand that I had been interpreting expectation maximization in a very limited way in the first place.

In the limit of infinite bets like Bob's with no cap, myopic expectation maximization at each step means that most runs will go bankrupt. I don't find the extremely high returns in the infinitesimally probable regions to make up for that. I'd like a principled way of expressing that which doesn't rely on having a specific type of utility function, and I think Peters' ergodicity economics gets most but not all the way there.

Other than that, I don't disagree with anything you've said.

I don’t find the extremely high returns in the infinitesimally probable regions to make up for that. I’d like a principled way of expressing that which doesn’t rely on having a specific type of utility function

This sounds impossible to me? Like, if we're talking about agents with a utility function, then either that function is such that extremely high returns make up for extremely low probabilities, or it's such that they don't. If they do, there's no argument you can make that this agent is mistaken, they simply value things differently than you. If you want to argue that the high returns aren't worth the low probability, you're going to need to make assumptions about their utility function.

I admit that I don't know what ergodicity is (and I bounce off the wiki page). But if I put myself in the shoes of Bob whose utility function is linear in money... my anticipation is that he just doesn't care. Like, you explain what ergodicity is to him, and point out that the process he's following is non-ergodic. And he replies that yes, that's true; but on the other hand, the process he's following does optimize his expected money, which is the only thing he cares about. And there's no ergodic process that maximizes his expected money. So he's just going to keep on optimizing for the thing he cares about, thanks, and if you want to give up some expected money in exchange for ergodicity, that's your right.

It's not clear to me that it's impossible, and I think it's worth exploring the idea further before giving up on it. In particular, I think that saying "optimizing expected money is the thing that Bob cares about" assumes the conclusion. Bob cares about having the most money he can actually get, so I don't see why he should do the thing that almost-surely leads to bankruptcy. In the limit as the number of bets goes to infinity, the probability of not being bankrupt will converge to 0. It's weird to me that something of measure 0 probability can swamp the entirety of the rest of the probability.

I'd say that "optimizing expected money is the only thing Bob cares about" is an example, not an assumption or conclusion. If you want to argue that agents should care about ergodicity regardless of their utility function, then you need to argue that to the agent whose utility function is linear in money (and has no other terms, which I assumed but didn't state in the previous comment).

Such an agent is indifferent between a certainty of dollars, and a near-certainty of dollars with a chance of dollars. That's simply what it means to have that utility function. If you think this agent, in the current hypothetical scenario, should bet Kelly to get ergodicity, then I think you just aren't taking seriously what it means to have a utility function that's linear in money.

In the limit as the number of bets goes to infinity

I spoke about limits and infinity in my conversation with Ben; my guess is it's not worth me rehashing what I said there. Though I will add that I could make someone whose utility is log in money (i.e. someone who'd normally bet Kelly) behave similarly.

Not with quite the same setup. But I can offer them a sequence of bets such that with near-certainty ( as ), they'd eventually end up with $0.01 and then stop betting because they'll under no circumstances risk going down to $0.

These bets can't be of the form "payout is some fixed multiple of your stake and you get to choose your stake", but I think it would work if I do "payout is exponential in your stake". Or I could just say "minimum stake is your entire bankroll minus $0.01" - if I offer high enough payouts each time, they'll take these bets, over and over, until they're down to their last cent. Each time they'd *prefer* a smaller bet for less money, but if I'm not offering that they'd rather take the bet I am offering than not bet at all.

Also,

It’s weird to me that something of measure 0 probability can swamp the entirety of the rest of the probability.

The Dirac delta has this property too, and IIUC it's a fairly standard tool.

Here we're talking about something that's weird in a different way, and perhaps weird in a way that's harder to deal with. But again I think that's more because of infinity than because of utility functions that are linear in money.

If instead the cap goes to infinity then the optimal strategy is to bet everything on every round.

This isn't right unless I'm missing something - Kelly provides the fastest growth, while betting everything on every round is almost certain to bankrupt you.

(e: posted overlapping with Oscar_Cunningham)

If you're trying to maximize expected money at the end of a fixed number of rounds, you do that by betting everything on every round (and, yes, almost certainly going bankrupt).

If that's not what you're trying to do, the optimal strategy is probably something else. But "how do we maximize expected money?" seems to be the question Gwern's post is exploring. It's just that with the $250 cap, maximizing expected money seems like a good idea (because you can almost always get close to $250), and with no cap, maximizing expected money seems like a terrible idea (because it gives you a 10^-67 chance of $10^92).

You don't do Kelly because it's good at maximizing expected money. You do it (when you do it) because you're trying to do something other than maximize expected money.

Oh, I see. Yes, I agree. The idea to maximize the expected money would never occur to me (since that's not how my utility function works), but I get it now.

It bankrupts you with probability 1 - 0.6^300, but in the other 0.6^300 of cases you get a sweet sweet $25 × 2^300. This nets you an expected $1.42 × 10^25.

Whereas Kelly betting only has an expected value of $25 × (0.6×1.2 + 0.4×0.8)^300 = $3220637.15.
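These figures can be checked directly. The sketch below uses only the numbers quoted in the thread ($25 start, 60/40 even-money coin, 300 rounds, no cap); the variable names are illustrative.

```python
p_win, rounds, start = 0.6, 300, 25.0

# Bet everything: you survive only by winning every single flip.
p_survive = p_win ** rounds
all_in_payout = start * 2.0 ** rounds
ev_all_in = p_survive * all_in_payout  # equals start * 1.2 ** rounds

# Kelly (stake 20%): each round multiplies wealth by 1.2 or 0.8.
ev_kelly = start * (p_win * 1.2 + (1 - p_win) * 0.8) ** rounds

print(f"P(survive all-in) = {p_survive:.2e}")
print(f"all-in payout     = ${all_in_payout:.2e}")
print(f"E[all-in]         = ${ev_all_in:.2e}")
print(f"E[Kelly]          = ${ev_kelly:,.2f}")
```

The survival probability and payout are where the "10^-67 chance of $10^92" shorthand used earlier in the thread comes from.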

Obviously humans don't have linear utility functions, but my point is that the Kelly criterion still isn't the right answer when you make the assumptions more realistic. You actually have to do the calculation with the actual utility function.

~~So, by optimal, you mean "almost certainly bankrupt you." Then yes.~~

~~My definition of optimal is very different.~~

Obviously humans don't have linear utility functions

I don't think that's the only reason - if I value something linearly, I still don't want to play a game that almost certainly bankrupts me.

Obviously humans don't have linear utility functions, but my point is that the Kelly criterion still isn't the right answer when you make the assumptions more realistic.

I mean, that's not obvious - the Kelly criterion gives you, in the example with the game, E(money) = $240, compared to $246.61 with the optimal strategy. That's really close.

I don't think that's the only reason - if I value something linearly, I still don't want to play a game that almost certainly bankrupts me.

I still think that's because you intuitively know that bankruptcy is worse-than-linearly bad for you. If your utility function were truly linear then it's true by definition that you would trade an arbitrary chance of going bankrupt for a tiny chance of a sufficiently large reward.

I mean, that's not obvious - the Kelly criterion gives you, in the example with the game, E(money) = $240, compared to $246.61 with the optimal strategy. That's really close.

Yes, but the game is very easy, so a lot of different strategies get you close to the cap.

Yes, but the game is very easy, so a lot of different strategies get you close to the cap.

I've been thinking about it, and I'm not sure if this is the case in the sense you mean it - expected money maximization doesn't reflect human values *at all*, while the Kelly criterion mostly does, so if we make our assumptions *more realistic*, it should move us away from expected money maximization and towards the Kelly criterion, as opposed to moving us the other way.

Not maximising expected utility means that you *expect to get less utility*.

This isn't actually right though - the concept of maximizing utility doesn't quite overlap with expecting to have more or less utility at the end.

There are many examples where maximizing your expected utility means expecting to go broke, and not maximizing it means expecting to end up with more money.

(Even though, in this particular one-turn example, Bob should, in fact, expect to end up with more money if he bets everything.)

I don't think I understand the point of the temporal average. I think I follow how to calculate it, but I don't see any justification here for why we should care about the value we calculate that way, or why it's given that name. (Maybe I just missed these? Maybe they're answered in the paper?)

I've written about this myself, though not recently enough to remember that post in depth. My answer for why to bet Kelly is "over a long enough time, you’ll almost certainly get more money than someone else who was offered the same bets as you and started with the same amount of money but regularly bet different amounts on them".

I happen to know that in this type of game, maximizing temporal average is the way to get that property, which is neat. That's the justification I'd give for doing that calculation in this type of game. But it's not clear to me what justification you'd give.

The temporal average is pretty much just the average exponential growth rate. The reason that it works to use this here is that it's an ergodic quantity in this problem, so the statistic that it gives you in the one-step problem matches the statistic that it would give you for a complete time-sequence. That means if you maximize it in the one-step problem, you'll end up maximizing it in the time-series too (not true of expected value here).
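To make the contrast concrete, here is a small sketch comparing the two statistics as a function of the fraction staked. It assumes a 60/40 even-money coin (the game discussed elsewhere in this thread, not necessarily Bob's), and the function names are illustrative.

```python
import math


def time_average_growth(f, p_win=0.6):
    """Expected log growth per round when staking fraction f."""
    return p_win * math.log(1 + f) + (1 - p_win) * math.log(1 - f)


def ensemble_growth(f, p_win=0.6):
    """Log of the expected per-round wealth multiplier when staking fraction f."""
    return math.log(p_win * (1 + f) + (1 - p_win) * (1 - f))


fractions = [i / 100 for i in range(100)]  # 0.00 .. 0.99
best_time = max(fractions, key=time_average_growth)
best_ensemble = max(fractions, key=ensemble_growth)
print(best_time, best_ensemble)
```

On this grid the time-average growth rate peaks at the Kelly fraction of 0.2, while the ensemble statistic keeps increasing all the way to the largest stake allowed.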

I think your justification for Kelly is pragmatically sufficient, but theoretically leaves me a bit cold. I'm interested in knowing why Kelly is the right choice here, and Ole Peters' paper blew my mind when I read it the first time because it finally gave an answer to this.

I don't follow, sorry.

the statistic that it gives you in the one-step problem matches the statistic that it would give you for a complete time-sequence.

What statistic is this? If I calculate the time-average for one step, using the Kelly strategy, I get roughly 1.02:

If I calculate it for two steps, if I've done it right, I get roughly 1.04:

and I don't think this converges as the number of steps grows. If I'm not getting myself mixed up, then what does converge is , but... I can do the same with "expected money following the bet-everything strategy".

I think your justification for Kelly is pragmatically sufficient, but theoretically leaves me a bit cold. I’m interested in knowing why Kelly is the right choice here

So I feel like any justification eventually has to boil down to either pragmatics ("here's something we care about") or pretend-pragmatics ("here's something we're pretending to care about for the purposes of this hypothetical; presumably we think there's some correspondence to the real world but we may not specify exactly what we think it is"). If we don't have something like that, why pick one theoretical justification over another?

And I don't feel like my justification is lacking in theory. It's not that I've done a bunch of experiments and said "this seems to satisfy my pragmatic desires but I don't know why". I have a theoretical argument for why it satisfies my pragmatic desires.

People talk about Kelly betting and expectation maximization as though they're alternate strategies for the same problem. Actually, they're each the best option to pick for different classes of problems. Understanding when to use Kelly betting and when to use expectation maximization is critical.

Most of the ideas for this came from Ole Peters's ergodicity economics writings. Any mistakes are my own.

## The parable of the casino

Alice and Bob visit a casino together. They each have $100, and they decide it'll be fun to split up, play the first game they each find, and then see who has the most money. They'll then keep doing this until their time in the casino is up in a couple days.

Alice heads left and finds a game that looks good. It's double or nothing, and there's a 60% chance of winning. That sounds good to Alice. Players buy as many tickets as they want. Each ticket resolves independently from the others at the stated odds, but all resolve at the same time. The tickets are $1 each. How many should Alice buy?

Bob heads right and finds a different game. It's a similar double or nothing game with a 60% chance of winning. He has to buy a ticket to play, but in Bob's game he's only allowed to buy one ticket. He can pay however much he wants for it, then the double or nothing is against the amount he paid for his ticket. How much should he pay for a ticket?

## Alice's game is optimized by an ensemble average

Let's estimate the amount of money Alice will win, as a function of how many tickets she buys. We don't know how each ticket resolves, but we can say that approximately 60% of the tickets will be winners and 40% will be losers (though we don't know which tickets will be which). This is just calculating the expected value of the bet.

If she buys $x$ tickets, she'll make $0.6 \cdot x \cdot 2 + 0.4 \cdot x \cdot 0$ dollars. This is a linear function that monotonically increases with $x$, so Alice should buy as many as she can.

Since she has $100, she can buy 100 tickets. That means she will probably come away with about $120. There will be some variance here. If tickets were cheaper (say only a penny each), then she could lower her variance by buying more tickets at the lower price.
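To make the variance point concrete, here is a minimal sketch. It assumes the number of winning tickets is binomially distributed, which follows from the tickets resolving independently at the stated odds. The function name `alice_payout_stats` is made up for this illustration.

```python
import math

def alice_payout_stats(bankroll, ticket_price, p_win=0.6):
    """Mean and standard deviation of Alice's payout when she spends
    her whole bankroll on independent double-or-nothing tickets."""
    n_tickets = round(bankroll / ticket_price)
    # Each winning ticket pays back 2 * ticket_price; losers pay nothing.
    mean = n_tickets * p_win * 2 * ticket_price
    # The number of winners is Binomial(n_tickets, p_win).
    std = 2 * ticket_price * math.sqrt(n_tickets * p_win * (1 - p_win))
    return mean, std

for price in (1.00, 0.01):
    mean, std = alice_payout_stats(100, price)
    print(f"ticket price ${price:.2f}: mean ${mean:.2f}, std ${std:.2f}")
```

Under these assumptions the mean stays at $120 either way, while the standard deviation falls by a factor of ten when the tickets cost a penny instead of a dollar.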

## Bob's game is optimized by a time-average

Unlike Alice's game with a result for each ticket, there's only one result to Bob's game. He either doubles his ticket price or gets nothing back.

One way people tend to approach this is to apply an ensemble average anyway via expectation maximization. If you do this, you end up with basically the same argument that Alice had and try to bet all of your money. There are two problems with this.

One problem is that Alice and Bob are going to repeat their games as long as they can. Each time they do, they'll have a different amount of money available to bet (since they won or lost on the last round). They want to know who will have the most at the end of it.

The repeated nature of these games means that they aren't ergodic. As soon as someone goes bust, they can't play any more games. If Bob bets the same way Alice does, and goes all in, then each round he can only double up or go bust. After one round, he's 60% likely to be solvent. After 10 rounds, he's only $0.6^{10}$ likely to have any money at all. That's about 0.6%, and Bob is likely to spend the last few days of their trip chilling in the bar instead of gaming.

The second problem with expected value maximization here is that expected value is a terrible statistic for this problem. In Alice's game, her outcomes converge to the expected value. In Bob's game, his outcomes if he expectation maximizes are basically as far from the expected value as they can be.
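A small sketch of the all-in numbers shows how far the expected value sits from anything Bob can actually experience. The helper `all_in_summary` is a hypothetical name, and the 10-round horizon is just the example used above.

```python
def all_in_summary(bankroll, rounds, p_win=0.6):
    """Ensemble statistics for betting everything on every round."""
    p_solvent = p_win ** rounds                 # must win every single round
    payout_if_solvent = bankroll * 2 ** rounds  # doubled on each win
    expected_value = p_solvent * payout_if_solvent
    return p_solvent, payout_if_solvent, expected_value

p, payout, ev = all_in_summary(100, 10)
print(f"P(still solvent) = {p:.4f}")
print(f"bankroll if solvent = ${payout:,}")
print(f"expected value = ${ev:,.2f}")
```

The expected value comes out at a few hundred dollars, but no single run of the game ever produces that amount: Bob ends with either nothing or $102,400.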

This is why Bob should treat his game like a time-average. I highly recommend Ole Peters's paper deriving time-average statistics for the St. Petersburg paradox to fully understand this, but I'll give an overview of one derivation here.

As an intuition pump, let's look at the traditional expected value calculations. You first break a single result up into different "slices". You have one slice for each possible outcome, and you scale each slice by the probability of the outcome and the value of the outcome. Then you sum. In equations, $E(U) = \sum_n p_n u_n$.

The time average starts similarly. Instead of breaking a single outcome up, you break up the single event in time. In Bob's case, we split the one time event up into two sections. One section is the win section, and it's 60% of the time. One section is the loss section, and it's 40% of the time. Each slice scales your bankroll according to the exponential growth formula, so you have to know the growth factor for the options.

Growth factor depends on how much you start with and how much you bet. Bob starts with 100 dollars and bets $x$, so his growth factor on a win would be $\frac{100+x}{100}$, and on a loss it would be $\frac{100-x}{100}$.

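Then bankroll scaling happens multiplicatively. After the winning portion of time, you'd have $x_0 \left(\frac{x_0+x}{x_0}\right)^{p_w}$. Then after the losing portion you'd have $x_0 \left(\frac{x_0+x}{x_0}\right)^{p_w} \left(\frac{x_0-x}{x_0}\right)^{p_l}$, where $x_0$ is the starting bankroll and $p_w$ and $p_l$ are the win and loss probabilities. That's your time average, and you want to maximize it. In equations, a time average looks like $T(U) = \prod_n r_n^{p_n}$, where $r_n$ is the growth factor for outcome $n$.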
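The time average described above (a product of growth factors, each raised to the probability of its outcome) can be sketched in a few lines. The function names are illustrative, and the bets tried are arbitrary sample values.

```python
import math

def time_average(growth_factors, probabilities):
    """Probability-weighted geometric mean of per-outcome growth factors."""
    return math.prod(r ** p for r, p in zip(growth_factors, probabilities))

def bob_growth_factor(bet, bankroll=100, p_win=0.6):
    win = (bankroll + bet) / bankroll    # growth factor on a win
    loss = (bankroll - bet) / bankroll   # growth factor on a loss
    return time_average([win, loss], [p_win, 1 - p_win])

for bet in (0, 10, 20, 50, 100):
    print(bet, round(bob_growth_factor(bet), 4))
```

Betting nothing gives a growth factor of exactly 1, betting everything gives 0, and the interesting values lie in between.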

A mathematician would say that Bob should bet whatever maximizes $100 \left(\frac{100+x}{100}\right)^{0.6} \left(\frac{100-x}{100}\right)^{0.4}$, where $x$ is how much he paid for a ticket. We find the argmax more easily by taking the log to turn this into a sum of products. In other words, we want the $x$ that maximizes $0.6 \log(100+x) + 0.4 \log(100-x)$. After a little differentiation and some algebra, we find that the optimal value ends up being $x = 20$.
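For readers who want the skipped algebra, one way to fill it in is below. The name $f(x)$ for the log objective is introduced here for convenience.

```latex
\begin{align*}
f(x)  &= 0.6\log(100+x) + 0.4\log(100-x) \\
f'(x) &= \frac{0.6}{100+x} - \frac{0.4}{100-x} = 0 \\
0.6\,(100-x) &= 0.4\,(100+x) \\
60 - 0.6x &= 40 + 0.4x \\
x &= 20
\end{align*}
```

Both terms of $f''(x) = -\frac{0.6}{(100+x)^2} - \frac{0.4}{(100-x)^2}$ are negative for $0 \le x < 100$, so this stationary point is a maximum.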

An important note here is that we used a logarithm to simplify the math, but we are not actually interested in maximizing the log of the value. We are maximizing our temporal average, and the logarithm is just a mathematical trick that makes finding the argmax easier.

The result here is the Kelly criterion. If Bob spends 20% of his bankroll for each ticket over multiple runs, his long run growth factor will converge to about 1.02.
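A quick simulation gives a feel for that convergence. This is only a sketch: the function name, the seed, the number of rounds, and the alternative fractions are arbitrary choices for illustration, and the bankroll is tracked in log space purely to avoid numerical overflow.

```python
import math
import random

def simulate_fractional_betting(fraction, rounds, p_win=0.6, seed=0):
    """Per-round growth factor realised by betting a fixed fraction of
    the bankroll on every round (geometric mean over all rounds)."""
    rng = random.Random(seed)
    log_growth = 0.0  # log of (final bankroll / starting bankroll)
    for _ in range(rounds):
        if rng.random() < p_win:
            log_growth += math.log(1 + fraction)  # win: gain the stake
        else:
            log_growth += math.log(1 - fraction)  # loss: lose the stake
    return math.exp(log_growth / rounds)

for fraction in (0.1, 0.2, 0.4):
    g = simulate_fractional_betting(fraction, rounds=100_000)
    print(f"bet {fraction:.0%} each round: growth factor ~ {g:.4f}")
```

Because every fraction is run against the same seeded sequence of wins and losses, the comparison between them is like for like. With enough rounds, the 20% bettor's realised growth factor should land close to 1.02 and ahead of the other two.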

## What should Bob actually do?

The mathematician would tell Bob to Kelly bet on his game, but a stock trader would tell Bob to find a better game.

Alice's long run rate of return converges to 20% per turn. Bob's converges to about 2% per turn. Alice is doing much better than Bob, because she can access the ensemble return of the stakes.

Some arguments against Kelly betting, such as Abram Demski's post here, note correctly that the ensemble average is higher than what you can get with Kelly betting. What those arguments don't take into account is that there are many wagers where the ensemble average is just not available as an outcome.

If you can ensemble average, then you definitely should. If you can't ensemble average, then maybe you shouldn't bet at all. This is actually common wisdom among non-math people. Few are the parents who would advise their children to make their fortune through gambling games.

Gambling is unpopular as a way to make a living because people have learned, through long and horrible history, that gambling games don't give you a good return. Even if the games are fair and you're good at them. Even if you bet Kelly.

## When to bet with the Kelly criterion

An enormous amount of time and energy over the past few centuries has been spent designing mechanisms that allow people to access ensemble average returns from inherently non-ensembled bets. This is what much of modern portfolio theory is about. It's why index funds exist, and even a large part of the reason for mutual funds. Even VCs invest in many startups, knowing that the ensemble average will be high even if most individual startups go bust.

There are important cases where the ensemble average doesn't apply. We got a taste of one of them when Sam Bankman-Fried infamously said he'd St. Petersburg paradox the universe. There's a good writeup of this over at Taylor Pearson's blog, but the short story is that you should not take double or nothing bets with the whole universe.

Here's when double or nothing with the whole universe makes sense: when you have a huge number of fully fungible universes that you share value among after the fact. In other words, when you can access the ensemble average. I don't know about you, but I only have the one universe.

I also only have the one life. The idea of the Kelly criterion can apply to what I choose to spend my time on and how I choose to live, too, mostly in the form of aphorisms like "keep some powder dry" or "maintain slack in your schedule".

There's one final place that I find a lack of ensemble access to be important: if we create an AI superintelligence that takes major actions affecting the future of humanity. I want it to be able to figure out when it should Kelly bet, and when it's ok to expectation maximize for an ensemble. I wouldn't want SBF to double-or-nothing our universe, and I don't want AI to do it either.