Right, this seems really good! (It deserves more upvotes than it has; it's suitably mind-blowing ;p)
There are some arbitrary choices (this is a generalization of expectation maximization, not an argument from expectation maximization, so it's not surprising that it's not a unique solution), but the only really arbitrary-seeming part is the choice of how to order the limits. And as discussed in the other comment thread here, Eigil's comment about lim(EV) vs EV(lim) makes your choice of ordering seem like the more appropriate one -- your version matches up with the fact that the EV of the all-in strategy for the infinite game is zero, while still allowing us to evaluate strategies in cases where the EV of the infinite game is not well-defined.
The alternate intuition is that, since EV of infinite games is problematic, we should just compare the EV of strategies on very large numbers of iterations. This is basically your alternate limit ordering, and Eigil's lim(EV) as opposed to EV(lim). And "the boring option".
I think the boring option has some a priori superiority, but loses on net provided you're right about the version of the game which has a small chance of ending at each round. I think it's analogous to the following argument about Prisoner's Dilemma. The argument is between a Strawman Economist and a Steelman Douglas Hofstadter.
ECONOMIST: The normatively correct thing to do in Prisoner's Dilemma (PD) is to defect.
DOUGLAS: But in iterated PD, players can use the tit-for-tat strategy. If they do, it's rational for both of them to cooperate, and for both of them to continue using tit-for-tat. And most real PDs can be considered as iterated.
E: Ahh, true, but no game is really infinitely iterated. We know it stops at some point. At that point, there's no remaining incentive to cooperate. So both players should defect. Knowing this, players should actually think of the tit-for-tat chain as stopping one step earlier than this. But then the second-to-last move also becomes defect. And so on. The tit-for-tat strategy unravels all the way back to the beginning, and we're back at total defection.
D: Ahh, true, but in practice we're uncertain about when the game will end! Depending on our uncertainty, this can rescue tit-for-tat. So what we really get is a specific crossover point. If we're sufficiently certain about when the game will end, you are correct. If we're sufficiently uncertain, then I'll be correct instead.
E: Damn, you're right!
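D's crossover point can be made concrete. Here's a minimal sketch, with assumed standard PD payoffs T=5, R=3, P=1 (my illustrative numbers, not anything from the discussion): with continuation probability delta, mutual tit-for-tat is worth R every round forever, while defecting against tit-for-tat grabs T once and then settles into mutual punishment.

```python
# Sketch of the tit-for-tat crossover, under assumed standard PD payoffs.
T, R, P = 5, 3, 1  # temptation, reward, punishment (assumed values)

def cooperate_value(delta):
    # Both players run tit-for-tat forever: reward R each round,
    # with continuation probability delta per round.
    return R / (1 - delta)

def defect_value(delta):
    # Defect against tit-for-tat: grab T once, then mutual punishment P.
    return T + delta * P / (1 - delta)

for delta in (0.4, 0.5, 0.6):
    print(delta, cooperate_value(delta), defect_value(delta))
```

Setting the two values equal gives the crossover delta = (T − R)/(T − P) = 0.5 here: sufficiently uncertain that the game continues (delta above 0.5) and tit-for-tat holds; sufficiently certain it ends, and it unravels.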
Similarly with straw economist and steel Kelly:
E: The rational way to evaluate bets is by taking the expectation. If a bet is worthwhile at all, it's worth going all-in, if the other side will accept that large of a bet.
K: Wait, look! In an infinitely iterated game, Bunthut's generalization of expectation maximization says to use my Kelly Criterion. And betting really is an iterated game. You shouldn't consider each bet in isolation.
E: Why are the limits ordered that way?
K: As Eigil commented elsewhere, lim(EV) doesn't equal EV(lim). And in fact the EV of the all-in strategy in the infinitely iterated case is zero. So this ordering of limits is the one that generalizes EV. The other ordering prefers the all-in strategy, even for the infinite game, so it can't be a valid generalization of EV.
E: OK true, but consider this: I'm only going to make a finite number of bets in my life. Maybe I play the stocks for several decades, but then I retire; the size of my nest egg at retirement is what I care about. Your formula agrees with EV maximization in finite cases, so it must agree that I should use the all-in strategy here.
K: Suuure, but consider this: you don't generally know when you'll make your last bet. You probably won't stop playing the stocks when you retire, and few anticipate the exact day they die. If we incorporate that uncertainty, we get behavior resembling EV maximization when we're sufficiently certain of the game's end, but we get behavior resembling Kelly when we're sufficiently uncertain.
So my takeaway is: your argument about the 1%-probability-of-ending case is a crux for me. It makes the difference between this being a clever but rarely-applicable analysis of an infinite game, vs a frequently-applicable analysis of games with uncertain end. I'd really like to see how that works out.
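For what it's worth, a quick Monte Carlo sketch of that crux (my parameters, purely illustrative: a double-or-nothing bet with an 80% win probability, a 1% chance of the game ending after each round, $100 to start) already suggests the Kelly-ish behavior:

```python
import random
import statistics

random.seed(0)  # for reproducibility

def play(fraction, p_win=0.8, p_end=0.01, start=100.0):
    """Repeatedly stake `fraction` of current wealth on a double-or-nothing
    bet with win probability p_win, until the game randomly ends."""
    wealth = start
    while random.random() > p_end:
        stake = fraction * wealth
        wealth += stake if random.random() < p_win else -stake
    return wealth

all_in = [play(1.0) for _ in range(2000)]
kelly = [play(0.6) for _ in range(2000)]  # Kelly fraction = 2*0.8 - 1

print("median all-in:", statistics.median(all_in))  # 0: ruin dominates
print("median Kelly:", statistics.median(kelly))
```

The all-in strategy still wins on mean wealth; the difference is entirely in the typical (median) outcome.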
I'm also curious whether this can be applied to other problems, like the St. Petersburg Lottery.
So then, to get back the bitcoin in 12 years when it's worth a billion dollars or whatever, you just have to make back the million USD you spent (plus interest), convert to USDC, and pay back (with interest)?
And then the risk you run is that bitcoin falls between now and that time, at which point your collateral would be liquidated to the highest bidder, and you'd be left with a Tesla but no bitcoin. (Not such a bad risk, all things considered.)
That's pretty awesome.
Ok, interesting! Thanks for the explanation.
What's the point of a loan if you need 100% collateral, and the collateral isn't something like a house that you can put to good use while using as collateral?
If I can use bitcoin as collateral to get some Doge, hoping to make enough money with the Doge to get my bitcoin back later... couldn't I just sell my bitcoin to buy the Doge, again hoping to make enough money to get my bitcoin back later?
Let's consider a max-expectation bettor on a double-or-nothing bet with an 80% probability of paying out.
My expected value per dollar in this bet is $1.60, whereas the expected value of a dollar in my pocket is $1. So I maximize expected value by putting all my money in. If I start with $100, my expected value after 1 round is $160. The expected value of playing this way for two rounds is $100 × 1.6 × 1.6 = $256. In general, the expected value of this strategy is 100⋅1.6^n.
The Kelly strategy puts 60% of its money down, instead. So in expectation, the Kelly strategy multiplies the money by 0.6⋅1.6 + 0.4 = 1.36.
So after one round, the Kelly bettor has $136 in expectation. After two rounds, about $185. In general, the Kelly strategy gets an expected value of 100⋅1.36^n.
So, after a large number of rounds, the all-in strategy will very significantly exceed the Kelly strategy in expected value.
I suspect you will object that I'm ignoring the probability of ruin, which is very close to 1 after a large number of rounds. But the expected value doesn't ignore the probability of ruin. It's already priced in: the per-round factor of 1.6 includes the 80% chance of success and the 20% chance of failure: 0.8⋅2 + 0.2⋅0. Similarly, the $256 expected value for two rounds already accounts for the chance of zero; you can see how by multiplying out 100⋅(0.8⋅2 + 0.2⋅0)^2 (which shows the three possibilities which have value zero, and the one which doesn't). Similarly for the nth round: the expected value of 100⋅1.6^n already discounts the winnings by the (tiny) probability of winning every round. (Otherwise, it would be 100⋅2^n instead.)
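The arithmetic above can be checked with a quick sketch (same assumed bet: 80% double-or-nothing, $100 start), computing the expected value of staking a fixed fraction each round:

```python
def expected_value(fraction, rounds, p=0.8, start=100.0):
    # Per-round expected growth factor when staking `fraction` of wealth:
    # with probability p the stake doubles, otherwise the stake is lost.
    growth = p * (1 + fraction) + (1 - p) * (1 - fraction)
    return start * growth ** rounds

print(expected_value(1.0, 1))  # all-in, one round: ~160
print(expected_value(1.0, 2))  # all-in, two rounds: ~256
print(expected_value(0.6, 1))  # Kelly, one round: ~136
print(expected_value(0.6, 2))  # Kelly, two rounds: ~184.96
```

The probability of total ruin is already folded into the growth factor, which is the point being made above.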
One interpretation of this would be imitation learning: teaching a system to imitate human strategies, rather than optimize some objective of its own.
The problem with imitation learning is: since humans are pretty smart, a close imitation of a human strategy is probably going to involve planning in the deliberate service of some values. So if you set a big neural network on the problem of imitating humans, it will develop its own preferences and ability to plan. This is a recipe for an inner optimizer. Its values and planning will have to line up with humans in typical cases, but in extreme cases (eg adversarial examples), it could be very different. This can be a big problem, because the existence of such an AI could itself push us to extreme cases where the AI has trouble generalizing.
Another interpretation of your idea could be "approval-directed agents". These are not trained to imitate humans, but rather, trained based on human approval of actions. However, unlike reinforcement learners, they don't plan ahead to maximize expected approval. They only learn to take specific actions more when they are approved of, and less when they earn disapproval.
Unlike imitation learners, approval-directed agents can be more capable than human trainers. However, unlike reinforcement learning agents, approval-directed agents don't have any incentive to take over control of their reward buttons. All the planning ahead comes from humans, looking at particular sorts of actions and deciding that they're good.
Unfortunately, this still faces basically the same problem as imitation learning. Because humans are approving/disapproving based on complicated models of the world and detailed thoughts about the consequences of actions, a big neural network has good reason to replicate those faculties within itself. You get an inner optimizer again, with the risks of misalignment that this brings.
I have often wondered why use of IQ in hiring isn't more common, so I just sorta believed it when you said it's illegal, even though I probably looked into it and figured out it wasn't on a previous occasion.
OTOH, businesses might be afraid of getting sued anyway, based on the supreme court case.
"The map is not the territory" does seem like a step up from "the name that can be named is not the eternal name". Though, that could be a translation issue.
As GuySrinavasan says, do the math. It doesn't work out. Maximizing geometric growth rate is not the same as maximizing mean value. It turns out Kelly favors the first at a severe cost to the second.
This is my big motivator for writing stuff like this: discussions of Kelly usually prove an optimality notion like expected growth rate, and then leave it to the reader to notice that this doesn't at all imply more usual optimality notions. Most readers don't notice; it's very natural to assume that "Kelly maximizes growth rate" entails "Kelly maximizes expected wealth".
But if Kelly maximized expected wealth, then that would probably have been proved instead of this geometric-growth-rate property. You have to approach mathematics the same way you approach political debates, sometimes. Keep an eye out for when theorems answer something only superficially similar to the question you would have asked.
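To make "do the math" concrete: Kelly maximizes the expected log of wealth (the growth rate), not expected wealth, and a brute-force scan shows these point at very different bets. A sketch, assuming an 80% double-or-nothing bet (illustrative numbers):

```python
import math

P_WIN = 0.8  # assumed double-or-nothing bet, for illustration

def expected_log_growth(f):
    # What Kelly maximizes: E[log of the per-round growth factor].
    return P_WIN * math.log(1 + f) + (1 - P_WIN) * math.log(1 - f)

def expected_growth(f):
    # What naive EV maximization maximizes: E[growth factor] itself.
    return P_WIN * (1 + f) + (1 - P_WIN) * (1 - f)

fractions = [i / 100 for i in range(100)]  # stake 0.00 .. 0.99 of wealth
print(max(fractions, key=expected_log_growth))  # 0.6, the Kelly fraction
print(max(fractions, key=expected_growth))      # 0.99: as all-in as allowed
```

Expected log growth peaks at the Kelly fraction 2p − 1 = 0.6, while expected wealth keeps increasing all the way to all-in: the two optimality notions genuinely come apart.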
Yep, I actually note this in footnote 3. I didn't change section 2 because I still think that if each of these is individually bad, it's pretty questionable to use them as justification for Kelly.
Note that if a strategy S_b is better or equal in every quantile, and strictly better in some, compared to some alternative S_w, then expected utility maximization will prefer S_b to S_w, no matter what the utility function is (so long as more money is considered better, i.e., utility is monotonic).
So all expected utility maximizers would endorse an all-quantile-optimizing strategy, if one existed. This isn't a controversial property from the EU perspective!
But it's easy to construct bets which prove that maximizing one quantile is not always consistent with maximizing another; there are trade-offs, so there's not generally a strategy which maximizes all quantiles.
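A minimal instance of that trade-off, with illustrative numbers (a single 80% double-or-nothing bet, $100 to start): staking everything maximizes the upper quantiles, staking half maximizes the lower ones.

```python
def quantile(outcomes, q):
    """Outcomes given as (probability, wealth) pairs; return the
    smallest wealth whose cumulative probability reaches q."""
    total = 0.0
    ordered = sorted(outcomes, key=lambda o: o[1])
    for prob, wealth in ordered:
        total += prob
        if total >= q:
            return wealth
    return ordered[-1][1]

# One 80% double-or-nothing bet, $100 bankroll (assumed numbers):
all_in = [(0.2, 0.0), (0.8, 200.0)]    # stake everything
half_in = [(0.2, 50.0), (0.8, 150.0)]  # stake half

print(quantile(all_in, 0.1), quantile(half_in, 0.1))  # 0.0 vs 50.0
print(quantile(all_in, 0.5), quantile(half_in, 0.5))  # 200.0 vs 150.0
```

So neither strategy dominates in every quantile, and which quantile you care about decides the bet.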
So it's critically important that Kelly is only approximately doing this, in the limit. If Kelly had this property precisely, then all expected utility maximizers would use the Kelly strategy.
In particular, at a fixed finite time, there's a quantile for the all-win sequence. However, since this quantile becomes smaller and smaller, it vanishes in the limit. At finite time, the expected-money-maximizer is optimizing this extreme quantile, but the Kelly strategy is making trade-offs which are suboptimal for that quantile.
(Note: maybe I'm misunderstanding what johnswentworth said here, but if solving for any x%-quantile maximizer always yields Kelly, then Kelly maximizes for all quantiles, correct?)
That's my belief too, but I haven't verified it. It's clear from the usual derivation that it's approximately mode-maximizing. And I think I can see why it's approximately median-maximizing by staring at the wikipedia page for log-normal long enough and crossing my eyes just right [satire].
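One way to check the approximate-median claim without squinting at log-normals: compute the exact wealth distribution after n rounds (here an assumed 80% double-or-nothing bet, my illustrative numbers) and scan for the fraction that maximizes the median. A sketch:

```python
import math

def median_wealth(f, n, p=0.8, start=100.0):
    # Exact distribution after n rounds staking fraction f:
    # w wins out of n gives start * (1+f)**w * (1-f)**(n-w),
    # with the usual binomial probability.
    outcomes = sorted(
        (start * (1 + f) ** w * (1 - f) ** (n - w),
         math.comb(n, w) * p ** w * (1 - p) ** (n - w))
        for w in range(n + 1)
    )
    total = 0.0
    for wealth, prob in outcomes:
        total += prob
        if total >= 0.5:
            return wealth

fractions = [i / 100 for i in range(100)]
print(max(fractions, key=lambda f: median_wealth(f, 100)))  # 0.6
```

With n = 100 rounds the median-maximizing fraction already lands on the Kelly fraction 0.6, consistent with the approximate-median-maximization claim.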