Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Superrational Agents Kelly Bet Influence!

Nuno Sempere points out that this was written up in an economics paper in 2012: https://arxiv.org/pdf/1201.6655.pdf

Suppose instead of a timeline with probabilistic events, the coalition experiences the full tree of all possible futures - but we translate everything to preserve behavior. Then beliefs encode which timelines each member cares about, and bets trade influence (governance tokens) between timelines.

Can you justify Kelly "directly" in terms of Pareto-improvement trades rather than "indirectly" through Pareto-optimality? I feel this gets at the distinction between the selfish vs altruistic view.

I also looked into this after that discussion. At the time I thought that this might have been something special about Kelly, but when I did some calculations afterwards I found that I couldn't get this to work in the other direction. I haven't fully parsed what you mean by:

> (And since payoffs of the bet-against-yourself strategy are exactly identical to Kelly betting payoffs, a bunch of Kelly bets at house odds rearrange money in exactly the same way as this.)
>
> But this is clearly equivalent to how hypotheses redistribute weight during Bayesian updates!
>
> So, a market of Kelly betters re-distributes money according to Bayesian updates.

So take the following with a (large) grain of salt before I can recheck my reasoning, but:

Everything you've written (as I currently understand it) also applies for many other betting strategies. eg if everyone was betting (the same constant) fractional Kelly.

Specifically the market will clear at the same price (weighted average probability) and "everyone who put money on the winning side picks up a fraction of money proportional to the fraction they originally contributed to that side".

> I also looked into this after that discussion. At the time I thought that this might have been something special about Kelly, but when I did some calculations afterwards I found that I couldn't get this to work in the other direction.

I'm not sure what you mean here. What is "this" in "looked into this" -- Critch's theorem? What is "the other direction"?

> Everything you've written (as I currently understand it) also applies for many other betting strategies. eg if everyone was betting (the same constant) fractional Kelly.
>
> Specifically the market will clear at the same price (weighted average probability) and "everyone who put money on the winning side picks up a fraction of money proportional to the fraction they originally contributed to that side".

It seems obvious to me that the market will clear at the same price if everyone is using the same fractional Kelly, but if people are using different Kelly fractions, the weighted sum would be correspondingly skewed, right? Anyway, that's not really important here...

The important thing for the connection to Critch's theorem is: the total wealth gets adjusted like Bayes' Law. Other betting strategies may not have this property; for example, fractional Kelly *means losers lose less, and winners win less*. This doesn't limit us to exactly Kelly (for example, the bet-against-yourself strategy in the post also has the desired property); however, all such strategies __must__ be equivalent to Kelly in terms of the payoffs (otherwise, they wouldn't be equivalent to Bayes in terms of the updates!).

For example, if everyone uses fractional Kelly with the same fraction, then on the first round of betting, the market clears with all the right prices, since everyone is just scaling down how much they bet. However, the subsequent decisions will then get messed up, because everyone has the wrong weights (the weights changed less than they should have).
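To make the fractional-Kelly point concrete, here is a minimal sketch (not from the original discussion). It assumes a single binary event, wealth shares that sum to one, and a market price equal to the wealth-weighted average belief. The function name `fractional_kelly_update` and the parameter `c` (the common Kelly fraction) are hypothetical. Under these assumptions, the new shares come out as a blend of the old shares and the Bayesian posterior, so they move less than Bayes requires whenever `c < 1`.

```python
def fractional_kelly_update(weights, beliefs, event_happened, c):
    """Wealth shares after one round in which every bettor stakes c times
    their Kelly fraction at the market price. Assumes weights sum to 1.
    c = 1.0 is full Kelly; 0 < c < 1 is fractional Kelly."""
    # Market price: wealth-weighted average belief (assumed clearing price).
    q = sum(w * p for w, p in zip(weights, beliefs))
    new_weights = []
    for w, p in zip(weights, beliefs):
        # Full-Kelly wealth multiplier is the Bayesian likelihood ratio.
        ratio = p / q if event_happened else (1.0 - p) / (1.0 - q)
        # Fractional Kelly mixes "do nothing" (1.0) with the full-Kelly multiplier.
        new_weights.append(w * ((1.0 - c) + c * ratio))
    return new_weights


prior = [0.5, 0.5]
beliefs = [0.8, 0.2]
full = fractional_kelly_update(prior, beliefs, True, 1.0)
half = fractional_kelly_update(prior, beliefs, True, 0.5)
print("full Kelly:", full)
print("half Kelly:", half)
```

In both cases total wealth is conserved, which is why the first round clears at the same price. Only the full-Kelly shares match the Bayesian posterior, so later prices computed from half-Kelly shares would drift from the Bayesian ones.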

As a follow-up to the Walled Garden discussion about Kelly betting, Scott Garrabrant made some super-informal conjectures to me privately, involving the idea that some class of "nice" agents would "Kelly bet influence", where "influence" had something to do with anthropics and acausal trade.

I was pretty incredulous at the time. However, as soon as he left the discussion, I came up with an argument for a similar fact. (The following does not perfectly reflect what Scott had in mind, by any means. His notion of "influence" was very different, for a start.)

The meat of my argument is just Critch's negotiable RL theorem. In fact, that's practically the entirety of my argument. I'm just thinking about the consequences in a different way from how I have before.

## Superrationality

Rather than articulating a real decision theory that deals with all the questions of acausal trade, bargaining, commitment races, etc, I'm just going to imagine a class of superrational agents which solve these problems somehow. These agents "handshake" with each other and negotiate (perhaps acausally) a policy which is Pareto-optimal wrt each of their preferences.

## Negotiable RL

Critch's negotiable RL result studies the question of what an AI should do if it must serve multiple masters. For this post, I'll refer to the masters as "coalition members".

He shows the following:

> Any policy which is Pareto-optimal with respect to the preferences of coalition members can be understood as doing the following. Each coalition member is assigned a starting weight, with weights summing to one. At each decision, the action is selected via the weighted average of the preferences of each coalition member, according to the current weights. At each observation, the weights are updated via Bayes' Law, based on the beliefs of coalition members.

He was studying what an AI's policy should be, when serving the coalition members; however, we can apply this result to a coalition of superrational agents who are settling on *their own* policy, rather than constructing a robotic servant.

Critch remarks that we can imagine the weight update as the result of bets which the coalition members would make with each other. I've known about this for a long time, and it made intuitive sense to me that they'll happily bet on their beliefs; so, of course they'll gain/lose influence in the coalition based on good/bad predictions.
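The two steps in Critch's result as summarized above (weighted-average action choice, then a Bayes-style weight update) can be sketched roughly as follows. This is an illustrative toy, not Critch's formalism: the function names, the finite action set, and the representation of preferences as per-member expected utilities are all assumptions made for the example.

```python
def choose_action(weights, expected_utils):
    """Return the index of the action with the highest weight-averaged utility.
    expected_utils[i][a] is member i's expected utility for action a,
    computed under member i's own beliefs (hypothetical representation)."""
    n_actions = len(expected_utils[0])
    scores = [
        sum(w * utils[a] for w, utils in zip(weights, expected_utils))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: scores[a])


def update_weights(weights, likelihoods):
    """Bayes-style update of coalition weights.
    likelihoods[i] is the probability member i assigned to what was observed."""
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [x / total for x in unnormalized]


weights = [0.5, 0.5]
# Member 0 gave the observation probability 0.8, member 1 gave it 0.2.
weights = update_weights(weights, [0.8, 0.2])
# Member 0 prefers action 0, member 1 prefers action 1.
action = choose_action(weights, [[1.0, 0.0], [0.0, 1.0]])
print(weights, action)
```

The member who predicted the observation better gains weight, and the next decision tilts toward that member's preferences.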

What I didn't think too hard about was *how* they end up betting. Sure, the fact that it's equivalent to a Bayesian update is remarkable. But it makes sense once you think about the proof.

Or does it?

To foreshadow: the proof works from the assumption of Pareto optimality. So it *collectively* makes sense for the agents to bet this way. But the "of course it makes sense for them to bet on their beliefs" line of thinking tricks you into thinking that it *individually* makes sense for the agents to bet like this. However, this need not be the case.

## Kelly Betting & Bayes

The Kelly betting fraction can be written as:

$$f = \frac{p - \frac{1}{r}}{1 - \frac{1}{r}}$$

Where $p$ is your probability for winning, and $r$ is the return rate if you win (ie, if you stand to double your money, $r=2$; etc).

Now, it turns out, betting $f$ of your money (and keeping the rest in reserve) is equivalent to betting $p$ of your money and putting $(1-p)$ on the other side of the bet. Betting against yourself is a pretty silly thing to do, but since you'll win either way, there's no problem:

Betting $f$ of your money:

- If you win, your money is multiplied by $(1-f) + fr = pr$.
- If you lose, your money is multiplied by $1 - f = \frac{(1-p)\,r}{r-1}$.

Betting against yourself, with fractions like your beliefs (the other side of the bet pays $\frac{r}{r-1}$, so that the implied probabilities $\frac{1}{r}$ and $1 - \frac{1}{r}$ sum to one):

- If you win, your money is multiplied by $pr$.
- If you lose, your money is multiplied by $(1-p)\,\frac{r}{r-1}$.

But now imagine that a bunch of bettors are using the second strategy to make bets with each other, with the "house odds" being the weighted average of all their beliefs (weighted by their bankrolls, that is). Aside from the betting-against-yourself part, this is a pretty natural thing to do: these are the "house odds" which make the house revenue-neutral, so the house never has to dig into its own pockets to award winnings.
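As a quick numerical check of the equivalence between Kelly betting and betting against yourself, here is a small sketch. The function names are hypothetical, and it assumes the opposite side of the bet pays `r / (r - 1)` (that is, the implied probabilities of the two sides sum to one, with no house edge).

```python
def kelly_fraction(p, r):
    """Kelly fraction for win probability p and return rate r (r=2 doubles the stake)."""
    return (p - 1.0 / r) / (1.0 - 1.0 / r)


def kelly_payoffs(p, r):
    """Wealth multipliers (if win, if lose) when staking the Kelly fraction."""
    f = kelly_fraction(p, r)
    return (1.0 - f) + f * r, 1.0 - f


def hedged_payoffs(p, r):
    """Wealth multipliers (if win, if lose) when staking p on the event
    and 1 - p against it. Assumes the other side pays r / (r - 1)."""
    r_other = r / (r - 1.0)
    return p * r, (1.0 - p) * r_other


for p, r in [(0.6, 2.0), (0.5, 3.0), (0.9, 1.5)]:
    print(p, r, kelly_payoffs(p, r), hedged_payoffs(p, r))
```

For each pair of inputs the two strategies produce the same multipliers up to floating-point error.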

You can imagine that everyone is putting money on two different sides of a table, to indicate their bets. When the bet is resolved, the losing side is pushed over to the winning side, and everyone who put money on the winning side picks up a fraction of money proportional to the fraction they originally contributed to that side. (And since payoffs of the bet-against-yourself strategy are exactly identical to Kelly betting payoffs, a bunch of Kelly bets at house odds rearrange money in exactly the same way as this.)

But this is clearly equivalent to how hypotheses redistribute weight during Bayesian updates!

So, a market of Kelly betters re-distributes money according to Bayesian updates.
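The table picture can be simulated directly. The sketch below is only an illustration under the assumptions stated in the text: a single binary event, every bettor splitting their whole bankroll between the two sides in proportion to their beliefs, and the losing side's money shared among the winners pro rata. All function names are hypothetical.

```python
def house_price(bankrolls, beliefs):
    """Bankroll-weighted average belief, i.e. the implied probability
    at which the house is revenue-neutral."""
    total = sum(bankrolls)
    return sum(b * p for b, p in zip(bankrolls, beliefs)) / total


def settle(bankrolls, beliefs, event_happened):
    """Each bettor puts p of their bankroll on 'yes' and 1 - p on 'no'.
    The whole table is then split among the winning-side stakes pro rata."""
    total = sum(bankrolls)
    winning_stakes = [
        b * (p if event_happened else 1.0 - p)
        for b, p in zip(bankrolls, beliefs)
    ]
    winning_pot = sum(winning_stakes)
    return [total * s / winning_pot for s in winning_stakes]


def bayes_posterior(prior, likelihoods):
    """Ordinary Bayesian update over hypotheses."""
    unnormalized = [w * l for w, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [x / total for x in unnormalized]


bankrolls = [60.0, 40.0]
beliefs = [0.9, 0.3]
after = settle(bankrolls, beliefs, event_happened=True)
shares = [a / sum(after) for a in after]
print("house price:", house_price(bankrolls, beliefs))
print("wealth shares:", shares)
print("Bayes posterior:", bayes_posterior([0.6, 0.4], beliefs))
```

Treating each bettor as a hypothesis with prior equal to their bankroll share, the post-bet wealth shares coincide with the Bayesian posterior, and the total amount of money on the table is unchanged.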

## Altruistic Bets

Therefore, we can interpret the superrational coalition members as betting their coalition weight, according to the Kelly criterion.

But, this is a pretty weird thing to do!

I've argued that the main sensible justification for using the Kelly criterion is if you have utility logarithmic in wealth. Here, this translates to utility logarithmic in coalition weight.
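For reference, the standard link between logarithmic utility and the Kelly fraction can be worked out in a couple of lines, using the same $p$ and $r$ as above. Here $W$ (wealth after the bet, starting from 1) is notation introduced just for this derivation. Staking a fraction $f$ gives expected log wealth

$$\mathbb{E}[\log W] = p \log\bigl(1 + f(r-1)\bigr) + (1-p)\log(1-f).$$

Setting the derivative with respect to $f$ to zero:

$$\frac{p(r-1)}{1 + f(r-1)} - \frac{1-p}{1-f} = 0 \quad\Longrightarrow\quad f = \frac{pr - 1}{r - 1} = \frac{p - \frac{1}{r}}{1 - \frac{1}{r}},$$

which is the Kelly fraction. An agent whose utility is some other function of wealth (or of coalition weight) would generally pick a different $f$.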

It's *possible* that under some reasonable assumptions about the world, we can argue that utility of coalition members will end up approximately logarithmic. But Critch's theorem applies to lots of situations, including small ones where there isn't any possibility for weird things to happen over long chains of bets as in some arguments for Kelly.

Typically, final utility will not even be *continuous* in coalition weight: small changes in coalition weight often won't change the optimal strategy at all, but at select tipping points, the optimal strategy will totally change to reflect the reconfigured trade-offs between preferences.

Intuitively, these tipping points *should* factor significantly in a coalition member's betting strategy; you'd be totally indifferent to small bets which can't change anything, but avoid specific transitions strongly, and seek out others. If the coalition members were betting based on their selfish preferences, this would be the case.

Yet, the coalition members end up betting according to a very simple formula, which does not account for any of this.

Why?

We can't justify this betting behavior from a selfish perspective (that is, not with the usual decision theories); as I said, the bets don't make sense.

But we're not dealing with selfish agents. These agents are acting according to a Pareto-optimal policy.

And that's ultimately the perspective we can justify the bets from: these are *altruistically motivated bets*. Exchanging coalition weight in this way is *best for everyone*. It keeps you Pareto-optimal!

This is very counterintuitive. I suspect most people would agree with me that there *seems* to be no reason to bet, if you're being altruistic rather than selfish. Not so! They're not betting for their personal benefit. They're betting for the common good!

Of course, that fact is a very straightforward consequence of Critch's theorem. It shouldn't be surprising. Yet, somehow, it didn't stick out to me in quite this way. I was too stuck in the frame of trying to interpret the bets selfishly, as Pareto-improvements which both sides happily agree to.

I'm quite curious whether we can say anything interesting about how altruistic agents would handle money, based on this. I don't think it means altruists should Kelly bet money; money is a very different thing from coalition weight. Coalition weights are like exchange rates or prices. Money is more of a thing being exchanged. You do not *pay* coalition weight in order to get things done.