The Geometric Expectation

A video on the geometric derivative by the ever excellent Michael Penn:

Edit:

The geometric derivative is the instantaneous exponential growth rate i.e. where is the geometric derivative.

And if I pushed around symbols correctly, the geometric derivative can be pulled inside of a geometric expectation () similarly to how an additive derivative can be pulled inside an additive expectation (). Also, just as additive expectation distributes over addition (), geometric expectation distributes over multiplication ().

I think what is going on here is that both and are of the form with and , respectively. Let's define the star operator as . Then , by associativity of function composition. Further, if and commute, then so do and :

So the commutativity of the geometric expectation and derivative fall directly out of their representation as and , respectively, by commutativity of and , as long as they are over different variables.

We can also derive what happens when the expectation and gradient are over the same variables: . First, notice that , so .. Also .

Now let's expand the composition of the gradient and expectation. , using the log-derivative trick. So .

Therefore, .

Writing it out, we have .

Thanks for the post -- I've been having thoughts in this general direction and found this post helpful. I'm somewhat drawn to geometric rationality because it gives more intuitive answers in thought experiments involving low probabilities of extreme outcomes, such as Pascal's mugging. I also agree with your claim that "humans are evolved to be naturally inclined towards geometric rationality over arithmetic rationality."

On the other hand, it seems like geometric rationality only makes sense in the context of natural features that cannot take on negative values. Most of the things I might want to maximize (e.g. utility) can be negative. Do you have thoughts on the extent to which we can salvage geometric rationality from this problem?

But if your utility function is bounded, as it apparently should be, then you're one affine transform away from being able to use geometric rationality, no?

If arithmetic and geometric means are so good, why not the harmonic mean? https://en.wikipedia.org/wiki/Pythagorean_means. What would a "harmonic rationality" look like?

Also here is a nice family that parametrizes these different kinds of average (https://m.youtube.com/watch?v=3r1t9Pf1Ffk)

Actually maybe this family is more relevant:

https://en.wikipedia.org/wiki/Generalized_mean, where the geometric mean is the limit as we approach zero.

The "harmonic integral" would be the inverse of integral of the inverse of a function -- https://math.stackexchange.com/questions/2408012/harmonic-integral
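To make the generalized-mean family concrete, here is a minimal Python sketch. The function name `power_mean` and the sample values are illustrative assumptions, not something from the linked pages. It treats the exponent 0 as the geometric mean (the limiting case) and shows a small positive exponent landing close to it.

```python
import math

def power_mean(values, p):
    """Generalized (power) mean of positive values with exponent p.

    p = 1 is the arithmetic mean, p = -1 the harmonic mean, and
    p = 0 is handled as the limiting case, the geometric mean.
    """
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)

values = [1.0, 2.0, 4.0]              # illustrative positive values
arithmetic = power_mean(values, 1)
geometric = power_mean(values, 0)
harmonic = power_mean(values, -1)
near_zero = power_mean(values, 1e-6)  # small exponent, close to the geometric mean
```

On this reading, a "harmonic rationality" would sit at exponent -1 of the same family, with the arithmetic and geometric versions at 1 and 0.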

**Some results related to logarithmic utility and stock market leverage** (I derived these after reading your previous post, but I think it fits better here):

**Tl;dr:** We can derive the optimal stock market leverage for an agent with utility logarithmic in money. We can also back-derive a utility function from any constant leverage^{[1]}, giving us a nice class of utility functions with different levels of risk-aversion. Logarithmic utility is recovered as a special case, and has additional nice properties which the others may or may not have.

For an agent investing in a stock whose "instantaneous" price movements are i.i.d. with finite moments:

- Suppose, for simplicity, that the agent's utility function is over the *amount of money they have in the next timestep.* (As opposed to more realistic cases like "amount they have 20 years from now".)
- If , then:
- The optimal leverage for the agent to take is given by the formula , where and s is the standard deviation of the same. Derivation here. By my calculations, this implies a leverage of about 1.8 on the S&P 500.

- What if we instead suppose the agent prefers some constant leverage , and try to infer its utility function?
- The relevant differential equation is
- This is solved by for and for . You can play with the solutions here.

- If , then:
- Now suppose instead that the agent's utility function is "logarithmic withdrawals, time-discounted exponentially" -- , where is the absolute^{[2]} rate of withdrawal at time . It turns out that optimal leverage is still constant, and is still given by the same formula . Furthermore, the optimal rate of withdrawal is a constant , regardless of what happens.
- Things probably don't work out as cleanly for the non-logarithmic case.

[Disclaimer: This is not investment advice.]

^{^}Caveats:

1. This assumption of constant leverage is pretty arbitrary, so there's no normative or descriptive force to the class of utility functions we derive from it

2. We have to make an unrealistic assumption that the utility function is over $$ at the next timestep, rather than further in the future. In the log case, these kinds of assumptions tend to not change anything, but I'm not sure whether the general case is as clean.

^{^}i.e. in dollars, not percents

## A Suspicious Pattern

There is a pattern that shows up in many of the toys we like to play with around here: the pattern of maximizing the expected logarithm.

Nash bargaining is a method for aggregating preferences without a means to directly compare them. When Nash bargaining, you are maximizing the expected logarithm of utility, where the expectation is over uncertainty about which person you are.

Kelly betting is an extremely useful tool for not putting all your future wealth in one basket. When Kelly betting, you are maximizing the expected logarithm of your wealth.

The log scoring rule is a very natural way to extract beliefs. When maximizing your log score, you are maximizing the expectation of the logarithm of the probability you assign to the right answer. This is one example of a general pattern. Maximizations of expected logarithms show up all over information theory, often phrased as minimizing the negative of the expected logarithm.

Why does maximization of the expected logarithm keep showing up?

One answer is that all of the instances of it showing up are actually related. In my previous two posts, I made some connections between Nash bargaining and Kelly betting. The fact that Kelly betting can be used to model Bayesian updating illustrates its relationship with the information theory applications. To a certain extent, there is really only one instance of this pattern.

However, I think that there is another argument for why you should expect this pattern to show up a lot, which is that the pattern is very simple. More simple than it looks on the surface. It only looks complicated because mathematicians have failed us.

## The Geometric Integral

One of the most underrated concepts in mathematics is the geometric integral, given by $\prod f(x)^{dx} = e^{\int \ln(f(x))\,dx}$. (The fact that I couldn't easily get a LaTeX symbol that looks like an elongated P is a testament to its underratedness.) The geometric integral is just like the standard integral, but everywhere you would add, you multiply instead. Defining it in terms of the standard (arithmetic) integral with logs and exponents is insulting to its nature, and I don't recommend thinking of it that way. (You wouldn't define $x \times y$ as $e^{\ln(x)+\ln(y)}$.) Instead, you should just think of it as the multiplicative version of the integral. However, using logs and exponentiation is the fastest way to get the definition across.
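As a rough numerical illustration of "multiply wherever you would add", the sketch below approximates a geometric integral as a Riemann-style product of $f(x_i)^{\Delta x}$ terms and compares it with the log/exp form. The function names, the midpoint rule, and the test function $f(x) = e^x$ on $[0, 1]$ are assumptions chosen for illustration; $f$ must be strictly positive for the log version to work.

```python
import math

def geometric_integral(f, a, b, steps=1000):
    """Approximate the geometric integral of f over [a, b] as a product
    of f(x_mid) ** dx terms, the multiplicative analogue of a Riemann sum."""
    dx = (b - a) / steps
    product = 1.0
    for i in range(steps):
        x_mid = a + (i + 0.5) * dx
        product *= f(x_mid) ** dx
    return product

def geometric_integral_via_logs(f, a, b, steps=1000):
    """Same quantity via exp of the ordinary (midpoint) integral of ln f."""
    dx = (b - a) / steps
    total = sum(math.log(f(a + (i + 0.5) * dx)) * dx for i in range(steps))
    return math.exp(total)

# For f(x) = e^x on [0, 1], the integral of ln f is 1/2, so both
# approximations should be close to e ** 0.5.
direct = geometric_integral(math.exp, 0.0, 1.0)
via_logs = geometric_integral_via_logs(math.exp, 0.0, 1.0)
```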

I think people don't practice thinking multiplicatively enough, which causes them to throw inherently multiplicative things into logarithms, so they can think about them additively.

I will use the phrase geometric expectation when I take a geometric integral over a probability distribution, and I will use the symbol $\mathbb{G}$. Thus, we will write $\mathbb{G}_{x \sim P} f(x) = e^{\mathbb{E}_{x \sim P} \ln f(x)}$.

## Discrete Geometric Expectations

Luckily, most of the time, we will want to talk about discrete geometric expectations, where we can use (possibly infinite) sums rather than integrals and (possibly infinite) products rather than geometric integrals.

Let us gain some intuition for discrete geometric expectations by going through some simple cases. We will start with a uniform distribution on a finite set.

Let $X = \{x_1, \dots, x_n\}$ be a finite set with $n$ elements. Let $f: X \to \mathbb{R}_{\geq 0}$ be a function that assigns a nonnegative value to each $x_i$. Let $P$ be the uniform probability distribution on $X$ that assigns probability $\frac{1}{n}$ to each element of $X$.

We have that $\mathbb{E}_{x \sim P} f(x) = \sum_{x \in X} P(x) f(x) = \sum_{i=1}^{n} \frac{f(x_i)}{n} = \frac{f(x_1) + \dots + f(x_n)}{n}$. This is just the average, or arithmetic mean, of the $f$ values.

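We can compute $\mathbb{G}_{x \sim P} f(x)$ using the above formula $\mathbb{G}_{x \sim P} f(x) = e^{\mathbb{E}_{x \sim P} \ln f(x)}$. Here, we get

$$\mathbb{G}_{x \sim P} f(x) = e^{\mathbb{E}_{x \sim P} \ln f(x)} = e^{\frac{\ln f(x_1) + \dots + \ln f(x_n)}{n}} = \sqrt[n]{e^{\ln f(x_1)} \cdots e^{\ln f(x_n)}} = \sqrt[n]{f(x_1) \cdots f(x_n)}.$$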

Thus, the geometric expectation of the uniform distribution is just the geometric mean of the f values. Hence the name.
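A quick numerical check of the uniform case, as a minimal sketch: the helper name and sample values are illustrative assumptions, and the values are taken to be strictly positive so that the logarithm is defined. It compares the exp-of-mean-log form with the $n$-th root of the product and with the standard library's `statistics.geometric_mean`.

```python
import math
from statistics import geometric_mean

def geometric_expectation_uniform(values):
    """exp of the average of ln(value), i.e. G under a uniform distribution."""
    n = len(values)
    return math.exp(sum(math.log(v) for v in values) / n)

values = [1.0, 4.0, 16.0]  # illustrative f values, all strictly positive

g = geometric_expectation_uniform(values)
nth_root = math.prod(values) ** (1.0 / len(values))  # n-th root of the product
library = geometric_mean(values)                     # standard library geometric mean
```

All three agree up to floating-point error (about 4 for these values), while the arithmetic mean of the same values is 7.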

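The infinite non-uniform discrete case is not much more difficult. If $X$ is a finite or countably infinite set, $f: X \to \mathbb{R}_{\geq 0}$ assigns a nonnegative value to each $x \in X$, and $P$ is a probability distribution on $X$, then $\mathbb{E}_{x \sim P} f(x) = \sum_{x \in X} P(x) f(x)$, and

$$\mathbb{G}_{x \sim P} f(x) = e^{\mathbb{E}_{x \sim P} \ln f(x)} = e^{\sum_{x \in X} P(x) \ln f(x)} = \prod_{x \in X} e^{P(x) \ln f(x)} = \prod_{x \in X} f(x)^{P(x)}.$$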

These two values can be thought of as a weighted arithmetic mean and weighted geometric mean respectively.

When taking the geometric expectation of $f$ with respect to $P$, you just take the product over all $x \in X$ of $f(x)^{P(x)}$. You are multiplying together all the $f$ values, but the exponent $P(x)$ is saying that values with less probability get less weight (or less "power").
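The weighted case can be sketched the same way. The distribution and payoffs below are made-up illustrative numbers, and the function names are assumptions; the point is only that the product of $f(x)^{P(x)}$ matches the exp-of-expected-log form.

```python
import math

def arithmetic_expectation(dist, f):
    """Weighted arithmetic mean: sum over x of P(x) * f(x)."""
    return sum(p * f(x) for x, p in dist.items())

def geometric_expectation(dist, f):
    """Weighted geometric mean: product over x of f(x) ** P(x)."""
    return math.prod(f(x) ** p for x, p in dist.items())

# Hypothetical distribution and payoffs, chosen so the arithmetic is easy.
dist = {"a": 0.5, "b": 0.25, "c": 0.25}
payoff = {"a": 4.0, "b": 1.0, "c": 16.0}

e = arithmetic_expectation(dist, payoff.get)  # 0.5*4 + 0.25*1 + 0.25*16
g = geometric_expectation(dist, payoff.get)   # 4**0.5 * 1**0.25 * 16**0.25
g_via_logs = math.exp(arithmetic_expectation(dist, lambda x: math.log(payoff[x])))
```

One design note: the product form also handles an $f$ value of exactly zero (the result is zero), whereas the log form would fail on `math.log(0.0)`.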

## Maximizing the Geometric Expectation

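Maximization is invariant under applying a monotonically increasing function. Thus $\operatorname{argmax}_{y \in Y} \mathbb{E}_{x \sim P} \ln(f(x,y)) = \operatorname{argmax}_{y \in Y} e^{\mathbb{E}_{x \sim P} \ln(f(x,y))} = \operatorname{argmax}_{y \in Y} \mathbb{G}_{x \sim P} f(x,y)$.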

So every time we maximize an expectation of a logarithm, this is equivalent to just maximizing the geometric expectation.
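Here is a small sketch of that equivalence on a toy Kelly-style bet. Everything specific is an assumption for illustration: an even-money bet won with probability 0.6, a bet fraction $y$ searched over a coarse grid, and the function names. The expected logarithm and the geometric expectation of the wealth multiplier pick the same $y$, while the arithmetic expectation picks a different one.

```python
import math

P_WIN = 0.6  # assumed win probability of an even-money bet (illustrative)

def outcomes(y):
    """(probability, wealth multiplier) pairs when betting fraction y of wealth."""
    return [(P_WIN, 1.0 + y), (1.0 - P_WIN, 1.0 - y)]

def expected_log(y):
    return sum(p * math.log(w) for p, w in outcomes(y))

def geometric_expectation(y):
    return math.prod(w ** p for p, w in outcomes(y))

def arithmetic_expectation(y):
    return sum(p * w for p, w in outcomes(y))

grid = [i / 100 for i in range(100)]  # candidate fractions 0.00 .. 0.99

best_log = max(grid, key=expected_log)
best_geo = max(grid, key=geometric_expectation)
best_arith = max(grid, key=arithmetic_expectation)
```

In this toy setup the two geometric-style objectives agree on betting a fifth of wealth, whereas the arithmetic expectation keeps increasing with $y$ and so selects the largest fraction on the grid.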

Rather than saying "maximize the geometric expectation", I will just say "geometrically maximize". For example, when Kelly betting, we are just geometrically maximizing wealth. Note that the unit on the geometric expectation of wealth is dollars. The unit on the expected logarithm of dollars is... confusing? It is log dollars, but like, you add it instead of multiplying? I don't know how it works. What even is a log dollar?

The geometric expectation just makes more sense than the expected logarithm. It is a real thing with a real meaning. However, when we put the geometric expectation inside of a maximization, and we don't naturally think in terms of geometric expectations, we are tempted to take a logarithm of the whole thing, (which we can do because the maximization eats the monotonic function), and end up with maximizing the expected logarithm.

## Geometric Rationality

When Kelly betting, you are really just geometrically maximizing wealth.

When Nash bargaining, you are really just geometrically maximizing expected utility with respect to your uncertainty about your identity. In defense of Nash bargaining, it is normally presented as maximizing the product of the utilities. However, if you don't already have the concept of geometric expectation, it is tempting to convert it to an expected logarithm so you can handle the weighted case and think of it as being about uncertainty behind the veil of ignorance. (Also, it is more like the square root of the product of the utilities rather than the product of the utilities.)
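To spell out the square-root remark, here is a short worked equation. It assumes two bargainers with utilities $u_1$ and $u_2$ (symbols introduced here for illustration) and a uniform distribution $P$ over which of the two you are:

```latex
\mathbb{G}_{i \sim P}\, u_i
  = e^{\frac{1}{2}\ln u_1 + \frac{1}{2}\ln u_2}
  = u_1^{1/2}\, u_2^{1/2}
  = \sqrt{u_1 u_2}
```

Since the square root is monotonically increasing, maximizing $\sqrt{u_1 u_2}$ selects the same outcome as maximizing the product $u_1 u_2$. With unequal weights $P(1)$ and $P(2)$, the same formula gives $u_1^{P(1)} u_2^{P(2)}$.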

When maximizing log score, you are really just geometrically maximizing the probability you assign your observation.

I will informally use the phrase "geometric rationality" to refer to techniques that tend to geometrically maximize natural features (of the world or the self). I want to raise to attention the hypothesis that humans are evolved to be naturally inclined towards geometric rationality over arithmetic rationality, and that around here, the local memes have moved us too far off this path.