Hypothesis Space Entropy

by lsusr
14th May 2021
2 min read
Previous: Why quantitative finance is so hard
Next: The Kelly Criterion in 3D
Comments
Alexander Gietelink Oldenziel: I am confused what is meant by a 'hypothesis'. Is this a probability distribution? What is the mathematical object that you denote by hypothesis?

lsusr: It's a probability distribution. A hypothesis space is a probability distribution of probability distributions.

In Why quantitative finance is so hard I explained why the entropy of your dataset must exceed the entropy of your hypothesis space. I used a simple hypothesis space with $n$ equally likely hypotheses, each with $m$ tunable parameters. Real life is not usually so homogeneous.

No Tunable Parameters

Consider an inhomogeneous hypothesis space with zero tunable parameters. Instead of $H = \log n$, which works for homogeneous hypothesis spaces, we must use a more complicated entropy equation.

$$H = -\sum_{i=1}^{n} \rho_i \ln \rho_i$$

This equation makes intuitive sense. It vanishes when one $\rho_{i=j}$ equals 1 and all other $\rho_{i \neq j}$ equal 0. Our equation is extremized when all $\rho_i$ are equal to $\frac{1}{n}$. $H = \log n$ is the maximal case, when $\rho_i = \frac{1}{n} \; \forall i \in \{1, \dots, n\}$[1].
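
As a quick sanity check, here is a minimal Python sketch (mine, not from the original post) of this entropy formula; the `entropy` helper and the example distributions are illustrative.

```python
import math

def entropy(rho):
    """Shannon entropy H = -sum_i rho_i * ln(rho_i), in nats."""
    return -sum(p * math.log(p) for p in rho if p > 0)

n = 4
print(entropy([1.0, 0.0, 0.0, 0.0]))  # one certain hypothesis -> 0.0
print(entropy([1 / n] * n))           # uniform over n hypotheses -> ln(n)
print(math.log(n))                    # ln(4) ~= 1.386 for comparison
```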

With Tunable Parameters

Suppose each hypothesis $i$ has $m_i$ tunable parameters. We can plug $m_i$ into our entropy equation.

$$H = \sum_{i=1}^{n} \rho_i (m_i - \ln \rho_i)$$

Our old equation $H = m + \log n$ is just the special case where all the $\rho_i$ are homogeneous and all the $m_i$ are homogeneous too.
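
Here is a similar sketch (again my own illustration, not the post's) of the weighted formula; `rho` holds the prior weights and `m` the per-hypothesis parameter counts.

```python
import math

def hypothesis_space_entropy(rho, m):
    """H = sum_i rho_i * (m_i - ln(rho_i)), in nats."""
    return sum(p * (mi - math.log(p)) for p, mi in zip(rho, m) if p > 0)

# Homogeneous special case: rho_i = 1/n and m_i = m recovers H = m + ln(n).
n, m_params = 4, 3
print(hypothesis_space_entropy([1 / n] * n, [m_params] * n))  # ~4.386
print(m_params + math.log(n))                                 # same value

# Inhomogeneous case: one parameter-heavy hypothesis carries most of the entropy.
print(hypothesis_space_entropy([0.25, 0.25, 0.25, 0.25], [1, 1, 1, 40]))
```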

We have so far treated $m_i$ as representative of each hypothesis's tunable parameters. More generally, $m_i$ represents each hypothesis's internal entropy. If we think of hypotheses as a weighted tree, $m_i$ is what you get when you iterate one level down the tree. Our variable $H$ identifies the root of the tree. Suppose the $i$th branch of the next level down is called $H_i$.

$$H = \sum_{i=1}^{n} \rho_i (H_i - \ln \rho_i)$$

We can define the entropy of the rest of the tree with a recursive equation.

$$H_\mu = \sum_{i=1}^{n} \rho_i (H_{\mu,i} - \ln \rho_i) = \sum_{i=1}^{n} (\rho_i H_{\mu,i} - \rho_i \ln \rho_i)$$

There are two parts to this equation: the recursive component $\rho_i H_{\mu,i}$ and the branching component $-\rho_i \ln \rho_i$.
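
A recursive sketch of this tree entropy, under my own (assumed) representation: a node is either a bare number giving a leaf's internal entropy, or a list of (weight, subtree) branches whose weights sum to 1.

```python
import math

def tree_entropy(node):
    """H_mu = sum_i (rho_i * H_mu_i - rho_i * ln(rho_i))."""
    if isinstance(node, (int, float)):   # leaf: its internal entropy
        return float(node)
    return sum(p * tree_entropy(child) - p * math.log(p)
               for p, child in node if p > 0)

# Two hypotheses; the second expands into a subtree hiding a 20-nat leaf.
tree = [(0.5, 0.0),
        (0.5, [(0.5, 0.0), (0.5, 20.0)])]
print(tree_entropy(tree))  # ~6.04 nats, dominated by the deep, complex leaf
```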

Branching component $-\rho_i \ln \rho_i$

The $-\rho_i \ln \rho_i$ component is maximized when $\rho_i = \frac{1}{e}$ (set its derivative $-\ln \rho_i - 1$ to zero).

$$-\rho_i \ln \rho_i = -\frac{1}{e} \ln \frac{1}{e} = \frac{1}{e} \ln e = \frac{1}{e}$$

The branching component tops out at $\frac{1}{e}$. It can never contribute a massive quantity of entropy to your hypothesis space because each branch is limited to $\frac{1}{e}$ nats of entropy per level of the tree.

$$0 \leq -\rho_i \ln \rho_i \leq \frac{1}{e}$$
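
A quick numerical check (my own, not part of the post) that $-\rho \ln \rho$ peaks at $\rho = \frac{1}{e}$ and never exceeds $\frac{1}{e}$:

```python
import math

# Scan rho over (0, 1) and find where -rho * ln(rho) is largest.
rhos = [i / 10000 for i in range(1, 10000)]
values = [-p * math.log(p) for p in rhos]
peak_value = max(values)
peak_rho = rhos[values.index(peak_value)]
print(peak_rho, peak_value)  # ~0.3679, ~0.3679
print(1 / math.e)            # 1/e ~= 0.3679 for both
```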

The branching factor is mostly unimportant. The bulk of our entropy comes from the recursive component.

Recursive component $\rho_i H_{\mu,i}$

Fix $\rho_i$ at a positive value. There is no limit to how big $H_{\mu,i}$ can become. You can make it arbitrarily large just by adding parameters. Consequently $\rho_i H_{\mu,i}$ can become arbitrarily large too. In real-world situations we should expect the recursive components of our hypothesis space to dominate the branching components.

If $\rho_i$ vanishes then the recursive component disappears. This might explain why human minds like to round "extremely unlikely" ($\epsilon > \rho_i > 0$) to "impossible" ($\rho_i = 0$) when $H_{\mu,i}$ is large. It removes lots of entropy from our hypothesis space while still being right almost all of the time. This may be related to synaptic pruning.
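
To illustrate, here is a hypothetical sketch (the numbers are mine) of rounding an "extremely unlikely" branch to "impossible": pruning a branch with tiny $\rho_i$ but huge $H_{\mu,i}$ removes most of the entropy while barely changing what the prior predicts.

```python
import math

def flat_entropy(branches):
    """H = sum_i (rho_i * H_i - rho_i * ln(rho_i)) over (rho_i, H_i) pairs."""
    return sum(p * h - p * math.log(p) for p, h in branches if p > 0)

full   = [(0.99, 1.0), (0.01, 1000.0)]  # unlikely but extremely complex branch
pruned = [(1.00, 1.0)]                  # round rho = 0.01 down to "impossible"
print(flat_entropy(full))    # ~11.05 nats
print(flat_entropy(pruned))  # 1.0 nat
```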

Lessons for Hypothesis Space Design

Once again, we have confirmed that having hypotheses with lots of parameters is a worse problem than having lots of hypotheses to choose between. More generally, one or more hypotheses with exceptionally high entropy dominate the total entropy of your hypothesis space. If you want better priors then the first step of your optimization should be to eliminate these complex subtrees from your hypothesis space.


  1. Proof: $H = -\sum_{i=1}^{n} \rho_i \ln \rho_i = -\sum_{i=1}^{n} \frac{1}{n} \ln \frac{1}{n} = -\ln \frac{1}{n} = -\ln(n^{-1}) = \ln n$ ↩︎
