Where can I find good explanations of the central limit theorems for people with a Bayesian background?

Nov 13, 2020*

Don't have any good source except univeristy textbooks, but:

The simplest proof I know of (in 3 lines or so) is to just compute characteristic functions.
In general, the theorem talks about weak convergence, i.e. convergence in distributions.
The sample mean converges to expected value of the distribution it was taken from almost surely (i.e. strong convergence). This is a different phenomenon than CLT, it's called the law of large numbers.
CLT applies to a family of random variables, not to distributions. The random variables in question do not have to be identically distributed, but do have to be independent (in particular, independence of a family of random variables is NOT the same as their pairwise independence).
The best intuition behind the CLT I know of: Gaussian is the only distribution with a finite variance where a linear combination of two independent variables has the same distribution (modulo parameter shift) as they have (i.e. it is a stable distribution). So, if try to "solve" the recursive equation for the limit in CLT, you'll see that, if it exists, it has to be Gaussian. The theorem is actually about showing that the limit exists.

In general, as someone nicely put this: The importance of stable probability distributions is that they are "attractors" for properly normed sums of independent and identically distributed (iid) random variables.

Thanks. I think I had the law of large numbers and CLT in the same bucket in my head, so pointing out they're different is helpful. Your point #5, and the attractor bit, are especially interesting - and I've seen similar arguments in Jaynes's book, around gaussians, so this is starting to get into places I can relate to. And knowing that convergence in distribution is called weak convergence should help when I'm searching for stuff. Helpful!

CLT applies to a family of random variables, not to distributions.

I guess I consider a family of random variables to be the same thing as a family of distributions? Is there a difference?

2wolajacy5y

Answering the last question: If you deal with any random variable, formally you are specifying a probability space, and the variable is a measurable function on it. So, to say anything useful about a family of random variables, they all have to live on the same space (otherwise you can't - for example - add them. It does not make sense to add functions defined on different spaces). This shared probability space can be very complicated by itself, even though the marginal distributions are the same - it encodes the (non-)independence among them (in case of independent variables, it's just a product space with a product measure).

3Maxwell Peterson5y

Your comment made me realize that I didn't actually know what it meant to add random variables! I looked it up and found that, according to Wikipedia, this corresponds (if the RVs are independent) to what my main source (Jaynes) has been talking about in terms of convolutions of probability distributions. So I'm gonna go back and re-read the parts on convolution. But I still want to go out on a limb here and say that sounds to me like too strong a statement. Since I can take the AND of just about any two propositions and get a probability, can't I talk about the chance of a person being 6 feet tall, and about the probability that it is raining in Los Angeles today, even though those event spaces are really different, and therefore their probability spaces are different? And if I can do that, what is special about the addition of random variables that makes it not applicable, in the way AND is applicable?

4wolajacy5y

If you don't have a given joint pobability space, you implicitly construct it (for example, by saying RV are independent, you implicitly construct a product space). Generally, the fact that sometimes you talk about X living on one space (on its own) and other time on the other (joint with some Y) doesn't really matter, because in most situations, probability theory is specifically about the properties of random variables that are independent of the of the underlying spaces (although sometimes it does matter). Your example, by definition, P = Prob(X = 6ft AND Y = raining) = mu{t: X(t) = 6ft and Y(t) = raining}. You have to assume their joint probability space. For example, maybe they are independent, and then it P = Prob(X = 6ft) \* Prob(Y = raining), or maybe it's Y = if X = 6ft than raining else not raining, and then P = Prob(X = 6ft).

LESSWRONG
LW

LESSWRONG
LW

2

[ Question ]

Where can I find good explanations of the central limit theorems for people with a Bayesian background?

2

2

1 Answers sorted by
top scoring

Nov 13, 2020*

2

[ Question ]

Where can I find good explanations of the central limit theorems for people with a Bayesian background?

2

2

1 Answers sorted by top scoring

Nov 13, 2020*

1 Answers sorted by
top scoring