My knowledge of probability theory is based mostly on reading E.T. Jaynes’ Probability Theory book, Andrew Gelman’s blog, and various LessWrong posts. I now want to get a strong grasp of the central limit theorem(s), but YouTube videos and googled pages speak so much in the language of sampling from a population and of random variables that it’s hard to be sure what they’re saying, given that my background doesn’t really include those ideas. I’m especially interested in the different kinds of CLTs and related results, like the Lyapunov condition, the Berry-Esseen theorem, and so on. I often have a tough time diving right into algebra - something like http://personal.psu.edu/drh20/asymp/fall2002/lectures/ln04.pdf gives me terrible trouble. Given all these constraints, does anyone know of good resources from which I can gain a strong grasp of the CLTs?

Some things I am confused about after googling so far:

Do distributions converge to gaussians, or do means converge to the mean of a gaussian? Is the former a more difficult convergence to achieve, or are they actually the very same condition?

Is the CLT even about means? Does it say anything about the variance or skewness of the resulting distribution?

Is it actually necessary to be sampling from a population, or does the CLT apply to taking the means of arbitrary distributions, regardless of where they were obtained?
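
One way I could poke at these three questions myself is a quick simulation. This is only a sketch under my own assumptions: it uses NumPy, the Exponential(1) source distribution is an arbitrary pick (known mean 1, standard deviation 1, skewness 2), and the helper names `standardized_means` and `skewness` are mine. It tracks the mean, variance, and skewness of the rescaled sample mean sqrt(n) * (mean - mu) / sigma as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)


def standardized_means(sampler, mu, sigma, n, trials=5000):
    """Draw `trials` sample means of size n, rescaled as sqrt(n) * (mean - mu) / sigma."""
    draws = sampler((trials, n))
    return np.sqrt(n) * (draws.mean(axis=1) - mu) / sigma


def skewness(x):
    """Sample skewness: the mean cubed z-score."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


def exponential_sampler(size):
    # Assumed source distribution: Exponential(1), chosen only because it is visibly skewed.
    return rng.exponential(1.0, size)


results = {}
for n in (1, 10, 100, 1000):
    z = standardized_means(exponential_sampler, mu=1.0, sigma=1.0, n=n)
    results[n] = (float(z.mean()), float(z.var()), skewness(z))
    print(n, results[n])
```

If the CLT is a statement about the whole distribution and not just the mean, I'd expect the variance column to sit near 1 and the skewness column to shrink toward 0 as n grows. Nothing here involves a finite population, just independent draws from a distribution.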

Any form of media is OK for recommendations; I have no preference. Please feel free to suggest things even if you’re not sure it’s what I’m looking for - you are probably better than Google!

Thanks. I think I had the law of large numbers and CLT in the same bucket in my head, so pointing out they're different is helpful. Your point #5, and the attractor bit, are especially interesting - and I've seen similar arguments in Jaynes's book, around gaussians, so this is starting to get into places I can relate to. And knowing that convergence in distribution is called weak convergence should help when I'm searching for stuff. Helpful!

I guess I consider a family of random variables to be the same thing as a family of distributions? Is there a difference?