A Butterfly's View of Probability

[-]Viliam4y30

Consider the consequences of adding a single electron at the edge of the observable universe. The gravitational pull of this electron is enough to disrupt the trajectories of all air molecules on Earth after only 50 collisions... a fraction of a microsecond.

The gravitational waves propagate at the speed of light, so if you add an electron at the edge of the observable universe, I think it will take much more time until any effect of doing so reaches Earth.

(Is this correct? I am not a physicist.)

[-]Gabriel Wu4y30

Ah yes, I think that's correct (although I am also not a physicist). A more accurate description would be "In a matter of minutes after the time its gravitational waves reach earth, human events are unfolding in a measurably different fashion than they would have had that electron never existed."

[-]Adam Shai4y30

This is great! The issue of timescale is interesting to me in this. I am wondering for different systems at different levels of the ergodic heirarchy, if there are certain statements you can make (when considering the relevant timescales).

Also I am wondering how this plays with the issue of observer models. When I say that some event one month from now has 30% probability, are you imagining that I have a chaotic world model that I somehow run forward many times or push a probability distribution forward in some way and then count the volume in model space that contains the event? How would that process actually work in practice (ie how does my brain do it?).

[-]Gabriel Wu4y20

This is cool, I had never heard of the Ergodic Hierarchy before!

Related to your second point -- Alex Cai showed this psychology paper to me. It found that when humans are predicting the behavior of physical systems (e.g. will this stack of blocks fall over?), in their subconscious they are doing exactly this: running the scene in their brain's internal physics engine with a bunch of initial perturbations/randomness and selecting the majority result. Of course, predicting how a tower of blocks will topple is a lot different from predicting the probability of an event one month into the future.

[-]tailcalled4y30

I think this is a fun exercise. It of course can't replace the Bayesian model of probability, but it's conceptually interesting enough as a way to think about chaos.

[-]Slider4y30

Having to pick a metric is comparable to having an epistemological frame. Some metrics might have a different "double convergence" than other metrics. If the metrics do not agree then its not really objective.

If probablity of a statement doesn't make sense is treated as 0 it would seem to me that "I can't derive that" should also be assigned 0 by the same basis. So the logical uncertainty is defined but just clunky and not particularly inspiring. I would have also thought that the analog would be to have different axiom sets to be the things metrics are defined over.

Or if one wants to insist that logical probabloities are undefined it should also be extended so that the Trump probabilities start to become undefined once the -sphere starts to include sufficiently alien worlds. That could also be a natural boundary, the largest radius for which the statement is still defined.

One interesting metric to use would be to pick an agent in the world and use the difference in qualia / experience for the distance. That would be "worlds that feel almost like this". But if these sort of "exotic" metrics are "too partial" there is still work to be done in defining what sort of metrics are "well-behaved".

A deterministic system has good basis to be time reversible and in the case that it is then past events do have butterfly probabilities. There is analog with quantum probablities, there is no fact of the matter which slit the particle went throught in the double slit experiement. Thus starting moving away from the screen even if the particle is classical only a small velocity vector shift would be required that the ball had been coming from the other slit (both slits having appriciable butterfly amplitude).

[-]Shmi4y30

Uh, probability is in the map. Uncertainty is in the map. Bayesianism and Frequentism are not at odds. The prior is invisibly the fraction of possible past worlds one can imagine. The probability of an election outcome is the fraction of possible future worlds one can imagine that can emerge from the possible past worlds one had imagined. All that's needed is fine-graining and counting. There is no need for nonlinearity and chaos.

This also resolves the so called logical uncertainty: the probability of the n-th digit of pi being 0 depends on the agent doing the estimate. Some agents have more detailed and accurate maps than others, and their probabilities may converge with each other, such as that 3^^^^3 digit of pi is 0 with probability 1/10, even though it will likely never be calculated by anyone. My personal probability that the 20th digit of pi is 0 with probability 1/10, up until I look it up and then it snaps to either 0 or 1, or more like 10^(-5) away from those, since my senses and google can lie to me.

[-]Gabriel Wu4y50

I agree with everything you're saying. Probability, in the most common sense of "how confident am I that X will occur," is a property of the map, not the territory.

The next natural question is "does it even make sense for us to define a notion of 'probability' as a property of the territory, independent from anyone's map?" You could argue no, that's not what probability means; probability is inherently about maps. But the goal of the post is to offer a way to extend the notion of probability to be a property of the territory instead of the map. I think chaos theory is the most natural way to do this.

Another way to view this (pointed out to me by a friend) is: Butterfly Probability is the probability assigned by a Bayesian God who is omniscient about the current state of the universe up to 10^{-50} precision errors in the positions of atoms.

[-]Shmi4y30

Well, I guess you can say that, due to chaos, even the best map requires probabilities, which, in a way, makes it a feature of the territory, because it is common to all maps.

[-]TAG4y10

Probability is only in the map of it isn't in the territory as well. The theory that it is in the territory as well is not known to be tue, but is scientifically respectable. As Gabriel writes:

One response is to reject determinism. Maybe in Newton’s day we believed the universe was deterministic, but now we know about wave functions and Heisenberg uncertainty and all of that stuff. If we accept that there is true randomness occurring on the quantum level, then the outcome of the next election isn’t predetermined — it will depend on all of the quantum interactions that occur between now and 2024. With this view, it makes complete sense to assign fractional probabilities.

[-]Morpheus3y*20

Your butterfly formalism strikes me as a good description of what an "objective" probability is (and what 'frequentists' actually mean). The problem with the 'frequentist view' is best illustrated by your own example:

A coin has a 50% chance of landing heads because if you flip it 100 times, close to 50 of the flips will be heads. In contrast with Bayesianism, the frequentist view is perfectly objective: the limit of a ratio will be the same no matter who observes it.

Saying something is 50% likely because it happens 50% of the time is valid, but it does not actually refer to any real phenomenon. Real coins thrown by real people are not perfectly fair, because angular momentum is crucial, if you let the coin land on a flat surface.

In some sense, nothing is objective, there is only more and less objective. But throwing a die under carefully set up conditions (like in the casino game craps) gets you pretty close to an "objective" probability that multiple humans can agree on.

[-]Golol4y20

Let me try rephrase is in more conventional probability theory. You are looking at a metric space of universes . You probably want to take the Borel-sigma algebra $B$ as your collection of events. We think of propositions as sets $A \in B$ , which really just means $A \subset U$ is a subset which is not too irregular. Then thebindicator function $χ_{A} (u)$ is $1$ if A holds in universe $u$ and $0$ otherwise.

Your elaborations do not depend much on the time so we set $t = 0$ .

You now talk about picking a universe uniformly from a ball $B_{ϵ} (u_{0}) = u \in U : d (u, u_{0}) < ϵ$ . This is a problem. On finite dimensional vector spaces we have the lebesgue measurenand we can have such a uniform distribution. On your metric space of universes it is entirely unclear what this means. You have to actually specify a distribution. This choice of distribution then influences your outcome to the extreme. It is similar to how you can not uniformly pick a natural number. So here your result will be strongly influenced by the distribution. What we can do is say the following: We fix a sequence of probability measures $ρ_{n}$ on $U$ so that $ρ_{n}$ converges to $δ_{u_{0}}$ in the sense of weak convergence of probability measures. What this means is that you choose a sequence of distributions which approximate the dirac delta at $u_{0}$ , the distribution which samples to $u_{0}$ with probability $1$ . Then you can say something like: "The butterfly probability decay sequence around $u_{0}$ with respect to $ρ_{n}$ is given by $P_{u \sim ρ_{n}} (u \in A)$ .

Here I am also not formalizing your sense of "convergence in the middle" because this is extremely unlikely to correapond to somwthing rigorous. You can view the above as a sequence in $n$ and then study it's decay as $n$ goes to infinity, which corresponds to $ϵ$ going to zero.

But everything here will depend on your choice of $ρ_{n}$ . You can not necessarily choose uniformly from a small neighbourhood in any metric space. If the metric space is an infinite dimensional vector space uniformly, this is not possible.

There may be an alternative which means you don't have to choose the $ρ_{n}$ . You can fix a metric betwern probability measures which metrizes weak convergence, for example the Wasserstein distance $W$ . athen you could perhaps look at: ${sup}_{ρ : W (ρ, δ_{u_{0}}) < ϵ} P_{u \sim ρ} (u \in A)$ .

This may be infinite or zero though.

[-]Yair Halberstadt4y00

I'm not quite sure what the point of all of this is... You've decided you want to be able to define what a god's eyes probability for something would be, and indeed come up with what (at least initially) seems like a reasonable definition. But why should I want to define such a thing in the first place, if, as you yourself admit, it isn't actually useful for anything?

[-]TAG4y20

Bayesianism and frequentism both have their limitations.

[-]Eric Neyman4y20

I often talk about the "true probability" of something (e.g. AGI by 2040). When asked what I mean, I generally say something like "the probability I would have if I had perfect knowledge and unlimited computation" -- but that isn't quite right, because if I had truly perfect knowledge and unlimited computation I would be able to resolve the probability to either 0 or 1. Perfect knowledge and computation within reason, I guess? But that's kind of hand-wavey. What I've actually been meaning is the butterfly probability, and I'm glad this concept/post now exists for me to reference!

More generally I'd say it's useful to make intuitive concepts more precise, even if it's hard to actually use the definition, in the same way that I'm glad logical induction has been formalized despite being intractable. Also I'd say that this is an interesting concept, regardless of whether it's useful :)

[-]Yair Halberstadt4y10

How would you ever know what the butterfly probability of something is, such that it would make sense to refer to it? In what context is it useful?

[-]Eric Neyman4y20

"My probability is 30%, and I'm 50% sure that the butterfly probability is between 20% and 40%" carries useful information, for example. It tells people how confident I am in my probability.

^{^}

This observation is credited to the physicist Michael Berry (1978) and the calculations are explained in this paper. The idea is that, given some tiny error in the angle of a trajectory $Δ θ$ , the next collision will have an angle error of about $(\frac{ℓ}{R}) Δ θ$ , then the next will have an error of ${(\frac{ℓ}{R})}^{2} Δ θ$ , and so on (where $ℓ$ is the distance traveled between collisions and $R$ is the radius of each ball). So even though $Δ θ$ might be vanishingly small, the error becomes quite large after only a few collisions.

^{^}

OK fine, Spotify doesn't use random.org to shuffle its playlists, but I'm just trying to give an illustrative example.

^{^}

If you prefer an interpretation of physics in which time is discretized (as it is in a cellular automata), you can instead use a single-step transition function $^f : U \to U$ . Then you can think of $f (U, t)$ as $^f (^f (\dots (U)))$ , where $^f$ is iterated $t$ times.

^{^}

We technically haven't defined a preferred probability distribution on $U$ for which we can invoke the phrase "uniformly at random". I suppose one way you could do this would be to think of $U$ as $R^{6 n}$ (three spatial components and velocity components for each particle), where $n$ is the number of particles in the universe, and weight your probability distribution by $6 n$ -dimensional volume. Or you could think of $U$ as being discretized by choosing some super small "precision level" at which to encode positions and velocities. But at this point we're just getting silly — it really doesn't matter.

^{^}

Don't let it bother you that this definition involves a $Pr$ . We're not being circular because we're only constructing this definition for physical-world probabilities — we're allowed to assume that the mathematical theory of probability rests on solid ground.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

29

A Butterfly's View of Probability

29

29

Bayesianism and Frequentism

The Butterfly Effect

The Formalization

The Intuition

Conclusion