Counterintuitive Coin Toss. Part I

PaulTheHuman

Translation from Russian. Original text available here.

When I was about thirteen, a question occurred to me.

Suppose we have a point at coordinates (0, 0). We flip a coin, and if it comes up heads, we move the point one unit to the right. If it's tails, we move it one unit to the left. To make the process easier to observe, we do the same with up or down movement — using another coin toss.

So, if the point leaves a trail, what would we see on a screen or a sheet of paper after, say, a thousand such steps?

Right now, before you read further, you can also test your intuition or, for example, your depth of understanding of basic probability theory concepts. Stop right here, at the very beginning of the article, and try to imagine what the picture would look like. Naturally, not exactly — we can't predict that because we can't predict how the coin will land — but "in principle." What will it be?

Back then, I reasoned like this: coin toss outcomes are equally probable. So, approximately as many times as it moves left, it will move right. The same for up or down. And this will continue indefinitely.

Therefore, the drawing probably should look like some kind of cloud around the point (0, 0). Something like an almost filled circle or square — not perfectly even, of course, but roughly that shape.

Fortunately, I was learning to program around that time, so I quickly programmed this process — with a small modification: the virtual coin became three-sided, meaning it could show not only 1 and -1, but also 0.

I must say, what I saw on the screen surprised me quite a lot: there was no cloud, but rather something that could easily be mistaken for a map of some continent — an algorithm I later modified for drawing such maps.

Here is what it looks like.

But before explaining why things turn out this way, I'll make a few more additions.

Many things wander

Around the same years, Brownian motion was part of my school curriculum, which, judging by the description, should represent the same process. However, the physics textbook had a very unclear illustration, so I didn't even immediately associate the two. And realizing this "random" trajectory from a picture was quite difficult.

Furthermore, I repeat, even the basic tenets of probability theory are highly counterintuitive, so I've even heard physics teachers say that "a small particle of a substance in a liquid, under the influence of the liquid's molecules bombarding it, will just flutter in place."

It would be all the more surprising to observe this process live or in a computer emulation and see that it's not just staying in place at all — on the contrary, some particles dropped into the center of a glass will drift to one of the walls in a fairly short time. Moreover, different particles go to different walls, so it's not about a "liquid flow." And some particles will indeed "flutter" near where they landed for quite a long time.

Something similar could be observed in a case more familiar to people than observing Brownian motion in a school lab.

Suppose one of your relatives is baking buns in the kitchen. But you, sitting in your room, suddenly smell them. This means that some aromatic particles, instead of fluttering inside the oven, managed to fly out of it, then out of the kitchen, then turn into the hallway and fly to your room, then turn into the room and somehow steer towards your olfactory receptors. And this would happen even without any drafts in the apartment.

And surely, the particles didn't have the goal of reaching your receptors. Nor did they know how to get there. But they made it. In this random process of collisions with air molecules, with the crystal lattices of walls, ceiling, and floor, and so on.

Finally, an even more obvious but still non-obvious point — the very coin toss I used to move the point. How much would you win after a hundred games? Heads and tails are equally likely, but what would the sum be? Zero? But is it always zero?

And if not, how far from zero?

Test yourself again: how obvious is the result to you?

Why does it happen this way?

Understanding why the point flies away somewhere — instead of drawing a cloud around its start — can be achieved through the following purely logical reasoning.

We started from the point (0, 0) and assume that a cloud will be drawn around this point further on.

But at the very first step, the point moved, say, to (1, 1). Why shouldn't this point become the starting point? Why wouldn't a cloud be drawn around it? What special magic forbids this?

Then, after some steps of drawing the hypothetical cloud, we ended up, say, at (10, 10). What prevents this point from being the starting point? Why wouldn't a cloud be drawn around it? Why shouldn't the point, just as it flew away from (0, 0), also fly away from (10, 10) by 10 to the right and 10 up?

In other words, we quickly arrived at a contradiction. If the cloud assumption were correct, a cloud would have to be drawn around every point on the path. Thus, many clouds would have to be drawn. This contradicts the assumption of a single cloud.

The point is that intuitively, all these statements about "equal probability" and "zero sum as it tends to infinity" push us to think that we are talking about some kind of "universal harmony." As if the universe "maintains balance and justice," and therefore inclines any random system we observe towards equilibrium around the point from which we started observing.

Simply put, it intuitively seems to us that if a coin has come up tails ten times, then surely heads must come up next: since each side seems equally likely, after ten tails, the coin has accumulated a lot of "head-ness."

But no, the coin has no memory of previous states. No matter how many times tails has come up, the probability of it coming up next is still ½, the same as for heads.

And as a consequence, no matter how far from zero the amount we have already won or lost is, the amount of winnings from all subsequent tosses still tends to zero, meaning that "on average you are obligated" to stay with the winnings already achieved.

If you've already won ten dollars after a hundred coin-toss games with a one-dollar bet, this does not mean that the "universe" obligates you to lose those ten dollars over the next few games. Oh no, the expected sum of all subsequent games is still zero.

So many paradoxes

It seems there are too many paradoxes in the previous section.

How can we simultaneously consider the outcomes equally probable, yet you can still win not just a little, but quite a lot?

How can we simultaneously claim that the expected win is zero, but you are still likely to keep what you won, say, in the first hundred games out of a thousand during the subsequent nine hundred games?

Well, firstly, we have experimental confirmation — you yourself see that the point doesn't just spin around the start but moves somewhere, so you really can win or lose.

And secondly, we need to understand what we mean by "likely to keep."

"Likely" here means that if we don't just play a thousand games of coin toss once and look at the winnings, but repeat the thousand-game match, say, ten thousand times, then most outcomes of such thousand-game matches will show us a zero total win for each player.

That's why we talk about the "most probable sum" — such a sum will indeed occur most often.

However, "most often" in turn means "more often than a sum equal to any other single number." More often than a sum equal to any other number, but not necessarily more often than all non-zero sums combined.

For example, in this computer simulation of such a process, I found that a zero sum in a thousand-game match occurred in about 270 trials out of 10,000 (here, the horizontal axis represents the total win, and the vertical axis represents how many times it occurred).

And yes, here it did occur more often than any other single sum. But clearly not more often than all others combined: because 270 is clearly less than 9,730.

In other words, the most likely win in a series of coin-toss games is indeed zero, but nevertheless, a non-zero result is more likely than a zero result.

Moreover, among non-zero results, both positive and negative sums occur with approximately equal frequency. So, there are about 4,865 positive sums here.

Suddenly, it is much more likely (18 times more likely!) that you will win some money than that you will end up with your original amount.

True, it is equally likely that you will lose some money — this is another deep meaning of "mathematical expectation of winnings."

Probability theory vs. paradoxes

If we write all this in mathematical language, we get the following.

We have a series of outcomes corresponding to a coin toss: 1 — heads, -1 — tails.

For example:

We can calculate the mean of the outcomes:

However, we are interested in what would happen if we continued conducting trials.

In this case, we could say that, due to the equal probability of heads and tails, there would be roughly equal numbers of outcomes. But some deviation is quite possible. Let's denote the difference in the number of outcomes in favor of, say, heads as k. This means there are exactly k more ones than minus-ones in the numerator, and thus our expectation of the average outcome value will be

Each new toss will definitely increase n in the denominator by one, but not every toss will increase the difference between heads and tails in the numerator — some, on the contrary, will decrease it.

Thus, we can conclude that the more tosses we make, the smaller the average outcome's difference from zero tends to be on average. That is, as the number of tosses tends to infinity, the average of the outcomes tends to zero.

This is precisely what the phrase "the mathematical expectation of the outcome is zero" means.

But in the process of random walks, we are calculating not the average outcome at all, but the sum of outcomes. That is, in this case, we lack that very denominator which guaranteed the entire fraction tending to zero.

And this means that in this process, there is no regular tendency for the result to tend to zero as the number of tosses increases. Yes, the difference in the number of outcomes may become zero at some point, but then it may start increasing again in one direction or the other from zero.

Although the outcomes are equally probable, each trial is different, and depending on luck, we can observe an arbitrarily long "imbalance" in the number of heads and tails over a long series of tosses. This is precisely what the experimental distribution graph of sums over a thousand tosses shows us.

We made a thousand tosses, summed their results, and did this ten thousand times. The most frequently occurring sum across the trials is zero, however, the majority of trials gave a sum that was not zero at all.

If we calculate the average sum across all thousand-toss games, it will also be approximately zero. Moreover, the more games we conduct, the closer the average winnings will be to zero.

In other words, the mathematical expectation of the outcome is zero, the mathematical expectation of the sum of outcomes is also zero, however, each specific sum of outcomes will equal zero only in relatively rare cases.

If this still seems paradoxical, think about this: the coin only shows heads or tails, which we assigned the values "1" and "-1". The expected value of the outcome is zero, even though "0" cannot appear on the coin at all.

But this is not the end yet. The distribution graph of outcomes has the shape of a "bell." The center of this bell approximately coincides with the mathematical expectation, but how "wide" is this bell?

To characterize the width of such distributions, a quantity called the "standard deviation" is introduced, usually denoted as "σ".

The standard deviation roughly corresponds to half the width of this "bell" at half its height. With a large number of trials, about 2/3 of outcomes will fall within the range from -σ to σ around μ (the arithmetic mean).

If we calculate the standard deviation for the event "coin toss," using a method similar to the one above, we find that as the number of tosses (n) increases, the standard deviation tends to one.

The second term under the square root tends to zero as the number of tosses tends to infinity for reasons explained earlier, meaning the standard deviation of a single coin toss result tends to one during this time.

However, unlike the standard deviation of a single coin toss result, the standard deviation of the sum of toss results in a single game, on the contrary, increases as the number of tosses per game increases.

Let there be m games total, each with n tosses. The "k" with an index is the difference between the number of heads and tails in each game, and consequently the sum of tosses in that game.

The average sum across all games, as we know, tends to zero as the number of games increases, so we can safely consider it zero when calculating the standard deviation, assuming the number of games is arbitrarily large.

However, if we keep the same number of games but change the number of coin tosses per game, we find that the denominator here is a constant (that constant number of games), while the numerator contains a sum of positive numbers (since the square of any number is positive). But the more tosses in a game, the further from zero their sum can be in each specific game. That is, as the number of tosses per game increases, the numerator increases on average. Moreover, it increases without bound.

This can be imagined as follows. Suppose there are only ten tosses in a game. We play a hundred games, but the sum of ten tosses cannot be greater than ten or less than minus ten. So the results of all hundred games will fall within the range from minus ten to ten.

Now we start playing games with twenty tosses. It is clear that in some of them, the result might be greater than ten or less than minus ten. However, we played exactly the same hundred games as last time, and if some number of games fell outside the previous range by their results, then fewer games fell inside that range than before.\

And this will happen each time we increase the number of tosses in each game: the numerator will increase on average.

Therefore — since the denominator is unchanged and the numerator grows without bound — the fraction itself, i.e., the standard deviation characterizing the width of the "bell" of sums won or lost in each game, also grows without bound.

In other words, the standard deviation of the sum of tosses grows without bound as the number of these tosses increases, despite the fact that the standard deviation of a single toss tends to one during this time.

Here we observe two different mathematical expectations — of a single toss and of the sum of tosses — and two different corresponding standard deviations.

The expectation of both a single toss and the sum of tosses tends to zero as the number of tosses per game increases, however, the standard deviation of a single toss tends to one, while that of the sum of tosses (i.e., the amount won in the game) tends to infinity.

This is precisely what leads to the fact that the more tosses, the more often we will observe a non-zero sum in each specific game.

Because these are often referred to by the same words, a lot of confusion is introduced into reasoning. Additionally, the mathematical side of what is happening most likely does not coincide with what intuition suggests. This leads to erroneous judgments and even a feeling of many paradoxes in this process.

However, a correct understanding of probability theory eliminates these "paradoxes."

And, importantly, it allows us to draw interesting conclusions.

For example, you could open...

Coin Toss Game Courses

Yes, they would be a scam. But let's look at why such a scam works.

Suppose you actually open such courses. Ten thousand people come to you and pay you one dollar each. In return, you promise them a path to enrichment, but in reality, you just feed them nonsense about believing in themselves, cosmic energies, mental unity with the coin, deep study of the coin market, and the like.

After the courses, most will probably try to get rich and play some number of games. Let's say a thousand each, as in the example in the previous section.

As we already know, only about 270 of them will end up with their original amount. They will be favorably contrasted by 4,865 winners — a pretty good result, because not every course leads to success for every second student. Moreover, about 4,600 of them will even recoup their one-dollar tuition.

These are the ones you will cite when talking to the 270 who broke even and the 4,865 who lost. Especially since not all of them will come to complain.

Those who do come will see success stories of just under half of the course, whom you will call "diligent students." Many will feel ashamed at this point and drop their claims.

To those who persist, you will offer to play again. And, suddenly, again just under half of them will win. Some might even manage to win back what they lost last time. However, even for many of those who don't win it back, the very fact of winning will seem quite convincing — what if they just used your recommendations incorrectly last time?

And finally, to a handful of the most stubborn, you will refund their one-dollar tuition.

Heck, you could even refund that dollar to everyone who lost — you would still have about 5,000 dollars left.

Obtained for nothing. For completely meaningless advice.

Note, by the way: only a handful will win more than 100 dollars, although each of them will be sure that they didn't waste their dollar on tuition — look how well it paid off, a hundred times over. The rest will show more modest results. And just under half will lose overall.

And towering proudly above all of them will be one person: the one who didn't play coin toss at all but received — depending on the refund policy — between 5,000 and 10,000 dollars for teaching all these people.

We could consider more complex options. For example, courses on teaching stock trading. But perhaps we'll leave that as food for independent thought.

Paired Coin Toss

Now let's modify the game process a bit. This time, let's have 10,000 people, each with 10,000 dollars. They are randomly divided into pairs, and each pair randomly chooses a bet for that round — from zero to 100 dollars. Then they flip a coin, and depending on the outcome, one of them pays the other that bet.

Then they are randomly divided into pairs again and repeat the process. And so on, ten thousand times.

So, as a result, we have ten thousand pairwise games between random partners with a random (though limited) bet.

If you hadn't been prepared by the previous text, you would probably think that after such a process, everyone would end up roughly with their original amount. Although... Maybe some still think that's how it would be?

But, of course, no, it won't be like that at all. As before, the number of people ending up with their original amount will likely be greater than the number ending up with any other specific sum, but the vast majority of players will not end up with their original amount at all.

In this case, however, I will present the results in the form of a histogram, where each bar represents the number of people with a sum within a certain range (the width of each bar here covers a range of 2,000 dollars).

As we can see on the histogram, some people have gone into negative territory — apparently, they owe a bunch of money to other players and will now have to pay them back with interest. On the other hand, there are some players whose wealth has exceeded 30,000 dollars — starting from an initial 10,000.

And, what is particularly interesting, here we know for sure that this entire game is the result of pure chance. The players' personal skills definitely did not affect the outcome. And yet, the outcome shows us a number of lucky ones.

But wait, what if...

Competition Rewards the Smart and Talented

Yes-yes, if we rephrase the previously considered process, we can draw completely different conclusions.

We have a society of, say, 10,000 people. They produce something and enter into completely voluntary exchange transactions with each other. However, some are more talented, smarter, harder-working, and so on, and conditions are constantly changing (some adapt better, some worse), so each transaction might turn out slightly in favor of one of its participants.

But a fair system doesn't allow cheating, so all random fluctuations should cancel out. If two "economic agents" are equally smart, talented, etc., if they produce things equally useful to others, then all these "sometimes won a little — sometimes lost a little" should sum up to an equal gain or loss for them.

And if someone has risen to the top, it's only because they are exceptionally smart and exceptionally useful to society. And those who went bankrupt and got into debt are just slackers and lazy people.

As proof that things are exactly like this in this system, one could even cite the histogram of money distribution from the previous section.

True, one would have to slightly omit the fact that it was obtained as a result of a completely random process. In which there were certainly no smart, no talented, and not even any product useful to society, except for the entertainment effect of the coin-toss game itself.

But still, the histogram fits perfectly into this beautiful theory: there are the average folks, who are the majority; there are the especially talented — they are few, and they received their reward — of course, strictly for their talent; and there is a slightly larger number of ham-fisted slackers than talented ones, who ended up with nothing (yes, we just lumped all the bankrupts together).

Only one thing is slightly troubling: for a similar distribution, the pure statistical regularity alone is sufficient — arising from completely random fluctuations in each transaction between two agents, independent of their personal qualities. Because of this regularity, histograms will still show what can later be interpreted as a "reward for talent" or a "penalty for laziness." That is, generally speaking, to draw real conclusions about intelligence, hard work, stupidity, and laziness, and about whether a given system "rewards" or "penalizes" specifically for them, the obtained statistics should first be cleaned of the effect caused by the purely random nature of the process itself.

But, fortunately, there is a second option that greatly saves effort and eliminates inconvenient questions: as noted above, one can simply "forget" to mention the purely statistical effect. Just slightly ignore its existence. Well, so that it doesn't spoil such a beautiful and pleasant theory about the fairness of the system described in the model.

But histograms aren't like that!

Perhaps some particularly attentive readers still did not take the author's word for it (which the author wholeheartedly approves of) and went to look at histograms of wealth and income distribution in all sorts of fair societies.

And as a result, they discovered that on these histograms, the top of the "bell" seems to have been hit from the right, causing it to crumple to the left. That is, the distribution seems not quite the same as here, which quite naturally hints to us that the process considered here is not quite accurately described by the model presented.

Something else is clearly influencing. But what? Could it be that very intelligence and stupidity, laziness and hard work?

I have an answer to this question as well. But we'll talk about it a little later.