If you now put a detector in path A, it will find a photon with probability $(1/\sqrt{2})^2 = 1/2$, and same for path B. This means that there is a 50% chance of the configuration |photon in path A only>, and 50% chance of the configuration |photon in path B only>. The arrow direction still has no effect on the probability.
Isn't this kind of assertion implicitly taking a pretty strong stance on a particular philosophical interpretation?
We have some observations (counts of how many times each detector went off in past experiments), and a theory which explains those observations in terms of complex amplitudes and their magnitudes. A more agnostic stance would be to just say that the photon in the experiment is the amplitudes of the two configuration states, and the relative magnitudes of each state in this particular experiment are equal.
Combining the observations and the underlying theory to assign probabilities to future observations, or even talking about past observations in terms of probabilities, introduces a whole bunch of thorny philosophical questions: Bayesian vs. frequentist interpretations, what a "future" even is, and subjective experience.
But you can ignore all of those issues and treat the observation-counts as measuring a magnitude, and then use the theory to infer the underlying complex amplitudes, without ever having to talk about the ratios of the magnitudes as representing anything in particular, e.g. a probability of something happening (whatever it means for something to happen, anyway).
I personally think some or all of those philosophical questions have relatively straightforward answers, but the point is you don't actually need to resolve them to understand the parts of QM introduced here, if you're careful not to talk about the probabilities implied by your experimental observations as anything more than an afterthought.
You do sometimes need to introduce probabilities if you want to predict future observations (rather than just accepting the observation-counts as brute facts) for specific real experiments in our actual world, but once you're doing experimentation and prediction in the real world (rather than just talking about thought experiments that represent different ways a world logically-possibly could be), you unavoidably have to deal with a bunch of philosophical questions anyway, mostly unrelated to QM itself.
Overall, I think this is a nice presentation of the basic concepts and math, and the diagrams in particular are a lot clearer than in the original.
But your prognostications about the intellectual "sinfulness" and "unforgivability" of mistakes in Eliezer's original presentation are kind of weird and uncharitable.
For one, Eliezer offered his own disclaimer 16 years ago in a comment:
I think some of my readers may be overestimating the degree to which I intend to explain quantum mechanics, here. I'm not doing a textbook. I'm trying to get (reasonably smart nonphysicist) readers to the point where they're no longer confused, and the remaining difficulties are mere matters of math.
For two, your specific claims about the likely confusion that Eliezer's presentation could induce in "laymen" are empirically falsified to some degree by the comments on the original post: in at least one case, a reader noticed the issue and managed to correct for it when they made up their own toy example, and the first comment to explicitly mention the missing unitarity constraint was left over 10 years ago. ^{[1]}
Finally, I think the actual QM concepts here are sufficiently basic that I expect both you and Eliezer (and many LW readers) would get the right answers on a test consisting of questions about variations on these toy experiments (i.e. I predict that Eliezer, now or in 2008, would not have failed to account for unitarity when it mattered to the final answer).
So the relevant expertise for deciding whether a particular explanation has a "deep flaw" or which context and details are "important", isn't expertise in QM (since a solid grasp of the basics likely suffices for getting the right answers on test questions related to the toy problems presented here), but rather depends mostly on judgement and expertise related to pedagogy and technical explanation in general.
I think your presentation is a better and clearer presentation of the basic math and of our actual physical reality, especially for passive readers. (For active readers, e.g. GreedyAlgorithm, there are probably some minor pedagogical advantages to having to find flaws and work out the details on your own, which is a well-known skill for building up durable and true understanding.)
I think Eliezer's presentation is clearer as an explanation of how and why QM-like theories are meaningful in the first place, and a good gears-level explanation of the kinds of predictions they make, qualitatively if not always quantitatively. It also lays a foundation for the philosophical points Eliezer makes in future posts, posts which are IMO much clearer and more correct treatments of the deeper philosophical issues than any other physicist or philosopher has ever written up.
Incidentally, that commenter suggests pretty much exactly the correction you make, which is to just replace the rule with a more physically-accurate one, without going into the details of why. But the commenter manages to avoid attacking the omission as some kind of sinful, irresponsible flaw and just notes that a different pedagogical choice might make the presentation more clear.
For two, your specific claims about the likely confusion that Eliezer's presentation could induce in "laymen" are empirically falsified to some degree by the comments on the original post: in at least one case, a reader noticed the issue and managed to correct for it when they made up their own toy example, and the first comment to explicitly mention the missing unitarity constraint was left over 10 years ago.
Some readers figuring out what's going on is consistent with many of them being unnecessarily confused.
Apologies for the late reply, but thank you for your detailed response.
Responding to your objection to my passage, I disagree, but I may edit it slightly to be clearer.
I was simply trying to point out the empirical fact that if you put a detector in path A and a detector in path B, and repeat the experiment a bunch of times, you will find the photon in detector A 50% of the time, and the photon in detector B 50% of the time. If the amplitudes had different values, you would empirically find them in different proportions, as given by the squared amplitudes.
I don't find these probabilities to be an "afterthought". This is the whole point of the theory, and the reason we consider quantum physics to be "true". We never see these amplitudes directly, we infer them from the fact that they give correct probabilities via the Born rule. Or more specifically, this is the formula that works. That this formula works is an empirical fact, all the interpretations and debate are a question of why this formula works.
Regarding the defense of the original sequence, I'm sorry, but incorrect math is incorrect math. The people who figured out the mistake in the comments figured it out from other sources. If anything, it is even more damning that people pointed the mistake out 10 years ago, and it still hasn't been fixed. For every person who figured out the problem or sifted through hundreds of comments to figure out the issue, there are dozens more who accepted the incorrect framework, or decided they were too dumb to understand the math when it was the author who was wrong.
My problem is that the post is misinforming people. I will make no apology for being harsh about that.
I will reserve my opinion on Eliezer's other quantum posts for a future post when I tackle the overstated case for many worlds theories.
We never see these amplitudes directly, we infer them from the fact that they give correct probabilities via the Born rule. Or more specifically, this is the formula that works. That this formula works is an empirical fact, all the interpretations and debate are a question of why this formula works.
Sure, but inferring underlying facts and models from observations is how inference in general works; it's not specific to quantum mechanics. Probability is in the Mind, even when those probabilities come from applying the Born rule.
Analogously, you could talk about various physical properties of a coin and mechanics of a flip, but synthesizing those properties into a hypothesized Coin Rule involves translating from physical properties inherent in the system itself, to facts which are necessarily entangled with your own map. This is true even if you have no way of measuring the physical properties themselves (even in principle) except by flipping the coin and using the Coin Rule to infer them back.
I'm a little confused by what your objection is. I'm not trying to stake out an interpretation here, I'm describing the calculation process that allows you to make predictions about quantum systems. The ontology of the wavefunction is a matter of heated debate, I am undecided on it myself.
Would you object to the following modification:
If you now put a detector in path A, it will find a photon with probability $(1/\sqrt{2})^2 = 1/2$, and same for path B. If you repeated this experiment a very large number of times, the results would converge to finding it 50% of the time in the configuration |photon in path A only>, and 50% of the time in the configuration |photon in path B only>. The arrow direction still has no effect on the probability.
I mildly object to the phrase "it will find a photon". In my own terms, I would say that you will observe the detector going off 50% of the time (with no need to clarify what that means in terms of the limit of a large # of experiments), but the photon itself is the complex amplitudes of each configuration state, which are the same every time you run the experiment.
Note that I myself am taking a pretty strong stance on the ontology question, which you might object to or be uncertain about.
My larger point is that if you (or other readers of this post) don't see the distinction between my phrasing and yours, or don't realize that you are implicitly leaning on a particular interpretation (whether you're trying to do so or not), I worry that you are possibly confused about something rather than undecided.
I actually don't think this is a huge deal either way for a presentation that is focused on the basic mechanics and math. But I preregister some skepticism of your forthcoming post about the "overstated case for many worlds theories".
I am assuming you are referring to the many worlds interpretation of quantum mechanics, where superpositions extend up to the human level, and the alternative configurations correspond to real, physical worlds with different versions of you that see different results on the detector.
Which is puzzling, because then why would you object to "the detector finding a photon"? The whole point of the theory is that detectors and humans are treated the same way. In one world, the detector finds the photon, and then spits out a result, and then one You sees the result, and in a different world, the detector finds the photon, spits out the other result, and a different result is seen. There is no difference between "you" and "it" here.
As for the photon "being" the complex amplitudes... That doesn't sound right to me. Would you say that "you" are the complex amplitudes assigned to world 1 and world 2? It seems more accurate to say that there are two yous, in two different worlds (or many more).
Assuming you are a many worlder, may I ask which solution to the Born probabilities you favour?
I'm a many-worlder, yes. But my objection to "finding a photon" is actually that it is an insufficiently reductive treatment of wave-particle duality - a photon can sometimes behave like a little billiard ball, and sometimes like a wave. But that doesn't mean photons themselves are sometimes waves and sometimes particles - the only thing that a photon can be that exhibits those different behaviors in different contexts is the complex amplitudes themselves.
The whole point of the theory is that detectors and humans are treated the same way. In one world, the detector finds the photon, and then spits out a result, and then one You sees the result, and in a different world, the detector finds the photon, spits out the other result, and a different result is seen. There is no difference between "you" and "it" here.
Yep! But I think treating the notion of a "you" at this level of reductiveness would actually be overly reductive and distracting in this context. (Picky, aren't I?)
Would you say that "you" are the complex amplitudes assigned to world 1 and world 2? It seems more accurate to say that there are two yous, in two different worlds (or many more).
I would say that there are two people in two different worlds, but they're both (almost entirely) me.
It often makes sense to talk about non-ontologically-basic concepts like a photon-as-a-little-billiard-ball, and a person-in-a-single-Everett-branch as meaningful things. But the true notion of both a "me" and a "photon" requires drawing the conceptual boundaries around the complex amplitudes assigned to multiple worlds.
What part of "finding a photon" implies that the photon is a billiard ball? Wave-particle duality aside, a photon is a quantum of energy: the detector either finds that packet or it doesn't (or in many worlds, one branched detector finds it and the other branched detector doesn't).
I'm interested to hear more about how you interpret the "realness" of different branches. Say there is an electron in one of my pinky fingers that is in a superposition of spin up and spin down. Are there correspondingly two me's, one with pinky electron up and one with pinky electron down? Or is there a single me, described by the superposition of pinky electrons?
If the photon were only a quantum of energy which is entirely absorbed by the detector that actually fires, how could it have any causal effects (e.g. destructive interference) on the pathway where it isn't detected?
OTOH, if your definition of "quanta of energy" includes the complex amplitude in the unmeasured path, then I think it's more accurate to say that the detector finds or measures a component of the photon, rather than that it detects the photon itself. Why should the unmeasured component be any less real or less part of the photon than the measured part?
Say there is an electron in one of my pinky fingers that is in a superposition of spin up and spin down. Are there correspondingly two me's, one with pinky electron up and one with pinky electron down? Or is there a single me, described by the superposition of pinky electrons?
If there were a higher-dimensional being simulating a quantum universe, they could treat the up-electron and down-electron people as distinct and do different things to them (perhaps ones which violate the previous rules of the simulation).
But I think your own concept of yourself (for the purposes of making predictions about future observations, making decisions, reasoning about morality or philosophy, etc.) should be drawn such that it includes both versions (and many other closely-related ones) as a single entity.
Okay, let me break it down in terms of actual states, and this time, let's add in the actual detection mechanism, say an electron in a potential well. Say the detector is in the ground state, with energy E=0, and the absorption of a photon will bump it up to the next highest state, E=1. We will place this detector in path A, but no detector in path B.
At time t = 0, our toy wavefunction is:
1/sqrt2 |photon in path A, detector E=0> + 1/sqrt2 |photon in path B, detector E=0>
If the photon in A collides with the detector at time t =1, then at time t=2, our evolved wavefunction is:
1/sqrt2 |no free photon, detector E=1> + 1/sqrt2 |photon in path B, detector E=0>
Within the context of world A, a photon was found by the detector. This is a completely normal way to think and talk about this.
I think it's straight up wrong to say "the photon is in the detector and in path B". Nature doesn't label photons, and it doesn't distinguish between them. And what is actually in world A is an electron in a higher energy state: it would be weird to say it "contains" a photon inside of it.
Quantum mechanics does not keep track of individual objects, it keeps track of configurations of possible worlds, and assigns amplitudes to each possible way of arranging everything.
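As a rough illustration of the bookkeeping in the two toy wavefunctions above, here is a minimal Python sketch. The dictionary layout and the helper names `total_probability` and `detector_fired_probability` are assumptions made for this example, not standard notation; the amplitudes are the ones written out for t = 0 and t = 2.

```python
import math

a = 1 / math.sqrt(2)

# Keys are whole configurations (photon state, detector state);
# values are the amplitudes attached to them.
psi_t0 = {
    ("photon in path A", "detector E=0"): a,
    ("photon in path B", "detector E=0"): a,
}
psi_t2 = {
    ("no free photon", "detector E=1"): a,
    ("photon in path B", "detector E=0"): a,
}

def total_probability(psi):
    """Sum of squared amplitude lengths over all configurations."""
    return sum(abs(amp) ** 2 for amp in psi.values())

def detector_fired_probability(psi):
    """Squared amplitude length summed over configurations with detector E=1."""
    return sum(
        abs(amp) ** 2
        for (_, detector), amp in psi.items()
        if detector == "detector E=1"
    )

print(total_probability(psi_t0), total_probability(psi_t2))
print(detector_fired_probability(psi_t0), detector_fired_probability(psi_t2))
```

Note that the amplitudes are attached to entire configurations, not to a labelled photon: nothing in either dictionary tracks "the photon" as an individual object.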
Here's a crude Google Drawing of t = 0 to illustrate what I mean:
Both the concept of a photon and the concept of a world are abstractions on top of what is ultimately just a big pile of complex amplitudes; illusory in some sense.
I agree that talking in terms of many worlds ("within the context of world A...") is normal and natural. But sometimes it makes sense to refer to and name concepts which span across multiple (conceptual) worlds.
I'm not claiming the conceptual boundaries I've drawn or terminology I've used in the diagram above are standard or objective or the most natural or anything like that. But I still think introducing probabilities and using terminology like "if you now put a detector in path A , it will find a photon with probability 0.5" is blurring these concepts together somewhat, in part by placing too much emphasis on the Born probabilities as fundamental / central.
Nice graph!
But as a test, may I ask what you think the x-axis of the graph you drew is? Ie: what are the amplitudes attached to?
I'm not claiming the conceptual boundaries I've drawn or terminology I've used in the diagram above are standard or objective or the most natural or anything like that. But I still think introducing probabilities and using terminology like "if you now put a detector in path A , it will find a photon with probability 0.5" is blurring these concepts together somewhat, in part by placing too much emphasis on the Born probabilities as fundamental / central.
I think you've already agreed (or at least not objected to) saying that the detector "found the photon" is fine within the context of world A. I assume you don't object to me saying that I will find the detector flashing with probability 0.5. And I assume you don't think me and the detector should be treated differently. So I don't think there's any actual objection left here, you just seem vaguely annoyed that I mentioned the empirical fact that amplitudes can be linked to probabilities of outcomes. I'm not gonna apologise for that.
But as a test, may I ask what you think the x-axis of the graph you drew is? Ie: what are the amplitudes attached to?
Position, but it's not meant to be an actual graph of a wavefunction pdf; just a way to depict how the concepts can be sliced up in a way I can actually draw in 2 dimensions.
If you do treat it as a pdf over position, a more accurate way to depict the "world" concept might be as a line which connects points on the diagram for each time step. So for a fixed time step, a world is a single point on the diagram, representing a sample from the pdf defined by the wavefunction at that time.
"position" is nearly right. The more correct answer would be "position of one photon".
If you had two electrons, say, you would have to consider their joint configuration. For example, one possible wavefunction would look like the following, where the blobs represent high amplitude areas:
This is still only one dimensional: the two electrons are at different points along a line. I've entangled them, so if electron 1 is at position P, electron 2 can't be.
Now, try and point me to where electron 1 is on the graph above.
You see, I'm not graphing electrons here, and neither were you. I'm graphing the wavefunction. This is where your phrasing seems a little weird: you say the electron is the collection of amplitudes you circled: but those amplitudes are attached to configurations saying "the electron is at position x1" or "the electron is at position x2". It seems circular to me. Why not describe that lump as "a collection of worlds where the electron is in a similar place"?
If you have N electrons in a 3d space, the wavefunction is not a vector in 3d space (god I wish, it would make my job a lot easier). It's a vector in 3N+1 dimensions, like the following:
where r1, r2, etc. point to the locations of electron 1, 2, 3, etc., and each possible configuration of electron 1 here, electron 2 there, etc., has an amplitude attached, with configurations that are encountered more often in experiments having larger-magnitude amplitudes.
::starts reading::
The amplitude of the configuration |a photon is heading to the right> is then given by:
e^iθ(x,t)
Okay, I'm already confused. What are θ, x, and t in this context? (I already know about complex exponentiation.) Am I supposed to calculate θ given x and t?
::reads some more::
If our psi is e^iθ,
Wtf is a psi? Pounds per square inch? You haven't used this term anywhere. (I know from college physics that psi = the wavefunction, but *you* haven't told us what you mean yet.)
::to be continued::
Setting aside most problems with the original, I've always found this interferometer example an unsatisfying introduction because it's surprisingly ambiguous exactly what's quantum mechanical here or what's special about quantum mechanics.
You have superposition and interference in classical electromagnetism. That's enough for everything until you get to the two-photon experiment (that is, for everything in "Configurations and Amplitude"). Single photons and photon counters are posited, but these are taken as given where I would sooner take as given the idea that a solution to a wave equation can be associated with a complex amplitude. Otherwise, up to that point one might as well have been talking about electromagnetic pulses and intensity detectors.
So is the interference between many-photon states the key? ("Joint Configurations"?) Not exactly. If you have classical light sources, then both quantum and classical theories give the same answer. Worse than that, really—for the quantum description, you have to posit that photons from different sources can be indistinguishable for the purposes of interference between many-photon states ("Distinct Configurations", although it's extra unclear about that), whereas that comes naturally if you're just talking about electromagnetic fields as usual.
It feels somehow unfashionable in an age of quantum information to talk about wave-particle duality as the central surprise in quantum mechanics, but I think it's right to zero in on the idea that you get superposition and interference in systems where otherwise-successful analogies from everyday experience don't allow that.
Maybe it's because my perspective is "electromagnetism-first". From that direction, you'd introduce quantum mechanics by establishing the need for photons with things like the photoelectric effect. I suppose if you're coming from the perspective that photons are only real, discrete, individual particles, then all this build-up for interferometry might make sense—did you know light can act like a wave, too? But then I think electron diffraction or spin polarization is more straightforward and doesn't risk hammering on things that are totally fine classically.
(A marginally related suggestion—the diagrams of MZIs with lasers are going to be a little misleading for talking about experiments with photon number states, because lasers are not single photon sources. Maybe I'd be clearer about the difference between the diagram and situation in the text or just modify the figure.)
I think this post could be really good, and perhaps there should be an effort to make this post as good as it can be. Right now I think it has a number of issues.
It's too short. It moves very quickly past the important technical details, trusting the user to pick them up. I think it would be better if it was a bit longer and luxuriated on the important technical bits.
It is very physics-brained. Ideally we could get some math-literate non-physicists to go over this with help from a physicist to do a better job phrasing it in ways that are familiar to non-physicists.
It should be published somewhere without part 2. Part 2 is intracommunity discourse, Part 1 is a great explainer, and I'd love to be able to link to it without part 2 as a consideration.
Many of the sequences posts on quantum physics (e.g., the ones that contain the string "Ebborian") are missing from the 2015 book Rationality: From AI to Zombies, but the 3 posts (“configurations and amplitudes”, “Joint configurations”, and “distinct configurations”) under discussion are present.
Confidence level: I am a physicist with a PhD in computational quantum chemistry. I’m pretty sure there are no major errors here, but I still may have missed something. Thanks to Blake Stacey for looking over the post.
The goal of this post is to rewrite everything in the three sequences posts “configurations and amplitudes”, “Joint configurations”, and “distinct configurations”. I think they were noble attempts to explain a very difficult subject. Unfortunately, all of them are based on a mathematically incorrect description of their subject matter.
The first part of this post is a new introduction to the basics of quantum physics, using a basic interferometer setup as our guide. It is aimed at the dedicated layman level: There will be math, but it will be generally basic. The main aim is for you to get a feel for how things actually operate at the quantum level. It should cover all the material in those three posts above, but with corrected math and more useful explanations.
Only then, in the second part, will I explain why the original sequences are incorrect and deeply flawed. If you are just here for the drama, you can skip to the section “why the sequences posts are incorrect”.
Part 1: The quantum world
Introduction to amplitudes
At the smallest level, the universe is run by the rules of quantum physics, which are fundamentally unlike anything we encounter in our everyday life. A light photon is not a billiard ball, or a wave of water. It’s a third thing which shares characteristics of both or neither, and it acts in a new way, according to new rules.
The quantum world is described by a “wavefunction”, that allows us to calculate how likely any configuration of physical properties is to occur. Each possible configuration is assigned a “probability amplitude”, which is a complex number that is attached to that configuration of our system.
When I talk about a configuration, I will place it in brackets^{[1]} |like this>. For example, in the Schrodinger cat scenario, we could assign an amplitude of 0.1 + 0.2i to the configuration |cat is dead>, and an amplitude of -0.9 + 0.37i to the configuration |cat is alive>. Obviously, these amplitudes themselves are not probabilities (you can’t have a 0.6i probability of something happening), but they can be converted into probabilities. The actual probability of finding each configuration when measured is given by the absolute value of the amplitude squared (our cat above has a ~95% survival rate).
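As a quick check of the numbers in the cat example, here is a minimal Python sketch of the amplitude-to-probability rule. The names `amplitudes` and `probability` are illustrative choices for this example only.

```python
# Amplitudes from the cat example (in Python, `j` plays the role of i).
amplitudes = {
    "cat is dead": 0.1 + 0.2j,
    "cat is alive": -0.9 + 0.37j,
}

def probability(amplitude: complex) -> float:
    """Probability = (absolute value of the amplitude) squared."""
    return abs(amplitude) ** 2

probabilities = {config: probability(a) for config, a in amplitudes.items()}
for config, p in probabilities.items():
    print(f"|{config}>: {p:.4f}")
print("total:", round(sum(probabilities.values()), 4))
```

The two probabilities come out at about 0.05 and 0.947. The total is very close to, but not exactly, 1 because the example amplitudes are rounded to two decimal places.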
To better understand these probability amplitudes, let’s pretend there’s only one light photon in the world, travelling to the right. The amplitude of the configuration |a photon is heading to the right> is then given by^{[2]}:
$$e^{i\theta(x,t)}$$
If you didn’t do advanced math in high school, you might be scared of the whole concept of exponentiating an imaginary number, but it’s actually fairly simple.
You can plot the real and imaginary parts of a complex number as an arrow on a 2d graph of real vs imaginary components, like the one below. What $e^{i\theta}$ does is draw an arrow on this graph that traces a circle. To calculate its value, you take an arrow of length 1 pointing right (the number 1 + 0i), and then rotate it counterclockwise by an angle of θ. You can then read the new number off the graph. (Fortunately, we will avoid the need for trigonometry in this article.)
You might have heard the famous equation, $e^{i\pi} = -1$. In this formulation, what this means is that when you rotate the arrow by an angle of π radians (we are using radians here, where π = 180 degrees or half a circle), the arrow points to -1 + 0i = -1. In fact, no matter where the arrow is pointing, adding half a circle of rotation will make it point in the opposite direction, which is the same as multiplying it by -1.
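The rotating-arrow picture can be tried out directly with Python's standard `cmath` module. This is only a sketch: the helper name `arrow` and the starting angle 0.7 are arbitrary choices made for illustration.

```python
import cmath
import math

def arrow(theta: float) -> complex:
    """e^(i*theta): a length-1 arrow rotated counterclockwise by theta radians."""
    return cmath.exp(1j * theta)

half_turn = arrow(math.pi)         # approximately -1 + 0i
quarter_turn = arrow(math.pi / 2)  # approximately 0 + 1i
print(half_turn, quarter_turn)

# Adding half a circle of rotation is the same as multiplying by -1.
theta = 0.7  # arbitrary starting angle, in radians
flipped = arrow(theta + math.pi)
print(cmath.isclose(flipped, -arrow(theta)))

# The length of the arrow stays (approximately) 1 for any angle.
print(abs(arrow(theta)))
```

Floating-point rounding means the printed values are only approximately -1 and i, which is why the comparison uses `cmath.isclose` rather than `==`.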
In our photon, the angle θ changes as the photon moves. The effect will be that after a certain distance, the arrow will have rotated around a bit and point in a new direction, leading to a new amplitude. This is why, in the following discussion, it’s very important that the total distance travelled by each light beam is exactly the same^{[3]}. The rotation of the arrow by a certain angle is called a “phase shift”.
Earlier I said that the way to calculate probabilities from amplitudes was to take the absolute value squared. We can think of this as taking the length of the arrow, and squaring it, to get the likelihood of the particular state.
If our psi is $e^{i\theta}$, this is very easy, because the length of this arrow is always 1, so our probability of finding the photon is $1^2 = 1 = 100\%$. So far so easy.
The interferometer
One photon on its own isn’t particularly interesting, so let’s introduce an experimental setup called an “interferometer”. It’s a system where a beam of light is split into two sections and then recombined with a series of mirrors. Only in this case, we are only firing a single photon.
To split the photon, we use a device called the “half-silver mirror”. The actual formal treatment of beam splitters requires a few math tools that aren’t worth explaining, and different beam splitters can have different setups, but for one photon on a simple half-silver mirror, the rule goes as follows:
When a half mirror is hit by part of a photon, the incoming beam is split into two separate configurations. One is “transmitted”, continuing forward, and one is “reflected”, changing direction. The amplitudes of both configurations are scaled down from the incoming amplitude by multiplying them by $1/\sqrt{2}$ (around 0.71).
If (and only if) a beam has been reflected from the front side of a half-mirror, it also undergoes a phase shift, and is multiplied again by -1. This can be thought of as rotating our amplitude arrow by an extra half circle, or by flipping it to face the opposite direction. If it’s transmitted, or reflected from the back of a half mirror, no phase shift occurs. The reason it only happens from the front is due to the rules about transferring between different dielectric materials, and this phase shift rule is required for conservation of energy.
So the probability amplitude of the photon now has two components, describing the two possible configurations of |photon is on path A> or |photon is on path B>. Each configuration has an amplitude arrow attached, and the two arrows point in opposite directions. Both amplitudes have length $1/\sqrt{2}$. From now on, we’ll just keep track of relative phases, assuming they have gone the same distance. This also means we will not have to think about imaginary numbers for the rest of this article. So if the arrow was pointing up when it hit the mirror, then it’s pointing down in path A, but up in path B.
If you now put a detector in path A, it will find a photon with probability $(1/\sqrt{2})^2 = 1/2$, and same for path B. This means that there is a 50% chance of the configuration |photon in path A only>, and 50% chance of the configuration |photon in path B only>. The arrow direction still has no effect on the probability.
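The half-mirror rule and the resulting 50/50 split can be sketched in a few lines of Python. The function name `half_mirror` and its `hits_front` flag are hypothetical names for this example, and representing "up" as +1 is an arbitrary convention. Following the description above, the reflected (flipped) arrow is taken to be path A and the transmitted arrow path B.

```python
import math
from typing import Tuple

SCALE = 1 / math.sqrt(2)  # both outputs of a half mirror are scaled by 1/sqrt(2)

def half_mirror(amplitude: complex, hits_front: bool) -> Tuple[complex, complex]:
    """Apply the half-mirror rule from the text.

    Returns (transmitted, reflected). Only a reflection off the front
    side picks up the extra factor of -1.
    """
    transmitted = amplitude * SCALE
    reflected = amplitude * SCALE * (-1 if hits_front else 1)
    return transmitted, reflected

incoming = 1 + 0j  # arrow of length 1, "pointing up" by convention
path_b, path_a = half_mirror(incoming, hits_front=True)

p_a = abs(path_a) ** 2
p_b = abs(path_b) ** 2
print(path_a, path_b)  # opposite directions, same length
print(p_a, p_b)        # both approximately 0.5
```

Flipping the sign of `incoming` flips both output arrows but leaves `p_a` and `p_b` unchanged, which is the sense in which the arrow direction has no effect on the probability at this stage.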
We call this a “superposition” of different configurations, because if you look, you will never find half a photon in either path. It will always be like the whole photon went this way or that way. This is why the photon cannot be only a wave: if it was, then you would be able to find “half photons”.
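To make the "never half a photon" point concrete, here is a small Python sketch that samples detection outcomes from the two amplitudes. It only reproduces the statistics that the squared amplitudes predict; it is not a simulation of the underlying physics, and the seed and trial count are arbitrary choices.

```python
import math
import random

# Amplitudes after the first half mirror (path A flipped, path B not).
amplitudes = {"path A": -1 / math.sqrt(2), "path B": 1 / math.sqrt(2)}
probabilities = {path: abs(a) ** 2 for path, a in amplitudes.items()}

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
counts = {path: 0 for path in amplitudes}
for _ in range(trials):
    # Each run gives exactly one whole detection: path A or path B.
    outcome = rng.choices(
        list(probabilities), weights=list(probabilities.values())
    )[0]
    counts[outcome] += 1

fraction_a = counts["path A"] / trials
print(counts, fraction_a)
```

Every trial adds 1 to exactly one counter, so the counts always sum to the number of trials, while the fraction found in path A settles near 0.5 as the number of trials grows.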
Next, both beams bounce off full mirrors and change direction. Since no splitting occurs, no scaling occurs either, but they do both undergo another phase shift of half a circle (multiply by -1). This doesn’t affect anything. After all, both beams are constantly changing arrow direction; what matters is their relative phase. At this point, when they have travelled equal distance, they are still pointing in opposite directions.
Now, each of the beams hits the next half mirror, and things get interesting.
First, each of the 2 beams is split again into 2, so we now have 4 new amplitudes to think about, for all the different paths of the light. The length of each will be 1/√2 times 1/√2 = 1/2.
At last, we get to the relevance of the arrow directions! If we say the path A amplitude is pointing up (at the point of mirror launch), then the path B amplitude is pointing down.
The path A beam (arrow pointing up) hits the back of the half-mirror (no phase shift), so it transmits one arrow into detector 1, pointing up, and reflects one arrow to detector 2, pointing up.
The path B beam (arrow pointing down) hits the front of the half mirror, so it transmits one arrow into detector 2, pointing down, and reflects one arrow to detector 1, which undergoes a phase shift of -1 and points up.
Now we get to the crux. Nature does not label photons, and does not care about their history. So, while bits of the photon have taken different paths, with different amplitudes, there are only two possible final configurations: Either there is a photon at detector 1 or a photon at detector 2. If two different paths lead to the same configuration, to figure out the amplitude of that configuration, you just add the amplitude of the contributing paths.
In detector 2, we get an up arrow and a down arrow, each of length 1/2. To get the amplitude of the resulting configuration |photon went into detector 2>, we add them together. But since one is the negative of the other, they cancel out (“destructively interfere”), and add to 0.
In detector 1, we get two up arrows of length 1/2. To get the amplitude of the resulting configuration |photon went into detector 1>, they add together (constructively interfere), to produce a final amplitude arrow pointing up with length 1.
What is the probability of finding the photon in each detector? We square the length of the arrows, and get 1² = 1 in detector 1, and 0² = 0 in detector 2. So the result is that there is a 100% chance of finding the photon in detector 1, and no chance in detector 2.^{[4]}
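The whole interferometer calculation can be sketched in a few lines of Python. This follows the transmit and reflect assignments given for the path A and path B beams at the second half mirror; the variable names and the use of +1/-1 for up/down arrows are assumptions made for illustration.

```python
import math

S = 1 / math.sqrt(2)  # scaling applied at every half-mirror split

# First half mirror, hit from the front by an "up" arrow (+1):
path_a = -1.0 * S   # reflected: scaled and flipped to point down
path_b = 1.0 * S    # transmitted: scaled, still pointing up

# Full mirrors: no scaling, but each arrow is multiplied by -1.
path_a *= -1        # now pointing up
path_b *= -1        # now pointing down

# Second half mirror. Path A hits the back (no phase shift on reflection),
# path B hits the front (reflection multiplies by -1).
a_to_d1 = path_a * S     # path A transmitted
a_to_d2 = path_a * S     # path A reflected from the back
b_to_d2 = path_b * S     # path B transmitted
b_to_d1 = -path_b * S    # path B reflected from the front

# Paths that end in the same configuration have their amplitudes added.
amp_d1 = a_to_d1 + b_to_d1
amp_d2 = a_to_d2 + b_to_d2

prob_d1 = abs(amp_d1) ** 2
prob_d2 = abs(amp_d2) ** 2
print(round(prob_d1, 6), round(prob_d2, 6))
```

Note that the amplitudes are added before squaring. Squaring each of the four arrows separately and then adding would give 50% per detector, with no interference at all.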
So the photon has interfered with itself into nothing, like a wave. However, no matter what you do the detector never sees the photon energy “split” between detectors, like a particle. (We can do similar experiments with bulky things like electrons). To rescue the idea that it must be one or the other, you might propose that there are secret instructions in the photon telling it about what actions to take, that just so happen to end up at detector D1 for some reason. This can run into problems like the following:
Imagine if we put a block into one of the paths, and recalculated where the photons would end up:
The block cuts off the beam from path A, but allows path B through. Before the block, you would never find a photon in detector 2, but now it shows up there in a quarter of experiments. (In another quarter of experiments, D1 will go off, and in the remaining half of experiments, the photon is blocked on path A.)
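Here is a rough sketch of the blocked case, under the same assumed conventions (arrows as signed real numbers, names chosen for illustration). With path A absorbed, only the two path B arrows reach the detectors, so there is nothing left to add or cancel.

```python
import math

S = 1 / math.sqrt(2)

# After the first half mirror and the full mirrors:
path_a = (-1.0 * S) * -1   # reflected at the first split, then flipped by the full mirror
path_b = (1.0 * S) * -1    # transmitted at the first split, then flipped by the full mirror

# The block absorbs everything on path A.
prob_blocked = abs(path_a) ** 2

# Only path B reaches the second half mirror (hitting its front side).
amp_d2 = path_b * S        # transmitted
amp_d1 = -path_b * S       # reflected from the front, extra factor of -1

prob_d1 = abs(amp_d1) ** 2
prob_d2 = abs(amp_d2) ** 2
print(round(prob_blocked, 6), round(prob_d1, 6), round(prob_d2, 6))
```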
But wait, a photon going through path B never saw path A! An obstacle in the path not travelled has affected the behavior of a photon in another path. You can take this further: you can place this blocker in the path after the initial split has happened (but before the alternate path would have hit it), and the result will be the same. This only really makes sense if the photon really is split in between the two paths in some way. There is no secret information in the photon that can know in advance whether you will place a block in the path not taken in the future.^{[5]}
Two photon interference and joint configurations
So far we have only been looking at a single photon. But a large part of the weirdness of quantum mechanics occurs when we are looking at the interaction between different particles.
Again, we will use a half mirror, but this time, we send two photons in instead of one.
This might look similar to the second half-mirror in our interferometer example. But actually, the presence of an extra photon complicates things quite a bit. In our 1 photon example, we were going from a superposition of two possible configurations:
superposition of |photon in path A> and |photon in path B>
to another superposition of two possible configurations:
superposition of |photon in path D1> and |photon in path D2>
In our new case, we are going from the configuration:
|one photon in path A and one photon in path B>
to a superposition of three possible configurations:
superposition of |there are 2 photons in D1> and |there is 1 photon in D1 and another in D2> and |there are 2 photons in D2>
This means you can’t quite use the same rules from earlier. We can say that each photon can either transmit or reflect when it hits the mirror. There are four paths (we’ll assume the incoming amplitudes are both 1, and we keep our rule that only reflection from the front induces a phase shift):
1. Both photons are reflected: state is |1 photon in D1, 1 photon in D2>, phase shift occurs, amplitude -1/2
2. Photon A is transmitted, photon B is reflected: state is |2 photons in D1>, phase shift occurs, amplitude -1/√2
3. Photon A is reflected, photon B is transmitted: state is |2 photons in D2>, no phase shift, amplitude 1/√2
4. Both photons are transmitted: state is |1 photon in D1, 1 photon in D2>, no phase shift, amplitude 1/2
The origins of the 1/√2 and 1/2 terms are a little too complicated to be worth explaining, and come from the math of dealing with multiple photons (see this paper, or this Wikipedia article, for a formal treatment).
If we remember from before, interference only occurs in cases where multiple paths lead to the same configuration. This only applies to paths 1 and 4, which both end up in the configuration |photon in D1, photon in D2>. So we add the amplitudes 1/2 and -1/2, which gives 0 amplitude for that state (destructive interference again).
After squaring each amplitude, we end up with a 50% chance of finding both photons in D1, a 50% chance of finding both photons in D2, and 0% chance of finding one in each. You either find 2 photons in D1 or D2, but never find them split. This has been experimentally verified!
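The bookkeeping for the two-photon case can be sketched as follows: group the four paths by the final configuration they lead to, add the amplitudes within each group, then square. The amplitudes are taken as given from the list of four paths; the configuration labels and variable names are assumptions for illustration.

```python
import math
from collections import defaultdict

# (final configuration, amplitude) for each of the four paths
paths = [
    ("one in D1, one in D2", -1 / 2),             # both reflected
    ("two in D1",            -1 / math.sqrt(2)),  # A transmitted, B reflected
    ("two in D2",             1 / math.sqrt(2)),  # A reflected, B transmitted
    ("one in D1, one in D2",  1 / 2),             # both transmitted
]

# Amplitudes only add when the paths end in the same configuration.
amplitudes = defaultdict(float)
for configuration, amplitude in paths:
    amplitudes[configuration] += amplitude

probabilities = {c: abs(a) ** 2 for c, a in amplitudes.items()}
for configuration, probability in probabilities.items():
    print(configuration, round(probability, 6))
```

The key design choice is that the dictionary is keyed by the joint configuration of both photons, not by individual photons, which is exactly why the cancellation cannot be assigned to either photon alone.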
Now just stop for a second to wrap your head around this. There was destructive interference between the two paths 1 and 4. You can’t decompose this into one part-photon interfering with another part-photon (go ahead, try). The states of the two photons are entangled. What has interfered are two scenarios involving the photons. The scenario of “both reflected” has bumped into “both transmitted”, leaving nothing behind.
It doesn’t stop at two photons, though. In principle, every element and property of a system (and possibly the entire universe) can be bundled up into one, universal, extremely high dimensional wavefunction, which can represent every possible combination of properties for every object, including continuous properties like particle position. Every possible configuration of all of the elements of the system will have an amplitude, and we can predict how these amplitudes evolve over time using the Schrödinger equation, a tool that allows us to build up all of chemistry and material science. I cover a bit about the practical applications in this article.
Sensors, collapse, and decoherence
Let’s return to our interferometer and do one last test. Take our initial test, where all the photons are found in D1. Add a sensor next to one of the paths, that will return a definite Yes if the photon has gone nearby, and a definite No if the photon hasn’t, without affecting the photon path. Surely this can’t affect the results?
Despite not touching the path of the photons at all, suddenly there will be no interference, and you will see the photon in detector 1 50% of the time, and in detector 2 the other 50%.
One way to think of this is that when the probability amplitude of the part of the photon on path B reaches S, the “wavefunction collapses”. Instead of being spread between the two paths, it’s now either 100% in path B and S is “yes”, or 100% in path A and S is “no”. If we think in terms of our arrows, one of the arrows has disappeared, and the other one has stretched out to length 1 to compensate. There is now no “other component” to bounce off of, so there is no interference and 50% detection in each detector. Note that this does not require conscious observation: we don’t need to read the sensor for this change to occur.
A lot of people think this collapse must be some fictional construct like centrifugal force, because it acts in ways that are extremely out of line with the other laws of physics (discontinuous, instantaneous, faster than light, etc).
And in fact, we can get rid of the apparent collapse in this scenario. Another way of thinking about the sensor case is that the sensor has become entangled with the photon, and now the configurations arising from each path are distinguishable. Recall that I said that amplitudes add if they lead to the same configuration. But if we include the sensor in our framework, we have different configurations. We have:
|S says Yes, photon in D1>
|S says Yes, photon in D2>
|S says No, photon in D1>
|S says No, photon in D2>
They are all different configurations, so no amplitude adding occurs, and the direction of the arrows is once again irrelevant. We get a probability of 0.5² = 25% for each outcome, and if we ignore S, we get 50% in D1 and 50% in D2 as predicted. This is the basis of decoherence. The actual theory allows for a lot of little bumps by the environment to have a similar effect.
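The difference the sensor makes can be sketched by changing only the key used to group paths. This assumes, as in the collapse description, that S sits on path B, and it reuses the four arrows of length 1/2 from the interferometer; the function name `probabilities` and the tuple layout are hypothetical.

```python
from collections import defaultdict

# (sensor reading, detector, amplitude) for the four paths.
# Assumption: S reads "yes" exactly when the arrow came via path B.
paths = [
    ("no",  "D1",  0.5),   # path A, transmitted
    ("no",  "D2",  0.5),   # path A, reflected from the back
    ("yes", "D1",  0.5),   # path B, reflected from the front
    ("yes", "D2", -0.5),   # path B, transmitted
]

def probabilities(paths, include_sensor):
    """Add amplitudes that share a configuration, then square the lengths."""
    amplitudes = defaultdict(float)
    for sensor, detector, amplitude in paths:
        key = (sensor, detector) if include_sensor else detector
        amplitudes[key] += amplitude
    return {key: abs(a) ** 2 for key, a in amplitudes.items()}

without_sensor = probabilities(paths, include_sensor=False)
with_sensor = probabilities(paths, include_sensor=True)

# Ignoring S afterwards means adding probabilities, not amplitudes.
marginal = defaultdict(float)
for (sensor, detector), probability in with_sensor.items():
    marginal[detector] += probability

print(without_sensor)
print(with_sensor)
print(dict(marginal))
```

Nothing about the arrows themselves changes between the two calls. The only difference is whether the sensor reading is part of the configuration, which decides whether two arrows land in the same bucket.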
So, we can see that by taking into account the entanglement of the photon and the sensor, the apparent collapse we saw earlier goes away. Some have theorized that this just keeps going: your measuring equipment gets entangled with S, and then you get entangled with the measuring equipment and S and the photon, so you’ll end up with superpositions of states like:
A: |S says Yes and equipment measures Yes, and You see Yes on the screen>
B: |S says No and equipment measures No, and You see No on the screen>
In this way of thinking (the most commonly told version of many worlds), the You that sees no and the You that sees yes both exist in different worlds, and are both as equally real as the You reading this post. This is an elegant theory, but an incomplete one: we have to explain what it means for the you in state A to have probability 70% and the you in state B to have probability 30%. There have been a lot of attempts to resolve this, but debate is still ongoing as to whether any are likely to be successful. I will have more to say on this in a future article.
On the other hand, if you reject many worlds, you have to explain why entanglement doesn’t go on all the way up to humans, which is also a matter of heated debate. There are a lot of interpretations of quantum mechanics, which all try to replicate reality while biting the least egregious philosophical bullets.
The majority of physicists I know are in the agnostic camp: quantum interpretations may be fun lunchroom conversation, but I already have my hands full with the devilish details of my own problem that is applicable and testable. I’ll just use whatever formalism is easiest to calculate with, and leave the true nature of reality to someone else.
Part 2: Why the math in the sequences is wrong
The reason for writing this up is that a series (1,2,3) of extremely popular blog posts by Eliezer Yudkowsky has covered the same topic, and in my opinion, made a complete mess of it. He covers the same topics I did in this post, but at greater length, making numerous errors, and leaving out important context. They are old, but since they are part of the “sequences” that form the foundational text of the Rationalist movement, they are still read a lot.
In his account, Yudkowsky introduces amplitudes, complex numbers, and the interferometer. The bulk of the error comes when he introduces the half silver mirror with the following description:
Remember, the actual half-silver rule (for a single photon) is: transmit one beam forward and reflect one beam 90 degrees, and multiply their amplitudes by 1/√2. If the beam is reflected from the front of the half mirror, also multiply the reflected beam by -1.
The fictional rule is this: transmit one beam forward and reflect one beam 90 degrees. If the beam is reflected (front or back), multiply the amplitude by i.
Up until now, most of the critique has focused on this phase shift of i instead of -1, but this is actually a fairly minor error. I’m pretty sure a regular half-silver mirror will not do this, but a beamsplitter which induces a phase shift of i on every reflection is theoretically possible, if you carefully engineer a device that induces the correct phase changes on reflection and transmission. You can find plenty of textbooks and online explainers that will use this type of beamsplitter to explain the experiment.
What’s not forgivable is the failure to scale the amplitudes down by 1/√2. By leaving this bit out, this fictional system violates conservation of energy and probability at every half mirror split.
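A quick way to see the violation is to add up the squared lengths of the outgoing arrows after a single split under each rule. This is only a sketch: the helper `total_probability` is a hypothetical name, and the fictional rule is encoded as "transmitted unchanged, reflected multiplied by i" as described above.

```python
import math

def total_probability(amplitudes):
    """Sum of squared lengths of a list of amplitude arrows."""
    return sum(abs(a) ** 2 for a in amplitudes)

incoming = 1.0
s = 1 / math.sqrt(2)

# Real rule (front side): both arrows scaled, reflected one flipped.
real_split = [incoming * s, -incoming * s]

# Fictional rule: no scaling, reflected arrow multiplied by i.
fictional_split = [incoming, incoming * 1j]

print(round(total_probability(real_split), 6))
print(round(total_probability(fictional_split), 6))
```

Under the real rule the total stays at the incoming value of 1, while the unscaled rule doubles it at each split.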
A footnote “correcting” the errors in the post reads:
Unfortunately, this is also wrong! Firstly, it doesn’t just multiply by -1, it multiplies by -1/√2. And secondly, it only does this when reflecting from the front, not the back. Without that second part, you again violate conservation of energy, and the experiment will pop an extra photon into existence. I think this “correction” is worse than the original text!
There is another major difference between the real system and this fictional one. To get the probability of an amplitude in the real system, we just take the length of the amplitude and we square it, and it spits out the probability of that configuration as is.
In the fictional system, we also square the amplitudes, but because he left out all the scaling factors, we now sometimes get values over 1. He states we have to renormalize these: if D1 has magnitude squared of 1 and D2 has magnitude squared of 4, then D1 occurs with probability 1/5 and D2 occurs with probability 4/5.
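The renormalization step described here amounts to dividing each squared magnitude by their total. A minimal sketch, with `renormalize` as a hypothetical helper name:

```python
def renormalize(squared_magnitudes):
    """Turn squared magnitudes into probabilities by dividing by their sum."""
    total = sum(squared_magnitudes.values())
    return {name: value / total for name, value in squared_magnitudes.items()}

probabilities = renormalize({"D1": 1, "D2": 4})
print(probabilities)
```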
I think that Yudkowsky might have thought this last change would eliminate the need for keeping track of all the 1/√2 stuff. This is not true, and I will show a few cases where it breaks down.
So, this fictional system is close to reality, except the phase shift acts differently, and the magnitude of each outgoing amplitude doesn’t decrease (violating conservation of energy).
For the specific case of the basic interferometer, this alternate system does give the same answer as reality. But let me ask you some questions about modifications to our experiment. If you want to check your understanding, try and guess the answers to these questions in the real system. You should have everything you need to figure them out.
1. What happens if you flip one of the half-mirrors around?
2. What happens if you replace the bottom full mirror with a half-mirror?
3. Replace the blocker in the blocker scenario with a third detector as shown below. What are the ratios of photons in each detector?
For change 1: Flipping the half-mirror would mean there was no phase change in path A. The result would be that the photons would be found in detector 2 instead of detector 1^{[6]}. In Yudkowsky’s formalism, the two sides are identical, so nothing would change. This one is a fairly minor critique, easily fixed by changing the words “half silver mirror” to “symmetric beamsplitter”.
For change 2: if you replace one of the full mirrors with a half mirror, then in reality part of the photon would transmit off into space, while another part would reflect and continue on to the detector setup, with both amplitudes reduced by a further 1/√2 factor. Due to this reduced factor, there would only be partial destructive interference: by my calculations, you’d find photons 72.9% of the time in D2, 2.1% in D1, and 25% off in space.
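The quoted percentages can be checked with a short calculation on arrow lengths alone. This sketch deliberately ignores which detector gets which sign (that depends on which side of the new half mirror is hit) and only computes the fully aligned and fully opposed cases; the variable names are assumptions.

```python
import math

s = 1 / math.sqrt(2)

path_a = s        # length of the path A arrow after the first split
path_b = s * s    # path B is split again at the extra half mirror
lost = s * s      # the part of path B that heads off into space

# The final half mirror scales every arrow by 1/sqrt(2) once more.
from_a = path_a * s   # length 1/2
from_b = path_b * s   # length 1/(2*sqrt(2))

prob_constructive = (from_a + from_b) ** 2   # arrows aligned
prob_destructive = (from_a - from_b) ** 2    # arrows opposed
prob_lost = lost ** 2

print(round(prob_constructive, 3), round(prob_destructive, 3), round(prob_lost, 3))
```

The three values sum to 1, as they must when every split includes the 1/√2 scaling.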
In the fictional system, the amplitude of path B will not be scaled down compared to path A when it hits the extra half-mirror. This means that the portion that heads off to recombine with path A will act the same way it does in the original experiment, leading to amplitude 1 at detector 1. The portion that got transmitted off into space would also have amplitude 1, so using the fictional renormalisation rules, you’d see 50% of the photons there, and 50% in detector 1.
Change 3: With detector 3 acting as a blocker, no interference will occur, so we don’t have to worry about any phase shifts. There is a 1/√2 amplitude going into detector 3, so we square that to get a 50% chance of getting a photon there. Detectors 1 and 2 are symmetrical with amplitude 1/2, so we square that to get a 25% chance of each of them going off.
In the fictional system, the amplitudes have not been scaled: the three detectors receive amplitudes of i, i, and -1, each of magnitude 1. So we get probability 1 for finding the photon in D1, and another probability of 1 to find it in D2, and another probability of 1 to find it in D3. Renormalising, we get the incorrect answer that there is an equal 33% chance of finding the photon in each detector.
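To put the two answers for change 3 side by side, here is a small sketch that squares the real amplitudes directly and applies the renormalization rule to the fictional ones. The fictional amplitudes are left unlabelled since only their magnitudes matter here; all names are illustrative assumptions.

```python
import math

# Real system: lengths of the arrows reaching each detector.
real_amplitudes = {"D1": 1 / 2, "D2": 1 / 2, "D3": 1 / math.sqrt(2)}
real_probabilities = {d: abs(a) ** 2 for d, a in real_amplitudes.items()}

# Fictional system: unscaled amplitudes i, i and -1, one per detector.
fictional_amplitudes = [1j, 1j, -1]
squared = [abs(a) ** 2 for a in fictional_amplitudes]
fictional_probabilities = [value / sum(squared) for value in squared]

print({d: round(p, 3) for d, p in real_probabilities.items()})
print([round(p, 3) for p in fictional_probabilities])
```

The real probabilities already sum to 1 without any renormalization, while the fictional ones only do so after dividing by 3.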
In all three cases, the fictional system provides inaccurate answers to quite simple modifications.
It’s pretty clear what happened here is that Yudkowsky messed up the math and physics, but didn’t realize it because it gave the right answer, or thought that his “simplifications” wouldn’t affect the results. This is preferable to the alternative, that he knew it was wrong but just didn’t care.
What about the defense that the fictional system is simpler? I think when it comes to the phase shift of i compared to the phase shift of -1, it’s a matter of taste. I prefer my way because it means you only need to keep track of arrow up and arrow down, rather than rotating by 90 degrees every time, and reduces the need to think about complex numbers.
On the other side, not including the 1/√2 factor does simplify things, but at great cost, because it means that from the first split onwards, we aren’t talking about amplitudes at all. Amplitudes obey the law of conservation of energy and probability; these fictional numbers do not. And this leads to the false predictions we saw earlier. It basically means that the reader that learns these rules can’t make accurate predictions with them. It also makes it harder for them to grasp followup posts on deeper questions of interpretations: the scaling of amplitudes is incredibly important when discussing things like the Born rule.
The primary sin of these articles is that they introduce just enough math to be confusing and intimidating to the layman, but not enough to be actually correct or useful. This is the worst of both worlds. Learning this fictional system may almost be detrimental for learning the actual reality, if you don’t realize the problems with it.
The other sin is that a lot of context is left out, so that the layman doesn’t know where to look to get more information. He states that the mirror “multiplies by i”, but if a reader wants to know why this happens, they are in the dark. The word “phase” does not appear once in any of the three articles, despite phase shifts being the main cause of the described effects.
I’m aware that LessWrong is not meant to be a physics textbook, and that simplification is a necessary part of science communication (I’ve simplified plenty here myself). I really do admire the effort to dive into and communicate how quantum calculations actually operate. But if you’re going to ask people to trust you and to give you their time and attention, you have a responsibility not to tell them the wrong answers. I hope this article goes some way towards rectifying this mistake.
^{[1]} For the origin of these odd brackets, look up “bra ket notation”, or watch this YouTube video.
^{[2]} Technically this would be a wavepacket and I’d have to include spatial information of how it’s spread out, but we simplify this for the sake of clarity.
^{[3]} You can also get it to work if the difference in path causes theta to go around an entire circle, so the phase shift of the extra distance is e^(0i) = 1 and the phase remains unchanged.
^{[4]} (In reality, you’ll never be this exact with your path lengths, so you won’t get it down to exactly 0%, but can make it pretty close.)
^{[5]} Unless you’re a superdeterminist, but let’s not go down that route.
^{[6]} Actually, this is slightly more complicated if you take into account the thickness of glass. In this case one of the paths would spend more time in the glass of the mirrors than the other one, so you’d get additional phase shifts depending on glass thickness, which means the result could be anything. This isn’t the case in the regular scenario because it’s symmetrical: both paths spend equal time inside the glass. See here for a more in depth explanation.