Spooky Action at a Distance: The No-Communication Theorem

Eliezer Yudkowsky

Previously in series: Bell's Theorem: No EPR "Reality"

When you have a pair of entangled particles, such as oppositely polarized photons, one particle seems to somehow "know" the result of distant measurements on the other particle. If you measure photon A to be polarized at 0°, photon B somehow immediately knows that it should have the opposite polarization of 90°.

Einstein famously called this "spukhafte Fernwirkung" or "spooky action at a distance". Einstein didn't know about decoherence, so it seemed spooky to him.

Though, to be fair, Einstein knew perfectly well that the universe couldn't really be "spooky". It was a then-popular interpretation of QM that Einstein was calling "spooky", not the universe itself.

Let us first consider how entangled particles look, if you don't know about decoherence—the reason why Einstein called it "spooky":

Suppose we've got oppositely polarized photons A and B, and you're about to measure B in the 20° basis. Your probability of seeing B transmitted by the filter (or absorbed) is 50%.

But wait! Before you measure B, I suddenly measure A in the 0° basis, and the A photon is transmitted! Now, apparently, the probability that you'll see B transmitted is 11.7%. Something has changed! And even if the photons are light-years away, spacelike separated, the change still occurs.
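The conditional probability in this example can be checked in a few lines. This is a minimal sketch, assuming ideal filters and the usual cos² transmission rule; the helper name `transmit_probability` is made up for illustration.

```python
import math

def transmit_probability(photon_angle_deg, filter_angle_deg):
    """Probability that a photon polarized at photon_angle_deg passes an
    ideal filter set at filter_angle_deg (cos^2 of the angle between them)."""
    delta = math.radians(photon_angle_deg - filter_angle_deg)
    return math.cos(delta) ** 2

# A was transmitted at 0 degrees, so B is treated as polarized at 90 degrees.
p_b = transmit_probability(90, 20)
print(f"{p_b:.3f}")  # 0.117
```

Before A is measured, the same 20° filter passes B half the time; the shift from 50% to roughly 12% is the "change" the paragraph is pointing at.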

You might try to reply:

"No, nothing has changed—measuring the A photon has told you something about the B photon, you have gained knowledge, you have carried out an inference about a distant object, but no physical influence travels faster-than-light.

"Suppose I put two index cards into an envelope, one marked '+' and one marked '-'. Now I give one envelope to you, and one envelope to a friend of yours, and you get in a spaceship and travel a few light-years away from each other, and then you open your envelope and see '+'. At once you know that your friend is holding the envelope marked '-', but this doesn't mean the envelope's content has changed faster than the speed of light.

"You are committing a Mind Projection Fallacy; the envelope's content is constant, only your local beliefs about distant referents change."

Bell's Theorem, covered yesterday, shows that this reply fails. It is not possible that each photon has an unknown but fixed individual tendency to be polarized a particular way. (Think of how unlikely it would seem, a priori, for this to be something any experiment could tell you!)

Einstein didn't know about Bell's Theorem, but the theory he was criticizing did not say that there were hidden variables; it said that the probabilities changed directly.

But then how fast does this influence travel? And what if you measure the entangled particles in such a fashion that, in their individual reference frames, each measurement takes place before the other?

These experiments have been done. If you think there is an influence traveling, it travels at least six million times as fast as light (in the reference frame of the Swiss Alps). Nor is the influence fazed if each measurement takes place "first" within its own reference frame.

So why can't you use this mysterious influence to send signals faster than light?

Here's something that, as a kid, I couldn't get anyone to explain to me: "Why can't you signal using an entangled pair of photons that both start out polarized up-down? By measuring A in a diagonal basis, you destroy the up-down polarization of both photons. Then by measuring B in the up-down/left-right basis, you can with 50% probability detect the fact that a measurement has taken place, if B turns out to be left-right polarized."

It's particularly annoying that nobody gave me an answer, because the answer turns out to be simple: If both photons have definite polarizations, they aren't entangled. There are just two different photons that both happen to be polarized up-down. Measuring one photon doesn't even change your expectations about the other.

Entanglement is not an extra property that you can just stick onto otherwise normal particles! It is a breakdown of quantum independence. In classical probability theory, if you know two facts, there is no longer any logical dependence left between them. Likewise in quantum mechanics, two particles each with a definite state must have a factorizable amplitude distribution.
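One way to make "factorizable" concrete, as a rough sketch: for two two-state particles, write the joint amplitudes as a 2x2 table. The table is a product of one A-vector and one B-vector exactly when its determinant is zero. The function name `is_factorizable` and the table layout are illustrative assumptions, not anything from the calculations in this series.

```python
import math

def is_factorizable(joint, tol=1e-12):
    """joint[a][b] is the amplitude for A in basis state a and B in basis state b.
    A nonzero 2x2 table of amplitudes is an outer product of an A-vector and
    a B-vector exactly when its determinant vanishes."""
    det = joint[0][0] * joint[1][1] - joint[0][1] * joint[1][0]
    return abs(det) < tol

# Both photons definitely up-down: A=(1 ; 0) and B=(1 ; 0).
both_up_down = [[1, 0],
                [0, 0]]

# Oppositely polarized entangled pair:
# sqrt(1/2) * ([A=(1 ; 0) and B=(0 ; 1)] - [A=(0 ; 1) and B=(1 ; 0)]).
s = math.sqrt(0.5)
entangled = [[0, s],
             [-s, 0]]

print(is_factorizable(both_up_down))  # True
print(is_factorizable(entangled))     # False
```

The childhood signaling scheme starts from the first table, which is why measuring A there tells you nothing about B.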

Or as old-style quantum theory put it: Entanglement requires superposition, which implies uncertainty. When you measure an entangled particle, you are not able to force your measurement result to take any particular value. So, over on the B end, if they do not know what you measured on A, their probabilistic expectation is always the same as before. (So it was once said.)

But in old-style quantum theory, there was indeed a real and instantaneous change in the other particle's statistics which took place as the result of your own measurement. It had to be a real change, by Bell's Theorem and by the invisibly assumed uniqueness of both outcomes.

Even though the old theory invoked a non-local influence, you could never use this influence to signal or communicate with anyone. This was called the "no-signaling condition" or the "no-communication theorem".

Still, on then-current assumptions, they couldn't actually call it the "no influence of any kind whatsoever theorem". So Einstein correctly labeled the old theory as "spooky".

In decoherent terms, the impossibility of signaling is much easier to understand: When you measure A, one version of you sees the photon transmitted and another sees the photon absorbed. If you see the photon absorbed, you have not learned any new empirical fact; you have merely discovered which version of yourself "you" happen to be. From the perspective at B, your "discovery" is not even theoretically a fact they can learn; they know that both versions of you exist. When B finally communicates with you, they "discover" which world they themselves are in, but that's all. The statistics at B really haven't changed—the total Born probability of measuring either polarization is still just 50%!
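The claim that B's total statistics stay at 50% can be checked numerically. The sketch below assumes the oppositely polarized state used throughout this post, real amplitudes, and a convention (my assumption) in which a filter at angle θ transmits (cos θ ; sin θ) and absorbs (-sin θ ; cos θ). The function names are hypothetical.

```python
import math

def basis(angle_deg):
    """Orthonormal pair (transmitted, absorbed) for a filter at the given angle."""
    t = math.radians(angle_deg)
    return (math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))

def joint_probabilities(angle_a, angle_b):
    """Born probabilities for each (A outcome, B outcome); index 0 = transmitted."""
    s = math.sqrt(0.5)
    psi = [[0, s], [-s, 0]]  # psi[i][j]: amplitude for A in state i, B in state j
    probs = {}
    for out_a, va in enumerate(basis(angle_a)):
        for out_b, vb in enumerate(basis(angle_b)):
            amp = sum(va[i] * vb[j] * psi[i][j] for i in range(2) for j in range(2))
            probs[(out_a, out_b)] = amp ** 2  # amplitudes are real here
    return probs

# Whatever angle A picks, B's total chance of "transmitted" at 20 degrees is 1/2.
for angle_a in (0, 30, 77):
    probs = joint_probabilities(angle_a, 20)
    total_b_transmitted = probs[(0, 0)] + probs[(1, 0)]
    assert abs(total_b_transmitted - 0.5) < 1e-12
```

Only the split of that 50% between the two versions of the experimenter at A depends on A's angle; the sum that B can observe locally does not.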

A common defense of the old theory was that Special Relativity was not violated, because no "information" was transmitted, because the superluminal influence was always "random". As some Hans de Vries fellow points out, information theory says that "random" data is the most expensive kind of data you can transmit. Nor is "random" information always useless: If you and I generate a million entangled particles, we can later measure them to obtain a shared key for use in cryptography—a highly useful form of information which, by Bell's Theorem, could not have already been there before measuring.

But wait a minute. Decoherence also lets you generate the shared key. Does decoherence really not violate the spirit of Special Relativity?

Decoherence doesn't allow "signaling" or "communication", but it allows you to generate a highly useful shared key apparently out of nowhere. Does decoherence really have any advantage over the old-style theory on this one? Or are both theories equally obeying Special Relativity in practice, and equally violating the spirit?

A first reply might be: "The shared key is not 'random'. Both you and your friend generate all possible shared keys, and this is a deterministic and local fact; the correlation only shows up when you meet."

But this just reveals a deeper problem. The counter-objection would be: "The measurement that you perform over at A, splits both A and B into two parts, two worlds, which guarantees that you'll meet the right version of your friend when you reunite. That is non-local physics—something you do at A, makes the world at B split into two parts. This is spooky action at a distance, and it too violates the spirit of Special Relativity. Tu quoque!"

And indeed, if you look at our quantum calculations, they are written in terms of joint configurations. Which, on reflection, doesn't seem all that local!

But wait—what exactly does the no-communication theorem say? Why is it true? Perhaps, if we knew, this would bring enlightenment.

Here is where it starts getting complicated. I myself don't fully understand the no-communication theorem—there are some parts I think I can see at a glance, and other parts I don't. So I will only be able to explain some of it, and I may have gotten it wrong, in which case I pray to some physicist to correct me (or at least tell me where I got it wrong).

When we did the calculations for entangled polarized photons, with A's polarization measured using a 30° filter, we calculated that the initial state

√(1/2) * ( [ A=(1 ; 0) ∧ B=(0 ; 1) ] - [ A=(0 ; 1) ∧ B=(1 ; 0) ] )

would be decohered into a blob for

( -(√3)/2 * √(1/2) * [ A=(-(√3)/2 ; 1/2) ∧ B=(0 ; 1) ] )
- ( 1/2 * √(1/2) * [ A=(-(√3)/2 ; 1/2) ∧ B=(1 ; 0) ] )

and symmetrically (though we didn't do this calculation) another blob for

( 1/2 * √(1/2) * [ A=(1/2 ; (√3)/2) ∧ B=(0 ; 1) ] )
- ( (√3)/2 * √(1/2) * [ A=(1/2 ; (√3)/2) ∧ B=(1 ; 0) ] )

These two blobs together add up, linearly, to the initial state, as one would expect. So what changed? At all?
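The claim that the blobs sum back to the initial state is easy to verify by machine. Here is a minimal sketch that stores each joint state as a 2x2 table of amplitudes; the helper names `outer` and `add` are made up for this illustration.

```python
import math

def outer(a, b):
    """Joint amplitudes for 'A is in state a AND B is in state b'."""
    return [[a[i] * b[j] for j in range(2)] for i in range(2)]

def add(*terms):
    """Add weighted joint-amplitude tables; each term is (coefficient, table)."""
    return [[sum(c * t[i][j] for c, t in terms) for j in range(2)] for i in range(2)]

s = math.sqrt(0.5)
r3 = math.sqrt(3) / 2
b01, b10 = (0, 1), (1, 0)
a_first = (-r3, 0.5)   # A=(-(√3)/2 ; 1/2), the A state in the first blob
a_second = (0.5, r3)   # A=(1/2 ; (√3)/2), the A state in the second blob

initial = add((s, outer((1, 0), b01)), (-s, outer((0, 1), b10)))
blob_1 = add((-r3 * s, outer(a_first, b01)), (-0.5 * s, outer(a_first, b10)))
blob_2 = add((0.5 * s, outer(a_second, b01)), (-r3 * s, outer(a_second, b10)))
total = add((1, blob_1), (1, blob_2))

for i in range(2):
    for j in range(2):
        assert abs(total[i][j] - initial[i][j]) < 1e-12

def measure(table):
    """Sum of squared amplitudes over the whole table."""
    return sum(x ** 2 for row in table for x in row)
```

With these numbers each blob carries a squared-modulus measure of 1/2, which `measure(blob_1)` and `measure(blob_2)` confirm up to rounding.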

What changed is that the final result at A, for the first blob, is really more like:

(Sensor-A-reads-"ABSORBED") * (Experimenter-A-sees-"ABSORBED") *
{ ( -(√3)/2 * √(1/2) * [ A=(-(√3)/2 ; 1/2) ∧ B=(0 ; 1) ] )
- ( 1/2 * √(1/2) * [ A=(-(√3)/2 ; 1/2) ∧ B=(1 ; 0) ] ) }

and correspondingly with the TRANSMITTED blob.

What changed is that one blob in configuration space, was decohered into two distantly separated blobs that can't interact any more.

As we saw from the Heisenberg "Uncertainty Principle", decoherence is a visible, experimentally detectable effect. That's why we have to shield quantum computers from decoherence. So couldn't the decohering measurement at A, have detectable consequences for B?

But think about how B sees the initial state:

√(1/2) * ( [ A=(1 ; 0) ∧ B=(0 ; 1) ] - [ A=(0 ; 1) ∧ B=(1 ; 0) ] )

From B's perspective, this state is already "not all that coherent", because no matter what B does, it can't make the A=(1 ; 0) and A=(0 ; 1) configurations cross paths. There's already a sort of decoherence here—a separation that B can't eliminate by any local action at B.

And as we've earlier glimpsed, the basis in which you write the initial state is arbitrary. When you write out the state, it has pretty much the same form in the 30° measuring basis as in the 0° measuring basis.

In fact, there's nothing preventing you from writing out the initial state with A in the 30° basis and B in the 0° basis, so long as your numbers add up.

Indeed this is exactly what we did do, when we first wrote out the four terms in the two blobs, and didn't include the sensor or experimenter.

So when A permanently decohered the blobs in the 30° basis, from B's perspective, this merely solidified a decoherence that B could have viewed as already existing.

Obviously, this can't change the local evolution at B (he said, waving his hands a bit).

Now this is only a statement about a quantum measurement that just decoheres the amplitude for A into parts, without A itself evolving in interesting new directions. What if there were many particles on the A side, and something happened on the A side that put some of those particles into identical configurations via different paths?

This is where linearity and unitarity come in. The no-communication theorem requires both conditions: in general, violating linearity or unitarity gives you faster-than-light signaling. (And numerous other superpowers, such as solving NP-complete problems in polynomial time, and possibly Outcome Pumps.)

By linearity, we can consider parts of the amplitude distribution separately, and their evolved states will add up to the evolved state of the whole.

Suppose that there are many particles on the A side, but we count up every configuration that corresponds to some single fixed state of B—say, B=(0 ; 1) or B=France, whatever. We'd get a group of components which looked like:

(AA=1 ∧ AB=2 ∧ AC=Fred ∧ B=France) +
(AA=2 ∧ AB=1 ∧ AC=Sally ∧ B=France) + ...

Linearity says that we can decompose the amplitude distribution around states of B, and the evolution of the parts will add to the whole.

Assume that the B side stays fixed. Then this component of the distribution that we have just isolated, will not interfere with any other components, because other components have different values for B, so they are not identical configurations.

And unitary evolution says that whatever the measure—the integrated squared modulus—of this component, the total measure is the same after evolution at A, as before.

So assuming that B stays fixed, then anything whatsoever happening at A, won't change the measure of the states at B (he said, waving his hands some more).
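The hand-waving above can be spot-checked numerically. This is only a sketch under stated assumptions: the A side has four basis states, the B side has three, the joint amplitudes are random, and a discrete Fourier transform matrix stands in for "anything whatsoever happening at A" simply because it is a convenient unitary. All function names are hypothetical.

```python
import cmath
import math
import random

def dft_unitary(n):
    """An n x n unitary matrix (the discrete Fourier transform), used as a stand-in."""
    return [[cmath.exp(2j * math.pi * row * col / n) / math.sqrt(n)
             for col in range(n)] for row in range(n)]

def evolve_a(u, psi):
    """Apply u to the A index of psi[a][b], leaving the B index alone."""
    n_a, n_b = len(psi), len(psi[0])
    return [[sum(u[i][a] * psi[a][b] for a in range(n_a)) for b in range(n_b)]
            for i in range(n_a)]

def measure_by_b(psi):
    """Total squared modulus of each group of configurations sharing one B state."""
    n_a, n_b = len(psi), len(psi[0])
    return [sum(abs(psi[a][b]) ** 2 for a in range(n_a)) for b in range(n_b)]

random.seed(0)
n_a, n_b = 4, 3
psi = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_b)]
       for _ in range(n_a)]
norm = math.sqrt(sum(abs(x) ** 2 for row in psi for x in row))
psi = [[x / norm for x in row] for row in psi]

before = measure_by_b(psi)
after = measure_by_b(evolve_a(dft_unitary(n_a), psi))
assert all(abs(x - y) < 1e-12 for x, y in zip(before, after))
```

Swapping in a non-unitary matrix for `dft_unitary(n_a)` generally breaks the final assertion, which matches the remark that unitarity is doing real work here.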

Nor should it matter whether we consider A first, or B first. Anything that happens at A, within some component of the amplitude distribution, only depends on the A factor, and only happens to the A factor; likewise with B; so the final joint amplitude distribution should not depend on the order in which we consider the evolutions (and he waved his hands a final time).

It seems to me that from here it should be easy to show no communication considering the simultaneous evolution of A and B. Sadly I can't quite see the last step of the argument. I've spent very little time doing actual quantum calculations—this is not what I do for a living—or it would probably be obvious. Unless it's more subtle than it appears, but anyway...

Anyway, if I'm not mistaken—though I'm feeling my way here by mathematical intuition—the no-communication theorem manifests as invariant generalized states of entanglement. From B's perspective, they are entangled with some distant entity A, and that entanglement has an invariant shape that remains exactly the same no matter what happens at A.

To me, at least, this suggests that the apparent non-locality of quantum physics is a mere artifact of the representation used to describe it.

If you write a 3-dimensional vector as "30° west of north, 40° upward slope, and 100 meters long," it doesn't mean that the universe has a basic compass grid, or that there's a global direction of up, or that reality runs on the metric system. It means you chose a convenient representation.

Physics, including quantum physics, is relativistically invariant: You can pick any relativistic frame you like, redo your calculations, and always get the same experimental predictions back out. That we know.

Now it may be that, in the course of doing your calculations, you find it convenient to pick some reference frame, any reference frame, and use that in your math. Greenwich Mean Time, say. This doesn't mean there really is a central clock, somewhere underneath the universe, that operates on Greenwich Mean Time.

The representation we used talked about "joint configurations" of A and B in which the states of A and B were simultaneously specified. This means our representation was not relativistic; the notion of "simultaneity" is arbitrary. We assumed the universe ran on Greenwich Mean Time, in effect.

I don't know what kind of representation would be (1) relativistically invariant, (2) show distant entanglement as invariant, (3) directly represent space-time locality, and (4) evolve each element of the new representation in a way that depended only on an immediate neighborhood of other elements.

But that representation would probably be a lot closer to the Tao.

My suspicion is that a better representation might take its basic mathematical objects as local states of entanglement. I've actually suspected this ever since I heard about holographic physics and the entanglement entropy bound. But that's just raw speculation, at this point.

However, it is important that a fundamental representation be as local and as simple as possible. This is why e.g. "histories of the entire universe" make poor "fundamental" objects, in my humble opinion.

And it's why I find it suspicious to have a representation for calculating quantum physics that talks about a relativistically arbitrary "joint configuration" of A and B, when it seems like each local position has an invariant "distant entanglement" that suffices to determine local evolution. Shouldn't we be able to refactor this representation into smaller pieces?

Though ultimately you do have to retrieve the phenomenon where the experimenters meet again, after being separated by light-years, and discover that they measured the photons with opposite polarizations. Which is provably not something you can get from individual billiard balls bopping around.

I suspect that when we get a representation of quantum mechanics that is local in every way that the physics itself is local, it will be immediately obvious—right there in the representation—that things only happen in one place at a time.

Hence, no faster-than-light communicators. (Dammit!)

Now of course, all this that I have said—all this wondrous normality—relies on the decoherence viewpoint.

It relies on believing that when you measure at A, both possible measurements for A still exist, and are still entangled with B in a way that B sees as invariant.

All the amplitude in the joint configuration is undergoing linear, unitary, local evolution. None of it vanishes. So the probabilities at B are always the same from a global standpoint, and there is no superluminal influence, period.

If you tried to "interpret" things any differently... well, the no-communication theorem would become a lot less obvious.

Part of The Quantum Physics Sequence

Next post: "Decoherence is Simple"

Previous post: "Bell's Theorem: No EPR 'Reality'"

Eliezer, I know your feelings about density matrices, but this is exactly the sort of thing they were designed for. Let ρ_AB be the joint quantum state of two systems A and B, and let U_A be a unitary operation that acts only on the A subsystem. Then the fact that U_A is trace-preserving implies that Tr_A[U_A ρ_AB U_A*] = ρ_B, in other words U_A has no effect whatsoever on the quantum state at B. Intuitively, applying U_A to the joint density matrix ρ_AB can only scramble around matrix entries within each "block" of constant B-value. Since U_A is unitary, the trace of each of these blocks remains unchanged, so each entry (ρ_B)_ij of the local density matrix at B (obtained by tracing over a block) also remains unchanged. Since all we needed about U_A was that it was trace-preserving, this can readily be generalized from unitaries to arbitrary quantum operations including measurements. There, we just proved the no-communication theorem, without getting our hands dirty with a single concrete example! :-)
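For readers who want the index bookkeeping behind this comment, here is one way to spell it out. It is a sketch under my own notational assumptions: joint basis states are labeled by pairs (a, j) with a on the A side and j on the B side, the adjoint (written with a star in the comment) is written with a dagger, and an overline means complex conjugate.

```latex
\begin{align*}
\bigl(\rho_B'\bigr)_{jj'}
  &= \sum_{a} \Bigl[(U_A \otimes I)\,\rho_{AB}\,(U_A^\dagger \otimes I)\Bigr]_{aj,\,aj'} \\
  &= \sum_{a}\sum_{c,c'} (U_A)_{ac}\,(\rho_{AB})_{cj,\,c'j'}\,\overline{(U_A)_{ac'}} \\
  &= \sum_{c,c'} \Bigl(\sum_{a} \overline{(U_A)_{ac'}}\,(U_A)_{ac}\Bigr)(\rho_{AB})_{cj,\,c'j'} \\
  &= \sum_{c,c'} \delta_{c'c}\,(\rho_{AB})_{cj,\,c'j'}
   = \sum_{c} (\rho_{AB})_{cj,\,cj'}
   = (\rho_B)_{jj'} .
\end{align*}
```

The only property of U_A used is that its adjoint times itself is the identity, which is the step that turns the inner sum into the Kronecker delta.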

Scott, I am sure that would be a deeply satisfying explanation, and moreover, I would be able to find a nice concrete example by which I could explain it to all my readers, if only I knew what the hell a density matrix means, physically. Not how to define it, what it means. This information seems to have been left out of physics textbooks and Wikipedia.

I have never really had time to sit down and just study QM properly. I'm sure that the meaning of a density matrix will be completely obvious in retrospect. I'm sure that I will slap myself on the forehead for not getting it earlier. And I'm sure that, once I finally get it, I will be filled with the same feeling of absolute indignation that overtook me when I realized why the area under a curve is the anti-derivative, realized how truly beautiful it was, and realized that this information had not been mentioned anywhere in my goddamned calculus textbook. Why?

Another late response from me as I read through this series again:

"I realized why the area under a curve is the anti-derivative, realized how truly beautiful it was"

Would this be that the curve is the rate-of-change of the area (as the curve goes up, so does the area beneath it)?

Scott: I'm a bit confused by what you're doing here: Tr_A[U_A ρ_AB U_A*] = ρ_B

Specifically, I understand what a trace is, but what does Tr_A mean? I.e., I'm guessing it's not a scalar, given that it looks like multiplying it (if that's the intended operation) by the transformed density matrix of AB gives you a density matrix of B, which presumably has fewer dimensions.

Psy-Kosh: Tr_A just means the operation that "traces out" (i.e., discards) the A subsystem, leaving only the B subsystem. So for example, if you applied Tr_A to the state |0〉|1〉, you would get |1〉. If you applied it to |0〉|0〉+|1〉|1〉, you would get a classical probability distribution that's half |0〉 and half |1〉. Mathematically, it means starting with a density matrix for the joint quantum state ρ_AB, and then producing a new density matrix ρ_B for B only by summing over the A-indices (sort of like tensor contraction in GR, if that helps).
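The two examples in this comment can be reproduced with a small script. This is a sketch that assumes a pure joint state stored as a table psi[a][b], and it normalizes the second state by a factor of √(1/2), which the comment leaves implicit. The function name `reduced_state_b` is made up.

```python
import math

def reduced_state_b(psi):
    """Trace out A from a pure joint state psi[a][b].
    Entry [j][k] is the sum over a of psi[a][j] * conjugate(psi[a][k])."""
    n_a, n_b = len(psi), len(psi[0])
    return [[sum(psi[a][j] * complex(psi[a][k]).conjugate() for a in range(n_a))
             for k in range(n_b)] for j in range(n_b)]

# |0>|1>: A is definitely 0, B is definitely 1.
product_state = [[0, 1],
                 [0, 0]]

# (|0>|0> + |1>|1>) / sqrt(2), normalized here by assumption.
s = math.sqrt(0.5)
correlated_state = [[s, 0],
                    [0, s]]

rho_product = reduced_state_b(product_state)        # all weight on |1>
rho_correlated = reduced_state_b(correlated_state)  # half |0>, half |1>, no off-diagonals
```

The first result is the density matrix of the pure state |1〉; the second is diagonal with 1/2 in each slot, the "classical probability distribution" the comment describes.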

Eliezer: The best way I can think of to explain a density matrix is, it's what you'd inevitably come up with if you tried to encode all information locally available to you about a quantum state (i.e., all information needed to calculate the probabilities of local measurement outcomes) in a succinct way. (In fact it's the most succinct possible way.)

You can see it as the quantum generalization of a probability distribution, where the diagonal entries represent the probabilities of various measurement outcomes if you measure in the "standard basis" (i.e., whatever basis the matrix happens to be presented in). If you measure in a different orthogonal basis, identified with some unitary matrix U, then you have to "rotate" the density matrix ρ to UρU* before measuring it (where U* is U's conjugate transpose). In that case, the "off-diagonal entries" of ρ (which intuitively encode different pairs of basis states' "potential for interfering with each other") become relevant.

If you understand (1) why density matrices give you back the usual Born rule when ρ=|ψ〉〈ψ| is a pure state, and (2) why an equal mixture of |0〉 and |1〉 leads to exactly the same density matrix as an equal mixture of |0〉+|1〉 and |0〉-|1〉, then you're a large part of the way to understanding density matrices.
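Both checks are short enough to run. The sketch below assumes real two-component state vectors and normalizes |0〉+|1〉 and |0〉-|1〉 by √(1/2); the helper names are hypothetical.

```python
import math

def projector(v):
    """Density matrix |v><v| of a pure state with real amplitudes."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def mixture(weighted_states):
    """Density matrix of a classical mixture; each item is (probability, state)."""
    return [[sum(p * projector(v)[i][j] for p, v in weighted_states)
             for j in range(2)] for i in range(2)]

# (1) For a pure state, the diagonal gives back the Born probabilities.
psi = (0.6, 0.8)
rho_pure = projector(psi)
born = [amp ** 2 for amp in psi]

# (2) Two different mixtures, one density matrix.
s = math.sqrt(0.5)
zero, one = (1, 0), (0, 1)
plus, minus = (s, s), (s, -s)
rho_01 = mixture([(0.5, zero), (0.5, one)])
rho_pm = mixture([(0.5, plus), (0.5, minus)])
```

Here `rho_01` and `rho_pm` agree entry by entry (up to rounding), so no local measurement can tell the two preparations apart.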

One could argue that density matrices must reflect part of the "fundamental nature of QM," since they're too indispensable not to. Alas, as long as you insist on sharply distinguishing the "really real" from the "merely mathematical," density matrices might always cause trouble, since (as we were discussing a while ago) a density matrix is a strange sort of hybrid of amplitude vector with probability distribution, and the way you pick apart the amplitude vector part from the probability distribution part is badly non-unique. Think of someone who says: "I understand what a complex number does -- how to add and multiply one, etc. -- but what does it mean?" It means what it does, and so too with density matrices.

Think of someone who says: “I understand what a complex number does—how to add and multiply one, etc. -- but what does it mean?” It means what it does, and so too with density matrices.

But a complex number does mean something intuitive: it represents a rotation, or something like a rotation in whatever system is at hand. Indeed once you understand this it becomes so much easier to work with complex numbers... I too would like an intuition for density matrices that matches.

As for your pedagogical question, Eliezer -- well, the gift of explaining mathematical concepts verbally is an incredibly rare one (I wish every day I were better at it). I don't think most textbook writers are being deliberately obscure; I just think they're following the path of least resistance, which is to present the math and hope each individual reader (after working it through) will have his or her own forehead-slapping "aha!" moment. Often (as with your calculus textbook) that's a serious abdication of authorial responsibility, but in some cases there might really not be any faster way.

To echo Scott, a density matrix is a probability distribution over quantum states - over a set of basis states, specifically. But if you pick a new basis, the same density matrix resolves into a different probability distribution over a different set of quantum states. If you believe that reality does reduce to amplitude flows in configuration space, then that means that one basis, the position basis, is the real one (since it corresponds to different possible states of reality, i.e. distributions of amplitude in configuration space); you can think of a density matrix as a probability distribution, period, and you're done. Density matrices will mean trouble if and only if you want to think of various incompatible choices of basis as equally real.

I would suggest that understanding superdense coding is another test of whether one's explanation of the significance of entanglement works. There is no purely quantum communication at a distance, but there can be a quantum rider on an otherwise classical communication channel.

Not exactly. By writing down a density matrix you specify a special basis (via the eigenvectors) and cannot change this basis and still have some meaning. Eliezer, back in the beginning of the series you wrote about classical phase space and that it is an optional addition to classical mechanics, while QM inherently takes place in configuration space. Well, the density matrix is sort of the same addition to QM. Whether we have (in some basis) a simple state or a linear combination thereof is a physical fact that does not say anything about our beliefs. But if we want to express that a system could be in one of several states with different probabilities (or, equivalently, a lot of equally prepared systems are in one state each and the percentage of systems in a particular state is given), you use a density matrix. Because a hermitian matrix encodes exactly two pieces of information: it picks an orthogonal basis (the eigenvectors) and stores a number for each of those basis vectors (the eigenvalue). In the case of the density matrix these are the possible states with their probabilities. By the way, that's also why we use hermitian operators/matrices to represent observables: they specify the states the measurement is designed to distinguish together with the results of the measurement in each case.

when I realized why the area under a curve is the anti-derivative, realized how truly beautiful it was, and realized that this information had not been mentioned anywhere in my goddamned calculus textbook. Why?

Have you seen better since? Please can anyone recommend a high quality, intuitive, get-what-it-actually-means, calculus tutorial or textbook, preferably with exercises, and if so, please could you share the link?

While I'm asking, same question for Statistical hypothesis testing. Thanks.

14 years too late, but I can never pass on an opportunity to recommend "Essence of Calculus" by 3blue1brown on youtube.

It is a series of short clips, explaining Calculus concepts and core ideas without too much formalism and with plenty of geometric examples.

Eliezer; My suspicion is that a better representation might take its basic mathematical objects as local states of entanglement.

A local state of entanglement is a bit of an oxymoron. Though you could take one part of an entangled state and trace over all distant degrees of freedom. You'd end up with a "reduced density matrix". If you did that for each part separately, you'd have a set of reduced density matrices. To reconstitute the whole, you need some extra information as well, about how they fit together. In any case, that would be one way to pursue this program; a potentially more sophisticated version of searching for the "elements of reality" in the tensor factors of the global quantum state, which is a bit like using entanglement to define what is local, rather than vice versa.

If I had the time myself (and maybe I'll make the time), I would be trying to pursue this line of thought in the context of "M(atrix) theory", which I believe is a limit of string theory that reduces to point objects ("D0-branes") connected by a web of strings (the rows and columns of the capital-M Matrix correspond to the D0-branes, the matrix elements to the strings connecting them). There are many people who think that those are the fundamental degrees of freedom of string theory, and it has the Machian, particulate, geometry-independent feel that one might expect of the bottom level. You would then be trying to piece together the quantum state of the universe from reduced D0-brane density matrices, basically. But I think that if you could do this, you wouldn't need the many-worlds perspective any more. These quasilocal component quantum states would not be further reduced to amplitude distributions on little configuration spaces; they would be the ultimate states of things themselves.

Mitchell: No, even if you want to think of the position basis as the only "real" one, how does that let you decompose any density matrix uniquely into pure states? Sure, it suggests a unique decomposition of the maximally mixed state, but how would you decompose (for example) ((1/2,1/4),(1/4,1/2))?
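To make the non-uniqueness concrete, here are two different recipes that both produce the matrix ((1/2,1/4),(1/4,1/2)). This is an illustrative sketch, not part of the original comment: states are real polarization vectors (cos θ ; sin θ), and the specific angles are my own worked example.

```python
import math

def mixture(weighted_angles_deg):
    """Density matrix for a classical mixture of real polarization states,
    each given as (probability, angle in degrees)."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for p, angle in weighted_angles_deg:
        t = math.radians(angle)
        v = (math.cos(t), math.sin(t))
        for i in range(2):
            for j in range(2):
                rho[i][j] += p * v[i] * v[j]
    return rho

target = [[0.5, 0.25], [0.25, 0.5]]

# Eigen-decomposition: 3/4 of the 45-degree state, 1/4 of the 135-degree state.
rho_eigen = mixture([(0.75, 45), (0.25, 135)])

# A different recipe: equal parts of the 15-degree and 75-degree states.
rho_other = mixture([(0.5, 15), (0.5, 75)])

for rho in (rho_eigen, rho_other):
    for i in range(2):
        for j in range(2):
            assert abs(rho[i][j] - target[i][j]) < 1e-12
```

Neither recipe is privileged by the matrix itself, which is the difficulty the question is pressing on.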

Scott: Think of someone who says: "I understand what a complex number does -- how to add and multiply one, etc. -- but what does it mean?"

Sounds like a perfectly legitimate question to me. Feynman's excellent answer is that, in the context of QM, it means a little 2D arrow.

I say this tongue-in-cheek and completely seriously at the same time.

The idea that density matrices summarize locally invariant entanglement information is certainly helpful, but I still don't know how to start with a density matrix and visualize some physical situation, nor can I take your proof and extract back out an argument that would complete the demonstration in this blog post. I confess this is strictly a defect of my own education, but...

But still, surely you see the difference between saying "Now let this be a trace-preserving operation on this density matrix that is the outer product of the AB state," and saying "Now split up the joint amplitude distribution on A and B according to distinct states of B, and let anything whatsoever happen to the A side; since the evolution is unitary, it won't change the squared modulus of any B-group of states, hence it won't change the perceived probabilities at B."

Recovering irrationalist, no, I've never seen a good calculus textbook in my life. Admittedly my requirements are unusual: The book I've always wanted to read is "The Pure Joy of Calculus For People Who Are Good At Math", rather than "A Giant Dull Tome of Calculus For Students Who Would Rather Be Somewhere Else" or "Calculus for Nitpicking Formalists".

Scott: Thanks, that clarifies it. (That is, the thing I have now looked up and found to be called the partial trace.)

Scott and Eliezer: As far as what density matrices "really physically are"... well, this isn't an answer so much as a notion of what form the answer might take: Since the laws of probability and rationality are LAWS rather than "just good ideas", it isn't entirely shocking that there'd be some mathematical object that would seem to act like the place where the territory and map meet. More to the point, some mathematical object related to the physics that says "this is the most accurate your map can possibly be given the information of whatever is going on with this part/factor of reality."

Alternately, since our map has to be encoded physically (i.e., the map isn't the territory, but the territory contains the map), it's not too shocking that there's something about reality that could tell us something semidirectly about maps.

I may just be reshuffling my confusion here. In fact, I'm pretty sure I am, but it does seem to me that something like what I said above is possibly "what a density matrix 'really' is".

"The idea that density matrices summarize locally invariant entanglement information is certainly helpful, but I still don't know how to start with a density matrix and visualize some physical situation, nor can I take your proof and extract back out an argument that would complete the demonstration in this blog post. I confess this is strictly a defect of my own education, but..."

From what I understand (which is admittedly not much; I could well be wrong), a density matrix is the thingy that describes the probability distribution of the quantum system over all possible states. Suppose that you have a set of quantum states A1...An. The density matrix is a way of describing a system that, say, has a 75% chance of being in A1 and a 25% chance of being in A2, or a 33% chance of being in A1 or A2 or A4, or whatever. You can then plug the density matrix into the standard quantum equations, but everything you get back will have one extra dimension, to account for the fact that the system you are discussing is described by a distribution rather than a pure quantum state.
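The description above can be made concrete with a small sketch. It uses only the Python standard library; the two-dimensional states `A1`, `A2`, `D1`, `D2` and the helper names `outer` and `mix` are assumptions chosen for illustration. It builds the "75% chance of A1, 25% chance of A2" density matrix, and also shows that two different mixtures can give the same density matrix, which is why a density matrix alone does not pin down a unique probability distribution over pure states.

```python
import math

def outer(v):
    """Outer product |v><v| of a 2-component state vector."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def mix(weighted_states):
    """Density matrix sum_k p_k |v_k><v_k| from (probability, state) pairs."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for p, v in weighted_states:
        o = outer(v)
        for i in range(2):
            for j in range(2):
                rho[i][j] += p * o[i][j]
    return rho

# Hypothetical basis states A1, A2 and the "diagonal" states built from them.
A1, A2 = [1.0, 0.0], [0.0, 1.0]
r = 1 / math.sqrt(2)
D1, D2 = [r, r], [r, -r]

rho_75_25 = mix([(0.75, A1), (0.25, A2)])   # 75% A1, 25% A2
rho_basis = mix([(0.5, A1), (0.5, A2)])     # 50/50 over A1, A2
rho_diag = mix([(0.5, D1), (0.5, D2)])      # 50/50 over D1, D2

print(rho_75_25)
print(rho_basis)
print(rho_diag)
```

The last two matrices agree up to floating-point rounding, even though they were built from different ensembles of pure states.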

The gist of Scott Aaronson's proof is (again, if I understand correctly): Suppose that you have two quantum systems, A and B. List the Cartesian product over all possible states of A and B (A1B1, A2B1, A3B1, etc., etc.). Use a density matrix to describe a probability distribution over these states (10% chance of A1B1, 5% chance of A1B2, whatever). Suppose that you are physically located at system A, and you fiddle with the density matrix using some operator Q. Using some mathematical property of Q which I don't really understand, you can show that, after Q has been applied, another person's observations at B will be the same as their earlier observations at B (ie, the density matrix after Q acts the same as it did before Q, so long as you only consider B).
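A minimal numerical sketch of that gist, for the special case where the operation Q at A is a unitary (the general trace-preserving case is not covered here). The amplitude tables, the helper names `apply_on_A`, `reduced_B` and `rotation`, and the particular angles are all assumptions made for illustration. The joint state is stored as a table `psi[a][b]` of amplitudes, and B's reduced density matrix is obtained by summing over A's index.

```python
import cmath
import math

def apply_on_A(U, psi):
    """Apply a 2x2 operator U to the A index of the amplitude table psi[a][b]."""
    return [[sum(U[a][k] * psi[k][b] for k in range(2)) for b in range(2)]
            for a in range(2)]

def reduced_B(psi):
    """Partial trace over A: rho_B[b][c] = sum_a psi[a][b] * conj(psi[a][c])."""
    return [[sum(psi[a][b] * psi[a][c].conjugate() for a in range(2))
             for c in range(2)] for b in range(2)]

def rotation(theta, phase=0.0):
    """A 2x2 unitary: rotation by theta with a relative phase on one column."""
    c, s = math.cos(theta), math.sin(theta)
    p = cmath.exp(1j * phase)
    return [[c, -s * p], [s, c * p]]

r = 1 / math.sqrt(2)
states = [
    [[0.0, r], [-r, 0.0]],         # oppositely polarized pair
    [[0.6, 0.0], [0.48, 0.64]],    # an arbitrary partially entangled state
]
U = rotation(0.3, phase=1.1)       # whatever Alice decides to do at A

max_diff = 0.0
for psi in states:
    before = reduced_B(psi)
    after = reduced_B(apply_on_A(U, psi))
    for b in range(2):
        for c in range(2):
            max_diff = max(max_diff, abs(before[b][c] - after[b][c]))

print(max_diff)  # zero up to rounding: B's reduced density matrix is unchanged
```

Changing the arguments to `rotation` changes the joint state but leaves `reduced_B` the same, which is the sense in which nothing Alice does at A is visible in observations made only at B.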

Eliezer: "Why can't you signal using an entangled pair of photons that both start out polarized up-down? By measuring A in a diagonal basis, you destroy the up-down polarization of both photons. Then by measuring B in the up-down/left-right basis, you can with 50% probability detect the fact that a measurement has taken place, if B turns out to be left-right polarized ... the answer turns out to be simple: If both photons have definite polarizations, they aren't entangled."

You can adjust this slightly so that answer no longer applies. Start with two entangled photons A and B that we know have opposite polarizations, so they really are entangled. At A we have a detector behind a filter that can be rotated either vertically or at a 45 degree angle. This is our signal source.

At B, we use a mirror that reflects, say, vertically polarized photons and transmits horizontally polarized; then we recombine the beams from slightly different angles onto a detector. So if we were to send B photons that have gone through a vertical or horizontal filter, we get no interference pattern at the detector, but if we send it photons that went through a diagonal filter, one would show up.

Now if we put the diagonal filter on at A, we know the diagonal polarization at B, and therefore do not know the horizontal/vertical polarization, and so we get an interference pattern. If we put the vertical filter on at A, we know the vertical polarization at B, and the interference pattern disappears. Thus we seem to have faster-than-light (or back in time, if you prefer) communication.

(Of course this doesn't actually work, but I think it's a lot harder to explain why in understandable terms.)

Since the laws of probability and rationality are LAWS rather than "just good ideas", it isn't entirely shocking that there'd be some mathematical object that would seem to act like the place where the territory and map meet. More to the point, some mathematical object related to the physics that says "this is the most accurate your map can possibly be given the information of whatever is going on with this part/factor of reality."

That's a beautiful way of putting it, which expresses what I was trying to say much better than I did.

That is a good puzzle, Jeff. I remember debating a similar experiment a few years ago on a mailing list. At the time I discovered an analysis of a related idea at http://www.flownet.com/gat/QM.pdf.

The conclusion was that you don't get interference regardless of what you do at the other end, because the paths are potentially distinguishable. There would only be interference when paths are indistinguishable. In other words, photons in the entangled state are kind of weird. I think in terms of the discussion above, you might say that such a photon needs to be described by a density matrix rather than a state vector. This makes it work a little differently than photons from simpler sources.

It's particularly annoying that nobody gave me an answer, because the answer turns out to be simple

The answer you give might be the simple answer to the question you asked, but I have trouble figuring out what you were suggesting in the first place (even given the answer!). Figuring out the question, the confusion, so that it's possible to supply the simple answer is really hard.

Similarly, I'm skeptical of your claim that your calculus text failed an easy chance to communicate insight. I've had a lot of bad experiences with textbooks, where I eventually figure it out, perhaps from another source, come back and can't see what was wrong with the book. If there is something worthy of indignation for its not being shared, why don't you share it?

Let me try, and likely fail, to communicate mathematical insight: matrices are evil. Moving to matrices involves choosing a basis. Usually, as in Scott's example, you just want a direct sum decomposition; it's more natural, and it doesn't clutter the problem with unnecessary entries or indices.

Scott: how does that let you decompose any density matrix uniquely into pure states?

Yes, that was a very bad thing I said. Because even if my density matrix is diagonal in one basis, if I nontrivially change basis it will no longer be diagonal, and so won't be interpretable as a straightforward probability distribution over the new basis states. I must retract, retreat, and retrace my steps.

Scott: Thanks. Though "eeeew" at all my typos. Anyways, there're still aspects I'm unsure of, but hopefully with more reading, playing around with the relevant mathematical dohickies, and thinking, I'll gain a better grasp.

I find myself unable to give an "interpretation-independent" account of what a density matrix is that would be any advance on what's already been said. It is, among other things, a way to represent a probability distribution over quantum states, but you can get the same density matrix from different starting points; but that is less of a problem if you have decided in advance that only certain starting points (e.g. configuration basis, position eigenstates) correspond to reality; that's what I should have said. And the response of a diehard "positionist" to Scott's challenge would be, I think, that any density matrix which cannot be reduced to a mixture of position-basis pure states could only have arisen as the reduced density matrix describing some part of a larger pure state.

I suppose the bigger question is whether the formal ability to change basis in Hilbert space, or even to work independently of any basis at all, is a principle which should be accorded the same significance as, say, special relativity. There is a curious argument for the priority of position, utilizing probability currents and "weak-valued measurements". In that paper it is expressed in the context of Bohm-like hidden variables theories, but I wonder if it can be transposed into a many-worlds perspective.

Hal Finney: "...at the time I discovered an analysis of a related idea at http://www.flownet.com/gat/QM.pdf".

That's a great link--thanks! That puzzle puzzled me for years, ever since I read about some EPR experiments in Scientific American as a kid, and wondered why they didn't just tweak the experiment a bit to make it actually interesting. That paper is the best explanation I've seen by far.

The conclusion was that you don't get interference regardless of what you do at the other end, because the paths are potentially distinguishable.

That's not quite true. The conclusion was that there actually is interference at the other end, but there are two interference patterns that cancel each other out and make it appear that there is no interference. You can apparently produce interference by bringing (classical) information back from one end of the experiment to the other, but you aren't really creating it, you are just "filtering out" interference that was already there.
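A toy illustration of that cancellation (a sketch, not the analysis from the linked paper). It assumes that the pattern at one end, conditioned on the partner's outcome, is a fringe for one outcome and the complementary anti-fringe for the other, and that both outcomes are equally likely. The function name `fringe`, the outcome labels and the functional form are assumptions chosen for simplicity.

```python
import math

def fringe(x, partner_outcome):
    """Toy conditional intensity at screen position x (arbitrary units)."""
    sign = 1 if partner_outcome == "+" else -1
    return 0.5 * (1 + sign * math.cos(x))

positions = [i * math.pi / 4 for i in range(9)]

# What the local observer sees without any information from the other end:
# an equal-weight sum of the two conditional patterns.
total = [0.5 * fringe(x, "+") + 0.5 * fringe(x, "-") for x in positions]

# What appears after sorting the hits using the partner's (classical) results.
sorted_plus = [fringe(x, "+") for x in positions]

print(total)        # flat: the fringes and anti-fringes cancel
print(sorted_plus)  # fringes, visible only after filtering
```

In this toy model the flat total never depends on anything done at the other end; the fringes only show up once the outcome labels have been brought back by ordinary means.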

Hal Finney: From that pdf

Spooky correlations between separate photons were demonstrated in an experiment at the Royal Signals and Radar Establishment in England. In this simplified depiction, a down-converter sends pairs of photons in opposite directions. Each photon passes through a separate two-slit apparatus and is directed by mirrors to a detector. Because the detectors cannot distinguish which slit a photon passes through each photon goes both ways generating an interference pattern.... Yet each photon's momentum is also correlated with its partner's. A measurement showing a photon going through the upper left slit would instantaneously force its distant partner to go through the lower slit on the right.

I'm so glad that paper says measurement is entanglement, because what I've been thinking is that in a two-slit experiment dimmed to one photon at a time, it is consecutive photons that interfere with each other, rather than each photon interfering with itself.

Also everything so far from this series says that polarization also determines the slit that is chosen.

What would happen if you had three sources on a rotating platform each taking turns firing at the two slits?

(I can't find the "rerun" version of this page, so am posting my questions here).

1. For all these types of experiments, how do they "aim" the particle so it hits its target from far away? It would seem that the experimenters would know pretty much where the particle is when it shoots out of the gun (or whatever), so would not the velocity be all over the place? In the post on the Heisenberg principle, there was an example of letting the sun shine through a hole in a piece of paper, which caused the photons to spread pretty widely, pretty quickly.
2. Does the polarization vector change as the photon moves along? It seems to be very similar to a photon's "main" wave function, as it can be represented as a complex number (and is even displayed as an arrow, like Feynman uses). But I know those Feynman arrows spin according to the photon's wavelength.
3. Finally - and this is really tripping me up - why can we put in the minus sign in the equation that you say "we will need" later, instead of a + sign? If you have two blobs of amplitude, you need to add them to get the wave function, yes? If that is not the case, I have SEVERELY misunderstood the most basic posts of this sequence.

You have already asked these 3 questions and had them answered: http://lesswrong.com/lw/btv/seq_rerun_on_being_decoherent/6f5f

To clarify the answer at point 3, if you phase shift by half a cycle and add, well, that's called 'subtraction'.

Thanks; sorry about the duplicate question post, I had not been able to find the "replay" version of this particular article.

There must be something I'm missing here. The previous post pretty definitively proved to me that the no communication clause must be false.

Consider the latter two experiments in the last post:

A transmitted 20°, B transmitted 40°: 5.8%

A transmitted 0°, B transmitted 40°: 20.7%

Let's say I'm on Planet A and my friend is on Planet B, and we are both constantly receiving entangled pairs of photons from some satellite stationed between us. I'm filtering my photons on planet A at 20°, and my friend on planet B is filtering his at 40°. He observes a 5.8% chance that his photons are transmitted, in accordance with the experiment. I want to send him a signal faster than light, so I turn my filter to 0°. He should now observe that his photons have a 20.7% chance of being transmitted.

This takes some statistical analysis before he can determine that the signal has really been sent, but the important part is that it makes the speed of sending the message not dependent on the distance, but on the number of particles sent. Given a sufficient distance and enough particles, it should be faster than light, right?

Those are the probabilities that both halves of a pair of photons are transmitted, so you can't determine them without the information from both detectors. The distribution at each individual detector doesn't change, it's the correlation between them that changes.

The distribution at each individual detector doesn't change, it's the correlation between them that changes.

... And to calculate this correlation one needs to transmit information by classical means, no faster than light.
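To see this with all the numbers filled in, here is a minimal sketch. It assumes the joint probabilities for oppositely polarized photons take the form ½·sin²(θA − θB) for "both transmitted" (and for "both absorbed") and ½·cos²(θA − θB) for each of the two mixed outcomes; that formula is an assumption here, chosen because it reproduces the 5.8% and 20.7% figures quoted above. The function names are made up for illustration.

```python
import math

def joint_probs(theta_a_deg, theta_b_deg):
    """Joint outcome probabilities, keyed by (A transmitted?, B transmitted?)."""
    d = math.radians(theta_a_deg - theta_b_deg)
    same = 0.5 * math.sin(d) ** 2       # both transmitted, or both absorbed
    different = 0.5 * math.cos(d) ** 2  # exactly one of the two transmitted
    return {(True, True): same, (False, False): same,
            (True, False): different, (False, True): different}

def marginal_b_transmitted(theta_a_deg, theta_b_deg):
    """Probability that B is transmitted, ignoring what happened at A."""
    p = joint_probs(theta_a_deg, theta_b_deg)
    return p[(True, True)] + p[(False, True)]

for theta_a in (20, 0):
    p = joint_probs(theta_a, 40)
    print(theta_a,
          round(p[(True, True)], 3),                     # depends on A's setting
          round(marginal_b_transmitted(theta_a, 40), 3)) # does not
```

The "both transmitted" column changes from about 0.058 to about 0.207 when A's filter is turned from 20° to 0°, but the friend on Planet B, who only sees his own detector, gets 50% transmission either way.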

Oh. I can imagine a distribution that looks like that. It would have been helpful if he had given us all the numbers. Perhaps he does in this blog post, but I got confused part way through and couldn't make it to the end.

Would it look like this?

From the decoherent point of view, the no-communication theorem is fairly simple (if you are comfortable with tensor products*). Suppose that Alice and Bob are studying the two quantum systems $A$ and $B$, whose state spaces are represented by Hilbert spaces $H_A$ and $H_B$, respectively. Then the state space of the joint system is $H := H_A \otimes H_B$. Now suppose that Alice makes a measurement on** system $A$, and Bob makes a measurement on system $B$. These measurements are represented physically by unitary transformations $U_A: H_A \rightarrow H_A$ and $U_B: H_B \rightarrow H_B$. The effects of the measurements on the joint system are therefore represented by the unitary transformations $V_A = U_A \otimes I_B$ and $V_B = I_A \otimes U_B$, where $I_A$ and $I_B$ are the identity transformations on $H_A$ and $H_B$, respectively. The key to the no-communication theorem is the observation that the transformations $V_A$ and $V_B$ commute with each other. (Either way you take the product you get $U_A \otimes U_B$.) It implies that if we do our calculations assuming that Alice did her measurement first, then we will get the same answers as if we do our calculations assuming that Bob did his measurement first. So let's do our calculations assuming that Bob measured first, as it will be easier to analyze that way.

After Bob makes his measurement, the amplitude of the universe is split up into two blobs, one corresponding to Bob recording Possible Outcome 1 and another corresponding to Bob recording Possible Outcome 2. The size of these blobs, as measured by square-integrating, is independent of anything that Alice does (since according to this formulation of the problem, Alice hasn't done anything yet). Now when Alice makes her measurement, the size of the blobs is preserved because of unitarity. Moreover (and this is the crucial point) the blob corresponding to Outcome 1 gets mapped to another blob corresponding to Outcome 1, and the blob corresponding to Outcome 2 gets mapped to another blob corresponding to Outcome 2. Thus, the final size of the blobs corresponding to the different outcomes is independent of Alice's choice, and according to the Born probabilities that means Bob's expectations about his measurement are also independent of Alice's choice.
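The two steps above can be written out as a short worked equation. This is only a sketch: the projectors $P_1$ and $P_2$ onto Bob's recorded outcomes, the shorthand $\Pi_k = I_A \otimes P_k$, and the symbol $\psi$ for the joint state after Bob's measurement are notation assumed here for illustration.

$$
\begin{aligned}
V_A V_B &= (U_A \otimes I_B)(I_A \otimes U_B) = U_A \otimes U_B = (I_A \otimes U_B)(U_A \otimes I_B) = V_B V_A, \\
\lVert \Pi_k V_A \psi \rVert^2 &= \lVert V_A \Pi_k \psi \rVert^2 = \lVert \Pi_k \psi \rVert^2, \qquad k = 1, 2.
\end{aligned}
$$

In the second line, the first equality holds because $V_A$ acts only on $H_A$ and so commutes with $\Pi_k$ (Outcome $k$ blobs are mapped to Outcome $k$ blobs), and the second holds because $V_A$ is unitary (blob sizes are preserved).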

The fact that outcomes are preserved under Alice's action is worth remarking further on. Intuitively, it corresponds to the fact that recorded measurements don't erase themselves randomly. Scientifically, it corresponds to the complicated phenomenon known as decoherence, which is much harder to describe rigorously than the no-communication theorem is. Philosophically, it corresponds to the fact about the world that the Copenhagen interpretation thinks of as an assumption, and which many-worlders think too complicated to be considered a fundamental assumption of physics.

* For those not familiar with tensor products, they are the mathematical objects Eliezer is implicitly talking about whenever he writes things like "(Human-LEFT Sensor-LEFT Atom-LEFT) + (Human-RIGHT Sensor-RIGHT Atom-RIGHT)". A working definition is that the tensor product of an M-dimensional space with an N-dimensional space is an MN-dimensional space.

** And/or a modification to system $A$; the composition of any number of measurements and modifications will always be represented by a unitary transformation.

A final remark: The no-communication theorem, as I've sketched it above, shows that entangled but noninteracting particles cannot be used for distant communication. It says nothing about faster-than-light communication, as it does not make the connection between the ability of particles to interact and the speed of light, a connection which requires more formalism. The fact that FTL communication is impossible is a theorem of quantum field theory, the relativistic version of quantum mechanics. The basic idea is that the evolution operators corresponding to spacelike separated regions of spacetime will commute, allowing the above argument to take place with $V_A$ and $V_B$ replaced by more realistic operators.

By the way, has anyone else noticed that math symbols don't always work in LessWrong markup? I originally posted code which I had compiled from LaTeX to markup at the suggested website and then double-checked the markup output at http://markdownr.com/, but when I posted here there were errors which didn't come up on either of the previous sites. (I think I've fixed all the errors now though...)

This would be a lot less annoying if it were possible to preview a comment before posting it...

This post is tagged "signaling". :-)