# cousin_it's Shortform

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for short-form writing by cousin_it. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.

## New to LessWrong?

(1/?)

Finally got around to learning quantum computation. Here’s some notes on the absolute basics, please correct if I’m wrong.

The state of an N-qubit quantum computer is an amplitude distribution on all possible N-bit strings. Let’s define amplitudes as real numbers that (unlike probabilities) can range from −1 to 1 and whose squares sum to 1. So an amplitude distribution is a unit vector in a real 2^N-dimensional space.

(Some people will say amplitudes should be complex numbers, but I read somewhere that using real amplitudes still gives us all the power of quantum computation, and is easier for me to think about.)

Here’s some ways to write amplitude distributions:

|0> is a distribution where the bit string “0” has amplitude 1, and everything else (i.e. the bit string “1″) has amplitude 0.

( |0> + |1> ) / √2 is a distribution where bit strings “0” and “1″ have amplitude 1/√2 each.

|01> is a distribution where the bit string “01” has amplitude 1. We can factor it as |0> ⊗ |1>. The operation ⊗ makes a joint distribution from two independent ones. (It’s called tensor product, but you don’t need to learn it unless you want to.)

( |00> + |01> ) / √2 is a distribution where bit strings “00” and “01″ have amplitude 1/√2 each. It can be factored as |0> ⊗ ( |0> + |1>) / √2, so the two qubits are independent.

( |00> + |11> ) / √2 is a distribution that can’t be factored, so the two qubits are dependent, or “entangled”.

Here’s the operations that a quantum computer can do in constant time:

1. Initialize the qubits with given values.

2. Apply a simple logical operation across all bit strings. For example, if you have the distribution ( |00> + |11> ) / √2 and want to flip the first bit in each string, that gives ( |10> + |01> ) / √2. The operation has to be reversible, e.g. XOR is ok but AND isn’t.

3. Rotate a single qubit by an angle. For example, rotating |0> by 45 degrees becomes ( |0> + |1> ) / √2. To deal with entangled distributions, you need to factor and rotate each component individually, e.g. rotating the first qubit in ( |00> + |11> ) / √2 by 45 degrees will give ( |00> + |10> + |11> - |01> ) / 2. Sometimes things will cancel out, e.g. rotating ( |0> + |1> ) / √2 by 45 degrees will give |1>. This clever use of rotation to reduce randomness is basically the reason for quantum computation’s extra power.

4. Sample a bit string from the distribution, with probabilities according to squared amplitudes. We can only do this once.

That set of operations is enough for everything.

Strong upvote for the very clear explanation of the basics. I would definitely read any further posts elaborating on this -- for example, if you explained some of the simple quantum gates and maybe an algorithm or two that asymptotically outperforms its classical analog.

Oh wait. Here:

( |00> + |01> - |10> + |11> ) / 2

Are the signs of |01> and |10> reversed by mistake? I would think the components would rotate as follows:

• |00> goes to (|00> + |10>) / √2
• |11> goes to (|11> - |01>) / √2

Whoops. Thanks!

It is interesting if quantum computers' computations could be used as evidence against "we are in simulation on a classical computer". Quantum supremacy can't be modeled on a classical computer. But the simulators could easily overcome the obstacle by using quantum computers too. However, it creates some limits about the nature of their hardware.

One could imagine an experiment to test if we are in a classical simulation by performing a very complex computation of a quantum computer. But this could also end our world.

Would it be correct to say that (2) and (3) can be replaced with just "apply any linear operator"?

Also, what advantages does working with amplitudes have compared to working with probabilities? Why don't we just use probability theory?

Any unitary linear operator (to make sure squared amplitudes still sum to 1). To your second question, I'll post a toplevel comment.

Hey, I've got a sudden question for you. Probability distribution on a set of binary variables is to a quantum state as ??? is to a unitary linear operator.

What should ??? be replaced with?

Here's why I have this question. Somehow I was thinking about Normalizing flows, which are invertible functions which, when applied to a sample from an N-dimensional probability distribution, transform it into a sample from another N-dimensional probability distribution. And then I thought: isn't this similar to how quantum operator is always unitary? Maybe then I can combine encoding an image as a pure state (like in Stoudenmire 2016 - Supervised learning with quantum-inspired tensor networks with representing quantum operators as tensor networks to get a quantum-inspired generative model similar to normalizing flows.

(2/?)

Mixed states.

As a lead-in question, why do we use only reversible logic? Why can't we erase a qubit? We can, but it's not pretty. Let's say we have the state ( |00> + |11> ) / √2 and want to erase the first qubit, ideally ending up with ( |0> + |1> ) / √2. Here's a couple things we could try:

1. Pry away the first qubit and send it off in a spaceship

2. Measure the first qubit and then forget the classical bit

The two methods aren't mutually exclusive, because the guy in the spaceship could decide to measure the qubit as well. We shouldn't be able to instantly learn that, so both methods should give the same result. But it's easy to compute what happens in method 2, because we start with a superposition of bit strings "00" and "11" with amplitude 1/√2 each. Either the measurement yields 0 and the remaining state is |0>, or the measurement yields 1 and the remaining state is |1>, with probability 1/2 each. Forgetting the bit, we end up with a classical probability distribution over quantum amplitude distributions: |0> or |1> with probability 1/2 each.

That's a new kind of object, a "mixed" state. It's different from our desired ( |0> + |1> ) / √2, because rotating the latter by 45 degrees and then measuring it leads to 1 with certainty, while for the former it's not true. Moreover, it's different from every "pure" state! Because for every pure state there's some rotation that makes it 1 with certainty, but rotating our mixed state by any angle and then measuring it always gives 0 or 1 with probability 1/2 each.

Okay, that's weird thing #1. Erasing a qubit doesn't work on individual bit strings as you'd expect. Instead it gives us something new - a mixed state with both classical probabilities and quantum amplitudes.

Weird thing #2: our mixed state "|0> or |1> with probability 1/2 each" can also be described in other ways, like "( |0> + |1> ) / √2 or ( |0> - |1> ) / √2 with probability 1/2 each". Don't believe me? With both these descriptions, you can check that rotating by any angle, possibly negating, and then measuring always leads to 0 or 1 with probability 1/2 each. So they are indistinguishable by all our allowed tools.

In fact, since our mixed state is invariant under rotation, it has infinitely many descriptions - just rotate it by any angle. For a mixed state, there's no fact about which pure states ("worlds") live inside it. This mystery is kind of similar to the Born probabilities, in that you can spend lots of effort trying to figure it out, but as a student it's better to move on. There's more to learn.

(There is a unique way to describe mixed states, just not in terms of pure states. It's called density matrices. I might come up with a simple explanation someday, but for now just check Wikipedia.)

(3/?)

Why don’t we just use probability theory?

Imagine Bob works at a qubit recycling plant. He receives qubits, rotates them by 45 degrees, and measures them. Whenever he gets 1 as a result, he's happy.

On Monday Bob gets a bunch of qubits in state |0>. Rotating them by 45 degrees gives ( |0> + |1> ) / √2. When measured, half of those yield 0 and half yield 1. So on Monday Bob is happy 50% of the time.

On Tuesday Bob gets a bunch of qubits in state |1>. Rotating them by 45 degrees gives ( |1> - |0> ) / √2. When measured, half of those yield 0 and half yield 1, just like on Monday. So on Tuesday Bob is happy 50% of the time.

On Wednesday Bob gets a bunch of qubits in state ( |0> + |1> ) / √2. Isn't that just a mixture of qubits from Monday and Tuesday, so Bob should be happy 50% of the time? Nope! Rotating this state by 45 degrees gives |1>, which always yields 1 when measured. So on Wednesday Bob is happy all the time.

That's why amplitudes are different from probabilities. If Wednesday's qubits were a probability-based mixture of Monday's and Tuesday's, only half of them would make Bob happy. But with an amplitude-based mixture, all of them make Bob happy.

(4/?)

Start with the state ( |00> + |11> ) / √2. Give the first qubit to Alice, the second to Bob, and send them far apart. Consider these hypothetical random variables:

A: what Alice would get by measuring her qubit

B: what Bob would get by measuring his qubit

A': what Alice would get by rotating her qubit by 30 degrees and then measuring

B': what Bob would get by rotating his qubit by -30 degrees and then measuring

Each of these variables is a fair coinflip (0 or 1 with probability 1/2 each), but they are not independent. Grab a pen and check:

1. If A and B are measured, they are always equal

2. If A and B' are measured, they are equal with probability 3/4

3. If A' and B are measured, they are equal with probability 3/4

4. If A' and B' are measured, they are equal with probability 1/4

That's spooky. From (1)-(3) it follows that A' and B' are equal with probability at least 1/2, which contradicts (4). But reality still satisfies all four, and you'll never catch it on a contradiction, because you can't measure a qubit twice.

(11/?)

Superdense coding.

Alice is told two classical bits of information and sends a qubit to Bob, who can then recover the two classical bits. Again, it relies on Alice and Bob sharing a prepared pair beforehand. It's the opposite of quantum teleportation, where Alice sends two classical bits and Bob can recover a qubit.

First let's talk about bases. This is the usual basis: |00>, |01>, |10>, |11>. This is the Bell basis: (|00> + |11>)/√2, (|00> - |11>)/√2, (|10> + |01>)/√2, (|10> - |01>)/√2. Check for yourself that each two vectors are orthogonal to each other. To "measure a state in a different basis" means to apply a rotation from one basis into another, then measure. For example, if you have a state and you know that it's one of the Bell basis states, you can figure out which one, by rotating into the usual basis and measuring.

One cool thing about the Bell basis is that you can change any basis vector into any other basis vector by operations on the first qubit only. For example, rotating the first qubit by 90 degrees turns (|00> + |11>)/√2 into (|10> - |01>)√2. Flipping the first qubit gives (|10> + |01>)√2, and so on.

Now superdense coding is easy. Alice and Bob start by sharing two halves of a Bell state. Then depending on which two classical bits Alice needs to send, she either leaves the state alone or rotates into one of the other three, by operating only on her qubit. Then she sends her qubit to Bob, who now has both qubits and can rotate them into the usual basis and measure them, recovering the two classical bits.

(10/?)

Quantum teleportation.

It's not a very good name, maybe we should call it "sending qubits over a classical channel using a quantum one-time pad". The idea is that Alice and Bob are far apart from each other and have a prepared pair of qubits, and Alice also has some qubit in unknown state. She can measure it in a clever way and send some classical bits to Bob, which allows him to recreate the unknown qubit. Both qubits of the prepared pair are spent in the process.

Let's say the unknown qubit is a|0>+b|1>, and the prepared pair is (|00> + |11>)/√2. So the whole state is (a|0> + b|1>) ⊗ (|00> + |11>)/√2 = (a|000> + a|011> + b|100> + b|111>)/√2. Alice can access the first two qubits and Bob can access the third.

1. Alice flips the second qubit depending on the first, leading to (a|000> + a|011> + b|110> + b|101>)/√2.

2. Alice applies a combined reflection and rotation to the first qubit, which is known as Hadamard gate: |0> becomes (|0> + |1>)/√2, while |1> becomes (|0> - |1>)/√2. So the whole state becomes (a|000> + a|100> + a|011> + a|111> + b|010> - b|110> + b|001> - b|101>)/2.

3. We can rewrite that as ((|00> ⊗ (a|0> + b|1>)) + (|01> ⊗ (a|1> + b|0>)) + (|10> ⊗ (a|0> - b|1>)) + (|11> ⊗ (a|1> - b|0>)))/2.

4. Notice something funny? In the above state, measuring the first two qubits gives you exactly enough information to know which operation to apply to the third qubit to get a|0> + b|1>. So if Alice measures both her qubits and sends the two classical bits to Bob, he can reconstruct the unknown qubit on his end. And if Carol eavesdrops on the message without having access to Bob's qubit, she won't learn anything, because the four possible messages have probability 1/4 each, regardless of a and b.

If the unknown qubit was entangled with something else, that also survives the transfer, though proving that requires a bit more calculation.

(9/?)

Black box functions and Grover's algorithm.

Say we want our quantum computer to make calls to a black box function f(x) from bit strings to bits. There's a couple ways to formalize that. Phase oracle: |x> becomes -|x> if f(x), otherwise |x>. Binary oracle: |x> ⊗ |b> becomes |x> ⊗ |f(x) xor b>. These definitions only say what happens to basis vectors, and are extended to other vectors by linearity.

We can convert from a binary oracle to a phase oracle by using an extra qubit. Start with a state |x> ⊗ (|0> - |1>)/√2 and apply a binary oracle, we end up with (1/√2) ⋅ (|x1> - |x0> if f(x), otherwise |x0> - |x1>). But that's exactly (-|x> if f(x), otherwise |x>) ⊗ (|0> - |1>)/√2. So |x> has been changed as if by a phase oracle, and the extra qubit was left unchanged.

Grover's algorithm: we have a black box function from N-bit strings to bits, which returns true for only one possible input. We want to find that input. Classically that requires O(2^N) calls to the black box, because you can't do much better than brute force. With a quantum algorithm we can get O(√2^N) which is much faster.

Let's say X is the input we're trying to find, which is one of the basis vectors of the space. Let's also define a vector S = (|0> + |1>)/√2 ⊗ ... ⊗ (|0> + |1>)/√2, which is the sum of all 2^N basis vectors with equal amplitudes.

1. Initialize state as S

2. Apply phase oracle, corresponding to reflection through the plane perpendicular to X

3. Apply "Grover operator", corresponding to reflection through the plane perpendicular to S and then central symmetry around 0

4. Repeat steps 2 and 3 a bunch of times

To understand it geometrically, draw the 2D plane containing the vectors X and S. Their dot product is 1/√2^N, which is a small positive number, so the angle between them is a bit less than 90 degrees. If our state starts out as S, the net effect of (2) and (3) is a slight rotation toward X, and each time we repeat it, we rotate again by the same angle. The angle is 2⋅arcsin(1/√2^N), which is proportional to 2/√2^N, so after (pi/4)⋅√2^N iterations we're pretty close to X. Then we can measure and get X with high probability.

I glossed over implementing the Grover operator, but that shouldn't be too hard, since from the first half of this comment it should be kinda clear how to reflect a vector through a plane.

(6/?)

The last comment didn't have enough rules for calculating with mixed states. Here's some more:

1. To measure a mixed state, the probabilities of all bit strings are on the main diagonal of the density matrix. For example, the pure state |0> has density matrix ((1,0),(0,0)), so measuring it leads to 0 with certainty. The mixed state “|0> or |1> with probability 1⁄2 each” has density matrix ((1/2,0),(0,1/2)), so the probability of each outcome is 1/2. And the pure state ( |0> + |1> ) / √2 has density matrix ((1/2,1/2),(1/2,1/2)), so the probability of each outcome is also 1/2.

2. To apply a reversible logical operation to a mixed state, recall that each possible bit string corresponds to a basis vector. So applying a permutation to the possible bit strings is as simple as swapping around some rows and columns in the density matrix. For example, if we have a qubit |0> with density matrix ((1,0),(0,0)) and want to negate it, we need to swap the two rows and swap the two columns, leading to density matrix ((0,0),(0,1)).

3. You also can apply plain old probabilistic operations to a mixed state, like flip a coin and do something depending on the result. To do that, take a weighted sum of density matrices. For example, if we start with a qubit |0> and want to negate it depending on a coinflip, we take a weighted sum of ((1,0),(0,0)) and ((0,0),(0,1)), which leads to ((1/2,0),(0,1/2)).

4. To rotate a single qubit in a mixed state by an angle, the calculation is a bit involved. First you divide the density matrix into 2x2 blocks, according to which rows and columns correspond to bit strings that differ on only that qubit. Then in each block you perform a change of basis to ((cos(phi), -sin(phi)), (sin(phi), cos(phi))). For the simple case of phi = 45 degrees, each block ((a,b),(c,d)) becomes ((a+b+c+d,a+b-c-d),(a-b+c-d,a-b-c+d))/2. For example, the density matrix ((1,0),(0,0)) representing the pure state |0> becomes the matrix ((1/2,1/2),(1/2,1/2)) representing the pure state ( |0> + |1> ) / √2, as expected.

(5/?)

if only I knew what the hell a density matrix means, physically. Not how to define it, what it means. This information seems to have been left out of physics textbooks and Wikipedia.

I'm here to help!

Recall that with N qubits we have a 2^N dimensional space whose basis vectors correspond to all possible bit strings of length N. A pure state, like ( |00> + |11> ) / √2, is a vector in that space. When we measure a pure state, the probability of each bit string is the squared dot product of the state vector with the corresponding basis vector. We can assume that the state vector is a unit vector, which makes its squared dot products with all basis vectors (i.e. all probabilities) sum to 1. We can also rotate the vector, and probabilities will still sum to 1, because length is invariant under rotation. Math is wonderful.

On the other hand, a mixed state like "|0> or |1> with probability 1/2 each" is a norm on that space. When we measure a mixed state, the probability of each bit string is that norm applied to the corresponding basis vector. Each pure state corresponds to a degenerate norm which does squared projection onto the state vector, so you can think of it as a straight line; whereas most mixed states correspond to non-degenerate norms, so you can think of them as ellipsoids. We can assume that the norm has trace 1, so the norms of all basis vectors (i.e. all probabilities) sum to 1. We can also rotate the norm, or equivalently rotate the basis, and probabilities will still sum to 1, because trace is invariant under rotation. Math is wonderful.

That fixes a small annoyance I noticed this morning. There's no way to distinguish |0> from -|0> by rotation, negation and measurement, so they should be the same. As vectors they are different, but as norms (represented by 2^N x 2^N matrices, known as "density matrices") they are indeed the same! The correspondence between mixed states and density matrices is one-to-one.

The computational rules are very simple:

1. For a basis state, the density matrix has one 1 on the diagonal and 0 everywhere else. For example, the pure state |0> has density matrix ((1,0),(0,0)).

2. For a pure state that's a superposition of basis states, you take the superposition coefficients a_i for i from 1 to 2^N, and build the matrix {a_i*a_j}. For example, the pure state ( |0> + |1> ) / √2 has superposition coefficients (1/√2, 1/√2), so the density matrix is ((1/2,1/2),(1/2,1/2)).

3. For a mixed state that's a probabilistic mixture of pure states, you just take a weighted sum of matrices. For example, the mixed state "|0> or |1> with probability 1/2 each" has density matrix ((1/2,0),(0,1/2)). Note that this matrix, unlike the other two, isn't degenerate - it's a Euclidean norm. Now it's obvious why this mixed state is invariant under rotation, and why no pure state can do the same, which we figured out the hard way in one of the previous comments.

Do you by any chance have a typo here? Sorry if I am wrong, since I don't actually know quantum information theory.

A pure state, like ( |00> + |11> ) / √2, is a vector in that space.

I think this state is mixed, since it's a sum of two vectors, which can't be represented as just one kronecker product.

No. The property which you are describing is not "mixedness" (technical term: "purity"). That the state vector in question can't be written as a tensor product of state vectors makes it an *entangled* state.

Mixed states are states which cannot be represented by *any* state vector. You need a density matrix in order to write them down.

I just thought of a simple way to explain tensors. Imagine a linear function that accepts two numbers and returns a number, let's call it f(x,y). Except there are two ways to imagine it:

1. Linear in both arguments combined: f(1,2)+f(1,3)=f(2,5). Every such function has the form f(x,y)=ax+by for some a and b, so the space of such functions is 2-dimensional. We say that the Cartesian product of R^1 and R^1 is R^2, because 1+1=2.

2. Linear in each argument when the other is fixed: f(1,2)+f(1,3)=f(1,5). Every such function has the form f(x,y)=axy for some a, so the space of such functions is 1-dimensional. We say that the tensor product of R^1 and R^1 is R^1, because 1*1=1.

In this case the tensor product is lower dimensional than the Cartesian product. But if we take say R^3 and R^3, then the Cartesian product will be R^6 and the tensor product will be R^9, because it will have separate coefficients a_(ij)*x_i*y_j.

Edit: no point asking this question here.

Could convolution work?

EDIT: confused why I am downvoted. Don't we want to encourage giving obvious (and obviously wrong) solutions to short form posts?

(12/?)

The uncertainty principle, or something like it.

When you measure a qubit cos(φ)|0> + sin(φ)|1>, the result has variance (1-cos(4φ))/4. (I'm skipping over the trig calculations here and below.) If you have a choice between either measuring the qubit, or rotating it by an angle ψ and then measuring it, then the sum of variances of the two operations is at least (1-|cos(2ψ)|)/2. In particular, if ψ=π/4, the sum of variances is at least 1/2.

So no matter what state a qubit is in, and how precisely you know its state, there are two possible measurements such that the sum of variances of their results is at least 1/2. The reason is that the bases corresponding to these measurements are at an angle to each other, not aligned. Apparently in the real world, an object's "position" and "momentum" are two such measurements, so there's a limit on their joint precision.

You can also carry out one measurement and then the other, but that doesn't help - after the first measurement you have variance in the second. Moreover, after the second you have fresh variance in the first. This lets you get an infinite stream of fair coinflips from a single qubit: start with |0>, rotate by π/4, measure, rotate by π/4, measure...

(8/?)

Decoherence.

Let's say you have a state ( |0> + |1> ) / √2 and want to rotate it by 45 degrees and measure it, expecting to get "1" with certainty. But a bug in your equipment affects some other unrelated qubit, which was |0> before and becomes |0 xor your qubit> afterward. So now you have ( |00> + |11> ) / √2. Now rotating your qubit leads to ( |00> + |10> + |11> - |01> ) / 2, and measuring gives "0" or "1" with probability 1/2 each. Some unrelated qubit being xor'ed with yours caused your qubit to become mixed.

In theory, this is recoverable: find the stray qubit and xor it with yours again. But what if you accidentally measured it? Then you'd have to xor your memory with the qubit, which sounds impractical. Worse, what if you measured it and learned about the problem only in case of "1"? This is why quantum computation is so fragile.

(7/?)

No-cloning and no-deleting theorems.

As a warmup, imagine the state ( |00> + |01> ) / √2 gets changed by a quantum operation into ( |00> + |11> ) / √2. The operation is "flip the first bit if the second is 1". After that, if we have only the first qubit to play with, it's statistically indistinguishable from a single qubit in the mixed state "|0> or |1> with probability 1/2 each". Parts of a pure state, considered in isolation, are mixed states.

Mixed states involve probabilities, which makes us think of measurement. And indeed, the operation above can be seen as either "use the first qubit to measure the second" or "entangle the first qubit with the second". When you're inside a quantum system, it looks like measurement; when you're outside and something inside is doing the measuring, it looks like entanglement. This lets us prove theorems using only linear operations, without worrying about measurement as a special case.

Moreover, every mixed state is a part of some pure state. Recall that every mixed state is an ellipsoid. We can rotate it to be axis-aligned, leading to a density matrix with nonnegative entries on the diagonal and zero everywhere else. These entries are the probabilities of getting each bit string, a.k.a. basis vector. Let's say each basis vector |x_i> has probability p_i. Now we can make a bigger system with twice the dimensions, and build a superposition of basis vectors |x_i> ⊗ |x_i> with amplitudes √p_i, all other vectors get zero. That's a pure state, and the first half of its qubits give our original mixed state, just like the first qubit of the pure state ( |00> + |11> ) / √2 gave us the mixed state "|0> or |1> with probability 1/2 each". This lets us prove theorems using only pure states, without worrying about mixed states.

With the preliminaries out of the way...

No-cloning theorem: let's say we have a qubit in unknown pure state and another qubit prepared as |0>. Then we can't copy the first into the second. Proof: assume we can. Let's say |x> is the rest of the universe. Then |00x> and |10x> should become |00y> and |11z> for some y and z. Then by linearity, (( |0> + |1> ) / √2) ⊗ |0x> becomes ( |00y> + |11z> ) / √2. But then measuring our qubits always gives either "00" or "11", so they aren't independent copies of ( |0> + |1> ) / √2 as we wanted. Done.

No-deleting theorem: let's say we have two qubits in identical unknown pure states. Then we can't erase one of them to |0> while leaving the other and the rest of the universe unchanged. Proof: if we could, we could do it in reverse and achieve cloning. Done.

I'm not sure these theorems and proofs are exactly right. Just looked up the rough theorem statements on Wikipedia and worked out the proofs here in the comment draft. If there are errors, I'm sure people will point them out.

[+][comment deleted]4yΩ24
[+][comment deleted]6moΩ12
[+][comment deleted]4yΩ12
[+][comment deleted]4yΩ12