Treating the density matrix as fundamental is bad because you shouldn't explain with ontology that which you can explain with epistemology.
I've found our Agent Smith :) If you are serious, I'm not sure what you mean. Like there is no ontology in physics -- every picture you make is just grasping at pieces of whatever theory of everything you eventually develop
When you say there's "no such thing as a state," or "we live in a density matrix," these are statements about ontology: what exists, what's real, etc.
Density matrices use the extra representational power they have over states to encode a probability distribution over states. If we regard the probabilistic nature of measurements as something to be explained, putting the probability distribution directly into the thing we live in is what I mean by "explain with ontology."
Epistemology is about how we know stuff. If we start with a world that does not inherently have a probability distribution attached to it, but obtain a probability distribution from arguments about how we know stuff, that's "explain with epistemology."
In quantum mechanics, this would look like talking about anthropics, or what properties we want a measure to satisfy, or Solomonoff induction and coding theory.
What good is it to say things are real or not? One useful application is predicting the character of physical law. If something is real, then we might expect it to interact with other things. I do not expect the probability distribution of a mixed state to interact with other things.
One person's "Occam's razor" may be description length, another's may be elegance, and a third person's may be "avoiding having too much info inside your system" (as some anti-MW people argue). I think discussions like "what's real" need to be done thoughtfully, otherwise people tend to argue past each other, and come off overconfident/underinformed.
To be fair, I did use language like this so I shouldn't be talking -- but I used it tongue-in-cheek, and the real motivation given in the above is not "the DM is a more fundamental notion" but "DM lets you make concrete the very suggestive analogy between quantum phase and probability", which you would probably agree with.
For what it's worth, there are "different layers of theory" (often scale-dependent), like classical vs. quantum vs. relativity, etc., where I think it's silly to talk about "ontological truth". But these theories are local conceptual optima among a graveyard of "outdated" theories that are strictly conceptually inferior to the new ones: examples are heliocentrism (and Ptolemy's epicycles), the ether, etc.
Interestingly, I would agree with you (with somewhat low confidence) that in this question there is a consensus among physicists that one picture is simply "more correct" in the sense of giving theoretically and conceptually more elegant/precise explanations. Except your sign is wrong: this is the density matrix picture (the wavefunction picture is genuinely understood as "not the right theory", but still taught and still used in many contexts where it doesn't cause issues).
I also think that there are two separate things that you can discuss.
Note that for #1, you should not think of a density matrix as a probability distribution on quantum states (see the discussion with Optimization Process in the comments); this is a bad intuition pump. Instead, the thing that replaces probability distributions in quantum mechanics is the density matrix itself.
I think a charitable interpretation of your criticism would be a criticism of #1 (putting limited-info dynamics, i.e. quantum thermodynamics, as primary to "invertible dynamics"). Here there is a debate to be had.
I think there is not really a debate in #2: even in invertible QM (no probability), you need to use density matrices if you want to study different subsystems (e.g. when modeling systems existing in an infinite, but not thermodynamic universe you need this language, since restricting a wavefunction to a subsystem makes it mixed). There's also a transposed discussion, that I don't really understand, of all of this in field theory: when do you have fields vs. operators vs. other more complicated stuff, and there is some interesting relationship to how you conceptualize "boundaries" - but this is not what we're discussing. So you really can't get away from using density matrices even in a nice invertible universe, as soon as you want to relate systems to subsystems.
For question #1 it is reasonable (though I don't know how productive) to discuss what is "primary". I think (but here I am really out of my depth) that people who study very "fundamental" quantum phenomena increasingly use a picture with a thermal bath (e.g. I vaguely remember this happening in some lectures here). At the same time, it's reasonable to say that "invertible" QM phenomena are primary and statistical phenomena are ontological epiphenomena on top of this. While this may be a philosophical debate, I don't think it's a physical one, since the two pictures are theoretically interchangeable (as I mentioned, there is a canonical way to get thermodynamics from unitary QM as a certain "optimal lower bound on information dynamics", appropriately understood).
Still, as soon as you introduce the notion of measurement, you cannot get away from thermodynamics. Measurement is an inherently information-destroying operation, and iiuc can only be put "into theory" (rather than being an arbitrary add-on that professors tell you about) using the thermodynamic picture with nonunitary operators on density matrices.
people who study very "fundamental" quantum phenomena increasingly use a picture with a thermal bath
Maybe talking about the construction of pointer states? That linked paper does it just as you might prefer, putting the Boltzmann distribution into a density matrix. But of course you could rephrase it as a probability distribution over states and the math goes through the same, you've just shifted the vibe from "the Boltzmann distribution is in the territory" to "the Boltzmann distribution is in the map."
Still, as soon as you introduce the notion of measurement, you cannot get away from thermodynamics. Measurement is an inherently information-destroying operation, and iiuc can only be put "into theory" (rather than being an arbitrary add-on that professors tell you about) using the thermodynamic picture with nonunitary operators on density matrices.
Sure, at some level of description it's useful to say that measurement is irreversible, just like at some level of description it's useful to say entropy always increases. Just like with entropy, it can be derived from boundary conditions + reversible dynamics + coarse-graining. Treating measurements as reversible probably has more applications than treating entropy as reversible, somewhere in quantum optics / quantum computing.
Thanks for the reference -- I'll check out the paper (though there are no pointer variables in this picture inherently).
I think there is a miscommunication in my messaging. Possibly through overcommitting to the "matrix" analogy, I may have given the impression that I'm doing something I'm not. In particular, the view here isn't a controversial one -- it has nothing to do with Everett or einselection or decoherence. Crucially, I am saying nothing at all about quantum branches.
I'm now realizing that when you say map or territory, you're probably talking about a different picture where quantum interpretation (decoherence and branches) is foregrounded. I'm doing nothing of the sort, and as far as I can tell never making any "interpretive" claims.
All the statements in the post are essentially mathematically rigorous claims which say what happens when you:
Both of these are mathematically formalizable and aren't saying anything about how to interpret quantum branches etc. And the Lindbladian is simply a useful formalism for tracking the evolution of a system that has these properties (subdivisions and baths). Note that (maybe this is the confusion?) subsystem does not mean quantum branch, or decoherence result. "Subsystem" means that we're looking at these particles over here, but there are also those particles over there (i.e., in terms of the math, your Hilbert space is a tensor product of a system factor and an environment factor).
Also, I want to be clear that we can and should run this whole story without ever using the term "probability distribution" in any of the quantum-thermodynamics concepts. The language to describe a quantum system as above (system coupled with a bath) is from the start a language that only involves density matrices, and never uses the term "X is a probability distribution of Y". Instead you can get classical probability distributions to map into this picture as a certain limit of these dynamics.
As to measurement, I think you're once again talking about interpretation. I agree that in general, this may be tricky. But what is once again true mathematically is that if you model your system as coupled to a bath then you can set up behaviors that behave exactly as you would expect from an experiment from the point of view of studying the system (without asking questions about decoherence).
There are some non-obvious issues with saying "the wavefunction really exists, but the density matrix is only a representation of our own ignorance". It's a perfectly defensible viewpoint, but I think it is interesting to look at some of its potential problems:
All of that said, your position is fully reasonable, I am just trying to point out that the way density matrices are usually introduced in teaching or textbooks does make the issue seem a lot more clear cut than I think it really is.
A process or machine prepares either |0> or |1> at random, each with 50% probability. Another machine prepares either |+> or |-> based on a coin flip, where |+> = (|0> + |1>)/root2 and |-> = (|0> - |1>)/root2. In your ontology these are actually different machines that produce different states.
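For concreteness, here is a quick numpy check that the two preparation procedures give literally the same density matrix, so no experiment on the output alone can distinguish them (just an illustration of the claim above):

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
ketp = (ket0 + ket1) / np.sqrt(2)   # |+>
ketm = (ket0 - ket1) / np.sqrt(2)   # |->

def proj(v):
    """The projector |v><v| onto the (normalized) state v."""
    return np.outer(v, v.conj())

machine_A = 0.5 * proj(ket0) + 0.5 * proj(ket1)   # 50/50 mixture of |0>, |1>
machine_B = 0.5 * proj(ketp) + 0.5 * proj(ketm)   # 50/50 mixture of |+>, |->

print(np.allclose(machine_A, machine_B))   # True: both equal identity/2
```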
I wonder if this can be resolved by treating the randomness of the machines quantum mechanically, rather than having this semi-classical picture where you start with some randomness handed down from God. Suppose these machines use quantum mechanics to do the randomization in the simplest possible way - they have a hidden particle in state |left>+|right> (pretend I normalize), they mechanically measure it (which from the outside will look like getting entangled with it) and if it's on the left they emit their first option (|0> or |+> depending on the machine) and vice versa.
So one system, seen from the outside, goes into the state |L,0>+|R,1>, the other one into the state |L,0>+|R,0>+|L,1>-|R,1>. These have different density matrices. The way you get down to identical density matrices is to say you can't get the hidden information (it's been shot into outer space or something). And then when you assume that and trace out the hidden particle, you get the same representation no matter your philosophical opinion on whether to think of the un-traced state as a bare state or as a density matrix. If on the other hand you had some chance of eventually finding the hidden particle, you'd apply common sense and keep the states or density matrices different.
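To make that concrete, here is a small numpy sketch of the same argument: write down the two "machine + hidden particle" pure states, trace out the hidden particle, and check that the emitted qubit is described by the same density matrix either way (the basis ordering |L,0>, |L,1>, |R,0>, |R,1> is my own convention here):

```python
import numpy as np

def reduced_state_of_emitted(psi):
    """Partial trace over the first (hidden) factor of a two-qubit pure state."""
    rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)   # indices (h, s, h', s')
    return np.einsum('hsht->st', rho)                      # sum over the hidden index

state_1 = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # |L,0> + |R,1>
state_2 = np.array([1, 1, 1, -1], dtype=complex) / 2           # |L,+> + |R,->

print(reduced_state_of_emitted(state_1))   # [[0.5, 0], [0, 0.5]]
print(reduced_state_of_emitted(state_2))   # identical: the hidden info is gone
```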
Anyhow, yeah, broadly agree. Like I said, there's a practical use for saying what's "real" when you want to predict future physics. But you don't always have to be doing that.
You are completely correct in the "how does the machine work inside?" question. As you point out, that density matrix has the exact form of something that is entangled with something else.
I think it's very important to be discussing what is real, although since we always have a nonzero inferential distance between ourselves and the real, the discussion has to be a little bit caveated and pragmatic.
- A process or machine prepares either |0> or |1> at random, each with 50% probability. Another machine prepares either |+> or |-> based on a coin flip, where |+> = (|0> + |1>)/root2 and |-> = (|0> - |1>)/root2. In your ontology these are actually different machines that produce different states. In contrast, in the density matrix formulation these are alternative descriptions of the same machine. In any possible experiment, the two machines are identical. Exactly how much of a problem this is for believing in wavefunctions but not density matrices is debatable - "two things can look the same, big deal" vs "but experiments are the ultimate arbiters of truth; if experiment says they are the same thing then they must be, and the theory needs fixing."
I like “different machines that produce different states”. I would bring up an example where we replace the coin by a pseudorandom number generator with seed 93762. If the recipient of the photons happens to know that the seed is 93762, then she can put every photon into state |0> with no losses. If the recipient of the photons does not know that the random seed is 93762, then she has to treat the photons as unpolarized light, which cannot be polarized without 50% loss.
So for this machine, there’s no getting away from saying things like: “There’s a fact of the matter about what the state of each output photon is. And for any particular experiment, that fact-of-the-matter might or might not be known and acted upon. And if it isn’t known and acted upon, then we should start talking about probabilistic ensembles, and we may well want to use density matrices to make those calculations easier.”
I think it’s weird and unhelpful to say that the nature of the machine itself is dependent on who is measuring its output photons much later on, and how, right?
Yes, in your example a recipient who doesn't know the seed models the light as unpolarised, and one who does models it as, say, H-polarised in a given run. But for everyone who doesn't see the random seed it's the same density matrix.
Let's replace that first machine with a similar one that produces a polarisation-entangled photon pair, |HH> + |VV> (ignoring normalisation). If you have one of those photons it looks unpolarised (essentially your "ignorance of the random seed" can be thought of as your ignorance of the polarisation of the other photon).
If someone else (possibly outside your light cone) measures the other photon in the HV basis, then they will project your photon into |H> or |V>, each with 50% probability. This 50/50 appears in the density matrix, not the wavefunction, so it is "ignorance probability".
In this case, by what I understand to be your position, the fact of the matter is either (1) that the photon is still entangled with a distant photon, or (2) that it has been projected into a specific polarisation by a measurement on that distant photon. It's not clear when the transformation from (1) to (2) takes place (if it's instant, then in which reference frame?).
So, in the bigger context of this conversation,
OP: "You live in the density matrices (Neo)"
Charlie: "No, a density matrix incorporates my own ignorance, so it is not a sensible picture of the fundamental reality. I can use them mathematically, but the underlying reality is built of quantum states, and that randomness when I subject them to measurements is fundamentally part of the territory, not the map. Let's not mix the two things up."
Me: "Whether a given unit of randomness is in the map (i.e. ignorance) or the territory is subtle. Things that randomly combine quantum states (my first machine) have a symmetry over which underlying quantum states are being mixed that looks meaningful. Plus (this post), the randomness can move abruptly from the territory to the map due to events outside your own light cone (although the amount of randomness is conserved), so maybe worrying too much about the distinction isn't that helpful."
Your use of "pure state" is totally different to the standard definition (namely rank(rho)=1). I suggest using a different term.
To add: I think the other use of "pure state" comes from this context. Here if you have a system of commuting operators and take a joint eigenspace, the projector is mixed, but it is pure if the joint eigenvalue uniquely determines a 1D subspace; and then I think this terminology gets used for wave functions as well
Today's post is in response to the post "Quantum without complications", which I think is a pretty good popular distillation of the basics of quantum mechanics.
For any such distillation, there will be people who say "but you missed X important thing". The limit of appeasing such people is to turn your popular distillation into a 2000-page textbook (and then someone will still complain).
That said, they missed something!
To be fair, the thing they missed isn't included in most undergraduate quantum classes. But it should be.[1]
Or rather, there is something that I wish they told me when I was first learning this stuff and confused out of my mind, since I was a baby mathematician and I wanted the connections between different concepts in the world to actually have explicit, explainable foundations and definitions rather than the hippie-dippie timey-wimey bullshit that physicists call rigor.
The specific point I want to explain is the connection between quantum mechanics and probability. When you take a quantum class (or read a popular description like "Quantum without complications") there is a question that's in the air, always almost but not quite understood. At the back of your mind. At the tip of your tongue.
The question is this:
If you are brave, I'm going to tell you about it. Buckle up, Neo.
Quantum mechanics 101
Let me recap the standard "state space" quantum story, as exemplified by (a slight reinterpretation of) that post. Note that (like in the "Quantum without complications" post) I won't give the most general or the most elegant story, but rather optimize for understandability:
Upshots
The important things to keep in mind from the above:
Statistical mechanics 101
The process of measurement connects quantum mechanics with statistical mechanics. But even if I hadn't talked about measurement in the last section, anyone who has studied probability would see a lot of parallels between the last section and the notion of Markov processes.
Most people are intuitively familiar with Markov processes. A Markov process is a mathematical way of modeling some variable x that starts at some state s (which may be deterministic or already probabilistic) and undergoes a series of random transitions between states. Let me again give a recap:
The correct way to model the state of the universe is as a probability distribution p, which models uncertain knowledge about the universe and is a function from a set of deterministic states S={s1,s2,…} to real numbers. These must satisfy:
We say that a probability distribution p is deterministic if there is a single state s with p(s′) = 1 if s′ = s, and p(s′) = 0 otherwise. In this case we write p = δ_s.
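For orientation, here is a tiny numpy sketch of the recap above (my own toy example): a column-stochastic transition matrix acting on probability vectors, starting from a deterministic distribution δ_s:

```python
import numpy as np

# A transition matrix: columns index "from", rows index "to"; each column sums to 1.
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])

p = np.array([1.0, 0.0])      # the deterministic distribution delta_{s_1}

for t in range(3):
    p = M @ p                 # one random transition step
    print(t + 1, p)           # stays a probability vector: nonnegative, sums to 1
```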
There are a hell of a lot of similarities between this picture and the quantum picture, though of course we don't have to separately introduce a notion of measurement here: indeed, in the quantum context, measurement converts a quantum state to a probability distribution but in statistics, you have a probability distribution from the start!
However, there are a couple key differences as well. The standard one that everyone notices is that in the quantum picture we used complex numbers and in the statistical picture, we used real numbers. But there's a much more important and insidious difference that I want to bring your attention to (and that I have been bolding throughout this discussion). Namely:
Specifically, the "pure-state measurement" probability variable associated to a quantum state |ϕ⟩ is quadratic in the vector |ϕ⟩ (with coordinates |⟨s|ϕ⟩|²).
This seems to dash the hopes of putting both the quantum and statistical pictures of the world on an equal footing, with perhaps some class of "mixed" systems interpolating between them. After all, while the dynamics in both cases are linear, there must be some fundamental nonlinearity in the relationship between the quantum and statistical worlds.
Right?
Welcome to the matrix
We have been lied to (by our quantum mechanics 101 professors. By the popular science magazines. By the well-meaning sci-fi authors). There is no such thing as a quantum state.
Before explaining this, let's take a step back and imagine that we have to explain probability to an intelligent alien from a planet that has never invented probability. Then here is one possible explanation you can give:
Probability is a precise measure of our ignorance about a complex system. It captures the dynamics of a "minimal bound" on the information we have about a set of "coarse" states in a subsystem S (corresponding to "the measurable quantities in our experimental setup") inside a large system U (corresponding to a maximally finegrained description of the universe)[5].
Now whenever we do quantum mechanics, we also implicitly separate a "large" system into an "experimental setup" and an "environment". We think of the two as "not interacting very much", but notably measurement is inherently linked to thinking about the interaction of the system and its environment.
And it turns out that in the context of quantum mechanics, whenever you are studying a subsystem inside a larger environment (e.g. you're focusing on only a subset of all particles in the universe, an area of space, etc.), you are no longer allowed to use states.
Density matrices
Instead, what replaces the "state" or "wavefunction" from quantum mechanics is the density matrix, which is a "true state" of your system (incorporating the "bounded information" issues inherent with looking at a subsystem). This "true state" is a matrix, or a linear operator, ρ: H → H. Note here a potential moment of confusion: in the old "state space" picture of quantum mechanics (that I'm telling you was all lies), the evolution operators were matrices from H → H. Density matrices happen to live in the same space, but they behave very differently and should by no means be thought of as the same "kind of object". In particular they are Hermitian rather than unitary.
Now obviously the old picture isn't wrong. If your system happens to be "the entire universe", then while I am claiming that you also have this new "density matrix evolution" picture of quantum mechanics, you still have the old "state vector" picture. You can get from one to the other via the following formula:
ρ=|ϕ⟩⟨ϕ|. In other words, ρ is the rank-1 complex projection matrix associated to your "old-picture" state ϕ.
Now the issue with states is that there is no way to take a universe state |ϕ_U⟩ associated to a big system and convert it to a "system state" |ϕ_S⟩ associated to a small or coarse subsystem. But there is a way to take the density matrix ρ_U associated to the big system and "distill" the density matrix ρ_S for the subsystem. It's called "taking a partial trace", and while it's easy to describe in many cases, I won't spell out the general formalism here for reasons of time and space (in particular, because I haven't introduced the necessary formalism to talk about system-environment separation and don't plan to do so).
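That said, in the simplest possible case (a two-qubit "universe", tracing out the second qubit), a minimal numpy sketch of the operation looks like this; it is an illustration only, not the general formalism:

```python
import numpy as np

# An entangled two-qubit state (|00> + |11>)/sqrt(2), basis ordering |00>, |01>, |10>, |11>.
phi_U = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_U = np.outer(phi_U, phi_U.conj())                  # 4x4, rank 1: a "pure" universe

# Partial trace over the second qubit: reshape to (2,2,2,2) and sum the environment indices.
rho_S = np.einsum('ikjk->ij', rho_U.reshape(2, 2, 2, 2))

print(rho_S)                                # [[0.5, 0], [0, 0.5]]: maximally mixed
print(np.linalg.matrix_rank(rho_S))         # 2: not of the form |psi><psi| for any state
```

Note the punchline: the universe is in a pure (rank-1) state, but the subsystem's density matrix is not.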
Going back to the relationship between the quantum state and the density matrix: notice that the passage |ϕ⟩ ↦ ρ = |ϕ⟩⟨ϕ| is quadratic. I forgot to bold: it's quadratic.
What does this mean? Well first of all, this means that the "probability vector" associated to performing a measurement on the state |ϕ⟩ is now a linear function of the "improved" version of the state, namely the density matrix ρ = |ϕ⟩⟨ϕ|. This is a big deal! This means that we might be able to have a linear relationship with the "probability world" after all.
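Spelled out, this is a one-line computation: writing P(s) := |⟨s|ϕ⟩|² for the probability of observing the configuration s, we have P(s) = ⟨s|ϕ⟩⟨ϕ|s⟩ = ⟨s|ρ|s⟩ = ρ_ss, i.e. the measurement probabilities are exactly the diagonal entries of ρ, which depend linearly on ρ.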
But does this mean that the linear evolution that Quantum mechanics posits on the nice vector |ϕ⟩ turns into some quadratic mess? Luckily, the answer is "no". Indeed, the evolution remains linear. Namely just from the formula, we see the following identity is true for the "universal" state vector[6]: ρ_t = U_t ρ_0 U_t^{-1}. Now if you expand, you see that each entry of ρ_t is linear in entries of ρ_0. Thus evolution is given by a linear "matrix conjugation" operator Conj(U_t): Op_H → Op_H, where "Op_H" denotes the vector space of operators from H to itself. Moreover, the evolution operators Conj(U_t) are unitary[7].
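If you like, here is a small numpy sanity check of these claims (a toy 3-dimensional Hilbert space with a random unitary; nothing about any particular physical system):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(n=3):
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    return v / np.linalg.norm(v)

def random_unitary(n=3):
    # QR factorization of a random complex matrix gives a unitary Q.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q

phi, U = random_state(), random_unitary()
rho = np.outer(phi, phi.conj())                       # rho = |phi><phi|

# 1) Conjugation reproduces state-vector evolution: U rho U^dagger = |U phi><U phi|.
rho_t = U @ rho @ U.conj().T
assert np.allclose(rho_t, np.outer(U @ phi, (U @ phi).conj()))

# 2) Conj(U) is linear: a mixture of density matrices evolves term by term.
psi = random_state()
sigma = np.outer(psi, psi.conj())
mix = 0.3 * rho + 0.7 * sigma
assert np.allclose(U @ mix @ U.conj().T, 0.3 * rho_t + 0.7 * (U @ sigma @ U.conj().T))

# 3) Measurement probabilities in the |s> basis are just the diagonal of rho.
assert np.allclose(np.diag(rho).real, np.abs(phi) ** 2)
print("all checks pass")
```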
So what we've developed is a new picture:
So now comes the big question. What if instead of the "whole universe", we are only looking at the dynamics of the "limited information" subsystem? Turns out there are two options here, depending on whether the Hilbert space H_S associated with the subsystem is "coupled" (i.e., exchanges particles/energy/etc.) with the Hilbert space H_{U∖S} of the "environment" (a.k.a. the "rest of the universe").
So at the end of the day we see two new things that occur when modeling any realistic quantum system:
In fact, we can say more: the new dynamics interpolates between the unitary dynamics of "fully isolated quantum systems" and the Markovian dynamics of the stochastic evolution picture. Indeed, if the interaction between the system and its environment exhibits weak coupling and a short correlation time (just words for now that identify a certain asymptotic regime, but note that most systems are like this macroscopically), then the Lindbladian dynamics becomes Markovian (at a suitable time step). Specifically, if there are N states, the density matrix at any point in time has N² terms. In this asymptotic regime, all the dynamics reduces to the dynamics of the diagonal density matrices, i.e. linear combinations of the N matrices of the form |s⟩⟨s|, though the different diagonal terms can get mixed. And on large timescales, this mixing is exactly described by a Markov process.
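To make the last claim concrete, here is a toy numpy sketch (my own choice of example: no Hamiltonian, just two jump operators, a special case where the reduction is exact rather than asymptotic). The off-diagonal entries of ρ decay, while the diagonal entries follow a classical two-state master equation:

```python
import numpy as np

g01, g10 = 0.7, 0.3                                            # jump rates 1->0 and 0->1
L1 = np.sqrt(g01) * np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1|
L2 = np.sqrt(g10) * np.array([[0, 0], [1, 0]], dtype=complex)  # |1><0|

def lindblad_rhs(rho):
    """d(rho)/dt for the Lindbladian with H = 0 and the two jump operators above."""
    out = np.zeros_like(rho)
    for L in (L1, L2):
        out += L @ rho @ L.conj().T - 0.5 * (L.conj().T @ L @ rho + rho @ L.conj().T @ L)
    return out

phi = np.array([1, 1], dtype=complex) / np.sqrt(2)   # start in the pure state (|0>+|1>)/sqrt(2)
rho = np.outer(phi, phi.conj())

Q = np.array([[-g10, g01], [g10, -g01]])             # classical rate matrix, dp/dt = Q p
p = np.real(np.diag(rho)).copy()

dt, steps = 1e-3, 5000
for _ in range(steps):                               # crude Euler integration of both
    rho = rho + dt * lindblad_rhs(rho)
    p = p + dt * (Q @ p)

print("off-diagonal |rho_01|:", abs(rho[0, 1]))      # decays like exp(-(g01+g10) t / 2)
print("quantum populations  :", np.real(np.diag(rho)))
print("classical Markov p   :", p)                   # matches the diagonal of rho
```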
If you've followed me along this windy path, you are now awakened. You know three things:
So can I brag to people that I've resolved all the "multiverse/decoherence" issues now?
Not really. Certainly, you can fully understand "measurement" in terms of these "corrected" quantum dynamics -- it's no longer a mystery (and has not been for a very long time). And you can design toy models where running dynamics on a "multiverse" exhibits a natural splitting into quantum branches and gives everything you want from decoherence. But the larger question of why and how different quantum "branches" decohere in our real, non-toy universe is still pretty hard and not a little mysterious. (I might write a bit more about this later, but I don't have any groundbreaking insights for you here.)
Who ordered that?
This is the famous apocryphal question asked by the physicist Isidor Isaac Rabi in response to the discovery of yet another elementary particle (the muon). So who ordered this matrix-flavored craziness, that the correct way to approach modeling quantum systems is by evolving a matrix (entries indexed by pairs of configurations) rather than just a single state?
In this case there actually is an answer: Liouville. Liouville ordered that. Obviously Liouville didn't know about quantum mechanics, but he did know about phase space[9]. Here I'm going to get a little beyond our toy "Quantum 101" and talk about wavefunctions (in a very, very hand-wavy way. Get it - waves). Namely, something interesting happens when performing "quantization", i.e. passing from usual mechanics to quantum mechanics: weirdly, "space gets smaller". Indeed, knowing a bunch of positions of particles is not sufficient to know how they evolve in the classical world: you also need to know their velocities (or equivalently, momenta). So for example in single-particle classical physics in three dimensions, the evolution equation you get is not on single-particle "configuration space" R³, but on the space of (position, momentum) pairs, which is R³⁺³ = R⁶. In "wavefunction" quantum mechanics, your quantum state loses half of its dimension: the evolution occurs on just 3-dimensional wavefunctions. This is to some extent unavoidable: the uncertainty principle tells you that you can't independently set the position and the momentum of a particle, since position and momentum are actually two separate bases of the Hilbert space of wavefunctions. But on the other hand, like, classical physics exists. This means that in some appropriate "local/coarse-grained" sense of a particle in a box separated (but entangled) from the environment of the rest of the universe, position and momentum are two meaningful quantities that can sort of co-occur.
Now there is a certain very natural and elegant quantum-classical comparison, called the "Wigner-Weyl transform", that precisely relates the space of operators on R³ (or a more general configuration space) and functions on the phase space R³⁺³ (or a more general phase space). Thus, when we think in the "density matrix" formalism, there is a natural translation of states and evolutions between the two pictures which (approximately) matches phase-space dynamics with density-matrix dynamics. So in addition to all the good properties of the density matrix formalism that I've (badly) explained above, we see a reasonable explanation for something else that was mysterious and nonsensical in the "typical" quantum story.
But don't worry. If you're attached to your old nice picture of quantum mechanics where states are wavefunctions and evolution is unitary and nothing interesting ever happens, there's always the blue pill. The wavefunction will always be there.
Along with the oscillating phase expansion, basics on Lie groups, ⋆ products, and the Wigner-Weyl transform. Oh and did I mention that an intro quantum class should take 3 semesters, not one?
Often called a "wavefunction"
In terms of the bra-ket notation physicists write this requirement as ⟨ϕ|ϕ⟩=1. The way you're supposed to read this notation is as follows:
- If the "ket" |ϕ⟩ is a column vector of complex numbers with entries a1, a2, …, then the same vector written as a "bra" ⟨ϕ| means ⟨ϕ| := ¯ϕ^T = (¯a1, ¯a2, …), i.e. the row vector of complex conjugates. Here the notation ¯a denotes "complex conjugate".
- When we write a ket and a bra together, we're performing matrix multiplication. So ⟨v|v⟩ = ¯v^T ⋅ v as above denotes "horizontal times vertical" vector multiplication (which is the dot product and gives a scalar), and |v⟩⟨v| denotes "vertical times horizontal" vector multiplication (which is the outer product and gives a matrix). A good heuristic to remember is that stuff between two brackets ⟨…⟩ is a scalar and stuff between two pipes |…| is a matrix; the small numpy illustration below spells out the same bookkeeping.
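In code (a toy two-dimensional example of my own choosing):

```python
import numpy as np

ket = np.array([1 + 2j, 3j], dtype=complex)
ket = ket / np.linalg.norm(ket)     # normalize so that <phi|phi> = 1

bra = ket.conj()                    # the "bra": complex-conjugated (row) vector
print(bra @ ket)                    # scalar: <phi|phi> = 1 (up to rounding)
print(np.outer(ket, bra))           # 2x2 matrix: |phi><phi|
```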
There is often some discussion of distinguishable vs. indistinguishable particles, but it will not be relevant here and we'll ignore it.
I initially wrote this in the text, but decided to replace with a long footnote (taking a page from @Kaarel), since it's not strictly necessary for what follows.
A nice way to make this precise is to imagine that in addition to our collection of "coarse states" S = {s_1, s_2, …, s_m}, which encode "information about the particular system in question", there is a much larger collection of "fine states" U = {u_1, u_2, …, u_N} which we think of as encoding "all the information in the universe". (For convenience we assume both sets are finite.) For example perhaps the states of our system are 5-particle configurations, but the universe actually contains 100 particles (or more generally, our subsystem only contains coarse-grained information, like the average of a collection of particles, etc.). Given a state of the universe, i.e. a state of the "full/fine system", we are of course able to deterministically recover the state of our subsystem. I.e., we have a "forgetting information" map F: U → S. In the case above of 5 particles in a 100-particle universe, the map F "forgets" all the particle information except the states of the first 5 particles. Conversely, given a "coarse" state s ∈ S, we have some degree of ignorance about the fine "full system" state u ∈ U that underlies it. We can measure this ignorance by associating to each coarse state a set U_s := F^{-1}(s) ⊂ U, namely its preimage under the forgetting map.
Now when thinking of a Markov process, we assume that there is an "evolution" mapping A: U → U that "evolves" a state of the universe to a new state of the universe in a deterministic way. Now given such an evolution on the "full system" states, we can try to think what "dynamics" it implies on the subsystem states S. To this end, we define the real number M_t(s, s′) to be the average over U_s (universe states underlying s) of the indicator function δ_{F(A^t(u)) = s′}. De-tabooing the word "probability", this is just the probability that a random "total" state underlying the coarse state s maps to a "total" state underlying s′ after time t.
Now in general, it doesn't have to be the case that on the level of matrices we have the Markov evolution behavior, e.g. that M_2 = M_1². For example we might have chosen the evolution mapping A: U → U to be an involution with A² = I, in which case M_2 is the identity matrix (whereas M_1 might have been essentially arbitrary). However there is an inequality involving entropy (that I'm not going to get into -- but note that entropy is explainable to the alien as just a deterministic function on probability distribution "vectors") which says that, for a given value of the single-transition matrix M_1, the least possible information you may have about the double-transition matrix M_2 is in a suitable sense "bounded" by M_1². Moreover, there is a specific choice of "large system" dynamics, sometimes called a "thermal bath", which gives us time evolution M_k that is (arbitrarily close to) M_1^k. Moreover, any system containing a thermal bath will have no more information about multistep dynamics than a thermal bath. Thus in the limit of modeling "lack of information" about the universe, but conditional on knowing the single-time-step coarse transformation matrix M_1, it makes sense to "posit" that our k-step dynamics is M_1^k.
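To make the construction concrete, here is a toy numpy version (my own example: a 12-state "universe", a random permutation as the deterministic evolution, and a mod-3 forgetting map):

```python
import numpy as np

N, S = 12, 3                                    # fine states 0..11, coarse states 0..2
rng = np.random.default_rng(1)
A = rng.permutation(N)                          # deterministic (invertible) evolution u -> A[u]
F = np.array([u % S for u in range(N)])         # the forgetting map F: U -> S

# M[s', s] = fraction of fine states underlying s that land on a fine state underlying s'.
M = np.zeros((S, S))
for s in range(S):
    fibre = np.where(F == s)[0]                 # U_s = F^{-1}(s)
    for u in fibre:
        M[F[A[u]], s] += 1.0 / len(fibre)

print(M)                                        # columns sum to 1: a stochastic matrix
print(M @ M)                                    # generally NOT the true two-step matrix
```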
To prove the following formula holds, all we need is the identity U† = U^{-1} for unitary matrices. Here the "dagger" notation is a matrix version of |ϕ⟩ ↦ ⟨ϕ|, and takes a matrix to its "complex conjugate transpose" U† := ¯U^T.
Note that instead of all operators here, it would be sufficient to only look at the (real, not complex) subspace of Hermitian operators which satisfy ρ†=ρ. In this case, lacking complex structure, evolution would no longer be unitary: it would be orthogonal instead.
If you read the massive footnote about "explaining probability theory to an alien" above, you know that whenever we talk about probabilities we are making a secret implicit assumption that we are in the "worst-case" informational environment, where knowing dynamics on the "coarse" system being observed gives minimal information about the environment -- this can be guaranteed by assuming the environment contains a "thermal bath". The same story applies here: a priori, it's possible that there is some highly structured interaction between the system and the environment that lets us make a "more informative" picture of the evolution, that would depend on the specifics of system-environment interaction; but if we assume that interactions with the environment are "minimally informative", then any additional details about the rest of the universe get "integrated out" and the Lindbladian is the "true answer" to the evolution dynamics.
The history is actually a bit tangled here with the term attributed to various people -- it seems the first people to actually talk about phase space in the modern way were actually Ludwig Boltzmann, Henri Poincaré, and Josiah Willard Gibbs.