(This post was originally intended as a comment on Adele's question, but ballooned to the point where it seems worthy of a top-level post. Note that I'm not trying to answer Adele's (specific, fairly technical) question here. I consider it to be an interesting one, and I have some guesses, but here I'm commentating on how some arguments mentioned within the question relate to the mysteries swirling around the Born rule.)
(Disclaimer: I wrote this post as a kind of intellectual recreation. I may not have the time and enthusiasm to engage with the comments. If you point to a gaping error in my post, I may not reply or fix it. If I think there's a gaping error in your comment, I may not point it out. You have been warned.)
My current take is that the "problem with the Born rule" is actually a handful of different questions. I've listed some below, including some info about my current status wrt each.
Q1. What hypothesis is QM?
In, eg, the theory of Solomonoff induction, a "hypothesis" is some method for generating a stream of sensory data, interpreted as a prediction of what we'll see. Suppose you know for a fact that reality is some particular state vector in some Hilbert space. How do you get out a stream of sensory data? It's easy enough to get a single sensory datum — sample a classical state according to the Born probabilities, sample some coordinates, pretend that there's an eyeball at those coordinates, record what it sees. But once we've done that, how do we get our next sense datum?
Or in other words, how do we "condition" a quantum state on our past observations, so that we can sample repeatedly to generate a sequence of observations suitable for linking our theories of induction with our theories of physics?
To state the obvious, a sensory stream generated by just resampling predicts that you're constantly teleporting through the multiverse, and a sensory stream generated by putting a delta spike on the last state you sampled and then evolving that forward for a tick will... not yield good predictions (roughly, it will randomize all momenta).
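The single-datum step (sample a classical state according to the Born probabilities) is easy to sketch. This assumes that "classical states" are just the basis indices of a small hypothetical state vector `psi`; the helper names are also hypothetical. Calling the sampler once per tick produces exactly the independent-resampling stream criticized above. The sketch deliberately leaves out the conditioning step that Q1 is asking about.

```python
import math
import random

def born_probabilities(amplitudes):
    """Return |a|^2 for each amplitude of a unit-norm state vector."""
    norm_sq = sum(abs(a) ** 2 for a in amplitudes)
    if not math.isclose(norm_sq, 1.0, rel_tol=1e-9):
        raise ValueError("state vector must have unit L2 norm")
    return [abs(a) ** 2 for a in amplitudes]

def sample_classical_state(amplitudes, rng):
    """Draw one basis index ("classical state") with probability |amplitude|^2."""
    probs = born_probabilities(amplitudes)
    return rng.choices(range(len(amplitudes)), weights=probs, k=1)[0]

# Hypothetical 3-state toy universe: 0.36 + 0.32 + 0.32 = 1.
psi = [0.6, 0.8j * math.sqrt(0.5), -0.8 * math.sqrt(0.5)]

rng = random.Random(0)
draws = [sample_classical_state(psi, rng) for _ in range(10_000)]
print([draws.count(i) / len(draws) for i in range(len(psi))])
```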
Current status: I assert that additional machinery is required to turn QM into a hypothesis in the induction-compatible sense; ie, I'd say "the Born rule is not complete (as a rule for generating a hypothesis from a quantum state)". My guess is that the missing machinery involves something roughly like sampling classical states according to the Born rule and filtering them by how easy it is to read the (remembered) sense history off of them. I suspect that a full resolution of this question requires some mastery of naturalized induction. (I have some more specific models than this that I won't get into at the moment. Also there are things to say about how this problem looks from the updateless perspective, but I also won't go into that now.)
ETA: I am not claiming to be the first person to notice this problem. As best I can tell, this problem or something close to it is what physicists refer to as the "measurement problem". I have not seen anyone clearly frame it as a challenge of segueing a quantum state into an inductor-compatible sensory stream; I'd guess that's b/c most physicists don't (think most other physicists) natively speak inductor-tongue. I'm aware of the fact that various people have worked on the problem of identifying qualitative branches in a quantum state, and that one explicit motivation for that research is resolving this issue. @interstice linked some below, thanks interstice. That's not my preferred approach. I still think that that research is cool.
Q2. Why should we believe the Born rule?
For instance, suppose my friend is about to roll a biased quantum die; why should I predict according to the Born-given probabilities?
The obvious answer is "because we checked, and that's how it is (ie, it's the simplest explanation of the observed data so far)".
I suspect this answer is correct, but I am not personally quite willing to consider the case closed on this question, for a handful of reasons:

I'm not completely solid on how to twist QM into a full-on sensory stream (see Q1), and I suspect some devils may be lurking in the details, so I'm not yet comfortable flatly declaring "Occam's razor pins the Born rule down".

There's an intuitive difference (that may or may not survive philosophical progress) between indexical uncertainty, empirical uncertainty, and logical uncertainty, and it's not completely obvious that I'm supposed to use induction to manage my indexical uncertainty. For example, if I have seen a million coin tosses in my past, and 2/3 of them came up heads (with no other detectable pattern), and I have a bona fide guarantee that I'm an emulation running on one of 2^2000000 computers, each of which is halfway through a simulation of me living my life while two million coins get flipped (in literally all combinations), then there's some intuition that I'm supposed to predict the future coins to be unbiased, in defiance of the observed past frequency. Furthermore, there's an intuition that QM is putting us in an analogous scenario. (My current bet is that it's not, and that the aforementioned intuition is deceptive. I have models about precisely where the disanalogy is that I won't go into at the moment. The point I'm trying to make is that it's reasonable to think that the Born rule requires justification beyond 'Occam says'. See also Q4 below.)

It's not clear to me that the traditional induction framework is going to withstand the test of time. For example, the traditional framework has trouble dealing with inductors who live inside the world and have to instantiate their hypotheses physically. And, humans sure are keen to factor their hypotheses into "a world" + "a way of generating my observations from some path through that world's history". And, the fact that QM does not naturally beget an observation stream feels like something of a hint (see Q1), and I suspect that a better theory of induction would accommodate QM in a way that the traditional theory doesn't. Will a better theory of reasoning-while-inside-the-world separate the "world" from the "location therein", rather than lumping them all into a single sensory stream? If so, might the Born rule end up on the opposite side of some relevant chasm? I suspect not, but I have enough confusion left in this vicinity that I'm not yet comfortable closing the case.
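The emulation-and-coins intuition from the second point can be made concrete with a scaled-down count. The sketch uses 6 observed flips out of 12 instead of a million out of two million, and the function name is hypothetical. It keeps one hypothetical computer per flip sequence and restricts attention to the computers whose past matches the observed one. Exactly half of them have heads next, whatever the observed frequency was. This only illustrates the intuition the post says it currently bets against; it says nothing about whether QM is actually analogous.

```python
from fractions import Fraction
from itertools import product

def next_heads_fraction(observed_past, total_flips):
    """One hypothetical computer per flip sequence of length total_flips.
    Keep those whose prefix matches the observed past and return the exact
    fraction of them whose next flip is heads."""
    k = len(observed_past)
    consistent = [seq for seq in product("HT", repeat=total_flips)
                  if "".join(seq[:k]) == observed_past]
    heads_next = sum(1 for seq in consistent if seq[k] == "H")
    return Fraction(heads_next, len(consistent))

past = "HHTHHT"  # 6 observed flips, 2/3 of them heads
print(Fraction(past.count("H"), len(past)), next_heads_fraction(past, 12))
```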
My current status is "best guess: we believe the Born rule for the usual reason (ie "we checked"), with the caveat that it's not yet completely clear that the usual reason works in this situation".
Q3. But... why the Born rule in particular?
Why is the Born rule natural? In other words, from what mathematical viewpoint is this a rule so simple and elegant as to be essentially forced?
Expanding a bit, I observe that there's a sense in which discrete mathematics feels easier to many humans (see, eg, how human formalizations of continuous math often arise from taking limits or other ε-δ-manship built atop our formalizations for discrete math). Yet, physics makes heavy use of smooth functions and differential equations. And, it seems to me like we're supposed to stare at this and learn something about which things are "simple" or "elegant" or "cheap" with respect to reality. (See also gauge theory and the sense that it is trying to teach us some lessons about symmetry, etc.)
I think that hunger-for-a-lesson is part of the "but whyyyy" that many people feel when they encounter the Born rule. Like, why are we squaring amplitude? Whatever happened to "zero, one, or infinity"? When physics raises something to a power that's not zero, one, or infinity, there's probably some vantage point from which this is particularly forced, or simple, or elegant, and if you can find it then it can likely help you predict what sorts of other stuff you'll see.
Or to put it another way, consider the 'explanation' of the Born rule which goes "Eh, you have a complex number and you need a real number, there aren't that many ways you can do it. Your first guess might be 'take the magnitude', your second guess might be 'take the real component', your third guess might be 'multiply it by its own complex conjugate', and you'll turn out to be right on the third try. Third try isn't bad! We know it is so because we checked. What more is there to be explained?". Observe that there's a sense in which this explanation feels uncompelling — like, there are a bunch of things wrong with the objection "reality wasn't made by making a list of possible ways to get a real from a complex number and rolling a die", but there's also something to it.
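The three guesses can be compared mechanically, given two desiderata that are this sketch's own assumptions rather than anything argued in the post. First, the weights should sum to 1 for a unit-norm state without any renormalization. Second, they should not change under a global phase. The state and the phase below are hypothetical. Of the three guesses, only "multiply by the conjugate" passes both tests.

```python
import cmath
import math

def candidate_weights(amplitudes):
    """Three guesses for turning complex amplitudes into real weights."""
    return {
        "magnitude": [abs(a) for a in amplitudes],
        "real part": [a.real for a in amplitudes],
        "times conjugate": [(a * a.conjugate()).real for a in amplitudes],
    }

psi = [complex(0.6, 0.0), complex(0.0, 0.8)]  # hypothetical unit-norm state
phase = cmath.exp(0.7j)                        # a global phase factor
shifted = [phase * a for a in psi]

before = candidate_weights(psi)
after = candidate_weights(shifted)
for name in before:
    same = all(math.isclose(x, y, abs_tol=1e-12)
               for x, y in zip(before[name], after[name]))
    print(f"{name:16s} sum={sum(before[name]):.3f} unchanged by global phase: {same}")
```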
My current status on this question is that it's significantly reduced (though not completely solved) by the argument in the OP (and the argument that @evhub mentions, and the ignorance+symmetry argument @Charlie Steiner mentions, which I claim all ground out in the same place). In particular, I claim that the aforementioned argument-cluster grounds out the Born rule into the inner product operator, thereby linking the apparently-out-of-the-blue 2 in the Born rule with the same 2 from "L2 norm" and from the Pythagorean theorem. And, like, from my vantage point there still seem to be deep questions here, like "what is the nature of the connection between orthonormality and squaring", and "is the L2 norm preferred b/c it's the only norm that's invariant under orthonormal change of basis, or is the whole idea of orthonormality somehow baking in the fact that we're going to square and sqrt everything in sight (and if so how)" etc. etc. I might be willing to consider this one solved in my own book once I can confidently trace that particular 2 all the way back to its maker; I have not yet done so.
For the record, on the axis from "Gentlemen, that is surely true, it is absolutely paradoxical; we cannot understand it, and we don't know what it means. But we have proved it, and therefore we know it must be the truth" to... whatever the opposite of that is, I tend to find myself pretty far on the "opposite of that" end, ie, I often anticipate finding explanations for logical surprises. In this regard, I find arguments of the form "the Born rule is the only one that satisfies properties X, Y, and Z" fairly uncompelling — those feel to me like proofs that I must believe the Born rule is good, not reasons why it is good. I'm generally much more compelled by arguments of the form "if you meditate on A, B, and C you'll find that the Correct Way (tm) to visualize the xness of (3x, 4y) is with the number (3^2/5)" or suchlike. Fortunately for me, an argument of the latter variety can often be reversed out of a proof of the former variety. I claim to have done some of that reversing in the case of the Born rule, and while I haven't fully absorbed the results yet, it seems quite plausible to me that the argument cluster named by Adele/Evan/Charlie essentially answers this third question (at least up to, say, some simpler Qs about the naturality of inner products).
Q4. wtf magical reality fluid
What the heck is up with the thing where, not only can we be happening in multiple places, but we can be happening quantitatively more in some of them?
I see this as mostly a question of anthropics, but the Born rule is definitely connected. For instance, you might wish to resolve questions of how-much-you're-happening by just counting physical copies, but this is tricky to square with the continuous distribution of QM, etc.
Some intuition that's intended to highlight the remaining confusion: suppose you watch your friend walk into a person-duplicating device. The left copy walks into the left room and grabs a candy bar. The right copy walks into the right room and is just absolutely annihilated by a tangle of whirring blades: screams echo from the chamber, blood spatters against the windows, the whole works. You blink in horror at the left clone as they exit the door eating a candy bar. "What?" they say. "Oh, that. Don't worry. There's a dial in the duplicating device that controls how happening each clone is, and the right clone was happening only negligibly. They basically weren't happening at all".
Can such a dial exist? Intuition says no. But quantum mechanics says yes! Kind of! With the glaring disanalogy that in QM, you can't watch the negligibly-happening people get ripped apart: light bouncing off of them cannot hit your retinas, or else their magical-happeningness would be comparable to yours. Is that essential? How precisely do we go about believing that magical happeningness dials exist but only when things are "sufficiently non-interacting"? (Where, QM reminds us, this interactingness is a continuous quantity that rarely if ever hits zero.) (These questions are intended to gesture at confusion, not necessarily to be answered.)
And it feels like QM is giving us a bunch of hints; ie, if physics turned out to look like a discrete state plus a discrete time evolution rule, we would have been able to say "aha, that's what's happening" and feel content about it, never quite noticing our deeper confusion about this whole "happeningness" thing. But reality's not like that. Reality is like a unit vector in an extraordinarily high-dimensional room, casting complex-valued shadows on each wall in the room, and each wall corresponds to a way that everything can be arranged. And if we cast our gaze to the walls in accordance with the degree to which that wall is supporting the overall magnitude of the reality-vector (ie, in accordance with the shadow that the shadow-on-the-wall casts back onto reality, ie in proportion to the shadow times its conjugate, ie in proportion to the squared amplitude of the shadow) then our gaze occasionally falls on arrangements of everything that look kinda like how everything seems to be arranged. And if we cast our gaze using any other rule, we find only noise. And, like, one thing you can do is be like "haha weird" and then figure out how to generate an observation stream from it and chalk it up to "we followed Occam's razor and this is what we found". But it seems to me that this is ignoring this great big surprise that reality handed us. This is an unexpected breed of object for reality to be. This shadow-of-a-shadow thing feels like a surprising way for happeningness to meta-happen. It all feels like a hint, a hint about how our beliefs about what the heck is going on with this whole "existence" thing are built atop false assumptions. And it's a hint that I can't yet read.
And... this is somewhat related to the beef I have with measure non-realism. Like, one thing a person can say is "everything is happening; I'm built to optimize what happens in places in accordance with how simple they are; it seems that the simplest way you find me in the logical multiverse is by flitting your gaze along those walls in accordance with the shadow-of-a-shadow and in accordance with some as-yet-unnamed rule about following coherent histories starting from the birth of a particular child; the shadow-of-a-shadow rule is elegant, ridiculously overdetermined by the data, and has no special status relative to any other part of the description of how to find me; what remains to be explained?" And... well, I'm still pretty confused about this whole "stuff is happening" thing. And I'm suspicious of a metaphysics that places physics on the same status as every other mathematical object, b/c I am not yet sure which of physics and math "comes first". And yes, that's a confused question, but that doesn't make me any less confused about the answer. And, yeah, there are deflationary measure-non-realist replies to these inarticulate gesticulations, but they leave me no less confused. And all the while, reality is sitting there having this counterintuitive shadow-casting form, and I cannot help but wonder what false assumptions it would reveal, what mysteries it would lay bare, what lessons it would teach about which sorts of things can meta-exist at all, if only I could find my errant intuitions and put them in contact with this surprise.
And, like, there's a way in which the hypothesis "everything is; we are built to attend to the simple stuff" is a curiosity-stopper: a mental stance that, when adopted, makes it hard to mine a surprise like "reality has the quantum nature" for information about what sort of things can be.
I have a bunch more model than this, and various pet hypotheses, but ultimately my status on this one is "confused". I expect to remain confused at least until the point where I can understand all these blaring hints.
In sum, there are some ways in which I find the Born rule non-mysterious, and there are also Born-rule-related questions that I remain quite confused about.
With regards to the things I consider non-mysterious, I mostly endorse the following, with some caveats (mostly given in the Q2 section above):
The Born rule is on the same status as the Fourier transform in quantum mechanics: it's just another equation in the simple description of where to find us. It gets an undeservedly bad rep on account of being just barely on the reality-side of the weird boundary humans draw between "reality" and "my location therein" in their hypotheses, and it has become a poster-child for the counterintuitive manner in which we are embedded in our reality. Even so, fixing the nature of the rest of reality, once one has fully comprehended the job that the Born rule does, the Born rule is the only intuitively natural tool for its job.
(And, to be clear, I've updated in favor of that last sentence in recent times, thanks in part to meditating on the cluster of arguments mentioned by Adele/Evan/Charlie.)
With regards to the remaining mystery, there is a sense in which the Born rule is the star in a question that I consider wide-open and interesting, namely "why is 'trace your eyes across these walls in accordance with the Born rule' a reasonable way for reality to be?". I suspect this question is confused, and so I don't particularly seek its answer, but I do seek mastery of it, and I continue to expect such mastery to pay dividends.
The L2 norm is the only Lp norm that can be preserved by any nontrivial change of basis (the trivial ones: permuting basis elements and multiplying some of them by −1). This follows from the fact that, for p≠2, the basis elements and their negatives can be identified just from the Lp norm and the addition and scalar multiplication operations of the vector space. To intuitively gesture at why this is so, let's look at L1 and L∞.
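Here is a quick numerical spot check of the first sentence in R². The helper names are hypothetical. A rotation by 0.3 radians preserves the L2 norm of several test vectors but changes the L1, L3, and L∞ norms of e1. A signed coordinate swap, which is one of the trivial changes of basis, preserves every Lp norm. One angle and a few vectors illustrate the claim; they do not prove it.

```python
import math

def lp_norm(v, p):
    """L^p norm of a finite vector; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def rotate(v, theta):
    """A nontrivial change of basis in R^2: rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def swap_and_negate(v):
    """A trivial change of basis: permute the coordinates and flip one sign."""
    return [-v[1], v[0]]

v = [1.0, 0.0]
for p in (1, 2, 3, math.inf):
    print(p, lp_norm(v, p), lp_norm(rotate(v, 0.3), p), lp_norm(swap_and_negate(v), p))
```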
In L1, the norm of the sum of two vectors is the sum of their norms iff for each coordinate, both vectors have components of the same sign; otherwise, they cancel in some coordinate, and the norm of the sum is smaller than the sum of the norms. 0 counts as the same sign as everything, so the more zeros a vector has in its coordinates, the more other vectors it will have the maximum possible norm of sum with. The basis vectors and their negations are thus distinguished as those unit vectors u for which the set {v : ‖u+v‖ = ‖u‖+‖v‖} is maximal. Since the alternative to ‖u+v‖ = ‖u‖+‖v‖ is ‖u+v‖ < ‖u‖+‖v‖, the basis vectors can be thought of as having maximal tendency for their sums with other vectors to have large norm.
In L∞, on the other hand, as long as you're keeping the largest coordinate fixed, changing the other coordinates costs nothing in terms of the norm of the vector, but making those other coordinates larger still creates more opportunities to change the norm of other vectors when you add them together. So if you're looking for a unit vector u that minimizes {v : ‖u+v‖ > ‖v‖}, u is a basis vector or the negation of one. The basis vectors have minimal tendency for their sums with other vectors to have large norm.
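A small exact count can make the L1 and L∞ claims concrete. The sketch makes three choices of its own. It uses the finite grid {−3, ..., 3}² as a stand-in for "all v" and exact rational arithmetic via `fractions`. It also uses the strict-inequality set {v : ‖u+v‖∞ > ‖v‖∞} for L∞. It compares the basis vector e1 with the non-basis unit vectors (1/2, 1/2) in L1 and (1, 1) in L∞. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def l1(v):
    return sum(abs(x) for x in v)

def linf(v):
    return max(abs(x) for x in v)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

# Finite stand-in for "all v": the integer grid {-3, ..., 3}^2, as exact fractions.
GRID = list(product([Fraction(k) for k in range(-3, 4)], repeat=2))

def l1_equality_count(u):
    """How many grid points v satisfy ||u+v||_1 == ||u||_1 + ||v||_1."""
    return sum(1 for v in GRID if l1(add(u, v)) == l1(u) + l1(v))

def linf_growth_count(u):
    """How many grid points v satisfy ||u+v||_inf > ||v||_inf (strictly)."""
    return sum(1 for v in GRID if linf(add(u, v)) > linf(v))

one, zero, half = Fraction(1), Fraction(0), Fraction(1, 2)
print("L1   e1:", l1_equality_count((one, zero)), " (1/2,1/2):", l1_equality_count((half, half)))
print("Linf e1:", linf_growth_count((one, zero)), " (1,1):", linf_growth_count((one, one)))
```

On this grid, the basis vector has the larger L1 equality set and the smaller L∞ strict-growth set. A grid count is an illustration, not a proof.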
As p increases, the tendency for basis vectors to have large sums with other vectors decreases (as compared to the tendency for arbitrary vectors to have large sums with other vectors). There must be a crossover point where whether or not a vector is a basis vector ceases to be predictive of the norm of its sum with an arbitrary other vector, and we lose the ability to figure out which vectors are basis vectors only at that point, which is p=2.
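One way to watch the crossover numerically relies on assumptions that are this sketch's own, not the comment's. It measures "tendency to have large sums" as the probability that ‖u+v‖_p > ‖v‖_p for v drawn from a standard Gaussian in R². It then compares the basis vector e1 with the Lp-normalized diagonal (1, 1). The gap should have opposite signs at p = 1 and p = ∞. At p = 2 the two rates should agree up to sampling noise. That is because the Gaussian is rotation-invariant and the L2-unit vectors e1 and (1, 1)/√2 differ by a rotation. The helper names are hypothetical.

```python
import math
import random

def lp_norm(v, p):
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def unit(v, p):
    n = lp_norm(v, p)
    return [x / n for x in v]

def growth_rate(u, p, samples, tol=1e-9):
    """Fraction of samples v with ||u + v||_p > ||v||_p. The tolerance keeps exact
    ties (which fill whole regions for p = 1 and p = inf) from being decided by
    floating-point rounding."""
    hits = 0
    for v in samples:
        w = [a + b for a, b in zip(u, v)]
        if lp_norm(w, p) > lp_norm(v, p) + tol:
            hits += 1
    return hits / len(samples)

rng = random.Random(1)
samples = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50_000)]

results = {}
for p in (1, 1.5, 2, 3, math.inf):
    basis = growth_rate(unit([1.0, 0.0], p), p, samples)
    diagonal = growth_rate(unit([1.0, 1.0], p), p, samples)
    results[p] = (basis, diagonal)
    print(f"p={p}: basis {basis:.3f}  diagonal {diagonal:.3f}  gap {basis - diagonal:+.3f}")
```

The intermediate values p = 1.5 and p = 3 are printed only for inspection. Only the signs at p = 1 and p = ∞, and the near-equality at p = 2, are what the argument predicts.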
So if you're trying to guess what sort of norm some vector space naturally carries (let's say you're given, as a hint, that it's an Lp norm for some p), L2 should start out as a pretty salient option, along with, and arguably ahead of, L1 and L∞. As soon as you hear anything about there being multiple different bases that seem to have equal footing (as is saliently the case in QM), that settles it: L2 is the only option.
<3, this is exactly the sort of thought I claim to be missing when I say I still don't know how to trace the 2 in the Born rule back to its maker. This is a step I didn't yet have. It doesn't feel like the last piece I'm missing, but it does feel like a piece; eg, now I can focus some attention on "why precisely is this crossover point at 2 / where is that 2 coming from?". Thanks!
(And ofc there's still a question about why we use an Lp norm, and indeed why we pass our gaze along the walls in a way that factors through the shadow on that wall, but I am fairly happy rolling most of that into the "what are we to learn from the fact that reality has the quantum nature" bundle.)
A related thing that's special about the L2 norm is that there's a bilinear form <⋅,⋅>:V×V→R such that ‖v‖ carries the same information as <v,v>.
"Ok, so what? Can't you do the same thing with any integer n, with an n-linear form?" you might reasonably ask. First of all, not quite: it only works for the even integers, because otherwise you need to use absolute value*, which isn't linear.
But the bilinear forms really are the special ones, roughly speaking because they are a similar type of object to linear transformations. By currying, a bilinear form on V is a linear map V→V∗, where V∗ is the space of linear maps V→R. Now the condition of a linear transformation preserving a bilinear form can just be written in terms of chaining linear maps together. A linear map f:V→W has an adjoint f∗:W∗→V∗ given by f∗(φ)(v)=φ(f(v)) for φ:W→R, and a linear map f:V→V preserves a bilinear form B:V→V∗ iff f∗∘B∘f=B. When using coordinates in an orthonormal basis, the bilinear form is represented by the identity matrix, so if f is represented by the matrix A, this becomes A∗IA=I, which is where the usual definition A∗A=I of an orthogonal matrix comes from. For quadrilinear forms etc, you can't really do anything like this. So it's L2 for which you get a way of characterizing "norm-preserving" in a nice clean linear-algebraic-in-character way, so it makes sense that that would be the one to have a different space of norm-preserving maps than the others.
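For real matrices, the condition f∗∘B∘f = B reads AᵀBA = B in coordinates. Here is a minimal 2×2 check. The helper names are hypothetical, and B is taken to be the identity, which represents the form <u, v> in an orthonormal basis. A rotation satisfies the condition; a shear does not.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def pullback(A, B):
    """Matrix of the form (u, v) -> B(Au, Av), i.e. of f* . B . f, which is A^T B A."""
    return matmul(matmul(transpose(A), B), A)

def close(X, Y, tol=1e-12):
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

identity = [[1.0, 0.0], [0.0, 1.0]]  # the form <u, v> in an orthonormal basis
theta = 0.4
rotation = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
shear = [[1.0, 1.0], [0.0, 1.0]]

print("rotation preserves <.,.>:", close(pullback(rotation, identity), identity))
print("shear preserves <.,.>:", close(pullback(shear, identity), identity))
```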
I also subtly brushed past something that makes L2 a particularly special norm, although I guess it's not clear if it helps. A nondegenerate bilinear form is the same thing as an isomorphism between V and V∗. If <v,v> is always positive, then taking its square root gives you a norm, and that norm is L2 (though it may be disguised if you weren't using an orthonormal basis); and if it isn't always positive, then you don't get a norm out of it at all. So L2 is unique among all possible norms in that it induces and comes from an identification between your vector space and its dual.
*This assumes your vector space is over R for simplicity. If it's over C, then you can't get multilinearity no matter what you do, and the way this argument has to go is that you can get close enough by taking the complex conjugate of exactly half of the inputs, and then you get multilinearity from there. Speaking of C, this reminds me that I was inappropriately assuming your vector space was over R in my previous comment. Over C, you can multiply basis vectors by any scalar of absolute value 1, not just +1 and −1. This is broader than the norm-preserving changes of basis you can do over R to exactly the extent explicable by the fact that you're sneaking in a little bit of L2 via the definition of the absolute value of a complex number.
Thanks! I expect I can stare at this and figure something out about why there is no reasonable notion of "triality" in Vect (ie, no 3-way analog of vector space duality; and, like, obviously that's a little ridiculous, but also there's definitely still something I haven't understood about the specialness of the dual space).
ETA: Also, I'm curious what you think the connection is between the "L2 is connected to bilinear forms" and "L2 is the only Lp metric invariant under nontrivial change of basis", if it's easy to state.
FWIW, I'm mostly reading these arguments as being variations on "if you put anything other than a 2 there, your life sucks", and I believe that, but I still have a sense that the explanation I'm looking for is more about how putting a 2 there is positively natural, not just the best of a bad lot. That said, I'm loving these arguments, and I expect I can mine them for some of the intuition-corrections I seek :)
I was just thinking back to this, and it occurred to me that one possible reason to be unsatisfied with the arguments I presented here is that I started off with this notion of a crossing-over point as p continuously increases. But then when you asked "ok, but why is the crossing-over point 2?", I was like "uh, consider that it might be an integer, and then do a bunch of very discrete-looking arguments that end up showing there's something special about 2", which doesn't connect very well with the "crossover point when p continuously varies" picture. If indeed this seemed unsatisfying to you, then perhaps you'll like this more:
If we have a norm on a vector space, then it induces a norm on its dual space, given by ‖φ‖ := max_{‖v‖=1} φ(v). If a linear map preserves a norm, then its adjoint preserves the induced norm on the dual space.
Claim: The Lp norm on column vectors induces, as its dual, the Lq norm on row vectors, where p and q satisfy 1/p + 1/q = 1.
Thus if a matrix preserves Lp norm, then its adjoint preserves Lq norm. When p=2, we get that its adjoint preserves the same norm. This sort of gives you a natural way of seeing 2 as halfway between 1 and infinity, and giving, for every p, a corresponding q that is equally far away from the middle in the other direction, in the appropriate sense.
Proof of claim: Given p and q such that 1/p + 1/q = 1, and a row vector φ = (φ_1, ..., φ_n) with Lq norm 1, let x_i = |φ_i|^q, so that x_1 + ... + x_n = 1. Then let v_i := ±x_i^{1/p} (with the same sign as φ_i). The column vector v = (v_1, ..., v_n)^T has Lp norm 1, and φv = φ_1v_1 + ... + φ_nv_n = x_1^{1/p+1/q} + ... + x_n^{1/p+1/q} = 1. This shows that the dual-Lp norm of φ is at least 1. Standard constrained optimization techniques will verify that this v maximizes φv subject to the constraint that v has Lp norm 1, and thus that the dual-Lp norm of φ is exactly 1.
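Here is a numerical sanity check of the claim and its proof for one hypothetical case: p = 3 (so q = 3/2) and an arbitrary row vector. The sketch builds the maximizer v from the proof and confirms that ‖v‖_p = 1 and φv = 1. It then confirms that a crude random search over Lp-unit vectors never does better. This is a spot check of one case, not a substitute for the constrained-optimization step. The helper names are hypothetical.

```python
import math
import random

def lp_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def dual_exponent(p):
    """q with 1/p + 1/q = 1, for 1 < p < infinity."""
    return p / (p - 1.0)

def proof_maximizer(phi, p):
    """v_i = sign(phi_i) * x_i**(1/p) with x_i = |phi_i|**q, as in the proof."""
    q = dual_exponent(p)
    return [math.copysign((abs(f) ** q) ** (1.0 / p), f) for f in phi]

p = 3.0
q = dual_exponent(p)                      # 1.5
raw = [0.5, -2.0, 1.0]                    # hypothetical row vector
phi = [f / lp_norm(raw, q) for f in raw]  # rescale so that ||phi||_q = 1
v = proof_maximizer(phi, p)
pairing = sum(f * x for f, x in zip(phi, v))

# Crude random search over unit L^p vectors: it should never beat the proof's v.
rng = random.Random(2)
best_random = -math.inf
for _ in range(20_000):
    w = [rng.uniform(-1.0, 1.0) for _ in phi]
    norm = lp_norm(w, p)
    w = [x / norm for x in w]
    best_random = max(best_random, sum(f * x for f, x in zip(phi, w)))

print(lp_norm(v, p), pairing, best_random)
```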
Corollary: If a matrix preserves Lp norm for any p≠2, then it is a permutation matrix (up to flipping the signs of some of its entries).
Proof: Let q be such that 1/p + 1/q = 1. The columns of the matrix each have Lp norm 1, so the whole matrix (viewed as a vector of its n² entries) has Lp norm n^{1/p} (since the entries from each of the n columns contribute 1 to the sum). By the same reasoning about its adjoint, the matrix has Lq norm n^{1/q}. Assume wlog p<q. Lq norm is ≤ Lp norm for q>p, with equality only on scalar multiples of basis vectors. So if any column of the matrix isn't a basis vector (up to sign), then its Lq norm is less than 1; meanwhile, all the columns have Lq norm at most 1, so this would mean that the Lq norm of the whole matrix is strictly less than n^{1/q}, contradicting the argument about its adjoint.
This was what I was trying to vaguely gesture towards with the derivation of the "transpose = inverse" characterization of L2-preserving matrices; the idea was that the argument was a natural sort of thing to try, so if it works to get us a characterization of the Lp-preserving matrices for exactly one value of p, then that's probably the one that has a different space of Lp-preserving matrices than the rest. But perhaps this is too sketchy and mysterian. Let's try a dimension-counting argument.
Linear transformations R^n→R^n and bilinear forms R^n×R^n→R can both be represented with n×n matrices. Linear transformations act on the space of bilinear forms by applying the linear transformation to both inputs before plugging them into the bilinear form. If the matrix A represents a linear transformation and the matrix B represents a bilinear form, then the matrix representing the bilinear form you get from this action is A^T B A. But whatever, the point is, so far we have an n²-dimensional group acting on an n²-dimensional space. But quadratic forms (like the square of the L2 norm) can be represented by symmetric n×n matrices, the space of which is (n+1 choose 2)-dimensional, and if B is symmetric, then so is A^T B A. So now we have an n²-dimensional group acting on an (n+1 choose 2)-dimensional space, so the stabilizer of any given element must be at least n² − (n+1 choose 2) = (n choose 2) dimensional. As it turns out, this is exactly the dimensionality of the space of orthogonal matrices, but the important thing is that this is nonzero, which explains why the space of orthogonal matrices must not be discrete.
Now let's see what happens if we try to adapt this argument to Lp and p-linear forms for some p≠2.
With p=1, a linear transformation preserving a linear functional corresponds to a matrix A preserving a row vector φ in the sense that φA=φ. You can do a dimension-counting argument and find that there are tons of these matrices for any given row vector, but it doesn't do you any good because 1 isn't even, so preserving the linear functional doesn't mean you preserve L1 norm.
Let's try p=4, then. A 4-linear form R^n×R^n×R^n×R^n→R can be represented by an n×n×n×n hypermatrix, the space of which is n⁴-dimensional. Again, we can restrict attention to the symmetric ones, which are preserved by the action of linear maps. But the space of symmetric n×n×n×n hypermatrices is (n+3 choose 4)-dimensional, still much more than n². This means that our linear maps can use up all of their degrees of freedom moving a symmetric 4-linear form around to different 4-linear forms without even getting close to filling up the whole space, and never get forced to use their surplus degrees of freedom on linear maps that stabilize a 4-linear form, so it doesn't give us linear maps stabilizing L4 norm.
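The dimension counts in the last few paragraphs are plain arithmetic, so they can be tabulated. This sketch assumes the standard count C(n+p−1, p) for the dimension of symmetric p-linear forms on R^n. That count gives n(n+1)/2 for p = 2 and n(n+1)(n+2)(n+3)/24 for p = 4. The function names are hypothetical. The output is only the counting lower bound on the stabilizer dimension, not the stabilizers themselves.

```python
from math import comb

def symmetric_form_dim(n, p):
    """Dimension of the space of symmetric p-linear forms on R^n: C(n + p - 1, p)."""
    return comb(n + p - 1, p)

def forced_stabilizer_dim(n, p):
    """Counting lower bound on a stabilizer: dim GL(n) = n^2 minus the dimension of
    the space acted on, floored at 0 (the count says nothing below 0)."""
    return max(0, n * n - symmetric_form_dim(n, p))

for n in range(1, 7):
    print(n, symmetric_form_dim(n, 2), forced_stabilizer_dim(n, 2),
          symmetric_form_dim(n, 4), forced_stabilizer_dim(n, 4))
```

For p = 2 the bound equals C(n, 2), the dimension of the orthogonal group. For p = 4 and n ≥ 2 the space of symmetric forms is already larger than n², so the count forces nothing.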
I'll add to what you said in your main comment and the one below that the $L^2$ norm is also the built-in norm of human beings (and arguably all animals), as we evaluate distances in $R^3$ in the Euclidean norm (of which the $L^2$ norm is a generalisation) rather than the $L^1$ or $L^\infty$ norms.
The $L^2$ norm also seems to be the norm of the physical world: Newton's laws, for example, use the Euclidean norm.
This doesn't seem like it should be too hard: if you have some degrees of freedom which you take as representing your 'eyeball', and a preferred basis of 'measurement states' for that eyeball, repeatedly projecting onto that measurement basis will give sensible results for a sequence of measurements. Key here is that you don't have to project e.g. all the electrons in the universe onto their position basis, just the eyeball DOF onto their preferred 'measurement basis' (which won't look like projecting the electrons onto their position basis either), and then the relevant entangled DOF in the rest of the universe will automatically get projected onto a sensible 'classical-like' state. The key property about the universe's evolution that would make this procedure sensible is non-interference between the 'branches' produced by successive measurements, i.e. if you project onto two different eyeball states at time 1, then at time 2, those states will be approximately non-interfering in the eyeball basis. This is formalized in the consistent histories approach to QM.
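Here is a deliberately tiny sketch of the projection procedure described above, in which everything is hypothetical. It has two labelled (system, eyeball) branches, makes a Born-weighted choice of eyeball outcome, and renormalizes. There is no time evolution between steps, so it cannot exhibit or test the non-interference condition that the comment identifies as the real requirement. It only shows that once the eyeball has been projected, later projections reproduce the same record, and the entangled system label is carried along.

```python
import math
import random

# Hypothetical toy: a joint (system, eyeball) state after the eyeball has looked,
# stored as {(system_label, eyeball_label): amplitude}.
state = {
    ("spin_up", "saw_up"): complex(math.sqrt(0.3), 0.0),
    ("spin_down", "saw_down"): complex(0.0, math.sqrt(0.7)),
}

def measure_eyeball(state, rng):
    """Project onto one eyeball basis state with Born probability, then renormalize.
    The entangled system label comes along automatically."""
    weights = {}
    for (_, eye), amp in state.items():
        weights[eye] = weights.get(eye, 0.0) + abs(amp) ** 2
    outcomes = sorted(weights)
    outcome = rng.choices(outcomes, weights=[weights[o] for o in outcomes], k=1)[0]
    kept = {key: amp for key, amp in state.items() if key[1] == outcome}
    norm = math.sqrt(sum(abs(amp) ** 2 for amp in kept.values()))
    return outcome, {key: amp / norm for key, amp in kept.items()}

def observation_stream(state, steps, rng):
    """Repeated projections of the eyeball DOF, with no dynamics in between."""
    record = []
    for _ in range(steps):
        outcome, state = measure_eyeball(state, rng)
        record.append(outcome)
    return record

rng = random.Random(3)
print(observation_stream(state, 5, rng))
```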
What's somewhat trickier is identifying the DOF that make a good 'eyeball' in the first place, and what the preferred basis should be. More broadly it's not even known what quantum theories will give rise to 'classical-like' states at all. The place to look to make progress here is probably the decoherence literature, also quantum darwinism and Jess Riedel's work.
I agree that the problem doesn't seem too hard, and that there are a bunch of plausible-seeming theories. (I have my own pet favorites.)
I think that virtually every specialist would give you more or less the same answer as interstice, so I don't see why it's an open question at all. Sure, constructing a fully rigorous "eyeball operator" is very difficult, but defining a fully rigorous bridge rule in a classical universe would be very difficult as well. The relation to anthropics is more or less spurious IMO (MWI is just confused), but also anthropics is solvable using the infra-Bayesian approach to embedded agency. The real difficulty is understanding how to think about QM predictions about quantities that you don't directly observe but that your utility function depends on. However, I believe that's also solvable using infra-Bayesianism.
My own most recent pet theory is that the process of branching is deeply linked to thermalization, so to find model systems we should look to things modeling the flow of heat/entropy, e.g. a system coupled to two heat baths at different temperatures.
^_^
Also, thanks for all the resource links!
I think quantum darwinism is on the right track. FWIW, I found Zurek's presentation of it here to be more clear to me.
The gist of it is, AFAICT:
I have a different answer to go alongside AlexMennen's answer.
In differential topology, there is an important distinction between vectors and covectors. To see what this is, we need to look at the behavior under a change of basis. If we double our basis vectors, then we'll need to halve the coordinates of a vector, but we'll need to double the coordinates of a covector. A good way to visualize this is as a geographical map with contour lines. Position differences are vectors, and the contours are covectors.
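To make the doubling example concrete, here's a small sketch with made-up components (the particular numbers are arbitrary). After doubling the basis, vector components halve and covector components double, so the pairing $w(v)$ comes out the same. Naively summing the products of two vectors' components, by contrast, does not survive the change of basis.

```python
def pair(a, b):
    """Sum of products of components (the coordinate formula for w(v))."""
    return sum(x * y for x, y in zip(a, b))


v = [3.0, 4.0]    # a vector, e.g. a displacement on the map (made-up numbers)
w = [0.5, -1.0]   # a covector, e.g. the slope of the contour lines (made-up)

scale = 2.0                       # double every basis vector
v_new = [x / scale for x in v]    # vector components halve
w_new = [x * scale for x in w]    # covector components double

print(pair(w, v), pair(w_new, v_new))   # -2.5 -2.5: w(v) doesn't depend on the basis
print(pair(v, v), pair(v_new, v_new))   # 25.0 6.25: "v dot v" in coordinates does
```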
You can think of covectors as measuring vectors, but without adding something new, there's not a natural way to compare two vectors to each other. To compare two things, you need a function that will take those two things as an input, and return a scalar. In order for such a function to be invariant under change of basis, it will have to be a (0,2)-tensor (aka a bilinear form). Let's call this tensor T(u,v). Now if we multiply u and v by a scalar r, then bilinearity forces that T(ru, rv) = r^2 T(u,v), which is the squaring we were looking for (and in general, you can prove that T(u,u) is a quadratic form, and that it must actually be quadratic if we want both inputs to matter).
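A sketch of the same point with an explicit (0,2)-tensor, assuming the standard Euclidean metric as the component matrix $G$ and an arbitrarily chosen change-of-basis matrix $P$. Vector components transform with $P^{-1}$, the tensor's matrix transforms as $P^\top G P$, and the value $T(u,v)$ is unchanged; scaling both inputs by $r$ scales it by $r^2$.

```python
def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def bilinear(G, u, v):
    """T(u, v) = u^T G v for a (0,2)-tensor with component matrix G."""
    return sum(ui * gv for ui, gv in zip(u, matvec(G, v)))


G = [[1.0, 0.0], [0.0, 1.0]]        # assumed: the standard Euclidean metric
u, v = [1.0, 2.0], [3.0, -1.0]

# Change of basis: the new basis vectors are the columns of P (chosen arbitrarily).
P = [[2.0, 1.0], [0.0, 1.0]]
P_inv = [[0.5, -0.5], [0.0, 1.0]]   # inverse of P, written out by hand
u_new, v_new = matvec(P_inv, u), matvec(P_inv, v)
G_new = matmul(transpose(P), matmul(G, P))  # two factors of P: one per input slot

print(bilinear(G, u, v), bilinear(G_new, u_new, v_new))   # 1.0 1.0
r = 3.0
print(bilinear(G, [r * x for x in u], [r * x for x in v]), r ** 2 * bilinear(G, u, v))  # 9.0 9.0
```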
So to summarize:
Thanks! This seems to me like another piece of the puzzle =D
In this case, this is one that I already had (at least, well enough for the hindsight bias to kick in :p), and it's on my list of trailheads next time I try to ground out the 2 in the Born rule. FWIW, some lingering questions I have when I take this viewpoint include "ok, cool, why are there no corresponding situations where I want to compare 3 vectorish thingies?" / "I see why the argument works for 2, but I have a sneaking suspicion that this 2 is being slipped into the problem statement in a way that I'm not yet quite following". Also, I have a sense that there's some fairly important fact about anointing some linear isomorphism between a vector space and its dual as "canonical" that I have yet to grasp. Like, some part of the answer to "why do I never want to compare 3 vectorish thingies" is b/c the relationship between a vector space and its dual space is somehow pretty special, and there's no correspondingly special... triality of vector spaces. (Hrm. I wonder whether I can ground the 2 in the Born rule out into the 2 in categorical duality. That would be nuts.)
FWIW, one of my litmus tests is the question "assuming we are supposed to measure distance in $\mathbb{R}^2$ using an $L^2$ norm, why are we not supposed to measure distance in $\mathbb{R}^3$ using an $L^3$ norm?". And, like, I have a bunch of explanations for why this is (including "$L^3$ isn't invariant under most any change of basis" (per Alex's argument above) and "b/c the natural notion of distance between two points in $\mathbb{R}^3$ factors into two questions of distance in $\mathbb{R}^2$, so using $L^2$ in $\mathbb{R}^2$ pins down using $L^2$ in $\mathbb{R}^n$"), but I still feel like there's some... surprising fixation on 2 here, that I can't yet explain to the satisfaction of young-Nate who guesses that cuberoot(x^3 + y^3 + z^3)=r is the equation for a sphere. Like, I still feel kinda like math said "proof by induction: starting in the case where n=2, ..." and I'm like "wat why aren't we starting at 0" and it's like "don't worry, n < 2 will work out as special cases" and I'm like "ok, sure, this argument is valid, but also wtf are you doing". My wtfs aren't all ironed out yet.
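A numerical check of the "$L^3$ isn't invariant" explanation, restricted to rotations of $\mathbb{R}^2$. It's a small illustration only and doesn't cover every change of basis: rotating $e_1$ by 45° leaves its $L^2$ norm at 1, but changes its $L^1$, $L^3$ and $L^4$ norms.

```python
import math


def lp_norm(x, p):
    """The L^p norm (sum |x_i|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def rotate(x, theta):
    """Rotate a vector in R^2 by angle theta (an orthonormal change of basis)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


e1 = [1.0, 0.0]
for p in (1, 2, 3, 4):
    # p, norm before rotating, norm after rotating by 45 degrees
    print(p, lp_norm(e1, p), round(lp_norm(rotate(e1, math.pi / 4), p), 4))
```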
And, maybe I'm chasing shadows (eg, seeking a logical explanation where there is none, which is the sort of thing that can happen to a poor sap who lacks an explicit understanding of logical causality), but my suspicion is that I'm still missing part of the explanation. And both this route (which I'd gloss as "understand why it's so important/natural/??? to have/choose/determine a canonical isomorphism between your vector space and its dual, then appeal to bilinearity", with a prereq of better understanding why we have/care-about duality but not triality in Vect) and Alex's (which I'd gloss as "explain why 2 obviously-should-be the balance point in the $L^p$ norms") both feel like good trailheads to me.
(And, in case this wasn't clear to all readers, none of my assertions of confusion here are allegations that everyone else is similarly confused; indeed, Alex and Adele have already given demonstrations to the contrary.)
<3 hooray.
Awesome!
So, trilinear forms are a thing: for example, if you have 3 vectors, and you want to know the (signed) volume of the parallelepiped they form, that's a trilinear form. And that clearly has a "cubicness" to it, and you can do this for arbitrary numbers of vectors and covectors. The Riemann curvature tensor is perhaps the most significant one that has more than 2 (co)vectors involved. FWIW the dual space thing also seems likely to be important for my confusion about why phase space "volume" is 2-dimensional (even in super huge phase spaces)!
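The parallelepiped example can be checked directly. This sketch (the vectors are arbitrary) computes the signed volume as a $3\times 3$ determinant: scaling all three inputs by $r$ scales the volume by $r^3$, which is the "cubicness", and the function is linear in each argument separately.

```python
def det3(u, v, w):
    """Signed volume of the parallelepiped spanned by u, v, w (a 3x3 determinant)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


u, v, w = [1.0, 0.0, 0.0], [1.0, 2.0, 0.0], [0.0, 1.0, 3.0]
r = 2.0
print(det3(u, v, w))                                        # 6.0
print(det3(*([r * x for x in vec] for vec in (u, v, w))))  # 48.0, i.e. r**3 * 6.0
```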
I would say that distance is bilinear in arbitrary dimension because it's also inherently a comparison of two vectors (a vector to measure, and a "unit" vector to measure it by). Not sure if that reduces things any for you.
For me, it doesn't feel like there's going to be anything beyond "because comparison is important, and inherently 2-ish" for this. I do think part of why a metric is so significant is related to the dual space, but my guess is that even this will ultimately boil down to "comparison" (maybe as the concept of equality) being important.
I think I basically agree with all of this, though I definitely think that the problem that you're pointing to is mostly not about the Born rule (as I think you mostly state already), and instead mostly about anthropics. I do personally feel pretty convinced that at least something like UDASSA will stand the test of time on that front. It seems to me like you mostly agree with that, but just think that there are problems with embeddedness + figuring out how to properly extract a sensory stream that still need to be resolved, which I definitely agree with, but I still expect the end result of resolving those issues to look UDASSA-ish.
I also definitely agree that there are really important hints as to how we're supposed to do things like anthropics that we can get from looking at physics. I think that if you buy that we're going to want something UDASSA-ish, then one way in which we can interpret the hint that QM is giving us is as a hint as to what our Universal Turing Machine should be. Obviously, the problem with that is that it's a bit circular, since you don't want to choose a UTM just by taking a maximum likelihood estimate; otherwise you just get a UTM with physics as a fundamental operation. I definitely still feel confused about the right way to use physics as evidence for what our UTM should be, but I do feel like it should be some form of evidence; perhaps we're supposed to have a pre-prior or something here to handle combining our prior beliefs about simple UTMs with our observations about what sorts of UTM properties would make the physics that we find ourselves in look simple.
It's also worth noting that UDASSA pretty straightforwardly predicts the existence of happeningness dials, which makes me find their existence in the real world not all that surprising.
Also, this is neither here nor there, but as an aside I feel like this sort of discussion is where the meat of not just anthropics, but also population ethics is supposed to be: if we can figure out what the “correct” anthropic distribution is supposed to be (whatever that means), then that measure should also clearly be the measure that you use to weight person-moments in your utilitarian calculations.
I agree that the Born rule is just the poster child for the key remaining confusions (eg, I would have found it similarly natural to use the moniker "Hilbert space confusions").
I disagree about whether UDASSA contains much of the answer here. For instance, I have some probability on "physics is deeper than logic" being more-true-than-the-opposite in a way that ends up tossing UDASSA out the window somehow. For another instance, I weakly suspect that "running an emulation on a computer with 2x-as-thick wires does not make them twice-as-happening" is closer to the truth than the opposite, in apparent contradiction with UDASSA. More generally, I'm suspicious of the whole framework, and the "physics gives us hints about the UTM that meta-reality uses" line of attack feels to me like it has gone astray somewhere. (I have a bunch more model here, but don't want to go into it at the moment.)
I agree that these questions likely go to the heart of population ethics as well as anthropics :)
I feel like I would be shocked if running a simulation on twice-as-thick wires made it twice as easy to specify you, according to whatever the “correct” UTM is. It seems to me like the effect there shouldn't be nearly that large.
This is precisely the thought that caused me to put the word 'apparent' in that quote :p. (In particular, I recalled the original UDASSA post asserting that it took that horn, and this seeming both damning-to-me and not-obviously-true-for-the-reason-you-state, and I didn't want to bog my comment down, so I threw in a hedge word and moved on.) FWIW I have decent odds on "a thicker computer (and, indeed, any number of additional copies of exactly the same em) has no effect", and that's more obviously in contradiction with UDASSA.
Although, that isn't the name of my true objection. The name of my true objection is something more like "UDASSA leaves me no less confused, gives me no sense of 'aha!', or enlightenment, or a-mystery-unraveled, about the questions at hand". Like, I continue to have the dueling intuitions "obviously more copies = more happening" and "obviously, setting aside how it's nice for friends to have backup copies in case of catastrophe, adding an identical em of my bud doesn't make the world better, nor make their experiences different (never mind stronger)". And, while UDASSA is a simple idea that picks a horse in that race, it doesn't... reveal to each intuition why they were confused, and bring them into unison, or something?
Like, perhaps UDASSA is the answer and I simply have not yet figured out how to operate it in a way that reveals its secrets? But I also haven't seen anyone else operate it in a way that reveals the sort of things that seem-like-deconfusion-to-me, and my guess is that it's a red herring.
Absolutely no effect does seem pretty counterintuitive to me, especially given that we know from QM that different levels of happeningness are at least possible.
I think my answer here would be something like: the reason that UDASSA doesn't fully resolve the confusion here is that UDASSA doesn't exactly pick a horse in the race as much as it enumerates the space of possible horses, since it doesn't specify what UTM you're supposed to be using. For any (computable) tradeoff between “more copies = more happening” and “more copies = no impact” that you want, you should be able to find a UTM which implements that tradeoff. Thus, neither intuition really leaves satisfied, since UDASSA doesn't actually take a stance on how much each is right, instead just deferring that problem to figuring out what UTM is “correct.”
I also have that counterintuition, fwiw :p
I have the sense that you missed my point wrt UDASSA, fwiw. Having failed once, I don't expect I can transmit it rapidly via the medium of text, but I'll give it another attempt.
This is not going to be a particularly tight analogy, but:
Alice is confused about metaethics. Alice has questions like "but why are good things good?" and "why should we care about goodness?" and "if goodness is not objective, can I render murder good by deciding it's good?".
Bob is not confused about ethics. Bob can correctly answer many of Alice's questions: "good things are good b/c they result in good things such as, eg, human flourishing", and "because we like good consequences, such as human flourishing", and "no, because murder is not in fact good". (...I'm only subtweeting Sam Harris a little bit, here.)
The problem with these answers is not that they are incorrect. The problem with these answers is that they are not deconfusing, they are not identifying the box that Alice is trapped in and freeing her from it.
Claire is not confused about metaethics. Claire can state correct answers to the questions that Alice did not know she was asking, such as "Alice!goodness is more-or-less a fixed logical function; Alice!goodness is perhaps slightly different from Claire!goodness but they are close enough as to make no difference against the space of values; this fixed logical function was etched into your genes by eons of sex and death; it is however good, and other logical functions in its place would not be."
The problem with these answers is not that they are incorrect, as answers to the questions that Alice would have been asking were she freed from her box (although, once she's glimpsed the heretofore hidden degree of freedom, she's unlikely to need to actually ask those questions). The problem with these answers is that they are not meeting Alice at the point of her confusion. To her, they sound sort of odd, and do not yet have a distinguishable ring of truth.
What Alice needs in this hypothetical is a bunch of thought experiments, observations, and considerations that cause her to perceive the dimension along which her hypotheses aren't yet freed, so that the correct hypothesis can enter her view / so that her mind can undergo a subtle shift-in-how-she-frames-the-question such that the answers Claire gives suddenly become intuitively clear. She's probably going to need to do a lot of the walking herself. She needs questions and nudges, not answers. Or something. (This is hard to articulate.)
I claim that my state wrt various anthropic questions (such as the ol' trilemma) is analogous to that of Alice. I expect becoming deconfused about the trilemma to feel like a bunch of changes to my viewpoint that cause the correct hypothesis to enter my view / that cause my mind to undergo a shift-in-how-I-frame-the-question such that the correct answer snaps into focus. (This is still hard to articulate. I don't think my words have captured the core. Hopefully they have waved in the right direction.) More generally, I claim to know what deconfusion looks like, and I can confidently assert that UDASSA hasn't done it for me yet.
Like, for all I know, the odd shit UDASSA says to me is like the phrases Claire says to Alice: correct, topical, but odd-seeming and foreign from my current state of confusion. Perhaps there's a pathway through the valley of my confusion that causes me to shift my understanding of (eg) the trilemma, such that the problem falls away, and I start emitting UDASSA-like sentences on the other side, but if so I have not yet found it.
And, as someone in the Claire-state wrt the problem of metaethics, I claim that I would be able to go back and walk Alice through the valley, to the point where she was happily emitting Claire-statements all her own. (Or, at least, I'd have a pretty good hit rate among particularly sharp friends.) And I have not been able to cause any UDASSAer to walk me through the valley, perhaps b/c I'm bad at requesting it, but also perhaps b/c UDASSA doesn't do the thing. And also a number of the UDASSA-moves smell to me like missteps. All told, my guess is that it's making about as much progress at resolving the core confusions as it looks like it's making, ie, not much.
(To be clear, I have managed to get UDASSA to tell me why I shouldn't be confused about the trilemma. But this is not the currency I seek, alas.)
Yeah—I think I agree with what you're saying here. I certainly think that UDASSA still leaves a lot of things unanswered and seems confused about a lot of important questions (embeddedness, uncomputable universes, what UTM to use, how to specify an input stream, etc.). But it also feels like it gets a lot of things right in a way that I don't expect a future, better theory to get rid of—that is, UDASSA feels akin to something like Newtonian gravity here, where I expect it to be wrong, but still right enough that the actual solution doesn't look too different.
Neat! I'd bet against that if I knew how :) I expect UDASSA to look more like a red herring from the perspective of the future, with most of its answers revealed as wrong or not-even-wrong or otherwise rendered inapplicable by deep viewpoint shifts. Off the top of my head, a bet I might take is "the question of which UTM meta-reality uses to determine the simplicity of various realities was quite off-base" (as judged by, say, agreement of both EY and PC or their surrogates in 1000 subjective years).
In fact, I'm curious for examples of things that UDASSA seems to get right, that you think better theories must improve upon. (None spring to my own mind. Though, one hypothesis I have is that I've so-deeply-internalized all the aspects of UDASSA that seem obviously-true to me (or that I got from some ancestor-theory) that the only things I can perceive under that label are the controversial things, such that I am not attributing to it some credit that it is due. For instance, perhaps you include various pieces of the updateless perspective under that umbrella while I do not.)
I don't think I would take that bet; I think the specific question of what UTM to use does feel more likely to be off-base than other insights I associate with UDASSA. For example, some things that I feel UDASSA gets right: a smooth continuum of happeningness that scales with number of clones/amount of simulation compute/etc., and simpler things being more highly weighted.
Cool, thanks. Yeah, I don't have >50% on either of those two things holding up to philosophical progress (and thus, eg, I disagree that future theories need to agree with UDASSA on those fronts). Rather, happeningness-as-it-relates-to-multiple-simulations and happeningness-as-it-relates-to-the-simplicity-of-reality are precisely the sort of things where I claim Alice-style confusion, and where it seems to me like UDASSA is alleging answers while being unable to dissolve my confusions, and where I suspect UDASSA is not-even-wrong.
(In fact, you listing those two things causes me to believe that I failed to convey the intended point in my analogy above. I lean towards just calling this 'progress' and dropping the thread here, though I'd be willing to give a round of feedback if you wanna try paraphrasing or otherwise falsifying my model instead. Regardless, hooray for a more precise articulation of a disagreement!)
There are some interesting and tangentially related comments in the discussion of this post (incidentally, the first time I've been 'ratioed' on LW).
Good day, everyone. I've just come across this site; good discussion. Please take a look at
https://royalsocietypublishing.org/doi/abs/10.1098/rspa.2020.0282
(I'm the author). I hope this delivers the (exhaustive) answers to the questions above.
Comments/questions/remarks/criticism are welcome.
Who is this "particular child" you are referring to? You?
Regarding Q3, I don't understand what's wrong with the following observation: we checked the Born rule by doing repeated experiments, and QM alone, without the Born rule, predicts (via the inner product) that after repeated experiments the amplitude in all regions that contradict Born statistics tends to zero. That way we get a consistent world picture where all that really happens is amplitude decrease, and following the Born rule is just an arbitrary preference.
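For concreteness, here's a sketch of the calculation this seems to have in mind, assuming $N$ independent two-outcome trials with amplitudes $\sqrt{p}$ and $\sqrt{1-p}$. It sums the squared norm of the part of the state whose outcome frequency deviates from $p$ by more than $\varepsilon$; the parameters ($p=0.3$, $\varepsilon=0.05$) are arbitrary. It only computes the norms; it doesn't by itself settle whether treating that norm as the thing that matters already presupposes the Born rule.

```python
import math


def deviant_weight(n: int, p: float, eps: float) -> float:
    """Total squared amplitude on the n-trial branches whose observed
    frequency of outcome A differs from p by more than eps.

    Assumes each trial puts amplitude sqrt(p) on A and sqrt(1-p) on B, so a
    branch with k A's has squared amplitude p**k * (1-p)**(n-k), and there
    are C(n, k) such branches. Computed in log space to avoid overflow.
    Requires 0 < p < 1.
    """
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > eps:
            log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                        + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_term)
    return total


for n in (10, 100, 1000):
    print(n, deviant_weight(n, p=0.3, eps=0.05))
```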
You need to separate (1) treating something literally as a delta function from (2) treating something you have observed as having probability 1.0 for the purpose of further probability calculations.
If you are measuring something discrete like spin-up versus spin-down, it is completely standard to set the unobserved state to 0, effectively discarding it. The discarding (projection) is just as necessary a part of the procedure as the absolute-squaring (Born's rule per se).
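A minimal sketch of discard-and-renormalise for a single spin-1/2, with an arbitrarily chosen initial state: measure along $z$, keep only the observed component, renormalise, then measure along $x$. The absolute-squaring gives the probabilities; the projection step is what makes the follow-up $x$ measurement conditional on the observed $z$ outcome.

```python
import math

s = 1 / math.sqrt(2)
Z = {"up": [1 + 0j, 0j], "down": [0j, 1 + 0j]}        # z basis, in z components
X = {"+": [s + 0j, s + 0j], "-": [s + 0j, -s + 0j]}   # x basis, in z components


def measure(state, basis, outcome):
    """Return (Born probability, post-measurement state) for `outcome`.

    Discard every component orthogonal to the observed basis vector, then
    renormalise what is left (keeping the overall phase).
    """
    b = basis[outcome]
    amp = sum(bi.conjugate() * si for bi, si in zip(b, state))  # <b|state>
    prob = abs(amp) ** 2
    if prob == 0:
        raise ValueError("outcome has zero probability")
    phase = amp / abs(amp)
    return prob, [phase * bi for bi in b]


psi = [math.sqrt(0.8) + 0j, math.sqrt(0.2) + 0j]   # assumed initial spin state
p_up, after_z = measure(psi, Z, "up")        # spin-up along z: the down part is discarded
p_plus, after_x = measure(after_z, X, "+")   # then measure along x
print(round(p_up, 6), round(p_plus, 6))      # 0.8 0.5
```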
It's not that no one knows how to predict a series of observations with correct probabilities using QM; it's that the time-honoured method looks like Copenhagen.
I agree that there's a difference between "put a delta spike on the single classical state you sampled" and "zero out amplitude on all states not consistent with the observation you got from your sample". I disagree that using the latter to generate a sensory stream from a quantum state yields reasonable predictions: eg, taken literally, I think you're still zeroing out all but a measure-zero subset of the position basis, and I expect the momenta to explode immediately. You can perhaps get this hypothesis (or the vanilla delta spike) hobbling along by trying to smooth things out a bit (eg, keep a Gaussian centered on each classical state in which you made the sampled observation), but I still expect this to be experimentally distinguishable from what really happens (eg by way of some quantum-eraser-style hijinks or other sizeable entanglements), though I haven't checked the details myself.
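The "momenta explode" worry is the familiar position-momentum tradeoff. A sketch, assuming Gaussian wave packets and units with $\hbar = 1$: the momentum spread of a packet of position width $\sigma_x$ is $1/(2\sigma_x)$, so narrowing the spike toward a delta function sends the momentum spread to infinity. The code estimates the spread numerically with a direct (slow but dependency-free) discrete Fourier transform and prints the analytic value alongside; the grid size `n` and spacing `dx` are arbitrary choices.

```python
import cmath
import math


def momentum_spread(sigma_x: float, n: int = 256, dx: float = 0.1) -> float:
    """Std. dev. of the wavenumber k for a Gaussian packet of position width sigma_x.

    Units with hbar = 1, so k is the momentum. The momentum-space amplitude
    is computed by summing over the position grid directly (O(n^2)).
    """
    xs = [(j - n // 2) * dx for j in range(n)]
    amps = [math.exp(-x * x / (4 * sigma_x ** 2)) for x in xs]  # |psi|^2 has std sigma_x
    dk = 2 * math.pi / (n * dx)
    ks = [(m - n // 2) * dk for m in range(n)]
    weights = []
    for k in ks:
        phi = sum(a * cmath.exp(-1j * k * x) for a, x in zip(amps, xs))
        weights.append(abs(phi) ** 2)
    total = sum(weights)
    mean = sum(wt * k for wt, k in zip(weights, ks)) / total
    var = sum(wt * (k - mean) ** 2 for wt, k in zip(weights, ks)) / total
    return math.sqrt(var)


for sx in (2.0, 0.5, 0.2):
    # position width, numerical momentum spread, analytic hbar / (2 sigma_x)
    print(sx, round(momentum_spread(sx), 3), 1 / (2 * sx))
```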
The observation you got from your sample is information. Information is entropy, and entropy is locally finite. So I don't think it's possible for the states consistent with the observation you got from your sample to have measure zero.
When you're using TMs to approximate physics, you have to balance the continuity of physics against the discreteness of the machines somehow. The easy thing to do is to discuss the limiting behavior of a family of machines that perform the simulation at ever-finer fidelity. I was doing this implicitly, for lack of desire to get into details.
And as I've said above, I'm not attempting to suggest that these naive approaches (such as sampling a single classical state and reporting the positions of some things with arbitrary fidelity in the limit) are reasonable ideas. Quite the opposite. What I'm trying to point out is that if all you have is a quantum state and the Born rule, you cannot turn it into a hypothesis without making a bunch of other choices, for which I know of no consensus answer (and for which I have not seen proposals that would resolve the problem to my satisfaction, though I have some ideas).
I agree that the correct way of making these choices will almost surely not involve recording any observation with infinite precision (in the limit).
You have been assuming that all measurements are in the position basis, which is wrong. In particular, spin is its own basis.
If you make a sharp measurement in one basis, you have uncertainty or lack of information about the others. That does not mean the "momentum is randomised" in some catastrophic sense. The original position measurement was not deterministic, for one thing.
It is true that delta functions can be badly behaved. It's also true that they can be used in practice ... if you are careful. They are not an argument against discarding-and-renormalising, because if you don't do that at all, you get much wronger results than the results you get by rounding off small values to zero, i.e. using a delta to represent a sharp Gaussian.
That might be the case if you were making an infinitely sharp measurement of an observable with a real-valued spectrum, but there are no infinitely sharp measurements, and not every observable is real-valued.
To be clear, the process that I'm talking about for turning a quantum state into a hypothesis is not intended to be a physical process (such as a measurement), it's intended to be a Turing machine (that produces output suitable for use by Solomonoff induction).
That said, to be clear, I don't think this is a fundamentally hard problem. My point is not "we have absolutely no idea how to do it", it's something more like "there's not a consensus answer here" + "it requires additional machinery above and beyond [the state vector + the Born rule + your home address]" + "in my experience, many (non-specialist-physicist) people don't even register this as a problem, and talk about the Born rule as if it's supposed to fill this gap".
I agree that there are a bunch of reasonable additional pieces of machinery you can use to get the missing piece (such as choice of a "measurement basis"); my own suspicion is that the right answer looks a bit different from what I've seen others propose (and routes through, eg, machinery that lets you read off the remembered history, as opposed to machinery that picks some sub-basis to retain); my guess is that there are experimental differences in theory, but they're probably tricky to create in practice, though I haven't worked through the details myself.
Then you run into the basic problem of using SI to investigate MW: SIs are supposed to output a series of definite observations. They are inherently "single world".
If the program running the SWE outputs information about all worlds on a single output tape, they are going to have to be concatenated or interleaved somehow. Which means that to make use of the information, you have to identify the subset of bits relating to your world. That's extra complexity which isn't accounted for because it's being done by hand, as it were.
In particular, if you just model the wave function, the only results you will get represent every possible outcome. In order to match observation, you will have to keep discarding unobserved outcomes and renormalising as you do in every interpretation. It's just that that extra stage is performed manually, not by the programme.
To get an output that matches one observers measurements, you would need to simulate collapse somehow. You could simulate collapse with a PRNG, but it won’t give you the right random numbers.
Or you would need to keep feeding your observations back in so that the simulator can perform projection and renormalisation itself. That would work, but that's a departure from how SIs are supposed to work.
Meta: trying to mechanise epistemology doesn't solve much, because mechanisms still have assumptions built into them.
Yeah yeah, this is the problem I'm referring to :)
I disagree that you must simulate collapse to solve this problem, though I agree that that would be one way to do it. (The way you get the right random numbers, fwiw, is from sample complexity: SI doesn't put all its mass on the single machine that predicts the universe, it allocates mass to all machines that have not yet erred in proportion to their simplicity, so probability mass can end up on the class of machines, each individually quite complex, that describe QM and then hardcode the branch predictions. See also the proof about how the version of SI in which each TM outputs probabilities is equivalent to the version where they don't.)
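The "hardcode the branch predictions" point can be checked in the simplest case. This sketch assumes equal-weight binary branches (a fair coin), a hypothetical $K$-bit program for the shared physics, and machines that each hardcode a fixed number of outcome bits; it ignores prefix-free coding and the choice of UTM. The surviving prior mass of all such deterministic machines matches the mass of one probabilistic machine that assigns probability 1/2 to each bit. Unequal Born weights would need something like arithmetic coding, which isn't shown.

```python
from fractions import Fraction
from itertools import product

K = 5         # hypothetical length (bits) of the shared "simulate QM" program
HORIZON = 8   # hypothetical: each deterministic machine hardcodes this many outcome bits


def surviving_mass(observed: str) -> Fraction:
    """Prior mass of the deterministic machines 'QM + hardcoded branch string'
    that agree with every bit observed so far. Each such machine has length
    K + HORIZON, hence prior weight 2^-(K + HORIZON)."""
    weight = Fraction(1, 2 ** (K + HORIZON))
    survivors = sum(
        1 for bits in product("01", repeat=HORIZON)
        if "".join(bits).startswith(observed)
    )
    return survivors * weight


def probabilistic_mass(observed: str) -> Fraction:
    """Mass of a single length-K machine that outputs P(bit) = 1/2 at each step."""
    return Fraction(1, 2 ** K) * Fraction(1, 2 ** len(observed))


for obs in ("", "0", "011", "01101"):
    print(repr(obs), surviving_mass(obs), probabilistic_mass(obs))
```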
If your SI can't make predictions ITFP, that's rather beside the point. "Not erring" only has a straightforward implementation if you are expecting the predictions to be deterministic. How could an SI compare a deterministic theory to a probabilistic one?
The deterministic theory gets probability proportional to $2^{-\text{length} + (0 \text{ if it was correct so far, else } -\infty)}$; the probabilistic theory gets probability proportional to $2^{-\text{length} + \log_2(\text{probability it assigned to the observations so far})}$.
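In log form, the two weighting rules look like this. It's a sketch with hypothetical machine lengths and predictions; the weights are unnormalised, so only differences between them matter.

```python
import math


def log2_weight_deterministic(length_bits: int, correct_so_far: bool) -> float:
    """log2 of the unnormalised weight: -length, or -inf once it has erred."""
    return -float(length_bits) if correct_so_far else -math.inf


def log2_weight_probabilistic(length_bits: int, probs_assigned: list[float]) -> float:
    """log2 of the unnormalised weight: -length + log2 P(observations so far)."""
    return -float(length_bits) + sum(math.log2(q) for q in probs_assigned)


# Hypothetical machines: a 20-bit deterministic one that has predicted every
# bit correctly so far, and a 10-bit probabilistic one that gave each of the
# 8 observed bits probability 1/2.
det = log2_weight_deterministic(20, True)
prob = log2_weight_probabilistic(10, [0.5] * 8)
print(det, prob)   # -20.0 -18.0: the probabilistic machine currently has 4x the weight
```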
That said, I was not suggesting a Solomonoff inductor in which some machines were outputting bits and others were outputting probabilities.
I suspect that there's a miscommunication somewhere up the line, and my not-terribly-charitable guess is that it stems from you misunderstanding the formalism of Solomonoff induction and/or the point I was making about it. I do not expect to clarify further, alas. I'd welcome someone else hopping in if they think they see the point I was making & can transmit it.
The only reason that sort of discarding works is because of decoherence (which is a probabilistic, thermodynamic phenomenon), and in fact, as a result, if you want to be super precise, discarding actually doesn't work, since the impact of those other eigenfunctions never literally goes to zero.
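A toy illustration of "never literally goes to zero", under an assumed model rather than any particular Hamiltonian: each of $n$ environment qubits ends up in one of two states, depending on the system, with inner product $\cos\theta$. The interference term in the system's reduced density matrix is then multiplied by $\cos^n\theta$, which shrinks exponentially but stays nonzero for any finite $n$.

```python
import math


def coherence(a: complex, b: complex, theta: float, n_env: int) -> float:
    """|off-diagonal| of the system's reduced density matrix.

    Toy model (an assumption, not derived from any particular Hamiltonian):
    the system starts in a|0> + b|1>, and each of n_env environment qubits
    ends up in |e> if the system is |0>, or in |e'> if it is |1>, where
    <e|e'> = cos(theta). The two environment states then overlap by
    cos(theta)**n_env, which multiplies the interference term a * conj(b).
    """
    overlap = math.cos(theta) ** n_env
    return abs(a * b.conjugate()) * abs(overlap)


a = b = 1 / math.sqrt(2)
for n in (0, 10, 100, 1000):
    print(n, coherence(a, b, theta=0.1, n_env=n))
```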
Maybe, but decoherence doesn't imply MW.