One of the sharpest and most important tools in the LessWrong cognitive toolkit is the idea of going meta, also called seeking whence or jumping out of the system, all terms coined by Douglas Hofstadter. Though popularized by Hofstadter and repeatedly emphasized by Eliezer in posts like "Lost Purposes" and "Taboo Your Words", Wikipedia indicates that similar ideas have been around in philosophy since at least Anaximander in the form of the Principle of Sufficient Reason (PSR). I think it'd only be appropriate to seek whence this idea of seeking whence, taking a history-of-ideas perspective. I'd also like analyses of where the theme shows up, why it's appealing, and so on, since again it seems pretty important to LessWrong epistemology. Topics that I'd like to see discussed are:

  1. How conservation of probability in Bayesian probability theory and conservation of phase space volume in statistical mechanics are related—a summary of Eliezer's posts on the topic would be great.
  2. How conservation of probability &c. are related to other physical/mathematical laws, e.g. Noether's theorem and quantum mechanics' continuity equation.
  3. The history of the idea of conservation laws; whether the discovery of conservation laws was fueled by PSR-like philosophical concerns (e.g. Leibniz?), by lower level intuitive concerns, or by other means.
  4. How conservation of probability &c. are related to the idea of seeking whence [pdf] (e.g., "follow the improbability").
  5. How the PSR relates to conservation of probability &c. and to seeking whence.
  6. How going meta and seeking whence are related/equivalent.
  7. Which philosophers have used something like the PSR (e.g. Spinoza, Leibniz) and which haven't; those who haven't, what their reasons were for not using it.
  8. What kinds of conclusions are typically reached via the PSR or have historically been justified by the PSR, and whether those conclusions fit with LW's standard conclusions. If they disagree with LW's standard conclusions, where does the PSR not apply or not apply as strongly; alternatively, why standard LW conclusions might be mistaken.
  9. Whether Schopenhauer's four-fold division of the PSR makes sense. (Schopenhauer's a relatively LW-friendly continentalesque philosopher.) A summary of any criticisms of his four-fold division.
  10. What makes the PSR, going meta, "JOOTS"-ing and seeking whence appealing, from a metaphysical, epistemological, pragmatic, and psychological perspective. What sorts of environments or problem sets select for it. (The Baldwin effect and similar phenomena might be relevant.)
  11. What going meta / seeking whence looks like at different levels of organization; how one jumps out of systems at varying levels.
  12. Eliezer's rule of derivative validity from CFAI and how it relates to the PSR; an analysis of how the (moral, or perhaps UDT-like decision-policy-centric) PSR might be relevant to Friendliness philosophy, e.g. as compared with CEV-like proposals [pdf].
  13. How latent Platonic nodes in TDT [pdf] (p. 78) relate to the PSR.
  14. A generalization of CFAI's causal validity semantics to timeless validity semantics in the spirit of the generalization of CDT to TDT, or perhaps even further generalizations of causal validity semantics in the spirit of Updateless Decision Theory or eXceptionless Decision Theory. (ETA: Whoops, Eliezer already discussed the acausal level, but seems to have only mentioned Platonic forms as an afterthought. Maybe ignore this bullet point.)
  15. How the PSR and the rule of derivative validity relate to Robin Hanson's idea of pre-rationality and Wei Dai's questions about extending pre-rationality to include past selves' utility functions—whether this elucidates the relation between XDT and UDT.
  16. Where Hofstadter picked up the idea of "going meta" and what led him to think it was important. What led Eliezer to rely on it so much and emphasize the importance of avoiding lost purposes.
Does anyone have the mathematical, historical, philosophical, and research skills necessary to write a very long post or two or a sequence on this? How about just the skills required to tackle one of the above questions, or a question like those?
Might people with relevant knowledge share it in comments on this post? I'll try to write a few comments to seed discussion of some of the above questions.
(Maybe best discussed elsewhere, but: What other cognitive tools or themes often used on LessWrong have a long history that we don't all know even exists?)
I'd also just generally appreciate posts on going meta, especially with regard to epistemic rationality. Discussion of "going meta" in the domain of instrumental rationality tends to devolve into boring 'experiment more!' 'no, go meta more!' back-and-forths without any accompanying explicit VOI calculations. With epistemic rationality the benefits are more clear-cut. When people reflect and try to go meta, discussions tend to go way better: e.g., when people are explicitly aware that "politics is the mind-killer" they're less likely to start a distracting flame war, and more likely to discuss interesting factual issues. (Please, let's avoid discussion of "politics is the mind-killer" politics here. ;P ) I think this sort of effect could be increased across the board, especially for mildly political subjects like whether to donate to x-risk charities, and that this would increase the quality of most arguments on LessWrong. Although much of the relevant knowledge needed to go meta there is knowledge of signaling game theory and social psychology, and can't be replaced simply by applying a skill like 'go meta', going meta and staying reflective can still mitigate many obvious failure modes. Posts giving examples of (hypothetical) debates and how the debaters could go meta could be really valuable.
Thanks all!

Baez's series on network theory and information geometry answers a couple of your questions in a very accessible way.

Here Baez and Fong prove a version of Noether's theorem for Markov processes.

Earlier Baez talked about conservation of total probability.
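
A minimal sketch of what "conservation of total probability" means for a Markov process, assuming the continuous-time master equation dψ/dt = Hψ with an "infinitesimal stochastic" H (non-negative off-diagonal rates, every column summing to zero). The 3-state matrix, the Euler integrator, and all function names are made up for illustration and are not taken from the linked posts:

```python
import math

# Hypothetical 3-state rate matrix. Off-diagonal entries are non-negative
# transition rates; each column sums to zero, so probability only moves around.
H = [
    [-1.0,  0.5,  0.0],
    [ 1.0, -0.5,  2.0],
    [ 0.0,  0.0, -2.0],
]

def column_sums(matrix):
    """Sum of each column; all zeros means H is infinitesimal stochastic."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]

def euler_step(psi, matrix, dt):
    """One explicit Euler step of the master equation d(psi)/dt = H psi."""
    n = len(psi)
    flow = [sum(matrix[i][j] * psi[j] for j in range(n)) for i in range(n)]
    return [p + dt * f for p, f in zip(psi, flow)]

def evolve(psi, matrix, dt=0.01, steps=500):
    """Repeatedly apply euler_step; dt is small enough to keep entries non-negative."""
    for _ in range(steps):
        psi = euler_step(psi, matrix, dt)
    return psi

psi0 = [0.2, 0.3, 0.5]
psi_t = evolve(psi0, H)
print("column sums of H:", column_sums(H))
print("total probability before and after:", sum(psi0), sum(psi_t))
```

The individual probabilities change a lot, but their sum stays at 1 up to floating-point rounding, purely because the columns of H sum to zero.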

Here he relates the distribution of existing species to a prior, and shows how Bayes' rule describes the way the number of each species changes over time.
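
A rough sketch of that analogy, assuming a discrete-generation simplification rather than the formulation in the linked post: species fractions play the role of a prior, fitness plays the role of likelihood, and one generation of selection has the same form as one Bayesian update. The species names, fitness values, and function names are hypothetical:

```python
import math

def bayes_update(prior, likelihood):
    """Posterior over hypotheses: prior times likelihood, renormalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: w / evidence for h, w in unnormalized.items()}

def next_generation(fractions, fitness):
    """Species fractions after one round of reproduction proportional to fitness."""
    offspring = {s: fractions[s] * fitness[s] for s in fractions}
    total = sum(offspring.values())
    return {s: n / total for s, n in offspring.items()}

# Hypothetical numbers: three species, or equivalently three hypotheses.
fractions = {"A": 0.5, "B": 0.3, "C": 0.2}
fitness = {"A": 1.0, "B": 2.0, "C": 0.5}

after_selection = next_generation(fractions, fitness)
after_update = bayes_update(fractions, fitness)
print(after_selection)
print(after_update)
```

Both calls return the same numbers, and in both cases the fractions still sum to 1: a species (or hypothesis) can only gain share at the expense of the others.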

It looks as if you (Will) are trying to make some sort of equivalence between the idea of "going meta" and the "principle of sufficient reason", but surely these are completely different things.

Going meta (in the sense used here): saying "OK, so is there a reason why this is the way it is, and if so what's the reason?"

Principle of sufficient reason: "Everything that is true is true because of some prior reason sufficient to make it true."

PSR is not at all the same thing as "going meta"; it's not even the same kind of thing as "going meta"; it is one particular opinion about what sort of answers one will get when one goes meta.

It doesn't seem to me that the PSR is credible enough to warrant the sort of amount of attention you're trying to give it here, and in particular tying it to LWers' fondness for meta-ness seems entirely out of order.

(It feels to me as if the nearest thing to the PSR in LW tradition is not the idea of "going meta" but the idea of Solomonoff induction. But I haven't thought this through very hard.)

First, recursive statements (aka "going meta") are powerful. As the history of mathematics teaches, attempts to restrict the ability to make recursive statements run into enormous difficulty.

Second, recursive analysis of society (aka sociology) can provide very interesting insights, despite the fact that there is substantial social resistance to engaging in meta-level analysis. Hansonian signalling theory is a valuable perspective that could not be created without meta-level analysis of social behavior. Thus, lowering the resistance of the general population to meta-level analysis is relatively low-hanging fruit in raising the sanity line.

Most importantly, going meta is often a mistake when trying to solve real-world problems. Going meta again is almost always a mistake.
To be concrete, when I work as a lawyer, the actual meaning of the rules is usually the only important level of analysis. Sometimes, thinking about the policies that justify the rule is helpful. But thinking about the policies that justify having policies is pointless and unhelpful. In short, it is appropriate to be skeptical of assertions of the value of "going meta," particularly assertions about the value of going "meta-meta."

Is your only evidence for this:

Most importantly, going meta is often a mistake when trying to solve real-world problems. Going meta again is almost always a mistake.


To be concrete, when I work as a lawyer, the actual meaning of the rules is usually the only important level of analysis.

It seems very plausible that going meta would be particularly unhelpful in a field that's all about negotiating, and ensuring compliance with, precedent and explicit rules. Compare to a field where the problem that needs to be solved is one caused by precedent and explicit rules.

My main point was that meta-meta level (the part you didn't quote) is very seldom useful in solving problems. My sense is that Will is too willing to go to that level.

I don't agree with your point about what the practice of law is like, but that isn't important. With certain types of problems, one level of meta is useful very frequently, with other types of problems much less so. It depends a lot on context. In particular circumstances, I think most thoughtful people could come to agreement on the usefulness of meta level analysis (i.e. policy arguments). But Will is particularly poorly calibrated on the usefulness of meta arguments. Most of society under-relies on meta-level analysis. Will seems to over-rely on it.

How would you know how well I'm calibrated?

Most importantly, going meta is often a mistake when trying to solve real-world problems. Going meta again is almost always a mistake.

I think your concept of "the real world" is insufficiently broad. I think there is a moral imperative to figure out morality, and I think figuring out morality requires going ridiculously meta—including asking deep questions like "what policies justify having policies?". I think intelligent reflective people should spend as many hours as they can thinking about questions like these.

Perhaps now everyone can consider the "how meta should we go in our daily lives?" debate over—both sides have aired their opinions, and there's not much else to do beyond that.

I agree that everyone ought to deeply think about deep questions at least long enough to come up with answers or have a satisfactory explanation of why no answers are there. But you only need to go through this process once. After you have the answers to these questions, there's no call to continue thinking at this level unless you come to believe that the answers you found are wrong.

In short, there's no reason to go seeking new deep answers unless you have reason to think your deep answers don't work for you as well as they could. As I implied in my sanity line point, developing a sense of when that is happening is a valuable skill that most people simply don't have. But meta-level thinking doesn't develop that skill.

A closely related idea, or maybe a special case, is at the heart of what we do here:

  • When one makes a decision, one is also choosing to use a particular decision process.

  • When one believes something, one is also choosing to trust a particular truth-seeking process.

The history of the idea of conservation laws; whether the discovery of conservation laws was fueled by PSR-like philosophical concerns (e.g. Leibniz?), by lower level intuitive concerns, or by other means.

If I'm not mistaken, Maslama al-Majriti beat Antoine-Laurent de Lavoisier to the punch on the principle of the conservation of mass by about 8 centuries. As far as I can tell, he was closer to the practical side (like refining metals and creating medicines, not like proving theorems or arguing from first principles) of the theoretical-practical continuum (or, if you prefer, the philosopher-craftsmen continuum) than Leibniz.

[This comment is no longer endorsed by its author]

What is your source for the claim? Google finds many people making the claim, but I'm not convinced that Maslama made it. I think he did one burn where the amount lost as gas was about the same as the amount of oxygen gained, within the error of his not very accurate scale. If he drew philosophical conclusions from this, I would count it against him, but I don't think he did.

What is your source for the claim?

Ziauddin Sardar. The Touch of Midas : Science, Values, and Environment in Islam and the West. Manchester, U.K. Dover, N.H., U.S.A: Manchester University Press, 1984.

Google finds many people making the claim, but I'm not convinced that Maslama made it.

I'm not convinced either (and I'm even less convinced now). I'm getting fewer hits on Google than I would expect if it were a historical fact. Also, this violates my at-least-two-independent-sources heuristic for historical claims of this kind, so I will retract my comment until further notice.

Sardar says nothing more than Wikipedia. He quotes the same passage and claims that it stands on its own. He italicizes the last phrase. If al-Majriti emphasized it in the original, I would take that as his making the claim.

He says "compare this experiment with that of Lavoisier." Indeed, Lavoisier also used mercury, but it is terrible for this experiment, as I explained above, and maybe it caused Lavoisier to fool himself, too. But if that had been Lavoisier's last experiment, he wouldn't have convinced anyone. Who cares if one burn conserves mass? The point of the law is that if properly contained, all burns conserve mass. If al-Majriti weighed the container, that would be (1) closer to Lavoisier and (2) expressing an interest in the correct law, but the recorded weight sounds like the reagent, not the container.

How do you judge the independence of sources? I can certainly find you others that agree with Sardar.
Added: I suppose you could look for agreement between sources with different politics.

Here is a Holmyard book from 1945 that quotes the same passage, but does not believe the author is proposing the law. Incidentally, he disputes the attribution to al-Majriti. (In 1922 in Nature, Holmyard announced that he had a new manuscript and gave the translation of that passage that he copied in 1945 and which Wikipedia copied; Sardar gives a new translation.)

The metaphysical principle of the conservation of matter—that matter can be neither created nor destroyed in chemical processes—called upon here is at least as old as Aristotle (Weisheipl, 1963).

-Michael Weisberg, Paul Needham, and Robin Hendry, Stanford Encyclopedia of Philosophy

The citation seems to refer to this:
Weisheipl, James A. (1963), “The Concept of Matter in Fourteenth Century Science”, in Ernan McMullin, ed., The Concept of Matter in Greek and Medieval Philosophy, Notre Dame: University of Notre Dame Press.

I wasn't able to obtain a jail-broken copy of the paper; maybe you'll have better luck.


This paper is about Europe and thus does not shed much light on al-Majriti. The paper is about a metaphysical principle, not an empirical one. People point to al-Majriti because he might have done relevant experiments.

After thousands of years of acceptance of this metaphysical principle, Occam was skeptical; perhaps the original application of his razor was that "quantity of matter" was meaningless, and thus too its conservation (though atomists could use a count). Later 14th century Europeans defined "quantity" as "mass," and thus made a precise claim. Following Averroes, they used inertial mass, not gravitational mass.

So you could say that Lavoisier did not discover conservation of mass, but experimentally checked it. But he made a big splash at the time, probably because scientists were no longer so enthusiastic about metaphysical speculation. In particular, he showed the 14th century people made the right guess.

I can't think of any interesting connection between conservation of probability and Liouville's theorem. Can you elaborate on what you're thinking about? Conservation of probability just tells us that the total probability must always add up to 1, so if an update increases the probability of some hypothesis, there must be a corresponding decrease in probability of some other hypotheses. Since phase space volume is a finite measure, it must of course satisfy this sort of conservation. If the volume of some region of phase space increases, there must be a corresponding decrease in the volume of some other region.

But Liouville's theorem tells us something different. It tells us that if we start with a region of phase space A and follow it along as it evolves under a Hamiltonian flow into another region B, then the volumes of A and B will be identical. This feature isn't forced on us by the mere fact that volume is a measure. The only constraint placed by conservation of probability (in this case, conservation of measure) is that the total volume of phase space must remain constant under the flow. But this is compatible with the volume of some subset of phase space changing as it evolves under the flow, as long as this change is counteracted by another change elsewhere.
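
To make the distinction concrete, here is a one-dimensional toy sketch. It is not a real Hamiltonian phase space (those are even-dimensional), and both maps and all names are made up for illustration. Each map sends the interval [0, 1] onto itself, so the total "volume" is conserved, but only one of them preserves the volume of each region the way Liouville's theorem demands:

```python
import math

def reflect(x):
    """x -> 1 - x: maps [0, 1] onto itself and preserves every interval's length."""
    return 1.0 - x

def squeeze(x):
    """x -> x**2: also maps [0, 1] onto itself, but shrinks the left and stretches the right."""
    return x * x

def image_length(f, lo, hi):
    """Length of the image of [lo, hi] under a monotone map f."""
    return abs(f(hi) - f(lo))

# Two regions that together make up the whole toy "phase space".
regions = [(0.0, 0.5), (0.5, 1.0)]

results = {}
for name, f in [("reflect", reflect), ("squeeze", squeeze)]:
    lengths = [image_length(f, lo, hi) for lo, hi in regions]
    results[name] = lengths
    print(name, "region lengths:", lengths, "total:", sum(lengths))
```

Under `reflect` each half keeps length 0.5. Under `squeeze` the left half shrinks to 0.25 and the right half grows to 0.75, yet the total is still 1. The second map respects conservation of total measure while violating the Liouville-style region-by-region constraint.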

The conservation of total phase space volume (or what you might call the conservation of probability) simply follows from the fact that phase space volume is a finite measure. This places a constraint on possible phase space flows. There is no flow that can increase the total measure of phase space. Liouville's theorem doesn't just follow from the nature of the volume measure; it follows from the geometric structure of phase space. In a Hamiltonian system, phase space is a symplectic manifold. The flow must preserve the symplectic structure, and a consequence is that it must preserve the natural volume measure. So here it's not just the general mathematics of measures that's relevant, it's the particular symmetries of Hamiltonian phase space. The symplectic structure of Hamiltonian mechanics places a further constraint on the flow, and this time it's a continuity constraint.
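
For reference, the standard textbook route from Hamilton's equations to Liouville's theorem can be written out as follows. The symbols (coordinates q_i, momenta p_i, phase space velocity v, density ρ) are the usual ones and are introduced here only for illustration; this is the coordinate version of the symplectic argument, not a separate result:

```latex
% Hamilton's equations define the phase space velocity v = (\dot q, \dot p):
\dot q_i = \frac{\partial H}{\partial p_i}, \qquad
\dot p_i = -\frac{\partial H}{\partial q_i}

% The flow is divergence-free because mixed partial derivatives commute:
\nabla \cdot v
  = \sum_i \left( \frac{\partial \dot q_i}{\partial q_i}
                + \frac{\partial \dot p_i}{\partial p_i} \right)
  = \sum_i \left( \frac{\partial^2 H}{\partial q_i \, \partial p_i}
                - \frac{\partial^2 H}{\partial p_i \, \partial q_i} \right)
  = 0

% Combined with the continuity equation
% \partial_t \rho + \nabla \cdot (\rho v) = 0,
% the density is constant along trajectories:
\frac{d\rho}{dt}
  = \frac{\partial \rho}{\partial t} + v \cdot \nabla \rho
  = -\rho \, \nabla \cdot v
  = 0
```

The continuity equation alone is the "total probability is conserved" statement. The vanishing divergence is the extra input that comes from the Hamiltonian structure.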

Additionally, if you look at the probability distributions that are actually used in statistical mechanics, these distributions don't obey the Liouville kind of conservation of probability, but they obviously obey the Bayesian kind of conservation of probability (since they are probability distributions). If the distributions obeyed Liouville's theorem, there would not be an increase in entropy. Increasing entropy requires the probability distribution to spread out, but Liouville's theorem forbids this. We get increase of entropy by working with coarse-grained probability distributions that aren't governed by Liouville's theorem.
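
A toy illustration of the coarse-graining point, under loud assumptions: Arnold's cat map on a 64 x 64 integer lattice stands in for a volume-preserving flow (it is a bijection, so it is a discrete analogue of Liouville's theorem, not a statistical mechanics model), and "entropy" is plain Shannon entropy. The lattice size, cell size, and function names are arbitrary choices:

```python
import math
from collections import Counter

N = 64     # the lattice has N x N microstates
CELL = 8   # each coarse-graining cell covers CELL x CELL microstates

def cat_map(point):
    """Arnold's cat map on the lattice mod N (determinant 1, hence a bijection)."""
    x, y = point
    return ((2 * x + y) % N, (x + y) % N)

def fine_entropy(points):
    """Entropy of the uniform distribution over the occupied microstates."""
    return math.log(len(points))

def coarse_entropy(points):
    """Shannon entropy of the distribution over coarse cells."""
    counts = Counter((x // CELL, y // CELL) for x, y in points)
    total = len(points)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

def run(steps=6):
    """Start with one completely filled cell and record both entropies at each step."""
    points = {(x, y) for x in range(CELL) for y in range(CELL)}
    history = [(fine_entropy(points), coarse_entropy(points))]
    for _ in range(steps):
        points = {cat_map(p) for p in points}
        history.append((fine_entropy(points), coarse_entropy(points)))
    return history

history = run()
for step, (fine, coarse) in enumerate(history):
    print(step, round(fine, 3), round(coarse, 3))
```

The fine-grained entropy never changes, because the map only permutes microstates. The coarse-grained entropy starts at zero and becomes positive as the initial blob is smeared across many cells. Because the lattice is finite the map is eventually periodic, so this only illustrates the early-time behaviour.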

OK, I just read the link in your post, and I realized that you're referring to something different when you talk about conservation of probability in Bayesian epistemology. I still don't think it has all that much to do with Liouville's theorem, but some of the stuff I wrote above is a little bit irrelevant. Stupid pragmatist! That'll teach me to mouth off without first looking at the links.

Still, my main point stands. The Bayesian version of conservation of probability just follows from the mathematics of probability (plus Bayesian updating). The Liouvillean version follows from the geometric structure of the space over which the probability distributions are defined.
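
A minimal numerical sketch of the Bayesian version, assuming the linked notion is what LessWrong calls conservation of expected evidence: the probability-weighted average of the possible posteriors equals the prior. The numbers and names are hypothetical:

```python
import math

prior_h = 0.3            # P(H), hypothetical
p_e_given_h = 0.8        # P(E | H), hypothetical
p_e_given_not_h = 0.1    # P(E | not H), hypothetical

def posterior(prior, like_h, like_not_h):
    """P(H | observation) by Bayes' rule for a binary hypothesis."""
    joint_h = prior * like_h
    joint_not_h = (1.0 - prior) * like_not_h
    return joint_h / (joint_h + joint_not_h)

# Probability of seeing the evidence at all.
p_e = prior_h * p_e_given_h + (1.0 - prior_h) * p_e_given_not_h

# Posterior if E is observed, and posterior if E is not observed.
post_if_e = posterior(prior_h, p_e_given_h, p_e_given_not_h)
post_if_not_e = posterior(prior_h, 1.0 - p_e_given_h, 1.0 - p_e_given_not_h)

# Averaging over what we expect to see recovers the prior.
expected_posterior = p_e * post_if_e + (1.0 - p_e) * post_if_not_e
print(post_if_e, post_if_not_e, expected_posterior)
```

Observing E pushes the probability of H up and not observing it pushes it down, but the expected posterior equals the prior. Nothing about the geometry of the underlying space is used, only the sum and product rules.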

Zuckerberg just went meta. Reminder that in the novel, the architect of the metaverse was a billionaire using a Burroughsian information virus for mind control... 

I'm also looking for a discussion of the symmetry related to conservation of probability through Noether's theorem. A quick Google search only finds quantum mechanics discussions, which relate it to spatial invariances, etc.

If there's no symmetry, it's not a conservation law. Surely someone has derived it carefully. Does anyone know where?
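
For reference, the quantum mechanical case is the one standard place where conservation of probability does come out of a symmetry: the global phase invariance ψ → e^{iα}ψ of the Schrödinger Lagrangian. The statement below is the usual textbook one, for a single particle in a real potential V. It does not answer the question for classical or Bayesian probability:

```latex
% Schrodinger equation for a single particle in a real potential V:
i\hbar \, \frac{\partial \psi}{\partial t}
  = -\frac{\hbar^2}{2m} \nabla^2 \psi + V \psi

% Probability density and probability current
% (the Noether current of the global phase symmetry):
\rho = \psi^* \psi, \qquad
\mathbf{j} = \frac{\hbar}{2 m i}
  \left( \psi^* \nabla \psi - \psi \nabla \psi^* \right)

% Continuity equation, which follows from the Schrodinger equation:
\frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{j} = 0

% Integrating over all space, with \psi vanishing at infinity:
\frac{d}{dt} \int \rho \, d^3x = 0
```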