Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Toward a New Technical Explanation of Technical Explanation

16th Feb 2018


*[excellent, odds ratio 3:2 for worth checking LW2.0 sometimes and 4:3 for LW2.0 will succeed]*

I think "Determinism and Reconstructability" are great concepts but you picked terrible names for them, and I'll probably call them "gears" and "meta-gears" or something short like that.

This article made me realize that my cognition runs on something equivalent to logical inductors, and what I recently wrote on Be Well Tuned about cognitive strategies is a reasonable attempt at explaining how to implement logical inductors in a human brain.

Thank you! I'm glad to contribute to those odds ratios.

I neglected to optimize those names, yeah. But "gears" v "meta-gears"? I think the two things together make what people call "gears", so it should be more like "gears inside" v "gears outside" (maybe "object-level gears" v "meta-level gears"), so that you can say both are necessary for good gears-level models.

I hadn't seen Be Well Tuned!

I think it's perfectly valid to informally say "gears" while meaning both "gears" (how clear a model is on what it predicts) and "meta-gears" (how clear the meta model is on which models it a priori expects to be correct). And the new clarity you bring to this would probably be the right time to re-draw the boundaries around gears-ness, to make it match the structure of reality better. But this is just a suggestion.

Maybe so. I'm also tempted to call meta-gears "policy-level gears" to echo my earlier terminology post, but it seems a bit confusing. Definitely would be nice to have better terminology for it all.

Curated for:

- Being a clear and engaging explanation
- Putting significant effort into being technically precise
- Covering a core topic of rationality with a long history on this site
- Finally explaining logical induction in a way that made me feel like I get it

This is rather random, but I really appreciate the work done by the moderators in explaining their reasons for curating an article. Keep this up please!

I like this post in that it gives a much clearer picture of the intuition behind logical inductors than I've gotten before. Thanks for presenting it in a way that makes it easier to manipulate.

This post actually got me to understand how logical induction works, and also caused me to eventually give up on Bayesianism as the foundation of epistemology in embedded contexts (together with Abram's other post on the untrollable mathematician).

I do not understand Logical Induction, and I especially don't understand the relationship between it and updating on evidence. I feel like I keep viewing Bayes as a procedure separate from the agent, and then trying to slide LI into that same slot, and it fails because at least LI and probably Bayes are wrongly viewed that way.

But this post is what I leaned on to shift from an utter-darkness understanding of LI to a heavy-fog one, and re-reading it has been very useful in that regard. Since I am otherwise not a person who would be expected to understand it, I think this speaks very well of the post in general and of its importance to the conversation surrounding LI.

This also is a good example of the norm of multiple levels of explanation: in my lay opinion a good intellectual pipeline needs explanation stretching from intuition through formalism, and this is such a post on one of the most important developments here.

When I read this post, it struck me as a remarkably good introduction to logical induction, and the whole discussion seemed very core to the formal-epistemology projects on LW and AIAF.

the best technical understanding of practical epistemology available at the time* -- the Bayesian account --

Note: I don't stand by all possible interpretations of this sentence. It should be hedged very carefully, but attempting to do so in the main text would have reduced clarity.

Bayesianism is not a theory of epistemology in the philosophical sense. It has no account of what knowledge is, and as a result, does not attempt to answer many of the questions which are primary concerns of the field. Instead, Bayesianism provides an account of what's important: the information-theoretic processes which are going on in agents when developing true(er) beliefs, and the relationship between these information-theoretic processes and action. I therefore hedge with "practical epistemology".

This, too, is not obviously true, because Bayesianism is in many respects far from practical. While our community focuses on a praxis developed around the Bayesian model, it is not really clear that it is "the best" for this purpose: one might do better with techniques derived more from practice. There are schools of thought one might point to in this respect, such as whatever it is they teach in critical thinking courses these days, or ideas derived from introspective schools such as meditation and phenomenology, or management science.

None of these compete with Bayesianism in its home niche, though, as a *formal theory of rational thinking*. Hence my additional hedge, "technical understanding". Bayesianism provides a rigorous mathematical theory, which makes precise claims. Competitors such as Popperian epistemology lack this property. This is one reason why Bayesian models are a go-to when trying to invent new statistical techniques or machine learning algorithms.

Was Bayesianism really the best technical understanding of practical epistemology at the time? Alternatives which can compete with it on these grounds include PAC and MDL methods. I won't try to make a case against these here -- just note that the claim I'm making is intended to contrast with those in particular, not with other philosophical schools nor with other concrete rationality practices.

This was really great! I've been trying to grasp at a bunch of stuff in this direction for a while, and recently had a long conversation about moral uncertainty and uncertainty about decision theory (which was proposed in this paper by Will MacAskill), in which I expressed a bunch of confusion and discomfort with commonly suggested solutions, with reasoning similar to what you outlined in this post. This has clarified a good amount of what I had tried to say, though I have many more thoughts.

However, my thoughts are still all super vague and I have some urgent deadlines coming, and so probably won't have the time to make them coherent in the next few days. If anyone reads this a week from now or later, you are welcome to ping me to write down my thoughts on this.

If you don't feel you have a good intuitive grasp of what I mean by "gears level understanding", I suggest reading his post.

What post? Is there a link missing?

It does indeed look like the post lost a lot of links at some point in copying it to other places for editing. Yikes! I'll have to try and put them back in.

I suggest it's this: whether the hypothesis is well-defined, such that anyone can say what predictions it makes without extra information.

So, you might be confused reading this, because TETE (my preferred abbreviation) defines a hypothesis by its predictions.

But then you add in that reality is a connected series of cause and effect, where having a simple causal graphical model can predict hundreds of relationships, and then you realise that the causal graphical model is the hypothesis, the nodes + how they connect are the gears, and the joint probability distribution over all the variables is the predictions. That's why a hypothesis and its predictions are distinct.

Yeah, there's more to be said about this. If you think of the model as just being the predictions, it doesn't make sense to imagine handing a person a hypothesis without handing them the ability to figure out what it predicts. As you say, you can instead imagine that a hypothesis is a Bayesian network. Or, to be more general, you can imagine it is a program for generating predictions. (A Bayesian network being one particular program.) However, it *still* doesn't make very much sense to hand someone a program for generating predictions without having handed them the ability to make predictions! So the confusion persists.

In order to really formalize the distinction I'm making, you have to imagine that you can refer to a hypothesis and put a degree of credence on it without "having it". So, we still think of a hypothesis as a way of generating predictions, but when I tell you about a hypothesis I'm not necessarily giving you the source code -- or maybe I'm giving you the source code, but in a programming language you don't know how to execute, so you still don't have enough information to generate predictions.

So, a hypothesis *is just* a way of generating predictions, but knowing *about* the hypothesis or knowing *some description* of it doesn't necessarily give you access to the hypothesis' ability to generate predictions.

To clearly illustrate the distinction at play here:

Suppose I have access to a halting oracle which not everyone has access to. I generate predictions about whether a program will halt by consulting the halting oracle. My hypothesis is *not* "the halting oracle is always right" -- that's a hypothesis which you could also generate predictions from, such as "IF I could access the halting oracle to check it, THEN I would never falsify it" -- but rather, my hypothesis is just the halting oracle itself. (There's a use-mention distinction here.)

If I give you a big Bayesian network, you might make different predictions with it than I do because you have different knowledge to give the Bayesian network as inputs. However, both of us know how to implement the same input-output relation between observations and predictions; that's what the Bayesian network tells us. By contrast, I can't give you my procedure for consulting the halting oracle to generate predictions, unless I can give you the halting oracle.

Similarly, if a hypothesis is embodied in a bunch of messy neural connections, we may not be able to convey it perfectly. It's like we're all using different programming languages which are so complex that we can't hope to provide perfect compilers into each other's dialects.

This point is fairly subtle, since it relies on a distinction between inputs vs subroutines. Am I "observing" the halting oracle to generate my predictions, or am I "calling" it, as one bit of source code may call another? I can think of it both ways. At least it *is* clear, though, that I'm not "observing" the neural patterns implementing my brain.

I think that a prediction market is only an instrument for comparing existing logical uncertainty estimates, and some more straightforward instrument for their calculation may be needed.

It may be the share of similar statements from the same reference class which are known to be true. For example, if I want to learn the n-th digit of pi, I can estimate the logical uncertainty as 1 in 10, as the reference class consists of 10 possible digits which are equally likely to be true. If I want to learn the a priori probability of the veracity of a theorem about natural numbers, I may use the share of true statements about natural numbers (no longer than m) compared to all possible statements (of this length). As any statement belongs to several reference classes, I could get several estimates this way, and after taking the median, I will probably be close to some kind of best estimate available to me.

The algorithm for logical uncertainty calculation described above has the advantage that it doesn't require complex AI or general intelligence, as it just compresses all prior knowledge about how many theorems have turned out to be true, using rather simple calculations.

These calculations may be done more effectively if we add machine learning to predict which theorems are likely to be true, using the same architecture as in the AlphaZero board game engine. The AlphaZero system has a Monte Carlo search engine (in the space of possible future games) and an "intuition" neural net part, which is trained (on previous games) to predict which moves are likely to be winning.

I am having trouble parsing this section:

**However, the probability for the well-known observation was already at 100%. How can a previously-known statement provide new support for the hypothesis, as if we are re-updating on evidence we've already updated on?**

If it is a new theory, why is the evidence against which it is tested considered old? Further, how would this be different from using the theory to predict the precession of Mercury if you were to test it deliberately? Intuitively, this feels to me like privileging where the evidence lies in the time domain over anything else about it.

I agree that those are reasons to *not* treat old evidence differently.

In terms of the problem of old evidence as usually presented (to the best of my understanding), the idea is: if you already know a thing, how can you update on it? This can be formalized more, as an objection to Bayesianism (though not a very good one I think).

At one point I had a link to further explanation of this in SEP, but it looks like I accidentally removed all links from my post at some point during editing.

Thank you for the link; reading it caused my confusion to increase. Forgive me if the question seems stupid, but doesn't this prevent contradicting a theory?

Suppose we have evidence statement *E*, known for some time, and theory statement *H*, considered for some time. If we then discover that *H* implies -*E*, how does this not run into the same problem of old evidence? I would expect in Bayesian confirmation theory and also scientific practice for confidence in *H* to be reduced, but it seems to be the same operation as conditionalizing on *E*.

Come to think of it, how can we assert that our current confidence in *H* is correct if we discover anything new about its relationship to *E*? My guts rebel at the idea; intuitively it seems like we should conclude our previous update of *H* was an error, undo the previous conditionalization on *E*, and re-conditionalize on what we now know to be correct.

In Bayesian confirmation theory, you have to already have considered all the implications of a hypothesis. You can't be thinking of a hypothesis H and not know H->(-E) from the beginning. Discovering implications of a theory means you have logical uncertainty. Our best theory of logical uncertainty at the moment seems to be logical induction, and it behaves somewhat counterintuitively: noticing the implication H->(-E) would indeed disprove H, but if H merely confers high probability to -E, noticing this doesn't *necessarily* drive the belief in H down. This is actually an important feature, because if it *always* counted against H, you could always drive belief in H down by biasing the order in which you prove things.

Come to think of it, how can we assert that our current confidence in *H* is correct if we discover anything new about its relationship to *E*? My guts rebel at the idea; intuitively it seems like we should conclude our previous update of *H* was an error, undo the previous conditionalization on *E*, and re-conditionalize on what we now know to be correct.

Yeah, if I didn't know about LI, I would agree. "Bayes isn't wrong; you're specifying a wrong way for a being with finite computational resources to approximate Bayes! You re-do the calculations when you notice new hypotheses or new implications of hypotheses! Your old probability estimate was just poor; you don't have to explain the way your re-calculation changed things within the Bayesian framework, so there's no problem of old evidence."

However, given LI, the picture is more complicated. We can now say a lot more about what it means for an agent with bounded computational resources to reason approximately about a computationally intractable structure, and it *does* seem like there's a problem of old evidence.

I am hanging up consistently on the old/new verbiage, and you have provided enough resolution that I suspect my problem lies beneath the level we're discussing. So while I go do some review, I have a tangentially related question:

Are you familiar with the work of Glenn Shafer and Vladimir Vovk in building a game-theoretic treatment of probability? I mention it because of the prediction market comment for LI and the post about logical Dutch book; their core mechanisms appear to have similar intuitions at work, so I thought it might be of interest.

Here's an example where you OBVIOUSLY don't want to award points for old evidence: every time the stock market goes up or down, your friend says "I saw that coming". When you ask how, they give a semi-plausible story of how recent news made them suspect the stock market would move in that direction.

I've heard of Shafer & Vovk's work! Haven't looked into it yet, but Sam Eisenstat was reading it.

That much makes intuitive sense to me - I might go as far as to say that when we cherry-pick we are deliberately trolling ourselves with old evidence. I think I keep expecting that many of these problems are resolved by considering the details of how we, the agent, actually do the procedure. For example, say you have a Bayesian Confirmation Theoretic treatment of a hypothesis, but then you learn about LI, does re-interpreting the evidence with LI still count as the old evidence problem? Do we have a formal account of how to transition from one interpretation to the other, like a gauge theory of decisions (I expect not)?

I wrote a partial review of Shafer & Vovk's book on the subject here. I am still reading the book and it was published in 2001, so it doesn't reflect the current state of scholarship - but if you'll take a lay opinion, I recommend it.

A conclusion is gears-like with respect to a particular ontology to the extent that you can "see the derivation" in that ontology. A conclusion is gears-like without qualification to the extent that you can also "see the derivation" of the ontology itself.

Seeing the derivation of the ontology itself starts to run us into what some call the hard problem of consciousness and I think of as the hard problem of existence, that is, why does anything exist at all? Obviously it could be no other way else we wouldn't be here asking these questions, but it points towards the existence of an epistemologically irreducible point of ontology origination where something is somehow known before we can know how we know, yet anything we will come to know about how we know will be tainted by what we already know.

In phenomenology this is sometimes thought of as looking into the transcendental aspect of being because it is not perfectly knowable and yet we know of it anyway. This is not to mix up "transcendental" with mysticism, merely to point out that we see there is something we know that transcends our ability to fully know it even though we may reckon it is just as physical as everything else. As this unfortunately suggests for the line of reasoning you seem to be hoping to pursue, it is impossible because we are forever locked out of perfect knowledge by our instantiation as knowers in the world, and we must content ourselves with either being consistent but not complete in our understanding or complete but inconsistent.

I don't claim that you become completely gears at any point. You just keep looking for more objectivity in your analysis all the time, while also continuing to jump ahead of what you can objectively justify.

## A New Framework

(Thanks to Valentine for a discussion leading to this post, and thanks to CFAR for running the CFAR-MIRI cross-fertilization workshop. Val provided feedback on a version of this post. Warning: fairly long.)

Eliezer's A Technical Explanation of Technical Explanation, and moreover the sequences as a whole, used the best technical understanding of practical epistemology available at the time* -- the Bayesian account -- to address the question of how humans can try to arrive at better beliefs in practice. The sequences also pointed out several holes in this understanding, mainly having to do with logical uncertainty and reflective consistency.

MIRI's research program has since then made major progress on logical uncertainty. The new understanding of epistemology -- the theory of logical induction -- generalizes the Bayesian account by eliminating the assumption of logical omniscience. Bayesian belief updates are recovered as a special case, but the dynamics of belief change are non-Bayesian in general. While it might not turn out to be the last word on the problem of logical uncertainty, it has a large number of desirable properties, and solves many problems in a unified and relatively clean framework.

It seems worth asking what consequences this theory has for practical rationality. Can we say new things about what good reasoning looks like in humans, and how to avoid pitfalls of reasoning?

First, I'll give a shallow overview of logical induction and possible implications for practical epistemic rationality. Then, I'll focus on the particular question of A Technical Explanation of Technical Explanation (which I'll abbreviate TEOTE from now on). Put in CFAR terminology, I'm seeking a gears-level understanding of gears-level understanding. I focus on the intuitions, with only a minimal account of how logical induction helps make that picture work.

## Logical Induction

There are a number of difficulties in applying Bayesian uncertainty to logic. No computable probability distribution can give non-zero measure to the logical tautologies, since you can't bound the amount of time you need to think to check whether something is a tautology, so updating on provable sentences always means updating on a set of measure zero. This leads to convergence problems, although there's been recent progress on that front.

Put another way: Logical consequence is deterministic, but due to Gödel's first incompleteness theorem, it is like a stochastic variable in that there is no computable procedure which correctly decides whether something is a logical consequence. This means that any computable probability distribution has infinite Bayes loss on the question of logical consequence. Yet, because the question is actually deterministic, we know how to point in the direction of better distributions by doing more and more consistency checking. This puts us in a puzzling situation where we want to improve the Bayesian probability distribution by doing a kind of non-Bayesian update. This was the two-update problem.

You can think of logical induction as supporting a set of hypotheses which are about ways to shift beliefs as you think longer, rather than fixed probability distributions which can only shift in response to evidence.

This introduces a new problem: how can you score a hypothesis if it keeps shifting around its beliefs? As TEOTE emphasises, Bayesians outlaw this kind of belief shift for a reason: requiring predictions to be made in advance eliminates hindsight bias. (More on this later.) So long as you understand exactly what a hypothesis predicts and what it does not predict, you can evaluate its Bayes score and its prior complexity penalty and rank it objectively. How do you do this if you don't know all the consequences of a belief, and the belief itself makes shifting claims about what those consequences are?

The logical-induction solution is: set up a prediction market. A hypothesis only gets credit for contributing to collective knowledge by moving the market in the right direction early. If the market's odds on prime numbers are currently worse than those which the prime number theorem can provide, a hypothesis can make money by making bets in that direction. If the market has already converged to those beliefs, though, a hypothesis can't make any more money by expressing such beliefs -- so it doesn't get any credit for doing so. If the market has moved on to even more accurate rules of thumb, a trader would only lose money by moving beliefs back in the direction of the prime number theorem.
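The three market situations in the paragraph above can be sketched with a toy calculation. This is a simplified illustration, not the actual logical induction algorithm: it assumes a single tradeable claim, a share that pays 1 if the claim is true, and a trader who simply buys or sells one share. The window `START`, `WIDTH` and the three market prices are made-up values chosen for illustration.

```python
import math


def is_prime(n):
    """Trial division; fine for the small window used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def expected_profit(market_price, trader_belief, true_frequency):
    """Expected profit per share for a trader who buys when the price looks
    too low, sells when it looks too high, and abstains otherwise.
    A share pays 1 if the claim is true and 0 if it is false."""
    if trader_belief > market_price:
        position = 1.0
    elif trader_belief < market_price:
        position = -1.0
    else:
        position = 0.0
    return position * (true_frequency - market_price)


# Claim: "an integer drawn uniformly from [START, START + WIDTH) is prime".
START, WIDTH = 10**6, 2000  # hypothetical window, for illustration only
true_frequency = sum(is_prime(n) for n in range(START, START + WIDTH)) / WIDTH

# The trader's belief comes from the prime number theorem: density ~ 1 / ln(n).
pnt_belief = 1.0 / math.log(START)

naive_market = 0.5                                  # market knows nothing yet
converged_market = pnt_belief                       # market already agrees with the trader
sharper_market = (pnt_belief + true_frequency) / 2  # price halfway between PNT and the truth

profit_early = expected_profit(naive_market, pnt_belief, true_frequency)
profit_late = expected_profit(converged_market, pnt_belief, true_frequency)
profit_stale = expected_profit(sharper_market, pnt_belief, true_frequency)

print(profit_early, profit_late, profit_stale)
```

Under these assumptions the trader profits only in the first case, makes nothing once the market already prices in the prime number theorem, and loses when pushing a more accurate market back toward it.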

## Mathematical Understanding

This provides a framework in which we can make sense of mathematical labor. For example, a common occurrence in combinatorics is that there is a sequence which we can calculate, such as the Catalan numbers, by directly counting the number of objects of some specific type. This sequence is boggled at like data in a scientific experiment. Different patterns in the sequence are observed, and hypotheses for the continuation of these patterns are proposed and tested. Often, a significant goal is the construction of a closed form expression for the sequence.

This looks just like Bayesian empiricism, except for the fact that *we already have a hypothesis which entirely explains the observations.* The sequence is constructed from a definition which mathematicians made up, and which thus assigns 100% probability to the observed data. What's going on? It is possible to partially explain this kind of thing in a Bayesian framework by acting *as if* the true formula were unknown and we were trying to guess where the sequence came from, but this doesn't explain everything, such as why finding a closed form expression would be important.

Logical induction explains this by pointing out how different time-scales are involved. Even if all elements of the sequence are calculable, a new hypothesis can get credit for calculating them faster than the brute-force method. Anything which allows one to produce correct answers faster contributes to the efficiency of the prediction market inside the logical inductor, and thus, to the overall mathematical understanding of a subject. This cleans up the issue nicely.
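To make the time-scale point concrete, here is a small sketch using the Catalan numbers mentioned above. It assumes one particular definition (counting balanced strings of n pairs of parentheses) and compares direct counting with the standard closed form C(2n, n) / (n + 1). The function names are illustrative.

```python
from itertools import product
from math import comb


def catalan_brute_force(n):
    """Count balanced parenthesis strings with n pairs by checking every
    one of the 2^(2n) candidate strings: exponential time."""
    count = 0
    for seq in product((1, -1), repeat=2 * n):  # +1 is "(", -1 is ")"
        height = 0
        for step in seq:
            height += step
            if height < 0:  # closed a parenthesis that was never opened
                break
        else:
            if height == 0:  # every opened parenthesis was closed
                count += 1
    return count


def catalan_closed_form(n):
    """Closed form C(2n, n) / (n + 1): a handful of arithmetic operations."""
    return comb(2 * n, n) // (n + 1)


for n in range(7):
    print(n, catalan_brute_force(n), catalan_closed_form(n))
```

Both functions agree on every value, since the definition already fixes the sequence. The closed form just gets there far sooner, which is the sense in which a "hypothesis" that adds no new predictions can still earn credit.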

What other epistemic phenomena can we now understand better?

## Lessons for Aspiring Rationalists

Many of these could benefit from a whole post of their own, but here's some fast-and-loose corrections to Bayesian epistemology which may be useful:

- [...] *adjust* your odds as you think longer, they can leave most sentences alone and focus on a narrow domain of expertise. Everyone was already doing this in practice, but the math of Bayesian probability theory requires each hypothesis to make a prediction about every observation, if you actually look at it. Allowing a hypothesis to remain silent on some issues in standard Bayesianism can cause problems: if you're not careful, a hypothesis can avoid falsification by remaining silent, so you end up incentivising hypotheses to remain mostly silent (and you fail to learn as a result). Prediction markets are one way to solve this problem.
- [...] current price, so they take a hit for leaving a now-unpopular position which they initially supported (but less of a hit than if they'd stuck with it) or coming in late to a position of growing popularity. Other stock-market type dynamics can occur.
- [...] *too* confused about Hofstadter's-law type paradoxes.

You may want to be a bit careful and Chesterton-fence existing Bayescraft, though, because some things are still better about the Bayesian setting. I mentioned earlier that Bayesians don't have to worry so much about hindsight bias. This is closely related to the problem of old evidence.

## Old Evidence

Suppose a new scientific hypothesis, such as general relativity, explains a well-known observation such as the perihelion precession of Mercury better than any existing theory. Intuitively, this is a point in favor of the new theory. However, the probability for the well-known observation was already at 100%. How can a previously-known statement provide new support for the hypothesis, as if we are re-updating on evidence we've already updated on long ago? This is known as the problem of old evidence, and is usually levelled as a charge against Bayesian epistemology. However, in some sense, the situation is worse for logical induction.

A Bayesian who endorses Solomonoff induction can tell the following story: Solomonoff induction is the right theory of epistemology, but we can only approximate it, because it is uncomputable. We approximate it by searching for hypotheses, and computing their posterior probability retroactively when we find new ones. It only makes sense that when we find a new hypothesis, we calculate its posterior probability by multiplying its prior probability (based on its description length) by the probability it assigns to all evidence so far. That's Bayes' Law! The fact that we already knew the evidence is not relevant, since our approximation didn't previously include this hypothesis.
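The retroactive scoring in this story can be sketched as follows. This is a toy illustration with only two hypotheses, not Solomonoff induction itself. The description lengths and per-observation probabilities are invented numbers, and `retroactive_weight` and `normalize` are hypothetical helper names.

```python
def retroactive_weight(description_length_bits, likelihoods):
    """Unnormalized posterior: prior 2^-length times the probability the
    hypothesis assigns to each observation already on record."""
    weight = 2.0 ** (-description_length_bits)
    for p in likelihoods:
        weight *= p
    return weight


def normalize(weights):
    """Rescale the weights so they sum to 1 over the hypotheses considered."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical numbers: the new theory is more complex (12 bits vs 10 bits)
# but assigns higher probability to each of three observations already made.
weights = {
    "old_theory": retroactive_weight(10, [0.5, 0.5, 0.5]),
    "new_theory": retroactive_weight(12, [0.9, 0.9, 0.9]),
}
posterior = normalize(weights)
print(posterior)
```

With these made-up numbers, the new theory overtakes the old one even though every observation was known before the new theory was written down, which is exactly the kind of credit the market picture below refuses to give.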

Logical induction speaks against this way of thinking. The hypothetical Solomonoff induction advocate is assuming one way of approximating Bayesian reasoning via finite computing power. Logical induction can be thought of as a different (more rigorous) story about how to approximate intractable mathematical structures. In this new way, *propositions are bought or sold at market prices at the time.* If a new hypothesis is discovered, it can't be given any credit for 'predicting' old information. The price of known evidence is already at maximum -- you can't gain any money by investing in it.

There are good reasons to ignore old evidence, especially if the old evidence has biased your search for new hypotheses. Nonetheless, it doesn't seem right to *totally* rule out this sort of update.

I'm still a bit puzzled by this, but I think the situation is improved by understanding gears-level reasoning. So, let's move on to the discussion of TEOTE.

## Gears of Gears

As Valentine noted in his article, it is somewhat frustrating how the overall idea of gears-level understanding seems so clear while remaining only heuristic in definition. It's a sign of a ripe philosophical puzzle. If you don't feel you have a good intuitive grasp of what I mean by "gears level understanding", I suggest reading his post.

Valentine gives three tests which point in the direction of the right concept:

- [...] pay rent? If it does, and if it were falsified, how much (and how precisely) could you infer other things from the falsification?
- [...] could be different?
- [...] could you rederive it?

I already named one near-synonym for "gears", namely "technical explanation". Two more are "inside view" and Elon Musk's notion of reasoning from first principles. The implication is supposed to be that gears-level understanding is in some sense better than other sorts of knowledge, but this is decidedly not supposed to be valued to the exclusion of other sorts of knowledge. Inside-view reasoning is traditionally supposed to be combined with outside-view reasoning (although Elon Musk calls it "reasoning by analogy" and considers it inferior, and much of Eliezer's recent writing warns of its dangers as well, while allowing for its application to special cases). I suggested the terms gears-level & policy-level in a previous post (which I actually wrote after most of this one).

Although TEOTE gets close to answering Valentine's question, it doesn't quite hit the mark. The definition of "technical explanation" provided there is a theory which strongly concentrates the probability mass on specific predictions and rules out others. It's clear that a model can do this without being "gears". For example, my model might be that whatever prediction the Great Master makes will come true. The Great Master can make very detailed predictions, but I don't know how they're generated. I lack the understanding associated with the predictive power. I might have a strong outside-view reason to trust the Great Master: their track record on predictions is immaculate, their Bayes-loss minuscule, their calibration supreme. Yet, I lack an inside-view account. I can't derive their predictions from first principles.

Here, I'm siding with David Deutsch's account in the first chapter of *The Fabric of Reality*. He argues that understanding and predictive capability are distinct, and that understanding is about having good explanations. I may not accept his whole critique of Bayesianism, but that much of his view seems right to me. Unfortunately, he doesn't give a *technical* account of what "explanation" and "understanding" could be.

## First Attempt: Deterministic Predictions

TEOTE spends a good chunk of time on the issue of making predictions in advance. According to TEOTE, this is a human solution to a human problem: you make predictions in advance so that you can't make up what predictions you could have made after the fact. This counters hindsight bias. An ideal Bayesian reasoner, on the other hand, would never be tempted into hindsight bias in the first place, and is free to evaluate hypotheses on old evidence (as already discussed).

So, is gears-level reasoning just pure Bayesian reasoning, in which hypotheses have strictly defined probabilities which don't depend on anything else? Is outside-view reasoning the thing logical induction adds, by allowing the beliefs of a hypothesis to shift over time and to depend on the wider market state?

This isn't quite right. An ideal Bayesian can still learn to trust the Great Master, based on the reliability of the Great Master's predictions. Unlike a human (and unlike a logical inductor), the Bayesian will at all times have in mind all the possible ways the Great Master's predictions *could* have become so accurate. This is because a Bayesian hypothesis contains a full joint distribution on all events, and an ideal Bayesian reasons about all hypotheses at all times. In this sense, the Bayesian always operates from an inside view -- it cannot trust the Great Master without a hypothesis which correlates the Great Master with the world.

However, it is possible that this correlation is introduced in a very simple way, by ruling out cases where the Great Master and reality disagree without providing any mechanism explaining how this is the case. This may have low prior probability, but gain prominence due to the hit in Bayes-score other hypotheses are taking for not taking advantage of this correlation. It's not a bad outcome given the epistemic situation, but it's not gears-level reasoning, either. So, being fully Bayesian or not isn't *exactly* what distinguishes whether advance predictions are needed. What is it?

I suggest it's this: *whether the hypothesis is well-defined, such that anyone can say what predictions it makes without extra information.*

In his post on gears, Valentine mentions the importance of "how deterministically interconnected the variables of the model are". I'm pointing at something close, but importantly distinct: how deterministic the *predictions* are. You know that a coin is very close to equally likely to land on heads or tails, and from this you can (if you know a little combinatorics) compute things like the probability of getting exactly three heads if you flip the coin five times. Anyone with the same knowledge would compute the same thing. The model includes probabilities inside it, but how those probabilities flow is perfectly deterministic.

This is a notion of objectivity: a wide variety of people can agree on what probability the model assigns, despite otherwise varied background knowledge.
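To make the coin example concrete, here is a minimal sketch of the calculation anyone could carry out from the model alone. The function name `prob_exact_heads` is my own, and the coin is assumed to be exactly fair (the post only says "very close to" fair). Exact fractions are used so that the answer does not depend on floating-point details.

```python
from fractions import Fraction
from math import comb


def prob_exact_heads(n_flips, n_heads, p_heads=Fraction(1, 2)):
    """Binomial probability of exactly n_heads heads in n_flips independent flips."""
    ways = comb(n_flips, n_heads)  # number of orderings with n_heads heads
    return ways * p_heads**n_heads * (1 - p_heads) ** (n_flips - n_heads)


# Exactly three heads in five flips of a fair coin: 10 orderings out of 32.
print(prob_exact_heads(5, 3))  # 5/16
```

The model contains a probability, yet two people running this calculation cannot disagree about its output, which is the sense of "deterministic predictions" intended here.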

If a model is well-defined in this way, it is very easy (Bayesian or no) to avoid hindsight bias. You cannot argue about how you could have predicted some result. Anyone can sit down and calculate.

The hypothesis that the Great Master is always correct, on the other hand, does not have this property. Nobody but the Great Master can say what that hypothesis predicts. If I know what the Great Master says about a particular thing, I can evaluate the accuracy of the hypothesis; but, this is special knowledge which I need in order to give the probabilities.

The Bayesian hypothesis which simply forces statements of the Great Master to correlate with the world is somewhat more gears-y, in that there's a probability distribution which can be written down. However, this probability distribution is a complicated mish-mosh of the Bayesian's other hypotheses. So, predicting what it would say requires extensive knowledge of the private beliefs of the Bayesian agent involved. This is typical of the category of non-gears-y models.

## Objection: Doctrines

Unfortunately, this account doesn't totally satisfy what Valentine wants.

Suppose that, rather than making announcements on the fly, the Great Master has published a set of fixed Doctrines which his adherents memorize. As in the previous thought experiment, the word of the Great Master is infallible; the application of the Doctrines always leads to correct predictions. However, the contents of the Doctrines appear to be a large mish-mosh of rules with no unifying theme. Despite their apparent correctness, they fail to provide any understanding. It is as if a physicist took all the equations in a physics text, transformed them into tables of numbers, and then transported those tables to the Middle Ages with explanations of how to use the tables (but none of where they come from). Though the tables work, they are opaque; there is no insight as to how they were determined.

The Doctrines are a deterministic tool for making predictions. Yet, they do not seem to be a gears-level model. Going back to Valentine's three tests, the Doctrines fail test three: we could erase any one of the Doctrines and we'd be unable to rederive it by how it fit together with the rest. Hence, the Doctrines have almost as much of a "trust the Great Master" quality as listening to the Great Master directly -- the disciples would not be able to derive the Doctrines for themselves.

## Second Attempt: Proofs, Axioms, & Two Levels of Gears

My next proposal is that *having a gears-level model is like knowing the proof*. You might believe a mathematical statement because you saw it in a textbook, or because you have a strong mathematical intuition which says it must be true. But, you don't have the gears until you can prove it.

This subsumes the "deterministic predictions" picture: a model is an axiomatic system. If we know all the axioms, then we can in theory produce all the predictions ourselves. (Thinking of it this way introduces a new possibility, that the model may be well-defined but we may be unable to *find* the proofs, due to our own limitations.) On the other hand, we don't have access to the axioms of the theory embodied by the Great Master, and so we have no hope of seeing the proofs; we can only observe that the Great Master is always right.

How does this help with the example of the Doctrines?

The concept of "axioms" is somewhat slippery. There are many equivalent ways of axiomatizing any given theory. We can often flip views between what's taken as an axiom vs what's proved as a theorem. However, the most elegant set of axioms tends to be preferred.

So, we *can* regard the Doctrines as one long set of axioms. If we look at them that way, then adherents of the Great Master have a gears-level understanding of the Doctrines if they can successfully apply them as instructed.

However, the Doctrines are not an elegant set of axioms. So, viewing them in this way is very unnatural. It is more natural to see them as a set of assertions which the Great Master has produced by some axioms unknown to us. In this respect, we "can't see the proofs".

In the same way, we can consider flipping any model between the axiom view and the theorem view. Regarding the model as axiomatic, to determine whether it is gears-level we only ask whether its predictions are well-defined. Regarding it in "theorem view", we ask if we know how *the model itself* was derived.

Hence, two of Valentine's desirable properties of a gears-level model can be understood as the same property applied at different levels:

- *Determinism*, which is Val's property #2, follows from requiring that we can see the derivations within the model.
- *Reconstructability*, Val's property #3, follows from requiring that we can see the derivation *of* the model.

We might call the first level of gears "made out of gears", and the second level "made *by* gears" -- the model itself being constructed via a known mechanism.

If we change our view so that a scientific theory is a "theorem", what are the "axioms"? Well, there are many criteria which are applied to scientific theories in different domains. These criteria could be thought of as pre-theories or meta-theories. They encode the hard-won wisdom of a field of study, telling us what theories are likely to work or fail in that field. But, a very basic axiom is: we want a theory to be *the simplest theory consistent with all observations*. The Great Master's Doctrines cannot possibly survive this test.

To give a less silly example: if we train up a big neural network to solve a machine learning problem, the predictions made by the model are deterministic, predictable from the network weights. However, someone else who knew all the principles by which the network was created would nonetheless train up a very different neural network -- unless they use the very same gradient descent algorithm, data, initial weights, and number and size of layers.
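The reproducibility point can be illustrated with a toy stand-in for a network. This is only a sketch under my own assumptions: a two-weight model `y = a*b*x` fit to the target `y = 2*x` by plain gradient descent, with the function name `train`, the data, the learning rate, and the seeds all chosen for illustration. Many different weight pairs give the same predictions, so which pair you end up with depends on the initial weights.

```python
import random


def train(seed, steps=2000, lr=0.05):
    """Fit y = a*b*x to the target y = 2*x by plain gradient descent."""
    rng = random.Random(seed)  # the seed fixes the initial weights
    a, b = rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)
    xs = [-1.0, 0.5, 1.0]
    ys = [2.0 * x for x in xs]
    for _ in range(steps):
        # Shared factor of the mean-squared-error gradient for both weights.
        g = 2.0 * sum((a * b * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        a, b = a - lr * g * b, b - lr * g * a  # dL/da = g*b, dL/db = g*a
    return a, b


same_1 = train(seed=0)
same_2 = train(seed=0)  # same algorithm, data, and initial weights
other = train(seed=1)   # same principles, different initial weights

print(same_1 == same_2)  # identical weights
print(same_1, other)     # different weights, same product a*b
```

Both runs make the same predictions, since `a*b` converges to 2 either way, but only the run that copies every detail reproduces the same weights.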

Even if they're the same in all those details, and so reconstruct the same neural network *exactly*, there's a significant sense in which they can't *see* how the conclusion follows inevitably from the initial conditions. It's less doctrine-y than being handed a neural network, but it's more doctrine-y than understanding the structure of the problem and why almost any neural network achieving good performance on the task will have certain structures. Remember what I said about mathematical understanding. There's always another level of "being able to see why" you can ask for. Being able to reproduce the proof is different from being able to explain why the proof has to be the way it is.

## Exact Statement?

Gears-y-ness is a matter of degree, and there are several interconnected things we can point at, and a slippage of levels of analysis which makes everything quite complicated.

- In the ontology of math/logic, we can point at whether you can see the proof of a theorem. There are several slippages which make this fuzzier than it may seem. First: do you derive it only from the axioms, or do you use commonly known theorems and equivalences (which you may or may not be able to prove if put on the spot)? There's a long continuum between what one mathematician might say to another as proof and a formal derivation in logic. Second: how well can you see why the proof has to be? This is the spectrum between following each proof step individually (but seeing them as almost a random walk) vs seeing the proof as an elementary application of a well-known technique. Third: we can start slipping the axioms. There are small changes to the axioms, in which one thing goes from being an axiom to a theorem and another thing makes the opposite transition. There are also large changes, like formalizing number theory via the Peano axioms vs formalizing it in set theory, where the entire description language changes. You need to translate from statements of number theory to statements of set theory. Also, there is a natural ambiguity between taking something as an axiom vs requiring it as a condition in a theorem.
- In the ontology of computation, we can point at knowing the output of a machine vs being able to run it by hand to show the output. This is a little less flexible than the concept of mathematical proof, but essentially the same distinction. Changing the axioms is like translating the same algorithm to a different computational formalism, like going between Turing machines and lambda calculus. Also, there is a natural ambiguity between a program vs an input: when you run program XYZ with input ABC on a universal Turing machine, you input XYZABC to the universal Turing machine; but, you can also think of this as running program XY on input ZABC, or XYZA on input BC, et cetera.
- In the ontology of ontology, we could say "can you see why this has to be, from the structure of the ontology describing things?" "Ontology" is less precise than the previous two concepts, but it's clearly the same idea. A different ontology doesn't necessarily support the same conclusions, just like different axioms don't necessarily give the same theorems. However, the reductionist paradigm holds that the ontologies we use should all be consistent with one another (under some translation between the ontologies). At least, they should aspire to be eventually consistent. Analogous to axiom/assumption ambiguity and program/input ambiguity, there is ambiguity between an ontology and the cognitive structure which created and justifies the ontology. We can also distinguish more levels; maybe we would say that an ontology doesn't make predictions directly, but provides a language for stating models, which make predictions. Even longer chains can make sense, but it's all subjective divisions. However, unlike the situation in logic and computation, we can't expect to articulate the full support structure for an ontology; it is, after all, a big mess of evolved neural mechanisms which we don't have direct access to.

Having established that we can talk about the same things in all three settings, I'll restrict myself to talking about ontologies.
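The program/input ambiguity for the tape XYZABC can be spelled out in a few lines. This is a sketch only: the helper name `splits` is hypothetical, and treating the tape as a plain string ignores how a real universal machine would delimit program from input.

```python
def splits(tape):
    """Every way to read a tape as (program, input) by cutting it at one position."""
    return [(tape[:i], tape[i:]) for i in range(len(tape) + 1)]


tape = "XYZ" + "ABC"  # what the universal machine actually receives
for program, machine_input in splits(tape):
    # Each reading concatenates back to the same tape, so the machine
    # itself cannot tell the readings apart.
    print(repr(program), "applied to", repr(machine_input))
```

The split into XYZ and ABC is one row among seven; nothing on the tape marks it as the privileged one.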

Two-level definition of gears: A conclusion is gears-like with respect to a particular ontology to the extent that you can "see the derivation" in that ontology. A conclusion is gears-like without qualification to the extent that you can also "see the derivation" of the ontology itself. This is contiguous with gears-ness relative to an ontology, because of the natural ambiguity between programs and their inputs, or between axioms and assumptions. For a given example, though, it's generally more intuitive to deal with the two levels separately.

Seeing the derivation: There are several things to point at by this phrase.

- Making *precise* predictions. This could be seen as a prerequisite of "seeing the derivation": first, we must be saying something *specific*; then, we can ask if we can say why we're saying that particular thing. This implies that models are more gears-like when they are more deterministic, all other things being equal.
- Whether the *predictions* of the model are deterministic; the standard way of assigning probabilities to dice is very gears-like, despite placing wide probabilities. I think these are simply two different important things we can talk about.
- Seeing the derivation is about explicitness and external objectivity. You can trivially "execute the program" generating any of your thoughts, in that your thinking *is* the program which generated the thoughts. However, the execution of this program could rely on arbitrary details of your cognition. Moreover, these details are usually not available for conscious access, which means you can't explain the train of thought to others, and even you may not be able to replicate it later. So, a model is more gears-like the more *replicable* it is. I'm not sure if this should be seen as an additional requirement, or an explanation of where the requirements come from.

## Conclusion, Further Directions

Obviously, we only touched the tip of the iceberg here. I started the post with the claim that I was trying to hash out the implications of logical induction for practical rationality, but secretly, the post was about things which logical inductors can only barely begin to explain. (I think these two directions support each other, though!)

We need the framework of logical induction to understand some things here, such as how you still have degrees of understanding when you already have the proof / already have a program which predicts things perfectly (as discussed in the "mathematical understanding" section). However, logical inductors don't look like they care about "gears" -- it's not very close to the formalism, in the way that TEOTE gave a notion of technical explanation which is close to the formalism of probability theory.

I mentioned earlier that logical induction suffers from the old evidence problem more than Bayesianism. However, it doesn't suffer in the sense of losing bets it could be winning. Rather, *we* suffer, when we try to wrap our heads around what's going on. Somehow, logical induction is learning to do the right thing -- the formalism is just not very explicit about how it does this.

The idea (due to Sam Eisenstat, hopefully not butchered by me here) is that logical inductors get around the old evidence problem by learning notions of objectivity.

A hypothesis you come up with later can't gain any credibility by fitting evidence from the past. However, if you register a prediction *ahead of time* that a particular hypothesis-generation process will eventually turn up something which fits the old evidence, you *can* get credit, and use this credit to bet on what the hypothesis claims will happen later. You're betting on a particular school of thought, rather than a known hypothesis. "You can't make money by predicting old evidence, but you may be able to find a benefactor who takes it seriously."

In order to do this, you need to specify a precise prediction-generation process which you are betting in favor of. For example, Solomonoff Induction can't run as a trader, because it is not computable. However, the probabilities which it generates are well-defined (if you believe that halting bits are well-defined, anyway), so you can make a business of betting that its probabilities will have been good in hindsight. If this business does well, then the whole market of the logical inductor will shift toward trying to make predictions which Solomonoff Induction will later endorse.

Similarly for other ideas which you might be able to specify precisely without being able to run right away. For example, you can't find all the proofs right away, but you could bet that all the theorems which the logical inductor observes *have* proofs, and you'd be right every time. Doing so allows the market to start betting it'll see theorems if it sees that they're provable, even if it hasn't yet seen this rule make a successful advance prediction. (Logical inductors start out really ignorant of logic; they don't know what proofs are or how they're connected to theorems.)

This doesn't *exactly* push toward gears-y models as defined earlier, but it seems close. You push toward anything for which you can provide an explicit justification, where "explicit justification" is anything you can name ahead of time (and check later) which pins down predictions of the sort which tend to correlate with the truth.

This doesn't mean the logical inductor converges entirely to gears-level reasoning. Gears were never supposed to be everything, right? The optimal strategy combines gears-like and non-gears-like reasoning. However, it *does* suggest that gears-like reasoning has an advantage over non-gears reasoning: it can gain credibility from old evidence. This will often push gears-y models above competing non-gears considerations.

All of this is still terribly informal, but is the sort of thing which could lead to a formal theory. Hopefully you'll give me credit later for that advance prediction.