Searching for Bayes-Structure

So, we already have the underlying mathematical structure of the first half of cognition (determining the state of the world). What about the second half- influencing the world in a predictable way to achieve goals? We know that this is solvable, given enough computing power and omniscience- just dump the world data into a quantum simulator, run all possible modifications that the optimization process can make, and see which one has the highest expected utility (this, if I understand correctly, is basically what AIXI does).

"It was previously pointed out to me that I might be losing some of my readers with the long essays"

I for one find the long mathematical bayesian proselytizing some of your most fascinating posts. I can't wait for the next ones.

*So, we already have the underlying mathematical structure of the first half of cognition (determining the state of the world). What about the second half- influencing the world in a predictable way to achieve goals?*

= creating mutual information between your utility function and the world. This is left as an exercise to the reader.

"Correlation" is a big old fuzzy mess, usually just defined in terms of what's not correlated. As a result it boils down to E[x]E[y] =/= E[xy], or sometimes p(x|y) =/= p(x). It can only really be made quantitative (i.e. correlation coefficients) with linear variables, rather than categories. Mutual information really captures in a quantitative way how much you can predict one from the other.

That said, they're both bad terms because a utility function is not a probability distribution.
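To make the correlation versus mutual information contrast from the comment above concrete, here is a minimal sketch in Python. The toy distribution (X uniform on {-1, 0, 1}, Y = X squared) and the function names are illustrative choices, not anything from the thread. Y is fully determined by X, yet the covariance is zero, while the mutual information is clearly positive.

```python
import math
from collections import Counter

def covariance(pairs):
    """Population covariance of equally weighted (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

def mutual_information_bits(pairs):
    """Mutual information I(X;Y) in bits for equally weighted (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy example: X uniform on {-1, 0, 1}, Y = X**2.
pairs = [(x, x * x) for x in (-1, 0, 1)]
cov = covariance(pairs)              # zero: no linear relationship
mi = mutual_information_bits(pairs)  # positive: knowing X pins down Y
print(cov, mi)
```

The covariance cannot see the dependence because it is not linear; the mutual information does not care what shape the dependence takes.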

But you can have a probability distribution of utility functions. Now that is true only in certain circumstances, but there is a simple model in which you can make a very nice probabilistic statement.

If the state of the world consists of a vector of N real variables, and a utility function is another vector, with the utility being the dot product (meaning all utility functions are linear in those variables), and the expected value of each coefficient is 0

then expected utility can be expressed as the covariance of this vector, and rational behavior maximizes that covariance. So that's something.
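The covariance claim can be checked in a few lines. This is a sketch under the comment's own assumptions: utility is the dot product of a coefficient vector u with the world-state vector x, and every coefficient has expected value zero. The symbols u_i, x_i and U are labels chosen here for illustration.

```latex
\begin{align*}
\mathbb{E}[U] &= \mathbb{E}\Big[\sum_{i=1}^{N} u_i x_i\Big] = \sum_{i=1}^{N} \mathbb{E}[u_i x_i] \\
&= \sum_{i=1}^{N} \big(\operatorname{Cov}(u_i, x_i) + \mathbb{E}[u_i]\,\mathbb{E}[x_i]\big) \\
&= \sum_{i=1}^{N} \operatorname{Cov}(u_i, x_i) \qquad \text{since } \mathbb{E}[u_i] = 0 .
\end{align*}
```

So, in this simple model, expected utility is the sum of the coordinate-wise covariances between the utility coefficients and the world state.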

*creating mutual information between your utility function and the world* without changing your utility function.

Eliezer - Bayesian theory is a model. It isn't the universe. This is where you will be losing most of your readers - yes, you can express anything in Bayesian terms. You can express anything in C, too - this doesn't mean the universe is a program, and it doesn't provide any fundamental insight into how the universe works.

Bayesianism has the potential to go beyond just a means of expression and become a tool for decision making as well. Expressing a problem in Bayesian terms will lead to insights that will help you in solving the problem. Writing the same problem in C may not do the same.

I don't know jack shit *(like zilch, absolute zero)* about programming, but I think there's a way to explain this. You can express anything in C, and make it functionally representative of a particular model. If you had to model the entire universe that way, you probably would use some structure in your code that is very representative of a Bayesian model (I actually mean *exactly* like a Bayesian model). In this case, you're merely writing the underlying principle (which happens to be Bayesian) in C.

C is Turing-complete, which means Gödel-complete, so yeah, the universe can be viewed as a C program.

"yes, you can express anything in Bayesian terms. You can express anything in C, too"

A Turing-equivalent programming language (eg, lambda calculus) provides very little useful information about the universe, because you can use it to produce almost anything. It's very simple to write a C program that spits out 2 + 2 = 5, or any other incorrect statement you want. You can't do this with Bayesian logic- you can't juggle the math around and get a program written in Bayes-language that assigns a 1% probability to the Sun rising.

"this doesn't mean the universe is a program,"

The laws of physics the universe runs on are provably Turing-equivalent.

Bayesian health warning, above argument does not take all the evidence into account.

Remember all of the above is ignoring quantum physics, including such things as the no cloning theorem. Which, I think, has significant things to say about getting into the state of mutual information with the environment.

dude.

I am SO buying your book when you write it - if only because then your writing will have an introduction. I feel like I've jumped in at the middle here.

brent, if you search for "Bayesian" you'll get a fairly tight list of all relevant posts (for the most part). Start at the bottom and work your way up. Either that or you could just go back six months and start working your way through the archives.

Maybe it is time someone wrote a summary page and indexed this work.

erratum: disguies -> disguises

*I was actually a bit disappointed that no one in the audience jumped up and said: "Yes! Yes, that's it! Of course! It was really Bayes all along!"*

Well, I can't speak for the others but I'm very excited about all this and I try to read your postings every day. I always wondered about the mysteries of AI and human cognition and I hope you are right!

Tom - you can't write a C program that adds 2 and 2 and gets 5. You can write a C program that takes two and two, and produces five - through an entirely different algorithm than addition. And you're adding in an additional layer of model, besides - remember that 2 means absolutely nothing in the universe. "Two" is a concept within a particular mathematical model. You can choose the axioms for your model pretty much at will - the only question is how you have to twist the model to make it describe the universe.

And yes, I can write a program in Bayes-language that assigns a 1% probability to the Sun rising - simply by changing the definitions for these things, as you did when you wrote that you could write a C program that added 2 and 2 to get 5. It is the definitions - a form of axiom in themselves - that give meaning to the modeling language. Bayes-language can describe a universe completely contradictory to the one we live in, simply by using different definitions.

Bayes-language doesn't naturally describe the probability of the Sun rising, after all - you can't derive that probability from Bayes-language itself. You have to code in every meaningful variable, and their relationships to one another. This is no different from what you do in C.

And first, no, the laws of physics are provably no such thing - as we have no way to assign probability that we have a significant enough subset of those laws to be able to produce meaningful predictions of the laws of physics we don't yet know. And second, the laws of physics are equivalent to multiple contradictory coordinate systems. Any model can be, with the correct translations and transformations and definitions, accurate in describing the universe. That the universe behaves as a model might expect it to, therefore, says nothing about the universe, only about the model - and only as a model.

"Tom - you can't write a C program that adds 2 and 2 and gets 5."

Obviously, you can't rewrite the laws of math with C. But a C program can produce obviously incorrect statements, such as "2 + 2 = 5". There is, on average, one bug in every ten lines of C code.

"And you're adding in an additional layer of model, besides - remember that 2 means absolutely nothing in the universe."

See http://lesswrong.com/lw/ms/is_reality_ugly/.

"Bayes-language can describe a universe completely contradictory to the one we live in, simply by using different definitions."

Then, of course, it is no longer Bayes-language. You cannot simply redefine math- every theorem is tangled up with every other theorem to produce a coherent system, which will give you exactly one correct answer to every question. See http://lesswrong.com/lw/nz/arguing_by_definition/.

"This is no different from what you do in C."

It's perfectly possible to write a C program that inputs all the right data and generates garbage. You cannot write a Bayes program that inputs all the right data and generates garbage.

"as we have no way to assign probability that we have a significant enough subset of those laws to be able to produce meaningful predictions of the laws of physics we don't yet know."

Every prediction that the laws of physics make has been tested over and over again (often to ten decimal places or more).

"And second, the laws of physics are equivalent to multiple contradictory coordinate systems."

The laws of physics do not require a coordinate system of any sort to function, although this admittedly requires some pretty fancy math to get at (see Gravitation, by Misner, Thorne and Wheeler).

"Any model can be, with the correct translations and transformations and definitions, accurate in describing the universe."

If I wrote a version of GR that made gravity repulsive instead of attractive (a perfectly valid thing to do, mathematically), it would not be accurate in describing the universe, as this universe does not make things fall up.

"You cannot write a Bayes program that inputs all the right data and generates garbage."

That sounds like a challenge. Care to formalise things so this theory can be tested?

I will agree that, in book form, this section will need more signposts and structure. Otherwise, readers will skim past things you want them to read.

Also, resist RPG references unless you explain them in the main text. Here, online, you will have many audience members who own that Spelljammer book and need no explanation for GURPS Friendly AI rules and where the jokes are. Most audiences will not get the significance of having a minor helm in a gnomish helm. They don't even know what tinker gnomes are.

"The laws of physics the universe runs on are provably Turing-equivalent."

Are there any links or references for this? That sounds like fascinating reading.

It's a trivial observation based on a constructive proof ie. that which I'm writing and you're reading on.

(There is the issue of resource consumption, but then we have the result that the universe is Turing-complete for anything small enough.)

Turing-equivalent is usually used to mean that one system is at least as powerful as some sort of TM or UTM. Your computer is some sort of TM or UTM, and it exists inside the universe, so the universe (or its laws, rather) is quite obviously Turing-equivalent. That's the trivial observation.

Sometimes Turing-equivalent is said to be true only if a system can both implement some sort of TM or UTM within itself, and if it can also be implemented within some sort of TM or UTM. This is a little more objectionable and not trivial, but so far I haven't seen anyone demolish the various 'digital physics' proposals or the Church-Turing surmise by pointing out some natural process which is incomputable (except perhaps the general area of consciousness, but if you're on Less Wrong you probably accept the Strong AI thesis already).

Thanks for the reply. I want to follow a related issue now.

So are all natural processes computable (as far as we know)?

I want to know whether the question above makes sense, as well as its answer (if it does make sense).

I have trouble interpreting the question because I understand computability to be about effectively enumerating subsets of the natural numbers, but I don't find the correspondence between numbers and nature trivial. I believe there is a correspondence, but I don't understand how correspondence works. Is there something I should read or think about to ease my confusion? (I hope it's not *impenetrable* nonsense to both believe something and not know what it means.)

*I have trouble interpreting the question because I understand computability to be about effectively enumerating subsets of the natural numbers, but I don't find the correspondence between numbers and nature trivial. I believe there is a correspondence, but I don't understand how correspondence works. Is there something I should read or think about to ease my confusion? (I hope it's not impenetrable nonsense to both believe something and not know what it means.)*

A hard question. I know no good solid answer; people have tried to explain 'why couldn't that rock over there be processing a mind under the right representation?' It's one of those obscene questions - we know when a physics model is simulating nature, and when a computation is doing nothing like simulating nature, but we have no universally accepted criterion. Eliezer has written some entries on this topic, though I don't have them to hand.

*You cannot write a Bayes program that inputs all the right data and generates garbage.*

Yes you can. All it needs is the wrong prior. Hence the problem of induction.

Truth-finding --> Bayes-structure Bayes-structure -/-> truth-finding

Eliezer_Yudkowsky: To condense, you're saying that between the time the mind knows nothing (in a human's case, conception) and the time when it has knowledge of the world, it must have performed Bayesian inference (I'm trying to be more specific than your frequent "Bayesian-like processes"), because there is only a tiny probability of the mind's belief matching the world without doing so, similar to the probability of an egg unscrambling itself, water spontaneously giving you work, etc.

Now, I either have a counterexample, or misunderstand the generality of your claim. Evolutions would tend to give humans brains with beliefs that largely matched the world, else they would be weeded out. So, after conception, as the mind grows, it would build itself up (as per its genetic code, proteome, bacteria, etc.) with beliefs that match the world, even if it didn't perform any Bayesian inferences.

So, is this a genuine counterexample, or would you say that the evolutionary history functioned as a sort of mind that "encountered evidence" (organisms with poor beliefs dying out), which then built up a database of information about the world that it would then inject into new organisms?

Or did I miss your point somehow?

*The laws of physics the universe runs on are provably Turing-equivalent.*

Our current model of physics is. Not that I expect future models to change this, but it's important to remember the difference.

*Evolutions would tend to give humans brains with beliefs that largely matched the world, else they would be weeded out. So, after conception, as the mind grows, it would build itself up (as per its genetic code, proteome, bacteria, etc.) with beliefs that match the world, even if it didn't perform any Bayesian inferences.*

Note that natural selection can be seen as a process of Bayesian inference: gene frequencies represent prior and posterior probabilities, while the fitness landscape is equivalent to a likelihood function. However, evolution can only provide the mind with *prior* beliefs; presumably, these beliefs would have to match the ancestral evolutionary environment.
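The analogy in the comment above can be written out directly. This is a sketch only, assuming a deliberately simplified model of selection (discrete generations, asexual types, no mutation or drift); the frequencies and fitness values are made-up numbers. Under those assumptions, one generation of selection is the same arithmetic as one Bayesian update, with gene frequencies in the role of the prior and relative fitness in the role of the likelihood.

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses: prior times likelihood, renormalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def selection_step(frequencies, fitness):
    """Next-generation type frequencies under pure selection
    (simplifying assumptions: no mutation, no drift, no recombination)."""
    mean_fitness = sum(f * w for f, w in zip(frequencies, fitness))
    return [f * w / mean_fitness for f, w in zip(frequencies, fitness)]

# Hypothetical numbers, chosen only for illustration.
frequencies = [0.5, 0.3, 0.2]   # current share of each type ("prior")
fitness = [1.0, 1.5, 0.5]       # relative reproductive success ("likelihood")

after_selection = selection_step(frequencies, fitness)
posterior = bayes_update(frequencies, fitness)
print(after_selection)
print(posterior)
```

Mean fitness plays the role of the normalizing constant (the marginal probability of the evidence). Real populations add mutation, drift and recombination, so the correspondence is only this tidy in the toy case.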

Yes, anonymous, that's exactly what I was getting at. Eliezer_Yudkowsky's claim about a mind acquiring knowledge after first starting with nothing could only be true if we viewed the evolutionary history as the "mind". My caution here is against thinking that one understands the human brain because he has inferred that after conception, that human must have observed evidence on which he performed Bayesian inference (which could somehow be captured in an AI). In reality, this need not be the case at all -- that new human, upon growing, could simply have been fed accurate knowledge about the world, gathered through that evolution history, which coincidentally matches the world, even though he didn't gain it through any Bayesian inference.

So, again, am I too far out on a limb here?

Silas,

I think that could have read "At time T=1, the mind has 10 bits of mutual information... at time T=2, the mind has 100 bits of mutual information," and meant the same thing. Meaning, he's saying that, if during any time period the mind has acquired mutual information with S, then mind must have encountered evidence. This doesn't preclude us from starting out with some bits of information. The statement is about the change in the amount of information; starting from 0 is just convenient for the explanation.

"Yes you can. All it needs is the wrong prior."

I included this under "input the right data". Obviously, if you assign a prior of 10^-(10^(10^100)) to the Sun rising, you aren't going to get a significant probability for it happening no matter how many times you watch it rise.
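A rough sketch of why an extreme prior swamps the evidence: in log-odds form, each independent observation adds a fixed amount, so the number of observations needed to reach even odds grows linearly with the size of the prior's exponent. The 2:1 likelihood ratio per sunrise is a hypothetical figure picked for illustration, and the function name is invented here.

```python
import math

def observations_to_even_odds(prior_log10_odds, likelihood_ratio):
    """Smallest n such that prior_log10_odds + n * log10(likelihood_ratio) >= 0.

    Assumes independent observations that each favor the hypothesis by the
    same likelihood ratio (> 1). Both assumptions are for illustration only.
    """
    if prior_log10_odds >= 0:
        return 0
    return math.ceil(-prior_log10_odds / math.log10(likelihood_ratio))

# Hypothetical 2:1 evidence per observed sunrise.
n_for_1e6 = observations_to_even_odds(-6, 2.0)      # prior odds of 10^-6
n_for_1e100 = observations_to_even_odds(-100, 2.0)  # prior odds of 10^-100
print(n_for_1e6, n_for_1e100)
```

With prior log10-odds of -(10^(10^100)), the same formula asks for roughly 3.3 × 10^(10^100) observations, which is more sunrises than anyone will ever watch.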

"Are there any links or references for this? That sounds like fascinating reading."

See Feynman's QED for a popular explanation of quantum math, and http://en.wikipedia.org/wiki/General_relativity for GR (I'm not aware of any good books on GR that don't have a lot of fancy math).

"Evolutions would tend to give humans brains with beliefs that largely matched the world, else they would be weeded out."

This is not really true; see http://www.singinst.org/upload/CFAI/anthro.html#observer.

Tom -

"Obviously, you can't rewrite the laws of math with C. But a C program can produce obviously incorrect statements, such as "2 + 2 = 5". There is, on average, one bug in every ten lines of C code."

- That, of course, is a completely different statement. But then you are suggesting that Bayes-Language is incapable of representing a false statement - which is an obvious lie.

"See http://lesswrong.com/lw/ms/is_reality_ugly/."

- Yup. I see it. It's begging the point that I'm arguing - that the model is the universe.

"Then, of course, it is no longer Bayes-language. You cannot simply redefine math- every theorem is tangled up with every other theorem to produce a coherent system, which will give you exactly one correct answer to every question. See http://lesswrong.com/lw/nz/arguing_by_definition/."

- Yes, it is Bayes-language. Mathematics does NOT describe the universe, it describes mathematics - it is the variables which you input INTO the mathematics which make it describe a particular real-world situation. Mathematics is a modeling language no different from any other save in precision.

"It's perfectly possible to write a C program that inputs all the right data and generates garbage. You cannot write a Bayes program that inputs all the right data and generates garbage."

- You're begging the point, and yes, you can. Others have put this eloquently enough, however.

"Every prediction that the laws of physics make has been tested over and over again (often to ten decimal places or more)."

- You missed the point - we can't predict what the next law of physics we'll discover will be.

"The laws of physics do not require a coordinate system of any sort to function, although this admittedly requires some pretty fancy math to get at (see Gravitation, by Meisner, Wheeler and Thorne)."

- That's very good, if not entirely accurate. All variables are variables on some coordinate system or another, after all, if not a spatial one. The coordinate systems are particular mathematical models.

"If I wrote a version of GR that made gravity repulsive instead of attractive (a perfectly valid thing to do, mathematically), it would not be accurate in describing the universe, as this universe does not make things fall up."

- You didn't perform the appropriate transformations. They get quite nasty in this case, as your coordinate system would have to warp quite considerably in some fashion or another, but it can be done. As a very simple example, suppose a two-particle system, with the perspective as one of the particles; you then merely need to change the behavior of your measuring concept - say, light - to arrive in a time T inversely proportional to the distance. More complex systems with more complex variables would require exponentially more complex transformations to describe related concepts.

*~ eek! Three posts in "recent comments", going to get banned. ~*

Paul_Gebheim: The mind doesn't have to get evolutions' coincidentally-correct knowledge injected immediately at conception, so that re-interpretation wouldn't save Eliezer_Yudkowsky's point. The brain is slowly built up over time with assistance from the genetic code, the proteome, and microorganisms in the environment, and that interaction could very well give the brain non-Bayesian knowledge. And before you say, "but that's still mutual information with the environment!", yes it is, but it's the equivalent of accepting a belief on faith that happens to be true, or drawing a map of somewhere you've never been that happens to be accurate.

Tom_McCabe: *"Evolutions would tend to give humans brains with beliefs that largely matched the world, else they would be weeded out." This is not really true;*

Okay. So evolutions don't give organisms' brains *any* knowledge. And Eliezer_Yudkowsky's point is that much weaker.

"That, of course, is a completely different statement. But then you are suggesting that Bayes-Language is incapable of representing a false statement - which is an obvious lie."

Bayes-language can represent statements with very small probabilities, but then, of course, they will be assigned very small probabilities. You cannot assign a probability of .1% to the Sun rising without fudging the evidence (or fudging the priors, as Eli pointed out).

"- Yes, it is Bayes-language."

So much for begging the question. Please do a calculation, using the theorems of Bayes (or theorems derived from Bayesian theorems), which gives an incorrect number given correct numbers as input.

"Mathematics does NOT describe the universe,"

Using mathematics to describe the universe goes all the way back to Ptolemy. It isn't going away anytime soon.

"Mathematics is a modeling language no different from any other save in precision."

Ah, here we have found one who does not comprehend the beauty of math. Alas, it is beyond my ability to impart such wisdom in a blog comment. Just drive down to your local university campus and start taking math classes- you'll get it eventually.

"All variables are variables on some coordinate system or another, after all, if not a spacial one."

Neither GR nor QED requires a coordinate system of any sort. This is, admittedly, hard to wrap your head around, especially without going into the math. To name a simple example, it is mathematically impossible to cover the surface of a sphere (or, by topological extension, any closed surface) with a single coordinate system without creating a singularity. Needless to say, this does not mean that there must be some point on Earth where numbers go to infinity.

"- You missed the point - we can't predict what the next law of physics we'll discover will be."

We can predict that they won't violate the earlier ones.

"You didn't perform the appropriate transformations."

You simply flip the sign on the gravitational constant G. No geometric transformations required.

"Okay. So evolutions don't give organisms' brains *any* knowledge."

Evolution gives brains a system for acquiring knowledge, which is pseudo-Bayesian but operates under different surface rules. See Judgment Under Uncertainty or any other H&B textbook.

Suggestion: instead of playing point-counterpoint, pick some key words and play Taboo with them.

"So much for begging the question. Please do a calculation, using the theorems of Bayes (or theorems derived from Bayesian theorems), which gives an incorrect number given correct numbers as input."

Couldn't we say the same thing for Turing machines? "Please do a computation, using a Universal Turing Machine (or equivalent), which gives an incorrect number, given correct numbers as input."

Remember that a Universal Turing Machine takes a Turing machine as an input, so you can't muck around with the algorithm it runs, without making the input "incorrect".

I thought the whole point of probabilistic methods is that it doesn't matter too much what the prior is, it will always eventually converge on the right answer...

Well, apart from in some cases. The following is a situation where, unless you give the system exactly the right prior, it will never come to the right answer. It's not quite what you were after, but to my mind it shows a hole in Bayes.

Environmental output is the entire effect that a computation has on the environment (e.g. heat, radiation, reduction in the energy of the power source).

In the Sensitive Urn the colours of the balls are dependent upon the average environmental output from the processing done in the area since the last sample. That is they are a function of the processing done. We could represent knowledge about the probability function in the following way with the standard notation

P(r | Φ(μ, t_s − 100, t_s) > 10) is the probability that a ball is red given that there have been at least 10 units of environmental output per millisecond in the 100 milliseconds before the time of the current sample, t_s. We shall say that the probabilistic reasoner produces 20 units of output per millisecond and so fulfils this condition. This value is therefore found during its normal operation and sampling. However,

P(r | ∼(Φ(μ, t_{s−1}, t_s) > 10)), the probability that the ball will be red if there is no such processing in the area, is harder to find. For the sake of argument, say that this is the most efficient Bayesian reasoner that we could build. Finding this value would require that the sampler no longer process to the same extent, and because processing is required to update probabilities, it could no longer update probabilities. It is in effect a blind spot, a place that the sampler cannot go without changing itself and ceasing to be a Bayesian sampler.

"I thought the whole point of probabilistic methods is that it doesn't matter too much what the prior is, it will always eventually converge on the right answer..."

AIUI this is somewhat misleading. Bayesian methods are most valuable precisely when the amount of available data is limited and prior probability is important. Whenever "it doesn't matter too much what the prior is", it makes more sense to use frequentist methods, which rely on large amounts of data to converge to the right solution.

Of course frequentist tools *also* make assumptions about the data and some of these assumptions may be disguised and poorly understood (making sense of these is arguably part of the "searching for Bayes structure" program), but some interpretations are straightforward: for instance, likelihood-based methods are equivalent to Bayesian methods assuming a uniform prior distribution.

(As an aside, it's ironic that Bayesian interpretation of such statistical tools is being pursued for the sake of rigor, given that frequentist statistics itself was developed as a reaction to widespread ad-hoc use of the "principle of inverse probability".)
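The equivalence mentioned two paragraphs up (likelihood-based methods as Bayesian methods with a uniform prior) can be seen in a toy coin-flip problem. A minimal sketch: the data (7 heads in 10 flips), the grid search, and the skewed comparison prior are all illustrative assumptions. With a flat prior, the posterior mode lands on the maximum-likelihood estimate; with a non-flat prior it moves.

```python
def binomial_likelihood(theta, heads, flips):
    """Likelihood of the data up to a constant that does not depend on theta."""
    return theta ** heads * (1 - theta) ** (flips - heads)

# Hypothetical data and a coarse grid over the coin's bias.
heads, flips = 7, 10
grid = [i / 1000 for i in range(1001)]

# Frequentist: maximize the likelihood alone.
mle = max(grid, key=lambda t: binomial_likelihood(t, heads, flips))

# Bayesian with a uniform prior: the prior is a constant factor,
# so the posterior mode (MAP) is at the same point.
uniform_prior = 1.0
map_uniform = max(grid, key=lambda t: uniform_prior * binomial_likelihood(t, heads, flips))

# A non-uniform prior (unnormalized, favoring small theta) moves the mode.
def skewed_prior(theta):
    return (1 - theta) ** 4

map_skewed = max(grid, key=lambda t: skewed_prior(t) * binomial_likelihood(t, heads, flips))
print(mle, map_uniform, map_skewed)
```

This only compares point estimates; the Bayesian answer is the whole posterior, which is where credible intervals and marginalization come from.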

*Whenever "it doesn't matter too much what the prior is", it makes more sense to use frequentist methods, which rely on large amounts of data to converge to the right solution.*

... but only when the frequentist methods are easier to get working than the Bayesian approach. Even in large sample settings, it doesn't make sense to give up the nice things that Bayesian methods provide (like coherence, directly interpretable credible intervals and regions, marginalization for nuisance parameters, etc.) unless the tradeoff gives you something of value in return, e.g., a reasonably accurate answer computed much faster.

"Bayes-language can represent statements with very small probabilities, but then, of course, they will be assigned very small probabilities. You cannot assign a probability of .1% to the Sun rising without fudging the evidence (or fudging the priors, as Eli pointed out)."

- Yes you can. You can have insufficient evidence. (Your probability "assignment" will have very low probability of being correct, but the assignment itself could still easily be .1%.)

"So much for begging the question. Please do a calculation, using the theorems of Bayes (or theorems derived from Bayesian theorems), which gives an incorrect number given correct numbers as input."

- How about this as a counterchallenge - produce a correct number, any correct number at all, as it relates to the actual universe.

Incorrect numbers are generated constantly using probabilistic methods - they're eliminated or refined as more evidence comes along.

"Using mathematics to describe the universe goes all the way back to Ptolemy. It isn't going away anytime soon."

- If you're going to address a single statement, you should really pay attention to context.

"Ah, here we have found one who does not comprehend the beauty of math. Alas, it is beyond my ability to impart such wisdom in a blog comment. Just drive down to your local university campus and start taking math classes- you'll get it eventually."

- Beauty is truth, truth beauty? If you're going to argue reality you'll have to do better than the aesthetic value of mathematics.

"Neither GR nor QED requires a coordinate system of any sort. This is, admittedly, hard to wrap your head around, especially without going into the math. To name a simple example, it is mathematically impossible to cover the surface of a sphere (or, by topological extension, any closed surface) with a single coordinate system without creating a singularity. Needless to say, this does not mean that there must be some point on Earth where numbers go to infinity."

- Everything requires a coordinate system. For every value that HAS a value, there is an axis upon which its values are calculated. It might be a very simple boolean axis, and it might be a more complex one, representing a logarithmic function. But if a value has value, that value will be stored in some sort of mathematic concept space.

"We can predict that they won't violate the earlier ones."

- No, we can't.

"You simply flip the sign on the gravitational constant G. No geometric transformations required."

- Which is utterly irrelevant to the point I was making. Yes, there are simpler transformations, and less lossy ones in many cases. But the point was that any model can represent the universe, not that all are equally messy.

The Youtube link to "Yes!" is unfortunately broken.

This isn't exactly related, but I find that every link to other Less Wrong posts is like a little game. I try to guess which article the words are referencing. Like with "not so easy" I successfully predicted that it would be to the short inferential distances. I failed with the "stupid design", expecting it to be the blind alien god. That means I need to reread stupid design, and perhaps the alien god. The second part to the game is that I try to remember the main points, and all the thoughts I had when reading the cited post.

Were I playing the game, I would predict that the broken link was for Life of Brian, the "You are all individuals!" scene.

Did that beautiful scene of scientists finding Bayesian rhythm in cognitive phenomena actually happen?

Non-probabilistic formulations of the laws of thermodynamics now exist: https://arxiv.org/pdf/1608.02625.pdf

They are better than other formulations: e.g. they are scale independent and can explain the time asymmetry of the 2nd law.

We have seen that knowledge implies mutual information between a mind and its environment, and we have seen that this mutual information is negentropy in a very physical sense: If you know where molecules are and how fast they're moving, you can turn heat into work via a Maxwell's Demon / Szilard engine.

We have seen that forming true beliefs without evidence is the same sort of improbability as a hot glass of water spontaneously reorganizing into ice cubes and electricity. Rationality takes "work" in a thermodynamic sense, not just the sense of mental effort; minds have to radiate heat if they are not perfectly efficient. This cognitive work is governed by probability theory, of which thermodynamics is a special case. (Statistical mechanics is a special case of statistics.)
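As a minimal sketch of the claim that knowledge is mutual information between mind and environment, consider a toy setup that is purely illustrative: the environment S is a single fair bit, and the mind's belief B is copied from a reading of S that is wrong with some probability. The function names and the error rates below are assumptions chosen for the example, not anything taken from the posts referenced above.

```python
from math import log2

def mutual_information(joint):
    """Mutual information I(S;B) in bits, from a dict {(s, b): probability}."""
    p_s, p_b = {}, {}
    for (s, b), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p  # marginal over the environment
        p_b[b] = p_b.get(b, 0.0) + p  # marginal over the belief
    return sum(p * log2(p / (p_s[s] * p_b[b]))
               for (s, b), p in joint.items() if p > 0)

def joint_after_observation(error_rate):
    """Joint distribution of a fair bit S and a belief B copied from a noisy reading of S."""
    joint = {}
    for s in (0, 1):
        for b in (0, 1):
            p_read = 1 - error_rate if b == s else error_rate
            joint[(s, b)] = 0.5 * p_read
    return joint

blind_guess = joint_after_observation(0.5)  # the reading is pure noise: no evidence at all
noisy_look = joint_after_observation(0.1)   # the reading is right 90% of the time
clear_look = joint_after_observation(0.0)   # the reading is always right

for name, joint in [("blind guess", blind_guess),
                    ("noisy look", noisy_look),
                    ("clear look", clear_look)]:
    print(f"{name}: {mutual_information(joint):.3f} bits")
```

In this toy model a belief formed without evidence shares zero bits with S, a perfectly reliable look shares exactly one bit, and a noisy look lands in between. The only way to raise the number is to let the belief depend on S through some channel, which is the sense in which true beliefs have to be paid for with evidence.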

If you saw a machine continually spinning a wheel, apparently without being plugged into a wall outlet or any other source of power, then you would look for a hidden battery, or a nearby broadcast power source - something to explain the work being done, without violating the laws of physics.

So if a mind is arriving at true beliefs, and we assume that the second law of thermodynamics has not been violated, that mind must be doing something at least vaguely Bayesian - at least one process with a sort-of Bayesian structure somewhere - or it couldn't possibly work.

In the beginning, at time T=0, a mind has no mutual information with a subsystem S in its environment. At time T=1, the mind has 10 bits of mutual information with S. Somewhere in between, the mind must have encountered evidence - under the Bayesian definition of evidence, because all Bayesian evidence is mutual information and all mutual information is Bayesian evidence, they are just different ways of looking at it - and processed at least some of that evidence, however inefficiently, in the right direction according to Bayes on at least some occasions. The mind must have moved in harmony with the Bayes at least a little, somewhere along the line - either that or violated the second law of thermodynamics by creating mutual information from nothingness.

In fact, any part of a cognitive process that contributes usefully to truth-finding must have at least a little Bayesian structure - must harmonize with Bayes, at some point or another - must partially conform with the Bayesian flow, however noisily - despite however many disguising bells and whistles - even if this Bayesian structure is only apparent in the context of surrounding processes. Or it couldn't even help.

How philosophers pondered the nature of words! All the ink spent on the true definitions of words, and the true meaning of definitions, and the true meaning of meaning! What collections of gears and wheels they built, in their explanations! And all along, it was a disguised form of Bayesian inference!

I was actually a bit disappointed that no one in the audience jumped up and said: "Yes! Yes, that's it! Of course! It was really Bayes all along!"

But perhaps it is not quite as exciting to see something that doesn't look Bayesian on the surface, revealed as Bayes wearing a clever disguise, if: (a) you don't unravel the mystery yourself, but read about someone else doing it (Newton had more fun than most students taking calculus), and (b) you don't realize that searching for the hidden Bayes-structure is this huge, difficult, omnipresent quest, like searching for the Holy Grail.

It's a different quest for each facet of cognition, but the Grail always turns out to be the same. It has to be the right Grail, though - and the entire Grail, without any parts missing - and so each time you have to go on the quest looking for a full answer whatever form it may take, rather than trying to artificially construct vaguely hand-waving Grailish arguments. Then you always find the same Holy Grail at the end.

It was previously pointed out to me that I might be losing some of my readers with the long essays, because I hadn't "made it clear where I was going"...

...but it's not so easy to just tell people where you're going, when you're going somewhere like that.

It's not very helpful to merely know that a form of cognition is Bayesian, if you don't know how it is Bayesian. If you can't see the detailed flow of probability, you have nothing but a password - or, a bit more charitably, a hint at the form an answer would take; but certainly not an answer. That's why there's a Grand Quest for the Hidden Bayes-Structure, rather than being done when you say "Bayes!" Bayes-structure can be buried under all kinds of disguises, hidden behind thickets of wheels and gears, obscured by bells and whistles.

The way you begin to grasp the Quest for the Holy Bayes is that you learn about cognitive phenomenon XYZ, which seems really useful - and there's this bunch of philosophers who've been arguing about its true nature for centuries, and they are still arguing - and there's a bunch of AI scientists trying to make a computer do it, but they can't agree on the philosophy either -

And -

Huh, that's odd! - this cognitive phenomenon didn't look anything like Bayesian on the surface, but there's this non-obvious underlying structure that has a Bayesian interpretation - but wait, there's still some useful work getting done that can't be explained in Bayesian terms - no wait, that's Bayesian too - OH MY GOD this completely different cognitive process, that also didn't look Bayesian on the surface, ALSO HAS BAYESIAN STRUCTURE - hold on, are these non-Bayesian parts even doing anything?

Once this happens to you a few times, you kinda pick up the rhythm. That's what I'm talking about here, the rhythm.

Trying to talk about the rhythm is like trying to dance about architecture.

This left me in a bit of a pickle when it came to trying to explain in advance where I was going. I know from experience that if I say, "Bayes is the secret of the universe," some people may say "Yes! Bayes is the secret of the universe!"; and others will snort and say, "How narrow-minded you are; look at all these other ad-hoc but amazingly useful methods, like regularized linear regression, that I have in my toolbox."

I hoped that with a specific example in hand of "something that doesn't look all that Bayesian on the surface, but turns out to be Bayesian after all" - and an explanation of the difference between passwords and knowledge - and an explanation of the difference between tools and laws - maybe then I could convey such of the rhythm as can be understood without personally going on the quest.

Of course this is not the full Secret of the Bayesian Conspiracy, but it's all that I can convey at this point. Besides, the complete secret is known only to the Bayes Council, and if I told you, I'd have to hire you.

To see through the surface adhockery of a cognitive process, to the Bayesian structure underneath - to perceive the probability flows, and know how, not just know that, this cognition too is Bayesian - as it always is - as it always must be - to be able to sense the Force underlying all cognition - this, is the Bayes-Sight.