It is said that parents do all the things they tell their children not to do, which is how they know not to do them.
Long ago, in the unthinkably distant past, I was a devoted Traditional Rationalist, conceiving myself skilled according to that kind, yet I knew not the Way of Bayes. When the young Eliezer was confronted with a mysterious-seeming question, the precepts of Traditional Rationality did not stop him from devising a Mysterious Answer. It is, by far, the most embarrassing mistake I made in my life, and I still wince to think of it.
What was my mysterious answer to a mysterious question? This I will not describe, for it would be a long tale and complicated. I was young, and a mere Traditional Rationalist who knew not the teachings of Tversky and Kahneman. I knew about Occam’s Razor, but not the conjunction fallacy. I thought I could get away with thinking complicated thoughts myself, in the literary style of the complicated thoughts I read in science books, not realizing that correct complexity is only possible when every step is pinned down overwhelmingly. Today, one of the chief pieces of advice I give to aspiring young rationalists is “Do not attempt long chains of reasoning or complicated plans.”
Nothing more than this need be said: even after I invented my “answer,” the phenomenon was still a mystery unto me, and possessed the same quality of wondrous impenetrability that it had at the start.
Make no mistake, that younger Eliezer was not stupid. All the errors of which the young Eliezer was guilty are still being made today by respected scientists in respected journals. It would have taken a subtler skill to protect him than ever he was taught as a Traditional Rationalist.
Indeed, the young Eliezer diligently and painstakingly followed the injunctions of Traditional Rationality in the course of going astray.
As a Traditional Rationalist, the young Eliezer was careful to ensure that his Mysterious Answer made a bold prediction of future experience. Namely, I expected future neurologists to discover that neurons were exploiting quantum gravity, a la Sir Roger Penrose. This required neurons to maintain a certain degree of quantum coherence, which was something you could look for, and find or not find. Either you observe that or you don’t, right?
But my hypothesis made no retrospective predictions. According to Traditional Science, retrospective predictions don’t count—so why bother making them? To a Bayesian, on the other hand, if a hypothesis does not today have a favorable likelihood ratio over “I don’t know,” it raises the question of why you today believe anything more complicated than “I don’t know.” But I knew not the Way of Bayes, so I was not thinking about likelihood ratios or focusing probability density. I had Made a Falsifiable Prediction; was this not the Law?
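To make the likelihood-ratio point concrete, here is a minimal sketch of the odds form of Bayes's theorem. The function names and every number in it are illustrative assumptions, not anything taken from the original episode.

```python
def posterior_odds(prior_odds, p_data_given_hypothesis, p_data_given_alternative):
    """Odds form of Bayes's theorem: posterior odds = prior odds * likelihood ratio."""
    likelihood_ratio = p_data_given_hypothesis / p_data_given_alternative
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favor (e.g. 4.0 means 4:1) into a probability."""
    return odds / (1.0 + odds)


# Hypothetical case 1: the hypothesis makes the already-known data exactly as
# likely as "I don't know" does, so the evidence moves nothing.
no_gain = posterior_odds(1.0, 0.25, 0.25)

# Hypothetical case 2: the hypothesis makes the already-known data four times
# as likely as "I don't know" does, so the odds shift by a factor of four.
real_gain = posterior_odds(1.0, 0.5, 0.125)

print(no_gain, odds_to_probability(no_gain))
print(real_gain, odds_to_probability(real_gain))
```

On this reading, a hypothesis that only promises future tests sits in the first case: until some data favors it over "I don't know," the odds have not moved.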
As a Traditional Rationalist, the young Eliezer was careful not to believe in magic, mysticism, carbon chauvinism, or anything of that sort. I proudly professed of my Mysterious Answer, “It is just physics like all the rest of physics!” As if you could save magic from being a cognitive isomorph of magic, by calling it quantum gravity. But I knew not the Way of Bayes, and did not see the level on which my idea was isomorphic to magic. I gave my allegiance to physics, but this did not save me; what does probability theory know of allegiances? I avoided everything that Traditional Rationality told me was forbidden, but what was left was still magic.
Beyond a doubt, my allegiance to Traditional Rationality helped me get out of the hole I dug myself into. If I hadn’t been a Traditional Rationalist, I would have been completely screwed. But Traditional Rationality still wasn’t enough to get it right. It just led me into different mistakes than the ones it had explicitly forbidden.
When I think about how my younger self very carefully followed the rules of Traditional Rationality in the course of getting the answer wrong, it sheds light on the question of why people who call themselves “rationalists” do not rule the world. You need one whole hell of a lot of rationality before it does anything but lead you into new and interesting mistakes.
Traditional Rationality is taught as an art, rather than a science; you read the biographies of famous physicists describing the lessons life taught them, and you try to do what they tell you to do. But you haven’t lived their lives, and half of what they’re trying to describe is an instinct that has been trained into them.
The way Traditional Rationality is designed, it would have been acceptable for me to spend thirty years on my silly idea, so long as I succeeded in falsifying it eventually, and was honest with myself about what my theory predicted, and accepted the disproof when it arrived, et cetera. This is enough to let the Ratchet of Science click forward, but it’s a little harsh on the people who waste thirty years of their lives. Traditional Rationality is a walk, not a dance. It’s designed to get you to the truth eventually, and gives you all too much time to smell the flowers along the way.
Traditional Rationalists can agree to disagree. Traditional Rationality doesn’t have the ideal that thinking is an exact art in which there is only one correct probability estimate given the evidence. In Traditional Rationality, you’re allowed to guess, and then test your guess. But experience has taught me that if you don’t know, and you guess, you’ll end up being wrong.
The Way of Bayes is also an imprecise art, at least the way I’m holding forth upon it. These essays are still fumbling attempts to put into words lessons that would be better taught by experience. But at least there’s underlying math, plus experimental evidence from cognitive psychology on how humans actually think. Maybe that will be enough to cross the stratospherically high threshold required for a discipline that lets you actually get it right, instead of just constraining you into interesting new mistakes.
This is a good exercise for all of us - tell a story of when we made a serious inference mistake.
One of my mistakes was believing in Bayesian decision theory and in constructive logic at the same time. The two conflict because traditional probability theory is inherently classical, owing to the axiom that P(A + not-A) = 1. This is an embarrassingly simple inconsistency, of course, but it led me to some interesting ideas.
Upon reflection, it turns out that the important idea is not Bayesianism proper, which is merely one of an entire menagerie of possible rationalities, but rather de Finetti's operationalization of subjective belief in terms of avoiding Dutch book bets. It turns out there are a lot of ways of doing that, because the only physically realizable bets are of finitely refutable propositions.
So you can have perfectly rational agents who never come to agreement, no matter how much evidence they see, because no finite amount of evidence can settle questions like whether the law of the excluded middle holds for propositions over the natural numbers.
Could you be so kind as to expand on that?
0 And 1 Are Not Probabilities - there is no finite amount of evidence that allows us to assign a probability of 0 or 1 to any event. Many important proofs in classical probability theory rely on marginalising to 1 - that is, saying that the total probability of mutually exclusive and collectively exhaustive events is exactly 1. This works just fine until you consider the possibility that you are incapable of imagining one or more possible outcomes. Bayesian decision theory and constructive logic are both valid in their respective fields, but constructive logic is not applicable to real life, because we can't say with certainty that we are aware of all possible outcomes.
Constructive logic preserves truth values - it consists of taking a set of axioms, which are true by definition, and performing a series of truth-preserving operations to produce other true statements. A given logical system is a set of operations defined as truth-preserving - a syntax into which semantic statements (axioms) can be inserted. Axiomatic systems are never reliable in real life, because in real life there are no axioms (we cannot define anything to have probability 1) and no rules of syntax (we cannot be certain that our reasoning is valid). We cannot ever say what we know or how we know it; we can only ever say what we think we know and how we think we know it.
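The claim that no finite amount of evidence reaches 0 or 1 can be checked with a small sketch. The `update` helper and the choice of twenty pieces of 10:1 evidence are assumptions made for illustration. Exact rational arithmetic is used deliberately, because ordinary floating point would round the final belief to 1.0 and hide the point.

```python
from fractions import Fraction


def update(probability, likelihood_ratio):
    """Odds-form Bayesian update in exact rational arithmetic."""
    if probability == 1:
        # Infinite odds: multiplying by any finite likelihood ratio changes nothing.
        return probability
    odds = probability / (1 - probability) * likelihood_ratio
    return odds / (1 + odds)


# Start undecided and apply twenty independent pieces of 10:1 evidence.
belief = Fraction(1, 2)
for _ in range(20):
    belief = update(belief, 10)

# A belief that starts at exactly 0 or 1 cannot be moved by any finite evidence.
stuck_at_zero = update(Fraction(0), 10**6)
stuck_at_one = update(Fraction(1), Fraction(1, 10**6))

print(belief < 1, stuck_at_zero, stuck_at_one)
```

After all twenty updates the belief is 10^20 / (10^20 + 1): extremely close to 1, but still short of it.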
Are there any particular arguments in constructive logic that you formerly believed, and now no longer believe?
Or is this just a thing where you are forever doomed to say "minus epsilon" every time you say "1" but it doesn't actually change what arguments you accept?
To be more precise, there is no such finite evidence unless there already exist events to which you assign probability 0 or 1. If such events do exist, then you may later receive evidence that allows them to propagate.
Even if we have infinite evidence (positive or negative) for some set of events, we cannot achieve infinite evidence for any other event. The point of a logical system is that everything in it can be proven syntactically, that is, without assigning meaning to any of the terms. For example, "Only Bs have the property X" and "A has the property X" imply "A is a B" for any A, B and X - the proof makes no use of semantics. It is sound if it is valid and its axioms are true, but it is also only valid if we have defined certain operations as truth preserving. There are an uncountably infinite number of logical systems under which the truth of the axioms will not ensure the truth of the conclusion - the reasoning won't be valid.
Non-probabilistic reasoning does not ever work in reality. We do not know the syntax with certainty, so we cannot be sure of any conclusion, no matter how certain we are about the semantic truth of the premises. The situation is like trying to speak a language you don't know using only a dictionary and a phrasebook - no matter how certain you are that certain sentences are correct, you cannot be certain that any new sentence is grammatically correct because you have no way to work out the grammar with absolute certainty. No matter how many statements we take as axioms, we cannot add any more axioms unless we know the rules of syntax, and there is no way at all to prove that our rules of syntax - the rules of our logical system - are the real ones. (We can't even prove that there are real ones - we're pretty darned certain about it, but there is no way to prove that we live in a causal universe.)
Well, yes. If we believe that A=>B with probability 1, it's not enough to assign probability 1 to A to conclude B with probability 1; you must also assign probability 1 to modus ponens.
And even then you can probably Carroll your way out of it.
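One hedged way to put numbers on the "minus epsilon" question: assume ordinary classical probability and read A=>B as the material conditional (both are assumptions added here for illustration). Then the uncertainty in the premises bounds the uncertainty in the conclusion:

```latex
\begin{align*}
P(B) &\ge P(A \land B) \\
     &= P\bigl(A \land (A \Rightarrow B)\bigr) \\
     &\ge P(A) + P(A \Rightarrow B) - 1 .
\end{align*}
% With P(A) = 1 - \varepsilon and P(A \Rightarrow B) = 1 - \delta:
\[
P(B) \ge 1 - \varepsilon - \delta .
\]
```

So small doubts about the premises add up to at most a small doubt about the conclusion. The derivation itself leans on rules of inference, though, so it does not escape the regress being pointed at.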
Classical logics make the assumption that all statements are either exactly true or exactly false, with no other possibility allowed. Hence classical logic will take shortcuts like admitting not(not(X)) as a proof of X, under the assumptions of consistency (we've proved not(not(X)) so there is no proof of not(X)), completeness (if there is no proof of not(X) then there must be a proof of X) and proof-irrelevance (all proofs of X are interchangeable, so the existence of such a proof is acceptable as proof of X).
The flaw is, of course, the assumption of a complete and consistent system, which Goedel showed to be impossible for systems capable of modelling the Natural numbers.
Constructivist logics don't assume the law of the excluded middle. This restricts classical 'truth' to 'provably true', classical 'false' to 'provably false' and allows a third possibility: 'unproven'. An unproven statement might be provably true or provably false or it might be undecidable.
From a probability perspective, constructivism says that we shouldn't assume that P(not(X)) = 1 - P(X), since doing so is assuming that we're using a complete and consistent system of reasoning, which is impossible.
Note that constructivist systems are compatible with classical ones. We can add the law of the excluded middle to a constructive logic and get a classical one; all of the theorems will still hold and we won't introduce any inconsistencies.
Another way of thinking about it is that the law of the excluded middle assumes that a halting oracle exists which allows us to take shortcuts in our proofs. The results will be consistent, since the oracle gives correct answers, but we can't tell which results used the oracle as a shortcut (and hence don't need it) and which would be impossible without the oracle's existence (and hence don't exist, since halting oracles don't exist).
The only way to work out which ones are shortcuts is to take 'the long way' and produce a separate proof which doesn't use an oracle; these are exactly the constructive proofs!
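The difference between the two kinds of proof can be sketched in Lean 4. The theorem names below are made up for this example; `Classical.em` is the excluded-middle axiom as exposed by Lean's core library.

```lean
-- Provable constructively, with no axioms: excluded middle can never be refuted.
theorem not_not_em (p : Prop) : ¬¬(p ∨ ¬p) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))

-- Double-negation elimination, by contrast, is proved here by appealing to the
-- classical axiom: this is "the shortcut".
theorem dne (p : Prop) (h : ¬¬p) : p :=
  (Classical.em p).elim id (fun hn => absurd hn h)
```

The first proof is "the long way" and uses nothing beyond the constructive rules. The second compiles only because the classical axiom is explicitly invoked, which makes the dependence visible.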
Good post. I find your writing style a little overwrought for your audience (us overcomingbias readers) but the practical details and advice are gold.
I'm wondering about the build up to becoming a Bayesian. Do you think it's necessary for a person to understand Traditional Rationality as a mode of thinking before they can appreciate Bayes?
Intuitively, I would suspect that an understanding and even appreciation of ol' fashioned either/or thinking is a necessary foundation for probabilities.
Sorry if this is out of left field. My wife just left for work -- she's a pre-school teacher -- and I was thinking of how the lesson might be applied to her students (who are admittedly far too young for this sort of thing just yet.)
Do you think it's necessary for a person to understand Traditional Rationality as a mode of thinking before they can appreciate Bayes?
Good question! I think it should be possible to start with Bayes, but I've never seen it done. Lessons on Traditional Rationality appeal to built-in human intuitions, like "Reality is either a certain way or it's not", so you'd appeal to the same intuitions but use them to introduce probability principles like "Your probabilities shouldn't sum to more than 1.0."
Is this what CFAR is trying to do?
I would be interested to hear what other members of the community think about this. I accidentally found Bayes after being trained as a physicist, which is not entirely unlike traditional rationality. But I want to teach my brother, who doesn't have any science or rationality background. Has anyone had success with starting at Bayes and going from there?
Great post, as always. I think you're a great writer.
I think the following should be added to the about page in some form:
Until I read this exact paragraph I was always a little confused as to how any of this was terribly new or eye-opening. Putting everything that I have read in the last week into a perspective that includes this paragraph makes everything significantly more potent. If this nugget was in the previous posts I either missed it or forgot it. Either way, its impact did not match its importance.
One correct probability estimate of what? You are tacitly assuming that someone has mapped the ideaspace and presented you with a tidy menu of options. But no-one could have converged on relativity before Einstein because he hadn't thought of it yet. Guessing bad, hypothesizing good.
Not checking what your hypothesis would have meant doesn't sound like science as she is did to me. What is the example you were thinking of here? I am having difficulty reconstructing a picture in my head of what you are calling "Traditional Rationality" without using straw.
While reading through this I ran into a problem. It seems intuitive to me that to be perfectly rational you would have to have instances in which given the same information two rationalists disagreed. I think this because I presume that a lack of randomness leads to a local maxima. Am I missing something?
Unpack "local maxima". Maxima of what?
I'm thinking of being unable to reach a better solution to a problem because what you know conflicts with arriving at the solution.
Say your data leads you to an inaccurate initial conclusion. Everybody agrees on this conclusion. Wouldn't that conclusion be data for more inaccurate conclusions?
So I thought that there would need to be some bias that was put on your reasoning so that occasionally you didn't go with the inaccurate claim. That way if some of the data is wrong you still have rationalists who arrive at a more accurate map.
Tried to unpack it. Noticed that I seem to expect this "exact art" of rationality to be a system that can stand on its own when it doesn't. What I mean by that is that I seem to have assumed that you could build some sort of AI on top of this system which would always arrive at an accurate perception of reality. But if that was the case, wouldn't Eliezer already have done it?
I feel like I'm making mistakes and being foolish right now, so I'm going to stop writing and eagerly await your corrections.
There's nothing in being a rationalist that prevents you from considering multiple hypotheses. One thing I've not seen elaborated on a lot on this site (but maybe I've just missed it) is that you don't need to commit to one theory or the other; the only time you're forced to commit yourself is if you need to make a choice in your actions. And then you only need to commit for that choice, not for the rest of your life. So a bunch of perfect rationalists who have observed exactly the same events/facts (which of course doesn't happen in real life) would ascribe exactly the same probabilities to a bunch of theories. If new evidence came in they would all switch to the new hypothesis because they were all already contemplating it but considering it less likely than the old hypothesis.
The only thing preventing you from considering all possible hypotheses is lack of brain power. This limited resource should probably be divided among the possible theories in the same ratio that you're certain about them, so if you think theory A has a probability of 50% of being right, theory B a probability of 49% and theory C a probability of 1%, you should spend 99% of your efforts on theory A and B. But if the probabilities are 35%, 33% and 32% you should spend almost a third of your resources on theory C. (Assuming the goal is just to find truth, if the theories have other utilities that should be weighted in as well.)
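The arithmetic in that allocation rule can be sketched as follows. The `allocate_effort` helper and the optional `payoff_weights` argument are hypothetical names introduced here. The weights are one simple way to fold in the "other utilities" from the parenthetical, not a claim about the right way to do it.

```python
def allocate_effort(credences, payoff_weights=None):
    """Split one unit of effort in proportion to credence times an optional weight."""
    if payoff_weights is None:
        payoff_weights = {name: 1.0 for name in credences}
    scores = {name: credences[name] * payoff_weights[name] for name in credences}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# The two cases from the text: effort simply tracks credence.
lopsided = allocate_effort({"A": 0.50, "B": 0.49, "C": 0.01})
balanced = allocate_effort({"A": 0.35, "B": 0.33, "C": 0.32})

# Hypothetical variation: theory C is unlikely but 99 times as valuable to settle.
weighted = allocate_effort(
    {"A": 0.50, "B": 0.49, "C": 0.01},
    payoff_weights={"A": 1.0, "B": 1.0, "C": 99.0},
)

print(lopsided, balanced, weighted)
```

In the weighted case the 1% theory ends up with half of the effort, which shows how quickly the credence-only split changes once stakes enter.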
Likelihood is one consideration when determining how much to investigate a possible hypothesis but it isn't the only consideration. Quite often the ratio of attention should be different to the ratio of credibility.
I think even a perfect implementation of Bayes would not in and of itself be an AI. By itself, the math doesn't have anything to work on, or any direction to do so. Agency is hard to build, I think.
As always, of course, I could be wrong.
Would a "perfect implementation of Bayes", in the sense you meant here, be a Solomonoff inductor (or similar, perhaps modified to work better with anthropic problems), or something perfect at following Bayesian probability theory but with no prior specified (or a less universal one)? If the former, you are in fact most of the way to an agent, at least some types of agents, e.g. AIXI.
Well, I'm not personally capable of building AIs, and I'm not as deeply versed as I'm sure many people here are, but, I see an implementation of Bayes' theorem as a tool for finding truth, in the mind of a human or an AI or whatever sort of person you care to conceive of / display, whereas the mind behind it is an agent with a quality we might call directedness, or intentionality, or simply an interest to go out and poke the universe with a stick where it doesn't make sense. Bayes is in itself already math, easy to put into code, but we don't understand internally directed behavior well enough to model it, yet.
This is also true of Bayesians. The probability estimate given the evidence is a property of the map, not the territory (hence "estimate"). One correct posterior implies one correct prior. What is this "Ultimate Prior"? There isn't one.
Possibly, you meant that there's one correct posterior given the evidence and the prior. That's correct, but it doesn't prevent Bayesians from disagreeing, because they do have different priors.
Alternatively, one can point out that the "given evidence" operator is, in expectation, always non-expansive, and contractive when the priors disagree. This means that the beliefs of Perfect Bayesians with shared observations converge (with probability 1) into a single posterior. But this convergence is too slow for humans. Agreeing to disagree is sometimes our only option.
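A toy model shows both the convergence and how slow it can be. The sketch assumes a conjugate Beta-Bernoulli setup, chosen only because the posterior has a closed form. The two priors and the observation counts are invented for illustration, and this is one special case rather than the general result.

```python
def posterior_mean(alpha, beta, heads, tails):
    """Mean of the Beta(alpha + heads, beta + tails) posterior for a coin's bias."""
    return (alpha + heads) / (alpha + beta + heads + tails)


# Two hypothetical agents with sharply different priors about the same coin.
OPTIMIST_PRIOR = (9.0, 1.0)   # expects mostly heads
PESSIMIST_PRIOR = (1.0, 9.0)  # expects mostly tails


def disagreement(heads, tails):
    """Gap between the agents' posterior means after the same shared observations."""
    return abs(
        posterior_mean(*OPTIMIST_PRIOR, heads, tails)
        - posterior_mean(*PESSIMIST_PRIOR, heads, tails)
    )


for heads, tails in [(0, 0), (7, 3), (693, 297)]:
    print(heads + tails, disagreement(heads, tails))
```

In this model the gap is 8 / (10 + n) after n shared observations. It always shrinks, but it takes roughly a thousand observations to fall below one percentage point.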
Incidentally, it's Traditional Rationalists who believed they should never agree to disagree: the set of hypotheses which aren't "ruled out" by confirmed and repeatable experiments, they argued, is a property of the territory.
I'm aware of this result. It specifically requires the two Bayesians to have the same prior. My point is exactly that this doesn't have to be the case, and in reality is sometimes not the case.
EDIT: The original paper by Aumann references a paper by Harsanyi which supposedly addresses my point. Aumann himself is careful in interpreting his result as supporting my point (since evidently there are people who disagree despite trusting each other). I'll report here my understanding of the Harsanyi paper once I get past the paywall.
The Harsanyi paper is very enlightening, but he's not really arguing that people have shared priors. Rather, he's making the following points (section 14):
It is worthwhile for an agent to analyze the game as if all agents have the same prior, because it simplifies the analysis. In particular, the game (from that agent's point of view) then becomes equivalent to a Bayesian complete-information game with private observations.
The same-prior assumption is less restrictive than it may seem, because agents can still have private observations.
A wide family of hypothetical scenarios can be analyzed as if all agents have the same prior. Other scenarios can be easily approximated by a member of this family (though the quality of the approximation is not studied).
All of this is mathematically very pleasing, but it doesn't change my point. That's mainly because in the context of the Harsanyi paper "prior" means before any observation, and in the context of this post "prior" means before the shared observation (but possibly after private observations).
Problem: "retrospective predictions" is undefined here. Search does not locate this term anywhere on the LessWrong website, the LessWrong wiki or on Wikipedia, but it seems to be the crux of this piece that we have to make retrospective predictions. Also, it's not clear what you mean by it because it sounds oxymoronic - you can't predict something that already happened. My best guess about what you mean by "retrospective predictions" is: Say someone has a theory that humans are hairless because they evolved from aquatic monkeys. That person should "predict" that there's past evidence of aquatic monkeys existing at the right place/time/circumstance/whatever and then go do some research to find out.
Retrospective prediction is an expansion of http://en.wikipedia.org/wiki/retrodiction
Oh, thank you, Gwern! Ok, so retrodiction is more like this: There are facts that we currently know and phenomena that have already happened so you should consider whether your theory would have predicted them. It's not "did something related precede this" but "If we had known this theory before realizing certain facts or making certain observations, would the theory have predicted or explained these?"
Hmm for examples... if there were an all-knowing, all-powerful, all-loving God, what would I predict? If life on earth evolved, what would I predict?
What would God do? Make something awesome or lounge around feeling enlightened. I'm personifying here, and I know it... I have no idea what a God would do but I suspect that it would not be "Make a bunch of creatures knowing that a bunch of them will experience horrible suffering. Demand that they have faith but confuse them with a bunch of different religions to choose from. Create each of them knowing exactly how they'll reason and what they'll experience and what that combination will result in and demand certain beliefs that won't make sense to some of them."
Whereas with evolution, I'd predict that various life forms would evolve, some would succeed, some would not, life would be more like a chaotic experiment than a harmonious symphony, the smartest life forms would be dreadfully confused for quite some time before having it together...
And this sounds like earth.
I would expect most life to just end up as planets full of green goo (i.e. like grey goo but natural). But I'd expect that in a tiny minority of cases things like Fisherian Runaway, complex signalling and just plain luck happen to throw some individual toward the 'general intelligence' path (and a bunch of other deal breakers to not happen on the way). I'd expect any intelligent agents to observe that they are on a planet, in a galaxy in an Everett Branch where life had evolved much like you said.
Hmm. I notice that I was not as specific as you are. I didn't say anything about what "most" life forms would be like or whether there would be lots of smart life forms. I haven't really done a thorough retrodiction on evolution, to tell the truth. But I am really liking this new imagination trick of "try to predict the past if the theory was true" (which is subtly different from my other tricks like "is there anything in the past that supports / refutes this?") and its pleasant atheism-promoting effect on the remnants of my dead agnosticism phase. I'm glad I asked this question and that Gwern helped.
Thinking it out, I do not agree with your green goo hypothesis. I think that as long as there were mutations in the green goo's pattern (and stability in this pattern would be the exception not the rule due to the complexity of making a self-replicating, self-incarnating pattern, and due to environmental differences more complex and diverse than the green goo's pattern would be able to expect) and as long as there was always room for improvement (for something this complex that evolved randomly, perfection in the pattern would be the exception not the rule) it would have to change and mutate and new variations would inevitably emerge.
What would it take to have that kind of stability in life forms? Other than a perfectly stable planet? The life game is very, very complex.
I think, perhaps, given a drastic reduction in the number of physical laws (when you have all kinds of neat toys to play with from electricity to friction, room for improvement is immense), as well as in the number of substances available (otherwise the goo will only expand and encounter new things which promote adaptations), it MIGHT result in a simple life form becoming "perfect" for its environment and then stabilizing its genes as a way of optimizing perfection.
I think diversity and increasing improvement is more likely to result from evolution than perfect, stable green goo.
We may also have meant different things by "if life on earth evolved". I read it as "conditional on self replicating things we could call 'life' emerged on earth, how would I expect things to proceed" where it could also have meant "conditional on intelligent life like we know it having been evolved, how would I expect that process to have gone".
What I was intending to convey was not so much that one stable form of goo would remain permanently but rather that there is a significant component of the great filter in the stages between life emerging and general-intelligence evolving as well as the component before life emerges at all. I expect most planets where life evolves at all not to evolve general intelligence or even other lifeforms as interesting as what we consider lesser animals. I expect it to get stuck in local minima rather frequently.
I disagree. The incentivising force for continued adaptation is changes in your environment (including your fellow other species). Static goo - or uniformly adapting goo - cannot be optimal for all of a planet at once, leaving room to be outcompeted by diversifying dark-green goo, which may eventually evolve into goo-man (I mean, hu-man):
A planet filled with homogeneous green goo would still be subject to offering advantages based on adaptation on two major axes:
1) Planets universally offer different conditions for habitats, pole temperature versus equatorial temperature, seismic activities on active planets, surface versus underground habitats. The green goo would eventually split off into various types, each best suited to the environment. There is no such thing as an "optimal green goo for every environment", optimal refers to a specific set of conditions. Some tasks are hard for single-celled organisms to fulfill, which is probably why the uniform green goo that life developed as on earth diversified while spreading, and that bacteria, while ubiquitous, still aren't considered the dominant life form.
2) As a hypothetical, even a planet transformed into a uniform green goo blob in space would be an environment in itself, allowing for niches for different forms of life (as long as there's still some entropy to waste i.e. a mechanism for mutation). For a crude comparison, think of lava as goo on a different time scale.
Lastly, if you allow certain variations in your green goo, you could well argue that earth as it is now is an amalgam of various sorts of green goo - us. Especially from the vantage point of our basic goo unit - the gene. See the goo now?
(To me, the curious thing isn't the eventual appearance of memetic-temetic based adaptability (intelligence), but of subjective experience to go with it. Good fiction novel on that: Peter Watts’ Blindsight.)
One might compare this to ecosystems of reproducing known-number iterated prisoner's dilemma robots - the analogous idea is that these ecosystems will usually end up as "tit for tat goo."
Tit for tat is reliable. Like algae in the sea of early earth, tit for tat can serve as a "background" for our ecosystem - cooperation is harvesting energy from the sun, defection is being a predator, but if everyone tries to be a predator everyone dies. So algae reproduces. But also like a sea full of algae, there are predatory / parasitic strategies that work really well once the plants are common, like defecting at the end, or eating plants. If a tit for tat robot has the first mutant baby that defects at the end, that baby will only play against tit for tat robots, so it will defect successfully and have more babies than usual, eventually leading to a whole new strain. The zooplankton of the ecosystem. But then if that becomes common, it may be worth it to produce a parasite to the parasite - defecting twice from the end. The bigger the possible rewards, the more layers of strategies will be viable. Tit for tat goo is unstable - plants quickly grow herbivores, and herbivores can sometimes grow predators.
And that's just iterated prisoner's dilemma. Add in more dimensions, multiple equilibria... things could get pretty complicated.
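The layering of strategies described above can be reproduced in a few lines. This is a sketch under stated assumptions: the payoff values (5, 3, 1, 0) are the conventional textbook ones rather than anything specified in the comment, and the game length of ten rounds and all function names are chosen for illustration.

```python
# (my move, their move) -> my payoff. Conventional illustrative values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def make_strategy(defect_last):
    """Tit for tat that defects unconditionally in the final `defect_last` rounds."""
    def strategy(round_index, total_rounds, opponent_history):
        if round_index >= total_rounds - defect_last:
            return "D"
        if not opponent_history:
            return "C"  # cooperate on the first move
        return opponent_history[-1]  # otherwise copy the opponent's last move
    return strategy


def play(strategy_a, strategy_b, total_rounds=10):
    """Play a known-length iterated prisoner's dilemma and return both scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for i in range(total_rounds):
        move_a = strategy_a(i, total_rounds, history_b)
        move_b = strategy_b(i, total_rounds, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


tit_for_tat = make_strategy(0)
defect_at_end = make_strategy(1)
defect_last_two = make_strategy(2)

print(play(tit_for_tat, tit_for_tat))        # the "goo" baseline
print(play(defect_at_end, tit_for_tat))      # first mutant among tit for tat
print(play(defect_at_end, defect_at_end))    # once the mutant is common
print(play(defect_last_two, defect_at_end))  # parasite on the parasite
```

With these payoffs, the end-defector beats the baseline when facing tit for tat, and the two-round defector in turn beats what end-defectors earn against each other. That is the instability described above.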
Yes, that's pretty much what retrodiction is. It's not as good as prediction since you can come up with theories over-fitted to exactly the past (a big problem with financial retrodiction: people routinely find some complex strategy or apparent arbitrage when running over the last 30 years of market data, which disappears the moment they try to use it), but if predictions are unavailable, at least retrodiction keeps you concretely grounded.
I'm not sure I would use God as an example. Theists like Plantinga have done a good job showing that they can come up with a version of God + concepts like 'free will' which is logically consistent with any observation, so neither retrodiction nor prediction matters for their God.
I love it. Retrodiction is awesome.
I think I broke the free will God argument. The idea that evil is evidence that God gives us free will is contradicted by the existence of evil. What do you think?
In general, if someone thinks they've said something that is both new and valuable about the theodicy: they haven't.
Looking at your link, I have no idea what you're trying to say.
Well, I reworded my point as "The idea that evil is evidence that God gives us free will is contradicted by the existence of evil" but if you don't think it's going to be interesting, don't bother.
Who told you that? Einstein's retrodiction of the perihelion shift of Mercury is an oft-quoted example from a century back.
Oh wow, I had sort of a feeling that accepting how wrong we can be was not the ultimate goal; of course, it cannot be. I'm interested in where this is going further.