You only need faith in two things

10th Mar 2013

86 comments

This phrase confuses me:

and that some single large ordinal is well-ordered.

Every definition I've seen of ordinal either includes well-ordered or has that as a theorem. I'm having trouble imagining a situation where it's necessary to use the well-orderedness of a larger ordinal to prove it for a smaller one.

*edit- Did you mean well-founded instead of well-ordered?

211y

Every ordinal (in the sense I use the word[1]) is both well-founded and well-ordered.
If I assume what you wrote makes sense, then you're talking about a different sort of ordinal. I've found a paper[2] that talks about proof theoretic ordinals, but it doesn't talk about this in the same language you're using. Their definition of ordinal matches mine, and there is no mention of an ordinal that might not be well-ordered.
Also, I'm not sure I should care about the consistency of some model of set theory. The parts of math that interact with reality and the parts of math that interact with irreplaceable set theoretic plumbing seem very far apart.
[1] An ordinal is a transitive set well-ordered by "is an element of".
[2] www.icm2006.org/proceedings/Vol_II/contents/ICM_Vol_2_03.pdf

011y

Nevertheless, the lack of exposure to such attractors is quite relevant: if there were any, you'd expect some scientist to have encountered one.

011y

Why would one expect scientists to have encountered such attractors before even if they exist? As far as I know there hasn't been much effort to systematically search for them, and even if there has been some effort in that direction, Eliezer didn't cite any.

311y

Yeah, it's hard to phrase this well and I don't know if there's a standard phrasing. What I was trying to get at was the idea that some computable ordering is total and well-ordered, and therefore an ordinal.

111y

Well, supposing that a large ordinal exists is equivalent to supposing a form of Platonism about mathematics (that a colossal infinity of other objects exist). So that is quite a large statement of faith!
All maths really needs is for a large enough ordinal to be logically possible, in that it is not self-contradictory to suppose that a large ordinal exists. That's a much weaker statement of faith. Or it can be backed by an inductive argument in the way Eliezer suggests.

I'm a bit skeptical of this minimalism (if "induction works" needs to get explicitly stated, I'm afraid all sorts of other things---like "deduction works"---also do).

But while we're at it, I don't think you need to take any mathematical statements on faith. To the extent that a mathematical statement does any useful predictive work, it too can be supported by the evidence. Maybe you could say that we should include it on a technicality (we don't yet know how to do induction on mathematical objects), but if you don't think that you can do induction over mathematical facts, you've got more problems than not believing in large ordinals!

011y

My guess is that deduction, along with bayesian updating, are being considered part of our rules of inference, rather than axioms.

09y

Oh, like Achilles and the tortoise. Thanks, this comment clarified things a bit.

being exposed to ordered sensory data will rapidly promote the hypothesis that induction works

Promote it how? By way of inductive reasoning, to which Bayesian inference belongs. It seems like there's a contradiction between the initially small prior of "induction works" (which is different from inductive reasoning, but still related) and "promote that low-probability hypothesis (that induction works) by way of inductive reasoning".

If you see no tension there, wouldn't you still need to state the basis for "inductive reasoning works", at least such that its use can be justified (initially)?

211y

Consider the following toy model. Suppose you are trying to predict a sequence of zeroes and ones. The stand-in for "induction works" here will be Solomonoff induction (the sequence is generated by an algorithm and you use the Solomonoff prior). The stand-in for "induction doesn't work" here will be the "binomial monkey" prior (the sequence is an i.i.d. sequence of Bernoulli random variables with p = 1/2, so it is not possible to learn anything about future values of the sequence from past observations). Suppose you initially assign some nonzero probability to Solomonoff induction working and the rest of your probability to the binomial monkey prior. If the sequence of zeroes and ones isn't completely random (in the sense of having high Kolmogorov complexity), Solomonoff induction will quickly be promoted as a hypothesis.
Not all Bayesian inference is inductive reasoning in the sense that not all priors allow induction.
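The toy model above can be run directly. Here is a minimal sketch; the 0.9 repeat probability (standing in for Solomonoff induction) and the 10^-6 prior are illustrative assumptions, not numbers from the comment:

```python
# H_ind stands in for "induction works": each bit repeats the previous one
# with probability 0.9. H_monkey is the "binomial monkey" prior: i.i.d. fair
# coin flips, so nothing can be learned about the future from the past.
prior_ind = 1e-6              # tiny, but not super-exponentially tiny
prior_monkey = 1.0 - prior_ind

seq = [1] * 100               # highly ordered (low Kolmogorov complexity) data

def likelihood_ind(seq):
    # First bit is unpredicted; each later bit repeats with probability 0.9.
    p = 0.5
    for prev, cur in zip(seq, seq[1:]):
        p *= 0.9 if cur == prev else 0.1
    return p

def likelihood_monkey(seq):
    return 0.5 ** len(seq)    # every sequence equally likely

post_ind = prior_ind * likelihood_ind(seq)
post_monkey = prior_monkey * likelihood_monkey(seq)
posterior = post_ind / (post_ind + post_monkey)
print(posterior)              # > 0.99: the inductive hypothesis now dominates
```

Each ordered bit multiplies the odds in favor of the inductive hypothesis by 0.9/0.5 = 1.8, so even a one-in-a-million prior is overwhelmed after about a hundred observations.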

711y

To amplify on Qiaochu's answer, the part where you promote the Solomonoff prior is Bayesian deduction, a matter of logic - Bayes's Theorem follows from the axioms of probability theory. It doesn't proceed by saying "induction worked, and my priors say that if induction worked it should go on working" - that part is actually implicit in the Solomonoff prior itself, and the rest is pure Bayesian deduction.

19y

Doesn't this add "the axioms of probability theory", i.e. "logic works", i.e. "the universe runs on math", to our list of articles of faith?
Edit: After further reading, it seems like this is entailed by the "large ordinal" thing. I googled well-orderedness, encountered the Wikipedia article, and promptly shat a brick.
What sequence of maths do I need to study to get from Calculus I to set theory and what the hell well-orderedness means?

411y

Again, promoted how? All you know is "induction is very, very unlikely to work" (low prior, non 0), and "some single large ordinal is well-ordered". That's it. How can you deduce an inference system from that that would allow you to promote a hypothesis based on it being consistent with past observations?
It seems like putting the hoversled before the bantha (= assuming the explanandum).

211y

Promoted by Bayesian inference. Again, not all Bayesian inference is inductive reasoning. Are you familiar with Cox's theorem?

511y

Only in passing. However, why would you assume those postulates that Cox's theorem builds on?
You'd have to construct and argue for those postulates out of (sorry for repeating) "induction is very, very unlikely to work" (low prior, non 0), and "some single large ordinal is well-ordered". How?

411y

Wouldn't it be: large ordinal -> ZFC consistent -> Cox's theorem?
Maybe you then doubt that consequences follow from valid arguments (like Carroll's Tortoise in his dialogue with Achilles). We could add a third premise that logic works, but I'm not sure it would help.

1[anonymous]11y

Can you elaborate on the first step?

011y

I'm no expert in this -- my comment is based just on reading the post, but I take the above to mean that there's some large ordinal for ZFC whose existence implies that ZFC has a model. And if ZFC has a model, it's consistent.

To get to Bayes, don't you also need to believe not just that probability theory is internally consistent (your well-ordered ordinal gives you that much) but also that it is the correct system for deducing credences from other credences? That is, you need to believe Cox's assumptions, or equivalently (I think) Jaynes's desiderata (consistent, non-ideological, quantitative). Without these, you can do all the probability theory you want but you'll never be able to point at the number at the end of a calculation and say "that is now my credence for the sun rising tomorrow".

011y

If you believe in a prior, you believe in probability, right?

711y

I was staring at this thinking "Didn't I just say that in the next-to-last paragraph?" and then I realized that to a general audience it is not transparent that adducing the consistency of ZFC by induction corresponds to inducing the well-ordering of some large ordinal by induction.

011y

I was at least familiar with the concepts involved and conflated mathematical induction and evidential inductive reasoning anyways.

611y

I can't even understand if the post is about pure math or about the applicability of certain mathematical models to the physical world.

011y

From what I understand it's along the same lines as Bertrand Russell's search for the smallest set of axioms to form mathematics, except for everything and not just math.

311y

If so, it makes little sense to me. Math is one tool for modeling and accurately predicting the physical world, and it is surely nice to minimize the number of axioms required to construct an accurate model, but it is still about the model, there is no well-ordering and no ordinals in the physical world, these are all logical constructs. It seems that there is something in EY's epistemology I missed.
...unless you are dealing with phenomena where it doesn't, like stock markets? Or is this a statement about the general predictability of the world, i.e. that models are useful? Then it is pretty vacuous, since otherwise what point would be there in trying to model the world?
"Believe" in what sense? That it is self-consistent? That it enables accurate modeling of physical systems?

011y

I figured it out from context. But, sure, that could probably be clearer.

Because being exposed to ordered sensory data will rapidly promote the hypothesis that induction works

Not if the alternative hypothesis assigns about the same probability to the data up to the present. For example, an alternative hypothesis to the standard "the sun rises every day" is "the sun rises every day, until March 22, 2015", and the alternative hypothesis assigns the same probability to the data observed until the present as the standard one does.

You also have to trust your memory and your ability to compute Solomonoff induction, both of which are demonstrably imperfect.
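The point about the alternative hypothesis can be checked numerically. A minimal sketch, with the day-X cutoff and the 99:1 priors chosen purely for illustration:

```python
# Two hypotheses: "the sun rises every day" and "the sun rises every day
# until day X". Both assign probability 1 to every sunrise before day X,
# so no observation made before day X can shift the odds between them.
X = 10_000

def likelihood_forever(n_days_observed):
    return 1.0  # predicts a sunrise with certainty on every day

def likelihood_until_X(n_days_observed):
    return 1.0 if n_days_observed < X else 0.0  # identical before day X

prior_until_X, prior_forever = 0.99, 0.01
n = 5_000  # days of observed sunrises, all before day X
post_until_X = prior_until_X * likelihood_until_X(n)
post_forever = prior_forever * likelihood_forever(n)
odds = post_until_X / post_forever
print(odds)  # prior odds of 99:1, untouched by the evidence
```

Whatever ratio you started with is the ratio you keep until day X actually arrives.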

811y

There's an infinite number of alternative hypotheses like that, and you need a new one every time the previous one gets disproven; so assigning so much probability to all of them that they went on dominating Solomonoff induction on every round, even after being exposed to large quantities of sensory information, would require that the remaining probability mass assigned to the prior for Solomonoff induction be less than exp(−amount of sensory information), that is, super-exponentially tiny.

311y

My brain parsed "super-exponentially tiny" as "arbitrarily low" or somesuch. I did not wonder why it specifically needed to be super-exponential. Hence this post served both to point out that I should have been confused (I wouldn't have understood why) and to dispel the confusion.
Something about that amuses me.

311y

You could choose to single out a single alternative hypothesis that says the sun won't rise some day in the future. The ratio between P(sun rises until day X) and P(sun rises every day) will not change with any evidence before day X. If initially you believed a 99% chance of "the sun rises every day until day X" and a 1% chance of Solomonoff induction's prior, you would end up assigning more than a 99% probability to "the sun rises every day until day X".
Solomonoff induction itself will give some significant probability mass to "induction works until day X" statements. The Kolmogorov complexity of "the sun rises until day X" is about the Kolmogorov complexity of "the sun rises every day" plus the Kolmogorov complexity of X (approximately log2(x)+2log2(log2(x))). Therefore, even according to Solomonoff induction, the "sun rises until day X" hypothesis will have a probability approximately proportional to P(sun rises every day) / (X log2(X)^2). This decreases subexponentially with X, and even slower if you sum this probability for all Y >= X.
In order to get exponential change in the odds, you would need to have repeatable independent observations that distinguish between Solomonoff induction and some other hypothesis. You can't get that in the case of "sun rises every day until day X" hypotheses.
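The estimate above can be tabulated. A sketch, assuming only the stated approximation for the prior weight of the "sun rises until day X" hypotheses:

```python
import math

# Approximate Solomonoff-prior weight of "the sun rises until day X",
# per the estimate above: proportional to 1 / (X * log2(X)^2).
def weight(X):
    return 1.0 / (X * math.log2(X) ** 2)

for X in (10, 100, 10_000, 10**6):
    print(X, weight(X))
```

The weight falls off subexponentially: going from X = 100 to X = 10^6 costs only a factor of about 10^4, where an exponentially decaying prior like 2^−X would have collapsed to nothing long before.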

211y

If you only assign significant probability mass to one changeover day, you behave inductively on almost all the days up to that point, and hence make relatively few epistemic errors. To put it another way, unless you assign superexponentially-tiny probability to induction ever working, the number of anti-inductive errors you make over your lifespan will be bounded.

911y

But even one epistemic error is enough to cause an arbitrarily large loss in utility. Suppose you think that with 99% probability, unless you personally join a monastery and stop having any contact with the outside world, God will put everyone who ever existed into hell on 1/1/2050. So you do that instead of working on making a positive Singularity happen. Since you can't update away this belief until it's too late, it does seem important to have "reasonable" priors instead of just a non-superexponentially-tiny probability to "induction works".

211y

This is always true.
I'd say more that besides your one reasonable prior you also need to not make various sorts of specifically harmful mistakes, but this only becomes true when instrumental welfare as well as epistemic welfare are being taken into account. :)

111y

Do you think it's useful to consider "epistemic welfare" independently of "instrumental welfare"? To me it seems that approach has led to a number of problems in the past.
1. Solomonoff Induction was historically justified a way similar to your post: you should use the universal prior, because whatever the "right" prior is, if it's computable then substituting the universal prior will cost you only a limited number of epistemic errors. I think this sort of argument is more impressive/persuasive than it should be (at least for some people, including myself when I first came across it), and makes them erroneously think the problem of finding "the right prior" or "a reasonable prior" is already solved or doesn't need to be solved.
2. Thinking that anthropic reasoning / indexical uncertainty is clearly an epistemic problem and hence ought to be solved within epistemology (rather than decision theory), leading for example to dozens of papers arguing over what is the right way to do Bayesian updating in the Sleeping Beauty problem.

211y

Ok, I agree with this interpretation of "being exposed to ordered sensory data will rapidly promote the hypothesis that induction works".

111y

Yep! And for the record, I agree with your above paragraphs given that.
I would like to note explicitly for other readers that probability goes down proportionally to the exponential of Kolmogorov complexity, not proportional to Kolmogorov complexity. So the probability of the Sun failing to rise the next day really is going down at a noticeable rate, as jacobt calculates (1 / x log(x)^2 on day x). You can't repeatedly have large likelihood ratios against a hypothesis or mixture of hypotheses and not have it be demoted exponentially fast.

211y

But... no.
"The sun rises every day" is much simpler information and computation than "the sun rises every day until Day X". To put it in caricature, if the hypothesis "the sun rises every day" is:
XXX1XXXXXXXXXXXXXXXXXXXXXXXXXX
(reading from the left)
then the hypothesis "the sun rises every day until Day X" is:
XXX0XXXXXXXXXXXXXXXXXXXXXX1XXX
And I have no idea if that's even remotely the right order of magnitude, simply because I have no idea how many possible-days or counterfactual days we need to count, nor of how exactly the math should work out.
The important part is that for every possible Day X, it is equally balanced by the "the sun rises every day" hypothesis, and AFAICT this is one of those things implied by the axioms. So because of complexity giving you base rates, most of the evidence given by sunrise accrues to "the sun rises every day", and the rest gets evenly divided over all non-falsified "Day X" (also, induction by this point should let you induce that Day X hypotheses will continue to be falsified).

011y

You're making the argument that Solomonoff induction would select "the sun rises every day" over "the sun rises every day until day X". I agree, assuming a reasonable prior over programs for Solomonoff induction. However, if your prior is 99% "the sun rises every day until day X", and 1% "Solomonoff induction's prior" (which itself might assign, say, 10% probability to the sun rising every day), then you will end up believing that the sun rises every day until day X. Eliezer asserted that in a situation where you assign only a small probability to Solomonoff induction, it will quickly dominate the posterior. This is false.
Not sure exactly what this means, but the ratio between the probabilities "the sun rises every day" and "the sun rises every day until day X" will not be affected by any evidence that happens before day X.

(No, this is *not* the "tu quoque!" moral equivalent of starting out by assigning probability 1 that Christ died for your sins.)

Can someone please explain this?

I understand many religious people claim to just 'have faith' in Christ, with absolute certainty. I think the standard argument would run "well, you say I shouldn't have faith in Christ, but you have faith in 'science' / 'non-neglible probability on induction and some single well ordered large ordinal' so you can't argue against faith".

What is Eliezer saying here?

Addendum: By which I mean, can someone give a clear explanation of why they are not the same?

611y

— http://en.wikipedia.org/wiki/Presuppositional_apologetics

211y

You pretty much got it. Eliezer's predicting that response and saying, no, they're really not the same thing. (Tu quoque)
EDIT: Never mind, I thought it was a literal question.

211y

I see. Could you articulate how exactly they're not the same thing please?

9[anonymous]11y

For instance: nowhere above did EY claim anything had probability one.

011y

I actually thought this was part of a sequence, because I'm missing some context along the lines of "I vaguely remember this being discussed but now I can't remember why the topic was on the table in the first place." Initially I thought it was part of the Epistemology sequence but I can't figure out where it follows from. Someone enlighten me?

Does induction state a fact about the territory or the map? Is it more akin to "The information processing influencing my sensory inputs is *actually* done by a processor in which P(0) & [P(0) & P(1) & ... & P(n) -> P(n+1)] for all propositions P and natural n?" Or is it "my *own* information processor is one for which P(0) & [P(0) & P(1) & ... & P(n) -> P(n+1)] for all propositions P and natural n?"

It seems like the second option is true by definition (by the authoring of the AI, we simply make it so b...

211y

This question seems to confuse mathematical induction with inductive reasoning.

011y

So I have. Mathematical induction is, so I see, actually a form of deductive reasoning because its conclusions necessarily follow from its premises.

011y

Mathematical induction is more properly regarded as an axiom. It is accepted by a vast majority of mathematicians, but not all.

111y

How should I think about the terminologies "faith" and "axiom" in this context? Is this "faith in two things" more fundamental than belief in some or all mathematical axioms?
For example, if I understand correctly, mathematical induction is equivalent to the well-ordering principle (pertaining to subsets of the natural numbers, which have a quite low ordinal). Does this mean that this axiom is subsumed by the second faith, which deals with the well-ordering of a single much higher ordinal?
Or, as above, did Eliezer mean "well-founded?" In which case, is he taking well-ordering as an axiom to prove that his faiths are enough to believe all that is worth believing?
It may be better to just point me to resources to read up on here than to answer my questions. I suspect I may still be missing the mark.

011y

I'm not sure how to answer your specific question; I'm not familiar with proof-theoretic ordinals, but I think that's the keyword you want. I'm not sure what your general question means.

-111y

Utter pedantry: or rather an axiom schema, in first-order languages.

611y

I feel like this is more of a problem with your optimism than with induction. You should really have a hypothesis set that says "humans want me to be fed for some period of time" and the evidence increases your confidence in that, not just some subset of it. After that, you can have additional hypotheses about, for example, their possible motivations, that you could update on based on whatever other data you have (e.g. you're super-induction-turkey, so you figured out evolution). Or, more trivially, you might notice that sometimes your fellow turkeys disappear and don't come back (if that happens). You would then predict the future based on all of these hypotheses, not just one linear trend you detected.

511y

I'm not sure why, but now I want Super-induction-turkey to be the LW mascot.

011y

If you have a method of understanding the world that works for all problems, I would love to hear it.

-111y

Acknowledging that you can't solve them?

311y

In what sense does that "work"?
Being able to predict the results of giving up on a problem does not imply that giving up is superior to tackling a problem that I don't know I'll be able to solve.

211y

How do you know which ones are the ones you can't solve?

011y

So induction gives the right answer 100s of times, and then gets it wrong once. Doesn't seem too bad a ratio.

I've long claimed to not have faith in *anything*. I certainly don't have "faith" in inductive inference. I don't see why anyone would have "faith" in something which they are uncertain about. The need for lack of certainty about induction has long been understood.

511y

I don't have faith in induction. I happen to be the kind of monster who does induction.
But only sometimes. The more toothpaste I have extracted from the tube so far, the more likely it is to be empty.

011y

Which of these seven models of faith do you have in mind when you use that word (if any)?

211y

Bah, philosophy. I essentially mean belief not justified by the evidence.

You only need faith in two things: ...that some single large ordinal is well-ordered.

I'm confused. What do you mean by *faith* in... well, properties of abstract formal systems? That some single large ordinal must exist in at least one of your models for it to usefully model reality (or other models)?

Work is ongoing on eliminating the requirement for faith in these two remaining propositions. For example, we might be able to describe our increasing confidence in ZFC in terms of logical uncertainty and an inductive prior which is updated as ZFC passes various tests that it would have a substantial subjective probability of failing, even given all other tests it has passed so far, if ZFC were inconsistent.

Would using the length of the demonstration of a contradiction work? Under the Curry-Howard correspondence, a lengthy proof should correspond to a lengthy program, which under Solomonoff induction should receive less and less weight.

211y

Unless I've missed something, it is easy to exhibit small formal systems such that the minimum proof length of a contradiction is unreasonably large. E.g. Peano Arithmetic plus the axiom "Goodstein(Goodstein(256)) does not halt" can prove a contradiction, but only after some very, very large number of proof steps. Thus failure to observe a contradiction after even huge numbers of proof steps doesn't provide very strong evidence.

011y

Given that we can't define that function in PA what do you mean by Goodstein(256)?

211y

Goodstein is definable, it just can't be proven total. If I'm not mistaken, all Turing machines are definable in PA (albeit they may run at nonstandard times).

111y

So I gather we define a Goodstein relation G such that [xGy] in PA if [y = Goodstein(x)] in ZFC, then you're saying PA plus the axiom [not(exists y, (256Gy and exists z, (yGz)))] is inconsistent, but the proof of that is huge because the proof basically has to write out an execution trace of Goodstein(Goodstein(256)). That's interesting!

111y

How?
You can do it with the axiom of choice, but beyond that I'm pretty sure you can't.

211y

If "arbitrary size" means "arbitrarily large size," see Hartogs numbers. On the other hand, the well-ordering principle is equivalent to AC.

011y

Take the empty set. Add an element. Preserving the order of existing elements, add a greatest element. Repeat.

011y

That sounds like it would only work for countable sets.

011y

Is the single large ordinal which must be well-ordered uncountable? I had figured that simply unbounded was good enough for this application.

You only need faith in two things: That "induction works" has a non-super-exponentially-tiny prior probability, and that some single large ordinal is well-ordered. Anything else worth believing in is a deductive consequence of one or both.

(Because being exposed to ordered sensory data will rapidly promote the hypothesis that induction works, even if you started by assigning it very tiny prior probability, so long as that prior probability is not super-exponentially tiny. Then induction on sensory data gives you all empirical facts worth believing in. Believing that a mathematical system has a model usually corresponds to believing that a certain computable ordinal is well-ordered (the proof-theoretic ordinal of that system), and large ordinals imply the well-orderedness of all smaller ordinals. So if you assign non-tiny prior probability to the idea that induction might work, and you believe in the well-orderedness of a single sufficiently large computable ordinal, all of empirical science, and all of the math you will actually believe in, will follow without any further need for faith.)

(The reason why you need faith for the first case is that although the fact that induction works can be readily observed, there is also some anti-inductive prior which says, 'Well, but since induction has worked all those previous times, it'll probably fail next time!' and 'Anti-induction is bound to work next time, since it's never worked before!' Since anti-induction objectively gets a far lower Bayes-score on any ordered sequence and is then demoted by the logical operation of Bayesian updating, to favor induction over anti-induction it is not necessary to start out believing that induction works better than anti-induction, it is only necessary *not* to start out by being *perfectly* confident that induction won't work.)
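To make the Bayes-score comparison concrete, here is a minimal numerical sketch; the 0.9/0.1 repeat probabilities and the constant sequence are illustrative assumptions, not part of the argument:

```python
import math

# An inductive predictor says the next bit repeats the last with prob 0.9;
# the anti-inductive predictor says it flips with prob 0.9 ("it's worked
# before, so it'll probably fail next time"). On an ordered sequence the
# inductive predictor's log-likelihood (Bayes-score) pulls ahead linearly,
# so Bayesian updating demotes anti-induction exponentially fast.
seq = [1] * 50

def log_score(p_repeat, seq):
    # Sum of log-probabilities the predictor assigned to each observed bit.
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        score += math.log(p_repeat if cur == prev else 1 - p_repeat)
    return score

ind = log_score(0.9, seq)   # inductive predictor
anti = log_score(0.1, seq)  # anti-inductive predictor
print(ind - anti)           # log-odds shift toward induction, in nats
```

Any finite starting log-odds against induction are eventually overtaken by this linearly growing score gap, which is why only a *perfectly* (or super-exponentially near) zero prior on induction can hold out forever.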

(The reason why you need faith for the second case is that although more powerful proof systems - those with larger proof-theoretic ordinals - can prove the consistency of weaker proof systems, or equivalently prove the well-ordering of smaller ordinals, there's no known perfect system for telling which mathematical systems are consistent just as (equivalently!) there's no way of solving the halting problem. So when you reach the strongest math system you can be convinced of and further assumptions seem dangerously fragile, there's some large ordinal that represents all the math you believe in. If this doesn't seem to you like faith, try looking up a Buchholz hydra and then believing that it can always be killed.)

(Work is ongoing on eliminating the requirement for faith in these two remaining propositions. For example, we might be able to describe our increasing confidence in ZFC in terms of logical uncertainty and an inductive prior which is updated as ZFC passes various tests that it would have a substantial subjective probability of failing, even given all other tests it has passed so far, if ZFC were inconsistent.)

(No, this is *not* the "tu quoque!" moral equivalent of starting out by assigning probability 1 that Christ died for your sins.)