Mentioned in

Results from MIRI's December workshop

13John_Baez

10Stuart_Armstrong

6Manfred

5JeremyHahn

3Manfred

6JeremyHahn

1Manfred

0JeremyHahn

3Manfred

0paulfchristiano

0Manfred

0Will_Sawin

0Manfred

0Will_Sawin

4ESRogs

3TruePath

0Benya

0TruePath

0Benya

1twanvl

1JeremyHahn

0abramdemski

0JeremyHahn

0abramdemski

0twanvl

0JeremyHahn

1timtyler

0shiney

0abramdemski

0shiney

0abramdemski

0shminux

1Nisan

0Protagoras

0shminux

0Nisan

2shminux

0Will_Sawin

-2Jonathan Paulson

0Benya

2shminux

0Benya

2shminux

New Comment

43 comments, sorted by Click to highlight new comments since: Today at 11:09 AM

suppose that you find evidence that Q(x) is true of exactly 90% of all numbers between x=1 and x=10^100. Then we would expect that if you plug in some arbitrary number in this range, say n = floor(7^7^7^7 * pi) mod 10^100, then the posterior probability of Q(n), after conditioning on your evidence, would be 0.9. But it turns out that in Demski's proposal, the probability of Q(n) would be approximately 0.5.

This is because the task you're talking about is a task for generalizing based on a pattern rather than on proofs, which early proposals for assigning logical probabilities didn't include. I have literally written an article about this, check it out.

I'll plug my own posts as many times as it takes :P

Hi Manfred, your post indeed touches on many of the same issues we were considering on the workshop. If I understand it correctly, you are worried about the robot correctly learning universal generalizations. For example, if P(x) is true for all odd x and false for all even x, you want your robot to correctly guess that after observing P(0),P(1),P(2),...,P(1 million). In fact, most current proposals, including Demski's, are capable of learning these universal generalizations. Our current programs are able to correctly learn rules like "P(x) is true for all odd x," or even "P(x) is true for 90% of x between 0 and 10 million." You could call this step "generalization." There are many interesting things to say about generalization, including your post and Will Sawin's Pi1-Pi2 Theorem. The problem Paul brought to our attention at this workshop however was not generalization, but "specialization." If you believe that "P(x) is true for all odd x," it's rather easy to say what you believe about P(994543). But it's surprisingly subtle to understand what you should believe about P(994543) given only that "P(x) is true for 90% of x between 0 and 10 million," particularly when P(994543) is part of a different sequence of propositions Q(x) which you have different statistical beliefs about. Properly handling the "specialization" step was the focus of the workshop, and I'd be interested to hear if you have any thoughts about the subtleties brought up in the write-up.

Well, I wrote down some stream of consciousness stuff :P

The statement "P(x) is true for 90% of x in some domain" is an interesting one. A message-length or complexity based prior over patterns (like I justified the use of in my post, and as is commonly used in things like minimum message length prediction) does not have statements like that as basic statements. Instead the basic statements are about the different patterns that could correspond to the values of P(x) over its domain.

If you've done random sampling on domain x and gotten P(x) true 90% of the time, then a robot using a complexity-based prior refuses to throw away the information about what you sampled, and just tries to find simple patterns in your data. This *does* lead it to predict that some additional statement has high probability of being true, because simple patterns tend to be consistent. But it does not assign equal probability to the truth of each P(x), because different x's can fit into different patterns.

So a complexity prior over functions exhibits the correct behavior, at least when you give it the raw data. But what if we give if the information "P(x) is true for 90% of x in some domain?"

Well, there are lots of different patterns with P(x) true for 90% of x. But the math is fairly straightfoward - our robot takes its prior, cuts out every pattern that doesn't have P(x) true for 90% of x, and then renormalizes to get its new distribution over patterns. Again, not every x has the same probability, because simple patterns contribute more, but the probability is around 90% for various x.

However, at no point does our complexity-prior robot ever think of the statement "P(x) is true for 90% of x in some domain" on its own. To do that, we'd have to make a robot that approximated by throwing out the information of what the sampled P(x) were, and only keeping the frequencies, and then didn't consider any information about the complexity of the function (equivalent to a uniform non-standard prior), so that it didn't update to a more complicated distribution.

This is a reasonable path to take for approximation. After all, the complexity prior isn't going to be computationally tractable for many real-world problems. Throwing out the sampled P(x) is obviously bad if you have enough computational resources to use that information (e.g. imagine if 50% were true and it was just the odd numbers), so along with approximation schemes, we should remember to think about ways of controlling and doing cost/benefit analyses on those approximations.

Hi Manfred, It might help to communicate if you read the report, though I'm not sure it's understandable since it was written quite rapidly :(. Thanks for sending your stream of consciousness and I look forward to hearing more of it. You seem quite convinced that a simplicity prior exhibits reasonable behavior, but I am less so. A lot of the workshop involved understanding potential pitfalls of simplicity priors. Basically, some pathologies appear to arise from the fact that a very large pseudorandom number with a short description is treated very differently from a truly random large number with no short description.

It is difficult to analyze simplicity priors since they are so intractably complicated, but do you believe that, after updating on the first 1 million values of a random 90% true sequence of propositions P(0), P(1),...,P(1 million), P(x) will be assigned similar truth probabilities both when x admits short description and when x admits only a complicated description?

I confess I'm ignorant about what report you mean - could you throw me a link?

Hm, x being simple or complicated...

For example, if our predicate P(x) is true just for prime numbers, then "the trillionth prime number" is a simple description - no prime number can have a greater complexity than its description *qua* prime number, and so if we assign high probability to simple descriptions like primeness, we'll expect P to be true for simple numbers.

I would argue that this is proper behavior. It is not that our robot cares about the complexity of the number. It just tries to find simple patterns in the data, and naturally some numbers will fit into more simple patterns than others, once we're given information about the density of P(x) being true.

An extreme example would be if P(x) were only true (or, equivalently, false) for a single number. Then our probability about P(x) really *would* be about complexity of the number. Why is this proper behavior? Because information that justifies pattern-matching behavior is information about the complexity of P(x)! If P(x) is a single number and we know that its description is less than 100 bits, that knowledge is what allows us to exhibit a preference for simpler patterns, and thus simpler numbers.

Fun fact: if we know that P(x) is true for 50% of numbers, the complexity of x has essentially no impact, even though we still favor simple patterns.

It looks like what I was talking about and what you were talking about were different things. I was talking about having a complexity-based prior over the behavior of our predicate P(x) on the basis of actually knowing about the complexity of P.

The bad behavior you wrote up, though, seems like a consequence of using a shoddy method for updating on outside information.

If we just want to throw computing power at the problem until it goes away, the solution is actually really simple: you pick a model at random like normal, but if it's not consistent with the information we're conditioning on, you throw it out. Note that this requires even probabilistic-seeming information to be phrased as frequencies, like "P(x) is true for 90% of some domain." But since this whole method of assigning probabilities is built off of frequencies, you're probably fine with that.

Can I ask a non-sequitur question? If I were to write up some posts about logical probability, largely reprising the analogous trains of thought in the first few chapters of Jaynes, what would be a good comprehensive strategy to get the people attending workshops like this to read them?

you pick a model at random like normal, but if it's not consistent with the information we're conditioning on, you throw it out.

You can do this for a finite set S by picking a random assignment of truth values to sentences in S, but (1) there is a nasty dependence on S, so the resulting distribution is very sensitive to the choice of S and definitely doesn't converge to any limit (this is related to the difficulty of defining a "random model" of some axioms), (2) if S is a finite set, you actually need to have some sentence like "P(x) is true for 90% of x such that 'P(x)' is in S" which is really awkward, and I don't know how to make it actually work.

A technically correct solution along these lines would still be of interest, but my intuition is that this isn't working for the right reasons and so there are likely to be further problems even if (2) is resolved and (1) is deemed non-problematic.

Also note that "pick a random model consistent with information we are conditioning on" is not the kind of solution we are looking for; we want conditioning to be conditioning. So your proposal would correspond to picking sentences at random for a while as in Abram Demski's proposal, and then transitioning into choosing a model at random at some (probably randomly chosen) point. This is something we considered explicitly.

Anyhow, the question is, why is throwing out non-90% models and trying again going to make [the probability that we assign true to P(x) using this random process] behave like we want the probability that P(x) is true to behave?

We can answer this with an analogy to updating on new information. If we have a probability distribution over models, and we learn that the correct model says that 90% of P(x) are true in some domain, what we do is we zero out the probability of all models where that's false, and normalize the remaining probabilities to get our new distribution. All this "output of the random process" stuff is really just describing a process that has some probability of outputting different models (that, is, Abram's process outputs a model drawn from some distribution, and then we call the probability that P(x) is true as the probability that the output of Abram's process assigns true to P(x)).

So the way you do updating is you zero out the probability that this process outputs a model where the conditioned-upon information is false, and then you normalize the outputs so that the process outputs one of the remaining models with the same relative frequencies. This is the same behavior as updating a probability distribution.

--

One thing I think you might mean by "we want conditioning to be conditioning" is that you don't want to store a (literal or effective) distribution over models, and then condition by updating that distribution and recalculating the probability of a statement. You want to store the probability of statements, and condition by doing something to that probability. Like, P(A|B) = P(AB)/P(B).

I like the aesthetics of that too - my first suggestion for logical probability was based off of storing probabilities of statements, after all. But to make things behave at all correctly, you need more than just that, you also need to be able to talk about correlations between probabilities. The easiest way to represent that? Truth tables.

Yeah, updating probabilty distributions over models is believed to be good. The problem is, sometimes our probability distributions over models are wrong, as demonstrated by bad behavior when we update on certain info.

The kind of data that would make you want to zeroi out non-90% models. Is when you observe a bunch of random data points and 90% of them are true, but there are no other patterns you can detect.

The other problem is that updates can be hard to compute.

definitely doesn't converge to any limit

I'm confused about what you want when you say this. I think an example or two would help me.

My best guess: you want the assigned probability to equal the "ideal" probability (as laid out in the probabilistic reflection paper) when we take the limit of computing power going to infinity. It's difficult to take this limit for Abram's original proposal, but if you make it doable by restricting it to S, now there can be things that are true but not provable using statements in S, so infinite computing power doesn't get us there.

It's actually not too hard to demonstrate things about the limit for Abram's original proposal, unless there's another one that's original-er than the one I'm thinking of. It limits to the distribution of outcomes of a certain incomputable random process which uses a halting oracle to tell when certain statements are contradictory.

You are correct that it doesn't converge to a limit of assigning 1 to true statements and 0 to false statements. This is of course impossible, so we don't have to accept it. But it seems like we should not have to accept divergence - believing something with high probability, then disbelieving with high probability, then believing again, etc. Or perhaps we should?

Actually, the `proof' you gave that no true list of theories like this exists made the assumption (not listed in this paper) that the sequence of indexes for the computable theories is definable over arithmetic. In general there is no reason this must be true but of course for the purposes of an AI it must.

Ultimately, you can always collapse any computable sequence of computable theories (necessary for the AI to even manipulate) into a single computable theory so there was never any hope this kind of sequence could be useful.

Actually, the `proof' you gave that no true list of theories like this exists made the assumption (not listed in this paper) that the sequence of indexes for the computable theories is definable over arithmetic. In general there is no reason this must be true but of course for the purposes of an AI it must.

("This paper" being Eliezer's writeup of the procrastination paradox.) That's true, thanks.

Ultimately, you can always collapse any computable sequence of computable theories (necessary for the AI to even manipulate) into a single computable theory so there was never any hope this kind of sequence could be useful.

First of all (always assuming the theories are at least as strong as PA), note that in any such sequence, T_0 is the union of all the theories in the sequence; if T_(n+1) |- phi, then PA |- Box_(T_(n+1)) "phi", so T_n |- Box_(T_(n+1)) "phi", so by the trust schema, T_n |- phi; going up the chain like this, T_0 |- phi. So T_0 is in fact the "collapse" of the sequence into a single theory.

That said, I disagree that there is no hope that this kind of sequence could be useful. (I don't literally want to use an unsound theory, but see my writeup about an infinite sequence of sound theories each proving the next consistent, linked from the main post; the same remarks apply there.) Yes, T_0 is stronger than T_1, so why would you ever want to use T_1? Well, T_0 + Con(T_0) is stronger than T_0, so why would you ever want to use T_0? But by this argument, you can't use *any* sound theory including PA, so this doesn't seem like a remotely reasonable argument against using T_1. Moreover, the fact that an agent using T_0 can construct an agent using T_1, but it can't construct an agent using T_0, seems like a sufficient argument against the claim that the sequence as a whole must be useless because you could always use T_0 for everything.

I meant useful in the context of AI since any such sequence would obviously have to be non-computable and thus not something the AI (or person) could make pragmatic use of.

Also, it is far from clear that T_0 is the union of all theories (and this is the problem in the proof in the other rightup). It may well be that there is a sequence of theories like this all true in the standard model of arithmetic but that their construction requires that T*n add extra statements beyond the schema for the proof predicate in T*{n+1}

Also, the claim that T*n must be stronger than T*{n+1} (prove a superset of it...to be computable we can't take all these theories to be complete) is far from obvious if you don't require that T_n be true in the standard model. If T_n is true in the standard model than, as it proves that Pf(T*n+1, \phi) -> \phi this is true so if T*{n+1} |- \phi then (as this witnessed in a finite proof) there is a proof that this holds from T_n and thus a proof of \phi. However, without this assumption I don't even see how to prove the containment claim.

I meant useful in the context of AI since any such sequence would obviously have to be non-computable and thus not something the AI (or person) could make pragmatic use of.

I was replying to this:

Ultimately, you can always collapse any computable sequence of computable theories (necessary for the AI to even manipulate) into a single computable theory so there was never any hope this kind of sequence could be useful.

I.e., I was talking about computable sequences of computable theories, not about non-computable ones.

Also, it is far from clear that T_0 is the union of all theories (and this is the problem in the proof in the other rightup). It may well be that there is a sequence of theories like this all true in the standard model of arithmetic but that their construction requires that T_n add extra statements beyond the schema for the proof predicate in T_{n+1}

I can't make sense of this. Of course T_n can contain statements other than those in T_{n+1} and the Löb schema of T_{n+1}, but this is no problem for the proof that T_0 is the union of all the theories; the point is that because of the Löb schema, we have T_{n+1} \subset T_n for all n, and therefore (by transitivity of the subset operation) T_n \subseteq T_0 for all n.

Also, the claim that T_n must be stronger than T_{n+1} (prove a superset of it...to be computable we can't take all these theories to be complete) is far from obvious if you don't require that T_n be true in the standard model. If T_n is true in the standard model than, as it proves that Pf(T_n+1, \phi) -> \phi this is true so if T_{n+1} |- \phi then (as this witnessed in a finite proof) there is a proof that this holds from T_n and thus a proof of \phi. However, without this assumption I don't even see how to prove the containment claim.

Note again that I was talking about computable sequences T_n. If T_{n+1} |- \phi and T_{n+1} is computable, then PA |- Pf(T_{n+1}, \phi) and therefore T_n |- Pf(T_{n+1}, \phi) if T_n extends PA. This doesn't require either T_n or T_{n+1} to be sound.

I am not convinced by the problematic example in the "Scientific Induction in Probabilistic Mathematics" writeup. Let's say that there are n atoms ϕ(1)..ϕ(n). If you don't condition, then because of symmetry, all consistent sets S drawn from the process have equal probability. So the prior on S is uniform and the probability of ϕ(i) is therefore 1/2, by

```
P(ϕ(i)) = ∑{S} 1[ϕ(i)∈S] * P(S)
```

n a consistent set S drawn from the process is exactly 1/2 for all i, this must be true by symmetry because μ(x)=μ(¬x). Now what you should do to condition on some statement X is simply throw out the sets S which don't satisfy that statement, i.e.

```
P(ϕ(i)|X) = ∑{S} 1[ϕ(i)∈S] * P(S) * 1[X(S)] / ∑{S} P(S) * 1[X(S)]
```

Since the prior on S was uniform, it will still be uniform on the restricted set after conditioning. So

```
P(ϕ(i)|X) = ∑{S} 1[ϕ(i)∈S] * 1[X(S)] / ∑{S} 1[X(S)]
```

Which should just be 90% in the example where X is "90% of the ϕ are true"

The mistake in the writeup is to directly define P(S|X) in an inconsistent way.

To avoid drowning in notation, let's consider a simpler example with the variables a, b and c. We will first pick a or ¬a uniformly, then b or ¬b, and finally c or ¬c. Then we try to condition on X="exactly one of a,b,c is true". You obviously get prior probabilities P(S) = 1/8 for all consistent sets.

If you condition the right way, you get P(S) = 1/3 for the sets with one true attom, and P(S)=0 for the other sets. So then

```
P(a|X) = P(a|{a,¬b,¬c})P({a,¬b,¬c}|X) + P(a|{¬a,b,¬c})P({¬a,b,¬c}|X) + P(a|{¬a,¬b,c})P({¬a,¬b,c}|X)
= 1/3
```

What the writeup does instead is first pick a or ¬a uniformly. If it picks a, we know that b and c are false. If we pick ¬a we continue. The uniform choice of a is akin to saying that

```
P({a,b,c}|X) = P(a) * P({b,c}|a,X).
```

But that first term should be `P(a|X)`

, not `P(a)`

!

Sorry if there is something fishy in the writeup :(. I could believe it, given how rushed I was writing it.

Suppose we consider not just a,~a,b,~b, and c,~c, but also statements q="exactly one of a,b,c is true" and ~q. Suppose now that we uniformly pick a truth value for a, then for q, then a logically consistent but otherwise random value for b, and finally a logically consistent but otherwise random value for c. Such an asymmetric situation could occur if b and c have high mu but a and q have small mu. In the worlds where we believe q, b and c are much more often disbelieved than a. I believe that basically captures the worries about Demski's scheme that Paul was having; maybe he will comment himself.

Does that clarify anything?

This was surprisingly clarifying for me.

I'm still not sure whether I find it too concerning. We assign higher probability to simpler theories so long as they follow the constraint that Q(n) below 10^100 is 90%. The theory which randomly assigns Q(n) to true/false for 10^99 simple values of n (and then gets forced to assign Q as true for most of the remaining n below 10^100) is just one form of theory which may be generated by the process, and not a really simple one. The theory that Q(n) represents "not divisible by 10" is going to be much more probable than all these theories.

In other words, the write-up estimates Q(n) based on a narrow subset of the theories which assign probability mass. I don't really think it's representative...

Imagine that we encounter a truly iid random sequence of 90% likely propositions Q(0),Q(1),Q(2),... Perhaps they are merely pseudorandom but impossibly complicated to reason about, or perhaps they represent some random external output that an agent observes. After observing a very large number of these Q(i), one might expect to place high probability on something like "About 90% of the next 10^100 Q(j) I haven't observed yet will be true," but there is unlikely to be any simple rule that describes the already observed Q(i). Do you think that the next 10^100 Q(j) will all individually be believed 90% likely to be true, or will the simpler to describe Q(j) receive closer to 50% probability?

We can show that the FOL prior is not too different from the algorithmic prior, so it can't perform too badly for problems where algorithmic induction does well. Partial theories which imply probabilities close to .9 but do not specify exact predictions will eventually have high probability; for example, a theory might specify that Q(x) is derived from an unspecified F(x) and G(x) (treated as random sources) getting OR'd together, making probabilities roughly .75; variations of this would bring things closer to .9.

This still may still assign simpler Q(j) to closer to 50% probability.

After writing this I realize that there is a much simpler prior on finite sets S of consistent statements: simply have a prior over all sets of statements, and keep only the consistent ones. If your language is chosen such that it contains X if and only if it also contains ¬X, then this is equivalent to choosing a truth value for each basic statement, and a uniform prior over these valuations would work fine.

To learn more about this, see "Scientific Induction in Probabilistic Mathematics", written up by Jeremy Hahn

This line:

Choose a random sentence from S, with the probability that O is chosen proportional to u(O) - 2^-length(O).

...looks like a subtraction operation to the reader. Perhaps use "i.e." instead.

The paper appears to be arguing against the applicability of the universal prior to mathematics.

However, why not just *accept* the universal prior - and then *update* on learning the laws of mathematics?

Can't you somewhat patch Demski's proposal by sampling uniformly from S rather than doing it biased by length. That would generate the right probabilities for the 90% issue, provided that all the ϕ(n) are in S to start with. If not all the sentences were in S then there still be a bias towards ϕ(n) being true but it would be only for the ones such that ϕ(n) is in S and it would be lower.

Maybe I missed something but I could never see why there was anything intrinsically good about (say) the short bias in the Solomonoff prior, it seemed like the whole thinking bigger programs were less likely was just a necessary trick to make the infinite sums finite. If we consider the formalism in the technical note that only keeps a finite number of sentences in memory then we don't have to worry about this issue and can sample uniformly rather than (arbirtrarily?) picking a bias.

In your paper http://ict.usc.edu/pubs/Logical%20Prior%20Probability.pdf you have to worry about the infinite series summing to infinity as the probability distribution is over all sentences in a language, so you have to include this bias.

If we can avoid assuming simplicity is more likely and treat it as a fact about the world which we might learn after choosing a prior isn't that better?

I feel that the bigger issue with this proposal is that it doesn't solve the problem it's trying to solve as statements in S are biased towards being picked.

Great to see this! I'm not convinced of the problem with my prior, though... The judgements for the different instances of Q(n) may be logically independent, but they won't be probabilistically independent. In fact, the logical information that Q(x) is true of exactly 90% of numbers under a given limit number L will eliminate all theories where that isn't true. I find it likely that this will push the probability of Q(L+1) up significantly.

For example, the theory that Q holds for every number not divisible by 10 becomes much more probable.

[This comment is no longer endorsed by its author]

if you're not sure whether the twin prime conjecture is true, then each time you discover a new twin prime larger than all that you have seen before, you should ever so slightly increase the probability you assign to the conjecture.

I do not understand what you mean by "probability" here. Suppose I use one criterion to estimate that the twin-prime conjecture is true with probability 0.99, but a different criterion gives me 0.9999. In what situation would my choice of the criterion matter?

Are we talking about some measure over many (equally?) possible worlds in some of which the TPC is true and in others false (or maybe unprovable)? What would I do differently if I am convinced that one criterion is "right" and the other is "wrong" vs the other way around? Would I spend more time trying to prove the conjecture if I thought it is more likely true, or something?

There are a number of things one could do with a nice logical prior:

One could use it as a model for what a mathematician does when they search for examples in order to evaluate the plausibility of a Pi_1 conjecture (like the twin prime conjecture). One could accept the model as normative and argue that mathematicians conform to it insofar as their intuitions obey Cox's theorem; or one could argue that the model is not normative because arithmetic sentences aren't necessarily true or false / don't have "aboutness".

One could use it to define a computable maximizing agent as follows: The agent works within a theory T that is powerful enough to express arithmetic

*and*has symbols that are meant to refer to the agent and aspects of the world. The agent maximizes the expectation of some objective function with respect to conditional probabilities of the form P("outcome O obtains" | "I perform action A, and the theory T is true"). Making an agent like this has some advantages:It would perform well if placed into an environment where the outcomes of its actions depend on the behavior of powerful computational processes (such as a supercomputer that dispenses pellets of utilitronium if it can find a twin prime).

More specifically, it would perform well if placed into an environment that contains multiple computable agents.

Hmm, most of this went way over my head, unfortunately. I have no problem understanding probability in statements like "There is a 0.1% chance of the twin prime conjecture being proven in 2014", because it is one of many similar statements that can be bet upon, with a well-calibrated predictor coming out ahead on average. Is the statement "the twin prime conjecture is true with 99% probability" a member of some set of statements a well calibrated agent can use to place bets and win?

For that purpose a better example is a computationally difficult statement, like "There are at least X twin primes below Y". We could place bets, and then acquire more computing power, and then resolve bets.

The mathematical theory of statements like the twin primes conjecture should be essentially the same, but simpler.

Agree with Nisan's intuition, though I also agree with Wei Dai's position that we shouldn't feel sure that Bayesian probability is the right way to handle logical uncertainty. To more directly answer the question what it means to assign a probability to the twin prime conjecture: If Omega reveals to you that you live in a simulation, and it offers you a choice between (a) Omega throws a bent coin which has probability p of landing heads, and shuts down the simulation if it lands tails, otherwise keeps running it forever; and (b) Omega changes the code of the simulation to search for twin primes and run for one more step whenever it finds one; then you should be indifferent between (a) and (b) iff you assign probability p to the twin prime conjecture. [**ETA**: Argh, ok, sorry, not quite, because in (b) you may get to run for a long time still before getting shut down -- but you get the idea of what a probability over logical statements should mean.]

[This comment is no longer endorsed by its author]

but you get the idea of what a probability over logical statements should mean

Not from your example, I do not. I suspect that if you remove this local Omega meme, you are saying that there are many different possible worlds in your inner simulator and in p*100% of them the conjecture ends up being proven... some day before that world ends. Unless you are a Platonist and assign mathematical "truths" independent immaterial existence.

Last week (Dec. 14-20), MIRI ran its 6th research workshop on logic, probability, and reflection. Writing up mathematical results takes time, and in the past, it's taken quite a while for results from these workshops to become available even in draft form. Because of this, at the December workshop, we tried something new: taking time during and in the days immediately after the workshop to write up results in quick and somewhat dirty form, while they still feel fresh and exciting.

In total, there are seven short writeups. Here's a list, with short descriptions of each. Before you get started on these writeups, you may want to read John Baez's blog post about the workshop, which gives an introduction to the two main themes of the workshop.

## Theme 1: Scientific induction in mathematics

One of the main themes was using Bayesian probability to represent uncertainty about mathematical statements. Like human mathematicians, an AI will be able to outright prove or disprove many mathematical statements, but there will also be many that it will be uncertain about, and the obvious thing to try to get a handle on dealing with such uncertainty is to assign a probability to each such statement. This would mean choosing some sort of prior, and then updating on evidence: For example, if you're not sure whether the twin prime conjecture is true, then each time you discover a new twin prime larger than all that you have seen before, you should ever so slightly increase the probability you assign to the conjecture.

But what sort of prior should we choose? That's the problem that people at the December workshop tried to make some progress on. Here is an interesting problem which the best previous proposal, due to Abram Demski, fails on: Suppose that Q(x) is a predicate symbol, and suppose that you find evidence that Q(x) is true of exactly 90% of all numbers between x=1 and x=10^100. Then we would expect that if you plug in some arbitrary number in this range, say n = floor(7^7^7^7 * pi) mod 10^100, then the posterior probability of Q(n), after conditioning on your evidence, would be 0.9. But it turns out that in Demski's proposal, the probability of Q(n) would be approximately 0.5.

To learn more about this, see "

Scientific Induction in Probabilistic Mathematics", written up by Jeremy Hahn. Jeremy writes: "At the workshop, after pinning down the above example of undesired behaiviour we turned to other proposals for priors. None of the ideas presented are in a polished enough form to be considered a complete proposal, but we are very happy with the progress that was made."## Theme 2: The "procrastination paradox"

The other main theme of the workshop was the Löbian obstacle to self-modifying AI, and more specifically, something we're calling the "procrastination paradox". One approach for constructing a self-modifying AI based on mathematical logic would be to require the AI to prove, whenever it wants to take an action -- be it an external action or making some modification to its own source code -- that this action is "safe". Then you would naively expect the AI to be able to reason as follows: "The following simple self-modification X is safe, because the modified AI will still only take actions that it has proven to be safe, so it will only take safe actions." But by Löb's theorem, a consistent system T which is as strong as Peano Arithmetic cannot go from "whatever the modified AI does will

have been proven(in T) to be safe" to "whatever the modified AI does willin factbe safe". (See here for discussion of why it seems reasonable to tackle self-modifying AI from the perspective of mathematical logic, even though a realistic AI will certainly need to use probability to reason about the physical world.) In particular, we cannot have a consistent system T as strong as PA that proves "If S is provable in T, then S" for all sentences S.If we can't have a system T that trusts

itselfin this way, maybe we can have an infinite sequence of consistent systems T_n such that each T_n proves "If S is provable in T_(n+1), then S", for all sentences S? Marcello (Herreshoff) has shown that in fact there is such a sequence, explained in his paper with Eliezer on the Löbian obstacle. However, although it is consistent (meaning that it doesn't proveeverything; in particular, it doesn't prove 0=1), this system isunsound(meaning that it provessomethings which aren't actually true). We now have an intuition for why this is so, and a corresponding formal proof that any sequence like this must be unsound: Imagine a sequence of agents A_n, using the systems T_n, whose goal it is that a particular button gets pressed -- they don't carewhenit gets pressed, just that itdoesget pressed, eventually. Each of these agents will try to prove that one of its successors presses the button; if it can't find such a proof, then it will press the button itself. Now A_n reasons as follows (in T_n):Either my successor, A_(n+1), will press the button, or it won't. If it

doesn'tpress the button, then it must have found a proof that the button gets pressed at some later time. This proof is in T_(n+1), and I know that if something is provable in T_(n+1), then it is true. Thus, either way, the button gets pressed -- by my successor, or at some later time.So A_n has found a proof that the button gets pressed, and therefore doesn't press the button itself. But since this is true for every n, the button never gets pressed -- and therefore, all of the systems T_n are unsound, since they prove a statement which isn't true. Metaphorically, every time the AI thinks about pressing the button, it convinces itself that there is some

futuretime when it will stop procrastinating and actually press the button, sotodayit is surely fine to procrastinate -- and in this way, the AI procrastinates forever. For more, including formal proofs, see "The Procrastination Paradox", written up by Eliezer.However, on the positive side, it turns out that if you modify Marcello's construction so that T_n merely proves that T_(n+1) is

consistent, rather than proving that everything T_(n+1) proves is true, then Marcello's proof not only shows that the T_n are all consistent, but also that they are sound. (This is really easy to see: T_n is PA plus the axiom "psi(n) -> Con(T_(n+1))", for some formula psi(n). Marcello's proof shows that all of these systems T_n are consistent. But then, since T_(n+1) is consistent, "psi(n) -> Con(T_(n+1))" is true, so all axioms of T_n are true and T_n is sound.) Moreover, although there are things that T_n proves that T_(n+1) doesn't -- namely, "Con(T_(n+1))", since if T_(n+1) proved that, it would be inconsistent -- you can construct these T_n so that they all have the same proof-theoretic ordinal, so to the extent that you think that proof-theoretic ordinals are a good measure of the "mathematical strength" of a theory, all of the theories T_n can be said to have the same amount of mathematical strength. Finally, although T_n doesn't prove the Löb schema "If T_(n+1) proves S, then S is true" forallsentences S, it does in fact prove it for a particular class of sentences (the Pi_1 sentences). For details, see "An infinitely descending sequence of sound theories each proving the next consistent", written up by Benja Fallenstein, a.k.a. myself.Is it enough to have the Löb schema for Pi_1 sentences? (These are sentences of the form "forall x. phi(x)", where phi(x) is such that you can write a computer program that takes x as input and outputs whether phi(x) is true or false.) Actually, there's an argument to be made that it might be

exactly rightfor tiling agents: It seems to be as much as you can get without running into the procrastination paradox, and it's enough to formalize goals like "The world is not destroyed before time n" (if the world is computable). Moreover, there's a way to combine this with ideas from a different solution to the tiling agents problem, parametric polymorphism (one version of which is described in the tiling agents paper), in order to also handle goals that don't as obviously fit into the Pi_1 mold; we call this hybrid of ideas "Fallenstein's monster", written up by Nate Soares a.k.a. So8res, because it looks rather grafted together. (In fact, it was even uglier when we came up with it, because back then we didn't even have the sound version of Marcello's construction yet. I'm hopeful that we will find prettier solutions to the remaining monstrous aspects as well, so that I will soon be able to ask my trusted research assistant, Igor, to dispose of the monster.)Finally, we usually talk of a set of equations like "T_n = PA + psi(n) -> Con(T_(n+1))" as if they define

one particularsequence of theories T_n, but it's not obvious that different ways of making this formal lead to logically equivalent theories. In fact, I'd have guessed that in general they don't, but Will Sawin had the opposite intuition and was right; any recursively enumerable set of equations of this type has a unique solution (in the sense that if PA proves that two r.e. lists of theories U_i and V_i are both solutions, then PA proves that they are equivalent). For details, see "Recursively-defined logical theories are well-defined", written up by Nisan Stiennon. Interestingly, this turns out to be an equivalent formulation of Löb's theorem!## Odds and ends

In addition to the two main themes, there were some odds and ends: First, it turns out that the 5-and-10 problem, which has been discussed on LW quite a bit, is

almosta problem for the formalism in the tiling agents paper; the exact system discussed in the paper narrowly avoids the problem, but a simple and natural variant runs into it. For details, see "The 5-and-10 problem and the tiling agents formalism", written up by me.And both last and least, we considered and disposed of a conjecture that a particular version of parametric polymorphism might be able to avoid the problem of losing mathematical strength when an agent using it rewrites itself. This would have been an interesting result if it had been true, so it seems worth recording, but since it turned out to be false, it's not particularly exciting. See "

Decreasing mathematical strength in one formalization of parametric polymorphism", also written up by me.