Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is independent research. To make it possible for me to continue writing posts like this, please consider supporting me.

## Abstract

I frame probability theory as a lens for understanding the real-world phenomenon of machines that quantify their uncertainty in their beliefs. I argue in favor of using multiple lenses to look at any phenomenon, even when one of those lenses appears to be the most powerful one, in order to keep track of the boundary between the phenomenon we are examining and the lens we are using to examine it. I argue that logical induction is another lens for looking at the real-world phenomenon of machines that quantify their uncertainty in their beliefs, addressing the specific case of machines that are Turing-computable. I argue that tracking uncertainty in purely logical claims is merely a necessary consequence of addressing this general issue. I conclude with remarks on the importance of general theories of intelligent systems in the engineering of safety-critical AI systems.

## Introduction

Last year I began studying the logical induction paper published by MIRI in 2016. I did so partially because I was unsure how to contribute within AI safety. I had been reading articles and writing posts but was at a bit of a loss as to what to do next. I had been curious about logical induction, although intimidated by it, for some time, and so I felt inclined to study it. I felt a little guilty about spending time looking into something that I was merely curious about, at the expense perhaps of other projects that might be more directly helpful.

But my curiosity turned out to be well-founded. As I read through the logical induction paper, and particularly as I came back to it many days in a row, I started to see that it was actually significant in a different way than what I had expected. I had previously understood that logical induction was about having uncertainty about purely logical claims, such as putting 10% probability on the claim that the one billionth digit of pi is a 3. That is certainly in there, but as I read through the paper, I realized that logical induction paints a bigger picture than this, and that in fact the maintaining of uncertainty in logical claims is a mere consequence of this bigger picture.

The bigger picture is this: logical induction provides an account of what it means for machines to quantify their uncertainty in their beliefs, in the specific case when the machine is subject to the laws of physics and can only compute things that are Turing-computable. In general it is not possible for such a machine to immediately evaluate the logical consequences of each new observation, so logical induction is forced to deal with the gradual propagation of logical information through belief networks. There therefore has to be some mechanism for accounting for partially-propagated logical information, and one reasonable way to accomplish that is by maintaining explicit uncertainty in logical claims. But the way I see it, this is merely a consequence of having a formal account of quantified uncertainty when the one doing the quantifying is explicitly subject to the laws of computability.
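
To make the idea of partially-propagated logical information a little more concrete, here is a toy sketch in Python. It is not the logical induction algorithm and carries none of the guarantees proved in the paper. It only illustrates how a compute-limited machine might hold a 10% credence in a claim about a digit of pi until it has done enough computation to settle the claim. The function names, the `budget` parameter, and the uniform 1/10 prior are all assumptions made for illustration; the digit stream uses Gibbons' spigot algorithm.

```python
from fractions import Fraction
from itertools import islice

def pi_digits():
    """Stream the decimal digits of pi (3, 1, 4, ...) via Gibbons' spigot algorithm."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is now certain, so emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough information yet: absorb one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )

def credence_digit_is(index, digit, budget):
    """Toy credence that the digit of pi at `index` (0 is the leading 3) equals `digit`.

    `budget` is the number of digits the machine has had time to compute.
    Before the digit is reached, fall back on an assumed uniform 1/10 prior.
    """
    known = list(islice(pi_digits(), budget))
    if index < len(known):
        return Fraction(int(known[index] == digit))
    return Fraction(1, 10)

# Credence that the digit at index 9 is a 3, as the compute budget grows.
for budget in (0, 5, 10):
    print(budget, credence_digit_is(9, 3, budget))
```

With a budget of 0 or 5 digits the credence stays at 1/10, and with a budget of 10 it moves to 1, since the digit at index 9 of 3.141592653 is a 3. A real logical inductor does something far subtler than this, but the shape is the same: the credence is a function of how much computation has been done so far.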

## Probability theory

Probability theory also provides an account of what it means to quantify one’s uncertainty in one’s beliefs. It is a different account from the one provided by logical induction, and it is a compelling account. We might view each basic derivation of probability theory as an instance of the following message:

If the credences you assign to your beliefs obey the laws of probability theory, then you will get such-and-such benefits.

Under the Dutch book arguments for probability theory, the "benefits you get" are that if you bet on your credences then you can be certain that you will not be Dutch-booked. (Being Dutch-booked is when someone makes a combination of bets with you under which you are guaranteed to lose money no matter what the state of the world turns out to be.)

Conversely, we can view derivations of probability theory as an instance of:

If you want such-and-such benefits, then the credences you assign to your beliefs must obey the laws of probability theory.

Under the Dutch book arguments for probability theory, the message is now that if you don’t want to be Dutch-booked, then your credences must obey the laws of probability theory.
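
As a worked example of the Dutch book argument, the sketch below (Python, with hypothetical names such as `agent_net`) considers an agent whose credences in a claim A and in its negation sum to more than 1, violating the rule P(A) + P(not A) = 1. It assumes the usual betting interpretation, under which the agent is willing to buy a ticket that pays 1 unit if a claim is true at a price equal to its credence in that claim.

```python
from fractions import Fraction

WORLDS = ("A", "not A")  # the two possible states of the world

def agent_net(credences, stake=Fraction(1)):
    """Net gain of the agent in each world after buying one ticket per claim.

    A ticket on a claim costs credence * stake and pays `stake` if the
    claim turns out true, nothing otherwise (assumed betting setup).
    """
    total_cost = sum(credences[claim] * stake for claim in WORLDS)
    # Exactly one of the two tickets pays out, whichever world we are in.
    return {world: stake - total_cost for world in WORLDS}

incoherent = {"A": Fraction(3, 5), "not A": Fraction(3, 5)}  # sums to 6/5
coherent = {"A": Fraction(3, 5), "not A": Fraction(2, 5)}    # sums to 1

print(agent_net(incoherent))  # the agent loses 1/5 in both worlds
print(agent_net(coherent))    # the agent breaks even in both worlds
```

If the credences summed to less than 1, the bookie would buy the tickets from the agent instead of selling them, with the same guaranteed result.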

There are other ways to arrive at the laws of probability theory, too. For example, under Jaynes’ derivation, the "benefits you get" are his three desiderata[1]:

• Degrees of plausibility are represented by real numbers

• Qualitative correspondence with common sense

• If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.

These desiderata seem so reasonable, so humble, so minimal, that it can be difficult to imagine that one would ever not want to obey the laws of probability theory. Do you not want your credences to be consistent? Do you not want your credences to correspond with common sense (which Jaynes operationalizes as a set of very reasonable inequalities)? Do you not want your credences to be represented by numbers? When I originally read Jaynes’ book, it appeared to me that he had demonstrated that probability theory was the right way to quantify uncertainty, and in my thinking there was very little daylight between the abstraction of "probability theory" and the real-world phenomenon of "having beliefs".

And indeed probability theory is excellent. But there is just one hitch: we cannot in general build machines that implement it! This is not a criticism of probability theory itself, but it is a criticism of viewing probability theory as a final answer to the question of how we can quantify our uncertainty in our beliefs. Probability theory is a lens through which we can view the real-world phenomenon of machines with quantified uncertainty in their beliefs.

Probability theory is a way of understanding something that is out there in the world, and that way-of-understanding gives us affordances with which to take actions and engineer powerful machines. But when we repeatedly use one lens to look at some phenomenon, as I did for many years with probability theory, it’s easy to lose track of the boundary between the lens and the thing that is actually out there in the world.

To avoid this collapse of the lens/phenomenon, or map/territory, distinction, it is helpful to view phenomena through multiple lenses.

Of course using multiple lenses may allow us to see aspects of real-world phenomena missed by our primary lenses, but even beyond that, using multiple lenses is helpful simply insofar as it reminds us that we are in fact using lenses to see the world. In this way it helps us to see not just the phenomena we are looking at but also the lens itself.

## Logical induction

So what exactly is the perspective that logical induction gives us on machines with quantified uncertainty in their beliefs? Well, one way to see the framework presented in the logical induction paper is:

If the credences you assign to your beliefs obey the logical induction criterion, then you will get such-and-such benefits.

In the case of logical induction, the benefits are things like coherence, convergence, timeliness, and unbiasedness[2]. But different from probability theory, these concepts are operationalized as properties of the evolution of your credences over time, rather than as properties of your credences at any particular point in time.

The benefits promised to one whose credences obey the laws of logical induction are weaker than those promised to one whose credences obey the laws of probability theory. A logical inductor can generally be Dutch-booked at any finite point in time, unless by some fluke it happens to have fallen perfectly into alignment with the laws of probability theory at that point in time, in which case there is no guarantee at all that it will remain there. So why would one choose to pick credences that obey the logical induction criterion rather than the laws of probability theory?

The answer is that credences that obey the logical induction criterion can be computed, no matter how many claims about the world you are maintaining uncertainty with respect to, and no matter how complex the relationships between those claims. This is a very significant property. The logical induction criterion threads a needle that sits at the intersection of those formulations that give us the properties we care about (coherence, convergence, timeliness, unbiasedness, and so on), and those formulations that permit a computable implementation.

## Conclusion: theory-based engineering

As we build safety-critical systems, and safety-critical AI systems in particular, it is good to have theories that can guide our engineering efforts. In particular it is good to have theories that make statements of the form "if you construct your system in such-and-such a way then it will have such-and-such properties". We can then decide whether we want those properties, and whether it is feasible for us to work within the confines of the design space afforded by the theory.

For the purpose of engineering AI systems, our theories must ultimately be applicable to machines that can be constructed in the real world. Probability theory is not natively applicable to machines that can be constructed in the real world, but it can be made applicable by finding appropriate approximation schemes and optimization algorithms through which we can reason not just about the properties we want our credences to have but also the mechanism by which we will get them there. Logical induction, also, is not "natively applicable" to machines that can be constructed in the real world. The logical induction algorithm is computable but not efficient. It does not guarantee anything about our credences at any finite point in time but only about the evolution of our credences over all time, so it does not provide a complete account of how to build general intelligent systems that do what we want. But it is a step in that direction.

Most importantly, to me, logical induction provides an example of what theories of intelligent systems can look like. We have few such theories, and the ones we do have, like probability theory, are so familiar that it can be difficult to see the daylight between the real-world phenomenon we are trying to understand and the theory we are using as a lens to look at it. By looking through multiple lenses we are more likely to be able to see where to construct new theories that revise the answers given by our existing theories about what shape intelligent systems ought to have.

1. Of course these imply non-Dutch-bookability, and non-Dutch-bookability implies these desiderata. It is merely a different formulation and emphasis of the same message. ↩︎

2. In chapter 4 of the paper each of these is defined formally as a property of a sequence of belief states, and then it is proved that credences that obey the logical induction criterion will necessarily have this property. ↩︎

Great post, I’m glad this is written up nicely.

One section was especially interesting to me:

If the credences you assign to your beliefs obey the logical induction criterion, then you will get such-and-such benefits.

In the case of logical induction, the benefits are things like coherence, convergence, timeliness, and unbiasedness[2]. But different from probability theory, these concepts are operationalized as properties of the evolution of your credences over time, rather than as properties of your credences at any particular point in time.

I had not realized this is how it is defined, but now it seems obvious, or necessary. When dealing with real world systems where cognition happens in time, the logical induction criterion talks about beliefs over time rather than beliefs at a single point in time. Very interesting.

Yeah, I agree, logical induction bakes in the concept of time in a way that probability theory does not. And yeah, it does seem necessary, and I find it very interesting when I squint at it.

Well stated. For what it's worth I think this is a great explanation of why I'm always going on about the problem of the criterion: as embedded, finite agents without access to hypercomputation or perfect, a priori knowledge we're stuck in this mess of trying to figure things out from the inside and always getting it a little bit wrong, no matter how hard we try, so it's worth paying attention to that because solving, for example, alignment for idealized mathematical systems that don't exist is maybe interesting but also not an actual solution to the alignment problem.

That post was a delightful read! Thanks for the pointer.

It seems that we cannot ever find, among concepts, a firm foundation on which we can be absolutely sure of our footing. For the same reason, our basic systems of logic, ethics, and empiricism can never be put on absolutely sure footing (Gödel, Humean is/ought gap, radical skepticism).

Right. For example, I think Stuart Armstrong is hitting something very important about AI alignment with his pursuit of the idea that there's no free lunch in value learning. We only close the gap by making an "arbitrary" assumption, but it's only arbitrary if you assume there's some kind of context-free version of the truth. Instead we can choose in a non-arbitrary way based on what we care about and is useful to us.

I realize lots of people are bored by this point because their non-arbitrary solution that is useful is some version of rationality criteria, since those are very useful for not getting Dutch booked, for example. But we could just as well choose something else, and humans, for example, seem to do just that, even though so far we'd be hard pressed to say very precisely just what it is that humans do assume to ground things in, although we have some clues of things that seem important, like staying alive.

You're talking about how we ground out our thinking in something that is true but is not just further conceptualization?

Look, if we just make a choice about the truth by making an assumption then eventually the world really does "bite back". It's possible to try this out by just picking a certain fundamental orientation towards the world and sticking to it no matter what throughout your life for a little while. The more rigidly you adhere to it the more quickly the world will bite back. So I don't think we can just pick a grounding.

But at the same time I very much agree that there is no concept that corresponds to the truth in a context-free or absolute way. The analogy I like the most is dance: imagine if I danced a dance that beautifully expressed what it's like to walk in the forest at night. It might be an incredibly evocative dance and it might point towards a deep truth about the forest at night, but it would be strange to claim that a particular dance is the final, absolute, context-free truth. It would be strange to seek after a final, absolute, context-free dance that expresses what it's like to walk in the forest at night in a way that finally captures the actual truth about the forest at night.

When we engage in conceptualization, we are engaging in something like a dance. It's a dance with real consequence, real power, real impacts on the world, and real importance. It matters that we dance it and that we get it right. It's hard to think of anything at this point that matters more. But its significance is not a function of its capturing the truth in a final or context-free way.

So when I consider "grounding out" my thinking in reality, I think of it in the same way that a dance should "ground out" in reality. That is: it should be about something real. It's also possible to pick some idea about what it's really like to walk in the forest at night and dance in a way that adheres to that idea but not to the reality of what it's actually like to walk in the forest at night. And it's possible to think in a way that is similarly not in accord with reality itself. But just as with dance, thinking in accord with reality is not at all about capturing reality in a final, absolute, or context-free way.

Is this how you see things too?

Yep, that accords well with my own current view.