Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The content of this post would not exist if not for conversations with Zack Davis, and owes something to conversations with Sam Eisenstat.

There's been some talk about filtered evidence recently. I want to make a mathematical observation which causes some trouble for the Bayesian treatment of filtered evidence. [OK, when I started writing this post, it was "recently". It's been on the back burner for a while.]

This is also a continuation of the line of research about trolling mathematicians, and hence, relevant to logical uncertainty.

I'm going to be making a mathematical argument, but, I'm going to keep things rather informal. I think this increases the clarity of the argument for most readers. I'll make some comments on proper formalization at the end.


Alright, here's my argument.

According to the Bayesian treatment of filtered evidence, you need to update on the fact that the fact was presented to you, rather than the raw fact. This involves reasoning about the algorithm which decided which facts to show you. The point I want to make is that this can be incredibly computationally difficult, even if the algorithm is so simple that you can predict what it will say next. IE, I don't need to rely on anything like "humans are too complex for humans to really treat as well-specified evidence-filtering algorithms".

For my result, we imagine that a Bayesian reasoner (the "listener") is listening to a series of statements made by another agent (the "speaker").

First, I need to establish some terminology:

Assumption 1. A listener will be said to have a rich hypothesis space if the listener assigns some probability to the speaker enumerating any computably enumerable set of statements.

The intuition behind this assumption is supposed to be: due to computational limitations, the listener may need to restrict to some set of easily computed hypotheses; for example, the hypotheses might be poly-time or even log-poly. This prevents hypotheses such as "the speaker is giving us the bits of a halting oracle in order", as well as "the speaker has a little more processing power than the listener". However, the hypothesis space is not so restricted as to limit the world to being a finite-state machine. The listener can imagine the speaker proving complicated theorems, so long as it is done sufficiently slowly for the listener to keep up. In such a model, the listener might imagine the speaker staying quiet for quite a long time (observing the null string over and over, or some simple sentence such as 1=1) while a long computation completes; and only then making a complicated claim.

This is also not to say that I assume my listener considers only hypotheses in which it can 100% keep up with the speaker's reasoning. The listener can also have probabilistic hypotheses which recognize its inability to perfectly anticipate the speaker. I'm only pointing out that my result does not rely on a speaker which the listener can't keep up with.

What it does rely on is that there are not too many restrictions on what the speaker eventually says.

Assumption 2. A listener believes a speaker to be honest if the listener distinguishes between "X" and "the speaker claims X at time t" (aka "claims-X"), and also has beliefs such that P(X| claims-X)=1 when P(claims-X) > 0.

This assumption is, basically, saying that the agent trusts its observations; the speaker can filter evidence, but the speaker cannot falsify evidence.

Maybe this assumption seems quite strong. I'll talk about relaxing it after I sketch the central result.

Assumption 3. A listener is said to have minimally consistent beliefs if each proposition X has a negation X*, and P(X)+P(X*)1.

The idea behind minimally consistent beliefs is that the listener need not be logically omniscient, but does avoid outright contradictions. This is important, since assuming logical omniscience would throw out computability from the start, making any computational-difficulty result rather boring; but totally throwing out logic would make my result impossible. Minimal consistency keeps an extremely small amount of logic, but, it is enough to prove my result.

Theorem(/Conjecture). It is not possible for a Bayesian reasoner, observing a sequence of remarks made by a speaker, to simultaneously:

  • Have a rich hypothesis space.
  • Believe the speaker to be honest.
  • Have minimally consistent beliefs.
  • Have computable beliefs.

Proof sketch. Suppose assumptions 1-3. Thanks to the rich hypothesis space assumption, the listener will assign some probability to the speaker enumerating theorems of PA (Peano Arithmetic). Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations. At that point, since the listener expects each theorem of PA to eventually be listed, with probability > 50%, and the listener believes the speaker, the listener must assign > 50% probability to each theorem of PA! But this implies that the listener's beliefs are not computable, since if we had access to them we could separate theorems of PA from contradictions by checking whether a sentence's probability is > 50%.

So goes my argument.


What does the argument basically establish?

The argument is supposed to be surprising, because minimally consistent beliefs are compatible with computable beliefs; and rich hypothesis space is compatible with beliefs which are computable on observations alone; yet, when combined with a belief that the speaker is honest, we get an incomputability result.

My take-away from this result is that we cannot simultaneously use our unrestricted ability to predict sensory observations accurately and have completely coherent beliefs about the world which produces those sensory observations, at least if our "bridge" between the sensory observations and the world includes something like language (whereby sensory observations contain complex "claims" about the world).

This is because using the full force of our ability to predict sensory experiences includes some hypotheses which eventually make surprising claims about the world, by incrementally computing increasingly complicated information (like a theorem prover which slowly but inevitably produces all theorems of PA). In other words, a rich sensory model contains implicit information about the world which we cannot immediately compute the consequences of (in terms of probabilities about the hidden variables out there in the world). This "implicit" information can be necessarily implicit, in the same way that PA is necessarily incomplete.

To give a non-logical example: suppose that your moment-to-moment anticipations of your relationship with a friend are pretty accurate. It might be that if you roll those anticipations forward, you inevitably become closer and closer until the friendship becomes a romance. However, you can't necessarily predict that right now; even though the anticipation of each next moment is relatively easy, you face a halting-problem-like difficulty if you try to anticipate what the eventual behavior of your relationship is. Because our ability to look ahead is bounded, each new consequence can be predictable without the overall outcome being predictable.

Thus, in order for an agent to use the full force of its computational power on predicting sensory observations, it must have partial hypotheses -- similar to the way logical induction contains traders which focus only on special classes of sentences, or Vanessa's incomplete Bayesianism contains incomplete hypotheses which do not try to predict everything.

So, this is an argument against strict Bayesianism. In particular, it is an argument against strict Bayesianism as a model of updating on filtered evidence! I'll say more about this, but first, let's talk about possible holes in my argument.


Here are some concerns you might have with the argument.

One might possibly object that the perfect honesty requirement is unrealistic, and therefore conclude that the result does not apply to realistic agents.

  • I would point out that the assumption is not so important, so long as the listener can conceive of the possibility of perfect honesty, and assigns it nonzero probability. In that case, we can consider P(X|honesty) rather than P(X). Establishing that some conditional beliefs are not computable seems similarly damning.
  • Furthermore, because the "speaker" is serving the role of our observations, the perfect honesty assumption is just a version of P(X|observe-X)=1. IE, observing X gives us X. This is true in typical filtered-evidence setups; IE, filtered evidence can be misleading, but it can't be false.
  • However, one might further object that agents need not be able to conceive of "perfect honesty", because this assumption has an unrealistically aphysical, "perfectly logical" character. One might say that all observations are imperfect; none are perfect evidence of what is observed. In doing so, we can get around my result. This has some similarity to the assertion that zero is not a valid probability. I don't find this response particularly appealing, but I also don't have a strong argument against it.

Along similar lines, one might object that the result depends on an example ("the speaker is enumerating theorems") which comes from logic, as opposed to any realistic physical world-model. The example does have a "logical" character -- we're not explicitly reasoning about evidence-filtering algorithms interfacing with an empirical world and selectively telling us some things about it. However, I want to point out that I've assumed extremely little "logic" -- the only thing I use is that you don't expect a sentence and its negation to both be true. Observations corresponding to theorems of PA are just an example used to prove the result. The fact that P(X) can be very hard to compute even when we restrict to easily computed P(claims-X) is very general; even if we do restrict attention to finite-state-machine hypotheses, we are in P-vs-NP territory.


What does this result say about logical uncertainty?

Sam's untrollable prior beat the trollable-mathematician problem by the usual Bayesian trick of explicitly modeling the sequence of observations -- updating on I-observed-X-at-this-time rather than only X. (See also the illustrated explanation.)

However, it did so at a high cost: Sam's prior is dumb. It isn't able to perform rich Occam-style induction to divine the hidden rules of the universe. It doesn't believe in hidden rules; it believes "if there's a law of nature constraining everything to fit into a pattern, I will eventually observe that law directly." It shifts its probabilities when it makes observations, but, in some sense, it doesn't shift them very much; and indeed, that property seems key to the computability of that prior.

So, a natural question arises: is this an essential property of an untrollable prior? Or can we construct a "rich" prior which entertains hypotheses about the deep structure of the universe, learning about them in an Occam-like way, which is nonetheless still untrollable?

The present result is a first attempt at an answer: given my (admittedly a bit odd) notion of rich hypothesis space, it is indeed impossible to craft a computable prior over logic with some minimal good properties (like believing what's proved to it). I don't directly address a trollability-type property, unfortunately; but I do think I get close to the heart of the difficulty: a "deep" ability to adapt in order to predict data better stands in contradiction with computability of the latent probability-of-a-sentence.


So, how should we think about filtered evidence?

Orthodox Bayesian (OB): We can always resolve the problem by distinguishing between X and "I observe X", and conditioning on all the evidence available. Look how nicely it works out in the Monty Hall problem and other simple examples we can write down.

Skeptical Critique (SC): You're ignoring the argument. You can't handle cases where running your model forward is easier than answering questions about what happens eventually; in those cases, many of your beliefs will either be uncomputable or incoherent.

OB: That's not a problem for me. Bayesian ideals of rationality apply to the logically omniscient case. What they give you is an idealized notion of rationality, which defines the best an agent could do.

SC: Really? Surely your Bayesian perspective is supposed to have some solid implications for finite beings who are not logically omniscient. I see you giving out all this advice to machine learning programmers, statisticians, doctors, and so on.

OB: Sure. We might not be able to achieve perfect Bayesian rationality, but whenever we see something less Bayesian than it could be, we can correct it. That's how we get closer to the Bayesian ideal!

SC: That sounds like cargo-cult Bayesianism to me. If you spot an inconsistency, it matters how you correct it; you don't want to go around correcting for the planning fallacy by trying to do everything faster, right? Similarly, if your rule-of-thumb for the frequency of primes is a little off, you don't want to add composite numbers to your list of primes to fudge the numbers.

OB: No one would make those mistakes.

SC: That's because there are, in fact, rationality principles which apply. You don't just cargo-cult Bayesianism by correcting inconsistencies any old way. A boundedly rational agent has rationality constraints which apply, guiding it to better approximate "ideal" rationality. And those rationality constraints don't actually need to refer to the "ideal" rationality. The rationality constraints are about the update, not in the ideal which the update limits to.

OB: Maybe we can imagine some sort of finite Bayesian reasoner, who treats logical uncertainty as a black box, and follows the evidence toward unbounded-Bayes-optimality in a bounded-Bayes-optimal way...

SC: Maybe, but I don't know of a good picture which looks like that. The picture we do have is given by logical induction: we learn to avoid Dutch books by noticing lots of Dutch books against ourselves, and gradually becoming less exploitable.

OB: That sounds a lot like the picture I gave.

SC: Sure, but it's more precise. And more importantly, it's not a Bayesian update -- there is a kind of family resemblance in the math, but it isn't learning through a Bayesian update in a strict sense.

OB: Ok, so what does all this have to do with filtered evidence? I still don't see why the way I handle that is wrong.

SC: Well, isn't the standard Bayesian answer a little suspicious? The numbers conditioning on X don't come out to what you want, so you introduce something new to condition on, observe-X, which can have different conditional probabilities. Can't you get whatever answer you want, that way?

OB: I don't think so? The numbers are dictated by the scenario. The Monty Hall problem has a right answer, which determines how you should play the game if you want to win. You can't fudge it without changing the game.

SC: Fair enough. But I still feel funny about something. Isn't there an infinite regress? We jump to updating on observe-X when X is filtered. What if observe-X is filtered? Do we jump to observe-observe-X? What if we can construct a "meta Monty-Hall problem" where it isn't sufficient to condition on observe-X?

OB: If you observe, you observe that you observe. And if you observe that you observe, then you must observe. So there's no difference.

SC: If you're logically perfect, sure. But a boundedly rational agent need not realize immediately that it observed X. And certainly it need not realize and update on the entire sequence "X", "I observed X", "I observed that I observed X", and so on.

OB: Ok...

SC: To give a simple example: call a sensory impression "subliminal" when it is weak enough that only X is registered. A stronger impression also registers "observe-X", making the sensory impression more "consciously available". Then, we cannot properly track the effects of filtered evidence for subliminal impressions. Subliminal impressions would always register as if they were unfiltered evidence.

OB: ...no.

SC: What's wrong?

OB: An agent should come with a basic notion of sensory observation. If you're a human, that could be activation in the nerves running to sensory cortex. If you're a machine, it might be RBG pixel values coming from a camera. That's the only thing you ever have to condition on; all your evidence has that form. Observing a rabbit means getting pixel values corresponding to a rabbit. We don't start by conditioning on "rabbit" and then patch things by adding "observe-rabbit" as an additional fact. We condition on the complicated observation corresponding to the rabbit, which happens to, by inference, tell us that there is a rabbit.

SC: That's... a bit frustrating.

OB: How so?

SC: The core Bayesian doctrine is the Kolmogorov axioms, together with the rule that we update beliefs via Bayesian conditioning. A common extension of Bayesian doctrine grafts on a distinction between observations and hypotheses, naming some special events as observable, and others as non-observable hypotheses. I want you to notice when you're using the extension rather than the core.

OB: How is that even an extension? It just sounds like a special case, which happens to apply to just about any organism.

SC: But you're restricting the rule "update beliefs by Bayesian conditioning" -- you're saying that it only works for observations, not for other kinds of events.

OB: Sure, but you could never update on those other kinds of events anyway.

SC: Really, though? Can't you? Some information you update on comes from sensory observations, but other information comes from reasoning. Something like a feedforward neural network just computes one big function on sense-data, and can probably be modeled in the way you're suggesting. But something like a memory network has a nontrivial reasoning component. A Bayesian can't handle "updating" on internal calculations it's completed; at best they're treated as if they're black boxes whose outputs are "observations" again.

OB: Ok, I see you're backing me into a corner with logical uncertainty stuff again. I still feel like there should be a Bayesian way to handle it. But what does this have to do with filtered evidence?

SC: The whole point of the argument we started out discussing is that if you have this kind of observation/hypothesis divide, and have sufficiently rich ways of predicting sensory experiences, and remain a classical Bayesian, then your beliefs about the hidden information are not going to be computable, even if your hypotheses themselves are easy to compute. So we can't realistically reason about the hidden information just by Bayes-conditioning on the observables. The only way to maintain both computability and a rich hypothesis space under these conditions is to be less Bayesian, allowing for more inconsistencies in your beliefs. Which means, reasoning about filtered evidence doesn't reduce to applying Bayes' Law.

OB: That... seems wrong.

SC: Now we're getting somewhere!


All that being said, reasoning about filtered evidence via Bayes' Law in the orthodox way still seems quite practically compelling. The perspective SC puts forward in the above dialogue would be much more compelling if I had more practical/interesting "failure-cases" for Bayes' Law, and more to say about alternative ways of reasoning which work better for those cases. A real "meta Monty-Hall problem".

Arguably, logical induction doesn't use the "condition on the fact that X was observed" solution:

  • Rather than the usual sequential prediction model, logical induction accommodates information coming in for any sentence, in any order. So, like the "core of Bayesianism" mentioned by SC, it maintains its good properties without special assumptions about what is being conditioned on. This is in contrast to, e.g., Solomonoff induction, which uses the sequential prediction model.
  • In particular, in Monty Hall, although there is a distinction between the sentence "there is a goat behind door 3" and "the LI discovers, at time t, that there is a goat behind door 3" (or suitable arithmetizations of these sentences), we can condition on the first rather than the second. A logical inductor would learn to react to this in the appropriate way, since doing otherwise would leave it Dutch-bookable.

One might argue that the traders are implicitly using the standard Bayesian "condition on the fact that X was observed" solution in order to accomplish this. Or that the update an LI performs upon seeing X is always that it saw X. But to me, this feels like stretching things. The core of the Bayesian method for handling filtered evidence is to distinguish between X and the observation of X, and update on the latter. A logical inductor doesn't explicitly follow this, and indeed appears to violate it. Part of the usual idea seems to be that a Bayesian needs to "update on all the evidence" -- but a logical inductor just gets a black-box report of X, without any information on how X was concluded or where it came from. So information can be arbitrarily excluded, and the logical inductor will still do its best (which, in the case of Monty Hall, appears to be sufficient to learn the correct result).


A notable thing about the standard sort of cases, where the Bayesian way of reasoning about filtered evidence is entirely adequate, is that you have a gears-level model of what is going on -- a causal model, which you can turn the crank on. If you run such a model "forward" -- in causal order -- you compute the hidden causes before you compute the filtered evidence about them. This makes it sound as if predicting the hidden variables should be easier than predicting the sensory observations; and, certainly makes it hard to visualize the situation where it is much much harder.

However, even in cases where we have a nice causal model like that, inferring the hidden variables from what is observed can be intractably computationally difficult, since it requires reverse-engineering the computation from its outputs. Forward-sampling causal models is always efficient; running them backwards, not so.

So even with causal models, there can be good reason to engage more directly with logical uncertainty rather than use pure Bayesian methods.

However, I suspect that one could construct a much more convincing example if one were to use partial models explicitly in the construction of the example. Perhaps something involving an "outside view" with strong empirical support, but lacking a known "inside view" (lacking a single consistent causal story).

Unfortunately, such an example escapes me at the moment.


Finally, some notes on further formalisation of my main argument.

The listener is supposed to have probabilistic beliefs of the standard variety -- an event space which is a sigma-algebra, and which has a P(event) obeying the Kolmogorov axioms. In particular, the beliefs are supposed to be perfectly logically consistent in the usual way.

However, in order to allow logical uncertainty, I'm assuming that there is some embedding of arithmetic; call it E[]. So, for each arithmetic sentence S, there is an event E[S]. Negation gets mapped to the "star" of an event: E[¬S] = (E[S])*. This need not be the compliment of the event E[S]. Similarly, the embedding E[AB] need not be E[A]E[B]; E[AB] need not be E[A]E[B]; and so on. That's what allows for logical non-omniscience -- the probability distribution doesn't necessarily know that E[AB] should act like E[A]E[B], and so on.

The more we impose requirements which force the embedding to act like it should, the more logical structure we are forcing onto the beliefs. If we impose very much consistency, however, then that would already imply uncomputability and the central result would not be interesting. So, the "minimal consistency" assumption requires very little of our embedding. Still, it is enough for the embedding of PA to cause trouble in connection with the other assumptions.

In addition to all this, we have a distinguished set of events which count as observations. A first pass on this is that for any event A, there is an associated event obs(A) which is the observation of A. But I do worry that this includes more observation events than we want to require. Some events A do not correspond to sentences; sigma-algebras are closed under countable unions. If we think of the observation events as claims made by the speaker, it doesn't make sense to imagine the speaker claiming a countable union of sentences (particularly not the union of an uncomputable collection).

So, more conservatively, we might say that for events E[S], that is events in the image of the embedding, we also have an event obs(E[S]). In any case, this is closer to the minimal thing we need to establish the result.

I don't know if the argument works out exactly as I sketched; it's possible that the rich hypothesis assumption needs to be "and also positive weight on a particular enumeration". Given that, we can argue: take one such enumeration; as we continue getting observations consistent with that observation, the hypothesis which predicts it loses no weight, and hypotheses which (eventually) predict other things must (eventually) lose weight; so, the updated probability eventually believes that particular enumeration will continue with probability > 1/2.

On the other hand, that patched definition is certainly less nice. Perhaps there is a better route.

New Comment
33 comments, sorted by Click to highlight new comments since: Today at 10:56 AM

From my perspective, the trouble here comes from the honesty condition. This condition hides an unbounded quantifier: "if the speaker will ever say something, then it is true". So it's no surprise we run into computational complexity and even computability issues.

Consider the following setting. The agent Alice repeatedly interacts with two other entities: Bob and Carol. When Alice interacts with Bob, Bob asks Alice a yes/no question, Alice answers it and receives either +1 or -1 reward depending on whether the answer is correct. When Alice interacts with Carol, Carol tells Alice some question and the answer to that question.

Suppose that Alice starts with some low-information prior and learns over time about Bob and Carol both. The honesty condition becomes "if Carol will ever say and Bob asks the question , then the correct answer is ". But, this condition might be computationally intractable so it is not in the prior and cannot be learned. However, weaker versions of this condition might be tractable, for example "if Carol says at time step between and , and Bob asks at time , then the correct answer is ". Since simulating Bob is still intractable, this condition cannot be expressed as a vanilla Bayesian hypothesis. However, it can be expressed as an incomplete hypothesis. We can also have an incomplete hypothesis that is the conjunction of this weak honesty condition with a full simulation of Carol. Once Alice learned this incomplete hypothesis, ey answer correctly at least those questions which Carol have already taught em or will teach em within 1000 time steps.

I like your example, because "Carol's answers are correct" seems like something very simple, and also impossible for a (bounded) Bayesian to represent. It's a variation of calculator or notepad problems -- that is, the problem of trying to represent a reasoner who has (and needs) computational/informational resources which are outside of their mind. (Calculator/notepad problems aren't something I've written about anywhere iirc, just something that's sometimes on my mind when thinking about logical uncertainty.)

I do want to note that weakening honesty seems like a pretty radical departure from the standard Bayesian treatment of filtered evidence, in any case (for better or worse!). Distinguishing between observing X and X itself, it is normally assumed that observing X implies X. So while our thinking on this does seem to differ, we are agreeing that there are significant points against the standard view.

From outside, the solution you propose looks like "doing the best you can to represent the honesty hypothesis in a computationally tractable way" -- but from inside, the agent doesn't think of it that way. It simply can't conceive of perfect honesty. This kind of thing feels both philosophically unsatisfying and potentially concerning for alignment. It would be more satisfying if the agent could explicitly suspect perfect honesty, but also use tractable approximations to reason about it. (Of course, one cannot always get everything one wants.)

We could modify the scenario to also include questions about Carol's honesty -- perhaps when the pseudo-Bayesian gets a question wrong, it is asked to place a conditional bet about what Carol would say if Carol eventually gets around to speaking on that question. Or other variations along similar lines.

Here's another perspective. Suppose that now Bob and Carol have symmetrical roles: each one asks a question, allows Alice to answer, and then reveals the right answer. Alice gets a reward when ey answer correctly. We can now see that perfect honesty actually is tractable. It corresponds to an incomplete hypothesis. If Alice learns this hypothesis, ey answer correctly any question ey already heard before (no matter who asks now and who asked before). We can also consider a different incomplete hypothesis that allows real-time simulation of Carol. If Alice learns this hypothesis, ey answer correctly any question asked by Carol. However, the conjunction of both hypotheses is already intractable. There's no impediment for Alice to learn both hypotheses: ey can both memorize previous answers and answer all questions by Carol. But, this doesn't automatically imply learning the conjunction.

It's absurd (in a good way) how much you are getting out of incomplete hypotheses. :)

Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations. At that point, since the listener expects each theorem of PA to eventually be listed, with probability > 50%, and the listener believes the speaker, the listener must assign > 50% probability to each theorem of PA!

I don't see how this follows. At the point where the confidence in PA rises above 50%, why can't the agent be mistaken about what the theorems of PA are? For example, let T be a theorem of PA that hasn't been claimed yet. Why can't the agent believe P(claims-T) = 0.01 and P(claims-not-T) = 0.99? It doesn't seem like this violates any of your assumptions. I suspect you wanted to use Assumption 2 here:

A listener believes a speaker to be honest if the listener distinguishes between "X" and "the speaker claims X at time t" (aka "claimst-X"), and also has beliefs such that P(X| claimst-X)=1 when P(claims-X) > 0.

But as far as I can tell the scenario I gave is compatible with that assumption.

I think there is some confusion here coming from the unclear notion of a Bayesian agent with beliefs about theorems of PA. The reformulation I gave with Alice, Bob and Carol makes the problem clearer, I think.

Yeah, I did find that reformulation clearer, but it also then seems to not be about filtered evidence?

Like, it seems like you need two conditions to get the impossibility result, now using English instead of math:

1. Alice believes Carol is always honest (at least with probability > 50%)

2. For any statement s: [if Carol will ever say s, Alice currently believes that Carol will eventually say s (at least with probability > 50%)]

It really seems like the difficulty here is with condition 2, not with condition 1, so I don't see how this theorem has anything to do with filtered evidence.

Maybe the point is just "you can't perfectly update on X and Carol-said-X , because you can't have a perfect model of them, because you aren't bigger than they are"?

(Probably you agree with this, given your comment.)

The problem is not in one of the conditions separately but in their conjunction: see my follow-up comment. You could argue that learning an exact model of Carol doesn't really imply condition 2 since, although the model does imply everything Carol is ever going to say, Alice is not capable of extracting this information from the model. But then it becomes a philosophical question of what does it mean to "believe" something. I think there is value in the "behaviorist" interpretation that "believing X" means "behaving optimally given X". In this sense, Alice can separately believe the two facts described by conditions 1 and 2, but cannot believe their conjunction.

I still don't get it but probably not worth digging further. My current confusion is that even under the behaviorist interpretation, it seems like just believing condition 2 implies knowing all the things Carol would ever say (or Alice has a mistaken belief). Probably this is a confusion that would go away with enough formalization / math, but it doesn't seem worth doing that.

I'm not sure exactly what the source of your confusion is, but:

I don't see how this follows. At the point where the confidence in PA rises above 50%, why can't the agent be mistaken about what the theorems of PA are?

The confidence in PA as a hypothesis about what the speaker is saying is what rises above 50%. Specifically, an efficiently computable hypothesis eventually enumerating all and only the theorems of PA rises above 50%.

For example, let T be a theorem of PA that hasn't been claimed yet. Why can't the agent believe P(claims-T) = 0.01 and P(claims-not-T) = 0.99? It doesn't seem like this violates any of your assumptions.

This violates the assumption of honesty that you quote, because the agent simultaneously has P(H) > 0.5 for a hypothesis H such that P(obs_n-T | H) = 1, for some (possibly very large) n, and yet also believes P(T) < 0.5. This is impossible since it must be that P(obs_n-T) > 0.5, due to P(H) > 0.5, and therefore must be that P(T) > 0.5, by honesty.

Yeah, I feel like while honesty is needed to prove the impossibility result, the problem arose with the assumption that the agent could effectively reason now about all the outputs of a recursively enumerable process (regardless of honesty). Like, the way I would phrase this point is "you can't perfectly update on X and Carol-said-X , because you can't have a perfect model of Carol"; this applies whether or not Carol is honest. (See also this comment.)

I agree with your first sentence, but I worry you may still be missing my point here, namely that the Bayesian notion of belief doesn't allow us to make the distinction you are pointing to. If a hypothesis implies something, it implies it "now"; there is no "the conditional probability is 1 but that isn't accessible to me yet".

I also think this result has nothing to do with "you can't have a perfect model of Carol". Part of the point of my assumptions is that they are, individually, quite compatible with having a perfect model of Carol amongst the hypotheses.

the Bayesian notion of belief doesn't allow us to make the distinction you are pointing to

Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI's past work, and was expecting this to be about honesty / filtered evidence somehow.

I also think this result has nothing to do with "you can't have a perfect model of Carol". Part of the point of my assumptions is that they are, individually, quite compatible with having a perfect model of Carol amongst the hypotheses.

I think we mean different things by "perfect model". What if I instead say "you can't perfectly update on X and Carol-said-X , because you can't know why Carol said X, because that could in the worst case require you to know everything that Carol will say in the future"?

Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI’s past work, and was expecting this to be about honesty / filtered evidence somehow.

Yeah, ok. This post as written is really less the kind of thing somebody who has followed all the MIRI thinking needs to hear and more the kind of thing one might bug an orthodox Bayesian with. I framed it in terms of filtered evidence because I came up with it by thinking about some confusion I was having about filtered evidence. And it does problematize the Bayesian treatment. But in terms of actual research progress it would be better framed as a negative result about whether Sam's untrollable prior can be modified to have richer learning.

I think we mean different things by “perfect model”. What if [...]

Yep, I agree with everything you say here.

This seems related to my speculations about multi-agent alignment. In short, for embedded agents, having a tractable complexity of building models of other decision processes either requires a reflexively consistent view of their reactions to modeling my reactions to their reactions, etc. - or it requires simplification that clearly precludes ideal Bayesian agents. I made the argument much less formally, and haven't followed the math in the post above (I hope to have time to go through more slowly at some point.)

To lay it out here, the basic argument in the paper is that even assuming complete algorithmic transparency, in any reasonably rich action space, even games as simple as poker become completely intractable to solve. Each agent needs to simulate a huge space of possibilities for the decision of all other agents in order to make a decision about what the probability is that the agent is in each potential position. For instance, what is the probability that they are holding a hand much better than mine and betting this way, versus that they are bluffing, versus that they have a roughly comparable strength hand and are attempting to find my reaction, etc. But evaluating this requires evaluating the probability that they assign to me reacting in a given way in each condition, etc. The regress may not be infinite, because the space of states is finite, as is the computation time, but even in such a simple world it grows too quickly to allow fully Bayesian agents within the computational capacity of, say, the physical universe.

Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations.

I was confused about why this is the case. I now think I've got an answer (please anyone confirm):
The description length of the Turing Machine enumerating theorems of PA is constant. The description length of any Turing Machine that enumerates theorems of PA up until time-step n and the does something else grows with n (for big enough n). Since any probability prior over Turing Machines has an implicit simplicity bias, no matter what prior we have, for big enough n the latter Turing Machines will (jointly) get arbitrarily low probability relative to the first one. Thus, after enough time-steps, given all observations are PA theorems, our listener will assign arbitrarily higher probability to the first one than all the rest, and thus the first one will be over 50%.

Edit: Okay, I now saw you mention the "getting over 50%" problem further down:

I don't know if the argument works out exactly as I sketched; it's possible that the rich hypothesis assumption needs to be "and also positive weight on a particular enumeration". Given that, we can argue: take one such enumeration; as we continue getting observations consistent with that observation, the hypothesis which predicts it loses no weight, and hypotheses which (eventually) predict other things must (eventually) lose weight; so, the updated probability eventually believes that particular enumeration will continue with probability > 1/2.

But I think the argument goes through already with the rich hypothesis assumption as initially stated. If the listener has non-zero prior probability on the speaker enumerating theorems of PA, it must have non-zero probability on it doing so in a particular enumeration. (unless our specification of the listener structure doesn't even consider different enumerations? but I was just thinking of their hypothesis space as different Turing Machines the whole time) And then my argument above goes through, which I think is just your argument + explicitly mentioning the additional required detail about the simplicity prior.

Sounds right to me.

I follow most of this, but... uh, what's PA?

Upon reflection, it seems you're talking about Peano arithmetic. Would be good if the post spelled that out.

Yep. I'll insert some clarification.

The set of all (valid) Peano Arithmetic theorems is not computably enumerable- wouldn't this mean that a listener with a rich hypothesis space doesn't need to assign a probability to the speaker enumerating them (thereby undermining the argument)?

I guess an important distinction is between computable and computably enumerable- I'm not perfect with CS, but it seems that this proof rests on the PA theorems being computably enumerable, but not computable themselves- is my read on this correct? But it seems to me that one can only enumerate the (valid) theorems by actually computing them

The set of theorems of PA is computably enumerable, but the set of theorems and anti-theorems (things provably false) is not computably separable, IE there is no computation which returns "true" for theorems, "false" for anti-theorems, and always returns some answer (we don't care specifically what that answer is for anything that's not a theorem or anti-theorem).

Assumption 3. A listener is said to have minimally consistent beliefs if each proposition X has a negation X*, and P(X)+P(X*)≤1.

One thing that's interesting to me is that this is assumption is frequently not satisfied in real life due to underspecification, e.g. P(I'm happy) + P(I'm not happy) ≥ 1 because "happy" may be underspecified. I can't think of a really strong minimal example, but I feel like this pops up a lot of discussions on complex issues where a dialectic develops because neither thesis nor antithesis capture everything and so both are underspecified in ways that make their naive union exceed the available probability mass.

  • Believe the speaker to be honest.
  • Have minimally consistent beliefs.

That's sufficient to be a contradiction, since it's possible for the speaker to say "X" and then say "X*". You have to constrain the speaker to say true things, and thus for the speaker to begin enumerating axioms of Peano Arithmetic, you have to assume those axioms to be true, thus contradiction.

It's sufficient to allow an adversarial (and dishonest) speaker to force a contradiction, sure. But the theorem is completely subjective. It says that even from the agent's perspective there is a problem. IE, even if we think the speaker to be completely honest, we can't (computably) have (even minimally) consistent beliefs. So it's more surprising than simply saying that if we believe a speaker to be honest then that speaker can create a contradiction by lying to us. (At least, more surprising to me!)

If we think the speaker to be completely honest (incapable of saying false things), and also we have nonzero belief that the speaker will enumerate axioms of Peano Arithmetic, then we already believe a contradiction (because assuming that a proof of X implies X creates all the contradictions).

Adding to the axioms of PA, the statement "a proof of X from the axioms of PA implies X", does not create any contradictions. This is just the belief that PA is sound.

What would be contradictory would be for PA itself to believe that PA is sound. It is fine for an agent to have the belief that PA is sound.

There might be some technicality under which you're not simply wrong. https://en.wikipedia.org/wiki/L%C3%B6b%27s_theorem

If the listener is running a computable logical uncertainty algorithm, then for a difficult proposition it hasn't made much sense of, the listener might say "70% likely it's a theorem and X will say it, 20% likely it's not a theorem and X won't say it, 5% PA is inconsistent and X will say both, 5% X isn't naming all and only theorems of PA".

Conditioned on PA being consistent and on X naming all and only theorems of PA, and on the listener's logical uncertainty being well-calibrated, you'd expect that in 78% of such cases X eventually names it.

But you can't use the listener's current probabilities on [X saying it] to sort out theorems from non-theorems in a way that breaks computability!

What am I missing?

Is there any meaningful distinction between filtered evidence and lying? I know that in toy models these can be quite different, but in the expansive setting here, where the speaker can select the most misleading technically true fact, is there any major difference?

And how would the results here look if we expended it to allow the speaker to lie?

Here's one way to extend a result like this to lying. Rather than assume honesty, we could assume observations carry sufficiently much information about the truth. This is like saying that sensory perception may be fooled, but in the long run, bears a strong enough connection to reality for us to infer a great deal. Something like this should imply the same computational difficulties.

I'm not sure exactly how this assumption should be spelled out, though.

Promoted to curated: This post is a bit more technical than the usual posts we curate, but I think it is still quite valuable to read for a lot of people, since it's about a topic that has already received some less formal treatment on LessWrong. 

I also am very broadly excited about trying move beyond a naive bayesianism paradigm, and felt like this post helped me significantly in understanding what that would look like. 

[+][comment deleted]4yΩ120