Is there any meaningful distinction between filtered evidence and lying? I know that in toy models these can be quite different, but in the expansive setting here, where the speaker can select the most misleading technically true fact, is there any major difference?

And how would the results here look if we expended it to allow the speaker to lie?

Here's one way to extend a result like this to lying. Rather than assume honesty, we could assume observations carry sufficiently much information about the truth. This is like saying that sensory perception may be fooled, but in the long run, bears a strong enough connection to reality for us to infer a great deal. Something like this should imply the same computational difficulties.

I'm not sure exactly how this assumption should be spelled out, though.

Thinking About Filtered Evidence Is (Very!) Hard

by abramdemski 13 min read19th Mar 202025 comments

76

Ω 27


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The content of this post would not exist if not for conversations with Zack Davis, and owes something to conversations with Sam Eisenstat.

There's been some talk about filtered evidence recently. I want to make a mathematical observation which causes some trouble for the Bayesian treatment of filtered evidence. [OK, when I started writing this post, it was "recently". It's been on the back burner for a while.]

This is also a continuation of the line of research about trolling mathematicians, and hence, relevant to logical uncertainty.

I'm going to be making a mathematical argument, but, I'm going to keep things rather informal. I think this increases the clarity of the argument for most readers. I'll make some comments on proper formalization at the end.


Alright, here's my argument.

According to the Bayesian treatment of filtered evidence, you need to update on the fact that the fact was presented to you, rather than the raw fact. This involves reasoning about the algorithm which decided which facts to show you. The point I want to make is that this can be incredibly computationally difficult, even if the algorithm is so simple that you can predict what it will say next. IE, I don't need to rely on anything like "humans are too complex for humans to really treat as well-specified evidence-filtering algorithms".

For my result, we imagine that a Bayesian reasoner (the "listener") is listening to a series of statements made by another agent (the "speaker").

First, I need to establish some terminology:

Assumption 1. A listener will be said to have a rich hypothesis space if the listener assigns some probability to the speaker enumerating any computably enumerable set of statements.

The intuition behind this assumption is supposed to be: due to computational limitations, the listener may need to restrict to some set of easily computed hypotheses; for example, the hypotheses might be poly-time or even log-poly. This prevents hypotheses such as "the speaker is giving us the bits of a halting oracle in order", as well as "the speaker has a little more processing power than the listener". However, the hypothesis space is not so restricted as to limit the world to being a finite-state machine. The listener can imagine the speaker proving complicated theorems, so long as it is done sufficiently slowly for the listener to keep up. In such a model, the listener might imagine the speaker staying quiet for quite a long time (observing the null string over and over, or some simple sentence such as 1=1) while a long computation completes; and only then making a complicated claim.

This is also not to say that I assume my listener considers only hypotheses in which it can 100% keep up with the speaker's reasoning. The listener can also have probabilistic hypotheses which recognize its inability to perfectly anticipate the speaker. I'm only pointing out that my result does not rely on a speaker which the listener can't keep up with.

What it does rely on is that there are not too many restrictions on what the speaker eventually says.

Assumption 2. A listener believes a speaker to be honest if the listener distinguishes between "X" and "the speaker claims X at time t" (aka "claims-X"), and also has beliefs such that P(X| claims-X)=1 when P(claims-X) > 0.

This assumption is, basically, saying that the agent trusts its observations; the speaker can filter evidence, but the speaker cannot falsify evidence.

Maybe this assumption seems quite strong. I'll talk about relaxing it after I sketch the central result.

Assumption 3. A listener is said to have minimally consistent beliefs if each proposition X has a negation X*, and P(X)+P(X*)1.

The idea behind minimally consistent beliefs is that the listener need not be logically omniscient, but does avoid outright contradictions. This is important, since assuming logical omniscience would throw out computability from the start, making any computational-difficulty result rather boring; but totally throwing out logic would make my result impossible. Minimal consistency keeps an extremely small amount of logic, but, it is enough to prove my result.

Theorem(/Conjecture). It is not possible for a Bayesian reasoner, observing a sequence of remarks made by a speaker, to simultaneously:

  • Have a rich hypothesis space.
  • Believe the speaker to be honest.
  • Have minimally consistent beliefs.
  • Have computable beliefs.

Proof sketch. Suppose assumptions 1-3. Thanks to the rich hypothesis space assumption, the listener will assign some probability to the speaker enumerating theorems of PA (Peano Arithmetic). Since this hypothesis makes distinct predictions, it is possible for the confidence to rise above 50% after finitely many observations. At that point, since the listener expects each theorem of PA to eventually be listed, with probability > 50%, and the listener believes the speaker, the listener must assign > 50% probability to each theorem of PA! But this implies that the listener's beliefs are not computable, since if we had access to them we could separate theorems of PA from contradictions by checking whether a sentence's probability is > 50%.

So goes my argument.


What does the argument basically establish?

The argument is supposed to be surprising, because minimally consistent beliefs are compatible with computable beliefs; and rich hypothesis space is compatible with beliefs which are computable on observations alone; yet, when combined with a belief that the speaker is honest, we get an incomputability result.

My take-away from this result is that we cannot simultaneously use our unrestricted ability to predict sensory observations accurately and have completely coherent beliefs about the world which produces those sensory observations, at least if our "bridge" between the sensory observations and the world includes something like language (whereby sensory observations contain complex "claims" about the world).

This is because using the full force of our ability to predict sensory experiences includes some hypotheses which eventually make surprising claims about the world, by incrementally computing increasingly complicated information (like a theorem prover which slowly but inevitably produces all theorems of PA). In other words, a rich sensory model contains implicit information about the world which we cannot immediately compute the consequences of (in terms of probabilities about the hidden variables out there in the world). This "implicit" information can be necessarily implicit, in the same way that PA is necessarily incomplete.

To give a non-logical example: suppose that your moment-to-moment anticipations of your relationship with a friend are pretty accurate. It might be that if you roll those anticipations forward, you inevitably become closer and closer until the friendship becomes a romance. However, you can't necessarily predict that right now; even though the anticipation of each next moment is relatively easy, you face a halting-problem-like difficulty if you try to anticipate what the eventual behavior of your relationship is. Because our ability to look ahead is bounded, each new consequence can be predictable without the overall outcome being predictable.

Thus, in order for an agent to use the full force of its computational power on predicting sensory observations, it must have partial hypotheses -- similar to the way logical induction contains traders which focus only on special classes of sentences, or Vanessa's incomplete Bayesianism contains incomplete hypotheses which do not try to predict everything.

So, this is an argument against strict Bayesianism. In particular, it is an argument against strict Bayesianism as a model of updating on filtered evidence! I'll say more about this, but first, let's talk about possible holes in my argument.


Here are some concerns you might have with the argument.

One might possibly object that the perfect honesty requirement is unrealistic, and therefore conclude that the result does not apply to realistic agents.

  • I would point out that the assumption is not so important, so long as the listener can conceive of the possibility of perfect honesty, and assigns it nonzero probability. In that case, we can consider P(X|honesty) rather than P(X). Establishing that some conditional beliefs are not computable seems similarly damning.
  • Furthermore, because the "speaker" is serving the role of our observations, the perfect honesty assumption is just a version of P(X|observe-X)=1. IE, observing X gives us X. This is true in typical filtered-evidence setups; IE, filtered evidence can be misleading, but it can't be false.
  • However, one might further object that agents need not be able to conceive of "perfect honesty", because this assumption has an unrealistically aphysical, "perfectly logical" character. One might say that all observations are imperfect; none are perfect evidence of what is observed. In doing so, we can get around my result. This has some similarity to the assertion that zero is not a valid probability. I don't find this response particularly appealing, but I also don't have a strong argument against it.

Along similar lines, one might object that the result depends on an example ("the speaker is enumerating theorems") which comes from logic, as opposed to any realistic physical world-model. The example does have a "logical" character -- we're not explicitly reasoning about evidence-filtering algorithms interfacing with an empirical world and selectively telling us some things about it. However, I want to point out that I've assumed extremely little "logic" -- the only thing I use is that you don't expect a sentence and its negation to both be true. Observations corresponding to theorems of PA are just an example used to prove the result. The fact that P(X) can be very hard to compute even when we restrict to easily computed P(claims-X) is very general; even if we do restrict attention to finite-state-machine hypotheses, we are in P-vs-NP territory.


What does this result say about logical uncertainty?

Sam's untrollable prior beat the trollable-mathematician problem by the usual Bayesian trick of explicitly modeling the sequence of observations -- updating on I-observed-X-at-this-time rather than only X. (See also the illustrated explanation.)

However, it did so at a high cost: Sam's prior is dumb. It isn't able to perform rich Occam-style induction to divine the hidden rules of the universe. It doesn't believe in hidden rules; it believes "if there's a law of nature constraining everything to fit into a pattern, I will eventually observe that law directly." It shifts its probabilities when it makes observations, but, in some sense, it doesn't shift them very much; and indeed, that property seems key to the computability of that prior.

So, a natural question arises: is this an essential property of an untrollable prior? Or can we construct a "rich" prior which entertains hypotheses about the deep structure of the universe, learning about them in an Occam-like way, which is nonetheless still untrollable?

The present result is a first attempt at an answer: given my (admittedly a bit odd) notion of rich hypothesis space, it is indeed impossible to craft a computable prior over logic with some minimal good properties (like believing what's proved to it). I don't directly address a trollability-type property, unfortunately; but I do think I get close to the heart of the difficulty: a "deep" ability to adapt in order to predict data better stands in contradiction with computability of the latent probability-of-a-sentence.


So, how should we think about filtered evidence?

Orthodox Bayesian (OB): We can always resolve the problem by distinguishing between X and "I observe X", and conditioning on all the evidence available. Look how nicely it works out in the Monty Hall problem and other simple examples we can write down.

Skeptical Critique (SC): You're ignoring the argument. You can't handle cases where running your model forward is easier than answering questions about what happens eventually; in those cases, many of your beliefs will either be uncomputable or incoherent.

OB: That's not a problem for me. Bayesian ideals of rationality apply to the logically omniscient case. What they give you is an idealized notion of rationality, which defines the best an agent could do.

SC: Really? Surely your Bayesian perspective is supposed to have some solid implications for finite beings who are not logically omniscient. I see you giving out all this advice to machine learning programmers, statisticians, doctors, and so on.

OB: Sure. We might not be able to achieve perfect Bayesian rationality, but whenever we see something less Bayesian than it could be, we can correct it. That's how we get closer to the Bayesian ideal!

SC: That sounds like cargo-cult Bayesianism to me. If you spot an inconsistency, it matters how you correct it; you don't want to go around correcting for the planning fallacy by trying to do everything faster, right? Similarly, if your rule-of-thumb for the frequency of primes is a little off, you don't want to add composite numbers to your list of primes to fudge the numbers.

OB: No one would make those mistakes.

SC: That's because there are, in fact, rationality principles which apply. You don't just cargo-cult Bayesianism by correcting inconsistencies any old way. A boundedly rational agent has rationality constraints which apply, guiding it to better approximate "ideal" rationality. And those rationality constraints don't actually need to refer to the "ideal" rationality. The rationality constraints are about the update, not in the ideal which the update limits to.

OB: Maybe we can imagine some sort of finite Bayesian reasoner, who treats logical uncertainty as a black box, and follows the evidence toward unbounded-Bayes-optimality in a bounded-Bayes-optimal way...

SC: Maybe, but I don't know of a good picture which looks like that. The picture we do have is given by logical induction: we learn to avoid Dutch books by noticing lots of Dutch books against ourselves, and gradually becoming less exploitable.

OB: That sounds a lot like the picture I gave.

SC: Sure, but it's more precise. And more importantly, it's not a Bayesian update -- there is a kind of family resemblance in the math, but it isn't learning through a Bayesian update in a strict sense.

OB: Ok, so what does all this have to do with filtered evidence? I still don't see why the way I handle that is wrong.

SC: Well, isn't the standard Bayesian answer a little suspicious? The numbers conditioning on X don't come out to what you want, so you introduce something new to condition on, observe-X, which can have different conditional probabilities. Can't you get whatever answer you want, that way?

OB: I don't think so? The numbers are dictated by the scenario. The Monty Hall problem has a right answer, which determines how you should play the game if you want to win. You can't fudge it without changing the game.

SC: Fair enough. But I still feel funny about something. Isn't there an infinite regress? We jump to updating on observe-X when X is filtered. What if observe-X is filtered? Do we jump to observe-observe-X? What if we can construct a "meta Monty-Hall problem" where it isn't sufficient to condition on observe-X?

OB: If you observe, you observe that you observe. And if you observe that you observe, then you must observe. So there's no difference.

SC: If you're logically perfect, sure. But a boundedly rational agent need not realize immediately that it observed X. And certainly it need not realize and update on the entire sequence "X", "I observed X", "I observed that I observed X", and so on.

OB: Ok...

SC: To give a simple example: call a sensory impression "subliminal" when it is weak enough that only X is registered. A stronger impression also registers "observe-X", making the sensory impression more "consciously available". Then, we cannot properly track the effects of filtered evidence for subliminal impressions. Subliminal impressions would always register as if they were unfiltered evidence.

OB: ...no.

SC: What's wrong?

OB: An agent should come with a basic notion of sensory observation. If you're a human, that could be activation in the nerves running to sensory cortex. If you're a machine, it might be RBG pixel values coming from a camera. That's the only thing you ever have to condition on; all your evidence has that form. Observing a rabbit means getting pixel values corresponding to a rabbit. We don't start by conditioning on "rabbit" and then patch things by adding "observe-rabbit" as an additional fact. We condition on the complicated observation corresponding to the rabbit, which happens to, by inference, tell us that there is a rabbit.

SC: That's... a bit frustrating.

OB: How so?

SC: The core Bayesian doctrine is the Kolmogorov axioms, together with the rule that we update beliefs via Bayesian conditioning. A common extension of Bayesian doctrine grafts on a distinction between observations and hypotheses, naming some special events as observable, and others as non-observable hypotheses. I want you to notice when you're using the extension rather than the core.

OB: How is that even an extension? It just sounds like a special case, which happens to apply to just about any organism.

SC: But you're restricting the rule "update beliefs by Bayesian conditioning" -- you're saying that it only works for observations, not for other kinds of events.

OB: Sure, but you could never update on those other kinds of events anyway.

SC: Really, though? Can't you? Some information you update on comes from sensory observations, but other information comes from reasoning. Something like a feedforward neural network just computes one big function on sense-data, and can probably be modeled in the way you're suggesting. But something like a memory network has a nontrivial reasoning component. A Bayesian can't handle "updating" on internal calculations it's completed; at best they're treated as if they're black boxes whose outputs are "observations" again.

OB: Ok, I see you're backing me into a corner with logical uncertainty stuff again. I still feel like there should be a Bayesian way to handle it. But what does this have to do with filtered evidence?

SC: The whole point of the argument we started out discussing is that if you have this kind of observation/hypothesis divide, and have sufficiently rich ways of predicting sensory experiences, and remain a classical Bayesian, then your beliefs about the hidden information are not going to be computable, even if your hypotheses themselves are easy to compute. So we can't realistically reason about the hidden information just by Bayes-conditioning on the observables. The only way to maintain both computability and a rich hypothesis space under these conditions is to be less Bayesian, allowing for more inconsistencies in your beliefs. Which means, reasoning about filtered evidence doesn't reduce to applying Bayes' Law.

OB: That... seems wrong.

SC: Now we're getting somewhere!


All that being said, reasoning about filtered evidence via Bayes' Law in the orthodox way still seems quite practically compelling. The perspective SC puts forward in the above dialogue would be much more compelling if I had more practical/interesting "failure-cases" for Bayes' Law, and more to say about alternative ways of reasoning which work better for those cases. A real "meta Monty-Hall problem".

Arguably, logical induction doesn't use the "condition on the fact that X was observed" solution:

  • Rather than the usual sequential prediction model, logical induction accommodates information coming in for any sentence, in any order. So, like the "core of Bayesianism" mentioned by SC, it maintains its good properties without special assumptions about what is being conditioned on. This is in contrast to, e.g., Solomonoff induction, which uses the sequential prediction model.
  • In particular, in Monty Hall, although there is a distinction between the sentence "there is a goat behind door 3" and "the LI discovers, at time t, that there is a goat behind door 3" (or suitable arithmetizations of these sentences), we can condition on the first rather than the second. A logical inductor would learn to react to this in the appropriate way, since doing otherwise would leave it Dutch-bookable.

One might argue that the traders are implicitly using the standard Bayesian "condition on the fact that X was observed" solution in order to accomplish this. Or that the update an LI performs upon seeing X is always that it saw X. But to me, this feels like stretching things. The core of the Bayesian method for handling filtered evidence is to distinguish between X and the observation of X, and update on the latter. A logical inductor doesn't explicitly follow this, and indeed appears to violate it. Part of the usual idea seems to be that a Bayesian needs to "update on all the evidence" -- but a logical inductor just gets a black-box report of X, without any information on how X was concluded or where it came from. So information can be arbitrarily excluded, and the logical inductor will still do its best (which, in the case of Monty Hall, appears to be sufficient to learn the correct result).


A notable thing about the standard sort of cases, where the Bayesian way of reasoning about filtered evidence is entirely adequate, is that you have a gears-level model of what is going on -- a causal model, which you can turn the crank on. If you run such a model "forward" -- in causal order -- you compute the hidden causes before you compute the filtered evidence about them. This makes it sound as if predicting the hidden variables should be easier than predicting the sensory observations; and, certainly makes it hard to visualize the situation where it is much much harder.

However, even in cases where we have a nice causal model like that, inferring the hidden variables from what is observed can be intractably computationally difficult, since it requires reverse-engineering the computation from its outputs. Forward-sampling causal models is always efficient; running them backwards, not so.

So even with causal models, there can be good reason to engage more directly with logical uncertainty rather than use pure Bayesian methods.

However, I suspect that one could construct a much more convincing example if one were to use partial models explicitly in the construction of the example. Perhaps something involving an "outside view" with strong empirical support, but lacking a known "inside view" (lacking a single consistent causal story).

Unfortunately, such an example escapes me at the moment.


Finally, some notes on further formalisation of my main argument.

The listener is supposed to have probabilistic beliefs of the standard variety -- an event space which is a sigma-algebra, and which has a P(event) obeying the Kolmogorov axioms. In particular, the beliefs are supposed to be perfectly logically consistent in the usual way.

However, in order to allow logical uncertainty, I'm assuming that there is some embedding of arithmetic; call it E[]. So, for each arithmetic sentence S, there is an event E[S]. Negation gets mapped to the "star" of an event: E[¬S] = (E[S])*. This need not be the compliment of the event E[S]. Similarly, the embedding E[AB] need not be E[A]E[B]; E[AB] need not be E[A]E[B]; and so on. That's what allows for logical non-omniscience -- the probability distribution doesn't necessarily know that E[AB] should act like E[A]E[B], and so on.

The more we impose requirements which force the embedding to act like it should, the more logical structure we are forcing onto the beliefs. If we impose very much consistency, however, then that would already imply uncomputability and the central result would not be interesting. So, the "minimal consistency" assumption requires very little of our embedding. Still, it is enough for the embedding of PA to cause trouble in connection with the other assumptions.

In addition to all this, we have a distinguished set of events which count as observations. A first pass on this is that for any event A, there is an associated event obs(A) which is the observation of A. But I do worry that this includes more observation events than we want to require. Some events A do not correspond to sentences; sigma-algebras are closed under countable unions. If we think of the observation events as claims made by the speaker, it doesn't make sense to imagine the speaker claiming a countable union of sentences (particularly not the union of an uncomputable collection).

So, more conservatively, we might say that for events E[S], that is events in the image of the embedding, we also have an event obs(E[S]). In any case, this is closer to the minimal thing we need to establish the result.

I don't know if the argument works out exactly as I sketched; it's possible that the rich hypothesis assumption needs to be "and also positive weight on a particular enumeration". Given that, we can argue: take one such enumeration; as we continue getting observations consistent with that observation, the hypothesis which predicts it loses no weight, and hypotheses which (eventually) predict other things must (eventually) lose weight; so, the updated probability eventually believes that particular enumeration will continue with probability > 1/2.

On the other hand, that patched definition is certainly less nice. Perhaps there is a better route.

76

Ω 27