Lately I've been reading a lot about the problem of logical uncertainty. The problem is that there are logical consequences of your beliefs that you're uncertain about.
So, for statements like "the billionth digit of pi is even", it's provable from your beliefs. But you're still uncertain of it. So, the problem is, what's its probability? Well, 1/2, probably, but from what principled theory can you derive that?
That's the question. What probabilities we should assign to logical statements, and what laws of probability apply to them.
With all the time I've been spending on it, I've been wondering, will I ever have the opportunity to use it? My conclusion is, probably not me, personally. But I have come up with a decent list of ways other people might use it.
Combining information from simulation and experiment
This is why I first became interested in the problem--I was working in protein structure prediction.
The problem in protein structure prediction is figuring out how the protein folds. A protein is a big molecule. They're always these long, linear chains of atoms with relatively short side-branches. Interestingly, there are easy experiments to learn what they'd look like all stretched out in a line like this. But in reality, this long chain is folded up in a complex way. So, the problem is: given the linear, stretched out structure, what's the folded structure?
We have two general approaches to solve this problem.
One is a physics calculation. A protein folds because of electromagnetic forces, and we know how those work. So, you can set up a quantum physics simulation and end up with the correct fold. But you won't really end up with anything because your simulation will never finish if you do it precisely. So a lot of work goes into coming up with realistic approximate calculations, and using bigger computers. (Or distributed computing - Rosetta@Home distributes the calculations over the idle time of people's home computers, and you can download it now if you want your computer to help solve protein structures.)
Another is by experiment. We get some kind of reading from the protein structure--X-rays or radio waves, usually--and use that to solve the structure. This can be thought of as Bayesian inference, and, in fact, that's how some people do it--see a book dear to my heart, Bayesian Methods in Structural Bioinformatics. Some of the chapters are about P(observations|structure), and some are about the prior distribution, P(structure). This prior usually comes from experience with previous protein structures.
(A third approach that doesn't really fit into this picture is the videogame FoldIt.)
But, it occurred to me, doesn't the simulation provide a prior?
This is actually a classic case of logical uncertainty. Our beliefs actually imply a fold--we know the linear chain, we know the laws of physics, from that you can theoretically calculate the folded structure. But we can't actually do it, which is why we need experiments, the experiments are actually resolving logical uncertainty.
My dream is for us to be able to do that explicitly, in the math. To calculate a prior using the laws of physics plus logical uncertainty, and then condition on experimental evidence to resolve that uncertainty.
There are two people that I know of, doing research that resembles this. One is Francesco Stingo. He published a method for detecting binding between two different kinds of molecules--miRNA and mRNA. His method has a prior that is based in part on chemistry-based predictions of binding, and updated on the results of microarray experiments. The other is Cari Kaufman, who builds probability distributions over the results of a climate simulation. (the idea seems to be to extrapolate from simulations actually run with similar but not identical parameters)
What I have in mind would accomplish the same kind of thing in a different way--the prior would come from a general theory for assigning probabilities to statements of mathematics, and "the laws of physics predict these two molecules bind" would be one such statement.
Automated theorem proving
I don't know anything about this field, but this kind of makes sense--what if you could decide which lemmas to try and prove based on their probability? That'd be cool.
Or, decide which lemmas are important with value of information calculations.
(AlanCrowe talked about something like this in a comment)
I don't really understand this, something about a "Löbian obstacle." But Squark has a bit of an overview at "Overcoming the Loebian obstacle using evidence logic". And qmaurmann wrote something on a similar topic, "Meditations on Löb’s theorem and probabilistic logic." eli_sennesh also recently wrote a related post.
There's an interesting result, in this line of research. It says that, although it's well known that a formal system cannot contain a predicate representing truth in that system, this is actually just because formal proof systems only assign certainties of 0 and 1. If they instead gave false statements probability infinitesimally close to 0, and true statements infinitesimally close to 1, they could define truth. (Disclaimer: I can't follow all the math and the result has not yet been published in a peer reviewed journal). This is such an odd use of probabilistic logical uncertainty, nothing like the applications I've been imagining, but there it is.
(walkthrough of the paper here)
Philosophically, I want to know how you calculate the rational degree of belief in every proposition. Good philosophical theories don't necessarily need to be practical to use. Even if it's impossible to compute, it's got to give the right answer in simple thought experiments, so I feel like I can reason without paradox. But even getting all these simple thought experiments right is hard. Good philosophical theories are hard to come by.
Bayesian confirmation theory has been a successful philosophical theory for thought experiments involving the confirmation of theories by evidence. It tells you how belief in a hypothesis should go up or down after seeing observational evidence.
But what about mathematical evidence, what about derivations? Wasn't Einstein's derivation that general relativity predicts the orbit of Mercury evidence in favor of the theory? I want a philosophical theory, I want basic principles, that will tell me "yes." That will tell me the rational degree of belief is higher if I know the derivation.
Philosophers frame this as the "problem of old evidence." They think of it like: you come up with a new theory like GR, consider old evidence like Mercury, and somehow increase your confidence in the new theory. The "paradox" is that you didn't learn of any new evidence, but your confidence in the theory still changed. Philosopher Daniel Garber argues that the new evidence is a mathematical derivation, and the problem would be solved by extending Bayesian probability theory to logical uncertainty. (The paper is called "Old Evidence and Logical Omniscience in Bayesian Confirmation Theory," and can be found here)
A more general philosophical theory than Bayesian confirmation theory is Solomonoff induction, which provides a way to assign probabilities to any observation given any past observations. It requires infinite computing power. However, there still is a rational degree of belief for an agent with finite computing power to have, given the computations that they've done. I want the complete philosophical theory that tells me that.
Learning about probabilistic logical uncertainty
I made a list of references here, if you'd like to learn more about the subject.
As far as AI is concerned, you don't need to go anywhere near as specialised as FAI to find something where logical uncertainty is directly applicable.
Every search problem in AI is an instance of logical uncertainty, and every search algorithm is a different way of attempting to deal with that uncertainty.
This is why I think logical uncertainty is a "dual use" problem, rather than an FAI problem. Lack of a deep understanding of the nature of logical uncertainty could be holding up a lot of AI progress, and I'm not sure why people aren't more concerned about this.
It's true that this is a case of logical uncertainty.
However, I must add that in most of my examples, I bring up the benefits of a probabilistic representation. Just because you have logical uncertainty doesn't mean you need to represent it with probability theory.
In protein structure, we already have these Bayesian methods for inferring the fold, so the point of the probabilistic representation is to plug it i these methods as a prior. In philosophy, we want ideal rationality, which suggests probability. In automated theorem proving... okay, yeah, in automated theorem proving I can't explain why you'd want to use probability theory in particular.
But yes. If you had a principled way to turn your background information and already done computations into a probability distribution for future computations, you could use that for AI search problems. And optimization problems. Wow, that's a lot of problems. I'm not sure how it would stack up against other methods, but it'd be interesting if that became a paradigm for at least some problems.
In fact, now that you've inspired me to look for it, I find that it's being done! Not with the approach of coming up with a distribution over all mathematical statements that you see in Christiano's report, and which is the approach I had in mind when writing the post. But rather, with an approach like what Cari Kaufman I think uses, where you guess based on nearby points. Which is accomplished by modeling a difficult-to-evaluate function as a stochastic process with some kind of local correlations, like a Gaussian process, so that you get probability distributions for the values of the function at each point. What I'm finding is that this is, in fact, an approach people use to optimizing difficult-to-evaluate objective functions. See here for the details: Efficient Global Optimization of Expensive Black-Box Functions, by Jones, Schonlau and Welch.
Surely probability or something very much like it is conceptually the right way to deal with uncertainty, whether it's logical uncertainty or any other kind? Granted, most of the time you don't want to deal with explicit probability distributions and Bayesian updates because the computation can be expensive, but when you work with approximations you're better off if you know what it is you're approximating.
In the area of search algorithms, I think these kinds of approaches are woefully underrepresented, and I don't think it's because they aren't particularly applicable. Granted, I could be wrong on this, because the core ideas aren't particularly new (see, for example, Dynamic Probability, Computer Chess, and the Measurement of Knowledge by I. J. Good).
It's an area of research I'm working on right now, so I've spent a fair amount of time looking into it. I could give a few references on the topic, but on the whole I think they're quite sparse.
Here's some of the literature:
Heuristic search as evidential reasoning by Hansson and Mayer
A Bayesian Approach to Relevance in Game Playing by Baum and Smith
and also work following Stuart Russell's concept of "metareasoning"
On Optimal Game-Tree Search using Rational Meta-Reasoning by Russell and Wefald
Principles of metareasoning by Russell and Wefald
and the relatively recent
Selecting Computations: Theory and Applications by Hay, Russell, Tolpin and Shimony.
On the whole, though, it's relatively limited. At a bare minimum there is plenty of room for probabilistic representations in order to give a better theoretical foundation, but I think there is also plenty of practical benefit to be gained from those techniques as well.
As a particular example of the applicability of these methods, there is a phenomenon referred to as "search pathology" or "minimax pathology", in which for certain tree structures searching deeper actually leads to worse results, when using standard rules for propagating value estimates up a tree (most notably minimax). From a Bayesian perspective this clearly shouldn't occur, and hence this phenomenon of pathology must be the result of a failure to correctly update on the evidence.
I confess to being a bit envious of this. My academic path after undergrad biochemistry took me elsewhere, alas.
Empirical priors + simulation of relevant models is somewhat similar to my idea on how to estimate P(causality|correlation): use explicit comparisons of correlational & randomized trials as priors when available, and simulate P(cauality|correlation) on random causal networks when not available.
I don't mean to butt in, but for God's sakes, can somebody please tell me what keywords to search in order to find Solomonoff or Kolmogorov-type work for putting a probability measure on countably infinite sets of structured objects? Is this just a matter of defining the correct \sigma-algebra, with countable additivity of subsets? The actual problem here is very simple: it's just a matter of taking the definition of an inductive type and putting a sensible probability measure over the resulting objects.
The problem is that you can't just assign equal probability mass to each constructor (with parameterized constructors "containing" each possible tuple of parameters as an event inside their portion of the probability mass), as this results in believing (at least, as a prior) that 50% of all linked lists are empty.
Why not start with a probability distribution over (the finite list of) objects of size at most N, and see what happens when N becomes large?
It really depends on what distribution you want to define though. I don't think there's an obvious "correct" answer.
Here is the Haskell typeclass for doing this, if it helps: https://hackage.haskell.org/package/QuickCheck-188.8.131.52/docs/Test-QuickCheck-Arbitrary.html
Because there is no defined "size N", except perhaps for nodes in the tree representation of the inductive type.
I was introduced to the idea of 'emulation' of complex models by Tony O'Hagan a few years back, where you use a Gaussian Process to model what a black box simulation will give across all possible inputs, seeded with actual simulation runs that you performed. (This also helps with active learning, in that you can find the regions of the input space where you're most uncertain what the simulation will give, and then run a simulation with those input parameters.) I believe the first application it saw was also in climate modeling.
Do you know of any cases where this simulation-seeded Gaussian Process was then used as a prior, and updated on empirical data?
uncertain parameters --simulation--> distribution over state
noisy observations --standard bayesian update--> refined distribution over state
Cari Kaufman's research profile made me think that's something she was interested in. But I haven't found any publications by her or anyone else that actually do this.
I actually think that I misread her research description, latching on to the one familiar idea.
None come to mind, sadly. :( (I haven't read through all of his work, though, and he might know someone who took it in that direction.)
What about the problem that if you admit that logical propositions are only probable, you must admit that the foundations of decision theory and Bayesian inference are only probable (and treat them accordingly)? Doesn't this leave you unable to complete a deduction because of a vicious regress?
I think most formulations of logical uncertainty give axioms and proven propositions probability 1, or 1-minus-epsilon.
Yes. Because, we're trying to express uncertainty about the consequences of axioms. Not about axioms themselves.
common_law's thinking does seem to be something people actually do. Like, we're uncertain about the consequences of the laws of physics, while simultaneously being uncertain of the laws of physics, while simultaneously being uncertain if we're thinking about it in a logical way. But, it's not the kind of uncertainty that we're trying to model, in the applications I'm talking about. The missing piece in these applications are probabilities conditional on axioms.
If you automatically assign the axioms an actually unobtainable certainty, you don't get the rational degree of belief in every proposition, as the set of "propositions" includes those not conditioned on the axioms.
Hmm. Yeah, that's tough. What do you use to calculate probabilities of the principles of logic you use to calculate probabilities?
Although, it seems to me that a bigger problem than the circularity is that I don't know what kinds of things are evidence for principles of logic. At least for the probabilities of, say, mathematical statements, conditional on the principles of logic we use to reason about them, we have some idea. Many consequences of a generalization being true are evidence for a generalization, for example. A proof of an analogous theorem is evidence for a theorem. So I can see that the kinds of things that are evidence for mathematical statements are other mathematical statements.
I don't have nearly as clear a picture of what kinds of things lead us to accept principles of logic, and what kind of statements they are. Whether they're empirical observations, principles of logic themselves, or what.
Hmm? If these are physically or empirically meaningful axioms, we can apply regular probability to them. Now, the laws of logic and probability themselves might pose more of a problem. I may worry about that once I can conceive of them being false.