Applications of logical uncertainty

lackofcheese

As far as AI is concerned, you don't need to go anywhere near as specialised as FAI to find something where logical uncertainty is directly applicable.

Every search problem in AI is an instance of logical uncertainty, and every search algorithm is a different way of attempting to deal with that uncertainty.

Wei Dai

This is why I think logical uncertainty is a "dual use" problem, rather than an FAI problem. Lack of a deep understanding of the nature of logical uncertainty could be holding up a lot of AI progress, and I'm not sure why people aren't more concerned about this.

alex_zag_al

It's true that this is a case of logical uncertainty.

However, I must add that in most of my examples, I bring up the benefits of a probabilistic representation. Just because you have logical uncertainty doesn't mean you need to represent it with probability theory.

In protein structure, we already have these Bayesian methods for inferring the fold, so the point of the probabilistic representation is to plug it into these methods as a prior. In philosophy, we want ideal rationality, which suggests probability. In automated theorem proving... okay, yeah, in automated theorem proving I can't explain why you'd want to use probability theory in particular.

But yes. If you had a principled way to turn your background information and already-completed computations into a probability distribution over the results of future computations, you could use that for AI search problems. And optimization problems. Wow, that's a lot of problems. I'm not sure how it would stack up against other methods, but it'd be interesting if that became a paradigm for at least some problems.

In fact, now that you've inspired me to look for it, I find that it's being done! Not with the approach in Christiano's report of coming up with a distribution over all mathematical statements, which is the approach I had in mind when writing the post. Rather, it's done with an approach like the one I think Cari Kaufman uses, where you guess based on nearby points. This is accomplished by modeling a difficult-to-evaluate function as a stochastic process with some kind of local correlation, such as a Gaussian process, so that you get a probability distribution for the value of the function at each point. What I'm finding is that this is, in fact, an approach people use to optimize difficult-to-evaluate objective functions. For the details, see Efficient Global Optimization of Expensive Black-Box Functions, by Jones, Schonlau and Welch.
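To make the "guess based on nearby points" idea concrete, here is a minimal sketch of the loop from the Jones, Schonlau and Welch paper: model the expensive function with a stochastic process fitted to the points evaluated so far, then evaluate next wherever the expected improvement over the best value found is largest. This is a simplified version, not the paper's model: it assumes a zero-mean Gaussian process with a squared-exponential kernel and fixed, hypothetical hyperparameters (LENGTH, NOISE), a hypothetical objective `expensive`, and brute-force maximization of expected improvement over a grid.

```python
import math

# Hypothetical fixed hyperparameters; the paper's model estimates its parameters from data.
LENGTH, NOISE = 0.2, 1e-6

def kernel(a, b):
    """Squared-exponential covariance: nearby inputs have correlated outputs."""
    return math.exp(-0.5 * ((a - b) / LENGTH) ** 2)

def cholesky(K):
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x):
    """Posterior mean and standard deviation of f(x) given evaluations (xs, ys)."""
    K = [[kernel(a, b) + (NOISE if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))
    k = [kernel(x, a) for a in xs]
    v = forward(L, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(kernel(x, x) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

def expected_improvement(mean, sd, best):
    """E[max(best - f(x), 0)] when f(x) ~ N(mean, sd^2); we are minimizing."""
    if sd == 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf

def expensive(x):
    # Hypothetical stand-in for a costly black-box objective on [0, 1].
    return math.sin(6.0 * x) + 0.5 * x

xs = [0.0, 0.5, 1.0]
ys = [expensive(x) for x in xs]
grid = [i / 200 for i in range(201)]
for _ in range(5):
    best = min(ys)
    candidates = [g for g in grid if g not in xs]
    nxt = max(candidates, key=lambda g: expected_improvement(*gp_posterior(xs, ys, g), best))
    xs.append(nxt)
    ys.append(expensive(nxt))
print("best x:", xs[ys.index(min(ys))], "best f:", min(ys))
```

Expected improvement is large either where the predicted mean is low or where the prediction is very uncertain, so the same probability distribution drives both exploitation and exploration.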

lackofcheese

Surely probability or something very much like it is conceptually the right way to deal with uncertainty, whether it's logical uncertainty or any other kind? Granted, most of the time you don't want to deal with explicit probability distributions and Bayesian updates because the computation can be expensive, but when you work with approximations you're better off if you know what it is you're approximating.

In the area of search algorithms, I think these kinds of approaches are woefully underrepresented, and I don't think it's because they aren't particularly applicable. Granted, I could be wrong on this, because the core ideas aren't particularly new (see, for example, Dynamic Probability, Computer Chess, and the Measurement of Knowledge by I. J. Good).

It's an area of research I'm working on right now, so I've spent a fair amount of time looking into it. I could give a few references on the topic, but on the whole I think they're quite sparse.

lackofcheese

Here's some of the literature:
Heuristic search as evidential reasoning by Hansson and Mayer
A Bayesian Approach to Relevance in Game Playing by Baum and Smith

and also work following Stuart Russell's concept of "metareasoning":
On Optimal Game-Tree Search using Rational Meta-Reasoning by Russell and Wefald
Principles of metareasoning by Russell and Wefald
and the relatively recent
Selecting Computations: Theory and Applications by Hay, Russell, Tolpin and Shimony.

On the whole, though, the literature is relatively limited. At a bare minimum, there is plenty of room for probabilistic representations to give search a better theoretical foundation, but I think there is plenty of practical benefit to be gained from these techniques as well.

As a particular example of the applicability of these methods, there is a phenomenon referred to as "search pathology" or "minimax pathology", in which, for certain tree structures, searching deeper actually leads to worse results when using standard rules for propagating value estimates up a tree (most notably minimax). From a Bayesian perspective this clearly shouldn't occur, and hence this pathology must be the result of a failure to correctly update on the evidence.
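The difference between backing up point values and backing up probabilities can be shown on a tiny tree. This is a minimal sketch, assuming each leaf holds an estimated probability that MAX wins from that position and that sibling outcomes are independent (a strong simplifying assumption). Minimax treats each estimate as exact; the alternative here, sometimes called product propagation, combines all children as probabilities. Neither rule is claimed to be the Bayesian answer in general.

```python
from math import prod

def minimax(node, maximizing=True):
    """Back up leaf estimates by max/min, treating each estimate as if exact."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def product_rule(node, maximizing=True):
    """Back up P(win for MAX), assuming child outcomes are independent.

    At a MAX node, MAX wins if at least one move leads to a win.
    At a MIN node, MAX wins only if every reply still leads to a win.
    """
    if isinstance(node, (int, float)):
        return node
    ps = [product_rule(child, not maximizing) for child in node]
    if maximizing:
        return 1.0 - prod(1.0 - p for p in ps)
    return prod(ps)

# Root is a MAX node with two MIN children; leaves are estimated P(MAX wins).
tree = [[0.6, 0.6], [0.9, 0.2]]
print(minimax(tree), product_rule(tree))
```

Here the rules disagree: minimax reports 0.6 because it trusts the left subtree's leaf estimates completely, while the product rule gives 0.4752 because MIN has two chances to find a refutation in each subtree. Whether a given back-up rule avoids pathology depends on the tree model; the sketch only illustrates propagating uncertainty instead of discarding it.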

Cyan

I was working in protein structure prediction.

I confess to being a bit envious of this. My academic path after undergrad biochemistry took me elsewhere, alas.

gwern

There are two people that I know of, doing research that resembles this. One is Francesco Stingo. He published a method for detecting binding between two different kinds of molecules--miRNA and mRNA. His method has a prior that is based in part on chemistry-based predictions of binding, and updated on the results of microarray experiments. The other is Cari Kaufman, who builds probability distributions over the results of a climate simulation. (the idea seems to be to extrapolate from simulations actually run with similar but not identical parameters)

Empirical priors + simulation of relevant models is somewhat similar to my idea for how to estimate P(causality|correlation): use explicit comparisons of correlational & randomized trials as priors when available, and simulate P(causality|correlation) on random causal networks when not available.
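The simulation half of that idea can be sketched in a few lines, under assumptions not in the comment: random DAGs where each forward edge appears independently with probability p, and "correlated" meaning two variables are d-connected with nothing conditioned on, i.e. one is an ancestor of the other or they share a common ancestor (under faithfulness, those are the pairs that would show a correlation). The function names and defaults are hypothetical.

```python
import random

def ancestors(n, edges):
    """anc[j] = set of ancestors of node j; nodes 0..n-1 are in topological order."""
    parents = [[] for _ in range(n)]
    for i, j in edges:
        parents[j].append(i)
    anc = [set() for _ in range(n)]
    for j in range(n):
        for i in parents[j]:
            anc[j] |= anc[i] | {i}
    return anc

def p_causal_given_correlated(n=6, p=0.3, trials=2000, seed=0):
    """Fraction of correlated pairs in which one variable causes the other."""
    rng = random.Random(seed)
    causal = correlated = 0
    for _ in range(trials):
        # Edges only go from lower to higher index, so the graph is acyclic.
        edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
        anc = ancestors(n, edges)
        for a in range(n):
            for b in range(a + 1, n):
                is_causal = a in anc[b]  # since a < b, only a can cause b
                if is_causal or anc[a] & anc[b]:  # direct path or common cause
                    correlated += 1
                    causal += is_causal
    return causal / correlated if correlated else float("nan")

print(p_causal_given_correlated())
```

The answer depends heavily on the assumed network model (n, p, and the notion of correlation), which is exactly why empirical comparisons would be preferable as priors where they exist.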

[anonymous]

So, for statements like "the billionth digit of pi is even", it's provable from your beliefs. But you're still uncertain of it. So, the problem is, what's its probability? Well, 1/2, probably, but from what principled theory can you derive that?

I don't mean to butt in, but for God's sake, can somebody please tell me what keywords to search in order to find Solomonoff- or Kolmogorov-type work on putting a probability measure on countably infinite sets of structured objects? Is this just a matter of defining the correct σ-algebra, with countable additivity over its subsets? The actual problem here is very simple: it's just a matter of taking the definition of an inductive type and putting a sensible probability measure over the resulting objects.

The problem is that you can't just assign equal probability mass to each constructor (with parameterized constructors "containing" each possible tuple of parameters as an event inside their portion of the probability mass), as this results in believing (at least, as a prior) that 50% of all linked lists are empty.
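The problem can be seen directly. This sketch assumes a list type with two constructors, Nil and Cons, and picks Nil with a fixed probability p_nil at every step (p_nil = 1/2 is the "equal mass per constructor" case). The length then follows a geometric distribution, so P(empty) = p_nil and the expected length is (1 - p_nil) / p_nil. The helper names are hypothetical.

```python
import random
from collections import Counter

def sample_list(rng, p_nil=0.5, alphabet="ab"):
    """Pick Nil with probability p_nil, else Cons (one element, then recurse)."""
    out = []
    while rng.random() >= p_nil:  # chose Cons: add an element and continue
        out.append(rng.choice(alphabet))
    return out  # chose Nil: the list ends here

def length_prob(k, p_nil=0.5):
    """Exact P(length == k) = (1 - p_nil)**k * p_nil, a geometric distribution."""
    return (1 - p_nil) ** k * p_nil

rng = random.Random(1)
counts = Counter(len(sample_list(rng)) for _ in range(10_000))
print("empty fraction:", counts[0] / 10_000, "exact:", length_prob(0))
```

Lowering p_nil only trades the large point mass at Nil for longer expected lists; any fixed per-constructor weighting still yields some particular geometric prior over sizes.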

Jonathan Paulson

Why not start with a probability distribution over (the finite list of) objects of size at most N, and see what happens when N becomes large?

It really depends on what distribution you want to define though. I don't think there's an obvious "correct" answer.

Here is the Haskell typeclass for doing this, if it helps: https://hackage.haskell.org/package/QuickCheck-2.1.0.1/docs/Test-QuickCheck-Arbitrary.html
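For lists, the size-bounded suggestion can be worked out exactly. A sketch, assuming elements come from an alphabet of k symbols and every list of length at most N is equally likely: there are 1 + k + ... + k^N such lists, so P(empty) is 1 over that sum, and P(length = N) tends to (k - 1)/k.

```python
from fractions import Fraction

def p_empty(k, n):
    """P(empty list) under the uniform distribution on lists of length <= n over k symbols."""
    total = sum(k ** i for i in range(n + 1))
    return Fraction(1, total)

def p_length(k, n, m):
    """P(length == m) under the same distribution; there are k**m lists of length m."""
    total = sum(k ** i for i in range(n + 1))
    return Fraction(k ** m, total)

for n in (1, 2, 5, 10):
    print(n, float(p_empty(2, n)), float(p_length(2, n, n)))
```

As N grows the mass drifts to ever-longer lists, so the limit does not itself define a distribution on finite lists; some explicit weighting over sizes still has to be chosen, which fits the remark that there is no obvious "correct" answer.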

[anonymous]

Why not start with a probability distribution over (the finite list of) objects of size at most N, and see what happens when N becomes large?

Because there is no defined "size N", except perhaps the number of nodes in the tree representation of the inductive type.
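Counting constructor nodes does give a usable size for tree-shaped types. A sketch, assuming binary trees built from Leaf and Node(left, right), with size = number of Node constructors: the number of trees of size n is the Catalan number C_n, so under the uniform distribution on trees of size at most N the bare Leaf has probability 1/(C_0 + ... + C_N), which again goes to 0. The sampler draws uniformly among trees of an exact size by choosing the left subtree's size in proportion to how many trees it accounts for.

```python
import random
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n):
    """Binary trees with exactly n Node constructors (the Catalan numbers)."""
    if n == 0:
        return 1  # just Leaf
    return sum(count_trees(i) * count_trees(n - 1 - i) for i in range(n))

def p_leaf_only(n_max):
    """P(tree is a bare Leaf) under the uniform distribution on sizes 0..n_max."""
    return Fraction(1, sum(count_trees(n) for n in range(n_max + 1)))

def sample_tree(n, rng):
    """Uniformly sample a tree of size n; None is Leaf, a pair is Node(left, right)."""
    if n == 0:
        return None
    r = rng.randrange(count_trees(n))
    for i in range(n):  # left subtree gets size i with weight t(i) * t(n-1-i)
        w = count_trees(i) * count_trees(n - 1 - i)
        if r < w:
            return (sample_tree(i, rng), sample_tree(n - 1 - i, rng))
        r -= w

for n_max in (1, 3, 6, 10):
    print(n_max, float(p_leaf_only(n_max)))
```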

Vaniver

The other is Cari Kaufman, who builds probability distributions over the results of a climate simulation. (the idea seems to be to extrapolate from simulations actually run with similar but not identical parameters)

I was introduced to the idea of 'emulation' of complex models by Tony O'Hagan a few years back, where you use a Gaussian Process to model what a black box simulation will give across all possible inputs, seeded with actual simulation runs that you performed. (This also helps with active learning, in that you can find the regions of the input space where you're most uncertain what the simulation will give, and then run a simulation with those input parameters.) I believe the first application it saw was also in climate modeling.
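The active-learning step can be sketched as follows, assuming a zero-mean Gaussian process emulator with a squared-exponential kernel and hypothetical fixed hyperparameters (`length`, `jitter`). With hyperparameters held fixed, the posterior variance depends only on where the simulator has already been run, not on what it returned, so the next run goes to the candidate input with the largest variance.

```python
import math

def kernel(a, b, length=0.2):
    """Squared-exponential covariance between two simulator inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def posterior_var(run_inputs, x, jitter=1e-6):
    """Emulator variance at x given the inputs of simulations already run."""
    K = [[kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(run_inputs)]
         for i, a in enumerate(run_inputs)]
    k = [kernel(x, a) for a in run_inputs]
    return kernel(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))

def next_run(run_inputs, candidates):
    """Pick the input where the emulator is most uncertain."""
    return max(candidates, key=lambda x: posterior_var(run_inputs, x))

runs = [0.0, 1.0]
grid = [i / 100 for i in range(101)]
for _ in range(3):
    runs.append(next_run(runs, grid))
print(runs)
```

Starting from runs at 0 and 1, the first new run lands at the midpoint 0.5, and later runs fill in the remaining gaps.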

alex_zag_al

Do you know of any cases where this simulation-seeded Gaussian Process was then used as a prior, and updated on empirical data?

Like...

uncertain parameters --simulation--> distribution over state
noisy observations --standard bayesian update--> refined distribution over state

Cari Kaufman's research profile made me think that's something she was interested in. But I haven't found any publications by her or anyone else that actually do this.

I actually think that I misread her research description, latching on to the one familiar idea.
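The two-step pipeline in the comment above can be sketched with a Monte Carlo ensemble. Everything here is a hypothetical stand-in: a toy `simulate` function, a standard normal prior on its uncertain parameter, and a Gaussian observation model. Step one pushes parameter samples through the simulator to get a distribution over the state; step two reweights those samples by the likelihood of a noisy observation, a basic importance-weighted Bayesian update on a sample-based prior. An emulator like the Gaussian process discussed above could replace the simulator calls when runs are expensive.

```python
import math
import random

def simulate(theta):
    # Hypothetical toy simulator mapping an uncertain parameter to a state.
    return theta ** 2 + 1.0

def prior_ensemble(n, rng):
    """Step 1: uncertain parameters --simulation--> samples of the state."""
    return [simulate(rng.gauss(0.0, 1.0)) for _ in range(n)]

def update(states, obs, noise_sd):
    """Step 2: weight each state by the Gaussian likelihood of the observation."""
    w = [math.exp(-0.5 * ((obs - s) / noise_sd) ** 2) for s in states]
    total = sum(w)
    return [wi / total for wi in w]

rng = random.Random(0)
states = prior_ensemble(5000, rng)
weights = update(states, obs=1.2, noise_sd=0.3)
prior_mean = sum(states) / len(states)
post_mean = sum(w * s for w, s in zip(weights, states))
print("prior mean:", prior_mean, "posterior mean:", post_mean)
```

The prior mean of the state is about 2, and the observation at 1.2 pulls the posterior toward it while the simulator still rules out states below 1.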

Vaniver

Do you know of any cases where this simulation-seeded Gaussian Process was then used as a prior, and updated on empirical data?

None come to mind, sadly. :( (I haven't read through all of his work, though, and he might know someone who took it in that direction.)

common_law

What about the problem that if you admit that logical propositions are only probable, you must admit that the foundations of decision theory and Bayesian inference are only probable (and treat them accordingly)? Doesn't this leave you unable to complete a deduction because of a vicious regress?

somnicule

I think most formulations of logical uncertainty give axioms and proven propositions probability 1, or 1-minus-epsilon.

alex_zag_al

Yes, because we're trying to express uncertainty about the consequences of axioms, not about the axioms themselves.

common_law's thinking does seem to be something people actually do. Like, we're uncertain about the consequences of the laws of physics, while simultaneously being uncertain of the laws of physics, while simultaneously being uncertain whether we're thinking about it in a logical way. But it's not the kind of uncertainty that we're trying to model in the applications I'm talking about. The missing piece in these applications is probabilities conditional on the axioms.

common_law

Philosophically, I want to know how you calculate the rational degree of belief in every proposition.

If you automatically assign the axioms an actually unobtainable certainty, you don't get the rational degree of belief in every proposition, as the set of "propositions" includes those not conditioned on the axioms.

alex_zag_al

Hmm. Yeah, that's tough. What do you use to calculate probabilities of the principles of logic you use to calculate probabilities?

Although, it seems to me that a bigger problem than the circularity is that I don't know what kinds of things are evidence for principles of logic. At least for the probabilities of, say, mathematical statements, conditional on the principles of logic we use to reason about them, we have some idea. For example, finding that many consequences of a generalization are true is evidence for the generalization. A proof of an analogous theorem is evidence for a theorem. So I can see that the kinds of things that are evidence for mathematical statements are other mathematical statements.

I don't have nearly as clear a picture of what kinds of things lead us to accept principles of logic, and what kind of statements they are. Whether they're empirical observations, principles of logic themselves, or what.

hairyfigment

Hmm? If these are physically or empirically meaningful axioms, we can apply regular probability to them. Now, the laws of logic and probability themselves might pose more of a problem. I may worry about that once I can conceive of them being false.
