I've been reading about Church, a new computer language developed in a prize-winning MIT doctoral thesis and designed to make computers better at modeling probability distributions.

The idea is that simulation is cheap (given a probability distribution, generate an example outcome), but inference is expensive (given a set of data, which probability distribution most likely generated it?). This is essentially a Bayesian task, and it's what we want to do to understand, say, which borrowers are likeliest to default, or where terrorists are likely to strike again. It's also a necessary building block of AI. The problem is that the space of probability distributions that could explain the data is very big: infinitely big in reality, of course, but still exponentially big after discretizing. Also, while the computational cost of evaluating f(g(x)) is just the cost of f plus the cost of g, composing two conditional probability distributions B|A and C|B involves a sum over the intermediate variable:

P(C|A) = Σ_B P(C|B) P(B|A)

Evaluating that sum takes time proportional to the number of states B can take, which is exponential in B's dimension once you discretize; so the cost of composing distributions blows up exponentially rather than adding up linearly the way function composition does.
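
To make the cost concrete, here is a toy Python sketch (the grid sizes and distributions are made up purely for illustration) of composing two discretized conditionals by explicit summation over the intermediate variable:

    import numpy as np

    # Discretize A, B, C onto finite grids. p_b_given_a[i, j] = P(B=j | A=i),
    # p_c_given_b[j, k] = P(C=k | B=j). The distributions here are random toys.
    n_a, n_b, n_c = 100, 100, 100
    p_b_given_a = np.random.dirichlet(np.ones(n_b), size=n_a)  # each row sums to 1
    p_c_given_b = np.random.dirichlet(np.ones(n_c), size=n_b)

    # P(C=k | A=i) = sum over j of P(C=k | B=j) * P(B=j | A=i).
    # Every entry needs a sum over all n_b states of B, and if B is really a
    # discretized multidimensional quantity, n_b is exponential in its dimension.
    p_c_given_a = p_b_given_a @ p_c_given_b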

Church is an attempt to solve this problem.  (Apparently it's a practical attempt, because the founders have already started a company, Navia Systems, using this structure to build probabilistic computers.)  The idea is, instead of describing a probability distribution as a deterministic procedure that evaluates the probabilities of different events, to represent it as a probabilistic procedure for generating samples from it.  That is, a random variable is actually a random variable.  This means that repeating a computation will not give the same result each time, because evaluating a random variable doesn't give the same result each time.  There's a computational advantage here because it's possible to compose random variables without summing over all their possible values.
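
By contrast, here is the sampling view in plain Python (not Church code; the two samplers are arbitrary stand-ins): composing conditionals is just feeding one sample into the next sampler, with no sum over B's values.

    import random

    def sample_b_given_a(a):
        # stand-in for any procedure that draws one sample from P(B | A=a)
        return a + random.gauss(0, 1)

    def sample_c_given_b(b):
        # stand-in for any procedure that draws one sample from P(C | B=b)
        return b * random.gauss(1, 0.1)

    def sample_c_given_a(a):
        # composition: one draw from B|A, fed into one draw from C|B.
        # The cost is the two samplers' costs added, not a sum over B.
        return sample_c_given_b(sample_b_given_a(a))

    draws = [sample_c_given_a(0.0) for _ in range(1000)]  # samples from P(C | A=0)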

Church is based on Lisp. At the lowest level, it replaces Boolean gates with stochastic digital circuits.  These circuits are wired together to form Markov chains (the probabilistic counterpart of finite state machines).  At the top, it's possible to define probabilistic procedures for generating samples from recursively defined distributions.
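
As a taste of the "recursively defined distributions" point, here is the standard geometric-distribution example written as a Python sampler rather than in Church's Lisp syntax (flip, a weighted coin, is a name borrowed from Church's vocabulary; the rest is just my illustration):

    import random

    def flip(p=0.5):
        # a weighted coin: the basic stochastic primitive
        return random.random() < p

    def geometric(p):
        # a recursively defined sampler: count failures until the first success.
        # There is no table of probabilities anywhere, only a procedure;
        # running it repeatedly yields samples from a geometric distribution.
        return 0 if flip(p) else 1 + geometric(p)

    draws = [geometric(0.3) for _ in range(10)]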

 

When I saw this paper, I thought it might make an interesting top-level post -- unfortunately, I'm not the one to do it.  I don't know enough computer science; it's been ages since I've touched Lisp, and lambda calculus is new to me.  So this is an open call for volunteers -- any brave Bayesians want to blog about a brand new computer language?

(As an aside, I think we need more technical posts so we don't spend all our time hissing at each other; how would people feel about seeing summaries of recent research in the neuro/cognitive science/AI-ish cluster?)

22 comments

The key idea behind Church and similar languages is that they allow us to express and formally reason about a large class of probabilistic models, many of which cannot be formalized in any concise way as Bayes nets.

Bayes nets express generative models, i.e. processes that generate data. To infer the states of hidden variables from observations, you condition the Bayes net and compute a distribution on the hidden variable settings using Bayesian inference or some approximation thereof. A particularly popular class of approximations is the class of sampling algorithms, e.g. Markov Chain Monte Carlo methods (MCMC) and importance sampling.
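
As a toy illustration of that conditioning step (the net and its numbers are made up), here is likelihood weighting, one of the simplest importance samplers, in Python:

    import random

    # Toy two-node Bayes net: rain -> grass_wet, with invented probabilities.
    P_RAIN = 0.2
    P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}

    def likelihood_weighting(num_samples=100000, observed_wet=True):
        # Importance sampling with the prior as proposal: sample the hidden
        # variable from its prior, then weight by the likelihood of the observation.
        weighted = {True: 0.0, False: 0.0}
        for _ in range(num_samples):
            rain = random.random() < P_RAIN
            weight = P_WET_GIVEN_RAIN[rain] if observed_wet else 1 - P_WET_GIVEN_RAIN[rain]
            weighted[rain] += weight
        return weighted[True] / (weighted[True] + weighted[False])

    # Estimate of P(rain | grass_wet); the exact value is 0.18 / 0.26 ≈ 0.692.
    print(likelihood_weighting())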

Probabilistic programs express a larger class of models, but very similar approximate inference algorithms can be used to condition a program on observations and to infer the states of hidden variables. In both machine learning and cognitive science, when you are doing Bayesian inference with some model that expresses your prior belief, you usually code both model and inference algorithm and make use of problem-specific approximations to Bayesian inference. Probabilistic programs separate model from inference by using universal inference algorithms.
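
Here is a minimal Python sketch of what that separation looks like (my own illustrative names, not any particular language's API): the model is an ordinary generative function, and one generic conditioning routine, plain rejection sampling, works for any such function, however inefficiently.

    import random

    def rejection_query(model, condition, num_samples=5000):
        # Universal (if slow) inference: run the generative program repeatedly
        # and keep only the runs whose output satisfies the condition.
        accepted = []
        while len(accepted) < num_samples:
            trace = model()
            if condition(trace):
                accepted.append(trace)
        return accepted

    # Any generative function can serve as the model; this one has two hidden coins.
    def model():
        return {"a": random.random() < 0.5, "b": random.random() < 0.5}

    # Condition on (a or b); estimate P(a | a or b), which is 2/3.
    samples = rejection_query(model, lambda t: t["a"] or t["b"])
    print(sum(t["a"] for t in samples) / len(samples))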

If you are interested in this set of ideas in the context of cognitive science, I recommend this interactive tutorial.

Church is based on Lisp. At the lowest level, it replaces Boolean gates with stochastic digital circuits. These circuits are wired together to form Markov chains (the probabilistic counterpart of finite state machines.) At the top, it's possible to define probabilistic procedures for generating samples from recursively defined distributions.

This confuses Church as a language for expressing generative models with ideas on how to implement such a language in hardware. There are three different ideas here:

  • Church as a language for generative models with sampling-based semantics
  • MCMC as an approximate inference method for such models (that can be implemented on traditional von Neumann architectures); a bare-bones sketch follows this list
  • Machine architectures that are well-suited for MCMC
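
On the MCMC bullet, here is a bare-bones Metropolis-Hastings loop in Python, just to show the shape of the algorithm; the target density and proposal width are arbitrary choices for illustration.

    import math
    import random

    def log_target(x):
        # unnormalized log density we want samples from (a standard normal here)
        return -0.5 * x * x

    def metropolis_hastings(num_samples, step=1.0):
        x = 0.0
        samples = []
        for _ in range(num_samples):
            proposal = x + random.gauss(0, step)        # symmetric random-walk proposal
            log_accept = log_target(proposal) - log_target(x)
            if math.log(random.random()) < log_accept:  # accept with prob min(1, ratio)
                x = proposal
            samples.append(x)
        return samples

    chain = metropolis_hastings(10000)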

So this is an open call for volunteers -- any brave Bayesians want to blog about a brand new computer language?

I'll write an exposition within the next few weeks if people are interested.

[anonymous]

Thank you so much. I'm sorry I was confused, and I'm glad someone is around who knows more. And thank you so much for the tutorial! This is proving to be much clearer than the papers. You are a prince.

Cyan

Do eet.

you usually code both model and inference algorithm and make use of problem-specific approximations to Bayesian inference. Probabilistic programs separate model from inference by using universal inference algorithms.

Naively, separation of model from inference algorithm seems like a terrible idea. People use problem-specific approximation algorithms because if they don't exploit specific features of the model, inference will be completely intractable. Often this consideration is so important that people will use obviously sub-optimal models, if those models support fast inference.

Yes, deriving mechanisms that take complex models and turn them into something tractable is mostly an open problem.

Please do.

(As an aside, I think we need more technical posts so we don't spend all our time hissing at each other; how would people feel about seeing summaries of recent research in the neuro/cognitive science/AI-ish cluster?)

Upvoted for this alone; I've always felt one of the things that kept LW good was that the subject matter and terminology acted as a filter for participation.

Are the inference methods really advanced enough to make this kind of thing practical? I know that in statistics, MCMC methods have long been specialized enough that in order to use them for even relatively simple problems (let's say a 20-parameter linear model with non-normal errors) you have to know what you're doing. I have been interested in this field because recently there have been algorithms that promise to make inference much easier for problems with continuous variables (in particular, gradient/Hessian-based algorithms analogous to Newton methods in optimization). Probabilistic programs seem like a much more difficult category to work with, likely to get very slow very fast. Am I missing something? What do people hope to accomplish with such a language?

Current universal inference methods are very limited, so the main advantages of using probabilistic programming languages are (1) the conceptual clarity you get by separating generative model and inference and (2) the ability to write down complex nonparametric models and immediately be able to do inference, even if it's inefficient. Writing a full model+inference implementation in Matlab, say, takes you much longer, is more confusing and less flexible.

That said, some techniques that were developed for particular classes of problems have a useful analog in the setting of programs. The gradient-based methods you mention have been generalized to work on any probabilistic program with continuous parameters.

Interesting; I suppose that does seem somewhat useful, for discussion purposes at the very least. I am curious about how a gradient-based method can work without continuous parameters: that is counterintuitive for me. Can you throw out some keywords? Keywords for what I was talking about: Metropolis-adjusted Langevin algorithm (MALA), Stochastic Newton, any MCMC with 'Hessian' in the name.

They don't work without continuous parameters. If you have a probabilistic program that includes both discrete and continuous parameters, you can use gradient methods to generate MH proposals for your continuous parameters. I don't think there are any publications that discuss this yet.
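
To give the flavor (my own toy sketch, not taken from any paper or library): the discrete variable gets an ordinary MH flip, while the continuous parameter gets a Langevin-style proposal that drifts along the gradient of the log density, as in MALA, with the usual correction for the asymmetric proposal.

    import math
    import random

    MU = {0: -2.0, 1: 2.0}   # invented toy model: a discrete mode m and a
    P_M = {0: 0.3, 1: 0.7}   # continuous x, with x | m ~ Normal(MU[m], 1)

    def log_joint(m, x):
        return math.log(P_M[m]) - 0.5 * (x - MU[m]) ** 2

    def grad_x(m, x):
        return -(x - MU[m])  # d/dx of log_joint

    def proposal_mean(m, x, eps):
        # Langevin/MALA drift: step along the gradient of the log density
        return x + 0.5 * eps ** 2 * grad_x(m, x)

    def step(m, x, eps=0.5):
        # 1) plain MH flip for the discrete variable (symmetric proposal)
        m_new = 1 - m
        if math.log(random.random()) < log_joint(m_new, x) - log_joint(m, x):
            m = m_new
        # 2) gradient-informed proposal for the continuous variable
        fwd_mean = proposal_mean(m, x, eps)
        x_new = random.gauss(fwd_mean, eps)
        rev_mean = proposal_mean(m, x_new, eps)
        log_q_fwd = -0.5 * ((x_new - fwd_mean) / eps) ** 2
        log_q_rev = -0.5 * ((x - rev_mean) / eps) ** 2
        if math.log(random.random()) < (log_joint(m, x_new) - log_joint(m, x)
                                        + log_q_rev - log_q_fwd):
            x = x_new
        return m, x

    m, x = 0, 0.0
    samples = []
    for _ in range(10000):
        m, x = step(m, x)
        samples.append((m, x))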

Oh, ok that makes perfect sense. Breaking inference problems into sub problems and using different methods on the sub problems seems like a common technique.

Writing a full model+inference implementation in Matlab, say, takes you much longer, is more confusing and less flexible.

Defining and implementing a whole programming language is way more work than writing a library in an existing language. A library, after all, is a language, but one you don't have to write a parser, interpreter, or compiler for.

I was comparing the two choices people face who want to do inference in nontrivial models. You can either write the model in an existing probabilistic programming language and get inefficient inference for free or you can write model+inference in something like Matlab. Here, you may be able to use libraries if your model is similar enough to existing models, but for many interesting models, this is not the case.

Ok, I was comparing the effort at the meta-level: providing tools for computing in a given subject area either by making a new language, or by making a new library in an existing language.

It looks like there are a number of other probabilistic languages out there.

Including lambda circle (? the name is symbolic), which is associated with Sebastian Thrun and some real-world robotics. It's an OCaml variant.

There is also another OCaml variant called IBAL, which doesn't seem to use the sampling method the other two are talking about.

LtU has some discussion on them.

I'm also not a Lisp speaker, but I have a passing knowledge of OCaml syntax, so I might have a look at lambda circle and blog on it, once I figure out how to Google for it.

Including lambda circle (? the name is symbolic), which is associated with Sebastian Thrun and some real-world robotics. It's an OCaml variant.

I was reminded of Sebastian Thrun's CES that I heard about when reading EY's Selling Non-Apples. Anyone know the differences between them?

I was really interested in CES when I read about it because of the idea of being able to seamlessly combine multiple sources of information about the same phenomenon, which is what you would need for a real-world agent that can perform general reasoning.

(I was especially interested when I saw the scene in The Dark Knight where they convert the sonar data from a bunch of cell phones into an image of a city, and got to thinking about how you'd implement something like that, to the extent that it's realistic, but please don't hold that against me ...)

Regarding the parenthetical: summaries of that sort would be good.

I'm in general pessimistic that we're ever going to have a really good approach to handling these sorts of probability calculations, because some versions of them are hard. For example, SAT, which is NP-complete, can sort of be thought of as trying to get a probability distribution where each event is assigned a probability of 0 or 1. This doesn't resemble anything rigorous, more just a general intuition. However, the fact that they think that Church might be used for practical applications does make this look good. Also, I may be too focused on a computational complexity approach to things rather than a more general "this makes it easier to program and understand" approach.
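
For what it's worth, the intuition can be made slightly more concrete (this is my own toy encoding, nothing from Church): take a SAT formula and define an unnormalized distribution that puts weight 1 on satisfying assignments and 0 elsewhere. Deciding whether anything has nonzero probability is SAT itself, and the normalizing constant counts satisfying assignments.

    from itertools import product

    # Toy CNF formula; positive integers are variables, negatives are negations.
    clauses = [(1, -2, 3), (-1, 2), (2, -3)]

    def satisfies(assignment, clauses):
        return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in clauses)

    # Unnormalized "distribution" over assignments: weight 1 if satisfying, else 0.
    weights = {}
    for values in product([False, True], repeat=3):
        assignment = {i + 1: v for i, v in enumerate(values)}
        weights[values] = 1.0 if satisfies(assignment, clauses) else 0.0

    # The normalizing constant is the number of satisfying assignments (#SAT).
    print(sum(weights.values()))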

Probabilistic inference in general is NP-hard, but it is not clear that (1) this property holds for the kinds of problems people are interested in and, even if it does, that (2) approximate probabilistic inference is hard for this class of problems. For example, if you believe this paper, probabilistic inference without extreme conditional probabilities is easy.

[anonymous]

(As an aside, I think we need more technical posts so we don't spend all our time hissing at each other; how would people feel about seeing summaries of recent research in the neuro/cognitive science/AI-ish cluster?)

Yes to more technical summary posts, 'cuz I think they could be useful for exposing people doing IA or AI research to ideas that might be otherwise missed in the information sea.

To the aside: I'd find it interesting at the least, and I'd hope that some of the research would come with information that could actually be put into use. Eliezer's sequences for the site were great because they built solidly upon research.

If anyone is interested, I'm doing an MSc project using Church; see here. Does anybody know of an intuitive explanation of Metropolis-Hastings? The literature is replete with the assumption that people are statisticians or professional mathematicians, and I haven't got the time to catch up.