Infrabayesianism seems to me (Abram) like a very promising framework for addressing at least some of the problems of AI alignment.
- Like logical induction, it solves the realizability problem, creating an epistemic theory suitable for embedded agents.
- Unlike logical induction, and unlike standard Bayesian decision theory, it presents a theory of epistemics directly relevant to proving decision-theoretic results (in particular, useful learning-theoretic guarantees). Logical induction and standard Bayesian decision theory can both produce meaningful loss-bounding guarantees with respect to predictive error, but bounding decision error appears challenging for these approaches. Infrabayes provides a systematic way around this problem. Since decision error is much more meaningful for bounding risk, this seems highly relevant to AI safety.
- Being a new perspective on very basic issues, Infrabayesianism (or perhaps successors to the theory) may turn out to shed light on a number of other important questions.
(For more information on InfraBayes, see the infrabayesianism sequence.)
However, I believe infrabayesianism has a communication problem. I've chatted with several people who have strongly "bounced off" the existing write-ups. (I'm tempted to conclude this is a near-universal experience.)
There was even a post asking whether a big progress write-up -- applying InfraBayes to naturalized induction -- had simply fallen through the cracks.
Personally, even though I've carefully worked through the first three posts and revisited my notes to study them more than once, I am still not fluent enough to confidently apply the concepts in my own work when they seem relevant.
I would like to change this situation if possible. It's not obvious to me what the best solution is, but it seems to me like it could be possible to find someone who can help.
Properties which would make an applicant interesting:
- Must be capable of fully understanding the mathematics.
- See the sequence to get an idea of what kind of mathematics is involved; mainly topology, functional analysis, measure theory and convex analysis. Background in reinforcement learning theory is a bonus.
- Must have good judgement when it comes to math exposition.
- Must be a good editor.
Details of the job description are to be worked out, but probable activities include: producing independent write-ups that re-explain InfraBayes from the ground up in a more accessible way; assisting with the creation of a textbook and an exercise sheet; and editing or writing up additional posts.
(Even if not applying, discussion in the comments about possible ways to approach this bottleneck may be fruitful!)
It seems worth remembering the AXRP podcast episode on InfraBayesianism, which I think was the first time I didn't bounce off something related to this?
I've had on my TODO to try reading the LW post transcript of that and seeing if it could be distilled further.
A future episode might include a brief distillation of that episode ;)
There was one paragraph from the podcast that I found especially enlightening—I excerpted it here (Section 3.2.3).
Ooh! Shiny! I forgot that the InfraBayes sequence existed, but when I went back I saw that I "read" the first four of them before "bouncing off" as you say. Just now I tried to dip back into The Many Faces of Infra-Beliefs (hoping to get a summary) and it is so big! And it is not a summary <3
That post has a section titled "Deconfusing the Cosmic Ray Problem" which could be an entire post... or maybe even an entire sequence of its own if the target audience was like "bright high school students with some calc and trig and stats". You'd have to explain and motivate the Cosmic Ray Problem and explain all the things that don't work first, then explain inframeasures in a small and practical enough way to actually apply them slowly, then turn the crank, and then see how inframeasure math definitely "gives the intuitive right answer" in a way that some definite alternative math does not.
Reading and googling and thinking... The sequence there now seems like it is aimed at securing intellectual priority? Like, I tried to find the people who invented it, wrote a textbook on it, and presented about it at conferences, by searching for [inframeasure theory] on Google Videos, and there were... literally zero videos?
This caused me to loop back to the first post of the sequence and realize that this was all original research, explained from scratch for something-like-the-first-time-ever in that sequence, not a book report on something invented by Kolmogorov or Schmidhuber or whoever, and known to be maybe finally sort of mature, and suspected to be useful.
So in terms of education, my priors are... assuming that this is as important as causal graphs, this will take decades to come into general awareness, sorta like Pearl's stuff existed back in the 1980s but widespread understanding of even this simple thing lagged for a really really long time.
Honestly, trying to figure it out, I still don't know what an inframeasure actually is in words.
If I had to guess, I'd wonder if it was maybe just a plain old bayesian model, but instead of a bayesian model with an event space that's small and cute and easy to update on, maybe it is an event space over potential infinities of bayesian agents (with incompatible priors?) updating on different ways of having conceptual simplifications of potentially infinitely complicated underlying generic event spaces? Maybe?
If this is even what it is, then I'd be tempted to say that the point was "to put some bayes in your bayes so you can update on your updates". Then maybe link it, conceptually, to Meta MCMC stuff? But my hunch is that that's not exactly what's going on here, and it might be very very very different from Meta MCMC stuff.
Infradistributions are a generalization of sets of probability distributions. Sets of probability distributions are used in "imprecise bayesianism" to represent the idea that we haven't quite pinned down the probability distribution. The most common idea about what to do when you haven't quite pinned down the probability distribution is to reason in a worst-case way about what that probability distribution is. Infrabayesianism agrees with this idea.
One of the problems with imprecise bayesianism is that its proponents haven't come up with a good update rule -- it turns out to be much trickier than it looks. You can't just update all the distributions in the set, because [reasons i am forgetting]. Part of the reason infrabayes generalizes imprecise bayes is to fix this problem.
So you can think of an infradistribution mostly as a generalization of "sets of probability distributions" which has a good update rule, unlike "sets of probability distributions".
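As a rough illustration of the worst-case reasoning described above, here is a minimal Python sketch that treats a belief as a finite set of ordinary probability distributions. This only covers the simplest case; actual infradistributions are more general than this. The urn, the payoffs and all function names are hypothetical.

```python
from fractions import Fraction

def expectation(dist, utility):
    """Expected utility under one probability distribution (dict outcome -> prob)."""
    return sum(p * utility[o] for o, p in dist.items())

def worst_case_value(credal_set, utility):
    """Pessimistic evaluation: the lowest expectation over the whole set."""
    return min(expectation(d, utility) for d in credal_set)

def maximin_action(credal_set, utilities):
    """Choose the action whose worst-case expected utility is largest."""
    return max(utilities, key=lambda a: worst_case_value(credal_set, utilities[a]))

# Hypothetical urn: we only know that P(red) is one of 1/5, 1/2, 4/5.
urn = [{"red": Fraction(k, 10), "black": 1 - Fraction(k, 10)} for k in (2, 5, 8)]
utilities = {
    "bet_red": {"red": 1, "black": 0},
    "bet_black": {"red": 0, "black": 1},
    "safe": {"red": Fraction(2, 5), "black": Fraction(2, 5)},
}
print(maximin_action(urn, utilities))  # safe
```

A precise Bayesian who settled on P(red) = 1/2 would take either bet (worth 1/2). The imprecise agent sees a worst case of 1/5 for each bet, so it prefers the guaranteed 2/5.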
Why is this great?
Mainly because "sets of probability distributions" are actually a pretty great idea for decision theory. Regular Bayes has the "realizability" problem: in order to prove good loss bounds, you need to assume the prior is "realizable", which means that one of the hypotheses in the prior is true. For example, with Solomonoff, this amounts to assuming the universe is computable.
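The standard derivation of Bayesian loss bounds shows why realizability matters. A Bayesian mixture's cumulative log loss never exceeds that of any hypothesis in the prior plus ln(1/prior weight of that hypothesis). The guarantee is therefore only informative when some hypothesis in the prior is close to the truth. The sketch below assumes a small, hypothetical class of Bernoulli hypotheses, a uniform prior and simulated data.

```python
import math
import random

def mixture_log_loss(biases, prior, bits):
    """Cumulative log loss (in nats) of the Bayesian mixture predictor."""
    weights = list(prior)
    total = 0.0
    for b in bits:
        # Likelihood of the observed bit under each Bernoulli hypothesis.
        likes = [p if b == 1 else 1.0 - p for p in biases]
        pred = sum(w * lik for w, lik in zip(weights, likes))
        total -= math.log(pred)
        # Bayes' rule: posterior weight is proportional to prior * likelihood.
        weights = [w * lik / pred for w, lik in zip(weights, likes)]
    return total

def hypothesis_log_loss(p, bits):
    """Cumulative log loss of always predicting P(bit = 1) = p."""
    return -sum(math.log(p if b == 1 else 1.0 - p) for b in bits)

# Illustrative data: the true bias 0.7 happens to be in the hypothesis class.
rng = random.Random(0)
bits = [1 if rng.random() < 0.7 else 0 for _ in range(1000)]
biases = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [0.2] * 5

mix = mixture_log_loss(biases, prior, bits)
for p, w in zip(biases, prior):
    # Regret against each hypothesis is at most ln(1 / prior weight).
    assert mix <= hypothesis_log_loss(p, bits) + math.log(1 / w) + 1e-6
```

If the data instead came from a bias of, say, 0.6, the inequality would still hold. However, it would only compare the mixture to the best of the five listed biases, not to the true process.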
Using sets instead, you don't need to have the correct hypothesis in your prior; you only need to have an imprecise hypothesis which includes the correct hypothesis, and "few enough" other hypotheses that you get a reasonably tight bound on loss.
Unpacking that a little more: if the learnability condition is met, then if the true environment is within one of the imprecise hypotheses in the prior, then we can eventually do as well as an agent who just assumed that particular imprecise hypothesis from the beginning (because we eventually learn that the true world is within that imprecise hypothesis).
This allows us to get good guarantees against non-computable worlds, if they have some computable regularities. Generalizing imprecise probabilities to the point where there's a nice update rule was necessary to make this work.
There is currently no corresponding result for logical induction. (I think something might be possible, but there are some onerous obstacles in the way.)
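The elementary fact behind this kind of guarantee can be checked numerically. For illustration only, assume that an imprecise hypothesis is the convex hull of a few distributions over four outcomes, and that we evaluate one fixed policy with hypothetical per-outcome payoffs. Because expectation is linear, the worst case over the hull is attained at one of the listed corners. That worst-case value therefore lower-bounds the true expected utility for every environment inside the hull, however that environment was generated. This sketch does not model the learning part (discovering which imprecise hypothesis contains the truth), which is the harder part.

```python
import random

def expected_utility(probs, payoffs):
    return sum(p * u for p, u in zip(probs, payoffs))

def mixture(dists, coeffs):
    """Convex combination of distributions over the same finite outcomes."""
    return [sum(c * d[i] for c, d in zip(coeffs, dists)) for i in range(len(dists[0]))]

# Hypothetical imprecise hypothesis: the convex hull of three distributions.
corners = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
payoffs = [1.0, 0.0, 0.5, 0.2]  # utility of each outcome under some fixed policy

# Linearity: the minimum over the hull is attained at a corner.
guarantee = min(expected_utility(d, payoffs) for d in corners)

rng = random.Random(0)
for _ in range(1000):
    raw = [rng.random() for _ in corners]
    total = sum(raw)
    true_env = mixture(corners, [r / total for r in raw])  # any point in the hull
    assert expected_utility(true_env, payoffs) >= guarantee - 1e-12
```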
The reason you can't just update all the distributions in the set is, it wouldn't be dynamically consistent. That is, planning ahead what to do in every contingency versus updating and acting accordingly would produce different policies.
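A toy example (a hypothetical construction, not taken from the sequence) makes the dynamic-inconsistency point concrete. Two hypotheses agree that x is a fair coin but disagree on whether the hidden bit y equals x or its negation. The agent sees x, then either guesses y (payoff 1 if right, 0 if wrong) or takes a safe payoff of 2/5. Judged by the worst case over the set, the best plan made in advance is "always guess 0", which is worth 1/2 under both hypotheses. By contrast, an agent that conditions each distribution on x and re-optimises finds that every guess has a worst case of 0, so it retreats to the safe option, which is worth only 2/5.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("guess0", "guess1", "safe")

def utility(action, y):
    """Payoff 1 for a correct guess of y, 0 for a wrong one, 2/5 for 'safe'."""
    if action == "safe":
        return Fraction(2, 5)
    return 1 if action == f"guess{y}" else 0

# Hypothetical credal set of joint distributions over (x, y):
# x is a fair coin under both, but y == x under H1 and y == 1 - x under H2.
HYPOTHESES = [
    {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)},  # H1
    {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)},  # H2
]

def policy_value(policy, joint):
    """Expected utility of a policy (dict x -> action) under one distribution."""
    return sum(p * utility(policy[x], y) for (x, y), p in joint.items())

def worst_case(policy):
    return min(policy_value(policy, joint) for joint in HYPOTHESES)

def plan_ahead():
    """Maximin over complete policies, chosen before x is observed."""
    policies = [dict(zip((0, 1), acts)) for acts in product(ACTIONS, repeat=2)]
    return max(policies, key=worst_case)

def update_then_act():
    """Condition every distribution in the set on x, then pick a maximin action."""
    policy = {}
    for x in (0, 1):
        posteriors = []
        for joint in HYPOTHESES:
            px = sum(p for (xx, _), p in joint.items() if xx == x)  # > 0 here
            posteriors.append({y: p / px for (xx, y), p in joint.items() if xx == x})
        policy[x] = max(ACTIONS, key=lambda a: min(
            sum(p * utility(a, y) for y, p in post.items()) for post in posteriors))
    return policy

print(plan_ahead(), worst_case(plan_ahead()))  # {0: 'guess0', 1: 'guess0'} 1/2
print(update_then_act(), worst_case(update_then_act()))  # {0: 'safe', 1: 'safe'} 2/5
```

The planned policy hedges across the two branches. Naive conditioning discards what the policy was earning in the branch that didn't happen, and that is where the two computations part ways.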
The correct update rule actually does appear in the literature (Gilboa and Schmeidler 1993). They don't introduce any of our dual formalisms of a-measures and nonlinear functionals, instead just viewing beliefs as orders on actions, but the result is equivalent. So, our main novelty is really combining imprecise probability with reinforcement learning theory (plus consequences such as FDT-like behavior and extensions such as physicalism) rather than the update rule (even though our formulation of the update rule has some advantages).
I'm not sure the part about "update rule was necessary" is true. Having a nice update rule is nice, but in practice it seems more important to have nice learning algorithms, and learning algorithms are something I have only begun to work on. As to what kind of infradistributions we actually need (on the range between crisp and fully general), it's not clear. Physicalism seems to work better with cohomogeneous infradistributions than with crisp ones, but the inroads in learning suggest affine infradistributions, which are even narrower than crisp. In infra-Bayesian logic, both have different advantages (cohomogeneous admits continuous conjunction, affine might admit efficient algorithms). Maybe some synthesis is possible, but at present I don't know.
See this for some initial observations. Since then I have arrived at regret bounds for stochastic linear affine bandits (both Õ(√n) for the general case and Õ(log n) for the gap case, given an appropriate definition of "gap") with a UCB-type algorithm. In addition, there is Tian et al 2020, which is stated as studying zero-sum games but can be viewed as a regret bound for infra-MDPs.
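For readers who haven't met the "UCB-type" family mentioned above, here is the classic UCB1 index rule for ordinary stochastic multi-armed bandits, as a point of reference only. It is not the algorithm for stochastic linear affine bandits referred to above (whose details aren't given here). The Bernoulli arm means, the horizon and the function name are hypothetical.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise its estimate
        else:
            # Optimism: empirical mean plus a confidence bonus that shrinks
            # as an arm is pulled more often.
            arm = max(range(k), key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)  # most pulls should go to the last (best) arm
```

The bonus term makes the rule optimistic: rarely tried arms get the benefit of the doubt until enough data rules them out. On ordinary bandits with a fixed gap between the best and second-best arm, UCB1's expected regret grows logarithmically in the horizon, which is the ordinary-bandit analogue of the gap-dependent rate mentioned above.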
Are beta and gamma distributions infradistributions in the sense that they are different sets of probability distributions whose behavior is parameterized? Or multivariate beta distributions?
For those with math backgrounds not already familiar with InfraBayes (maybe people share the post with their math-background friends), can there be specifics for context? Like:
As someone kinda confused about InfraBayesianism who nonetheless has some sense that it's important, I am glad to see this initiative. :)
If Infrabayesianism is indeed that important, then I'd also love to read something geared more towards the "lay LessWronger". As a lay LessWronger myself, I'd be happy to beta-read such an article and give feedback.
Would it be possible to set up a wiki or something similar for this? Pedagogy seems easier to crowdsource than hardcore research, and the sequences-style posting here doesn't really lend itself to incremental improvement. I think this project will be unnecessarily restricted by requiring one person to understand the whole theory before putting out any contributions. The few existing expository posts on infrabayes could already be polished and joined into a much better introduction if a collaborative editing format allowed for it. Feedback in the comments could be easily integrated into the documents instead of requiring reading of a large conversation, etc.
I was talking with someone just today about the need for something like that. I also expect that InfraBayes is one of the approaches to alignment that need a lot of translation effort to become palatable to proponents of other approaches.
Additional point though: I feel like the applicant should probably also have a decent understanding of alignment, as a lot of the value of such a communication and translation (for me at least) would come from understanding its value for alignment.
This post caught my eye as my background is in mathematics and I was, in the not-too-distant past, excited about the idea of rigorous mathematical AI alignment work. My mind is still open to such work, but I'll be honest: I've since become a bit less excited than I was. In particular, I definitely "bounced off" the existing write-ups on Infrabayesianism, and without already knowing what it's all about, it's not clear it's worth one's time. So, at the risk of making a basic or even cynical point: the remuneration of the proposed job could be important for getting attention and incentivising people who are on the fence.
Offering a bounty on what you want seems sensible here. It seemed like it worked OK for ELK proposals, so why not here?
It could also work here. But I do feel like pointing out that the bounty format has other drawbacks. Maybe it works better when you want a variety of bitesize contributions, like various different proposals? I probably wouldn't do work like Abram proposes - quite a long and difficult project, I expect - for the chance of winning a prize, particularly if the winner(s) were decided by someone's subjective judgement.
My intuition is that you could break this task down into smaller chunks, like applications of Infra-bayes and musings on why Infra-bayes worked better than existing tools there (or worse!), which someone could do within a couple of weeks, and award bounties for those tasks. Then offer jobs to whomever seems like they could do good distillations.
I think that for a few 100-hour tasks, you might need to offer maybe $50k-$100k. That sounds crazy high? Well, AI safety is talent-constrained, it doesn't look like much is being done with the money, and MIRI seems to think there's a high discount rate (doom within a decade or two), so money should be spent now on tasks that seem important.
I'd also be happy to include a good summary in the Alignment Newsletter (here's the previous summary, which doesn't include many of the newer results).
If I haven't found a way to extend my post-doc position (ending in August) by mid-July and by some miracle this job offer is still open, it could be the perfect job for me. Otherwise, I look forward to seeing the results.
Job applicants often can't start right away; I would encourage you to apply!
Consider applying to this now anyway; applications often can be pretty quick and there's not all that much value in delaying.
Cool that this is (hopefully) being done! I have had this on my reading list for a while and since this is about the kind of problems I also spend much time thinking about, I definitely have to understand it better at some point. I guess I can snooze it for a bit now. :P Some suggestions:
Maybe someone could write an FAQ page? Also, a somewhat generic idea is to write something that is more example based, perhaps even something that just solely gives examples. Part of why I suggest these two is that I think they can be written relatively mechanically and therefore wouldn't take that much time and insight to write. Also, maybe Vanessa or Alex could also record a talk? (Typically one explains things differently in talks/on a whiteboard and some people claim that one generally does so better than in writing.)
I think for me the kind of writeup that would have been most helpful (and maybe still is) would be a relatively short (5-15 pages), clean, self-contained article that communicates the main insight(s), perhaps at the cost of losing generality and leaving some things informal. So somewhere in between the original intro post / the content in the AXRP episode / Rohin's summary (all of which explain the main idea but are very informal) and the actual sequence (which seems to require wading through a lot of intrinsically not-that-interesting material before getting to the juicy bits). I don't know to what extent this is feasible, given that I haven't read any of the technical parts yet. (Of course, a lot of projects have this presentation problem, but I think usually there's some way to address it. E.g., compare the logical induction paper, which probably has a number of important technical aspects that I still don't understand or have since forgotten, but where, by making a lot of things a bit informal, the main idea can be grasped from the short version or from a talk.)