This video series was produced as part of a project through the 2022 SERI Summer Research Fellowship (SRF) under the mentorship of Diffractor.
Epistemic effort: Before working on these videos, we spent ~400 collective hours working to understand infra-Bayesianism (IB) for ourselves. We built up our own understanding of IB primarily by working together on the original version of Infra-Exercises Part I and subsequently creating a polished version of the problem set in hopes of making it more user-friendly for others.
We then spent ~320 hours writing, shooting, and editing this video series. Part 5 through Part 8 of the video series were checked for accuracy by Vanessa Kosoy, but any mistakes that remain in any of the videos are fully our own.
Goals of this video series
IB appears to have quite a bit of promise. It seems plausible that IB itself or some better framework that builds on and eventually replaces IB could end up playing a significant role in solving the alignment problem (although, as with every proposal in alignment, there is significant disagreement about this). But the original sequence of posts on IB appears to be accessible only to those with a graduate-level understanding of math. Even those with a graduate-level understanding of math would likely be well-served by first getting a gentle overview of IB before plunging into the technical details.
When creating this video series, we had two audiences in mind. Some people just want to know what the heck infra-Bayesianism is at a high level and understand how it's supposed to help with alignment. We designed this video series to be a one-stop shop for accomplishing this goal. We hope that this will be the kind of video series where viewers won't ever have to pause a video and go do a search for some word or concept they didn't understand or that the video assumes knowledge of. To that end, the first four videos go over preliminary topics (which can definitely be skipped depending on how familiar the viewer already is with these topics). Here are the contents of the video series:
- Intro to Bayesianism
- Intro to Reinforcement Learning
- Intro to AIXI and Decision Theory
- Intro to Agent Foundations
- Vanessa Kosoy's Alignment Research Agenda
- Infra-Bayesian Physicalism
- A Conversation with John Wentworth
- A Conversation with Diffractor
- A Conversation with Vanessa Kosoy
We found that in order to explain IB effectively, we needed to show how IB is situated within Vanessa Kosoy's broader research agenda (which itself is situated within the agent foundations class of research agendas). We also wanted to give a concrete example of how IB could be applied to create a protocol for alignment. Pre-DCA is such a protocol. It is very new and is changing rapidly as Vanessa continues to tinker with it. By the time readers of this post watch the Pre-DCA video, it is likely that parts of it will already be out of date. That's perfectly fine. The purpose of the Pre-DCA video is purely to illustrate how one might go about leveraging IB to brainstorm a solution to alignment.
Our second audience consists of those who want to gain mastery of the technical details behind IB so that they can apply it to their own alignment research. We hope that the video series will serve as a nice "base camp" for gaining a high-level understanding of IB before delving into more technical sources (such as Infra-Exercises Part I, the original sequence of posts on IB, or Vanessa's post on infra-Bayesian physicalism).
The primary reason that we chose to create videos instead of a written post is that video is a much more neglected medium for AI alignment pedagogy. Video also allows us to relate to our audience on a more personal level. I (Jack) often find myself pausing in the middle of reading a LessWrong post to look up a video of the author speaking so that I can get a better sense of who they are.
Many thanks to Diffractor, Vanessa Kosoy, John Wentworth, Thomas Larsen, Brittany Gelb, and Lukas Melgaard for contributing to this project.
We are also grateful to the SERI SRF organizers who supported us throughout this project: Joe Collman, Victor Warlop, Sage Bergerson, Ines Fernandez, and Cian Mullarkey.