The Learning-Theoretic Agenda: Status 2023
TLDR: I give an overview of (i) the key problems that the learning-theoretic AI alignment research agenda is trying to solve, (ii) the insights we have so far about these problems and (iii) the research directions I know for attacking these problems. I also describe "Physicalist Superimitation" (previously knows as "PreDCA"): a hypothesized alignment protocol based on infra-Bayesian physicalism that is designed to reliably learn and act on the user's values. I wish to thank Steven Byrnes, Abram Demski, Daniel Filan, Felix Harder, Alexander Gietelink Oldenziel, John S. Wentworth[1] and my spouse Marcus Ogren for reviewing a draft of this article, finding errors and making helpful comments and suggestions. Any remaining flaws are entirely my fault. Preamble I already described the learning-theoretic agenda (henceforth LTA) in 2018. While the overall spirit stayed similar, naturally LTA has evolved since then, with new results and research directions, and slightly different priorities. This calls for an updated exposition. There are several reasons why I decided that writing this article is especially important at this point: * Most of the written output of LTA focuses on the technical, mathy side. This leaves people confused about the motivation for those inquiries, and how they fit into the big picture. I find myself having to explain it again and again in private conversations, and it seems more efficient to have a written reference. * In particular, LTA is often conflated with infra-Bayesianism. While infra-Bayesianism is a notable part, it is not the only part, and without context it's not even clear why it is relevant to alignment. * Soares has recently argued that alignment researchers don't "stack", meaning that it doesn't help much to add more researchers to the same programme. I think that for LTA, this is not at all the case. On the contrary, there is a variety of shovel-ready problems that can be attacked in parallel. With this in mind, it's especi
I like the overall vibe. Two issues:
The "recent comments" disappeared. This isreally badbecause I use that to find my recent comments when I want to edit them. (For example now I wanted to find this comment to add this second bullet but had to do it manually.)OK, I now see I can find them under "feed" but this might be confusing.