What is Inadequate about Bayesianism for AI Alignment: Motivating Infra-Bayesianism

Introduction

Infra-Bayesianism is a mathematical framework for studying artificial learning and intelligence that developed from Vanessa Kosoy’s Learning Theoretic AI Alignment Research Agenda. As applied to reinforcement learning, the main character of infra-Bayesianism is an agent that is learning about an unknown environment and making decisions in pursuit of some goal. Infra-Bayesianism provides novel ways to model this agent’s beliefs and make decisions, which address problems arising when an agent does not or cannot consider the true environment possible at the beginning of the learning process.

This setting, a non-realizable environment, is relevant to various scenarios important to AI alignment, including scenarios when agents may consider themselves as part of the environment, and scenarios involving self-modifying agents, multi-agent interactions, and decision theory problems. Furthermore, it is the most realistic setting given the computational complexity of the real world.

Here are the links to further posts in this sequence, which continue the introduction given in this post (links will be updated as available):

* An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism
* Proof Section to an Introduction to Reinforcement Learning for Understanding Infra-Bayesianism
* An Introduction to Credal Sets and Infra-Bayes Learnability
* Proof Section to an Introduction to Credal Sets and Infra-Bayes Learnability
* Crisp Supra-Decision Processes
* Proof Section to Crisp Supra-Decision Processes
* Formalizing Newcombian problems with fuzzy infra-Bayesianism
* Proof Section to Formalizing Newcombian problems with fuzzy infra-Bayesianism

Non-realizability and irreflexivity

In classical Bayesian reinforcement learning theory, a learning agent starts out with a prior probability distribution over hypotheses called, for brevity, a prior. Each hypothesis describes a possible way that the environment might be, and there is some hypothesi