(Work done at Convergence Analysis. Mateusz wrote the post and is responsible for the outline of the argument, many details of which crystallized in conversations with Justin. Thanks to Olga Babeeva for the feedback on this post.) 1. Introduction: Clarifying the DSA-AI theses Over the last decade, AI research and...
(Work done at Convergence Analysis. The ideas are due to Justin. Mateusz wrote the post. Thanks to Olga Babeeva for feedback on this post.) In this post, we introduce a typology of structure, function, and randomness that builds on the framework introduced in the post Goodhart's Law Causal Diagrams. We...
(Work done at Convergence Analysis. Mateusz wrote the post and is responsible for most of the ideas with Justin helping to think it through. Thanks to Olga Babeeva for the feedback on this post.) 1. Motivation Suppose the prospect of pausing or significantly slowing down AI progress or solving the...
Boxing an agent more intelligent than ourselves is daunting, but information theory, thermodynamics, and control theory provide us with tools that can fundamentally constrain agents independent of their intelligence. In particular, we may be able to contain an AI by limiting its access to information. Constraining output and inputs Superintelligent...
Interpretability research is conducted to improve our understanding of AI. Many see interpretability as essential for AI safety, but recently some have argued that it can also increase the risk posed by AI by facilitating improved AI capabilities. We agree, and in this post, we’ll explain why, as well as...
In this post, we’ll introduce wisdom as a measure of the benevolence and internal coherence of an arbitrary agent. We’ll define several factors, such as the agent’s values, plans, evidence, and alignment with human values, and then define wisdom as consistency within and between these factors. We believe this is...
Many organizations are developing and using AI evaluations, “evals”, to assess the capability, alignment, and safety of AI. However, evals are not entirely innocuous, and we believe the risks they pose are neglected. In this article, we’ll outline some of the risks posed by doing AI evals, and suggest a...