Produced as part of the SERI ML Alignment Theory Scholars Program, Winter 2022 Cohort. Introduction: I think that Discovering Latent Knowledge in Language Models Without Supervision (DLK; Burns, Ye, Klein, & Steinhardt, 2022) is a very cool paper – it proposes a way to do unsupervised mind reading[1] –...
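For readers who want the mechanics behind that claim, here is a minimal sketch of the paper's Contrast-Consistent Search (CCS) objective: train a linear probe on a model's hidden states so that the probabilities assigned to a statement and its negation are consistent (they sum to one) and confident (not both stuck at 0.5). The probe class, tensor names, and training loop below are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

# Sketch of the CCS objective from Burns et al. (2022). `pos_acts` and
# `neg_acts` are assumed to be hidden states for contrast pairs (the same
# statement with "Yes" vs. "No" appended), already normalized; these
# names and hyperparameters are illustrative, not the paper's code.

class CCSProbe(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Map each hidden state to a probability of "true".
        return torch.sigmoid(self.linear(x)).squeeze(-1)

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: a statement and its negation should receive
    # probabilities that sum to one.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: rule out the degenerate p_pos = p_neg = 0.5 solution.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

def train_probe(pos_acts: torch.Tensor, neg_acts: torch.Tensor,
                steps: int = 1000, lr: float = 1e-3) -> CCSProbe:
    probe = CCSProbe(pos_acts.shape[-1])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ccs_loss(probe(pos_acts), probe(neg_acts))
        loss.backward()
        opt.step()
    return probe
```

Notably, nothing in this loss references ground-truth labels, which is what makes the method unsupervised.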
Previous Posts: Formal Metaethics and Metasemantics for AI Alignment; New MetaEthical.AI Summary and Q&A at UC Berkeley. This time I tried to focus less on the technical details and more on providing the intuition behind the principles guiding the project. I'm grateful for questions and comments from Stuart Armstrong and...
This was a talk at Effective Altruism Global (London 2019) by Mahendra Prasad. It's a non-technical introduction to the ideas in his working paper, which I previously posted. It also covers some additional ground. For instance, from Republican polling data, we can see the difference voting methods can make with Trump...
This is a working paper on group rationality by Mahendra Prasad, who has previously published, among other things, "Social Choice and the Value Alignment Problem" in Artificial Intelligence Safety and Security, edited by Roman V. Yampolskiy. He's provided me with this summary to share: "Resolving Majority Rule’s Irrationality." Since the...
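To make the titular problem concrete, here is a toy sketch with hypothetical ballots (not Prasad's data or his proposed method): pairwise majority preferences can cycle even when every individual voter ranks the options coherently, whereas a scoring rule such as the Borda count, shown purely for contrast, still yields a group ranking.

```python
from collections import Counter
from itertools import combinations

# Hypothetical ballots illustrating majority rule's "irrationality":
# each voter has a coherent ranking, yet the pairwise majorities cycle.
ballots = (
    [("A", "B", "C")] * 3 +   # 3 voters rank A > B > C
    [("B", "C", "A")] * 3 +   # 3 voters rank B > C > A
    [("C", "A", "B")] * 2     # 2 voters rank C > A > B
)

def majority_winner(x: str, y: str) -> str:
    """Return whichever of x, y a majority of voters ranks higher."""
    x_votes = sum(b.index(x) < b.index(y) for b in ballots)
    return x if x_votes > len(ballots) / 2 else y

for x, y in combinations("ABC", 2):
    print(f"{x} vs {y}: majority prefers {majority_winner(x, y)}")
# Prints A, C, B: A beats B, B beats C, and C beats A form a cycle,
# so majority rule yields no coherent group ranking here.

# A scoring rule (the Borda count, for contrast) still picks a winner:
borda = Counter()
for b in ballots:
    for rank, cand in enumerate(b):
        borda[cand] += len(b) - 1 - rank
print(borda.most_common())  # [('B', 9), ('A', 8), ('C', 7)]
```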
Previous Intro: Formal Metaethics and Metasemantics for AI Alignment. I’m nearing completion of a hopefully much more readable version of the ideas previously released as set-theoretic code. This takes the form of a detailed outline, currently in WorkFlowy, in which you can easily expand/collapse subsections which elaborate on their...
A Brief Introduction to MetaEthical.AI. tl;dr: AIXI for Friendliness. [Crossposted from my new blog.] Abstract: We construct a fully technical ethical goal function for AI by directly tackling the philosophical problems of metaethics and mental content. To simplify our reduction of these philosophical challenges into “merely” engineering ones, we suppose...