Short summary of Condensation Condensation is a theory of concepts by Sam Eisenstat. The paper can be read here. Abram wrote a review on LessWrong, and a followup. The paper is very much worth reading, and can be skimmed for just the motivation if you’re short on time. What is...
The story so far We (Alfred and Jeremy) started a Dovetail project on Natural Latents in order to get some experience with the proofs. Originally we were going to take a crack at this bounty, but just before we got started John and David published a proof, closing the bounty....
The idea of a “basin of attraction around corrigibility” motivates much of prosaic alignment research. Essentially this is an abstract way of thinking about the process of iteration on AGI designs. Engineers test to find problems, then understand the problems, then design fixes. The reason we need corrigibility for this...
Is focusing on corrigibility our best shot at getting to ASI alignment? Max Harms and Jeremy Gillen are current and former MIRI alignment researchers who both see superintelligent AI as an imminent extinction threat, but disagree about Max's proposal of Corrigibility as Singular Target (CAST). Max thinks focusing on corrigibility...
I've been curious about how good LeelaPieceOdds is, so I downloaded a bunch of data and graphed it. For context, Leela is a chess bot and this version of it has been trained to play with a handicap. This is BBNN odds, meaning Leela starts without bishops and knights. I...
A common failure of optimizers is Edge Instantiation. An optimizer often finds a weird or extreme solution to a problem when the optimization objective is imperfectly specified. For the purposes of this post, this is basically the same phenomenon as Goodhart’s Law, especially Extremal and Causal Goodhart. With advanced AI,...
This dialogue is still in progress, but due to other commitments we don't have much time to continue it. We think the content is interesting, so we decided to publish it unfinished. We may slowly continue adding to it in the future, but can't commit to doing so....