(...but also gets the most important part right.) Bentham’s Bulldog (BB), a prominent EA/philosophy blogger, recently reviewed If Anyone Builds It, Everyone Dies. In my eyes, a review is good if it uses sound reasoning and encourages deep thinking on important topics, regardless of whether I agree with the bottom...
Last year I wrote the CAST agenda, arguing that aiming for Corrigibility As Singular Target was the least-doomed way to make an AGI. (Though it is almost certainly wiser to hold off on building it until we have more skill at alignment, as a species.) I still basically believe that...
Is focusing on corrigibility our best shot at getting to ASI alignment? Max Harms and Jeremy Gillen are current and former MIRI alignment researchers who both see superintelligent AI as an imminent extinction threat, but disagree about Max's proposal of Corrigibility as Singular Target (CAST). Max thinks focusing on corrigibility...
This post is a (somewhat rambling and unsatisfying) meditation on whether, given a somewhat powerful AI that is more or less under control and trained such that it behaves reasonably corrigibly in environments that resemble the training data, one could carefully iterate towards a machine...
> It's plausible that humanity could make a corrigible ASI by 2035 if the planet were united around that goal and being very careful. Are there any knowledgeable people outside MIRI who might disagree with me on this statement and be interested in arguing with me about it? I'm more...
Clara Collier recently reviewed If Anyone Builds It, Everyone Dies in Asterisk Magazine. I’ve been a reader of Asterisk since the beginning and had high hopes for her review. And perhaps it was those high hopes that led me to find the review to be disappointing. Collier says “details matter,”...
This is part of the MIRI Single Author Series. Pieces in this series represent the beliefs and opinions of their named authors, and do not claim to speak for all of MIRI. Okay, I'm annoyed at people covering AI 2027 burying the lede, so I'm going to try not to...