Palus Astra — LessWrong

Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI

It’s well established in the AI alignment literature what happens when an AI system learns or is given an objective that doesn’t fully capture what we want. Human preferences and values are inevitably left out, and the AI, likely being a powerful optimizer, will take advantage of the dimensions of freedom...

Jul 1, 2020

FLI Podcast: On Superforecasting with Robert de Neufville

Essential to our assessment of risk and ability to plan for the future is our understanding of the probability of certain events occurring. If we can estimate the likelihood of risks, then we can evaluate their relative importance and apply our risk mitigation resources effectively. Predicting the future is, obviously,...

Apr 30, 2020

AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah

Just a year ago we released a two-part episode titled An Overview of Technical AI Alignment with Rohin Shah. That conversation provided details on the views of central AI alignment research organizations and many of the ongoing research efforts for designing safe and aligned systems. Much has happened in...

Apr 16, 2020

FLI Podcast: The Precipice: Existential Risk and the Future of Humanity with Toby Ord

Toby Ord’s “The Precipice: Existential Risk and the Future of Humanity” has emerged as a new cornerstone text in the field of existential risk. The book presents the foundations and recent developments of this budding field from an accessible vantage point, providing an overview suitable for newcomers. For those already...

Apr 1, 2020

AI Alignment Podcast: On Lethal Autonomous Weapons with Paul Scharre

Most relevant to AI alignment, and a pertinent question for interested readers/listeners to focus on, is: if we are unable to establish a governance mechanism as a global community on the concept that we should not let AI make the decision to kill humans, then what effects will this...

Mar 16, 2020