TLDR: A superhuman AI may regard takeover as the risky option, and we can influence its choice not only by raising its chances of being caught, but also by rewarding a decision not to attempt takeover. An Ultimate Gamble Imagine you face a choice, on behalf of humanity: Option A: play it...
This post is part of a series on my “crazy” ideas in AI Safety. The probability that any individual idea is useful is low; however, if even one of them turns out to matter, the impact could be large. Or maybe it will inspire someone to do something useful, remotely...
As I wrote earlier, we’ve launched the next season of our mentorship program. This year we have two mentors who are explicitly focused on AI Safety (Slava Meriton and Peter Drotos), along with many others from a wide range of academic fields—including CS, AI, and additional STEM areas. If you...
A year and a half ago, we (AI Safety Quest and Sci.STEPS) launched the pilot season of the Mentorship in AGI Safety (MAGIS) program. MAGIS was originally modeled as a simplified version of Sci.STEPS, an online mentorship program for early-career researchers. In both programs, we connect experienced professionals with aspiring...
EDIT: A big error in this post was spotted and corrected by gjm here: https://www.lesswrong.com/posts/ZEuDH2W3XdRaTwpjD/hyperbolic-model-fits-metr-capabilities-estimate-worse-than . I believe the motivation part still stands, but so far there is no experimental confirmation. EDIT2: I also realized that we need to take into account the confidence intervals provided by METR as well. I will make a separate...
TLDR: AI researchers may have a different intuitive definition of sentience than neuroscientists; if you are one of the AI researchers (or policymakers, also important), please consider suggesting your definition in the poll here. The question of whether AI is sentient, and what criteria can establish this, gets more and...
TLDR: We report intermediate results from the AI Safety Camp project “Mechanistic Interpretability Via Learning Differential Equations”. Our goal was to explore transformers that deal with time-series numerical data (either inferring the governing differential equation or predicting the next number). As the task is well formalized, this seems to...