As part of the MATS Winter 2023-24 Program, scholars were invited to take part in a series of weekly discussion groups on AI safety strategy. Each strategy discussion focused on a specific crux we deemed relevant to prioritizing AI safety interventions and was accompanied by a reading list and suggested...
Context: somebody at some point floated the idea that Ronny might (a) understand the arguments coming out of the Quintin/Nora camp, and (b) be able to translate them to Nate. Nate invited Ronny to chat. The chat logs follow, lightly edited. The basic (counting) argument Ronny Fernandez Are you mostly...
Summary * We’re running the second year of the Atlas Fellowship, a $10k scholarship and free 11-day program for high school students from across the world. I see it as a unique opportunity for talented young people to meet intellectual peers IRL and improve their thinking. * The core program...
edit: Several days after posting this and asking for feedback on it someone pointed me to this post: Does SGD Produce Deceptive Alignment by Mark Xu. Mark's post makes essentially the exact same argument that this post makes, but is written much more carefully, and I think does a better...
Crossposted from Figuring Figuring. I did a second survey that fixed some of the flaws of the first survey. The results from the second survey significantly color the interpretation of the results from the first survey given in the first “Conclusion and Discussion” section. Please continue reading past the section...
Crossposted from Figuring Figuring. Here is a handle for a mistake I make all the time that works really well for me… that is, the handle works really well for me. I make this mistake fairly often, but of course I notice it in other people more often. Imagine that...