Victoria Krakovna. Research scientist at DeepMind working on AI safety, and cofounder of the Future of Life Institute. Website and blog: vkrakovna.wordpress.com
This is a significant effect in general, but I'm not sure how much epistemic cost it creates in this situation. Moderates working with AI companies mostly interact with safety researchers, who are not generally doing bad things. There may be a weaker second-order effect where the safety researchers at labs have some epistemic distortion from cooperating with capabilities efforts, and this can influence external people who are collaborating with them.
Thanks for this helpful framework, it's also useful for people who are submitting rebuttals not for the first time :). Sadly NeurIPS and ICML no longer allow a top-level comment (for silly technical reasons).
Thanks Gunnar, those sound like reasonable guidelines!
Yeah, living in a group house was important for our mental well-being as well, especially during the pandemic and parental leaves. I think the benefits of the social environment decreased somewhat because we were often occupied with the kids and had less time to socialize. It was still pretty good though - if Deep End was close enough to schools we like, we would have probably stayed and tried to make it work (though this would likely involve taking over more of the house over time). Our new place contributes to mental well-being by being much closer to nature (while still a reasonable bike commute from the office).
I would potentially be interested, if we knew the other people well. I find that, as a parent, I'm less willing to take risks by moving in with people I don't know that well, because the stress and uncertainty associated with things not working out are more costly.
Space requirements would likely be the biggest difficulty though, as you pointed out. A family with 2 kids probably needs at least 3 rooms, so two such families together would need a 6 bedroom house. This is hard to find, especially combined with other constraints like proximity to schools, commute distances, etc. It's a lot easier to live near other families than sharing a living space.
I really enjoyed this sequence, it provides useful guidance on how to combine different sources of knowledge and intuitions to reason about future AI systems. Great resource on how to think about alignment for an ML audience.
I think this is still one of the most comprehensive and clear resources on counterpoints to x-risk arguments. I have referred to this post and pointed people to a number of times. The most useful parts of the post for me were the outline of the basic x-risk case and section A on counterarguments to goal-directedness (this was particularly helpful for my thinking about threat models and understanding agency).
I still endorse the breakdown of "sharp left turn" claims in this post. Writing this helped me understand the threat model better (or at all) and make it a bit more concrete.
This post could be improved by explicitly relating the claims to the "consensus" threat model summarized in Clarifying AI X-risk. Overall, SLT seems like a special case of that threat model, which makes a subset of the SLT claims:
I continue to endorse this categorization of threat models and the consensus threat model. I often refer people to this post and use the "SG + GMG → MAPS" framing in my alignment overview talks. I remain uncertain about the likelihood of the deceptive alignment part of the threat model (in particular the requisite level of goal-directedness) arising in the LLM paradigm, relative to other mechanisms for AI risk.
In terms of adding new threat models to the categorization, the main one that comes to mind is Deep Deceptiveness (let's call it Soares2), which I would summarize as "non-deceptiveness is anti-natural / hard to disentangle from general capabilities". I would probably put this under "SG MAPS", assuming an irreducible kind of specification gaming where it's very difficult (or impossible) to distinguish deceptiveness from non-deceptiveness (including through feedback on the model's reasoning process). Though it could also be GMG, where the "non-deceptiveness" concept is incoherent and thus very difficult to generalize well.
Similarly to Leo, I think racing to AGI is bad and it would be good to coordinate not to do that. I support proposals for AI regulations that would make this easier. I signed various open letters to this effect on AI red lines, AI Treaty, SB1047, and others.
I'm pretty uncertain if pushing for an AI pause now is an effective way to achieve this, and I think it's quite plausibly better to pause later rather than now. In the next few years, we will have more solid evidence of misalignment, and we would be able to make better use of a pause period (which is likely to be finite) e.g. with automated alignment researchers. I don't think calling for a pause/ban now is a costless action - early calls for a pause have the risk of crying wolf and using up the political will that could be used for a pause later. I signed the FLI pause letter in 2023, but looking back it seems a bit premature. A conditional pause in the future seems much easier to get adopted than a hard pause now.
I agree with everything Neel said in his top-level comment, and I'm puzzled by the number of disagreement votes on it.