TL;DR: A new paper summarizes some verification methods for international AI agreements. See also summaries on LinkedIn and Twitter. Several co-authors and I are currently planning some follow-up projects about verification methods. There are also at least two other groups planning to release reports on verification methods. If you have...
This summer, I’m supervising some research fellows through Cambridge’s ERA AI Fellowship. The program started last week, and I’ve had conversations with about six fellows about their research projects and summer goals. In this post, I’ll highlight a few pieces of advice I’ve found myself regularly giving to research fellows....
In a new Science paper, the authors provide concise summaries of AI risks and offer recommendations for governments. I think the piece is quite well-written. It concisely explains a lot of relevant arguments, including arguments about misalignment and AI takeover. I suspect this is one of the best standalone pieces...
Summary Evidential cooperation in large worlds (ECL) is a proposed way of reaping gains—that is, getting more of what we value instantiated—through cooperating with agents across the universe/multiverse. Such cooperation does not involve physical, or causal, interaction. ECL is potentially a crucial consideration because we may be able to do...
In this post, I summarize some of my thoughts on OpenAI's recent preparedness framework. The post focuses on my opinions and analysis of the framework, and it doesn't include much summary (readers unfamiliar with the framework may want to start with Zvi’s post). As a high-level point, I believe voluntary commitments from individual labs...
In May and June of 2023, I (Orpheus) had about 50-70 meetings about AI risks with congressional staffers. I had been meaning to write a post reflecting on the experience and some of my takeaways, and I figured it could be a good topic for a LessWrong dialogue. I saw...