Notes from a mini-replication of the alignment faking paper
Key takeaways

* This post contains my notes from a 30-40 hour mini-replication of Greenblatt et al.'s alignment faking paper.
* This was a significant paper because it provided some evidence for potential catastrophic risk from AI misalignment.
* My replication results:
* I consider only the...