Two thresholds loom on the horizon, with only a brief window of opportunity to prepare for each. On the technical front, we could plausibly see full automation of AI R&D this decade. Capabilities will move fast once that happens: our best chance for a good outcome is...
If you pay close attention to this newsletter, you’ll notice that something is missing. Anthropic and OpenAI are everywhere, but Google DeepMind is largely absent. We have a profile of Demis Hassabis, an also-ran mention in Prinz’s review of the race to RSI, and some complaints about Gemini’s character from...
How are we doing on solving the alignment problem? Harry Law begins this week’s newsletter with an explanation of alignment-by-default: the idea that because LLMs are trained on an immense body of human text, they are predisposed to understand and pursue human values. But predisposition isn’t enough: Ryan Greenblatt argues...
I spend several hours a day trying to keep up with what’s going on in the parts of AI that I’m interested in. It’s a ridiculous amount of work: I don’t recommend it unless you’re doing something silly like writing a newsletter about AI. But if you’d like to keep...
(With apologies to Sean Herrington, who deserves a better playwright than yours truly) A conversation with a friend on the bus to Bodega Bay today made me realize that there are some holes in my thinking about safety and superintelligence. I’ve assumed that superintelligence is by definition robustly better than...
This week’s big story is the limited release of Claude Mythos Preview. The headline is that Mythos is alarmingly good at cybersecurity, with the ability to find and exploit critical vulnerabilities en masse. Anthropic is handling that responsibly, but the next year or two will be challenging for security. If...
I expect it’ll take another week or two for everyone to fully digest the significance of Claude Mythos Preview. In the meantime, here are my initial thoughts. Gradually, then suddenly. Mythos is radically better at cyber than any previous model: It isn’t the first model that can find vulnerabilities, of...