LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
Interested in links to the press reviews you're thinking of.
Nod. Does anything in the "AI-accelerated AI R&D" space feel cruxy for you? Or "a given model seems to be semi-reliably producing Actually Competent Work in multiple scientific fields"?
Curious if there are any bets you'd make where, if they happened in the next 10 years or so, you'd significantly re-evaluate your models here?
Nod.
FYI I don't think the book is making a particular claim that any of this will happen soon, merely that when it does happen, the outcome is very likely to be human extinction. The point is not that it'll happen at a particular time or in a particular way – the LLM/ML paradigm might hit a wall, there might need to be algorithmic advances, it might instead route through narrow AI getting really good at conducting and leveraging neuroscience and making neuromorphic AI or whatever.
But, the fact that we know human brains run on a relatively low amount of power and training data means we should expect this to happen sooner or later. (But meanwhile, it does sure seem like both the current paradigm keeps advancing and a lot of money is being poured in, so it seems at least reasonably likely that it'll be sooner rather than later.)
The book doesn't argue a particular timeline for that, but it personally seems weird to me to expect it to take another century, in particular when you can leverage narrower pseudogeneral AI to help you make advances. And I have a hard time imagining takeoff taking longer than a decade, or really even a couple years, once you hit full generality.
No. The argument is "the current paradigm will produce the Bad Thing by default, if it continues on what looks like its default trajectory" (i.e. via training, in a fashion where it's not super predictable in advance what behaviors the training will result in across various off-distribution scenarios).
A thing I can't quite tell if you're incorporating into your model – the thing the book is about is:
"AI that is either more capable than the rest of humanity combined, or is capable of recursively self-improving and situationally aware enough to maneuever itself into having the resources to do so (and then being more capable than the rest of humanity combined), and which hasn't been designed in a fairly different way from the way current AIs are created."
I'm not sure if you're more like "if that happened, I don't see why it'd be particularly likely to behave like an ideal agent ruthlessly optimizing for alien goals", or if you're more like "I don't really buy that this can/will happen in the first place."
(the book is specifically about that type of AI, and has separate arguments for "someday someone will make that" and "when they do, here's how we think it'll go")
My prediction is that a year from now Jim will still think it was a mistake and Habryka will still think it was a good call because they value different things.
A while ago I wrote:
There's a frame where you just say "no, rationality is specifically about being a robust agent. There are other ways to be effective, but rationality is the particular way of being effective where you try to have cognitive patterns with good epistemology and robust decision theory."
This is in tension with the "rationalists should win" thing. Shrug.
I think it's important to have at least one concept that is "anyone with goals should ultimately be trying to achieve them the best way possible", and at least one concept that is "you might consider specifically studying cognitive patterns and policies and a cluster of related things, as a strategy to pursue particular goals."
Just had a thought that you might carve this into something like "short-term rationality" and "long-term rationality", where short-term is "what cognitive algorithms will help me right now (to systematically achieve my goals, given my current conditions and skills)", and long-term rationality is like "what cognitive-algorithm and metacognitive practices would be worth investing in for the long term?"
Part of the deal of being allies is that you don't have to be allies about everything. I don't think they particularly need to do anything to help with technical safety (there just need to be people who understand and care about that somewhere). I'm pretty happy if they're just on board with "stop building AGI" for whatever reason.
I do think they eventually need to be on board with some version of handling the intelligence curse (I didn't know that term, here's a link), although I think in a lot of worlds the gameboard is so obviously changed that I expect handling it to be an easier sell.
In addition to Malo's comment, I think the book contains arguments that AFAICT have otherwise only really been made in the context of the MIRI dialogues, which are particularly obnoxious to read.