LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
The worlds I was referring to here were worlds that are a lot more multipolar for longer (i.e. tons of AIs interacting in a mostly-controlled fashion, with good defensive tech to prevent rogue FOOMs). I'd describe that world as "it was very briefly multipolar and then it wasn't" (which is the sort of solution that'd solve the issues in "Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most 'classic humans' in a few decades").
I doubt that it's correct. Suppose that Agent-4 solves alignment to itself. If Agent-4-aligned AIs gain enough power to destroy the world, then any successor would also be aligned to Agent-4, or to a compromise that includes Agent-4's interests (which might well include the humans' interests).
Sounds like this scenario is not multipolar? (Also, I think the crux is solvable, see the linked post, but solving it requires hitting particular milestones quickly in particular ways.)
I am not sure whether AI rots the agency of the people whose decisions are actually important.
Why not?
(My generators for this belief: my own experience using LLMs, the METR report on downlift suggesting people are bad at noticing when they're being downlifted, and the general human history of people gravitating towards things that feel easy and rewarding in the moment.)
Both? My impression was they (Redwood in particular but presumably also OpenAI and Anthropic) expected to be using a lot of AI assistance along the way.
But, when I said "constraints" I meant "solving the problem requires some set of criteria", not "applying constraints to the AI" (although I'd also want that).
Here, constraints would be like "alignment is hard in a way that specifically resists full handoff, and it requires a philosophically-competent human in the loop till pretty close to the end" (and then specifically operational-detail constraints like "therefore, you need to have a pretty good map of which tasks can be delegated").
Nod.
My main project thread for the past 2 years has been mostly aiming at Get a Lot of Alignment Research Done Real Fast (in line with my beliefs/taste about what that requires). This is the motivator for the Feedbackloop-first Rationality project, and is also a driver for my explorations into using LLMs for research (where I'm worried specifically about phrases like "full handoff" because of the way it seems like LLM use subtly saps/erodes agency and directs you towards dumber thoughts that more naturally 'fit' into the LLM paradigm. But I'm also excited about approaches for solving that).
But I'm focused for this year on "wake everyone up."
Corrigibility and CEV are trying to solve separate problems? Not sure what your point is here; agreed on that being one of the major points of CEV.
If every country/person was building CEV, it wouldn't be particularly scary (from a misuse standpoint). Whereas if every country is focused on corrigibility, there will be a phase where unilateral actors can do bad stuff you need to worry about.
I'm not sure how you're contrasting this with the point I was making.
Fixed
(Also like, come on, bravery debates etc. Ah, I see you are kinda new here. The more you disagree on this website, the more karma you get... if you can keep it polite, smart, and supported by evidence, of course.)
I'd describe this as "good criticism gets upvoted, good ingroup-rah-rah gets upvoted, bad criticism gets downvoted, bad ingroup-rah-rah often gets weakly upvoted." Which I don't like, but am not sure what to do about.
I have not looked into these details enough to have an opinion, but I think a lot of US institutions work via a mix of legal rules and implicit norms, and my sense is that Trump was violating a lot of the norms that made the legal rules workable.