I think I've never seen anyone empathize properly with MAGA people (myself included)
By "anyone" do you mean "anyone who isn't MAGA"? Or does this imply that you haven't observed any MAGA people?
I think a lot of democrats feel bitten by having tried to compromise in the past and feeling like the republicans kept defecting
Note that this is true the other way round as well (e.g. with stuff like attempted impeachment, imprisonment, and assassination). My post on underdog bias might be useful here, re the ways in which each side considers itself the underdog.
Even if you think that a clear majority of the escalation has been from Trump (which doesn't seem true to me), it's worth thinking about ways to avoid your proposal acting as another step in the escalation spiral. For example, what kind of coalition would be able to actually update that Trump is less bad than it thinks if your current fears don't come to pass?
Or, more concretely: what kind of coalition would be able to deescalate the conflict by actually making compromises (for example, being able to credibly put a crackdown on illegal immigration, or a repudiation of DEI, on the table)? What kind of coalition could make credible promises not to jail Trump and his allies after winning the election? Etc.
When I articulate the case for AI takeover risk to people I know, I don't find the need to introduce them to new ontologies.... But I think I agree that if you want to actually do technical work to reduce the risks, that it is useful to have new concepts that point out why the risk might arise.
It seems like one crux might be that I place much less value on "articulating a case" than you do. Or maybe another way of putting it is that you draw a cleaner boundary between "articulating a case" and "actually do technical work", whereas I think of them as pretty continuous when done well.
(Note that this also puts me in disagreement with many rationalists. Many rationalists treat "the case for high P(doom)" as a pretty reliable set of ideas, and then "alignment research" as something we're very confused about. Whereas from my perspective, these two things are intimately related—developing the concepts and ontology required for a robust case for high P(doom) would actually get us most of the way towards solving the alignment problem.)
Not quite sure how to justify my position here, but one intuition is something like "taking crucial considerations seriously". It really does seem that people's overall conclusions about what's good or bad can be flipped by things they haven't thought of. For example, I notice that many people pay lip service to the idea that the AI safety movement has significantly accelerated AI capabilities (and that AI governance has polarized the current administration against AI safety), but almost nobody is actually trying to figure out how to systematically avoid making additional mistakes like that.
So how do you make your conclusions and strategies less fragile? Well, here are some domains in which it's possible to draw robust conclusions: physics, statistics, computer science, etc. Each of these fields has a deep ontology that has been tested against reality in many different ways, such that it's very hard to imagine its concepts and arguments being totally wrongheaded. Even then, you still have "crucial considerations" like relativity which totally change how you interpret fundamental concepts in those fields—but importantly, Newtonian mechanics was robust enough that even a reconceptualization of core concepts like "space" and "time" actually changed very few of its practical conclusions.
I think it's also worth mentioning the social element. Making a case of the form "people should be worried about X/people should work on X" is a satisfying thing which gains people (short-term) clout and influence. Conversely, trying to deeply understand X is harder and riskier. This is part of why it seems like there's a misallocation of effort towards raising awareness of risks rather than trying to solve them (even within what's usually called "AI safety research"—e.g. evals are mostly in the former category).
To be fair, you also get short-term clout for being a grumpy contrarian, like I'm being now. So I think that if I spend too much time or effort doing so, there's probably something suspicious about that. Looking back, it seems like I started my current grumpy contrarian arc around mid-2024, when I published this post and this post. So it's been a year and a half, which is quite a long time! Ideally I'll stop making posts about which research I dislike within the next 6 months or so, and limit my focus to discussing research I like (or ideally just producing research that speaks for me).
In my mind, there is not much ontological innovation going on in these concepts, because they can be stated in one sentence using pre-existing concepts.
So can the concept of evolutionary fitness, or Galileo's laws of motion. Yet those are huge ontological innovations—and in general a lot of ontological innovations are very simple in hindsight. (In Galileo's case, even the realization "stuff on earth follows the same rules as stuff in space" was a huge step forward.) The important part is not that each individual concept is novel or complex, but rather that you get a set of interlocking concepts which bind together to allow you to generate good explanations and predictions.
Okay, it's helpful to know that you see these as providing new valuable ontologies to some extent.
To be clear, I did say "these ontologies are kinda "thin", which has made it difficult to use them to do substantive work". So yeah, not useless but also not central examples of valuable ontological progress.
It sometimes seems to me like you jump to the conclusion that all the action is in the edge cases without actually arguing for it
In the case of the extinction example I only said it was "possible" that all the action is in the edge cases. I use this as an example to illustrate that even concepts which seem really solid are still more flimsy than you'd think, not as an example of a concept that we should discard because it's near-useless. Sorry for the lack of clarity.
Conversely, in the case of human power grabs, you're right that I'm making a claim which I haven't fully argued for. I think the best way to make this argument is just to continue developing my own theories for understanding power grabs (e.g. this kind of thinking, but hopefully much more rigorous). Might take a while, unfortunately.
Thanks for making this more concrete! I still disagree with you, but this list provides a good way of articulating why.
Of these, I think the initial case for AI takeover risk was the most impactful by far, and also a great example of introducing a new ontology (which includes concepts like AGI and superintelligence, the orthogonality thesis, instrumental convergence, recursive self-improvement, the nearest unblocked strategy problem, corrigibility, alignment, reward tampering, etc).
The simulation argument, by contrast, is an interesting example of an idea that is nominally very "big if true" but has actually had very little impact. And one reason for this, I think, is that it doesn't really change our ontologies much. We add the concept of the universe being a simulation, but in order to understand what's inside the simulation we still use all our old concepts, and we have almost no new concepts that help us think about what's outside the simulation. (Off the top of my head, "ancestor simulation" is the only additional novel concept I can recall related to this hypothesis.)
Existential risk and longtermism fall somewhere in the middle. They are also "big if true", and they have been fleshed out to some degree (e.g. with concepts like the vulnerable world hypothesis, astronomical waste, value lock-in, etc). But I don't think it's a coincidence that the vast majority of work under these headings has been on preventing AI takeover. Outside that bucket, I'd say these ontologies are kinda "thin", which has made it difficult to use them to do substantive work without funneling that work through better-developed ontologies (like AI risk).
What do I mean by "thin"? Roughly two things. Firstly, it's unclear if the concepts in them are well-defined or well-grounded. For people on LessWrong, a good way to get a sense of such concepts is to read continental philosophy, where people often use words in vague ways that feel like they're mostly trying to convey a vibe rather than a precise meaning. Now, analytic philosophers try to be more precise than continentals. But when reasoning about weird futures, conceptual clarity is extremely difficult to achieve; from a future perspective most strategy/futurism researchers will likely seem as confused as continental philosophers seem to LWers. For example, a concept like "extinction" seems very easy to define. But when you start to think about possible edge cases (like: are humans extinct if we've all uploaded? Are we extinct if we've all genetically modified ourselves into transhumans? Are we extinct if we're all in cryonics and will wake up later?) it starts to seem possible that "almost all of the action" is in the parts of the concept that we haven't pinned down.
And that becomes much more true when working with far more abstract concepts like "lock-in". If you ask me "will the values that rule the future be locked in?" the majority of my probability mass is on "something will happen which is a very non-central example of either 'yes' or 'no' in your current ontology—in other words, your conception of locked-in is too vague for the question to be meaningful". For more on this, see my post on why I'm not a bayesian. Other concepts which I think are too vague or confused to be very useful for doing strategy research: AGI, LLMs (at what point have we added so much image/video/RL data that summarizing them as "language models" is actively misleading?), "human power grabs" (I expect that there will be strong ambiguity about the extent to which AIs are 'responsible'), "societal epistemics", "alignment", "metaphilosophy", and a range of others.
Secondly, though, even when concepts are well-defined, are they useful? In mathematics you can easily generate concepts which are well-defined on a technical level but totally uninteresting. One intuition pump here is personality tests: it's very easy to score people's personalities on a bunch of different axes, but most of the time those axes are so arbitrary that it's hard to "hook them in" to the rest of our knowledge about how people think. Similarly, even insofar as DeepMind's "levels of AGI" are well-defined, they're just so arbitrary that we shouldn't expect these concepts to carve reality at its joints. But these kinds of frameworks and taxonomies abound in futurism/strategy research.
And so, instead of trying to produce "strategy research", I claim people should try to do the kinds of work that produced powerful ontologies in the past—whether that's inventing theoretical frameworks like probability theory/information theory/computation theory, doing empirical research to produce new scientific fields, doing the kinds of philosophy that produced our current political ontology, etc.
Strong disagree. Often in Alexander Wales' stories, I find he takes the rationalist/EA mindset so far that it reads more like a reductio. Examples include the story about EA Superman, and the "aligned" Skynet at the end of Branches on the Tree of Time.
I have a similar feeling about the end of Worth the Candle.
Consider making a curriculum (along the lines of AGI safety fundamentals or 21civ.com) aimed at smart, open-minded people with little relevant background. I doubt I'm the only one around here who would be interested in an accessible way to start reading good mid-twentieth-century European philosophy. Also, in principle you can create a rough curriculum pretty quickly (e.g. an hour for a first draft), although if you're anything like me you'll then be sucked into spending a huge amount more time on it.
FYI my current prioritization for myself for this year is "work on 'Influency' things, such as the followup to AI 2027 and whatever other things we can find that seem tractiony, in an attempt to wake up the world and get people thinking 'okay, how we navigate AI really can't be politics-as-usual.'"
I feel very meh about "wake up the world", firstly because AI capabilities companies are going to do it for us, and secondly because whether it's good or bad depends a lot on the quality of what we funnel the world towards, and right now we really don't have many robustly good things to funnel the world towards (we don't even really have good things to funnel EAs towards).
Also, you can't rely on people who need to be "woken up" right now to actually do high-quality thinking about this stuff.
Hence I also disagree with "I think last year and this one is a limited window of time to do Influency AI things with much leverage".
Curious what the story for this being particularly important is? (obviously I see why it's an intuitively Raemon-shaped one, not sure if it was more like "this seems actively good" or "idk I wanna reroll Ray on something-or-other and this vaguely matches his vibe")
The story is something like "imagine if meaning-making becomes 100x easier in the next few years than it is today. We sure would want people trying hard to make a lot of meaning!" I think lumping this under "influency" things is self-defeating, though: you actually need to be trying to solve some problem (and then the influence may come later), rather than trying to cater to what you think other people want from you.
Oh yeah, I guess you're in charge of the Lighthaven-shaped hole. Sorry for omitting you; I was just going down the list of team members here.
If you continued working together as a team? Honestly the "headline" of "build software that allows humans to communicate and coordinate about AI" probably wouldn't change that much. Mostly I want to push the reset button on what your background assumptions about that are. Instead of "how can we improve LessWrong", think "how can we build a new type of thing from scratch, without needing our existing userbase to approve of the changes?" E.g. the kind of thinking @Ivan Vendrov is doing.
It's also plausible to me that part of the reset button is splitting up the team. If so, I want Rafe to do agent foundations research, I want Habryka to try to design a political philosophy for the AI age (where this curriculum is IMO a good starting point, whether as something to build on or something to fight), I want you to figure out what personalized AI fiction/music/poetry/film should look like and create a bunch of it for our community, I want Ben to go down the woo rabbit-hole and report back what he finds, I want Ruby to own LW, and I don't know Robert well enough to say (but I guess there's a Lighthaven-shaped hole in my list so far, so maybe that).
Not gonna weigh in on the object level, but on the meta level I think we're reaching the point where existing concepts like "corrigibility" and "human morality" are starting to buckle, and we need a better ontology in order to have more productive discussions about this.