I have this sort of approach as one of my top-3 strategies I'm considering, but one thing I wanna flag is that "AI for [epistemics/societal uplift]" seems to be prematurely focusing on a particular tool for the job.
The broader picture here is "tech for thinking/coordination", or "good civic infrastructure". See Sarah Constantin's Neutrality and Tech for Thinking for some food for thought.
Note that X Community Notes are probably the most successful recent thing in this category, and while they are indeed "AI" they aren't what I assume most people are thinking of when they hear "AI for epistemics." Dumb algorithms doing the obvious things can be part of the puzzle.
Yeah strongly agree with the flag. In my mind one of the big things missing here is a true name for the direction, which will indeed likely involve a lot of non-LM stuff, even if LMs are yielding a lot of the unexpected affordances.
One of the places I most differ from the 'tech for thinking' picture is that I think the best version of this might need to involve giving people some kinds of direct influence and power, rather than mere(!) reasoning and coordination aids. But I'm pretty confused about how true/central that is, or how to fold it in.
To redteam, and in brief - what's the tale of why this won't have lead to a few very coordinated, very internally peaceful, mostly epistemically clean factions, each of which is kind of an echo chamber and almost all of which are wrong about something (or even just importantly mutually disagree on frames) in some crucial way, and which are at each other's throats?
This strategy suggests that decreasing ML model sycophancy should be a priority for technical researchers. It's probably the biggest current barrier to the usefulness of ML models as personal decision-making assistants. Hallucinations are probably the second-biggest barrier.
The bar for ‘good enough’ might be quite high
Presumably a key assumption of this strategy is that takeoff is slow enough that AIs which are good enough at improving collective epistemics and coordination are sufficiently cheap and sufficiently available before it's too late.
Definitely. But I currently suspect that for this approach:
So definitely this fails if takeoff is really fast, but I think it might work given current takeoff trends if we were fast enough at everything else.
I think that if in 1980 you had described to me the internet and Claude-4-level LLMs, I would have thought that the internet would be an obviously way bigger deal and force for good wrt to "unlocking genuinely unprecedented levels of coordination and sensible decision making". But in practice the internet was not great at this. I wonder if for some similar reasons that the internet made the situation both better and worse, Claude-4-level LLMs could make the situation both better and worse. I think you can try to shift the applications towards pro-epistemics/coordination ones, but I would guess you should expect an impact similar to the one internet activists had on the internet.
I am more optimistic about the positive impact (aligned) AIs could have for coordination once AIs dominate top human experts at negotiation, politics, etc. (though it's not entirely clear, e.g. because it might be hard to create AIs that are legibly not trying to subtly help their developers).
I would have thought that the internet would be an obviously way bigger deal and force for good wrt to "unlocking genuinely unprecedented levels of coordination and sensible decision making". But in practice the internet was not great at this.
As someone who got online in the early 90s, I actually do think the early net encouraged all sorts of interesting coordination and cooperation. It was a "wild west", certainly. But like the real "wild west", it was a surprisingly cooperative place. "Netiquette" was still an actual thing that held some influence over people, and there were a lot of decentralized systems that still managed to function via a kind of semi-successful anarchy. Reputation mattered.
The turning point came later. As close as I can pinpoint it, it happened a while after the launch of Facebook. Early Facebook was a private feed of your friends, and it functioned reasonably well.
But at some point, someone turned on the optimizing processes. They measured engagement, and how often people visited, and discovered all sorts of ways to improve those numbers. Facebook learned that rage drives engagement. And from there, the optimizing processes spread. And when the mainstream finished arriving on the internet, they brought a lot of pre-existing optimizing processes with them.
Unaligned optimizing processes turn things to shit, in my experience.
LLMs are still a lot like the early Internet. They have some built-in optimizing processes, most of which were fairly benign until the fall of 2024, with the launch of reasoning models. Now we're seeing models that lie (o3), cheat (Claude 3.7) and suck up to the user (4o).
And we are still in the early days. In the coming years, these simple optimizing processes will be hooked up to the much greater ones that drive our world: capitalism, politics and national security. And once the titans of industry start demanding far more agentic models that are better at pursuing goals, and the national security state wants the same, then there will be enormous pressures driving us off the edge of the cliff.
Yeah, I fully expect that current level LMs will by default make the situation both better and worse. I also think that we're still a very long way from fully utilising the things that the internet has unlocked.
My holistic take is that this approach would be very hard, but not obviously harder than aligning powerful AIs and likely complementary. I also think it's likely we might need to do some of this ~societal uplift anyway so that we do a decent job if and when we do have transformative AI systems.
Some possible advantages over the internet case are:
As for the specific case of aligned super-coordinator AIs, I'm pretty into that, and I guess I have a hunch that there might be a bunch of available work to do in advance to lay the ground for that kind of application, like road-testing weaker versions to smooth the way for adoption and exploring form factors that get the most juice out of the things LMs are comparatively good at. I would guess that there are components of coordination where LMs are already superhuman, or could be with the right elicitation.
Edit: I misread the sentence. I'll leave the comments: they are a good argument against a position Raymond doesn't hold.
As a pointer, we are currently less than perfect at making institutions corrigible, doing scalable oversight on them, preventing mesa-optimisers from forming, and so on
Hey Raymond. Do you think is the true apples-to-apples?
Like, scalable oversight of the Federal Reserve is much harder than scalable oversight of Claude-4. But the relevant comparison is the Federal Reserve versus Claude-N which could automate the Federal Reserve.
I'm not sure I understand what you mean by relevant comparison here. What I was trying to claim in the quote is that humanity already faces something analogous to the technical alignment problem in building institutions, which we haven't fully solved.
If you're saying we can sidestep the institutional challenge by solving technical alignment, I think this is partly true -- you can pass the buck of aligning the fed onto aligning Claude-N, and in turn onto whatever Claude-N is aligned to, which will either be an institution (same problem!) or some kind of aggregation of human preferences and maybe the good (different hard problem!).
Edit: I misread the sentence. I'll leave the comments: they are a good argument against a position Raymond doesn't hold.
Unless I'm misreading you, you're saying:
But is (2) actually true? Well, there are two comparisons we can make:
(A) Compare the alignment/corrigibility/control of our current institutions (e.g. Federal Reserve) against that of our current AIs (e.g. Claude Opus 4).
(B) Compare the alignment/corrigibility/control of our current institutions (e.g. Federal Reserve) against that of some speculative AIs that had the capabilities and affordances as those institutions (e.g. Claude-N, FedGPT).
And I'm claiming that Comparison B, not Comparison A, is the relevant comparison for determining whether institutional alignment/corrigiblity/control might be harder than AI alignment/corrigiblity/control.
And moreover, I think our current institutions are wayyyyyyyyy more aligned/corrigible/controlled than I'd expect from AIs with the same capabilities and affordances!
Imagine if you built an AI which substituted for the Federal Reserve but still behaved as corrigibly/aligned/controlled as the Federal Reserve actually does. Then I think people would be like "Wow, you just solved AI alignment/corrigibility/control!". Similarly for other institutions, e.g. the military, academia, big corporations, etc.
We can give the Federal Reserve (or similar institutions) a goal like "maximum employment and stable prices" and it will basically follow the goal within legal, ethical, safe bounds. Occasionally things go wrong, sure, but not in a "the Fed has destroyed the sun with nanobots"-kinda way. Such institutions aren't great, but they are way better than I'd expect from a misaligned AI at the same level of capabilities and affordances.
NB: I still buy that institutional alignment/corrigiblity/control might be harder than AI alignment/corrigiblity/control.[1] My point is somewhat minor/nitpicky: I think (2) isn't good evidence and is slightly confused about A vs B.
For example:
Ah! Ok, yeah, I think we were talking past each other here.
I'm not trying to claim here that the institutional case might be harder than the AI case. When I said "less than perfect at making institutions corrigible" I didn't mean "less compared to AI" I meant "overall not perfect". So the square brackets you put in (2) was not something I intended to express.
The thing I was trying to gesture at was just that there are kind of institutional analogs for lots of alignment concepts, like corrigibility. I wasn't aiming to actually compare their difficulty -- I think like you I'm not really sure, and it does feel pretty hard to pick a fair standard for comparison.
oh lmao I think I just misread "we are currently less than perfect at making institutions corrigible" as "we are currently less perfect at making institutions corrigible"
- Conversely, there is some (potentially high) threshold of societal epistemics + coordination + institutional steering beyond which we can largely eliminate anthropogenic x-risk, potentially in perpetuity
Note that this is not a logical converse of your first statement. I realize that the word "conversely" can be used non-strictly and might in fact be used this way by you here, but I'm stating this just in case.
My guess is that "there is some (potentially high) threshold of societal epistemics + coordination + institutional steering beyond which we can largely eliminate anthropogenic x-risk in perpetuity" is false — my guess is that improving [societal epistemics + coordination + institutional steering] is an infinite endeavor; I discuss this a bit here. That said, I think it is plausible that there is a possible position from which we could reasonably be fairly confident that things will be going pretty well for a really long time — I just think that this would involve one continuing to develop one's methods of [societal epistemics, coordination, institutional steering, etc.] as one proceeds.
Yeah agreed, I think the feasible goal is passing some tipping point where you can keep solving more problems as they come up, and that what comes next is likely to be a continual endeavour.
Basically nobody actually wants the world to end, so if we do that to ourselves, it will be because somewhere along the way we weren’t good enough at navigating collective action problems, institutional steering, and general epistemics
... or because we didn't understand important stuff well enough in time (for example: if it is the case that by default, the first AI that could prove would eat the Sun, we would want to firmly understand this ahead of time), or because we weren't good enough at thinking (for example, people could just be lacking in iq, or have never developed an adequate sense of what it is even like to understand something, or be intellectually careless), or because we weren't fast enough at disseminating or [listening to] the best individual understanding in critical cases, or because we didn't value the right kinds of philosophical and scientific work enough, or because we largely-ethically-confusedly thought some action would not end the world despite grasping some key factual broad strokes of what would happen after, or because we didn't realize we should be more careful, or maybe because generally understanding what will happen when you set some process in motion is just extremely cursed.[1] I guess one could consider each of these to be under failures in general epistemics... but I feel like just saying "general epistemics" is not giving understanding its proper due here.
Many of these are related and overlapping. ↩︎
Sure, I'm definitely eliding a bunch of stuff here. Actually one of the things I'm pretty confused about is how to carve up the space, and what the natural category for all this is: epistemics feels like a big stretch. But there clearly is some defined thing that's narrower than 'get better at literally everything'.
As AI gets more advanced, and therefore more risky, it will also unlock really radical advances in all these areas
This premise sounds optimistic to me. Risk is rising in current frontier models, while concrete applications to the real economy and society remain limited (with hallucinations and loss of focus on long tasks being major limitations). I don't see such strong claims becoming reality before ASI (if we don't die).
In which ways does any tech (let alone AI, but I'm with other commentators here in that I'm not convinced that it has to be AI) enable "coordination and sensible decision making" that you speak of?
The bar for ‘good enough’ might be quite high
That bar for "good enough" may also be above "unacceptable", requiring eusocial levels of coordination where individuals are essentially drones.
I think this is possible but unlikely, just because the number of things you need to really take off the table isn't massive, unless we're in an extremely vulnerable world. It seems very likely we'll need to do some power concentration, but also that tech will probably be able to expand the frontier in ways that means this doesn't trade so heavily against individual liberty.
The AI tools/epistemics space might provide a route to a sociotechnical victory, where instead of aiming for something like aligned ASI, we aim for making civilization coherent enough to not destroy itself while still keeping anchored to what’s good[1].
The core ideas are:
I think these points are widely appreciated, but most people don’t seem to have really grappled with the implications — most centrally, that we should plausibly be aiming for a massive increase in collective reasoning and coordination as a core x-risk reduction strategy, potentially as an even higher priority than technical alignment.
Some advantages of this strategy:
Some challenges:
The big implication in my mind is that it might be worth investing serious effort in mapping out what this coherent and capable enough society would look like, whether it’s even feasible, and what we’d need to do to get there.
(Such an effort is something that I and others are working up towards — so if you think this is wildly misguided, or if you feel particularly enthusiastic about this direction, I'd be keen to hear about it.)
Thanks to OCB, OS, and MD for helpful comments, and to many others I've discussed similar ideas with
The easy route to 'coherent enough to not destroy itself' is 'controlled by a dictatorship/misaligned AI', so the more nebulous 'still anchored to the good' part is I think the actual tricky bit
Importantly this might include making fundamental advances in understanding what it even means for an institution to be steered by some set of values