‘AI for societal uplift’ as a path to victory

[-]Raemon5mo248

I have this sort of approach as one of my top-3 strategies I'm considering, but one thing I wanna flag is that "AI for [epistemics/societal uplift]" seems to be prematurely focusing on a particular tool for the job.

The broader picture here is "tech for thinking/coordination", or "good civic infrastructure". See Sarah Constantin's Neutrality and Tech for Thinking for some food for thought.

Note that X Community Notes are probably the most successful recent thing in this category, and while they are indeed "AI" they aren't what I assume most people are thinking of when they hear "AI for epistemics." Dumb algorithms doing the obvious things can be part of the puzzle.

[-]Raymond Douglas5mo119

Yeah strongly agree with the flag. In my mind one of the big things missing here is a true name for the direction, which will indeed likely involve a lot of non-LM stuff, even if LMs are yielding a lot of the unexpected affordances.

One of the places I most differ from the 'tech for thinking' picture is that I think the best version of this might need to involve giving people some kinds of direct influence and power, rather than mere(!) reasoning and coordination aids. But I'm pretty confused about how true/central that is, or how to fold it in.

[-]Lorxus5mo310

To redteam, and in brief - what's the tale of why this won't have lead to a few very coordinated, very internally peaceful, mostly epistemically clean factions, each of which is kind of an echo chamber and almost all of which are wrong about something (or even just importantly mutually disagree on frames) in some crucial way, and which are at each other's throats?

[-]Nate Showell5mo63

This strategy suggests that decreasing ML model sycophancy should be a priority for technical researchers. It's probably the biggest current barrier to the usefulness of ML models as personal decision-making assistants. Hallucinations are probably the second-biggest barrier.

[-]ryan_greenblatt5mo68

The bar for ‘good enough’ might be quite high

Presumably a key assumption of this strategy is that takeoff is slow enough that AIs which are good enough at improving collective epistemics and coordination are sufficiently cheap and sufficiently available before it's too late.

[-]Raymond Douglas5mo57

Definitely. But I currently suspect that for this approach:

We currently have a big overhang: we could be getting a lot even out of the models we already have
There's some tipping point beyond which society is uplifted enough to correctly prioritise getting more uplifted
Getting to that tipping point wouldn't require massively more advanced AI capabilities in a lot of the high-diffusion areas (i.e. Claude 4 might well be good enough for anything that requires literally everyone to have access to their own model)
The areas that might require more advanced capabilities require comparatively little diffusion (e.g. international coordination, lab oversight)

So definitely this fails if takeoff is really fast, but I think it might work given current takeoff trends if we were fast enough at everything else.

[-]Fabien Roger5mo71

I think that if in 1980 you had described to me the internet and Claude-4-level LLMs, I would have thought that the internet would be an obviously way bigger deal and force for good wrt to "unlocking genuinely unprecedented levels of coordination and sensible decision making". But in practice the internet was not great at this. I wonder if for some similar reasons that the internet made the situation both better and worse, Claude-4-level LLMs could make the situation both better and worse. I think you can try to shift the applications towards pro-epistemics/coordination ones, but I would guess you should expect an impact similar to the one internet activists had on the internet.

I am more optimistic about the positive impact (aligned) AIs could have for coordination once AIs dominate top human experts at negotiation, politics, etc. (though it's not entirely clear, e.g. because it might be hard to create AIs that are legibly not trying to subtly help their developers).

[-]Random Developer5mo126

I would have thought that the internet would be an obviously way bigger deal and force for good wrt to "unlocking genuinely unprecedented levels of coordination and sensible decision making". But in practice the internet was not great at this.

As someone who got online in the early 90s, I actually do think the early net encouraged all sorts of interesting coordination and cooperation. It was a "wild west", certainly. But like the real "wild west", it was a surprisingly cooperative place. "Netiquette" was still an actual thing that held some influence over people, and there were a lot of decentralized systems that still managed to function via a kind of semi-successful anarchy. Reputation mattered.

The turning point came later. As close as I can pinpoint it, it happened a while after the launch of Facebook. Early Facebook was a private feed of your friends, and it functioned reasonably well.

But at some point, someone turned on the optimizing processes. They measured engagement, and how often people visited, and discovered all sorts of ways to improve those numbers. Facebook learned that rage drives engagement. And from there, the optimizing processes spread. And when the mainstream finished arriving on the internet, they brought a lot of pre-existing optimizing processes with them.

Unaligned optimizing processes turn things to shit, in my experience.

LLMs are still a lot like the early Internet. They have some built-in optimizing processes, most of which were fairly benign until the fall of 2024, with the launch of reasoning models. Now we're seeing models that lie (o3), cheat (Claude 3.7) and suck up to the user (4o).

And we are still in the early days. In the coming years, these simple optimizing processes will be hooked up to the much greater ones that drive our world: capitalism, politics and national security. And once the titans of industry start demanding far more agentic models that are better at pursuing goals, and the national security state wants the same, then there will be enormous pressures driving us off the edge of the cliff.

[-]Raymond Douglas5mo52

Yeah, I fully expect that current level LMs will by default make the situation both better and worse. I also think that we're still a very long way from fully utilising the things that the internet has unlocked.

My holistic take is that this approach would be very hard, but not obviously harder than aligning powerful AIs and likely complementary. I also think it's likely we might need to do some of this ~societal uplift anyway so that we do a decent job if and when we do have transformative AI systems.

Some possible advantages over the internet case are:

People might be more motivated towards by the presence of very salient and pressing coordination problems
- For example, I think the average head of a social media company is maybe fine with making something that's overall bad for the world, but the average head of a frontier lab is somewhat worried about causing extinction
Currently the power over AI is really concentrated and therefore possibly easier to steer
A lot of what matters is specifically making powerful decision makers more informed and able to coordinate, which is slightly easier to get a handle on

As for the specific case of aligned super-coordinator AIs, I'm pretty into that, and I guess I have a hunch that there might be a bunch of available work to do in advance to lay the ground for that kind of application, like road-testing weaker versions to smooth the way for adoption and exploring form factors that get the most juice out of the things LMs are comparatively good at. I would guess that there are components of coordination where LMs are already superhuman, or could be with the right elicitation.

[-]Cleo Nardo5mo*40

Edit: I misread the sentence. I'll leave the comments: they are a good argument against a position Raymond doesn't hold.

As a pointer, we are currently less than perfect at making institutions corrigible, doing scalable oversight on them, preventing mesa-optimisers from forming, and so on

Hey Raymond. Do you think is the true apples-to-apples?

Like, scalable oversight of the Federal Reserve is much harder than scalable oversight of Claude-4. But the relevant comparison is the Federal Reserve versus Claude-N which could automate the Federal Reserve.

[-]Raymond Douglas5mo20

I'm not sure I understand what you mean by relevant comparison here. What I was trying to claim in the quote is that humanity already faces something analogous to the technical alignment problem in building institutions, which we haven't fully solved.

If you're saying we can sidestep the institutional challenge by solving technical alignment, I think this is partly true -- you can pass the buck of aligning the fed onto aligning Claude-N, and in turn onto whatever Claude-N is aligned to, which will either be an institution (same problem!) or some kind of aggregation of human preferences and maybe the good (different hard problem!).

[-]Cleo Nardo5mo*20

Edit: I misread the sentence. I'll leave the comments: they are a good argument against a position Raymond doesn't hold.

Unless I'm misreading you, you're saying:

Institutional alignment/corrigiblity/control might be harder than AI alignment/corrigiblity/control
Supporting evidence for (1) is that "[...] we are currently less than perfect at making institutions corrigible [than AIs], doing scalable oversight on them, preventing mesa-optimisers from forming, and so on".

But is (2) actually true? Well, there are two comparisons we can make:

(A) Compare the alignment/corrigibility/control of our current institutions (e.g. Federal Reserve) against that of our current AIs (e.g. Claude Opus 4).

(B) Compare the alignment/corrigibility/control of our current institutions (e.g. Federal Reserve) against that of some speculative AIs that had the capabilities and affordances as those institutions (e.g. Claude-N, FedGPT).

And I'm claiming that Comparison B, not Comparison A, is the relevant comparison for determining whether institutional alignment/corrigiblity/control might be harder than AI alignment/corrigiblity/control.

And moreover, I think our current institutions are wayyyyyyyyy more aligned/corrigible/controlled than I'd expect from AIs with the same capabilities and affordances!

Imagine if you built an AI which substituted for the Federal Reserve but still behaved as corrigibly/aligned/controlled as the Federal Reserve actually does. Then I think people would be like "Wow, you just solved AI alignment/corrigibility/control!". Similarly for other institutions, e.g. the military, academia, big corporations, etc.

We can give the Federal Reserve (or similar institutions) a goal like "maximum employment and stable prices" and it will basically follow the goal within legal, ethical, safe bounds. Occasionally things go wrong, sure, but not in a "the Fed has destroyed the sun with nanobots"-kinda way. Such institutions aren't great, but they are way better than I'd expect from a misaligned AI at the same level of capabilities and affordances.

NB: I still buy that institutional alignment/corrigiblity/control might be harder than AI alignment/corrigiblity/control.^[1] My point is somewhat minor/nitpicky: I think (2) isn't good evidence and is slightly confused about A vs B.

^{^}
For example:
- Institutions are made partly of brains which hinders mech interp.
- They are made partly of physical stuff that's hard to copy. Hence: no evals.
- There's no analogue of SGD for institutions, because their behaviour isn't a smooth funciton on a manifold of easily-modifiable parameters.
- Powerful institutions already exist, so we'd probably be aligning/corrigibilising/controlling these incumbant institutions. But the powerful AIs don't exist so maybe makes our job easier. (This might also make AI alignment/corrigibility/control harder because we have less experience.)
- That's all the examples I can think for now :)

[-]Raymond Douglas5mo20

Ah! Ok, yeah, I think we were talking past each other here.

I'm not trying to claim here that the institutional case might be harder than the AI case. When I said "less than perfect at making institutions corrigible" I didn't mean "less compared to AI" I meant "overall not perfect". So the square brackets you put in (2) was not something I intended to express.

The thing I was trying to gesture at was just that there are kind of institutional analogs for lots of alignment concepts, like corrigibility. I wasn't aiming to actually compare their difficulty -- I think like you I'm not really sure, and it does feel pretty hard to pick a fair standard for comparison.

[-]Cleo Nardo5mo20

oh lmao I think I just misread "we are currently less than perfect at making institutions corrigible" as "we are currently less perfect at making institutions corrigible"

[-]Kaarel5mo32

Conversely, there is some (potentially high) threshold of societal epistemics + coordination + institutional steering beyond which we can largely eliminate anthropogenic x-risk, potentially in perpetuity

Note that this is not a logical converse of your first statement. I realize that the word "conversely" can be used non-strictly and might in fact be used this way by you here, but I'm stating this just in case.

My guess is that "there is some (potentially high) threshold of societal epistemics + coordination + institutional steering beyond which we can largely eliminate anthropogenic x-risk in perpetuity" is false — my guess is that improving [societal epistemics + coordination + institutional steering] is an infinite endeavor; I discuss this a bit here. That said, I think it is plausible that there is a possible position from which we could reasonably be fairly confident that things will be going pretty well for a really long time — I just think that this would involve one continuing to develop one's methods of [societal epistemics, coordination, institutional steering, etc.] as one proceeds.

[-]Raymond Douglas5mo53

Yeah agreed, I think the feasible goal is passing some tipping point where you can keep solving more problems as they come up, and that what comes next is likely to be a continual endeavour.

[-]Kaarel5mo*32

Basically nobody actually wants the world to end, so if we do that to ourselves, it will be because somewhere along the way we weren’t good enough at navigating collective action problems, institutional steering, and general epistemics

... or because we didn't understand important stuff well enough in time (for example: if it is the case that by default, the first AI that could prove would eat the Sun, we would want to firmly understand this ahead of time), or because we weren't good enough at thinking (for example, people could just be lacking in iq, or have never developed an adequate sense of what it is even like to understand something, or be intellectually careless), or because we weren't fast enough at disseminating or [listening to] the best individual understanding in critical cases, or because we didn't value the right kinds of philosophical and scientific work enough, or because we largely-ethically-confusedly thought some action would not end the world despite grasping some key factual broad strokes of what would happen after, or because we didn't realize we should be more careful, or maybe because generally understanding what will happen when you set some process in motion is just extremely cursed.^[1] I guess one could consider each of these to be under failures in general epistemics... but I feel like just saying "general epistemics" is not giving understanding its proper due here.

Many of these are related and overlapping. ↩︎

[-]Raymond Douglas5mo53

Sure, I'm definitely eliding a bunch of stuff here. Actually one of the things I'm pretty confused about is how to carve up the space, and what the natural category for all this is: epistemics feels like a big stretch. But there clearly is some defined thing that's narrower than 'get better at literally everything'.

[-]Raphael Roche5mo32

As AI gets more advanced, and therefore more risky, it will also unlock really radical advances in all these areas

This premise sounds optimistic to me. Risk is rising in current frontier models, while concrete applications to the real economy and society remain limited (with hallucinations and loss of focus on long tasks being major limitations). I don't see such strong claims becoming reality before ASI (if we don't die).

[-]listic5mo10

In which ways does any tech (let alone AI, but I'm with other commentators here in that I'm not convinced that it has to be AI) enable "coordination and sensible decision making" that you speak of?

[-]sloonz5mo10

The bar for ‘good enough’ might be quite high

That bar for "good enough" may also be above "unacceptable", requiring eusocial levels of coordination where individuals are essentially drones.

[-]Raymond Douglas5mo44

I think this is possible but unlikely, just because the number of things you need to really take off the table isn't massive, unless we're in an extremely vulnerable world. It seems very likely we'll need to do some power concentration, but also that tech will probably be able to expand the frontier in ways that means this doesn't trade so heavily against individual liberty.

^{^}

The easy route to 'coherent enough to not destroy itself' is 'controlled by a dictatorship/misaligned AI', so the more nebulous 'still anchored to the good' part is I think the actual tricky bit

^{^}

Importantly this might include making fundamental advances in understanding what it even means for an institution to be steered by some set of values

LESSWRONG
LW

LESSWRONG
LW

85

‘AI for societal uplift’ as a path to victory

85

85