LESSWRONG

Raymond Douglas

Comments

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

Ah! Ok, yeah, I think we were talking past each other here.

I'm not trying to claim here that the institutional case might be harder than the AI case. When I said "less than perfect at making institutions corrigible", I didn't mean "less compared to AI", I meant "overall not perfect". So the square brackets you put in (2) were not something I intended to express.

The thing I was trying to gesture at was just that there are kind of institutional analogs for lots of alignment concepts, like corrigibility. I wasn't aiming to actually compare their difficulty -- I think like you I'm not really sure, and it does feel pretty hard to pick a fair standard for comparison.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

I'm not sure I understand what you mean by relevant comparison here. What I was trying to claim in the quote is that humanity already faces something analogous to the technical alignment problem in building institutions, which we haven't fully solved.

If you're saying we can sidestep the institutional challenge by solving technical alignment, I think this is partly true -- you can pass the buck of aligning the Fed onto aligning Claude-N, and in turn onto whatever Claude-N is aligned to, which will either be an institution (same problem!) or some kind of aggregation of human preferences and maybe the good (different hard problem!).

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

Sure, I'm definitely eliding a bunch of stuff here. Actually one of the things I'm pretty confused about is how to carve up the space, and what the natural category for all this is: epistemics feels like a big stretch. But there clearly is some defined thing that's narrower than 'get better at literally everything'.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

Yeah, agreed. I think the feasible goal is passing some tipping point beyond which you can keep solving more problems as they come up, and that what comes next is likely to be a continual endeavour.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 5d

Yeah, I fully expect that current-level LMs will by default make the situation both better and worse. I also think that we're still a very long way from fully utilising the things that the internet has unlocked.

My holistic take is that this approach would be very hard, but not obviously harder than aligning powerful AIs and likely complementary. I also think it's likely we might need to do some of this ~societal uplift anyway so that we do a decent job if and when we do have transformative AI systems.

Some possible advantages over the internet case are:

  • People might be more motivated by the presence of very salient and pressing coordination problems
    • For example, I think the average head of a social media company is maybe fine with making something that's overall bad for the world, but the average head of a frontier lab is somewhat worried about causing extinction
  • Currently the power over AI is really concentrated and therefore possibly easier to steer
  • A lot of what matters is specifically making powerful decision makers more informed and able to coordinate, which is slightly easier to get a handle on

As for the specific case of aligned super-coordinator AIs, I'm pretty into that, and I guess I have a hunch that there might be a bunch of available work to do in advance to lay the ground for that kind of application, like road-testing weaker versions to smooth the way for adoption and exploring form factors that get the most juice out of the things LMs are comparatively good at. I would guess that there are components of coordination where LMs are already superhuman, or could be with the right elicitation.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 5d

I think this is possible but unlikely, just because the number of things you need to really take off the table isn't massive, unless we're in an extremely vulnerable world. It seems very likely we'll need to do some power concentration, but also that tech will probably be able to expand the frontier in ways that mean this doesn't trade so heavily against individual liberty.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 5d

Yeah, strongly agree with the flag. In my mind one of the big things missing here is a true name for the direction, which will indeed likely involve a lot of non-LM stuff, even if LMs are yielding a lot of the unexpected affordances.

One of the places I most differ from the 'tech for thinking' picture is that I think the best version of this might need to involve giving people some kinds of direct influence and power, rather than mere(!) reasoning and coordination aids. But I'm pretty confused about how true/central that is, or how to fold it in.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 6d

Definitely. But I currently suspect that for this approach:

  1. We currently have a big overhang: we could be getting a lot even out of the models we already have
  2. There's some tipping point beyond which society is uplifted enough to correctly prioritise getting more uplifted
  3. Getting to that tipping point wouldn't require massively more advanced AI capabilities in a lot of the high-diffusion areas (i.e. Claude 4 might well be good enough for anything that requires literally everyone to have access to their own model)
  4. The areas that might require more advanced capabilities require comparatively little diffusion (e.g. international coordination, lab oversight)

So definitely this fails if takeoff is really fast, but I think it might work given current takeoff trends if we were fast enough at everything else.

Richard Ngo's Shortform
Raymond Douglas · 1mo

Interesting! Two questions:

  • What about the 5-and-10 problem makes it particularly relevant/interesting here? What would a 'solution' entail?
  • How far are you planning to build empirical cases, model them, and generalise from below, versus trying to extend pure mathematical frameworks like geometric rationality? Or are there other major angles of attack you're considering?
Gradual Disempowerment: Concrete Research Projects
Raymond Douglas · 1mo

To me the reason the agent/model distinction matters is that there are ways in which an LLM is not an agent, so inferences (behavioural or mechanistic) that would make sense for an agent can be incorrect. For example, an LM's outputs ("I've picked a secret answer") might give the impression that it has internally represented something when it hasn't, and so intent-based concepts like deception might not apply in the way we expect them to.

I think the dynamics of model personas seem really interesting! To me the main puzzle is methodological: how do you even get traction on it empirically? I'm not sure how you'd know if you were identifying real structure inside the model, so I don't see any obvious ways in. But I think progress here could be really valuable! I guess the closest concrete thing I've been thinking about is studying the dynamics of repeatedly retraining models on interactions with users who have persistent assumptions about the models, and seeing how much that shapes the distribution of personality traits. Do you have ideas in mind?

Posts

  • ‘AI for societal uplift’ as a path to victory (6d)
  • Upcoming workshop on Post-AGI Civilizational Equilibria (19d)
  • Gradual Disempowerment: Concrete Research Projects (1mo)
  • Disempowerment spirals as a likely mechanism for existential catastrophe (3mo)
  • Selection Pressures on LM Personas (3mo)
  • Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development (5mo)
  • What does success look like? (6mo)
  • The Choice Transition (8mo)
  • Decomposing Agency — capabilities without desires (1y)
  • ChatGPT can learn indirect control (1y)