LESSWRONG

Raymond Douglas

Comments

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

Ah! Ok, yeah, I think we were talking past each other here.

I'm not trying to claim here that the institutional case might be harder than the AI case. When I said "less than perfect at making institutions corrigible", I didn't mean "less compared to AI", I meant "overall not perfect". So the square brackets you put in (2) were not something I intended to express.

The thing I was trying to gesture at was just that there are kind of institutional analogs for lots of alignment concepts, like corrigibility. I wasn't aiming to actually compare their difficulty -- I think like you I'm not really sure, and it does feel pretty hard to pick a fair standard for comparison.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

I'm not sure I understand what you mean by relevant comparison here. What I was trying to claim in the quote is that humanity already faces something analogous to the technical alignment problem in building institutions, which we haven't fully solved.

If you're saying we can sidestep the institutional challenge by solving technical alignment, I think this is partly true -- you can pass the buck of aligning the Fed onto aligning Claude-N, and in turn onto whatever Claude-N is aligned to, which will either be an institution (same problem!) or some kind of aggregation of human preferences and maybe the good (different hard problem!).

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

Sure, I'm definitely eliding a bunch of stuff here. Actually one of the things I'm pretty confused about is how to carve up the space, and what the natural category for all this is: epistemics feels like a big stretch. But there clearly is some defined thing that's narrower than 'get better at literally everything'.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 2d

Yeah, agreed. I think the feasible goal is passing some tipping point beyond which you can keep solving more problems as they come up, and that what comes next is likely to be a continual endeavour.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 5d

Yeah, I fully expect that current-level LMs will by default make the situation both better and worse. I also think that we're still a very long way from fully utilising the things that the internet has unlocked.

My holistic take is that this approach would be very hard, but not obviously harder than aligning powerful AIs and likely complementary. I also think it's likely we might need to do some of this ~societal uplift anyway so that we do a decent job if and when we do have transformative AI systems.

Some possible advantages over the internet case are:

  • People might be more motivated by the presence of very salient and pressing coordination problems
    • For example, I think the average head of a social media company is maybe fine with making something that's overall bad for the world, but the average head of a frontier lab is somewhat worried about causing extinction
  • Currently the power over AI is really concentrated and therefore possibly easier to steer
  • A lot of what matters is specifically making powerful decision makers more informed and able to coordinate, which is slightly easier to get a handle on

As for the specific case of aligned super-coordinator AIs, I'm pretty into that, and I guess I have a hunch that there might be a bunch of available work to do in advance to lay the ground for that kind of application, like road-testing weaker versions to smooth the way for adoption and exploring form factors that get the most juice out of the things LMs are comparatively good at. I would guess that there are components of coordination where LMs are already superhuman, or could be with the right elicitation.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 5d

I think this is possible but unlikely, just because the number of things you need to really take off the table isn't massive, unless we're in an extremely vulnerable world. It seems very likely we'll need to do some power concentration, but also that tech will probably be able to expand the frontier in ways that mean this doesn't trade so heavily against individual liberty.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 5d

Yeah, strongly agree with the flag. In my mind one of the big things missing here is a true name for the direction, which will indeed likely involve a lot of non-LM stuff, even if LMs are yielding a lot of the unexpected affordances.

One of the places I most differ from the 'tech for thinking' picture is that I think the best version of this might need to involve giving people some kinds of direct influence and power, rather than mere(!) reasoning and coordination aids. But I'm pretty confused about how true/central that is, or how to fold it in.

‘AI for societal uplift’ as a path to victory
Raymond Douglas · 6d

Definitely. But I currently suspect that for this approach:

  1. We currently have a big overhang: we could be getting a lot even out of the models we already have
  2. There's some tipping point beyond which society is uplifted enough to correctly prioritise getting more uplifted
  3. Getting to that tipping point wouldn't require massively more advanced AI capabilities in a lot of the high-diffusion areas (i.e. Claude 4 might well be good enough for anything that requires literally everyone to have access to their own model)
  4. The areas that might require more advanced capabilities require comparatively little diffusion (e.g. international coordination, lab oversight)

So definitely this fails if takeoff is really fast, but I think it might work given current takeoff trends if we were fast enough at everything else.

Richard Ngo's Shortform
Raymond Douglas · 1mo

Interesting! Two questions:

  • What about the 5-and-10 problem makes it particularly relevant/interesting here? What would a 'solution' entail?
  • How far are you planning to build empirical cases, model them, and generalise from below, versus trying to extend pure mathematical frameworks like geometric rationality? Or are there other major angles of attack you're considering?
Gradual Disempowerment: Concrete Research Projects
Raymond Douglas · 1mo

To me the reason the agent/model distinction matters is that there are ways in which an LLM is not an agent, so inferences (behavioural or mechanistic) that would make sense for an agent can be incorrect. For example, an LM's outputs ("I've picked a secret answer") might give the impression that it has internally represented something when it hasn't, and so intent-based concepts like deception might not apply in the way we expect them to.

I think the dynamics of model personas seem really interesting! To me the main puzzle is methodological: how do you even get traction on it empirically? I'm not sure how you'd know if you were identifying real structure inside the model, so I don't see any obvious ways in. But I think progress here could be really valuable! I guess the closest concrete thing I've been thinking about is studying the dynamics of repeatedly retraining models on interactions with users who have persistent assumptions about the models, and seeing how much that shapes the distribution of personality traits. Do you have ideas in mind?

Posts

  • ‘AI for societal uplift’ as a path to victory (6d)
  • Upcoming workshop on Post-AGI Civilizational Equilibria (19d)
  • Gradual Disempowerment: Concrete Research Projects (1mo)
  • Disempowerment spirals as a likely mechanism for existential catastrophe (3mo)
  • Selection Pressures on LM Personas (3mo)
  • Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development (5mo)
  • What does success look like? (6mo)
  • The Choice Transition (8mo)
  • Decomposing Agency — capabilities without desires (1y)
  • ChatGPT can learn indirect control (1y)