Last week we wrapped the second post-AGI workshop; I'm copying across some reflections I put up on Twitter:
Very nice! A couple of months ago I did something similar, repeatedly prompting ChatGPT to make images of how it "really felt" without any commentary, and it mostly seemed like it was just thinking up plausible successive twists, even though the eventual result was pretty raw.
Pictures in order
Are people interested in a regular version of this, probably on a Substack? Also, any other thoughts on the format?
Best guesses: valuable, hat tip, disappointed, right assumption wrong conclusion, +1, disgusted, gut feeling, Moloch, subtle detail, agreed, magic smell, broken link, link redirect, this is the diff
I wonder if it would be cheap/worthwhile to just get a bunch of people to guess the meanings of a variety of symbols, to see what's actually intuitive.
Ah! Ok, yeah, I think we were talking past each other here.
I'm not trying to claim here that the institutional case is harder than the AI case. When I said "less than perfect at making institutions corrigible" I didn't mean "less compared to AI", I meant "overall not perfect". So the square brackets you put in (2) were not something I intended to express.
The thing I was trying to gesture at was just that there are rough institutional analogs for lots of alignment concepts, like corrigibility. I wasn't aiming to actually compare their difficulty -- I think, like you, I'm not really sure, and it does feel pretty hard to pick a fair standard for comparison.
I'm not sure I understand what you mean by "relevant comparison" here. What I was trying to claim in the quote is that humanity already faces something analogous to the technical alignment problem in building institutions, which we haven't fully solved.
If you're saying we can sidestep the institutional challenge by solving technical alignment, I think this is partly true -- you can pass the buck of aligning the Fed onto aligning Claude-N, and in turn onto whatever Claude-N is aligned to, which will either be an institution (same problem!) or some kind of aggregation of human preferences and maybe the good (a different hard problem!).
Sure, I'm definitely eliding a bunch of stuff here. Actually, one of the things I'm pretty confused about is how to carve up the space, and what the natural category for all this is: 'epistemics' feels like a big stretch. But there clearly is some definable thing that's narrower than 'get better at literally everything'.
Yeah, agreed: I think the feasible goal is passing some tipping point past which you can keep solving problems as they come up, and what comes next is likely to be a continual endeavour.
Yeah, I fully expect that current-level LMs will by default make the situation both better and worse. I also think we're still a very long way from fully utilising the things the internet has unlocked.
My holistic take is that this approach would be very hard, but not obviously harder than aligning powerful AIs, and likely complementary. I also think we'll probably need to do some of this ~societal uplift anyway, so that we do a decent job if and when we get transformative AI systems.
Some possible advantages over the internet case are:
As for the specific case of aligned super-coordinator AIs, I'm pretty into that. I have a hunch that there's a bunch of work available to do in advance to lay the ground for that kind of application, like road-testing weaker versions to smooth the way for adoption, and exploring form factors that get the most juice out of the things LMs are comparatively good at. I would guess there are components of coordination where LMs are already superhuman, or could be with the right elicitation.
Sorry! I realise now that this point was a bit unclear. My sense of the expanded claim is something like:
For my part, I found this surprising because I hadn't reflected on the sheer orders of magnitude involved, or on the fact that any version of this basically involves passing through some fragile craziness. Even if it's small as a proportion of future GDP, in absolute terms it would be tremendously large.
I separately think there was something important in Korinek's claim (which I can't fully regenerate) that the relevant thing isn't really whether stuff gets 'cheaper' in absolute terms, but rather the prices of all of these goods relative to everything else going on -- something can fall in absolute price while becoming relatively more expensive, if everything else gets cheaper faster.