Raemon

LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

Comments (Sorted by Newest)
Wei Dai's Shortform
Raemon · 3h

Say more about the de-facto eugenics program?

Raemon's Shortform
Raemon · 3h

I've heard ~"I don't really get this concept of 'intelligence in the limit'" a couple times this week. 

Which seems worth responding to, but I'm not sure how.

It seemed like some combination of: "wait, why do we care about 'superintelligence in the limit' as opposed to any particular 'superintelligence-in-practice'?", as well as "what exactly do we mean by The Limit?" and "why would we think The Limit is shaped the way Yudkowsky thinks?"

My impression, based on my two most recent conversations about it, is that this is not only sort of cloudy and confusing-feeling to some people, but also intertwined with a few other things that are separately cloudy and confusing. And it's also intertwined with other things that aren't cloudy and confusing per se, but there are a lot of individual arguments to keep track of, so it's easy to get lost.

One ontology here is:

  • it's useful to reason with nice abstractions that generalize to different situations.
    • (It's easier to think about such abstractions at extremes, given simple assumptions)
  • it's also useful to reason about the nitty-gritty details of a particular implementation of a thing.
  • it's useful to be able to move back and forth between abstractions and specific implementations.

One person I chatted with seemed to be simultaneously frustrated with:

[note, not sure if this is a correct summary of them, they can pop up here to clarify if they want]

"Why does this concept of corrigibility need to care about the specifics of 'powerful enough to end the acute risk period?' Why can't we just think about iteratively making an AI more corrigible as we improve it's capabilities? That can't possibly be important to the type-signature of corrigibility?"

and also: "what is this 'The Limit?' and why do I care?"

and also: "Why is MIRI always talking about abstractions and not talking about the specific implementation details of how takeoff will work in practice? It seems sus and castles-on-sand-y"

My answer was "well, the type signature of corrigibility doesn't care about any particular powerful level, but, it's useful to have typed out "what does corrigibility look at in the limit?". But the reason it's useful to specifically think about corrigibility at The Limit is because the actually nitty-gritty-details of what we need corrigibility for require absurd power levels (i.e. preventing loss by immediate AI takeover, and also loss from longterm evolution a couple decades later).

It seemed like what was going on, in this case, was that they were attempting to loop through the "abstractions" and "gritty details" and "interplay between the two", but they didn't have a good handle on abstraction-in-The-Limit, so they couldn't actually complete the loop. And because it was fuzzy to think about the gritty details without a clear abstraction, and hard to think about abstractions without realistic details, this was making it hard to think about either. (Even though, AFAICT, the problem lay mostly on The Abstract Limit side, not the Realistic Details side.)

...

A different person recently seemed to similarly not be grokking "why do we care about the Limit?", but the corresponding problem wasn't with any particular other argument; there were just a lot of other arguments, they weren't keeping track of all of them at once, and it seemed like they were getting lost in a more mundane way.

...

I don't actually know what to do with any of this, because I'm not sure what's confusing about "Intelligence in the limit." 

(Or: I get that there's a lot of fuzziness there you need to keep track of while reasoning about this. But the basic concept of "well, if it was imperfect at not-getting-resource-pumped, or made suboptimal game theory choices, or gave up when it got stuck, it would know that it wasn't as cognitively powerful as it could be, and would want to find ways to be more cognitively powerful, all else equal"... seems straightforward to me, and I'm not sure what makes it seem not straightforward to others.)

shortplav
Raemon · 4h

I think this is less important than the other confusing terms in this thread, but something I stumbled into yesterday:

"Intelligence"/"Capable" -> "Relentlessly Resourceful/Creative" [1]

(at least in some contexts)

i.e. the reason you expect a superintelligence to be difficult to control is not exactly the raw intelligence. It's that (some people think) the way something succeeds at being truly superintelligent requires being relentlessly resourceful.

(If it wasn't relentlessly resourceful, it maybe could one-shot a large-but-shallow set of problems in a way that wasn't concerning. But, then, if it hit a snag, it would get stuck, and it would be less useful than something that didn't get stuck when it hit snags)

I like this because:

a) it highlights what the problem is more clearly, and

b) it highlights that "if you could build a very useful, powerful tool that succeeds without being relentlessly resourceful, that's maybe a useful avenue."

For examples of relentlessly resourceful people, see:

  • Startup founders
  • Prolific Inventors
  • Elon Musk, in a particularly famous way that includes both the traditional startup-founder-y kind and technical innovation
  • Richard Feynman (who found he had a hard time doing important work after working on the atom bomb, but then solved the problem by changing his mindset to deliberately not focus on "important" things and just follow his interests, which eventually led to more good ideas)

For people that are smart but not obviously "relentlessly resourceful", see "one hit wonders", or people who have certain kinds of genius, but it only comes in flashes and they don't really know how to cultivate it on purpose.

  1. ^

    Resourceful and Creative sort of mean the same thing, and for conciseness I'm going to mostly say "Relentlessly Resourceful" since it's more fun/evocative. But there are some connotations creativity has that are important not to lose track of: i.e. not just able to fully exhaust all local resources, but able to think from a wide variety of angles and see entirely different solutions that lie completely outside its current set of affordances.

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades.
Raemon · 6h

I definitely think the "benevolent godlike singleton" is just as likely to fail in horrifying ways as any other scenario. Once you permanently give away all your power, how do you guarantee any bargain?

This is why you won't build a benevolent godlike singleton until you have vastly more knowledge than we currently have (i.e. by augmenting human intelligence, etc.)[1]

  1. ^

    I'm not sure I buy the current orientation Eliezer/Nate have toward augmented human intelligence in the context of global shutdown, but it does seem like a thing you want before building anything that's likely to escalate to full overwhelming-in-the-limit superintelligence.

Thomas Kwa's Shortform
Raemon · 10h

Low polarization. If there's high polarization with a strong opposing side, the opposing side can point to the radicals in order to hurt the moderates.[2]

I'm not 100% sure what this means (but it sounds interesting)

Agreed that this is a good frame.

Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?")
Raemon · 10h

Thing I wanted to briefly check before responding to some other comments – does your work here particularly route through criticism or changing of the VNM axioms frame? 

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades.
Raemon · 10h

I mean I don't believe most of the leadup-assumptions to this world in the first place. 

But, AI would be way more competent at self-modification than billionaires. I think hopes that route through "build an aligned-ish medium-strength AI, in a corrigibility basin", then "ask it to help you make it more aligned", are the sort of thing that might work if you actually succeed at the first step.

Just, you then have to progress quickly to fully aligned, wise, powerful longterm safeguards, which implies a degree of "fully solve alignment" that it didn't seem like everyone was properly imagining as necessary.

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades.
Raemon · 10h

Nod. That is a somewhat different position from "trying to leverage AI to fully solve alignment, and then leverage it to fundamentally change the situation somehow", but, I'd consider the position you put here to be conceptually similar and this post isn't arguing against it. 

This post is mostly spelling out the explicit assumptions:

  • "you need permanent safeguards"
  • "those safeguards are very complex and wisdom-loaded"
  • and, "you have to build those safeguards before insufficiently friendly AI controls the solar system."

The people with the most sophisticated views may all agree with this, but I don't see those assumptions spelled out clearly very often when coming from this direction, and I want to make sure people are on the same page about that requirement, or check if there are arguments for slow-takeoff optimism that don't route through those three assumptions, since they constrain the goal-state a fair amount.

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades.
Raemon · 1d

Nod. I deliberately titled a section There is no safe "muddling through" without perfect safeguards. (In an earlier draft I did just say "there is no safe muddling through", and then was like "okay, that's false, because it seems totally plausible to muddle through into figuring out longterm safeguards.")

(And, in fact, I don't have a plan to get longterm safeguards that doesn't look like some kind of muddling through, in some sense.)

I was just chatting with @1a3orn, and he brought up a similar point to the industrial revolution concern, and I totally agree.

Some background assumptions I have here:

  • you can't reason your way all the way to "safely navigate the industrial revolution", yeah. Some notable failures:
    • inventing communism
    • trying to invent the cotton gin to make slavery less bad, and accidentally producing way more slavery by inducing demand
    • environmentalism ending up banning nuclear power, which caused a lot of environmental damage
    • (there are positive examples too I think, but the existence of these negative examples should put the fear of god in you)
  • it's still possible to do nonzero reasoning ahead. You can put constraints on what sort of things possibly make sense to be doing.
    • early industrial revolution: if you don't see the first steam train and think "oh shit, everything is gonna change", man you are going to be pointed in the wrong direction completely
      • analogously: if you don't look at the oncoming AI (as well as general economic trends), and think "man, All Possible Views About Humanity's Future Are Wild", you're not pointed in the right direction at all

Part of the point of this post was to lay out: "here's the rough class of thing that seems like it's gonna happen by default. Seems like either we need to learn new facts, or we need a process with an extreme amount of power and wisdom, or we should expect some cluster of bad things to probably happen."

During my chat with 1a3orn, I did notice:

Okay, if I'm trying to solve the 'death by evolution' problem (still assuming we get a nice smooth takeoff), an alternate plan from "build the machine god" is:

Send human uploads with some von Neumann probes to every star in the universe, immediately, before we leave The Dreamtime. And then probably there will at least be a lot of subjective experience-timeslices and chances for some of them to figure out how to make good things happen, with (maybe) like a 10 year head start before hollow grabby AI comes after them.

I don't actually believe in nice slow takeoff or 10 year lead times before Hollow Grabby AI comes after them, but, if I did, that'd at least be a coherent plan.

The problems with that are:

a) it's still leaving a lot of risk of costly war between the human diaspora and the Hollow Grabby AI

b) many of the humans across the universe are probably going to do horrible S-risky mindcrime.

So, I'm not very satisfied with that plan, but I mention it to help broaden the creative range of solutions from "build a CEV god" to include at least one other type of option.

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades.
Raemon · 1d

Yeah, I agree, but one of the points of this post was meant to be to take as an assumption "good people at Anthropic or whatever do a good job building an actually-more-moral-than-average-for-most-practical-purposes human-level intelligence" (i.e. via mechanisms like "the weak superintelligence can just decide to self-modify into the sort of being who doesn't feel pressure to grab all the resources from vastly weaker, slower, stupider beings, even though it'd be so easy.")

(and, like, I do buy the arguments that if we're assuming the first bunch of IMO optimistic assumptions about getting to humanish-level-alignment being easy, it's actually not that hard to do that step)

But then the post argues: yeah, even if we assume that one, it's really not great.

Sequences

  • Step by Step Metacognition
  • Feedbackloop-First Rationality
  • The Coordination Frontier
  • Privacy Practices
  • Keep your beliefs cruxy and your frames explicit
  • LW Open Source Guide
  • Tensions in Truthseeking
  • Project Hufflepuff
  • Rational Ritual

Posts

  • Raemon's Shortform (8y)
  • Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades. (1d)
  • </rant> </uncharitable> </psychologizing> (2d)
  • Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?") (4d)
  • The Illustrated Petrov Day Ceremony (7d)
  • "Shut It Down" is simpler than "Controlled Takeoff" (9d)
  • Accelerando as a "Slow, Reasonably Nice Takeoff" Story (11d)
  • The title is reasonable (14d)
  • Meetup Month (16d)
  • Simulating the *rest* of the political disagreement (1mo)
  • Yudkowsky on "Don't use p(doom)" (1mo)
Wikitag Contributions

  • AI Consciousness (a month ago)
  • AI Auditing (2 months ago)
  • Guide to the LessWrong Editor (6 months ago)
  • Sandbagging (AI) (6 months ago)
  • AI "Agent" Scaffolds (6 months ago)