Raemon

LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

Comments

Plans A, B, C, and D for misalignment risk
Raemon · 17h

So I think this paragraph isn't really right, because "slowdown" != "pause", and slowdowns might still be really, really helpful and enough to get you a long way.

I think "everyone agrees to a noticeably smaller next-run-size" seems like a fine thing to do as the first coordination attempt.

I think there is something good about having an early step (maybe after that one), which somehow forces people to actually orient on "okay, suppose we actually had to prioritize interpretability and evals now until they were able to keep pace with capabilities, how would we seriously do that?"

(I don't currently have a good operationalization of this that seems robust, but it seems plausible that by the time we're meaningfully able to decide to do anything like this, someone may have come up with a good policy with that effect. I can definitely see this backfiring and causing people to get better at some kind of software that is then harder to control.)

> I actually currently think that you want to accelerate compute production, because hardware scaling seems safer than software scaling. I'm not sure exactly what you mean by "in an uncontrolled fashion"... if you mean "have a bunch of inspectors making sure new chips aren't being smuggled to illegal projects", then I agree with this; on my initial read I thought you meant something like "pause chip production until they start producing GPUs with HEMs in them", which I think is probably bad.
>
> In other words I think that you want to create a big compute overhang during a pause. The downside is obvious, but the upsides are:
>
> 1. compute is controllable, far more than software, and so differentially advances legal projects.
> 2. more compute for safety. We want to be able to pay a big safety tax, and more compute straightforwardly helps.
> 3. extra compute progress funges against software progress, which is scarier.
> 4. compute is destroyable (e.g. we can reverse and destroy compute, if we want to eat an overhang), but software progress mostly isn't (you can't unpublish research).

Mmm, nod, I can see it. I'd need to think more to figure out a considered opinion on this, but it seems a priori reasonable.

I think one of the things I want is to have executed each type of control you might want to exert, at least for a shorter period of time, to test whether you're able to do it at all. But I'd have the early compute steps be more focused on "they have remote-shutdown options but can continue production", or at least a policy-level "there are enforcers sitting outside the compute centers who could choose to forcibly shut it down fairly quickly".

Plans A, B, C, and D for misalignment risk
Raemon · 18h

This framing feels reasonable-ish, with some caveats.[1]

I am assuming we're starting the question at the first stage where either "shut it down" or "have a strong degree of control over global takeoff" becomes plausibly politically viable. (i.e. assume early stages of Shut It Down and Controlled Takeoff both include various partial measures that are more immediately viable and don't give you the ability to steer capability-growth that hard)

But, once it becomes a serious question "how quickly should we progress through capabilities", then one thing to flag is, it's not like you know "we get 5 years, therefore, we want to proceed through those years at X rate." It's "we seem to have this amount of buy-in currently..." and the amount of buy-in could change (positively or negatively).

Some random thoughts on things that seem important:

  • I would want to do at least some early global pause on large training runs, to check if you are actually capable of doing that at all. (in conjunction with some efforts attempting to build international goodwill about it)
  • One of the more important things to do as soon as it's viable is to stop production of more compute in an uncontrolled fashion. (I'm guessing this plays out with some kind of pork deals for nVidia and other leaders[2], where the early steps are 'consolidate compute', and then having them produce chips that are more monitorable, which they get to make money from, but which are also sort of nationalized). This prevents a big overhang.
  • Before I did a rapid growth of capabilities, I would want a globally set target of "we are able to make some kind of interpretability strides or evals that leave us better able to predict the outcome of the next training run."

If it's not viable to do that, well, then we don't. (but, then we're not really having a real convo about how slow the takeoff should ideally be, just riding the same incentive wave we're currently riding with slightly more steering). ((We can instead have a convo about how to best steer given various murky conditions, which seems like a real important convo, I'm just responding here to this comment's framing))[3]

If we reach a point where humanity has demonstrated the capability of "stop training on purpose, stop uncontrolled compute production, and noticeably improve our ability to predict the next training run", then I'm not obviously opposed to doing relatively rapid advancement, but, it's not obviously better to do "rapid to the edge" than "do one round where there are predictions/incentives/prizes somehow for people to accurately predict how the next training rounds go, then evaluate that, then do it again."

  1. ^

    I think there's at least some confusion where people are imagining the simplest/dumbest version of Shut It Down, and imagining "Plan A" is nuanced and complicated. I think the actual draft treaty has levers that are approximately the same levers you'd want to do this sort of controlled takeoff. 

  2. ^

    I'm not sure how powerful nVidia is as an interest group. Maybe it is important to avoid them getting a deal like this so they're less of an interest group with power at the negotiating table.

  3. ^

    FYI my "Ray detects some political bs motivations in himself" alarm is tripping as I write this paragraph. It currently seems right to me but let me know if I'm missing something here.

What, if not agency?
Raemon · 19h

I'd list "personalware" as an option that feels, well, uh, at least somewhat less personal than "soloware"

Bending The Curve
Raemon · 19h

This here is a real-life Newcomblike problem.

Plans A, B, C, and D for misalignment risk
Raemon · 19h

(Having otherwise complained a bunch about some of the commentary/framing around Plan A vs Shut It Down, I do overall like this post and think having the lens of the different worlds is pretty good for planning).

(I am also appreciating how people are using inline reacts)

Plans A, B, C, and D for misalignment risk
Raemon · 19h

Nod. 

FYI, I think Shut It Down is approximately as likely to happen as "full-fledged Plan A that is sufficiently careful to actually help much more than [the first several stages of Plan A that Plan A and Shut It Down share]", on account of Shut It Down being simple enough that it's even really possible to coordinate on it.

I agree they are both pretty unlikely to happen. (Regardless, I think the thing to do is probably "reach for whatever wins seem achievable near term and try to build coordination capital for more wins")

I think a major possible failure mode of Plan A is "it turns into a giant regulatory-capture molochian boondoggle that both slows things down for a long time in confused, bad ways and reads to the public as a somewhat weirdly cynical plot, which makes people turn against tech progress comparably or more than the average Shut It Down would." (I don't have a strong belief about the relative likelihoods of that.)

None of those beliefs are particularly strong and I could easily learn a lot that would change all my beliefs.

Seems fine to leave it here. I don't have more arguments I didn't already write up in "Shut It Down" is simpler than "Controlled Takeoff"; just stating for the record that I don't think you've put forth an argument that justifies the 3x increase in difficulty of Shut It Down over the fully-fledged version of Plan A. (We might still be imagining different things re: Shut It Down)

Plans A, B, C, and D for misalignment risk
Raemon · 1d

Nod, I agree the centralizing part is harder than non-centralized fab monitoring. But I think a sufficient amount of "non-centralized" fab monitoring is still a much bigger ask than export controls, and the centralization was part of at least one writeup of Plan A, and it seemed pretty weird to include that bit but write off "actual shutdown" as politically intractable.

Plans A, B, C, and D for misalignment risk
Raemon · 1d

> I think compute + fab monitoring with potential for escalation requires much lower political will than shutting down AI development. I agree that both Plan A and shut it all down require something like this. Like I think this monitoring would plausibly not require much more political will than export controls.

FYI this is cruxy. I don't have very strong political-viability intuitions, but it seems like this requires export controls that several (sometimes rivalrous) major nations agree to simultaneously, with at least nontrivial trust for establishing the monitoring process together, which eventually is pretty invasive.

(maybe you are imagining the monitoring is actually mostly done with spy satellites that don't require much trust or cooperation?)

But like, the last draft of Plan A I saw included "we relocate all the compute to centralized locations in third-party countries" as an eventual goal. That seems pretty crazy?

Plans A, B, C, and D for misalignment risk
Raemon · 1d

Thanks. I'll leave some responses but feels more fine to leave here for now.

> I think shutting down all AI development is much more costly than not shutting down all AI development in a pretty straightforward sense that will in fact probably be priced into the required level of political will: Nvidia is in fact much worse off if all AI development shuts down versus if AI development proceeds, but with capabilities developing more slowly once they reach a high level of capabilities.

I would guess the stock market will react pretty differently to something like Plan A vs "shut it all down", for reasonable reasons.

> I don't understand why you think the opening steps are the most politically challenging part given that the opening steps for Plan A plausibly don't require stopping AI development.

First, slight clarification: the thing I had in mind isn't the opening step (which is presumably "do some ad hoc deals that build political momentum without too much cost").

The step I have in mind is "all global compute clusters and fab production are monitored, with buy-in from China, UK, Europe etc, with intent for major international escalation of some kind if someone violates the monitor-pact". This doesn't directly shut down nVidia, but it sure is putting some writing on the wall that I would expect nVidian political interests to fight strongly, even if it doesn't immediately come with a shutdown.

I'm imagining that a Plan A which doesn't include something like that is more like a Plan A / B hybrid, or some other "not the full Plan A" (based on some other internal Plan A docs I've looked at that went into more detail as of a few weeks ago).

I don't think there's any way you get to that point without most major world leaders actually believing in their hearts "if anyone builds it, something real bad is dangerously likely to happen." And by the point where people are actually agreeing to have international inspection of some kind, I would expect people to be thinking more "okay, will this actually work?" than "what do we have buy-in for?".

(There is a version where the US enforces it at gunpoint, or at least economic-sanction-point, without everyone else's buy-in, but I both don't expect them to do that and don't really expect it to work?)

MIRI discusses in the IABIED resources that they would prefer carveouts for narrow bio AI, so it's not like they're even advocating for all progress to stop. (Advanced bio AI seems pretty good for the world and to capture a lot of the benefits).

...

I certainly do expect you-et-al to disagree with MIRI-et-al on a bunch of implementation details of the treaty. 

But, it seems like a version of the treaty that doesn't at least have the capacity to shut down compute temporarily is a kinda fake version of Plan A, and once you have that, "Shut Down" vs "Controlled Takeoff" feels more like arguing details than fundamentals to me.

What, if not agency?
Raemon · 1d

Curated. I'm still a bit confused about how Sahil's ideas about "co-agency" will play out in practice, as AI capabilities get more extreme. (i.e. see Gwern's Tool AIs want to be Agent AIs).

But, I like all the individual near-term pieces of this vision (i.e. trying to build a culture around building tools that help your specific workflow), and I like the concept of trying to steer toward designs that empower, but in practice there sure are a lot of places where I (viscerally) really want the AI to do the stuff for me.

A thing that bubbles up for me is an "empowerment" mindset, where I see some new tool and instead of merely thinking "what problems can I make go away magically", I think "what new awesome things can I do that build on this?". Another lens is that I've been doing more "hiring contractors to build stuff for me" lately, and having an AI do it isn't fundamentally different. Hiring people to help you with stuff can be part of agency... but it can also end up infantilizing if you do it wrong. (i.e. hiring an ops person to do a lot of your bullshit work and then ending up the sort of person who struggles to do their own paperwork when they need to).

I'd be interested in someone building a prototype of something that fills the niche of an AI chatbot interface but somehow feels more in-control.

Sequences
Step by Step Metacognition
Feedbackloop-First Rationality
The Coordination Frontier
Privacy Practices
Keep your beliefs cruxy and your frames explicit
LW Open Source Guide
Tensions in Truthseeking
Project Hufflepuff
Rational Ritual
Posts
Raemon's Shortform (22 karma · 8y · 699 comments)
"Intelligence" -> "Relentless, Creative Resourcefulness" (65 karma · 4d · 28 comments)
Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades. (153 karma · 8d · 19 comments)
</rant> </uncharitable> </psychologizing> (54 karma · 9d · 11 comments)
Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?") (80 karma · 11d · 52 comments)
The Illustrated Petrov Day Ceremony (93 karma · 14d · 11 comments)
"Shut It Down" is simpler than "Controlled Takeoff" (99 karma · 16d · 29 comments)
Accelerando as a "Slow, Reasonably Nice Takeoff" Story (71 karma · 17d · 20 comments)
The title is reasonable (194 karma · 20d · 128 comments)
Meetup Month (45 karma · 23d · 10 comments)
Simulating the *rest* of the political disagreement (125 karma · 1mo · 16 comments)
Wikitag Contributions
AI Consciousness (a month ago)
AI Auditing (2 months ago, +25)
AI Auditing (2 months ago)
Guide to the LessWrong Editor (6 months ago)
Guide to the LessWrong Editor (6 months ago)
Guide to the LessWrong Editor (6 months ago)
Guide to the LessWrong Editor (6 months ago, +317)
Sandbagging (AI) (6 months ago)
Sandbagging (AI) (6 months ago, +88)
AI "Agent" Scaffolds (7 months ago)