Thomas Larsen
I'm broadly interested in AI strategy and want to figure out the most effective interventions to get good AI outcomes. 

Comments
Plans A, B, C, and D for misalignment risk
Thomas Larsen · 1mo

I think I'm mostly on board with this comment. Some thoughts:

> Before I did a rapid-growth of capabilities, I would want a globally set target of "we are able to make some kind of interpretability strides or evals that let us make better able to predict the outcome of the next training run."

  • This feels a bit overly binary to me. I think that understanding-based safety cases will be necessary for ASI, but behavioral methods seem like they might be sufficient beforehand.
  • I don't know what you mean by "rapid growth". It seems like you might be imagining the "shut it all down -> solve alignment during pause -> rapidly scale after you've solved alignment" plan. I think we probably should never do a "rapid scaleup".

Another reaction I have is that a constraint on coordination will probably be "is the other guy running a blacksite which will screw us over?". So I think there's a viability bump at the point of "allow legal capabilities scaling at least as fast as the max-size blacksite that you would have a hard time detecting".

> • I would want to do at least some early global pause on large training runs, to check if you are actually capable of doing that at all. (in conjunction with some efforts attempting to build international goodwill about it)

So I think this paragraph isn't really right, because "slowdown" != "pause", and slowdowns might still be really, really helpful and enough to get you a long way.

 

> • One of the more important things to do as soon as it's viable, is to stop production of more compute in an uncontrolled fashion. (I'm guessing this plays out with some kind of pork deals for nVidia and other leaders[2], where the early steps are 'consolidate compute', and then them producing the chips that are more monitorable, and which they get to make money from, but also are sort of nationalized). This prevents a big overhang.

I actually currently think that you want to accelerate compute production, because hardware scaling seems safer than software scaling. I'm not sure exactly what you mean by "in an uncontrolled fashion". If you mean "have a bunch of inspectors making sure the flow of new chips isn't being smuggled to illegal projects", then I agree with this. On my initial read I thought you meant something like "pause chip production until they start producing GPUs with HEMs in them", which I think is probably bad.

In other words I think that you want to create a big compute overhang during a pause. The downside is obvious, but the upsides are: 

  1. compute is controllable, far more than software, and so differentially advances legal projects.
  2. more compute for safety. We want to be able to pay a big safety tax, more compute straightforwardly helps.
  3. extra compute progress funges against software progress, which is scarier.
  4. compute is destroyable (e.g. we can reverse course and destroy compute if we want to eat an overhang), but software progress mostly isn't (you can't unpublish research).

 

(This comment might be confusing because I typed it quickly; happy to clarify if you want.)

Plans A, B, C, and D for misalignment risk
Thomas Larsen · 1mo

One framing that I think might be helpful for thinking about "Plan A" vs "shut it all down" is: "Suppose that you have the political will for an n-year slowdown, i.e. after n years, you are forced to hand off trust to superhuman AI systems (e.g. for n = 5, 10, 30). What should the capability progression throughout the slowdown be?" This framing forces a focus on the exit condition / plan for handoff, which I think is an underdiscussed weakness of the "shut it all down" plan.

I think my gut reaction is that the most important considerations are: (i) there are a lot of useful things you can do with the AIs, so I want more time with the smarter AIs, and (ii) I want to scale through the dangerous capability range slowly and with slack (as opposed to at the end of the slowdown). 

  • this makes me think that particularly for a shorter slowdown (e.g. 5 years), you want to go fast at the beginning (e.g. scale to ~max controllable AI over the first year or two), and then elicit lots of work out of those AIs for the rest of the time period.
  • A key concern for the above plan is that govts/labs botch the measurement of "max controllable AI", and scale too far.
  • But it's not clear to me how a further delay helps with this, unless you have a plan for making the institutions better over time, or pursuing a less risky path (e.g. ignoring ML and doing human intelligence augmentation).
  • Going slower, on the other hand, definitely does help, but requires not shutting it all down.
  • More generally, it seems good to do something like "extend takeoff evenly by a factor of n", as opposed to something like "pause for n-1 years, and then do a 1 year takeoff".  
  • I am sympathetic to "shut it all down and go for human augmentation": I do think this reduces AI takeover risk a lot, but this requires a very long pause, and it requires our institutions to bet big on a very unpopular technology. I think that convincing governments to "shut it all down" without an exit strategy at all seems quite difficult as well.

Of course, this framing also ignores some important considerations, e.g. choices about the capability progression affect both the difficulty of enforcement/verification (in both directions: AI lie detectors/AI verification are helpful, while making AIs closer to the edge is a downside) and the willingness to pay over time (e.g. scary demos or AI for epistemics might help increase WTP).

Motivation control
Thomas Larsen · 1mo

> However, I also think that open agency approaches to transparency face two key difficulties: competitiveness and safety-of-the-components.[18]

 

I think a third key difficulty with this class of approaches is something like "emergent agency", i.e. that each of the individual components seems to be doing something safe, but when you combine several of the agents, you get a scary agent. Intuition pump: each of the weights in an NN is very understandable (it's just a number) and isn't doing dangerous scheming, but if you compose them the result might be scary. Analogously, each of the subagents in the open agency AI might not be scheming, but a collection of these agents might be scheming.

Understanding the communications between the components may or may not be sufficient to mitigate this failure mode. If the understanding is "local", i.e. looking at a particular chain of reasoning and verifying that it is valid, this is probably not sufficient, as scary reasoning might be made up of a bunch of small chains of locally valid reasoning that each look safe. So I think you want something like a reasonable global picture of the reasoning that the open agent is doing in order to mitigate "emergent agency".

I think this is related to some versions of the "safety of the components" failure mode you talk about, particularly the analogue of the corporation passing memos around where the memos don't correspond to the "real reasoning" going on. However, it could be that the "real reasoning" emerges at a higher level of abstraction than the individual agents.

This sort of threat model leads me to think that if we're aiming for this sort of open agency, we shouldn't do end-to-end training of the whole system, lest we incentivize "emergent agency", even if we don't make the individual components less safe. 

Plans A, B, C, and D for misalignment risk
Thomas Larsen · 1mo

One upside of "shut it all down" is that it does in fact buy more time: in Plan A it is difficult to secure algorithmic secrets without extremely aggressive security measures, hence any rogue projects (e.g. nation-state blacksites) can just coast off the algorithms developed by the verified projects. Then, a few years in, they fire up their cluster and try to do an intelligence explosion with the extra algorithmic progress.

The title is reasonable
Thomas Larsen · 1mo

> superintelligence

Small detail: my understanding of the IABIED scenario is that their AI was only moderately superhuman, not superintelligent.

The Industrial Explosion
Thomas Larsen · 4mo

This post seems systematically too slow to me, and to underrate the capabilities of superintelligence. One particular point of disagreement:

> It seems reasonable to use days or weeks as an upper bound on how fast robot doublings could become, based on biological analogies. This is very fast indeed.[20]

When I read this, I thought it would say "lower bound". Why would you expect evolution to find globally optimal doubling times? This reads to me a bit like saying that the speed of a cheetah or the size of a blue whale will be an upper bound on the speed/size of a robot. Why???

The case for a lower bound seems clear: biology did it, and a superintelligence could probably design a more functional robot than biology.

A deep critique of AI 2027’s bad timeline models
Thomas Larsen · 5mo

Small typo: A·log(B) = log(B^A), not log(A^B).
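For concreteness, a quick numeric sanity check of the identity (the example values are arbitrary, not taken from the post):

```python
import math

# Check that A*log(B) equals log(B**A), using arbitrary example values.
A, B = 3.0, 7.0
lhs = A * math.log(B)
rhs = math.log(B ** A)
assert math.isclose(lhs, rhs)      # both are ~5.838

# The incorrect form log(A**B) gives a different number (~7.690 here).
print(lhs, rhs, math.log(A ** B))
```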

Ryan Kidd's Shortform
Thomas Larsen · 5mo

Also, there's a good chance AI gov won't work, and labs will just have a very limited safety budget to implement their best-guess mitigations. Or maybe AI gov does work and we get a large budget; we still need to actually solve alignment.

Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis
Thomas Larsen · 6mo

Thanks for writing this! 

AI 2027: What Superintelligence Looks Like
Thomas Larsen · 7mo

For what it's worth, my view is that we're very likely to be wrong about the specific details in both of the endings -- they are obviously super conjunctive. I don't think that there's any way around this, because we can be confident AGI is going to cause some ex-ante surprising things to happen.

Also, this scenario is around 20th percentile timelines for me; my median is early 2030s (though other authors disagree with me). I also feel much more confident about the pre-2027 scenario than about the post-2027 scenario.

Is your disagreement that you think AGI will happen later, or that you think the effects of AGI on the world will look very different, or both? If it's just the timelines, we might have fairly similar views.

Posts
  • Thomas Larsen's Shortform (3y)
  • AI 2027: What Superintelligence Looks Like (7mo)
  • Long-Term Future Fund Ask Us Anything (September 2023) (2y)
  • Introducing the Center for AI Policy (& we're hiring!) (2y)
  • Long-Term Future Fund: April 2023 grant recommendations (2y)
  • Challenge: construct a Gradient Hacker (3y)
  • Wentworth and Larsen on buying time (3y)
  • Ways to buy time (3y)
  • Instead of technical research, more people should focus on buying time (3y)
  • Possible miracles (3y)
Wikitag Contributions
  • Holden Karnofsky (3 years ago, +15/-9)
  • Updateless Decision Theory (3 years ago, +1/-4)
  • Updateless Decision Theory (3 years ago, +272)
  • Counterfactual Mugging (3 years ago, +8/-6)
  • Updateless Decision Theory (3 years ago, +15/-13)
  • Updateless Decision Theory (3 years ago, +12/-11)