
Thomas Larsen

I'm broadly interested in AI strategy and want to figure out the most effective interventions to get good AI outcomes. 

Comments

The Industrial Explosion
Thomas Larsen · 21d

This post seems systematically too slow to me, and to underrate the capabilities of superintelligence. One particular point of disagreement:

It seems reasonable to use days or weeks as an upper bound on how fast robot doublings could become, based on biological analogies. This is very fast indeed.

When I read this, I expected it to say "lower bound". Why would you expect evolution to find globally optimal doubling times? This reads to me a bit like saying that the speed of a cheetah or the size of a blue whale will be an upper bound on the speed/size of a robot. Why???

The case for a lower bound seems clear: biology did it, and a superintelligence could probably design a more functional robot than biology did.

A deep critique of AI 2027’s bad timeline models
Thomas Larsen · 1mo

Small typo: A·log(B) = log(B^A), not log(A^B).
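
To spell out the identity with a quick numerical check (my own illustrative values, not from the post: $A = 2$, $B = 10$, base-10 logs):

$$2\log(10) = \log(10^2) = \log(100) = 2, \qquad \text{whereas} \qquad \log(2^{10}) = \log(1024) \approx 3.01.$$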

Ryan Kidd's Shortform
Thomas Larsen · 1mo

Also, there's a good chance AI governance won't work, and labs will just have a very limited safety budget to implement their best-guess mitigations. And even if AI governance does work and we get a large budget, we still need to actually solve alignment.

Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis
Thomas Larsen · 3mo

Thanks for writing this! 

AI 2027: What Superintelligence Looks Like
Thomas Larsen · 3mo

For what it's worth, my view is that we're very likely to be wrong about the specific details in both of the endings -- they are obviously super conjunctive. I don't think there's any way around this, because we can be confident AGI is going to cause some ex-ante surprising things to happen.

Also, this scenario is around my 20th-percentile timelines; my median is early 2030s (though other authors disagree with me). I also feel much more confident about the pre-2027 scenario than about the post-2027 scenario.

Is your disagreement that you think AGI will happen later, or that you think the effects of AGI on the world will look very different, or both? If it's just the timelines, we might have fairly similar views.

AI 2027: What Superintelligence Looks Like
Thomas Larsen · 3mo

This wasn't intended to be humor. In the scenario, we write: 

(To avoid singling out any one existing company, we’re going to describe a fictional artificial general intelligence company, which we’ll call OpenBrain. We imagine the others to be 3–9 months behind OpenBrain.)

I think that OpenAI, GDM, and Anthropic are in the lead, and each seems about equally likely to be the one ahead.

AI 2027: What Superintelligence Looks Like
Thomas Larsen · 3mo

Thank you! We actually tried to write one that was much closer to a vision we endorse! The TLDR overview was something like: 

  1. Both the US and Chinese leading AGI projects stop in response to evidence of egregious misalignment.
  2. They sign a treaty to pause smarter-than-human AI development, with compute-based enforcement similar to that described in our live scenario, except this time with humans driving the treaty instead of the AI.
  3. They take time to solve alignment (potentially with the help of the AIs). This period could last anywhere from 1 to 20 years, or maybe even longer! The best experts at this would all be brought into the leading project, and many different paths would be pursued (e.g. full mechinterp, Davidad moonshots, worst-case ELK, uploads, etc.).
  4. Somehow, a bunch of good governance interventions are done on the AGI project (e.g. transparency on use of the AGIs, no helpful-only access for any one party, a formal governance structure where a large number of diverse parties are all represented).
  5. This culminates in aligning an AI "in the best interests of humanity", whatever that means, using a process where a large fraction of humanity is engaged and has some power to vote. This process might look something like giving each human a share of the total resources of space and then doing lots of bargaining to find all the positive-sum trades, with some rules against blackmail / using your resources to cause immense harm.

Unfortunately, it was hard to write this out in a way that felt realistic. 

The next major project I focus on is likely going to be thinking through the right governance interventions to make that happen. I'm probably not going to do this in scenario format (instead something closer to normal papers and blog posts), but I would be curious for thoughts.

Why Don't We Just... Shoggoth+Face+Paraphraser?
Thomas Larsen · 6mo

Thanks - I see, I was misunderstanding. 

Why Don't We Just... Shoggoth+Face+Paraphraser?
Thomas Larsen · 6mo

Proposal part 1: Shoggoth/Face Distinction: Instead of having one model undergo agency training, we have two copies of the base model work together, specializing in different parts of the job, and undergo training together. Specifically we have the "shoggoth" copy responsible for generating all the 'reasoning' or 'internal' CoT, and then we have the "face" copy responsible for the 'actions' or 'external' outputs. So e.g. in a conversation with the user, the Shoggoth would see the prompt and output a bunch of reasoning token CoT; the Face would see the prompt + all that CoT and then output a message to pass to the user, and then the process would repeat.

Proposal part 2: Blind the Evaluation Process to the Internal Reasoning: Remember that reward process that evaluates overall performance? Hide from it all the outputs of the Shoggoth. Just show it the outputs from the Face. 

I don't see how this proposal could get you AGI-level capabilities. You need the "shoggoth" part of the model to gain general-purpose capabilities somehow, and if you completely shield the shoggoth from RL, all of the capabilities have to come from pre-training, which seems unlikely to be enough. My guess is that a more realistic proposal involves having separate reward models for the two parts:

  1. The shoggoth RM, which only rewards capabilities
  2. The face RM, which also includes alignment-relevant stuff 

But it isn't clear to me how to separate out alignment from capabilities. IIUC, the hope is that the capabilities training would naturally incentivize the shoggoth to have bad goals, and perhaps instrumentally deceive the humans, but to do this deception in the CoT in English so we can see it.
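
To make the split concrete, here is a toy sketch of the kind of two-reward-model training step I have in mind (all names and components are hypothetical stand-ins I made up for illustration; this is not the post's proposal, and not a real training loop):

```python
import random

def shoggoth(prompt: str) -> str:
    """Stand-in for the 'shoggoth' copy: produces the internal reasoning / CoT."""
    return f"step-by-step reasoning about: {prompt}"

def paraphraser(cot: str) -> str:
    """Stand-in paraphraser: scrambles surface form so encoded jargon doesn't survive."""
    words = cot.split()
    random.shuffle(words)
    return " ".join(words)

def face(prompt: str, paraphrased_cot: str) -> str:
    """Stand-in for the 'face' copy: produces the user-visible output."""
    return f"final answer to: {prompt}"

def capability_rm(prompt: str, output: str) -> float:
    """Shoggoth RM: rewards task success only, says nothing about alignment."""
    return random.random()

def alignment_rm(prompt: str, output: str) -> float:
    """Face RM: also scores alignment-relevant properties; never sees the raw CoT."""
    return random.random()

def training_step(prompt: str) -> tuple[float, float]:
    cot = shoggoth(prompt)
    output = face(prompt, paraphraser(cot))
    # The shoggoth is optimized purely for capabilities (so general skills can still
    # come from RL), while the face is also graded on alignment-relevant behavior.
    shoggoth_reward = capability_rm(prompt, output)
    face_reward = capability_rm(prompt, output) + alignment_rm(prompt, output)
    return shoggoth_reward, face_reward

print(training_step("summarize the safety report"))
```

The open question above is exactly where `capability_rm` and `alignment_rm` come apart in practice, i.e. whether the shoggoth's reward can really avoid leaking alignment pressure into the CoT.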

You later write: 

How it all (hopefully) fits together: First, because of the paraphraser, the Shoggoth will not learn to speak in some optimized alien language, or even to speak in jargon or euphemisms -- because semantics not recognized by the paraphraser will be scrambled by it. So the Shoggoth will continue to 'think in English' even as it undergoes lots of agency training and learns substantial new skills.

Second, because the training process is blind to the outputs of the Shoggoth (and instead only sees what the Face outputs), there should be no pressure on the Shoggoth to learn to deceive, conceal, dissemble, doublethink, etc. Instead the Shoggoth will be 'free' to put all its important thoughts in plain English.

This suggests we are training the shoggoth with agency training. But I don't understand how that is consistent with "the training process is blind to the outputs of the Shoggoth". Did you mean just some "alignment" subset of the training process?

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Thomas Larsen · 10mo

I think a problem with all the proposed terms is that they are all binaries, and one bit of information is far too little to characterize takeoff: 

  • One person's "slow" is >10 years, another's is >6 months. 
  • The beginning and end points are super unclear; some people might want to put the end point near the limits of intelligence, some might want to put the beginning point at >2x AI R&D speed, some at 10x, etc. 
  • In general, a good description of takeoff should characterize capabilities at each point on the curve.  

So I don't really think any of the binaries are all that useful for thinking or communicating about takeoff. I don't have a great ontology for thinking about takeoff to suggest instead, but in communication I generally just try to define a start point and an end point and then say quantitatively how long the period between them might take. One of the central intervals I really care about is the time between wakeup and takeover-capable AIs.

wakeup = "the first period in time when AIs are sufficiently capable that senior government people wake up to incoming AGI and ASI" 

takeover-capable AIs = "the first time there is a set of AI systems that are coordinating together and could take over the world if they wanted to" 

The reason to think about this period is that (kind of by construction) it's the time when unprecedented government actions that matter could happen. So when planning for that sort of thing, this length of time really matters.

Of course, the start and end times I think about are both fairly vague. They also aren't purely a function of AI capabilities; they depend on things like "who is in government" and "how capable our institutions are at fighting a rogue AGI". Also, many people believe we will never get takeover-capable AIs, even at superintelligence.
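
To make the kind of description I prefer concrete, here is a toy sketch (my own illustration; the dates are arbitrary placeholders, not predictions -- the point is the structure of listing dated milestones and reading off the interval, rather than a slow/fast binary):

```python
from datetime import date

# Arbitrary placeholder dates, purely for illustration -- not predictions.
milestones = {
    ">2x AI R&D speedup": date(2030, 1, 1),
    "wakeup": date(2030, 9, 1),              # senior gov't people wake up to incoming AGI/ASI
    ">10x AI R&D speedup": date(2031, 3, 1),
    "takeover-capable AIs": date(2031, 12, 1),
}

# The interval I care most about: wakeup -> takeover-capable AIs.
window = milestones["takeover-capable AIs"] - milestones["wakeup"]
print(f"wakeup to takeover-capable: {window.days} days (~{window.days / 30:.0f} months)")
```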

Posts

  • AI 2027: What Superintelligence Looks Like (3mo)
  • Long-Term Future Fund Ask Us Anything (September 2023) (2y)
  • Introducing the Center for AI Policy (& we're hiring!) (2y)
  • Long-Term Future Fund: April 2023 grant recommendations (2y)
  • Challenge: construct a Gradient Hacker (2y)
  • Wentworth and Larsen on buying time (3y)
  • Ways to buy time (3y)
  • Thomas Larsen's Shortform (3y)
  • Instead of technical research, more people should focus on buying time (3y)
  • Possible miracles (3y)