Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a transcript of a conversation between Paul Christiano and Eliezer Yudkowsky, with comments by Rohin Shah, Beth Barnes, Richard Ngo, and Holden Karnofsky, continuing the Late 2021 MIRI Conversations.


Color key:

 Chat by Paul and Eliezer  Other chat 

 

15. October 19 comment

 

[Yudkowsky][11:01] 

thing that struck me as an iota of evidence for Paul over Eliezer: https://twitter.com/tamaybes/status/1450514423823560706?s=20 

 

16. November 3 conversation

 

16.1. EfficientZero

 

[Yudkowsky][9:30] 

Thing that (if true) strikes me as... straight-up falsifying Paul's view as applied to modern-day AI, at the frontier of the most AGI-ish part of it and where Deepmind put in substantial effort on their project?  EfficientZero (allegedly) learns Atari in 100,000 frames.  Caveat: I'm not having an easy time figuring out how many frames MuZero would've required to achieve the same performance level.  MuZero was trained on 200,000,000 frames but reached what looks like an allegedly higher high; the EfficientZero paper compares their performance to MuZero on 100,000 frames, and claims theirs is much better than MuZero given only that many frames.

https://arxiv.org/pdf/2111.00210.pdf  CC: @paulfchristiano.

(I would further argue that this case is important because it's about the central contemporary model for approaching AGI, at least according to Eliezer, rather than any number of random peripheral AI tasks.)

[Shah][14:46]  

I only looked at the front page, so might be misunderstanding, but the front figure says "Our proposed method EfficientZero is 170% and 180% better than the previous SoTA performance in mean and median human normalized score [...] on the Atari 100k benchmark", which does not seem like a huge leap?

Oh, I incorrectly thought that was 1.7x and 1.8x, but it is actually 2.7x and 2.8x, which is a bigger deal (though still feels not crazy to me)

[Yudkowsky][15:28]  

the question imo is how many frames the previous SoTA would require to catch up to EfficientZero

(I've tried emailing an author to ask about this, no response yet)

like, perplexity on GPT-3 vs GPT-2 and "losses decreased by blah%" would give you a pretty meaningless concept of how far ahead GPT-3 was from GPT-2, and I think the "2.8x performance" figure in terms of scoring is equally meaningless as a metric of how much EfficientZero improves if any

what you want is a notion like "previous SoTA would have required 10x the samples" or "previous SoTA would have required 5x the computation" to achieve that performance level

[Shah][15:38]  

I see. Atari curves are not nearly as nice and stable as GPT curves and often have the problem that they plateau rather than making steady progress with more training time, so that will make these metrics noisier, but it does seem like a reasonable metric to track

(Not that I have recommendations about how to track it; I doubt the authors can easily get these metrics)

[Christiano][18:01] 

If you think our views are making such starkly different predictions then I'd be happy to actually state any of them in advance, including e.g. about future ML benchmark results.

I don't think this falsifies my view, and we could continue trying to hash out what my view is but it seems like slow going and I'm inclined to give up.

Relevant questions on my view are things like: is MuZero optimized at all for performance in the tiny-sample regime? (I think not, I don't even think it set SoTA on that task and I haven't seen any evidence.) What's the actual rate of improvements since people started studying this benchmark ~2 years ago, and how much work has gone into it? And I totally agree with your comments that "# of frames" is the natural unit for measuring and that would be the starting point for any discussion.

[Barnes][18:22] 

In previous MCTS RL algorithms, the environment model is either given or only trained with rewards, values, and policies, which cannot provide sufficient training signals due to their scalar nature. The problem is more severe when the reward is sparse or the bootstrapped value is not accurate. The MCTS policy improvement operator heavily relies on the environment model. Thus, it is vital to have an accurate one.

We notice that the output  from the dynamic function  should be the same as , i.e. the output of the representation function  with input of the next observation  (Fig. 2). This can help to supervise the predicted next state  using the actual , which is a tensor with at least a few hundred dimensions. This provides  with much more training signals than the default scalar reward and value.

This seems like a super obvious thing to do and I'm confused why DM didn't already try this. It was definitely being talked about in ~2018

Will ask a DM friend about it

[Yudkowsky][22:45] 

I... don't think I want to take all of the blame for misunderstanding Paul's views; I think I also want to complain at least a little that Paul spends an insufficient quantity of time pointing at extremely concrete specific possibilities, especially real ones, and saying how they do or don't fit into the scheme.

Am I rephrasing correctly that, in this case, if Efficient Zero was actually a huge (3x? 5x? 10x?) jump in RL sample efficiency over previous SOTA, measured in 1 / frames required to train to a performance level, then that means the Paul view doesn't apply to the present world; but this could be because MuZero wasn't the real previous SOTA, or maybe because nobody really worked on pushing out this benchmark for 2 years and therefore on the Paul view it's fine for there to still be huge jumps?  In other words, this is something Paul's worldview has to either defy or excuse, and not just, "well, sure, why wouldn't it do that, you have misunderstood which kinds of AI-related events Paul is even trying to talk about"?

In the case where, "yes it's a big jump and that shouldn't happen later, but it could happen now because it turned out nobody worked hard on pushing past MuZero over the last 2 years", I wish to register that my view permits it to be the case that, when the world begins to end, the frontier that enters into AGI is similarly something that not a lot of people spent a huge effort on since a previous prototype from 2 years earlier.  It's just not very surprising to me if the future looks a lot like the past, or if human civilization neglects to invest a ton of effort in a research frontier.

Gwern guesses that getting to EfficientZero's performance level would require around 4x the samples for MuZero-Reanalyze (the more efficient version of MuZero which replayed past frames), which is also apparently the only version of MuZero the paper's authors were considering in the first place - without replays, MuZero requires 20 billion frames to achieve its performance, not the figure of 200 million. https://www.lesswrong.com/posts/jYNT3Qihn2aAYaaPb/efficientzero-human-ale-sample-efficiency-w-muzero-self?commentId=JEHPQa7i8Qjcg7TW6

 

17. November 4 conversation

 

17.1. EfficientZero (continued)

 

[Christiano][7:42] 

I think it's possible the biggest misunderstanding is that you somehow think of my view as a "scheme" and your view as a normal view where probability distributions over things happen.

Concretely, this is a paper that adds a few techniques to improve over MuZero in a domain that (it appears) wasn't a significant focus of MuZero. I don't know how much it improves but I can believe gwern's estimates of 4x.

I'd guess MuZero itself is a 2x improvement over the baseline from a year ago, which was maybe a 4x improvement over the algorithm from a year before that.

If that's right, then no it's not mindblowing on my view to have 4x progress one year, 2x progress the next, and 4x progress the next.

If other algorithms were better than MuZero, then the 2019-2020 progress would be >2x and the 2020-2021 progress would be <4x.

I think it's probably >4x sample efficiency though (I don't totally buy gwern's estimate there), which makes it at least possibly surprising.

But it's never going to be that surprising. It's a benchmark that people have been working on for a few years that has been seeing relatively rapid improvement over that whole period.

The main innovation is how quickly you can learn to predict future frames of Atari games, which has tiny economic relevance and calling it the most AGI-ish direction seems like it's a very Eliezer-ish view, this isn't the kind of domain where I'm either most surprised to see rapid progress at all nor is the kind of thing that seems like a key update re: transformative AI

yeah, SoTA in late 2020 was SPR, published by a much smaller academic group: https://arxiv.org/pdf/2007.05929.pdf

MuZero wasn't even setting sota on this task at the time it was published

my "schemes" are that (i) if a bunch of people are trying on a domain and making steady slow progress, I'm surprised to see giant jumps and I don't expect most absolute progress to occur in such jumps, (ii) if a domain is worth a lot of $, generally a bunch of people will be trying. Those aren't claims about what is always true, they are claims about what is typically true and hence what I'm guessing will be true for transformative AI.

Maybe you think those things aren't even good general predictions, and that I don't have long enough tails in my distributions or whatever. But in that case it seems we can settle it quickly by prediction.

I think this result is probably significant (>30% absolute improvement) + faster-than-trend (>50% faster than previous increment) progress relative to prior trend on 8 of the 27 atari games (from table 1, treating SimPL->{max of MuZero, SPR}->EfficientZero as 3 equally spaced datapoints): Asterix, Breakout, almost ChopperCMD, almost CrazyClimber, Gopher, Kung Fu Master, Pong, QBert, SeaQuest. My guess is that they thought a lot about a few of those games in particular because they are very influential on the mean/median. Note that this paper is a giant grab bag and that simply stapling together the prior methods would have already been a significant improvement over prior SoTA. (ETA: I don't think saying "its only 8 of 27 games" is an update against it being big progress or anything. I do think saying "stapling together 2 previous methods without any complementarity at all would already have significantly beaten SoTA" is fairly good evidence that it's not a hard-to-beat SoTA.)

and even fewer people working on the ultra-low-sample extremely-low-dimensional DM control environments (this is the subset of problems where the state space is 4 dimensions, people are just not trying to publish great results on cartpole), so I think the most surprising contribution is the atari stuff

OK, I now also understand what the result is I think?

I think the quick summary is: the prior SoTA is SPR, which learns to predict the domain and then does Q-learning. MuZero instead learns to predict the domain and does MCTS, but it predicts the domain in a slightly less sophisticated way than SPR (basically just predicts rewards, whereas SPR predicts all of the agent's latent state in order to get more signal from each frame). If you combine MCTS with more sophisticated prediction, you do better.

I think if you told me that DeepMind put in significant effort in 2020 (say, at least as much post-MuZero effort as the new paper?) trying to get great sample efficiency on the easy-exploration atari games, and failed to make significant progress, then I'm surprised.

I don't think that would "falsify" my view, but it would be an update against? Like maybe if DM put in that much effort I'd maybe have given only a 10-20% probability to a new project of similar size putting in that much effort making big progress, and even conditioned on big progress this is still >>median (ETA: and if DeepMind put in much more effort I'd be more surprised than 10-20% by big progress from the new project)

Without DM putting in much effort, it's significantly less surprising and I'll instead be comparing to the other academic efforts. But it's just not surprising that you can beat them if you are willing to put in the effort to reimplement MCTS and they aren't, and that's a step that is straightforwardly going to improve performance.

(not sure if that's the situation)

And then to see how significant updates against are, you have to actually contrast them with all the updates in the other direction where people don't crush previous benchmark results

and instead just make modest progress

I would guess that if you had talked to an academic about this question (what happens if you combine SPR+MCTS) they would have predicted significant wins in sample efficiency (at the expense of compute efficiency) and cited the difficulty of implementing MuZero compared to any of the academic results. That's another way I could be somewhat surprised (or if there were academics with MuZero-quality MCTS implementations working on this problem, and they somehow didn't set SoTA, then I'm even more surprised). But I'm not sure if you'll trust any of those judgments in hindsight.

Repeating the main point:

I don't really think a 4x jump over 1 year is something I have to "defy or excuse", it's something that I think becomes more or less likely depending on facts about the world, like (i) how fast was previous progress, (ii) how many people were working on previous projects and how targeted were they at this metric, (iii) how many people are working in this project and how targeted was it at this metric

it becomes continuously less likely as those parameters move in the obvious directions

it never becomes 0 probability, and you just can't win that much by citing isolated events that I'd give say a 10% probability to, unless you actually say something about how you are giving >10% probabilities to those events without losing a bunch of probability mass on what I see as the 90% of boring stuff

[Ngo: 👍]

and then separately I have a view about lots of people working on important problems, which doesn't say anything about this case

(I actually don't think this event is as low as 10%, though it depends on what background facts about the project you are conditioning on---obviously I gave <<10% probability to someone publishing this particular result, but something like "what fraction of progress in this field would come down to jumps like this" or whatever is probably >10% until you tell me that DeepMind actually cared enough to have already tried)

[Ngo][8:48] 

I expect Eliezer to say something like: DeepMind believes that both improving RL sample efficiency, and benchmarking progress on games like Atari, are important parts of the path towards AGI. So insofar as your model predicts that smooth progress will be caused by people working directly towards AGI, DeepMind not putting effort into this is a hit to that model. Thoughts?

[Christiano][9:06] 

I don't think that learning these Atari games in 2 hours is a very interesting benchmark even for deep RL sample efficiency, and it's totally unrelated to the way in which humans learn such games quickly. It seems pretty likely totally plausible (50%?) to me that DeepMind feels the same way, and then the question is about other random considerations like how they are making some PR calculation.

[Ngo][9:18] 

If Atari is not a very interesting benchmark, then why did DeepMind put a bunch of effort into making Agent57 and applying MuZero to Atari?

Also, most of the effort they've spent on games in general has been on methods very unlike the way humans learn those games, so that doesn't seem like a likely reason for them to overlook these methods for increasing sample efficiency.

[Shah][9:32] 

It seems pretty likely totally plausible (50%?) to me that DeepMind feels the same way, and then the question is about other random considerations like how they are making some PR calculation.

Not sure of the exact claim, but DeepMind is big enough and diverse enough that I'm pretty confident at least some people working on relevant problems don't feel the same way

[...] This seems like a super obvious thing to do and I'm confused why DM didn't already try this. It was definitely being talked about in ~2018

Speculating without my DM hat on: maybe it kills performance in board games, and they want one algorithm for all settings?

[Christiano][10:29] 

Atari games in the tiny sample regime are a different beast

there are just a lot of problems you can state about Atari some of which are more or less interesting (e.g. jointly learning to play 57 Atari games is a more interesting problem than learning how to play one of them absurdly quickly, and there are like 10 other problems about Atari that are more interesting than this one)

That said, Agent57 also doesn't seem interesting except that it's an old task people kind of care about. I don't know about the take within DeepMind but outside I don't think anyone would care about it other than historical significance of the benchmark / obviously-not-cherrypickedness of the problem.

I'm sure that some people at DeepMind care about getting the super low sample complexity regime. I don't think that really tells you how large the DeepMind effort is compared to some random academics who care about it.

[Shah: 👍]

I think the argument for working on deep RL is fine and can be based on an analogy with humans while you aren't good at the task. Then once you are aiming for crazy superhuman performance on Atari games you naturally start asking "what are we doing here and why are we still working on atari games?"

[Ngo: 👍]

and correspondingly they are a smaller and smaller slice of DeepMind's work over time

[Ngo: 👍]

(e.g. Agent57 and MuZero are the only DeepMind blog posts about Atari in the last 4 years, it's not the main focus of MuZero and I don't think Agent57 is a very big DM project)

Reaching this level of performance in Atari games is largely about learning perception, and doing that from 100k frames of an Atari game just doesn't seem very analogous to anything humans do or that is economically relevant from any perspective. I totally agree some people are into it, but I'm totally not surprised if it's not going to be a big DeepMind project.

[Yudkowsky][10:51] 

would you agree it's a load-bearing assumption of your worldview - where I also freely admit to having a worldview/scheme, this is not meant to be a prejudicial term at all - that the line of research which leads into world-shaking AGI must be in the mainstream and not in a weird corner where a few months earlier there were more profitable other ways of doing all the things that weird corner did? 

eg, the tech line leading into world-shaking AGI must be at the profitable forefront of non-world-shaking tasks.  as otherwise, afaict, your worldview permits that if counterfactually we were in the Paul-forbidden case where the immediate precursor to AGI was something like EfficientZero (whose motivation had been beating an old SOTA metric rather than, say, market-beating self-driving cars), there might be huge capability leaps there just as EfficientZero represents a large leap, because there wouldn't have been tons of investment in that line.

[Christiano][10:54] 

Something like that is definitely a load-bearing assumption

Like there's a spectrum with e.g. EfficientZero --> 2016 language modeling --> 2014 computer vision --> 2021 language modeling --> 2021 computer vision, and I think everything anywhere close to transformative AI will be way way off the right end of that spectrum

But I think quantitatively the things you are saying don't seem quite right to me. Suppose that MuZero wasn't the best way to do anything economically relevant, but it was within a factor of 4 on sample efficiency for doing tasks that people care about. That's already going to be enough to make tons of people extremely excited.

So yes, I'm saying that anything leading to transformative AI is "in the mainstream" in the sense that it has more work on it than 2021 language models.

But not necessarily that it's the most profitable way to do anything that people care about. Different methods scale in different ways, and something can burst onto the scene in a dramatic way, but I strongly expect speculative investment driven by that possibility to already be way (way) more than 2021 language models. And I don't expect gigantic surprises. And I'm willing to bet that e.g. EfficientZero isn't a big surprise for researchers who are paying attention to the area (in addition to being 3+ orders of magnitude more neglected than anything close to transformative AI)

2021 language modeling isn't even very competitive, it's still like 3-4 orders of magnitude smaller than semiconductors. But I'm giving it as a reference point since it's obviously much, much more competitive than sample-efficient atari.

This is a place where I'm making much more confident predictions, this is "falsify paul's worldview" territory once you get to quantitative claims anywhere close to TAI and "even a single example seriously challenges paul's worldview" a few orders of magnitude short of that

[Yudkowsky][11:04] 

can you say more about what falsifies your worldview previous to TAI being super-obviously-to-all-EAs imminent?

or rather, "seriously challenges", sorry

[Christiano][11:05][11:08] 

big AI applications achieved by clever insights in domains that aren't crowded, we should be quantitative about how crowded and how big if we want to get into "seriously challenges"

like e.g. if this paper on atari was actually a crucial ingredient for making deep RL for robotics work, I'd be actually for real surprised rather than 10% surprised

but it's not going to be, those results are being worked on by much larger teams of more competent researchers at labs with $100M+ funding

it's definitely possible for them to get crushed by something out of left field

but I'm betting against every time

or like, the set of things people would describe as "out of left field," and the quantitative degree of neglectedness, becomes more and more mild as the stakes go up

[Yudkowsky][11:08] 

how surprised are you if in 2022 one company comes out with really good ML translation, and they manage to sell a bunch of it temporarily until others steal their ideas or Google acquires them?  my model of Paul is unclear on whether this constitutes "many people are already working on language models including ML translation" versus "this field is not profitable enough right this minute for things to be efficient there, and it's allowed to be nonobvious in worlds where it's about to become profitable".

[Christiano][11:08] 

if I wanted to make a prediction about that I'd learn a bunch about how much google works on translation and how much $ they make

I just don't know the economics

and it depends on the kind of translation that they are good at and the economics (e.g. google mostly does extremely high-volume very cheap translation)

but I think there are lots of things like that / facts I could learn about Google such that I'd be surprised in that situation

independent of the economics, I do think a fair number of people are working on adjacent stuff, and I don't expect someone to come out of left field for google-translate-cost translation between high-resource languages

but it seems quite plausible that a team of 10 competent people could significantly outperform google translate, and I'd need to learn about the economics to know how surprised I am by 10 people or 100 people or what

I think it's allowed to be non-obvious whether a domain is about to be really profitable

but it's not that easy, and the higher the stakes the more speculative investment it will drive, etc.

[Yudkowsky][11:14] 

if you don't update much off EfficientZero, then people also shouldn't be updating much off of most of the graph I posted earlier as possible Paul-favoring evidence, because most of those SOTAs weren't highly profitable so your worldview didn't have much to say about them. ?

[Christiano][11:15] 

Most things people work a lot on improve gradually. EfficientZero is also quite gradual compared to the crazy TAI stories you tell. I don't really know what to say about this game other than I would prefer make predictions in advance and I'm happy to either propose questions/domains or make predictions in whatever space you feel more comfortable with.

[Yudkowsky][11:16] 

I don't know how to point at a future event that you'd have strong opinions about.  it feels like, whenever I try, I get told that the current world is too unlike the future conditions you expect.

[Christiano][11:16] 

Like, whether or not EfficientZero is evidence for your view depends on exactly how "who knows what will happen" you are. if you are just a bit more spread out than I am, then it's definitely evidence for your view.

I'm saying that I'm willing to bet about any event you want to name, I just think my model of how things work is more accurate.

I'd prefer it be related to ML or AI.

[Yudkowsky][11:17] 

to be clear, I appreciate that it's similarly hard to point at an event like that for myself, because my own worldview says "well mostly the future is not all that predictable with a few rare exceptions"

[Christiano][11:17] 

But I feel like the situation is not at all symmetrical, I expect to outperform you on practically any category of predictions we can specify.

so like I'm happy to bet about benchmark progress in LMs, or about whether DM or OpenAI or Google or Microsoft will be the first to achieve something, or about progress in computer vision, or about progress in industrial robotics, or about translations

whatever

 

17.2. Near-term AI predictions

 

[Yudkowsky][11:18] 

that sounds like you ought to have, like, a full-blown storyline about the future?

[Christiano][11:18] 

what is a full-blown storyline? I have a bunch of ways that I think about the world and make predictions about what is likely

and yes, I can use those ways of thinking to make predictions about whatever

and I will very often lose to a domain expert who has better and more informed ways of making predictions

[Yudkowsky][11:19] 

what happens if 2022 through 2024 looks literally exactly like Paul's modal or median predictions on things?

[Christiano][11:19] 

but I think in ML I will generally beat e.g. a superforecaster who doesn't have a lot of experience in the area

give me a question about 2024 and I'll give you a median?

I don't know what "what happens" means

storylines do not seem like good ways of making predictions

[Shah: 👍]

[Yudkowsky][11:20] 

I mean, this isn't a crux for anything, but it seems like you're asking me to give up on that and just ask for predictions?  so in 2024 can I hire an artist who doesn't speak English and converse with them almost seamlessly through a machine translator?

[Christiano][11:22] 

median outcome (all of these are going to be somewhat easy-to-beat predictions because I'm not thinking): you can get good real-time translations, they are about as good as a +1 stdev bilingual speaker who listens to what you said and then writes it out in the other language as fast as they can type

Probably also for voice -> text or voice -> voice, though higher latencies and costs.

Not integrated into standard video chatting experience because the UX is too much of a pain and the world sucks.

That's a median on "how cool/useful is translation"

[Yudkowsky][11:23] 

I would unfortunately also predict that in this case, this will be a highly competitive market and hence not a very profitable one, which I predict to match your prediction, but I ask about the economics here just in case.

[Christiano][11:24] 

Kind of typical sample: I'd guess that Google has a reasonably large lead, most translation still provided as a free value-added, cost per translation at that level of quality is like $0.01/word, total revenue in the area is like $10Ms / year?

[Yudkowsky][11:24] 

well, my model also permits that Google does it for free and so it's an uncompetitive market but not a profitable one... ninjaed.

[Christiano][11:25] 

first order of improving would be sanity-checking economics and thinking about #s, second would be learning things like "how many people actually work on translation and what is the state of the field?"

[Yudkowsky][11:26] 

did Tesla crack self-driving cars and become a $3T company instead of a $1T company?  do you own Tesla options?

did Waymo beat Tesla and cause Tesla stock to crater, same question?

[Christiano][11:27] 

1/3 chance tesla has FSD in 2024

conditioned on that, yeah probably market cap is >$3T?

conditioned on Tesla having FSD, 2/3 chance Waymo has also at least rolled out to a lot of cities

conditioned on no tesla FSD, 10% chance Waymo has rolled out to like half of big US cities?

dunno if numbers make sense

[Yudkowsky][11:28] 

that's okay, I dunno if my questions make sense

[Christiano][11:29] 

(5% NW in tesla, 90% NW in AI bets, 100% NW in more normal investments; no tesla options that sounds like a scary place with lottery ticket biases and the crazy tesla investors)

[Yudkowsky][11:30] 

(am I correctly understanding you're 2x levered?)

[Christiano][11:30][11:31] 

yeah

it feels like you've got to have weird views on trajectory of value-added from AI over the coming years

on how much of the $ comes from domains that are currently exciting to people (e.g. that Google already works on, self-driving, industrial robotics) vs stuff out of left field

on what kind of algorithms deliver $ in those domains (e.g. are logistics robots trained using the same techniques tons of people are currently pushing on)

on my picture you shouldn't be getting big losses on any of those

just losing like 10-20% each time

[Yudkowsky][11:31][11:32] 

my uncorrected inside view says that machine translation should be in reach and generate huge amounts of economic value even if it ends up an unprofitable competitive or Google-freebie field

and also that not many people are working on basic research in machine translation or see it as a "currently exciting" domain

[Christiano][11:32] 

how many FTE is "not that many" people?

also are you expecting improvement in the google translate style product, or in lower-latencies for something closer to normal human translator prices, or something else?

[Yudkowsky][11:33] 

my worldview says more like... sure, maybe there's 300 programmers working on it worldwide, but most of them aren't aggressively pursuing new ideas and trying to explore the space, they're just applying existing techniques to a new language or trying to throw on some tiny mod that lets them beat SOTA by 1.2% for a publication

because it's not an exciting field

"What if you could rip down the language barriers" is an economist's dream, or a humanist's dream, and Silicon Valley is neither

and looking at GPT-3 and saying, "God damn it, this really seems like it must on some level understand what it's reading well enough that the same learned knowledge would suffice to do really good machine translation, this must be within reach for gradient descent technology we just don't know how to reach it" is Yudkowskian thinking; your AI system has internal parts like "how much it understands language" and there's thoughts about what those parts ought to be able to do if you could get them into a new system with some other parts

[Christiano][11:36] 

my guess is we'd have some disagreements here

but to be clear, you are talking about text-to-text at like $0.01/word price point?

[Yudkowsky][11:38] 

I mean, do we?  Unfortunately another Yudkowskian worldview says "and people can go on failing to notice this for arbitrarily long amounts of time".

if that's around GPT-3's price point then yeah

[Christiano][11:38] 

gpt-3 is a lot cheaper, happy to say gpt-3 like price point

[Yudkowsky][11:39] 

(thinking about whether $0.01/word is meaningfully different from $0.001/word and concluding that it is)

[Christiano][11:39] 

(api is like 10,000 words / $)

I expect you to have a broader distribution over who makes a great product in this space, how great it ends up being etc., whereas I'm going to have somewhat higher probabilities on it being google research and it's going to look boring

[Yudkowsky][11:40] 

what is boring?

boring predictions are often good predictions on my own worldview too

lots of my gloom is about things that are boringly bad and awful

(and which add up to instant death at a later point)

but, I mean, what does boring machine translation look like?

[Christiano][11:42] 

Train big language model. Have lots of auxiliary tasks especially involving reading in source language and generation in target language. Have pre-training on aligned sentences and perhaps using all the unsupervised translation we have depending on how high-resource language is. Fine-tune with smaller amount of higher quality supervision.

Some of the steps likely don't add much value and skip them. Fair amount of non-ML infrastructure.

For some languages/domains/etc. dedicated models, over time increasingly just have a giant model with learned dispatch as in mixture of experts.

[Yudkowsky][11:44] 

but your worldview is also totally ok with there being a Clever Trick added to that which produces a 2x reduction in training time.  or with there being a new innovation like transformers, which was developed a year earlier and which everybody now uses, without which the translator wouldn't work at all. ?

[Christiano][11:44] 

Just for reference, I think transformers aren't that visible on a (translation quality) vs (time) graph?

But yes, I'm totally fine with continuing architectural improvements, and 2x reduction in training time is currently par for the course for "some people at google thought about architectures for a while" and I expect that to not get that much tighter over the next few years.

[Yudkowsky][11:45] 

unrolling Restricted Boltzmann Machines to produce deeper trainable networks probably wasn't much visible on a graph either, but good luck duplicating modern results using only lower portions of the tech tree.  (I don't think we disagree about this.)

[Christiano][11:45] 

I do expect it to eventually get tighter, but not by 2024.

I don't think unrolling restricted boltzmann machines is that important

[Yudkowsky][11:46] 

like, historically, or as a modern technology?

[Christiano][11:46] 

historically

[Yudkowsky][11:46] 

interesting

my model is that it got people thinking about "what makes things trainable" and led into ReLUs and inits

but I am going more off having watched from the periphery as it happened, than having read a detailed history of that

like, people asking, "ah, but what if we had a deeper network and the gradients didn't explode or die out?" and doing that en masse in a productive way rather than individuals being wistful for 30 seconds

[Christiano][11:48] 

well, not sure if this will introduce differences in predictions

I don't feel like it should really matter for our bottom line predictions whether we classify google's random architectural change as something fundamentally new (which happens to just have a modest effect at the time that it's built) or as something boring

I'm going to guess how well things will work by looking at how well things work right now and seeing how fast it's getting better

and that's also what I'm going to do for applications of AI with transformative impacts

and I actually believe you will do something today that's analogous to what you would do in the future, and in fact will make somewhat different predictions than what I would do

and then some of the action will be in new things that people haven't been trying to do in the past, and I'm predicting that new things will be "small" whereas you have a broader distribution, and there's currently some not-communicated judgment call in "small"

if you think that TAI will be like translation, where google publishes tons of papers, but that they will just get totally destroyed by some new idea, then it seems like that should correspond to a difference in P(google translation gets totally destroyed by something out-of-left-field)

and if you think that TAI won't be like translation, then I'm interested in examples more like TAI

I don't really understand the take "and people can go on failing to notice this for arbitrarily long amounts of time," why doesn't that also happen for TAI and therefore cause it to be the boring slow progress by google? Why would this be like a 50% probability for TAI but <10% for translation?

perhaps there is a disagreement about how good the boring progress will be by 2024? looks to me like it will be very good

[Yudkowsky][11:57] 

I am not sure that is where the disagreement lies

 

17.3. The evolution of human intelligence

 

[Yudkowsky][11:57] 

I am considering advocating that we should have more disagreements about the past, which has the advantage of being very concrete, and being often checkable in further detail than either of us already know

[Christiano][11:58] 

I'm fine with disagreements about the past; I'm more scared of letting you pick arbitrary things to "predict" since there is much more impact from differences in domain knowledge

(also not quite sure why it's more concrete, I guess because we can talk about what led to particular events? mostly it just seems faster)

also as far as I can tell our main differences are about whether people will spend a lot of money work effectively on things that would make a lot of money, which means if we look to the past we will have to move away from ML/AI

[Yudkowsky][12:00] 

so my understanding of how Paul writes off the example of human intelligence, is that you are like, "evolution is much stupider than a human investor; if there'd been humans running the genomes, people would be copying all the successful things, and hominid brains would be developing in this ecology of competitors instead of being a lone artifact". ?

[Christiano][12:00] 

I don't understand why I have to write off the example of human intelligence

[Yudkowsky][12:00] 

because it looks nothing like your account of how TAI develops

[Christiano][12:00] 

it also looks nothing like your account, I understand that you have some analogy that makes sense to you

[Yudkowsky][12:01] 

I mean, to be clear, I also write off the example of humans developing morality and have to explain to people at length why humans being as nice as they are, doesn't imply that paperclip maximizers will be anywhere near that nice, nor that AIs will be other than paperclip maximizers.

[Christiano][12:01][12:02] 

you could state some property of how human intelligence developed, that is in common with your model for TAI and not mine, and then we could discuss that

if you say something like: "chimps are not very good at doing science, but humans are" then yes my answer will be that it's because evolution was not selecting us to be good at science

and indeed AI systems will be good at science using much less resources than humans or chimps

[Yudkowsky][12:02][12:02] 

would you disagree that humans developing intelligence, on the sheer surfaces of things, looks much more Yudkowskian than Paulian?

like, not in terms of compatibility with underlying model

just that there's this one corporation that came out and massively won the entire AGI race with zero competitors

[Christiano][12:03] 

I agree that "how much did the winner take all" is more like your model of TAI than mine

I don't think zero competitors is reasonable, I would say "competitors who were tens of millions of years behind"

[Yudkowsky][12:03] 

sure

and your account of this is that natural selection is nothing like human corporate managers copying each other

[Christiano][12:03] 

which was a reasonable timescale for the old game, but a long timescale for the new game

[Yudkowsky][12:03] 

yup

[Christiano][12:04] 

that's not my only account

it's also that for human corporations you can form large coalitions, i.e. raise huge amounts of $ and hire huge numbers of people working on similar projects (whether or not vertically integrated), and those large coalitions will systematically beat small coalitions

and that's basically the key dynamic in this situation, and isn't even trying to have any analog in the historical situation

(the key dynamic w.r.t. concentration of power, not necessarily the main thing overall)

[Yudkowsky][12:07] 

the modern degree of concentration of power seems relatively recent and to have tons and tons to do with the regulatory environment rather than underlying properties of the innovation landscape

back in the old days, small startups would be better than Microsoft at things, and Microsoft would try to crush them using other forces than superior technology, not always successfully

or such was the common wisdom of USENET

[Christiano][12:08] 

my point is that the evolution analogy is extremely unpersuasive w.r.t. concentration of power

I think that AI software capturing the amount of power you imagine is also kind of implausible because we know something about how hardware trades off against software progress (maybe like 1 year of progress = 2x hardware) and so even if you can't form coalitions on innovation at all you are still going to be using tons of hardware if you want to be in the running

though if you can't parallelize innovation at all and there is enough dispersion in software progress then the people making the software could take a lot of the $ / influence from the partnership

anyway, I agree that this is a way in which evolution is more like your world than mine

but think on this point the analogy is pretty unpersuasive

because it fails to engage with any of the a priori reasons you wouldn't expect concentration of power

[Yudkowsky][12:11] 

I'm not sure this is the correct point on which to engage, but I feel like I should say out loud that I am unable to operate my model of your model in such fashion that it is not falsified by how the software industry behaved between 1980 and 2000.

there should've been no small teams that beat big corporations

today those are much rarer, but on my model, that's because of regulatory changes (and possibly metabolic damage from something in the drinking water)

[Christiano][12:12] 

I understand that you can't operate my model, and I've mostly given up, and on this point I would prefer to just make predictions or maybe retrodictions

[Yudkowsky][12:13] 

well, anyways, my model of how human intelligence happened looks like this:

there is a mysterious kind of product which we can call G, and which brains can operate as factories to produce

G in turn can produce other stuff, but you need quite a lot of it piled up to produce better stuff than your competitors

as late as 1000 years ago, the fastest creatures on Earth are not humans, because you need even more G than that to go faster than cheetahs

(or peregrine falcons)

the natural selections of various species were fundamentally stupid and blind, incapable of foresight and incapable of copying the successes of other natural selections; but even if they had been as foresightful as a modern manager or investor, they might have made just the same mistake

before 10,000 years they would be like, "what's so exciting about these things? they're not the fastest runners."

if there'd been an economy centered around running, you wouldn't invest in deploying a human

(well, unless you needed a stamina runner, but that's something of a separate issue, let's consider just running races)

you would invest on improving cheetahs

because the pile of human G isn't large enough that their G beats a specialized naturally selected cheetah

[Christiano][12:17] 

how are you improving cheetahs in the analogy?

you are trying random variants to see what works?

[Yudkowsky][12:18] 

using conventional, well-tested technology like MUSCLES and TENDONS

trying variants on those

[Christiano][12:18] 

ok

and you think that G doesn't help you improve on muscles and tendons?

until you have a big pile of it?

[Yudkowsky][12:18] 

not as a metaphor but as simple historical fact, that's how it played out

it takes a whole big pile of G to go faster than a cheetah

[Christiano][12:19] 

as a matter of fact there is no one investing in making better cheetahs

so it seems like we're already playing analogy-game

[Yudkowsky][12:19] 

the natural selection of cheetahs is investing in it

it's not doing so by copying humans because of fundamental limitations

however if we replace it with an average human investor, it still doesn't copy humans, why would it

[Christiano][12:19] 

that's the part that is silly

or like, it needs more analogy

[Yudkowsky][12:19] 

how so?  humans aren't the fastest.

[Christiano][12:19] 

humans are great at breeding animals

so if I'm natural selection personified, the thing to explain is why I'm not using some of that G to improve on my selection

not why I'm not using G to build a car

[Yudkowsky][12:20] 

I'm... confused

is this implying that a key aspect of your model is that people are using AI to decide which AI tech to invest in?

[Christiano][12:20] 

no

I think I just don't understand your analogy

here in the actual world, some people are trying to make faster robots by tinkering with robot designs

and then someone somewhere is training their AGI

[Yudkowsky][12:21] 

what I'm saying is that you can imagine a little cheetah investor going, "I'd like to copy and imitate some other species's tricks to make my cheetahs faster" and they're looking enviously at falcons, not at humans

not until very late in the game

[Christiano][12:21] 

and the relevant question is whether the pre-AGI thing is helpful for automating the work that humans are doing while they tinker with robot designs

that seems like the actual world

and the interesting claim is you saying "nope, not very"

[Yudkowsky][12:22] 

I am again confused.  Does it matter to your model whether the pre-AGI thing is helpful for automating "tinkering with robot designs" or just profitable machine translation?  Either seems like it induces equivalent amounts of investment.

If anything the latter induces much more investment.

[Christiano][12:23] 

sure, I'm fine using "tinkering with robot designs" as a lower bound

both are fine

the point is I have no idea what you are talking about in the analogy

what is analogous to what?

I thought cheetahs were analogous to faster robots

[Yudkowsky][12:23] 

faster cheetahs are analogous to more profitable robots

[Christiano][12:23] 

sure

so you have some humans working on making more profitable robots, right?

who are tinkering with the robots, in a way analogous to natural selection tinkering with cheetahs?

[Yudkowsky][12:24] 

I'm suggesting replacing the Natural Selection of Cheetahs with a new optimizer that has the Copy Competitor and Invest In Easily-Predictable Returns feature

[Christiano][12:24] 

OK, then I don't understand what those are analogous to

like, what is analogous to the humans who are tinkering with robots, and what is analogous to the humans working on AGI?

[Yudkowsky][12:24] 

and observing that, even this case, the owner of Cheetahs Inc. would not try to copy Humans Inc.

[Christiano][12:25] 

here's the analogy that makes sense to me

natural selection is working on making faster cheetahs = some humans tinkering away to make more profitable robots

natural selection is working on making smarter humans = some humans who are tinkering away to make more powerful AGI

natural selection doesn't try to copy humans because they suck at being fast = robot-makers don't try to copy AGI-makers because the AGIs aren't very profitable robots

[Yudkowsky][12:26] 

with you so far

[Christiano][12:26] 

eventually humans build cars once they get smart enough = eventually AGI makes more profitable robots once it gets smart enough

[Yudkowsky][12:26] 

yup

[Christiano][12:26] 

great, seems like we're on the same page then

[Yudkowsky][12:26] 

and by this point it is LATE in the game

[Christiano][12:27] 

great, with you still

[Yudkowsky][12:27] 

because the smaller piles of G did not produce profitable robots

[Christiano][12:27] 

but there's a step here where you appear to go totally off the rails

[Yudkowsky][12:27] 

or operate profitable robots

say on

[Christiano][12:27] 

can we just write out the sequence of AGIs, AGI(1), AGI(2), AGI(3)... in analogy with the sequence of human ancestors H(1), H(2), H(3)...?

[Yudkowsky][12:28] 

Is the last member of the sequence H(n) the one that builds cars and then immediately destroys the world before anything that operates on Cheetah Inc's Owner's scale can react?

[Christiano][12:28] 

sure

I don't think of it as the last

but it's the last one that actually arises?

maybe let's call it the last, H(n)

great

and now it seems like you are imagining an analogous story, where AGI(n) takes over the world and maybe incidentally builds some more profitable robots along the way

(building more profitable robots being easier than taking over the world, but not so much easier that AGI(n-1) could have done it unless we make our version numbers really close together, close enough that deploying AGI(n-1) is stupid)

[Yudkowsky][12:31] 

if this plays out in the analogous way to human intelligence, AGI(n) becomes able to build more profitable robots 1 hour before it becomes able to take over the world; my worldview does not put that as the median estimate, but I do want to observe that this is what happened historically

[Christiano][12:31] 

sure

[Yudkowsky][12:32] 

ok, then I think we're still on the same page as written so far

[Christiano][12:32] 

so the question that's interesting in the real world is which AGI is useful for replacing humans in the design-better-robots task; is it 1 hour before the AGI that takes over the world, or 2 years, or what?

[Yudkowsky][12:33] 

my worldview tends to make a big ol' distinction between "replace humans in the design-better-robots task" and "run as a better robot", if they're not importantly distinct from your standpoint can we talk about the latter?

[Christiano][12:33] 

they seem importantly distinct

totally different even

so I think we're still on the same page

[Yudkowsky][12:34] 

ok then, "replacing humans at designing better robots" sure as heck sounds to Eliezer like the world is about to end or has already ended

[Christiano][12:34] 

my whole point is that in the evolutionary analogy we are talking about "run as a better robot" rather than "replace humans in the design-better-robots-task"

and indeed there is no analog to "replace humans in the design-better-robots-task"

which is where all of the action and disagreement is

[Yudkowsky][12:35][12:36] 

well, yes, I was exactly trying to talk about when humans start running as better cheetahs

and how that point is still very late in the game

not as late as when humans take over the job of making the thing that makes better cheetahs, aka humans start trying to make AGI, which is basically the fingersnap end of the world from the perspective of Cheetahs Inc.

[Christiano][12:36] 

OK, but I don't care when humans are better cheetahs---in the real world, when AGIs are better robots. In the real world I care about when AGIs start replacing humans in the design-better-robots-task. I'm game to use evolution as an analogy to help answer that question (where I do agree that it's informative), but want to be clear what's actually at issue.

[Yudkowsky][12:37] 

so, the thing I was trying to work up to, is that my model permits the world to end in a way where AGI doesn't get tons of investment because it has an insufficiently huge pile of G that it could run as a better robot.  people are instead investing in the equivalents of cheetahs.

I don't understand why your model doesn't care when humans are better cheetahs.  AGIs running as more profitable robots is what induces the huge investments in AGI that your model requires to produce very close competition. ?

[Christiano][12:38] 

it's a sufficient condition, but it's not the most robust one at all

like, I happen to think that in the real world AIs actually are going to be incredibly profitable robots, and that's part of my boring view about what AGI looks like

But the thing that's more robust is that the sub-taking-over-world AI is already really important, and receiving huge amounts of investment, as something that automates the R&D process. And it seems like the best guess given what we know now is that this process starts years before the singularity.

From my perspective that's where most of the action is. And your views on that question seem related to your views on how e.g. AGI is a fundamentally different ballgame from making better robots (whereas I think the boring view is that they are closely related), but that's more like an upstream question about what you think AGI will look like, most relevant because I think it's going to lead you to make bad short-term predictions about what kinds of technologies will achieve what kinds of goals.

[Yudkowsky][12:41] 

but not all AIs are the same branch of the technology tree.  factory robotics are already really important and they are "AI" but, on my model, they're currently on the cheetah branch rather than the hominid branch of the tech tree; investments into better factory robotics are not directly investments into improving MuZero, though they may buy chips that MuZero also buys.

[Christiano][12:42] 

Yeah, I think you have a mistaken view of AI progress. But I still disagree with your bottom line even if I adopt (this part of) your view of AI progress.

Namely, I think that the AGI line is mediocre before it is great, and the mediocre version is spectacularly valuable for accelerating R&D (mostly AGI R&D).

The way I end up sympathizing with your view is if I adopt both this view about the tech tree, + another equally-silly-seeming view about how close the AGI line is to fooming (or how inefficient the area will remain as we get close to fooming)

 

17.4. Human generality and body manipulation

 

[Yudkowsky][12:43] 

so metaphorically, you require that humans be doing Great at Various Things and being Super Profitable way before they develop agriculture; the rise of human intelligence cannot be a case in point of your model because the humans were too uncompetitive at most animal activities for unrealistically long (edit: compared to the AI case)

[Christiano][12:44] 

I don't understand

Human brains are really great at basically everything as far as I can tell?

like it's not like other animals are better at manipulating their bodies

we crush them

[Yudkowsky][12:44] 

if we've got weapons, yes

[Christiano][12:44] 

human bodies are also pretty great, but they are not the greatest on every dimension

[Yudkowsky][12:44] 

wrestling a chimpanzee without weapons is famously ill-advised

[Christiano][12:44] 

no, I mean everywhere

chimpanzees are practically the same as humans in the animal kingdom

they have almost as excellent a brain

[Yudkowsky][12:45] 

as is attacking an elephant with your bare hands

[Christiano][12:45] 

that's not because of elephant brains

[Yudkowsky][12:45] 

well, yes, exactly

you need a big pile of G before it's profitable

so big the game is practically over by then

[Christiano][12:45] 

this seems so confused

but that's exciting I guess

like, I'm saying that the brains to automate R&D

are similar to the brains to be a good factory robot

analogously, I think the brains that humans use to do R&D

are similar to the brains we use to manipulate our body absurdly well

I do not think that our brains make us fast

they help a tiny bit but not much

I do not think the physical actuators of the industrial robots will be that similar to the actuators of the robots that do R&D

the claim is that the problem of building the brain is pretty similar

just as the problem of building a brain that can do science is pretty similar to the problem of building a brain that can operate a body really well

(and indeed I'm claiming that human bodies kick ass relative to other animal bodies---there may be particular tasks other animal brains are pre-built to be great at, but (i) humans would be great at those too if we were under mild evolutionary pressure with our otherwise excellent brains, (ii) there are lots of more general tests of how good you are at operating a body and we will crush it at those tests)

(and that's not something I know much about, so I could update as I learned more about how actually we just aren't that good at motor control or motion planning)

[Yudkowsky][12:49] 

so on your model, we can introduce humans to a continent, forbid them any tool use, and they'll still wipe out all the large animals?

[Christiano][12:49] 

(but damn we seem good to me)

I don't understand why that would even plausibly follow

[Yudkowsky][12:49] 

because brains are profitable early, even if they can't build weapons?

[Christiano][12:49] 

I'm saying that if you put our brains in a big animal body

we would wipe out the big animals

yes, I think brains are great

[Yudkowsky][12:50] 

because we'd still have our late-game pile of G and we would build weapons

[Christiano][12:50] 

no, I think a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal straightforwardly

without using tools

[Yudkowsky][12:51] 

this is a strange viewpoint and I do wonder whether it is a crux of your view

[Christiano][12:51] 

this feels to me like it's more on the "eliezer vs paul disagreement about the nature of AI" rather than "eliezer vs paul on civilizational inadequacy and continuity", but enough changes on "nature of AI" would switch my view on the other question

[Yudkowsky][12:51] 

like, ceteris paribus maybe a human in an elephant's body beats an elephant after a burn-in practice period?  because we'd have a strict intelligence advantage?

[Christiano][12:52] 

practice may or may not be enough

but if you port over the excellent human brain to the elephant body, then run evolution for a brief burn-in period to get all the kinks sorted out?

elephants are pretty close to humans so it's less brutal than for some other animals (and also are elephants the best example w.r.t. the possibility of direct conflict?) but I totally expect us to win

[Yudkowsky][12:53] 

I unfortunately need to go do other things in advance of an upcoming call, but I feel like disagreeing about the past is proving noticeably more interesting, confusing, and perhaps productive, than disagreeing about the future

[Christiano][12:53] 

actually probably I just think practice is enough

I think humans have way more dexterity, better locomotion, better navigation, better motion planning...

some of that is having bodies optimized for those things (esp. dexterity), but I also think most animals just don't have the brains for it, with elephants being one of the closest calls

I'm a little bit scared of talking to zoologists or whoever the relevant experts are on this question, because I've talked to bird people a little bit and they often have very strong "humans aren't special, animals are super cool" instincts even in cases where that take is totally and obviously insane. But if we found someone reasonable in that area I'd be interested to get their take on this.

I think this is pretty important for the particular claim "Is AGI like other kinds of ML?"; that definitely doesn't persuade me to be into fast takeoff on its own though it would be a clear way the world is more Eliezer-like than Paul-like

I think I do further predict that people who know things about animal intelligence, and don't seem to have identifiably crazy views about any adjacent questions that indicate a weird pro-animal bias, will say that human brains are a lot better than other animal brains for dexterity/locomotion/similar physical tasks (and that the comparison isn't that close for e.g. comparing humans vs big cats).

Incidentally, seems like DM folks did the same thing this year, presumably publishing now because they got scooped. Looks like they probably have a better algorithm but used harder environments instead of Atari. (They also evaluate the algorithm SPR+MuZero I mentioned which indeed gets one factor of 2x improvement over MuZero alone, roughly as you'd guess): https://arxiv.org/pdf/2111.01587.pdf

[Barnes][13:45] 

My DM friend says they tried it before they were focused on data efficiency and it didn't help in that regime, sounds like they ignored it for a while after that

[Christiano: 👍]

[Christiano][13:48] 

Overall the situation feels really boring to me. Not sure if DM having a highly similar unpublished result is more likely on my view than Eliezer's (and initially ignoring the method because they weren't focused on sample-efficiency), but at any rate I think it's not anywhere close to falsifying my view.

 

18. Follow-ups to the Christiano/Yudkowsky conversation

 

[Karnofsky][9:39]  (Nov. 5) 

Going to share a point of confusion about this latest exchange.

It started with Eliezer saying this:

Thing that (if true) strikes me as... straight-up falsifying Paul's view as applied to modern-day AI, at the frontier of the most AGI-ish part of it and where Deepmind put in substantial effort on their project? EfficientZero (allegedly) learns Atari in 100,000 frames. Caveat: I'm not having an easy time figuring out how many frames MuZero would've required to achieve the same performance level. MuZero was trained on 200,000,000 frames but reached what looks like an allegedly higher high; the EfficientZero paper compares their performance to MuZero on 100,000 frames, and claims theirs is much better than MuZero given only that many frames.

So at this point, I thought Eliezer's view was something like: "EfficientZero represents a several-OM (or at least one-OM?) jump in efficiency, which should shock the hell out of Paul." The upper bound on the improvement is 2000x, so I figured he thought the corrected improvement would be some number of OMs.

But very shortly afterwards, Eliezer quotes Gwern's guess of a 4x improvement, and Paul then said:

Concretely, this is a paper that adds a few techniques to improve over MuZero in a domain that (it appears) wasn't a significant focus of MuZero. I don't know how much it improves but I can believe gwern's estimates of 4x.

I'd guess MuZero itself is a 2x improvement over the baseline from a year ago, which was maybe a 4x improvement over the algorithm from a year before that. If that's right, then no it's not mindblowing on my view to have 4x progress one year, 2x progress the next, and 4x progress the next.

Eliezer never seemed to push back on this 4x-2x-4x claim.

What I thought would happen after the 4x estimate and 4x-2x-4x claim: Eliezer would've said "Hmm, we should nail down whether we are talking about 4x-2x-4x or something more like 4x-2x-100x. If it's 4x-2x-4x, then I'll say 'never mind' re: my comment that this 'straight-up falsifies Paul's view.' At best this is just an iota of evidence or something."

Why isn't that what happened? Did Eliezer mean all along to be saying that a 4x jump on Atari sample efficiency would "straight-up falsify Paul's view?" Is a 4x jump the kind of thing Eliezer thinks is going to power a jumpy AI timeline?

[Ngo: 👍][Shah: ➕]

[Yudkowsky][11:16]  (Nov. 5) 

This is a proper confusion and probably my fault; I also initially thought it was supposed to be 1-2 OOM and should've made it clearer that Gwern's 4x estimate was less of a direct falsification.

I'm not yet confident Gwern's estimate is correct.  I just got a reply from my query to the paper's first author which reads:

Dear Eliezer: It's a good question. But due to the limits of resources and time, we haven't evaluated the sample efficiency towards different frames systematically. I think it's not a trivial question as the required time and resources are much expensive for the 200M frames setting, especially concerning the MCTS-based methods. Maybe you need about several days or longer to finish a run with GPUs in that setting. I hope my answer can help you. Thank you for your email.

I replied asking if Gwern's 3.8x estimate sounds right to them.

A 10x improvement could power what I think is a jumpy AI timeline.  I'm currently trying to draft a depiction of what I think an unrealistically dignified but computationally typical end-of-world would look like if it started in 2025, and my first draft of that had it starting with a new technique published by Google Brain that was around a 10x improvement in training speeds for very large networks at the cost of higher inference costs, but which turned out to be specially applicable to online learning.

That said, I think the 10x part isn't either a key concept or particularly likely, and it's much more likely that hell breaks loose when an innovation changes some particular step of the problem from "can't realistically be done at all" to "can be done with a lot of computing power", which was what I had being the real effect of that hypothetical Google Brain innovation when applied to online learning, and I will probably rewrite to reflect that.

[Karnofsky][11:29]  (Nov. 5) 

That's helpful, thanks.

Re: "can't realistically be done at all" to "can be done with a lot of computing power", cpl things:

1. Do you think a 10x improvement in efficiency at some particular task could qualify as this? Could a smaller improvement?

2. I thought you were pretty into the possibility of a jump from "can't realistically be done at all" to "can be done with a small amount of computing power," eg some random ppl with a $1-10mm/y budget blowing past mtpl labs with >$1bb/y budgets. Is that wrong?

[Yudkowsky][13:44]  (Nov. 5) 

1 - yes and yes, my revised story for how the world ends looks like Google Brain publishing something that looks like only a 20% improvement but which is done in a way that lets it be adapted to make online learning by gradient descent "work at all" in DeepBrain's ongoing Living Zero project (not an actual name afaik)

2 - that definitely remains very much allowed in principle, but I think it's not my current mainline probability for how the world's end plays out - although I feel hesitant / caught between conflicting heuristics here.

I think I ended up much too conservative about timelines and early generalization speed because of arguing with Robin Hanson, and don't want to make a similar mistake here, but on the other hand a lot of the current interesting results have been from people spending huge compute (as wasn't the case to nearly the same degree in 2008) and if things happen on short timelines it seems reasonable to guess that the future will look that much like the present.  This is very much due to cognitive limitations of the researchers rather than a basic fact about computer science, but cognitive limitations are also facts and often stable ones.

[Karnofsky][14:35]  (Nov. 5) 

Hm OK. I don't know what "online learning by gradient descent" means such that it doesn't work at all now (does "work at all" mean something like "work with human-ish learning efficiency?")

[Yudkowsky][15:07]  (Nov. 5) 

I mean, in context, it means "works for Living Zero at the performance levels where it's running around accumulating knowledge", which by hypothesis it wasn't until that point.

[Karnofsky][15:12]  (Nov. 5) 

Hm. I am feeling pretty fuzzy on whether your story is centrally about:

1. A <10x jump in efficiency at something important, leading pretty directly/straightforwardly to crazytown

2. A 100x ish jump in efficiency at something important, which may at first "look like" a mere <10x jump in efficiency at something else

#2 is generally how I've interpreted you and how the above sounds, but under #2 I feel like we should just have consensus that the Atari thing being 4x wouldn't be much of an update. Maybe we already do (it was a bit unclear to me from your msg)

(And I totally agree that we haven't established the Atari thing is only 4x - what I'm saying is it feels like the conversation should've paused there)

[Yudkowsky][15:13]  (Nov. 5) 

The Atari thing being 4x over 2 years is I think legit not an update because that's standard software improvement speed

you're correct that it should pause there

[Karnofsky][15:14]  (Nov. 5) 

👍

[Yudkowsky] [15:24]  (Nov. 5) 

I think that my central model is something like - there's a central thing to general intelligence that starts working when you get enough pieces together and they coalesce, which is why humans went down this evolutionary gradient by a lot before other species got 10% of the way there in terms of output; and then it takes a big pile of that thing to do big things, which is why humans didn't go faster than cheetahs until extremely late in the game.

so my visualization of how the world starts to end is "gear gets added and things start to happen, maybe slowly-by-my-standards at first such that humans keep on pushing it along rather than it being self-moving, but at some point starting to cumulate pretty quickly in the same way that humans cumulated pretty quickly once they got going" rather than "dial gets turned up 50%, things happen 50% faster, every year".

[Yudkowsky][15:16]  (Nov. 5, switching channels) 

as a quick clarification, I agree that if this is 4x sample efficiency over 2 years then that doesn't at all challenge Paul's view

[Christiano][0:20]  (Nov. 26) 

FWIW, I felt like the entire discussion of EfficientZero was a concrete example of my view making a number of more concentrated predictions than Eliezer that were then almost immediately validated. In particular, consider the following 3 events:

  • The quantitative effect size seems like it will turn out to be much smaller than Eliezer initially believed, much closer to being in line with previous progress.
  • DeepMind had relatively similar results that got published immediately after our discussion, making it look like random people didn't pull ahead of DM after all.
  • DeepMind appears not to have cared much about the metric in question, as evidenced by (i) Beth's comment above, which is basically what I said was probably going on, (ii) they barely even mention Atari sample-efficiency in their paper about similar methods.

If only 1 of these 3 things had happened, then I agree this would have been a challenge to my view that would make me update in Eliezer's direction. But that's only possible if Eliezer actually assigns a higher probability than me to <= 1 of these things happening, and hence a lower probability to >= 2 of them happening. So if we're playing a reasonable epistemic game, it seems like I need to collect some epistemic credit every time something looks boring to me.

[Yudkowsky][15:30]  (Nov. 26) 

I broadly agree; you win a Bayes point.  I think some of this (but not all!) was due to my tripping over my own feet and sort of rushing back with what looked like a Relevant Thing without contemplating the winner's curse of exciting news, the way that paper authors tend to frame things in more exciting rather than less exciting ways, etc.  But even if you set that aside, my underlying AI model said that was a thing which could happen (which is why I didn't have technically rather than sociologically triggered skepticism) and your model said it shouldn't happen, and it currently looks like it mostly didn't happen, so you win a Bayes point.

Notes that some participants may deem obvious(?) but that I state expecting wider readership:

  • Just like markets are almost entirely efficient (in the sense that, even when they're not efficient, you can only make a very small fraction of the money that could be made from the entire market if you owned a time machine), even sharp and jerky progress has to look almost entirely not so fast almost all the time if the Sun isn't right in the middle of going supernova.  So the notion that progress sometimes goes jerky and fast does have to be evaluated by a portfolio view over time.  In worlds where progress is jerky even before the End Days, Paul wins soft steady Bayes points in most weeks and then I win back more Bayes points once every year or two.
  • We still don't have a very good idea of how much longer you would need to train the previous algorithm to match the performance of the new algorithm, just an estimate by Gwern based off linearly extrapolating a graph in a paper.  But, also to be clear, not knowing something is not the same as expecting it to update dramatically, and you have to integrate over the distribution you've got.
  • It's fair to say, "Hey, Eliezer, if you tripped over your own feet here, but only noticed that because Paul was around to call it, maybe you're tripping over your feet at other times when Paul isn't around to check your thoughts in detail" - I don't want to minimize the Bayes point that Paul won either.

[Christiano][16:29]  (Nov. 27) 

Agreed that it's (i) not obvious how large the EfficientZero gain was, and in general it's not a settled question what happened, (ii) it's not that big an update, it needs to be part of a portfolio (but this is indicative of the kind of thing I'd want to put in the portfolio), (iii) it generally seems pro-social to flag potentially relevant stuff without the presumption that you are staking a lot on it.

65

Ω 28

35 comments, sorted by Click to highlight new comments since: Today at 6:26 PM
New Comment

Christiano's model of progress, AFAIU, can be summarized as: "When only a few people work in a field, big jumps in progress are possible. When many people work in a field, the low hanging fruits are picked quickly and then progress is smooth."

The problem with this model is, its predictions depend a lot on how you draw the boundary around "field". Take Yudkowsky's example of startups. How do we explain small startups succeed where large companies failed? And, it's not lack of economic incentives since successful startups sometimes make huge profits? (Often resulting from getting acquired by the large companies.) So much that there's an entire industry of investing in startups?

I'm guessing that a proponent of Christiano's theory would say: sure, such-and-such startup succeeded but it was because they were the only ones working on problem P, so problem P was an uncrowded field at the time. Okay, but why do we draw the boundary around P rather than around "software" or around something in between which was crowded? To take another example, heavier-than-air flight was arguably uncrowded when the Wright brothers came along, but flight-in-general (including balloons and airships) was somewhat crowded.

In a hypothetical Yudkowsky-style fast takeoff scenario, an imaginary post-singularity proponent of Christiano's theory could argue: Sure, AI was a crowded field, but this is consistent with the big jump because TAI was achieved using Technology X, which was an innovation created by that small group, and the field of "Technology X" was not crowded. A possible counterargument is: TAI will have to be good at science (or maybe some other word instead of "science"), and people are already trying to do science using AI (e.g. AlphaFold, the knot theory thing, the quantum chemistry thing). However, this is still begging the question: why draw the line around "science AI" rather than around something else?

Perhaps proponents of Christiano's theory just have independent, object-level reasons to believe that there will be no "Technology X", that "more of the same" is already sufficient to get to TAI. In this case there might be no natural place to draw the boundary which will be uncrowded. Here I have my own object-level reasons for skepticism, but maybe it's unwise to take the discussion there: as Yudkowsky pointed out, this can be net negative. However, I still ask: whence the (seemingly high) confidence? Maybe more-of-the-same is enough for TAI, but maybe not? (And even if it's theoretically enough, maybe it won't get there first.)

EDIT: I anticipate an objection to the "Technology X" scenario along the lines of, a small group cannot create something with rapid global impact. Because, presumably, the level of impact of any innovation is bounded in some predictable way by the economic investment. To this I would have two counter-objections:

First, imagine that Technology X didn't create TAI immediately but did lead to a kink comparable to the start of the deep learning revolution (i.e. TAI was eventually created as a result of much investment in Technology X, on a timeline significantly faster than the extrapolation of the pre-X trend). On the one hand, Christiano appears to believe this is still very unlikely. On the other hand, it can still be excused in hindsight by the reference class tennis I described.

Second, I am skeptical of the methodology. All of us here agree that AI poses unique risks compared to past technologies, so why can we extrapolate from the past in that way? Imagine that we lived in a universe in which it was plausible that the LHC creates a black hole or causes false vacuum collapse. It seems to me that such a universe could still have a techno-economic trajectory broadly similar to our own, for the same reasons. So, in that universe, would it make sense to argue "the LHC cannot destroy the world because its cost is an insufficient fraction of world GDP[1]"? It seems to me it would be strange there in a similar way how the economic argument about AI is strange here.


  1. The LHC is an expensive project, but is it expensive enough to destroy the world? How can we tell? Is this really a sensible way to analyze this compared to thinking about the actual physics? ↩︎

I'm guessing that a proponent of Christiano's theory would say: sure, such-and-such startup succeeded but it was because they were the only ones working on problem P, so problem P was an uncrowded field at the time. Okay, but why do we draw the boundary around P rather than around "software" or around something in between which was crowded?

I'd make a different reply: you need to not just look at the winning startup, but all startups. If it's the case that the 'startup ecosystem' is earning 100% returns and the rest of the economy is earning 5% returns, then something weird is up and the model is falsified, but if the startup ecosystem is earning 10% returns once you average together the few successes and many failures, then this looks more like a normal risk-return story.

Furthermore, there's something interesting where the modern startup economy feels much more like the Paulian 'concentration of power' story than the Yudkowskian 'wisdom of USENET' story; teams that make video games might be able to turn a handful of people into tens of millions of dollars in revenue (or billions in an extreme case), but teams that make self-driving cars mostly have to be able to tell a story about being able to turn billions of investor dollars into teams of engineers and mountains of hardware that will then be able to produce the self-driving cars, with the race between companies not being "who has a better product already" but "who can better acquire the means to create a better product."

I'm pretty sympathetic to the view that the first transformative intelligence will look more like a breakout indie game than a AAA title because there's some new 'gimmick' that can be made by a small team and that has an outsized impact on usefulness. But it seems important to note that a lot of the economy doesn't work that way, even lots of the 'build speculative future tech' part!

I don't see what it has to do with risk-return. Sure, many startups fail. And, plausibly many people tried to build an airplane and failed before the Wright brothers. And, many people keep trying building AGI and failing. This doesn't mean there won't be kinks in AI progress or even a TAI created by a small group.

Saying that "the subjective expected value of AI progress over time is a smooth curve" is a very different proposition from "the actual AI progress over time will be a smooth curve".

My line of argument here is not trying to prove a particular story about AI progress (e.g. "TAI will be similar to a startup") but push pack against (/ voice my confusions about) the confidence level of predictions made by Christiano's model.

My line of argument here is not trying to prove a particular story about AI progress (e.g. "TAI will be similar to a startup") but push pack against (/ voice my confusions about) the confidence level of predictions made by Christiano's model.

What is the confidence level of predictions you are pushing back against? I'm at like 30% on fast takeoff in the sense of "1 year doubling without preceding 4 year doubling" (a threshold roughly set to break any plausible quantitative historical precedent a threshold intended to be faster than historical precedent but that's probably similar to the agricultural revolution sped up 10,000x). I'm at maybe 10-20% on the kind of crazier world Eliezer imagines.

Is that a high level of confidence? I'm not sure I would be able to spread my probability in a way that felt unconfident (to me) without giving probabilities that low to lots of particular ways the future could be crazy. E.g. 10-20% is similar to the probability I put on other crazy-feeling possibilities like no singularity at all, rapid GDP acceleration with only moderate cognitive automation, or singleton that arrests economic growth before we get to 4 year doubling times...

I'm at like 30% on fast takeoff in the sense of "1 year doubling without preceding 4 year doubling" (a threshold roughly set to break any plausible quantitative historical precedent).

Huh, AI impacts looked at one dataset of GWP (taken from wikipedia, in turn taken from here) and found 2 precedents for "x year doubling without preceding 4x year doubling", roughly during the agricultural evolution. The dataset seems to be a combination of lots of different papers' estimates of human population, plus an assumption of ~constant GWP/capita early in history.

Yeah, I think this was wrong. I'm somewhat skeptical of the numbers and suspect future revisions systematically softening those accelerations, but 4x still won't look that crazy.

(I don't remember exactly how I chose that number but it probably involved looking at the same time series so wasn't designed to be much more abrupt.)

My view is:

  1. If X is obviously very valuable, many people will work on achieving X (potentially including lots of different approaches they see to achieving X).
  2. In fields with lots of spending and impact, most technological progress is made gradually rather than abruptly, for most ways of measuring.
  3. We haven't said very much at all about why AI should be one of the exceptions (compare to the situation when discussing nuclear weapons, where you can make fantastic arguments about why it would be different). Eliezer's argument about criticality seems to me to just not work unless one rejects 1+2 altogether (unlike nuclear weapons).

Your arguments about technology X apply to any other technological goal---better fusion reactors or solar panels, more generally cheaper energy, rockets, semiconductors, whatever. So it seems like they should be visible in the base rate for 2. Do you think that a significant fraction of technological progress is abrupt and unpredictable in the sense that you are saying TAI will probably be?

First, imagine that Technology X didn't create TAI immediately but did lead to a kink comparable to the start of the deep learning revolution (i.e. TAI was eventually created as a result of much investment in Technology X, on a timeline significantly faster than the extrapolation of the pre-X trend). On the one hand, Christiano appears to believe this is still very unlikely. On the other hand, it can still be excused in hindsight by the reference class tennis I described.

I don't know exactly what you are responding to here. I have some best guesses about what progress will look like, but they are pretty separate from the broader heuristic. And I'm not sure this is a fair representation of my actual view, within 20 years I think it's reasonably likely that AI will look fairly different, on a scale of 5 years that seems kind of unlikely.

Second, I am skeptical of the methodology. All of us here agree that AI poses unique risks compared to past technologies, so why can we extrapolate from the past in that way? Imagine that we lived in a universe in which it was plausible that the LHC creates a black hole or causes false vacuum collapse. It seems to me that such a universe could still have a techno-economic trajectory broadly similar to our own, for the same reasons. So, in that universe, would it make sense to argue "the LHC cannot destroy the world because its cost is an insufficient fraction of world GDP[1]"? It seems to me it would be strange there in a similar way how the economic argument about AI is strange here.

I'm predicting that the performance of AI systems will grow relatively continuously and predictably, not that AI isn't risky or even that risk will emerge gradually.

Take Yudkowsky's example of startups. How do we explain small startups succeed where large companies failed?

I think it's pretty unclear how this bears on the general schema above. But I'm happy to consider particular examples of startups that made rapid/unpredictable progress towards particular technological goals (perhaps by pursuing a new approach), since those are the kind of thing I'm predicting are rare. They sure look rare to me (i.e. are responsible for a very small share of total technological progress regardless of how you measure).

In a hypothetical Yudkowsky-style fast takeoff scenario, an imaginary post-singularity proponent of Christiano's theory could argue: Sure, AI was a crowded field, but this is consistent with the big jump because TAI was achieved using Technology X, which was an innovation created by that small group, and the field of "Technology X" was not crowded.

I'm surprised by an uncrowded technology surprisingly and rapidly overtaking the state of the art for a task people care a lot about, even at a super abstract or broad level like "make money." Happens sometimes but it's a minority of progress.

  1. In fields with lots of spending and impact, most technological progress is made gradually rather than abruptly, for most ways of measuring...

Your arguments about technology X apply to any other technological goal---better fusion reactors or solar panels, more generally cheaper energy, rockets, semiconductors, whatever. So it seems like they should be visible in the base rate for 2. Do you think that a significant fraction of technological progress is abrupt and unpredictable in the sense that you are saying TAI will probably be?

I think that you can roughly divide progress into "qualitatively new ideas" (QNI) and "incremental improvement of existing technology" (ofc in reality it's a spectrum). The first kind is much less predictable than the second kind. Now, when a QNI comes along, it doesn't necessarily look like a discontinuity, because there might be a lot of work to bridge the distance between idea and implementation. And, this work involves a lot of small details. Because of this, the first version is probably often only a slight improvement on SOTA. So, I'm guessing that QNIs produce something more like a discontinuity in the derivative than a discontinuity in the SOTA itself.

Under this model, most progress is "gradual" in the sense that at most points the graph is differentiable. But (i) it doesn't work too well to extrapolate trends across QNI points and (ii) the counterfactual impact of QNIs is a large fraction of progress.

Certainly I don't see fusion reactors, solar panels or (use in electronics of) semiconductors as counterexamples, since each of these was invented at some point, and didn't gradually evolve from some completely different technology.

Another factor is that in software the distance between idea and implementation tends to be smaller, because processors and operating systems abstract much of the details for you. I think this is partly responsible for "startups" being more of a thing in software than in other fields (ofc another part of it is that software is just more new with more low-hanging fruits). And, this probably makes progress in software less smooth.

For unaligned TAI, the effective distance might be shorter still. Because, for most software there's a fair amount of work going towards UI and/or integration with other software. But, with TAI this can be unnecessary. Moreover, the alignment problem itself is part of the would-be idea-to-profitable-implementation gap. On the other hand, optimizing performance can also be a large part of idea-to-implementation, and if the first AGI is e.g. a drastically slow superintelligence, this can be compatible with a slow takeoff.

Certainly I don't see fusion reactors, solar panels or (use in electronics of) semiconductors as counterexamples, since each of these was invented at some point, and didn't gradually evolve from some completely different technology.

Your definition of "discontinuity" seems broadly compatible with my view of the future then. Definitely there are different technologies that are not all outgrowths of one another.

My main point of divergence is:

Now, when a QNI comes along, it doesn't necessarily look like a discontinuity, because there might be a lot of work to bridge the distance between idea and implementation. And, this work involves a lot of small details. Because of this, the fist version is probably often only a slight improvement on SOTA.

I think that most of the time when a QNI comes along it is worse than the previous thing and takes work to bring up to the level of the previous thing. In small areas no one pays attention until it overtakes SOTA, but in big areas people usually start paying attention (and investing a significant fraction of the prior SOTA's size) well before the cross-over point. This seems true for solar or fusion, or digital computers, or deep learning for that matter, or self-driving cars or early automobiles.

If that's right, then you are looking at two continuous curves and you can think about when they cross and you usually start to get a lot of data before the crossover point. And indeed this is obviously how I'm thinking about technologies like deep learning, which are currently useless for virtually all tasks but which I expect to relatively soon overtake alternatives (like humans and other software) in a huge range of very important domains.

And if some other AI technology replaces deep learning, I generally expect the same story. There is a scale at which new things can burst onto the scene, but over time that scale becomes smaller and smaller relative to the scale of the field. At this point the appearance of "bursting onto the scene" is primarily driven by big private projects that don't talk publicly about what they are doing for a while (e.g. putting in 20 person-years of effort before a public announcement, so that they get data internally but an outsider just sees a discontinuity), but even that seems to be drying up fairly quickly.

I'm not sure what's the difference between what you're saying here and what I said about QNIs. Is it that you expect being able to see the emergent technology before the singular (crossover) point? Actually, the fact you describe DL as "currently useless" makes me think we should be talking about progress as a function of two variables: time and "maturity", where maturity inhabits, roughly speaking, a scale from "theoretical idea" to "proof of concept" to "beats SOTA in lab conditions" to "commercial product". In this sense, the "lab progress" curve is already past the DL singularity but the "commercial progress" curve maybe isn't.

On this model, if post-DL AI technology X appears tomorrow, it will take it some time to span the distance from "theoretical idea" to "commercial product", in which time we would notice it and update our predictions accordingly. But, two things to note here:

First, it's not clear which level of maturity is the relevant reference point for AI risk. In particular, I don't think you need commercial levels of maturity for AI technology to become risky, for the reasons I discussed in my previous comment (and, we can also add regulatory barriers to that list, although I am not convinced they are as important as Yudkowsky seems to believe).

Second, all this doesn't sound to me like "AI systems will grow relatively continuously and predictably", although maybe I just interpreted this statement differently from its intent. For instance, I agree that it's unlikely technology X will emerge specifically in the next year, so progress over the next year should be fairly predictable. On the other hand, I don't think it would be very surprising if technology X emerges in the next decade.

IIUC, part of what you're saying can be rephrased as: TAI is unlikely to be created by a small team, since once a small team shows something promising, tonnes of resources will be thrown at them (and at other teams that might be able to copy the technology) and they won't be a small team anymore. Which sounds plausible, I suppose, but doesn't make TAI predictable that long in advance.

Now, when a QNI comes along, it doesn't necessarily look like a discontinuity, because there might be a lot of work to bridge the distance between idea and implementation. And, this work involves a lot of small details. Because of this, the first version is probably often only a slight improvement on SOTA. So, I'm guessing that QNIs produce something more like a discontinuity in the derivative than a discontinuity in the SOTA itself.


Don't have a great source for this at hand, but my impression is that seemingly-QNIs surprisingly often just power existing exponential trends, meaning no change in derivative (on a log graph).

(A random comment in support of this — I remember chip design expert Jim Keller saying on Lex Fridman's podcast that Moore's Law is just a bunch of separate s-curves, as they have to come up with new ideas to work through challenges to shrinking transistors, and the new techniques work for a range of scales and then have to be replaced with new new ideas.)

Not sure if this question is easily settled, but it might be a crux for various views — how often do QNIs actually change the slope of the curve?

The problem with this model is, its predictions depend a lot on how you draw the boundary around "field". Take Yudkowsky's example of startups. How do we explain small startups succeed where large companies failed?


I don't quite see how this is a problem for the model. The narrower you draw the boundary, the more jumpy progress will be, right?

Successful startups are big relative to individuals, but not that big relative to the world as a whole. If we're talking about a project / technology / company that can rival the rest of the world in its output, then the relevant scale is trillions of dollars (prob deca-trillions), not billions.

And while the most fantastically successful startups can become billion dollar companies within a few years, nobody has yet made it to a trillion in less than a decade.

EDIT: To clarify, not trying to say that something couldn't grow faster than any previous startup. There could certainly be a 'kink' in the rate of progress, like you describe. I just want to emphasize that:

  1. startups are not that jumpy, on the world scale
  2. the actual scale of the world matters

A simple model for the discontinuousness of a field might have two parameters — one for the intrinsic lumpiness of available discoveries, and one for total effort going into discovery. And,

  • all else equal, more people means smoother progress — if we lived in a trillion person world, AI progress would be more continuous
  • it's an open empirical question whether the actual values for these parameters will result in smooth or jumpy takeoff:
    • even if investment in AI is in the deca-trillions and a meaningful fraction of all world output, it could still be that the actual territory of available discoveries is so lumpy that progress is discontinuous
    • but, remember that reality has a surprising amount of detail, which I think tends to push things in a smoother direction — it means there are more fiddly details to work through, even when you have a unique insight or technological advantage
      • or, in other words, even if you have a random draw from a distribution that ends up being an outlier, actual progress in the real world will be the result of many different draws, which will tend to push things more toward the regime of normals

I don't quite see how this is a problem for the model. The narrower you draw the boundary, the more jumpy progress will be, right?

So, you're saying: if we draw the boundary around a narrow field, we get jumpy/noisy progress. If we the draw the boundary around a broad field, all the narrow subfields average out and the result is less noise. This makes a lot of sense, thank you!

The question is, what metric do we use to average the subfields. For example, on some metrics the Manhattan project might be a rather small jump in military-technology-averaged-over-subfields. But, its particular subfield had a rather outsized impact! In general, I think that "impactfulness" has a heavy-tailed distribution and therefore the "correct" averaging still leaves a fair amount of jumpiness.

And while the most fantastically successful startups can become billion dollar companies within a few years, nobody has yet made it to a trillion in less than a decade.

Yeaaah, but like I said before, I am skeptical of giving so much weight to data from economics. Economics reflects a lot about people and about the world, but there are facts about physics/math it cannot possibly know about, so evidence from such facts cannot be meaningfully overturned with economic data.

Moreover, from certain angles singleton takeoff can look sort of like a "normal" type of economic story. In one case, person has an idea, does a lot of work, gets investments etc etc and after a decade there's a trillion dollars. In the other case, person builds AI, the AI has some ideas, [stuff happens], after a decade nanobots kill everyone. As Daniel Kokotajlo argued, what's actually important is when the point-of-no-return (PONR) is. And, the PONR might be substantially earlier than the analogue-of-trillion-dollars.

all else equal, more people means smoother progress — if we lived in a trillion person world, AI progress would be more continuous

Would it? It's clear that progress in AI would be faster, but why more continuous?

I think that the causation actually goes in the opposite direction. If a field has a lot of small potential improvements with substantial economic value, then a lot of people will work in the field because (i) you don't need extremely intelligent people to make progress and (ii) it pays off. If a field has a small number of large improvements, then only a small number of people are able to contribute to it. So, a lot of people working on AI is evidence about the kind of progress happening today, but not strong evidence about the absence of significant kinks in the future.

On my version of the "continuous view", the Technology X story seems plausible, but it starts with a shitty version of Technology X that doesn't immediately produce billions of dollars of impact (or something similar, e.g. killing all humans), that then improves faster than the existing technology, such that an outside observer looking at both technologies could use trend extrapolation to predict that Technology X would be the one to reach TAI.

(And you can make this prediction at least, say, 3 years in advance of TAI, i.e. Technology X isn't going to be accelerating so fast that you have zero time to react.)

Yes, this is something I discuss in the edit (you probably started typing your reply before I posted it).

Imagine that we lived in a universe in which it was plausible that the LHC creates a black hole or causes false vacuum collapse. It seems to me that such a universe could still have a techno-economic trajectory broadly similar to our own, for the same reasons. So, in that universe, would it make sense to argue "the LHC cannot destroy the world because its cost is an insufficient fraction of world GDP[1]"? It seems to me it would be strange there in a similar way how the economic argument about AI is strange here.

The "continuous view" argument is about takeoff speeds, not about AI risk?

If AI risk arose from narrow systems that couldn't produce a billion dollars of value then I'd expect that risk could arise more discontinuously from a new paradigm. But AI risk arises from systems that are sufficiently intelligent that they could produce billions of dollars of value.

as late as 1000 years ago, the fastest creatures on Earth are not humans, because you need even more G than that to go faster than cheetahs

...as a matter of fact there is no one investing in making better cheetahs

A little puzzled by Paul's history here. Humans invested exorbitant amounts of money and effort into making better cheetahs, in the sense of 'trying to be able to run much faster and become the fastest creatures on earth'; we call those manufactured cheetahs, "horses". For literally thousands of years, breeding horses has been a central preoccupation of many civilizations and their elites (and horses themselves were merely one of many animals tried by speed-craving royalty & herders - I was intrigued to learn that capturing wild asses to hybridize into "kunga" was recently confirmed by fossil DNA), which is unsurprising because countries could rise and fall based on their mastery of 'gotta go fast' via cart or chariot warfare or horseback archery. Obtaining fast reliable horses could drive all sorts of things like regional trade patterns (eg. the Tibetan tea-horse trade). Even civilizations renowned for their infantry like Greece/Rome still invested huge fortunes into cavalry wings. Where they weren't militarized, they might be the obsession of the post-military aristocracy - the ruin of the British aristocrat was drunken cards, loose women, and fast horses.

Horses, you may notice, still do not sprint faster than cheetahs; this is in part because our methods sucked (in considerable part due to deep misunderstandings of genetics, "humans are great at breeding animals" is not true, and the inauguration of thoroughbred racing led to rapid gains for a while despite all the millennia of pseudo-breeding before), and in part because for physics reasons horses probably can't surpass cheetahs after all no matter how much you spent (and spent, and spent) and so it worked a lot better to apply our collective brains to the invention of cars and planes etc, and now humans really are the fastest creatures on Earth. There's a lot you could say about that. But not "no one was investing in better cheetahs".

Humans invested exorbitant amounts of money and effort into making better cheetahs, in the sense of 'trying to be able to run much faster and become the fastest creatures on earth'; we call those manufactured cheetahs, "horses".

I don't think Paul is talking about that. Consider the previous lines (which seem like they could describe animal breeding to me):

and you think that G doesn't help you improve on muscles and tendons?

until you have a big pile of it?

and Eliezer's response in the following lines:

the natural selection of cheetahs is investing in it

it's not doing so by copying humans because of fundamental limitations

however if we replace it with an average human investor, it still doesn't copy humans, why would it

As I understand the conversation, Eliezer is trying to draw a connection between two different situations:

  1. The natural selection of species on Earth (with various species represented as 'firms')
  2. The market of AGI development on Earth

An important element of this comparison is that from the position of ignorance before G turns out to be hugely valuable in situation 1, the equivalent of 'investors' are doing backward-looking reasoning, where MUSCLES and TENDONS are the core features that are relevant for speed. In situation 2, the 'average human investors' will be playing the same game, throwing money at projects already proven to work instead of projects which haven't worked yet but will.

In that particular line, Paul is (while he's trying to figure out what comparison Eliezer is even trying to make) noting that there aren't 'investors' in situation 1. The thing where humans use G to get better MUSCLES and TENDONS (the breeding you're talking about) is part of the "fingersnap end of the world," not the long story of evolution of cheetahs.

[I see Paul as generally arguing that the road to AGI will be 'obvious in advance', in some important sense, and will get something like a normal economic return for that sort of research investment. I see Eliezer as trying to argue that "no, AGI can come out of left field, because the mechanism that causes it to be possible will be some weird insight that is not well-priced before discovered." Like, fundamentally the question is something like "how efficient and accurate is the AI research market?"]

I agree with your framing, and I think it shows Paul is wrong, leaving aside the specifics of the cheetah thing. Looking back, humans pursued both paths, the path of selecting cheetahs (horses) and of using G to look for completely different paradigms that blow away cheetahs. (Since we aren't evolution, we aren't restricted to picking just one approach.) And we can see the results today: when was the last time you rode a horse?

If you had invested in 'the horse economy' a century ago and bought the stock of bluechip buggywhip manufacturers instead of aerospace & automobile, I don't think you would have done well, because it turns out that horses, even though we have extensively bred them to literally be almost as fast as cheetahs (increasing their speeds by easily a quarter just since the mid-1800s), and pushed them far behind their skittish wild pony ancestors (exemplified by Przewalski's horse), were still blown away by rapid technological developments in other transport methods. If you made projections on the future* of warfare or the economy based on the assumption of smooth obvious progress of ever more refined horse-breeding with plausible asymptoting of their capabilities given fundamental biological limits, well - your forecasts were utterly wrong.

* Trivia: in his Book of the New Sun, Gene Wolfe imagines a world where cavalry developed did continue, in part due to technological regression; in Castle of Days, he explains that part of his setting and he argues that far-future genetically-engineered horses, or "destriers", could get up from quarter horse speeds to closer to 100mph, in which case cavalry becomes a viable alternative to mechanized warfare as they are too fast for regular machine gunnery to stop. (He further concludes: "Thus we have a return to full-fledged cavalry warfare, with dragoons, scouts, and charges. If it sounds fantastic, it should not; it is merely a revival of military techniques that have been in abeyance for scarcely a hundred years. Wait until genetic engineering really gets going and someone questions the need for separable mounts and riders. Fighting centaurs! (Sometimes it almost seems as if the Greeks …)")

I absolutely agree that there are usually multiple ways to do something, often one of them improves faster than current SOTA, and that the faster one often overtakes the slower improving one. I may be misunderstanding what you are taking away from the horses analogy. I don't think this undermines my point (or at least I don't yet see the connection).

I absolutely agree that there are usually multiple ways to do something, often one of them improves faster than current SOTA, and that the faster one often overtakes the slower improving one. I may be misunderstanding what you are taking away from the horses analogy.

The takeaway seems... really obvious to me? In fact, it seems to me that the bolded first sentence of your quote basically is the takeaway: that there are multiple ways of doing things, some of which are faster than others.

This really does seem to me to be all you need to argue for a "discontinuity": just have your timeline play out such that faster way of doing things is discovered after the slower way, and then boom, you have a discontinuity in the rate of progress, located at precisely the point where people switch from the slower way to the faster way. The horses analogy establishes this idea perfectly well in my view, but it seems almost... unnecessary? Like, this is a really obvious point?

And so, like, the question from my perspective is, why would this not be relevant to the idea of slow versus fast takeoff? It continues to remain perfectly plausible to me that a "faster way" of improving "G" exists, and that once this "faster way" is discovered, it's so much faster that it basically obsoletes existing methods in roughly the same way that aerospace & automobile technology obsoleted horses. You say that your model doesn't forbid this, but from my perspective, that... really sounds like you're just conceding the whole argument right there.

Of course, presumably you don't think you're conceding the argument. So what's the remaining disagreement? Is it really just a question of what relative probabilities to assign to such scenarios versus alternative scenarios, i.e. what counts as "plausible"? I'll be honest and say that, unless your probabilities on fast-takeoff-style scenarios are really low, this seems like a pretty pointless line of disagreement to take; conversely, if your probabilities are that low, that seems to me like it'd require positive knowledge of what future AI development will look like, in a low-level, detailed way that I'm pretty sure is not justified by looking at GDP curves or the like. (Also, this comment from you seems to make it pretty clear that your probabilities are not that low.)

I remain confused, even after reading pretty much everything you've publicly written on this topic.

Note that I say this multiple times in the dialog and I agree it's obvious. It also happens all the time in continuous trajectories, so if you think it should lead to discontinuities with high probability it seems like you have a lot of retrodicting-reality to do. 

There are ways of doing things that improve faster, but usually they start off worse. Then they get better, and at some point they overtake the slower-improving alternatives, after which subsequent progress is faster at least for a while.

Sometimes there are exceptions where once something is possible it is necessarily much better than the predecessors (e.g. if there is a fixed cost equal to a significant percentage of GDP, and you can't trade off fixed costs vs marginal costs). But this doesn't happen very much, which isn't so surprising on paper.

I don't think any of this leads to a fast takeoff in theory. Also the view ">30% of progress is stuff from left field that jumps over the competition" doesn't seem at all plausible to me.

in roughly the same way that aerospace & automobile technology obsoleted horses.

I'm totally happy with some better form of future AI being like automobiles, and indeed I think it's extremely likely that AI broadly will replace humans in an automobile-like way. It just seems to me like automobiles obsoleted horses slowly, with lots of crappy automobiles well before usable automobiles. (I don't know much about this case in particular so open to correction, but it seems like common sense / folklore; e.g. see wikipedia.)

There are ways of doing things that improve faster, but usually they start off worse. Then they get better, and at some point they overtake the slower-improving alternatives, after which subsequent progress is faster at least for a while.

I don't see how the faster-improving technology starting off worse doesn't simply strengthen the case for fast takeoff. While it's worse, fewer resources will be invested into it relative to the current best thing, which leaves more room for rapid improvement once a paradigm shift finally occurs and people start switching resources over.

This seems to basically be what happened with automobiles v horses; yes if you specifically look at a Wikipedia article titled "history of the automobile" you will find crappy historical precedents to the (modern) automobile, but the point is precisely that those crappy precedents received comparatively little attention, and therefore did not obsolete horses, until suddenly they became not-so-crappy and... well, did.

I'm not exactly sure what the line of reasoning is here that leads you to look at the existence of crappy historical precedents and conclude, "oh, I guess the fact that these existed means progress wasn't so discontinuous after all!" instead of, "hm, these crappy historical precedents existed and did basically nothing for a long time, which certainly implies a discontinuity at the point where they suddenly took over"; but from the perspective of someone invested in (what gwern described as) the "horse economy", the latter would probably be a much more relevant takeaway than the former.

Sometimes there are exceptions where once something is possible it is necessarily much better than the predecessors (e.g. if there is a fixed cost equal to a significant percentage of GDP, and you can't trade off fixed costs vs marginal costs). But this doesn't happen very much, which isn't so surprising on paper.

I don't think any of this leads to a fast takeoff in theory. Also the view ">30% of progress is stuff from left field that jumps over the competition" doesn't seem at all plausible to me.

I don't disagree with any of what you say here; it just doesn't seem very relevant to me. As you say, something new coming from left field and jumping over the competition is a rare occurrence; certainly not anywhere near 30%. The problem is that the impact distribution of new technologies is heavy-tailed, meaning that you don't need a >30% proportion of new technologies that do the whole "obsoleting" thing, to get a hugely outsized impact from the few that do. Like, it seems to me that the quoted argument could have been made almost word-for-word by someone invested in the "horse economy" in the late 1800s, and it would have nonetheless done nothing to prevent them from being blindsided by the automobile economy.

Which brings me back to the point about needing positive knowledge about the new technology in question, if you want to have any hope of not being blindsided. Without positive knowledge, you're reduced to guessing based on base rates, which again puts you in the position of the horse investor. Fundamentally I don't see an escape to this: you can't draw conclusions about the "physics" of new technologies by looking at GDP curves on graphs; those curves don't (and can't) reflect phenomena that haven't been discovered yet.

How about discontinuities like inventing algorithms? I think often performance on a task gets jumps of O(n^k) to O(n^k') with k' < k, or even O(k^n) to O(k'^n). I'd guess you'd say that these sorts of jumps would smooth out by being aggregrated. But I guess I don't see why you think that the level at which jumps from algorithmic invention happen, is enough "lower" than the level at which meaningful progress towards TAI happens, for this smoothing out to happen.


(Or maybe, do you think jumps like this don't happen (because in practice there's intermediate bad versions of new algorithmic ideas), or don't represent much discontinuity (because they're invented when the task in question is in a regime where the performance is still comparable, or something), or aren't similar to inventions of cognitive algorithms (e.g. because cognitive stuff is more like accumulating content or something)?)

What kind of example do you have in mind? Even for algorithmic problems with relatively small communities (and low $ invested) I think it's still pretty rare to have big jumps like this (e..g measuring performance for values of n at the scale where it can be run on conventional communities). I'm thinking of domains like combinatorial optimization and constraint satisfaction, convex optimization, graph algorithms. In most cases I think you get to a pretty OK algorithm quite quickly and further progress is slow. Exceptions tend to be cases like "there was an algorithm that works well in practice but not the worst case" or "the old algorithm got an approximation ratio of 0.87 but the new one gets an approximation ratio of 0.92 and if you absolutely require 0.9 the new one is very much faster" or extremely niche problems. But my knowledge isn't that deep.

(And maybe what I'd say quantitatively is that something like ~10% of the log-space progress on problems people care about comes from big jumps vs 90% from relatively smooth improvements, with the number getting lower for stricter notions of "people care about" and the step size for "relatively smooth improvement" being defined by how many people are working on it.)

(I'm sure you know more than I do about algorithms.)

What kind of example do you have in mind?
~10% of the log-space progress on problems people care about comes from big jumps vs 90% from relatively smooth improvements,

I'm thinking of the difference between insertion sort / bubble sort vs radix sort / merge sort.

(Knuth gives an interesting history here (Art of Programming Vol 3, section 5.5, p 383); apparently in 1890 the US census data was processed using the radix sorting algorithm running on a mechanical-electronic-human hybrid. There was an order-preserving card-stack merging machine in 1938. Then in 1945, von Neumann wrote down a merge sort, while independently Zuse wrote down an insertion sort.)

I guess we're talking past each other because we're answering different versions of "What is continuous in what?". Performance on a task can be, and is, much more continuous in time than "ideas" are continuous in time, because translating ideas into performance on a task takes resources (money, work, more ideas). So I concede that what I said here:

I think often performance on a task gets jumps

was mostly incorrect, if we don't count the part where

you get to a pretty OK algorithm quite quickly

So one question is, is TAI driven by ideas that will have a stage where they get to a pretty okay version quite quickly once the "idea" is there, or no, or what? Another question is, do you think "ideas" are discontinuous?

Like, fundamentally the question is something like "how efficient and accurate is the AI research market?"

I would distinguish two factors:

  • How powerful and well-directed is the field's optimization?
  • How much does the technology inherently lend itself to information asymmetries?

You could turn the "powerful and well-directed" dial up to the maximum allowed by physics, and still not thereby guarantee that information asymmetries are rare, because the way that a society applies maximum optimization pressure to reaching AGI ASAP might route through a lot of individuals and groups going down different rabbit holes. A researcher could be rationally optimistic about her rabbit hole based on specialized knowledge or experience that's hard to instantly transmit to investors, the field as a whole, etc.

I don't think this is relevant to the disanalogy I was trying to make, which was between natural selection and investors. It seems like I'm thinking about the comparison in a different way here. Hopefully this explains your puzzlement.

That said, since I can't resist responding to random comments: are horses really being bred for sprinting as fast as they can for 20-30 seconds? (Isn't that what cheetahs are so good at?) What is the military/agricultural/trade context in which that is relevant? Who cares other than horse racers? Over any of the distances where people are using horses I would expect them to be considerably faster than cheetahs even if both are unburdened. I don't know much about horses though.

That said, since I can't resist responding to random comments: are horses really being bred for sprinting as fast as they can for 20-30 seconds?

Yes, they were, and they still are. Cavalry charges are not that long*, and even if you want to absurdly nitpick on this exact basis where 20-30 seconds counts but 30-40s doesn't, well, as it happens, 20-30s is exactly about how long quarter horse races last. (Quarter horses, incidentally, now reach the low end of cheetah top speeds: 55mph, vs ~60mph. So depending on which pair of horses & cheetahs you compare, we did succeed in breeding a better cheetah, because you can't ride a cheetah at any speed and they have much worse endurance etc. I'm not going to insist on this point, however, because I think it's unnecessary, even if it should make you a little more humble in your assertions about what humans have and have not done.)

Who cares other than horse racers?

First, what's wrong with horse racers? It's the sport of kings. (The causality in that phrase going both directions.) Horse racers are real, they do in fact exist, I have met them in the flesh and will testify that they are not "no one". You disdained the existence of any organized large human investments into making extremely fast animals. The millions of quarter horse owners dating back to the 1600s, as well as all the horse breeders and horse racers throughout human history, beg to differ, as they are just as valid an example as any other human would be of such things existing - quite aside from reasons everyone else might care like minor things like "civilizations rose and fell on how well they did this".

at is the military/agricultural/trade context in which that is relevant?

I dunno. Since you didn't specify, apparently the relevance of the context wasn't relevant to your point.

* They don't need to be. Think about how far accurate arrow fire goes, and how long it takes a horse to cover that distance if it's sprinting at 30MPH+.

I was saying that natural selection is not a human investor and behaves differently, responding to Eliezer saying "not as a metaphor but as simple historical fact, that's how it played out." I'm sorry if the exchange was unclear (but hopefully not surprising since it was a line of chat in a fast dialog written in about 3 seconds.) I think that you have to make an analogy because the situation is not obviously structurally identical and there are different analogies you could draw here and it was not clear which one he was making. 

I'm sorry I engaged about horse breeding (I think it was mostly a distraction).

That said, since I can't resist responding to random comments: are horses really being bred for sprinting as fast as they can for 20-30 seconds? (Isn't that what cheetahs are so good at?) What is the military/agricultural/trade context in which that is relevant? Who cares other than horse racers? Over any of the distances where people are using horses I would expect them to be considerably faster than cheetahs even if both are unburdened. I don't know much about horses though.

My understanding is that the primary military use of horses in Europe for elites was charges into massed infantry, which were not all that much longer than 30 seconds (rarely more than a few minutes). I would expect them to care more about things like carrying capacity and psychology than sprinting speed (as you want to stay in formation, and be able to break thru even if they don't break formation). Other places focused on horse archers, which involved the horses moving rapidly for much longer periods of time, or skirmishers, where you cared more about the cheetah-like ability to run down someone trying to get away from you.

But the thing that's more robust is that the sub-taking-over-world AI is already really important, and receiving huge amounts of investment, as something that automates the R&D process. And it seems like the best guess given what we know now is that this process starts years before the singularity.

Reading this, I thought about asymptotes and trajectories and universality and stuff. I wonder if there's a disagreement here about unbounded vs. bounded trajectories of these AIs that automate R&D (like, Eliezer thinks they'll have asymptotes that top out at sub-human intelligence, until they're not and have asymptotes that are superintelligent, whereas Paul thinks they'll have trajectories that are just continuously steeper and steeper but always unbounded after some point well before superintelligence).

no, I think a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal straightforwardly

According to https://en.wikipedia.org/wiki/File:Brain-body_mass_ratio_for_some_animals_diagram.svg the hippopotamus would be the animal that "gained" the most from having a human brain, assuming that brain-body mass ratio indicates intelligence.

I could see the Ronaldo-brained hippopotamus winning by planning, cooperating and learning more from experience, but I'm not seeing a big advantage in a straight-up fight.

I think human neural cells are 'just better'; there's some evidence in mice to this effect:

We found that the human glial chimeras indeed performed better than control mice across a variety of learning tasks, that included auditory fear conditioning, novel object and place recognition, and Barnes maze navigation. In all of these tests - but not in any test of social interactivity or primary perception - the human glial chimeras performed better and acquired new causal associations more quickly than did murine-allografted or untransplanted controls (Han et al., 2013). In short, they learned more rapidly, and at least along the axes of the tests performed, could be defined - if colloquially so - as smarter. As such, these glial chimeras may provide us a viable - if provocative - model by which to evaluate the species-specific contributions of human glia to human cognition.

Absolutely fascinating link - strong upvote! Han et al.,2013 did not investigate motor control and motion planning, but I agree that human neural cells probably are just better, though possibly requiring more energy.

From martial arts, I'm convinced people have different innate levels of motor control and motion planning, and this helps nontrivially in fights. However, brains and muscles both require energy, and I'd generally give the advantage to the person with +1std muscles over the person with +1std motor control, assuming both are untrained.