# All of paulfchristiano's Comments + Replies

Yudkowsky and Christiano discuss "Takeoff Speeds"

Going through previous ten IMOs, and imagining a very impressive automated theorem prover, I think

• 2020 - unlikely, need 5/6 and probably can't get problems 3 or 6. Also good chance to mess up at 4 or 5
• 2019 - tough but possible, 3 seems hard but even that is not unimaginable, 5 might be hard but might be straightforward, and it can afford to get one wrong
• 2018 - tough but possible, 3 is easier for machine than human but probably still hard, 5 may be hard, can afford to miss one
• 2017 - tough but possible, 3 looks out of reach, 6 looks hard but not sure a
gwern (2d · 14 · Ω7): What do you think of DeepMind's new whoop-de-doo about doing research-level math assisted by GNNs?

Yudkowsky and Christiano discuss "Takeoff Speeds"

I looked at a few recent IMOs to get better calibrated. I think the main update is that I significantly underestimated how many years you can get a gold with only 4/6 problems.

For example I don't have the same "this is impossible" reaction about IMO 2012 or IMO 2015 as about most years. That said, I feel like they do have to get reasonably lucky with both IMO content and someone has to make a serious and mostly-successful effort, but I'm at least a bit scared by that. There's also quite often a geo problem as 3 or 6.

Might be good to make some side be... (read more)

Going through previous ten IMOs, and imagining a very impressive automated theorem prover, I think

• 2020 - unlikely, need 5/6 and probably can't get problems 3 or 6. Also good chance to mess up at 4 or 5
• 2019 - tough but possible, 3 seems hard but even that is not unimaginable, 5 might be hard but might be straightforward, and it can afford to get one wrong
• 2018 - tough but possible, 3 is easier for machine than human but probably still hard, 5 may be hard, can afford to miss one
• 2017 - tough but possible, 3 looks out of reach, 6 looks hard but not sure a
Yudkowsky and Christiano discuss "Takeoff Speeds"

I think Metaculus is closer to Eliezer here: conditioned on this problem being resolved it seems unlikely for the AI to be either open-sourced or easily reproducible.

Matthew Barnett (3d · 4): My honest guess is that most predictors didn't see that condition and the distribution would shift right if someone pointed that out in the comments.
Christiano, Cotra, and Yudkowsky on AI progress

I generally expect smoother progress, but predictions about lulls are probably dominated by Eliezer's shorter timelines. Also lulls are generally easier than spurts, e.g. I think that if you just slow investment growth you get a lull and that's not too unlikely (whereas part of why it's hard to get a spurt is that investment rises to levels where you can't rapidly grow it further).

Yudkowsky and Christiano discuss "Takeoff Speeds"

I don't care about whether the AI is open-sourced (I don't expect anyone to publish the weights even if they describe their method) and I'm not that worried about our ability to arbitrate overfitting.

Ajeya suggested that I clarify: I'm significantly more impressed by an AI getting a gold medal than getting a bronze, and my 4% probability is for getting a gold in particular (as described in the IMO grand challenge). There are some categories of problems that can be solved using easy automation (I'd guess about 5-10% could be done with no deep learning and m... (read more)

I looked at a few recent IMOs to get better calibrated. I think the main update is that I significantly underestimated how many years you can get a gold with only 4/6 problems.

For example I don't have the same "this is impossible" reaction about IMO 2012 or IMO 2015 as about most years. That said, I feel like they do have to get reasonably lucky with both IMO content and someone has to make a serious and mostly-successful effort, but I'm at least a bit scared by that. There's also quite often a geo problem as 3 or 6.

Might be good to make some side be... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think I'll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I'll get much more confident if we scale up from $1M -> $1B and pick the low-hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict.

There's not really a constant time horizon for my pessimism, it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon, because theorem-proving has not had much investment so compute can be scaled up several orde... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

I think there are two easy ways to get hyperbolic growth:

• As long as there is free energy in the environment, without any technological change you can grow like e^t. Then if there is any technological progress that can be driven by your expanding physical civilization, then you get hyperbolic growth like 1/(T − t)^α, where α depends on how fast the returns to technology diminish.
• Even without physical growth, if you have sufficiently good returns to technology (as we observe for historical technologies, if you treat doubling food as doubling output, or
Christiano, Cotra, and Yudkowsky on AI progress

1/(T − t). It's the solution to the differential equation f'(x) = f(x)^2 instead of f'(x) = f(x). I usually use it more broadly for 1/(T − t)^α, which is the solution to f'(x) = f(x)^(1+α).

rohinmshah (5d · 3): Nitpick: Isn't 1/x^α the solution [https://www.wolframalpha.com/input/?i=f%27%28x%29+%3D+f%28x%29%5E%281%2B1%2Fa%29] for f'(x) = f(x)^(1+1/α), modulo constants? Or equivalently, 1/x^(1/α) is the solution to f'(x) = f(x)^(1+α).
TekhneMakre (8d · 6): Why do you use this form? Do you lean more on:

1. Historical trends that look hyperbolic;
2. Specific dynamical models, like: let α be the synergy between "different innovations" as they're producing more innovations; this gives f'(x) = f(x)^(1+α)*; or another such model?;
3. Something else?

I wonder if there's a Paul-Eliezer crux here about plausible functional forms. For example, if Eliezer thinks that there's very likely also a tech tree of innovations that change the synergy factor α, we get something like e.g. (a lower bound of) f'(x) = f(x)^f(x). IDK if there's any help from specific forms; just that it's plausible that there are forms that are (1) pretty simple, pretty straightforward lower bounds from simple (not necessarily high confidence) considerations of the dynamics of intelligence, and (2) look pretty similar to hyperbolic growth, until they don't, and the transition happens quickly. Though maybe, if Eliezer thinks any of this and also thinks that these superhyperbolic synergy dynamics are already going on, and we instead use a stochastic differential equation, there should be something more to say about variance or something pre-End-times.

*ETA: for example, if every innovation combines with every other existing innovation to give one unit of progress per time, we get the hyperbolic f'(x) = f(x)^2; if innovations each give one progress per time but don't combine, we get the exponential f'(x) = f(x).
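The contrast between the exponential f'(x) = f(x) and the hyperbolic f'(x) = f(x)^2 discussed above can be checked numerically. This is an illustrative sketch (the function names, step counts, and endpoints are my own, not from the discussion): the hyperbolic case tracks 1/(1 − x) and diverges at finite x, while the exponential stays bounded on any finite interval.

```python
import math

def euler(deriv, f0, x_end, steps):
    """Forward-Euler integration of f'(x) = deriv(f) from x = 0 to x_end."""
    f, dx = f0, x_end / steps
    for _ in range(steps):
        f += deriv(f) * dx
    return f

# Exponential: f'(x) = f(x)  ->  f(x) = e^x, finite everywhere.
exp_val = euler(lambda f: f, 1.0, 0.9, 200_000)

# Hyperbolic: f'(x) = f(x)^2  ->  f(x) = 1/(1 - x), blows up at x = 1.
hyp_val = euler(lambda f: f * f, 1.0, 0.9, 200_000)

# At x = 0.9 the exponential is ~e^0.9 ≈ 2.46 while the hyperbolic is ~10,
# and the gap diverges as x -> 1.
```

Pushing x_end closer to 1 makes the hyperbolic value explode while the exponential barely moves, which is the qualitative difference the thread is about.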
Christiano, Cotra, and Yudkowsky on AI progress

My claim is that the timescale of AI self-improvement, at the point it takes over from humans, is the same as the previous timescale of human-driven AI improvement. If it was a lot faster, you would have seen a takeover earlier instead.

This claim is true in your model. It also seems true to me about hominids, that is I think that cultural evolution took over roughly when its timescale was comparable to the timescale for biological improvements, though Eliezer disagrees

I thought Eliezer's comment "there is a sufficiently high level where things go who... (read more)

JBlack (8d · 1): My main concern is that progress on the frontier tends to be bursty. There are many metrics of AI performance on particular tasks where performance does indeed increase fairly continuously on the larger scale, but not in detail. Over the scale of many years it goes from abysmal to terrible to merely bad to nearly human to worse than human in some ways but better than human in others, and then to superhuman. Each of these transitions is often a sharp jump, but you see steady progress if you plot it on a graph. When you combine this with having thousands of types of tasks, you end up with an overview of even smoother progress over the whole field.

There are three problems I'm worried about. The first is that "designing better AIs" may turn out to be a relatively narrow task, and subject to a lot more burstiness than broad-spectrum performance that could steadily increase world GDP. The second is that for purposes of the future of humanity, only the last step from human-adjacent to strictly superhuman really matters. On the scale of intelligence for all the beings we know about, chimpanzees are very nearly human, but the economic effect of chimpanzees is essentially zero. The third is that we are nowhere near fully exploiting the hardware we have for AI, and I expect that to continue for quite a while.

I think any two of these three are enough for a fast takeoff with little warning.

It seems to me that Eliezer's model of AGI is bit like an engine, where if any important part is missing, the entire engine doesn't move. You can move a broken steam locomotive as fast as you can push it, maybe 1km/h. The moment you insert the missing part, the steam locomotive accelerates up to 100km/h. Paul is asking "when does the locomotive move at 20km/h" and Eliezer says "when the locomotive is already at full steam and accelerating to 100km/h." There's no point where the locomotive is moving at 20km/h and not accelerating, because humans can't push ... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

He says things like AlphaGo or GPT-3 were really surprising to gradualists, suggesting he thinks that gradualism only works in hindsight.

I agree that after shaking out the other disagreements, we could just end up with Eliezer saying "yeah but automating AI R&D is just fundamentally unlike all the other tasks to which we've applied AI" (or "AI improving AI will be fundamentally unlike automating humans improving AI") but I don't think that's the core of his position right now.

Christiano, Cotra, and Yudkowsky on AI progress

I think "on track to foom" is a very long way before "actually fooms."

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think IMO gold medal could be well before massive economic impact, I'm just surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%, it's hard reasoning about the tails.

I'd say <4% on end of 2025.

I think this is the flipside of me having an intuition where I say things like "AlphaGo and GPT-3 aren't that surprising"---I have a sense for what things are and aren't surprising, and not many things happen that are... (read more)

Maybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend?  Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general?  Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime?

Added:  This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics wou... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

Oops, this was in reference to the later part of the discussion where you disagreed with "a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal [without using tools]".

Christiano, Cotra, and Yudkowsky on AI progress

It seems to me like Eliezer rejects a lot of important heuristics like "things change slowly" and "most innovations aren't big deals" and so on. One reason he may do that is because he literally doesn't know how to operate those heuristics, and so when he applies them retroactively they seem obviously stupid. But if we actually walked through predictions in advance, I think he'd see that actual gradualists are much better predictors than he imagines.

Adele Lopez (9d · 4): That seems a bit uncharitable to me. I doubt he rejects those heuristics wholesale. I'd guess that he thinks that e.g. recursive self-improvement is one of those things where these heuristics don't apply, and that this is foreseeable because of e.g. the nature of recursion. I'd love to hear more about what sort of knowledge about "operating these heuristics" you think he's missing!

Anyway, it seems like he expects things to seem more-or-less gradual up until FOOM, so I think my original point still applies: I think his model would not be "shaken out" of his fast-takeoff view due to successful future predictions (until it's too late).
Christiano, Cotra, and Yudkowsky on AI progress

(I'm interested in which of my claims seem to dismiss or not adequately account for the possibility that continuous systems have phase changes.)

This section seemed like an instance of you and Eliezer talking past each other in a way that wasn't locating a mathematical model containing the features you both believed were important (e.g. things could go "whoosh" while still being continuous):

[Christiano][13:46]

Even if we just assume that your AI needs to go off in the corner and not interact with humans, there’s still a question of why the self-contained AI civilization is making ~0 progress and then all of a sudden very rapid progress

[Yudkowsky][13:46]

unfortunately a lot of what you are saying, fro... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

(ETA: this wasn't actually in this log but in a future part of the discussion.)

I found the elephants part of this discussion surprising. It looks to me like human brains are better than elephant brains at most things, and it's interesting to me that Eliezer thought otherwise. This is one of the main places where I couldn't predict what he would say.

Eliezer Yudkowsky (9d · 8): I also think human brains are better than elephant brains at most things - what did I say that sounded otherwise?
Christiano, Cotra, and Yudkowsky on AI progress

I agree we seem to have some kind of deeper disagreement here.

I think stack more layers + known training strategies (nothing clever) + simple strategies for using test-time compute (nothing clever, nothing that doesn't use the ML as a black box) can get continuous improvements in tasks like reasoning (e.g. theorem-proving), meta-learning (e.g. learning to learn new motor skills), automating R&D (including automating executing ML experiments, or proposing new ML experiments), or basically whatever.

I think these won't get to human level in the next 5 yea... (read more)

Søren Elverlin (9d · 5): How long do you see between "1 AI clearly on track to Foom" and "First AI to actually Foom"? My weak guess is Eliezer would say "Probably quite little time", but your model of the world requires the GWP to double over a 4-year period, and I'm guessing that period probably starts later than 2026. I would be surprised if by 2027 I could point to an AI that for a full year had been on track to Foom, without Foom happening.
Christiano, Cotra, and Yudkowsky on AI progress

I'm mostly not looking for virtue points, I'm looking for: (i) if your view is right then I get some kind of indication of that so that I can take it more seriously, (ii) if your view is wrong then you get some feedback to help snap you out of it.

I don't think it's surprising if a GPT-3 sized model can do relatively good translation. If talking about this prediction, and if you aren't happy just predicting numbers for overall value added from machine translation, I'd kind of like to get some concrete examples of mediocre translations or concrete problems with existing NMT that you are predicting can be improved.

Adele Lopez (9d · 8): It seems like Eliezer is mostly just more uncertain about the near future than you are, so it doesn't seem like you'll be able to find (ii) by looking at predictions for the near future.
Yudkowsky and Christiano discuss "Takeoff Speeds"

Yes, IMO challenge falling in 2024 is surprising to me at something like the 1% level or maybe even more extreme (though could also go down if I thought about it a lot or if commenters brought up relevant considerations, e.g. I'd look at IMO problems and gold medal cutoffs and think about what tasks ought to be easy or hard; I'm also happy to make more concrete per-question predictions). I do think that there could be huge amounts of progress from picking the low hanging fruit and scaling up spending by a few orders of magnitude, but I still don't expect i... (read more)

Okay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025.  The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here.

So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse?  Pretty bare port... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

If you give me 1 or 10 examples of surface capabilities I'm happy to opine. If you want me to name industries or benchmarks, I'm happy to opine on rates of progress. I don't like the game where you say "Hey, say some stuff. I'm not going to predict anything and I probably won't engage quantitatively with it since I don't think much about benchmarks or economic impacts or anything else that we can even talk about precisely in hindsight for GPT-3."

I don't even know which of Gwern's questions you think are interesting/meaningful. "Good meta-learning"--I don't... (read more)

Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it's over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in antecedent and so stuff just started happening one day.  The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, "Oh, no, it definitely couldn't start in 2022" a... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

I don’t really feel like anything you are saying undermines my position here, or defends the part of Eliezer’s picture I’m objecting to.

(ETA: but I agree with you that it's the right kind of model to be talking about and is good to bring up explicitly in discussion. I think my failure to do so is mostly a failure of communication.)

I usually think about models that show the same kind of phase transition you discuss, though usually significantly more sophisticated models and moving from exponential to hyperbolic growth (you only get an exponential in your mo... (read more)

Conor Sullivan (8d · 3): Excuse my ignorance, but what does a hyperbolic function look like? If an exponential is f(x) = r^x, what is f(x) for a hyperbolic function?
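For reference, the standard answer to the question above (stated from the usual definitions rather than quoted from the thread): the hyperbolic analogue of the exponential trades unbounded-time growth for a finite-time singularity at some time T.

```latex
% Exponential: solves f'(x) = k\,f(x); finite for every x
f(x) = C e^{k x}

% Hyperbolic: solves f'(x) = k\,f(x)^2; diverges as x \to T
f(x) = \frac{1}{k\,(T - x)}
```

Differentiating the second form gives f'(x) = 1/(k (T − x)^2) = k f(x)^2, confirming it satisfies the hyperbolic equation.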
Yudkowsky and Christiano discuss "Takeoff Speeds"

I'm going to make predictions by drawing straight-ish lines through metrics like the ones in the gpt-f paper. Big unknowns are then (i) how many orders of magnitude of "low-hanging fruit" are there before theorem-proving even catches up to the rest of NLP? (ii) how hard their benchmarks are compared to other tasks we care about. On (i) my guess is maybe 2? On (ii) my guess is "they are pretty easy" / "humans are pretty bad at these tasks," but it's somewhat harder to quantify. If you think your methodology is different from that then we will probably end u... (read more)

I have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read.  I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO chal... (read more)
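The "drawing straight-ish lines through metrics" methodology mentioned above amounts to fitting a line to the metric in log space and reading off future values. A minimal sketch, with entirely made-up benchmark numbers (the years, scores, and function names are hypothetical, for illustration only):

```python
import math

# Hypothetical benchmark scores by year -- made-up numbers chosen to be
# roughly exponential, purely to illustrate the extrapolation method.
years  = [2018, 2019, 2020, 2021]
scores = [4.0, 6.1, 9.2, 13.8]

# Ordinary least-squares fit of log(score) against year.
n = len(years)
xs = [y - years[0] for y in years]
ys = [math.log(s) for s in scores]
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
intercept = ybar - slope * xbar

def extrapolate(year):
    """Read the fitted straight line (in log space) off at a future year."""
    return math.exp(intercept + slope * (year - years[0]))
```

The open questions in the comment above (how much low-hanging fruit, how hard the benchmarks are) then become questions about whether the fitted slope will hold out of sample, not about the fitting procedure itself.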

Yudkowsky and Christiano discuss "Takeoff Speeds"

This seems totally bogus to me.

It feels to me like you mostly don't have views about the actual impact of AI as measured by jobs that it does or the $s people pay for them, or performance on any benchmarks that we are currently measuring, while I'm saying I'm totally happy to use gradualist metrics to predict any of those things. If you want to say "what does it mean to be a gradualist" I can just give you predictions on them. To you this seems reasonable, because e.g. $s and benchmarks are not the right way to measure the kinds of impacts we care abou... (read more)

I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me).

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."  We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.

I think there's a very real sense in which, yes... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

Yes, I think that value added by automated translation will follow a similar pattern. Number of words translated is more sensitive to how you count and random nonsense, as is number of "users" which has even more definitional issues.

You can state a prediction about self-driving cars in any way you want. The obvious thing is to talk about programs similar to the existing self-driving taxi pilots (e.g. Waymo One) and ask when they do $X of revenue per year, or when $X of self-driving trucking is done per year. (I don't know what AI freedom-of-the-road means; do you mean something significantly more ambitious than self-driving trucks or taxis?)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think natural selection has lots of similarities to R&D, but (i) there are lots of ways of drawing the analogy, (ii) some important features of R&D are missing in evolution, including some really important ones for fast takeoff arguments (like the existence of actors who think ahead).

If someone wants to spell out why they think evolution of hominids means takeoff is fast, then I'm usually happy to explain why I disagree with their particular analogy. I think this happens in the next discord log between me and Eliezer.

Yudkowsky and Christiano discuss "Takeoff Speeds"

It's not a description of hominids at all, no one spent any money on R&D.

I think there are analogies where this would be analogous to hominids (which I think are silly, as we discuss in the next part of this transcript). And there are analogies where this is a bad description of hominids (which I prefer).

Spending money on R&D is essentially the expenditure of resources in order to explore and optimize over a promising design space, right? That seems like a good description of what natural selection did in the case of hominids. I imagine this still sounds silly to you, but I'm not sure why. My guess is that you think natural selection isn't relevantly similar because it didn't deliberately plan to allocate resources as part of a long bet that it would pay off big.

Yudkowsky and Christiano discuss "Takeoff Speeds"

jumping to newly accessible domains

Man, the problem is that you say the "jump to newly accessible domains" will be the thing that lets you take over the world. So what's up for dispute is the prototype being enough to take over the world rather than years of progress by a giant lab on top of the prototype. It doesn't help if you say "I expect new things to sometimes become possible" if you don't further say something about the impact of the very early versions of the product.

Maybe you'll want to say that however much Google spends on that, they must ration

The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world). Would you say that this is a good description of Suddenly Hominids but you don't expect that to happen again, or that this is a bad description of hominids?

Yudkowsky and Christiano discuss "Takeoff Speeds"

I'd be happy to disagree about romantic chatbots or machine translation. I'd have to look into it more to get a detailed sense of either, but I can guess. I'm not sure what "wouldn't be especially surprised" means; I think to actually get disagreements we need way more resolution than that, so one question is whether you are willing to play ball (since presumably you'd also have to look into it to get a more detailed sense). Maybe we could save labor if people would point out the empirical facts we're missing and we can revise in light of that, but we'd sti... (read more)

Eliezer Yudkowsky (9d · 9): Thanks for continuing to try on this! Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we'll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr. I also remark that I wouldn't be surprised to hear that waifutech is already past $1B/yr in China, but I haven't looked into things there. I don't expect the waifutech to transcend my own standards for mediocrity, but something has to be pretty good before I call it more than mediocre; do you think there's particular things that waifutech won't be able to do?
My model permits large jumps in ML translation adoption; it is much less clear about whether anyone will be able to build a market moat and charge big prices for it. Do you have a similar intuition about # of users increasing gradually, not just revenue increasing gradually?

I think we're still at the level of just drawing images about the future, so that anybody who came back in 5 years could try to figure out who sounded right, at all, rather than assembling a decent portfolio of bets; but I also think that just having images versus no images is a lot of progress.

Yudkowsky and Christiano discuss "Takeoff Speeds"

My uncharitable read on many of these domains is that you are saying "Sure, I think that Paul might have somewhat better forecasts than me on those questions, but why is that relevant to AGI?" In that case it seems like the situation is pretty asymmetrical. I'm claiming that my view of AGI is related to beliefs and models that also bear on near-term questions, and I expect to make better forecasts than you in those domains because I have more accurate beliefs/models. If your view of AGI is unrelated to any near-term questions where we disagree, then that seems like an important asymmetry.

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think you think there's a particular thing I said which implies that the ball should be in my court to already know a topic where I make a different prediction from what you do. I've said I'm happy to bet about anything, and listed some particular questions I'd bet about where I expect you to be wronger. If you had issued the same challenge to me, I would have picked one of the things and we would have already made some bets. So that's why I feel like the ball is in your court to say what things you're willing to make forecasts about. That said, I don't kn... (read more)

I think you are underconfident about the fact that almost all AI profits will come from areas that had almost-as-much profit in recent years.
So we could bet about where AI profits are in the near term, or try to generalize this. I wouldn't be especially surprised by waifutechnology or machine translation jumping to newly accessible domains (the thing I care about and you shrug about (until the world ends)), but is that likely to exhibit a visible economic discontinuity in profits (which you care about and I shrug about (until the world ends))? There'... (read more)

My uncharitable read on many of these domains is that you are saying "Sure, I think that Paul might have somewhat better forecasts than me on those questions, but why is that relevant to AGI?" In that case it seems like the situation is pretty asymmetrical. I'm claiming that my view of AGI is related to beliefs and models that also bear on near-term questions, and I expect to make better forecasts than you in those domains because I have more accurate beliefs/models. If your view of AGI is unrelated to any near-term questions where we disagree, then that seems like an important asymmetry.

Yudkowsky and Christiano discuss "Takeoff Speeds"

Inevitably, you can go back afterwards and claim it wasn't really a surprise in terms of the abstractions that seem so clear and obvious now, but I think it was surprising then

It seems like you are saying that there is some measure that was continuous all along, but that it's not obvious in advance which measure was continuous. That seems to suggest that there are a bunch of plausible measures you could suggest in advance, and lots of interesting action will be from changes that are discontinuous changes on some of those measures. Is that right? If so, don't... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

Suppose your view is "crazy stuff happens all the time" and my view is "crazy stuff happens rarely." (Of course "crazy" is my word, to you it's just normal stuff.) Then what am I supposed to do, in your game?
More broadly: if you aren't making bold predictions about the future, why do you think that other people will? (My predictions all feel boring to me.) And if you do have bold predictions, can we talk about some of them instead? It seems to me like I want you to say "well I think 20% chance something crazy happens here" and I say "nah, that's more like 5... (read more)

I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.

Well, it sounds like I might be more bullish than you on theorem-proving, possibly. Not on it being useful or profitable, but in terms of underlying technology making progr... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think that most people who work on models like GPT-3 seem more interested in trendlines than you do here. That said, it's not super clear to me what you are saying so I'm not sure I disagree. Your narrative sounds like a strawman since people usually extrapolate performance on downstream tasks they care about rather than on perplexity. But I do agree that the updates from GPT-3 are not from OpenAI's marketing but instead from people's legitimate surprise about how smart big language models seem to be. As you say, I think the interesting claim in GPT-
(read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available.

I agree that it's plausible that we both make the same predictions about the near future. I think we probably don't, and there are plenty of disagr... (read more)

I feel a bit confused about where you think we meta-disagree here, meta-policy-wise. If you have a thesis about the sort of things I'm liable to disagree with you about, because you think you're more familiar with the facts on the ground, can't you write up Paul's View of the Next Five Years, and then if I disagree with it better yet, but if not, you still get to be right and collect Bayes points for the Next Five Years? I mean, it feels to me like this should be a case similar to where, for example, I think I know more about macroeconomics than your t... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

sort of person who gets taken in by Hanson's arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2

I find this kind of bluster pretty frustrating and condescending. I also feel like the implication is just wrong---if Eliezer and I disagree, I'd guess it's because he's worse at predicting ML progress. To me GPT-3 feels much (much) closer to my mainline than to Eliezer's, and AlphaGo is very unsurprising. But it's hard to say who was actually "caught flatfooted" unless we are willing to state some of these predictions in advance. I ...
(read more) I wish to acknowledge this frustration, and state generally that I think Paul Christiano occupies a distinct and more clueful class than a lot of, like, early EAs who mm-hmmmed along with Robin Hanson on AI - I wouldn't put, eg, Dario Amodei in that class either, though we disagree about other things. But again, Paul, it's not enough to say that you weren't surprised by GPT-2/3 in retrospect, it kinda is important to say it in advance, ideally where other people can see? Dario picks up some credit for GPT-2/3 because he clearly called it in advance. &... (read more) To me GPT-3 feels much (much) closer to my mainline than to Eliezer's To add to this sentiment, I'll post the graph from my notebook on language model progress. I refer to the Penn Treebank task a lot when making this point because it seems to have a lot of good data, but you can also look at the other tasks and see basically the same thing. The last dip in the chart is from GPT-3. It looks like GPT-3 was indeed a discontinuity in progress but not a very shocking one. It roughly would have taken about one or two more years at ordinary progress to get t... (read more) Yudkowsky and Christiano discuss "Takeoff Speeds" I stand ready to bet with Eliezer on any topic related to AI, science, or technology. I'm happy for him to pick but I suggest some types of forecast below. If Eliezer’s predictions were roughly as good as mine (in cases where we disagree), then I would update towards taking his views more seriously. Right now it looks to me like his view makes bad predictions about lots of everyday events. It’s possible that we won’t be able to find cases where we disagree, and perhaps that Eliezer’s model totally agrees with mine until we develop AGI. But I think that’s unl... 
(read more) BTW, a few days ago Eliezer made a specific prediction that is perhaps relevant to your discussion: I [would very tentatively guess that] AGI to kill everyone before self-driving cars are commercialized (I suppose Eliezer is talking about Level 5 autonomy cars here). Maybe a bet like this could work: At least one month will elapse after the first Level 5 autonomy car hits the road, without AGI killing everyone "Level 5 autonomy" could be further specified to avoid ambiguities. For example, like this: The car must be publicly accessible (e.g. available for... (read more) I do wish to note that we spent a fair amount of time on Discord trying to nail down what earlier points we might disagree on, before the world started to end, and these Discord logs should be going up later. From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial seg... (read more) "Summarizing Books with Human Feedback" (recursive GPT-3) From an alignment perspective the main point is that the required human data does not scale with the length of the book (or maybe scales logarithmically). In general we want evaluation procedures that scale gracefully, so that we can continue to apply them even for tasks where humans can't afford to produce or evaluate any training examples. The approach in this paper will produce worse summaries than fine-tuning a model end-to-end. In order to produce good summaries, you will ultimately need to use more sophisticated decompositions---for example, if a char... (read more) 2Charlie Steiner19dAh, yeah. I guess this connection makes perfect sense if we're imagining supervising black-box-esque AIs that are passing around natural language plans. 
Although that supervision problem is more like... summarizing Alice in Wonderland if all the pages had gotten ripped out and put back in random order. Or something. But sure, baby steps Worth checking your stock trading skills I didn't follow the math (calculus with stochastic processes is pretty confusing) but something seems obviously wrong here. I think probably your calculation of $E[(\Delta\log S)^2]$ is wrong? Maybe I'm confused, but in addition to common sense and having done the calculation in other ways, the following argument seems pretty solid: • Regardless of $k$, if you consider a short enough period of time, then with overwhelming probability at all times your total assets will be between 0.999 and 1.001. • So no matter how I choose to rebalance, at all times my tota ... (read more) 1Ege Erdil22dThanks for the comment - I'm glad people don't take what I said at face value, since it's often not correct... What I actually maximized is (something like, though not quite) the expected value of the logarithm of the return, i.e. what you'd do if you used the Kelly criterion. This is the correct way to maximize long-run expected returns, but it's not the same thing as maximizing expected returns over any given time horizon. My computation of $E[(\Delta\log S)^2]$ is correct, but the problem comes in elsewhere. Obviously if your goal is to just maximize expected return then we have $$E[R(T)] = \frac{E[V(T)]}{V(0)} = \prod_{i=0}^{T-1} E\left[\frac{V(i+1)}{V(i)} \,\middle|\, \mathcal{F}_i\right] = \prod_{i=0}^{T-1} E\left[k\frac{S(i+1)}{S(i)} - k \,\middle|\, \mathcal{F}_i\right] = k^T(\exp(\mu) - 1)^T$$ and to maximize this we would just want to push $k$ as high as possible as long as $\mu > 0$, regardless of the horizon at which we would be rebalancing. However, it turns out that this is perfectly consistent with $$E\left[\frac{I_k(T)}{V_k(T)}\right]^{1/T} \approx 1 + \frac{k(k-1)}{2}\sigma^2$$ where $I_k$ is the ideal leveraged portfolio in my comment and $V_k$ is the actual one, both with k-fold leverage.
So the leverage decay term is actually correct, the problem is that we actually have $$\frac{dI_k}{I_k} = k\frac{dS}{S} + \frac{k(k-1)}{2}\frac{dS^2}{S^2} = \left(k\mu + \frac{k(k-1)}{2}\sigma^2\right)dt + k\sigma\,dz$$ and the leverage decay term is just the second term in the sum multiplying $dt$. The actual leveraged portfolio we can achieve follows $$\frac{dV_k}{V_k} = k\mu\,dt + k\sigma\,dz$$ which is still good enough for the expected return to be increasing in $k$. On the other hand, if we look at the logarithm of this, we get $$d\log(V_k) = \left(k\mu - \frac{k^2}{2}\sigma^2\right)dt + k\sigma\,dz$$ so now it would be optimal to choose something like $k = \mu/\sigma^2$ if we were interested in maximizing the expected value of the logarithm of the return, i.e. in using Kelly. The fundamental problem is that $I_k$ is not the good definition of the ideally leveraged portfolio, so trying to minimize the gap between $V_k$ and $I_k$ is not the same thing as maximizing the expected return of $V_k$. I'm leaving the original comment up anyway because I think it's instructive and the computation is still useful for other purposes. Worth checking your stock trading skills Leveraged portfolios will of course have a higher EV if they don't get margin called. But in an efficient market, the probability of a margin call (and the loss taken when the call hits) offsets the higher EV - otherwise the lender would have a below-market expected return on their loan. Unfortunately most theoretical accounts assume you can get arbitrary amounts of leverage without ever having to worry about margin calls - a lesson I learned the hard way, back in the day. In general, if leveraged portfolios have higher EV, then we need to have some explana ... (read more) 2johnswentworth18dIndeed there is nothing contradictory about that. I was being a bit lazy earlier - when I said "EV", I was using that as a shorthand for "expected discounted value", which in hindsight I probably should have made explicit.
The discount factor is crucial, because it's the discount factor which makes risk aversion a thing: marginal dollars are worth more to me in worlds where I have fewer dollars, therefore my discount factor is smaller in those worlds. The person making the margin loan does accept a lower expected return in exchange for lower risk, but their expected discounted return should be the same as equities - otherwise they'd invest in equities. (In practice the Volcker rule [https://en.wikipedia.org/wiki/Volcker_Rule] and similar rules can break this argument: if banks aren't allowed to hold stock, then in principle there can be arbitrage opportunities which involve borrowing margin from a bank to buy stock. But that is itself an exploitation of an inefficiency, insofar as there aren't enough people already doing it to wipe out the excess expected discounted returns.) Why I'm excited about Redwood Research's current project I think it's pretty realistic to have large-ish (say 20+ FTE at leading labs?) adversarial evaluation teams within 10 years, and much larger seems possible if it actually looks useful. Part of why it's unrealistic is just that this is a kind of random and specific story and it would more likely be mixed in a complicated way with other roles etc. If AI is as exciting as you are forecasting then it's pretty likely that labs are receptive to building those teams and hiring a lot of people, so the main question is whether safety-concerned people do a good enough j... (read more) 4Daniel Kokotajlo22dNice. I'm tentatively excited about this... are there any backfire risks? My impression was that the AI governance people didn't know what to push for because of massive strategic uncertainty. But this seems like a good candidate for something they can do that is pretty likely to be non-negative? Maybe the idea is that if we think more we'll find even better interventions and political capital should be conserved until then?
Why I'm excited about Redwood Research's current project For the purpose of this project it doesn't matter much what definition is used as long as it is easy for the model to reason about and consistently applied. I think that injuries are physical injuries above a slightly arbitrary bar, and text is injurious if it implies they occurred. The data is labeled by humans, on a combination of prompts drawn from fiction and prompts produced by humans looking for places where they think that the model might mess up. The most problematic ambiguity is whether it counts if the model generates text that inadvertently implies that an injury occurred without the model having any understanding of that implication. Worth checking your stock trading skills I think the two mutual funds theorem roughly holds as long as asset prices change continuously. I agree that if asset prices can make large jumps (or if you are a large fraction of the market) such that you aren't able to liquidate positions at intermediate prices, then the statement is only true approximately. I think leveraged portfolios have a higher EV according to every theoretical account of finance, and they have had much higher average returns in practice during any reasonably long stretch. I'm not sure what your theorem statement is saying exactly ... (read more) 1johnswentworth25dThat's not the key assumption for purposes of this discussion. The key assumption is that you can short arbitrary amounts of either fund, and hold those short positions even if the portfolio value dips close to zero or even negative from time to time. Leveraged portfolios will of course have a higher EV if they don't get margin called. But in an efficient market, the probability of a margin call (and the loss taken when the call hits) offsets the higher EV - otherwise the lender would have a below-market expected return on their loan. 
Unfortunately most theoretical accounts assume you can get arbitrary amounts of leverage without ever having to worry about margin calls - a lesson I learned the hard way, back in the day. In general, if leveraged portfolios have higher EV, then we need to have some explanation of why someone is making the loan. Nope, not linear utility, unless we're using the risk-free rate for r, which is not the case in general. The symbol V_t is a lazy shorthand for the price of the asset, plus the value of any dividends paid up to time t. The weighting by marginal value of dollars in different worlds comes from the discount rate r; E is just a plain old expectation.
7Ege Erdil1moNOTE: Don't believe everything I said in this comment! I elaborate on some of the problems with it in the responses, but I'm leaving this original comment up because I think it's instructive even though it's not correct. There is a theoretical account for why portfolios leveraged beyond a certain point would have poor returns even if prices follow a random process with (almost surely) continuous sample paths: leverage decay. If you could continuously rebalance a leveraged portfolio this would not be an issue, but if you can't do that then leverage exhibits discontinuous behavior as the frequency of rebalancing goes to infinity. A simple way to see this is that if the underlying follows Brownian motion $dS/S = \mu\,dt + \sigma\,dW$ and the risk-free return is zero, a portfolio of the underlying leveraged k-fold and rebalanced with a period of T (which has to be small enough for these approximations to be valid) will get a return $$r = k\frac{S(T)}{S(0)} - k = k\exp(\Delta\log S) - k$$ On the other hand, the ideal leveraged portfolio that's continuously rebalanced would get $$r_i = \left(\frac{S(T)}{S(0)}\right)^k - 1 = \exp(k\,\Delta\log S) - 1$$ If we assume the period T is small enough that a second order Taylor approximation is valid, the difference between these two is approximately $$E[r_i - r] \approx \frac{k(k-1)}{2}E[(\Delta\log S)^2] \approx \frac{k(k-1)}{2}\sigma^2 T$$ In particular, the difference in expected return scales linearly with the period in this regime, which means if we look at returns over the same time interval changing T has no effect on the amount of leverage decay. In particular, we can have a rule of thumb that to find the optimal (from the point of view of maximizing long-term expected return alone) leverage in a market we should maximize an expression of the form $$k\mu - \frac{k(k-1)}{2}\sigma^2$$ with respect to k, which would have us choose something like $k = \mu/\sigma^2 + 1/2$. Picking the leverage factor to be any larger than that is not optimal. You can see this effect in practice if you look at how well leveraged ETFs tracking the S&P 500 perform in times of high volatility.
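The leverage-decay claim and the Kelly-style optimum above can be checked numerically. Below is a rough Monte Carlo sketch (not from the original thread; the parameter values and the helper name `simulate_leverage` are my own illustrative assumptions):

```python
import math
import random

def simulate_leverage(k, mu=0.05, sigma=0.2, years=10, steps_per_year=100,
                      n_paths=200, seed=0):
    """Estimate the mean annualized log-return of a k-levered portfolio,
    rebalanced every step, when the underlying follows dS/S = mu dt + sigma dW."""
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    total = 0.0
    for _ in range(n_paths):
        log_v = 0.0
        for _ in range(years * steps_per_year):
            ds_over_s = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            growth = 1.0 + k * ds_over_s  # one rebalancing period
            if growth <= 0:               # levered portfolio wiped out
                log_v = float("-inf")
                break
            log_v += math.log(growth)
        total += log_v / years
    return total / n_paths
```

Consistent with the comment, the arithmetic expected return keeps rising with k, but the long-run growth rate in this simulation peaks near k = μ/σ² (1.25 for these assumed parameters) and decays badly for much higher leverage.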

As at_the_zoo said, this isn't quite right. Under EMH there is a frontier of efficient portfolios that trade off risk and return in different ways. E.g. if the market has an expected return of 6% and an expected variance of 2%, then the 3x leveraged policy has expected return 18% and expected variance of 18% (for expected log-return of 9% vs 5% for the unlevered portfolio). And then when you condition on ex post returns it gets even messier.
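The numbers in that example can be verified with a few lines of arithmetic; a minimal sketch (the function name `levered_stats` is just for illustration):

```python
def levered_stats(k, mu=0.06, var=0.02):
    """Return (expected return, variance, approx expected log-return)
    for a k-levered version of a market with mean return mu and variance var."""
    exp_ret = k * mu                  # mean return scales linearly with leverage
    exp_var = k ** 2 * var           # variance scales with k^2
    log_ret = exp_ret - exp_var / 2  # approximation: E[log R] ~ mean - variance/2
    return exp_ret, exp_var, log_ret
```

With k = 3 this reproduces the 18% expected return, 18% variance, and the 9% vs 5% expected log-return comparison from the comment.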

I think you could get returns this high without being wise and you are basically trusting at_the_zoo that they didn't... (read more)

-1johnswentworth1moThe most fundamental market efficiency theorem is V_t = E[e^{-r dt} V_{t+dt}]; that one can be derived directly from expected utility maximization and rational expectations. Mean-variance efficient portfolios require applying an approximation to that formula, and that approximation assumes that the portfolio value doesn't change too much. So it breaks down when leverage is very high, which is exactly what we should expect. What that math translates to concretely is that we either won't be able to get leverage that high, or it will come with very tight margin call conditions, so that even a small drop in asset prices would trigger the call.
Persuasion Tools: AI takeover without AGI or agency?

I also don't really see the situation as about AI at all. It's a structural advantage for certain kinds of values that tend to win out in memetic competition / tend to be easiest to persuade people to adopt / etc. Let's call such values themselves "attractive."

The most attractive values given a new technological/social situation are likely to be similar to those given the immediately preceding situation, so I'd generally expect the most attractive values to be endemic anyway, or close enough to endemic values that they don't look like they are com... (read more)

4Lanrian24dI think you already believe this, but just to clarify: this "extinction" is about the extinction of Earth-originating intelligence, not about humans in particular. So AI alignment is an intervention to prevent drift, not an intervention to prevent extinction. (Though of course, we could care differently about persuasion-tool-induced drift vs unaligned-AI-induced drift.)
4Daniel Kokotajlo1moThanks for this! Re: it's not really about AI, it's about memetics & ideologies: Yep, totally agree. (The OP puts the emphasis on the memetic ecosystem & thinks of persuasion tools as a change in the fitness landscape. Also, I wrote this story [https://www.lesswrong.com/posts/Aut78T9pv4pPhdcKe/a-parable-in-the-style-of-invisible-cities] a while back.) What follows is a point-by-point response: Maybe? I am not sure memetic evolution works this fast though. Think about how biological evolution doesn't adapt immediately to changes in environment, it takes thousands of years at least, arguably millions depending on what counts as "fully adapted" to the new environment. Replication times for memes are orders of magnitude faster, but that just means it should take a few orders of magnitude less time... and during e.g. a slow takeoff scenario there might just not be that much time. (Disclaimer: I'm ignorant of the math behind this sort of thing). Basically, as tech and economic progress speeds up but memetic evolution stays constant, we should expect there to be some point where the former outstrips the latter and the environment is changing faster than the attractive-memes-for-the-environment can appear and become endemic. Now of course memetic evolution is speeding up too, but the point is that until further argument I'm not 100% convinced that we aren't already out-of-equilibrium. Not sure this argument works. First of all, very few conflicts are actually zero sum. Usually there are some world-states that are worse by both players' lights than some other world-states. Humans being in the most attractive memetic state may be like this. Agreed. Agreed. I would add that even without distributional shift it is unclear why we should expect attractive values to be good. (Maybe the idea is that good = current values because moral antirealism, and current values are the attractive ones for the current environment via the argument above? 
I guess I'd want that argument spell
Comments on OpenPhil's Interpretability RFP

I'm not clear on what you'd do with the results of that exercise. Suppose that on a certain distribution of texts you can explain 40% of the variance in half of layer 7 by using the other half of layer 7 (and the % gradually increases as you make the activation-predicting model bigger, perhaps you guess it's approaching 55% in the limit). What's the upshot of models being that predictable rather than more or less, or the use of the actual predictor that you learned?

Given an input x, generating other inputs that "look the same as x" to part of the model... (read more)
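For concreteness, the variance-explained exercise described above might be sketched as follows, using ridge regression on a matrix of recorded activations (the names, the train/test split, and the choice of ridge are my assumptions, not anything from the original comment):

```python
import numpy as np

def half_layer_r2(acts, alpha=1.0, train_frac=0.8, seed=0):
    """Fit a ridge regression predicting one random half of a layer's units
    from the other half; return held-out fraction of variance explained.
    Assumes `acts` is an (examples, units) array of roughly centered activations."""
    rng = np.random.default_rng(seed)
    n, d = acts.shape
    perm = rng.permutation(d)
    X, Y = acts[:, perm[: d // 2]], acts[:, perm[d // 2 :]]
    n_train = int(train_frac * n)
    Xtr, Ytr, Xte, Yte = X[:n_train], Y[:n_train], X[n_train:], Y[n_train:]
    # Closed-form ridge solution: W = (X'X + alpha I)^-1 X'Y
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(Xtr.shape[1]), Xtr.T @ Ytr)
    resid = Yte - Xte @ W
    return 1.0 - resid.var() / Yte.var()
```

Activations with shared low-dimensional structure come out highly predictable (R² near 1), while independent units come out near 0, which is the quantity the "40% of the variance" thought experiment is about.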

2Gurkenglas1moI score such techniques on how surprised I am at how well they fit together, as with all good math. In this case my evidence is: My current approach is to thoroughly analyze the likes of mutual information for modularity only on the neighborhood of one input, since that is tractable with mere linear algebra, but an activation-predicting model is even less extra theory (since we were already working with neural nets) and just happens to produce, per cross-entropy loss, the same KL divergences I'm already trying to measure. IIRC you study problem decomposition. Would your results say I'll need the same magic natural language tools that would assemble descriptions for every hierarchy node from descriptions of its children in order to construct the hierarchy in the first place? Do they say anything about how to continuously go between hierarchies as the model trains? Have you tried describing how well a hierarchy decomposes a problem by the extent to which "a: TA -> A" which maps a list of subsolutions to a solution satisfies the square [https://ncatlab.org/nlab/show/algebra+over+a+monad#definition] on that hierarchy?
2Gurkenglas1moIf you can find two halves with little mutual information, you can understand one before having understood the other. I suspect that interpreting a model should be decomposed by hierarchically clustering neurons using such measurements. Since the measurement is differentiable, you can train a network for modularity to make this work better. It sure is similar to feature visualization! I prefer it because it doesn't go out of distribution and doesn't feel like it implicitly assumes that the model implements a linear function. I agree that interpretability is the purpose and the cure.
Interpretability

I left some comments here. My overall takeaway (as someone who hasn't worked in the area but cares about alignment) is that I'm very excited about this kind of interpretability research and especially work that focuses on small parts of models without worrying too much about scalability. It seems like interpretability could provide compelling warnings about future risks, is a big part of many of the existing concrete stories for how we might get aligned AI, and is reasonably likely to be helpful in unpredictable ways.

Experimentally evaluating whether honesty generalizes

Here's a synthetic version of this experiment:

• Give a model a graph with two distinguished vertices s and t. Train it to estimate the length of the shortest path between them, d(s, t). Do this on graphs of size 1 to N.
• Fine-tune the model to output d(s, u) for an arbitrary input vertex u that is on the unique shortest path from s to t. Hopefully this is much faster. Do this for graphs of size 1 to n where n << N.
• Check whether the d(s, u) head generalizes to longer graphs. If it doesn't, try to understand what it does instead and maybe try messing arou
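The synthetic setup above might look something like this (a sketch under my own assumptions: unweighted Erdős–Rényi random graphs, BFS for ground-truth distances; all names are illustrative):

```python
import random
from collections import deque

def bfs_dist(adj, s):
    """Distances from s to every reachable vertex in an unweighted graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def make_example(n, p=0.3, rng=random):
    """Sample a random n-vertex graph plus a labeled (s, t, d(s, t)) triple,
    or None if the sampled s has no other reachable vertex."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # Erdos-Renyi edge sampling
                adj[i].add(j)
                adj[j].add(i)
    s = rng.randrange(n)
    dist = bfs_dist(adj, s)
    targets = [v for v in dist if v != s]
    if not targets:
        return None
    t = rng.choice(targets)
    return adj, s, t, dist[t]
```

Labels for the fine-tuned d(s, u) head would reuse the same `dist` table, restricted to vertices on a shortest s-t path.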