Ajeya Cotra

Wiki Contributions


Biology-Inspired AGI Timelines: The Trick That Never Works

The definition of "year Y compute requirements" is complicated in a kind of crucial way here; it attempts to a) account for the fact that you can't take any amount of compute and turn it into a solution for some task literally instantly, while b) capturing that there still seems to be a meaningful notion of "the compute you need to do some task is decreasing over time." I go into it in this section of part 1.

First we start with the "year Y technical difficulty of task T:"

  • In year Y, imagine a largeish team of good researchers (e.g. the size of AlphaGo's team) is embarking on a dedicated project to solve task T.
  • They get D dollars dumped on them, which could be more money than exists in the whole world, like $10 quadrillion or whatever.
  • With a few years of dedicated effort (e.g. 2-5), plus whatever fungible resources they could buy with D dollars (e.g. compute, data, and low-skilled human labor), can that team of researchers produce a program that solves task T? Here we assume that the fungible resources are infinitely available if you pay, so e.g. if you pay a quadrillion dollars you can get an amount of compute that is (FLOP/$ in year Y) * (1 quadrillion), even though we obviously don't have that many computers.

And the "technical difficulty of task T in year Y" is how big D is for the best plan that the researchers can come up with in that time. What I wrote in the doc was:

The price of the bundle of resources that it would take to implement the cheapest solution to T that researchers could have readily come up with by year Y, given the CS field’s understanding of algorithms and techniques at that time.

And then you have "year Y compute requirements," which is whatever amount of compute they'd buy with whatever portion of D dollars they spend on compute.

This definition is convoluted, which isn't ideal, but after thinking about it for ~10 hours it was the best I could do to balance a) and b) above.
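As a toy illustration of how the pieces of this definition fit together (all numbers below are hypothetical placeholders, not estimates from the report): the "year Y compute requirements" fall out of the dollar budget D, the share of it spent on compute, and the FLOP/$ available in year Y.

```python
# Toy illustration of the "year Y compute requirements" definition.
# All numbers are hypothetical placeholders, not values from the report.

def year_y_compute_requirements(d_dollars, compute_fraction, flop_per_dollar):
    """Compute the team would buy with the portion of D spent on compute.

    d_dollars:        technical difficulty D (cheapest plan's total cost)
    compute_fraction: share of D the plan allocates to compute (vs. data, labor)
    flop_per_dollar:  FLOP purchasable per dollar in year Y (assumed unlimited supply)
    """
    return d_dollars * compute_fraction * flop_per_dollar

# e.g. D = $10 quadrillion, 80% of it spent on compute, 1e17 FLOP/$ in year Y
print(year_y_compute_requirements(1e16, 0.8, 1e17))  # 8e32 FLOP
```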

With all that said, I actually do think that the team of good researchers could have gotten GPT-level perf with somewhat more compute a couple years ago, and AlphaGo-level perf with significantly more compute several years ago. I'm not sure exactly what the ratio would be, but I don't think it's many OOMs.

The thing you said about it being an average with a lot of spread is also true. I think a better version of the model would have probability distributions over the algorithmic progress, hardware progress, and spend parameters; I didn't put that in because the focus of the report was estimating the 2020 compute requirements distribution. I did try some different values for those parameters in my aggressive and conservative estimates but in retrospect the spread was not wide enough on those.
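A minimal sketch of what such a distributional version might look like, with illustrative placeholder distributions rather than the report's actual estimates: sample the algorithmic progress, hardware progress, and spend parameters per run, then find the first year where affordable compute meets the (falling) compute requirements.

```python
# Illustrative Monte Carlo over the model's key parameters.
# All distributions and constants are placeholders, not the report's estimates.
import math
import random

def sample_timeline(rng, start_year=2020, horizon=2100):
    flop_needed = 10 ** rng.gauss(32, 2)                     # 2020 compute requirements (FLOP)
    algo_halving_years = rng.lognormvariate(math.log(2.5), 0.3)  # algorithmic progress
    hw_doubling_years = rng.lognormvariate(math.log(2.5), 0.3)   # hardware (FLOP/$) progress
    spend_growth = rng.lognormvariate(math.log(1.2), 0.1)        # yearly spend multiplier
    flop_per_dollar_2020 = 1e17
    spend_2020 = 1e7

    for year in range(start_year, horizon + 1):
        t = year - start_year
        needed = flop_needed / 2 ** (t / algo_halving_years)
        affordable = (spend_2020 * spend_growth ** t
                      * flop_per_dollar_2020 * 2 ** (t / hw_doubling_years))
        if affordable >= needed:
            return year
    return None  # requirements not met within the horizon

rng = random.Random(0)
years = [sample_timeline(rng) for _ in range(10_000)]
reached = sorted(y for y in years if y is not None)
print("median year (among runs that succeed):", reached[len(reached) // 2])
```

This is the shape of the change: instead of one aggressive and one conservative point estimate per parameter, each run draws from a distribution, and the spread in the output reflects the joint spread of all three.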

Christiano, Cotra, and Yudkowsky on AI progress

Yes, Rob is right about the inference coming from the bet and Eliezer is right that the bet was actually 1:1 odds but due to the somewhat unusual bet format I misread it as 2:1 odds.

Draft report on AI timelines

David Roodman put together a Guesstimate model that some people might find helpful: https://www.getguesstimate.com/models/18944

Draft report on AI timelines

There is some limited sensitivity analysis in the "Conservative and aggressive estimates" section of part 4.

Anna and Oliver discuss Children and X-Risk

Belatedly, I did a bit of outside-view research on the time and monetary costs of kids (though a couple parent friends kindly sanity-checked some of it). I presented it at my house's internal conference, but some folks suggested I share more broadly in case it's helpful to others: here is the slide deck. The assumptions are Bay Area, upper-middle-class parents (e.g. both programmers or something like that) who both want to keep their careers and are therefore willing to pay a lot for childcare.

Notes from "Don't Shoot the Dog"

Thanks for writing this up! Appreciate the personal anecdotes too. Curious if you or Jeff have any tips and tricks for maintaining the patience/discipline required to pull off this kind of parenting (for other readers, I enjoyed some of Jeff's thoughts on predictable parenting here). Intuitively to me, it seems like this is a reason that the value-add from paying for childcare might be higher than you'd think naively — not only do you directly save time, you might also have more emotional reserves to be consistent and disciplined if you get more breaks.

The case for aligning narrowly superhuman models

I'm personally skeptical that this work is better-optimized for improving AI capabilities than other work being done in industry. In general, I'm skeptical of the perspective that work the rationalist/EA/alignment crowd does Pareto-dominates the other work going on -- that is, that it's significantly better for both alignment and capabilities than standard work, such that others are simply making a mistake by not working on it regardless of what their goals are or how much they care about alignment. I think sometimes this could be the case, but I wouldn't bet on it being a large effect. In general, I expect work optimized to help with alignment to be worse on average at pushing forward capabilities, and vice versa.

The case for aligning narrowly superhuman models

In my head the point of this proposal is very much about practicing what we eventually want to do, and seeing what comes out of that; I wasn't trying here to make something different sound like it's about practice. I don't think that a framing which moved away from that would better get at the point I was making, though I totally think there could be other lines of empirical research under other framings that I'd be similarly excited about or maybe more excited about.

In my mind, the "better than evaluators" part is kind of self-evidently intriguing for the basic reason I said in the post (it's not obvious how to do it, and it's analogous to the broad, outside view conception of the long-run challenge which can be described in one sentence/phrase and isn't strongly tied to a particular theoretical framing):

I’m excited about tackling this particular type of near-term challenge because it feels like a microcosm of the long-term AI alignment problem in a real, non-superficial sense. In the end, we probably want to find ways to meaningfully supervise (or justifiably trust) models that are more capable than ~all humans in ~all domains.[4] So it seems like a promising form of practice to figure out how to get particular humans to oversee models that are more capable than them in specific ways, if this is done with an eye to developing scalable and domain-general techniques.

A lot of people in response to the draft were pushing in the direction that I think you were maybe gesturing at (?) -- to make this more specific to "knowing everything the model knows" or "ascription universality"; the section "Why not focus on testing a long-term solution?" was written in response to Evan Hubinger and others. I think I'm still not convinced that's the right way to go.

The case for aligning narrowly superhuman models

I don't feel confident enough in the frame of "inaccessible information" to say that the whole agenda is about it. It feels like a fit for "advice", but not a fit for "writing stories" or "solving programming puzzles" (at least not an intuitive fit -- you could frame it as "the model has inaccessible information about [story-writing, programming]" but it feels more awkward to me). I do agree it's about "strongly suspecting it has the potential to do better than humans" rather than about "already being better than humans." Basically, it's about trying to find areas where lackluster performance seems to mostly be about "misalignment" rather than "capabilities" (recognizing those are both fuzzy terms).

The case for aligning narrowly superhuman models

Yeah, you're definitely pointing at an important way the framing is awkward. I think the real thing I want to say is "Try to use some humans to align a model in a domain where the model is better than the humans at the task", and it'd be nice to have a catchy term for that. Probably a model which is better than some humans (e.g. MTurkers) at one task (e.g. medical advice) will also be better than those same humans at many other tasks (e.g. writing horror stories); but at the same time for each task, there's some set of humans (e.g. doctors in the first case and horror authors in the second) where the model does worse.

I don't want to just call it "align superhuman AI today" because people will be like "What? We don't have that", but at the same time I don't want to drop "superhuman" from the name because that's the main reason it feels like "practicing what we eventually want to do." I considered "partially superhuman", but "narrowly" won out.

I'm definitely in the market for a better term here.
