Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, then Center on Long-Term Risk, now OpenAI Futures/Governance team. Views are my own & do not represent those of my employer. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Two of my favorite memes (images not shown; one by Rob Wiblin).

My EA Journey, depicted on the whiteboard at CLR (image not shown).

Sequences

Agency: What it is and why it matters
AI Timelines
Takeoff and Takeover in the Past and Future

Wiki Contributions

Comments

  1. Yes, pausing then (or a bit before then) would be the sane thing to do. Unfortunately there are multiple powerful groups racing, so even if one does the right thing, the others might not. (That said, I do not think this excuses/justifies racing forward. If the leading lab gets up to the brink of AGI and then pauses and pivots to a combo of safety research + raising awareness + reaping benefits + coordinating with government and society to prevent others from building dangerously powerful AI, then that means they are behaving responsibly in my book, possibly even admirably.)
  2. I chose my words there carefully -- I said "could," not "would." That said, by default I expect them to get to ASI quickly due to various internal biases and external pressures.

Reply to first thing: When I say AGI I mean something which is basically a drop-in substitute for a human remote worker circa 2023, and not just a mediocre one but a good one, e.g. an OpenAI research engineer. This is what matters, because this is the milestone most strongly predictive of massive acceleration in AI R&D.

Arguably metaculus-AGI implies AGI by my definition (actually it's Ajeya Cotra's definition) because of the Turing test clause. A 2-hour, adversarial test means that anything a human can do remotely in 2 hours, the AI can do too; otherwise the judges would use that task as the test. (Granted, this leaves wiggle room for an AI that is as good as a standard human at everything but not as good as OpenAI research engineers at AI research.)

Anyhow yeah if we get metaculus-AGI by 2025 then I expect ASI by 2027. ASI = superhuman at every task/skill that matters. So, imagine a mind that combines the best abilities of Von Neumann, Einstein, Tao, etc. for physics and math, but then also has the best abilities of [insert most charismatic leader] and [insert most cunning general] and [insert most brilliant coder] ... and so on for everything. Then imagine that in addition to the above, this mind runs at 100x human speed. And it can be copied, and the copies are GREAT at working well together; they form a superorganism/corporation/bureaucracy that is more competent than SpaceX / [insert your favorite competent org].

Re independence: Another good question! Let me think...
--I think my credence in 2, conditional on no AGI by 2030, would go down somewhat but not enough that I wouldn't still endorse it. A lot depends on the reason why we don't get AGI by 2030. If it's because AGI turns out to inherently require a ton more compute and training, then I'd be hopeful that ASI would take more than two years after AGI.
--3 is independent.
--4 maybe would go down slightly but only slightly.

Depends on comparative advantage I guess. 

Yes, I really do. I'm afraid I can't talk about all of the reasons for this (I work at OpenAI) but mostly it should be figure-outable from publicly available information. My timelines were already fairly short (2029 median) when I joined OpenAI in early 2022, and things have gone mostly as I expected. I've learned a bunch of stuff some of which updated me upwards and some of which updated me downwards.

As for the 15% - 15% thing: I mean, I don't feel confident that those are the right numbers; rather, those numbers express my current state of uncertainty. I could see the case for making the 2024 number higher than the 2025 number (exponential distribution vibes; 'if it doesn't work now then that's evidence it won't work next year either' vibes). I could also see the case for making the 2025 number higher (it seems like it'll happen this year, but in general projects usually take twice as long as one expects due to the planning fallacy, so it'll probably happen next year).
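A minimal sketch of the "exponential distribution vibes" point, assuming a toy memoryless arrival model with a constant per-year hazard rate h (a hypothetical number for illustration, not my actual model): the 2025 slot only gets probability if 2024 fails to deliver first, so the earlier year always gets at least as much mass.

```python
# Toy memoryless/geometric arrival model (illustrative assumption only):
# constant per-year hazard rate h, i.e. P(AGI this year | not yet) = h.
h = 0.20  # hypothetical hazard rate

p_2024 = h              # P(arrival in 2024)
p_2025 = (1 - h) * h    # P(arrival in 2025) = P(no arrival in 2024) * h

print(round(p_2024, 2), round(p_2025, 2))  # 0.2 0.16: the earlier year gets more mass
```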

 

But all of the agents will be housed in one to three big companies. Probably one. And they'll basically all be copies of one to ten base models. And the prompts and RLHF the companies use will be pretty similar. And the smartest agents will, at any given time, be deployed only internally, at least until ASI.

I guess I was including that under "hopefully it would have internalized enough human ethics that things would be OK," but yeah, that was unclear and maybe misleading.

Can you elaborate? I agree that there will be, e.g., many copies of AutoGPT6 living on OpenAI's servers in 2027 or whatever, and that they'll be organized into some sort of "society" (I'd prefer the term "bureaucracy" because it correctly connotes centralized hierarchical structure). But I don't think they'll have escaped the labs and be running free on the internet.

 

  1. Probably there will be AGI soon -- literally any year now.
  2. Probably whoever controls AGI will be able to use it to get to ASI shortly thereafter -- maybe in another year, give or take a year.
  3. Probably whoever controls ASI will have access to a spread of powerful skills/abilities and will be able to build and wield technologies that seem like magic to us, just as modern tech would seem like magic to medievals.
  4. This will probably give them godlike powers over whoever doesn't control ASI.
  5. In general there's a lot we don't understand about modern deep learning. Modern AIs are trained, not built/programmed. We can theorize that e.g. they are genuinely robustly helpful and honest instead of e.g. just biding their time, but we can't check.
  6. Currently no one knows how to control ASI. If one of our training runs turns out to work way better than we expect, we'd have a rogue ASI on our hands. Hopefully it would have internalized enough human ethics that things would be OK.
  7. There are some reasons to be hopeful about that, but also some reasons to be pessimistic, and the literature on this topic is small and pre-paradigmatic.
  8. Our current best plan, championed by the people winning the race to AGI, is to use each generation of AI systems to figure out how to align and control the next generation.
  9. This plan might work but skepticism is warranted on many levels.
  10. For one thing, there is an ongoing race to AGI, with multiple megacorporations participating, and only a small fraction of their compute and labor is going towards alignment & control research. One worries that they aren't taking this seriously enough.

In the worlds where we get AGI in the next 3y, the money can (and large chunks of it will) get donated, partly to GiveDirectly and suchlike, and partly to stuff that helps AGI go better.

The remaining 50% basically exponentially decays for a bit and then has a big fat tail. So off the top of my head I'm thinking something like this:

15% - 2024
15% - 2025
15% - 2026
10% - 2027
5% - 2028
5% - 2029
3% - 2030
2% - 2031
2% - 2032
2% - 2033
2% - 2034
2% - 2035
... you get the idea.
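To make that shape concrete, a minimal sketch that just re-adds the numbers listed above (nothing beyond what is written there):

```python
# The year-by-year numbers listed above, re-summed for illustration.
forecast = {2024: 0.15, 2025: 0.15, 2026: 0.15, 2027: 0.10,
            2028: 0.05, 2029: 0.05, 2030: 0.03,
            2031: 0.02, 2032: 0.02, 2033: 0.02, 2034: 0.02, 2035: 0.02}

listed_mass = sum(forecast.values())  # 0.78
tail_mass = 1.0 - listed_mass         # ~0.22 left for 2036 and beyond

print(f"mass on 2024-2035: {listed_mass:.2f}")
print(f"mass on 2036 and later: {tail_mass:.2f}")
```

So roughly 78% of the mass sits on the listed years, with the remaining ~22% continuing at roughly 2% per year for a while before thinning out into the long tail.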

 
