Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Not sure what I'll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html
Some of my favorite memes: [image by Rob Wiblin] [image: xkcd]
My EA Journey, depicted on the whiteboard at CLR: [image, h/t Scott Alexander]
In general it would be helpful to have a range of estimates.
I think the range is as follows:
Estimates based on looking at how fast humans can do things (e.g. WW2 industrial scaleup) and then modifying somewhat upwards (e.g. 5x) in an attempt to account for superintelligence... should be the lower bound, at least for the scenario where superintelligence is involved at every level of the process.
The upper bound is the Yudkowsky bathtub nanotech scenario, or something similarly fast that we haven't thought of yet, where the comparison point for the estimate is more about the laws of physics and/or biology.
However, I expect RL on CoT to amount to "process-based supervision," which seems inherently safer than "outcome-based supervision."
I think the opposite is true; the RL on CoT that is already being done, and will increasingly be done, is going to be in significant part outcome-based. (And a mixture of outcome-based and process-based feedback is actually less safe than just outcome-based IMO, because it makes the CoT less faithful.)
My impression is that software has been the bottleneck here. Building a hand as dextrous as the human hand is difficult but doable (and has probably already been done, though only in very expensive prototypes); having the software to actually use that hand as intelligently and deftly as a human would has not yet been done. But I'm not an expert. Power supply is different -- humans can work all day on a few Big Macs, whereas robots will need to be charged, possibly charged frequently or even plugged in constantly. But that doesn't seem like a significant obstacle.
Re: WW2 vs. modern: yeah idk. I don't think the modern gap between cars and humanoid robots is that big. Tesla is making Optimus after all. Batteries, electronics, chips, electric motors, sensors... seems like the basic components are the same. And seems like the necessary tolerances are pretty similar; it's not like you need a clean room to make one but not the other, and it's not like you need hyperstrong-hyperlight exotic materials for one but not the other. In fact I can think of one very important, very expensive piece of equipment (the gigapress) that you need for cars but not for humanoid robots.
All of the above is for 'minimum viable humanoid robots', i.e. robots that can replace factory and construction workers. They might need to be plugged into the wall often, they might wear out after a year, they might need to do some kinds of manipulations 2x slower due to having fatter fingers or something. But they don't need to e.g. be capable of hiking for 48 hours in the wilderness and fording rivers all on the energy provided by a Big Mac. Nor do they need to be as strong-yet-lightweight as a human.
Thanks for writing this. I think this topic is generally a blind spot for LessWrong users, and it's kind of embarrassing how little thought this community (myself included) has given to the question of whether a typical future with human control over AI is good.
I don't think it's embarrassing or a blind spot. I think I agree that it should receive more thought on the margin, and I of course agree that it should receive more thought all things considered. There's a lot to think about! You may be underestimating how much thought has been devoted to this so far. E.g. it was a common topic of discussion at the Center on Long-Term Risk while I was there. And it's not like LW didn't consider the question until now; my recollection is that various of us considered it & concluded that yeah, probably human takeover is better than AI takeover in expectation, for the reasons discussed in this post.
Side note: The title of this post is "Human Takeover Might Be Worse than AI Takeover," but people seem to be reading it as "Human Takeover Will Be Worse In Expectation than AI Takeover," and when I actually read the text I come away thinking "OK yeah, these arguments make me think that human takeover will be better in expectation than AI takeover, but with some significant uncertainty."
My view is not "can no longer do any good," more like "can do less good in expectation than if you still had some time left before ASI to influence things." For reasons why, see linked comment above.
I think that by the time Metaculus is convinced that ASI already exists, most of the important decisions w.r.t. AI safety will have already been made, for better or for worse. Ditto (though not as strongly) for AI concentration-of-power risks and AI misuse risks.
I'd be interested in an attempt to zoom in specifically on the "repurpose existing factories to make robots" part of the story. You point to WW2 car companies turning into tank and plane factories, and then say maybe a billion humanoid robots per year within 5 years of the conversion.
My wild guesses:
Human-only world: Assume it's like WW2 all over again, except for some reason everyone thinks humanoid robots are the main key to victory:
Then yeah, WW2 seems like the right comparison here. A brief google and a look at some data make me think combat airplane production scaled up by maybe an OOM in 1-2 years early on, and then tapered off to more like a doubling every year. I think what this means is that we should expect something like an OOM/year of increase in humanoid robot production in this scenario, for a couple years? So, from 10,000/yr (assuming it starts today) to a billion/yr 5 years later?
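As a quick sanity check on what that extrapolation implies, here's a minimal sketch; the 10,000/yr starting point and 5-year horizon are just the guesses above, not data:

```python
import math

# Back-of-the-envelope check on the guess above: ~10,000 robots/yr today
# scaling to ~1 billion/yr five years later. Inputs are guesses, not data.
start_rate = 1e4   # robots per year now (assumed starting point)
end_rate = 1e9     # robots per year at year 5 (the guess above)
years = 5

total_ooms = math.log10(end_rate / start_rate)                  # 5.0 orders of magnitude
ooms_per_year = total_ooms / years                              # 1.0 OOM per year, sustained
doublings_per_year = math.log2(end_rate / start_rate) / years   # ~3.3 doublings per year

print(f"{total_ooms:.1f} OOMs total, {ooms_per_year:.1f} OOM/yr, "
      f"{doublings_per_year:.1f} doublings/yr")
```

So the billion/yr-in-5-years guess amounts to sustaining roughly an OOM/year (about 3.3 doublings per year) over the whole period.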
ASI-powered world: Assume ASIs are overseeing and directing the whole process + government is waiving red tape etc. (perhaps because ASI has convinced them it's a good idea):
So obviously things will go significantly faster with ASI in charge and involved at every level. The question is how much faster. Some thoughts:
Overall I'd guess that we would get to a billion/yr of humanoid robot production within about a year of ASI, and that the bulk of these robots would also be substantially more sophisticated than present-day robots. And it's easier for me to imagine things going faster than that than slower, though perhaps I should also account for various biases that push in the other direction. For now I'll just hand-wave and hope it cancels out.
I am saying that expected purchasing power, for altruistic purposes, is lower given that Metaculus resolved ASI a month ago than given that it did not. I give reasons in the linked comment. Consider the analogy I just made to nuclear MAD -- suppose you thought nuclear MAD was 60% likely in the next three years: would you take the sort of bet you are offering me re ASI? Why or why not?
I do not think any market is fully efficient and I think altruistic markets are extremely fucking far from efficient. I think I might be confused or misunderstanding you though -- it seems you think my position implies that OP should be redirecting money from AI risk causes to causes that assume no ASI? Can you elaborate?
Thanks for proposing this bet. I think a bullet point needs to be added:
- Your median date of superintelligent AI as defined by Metaculus was the end of 2028. If you believe the median date is later, the bet will be worse for you.
- The probability of me paying you if you win was the same as the probability of you paying me if I win. The former will be lower than the latter if you believe the transfer is less likely given superintelligent AI, in which case the bet will be worse for you.
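To make that second bullet concrete, here is a minimal expected-value sketch with made-up numbers (the credences, payment probabilities, dollar values, and stake below are purely illustrative, not terms of the actual bet):

```python
def ev_pro_asi_side(p_asi, p_paid_asi, value_asi, p_paid_no_asi, value_no_asi, stake=1000):
    """Expected altruistic value of taking the pro-ASI side of an even-stakes bet.

    p_asi:       credence that ASI (Metaculus definition) arrives by the deadline
    p_paid_*:    probability the winner actually gets paid in each world
    value_*:     altruistic value of a marginal dollar in each world
    """
    expected_win = p_asi * p_paid_asi * value_asi * stake               # collect post-ASI
    expected_loss = (1 - p_asi) * p_paid_no_asi * value_no_asi * stake  # pay up in the ordinary world
    return expected_win - expected_loss

# Symmetric case: payment equally likely and dollars equally valuable in both worlds.
print(ev_pro_asi_side(0.5, 0.95, 1.0, 0.95, 1.0))  # 0.0 -- fair at 50% credence

# Asymmetric case: post-ASI, payment is less likely and a dollar buys less altruistic value.
print(ev_pro_asi_side(0.5, 0.5, 0.2, 0.95, 1.0))   # -425.0 -- the same bet is now clearly negative
```

Even holding credence fixed at 50%, the side that collects in the post-ASI world comes out behind once payment there is less likely and a dollar there is worth less; equalizing the two payment probabilities (and the value of money across worlds) is what would make the bet symmetric.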
Thanks for the reply.
(I'm tracking the possibility that LLMs are steadily growing in general capability and that they simply haven't yet reached the level that impresses me personally. But on balance, I mostly don't expect this possibility to be realized.)
That possibility is what I believe. I wish we had something better to bet on than "inventing a new field of science," because by the time we observe that, there probably won't be much time left to do anything about it. What about e.g. "I, Daniel Kokotajlo, am able to use AI agents basically as substitutes for human engineer/programmer employees. I, as a non-coder, can chat with them and describe ML experiments I want them to run or websites I want them to build etc., and they'll make it happen at least as quickly and well as a competent professional would." (And not just for simple websites, but for the kind of experiments I'd want to run, which aren't the most complicated but also aren't that different from things actual AI company engineers would be doing.)
What about "The model is seemingly as good at solving math problems and puzzles as Thane is, not just on average across many problems but on pretty much any specific problem, including on novel ones that are unfamiliar to both of you"?
Humans have "bottom-up" agency: they're engaging in fluid-intelligence problem-solving and end up "drawing" a decision-making pattern of a specific shape. An LLM, on this model, has a database of templates for such decision-making patterns, and it retrieves the best-fit agency template for whatever problem it's facing. o1/RL-on-CoTs is a way to deliberately target the set of agency-templates an LLM has, extending it. But it doesn't change the ultimate nature of what's happening.
In particular: the bottom-up approach would allow an agent to stay on-target for an arbitrarily long time, creating an arbitrarily precise fit for whatever problem it's facing. An LLM's ability to stay on-target, however, would always remain limited by the length and the expressiveness of the templates that were trained into it.
Miscellaneous thoughts: I don't yet buy that this distinction between top-down and bottom-up is binary, and insofar as it's a spectrum, I'd be willing to bet that there's been progress along it in recent years. Moreover, I'm not even convinced that this distinction matters much for generalization radius / general intelligence, and it's even less likely to matter for 'ability to 5x AI R&D', which is the milestone I'm trying to predict first. And I don't think humans stay on-target for an arbitrarily long time.
Cool stuff! I remember way back when people first started interpreting neurons, and we started daydreaming about one day being able to zoom out and interpret the bigger picture, i.e. what thoughts occurred when and how they caused other thoughts which caused the final output. This feels like, idk, we are halfway to that day already?