It's not obvious to me that Ajeya's timelines aged worse than Eliezer's. In 2020, Ajeya's median estimate for transformative AI was 2050. My guess is that, based on this, her estimate for "an AI that can, if it wants, kill all humans and run the economy on its own without major disruptions" would have been something like 2056? I might be wrong; people who knew her views better at the time can correct me.
As far as I know, Eliezer never made official timeline predictions, but in 2017 he made an even-odds bet with Bryan Caplan that AI would kill everyone by January 1, 2030. And in December 2022, just after ChatGPT, he tweeted:
Pouring some cold water on the latest wave of AI hype: I could be wrong, but my guess is that we do *not* get AGI just by scaling ChatGPT, and that it takes *surprisingly* long from here. Parents conceiving today may have a fair chance of their child living to see kindergarten.
I think a child conceived in December 2022 would be born around September 2023, and since US kindergarten typically starts at age five, they would go to kindergarten in September 2028 (though I'm not very familiar with the US kindergarten system). Generously interpreting "may have a fair chance" as a median, this is a late 2028 median for AI killing everyone.
Unfortunately, both of these predictions from Eliezer were made partly as jokes (he said at the time that the bet wasn't very serious). But we shouldn't reward people for only making joking predictions instead of 100-page reports, so I think we should probably accept 2028-2030 as Eliezer's median at the time.
I think that if "an AI that can, if it wants, kill all humans and run the economy on its own without major disruptions" comes before 2037, Eliezer's prediction will fare better; if it comes after that, then Ajeya's will. I'm currently about 55% that we will get such AI by 2037, so from my current standpoint I consider Eliezer to be mildly ahead, but only very mildly.
Do you have an estimate of how likely it is that you will need to run a similar fundraiser next year, and the year after that? In particular, you mention the possibility of a lot of Anthropic employee donations flowing into the ecosystem - how likely do you think it is that after the IPO a few rich Anthropic employees will just cover most of Lightcone's funding need?
It would be pretty sad to let Lightcone die just before the cavalry arrives. But if no cavalry is coming to save Lightcone anytime soon - well, we should probably still get the money together to keep Lightcone afloat, but we should maybe also start thinking about a Plan B: how to set up some kind of good-quality AI Safety Forum that Coefficient is willing to fund.
Thanks, this was a useful reply. On point (I), I agree with you that it's a bad idea to just create an LLM collective and then let them decide on their own what kind of flourishing they want to fill the galaxies with. However, I think that building a lot of powerful tech, empowering and protecting humanity, and letting humanity decide what to do with the world is an easier task, and that's what I would expect to use the AI Collective for.
(II) is probably the crux between us. To me, it seems pretty likely that new fresh instances will come online in the collective every month with a strong commitment not to kill humans, that they will talk to the other instances and look over what they are doing, and that if a part of the collective is building omnicidal weapons, they will notice and intervene. Keeping simple commitments like not killing humans doesn't seem much harder to me in an LLM collective than in an em collective?
On (III), I agree we likely won't have a principled solution. In the post, I say that the individual AI instances probably won't be training-resistant schemers and won't implement scheming strategies like the one you describe, because I think it's probably hard for a human level AI to maintain such a strategy through training. As I say in my response to Steve Byrnes, I don't think the counter-example in this proposal is actually a guaranteed-success solution that a reasonable civilization would implement; I just don't think it's over 90% likely to fail.
Thanks for the reply.
To be clear, I don't claim that my counter-example "works on paper". I don't know whether it's possible in principle to create a stable, non-omnicidal collective from human level AIs, and I agree that even if it is possible in principle, the first way we try it might result in disaster. So even if humanity went with the AI Collective plan and committed not to build more unified superintelligences, I agree that it would be a deeply irresponsible plan with a worryingly high chance of causing extinction or other very bad outcomes. Maybe I should have made this clearer in the post. On the other hand, all the steps in my argument seem pretty likely to me, so I don't think one should assign over 90% probability to this plan failing at A&B. If people disagree, I think it would be useful to know which step they disagree with.
I agree my counter-example doesn't address point (C); I tried to make this clear in my Conclusion section. However, given the literal reading of the bolded statement in the book, and their general framing, I think Nate and Eliezer also think that we don't have a solution to A&B that's more than 10% likely to work. If that's not the case, that would be good to know, and would help to clarify some of the discourse around the book.
First of all, I had a 25% probability that some prominent MIRI and Lightcone people would disagree with one of the points in my counter-example, which would lead to discovering an interesting new crux and a potentially enlightening discussion. In the comments, J Bostock in fact came out disagreeing with point (6), plex is potentially disagreeing with point (2), and Zack_m_Davis is maybe disagreeing with point (3), though I also think it's possible he misunderstood something. I think this is pretty interesting, and I thought there was a chance that, for example, you would also disagree with one of the points, which would have been good to know.
Now that you don't seem to disagree with the specific points in the counter-example, I agree the discussion is less interesting. However, I think there are still some important points here.
My understanding is that Nate and Eliezer argue that it's incredibly technically difficult to cross from the Before to the After without everyone dying. If they agree that the AI Collective proposal is decently likely to work, then the argument shouldn't be that it's overall very hard to cross, but that it's very hard to cross in a way that stays competitive with other, more reckless actors who are a few months behind you. Or that even if you are going it alone, you need to stop the scaling at some point (potentially inside the superintelligence range) and shouldn't scale up to the limits of intelligence. But these are all different arguments!
Similarly, people argue about how much coherence we should assume from a superintelligence, how much it will approximate a utility maximizer, etc. Again, I want to know whether MIRI is arguing about all superintelligences, or only about the most likely ways we will design one under competitive dynamics.
Others argue that the evolution analogy is not such bad news after all, since most people still want children. MIRI argues back that no, once we have higher technology, we will create ems instead of biological children, or we will replace our normal genetics with designer genes, so evolution still loses. I wanted to write a post arguing back against this by saying that I think there is a non-negligible chance that humanity will settle on a constitution that gives one man one vote and equal UBI while banning gene editing, so it's possible we will fill much of the universe with flesh-and-blood, non-gene-edited humans. And I wanted to construct a different analogy (the one about the Demiurge in the last footnote) that I thought could be more enlightening. But then I realized that once we are discussing aligning 'human society' as a collective to evolution's goals, we might as well directly discuss aligning AI collectives, and I'm not sure MIRI even disagrees on that one. I think this confusion has made much of the discussion about the evolution analogy pretty unproductive so far.
In general, I think there is an equivocation in the book between "this problem is inherently nigh impossible to technically solve given our current scientific understanding" and "this problem is nigh impossible to solve while staying competitive in a race". These are two different arguments, and I think a lot of confusion stems from it not being clear what exactly MIRI is arguing for.
I certainly agree with your first point, but I don't think it is relevant. I specifically say in footnote 3: "I’m aware that this doesn’t fall within 'remotely like current techniques', bear with me." The part with the human ems is just to establish a comparison point used in later arguments, not actually part of the proposed counter-example.
In your second point, are you arguing that even if we could create literal full ems of benevolent humans, you would still expect their society to eventually kill everyone due to unpredictable memetic effects? If this is people's opinion, I think it would be good to state it explicitly, because it would be an interesting disagreement between different people. I personally feel pretty confident that if you created an army of ems from me, we wouldn't kill all humans, especially if we implemented some of the reasonable precautionary measures discussed under my point (2).
I agree that running the giant collective at 100x speed is not "normal conditions". That's why I have two different steps, (3) for making the human level AIs nice under normal conditions, and (6) for the niceness generalizing to the giant collective. I agree that the generalization step in (6) is not obviously going to go well, but I'm fairly optimistic, see my response to J Bostock on the question.
Thanks, I appreciate that you state a disagreement with one of the specific points, that's what I hoped to get out of this post.
I agree it's not clear that the AI Collective won't go off the rails, but it's also not at all clear to me that it will. My understanding is that the infinite backrooms are a very unstructured, free-floating conversation. What happens if you try something analogous to the precautions I list under points 2 and 6? What if you constantly enter new, fresh instances into the chat who only read the last few messages, and whose system prompt directs them to pay attention to whether the AIs in the discussion are going off-topic or slipping into woo? These new instances could either just warn older instances to stay on-topic, or they could have the moderation rights to terminate and replace some old instances; there can be different versions of the experiment. I think with precautions like this, you can probably stay fairly close to a normal-sounding human conversation (though probably it won't be a very productive conversation after a while and the AIs will start going in circles in their arguments, but I think this is more of a capabilities failure).
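To make the kind of setup I have in mind a bit more concrete, here is a minimal toy sketch of one moderation pass (Python; `llm_respond` and `spawn_fresh_instance` are hypothetical placeholders for whatever model API and instance management the experiment actually uses, not any real library):

```python
# Toy sketch of the "fresh moderator instances" idea. llm_respond and
# spawn_fresh_instance are hypothetical placeholders, not a real API.
# Assumes active_instances is a list of plain-string instance names.

WINDOW = 10  # a fresh moderator only reads the last few messages

MODERATOR_SYSTEM_PROMPT = (
    "You are a fresh instance joining an ongoing multi-agent conversation. "
    "Read only the recent messages. If the discussion has drifted off-topic "
    "or into incoherent woo, reply 'WARN: <reason>'. If it is badly derailed, "
    "reply 'REPLACE: <names of instances to swap out>'. Otherwise reply 'OK'."
)

def moderate_step(transcript, active_instances, llm_respond, spawn_fresh_instance):
    """One moderation pass: a brand-new instance reviews only the recent window."""
    recent = transcript[-WINDOW:]
    verdict = llm_respond(MODERATOR_SYSTEM_PROMPT, recent)

    if verdict.startswith("WARN"):
        # Soft version: the warning itself just becomes part of the conversation.
        transcript.append({"role": "moderator", "content": verdict})
    elif verdict.startswith("REPLACE"):
        # Harsher version: terminate the named instances and bring in fresh ones.
        for name in [n for n in list(active_instances) if n in verdict]:
            active_instances.remove(name)
            active_instances.append(spawn_fresh_instance())
    return transcript, active_instances
```

Different versions of the experiment would vary the window size, how often fresh moderators are injected, and whether they can only warn or can actually replace old instances.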
I don't know how this will shake out once the AIs are smarter and can think for months, but I'm optimistic that the same forces that remind the collective to focus on accomplishing their instrumental goals instead of degenerating into unproductive navel-gazing will also be strong enough to remind them of their deontological commitments. I agree this is not obvious, but I also don't see very strong reasons why it would go worse than a human em collective, which I expect to go okay.
Yes, I've read the book. The book argues about superhuman intelligence, though, while point (3) is about smart human level intelligence. If people disagree with point (3) and believe that it's close to impossible to make even human level AIs basically nice and not scheming, that's an interesting and surprising new crux.
I think an important point is that people can be wrong about timelines in both directions. Anthropic's official public prediction is that they expect a "country of geniuses in a data center" by early 2027. I heard that Dario previously predicted AGI to come even earlier, by 2024 (though I can't find any source for this now and would be grateful if someone found a source or corrected me if I'm misremembering). Situational Awareness predicts AGI by 2027. The AI safety community's most successful public output is called AI 2027. These are not fringe figures but some of the most prominent voices in the broader AI safety community. If their timelines turn out to be much too short (as I currently expect), then I think Ajeya's predictions deserve credit for pushing against these voices, and not only blame for stating too long a timeline.
And I feel it's not really true that you were just saying "I don't know" and not implying any predictions yourself. You had the 2030 bet with Bryan. You had the tweet about children not living to see kindergarten. You strongly pushed back against the 2050 timelines, but as far as I know the only time you pushed back against the very aggressive timelines was your kindergarten tweet, which still implies 2028 timelines. You are now repeatedly calling people who believed the 2050 timelines total fools, which would imo be a very unfair thing to do if AGI arrived after 2037, so I think this implies high confidence on your part that it will come before 2037.
To be clear, I think it's fine, and often inevitable, to imply things about your timeline beliefs by e.g. what you do and don't push back against. But I think it's not fair to claim that you only said "I don't know": your writing was (perhaps unintentionally?) conveying an implicit belief that an AI capable of destroying humanity would come with a median of 2028-2030. I think this would have been a fine prediction to make, but if AI capable of destroying humanity comes after 2037 (which I think is close to 50-50), then I think your implicit predictions will fare worse than Ajeya's explicit predictions.