The Crux List. The original text is included as a backup, but it formats much better on Substack, and I haven’t yet had time to re-format it for WordPress or LessWrong.
This post is a highly incomplete list of questions where I have large uncertainty, have observed strong disagreement with my perspective, or both, and where changing someone's mind could plausibly affect their assessment of how likely there is to be a catastrophe from loss of control of AGI, or how likely such a catastrophe is conditional on AGI being developed.
I hope to continue expanding and editing this list over time, if it proves useful enough to justify that, and perhaps to add links as well. I encourage suggestions of additional questions or other ways to improve it.
The failure of this list to converge on a small number of core crux-style questions, I believe, reflects and illustrates the problem space, and helps explain why these questions have been so difficult and have produced such wide and highly confident disagreements. There is no compact central disagreement. There are many different ones that influence and interact with each other in complex ways, and different people emphasize and focus on different aspects and bring different instincts, heuristics, experiences and knowledge.
When looking through this list, you may encounter questions you had not even thought to consider, either because you did not realize the answer was non-obvious, or because the consideration never occurred to you in the first place. Those could be good places to stop and think.
A lot of these questions take the form of ‘how likely is it, under Y conditions, that X will happen?’ It is good to note such disagreements, while also noticing that many such questions come out of hopeful thinking or searching for and backward chaining from non-catastrophic outcomes or the prospect of one. Usually, if your goal is to figure things out rather than locate a dispute, a better question would be, in that scenario: What happens?
It can still be useful to see what others have proposed, as they will have ideas you missed, and sometimes those will be good ideas. Other times, it is important to anticipate their objections, even if they are not good.
If you are interested only in the better questions of ‘what happens?’ rather than in classifying whether or how outcomes are catastrophic, you can skip the first two sections and start at #3.
If you have observed cruxes or other good questions that you do not see on this list, and especially if you hold one yourself, please share them in the comments, with or without saying what your answers are.
The list is long because people have very different intuitions, ideas, models and claims about the future, for a variety of reasons, and focus in different places. I apologize that I have had neither the time to make it longer nor the time to make it shorter.
Thus, reading straight through the list is probably not your best strategy. Instead, focus on the sections, if any, that are relevant and interesting to you.
Thanks for engaging. I hope this was helpful.
The link to the substack version says "private."
That error was fixed, but let's say 'please help fix the top of the post, for reasons that should be obvious, while we fix that other bug we discussed.'
I think I fixed the top-of-post again, but, I thought I fixed it yesterday and I'm confused what happened. Whatever's going on here is much weirder than usual.
The target of the second hyperlink appears to contain some HTML, which breaks the link and might be the source of some other problems:
If everything is a crux, is anything a crux?
No, not in general, which is one of the main points - I wrote this partly to illustrate that there was no single thing that one could address to handle that large a portion of debates, objections or questions.
Nice list, though there's a prerequisite crux before these.
i.e. What is 'intelligence'?
More specifically, I think the crux is whether we mean direct or amortized optimization when talking about intelligence (or selection vs control if you prefer that framing).
Huh, selection vs control is an interesting way to look at it, though I'm not sure if it's a dichotomy or more of a multi-dimensional spectrum.
Gave an upvote though for raising the point.
I do think this is worth pondering in some form, and asking whether the question is implied or should be a subquestion somewhere...
What is an AGI? I have seen a lot of "No True Scotsman" arguments around this one.
This seems like a non sequitur: there might or might not even be such a thing as 'AGI' depending on how one understands intelligence, which is why it is a prerequisite crux.
Can you clarify what you're trying to say?
Yeah, sorry about that. I didn't put much effort into my last comment.
Defining intelligence is tricky, but to paraphrase EY, it's probably wise not to get too specific since we don't fully understand Intelligence yet. In the past, people didn't really know what fire was. Some would just point to it and say, "Hey, it's that shiny thing that burns you." Others would invent complex, intellectual-sounding theories about phlogiston, which were entirely off base. Similarly, I don't think the discussion about AGI and doom scenarios gets much benefit from a super precise definition of intelligence. A broad definition that most people agree on should be enough, like "Intelligence is the capacity to create models of the world and use them to think."
But I do think we should aim for a clearer definition of AGI (yes, I realize 'Intelligence' is part of the acronym). What I mean is that we could have a vaguer definition of intelligence, but AGI should be better defined. I've noticed different uses of 'AGI' here on LessWrong. One definition is a machine that can reason about a wide variety of problems (some of which may be new to it) and learn new things. Under this definition, GPT-4 is pretty much an AGI. Another common definition on this forum is a machine capable of wiping out all humans. I believe we need to separate these two definitions, as that's really where the core of the crux lies.
I think a good definition for AGI is capability for open-ended development, the point where the human side of the research is done, and all it needs to reach superintelligence from that point on is some datacenter maintenance and time, so that eventually it can get arbitrarily capable in any domain it cares for, on its own. This is a threshold relevant for policy and timelines. GPT-4 is below that level (it won't get better without further human research, no matter how much time you give it), and ability to wipe out humans (right away) is unnecessary for reaching this threshold.
I think we also care about how fast it gets arbitrarily capable. Consider a system which finds an approach that can measure approximate actions-in-the-world-Elo (where an entity with a 200-point advantage in actions-in-the-world-Elo will choose the better action 76% of the time), but which uses a "mutate and test" method over an exponentially large space, so that each successive 100-point gain takes 5x as long to find as the previous one. Suppose it starts out with an actions-in-the-world-Elo 1000 points below that of an average human, and a time-to-next-improvement of one week. That hypothetical system is technically a recursively self-improving intelligence that will eventually reach any level of capability, but it's not one we need to worry much about unless it finds techniques to dramatically reduce the search space.
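The arithmetic in this hypothetical can be checked with a short sketch. It assumes the standard Elo logistic formula on a 400-point scale, and it reads "a 1 week time-to-next-improvement" as the first 100-point gain taking one week, with each later gain taking 5x as long as the one before. The function names are illustrative, not from the comment.

```python
import math


def elo_expected_score(rating_advantage: float) -> float:
    """Standard Elo expected score for the side with the given rating advantage.

    Assumes the usual logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** (-rating_advantage / 400.0))


def weeks_to_close_gap(gap_points: float, step_points: float = 100.0,
                       first_step_weeks: float = 1.0,
                       slowdown: float = 5.0) -> float:
    """Total weeks to climb `gap_points` in `step_points` increments.

    Assumption: the first increment takes `first_step_weeks`, and each
    subsequent increment takes `slowdown` times as long as the previous one,
    so the total is a geometric series.
    """
    steps = math.ceil(gap_points / step_points)
    return sum(first_step_weeks * slowdown ** k for k in range(steps))


if __name__ == "__main__":
    print(f"P(better action) at +200 Elo: {elo_expected_score(200):.2f}")
    weeks = weeks_to_close_gap(1000)
    print(f"Weeks to reach average-human level: {weeks:,.0f}")
    print(f"Approximate years: {weeks / 52:,.0f}")
```

Under these assumptions, a 200-point advantage gives an expected score of about 0.76, matching the 76% figure, and closing the 1000-point gap takes 1 + 5 + 25 + ... + 5^9 = 2,441,406 weeks, on the order of 47,000 years, before the system even reaches average-human level.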
Like I suspect that GPT-4 is not actually very far from the ability to come up with a fine-tuning strategy for any task you care to give it, and to create a simple directory of fine-tuned models, and to create a prompt which describes to it how to use that directory of fine-tuned models. But fine-tuning seems to take an exponential increase in data for each linear increase in performance, so that's still not a terribly threatening "AGI".
Sure, natural selection would also technically be an AGI by my definition as stated, so there should be subtext of it taking no more than a few years to discover human-without-supercomputers-or-AI theoretical science from the year 3000.
Defining intelligence is tricky, but to paraphrase EY, it's probably wise not to get too specific since we don't fully understand Intelligence yet.
That's probably true, but that would imply we would understand even less what 'artificial intelligence' or 'artificial general intelligence' are?
Spelling it out like that made me realize how odd talking about AI or AGI is. In no other situation that I've heard of would a large group of folks agree that there's a vague concept with some confusion around it and then proceed to spend the bulk of their efforts speculating on even vaguer derivatives of that concept.
This is cool. Something I might try later this week as an exercise is going through every question (at least at the top level of nesting, maybe some of the nested questions as well) and giving yes / no / it depends answers (or other short phrases, for non-Y/N questions), without much justification. (Some of the cruxes here overlap with ones I identified in my own contest entry. Some I think are unlikely to be truly important cruxes. Some I have a fairly strong and confident view on, but would not be surprised if my view is not the norm. Some I haven't considered in much detail at all...)
Definitely a lot of these are unlikely to be that crucial in the overall picture - but they all definitely have been points that people have relied on in arguments or discussions I've seen or been in, or questions that could potentially change my own model in ways that could matter, or both.