Disclaimer: this article was translated from Russian using GPT. It has been partially proofread, but my English is weak, so occasional translation artifacts may remain.
Show me exactly where you break Soares' "4 background claims"; if you don't break them, it means you agree the assumption is correct. It makes sense to assume the premise could be correct because, all other things being equal, if I'm highly uncertain, an error in one direction costs a lot of effort, while an error in the other costs everything we've got.
(my friend Slava to me, in response to my skepticism)
I've already irreversibly invested in reading safety articles, so let my role be to deeply doubt the initial premises behind these expectations. If they are so strongly tied to reality, they won't break under skepticism, right?
In this essay I will attack Soares' "4 background claims": question them and show why, even where I agree with some of them, my expectations of the future differ from Nate's, or why my level of uncertainty is higher.
- If you don’t break them, it means you agree that the assumption is correct.
Let’s take a look at these 4 premises—do I agree with them as predictions?
We call this ability “intelligence,” or “general intelligence.” This isn’t a formal definition — if we knew exactly what general intelligence was, we’d be better able to program it into a computer — but we do think that there’s a real phenomenon of general intelligence that we cannot yet replicate in code.
I assume that by "general" this refers to the ability to solve a wide variety of tasks. Is this premise sufficient for conclusions about the existential threat of AI? For me, not yet.
This premise was just as valid 1,000 years ago in the same formulation, yet existential risk was lower (there were no nuclear weapons, no presumed AGI threat, no other risks associated with advanced technologies).
How can such a statement generate inaccurate predictions? When it is generalized as: "Since the ability to achieve goals in many areas exists, then in this particular area, where progress with frightening consequences has not yet been achieved, it could be achieved, and soon."
People have systematically erred in their forecasts when observing a sudden leap in progress within one field and generalizing it to all other areas. In rarer cases the error went the other way: they were overly confident that a sharp leap did not signal further acceleration of progress in that field soon after. The assessment depends on too many parameters, which people usually fail to account for in full. Specialists in the field may hold those parameters, but there are hundreds of examples where groups of specialists from different countries, despite having the most accurate models of a complex area, failed in their predictions because of hasty generalizations: intuitions were transferred from one area to another even though some of the transfer mechanisms did not match.
If some key part of this general intelligence was able to evolve in the few million years since our common ancestor with chimpanzees lived, this suggests there may exist a relatively short list of key insights that would allow human engineers to build powerful generally intelligent AI systems.
Maybe. Or maybe not. It's possible that a "universal AI" (for which no one has an exact model) might not be achievable in a way that aligns with generalized intuitions about it. Recalling a rough quote from some alignment article, "Show me at least one detailed scenario where we survive," I feel irritated, because I detect the implicit assumption that "we die," and I would rather the request were "show me at least one detailed scenario where we die."
If I postulate that an invisible dragon will appear in my garage in a year, I could start arguing with skeptics by saying, “Show me a detailed scenario in which the invisible dragon does NOT appear in my garage in a year.” They respond with something, but it’s not even a 10-page essay, and I claim the scenario isn’t detailed enough to convince me.
Alternatively, someone could press me to provide a detailed, likely scenario of the invisible dragon appearing in my garage in a year. I’m unlikely to write a 10-page essay with a detailed scenario, and even if I do, I suspect it would reveal many false assumptions and other erroneous predictions.
Thus, when I see a generalized statement like “X has many abilities to do many things in various domains,” it doesn’t shift my confidence that this X will lead the world in the near future into a highly specific, unlikely state (completely destroying humanity within a few years is a much more complex task than not destroying it).
Often, when you don't have an exact model of a process but make System 1 (in Kahneman's sense) generalizations from past cases to guess an outcome, the guessing works because some past patterns align and the generalization succeeds. However, when it comes to AGI, where some patterns have never been repeated before, the generalization might fail.
And I dislike the comfort in statements built on assumptions like "we will definitely die, and I don't have a single exact scenario," because believing such a thing once opens the door to overconfidence in other similar scenarios in the future. This belief also influences many motivations, just as the accuracy of one's convictions depends on how comfortable or uncomfortable such predictions feel.
"Humans have a highly universal ability to solve problems and achieve goals across various domains."
The phrase "a universal ability to solve problems" might create expectations that nearly all conceivable problems can be solved and that there’s no clear boundary where this ceases to be true. The author might interpret it this way, using it as a plug when trying to convince someone of a vague prediction, searching for arguments about why success is likely. Here, the umbrella-like nature of the "universal" cluster comes into play, creating an illusion of the broad applicability of human intelligence and its tools to a wide range of tasks—including tasks we’re not even aware of yet.
If your goal is to instill fear, you might optimize for the generalizability of these capabilities and outcomes. The broader they seem, the greater the uncertainty, and fear often increases with uncertainty about outcomes. However, the concept of universality loses sensitivity to specific scenarios, allowing the word "universal" to imply vague predictions like "it will somehow kill everyone" or "a complex, unlikely scenario will somehow occur."
Suggested revision: Replace the phrase with something like:
"Humans have the ability to solve many different problems, more than other animals, but this ability is limited by the laws of physics and the environment."
If you observe the consequences of human cognitive systems without understanding the details of how they work, you might lump them all into one bucket, label it with a specific word, say "intelligence," and then recall the achievements of this mechanism (building this, solving that). From there, it's easy to start accumulating expectations about a "vague mechanism related to computation with words and images." You might then combine this with remembered results from the past and use generalizations to predict the future.
But these generalizations, and the use of the same word, "intelligence," for "similar" phenomena, can lead to false predictions.
I am worried that, despite the professed "lack of formalization" and the absence of a fixed list of expectations (and of the domains of those expectations), Nate (in other articles) allows himself bold predictions with strong signaling of confidence amid such great uncertainty.
***
Pay attention to how your vague expectations (connected to something nonspecific yet intrinsically unpleasant when "felt" internally) may become insensitive to the environmental constraints that limit your prediction. The vaguer your scenario and the greater the uncertainty, the more I expect people to succumb to this tendency. When unpleasant feelings arise from modeling something vague and undesirable, there is a temptation to agree with the conclusion that this unpleasant outcome is certain (to escape the uncertainty, which is physiologically uncomfortable).
Example: I noticed in myself that it is physiologically unpleasant to imagine crossing on a red light, even when I can see that there are no cars for 400 meters to the left or right. Observing this mechanism, I could verbalize it as "I don't want to cross on red; something terrible will happen." But when I asked myself what exactly would be terrible, and how, given that there are no cars, my analytical part would say, "I don't know how. Yes, there are no cars, and 100% none will appear within 15 seconds." Yet it is still scary, out of habit. If I didn't have a block on the word "terrible," that is what I would have called it.
That is, physiologically driven fear and high confidence are present even in a precision-obsessed rationalist, and physiology wins out even there.
What can I say, then, about situations where you DO NOT KNOW whether a car will come around the corner, whether it will be something else entirely, and where there is a ton of other uncertainty on top? Fear, by the same pattern that stopped me from crossing the road, tempts you even more strongly to neglect the precision of models of "how exactly something will happen" and to settle for the bottom line, "something bad and possibly fatal," because that bottom line feels more comfortable than the uncertainty.
Researchers at MIRI tend to lack strong beliefs about when smarter-than-human machine intelligence will be developed. We do, however, expect that (a) human-equivalent machine intelligence will eventually be developed (likely within a century, barring catastrophe); and (b) machines can become significantly more intelligent than any human.
Humans use their intelligence to create tools and plans and technology that allow them to shape their environments to their will (and fill them with refrigerators, and cars, and cities). We expect that systems which are even more intelligent would have even more ability to shape their surroundings, and thus, smarter-than-human AI systems could wind up with significantly more control over the future than humans have.
P.S.
"The argument about the importance of artificial intelligence rests on these four statements"
Important for what purposes? In a vacuum again? For most purposes, guess which ones? Should I replace the word "important" here with "important for survival or for pleasant consequences, which are supposedly impossible without these four conditions being met"? Should I read it as important to Nate Soares, in the sense that the word "important" is linked to his stress about the absence of something related to these premises? On my map, the word "important" almost always points to feelings and preferences. People abuse this word via the mind projection fallacy, turning "important" into a seemingly stable property of an object, so that you consume this apparent property regardless of who said the word; an attitude is thereby transferred from one person to another, and since the property seems stable, you acquire stress about losing this thinking habit, which in turn gives you motivation to preserve the property. That is, to experience anxiety about the absence of importance.
Since Nate doesn't decode the word "important" here, I'll have to decode it in my own way, and I'll decode it like this: the argument about the importance of artificial intelligence, I suppose, means that Nate wants to transfer to you his emotional attitude toward the things he is saying. The word "important," I expect, is always connected with stress about the absence of the important thing. Even if it comes from a place of happiness, in this essay I expect Nate's task is to add stress to you about what he wrote, and calm about his arguments. I prefer to remain uncertain about that stress, moving slightly in the opposite direction based on the current evidence, and will be very skeptical of Nate's current arguments, creating artificial anxiety about them for myself so that I have motivation to look for counter-theses and do not end up a believer in a highly uncertain yet unambiguous outcome.