After some introspection, I realized my timelines are relatively long, which doesn't seem to be shared by most people around here. So this is me thinking out loud, and perhaps someone will try to convince me otherwise. Or not.

First things first, I definitely agree that a sufficiently advanced AI can pose an existential risk -- that's pretty straightforward. The key part, however, is "sufficiently advanced".

Let's consider a specific claim "Within X years, there will be a superintelligent AGI powerful enough to pose a significant existential threat", where X is any number below, say, 30[1].

Since this is a positive claim, I can't exactly refute it from thin air. Let's instead look at the best arguments for it I can think of, and why they ultimately don't convince me. Due to the temporal nature of the claim, they should involve recent technological advances and new AI capabilities[2].


With this preamble out of the way, let's look at the biggest recent achievements/fields of research, and why they won't kill us just yet.

  1. AlphaGo proves machines already outsmart humans
    The game of Go was, apparently, considered complex enough that playing it well requires general intelligence. Perhaps once upon a time people thought that -- I never did, and still don't.
    In a sense, board games like this are extremely simple. You have a well defined, discrete set of fully-observable states and actions, fixed and simple rules of the game, and a very clear objective - win the game. The complexity of this game is far lower than that of a "simple" task like "make me a cup of coffee without killing any infants along the way" if you really dig into the details. 
    Looking at the specific algorithm that solved Go -- MCTS never really caught on in general RL practice beyond similar board games.
    So all in all, I don't think that being able to play Go necessarily extends to a general intelligence; and I don't think that the specific way "we" solved Go will do that either.
  2. Diffusion Models exhibit general intelligence
    Dall-E, Stable Diffusion, Midjourney -- all of them extremely impressive, with real-world impacts. All of them solving an extremely narrow (though admittedly difficult) task of reversing the image captioning process. Even though there's a wide variety of concepts, objects, relations between them (which might point at generality), the task is ultimately only one -- change the pixels to increase the likelihood.
    There's no agency, no reactivity, just learning a statistical model of what pixels generally correspond to a "bear", and what pixels in relation to the bear represent "riding a laser unicorn". It's an amazing narrow AI system -- but sufficiently detached from any decision processes that I don't think it points towards any existential risk.
  3. General RL progress -- RL systems are basically AGI already
    If you asked me a few years ago "What is the field of ML that's the closest to whatever will eventually create an AGI", I would point to reinforcement learning -- that's why I decided to make it (the beginning of) my career. If you asked me the same question today, I would probably have the same answer -- but it will be much more difficult than I thought at the time.
    RL is by far the closest we are to creating agents -- at least following the classical definition of proactive (reward function), reactive (observation-action loop), and with social ability (in some multiagent cases).
    RL is extremely versatile -- it can play Chess, Go, Starcraft, it can control industrial processes, it can even simulate virtual crowds.
    But here's the well-guarded secret: RL... kinda sucks. Sure, the formalism is general enough to describe a simulated paperclip maximizer, but the modern algorithms are not. Deploying RL in the real world is a beast in and of itself, since our best algorithms still often require hundreds, thousands, or even millions of full rollouts, depending on the task.
    Can we do Sim2Real? Sure. But good luck building a simulation good enough that the agent doesn't try to create a paperclip out of the difference between 0.1 + 0.2 and 0.3.
    I expect (or hope?) that there is a paradigm shift somewhere ahead of us that will improve RL algorithms significantly. Perhaps then, I will see the situation differently. But as it stands, I don't see poor old PPO turning me into paperclips.
  4. ChatGPT, GPT-4 are superintelligent and general
    The star of the day. It's impressive, it's scary, it's versatile, and... it tests on the training set.
    Don't get me wrong -- GPT-4 is a tremendously powerful model. It's truly amazing at predicting the next token to an extent that I definitely didn't expect. It can be a great teacher to get you up to speed on any knowledge that's in its database, it can solve many common programming tasks, and spit out information-shaped sentences to fill out a cover letter. But try to go beyond that, and it's not so powerful.
    Now, admittedly I haven't run proper evaluations in this respect, so I'm relying on some of the officially reported results, and my own anecdata. Both seem to suggest that ChatGPT is excellent at regurgitating information from its training set, but not that great at solving new problems. It apparently solves algorithmic tasks really well -- as long as they're old enough to be in its training data. Anything newer, and the performance tanks. This perfectly mirrors my experience trying to get it to write code -- if it's a simple Bootcamp 101 "Make a website with React" task, sure, it will do great. If it's something truly new and complex -- expect many exceptions or segfaults. Possibly infinitely many of them. Apologies, it got confused.
    And let's not forget the point mentioned in Diffusion models -- as far as we know, GPT-4 doesn't have an internal state, it doesn't have goals or perceptions, it's not an agent. And without that[3], I don't see an existential risk, even if we scale it up 1000x. 
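The classical agent definition from point 3 -- proactive via a reward function, reactive via an observation-action loop -- can be sketched in a few lines. This is a toy, hypothetical environment of my own invention, not any real RL library API; it only illustrates the loop structure that diffusion models and (plugin-less) GPT lack:

```python
# A minimal sketch of what makes RL "agentic" in the classical sense:
# reactive (observe, then act) and proactive (a goal encoded as reward).
# ToyEnv is a hypothetical stand-in: walk a 1-D position down to 0.

class ToyEnv:
    def __init__(self):
        self.pos = 5

    def step(self, action):          # action is -1 or +1
        self.pos += action
        reward = -abs(self.pos)      # proactive part: the goal lives in the reward
        done = self.pos == 0
        return self.pos, reward, done

env = ToyEnv()
obs, total_reward, done = env.pos, 0.0, False
while not done:                      # reactive part: the observation-action loop
    action = -1 if obs > 0 else 1    # a trivial hand-written "policy"
    obs, reward, done = env.step(action)
    total_reward += reward

print(total_reward)  # -10.0 (rewards -4, -3, -2, -1, 0 along the way)
```

Nothing here learns, of course -- the point is only that "agent" means this closed perceive/act loop driven by a reward, which a pure next-token or denoising model doesn't have.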
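The 0.1 + 0.2 vs 0.3 gap mentioned in point 3 is a real floating-point artifact, easy to verify in any language with IEEE 754 doubles:

```python
# The kind of simulator imperfection a reward-hacking agent can exploit:
# binary floating point cannot represent 0.1, 0.2, or 0.3 exactly.
a = 0.1 + 0.2
print(a)             # 0.30000000000000004
print(a == 0.3)      # False
print(abs(a - 0.3))  # tiny, but nonzero -- a crack in the simulation
```

A simulator is full of such cracks, and an optimizer that's strong enough will find the ones that matter.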


As it stands, I'm pretty convinced[4] that we need a breakthrough[5] (or two) to get to a level of intelligence that's general, superhuman, and potentially threatening. None of our current methods are powerful enough that simply scaling, or incrementally improving them will get us there. On the other hand, there are many great things that these systems can do to improve our lives, so for the time being, I'll happily keep working on AI capabilities, even in the limited scope of my current research. 



  1. ^

    Considering the difference between 1993 and 2023, I have no clue what 2053 will be like.

  2. ^

    Any claim that doesn't rely on recent events might as well have been made in 1023, when killer robots weren't a big concern [citation needed]. Note that "recent" is a very relative term, but I'm omitting the rise of computers and neural networks in general from this text.

  3. ^

    The news of the week is the plugin system which might move it a step towards agent-ishness, but imo it's a rather small step in the context of existential risk.

  4. ^

    Note: if this were political Twitter, I'd fully expect a response along the lines of "Omg you're missing the absolute basics, educate yourself before posting". While I admittedly have not read every single piece of relevant literature, I'd still estimate that over the years I did much more reading and thinking on the topic than the vast majority of the (global/western) population. Possibly even more than the average AI researcher, since x-risk only recently started entering the mainstream.

  5. ^

    Something on a similar scale to the recent rediscovery of neural networks and their effectiveness.


Interesting read.

While I also have experienced that GPT-4 can't solve the more challenging problems I throw at it, I also recognize that most humans probably wouldn't be able to solve many of those problems either within a reasonable amount of time.

One possibility is that the ability to solve novel problems follows an S curve: it took a long time for AI to become better at novel tasks than 10% of people, but it might go quickly from there to outperforming 90%, and then increase very slowly from there.

However, I fail to see why that must necessarily be true (or false), so if anyone has arguments for/against, they are more than welcome.

Lastly, I would like to ask the author if they can give an example of a problem such that, if solved by AI, they would be worried about "imminent" doom? "New and complex" programming problems are mentioned, so if any such example could be provided, it might contribute to the discussion.
