Am I secretly excited for AI getting weird?

porby

This post is arguably darker than my other one. I don't make any persuasive arguments about AI forecasting here; if you don't feel like looking at doominess, feel free to skip this.

I've noticed a few instances of what look like people assuming that those who are visibly concerned about AI risk don't really buy into the full weight of what they're saying.

Recently, I came across this (hi, niknoble!):

As a specific example of what I suspect is a bit of cognitive dissonance, look at the recent post on AGI by porby, which predicts AGI by 2030. I loved reading that post because it promises that the future is going to be wild. If porby is right, we're all in for an adventure. Based on the breathless tone of the post, I would surmise that porby is as excited by his conclusion as I am. For example, we have this excerpt:
This is crazy! I'm raising my eyebrows right now to emphasize it! Consider also doing so! This is weird enough to warrant it!
Would you have predicted this in 2016? I don't think I would have!
Does this strike you as someone who dreads the arrival of AGI? It seems to me like he is awaiting it with great anticipation.
But then in the comments on the post, he says that he hopes he's wrong about AGI! If you're reading this porby, do you really want to be wrong?

This is an excellent example of the kind of thing I'm talking about, so I'm going to use it.

I think my writing and speaking style defaults to a kind of lightness that can be misleading. So let me try to write something a little darker.

Well, do you?

Because I don't think P(doom | AGI) is anywhere close to 0, especially for AGI developed on very short timescales:
YES, I DO WANT TO BE WRONG.

The kind of "excitement" I feel about near-term AGI is adjacent to hearing the tornado siren, looking at the radar, seeing the warned cell moving straight east, walking out on my porch to look at a black wall of rain a mile or two away, and seeing the power flashes straight west of me as the tornado rips lives apart. While grabbing a mattress to throw over a tub, I'm doing some quick mental calculations: the statistical rarity of EF-3 or stronger tornadoes, will it stay on the ground, how large is it (glance at the hook on the reflectivity map), how sturdy is this house (the feeling of the entire house shunting to one side during an earlier storm's 120 mph winds wasn't promising), how much damage would a near miss cause? All the while, telling my family to get their shoes, don't worry, we have time (do we have time? probably), just get into the bathroom.

It didn't stay on the ground. Also, we have a storm shelter now. It only took about 6 close calls to bite that bullet!

More than excitement

You know that voyeuristic "excitement" of a really bad hurricane about to make landfall? Something wildly out of the ordinary, something that breaks the sense of normalcy and reminds you that human civilization is fragile? It's a weird, darkly attractive kind of novelty.

Watching COVID-19 in January 2020 felt that way, for a while. It was distant and not here, so it felt like an almost fictional threat. Within a few weeks, it became something else. With a sense of vertigo, I settled into the realization that it was really happening. I told my parents to go buy anything they wouldn't want to run out of if there was a run on it at the stores, because things were going to get bad and a lot of people were going to die. I explained that they had medical backgrounds that put them at much higher risk, and that hospitals might get overwhelmed soon, so they shouldn't go inside places with other people if they could avoid it. And if they did, they should wear masks.

The transition of the threat to inevitable reality made me feel bad about the feeling of excitement even though the feeling isn't intrinsically positive or rewarding. Something trying to kill you is exciting in its own way, but some part of me wanted to press undo. Reality was going in a direction I really didn't want, and it seemed like if I could apologize for the desire to gawk, it could make things turn out okay. But it doesn't work that way.

Reality has no safety net

Seeing advancements like GPT-3 (when I hadn't yet properly priced in previous work) was like that, except stronger. That same kind of vertigo and a little bit of tunnel vision as the implications of the mechanisms I was looking at spooled out into the future. The transition from feeling fictional to being inevitable reality.

Do I bother saving for retirement? Is having kids okay? If it goes bad, is it going to go bad fast enough that the brief suffering of children is an acceptable risk? What do I say to the people that trust me when they ask me for my true thoughts, when I know there's a good chance they'll believe me? Is it enough to say, "if you're not in a position to move the needle, just live for the future where you still exist, because that's the only future where you can have regrets?" Is that comforting?

If you stare at what it means for things like tuberculosis, malaria, aging, or any of the other old monsters we haven't yet extinguished to still exist, you know that we're allowed to lose.

If you put the button in front of me to summon up a random AGI of the kind that would seem to follow from today's lackadaisical approaches, I couldn't press it yet. Even knowing that the potential upside is the end of those old monsters. Even knowing that by choosing not to press it, I need to be ready to face down the mother cradling the corpse of their infant child asking me why I didn't do something when I had the power to do so.

Because it can be worse.

Different parts of me get excited about this in different directions.

On the one hand, I see AI alignment as highly solvable. When I scan out among a dozen different subdisciplines in machine learning, generative modeling, natural language processing, cognitive science, computational neuroscience, predictive coding, etc., I feel like I can sense the faint edges of a solution to alignment that is already holographically distributed among collective humanity.

Getting AGI that has the same natural abstractions that biological brains converge on, that uses interpretable computational architectures for explicit reasoning, that continuously improves its internal predictive models of the needs and goals of other agents within its sphere of control and uses these models to motivate its own behavior in a self-correcting loop of corrigibility, that cares about the long-term survival of humanity and the whole biosphere; all of this seems like it is achievable within the next 10-20 years if we could just get all the right people working together on it. And I'm excited at the prospect that we could be part of seeing this vision come to fruition.

On the other hand, I realize that humanity is full of bad-faith actors and otherwise good people whose agendas are constrained by perverse local incentives. Right now, deep learning is prone to falling for adversarial examples, completely failing to recognize what it's looking at when the texture changes slightly. Natural language understanding is still brittle, with transformer models probably being a bit too general-purpose for their own good. Reinforcement learning still falls prey to Goodharting, which would almost certainly lead to disaster if scaled up sufficiently. Honestly, I don't want to see an AGI emerge that's based on current paradigms just hacked together into something that seems to work. But I see groups moving in that direction anyway.

Without an alignment-adjacent paradigm shift that offers competitive performance over existing models, the major developers of AI are going to continue down a dangerous path, while no one else has the resources to compete. In this light, seeing the rapid progress of the last decade from AlexNet to GPT-3 and DALL-E 2 creates the sort of foreboding excitement that you talked about here. The train is barreling forward at an accelerating pace, and reasonable voices may not be loud enough over the roar of the engines to get the conductor to switch tracks before we plunge over a cliff.

I'm excited for the possibilities of AGI as I idealize it. I'm dreading the likelihood of a dystopic future with no escape if existing AI paradigms take over the world. The question becomes, how do we switch tracks?

This nicely sums up the feeling of "excitement" I get when thinking about this sort of thing too. It's more of an anxiety really. I agree it's very related to the feeling of COVID-19 as it was unfolding in the first weeks of February and March 2020. It's also similar to the feeling I got when Trump was elected, or on Jan 6. It's just this feeling like something big and momentous is happening or is about to happen. That all of a sudden something totally consequential that you hadn't previously considered is coming to pass. It's a weird, doom-y sort of feeling.

Yup. I'd liken it to the surreality of a bad dream where something irrevocable happens, except there's no waking up.

Don't worry, as soon as AGI goes live we'll all have a peaceful, eternal rest.