Response to Tyler Cowen’s Existential risk, AI, and the inevitable turn in human history

Zvi

Predictions are hard, especially about the future. On this we can all agree.

Tyler Cowen offers a post worth reading in full in which he outlines his thinking about AI and what is likely to happen in the future. I see this as essentially the application of Stubborn Attachments and its radical agnosticism to the question of AI. I see the logic in applying this to short-term AI developments the same way I would apply it to almost all historic or current technological progress. But I would not apply it to AI that passes sufficient capabilities and intelligence thresholds, which I see as fundamentally different.

I also notice a kind of presumption that things in most scenarios will work out and that doom is dependent on particular ‘distant possibilities,’ that often have many logical dependencies or require a lot of things to individually go as predicted. Whereas I would say that those possibilities are not so distant or unlikely, but more importantly that the result is robust: once the intelligence and optimization pressure that matters is no longer human, most of the outcomes are existentially bad by my values, and one can reject or ignore many or most of the detail assumptions and still see this.

My approach is, I’ll respond in-line to Tyler’s post, then a conclusion section will summarize the disagreements.

In several of my books and many of my talks, I take great care to spell out just how special recent times have been, for most Americans at least. For my entire life, and a bit more, there have been two essential features of the basic landscape:

1. American hegemony over much of the world, and relative physical safety for Americans.

2. An absence of truly radical technological change.

I notice I am still confused about ‘truly radical technological change’ when in my lifetime we went from rotary landline phones, no internet and almost no computers to a world in which most of what I and most people I know do all day involves their phones, internet and computers. How much of human history involves faster technological change than the last 50 years?

When I look at AI, however, I strongly agree that what we have experienced is not going to prepare us for what is coming, even in the most slow and incremental plausible futures that don’t involve any takeoffs or existential risks. AI will be a very different order of magnitude of speed, even if we otherwise stand still.

Unless you are very old, old enough to have taken in some of WWII, or were drafted into Korea or Vietnam, probably those features describe your entire life as well.

In other words, virtually all of us have been living in a bubble “outside of history.”

Now, circa 2023, at least one of those assumptions is going to unravel, namely #2. AI represents a truly major, transformational technological advance. Biomedicine might too, but for this post I’ll stick to the AI topic, as I wish to consider existential risk.

#1 might unravel soon as well, depending how Ukraine and Taiwan fare. It is fair to say we don’t know, nonetheless #1 also is under increasing strain.

The relative physical safety we enjoy, as I see it, mostly has nothing to do with American hegemony, and everything to do with other advances, and with the absurd trade-offs we have made in the name of physical safety, to the point of letting it ruin our ability to live life and our society’s ability to do things.

When there is an exception, as there recently was, we do not handle it well.

Have we already forgotten March of 2020? How many times in history has life undergone that rapid and huge a transformation? According to GPT-4, the answer is zero. It names The Black Death, Industrial Revolution and World War II, while admitting they fall short. Yes, those had larger long-term impacts, by far (or so we think for now, I agree but note it is too soon to tell), yet they impacted things relatively slowly.

I for one would call my experience of Covid living in history, in Tyler’s sense. I would also note that almost all of the associated changes were negative. Life really did get much worse, and to this day remains much worse than the counterfactual, with major hits to our health, economy, currency, national debt, society, social links and institutional trust.

In several other ways, too, I have felt like I was living in history, even if I’ve known no war or danger of conquest. Cultural change in my lifetime and especially in the last 10 years has been extremely rapid on many fronts, far more than I would expect to find in most 10-year historical periods, whatever you think of the changes. My children’s lives are not so similar to my experience.

It’s not always about war. And if you’re asking ‘how many times did I game out, fully seriously, because I felt it was important to know the answer, a potential breakdown of social order or peace in the United States in the last 5 years, even if those scenarios have so far not come to pass?’ I will simply say the answer isn’t one or zero.

Hardly anyone you know, including yourself, is prepared to live in actual “moving” history. It will panic many of us, disorient the rest of us, and cause great upheavals in our fortunes, both good and bad. In my view the good will considerably outweigh the bad (at least from losing #2, not #1), but I do understand that the absolute quantity of the bad disruptions will be high.

Yes. I believe that a lot of us have already been severely panicked and disoriented. Often multiple times, not only by Covid. Historically people would likely have better rolled with such punches. That was not what I observed.

I am reminded of the advent of the printing press, after Gutenberg. Of course the press brought an immense amount of good, enabling the scientific and industrial revolutions, among many other benefits. But it also created writings by Lenin, Hitler, and Mao’s Red Book. It is a moot point whether you can “blame” those on the printing press, nonetheless the press brought (in combination with some other innovations) a remarkable amount of true, moving history. How about the Wars of Religion and the bloody 17th century to boot? Still, if you were redoing world history you would take the printing press in a heartbeat. Who needs poverty, squalor, and recurrences of Ghenghis Khan-like figures?

Yes. We can agree Printing Press Good, and almost every other technological invention of the past 10,000 years good. Fire, The Wheel, Agriculture, Iron Working, Writing, Gunpowder, Steam Engines, Industrialization, Automobiles, Airplanes, Computers, Phones, you name it, it’s all pretty great with notably rare exceptions and there was mostly no reasonable path to stopping those exceptions for long.

I’d go another step, and say that the list of suppressed technologies that we did stop or slow also contains mostly good things. Of course there are obvious exceptions, like gain of function research and bioweapons and nukes to make the rubble bounce, and also I can see the argument for things like television and social media and crack cocaine and American cheese (so it’s clear it’s not only lethal weapons here, don’t let this list distract you), but these are the exceptions that prove the rule.

The Printing Press was still quite the gradual transition. There is a reason, which I had confirmed by GPT-4, that you can play Europa Universalis or for most historical people live your life in Europe after the printing press comes online in 1440 and until the Reformation comes knocking, and mostly not notice.

I’d also say that AI is fundamentally different from all prior inventions. This is an amazing tool, but it is not only a tool, it is the coming into existence of intelligence that exceeds our own in strength and speed, likely vastly so. This is not the same danger as Lenin or Hitler or Mao writing things or using tools.

But since we are not used to living in moving history, and indeed most of us are psychologically unable to truly imagine living in moving history, all these new AI developments pose a great conundrum. We don’t know how to respond psychologically, or for that matter substantively. And just about all of the responses I am seeing I interpret as “copes,” whether from the optimists, the pessimists, or the extreme pessimists (e.g., Eliezer). No matter how positive or negative the overall calculus of cost and benefit, AI is very likely to overturn most of our apple carts, most of all for the so-called chattering classes.

Yes. AI is very much going to overturn a lot of our apple carts. I continue to reserve judgment on ‘most’ depending on the scenario, especially if you don’t only consider the ‘chattering classes.’

The reality is that no one at the beginning of the printing press had any real idea of the changes it would bring. No one at the beginning of the fossil fuel era had much of an idea of the changes it would bring. No one is good at predicting the longer-term or even medium-term outcomes of these radical technological changes (we can do the short term, albeit imperfectly). No one. Not you, not Eliezer, not Sam Altman, and not your next door neighbor.

How well did people predict the final impacts of the printing press? How well did people predict the final impacts of fire? We even have an expression “playing with fire.” Yet it is, on net, a good thing we proceeded with the deployment of fire (“Fire? You can’t do that! Everything will burn! You can kill people with fire! All of them! What if someone yells “fire” in a crowded theater!?”).

How do we know this? What counts as ‘good at predicting’ such changes? I think in broad strokes you could make some very good predictions in 1440, and I am guessing that many did.

If you want to say fire caused ‘all of human history’ then, I mean, sure, in its details that was very hard to predict. Not fair.

Then again, if you’re giving fire that level of credit, what if your prediction was ‘humans will harness more of nature, will discover more things, will be fruitful and multiply and dominate the Earth and increasingly over time destroy that which existing other animals value?’

That doesn’t seem like a crazy prediction to make. It isn’t specific but it doesn’t have to be. Nor is it a distant possibility, or extremely unlikely because there are so many other possible outcomes.

The ancients agreed that was a good and fair prediction. Consider the Myth of Prometheus, very on point. In the version I was taught, the existing powers (The Greek Gods) forbade giving humans the ability to recursively self-improve their abilities (harness fire, this exact thing) because this would allow the humans to displace and disempower the Gods over time, the same way the Gods had displaced the Titans.

Which is indeed exactly what happened, in any important sense. Even if the Greek Gods really did exist the future would now belong to the humans or to their AIs. Did that result in something the Greek Gods would still value? In some ways yes, in some very important ways no. The reason the answer is partly yes is because the Greek Gods were, in many real senses, stories created by and therefore aligned with humans.

The ‘within-a-lifetime’ predictions for the outcomes from fire, of course, seem like they’d be pretty fair and plausibly accurate, for what that’s worth.

So when people predict a high degree of existential risk from AGI, I don’t actually think “arguing back” on their chosen terms is the correct response. Radical agnosticism is the correct response, where all specific scenarios are pretty unlikely. Nonetheless I am still for people doing constructive work on the problem of alignment, just as we do with all other technologies, to improve them. I have even funded some of this work through Emergent Ventures.

What counts as a ‘specific scenario’ here?

Consider the predictions made above, about what would happen after the invention of fire, or similar predictions made in the wake of other basic discoveries, or based on homo sapiens having reached the necessary thresholds of intelligence and perhaps cultural transmission.

If you had predicted, early in the Industrial Revolution, that industry and those who mastered it would rapidly dominate the globe and any who did not embrace it, and anything they did not value would mostly get destroyed?

What if I claim that if we encountered Robin Hanson’s grabby aliens tomorrow, as unlikely as that is statistically, and they didn’t care about us or go galaxy brained on acausal trade or something, we would all be super duper dead, and at a minimum we are not going to be getting much of that cosmic endowment?

I would claim that my core model of AI risk is largely, to me, on a similar level. Not ‘can’t travel faster than speed of light’ or ‘Memento Mori’ or ‘if you raise the price you lower the quantity’ but also not that different either?

If you create something with superior intelligence, that operates at faster speed, that can make copies of itself, what happens by default?

That new source of intelligence will rapidly gain control of the future. It is very, very difficult to prevent this from happening even under ideal circumstances.

Some version of that intelligence will be pointed by someone, at some point, towards achieving some goal. Even if you think it is possible to design powerful AIs that are not agents and not used as agents, and even to use them to perform miracles or pivotal acts, have you been watching what humans are doing? We are already designing tools to explicitly turn GPT into an agent.

So it doesn’t matter. There will be an agent and there will be a goal. Whatever that goal is (if you want to strengthen this requirement, it is easy to see that ambitious open-ended goals will totally be given to AIs pretty often anyway unless we prevent this), again by default, the best way to maximally and maximally reliably achieve that goal will be to rapidly take control of the future, provided that is within the powers of the system. Which it will be. So the system will do that.

This may or may not require or involve recursive self-improvement. The system then, unless again something goes very, very right, wipes out all value in the universe.

Greatly more powerful things take over from much less powerful things. Things that are much more intelligent than us, and faster than us to boot, and that can be copied, and that can be pointed towards goals, will be greatly more powerful than us. None of this requires complex detailed prediction.

This is not like our past tools. Even in most scenarios where many impossible-seeming things go spectacularly well things do not turn out so hot for us humans or our values.

There is no reason to think things should work out well for us. If we have true radical agnosticism, consider most possible arrangements of matter, or most low-entropy possible arrangements of matter, or most possible inhabitants of mind-space, or most possible advanced intelligences and their possible values, or anything like that.

If you think it would be fine if all the humans get wiped out and replaced by something even more bizarre and inexplicable and that mostly does not share our values, provided it is intelligent and complex, and don’t consider that doom, then that is a point of view. We are not in agreement. I would also warn that we should not presume that the resulting universe would actually be that likely to have that much intelligence or complexity when the music stops. Again, under radical agnosticism or otherwise, one should notice that most configurations of the universe seem pretty wasteful and devoid of value.

Do I and similar others get into tons of detail then, about how stopping this transfer of power from humans to AIs from happening, or preventing all the humans from dying in its wake, is super difficult? Oh yes, because there are so many non-obvious and complex reasons why this is hard, and why so many imagined alternative scenarios are actually Can’t Happens or damn near one.

I do think the detailed discussions are valuable, that they are vital context to modeling what might happen, and towards getting a reasonable distribution of possible outcomes. They’re still beyond scope here.

Key here is that in my model, the space of possible futures that involve the creation of transformational AI has quite a lot of no good very bad zero-value options in it, including but not limited to the default or baseline scenarios. Whereas the space of good outcomes is hard to locate, and requires specific things to go right in some unexpected way.

When we notice Earth seems highly optimized for humans and human values, that is because it is being optimized by human intelligence, without an opposing intelligent force. If we let that cause change, the results change.

I am a bit distressed each time I read an account of a person “arguing himself” or “arguing herself” into existential risk from AI being a major concern. No one can foresee those futures!

Once you keep up the arguing, you also are talking yourself into an illusion of predictability. Since it is easier to destroy than create, once you start considering the future in a tabula rasa way, the longer you talk about it, the more pessimistic you will become. It will be harder and harder to see how everything hangs together, whereas the argument that destruction is imminent is easy by comparison. The case for destruction is so much more readily articulable — “boom!”

I don’t think this would pass as a model of what most such people are thinking. It certainly does not pass mine.

I might say: The core reason we so often have seen creation of things we value win over destruction is, once again, that most of the optimization pressure by strong intelligences was pointing in that direction, that it was coming from humans, and the tools weren’t applying intelligence or optimization pressure. That’s about to change.

I almost never hear arguments like the one quoted above made, at least not in any load-bearing way. I cannot recall anyone saying ‘easier to destroy than create,’ although that is certainly true as far as it goes. It is more like in the long run ‘it is easier for someone to create and do than to ensure no one creates and does’ and ‘once created this thing will be able to create or destroy at will and this will involve our destruction, whatever is created.’ We are very much not saying ‘things get complex and I can’t see a solution, boom is easy, so probably boom.’

One does not need ‘predictability’ to gain some insight into what things might plausibly happen versus which can’t happen, or which types of scenarios are relatively more likely, even under quite a lot of uncertainty. Waving one’s hand and saying ‘can’t predict things’ doesn’t get you out of this. Nor does saying ‘tools have always worked out fine in the past, we’re still here and things are good.’

Nor do we get to make the move ‘all things are possible, therefore doomed scenarios are distant and not likely and not worth worrying about.’ What makes you think that most future scenarios are not doomed, in terms of what you would value about the universe, however you think about that, and we need only worry about particular narrow specific dooms that don’t add up to much probability mass? What makes you think that us puny humans get to keep deciding what happens, or that what ends up happening will be something of which we approve?

I presume this is essentially, at core, Tyler’s Stubborn Attachments argument. That more economic growth and prosperity creates more value, even if it might not take the form you would like, and that in the long run nothing else matters?

I don’t entirely buy that argument even if transformational AI or AGI was not a practical physical possibility. I am sympathetic to that view in such worlds, I think the view is true on the margin for many people and most policy choices. I have some faith that if the future still fundamentally was based on what humans wanted and decided, in good Hayekian style, I’d worry about stubborn equilibria and path dependence, including in terms of their ability to guide longer term growth, but I would worry far less about this than most others.

Yet at some point your inner Hayekian (Popperian?) has to take over and pull you away from those concerns. (Especially when you hear a nine-part argument based upon eight new conceptual categories that were first discussed on LessWrong eleven years ago.) Existential risk from AI is indeed a distant possibility, just like every other future you might be trying to imagine. All the possibilities are distant, I cannot stress that enough. The mere fact that AGI risk can be put on a par with those other also distant possibilities simply should not impress you very much.

This seems like a strange reference class claim, and seems like it grasps at associations and affectations. One cannot say every future is equally distant, or use that with arbitrary divisions of possible classes of futures, none of this is probability.

Given this radical uncertainty, you still might ask whether we should halt or slow down AI advances. “Would you step into a plane if you had radical uncertainty as to whether it could land safely?” I hear some of you saying.

I would put it this way. Our previous stasis, as represented by my #1 and #2, is going to end anyway. We are going to face that radical uncertainty anyway. And probably pretty soon. So there is no “ongoing stasis” option on the table.

I can say that if there was a plane where I had radical uncertainty, or 90% confidence, on its ability to land safely, I would not get on that plane. If you said ‘but you will eventually get on a plane at some point’ I would say all right, let’s work on our air travel technology and build a different plane. If you told me ‘yes we might not have to put everyone on Earth into this radically uncertain plane now but we definitely are going to do it with some plane, eventually, might as well do it now,’ I’d probably get to work on airplane safety.

No, we cannot have ongoing stasis. The AI is very much out of the box and on its way, as I know full well, and advances will continue. I don’t have any hope of preventing GPT-5 and I don’t know anyone else who does either, whether or not it is a good idea.

I find this reframing helps me come to terms with current AI developments. The question is no longer “go ahead?” but rather “given that we are going ahead with something (if only chaos) and leaving the stasis anyway, do we at least get something for our trouble?” And believe me, if we do nothing yes we will re-enter living history and quite possibly get nothing in return for our trouble.

With AI, do we get positives? Absolutely, there can be immense benefits from making intelligence more freely available. It also can help us deal with other existential risks. Importantly, AI offers the potential promise of extending American hegemony just a bit more, a factor of critical importance, as Americans are right now the AI leaders. And should we wait, and get a “more Chinese” version of the alignment problem? I just don’t see the case for that, and no I really don’t think any international cooperation options are on the table. We can’t even resurrect WTO or make the UN work or stop the Ukraine war.

Too late, it’s happening, you can’t stop it and it’s good actually. I know that meme.

So essentially the argument here is that if we don’t build AI fast to beat China then the Chinese will build it first, and we cannot possibly make a deal here, so we had better build it first to maintain our hegemony, the important thing is which monkey gets the banana first?

That is exactly the nightmare scenario thinking we’ve been warning about for decades, shouting from the rooftops, who says the future is so hard to predict?

It also does not bear on the question of what one should expect, should one go down that road, even if true.

It is entirely possible (I do not endorse these numbers at all) that if we build AGI here in America quickly we die with 50% probability and if we let China build it we die with 75% probability instead, or perhaps we die with 50% probability either way but if China builds the AI and we live then we get a totalitarian future, or what not.

And perhaps there is in practice actually no way out of the dilemma. And maybe we should therefore with heavy hearts do the most dangerous thing that has ever been done. No missing moods. Do not pretend the 50% risk is undefined and therefore almost zero. Litany of Tarski, if it’s 5% or 10% or 50% or 90% I want to believe that.

Besides, what kind of civilization is it that turns away from the challenge of dealing with more…intelligence? That has not the self-confidence to confidently confront a big dose of more intelligence? Dare I wonder if such societies might not perish under their current watch, with or without AI? Do you really want to press the button, giving us that kind of American civilization?

The kind of civilization that wants to survive. That wants its people and their legacies to survive. Seriously.

I don’t want a society that has the self-confidence to commit suicide because it wouldn’t look confident to not do that. If we are who I want us to be? We will not go quietly into the night. We will not perish without a fight.

(Nor will we presume that we can successfully face down a technologically vastly superior alien invasion with bravery and a computer virus.)

If you tell me we can make all our people more intelligent? Or all our children more intelligent? Great, let’s totally do that.

If you instead propose creating powerful truly alien computer intelligences that we have no idea how to control, whose values we cannot predict, that will inevitably take control of our future and impose very alien values that I model as very likely not including keeping us around all that long, for reasons we’ve discussed a lot already? Let’s not do that.

Unless you are fine with that outcome. In which case we are not in agreement.

So we should take the plunge. If someone is obsessively arguing about the details of AI technology today, and the arguments on LessWrong from eleven years ago, they won’t see this. Don’t be suckered into taking their bait. The longer a historical perspective you take, the more obvious this point will be. We should take the plunge. We already have taken the plunge.

This is a call to not consider the object-level physical arguments about how AI is likely to work and what it is likely to do when it scales up over the medium-term. That does not seem like a good way to predict its likely consequences, at all. Or to ensure good outcomes, at all.

My Inner Tyler says that’s the point, you can’t predict such outcomes, Stubborn Attachments, economic growth, ship has sailed regardless, stop pretending you matter or you have any control over the future, there is no other way. I don’t agree.

We designed/tolerated our decentralized society so we could take the plunge.

See you all on the other side.

Yes, we designed and tolerated our society in order to be able to create lots of new tools, and do lots of new things. Which we have gotten out of the habit of doing, instead preventing us from building houses and clean energy projects and trying new medicines and doing a wide variety of things without explicit permission. When we do get in the way, it’s almost always making things much worse, creating stagnation and impoverishment. It’s terrible.

And yes, I absolutely want to be able to say that all of that applies to AI as well.

I even think it actually does apply to AIs like GPT-4. I expect great and positive things.

I still can’t help but notice that we are all on schedule to then die if we keep going. Not 99%+ definitely, but not ‘distant possibility’ or anything one can ignore. And that’s bad, you see, and worth doing quite a lot to prevent.

Conclusion

In most ways in most contexts, my model is remarkably close to Tyler’s, as I understand it. If we were having this argument about building a tool that wasn’t intelligent, I would almost always agree. We should go ahead and build it, far more than we actually do. We both want to see more focus on economic growth, less restraint and regulation, less worry about distributional impacts or shifts in what humans value over time, more confidence that life will get better as a result of improving physical conditions.

I would even apply that to current AI systems like GPT-4, even with plug-ins. I see the direct risks there as fully acceptable, except for the risk of what comes after.

That brings us to where we centrally disagree. When we cross the necessary thresholds and AI gets sufficiently powerful, I expect most outcomes to be existentially bad by my own values, in ways that are very difficult to avoid. I see this as robust, not based on a complex chain of logical reasoning.

I also strongly expect that the safety protocols that work now will suddenly stop working at exactly the worst possible time, and that this is simply a fact about the physical world. We’ll need solutions that at least might work, and we don’t have them. Assuming that things likely kind of turn out normal and fine on this level seems like exactly the type of thing Tyler is warning others not to think in so many other contexts.

I also put much higher probability and credence to particular scenarios of rapid and complete existential risk, especially those that involve some combination of self-improvement, power-seeking, instrumental convergence, sharp left turns, orthogonality of goals and the AGI winning before we know there is a battle or that the AGI even exists in anything like its current form and capabilities. I do not consider this a ‘distant possibility’ at all. I don’t have it at 99% or anything, but I see this as the natural default outcome and the details of how we get there as mostly not much altering the destination.

The thing is, I am not relying on that to explain why I am worried.

I see these as two distinct disagreements: one about how seriously we should take certain particular more narrow scenarios in terms of probability, and the other about whether we should consider the bulk of other potential outcomes doomed versus not doomed (and I do think that most of the not doomed ones probably go very well).

That brings us to the third disagreement, which is a more universal question of whether one can make useful predictions about the future at all beyond the short-term – Tyler as I understand him says no, hence Stubborn Attachments. I say yes, and that while finding good interventions to change outcomes over longer time horizons is difficult it is not impossible, nor was it in the past.

We would have many disagreements about details of arguments, except that Tyler in his third disagreement is arguing that none of those details matter. I would say that they matter very much for the second argument, even if one rejects the first on the basis of the third.

The fourth disagreement is in Tyler’s assertion of a fait accompli. Even if we could slow things down or stop them here, he says, we can’t stop China or make a deal with them, so we need to go ahead anyway. Well, not with that attitude we can’t, that’s only going to make the race more intense, faster and less safe. I am not convinced China could make real progress in AI on its own rather than doing imitation. I am not convinced coordination and other interventions are hopeless, even if good ones are difficult to find – we don’t have a solution to this, even a partial one, but we also don’t have a solution to alignment.

I do see a lot of signs that the necessary concerns are gaining in traction and attention, and that those in AI labs take them increasingly seriously. That greatly increases our chances of success in various ways. Some dignity has been won versus the counterfactual, and new lines of action are possible in the future. What we have so far is inadequate and will definitely fail, and I don’t like where we are, but it is still a start, and every little bit helps.

There is also a fifth disagreement, where Tyler considers us to have not lived through history, that tech advances have been unusually slow and non-disruptive, and that unless we build AI soon we will suddenly once again live in ‘interesting times’ anyway, times filled with danger and disruption in ways we will not like, sufficiently so that perhaps substantial risk of ruin is justified to prevent this.

I think there are important things being gestured at here, and that goes double if we ‘bake in’ existing AI technologies that we can’t hope to undo. A lot of things are going to change, our lives are going to be disrupted. I still think that in many ways my life has been pretty disrupted by technological change. It has been extremely physically safe, more so than I would even want, but I don’t expect that the end of hegemony would put me in any physical danger. It is not only in America that life is deeply physically safe.

At core: I think taking an attitude of fait accompli, of radical uncertainty and not attempting to predict the future or what might impact it, is not The Way, here or anywhere else. Nor should we despair of there being anything we can do to change our odds of success or sculpt longer term outcomes beyond juicing economic growth and technological advancement (although in almost every case we should totally be juicing real economic growth and technological advancement).

If you think we can’t slow things down, or that slowing things down would inevitably hand the race to China, I notice that we are already slowing things down in the name of safety concerns even if they are other safety concerns, that there is real and growing effort to worry about all sorts of risks, both in general and in the AI labs. We are not favorites, the game board is in a terrible state, the odds are against us and the situation is grim, but the game is going in many ways much better than I expected, or at least much better than I would have expected given the pace of capabilities progress.

My other approach, as always, continues to be that even if we cannot solve the problem directly, we can help people better understand the problem, help people better understand the world, improve our ability to reason and make good decisions generally, improve the world such that coordination and cooperation and optimism and personal sacrifices become more viable – almost entirely in ways that I would hope Tyler would agree with.

I feel like I must be reading this wrong, because Tyler seems to be saying that uncertainty somehow weighs against risk. This is deeply confusing to me, as normally people treat the association as running the other way.

Yes. His argument is it is against any particular risk and here the risk is particular, or something. Scott Alexander's response is... less polite than mine, and emphasizes this point.

Just read that one this morning. Glad we have a handle for it now.

Confusion, I dub thee ~~Tyler's Weird Uncertainty Argument~~ Safe Uncertainty Fallacy!

First pithy summarization:

Safety =/= SUFty

Re uncertainty about safety, if we were radically uncertain about how safe AI is, then the optimists would be more pessimistic, and the pessimists would be more optimistic.

In particular that means I'd have to be more pessimistic, while the extreme pessimists like Yudkowsky would have to be more optimistic on the problem.

Yeah - it's odd, but TC is a self-professed contrarian after all.

I think the question here is: why doesn't he actually address the fundamentals of the AGI doom case? The "it's unlikely / unknown" position is really quite a weak argument which I doubt he would make if he actually understood EY's position.

Seeing the state of the discourse on AGI risk just makes it more and more clear that the AGI risk awareness movement has failed to express its arguments in terms that non-rationalists can understand.

People like TC should be the first type of public intellectual to grok it, because EY's doom case is highly analogous to market dynamics. And yet.

Tyler Cowen is using the fact that it's hard to say anything about the long term effects of a technology to dispute technical arguments about immediate effects. You can predict pretty conclusively in advance that a certain nuclear power plant design might cause a meltdown, for instance, and engineers wouldn't be "unwise" to suggest so.

I’d also say that AI is fundamentally different from all prior inventions. This is an amazing tool, but it is not only a tool, it is the coming into existence of intelligence that exceeds our own in strength and speed, likely vastly so.

I think the above quote is the key thing. Human beings have a lot of intuitions and analogies about tools, technologies and social change. As far as I can tell, all of these involve the intuition that technologies simply magnify the effect of human labor, intentions and activities. AGI would be a thing which could act entirely autonomously from humans in many if not all areas of activity and these base human intuitions and analogies arguably wouldn't apply.

As an AI Alignment optimist, I do agree that AGI is probably different from other technologies.

But, if I had to explain why I disagree with the conclusion of near inevitable doom, I think my biggest disagreement is that we don't have to totally replicate the learning process that humans and animals have.

To explain this better, I think it's worth taking a difference between offline learning and online learning.

Offline learning is essentially Cartesian learning in nature: We give the AI a batch of data, it learns something, we give it another batch of data, and we repeat the process until the AI generalizes and learns. One important point is the AI has no control of the data, or what it learns, like in Cartesian systems. Thus it doesn't have incentives to hack a human's values, nor does it have incentives to learn deceptive alignment.

Online learning is the classic embedded agency picture, where the AI selects its own data points to learn, with rewards given, but the AI is driving the entirety of learning, and it controls what it learns, as well as their distributions. Humans and animals are near pure online learners, and their architectures almost prohibit offline learning.
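To make the structural difference concrete, here is a toy sketch of the two loops. Everything in it (`offline_learning`, `online_learning`, `mean_update`, the toy environment) is a hypothetical illustration, not anything from an actual training stack, and it says nothing about whether either setup is safe. It only shows who controls the data: the trainer in the offline loop, the learner in the online loop.

```python
def mean_update(state, batch, lr=0.5):
    """Toy learner update: move the state halfway toward the batch mean."""
    return state + lr * (sum(batch) / len(batch) - state)


def offline_learning(batches, update):
    """Offline loop: the trainer fixes every batch in advance.

    The learner's state never influences which data it sees next.
    """
    state = 0.0
    for batch in batches:
        state = update(state, batch)
    return state


def online_learning(environment, choose_query, update, steps):
    """Online loop: the learner picks its own next data point.

    The query depends on the learner's current state, so the learner
    shapes the distribution of data it is trained on.
    """
    state = 0.0
    history = []
    for _ in range(steps):
        query = choose_query(state)       # learner controls what it samples
        observation = environment(query)  # the world responds to that choice
        state = update(state, [observation])
        history.append(query)
    return state, history


# Offline: the data is whatever the trainer supplies.
offline_state = offline_learning([[1.0, 1.0], [1.0]], mean_update)

# Online: each query is a function of the learner's own state.
online_state, queries = online_learning(
    environment=lambda q: q * 2,
    choose_query=lambda s: s + 1,
    update=mean_update,
    steps=3,
)
```

In the online run the queries drift as the state changes, which is the feedback path the offline loop lacks.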

Speaking of deceptive alignment, Pretraining from Human Feedback does quite a lot better than Reinforcement Learning from Human Feedback, as it gives it a simple, myopic goal that is way more outer aligned than the Maximum Likelihood Estimation goal which is currently the goal of LLMs.

It also prevents deceptive alignment, given that it's a myopic goal.

It is also competitive with Reinforcement Learning from Human Feedback.

So once we've eliminated or reduced deceptive alignment and outer alignment issues, there's not much else to do but turn on the AGI.

This is why I'm much more optimistic than you Zvi over how well AI alignment could go: I think we can replicate a finite time Cartesian AI and I think there's a fairly straightforward path to alignment based on new work.

First problem, A lot of future gains may come from RL style self play (IE:let the AI play around solving open ended problems) That's not safe in the way you outline above.

The other standard objection is that even if the initial AGI is safe people will do their best to jailbreak the hell out of that safety and they will succeed.

That's a problem when put together with selection pressure for bad agentic AGIs (since they can use sociopathic strategies good AGIs will not use like scamming, hacking, violence etc.). (IE:natural selection goes to work and the results blow up in our face)

Short of imposing very stringent unnatural selection on the initial AGIs to come, the default outcome is something nasty emerging. Do you trust the AGI to stay aligned when faced with all the bad actors out there?

Note:my P(doom)=30% (P(~doom) depends on either a good AGI executing one of the immoral strategies to pre-empt a bad AGI (50%) or maybe somehow scaling just fixes alignment(20%))

>First problem, A lot of future gains may come from RL style self play (IE:let the AI play around solving open ended problems)

How do people see this working? I understand the value of pointing to AI dominance in Chess/Go as illustrating how we should expect AI to recursively exceed humans at tasks, but I can't see how RL would be similarly applied to "open-ended problems" to promote similar explosive learning. What kind of open problems with a clear and instantly-discernable reward function would promote AGI growth, rather than a more-narrow type of growth geared towards solving the particular problem well?

Note: This is an example of how to do the bad thing (extensive RL fine tuning/training). If you do it the result may be misalignment, killing you/everyone.

To name one good example that is very relevant, programming, specifically having the AI complete easy to verify small tasks.

The general pattern is to take existing horribly bloated software/data and extract useful subproblems from it. (EG:find the parts of this code that are taking the most time) and then turn those into problems for the AI to solve(eg: here is a function + examples of it being called, make it faster). Ground truth metrics would be simple things that are easy to measure (EG:execution time, code quality/smallness, code coverage, is the output the same?) and then credit assignment for sub-task usefulness can be handled by an expected value estimator trained on that ground truth as is done in traditional game playing RL. Possibly it's just one AI with different prompts.
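A minimal sketch of what such a ground-truth metric might look like for the "make it faster" task, assuming the grader has a reference function and some recorded example inputs. The names `reward`, `measure_seconds`, `slow_sum` and `fast_sum` are made up for illustration, and a real pipeline would need sandboxing and much more careful benchmarking than this.

```python
import time


def measure_seconds(fn, inputs, repeats=5):
    """Best-of-N wall-clock time for running fn over all recorded inputs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for args in inputs:
            fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def reward(reference, candidate, inputs, timer=measure_seconds):
    """Return 0.0 if any output differs, otherwise the measured speedup ratio."""
    for args in inputs:
        if candidate(*args) != reference(*args):
            return 0.0  # "is the output the same?" is a hard gate
    ref_time = timer(reference, inputs)
    cand_time = timer(candidate, inputs)
    return ref_time / max(cand_time, 1e-9)  # guard against a zero timing


def slow_sum(n):
    """Reference implementation: sum 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    """Candidate rewrite: closed form for the same sum."""
    return n * (n - 1) // 2


recorded_inputs = [(0,), (10,), (1000,)]
speedup = reward(slow_sum, fast_sum, recorded_inputs)
```

The timer is passed in as a parameter so the correctness gate can be checked separately from noisy wall-clock measurements.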

Basically Microsoft takes all the repositories on GitHub that build successfully and have some unit tests, and builds an AI augmented pipeline to extract problems from that software. Alternatively, a large company that runs lots of code takes snapshots + IO traces of production machines, and derives examples from that. You need code in the wild doing its thing.

Some example sub-tasks in the domain of software engineering:

- make a piece of code faster
- make this pile of code smaller
- is f(x)==g(x)? If not find a counterexample (useful for grading the above)
- find a vulnerability and write an exploit.
  - fix the bug while preserving functionality
- identify invariants/data structures/patterns in memory (EG:linked lists, reference counts)
  - useful as a building block for further tasks (EG:finding use after free bugs)
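The "is f(x)==g(x)?" sub-task from the list above can be sketched as a random-testing search for a counterexample. This is only an illustration with a hypothetical helper name (`find_counterexample`). Random testing can find a disagreement but can never prove two functions equivalent, so a `None` result means "no counterexample found in this many trials" and nothing stronger.

```python
import random


def find_counterexample(f, g, gen_input, trials=1000, seed=0):
    """Search random inputs for an x where f(x) != g(x).

    Returns the first disagreeing input, or None if none was found.
    """
    rng = random.Random(seed)  # fixed seed so grading is reproducible
    for _ in range(trials):
        x = gen_input(rng)
        if f(x) != g(x):
            return x
    return None


def small_int(rng):
    """Hypothetical input generator: small signed integers."""
    return rng.randint(-100, 100)


# abs(x) and x agree on non-negative inputs only.
disagreement = find_counterexample(abs, lambda x: x, small_int)

# Doubling and self-addition agree everywhere on this domain.
no_disagreement = find_counterexample(lambda x: x * 2, lambda x: x + x, small_int)
```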

GPT-4 can already use a debugger to solve a dead simple reverse engineering problem albeit stupidly^[1] https://arxiv.org/pdf/2303.12712.pdf#page=119

Larger problems could be approached by identifying useful instrumental subgoals once the model can actually perform them reliably.

The finished system should be able to extend shoggoth tentacles into a given computer, identify what that computer is doing and make it do it better or differently.

The finished system might be able to extend shoggoth tentacles into other things too! (EG:embedded systems, FPGAs) Capability limitations would stem from the need for fast feedback so software, electronics and programmable hardware should be solvable. For other domains, simulation can help(limited by simulation fidelity and goodharting). The eventual result is a general purpose engineering AI.

Tasks heavily dependent on human judgement (EG:is this a good book? Is this action immoral) have obviously terrible feedback cost/latency and so scale poorly. This is a problem if we want the AI to not do things a human would disapprove of.

[1] RL training could lead to a less grotesque solution. IE:just read the password from memory using the debugger rather than writing a program to repeatedly run the executable and brute force the password.

>The finished system should be able to extend shoggoth tentacles into a given computer, identify what that computer is doing and make it do it better or differently.

Sure. GPT-X will probably help optimize a lot of software. But I don't think having more resource efficiency should be assumed to lead to recursive self-improvement beyond where we'd be at given a "perfect" use of current software tools. Will GPT-X be able to break out of that current set of tools, only having been trained to complete text and not to actually optimize systems? I don't take this for granted, and my view is that LLMs are unlikely to devise radically new software architectures on their own.

<rant>It really pisses me off that the dominant "AI takes over the world" story is more or less "AI does technological magic". Nanotech assemblers, superpersuasion, basilisk hacks and more. Skeptics who doubt this are met with "well if it can't it just improves itself until it can". The skeptics' obvious rebuttal that RSI seems like magic too is not usually addressed.</rant>

Note:RSI is in my opinion an unpredictable black swan. My belief is RSI will yield somewhere between 1.5-5x speed improvement to a nascent AGI from improvements in GPU utilisation and sparsity/quantisation, requiring significant cognition spent to achieve speedups. AI is still dangerous in worlds where RSI does not occur.

Self play generally gives superhuman performance(GO,chess, etc.) even in more complicated imperfect information games (DOTA, Starcraft). Turning a field of engineering into a self-playable game likely leads to (superhuman(80%),Top-human equiv(18%),no change(2%)) capabilities in that field. Superhuman or top-human software engineering (vulnerability discovery and programming) is one relatively plausible path to AI takeover.

https://googleprojectzero.blogspot.com/2023/03/multiple-internet-to-baseband-remote-rce.html

Can an AI take over the world if it can do the following?

do end to end software engineering
find vulnerabilities about as well as the researchers at project zero
generate reasonable plans on par with a +1sd int human (IE:not hollywood style movie plots like GPT-4 seems fond of)

AI does not need to be even superhuman to be an existential threat. Hack >95% of devices, extend shoggoth tentacles, hold all the data/tech hostage, present as not skynet so humans grudgingly cooperate, build robots to run economy(some humans will even approve of this), kill all humans, done.

That's one of the easier routes assuming the AI can scale vulnerability discovery. With just software engineering and a bit of real world engineering(potentially outsourceable) other violent/coercive options could work albeit with more failure risk.

Math problems, physical problems, doing stuff in simulations, playing games.

RL isn't magic though. It works in the Go case because we can simulate Go games quickly and easily score the results and then pit adversarial AIs against each other in order to iteratively learn.

I don't think this sort of process lends itself to the sort of tasks that we can only see an AGI accomplishing. You can't train it to say write a better version of Winds of Winter than GRRM could because you don't have a good algorithm to score each iteration.

So what I'm really trying to ask is what specific sort of open ended problems do we see being particularly conducive to fostering AGI, as opposed to a local maximizer that's highly specialized towards the particular problem?

A generality maximizer, where the machine has a large set of "skills" it has learned on many different tasks, can allow it to perform well on zero shot untrained tasks. This was seen in Palm-E and GPT-4.

A machine that can do a very large number of tasks that are evaluatable, and at least do ok by mimicking the average human or by weighting the text it learned from by scoring estimates is still an AGI.

I think you moved the goalposts from "machine as capable as an average human" or even "capable as a top 1 percent human and superintelligent in any task with a narrow metric" to "beats humans at EVERYTHING". That is an unreasonable goal and high performing ASIs may not be able to write better than GRRM either.

I'm asking specifically about the assertion that "RL style self play" could be used to iterate to AGI. I don't see what sort of game could lead to this outcome. You can't have this sort of self-play with "solve this math problem" as far as I can tell, and even if you could I don't see why it would promote AGI as opposed to something that can solve a narrow class of math problems.

Obviously LLMs have amazing generalist capabilities. But as far as I can tell you can't iterate on the next version of these models by hooking them up to some sort of API that provides useful, immediate feedback... we're not at the cusp of removing the HF part of the RLHF loop. I think understanding this is key to whether we should expect slow takeoff vs. fast takeoff likelihood.

Anyways here's how to get an AGI this way : https://www.lesswrong.com/posts/Aq82XqYhgqdPdPrBA/full-transcript-eliezer-yudkowsky-on-the-bankless-podcast?commentId=Mvyq996KxiE4LR6ii

This will work, the only reason it won't get used is it is possibly not the computationally cheapest option. (this proposal is incredibly expensive for compute unless we do a lot of reuse of components between iterations).

Whether you consider a machine that has a score heuristic that forces generality by negatively weighting complex specialized architectures and heavily weighting zero shot multimodal/multi-skill tasks, and is able to do hundreds of thousands of tasks an "AGI" is up to your definition.

Since the machine would be self replicating and capable of all industrial, construction, driving, logistics, software writing tasks - all things that conveniently fall into the scope of 'can be objectively evaluated' I say it's an AGI. It's capable of everything needed to copy itself forever and to self improve, it's functionally a sentient new civilization. The things you mentioned - like beating GRRM at writing a good story - do not matter.

Sure, this is useful. To your other posts, I don't think we're really disagreeing about what AGI is - I think we'd agree that if you took a model with GPT4-like capabilities and hooked it up to a chess API to reinforce it you would end up with a GPT4 model that's very good at playing chess, not something that has strongly-improved its general underlying world model and thus would also be able to say improve its LSAT score. And this is what I'm imagining most self-play training would accomplish... but I'm open to being wrong. To your point about having a "benchmark of many tasks", I guess maybe I could imagine hooking it up to like 100 different self-playing games which are individually easy to run but require vastly different skills to master, but I could also see this just... not working as well. Teams have been trying this for a decade or so already, right? A breakthrough is possible though for sure.

I'm just trying to underscore that there are lots of tasks which we hope that AGIs would be able to accomplish (eg. solving open math problems) but we probably cannot use RL to directly iterate a model to accomplish this task because we can't define a gradient of reward that would help define the AGI.

>To your point about having a "benchmark of many tasks", I guess maybe I could imagine hooking it up to like 100 different self-playing games which are individually easy to run but require vastly different skills to master, but I could also see this just... not working as well. Teams have been trying this for a decade or so already, right? A breakthrough is possible though for sure.

No, nobody has been trying anything for decades that matters. As it turns out, the only thing that matters was scale. So there are 3 companies that had enough money for scale, and they are the only efforts that count, and all combined have done a small enough number of full scale experiments you can count them up with 2 hands. @gwern has expressed the opinion that we probably didn't even need the transformer, other neural networks likely would have worked at these scales.

As for the rest of it, no, we're saying at massive scales, we abdicate trying to understand AGI architectures - since they are enormously complex and coupled machines - and just iteratively find some that work by trial and error.

"work" includes generality. The architecture that can play 100 games and does extremely well at game 101 the first try gets way more points than one that doesn't. The one that has never read a book on the topic of the LSAT but still does well on the exam is exactly what we are looking for. (though this can be tough to filter since obviously it's simply easier to train on all text in existence).

One that has controlled a robot to manipulate fine wire and many object manip tasks, and one that has passed the exams for a course on electronics, and then first try builds a working circuit in a simulated world is what we're looking for. So more points on that.

That's the idea. Define what we want the machine to do and what we mean by "generality", iterate over the search space a very large number of times. In an unbiased way, pick the most distinct n winners and have those winners propose the next round of AGI designs and so on.

And most of the points for the winners are explicitly for the generality behavior we are seeking.
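A toy sketch of that loop, under heavy simplifying assumptions: candidates here are plain numbers rather than architectures, "tasks" are made-up scoring functions, and the selection step keeps the top n scorers instead of the "most distinct" n, since a real diversity measure is out of scope. All names (`generality_score`, `search`, `heldout_weight`) are hypothetical.

```python
import random


def generality_score(candidate, train_tasks, heldout_tasks, heldout_weight=3.0):
    """Score a candidate, weighting never-trained-on tasks more heavily."""
    train = sum(task(candidate) for task in train_tasks)
    heldout = sum(task(candidate) for task in heldout_tasks)
    return train + heldout_weight * heldout


def search(initial, mutate, score, rounds, n_winners, children_per_winner, seed=0):
    """Keep the top n candidates each round and let them propose variations."""
    rng = random.Random(seed)
    population = list(initial)
    for _ in range(rounds):
        ranked = sorted(population, key=score, reverse=True)
        winners = ranked[:n_winners]  # simplification: top n, not most distinct n
        children = [
            mutate(winner, rng)
            for winner in winners
            for _ in range(children_per_winner)
        ]
        population = winners + children  # winners survive, so the best score never drops
    return max(population, key=score)


# Toy stand-ins: each "task" rewards closeness to a different target value.
train_tasks = [lambda c: -abs(c - 1.0)]
heldout_tasks = [lambda c: -abs(c - 2.0)]


def score(candidate):
    return generality_score(candidate, train_tasks, heldout_tasks)


def mutate(candidate, rng):
    return candidate + rng.uniform(-0.5, 0.5)


initial_candidates = [0.0, 5.0]
best = search(initial_candidates, mutate, score,
              rounds=20, n_winners=2, children_per_winner=4)
```

Because the held-out term carries most of the weight, the search is pulled toward candidates that do well on tasks they were not tuned for, which is the point being made about where the points go.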

>As it turns out, the only thing that matters was scale.

I mean, in some sense yes. But AlphaGo wasn't trained by finding a transcript of every Go game that had ever been played, but instead was trained via self-play RL. But attempts to create general game-playing agents via similar methods haven't worked out very well, in my understanding. I don't assume that if we just threw 10x or 100x data at them that this would change...

>The architecture that can play 100 games and does extremely well at game 101 the first try gets way more points than one that doesn't. The one that has never read a book on the topic of the LSAT but still does well on the exam is exactly what we are looking for.

Yes, but the latter exists and is trained via human reinforcement learning that can't be translated to self-play. The former doesn't exist as far as I can tell. I don't see anyone proposing to improve GPT-4 by turning from HFRL to self-play RL.

Ultimately I think there's a possibility that the improvements to LLMs from further scaling may not be very large, and instead we'll need to find some sort of new architecture to create dangerous AGIs.

GPT-4 did RL feedback that was self-evaluation across all the inputs users fed to ChatGPT.

Self play would be having it practice leetcode problems, with the score as the RL feedback.

The software support is there and the RL feedback worked, why do you think it is even evidence to say "obvious thing that works well hasn't been done yet or maybe it has, openAI won't say"

There is also a tremendous amount of self play possible now with the new plugin interface.

You can connect them to such an API and it's not hard and we already have the things to make the API and you can start with llms. It's a fairly simple recursive bench and obvious.

Main limit is just money.

I think you need to define what you think AGI is first.

I think with a reasonable, grounded, and measurable version of AGI it is trivial to do with self play. Please tell me what you think AGI means. I don't think it matters if there are subjective things the AGI can't do well.

>First problem, A lot of future gains may come from RL style self play (IE:let the AI play around solving open ended problems) That's not safe in the way you outline above.

Still, offline learning is very useful, and so long as you do enough offline learning, then you don't have problems in the online learning phase.

Next, jailbreaking. I'll admit, this isn't something I initially covered, though if we admit that alignment is achievable, and we only have the question over whether alignment is stable, then in my model we've won almost all the value, as my threat model is closer to "We want good, capable AGI, but we can't get it because aligning it is very difficult."

So I think alignment was the load-bearing part of my model, and thus we have much lower p(Doom), more like 0.1-10% probability.

>So once we've eliminated or reduced deceptive alignment and outer alignment issues, there's not much else to do but turn on the AGI.

This is an argument for feasibility of making the first AGIs aligned. Which doesn't make them safer than humans, able to avoid/prevent building of the actually dangerous AGIs with different designs shortly thereafter.

Hot take - we've been in denial for several decades now about a deep, nagging epistemological crisis. If the "AI disaster" was your pipes breaking, and you filed a claim with your insurance company about it, they'd deny it as being the result of wear and tear.

Human knowledge long ago passed the point where it was possible for a single person to understand significant pieces of it, operationally. The level of trust that's required to function is terrifying. AI does all that - faster.