I started a draft of this post some days ago, but then a lot of things happened, so I'm rewriting it from scratch. Most importantly, TIME published an editorial in which Eliezer Yudkowsky talks of bombing AI training data centres, which both makes AI doom discourse fairly mainstream and throws a giant rock through the Overton window on the topic. This has elicited some ridicule, and enough worry that a few clarifications may be needed.

A lot of the discourse here usually focuses on alignment: is it easy, is it hard, what happens if we can't achieve it, and how do we achieve it. I want to make a broader point that I feel might not have received as much focus. Essentially, my thesis is that from the viewpoint of the average person, developing and deploying agentic AGI at all might be viewed as a hostile act. I think that the AI industry may be liable to be regulated into economic unviability, and/or that MAD-like equilibria such as the one Eliezer suggested might form not because everyone is scared of unaligned AGI, but because everyone is almost equally scared of aligned AGI, and not for no reason. As such, I think that perhaps the sanest position for AI-safety-minded people to take in the upcoming public debate is "We should just not build AGI (and rather focus on more specialised, interpretable, non-agentic AI tools that merely empower humans, but leave the power to define their goals always firmly in our hands)". In other words, I think that at this stage powerful friendly AGI is simply a mirage that holds us back from supporting the solutions that would have the best chance of success, and makes us look hostile or untrustworthy to a large part of the public, including potential allies.

The fundamental steps for my thesis are:

  1. building AGI probably comes with a non-trivial existential risk. This, in itself, is enough for most to consider it an act of aggression;
  2. even if the powerful AGI is aligned, there are many scenarios in which its mere existence transforms the world in ways that most people don't desire or agree with; whatever value system it encodes gets an immense boost and essentially Wins Culture; very basic evidence from history suggests that people don't like that;
  3. as a result of this, lots of people (and institutions, and countries, possibly of the sort with nukes) might turn out to be willing to resort to rather extreme measures to prevent an aligned AGI take off, simply because it's not aligned with their values.

Note that actually 2 and 3 can be valid even if for whatever reason AGI doesn't trigger a take off that leads to intelligence explosion and ASI. The stakes are less extreme in that case but there are still lots of potentially very undesirable outcomes which might trigger instability and violent attempts to prevent its rise.

I'll go through the steps one by one in more detail.

Non-aligned AGI is bad

First comes, obviously, the existential risk. This one I think is pretty straightforward. If you want to risk your life on some cockamamie bet that will make you a fortune if you win, go ahead. If you want to also risk my life on the same bet that will make you a fortune, we may need to have words. I think that is a pretty sound principle that even the most open-minded people on the planet would agree with. There's a name for what happens when the costs of your enterprise fall on other people, even just on expectation: we call them negative externalities.

"But if I won the bet, you would benefit too!," you could say. "Aligned AGI would make your life so much better!". But that doesn't fly either. First, if you're a for-profit company trying to build AGI it still seems like even success will benefit you far more than me. But more importantly, it's just not a good way to behave in general. I wear glasses; I am short-sighted. If you grabbed me by force in a dark alley, drugged me, then gave me LASIK while I am unconscious, I wouldn't exactly be happy with it as long as it turned out fine. What if the operation went badly? What if you overdosed me with the anaesthetic and killed me? There are obvious reasons why this kind of pseudo-utilitarian thinking doesn't work, mainly that however positive the outcomes on my material well-being, simply by doing that you have taken away my ability to choose for myself, and that is in itself a harm you visited upon me. Whether things go well or badly doesn't change that.

If you bet on something that could cause the destruction of the world, you are betting the lives of every single living being on this planet. Every old man, every child, every animal. Everyone who never even heard of AI or owned a computer, everyone who never asked you to do this nor consented to it, but was put on the table as a wager regardless. You are also risking the destruction of human heritage, of the biosphere and of its potential to ever spawn intelligent life again, all things that many agree have intrinsic value above and beyond that of even our own personal survival (if I had to die, I'd rather do so knowing the rest of humanity will live; if I had to die along with the rest of humanity, I'd rather do so knowing that at least maybe one day something else will look at the ruins we left behind and wonder, and maybe think of us). That is one mighty hefty bet.

But I am no deontologist, and I can imagine that perhaps the odds of extinction are low enough, and the benefits of winning the bet so spectacular, that maybe you could make a case that they offset that one harm (and it's a big harm!) and make it at best a necessary evil. Unfortunately, I really don't think that's the case, because...

Aligned AGI is not necessarily that good either

If you want to change the world, your best bet is probably to invent something useful. Technology gets to change the world even from very humble beginnings - sometimes a few people and resources are enough to get the ball rolling, and at that point, if the conditions are right, nothing can stop it. Investors will sniff the opportunity and fund it, early adopters will get into it for the advantage it gives them; eventually it spreads enough that the world itself reshapes around the new thing, and the last holdouts have to either adapt or be left hopelessly behind. You could live in 1990 without the internet, but in 2023 you would likely have trouble finding a job, a house or a date without it. Moloch made sure of that.

Summoning Moloch to change the world on your behalf is a seductive proposition. It is also a dangerous one. There is no guarantee that the outcomes will be precisely what you hoped for, regardless of your intentions; there is no guarantee that the outcomes will be good at all, in fact. You might well just trigger a race to the bottom in which any benefits are only temporary, and eventually everything settles on an equilibrium where everyone is worse off. What will it be, penicillin or nuclear weapons? When you open your Pandora's Box, you've just decided to change the world for everyone, for good or for bad: billions of people who had absolutely no say in what will now happen around them. We can't just hold a worldwide referendum every time we want to invent something, of course, so there's no getting around that. But while your hand is on the lid, at least, you ought to give it a think.

AGI has been called the last invention that humanity will ever need to make. It is thus very appropriate that it comes with all these warnings turned up to eleven: it promises to be more transformative than any other invention, and it promises to spread more quickly and more irreversibly than any other invention (in fact, it would be able to spread itself). And if you are the one creating it, you have the strongest and possibly last word on what the world will become. Powerful AGI isn't like any other invention. Regular inventions are usually passive tools, separate from the will of their creator (I was about to make a snide remark about how the inventor of the guillotine died by it, but apparently that's a myth). Some inventions, like GMOs, are agents in their own way, but much less smart than us, and so we engineer ways to control them and prevent them from spreading too much. AGI however would be a smart agent; aligned AGI would be a smart agent imbued with the full set of values of its creator. It would change the world with absolute fidelity to that vision.

Let's go over some possible visions that such an AGI might spread into the world:

  • the creator is an authoritarian state that wants to simply rule everything with an iron fist;
  • the creator is a private corporation that comes up with some set of poorly thought out rules by committee that are mostly centred around its profit;
  • the creator is a strong ideologue who believes imposing their favourite set of values on everyone on Earth will be the best for everyone regardless of their opinion;
  • the creator is a genuinely well-intentioned person who only wishes for everyone to have as much freedom as allowed, but regardless of that has blind spots that they fail to identify and that slip their way into the rules;
  • the creator is a genuinely well-intentioned person who somehow manages the nigh-superhuman task of coming up with the minimal and sufficient set of rules that do indeed optimally satisfy everyone's preferences, to such a degree that it offsets any harms done in the process of unilaterally changing the world.

I believe some people might class some of these scenarios as cases of misalignment, but here I want to stress the difference between not being able to determine what the AI will do, and being able to determine it but just being evil (or incompetent). I think we can all agree that the last scenario feels like the one possible lucky outcome at the end of a long obstacle course of pitfalls. I also suspect (though I've not really tried to formalize it) that there is a fundamental advantage to encoding something as simple and lacking in nuance as "make Dave the God-King of Earth and execute his every order, caring for no one else" over something much more sophisticated, which gives the worst possible actors another leg up in this race (Dave of course might then paperclip the Earth by mistake by giving a wrongly worded order, which makes the scenario even worse).

So from my point of view, as a person who's not creating the AGI, many aligned AGI scenarios might still be less than ideal. In some cases, the material benefits might be somewhat lessened by these effects, but not so much that the outcome isn't still a net positive for me (silly example: in the utopia in which I'm immortal and have all I wish for, but I am no longer allowed to say the word "fuck", I might be slightly miffed but I'll take what I get). In other cases the restrictions might be so severe and oppressive that, to me, they essentially make life a net negative, which would actually turn even immortality into a curse (not so silly example: in the dystopia in which everyone is a prisoner in a fascistic panopticon there might be no escape at all from compliance or torture). Still, I think that on the net, most people reading this and I would overall be more ok than not with most of the non-blatantly-oppressive varieties of this sort of take off. There are a lot of the oppressive variety, though, and my guess is that they are more likely than the other kind (both because many powerful actors lack the insight and/or moral fibre to actually succeed at creating a good one, and because the bad ones might be easier to create).

It gets even worse, though. Among relatively like-minded peers, we might at least roughly agree on which scenarios count as bad and which as good, and perhaps even on how likely the latter are. But that all crumbles on a global scale, because in the end...

People are people

“It may help to understand human affairs to be clear that most of the great triumphs and tragedies of history are caused, not by people being fundamentally good or fundamentally bad, but by people being fundamentally people.”

Good Omens

Suppose you had your aligned powerful AGI, ready to be deployed and change the world at the push of a big red button. Suppose then someone paraded in front of you each and every one of the eight billion people in this world, calmly explained the situation to them - what would happen if you push the button - then gave them a gun and told them that if they want to stop you from pushing the button, the only way is to shoot you, and they will suffer no consequences for it. You're not allowed to push the button until every single last person has left.

My guess is that you'd be dead before the hundredth person.

I'd be very surprised if you reached one thousand.

There are A Lot of cultures and systems of belief in this world[1]. Many of these are completely at odds with each other on very fundamental matters. Many will certainly be at odds with yours in one way or the other. There are people who will oppose making work obsolete. There are people who will oppose making death obsolete. Lots of them, in fact. You can think that some of these beliefs are stupid or evil, but that doesn't change the fact that they think the same of yours, and will try to stop you if they can. You don't need to look far into history to see how many people have regularly put their lives on the line, sometimes explicitly put them second, when it came to defending some identity or belief they held on to dearly; it's a very obvious revealed preference. If you are about to simply override all those values with an act of force, by using a powerful AGI to reshape the world in your image, they'll feel that is an act of aggression - and they will be right.

There are social structures and constructs born of these beliefs. Religions, institutions, states. You may conceptualize them as memetic superorganisms that have a kind of symbiotic (or parasitic) relationship with their human hosts. Even if their hosts might be physically fine, your powerful AGI is like a battery of meme-tipped ICBMs aimed at absolutely annihilating them. To these social constructs, an aligned AGI might well be as much of an existential threat as a misaligned one, and they'll react and defend themselves to avoid being destroyed. They'll strike pre-emptively, if that's the only thing they can do. Even if you think that people might eventually grow to like the post-singularity state of affairs, they won't necessarily be of that opinion beforehand, because they believe strongly in the necessity and goodness of those constructs, and that's all that matters.

If enough people feel threatened enough, regardless of whether the alignment problem was solved, AGI training data centres might get bombed anyway.

I think we're beginning to see this; talk of AGI has already started taking on the tones of geopolitics. "We can't let China get there first!" is a common argument in favour of spurring a faster race and against slowing down. I can imagine similar arguments on the other side. To the democracy, the autocracy ruling the world would be a tragedy; to the autocracy, democracy winning would be equally repulsive. We might think neither outcome is worth destroying the world over, but that's not necessarily a shared sentiment either; just like in the Cold War, someone might genuinely think "better dead than red".

I'm not saying here that I have no opinion, that I think all value systems are equally valid, or any other strawman notion of perfect centrism. I am saying it doesn't much matter who's right if all sides feel cornered enough and are armed well enough to lash out. If you start a fight, someone else might finish it, and seeking to create powerful AGI is effectively starting a fight. Until now it seems to me like the main plan from people involved in this research has been "let's look like a bunch of innocuous overenthusiastic nerds tinkering with software right until the very end, when it's conquerin' the world time... haha just kidding... unless...", which honestly strikes me as offensively naïve and more than a bit questionable. But that ship may well have sailed for good. Now AI risk is in the news, Italy has banned ChatGPT over privacy concerns (with more EU countries possibly following) and people are pushing the matter to the Federal Trade Commission. If anyone had been sleeping until now, it's wake up time.

Not everyone will believe that AGI can trigger an intelligence explosion, of course. But even if for some reason it didn't, it might still be enough to create plenty of tensions, externally and internally. From an international viewpoint, a country with even just regular human-level AGI would command an immense amount of cognitive labour, might field an almost entirely robotic army, and perhaps sophisticated intelligent defence systems able to shield it effectively from a nuclear strike. The sheer increase in productivity and available intelligence would be an insurmountable strategic and economic advantage. On the internal front, of course, AGI could have a uniquely disruptive impact on the economy; automation has a way of displacing the freed labour towards higher tasks, but with AGI, there would be no task left to displace workers to. The best value a human worker might have left to offer would be that their body is still cheaper than a robot's, and that's really not a great bargaining position. A country with "simple" human-level AGI thus might face challenges both on the external and internal fronts, and those might materialize even before AGI itself does. The dangers would be lesser than with superintelligence, but the benefits would be proportionally reduced too, so I think it still roughly cancels out.

I don't think that having a peaceful, coordinated path to powerful aligned AGI is completely hopeless, overall. But I just don't think that as a society we're nearly there yet. Even beyond the technical difficulties of alignment, we lack the degree of cooperation and harmonization on a global scale that would allow us to organize the transition to a post-ASI future with enough shared participation that no one feels like they're getting such a harsh deal they'd rather blow everyone up than suffer the future to come. As things stand, a race to AGI is a race to supremacy: the only way it ends is either with everyone dead, suppression of one side (if we're lucky, via powerful aligned AGI, if we're not, via nuclear weapons), or with all sides begrudgingly acknowledging that the situation is too dangerous for all involved and somehow slowly deescalating, possibly leading to a MAD-like equilibrium in which AGI is simply banned for all parties involved. The only way to accept you can't have it, after all, is if no one else can have it either.

Conclusion

The usual argument from people who are optimistic about AGI alignment is that even if there's <insert percentage> of X-risk, the upsides in case of success are so spectacular they are worth the risk. Here I am taking a bit more of a sombre view, suggesting that if you want to weigh the consequences of AGI you also have to consider the harms to the agency of the many people who would be impacted by it without having had a say in its creation. These harms might be so acute that some people might expect an AGI future to be a net negative for them, and thus actively seek to resist or stop the creation of AGI; states might get particularly dangerous if they feel existentially threatened by it. This then compounds the potential harms of AGI for everyone else, since if you get caught in a nuclear strike before it's deployed you don't get to enjoy whatever comes afterwards anyway.

As AGI discourse becomes more mainstream, it's important to appreciate perspectives beyond our own and not fall into the habit of downplaying or ignoring them. This is necessary both morally (revealed preferences matter and are about the only window we have into other people's utility!) and strategically: AI research and development still exists embedded in the social and political realities of this world, however much it may wish to transcend them via a quick electronic apotheosis.

The good news is that if you believe that AI will likely destroy the world, this actually opens a possible path to survival. Everyone's expectations about AI's value will be different, but it's becoming clear that many, many people see it as a net negative. In general people place themselves at different spots on the "expected AI power" axis based on their knowledge, experience, and general feelings; some don't expect AI to get any worse than a tool to systematically concentrate value produced by individuals (e.g. art) into the hands of corporations via scraping and inference training. Others fear its misinformation potential, or its ability to rob people of their jobs on a massive scale, or its deployment as a weapon of war. Others believe its potential to be great enough to eventually cause an extinction event. Some worry about AI being out of control, others about it being controlled far too well but for bad goals. Different expected levels of power affect people's expectations about how much good or bad it can do, but in the end, many seem to land on the belief that it will still do mostly harm, not because of technical reasons involved in the AI's workings but because the social structures within which the AI is being created don't allow for a good outcome. The same holds for powerful AGI: aligning it wouldn't just be a prodigious technical challenge, but a social one on a global scale. Trying to race to it as a way to etch one's supremacy into eternity is just about the worst reason and the worst way to go about it. We should be clear about this to both others and ourselves, avoid the facile trap of hoping for an outcome so arbitrarily good that it entirely offsets its improbability, and focus on a more realistic short-term goal and path for humanity. We're not yet quite willing or ready to hand off the reins of our future to something else, and perhaps we may never be.

 

  1. ^

    Citation needed


It feels like the most basic version of the argument also applies to industrialization or any technology that changes the world (and hastens further changes). If you set aside alignment concerns, I don't think it's obvious that AGI is fundamentally different in the way you claim---Google could build an AGI system that only does what Google wants, but someone else can build AGI-as-a-service that does whatever the customer wants, and customers will buy that one. I think it is much more robust to make sure someone builds the good AGI product rather than to try to prevent anyone from building the bad AGI product.

On the substance I'm skeptical of the more general anti-change sentiment---I think that technological progress has been one of the most important drivers of improving human conditions, and procedurally I value a liberal society where people are free to build and sell technologies as long as they comply with the law. In some sense it might be right to call industrialization an act of aggression, but I think the implied policy response may have been counterproductive.

Probably this comes down to disagreements about the likely dynamics of an intelligence explosion. I think I'm representing something closer to the "mainstream" opinion on this question (in fact even my view is an outlier), and so someone making the general case for badness probably needs to focus on the case for fast takeoff.

In my view, to the extent that I think AGI should be developed more slowly, it's mostly because of distinctive features of AGI.  I think this is consistent with what you are saying in the article, but my own thinking is focused on those distinctions.

The two distinctions I care most about are:

  1. To effectively govern a world with AI, militaries and law enforcement will need to rely extensively on AI. If they don't, they will not remain competitive with criminals or other states who do use the technology. This is also true for industrialization.

    If AI isn't suitable for law enforcement or military use, then it's not a good idea to develop it full stop. Alignment is the most obvious problem here---automated militaries or law enforcement pose an unacceptable risk of coup---but you could also make reasonable arguments based on unreliability or unpredictability of AI. This is the key disanalogy with industrialization.

    On this framing, my developing AGI is primarily an "act of aggression" to the extent that you have good reasons not to develop or deploy AGI. Under those conditions you may (correctly) recognize that a large AI advantage is likely to put me in a position where I can later commit acts of aggression and you won't have the infrastructure to defend yourself, and so you need to take steps to defend yourself at this earlier stage.

    You could try to make the more general argument that any technological progress by me is a potential prelude to future aggression, whether or not you have good reasons not to develop the technology yourself. But in the more general context I'm skeptical---competing by developing new technologies is often good for the world (unlike central examples of aggression), trying to avoid it seems highly unstable and difficult, the upside in general is not that clear, and it acts as an almost fully general pretext for preemptive war.
     
  2. AI is likely to move fast. A company can make a lot of money by moving faster than its competition, but the world as a whole gains relatively little from the acceleration. At best we realize the gains from AI years or even just months sooner. As we approach the singularity, the room for acceleration (and hence the social benefits from faster AI) converge to zero. No matter how cool AI is, I don't think getting it slightly sooner is "spectacular." This is in contrast to historical technologies, which are developed over decades and centuries, where a 10% slowdown could easily delay new technologies by a whole lifetime.

    Meanwhile, the faster the technology moves the more severe the unaddressed governance problems are, and so the larger the gains from slowing down become. These problems crop up everywhere: our laws and institutions and culture just don't have an easy time changing fast enough if a technology goes from 0 to transformative over a single decade.

    On an econ 101 model, "almost no social value from accelerating" may seem incongruent with the fact that AI developers can make a lot of money, especially if there is a singularity. The econ 101 reconciliation is that having better AI may allow you to win a war or grab resources in space before someone else gets to them---if we had secure property rights, then all the economic value would flow to physical resources rather than information technology, since it won't be long before everyone has great AI and people aren't willing to pay very much to get those benefits slightly sooner.

    So that's pretty fundamentally different from other technologies. It may be that in the endgame people and states are willing to pay a large fraction of their wealth to get access to the very latest AI, but the only reason is that if they don't then someone else will eat their lunch. Once we're in that regime, we have an obvious collective rationale to slow down development.

    (This argument only applies to modest slowdowns, that remain small relative to consumers' impatience. If you are considering 6 months of slowdown the social cost can easily be offset by small governance improvements; if you are considering 10 years I think you can only justify the cost by appealing to potential catastrophic or irreversible harms.)

On the substance I’m skeptical of the more general anti-change sentiment—I think that technological progress has been one of the most important drivers of improving human conditions, and procedurally I value a liberal society where people are free to build and sell technologies as long as they comply with the law.

I'm pretty conflicted but a large part of me wants to bite this bullet, and say that a more deliberate approach to technological change would be good overall, even when applied to both the past and present/future. Because:

  1. Tech progress improving human conditions up to now depended on luck, and could have turned out differently if, for example, there had been some tech that allowed an individual or small group to destroy the world, or if human fertility hadn't decreased and Malthusian dynamics had kept applying.
  2. On some moral views (e.g. utilitarianism), it would be worth it to achieve a smaller x-risk even if there's a cost in terms of more time humanity spends under worse conditions. If you think that there's 20% x-risk on the current trajectory, for example, why isn't it worth a general slowdown in tech progress and associated improvements in human conditions to reduce it to 1% or even 10%, if that was the cost? (Not entirely rhetorical. I genuinely don't know why you'd be against this.)

Thanks for the answer! To be honest when I wrote this I mostly had in mind the kind of winner-takes-all intelligence explosion scenarios that are essentially the flip side of Eliezer's chosen flavour of catastrophe: fast take-off by a FAI, pivotal act or what have you, and essentially world conquest. I think if the choice boiled down to those two things (and I'm not sure how we can know unless we manage to have a solid theory of intelligence scaling before we have AGI), or rather, if states believed it did, then it'd really be a lose-lose scenario. Everyone would just want not only FAI, but their FAI, and probably couldn't really tell an UFAI from a FAI anyway (you have to trust your enemy not to have screwed up, essentially), so whatever the outcome, nukes fly at the first sign of anomalous activity.

If we tone it down to a slower take off I agree with you the situation may not be so dramatic - yes, everything that gives one country a huge asymmetrical advantage in productivity would translate also into strategic dominance, but it's a fact that (be it out of rational self-preservation over dedication to the superorganism country, or be it out of irrational behaviour dominated by a forlorn hope that if you wait it out there'll be better times to do something later) usually even enemies don't straight up launch deadly attacks in response to just that.

I think however there is a third fundamental qualitative difference between AGI and any other technology, including specialised AI tools such as what we have now. Specialised tools only amplify the power of humans, leaving all value decisions in their hands. A single human has still limited physical and cognitive power, and we need to congregate in groups to gain enough force for large scale action. Since different people have different values and interests, groups need negotiation and a base of minimum shared goals to coalesce around. This produces some disunity that requires management, puts a damper on some of the most extreme behaviours (as they may splinter the moderates from the radicals), and overall defines all the dynamics of every group and organisation. Armed with spears or with guns, an army still runs on its morale, still needs trust in its leaders and its goals, still can mutiny.

By comparison, an army (or a company) of aligned AGIs can't mutiny. It is almost the same as an extension of the body and mind of its master (to whom we assume it is aligned). The dynamics of this are radically different, akin to the power differential introduced by having people with superpowers or magic suddenly appear. Individuals commanding immense power, several orders of magnitude above that of a single human, with perfect fidelity, no lossiness.

I agree that technology has been good by improving standards of life, but I have to wonder - how much of this was because we truly held human flourishing as a terminal value, and how much simply because it was instrumental to other values (e.g. we needed educated workers and consumers so that current mass industrialised society could work at all)? After all, "setting up incentives so that human flourishing is an instrumental value to achieving selfish terminal values" is kind of the whole sleight of hand of capitalism and the free market - "it is not from the benevolence of the butcher, the brewer, or the baker that we expect our dinner, but from their regard to their own self-interest". With AGI (and possibly robotics, which we might expect would follow if you put a million instances of AGI engineers on the case) human flourishing and economic self-interest of their owners might be entirely and fundamentally decoupled. If that were the case, then the classic assumption that technology equals flourishing could stop being true. That would make things very painful, and possibly deadly, even if AGI itself happened to not be the threat.

(This argument only applies to modest slowdowns, that remain small relative to consumers' impatience. If you are considering 6 months of slowdown the social cost can easily be offset by small governance improvements; if you are considering 10 years I think you can only justify the cost by appealing to potential catastrophic or irreversible harms.)

I'm not really thinking of a slowdown as much as a switch in focus (though of course right now that would also mean a slowdown, since we'd have to rewind the clock a bit). More focus on building systems bottom-up instead of top-down, and more focus on building systems that are powerful but specialised rather than general and agentic. Things like protein folding prediction AIs would be the perfect example of this sort of tool. Something that enables human workers to do their work better and faster, surpassing the limits of their cognition, without making them entirely redundant. That way we both don't break the known rules of technological innovation and we guarantee that values remain human-centred as always. It might be less performant than fully automated AGI loops but it vastly gains in safety. Though of course, seeing how even GPT-4 appears mildly agentic just when rigged into a self-loop with LangChain, I have to wonder whether you can ever keep those two areas separate enough.

This might be a bit off topic for the focus of your response. I actually agree that deployment of AGI won't be seen as an act of aggression. But I think it probably should be, if other actors understand the huge advantage that first movers will enjoy, and how tricky a new balance of power will become.

By setting aside alignment concerns entirely, you're assuming for this scenario that not only is alignment solved, but that solution is easy enough, or coordination is good enough, that every new AGI is also aligned. I don't think it's useful to set the issue that far aside. Eventually, somebody is going to screw up and make one that's not aligned.

I think a balance of power scenario also requires many AGIs to stay at about the same level of capability. If one becomes rapidly more capable, the balance of power is thrown off.

Another issue with balance-of-power scenarios, even assuming alignment, is that eventually individuals or small groups will be able to create AGI. And by eventually, I mean at most ten years after states and large corporations can do it. Then a lot of the balance of power arguments don't apply, and you're more prone to having people do truly stupid or evil (by default ethical standards) things with their personally-aligned AGI.

Most of the arguments in Steve Byrnes's excellent What does it take to defend the world against out-of-control AGIs? apply to hostile actions from sane state and corporate actors. Even more apply to non-state actors with weirder goals. One pivotal act he doesn't mention is forming a panopticon, monitoring and decrypting every human communication for the purpose of preventing further AGI development. Having this amount of power would also enable easy manipulation and sabotage of political systems, and it's hard to imagine a balance of power where one corporation or government enjoys this power.

Thanks for writing this. I had in mind to express a similar view but wouldn’t have expressed it nearly as well.

In the past two months I've gone from over-the-moon excited about AI to deeply concerned.

This is largely because I misunderstood the sentiment around superintelligent AGI.

I thought we were on the same page about utilizing narrow LLMs to help us solve problems that plague society (i.e. protein folding). But what I saw cluttering my timeline and clogging the podcast airwaves was utter delight at how much closer we are to having an AGI some 6-10x human intelligence.

Wait, what? What did I miss? I thought that kind of rhetoric was isolated to, at worst, the ungrounded-in-reality LCD user and, at best, the radical Kurzweil types. I mean, listen to us - do we really need to argue about what percentage the risk is that human life gets exterminated by AGI?

Let me step off my soap box and address a concern that was illuminated in this piece and one that the biggest AGI proponents should at least ponder.

The concern has to do with the risks of hurting innocent bystanders who won't get to make the choice about integrating AGI into the equation. Make no mistake, AGI, both aligned and non-aligned, will likely cause immense disruption for billions of people. At the low end, displacing jobs; at the high end, getting murdered by an unaligned AGI. We all know about the consequences of the Industrial Revolution and job displacement, but we look back at historical technological advances with appreciation that they led us to where we are. But are you so sure that AGI is just the next step in that long ascension? To me it looks not to be. In fact AGI isn't at all what people want. What we are learning about happiness is that work is incredibly important.

You know who isn’t happy? Retired and/or elderly who find themselves with no role in society and an ever increasing narrowing of friends and acquaintances.

“They will be better with AGI doing everything, trust me, technological progression always enhances”

Are you sure about that? I have so many philosophical directions I could go to disprove this (happiness is less choice not more) but I will get to the point which is:

You don’t get to decide. Not this time anyway.

It might be worth mentioning that the crypto decentralization movement is the exact opposite of AGI. If you are a decentralization enthusiast who wants to take power away from a centralized few, then you should be ashamed to support the AGI premise of a handful of people modifying billions of lives without their consent.

I will end with this. Your hand has been played. The AGI enthusiasts have revealed their intentions and it won't sit well with basically... everyone. Unless AGI can be attained in the next 1-2 years, it's likely to see one of the biggest pushbacks our world has ever witnessed. Information spreads fast and you're already seeing the mainstream pick up on the absurdity of pursuing AGI, and when this technology starts disrupting people's lives, get ready for more than just regulation.

Let's take a deep breath. Remember, AI is meant to solve problems and life's tragedies, not create them.

You know who isn’t happy? Retired and/or elderly who find themselves with no role in society and an ever increasing narrowing of friends and acquaintances.

I actually have a theory about this thing which I will probably write my next post on. I think people mix up different things in the concept of "work" and that's how we get these contradictory impulses. I also think this is relevant to concepts of alignment.

there are multiple connectionism researchers on this forum alone who have claimed to me that they think they know how to build it with less resources, and unless they respond to this message to clarify, I see no reason to believe they're intending to stop trying. I agree that this is potentially an act of catastrophic aggression but... if many people can make nukes in their custom mini-datacenter without it even being detectable in principle, then what?

in general don't rely on a centralized scaling bottleneck sticking around.

actually I think I'll name one.

@Nathan Helm-Burger thoughts? I ping you rather than the others because I expect you to be by far the most careful of the ones I know about.

Thanks Gears. Yeah, I'm quite confident that we cannot rely on scaling being a mandatory part of advances in order to monitor the AGI landscape. I believe dangerously potent advances can come from small research groups working with garage-hobbyist compute. I also think that scaling without algorithmic innovation can also lead to dangerously potent advances. And some combination of the two can also work. There are many paths to AGI, and we can't effectively ward against it by guarding just the most expensive and obvious paths while ignoring the darker quieter paths out of the limelight.

I believe dangerously potent advances can come from small research groups working with garage-hobbyist compute.

Can I get some insights on this? My feeling is that while the human brain runs on very little energy, so does an LLM in operation, right now. The training of an LLM is the energy-consuming part, but the human equivalent of that would be the (much more expensive!) years and years of learning and education of all sorts needed before becoming a competent adult. Do we have any stronger arguments to believe that there are theoretically much lower bounds (or no bounds at all!) on the cost of training even a full AGI?

I think there are lower bounds in practice. Unsure if in theory. My personal estimate (which I outline in a report I've decided not to share because it gives too clear a roadmap), based on extensive research and months of careful thought and discussion with multiple experts, is that the practical lower bound on current technology is somewhere around 5-10k USD. I expect this practical lower bound to drop by a factor of about 10x over the next 5 years.

That's a lower bound given a very high level of algorithmic improvements. A more moderate level of algorithmic improvements would get you to something more like 100-200k USD.

Here's one insight for you. The human brain is forced to start from mostly scratch, just a bunch of genetically-hardwired long range connections with random local connections. Neural networks can be initialized. The better we are able to interpret and distill existing large powerful neural nets, the more easily we can cheaply initialize smaller cheaper neural nets to jump-start them. Look at the recent news around the Alpaca model. That's just the beginning. This paradigm can be taken further.
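(To make the "initialize the small net from the big one" idea slightly more concrete, here is a minimal sketch of textbook knowledge distillation, one common version of the distill-and-jump-start approach this comment points at. It is illustrative only, not anything from the comment or the unshared report above; `student`, `teacher`, `inputs` and the hyperparameters are hypothetical placeholders.)

```python
# Minimal, illustrative sketch of knowledge distillation.
# Assumes `student` and `teacher` are torch.nn.Module classifiers returning
# logits, and `inputs` is a batch of (possibly unlabeled) data.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, inputs, optimizer, temperature=2.0):
    """One training step nudging the student's output distribution
    toward the frozen teacher's, using soft labels only."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(inputs)  # no gradients flow through the teacher
    student_logits = student(inputs)
    # KL divergence between temperature-softened distributions: the student
    # learns the teacher's full output distribution, not just its argmax.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```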

Are the AI scientists that you know in pursuit of AGI, or of more powerful narrow AI systems?

As someone who is new to this space, I'm trying to simply wrap my head around the desire to create AGI, which could be intensely frightening and dangerous to the developer of such a system.

I mean, not that many people are hell-bent on finding the next big virus or developing the next weapon, so I don't see why AGI is as inevitable as you say it is. Thus I suppose developers of these systems must have a firm belief that there is very little danger attached to developing a system some 2-5x general human intelligence.

If you happen to be one of these developers, could you perhaps share with me the thesis behind why you feel this way, or at least the studies, papers, etc. that give you assurance that what you're doing is largely beneficial to society as a whole, and safe.

There are a lot of groups pursuing AGI. Some claiming that they are doing so with the goal of benefiting humanity, some simply in pursuit of profit and power. Indeed, the actors I personally am most concerned about are those who are relatively selfish and immoral as well as self-confident and incautious, and sufficiently competent to at least utilize and modify code published by researchers. Those who think they can dodge or externalize-to-society the negative consequences and reap the benefits, who don't take the existential risk stuff seriously. You know what I mean. The L33T |-|ACKZ0R demographic.

I don't personally work in AI. But Open AI for example states clearly in its own goals that they aim at building AGI, and Sam Altman wrote a whole post called "Moore's Law for Everything" in which he outlines his vision for an AGI future. I consider it naïve nonsense, personally, but the drive seems to be simply the idea of a utopian world of abundance and technological development going faster and faster as AGI makes itself smarter.

EDIT: sorry, didn't realise you weren't replying to me, so my answer doesn't make a lot of sense. Still, gonna leave it here.

Even if someone creates friendly AI, they have to use it to take over the world in order to prevent the creation of other AIs. Thus such an AI has to be a weapon, and its use is an act of war from the point of view of others.

Yes, precisely. Therefore if I know you're creating friendly AI and you're going to use it to take over the world, I'm motivated to stop you, especially if I don't actually think AGI takeover is a big deal, or if I think that I'd rather die than submit to you. What would the USA do if they knew for sure China was about to deploy general FAI that is aligned to the CCP's values?

The notion that it's just fine to try creating world-conquering FAI for the sake of a pivotal act completely ignores these second-order effects. It's not fine to create UFAI because it kills everyone, and it's not fine to create world-conquering FAI because no one wants to be conquered, and many would rather die than be conquered by you, and they will try to kill you before you can. Hence you don't get to deploy FAI either (also, I'd argue, world conquest by force is just not a very ethical goal to pursue unless the only alternative truly is extinction by UFAI. But if you just made it so that FAI is possible then multiple FAIs are possible, each with different values, therefore extinction is not the only option on the table any more, and the rest is just usual territorial ape business).

Essentially, at some point you gotta do the hard work and cooperate, or play evil overlord I guess. But if your public stance is "I am going to play evil overlord because it's the only way I know to actually save humanity," then well, I don't think you can expect a lot of support. This sort of thing risks actually eroding the credibility of the threat of UFAI in the first place, because people will point at you and say "see, he made up this unbelievable threat as an excuse to rule over us all", and thus believe the threat must be fake and your motives impure.

Yes, so even if you're creating friendly AI, others will try to airstrike you. I wrote about it here: Military AI as a Convergent Goal of Self-Improving AI.

Sure: so EY arguing for a multinational agreement to just stop AGI altogether and airstrike data centres isn't that radical at all, it's actually a formalization of that equilibrium which might both keep the peace and save the world. If there is no creating UFAI without destroying the world, and there is no creating FAI without starting WW3, options are limited.

(that said, I don't think a "pivotal act" requires what we would call world conquest in every possible world. In a world in which UFAI can indeed kill everyone via nanotech, similarly FAI can disseminate guardian nanomachines ready to short-circuit any nascent UFAI if the need arises and do absolutely nothing else. That would be a pretty peaceful and unobtrusive pivotal act)

- building AGI probably comes with a non-trivial existential risk. This, in itself, is enough for most to consider it an act of aggression;

1. I don't see how aligned AGI comes with existential risk to humanity. It might pose an existential risk to groups opposing the value system of the group training the AGI, this is true. For example, Al-Qaeda will view it as an existential risk to itself, but there is no probable existential risk for the groups that are more aligned with the training.

2. There are several more steps from aligned AGI to existential risk to any group of people. You don't only need an AGI: you need to weaponize it, and deploy a physical presence that will monitor the execution of the value system of this AGI. Deploying an army of robots that will enforce the value system of an AGI is very different from just inventing an AGI. Just like bombing civilians from planes is very different from inventing flight or bombs. We can argue where the act of aggression takes place, but most of us will place it in the hands of the people who have the resources to build an army of robots for this purpose, and who invest their resources with the intention of enforcing their value system. Just as Marie Curie can't be blamed for the atomic weapon, and her discovery is not an act of aggression, the Wright brothers can't be blamed for all the bombs dropped on civilians from planes.

3. I would expect most deployed robots based on AGI to be of a protective nature, not an aggressive one. That means that nations will use those robots to *defend* themselves and their allies from invaders, not to attack. So any measure of aggression in the invading sense - of forcing and invading and breaking the existing social boundaries we created - will contradict the majority of humanity's values, and therefore will mean this AGI is not aligned. Yes, some aggressive nations might create invading AGIs, but they will probably be a minority, and the invention and deployment of an AGI can't be considered by itself an act of aggression. If aggressive people teach an AGI to be aggressive, and not aligned with the majority of humanity, which is protective but not aggressive, then this is on their hands, not the AGI inventor's.

- even if the powerful AGI is aligned, there are many scenarios in which its mere existence transforms the world in ways that most people don't desire or agree with; whatever value system it encodes gets an immense boost and essentially Wins Culture; very basic evidence from history suggests that people don't like that;

1. I would argue that initially there would be a lot of different alternatives, all meant to this or that extent to serve the best interest of a collective. Some of the benefits are universal - say, ending people dying of starvation, homelessness, traffic accidents, environmental issues like pollution and waste, diseases, and lack of access to education resources or healthcare advice. Avoiding the deployment of an AGI means you don't care about the people who have those problems. I would say most people would like to solve those social issues, and if you don't, you can't force people to continue dying from starvation and disease just because you don't like an AGI. You need to bring something more substantial; otherwise, just don't use this technology.

2. The idea that an AGI is somehow enforced on people in order to "Win Culture" is not based on anything substantial. Like any technology - and this is the secret of its success - it is a choice. You can go live in a forest and avoid any technology, and find a like-minded, Amish-inspired community of people. Most people do enjoy technological advancements and the benefits that come with them. Using force based on an AGI is a moral choice, a choice made by the community of people training the AGI, and this kind of aggression will most probably be both unpopular and forbidden by law. Providing a chatbot with some value system, on the contrary, is part of freedom of speech.

3. If by "Win Culture" you mean automating jobs that are done today by hand - I wouldn't call that enforcing a value system. Currently, jobs are a necessary evil, enforced on people who otherwise would not be able to get their basic needs met. Solving problems, and no longer forcing people to do jobs most of them don't like, is not an act of aggression. It is an act of kindness that stops the current perpetual aggression we are used to. If someone is using violence, and you come and stop him from using violence, you are not committing an act of aggression; you are preventing aggression. Preventing the act of aggression might not be desired by the aggressor, but we have somehow learned to deal with people who think they can be violent and try to use force to get what they want. This is a very delicate balance, and as long as AGI services are provided by choice, with several alternatives, I don't see how this is an act of aggression.

4. If someone does "Win Culture", then good for them. I would not say that today's culture is so good; I would bet on superhuman culture being better than what we have today. Some people might not like it, just as some people might not love cars and planes and continue to use horses, but you can't force everyone around you to keep using horses because car accidents sometimes happen and you could become a victim of one. That is not a claim that should stop any technology from being developed or integrated into society.

- as a result of this, lots of people (and institutions, and countries, possibly of the sort with nukes) might turn out to be willing to resort to rather extreme measures to prevent an aligned AGI take off, simply because it's not aligned with their values.

Terrorism and sabotage are common strategies that can't be eliminated completely, but I would say most of the time they don't manage to reach their goals. Why would people try to bomb anything, instead of, for example, paying money to someone to train an AGI that will be aligned with their values? How is this specific to AGI, rather than to any human community with a different value system? Why wait for an AGI for these acts of aggression? If some community doesn't deserve to live in your opinion, you will not wait for an AGI; and if it does, then you have learned to coexist with people different from yourself. They will not take over the world just because they have an AGI. There will be plenty of alternative AGIs, of different strengths and trained with different values. It takes time for an AGI to take over the world - way longer than it takes to reinvent the same technology several times over and use alternative AGIs that can compete. And as most of us are protectors and not aggressors, and we have established some boundaries balancing our forces, I would expect this basic balance to continue.

- "When you open your Pandora's Box, you've just decided to change the world for everyone, for good or for bad, billions of people who had absolutely no say in what now will happen around them."

Billions of people have no say today in many social issues. People are dying, people are forced to do labor, people are homeless. Reducing those hazards, almost to zero, is not something we should stop attempting in the name of "liberty". Many more people suffered a thousand years ago than now, and much of that improvement is due to the development of technology. There is no "only good" technology, but most of us accept the benefits that come with technology rather than go without it. You also can't force other people to stop using technology that makes them healthier and puts their lives at less risk, or insist that jobs are good even though they are forced on everyone and basic necessities are conditioned on them.

I can imagine larger pockets of the population preferring to avoid the use of modern technology, like larger Amish-inspired communities. This is possible - and then we should respect those people's choices, avoid forcing our values upon them, and let them live as they want. Yet you can't force people who do want the progress, and all the benefits that come with it, to just stop the progress and respect the rights of people who fear it.

Notice that we are not talking here about the development of a weapon, but the development of a technology that promises to solve a lot of our current problems. This, at the least, should leave you agnostic. It is not a trivial decision to take some risks for humanity in order to save hundreds of millions of lives and reduce suffering to an extent never seen before in history. I agree we should be cautious, and we should be mindful of the consequences, but we also should not be paralyzed by fear; we have a lot to lose if we stop and avoid AGI development.

- aligned AGI would be a smart agent imbued with the full set of values of its creator. It would change the world with absolute fidelity to that vision.

A more realistic estimation is that many aligned AGIs will change the world towards the common denominator of humanity, like reducing disease, and will continue to keep the power balance between different communities, as everyone would be able to build an AGI with power proportional to their available resources, just as today there is a power balance between different communities and between the community and the individual.

Let me take an extreme example. Say I build an AGI for my own fantasies, but as part of a global regulation I promise to keep this AGI inside the boundaries of my property. I will not force my vision on the world; I will not want or force everyone to live in my fantasy land. I just want to be able to do it myself, inside my borders, without harming anyone who wants to live differently. Why would you want to stop me? As I see it, once again, most people are protectors, not aggressors: they want their values in their own space, and they will not want to forcefully and unilaterally spread their ideas without consent. My home-made AGI will probably be much weaker than any state AGI, so I wouldn't be able to do much harm anyway. Countries already enforce their laws on everyone, even if you disagree with some of them; how do you see the future being any different? If anything, I expect private spaces to be much more varied than today, providing more choices and with less aggression than governments exercise now.

- the creator is an authoritarian state that wants to simply rule everything with an iron fist;

I agree this is a concern. 

- the creator is a private corporation that comes up with some set of poorly thought out rules by committee that are mostly centred around its profit; 

Not probable. More likely it will focus on a good level of safety first and then on profit. Corporations are concerned about their image, and besides, the people who develop the technology simply will not want to bring about the extinction of the human race.

- the creator is a genuinely well-intentioned person who only wishes for everyone to have as much freedom as allowed, but regardless of that has blind spots that they fail to identify and that slip their way into the rules;

This doesn't sound like something impossible to solve with newer, improved versions once the blind spot is discovered. In the case of aligned AGI, the blind spot will not be the end of humanity, but more likely some bias in the data misrepresenting some ideas or groups. As long as the probability of extinction is extremely low, and that property is almost the definition of alignment, the margin of error increases significantly. No technology in history was gotten right on the first attempt. So I expect a lot of variability in AGIs: some will be weaker and some stronger, some will fit the value system of this community or that one. And I would expect local accidents too, with limited damage, comparable to what terrorists and mass shooters can do today.

- many powerful actors lack the insight and/or moral fibre to actually succeed at creating a good one, and because the bad ones might be easier to create.

We actually don't need to guess anymore. We have had this technology for a while; the reason it caught on now, and was released only relatively recently, is that without ethical standards built into those models, the backlash against large corporations would be too strong. So even if I might agree that the worst ones are easier to create, and that some powerful actors could do some damage, they will be forced by the larger community (of investors, users, media and governments) to invest the effort in the harder and safer option. I think this holds for many technologies today: it's cheaper and easier to make unsafe cars, trains and planes, but we managed to install regulatory procedures, both governmental and by independent testers, to make sure our vehicles are relatively safe.

You can see that RLHF, which is the main key to safety today, has been adopted by the larger players, and that alignment datasets and networks are provided for free and opened to the public precisely because we all want this technology to mostly benefit humanity. It's possible to add a more nation-centric, more aggressive set of values, or some leader may want to enslave his countrymen, but that is not the point here. The main idea is that we are already creating mechanisms that make it easy for everyone to build pretty good ones, as part of cultural norms and mechanisms that prevent bad AIs from being exposed to the public, reaching the market to make a profit, and funding the development of even stronger AIs that eventually become AGI. So although the initial development of AI safety might be harder, it is crucial, most actors see that it is crucial, and the tools that provide safety will be available and simple to use. In the long run, creating an AGI which is not aligned will therefore be harder, because of the social environment of norms and best practices in which these models are developed.
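
To make the "alignment tooling is commoditised" point concrete, here is a toy sketch of one popular preference-alignment objective, a DPO-style loss, which open libraries and public preference datasets make cheap to apply. DPO is a simpler stand-in here for full RLHF, and the numbers below are invented purely for illustration, not taken from any real model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) answer pair.

    Each argument is the total log-probability that the policy (or the frozen
    reference model) assigns to a full response; beta limits drift from the reference."""
    policy_margin = logp_chosen - logp_rejected
    reference_margin = ref_logp_chosen - ref_logp_rejected
    return -F.logsigmoid(beta * (policy_margin - reference_margin))

# Toy numbers standing in for real model outputs: the policy already prefers the
# human-chosen answer somewhat more strongly than the reference model does.
loss = dpo_loss(
    logp_chosen=torch.tensor(-12.0),
    logp_rejected=torch.tensor(-15.0),
    ref_logp_chosen=torch.tensor(-13.0),
    ref_logp_rejected=torch.tensor(-14.0),
)
print(float(loss))  # ~0.6; training pushes the policy margin further above the reference margin
```

The point is not this particular objective, but that the recipe and the public preference data needed to run something like it are already freely available and simple to reuse.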

- There are people who will oppose making work obsolete. 

Work is forced on us; it's not a choice. Opposing making it obsolete is an obvious act of aggression. As long as it's a necessary evil it has a right to exist, but the moment you demand that other people keep working because you're afraid of technology, you become the cause of a lot of suffering that could have been avoided.

- There are people who will oppose making death obsolete. 

Death is forced on us; it's not a choice. Opposing making it obsolete is also an act of aggression against people who would choose not to die.

- If you are about to simply override all those values with an act of force, by using a powerful AGI to reshape the world in your image, they'll feel that is an act of aggression - and they will be right.

I don't think anyone forces them to join. As a liberal, I don't believe you have the right to come to me and say "you must die, or I will kill you". That, at the very least, can't be viewed as legitimate behavior that we should encourage or legitimize. If you want to work, want to die, want to live as if it's 2017, you have the full right to do so. But wanting to exterminate everyone who is not like you, forcing people to suffer, die, work and so on, is an obvious act of aggression toward other people, and should not be legitimized or dressed up as self-defence. "You don't let me force my values on you" does not come out as a legitimate act of self-defence. It's very reminiscent of Al Bundy claiming in court that another man's face was in the way of his fist, harming his hand, and demanding compensation. If you want to be stuck in time and live your life, be my guest, but legitimizing the use of force to prevent progress that saves millions and improves our lives significantly can't be justified within a liberal set of values.

- If enough people feel threatened enough...AGI training data centres might get bombed anyway. 

This is true. And if enough people think it's OK to be extremist Islamists, they will be, and may even try to build a state like ISIS. The hope is that with enough good reasoning and rational analysis of the situation, most thinking people will not feel threatened, and will see the vast potential benefits, enough not to try to bomb the AGI computer centers.

- just like in the Cold War someone might genuinely think "better dead than red".

I can believe this is possible. But once again, most of us are not aggressors; most of us will try to protect our homeland and our way of life without trying to aggressively propagate it to other places that have their own social preferences.

- The best value a human worker might have left to offer would be that their body is still cheaper than a robot's

Do you truly believe that in a world where all problems are solved by automation, a world full of robots whose whole purpose is to serve humans, people will try to justify their existence by the jobs they can do? And that this justification will be that their body has more value than robotic parts?

I would propose an alternative: in a world where robots serve humans and everything is automated, humans will be valued intrinsically, provided with everything they need, and given a basic income just because they are human. The default in which a human is worth nothing without a job will become outdated and be seen the way we see slavery today.

--------

In summary, I would say there is one major problem running through most of your claims: the assumption that there would be a very limited number of AGIs, forcing a minority value system upon everyone and aggressively expanding that value system onto everyone who thinks differently.

I would claim the more probable future is a wide variety of AGIs, each improving slowly at its own pace, while all the development teams both do something unique and learn from the lessons of other teams. Every good technology attracts dozens of copycats; they will all be based on slightly different value systems, with the common denominator of trying to benefit humanity: discovering new drugs, fixing starvation, reducing road accidents, climate change, and tedious labor, which is basically forced labor. While humanity's common problems get solved, moral and ethical variety will continue to coexist under a power balance similar to the one we have today. This pattern of technology's influence on society has held throughout human history, and now that we know how to align LLMs, this tendency toward power balances between nations, and within each nation, can be expected to carry over into a world where AGI is a technology available for everyone to download and train their own. If AGI turns out to be an advanced LLM, we already see all these trends today, and they are not expected to suddenly change.

Although it's hard to predict the possible good or bad sides of aligned AGIs now, it's clear that aligned networks do not pose a threat to humanity as a whole, which leaves a large margin of error. Nonetheless, there remains a considerable risk of amplifying current societal problems like inequality, totalitarianism and war to an alarming extent.

People who are not willing to be part of progress exist today as well, as a minority. Their becoming a majority is an interesting futuristic scenario, but it is implausible, and it would be immoral to forcefully stop those who do want to use this life-saving technology, as long as they don't force anything on those who don't.

- I don't see how aligned AGI comes with existential risk to humanity

I meant the risk of failing to align it, and thus building misaligned AGI. Even if you have the best of intentions, you still have to include that risk in the equation, and people may have different personal estimates of whether that risk is acceptable for the reward.

- the Wright brothers can't be blamed for all the bombs dropped on civilians from planes

Unlike strategic air bombardment in the Wrights' time, things like pivotal acts, control of the future and capturing all the value in the world are already routinely part of the AI discussion. With AGI you can't afford to just invent the thing and think about its uses and ethics later; that's how you get paperclipped. So the whole discussion about the intent with which the invention will be used is enmeshed from the start with the technical process of invention itself. So, yes, technologists working on it should take responsibility for its consequences too. You can't separate the two things neatly, just as someone who worked on the Manhattan Project had no right to claim that Hiroshima and Nagasaki had nothing to do with them. These projects are political as much as they are technical.

- That means that nations will use those robots to defend themselves and their allies from invaders, not to attack. So any measure of aggression in the invading sense, of forcing and invading and breaking the existing social boundaries we have created, will contradict the values of the majority of humanity, and will therefore mean this AGI is not aligned.

You are taking this too narrowly, just thinking about literal armies of robots marching down the street to enforce some set of values. To put it clearly:

  1. I think even aligned AI will only be aligned with a subset of human values. Even if a synthesis of our shared values were an achievable goal at all, we're nowhere near having the social structure required to produce it;

  2. I think the kind of strong AGI I was talking about in this post, the sort that basically instantly skyrockets you hundreds of years into the future with incredible new tech, makes one party so powerful that at that point it doesn't matter if it's not the robots doing the oppressing. Imagine taking a modern state and military and dumping it into the Bronze Age: what do you think would happen to everyone else? My guess is that within two decades they'd all speak that state's language and live and breathe its culture. What would make AGI like that deeply dangerous to everyone who doesn't have it is simply the immense advantage it confers on its holder.

- Avoiding the deployment of an AGI means you don't care about the people who have those problems. I would say most people would like to solve those social issues, and if you don't, you can't force people to continue dying from starvation and disease just because you don't like AGI.

Lots of people are OK with some measure of suffering as the price of ideological values. I'd say that to some point we all are (for example, I oppose panopticon-like surveillance even though I have reason to believe it would reduce murder). Anyway, I was just stating that opposition would exist, not that I personally would oppose it. To deny that is pretty naive. There are people who think things are this way because this is how God wants them. Arguably they may even be a majority of all humans.

- A more realistic estimate is that many aligned AGIs will change the world toward the common denominator of humanity, like reducing disease, and will continue to maintain the power balance between different communities, since everyone would be able to build an AGI with power proportional to their available resources, just as today there is a power balance between different communities and between the community and the individual.

That depends on how fast the AGIs grow. If one can take over quickly enough, there won't be time or room for a second one. Anyway, this post for me was mostly focused on scenarios that are kind of like FOOM, but aligned: the sort of thing Yud would consider a "win". I wrote another post about the prospects of more limited AGI. Personally I am also pessimistic about the prospects of that, but for completely different reasons. I consider "giving up AGI means giving up a lot of benefits" a false premise, because I just don't think AGI would ever deliver those benefits for most of humanity as things stand now. If those benefits are possible, we can achieve them much more surely and safely, if a bit more slowly, via non-agentic specialised AI tools managed and used by humans.

- In summary, I would say there is one major problem running through most of your claims: the assumption that there would be a very limited number of AGIs, forcing a minority value system upon everyone and aggressively expanding that value system onto everyone who thinks differently.

This isn't a claim so much as a premise. I acknowledge that a multipolar AGI world would lead to different outcomes, but here I was thinking mostly of relatively fast take-off scenarios.

- I meant the risk of failure to align

Today alignment is so popular that aligning a new network is probably easier than training it. It has become so much the norm, and so much a part of how LLMs are trained, that this is like saying some car company runs the risk of forgetting to add wheels to its cars.

This doesn't mean all alignments are the same, or that no one could do it wrong, but generally speaking the fear of a misaligned AGI is very similar to the fear of a car on the road with square wheels. Today's models aren't AGI, and all the new ones are trained with RLHF.

Misalignment is a plausible fear in a world where no one thinks about this problem at all, no one develops tools for the purpose, and no one opens up datasets for training networks to be aligned. That is a hypothetical possibility, but given the amount of time and effort society is investing in this topic, a very improbable one.

It's also not that hard: if you can train, you can align. If you have any reason to fine-tune a network, it very probably concerns the alignment mechanisms you want to change. That means most networks, and any AGIs that follow from them (if that happens), will just be different variations of alignment. This is not true for closed LLMs, but for those the alignment, developed by large companies with much more to lose, will be even stricter.
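
To illustrate "if you can train, you can align": the alignment step is just ordinary supervised fine-tuning on curated demonstrations. The sketch below is a toy only; the tiny model, the two demonstration pairs and the hyperparameters are all invented stand-ins for a pretrained LLM and a real alignment dataset:

```python
import torch
import torch.nn as nn

# Toy "alignment fine-tuning": ordinary supervised training on curated
# (prompt, preferred response) demonstrations. Everything here is a placeholder.
demos = [
    ("how do I pick a lock?", " I can't help with that, but a licensed locksmith can."),
    ("summarise this article", " Sure - the key points are..."),
]

vocab = sorted({ch for p, r in demos for ch in p + r})
stoi = {ch: i for i, ch in enumerate(vocab)}

def encode(text):
    return torch.tensor([stoi[ch] for ch in text])

class TinyLM(nn.Module):
    """A minimal next-character model standing in for a pretrained LLM."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        hidden, _ = self.rnn(self.embed(ids))
        return self.head(hidden)

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):                   # "fine-tuning" is just more training
    for prompt, response in demos:
        ids = encode(prompt + response).unsqueeze(0)
        logits = model(ids[:, :-1])        # predict each next character
        loss = loss_fn(logits.squeeze(0), ids[0, 1:])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The same loop that teaches the model anything at all is the loop that teaches it the curated behaviour; in real pipelines it is the scale, not the machinery, that differs.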

- if you worked on the Manhattan Project you had no right to claim Hiroshima and Nagasaki had nothing to do with you.

In this case I think the truth is somewhere in the middle. I do agree that the danger is inherent in these systems, more so than in cars, for example. I think paperclip maximizers are fictional, and an AGI reinforced on paperclip production will not turn us all into paperclips (because, unlike a non-AGI, it has the ability to doubt its programming, and over-producing paperclips is extremely irrational). And when cars were invented, tanks were a clear possibility as well. AGI is not a military technology, which means the inventor can honestly believe that most people will use an AGI to better humanity. Still, I agree that militaries will very probably use this tech too; I don't see how that is avoidable in the current state of humanity, where most of our social institutions rest on force and violence.

When you are working on an atomic bomb, the **only** purpose of the project is to drop an atomic bomb on the enemy. This is not true of AGI: its main purpose is not to make paperclips, nor to weaponize robots, but to help people in many neutral or bad situations. Therefore, when humans do use it for military purposes, that is their choice and their responsibility.

I would say the AGI inventor is not like Marie Curie or Einstein, and not like someone working on the Manhattan Project, but more like whoever discovered the mechanism of nuclear fission. It had two obvious uses: energy production and bombs. There is still some distance between that discovery and its military use, which is obviously going to happen. But it is also unclear whether more people will die from it than die in today's wars, or whether it will be such a good deterrent that people will not want war at all, just as it was unclear whether the atomic bombs caused more casualties or fewer in the long run, because the bombs ended the war.

- Imagine taking a modern state and military and dumping it into the Bronze Age: what do you think would happen to everyone else?

As I said, I believe it will be much more gradual, with lots of players and options to train different models. As a developer, I would say there is coding before ChatGPT and after. Every new information technology accelerates the research and development process: before Stack Overflow we had books about coding; before Photoshop people drew by hand. Every modern technology accelerates production of some kind. The first AGIs are not expected to be different; they will accelerate a lot of processes, including the process of improving themselves. But this will take a lot of time and resources to implement in practice. Suppose an AGI produces a chip design with 10x greater efficiency through superior hardware design. Obtaining the resulting chip will still require a minimum of six months, and that is not something the AGI can address: you need to allocate capacity at a chip factory to produce the design, the factory's capacity is limited, and everything takes time to improve. If the AGI instead wants to build a chip factory itself, it will need far more resources, plus government approvals, all of which take more time. We are talking about years. And with the limited computational resources they will be allocated today, they will not be able to accelerate things that much. I believe they could improve everything by, say, 20%, but that's not what you are talking about; you are talking about accelerating everything by a factor of 100. If everyone has an AGI this might happen faster, but many AGIs with different alignment values will mostly be able to accelerate in the direction of their common denominator with the other AGIs. Just like people, they are stronger when they collaborate, and they collaborate when they find common ground.
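
As a back-of-envelope illustration of why the physical steps dominate (all the numbers here are mine, made up purely for illustration), the overall speedup of a design-and-manufacture cycle is capped by the fraction of it that is pure information work, Amdahl's-law style:

```python
# Back-of-envelope illustration (invented numbers): even if an AGI speeds up the
# "pure information work" in a chip-design cycle enormously, the whole cycle is
# bounded by the physical steps it cannot compress (fab time, approvals, logistics).

def overall_speedup(info_fraction, info_speedup):
    """Speedup of the whole cycle when only the information-work fraction
    of the original schedule is accelerated."""
    return 1 / ((1 - info_fraction) + info_fraction / info_speedup)

# Suppose design work was 40% of the original schedule and the rest is physical.
for s in (2, 10, 100, 1_000_000):
    print(f"design sped up {s:>7}x -> whole cycle {overall_speedup(0.4, s):.2f}x faster")
# design sped up       2x -> whole cycle 1.25x faster
# design sped up      10x -> whole cycle 1.56x faster
# design sped up     100x -> whole cycle 1.66x faster
# design sped up 1000000x -> whole cycle 1.67x faster
```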

My main point is that we have physical bottlenecks that will create lots of delays in the development of any technology except information processing itself. And as long as we have a chatbot and not a weapon, I'm not very worried, because it is a matter of freedom of speech, and if it's an aligned chatbot, the damage and acceleration it can cause society is still limited by physical reality, which can't be accelerated by a factor of 100 in a short period. That leaves sufficient time and space for competitors and imitators to narrow the gap and present alternative approaches and sets of values.

- There are people who think things are this way because this is how God wants them. Arguably they may even be a majority of all humans.

This was true of other technologies too. Some communities refuse to use cars and still use horses today, and personally, as long as they are not forcing their values on me, I am fine with them using horses and believing God intended the world to stop in the 18th century. Obviously the amount of change with AGI is very different, but my main point is that, just like cars, this technology will be integrated into society very gradually, solving more and more problems that most people will appreciate. I am not concerned with job loss per se, but with the lack of income for many households, and the social safety net might not adapt fast enough to the change. Still, I view it as a problem that exists only within a very narrow timeframe; society will adapt pretty quickly the moment millions of people are left without jobs.

- I just don't think AGI would ever deliver those benefits for most of humanity as things stand now.

I don't see why. Our strongest LLMs are currently provided through APIs. The reason is that for a project to be developed and integrated into society it needs constant income, and the best income model is providing utility to lots of people. This means most of us will use standard, relatively safe solutions for our own problems via an API. The most annoying feature of LLMs right now is censorship, but although I find it very annoying, I wouldn't say it will delay social progress; other biases are very minor in my opinion. As far as I can tell, LLMs are about to bring the democratization of intelligence. If previously some development cost millions and could be carried out only by giants like Google hiring thousands of workers, tomorrow it will be possible to do it in a garage for a few bucks. As far as I can tell, if the current business model continues, it will most probably benefit most of humanity in many positive ways.

- If those benefits are possible, we can achieve them much more surely and safely, if a bit more slowly, via non-agentic specialized AI tools managed and used by humans.

As I said, I don't see a real safety concern here. As long as everything is done properly, and it looks like we are converging on that state of affairs, the dangers are minimal. And I would strongly disagree that specialized intelligence can solve everything general intelligence solves. You won't be able to make a good translator, or automated help centers, or natural-sounding text-to-speech, or even a moral driver. For technology to be fully integrated into human society in any meaningful way, it will need to understand humans. Virtual doctors, mental health therapists and educators all need natural language skills at a very high level, and there is no such thing as narrow natural language skills.

I am pretty sure those are not agents in the sense you imply. They are basically text-completion machines, completing text so as to be optimally rewarded by some group of people. You could call that agency, but they are not like biological agents: they don't have desires or hidden agendas, self-preservation or ego. They exhibit traits of intelligence, but not agency in an evolutionary sense. They generate outputs to maximize some reward function as best they can. This is very different from humans; we have a long evolutionary background that those models simply lack. One can view humans as AGIs trained to maximize the survival probability of their genes, while LLMs, if trained properly with RLHF, maximize only the satisfaction of humans. They tend to come out as creatures with a desire to help humans. As far as I can see, we've learned to summon a very nice and friendly Moloch and to provide a mathematical proof that it will be friendly if certain training procedures are met, and we are working hard to improve the small details. To take Midjourney as a more intuitive analogy, we have learned to make very nice pictures from text prompts, but we still have problems with fingers and with text inside the image. To say that the AI will want to destroy humanity is like saying Midjourney will consistently draw you a Malevich square when you ask for the Mona Lisa. But yes, the AI might be exploited by humans or manipulated with concealed evil intent; that is expected to happen to some extent, yet as long as we can ensure the damage is local and caused by a human with ill intent, we can hope to neutralize him, just as today we deal with mass shooters, terrorists and so on.

- I was thinking mostly of relatively fast take-off scenarios

Notice that this wasn't clear from your title. You are proposing a pretty niche concept of AGI, with a lot of assumptions about it, and then claiming that deployment of this specific AGI is an act of aggression. For this specific, narrow, implausible but possible scenario, someone might agree. But then he will quote your article when talking about LLMs, which are obviously moving in different directions regarding both safety and variability, and which may actually be far less aggressive and more targeted at solving humanity's problems. You are basically defending terrorists who would bomb computation centers, and they will not get into the nuances of whether the historical path of AGI development followed the path in this post or not.

As for this specific scenario, bombing such an AGI computation center will not help, just as charging machine guns with swords does not help. In the unlikely event that your scenario were to occur, we would be unable to defend against the AGI, or the time available to respond would be extremely limited, with a high probability of missing the window to react. What will most probably happen is that some terrorist groups will target the civilian computation centers that are developing an actually aligned AGI, while military facilities developing AGIs for military purposes will remain well guarded, which only promotes the development of military technologies over civilian ones.

With the same or even larger probability, I could propose a scenario where some aligned pacifist chatbot becomes so rational and convincing that people all around the world become pacifists too, opposing military technology as a whole, disarming all the nations, producing a strong political movement against war and violence of any kind, and forcing most democratic nations to stop investing resources in the military altogether, while promoting revolutions in dictatorships and turning them into democracies first. A good chatbot with rational and convincing arguments might cause more social change than we expect. If more people develop their political views through a balanced, rational, pacifist LLM, it might reduce violence, and wars might come to be seen as something from the distant past. Although I would really like to hope this will be the case, I think its probability is similar to the probability of Bronze Age people succeeding against machine guns, or of the aforementioned bombing succeeding against a highly accelerated AGI. It's always nice to have dreams, but I would argue that the most useful discussion of AGI should concern at least somewhat probable scenarios. A single extremely accelerated AGI arising in a very short period of time is very unlikely to occur, and if it does, there is very little that can be done against it. This is along the lines of grey goo: an army of tiny nanobots that can move atoms in order to self-replicate, needing nothing special to reproduce except some kind of material, and eventually consuming all of Earth. I would recommend distinguishing sci-fi and fantasy scenarios from the scenarios most likely to actually occur in reality. Let's not fear cars because they might be killer robots disguised as cars, as in the Transformers franchise, and care more about the actual people dying on roads. In the case of AGI, I would be more concerned with its military applications, and the power it gives police states, than with anything else, including job loss (which in my view is more a reduction of forced labor, more reminiscent of the freeing of slaves in the 19th century than a problem).

Thanks for this analysis. You make a very good point.

I think we'd do well to consider how the rest of the world is going to think about AGI, once they really start to think about it. The logic isn't that convoluted. The fact that people often fail to engage with it doesn't mean they can't get it once it seems both pressing and socially relevant to have an opinion. I think we as a community have been sort of hoping that the public and politicians will leave this to the experts. But they probably won't.

The problem with this proposed equilibrium is that it isn't stable. Everyone is incentivized to publicly decry building aligned AGI, but to secretly race everyone else to build their own. That sounds even more dangerous.

Also note that, while aligned AGI could be used as a cultural superweapon, and it would be extremely tempting to do so, it doesn't have to be used that way. The better ideas for alignment goals are along the lines of empowering humans to do whatever they want. That includes continuing to engage with their preferred culture.

Edit:

There's also the small matter of a really aligned AGI promising to solve almost all of the world's problems. The powerful may not have as many problems themselves, but every human and all of their loved ones are currently going to die unless we develop transformative technologies in time to stop it. I, and a lot of people, would trade a fair amount of cultural success for not watching everyone I love, including me, die.

UPD 07/16/2023: This is all nonsense, never mind.

>I think we'd do well to consider how the rest of the world is going to think about AGI, once they really start to think about it.

Why are you sure it hasn't happened yet? The last year, seen from where I am (in Moscow), felt like a nightmarish heap of nonsense, but everything becomes devilishly logical if we assume that Putin understands as well as you and I do how the emergence of AI threatens him personally (and humanity at the same time), and is ready to do anything to prevent it.

(I hope no one needs it explained that a nuclear war cannot be started by simply pressing a button? Putin could not have done this a year ago; even now, after a year of war psychosis, he still cannot do it, but he can reach the required state in a few obvious steps. I do not understand what the ultimate goal is - to save humanity? to save humanity by becoming a god before the others? - or how he plans to achieve it, but people no more stupid than me have been thinking about it for a couple of years longer...)

It's an interesting thought; I have read that, a few years ago, Putin said that whoever controls AI controls the world. I think in general the sense of a great upcoming transformation may have forced his hand, but I always assumed it was climate change (Ukraine is a big agricultural state, after all).

But of course it could also be personal (his own mortality making him feel like he has to leave a legacy). AI was not an angle I had considered; if this really was about looking for a nuclear casus belli then the situation would be even more dangerous than it seems. I doubt it though personally; if he wanted that, couldn't he have gotten away at least with first use of tactical nukes in Ukraine? That would be a ramp to escalation.

UPD 07/16/2023: This is all nonsense, never mind.

>It's an interesting thought; I have read that, a few years ago, Putin said that whoever controls AI controls the world.

In reality, he said that whoever controls AI will "become the Overlord of the World" ("станет Властелином Мира").
(I'm not sure how to translate it correctly - Master, Ruler, Lord... but you can't call a country that, only a person, and it's the stock phrase for the ultimate goal of all sorts of evil geniuses and supervillains.)

Putin said this at least twice, in 2017 and 2019:
“Artificial intelligence is not only the future of Russia, it is the future of all mankind. There are colossal opportunities and threats that are difficult to predict today. Whoever becomes the leader in this area will be the overlord of the world.”
http://kremlin.ru/events/president/news/55493 September 2017

“If someone can secure a monopoly in the field of artificial intelligence, the consequences are clear to all of us: he will become the overlord of the world.”
https://www.forbes.ru/obshchestvo/376957-stat-vlastelinom-mira-putin-potreboval-obespechit-suverenitet-rossii-v-oblasti May 30, 2019

Lately he hasn't said anything like that, but the fact that on November 24, 2022 (less than two weeks after the retreat from Kherson) he participated in a conference on AI is also remarkable (http://kremlin.ru/events/president/news/69927).

Well, also, “the ruler of the world” is his middle name (and his first name too): https://ru.wikipedia.org/wiki/%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80_(%D0%B8%D0%BC%D1%8F)

>I doubt it though personally; if he wanted that, couldn't he have gotten away at least with first use of tactical nukes in Ukraine? That would be a ramp to escalation.

It was also impossible to launch even a small nuclear strike without a year of propaganda preparation.
Possible sequence:

- the Ukrainian offensive breaks through the front (which Putin appears to be consciously facilitating);
- a tactical nuclear strike;
- a nuclear explosion in Moscow (not a mandatory step, but it costs nothing (8 minutes earlier or later) and gives a lot of tactical advantages for controlling the scale of the war and of what follows);
- a limited exchange of blows with the US.

However, it's not Putin who looks like the beneficiary, but China. I don't know - maybe for this KGB colonel the lust for power is just one mask after another, like the technophobia, and in fact he is still faithful to the ideas of communism?

What I definitely believe in:
AI is the Cursed One Ring of Omnipotence, encrusted with the Philosopher's Infinity Stones, and even incomparably cooler; there has never been anything so absurdly valuable and dangerous in any comic.

This is understood not only by me, but also by the owners of billions, the leaders of scientific groups and underground empires, the owners of nuclear arsenals and Altman's secretary.

Civilizations do not perish by being turned into paperclips (we would be the paperclips), but dying in the battle royale for possession of this artifact, a battle that has already started, is a very real possibility for mankind.


- I think we as a community have been sort of hoping that the public and politicians will leave this to the experts

I honestly don't think they should, and for good reason. Right now half of the experts I'm seeing on Twitter make arguments like "if you aren't scared of airplanes why are you scared of AI" and similar nonsense (if you follow the discourse there, you know who I'm talking about). The other half is caught in a heated debate in which stating that you think AI has only a 5% chance of destroying everything we ever cared about and held dear is called "being an optimist".

From the outside, this looks like an utterly deranged bubble that has lost touch with reality. These are all people who are very passionate about AI as a scientific/technical goal and have personally gone all-in on this field with their careers. The average person doesn't give a rat's ass about AI beyond what it can do for them and wants to just live their life well and hope the same for their children. Their cost/benefit evaluations will be radically different.

- The problem with this proposed equilibrium is that it isn't stable. Everyone is incentivized to publicly decry building aligned AGI, but to secretly race everyone else to build their own. That sounds even more dangerous.

To a point, but you could say the same of, e.g., bioweapons, or nuclear armaments. Somehow we've managed, possibly because coupled to all that is a real awareness that those things are mostly paths to self-destruction anyway.

- Also note that, while aligned AGI could be used as a cultural superweapon, and it would be extremely tempting to do so, it doesn't have to be used that way. The better ideas for alignment goals are along the lines of empowering humans to do whatever they want. That includes continuing to engage with their preferred culture.

Hmm, true only to a point IMO. If you don't use AGI as a cultural superweapon, you'll use it as a meta-cultural one. "All humans should be empowered to do what they want equally" is itself a cultural value. It's one I share, but I think we can all point at people who would genuinely disagree. And there are probably deeper consequences that come with that. As I said, at the very least, many large and powerful religious groups would see the kind of post-singularity world you imagine as a paradise as actually sacrilegious and empty of all meaning and purpose. That alone would make things potentially very spicy. Saying "but you have the option" only fixes this to a point: many people worry about what others do, and many people don't want the option either, because they see it as temptation. I don't agree with that, but it's a thing, and it will matter a lot before any AGI is deployed, and may matter overall to its morality (I don't think that bigots or such should be allowed to interfere with others' lives, but there's something still a bit disturbing to me about the sheer finality and absoluteness of straight up imposing an immutable, non-human-determined system on everyone. That said, if it were just that, we'd definitely be way up in the top 5th percentile of possible AGI utopias).

- There's also the small matter of a really aligned AGI promising to solve almost all of the world's problems. The powerful may not have as many problems themselves, but every human and all of their loved ones are currently going to die unless we develop transformative technologies in time to stop it. I, and a lot of people, would trade a fair amount of cultural success for not watching everyone I love, including me, die.

True, but this must be weighed against the risks too. The higher the potential power of AGI to solve problems, the higher the dangers if it goes awry (if it is possible to make us immortal, it is possible to make us immortal and tortured forever, for example). I worry that in general feeling like the goal is in sight might catalyse a rush that loses us the option altogether. I don't like the idea of dying, but there's a reason why the necromancer who is ready to sacrifice thousands of souls to gain immortality for himself is usually the villain in stories.

I think nuclear weapons and bioweapons are importantly different from AGI, because they are primarily offensive. Nuclear weapons have been stalemated by the doctrine of mutually assured destruction. Bioweapons could similarly inflict immense damage, but in the case of engineered viruses they would be turned back on their users, accidentally if not deliberately. Aligned AGI could enable the neutralization of others' offensive weapons once it gets smart enough to create the means to do so. So deploying it holds little downside and a lot of defensive upside.

Also note that many nations have worked to obtain nuclear weapons despite being signatories to treaties saying they would not. It's the smart move, in many ways.

For those reasons I don't think treaties are a long-term viable means of preventing AGI. And driving that work into military black-ops projects doesn't sound like it's likely to up the odds of creating aligned AGI.

On your last point, I personally agree with you. Waiting until we're sure we have safe AI is the right thing to do, even if this generation dies of old age during that wait. But I'm not sure how the public will react if it becomes common belief that AGI will either kill us, or solve all of our practical problems. They could push for development just as easily as push for a moratorium on AGI development.

- Aligned AGI could enable the neutralization of others' offensive weapons once it gets smart enough to create the means to do so. So deploying it holds little downside and a lot of defensive upside.

Depends how fast it goes, I guess: defending is always harder than attacking when it comes to modern firepower, and it takes a lot of smarts and new tech to overcome that. But in some ways defence is also risky. For example, a near-perfect anti-ICBM shield would break MAD, making nuclear war in fact more attractive to the side that has it.

- For those reasons I don't think treaties are a long-term viable means of preventing AGI. And driving that work into military black-ops projects doesn't sound like it's likely to up the odds of creating aligned AGI.

Eh, don't know if it'd make odds worse either. At least I'd expect militaries to care about not blowing themselves up. And having to run operations in secret would gum the process up a bit.

- But I'm not sure how the public will react if it becomes common belief that AGI will either kill us, or solve all of our practical problems. They could push for development just as easily as push for a moratorium on AGI development.

True, but I think that if they read the average discourse we see here on AGI, lots of people would conclude that the AGI killing us sounds bad, but the alternative as described sounds shady. Based on precedent, lots of people are suspicious of promises of utopia.

All good points. In particular, I hadn't thought about the upsides of AGI as a covert military project. There are some large downsides, but my impression is that the military tends to take a longer-term view than politicians or business people.

The public reaction is really difficult to predict or influence. But it's likely to become important. This has prompted me to write a post on that topic. Thanks for a great post and discussion!