There's one thing history seems to have been trying to teach us: that the contents of the future are determined by power, economics, politics, and other conflict-theoretic matters.

Turns out, nope!

Almost all of what the future contains is determined by which of the two following engineering problems is solved first:

  • How to build a superintelligent AI (if solved first, everyone dies forever)
  • How to build an aligned superintelligent AI (if solved first, everyone gets utopia)

…and almost all of the reasons that the former is currently a lot more likely are mistake theory reasons.

The people currently taking actions that increase the probability that {the former is solved first} are not evil people trying to kill everyone, they're confused people who think that their actions are actually increasing the probability that {the latter is solved first}.

Now, sure, whether you're going to get a chance to talk with OpenAI/Deepmind/Anthropic's leadership enough to inform them that they're in fact making things worse is a function of economics and politics and the like. But ultimately, for the parts that really matter here, this is a matter of explaining, not of defeating.

And, sure, the implementation details of "utopia" do depend on who launches the aligned superintelligent AI, but I expect you'd be very happy with the utopia entailed by any of the possibilities currently on the table. The immense majority of the utility you're missing out on is from getting no utopia at all and everyone dying forever, rather than getting the wrong utopia implementation details.

The reason that the most likely outcome is that everyone dies forever is that the people who get to impact which of those outcomes is going to happen are mistaken (and probably not thinking hard enough about the problem to realize that they're mistaken).

They're not evil and getting them to update to the correct logical beliefs is a matter of reason (and, if they're the kind of weak agents that are easily influenced by what others around them think, memetics) rather than a matter of conflict.

They're massively disserving everyone's interests, including their own. And the correct actions for them to take would massively serve their own interests as well as everyone else's. If AI kills everyone they'll die too, and if AI creates utopia they'll get utopia along with everyone else — and those are pretty much the only two attractors.

We're all in this together. Some of us are just fairly confused, not agentically pursuing truth, and probably have our beliefs massively biased by effects such as memetics. But I'm pretty sure nobody in charge is on purpose trying to kill everyone; they're just on accident functionally trying to kill everyone.

And if you're not using your power/money to affect which of those two outcomes is more likely to happen than the other, then your power/money is completely useless. They won't be useful if we all die, and they won't be useful if we get utopia. The only use for resources, right now, if you want to impact in any way what almost all of the future contains (except for maybe the next 0 to 5 years, which is about how long we have), is in influencing which of those two engineering problems is solved first.

This applies to the heads of the major AI orgs just as much as it applies to everyone else. One's role in an AI org is of no use whatsoever except for influencing which of those two problems is solved first. The head of OpenAI won't particularly get a shinier utopia than everyone else if alignment is solved in time, and they won't particularly die less than everyone else if it isn't.

Power/money/being-the-head-of-OpenAI doesn't do anything post-singularity. The only thing which matters, right now, is which of those two engineering problems is solved first.

65 comments

I wish I could believe that.

To start off, I agree with pretty much all of it. It's unlikely that any of the main players actively want the world to end, and inasmuch as they'll bring that outcome about, it'll be by mistake. It's marginally more likely that some of them are risking the world in the pursuit of personal power/status/ideology, and monstrously consider everyone's deaths an acceptable risk, but I'm not certain of that either.

That said, I agree that we could, in principle, all cooperate here. The thing often touted about as a counterargument is "but China". Except China doesn't want to die any more than the US, and there isn't, in principle, any reason the Chinese government can't be convinced of the seriousness of the danger. They would believe in the reality of an asteroid on a collision course with Earth if shown the evidence, and the AGI threat is no less real.

What we all can cooperate towards is to ban AGI development, and seek other, more controllable and less unilateral ways to create superintelligences. Human cognitive enhancement, genetic engineering, uploads, whatever.

What we can all do, together, is avert the omnicide. It's in ~no-one's interests.

What I don't believe is that we could cooperate on building an AGI that would bring about a utopia.

The term "AI Alignment" is a bit obfuscatory. The technical problem of AI Alignment would be better termed the control problem. An aligned AGI isn't necessarily an AGI that does the best for humanity; an aligned AGI is one that does precisely what its designer(s) intended for it to.

And the utopias of the majority of people, or companies, or governments, across history and across the world today, would be death or a hellscape for most other people. We would not enjoy life in a world in which North Korea, or a corporate sociopath, or Trump's US, or some bureaucracy, solved AGI alignment.

Again, I would like to emphasize that "that means we need to outrace everyone else" is not the correct takeaway from this.

  • Firstly, who are "we"? There is no person or group of people Humanity can trust to do it right.
    • No-one who is currently racing can be trusted. A US company successfully solving alignment isn't particularly more likely to result in a pro-humanity utopia, or a utopia for you, than the Chinese doing it first.
    • No-one who may start racing can be trusted. The US military nationalizing the project isn't going to result in a good outcome, either. And if we set things up such that the AI's value system is being decided as a matter of national policy, via a democratic process? Where people discuss it among themselves, with politicians chiming in, and maybe then vote on it, and then the result is interpreted by some bureaucratic process, and then someone types it in? Don't make me laugh. We'd be lucky if the AI that'd result from this just kills everyone and tiles the universe with paperwork, rather than trapping everyone in some inescapable Kafkaesque nightmare.
      • (To be clear, it's not because the people will want bad things. It's because our processes for eliciting and agglomerating their preferences – any and all processes in wide use – are an abomination.)
    • To do it right, whatever process builds the AGI would need to be actively, desperately trying not to leave its fingerprints on the future. But I trust pretty much no-one and no thing to actually do that, instead of hijacking the future for their values. ... Except myself and a few specific people, of course. But I rather doubt that you or the people of Indonesia or humanity as a whole would feel particularly enthusiastic about handing off that decision to me, yeah?
    • Same for any other candidate. There is nobody to whom Humanity as a whole can entrust its future; nobody to whom it should feel comfortable deferring that decision.
  • Secondly, that'd be giving up too early. None of the above invalidates the core argument: even if we can't agree on a utopia, pretty much all of us would prefer to keep things as they are, keep incrementally improving our conditions and painstakingly negotiating compromises, to all of us just dying overnight. And the policy of "so we need to outrace the competitors before they build a hell" doesn't actually lead to your utopia, because you are not going to solve alignment in those conditions either. That policy leads to death. Which, again, ~nobody wants.

So I agree that we can all cooperate on this, that the current state of affairs is a mistake, and that we can negotiate for a better outcome. Ban AGI research internationally. Keep advancing using other technologies.

We will still need to reconcile our differences later on, of course. But it can be done incrementally, a steady pace of negotiation and power balancing and cultural mingling and sanity-raising. There are routes of intelligence enhancement that are more gradual, that let us preserve this sort of incrementalism and stability while still letting Humanity keep empowering itself. Gradual intelligence augmentation, via biotechnological or cyborgism-like or upload-based means. 

Humanity-as-a-whole can't entrust its future to any given part of itself. But it can still build a future for itself. There doesn't have to be a singular point in time at which we are deciding the whole shape of the future and are then unable to backtrack.

Bottom line: As things stand, anyone anywhere solving either AGI or AGI alignment is not, on expectation, going to lead to a good outcome for humanity. Our processes are too dysfunctional:

  • We can't trust each other to let each other solve alignment in peace, and–
  • – we can't survive if we let that distrust get to us and start racing each other, because then none of us solve alignment and we all die.

The best outcome we can all cooperate towards – and that is a good outcome that we can all cooperate towards – is to ban the accursed thing.

[-][anonymous]5mo150

Except China doesn't want to die any more than the US, and there isn't, in principle, any reason the Chinese government can't be convinced of the seriousness of the danger. They would believe in the reality of an asteroid on a collision course with Earth if shown the evidence, and the AGI threat is no less real.

The current belief held by many people is that future AI can be controlled.  And I think it's a statement of fact that if you accept architectural limitations that will lower net performance, you can build controllable/safe systems that exhibit AGI- and ASI-like behavior.  [I think it's fair to disagree on how much performance you have to give up, how strong a series of 'boxes' you use, and the outcome when the models escape]

So it's a race to get these systems and whoever loses, loses it all.  

You probably disagree with the above, but at present the Chinese and US governments appear to be acting like they are racing.  

For example, China is reacting to get access to compute:

https://wccftech.com/chinese-factories-dismantling-thousands-of-nvidia-geforce-rtx-4090-gaming-gpus-turning-ai-solutions/

https://www.tomshardware.com/news/old-rtx-3080-gpus-repurposed-for-chinese-ai-market-with-20gb-and-blower-style-cooling

https://www.chinadaily.com.cn/a/202310/21/WS65330e19a31090682a5e9dce.html

The US government is acting to shut off China's access to compute:

https://fortune.com/2023/12/02/ai-chip-export-controls-china-nvidia-raimondo/

 

The issue with your point of view is that as long as the evidence leaves 2 positions in superposition with good probability mass on both:

[ future AGI/ASI systems can be controlled and harnessed by humans using straightforward methods | future AGI/ASI systems cannot be controlled or harnessed by humans easily ]

Then the parties have to assume that AGI/ASI can be controlled, and will provide a pivotal advantage, and have to get strapped with their own.  Hence a race.  

 

To resolve the above superposition, the race would have to continue until AGI/ASI exists and many versions of it have been tested.  

  1. If all the versions fail to be controllable and the system causes industrial accident after accident (see nuclear fission power), that's one reality and in that world, heavy regulations and restrictions would make sense and likely be supported.
  2. If at least one architecture turns out to be pretty controllable, then that's the other world.
  3. The third world is that the utility gain from even early ASI is so enormous that it kills everyone.

 

I assume you believe (3) to be a fact.  But how do you propose to convince the key decision-makers without direct evidence?  

When you talk about the overwhelming power of an ASI - can invent nanotechnology in days, coordinate drone strikes that depose entire governments within hours, convince people to act against their own interests - think of how that sounds to government policymakers.  That sounds like a weapon you had better get immediately.  Conversely, 'weaker' and perhaps more realistic ASI systems that need years to do the above and vast resources are more controllable.

When you talk about the overwhelming power of an ASI - can invent nanotechnology in days, coordinate drone strikes that depose entire governments within hours, convince people to act against their own interests - think of how that sounds to government policymakers.  That sounds like a weapon you had better get immediately

Yeah, that's a difficult framing problem.

Suppose there were a device such that, if built, it would cause an explosion powerful enough to crack the planet; and suppose there were an industry racing to build it, believing that it's possible to harness it as a revolutionary energy source. Say, if that whole "will it ignite the atmosphere?" thing with nuclear bombs weren't possible to rule out in advance of testing one, that'd about fit the bill.

It seems plausible that if that were literally our problem, it'd be possible to convince governments to ban the pursuit of this entire technology. Especially if they didn't manage to classify it start to finish; if we could leverage public pressure.

The problem is in the flavour/aesthetic. "Creating a really smart thing" is pretty difficult to equate with "accidentally setting off a planet-shattering explosion" in most people's minds. Nevertheless, it should be theoretically possible to pick a way to convey the message of AI Risk that'd activate all the same heuristics in people's minds as "nuclear accident risk". The crux of political messaging is that you don't actually necessarily need to delve into the concrete scenarios, or put them front and center – you just need to pick a message that resonates with people at some abstraction level.

I'd played around with the idea a year ago, but haven't really developed it further.

[-][anonymous]5mo160

Ok I have tried to table out the outcomes in this situation. This is from the viewpoint of a "power bloc": for example, if the UK bans AI research but its close ally the USA secretly defects, it would be the same as the UK choosing to secretly defect.

Note that also for the upper part of this table, the !(others accelerate) outcome, all countries in the world who have the ability to access the necessary chips and access to nuclear weapons must each separately choose not to accelerate.  In an attempt at a worldwide ban, anyone who chooses to accelerate is protected by their own nuclear weapons, which there is no effective defense to pre-AGI.  So they get to independently choose to pay|!pay the international outrage and sanctions if they wish to access the "take the planet" outcomes.  

This makes it a choice of (pay|!pay) ^ n, where n is the number of actually separate factions.  It would be interesting to see how large n actually is.  Obviously [West, China] are factions, so n is at least 2, but how many other parties are there?  Is Taiwan or Israel their own parties?  How long would it take Russia to obtain the chips necessary?
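As a quick illustration of why n matters, here is a minimal sketch (the per-faction compliance probability p is a made-up parameter, not an estimate of anything):

```python
# Probability that every one of n genuinely separate factions independently chooses
# !pay (i.e., complies with the ban). The per-faction probability p is illustrative only.
for p in (0.95, 0.9, 0.8):
    for n in (2, 3, 5, 10):
        print(f"p={p}, n={n}: P(nobody defects) = {p ** n:.2f}")
```

Even fairly high per-faction compliance decays quickly as n grows, which is why the size of n matters so much here.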

 

Italicized is I think what AI doomers believe.

Bold is I think what e/acc believe.  

 

| AI Containability x Other country actions | Ban AI | Ban AI publically, defect | Accelerate AI |
|---|---|---|---|
| Easy Containability, others ban | stasis | take the planet | take the planet++ |
| Occasionally Escapes, others ban | stasis | take the planet | take the planet++ |
| Uncontainable, others ban | stasis | AI chooses | AI chooses |
| Easy Containability, others defect | government deposed | WW3 or Cold War 2 | take the planet |
| Occasionally Escapes, others defect | government deposed | WW3 or Cold War 2 | take the planet |
| Uncontainable, others defect | AI chooses | AI chooses | AI chooses |
| Easy Containability, others accelerate | government deposed | government deposed | WW3 or Cold War 2 |
| Occasionally Escapes, others accelerate | government deposed | government deposed | WW3 or Cold War 2 |
| Uncontainable, others accelerate | AI chooses | AI chooses | AI chooses |


I would like to add some color to this table, not sure how.  But in general, governments are going to perceive the "deposed" scenario as an unacceptable outcome, a war as a disfavorable but, for powerful governments, winnable outcome, and obviously they would prefer the world where they can 'take the planet'.  This is where, using AGI/ASI and exponential production rates, the government manufactures whatever tools it wants in the numbers necessary to depose everyone else.  Theoretically this doesn't need to be a weapon; for example, you could offer aging treatments to citizens of other countries (and their elder relatives) if they rescind their current citizenship.  And financially buy all of the assets of all the other countries.

I think this very neatly shows e/acc as a belief.  If you think there is no real chance all the other countries will stop developing AI, you only have the rightmost column as a valid choice.  All the outcomes are not great but the rightmost column is the least bad.
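Here's a minimal sketch of that "least bad column" reading of the table above (the outcome encoding and the badness ranking are my own rough, illustrative assumptions, not anything rigorous):

```python
# Rough encoding of the table above: OUTCOMES[(containability, others)] lists the
# outcome for each of this bloc's own actions, in the order given by ACTIONS.
ACTIONS = ("ban", "secret defect", "accelerate")

OUTCOMES = {
    ("easy", "ban"):            ("stasis", "take the planet", "take the planet++"),
    ("escapes", "ban"):         ("stasis", "take the planet", "take the planet++"),
    ("uncontainable", "ban"):   ("stasis", "AI chooses", "AI chooses"),
    ("easy", "defect"):         ("government deposed", "WW3 or Cold War 2", "take the planet"),
    ("escapes", "defect"):      ("government deposed", "WW3 or Cold War 2", "take the planet"),
    ("uncontainable", "defect"):("AI chooses", "AI chooses", "AI chooses"),
    ("easy", "accelerate"):     ("government deposed", "government deposed", "WW3 or Cold War 2"),
    ("escapes", "accelerate"):  ("government deposed", "government deposed", "WW3 or Cold War 2"),
    ("uncontainable", "accelerate"): ("AI chooses", "AI chooses", "AI chooses"),
}

# Smaller = better from this bloc's point of view; the ordering is an assumption.
BADNESS = {"take the planet++": 0, "take the planet": 1, "stasis": 2,
           "WW3 or Cold War 2": 3, "AI chooses": 4, "government deposed": 5}

def least_bad_action(beliefs):
    """beliefs: {(containability, others_action): probability}, summing to 1."""
    expected = {
        a: sum(p * BADNESS[OUTCOMES[world][i]] for world, p in beliefs.items())
        for i, a in enumerate(ACTIONS)
    }
    return min(expected, key=expected.get), expected

# If you're sure the others will accelerate, acceleration comes out least bad:
print(least_bad_action({("escapes", "accelerate"): 1.0}))
```

Changing the badness ranking changes the answer, which is part of why the table alone doesn't settle the argument.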

This also seems to show why 'doomer' faction members have such a depressed attitude.  All the outcomes are bad. The 'stasis' one means everyone dies from aging and it's unstable - it ends on the first defection.  All the rest leave humans at the mercy of a machine that has random alignment.  Even the "aligned self modifying AGI/ASI" dream would mean the outcome is still "AI chooses", just humans have weighted the outcome in their favor.

@Thane Ruthenis  I am very very curious to see your reaction.  If this is a bad visualization of the 'board' I'd love to make it more detailed in a grounded, reasonable way.  For example I am assuming the 'occasionally escapes' scenario means the AGI/ASI do occasionally defect or break out, and some of the defections do cause significant human casualties, but humans do win each battle eventually.   This would be consistent with your 'unstable nuclear software' post.

 I think doomer members believe that the utility benefit of being an escaped self modifying ASI is so large that those outcomes become "AI chooses" as well.  I have been lumping that into "uncontainable"

Note that also for the upper part of this table, the !(others accelerate) outcome, all countries in the world who have the ability to access the necessary chips and access to nuclear weapons must each separately choose not to accelerate.  In an attempt at a worldwide ban, anyone who chooses to accelerate is protected by their own nuclear weapons

I don't think that's right.

  • Ground fact: If you take the premise that AGI presents an existential risk as a given, merely risking a nuclear exchange in order to prevent someone else from building it (the infamous "bomb the datacenters of foreign defectors" proposal) is correct. If you're taking it seriously, and the enemy is taking it seriously, then you know that sanctions and being Greatly Concerned won't stop them, and that their success would be your end, and that it's not certain that if you bomb them they'll retaliate. So you bomb them.
  • So if both parties are taking the promises and risks of AGI seriously, then a sufficiently big coalition choosing to ban AGI can effectively ban it for everyone else, including non-signatories. The non-signatories will know the threat of mere nuclear weapons won't deter the others.
    • I mean, of course it'd still be precarious and there'd be constant attempts to push the boundaries, but the NatSec agencies have been playing that game against each other for a while, and it may be stable enough.
  • Conversely, if some of the parties aren't taking the risks seriously, and they're willing to accelerate, and are posturing about how others' attempts to prevent or sabotage their AGI projects will be met with nuclear retaliation... Yeah, I'm calling that bluff.
    • If Russia did anything good the last two years, it's making the "I'll nuke you if you cross this red line!" look like something a clown says then never acts on.

If you think there is no real chance all the other countries will stop developing AI, you only have the rightmost column as a valid choice

I mean, that's just the standard Prisoner's Dilemma setup there, no? And it's sometimes possible to make people recognize that defect/defect and cooperate/cooperate are the only stable states between two similar-enough agents, and that they should therefore all cooperate. Making people recognize this is non-trivial, yes, but it's a problem that's sometimes been solved.

Also, in this case, the cooperating party can, in some scenarios, force the other party to cooperate as well, which somewhat changes the calculus.
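For what it's worth, here's a minimal sketch of that claim with a toy payoff matrix (the numbers are illustrative, and "similar enough" is modeled crudely as both players ending up with the same move):

```python
# Toy Prisoner's Dilemma payoffs (row player, column player); the numbers are illustrative.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
MOVES = ("cooperate", "defect")

# Treating the players as independent, defect/defect is the only mutual best response:
nash = [(a, b) for a in MOVES for b in MOVES
        if PAYOFFS[(a, b)][0] == max(PAYOFFS[(x, b)][0] for x in MOVES)
        and PAYOFFS[(a, b)][1] == max(PAYOFFS[(a, y)][1] for y in MOVES)]
print("independent players:", nash)             # [('defect', 'defect')]

# Treating them as similar enough that they'll reason their way to the same move,
# the choice collapses to the diagonal, and cooperate/cooperate wins:
best = max(MOVES, key=lambda m: PAYOFFS[(m, m)][0])
print("similar-enough players:", (best, best))  # ('cooperate', 'cooperate')
```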

All the outcomes are bad. The 'stasis' one means everyone dies from aging.  

Eh, I don't agree with that either. AGI isn't the only technology left, nor the only technology that can prevent aging, and banning AGI doesn't have to mean banning all technology. I'd already mentioned other forms of intelligence enhancement as possibilities. Hell, the AI tools that exist today, even if frozen at the current type of architecture, can likely be leveraged to greatly accelerate other technologies. Immortality escape velocity may very well be achievable in the next 20-30 years even without AGI.

There's a very valid concern about bureaucracies and the culture of over-caution strangling innovation, which I'm very sympathetic to. But risking blowing up the planet over it seems excessive. Maybe try some mass anti-FDA protests or something like that first?

So I'd replace "stasis" with "incremental advancement".

For example I am assuming the 'occasionally escapes' scenario means the AGI/ASI do occasionally defect or break out, and some of the defections do cause significant human casualties, but humans do win each battle eventually

Yeah, that seems like one of the possible ways the world could be.

Though I'd note that I expect the level of superintelligence necessary to beat humanity to be surprisingly low. You don't necessarily need to be at "derives nanotechnology in a few days, Basilisk-hacks people en masse". Don't even have to be self-modifying. Just being a bit smarter than humans + having more parallel-processing power may be enough. See the arguments here and here.

So if one of the escapees is at least that competent, and it manages to get a toehold in the human civilization (get one of the countries to shelter it, get a relatively powerful social movement on its side, distribute itself far enough that it'd take time to find and shut down all its instances), that may be enough for it to start up a power-accumulation loop that'd outpace humanity's poorly-coordinated attempts to make it stumble. Especially if governments aren't willing to risk nuclear exchanges over the issue.

[-][anonymous]5mo40
  • Conversely, if some of the parties aren't taking the risks seriously, and they're willing to accelerate, and are posturing about how others' attempts to prevent or sabotage their AGI projects will be met with nuclear retaliation... Yeah, I'm calling that bluff.
    • If Russia did anything good the last two years, it's making the "I'll nuke you if you cross this red line!" look like something a clown says then never acts on

It depends on the party and their geographic area and their manufacturing ability.  Small countries like Israel or Taiwan, both of whom are nuclear capable, yes, you can probably destroy the key tools needed to make ICs and prevent imports of new ones.

With China|Russia|EU|USA, they will mass manufacture air defense and bury all the important elements for their AI effort.  You will have to fire nukes, and they will return fire and everyone in your faction dies.  So from a decisionmaker's point of view now it is

[AI might be containable (like current computers) | AI might be risky but containable| AI might be uncontainable].  Only in the last box do we hit [AI chooses <the survival of humanity>].  While if we fire nukes now, all the outcomes are [we die]. 


Though I'd note that I expect the level of superintelligence necessary to beat humanity to be surprisingly low. You don't necessarily need to be at "derives nanotechnology in a few days, Basilisk-hacks people en masse". Don't even have to be self-modifying. Just being a bit smarter than humans + having more parallel-processing power may be enough. See the arguments here and here.

So in that scenario, you can be 1 of 2 groups of humans:

A. [you built the ASI that just escaped]

B.  [you banned it and someone else built it]

Note that in situation A, you bring in all the experts and prior AI tools you used to accomplish A.  You can examine your container code (with the help of 'trustworthy' models you have for this) and find and patch the bugs that were exploited.  You can lobotomize your local copies of the ASI and query it to predict what it's going to do next. 

And you have an ASI, presumably you have robots that can copy themselves.  You can start building countermeasures.

A lot of attacks are asymmetric, I admit that.  The countermeasures are much more expensive than the attack.  For example, if you are up against arbitrarily designed pathogens or rogue nanotechnology, there is probably no vaccine that will work.  Anyone infected is a goner, or would have to be saved by uploading their brain from an LN2-frozen sample.  But space suits will stop the protein-based pathogens, and even diamond nanobots will have materials they can't cut through due to too little energy stored in them.

You also have the situation that the ASI that escaped is likely attempting self-improvement, and so it may become more capable than humans' best models.

So you can lose in this situation, but you have tools.  You can still act.  It's not over.  AI-banning parties just lose automatically.  In fact, human institutions that fail to adopt AI internally also all lose automatically.

This relates to the geopolitical decision table above because the defection risk means someone might be about to create this situation for you unless you also secretly defect.  Yeah, it's a prisoner's dilemma, albeit one where the "cooperate" payoff seems to be very poor; it has a dominant strategy of acceleration.

 

This seems like a key crux.  @Thane Ruthenis is accelerating AI at a geopolitical level the dominant strategy in a game theoretic sense?  If it's not dominant, why?  What's wrong with this table, what additional rows or labels do I need to add to express this more completely?

With China|Russia|EU|USA, they will mass manufacture air defense and bury all the important elements for their AI effort.  You will have to fire nukes

Mm. Let me generalize your spread of possibilities as follows:

  • "AGI might be containable"  "AGI is an incredibly powerful technology... but just that"
  • "AGI might be risky but containable"  "AGI, by itself, may be a major geopolitical actor"
  • "AGI might be uncontainable"  "AGI can defeat all of humanity combined"

Whether one believes that AGI is containable is entangled with how much they should expect to benefit by developing one. If a government thinks AGIs aren't going to be that impressive, they won't fight hard to be able to develop one, and won't push against others trying to develop one. If a government is concerned about AGI enough to ban it domestically, it sounds like they expect accident risks to be major disasters/extinction-level, which means they'd expect solving AGI to grant whoever does it hegemony.

So in the hypothetical case where we have a superpower A taking AGI seriously enough to ban it domestically and to try to bully other nations into the same, and a defecting superpower B burying their datacenters in response, so that the only way to stop it is nukes? Then it sounds like superpower A would recognize that it's about to lose the whole lightcone to B if it does nothing, so it'll go ahead and actually fire the nukes.

And it should be able to credibly signal such resolve to B ahead of time, meaning the defector would expect this outcome, and so not do the bury-the-datacenters thing.

Like... Yeah, making a government recognize that AGI risk is so major it needs to ban all domestic development and try to enforce an international ban is a tall order. But once a major government is sold on the idea, it'll also automatically be willing to risk a nuclear exchange to enforce this ban, which will be visible to the other parties.

Conversely, if the government isn't taking AGI seriously enough to risk a nuclear exchange, it's probably not taking it seriously enough to ban it domestically (to the point of not even engaging in secret research) either. Which invalidates the premise of the "what if one of the major actors chooses not to race" hypothetical.

So you can lose in this situation, but you have tools.  You can act still.  It's not over.  AI banning parties just lose automatically.

Fair. I expect it'd devolve out of anyone's control rapidly, but I agree that it'd look like a game they'd be willing to play, for the relevant parties.

Edit:

What's wrong with this table, what additional rows or labels do I need to add to express this more completely?

As above, I think some of the possibilities correspond to self-contradictory worlds. If a superpower A worries about AGI enough to ban it despite the race dynamics, it's not going to sit idly by and let superpower B win the game; even on pain of a nuclear exchange. So foolishly accelerating against serious concern from other parties gets you nuked, which means "do nothing and keep incrementally improving without AGI" is the least-bad option.

Or so it should ideally work out. There's a bunch of different dimensions along which nations can take AGI seriously or not. I'll probably think about it later, maybe compile a table too.

[-][anonymous]5mo40

Ok.  I think this collapses even further to a 1 dimensional comparison.

I made a mistake in the table, it's not all or nothing.  Some models are containable and some are not, with the uncontainable models being more capable.

Utility_safe = ([strongest containable model you can build]*, resources)

Utility_rogue = ([strongest model that can be developed]**, resources)

Or in other words, there is a utility gain that is a function of how capable the model is.  Utility gain is literally doing more with less.

Obviously, a nearly-zero-intelligence evolutionary process evolved life, using billions of years and the biosphere of a planet, and it required some enormous number of dice rolls at a galaxy- or universe-wide scale.

Utility_evolution = (random walk search, the resources of probably the Milky Way galaxy)

Utility is domain specific, but for example, to do twice as well as evolution, you could design life with half the resources.  If you had double the utility in a tank battle, you could win with half the tanks.

And you're asserting a belief that Utility_rogue >>> Utility_safe.

Possibly true, possibly not. 

But from the point of view of policymakers, they know a safe AI that is some amount stronger than current SOTA can be developed.  And that having that model lets them fight against any rogues or models from other players.

If in numbers, actual reality is say Utility_rogue = 2*Utility_safe (up to the limit of buildable compute), we should build ASI as fast as possible.

And if the actual numbers are

Utility_rogue = 1000*Utility_safe (up to the limit of buildable compute)

We shouldn't.

Do we have any numerical way to estimate this ratio?  Is there a real world experiment we could perform to estimate what this is?  Right now should policymakers assume it's a high ratio or a low ratio?

What if we don't know and have a probability distribution?  Say it's (90 percent 2*, 10 percent 1000*).

From the perspective of the long term survival of humanity, this is "pDoom is almost 10 percent".  But what are national policymakers going to do?  What do we have to do in response?
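To make that concrete, here is a minimal sketch of the arithmetic (the 90/10 split is the one above; treating the 1000* world as the one where containment fails is an assumption, not an established fact):

```python
# Illustrative distribution over the Utility_rogue / Utility_safe multiplier,
# using the 90% "2*" / 10% "1000*" split from the text.
distribution = {2: 0.90, 1000: 0.10}

expected = sum(m * p for m, p in distribution.items())
print(f"expected multiplier: {expected:.1f}")        # 101.8: the small tail dominates the mean

# Assumption: in the 1000* world containment fails and everyone loses;
# in the 2* world safe systems can hold the line.
p_doom = sum(p for m, p in distribution.items() if m >= 1000)
print(f"pDoom under that assumption: {p_doom:.0%}")  # 10%
```

(The expectation line is just there to show how heavily the 1000* tail dominates the average, even when the modal case is tame.)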


*I think we both agree there is some level of AI capability that's safe?  Conventional software has AI-like behavior and it's safe-ish.

** Remember RSI doesn't end up with infinite intelligence, it will have diminishing returns as you approach the most capable model that current compute will support.

And you're asserting a belief that Utility_rogue >>> Utility_safe.

I... don't think I'm asserting that? Maybe I'm misunderstanding you.

What I'm claiming is that:

Suppose we have two geopolitical superpowers, A and B.

  • If both A and B proceed at a measured pace, it's unclear which of them wins the AI race. If AGI isn't particularly dangerous, there isn't even such a thing as "winning" it, it's just a factor in a global power game. But the more powerful it is, the more likely it is that the party who's better at it will grow more powerful, all the way to becoming the hegemon.
    • So "both proceed steadily" isn't an equilibrium: each will want to go just a bit faster.
  • If party A accelerates, while party B either proceeds steadily or bans AGI, but in a geopolitically toothless manner, party A either wins (if AGI isn't extremely dangerous, or if A proceeds quickly-but-responsibly) or kills everyone (if AGI's an existential threat and A is irresponsible).
    • That isn't an equilibrium either: party B won't actualize this hypothetical.
  • If both A and B accelerate, neither ends up building AGI safely. It's either constant disasters or everyone straight-up dies (depending on how powerful AGI is).
    • That is an equilibrium, but of "everyone loses" kind. (AGI power only determines the extent of the loss.)
  • If party A bans AGI internationally, in a way it takes seriously, but party B accelerates anyway, then party A acts to stop B, all the way up to a lose/lose nuclear exchange.
    • That is not an equilibrium, as going here is just a loss for B.
  • If party A bans AGI internationally, and party B respects the ban's seriousness, the relative balance of power is preserved. It's unclear who takes the planet, because it's too far in the uncertain future and doesn't depend on just one factor.
    • That is the equilibrium it seems worth going for.

In table form, it'd be something like:

| # | Scenario | P(A wins) | P(B wins) | Equilibrium? |
|---|---|---|---|---|
| 1 | A & B: proceed steadily | 0.5 | 0.5 | No (A goes to 2) |
| 2 | A: speed up, B: steady | 0.99 | 0 | No (B goes to 3) |
| 3 | A: speed up, B: speed up more | 0 | 0.98 | No (A & B iterate 2-3, until reaching 4) |
| 4 | A & B: rush maximally | 0.01 | 0.01 | Yes |
| 5 | A: toothless ban, B: steady | 0 | 1 | No (A goes to 2 or 7) |
| 6 | A: toothless ban, B: rush | 0 | 0.01 | No (A goes to 4 or 7) |
| 7 | A: toothy ban, B: rush | 0 | 0 | No (B goes to 8) |
| 8 | A: toothy ban, B: join the ban | 0.5 | 0.5 | Yes |
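And here is a minimal sketch of the dynamics I have in mind, with the "goes to" column re-encoded as a transition map (the encoding is mine; where the table lists two possible moves, both are kept):

```python
# Each non-equilibrium scenario maps to the scenario(s) a deviating party moves it to;
# equilibria map to themselves. This just re-encodes the "Equilibrium?" column above.
NEXT = {
    1: [2],     # A speeds up
    2: [3],     # B speeds up more
    3: [4],     # A & B iterate until both are rushing maximally
    4: [4],     # equilibrium: everyone loses
    5: [2, 7],  # A either drops the toothless ban or gives it teeth
    6: [4, 7],
    7: [8],     # B joins the ban rather than be stopped by force
    8: [8],     # equilibrium: balance of power preserved
}

def equilibria_from(start, seen):
    """Follow every listed deviation until reaching scenarios that map to themselves."""
    if start in seen:
        return set()
    seen.add(start)
    if NEXT[start] == [start]:
        return {start}
    result = set()
    for nxt in NEXT[start]:
        result |= equilibria_from(nxt, seen)
    return result

for s in sorted(NEXT):
    print(s, "->", sorted(equilibria_from(s, set())))
```

Only 4 and 8 come out as reachable end states, which matches the two "Yes" rows.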
[-][anonymous]5mo40

So I thought about it overnight and I wanted to add a comment. What bothers me about this table is that nuclear brinkmanship - "stop doing that or we will kill ourselves and you" - doesn't seem very probable to ever happen.

I know you want this to happen, and I know you believe this is one of the few routes to survival.

But think about the outcomes if you are playing this brinkmanship game.

Action : Nuke. Outcome : death in the next few hours.

Action : back down.

Outcome p(AI safe): life (maybe under occupation). Outcome p(AI rogue): life or delayed death (AI chooses). Outcome p(AI weak): life.

If the above is correct, this makes n player brinkmanship games over AI get played with the expectation the other will back down. Which means... acceleration becomes the dominant strategy again. Acceleration and preparing for a nuclear war.

What bothers me about this table is that nuclear brinkmanship - "stop doing that or we will kill ourselves and you" - doesn't seem very probable to ever happen.

I think that proves too much. By this logic, nuclear war can never happen, because "stop invading us or we will kill ourselves and you" results in a similar decision problem, no? "Die immediately" vs. "maybe we can come back from occupation via guerilla warfare". In which case pro-AI-ban nations can just directly invade the defectors and dig out their underground data centers via conventional methods?

Or even just precision-nuke just the data centers, because they know the attacked nation won't retaliate with a strike on the attacker's population centers in the fear of a retaliatory annihilatory strike? Again, a choice of "die immediately" vs. "maybe we can hold our own geopolitically without AGI after all".

Edit:

Outcome p(AI safe): life (maybe under occupation). Outcome p(AI rogue): life or delayed death (AI chooses). Outcome p(AI weak): life.

Also, as I'd outlined, I expect that a government whose stance on AGI is like this isn't going to try to ban it domestically to begin with, especially if AI has so much acknowledged geopolitical importance that some other nation is willing to nuclear-war-proof its data centers. The scenario where a nation bans domestic AI and tries to bully others into doing the same is a scenario in which that nation is pretty certain that the outcomes of "AI safe" and "AI weak" aren't gonna happen.

[-][anonymous]5mo20

I was focusing on what the [containable, uncontainable] continuum of possibilities means.

But ok, looking at this table, 

| # | Scenario | P(A wins) | P(B wins) | Equilibrium? |
|---|---|---|---|---|
| 4 | A & B: rush maximally | 0.01 wins if rogue utility large, 0.5 if rogue utility is small. | 0.01 wins if rogue utility large, 0.5 if rogue utility is small. | Yes.  Note this is the historical outcome for most prior weapons technologies.  [chemical and biological weapons being exceptions] |
| 8 | A: toothy ban, B: join the ban | 0.5 | 0.5 | Yes, but unstable.  It's less and less stable the more parties there are.  If it's A....J, then at any moment, at least one party may be "line toeing".  This is unstable if there is a marginal gain from line toeing - the party with slightly stronger, barely legal AGI has more GDP, which eventually means they begin to break away from the pack.  This breaks down to 2 then 3 in your model then settles on 4. |

 

A historical example I can think of would be the treaties on weapons post-WW1.  They began to fail with line toeing.


https://en.wikipedia.org/wiki/Arms_control 


The United States developed better technology to get better performance from their ships while still working within the weight limits, the United Kingdom exploited a loop-hole in the terms, the Italians misrepresented the weight of their vessels, and when up against the limits, Japan left the treaty. The nations which violated the terms of the treaty did not suffer great consequences for their actions. Within little more than a decade, the treaty was abandoned.

It seems reasonable to assume this is a likely outcome for AI treaties.  


For this not to be the actual outcome, something has to have changed from the historical examples - which included plenty of nuclear blackmail threats - to the present day.  What has changed?  Do we have a rational reason to think it will go any differently?


Note also this line toeing behavior is happening right now from China and Nvidia.

 

Rogue utility is the other parameter we need to add to this table to make it complete.  

I was focusing on what the [containable, uncontainable] continuum of possibilities means.

Ahh, you mean, what's expected utility of having a controlled AGI of power X vs. expected disutility of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?

Hm. Let's consider just A's viewpoint, and strategies of {steady progress, accelerate, ban only domestically, ban internationally}.

  • Steady progress is always viable up to the capability level where AGI becomes geopolitically relevant; let's call that level X_geo.
  • Acceleration is nonviable in the range of values below X_geo but above some threshold at which accident risk is large enough to cause public backlash. The current global culture is one of excess caution and status-quo bias, and the "desire to ban" would grow very quickly with "danger", up to some threshold level at which it won't even matter how useful it is, the society would never want it. Let's call that threshold X_backlash.
  • Domestic-only ban is viable in the range X_backlash ≤ X < X_geo. If it's not a matter of national security, and the public is upset, it gets banned. It's also viable if A strongly expects that AGI will be uncontrollable but the disaster won't spill over into other countries: i. e., if it expects that if B rushes AGI, it'll only end up shooting itself in the foot. Let's call that the no-spillover case.
  • At X ≥ X_geo, acceleration, a toothy international ban, and (in some circumstances) a domestic-only ban are viable.
  • At the level where a rogue or rival AGI becomes an extinction-level threat (call it X_ext), only acceleration and a toothy ban are viable. At that point, what decides between them is the geopolitical actor's assessment of how likely they are to successfully win the race. If it's low enough, only the ban has non-zero expected utility (since even a full-scale nuclear war likely won't lead to extinction).

So we have X_backlash < X_geo < X_ext: steady progress is viable at X < X_geo, acceleration is viable at X < X_backlash or X ≥ X_geo, domestic ban is viable at X_backlash ≤ X < X_geo and sometimes also at X_geo ≤ X < X_ext, and a toothy international ban is viable at X ≥ X_geo.
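A minimal sketch of that carve-up, with arbitrary placeholder numbers for the thresholds (only their ordering matters; the names are the placeholders I used above, not anything standard):

```python
# Arbitrary placeholder values for the thresholds above; only the ordering matters.
X_BACKLASH = 3   # accident risk large enough to trigger public backlash
X_GEO      = 6   # AGI becomes geopolitically relevant
X_EXT      = 9   # a rogue or rival AGI is an extinction-level threat

def viable_strategies(x, expects_no_spillover=False):
    """Which of A's strategies look viable at capability level x, per the argument above."""
    if x < X_BACKLASH:
        return {"steady progress", "acceleration"}
    if x < X_GEO:
        return {"steady progress", "domestic-only ban"}
    if x < X_EXT:
        viable = {"acceleration", "toothy international ban"}
        if expects_no_spillover:
            viable.add("domestic-only ban")
        return viable
    return {"acceleration", "toothy international ban"}

for level in (1, 4, 7, 10):
    print(level, sorted(viable_strategies(level)))
```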

Not sure if that's useful.

Yes.  Note this is the historical outcome for most prior weapons technologies.  [chemical and biological weapons being exceptions]

Interesting how exceptions are sometimes reachable after all, isn't it?

Yes, but unstable.  It's less and less stable the more parties there are

Yep. But as I'd said, it's a game NatSec agencies are playing against each other all the time, and if they're actually sold on the importance of keeping this equilibrium, I expect they'd be okay at that. In the meantime, we can ramp up other cognitive-enhancement initiatives... Or, well, at least not die for a few years longer.

[-][anonymous]5mo40

Ahh, you mean, what's expected utility of having a controlled AGI of power X vs. expected loss of having a rogue AGI of the same power? And how does the expected payoff of different international strategies change as X gets larger?

No.  At a given level of human understanding of AI, there are 2 levels of model.

  1. The strongest model humans can contain (this means the model is mostly working for humans and is not going to deliberately betray them).  This is Utility_safe.
  2. The strongest model that the current level of (total compute built + total data that exists + total robotics) allows to exist.  Until very recently, 1 == 2.  Humans could not build enough compute to make a model that was remotely dangerous.  This is Utility_rogue.

And then I have realized that intelligence is meaningless.  What matters is utility.  Utility is the relative benefit of intelligence.  It's worthless to win "70% of tasks vs the next best software model" by doing 1% better; humans will crush the AI if they get to have 1% more resources.

Utility is some enormous table of domains and it's a multiplier.  I gave examples.  

General intelligence, like current models are beginning to show, means that utility in one domain transfers to another domain.  General learning intelligence, which we can do with scripts tacked onto current models, can have utility in all domains but requires training data to learn new domains.

Examples:

Utility[designing living creatures]:
  Humans = millions (we could design other living organisms from scratch, with all new amino acids, in millions of times less time than evolution needs)
  Controllable AI from deepmind = 100* humans or more
  Evolution = 1.0 (baseline)

Utility[chip, algorithm design]:
  Humans = 1.0
  Controllable AI from deepmind = less than 1.1 times humans.  See https://deepmind.google/impact/optimizing-computer-systems-with-more-generalized-ai-tools/ (4% for video compression)

Utility[tank battles]:
  Humans = 1.0
  Tactical AI = ?.  But think about it.  Can you win with half the tanks?  Probably.  1/10?  1/100?  Probably not.

Utility[manufacturing]:
  Humans = 1.0
  Machine policy solver = ?

Utility[aircraft design]:
  Humans = 1.0
  RL solver from humans = ?

 

And so on.  Exponential growth complicates this: even a small utility benefit would allow one party to win.

When I try to reason in a grounded way over "ok, what would the solution from humans look like?  What would the solution from a narrow AI* that humans control look like?  What is the best solution physics allows?". 

 Well it depends.  On easy tasks (pathfinding, aiming a ballistic weapon) or even modestly complex tasks (manipulation, assembly, debugging), I don't see the best solution as being very much better.  For extremely complex tasks I don't know.  

Current empirical data shows only small utility gains for now.

 

If the [Utility_safe, Utility_rogue] multiplier is very large (100+), in all long term futures the AIs choose the future.  Nature can't abide a vacuum like that.  Doomers/decelerationists can only buy a small amount of time.

If the [Utility_safe, Utility_rogue] multiplier is small (<2.0), you must accelerate, because the [Utility_safe] vs [regular humans] multiplier is still an unwinnable battle due to exponential growth.  Humans can survive in this future as long as at least 2/3 of the "bits" - the physical resources - are in the hands of Utility_safe systems.

For medium values (2-100), it depends: you need to be very careful, but maybe you can keep AI under control for a little while.  You need 99% of the resources to be in the hands of safe systems, and escaped AI are very much a crisis.
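A minimal sketch of those three regimes as I'm describing them (the function just restates the cutoffs and resource shares from the paragraphs above; the exact boundary handling is arbitrary):

```python
def regime(multiplier):
    """Map the Utility_rogue / Utility_safe multiplier to the three regimes above."""
    if multiplier < 2:
        return ("small", "accelerate; survivable if ~2/3 of resources stay with Utility_safe systems")
    if multiplier < 100:
        return ("medium", "maybe controllable for a while; need ~99% of resources in safe hands")
    return ("large (100+)", "the AIs choose the future; bans only buy time")

for m in (1.5, 30, 500):
    print(m, "->", regime(m))
```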

 

*Myopia and various forms of containment are strategies that lower a general AI to a narrow AI without the cost of developing a custom narrow AI.

I appreciate this discussion, and want to throw in my 2 cents:

I believe that any sort of delay (such as via an attempted ban which does slow AI development down) buys some chance of improved outcomes for humanity. In fact, a lot of my hope for a good outcome lies in finding a way to accomplish this delay whilst accelerating safety research.

The timing/speed at which the defection happens matters a lot! Not just the probability of defection occurring. How much does developing AI in secret instead of openly slow down a government? None? I expect it slows it at least somewhat.

[-][anonymous]5mo4-2

So let's engage on that.

Imagine it's the cold war. You are an anti nuclear advocate. "STOP preparing to put city killer fusion bombs onto missiles and cramming them into submarines! You endanger human civilization itself, it is possible that we could all die!" you say.

Faction wise you get dismissed as a communist sympathizer. Your security clearances would be revoked and you would be fired as a federal employee in that era. "Communist" then means "decel" now.

I would agree that every part of the statement above is true. Preparing to commit genocide with thermonuclear devices is a morally dubious act, and while there is debate on the probability of a nuclear winter, it is correct to say that the chance is nonzero that a total nuclear war at the cold war arsenal peak could have extincted humanity or weakened humanity to the point the next crisis finished it off.

But...you don't have a choice. You are locked into a race. Trying to reframe it not as a race doesn't change the fact it's a race. Any international agreements not to build loaded ICBMs you can confidently predict the other parties will secretly defect on.

Historically several countries did defect : Israel, Taiwan, and South Africa being the immediate 3 I can think of. Right now Ukraine is being brutalized for its choice to cooperate. Related to your point above, they took longer to get nukes than the superpowers did due to the need for secrecy.

I think we are in the same situation now. I think the payoff matrix is far more favorable to the case for AI than it was for nukes. The incentives are far, far stronger.

So it seems like a grounded view, based on the facts, to predict the following outcome: AI acceleration until the singularity.

In more detail: you talk about regulations and possibly efforts going underground. This only becomes relevant if, out of all major power blocs, there is not at least one bloc building ASI at the maximum speed possible. I think this is what will happen, as this seems to be what is happening today, so it just has to continue in one power bloc.

We have not seen a total war effort yet but it seems maybe inevitable. (Right now AI investment is still a tiny part of the economy at maybe 200 billion/year. A total war effort means the party puts all resources into preparing for the coming nuclear war and developing ASI. A less-than-total war effort, where most of the economy goes into AI, is also possible and would be what economics would cause to happen naturally.)

The other blocs' choices don't matter at that point, since the winner will take over the market for ASI services and own (or be subsumed by) the outcome.

There's other cold war analogs as well. Many people during the era expected the war would happen during their career. They participated in developing and deploying nukes and expected a fight to the end. Yet again, what choice did they have.

Do we actually have a choice now?

As far as I know, most datacenters are not currently secretly underground in bomb-proof bunkers. If this were known to be the case, I'd have different views of the probable events in the next few years. 

Currently, as far as I know, most data centers are out in the open. They weren't built with the knowledge that soon they would become the equivalent of nuclear ICBM silos. The current state of the world is heavily offense-favoring, so far as I know.

Do you disagree? Do you think ASI will be fully developed and utilized sufficiently to achieve complete worldwide hegemony by the user without there being any major international conflict? 

[-][anonymous]5mo42

Well, as I mentioned in the other post, I will open a larger point here: anything physics permits can theoretically happen in the future, right? For complex situations like this, scientifically validated models like particle physics do not exist yet. All we can do is look at what historically happened in a similar scenario, and our prior should be that this round's outcome is drawn from the same probability distribution. Agree/disagree?

So for nuclear programs, all the outcomes have happened. Superpowers have built vast campuses and secret cities, and made the plutonium and assembled the devices in aboveground facilities. Israel apparently did it underground. Iran has been successfully decelerated from their nuclear ambitions for decades.

But the general trend I feel is that a superpower can't be stopped by bombing, and another common element has happened a bunch of times historically. Bombing and military actions often harden a belligerent's resolve. They hardened the UK's resolve, US resolve, Russian resolve, it goes on.

So in the hypothetical world where party A is too worried about AI dangers to build their own, party B is building it, unless A can kill B, B would respond to the attack by a total war effort and would develop AI and win or die. (Die from nukes, die from the AI, or win the planet)

That is what I predict would be the outcome and we can enumerate all the wars where this historically has happened if you would like. Trend wise the civilian and military leaders on B, consuming their internal propaganda, tend to commit to total war.

If B is a superpower (the USA and China, maybe EU) they can kill all the other parties with their nukes, so choosing to kill B is choosing suicide for yourself.

Yes, I am imagining that there is some sort of toothy international agreement with official inspectors posted at every data center worldwide. That's what I mean by delay. Or the delay which could come from the current leading labs slowing down and letting China catch up. 

If the first lab to cross the finish line gets forcibly nationalized or assassinated by state-sponsored terrorist groups, why hurry? Why stay ahead? If we can buy six more months by not rushing quite as hard, why not buy them? What do the big labs lose by letting China catch up? Maybe this seems unlikely now, but I expect the end-game months are going to look a lot different. 

See my other comment here: https://www.lesswrong.com/posts/A4nfKtD9MPFBaa5ME/we-re-all-in-this-together?commentId=JqbvwWPtXbFuDqaib 

[-][anonymous]5mo20

Responding to your other comment: Probably AI labs will be nationalized, yes, as models reach capabilities levels to be weapons in themselves.

The one here: a "toothy international agreement" to me sounds indistinguishable from the "world government and nukes are banned" world that makes rational sense from the 1950s but has not happened for 73 years.

Why would it happen this time? In your world model, are you imagining that the "world government" outcome was always a non negligible probability and the dice roll could go this way?

Or do you think that a world that let countries threaten humanity and every living person in a city with doomsday nuclear arsenals would consider total human extinction a more serious threat than nukes, and people would come together to an agreement?

Or do you think the underlying technology or sociological structure of the world has changed in a way that allows world governments now, but didn't then?

I genuinely don't know how you are reaching these conclusions. Do you see my perspective? Countless forces between human groups create trends, and those trends are the history and economics we know. To expect a different result requires the underlying rules to have shifted.

A world government seems much more plausible to me in a world where the only surviving fraction of humanity is huddled in terror in the few remaining underground bunkers belonging to a single nation.

Note: I don't advocate for this world outcome, but I do see it as a likely outcome in the worlds where strong international cooperation fails.

And if we set things up such that the AI's value system is being decided as a matter of national policy, via a democratic process? Where people discuss it among themselves, with politicians chiming in, and maybe then vote on it, and then the result is interpreted by some bureaucratic process, and then someone types it in? Don't make me laugh. We'd be lucky if the AI that'd result from this just kills everyone and tiles the universe with paperwork, rather than trapping everyone in some inescapable Kafkaesque nightmare.

  • (To be clear, it's not because the people will want bad things. It's because our processes for eliciting and agglomerating their preferences – any and all processes in wide use – are an abomination.)

 

We will still need to reconcile our differences later on, of course. But it can be done incrementally, a steady pace of negotiation and power balancing and cultural mingling and sanity-raising. There are routes of intelligence enhancement that are more gradual, that let us preserve this sort of incrementalism and stability while still letting Humanity keep empowering itself. Gradual intelligence augmentation, via biotechnological or cyborgism-like or upload-based means.

There is a lot of statism or pessimism about the potential for improving coordination in your comments. No mentions of projects of the kind:

Thanks for the links! I've been idly thinking about such projects as well, nice to see what ideas others have been considering. Hopefully there's something workable in there.

But my median prediction is that none of that works, yes. Let alone on the relevant timeline (<10 years). Stuff like Manifold Markets and Twitter's Community Notes are steps in the right direction, but they're so very ridiculously tiny. In the meanwhile, the pressures destroying the coordination ability continue to mount.

My optimism would rise dramatically if one of these ideas spawns, say, an SV startup centered around it that ends up valued at billions of dollars within the next three years.

Maybe this is going to be such a startup? -- A proposal for improving the global online discourse

No-one who is currently racing can be trusted. A US company successfully solving alignment isn't particularly more likely to result in a pro-humanity utopia, or a utopia for you, than the Chinese doing it first.

Why do you think this? I don't know how you could even find a sufficiently large team of Californian AI researchers who lack cosmopolitan values, who wouldn't blow the whistle if someone asked them to implement, say, a nationalistic or oligarchic (they still have capped profits) utility function or whatever instead of the obvious "optimize the preferences of all currently living humans".

I feel like you have to be missing a basic understanding of the cultural background of college-educated Californians, or the degree of decentralization in their decision-making (you could read sama's reinstatement as a demonstration of a cult of personality, but I think that is wrong; what actually happened was that staff organized, chose the leader they preferred, and got their way).

In general, I agree that global cooperation might not be possible, but the door hasn't closed yet, and if it closes, there is still something that can be done.

Big upvote for looking for routes to cooperation instead of either despairing or looking for reasons for conflict.

This all got a little long, so I'll put the biggest conclusion up front: I think we're in a good situation right now, in which the leading players pursuing AGI are probably not sadists, dense psychopaths, or zealots of mistaken ideologies. We'd probably like their utopias just fine. If we assume the competition to control aligned AGI will be much broader, we have more reason to be concerned.

One major crux of this post's claims is the intuition that there would be only minor variations in the "utopia" brought about by different actors with an aligned ASI. Intuitions/theories seem to vary widely on this point. OP didn't present much argument for that, so let me expand on it a little.

In sum, it's a question about human nature. Given unlimited power, will people use it to give people what they want, or will they enforce a world order most people hate?

This of course depends on the individuals and ideologies that wind up controlling that AGI.

It requires little benevolence to help people when it's trivially easy for the people in charge to do it.

This is one reason for optimism. It's based on the prediction that aligned AGI becomes aligned ASI and ASI can create a post-scarcity world. In a post-scarcity world, everyone can easily be given material resources that empower them to do whatever they want.

The pessimistic view is that some people or organizations don't have even the small bit of benevolence required to do good when it's trivially easy.

The thesis is that other motivations would outweigh their small amount of empathy/benevolence. This could be sadism; desire for revenge for perceived harms; or sincere belief in a mistaken worldview (e.g., religion or other prescriptive forms of ethics).

I think those possibilities are real, but we must also ask how those ideologies and emotional preferences would change over time.

Another reason for pessimism is not believing or not considering the post-scarcity hypothesis. The way corporations and individuals wield power in a world with scarcity does not inspire confidence. But the profit motive barely applies once you've "won that game". How would a corporation with access to unlimited production use that power? I think it would depend on the particular individuals in power, and their ideologies. And the power motive that's so destructive loses much of its emotional force once that individual has attained nearly unlimited power.

The creators of aligned AGI have won whatever game they've been playing. They have access to unlimited personal wealth and power. They and their loved ones are permanently safe from physical harms. No individual in history has ever been in that position.

I think the common saying "power corrupts" is quite mistaken, in an understandable way. The pursuit of power is what corrupts. It rewards unethical decisions, and provides pressure for the individual to see those decisions as ethical or virtuous. This corrupts their character. Every leader in history has had legitimate concerns about preserving their power. The individuals controlling AGI would not. If this is correct, the question is how corrupted they became while acquiring power, and whether they'll over time become more generous, as the legitimate reasons for selfishness disappear in reality, and perhaps in their emotional makeup as a result.

There's another important concern that sociopaths tend to acquire power in our current systems, while hiding their sociopathy. I think this is true, but we don't know how common it is. And we don't know that much about sociopathy. I think it probably exists on a spectrum, since it doesn't have a single source of genetic variance.

So, I hold out some hope that even most people on the sociopathy spectrum, or ideologically confused power structures, would shift toward having the minimal benevolence balance needed to provide a pretty awesome utopia. But I'd prefer to gamble on the utopias offered by Altman, Hassabis, or Amodei. This is an argument against an AI pause, but not a strong one.

If this is correct, the question is how corrupted they became while acquiring power, and whether they'll over time become more generous, as the legitimate reasons for selfishness disappear in reality, and perhaps in their emotional makeup as a result.

Sure, maybe. Or maybe they'll choose to drift farther and farther from human-like motivations themselves, shaped by whatever post-human experiences they choose to indulge in, or by directly self-modifying. And even if they do become more compassionate later on, it may already be too late by then.

I think you're imagining whoever grabs the AGI's controls as continuing to live among humanity for a while, with the planet still mostly arranged the way it is now, and gradually learning empathy as they interact with people.

I expect things to get much, much weirder, basically immediately.

It requires little benevolence to help people when it's trivially easy for the people in charge to do it.

It's also easy to forget about the existence of people whom you don't need and who can't force you to remember them. And just tile over them.

Not maliciously. Not even in a way that chooses to be actively uncaring. Just by... not particularly thinking about them, as you live in your post-human simulation spaces or explore the universe or whatever. And when you do come back to check, you notice the AI has converted all of them to computronium, because at some point you'd value-drifted away from caring about them, and the AI noticed that. And true enough, you're not sad to discover they're all gone. Just an "oh well".

It's the ethical analogue to Oversight Misses 100% of Thoughts [You] Don't Think.

Like, you know all the caricatures about out-of-touch rich people? That'll be that, but up to eleven.

I think the common saying "power corrupts" is quite mistaken, in an understandable way. The pursuit of power is what corrupts

Agreed, 100%.

What's strange to me about this line of reasoning is the assumption that society will behave in an orderly peaceful status-quo-maintaining way right up until ASI is in the hands of a small group of individuals.

That seems unlikely to me, unless the ramp up from near-AGI to full-blown ASI is faster than I expect. I expect a period of several months or even a couple of years.

Those aren't likely to be months filled with tranquility and orderly respect for private property and common law and good international relations. Not if the state of things is widely known by major power groups such as state actors.

Do you think that the US or China or Russia is going to let a company get full power over the world without a fight? That coercive force and extralegal activities will play no part in this?

Also, it's not all 'mutually assured nuclear destruction' or nothing. The post-cold-war world has shown us lots of ways that state violence happens despite the looming threat of nuclear war. Particularly, false-front terrorist groups being funded and encouraged by state actors with loose control over them. 

And a big big factor here that changes everything is what technologies get uncovered / unleashed in the months leading up to this dramatic final sprint. The closer we get to AGI the faster new technologies will appear. 

I expect there's going to be new scary offense-favoring technological developments in the next 2-5 years as we approach AGI, coming more rapidly the closer we get.

That is not a scenario that makes for peaceful orderly acquisition of total world hegemony by corporate leaders.

I expect it looks a lot more like this, except without the reappearances: https://www.bbc.com/news/world-asia-64672095 

You're absolutely right that the government will get involved. I was hoping for more of a collaboration between the tech company that creates it and the government. If we don't have a breakdown in democracy (which is entirely possible but I don't think inevitable), that will put the government in charge of ASI. Which sounds bad, but it could be worse - having nobody but the ASI in charge, and it being misaligned.

My hope is something like "okay, we've realized that this is way too dangerous to go making more of them. So nobody is allowed to. But we're going to use this one for the betterment of all. Technologies it creates will be distributed without restrictions, except when needed for safety".

Of course that will be a biased version of "for the good of all", but now I think the public scrutiny and the sheer ease of doing good might actually win out.

Hmm. I still think that perhaps this view of a possible future might not be taking enough account of the fact that 'the government' isn't a thing. There are in fact several governments governing various portions of the human population, and they don't always agree on who should be in charge. I am suggesting that whichever of these governments seems to be about to take control of a technology which will give it complete control over all the other governments... might be in for a rocky time. Sometimes they get mad at each other, or power hungry, and do some rather undemocratic things.

Right. I mean the US government. My timelines are short enough that I expect one of the US tech firms to achieve AGI first. Other scenarios seem possible but unlikely to me.

The scenario I am suggesting as seeming likely to me is that Russia and/or China are going to, at some point, recognize that the US companies (and thus the US government) are on the brink of achieving AGI sufficiently powerful to ensure global hegemony. I expect that in that moment, if there is not a strong international treaty regarding the sharing of power, Russia and/or China will feel backed into a corner. In the face of an existential risk to their governance, their governments and militaries are likely to undertake either overt or covert acts of war.

If such a scenario does come to pass, in a highly offense-favoring fragile world-state as the one we are in, the results would likely be extremely messy. As in, lots of civilian casualties, and most or all of the employees of the leading labs dead.

Thus, I don't think it makes sense to focus on the idea of "OpenAI develops ASI and the world smoothly transitions into Sam Altman as All-Powerful-Ruler-of-Everything-Forever" without also considering that an even more likely scenario if things seem to be going that way is all employees of OpenAI dead, most US datacenters bombed, and a probable escalation into World War III but with terrifying new technology.

So what I'm saying is that your statement:


But I'd prefer to gamble on the utopias offered by Altman, Hassabis, or Amodei. This is an argument against an AI pause, but not a strong one.

Is talking about a scenario that to me seems screened off from probably occurring by really bad outcomes. Like, I'd put less than 5% chance of a leading AI lab getting all the way to deployment-ready aligned ASI without either strong international cooperation and treaties and power-sharing with other nations, or substantial acts of state-sponsored violence with probable escalation to World War. I believe a peaceful resolution in this scenario requires treaties first.

You have a hypothesis that powerful people are amenable to reason. I see a lot of countervailing evidence.

powerful people are extremely adversarially robust, and adversarial robustness degrades communication drastically.

Powerful people are not amenable to reason in the naively straightforward sense where someone tells a true thing at them a lot and then they act like they believe it. Nevertheless, there are complicated sociopolitical pathways by which the actions of powerful systems and players can be influenced to make them act in accordance with some ground truth of reality. You can tell by how, for example, a lot of institutions act like gravity is a real thing, or like there are such things as "crime" or "natural disasters", or by how they're continually forced to acknowledge the preferences of novel social movements (e.g., sexual minorities).

It is not easy or straightforward, but it is not hopelessly undoable.

I agree it isn't hopeless. But I do think that if someone only conforms to something when they are forced to (including truth and reason) we can't say that they care about the thing, we can at most say that they treat it as a constraint.

Your strong conclusion 

The people currently taking actions that increase the probability that {the former is solved first} are not evil people trying to kill everyone, they're confused people who think that their actions are actually increasing the probability that {the latter is solved first}.

rests on a few over-simplifications associated with

Almost all of what the future contains is determined by which of the two following engineering problems is solved first:

  • How to build a superintelligent AI (if solved first, everyone dies forever)
  • How to build an aligned superintelligent AI (if solved first, everyone gets utopia)

For example:

  • Say, I'm too old to expect aligned AI to give me eternal life (or aligned AI simply might not mean eternal life/bliss for me, for whichever reason; maybe as it's still better to start with newborns more efficiently made into bliss-enjoying automatons or whatever utopia entails), so for me individually, the intermediate years before superintelligence are the relevant ones, so I might rationally want to earn money by working on enriching myself, whatever the (un-)alignment impact of it
  • Given the public goods nature of alignment, I might fear it's unlikely for us to cooperate, so we'll all freeride by working to enrich ourselves on various things rather than on alignment, leaning towards building unaligned AI. With such a prior, it may be rational for any self-interested person - also absent confusion - to indeed freeride: 'hopefully I make a bit of money for an enjoyable few or many years, before unaligned AGI destroys us with near-certainty anyway'.
  • Even if technical alignment was not too difficult, standard Moloch-type effects (again, all sorts of freeriding/power seeking) might mean the chances of unaligned users of even otherwise 'aligned' technology are overwhelming, again meaning most value for most people lies in increasing their material welfare for the next few years to come, rather than 'wasting' their resources on a futile alignment project.
  • Assume we're natural humans, instead of perfectly patient beings (I know, weird assumption). So you're eating the chocolate and drinking the booze, even if you know in the longer-term future it's a net negative for you. You need not be confused about it, you need not even be strictly irrational from today's point of view (depending on the exact definition of the term): As you might genuinely care more about the "you" in the current and coming few days or years, than about that future self that simply doesn't yet feel too close to you - in other words, just the way it can be rational to care a bit more about yourself than about others, you might care a bit more about your 'current' self than about that "you" a few decades - or an eternity - ahead. So, the near-eternal utopia might mean a lot to you, but not infinitely much. It is easy to see that then, it may well remain rational for you to use your resources towards more mundane aims of increasing your material welfare for the few intermediate years to come - given that your own potential marginal contribution to P(Alignment first) is very small (<<1).

Hence, I'm afraid, when you take into account real-world complexities, it may well be perfectly rational for individuals to race on towards unaligned superintelligence; you'd require more altruism, rather than only more enlightenment, to improve the situation. In a messy world of 8 billion people in capitalistic, partly zero-sum competition (or here even partly negative-sum competition), it simply isn't simple to get cooperation even if individuals were approximately rational and informed - even if, indeed, we are all in this together.

  • Say, I'm too old to expect aligned AI to give me eternal life (or aligned AI simply might not mean eternal life/bliss for me, for whichever reason; maybe as it's still better to start with newborns more efficiently made into bliss-enjoying automatons or whatever utopia entails), so for me individually, the intermediate years before superintelligence are the relevant ones, so I might rationally want to earn money by working on enriching myself, whatever the (un-)alignment impact of it

I expect that the set of people who:

  • Expect to have died of old age within five years
  • Are willing to reduce their life expectancy in order to be richer before they die
  • Are willing to sacrifice all of humanity's future (including the future of their loved-ones who aren't expected to die of old age within five years)
  • Take actions that impact what superintelligence is built

is extremely small. It's not like Sam Altman is 70.


  • Given the public goods nature of alignment, I might fear it's unlikely for us to cooperate, so we'll all freeride by working to enrich ourselves on various things rather than on alignment, leaning towards building unaligned AI. With such a prior, it may be rational for any self-interested person - also absent confusion - to indeed freeride: 'hopefully I make a bit of money for an enjoyable few or many years, before unaligned AGI destroys us with near-certainty anyway'.

  • Even if technical alignment was not too difficult, standard Moloch-type effects (again, all sorts of freeriding/power seeking) might mean the chances of unaligned users of even otherwise 'aligned' technology are overwhelming, again meaning most value for most people lies in increasing their material welfare for the next few years to come, rather than 'wasting' their resources on a futile alignment project.

But it's not a public-goods kind of thing. If they knew that the choice was between:

  • Rich now, dead in five years
  • Less rich now, post-scarcity-rich and immortal and in utopia forever in five years

then pretty much nobody would still choose the former. If people realized the truth, they would choose otherwise.

In my experience, most of the selfishness people claim to have to justify continuing to destroy the world instead of helping alignment is less {because that's their actual core values and they're acting rationally} and more just {finding excuses to not have to think about the problem and change their minds/actions}. I talk more about this here.

To be fair, this is less a failure of {being wrong about this specific thing}, and more a failure of {being less good at rationality in general}. But it's still mistake-theoretic more so than conflict-theoretic.

I expect that the set of people who:

  • Expect to have died of old age within five years
  • Are willing to reduce their life expectancy in order to be richer before they die
  • Are willing to sacrifice all of humanity's future (including the future of their loved-ones who aren't expected to die of old age within five years)
  • Take actions that impact what superintelligence is built

is extremely small.

It would be extremely small if we were talking about binaries/pure certainty.

If, in reality, everything is uncertain, and in particular (as I think) everyone has individually a tiny probability of changing the outcome, then everyone ends up free-riding.

This is true for the commoner[1] who's using ChatGPT or whichever cheapest & fastest AI tool he finds to succeed in his work, thereby supporting the AI race and "taking actions that impact what superintelligence is built".

It may also be true for the CEOs of many AI companies. Yes, their dystopia-probability-impact is larger, but equally, their own career, status, power - and future position within the potential new society, see jacob_cannell's comment - hinge more strongly on their actions.

(Imperfect illustrative analogy: climate change may kill a hundred million people or so, yet humans will still tend to fly around the world, heating it up. Would anyone be willing to "sacrifice" a hundred million people for her trip to Bali? I have some hope they wouldn't. But she won't skip the holiday if her individual probability of averting disastrous climate change is tiny anyway. And if, instead of a holiday, her entire career, fame, and power depended on continuing to pollute, then even as a global-scale polluter she'd likely enough not stop emitting. I think we clearly must acknowledge this type of public-good/freerider dynamic in the AI domain.)
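To make the free-rider arithmetic concrete, here is a minimal expected-value sketch; every number in it is an assumption picked purely for illustration, not an estimate anyone in this thread has endorsed.

```python
# Minimal expected-value sketch of the free-rider dynamic.
# All numbers below are assumptions chosen only to show the shape of the tradeoff.

near_term_gain = 1.0     # value of a few extra years of wealth/status from racing
utopia_value   = 1e6     # value the individual puts on reaching utopia at all
                         # (huge, but finite once distant future selves are discounted)
delta_p        = 1e-9    # assumed probability that this one person's restraint
                         # flips the outcome from "unaligned first" to "aligned first"

ev_race      = near_term_gain          # keep racing / enriching yourself
ev_cooperate = delta_p * utopia_value  # forgo the gain to nudge the odds

print(f"EV(race)      = {ev_race}")
print(f"EV(cooperate) = {ev_cooperate}")
# With these assumed numbers, racing "wins" for each individual, even though
# everyone racing is collectively catastrophic -- the classic free-rider shape.
```

The point is only the shape of the comparison: as long as delta_p is tiny, the individually "rational" move diverges from the collectively sane one, no confusion required.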

***

In my experience, most of the selfishness people claim to have to justify continuing to destroy the world instead of helping alignment is less {because that's their actual core values and they're acting rationally} and more just {finding excuses to not have to think about the problem and change their minds/actions}.

Agree with a lot of this, but without it changing my interpretation much: yes, humans are indeed good at rationalizing their bad actions. But they're especially good at it when it's in their egoistic interest to continue the bad thing. So the commoner and the AI CEO alike might well rationalize "for complicated reasons it's fine for the world if we (one way or another) heat up the AI race a bit" in irrational ways - precisely because they rightly see continuing as being in their own material interest, and want their own brain and others to see them as good people nevertheless.

 

  1. ^

    Btw, I agree the situation is a bit different for commoners vs. Sam Altman & co. I read your post as being about persons in general, even people who are merely using the AI tools and therefore economically influence the domain via market forces. If that wasn't just my misreading, you might simplify the discussion by editing your post to refer to those with a significant probability of making a difference (I interpret your reply in that way; though I also don't think the result changes much, as I try to explain).

almost all of the reasons that the former is currently a lot more likely are mistake theory reasons.

Not necessarily.  The primary reason is that building an aligned superintelligence is strictly a subset of building a superintelligence, and is therefore harder.  Just how much harder is unknown, but likely orders of magnitude, and perhaps nigh-impossible.

For the question of whether to stop doing a very hard but profitable and likely possible thing, to wait for (since it's unclear how to work on) an extremely hard and maybe impossible thing, there will be a mix of honest disagreement (mistake theory) and adversarial ignorance (conflict).  And there's no easy way to know which is which, as the conflict side will find it easy to pretend ignorance.  

Nitpicking a particular topic of interest to me:

Power/money/being-the-head-of-OpenAI doesn't do anything post-singularity.

It obviously does?

I am very confused why people make claims in this genre. "When the Singularity happens, this (money, conflict, the problems I'm experiencing) won't be a problem anymore."

This mostly strikes me as magical, far-mode thinking. It's like people have an afterlife-shaped hole after losing religion. The specific, real reality in front of you won't magically suddenly change after an Intelligence Explosion and assuming we're alive in some coherent state. Money and power are very, very likely to still exist afterwards, just in a different state that makes sense as a transformation of the current world.


Money and power won't matter as much, but status within your social "tribe" will be probably one of the most important things to most. For example, being good at video games, sports, getting others to follow your ideas, etc. 

"When the Singularity happens, this (money, conflict, the problems I'm experiencing) won't be a problem anymore."

I mean... yeah?

Some things I think would cause people to disagree:

  • They think a friendly-AGI-run society would have some use for money, conflict, etc. I'd say the onus is on them to explain why we would need those things in such a society.
  • They think a good "singularity" would not be particularly "weird" or sci-fi looking, which ignores the evidence of technological development throughout history. I think this is what the "The specific, real reality in front of you" sentence is about. A medieval peasant would very much disagree with that sentence, if they were suddenly thrust into a modern grocery store. I think they would say the physical reality around them changed to a pretty magical-seeming degree.
  • Any/all of the above, but applied to a harmful singularity. (E.g. thinking that an unfriendly AGI could not kill everyone, rendering their previous problems irrelevant.)

This seems to be a combo of the absurdity heuristic and trying to "psychoanalyze your way to the truth". Just because something sounds kind of like some elements of some religions, does not make it automatically false.

(I'd be less antsy about this if this was a layperson's comment in some reddit thread, but this is a LessWrong comment on an AI alignment researcher's post. I did not expect to see this sort of thing in this place at this time.)

A medieval peasant would very much disagree with that sentence, if they were suddenly thrust into a modern grocery store. I think they would say the physical reality around them changed to a pretty magical-seeming degree.

 

They would still understand the concept of paying money for food. The grocery store is pretty amazing but  it's fundamentally the same transaction as the village market. I think the burden of proof is on people claiming that money will be 'done away with' because 'post-scarcity', when there will always be economic scarcity. It might take an hour of explanation and emotional adjustment for a time-displaced peasant to understand the gist of the store, but it's part of a clear incremental evolution of stores over time.

I think a basically friendly society is one that exists at all and is reasonably okay (at least somewhat clearly better) compared to the current one. I don't see why economic transactions, conflicts of all sorts, etc wouldn't still happen, assuming the lack of existentially-destructive ones that would preclude the existence of such a hypothetical society. I can see the nature of money changing, but not the fundamentals of there being trades.

I don't think AI can just decide to do away with conflicts via unilateral fiat without an enormous amount of multipolar effort, in what I would consider a friendly society not run by a world dictator. Like, I predict it would be quite likely terrible to have an ASI with such disproportionate power that it is able to do that, given it could/would be co-opted by power-seekers.

I also think that trying to change things too fast or 'do away with problems' is itself something trending along the spectrum of unfriendliness from the perspective of a lot of humans. I don't think the Poof Into Utopia After FOOM model makes sense, that you have one shot to send a singleton rocket into gravity with the right values  or forever hold your peace. This thing itself would be an unfriendly agent to have such totalizing power and make things go Poof without clear democratic deliberation and consent. This seems like one of the planks of SIAI ideology that seems clearly wrong to me, now, though not indubitably so. There seems to be a desire to make everything right and obtain unlimited power to do so, and this seems intolerant of a diversity of values.

This seems to be a combo of the absurdity heuristic and trying to "psychoanalyze your way to the truth". Just because something sounds kind of like some elements of some religions, does not make it automatically false.

I am perfectly happy to point out the ways people around here obviously use Singularitarianism as a (semi-)religion, sometimes, as part of the functional purpose of the memetic package. Not allowing such social observations would be epistemically distortive. I am not saying it isn't also other things, nor am I saying it's bad to have religion, except that problems tend to arise. I think I am in this thread, on these questions, coming with more of a Hansonian/outside view perspective than the AI zookeeper/nanny/fully automated luxury gay space communism one.

They think a friendly-AGI-run society would have some use for money, conflict, etc. I'd say the onus is on them to explain why we would need those things in such a society.

Because of the laws of thermodynamics holding, basically. I do buy that a lot of stuff could switch over to non-money modes, but if we assume that the basic laws of physics fundamentally still hold true, then doing away with money entirely can't work, and this is one of those areas where you need to give hard evidence.

Much more generally, the Industrial Revolution is a good example, in that it really did improve the lives of humans massively, even with imperfect distribution of benefits, but it didn't end conflict or money, and I'd argue there was a use for money (Although the Industrial Revolution did drastically reduce the benefits of war to non-ideological actors.)

They think a good "singularity" would not be particularly "weird" or sci-fi looking, which ignores the evidence of technological development throughout history. I think this is what the "The specific, real reality in front of you" sentence is about.

Interestingly enough, while I think this is true over the long term, and potentially even over the short term, I think a major problem is that LWers tend to underestimate how long things take to change, and in general have a bit of a bad habit of assuming everything changes at maximum speed.

A medieval peasant would very much disagree with that sentence, if they were suddenly thrust into a modern grocery store. I think they would say the physical reality around them changed to a pretty magical-seeming degree.

I agree that the medieval peasant would be very surprised at how much things changed, but they'd also detect a lot of continuity, and would have a lot of commonalities, especially on the human side of things.

This seems to be a combo of the absurdity heuristic and trying to "psychoanalyze your way to the truth". Just because something sounds kind of like some elements of some religions, does not make it automatically false.

But it does decrease the credence, potentially substantially, and that could be important.

Now, my general view is that I do think there's reason to believe that AI could be the greatest technology in history, but I agree with the OP that there's a little magic often involved, and it's a little bit of a red flag how much AI gets compared to gods.

And contra some people, I do think that "psychoanalyzing your way to the truth" is more useful than people think, especially if you have good a priori reason to expect biases to drive the discussion, because it can allow you to detect red flags.

Power/money/being-the-head-of-OpenAI doesn't do anything post-singularity.

Any realistic practical 'utopia' will still have forms of money (fungible liquid socio-economic power), and some agents will have vastly more of it than others: like today, except only moreso.

I would say some, maybe even the majority, but not all, possible post-singularity societies will have fungible liquid socio-economic power. However, I do think that simply pointing out that post-singularity societies are not guaranteed (or even likely) to be free of socio-economic inequality is sufficient to make the point. I'm not sure current power and wealth will translate into important differences in most future societies, but I'm pretty sure that being the head of the lab that succeeds at creating hegemony-capable AI will have payoffs. It's not certain that this hegemony-capable AI will be aligned ASI. In fact, I rather suspect that that outcome is quite difficult to achieve and unlikely. It only needs to be narrow tool AI of sufficient power to ensure no-one else in the world is able to develop competing tech. To me, that seems like a lower bar to clear.

I don't know how meaningful it is to talk about post-singularity, but post-AGI we'll transition to an upload economy if we survive. The marginal cost of upload existence will probably be cheaper - even much cheaper - than human existence immediately, but it looks like the initial cost of uploading using practical brain-scanning tech will be very expensive - on the order of foundation-model training costs. So the default scenario is an initial upload utopia for the wealthy. Eventually, with ASI, we should be able to upload many people en masse more generally via the simulation argument, but then there are interesting tradeoffs of how much wealth can be reserved in advance precommitment for resurrecting ancestors vs earlier uploads, de novo AGI, etc.

I strongly doubt that an aligned superintelligence couldn't upload everyone-who-wants-to-be-uploaded cheaply (given nanotech), but if it couldn't, I doubt its criterion for who gets to be uploaded first will be "whoever happens to have money" rather than a more-likely-to-maximize-utility criterion such as "whoever needs it most right now".

This is why I think wealth allocations are likely to not be relevant; giving stuff to people who had wealth pre-singularity is just not particularly utility-maximizing.

I don't view ASI as substantially different than an upload economy. There are strong theoretical reasons why (relatively extreme) inequality is necessary for pareto efficiency, and pareto efficiency is the very thing which creates utility (see critch's recent argument for example, but there were strong reasons to have similar beliefs long before).

The distribution of contributions towards the future is extremely heavy tailed: most contribute almost nothing, a select few contribute enormously. Future systems must effectively trade with the present to get created at all: this is just as true for corporations as it is for future complex AI systems (which will be very similar to corporations).

Furthermore, uploads will be able to create copies of themselves proportional to their wealth, so wealth and measure become fungible/indistinguishable. This is already true to some extent today - the distribution of genetic ancestry is one of high inequality, and the distribution of upload descendancy will be far more unequal and on accelerated timescales.

rather than a more-likely-to-maximize-utility criterion such as "whoever needs it most right now".

This is a bizarre, disastrously misguided socialist political fantasy.

The optimal allocation of future resources over current humans will necessarily take the form of something like a historical backpropagated Shapley value distribution: future utility allocated proportionally to counterfactual importance in creating said future utility. Well-functioning capitalist economies already do this absent externalities; the function of good governance is to internalize all externalities.
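For readers who haven't seen it, here is the standard Shapley value formula in toy form; the players and coalition values below are made up solely to illustrate "utility allocated proportionally to counterfactual contribution", not a model of any actual future.

```python
from itertools import combinations
from math import factorial

# Toy Shapley value computation. Coalition values are invented for illustration.
players = ["A", "B", "C"]
v = {  # value created by each coalition of contributors
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 0, frozenset("C"): 0,
    frozenset("AB"): 40, frozenset("AC"): 30, frozenset("BC"): 5,
    frozenset("ABC"): 100,
}

def shapley(player):
    """Average marginal contribution of `player` over all coalition orderings."""
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            s = frozenset(subset)
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += weight * (v[s | {player}] - v[s])
    return total

for p in players:
    print(p, round(shapley(p), 2))  # shares: A ~46.67, B ~29.17, C ~24.17 (sum 100)
```

Each player's share is their marginal contribution averaged over every order in which the coalition could have been assembled, and the shares sum to the full coalition's value.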

I don't view ASI as substantially different than an upload economy.

I'm very confused about why you think that. Unlike an economy, an aligned ASI is an agent. Its utility function can be something that looks at the kind of economy you describe, and goes "huh, actually, extreme inequality seems not great, what if everyone got a reasonable amount of resources instead".

It's like you don't think the people who get CEV'd would ever notice Moloch; their reflection processes would just go "oh yeah whatever this is fine keep the economy going".

Most worlds where we don't die are worlds where a single aligned ASI achieves decisive strategic advantage (and thus is permanently in charge of the future), in which case that single AI is an agent running a person/group's CEV, which gets to look at the-world-as-a-whole and notice when its utility isn't maximized and then can just do something else which is not that.

Some of the worlds where we don't die are multipolar, which just means that a few different ASIs achieve decisive strategic advantage at a close-enough time that what they fill the lightcone with is a weighted composition of their various utility functions. But the set of ASIs who get to take part in that composition is still a constant set, locked in forever, without any competition to worry about, and that composed utility function is then what looks at the-world-as-a-whole.

This point, that I think an immutable set of ASIs grab the entire future and then they're in charge forever and they simply stop any competitor to themselves from ever appearing, feels like it's in the direction of the crucial disagreement between our perspectives. Whether my "socialist political fantasy" indeed describes what worlds-where-we-don't-die actually look like feels like it's downstream of that.

the function of good governance is to internalize all externalities.

That sure is a take! Wouldn't the function of good governance be to maximize nice things, regardless of whether that's best achieved by patching all the externality-holes in a capitalist economy, or by doing something which is not that?

I don't view ASI as substantially different than an upload economy.

I'm very confused about why you think that.

You ignored most of my explanation so I'll reiterate a bit differently. But first taboo the ASI fantasy.

  • any good post-AGI future is one with uploading - humans will want this
  • uploads will be very similar to AI, and become moreso as they transcend
  • the resulting upload economy is one of many agents with different values
  • the organizational structure of any pareto optimal multi-agent system is necessarily market-like
  • it is a provable fact that wealth/power inequality is a consequent requisite side effect

Most worlds where we don't die are worlds where a single aligned ASI achieves decisive strategic advantage

Unlikely, but it also doesn't matter, as what alignment actually means is that the resulting ASI must approximate pareto optimality with respect to various stakeholder utility functions, which requires that:

  • it uses stakeholder's own beliefs to evaluate utility of actions
  • it must redistribute stakeholder power (i.e., wealth) toward agents with better predictive beliefs over time (in a fashion that looks like internal bayesian updating).

In other words, the internal structure of the optimal ASI is nigh indistinguishable from an optimal market.

Additionally, the powerful AI systems which are actually created are far more likely to be ones which precommit to honoring their creator stakeholders' wealth distribution. In fact, that is part of what alignment actually means.

I appreciate these views being stated clearly, and at once feel a positive feeling toward the author, and also am shaking my head No. As others have pointed out, the mistake theory here is confused.

I think it's not exactly wrong. The way in which it's right is this:

If people doing AGI research understood what we understand about the existential risk of AGI, most of them would stop, and AGI research would go much slower.

In other words, most people are amenable to reason on this point, in the sense that they'd respond to reasons to not do something that they've been convinced of. This is not without exception; some players, e.g. Larry Page (according to Elon Musk), want AGI to take the world from humanity.

The way in which the mistake theory is wrong is this:

Many people doing AGI research are not trying, and in many cases trying not, to understand what we understand about AGI risk.

So it's not just a mistake. It's a choice, that choice has motivations, and those motivations are in conflict with our motivations, insofar as they shelter themselves from reason.

Larry Page (according to Elon Musk), want AGI to take the world from humanity

(IIRC, Tegmark, who was present for the relevant event, has confirmed that Page had stated his position as described.)

So it's not just a mistake. It's a choice, that choice has motivations, and those motivations are in conflict with our motivations, insofar as they shelter themselves from reason.

This still seems, to me, like a special case of "mistake".

It's not just epistemic confusion that can be most easily corrected with good evidence and arguments. That's what I think we're talking about.

Well, I wrote about this here: https://www.lesswrong.com/posts/tMtMHvcwpsWqf9dgS/class-consciousness-for-those-against-the-class-system

But the internet loves to downvote without explaining why...

But ultimately, for the parts that really matter here, this is a matter of explaining, not of defeating

Of course, defeating people who are mistakenly doing the wrong thing could also work, no? Even if we take the assumption that people doing the wrong thing are merely making a mistake by their own lights to be doing so, it might be practically much more feasible to divert them away from doing it or otherwise prevent them from doing it, rather than to rely on successfully convincing them not to do it. 

Not all people are going to be equally amenable to explanation. It's not obvious to me at least that we should limit ourselves to that tool in the toolbox as a rule, even under an assumption where everyone chasing bad outcomes is simply mistaken/confused.

But I'm pretty sure nobody in charge is on purpose trying to kill everyone; they're just on accident functionally trying to kill everyone.

I'm less sure about this. I've met plenty of human extinctionists. You could argue that they're just making a mistake, that it's just an accident. But I do think it is meaningful that there are people who are willing to profess that they want humanity to go extinct and take actions in the world that they think nudge us towards that direction, and other people that don't do those things. The distinction is a meaningful one, even under a model where you claim that such people are fundamentally confused and that if they were somehow less confused they would pursue better things. 

And if you're not using your power/money to affect which of those two outcomes is more likely to happen than the other, then your power/money is completely useless. They won't be useful if we all die, and they won't be useful if we get utopia.

I disagree with this, because I think the following three things are true:

  1. There is a finite amount of value in the accessible universe (or multiverse, or whatever).
  2. Some people have unbounded "values", especially around positional goods like status among other humans.

A way I imagine this concretely playing out, conditional on intent alignment succeeding, is very powerful post-human beings, descended from the people who controlled AI during the pivotal period, playing very costly status games with each other, constructing the cosmic equivalent of the Bugatti Veyron or the Oman Royal Yacht Squadron, without being concerned with impartial value. I still expect them to provide for the "basic" needs of humanity, because it is so incredibly cheap, making it a utopia for people with bounded or modest goals, but e.g. preventing impartial hedonic utilitarians or people with many positional or nosy values from enacting their goals.

This depends on the people ultimately in charge of powerful AI systems to be philosophically unsophisticated, but most people are philosophically unsophisticated, and philosophical sophistication appears mostly uncorrelated with engineering or business success, so this doesn't seem like a bottleneck.

This view of course fails when single individuals become exceedingly powerful, in which case I don't have as strong a story. I'd be interested in what individual humans have historically done when they were strongly dominant over all forces around them.

Um, I'm pretty sure that history has some examples of what individual humans tend to do when strongly dominant over all local social forces. If we extrapolate from that, it uh, doesn't look pretty. We can hope that things will be different when there is an abundance of material wealth, but I don't feel much confidence in that.

the next 0 to 5 years, which is about how long we have

Can you do a full post on how you see the next 5 years unfolding if it does take that long? I read takeoff speeds on your blog; can you assign best-guess timeframes for your model?


It's encouraging to see more emphasis recently on the political and public-facing aspects of alignment. We are living in a far-better-than-worst-case world where people, including powerful ones, are open to being convinced. They just need to be taught - to have it explained to them intuitively.

It seems cached beliefs produced by works like you get about five words have led to a passive, unspoken attitude among many informed people that attempting to explain anything complicated is futile. It isn't futile. It's just difficult. 

I think "you get about five words" is mostly right... It's just that it's "five words per message", not "five words on the issue, total". You have to explain in short bursts, but you can keep building on your previous explanations.
