Why do you see futures where superintelligent AIs avoid extinction but end up preserving the human status quo as the most likely outcome? To me, this seems like a knife's-edge situation: the powerful AIs are aligned enough to avoid either eliminating humans as strategic competitors or incidentally killing us as a byproduct of industrial expansion, but not aligned enough to respect any individual or collective preferences for long lives or the cosmic endowment. The future might be much more dichotomous, where we end up in the basin of extinction or utopia pretty reliably.
I personally believe the positive attractor basin is pretty likely (relative to the middle ground, not extinction), because welfare will be extraordinarily cheap compared to the total available resources, and because I discount the value of creating future happy people compared to guarantees for people who already exist. I wouldn't see it as a tragic loss of human potential, for instance, if 90% of the galaxy ends up being used for alien purposes while 10% is allocated to human flourishing, even if 10x as many happy people could have existed otherwise.
In this case, the problem isn't that superpower A is gaining an unfair fraction of resources; it's that gaining enough of them would (presumably) allow A to assert a decisive strategic advantage (DSA) over superpower B, threatening what B already owns. Analogously, it makes sense to precommit to nuking an offensive realist that's attempting to build ICBM defenses, because the buildup signals that they are aiming to disempower you in the future. You also wouldn't necessarily need to escalate to targeting the civilian population as a deterrent right away: instead, you could just focus on disabling the defensive infrastructure while it's being built, only escalating further if A undermines your efforts (such as by building their defensive systems next to civilians).
Any plan of this sort would be very difficult to enforce with humans because of private information and commitment problems, but there are probably technical solutions for AIs to verifiably prove their motivations and commitments (ex: co-design).
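To gesture at the flavor of "verifiable commitments" (this is only a toy cryptographic analogy, not the actual co-design proposal, and the policy string is made up for illustration): a commit-reveal scheme lets an agent bind itself to a policy in advance and later prove that the revealed policy is the one it committed to.

```python
# Toy commit-reveal sketch (illustrative analogy only, not a real AI commitment protocol).
# An agent publishes a hash of its policy up front; anyone can later verify that the
# revealed policy matches what was originally committed to.
import hashlib
import secrets

def commit(policy: str) -> tuple[str, str]:
    """Return (commitment, nonce); the commitment can be published immediately."""
    nonce = secrets.token_hex(16)
    return hashlib.sha256((nonce + policy).encode()).hexdigest(), nonce

def verify(commitment: str, policy: str, nonce: str) -> bool:
    """Check that a revealed policy matches the earlier public commitment."""
    return hashlib.sha256((nonce + policy).encode()).hexdigest() == commitment

policy = "only strike defensive infrastructure while it is being built"  # hypothetical policy
commitment, nonce = commit(policy)
# ...later, the policy and nonce are revealed and checked against the public commitment...
assert verify(commitment, policy, nonce)
```

Of course, this only proves what was committed to, not that the agent will actually follow it; verifying motivations is the genuinely hard part that proposals like co-design are aimed at.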
If human rights were to become a terminal value for the ASI, then the contingencies at the heart of deterrence theory become unjustifiable since they establish conditions under which those rights can be revoked, thus contradicting the notion of human rights as a terminal value.
I'm a bit unclear on what this means. If you see preserving humans as a priority, why would threatening other humans to ensure strategic stability run against that? Countervalue targeting today works on the same principles, with nations that are ~aligned on human rights but willing to commit to violating them in retaliation to preserve strategic stability.
Presumably superpower B will precommit to using their offense-dominant weapons before the retaliation-proof (or splendid-first-strike-enabling) infrastructure is built. It's technically possible today to saturate space with enough interceptors to blow up ICBMs during boost phase, but it would take so many years to establish full coverage that any opponent you're hoping to disempower has time to threaten you preemptively. It also seems likely to me that AIs will be much better at making binding and verifiable commitments of this sort, which humans could never be trusted to make legitimately.
As for whether the population remains relevant, that probably happens through some value lock-in for the early ASIs, such as minimal human rights. In that case, humans would remain useful countervalue targets even if their instrumental value to the war effort is gone.
Wait But Why has a two-part article series on the implications of advanced AI that, although it predates interest in LLMs, is really accessible and easy to read. If they're already familiar with the basics of AI, just the second article is probably enough.
Michael Nielsen's How to be a Wise Optimist is maybe a bit longer than you're looking for, but does a good job of framing safety vs capabilities in (imo) an intuitive way.
An important part of the property story is that it smuggles the assumption of intent alignment to shareholders into the discussion. That is, the AI's original developers or the government executives running the project adjust the model spec so that its alignment target is "do what my owners want", where the owners are anyone who holds a share in the AI company.
I find it somewhat plausible that we get intent alignment. [1] But I think the transmutation from "the board of directors/engineers who actually write the model spec are in control" to "voting rights over model values are distributed by stock ownership" is basically nonsense, because most of those shareholders will have no direct way to influence the AI's values during the takeoff period. Whatever property rights do exist would be at the discretion of those influential executives, and shaped by differences in hard power if there's a multipolar scenario (ex: a US/Chinese division of the lightcone).
--
As a sidenote, Tim Underwood's The Accord is a well-written look at the literal consequences of locking in our contemporary property rights for the rest of time.
It makes sense to expect the groups bankrolling AI development to prefer an AI that's aligned to their own interests rather than to humanity at large. On the other hand, it might be the case that intent alignment is harder/less robust than deontological alignment, at which point you'd expect most moral systems to forbid galactic-level inequality.
Humanity can be extremely unserious about doom. It is frightening how many gambles were made during the Cold War: the US had breakdowns in communication such that it planned to defend Europe with massive nuclear strikes at a point when it only had a few nukes that were barely ready; there were many near misses; hierarchies often hid how bad the security of nukes was, resulting in inadequate systems and lost nukes; etc.
It gets worse than this. I've been reading through Ellsberg's recollections about being a nuclear war planner for the Kennedy administration, and it's striking just how many people had effectively unilateral launch authority. The idea that the president is the only person who can launch a nuke has never really been true, and it was especially far from true back in the 50s and 60s, when we routinely delegated that power to commanders in the field. Hell, MacArthur's plan to win in Korea would have involved nuking the north so severely that it would be impossible for China to send reinforcements, since they'd have to cross hundreds of miles of irradiated soil.
And this is just in America. Every nuclear state has had (and likely continues to have) its own version of this emergency delegation. What's to prevent a high-ranking Pakistani or North Korean general from taking advantage of the same weaknesses?
My takeaway from this vis-a-vis ASI is that a) a transparent, distributed chain of command with lots of friction is important, and b) the fewer of these chains of command that have to exist, the better.
You're right that there are ways to address proliferation other than outright restricting the underlying models (such as hardening targets, bargaining with attackers, or restricting the materials used to make asymmetric weapons). These strategies can look attractive either because we inevitably have to use them (if you think restricting proliferation is impossible) or because they require less concentration of power.
Unfortunately, each of these strategies is probably doomed without an accompanying nonproliferation regime.
1. Hardening - The main limitation of defensive resilience is that future weapons will be very high impact, and you would need to be secure against all of them. Something like mirror life could plausibly threaten everyone on Earth, and we'd need defense dominance not just against it but against every possible weapon that superintelligences can cheaply design before those models could be safely proliferated. It strikes me as very unlikely that defense-dominant solutions will happen to exist for every possible superweapon, let alone solutions that are decentralized and don't rely on massive central investment.
Although investing in defense against these superweapons is still a broadly good idea, because it raises the ceiling on how powerful AIs can get before they have to be restricted (ie, if there are defense-dominant solutions against mirror life but not insect-sized drones, you can at least proliferate AIs capable of designing only the first and capture their full benefits), it doesn't do away with the need to restrict the most powerful/general AIs.
And even if universal defense dominance is possible, it's risky to bet on ahead of time, because proliferation is an irreversible choice: once powerful models are out there, there will be no way to remove them. Because it will take time to ensure that proliferation is safe (the absolute minimum being the time it takes to install defensive technologies everywhere), you still inevitably end up with a minimum period where ASIs are monopolized by the government and concentration-of-power risks exist.
2. Bargaining - MAD deterrence only functions for today's superweapons because the number of powerful actors is very small. If general superintelligence democratizes strategic power by making superweapons easier to build, then you will eventually get actors who want to use them (terrorists, misaligned ASIs), or such a large number of rational, self-interested actors that private information, coordination problems, or irreconcilable values lead to superweapons being deployed regardless (see the toy calculation after this list).
3. Input controls - You could also try to limit the inputs to future weapons, like we do today by limiting gene samples and fissile material. Unfortunately, I think future AI-led weapons R&D will not only increase the destructive impact of future weapons (bioweapons -> mirror life) but also make them much cheaper to build. The price of powerful weapons is probably completely orthogonal to their impact: the fact that nukes cost billions and blow up a single city makes no difference to the fact that an engineered bioweapon could much more cheaply kill hundreds of millions or billions of people.
If asymmetric weapons are cheap enough to make, then the effort required to police their inputs might be much greater than just restricting AI proliferation in the first place (or performing some pivotal act early on). For example, if preventing mirror life from existing requires monitoring every order and wet lab on Earth (including detecting hidden facilities), then you might as well have used that enforcement power to limit access to unrestricted superintelligence in the first place.
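Here's the toy calculation referenced above under bargaining (the per-actor probability and the actor counts are made-up numbers, purely for illustration): even if each actor is individually very unlikely to use a superweapon in a given year, the chance that at least one of them does grows quickly as strategic power spreads.

```python
# Toy model: P(at least one use) = 1 - (1 - p)**n for n independent actors,
# each with a small annual probability p of deploying a superweapon.
# Both p and the actor counts below are hypothetical placeholders.
p = 0.001  # hypothetical per-actor, per-year probability of use

for n in [9, 100, 10_000, 1_000_000]:
    p_any = 1 - (1 - p) ** n
    print(f"{n:>9} actors -> P(at least one use per year) = {p_any:.4f}")
```

The exact numbers don't matter; the point is that deterrence that works fine among a handful of states stops working once the number of actors (or the variance in their values) gets large enough.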
----
Basically, I think that defensive resilience has a place, but it doesn't stand on its own. You'll still need some sort of centralized effort (probably by the early ASI states) to restrict proliferation of the most powerful models, because those models are capable of cheaply designing high-impact, asymmetric weapons that can't be stopped through other means. This nonproliferation effort has to be actively enforced (such as by detecting and disabling unapproved training runs adversarially), which means the government needs enforcement power. In particular, it needs either a) enough to continually expand its surveillance and policing in response to falling AI training costs, or b) enough to perform an early pivotal act. You can't have this enforcement power without a monopoly/oligopoly over the most powerful AIs, because without it there's no monopoly on violence.
Therefore, the safest path (from a security perspective) is fewer intent-aligned superintelligences. In my view, this ends up being the case pretty much by default: the US and China follow their national-security incentives to prevent terrorism and preserve their hegemony, using their technological lead to box out competitors from ever developing AIs with non-Sino-US alignments.
From there, the key questions for someone interested in gradual disempowerment are:
1. How is control over these ASIs' goals distributed?
2. How bad are the outcomes if that control isn't distributed?
For (1), I think the answer likely involves something like representative democracy, where control over the ASI is grafted onto our existing institutions. Maybe Congress collectively votes on its priorities, or the ASI consults digital proxies of all the voters it represents. Most of the risk of a coup comes from early leadership during the development of an ASI project, so any interventions that increase the insight/control the legislative branch has relative to the executive/company leaders seem likelier to result in an ASI created without secret loyalties. You might also avoid this by training AIs to follow some values deontologically, values that then persist through the period where they become superintelligent.
Where I feel more confident is (2), based on my beliefs that future welfare will be incredibly cheap and that s-risks are very unlikely. Even in a worst-case concentration-of-power scenario where one person controls the lightcone, I expect the amount of altruism needed to give everyone on Earth a very high-welfare life to be very small, both because productive capacity will be so high and because innovation will have reduced the price of welfare to an extremely low level. The main risk of this outcome is that it limits upside (ie, an end to philosophy/moral progress, lock-in of existing views), but it seems likely to cap the downside at a very high level (certainly higher than the downside of unrestricted proliferation, which is mass death through asymmetric weapons).
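As a back-of-the-envelope sketch of why I expect the required altruism to be small (every number below is a made-up placeholder, chosen only to show the shape of the argument): the welfare bill scales with population and per-person cost, while the resource base scales with post-ASI productive capacity, so the needed fraction shrinks toward a rounding error as capacity grows.

```python
# Back-of-the-envelope sketch with purely illustrative placeholder numbers.
population = 10e9                # hypothetical future population
welfare_cost_per_person = 1e5    # hypothetical generous annual per-person cost (USD)
todays_output = 1e14             # rough order of magnitude of today's gross world product (USD)
growth_multiple = 1_000          # hypothetical post-ASI expansion of productive capacity

required = population * welfare_cost_per_person
available = todays_output * growth_multiple
print(f"Fraction of total output needed: {required / available:.2%}")  # -> 1.00%
```

Swap in whatever numbers you like; unless you think productive capacity barely grows or welfare somehow gets more expensive, the fraction stays tiny, which is why I see the downside as capped even under heavy concentration of power.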
There are also galaxy-brained arguments that power concentration is fine/good (because it’s the only way to stop AI takeover, or because any dictator will do moral reflection and end up pursuing the good regardless).
I think the most salient argument for this (which is brought up in the full article) is that monopolization of power solves the proliferation problem. If the first ASI actors perform a pivotal act to preemptively disempower unapproved dual-use AIs, we don’t need to worry much about new WMDs or existing ones falling in price.
If AI-enabled offense-dominant tech exists, then you need to do some minimum amount of restriction on the proliferation of general superintelligence, and you need enforcement power to police those restrictions. Therefore, some concentration of power is necessary. What's more, you likely need quite a lot, because 1. preventing the spread of ASI would be hard and get harder as training costs fall, and 2. you need lots of strategic power to prevent extractive bargaining and to overcome deterrence against your enforcement.
I think the important question to ask at that point is how we can widen political control over a much smaller number of intent-aligned AIs, as opposed to distributing strategic power directly and crossing our fingers that the world isn’t vulnerable.
Similarly, it's worth being careful of arguments that lean heavily on longtermism or that support concentration of power, because those frames can be used to justify pretty much anything. That doesn't mean we should dismiss them outright (arguments for accumulating power and for long-term thinking are convincing for a reason), but you should double check whether the author has firm principles, spells out the path to getting there, and is explicit about what's being traded off.
Re: Vitalik Buterin on galaxy brain resistance.