By the time you have AIs capable of doing substantial work on AI R&D, they will also be able to contribute effectively to alignment research (including, presumably, secret self-alignment).
Even if takeoff is harder than alignment, that problem becomes apparent at the point where the amount of AI labor available to work on those problems begins to explode, so it might still happen quickly from a calendar perspective.
Some relevant pieces on this subject I’ve read somewhat recently:
A central pillar of the Democratic Party has been that Republicans will destroy democracy and take the country down with it (somewhat ditto the Republican line on immigration). Both parties are obsessed with the end of American greatness, and motivate their voters through that narrative. To a lesser extent, they’re also nebulously united on “beating China”.
Where I agree is that there's an absence of a positive vision for the future (something that isn't just the world of today plus better healthcare). I think this is especially true on the American left, which has basically mired itself in an anti-progress position through its natural distrust of billionaires and its reaction to the rising political prominence of the tech right. It's hard to accept that radical change is possible (except through the existing lenses of wealth concentration or environmental impact) when accepting that change means elevating the importance of people in your cultural outgroup. ASI is a silly concern for fringe thinkers in San Francisco; real writers ask the pressing questions about electricity costs, copyright, and corporate influence on the Trump administration.
Compare what writers at places like the Atlantic, the NYT, or the Times have to say about AI with what people like Steve Bannon say. The former is incredibly near-term and sanded down, while the right has generally been more willing to engage with the possibility of superintelligence.
While it wouldn't be ideal for international security, middle powers will also probably feel a lot of pressure to acquire and commit to using weapons of mass destruction. It's probably much cheaper to develop powerful weapons and elaborate fail-deadlies than it is to kickstart your own AI infrastructure (particularly if it's viable to steal Chinese/American models), so it's attractive to bank on staying geopolitically relevant through deterrence instead of having to coordinate.
I think this is particularly likely to happen in worlds where misalignment isn't seen as omnicidal, and where the primary perceived risk is loss of sovereignty. Since being part of a coalition (especially one that carries security commitments) itself erodes some sovereignty, it might seem easier to turn investment inwards.
Those aren't necessarily contradictory: you could have big jumps in unemployment even with increases in average productivity. You already see this happening in software development, where increasing productivity for senior employees has coincided with fewer junior hires. While I expect that this effect is pretty small today and has more to do with the end of ZIRP and previous tech overhiring, you'll probably see it play out in a big way as better AI tools take up spots for new grads in the run-up to AGI.
I particularly like the idea that AI incompetence will become associated with misalignment. The more capable agents become, the more permissions they'll get, which will have a strange "eye of the storm" effect where AIs are making a bunch of consequential decisions poorly enough to sway the public mood.
I think a lot of the potential impact from public opinion is cruxed on what the difference between the publicly available models and the frontier models will be. In my expectation, the most powerful models end up subject to national security controls and are restricted: by the time you have a bunch of stumbling agents, the frontier models are probably legitimately dangerous. The faster AI progress goes, or the more control the USG exerts, the greater the gap between the public perception of AI and its real capabilities will be. And being far removed from public participation, these projects are probably pretty resilient to changes in the public mood.
With that in mind, anything that either gives transparency into what the ceiling of capabilities is (mandatory government evals that are publicly read out in Congress?) or gets the public worried earlier seems pretty important. I particularly like the idea of trying to spark concern about job losses before they happen: maybe this happens by constantly asking politicians what their plans are for a future when people can't work, and pointing to these conversations as examples of the fact that the government isn't taking the threat of AI to you, the voter, seriously.
I feel like this scenario somewhat overplays how easily China is able to push over the U.S with its industrial lead (in much the same way that I feel a lot of other AI predictions, a la Situational Awareness, overplay the U.S's position when it takes the lead).
The ASI directs successful sabotage on the US training runs to keep them from training their own ASI, and in response the US makes threats of kinetic strikes. China doesn’t back down at this because its ASI has created a robust air defense system spanning all of China that scales to ICBMs.
Most of my skepticism has to do with my belief that future AI technologies are going to allow for the development of extremely powerful asymmetric weapons, and that this will negate most of the advantage from having a giant industrial lead (even if you are the first to set off your industrial explosion). Nuclear weapons are a clear example of this: even if you cloud the sky with interceptors, you can negate all of that defensive investment through simple strategies like putting your nuke in a shipping container and docking it next to Beijing/Washington. Defensive investment is a poor strategy for trying to outmatch offense-dominant technology.
Even more importantly, the leading country needs to be robustly defense dominant against all possible asymmetric weapons. From the perspective of MAD, it doesn't matter if your supply chains can protect you against nukes if they can't also protect you against alternative threats like targeted bioweapons. And if the loser is really desperate, they can make defense even harder by employing fail-deadly tools like mirror life or an intentionally misaligned superintelligence. Otherwise, the other side (the U.S in this example) is able to limit the extent of your sabotage and buy time with deterrence, which they can use to build out their own fully-optimized ASI.[1] I'd imagine that the U.S could use its substantially worse, but still superhuman, AIs to develop several of these cheap and highly destructive weapons, and then commit to retaliating with them as leverage.
At that point, even if China has the industrial lead, we still end up in a bipolar world dominated by the promise of MAD between the two parties, with each country needing to respect what the other does in their own sphere of influence.
A historical example I often use for this is the U.S refusal to invade North Korea in 1994, and its subsequent acquisition of nuclear weapons. Although the U.S was and is vastly superior conventionally, once Pyongyang's conventional artillery was sophisticated enough to guarantee it could destroy Seoul at a moment's notice, the immediate costs became too high to justify a military operation to destroy their enrichment facilities. Instead you ended up with an ineffective diplomatic compromise, which was later subverted at the whim of the North Korean government.
Even though Seoul was able to invest enough in the following decades to become mostly secure against a conventional artillery strike, it doesn't matter: North Korea bought enough time to get its nukes online.
That is the hope: that the US and China, out of similar self-interest, agree that only they will be allowed to develop dangerous AGI (or in your framing, to restrict development of AI that can develop dangerous weapons). This currently seems unlikely, but if the dangers of proliferation are really as severe as I fear, it seems possible they'll be recognized in time, and those governments will take rational actions to prevent proliferation while preserving their own abilities to pursue AGI. The uneasy alliance between the US and China might be possible because there isn't really a lot of animosity; neither of us really hates the other (I hope - at least the US seems wary of China but doesn't seem to really hate it, despite it being fairly totalitarian).
There are many reasons that I believe that U.S-China cooperation on nonproliferation is likely, not all of which made it into the post. Specifically:
1. Great power agreements on nonproliferation are how we've handled all the most powerful dual-use technologies in the past. The U.S and Soviet Union could both recognize that the spread of nukes (and later, bioweapons) would be existentially dangerous to them both, and were able to work together to restrict them. And this was despite the fact that the Soviets were much more ideologically and economically distant than the U.S and China are today.
2. The relationship between China and the U.S has been tested before and not broken. Even when the public and most of his own party turned against Bush in 1989 over the response to Tiananmen, the executive branch was able to stay steady enough to continue to secretly work with top Chinese officials (to prevent them from turning to the Soviet Union for allies). Even if animosity between the countries reaches a similar boiling point again, there's probably still room to handle natsec-critical talks privately.
3. The last concern, and where I'm the most uncertain, is on the question of speed: do officials in both countries recognize what's going on fast enough to start doing nonproliferation work? On the one hand, the gov might get completely blindsided by the speed of progress, or perhaps become entirely politically captured by Nvidia's lobbying. On the other hand, we haven't seen any strategically relevant capabilities actually demonstrated yet: once they are, it seems difficult for the natsec apparatus to justify leaving them in private hands and not restricting who has access to them. Any steps the state takes to assert its domestic monopoly of violence will naturally dovetail into trying to do the same overseas.
If the government gets involved at all, asserting control over dual-use AIs seems like the most primal, basic step they can take: if they don't have control of the ASIs/superweapons, then they're probably not in charge enough to do anything else.
Of course this leaves the good reasons for fear of centralized power. But it may be the lesser of two dangers.
Building off that previous point: it seems very difficult to imagine the government doing anything without the monopolization of strategic power, at least at the domestic level. Throughout the entire process of assembling a nuclear weapon, for example, the government always maintains its monopoly on force: hence why we have policies in place like making sure that defense contractors for nukes are never allowed to finish assembling them in-house. The reasoning is obvious: if someone other than the state has nukes, how can the state enforce any rules on them?
In a similar defense-contractor setup with the frontier labs (such as in a national superintelligence project), the buck would have to stop with the government. If you want the state to do anything, the state needs to keep ultimate control of what the ASIs are being used for. If anyone other than the state has the final say on the ASI's commands, then the state isn't in charge anymore and there was no point in bringing the government on to regulate things to begin with.
Rather than solve centralization of power by having multiple ASI projects competing with each other, I think a lot more attention should be on how we can diffuse power within a single ASI initiative. With nukes, the government technically has the capacity to blackmail anyone it wants domestically: if enough people in the chain of command coordinated, they could hold the rest of society hostage. But by making that chain of command wide and complex enough, it becomes difficult to unilaterally use your monopoly on violence in harmful ways.[1] This gives you most of the upsides with fewer risks: the gov is powerful enough to enforce nuclear nonproliferation, but has difficulty using its strategic monopoly to extractively bargain with people.
When you combine this problem (the fact that the government needs to maintain its monopoly on force to be an effective regulator) with the fact that future AI systems will enable all sorts of offense-dominant strategies (such that d/acc-style proposals to build defensive capacity fail because too many people have AIs capable of designing cheap WMDs), it seems like the focus should be on making the government more coup-proof, rather than on increasing the number of intent-aligned superintelligences.[2]
My one nitpick is that the framing in this post seems to leave aside the possibility of general-purpose AI, that is, real AGI or ASI. That presents solutions as well as problems; it can be used to improve security dramatically, and to sabotage other nations' attempts at creating powerful AI in a variety of ways. This may add another factor that goes against proliferation as the default outcome, while adding risk of totalitarian takeover from whoever controls that intent-aligned AGI.
By proliferation as a default outcome, I mean an outcome without much government intervention. Prices get low, more people build/steal strong models. I agree that you could use an ASI to sabotage the AI projects of other countries, but that certainly counts as a huge amount of government intervention. This is a possible "permanent solution" I mentioned in the post though, and I'll spend some time in future articles weighing the merits of strategies like the front-runners sabotaging everyone else, taking over supply chains, etc. to enforce nonproliferation.
Also worth noting that the government has a long cultural tradition and political experience with managing the use of powerful weapons. This might still not be robust enough for superintelligence, but it's a much more likely bet than the AI labs' internal political structures, which immediately collapsed the first time they were seriously tested.
I suppose you’re on the money with distaste for others’ utopias, because I think the idea of allowing people to make choices that destroy most of their future value (without some sort of consultation) is a terrible thing. Our brains and culture are not built to grasp the size of choices like “choosing to not live past 80 instead of living forever” or “choosing a right to boredom vs. an optimized experience machine”. Old cultural values, like “death brings meaning to life” or that the pain of suffering is intrinsically meaningful, will have no instrumental purpose in the future, so it seems harsh to let them continue to guide so many people’s lives.
Without some new education/philosophy/culture around these choices, many people will either be ignorant of their options or have preferences that make them much worse off. You shouldn’t just give the Sentinelese the option of immortality, but provide some sort of education that makes the consequences of their choices clear beforehand.
This is a very difficult problem. I’m not a strict utilitarian, so I wouldn’t support forcing everyone to become hedonium—personal preferences should still matter. But it’s also clear that extrapolating our current preferences leaves a lot of value on the table, relative to how sublime life could be.