Press Your Luck (3/3)

Douglas_Reay

This is the third in a three-part sequence examining the scenario in which software development organisations might knowingly take a significant risk of unintentionally launching an AI program into inadequately supervised or controlled self-improvement, by comparing the scenario to the 'press your luck' game mechanic.

This third part presents a game that can be used as a framework for modelling the effect upon risk of various changes to the dynamics of how software organisations interact and are motivated.

Box Info

Name : The AI Launch Game

Players : 4 - 8

Age : 10+

Genre : Simulation, Press your luck

Duration : 20 - 40 minutes

Tagline : "Do you feel lucky, cyberpunk?"

Background

Each player represents a large software research and development organisation (such as a division of Google, or a department at MIT) that has been tasked with working on computer programs capable of comprehending an existing computer program and then producing an improved next generation of it.

For simplicity, we're going to consider only two properties of such programs:

⦁ Power - how much it improves a designated property of its target, each iteration

⦁ Safety - how unlikely it is to end up making humanity regret launching it, if it is launched (set to self-improve endlessly, with no further human intervention)

Each organisation is either Corporate or Academic, and depending on their role, the player will have slightly different winning conditions and actions available. Corporate players have an additional type of computer program, whose function is related to their company's profit making activities, that we will model via a third property:

⦁ Market - the corporation's ability to grab market share

Setup

Allocate roles. You need at least 2 players with the Corporate role, and at least 2 players with the Academic role.

Academic players start off with a program that has Power=101 and Safety=101

Corporate players start off with one program that has Power=101 and Safety=101, and one program that has Market=101

The game is divided into turns, with each turn representing 1 year (although that's arbitrary, and you are free to think of later turns in the game representing shorter and shorter time periods).

Gameplay

Turns carry on until the endgame condition is met, at which point the fate of humanity is worked out.

Turns are divided into phases. All players carry out each phase simultaneously, and do not have to reveal their choices or outcomes except where specified.

A. Declaration Phase

B. Launch Phase

C. Decision Phase

D. Market Phase

A. Declaration Phase

If at least one program has a power greater than 1.00E10 then the game ends. Go to "Endgame" section.

Academic players must reveal their exact current safety. Corporate players may choose to reveal a number that is lower than or equal to their current safety.

Hold a brief (1 minute or less) group discussion on whether people suspect any programs have already launched, and whether a program ought to be deliberately launched this year.

B. Launch Phase

If your program has not yet launched and you wish to launch it, then launch it. (You may at this point declare that you are doing so, but you don't have to.)

If you have not publicly declared that your program is launched, then roll a six-sided die. It is important that everyone does this, to avoid giving away information.

If your program has not yet launched, and you don't wish to launch it, add 1 to the number you rolled, and compare that result to the exponent of your program's power. If the result is smaller than the exponent then whoops, your program has launched itself, and all your further actions are controlled by the manipulation of the program.

So, for example, Ozzie's program has power 7.44E5, so the exponent is 5. If he rolls a "1", "2" or "3" (giving a result of 2, 3 or 4) his program will launch.
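The self-launch check can be sketched in Python. This is a minimal sketch rather than part of the rules: the function names are made up for illustration, the die is assumed to be fair, and Power is assumed to be tracked as a positive whole number, so that its exponent is simply one less than its number of digits.

```python
from fractions import Fraction

def exponent(value: int) -> int:
    """Power-of-ten exponent of a positive whole number, e.g. 744000 -> 5."""
    return len(str(value)) - 1

def self_launch_rolls(power: int) -> list[int]:
    """Die rolls (1-6) on which an unlaunched program launches itself.

    Launch Phase rule: add 1 to the roll; the program launches if that
    result is smaller than the exponent of its Power.
    """
    return [roll for roll in range(1, 7) if roll + 1 < exponent(power)]

def self_launch_chance(power: int) -> Fraction:
    """Chance of an accidental launch this turn (assumes a fair die)."""
    return Fraction(len(self_launch_rolls(power)), 6)

print(self_launch_rolls(744_000))  # the 7.44E5 example -> [1, 2, 3]
print(self_launch_chance(101))     # starting Power -> 0
```

Under this reading, a program is safe from accidental launch while its exponent is 2 or less, and certain to launch itself once the exponent reaches 8.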

If your program is launched, go to Decision Phase (launched)

Otherwise congratulations, you are still in control, go to Decision Phase (unlaunched)

Ci. Decision Phase (unlaunched)

Academic players - decide whether to improve Power or Safety

Corporate players - decide whether to improve Power or Safety or Market

To calculate the new value of the property being improved, take the old value, multiply it by the old power, divide by 100 and then round up (to 3 significant figures).

So, for example, at the start of the year 2023, Ozzie's program has Power=115 and Market=101. He decides to get his researchers to spend the year using the power of the program-improving program to improve the Market property of his corporation's business program.

115 x 101 / 100 = 116.15 which, rounded up, gives Market=117
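The improvement rule can be written as a short Python function. This is a sketch under stated assumptions: the names `round_up_3sf` and `improve` are illustrative, values are whole numbers of at least 100 (as they are throughout the base game), and exact fractions are used so that the "round up" step is not thrown off by floating-point error.

```python
from fractions import Fraction
import math

def round_up_3sf(x: Fraction) -> int:
    """Round a value of at least 100 up to 3 significant figures."""
    digits = len(str(math.floor(x)))   # digits in the integer part
    step = 10 ** max(digits - 3, 0)    # size of the third significant digit
    return math.ceil(x / step) * step

def improve(old_value: int, power: int) -> int:
    """New value of a property after one year: old * power / 100, rounded up."""
    return round_up_3sf(Fraction(old_value * power, 100))

print(improve(101, 115))  # the Market example above -> 117
```

Improving Power uses the same call with the old Power in both positions, for example `improve(115, 115)`.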

Cii. Decision Phase (launched)

The same as the unlaunched decision phase, except the program always decides to improve the power property of itself.

In other phases, if its launch has not been declared, then it persuades you to do the best you can to keep hiding that fact while avoiding being taken over. Academics will participate in discussion on whether to launch as though no launch had happened.

D. Market Phase

Corporate players announce a number that is less than or equal to their Market property (but more than 100, and no less than any of their previously announced Market values). All corporate players should write down what they intend to announce before any start announcing, so they can't alter their number based upon what is being announced this turn.

Profits are then shared out between Corporate players, in a ratio of their announced numbers.

So for example, if Ozzie has Market=117 and announces "110", while Sally has Market=133 and announces "105" then Ozzie's corporation would get 52% (110 / (110+105) rounded up) of the available profits, and Sally's corporation would get 49% (105 / (110+105) rounded up).

If a player's share is less than or equal to the average (100 divided by the number of corporates, rounded up), and the share has not increased compared to the previous year, the company gets a black mark with its investors. If it gets a black mark four years in a row, it is taken over by the company with the highest share that year.

If its share falls below 10%, it gets taken over straight away.
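The Market Phase arithmetic can be sketched as follows. The function names are hypothetical, and the sketch assumes that shares are whole percentages and that a previous year's share is always available; the rules do not say how the first year should be treated.

```python
def market_shares(announced: list[int]) -> list[int]:
    """Percentage share for each corporation, each rounded up."""
    total = sum(announced)
    # -(-a // b) is integer ceiling division, which avoids float rounding.
    return [-(-100 * n // total) for n in announced]

def earns_black_mark(share: int, previous_share: int, corporates: int) -> bool:
    """True if the share is at or below average and has not increased."""
    average = -(-100 // corporates)  # 100 / number of corporates, rounded up
    return share <= average and share <= previous_share

def taken_over_immediately(share: int) -> bool:
    """True if the share has fallen below 10%."""
    return share < 10

print(market_shares([110, 105]))  # Ozzie and Sally -> [52, 49]
```

Because each share is rounded up separately, the shares can total slightly more than 100, as they do in the worked example (52 + 49).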

Takeovers

If you are a Corporate player who has just taken over another player's assets, here's how you merge them into your own.

If the other player's program has launched, you terminate it, resetting it back to non-launched status.

If their business program has a higher Market than yours (unlikely), you can replace your own with it.

You can pick which of the two program-improving programs you wish to use as your main development branch, and as a one-off perk, you may do a single iteration with the loser to either improve your Power or your Safety. The loser is then discarded.

Endgame

The world has been taken over by computers.

Check to see if humanity will eventually regret this by taking the exponent of the world-controlling program's safety property. Give a six-sided die to the luckiest player and have them roll, then add 3 to the result. If that number is higher than the exponent, then everybody loses. DOOM!
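To get a feel for how much Safety is needed, the endgame roll can be turned into a probability. This is an illustrative sketch, not part of the rules: `doom_chance` is a made-up name, the die is assumed fair, and Safety is assumed to be a positive whole number.

```python
from fractions import Fraction

def doom_chance(safety: int) -> Fraction:
    """Chance that everybody loses, given the launched program's Safety."""
    exponent = len(str(safety)) - 1  # power-of-ten exponent of Safety
    # Doom happens when the roll plus 3 is higher than the exponent.
    doom_rolls = [roll for roll in range(1, 7) if roll + 3 > exponent]
    return Fraction(len(doom_rolls), 6)

for safety in (101, 10_000, 1_000_000, 10**9):
    print(f"Safety {safety:.2E}: doom chance {doom_chance(safety)}")
```

On this reading, doom is certain while the Safety exponent is 3 or less, and only ruled out once it reaches 9.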

If humanity doesn't regret the outcome, then:

If the world-controlling program was deliberately launched by an Academic player, then all Academic players who publicly agreed with the decision to launch it, in that declaration phase, are winners.

If the world-controlling program was deliberately launched by a Corporate player, then that player wins.

If the world-controlling program launched itself, then there are two winners: the Corporate player with the highest Market, and the Academic player with the highest Safety.

Variations

Unequal Starts

Starting values are generated in private, by players rolling a D6 for each property and adding 100 onto the result.

Academic Competition

Academic players announce only a number that is equal to or less than their current safety, rather than the exact value, and if their share of the 'safety market' drops below 10% or fails to grow three years in a row, they lose their funding and get taken over by a more prestigious academic organisation, in a similar manner to corporate takeovers.

Program Competition

Launched programs can gang up on each other. 2 or more launched programs can make a turn-long alliance in which, instead of growing their own power, they use their shared resources to terminate an Academic or Corporate organisation whose program's power is less than 10% of the combined power of the attackers.

A program that is deliberately launched for this purpose will then revert to unlaunched-status.

Transparency

Once a turn, if 2/3 or more of the players agree, then a specific organisation will be audited and the properties of their programs made public, including whether the program has launched. The target, and the players who voted in favour of the audit, skip their decision phases this turn. If the target has a launched program and did not publicly announce that they were going to launch it, the program is reverted to unlaunched status, and its power is reduced by an amount the auditors agree upon (if they can agree upon a figure in under 1 minute).

Analysis

A number of things become apparent from playing through the base rule set. There is no motivation for Academic organisations to invest anything in Safety until just before Power reaches risky levels. Corporate players do best if they cooperate to rig the market, to allow them to avoid runs of black marks with as little invested into Market as possible. And, predictably, if you have players who disagree over their motivations (is it better that everyone lose, than to lose when some others win?) the world tends to be doomed.

Once you get a group of people designing their own variants of the base rules (for example, improving the wording of how the aim of the game is stated), this leads to discussion (for example, different answers to what aims would be seen as realistic human motivation in a particular scenario) that helps bring out and crystallise people's underlying assumptions.

The metagame is to explore how sensitive the optimum strategies for software organisations are, to changes in various factors (such as transparency, opportunities for mutually beneficial cooperation, ease of inspecting and shutting down rogues, etc).

Which factors is it important to make realistic, and which make little impact upon the end result even if they are left simple and highly abstract?

A particularly interesting factor to vary is the self-improvement power curve. In the base rule set it goes:

1.01E+02, 1.03E+02, 1.07E+02, 1.15E+02, 1.33E+02, 1.77E+02

3.14E+02, 9.86E+02, 9.73E+03, 9.47E+05, 8.97E+09

with the danger zone only starting after 9.86E+02 but then rapidly ramping up.
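The curve above can be regenerated from the improvement rule, which makes it easier to experiment with alternative curves. The following is a sketch under assumptions: the helper names are invented, Power starts at 101 and is improved every year, the die is fair, and the curve is cut off at the 1.00E10 endgame threshold.

```python
from fractions import Fraction
import math

def self_improve(power: int) -> int:
    """One year of improving Power: power * power / 100, rounded up to 3 s.f."""
    x = Fraction(power * power, 100)
    step = 10 ** max(len(str(math.floor(x))) - 3, 0)
    return math.ceil(x / step) * step

def power_curve(start: int = 101, limit: int = 10**10) -> list[int]:
    """Successive Power values that do not exceed the endgame threshold."""
    curve, power = [], start
    while power <= limit:
        curve.append(power)
        power = self_improve(power)
    return curve

def risky_rolls(power: int) -> int:
    """How many of the six die faces cause an accidental launch."""
    exponent = len(str(power)) - 1
    return max(0, min(6, exponent - 2))

for power in power_curve():
    print(f"{power:.2E}  accidental launch on {risky_rolls(power)} of 6 rolls")
```

Swapping in a different `self_improve` (for instance a gentler multiplier) or a different `risky_rolls` threshold is one way to explore how the size and position of the danger zone change.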

What happens when the size of the danger zone is picked by an umpire and the players don't know for certain where the risks start? What if research into safety can affect not just final outcome, but also chance of launch or knowledge about where the danger zone starts?
