GödelPilled — LessWrong

Everett Insurance as a Misalignment Backup

(Posting this here because I've been lurking a long time and just decided to create an account. Not sure if this idea is well structured enough to warrent a high level post and I need to accrue karma before I can post anyways.)

Premise:

There exist some actions, which have such low cost, as well as high payoff in the case that the Everett interpretation happens to be true, that their expected value makes them worth doing even if P(everett=true) is very low. (personal prior: P(everett=true) = %5)

Proposal:

The proposal behind Everett Insurance is straightforward: use Quantum Random Number Generators (QRNGs) to introduce quantum random seeds into large AI systems. QRNGs are devices that use quantum processes (such as photon emission or tunneling) to generate random numbers that are truly unpredictable and irreproducible. These random numbers can be used as seeds during training and\or inference passes in large AI systems. By doing so, if the Everett interpretation happens to be true, we would create a broad swath of possible universes in which each and every AI inference (or training run) will be slightly different.

The idea is based on the Everett interpretation of quantum mechanics (also known as the many-worlds interpretation), which holds that there are many worlds that exist in parallel at the same space and time as our own. According to this interpretation, every time a quantum interaction occurs with different possible outcomes (such as measuring the spin of an electron), all outcomes are obtained, each in a different newly created world. For example, if one flips a quantum coin (a device that uses a quantum process to generate a random bit), then there will be two worlds: one where the coin lands heads and one where it lands tails.

This alignment method is not something that should be used as a primary alignment strategy, given that it only has any value on the off chance the Everett interpretation is correct. I only bring it up because its low cost and effort of implementation (low alignment tax) leads it to have high expected value, and it is something that could easily be implemented now.

Expected value:

The Everett interpretation of quantum mechanics has been controversial since its inception; however this method of misalignment backup has a high expected value even assuming a low chance of the Everett interpretation being correct. Implementing QRNGs into existing AI systems would be relatively easy, as it only requires adding a few lines of code. The potential benefit is enormous, as it could reduce the existential risk that misaligned AI systems wipe out humanity accross every future. Therefore, even if we assign a very low probability to the Everett interpretation of quantum mechanics being true, the expected value of Everett Insurance is still high enough to potentially justify its adoption now.

Implementation:

One possible way to implement Everett Insurance now is by adding a random seed, hashed from a QRNG api, to the pytorch dropout function. The dropout function randomly zeroes some of the elements of an input tensor with a given probability during training. By using a QRNG api to generate a random seed for each dropout layer, we would introduce quantum randomness into the parameters and outputs.

This method has some limitations: it only works for neural networks that use dropout layers (which I believe currently includes all LLMs) and it only introduces quantum randomness during training, not inference. In the future, these limitations could be overcome by developing more efficient and accessible ways to integrate QRNGs into machine learning libraries. For example, one could use QRNGs to generate random seeds for inference passes as well as training passes; one could use QRNGs to generate random seeds for other types of layers or functions besides dropout; and one could use QRNGs that are embedded into hardware devices or chips instead of relying on external APIs. As QRNGs become cheaper due to higher demand, further integration of random seeds could be implemented across other parts of the AI development and inference process.

Drawbacks:

The biggest potential negative externality that I can think of, is if people who believe the Everett interpretation is true, reduce their effort on other alignment strategies. This strategy only has any value at all if the Everett interpretation is true and we have no way of testing that hypothesis currently. Therefore, we should not reduce our effort in any other alignment strategies and should probably implement this at a low level (in machine learning library functions) in order to reduce awareness and thereby prevent reduction of effort on other alignment strategies.

The second drawback is cost. The existing QRNGs I found had API costs on the order of $0.005/request, and I didn't see any stats on the generation rate of these quantum random numbers (i.e. will two API requests at the same Unix clock time return the same QRN?). However, the API cost would be negligible relative to other training costs of current large LLMs. Additionally, the manufactured demand would likely lead to reduced per API costs if similar functionality was added to inference steps.

Another potential criticism is that in the case that we actually do live in an Everett universe, the variance from quantum randomness is large enough to already lead to both universes where AI ends up aligned and universes where it ends up misaligned. My response to this is that, in spite of quantum mechanics, computers are largely deterministic (with the exception of rare occurrences like cosmic rays striking transistors) and any variance that does occur is largely corrected for at the scale of floating point operations in GPUs. Additionally, we don't understand the brain enough yet to know if quantum randomness (through potential mechanisms like Brownian motion inside synapses) is significant enough to lead to timing differences in neurons firing in biological neural networks. Therefore, we don't yet know if human behavior is largely deterministic in an Everett universe. Due to these unknows, it's possible that even in an Everett universe, we could find ourselves at a point in time where all possible futures lead to misalignment without the addition of artificial variance through a method like this.

Finally, one of the larger problems with any potential implementations, is that the variance due to the random seed needs to actually be large enough that it leads to non-deterministic variance in output and behavior. This could have the potential drawback that additional variance in AI behavior leads to reduced trust and reliability of these systems. However, this cost could be greatly reduced by using a well-designed hash of the QRN that leads to deterministic behavior for the vast majority of quantum random seeds, but generates larger amounts of variance for some small fraction of possible seeds.

[EDIT] replaced "in all" with "across every" to try and clarify a sentence

[EDIT] Adding an additional criticism from a user on reddit:

"This idea seems to be premised on some misunderstandings.

The Born rule gives a measure over the possible universes. The usual way of reconciling utilitarianism with the many-worlds interpretation is simply to take the expectation value of utility over this measure. So our goal then is to avoid AI catastrophe in as many universes as possible. Our goal is not merely to avoid it in some universe.

But suppose you really do just want there to be some world where AI doesn't take over. Then there's no need to do anything, as any world that is not forbidden by e.g. conservation principles will occur with some nonzero probability (or "exist", as a many-worlder would say). Is AI alignment forbidden by fundamental laws of physics? Of course not." -endquote

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments