
The theory will be a lot more useful once actual leakage ratios are estimated. This paper was mathematically specific, because the purpose of it was to establish a few equations to use when estimating the Friendliness ratio and constraints to AI projects. It was written more to build a mathematical foundation for that than it was a simple intro of the ideas to most readers.

Obviously this was meant as more of a research article than a blog post, but we felt like LessWrong was a good place to publish it given the subject.

> It was written more to build a mathematical foundation

That's a pretty simple mathematical foundation for a toy problem.

How about introducing uncertainty into your framework? You will not be dealing with hard numbers, you will be dealing with probability distributions and that makes things considerably more complex.


Ozzie Gooen and Justin Shovelain

## Summary

Friendly artificial intelligence (FAI) researchers have at least two significant challenges. First, they must produce a significant amount of FAI research in a short amount of time. Second, they must do so without producing enough general artificial intelligence (AGI) research to result in the creation of an unfriendly artificial intelligence (UFAI). We estimate the requirements of both of these challenges using two simple models.

Our first model describes a friendliness ratio and a leakage ratio for FAI research projects. These provide limits on the allowable amount of artificial general intelligence (AGI) knowledge produced per unit of FAI knowledge in order for a project to be net beneficial.

Our second model studies a hypothetical FAI venture, which is responsible for ensuring FAI creation. We estimate the total FAI research per year the venture must produce and the leakage ratio that research must stay under. This model demonstrates a trade-off between the speed of FAI research and the proportion of AGI research that can be revealed as part of it. If FAI research takes too long, the acceptable leakage ratio may become so low that it would be nearly impossible to safely produce any new research.

## Introduction

A general artificial intelligence (AGI) is an AI that could perform all the intellectual tasks a human can.[1] When one is created, it may recursively become more intelligent to the point where it is vastly superior to human intelligences.[2] This AGI could be either friendly or unfriendly, where friendliness means it would have values that humans would favor, and unfriendliness means that it would not.[3]

It is likely that if we do not explicitly understand how to make a friendly general artificial intelligence, then by the time we make a general artificial intelligence, it will be unfriendly.[4] It is also likely that we are much further from understanding how to make a friendly artificial intelligence than we are from understanding how to make a general artificial intelligence.[5][6]

Thus, it is important to create more FAI research, but it may also be important to avoid producing much AGI research in the process. If understanding how to make an FAI is 10 times as difficult as understanding how to make an AGI, then an FAI research paper that produces 0.3 equivalent papers' worth of AGI research will probably increase the chances of a UFAI, since the break-even point would be roughly 0.1. Given the close relationship between FAI and AGI research, producing FAI research with a net positive impact may be difficult.

## Model 1. The Friendliness and Leakage Ratios for an FAI Project

### The Friendliness Ratio

Let's imagine that there is a necessary amount of research to build an AGI, $G_{remaining}$. There is also some necessary amount of research to build an FAI, $F_{remaining}$. These two have units of $r_{general}$ (general AI research) and $r_{friendly}$ (friendly AI research), which are not precisely defined but are directly comparable.

Which threshold is higher? According to much of the research in this field, $F_{remaining}$. We need significantly more research to create a friendly AI than an unfriendly one.

#### Figure 1. Example research thresholds for AGI and FAI.

To understand the relationship between these thresholds, we use the following equation.

$f_{global}=\frac{F_{remaining}}{G_{remaining}}$

We call this the friendliness ratio. The friendliness ratio is useful for a high level understanding of world total FAI research requirements and is a heuristic guide for how difficult the problem of differential technological development is.

The friendliness ratio is high when $F_{remaining} \gg G_{remaining}$. For example, if there are 2000 units of remaining research for an FAI and 20 units for an AGI, the friendliness ratio would be 100. If someone published research with 20 units of FAI research but 1 unit of AGI research, their research would have a ratio of only 20/1 = 20, short of the required 100, and would thus make the problem even worse.
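The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function names `friendliness_ratio` and `meets_friendliness_ratio` are hypothetical helpers, not something defined in the paper, and treating a project that exactly equals the global ratio as acceptable is an assumption about the break-even case.

```python
def friendliness_ratio(f_remaining: float, g_remaining: float) -> float:
    """Units of FAI research still needed per unit of AGI research still needed."""
    if g_remaining <= 0:
        raise ValueError("g_remaining must be positive")
    return f_remaining / g_remaining


def meets_friendliness_ratio(f_project: float, g_project: float, f_global: float) -> bool:
    """True if the project's FAI/AGI ratio meets or exceeds the global ratio.

    Assumption: a project that leaks no AGI research at all always passes.
    """
    if g_project == 0:
        return True
    return f_project / g_project >= f_global


# Numbers from the example: 2000 units of FAI research and 20 units of AGI research remain.
f_global = friendliness_ratio(2000, 20)
print(f_global)  # 100.0

# A project producing 20 units of FAI research and 1 unit of AGI research falls short.
print(meets_friendliness_ratio(20, 1, f_global))  # False
```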

### The Leakage Ratio

For specific projects it may be useful to have a measure that focuses directly on the negative outcome.

For this we can use the leakage ratio, which represents the amount of undesired AGI research created per unit of FAI research. It is simply the inverse of the friendliness ratio.

$l_{global}=\frac{G_{remaining}}{F_{remaining}}$

$l_{project}=\frac{G_{project}}{F_{project}}$

In order for a project to be net beneficial,

$l_{project}<l_{global}$

### Estimating if a Project is Net Friendliness Positive

#### Question: How can one estimate if a project is net friendliness-positive?

A naive answer would be to make sure that it falls above the global friendliness ratio or below the global leakage ratio.

Global AI research rates need to fulfill the friendliness ratio in order to produce an FAI. Therefore, if an advance in friendliness research produces FAI research $F_{project}$, but in the process also produces AGI research $G_{project}$, then it would be net friendliness negative if

$\frac{F_{project}}{G_{project}}<f_{global}$

Later research would need to make up for this under-balance.

### AI Research Example

Say that $G_{remaining}=200r_{g}$ and $F_{remaining}=2000r_{f}$, leading to a friendliness ratio of $f_{global}=10$ and a global maximum leakage ratio of $l_{global}=0.1$. In this case, specific research projects could be evaluated to make sure that they meet this threshold. One could imagine an organization deciding what research to do or publish using the following chart.

| | Description | AGI Research $G_{p}$ | FAI Research $F_{p}$ | Friendliness Ratio $f_{p}$ | Leakage Ratio $l_{p}$ |
|---|---|---|---|---|---|
| Project 1 | Rat simulation | $10r_{g}$ | $60r_{f}$ | 6 | 0.17 |
| Project 2 | Math Paper | $2r_{g}$ | $22r_{f}$ | 11 | 0.09 |
| Project 3 | Technical FAI Advocacy | $1r_{g}$ | $14r_{f}$ | 14 | 0.07 |

In this case, only Projects 2 and 3 have a leakage ratio of less than 0.1, meaning that only these would be net beneficial. Even though Project 1 has generated safety research, it would be net negative.
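The same screening can be sketched in Python. The project figures and the thresholds are the ones from the example; the `Project` class and the `net_beneficial` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    g: float  # AGI research produced, in units of r_g
    f: float  # FAI research produced, in units of r_f

    @property
    def friendliness_ratio(self) -> float:
        return self.f / self.g

    @property
    def leakage_ratio(self) -> float:
        return self.g / self.f


# Global thresholds from the example.
G_REMAINING = 200.0
F_REMAINING = 2000.0
L_GLOBAL = G_REMAINING / F_REMAINING  # 0.1

projects = [
    Project("Rat simulation", g=10, f=60),
    Project("Math paper", g=2, f=22),
    Project("Technical FAI advocacy", g=1, f=14),
]


def net_beneficial(project: Project, l_global: float = L_GLOBAL) -> bool:
    """A project is net beneficial when it leaks less than the global leakage ratio."""
    return project.leakage_ratio < l_global


for p in projects:
    print(
        f"{p.name}: f = {p.friendliness_ratio:.0f}, "
        f"l = {p.leakage_ratio:.2f}, net beneficial = {net_beneficial(p)}"
    )
```

Only the second and third projects pass, matching the chart.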

### Model 1 Assumptions:

1. There exists some threshold $G_{remaining}$ of research necessary to generate an unfriendly artificial intelligence.

2. There exists some threshold $F_{remaining}$ of research necessary to generate a friendly artificial intelligence.

3. If $G_{remaining}$ is reached before $F_{remaining}$, a UFAI will be created. If after, an FAI will be created.

## Model 2. AGI Leakage Limits of an FAI Venture

Question: How can an FAI venture ensure the creation of an FAI?

Let's imagine a group that plans to ensure that an FAI is created. We call this an FAI Venture.

This venture would be constrained by time. AGI research is being created internationally and, if left alone, will likely create a UFAI. We can consider research done outside of the venture as external research and research within the venture as internal research. If internal research is done too slowly, or if it leaks too much AGI research, an unfriendly artificial intelligence could be created before $F_{remaining}$ is met.

We thus split up friendly and unfriendly research creation into two categories, external and internal research. Then we consider the derivative of each with respect to time. For simplicity, we assume the unit of time is years.

$G'_{i}$ = AGI research produced internally per year

$F'_{i}$ = FAI research produced internally per year

$G'_{e}$ = AGI research produced externally per year

$F'_{e}$ = FAI research produced externally per year

There exist times, $t_{f}$ and $t_{g}$, at which the friendly and general remaining thresholds are met.

$t_{f}$ = Year in which $F_{remaining}$ is met

$t_{g}$ = Year in which $G_{remaining}$ is met

These times can be estimated as follows:

$\frac{G_{remaining}}{G'_{i}+G'_{e}}=t_{g}$

$\frac{F_{remaining}}{F'_{i}+F'_{e}}=t_{f}$
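As a rough illustration of these two estimates, the sketch below computes both times in Python and compares them. The helper name `years_to_threshold` and every numeric input are hypothetical, chosen only to show the calculation; none of them are estimates from the paper.

```python
def years_to_threshold(remaining: float, internal_rate: float, external_rate: float) -> float:
    """Years until a research threshold is met, assuming constant yearly rates."""
    total_rate = internal_rate + external_rate
    if total_rate <= 0:
        raise ValueError("combined research rate must be positive")
    return remaining / total_rate


# Hypothetical inputs, in research units and research units per year.
g_remaining, f_remaining = 200.0, 2000.0
g_internal, g_external = 0.0, 10.0   # the venture leaks no AGI research
f_internal, f_external = 120.0, 0.0  # all FAI research comes from the venture

t_g = years_to_threshold(g_remaining, g_internal, g_external)
t_f = years_to_threshold(f_remaining, f_internal, f_external)

print(f"t_g = {t_g:.1f} years, t_f = {t_f:.1f} years")
print("FAI threshold reached first:", t_f < t_g)
```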

The venture wants to make sure that $t_{f}<t_{g}$ so that the eventual AI is friendly (assumption 3). With this, we find that:

$F'_{i}>C_{0}+C_{1}\cdot G'_{i}$

Where the values of $C_{0}$ and $C_{1}$ both include the friendliness ratio $f_{global}=\frac{F_{remaining}}{G_{remaining}}$.

$C_{0}=f_{global}G'_{e}-F'_{e}$

$C_{1}=f_{global}$

This implies a linear relationship between $F'_{i}$ and $G'_{i}$. The more FAI research the FAI venture can produce, the more AGI research it is allowed to leak.

This gives us a clean way to go from a $G'_{i}$ value the venture could expect to the $F'_{i}$ it would need to be successful.

The $C_{0}$ value describes the absolute minimum amount of FAI research necessary in order to have a chance at a successful outcome. While the resulting acceptable leakage ratio at this point would be impossible to meet, the baseline is easy to calculate. Assuming that $F'_{e}\ll f_{global}G'_{e}$, we can estimate that

$C_{0}\approx f_{global}G'_{e}$

If we instead wanted to calculate $G'_{i}$ from $F'_{i}$, we could use the following equation, obtained by solving the inequality above for $G'_{i}$. This may be more direct to the intentions of a venture (finding the acceptable amount of AGI leakage after estimating FAI productivity).

$G'_{i}<-\left(G'_{e}-\frac{F'_{e}}{f_{global}}\right)+\left(\frac{1}{f_{global}}\right)\cdot F'_{i}$
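To make the rearrangement concrete, here is a minimal Python sketch that solves $F'_{i}>C_{0}+C_{1}\cdot G'_{i}$ for the largest permissible $G'_{i}$. The function name `max_agi_leak` and the input values are hypothetical and purely illustrative.

```python
def max_agi_leak(f_internal: float, f_global: float, g_external: float,
                 f_external: float = 0.0) -> float:
    """Upper bound on yearly internal AGI research G'_i for a given F'_i.

    Rearranged from F'_i > C_0 + C_1 * G'_i with
    C_0 = f_global * G'_e - F'_e and C_1 = f_global.
    A non-positive result means no amount of leakage is acceptable.
    """
    c0 = f_global * g_external - f_external
    return (f_internal - c0) / f_global


# Hypothetical inputs: f_global = 10, G'_e = 10 r_g/year, F'_e = 0.
bound = max_agi_leak(f_internal=200.0, f_global=10.0, g_external=10.0)
print(bound)          # 10.0 r_g/year
print(bound / 200.0)  # 0.05, the leakage ratio implied at that bound
```

With these illustrative numbers, a venture producing 200 units of FAI research per year could tolerate just under 10 units of leaked AGI research per year.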

### Model 2 Example

For example, let's imagine that $f_{global}=10\frac{r_{f}}{r_{g}}$ and $G'_{e}=10\frac{r_{g}}{year}$, with negligible external FAI research. In this case, $C_{0}=100\frac{r_{f}}{year}$. This means that even if the venture could make sure to leak exactly $0\frac{r_{g}}{year}$, it would need to average an FAI research rate of 10 times the entire world's output of AGI research. With a nonzero leakage ratio, the requirement grows as $100/(1-10\cdot l_{project})$. If the venture expects a leakage ratio of 0.05, it would need to double its research output to $F'_{i}=200\frac{r_{f}}{year}$, or 20 times global AGI output.
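The growth in the required rate follows from substituting $G'_{i}=l_{project}\cdot F'_{i}$ into $F'_{i}>C_{0}+f_{global}\cdot G'_{i}$, which gives $F'_{i}>C_{0}/(1-f_{global}\cdot l_{project})$. The Python sketch below reproduces the two figures from the example. The function name `required_fai_rate` is hypothetical, and returning 0 when $C_{0}$ is not positive is an assumption about how to treat that edge case.

```python
def required_fai_rate(f_global: float, g_external: float,
                      f_external: float = 0.0, leakage: float = 0.0) -> float:
    """Lower bound on yearly internal FAI research F'_i for a given leakage ratio.

    Uses F'_i > C_0 / (1 - f_global * leakage), with C_0 = f_global * G'_e - F'_e.
    """
    c0 = f_global * g_external - f_external
    if c0 <= 0:
        # Assumption: external FAI research already keeps pace, so no internal minimum.
        return 0.0
    denominator = 1.0 - f_global * leakage
    if denominator <= 0:
        # At or above a leakage ratio of 1 / f_global, no research rate is enough.
        raise ValueError("leakage ratio must be below 1 / f_global")
    return c0 / denominator


# Values from the example: f_global = 10, G'_e = 10 r_g/year, F'_e negligible.
print(round(required_fai_rate(10.0, 10.0), 6))                # 100.0
print(round(required_fai_rate(10.0, 10.0, leakage=0.05), 6))  # 200.0
```

As the leakage ratio approaches $1/f_{global}=0.1$, the denominator approaches zero and the required rate grows without bound, which is why the sketch rejects leakage ratios at or above that value.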

Figure 2. $F'_{i}$ per unit of maximum permissible $G'_{i}$

#### What to do?

The numbers in the example above are a bit depressing. There is so much global AI research that it seems difficult to imagine the world averaging an even higher rate of FAI research, which would be necessary if the friendliness ratio is greater than 1.

There are some upsides. First, much hard AI work is done privately in technology companies without being published, limiting $G'_{i}$. Second, the numbers of $r_{g}$ and $r_{f}$ don't perfectly correlate with the difficulty of reaching them. It may be that we have diminishing marginal returns at our current levels of $r_{g}$, so similar levels of $r_{f}$ will be easier to reach.

It's possible that $F_{remaining}$ may be surprisingly low or that $G_{remaining}$ may be surprisingly high.

Projects with high leakage ratios don't have to be completely avoided or hidden. The $G'_{i}$ value is specifically for research that will be in the hands of the group that eventually creates an AGI, so it would make sense that FAI research organizations could share high-risk information with each other as long as it doesn't leak externally. The FAI venture mentioned above could be viewed as a collection of organizations rather than one specific one. It may even be difficult for AGI research implications to move externally, if the FAI academic literature is significantly separated from the AGI academic literature. This logic provides a heuristic guide to choosing research projects, deciding whether to publish research already done, and managing concentrations of information.

### Model 2 Assumptions:

1-3. The same 3 assumptions as the previous model.

4. The rates of research creation will be fairly constant.

5. External and internal rates of research do not influence each other.

## Conclusion

The friendliness ratio provides a high-level understanding of the amount of global FAI research per unit AGI research needed to create an FAI. The leakage ratio is the inverse of the friendliness ratio applied to a specific FAI project, to specify if that specific project is net friendliness positive. These can be used to understand the challenge for AGI research and tell if a particular project is net beneficial or net harmful.

To understand the challenges facing an FAI Venture, we found the simple equation

$F'_{i}>C_{0}+C_{1}\cdot G'_{i}$

where

$C_{0}=f_{global}G'_{e}-F'_{e}$

$C_{1}=f_{global}$

This paper focused on establishing the models above rather than estimating their input values. If the models are considered useful, there should be more research to estimate these numbers. The models could also be improved to incorporate uncertainty, the growing returns of research, and other important limitations that we haven't considered. Finally, the friendliness ratio concept naturally generalizes to other technology-induced existential risks.

## Appendix

a. Math manipulation for Model 2

$\frac{G_{remaining}}{G'_{i}+G'_{e}}=t_{g}$

$\frac{F_{remaining}}{F'_{i}+F'_{e}}=t_{f}$

$t_{f}<t_{g}$

$\frac{F_{remaining}}{F'_{i}+F'_{e}}<\frac{G_{remaining}}{G'_{i}+G'_{e}}$

$G'_{i}<\left(\frac{G_{remaining}\cdot F'_{e}}{F_{remaining}}-G'_{e}\right)+\left(\frac{G_{remaining}}{F_{remaining}}F'_{i}\right)$

$F'_{i}>\left(\frac{F_{remaining}\cdot G'_{e}}{G_{remaining}}-F'_{e}\right)+\left(\frac{F_{remaining}\cdot G'_{i}}{G_{remaining}}\right)$

This last equation can be written as

$F'_{i}>C_{0}+C_{1}\cdot G'_{i}$

Where

$C_{0}=\frac{F_{remaining}\cdot G'_{e}}{G_{remaining}}-F'_{e}$

$C_{1}=\frac{F_{remaining}}{G_{remaining}}$

Recalling the friendliness ratio, $f_{global}=\frac{F_{remaining}}{G_{remaining}}$, we can simplify these constructs further.

$C_{0}=f_{global}G'_{e}-F'_{e}$

$C_{1}=f_{global}$

## References

[1] What is AGI?, Luke Muehlhauser, 2013, https://intelligence.org/2013/08/11/what-is-agi/

[2] Intelligence Explosion FAQ, MIRI, https://intelligence.org/ie-faq/

[3] Artificial Intelligence as a Positive and Negative Factor in Global Risk, Yudkowsky, in Global Catastrophic Risks, 2008

[4] Aligning Superintelligence with Human Interests: A Technical Research Agenda, Nate Soares and Benja Fallenstein, MIRI, https://intelligence.org/files/TechnicalAgenda.pdf

[5] Superintelligence, Nick Bostrom, 2014

[6] The Challenge of Friendly AI, Yudkowsky, 2007, https://www.youtube.com/watch?v=nkB1e-JCgmY&noredirect=1