Critique my Model: The EV of AGI to Selfish Individuals

ozziegooen

[Edit: Changes suggested in the comments make this model & its takeaways somewhat outdated (this was one desired outcome of posting it here!). Be sure to read the comments.]

I recently spent a while trying to explain my views on the EV of AGI for selfish individuals. I first attempted to write a more conventional blog post, but after a lot of thinking I moved to a Guesstimate model instead, and after more thinking I realized that my initial views were quite incorrect. I've decided to simply present my model, along with several points I find interesting about it. I'm curious to hear what people here think of it, and which points are the most objectionable.

Model

Video walkthrough

Model Summary:

This model estimates the expected value of AGI outcomes to specific individuals with completely selfish values. I.e., if all you care about is your future happiness, how many QALYs would you expect to get from scenarios where AGI occurs? For example, a simpler model could say that there's a 1% chance that an AGI happens, but if it does, you get 1000 QALYs from life extension, so the EV of AGI would be ~10 QALYs.
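The simpler model in that example is a single branch: probability times value. A trivial sketch of the arithmetic (the function name is hypothetical):

```python
def expected_value(p_agi: float, qalys_if_agi: float) -> float:
    """EV of AGI to a selfish individual: P(AGI) times the QALYs received if it happens."""
    if not 0.0 <= p_agi <= 1.0:
        raise ValueError("p_agi must be a probability between 0 and 1")
    return p_agi * qalys_if_agi

# The example from the text: 1% chance of AGI, 1000 QALYs from life extension.
print(f"{expected_value(0.01, 1000):.0f} QALYs")  # prints "10 QALYs"
```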

The model only calculates the EV for individuals in situations where an AGI singleton happens; it doesn't compare this to counterfactuals where an AGI does not happen or is negligible in importance.

The conclusion of this specific variant of the model is a 90% confidence interval of around -300 QALYs to 1600 QALYs. I think in general my confidence bounds should have been wider, but I have found the exercise quite useful in refining my thinking on this issue.

Thoughts & Updates:

1. I was surprised at how big a deal hyperbolic discounting was. It turned out to be one of the most impactful variables by far. Originally I expected the resulting EV to be gigantic, but the discount rate really changed the equation. In this model the discount rate would have to be less than 10^-13 to have less than a 20% effect on the resulting EV. This means that even if you have an incredibly low discount rate (10^-8 seems very low to me), you still need to account for it.

2. One reason I made this model was to figure out whether purely selfish people would have rational reasons to work on AI safety. At this point I'm not very sure. If the EV of AI going well is +1k QALYs to a selfish person, then the EV of spending a year of hard work on making it turn out well is much less than that, since one person's year of work only shifts the odds of a good outcome slightly. Honestly, the model as it stands suggests that selfish individuals aware of AI risk may be best off not worrying about it. Of course, this applies to the hypothetical selfish individual; actual people typically place some value on other humans, especially ones close to them, and this model doesn't take that into account.

3. My prior, even having read a fair bit of S-risk literature, is still that the probabilities and intensities of negative outcomes are considerably smaller than those of positive outcomes. If this is not the case, that would be incredibly significant, perhaps making it quite risky to be alive today. Please write in the comments if you think these numbers are wrong and what your estimates would be.

4. I think the main variable here that deserves more analysis is the "Chances of Happening" for "Positive Outcomes". This cell is conditional on an AGI being developed that decides to preserve humans, on a given individual living long enough to witness this, and on the controllers of the AGI deciding to have it sustain that individual indefinitely. Each of these conditions could be broken out in more detail.

5. I personally find selfishness to be somewhat philosophically incoherent, so it's difficult to say exactly how many QALYs per year one selfish person could hypothetically experience at most.

6. If the total expected QALYs from an AGI are greater than what one would otherwise expect over one's life (say, over 50), that would suggest that the majority of the expected value of one's entire life comes from things after the AGI. I find this useful evidence for convincing my S1 to complain less about short-term sacrifices. Like, "Don't worry about the chocolate bar; if I can just make it to the AGI, there's far more good stuff later on."
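Item 1 doesn't spell out the discount formula the model uses. As a minimal sketch, assuming the standard hyperbolic form D(t) = 1/(1 + k·t) with t in years and a constant QALY rate q, the discounted total over a horizon T has the closed form q·ln(1 + kT)/k. The function names and the one-trillion-year horizon are hypothetical, not cells from the Guesstimate model:

```python
import math

def discounted_qalys(qalys_per_year: float, horizon_years: float, k: float) -> float:
    """Total QALYs over [0, T] under hyperbolic discounting D(t) = 1 / (1 + k*t).

    Uses the closed-form integral q * ln(1 + k*T) / k; with k = 0 this reduces
    to the undiscounted total q * T.
    """
    if k == 0:
        return qalys_per_year * horizon_years
    # log1p keeps precision when k*T is tiny
    return qalys_per_year * math.log1p(k * horizon_years) / k

def discount_effect(horizon_years: float, k: float) -> float:
    """Fraction of the undiscounted total that discounting removes."""
    undiscounted = discounted_qalys(1.0, horizon_years, 0.0)
    return 1.0 - discounted_qalys(1.0, horizon_years, k) / undiscounted

# Hypothetical horizon: one trillion years at 1 QALY per year.
T = 1e12
for k in (1e-8, 1e-10, 1e-13):
    print(f"k={k:g}: discounting removes {discount_effect(T, k):.1%} of the total")
```

Under these assumptions k = 10^-8 removes almost all of the undiscounted total while k = 10^-13 removes only a few percent, which matches the direction of item 1; the exact thresholds depend on the horizon the model actually uses.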
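The reasoning in item 2 is that a year of work is worth the change in the probability of a good outcome it buys, times the value of that outcome. A hedged sketch: the +1k QALY figure comes from the text, but the one-in-a-million probability shift and the 1 QALY cost of a hard year are purely hypothetical:

```python
def ev_of_work(delta_p_good: float, qalys_if_good: float) -> float:
    """Expected QALYs gained from work that raises P(good outcome) by delta_p_good."""
    return delta_p_good * qalys_if_good

QALYS_IF_GOOD = 1_000    # the "+1k QALYs" figure from item 2
DELTA_P_PER_YEAR = 1e-6  # hypothetical: one person-year shifts P(good) by one in a million
COST_PER_YEAR = 1.0      # hypothetical: a year of hard work costs ~1 QALY of happiness

gain = ev_of_work(DELTA_P_PER_YEAR, QALYS_IF_GOOD)
# Probability shift one year of work would need to buy to pay for itself.
break_even_delta_p = COST_PER_YEAR / QALYS_IF_GOOD

print(f"expected gain from one year: {gain:.4f} QALYs")
print(f"break-even shift in P(good): {break_even_delta_p:.1%}")
```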
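Item 4's decomposition is a chain of conditional probabilities: the "Chances of Happening" cell would be the product of each condition's probability given the ones before it. A sketch with hypothetical placeholder numbers, not the model's values:

```python
from math import prod

# Hypothetical placeholder probabilities, NOT the model's values.
# Each entry is conditional on every condition listed before it.
conditions = {
    "an AGI is developed that decides to preserve humans": 0.3,
    "the individual lives long enough to witness this": 0.5,
    "the AGI's controllers sustain that individual indefinitely": 0.2,
}

# The "Chances of Happening" cell would then be the product of the chain.
p_positive = prod(conditions.values())

for name, p in conditions.items():
    print(f"{p:>4.0%}  {name}")
print(f"P(positive outcome) = {p_positive:.1%}")
```

Breaking the cell out this way makes it easier to see which link in the chain a disagreement is actually about.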

Some thoughts on modeling in general:

1. I really like the idea of people presenting more probabilistic models as an alternative to regular blog posts. I'm not sure what the correct format for presenting such a model is. The goal would be for it to be as simple as possible for people to understand, while also being reasonable and interesting.

2. When I think about writing posts, I often imagine the optimal post to be one that argues well for one novel side of a debate. However, a reasonable model should really aim to optimize for accuracy, not argumentation. If anything, I could be convinced that presenting a strong opinion on one side of a debate is generally a counter-signal of informational quality.

3. My sense is that well-reasoned models should typically produce results that agree with our deep intuitions, rather than with our naive assumptions about what a model would say.

4. I really like the ability of models (particularly probabilistic ones) to simplify discussion. I'm quite optimistic about their use, and would encourage more people to try doing more analysis using similar models.

5. I apologize if it seems like I'm promoting Guesstimate, my own product. I do think it's the best tool for this (I basically made it just for things like this), and at this point I'm basically running it at a loss. I don't expect it to ever really be profitable, but I'm quite happy with what it's able to do for many people in the rationalist/EA communities. Guesstimate is free (for public use) and open source. Private accounts are very cheap, and if that cost is too much for you, let me know and I'll give you free access.

Future Work (for myself or others)

1. I would really like to see a collection of "common knowledge distributions" that approximate community-accepted beliefs about specific variables. In this case I came up with many of these numbers myself, but would have preferred it if some had been more established. One convenience of this kind of modeling is that important metrics come up again and again in different models, so a collection of generally accepted metrics would make modeling much easier. These values don't even require that much consensus; even if some are debated within a few orders of magnitude, that could easily be enough to be very useful.

2. Some of the most obvious next models of this type would estimate the Expected Value of AGI for groups with other belief systems, utilitarianism or similar being the most obvious example. It could also be interesting to break down the EV of AGI by which group creates it. For instance, I'd like to see a matrix of the EV of safe AGIs created by different actors to different other actors: how would utilitarians view a safe AGI created by the Chinese government, or how would the collective of Western nations view one created by a rogue state?

3. There's a ton of work that could be done on the EV of specific interventions, but this is a whole topic in itself.
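The "common knowledge distributions" in Future Work item 1 could start as a shared table of 90% confidence intervals. A minimal sketch, assuming each interval is modeled as a lognormal (a common choice for positive quantities that span orders of magnitude; this is not necessarily how Guesstimate interprets ranges). The registry name, variable names, and ranges are hypothetical placeholders:

```python
import math
import random
from statistics import NormalDist

# z-score of the 95th percentile: a 90% CI spans +/- this many standard deviations.
Z90 = NormalDist().inv_cdf(0.95)

def lognormal_from_90ci(low: float, high: float) -> tuple[float, float]:
    """Return (mu, sigma) of a lognormal whose 5th/95th percentiles are low/high."""
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma

# Hypothetical shared registry: the ranges are placeholders, not community estimates.
COMMON_KNOWLEDGE = {
    "years_until_agi": (10, 100),
    "max_qalys_per_year": (1, 1_000),
}

def sample(name: str, rng: random.Random) -> float:
    """Draw one Monte Carlo sample of a registered variable."""
    mu, sigma = lognormal_from_90ci(*COMMON_KNOWLEDGE[name])
    return rng.lognormvariate(mu, sigma)

rng = random.Random(0)
draws = sorted(sample("years_until_agi", rng) for _ in range(10_000))
print(f"empirical 5th pct ~ {draws[500]:.1f}, 95th pct ~ {draws[9500]:.1f}")
```

Any model could then pull the same named variable instead of inventing its own range.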

Being 100% certain about temporal discounting seems excessive; I think most people don't endorse pure time preferences. I think other mechanisms are better for thinking about how to trade off future benefits against current costs (e.g. just directly querying the same intuitive procedure that gave you the ad hoc discount rate estimate to get a direct estimate).

For some discussion see Carl on discounting.

Good point.

This actually makes a huge difference; I feel kind of bad for not doubting this factor more in the model.

I'm willing to accept that, under deliberation, there's at least a 1% chance that the "rational selfish agent" would accept a hyperbolic discount rate of 10^-13 or lower. That would make a pretty huge difference.

Here's the resulting forked model:

https://www.getguesstimate.com/models/10594

The total expected value is now 100M QALYs, compared to 420. It's kind of alarming how sensitive the result is to something that seems so minor. I guess that indicates I should be reporting the confidence intervals more than the means, and should add in model uncertainty and other such considerations.
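Why a small probability can move the mean so far: expectation is linear, so a 1% branch with an enormous value dominates the mean even when it falls outside the 90% interval. A sketch with a two-branch mixture; the 420 and 10^10 QALY values are illustrative, chosen so the mean lands near 100M, and are not the forked model's actual cells:

```python
def mixture_mean(branches):
    """Mean of a discrete mixture given (probability, value) pairs."""
    return sum(p * v for p, v in branches)

def mixture_quantile(branches, q):
    """Smallest branch value whose cumulative probability reaches q."""
    cumulative = 0.0
    for p, v in sorted(branches, key=lambda branch: branch[1]):
        cumulative += p
        if cumulative >= q:
            return v
    # Guard against floating-point shortfall when q is close to 1.
    return max(v for _, v in branches)

# Illustrative only: 99% weight on a scenario worth 420 QALYs,
# 1% weight on a near-zero-discount scenario worth 1e10 QALYs.
branches = [(0.99, 420.0), (0.01, 1e10)]

print(f"mean = {mixture_mean(branches):.3g}")
print(f"90% interval = {mixture_quantile(branches, 0.05):.3g} to "
      f"{mixture_quantile(branches, 0.95):.3g}")
```

Here the mean is about 10^8 while the whole 90% interval sits at 420, which is one argument for reporting intervals alongside means.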

That said, here are some reasons why I think temporal discounting is not as useless as Carl's post may suggest.

One difference between the model and Carl's post (I think) is that this model is specific to a "selfish individual", whose perspective is quite different from that of a utilitarian or policy analyst.

Some of this may come down to vagueness about what it means to be a "rational selfish agent." The way I think about it, people change over time, so if you value your near term happiness because of your soon-future's similarity to yourself, you may value your long term happiness a bit less so.

Another point I'd make in its defense is that even if the hyperbolic discount rate is incredibly small (10^-8), it still has a very considerable impact.

Of course, one challenge with the experimental evidence on the subject is that, I believe, it contains few if any questions involving living millions of years.

The way I think about it, people change over time, so if you value your near term happiness because of your soon-future’s similarity to yourself, you may value your long term happiness a bit less so.

If this is the main reason for time discounting, it doesn't seem appropriate to extend it into the indefinite future, especially when thinking about AGI. For example, once we create superintelligence, it probably wouldn't be very difficult to stop the kinds of changes that would cause you to value your future self less.

This is really great! I would also love to see more probabilistic models from people, for various things they care about.

Another update: apparently, my assumption that the universe would be 6 billion years old was very incorrect. It seems possible that useful computation could be done for 10^2500 years, which is much better.
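One way to gauge how much the longer horizon matters: under the hyperbolic form 1/(1 + k·t), the discounted total q·ln(1 + kT)/k grows only with the logarithm of the horizon T once kT is large. A sketch assuming that form, 1 QALY per year, and k = 10^-8 (the "very low" rate mentioned in the post); it takes log10(T) because 10^2500 overflows a float. The function name is hypothetical:

```python
import math

def discounted_total(k: float, log10_horizon: float, qalys_per_year: float = 1.0) -> float:
    """Hyperbolically discounted QALYs, q * ln(1 + k*T) / k, with T = 10**log10_horizon years.

    Takes log10(T) so horizons such as 10**2500 years, which overflow a float,
    can still be evaluated. Requires k > 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ln_kT = math.log(k) + log10_horizon * math.log(10)
    if ln_kT > 50:
        # ln(1 + x) = ln(x) + ln(1 + 1/x), and ln(1 + 1/x) < 1e-21 once x > e**50
        ln_1p_kT = ln_kT
    else:
        ln_1p_kT = math.log1p(math.exp(ln_kT))
    return qalys_per_year * ln_1p_kT / k

k = 1e-8  # assumed rate, taken from the "very low" example in the post
short_horizon = discounted_total(k, math.log10(6e9))  # 6-billion-year horizon
long_horizon = discounted_total(k, 2500)              # 10**2500-year horizon
print(f"{short_horizon:.3g} vs {long_horizon:.3g} discounted QALYs")
```

Under these assumptions the longer horizon raises the discounted total by roughly three orders of magnitude (about 4×10^8 to about 6×10^11), rather than by the ~10^2490-fold factor an undiscounted sum would give.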

https://en.wikipedia.org/wiki/Future_of_an_expanding_universe

I'd be interested in reading the literature you mention that suggests positive outcomes are more likely than negative outcomes, conditional on AGI being developed. My sense is that if AGI is developed and the transition goes badly for humans, but an individual still lives for a long time, then it's quite likely that the individual has a bad life: if you select uniformly from environments that keep humans alive but are otherwise unoptimized for wellbeing, I'd expect most to be quite unhappy.

It also seems like you place around 66% probability (2.5 : 1.3) on our successfully navigating the intelligence explosion. This seems quite high and may be worth pulling out into a separate variable just to make it more explicit.
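The 66% figure comes from reading the two cells as odds. A one-line sketch of the conversion (hypothetical helper name):

```python
def odds_to_probability(favourable: float, unfavourable: float) -> float:
    """Convert odds a : b into the probability a / (a + b)."""
    return favourable / (favourable + unfavourable)

print(f"{odds_to_probability(2.5, 1.3):.0%}")  # prints "66%" (2.5 / 3.8 ≈ 0.658)
```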

I don't remember many specific instances of literature suggesting this, but I have definitely seen it here and there, and have discussed it with other people in the safety community who seem to strongly share the assumption.

I'd agree that almost all outcomes would be unhappy if drawn from the distribution of randomly chosen situations that keep humans alive, but this doesn't seem like the right distribution to me. I think an AI is much more likely to keep us alive for the sake of making us happy than for any other purpose. I honestly don't know what humans are good for in the long run, other than being kept alive for enjoyment or for near-AGI negotiation deals.

I think a useful analogy is humans and animals. In the short run we may be making things worse for them, but in the long run I'd bet we won't really care about them (we'd have other ways of getting nutrition) and would either kill them all or keep some in zoos/museums/protected lands. Their future basically depends on what uses they could serve, and in the long run it's hard for them to serve any use better than fancy tech could.

On point two:

The vast majority of the probability is in us failing the intelligence explosion and all dying. In this model that equates to 0 expected value (the model states that it doesn't take opportunity cost into account). More specifically, this model claims a ~99.7% chance that a given person will die, a ~0.25% chance they will live on indefinitely in a good way, and a ~0.16% chance they will live on indefinitely in a bad way.
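The same bookkeeping as a sketch: the death branch contributes nothing to the EV because it is valued at 0, so only the two survival branches matter. The probabilities are the approximate figures from the comment; the per-branch QALY values are hypothetical placeholders:

```python
# (probability, QALYs) per outcome. Probabilities are the comment's approximate
# figures; the QALY values are hypothetical placeholders.
outcomes = {
    "die": (0.997, 0.0),  # valued at 0: the model ignores opportunity cost
    "live indefinitely, good": (0.0025, 1e6),
    "live indefinitely, bad": (0.0016, -1e5),
}

def expected_qalys(table):
    """Probability-weighted sum of QALYs across outcomes."""
    return sum(p * v for p, v in table.values())

ev = expected_qalys(outcomes)
death_share = outcomes["die"][0] * outcomes["die"][1]  # always 0 here
print(f"EV = {ev:.0f} QALYs (death branch contributes {death_share:.0f})")
```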