Critique my Model: The EV of AGI to Selfish Individuals

by ozziegooen, 8th Apr 2018



[Edit: Changes suggested in the comments make this model & its takeaways somewhat outdated (this was one desired outcome of posting it here!). Be sure to read the comments.]

I recently spent a while attempting to explain my views on the EV of AGI for selfish individuals. I attempted to write a more conventional blog post, but after a lot of thinking about it moved to a Guesstimate model, and after more thinking about it, realized that my initial views were quite incorrect. I've decided to simply present my model, along with several points I find interesting about it. I'm curious to hear what people here think of it, and what points are the most objectionable.


Video walkthrough

Model Summary:

This model estimates the expected value of AGI outcomes to specific individuals with completely selfish values. I.e., if all you care about is your own future happiness, how many QALYs would you gain in expectation from scenarios where AGI occurs? For example, a simpler model could say that there's a 1% chance that an AGI happens, but if it does, you get 1000 QALYs from life extension, so the EV of AGI would be ~10 QALYs.
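The arithmetic in that simpler example can be written out in a few lines. This is only a sketch of the toy calculation; the function and parameter names are made up for illustration and are not part of the Guesstimate model.

```python
def expected_qalys(p_agi: float, qalys_if_agi: float) -> float:
    """Expected QALYs from AGI scenarios: probability times payoff."""
    return p_agi * qalys_if_agi


# Toy numbers from the example above: 1% chance, 1000 QALYs if it happens.
ev = expected_qalys(p_agi=0.01, qalys_if_agi=1000)
print(f"EV of AGI: ~{ev:.0f} QALYs")
```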

The model only calculates the EV for individuals in situations where an AGI singleton happens; it doesn't compare this to counterfactuals where an AGI does not happen or is negligible in importance.

The conclusion of this specific variant of the model is a 90% confidence interval of around -300 QALYs to 1600 QALYs. I think in general my confidence bounds should have been wider, but I have found the model quite useful in refining my thinking on this issue.
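For readers who haven't used this kind of tool, here is a rough sketch of how a sampled 90% interval like this can be produced. It is not the actual model: every input range below is a hypothetical placeholder, the structure (one positive and one negative outcome) is a simplification, and the function name is invented for illustration.

```python
import math
import random


def simulate_interval(n_samples: int = 5000, seed: int = 0) -> tuple[float, float]:
    """Return the 5th and 95th percentiles of sampled EV (in QALYs)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Hypothetical placeholder inputs, NOT the numbers used in the post.
        p_positive = rng.uniform(0.01, 0.10)
        qalys_positive = rng.lognormvariate(math.log(1000), 1.0)
        p_negative = rng.uniform(0.001, 0.02)
        qalys_negative = -rng.lognormvariate(math.log(1000), 1.0)
        # Expected QALYs for this draw: each outcome weighted by its probability.
        samples.append(p_positive * qalys_positive + p_negative * qalys_negative)
    samples.sort()
    low = samples[int(0.05 * n_samples)]
    high = samples[int(0.95 * n_samples)]
    return low, high


low, high = simulate_interval()
print(f"90% interval: {low:.0f} to {high:.0f} QALYs")
```

The interval depends entirely on the input ranges, which is why the choice of those ranges is the part most worth arguing about.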

Thoughts & Updates:

1. I was surprised at how big of a deal hyperbolic discounting was. This turned out to be one of the most impactful variables by far. Originally I expected the resulting EV to be gigantic, but the discount rate really changed the equation. In this model the discount rate would have to be less than 10^-13 to have less than a 20% effect on the resulting EV. This means that even if you have an incredibly low discount rate (10^-8 seems very low to me), you still need to consider it.

2. One reason why I made this model was to figure out if purely selfish people would have rational reasons to work on AI safety. At this point I'm not very sure. If the EV of it going well is +1k QALYs to a selfish person, then the EV of spending a year of hard work on making it turn out well is much less than that. Honestly, this model as it is suggests that selfish individuals aware of AI risk may be best off not worrying about it. Of course, this is the hypothetical selfish individual; actual people typically place some value on other humans, especially ones close to them, and this model doesn't take that into account.

3. My prior, even having read a fair bit of S-risk literature, is still that the probabilities and intensity levels of negative outcomes are considerably smaller than those of positive outcomes. If this is not the case, that would be incredibly significant, perhaps making it quite risky to be alive today. Please write in the comments if you think these numbers are wrong and what your estimates would be.

4. I think the main variable here that should get more analysis is the "Chances of Happening" for "Positive Outcomes". This cell is conditional on the facts that an AGI gets developed that decides to preserve humans, that a given individual lives long enough to witness this, and that the controllers of the AGI decide to have it sustain that individual indefinitely. Those are a bunch of conditions that could get broken out into more detail.

5. I personally find selfishness to be somewhat philosophically incoherent, so it's difficult to say what exactly the maximum number of QALYs per year is that could hypothetically be experienced by one selfish person.

6. If the total expected QALYs of an AGI is greater than what one would otherwise expect in their life (say, over 50), that would suggest that the majority of the expected value of one's entire life would come from things after the AGI. I find this useful evidence for convincing my S1 to complain less about short-term sacrifices. Like, "Don't worry about the chocolate bar; if I can just make it to the AGI, there's far more good stuff later on."
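On the discounting point (1) above, a small sketch can show why even tiny rates matter over long horizons. It assumes a standard hyperbolic weight of 1/(1 + k*t) and a constant 1 QALY per year, which may not match how the Guesstimate model applies its discount; the horizons and the function names are hypothetical.

```python
import math


def hyperbolic_total(years: float, k: float) -> float:
    """Discounted total of 1 QALY/year over `years`, with weight 1/(1 + k*t).

    This is the closed form of the integral of 1/(1 + k*t) from 0 to `years`.
    """
    if k == 0:
        return years
    return math.log1p(k * years) / k


def fraction_retained(years: float, k: float) -> float:
    """Share of the undiscounted total that survives discounting."""
    return hyperbolic_total(years, k) / years


# Hypothetical horizons; the longer the horizon, the more a tiny rate bites.
for years in (1e3, 1e6, 1e9, 1e12):
    share = fraction_retained(years, k=1e-8)
    print(f"horizon {years:.0e} years, k=1e-8: {share:.1%} of value retained")
```

Under these assumptions the retained share depends only on the product of the rate and the horizon, so the rate needed to keep the effect small shrinks in proportion as the horizon grows.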

Some thoughts on modeling in general:

1. I really like the idea of people presenting more probabilistic models as an alternative to regular blog posts. I'm not sure what the correct format for presenting such a model is. The goal would be for it to be as simple as possible to be understood by people, but also for it to be reasonable and interesting.

2. When I typically think about writing posts I often imagine the optimal post to be one that argues well for one novel side of a debate. However, a reasonable model should really aim to optimize for accuracy, not argumentation. If anything, I may be convinced that the presentation of a strong opinion on one side of a debate is generally a counter-signal to informational quality.

3. My sense is that well-reasoned models should typically produce results that agree with our deep intuitions, rather than with our naive assumptions about what a model would say.

4. I really like the ability of models (particularly probabilistic ones) to simplify discussion. I'm quite optimistic about their use, and would encourage more people to try doing more analysis using similar models.

5. I apologize if it seems like I'm promoting Guesstimate, my own product. I'd say here that I do think it's the best tool for this (I basically made it just for things like this), and at this point, I'm basically running it at a loss. I personally don't expect it to really ever be profitable, but I'm quite happy with what it's able to do for many people in the rationalist/EA communities. Guesstimate is free (for public use) and open source. The use of private accounts is very cheap, and if that cost is too much for you let me know and I'll give you free access.

Future Work (for myself or others)

1. I would really like to see a collection of "common knowledge distributions" that approximate community accepted beliefs of specific variables. In this case I came up with many of these numbers myself, but would have preferred it if some could have been more established. One convenience of this kind of modeling is that important metrics seem to come up again and again in different models, so if there were a collection of generally accepted metrics, this would make modeling much easier. These values don't even require that much consensus; even if some are debated within a few orders of magnitude, that could easily be enough to be very useful.

2. Some of the most obvious next models to do of this type are models of the Expected Value of AGI for groups with other belief systems, utilitarianism or similar being the most obvious examples. It could also be interesting to break down the EV of AGI based on which group creates it. For instance, I'd like to see a matrix of the EV of safe AGIs created by different actors to different other actors: how would utilitarians view a safe AGI created by the Chinese government, or how would the collective of Western nations view one created by a rogue state?

3. There's a ton of work that could be done on the EV of specific interventions, but this is a whole topic to itself.