In particular it seems very plausible that I would respond by actively seeking out a predictable dark room if I were confronted with wildly out-of-distribution visual inputs, even if I'd never displayed anything like a preference for predictability of my visual inputs up until then.

It seems like a major issue here is that people often have limited introspective access to what their "true values" are. And it's not enough to know some of your true values; in the example you give the fact that you missed one or two causes problems even if most of what you're doing is pretty closely related to other things you truly value. (And "just introspect harder" increases the risk of getting answers that are the results of confabulation and confirmation bias rather than true values, which can cause other problems.)

Here's an attempt to formalize the "is partying hard worth so much" aspect of your example:

It's common (with some empirical support) to approximate utility as proportional to log(consumption). Suppose Alice has $5M of savings and expected-future-income that she intends to consume at a rate of $100k/year over the next 50 years, and that her zero-utility point is at $100/year of consumption (since it's hard to survive at all on less than that). Then, using base-10 logs, she's getting log(100000/100) = 3 units of utility per year, or 150 over the 50 years.

Now she finds out that there's a 50% chance that the world will be destroyed in 5 years. If she maintains her old spending patterns her expected utility is .5*log(1000)*50 + .5*log(1000)*5 = 82.5. Alternately, if interest rates were 0%, she might instead change her plan to spend $550k/year over the next 5 years and then $50k/year subsequently (if she survives). Then her expected utility is log(5500)*5 + .5*log(500)*45 = 79.4, which is worse. In fact her expected utility is maximized by spending $182k/year over the next five years and $91k/year after that, yielding an expected utility of about 82.9, only a tiny increase in EV. If she has to pay extra interest to time-shift consumption (either via borrowing or forgoing investment returns) she probably just won't bother. So it seems like you need very high confidence of very short timelines before it's worth giving up the benefits of consumption-smoothing.
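A quick sketch of Alice's expected-utility arithmetic, assuming base-10 logs, the $100/year zero-utility point, and a 50-year horizon with a 50% chance of destruction at year 5 (note that the $50k/year leg of the front-loaded plan corresponds to log(50000/100) = log(500)):

```python
import math

def u(c):
    # Annual utility of consuming c dollars/year: base-10 log,
    # with the zero-utility point at $100/year of consumption.
    return math.log10(c / 100)

p_doom = 0.5  # 50% chance the world is destroyed at year 5

def expected_utility(c_first5, c_rest):
    # The first 5 years of consumption happen regardless;
    # the remaining 45 years happen only with probability 0.5.
    return 5 * u(c_first5) + (1 - p_doom) * 45 * u(c_rest)

baseline = expected_utility(100_000, 100_000)     # keep spending $100k/year
front_loaded = expected_utility(550_000, 50_000)  # $550k/year, then $50k/year
# Optimal split of the $5M budget: second-period consumption is half of
# first-period consumption, i.e. ~$182k/year then ~$91k/year.
optimal = expected_utility(5_000_000 / 27.5, 5_000_000 / 55)

print(round(baseline, 1), round(front_loaded, 1), round(optimal, 1))
# → 82.5 79.4 82.9
```

The front-loaded plan loses expected utility, and even the optimal reallocation gains only ~0.4 over doing nothing, which is the point: consumption-smoothing is robust to moderate doom probabilities.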

Why would you expect her to be able to diminish the probability of doom by spending her million dollars? Situations where someone can have a detectable impact on global-scale problems by spending only a million dollars are extraordinarily rare. It seems doubtful that there are even ways to spend a million dollars on decreasing AI xrisk now when timelines are measured in years (as the projects working on it do not seem to be meaningfully funding-constrained), much less if you expected the xrisk to materialize with 50% probability tomorrow (less time than it takes to e.g. get a team of researchers together).

I think it generally makes sense to try to smooth personal consumption, but that for most people I know this still implies a high savings rate at their first high-paying job.

  • As you note, many of them would like to eventually shift to a lower-paying job, reduce work hours, or retire early.
  • Even if this isn't their current plan, burnout is a major risk in many high-paying career paths and might oblige them to do so, and so there's a significant probability of worlds where the value of having saved up money during their first high-paying job is large.
  • If they're software engineers in the US they face the risk that US software engineer salaries will revert to the mean of other countries and other professional occupations.
  • If they want but don't currently have children, then even if their income is higher later in their career, it's likely that their income-per-household-member won't be. Childcare and college costs mean they should probably be prepared to spend more per child in at least some years than they currently do on their own consumption.

Yeah that's essentially the example I mentioned that seems weirder to me, but I'm not sure, and at any rate it seems much further from the sorts of decisions I actually expect humanity to have to make than the need to avoid Malthusian futures.

I'm happy to accept the sadistic conclusion as normally stated, and in general I find "what would I prefer if I were behind the Rawlsian Veil and going to be assigned at random to one of the lives ever actually lived" an extremely compelling intuition pump. (Though there are other edge cases that I feel weirder about, e.g. is a universe where everyone has very negative utility really improved by adding lots of new people of only somewhat negative utility?)

As a practical matter though I'm most concerned that total utilitarianism could (not just theoretically but actually, with decisions that might be locked-in in our lifetimes) turn a "good" post-singularity future into Malthusian near-hell where everyone is significantly worse off than I am now, whereas the sadistic conclusion and other contrived counterintuitive edge cases are unlikely to resemble decisions humanity or an AGI we create will actually face. Preventing the lock-in of total utilitarian values therefore seems only a little less important to me than preventing extinction.

I think
- Humans are bad at informal reasoning about small probabilities since they don't have much experience to calibrate on, and will tend to overestimate the ones brought to their attention, so informal estimates of the probability of very unlikely events should usually be adjusted even lower.
- Humans are bad at reasoning about large utilities, due to lack of experience as well as issues with population ethics and the mathematical issues with unbounded utility, so estimates of large utilities of outcomes should usually be adjusted lower.
- Throwing away most of the value in the typical case for the sake of an unlikely case seems like a dubious idea to me even if your probabilities and utility estimates are entirely correct; the lifespan dilemma and similar results are potential intuition pumps about the issues with this, and go through even with only single-exponential utilities at each stage. Accordingly I lean towards overweighting the typical range of outcomes in my decision theory relative to extreme outcomes, though there are certainly issues with this approach as well.

As far as where the penalty starts kicking in quantitatively, for personal decisionmaking I'd say somewhere around "unlikely enough that you expect to see events at least this extreme less than once per lifetime", and for altruistic decisionmaking "unlikely enough that you expect to see events at least this extreme less than once in the history of humanity". For something on the scale of AI alignment I think that's around 1/1000? If you think the chances of success are still over 1% then I withdraw my objection.

The Pascalian concern aside I note that the probability of AI alignment succeeding doesn't have to be *that* low before its worthwhileness becomes sensitive to controversial population ethics questions. If you don't consider lives averted to be a harm then spending $10B to decrease the chance of 10 billion deaths by 1/10000 is worse value than AMF. If you're optimizing for the average utility of all lives eventually lived then increasing the chance of a flourishing future civilization to pull up the average is likely worth more but plausibly only ~100x more (how many people would accept a 1% chance of postsingularity life for a 99% chance of immediate death?) so it'd still be a bad bet below 1/1000000. (Also if decreasing xrisk increases srisk, or if the future ends up run by total utilitarians, it might actually pull the average down.)
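For concreteness, here is the cost-per-life arithmetic behind the AMF comparison above. The ~$5k/life AMF benchmark is my assumed round number for illustration, not a figure from the original:

```python
spend = 10e9            # $10B spent on reducing AI xrisk
deaths_at_stake = 10e9  # 10 billion deaths averted if it works
risk_reduction = 1e-4   # chance of those deaths decreased by 1/10000

expected_lives_saved = deaths_at_stake * risk_reduction  # 1 million lives
cost_per_life = spend / expected_lives_saved             # $10,000 per life

# Assumed benchmark: marginal cost per life saved via AMF, taken here as
# roughly $5,000 (a commonly cited round figure; treat as an assumption).
amf_cost_per_life = 5_000

print(cost_per_life, cost_per_life > amf_cost_per_life)
# → 10000.0 True
```

Under the "lives averted aren't a harm" view, the xrisk spend buys lives at twice the assumed AMF price, which is the sense in which it's "worse value than AMF" at these numbers.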

I think that I'd easily accept a year of torture in order to produce ten planets worth of thriving civilizations. (Or, if I lack the resolve to follow through on a sacrifice like that, I still think I'd have the resolve to take a pill that causes me to have this resolve.)

I'd do this to save ten planets' worth of thriving civilizations, but doing it to produce ten planets' worth of thriving civilizations seems unreasonable to me. Nobody is harmed by preventing their birth, and I have very little confidence either way as to whether their existence would wind up increasing the average utility of all lives ever eventually lived.

There's some case for it but I'd generally say no. Usually when voting you are coordinating with a group of people with similar decision algorithms whom you have some ability to communicate with; the chance of your whole coordinated group changing the outcome is fairly large, and your own contribution to it pretty legible. This is perhaps analogous to being one of many people working on AI safety if you believe that the chance that some organization solves AI safety is fairly high (it's unlikely that your own contributions will make the difference, but you're part of a coordinated effort that likely will). But if you believe it is extremely unlikely that anybody will solve AI safety, then the whole coordinated effort is being Pascal-mugged.
