The scenario I'm imagining isn't an AGI that merely "gets rid of" humans. See SignFlip.

Would it be likely for the utility function to flip *completely*, though? There's a difference between some drift in the utility function and the AI screwing up and designing a successor with the complete opposite of its utility function.

Is it plausible that an AGI could have some sort of vulnerability (a buffer overflow, maybe?) that could be exploited (maybe by an optimization daemon…?) to cause a sign flip in the utility function?

How about an error during self-improvement that leads to the same sort of outcome? Should we expect an AGI to sanity-check its successors, even if it’s only at or below human intelligence?

Sorry for the dumb questions, I’m just still nervous about this sort of thing.

Thanks for your response, just a few of my thoughts on your points:

If you *can* stop doing philosophy and futurism

To be honest, I've never really *wanted* to be involved with this. I only really made an account here *because* of my anxieties and wanted to try to talk myself through them.

If an atom-for-atom identical copy of you, *is* you, and an *almost* identical copy is *almost* you, then in a sufficiently large universe where all possible configurations of matter are realized, it makes more sense to think about the relative measure of different configurations rather than what happens to "you".

Personally, I don't buy that theory of personal identity. It seems to me that if the biological me that's sitting here right now isn't *feeling* the pain, that's not worth worrying about as much. Like, I can *imagine* that a version of me might be getting tortured horribly or experiencing endless bliss, but my consciousness doesn't (as far as I can tell) "jump" over to those versions. Similarly, were *I* to get tortured, it'd be unlikely that I'd care about what's happening to the "other" versions of me. The "continuity of consciousness" theory *seems* stronger to me, although admittedly it's not something I've put a lot of thought into. I wouldn't want to use a teleporter for the same reasons.

*And* there are evolutionary reasons for a creature like you to be *more* unable to imagine the scope of the great things.

Yes, I agree that it's possible that the future could be just as good as an infinite torture future would be bad. And that my intuitions are somewhat lopsided. But I do struggle to find that comforting. Were an infinite-torture future realised (whether it be a SignFlip error, an insane neuromorph, etc.) the fact that I could've ended up in a utopia wouldn't console me one bit.

As anyone could tell from my posting history, I've been obsessing & struggling psychologically recently when evaluating a few ideas surrounding AI (what if we make a sign error on the utility function, malevolent actors creating a sadistic AI, AI blackmail scenarios, etc.). It's predominantly selfishly worrying about things like s-risks happening to me, or AI going wrong so I have to live in a dystopia and can't commit suicide. I don't worry about human extinction (although I don't think that'd be a good outcome, either!).

I'm wondering if anyone's gone through similar anxieties and has found a way to help control them? I'm diagnosed ASD and I wouldn't consider it unlikely that I've got OCD or something similar on top of it, so it's possibly just that playing up.

Not really, because it takes time to train the cognitive skills necessary for deception.

Would that not be the case with *any* form of deceptive alignment, though? Surely it (deceptive alignment) wouldn't pose a risk at all if that were the case? Sorry in advance for my stupidity.

Sorry for the dumb question a month after the post, but I've just found out about deceptive alignment. Do you think it's plausible that a signflipped AGI could fake being an FAI in the training stage, just to take a treacherous turn at deployment?

It’s more a selfish worry, tbh. I don’t buy that pleasure being unlimited can cancel it out though - even if I were promised a 99.9% chance of Heaven and 0.1% chance of Hell, I still wouldn’t want both pleasure and pain to be potentially boundless.

I do agree that they’re symmetrical. I just find it worrying that I could potentially experience such enormous amounts of pain, even when the opposite is also a possibility.

I'd still expect a reasonable utility function to *cap* the (dis)utility of pain. If it didn't, the (possible) torture of just one creature capable of experiencing arbitrary amounts/degrees/levels of pain would effectively be 'Pascal's hostage'.

I suppose I never thought about that, but I'm not entirely sure how it'd work in practice. Since the AGI could never be 100% certain that the pain it's causing is at its maximum, it might further increase pain levels, just to *make sure* that it's hitting the maximum level of disutility.
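For what it's worth, here is a toy sketch of the capping idea from the quote above, just to make the arithmetic concrete. Everything in it is an assumption for illustration: the function name `expected_utility`, the probabilities, the utilities and the cap value are all made up, and it isn't meant to describe how any real system represents utility.

```python
def expected_utility(outcomes, cap=None):
    """Sum probability * utility over (probability, utility) pairs.

    If `cap` is given, each utility is clamped to [-cap, cap] first.
    This is a hypothetical illustration, not a real AGI design.
    """
    total = 0.0
    for probability, utility in outcomes:
        if cap is not None:
            utility = max(-cap, min(cap, utility))
        total += probability * utility
    return total


# Made-up numbers: an almost-certain mildly good outcome, plus a
# one-in-a-million chance of an enormously bad one.
hostage = [(0.999999, 10.0), (0.000001, -1e12)]

uncapped = expected_utility(hostage)            # dominated by the tiny-probability term
capped = expected_utility(hostage, cap=1000.0)  # the bad outcome counts as at most -1000

print(uncapped < 0, capped > 0)
```

Without the cap, the one-in-a-million outcome swamps everything else, which is the 'Pascal's hostage' problem; with the cap it barely moves the total. The sketch only covers how such a function would weigh outcomes. It says nothing about my worry that an agent unsure whether it has reached the maximum might keep pushing anyway.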

It also seems unclear why evolution would result in creatures able to experience pain more intensely than such a maximum.

I think part of what worries me is that, even if we had a "maximum" amount of pain, it'd be hypothetically possible for humans to be re-wired to remove that maximum. I'd think that I'd still be the same person experiencing the same consciousness *after* being rewired, which is somewhat troubling.

If the pain a superintelligence can cause scales linearly or better with computational power, then the thought is even more terrifying.

Overall, you make some solid points that I wouldn't have considered otherwise.

