Open & Welcome Thread - August 2020

The scenario I'm imagining isn't an AGI that merely "gets rid of" humans. See SignFlip.

Open & Welcome Thread - August 2020

Would it be likely for the utility function to flip *completely*, though? There's a difference between some drift in the utility function and the AI screwing up and designing a successor with the complete opposite of its utility function.

Open & Welcome Thread - August 2020

Is it plausible that an AGI could have some sort of vulnerability (a buffer overflow, maybe?) that could be exploited (maybe by an optimization daemon…?) to cause a sign flip in the utility function?

How about an error during self-improvement that leads to the same sort of outcome? Should we expect an AGI to sanity-check its successors, even if it’s only at or below human intelligence?

Sorry for the dumb questions, I’m just still nervous about this sort of thing.

Open & Welcome Thread - July 2020

Thanks for your response, just a few of my thoughts on your points:

> If you *can* stop doing philosophy and futurism

To be honest, I've never really *wanted* to be involved with this. I only really made an account here *because* of my anxieties and wanted to try to talk myself through them.

> If an atom-for-atom identical copy of you, *is* you, and an *almost* identical copy is *almost* you, then in a sufficiently large universe where all possible configurations of matter are realized, it makes more sense to think about the relative measure of different configurations rather than what happens to "you".

I don't buy that theory of personal identity. It seems to me that if the biological me that's sitting here right now isn't *feeling* the pain, that's not worth worrying about as much. Like, I can *imagine* that a version of me might be getting tortured horribly or experiencing endless bliss, but my consciousness doesn't (as far as I can tell) "jump" over to those versions. Similarly, were *I* to get tortured, it's unlikely that I'd care about what's happening to the "other" versions of me. The "continuity of consciousness" theory *seems* stronger to me, although admittedly it's not something I've put a lot of thought into. I wouldn't want to use a teleporter for the same reasons.

> *And* there are evolutionary reasons for a creature like you to be *more* unable to imagine the scope of the great things.

Yes, I agree that it's possible that the future could be just as good as an infinite torture future would be bad. And that my intuitions are somewhat lopsided. But I do struggle to find that comforting. Were an infinite-torture future realised (whether it be a SignFlip error, an insane neuromorph, etc.) the fact that I could've ended up in a utopia wouldn't console me one bit.

Open & Welcome Thread - July 2020

As anyone could tell from my posting history, I've been obsessing & struggling psychologically recently when evaluating a few ideas surrounding AI (what if we make a sign error on the utility function, malevolent actors creating a sadistic AI, AI blackmail scenarios, etc.). It's predominantly a selfish worry about things like s-risks happening to me, or AI going wrong so that I have to live in a dystopia and can't commit suicide. I don't worry about human extinction (although I don't think that'd be a good outcome, either!).

I'm wondering if anyone's gone through similar anxieties and has found a way to help control them? I'm diagnosed with ASD and I wouldn't consider it unlikely that I've got OCD or something similar on top of it, so it's possibly just that playing up.

Likelihood of hyperexistential catastrophe from a bug?

> Not really, because it takes time to train the cognitive skills necessary for deception.

Would that not be the case with *any* form of deceptive alignment, though? Surely it (deceptive alignment) wouldn't pose a risk at all if that were the case? Sorry in advance for my stupidity.

Likelihood of hyperexistential catastrophe from a bug?

Sorry for the dumb question a month after the post, but I've just found out about deceptive alignment. Do you think it's plausible that a signflipped AGI could fake being an FAI in the training stage, just to take a treacherous turn at deployment?

‘Maximum’ level of suffering?

It’s more a selfish worry, tbh. I don’t buy that pleasure being unlimited can cancel it out though - even if I were promised a 99.9% chance of Heaven and 0.1% chance of Hell, I still wouldn’t want both pleasure and pain to be potentially boundless.

‘Maximum’ level of suffering?

I do agree that they’re symmetrical. I just find it worrying that I could potentially experience such enormous amounts of pain, even when the opposite is also a possibility.

‘Maximum’ level of suffering?

> I'd still expect a reasonable utility function to *cap* the (dis)utility of pain. If it didn't, the (possible) torture of just one creature capable of experiencing arbitrary amounts/degrees/levels of pain would effectively be 'Pascal's hostage'

I suppose I never thought about that, but I'm not entirely sure how it'd work in practice. Since the AGI could never be 100% certain that the pain it's causing is at its maximum, it might further increase pain levels, just to *make sure* that it's hitting the maximum level of disutility.

> It also seems unclear why evolution would result in creatures able to experience pain more intensely than such a maximum.

I think part of what worries me is that, even if we had a "maximum" amount of pain, it'd be hypothetically possible for humans to be re-wired to remove that maximum. I'd think that I'd still be the same person experiencing the same consciousness *after* being rewired, which is somewhat troubling.

If the pain a superintelligence can cause scales linearly or better with computational power, then the thought is even more terrifying.

Overall, you make some solid points that I wouldn't have considered otherwise.
