The three options all seem to me like shades of the same thing (various forms of death):

  1. Might be preferable to the other two in a multiverse, as you might simply find yourself not experiencing those "branches".
  2. Seems really bad; however, it's not clear that those humans would even be conscious enough to sustain a self, which might actually lower the "moral status" (or how much negative utility you want to ascribe to it), since sustaining a self requires some degree of social interaction.
  3. Is better than 2, but it's not obvious by how much: the selves of these blissed-out humans might dissolve in much the same way as in 2 (think of what solitary confinement does to a mind, or of what would happen if you cranked the learning rate of a neural network high enough).

So to me these all seem to be death in various forms. I might prefer 1 because I do expect a multiverse.

I would propose that there are ways of shaping the rewards of a potential AGI that, while not safe by the AI safety community's standards, nor aligned in a "does what you want and nothing else" sense, might give a much higher chance of positive outcomes than these examples, despite still being a gamble. One such reward is a curiosity drive; see for example OpenAI's "Large Scale Curiosity" paper. I would also argue that GPT-n models end up with something similar by default, without fine-tuning or nudging them in that direction with RL.

Why a curiosity drive (implemented as intrinsic motivation based on prediction error)? I believe it's likely that a lot of our complex values come from this and a few other special things (such as empathy) interacting with our baser biological drives and with the environment. I also believe that having such a drive might be essential if we want future AGIs to be peers we cooperate with - and if by some chance that doesn't work out and we go extinct, at the very least it might result in an AGI civilization with relatively rich experiences, so it wouldn't be a total loss from some sort of utilitarian perspective.
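To make the mechanism concrete, here is a minimal sketch of a prediction-error intrinsic reward in the spirit of the forward-dynamics curiosity from that paper. This is my own toy illustration, not code from the paper: the module names, network sizes, and dimensions are hypothetical placeholders.

```python
# Toy sketch (not the paper's code) of curiosity as prediction error:
# a forward model predicts the next observation's embedding, and its
# error is paid out to the agent as an intrinsic reward.

import torch
import torch.nn as nn


class CuriosityModule(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, feat_dim: int = 64):
        super().__init__()
        # Fixed (or separately trained) encoder mapping observations to features.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Forward model: predicts the next feature from current feature + action.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def intrinsic_reward(self, obs, action, next_obs):
        """Per-sample forward-model prediction error = curiosity reward."""
        with torch.no_grad():
            phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        pred_next = self.forward_model(torch.cat([phi, action], dim=-1))
        return ((pred_next - phi_next) ** 2).mean(dim=-1)


# Hypothetical usage inside an RL loop, with random tensors standing in
# for a real batch of transitions.
curiosity = CuriosityModule(obs_dim=8, action_dim=2)
opt = torch.optim.Adam(curiosity.forward_model.parameters(), lr=1e-3)

obs, action, next_obs = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 8)
r_int = curiosity.intrinsic_reward(obs, action, next_obs)  # reward for the agent

# The same error doubles as the forward model's training loss, so familiar
# transitions gradually stop being rewarding and novelty keeps paying off.
loss = r_int.mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The point of the sketch is just the shape of the incentive: the agent is rewarded for visiting transitions its world model predicts poorly, which is the sense in which I mean "curiosity drive" above.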

My initial reply to this was rather long-winded, rambling, and detailed, with justifications for those beliefs, but it was deemed inappropriate for a comment on a frontpage post, so I've posted it to my shortform if you'd like to see my full answer (it should appear there once it gets past the spam filter).