In popular AGI takeoff scenarios, AGIs end up doing something "unnatural" where the complexity of the goal is completely orthogonal to their ability to fulfill that goal. I'm looking for a scenario in which an AGI's objective function is "very unnatural". Not maximizing paperclips, building companies, or performing content recommendation, but something distinctly un-human. Thank you!


2 Answers

Charlie Steiner

Aug 06, 2021

40

I'm assuming we're not counting normal instrumental convergent goals as "too natural," so our AGI can do things like gather resources, attempt to rearrange lots of matter, etc.

One fun scenario that gives weird results is someone attempting to maximize the output of a classifier trained by supervised learning. So you train something to detect when either a static pattern or some sort of dynamic system of matter is "good," and then you try to maximize "goodness," and then you get the universe equivalent of an adversarial example.

The result is weird behavior: the AI picks out certain easy-to-perceive patterns that correlated with the goodness signal in the training data (though not all such patterns) and tries as hard as it can to make those patterns as intense as possible throughout the universe.
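As a toy illustration of maximizing a learned "goodness" classifier, here is a minimal sketch. Everything in it is an assumption made for illustration: the two-feature data, the labeling rule, the logistic-regression classifier, and the step sizes are hypothetical stand-ins, not anything specified in the scenario above. It only shows the general shape of the failure: optimizing the classifier's output pushes the input far outside anything seen in training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical training data: two features, each in [0, 1].
# A state is labeled "good" when a weighted sum of features clears a threshold.
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = (X[:, 0] + 0.3 * X[:, 1] > 0.65).astype(float)

# Fit a logistic-regression "goodness" classifier by plain gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

def logit(x):
    """Pre-sigmoid classifier score; higher means 'more good'."""
    return x @ w + b

# "Maximize goodness": gradient ascent on the logit with respect to the input.
# For a linear model, the gradient with respect to the input is just w.
x_opt = np.array([0.5, 0.5])
for _ in range(50):
    x_opt = x_opt + 0.1 * w

print("training feature range: [0, 1]")
print("optimized input:", x_opt)
print("best training logit:", logit(X).max())
print("optimized logit:", logit(x_opt))
```

The optimizer has no reason to stay inside the region where the labels meant anything, so it simply cranks up whichever features the classifier latched onto. A real system would be far less linear than this, but the extrapolation problem is the same in kind.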

Dagon

Aug 03, 2021

40

I suspect that the observed behaviors will feel normal: even if the goals and motives are inscrutable, humans will infer more pedestrian goals from the actions they see. In other words, the motivations may be unnatural or weird, but the actions will still be actions, and we'll make up reasons that we can believe.

For instance, the AGI may cure cancer by making our DNA more resistant to mutation, because it fears uncontrolled variance in its pets. We'll only notice the cure, not the motive.

3 comments

I don't understand the question. Maximizing paperclips at the expense of everything else strikes me as "distinctly un-human", doesn't it?

I think maximizing paperclips is at least comprehensible to a human as a thing that some agent might take actions to do, even if it's extremely narrow from a human point of view. I suspect that the more alien tasks are likely to be in the opposite direction: too complex for us to comprehend. That also makes it difficult to find and post examples of them.

While reading, I was thinking along the lines of: in the space of all possible mappings from states of the universe (including histories) to "value", what proportion would make any sense at all? I suspect almost none. The problem is that almost all of them are also indescribably complex. Now, an emerging super-intelligence isn't likely to have a purely random value function, and might not have anything we recognise as a value function at all, but it may still be an exercise that gives some hint at just how different non-human agents could possibly be.

I suspect that almost all of those would still lead to comprehensible instrumental goals though, such as "find out what the state of the universe actually is", and "take over a lot of the universe to direct it toward a more highly valued state".

Yeah, in one sense this question seems impossible to answer - "help me comprehend something incomprehensible to me by definition."

But there's another type of answer: of the utility functions that are alien in OP's sense, it is possible that most will share patterns discernible by humans. OP could be asking what those patterns are.

I'm not sure how worthwhile it is to try to predict the high-level behavior of a generic superintelligence whose goals we don't understand.