Why No *Interesting* Unaligned Singularity?

The emergence of the meme ecosystem could be considered an "interesting" partial success from evolution's perspective.

I think it's plausible that an unaligned singularity could lead to things we consider interesting, because human values might be more generic than they appear, with their apparent complexity being an emergent feature of power-seeking and curiosity drives or of mesa-optimization. I also think the whole framework of "a singleton with a fixed utility function becomes all-powerful and optimizes that for all eternity" might be wrong, since human values don't seem to work that way.

AprilSR, 4y

I think "somewhat interesting" is a very low bar and should be distinguished from near-misses which still preserve a lot of what humanity values.

Quintin Pope, 4y

I don’t agree with this line of reasoning because actual ML systems don’t implement either a simplicity or speed prior. If we assume away inner alignment failures, then current ML systems implement something like a speed-capped simplicity prior.

If we allow inner alignment failures, then things become FAR more complex. A self-perpetuating mesa-optimizer will try to influence the distribution of future hypotheses to make sure the system retains that mesa-optimizer. The system’s “prior” thus becomes path-dependent in a way that neither the speed prior nor the simplicity prior captures at all.

A non-inner-aligned system is more likely to retain hypotheses that arise earlier in training, since the mesa-optimizers implementing those hypotheses can guide the system’s learning process away from directions that would remove them. This seems like the sort of thing that could cause an unaligned AI to nonetheless have a plurality of nonhuman values that it perpetuates into the future.

Donald Hobson, 4y

I don't know. I think the space of things that some humans would find kind of cool is large.

Like, I have the instinctive reaction of "warp drives and Dyson spheres, wow cool".

That isn't to say that I place a large amount of utility on that outcome, just that my initial reaction is ooh, cool.

The chance of a mars rover nearly but not quite hitting mars is large. It was optimized in the direction of mars, and one little mistake can cause a near miss.

For AI, if humans try to make FAI, but make some fairly small mistake, we could well get a near miss. Of course, if we make lots of big mistakes, we don't get that.

David Udell, 4y

I agree with the first bit! I think a universe of relativistic grabby AGI spheres-of-control is at least slightly cool.

It's possible that the actual post-unaligned-singularity world doesn't look like that, though. Like, "space empires" is a concept that reflects a lot of what we care about. Maybe large-scale instrumentally efficient behavior in our world is really boring, and doesn't end up keeping any of the cool bits of our hard sci-fi's speculative mature technology.

> The chance of a mars rover nearly but not quite hitting mars is large. It was optimized in the direction of mars, and one little mistake can cause a near miss.
>
> For AI, if humans try to make FAI, but make some fairly small mistake, we could well get a near miss. Of course, if we make lots of big mistakes, we don't get that.

I don't think the mars mission analogy goes through given the risk of deceptive alignment.

Optimizing in the direction of a mars landing when there's a risk of a deceptive agent just means everything looks golden and mars-oriented up until you deploy the model … and then the deployed model starts pursuing the unrelated goal it's been waiting to pursue all along. The deployed model's goal is allowed to be completely detached from mars, as long as the model with that unrelated goal was good at playing along once it was instantiated in training.

Donald Hobson, 3y

Well if we screw up that badly with deceptive misalignment, that corresponds to crashing on the launchpad.

It is reasonably likely that humans will have some technique they use that is intended to minimize deceptive misalignment. Or that gradient descent shapes the goals to something similar to what we want before the AI is smart enough to be deceptive.

Pattern, 4y*

Why did 'evolution (which is not interesting?) end up creating something interesting (us)' while (your premise) 'ML (which is not interesting) will not end up creating something interesting (to us)'?

Edited to add italics, above.

Matthew Barnett, 4y

Evolution lacks the foresight that humans have. It seems relatively plausible that if evolution had only a little more foresight, it would have been able to instill its objectives much more robustly. Consider that even with the very stupid objective function we were given (maximize inclusive genetic fitness) many humans still say they want children intrinsically. That's not exactly the same thing as wanting to maximize genetic fitness, but it's pretty close.

Plus, my interpretation of agentofuser's question was that they weren't asking whether unaligned AGI would produce something we'd consider good outright, merely whether it would retain some elements of human value. I find this far more plausible than Eliezer seems to, for reasons that Paul Christiano talked about here.

Anthony DiGiovanni, 4y

Why does the person asking this question care about whether "interesting"-to-humans things happen, in a future where no humans exist to find them interesting?

David Udell, 4y

Because beings like us (in some relevant respects) that outlive us might carry the torch of our values into that future! Don't read too much into the word "interesting" here: I just meant "valuable by our lights in some respect, even if only slightly."

Sure, it sucks if humanity doesn't live to be happy in the distant future, but if some other AGI civilization is happy in that future, I prefer that to an empty paperclip-maximized universe without any happiness.

Anthony DiGiovanni, 4y

That all sounds fair. I've seen rationalists claim before that it's better for "interesting" things (in the literal sense) to exist than not, even if nothing sentient is interested in them, so that's why I assumed you meant the same.
