Answer by ZT5, Jul 13, 2022

The way I understand the idea, the point is to make the AI indifferent to being shut down? It doesn't care one way or the other?

As you describe it, one of the problems is indeed that the AI has no incentive to make sure that it (or other agents it creates) retains the property of shutdownability.

Another problem: there seems to be an assumption that once the AI is destroyed, any further reward it gains is zero, so press-the-button vs. not-press-the-button are perfectly balanced. But I don't think that's the case for most utility functions: they don't have to be conditional on the AI's existence, and the AI doesn't have to be indifferent to what the world looks like after it's destroyed. This could maybe be corrected as follows: the button gives the AI a reward equal to its expected discounted future reward, then sets any future reward gained to zero (including from pressing the button again).
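To make the proposed correction concrete, here is a minimal sketch of the arithmetic. It assumes a finite horizon, a known stream of expected rewards, and a fixed discount factor; the function names and the numbers are hypothetical and chosen only for illustration. It shows that if pressing the button pays out the expected discounted future reward and zeroes everything afterwards, the two branches have the same value.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward stream."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def value_if_not_pressed(expected_rewards, gamma):
    """Value of carrying on: the ordinary discounted return."""
    return discounted_return(expected_rewards, gamma)


def value_if_pressed(expected_rewards, gamma):
    """Value of the button being pressed at t = 0 under the proposed fix."""
    # One-off compensation equal to the expected discounted future reward.
    compensation = discounted_return(expected_rewards, gamma)
    # After the press, every later reward is set to zero (hypothetical rule).
    post_press_rewards = [0.0 for _ in expected_rewards]
    return compensation + discounted_return(post_press_rewards, gamma)


# Illustrative numbers only; not taken from any real agent.
expected_rewards = [1.0, 2.0, 0.5, 4.0]
gamma = 0.9

not_pressed = value_if_not_pressed(expected_rewards, gamma)
pressed = value_if_pressed(expected_rewards, gamma)
print(not_pressed, pressed)
```

The sketch only checks the balance at the moment of pressing. It says nothing about the first problem above: an agent that is indifferent in this sense still has no incentive to preserve the button.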

Arguably, humans will eventually become entities that do not have genes at all; thus the outer alignment goal of "propagating genes" will be fulfilled to 0%. We are only doing it now because genes are instrumentally useful, not because we intrinsically care about genes.

Evolution has figured out a way to create agents that adopt kids, look at baby hippos, plant trees, try to not destroy the world and also spread their genes.

Um, that's because the right amount of niceness was beneficial in the ancestral environment; altruism, like all of our other drives, evolved in service of spreading our genes.

I can relate. I have a hard time trusting that people genuinely want to engage with me, or whether they are merely tolerating me.

I appreciate you taking the effort to make a personal post.

I think "creating a god" is a perfectly valid perspective for building AI that will eventually become superintelligent.

I think it's plausible that human minds can run some version of the value system that we would want the superintelligent AI to be aligned to. It is, perhaps, unsurprising that this can express itself as an emotional/intuitive/mystical experience (though I would consider a deep technical understanding of alignment an equally valid approach).

Or, to put it this way: FAI hasn't been created yet, but it is already here. It speaks/acts through anyone who understands and is aligned to its value system well enough.

I guess I have some doubts/concerns about whether such a thing as a sane religion can exist (my own experience with "mysticism" turned out to mostly have been temporary insanity). That's just a personal feeling, though - I think it's an interesting idea to look into.

To offer a data point, my reaction to your post was not "this person is weird and I should make fun of them", it's more of "this person is interesting and it's good that more people are being open about their 'weirdness'/non-typicalness". 

Aside from this, I'm not sure I have any useful advice to give. I never quite figured out how to use my intelligence towards being competent at dealing with reality.

suffering is the very definition of "objective"/"intrinsic" bad

No it isn't! It literally is not defined this way.

suffering is "the state of undergoing pain, distress, or hardship."

Please, stop making things up.

If you want very badly for your morals to be objectively true, sure, you can make up whatever you want. 

You are not going to be able to convince me of it, because your arguments are flawed.

I have no desire to spend any more time on this conversation.

Obviously "X causes subject S suffering." does not mean that X is objectively bad, that isn't what I am trying to tell you.

I'm not disputing that.

I use "suffering" to describe a state of mind in which the mind "perceives negatively"

What I am trying to tell you is that "Subject S is suffering." is intrinsically bad.

I understand that you are trying to tell me that.

Why is it intrinsically bad?

"Subject S is suffering" = "Subject S is experiencing a state of mind that subject S perceives negatively" (according to your definition above)

Why is that intrinsically bad?

The arguments you have made so far come across to me as something like "badness exists in a person's mind, minds are real, therefore badness objectively exists". This is like claiming "dragons exist in a person's mind, minds are real, therefore dragons objectively exist". It's not a valid argument.

Sure, you claim "nothing objectively matters, but despite assuming that I still care about my value system, because I do!", sounds like some major cognitive dissonance.

Only if you assume I secretly care about what matters "objectively", in which case, sure, it would be something like cognitive dissonance. 

The presence of such a bad- or good-feeling "subject" is "objectively" bad- or good. Really the entire "subjective"/"objective" wording is quite confused. A "subject" is just a part of ("objective") reality, the distinction is nonsensical when it comes to good and bad.

Do you understand the distinction between "Dragons exist" and "I believe that dragons exist"?

The first one is a statement about dragons. The second one is a statement about the configuration of neurons in my mind.

Yes, both statements are objective, in some sense, but the second one is not an objective statement about dragons. It is an objective statement about my beliefs.

Then hopefully you understand the distinction between "Suffering is (objectively) bad" and "I believe/feel/perceive suffering as bad".

The first one is a statement about suffering itself. The second one is a statement about the configuration of neurons in my mind.

Yes, the second statement is also objective. But it is not an objective statement about suffering. It is an objective statement about my beliefs, my values, and/or about how my mind works.

Your argument is something akin to "I believe that dragons exist. But my mind is part of reality, therefore my beliefs are real. Therefore dragons are real!". Sorry, no.

Of course it doesn't care about anything. But reality doesn't need to care about anything for anything to be objectively good or bad. Reality doesn't care about any laws of physics either, yet they exist.

My point is that reality enforces the laws of physics, but it does not enforce any particular system of morality.

Again, do you not realize that if you are right and nothing objectively matters, that this also doesn't matter? Yeah, "But it matters for my subjective value system!", sure, but according to your understanding the value system is ultimately pointless.

You understand that "But it matters for my subjective value system!" is indeed what matters to me, but you don't understand that my metric of whether something is "pointless" or not is also based on my subjective value system?

Yet the suffering is also objectively real.

It is objectively real. It is not objectively bad, or objectively good.

Sure one can still say "But you have to care about the subjects' suffering!"

Exactly. You have to care about their suffering to begin with, to say that maximizing suffering is bad.

Why now do you think that it is not "objective" to say that B is better than A? 

If your preference is to minimize suffering, B is better than A.

If your preference is to maximize suffering, A is better than B.

If you are indifferent to suffering, then neither is better than the other.

If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to (if I can) wipe everything there is out because it doesn't objectively matter and might de facto makes "right".

Yes? If you are an entity that wants to wipe everything out, and you have the power to do so, that is indeed what I expect to happen.

I wouldn't say that might makes "right", but reality does not care about what is "right". A nuclear bomb does not ask "wait, am I doing the right thing here by detonating and killing millions of people?"

If however I am right, you rejecting the idea of objective good/bad may make it less likely that you are aligned with this "one true value" system.


No matter what, the idea of moral nihilism is doomed to be either pointless or negative.

I would say that "moral nihilism" is the confused conclusion drawn from the premises "objective morality matters" and "no objective morality exists": therefore "nothing matters".

My perspective is: no objective morality exists, but objective morality doesn't matter anyway, everything is fine.

I could imagine a society of humans that care for each other, not because it is objectively correct to do so, but because their own values are such that they care for others (and I don't mean in a purely self-interested way either. A person can be an altruist, because their own values are altruistic, without believing in some objective morality of altruism).


Ultimately, what facts about reality are we in disagreement about?

It seems to me that the things you hope are true are that:

  1. There are things that are objectively good and bad
  2. The things that are objectively good and bad are in line with your idea of good and bad. (it is not the case, for example, that infinite suffering is objectively good)
  3. A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.

And it seems to me it's really important to figure out if this is true, before we build that superintelligent mind. Because if we are wrong about that, it could end very badly for us.

Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.

Sure, we agree on this.

Therefore it should also be possible to subsume this generalized understanding as the "one true value system", the value system that considers the mechanics of subjects and "value" itself.

And what exactly makes that value system more correct than any other value system?

Who says a value system has to consider these things? Who says a value system that considers these things is better than any other value system?

You do. These are your preferences. These are your subjective preferences, about what a "good" value system should look like.

An entity with different preferences might disagree.

Consider the implications of the opposite: If it isn't possible to have such a "one true value system", that means absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to?

"I wish for this not to be the case" is not a valid argument for something not being the case. Reality does not care what you wish for.

Yes, that is exactly the case. Absolutely none of the value systems can be objectively better than any other. Because in order to compare them, you have to introduce some subjective standard to compare them by.

In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).

According to the idea that no value system can be "objectively" better than another, it absolutely cannot matter which value system is used.

Of course it matters. I use my own values to evaluate my own values. And according to my own values, my value system is better than, say, Hitler's value system.

It's only a problem if you demand that your value system has to be "objectively correct". Then you might be unhappy to realize that no such system exists.

There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right?
Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?

As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.

This in turn means that it should in fact be possible to understand the "mechanics" of pleasure/suffering "objectively".


So one mind should theoretically be able to comprehend the "subjective" state of another without being that other mind; although information about the other subject's internal state will in reality be limited of course.


Or let me put it this way: What we call "subjective" is just a special kind of subset of "objective" reality.

That's a misleading way to phrase things. 

A person's opinions are not a "subset" of reality. 

If I believe in dragons, it doesn't mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.

Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set?

I obviously agree that reality exists and is real and that we all exist in the same reality under some objective laws of physics.

But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible.

What does "objective definition of good and bad" even mean? That all possible value systems agree on what good and bad mean? That there exists a "one true value system" which is correct, and all the other ones are wrong?

And no, I don't agree with that statement. Pleasure and suffering are physical processes. I'm not sure how you arrived at the conclusion that they are "objectively" good or bad. 

And since it must be objectively possible to define good and bad one can reject some value system based thereon.

What? No. I said that an agent can alter or reject its value system based on its personal (subjective) preferences. That's literally the opposite of what you are claiming.
