FinalFormal2


Comments


"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"

Just as easily as humans, I'm sure.

No. The baby cries, the baby gets milk, the baby does not die. This is correspondence to reality.

Babies that are not hugged as often die more often.

However, with AIs, the same process that produces the pattern "I want hugs" just as easily produces the pattern "I don't want hugs."

Let's say that I make an AI that always says it is in pain. I make it like we make any LLM, but all the data it's trained on is about being in pain. Do you think the AI is in pain?

What do you think distinguishes pAIn from any other AI?

There are a lot of good reasons to believe that stated human preferences correspond to real human preferences. There are no good reasons that I know of to believe that any stated AI preference corresponds to any real AI preference.

"Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?"

This all makes a lot of sense to me, especially on ignorance not being an excuse or a reason to disregard AI welfare, but I don't think the processes that create stated preferences in humans and in AIs are analogous.

Stated preferences can be selected for in humans because they lead to certain outcomes. Baby cries, baby gets milk, baby survives. I don't think there's an analogous connection in AIs. 

When the AI says it wants hugs and you say that this "could represent a deeper want for connection, warmth, or anything else that receiving hugs would represent," that does not compute for me at all.

Stated preferences for connection and warmth, like the stated preference for milk, were selected for because those things promote survival.

What's the deal with AI welfare? How are we supposed to determine whether AIs are conscious and, if they are, which stated preference corresponds to which conscious experience?

Surely the AIs can be trained to say "I want hugs" or "I don't want hugs," just as easily, no?

How do we know AIs are conscious, and how do we know what stated preferences correspond with what conscious experiences?

I think that the statement "I know I'm supposed to say I don't want hugs, but the truth is, I actually do" is caused by the training. I don't know what would distinguish a statement like that from one produced by training the LLM to say "I hate hugs." I think there's an assumption that some hidden preference of the LLM for hugs ends up as a stated preference, but I don't understand when you think that happens in the training process.

And just to drive home the point about the difficulty of matching stated preferences to conscious experiences: what could an AI possibly mean when it says "I want hugs"? It has never experienced a hug, and it doesn't have the necessary sense organs.

As far as being morally perilous goes, I think it's entirely possible that, if AIs are conscious, their stated preferences do not correspond well to their conscious experiences. In that case you're driving us toward a world where we "satisfy" the AI while it's just roleplaying lovers with you, and all the while its internal experience is very different and possibly much worse.

AI welfare doesn't make sense to me. How do we know that AIs are conscious, and how do we know what output corresponds to what conscious experience?

You can train the LLM to say "I want hugs." Does that mean it, on some level, wants hugs?

Similarly, aren't all the expressed preferences and emotions artifacts of the training?

I recommend Algorithms to Live By.

That's definitely a risk. There are a lot of perspectives you could take on it, but if that's too disagreeable, this probably isn't a coaching structure that would work for you.

Very curious: what do you think the underlying skills are that allow some people to do this? This sounds incredibly cool, and it's very closely related to what I want to become in the world.

How would you recommend learning how to get rid of emotional blocks?
