People, including alignment researchers, just seem more confident about their own preferred solution to metaethics, and comfortable assuming that solution is correct as part of solving other problems, like AI alignment or strategy. (E.g., "moral anti-realism is true, therefore empowering humans in straightforward ways is fine, since the alignment target can't be wrong about their own values.")
Obviously committed anti-realists would be right not to worry -- if they're correct! But I agree with you that we shouldn't be overconfident in our metaethics... which makes me wonder: do you really think metaethics can be "solved"?
Secondly, even if it were solved (and to sidestep anti-realist apathy, let's assume moral realism is true), how do you think that would help with alignment? Couldn't the alignment target simply say, "This is true, but I don't care, since it doesn't help me achieve my goals"? Saying "1+1=2, but I'm going to act as if it equals 3" might keep you from achieving your goal. Saying "stealing is wrong, but I would really like to have X" might actually help you achieve your goal.
A worthwhile cosmopolitanism should probably fight Moloch in this regard and extend its own agency to things which do not wield the tools of autopoiesis themselves.
[warning: anthropomorphisation of autopoietic processes]
Hi! Like others on this thread, I'm a long-time reader who's finally created an account to try to join the discussion. I'm curious: if I comment on a 15-year-old article or something, is anyone likely to see it? I love browsing around the Concepts pages, but are comments there (e.g.) likely to be seen?
My intuition is that comments on recent trending articles are more likely to get engagement, but can anyone confirm or deny that, or suggest the best ways/places to engage?
Thanks!
Imagine you're immortal. Would you rather get a dust speck in your eye every day for 3^^^3 straight days, or endure 50 years of torture, never to be bothered by a dust speck again?
Going with dust specks is the obvious choice to me. The way I see it, life is still very much worth living when I'm dealing with a dust speck. Torture might make me wish for death on the first day... especially with the knowledge that 18,261 more days were forthcoming.
I can’t tell if:
a) I dislike dust specks less than you
b) I dislike torture more than you
c) Aggregating pain/discomfort/suffering has weird results (e.g., perhaps 5 minutes of torture is immeasurably more than 60x as bad as 5 seconds of torture; see the toy example after this list).
or d) there's something else I'm missing.
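To make (c) concrete, here's a toy model. It assumes a hypothetical disutility function that is quadratic in duration; the exponent 2 is purely illustrative, not a claim about how suffering actually scales:

$$D(t) = t^2 \;\Rightarrow\; \frac{D(300\text{ s})}{D(5\text{ s})} = \frac{300^2}{5^2} = 3600 \gg 60$$

Under that assumption, 5 minutes of torture is 3,600x as bad as 5 seconds, even though it's only 60x as long (a simple linear sum of seconds would say 60x). If disutility really is superlinear in duration, then weighing 3^^^3 tiny harms against one sustained harm stops being a straightforward matter of addition.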
I think this is very relevant to the complexity of values and how difficult it is for humans to grasp their own utility functions. Cards on the table: the aggregation/math issues make me unable to embrace consequentialism... but if I'm overlooking something, I'd love to be shown the light.
I had bookmarked this post as fascinating (but I claim no first-hand knowledge here): https://www.lesswrong.com/posts/BgBJqPv5ogsX4fLka/the-mind-body-vicious-cycle-model-of-rsi-and-back-pain
Looks potentially pretty relevant. Is that the kind of thing you're looking for?