By "just thinking about IRL", do you mean "just thinking about the robot using IRL to learn what humans want"? 'Coz that isn't alignment.
'But potentially a problem with more abstract cashings-out of the idea "learn human values and then want that"' is what I'm talking about, yes. But it also seems to be what you're talking about in your last paragraph.
"Human wants cookie" is not a full-enough understanding of what the human really wants, and under what conditions, to take intelligent actions to help the human. A robot learning that would act like a paper-clipper, but with cookies. It isn't clear whether a robot which hasn't resolved the de dicto / de re / de se distinction in what the human wants will be able to do more good than harm in trying to satisfy human desires, nor what will happen if a robot learns that humans are using de se justifications.
Here's another way of looking at that "nor what will happen if" clause: We've been casually tossing about the phrase "learn human values" for a long time, but learning human values isn't what the people who use that phrase actually want. If the AI simply learned human values, it would treat humans the way humans treat cattle. But if the AI is instead to learn to desire to help humans satisfy their wants, it isn't clear that the AI can (A) internalize human values deeply enough to understand and effectively optimize for them, while at the same time (B) keeping those values compartmentalized from its own values, which are what make it enjoy helping humans with their problems. To do that, the AI would need to want to propagate and support human values that it disagrees with. It isn't clear that that's something a coherent, let's say "rational", agent can do.
How is that de re and de dicto?
You're looking at the logical form and imagining that that's a sufficient understanding to start pursuing the goal. But it's only sufficient in toy worlds, where you have one goal at a time, and the mapping between the goal and the environment is so simple that the agent doesn't need to understand the value, or the target of "cookie", beyond "cookie" vs. "non-cookie". In the real world, the agent has many goals, and the goals involve nebulous concepts and have many considerations and conditions attached, e.g., how healthy is this cookie, how tasty is it, how hungry am I. It will need to know /why/ it, or human24, wants a cookie in order to know intelligently when to get the cookie, to resolve conflicts between goals, and to do probability calculations which involve the degree to which different goals are correlated in the higher goals they satisfy.
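To make the contrast concrete, here's a toy sketch (all names and numbers invented for illustration) of the difference between a toy-world goal and a real-world goal whose value depends on the higher goals it serves:

```python
# Toy contrast between a toy-world goal ("cookie vs. non-cookie") and a
# real-world goal with conditions attached: whether getting the cookie is
# worthwhile depends on *why* it is wanted.

def toy_world_value(item: str) -> float:
    # Toy world: one goal, binary mapping; "cookie" is always worth 1.
    return 1.0 if item == "cookie" else 0.0

def real_world_value(item: str, hunger: float, healthiness: float,
                     tastiness: float) -> float:
    # Real world: the same item's value depends on the higher goals
    # (nutrition, enjoyment) that the cookie-desire actually serves.
    if item != "cookie":
        return 0.0
    return hunger * (0.5 * healthiness + 0.5 * tastiness)

print(toy_world_value("cookie"))            # always 1.0, unconditionally
print(real_world_value("cookie", hunger=0.0,
                       healthiness=0.2, tastiness=0.9))  # 0.0: not hungry now
print(real_world_value("cookie", hunger=1.0,
                       healthiness=0.2, tastiness=0.9))  # roughly 0.55
```

An agent with only the first function never has a reason to *not* fetch the cookie; an agent with the second can resolve conflicts with other goals.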
There's a confounding confusion in this particular case: you seem to be hoping the robot will infer that the agent of the desired act is the human, both when the human represents the desire and when the AI does. But for values in general, we often want the AI to act in the way the human would act, not to want the human to do something. Your posited AI would instead learn the goal that it wants human24 to get a cookie.
What it all boils down to is: You have to resolve the de re / de dicto / de se interpretation in order to understand what the agent wants. That means an AI also has to resolve that question in order to know what a human wants. Your intuitions about toy examples like "human 24 always wants a cookie, unconditionally, forever" will mislead you, in the ways toy-world examples misled symbolic AI researchers for 60 years.
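The three readings can be made concrete with a toy sketch (the representation, predicates, and object IDs here are all invented for illustration) showing that they assign genuinely different satisfaction conditions to "human24 wants a cookie":

```python
# Toy illustration of how the de dicto / de re / de se readings of
# "human24 wants a cookie" pick out different satisfaction conditions.

from dataclasses import dataclass

@dataclass(frozen=True)
class World:
    eater: str      # who ends up eating
    item_id: str    # which particular object gets eaten
    item_kind: str  # what kind of thing that object is

# De dicto: any object *of the kind* "cookie" will do.
def de_dicto_satisfied(w: World) -> bool:
    return w.eater == "human24" and w.item_kind == "cookie"

# De re: the desire is about one particular object (say, cookie#7),
# whatever kind it turns out to be.
def de_re_satisfied(w: World) -> bool:
    return w.eater == "human24" and w.item_id == "cookie#7"

# De se: the agent wants that *it itself* eats; an AI that learns this
# goal naively may re-bind "itself" to the AI rather than to the human.
def de_se_satisfied(w: World, self_agent: str) -> bool:
    return w.eater == self_agent and w.item_kind == "cookie"

w = World(eater="human24", item_id="cookie#9", item_kind="cookie")
print(de_dicto_satisfied(w))            # True: some cookie was eaten
print(de_re_satisfied(w))               # False: not that particular cookie#7
print(de_se_satisfied(w, "human24"))    # True under the human's indexical
print(de_se_satisfied(w, "robot1"))     # False if the AI re-binds "self"
```

The same world-state satisfies one reading and not the others, which is why an AI that hasn't resolved the distinction can't know what it has actually learned.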
So, "mesa" here means "tabletop", and is pronounced "MAY-suh"?
I think your insight is that progress counts--that counting counts. It's overcoming the Boolean mindset, in which anything that's true some of the time, must be true all of the time. That you either "have" or "don't have" a problem.
I prefer to think of this as "100% and 0% are both unattainable", but stating it as the 99% rule might be more-motivating to most people.
What do you mean by a goodhearting problem, & why is it a lossy compression problem? Are you using "goodhearting" to refer to Goodhart's Law?
I'll preface this by saying that I don't see why it's a problem, for purposes of alignment, for human values to refer to non-existent entities. This should manifest as humans and their AIs wasting some time and energy trying to optimize for things that don't exist, but this seems irrelevant to alignment. If the AI optimizes for the same things that don't exist as humans do, it's still aligned; it isn't going to screw things up any worse than humans do.
But I think it's more important to point out that you're joining the same metaphysical goose chase that has made Western philosophy non-sense since before Plato.
You need to distinguish between the beliefs and values a human has in its brain, and the beliefs & values it expresses to the external world in symbolic language. I think your analysis concerns only the latter. If that's so, you're digging up the old philosophical noumena / phenomena distinction, which itself refers to things that don't exist (noumena).
Noumena are literally ghosts; "soul", "spirit", "ghost", "nature", "essence", and "noumena" are, for practical purposes, synonyms in philosophical parlance. The ghost of a concept is the metaphysical entity which defines what assemblages in the world are and are not instances of that concept.
But at a fine enough level of detail, not only are there no ghosts, there are no automobiles or humans. The Buddhist and post-modernist objections to the idea that language can refer to the real world are that the referents of "automobiles" are not exactly, precisely, unambiguously, unchangingly, completely, reliably specified, in the way Plato and Aristotle thought words should be. I.e., the fact that your body gains and loses atoms all the time means, for these people, that you don't "exist".
Plato, Aristotle, Buddhists, and post-modernists all assumed that the only possible way to refer to the world is for noumena to exist, which they don't. When you talk about "valuing the actual state of the world," you're indulging in the quest for complete and certain knowledge, which requires noumena to exist. You're saying, in your own way, that knowing whether your values are satisfied or optimized requires access to what Kant called the noumenal world. You think that you need to be absolutely, provably correct when you tell an AI that one of two worlds is better. So those objections apply to your reasoning, which is why all of this seems to you to be a problem.
The general dissolution of this problem is to admit that language always has slack and error. Even direct sensory perception always has slack and error. The rationalist, symbolic approach to AI safety, in which you must specify values in a way that provably does not lead to catastrophic outcomes, is doomed to failure for these reasons, which are the same reasons that the rationalist, symbolic approach to AI was doomed to failure (as almost everyone now admits). These reasons include the fact that claims about the real world are inherently unprovable, which has been well-accepted by philosophers since Kant's Critique of Pure Reason.
That's why continental philosophy is batshit crazy today. They admitted that facts about the real world are unprovable, but still made the childish demand for absolute certainty about their beliefs. So, starting with Hegel, they invented new fantasy worlds for our physical world to depend on, all pretty much of the same type as Plato's or Christianity's, except instead of "Form" or "Spirit", their fantasy worlds are founded on thought (Berkeley), sense perceptions (phenomenologists), "being" (Heidegger), music, or art.
The only possible approach to AI safety is one that depends not on proofs using symbolic representations, but on connectionist methods for linking mental concepts to the hugely-complicated structures of correlations in sense perceptions which those concepts represent, as in deep learning. You could, perhaps, then construct statistical proofs that rely on the over-determination of mental concepts to show almost-certain convergence between the mental languages of two different intelligent agents operating in the same world. (More likely, the meanings which two agents give to the same words don't necessarily converge, but agreement on the probability estimates given to propositions expressed using those same words will converge.)
Fortunately, all mental concepts are over-determined. That is, we can't learn concepts unless the relevant sense data that we've sensed contains much more information than do the concepts we learned. That comes automatically from what learning algorithms do. Any algorithm which constructed concepts that contained more information than was in the sense data, would be a terrible, dysfunctional algorithm.
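The over-determination point can be illustrated with a trivial sketch (the numbers are invented): a learned concept is specified by far fewer degrees of freedom than the sense data that determines it, yet the data pins it down tightly.

```python
# Toy illustration of over-determination: a learned "concept" (here, just
# a cluster mean) is described by far fewer numbers than the sense data
# that determines it, so the data over-determines the concept.

import random

random.seed(0)  # make the sketch deterministic

# 1000 noisy "sense perceptions" of the same underlying feature (true value 5.0).
sense_data = [5.0 + random.gauss(0, 1) for _ in range(1000)]

# The learned concept: a single summary statistic.
concept = sum(sense_data) / len(sense_data)

data_numbers = len(sense_data)   # 1000 degrees of freedom in the evidence
concept_numbers = 1              # 1 degree of freedom in the concept

print(data_numbers > concept_numbers)   # True: evidence >> concept
print(abs(concept - 5.0) < 0.5)         # True: yet the concept is pinned down
```

A "learning algorithm" that emitted a concept with *more* degrees of freedom than its evidence would just be memorizing noise, which is the dysfunction described above.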
You are still not going to get a proof that two agents interpret all sentences exactly the same way. But you might be able to get a proof which shows that catastrophic divergence is likely to happen less than once in a hundred years, which would be good enough for now.
Perhaps what I'm saying will be more understandable if I talk about your case of ghosts. Whether or not ghosts "exist", something exists in the brain of a human who says "ghost". That something is a mental structure, which is either ultimately grounded in correlations between various sensory perceptions, or is ungrounded. So the real problem isn't whether ghosts "exist"; it's whether the concept "ghost" is grounded, meaning that the thinker defines ghosts in some way that relates them to correlations in sense perceptions. A person who thinks ghosts fly, moan, and are translucent white with fuzzy borders, has a grounded concept of ghost. A person who says "ghost" and means "soul" has an ungrounded concept of ghost.
Ungrounded concepts are a kind of noise or error in a representational system. Ungrounded concepts give rise to other ungrounded concepts, as "soul" gave rise to things like "purity", "perfection", and "holiness". I think it highly probable that grounded concepts suppress ungrounded concepts, because grounded concepts usually provide evidence for the correctness of other grounded concepts. So sane humans using statistical proofs probably don't have to worry much about whether every last concept of theirs is grounded. But as the number of ungrounded concepts increases, there is a tipping point beyond which the ungrounded concepts can be forged into a self-consistent but psychotic system such as Platonism, Catholicism, or post-modernism, at which point they suppress the grounded concepts.
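The grounding criterion sketched above can be stated operationally. Here's a toy check (the mini-lexicon is invented) in which a concept counts as grounded iff its definition chain bottoms out in sensory primitives, while a circle of abstractions defined only via each other never does:

```python
# Toy grounding check: a concept is grounded iff its definitions bottom
# out in sensory primitives; circular chains of abstractions are not.

SENSORY = {"flies", "moans", "translucent_white"}

DEFINITIONS = {
    "ghost_perceptual": {"flies", "moans", "translucent_white"},
    "ghost_as_soul": {"soul"},
    "soul": {"purity"},
    "purity": {"holiness"},
    "holiness": {"soul"},   # abstractions defined only via each other
}

def grounded(concept: str, seen: frozenset = frozenset()) -> bool:
    if concept in SENSORY:
        return True            # reached a correlation in sense perceptions
    if concept in seen:
        return False           # circular definition: never reaches the senses
    parts = DEFINITIONS.get(concept, set())
    return bool(parts) and all(grounded(p, seen | {concept}) for p in parts)

print(grounded("ghost_perceptual"))   # True: defined via sensory predicates
print(grounded("ghost_as_soul"))      # False: bottoms out in a circle
```

The two speakers in the ghost example both say "ghost", but only the first's concept passes this check.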
Sorry that I'm not taking the time to express these things clearly. I don't have the time today, but I thought it was important to point out that this post is diving back into the 19th-century continental grappling with Kant, with the same basic presupposition that led 19th-century continental philosophers to madness. TL;DR: AI safety can't rely on proving statements made in human or other symbolic languages to be True or False, nor on having complete knowledge about the world.
When you write of "a belief in human agency", it's important to distinguish between the different conceptions of human agency on offer, corresponding to the three main political groups:
Someone who wants us united under a document written by desert nomads 3000 years ago, or someone who wants the government to force their "solutions" down our throats and keep forcing them no matter how many people die, would also say they believe in human agency; but they don't want private individuals to have agency.
This is a difficult but critical point. Big progressive projects, like flooding desert basins, must be collective. But movements that focus on collective agency inevitably embrace, if only subconsciously, the notion of a collective soul. This already happened to us in 2010, when a large part of the New Atheist movement split off and joined the Social Justice movement, and quickly came to hate free speech, free markets, and free thought.
I think it's obvious that the enormous improvements in material living standards over the last ~200 years that you wrote of were caused by the Enlightenment, and can be summarized as the understanding that liberating individuals leads to economic and social progress. Whereas modernist attempts to deliberately cause economic and social progress are usually top-down, require suppressing individuals, and so cause the reverse of what they intend. This is the great trap that we must not fall into, and it hinges on our conception of human agency.
A great step forward, or backwards (towards Athens), was made by the founders of America when they created a nation based in part on the idea of competition and compromise as being good rather than bad, basically by applying Adam Smith's invisible hand to both economics and politics. One way forward is to understand how to do large projects that have a noble purpose; that is, progressive capitalism. Another way would be to understand how governments have sometimes managed to do great things, like NASA's Apollo project, without degenerating into economic and social disasters like Stalin's or Mao's Five-Year Plans. Either way, how you conceptualize human agency will be a decisive factor in whether you produce heaven or hell.
I think it would be more-graceful of you to just admit that it is possible that there may be more than one reason for people to be in terror of the end of the world, and likewise qualify your other claims to certainty and universality.
That's the main point of what gjm wrote. I'm sympathetic to the view you're trying to communicate, Valentine; but you used words that claim that what you say is absolute, immutable truth, and that's the worst mind-killer of all. Everything you wrote just above seems to me to be just equivocation trying to deny that technical yet critical point.
I understand that you think that's just a quibble, but it really, really isn't. Claiming privileged access to absolute truth on LessWrong is like using the N-word in a speech to the NAACP. It would do no harm to what you wanted to say to use phrases like "many people" or even "most people" instead of the implicit "all people", and it would eliminate a lot of pushback.